Data Leakage Detection Using Dynamic Data Structure and Classification Techniques

Authors

  • Cesar Byron Guevara Maldonado, UNIVERSIDAD COMPLUTENSE DE MADRID

Keywords:

Data Leakage, Data Structure, Decision Tree C4.5, UCS, Naive Bayes

Abstract

Data leakage is a persistent problem in public and private institutions around the world; detecting information leakage efficiently is particularly difficult. To address this problem, this paper proposes an adaptable data structure based on human behavior that uses all the activities executed within the computer system. With this structure, normal behavior is modeled for each user so that any abnormal behavior can be detected in real time. Moreover, the structure enables the application of several classification techniques, such as decision trees (C4.5), UCS, and Naive Bayes, which have proven effective in intrusion detection. To test the model, a scenario using real information from a government institution was designed to demonstrate the proposal’s effectiveness and to establish future lines of work.
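
As an illustration only, the idea of a per-user behavior structure combined with a Naive Bayes classifier can be sketched as follows. This is not the structure or the implementation used in the paper: the class name UserProfiles, its methods, the activity labels, and the rule "flag a session when the most likely user differs from the claimed user" are all assumptions made for this sketch. The structure is "dynamic" only in the simple sense that new users and new activity types are added as they are observed.

```python
import math
from collections import Counter, defaultdict


class UserProfiles:
    """Hypothetical per-user activity profile with a Naive Bayes scorer."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha                  # Laplace smoothing constant
        self.counts = defaultdict(Counter)  # user -> activity -> count
        self.sessions = Counter()           # user -> number of sessions seen
        self.vocabulary = set()             # every activity type seen so far

    def update(self, user, activities):
        """Add one observed session to the user's normal-behavior profile."""
        self.counts[user].update(activities)
        self.sessions[user] += 1
        self.vocabulary.update(activities)

    def log_score(self, user, activities):
        """Log prior plus smoothed log likelihood of the session for a known user."""
        total = sum(self.counts[user].values())
        vocab = len(self.vocabulary) or 1
        score = math.log(self.sessions[user] / sum(self.sessions.values()))
        for activity in activities:
            p = (self.counts[user][activity] + self.alpha) / (total + self.alpha * vocab)
            score += math.log(p)
        return score

    def most_likely_user(self, activities):
        """Return the profiled user whose behavior best explains the session."""
        return max(self.sessions, key=lambda u: self.log_score(u, activities))

    def is_abnormal(self, claimed_user, activities):
        """Flag the session when it fits another user's profile better."""
        return self.most_likely_user(activities) != claimed_user
```

In use, update would be called with each user's historical sessions, and is_abnormal would then be called with the claimed user and the activities of a new session. The sketch requires at least one profiled user before scoring; a real system would also need a threshold on the score itself, since comparing users against each other cannot detect behavior that is unusual for everyone.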

Author Biography

Cesar Byron Guevara Maldonado, UNIVERSIDAD COMPLUTENSE DE MADRID

Master's degree in Computer Science Research. Doctoral student in Computer Engineering.

Published

2015-01-05

How to Cite

Guevara Maldonado, C. B. (2015). Data Leakage Detection Using Dynamic Data Structure and Classification Techniques. INGE CUC, 11(1), 79–84. Retrieved from https://ojstest.certika.co/ingecuc/article/view/382
