Data Leakage Detection Using Dynamic Data Structure and Classification Techniques
Keywords:
Data Leakage, Data Structure, Decision Tree C4.5, UCS, Naive Bayes
Abstract
Data leakage is a persistent problem for public and private institutions around the world, particularly when it comes to identifying information leakage efficiently. To address this problem, this paper proposes an adaptable data structure based on human behavior that uses all the activities executed within the computer system. With this structure, the normal behavior of each user is modeled, so that any abnormal behavior can be detected in real time. Moreover, the structure enables the application of several classification techniques, such as decision trees (C4.5), UCS, and Naive Bayes, which have produced efficient results in intrusion detection. To test the model, a scenario using real information from a government institution was designed; it demonstrates the effectiveness of the proposal and serves to establish future lines of work.
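The abstract does not describe the internal layout of the data structure or how the classifiers are configured. The following Python sketch is a hedged illustration of how per-user activity records could feed one of the named techniques. It trains a categorical Naive Bayes classifier with Laplace smoothing on labeled activity events for a single user.

Everything specific in the sketch is a hypothetical assumption, not the authors' method:
- the event features (activity, time of day, destination);
- the labels `normal` and `abnormal`;
- the class name `CategoricalNaiveBayes`;
- the `profiles` dictionary.

```python
import math
from collections import Counter, defaultdict


class CategoricalNaiveBayes:
    """Hypothetical minimal categorical Naive Bayes with Laplace smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha
        self.class_counts = Counter()
        # feature_counts[label][i][value]: how often `value` appeared at feature i for `label`
        self.feature_counts = defaultdict(lambda: defaultdict(Counter))
        # distinct values seen at each feature index (across all labels)
        self.feature_values = defaultdict(set)

    def fit(self, rows, labels):
        for row, label in zip(rows, labels):
            self.class_counts[label] += 1
            for i, value in enumerate(row):
                self.feature_counts[label][i][value] += 1
                self.feature_values[i].add(value)
        return self

    def log_posterior(self, row, label):
        """Unnormalized log P(label) + sum_i log P(row[i] | label)."""
        total = sum(self.class_counts.values())
        score = math.log(self.class_counts[label] / total)
        for i, value in enumerate(row):
            # +1 reserves probability mass for values never seen in training
            n_values = len(self.feature_values[i]) + 1
            count = self.feature_counts[label][i][value]  # Counter returns 0 if unseen
            score += math.log(
                (count + self.alpha)
                / (self.class_counts[label] + self.alpha * n_values)
            )
        return score

    def predict(self, row):
        return max(self.class_counts, key=lambda c: self.log_posterior(row, c))


# Hypothetical labeled events for one user: (activity, time_of_day, destination)
events = [
    ("open_file", "office_hours", "local"),
    ("send_email", "office_hours", "internal"),
    ("open_file", "office_hours", "local"),
    ("print", "office_hours", "local"),
    ("copy_file", "night", "usb"),
    ("send_email", "night", "external"),
]
labels = ["normal", "normal", "normal", "normal", "abnormal", "abnormal"]

# One model per user, keyed by a hypothetical user identifier
profiles = {"user_a": CategoricalNaiveBayes().fit(events, labels)}

print(profiles["user_a"].predict(("copy_file", "night", "usb")))           # → abnormal
print(profiles["user_a"].predict(("open_file", "office_hours", "local")))  # → normal
```

Keeping a separate model per user loosely follows the abstract's idea of modeling each user's normal behavior individually. The smoothing term keeps an activity value that a user has never produced from driving the whole posterior to zero.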
License
Published papers are the exclusive responsibility of their authors and do not necessarily reflect the opinions of the editorial committee.
INGE CUC Journal respects the moral rights of its authors, who must cede the patrimonial rights of the published material to the editorial committee. In turn, the authors declare that the work is unpublished and has not been published elsewhere.
All articles are licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.