The methodology of Data Mining. An application to alcohol consumption in teenagers
DOI: https://doi.org/10.20882/adicciones.253
Keywords: Artificial Neural Networks, Decision Trees, Naïve Bayes, Association Rules, alcohol
Abstract
This paper aims mainly to make researchers in the field of drug addiction aware of a data analysis methodology for knowledge discovery in databases (KDD). KDD is a process consisting of a series of phases, the most characteristic of which is data mining (DM), in which different modelling techniques are applied to detect patterns and relationships in the data. Common and differentiating factors among the most widely used DM techniques are analysed, mainly from a methodological viewpoint, and their use is illustrated with data on alcohol consumption in teenagers and its possible relationship with personality variables (N = 7030). Although the overall accuracy obtained (% correct predictions) is very similar across the three models analysed, the Artificial Neural Network (ANN) technique generates the most accurate model (64.1%), followed by Decision Trees (DT) (62.3%) and Naïve Bayes (NB) (59.9%).
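The overall accuracy reported in the abstract is the percentage of cases that a model classifies correctly. A minimal Python sketch of that measure follows. The observed labels and the per-model predictions are invented toy values, not the study's data, and the helper name overall_accuracy is hypothetical.

```python
def overall_accuracy(y_true, y_pred):
    """Return the percentage of cases whose predicted class matches the observed class."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one case is required")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return 100.0 * correct / len(y_true)


# Toy data invented for illustration only (NOT the study's data):
# 1 = consumer, 0 = non-consumer.
observed = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0]

# Hypothetical predictions from three already-fitted models.
predictions = {
    "ANN": [1, 0, 1, 0, 0, 0, 1, 1, 1, 0],
    "DT": [1, 0, 0, 0, 0, 0, 1, 1, 1, 0],
    "NB": [1, 1, 0, 0, 0, 0, 1, 1, 1, 0],
}

# Score every model on the same cases, then rank from most to least accurate.
accuracies = {name: overall_accuracy(observed, pred) for name, pred in predictions.items()}
ranking = sorted(accuracies, key=accuracies.get, reverse=True)

for name in ranking:
    print(f"{name}: {accuracies[name]:.1f}% correct")
```

Overall accuracy treats every misclassification alike, so models with similar scores may still differ in which class they misclassify.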


