Exploration of Obesity Status of Indonesia Basic Health Research 2013 With Synthetic Minority Over-Sampling Techniques

Eksplorasi Status Obesitas Riset Kesehatan Dasar 2013 Indonesia dengan Teknik Synthetic Minority Over-Sampling

Authors

  • Sri Astuti Thamrin Department of Statistics, Universitas Hasanuddin, Indonesia
  • Dian Sidik Department of Epidemiology, Universitas Hasanuddin, Indonesia
  • Hedi Kuswanto Department of Statistics, Universitas Hasanuddin, Indonesia
  • Armin Lawi Department of Mathematics, Universitas Hasanuddin, Indonesia
  • Ansariadi Department of Epidemiology, Universitas Hasanuddin, Indonesia

DOI:

https://doi.org/10.29244/ijsa.v5i1p75-91

Keywords:

Imbalanced data, machine learning, obesity status, SMOTE

Abstract

The accuracy of class labels is very important in classification with a machine learning approach: the more accurate the data sets and their classes, the better the output that machine learning produces. In practice, however, classification often faces imbalanced class data, in which the classes do not make up equal shares of the data set. Such imbalance affects classification accuracy, and one of the simplest ways to correct it is to rebalance the classes. This study explores the problem of class imbalance in a medium-scale dataset and addresses that imbalance. The Synthetic Minority Over-Sampling Technique (SMOTE) is used to overcome the class imbalance in obesity status in the Indonesia 2013 Basic Health Research (RISKESDAS). The results show that the obese class makes up 13.9% of the data and the non-obese class 84.6%, which indicates a moderate degree of class imbalance. SMOTE with 600% over-sampling increases the size of the minority (obese) class, and as a consequence the obesity-status classes become balanced. Therefore, the approach with SMOTE performed better than the approach without SMOTE in exploring the obesity status of Indonesia RISKESDAS 2013.
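The SMOTE step described in the abstract can be sketched as follows. This is a minimal, standard-library-only illustration, not the authors' code: it assumes purely numeric features and Euclidean distance, and follows the interpolation scheme of Chawla et al. (2002), where an over-sampling rate of N% (a multiple of 100) yields N/100 synthetic records per minority record, each placed at a random point on the segment between that record and one of its k nearest minority-class neighbours. The function names `smote`, `k_nearest`, and `imbalance_ratio` and the default k = 5 are hypothetical choices for illustration.

```python
import math
import random
from typing import List, Sequence

Point = List[float]


def imbalance_ratio(n_majority: float, n_minority: float) -> float:
    """Majority-to-minority ratio; 1.0 means the two classes are balanced."""
    return n_majority / n_minority


def k_nearest(sample: Sequence[float], pool: Sequence[Sequence[float]], k: int) -> List[Sequence[float]]:
    """Return the k points of `pool` closest to `sample` (Euclidean), excluding `sample` itself."""
    others = [p for p in pool if p is not sample]
    others.sort(key=lambda p: math.dist(sample, p))
    return others[:k]


def smote(minority: Sequence[Sequence[float]], percent: int = 600, k: int = 5, seed: int = 0) -> List[Point]:
    """Hypothetical SMOTE sketch: percent/100 synthetic points per minority point.

    Assumes numeric features only; `percent` must be a positive multiple of 100.
    """
    if percent <= 0 or percent % 100 != 0:
        raise ValueError("this sketch only handles positive multiples of 100")
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    per_sample = percent // 100
    synthetic: List[Point] = []
    for x in minority:
        neighbours = k_nearest(x, minority, k)
        for _ in range(per_sample):
            nn = rng.choice(neighbours)   # pick one of the k nearest minority neighbours
            gap = rng.random()            # uniform in [0, 1)
            # New point lies on the segment between x and nn.
            synthetic.append([xi + gap * (ni - xi) for xi, ni in zip(x, nn)])
    return synthetic


if __name__ == "__main__":
    # Toy minority class (hypothetical data, not RISKESDAS records).
    toy_minority = [(float(i), float(i % 3)) for i in range(10)]
    new_points = smote(toy_minority, percent=600, k=5, seed=42)
    print(len(toy_minority), "original +", len(new_points), "synthetic")
    # Ratio for the class shares reported in the abstract.
    print("imbalance ratio:", round(imbalance_ratio(84.6, 13.9), 2))
```

With 600% over-sampling each minority record contributes six synthetic records, so the minority class grows to seven times its original size. Applied naively to the reported split (13.9% obese versus 84.6% non-obese, a majority-to-minority ratio of about 6.1), that brings the obese class to a size comparable to the non-obese class. In practice a library implementation such as imbalanced-learn's `SMOTE` (whose `k_neighbors` and `sampling_strategy` parameters control the neighbourhood size and target class ratio) would be used, and survey data with categorical variables would call for a variant such as `SMOTENC` or a categorical distance like the one of Cost and Salzberg (1993) rather than the plain Euclidean distance assumed here.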

References

Alghamdi, M., Al-Mallah, M., Keteyian, S., Brawner, C., Ehrman, J., & Sakr, S. (2017). Predicting diabetes mellitus using SMOTE and ensemble machine learning approach: The Henry Ford ExercIse Testing (FIT) project. PLOS ONE, 12(7), e0179805.

ASEAN / UNICEF / WHO Regional Report. (2016). World health statistics 2016: Monitoring health for the SDGs, sustainable development goals.

Branco, P., Torgo, L., & Ribeiro, R. P. (2016). A Survey of Predictive Modeling on Imbalanced Domains. ACM Comput. Surv., 49(2).

Chawla, N. V. (2005). Data Mining for Imbalanced Datasets: An Overview. In O. Maimon & L. Rokach (Eds.), Data Mining and Knowledge Discovery Handbook (pp. 853–867). Springer.

Chawla, N. V., Bowyer, K. W., Hall, L. O., & Kegelmeyer, W. P. (2002). SMOTE: Synthetic Minority Over-sampling Technique. Journal of Artificial Intelligence Research, 16, 321–357.

Chen, X., & Wasikowski, M. (2008). FAST: A ROC-Based Feature Selection Metric for Small Samples and Imbalanced Data Classification Problems. Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 124–132.

Cost, S., & Salzberg, S. (1993). A Weighted Nearest Neighbor Algorithm for Learning with Symbolic Features. Machine Learning, 10(1), 57–78.

Ente, D. R., Thamrin, S. A., Arifin, S., Kuswanto, H., & Andreza, A. (2020). Klasifikasi Faktor-Faktor Penyebab Penyakit Diabetes Melitus Di Rumah Sakit Unhas Menggunakan Algoritma C4.5 [Classification of factors causing diabetes mellitus at Unhas Hospital using the C4.5 algorithm]. Indonesian Journal of Statistics and Its Applications, 4(1), 80–88.

Jo, T., & Japkowicz, N. (2004). Class Imbalances versus Small Disjuncts. SIGKDD Explor. Newsl., 6(1), 40–49.

Kementerian Kesehatan Republik Indonesia [KEMENKES RI]. (2014). Profil Kesehatan Indonesia Tahun 2013 [Indonesia Health Profile 2013].

Krawczyk, B. (2016). Learning from imbalanced data: Open challenges and future directions. Progress in Artificial Intelligence, 5(4), 221–232.

Mustaqim, M., Warsito, B., & Surarso, B. (2019). Kombinasi Synthetic Minority Over-sampling Technique (SMOTE) dan Neural Network Backpropagation untuk menangani data tidak seimbang pada prediksi pemakaian alat kontrasepsi implan [Combining SMOTE and backpropagation neural networks to handle imbalanced data in predicting implant contraceptive use]. Register: Jurnal Ilmiah Teknologi Sistem Informasi, 5(2), 116–127.

Riset Kesehatan Dasar [RISKESDAS]. (2013). Hasil Riset Kesehatan Dasar 2013 [Results of the 2013 Basic Health Research].

Selya, A. S., & Anshutz, D. (2018). Machine Learning for the Classification of Obesity from Dietary and Physical Activity Patterns. In P. Giabbanelli, V. Mago, & E. Papageorgiou (Eds.), Advanced Data Analytics in Health (pp. 77–97). Springer.

Sun, Y., Wong, A. K. C., & Kamel, M. S. (2009). Classification of Imbalanced data: A review. International Journal of Pattern Recognition and Artificial Intelligence, 23(4), 687–719.

Witten, I. H., Frank, E., & Hall, M. A. (2011). Data mining: Practical machine learning tools and techniques (3rd ed.). Elsevier Inc.

Yap, B. W., Rani, K. A., Rahman, H. A. A., Fong, S., Khairudin, Z., & Abdullah, N. N. (2014). An Application of Over-sampling, Under-sampling, Bagging and Boosting in Handling Imbalanced Datasets. In T. Herawan, M. Deris, & J. Abawajy (Eds.), Proceedings of the First International Conference on Advanced Data and Information Engineering (DaEng-2013) (pp. 13–22). Springer.

Zhang, D., Liu, W., Gong, X., & Jin, H. (2011). A Novel Improved SMOTE Resampling Algorithm Based on Fractal. Computational Information Systems, 2204–2211.

Zhu, T., Lin, Y., & Liu, Y. (2017). Synthetic Minority Over-sampling Technique for Multiclass Imbalance Problems. Pattern Recogn., 72(C), 327–340.

Published

2021-03-31

How to Cite

Thamrin, S. A., Sidik, D., Kuswanto, H., Lawi, A., & Ansariadi, A. (2021). Exploration of Obesity Status of Indonesia Basic Health Research 2013 With Synthetic Minority Over-Sampling Techniques: Eksplorasi Status Obesitas Riset Kesehatan Dasar 2013 Indonesia dengan Teknik Synthetic Minority Over-Sampling. Indonesian Journal of Statistics and Its Applications, 5(1), 75–91. https://doi.org/10.29244/ijsa.v5i1p75-91

Section

Articles