Joko Ade Nursiyono(1*), Qorinul Huda(2)

(1) Badan Pusat Statistik Provinsi Jawa Timur
(2) Politeknik Statistika STIS
(*) Corresponding Author


As information technology advances, the protection of personal data has become a central aspect of defense and security. Protection of personal data is a human right that the state must safeguard. Data digitization is both a demand and a challenge of the information age. Personal data is protected primarily through legal instruments, namely regulations that establish a system strong enough to guard against cyber crime. Various such regulations already exist in the Indonesian legal system. Nevertheless, cases of personal data leakage among Indonesians continue to occur. The purpose of this study is to describe the condition of personal data protection in Indonesia and to analyze cases of data leaks detected in Twitter tweets in the period from July 1, 2021 to September 29, 2022. The study scraped tweets from Twitter and classified netizen responses into positive, negative, and neutral sentiments. Each sentiment class was analyzed with a word cloud to identify the topics that netizens most often discuss regarding the protection of personal data. The classification was then evaluated by comparing the accuracy of two machine learning classification algorithms, naive Bayes and random forest. The results show that in the period from July 1, 2021 to September 29, 2022, the public's response to the protection of personal data was still negative, which suggests that the data protection system in Indonesia is not yet effective, given the various cases of data leakage that have occurred. Based on accuracy, the naive Bayes algorithm classified tweets by sentiment very well, with an accuracy of 99.84%, higher than that of the random forest algorithm.


cyber crime, machine learning, naive bayes, random forest, tweet, twitter.
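
The classification and accuracy evaluation described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a multinomial naive Bayes model with Laplace smoothing and whitespace tokenization, and the example tweets and all function names are hypothetical, invented for illustration only. It says nothing about the 99.84% figure reported above.

```python
import math
from collections import Counter, defaultdict

# Hypothetical labeled tweets, invented for illustration (not the study's data).
TRAIN = [
    ("data leak again very bad", "negative"),
    ("my data was leaked terrible security", "negative"),
    ("new law protects personal data great", "positive"),
    ("good step for data protection", "positive"),
    ("government discusses data protection bill", "neutral"),
    ("bill on personal data scheduled today", "neutral"),
]


def tokenize(text):
    """Lowercase and split on whitespace (a simplifying assumption)."""
    return text.lower().split()


def train_naive_bayes(samples):
    """Count documents per label and word occurrences per label."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in samples:
        label_counts[label] += 1
        for tok in tokenize(text):
            word_counts[label][tok] += 1
            vocab.add(tok)
    return label_counts, word_counts, vocab


def predict(model, text):
    """Return the label with the highest log posterior for the text."""
    label_counts, word_counts, vocab = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(label_counts):
        # Log prior of the class.
        score = math.log(label_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for tok in tokenize(text):
            if tok not in vocab:
                continue  # ignore words never seen in training
            # Laplace (add-one) smoothed log likelihood of the word.
            score += math.log(
                (word_counts[label][tok] + 1) / (total_words + len(vocab))
            )
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def accuracy(model, samples):
    """Fraction of samples whose predicted label matches the true label."""
    correct = sum(predict(model, text) == label for text, label in samples)
    return correct / len(samples)


model = train_naive_bayes(TRAIN)
print(predict(model, "terrible leak"))
print(accuracy(model, TRAIN))
```

In practice, accuracy would be measured on held-out tweets rather than on the training set, and a random forest would be evaluated on the same split so that the two accuracy values are comparable.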

