Cyberbullying Detection - UKSim

Cyberbullying Detection: A Survey on Multilingual Techniques

Authors:
Batoul Haidar, PhD Student, Saint Joseph University, Beirut, Lebanon
Maroun Chamoun, Professor, Saint Joseph University, Beirut, Lebanon
Fadi Yamout, Associate Professor, Lebanese International University, Beirut, Lebanon

This presentation will:
- Give a thorough background on cyberbullying detection and its underlying techniques.
- Present a survey of the existing literature on multilingual cyberbullying detection techniques.
- Present future plans for multilingual cyberbullying detection.

I. Introduction

Presence of Cyberbullying
- Cyberbullying is a new form of bullying, carried out through the Internet and electronic media.
- Cyberbullying affects many children around the world, including in Arab countries.
- Awareness of cyberbullying is rising around the world.
- Cyberbullying detection research has covered several languages (English, Dutch, Indian, Chinese), but none has addressed Arabic.

Figure: Cyberbullying Among Youth, percentage of teens reporting being bullied, by country (America, Morocco, Lebanon, Jordan, UAE).

II. Background
A. Cyberbullying

Cyberbullying Definition
The use of the Internet, cell phones, video game systems, or other technologies to send or post text or images intended to hurt or embarrass another person or group of people.

Cyberbullying is more severe than physical bullying because it reaches a wider audience, it is public, and the victim has nowhere to escape.

Figure: A predator (bully) attacking a victim.

Categories
- Flaming: starting an online fight.
- Masquerade: a bully pretending to be someone else with malicious intent.
- Denigration: sending or posting gossip to ruin someone's reputation.
- Impersonation: pretending to be someone else and sharing material to get that person into trouble or danger, or to damage their reputation or friendships.
- Harassment: repeatedly sending profane and cruel messages.
- Outing: publishing someone's embarrassing information, images, or secrets.
- Trickery: talking someone into revealing secrets or embarrassing information in order to share them.
- Exclusion: intentionally and cruelly excluding someone from an online group.
- Cyberstalking: repeated, intense harassment and denigration that includes threats or creates significant fear.

Consequences
On the victim:
- Mental and physical effects.
- Emotional, concentration, and behavioral issues.
- Trouble getting along with peers.
- 1 out of 4 felt unsafe at school.
- Frequent headaches, recurrent stomach pain, and sleeping difficulties.
- Might lead to suicide.

On the predator:
- Mental and physical effects.
- Online predators tend to become actual predators outside cyberspace.
- More likely to be hyperactive, have conduct problems, abuse alcohol, and smoke cigarettes.

II. Background
B. Machine Learning

Machine Learning Definition
Machine Learning (ML) is the ability of a computer to teach itself how to make a decision using available data and experience.
- The available data is known as training data.
- The computer classifies a new piece of data using a learning algorithm.

Learning Algorithms: Data Labelling
- Supervised learning: the training data is labeled (classified by human experts).
- Unsupervised learning: the training data is unlabeled.
- Semi-supervised learning: supervised and unsupervised learning are combined, using both labeled and unlabeled data to get the most out of both.

Learning Algorithms: Tasks
- Binary classifier: classifies an object as belonging or not belonging to a certain category, e.g. email filtering (spam / not spam).
- Multi-class classifier: matches an object against several classes or categories.
- Regression: predicts a value for an object, e.g. the priority level of an incoming email.

Available ML Algorithms
Naive Bayes
- Probabilistic supervised learning method.
- Calculates the probability of an item belonging to a certain class.
- Was used for sexual predator identification.
Nearest Neighbor Estimators
- A simple estimator.
- Uses the distance between data instances to map an instance to its closest neighbor.
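The two estimators above can be sketched in plain Python for short text messages. This is a minimal illustration, not the method of any surveyed system. It assumes bag-of-words features built by simple whitespace tokenization, add-one (Laplace) smoothing for Naive Bayes, and Euclidean distance with a majority vote over k = 3 neighbours for k-NN. The names `MultinomialNaiveBayes` and `knn_predict` and the four toy training messages are hypothetical.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Lowercase and split on whitespace; real posts need proper tokenization.
    return text.lower().split()


class MultinomialNaiveBayes:
    """Multinomial Naive Bayes over word counts, with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)          # documents per class
        self.word_counts = defaultdict(Counter)      # class -> word -> count
        self.vocab = set()
        for doc, label in zip(docs, labels):
            tokens = tokenize(doc)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        self.total_words = {c: sum(self.word_counts[c].values()) for c in self.class_counts}
        return self

    def log_score(self, doc, label):
        # log P(class) + sum of log P(word | class); words unseen in training are skipped.
        n_docs = sum(self.class_counts.values())
        score = math.log(self.class_counts[label] / n_docs)
        denom = self.total_words[label] + len(self.vocab)
        for token in tokenize(doc):
            if token in self.vocab:
                score += math.log((self.word_counts[label][token] + 1) / denom)
        return score

    def predict(self, doc):
        # Pick the class with the highest (unnormalized) log posterior.
        return max(self.class_counts, key=lambda c: self.log_score(doc, c))


def knn_predict(train_docs, train_labels, doc, k=3):
    """k-nearest-neighbour majority vote using Euclidean distance on word counts."""
    query = Counter(tokenize(doc))

    def distance(a, b):
        # Counter returns 0 for missing words, so iterate over the union of keys.
        return math.sqrt(sum((a[t] - b[t]) ** 2 for t in set(a) | set(b)))

    neighbours = sorted(
        (distance(query, Counter(tokenize(d))), label)
        for d, label in zip(train_docs, train_labels)
    )
    votes = Counter(label for _, label in neighbours[:k])
    return votes.most_common(1)[0][0]


train_docs = ["you are stupid and ugly", "nobody likes you loser",
              "great game last night", "see you at school tomorrow"]
train_labels = ["bullying", "bullying", "neutral", "neutral"]

nb = MultinomialNaiveBayes().fit(train_docs, train_labels)
print(nb.predict("you are a loser"))                              # bullying
print(knn_predict(train_docs, train_labels, "you are a loser"))   # bullying
```

Naive Bayes only needs word counts per class, which is why it trains quickly, while k-NN stores every training message and compares the new message against all of them at prediction time.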

Available ML Algorithms (Cont.)
Support Vector Machine (SVM)
- Supervised algorithm.
- A binary classifier.
- In its basic (hard-margin) form, assumes the data samples of the two classes are clearly separable.
- Tries to find an optimal hyperplane that maximizes the margin between classes.

Decision Tree

- Supervised learner.
- Classifies data using a divide-and-conquer approach.
- One implementation is the C4.5 algorithm.
- Was used by Santos et al. and Reynolds.

II. Background
C. Natural Language Processing

NLP Definition
- Linguistics + Artificial Intelligence + Computer Science.
- Used to make computers capable of understanding the natural, unprocessed language spoken between humans.
- Extracts grammatical structure and meaning from input.
- NLP areas include: acoustic, phonetic, morphological, syntactic, semantic, and pragmatic.

II. Background
D. Performance Measures

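Performance Measures Definition
- Evaluation metrics were first used in Information Retrieval (IR), then extended to other computer science fields such as ML.

Measures Available
Recall
- Proportion of relevant documents (or correct values) that are returned, out of all relevant documents, returned and not returned: $\text{Recall} = \frac{|R_l \cap R_t|}{|R_l|}$, where $R_l$ is the set of relevant documents and $R_t$ is the set of returned documents.
- Also known as the sensitivity of a system.
Precision
- Proportion of returned documents (or values) that are relevant (or correct): $\text{Precision} = \frac{|R_l \cap R_t|}{|R_t|}$.
- Also known as positive predictive value; it is not the same as accuracy, which also counts correctly rejected items.

F-Measure
- Proposed by van Rijsbergen in 1979.
- Weighted harmonic mean of precision and recall: $F_\beta = \frac{(1+\beta^2)\,P\,R}{\beta^2 P + R}$, where $P$ is precision and $R$ is recall.
- Combines the two measures into a single score, which helps because precision and recall are often in tension (raising one tends to lower the other).
- F1: special case of the F-measure with $\beta = 1$, i.e. $F_1 = \frac{2PR}{P+R}$.

III. Previous Work
A. Cyberbullying Detection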

Methods of Detection
Filtration Methods
- Have to be employed by social networking platforms to automatically delete or shade profane words.
- Limited by their inability to detect subtle language harassment.
- Have to be manually installed.
Automatic Detection
- Uses Machine Learning and other techniques.

The rest of the previous work discussed here concerns automatic detection.

Previous Work in Automatic Detection (Topics)

Subtle Language Detection
Dinakar et al.
- Used common sense reasoning to detect cyberbullying content.
- Built a dataset from YouTube and Formspring for training and testing.
- Feature set: unigrams, profane words, the tf-idf weighting scheme, the Ortony lexicon for negative affect, part-of-speech tags for commonly occurring bigrams, and label-specific features.

SVM
Yin et al.
- Used tf-idf for local features.
Dadvar et al.
- Showed that including user context (such as gender) improves detection.

Bullying on Social Networks
Santos et al.
- Detected and associated fake (troll) profiles on Twitter.
Bayzick, Kontostathis and Edwards
- Proposed the BULLYTRACER software.
- Detected cyberbullying in chat rooms 58.63% of the time.
Chen et al.
- Proposed a Lexical Syntactic Feature-based approach to detect harassment in online posts.
- Used semantic analysis and NLP techniques.

Fuzzy Logic and Genetic Algorithms
Nandhini and Sheeba
- Proposed a new system using these two methods.
- Achieved better accuracy, F1-measure and recall than previous fuzzy methods.

Previous Work in Automatic Detection (Researches)
Nahar, Li and Pang
- Used the tf-idf weighting scheme to build features.
- Built a network of victims and predators.
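Nahar, Li and Pang (like Dinakar et al. and Yin et al.) weight features with tf-idf. A minimal sketch of one common variant is shown below, assuming term frequency normalized by message length and idf(t) = log(N / df(t)); other weighting variants exist, and this is not claimed to be the exact formula those authors used. The function name `tf_idf` and the toy messages are hypothetical.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per document.

    tf(t, d) = count of t in d / number of tokens in d
    idf(t)   = log(N / df(t)), where df(t) is the number of documents containing t
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))  # count each term at most once per document

    weights = []
    for tokens in tokenized:
        if not tokens:          # empty message: no features
            weights.append({})
            continue
        counts = Counter(tokens)
        weights.append({
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


docs = ["you are a loser", "you are my friend", "you are so kind"]
for w in tf_idf(docs):
    print({term: round(value, 3) for term, value in w.items()})
```

Terms that occur in every message ("you", "are") get weight 0, while terms that occur in only one message ("loser") get the highest weight, which is what makes tf-idf useful for highlighting distinctive, possibly offensive words.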

Chavan and Shylaja
- Improved the performance of cyberbullying detection by looking at comments from peers.
- Used supervised ML and logistic regression.
- Did not detect sarcasm.
Hosseinmardi et al.
- Distinguished between cyberbullying and cyber aggression.
- Showed that a linear SVM improves classification, reaching 87%.
- Used features other than text (images) for better detection.
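Several of the surveyed systems (Dinakar et al., Hosseinmardi et al.) rely on linear SVMs. As a minimal sketch of what training one involves, the code below uses the Pegasos stochastic sub-gradient method on the L2-regularized hinge loss. Pegasos is one standard training method, chosen here for brevity, and not necessarily the solver used in those papers. Feature vectors are assumed to be plain lists of floats (for example, tf-idf weights) with labels +1/-1; the function names, hyperparameters (`lam`, `epochs`) and toy points are assumptions for illustration.

```python
import random


def train_linear_svm(X, y, lam=0.01, epochs=200, seed=0):
    """Pegasos-style training of a linear SVM.

    X: list of feature vectors (lists of floats); y: labels in {-1, +1}.
    A constant 1.0 feature is appended so the last weight acts as a bias
    (for simplicity, this bias is regularized like the other weights).
    """
    rng = random.Random(seed)
    data = [(list(x) + [1.0], label) for x, label in zip(X, y)]
    w = [0.0] * len(data[0][0])
    t = 0
    for _ in range(epochs):
        rng.shuffle(data)
        for x, label in data:
            t += 1
            eta = 1.0 / (lam * t)                     # decreasing step size
            margin = label * sum(wi * xi for wi, xi in zip(w, x))
            w = [(1.0 - eta * lam) * wi for wi in w]  # regularizer shrinks w
            if margin < 1.0:                          # hinge loss is active
                w = [wi + eta * label * xi for wi, xi in zip(w, x)]
    return w


def svm_predict(w, x):
    # The sign of the decision function w . [x, 1] gives the class.
    score = sum(wi * xi for wi, xi in zip(w, list(x) + [1.0]))
    return 1 if score >= 0 else -1


X = [[2.0, 2.0], [3.0, 3.0], [2.0, 3.0], [-2.0, -2.0], [-3.0, -2.0], [-2.0, -3.0]]
y = [1, 1, 1, -1, -1, -1]
w = train_linear_svm(X, y)
print([svm_predict(w, x) for x in X])
```

The regularization strength `lam` trades margin width against training errors: smaller values fit the training data more tightly, larger values favour a wider margin.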

Potha and Maragoudakis
- Used a window of time.
- Time series model, with SVM for feature selection.
- Singular Value Decomposition (SVD) for feature reduction.
- Dynamic Time Warping (DTW) for matching time series collections.

III. Previous Work
B. Arabic Language

Arabic Language Characteristics
- Complex morphological nature.
- Written in a script that is read and written from right to left.
- Consists of 28 letters.
- Diacritics represent (short) vowels.
- Arabic diglossia: Classical Arabic, Modern Standard Arabic (MSA), dialects, and Arabizi (or Arabish), i.e. Arabic written with Latin characters and digits.

Key Phrase Extraction
Ghaleb Ali and Omar
- Used Machine Learning: SVM, Linear Logistic Regression and Linear Discriminant Analysis.
- Showed that SVM was the best of the three algorithms for key phrase extraction.

Arabic Named Entity Extraction
Shaalan et al.

- Proposed Named Entity Recognition for Arabic (NERA).
- Achieved satisfactory performance: recall 86.3%, precision 89.2% and F1 87.7%.

Spam on Emails
El-Halees
- Worked on pure English, pure Arabic and mixed collections of emails.
- Used several ML techniques, including SVM, NB, k-Nearest Neighbor (k-NN) and Neural Networks.
- Showed that SVM performed better on English.
- Showed that stemming improves classification for Arabic.

Sentiment Analysis on Social Networks
Hamouda: Arabic Facebook comments
- Used SVM, NB and Decision Trees for classification.
- Best performance achieved by SVM: 73.4%.
Duwairi et al.: Arabic tweets
- Handled dialects.
- Used NB, SVM and k-NN; the best accuracy came from NB.
Duwairi et al.: Arabizi
- Converted Arabizi to Arabic first.
- Applied SVM and NB; SVM outperformed NB.

Stemming
- Khoja's stemmer and light stemmers.
- Gadri and Moussaoui developed a multilingual stemmer.

IV. Future Work
The Vision
- The plan is to use NLP and ML to build a system that detects cyberbullying written in Arabic, Arabizi or English.
- The system will build on previous work in Arabic and English NLP to process the data.
- The data will consist of tweets and Facebook comments from the Middle East region.
- It will be used to train and test ML classifiers.
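For the Arabic part of the planned preprocessing, a light stemmer of the kind surveyed above could look like the sketch below. Unlike Khoja's stemmer, which extracts roots, light stemming only strips a small set of frequent prefixes and suffixes. The affix lists, the one-prefix/one-suffix rule and the minimum lengths here are simplifying assumptions loosely modeled on light stemming; they are not an exact reproduction of Larkey et al.'s or Khoja's algorithms, and `light_stem` is a hypothetical name.

```python
# Illustrative affix lists (an assumption): frequent Arabic prefixes built on
# the definite article "ال", and frequent suffixes. Longest prefixes first.
PREFIXES = ["بال", "كال", "فال", "لل", "ال"]
SUFFIXES = ["ها", "ان", "ات", "ون", "ين", "يه", "ية", "ه", "ة", "ي"]


def light_stem(word, min_len=2):
    """Strip a leading 'و' ("and"), then at most one prefix and one suffix,
    never leaving fewer than min_len letters. No root extraction is done."""
    if word.startswith("و") and len(word) - 1 >= 3:
        word = word[1:]
    for prefix in PREFIXES:
        if word.startswith(prefix) and len(word) - len(prefix) >= min_len:
            word = word[len(prefix):]
            break
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= min_len:
            word = word[:-len(suffix)]
            break
    return word


for w in ["والكتاب", "المعلمون", "كتابها"]:
    print(w, "->", light_stem(w))
```

On these sample words, والكتاب ("and the book") and كتابها ("her book") both reduce to كتاب ("book"), and المعلمون ("the teachers") reduces to معلم ("teacher"). Because no morphological analysis is done, a light stemmer of this kind will also strip letters that are part of the word itself, e.g. words that genuinely begin with "و".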

References
[1] K. Poels, A. DeSmet, K. Van Cleemput, S. Bastiaensens, H. Vandebosch and I. De Bourdeaudhuij, "Cyberbullying on social network sites. An experimental study into bystanders," Computers in Human Behavior, vol. 31, pp. 259-271, 2014.
[2] S. S. Kazarian and J. Ammar, "School Bullying in the Arab World: A Review," The Arab Journal of Psychiatry, vol. 24, no. 1, pp. 37-45, 2013.
[3] ICDL, "Cyber Safety Report: Research into the online behaviour of Arab youth and the risks they face," ICDL Arabia, 2015.
[4] K. Dinakar, B. Jones, C. Havasi, H. Lieberman and R. Picard, "Common Sense Reasoning for Detection, Prevention, and Mitigation of Cyberbullying," in ACM Transactions on Interactive Intelligent Systems, NY, September 2012.
[5] Office for Victims of Crime, National Crime Prevention Council, "Cyberbullying Tip Sheets," National Crime Prevention Council, 2016. [Online]. Available: http://www.ncpc.org/topics/cyberbullying/cyberbullying-tip-sheets/. [Accessed 10 June 2016].
[6] N. Willard, "Educator's Guide to Cyberbullying and Cyberthreats," Center for Safe and Responsible Internet Use, 2007.
[7] N. Samaneh, A. Masrah, M. Azmi, M. S. Nurfadhilna, A. Mustapha and S. Shojaee, "A Review of Cyberbullying Detection: An Overview," in 13th International Conference on Intelligent Systems Design and Applications (ISDA), 2013.

[8] D. Mann, "Emotional Troubles for 'Cyberbullies' and Victims," WebMD Health News, 6 July 2010. [Online]. Available: http://www.webmd.com/parenting/news/20100706/emotional-troubles-for-cyberbullies-and-victims. [Accessed 24 August 2015].
[9] T. M. Mitchell, "The Discipline of Machine Learning," CMU-ML-06-108, Pittsburgh, July 2006.
[10] P. Kulkarni, Reinforcement and Systemic Machine Learning for Decision Making, New Jersey: IEEE, Wiley, 2012.
[11] P. Flach, Machine Learning: The Art and Science of Algorithms that Make Sense of Data, Cambridge University Press, 2012.
[12] D. Vilariño, C. Esteban, D. Pinto, I. Olmos and S. León, "Information Retrieval and Classification based Approaches for the Sexual Predator Identification," Faculty of Computer Science, Mexico.
[13] H. José María Gómez and A. A. Caurcel Diaz, "Combining Predation Heuristics and Chat-Like Features in Sexual Predator Identification," 2012.
[14] A. Smola and S. Vishwanathan, Introduction to Machine Learning, Cambridge: Cambridge University Press, 2008.
[15] I. Santos, P. G. Bringas, P. Galan-García and J. Gaviria de la Puerta, "Supervised Machine Learning for the Detection of Troll Profiles in Twitter Social Network: Application to a Real Case of Cyberbullying," DeustoTech Computing, University of D
[16] I.-S. Kang, C.-K. Kim, S.-J. Kang and S.-H. Na, IR-based k-Nearest Neighbor Approach for Identifying Abnormal Chat Users, 2012.
[17] C. Morris and G. Hirst, Identifying Sexual Predators by SVM Classification with Lexical and Behavioral Features, 2012.

[18] J. Parapar, D. E. Losada and A. Barreiro, "A learning-based approach for the identification of sexual predators in chat logs," 2012.
[19] R. Kohavi and R. Quinlan, "Decision Tree Discovery," 1999.
[20] K. Reynolds, "Using Machine Learning to Detect Cyberbullying," 2012.
[21] S. Ahmad, "Tutorial on Natural Language Processing," Artificial Intelligence (810:161), Fall 2007.
[22] V. Gupta, "A Survey of Natural Language Processing Techniques," vol. 5, 01 Jan 2014.
[23] B. Manaris, "Natural Language Processing: A Human-Computer Interaction Perspective," vol. 47, pp. 1-66, 1998.
[24] E. Cambria and B. White, "Jumping NLP Curves: A Review of Natural Language Processing Research," IEEE Computational Intelligence Magazine, May 2014.
[25] C. Surabhi.M, "Natural Language Processing Future," in International Conference on Optical Imaging Sensor and Security, Coimbatore, Tamil Nadu, India, July 2-3, 2013.
[26] G. G. Chowdhury, "Natural Language Processing," Annual Review of Information Science and Technology, vol. 37, no. 0066-4200, pp. 51-89, 2003.
[27] E. Cambria, Application of Common Sense Computing for the Development of a Novel Knowledge-Based Opinion Mining Engine, University of Stirling, Scotland, UK, 2011.

[28] M. Grassi, E. Cambria, A. Hussain and F. Piazza, "Sentic Web: A New Paradigm for Managing Social Media Affective Information," Cogn Comput (2011) 3:480-489.
[29] W. E. Webber, Measurement in Information Retrieval Evaluation (Doctor of Philosophy), The University of Melbourne, September 2010.
[30] C. J. van Rijsbergen, Information Retrieval, University of Glasgow.
[31] N. Chinchor, "MUC-4 Evaluation Metrics," in Fourth Message Understanding Conference, 1992.
[32] Y. Sasaki, "The truth of the F-measure," University of Manchester, 26 October 2007.
[33] "Arabic chat alphabet," 23 May 2016. [Online]. Available: https://en.wikipedia.org/wiki/Arabic_chat_alphabet. [Accessed 2 June 2016].
[34] WatchGuard, "Stop Cyber-Bullying in its Tracks - Protect Schools and the Workplace," WatchGuard Technologies, 2011.
[35] [Online]. Available: https://blog.barracuda.com/2015/02/16/3-ways-the-barracuda-web-filter-can-protect-your-classroom-from-cyberbullying/.

[36] "Internet Monitoring and Web Filtering Solutions," Pearl Software, 2015. [Online]. Available: http://www.pearlsoftware.com/solutions/cyberbullying-in-schools.html. [Accessed 2 June 2016].
[37] V. Nahar, X. Li and C. Pang, "An Effective Approach for Cyberbullying Detection," in Communications in Information Science and Management Engineering, May 2013.
[38] "Perverted Justice," Perverted Justice Foundation. [Online]. Available: http://www.perverted-justice.com/.
[39] "Amazon Mechanical Turk," 15 August 2014. [Online]. Available: http://docs.aws.amazon.com/AWSMechTurk/latest/AWSMechanicalTurkGettingStartedGuide/SvcIntro.html. [Accessed 2 June 2016].
[40] S. Garner, "Weka: The Waikato environment for knowledge analysis," New Zealand, 1995.
[41] "tf-idf: A Single-Page Tutorial," [Online]. Available: http://www.tfidf.com. [Accessed 13 May 2016].
[42] K. Dinakar, R. Reichart and H. Lieberman, "Modeling the Detection of Textual Cyberbullying," Cambridge, 2011.
[43] V. S. Chavan and Shylaja S S, "Machine Learning Approach for Detection of Cyber-Aggressive Comments by Peers on Social Media Network," in International Conference on Advances in Computing, Communications and Informatics (ICACCI), 2
[44] M. Dadvar, D. Trieschnigg, R. Ordelman and F. De Jong, "Improving cyberbullying detection with user context," 2013.

[45] D. Yin, Z. Xue, L. Hong, B. D. Davidson, A. Kontostathis and L. Edwards, "Detection of Harassment on Web 2.0," Madrid, Spain, April 21, 2009.
[46] J. Bayzick, A. Kontostathis and L. Edwards, "Detecting the Presence of Cyberbullying Using Computer Software," Koblenz, Germany, June 14-17, 2011.
[47] Y. Chen, S. Zhu, Y. Zhou and H. Xu, "Detecting Offensive Language in Social Media to Protect Adolescent Online Safety," 2012.
[48] Z. Xu and S. Zhu, "Filtering Offensive Language in Online Communities using Grammatical Relations," Redmond, Washington, US, July 13-14, 2010.
[49] H. Hosseinmardi, S. Arredondo Mattson, R. Ibn Rafiq, R. Han, Q. Lv and S. Mishra, "Detection of Cyberbullying Incidents on the Instagram Social Network," 2015.
[50] N. Potha and M. Maragoudakis, "Cyberbullying Detection using Time Series Modeling," 2014.
[51] K. Baker, "Singular Value Decomposition Tutorial," 2013.
[52] M. Müller, "Dynamic Time Warping," in Information Retrieval for Music and Motion, Springer, 2007, pp. 69-84.
[53] B. Nandhini and J. Sheeba, "Online Social Network Bullying Detection Using Intelligence Techniques," 2015.
[54] M. A. Attia, Handling Arabic Morphological and Syntactic Ambiguity within the LFG Framework with a View to Machine Translation, Doctor of Philosophy in the Faculty of Humanities, 2008.

[55] K. Darwish and W. Magdy, "Arabic Information Retrieval," vol. 7, no. 4, 2013.
[56] A. Farghaly and K. Shaalan, "Arabic Natural Language Processing: Challenges and Solutions," vol. 8, December 2009.
[57] "12 Arabic Swear Words and Their Meanings You Didn't Know," [Online]. Available: http://scoopempire.com/swear-words-meanings-around-middle-east/#.V0fdjPl96M9. [Accessed 2 June 2016].
[58] N. Ghaleb Ali and N. Omar, "Arabic Keyphrases Extraction Using a Hybrid of Statistical and Machine Learning," in International Conference on Information Technology and Multimedia (ICIMU), Putrajaya, Malaysia, 2014.
[59] T. Haifley, "Linear Logistic Regression: An Introduction," IEEE, 2002.
[60] G. J. McLachlan, "Discriminant Analysis and Statistical Pattern Recognition," Wiley InterScience, New Jersey, 2004.
[61] K. Shaalan and H. Raza, "Arabic Named Entity Recognition from Diverse Text Types," Berlin Heidelberg, GoTAL 2008.
[62] A. El-Halees, "Filtering Spam E-Mail from Mixed Arabic and English Messages: A Comparison of Machine Learning Techniques," The International Arab Journal of Information Technology, vol. 6, no. 1, 2009.
[63] T. M. Cover and P. E. Hart, "Nearest Neighbor Pattern Classification," IEEE Transactions on Information Theory, vol. 13, no. 1, 1967.
[64] A. E.-D. A. Hamouda and F. E.-z. El-taher, "Sentiment Analyzer for Arabic Comments System," (IJACSA) International Journal of Advanced Computer Science and Applications, vol. 4, no. 3, 2013.

[65] R. M. Duwairi, R. Marji, N. Shaban and S. Rushaidat, "Sentiment Analysis in Arabic Tweets," in 5th International Conference on Information and Communication Systems (ICICS), 2014.
[66] A. Al-Zyoud and W. A. Al-Rabayah, "Arabic Stemming Techniques: Comparisons and New Vision," in Proceedings of the 8th IEEE GCC Conference and Exhibition, Muscat, Oman, 2015.
[67] S. Khoja and R. Garside, "Stemming Arabic text," Computing Department, Lancaster University, Lancaster, UK, 1999.
[68] L. S. Larkey, L. Ballesteros and M. E. Connell, "Light Stemming for Arabic Information Retrieval," in Arabic Computational Morphology, book chapter, Springer, 2007.
[69] S. Gadri and A. Moussaoui, "Information Retrieval: A New Multilingual Stemmer Based on a Statistical Approach," in 3rd International Conference on Control, Engineering & Information Technology (CEIT), 2015.
[70] Hewlett-Packard Development Company, L.P., 2013. [Online]. Available: http://www.autonomy.com/html/power/idol-10.5/index.html. [Accessed 2 June 2016].
