Comparative study on corpus development for Malay investment fraud detection in website

  • M.M. Din
  • N.H.H. Hashim
  • M.M. Siraj
Keywords: corpus development, information extraction, part recognition, fraud detection.


In the online world, fraudsters can easily manipulate people, usually for monetary gain. Corpus development research can be used to identify the keywords that fraudsters use online and so help prevent the crime. The aim of this research is to develop a corpus for Malay investment fraud that can be used to detect and classify investment fraud on Malay websites, and to compare techniques to find the most suitable one. In this research, a Part-of-Speech (POS) tagger and a Named Entity Recognition (NER) tagger are selected. The proposed methodology consists of corpus development, training and development of the dataset using Naïve Bayes, and performance evaluation. The dataset is drawn from online news archives and discussion forums. This research can help law enforcement agencies collect and flag the keywords used by fraudsters so that they can take legal action.
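
As a rough illustration of the Naïve Bayes training step mentioned above, the following is a minimal sketch and not the authors' implementation. It assumes a multinomial model with add-one smoothing over plain word tokens; the function names, the labels "fraud" and "legit", and the four toy documents are all hypothetical and are not taken from the study's corpus.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy corpus (illustrative only, not from the study's dataset).
TOY_DOCS = [
    ("pelaburan pulangan dijamin untung cepat".split(), "fraud"),
    ("skim pelaburan untung lumayan dijamin".split(), "fraud"),
    ("bank umum kadar faedah simpanan".split(), "legit"),
    ("laporan kewangan syarikat suku tahun".split(), "legit"),
]

def train(docs):
    """Count documents per class and word occurrences per class."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in docs:
        class_counts[label] += 1
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return {"class_counts": class_counts,
            "word_counts": word_counts,
            "vocab": vocab}

def predict(model, tokens):
    """Return the class with the highest log posterior score."""
    total_docs = sum(model["class_counts"].values())
    vocab_size = len(model["vocab"])
    best_label, best_score = None, -math.inf
    for label, n_docs in model["class_counts"].items():
        counts = model["word_counts"][label]
        total_words = sum(counts.values())
        score = math.log(n_docs / total_docs)  # log prior
        for token in tokens:
            if token in model["vocab"]:  # skip words never seen in training
                # Add-one (Laplace) smoothed log likelihood
                score += math.log((counts[token] + 1) / (total_words + vocab_size))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

model = train(TOY_DOCS)
print(predict(model, "pulangan dijamin".split()))
print(predict(model, "kadar faedah".split()))
```

In a fuller pipeline, the token features could be replaced or supplemented by POS or NER tags, and the predictions compared against held-out labels for the performance evaluation step.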

Journal Identifiers

eISSN: 1112-9867