Adaptive neuro-fuzzy system for malware detection
Malware, a class of computer programs designed to infiltrate and disrupt computing operations, is one of the security challenges faced by Internet users. Most malware detection techniques, such as signature-based, specification-based and static-based approaches, suffer from high false positive rates, low accuracy and an inability to detect zero-day and polymorphic malware. In this research work, an Adaptive Neuro-Fuzzy System for Malware Detection (ANFSMD) was proposed to address these problems. ANFSMD uses both Application Programming Interface (API) calls and operation codes to study the behaviour of Portable Executable (PE) files. The PE files were disassembled into low-level code, and the identified features were grouped for efficient detection. Five features, selected using a weighted average, were used for fuzzification. Using a bell membership function, 243 rules were generated for predicting the behaviour of the PE files. A normalization technique was used to combine the various fuzzy sets into one. The backpropagation algorithm was used for training, and the resulting output errors were used to dynamically modify the inputs for improved outcomes. ANFSMD was implemented using the Java programming language, Interactive Disassembler and MATLAB because of their support for implementing micro-programs. A total of 20,750 malware programs from the VX Heaven public dataset and 15,000 clean files from Filehippo were used for the evaluation. The results showed that the Adaptive Neuro-Fuzzy Inference System (ANFIS) achieved a detection rate of 97.96%, compared with 93.88% for Naïve Bayes, 84.78% for Random Forest and 92.87% for Support Vector Machine. The proposed method was also compared with a Control Flow Graph (CFG) technique, one of the best existing techniques that use API calls. The evaluation showed that the detection rate, false positive rate and overall accuracy for CFG were 93.9%, 9.3% and 92.4%, while the proposed method achieved 98%, 3.9% and 97%, respectively. These results show that ANFSMD can be deployed for efficient detection of all categories of malware.
Keywords: Malware, API, N-grams, ANFIS, Feature extraction.
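
The fuzzification, rule generation and normalization steps summarized in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes three linguistic terms per feature (an inference from the reported rule count, since 3^5 = 243), inputs scaled to [0, 1], a product T-norm for rule firing strength, and hypothetical bell parameters; none of these details are stated in the abstract. Only the generalized bell function itself, 1 / (1 + |(x - c) / a|^(2b)), is standard.

```python
import itertools
import math


def bell(x, a, b, c):
    """Generalized bell membership function: 1 / (1 + |(x - c) / a|^(2b))."""
    return 1.0 / (1.0 + abs((x - c) / a) ** (2 * b))


# Assumption: three linguistic terms per feature, features scaled to [0, 1].
# The (a, b, c) values below are hypothetical, chosen only for illustration.
TERMS = {
    "low": (0.25, 2.0, 0.0),
    "medium": (0.25, 2.0, 0.5),
    "high": (0.25, 2.0, 1.0),
}
N_FEATURES = 5  # the abstract reports five selected features


def fuzzify(features):
    """Map each crisp feature value to its membership degree in every term."""
    return [{name: bell(x, *params) for name, params in TERMS.items()}
            for x in features]


def normalized_firing_strengths(features):
    """Enumerate all rules and return them with normalized firing strengths."""
    memberships = fuzzify(features)
    # One rule per combination of terms across the five features: 3^5 = 243.
    rules = list(itertools.product(TERMS, repeat=N_FEATURES))
    # Assumption: product T-norm combines the antecedents of each rule.
    strengths = [
        math.prod(memberships[i][term] for i, term in enumerate(rule))
        for rule in rules
    ]
    total = sum(strengths)  # always positive: bell() never returns zero
    return rules, [w / total for w in strengths]


rules, weights = normalized_firing_strengths([0.9, 0.1, 0.5, 0.7, 0.3])
print(len(rules))  # 243
```

In a full ANFIS, the normalized strengths would weight each rule's consequent, and backpropagation would tune the bell parameters and consequents; that training step is omitted from this sketch.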