International Journal of Scientific Research and Engineering Development


( International Peer Reviewed Open Access Journal ) ISSN [ Online ] : 2581 - 7175



Performance Analysis of Different Word Embedding Models for Text Classification

    International Journal of Scientific Research and Engineering Development (IJSRED)


Published Issue : Volume-3 Issue-6
Year of Publication : 2020
Unique Identification Number : IJSRED-V3I6P86
Authors : Ajose-Ismail B.M, Abimbola O.V, Oloruntoba S.A

Abstract :

Classifying an unstructured document into its proper category is becoming a herculean task because of the steady, rapid growth in the volume of information shared over the internet. Text classification is the task of assigning documents to one or more predefined categories. The technique is widely used in information retrieval, text summarization, and text extraction. According to the extant literature, the performance of a text classification system depends on an adequate textual representation of the document, so transforming text into feature vectors is a very important stage of the classification task. Several textual representation techniques, such as bag of words, n-grams, and topic models, have been proposed to capture the real semantics of web documents, but they are fraught with challenges such as semantic mismatch and words with multiple meanings. This paper therefore proposes word embeddings to solve the document representation problem in text classification systems. To this end, the work uses different word embedding algorithms to represent documents and pairs each representation with classification algorithms to determine the most effective embedding model. The results confirm the earlier assumption that Word2Vec performs robustly on very high-dimensional text such as web documents and that it captures the real semantics of those documents. The performance metrics employed are precision, F-measure, and accuracy.
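
The pipeline outlined in the abstract (embed the words, build a document vector, classify, then score with precision, F-measure, and accuracy) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the three-dimensional vectors in `EMBEDDINGS` are hand-written rather than learned by Word2Vec, averaging word vectors is only one common way to represent a document, and the nearest-centroid classifier stands in for whichever classification algorithms the paper actually used.

```python
import math

# Hypothetical 3-dimensional word vectors, hand-written for illustration only.
# A real system would load vectors learned by an embedding model such as Word2Vec.
EMBEDDINGS = {
    "goal": [0.9, 0.1, 0.0],
    "match": [0.8, 0.2, 0.1],
    "striker": [0.9, 0.0, 0.1],
    "election": [0.1, 0.9, 0.0],
    "senate": [0.0, 0.8, 0.2],
    "vote": [0.1, 0.9, 0.1],
}
DIM = 3


def document_vector(tokens):
    """Represent a document as the mean of its known word vectors."""
    vectors = [EMBEDDINGS[t] for t in tokens if t in EMBEDDINGS]
    if not vectors:
        return [0.0] * DIM  # no known words: fall back to the zero vector
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def train_centroids(docs, labels):
    """Average the document vectors of each class (nearest-centroid model)."""
    grouped = {}
    for tokens, label in zip(docs, labels):
        grouped.setdefault(label, []).append(document_vector(tokens))
    return {
        label: [sum(column) / len(vecs) for column in zip(*vecs)]
        for label, vecs in grouped.items()
    }


def predict(centroids, tokens):
    """Assign the class whose centroid is most similar to the document."""
    vec = document_vector(tokens)
    return max(centroids, key=lambda label: cosine(vec, centroids[label]))


def evaluate(y_true, y_pred, positive):
    """Precision and F-measure for one class, plus overall accuracy."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_measure = (
        2 * precision * recall / (precision + recall)
        if precision + recall
        else 0.0
    )
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    return {"precision": precision, "f_measure": f_measure, "accuracy": accuracy}


if __name__ == "__main__":
    centroids = train_centroids(
        [["goal", "match"], ["election", "vote"]], ["sport", "politics"]
    )
    test_docs = [["striker", "goal"], ["senate", "vote"], ["match", "striker"]]
    y_true = ["sport", "politics", "sport"]
    y_pred = [predict(centroids, doc) for doc in test_docs]
    print(y_pred, evaluate(y_true, y_pred, positive="sport"))
```

Note that the test document `["striker", "goal"]` shares only one word with the training documents, yet it can still be placed near the "sport" centroid because "striker" has a vector close to "goal" and "match". A bag-of-words representation would give "striker" no overlap with any training word, which is the kind of semantic mismatch the abstract refers to.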