International Journal of Scientific Research and Engineering Development (International Peer Reviewed Open Access Journal), ISSN [Online]: 2581-7175
Performance Analysis of Different Word Embedding Models for Text Classification
International Journal of Scientific Research and Engineering Development (IJSRED)
Published Issue: Volume-3 Issue-6
Year of Publication: 2020
Unique Identification Number: IJSRED-V3I6P86
Authors: Ajose-Ismail B.M, Abimbola O.V, Oloruntoba S.A
Abstract:
Assigning an unstructured document to the category to which it belongs is becoming a herculean task because of the steady and exponential growth in the volume of information shared over the internet. Text classification is the task of assigning documents to one or more predefined categories. It is widely used in information retrieval, text summarization, and text extraction. The existing literature indicates that the performance of a text classification system depends on an adequate representation of the text document, so transforming text into feature vectors is a critical stage of the classification task. Several representation techniques, such as bag of words, n-grams, and topic models, have been proposed to capture the real semantics of web documents, but they suffer from challenges such as semantic mismatch and words with multiple meanings. This paper therefore proposes word embeddings to address the document representation problem in text classification systems. To this end, the study uses different word embedding algorithms to represent documents and pairs each with classification algorithms to determine the most effective embedding model. The results confirm the earlier assumption that Word2Vec performs robustly on very high-dimensional text such as web documents and that it captures the real semantics of web documents. The performance metrics employed in this work are precision, F-measure, and accuracy.
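The abstract does not state how word vectors are combined into a document representation or which classifiers were paired with each embedding model. The following is a minimal Python sketch of the general pipeline, assuming (1) a document vector formed by averaging its word vectors, and (2) a nearest-centroid classifier using cosine similarity as a stand-in; neither is confirmed as the authors' setup. The embedding table `EMBEDDINGS` and the sport/finance toy documents are hypothetical hand-made examples; a real pipeline would load vectors trained by Word2Vec or another embedding model. The sketch also shows how accuracy, precision, and F-measure (F1) are computed for one class.

```python
import math
from collections import Counter

# Hypothetical toy embedding table: hand-made 2-D vectors for illustration only.
# A real system would load vectors trained by Word2Vec or a similar model.
EMBEDDINGS = {
    "goal": [0.9, 0.1], "match": [0.8, 0.2], "team": [0.7, 0.1],
    "stock": [0.1, 0.9], "market": [0.2, 0.8], "shares": [0.1, 0.7],
}

def doc_vector(text, embeddings=EMBEDDINGS):
    """Average the vectors of in-vocabulary tokens; zero vector if none are known."""
    dim = len(next(iter(embeddings.values())))
    vecs = [embeddings[t] for t in text.lower().split() if t in embeddings]
    if not vecs:
        return [0.0] * dim
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]

def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def train_centroids(docs, labels):
    """Nearest-centroid training: mean document vector for each class."""
    sums, counts = {}, Counter(labels)
    for doc, label in zip(docs, labels):
        v = doc_vector(doc)
        acc = sums.setdefault(label, [0.0] * len(v))
        for i, x in enumerate(v):
            acc[i] += x
    return {c: [x / counts[c] for x in s] for c, s in sums.items()}

def predict(centroids, doc):
    """Pick the class whose centroid is most cosine-similar to the document."""
    v = doc_vector(doc)
    return max(centroids, key=lambda c: cosine(v, centroids[c]))

def evaluate(y_true, y_pred, positive):
    """Accuracy plus precision, recall and F-measure (F1) for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

train_docs = ["goal match", "team goal", "stock market", "market shares"]
train_labels = ["sport", "sport", "finance", "finance"]
centroids = train_centroids(train_docs, train_labels)

test_docs = ["match team", "shares stock", "team market goal", "market team"]
test_labels = ["sport", "finance", "sport", "sport"]
preds = [predict(centroids, d) for d in test_docs]
metrics = evaluate(test_labels, preds, positive="sport")
print(preds)
print(metrics)
```

On this toy data the mixed document "market team" is labelled finance, giving accuracy 0.75, precision 1.0, recall about 0.67, and F-measure 0.8 for the sport class. These numbers only illustrate the metric definitions and say nothing about the paper's results. Averaging is the simplest way to turn word embeddings into a fixed-length document vector; it discards word order, which is one reason studies compare several embedding and classifier combinations.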