Ram Maheshwari
Bioinformatics and Artificial Intelligence Team

Requests classification in the customer service area for software companies using Machine Learning and Natural Language Processing

Research project that develops Machine Learning models to classify text for companies. The project includes data preprocessing using Natural Language Processing techniques.


Project Overview

Artificial Intelligence (AI) is widely recognized for its potential to radically transform the way we live. It allows machines to learn from experience, adjust to new inputs, and perform tasks in a human-like way. This research focuses on the business field: it proposes an incident classification model built with Machine Learning (ML) and Natural Language Processing (NLP) for the technical support area of a software development company that currently resolves customer requests manually.

Applying ML and NLP techniques to company data makes it possible to predict the category of a request submitted by a client. Reviewing historical records to analyze customer behavior helps provide the expected solution to each incident, which increases customer satisfaction. The practice would also reduce the cost and time spent on relationship management with potential consumers.

This work evaluates different Machine Learning models, including Support Vector Machine (SVM), Extra Trees, and Random Forest. The SVM algorithm achieves the highest accuracy, 98.97%, when combined with class balancing, hyper-parameter optimization, and pre-processing techniques.
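The combination of text features, class balancing, and hyper-parameter optimization described above can be sketched with Scikit-Learn as follows. This is a minimal illustration under stated assumptions, not the project's actual pipeline: the sample requests, the category names, the TF-IDF features, the parameter grid, and the use of class_weight="balanced" as the balancing step are all hypothetical choices made for the example (the project lists Imbalanced-learn among its tools, so its real balancing method may well be a resampling technique).

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

# Hypothetical toy requests and categories; the project's real data is not shown.
requests = [
    "cannot log in to my account",
    "password reset link does not work",
    "login page shows an error",
    "forgot my password again",
    "invoice total is wrong",
    "charged twice on my invoice",
    "need a copy of last invoice",
    "billing address is wrong on invoice",
]
labels = ["access", "access", "access", "access",
          "billing", "billing", "billing", "billing"]

# Pre-processing (lowercasing, stop-word removal, TF-IDF weighting) followed by
# an SVM. class_weight="balanced" is an assumed stand-in for class balancing.
pipeline = Pipeline([
    ("tfidf", TfidfVectorizer(lowercase=True, stop_words="english")),
    ("svm", SVC(class_weight="balanced")),
])

# Assumed search space; each candidate is scored by 2-fold cross-validation.
param_grid = {"svm__C": [0.1, 1, 10], "svm__kernel": ["linear", "rbf"]}
search = GridSearchCV(pipeline, param_grid, cv=2, scoring="f1_macro")
search.fit(requests, labels)

best_params = search.best_params_
prediction = search.predict(["my invoice has the wrong amount"])[0]
print(best_params, prediction)
```

Wrapping the vectorizer and the classifier in one Pipeline means the TF-IDF vocabulary is refit on each training fold during the search, so no information from held-out requests leaks into the features.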

Results Highlight

Results presented in this section correspond to experiments performed during the development of this project.

Performance Metrics
Model                    Accuracy [%]   F1-Score [%]   Cross-validation [%]
Support Vector Machine   98.97          98.95          98.98 ± 0.07
Extra Trees              98.77          98.74          98.77 ± 0.06
Random Forest            98.56          98.53          98.60 ± 0.08
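
As a rough illustration of how the three columns above can be produced, the sketch below computes a held-out accuracy, an F1-score, and a cross-validation "mean ± standard deviation" with Scikit-Learn. Everything specific here is an assumption: the data is synthetic, the F1 averaging mode (macro), the 80/20 split, and the 5 folds are example choices, and the source does not state which settings the project used.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.svm import SVC

# Synthetic stand-in for vectorized requests (assumption: 3 categories).
X, y = make_classification(n_samples=300, n_features=20, n_informative=5,
                           n_classes=3, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

model = SVC(kernel="linear").fit(X_train, y_train)
y_pred = model.predict(X_test)

# Held-out accuracy and F1, expressed in percent like the table.
accuracy = 100 * accuracy_score(y_test, y_pred)
f1 = 100 * f1_score(y_test, y_pred, average="macro")  # averaging mode assumed

# Cross-validation column: mean and standard deviation over the fold scores.
fold_scores = 100 * cross_val_score(SVC(kernel="linear"), X, y,
                                    cv=5, scoring="accuracy")
report = (f"{accuracy:.2f} {f1:.2f} "
          f"{fold_scores.mean():.2f} ± {fold_scores.std():.2f}")
print(report)
```

The "±" term is the spread of the per-fold scores, so a small value such as those in the table indicates that performance is stable across different splits of the data.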

Tools Used

Python
Scikit-Learn
NLTK :: Natural Language Toolkit
Imbalanced-learn
NumPy
Matplotlib