Human Data Science (HDS) | Shiva Nadi: Text Classification and Named Entity Recognition for Online News Data

14 March 2019
15:00 - 16:00
Sjoerd Groenman building, room B1.09

Shiva Nadi: Text Classification and Named Entity Recognition for Online News Data

Thursday 14/03/2019 at 15:00 in room B1.09

The next MSDSlab meeting will be on Thursday, 14 March, and will be presented by Shiva Nadi of Utrecht University. She will give a brief overview of her work on text mining and present some of the challenges she is facing.

Abstract: Text Classification and Named Entity Recognition for Online News Data

News websites deliver information to millions of users every day, and as information technology continues to develop, the amount of unstructured news data available to the social sciences keeps growing. Organizing this text and classifying it automatically remains a challenge for social science applications. Text classification assigns texts to predefined categories, and automating the task with machine learning algorithms makes the process fast and efficient. This project investigates the classification of news text and proposes a model based on the Scikit-Learn Python package. Labeled data with predefined categories are fed to a machine learning algorithm for training. Because the data are noisy and high-dimensional, the model applies preprocessing steps to reduce the dimensionality of the text and extract features. The work also uses a Named Entity Recognition tool to extract related features from the text, such as the names of persons, organizations, and locations. During the testing phase, the algorithm is given unseen data and classifies them into categories based on the trained model.
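
To make the train-then-classify workflow described in the abstract concrete, here is a minimal sketch using Scikit-Learn. It is not the speaker's model: the TF-IDF features, the Naive Bayes classifier, the two categories, the toy headlines, and the helper name `build_classifier` are all assumptions chosen for illustration. Stop-word removal and a vocabulary cap stand in for the preprocessing steps, and the Named Entity Recognition step is left out because the abstract does not name the tool.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Toy labeled headlines, invented for illustration (not the project's data).
train_texts = [
    "Parliament votes on new budget proposal",
    "Prime minister announces election date",
    "Government coalition debates tax reform",
    "Striker scores twice in cup final",
    "Tennis champion wins third title",
    "Football club signs new goalkeeper",
]
train_labels = ["politics", "politics", "politics", "sport", "sport", "sport"]


def build_classifier():
    """Hypothetical helper: TF-IDF features followed by a Naive Bayes classifier."""
    return make_pipeline(
        # Lowercasing, stop-word removal, and a vocabulary cap are simple
        # stand-ins for the preprocessing that reduces text dimensionality.
        TfidfVectorizer(lowercase=True, stop_words="english", max_features=1000),
        MultinomialNB(),
    )


# Training phase: fit on labeled data with predefined categories.
classifier = build_classifier()
classifier.fit(train_texts, train_labels)

# Testing phase: classify texts the model has not seen.
unseen_texts = [
    "Minister defends budget in parliament",
    "Goalkeeper saves penalty in final",
]
predictions = [str(label) for label in classifier.predict(unseen_texts)]
print(predictions)
```

Wrapping the vectorizer and classifier in one pipeline means the same preprocessing is applied at training and at prediction time, which avoids a common source of mismatch between the two phases.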