At our next MünsteR R-user group meetup on Tuesday, July 9th, 2019, we will have two exciting talks: one on text mining with Word2Vec and one on parallelization in R!

You can RSVP here: https://www.meetup.com/de-DE/Munster-R-Users-Group/events/262236134/

Thorben Hellweg will talk about parallelization in R. More information will be announced soon!

Maren Reuter from viadee AG will give an introduction to how the Word2Vec algorithm works and how to use it in R.

Text data in its raw form cannot be used as input for machine learning algorithms. Therefore, an information extraction method is required to turn plain text into an appropriate representation. By exploiting the semantic and syntactic structure of the text data, the importance of a word can be defined and represented as a vector in a vector space, i.e. the vector can be seen as a numerical "importance" value. There are two predominant approaches to representing words as vectors: using word frequencies (n-grams), or using a prediction model to estimate the relatedness of words. The Word2Vec algorithm by Mikolov et al. belongs to the latter. This talk will show how the algorithm works and how it can be used in practice.
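To make the frequency-based approach a little more concrete, here is a minimal sketch in plain Python (standard library only). It is not code from the talk: the toy corpus, the window size and the function names are assumptions chosen for illustration. Each word is represented by counts of the words that appear near it, and two words are compared via cosine similarity. Word2Vec would instead learn dense vectors by training a prediction model, which this sketch does not attempt.

```python
import math
from collections import Counter, defaultdict

# Toy corpus (hypothetical, for illustration only)
corpus = [
    "the cat sat on the mat",
    "the dog sat on the rug",
    "stocks fell on the market",
]

def cooccurrence_vectors(sentences, window=2):
    """For every word, count how often each other word occurs within `window` tokens."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.split()
        for i, word in enumerate(tokens):
            start = max(0, i - window)
            end = min(len(tokens), i + window + 1)
            for j in range(start, end):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors

def cosine(a, b):
    """Cosine similarity between two sparse count vectors (Counter objects)."""
    dot = sum(count * b[key] for key, count in a.items() if key in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

vectors = cooccurrence_vectors(corpus)
print("cat vs. dog:   ", cosine(vectors["cat"], vectors["dog"]))
print("cat vs. stocks:", cosine(vectors["cat"], vectors["stocks"]))
```

In this toy corpus, "cat" and "dog" occur in identical contexts, so their similarity is higher than that of "cat" and "stocks". Count vectors like these grow with the vocabulary and are mostly zeros, which is one motivation for prediction-based models such as Word2Vec.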

About Maren:

Maren Reuter is an IT consultant at viadee AG and part of the company’s Artificial Intelligence research group. She earned her Master’s degree in Information Systems from the University of Münster with a focus on Data Analytics. In her Master’s thesis, she applied text mining techniques to predict maintenance tasks in agile software projects. For this purpose, she used the Word2Vec algorithm to build a word vector representation model.