Using a Bilingual Dictionary to Expand Topic Model Training Data

Having long been interested in both foreign languages and computer science, I wanted to try a project that combined the two. The goal was to find a way to artificially increase the amount of training data for low-resource languages by exploiting bilingual dictionaries, in which a single word often has several possible translations. Although the results of the experiment showed that the method I came up with is not very promising, I gained valuable experience working with a very large amount of data on limited hardware resources. Completing the project also involved scrapers and other tooling. The project was accepted to and presented at NCUR 2019 at Kennesaw State University.