Metadata fields from Hillary Clinton's emails
At the second natural language processing meeting we analyzed Hillary Clinton's emails. Diana, our instructor, got them from Kaggle, where they are one of the datasets you can use for data science projects. Each email has many metadata fields, such as Subject, To, From, and SenderPersonId, and NLTK can help us analyze them all. This was the spring of 2016, when we did not yet fully know what significance those emails would have half a year later.
This was the second meeting in a five-meeting series on natural language processing, hosted by Women Who Code Austin at Rackspace. The instructor, Diana, introduced us to the basics of natural language processing. She gave several demos of simple text analysis one can do with Python's Natural Language Toolkit (NLTK): reading in the text, tokenizing it, and tagging parts of speech, which can involve a lot of interesting ambiguity.
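To give a feel for tokenizing and for why part-of-speech tagging is ambiguous, here is a minimal sketch. It is not the code from the meeting: it uses only Python's standard library as a stand-in for NLTK, and the sample sentence, the `tokenize` helper, and the tiny `LEXICON` of possible tags are all made up for illustration.

```python
import re
from collections import Counter

# Hypothetical sample text, not taken from the email dataset.
text = "Please book the flight. I left the book on the desk."

# A word (optionally with an internal apostrophe) or a single punctuation mark.
TOKEN_PATTERN = re.compile(r"\w+(?:'\w+)?|[^\w\s]")


def tokenize(s):
    """Split a string into word and punctuation tokens."""
    return TOKEN_PATTERN.findall(s)


# Toy lexicon (an assumption for this sketch): every tag a word could take.
LEXICON = {
    "please": {"INTJ"},
    "book": {"NOUN", "VERB"},
    "the": {"DET"},
    "flight": {"NOUN"},
    "i": {"PRON"},
    "left": {"VERB", "ADJ"},
    "on": {"ADP"},
    "desk": {"NOUN"},
    ".": {"PUNCT"},
}


def candidate_tags(token):
    """Look up the possible tags for a token, ignoring case."""
    return LEXICON.get(token.lower(), {"UNKNOWN"})


tokens = tokenize(text)
counts = Counter(t.lower() for t in tokens)

# Words with more than one possible tag are the ambiguous ones.
ambiguous = sorted({t.lower() for t in tokens if len(candidate_tags(t)) > 1})

for token in tokens:
    print(token, sorted(candidate_tags(token)))
print("Ambiguous words:", ambiguous)
```

In this sketch "book" shows up once as a verb and once as a noun, and a dictionary lookup alone cannot tell which is which. A real tagger, such as the ones in NLTK, has to use the surrounding words to decide.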
Then we ventured deeper into natural language processing to discuss where and how it is used, including such fields as sentiment analysis. Diana talked about the challenges in those fields, such as determining the similarity between concepts. We need to be able to handle that in order to extract accurate meanings from texts. This is where ontologies come in handy.
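One simple way an ontology can help with concept similarity is to count how many "is-a" steps separate two concepts. The sketch below is only an illustration, not something shown at the meeting: the `IS_A` hierarchy is a made-up toy, and the score 1 / (1 + path length) is one common path-based choice among many.

```python
# Hypothetical is-a links (child -> parent). A real ontology is far larger.
IS_A = {
    "dog": "canine",
    "wolf": "canine",
    "canine": "mammal",
    "cat": "feline",
    "feline": "mammal",
    "mammal": "animal",
    "sparrow": "bird",
    "bird": "animal",
}


def ancestors(concept):
    """Map the concept and each of its ancestors to its distance in is-a steps."""
    result = {}
    depth = 0
    while concept is not None:
        result[concept] = depth
        concept = IS_A.get(concept)  # None once we pass the root
        depth += 1
    return result


def path_similarity(a, b):
    """Return 1 / (1 + shortest is-a path between a and b), or 0.0 if unrelated."""
    up_a, up_b = ancestors(a), ancestors(b)
    shared = set(up_a) & set(up_b)
    if not shared:
        return 0.0
    shortest = min(up_a[c] + up_b[c] for c in shared)
    return 1.0 / (1 + shortest)


for pair in [("dog", "dog"), ("dog", "wolf"), ("dog", "cat"), ("dog", "sparrow")]:
    print(pair, round(path_similarity(*pair), 3))
```

With this toy hierarchy, "dog" comes out closer to "wolf" than to "cat", and closer to "cat" than to "sparrow", which matches intuition. The scores depend entirely on how the hierarchy is drawn, which is part of what makes this problem hard.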