Metadata extracted from Hillary Clinton's emails
This is a Python dataframe with the data from the previous image, i.e. the metadata of Hillary Clinton's emails. As we can see, the MetaDataSubject and MetaDataTo fields contain some familiar names and topics that made the news...
At the second natural language processing meeting we analyzed Hillary Clinton's emails, which Diana, our instructor, got from Kaggle; this is one of the Kaggle datasets that you can use for data science projects. Each email has many metadata fields, such as Subject, To, From, and SenderPersonId, and NLTK can help us analyze them all. This was the spring of 2016, when we did not yet fully know what significance those emails would have half a year later.
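To give a feel for what analyzing metadata fields can look like, here is a minimal sketch that uses only the Python standard library. The rows are made up for illustration and are not real emails from the dataset; the column names follow the fields mentioned in this post, and the spelling in the actual Kaggle file may differ.

```python
import csv
import io
from collections import Counter

# Made-up rows for illustration only; these are NOT real emails from the dataset.
# Column names follow the fields mentioned in this post (assumed spelling).
SAMPLE_CSV = """MetaDataSubject,MetaDataTo,SenderPersonId
Schedule update,H,10
Schedule for Tuesday,H,10
Draft remarks,H,22
Re: Draft remarks,Example Aide,80
"""


def load_rows(text):
    """Parse CSV text into a list of dicts, one per email."""
    return list(csv.DictReader(io.StringIO(text)))


def top_subject_words(rows, n=3):
    """Count lowercase words across all subject lines; return the n most common."""
    counts = Counter()
    for row in rows:
        counts.update(row["MetaDataSubject"].lower().split())
    return counts.most_common(n)


rows = load_rows(SAMPLE_CSV)

# Most frequent words in the subject lines
print(top_subject_words(rows))

# Most frequent recipient
print(Counter(row["MetaDataTo"] for row in rows).most_common(1))
```

Splitting on whitespace is the crudest possible way to get words out of a subject line; a proper tokenizer, such as the ones NLTK provides, handles punctuation much better.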
This was the second in a series of four meetings on natural language processing, hosted by Women Who Code Austin at Rackspace. The instructor, Diana, introduced us to the basics of natural language processing and showed us what kinds of simple text analysis we can do with Python's Natural Language Toolkit (NLTK). Examples include reading in the text, tokenizing it, and tagging parts of speech, which can involve a lot of interesting ambiguity.
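In NLTK these steps are typically done with `nltk.word_tokenize` and `nltk.pos_tag`, both of which rely on downloaded models. The sketch below is not the code from the meeting; it is a standard-library stand-in that shows the idea behind tokenizing and tagging, and why ambiguity is tricky. The tiny training set and the tag names are hypothetical, chosen only for illustration.

```python
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Split text into word tokens and single-character punctuation tokens."""
    return re.findall(r"\w+|[^\w\s]", text)


# Tiny hand-labelled training set (hypothetical, for illustration only).
# Tags: DT = determiner, NN = noun, VB = verb, PRP = pronoun.
TRAINING = [
    [("the", "DT"), ("book", "NN")],
    [("a", "DT"), ("book", "NN")],
    [("they", "PRP"), ("book", "VB"), ("a", "DT"), ("flight", "NN")],
]


def train_most_frequent_tag(tagged_sentences):
    """Learn, for each word, the tag it carries most often in the training data."""
    counts = defaultdict(Counter)
    for sentence in tagged_sentences:
        for word, tag in sentence:
            counts[word.lower()][tag] += 1
    return {word: tags.most_common(1)[0][0] for word, tags in counts.items()}


def tag(tokens, model, default="UNK"):
    """Tag each token with its most frequent tag, or `default` if unseen."""
    return [(tok, model.get(tok.lower(), default)) for tok in tokens]


model = train_most_frequent_tag(TRAINING)
tokens = tokenize("They book a flight.")
print(tokens)
print(tag(tokens, model))
```

Because this tagger looks at each word in isolation, it labels "book" as a noun even in "They book a flight", where it is a verb. Resolving that kind of ambiguity requires looking at the surrounding context, which is what more capable taggers do.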
Then we ventured deeper into natural language processing to discuss where and how it is used, including in fields such as sentiment analysis. Diana talked about challenges present in those fields, such as determining similarity between concepts. We need to be able to handle that in order to extract accurate meanings from texts. This is where ontologies, which organize concepts and the relationships between them, can be handy.
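One way to see how an ontology helps with concept similarity is to measure how far apart two concepts sit in an is-a hierarchy. The sketch below uses a toy hierarchy invented for this example, not a real ontology, and the function names are hypothetical. It is similar in spirit to the path-based measures available over WordNet, which NLTK exposes through `nltk.corpus.wordnet`.

```python
# Toy is-a hierarchy: child -> parent. Invented for illustration only.
PARENT = {
    "dog": "mammal",
    "cat": "mammal",
    "mammal": "animal",
    "sparrow": "bird",
    "bird": "animal",
    "animal": "entity",
    "email": "document",
    "document": "entity",
}


def ancestors(concept):
    """Return the chain from a concept up to the root, starting with the concept."""
    chain = [concept]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain


def path_distance(a, b):
    """Count is-a edges on the path from a to b through their nearest common ancestor."""
    steps_from_b = {c: i for i, c in enumerate(ancestors(b))}
    for steps_from_a, concept in enumerate(ancestors(a)):
        if concept in steps_from_b:
            return steps_from_a + steps_from_b[concept]
    raise ValueError(f"no common ancestor for {a!r} and {b!r}")


def similarity(a, b):
    """Map distance to a score in (0, 1]; identical concepts score 1.0."""
    return 1 / (1 + path_distance(a, b))


for pair in [("dog", "cat"), ("dog", "sparrow"), ("dog", "email")]:
    print(pair, round(similarity(*pair), 3))
```

In this toy hierarchy, "dog" comes out closer to "cat" than to "sparrow", and closer to "sparrow" than to "email", which matches intuition. Real ontologies are far larger and concepts can have several parents, so this is only a rough picture of the idea.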