Natural Language Processing: summarization
Becky explained that summarization can be extractive or abstractive. Extractive summarization selects a few representative sentences from the text, while abstractive summarization generates new text that conveys the main points.
As an example, Becky gave a sentence: "The Army Corps of Engineers, rushing to meet President Bush's promise to protect New Orleans by the start of the 2006 hurricane season, installed defective flood-control pumps last year despite warnings from its own expert that the equipment would fail during the storm, according to documents obtained by the Associated Press."
Extractive summarization would extract such phrases from it as:
- Army Corps of Engineers
- President Bush
- New Orleans
- defective flood-control pumps
In contrast, abstractive summarization would generate such phrases as:
- government agency
- presidential orders
- defective equipment
- storm preparation
- Hurricane Katrina
I can't quite put my finger on it, but it seems that extractive summarization pulls out the names of specific entities without saying much about what happened to those entities or what they did. Abstractive summarization, in contrast, seems to "understand" what those entities represent and what they do, and thereby captures more of the "gist" of the paragraph. I could be wrong about it, of course.
According to Becky, extractive summarization is a mostly solved problem by now: the TextRank algorithm takes care of it. Abstractive summarization, however, is a very difficult, unsolved problem, though knowledge graphs help.
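To make the TextRank idea a bit more concrete, here is a minimal sketch of how it can rank sentences for an extractive summary. This is not the code we wrote at the hackathon. The function names, the naive sentence splitter, and the word-overlap similarity (in the spirit of the original TextRank paper, but computed on unique words) are all my own assumptions for illustration. Each sentence is a node in a graph, edges are weighted by similarity, and a PageRank-style iteration scores the sentences.

```python
import math
import re


def split_sentences(text):
    # Naive splitter (an assumption): break after ., ! or ? followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence):
    # Lowercased set of unique words in the sentence.
    return set(re.findall(r"[a-z']+", sentence.lower()))


def similarity(words_a, words_b):
    # Shared words, normalized by the log of each sentence's unique-word count.
    if len(words_a) < 2 or len(words_b) < 2:
        return 0.0
    overlap = len(words_a & words_b)
    return overlap / (math.log(len(words_a)) + math.log(len(words_b)))


def textrank_summary(text, top_n=2, damping=0.85, iterations=50):
    sentences = split_sentences(text)
    n = len(sentences)
    if n == 0:
        return []
    words = [tokenize(s) for s in sentences]

    # Weighted, undirected sentence graph stored as an adjacency matrix.
    weights = [
        [similarity(words[i], words[j]) if i != j else 0.0 for j in range(n)]
        for i in range(n)
    ]
    out_sum = [sum(row) for row in weights]

    # PageRank-style iteration over the sentence graph.
    scores = [1.0] * n
    for _ in range(iterations):
        new_scores = []
        for i in range(n):
            rank = sum(
                weights[j][i] / out_sum[j] * scores[j]
                for j in range(n)
                if out_sum[j] > 0
            )
            new_scores.append((1 - damping) + damping * rank)
        scores = new_scores

    # Keep the top_n highest-scoring sentences, in their original order.
    top = sorted(range(n), key=lambda i: scores[i], reverse=True)[:top_n]
    return [sentences[i] for i in sorted(top)]


if __name__ == "__main__":
    review = (
        "The wine has a rich cherry flavor. "
        "The cherry flavor of this wine is rich and smooth. "
        "This rich wine has a smooth cherry finish. "
        "My cat sleeps all day."
    )
    for sentence in textrank_summary(review, top_n=2):
        print(sentence)
```

Sentences that share vocabulary with many other sentences end up with higher scores, so an off-topic sentence tends to be left out of the summary. Real packages use better tokenization and similarity measures than this sketch does.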
The short, work-life-balance-respecting Natural Language Processing hackathon started with a presentation by data scientist Becky. She first introduced us to three cornerstone approaches of NLP -- summarization, topic modeling, and sentiment analysis.
The attendees arranged themselves into three teams along those lines. I ended up on the summarization team. Summarization, as Becky explained, can be extractive or abstractive. We wrote Python code to summarize wine reviews, and by that I mean we called a bunch of functions from various Python packages. The results were mixed.
Here is a link to my whole article on the Women in Data Science Natural Language Processing hackathon.