Making sense of unstructured data is both useful and difficult. PINGAR's toolset can assist people in this task, and this blog post, fresh from our research lab, shows an example. We have analyzed thousands of emails circulated within the Enron organization and created an interactive visualization of trending topics discussed by the Enron employees during the company's good and bad times. The result of this research is a public demo and this blog post, summarizing some of the insights.
Keyword and summary extraction from Enron emails
The Enron emails were made public during the US government's investigation of the Enron scandal. Since then, this unique dataset has been popular among researchers. We decided to use it to demonstrate the power of the PINGAR technology.
For each email, we extracted the body and sent it to the GetKeywords method via the PINGAR API. We then computed statistics on how each keyword's frequency changed over time and which topics were trending in a given month. The normalized and scaled keyword frequencies, plotted on a time graph, visualize these statistics. We also used the open-source search engine Solr to make the emails searchable, and the PINGAR API method GetQueryBasedSummary to generate previews of the search results. Additionally, the demo lets you compare any pair of keywords to detect dependencies between them.
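To make the statistics step more concrete, here is a minimal sketch in Python. It assumes the keywords have already been extracted for each email and that each email is tagged with its month. The function names, the per-month share as normalization, the peak scaling and the trending score are all our illustrative assumptions, not the exact formulas behind the demo.

```python
from collections import Counter, defaultdict


def monthly_keyword_shares(records):
    """Turn (month, keywords) pairs into per-keyword monthly shares.

    `records` is an iterable such as [("2000-11", ["Al Gore", "nanny"]), ...].
    The share is the fraction of that month's emails mentioning the keyword
    (an assumed normalization, so busy months do not dominate the graph).
    """
    emails_per_month = Counter()
    mentions = defaultdict(Counter)
    for month, keywords in records:
        emails_per_month[month] += 1
        for keyword in set(keywords):  # count a keyword once per email
            mentions[keyword][month] += 1
    return {
        keyword: {m: n / emails_per_month[m] for m, n in by_month.items()}
        for keyword, by_month in mentions.items()
    }


def scale_to_peak(series):
    """Scale one keyword's monthly shares so that its peak month equals 1.0."""
    peak = max(series.values())
    return {month: value / peak for month, value in series.items()}


def trending(shares, month, top_n=5):
    """Rank keywords by how far their share in `month` exceeds their
    average share over all months (an assumed definition of "trending")."""
    all_months = {m for series in shares.values() for m in series}
    scores = {
        keyword: series.get(month, 0.0) - sum(series.values()) / len(all_months)
        for keyword, series in shares.items()
    }
    # Highest score first; ties are broken alphabetically for stable output.
    return sorted(scores, key=lambda k: (-scores[k], k))[:top_n]
```

Calling `trending(monthly_keyword_shares(records), "2000-11")` would return the keywords that stand out most in November 2000 relative to their usual level, and `scale_to_peak` prepares a single keyword's series for plotting on a common 0 to 1 axis.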
November 2000 at Enron
Let's have a look at the resulting demo. A simple way to validate the graphs is to look at the trending topics for November 2000, the time of the US presidential election. Indeed, Al Gore and George W. Bush are trending, and so is Thanksgiving, as one would expect in November. Other keywords indicate internal events within the company (Performance Review, Due Date), as well as private emails sent within the company (nanny, popcorn).

Now that we know that the graphs contain valid information, let’s dive deeper into internal processes at Enron.
Products at Enron
Back then, the “America’s Most Innovative Company” wasn’t restricting itself to the original product (natural gas), but traded over 30 different products, including oil, paper, water, freight, broadband, steel and plastics. Compare natural gas and broadband or oil and water, and you will see the relative importance of these products within Enron over time.
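The demo compares two keywords visually, by plotting both on the same graph. If you want a single number for how closely two keywords move together, one option is a correlation coefficient. The sketch below is our own illustration of that idea, not a feature of the demo: it assumes each keyword is given as a dictionary mapping a month to its frequency, and it treats months where a keyword is missing as zero.

```python
import math


def align(series_a, series_b):
    """Put two {month: frequency} series on a shared, sorted month axis.

    Months missing from one series are filled with 0.0 (an assumption:
    no extracted keyword means no discussion that month).
    """
    months = sorted(set(series_a) | set(series_b))
    a = [series_a.get(m, 0.0) for m in months]
    b = [series_b.get(m, 0.0) for m in months]
    return months, a, b


def pearson(a, b):
    """Pearson correlation of two equally long lists of numbers.

    Returns None when the correlation is undefined (empty input or a
    series that never changes).
    """
    n = len(a)
    if n == 0:
        return None
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return None
    return cov / math.sqrt(var_a * var_b)
```

A value near 1 means the two keywords rise and fall together, a value near -1 means one rises as the other falls, and a value near 0 suggests no linear relationship. Correlation alone does not tell you why two topics move together, so reading the underlying emails is still necessary.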


Bankruptcy times
For those who know the Timeline of the Enron scandal, the relation between the SEC investigations and the bankruptcy of the company won’t be surprising:

However, we wondered what other topics were trending during the peak times of the SEC investigation (October 2001) and the bankruptcy (January 2002) and, more importantly, why. For October 2001, these are keywords like chief financial officer, the person under attack at the time, as well as the self-explanatory request for more information and asset management. For January 2002, they are The New York Times, which covered the aftermath of the scandal, and Kenneth Lay, who resigned as CEO that month under pressure. Another trending topic was the California energy crisis. It had happened earlier, but only then did it come out that it was partially caused by the illegal actions of Enron traders.
If a topic is trending, it means it was highly discussed in the emails, but in which light? Searching in emails and reading the summaries generated by PINGAR helps to understand the context. For example, while Al Gore was a trending topic in November 2000, the employees (and the mailing services they subscribed to) talked negatively about him.


Similarly, the fact that retirement plans were trending in January 2002 indicates how worried the employees were. That month, an email titled “Demand Ken Lay Donate Proceeds from Enron Stock Sales” circulated among the employees, stating that the collapse of Enron’s stock price had devastated many of them financially and wiped out their retirement plans.

Visualizing statistics computed from structured data is not only interesting, it also reveals patterns and dependencies. However, the majority of data out there is unstructured: pure text. This is where PINGAR comes in. Using the PINGAR API, any developer can turn text into structure. From there, the next steps toward discovery are easy and the results are impressive. Play around with the Enron demo to see what you discover! Then get started with the PINGAR API and look deeper into your own data.