Data Visualization Is Key To Analyzing Unstructured Data
Data visualization has evolved dramatically in recent years. Organizations now use increasingly sophisticated software to present the vast amounts of information they collect, not as basic bar graphs and charts, but as interactive displays for an audience to engage with, whether that audience is decision makers in a boardroom or children in a science museum.
One of the most important challenges facing those in the data visualization space is what they can do with unstructured data. Unstructured data is any data that does not fit into relational databases, including videos, PowerPoint presentations, company records, social media, RSS, documents, and text – essentially, the vast majority of communication. It is estimated that 80% of the world’s data is unstructured, and it is growing rapidly, with IDC estimating that unstructured data will grow from 9.3 zettabytes in 2015 to 44.1 zettabytes in 2020. Its importance to enterprises is growing just as rapidly. Ranko Cosic, Guest Lecturer and Researcher (Business Analytics) at The University of Melbourne, for one, notes: ‘The way I see the use of data changing in the next few years is that, although organizations will continue to gather and analyse structured data stored in data warehouses, traditional and relational databases, there will be more focus on the gathering and analysis of unstructured data in the form of audio, images, music, text, video and interactive content from traditional and social media websites.’
Unstructured data is so important because of the context it provides. While analysis of structured data may be able to tell you what is happening, it is primarily by analyzing the complex streams of unstructured data that you will find out why. Structured data holds important numbers about revenue performance and operational metrics, but the written text of unstructured data can show sentiment about a company’s products, reveal information about its staff, and provide its competitive edge.
Analysis of unstructured data is, however, a relatively new science, as its scale and complexity previously rendered it impenetrable to humans. Efficiently processing unstructured data is the work of many a startup, most of which now focus on unlocking it with machine learning algorithms rather than, as before, converting it into structured form. They are automating both analysis and visualization, so companies can get instant results from their unstructured datasets. Brainspace and DeepDive are just two of those to have made significant progress, and both have been the subject of large funding rounds. Dave Copps, CEO of Brainspace, told us that ‘Before, all we really did with unstructured data was search, get a load of documents together and hack at it with keywords. Technologies like Tableau and QlikView were always good for looking at structured data, but those that tried to use unstructured data were really just taking it out and putting it into structured data platforms. Once you pull words out of a document, you destroy their context. So, say you’re analyzing résumés. If you take the Java out of a software developer’s CV, you don’t know if that’s only in there because the person has said “I suck at Java.” What we’re doing is, rather than just analysing words, we’re looking at the whitespace between the words – the context.’
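The point about pulling words out of a document can be illustrated with a small sketch. This is not how Brainspace or any other vendor works; it is a toy comparison, with made-up résumé text and hypothetical helper names (keyword_hit, keyword_in_context), between a bare keyword check and a check that keeps the neighbouring words.

```python
import re

def keyword_hit(text, keyword):
    """Bag-of-words check: True if the keyword appears anywhere in the text."""
    tokens = re.findall(r"[a-z+#]+", text.lower())
    return keyword.lower() in tokens

def keyword_in_context(text, keyword, window=3):
    """Return each occurrence of the keyword with up to `window` words on either side."""
    tokens = re.findall(r"[A-Za-z+#']+", text)
    hits = []
    for i, tok in enumerate(tokens):
        if tok.lower() == keyword.lower():
            start = max(0, i - window)
            hits.append(" ".join(tokens[start:i + window + 1]))
    return hits

# Invented example résumés (illustrative only).
resumes = {
    "candidate_a": "Five years building backend services in Java and Kotlin.",
    "candidate_b": "Honestly, I suck at Java but I am strong in Python.",
}

for name, text in resumes.items():
    # The keyword check treats both candidates identically;
    # the surrounding words are what tell them apart.
    print(name, keyword_hit(text, "Java"), keyword_in_context(text, "Java"))
```

Both résumés pass the keyword check, so a structured table with a "mentions Java" column would rank them the same. Only the second function preserves the words around the match, which is the minimum a downstream model would need to tell a skill from a disclaimer.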
However, while we are making great strides with the analysis, we are still not using the information to its full potential. In a recent 451 Research survey commissioned by data-in-motion specialist Logtrust, 89% of IT managers who responded said they have made structured data initiatives a high priority for their organizations, yet just 43% said the same of unstructured data initiatives. Key to changing these attitudes is data visualization. Companies like Brainspace offer engaging, interactive, automated visualizations, but there is still a lot of unexplored potential. Walter Storm, Chief Data Scientist at Lockheed Martin, notes: ‘Technology has surely made unstructured data easier to analyze – the big question is: “how useful is the analysis?” There is a lot of art to topic modelling, graph analysis, and even dimensionality reduction and visualization. How many features? Which ones? How many layers in my deep net? How many nodes? What kernel width gives me good separation? What is the relationship among neighbors in 2nd or 3rd order derived feature space? What in the world did this algorithm just learn? What was my hypothesis again?’
The discovery of novelty is a wonderful thing, but it is pointless to a business if you can’t convince decision makers of its existence so that they can take appropriate action. Data visualization is the best way of doing this, revealing intricate structure in data that cannot be absorbed in any other way. Because of the way the human brain processes information, communicating data to people visually and enabling them to engage with it lets you provide a narrative for the patterns you have found, and even helps them discover insights for themselves. It also enables a greater range of people to make sense of the data more easily, which should help increase data democratization throughout the organization and lead to more insights.
Visualizing unstructured data presents a unique challenge compared with traditional numerical data, and the practice is still in its infancy. At the recent Data Visualization Summit in San Francisco, Ken Cherven, Data Visualization Specialist at General Motors, outlined an example process using all of the State of the Union addresses throughout history. His results show why visualizing unstructured data is so necessary to understanding it, and the exciting opportunities it offers to be creative in displaying information in ways previously thought impossible, and to learn from it.