The delicate art of the data lake

Jen Miller, CIO Dive

In an ideal world, a data lake should keep data in a way that makes it readily available. That’s why the tools put into the lake are so important. Otherwise, it’s just a blob.

Companies have become obsessed with data, and for good reason: collecting the right data, and knowing how to analyze it, can unlock potential a company never knew it had.

One term executives will hear tossed around during almost any discussion about big data: “data lakes.”

“At a very high level, a data lake is simply a storage component where you can put structured or unstructured data in its raw format,” Shaun Bierweiler, vice president of U.S. public sector at data software company Hortonworks, told CIO Dive in an interview. “When you dig a little bit lower and get use case and applicability of [data], that’s where the magic really happens.”

Diving into the data deep end

How helpful a data lake is depends on what’s done with raw data.

“One person’s data lake is another person’s data swamp,” Colin Britton, chief strategy officer at Devo, told CIO Dive in an interview.

A lot of companies are storing data but often don’t know what to do with it, which is where things can get swampy.

“Most of these big companies have massive centralized IT assets like data warehouses and data lakes, but it’s very hard to access it because they’ve never been built with this specific business purpose at the time,” Prat Moghe, CEO of data platform company Cazena, told CIO Dive in an interview.

In an ideal world, a data lake should keep data in a way that makes it readily available, and should shorten tasks that would otherwise take weeks or even months to complete.

That’s why the tools put into the lake are so important. Otherwise, it’s just a blob.

Hortonworks gets a lot of calls from companies in that kind of situation, said Bierweiler. They have a lot of data, but “they either have the inability to meet the requirement of their mission, or they’re no longer getting results in a timely manner for the results they’re trying to accomplish.”

Being able to do these things quickly can prepare a company for a time when it needs information for a purpose it may not know about yet, said Britton. That’s especially true of security threats.

“We don’t know how they’re going to look, what the sequence of events will lead to that threat or breach,” he said. “We don’t know what that looks like in the future, so we collect the information and are able to look at it in a way that makes sense in a future state.”

Teaching an old company new tricks

Data lakes aren’t just for new and upcoming companies, either.

Carlson Wagonlit Travel (CWT) is over 100 years old and today a leading corporate travel management company, big enough that it serves enough travelers to fill almost 200 Boeing 747s every day.

CWT wanted to learn from traveler behavior and deliver personalized services, which would require bringing together and analyzing existing customer data, transaction data, traveler comments and external market data from more than 1,600 data sources.

https://www.ciodive.com/news/the-delicate-art-of-the-data-lake/539382/