Introducing
Data Lakes
Pravin Singh
Why?
• Once upon a time, there was a Data Warehouse
– Data pre-categorized at the point of entry
– Data well organized, but in silos
– Common, predetermined data model for “optimal” analysis
– Upfront DB modeling and ETL effort
– A single-source-of-truth, but at the cost of flexibility
– Complex system with low tolerance for human error; IT help required for even the smallest enhancements
– Not to mention the high costs
• Then came the Big Bang of Information!
• Data Lake to the Rescue
What?
Source: PwC
Benefits
• Breaks the silos
• Flexible Data Model (Schema on Read)
• Data Provenance
• No upfront modeling and data cleansing
• Low cost of ownership
• Focused on exploration, not on operations
• Can work as staging area for ETL
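"Schema on Read" is the key idea above: raw data lands in the lake untouched, and a structure is imposed only when someone reads it. A minimal sketch in plain Python (the field names and records here are hypothetical, purely for illustration):

```python
import io
import json

# Raw, heterogeneous JSON-lines records as they might sit in the lake.
# Note the inconsistent types and the stray extra field -- nothing was
# validated or modeled at write time.
raw_events = io.StringIO(
    '{"user": "alice", "amount": "12.50", "ts": "2015-01-01"}\n'
    '{"user": "bob", "amount": 7, "extra_field": true}\n'
)

def read_with_schema(lines, schema):
    """Apply a schema at read time: project each raw record onto the
    requested fields, coercing types and ignoring everything else."""
    for line in lines:
        record = json.loads(line)
        row = {}
        for field, cast in schema.items():
            value = record.get(field)
            row[field] = cast(value) if value is not None else None
        yield row

# The schema lives with the query, not with the storage -- another
# analyst could read the same raw lines with a different schema.
schema = {"user": str, "amount": float}
rows = list(read_with_schema(raw_events, schema))
```

Contrast this with a warehouse, where the cast-and-project step happens once, up front, for everyone.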
Pitfalls and Challenges
• Data Lake as Data Graveyard
• Metadata
• Governance
• Information Lifecycle Management (ILM)
• Security and Privacy
• Training
Lake Maturity
Source: PwC
Four Stages of Data Lake Adoption
1: Life Before Hadoop
– Applications stand alone with their databases
– Some applications contribute data to a data warehouse
– Analysts run reporting and analytics in the data warehouse
Four Stages of Data Lake Adoption
2: Hadoop is Introduced
– Applications contribute data to Hadoop
– Hadoop runs batch MapReduce jobs
– Hadoop used for ETL into warehouse or analytic databases
– Hadoop data reintroduced into applications
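The batch MapReduce jobs mentioned above follow a simple three-phase model: map emits key/value pairs, a shuffle groups them by key, and reduce aggregates each group. A pure-Python sketch of that model (illustrative only, not the Hadoop API; the word-count job is the standard example):

```python
from itertools import groupby
from operator import itemgetter

def map_phase(records):
    # Map: emit a (word, 1) pair for every word in every input line
    for line in records:
        for word in line.split():
            yield (word.lower(), 1)

def shuffle(pairs):
    # Shuffle: sort all pairs by key and group them, as Hadoop does
    # between the map and reduce phases
    return groupby(sorted(pairs, key=itemgetter(0)), key=itemgetter(0))

def reduce_phase(grouped):
    # Reduce: aggregate the values collected for each key
    for key, group in grouped:
        yield (key, sum(count for _, count in group))

lines = ["data lake", "data warehouse", "lake"]
counts = dict(reduce_phase(shuffle(map_phase(lines))))
# counts: {"data": 2, "lake": 2, "warehouse": 1}
```

Hadoop's value is running exactly this pattern in batch across many machines, which is why it slots in naturally as an ETL engine feeding the warehouse.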
Four Stages of Data Lake Adoption
3: Growing the Data Lake
– Newly built systems center around Hadoop by default
– Applications use each other’s data via Hadoop
– Hadoop becomes a default data destination; governance and metadata become important
– Data warehouse use becomes the exception, where legacy or special requirements dictate
Four Stages of Data Lake Adoption
4: Data Lake and Application Cloud
– New applications are built on a Hadoop application platform around the data lake
– Hadoop matures as an elastic distributed data computing platform
– Data lake adds security and governance layers
– Data availability increases, application deployment time decreases
– Some apps still have special or legacy needs and execute independently
Questions?
