SlideShare a Scribd company logo
Building a Data Lake
An App Dev’s Perspective
GeekNight Hyderabad - March 8th 2017
Geetha Balasundaram
geethab@thoughtworks.com
© 2017 ThoughtWorks Technologies Pvt. Limited
ABOUT ME
Developer @ ThoughtWorks
Building a data lake in the enterprise ecosystem
Helping a retail business make sense of it ( data guided org )
Been part of web development space ( enterprise rewrite )
Equally startled like everyone else by the data engineering space
Share know-how’s and do-how’s from our team’s experience
snithish@thoughtworks.com
© 2017 ThoughtWorks Technologies Pvt. Limited
AGENDA
What is data in the true sense…
Data Warehouse in an enterprise ecosystem...
What is a data lake...
Data lake implementation in an enterprise ecosystem…
How to make effective use of a data lake: technology+process+people
Cluster Administration tool - Cloudera Manager
Pitfalls to avoid
© 2017 ThoughtWorks Technologies Pvt. Limited
Question ???
How did R.Ashwin perform in the last
Test match?
HIGH LEVEL
PROBLEM STATEMENT
© 2017 ThoughtWorks Technologies Pvt. Limited
COMPLEX HISTORICAL DATA
Why?
Exploit and derive as much new insights as possible
Match Made
Enterprise systems produce this nature of complexity
© 2017 ThoughtWorks Technologies Pvt. Limited
DATA WAREHOUSE
https://martinfowler.com/articles/microservices.html
ETL
© 2017 ThoughtWorks Technologies Pvt. Limited
DID MICROSERVICES CAUSE THIS PROBLEM ?
Decentralised Data
https://martinfowler.com/articles/microservices.html
© 2017 ThoughtWorks Technologies Pvt. Limited
MICROSERVICES HELPED
Break down business unit
Break down complexity
Understand the nature of data
© 2017 ThoughtWorks Technologies Pvt. Limited
Question ???
R.Ashwin performed well ( 6/41 ) in yesterday’s match!
Complex historical data can quantify how well he has performed
Can we say why did he do well in this particular match?
What factors affected his enhanced performance?
© 2017 ThoughtWorks Technologies Pvt. Limited
FACT is a FACT
… even when we don’t know how it can be used
© 2017 ThoughtWorks Technologies Pvt. Limited
KEY DIFFERENCE
https://martinfowler.com/bliki/DataLake.html
© 2017 ThoughtWorks Technologies Pvt. Limited
What is a data lake?
© 2017 ThoughtWorks Technologies Pvt. Limited
LAKE is...
.. a large body of water in a more natural state.
The contents of the lake, stream in from a source to fill the lake,
and various users of the lake can come to examine, dive in, or
take samples
https://jamesdixon.wordpress.com/2010/10/14/pentaho-hadoop-and-data-lakes/
© 2017 ThoughtWorks Technologies Pvt. Limited
DATA LAKE is...
.. a large body of water data facts in a more natural state.
The contents of the lake, stream in from a source to fill the lake,
and various users of the lake can come to examine analyse, dive
in build models, or take samples use subset for specific use
cases
© 2017 ThoughtWorks Technologies Pvt. Limited
KEY DIFFERENCE
https://martinfowler.com/bliki/DataLake.html
© 2017 ThoughtWorks Technologies Pvt. Limited
Implementation
© 2017 ThoughtWorks Technologies Pvt. Limited
OUR IMPLEMENTATION - TECH STACK
DATA SOURCE
DATA INGESTION
DATA LAKE
DATA MARTS
DATA ANALYSIS
Staging / Queue
© 2017 ThoughtWorks Technologies Pvt. Limited
© 2017 ThoughtWorks Technologies Pvt. Limited
How to make effective use of a data lake:
technology+process+people
© 2017 ThoughtWorks Technologies Pvt. Limited
Functionality Vs Reality
I need a feature so that I can do this action…..
to
I need this insight so that I can take this action….
eg : I need a functionality to order items anytime before or during a promotion…
to
..I need to know on time, if I have to order items anytime before or during a promotion…
so that I can improve promotion sales
People
© 2017 ThoughtWorks Technologies Pvt. Limited
Start Simple
There is no data lake yet…
Carve out portions of data which are easy wins yet critical to
arrive at the earlier stated insight..
Set up the infrastructure and pipeline
Get your hands dirty..
eg: Sales is an important factor to analyse / predict anything in retail space..
Technology
© 2017 ThoughtWorks Technologies Pvt. Limited
How much should I know about the data ?
As a consumer of data (read ‘not a consumer of service’)
How much should I know about it?
Schema ⇔ Contracts
Nature of the data versioned vs latest
transactional vs reference
facts vs aggregate
frequency of change
…..
Technology
© 2017 ThoughtWorks Technologies Pvt. Limited
DATA INSIGHT - Part 1
Incrementally add
new data to the
lake
Serve data
for analysis
eg: What data wrt promotions do I need to bring into the datalake ??
Sales → improve promotion sales
Technology
© 2017 ThoughtWorks Technologies Pvt. Limited
DATA INSIGHT - Part 2
Sales + Promotions → improve promotion sales
How does adding more data to the lake help arriving at new insights..?
history of past promotions sales = how much to order for this promotion
history of past promotion sales + ‘X’ = how much to order for this promotion
history of past promotion sales + ‘X’ + ‘Y’ …… = how much to order for this promotion
eg: seasonality has a strong correlation with sales
history of past promotion sales + ‘X’ + ‘Y’ …… + ‘A’ = how much to order for this promotion after the start
People
© 2017 ThoughtWorks Technologies Pvt. Limited
Think Agile
Sales + Promotions + X factor → improve promotion sales
Near perfect list of
parameters
Progressive set of
parameters
Sales + Promotions → is the quantity arrived from these factors (known to business) ordered on time?
Process
© 2017 ThoughtWorks Technologies Pvt. Limited
DataMarts
... as a store of bottled water – cleansed and packaged and
structured for easy consumption
© 2017 ThoughtWorks Technologies Pvt. Limited
DataMarts
... as a store of data subset - curated from meaningful facts
bundled into logical groups for arriving at useful insights
© 2017 ThoughtWorks Technologies Pvt. Limited
Easy Insight
Sales + Promotions →
is the quantity arrived from these factors (known to business) ordered on time?
System : Tells me what is the quantity that is supposed to be ordered
for this promotion..
System : Tells me in realtime what is the quantity that is ordered
Technology
© 2017 ThoughtWorks Technologies Pvt. Limited
Cluster Administration Tool
Cloudera Manager
© 2017 ThoughtWorks Technologies Pvt. Limited
Think DevOps
Scale | Performance | Memory | Resource Contention |
Optimization | Stability |
Need for an ecosystem - to monitor how well the different tools
play together without chaos
Tools
© 2017 ThoughtWorks Technologies Pvt. Limited
QUICK RECAP
What is data in the true sense…
Data Warehouse in an enterprise ecosystem...
What is a data lake...
Data lake implementation in an enterprise ecosystem...
How to make effective use of a data lake…
Cluster Administration tool - Cloudera Manager
© 2017 ThoughtWorks Technologies Pvt. Limited
PITFALLS TO AVOID
Data envy - Ref:https://martinfowler.com/bliki/Datensparsamkeit.html
Tool envy
Reliable data is a luxury
Understanding the nature of data is a must
Dialogue with the data scientist
Treating the data lake like a RDBMS
Keeping the business involved
Data flow state visibility
© 2017 ThoughtWorks Technologies Pvt. Limited

More Related Content

What's hot

Datalake Architecture
Datalake ArchitectureDatalake Architecture
Hadoop Powers Modern Enterprise Data Architectures
Hadoop Powers Modern Enterprise Data ArchitecturesHadoop Powers Modern Enterprise Data Architectures
Hadoop Powers Modern Enterprise Data ArchitecturesDataWorks Summit
 
Microsoft Azure Big Data Analytics
Microsoft Azure Big Data AnalyticsMicrosoft Azure Big Data Analytics
Microsoft Azure Big Data Analytics
Mark Kromer
 
Big Data: Architecture and Performance Considerations in Logical Data Lakes
Big Data: Architecture and Performance Considerations in Logical Data LakesBig Data: Architecture and Performance Considerations in Logical Data Lakes
Big Data: Architecture and Performance Considerations in Logical Data Lakes
Denodo
 
Data lake benefits
Data lake benefitsData lake benefits
Data lake benefits
Ricky Barron
 
Modern data warehouse
Modern data warehouseModern data warehouse
Modern data warehouse
Stephen Alex
 
Alexandre Vasseur - Evolution of Data Architectures: From Hadoop to Data Lake...
Alexandre Vasseur - Evolution of Data Architectures: From Hadoop to Data Lake...Alexandre Vasseur - Evolution of Data Architectures: From Hadoop to Data Lake...
Alexandre Vasseur - Evolution of Data Architectures: From Hadoop to Data Lake...
NoSQLmatters
 
Incorporating the Data Lake into Your Analytic Architecture
Incorporating the Data Lake into Your Analytic ArchitectureIncorporating the Data Lake into Your Analytic Architecture
Incorporating the Data Lake into Your Analytic Architecture
Caserta
 
The Warranty Data Lake – After, Inc.
The Warranty Data Lake – After, Inc.The Warranty Data Lake – After, Inc.
The Warranty Data Lake – After, Inc.
Richard Vermillion
 
Big data architectures and the data lake
Big data architectures and the data lakeBig data architectures and the data lake
Big data architectures and the data lake
James Serra
 
The Data Lake and Getting Buisnesses the Big Data Insights They Need
The Data Lake and Getting Buisnesses the Big Data Insights They NeedThe Data Lake and Getting Buisnesses the Big Data Insights They Need
The Data Lake and Getting Buisnesses the Big Data Insights They Need
Dunn Solutions Group
 
How to use Big Data and Data Lake concept in business using Hadoop and Spark...
 How to use Big Data and Data Lake concept in business using Hadoop and Spark... How to use Big Data and Data Lake concept in business using Hadoop and Spark...
How to use Big Data and Data Lake concept in business using Hadoop and Spark...
Institute of Contemporary Sciences
 
Data Lake Architecture
Data Lake ArchitectureData Lake Architecture
Data Lake Architecture
DATAVERSITY
 
Enterprise Data Lake - Scalable Digital
Enterprise Data Lake - Scalable DigitalEnterprise Data Lake - Scalable Digital
Enterprise Data Lake - Scalable Digital
sambiswal
 
How One Company Offloaded Data Warehouse ETL To Hadoop and Saved $30 Million
How One Company Offloaded Data Warehouse ETL To Hadoop and Saved $30 MillionHow One Company Offloaded Data Warehouse ETL To Hadoop and Saved $30 Million
How One Company Offloaded Data Warehouse ETL To Hadoop and Saved $30 Million
DataWorks Summit
 
Hadoop data-lake-white-paper
Hadoop data-lake-white-paperHadoop data-lake-white-paper
Hadoop data-lake-white-paper
Supratim Ray
 
Hadoop Integration into Data Warehousing Architectures
Hadoop Integration into Data Warehousing ArchitecturesHadoop Integration into Data Warehousing Architectures
Hadoop Integration into Data Warehousing Architectures
Humza Naseer
 
Data Lakes: 8 Enterprise Data Management Requirements
Data Lakes: 8 Enterprise Data Management RequirementsData Lakes: 8 Enterprise Data Management Requirements
Data Lakes: 8 Enterprise Data Management Requirements
SnapLogic
 
Anatomy of a data driven architecture - Tamir Dresher
Anatomy of a data driven architecture - Tamir Dresher   Anatomy of a data driven architecture - Tamir Dresher
Anatomy of a data driven architecture - Tamir Dresher
Tamir Dresher
 
How to build a successful Data Lake
How to build a successful Data LakeHow to build a successful Data Lake
How to build a successful Data Lake
DataWorks Summit/Hadoop Summit
 

What's hot (20)

Datalake Architecture
Datalake ArchitectureDatalake Architecture
Datalake Architecture
 
Hadoop Powers Modern Enterprise Data Architectures
Hadoop Powers Modern Enterprise Data ArchitecturesHadoop Powers Modern Enterprise Data Architectures
Hadoop Powers Modern Enterprise Data Architectures
 
Microsoft Azure Big Data Analytics
Microsoft Azure Big Data AnalyticsMicrosoft Azure Big Data Analytics
Microsoft Azure Big Data Analytics
 
Big Data: Architecture and Performance Considerations in Logical Data Lakes
Big Data: Architecture and Performance Considerations in Logical Data LakesBig Data: Architecture and Performance Considerations in Logical Data Lakes
Big Data: Architecture and Performance Considerations in Logical Data Lakes
 
Data lake benefits
Data lake benefitsData lake benefits
Data lake benefits
 
Modern data warehouse
Modern data warehouseModern data warehouse
Modern data warehouse
 
Alexandre Vasseur - Evolution of Data Architectures: From Hadoop to Data Lake...
Alexandre Vasseur - Evolution of Data Architectures: From Hadoop to Data Lake...Alexandre Vasseur - Evolution of Data Architectures: From Hadoop to Data Lake...
Alexandre Vasseur - Evolution of Data Architectures: From Hadoop to Data Lake...
 
Incorporating the Data Lake into Your Analytic Architecture
Incorporating the Data Lake into Your Analytic ArchitectureIncorporating the Data Lake into Your Analytic Architecture
Incorporating the Data Lake into Your Analytic Architecture
 
The Warranty Data Lake – After, Inc.
The Warranty Data Lake – After, Inc.The Warranty Data Lake – After, Inc.
The Warranty Data Lake – After, Inc.
 
Big data architectures and the data lake
Big data architectures and the data lakeBig data architectures and the data lake
Big data architectures and the data lake
 
The Data Lake and Getting Buisnesses the Big Data Insights They Need
The Data Lake and Getting Buisnesses the Big Data Insights They NeedThe Data Lake and Getting Buisnesses the Big Data Insights They Need
The Data Lake and Getting Buisnesses the Big Data Insights They Need
 
How to use Big Data and Data Lake concept in business using Hadoop and Spark...
 How to use Big Data and Data Lake concept in business using Hadoop and Spark... How to use Big Data and Data Lake concept in business using Hadoop and Spark...
How to use Big Data and Data Lake concept in business using Hadoop and Spark...
 
Data Lake Architecture
Data Lake ArchitectureData Lake Architecture
Data Lake Architecture
 
Enterprise Data Lake - Scalable Digital
Enterprise Data Lake - Scalable DigitalEnterprise Data Lake - Scalable Digital
Enterprise Data Lake - Scalable Digital
 
How One Company Offloaded Data Warehouse ETL To Hadoop and Saved $30 Million
How One Company Offloaded Data Warehouse ETL To Hadoop and Saved $30 MillionHow One Company Offloaded Data Warehouse ETL To Hadoop and Saved $30 Million
How One Company Offloaded Data Warehouse ETL To Hadoop and Saved $30 Million
 
Hadoop data-lake-white-paper
Hadoop data-lake-white-paperHadoop data-lake-white-paper
Hadoop data-lake-white-paper
 
Hadoop Integration into Data Warehousing Architectures
Hadoop Integration into Data Warehousing ArchitecturesHadoop Integration into Data Warehousing Architectures
Hadoop Integration into Data Warehousing Architectures
 
Data Lakes: 8 Enterprise Data Management Requirements
Data Lakes: 8 Enterprise Data Management RequirementsData Lakes: 8 Enterprise Data Management Requirements
Data Lakes: 8 Enterprise Data Management Requirements
 
Anatomy of a data driven architecture - Tamir Dresher
Anatomy of a data driven architecture - Tamir Dresher   Anatomy of a data driven architecture - Tamir Dresher
Anatomy of a data driven architecture - Tamir Dresher
 
How to build a successful Data Lake
How to build a successful Data LakeHow to build a successful Data Lake
How to build a successful Data Lake
 

Viewers also liked

GeekNight 22.0 Multi-paradigm programming in Scala and Akka
GeekNight 22.0 Multi-paradigm programming in Scala and AkkaGeekNight 22.0 Multi-paradigm programming in Scala and Akka
GeekNight 22.0 Multi-paradigm programming in Scala and Akka
GeekNightHyderabad
 
Modern Data Architecture for a Data Lake with Informatica and Hortonworks Dat...
Modern Data Architecture for a Data Lake with Informatica and Hortonworks Dat...Modern Data Architecture for a Data Lake with Informatica and Hortonworks Dat...
Modern Data Architecture for a Data Lake with Informatica and Hortonworks Dat...
Hortonworks
 
Building the Data Lake with Azure Data Factory and Data Lake Analytics
Building the Data Lake with Azure Data Factory and Data Lake AnalyticsBuilding the Data Lake with Azure Data Factory and Data Lake Analytics
Building the Data Lake with Azure Data Factory and Data Lake Analytics
Khalid Salama
 
Understanding the Intelligent Cloud
Understanding the Intelligent CloudUnderstanding the Intelligent Cloud
Understanding the Intelligent Cloud
GeekNightHyderabad
 
Planning and Optimizing Data Lake Architecture - Milos Milovanovic
 Planning and Optimizing Data Lake Architecture - Milos Milovanovic Planning and Optimizing Data Lake Architecture - Milos Milovanovic
Planning and Optimizing Data Lake Architecture - Milos Milovanovic
Institute of Contemporary Sciences
 
The Emerging Data Lake IT Strategy
The Emerging Data Lake IT StrategyThe Emerging Data Lake IT Strategy
The Emerging Data Lake IT Strategy
Thomas Kelly, PMP
 
Geek Night 16.0 - Evolution of Programming Languages
Geek Night 16.0 - Evolution of Programming LanguagesGeek Night 16.0 - Evolution of Programming Languages
Geek Night 16.0 - Evolution of Programming Languages
GeekNightHyderabad
 
Developing Applications with Hadoop 2.0 and YARN by Abhijit Lele
Developing Applications with Hadoop 2.0 and YARN by Abhijit Lele Developing Applications with Hadoop 2.0 and YARN by Abhijit Lele
Developing Applications with Hadoop 2.0 and YARN by Abhijit Lele
Hakka Labs
 
AddReality company overview
AddReality company overviewAddReality company overview
AddReality company overview
AddReality
 
Automate Hadoop Cluster Deployment in a Banking Ecosystem
Automate Hadoop Cluster Deployment in a Banking EcosystemAutomate Hadoop Cluster Deployment in a Banking Ecosystem
Automate Hadoop Cluster Deployment in a Banking Ecosystem
Hellmar Becker
 
Hadoop con 2015 hadoop enables enterprise data lake
Hadoop con 2015   hadoop enables enterprise data lakeHadoop con 2015   hadoop enables enterprise data lake
Hadoop con 2015 hadoop enables enterprise data lake
James Chen
 
Big Data from Small Places
Big Data from Small PlacesBig Data from Small Places
Big Data from Small Places
Initial State
 
Building the Enterprise Data Lake - Important Considerations Before You Jump In
Building the Enterprise Data Lake - Important Considerations Before You Jump InBuilding the Enterprise Data Lake - Important Considerations Before You Jump In
Building the Enterprise Data Lake - Important Considerations Before You Jump In
SnapLogic
 
Best Practices For Building and Operating A Managed Data Lake - StampedeCon 2016
Best Practices For Building and Operating A Managed Data Lake - StampedeCon 2016Best Practices For Building and Operating A Managed Data Lake - StampedeCon 2016
Best Practices For Building and Operating A Managed Data Lake - StampedeCon 2016
StampedeCon
 
Talend winter 2017 overview webinar
Talend winter 2017 overview webinarTalend winter 2017 overview webinar
Talend winter 2017 overview webinar
Jean-Michel Franco
 
Présentation de Talend Winter 2017
Présentation de Talend Winter 2017 Présentation de Talend Winter 2017
Présentation de Talend Winter 2017
Jean-Michel Franco
 
Citizens Bank: Data Lake Implementation – Selecting BigInsights ViON Spark/Ha...
Citizens Bank: Data Lake Implementation – Selecting BigInsights ViON Spark/Ha...Citizens Bank: Data Lake Implementation – Selecting BigInsights ViON Spark/Ha...
Citizens Bank: Data Lake Implementation – Selecting BigInsights ViON Spark/Ha...
Seeling Cheung
 
Blockchain in Banking: A Measured Approach
Blockchain in Banking: A Measured ApproachBlockchain in Banking: A Measured Approach
Blockchain in Banking: A Measured Approach
Cognizant
 
Hive, Presto, and Spark on TPC-DS benchmark
Hive, Presto, and Spark on TPC-DS benchmarkHive, Presto, and Spark on TPC-DS benchmark
Hive, Presto, and Spark on TPC-DS benchmark
Dongwon Kim
 
Keeping Your Cloud Infrastructure Healthy with the Internet of Things
Keeping Your Cloud Infrastructure Healthy with the Internet of ThingsKeeping Your Cloud Infrastructure Healthy with the Internet of Things
Keeping Your Cloud Infrastructure Healthy with the Internet of Things
Jennifer Stern
 

Viewers also liked (20)

GeekNight 22.0 Multi-paradigm programming in Scala and Akka
GeekNight 22.0 Multi-paradigm programming in Scala and AkkaGeekNight 22.0 Multi-paradigm programming in Scala and Akka
GeekNight 22.0 Multi-paradigm programming in Scala and Akka
 
Modern Data Architecture for a Data Lake with Informatica and Hortonworks Dat...
Modern Data Architecture for a Data Lake with Informatica and Hortonworks Dat...Modern Data Architecture for a Data Lake with Informatica and Hortonworks Dat...
Modern Data Architecture for a Data Lake with Informatica and Hortonworks Dat...
 
Building the Data Lake with Azure Data Factory and Data Lake Analytics
Building the Data Lake with Azure Data Factory and Data Lake AnalyticsBuilding the Data Lake with Azure Data Factory and Data Lake Analytics
Building the Data Lake with Azure Data Factory and Data Lake Analytics
 
Understanding the Intelligent Cloud
Understanding the Intelligent CloudUnderstanding the Intelligent Cloud
Understanding the Intelligent Cloud
 
Planning and Optimizing Data Lake Architecture - Milos Milovanovic
 Planning and Optimizing Data Lake Architecture - Milos Milovanovic Planning and Optimizing Data Lake Architecture - Milos Milovanovic
Planning and Optimizing Data Lake Architecture - Milos Milovanovic
 
The Emerging Data Lake IT Strategy
The Emerging Data Lake IT StrategyThe Emerging Data Lake IT Strategy
The Emerging Data Lake IT Strategy
 
Geek Night 16.0 - Evolution of Programming Languages
Geek Night 16.0 - Evolution of Programming LanguagesGeek Night 16.0 - Evolution of Programming Languages
Geek Night 16.0 - Evolution of Programming Languages
 
Developing Applications with Hadoop 2.0 and YARN by Abhijit Lele
Developing Applications with Hadoop 2.0 and YARN by Abhijit Lele Developing Applications with Hadoop 2.0 and YARN by Abhijit Lele
Developing Applications with Hadoop 2.0 and YARN by Abhijit Lele
 
AddReality company overview
AddReality company overviewAddReality company overview
AddReality company overview
 
Automate Hadoop Cluster Deployment in a Banking Ecosystem
Automate Hadoop Cluster Deployment in a Banking EcosystemAutomate Hadoop Cluster Deployment in a Banking Ecosystem
Automate Hadoop Cluster Deployment in a Banking Ecosystem
 
Hadoop con 2015 hadoop enables enterprise data lake
Hadoop con 2015   hadoop enables enterprise data lakeHadoop con 2015   hadoop enables enterprise data lake
Hadoop con 2015 hadoop enables enterprise data lake
 
Big Data from Small Places
Big Data from Small PlacesBig Data from Small Places
Big Data from Small Places
 
Building the Enterprise Data Lake - Important Considerations Before You Jump In
Building the Enterprise Data Lake - Important Considerations Before You Jump InBuilding the Enterprise Data Lake - Important Considerations Before You Jump In
Building the Enterprise Data Lake - Important Considerations Before You Jump In
 
Best Practices For Building and Operating A Managed Data Lake - StampedeCon 2016
Best Practices For Building and Operating A Managed Data Lake - StampedeCon 2016Best Practices For Building and Operating A Managed Data Lake - StampedeCon 2016
Best Practices For Building and Operating A Managed Data Lake - StampedeCon 2016
 
Talend winter 2017 overview webinar
Talend winter 2017 overview webinarTalend winter 2017 overview webinar
Talend winter 2017 overview webinar
 
Présentation de Talend Winter 2017
Présentation de Talend Winter 2017 Présentation de Talend Winter 2017
Présentation de Talend Winter 2017
 
Citizens Bank: Data Lake Implementation – Selecting BigInsights ViON Spark/Ha...
Citizens Bank: Data Lake Implementation – Selecting BigInsights ViON Spark/Ha...Citizens Bank: Data Lake Implementation – Selecting BigInsights ViON Spark/Ha...
Citizens Bank: Data Lake Implementation – Selecting BigInsights ViON Spark/Ha...
 
Blockchain in Banking: A Measured Approach
Blockchain in Banking: A Measured ApproachBlockchain in Banking: A Measured Approach
Blockchain in Banking: A Measured Approach
 
Hive, Presto, and Spark on TPC-DS benchmark
Hive, Presto, and Spark on TPC-DS benchmarkHive, Presto, and Spark on TPC-DS benchmark
Hive, Presto, and Spark on TPC-DS benchmark
 
Keeping Your Cloud Infrastructure Healthy with the Internet of Things
Keeping Your Cloud Infrastructure Healthy with the Internet of ThingsKeeping Your Cloud Infrastructure Healthy with the Internet of Things
Keeping Your Cloud Infrastructure Healthy with the Internet of Things
 

Similar to Building a Data Lake - An App Dev's Perspective

David Noy – Realising the true potential of software-defined storage
David Noy – Realising the true potential of software-defined storageDavid Noy – Realising the true potential of software-defined storage
David Noy – Realising the true potential of software-defined storage
Veritas Technologies LLC
 
Architecting an Open Data Lake for the Enterprise
Architecting an Open Data Lake for the EnterpriseArchitecting an Open Data Lake for the Enterprise
Architecting an Open Data Lake for the Enterprise
Amazon Web Services
 
Peter Grimmond – Harnessing the power of data
Peter Grimmond – Harnessing the power of dataPeter Grimmond – Harnessing the power of data
Peter Grimmond – Harnessing the power of data
Veritas Technologies LLC
 
Taking DevOps Monitoring to the Next Level - The 5 Step Guide to Monitoring N...
Taking DevOps Monitoring to the Next Level - The 5 Step Guide to Monitoring N...Taking DevOps Monitoring to the Next Level - The 5 Step Guide to Monitoring N...
Taking DevOps Monitoring to the Next Level - The 5 Step Guide to Monitoring N...
Deborah Schalm
 
The new dominant companies are running on data
The new dominant companies are running on data The new dominant companies are running on data
The new dominant companies are running on data
SnapLogic
 
Big Data LDN 2017: The New Dominant Companies Are Running on Data
Big Data LDN 2017: The New Dominant Companies Are Running on DataBig Data LDN 2017: The New Dominant Companies Are Running on Data
Big Data LDN 2017: The New Dominant Companies Are Running on Data
Matt Stubbs
 
Big Data LDN 2017: The New Dominant Companies Are Running on Data
Big Data LDN 2017: The New Dominant Companies Are Running on DataBig Data LDN 2017: The New Dominant Companies Are Running on Data
Big Data LDN 2017: The New Dominant Companies Are Running on Data
Matt Stubbs
 
Modernizing IT in the Platform Era
Modernizing IT in the Platform EraModernizing IT in the Platform Era
Modernizing IT in the Platform Era
Apcera
 
TiVo: How to Scale New Products with a Data Lake on AWS and Qubole
 TiVo: How to Scale New Products with a Data Lake on AWS and Qubole TiVo: How to Scale New Products with a Data Lake on AWS and Qubole
TiVo: How to Scale New Products with a Data Lake on AWS and Qubole
Amazon Web Services
 
TiVo: How to Scale New Products with a Data Lake on AWS and Qubole
 TiVo: How to Scale New Products with a Data Lake on AWS and Qubole TiVo: How to Scale New Products with a Data Lake on AWS and Qubole
TiVo: How to Scale New Products with a Data Lake on AWS and Qubole
Amazon Web Services
 
Denodo DataFest 2017: Company Leadership from Data Leadership
Denodo DataFest 2017: Company Leadership from Data LeadershipDenodo DataFest 2017: Company Leadership from Data Leadership
Denodo DataFest 2017: Company Leadership from Data Leadership
Denodo
 
5 Steps to Achieving the Single Pane of Glass Across DevOps -- APM, NPM, Metr...
5 Steps to Achieving the Single Pane of Glass Across DevOps -- APM, NPM, Metr...5 Steps to Achieving the Single Pane of Glass Across DevOps -- APM, NPM, Metr...
5 Steps to Achieving the Single Pane of Glass Across DevOps -- APM, NPM, Metr...
DevOps.com
 
Big Data LDN 2017: The Logical Data Warehouse – A Modern Analytical Architect...
Big Data LDN 2017: The Logical Data Warehouse – A Modern Analytical Architect...Big Data LDN 2017: The Logical Data Warehouse – A Modern Analytical Architect...
Big Data LDN 2017: The Logical Data Warehouse – A Modern Analytical Architect...
Matt Stubbs
 
Enhancing BI with Predictive Analytics with Case Study
Enhancing BI with Predictive Analytics with Case StudyEnhancing BI with Predictive Analytics with Case Study
Enhancing BI with Predictive Analytics with Case Study
Senturus
 
BAR360 open data platform presentation at DAMA, Sydney
BAR360 open data platform presentation at DAMA, SydneyBAR360 open data platform presentation at DAMA, Sydney
BAR360 open data platform presentation at DAMA, Sydney
Sai Paravastu
 
Keys toSuccess: Business Intelligence Proven, Practical Strategies That Work
Keys toSuccess: Business Intelligence Proven, Practical Strategies That WorkKeys toSuccess: Business Intelligence Proven, Practical Strategies That Work
Keys toSuccess: Business Intelligence Proven, Practical Strategies That Work
Senturus
 
Data Preparation vs. Inline Data Wrangling in Data Science and Machine Learning
Data Preparation vs. Inline Data Wrangling in Data Science and Machine LearningData Preparation vs. Inline Data Wrangling in Data Science and Machine Learning
Data Preparation vs. Inline Data Wrangling in Data Science and Machine Learning
Kai Wähner
 
Apache spark empowering the real time data driven enterprise - StreamAnalytix...
Apache spark empowering the real time data driven enterprise - StreamAnalytix...Apache spark empowering the real time data driven enterprise - StreamAnalytix...
Apache spark empowering the real time data driven enterprise - StreamAnalytix...
Impetus Technologies
 
Fine Tune Your Archive: Best Practices for Optimizing Enterprise Vault
Fine Tune Your Archive: Best Practices for Optimizing Enterprise Vault Fine Tune Your Archive: Best Practices for Optimizing Enterprise Vault
Fine Tune Your Archive: Best Practices for Optimizing Enterprise Vault
Veritas Technologies LLC
 
Mike Palmer of Veritas: Debunking the myths of multi-cloud to achieve 360 Dat...
Mike Palmer of Veritas: Debunking the myths of multi-cloud to achieve 360 Dat...Mike Palmer of Veritas: Debunking the myths of multi-cloud to achieve 360 Dat...
Mike Palmer of Veritas: Debunking the myths of multi-cloud to achieve 360 Dat...
Veritas Technologies LLC
 

Similar to Building a Data Lake - An App Dev's Perspective (20)

David Noy – Realising the true potential of software-defined storage
David Noy – Realising the true potential of software-defined storageDavid Noy – Realising the true potential of software-defined storage
David Noy – Realising the true potential of software-defined storage
 
Architecting an Open Data Lake for the Enterprise
Architecting an Open Data Lake for the EnterpriseArchitecting an Open Data Lake for the Enterprise
Architecting an Open Data Lake for the Enterprise
 
Peter Grimmond – Harnessing the power of data
Peter Grimmond – Harnessing the power of dataPeter Grimmond – Harnessing the power of data
Peter Grimmond – Harnessing the power of data
 
Taking DevOps Monitoring to the Next Level - The 5 Step Guide to Monitoring N...
Taking DevOps Monitoring to the Next Level - The 5 Step Guide to Monitoring N...Taking DevOps Monitoring to the Next Level - The 5 Step Guide to Monitoring N...
Taking DevOps Monitoring to the Next Level - The 5 Step Guide to Monitoring N...
 
The new dominant companies are running on data
The new dominant companies are running on data The new dominant companies are running on data
The new dominant companies are running on data
 
Big Data LDN 2017: The New Dominant Companies Are Running on Data
Big Data LDN 2017: The New Dominant Companies Are Running on DataBig Data LDN 2017: The New Dominant Companies Are Running on Data
Big Data LDN 2017: The New Dominant Companies Are Running on Data
 
Big Data LDN 2017: The New Dominant Companies Are Running on Data
Big Data LDN 2017: The New Dominant Companies Are Running on DataBig Data LDN 2017: The New Dominant Companies Are Running on Data
Big Data LDN 2017: The New Dominant Companies Are Running on Data
 
Modernizing IT in the Platform Era
Modernizing IT in the Platform EraModernizing IT in the Platform Era
Modernizing IT in the Platform Era
 
TiVo: How to Scale New Products with a Data Lake on AWS and Qubole
 TiVo: How to Scale New Products with a Data Lake on AWS and Qubole TiVo: How to Scale New Products with a Data Lake on AWS and Qubole
TiVo: How to Scale New Products with a Data Lake on AWS and Qubole
 
TiVo: How to Scale New Products with a Data Lake on AWS and Qubole
 TiVo: How to Scale New Products with a Data Lake on AWS and Qubole TiVo: How to Scale New Products with a Data Lake on AWS and Qubole
TiVo: How to Scale New Products with a Data Lake on AWS and Qubole
 
Denodo DataFest 2017: Company Leadership from Data Leadership
Denodo DataFest 2017: Company Leadership from Data LeadershipDenodo DataFest 2017: Company Leadership from Data Leadership
Denodo DataFest 2017: Company Leadership from Data Leadership
 
5 Steps to Achieving the Single Pane of Glass Across DevOps -- APM, NPM, Metr...
5 Steps to Achieving the Single Pane of Glass Across DevOps -- APM, NPM, Metr...5 Steps to Achieving the Single Pane of Glass Across DevOps -- APM, NPM, Metr...
5 Steps to Achieving the Single Pane of Glass Across DevOps -- APM, NPM, Metr...
 
Big Data LDN 2017: The Logical Data Warehouse – A Modern Analytical Architect...
Big Data LDN 2017: The Logical Data Warehouse – A Modern Analytical Architect...Big Data LDN 2017: The Logical Data Warehouse – A Modern Analytical Architect...
Big Data LDN 2017: The Logical Data Warehouse – A Modern Analytical Architect...
 
Enhancing BI with Predictive Analytics with Case Study
Enhancing BI with Predictive Analytics with Case StudyEnhancing BI with Predictive Analytics with Case Study
Enhancing BI with Predictive Analytics with Case Study
 
BAR360 open data platform presentation at DAMA, Sydney
BAR360 open data platform presentation at DAMA, SydneyBAR360 open data platform presentation at DAMA, Sydney
BAR360 open data platform presentation at DAMA, Sydney
 
Keys toSuccess: Business Intelligence Proven, Practical Strategies That Work
Keys toSuccess: Business Intelligence Proven, Practical Strategies That WorkKeys toSuccess: Business Intelligence Proven, Practical Strategies That Work
Keys toSuccess: Business Intelligence Proven, Practical Strategies That Work
 
Data Preparation vs. Inline Data Wrangling in Data Science and Machine Learning
Data Preparation vs. Inline Data Wrangling in Data Science and Machine LearningData Preparation vs. Inline Data Wrangling in Data Science and Machine Learning
Data Preparation vs. Inline Data Wrangling in Data Science and Machine Learning
 
Apache spark empowering the real time data driven enterprise - StreamAnalytix...
Apache spark empowering the real time data driven enterprise - StreamAnalytix...Apache spark empowering the real time data driven enterprise - StreamAnalytix...
Apache spark empowering the real time data driven enterprise - StreamAnalytix...
 
Fine Tune Your Archive: Best Practices for Optimizing Enterprise Vault
Fine Tune Your Archive: Best Practices for Optimizing Enterprise Vault Fine Tune Your Archive: Best Practices for Optimizing Enterprise Vault
Fine Tune Your Archive: Best Practices for Optimizing Enterprise Vault
 
Mike Palmer of Veritas: Debunking the myths of multi-cloud to achieve 360 Dat...
Mike Palmer of Veritas: Debunking the myths of multi-cloud to achieve 360 Dat...Mike Palmer of Veritas: Debunking the myths of multi-cloud to achieve 360 Dat...
Mike Palmer of Veritas: Debunking the myths of multi-cloud to achieve 360 Dat...
 

More from GeekNightHyderabad

Testing strategies in microservices
Testing strategies in microservicesTesting strategies in microservices
Testing strategies in microservices
GeekNightHyderabad
 
Metaprogramming ruby
Metaprogramming rubyMetaprogramming ruby
Metaprogramming ruby
GeekNightHyderabad
 
Scaling enterprise digital platforms with kubernetes
Scaling enterprise digital platforms with kubernetesScaling enterprise digital platforms with kubernetes
Scaling enterprise digital platforms with kubernetes
GeekNightHyderabad
 
FreedomBox & Community Wi-Fi networks
FreedomBox & Community Wi-Fi networksFreedomBox & Community Wi-Fi networks
FreedomBox & Community Wi-Fi networks
GeekNightHyderabad
 
Rendezvous with aucovei (autonomous connected car)
Rendezvous with aucovei (autonomous connected car)Rendezvous with aucovei (autonomous connected car)
Rendezvous with aucovei (autonomous connected car)
GeekNightHyderabad
 
Role of AI & ML in beauty care industry
Role of AI & ML in beauty care industryRole of AI & ML in beauty care industry
Role of AI & ML in beauty care industry
GeekNightHyderabad
 
Breaking down a monolith
Breaking down a monolithBreaking down a monolith
Breaking down a monolith
GeekNightHyderabad
 
Design lean agile_thinking presentation
Design lean agile_thinking presentationDesign lean agile_thinking presentation
Design lean agile_thinking presentation
GeekNightHyderabad
 
Scaling pipelines
Scaling pipelinesScaling pipelines
Scaling pipelines
GeekNightHyderabad
 
Blockchain beyond bitcoin
Blockchain beyond bitcoinBlockchain beyond bitcoin
Blockchain beyond bitcoin
GeekNightHyderabad
 
Http/2
Http/2Http/2
Hardware hacking and internet of things
Hardware hacking and internet of thingsHardware hacking and internet of things
Hardware hacking and internet of things
GeekNightHyderabad
 
Spring to Cloud - REST To Microservices
Spring to Cloud - REST To MicroservicesSpring to Cloud - REST To Microservices
Spring to Cloud - REST To Microservices
GeekNightHyderabad
 
Serverless
ServerlessServerless
Serverless
GeekNightHyderabad
 
Building Cloud Native Applications Using Spring Boot and Spring Cloud
Building Cloud Native Applications Using Spring Boot and Spring CloudBuilding Cloud Native Applications Using Spring Boot and Spring Cloud
Building Cloud Native Applications Using Spring Boot and Spring Cloud
GeekNightHyderabad
 
Progressive Web Applications - The Next Gen Web Technologies
Progressive Web Applications - The Next Gen Web TechnologiesProgressive Web Applications - The Next Gen Web Technologies
Progressive Web Applications - The Next Gen Web Technologies
GeekNightHyderabad
 
Scaling a Game Server: From 500 to 100,000 Users
Scaling a Game Server: From 500 to 100,000 UsersScaling a Game Server: From 500 to 100,000 Users
Scaling a Game Server: From 500 to 100,000 Users
GeekNightHyderabad
 
Big Data - Need of Converged Data Platform
Big Data - Need of Converged Data PlatformBig Data - Need of Converged Data Platform
Big Data - Need of Converged Data Platform
GeekNightHyderabad
 
Geek Night 17.0 - Artificial Intelligence and Machine Learning
Geek Night 17.0 - Artificial Intelligence and Machine LearningGeek Night 17.0 - Artificial Intelligence and Machine Learning
Geek Night 17.0 - Artificial Intelligence and Machine Learning
GeekNightHyderabad
 
Geek Night 15.0 - Touring the Dark-Side of the Internet
Geek Night 15.0 - Touring the Dark-Side of the InternetGeek Night 15.0 - Touring the Dark-Side of the Internet
Geek Night 15.0 - Touring the Dark-Side of the Internet
GeekNightHyderabad
 

More from GeekNightHyderabad (20)

Testing strategies in microservices
Testing strategies in microservicesTesting strategies in microservices
Testing strategies in microservices
 
Metaprogramming ruby
Metaprogramming rubyMetaprogramming ruby
Metaprogramming ruby
 
Scaling enterprise digital platforms with kubernetes
Scaling enterprise digital platforms with kubernetesScaling enterprise digital platforms with kubernetes
Scaling enterprise digital platforms with kubernetes
 
FreedomBox & Community Wi-Fi networks
FreedomBox & Community Wi-Fi networksFreedomBox & Community Wi-Fi networks
FreedomBox & Community Wi-Fi networks
 
Rendezvous with aucovei (autonomous connected car)
Rendezvous with aucovei (autonomous connected car)Rendezvous with aucovei (autonomous connected car)
Rendezvous with aucovei (autonomous connected car)
 
Role of AI & ML in beauty care industry
Role of AI & ML in beauty care industryRole of AI & ML in beauty care industry
Role of AI & ML in beauty care industry
 
Breaking down a monolith
Breaking down a monolithBreaking down a monolith
Breaking down a monolith
 
Design lean agile_thinking presentation
Design lean agile_thinking presentationDesign lean agile_thinking presentation
Design lean agile_thinking presentation
 
Scaling pipelines
Scaling pipelinesScaling pipelines
Scaling pipelines
 
Blockchain beyond bitcoin
Blockchain beyond bitcoinBlockchain beyond bitcoin
Blockchain beyond bitcoin
 
Http/2
Http/2Http/2
Http/2
 
Hardware hacking and internet of things
Hardware hacking and internet of thingsHardware hacking and internet of things
Hardware hacking and internet of things
 
Spring to Cloud - REST To Microservices
Spring to Cloud - REST To MicroservicesSpring to Cloud - REST To Microservices
Spring to Cloud - REST To Microservices
 
Serverless
ServerlessServerless
Serverless
 
Building Cloud Native Applications Using Spring Boot and Spring Cloud
Building Cloud Native Applications Using Spring Boot and Spring CloudBuilding Cloud Native Applications Using Spring Boot and Spring Cloud
Building Cloud Native Applications Using Spring Boot and Spring Cloud
 
Progressive Web Applications - The Next Gen Web Technologies
Progressive Web Applications - The Next Gen Web TechnologiesProgressive Web Applications - The Next Gen Web Technologies
Progressive Web Applications - The Next Gen Web Technologies
 
Scaling a Game Server: From 500 to 100,000 Users
Scaling a Game Server: From 500 to 100,000 UsersScaling a Game Server: From 500 to 100,000 Users
Scaling a Game Server: From 500 to 100,000 Users
 
Big Data - Need of Converged Data Platform
Big Data - Need of Converged Data PlatformBig Data - Need of Converged Data Platform
Big Data - Need of Converged Data Platform
 
Geek Night 17.0 - Artificial Intelligence and Machine Learning
Geek Night 17.0 - Artificial Intelligence and Machine LearningGeek Night 17.0 - Artificial Intelligence and Machine Learning
Geek Night 17.0 - Artificial Intelligence and Machine Learning
 
Geek Night 15.0 - Touring the Dark-Side of the Internet
Geek Night 15.0 - Touring the Dark-Side of the InternetGeek Night 15.0 - Touring the Dark-Side of the Internet
Geek Night 15.0 - Touring the Dark-Side of the Internet
 

Recently uploaded

Search and Society: Reimagining Information Access for Radical Futures
Search and Society: Reimagining Information Access for Radical FuturesSearch and Society: Reimagining Information Access for Radical Futures
Search and Society: Reimagining Information Access for Radical Futures
Bhaskar Mitra
 
Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........
Alison B. Lowndes
 
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdfSmart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
91mobiles
 
The Future of Platform Engineering
The Future of Platform EngineeringThe Future of Platform Engineering
The Future of Platform Engineering
Jemma Hussein Allen
 
DevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA ConnectDevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA Connect
Kari Kakkonen
 
UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3
DianaGray10
 
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
Product School
 
Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...
Product School
 
IOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptx
IOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptxIOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptx
IOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptx
Abida Shariff
 
Assuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyesAssuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyes
ThousandEyes
 
Leading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdfLeading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdf
OnBoard
 
Epistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI supportEpistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI support
Alan Dix
 
When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...
Elena Simperl
 
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Thierry Lestable
 
How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...
Product School
 
Accelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish CachingAccelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish Caching
Thijs Feryn
 
Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*
Frank van Harmelen
 
ODC, Data Fabric and Architecture User Group
ODC, Data Fabric and Architecture User GroupODC, Data Fabric and Architecture User Group
ODC, Data Fabric and Architecture User Group
CatarinaPereira64715
 
GraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge GraphGraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge Graph
Guy Korland
 
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
UiPathCommunity
 

Recently uploaded (20)

Search and Society: Reimagining Information Access for Radical Futures
Search and Society: Reimagining Information Access for Radical FuturesSearch and Society: Reimagining Information Access for Radical Futures
Search and Society: Reimagining Information Access for Radical Futures
 
Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........
 
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdfSmart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
 
The Future of Platform Engineering
The Future of Platform EngineeringThe Future of Platform Engineering
The Future of Platform Engineering
 
DevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA ConnectDevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA Connect
 
UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3
 
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
 
Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...
 
IOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptx
IOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptxIOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptx
IOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptx
 
Assuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyesAssuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyes
 
Leading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdfLeading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdf
 
Epistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI supportEpistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI support
 
When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...
 
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
 
How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...
 
Accelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish CachingAccelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish Caching
 
Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*
 
ODC, Data Fabric and Architecture User Group
ODC, Data Fabric and Architecture User GroupODC, Data Fabric and Architecture User Group
ODC, Data Fabric and Architecture User Group
 
GraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge GraphGraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge Graph
 
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
 

Building a Data Lake - An App Dev's Perspective

  • 1. Building a Data Lake An App Dev’s Perspective GeekNight Hyderabad - March 8th 2017 Geetha Balasundaram geethab@thoughtworks.com © 2017 ThoughtWorks Technologies Pvt. Limited
  • 2. ABOUT ME Developer @ ThoughtWorks Building a data lake in the enterprise ecosystem Helping a retail business make sense of it ( data guided org ) Been part of web development space ( enterprise rewrite ) Equally startled like everyone else by the data engineering space Share know-how’s and do-how’s from our team’s experience snithish@thoughtworks.com © 2017 ThoughtWorks Technologies Pvt. Limited
  • 3. AGENDA What is data in the true sense… Data Warehouse in an enterprise ecosystem... What is a data lake... Data lake implementation in an enterprise ecosystem… How to make effective use of a data lake: technology+process+people Cluster Administration tool - Cloudera Manager Pitfalls to avoid © 2017 ThoughtWorks Technologies Pvt. Limited
  • 4. Question ??? How did R.Ashwin perform in the last Test match? HIGH LEVEL PROBLEM STATEMENT © 2017 ThoughtWorks Technologies Pvt. Limited
  • 5. COMPLEX HISTORICAL DATA Why? Exploit and derive as much new insights as possible Match Made Enterprise systems produce this nature of complexity © 2017 ThoughtWorks Technologies Pvt. Limited
  • 7. DID MICROSERVICES CAUSE THIS PROBLEM ? Decentralised Data https://martinfowler.com/articles/microservices.html © 2017 ThoughtWorks Technologies Pvt. Limited
  • 8. MICROSERVICES HELPED Break down business unit Break down complexity Understand the nature of data © 2017 ThoughtWorks Technologies Pvt. Limited
  • 9. Question ??? R.Ashwin performed well ( 6/41 ) in yesterday’s match! Complex historical data can quantify how well he has performed Can we say why did he do well in this particular match? What factors affected his enhanced performance? © 2017 ThoughtWorks Technologies Pvt. Limited
  • 10. FACT is a FACT … even when we don’t know how it can be used © 2017 ThoughtWorks Technologies Pvt. Limited
  • 12. What is a data lake? © 2017 ThoughtWorks Technologies Pvt. Limited
  • 13. LAKE is... .. a large body of water in a more natural state. The contents of the lake, stream in from a source to fill the lake, and various users of the lake can come to examine, dive in, or take samples https://jamesdixon.wordpress.com/2010/10/14/pentaho-hadoop-and-data-lakes/ © 2017 ThoughtWorks Technologies Pvt. Limited
  • 14. DATA LAKE is... .. a large body of water data facts in a more natural state. The contents of the lake, stream in from a source to fill the lake, and various users of the lake can come to examine analyse, dive in build models, or take samples use subset for specific use cases © 2017 ThoughtWorks Technologies Pvt. Limited
  • 16. Implementation © 2017 ThoughtWorks Technologies Pvt. Limited
  • 17. OUR IMPLEMENTATION - TECH STACK DATA SOURCE DATA INGESTION DATA LAKE DATA MARTS DATA ANALYSIS Staging / Queue © 2017 ThoughtWorks Technologies Pvt. Limited
  • 18. © 2017 ThoughtWorks Technologies Pvt. Limited
  • 19. How to make effective use of a data lake: technology+process+people © 2017 ThoughtWorks Technologies Pvt. Limited
  • 20. Functionality Vs Reality I need a feature so that I can do this action….. to I need this insight so that I can take this action…. eg : I need a functionality to order items anytime before or during a promotion… to ..I need to know on time, if I have to order items anytime before or during a promotion… so that I can improve promotion sales People © 2017 ThoughtWorks Technologies Pvt. Limited
  • 21. Start Simple There is no data lake yet… Carve out portions of data which are easy wins yet critical to arrive at the earlier stated insight.. Set up the infrastructure and pipeline Get your hands dirty.. eg: Sales is an important factor to analyse / predict anything in retail space.. Technology © 2017 ThoughtWorks Technologies Pvt. Limited
  • 22. How much should I know about the data ? As a consumer of data (read ‘not a consumer of service’) How much should I know about it? Schema ⇔ Contracts Nature of the data versioned vs latest transactional vs reference facts vs aggregate frequency of change ….. Technology © 2017 ThoughtWorks Technologies Pvt. Limited
  • 23. DATA INSIGHT - Part 1 Incrementally add new data to the lake Serve data for analysis eg: What data wrt promotions do I need to bring into the datalake ?? Sales → improve promotion sales Technology © 2017 ThoughtWorks Technologies Pvt. Limited
  • 24. DATA INSIGHT - Part 2 Sales + Promotions → improve promotion sales How does adding more data to the lake help arriving at new insights..? history of past promotions sales = how much to order for this promotion history of past promotion sales + ‘X’ = how much to order for this promotion history of past promotion sales + ‘X’ + ‘Y’ …… = how much to order for this promotion eg: seasonality has a strong correlation with sales history of past promotion sales + ‘X’ + ‘Y’ …… + ‘A’ = how much to order for this promotion after the start People © 2017 ThoughtWorks Technologies Pvt. Limited
  • 25. Think Agile Sales + Promotions + X factor → improve promotion sales Near perfect list of parameters Progressive set of parameters Sales + Promotions → is the quantity arrived from these factors (known to business) ordered on time? Process © 2017 ThoughtWorks Technologies Pvt. Limited
  • 26. DataMarts ... as a store of bottled water – cleansed and packaged and structured for easy consumption © 2017 ThoughtWorks Technologies Pvt. Limited
  • 27. DataMarts ... as a store of data subset - curated from meaningful facts bundled into logical groups for arriving at useful insights © 2017 ThoughtWorks Technologies Pvt. Limited
  • 28. Easy Insight Sales + Promotions → is the quantity arrived from these factors (known to business) ordered on time? System : Tells me what is the quantity that is supposed to be ordered for this promotion.. System : Tells me in realtime what is the quantity that is ordered Technology © 2017 ThoughtWorks Technologies Pvt. Limited
  • 29. Cluster Administration Tool Cloudera Manager © 2017 ThoughtWorks Technologies Pvt. Limited
  • 30. Think DevOps Scale | Performance | Memory | Resource Contention | Optimization | Stability | Need for an ecosystem - to monitor how well the different tools play together without chaos Tools © 2017 ThoughtWorks Technologies Pvt. Limited
  • 31. QUICK RECAP What is data in the true sense… Data Warehouse in an enterprise ecosystem... What is a data lake... Data lake implementation in an enterprise ecosystem... How to make effective use of a data lake… Cluster Administration tool - Cloudera Manager © 2017 ThoughtWorks Technologies Pvt. Limited
  • 32. PITFALLS TO AVOID Data envy - Ref:https://martinfowler.com/bliki/Datensparsamkeit.html Tool envy Reliable data is a luxury Understanding the nature of data is a must Dialogue with the data scientist Treating the data lake like a RDBMS Keeping the business involved Data flow state visibility © 2017 ThoughtWorks Technologies Pvt. Limited