Filling the Data Lake
DataWorks Summit/Hadoop Summit
1.
Filling the Data Lake
June 29, 2016
Chuck Yarbrough, Sr Director, Solutions Marketing and Management, @cyarbrough
Mark Burnette, Enterprise Sales Engineer, @MarkCBurnette
2.
© 2015, Pentaho. All rights reserved. pentaho.com. Worldwide +1 (866) 660-7555
Emerging Big Data Use Cases
Improve operational effectiveness
- Machines/sensors: predict failures, network attacks
- Financial risk management: reduce fraud, increase security
- Reduce data warehouse cost
Improve customer experience
- Build a 360° view to fully understand and serve the customer
- Drive personalized and adjusted interaction
- Use automated recommendations logic
Drive incremental revenue
- Predict customer behavior across all channels
- Understand and monetize customer behavior
- Begin to monetize data as a service
3.
Spectrum of Big Data Use Cases
[Chart: use cases plotted by Use Case Complexity against Business Impact, in the groups Entry, Transform, Advanced, and Optimize. Use cases shown: Data Warehouse Optimization, Streamlined Data Refinery, Big Data Exploration, Customer 360 Degree View, Harnessing Machine & Sensor Data, Next Generation Applications, Internal Big Data as a Service, On-Demand Big Data Blending, Big Data Predictive Analytics, Monetize My Data. Callout: Big Data Onboarding, Filling the Data Lake.]
4.
What Does Pentaho Do?
5.
Managing and Automating the Pipeline
[Diagram labels: Data Lake, Data Engineering, Data Preparation, Analytics, Data Pipeline. Management capabilities: Administration, Security, Lifecycle Management, Data Provenance, Dynamic Data Pipeline, Monitoring, Automation.]
6.
The Data Swamp
7.
The Data Lake
8.
Does Hadoop Have to be Hard?
Things that can help ease the pain:
- Empower team members to integrate and process Hadoop data
- Establish a modern data onboarding process that is flexible and scalable
- Deliver governed analytic insights for large production user bases
9.
Proper Care and Feeding of the Data Lake
10.
Data Onboarding Challenges
11.
More Data, More Problems
Even with good integration tools, major data onboarding projects can be painful:
User Challenges
§ Repetitive manual design
§ Very time-consuming
§ Difficult to maintain
Business Challenges
§ Takes too long
§ Business deadlines at risk
§ Opportunity cost
12.
More Data, More Problems
How do we effectively scale data pipelines to accommodate exploding data sources, volumes, and complexity?
Have you ever had the pleasure of…
§ Migrating hundreds of sources between systems?
§ Enabling business users to onboard a variety of data themselves?
§ Ingesting hundreds of changing data sources into Hadoop?
13.
More Data, More Problems
Modern data onboarding is more than just “dumping data” – it includes:
§ Managing a changing array of data sources
§ Establishing repeatable processes at scale
§ Maintaining control and governance
14.
Data Onboarding
[Diagram: Filling the Data Lake. Disparate Data Sources (CSV, RDBMS) pass through Ingest Procedures (Integration Processes, Transformations) into Hadoop as AVRO.]
15.
Data Onboarding at Scale
[Diagram: many Disparate Data Sources (multiple CSV and RDBMS sources) pass through Ingest Procedures (Integration Processes, Transformations) into Hadoop as AVRO.]
16.
Filling the Data Lake
A Modern Data Onboarding Blueprint
§ Streamline data ingest from a wide variety of source data
§ Reduce dependence on hard-coded data movement procedures
§ Simplify regular data movement at scale into the Data Lake
17.
Template-based Approach
18.
Dynamic ELT Ingest Templates
[Diagram: Disparate Data Sources (CSV, RDBMS) feed Dynamic Integration Processes and Dynamic Transformations, which load Hadoop.]
Pass metadata in at run time to generate jobs on the fly (metadata injection)
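The idea of passing metadata in at run time can be sketched in a few lines of Python. This is only an illustration of the pattern, not Pentaho's implementation: `SourceMetadata`, `TEMPLATE_STEPS`, `build_job`, and the `/lake/` target path are all hypothetical names chosen for the example.

```python
from dataclasses import dataclass, field


@dataclass
class SourceMetadata:
    """Per-source settings supplied at run time (hypothetical structure)."""
    name: str
    kind: str       # e.g. "csv" or "rdbms"
    location: str   # file path or table name
    fields: list = field(default_factory=list)  # [(field_name, data_type), ...]


# One generic template. The placeholders are filled from metadata,
# so no per-source job has to be designed by hand.
TEMPLATE_STEPS = [
    "read {kind} source '{location}'",
    "map fields: {field_list}",
    "write AVRO to /lake/{name}.avro",  # target path is an assumption
]


def build_job(meta: SourceMetadata) -> list:
    """Generate the concrete job steps for one source from the shared template."""
    field_list = ", ".join(f"{n}:{t}" for n, t in meta.fields)
    return [
        step.format(kind=meta.kind, location=meta.location,
                    name=meta.name, field_list=field_list)
        for step in TEMPLATE_STEPS
    ]


orders = SourceMetadata("orders", "rdbms", "sales.orders",
                        [("id", "int"), ("total", "double")])
for step in build_job(orders):
    print(step)
```

Adding a new source then means adding one more `SourceMetadata` record rather than a new job definition.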
19.
Templated workflows
[Diagram: Disparate Data Sources (CSV, RDBMS) feed Dynamic Integration Processes and Dynamic Transformations, which load Hadoop.]
Templates:
§ RDBMS -> AVRO Template
§ CSV -> AVRO Template
§ CSV -> HDFS Template
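One way to picture the three templates named on this slide is as a small lookup keyed by source kind and target format. The registry below is a hypothetical sketch; the template identifiers and the `pick_template` helper are invented for illustration and do not correspond to any Pentaho artifact names.

```python
# Hypothetical registry: (source kind, target format) -> template name.
TEMPLATES = {
    ("rdbms", "avro"): "rdbms_to_avro",
    ("csv", "avro"): "csv_to_avro",
    ("csv", "hdfs"): "csv_to_hdfs",
}


def pick_template(source_kind: str, target_format: str) -> str:
    """Return the template for a source/target pair, or fail loudly."""
    key = (source_kind.lower(), target_format.lower())
    try:
        return TEMPLATES[key]
    except KeyError:
        raise ValueError(
            f"no ingest template for {source_kind} -> {target_format}"
        ) from None


print(pick_template("CSV", "avro"))
```

Failing on an unknown pair, rather than falling back to a default, keeps unsupported sources from being loaded with the wrong procedure.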
20.
Variety – different metadata, one template
[Diagram: Disparate Data Sources pass through Dynamic Integration Processes and Dynamic Transformations, using the CSV -> AVRO Template, into Hadoop.]
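To make "different metadata, one template" concrete, the sketch below uses a single function to produce Avro record schemas for two unrelated sources. The type mapping in `AVRO_TYPES`, the choice to make every field nullable, and the sample field lists are assumptions made for the example; only the Avro schema layout itself (a `record` with `name` and `fields`) follows the Avro specification.

```python
import json

# Assumed mapping from generic type names to Avro primitive types.
AVRO_TYPES = {
    "string": "string",
    "integer": "long",
    "number": "double",
    "boolean": "boolean",
}


def avro_schema(record_name, fields):
    """Build an Avro record schema (as a dict) from (name, type) pairs."""
    return {
        "type": "record",
        "name": record_name,
        "fields": [
            # Every field is made nullable here, an assumption that suits
            # raw sources where values are often missing.
            {"name": name, "type": ["null", AVRO_TYPES[kind]], "default": None}
            for name, kind in fields
        ],
    }


# Same function, two different sets of metadata.
customers = avro_schema("customers", [("id", "integer"), ("email", "string")])
sensors = avro_schema("sensor_readings", [("device", "string"), ("temp_c", "number")])
print(json.dumps(customers, indent=2))
```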
21.
Key Takeaway
§ Managing ETL and ELT procedures
§ Managing Metadata
§ Metadata Injection
22.
Metadata Acquisition
23.
RDBMS Ingestion
Automated Metadata Extraction
Extract table and store in AVRO
§ Database connection details
§ Table(s)
§ Field names (if available)
§ Data types
§ String length
§ Mask for numbers and dates
§ …
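As a small illustration of automated metadata extraction from a database, the sketch below reads field names, declared data types, and string lengths from a table's catalog. It uses SQLite from the Python standard library purely as a stand-in; `PRAGMA table_info` is SQLite-specific, other databases expose the same details through their own catalogs, and the `table_metadata` helper and the `orders` table are hypothetical.

```python
import re
import sqlite3


def _declared_length(type_decl):
    """Pull the length out of a declaration such as VARCHAR(40), if present."""
    match = re.search(r"\((\d+)\)", type_decl)
    return int(match.group(1)) if match else None


def table_metadata(conn, table):
    """Return one dict per column with name, declared type, length, nullability."""
    # PRAGMA table_info rows are (cid, name, type, notnull, dflt_value, pk).
    # The table name is interpolated directly, so it must come from a trusted list.
    rows = conn.execute(f"PRAGMA table_info({table})").fetchall()
    return [
        {
            "name": row[1],
            "type": row[2],
            "length": _declared_length(row[2]),
            "nullable": not row[3],
        }
        for row in rows
    ]


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER NOT NULL, customer VARCHAR(40), total REAL)"
)
for column in table_metadata(conn, "orders"):
    print(column)
```

Masks for numbers and dates are not covered here; they would need to be supplied or inferred separately.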
24.
CSV Ingestion
Automated Metadata Extraction
Option 1: Ingest RAW files into HDFS (no parsing)
§ Path to CSVs
Option 2: Parse and store in AVRO
§ Path to CSVs
§ Delimiter
§ Field names (if available)
§ Data types
§ String length
§ Mask for numbers and dates
§ …
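For Option 2, much of the listed metadata can be inferred from the file itself. The sketch below guesses the delimiter with `csv.Sniffer` from the Python standard library, takes field names from the first row, and derives a simple data type and maximum string length per column. The helper names, the three-way type scheme, and the sample data are assumptions for illustration; the sketch also assumes a header row and rows of equal width.

```python
import csv
import io


def infer_type(values):
    """Pick the narrowest of integer, number, or string that fits every non-empty value."""
    non_empty = [v for v in values if v != ""]
    if not non_empty:
        return "string"
    for type_name, cast in (("integer", int), ("number", float)):
        try:
            for value in non_empty:
                cast(value)
            return type_name
        except ValueError:
            continue
    return "string"


def csv_metadata(text, sample_size=1024):
    """Infer delimiter, field names, types, and max string lengths from CSV text."""
    # Sniffer is a heuristic and raises csv.Error when it cannot decide.
    dialect = csv.Sniffer().sniff(text[:sample_size], delimiters=",;|\t")
    rows = list(csv.reader(io.StringIO(text), dialect))
    header, data = rows[0], rows[1:]
    fields = []
    for index, name in enumerate(header):
        column = [row[index] for row in data]
        fields.append({
            "name": name,
            "type": infer_type(column),
            "max_length": max((len(v) for v in column), default=0),
        })
    return {"delimiter": dialect.delimiter, "fields": fields}


sample = "id;name;amount\n1;Ada;10.5\n2;Grace;7\n"
print(csv_metadata(sample))
```

Inferred types are only as good as the sample, so a production process would likely let users review or override them before provisioning.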
25.
Demonstration
26.
Key Takeaway
Automated Metadata Extraction
§ ELT development: DAYS
§ Provisioning: MINUTES
27.
Summary
28.
Key Takeaways
Data Onboarding Blueprint
Template-based Data Integration
§ Manage metadata vs. ELT procedures
Automated Metadata Extraction
§ Provide minimum required configuration
Reduce Risk
§ Maintain an organized, standardized, and clean data lake
29.
What Next?
§ Learn more about Big Data Onboarding at Pentaho.com
§ Download Pentaho Platform at Pentaho.com
30.
Q&A
31.
Thank You