Big Data for Manager: From Hadoop to Streaming and Beyond
DataWorks Summit/Hadoop Summit
1.
Big Data for Managers: From Hadoop to Streaming and Beyond
Dr. Vladimir Bacvanski
vladimir.bacvanski@scispike.com
@OnSoftware
2.
www.scispike.com Copyright © SciSpike 2016
Dr. Vladimir Bacvanski
• Founder of SciSpike, a development, consulting, and training firm
• Passionate about software and data
• PhD in computer science, RWTH Aachen, Germany
• Architect, consultant, mentor
• Custom development: scalable Web and IoT systems
• Training and mentoring in Big Data, Scala, node.js, software architecture
@OnSoftware
https://www.linkedin.com/in/vladimirbacvanski
3.
Problems with Relational Stores
• Data that does not naturally fit into tables → impedance mismatch
• Development time is often too long
• Dealing with unstructured data
• Performance problems
• Difficult to run on clusters
• Cost
4.
Structured and Unstructured Data Sources
Structured Data Sources
• Existing databases
• ERP/CRM/BI systems
• Inventory
• Supply chain
Unstructured Data Sources
• Server logs
• Search engine logs
• Browsing logs
• E-Commerce records
• Social media
• Voice
• Video
• Sensor data
5.
NoSQL Impact
[Figure: Cost / Performance against data volume (1M, 1B, 1T, 1Q, …HUGE!!!, each step x1000), with disks and processors, comparing Relational Database with Big Data + NoSQL.]
Figure annotations:
• Tomorrow: volume is out of reach
• Today: doable, but expensive and slow
• Stabilize cost and increase performance
• Enable unlimited volume growth
6.
Scale Up vs. Scale Out
[Figure: two charts with axes Capability and Cost, one for Scale Up and one for Scale Out.]
7.
A Common Pattern for Processing Large Data
• Load a large set of records onto a set of machines
• Extract something interesting from each record ("Map")
• Shuffle and sort intermediate results (key/value pairs)
• Aggregate intermediate results ("Reduce")
• Store end result
8.
Two Key Aspects of Hadoop
• MapReduce framework
  – How Hadoop understands and assigns work to the nodes (machines)
• Hadoop Distributed File System (HDFS)
  – Where Hadoop stores data
  – A file system that spans all the nodes in a Hadoop cluster
  – It links together the file systems on many local nodes to make them into one big file system
9.
MapReduce Example: Word Count
• WordCount is the "Hello World" of Big Data
  – You will see various technologies implementing it
  – A good first step to compare the expressiveness of Big Data tools
[Figure: word count data flow]
Input (pets.txt): dog cat bird | dog cat bird | dog dog cat
Map: dog, 1; cat, 1; bird, 1 | dog, 1; cat, 1; bird, 1 | dog, 1; dog, 1; cat, 1
Shuffle: dog, 1; dog, 1; dog, 1; dog, 1 | cat, 1; cat, 1; cat, 1 | bird, 1; bird, 1
Reduce: dog, 4; cat, 3; bird, 2
Output (pet_freq.txt): dog, 4; cat, 3; bird, 2
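The word count flow on this slide can be sketched in plain Python. This is a single-process illustration of the three phases, not Hadoop code; the function names `map_phase`, `shuffle_phase`, and `reduce_phase` are made up for this example, and the input is the pets.txt content from the slide.

```python
from collections import defaultdict

def map_phase(lines):
    """Emit a (word, 1) pair for every word in every input line."""
    for line in lines:
        for word in line.split():
            yield (word, 1)

def shuffle_phase(pairs):
    """Group the emitted values by key, as the framework would do."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Sum the values collected for each key."""
    return {key: sum(values) for key, values in groups.items()}

# The three lines of pets.txt from the slide.
pets = ["dog cat bird", "dog cat bird", "dog dog cat"]
pet_freq = reduce_phase(shuffle_phase(map_phase(pets)))
print(pet_freq)  # {'dog': 4, 'cat': 3, 'bird': 2}
```

In a real cluster the programmer supplies only the map and reduce logic; the grouping step in the middle is done by the framework.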
10.
The MapReduce Programming Model
• "Map" step:
  – Input split into pieces
  – Worker nodes process individual pieces in parallel (under global control of the Job Tracker node)
  – Each worker node stores its result in its local file system where a reducer is able to access it
• "Reduce" step:
  – Data is aggregated ("reduced" from the map steps) by worker nodes (under control of the Job Tracker)
  – Multiple reduce tasks can parallelize the aggregation
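A rough sketch of how the two steps parallelize, under loud assumptions: a thread pool stands in for the worker nodes, there is no Job Tracker, and the names `map_split`, `partition`, `reduce_partition`, and `NUM_REDUCERS` are invented for this illustration. Each map task handles one input split, and each key is routed to exactly one reduce task by a hash of the key.

```python
import zlib
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

NUM_REDUCERS = 2  # assumed number of reduce tasks

def map_split(split):
    """Map task: count the words within one input split."""
    return Counter(word for line in split for word in line.split())

def partition(key):
    """Choose the reduce task for a key using a deterministic hash."""
    return zlib.crc32(key.encode("utf-8")) % NUM_REDUCERS

def reduce_partition(map_outputs, reducer_id):
    """Reduce task: merge the counts of the keys assigned to this reducer."""
    total = Counter()
    for output in map_outputs:
        for key, count in output.items():
            if partition(key) == reducer_id:
                total[key] += count
    return total

# Three input splits, one per map task.
splits = [["dog cat bird"], ["dog cat bird"], ["dog dog cat"]]

with ThreadPoolExecutor() as pool:
    map_outputs = list(pool.map(map_split, splits))
    reduced = list(pool.map(lambda r: reduce_partition(map_outputs, r),
                            range(NUM_REDUCERS)))

# Each key lives in exactly one partition, so merging is a plain union.
result = {}
for part in reduced:
    result.update(part)
print(result)
```

Because a key always maps to the same reducer, the reduce tasks never need to coordinate with each other, which is what lets the aggregation run in parallel.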
11.
Separation of Work
Programmers
• Map
• Reduce
Framework
• Deals with fault tolerance
• Assigns workers to map and reduce tasks
• Moves processes to data
• Shuffles and sorts intermediate data
• Deals with errors
12.
How To Create MapReduce Jobs
• Java API
  – Low level, very flexible
  – Time-consuming development
• Streaming API
  – A simple, productive model for Python and Ruby
• Hive
  – Open source language / Apache sub-project
  – Provides a SQL-like interface to Hadoop
• Pig
  – Data flow language / Apache sub-project
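To give a feel for the Streaming API option, here is a minimal Python sketch in the style of a Hadoop Streaming word count. With Hadoop Streaming, the mapper and reducer are ordinary programs that read lines from standard input and write tab-separated key/value lines to standard output. In this sketch they are written as functions over iterables of lines so they can be run locally, and the `sorted` call is an assumed stand-in for the framework's shuffle and sort.

```python
from itertools import groupby

def mapper(lines):
    """Emit one tab-separated 'word<TAB>1' line per word."""
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"

def reducer(sorted_lines):
    """Sum the counts per word; relies on the input being sorted by key."""
    pairs = (line.split("\t") for line in sorted_lines)
    for word, group in groupby(pairs, key=lambda pair: pair[0]):
        yield f"{word}\t{sum(int(count) for _, count in group)}"

# Local stand-in for the framework: sort the mapper output, then reduce.
text = ["dog cat bird", "dog cat bird", "dog dog cat"]
output = list(reducer(sorted(mapper(text))))
print(output)  # ['bird\t2', 'cat\t3', 'dog\t4']
```

In an actual streaming job, each function would be a separate script looping over `sys.stdin`, and the framework would take care of sorting and distributing the intermediate lines.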
13.
The Big Picture: NoSQL + Hadoop in Applications
[Figure: applications backed by several stores]
• Columnar: price updates, logs
• Document: product info
• Graph: Customer Agent relationships
• RDB: XA data
• Hadoop: oper. analytics, price analytics
• Key/Value: session data
14.
Streaming: A New Paradigm
• Conventional processing: static data
  [Diagram labels: Data, Queries, Results]
• Real-time processing: streaming data
  [Diagram labels: Queries, Data, Results]
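One way to picture the contrast, as a hedged sketch rather than any particular streaming engine: in the conventional case the data sits still and a query runs over all of it once; in the streaming case the query stays in place and produces an updated result each time a new event arrives. The function names and the sample events are invented for this illustration.

```python
from collections import Counter

def batch_query(stored_events):
    """Conventional: the data is at rest and the query runs over all of it once."""
    return Counter(stored_events)

def streaming_query(event_stream):
    """Streaming: the query stays in place and emits an updated result per event."""
    counts = Counter()
    for event in event_stream:
        counts[event] += 1
        yield dict(counts)

events = ["click", "view", "click"]

# One answer, available only after all data has been collected.
print(batch_query(events))

# A new answer after every event, without waiting for the data to end.
for snapshot in streaming_query(iter(events)):
    print(snapshot)
```

The final streaming snapshot matches the batch answer; the difference is that intermediate results are available while the data is still arriving.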
15.
Common Streaming Applications
• Personalization
• Search
• Revenue optimization
• User events
• Content feeds
• Log processing
• Monitoring
• Recommendations
• Ads
• Notable users:
  – Twitter
  – Yahoo
  – Spotify
  – Cisco
  – Flickr
  – Weather Channel
16.
Beyond Hadoop: Spark & Flink
[Figure labels: MapReduce, Tez, Spark, Flink]
17.
Apache Spark
§ Important features
– In-memory data
– Resilient Distributed Datasets (RDDs)
• Datasets can rebuild themselves if a failure occurs
– Rich set of operators
§ Efficient:
– 10x (on disk) to 100x (in memory) faster than Hadoop MR
– 2 to 5 times less code (rich APIs in Scala/Java/Python)
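The idea that a dataset can rebuild itself can be illustrated with a toy in plain Python. This is not Spark's API: `ToyDataset` and `lose_partition` are hypothetical names, and the sketch only shows the principle that a dataset which remembers how it was derived can be recomputed from its parent when its in-memory copy is lost.

```python
class ToyDataset:
    """Toy stand-in for an RDD: remembers its derivation, not just its data."""

    def __init__(self, source=None, parent=None, transform=None):
        self.source = source        # raw input, only set on the root dataset
        self.parent = parent        # dataset this one was derived from
        self.transform = transform  # function applied to the parent's rows
        self._cache = None          # in-memory copy of the computed rows

    def map(self, fn):
        return ToyDataset(parent=self, transform=lambda rows: [fn(r) for r in rows])

    def filter(self, pred):
        return ToyDataset(parent=self, transform=lambda rows: [r for r in rows if pred(r)])

    def collect(self):
        # Recompute from the parent chain whenever the cached rows are missing.
        if self._cache is None:
            if self.parent is None:
                self._cache = list(self.source)
            else:
                self._cache = self.transform(self.parent.collect())
        return self._cache

    def lose_partition(self):
        # Simulate a failure that wipes the in-memory copy.
        self._cache = None

base = ToyDataset(source=[1, 2, 3, 4])
squares = base.map(lambda x: x * x)
even_squares = squares.filter(lambda x: x % 2 == 0)

first = even_squares.collect()
even_squares.lose_partition()
squares.lose_partition()
rebuilt = even_squares.collect()
```

The cached rows also show why in-memory data helps: a second `collect()` without a failure returns immediately instead of recomputing.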
18.
Spark Architecture
§ A powerful set of tools
§ Beyond traditional Hadoop
Source: http://spark.apache.org
19.
Data Sharing in Apache Spark
[Diagram: HDFS, Iteration 1, Result 1 (held in cluster memory), Iteration 2, Result 2 (held in cluster memory), Query 1, Query 2]
20.
Apache Flink
§ Execution:
– Programs compiled into an execution plan
– Plan is optimized
– Executed
§ Design goals:
– High performance
– Hybrid batch and streaming runtime
– Simplicity for the developer
– Rich libraries
– Integration with many systems
21.
Apache Flink Components
§ Integration with Hadoop YARN, MapReduce, HBase, Cassandra, Kafka, …
§ Execution engine for Apache Beam (Google Dataflow)
22.
Flink Optimization and Execution
§ Optimizer selects an execution plan
§ Similar to what we have in relational databases
§ Optimal plan depends on the size of the input files
§ Run as standalone or on top of Hadoop
§ Integration with many Hadoop technologies
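How input size can steer plan selection is sketched below for a join. This is a simplified illustration, not Flink's optimizer: the function names, the strategy labels, and the `broadcast_limit` threshold are all assumptions chosen for the example.

```python
def choose_join_strategy(left_rows, right_rows, broadcast_limit=1000):
    # If one side is small, ship it to every worker and join locally;
    # otherwise both sides must be repartitioned by the join key.
    if min(left_rows, right_rows) <= broadcast_limit:
        return "broadcast-hash-join"
    return "repartition-sort-merge-join"

def hash_join(left, right, key):
    # Build the hash table on the smaller input and probe with the larger one.
    build, probe = (left, right) if len(left) <= len(right) else (right, left)
    table = {}
    for row in build:
        table.setdefault(row[key], []).append(row)
    return [{**match, **row} for row in probe for match in table.get(row[key], [])]

customers = [{"id": 1, "name": "Ann"}]
orders = [{"id": 1, "total": 5}, {"id": 1, "total": 7}, {"id": 2, "total": 1}]

small_plan = choose_join_strategy(len(customers), len(orders))
large_plan = choose_join_strategy(5_000_000, 8_000_000)
joined = hash_join(customers, orders, "id")
```

The same query therefore gets a different physical plan depending on the data it runs against, much as in a relational database.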
23.
Flink & Spark: The Advantages and Outlook
§ Less IO overhead than conventional Hadoop
§ Caching
§ Iterative algorithms
§ Unifying batch and stream computing
§ Scala as a natural, expressive language for Big Data
– Other languages: Python, Java, R
§ Beware of less mature components
24.
Typical NoSQL Systems
§ Non-relational
§ Distributed
§ Horizontally scalable
§ No need for a fixed schema
§ Several established players
§ Systems are specialized
25.
NoSQL Stores and Their Categories
§ Choose a store that is the best match for your application
§ It is fine to use several different stores
– "Polyglot persistence"
[Diagram: Key-Value, Column-Family, Document-Oriented, Graph DB]
26.
NoSQL Stores: Scale vs. Complexity of Data
[Diagram: Key-Value, Column-Family, Document-Oriented, and Graph DB stores plotted by scalability against complexity, with a region marked "needs of most applications"]
27.
Key-Value Stores
§ Key → Value mapping
§ Large, persistent Map ("hashtable")
– Values could be lists and hashes
§ Easy to use
§ Scale very well
§ Data model may be too simple for most applications
§ Systems:
– Redis, Riak, Memcached, Amazon DynamoDB, Aerospike, FoundationDB
§ Use when the data model is very simple and scalability is essential
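A minimal sketch of the key-value model, assuming an in-memory toy rather than any of the systems listed: `ToyKeyValueStore` is a hypothetical class, and the checksum-based sharding only hints at why such stores scale well, since each key can be placed on a node without consulting the others.

```python
import zlib

class ToyKeyValueStore:
    def __init__(self, n_shards=4):
        # Each dict stands in for one node of a cluster.
        self.shards = [{} for _ in range(n_shards)]

    def _shard_for(self, key):
        # A stable checksum of the key alone decides where the value lives.
        return self.shards[zlib.crc32(key.encode("utf-8")) % len(self.shards)]

    def put(self, key, value):
        self._shard_for(key)[key] = value

    def get(self, key, default=None):
        return self._shard_for(key).get(key, default)

store = ToyKeyValueStore()
# The value is opaque to the store; here it is a hash containing a list.
store.put("session:42", {"user": "ann", "cart": ["sku-1"]})
session = store.get("session:42")
missing = store.get("session:99")
```

Note that the only lookup path is the key itself, which is the simplicity the slide warns may be too limiting for many applications.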
28.
Typical Use Cases
§ The data model is very simple!
– Actual data can be JSON
§ Session data
§ User preferences and profiles
§ Shopping cart
§ If another NoSQL store is good enough, you may want to skip this and let a Column or Document store handle it
29.
Column-Family
§ "Column-family": similar to a table
– Table is sparse
§ Key → (Column:Value)*
§ Columns have names
§ Can be indexed
§ Can store complex data
– Denormalize!
§ Systems:
– Google BigTable, HBase, Cassandra, Amazon SimpleDB, Hypertable
§ Use when scalability is essential
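The Key → (Column:Value)* shape can be sketched as a dictionary of dictionaries. `ToyColumnFamily` is a hypothetical in-memory illustration, not the data model of any particular product; it shows that each row stores only the named columns it actually has, which is what makes the table sparse.

```python
from collections import defaultdict

class ToyColumnFamily:
    def __init__(self):
        # row key -> {column name: value}; absent columns take no space.
        self.rows = defaultdict(dict)

    def insert(self, row_key, **columns):
        self.rows[row_key].update(columns)

    def get(self, row_key, column, default=None):
        return self.rows.get(row_key, {}).get(column, default)

    def columns(self, row_key):
        return sorted(self.rows.get(row_key, {}))

prices = ToyColumnFamily()
prices.insert("sku-1", usd=9.99, eur=9.20)
prices.insert("sku-2", usd=4.50)   # this row has no "eur" column at all

sku1_columns = prices.columns("sku-1")
sku2_eur = prices.get("sku-2", "eur")
```

Rows in the same column family can differ in which columns they carry, so new columns can appear without a schema change.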
30.
Typical Use Cases
§ High insert volume: logging
§ Real-time updates
§ Content management
§ Expiring content
§ Cross-datacenter replication
§ MapReduce analytics over stored data
§ You don't need conventional (ACID) transactions
31.
Document Stores
§ JSON, BSON, XML
§ No schema
§ Indexes improve performance
§ Easy transition from RDBMS
§ Systems:
– MongoDB, CouchDB, Couchbase
§ Use when data is in semi-structured form
§ Often seen in new Web applications
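A hedged sketch of the document model and of why indexes help: `ToyCollection` is a hypothetical in-memory class, not the API of MongoDB or any other listed system. Documents are plain dictionaries with no fixed schema; without an index a query scans every document, while with an index it jumps straight to the matching ids.

```python
from collections import defaultdict

class ToyCollection:
    def __init__(self):
        self.docs = {}      # document id -> document (a dict)
        self.indexes = {}   # field name -> {field value: set of document ids}

    def create_index(self, field):
        # Indexed values must be hashable in this toy version.
        index = defaultdict(set)
        for doc_id, doc in self.docs.items():
            if field in doc:
                index[doc[field]].add(doc_id)
        self.indexes[field] = index

    def insert(self, doc_id, doc):
        self.docs[doc_id] = doc
        for field, index in self.indexes.items():
            if field in doc:
                index[doc[field]].add(doc_id)

    def find(self, field, value):
        if field in self.indexes:
            # Index lookup: touch only the matching documents.
            ids = self.indexes[field].get(value, set())
            return [self.docs[i] for i in sorted(ids)]
        # No index: full scan of the collection.
        return [doc for _, doc in sorted(self.docs.items()) if doc.get(field) == value]

products = ToyCollection()
products.insert(1, {"name": "lamp", "color": "red"})
products.insert(2, {"name": "desk", "color": "red", "drawers": 2})
products.insert(3, {"name": "rug"})   # documents need not share fields

scanned = products.find("color", "red")
products.create_index("color")
indexed = products.find("color", "red")
```

Both queries return the same documents; the index only changes how much of the collection has to be read.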
32.
Typical Use Cases
§ Logging
– Especially with variable content
§ Product information
§ Customer information
§ Content management
§ Data to be stored has a format that varies over time
– Flexible schema
§ Web analytics
33.
Graph Databases
§ Nodes with properties
§ Nodes connected through relationships
§ Can model very complex graph data
– Social networks
§ Systems:
– Neo4j, Infinite Graph, TitanDB, OrientDB
§ Use when data is a (complex) graph
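Nodes with properties and relationships between them can be sketched as below. `ToyGraph` and the `FRIEND` relationship name are assumptions made for the example, and real graph databases offer query languages rather than hand-written traversals; the sketch only shows the kind of question (friends of friends) that a graph model answers naturally.

```python
from collections import defaultdict

class ToyGraph:
    def __init__(self):
        self.nodes = {}                 # node id -> properties
        self.edges = defaultdict(set)   # (node id, relationship) -> neighbour ids

    def add_node(self, node_id, **properties):
        self.nodes[node_id] = properties

    def relate(self, a, relationship, b):
        # Relationships are stored in both directions for simplicity.
        self.edges[(a, relationship)].add(b)
        self.edges[(b, relationship)].add(a)

    def neighbours(self, node_id, relationship):
        return self.edges.get((node_id, relationship), set())

    def friends_of_friends(self, node_id):
        # Two-hop traversal, excluding the person and their direct friends.
        direct = self.neighbours(node_id, "FRIEND")
        indirect = set()
        for friend in direct:
            indirect |= self.neighbours(friend, "FRIEND")
        return indirect - direct - {node_id}

social = ToyGraph()
for person in ("ann", "bob", "cid", "dee"):
    social.add_node(person, kind="person")
social.relate("ann", "FRIEND", "bob")
social.relate("bob", "FRIEND", "cid")
social.relate("cid", "FRIEND", "dee")

suggestions = social.friends_of_friends("ann")
```

In a relational store the same question needs a self-join per hop, which is why highly connected data is a good fit for a graph database.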
34.
Typical Use Cases
§ Highly interconnected data
§ Social graphs
§ Party relationships in an enterprise
§ Location-based services
§ Purchasing analytics and recommendations
§ Often combined with other systems to store the bulk of data
– Graph database can focus on relationships
35.
Integrating Relational, Streams, and Hadoop
[Diagram labels: Traditional / Relational Data Sources; Non-Traditional / Non-Relational Data Sources (varied data formats; semi-structured, unstructured, ...); Streams (in-motion analytics, ultra-low-latency results); Database & Warehouse (at-rest data analytics, results); Traditional Warehouse (data analytics, results); Data + Big Data; Event System; NoSQL]
36.
Lambda Architecture
[Diagram: Incoming Data feeds the Batch Layer (Master Dataset, Batch Update, Batch View) and the Event (Speed) Layer (Real Time Data, Real Time Update, Rolling Values); the Serving Layer answers Queries by merging results]
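The Lambda Architecture's merge of batch and real-time results can be sketched with a toy page-view counter. `ToyLambda` and its method names are assumptions made for illustration; in practice each layer is a separate system, and this single class only shows how a query combines a batch view with rolling values.

```python
from collections import Counter

class ToyLambda:
    def __init__(self):
        self.master_dataset = []        # append-only record of every event
        self.batch_view = Counter()     # recomputed from the master dataset
        self.realtime_view = Counter()  # rolling values since the last batch run

    def ingest(self, page):
        # Incoming data goes to both the batch layer and the speed layer.
        self.master_dataset.append(page)
        self.realtime_view[page] += 1

    def run_batch(self):
        # The batch update rebuilds the view from scratch, after which the
        # rolling values it now covers can be discarded.
        self.batch_view = Counter(self.master_dataset)
        self.realtime_view.clear()

    def query(self, page):
        # Serving layer: merge the batch result with the real-time result.
        return self.batch_view[page] + self.realtime_view[page]

views = ToyLambda()
for page in ("home", "home", "cart"):
    views.ingest(page)
views.run_batch()
views.ingest("home")   # arrives after the batch run

home_views = views.query("home")
cart_views = views.query("cart")
```

Queries stay up to date between batch runs because the speed layer covers exactly the events the batch view has not yet seen.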
37.
Master Data Management and Governance
§ Big Data and NoSQL stores can easily become a bigger mess than relational stores
§ Introduce a practical plan
– Avoid lengthy and cumbersome governance
– Actual use should be the driving force
– Start slow
§ Be ready for change
– The technologies change rapidly
§ Focus on business outcomes
38.
Succeeding with Big Data and NoSQL
1. Actively look for solutions where the right store can ease the pain
2. Make sure you deliver tangible value to clients
3. After you get your first apps to work: create a Big Data introduction and governance plan
4. Prioritize: do the most useful thing for the business first
5. Integrate with existing IT
6. Make sure you hire or grow your Big Data champions
7. Field is immature: look out for new tools and techniques
39.
Conclusions
– Hadoop and NoSQL address the weak points of relational systems:
• Scale
• Performance
• Unstructured and semistructured data
– Streaming addresses the processing of data in real time
– Integrate with conventional technologies!
– Spark and Flink: the next-generation Big Data systems
40.
Questions?