Submit Search
Upload
NYC HUG - Application Architectures with Apache Hadoop
•
4 likes
•
705 views
M
markgrover
Follow
Presentation at NYC HUG on Application Architectures with Apache Hadoop
Read less
Read more
Technology
Report
Share
Report
Share
1 of 48
Download now
Download to read offline
Recommended
Applications on Hadoop
Applications on Hadoop
markgrover
Architecting Applications with Hadoop
Architecting Applications with Hadoop
markgrover
Intro to Hadoop Presentation at Carnegie Mellon - Silicon Valley
Intro to Hadoop Presentation at Carnegie Mellon - Silicon Valley
markgrover
Intro to hadoop tutorial
Intro to hadoop tutorial
markgrover
Cloudera Impala: A Modern SQL Engine for Apache Hadoop
Cloudera Impala: A Modern SQL Engine for Apache Hadoop
Cloudera, Inc.
SQL Engines for Hadoop - The case for Impala
SQL Engines for Hadoop - The case for Impala
markgrover
Hive analytic workloads hadoop summit san jose 2014
Hive analytic workloads hadoop summit san jose 2014
alanfgates
Hortonworks.Cluster Config Guide
Hortonworks.Cluster Config Guide
Douglas Bernardini
Recommended
Applications on Hadoop
Applications on Hadoop
markgrover
Architecting Applications with Hadoop
Architecting Applications with Hadoop
markgrover
Intro to Hadoop Presentation at Carnegie Mellon - Silicon Valley
Intro to Hadoop Presentation at Carnegie Mellon - Silicon Valley
markgrover
Intro to hadoop tutorial
Intro to hadoop tutorial
markgrover
Cloudera Impala: A Modern SQL Engine for Apache Hadoop
Cloudera Impala: A Modern SQL Engine for Apache Hadoop
Cloudera, Inc.
SQL Engines for Hadoop - The case for Impala
SQL Engines for Hadoop - The case for Impala
markgrover
Hive analytic workloads hadoop summit san jose 2014
Hive analytic workloads hadoop summit san jose 2014
alanfgates
Hortonworks.Cluster Config Guide
Hortonworks.Cluster Config Guide
Douglas Bernardini
De-Bugging Hive with Hadoop-in-the-Cloud
De-Bugging Hive with Hadoop-in-the-Cloud
DataWorks Summit
Introduction to Impala
Introduction to Impala
markgrover
The Impala Cookbook
The Impala Cookbook
Cloudera, Inc.
Cloudera Impala: A modern SQL Query Engine for Hadoop
Cloudera Impala: A modern SQL Query Engine for Hadoop
Cloudera, Inc.
A brave new world in mutable big data relational storage (Strata NYC 2017)
A brave new world in mutable big data relational storage (Strata NYC 2017)
Todd Lipcon
Application Architectures with Hadoop - Big Data TechCon SF 2014
Application Architectures with Hadoop - Big Data TechCon SF 2014
hadooparchbook
Data Pipelines in Hadoop - SAP Meetup in Tel Aviv
Data Pipelines in Hadoop - SAP Meetup in Tel Aviv
larsgeorge
Intro to Apache Kudu (short) - Big Data Application Meetup
Intro to Apache Kudu (short) - Big Data Application Meetup
Mike Percy
Near Real-Time Network Anomaly Detection and Traffic Analysis using Spark bas...
Near Real-Time Network Anomaly Detection and Traffic Analysis using Spark bas...
DataWorks Summit/Hadoop Summit
Cloudera Impala - Las Vegas Big Data Meetup Nov 5th 2014
Cloudera Impala - Las Vegas Big Data Meetup Nov 5th 2014
cdmaxime
Application architectures with hadoop – big data techcon 2014
Application architectures with hadoop – big data techcon 2014
Jonathan Seidman
Big Data Day LA 2016/ Big Data Track - How To Use Impala and Kudu To Optimize...
Big Data Day LA 2016/ Big Data Track - How To Use Impala and Kudu To Optimize...
Data Con LA
HBase Status Report - Hadoop Summit Europe 2014
HBase Status Report - Hadoop Summit Europe 2014
larsgeorge
Application architectures with Hadoop and Sessionization in MR
Application architectures with Hadoop and Sessionization in MR
markgrover
Impala: Real-time Queries in Hadoop
Impala: Real-time Queries in Hadoop
Cloudera, Inc.
How to overcome mysterious problems caused by large and multi-tenancy Hadoop ...
How to overcome mysterious problems caused by large and multi-tenancy Hadoop ...
DataWorks Summit/Hadoop Summit
Apache Kudu Fast Analytics on Fast Data (Hadoop / Spark Conference Japan 2016...
Apache Kudu Fast Analytics on Fast Data (Hadoop / Spark Conference Japan 2016...
Hadoop / Spark Conference Japan
Strata Stinger Talk October 2013
Strata Stinger Talk October 2013
alanfgates
Impala 2.0 - The Best Analytic Database for Hadoop
Impala 2.0 - The Best Analytic Database for Hadoop
Cloudera, Inc.
Architecting applications with Hadoop - Fraud Detection
Architecting applications with Hadoop - Fraud Detection
hadooparchbook
Oozie or Easy: Managing Hadoop Workloads the EASY Way
Oozie or Easy: Managing Hadoop Workloads the EASY Way
DataWorks Summit
Introduction to HBase - NoSqlNow2015
Introduction to HBase - NoSqlNow2015
Apekshit Sharma
More Related Content
What's hot
De-Bugging Hive with Hadoop-in-the-Cloud
De-Bugging Hive with Hadoop-in-the-Cloud
DataWorks Summit
Introduction to Impala
Introduction to Impala
markgrover
The Impala Cookbook
The Impala Cookbook
Cloudera, Inc.
Cloudera Impala: A modern SQL Query Engine for Hadoop
Cloudera Impala: A modern SQL Query Engine for Hadoop
Cloudera, Inc.
A brave new world in mutable big data relational storage (Strata NYC 2017)
A brave new world in mutable big data relational storage (Strata NYC 2017)
Todd Lipcon
Application Architectures with Hadoop - Big Data TechCon SF 2014
Application Architectures with Hadoop - Big Data TechCon SF 2014
hadooparchbook
Data Pipelines in Hadoop - SAP Meetup in Tel Aviv
Data Pipelines in Hadoop - SAP Meetup in Tel Aviv
larsgeorge
Intro to Apache Kudu (short) - Big Data Application Meetup
Intro to Apache Kudu (short) - Big Data Application Meetup
Mike Percy
Near Real-Time Network Anomaly Detection and Traffic Analysis using Spark bas...
Near Real-Time Network Anomaly Detection and Traffic Analysis using Spark bas...
DataWorks Summit/Hadoop Summit
Cloudera Impala - Las Vegas Big Data Meetup Nov 5th 2014
Cloudera Impala - Las Vegas Big Data Meetup Nov 5th 2014
cdmaxime
Application architectures with hadoop – big data techcon 2014
Application architectures with hadoop – big data techcon 2014
Jonathan Seidman
Big Data Day LA 2016/ Big Data Track - How To Use Impala and Kudu To Optimize...
Big Data Day LA 2016/ Big Data Track - How To Use Impala and Kudu To Optimize...
Data Con LA
HBase Status Report - Hadoop Summit Europe 2014
HBase Status Report - Hadoop Summit Europe 2014
larsgeorge
Application architectures with Hadoop and Sessionization in MR
Application architectures with Hadoop and Sessionization in MR
markgrover
Impala: Real-time Queries in Hadoop
Impala: Real-time Queries in Hadoop
Cloudera, Inc.
How to overcome mysterious problems caused by large and multi-tenancy Hadoop ...
How to overcome mysterious problems caused by large and multi-tenancy Hadoop ...
DataWorks Summit/Hadoop Summit
Apache Kudu Fast Analytics on Fast Data (Hadoop / Spark Conference Japan 2016...
Apache Kudu Fast Analytics on Fast Data (Hadoop / Spark Conference Japan 2016...
Hadoop / Spark Conference Japan
Strata Stinger Talk October 2013
Strata Stinger Talk October 2013
alanfgates
Impala 2.0 - The Best Analytic Database for Hadoop
Impala 2.0 - The Best Analytic Database for Hadoop
Cloudera, Inc.
Architecting applications with Hadoop - Fraud Detection
Architecting applications with Hadoop - Fraud Detection
hadooparchbook
What's hot
(20)
De-Bugging Hive with Hadoop-in-the-Cloud
De-Bugging Hive with Hadoop-in-the-Cloud
Introduction to Impala
Introduction to Impala
The Impala Cookbook
The Impala Cookbook
Cloudera Impala: A modern SQL Query Engine for Hadoop
Cloudera Impala: A modern SQL Query Engine for Hadoop
A brave new world in mutable big data relational storage (Strata NYC 2017)
A brave new world in mutable big data relational storage (Strata NYC 2017)
Application Architectures with Hadoop - Big Data TechCon SF 2014
Application Architectures with Hadoop - Big Data TechCon SF 2014
Data Pipelines in Hadoop - SAP Meetup in Tel Aviv
Data Pipelines in Hadoop - SAP Meetup in Tel Aviv
Intro to Apache Kudu (short) - Big Data Application Meetup
Intro to Apache Kudu (short) - Big Data Application Meetup
Near Real-Time Network Anomaly Detection and Traffic Analysis using Spark bas...
Near Real-Time Network Anomaly Detection and Traffic Analysis using Spark bas...
Cloudera Impala - Las Vegas Big Data Meetup Nov 5th 2014
Cloudera Impala - Las Vegas Big Data Meetup Nov 5th 2014
Application architectures with hadoop – big data techcon 2014
Application architectures with hadoop – big data techcon 2014
Big Data Day LA 2016/ Big Data Track - How To Use Impala and Kudu To Optimize...
Big Data Day LA 2016/ Big Data Track - How To Use Impala and Kudu To Optimize...
HBase Status Report - Hadoop Summit Europe 2014
HBase Status Report - Hadoop Summit Europe 2014
Application architectures with Hadoop and Sessionization in MR
Application architectures with Hadoop and Sessionization in MR
Impala: Real-time Queries in Hadoop
Impala: Real-time Queries in Hadoop
How to overcome mysterious problems caused by large and multi-tenancy Hadoop ...
How to overcome mysterious problems caused by large and multi-tenancy Hadoop ...
Apache Kudu Fast Analytics on Fast Data (Hadoop / Spark Conference Japan 2016...
Apache Kudu Fast Analytics on Fast Data (Hadoop / Spark Conference Japan 2016...
Strata Stinger Talk October 2013
Strata Stinger Talk October 2013
Impala 2.0 - The Best Analytic Database for Hadoop
Impala 2.0 - The Best Analytic Database for Hadoop
Architecting applications with Hadoop - Fraud Detection
Architecting applications with Hadoop - Fraud Detection
Viewers also liked
Oozie or Easy: Managing Hadoop Workloads the EASY Way
Oozie or Easy: Managing Hadoop Workloads the EASY Way
DataWorks Summit
Introduction to HBase - NoSqlNow2015
Introduction to HBase - NoSqlNow2015
Apekshit Sharma
Hadoop Tutorial
Hadoop Tutorial
awesomesos
Fraud Detection with Hadoop
Fraud Detection with Hadoop
markgrover
Introduction to Hadoop Developer Training Webinar
Introduction to Hadoop Developer Training Webinar
Cloudera, Inc.
Intro to Spark with Zeppelin Crash Course Hadoop Summit SJ
Intro to Spark with Zeppelin Crash Course Hadoop Summit SJ
Daniel Madrigal
Spark徹底入門 #cwt2015
Spark徹底入門 #cwt2015
Cloudera Japan
Apache Hive - Introduction
Apache Hive - Introduction
Muralidharan Deenathayalan
Hadoop Crash Course Hadoop Summit SJ
Hadoop Crash Course Hadoop Summit SJ
Daniel Madrigal
SQL-on-Hadoop Tutorial
SQL-on-Hadoop Tutorial
Daniel Abadi
Hadoop World 2011: Hadoop and Performance - Todd Lipcon & Yanpei Chen, Cloudera
Hadoop World 2011: Hadoop and Performance - Todd Lipcon & Yanpei Chen, Cloudera
Cloudera, Inc.
Inside Flume
Inside Flume
Cloudera, Inc.
Introduction to Data Analyst Training
Introduction to Data Analyst Training
Cloudera, Inc.
Introduction to Apache HBase Training
Introduction to Apache HBase Training
Cloudera, Inc.
Hadoop Workshop using Cloudera on Amazon EC2
Hadoop Workshop using Cloudera on Amazon EC2
IMC Institute
Hadoop Operations for Production Systems (Strata NYC)
Hadoop Operations for Production Systems (Strata NYC)
Kathleen Ting
Real Time and Big Data – It’s About Time
Real Time and Big Data – It’s About Time
DataWorks Summit
Hadoop Summit 2012 | Optimizing MapReduce Job Performance
Hadoop Summit 2012 | Optimizing MapReduce Job Performance
Cloudera, Inc.
Hadoop Operations
Hadoop Operations
Cloudera, Inc.
Introduction to Kafka and Zookeeper
Introduction to Kafka and Zookeeper
Rahul Jain
Viewers also liked
(20)
Oozie or Easy: Managing Hadoop Workloads the EASY Way
Oozie or Easy: Managing Hadoop Workloads the EASY Way
Introduction to HBase - NoSqlNow2015
Introduction to HBase - NoSqlNow2015
Hadoop Tutorial
Hadoop Tutorial
Fraud Detection with Hadoop
Fraud Detection with Hadoop
Introduction to Hadoop Developer Training Webinar
Introduction to Hadoop Developer Training Webinar
Intro to Spark with Zeppelin Crash Course Hadoop Summit SJ
Intro to Spark with Zeppelin Crash Course Hadoop Summit SJ
Spark徹底入門 #cwt2015
Spark徹底入門 #cwt2015
Apache Hive - Introduction
Apache Hive - Introduction
Hadoop Crash Course Hadoop Summit SJ
Hadoop Crash Course Hadoop Summit SJ
SQL-on-Hadoop Tutorial
SQL-on-Hadoop Tutorial
Hadoop World 2011: Hadoop and Performance - Todd Lipcon & Yanpei Chen, Cloudera
Hadoop World 2011: Hadoop and Performance - Todd Lipcon & Yanpei Chen, Cloudera
Inside Flume
Inside Flume
Introduction to Data Analyst Training
Introduction to Data Analyst Training
Introduction to Apache HBase Training
Introduction to Apache HBase Training
Hadoop Workshop using Cloudera on Amazon EC2
Hadoop Workshop using Cloudera on Amazon EC2
Hadoop Operations for Production Systems (Strata NYC)
Hadoop Operations for Production Systems (Strata NYC)
Real Time and Big Data – It’s About Time
Real Time and Big Data – It’s About Time
Hadoop Summit 2012 | Optimizing MapReduce Job Performance
Hadoop Summit 2012 | Optimizing MapReduce Job Performance
Hadoop Operations
Hadoop Operations
Introduction to Kafka and Zookeeper
Introduction to Kafka and Zookeeper
Similar to NYC HUG - Application Architectures with Apache Hadoop
Architectural considerations for Hadoop Applications
Architectural considerations for Hadoop Applications
hadooparchbook
Strata NY 2014 - Architectural considerations for Hadoop applications tutorial
Strata NY 2014 - Architectural considerations for Hadoop applications tutorial
hadooparchbook
Hadoop Application Architectures tutorial - Strata London
Hadoop Application Architectures tutorial - Strata London
hadooparchbook
Strata EU tutorial - Architectural considerations for hadoop applications
Strata EU tutorial - Architectural considerations for hadoop applications
hadooparchbook
Application Architectures with Hadoop
Application Architectures with Hadoop
hadooparchbook
Application Architectures with Hadoop
Application Architectures with Hadoop
hadooparchbook
Application Architectures with Hadoop | Data Day Texas 2015
Application Architectures with Hadoop | Data Day Texas 2015
Cloudera, Inc.
Architecting application with Hadoop - using clickstream analytics as an example
Architecting application with Hadoop - using clickstream analytics as an example
hadooparchbook
Hadoop Application Architectures tutorial at Big DataService 2015
Hadoop Application Architectures tutorial at Big DataService 2015
hadooparchbook
Video Analysis in Hadoop
Video Analysis in Hadoop
DataWorks Summit
Simpler, faster, cheaper Enterprise Apps using only Spring Boot on GCP
Simpler, faster, cheaper Enterprise Apps using only Spring Boot on GCP
Daniel Zivkovic
Application Architectures with Hadoop - UK Hadoop User Group
Application Architectures with Hadoop - UK Hadoop User Group
hadooparchbook
Fraud Detection using Hadoop
Fraud Detection using Hadoop
hadooparchbook
Apache Kudu: Technical Deep Dive
Apache Kudu: Technical Deep Dive
Cloudera, Inc.
Oracle_Patching_Untold_Story_Final_Part2.pdf
Oracle_Patching_Untold_Story_Final_Part2.pdf
Alex446314
Architecting next generation big data platform
Architecting next generation big data platform
hadooparchbook
Azure Service Fabric Mesh
Azure Service Fabric Mesh
Udaiappa Ramachandran
Keynote - Cloudy Vision: How Cloud Integration Complicates Security
Keynote - Cloudy Vision: How Cloud Integration Complicates Security
CloudVillage
Big Data Day LA 2016/ NoSQL track - Apache Kudu: Fast Analytics on Fast Data,...
Big Data Day LA 2016/ NoSQL track - Apache Kudu: Fast Analytics on Fast Data,...
Data Con LA
Apache Accumulo Overview
Apache Accumulo Overview
Bill Havanki
Similar to NYC HUG - Application Architectures with Apache Hadoop
(20)
Architectural considerations for Hadoop Applications
Architectural considerations for Hadoop Applications
Strata NY 2014 - Architectural considerations for Hadoop applications tutorial
Strata NY 2014 - Architectural considerations for Hadoop applications tutorial
Hadoop Application Architectures tutorial - Strata London
Hadoop Application Architectures tutorial - Strata London
Strata EU tutorial - Architectural considerations for hadoop applications
Strata EU tutorial - Architectural considerations for hadoop applications
Application Architectures with Hadoop
Application Architectures with Hadoop
Application Architectures with Hadoop
Application Architectures with Hadoop
Application Architectures with Hadoop | Data Day Texas 2015
Application Architectures with Hadoop | Data Day Texas 2015
Architecting application with Hadoop - using clickstream analytics as an example
Architecting application with Hadoop - using clickstream analytics as an example
Hadoop Application Architectures tutorial at Big DataService 2015
Hadoop Application Architectures tutorial at Big DataService 2015
Video Analysis in Hadoop
Video Analysis in Hadoop
Simpler, faster, cheaper Enterprise Apps using only Spring Boot on GCP
Simpler, faster, cheaper Enterprise Apps using only Spring Boot on GCP
Application Architectures with Hadoop - UK Hadoop User Group
Application Architectures with Hadoop - UK Hadoop User Group
Fraud Detection using Hadoop
Fraud Detection using Hadoop
Apache Kudu: Technical Deep Dive
Apache Kudu: Technical Deep Dive
Oracle_Patching_Untold_Story_Final_Part2.pdf
Oracle_Patching_Untold_Story_Final_Part2.pdf
Architecting next generation big data platform
Architecting next generation big data platform
Azure Service Fabric Mesh
Azure Service Fabric Mesh
Keynote - Cloudy Vision: How Cloud Integration Complicates Security
Keynote - Cloudy Vision: How Cloud Integration Complicates Security
Big Data Day LA 2016/ NoSQL track - Apache Kudu: Fast Analytics on Fast Data,...
Big Data Day LA 2016/ NoSQL track - Apache Kudu: Fast Analytics on Fast Data,...
Apache Accumulo Overview
Apache Accumulo Overview
More from markgrover
From discovering to trusting data
From discovering to trusting data
markgrover
Amundsen lineage designs - community meeting, Dec 2020
Amundsen lineage designs - community meeting, Dec 2020
markgrover
Amundsen at Brex and Looker integration
Amundsen at Brex and Looker integration
markgrover
REA Group's journey with Data Cataloging and Amundsen
REA Group's journey with Data Cataloging and Amundsen
markgrover
Amundsen gremlin proxy design
Amundsen gremlin proxy design
markgrover
Amundsen: From discovering to security data
Amundsen: From discovering to security data
markgrover
Amundsen: From discovering to security data
Amundsen: From discovering to security data
markgrover
Data Discovery & Trust through Metadata
Data Discovery & Trust through Metadata
markgrover
Data Discovery and Metadata
Data Discovery and Metadata
markgrover
The Lyft data platform: Now and in the future
The Lyft data platform: Now and in the future
markgrover
Disrupting Data Discovery
Disrupting Data Discovery
markgrover
TensorFlow Extension (TFX) and Apache Beam
TensorFlow Extension (TFX) and Apache Beam
markgrover
Big Data at Speed
Big Data at Speed
markgrover
Near real-time anomaly detection at Lyft
Near real-time anomaly detection at Lyft
markgrover
Dogfooding data at Lyft
Dogfooding data at Lyft
markgrover
Fighting cybersecurity threats with Apache Spot
Fighting cybersecurity threats with Apache Spot
markgrover
Top 5 mistakes when writing Spark applications
Top 5 mistakes when writing Spark applications
markgrover
Top 5 mistakes when writing Spark applications
Top 5 mistakes when writing Spark applications
markgrover
Introduction to Hive and HCatalog
Introduction to Hive and HCatalog
markgrover
Hadoop and Hive in Enterprises
Hadoop and Hive in Enterprises
markgrover
More from markgrover
(20)
From discovering to trusting data
From discovering to trusting data
Amundsen lineage designs - community meeting, Dec 2020
Amundsen lineage designs - community meeting, Dec 2020
Amundsen at Brex and Looker integration
Amundsen at Brex and Looker integration
REA Group's journey with Data Cataloging and Amundsen
REA Group's journey with Data Cataloging and Amundsen
Amundsen gremlin proxy design
Amundsen gremlin proxy design
Amundsen: From discovering to security data
Amundsen: From discovering to security data
Amundsen: From discovering to security data
Amundsen: From discovering to security data
Data Discovery & Trust through Metadata
Data Discovery & Trust through Metadata
Data Discovery and Metadata
Data Discovery and Metadata
The Lyft data platform: Now and in the future
The Lyft data platform: Now and in the future
Disrupting Data Discovery
Disrupting Data Discovery
TensorFlow Extension (TFX) and Apache Beam
TensorFlow Extension (TFX) and Apache Beam
Big Data at Speed
Big Data at Speed
Near real-time anomaly detection at Lyft
Near real-time anomaly detection at Lyft
Dogfooding data at Lyft
Dogfooding data at Lyft
Fighting cybersecurity threats with Apache Spot
Fighting cybersecurity threats with Apache Spot
Top 5 mistakes when writing Spark applications
Top 5 mistakes when writing Spark applications
Top 5 mistakes when writing Spark applications
Top 5 mistakes when writing Spark applications
Introduction to Hive and HCatalog
Introduction to Hive and HCatalog
Hadoop and Hive in Enterprises
Hadoop and Hive in Enterprises
Recently uploaded
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
The Digital Insurer
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
debabhi2
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
Gabriella Davis
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
wesley chun
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
Igalia
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Safe Software
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
Safe Software
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
?#DUbAI#??##{{(☎️+971_581248768%)**%*]'#abortion pills for sale in dubai@
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
Product Anonymous
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
Principled Technologies
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
Khushali Kathiriya
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Miguel Araújo
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
panagenda
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
ThousandEyes
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
wesley chun
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
Andrey Devyatkin
Top 10 Most Downloaded Games on Play Store in 2024
Top 10 Most Downloaded Games on Play Store in 2024
SynarionITSolutions
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
lior mazor
Manulife - Insurer Innovation Award 2024
Manulife - Insurer Innovation Award 2024
The Digital Insurer
Recently uploaded
(20)
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
Top 10 Most Downloaded Games on Play Store in 2024
Top 10 Most Downloaded Games on Play Store in 2024
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
Manulife - Insurer Innovation Award 2024
Manulife - Insurer Innovation Award 2024
NYC HUG - Application Architectures with Apache Hadoop
1.
1 ApplicaAon Architectures
with Apache Headline Hadoop Goes Here Mark Grover | @mark_grover Speaker Name or Subhead Goes Here NYC HUG slideshare.com/markgrover October 14th, 2014 DO NOT USE PUBLICLY PRIOR TO 10/23/12 ©2014 Cloudera, Inc. All Rights Reserved.
2.
About Me •
CommiPer on Apache Bigtop, commiPer and PPMC member on Apache Sentry (incubaAng). • Contributor to Hadoop, Hive, Spark, Sqoop, Flume. • SoWware developer at Cloudera • @mark_grover 2 ©2014 Cloudera, Inc. All Rights Reserved.
3.
Co-‐authoring O’Reilly book
• @hadooparchbook • hadooparchitecturebook.com • Strata Hadoop World Tutorial • at 9 AM tomorrow ©2014 Cloudera, Inc. All Rights 3 Reserved.
4.
4 Case Study
Click Stream Analysis ©2014 Cloudera, Inc. All Rights Reserved.
5.
AnalyAcs ©2014 Cloudera,
Inc. All Rights 5 Reserved.
6.
AnalyAcs ©2014 Cloudera,
Inc. All Rights 6 Reserved.
7.
AnalyAcs ©2014 Cloudera,
Inc. All Rights 7 Reserved.
8.
AnalyAcs ©2014 Cloudera,
Inc. All Rights 8 Reserved.
9.
AnalyAcs ©2014 Cloudera,
Inc. All Rights 9 Reserved.
10.
AnalyAcs ©2014 Cloudera,
Inc. All Rights 10 Reserved.
11.
AnalyAcs ©2014 Cloudera,
Inc. All Rights 11 Reserved.
12.
Web Logs –
Combined Log Format 244.157.45.12 - - [17/Oct/2014:21:08:30 ] "GET /seatposts HTTP/1.0" 200 4463 "http://bestcyclingreviews.com/top_online_shops" "Mozilla/ 5.0 (Macintosh; Intel Mac OS X 10_9_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/36.0.1944.0 Safari/537.36” 244.157.45.12 - - [17/Oct/2014:21:59:59 ] "GET /Store/cart.jsp? productID=1023 HTTP/1.0" 200 3757 "http://www.casualcyclist.com" "Mozilla/5.0 (Linux; U; Android 2.3.5; en-us; HTC Vision Build/ GRI40) AppleWebKit/533.1 (KHTML, like Gecko) Version/4.0 Mobile Safari/533.1” ©2014 Cloudera, Inc. All Rights Reserved. 12
13.
Clickstream AnalyAcs ©2014
Cloudera, Inc. All Rights Reserved. 244.157.45.12 - - [17/Oct/ 2014:21:08:30 ] "GET /seatposts HTTP/1.0" 200 4463 "http:// bestcyclingreviews.com/ top_online_shops" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/ 36.0.1944.0 Safari/537.36” 13
14.
Similar use-‐cases •
Sensors – heart, agriculture, etc. • Casinos – session of a person at a table ©2014 Cloudera, Inc. All Rights 14 Reserved.
15.
Challenges of Hadoop
ImplementaAon ©2014 Cloudera, Inc. All Rights 15 Reserved.
16.
Challenges of Hadoop
ImplementaAon ©2014 Cloudera, Inc. All Rights 16 Reserved.
17.
Other challenges -‐
Architectural ConsideraAons • Storage managers? • HDFS? HBase? • Data storage and modeling: • File formats? Compression? Schema design? • Data movement • How do we actually get the data into Hadoop? How do we get it out? • Metadata • How do we manage data about the data? • Processing • How can we transform it? How do we query it? • OrchestraAon • How do we manage the workflow for all of this? ©2014 Cloudera, Inc. All Rights 17 Reserved.
18.
18 2. Processing
Since that’s all what the Ame allows todayJ ©2014 Cloudera, Inc. All Rights Reserved.
19.
Processing • De-‐duplicaAon
• Filtering • SessionizaAon 19 ©2014 Cloudera, Inc. All Rights Reserved.
20.
DeduplicaAon – remove
duplicate records 244.157.45.12 - - [17/Oct/2014:21:08:30 ] "GET /seatposts HTTP/1.0" 200 4463 "http://bestcyclingreviews.com/top_online_shops" "Mozilla/ 5.0 (Macintosh; Intel Mac OS X 10_9_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/36.0.1944.0 Safari/537.36” 244.157.45.12 - - [17/Oct/2014:21:08:30 ] "GET /seatposts HTTP/1.0" 200 4463 "http://bestcyclingreviews.com/top_online_shops" "Mozilla/ 5.0 (Macintosh; Intel Mac OS X 10_9_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/36.0.1944.0 Safari/537.36” ©2014 Cloudera, Inc. All Rights 20 Reserved.
21.
Filtering – filter
out invalid records 244.157.45.12 - - [17/Oct/2014:21:08:30 ] "GET /seatposts HTTP/1.0" 200 4463 "http://bestcyclingreviews.com/top_online_shops" "Mozilla/ 5.0 (Macintosh; Intel Mac OS X 10_9_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/36.0.1944.0 Safari/537.36” 244.157.45.12 - - [17/Oct/2014:21:59:59 ] "GET /Store/cart.jsp? productID=1023 HTTP/1.0" 200 3757 "http://www.casualcyclist.com" "Mozilla/5.0 (Linux; U… ©2014 Cloudera, Inc. All Rights 21 Reserved.
22.
SessionizaAon Website visit
©2014 Cloudera, Inc. All Rights 22 Reserved. Visitor 1 Session 1 Visitor 1 Session 2 Visitor 2 Session 1 > 30 minutes
23.
Why sessionize? Helps
answers quesAons like: • What is my website’s bounce rate? • i.e. how many % of visitors don’t go past the landing page? • Which markeAng channels (e.g. organic search, display ad, etc.) are leading to most sessions? • Which ones of those lead to most conversions (e.g. people buying things, signing up, etc.) • Do aPribuAon analysis – which channels are responsible for most conversions? 23 ©2014 Cloudera, Inc. All Rights Reserved.
24.
SessionizaAon 244.157.45.12 -
- [17/Oct/2014:21:08:30 ] "GET /seatposts HTTP/1.0" 200 4463 "http://bestcyclingreviews.com/top_online_shops" "Mozilla/ 5.0 (Macintosh; Intel Mac OS X 10_9_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/36.0.1944.0 Safari/537.36” 165 244.157.45.12 - - [17/Oct/2014:21:59:59 ] "GET /Store/cart.jsp? productID=1023 HTTP/1.0" 200 3757 "http://www.casualcyclist.com" "Mozilla/5.0 (Linux; U; Android 2.3.5; en-us; HTC Vision Build/ GRI40) AppleWebKit/533.1 (KHTML, like Gecko) Version/4.0 Mobile Safari/533.1” 166 ©2014 Cloudera, Inc. All Rights 24 Reserved.
25.
How to Sessionize?
1. Given a list of clicks, determine which clicks came from the same user 2. Given a parAcular user's clicks, determine if a given click is a part of a new session or a conAnuaAon of the previous session 25 ©2014 Cloudera, Inc. All Rights Reserved.
26.
#1 – Which
clicks are from same user? • We can use: • IP address (244.157.45.12) • Cookies (A9A3BECE0563982D) • IP address (244.157.45.12)and user agent string ((KHTML, like Gecko) Chrome/36.0.1944.0 Safari/ 537.36") 26 ©2014 Cloudera, Inc. All Rights Reserved.
27.
#1 – Which
clicks are from same user? • We can use: • IP address (244.157.45.12) • Cookies (A9A3BECE0563982D) • IP address (244.157.45.12)and user agent string ((KHTML, like Gecko) Chrome/36.0.1944.0 Safari/ 537.36") 27 ©2014 Cloudera, Inc. All Rights Reserved.
28.
#1 – Which
clicks are from same user? 244.157.45.12 - - [17/Oct/2014:21:08:30 ] "GET /seatposts HTTP/1.0" 200 4463 "http://bestcyclingreviews.com/top_online_shops" "Mozilla/ 5.0 (Macintosh; Intel Mac OS X 10_9_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/36.0.1944.0 Safari/537.36” 244.157.45.12 - - [17/Oct/2014:21:59:59 ] "GET /Store/cart.jsp? productID=1023 HTTP/1.0" 200 3757 "http://www.casualcyclist.com" "Mozilla/5.0 (Linux; U; Android 2.3.5; en-us; HTC Vision Build/ GRI40) AppleWebKit/533.1 (KHTML, like Gecko) Version/4.0 Mobile Safari/533.1” ©2014 Cloudera, Inc. All Rights 28 Reserved.
29.
#2 – Which
clicks part of the same session? 244.157.45.12 - - [17/Oct/2014:21:08:30 ] "GET /seatposts HTTP/1.0" 200 4463 "http://bestcyclingreviews.com/top_online_shops" "Mozilla/ 5.0 (Macintosh; Intel Mac OS X 10_9_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/36.0.1944.0 Safari/537.36” 244.157.45.12 - - [17/Oct/2014:21:59:59 ] "GET /Store/cart.jsp? productID=1023 HTTP/1.0" 200 3757 "http://www.casualcyclist.com" "Mozilla/5.0 (Linux; U; Android 2.3.5; en-us; HTC Vision Build/ GRI40) AppleWebKit/533.1 (KHTML, like Gecko) Version/4.0 Mobile Safari/533.1” ©2014 Cloudera, Inc. All Rights 29 Reserved. > 30 mins apart = different sessions
30.
30 SessionizaAon in
MapReduce github.com/hadooparchitecturebook/clickstream-‐tutorial ©2014 Cloudera, Inc. All Rights Reserved.
31.
SessionizaAon in MapReduce
log IP2, log log lines 31 ©2014 Cloudera, Inc. All Rights Reserved. Map Reduce Reduce Log line IP1, log lines IP1, lines Log line, session ID Map Map Log line Log line IP2, lines Log line, session ID
32.
Mapper for SessionizaAon
32 ©2014 Cloudera, Inc. All Rights Reserved. public static class SessionizeMapper extends Mapper<Object, Text, IpTimestampKey, Text> { private Matcher logRecordMatcher; public void map(Object key, Text value, Context context ) throws IOException, InterruptedException { logRecordMatcher = logRecordPattern.matcher(value.toString()); // We only emit something out if the record matches with our regex. Otherwise, we assume the record is busted and simply ignore it if (logRecordMatcher.matches()) { String ip = logRecordMatcher.group(1); DateTime timestamp = DateTime.parse(logRecordMatcher.group(2), TIMESTAMP_FORMATTER); Long unixTimestamp = timestamp.getMillis(); IpTimestampKey outputKey = new IpTimestampKey(ip, unixTimestamp); context.write(outputKey, value); } } }
33.
Reducer for SessionizaAon
public static class SessionizeReducer 33 ©2014 Cloudera, Inc. All Rights Reserved. extends Reducer<IpTimestampKey, Text, IpTimestampKey, Text> { private Text result = new Text(); public void reduce(IpTimestampKey key, Iterable<Text> values, Context context ) throws IOException, InterruptedException { // The sessionId generated here is per day, per IP. So, any queries // that will be done as if this session ID were global, would require // a combination of the day in question and IP as well. String sessionId = null; Long lastTimeStamp = null; for (Text value : values) { String logRecord = value.toString();
34.
Reducer for SessionizaAon
// If this is the first record for this user or it's been more than the 34 ©2014 Cloudera, Inc. All Rights Reserved. timeout since // the last click from this user, let's increment the session ID. if (lastTimeStamp == null || (key.getUnixTimestamp() -‐ lastTimeStamp > SESSION_TIMEOUT_IN_MS)) { sessionId = key.getIp() + "+" + key.getUnixTimestamp(); } lastTimeStamp = key.getUnixTimestamp(); result.set(logRecord + " " + sessionId); // Since we only care about printing out the entire record in the result, with session ID appended // at the end, we just emit out "null" for the key context.write(null, result); } } }
35.
Secondary sorAng –
by Amestamp • Need records to reducer to be grouped by IP address and sorted by Amestamp – a concept called secondary sor/ng • Instead of using just IP address as map output key and reduce input key • We use a composite key (IP, Amestamp) as map output key and reduce input key 35 ©2014 Cloudera, Inc. All Rights Reserved.
36.
Secondary sorAng –
vocabulary • Composite key – IP address, Amestamp • Natural key – IP address • Secondary sort key -‐ Amestamp 36 ©2014 Cloudera, Inc. All Rights Reserved.
37.
Secondary sorAng •
Custom Grouping Comparator – on Natural Key (IP) • Custom Sort Comparator – on Composite Key (IP, address) • Custom ParAAoner – on Natural Key (IP) job.setGroupingComparatorClass(NaturalKeyComparator.class ); job.setSortComparatorClass(CompositeKeyComparator.class); job.setPartitionerClass(NaturalKeyPartitioner.class); 37 ©2014 Cloudera, Inc. All Rights Reserved.
38.
38 Final Architecture
©2014 Cloudera, Inc. All Rights Reserved.
39.
Hadoop Cluster ©2014
Cloudera, Inc. All Rights 39 Reserved. BI/VisualizaAon tool (e.g. microstrategy) BI Analysts Spark For machine learning and graph processing R/Python StaAsAcal Analysis Custom Apps 3. Accessing 2. Processing 4. OrchestraAon Website users CRM System 1. IngesAon via Oozie OperaAonal Data Store Via Sqoop Web servers
40.
Final Architecture –
High Level Overview 40 Data Sources IngesAon Data Storage/ Processing Data ReporAng/ Analysis ©2014 Cloudera, Inc. All Rights Reserved.
41.
Final Architecture –
High Level Overview 41 Data Sources IngesAon Data Storage/ Processing Data ReporAng/ Analysis ©2014 Cloudera, Inc. All Rights Reserved.
42.
Final Architecture –
IngesAon 42 Web App Avro Agent Web App Avro Agent Web App Avro Agent Web App Avro Agent Web App Avro Agent Web App Avro Agent Web App Avro Agent Web App Avro Agent Flume Agent Flume Agent Flume Agent Flume Agent Fan-‐in PaPern MulA Agents for Failover and rolling restarts HDFS ©2014 Cloudera, Inc. All Rights Reserved.
43.
Final Architecture –
High Level Overview 43 Data Sources IngesAon Data Storage/ Processing Data ReporAng/ Analysis ©2014 Cloudera, Inc. All Rights Reserved.
44.
Final Architecture –
Storage and Processing /etl/weblogs/20140331/ /etl/weblogs/20140401/ … 44 Data Processing /data/markeAng/clickstream/bouncerate/ /data/markeAng/clickstream/aPribuAon/ … ©2014 Cloudera, Inc. All Rights Reserved.
45.
Final Architecture –
High Level Overview 45 Data Sources IngesAon Data Storage/ Processing Data ReporAng/ Analysis ©2014 Cloudera, Inc. All Rights Reserved.
46.
Final Architecture –
Data Access 46 Hive/ Impala BI/ AnalyAcs Tools JDBC/ODBC DWH Sqoop Local Disk R, etc. DB import tool ©2014 Cloudera, Inc. All Rights Reserved.
47.
Contact info •
Mark Grover • @mark_grover • www.linkedin.com/in/grovermark • Slides at slideshare.net/markgrover 47 ©2014 Cloudera, Inc. All Rights Reserved.
48.
48 ©2014 Cloudera,
Inc. All Rights Reserved.
Download now