Hands on Hadoop 
Daniel Templeton & Inyoung Cho 
Cloudera, Inc.
Your Hosts 
Daniel Templeton 
• Certification Developer 
• Crusty, old HPC guy 
• Likes Perl 
Inyoung Cho 
• Certification Developer 
• Recovering Java Evangelist 
• Invented JavaOne Hands-on Labs 
©2014 Cloudera, Inc. All rights reserved.
What is “Big Data”? 
• Super-cool marketing buzzword 
• “Come see our new line of BIG DATA toasters…” 
• “The Five V’s” 
• Any data that is difficult to store in a traditional RDBMS 
• Too big, changes schemas too often, unstructured, … 
What is Hadoop? 
HDFS in a Nutshell 
• Distributed “file system” service 
• Highly scalable and fault resilient 
• Chunks files into “blocks” that are replicated and distributed across the cluster 
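The chunk-and-replicate idea can be illustrated with a toy model. This is only a sketch in plain Python, not HDFS code: the function names, the node names, the tiny block size, and the round-robin placement are all assumptions made for illustration. Real HDFS uses far larger blocks and a rack-aware placement policy.

```python
def split_into_blocks(data: bytes, block_size: int) -> list[bytes]:
    """Cut the file contents into fixed-size blocks; the last one may be shorter."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def place_replicas(num_blocks: int, nodes: list[str], replication: int) -> dict[int, list[str]]:
    """Assign each block to `replication` distinct nodes.

    Hypothetical round-robin policy: block b goes to nodes b, b+1, ...
    (wrapping around). This is NOT the placement policy HDFS uses.
    """
    if replication > len(nodes):
        raise ValueError("cannot place more replicas than there are nodes")
    return {
        b: [nodes[(b + r) % len(nodes)] for r in range(replication)]
        for b in range(num_blocks)
    }


# Toy input: a 10-byte "file", 4-byte blocks, 4 nodes, 3 replicas per block.
blocks = split_into_blocks(b"abcdefghij", block_size=4)
placement = place_replicas(len(blocks), ["node1", "node2", "node3", "node4"], replication=3)

for block_id, holders in placement.items():
    print(block_id, blocks[block_id], holders)
```

Because every block lives on several nodes, losing any single node in this toy cluster still leaves at least two copies of each block.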
MapReduce in a Nutshell 
• Embarrassingly parallel batch execution engine 
• Two phases: map and reduce 
• https://www.youtube.com/watch?v=bcjSe0xCHbE 
• Tasks are scheduled to run where the data is 
• Jobs are written against the Java API 
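The two phases can be sketched with the classic word count. This is a single-process illustration in plain Python, not the Hadoop Java API: the function names and the in-memory shuffle step are assumptions, and a real job would run many map and reduce tasks in parallel across the cluster.

```python
from collections import defaultdict


def map_phase(line: str):
    """Map: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs):
    """Group all emitted values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: list[int]):
    """Reduce: collapse all values for one key into a single result."""
    return key, sum(values)


def run_job(lines: list[str]) -> dict[str, int]:
    """Run map over every line, shuffle, then reduce each key group."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


counts = run_job(["the quick brown fox", "the lazy dog", "the fox"])
print(counts)
```

Each call to `map_phase` depends only on its own line, and each call to `reduce_phase` depends only on its own key, which is what makes the work embarrassingly parallel.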
Hive in a Nutshell 
• SQL engine for Hadoop 
• Translates HiveQL into MapReduce jobs 
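To give a feel for that translation, here is a rough sketch of how a query such as `SELECT dept, SUM(salary) FROM employees GROUP BY dept` lines up with a map step and a reduce step. The `employees` table, its rows, and the function names are hypothetical, and the code is plain Python rather than anything Hive generates; Hive's actual query planner is considerably more involved.

```python
from collections import defaultdict

# Hypothetical rows of a hypothetical `employees` table: (name, dept, salary).
ROWS = [
    ("ann", "eng", 100),
    ("bob", "eng", 120),
    ("cat", "ops", 90),
]


def mapper(row):
    """Map: emit the GROUP BY column as the key and the aggregated column as the value."""
    _name, dept, salary = row
    return dept, salary


def reducer(dept, salaries):
    """Reduce: apply the aggregate function (here SUM) to one group."""
    return dept, sum(salaries)


def run_query(rows):
    """Group mapper output by key, then reduce each group."""
    grouped = defaultdict(list)
    for row in rows:
        key, value = mapper(row)
        grouped[key].append(value)
    return sorted(reducer(key, values) for key, values in grouped.items())


result = run_query(ROWS)
print(result)
```

The GROUP BY column becomes the map output key, so the shuffle delivers every row of a group to the same reducer.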
Impala in a Nutshell 
• Hive without the MapReduce 
Pig in a Nutshell 
• Script-like language for data operations 
• Translates into MapReduce jobs 
The Lab 
• Self-paced 
• Should take right about 2 hours 
• “Additional Exercises” if you finish early 
• Inyoung and I are here to answer questions 
• Have fun! 
Aaron Myers & Daniel Templeton
