SlideShare a Scribd company logo
1 of 12
1 
Hands on Hadoop 
Daniel Templeton & Inyoung Cho 
Cloudera, Inc.
2 
Your Hosts 
Daniel Templeton 
• Certification Developer 
• Crusty, old HPC guy 
• Likes Perl 
Inyoung Cho 
• Certification Developer 
• Recovering Java 
Evangelist 
• Invented JavaOne Hands-on 
Labs 
©2014 Cloudera, Inc. 2 All rights reserved.
3 
What is “Big Data”? 
• Super-cool marketing buzz word 
• “Come see our new line of BIG DATA toasters…” 
• “The Five V’s” 
• Any data that is difficult to store in a traditional 
RDBMS 
• Too big, changes schemas too often, unstructured, … 
©2014 Cloudera, Inc. 3 All rights reserved.
What is Hadoop? 
©2014 Cloudera, Inc. 4 All rights reserved.
What is Hadoop? 
©2014 Cloudera, Inc. 5 All rights reserved.
6 
HDFS in a Nutshell 
• Distributed “file system” service 
• Highly scalable and fault resilient 
• Chunks files into “blocks” that are replicated and 
distributed across the cluster 
©2014 Cloudera, Inc. 6 All rights reserved.
7 
MapReduce in a Nutshell 
• Embarrassingly parallel batch execution engine 
• Two phases: map and reduce 
• https://www.youtube.com/watch?v=bcjSe0xCHbE 
• Tasks are scheduled to run where the data is 
• Jobs are written to Java API 
©2014 Cloudera, Inc. 7 All rights reserved.
8 
Hive in a Nutshell 
• SQL engine for Hadoop 
• Translates HiveQL into MapReduce jobs 
©2014 Cloudera, Inc. 8 All rights reserved.
9 
Impala in a Nutshell 
• Hive with the MapReduce 
©2014 Cloudera, Inc. 9 All rights reserved.
10 
Pig in a Nutshell 
• Script-like language for data operations 
• Translates into MapReduce jobs 
©2014 Cloudera, Inc. 10 All rights reserved.
11 
The Lab 
• Self-paced 
• Should take right about 2 hours 
• “Additional Exercises” if you finish early 
• Inyoung and I are here to answer questions 
• Have fun! 
©2014 Cloudera, Inc. 11 All rights reserved.
12 ©2014 Cloudera, Inc. All rights reserved. 
Aaron Myers & 
Daniel Templeton

More Related Content

What's hot

Harnessing Spark and Cassandra with Groovy
Harnessing Spark and Cassandra with GroovyHarnessing Spark and Cassandra with Groovy
Harnessing Spark and Cassandra with GroovySteve Pember
 
Chef ignited a DevOps revolution – BK Box
Chef ignited a DevOps revolution – BK BoxChef ignited a DevOps revolution – BK Box
Chef ignited a DevOps revolution – BK BoxChef Software, Inc.
 
Hbasecon2013 Wrap Up
Hbasecon2013 Wrap UpHbasecon2013 Wrap Up
Hbasecon2013 Wrap UpMinwoo Kim
 
Serverspec and Sensu - Testing and Monitoring collide
Serverspec and Sensu - Testing and Monitoring collideServerspec and Sensu - Testing and Monitoring collide
Serverspec and Sensu - Testing and Monitoring collidem_richardson
 
AWS for Start-ups - Case Study - PeoplePerHour
AWS for Start-ups - Case Study - PeoplePerHour AWS for Start-ups - Case Study - PeoplePerHour
AWS for Start-ups - Case Study - PeoplePerHour Amazon Web Services
 
NLUUG print conference May 26 2016
NLUUG print conference May 26 2016NLUUG print conference May 26 2016
NLUUG print conference May 26 2016Igmar Palsenberg
 
Wido den hollander cloud stack and ceph
Wido den hollander   cloud stack and cephWido den hollander   cloud stack and ceph
Wido den hollander cloud stack and cephShapeBlue
 
Python & Cassandra - Best Friends
Python & Cassandra - Best FriendsPython & Cassandra - Best Friends
Python & Cassandra - Best FriendsJon Haddad
 
Apache Cassandra Management
Apache Cassandra ManagementApache Cassandra Management
Apache Cassandra ManagementInstaclustr
 
Open Datacentre
Open DatacentreOpen Datacentre
Open DatacentreDes Drury
 
Orchestrating VM & Container Deployments
Orchestrating VM & Container DeploymentsOrchestrating VM & Container Deployments
Orchestrating VM & Container DeploymentsLars Wander
 
Cassandra @ Sony: The good, the bad, and the ugly part 2
Cassandra @ Sony: The good, the bad, and the ugly part 2Cassandra @ Sony: The good, the bad, and the ugly part 2
Cassandra @ Sony: The good, the bad, and the ugly part 2DataStax Academy
 
Scalable On-Demand Hadoop Clusters with Docker and Mesos
Scalable On-Demand Hadoop Clusters with Docker and MesosScalable On-Demand Hadoop Clusters with Docker and Mesos
Scalable On-Demand Hadoop Clusters with Docker and Mesosnelsonadpresent
 
Chef vs Puppet vs Ansible vs SaltStack | Configuration Management Tools Compa...
Chef vs Puppet vs Ansible vs SaltStack | Configuration Management Tools Compa...Chef vs Puppet vs Ansible vs SaltStack | Configuration Management Tools Compa...
Chef vs Puppet vs Ansible vs SaltStack | Configuration Management Tools Compa...Edureka!
 
Kubernetes training
Kubernetes trainingKubernetes training
Kubernetes trainingDes Drury
 
DevOps, Cloud, and the Death of Backup Tape Changers
DevOps, Cloud, and the Death of Backup Tape ChangersDevOps, Cloud, and the Death of Backup Tape Changers
DevOps, Cloud, and the Death of Backup Tape Changerske4qqq
 
Large Scale Data Analytics with Spark and Cassandra on the DSE Platform
Large Scale Data Analytics with Spark and Cassandra on the DSE PlatformLarge Scale Data Analytics with Spark and Cassandra on the DSE Platform
Large Scale Data Analytics with Spark and Cassandra on the DSE PlatformDataStax Academy
 

What's hot (20)

Amazon EMR
Amazon EMRAmazon EMR
Amazon EMR
 
Harnessing Spark and Cassandra with Groovy
Harnessing Spark and Cassandra with GroovyHarnessing Spark and Cassandra with Groovy
Harnessing Spark and Cassandra with Groovy
 
Chef ignited a DevOps revolution – BK Box
Chef ignited a DevOps revolution – BK BoxChef ignited a DevOps revolution – BK Box
Chef ignited a DevOps revolution – BK Box
 
Hbasecon2013 Wrap Up
Hbasecon2013 Wrap UpHbasecon2013 Wrap Up
Hbasecon2013 Wrap Up
 
Serverspec and Sensu - Testing and Monitoring collide
Serverspec and Sensu - Testing and Monitoring collideServerspec and Sensu - Testing and Monitoring collide
Serverspec and Sensu - Testing and Monitoring collide
 
AWS for Start-ups - Case Study - PeoplePerHour
AWS for Start-ups - Case Study - PeoplePerHour AWS for Start-ups - Case Study - PeoplePerHour
AWS for Start-ups - Case Study - PeoplePerHour
 
NLUUG print conference May 26 2016
NLUUG print conference May 26 2016NLUUG print conference May 26 2016
NLUUG print conference May 26 2016
 
Wido den hollander cloud stack and ceph
Wido den hollander   cloud stack and cephWido den hollander   cloud stack and ceph
Wido den hollander cloud stack and ceph
 
Python & Cassandra - Best Friends
Python & Cassandra - Best FriendsPython & Cassandra - Best Friends
Python & Cassandra - Best Friends
 
Apache Cassandra Management
Apache Cassandra ManagementApache Cassandra Management
Apache Cassandra Management
 
Open Datacentre
Open DatacentreOpen Datacentre
Open Datacentre
 
Orchestrating VM & Container Deployments
Orchestrating VM & Container DeploymentsOrchestrating VM & Container Deployments
Orchestrating VM & Container Deployments
 
Cassandra @ Sony: The good, the bad, and the ugly part 2
Cassandra @ Sony: The good, the bad, and the ugly part 2Cassandra @ Sony: The good, the bad, and the ugly part 2
Cassandra @ Sony: The good, the bad, and the ugly part 2
 
Scalable On-Demand Hadoop Clusters with Docker and Mesos
Scalable On-Demand Hadoop Clusters with Docker and MesosScalable On-Demand Hadoop Clusters with Docker and Mesos
Scalable On-Demand Hadoop Clusters with Docker and Mesos
 
Chef vs Puppet vs Ansible vs SaltStack | Configuration Management Tools Compa...
Chef vs Puppet vs Ansible vs SaltStack | Configuration Management Tools Compa...Chef vs Puppet vs Ansible vs SaltStack | Configuration Management Tools Compa...
Chef vs Puppet vs Ansible vs SaltStack | Configuration Management Tools Compa...
 
Kubernetes training
Kubernetes trainingKubernetes training
Kubernetes training
 
DevOps, Cloud, and the Death of Backup Tape Changers
DevOps, Cloud, and the Death of Backup Tape ChangersDevOps, Cloud, and the Death of Backup Tape Changers
DevOps, Cloud, and the Death of Backup Tape Changers
 
Large Scale Data Analytics with Spark and Cassandra on the DSE Platform
Large Scale Data Analytics with Spark and Cassandra on the DSE PlatformLarge Scale Data Analytics with Spark and Cassandra on the DSE Platform
Large Scale Data Analytics with Spark and Cassandra on the DSE Platform
 
Way to cloud
Way to cloudWay to cloud
Way to cloud
 
Openstack summit 2015
Openstack summit 2015Openstack summit 2015
Openstack summit 2015
 

Viewers also liked (18)

Healthcare presentation
Healthcare presentationHealthcare presentation
Healthcare presentation
 
Facebooks dilemma
Facebooks dilemmaFacebooks dilemma
Facebooks dilemma
 
Privacy and security on twitter
Privacy and security on twitterPrivacy and security on twitter
Privacy and security on twitter
 
Team Building and Leadership Development India
Team Building and Leadership Development IndiaTeam Building and Leadership Development India
Team Building and Leadership Development India
 
Brandingwineandmeat11202005
Brandingwineandmeat11202005Brandingwineandmeat11202005
Brandingwineandmeat11202005
 
Best Global Brands 2010 U S
Best  Global  Brands 2010  U SBest  Global  Brands 2010  U S
Best Global Brands 2010 U S
 
Chaparral biome
Chaparral biomeChaparral biome
Chaparral biome
 
Work Strategy
Work StrategyWork Strategy
Work Strategy
 
7waystousesocialmediatobuildbrands Key 090417111441 Phpapp01
7waystousesocialmediatobuildbrands Key 090417111441 Phpapp017waystousesocialmediatobuildbrands Key 090417111441 Phpapp01
7waystousesocialmediatobuildbrands Key 090417111441 Phpapp01
 
Esai
EsaiEsai
Esai
 
Road signs
Road signsRoad signs
Road signs
 
Power and politics
Power and politicsPower and politics
Power and politics
 
July 2012 - Blue Grass Chemical Agent-Destruction Pilot Plant Monthly Status ...
July 2012 - Blue Grass Chemical Agent-Destruction Pilot Plant Monthly Status ...July 2012 - Blue Grass Chemical Agent-Destruction Pilot Plant Monthly Status ...
July 2012 - Blue Grass Chemical Agent-Destruction Pilot Plant Monthly Status ...
 
Sildes on different topics
Sildes on different topicsSildes on different topics
Sildes on different topics
 
Collective bargaining plm
Collective bargaining plmCollective bargaining plm
Collective bargaining plm
 
Final Marketing Presentation
Final Marketing PresentationFinal Marketing Presentation
Final Marketing Presentation
 
N3 (Bunpou)
N3 (Bunpou)N3 (Bunpou)
N3 (Bunpou)
 
CPA journal lurie shuv article
CPA journal lurie shuv articleCPA journal lurie shuv article
CPA journal lurie shuv article
 

Similar to Java one14 handsonhadoop

Applications on Hadoop
Applications on HadoopApplications on Hadoop
Applications on Hadoopmarkgrover
 
Intro to Hadoop Presentation at Carnegie Mellon - Silicon Valley
Intro to Hadoop Presentation at Carnegie Mellon - Silicon ValleyIntro to Hadoop Presentation at Carnegie Mellon - Silicon Valley
Intro to Hadoop Presentation at Carnegie Mellon - Silicon Valleymarkgrover
 
Running Hadoop as Service in AltiScale Platform
Running Hadoop as Service in AltiScale PlatformRunning Hadoop as Service in AltiScale Platform
Running Hadoop as Service in AltiScale PlatformInMobi Technology
 
Webinar: Productionizing Hadoop: Lessons Learned - 20101208
Webinar: Productionizing Hadoop: Lessons Learned - 20101208Webinar: Productionizing Hadoop: Lessons Learned - 20101208
Webinar: Productionizing Hadoop: Lessons Learned - 20101208Cloudera, Inc.
 
Application architectures with hadoop – big data techcon 2014
Application architectures with hadoop – big data techcon 2014Application architectures with hadoop – big data techcon 2014
Application architectures with hadoop – big data techcon 2014Jonathan Seidman
 
Application architectures with Hadoop – Big Data TechCon 2014
Application architectures with Hadoop – Big Data TechCon 2014Application architectures with Hadoop – Big Data TechCon 2014
Application architectures with Hadoop – Big Data TechCon 2014hadooparchbook
 
Impala use case @ edge
Impala use case @ edgeImpala use case @ edge
Impala use case @ edgeRam Kedem
 
Hadoop Essentials -- The What, Why and How to Meet Agency Objectives
Hadoop Essentials -- The What, Why and How to Meet Agency ObjectivesHadoop Essentials -- The What, Why and How to Meet Agency Objectives
Hadoop Essentials -- The What, Why and How to Meet Agency ObjectivesCloudera, Inc.
 
Big data - Online Training
Big data - Online TrainingBig data - Online Training
Big data - Online TrainingLearntek1
 
Houston Hadoop Meetup Presentation by Vikram Oberoi of Cloudera
Houston Hadoop Meetup Presentation by Vikram Oberoi of ClouderaHouston Hadoop Meetup Presentation by Vikram Oberoi of Cloudera
Houston Hadoop Meetup Presentation by Vikram Oberoi of ClouderaMark Kerzner
 
OpenStack and Ceph case study at the University of Alabama
OpenStack and Ceph case study at the University of AlabamaOpenStack and Ceph case study at the University of Alabama
OpenStack and Ceph case study at the University of AlabamaKamesh Pemmaraju
 
Case Study: University Alabama-Birmingham.
Case Study: University Alabama-Birmingham.Case Study: University Alabama-Birmingham.
Case Study: University Alabama-Birmingham.Red_Hat_Storage
 
Building a Hadoop Data Warehouse with Impala
Building a Hadoop Data Warehouse with ImpalaBuilding a Hadoop Data Warehouse with Impala
Building a Hadoop Data Warehouse with ImpalaSwiss Big Data User Group
 
Big data and mstr bridge the elephant
Big data and mstr   bridge the elephantBig data and mstr   bridge the elephant
Big data and mstr bridge the elephantKognitio
 
Hashicorp at holaluz
Hashicorp at holaluzHashicorp at holaluz
Hashicorp at holaluzRicard Clau
 
Building a Hadoop Data Warehouse with Impala
Building a Hadoop Data Warehouse with ImpalaBuilding a Hadoop Data Warehouse with Impala
Building a Hadoop Data Warehouse with Impalahuguk
 
Data Science and CDSW
Data Science and CDSWData Science and CDSW
Data Science and CDSWJason Hubbard
 
PyData: The Next Generation | Data Day Texas 2015
PyData: The Next Generation | Data Day Texas 2015PyData: The Next Generation | Data Day Texas 2015
PyData: The Next Generation | Data Day Texas 2015Cloudera, Inc.
 

Similar to Java one14 handsonhadoop (20)

Applications on Hadoop
Applications on HadoopApplications on Hadoop
Applications on Hadoop
 
YARN
YARNYARN
YARN
 
Intro to Hadoop Presentation at Carnegie Mellon - Silicon Valley
Intro to Hadoop Presentation at Carnegie Mellon - Silicon ValleyIntro to Hadoop Presentation at Carnegie Mellon - Silicon Valley
Intro to Hadoop Presentation at Carnegie Mellon - Silicon Valley
 
Running Hadoop as Service in AltiScale Platform
Running Hadoop as Service in AltiScale PlatformRunning Hadoop as Service in AltiScale Platform
Running Hadoop as Service in AltiScale Platform
 
Webinar: Productionizing Hadoop: Lessons Learned - 20101208
Webinar: Productionizing Hadoop: Lessons Learned - 20101208Webinar: Productionizing Hadoop: Lessons Learned - 20101208
Webinar: Productionizing Hadoop: Lessons Learned - 20101208
 
Application architectures with hadoop – big data techcon 2014
Application architectures with hadoop – big data techcon 2014Application architectures with hadoop – big data techcon 2014
Application architectures with hadoop – big data techcon 2014
 
Application architectures with Hadoop – Big Data TechCon 2014
Application architectures with Hadoop – Big Data TechCon 2014Application architectures with Hadoop – Big Data TechCon 2014
Application architectures with Hadoop – Big Data TechCon 2014
 
Impala use case @ edge
Impala use case @ edgeImpala use case @ edge
Impala use case @ edge
 
Hadoop Essentials -- The What, Why and How to Meet Agency Objectives
Hadoop Essentials -- The What, Why and How to Meet Agency ObjectivesHadoop Essentials -- The What, Why and How to Meet Agency Objectives
Hadoop Essentials -- The What, Why and How to Meet Agency Objectives
 
Big data - Online Training
Big data - Online TrainingBig data - Online Training
Big data - Online Training
 
Houston Hadoop Meetup Presentation by Vikram Oberoi of Cloudera
Houston Hadoop Meetup Presentation by Vikram Oberoi of ClouderaHouston Hadoop Meetup Presentation by Vikram Oberoi of Cloudera
Houston Hadoop Meetup Presentation by Vikram Oberoi of Cloudera
 
OpenStack and Ceph case study at the University of Alabama
OpenStack and Ceph case study at the University of AlabamaOpenStack and Ceph case study at the University of Alabama
OpenStack and Ceph case study at the University of Alabama
 
Case Study: University Alabama-Birmingham.
Case Study: University Alabama-Birmingham.Case Study: University Alabama-Birmingham.
Case Study: University Alabama-Birmingham.
 
Building a Hadoop Data Warehouse with Impala
Building a Hadoop Data Warehouse with ImpalaBuilding a Hadoop Data Warehouse with Impala
Building a Hadoop Data Warehouse with Impala
 
Big data and mstr bridge the elephant
Big data and mstr   bridge the elephantBig data and mstr   bridge the elephant
Big data and mstr bridge the elephant
 
Hashicorp at holaluz
Hashicorp at holaluzHashicorp at holaluz
Hashicorp at holaluz
 
Building a Hadoop Data Warehouse with Impala
Building a Hadoop Data Warehouse with ImpalaBuilding a Hadoop Data Warehouse with Impala
Building a Hadoop Data Warehouse with Impala
 
Data Science and CDSW
Data Science and CDSWData Science and CDSW
Data Science and CDSW
 
PyData: The Next Generation | Data Day Texas 2015
PyData: The Next Generation | Data Day Texas 2015PyData: The Next Generation | Data Day Texas 2015
PyData: The Next Generation | Data Day Texas 2015
 
50 Shades of SQL
50 Shades of SQL50 Shades of SQL
50 Shades of SQL
 

More from templedf

Couchbase Server
Couchbase ServerCouchbase Server
Couchbase Servertempledf
 
Supermicro High Performance Enterprise Hadoop Infrastructure
Supermicro High Performance Enterprise Hadoop InfrastructureSupermicro High Performance Enterprise Hadoop Infrastructure
Supermicro High Performance Enterprise Hadoop Infrastructuretempledf
 
Revolution Analytics
Revolution AnalyticsRevolution Analytics
Revolution Analyticstempledf
 
Datameer Analytics Solution
Datameer Analytics SolutionDatameer Analytics Solution
Datameer Analytics Solutiontempledf
 
Puppet Labs Puppet Enterprise
Puppet Labs Puppet EnterprisePuppet Labs Puppet Enterprise
Puppet Labs Puppet Enterprisetempledf
 
Pervasive DataRush
Pervasive DataRushPervasive DataRush
Pervasive DataRushtempledf
 
Composite Information Server
Composite Information ServerComposite Information Server
Composite Information Servertempledf
 

More from templedf (9)

Couchbase Server
Couchbase ServerCouchbase Server
Couchbase Server
 
Supermicro High Performance Enterprise Hadoop Infrastructure
Supermicro High Performance Enterprise Hadoop InfrastructureSupermicro High Performance Enterprise Hadoop Infrastructure
Supermicro High Performance Enterprise Hadoop Infrastructure
 
Revolution Analytics
Revolution AnalyticsRevolution Analytics
Revolution Analytics
 
Talend
TalendTalend
Talend
 
Datameer Analytics Solution
Datameer Analytics SolutionDatameer Analytics Solution
Datameer Analytics Solution
 
Puppet Labs Puppet Enterprise
Puppet Labs Puppet EnterprisePuppet Labs Puppet Enterprise
Puppet Labs Puppet Enterprise
 
Couchbase
CouchbaseCouchbase
Couchbase
 
Pervasive DataRush
Pervasive DataRushPervasive DataRush
Pervasive DataRush
 
Composite Information Server
Composite Information ServerComposite Information Server
Composite Information Server
 

Recently uploaded

HTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation StrategiesHTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation StrategiesBoston Institute of Analytics
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdflior mazor
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...apidays
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProduct Anonymous
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Enterprise Knowledge
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUK Journal
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)wesley chun
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobeapidays
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 

Recently uploaded (20)

HTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation StrategiesHTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation Strategies
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 

Java one14 handsonhadoop

  • 1. 1 Hands on Hadoop Daniel Templeton & Inyoung Cho Cloudera, Inc.
  • 2. 2 Your Hosts Daniel Templeton • Certification Developer • Crusty, old HPC guy • Likes Perl Inyoung Cho • Certification Developer • Recovering Java Evangelist • Invented JavaOne Hands-on Labs ©2014 Cloudera, Inc. 2 All rights reserved.
  • 3. 3 What is “Big Data”? • Super-cool marketing buzz word • “Come see our new line of BIG DATA toasters…” • “The Five V’s” • Any data that is difficult to store in a traditional RDBMS • Too big, changes schemas too often, unstructured, … ©2014 Cloudera, Inc. 3 All rights reserved.
  • 4. What is Hadoop? ©2014 Cloudera, Inc. 4 All rights reserved.
  • 5. What is Hadoop? ©2014 Cloudera, Inc. 5 All rights reserved.
  • 6. 6 HDFS in a Nutshell • Distributed “file system” service • Highly scalable and fault resilient • Chunks files into “blocks” that are replicated and distributed across the cluster ©2014 Cloudera, Inc. 6 All rights reserved.
  • 7. 7 MapReduce in a Nutshell • Embarrassingly parallel batch execution engine • Two phases: map and reduce • https://www.youtube.com/watch?v=bcjSe0xCHbE • Tasks are scheduled to run where the data is • Jobs are written to Java API ©2014 Cloudera, Inc. 7 All rights reserved.
  • 8. 8 Hive in a Nutshell • SQL engine for Hadoop • Translates HiveQL into MapReduce jobs ©2014 Cloudera, Inc. 8 All rights reserved.
  • 9. 9 Impala in a Nutshell • Hive with the MapReduce ©2014 Cloudera, Inc. 9 All rights reserved.
  • 10. 10 Pig in a Nutshell • Script-like language for data operations • Translates into MapReduce jobs ©2014 Cloudera, Inc. 10 All rights reserved.
  • 11. 11 The Lab • Self-paced • Should take right about 2 hours • “Additional Exercises” if you finish early • Inyoung and I are here to answer questions • Have fun! ©2014 Cloudera, Inc. 11 All rights reserved.
  • 12. 12 ©2014 Cloudera, Inc. All rights reserved. Aaron Myers & Daniel Templeton