Localized Hadoop Development
How to get up and running quickly by Tim Bytnar
This Photo by Unknown Author is licensed under CC BY-SA
Tim Bytnar
17 years in the industry
Data Engineering
Microsoft Development and Application Stack
Systems Automation
Datacenter Infrastructure
Network Engineering
Email: Tim.Bytnar@Daugherty.com
LinkedIn: https://www.linkedin.com/in/timbytnar/
I have not failed. I've just found 10,000 ways that won't work.
- Thomas A. Edison
What is the problem?
Hadoop development has a steep requirement of having access to an environment that allows you to freely explore the overwhelming ecosystem
Are there other options?
• CLOUD PROVIDER “FREE” TIME
• BOOK LEARNING OR VIDEO TRAINING
• HOME LAB (IF YOU HAVE ONE OF THESE LYING AROUND LIKE I DON’T)
What do you propose?
Dockerized Hadoop and Spark Environments
What the environment is for.
• Learning Hadoop!
• Developing…
• BASH Scripts
• Hive Automations
• Spark Processing
• Data Analysis (Tableau, PowerBI, Jupyter, etc…)
• Rapid Proof of Concept
• Will this dataset work in Hadoop?
• What advantages would Spark give me for this workload?
What the environment is NOT for.
Demo Time
How to get started?
> git clone https://github.com/tbytnar/docker-hive.git
Want any help?
The repository is public and open for pull requests or forks
Future Plans
• Keep it updated
• Add more modularity
• Add walkthroughs and challenges
• Improve Cross-platform Portability
• Baseline Performance Optimized Version
Questions and Answers
Tim Bytnar
Email: Tim.Bytnar@Daugherty.com
LinkedIn: https://www.linkedin.com/in/timbytnar/
> git clone https://github.com/tbytnar/docker-hive.git
Thank you to:
Ivan Ermilov and his team at Big Data Europe
http://github.com/big-data-europe/docker-hadoop
http://github.com/big-data-europe/docker-hive


Editor's Notes

  1. Thank you for attending today and thank you for giving me your time. Tonight, I’ll be talking a bit about training and developing in Hadoop and particularly the challenges of doing so.
  2. First that awkward narcissistic slide where I tell you a little about myself. Like many of you I grew up lovingly addicted to technology, especially computers. Seventeen years ago I finally turned that passion into a career and over that time I’ve gotten my hands into many different verticals. Much of that time has been spent working with data either as a DBA or as an engineer. Paired with that has been a lot of time in the Microsoft stack either developing and supporting software applications or deploying and managing server infrastructure. As most of my career has been spent in managed hosting, I’ve also had quite a bit of experience working with systems automation, monitoring, infrastructure design and implementation, and a little dabbling in network engineering. I’ve put my favorite quote there by Thomas Edison. [READ THE QUOTE] You’ll find out why I like that quote so much in a bit.
  3. So, what IS the problem exactly? Well, I should probably start with my story. I got interested in Big Data several years ago when the term became mainstream. I did my typical Google-fu to see what I could learn about the technology and maybe convince my managers to look at implementing it. No dice. It felt like the more I dug the more questions I had. Hadoop, HDFS, YARN, Pig, Sqoop, MapReduce, Spark, Hive, Solr, Lucene, ZooKeeper, Oozie… and I’ve only scratched the surface of the entire ecosystem. By the time I got INTO big data and Hadoop, it was already overwhelming. Alright fine, I’ll knuckle down and get a private environment set up for myself so I can start learning this behemoth. At the time, most of the guides I found directed me to the cloud providers… which I followed… and a bill for several hundred dollars, after I forgot that I had left a cluster online for a month, put a big price tag on this lesson. And the effect of that? Well, I shied away, opting instead to try to learn Hadoop in other people’s environments… which of course took a lot more time. So Hadoop has a steep learning requirement that is… having an environment to learn with in the first place.
  4. “Well, but Tim, there must be other options out there,” you’re probably saying right now. “What about Cloudera’s Quickstart VM?” you’re asking. Well, Cloudera has ended the Quickstart environment in favor of pushing their “free” trial of a hosted product. There are other options and some of them can be pretty effective. Let’s touch back on the Cloud Hosted method. There is a vast number of guides that will take you step-by-step through spinning up a Hadoop cluster in each of the major Cloud Providers. I will warn you that a lot of those guides are outdated and will have you scratching your head with older or mismatched versions of components. Also set yourself a reminder. Shut that thing down when you’re done with it; your wallet will thank you later. As for Book learning or Video Training, I’ve always envied people who were able to sit down and read a training manual cover to cover and absorb all of that knowledge. Myself? I learn better when I’m getting my hands dirty. Video training à la Pluralsight or Lynda does a pretty good job, but usually only gets you so far before sending you off on your own without a working environment to use. And of course for those of you who are fortunate enough to have a full Cisco UCS chassis sitting in your basement just waiting for another workload to be thrown at it, more power to you folks. For the rest of us, if you have a spare PC lying around with a fair amount of memory (more than 8 GB), you can manage to cobble together a home lab and there are plenty of guides out there on how to do that.
  5. So, what am I proposing? Well, Docker to be quite honest. The portability, flexibility and scalability make this option REALLY attractive. So attractive that I took a good college try at putting an environment together. Now… this is where I fall on my sword and recall that quote from Thomas Edison earlier. I… didn’t fail per se… but I certainly found at LEAST 10,000 ways to build a Dockerized Hadoop environment incorrectly. To that end, in my adventures in this space I’ve stumbled across several repositories that I’ve forked, enhanced and utilized to create my own environment. What I’ve put together is a Docker Compose file that makes it quick and easy to build and provision a Hadoop cluster with Hive AND a multi-node Spark cluster, all of which is open source and ready to be further enhanced by anyone wanting to contribute.
  6. My goal with this environment is to provide like-minded individuals a way to dip their toes into Hadoop at its core. It’s barebones Hadoop, Hive and Spark. The idea is straight to the point: get data into the environment, add it to HDFS, create a Hive table for that data and get to work. If you choose to do so, you can leave it at that, or you can spin up the Spark cluster and really get your hands dirty with the data. When you execute the docker-compose commands you see here, these are the containers that get provisioned. On the Hadoop side you have a namenode and a single datanode. You get a hive-server, a dedicated hive-metastore container and a postgres container that houses the hive metastore database. On the Spark side you get a Master and two Worker nodes. All of this can interconnect using Docker’s bridge networking, which also allows your workstation to connect to these components as if they were running on your machine. Once you’ve mastered the basics here you can easily jump in and start adding more components like Pig or Impala or Ranger maybe.
  7. We’ve covered why I built the environment, but here are a few reasons why I think it could be helpful for others and why I’m sharing it with you all today. Obviously the most useful thing about this environment is enabling people to Learn Hadoop. And learn it without all the other distractions that enterprise deployments bring with them. I’m looking at you, Cloudera. Development can take place in this environment and I’m comfortable with saying it will get you at least 90% of the way there. You’ll want to spend that last 10% tweaking your code for performance reasons on whatever environment you’re working in. And lastly maybe you’re assessing whether or not Hadoop is right for your team. With this environment you can rapidly stand up a proof of concept and decide whether Hadoop is right for your datasets or whether or not Spark would be advantageous to you.
  8. The environment is not, let me repeat that, NOT for production purposes. It’s not optimized for performance at all and that’s on purpose. I think part of the fun of working at this capacity is troubleshooting all the hair-raising events that would come up in a production environment. So the installation is completely default. Throw your workload on it and tweak the performance to your liking. I don’t know if I made this clear enough before but to reiterate, this environment is NOT for production. I kept no security standards or best practices in mind when building this. Again, that’s on purpose. If I were to secure everything the way it should be, no one would want to use it. That said, it’s the perfect environment for learning how to implement security policies, so feel free to go nuts. Worst case scenario you blow away your containers and spin up new ones ready to be broken again.
  9. Getting started is as simple as cloning the GitHub repository and following the instructions posted in the README. A few warnings or disclaimers. This hasn’t been thoroughly tested on all platforms, yes it’s Docker and as long as you’re running a recent version of that it SHOULD work fine, but I think we all know there’s a big difference between SHOULD work and WILL work. Also, in the spirit of open source, I want to make it known that I will be actively maintaining this repository. So feel free to throw PRs my way or fork my work and enhance it for your own uses.
  10. That brings me to the end of my presentation. Thank you all for sitting through my babbling, hopefully you found at least some of it useful. Again, here is my contact information should you have ANY questions at all or want to help participate in the project. A HUGE thank you to Ivan Ermilov and his team at Big Data Europe. Their work REALLY saved me on this, and I highly recommend you check out what they’ve done at their repositories.
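
The workflow described in the notes (clone the repository, bring the containers up, get data into HDFS, create a Hive table, tear everything down) can be sketched roughly as below. This is a minimal sketch under assumptions, not the repository's documented procedure: the README is the authority. Only the repository URL comes from the slides. The compose service names `namenode` and `hive-server` are taken from the container names mentioned in the notes and may not match the compose file. The sample data, the `/data/sample` path, the `sample` table, and the Beeline location `/opt/hive/bin/beeline` are illustrative guesses. The script defaults to a dry run that only prints each command.

```shell
#!/usr/bin/env bash
# Hedged sketch: service names, paths and the table schema are assumptions.
set -euo pipefail

REPO_URL="https://github.com/tbytnar/docker-hive.git"  # from the slides
REPO_DIR="docker-hive"
NAMENODE_SVC="namenode"     # assumed compose service name
HIVE_SVC="hive-server"      # assumed compose service name
DRY_RUN="${DRY_RUN:-1}"     # 1 = print commands only, 0 = execute them

# Print the command in dry-run mode, otherwise run it in the current shell.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    printf '+ %s\n' "$*"
  else
    "$@"
  fi
}

# 1. Fetch the compose project and start every container in the background.
run git clone "$REPO_URL" "$REPO_DIR"
run cd "$REPO_DIR"
run docker-compose up -d
run docker-compose ps

# 2. Write a tiny made-up CSV straight into HDFS from inside the namenode.
#    Assumes the hdfs client is on PATH in that container.
run docker-compose exec -T "$NAMENODE_SVC" hdfs dfs -mkdir -p /data/sample
run docker-compose exec -T "$NAMENODE_SVC" bash -c \
  'printf "1,alice\n2,bob\n" | hdfs dfs -put - /data/sample/sample.csv'

# 3. Point an external Hive table at that directory and query it.
HIVE_SQL="CREATE EXTERNAL TABLE IF NOT EXISTS sample (id INT, name STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION '/data/sample';
SELECT COUNT(*) FROM sample;"
run docker-compose exec -T "$HIVE_SVC" /opt/hive/bin/beeline \
  -u jdbc:hive2://localhost:10000 -e "$HIVE_SQL"

# 4. Shut the environment down when finished.
run docker-compose down
```

Running it with `DRY_RUN=0` executes the commands for real, which requires Git, Docker and docker-compose on the workstation. An external table is used here so that dropping the table leaves the file in HDFS; a managed table would work as well.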