Hadoop Ecosystem Architectures 
BigData + Oracle/SQL Server Databases 
Summary from Absolute SW slides
BigData Failures
• More than 50% of Hadoop initiatives fail. Why?
  - Start: teams assume Hadoop replaces the database and the DB apps.
  - Progression: teams then assume Hadoop supplements the DB rather than replacing it completely, and that some of the batch jobs can migrate to Hadoop.
    • This may avoid paying the next round of licensing fees for the next step up in DB capacity.
• Most of these initiatives still fail. Why?
Hadoop/DB Migrations
• Migrating the DB schema to Hadoop for the longer batch queries takes too long. Too long => increased cost => :(
  - Vendor training is not adequate:
    • for getting business logic implemented quickly in an API on top of Hadoop (tools, e.g. Sqoop)
    • for devops, production, and customization
• Confusion over which components to use: workflows with Oozie; Pig + UDFs, Spark, or Hive + UDFs; HBase.
  - Fix: use REST APIs/services + Hadoop MR + Spark shell; training.
What is a better strategy?
• Besides going all in with Hadoop and buying the Cloudera/MapR/Hortonworks sales pitch, what is missing?
• Goal: quickly establish a user base, in ~6 months rather than 2 years.
  - Mix REST services with Hadoop/HDFS. Tableau is one example; custom development is better.
  - Start with open-source Hadoop, not CM (Cloudera Manager) or Ambari: build the source and learn to apply the patches for JIRA bugs (this used to be important). Doing so drives understanding of the internals for configuration and builds skills for production.
Open Source strategy
• Normally takes 1-2 years.
  - Training reduces the time from POC to deployment to 6 months for the first use case.
    • Training covers both REST services and Hadoop, to establish a corporate agile strategy/template that otherwise takes years to develop. It differs from Hadoop vendor training for implementing business logic.
    • Covers REST examples with Spring and/or Guice; building the source and removing unnecessary components to keep the code base small; adding integration tests specific to a customer deployment using iTest; Puppet scripts; and deploying from a single source tree using Jenkins.
Use case: DB Queries
• Misconception: replacing DB queries on a complex schema with Hadoop Hive/Pig/Spark queries is a strategy.
  - Develop a REST BE/FE template and the skills to use it (<1 h implementation). It can be deployed with HDFS queries (with or without indexes). Why?
    • Faster performance, less code to do the same thing, less admin, and lower cost at small scale. REST services are closer to a DB than Hadoop is. :) users
• With training, REST services take 1 hour to 1 day to build.
• Hadoop impediments:
  - having to provision a cluster, understanding what the XML files do, running benchmarks, configuring Kerberos, setting ACLs, versioning data, testing backup and recovery strategies, testing
REST + Hadoop
• Successful deployments contain a mix of homegrown services + Hadoop components.
  - Training to develop REST services quickly:
    • No Spring, no J2EE, no GlassFish, no complex software with millions of lines of code.
    • DI with Google Guice; Maven; Jetty; FE using jQuery or Twitter Bootstrap. Keep the BE and FE simple first, before looking at web frameworks like Play, Django, Ruby, Node.js, etc.
    • Training materials: without Guice and with Guice.
  - Package REST services with the Hadoop distro using Bigtop skills.
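A minimal sketch of the "keep the BE simple" idea follows. It is not the training material from the slides: to stay runnable without any dependencies it uses the JDK's built-in com.sun.net.httpserver server in place of Jetty, and plain constructor injection in place of Guice. RowStore, InMemoryRowStore, RowService and the /rows/ path are hypothetical names chosen for illustration.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;
import java.util.Map;

/** Hypothetical lookup interface; a production version might read from HDFS. */
interface RowStore {
    String find(String key);
}

/** In-memory stand-in used for this sketch only. */
final class InMemoryRowStore implements RowStore {
    private final Map<String, String> rows;

    InMemoryRowStore(Map<String, String> rows) {
        this.rows = rows;
    }

    @Override
    public String find(String key) {
        return rows.get(key);
    }
}

/** REST back end whose dependency is passed in through the constructor. */
final class RowService {
    private final RowStore store;

    RowService(RowStore store) {
        this.store = store;
    }

    /** Returns a small JSON body for the key, or null when the key is unknown. */
    String lookupJson(String key) {
        String value = store.find(key);
        // No JSON escaping here; a real service would use a JSON library.
        return value == null ? null : "{\"key\":\"" + key + "\",\"value\":\"" + value + "\"}";
    }

    /** Starts an HTTP server answering GET /rows/<key>; port 0 picks a free port. */
    HttpServer start(int port) throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress(port), 0);
        server.createContext("/rows/", exchange -> {
            String key = exchange.getRequestURI().getPath().substring("/rows/".length());
            String json = lookupJson(key);
            String text = json == null ? "{\"error\":\"not found\"}" : json;
            byte[] body = text.getBytes(StandardCharsets.UTF_8);
            exchange.getResponseHeaders().set("Content-Type", "application/json");
            exchange.sendResponseHeaders(json == null ? 404 : 200, body.length);
            try (OutputStream out = exchange.getResponseBody()) {
                out.write(body);
            }
        });
        server.start();
        return server;
    }
}
```

Because RowService only sees the RowStore interface, the in-memory store can later be swapped for an HDFS-backed one, by hand or through a Guice binding, without touching the HTTP code.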
Back to Hadoop
• K/V storage; why?
  - Add nodes to scale out horizontally, i.e. more data (more DB rows) needs more memory, and adding nodes supplies it (problem/solution).
• M/R spills to disk; speeding up data reads is OK, but M/R is still a problem. Spark/Scala: in-memory computation with a K/V store.
• Building a data repository: customize the CDK (Cloudera Development Kit) to reflect the schemas. Productionize using Guice. Spring is too rigid; not Morphlines (like sed).
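The horizontal scale-out point can be sketched as hash partitioning: each key is owned by exactly one node, so adding nodes spreads the same rows over more memory. This is an illustrative assumption about how a K/V store might place keys, not a description of any specific product; KeyPartitioner and the "row-N" keys are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch of hash partitioning: each key is owned by exactly one node. */
final class KeyPartitioner {
    private final int nodeCount;

    KeyPartitioner(int nodeCount) {
        if (nodeCount <= 0) {
            throw new IllegalArgumentException("nodeCount must be positive");
        }
        this.nodeCount = nodeCount;
    }

    /** floorMod keeps the result in [0, nodeCount) even for negative hash codes. */
    int nodeFor(String key) {
        return Math.floorMod(key.hashCode(), nodeCount);
    }

    /** Counts how many of the given keys land on each node. */
    int[] load(List<String> keys) {
        int[] perNode = new int[nodeCount];
        for (String key : keys) {
            perNode[nodeFor(key)]++;
        }
        return perNode;
    }

    /** Hypothetical key set used only for the demo. */
    static List<String> sampleKeys(int n) {
        List<String> keys = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            keys.add("row-" + i);
        }
        return keys;
    }
}
```

Comparing new KeyPartitioner(4).load(keys) with new KeyPartitioner(8).load(keys) shows the same rows spread over more nodes. Plain modulo placement moves most keys whenever the node count changes, which is why production stores tend to use range splits or consistent hashing instead.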
Hive/Pig/Oozie/Sqoop
• Departments pick their own tools/approach based on the problem description.
• HttpFS isn't an API.
  - Add a REST API.
• Hive/Pig are slow to develop with. Developing UDFs takes time, and production code is hard to maintain or modify.
  - It is buried behind the production firewall.
  - Better with Beeline ADD JAR.
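One way to read "HttpFS isn't an API; add a REST API" is that callers should ask for a business object and not for a file path. The sketch below maps such a request onto the WebHDFS-style URL that HttpFS serves. The /webhdfs/v1 prefix, op=OPEN and the user.name parameter come from the WebHDFS REST API; the host name, the /data/reports layout and the ReportLocator class are hypothetical, and 14000 is assumed as the HttpFS port.

```java
import java.net.URI;

/** Maps a business-level request onto a WebHDFS-style URL served by HttpFS. */
final class ReportLocator {
    private final String gateway; // e.g. "http://httpfs.example.internal:14000" (hypothetical)
    private final String user;

    ReportLocator(String gateway, String user) {
        this.gateway = gateway;
        this.user = user;
    }

    /** Hypothetical layout: /data/reports/<department>/<date>.csv */
    URI dailyReport(String department, String date) {
        // Validate inputs so callers cannot escape the reports directory.
        if (!department.matches("[a-z0-9_-]+")) {
            throw new IllegalArgumentException("bad department: " + department);
        }
        if (!date.matches("\\d{4}-\\d{2}-\\d{2}")) {
            throw new IllegalArgumentException("bad date: " + date);
        }
        String path = "/data/reports/" + department + "/" + date + ".csv";
        return URI.create(gateway + "/webhdfs/v1" + path + "?op=OPEN&user.name=" + user);
    }
}
```

A REST service in front of the cluster would expose something like a daily-report endpoint and call dailyReport internally, so the directory layout can change without breaking clients.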
Scala/Spark
• Some parts of Scala/Spark are not parallelizable.
  - Parallelize over threads in an ExecutionContext vs. workers in separate JVMs.
• It takes 3 attempts to get something right for users:
  1) Learning; everything is new (vendor training is good here).
  2) Knowing what is important for your own use case; focus time on the solution here. The code is different from the first time, e.g. Scala teaching.
  3) Now you know what the problem definition is and probably what the best solution is; you can focus on execution and on making the service fast and usable.
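The thread-based half of the "threads vs. workers in separate JVMs" distinction can be sketched in Java, since a Scala ExecutionContext is typically backed by a thread pool inside a single JVM. The example below splits a computation into chunks and runs them on a fixed pool; ThreadParallelSum is a hypothetical name, and Spark's executor processes are deliberately not modeled.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** In-JVM parallelism: tasks share one heap and run on a thread pool. */
final class ThreadParallelSum {

    /** Sums the squares of 1..n by splitting the range into one task per chunk. */
    static long sumOfSquares(int n, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            int chunk = (n + threads - 1) / threads; // ceiling division
            List<Future<Long>> parts = new ArrayList<>();
            for (int start = 1; start <= n; start += chunk) {
                final int lo = start;
                final int hi = Math.min(n, start + chunk - 1);
                parts.add(pool.submit(() -> {
                    long sum = 0;
                    for (long i = lo; i <= hi; i++) {
                        sum += i * i;
                    }
                    return sum;
                }));
            }
            long total = 0;
            for (Future<Long> part : parts) {
                total += part.get(); // blocks until that chunk is done
            }
            return total;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("parallel sum failed", e);
        } finally {
            pool.shutdown();
        }
    }
}
```

Here the tasks capture local variables directly because they share one heap. With workers in separate JVMs the same closure would have to be serialized and shipped to each process, which is one reason code written for threads does not always parallelize unchanged on a cluster.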
Analytics Use case: Model building
• Models take a long time to build. Example: Random Forest
  - 4 h on an 8 GB MacBook (~2010; R)
  - 4 h on an AWS Large instance (R)
  - 16 h (Mahout; not the same implementation as R) on M/R in a 4-node AWS cluster. More nodes were not faster.
  - Solution:
    • Distributed + multi-tenant. Not Mahout.
