Hadoop Ecosystem Architectures 
BigData + Oracle/SQL Server Databases 
Summary from Absolute SW slides
BigData Failures
• More than 50% of Hadoop initiatives fail. Why?
  - Start: teams assume Hadoop replaces the database and the DB applications.
  - Progression: teams assume Hadoop supplements the DB and is not a complete replacement. Some of the batch jobs can migrate to Hadoop.
    - This may avoid paying the next round of licensing fees for the next step up in DB capacity.
• Most of these initiatives still fail. Why?
Hadoop/DB Migrations
• Migrating the DB schema to Hadoop for the longer batch queries takes too long. Too long => increased cost => :(
  - Vendor training is not adequate:
    - to get business logic implemented quickly in an API on top of Hadoop (tools, e.g. Sqoop)
    - for devops/production/customization
    - Confusion over which components to use: workflows with Oozie; Pig + UDFs, Spark, or Hive + UDFs; HBase
  - Fix: use REST APIs/services + Hadoop MR + Spark shell; training
What is a better strategy?
• Besides going all in with Hadoop and buying the Cloudera/MapR/Hortonworks sales pitch, what is missing?
• Goal: quickly establish a user base in ~6 months, not 2 years.
  - Mix REST services with Hadoop/HDFS. Tableau is one example; custom development is better.
  - Start with open source Hadoop, not Cloudera Manager (CM) or Ambari. Build the source and learn to apply the patches for JIRA bugs (this used to be important). This drives understanding of the internals for configuration and builds skills for production.
Open Source strategy
• Normally takes 1-2 years
  - Training reduces the time from POC to deployment to 6 months for the first use case
    - Training on REST services together with Hadoop, to establish a corporate agile strategy/template that otherwise takes years to develop. This is different from Hadoop vendor training for implementing business logic.
    - Covers REST examples with Spring and/or Guice; building the source and removing unnecessary components to keep the code base small; adding integration tests specific to a customer deployment using iTest; Puppet scripts; and deploying from a single source tree using Jenkins
Use case: DB Queries
• Misconception: replacing DB queries on a complex schema with Hadoop Hive/Pig/Spark queries is a strategy.
  - Develop a REST BE/FE template and skills (<1 h implementation). Can deploy with HDFS queries (with or without indexes). Why?
    - Faster performance, less code to do the same thing, less admin, lower cost at small scale. REST services are closer to a DB than Hadoop is. :) users
• With training, REST services take 1 hour to 1 day to build.
• Hadoop impediments:
  - having to provision a cluster, understanding what the XML files do, running benchmarks, configuring Kerberos, setting ACLs, versioning data, testing backup and recovery strategies, testing
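
To make the "small REST service in front of the data" idea concrete, here is a minimal sketch. It is written in Python with only the standard library for brevity; the slides themselves point to a Java stack (Guice, Jetty), so treat this as an illustration of the shape, not of the author's template. The `ORDERS` records, the `/orders` resource, and the `customer` parameter are hypothetical stand-ins for rows read from HDFS or a DB extract.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from urllib.parse import parse_qs, urlparse

# Hypothetical sample data standing in for records loaded from HDFS or a DB extract.
ORDERS = [
    {"id": 1, "customer": "acme", "total": 120.0},
    {"id": 2, "customer": "globex", "total": 75.5},
    {"id": 3, "customer": "acme", "total": 30.0},
]


def find_orders(customer):
    """The 'query' behind the endpoint: all orders for one customer."""
    return [order for order in ORDERS if order["customer"] == customer]


class OrdersHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        url = urlparse(self.path)
        if url.path != "/orders":
            self.send_error(404, "unknown resource")
            return
        customer = parse_qs(url.query).get("customer", [""])[0]
        body = json.dumps(find_orders(customer)).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, fmt, *args):
        pass  # keep the demo output quiet


def start_server(port=0):
    """Serve on a background thread; port 0 lets the OS pick a free port."""
    server = ThreadingHTTPServer(("127.0.0.1", port), OrdersHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def fetch_orders(server, customer):
    """Call the running service the way a front end would."""
    port = server.server_address[1]
    url = f"http://127.0.0.1:{port}/orders?customer={customer}"
    with urllib.request.urlopen(url) as resp:
        return json.loads(resp.read())


if __name__ == "__main__":
    srv = start_server()
    print(fetch_orders(srv, "acme"))
    srv.shutdown()
```

The whole service is one handler and one query function, which is the point the slide makes about REST services being quick to build compared with standing up and securing a cluster.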
REST + Hadoop
• Successful deployments contain a mix of homegrown services + Hadoop components
  - Training to develop REST services quickly
    - No Spring, no J2EE, no GlassFish, no complex software with millions of lines of code.
    - DI with Google Guice; Maven; Jetty; FE using jQuery or Twitter Bootstrap. Keep the BE and FE simple first before looking at web frameworks like Play, Django, Ruby, node.js, etc.
    - Training materials: without Guice, with Guice
  - Package REST services with the Hadoop distro using Bigtop skills
Back to Hadoop
• K/V storage; why?
  - Add nodes to scale out horizontally, i.e. more memory is needed to handle more data <=> the "more DB rows" problem/solution
• M/R spills to disk; speeding up data reads is OK but M/R is still a problem; Spark/Scala in-memory computation with a K/V store
• Building a data repository: customize the CDK to reflect the schemas. Productionize using Guice. Spring is too rigid; not morphlines (like sed)
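
The horizontal scale-out argument for K/V storage can be sketched with plain hash partitioning: each key is assigned to a node by a stable hash, so adding nodes shrinks the share of data (and memory) each node must hold. This is an illustrative sketch in Python, not how any particular store such as HBase places data; the node names and the `row-N` keys are made up, and production stores typically use range splitting or consistent hashing so that fewer keys move when nodes are added.

```python
import hashlib


def node_for_key(key, nodes):
    """Pick the owning node with a stable hash (Python's built-in hash() is salted per run)."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return nodes[int(digest, 16) % len(nodes)]


def partition(keys, nodes):
    """Group keys by the node that owns them."""
    placement = {node: [] for node in nodes}
    for key in keys:
        placement[node_for_key(key, nodes)].append(key)
    return placement


if __name__ == "__main__":
    keys = [f"row-{i}" for i in range(1000)]  # hypothetical row keys
    for count in (2, 4, 8):
        nodes = [f"node-{i}" for i in range(count)]
        largest = max(len(owned) for owned in partition(keys, nodes).values())
        print(f"{count} nodes -> largest node holds {largest} keys")
```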
Hive/Pig/Oozie/Sqoop
• Departments pick their own tools/approach based on the problem description
• HttpFS isn't an API
• Add a REST API
• Hive/Pig are slow to develop with. Developing UDFs takes time, and production code is hard to maintain/modify
• buried behind the production firewall
• Better with Beeline "add jar"
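
One reading of "HttpFS isn't an API; add a REST API" is that HttpFS only exposes file operations, so callers still need to know HDFS paths. The sketch below shows a thin domain-level wrapper that hides the path layout. The `/webhdfs/v1` prefix, `op=OPEN`, and the `user.name` parameter follow the WebHDFS REST convention that HttpFS implements, and 14000 is its default port; the host name, `DATASET_ROOT` layout, report naming, and service user are all assumptions made for illustration.

```python
from urllib.parse import quote, urlencode

# Assumed values for illustration only.
HTTPFS_BASE = "http://httpfs.example.com:14000"  # hypothetical host, default HttpFS port
DATASET_ROOT = "/data/warehouse"                 # hypothetical HDFS layout


def httpfs_open_url(hdfs_path, user):
    """Build the HttpFS (WebHDFS-compatible) URL that reads one HDFS file."""
    query = urlencode({"op": "OPEN", "user.name": user})
    return f"{HTTPFS_BASE}/webhdfs/v1{quote(hdfs_path)}?{query}"


def daily_report_url(report, day, user="svc_reports"):
    """Domain-level call: callers name a report and a day, never an HDFS path."""
    if not report.isidentifier():
        raise ValueError(f"invalid report name: {report!r}")
    return httpfs_open_url(f"{DATASET_ROOT}/{report}/{day}.csv", user)


if __name__ == "__main__":
    print(daily_report_url("sales", "2014-01-01"))
```

A REST service built this way can change the storage layout, or move a dataset out of HDFS entirely, without breaking its callers.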
Scala/Spark
• Some parts of Scala/Spark are not parallelizable
  - Parallelize over threads in an ExecutionContext vs. workers in separate JVMs
• It takes 3 attempts to get something right for users:
  1) Learning; everything is new (vendor training is good here)
  2) Know what is important for your own use case; focus time on the solution here; the code is different from the first time, e.g. Scala teaching
  3) Now you know what the problem definition is and probably what the best solution is; you can focus on execution and making the service fast and usable
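
The distinction between parallelizing over threads in one process and over workers in separate JVMs can be sketched by analogy in Python: threads read shared in-memory state directly, while a worker in another process only sees what is serialized and sent to it. This is an analogy, not Scala or Spark code; `LOOKUP`, the scoring rule, and the JSON hand-off are invented for the example.

```python
import json
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor

# In-memory state shared by every thread in this process (hypothetical data).
LOOKUP = {"a": 1, "b": 2, "c": 3}


def score_in_thread(key):
    """Thread worker: reads the shared dictionary directly, no copying."""
    return LOOKUP[key] * 10


# Source of a worker that runs in a separate interpreter process.
WORKER_SOURCE = """
import json, sys
task = json.load(sys.stdin)
print(json.dumps(task["lookup"][task["key"]] * 10))
"""


def score_in_worker_process(key):
    """Process worker: the task and the data it needs must be serialized and shipped."""
    task = json.dumps({"key": key, "lookup": LOOKUP})
    done = subprocess.run(
        [sys.executable, "-c", WORKER_SOURCE],
        input=task, capture_output=True, text=True, check=True,
    )
    return json.loads(done.stdout)


def run_with_threads(keys):
    with ThreadPoolExecutor(max_workers=4) as pool:
        return list(pool.map(score_in_thread, keys))


def run_with_processes(keys):
    # The threads here only wait on the child processes doing the work.
    with ThreadPoolExecutor(max_workers=4) as pool:
        return list(pool.map(score_in_worker_process, keys))


if __name__ == "__main__":
    print(run_with_threads(["a", "b", "c"]))
    print(run_with_processes(["a", "b", "c"]))
```

Both paths return the same results; the difference is the serialization and start-up cost paid per worker, which is why code that is cheap on threads may not be worth distributing.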
Analytics Use case: Model building
• Models take a long time to build. Example: Random Forest
  - 4 h on an 8 GB MacBook (~2010; R)
  - 4 h on an AWS Large instance (R)
  - 16 h (Mahout; not the same implementation as R) on M/R in a 4-node AWS cluster. More nodes were not faster.
  - Solution:
    - Distributed + multi-tenant. Not Mahout.
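
The timings above can be compared in node-hours to show why "more is not faster" here. The numbers are taken from the slide; treating the laptop and the single AWS instance as one node each is an assumption, and because the Mahout and R implementations differ, this is a rough comparison rather than a benchmark.

```python
def node_hours(wall_clock_hours, nodes):
    """Total machine time consumed by a run."""
    return wall_clock_hours * nodes


# (wall-clock hours, nodes) as quoted on the slide; single machines counted as 1 node.
RUNS = {
    "R on 8 GB MacBook": (4, 1),
    "R on AWS Large instance": (4, 1),
    "Mahout on M/R, 4-node AWS cluster": (16, 4),
}

if __name__ == "__main__":
    baseline = node_hours(*RUNS["R on 8 GB MacBook"])
    for name, (hours, nodes) in RUNS.items():
        used = node_hours(hours, nodes)
        print(f"{name}: {hours} h wall clock, {used} node-hours, "
              f"{used / baseline:.0f}x the laptop's machine time")
```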
