Hortonworks
Devashish B.
Consultant / Seed InfoTech
9844621164
A glimpse into the Future of Hadoop & Big Data
About Hortonworks
• Mission: Revolutionize and commoditize the storage and processing of big data via open source
• Vision: Half of the world’s data will be stored in Apache Hadoop within five years
• Strategy: Grow the Apache Hadoop ecosystem by making Apache Hadoop easier to consume; profit by providing training, support and certification
• An independent company
• Focused on making Apache Hadoop great
• Hold nothing back; Apache Hadoop will be complete
Credentials
• Technical: key architects and committers from the Yahoo! Hadoop engineering team
− Highest concentration of Apache Hadoop committers
− Contributed >70% of the code in Hadoop, Pig and ZooKeeper
− Delivered every major/stable Apache Hadoop release since 0.1
− History of driving innovation across entire Apache Hadoop stack
− Experience managing world’s largest deployment
• Business operations: team of highly successful open source veterans
− Led by Rob Bearden, former COO of SpringSource & JBoss
• Investors: backed by Benchmark Capital and Yahoo!
− Benchmark was key investor in Red Hat, MySQL, SpringSource, Twitter & eBay
Hortonworks and Yahoo!
• Yahoo! is a development partner
− Leverage large Yahoo! development, testing & operations team
  • More than 1,000 active & sophisticated users of Apache Hadoop
  • Access to the Yahoo! grid for testing large workloads
  • Only organization that has delivered a stable release of Apache Hadoop
− Yahoo! will continue to contribute Apache Hadoop code too!
• Yahoo! is a customer
− Hortonworks provides level 3 support and training to Yahoo!
− Yahoo! deploys Apache Hadoop releases across its 42,000-node grid
• Yahoo! is an investor
Current State of Adoption
Enterprise Adoption
• Early adopters
• Technology is hard to install, manage & use
• Technology lacks enterprise robustness
• Requires significant investment in technical staff or consulting
• Hard to find & hire experienced developer & operations talent
Vendor Ecosystem Adoption
• Early in vendor adoption lifecycle
• Hadoop is hard to integrate and extend
• Hard to find & hire experienced developer & operations talent
Technology & Knowledge Gaps Prevent Apache Hadoop from Reaching Full Potential
Customers are asking their vendors for help with Hadoop!
“We’re seeing Hadoop in all of our Fortune 2000 data accounts”
Hortonworks Role & Opportunity
[Diagram labels: Vendor Ecosystem Adoption; Enterprise Adoption; Bridge the Gap!; Grow Market; Sell training and support via Partners]
Fundamental shift in enterprise data architecture strategy
• Apache Hadoop becomes standard for managing new types & scale of data
• New applications & solutions will be created to leverage data in Apache Hadoop
• Creates massive big data technology and services opportunity for ecosystem
Hortonworks Objectives
• Make Apache Hadoop projects easier to install, manage & use
− Regular sustaining releases
− Compiled code for each project (e.g. RPMs)
− Testing at scale
• Make Apache Hadoop more robust
− Performance gains
− High availability
− Administration & monitoring
• Make Apache Hadoop easier to integrate & extend
− Open APIs for extension & experimentation
All done within Apache Hadoop community
• Develop collaboratively with community
• Complete transparency
• All code contributed back to Apache
Anyone should be able to easily deploy the Hadoop projects directly from Apache
Technology Roadmap
Phase 1 – Making Apache Hadoop Accessible (2011)
• Release the most stable version of Hadoop ever
• Release directly usable code via Apache (RPMs, .debs…)
• Frequent sustaining releases off of the stable branches
Phase 2 – Next Generation Apache Hadoop (2012; alphas starting Oct 2011)
• Address key product gaps (HBase support, HA, Management…)
• Enable community & partner innovation via modular architecture & open APIs
• Work with community to define integrated stack
Phase 2 - Next Generation Apache Hadoop
• Core
− HDFS Federation
− Next Gen MapReduce
− New Write Pipeline (HBase support)
− HA (no SPOF) and Wire compatibility
• Data - HCatalog 0.3
− Pig, Hive, MapReduce and Streaming as clients
− HDFS and HBase as storage systems
− Performance and storage improvements
• Management & Ease of use
− All components fully tested and deployable as a stack
− Stack installation and centralized config management
− REST and GUI for user tasks
Phase 2 – Core - MapReduce
[Diagram labels: Client; Resource Manager; ZooKeeper (No SPOF); Compute Machine; Application Master; Application Worker; MapReduce App; MPI App]
• Complete rewrite of the resource management layer
• Performance and Scale improvements
• 6,000+ nodes / 100,000 concurrent tasks
• Supports better availability and fail-over
• Supports new frameworks beyond MapReduce
Phase 2 – Core – HDFS Federation
• Multiple independent Namenodes and Namespace Volumes in a cluster
− Scalability (6K nodes, 100K clients, 120PB disk), Workload isolation support
− Client side mount tables for Global Namespace
• Block storage as a generic shared storage service
− DataNodes store blocks for all Namespace volumes – no partitioning
− Non-HDFS namespaces (HBase, MR tmp and others) can share the same storage
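The client-side mount table idea above can be illustrated with a small sketch. This is not the HDFS client API: the `MountTable` class, the mount prefixes and the `nn-*.example.com` host names are all hypothetical, chosen only to show how a client could map a path in one global namespace to one of several independent NameNodes by longest-prefix match.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class MountTable:
    """Toy client-side map from path prefixes to NameNode addresses (illustrative only)."""

    mounts: dict[str, str] = field(default_factory=dict)

    def add(self, prefix: str, namenode: str) -> None:
        # Normalise so "/user/" and "/user" name the same mount point.
        self.mounts[prefix.rstrip("/") or "/"] = namenode

    def resolve(self, path: str) -> tuple[str, str]:
        """Return (namenode, path) for the longest mount prefix that contains path."""
        best = None
        for prefix in self.mounts:
            # Match whole path components only, so "/data" does not cover "/datafoo".
            inside = prefix == "/" or path == prefix or path.startswith(prefix + "/")
            if inside and (best is None or len(prefix) > len(best)):
                best = prefix
        if best is None:
            raise KeyError(f"no mount point covers {path}")
        return self.mounts[best], path


# Hypothetical layout: three namespace volumes behind one global view.
table = MountTable()
table.add("/user", "nn-1.example.com")
table.add("/data", "nn-2.example.com")
table.add("/data/logs", "nn-3.example.com")

print(table.resolve("/data/logs/2011/10"))
print(table.resolve("/user/alice"))
```

Because the table lives on the client, no extra server sits in the request path; each NameNode stays independent and the client simply picks the right one per path.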
[Diagram: Namespace layer with NS1 ... NS k ... Foreign NS n, served by NN-1 ... NN-k ... NN-n; Block storage layer with Block Pools (Pool 1 ... Pool k ... Pool n) on Common Storage across Datanode 1, Datanode 2 ... Datanode m; Balancer]
Phase 2 – Core – HDFS Write Pipeline
• Limitations of HDFS write pipeline in 0.20
− Broken Flush, Sync, Append
− Node failures can cause data loss for slow writers
• Hadoop.Next
− Flush, Sync, and Append support
− New replicas are added dynamically on failures
[Diagram labels: Client; DN, DN, DN; Flush; Ack]
Phase 2 – Data – HCatalog
[Diagram: Pig, Hive, MapReduce and Streaming as clients of HCatalog, with HDFS and HBase as storage; legend marks components as Phase 1 or Phase 2]
• Shared schema and data model
• Data can be shared between tool users
• Data located by table rather than file
• Clients independent of storage details
− format, compression, …
• Only one adaptor for new formats
− not one per tool
• Notifications when new data is available
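The bullets above describe a shared catalog between tools and storage. The following is a minimal sketch of that idea only; it does not use or imitate the real HCatalog API. `ToyCatalog`, `TableInfo`, the table name `clicks` and the path `/warehouse/clicks` are hypothetical names invented for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class TableInfo:
    """Storage details that clients should not need to hard-code."""

    location: str     # hypothetical path where the data lives
    file_format: str  # e.g. a text or columnar format
    compression: str


class ToyCatalog:
    """Toy shared catalog: look data up by table name and get notified of new data."""

    def __init__(self) -> None:
        self._tables: dict[str, TableInfo] = {}
        self._listeners: dict[str, list[Callable[[str, str], None]]] = {}

    def register(self, name: str, info: TableInfo) -> None:
        self._tables[name] = info

    def lookup(self, name: str) -> TableInfo:
        # Clients ask for a table; the catalog supplies location and format.
        return self._tables[name]

    def subscribe(self, name: str, callback: Callable[[str, str], None]) -> None:
        self._listeners.setdefault(name, []).append(callback)

    def add_partition(self, name: str, partition: str) -> None:
        # Tell every subscriber that new data is available for this table.
        for callback in self._listeners.get(name, []):
            callback(name, partition)


catalog = ToyCatalog()
catalog.register("clicks", TableInfo("/warehouse/clicks", "text", "gzip"))

seen: list[tuple[str, str]] = []
catalog.subscribe("clicks", lambda table, part: seen.append((table, part)))
catalog.add_partition("clicks", "date=2011-10-01")

print(catalog.lookup("clicks"))
print(seen)
```

In this sketch, changing the format or compression of `clicks` means updating one catalog entry rather than every tool that reads the table, which is the point of the "one adaptor, not one per tool" bullet.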
Hortonworks Value
For Enterprises
• Make Apache Hadoop easier to consume
• Extend to broader developer audience
• Foster vibrant technology and services ecosystem
• Access to Hortonworks’ technical expertise
For Vendors
• Create larger market for Apache Hadoop technology and services
• Simplify process for supporting Hadoop
• Access to Hortonworks’ technical expertise
For Community
• Ensure Apache Hadoop remains unified and strong
• Expand value provided by core Apache projects
• Foster additional participation & contributions from ecosystem
Hortonworks Differentiation
• Unmatched domain expertise
− Delivered every major release of Apache Hadoop to date
− Critical mass of committers
• Community leadership role
− Setting direction for core projects
• Yahoo! commitment and backing
− Access to 1,000+ Hadoop engineers, Yahoo! grid
• Absolute dedication to Apache & open source
− Focused on making Apache Hadoop the standard
• Focus on delivering significant value to technology vendors
− ISVs, OEMs, Systems Integrators and other service providers
Thank You.

A glimpse into the Future of Hadoop & Big Data

  • 1. Hortonworks Devashish B. Consultant / Seed InfoTech 9844621164 A glimpse into the Future of Hadoop & Big Data
  • 2. About Hortonworks • Mission: Revolutionize and commoditize the storage and processing of big data via open source • Vision: Half of the world’s data will be stored in Apache Hadoop within five years • Strategy: Grow the Apache Hadoop ecosystem by making Apache Hadoop easier to consume; profit by providing training, support and certification − An independent company − Focused on making Apache Hadoop great − Hold nothing back; Apache Hadoop will be complete
  • 3. Credentials • Technical: key architects and committers from Yahoo! Hadoop engineering team − Highest concentration of Apache Hadoop committers − Contributed >70% of the code in Hadoop, Pig and ZooKeeper − Delivered every major/stable Apache Hadoop release since 0.1 − Experience managing world’s largest deployment − History of driving innovation across entire Apache Hadoop stack • Business operations: team of highly successful open source veterans − Led by Rob Bearden, former COO of SpringSource & JBoss • Investors: backed by Benchmark Capital and Yahoo! − Benchmark was key investor in Red Hat, MySQL, SpringSource, Twitter & eBay
  • 4. Hortonworks and Yahoo! • Yahoo! is a development partner − Leverage large Yahoo! development, testing & operations team: more than 1,000 active & sophisticated users of Apache Hadoop; access to the Yahoo! grid for testing large workloads; only organization that has delivered a stable release of Apache Hadoop − Yahoo! will continue to contribute Apache Hadoop code too! • Yahoo! is a customer − Hortonworks provides level 3 support and training to Yahoo! − Yahoo! deploys Apache Hadoop releases across its 42,000-node grid • Yahoo! is an investor
  • 5. Current State of Adoption • Enterprise Adoption − Early adopters − Technology is hard to install, manage & use − Technology lacks enterprise robustness − Requires significant investment in technical staff or consulting − Hard to find & hire experienced developer & operations talent • Vendor Ecosystem Adoption − Early in vendor adoption lifecycle − Hadoop is hard to integrate and extend − Hard to find & hire experienced developer & operations talent • Technology & knowledge gaps prevent Apache Hadoop from reaching full potential • Customers are asking their vendors for help with Hadoop! “We’re seeing Hadoop in all of our Fortune 2000 data accounts”
  • 6. Hortonworks Role & Opportunity [Diagram: Vendor Ecosystem Adoption and Enterprise Adoption; labels: Bridge the Gap!, Grow Market, Sell training and support via Partners] Fundamental shift in enterprise data architecture strategy • Apache Hadoop becomes standard for managing new types & scale of data • New applications & solutions will be created to leverage data in Apache Hadoop • Creates massive big data technology and services opportunity for ecosystem
  • 7. Hortonworks Objectives • Make Apache Hadoop projects easier to install, manage & use − Regular sustaining releases − Compiled code for each project (e.g. RPMs) − Testing at scale • Make Apache Hadoop more robust − Performance gains − High availability − Administration & monitoring • Make Apache Hadoop easier to integrate & extend − Open APIs for extension & experimentation • All done within Apache Hadoop community − Develop collaboratively with community − Complete transparency − All code contributed back to Apache • Anyone should be able to easily deploy the Hadoop projects directly from Apache
  • 8. Technology Roadmap • Phase 1 – Making Apache Hadoop Accessible (2011) − Release the most stable version of Hadoop ever − Release directly usable code via Apache (RPMs, .debs…) − Frequent sustaining releases off of the stable branches • Phase 2 – Next Generation Apache Hadoop (2012; alphas starting Oct 2011) − Address key product gaps (HBase support, HA, Management…) − Enable community & partner innovation via modular architecture & open APIs − Work with community to define integrated stack
  • 9. Phase 2 – Next Generation Apache Hadoop • Core − HDFS Federation − Next Gen MapReduce − New Write Pipeline (HBase support) − HA (no SPOF) and Wire compatibility • Data – HCatalog 0.3 − Pig, Hive, MapReduce and Streaming as clients − HDFS and HBase as storage systems − Performance and storage improvements • Management & Ease of use − All components fully tested and deployable as a stack − Stack installation and centralized config management − REST and GUI for user tasks
  • 10. Phase 2 – Core – MapReduce [Diagram: Clients, Resource Manager, ZooKeeper (no SPOF), Application Master, Application Worker, Compute Machine; MapReduce and MPI apps] • Complete rewrite of the resource management layer • Performance and scale improvements • 6,000+ nodes / 100,000 concurrent tasks • Supports better availability and fail-over • Supports new frameworks beyond MapReduce
  • 11. Phase 2 – Core – HDFS Federation • Multiple independent Namenodes and Namespace Volumes in a cluster − Scalability (6K nodes, 100K clients, 120PB disk), workload isolation support − Client side mount tables for Global Namespace • Block storage as a generic shared storage service − DataNodes store blocks for all Namespace volumes – no partitioning − Non-HDFS namespaces (HBase, MR tmp and others) can share the same storage [Diagram: Namespace layer with Namenodes NN-1 ... NN-k ... NN-n serving NS1 ... NS k ... Foreign NS n; Block storage layer with Block Pools Pool 1 ... Pool k ... Pool n on Common Storage across Datanode 1, Datanode 2 ... Datanode m, with a Balancer]
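The "client side mount tables" bullet on the federation slide can be illustrated with a small sketch. This is not Hadoop code: the `MountTable` class, the host names and the paths are all hypothetical, and the longest-prefix rule is an assumption about how a client might pick which Namenode owns a path.

```python
from typing import Dict


class MountTable:
    """Hypothetical client-side table mapping path prefixes to Namenodes."""

    def __init__(self, mounts: Dict[str, str]) -> None:
        # Normalise prefixes so "/user/" and "/user" are treated alike.
        self._mounts = {self._norm(p): nn for p, nn in mounts.items()}

    @staticmethod
    def _norm(path: str) -> str:
        # Collapse repeated and trailing slashes; "/" stays "/".
        return "/" + "/".join(part for part in path.split("/") if part)

    def resolve(self, path: str) -> str:
        """Return the Namenode for `path` using longest-prefix match."""
        path = self._norm(path)
        best = None
        for prefix in self._mounts:
            covers = path == prefix or path.startswith(prefix.rstrip("/") + "/")
            if covers and (best is None or len(prefix) > len(best)):
                best = prefix
        if best is None:
            raise KeyError(f"no mount point covers {path}")
        return self._mounts[best]


# Illustrative mount points only; none of these names come from the slides.
table = MountTable({
    "/user": "nn-1.example.com",
    "/data": "nn-2.example.com",
    "/data/logs": "nn-3.example.com",
})
```

With this table, `table.resolve("/data/logs/2011/10")` picks the most specific mount, so each Namenode serves only its own namespace volume while the client still sees one global tree.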
  • 12. Phase 2 – Core – HDFS Write Pipeline • Limitations of HDFS write pipeline in 0.20 − Broken Flush, Sync, Append − Node failures can cause data loss for slow writers • Hadoop.Next − Flush, Sync, and Append support − New replicas are added dynamically on failures [Diagram: Client writing through a pipeline of three DataNodes (DN), with Flush and Ack]
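The idea that "new replicas are added dynamically on failures" can be sketched as a toy model. This is a simplification under stated assumptions, not the HDFS implementation: the `Pipeline` class, the `failed` set used to simulate outages, and the rule that a replacement first copies data from a surviving replica are all illustrative.

```python
from typing import Dict, List, Set


class Pipeline:
    """Toy write pipeline: every packet is stored on every pipeline node."""

    def __init__(self, nodes: List[str], spares: List[str]) -> None:
        self.nodes = list(nodes)        # datanodes currently in the pipeline
        self.spares = list(spares)      # healthy datanodes usable as replacements
        self.failed: Set[str] = set()   # nodes marked down (simulation only)
        self.stored: Dict[str, List[bytes]] = {n: [] for n in self.nodes}

    def _replace_failed(self) -> None:
        healthy = [n for n in self.nodes if n not in self.failed]
        if not healthy:
            raise RuntimeError("all replicas lost")
        for i, node in enumerate(self.nodes):
            if node in self.failed:
                if not self.spares:
                    raise RuntimeError("no spare datanode available")
                replacement = self.spares.pop(0)
                # Bring the new replica up to date from a surviving one.
                self.stored[replacement] = list(self.stored[healthy[0]])
                self.nodes[i] = replacement

    def write(self, packet: bytes) -> List[str]:
        """Store one packet on all nodes and return the nodes that hold it."""
        self._replace_failed()
        for node in self.nodes:
            self.stored[node].append(packet)
        return list(self.nodes)
```

The point of the sketch is that the replica count stays constant across a failure, which is what protects a slow writer from losing data when nodes drop out during a long-running write.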
  • 13. Phase 2 – Data – HCatalog [Diagram: MapReduce, Pig, Hive and Streaming on top of HCatalog, over HDFS and HBase; legend distinguishes Phase 1 from Phase 2] • Shared schema and data model • Data can be shared between tool users • Data located by table rather than file • Clients independent of storage details − format, compression, … • Only one adaptor for new formats − not one per tool • Notifications when new data is available
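A minimal sketch of "data located by table rather than file" follows. It is not the HCatalog API: `Catalog`, `TableInfo`, the table name and the location string are hypothetical, chosen only to show clients asking for a table by name and receiving storage details and notifications from one shared place.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class TableInfo:
    """Storage details a client no longer has to hard-code."""
    location: str
    storage_format: str
    compression: str


class Catalog:
    """Hypothetical shared catalog: look tables up by name, not by path."""

    def __init__(self) -> None:
        self._tables: Dict[str, TableInfo] = {}
        self._listeners: List[Callable[[str, TableInfo], None]] = []

    def subscribe(self, listener: Callable[[str, TableInfo], None]) -> None:
        # Listeners are called whenever a table is registered or updated.
        self._listeners.append(listener)

    def register(self, name: str, info: TableInfo) -> None:
        self._tables[name] = info
        for listener in self._listeners:
            listener(name, info)

    def lookup(self, name: str) -> TableInfo:
        return self._tables[name]
```

Because every tool resolves the same table name through the same catalog, changing the format or compression of a table means updating one entry rather than one adaptor per tool.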
  • 14. Hortonworks Value • For Enterprises − Make Apache Hadoop easier to consume − Extend to broader developer audience − Foster vibrant technology and services ecosystem − Access to Hortonworks’ technical expertise • For Vendors − Create larger market for Apache Hadoop technology and services − Simplify process for supporting Hadoop − Access to Hortonworks’ technical expertise • For Community − Ensure Apache Hadoop remains unified and strong − Expand value provided by core Apache projects − Foster additional participation & contributions from ecosystem
  • 15. Hortonworks Differentiation • Unmatched domain expertise − Delivered every major release of Apache Hadoop to date − Critical mass of committers • Community leadership role − Setting direction for core projects • Yahoo! commitment and backing − Access to 1,000+ Hadoop engineers, Yahoo! grid • Absolute dedication to Apache & open source − Focused on making Apache Hadoop the standard • Focus on delivering significant value to technology vendors − ISVs, OEMs, Systems Integrators and other service providers