SlideShare a Scribd company logo
1 of 16
Hortonworks
Devashish B.
Consultant / Seed InfoTech
9844621164
A glimpse into the Future of Hadoop & Big Data
About Hortonworks
• Mission: Revolutionize and commoditize the storage and
processing of big data via open source
• Vision: Half of the world’s data will be stored in Apache
Hadoop within five years
• Strategy: Grow the Apache Hadoop Ecosystem by making
Apache Hadoop easier to consume, profit by providing
training, support and certification
 An independent company
 Focused on making Apache Hadoop great
 Hold nothing back, Apache Hadoop will be complete
1
Credentials
• Technical: key architects and committers from Yahoo! Hadoop
engineering team
− Highest concentration of Apache Hadoop committers
− Contributed >70% of the code in Hadoop, Pig and ZooKeeper
− Delivered every major/stable Apache Hadoop release since 0.1
− History of drivingExperience managing world’s largest deployment innovation
across entire Apache Hadoop stack
• Business operations: team of highly successful open source veterans
− Led by Rob Bearden, former COO of SpringSource & JBoss
• Investors: backed by Benchmark Capital and Yahoo!
− Benchmark was key investor in Red Hat, MySQL, SpringSource, Twitter & eBay
2
Hortonworks and Yahoo!
• Yahoo! is a development partner
−Leverage large Yahoo! development, testing & operations team
 More than 1,000 active & sophisticated users of Apache Hadoop
 Access to the Yahoo! grid for testing large workloads
 Only organization that has delivered a stable release of Apache Hadoop
−Yahoo will continue to contribute Apache Hadoop code too!
• Yahoo! is a customer
−Hortonworks provides level 3 support and training to Yahoo!
−Yahoo deploys Apache Hadoop releases across its 42,000 grid
• Yahoo! is an investor
3
Current State of Adoption
Vendor Ecosystem
Adoption
Enterprise Adoption
• Early adopters
• Technology is hard to install,
manage & use
• Technology lacks enterprise
robustness
• Requires significant
investment in technical staff
or consulting
• Hard to find & hire
experienced developer &
operations talent
• Early in vendor adoption
lifecycle
• Hadoop is hard to integrate
and extend
• Hard to find & hire
experienced developer &
operations talent
Technology & Knowledge
Gaps Prevent Apache
Hadoop from Reaching Full
Potential
Customers are asking their
vendors for help with
Hadoop!
“We’re seeing Hadoop in all
of our fortune 2000 data
accounts”
4
Hortonworks Role & Opportunity
Vendor Ecosystem
Adoption
Enterprise
Adoption
Bridge the Gap!
Grow Market
Sell training and
support via
Partners
Fundamental shift in enterprise data architecture strategy
• Apache Hadoop becomes standard for managing new types & scale of data
• New applications & solutions will be created to leverage data in Apache Hadoop
• Creates massive big data technology and services opportunity for ecosystem
5
Hortonworks Objectives
• Make Apache Hadoop projects easier
to install, manage & use
− Regular sustaining releases
− Compiled code for each project (e.g. RPMs)
− Testing at scale
• Make Apache Hadoop more robust
− Performance gains
− High availability
− Administration & monitoring
• Make Apache Hadoop easier to
integrate & extend
− Open APIs for extension & experimentation
All done within Apache
Hadoop community
• Develop collaboratively
with community
• Complete transparency
• All code contributed
back to Apache
Anyone should be able to easily deploy the Hadoop projects directly from Apache
6
Technology Roadmap
Phase 1 – Making Apache Hadoop Accessible
• Release the most stable version of Hadoop ever
• Release directly usable code via Apache (RPMs, .debs…)
• Frequent sustaining releases off of the stable branches
2011
Phase 2 – Next Generation Apache Hadoop
• Address key product gaps (Hbase support, HA, Management…)
• Enable community & partner innovation via modular architecture &
open APIs
• Work with community to define integrated stack
2012
(Alphas starting
Oct 2011)
7
Phase 2 - Next Generation Apache Hadoop
• Core
− HDFS Federation
− Next Gen MapReduce
− New Write Pipeline (HBase support)
− HA (no SPOF) and Wire compatibility
• Data - HCatalog 0.3
− Pig, Hive, MapReduce and Streaming as clients
− HDFS and HBase as storage systems
− Performance and storage improvements
• Management & Ease of use
− All components fully tested and deployable as a stack
− Stack installation and centralized config management
− REST and GUI for user tasks
8
Phase 2 – Core - MapReduce
Client
Client
Resource
Manager
Zookeeper
(No SPOF)
Application Master
Application Worker
Compute Machine
MapReduce App
MapReduce App
MPI App
• Complete rewrite of the resource management layer
• Performance and Scale improvements
• 6,000+ nodes / 100,000 concurrent tasks
• Supports better availability and fail-over
• Supports new frameworks beyond MapReduce
9
Phase 2 – Core – HDFS Federation
• Multiple independent Namenodes and Namespace Volumes in a cluster
− Scalability (6K nodes, 100K clients, 120PB disk), Workload isolation support
− Client side mount tables for Global Namespace
• Block storage as a generic shared storage service
− DataNodes store blocks for all Namespace volumes – no partitioning
− Non-HDFS namespaces (HBase, MR tmp and others) can share the same storage
Datanode 1 Datanode 2 Datanode m
... ... ...
NS1
Foreig
n NS n
... ...
NS k
B
a
l
a
n
c
e
r
Block Pools
Pool nPool kPool 1
NN-1 NN-k NN-n
Common Storage
NamespaceBlockstorage
10
• Limitations of HDFS write pipeline in 0.20
−Broken Flush, Sync, Append
−Node failures can cause data loss for slow writers
• Hadoop.Next
−Flush, Sync, and Append support
−New replicas are added dynamically on failures
Phase 2 – Core – HDFS Write Pipeline
DNDN DN
Client
Flush Ack
11
Phase 2 – Data – HCatalog
HDFS HBase
HCatalog
Map
Reduce
Pig Hive Streaming
= Phase 1
= Phase 2
• Shared schema and data model
• Data can be shared between tool users
• Data located by table rather than file
• Clients independent of storage details
• format, compression, …
• Only one adaptor for new formats
• not one per tool
• Notifications when new data is
available
12
Hortonworks Value
For Enterprises
• Make Apache Hadoop
easier to consume
• Extend to broader
developer audience
• Foster vibrant
technology and
services ecosystem
• Access to
Hortonworks’
technical expertise
For Vendors
• Create larger market
for Apache Hadoop
technology and
services
• Simplify process for
supporting Hadoop
• Access to
Hortonworks’
technical expertise
For
Community
• Ensure Apache
Hadoop remains
unified and strong
• Expand value provided
by core Apache
projects
• Foster additional
participation &
contributions from
ecosystem
13
Hortonworks Differentiation
• Unmatched domain expertise
− Delivered every major release of Apache Hadoop to date
− Critical mass of committers
• Community leadership role
− Setting direction for core projects
• Yahoo! commitment and backing
− Access to 1,000+ Hadoop engineers, Yahoo! grid
• Absolute dedication to Apache & open source
− Focused on making Apache Hadoop the standard
• Focus on delivering significant value to technology vendors
− ISVs, OEMs, Systems Integrators and other service providers
14
Thank You.

More Related Content

What's hot

Supporting Financial Services with a More Flexible Approach to Big Data
Supporting Financial Services with a More Flexible Approach to Big DataSupporting Financial Services with a More Flexible Approach to Big Data
Supporting Financial Services with a More Flexible Approach to Big DataHortonworks
 
SQL and Machine Learning on Hadoop using HAWQ
SQL and Machine Learning on Hadoop using HAWQSQL and Machine Learning on Hadoop using HAWQ
SQL and Machine Learning on Hadoop using HAWQpivotalny
 
Hp Converged Systems and Hortonworks - Webinar Slides
Hp Converged Systems and Hortonworks - Webinar SlidesHp Converged Systems and Hortonworks - Webinar Slides
Hp Converged Systems and Hortonworks - Webinar SlidesHortonworks
 
Pig Out to Hadoop
Pig Out to HadoopPig Out to Hadoop
Pig Out to HadoopHortonworks
 
State of the Union with Shaun Connolly
State of the Union with Shaun ConnollyState of the Union with Shaun Connolly
State of the Union with Shaun ConnollyHortonworks
 
Best Practices for Hadoop Data Analysis with Tableau and Hortonworks Data Pla...
Best Practices for Hadoop Data Analysis with Tableau and Hortonworks Data Pla...Best Practices for Hadoop Data Analysis with Tableau and Hortonworks Data Pla...
Best Practices for Hadoop Data Analysis with Tableau and Hortonworks Data Pla...Hortonworks
 
Real-Time Queries in Hadoop w/ Cloudera Impala
Real-Time Queries in Hadoop w/ Cloudera ImpalaReal-Time Queries in Hadoop w/ Cloudera Impala
Real-Time Queries in Hadoop w/ Cloudera ImpalaData Science London
 
Running Zeppelin in Enterprise
Running Zeppelin in EnterpriseRunning Zeppelin in Enterprise
Running Zeppelin in EnterpriseDataWorks Summit
 
Hadoop Ecosystem at a Glance
Hadoop Ecosystem at a GlanceHadoop Ecosystem at a Glance
Hadoop Ecosystem at a GlanceNeev Technologies
 
Discover HDP 2.1: Apache Solr for Hadoop Search
Discover HDP 2.1: Apache Solr for Hadoop SearchDiscover HDP 2.1: Apache Solr for Hadoop Search
Discover HDP 2.1: Apache Solr for Hadoop SearchHortonworks
 
Hortonworks tech workshop in-memory processing with spark
Hortonworks tech workshop   in-memory processing with sparkHortonworks tech workshop   in-memory processing with spark
Hortonworks tech workshop in-memory processing with sparkHortonworks
 
Pivotal HAWQ 소개
Pivotal HAWQ 소개Pivotal HAWQ 소개
Pivotal HAWQ 소개Seungdon Choi
 
Apache Hadoop YARN – Multi-Tenancy, Capacity Scheduler & Preemption - Stamped...
Apache Hadoop YARN – Multi-Tenancy, Capacity Scheduler & Preemption - Stamped...Apache Hadoop YARN – Multi-Tenancy, Capacity Scheduler & Preemption - Stamped...
Apache Hadoop YARN – Multi-Tenancy, Capacity Scheduler & Preemption - Stamped...StampedeCon
 
Cloudy with a Chance of Hadoop - Real World Considerations
Cloudy with a Chance of Hadoop - Real World ConsiderationsCloudy with a Chance of Hadoop - Real World Considerations
Cloudy with a Chance of Hadoop - Real World ConsiderationsDataWorks Summit/Hadoop Summit
 
Hortonworks Technical Workshop: What's New in HDP 2.3
Hortonworks Technical Workshop: What's New in HDP 2.3Hortonworks Technical Workshop: What's New in HDP 2.3
Hortonworks Technical Workshop: What's New in HDP 2.3Hortonworks
 
Discover HDP 2.1: Using Apache Ambari to Manage Hadoop Clusters
Discover HDP 2.1: Using Apache Ambari to Manage Hadoop Clusters Discover HDP 2.1: Using Apache Ambari to Manage Hadoop Clusters
Discover HDP 2.1: Using Apache Ambari to Manage Hadoop Clusters Hortonworks
 

What's hot (20)

5. pivotal hd 2013
5. pivotal hd 20135. pivotal hd 2013
5. pivotal hd 2013
 
Apache HBase: State of the Union
Apache HBase: State of the UnionApache HBase: State of the Union
Apache HBase: State of the Union
 
Supporting Financial Services with a More Flexible Approach to Big Data
Supporting Financial Services with a More Flexible Approach to Big DataSupporting Financial Services with a More Flexible Approach to Big Data
Supporting Financial Services with a More Flexible Approach to Big Data
 
SQL and Machine Learning on Hadoop using HAWQ
SQL and Machine Learning on Hadoop using HAWQSQL and Machine Learning on Hadoop using HAWQ
SQL and Machine Learning on Hadoop using HAWQ
 
Hadoop pycon2011uk
Hadoop pycon2011ukHadoop pycon2011uk
Hadoop pycon2011uk
 
Hp Converged Systems and Hortonworks - Webinar Slides
Hp Converged Systems and Hortonworks - Webinar SlidesHp Converged Systems and Hortonworks - Webinar Slides
Hp Converged Systems and Hortonworks - Webinar Slides
 
Pig Out to Hadoop
Pig Out to HadoopPig Out to Hadoop
Pig Out to Hadoop
 
State of the Union with Shaun Connolly
State of the Union with Shaun ConnollyState of the Union with Shaun Connolly
State of the Union with Shaun Connolly
 
Presentation
PresentationPresentation
Presentation
 
Best Practices for Hadoop Data Analysis with Tableau and Hortonworks Data Pla...
Best Practices for Hadoop Data Analysis with Tableau and Hortonworks Data Pla...Best Practices for Hadoop Data Analysis with Tableau and Hortonworks Data Pla...
Best Practices for Hadoop Data Analysis with Tableau and Hortonworks Data Pla...
 
Real-Time Queries in Hadoop w/ Cloudera Impala
Real-Time Queries in Hadoop w/ Cloudera ImpalaReal-Time Queries in Hadoop w/ Cloudera Impala
Real-Time Queries in Hadoop w/ Cloudera Impala
 
Running Zeppelin in Enterprise
Running Zeppelin in EnterpriseRunning Zeppelin in Enterprise
Running Zeppelin in Enterprise
 
Hadoop Ecosystem at a Glance
Hadoop Ecosystem at a GlanceHadoop Ecosystem at a Glance
Hadoop Ecosystem at a Glance
 
Discover HDP 2.1: Apache Solr for Hadoop Search
Discover HDP 2.1: Apache Solr for Hadoop SearchDiscover HDP 2.1: Apache Solr for Hadoop Search
Discover HDP 2.1: Apache Solr for Hadoop Search
 
Hortonworks tech workshop in-memory processing with spark
Hortonworks tech workshop   in-memory processing with sparkHortonworks tech workshop   in-memory processing with spark
Hortonworks tech workshop in-memory processing with spark
 
Pivotal HAWQ 소개
Pivotal HAWQ 소개Pivotal HAWQ 소개
Pivotal HAWQ 소개
 
Apache Hadoop YARN – Multi-Tenancy, Capacity Scheduler & Preemption - Stamped...
Apache Hadoop YARN – Multi-Tenancy, Capacity Scheduler & Preemption - Stamped...Apache Hadoop YARN – Multi-Tenancy, Capacity Scheduler & Preemption - Stamped...
Apache Hadoop YARN – Multi-Tenancy, Capacity Scheduler & Preemption - Stamped...
 
Cloudy with a Chance of Hadoop - Real World Considerations
Cloudy with a Chance of Hadoop - Real World ConsiderationsCloudy with a Chance of Hadoop - Real World Considerations
Cloudy with a Chance of Hadoop - Real World Considerations
 
Hortonworks Technical Workshop: What's New in HDP 2.3
Hortonworks Technical Workshop: What's New in HDP 2.3Hortonworks Technical Workshop: What's New in HDP 2.3
Hortonworks Technical Workshop: What's New in HDP 2.3
 
Discover HDP 2.1: Using Apache Ambari to Manage Hadoop Clusters
Discover HDP 2.1: Using Apache Ambari to Manage Hadoop Clusters Discover HDP 2.1: Using Apache Ambari to Manage Hadoop Clusters
Discover HDP 2.1: Using Apache Ambari to Manage Hadoop Clusters
 

Viewers also liked

Sql saturday pig session (wes floyd) v2
Sql saturday   pig session (wes floyd) v2Sql saturday   pig session (wes floyd) v2
Sql saturday pig session (wes floyd) v2Wes Floyd
 
HCatalog & Templeton
HCatalog & TempletonHCatalog & Templeton
HCatalog & TempletonDaegeun Kim
 
HCatalog: Table Management for Hadoop - CHUG - 20120917
HCatalog: Table Management for Hadoop - CHUG - 20120917HCatalog: Table Management for Hadoop - CHUG - 20120917
HCatalog: Table Management for Hadoop - CHUG - 20120917Chicago Hadoop Users Group
 
Keeping Spark on Track: Productionizing Spark for ETL
Keeping Spark on Track: Productionizing Spark for ETLKeeping Spark on Track: Productionizing Spark for ETL
Keeping Spark on Track: Productionizing Spark for ETLDatabricks
 
Leveraging Hadoop in Polyglot Architectures
Leveraging Hadoop in Polyglot ArchitecturesLeveraging Hadoop in Polyglot Architectures
Leveraging Hadoop in Polyglot ArchitecturesThanigai Vellore
 
Integration of Hive and HBase
Integration of Hive and HBaseIntegration of Hive and HBase
Integration of Hive and HBaseHortonworks
 

Viewers also liked (7)

Sql saturday pig session (wes floyd) v2
Sql saturday   pig session (wes floyd) v2Sql saturday   pig session (wes floyd) v2
Sql saturday pig session (wes floyd) v2
 
Jan 2012 HUG: HCatalog
Jan 2012 HUG: HCatalogJan 2012 HUG: HCatalog
Jan 2012 HUG: HCatalog
 
HCatalog & Templeton
HCatalog & TempletonHCatalog & Templeton
HCatalog & Templeton
 
HCatalog: Table Management for Hadoop - CHUG - 20120917
HCatalog: Table Management for Hadoop - CHUG - 20120917HCatalog: Table Management for Hadoop - CHUG - 20120917
HCatalog: Table Management for Hadoop - CHUG - 20120917
 
Keeping Spark on Track: Productionizing Spark for ETL
Keeping Spark on Track: Productionizing Spark for ETLKeeping Spark on Track: Productionizing Spark for ETL
Keeping Spark on Track: Productionizing Spark for ETL
 
Leveraging Hadoop in Polyglot Architectures
Leveraging Hadoop in Polyglot ArchitecturesLeveraging Hadoop in Polyglot Architectures
Leveraging Hadoop in Polyglot Architectures
 
Integration of Hive and HBase
Integration of Hive and HBaseIntegration of Hive and HBase
Integration of Hive and HBase
 

Similar to A glimpse into the Future of Hadoop & Big Data

Architecting the Future of Big Data and Search
Architecting the Future of Big Data and SearchArchitecting the Future of Big Data and Search
Architecting the Future of Big Data and SearchHortonworks
 
Hortonworks Hadoop summit 2011 keynote - eric14
Hortonworks Hadoop summit 2011 keynote - eric14Hortonworks Hadoop summit 2011 keynote - eric14
Hortonworks Hadoop summit 2011 keynote - eric14Hortonworks
 
Tcloud Computing Hadoop Family and Ecosystem Service 2013.Q3
Tcloud Computing Hadoop Family and Ecosystem Service 2013.Q3Tcloud Computing Hadoop Family and Ecosystem Service 2013.Q3
Tcloud Computing Hadoop Family and Ecosystem Service 2013.Q3tcloudcomputing-tw
 
Hortonworks - What's Possible with a Modern Data Architecture?
Hortonworks - What's Possible with a Modern Data Architecture?Hortonworks - What's Possible with a Modern Data Architecture?
Hortonworks - What's Possible with a Modern Data Architecture?Hortonworks
 
An Introduction of Apache Hadoop
An Introduction of Apache HadoopAn Introduction of Apache Hadoop
An Introduction of Apache HadoopKMS Technology
 
Hadoop And Their Ecosystem ppt
 Hadoop And Their Ecosystem ppt Hadoop And Their Ecosystem ppt
Hadoop And Their Ecosystem pptsunera pathan
 
Hadoop And Their Ecosystem
 Hadoop And Their Ecosystem Hadoop And Their Ecosystem
Hadoop And Their Ecosystemsunera pathan
 
Big data analytics_using_hadoop
Big data analytics_using_hadoopBig data analytics_using_hadoop
Big data analytics_using_hadoopKnowledgehut
 
M. Florence Dayana - Hadoop Foundation for Analytics.pptx
M. Florence Dayana - Hadoop Foundation for Analytics.pptxM. Florence Dayana - Hadoop Foundation for Analytics.pptx
M. Florence Dayana - Hadoop Foundation for Analytics.pptxDr.Florence Dayana
 
hadoop-ecosystem-ppt.pptx
hadoop-ecosystem-ppt.pptxhadoop-ecosystem-ppt.pptx
hadoop-ecosystem-ppt.pptxraghavanand36
 
Mrinal devadas, Hortonworks Making Sense Of Big Data
Mrinal devadas, Hortonworks Making Sense Of Big DataMrinal devadas, Hortonworks Making Sense Of Big Data
Mrinal devadas, Hortonworks Making Sense Of Big DataPatrickCrompton
 
Eric Baldeschwieler Keynote from Storage Developers Conference
Eric Baldeschwieler Keynote from Storage Developers ConferenceEric Baldeschwieler Keynote from Storage Developers Conference
Eric Baldeschwieler Keynote from Storage Developers ConferenceHortonworks
 
Improvements in Hadoop Security
Improvements in Hadoop SecurityImprovements in Hadoop Security
Improvements in Hadoop SecurityDataWorks Summit
 
DUG'20: 13 - HPE’s DAOS Solution Plans
DUG'20: 13 - HPE’s DAOS Solution PlansDUG'20: 13 - HPE’s DAOS Solution Plans
DUG'20: 13 - HPE’s DAOS Solution PlansAndrey Kudryavtsev
 
Achieving Mega-Scale Business Intelligence Through Speed of Thought Analytics...
Achieving Mega-Scale Business Intelligence Through Speed of Thought Analytics...Achieving Mega-Scale Business Intelligence Through Speed of Thought Analytics...
Achieving Mega-Scale Business Intelligence Through Speed of Thought Analytics...VMware Tanzu
 
9.-dados e processamento distribuido-hadoop.pdf
9.-dados e processamento distribuido-hadoop.pdf9.-dados e processamento distribuido-hadoop.pdf
9.-dados e processamento distribuido-hadoop.pdfManoel Ribeiro
 

Similar to A glimpse into the Future of Hadoop & Big Data (20)

201305 hadoop jpl-v3
201305 hadoop jpl-v3201305 hadoop jpl-v3
201305 hadoop jpl-v3
 
Architecting the Future of Big Data and Search
Architecting the Future of Big Data and SearchArchitecting the Future of Big Data and Search
Architecting the Future of Big Data and Search
 
Hortonworks Hadoop summit 2011 keynote - eric14
Hortonworks Hadoop summit 2011 keynote - eric14Hortonworks Hadoop summit 2011 keynote - eric14
Hortonworks Hadoop summit 2011 keynote - eric14
 
50 Shades of SQL
50 Shades of SQL50 Shades of SQL
50 Shades of SQL
 
Tcloud Computing Hadoop Family and Ecosystem Service 2013.Q3
Tcloud Computing Hadoop Family and Ecosystem Service 2013.Q3Tcloud Computing Hadoop Family and Ecosystem Service 2013.Q3
Tcloud Computing Hadoop Family and Ecosystem Service 2013.Q3
 
BIGDATA ppts
BIGDATA pptsBIGDATA ppts
BIGDATA ppts
 
Hortonworks - What's Possible with a Modern Data Architecture?
Hortonworks - What's Possible with a Modern Data Architecture?Hortonworks - What's Possible with a Modern Data Architecture?
Hortonworks - What's Possible with a Modern Data Architecture?
 
An Introduction of Apache Hadoop
An Introduction of Apache HadoopAn Introduction of Apache Hadoop
An Introduction of Apache Hadoop
 
Hadoop And Their Ecosystem ppt
 Hadoop And Their Ecosystem ppt Hadoop And Their Ecosystem ppt
Hadoop And Their Ecosystem ppt
 
Hadoop And Their Ecosystem
 Hadoop And Their Ecosystem Hadoop And Their Ecosystem
Hadoop And Their Ecosystem
 
Big data analytics_using_hadoop
Big data analytics_using_hadoopBig data analytics_using_hadoop
Big data analytics_using_hadoop
 
Munich HUG 21.11.2013
Munich HUG 21.11.2013Munich HUG 21.11.2013
Munich HUG 21.11.2013
 
M. Florence Dayana - Hadoop Foundation for Analytics.pptx
M. Florence Dayana - Hadoop Foundation for Analytics.pptxM. Florence Dayana - Hadoop Foundation for Analytics.pptx
M. Florence Dayana - Hadoop Foundation for Analytics.pptx
 
hadoop-ecosystem-ppt.pptx
hadoop-ecosystem-ppt.pptxhadoop-ecosystem-ppt.pptx
hadoop-ecosystem-ppt.pptx
 
Mrinal devadas, Hortonworks Making Sense Of Big Data
Mrinal devadas, Hortonworks Making Sense Of Big DataMrinal devadas, Hortonworks Making Sense Of Big Data
Mrinal devadas, Hortonworks Making Sense Of Big Data
 
Eric Baldeschwieler Keynote from Storage Developers Conference
Eric Baldeschwieler Keynote from Storage Developers ConferenceEric Baldeschwieler Keynote from Storage Developers Conference
Eric Baldeschwieler Keynote from Storage Developers Conference
 
Improvements in Hadoop Security
Improvements in Hadoop SecurityImprovements in Hadoop Security
Improvements in Hadoop Security
 
DUG'20: 13 - HPE’s DAOS Solution Plans
DUG'20: 13 - HPE’s DAOS Solution PlansDUG'20: 13 - HPE’s DAOS Solution Plans
DUG'20: 13 - HPE’s DAOS Solution Plans
 
Achieving Mega-Scale Business Intelligence Through Speed of Thought Analytics...
Achieving Mega-Scale Business Intelligence Through Speed of Thought Analytics...Achieving Mega-Scale Business Intelligence Through Speed of Thought Analytics...
Achieving Mega-Scale Business Intelligence Through Speed of Thought Analytics...
 
9.-dados e processamento distribuido-hadoop.pdf
9.-dados e processamento distribuido-hadoop.pdf9.-dados e processamento distribuido-hadoop.pdf
9.-dados e processamento distribuido-hadoop.pdf
 

Recently uploaded

Solving Puzzles Benefits Everyone (English).pptx
Solving Puzzles Benefits Everyone (English).pptxSolving Puzzles Benefits Everyone (English).pptx
Solving Puzzles Benefits Everyone (English).pptxOH TEIK BIN
 
A Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy ReformA Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy ReformChameera Dedduwage
 
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17Incoming and Outgoing Shipments in 1 STEP Using Odoo 17
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17Celine George
 
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...Marc Dusseiller Dusjagr
 
Alper Gobel In Media Res Media Component
Alper Gobel In Media Res Media ComponentAlper Gobel In Media Res Media Component
Alper Gobel In Media Res Media ComponentInMediaRes1
 
MENTAL STATUS EXAMINATION format.docx
MENTAL     STATUS EXAMINATION format.docxMENTAL     STATUS EXAMINATION format.docx
MENTAL STATUS EXAMINATION format.docxPoojaSen20
 
Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111Sapana Sha
 
Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)eniolaolutunde
 
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdfssuser54595a
 
Employee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptxEmployee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptxNirmalaLoungPoorunde1
 
Industrial Policy - 1948, 1956, 1973, 1977, 1980, 1991
Industrial Policy - 1948, 1956, 1973, 1977, 1980, 1991Industrial Policy - 1948, 1956, 1973, 1977, 1980, 1991
Industrial Policy - 1948, 1956, 1973, 1977, 1980, 1991RKavithamani
 
Crayon Activity Handout For the Crayon A
Crayon Activity Handout For the Crayon ACrayon Activity Handout For the Crayon A
Crayon Activity Handout For the Crayon AUnboundStockton
 
Micromeritics - Fundamental and Derived Properties of Powders
Micromeritics - Fundamental and Derived Properties of PowdersMicromeritics - Fundamental and Derived Properties of Powders
Micromeritics - Fundamental and Derived Properties of PowdersChitralekhaTherkar
 
Separation of Lanthanides/ Lanthanides and Actinides
Separation of Lanthanides/ Lanthanides and ActinidesSeparation of Lanthanides/ Lanthanides and Actinides
Separation of Lanthanides/ Lanthanides and ActinidesFatimaKhan178732
 
Contemporary philippine arts from the regions_PPT_Module_12 [Autosaved] (1).pptx
Contemporary philippine arts from the regions_PPT_Module_12 [Autosaved] (1).pptxContemporary philippine arts from the regions_PPT_Module_12 [Autosaved] (1).pptx
Contemporary philippine arts from the regions_PPT_Module_12 [Autosaved] (1).pptxRoyAbrique
 
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...EduSkills OECD
 
Q4-W6-Restating Informational Text Grade 3
Q4-W6-Restating Informational Text Grade 3Q4-W6-Restating Informational Text Grade 3
Q4-W6-Restating Informational Text Grade 3JemimahLaneBuaron
 
Hybridoma Technology ( Production , Purification , and Application )
Hybridoma Technology  ( Production , Purification , and Application  ) Hybridoma Technology  ( Production , Purification , and Application  )
Hybridoma Technology ( Production , Purification , and Application ) Sakshi Ghasle
 

Recently uploaded (20)

Solving Puzzles Benefits Everyone (English).pptx
Solving Puzzles Benefits Everyone (English).pptxSolving Puzzles Benefits Everyone (English).pptx
Solving Puzzles Benefits Everyone (English).pptx
 
Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝
 
A Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy ReformA Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy Reform
 
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17Incoming and Outgoing Shipments in 1 STEP Using Odoo 17
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17
 
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
 
Alper Gobel In Media Res Media Component
Alper Gobel In Media Res Media ComponentAlper Gobel In Media Res Media Component
Alper Gobel In Media Res Media Component
 
MENTAL STATUS EXAMINATION format.docx
MENTAL     STATUS EXAMINATION format.docxMENTAL     STATUS EXAMINATION format.docx
MENTAL STATUS EXAMINATION format.docx
 
Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111
 
Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)
 
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
 
Employee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptxEmployee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptx
 
Industrial Policy - 1948, 1956, 1973, 1977, 1980, 1991
Industrial Policy - 1948, 1956, 1973, 1977, 1980, 1991Industrial Policy - 1948, 1956, 1973, 1977, 1980, 1991
Industrial Policy - 1948, 1956, 1973, 1977, 1980, 1991
 
Crayon Activity Handout For the Crayon A
Crayon Activity Handout For the Crayon ACrayon Activity Handout For the Crayon A
Crayon Activity Handout For the Crayon A
 
Micromeritics - Fundamental and Derived Properties of Powders
Micromeritics - Fundamental and Derived Properties of PowdersMicromeritics - Fundamental and Derived Properties of Powders
Micromeritics - Fundamental and Derived Properties of Powders
 
Código Creativo y Arte de Software | Unidad 1
Código Creativo y Arte de Software | Unidad 1Código Creativo y Arte de Software | Unidad 1
Código Creativo y Arte de Software | Unidad 1
 
Separation of Lanthanides/ Lanthanides and Actinides
Separation of Lanthanides/ Lanthanides and ActinidesSeparation of Lanthanides/ Lanthanides and Actinides
Separation of Lanthanides/ Lanthanides and Actinides
 
Contemporary philippine arts from the regions_PPT_Module_12 [Autosaved] (1).pptx
Contemporary philippine arts from the regions_PPT_Module_12 [Autosaved] (1).pptxContemporary philippine arts from the regions_PPT_Module_12 [Autosaved] (1).pptx
Contemporary philippine arts from the regions_PPT_Module_12 [Autosaved] (1).pptx
 
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
 
Q4-W6-Restating Informational Text Grade 3
Q4-W6-Restating Informational Text Grade 3Q4-W6-Restating Informational Text Grade 3
Q4-W6-Restating Informational Text Grade 3
 
Hybridoma Technology ( Production , Purification , and Application )
Hybridoma Technology  ( Production , Purification , and Application  ) Hybridoma Technology  ( Production , Purification , and Application  )
Hybridoma Technology ( Production , Purification , and Application )
 

A glimpse into the Future of Hadoop & Big Data

  • 1. Hortonworks Devashish B. Consultant / Seed InfoTech 9844621164 A glimpse into the Future of Hadoop & Big Data
  • 2. About Hortonworks • Mission: Revolutionize and commoditize the storage and processing of big data via open source • Vision: Half of the world’s data will be stored in Apache Hadoop within five years • Strategy: Grow the Apache Hadoop Ecosystem by making Apache Hadoop easier to consume, profit by providing training, support and certification  An independent company  Focused on making Apache Hadoop great  Hold nothing back, Apache Hadoop will be complete 1
  • 3. Credentials • Technical: key architects and committers from Yahoo! Hadoop engineering team − Highest concentration of Apache Hadoop committers − Contributed >70% of the code in Hadoop, Pig and ZooKeeper − Delivered every major/stable Apache Hadoop release since 0.1 − History of drivingExperience managing world’s largest deployment innovation across entire Apache Hadoop stack • Business operations: team of highly successful open source veterans − Led by Rob Bearden, former COO of SpringSource & JBoss • Investors: backed by Benchmark Capital and Yahoo! − Benchmark was key investor in Red Hat, MySQL, SpringSource, Twitter & eBay 2
  • 4. Hortonworks and Yahoo! • Yahoo! is a development partner −Leverage large Yahoo! development, testing & operations team  More than 1,000 active & sophisticated users of Apache Hadoop  Access to the Yahoo! grid for testing large workloads  Only organization that has delivered a stable release of Apache Hadoop −Yahoo will continue to contribute Apache Hadoop code too! • Yahoo! is a customer −Hortonworks provides level 3 support and training to Yahoo! −Yahoo deploys Apache Hadoop releases across its 42,000 grid • Yahoo! is an investor 3
  • 5. Current State of Adoption Vendor Ecosystem Adoption Enterprise Adoption • Early adopters • Technology is hard to install, manage & use • Technology lacks enterprise robustness • Requires significant investment in technical staff or consulting • Hard to find & hire experienced developer & operations talent • Early in vendor adoption lifecycle • Hadoop is hard to integrate and extend • Hard to find & hire experienced developer & operations talent Technology & Knowledge Gaps Prevent Apache Hadoop from Reaching Full Potential Customers are asking their vendors for help with Hadoop! “We’re seeing Hadoop in all of our fortune 2000 data accounts” 4
  • 6. Hortonworks Role & Opportunity Vendor Ecosystem Adoption Enterprise Adoption Bridge the Gap! Grow Market Sell training and support via Partners Fundamental shift in enterprise data architecture strategy • Apache Hadoop becomes standard for managing new types & scale of data • New applications & solutions will be created to leverage data in Apache Hadoop • Creates massive big data technology and services opportunity for ecosystem 5
  • 7. Hortonworks Objectives • Make Apache Hadoop projects easier to install, manage & use − Regular sustaining releases − Compiled code for each project (e.g. RPMs) − Testing at scale • Make Apache Hadoop more robust − Performance gains − High availability − Administration & monitoring • Make Apache Hadoop easier to integrate & extend − Open APIs for extension & experimentation All done within Apache Hadoop community • Develop collaboratively with community • Complete transparency • All code contributed back to Apache Anyone should be able to easily deploy the Hadoop projects directly from Apache 6
  • 8. Technology Roadmap Phase 1 – Making Apache Hadoop Accessible • Release the most stable version of Hadoop ever • Release directly usable code via Apache (RPMs, .debs…) • Frequent sustaining releases off of the stable branches 2011 Phase 2 – Next Generation Apache Hadoop • Address key product gaps (Hbase support, HA, Management…) • Enable community & partner innovation via modular architecture & open APIs • Work with community to define integrated stack 2012 (Alphas starting Oct 2011) 7
  • 9. Phase 2 - Next Generation Apache Hadoop • Core − HDFS Federation − Next Gen MapReduce − New Write Pipeline (HBase support) − HA (no SPOF) and Wire compatibility • Data - HCatalog 0.3 − Pig, Hive, MapReduce and Streaming as clients − HDFS and HBase as storage systems − Performance and storage improvements • Management & Ease of use − All components fully tested and deployable as a stack − Stack installation and centralized config management − REST and GUI for user tasks 8
  • 10. Phase 2 – Core - MapReduce Client Client Resource Manager Zookeeper (No SPOF) Application Master Application Worker Compute Machine MapReduce App MapReduce App MPI App • Complete rewrite of the resource management layer • Performance and Scale improvements • 6,000+ nodes / 100,000 concurrent tasks • Supports better availability and fail-over • Supports new frameworks beyond MapReduce 9
  • 11. Phase 2 – Core – HDFS Federation • Multiple independent Namenodes and Namespace Volumes in a cluster − Scalability (6K nodes, 100K clients, 120PB disk), Workload isolation support − Client side mount tables for Global Namespace • Block storage as a generic shared storage service − DataNodes store blocks for all Namespace volumes – no partitioning − Non-HDFS namespaces (HBase, MR tmp and others) can share the same storage Datanode 1 Datanode 2 Datanode m ... ... ... NS1 Foreig n NS n ... ... NS k B a l a n c e r Block Pools Pool nPool kPool 1 NN-1 NN-k NN-n Common Storage NamespaceBlockstorage 10
  • 12. • Limitations of HDFS write pipeline in 0.20 −Broken Flush, Sync, Append −Node failures can cause data loss for slow writers • Hadoop.Next −Flush, Sync, and Append support −New replicas are added dynamically on failures Phase 2 – Core – HDFS Write Pipeline DNDN DN Client Flush Ack 11
  • 13. Phase 2 – Data – HCatalog HDFS HBase HCatalog Map Reduce Pig Hive Streaming = Phase 1 = Phase 2 • Shared schema and data model • Data can be shared between tool users • Data located by table rather than file • Clients independent of storage details • format, compression, … • Only one adaptor for new formats • not one per tool • Notifications when new data is available 12
  • 14. Hortonworks Value For Enterprises • Make Apache Hadoop easier to consume • Extend to broader developer audience • Foster vibrant technology and services ecosystem • Access to Hortonworks’ technical expertise For Vendors • Create larger market for Apache Hadoop technology and services • Simplify process for supporting Hadoop • Access to Hortonworks’ technical expertise For Community • Ensure Apache Hadoop remains unified and strong • Expand value provided by core Apache projects • Foster additional participation & contributions from ecosystem 13
  • 15. Hortonworks Differentiation • Unmatched domain expertise − Delivered every major release of Apache Hadoop to date − Critical mass of committers • Community leadership role − Setting direction for core projects • Yahoo! commitment and backing − Access to 1,000+ Hadoop engineers, Yahoo! grid • Absolute dedication to Apache & open source − Focused on making Apache Hadoop the standard • Focus on delivering significant value to technology vendors − ISVs, OEMs, Systems Integrators and other service providers 14