Mrinal Devadas, Hortonworks: Making Sense Of Big Data
1. © Hortonworks Inc. 2013
Hortonworks
Community Driven
Enterprise Apache Hadoop
Mrinal Devadas
Systems Architect
mdevadas@hortonworks.com
2. © Hortonworks Inc. 2013
Hortonworks
• Who is Hortonworks
• Our Approach
• Patterns of Use
3. © Hortonworks Inc. 2013
A Brief History of Apache Hadoop
[Timeline axis: 2004, 2006, 2008, 2010, 2012, 2013]
• Focus on INNOVATION: 2005: Yahoo! creates team under E14 to work on Hadoop
• Apache Project Established
• Focus on OPERATIONS: 2008: Yahoo! team extends focus to operations to support multiple projects & growing clusters
• Yahoo! begins to operate at scale
• STABILITY: 2011: Hortonworks created to focus on “Enterprise Hadoop”. Starts with 24 key Hadoop engineers from Yahoo!
• Enterprise Hadoop
• Hortonworks Data Platform
4. © Hortonworks Inc. 2013
Hortonworks Snapshot
We develop, distribute and support the ONLY 100% open source Enterprise Hadoop distribution
Develop
• We employ the core architects, builders and operators of Apache Hadoop
• We drive innovation within Apache Software Foundation projects
Distribute
• We distribute the only 100% Open Source Enterprise Hadoop Distribution: Hortonworks Data Platform
• We engineer, test & certify HDP for enterprise usage
Support
• We are uniquely positioned to deliver the highest quality of Hadoop support
• We enable the ecosystem to work better with Hadoop
Endorsed by Strategic Partners
Headquarters: Palo Alto, CA
Employees: 200+ and growing
Investors: Benchmark, Index, Yahoo
5. © Hortonworks Inc. 2013
Hortonworks
• Who is Hortonworks
• Our approach
– Leading Open Source Hadoop innovation
– Addressing “Enterprise Hadoop” Requirements
– Enabling Interoperability of the Ecosystem
– Ensuring No Lock-In: 100% Open Source
• Patterns of Use
6. © Hortonworks Inc. 2013
Apache Community Leadership
Apache Software Foundation Guiding Principles
• Release early & often
• Transparency, respect, meritocracy
Key Roles held by Hortonworkers
• PMC Members
– Managing community projects
– Mentoring new incubator projects
– Over 20 Hortonworkers managing community
• Committers
– Authoring, reviewing & editing code
– Over 50 Hortonworkers across projects
• Release Managers
– Testing & releasing projects
– Hortonworkers across key projects like Hadoop, Hive, Pig, HCatalog, Ambari, HBase
[Diagram: cycle of Design & Develop, Test & Patch, and Release across Apache Hadoop, Apache Pig, Apache HCatalog, Apache HBase, Apache Hive, Apache Ambari and other Apache projects]
“We have noticed more activity over the last year from Hortonworks’ engineers on building out Apache Hadoop’s more innovative features. These include YARN, Ambari and HCatalog.” - Jeff Kelly: Wikibon
7. © Hortonworks Inc. 2013
Leadership that Starts at the Core
• Driving next generation Hadoop
– YARN, MapReduce2, HDFS2, High Availability, Disaster Recovery
• 420k+ lines authored since 2006
– More than twice nearest contributor
• Deeply integrating w/ecosystem
– Enabling new deployment platforms
– (ex. Windows & Azure, Linux & VMware HA)
– Creating deeply engineered solutions
– (ex. Teradata big data appliance)
• All Apache, NO holdbacks
– 100% of code contributed to Apache
8. © Hortonworks Inc. 2013
Driving Enterprise Hadoop Innovation
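Committers, Hortonworks vs. Cloudera (rows in extracted order; the project each row belongs to is not recoverable from the text): 19 vs. 8; 6 vs. 1; 5 vs. 0; 5 vs. 9; 16 vs. 0
[Chart: Lines Of Code By Company, 0% to 100%, for Ambari, HBase, Hive/HCatalog, Pig and Hadoop Core; series: Hortonworks, Yahoo!, Cloudera, Other]
Source: Apache Software Foundation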
9. © Hortonworks Inc. 2013
Hortonworks Process for Enterprise Hadoop
Upstream Community Projects and Downstream Enterprise Product
[Diagram: upstream, Apache Hadoop, Apache Pig, Apache HCatalog, Apache HBase, Apache Hive, Apache Ambari and other Apache projects cycle through Design & Develop, Test & Patch, and Release. Stable project releases flow downstream to the Hortonworks Data Platform, which cycles through Design & Develop, Integrate & Test, Package & Certify, and Distribute. Fixed issues flow back upstream.]
Virtuous cycle when development & fixed issues are done upstream & stable project releases flow downstream
No Lock-in: Integrated, tested & certified distribution lowers risk by ensuring close alignment with Apache projects
“We have noticed more activity over the last year from Hortonworks’ engineers on building out Apache Hadoop’s more innovative features. These include YARN, Ambari and HCatalog.” - Jeff Kelly: Wikibon
10. © Hortonworks Inc. 2013
Hortonworks
• Who is Hortonworks
• Our approach
– Leading Open Source Hadoop Innovation
– Addressing “Enterprise Hadoop” Requirements
– Enabling Interoperability of the Ecosystem
– Ensuring NO LOCK-IN: 100% Open Source
• Patterns of use
11. © Hortonworks Inc. 2013
Enhancing the Core of Apache Hadoop
Deliver high-scale storage & processing with enterprise-ready platform services
Unique Focus Areas:
• Bigger, faster, more flexible
– Continued focus on speed & scale and enabling near-real-time apps
• Tested & certified at scale
– Run ~1300 system tests on large Yahoo clusters for every release
• Enterprise-ready services
– High availability, disaster recovery, snapshots, security, …
Hortonworkers are the architects, operators, and builders of core Hadoop
[Diagram: HADOOP CORE (Distributed Storage & Processing); PLATFORM SERVICES (Enterprise Readiness)]
12. © Hortonworks Inc. 2013
Data Services for Full Data Lifecycle
Provide data services to store, process & access data in many ways
Unique Focus Areas:
• Apache HCatalog
– Metadata services for consistent table access to Hadoop data
• Apache Hive
– Explore & process Hadoop data via SQL & ODBC-compliant BI tools
Hortonworks enables Hadoop data to be accessed via existing tools & systems
[Diagram: HADOOP CORE (Distributed Storage & Processing); DATA SERVICES (Store, Process and Access Data); PLATFORM SERVICES (Enterprise Readiness)]
13. © Hortonworks Inc. 2013
Operational Services for Ease of Use
Include complete operational services for productive operations & management
Unique Focus Area:
• Apache Ambari
– Provision, manage & monitor a cluster
– Complete REST APIs to integrate with existing operational tools
– Job & task visualizer to diagnose issues
Only Hortonworks provides a complete open source Hadoop management tool
[Diagram: HADOOP CORE (Distributed Storage & Processing); DATA SERVICES (Store, Process and Access Data); OPERATIONAL SERVICES (Manage & Operate at Scale); PLATFORM SERVICES (Enterprise Readiness)]
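The Ambari slide refers to REST APIs for integrating with existing operational tools. As a rough illustration only, the Python sketch below builds (but does not send) an authenticated request to list clusters. It assumes an Ambari server on its usual port 8080 with HTTP basic authentication and the `/api/v1/clusters` resource; the host name, credentials and the helper name `build_ambari_request` are hypothetical and do not come from the slides.

```python
import base64
import urllib.request


def build_ambari_request(host, user, password, port=8080, path="/api/v1/clusters"):
    """Build (without sending) a GET request for an Ambari REST resource.

    Assumptions for illustration: plain HTTP, basic authentication,
    and the /api/v1 prefix used by the Ambari REST API.
    """
    url = f"http://{host}:{port}{path}"
    # Basic auth header: base64("user:password")
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    request = urllib.request.Request(url, method="GET")
    request.add_header("Authorization", f"Basic {token}")
    return request


if __name__ == "__main__":
    # Hypothetical host and credentials; nothing is sent over the network here.
    req = build_ambari_request("ambari.example.com", "admin", "secret")
    print(req.get_method(), req.full_url)
```

An operations tool could pass the returned object to `urllib.request.urlopen` and parse the JSON response; that step is left out so the sketch runs without a live cluster.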
14. © Hortonworks Inc. 2013
Deployable Across a Range of Options
Only Hortonworks allows you to deploy seamlessly across any deployment option
• Linux & Windows
• Azure, Rackspace & other clouds
• Virtual platforms
• Big data appliances
[Diagram: HORTONWORKS DATA PLATFORM (HDP): HADOOP CORE (Distributed Storage & Processing); PLATFORM SERVICES (Enterprise Readiness); DATA SERVICES (Store, Process and Access Data); OPERATIONAL SERVICES (Manage & Operate at Scale); deployment targets: OS, Cloud, VM, Appliance]
15. © Hortonworks Inc. 2013
HDP: Enterprise Hadoop Distribution
Hortonworks Data Platform (HDP): Enterprise Hadoop
• The ONLY 100% open source and complete distribution
• Enterprise grade, proven and tested at scale
• Ecosystem endorsed to ensure interoperability
[Diagram: HORTONWORKS DATA PLATFORM (HDP): HADOOP CORE (Distributed Storage & Processing); PLATFORM SERVICES (Enterprise Readiness); DATA SERVICES (Store, Process and Access Data); OPERATIONAL SERVICES (Manage & Operate at Scale); deployment targets: OS, Cloud, VM, Appliance]
16. © Hortonworks Inc. 2013
Hortonworks
• Who is Hortonworks
• Our approach
– Leading Open Source Hadoop Innovation
– Addressing “Enterprise Hadoop” Requirements
– Enabling Interoperability of the Ecosystem
– Ensuring No Lock-in: 100% Open Source
• Patterns of use
17. © Hortonworks Inc. 2013
Existing Data Architecture
[Diagram layers]
• APPLICATIONS: Business Analytics, Custom Applications, Enterprise Applications
• DATA SYSTEMS: TRADITIONAL REPOS (RDBMS, EDW, MPP)
• DATA SOURCES: Traditional Sources (RDBMS, OLTP, OLAP); OLTP, POS SYSTEMS
• DEV & DATA TOOLS: BUILD & TEST
• OPERATIONAL TOOLS: MANAGE & MONITOR
18. © Hortonworks Inc. 2013
Next-Generation Data Architecture
[Diagram layers]
• APPLICATIONS: Business Analytics, Custom Applications, Enterprise Applications
• DATA SYSTEMS: TRADITIONAL REPOS (RDBMS, EDW, MPP); ENTERPRISE HADOOP PLATFORM
• DATA SOURCES: Traditional Sources (RDBMS, OLTP, OLAP); OLTP, POS SYSTEMS; New Sources (web logs, email, sensors, social media)
• DEV & DATA TOOLS: BUILD & TEST
• OPERATIONAL TOOLS: MANAGE & MONITOR
19. © Hortonworks Inc. 2013
Interoperating With Your Tools
[Diagram layers]
• APPLICATIONS
• DATA SYSTEMS: TRADITIONAL REPOS; HORTONWORKS DATA PLATFORM
• DATA SOURCES: Traditional Sources (RDBMS, OLTP, OLAP); New Sources (web logs, email, sensors, social media)
• DEV & DATA TOOLS
• OPERATIONAL TOOLS
• Partner labels shown: Viewpoint, Microsoft Applications
20. © Hortonworks Inc. 2013
Hortonworks
• Who is Hortonworks
• Our approach
– Leading Open Source Hadoop Innovation
– Addressing “Enterprise Hadoop” Requirements
– Enabling Interoperability of the Ecosystem
– Ensuring No Lock-In: 100% Open Source
• Patterns of use
21. © Hortonworks Inc. 2013
True Enterprise Class Open Source
• Community-driven Approach Mitigates Lock-In
–Identify & introduce enterprise requirements into public domain
–Work with community to advance & incubate open source projects
–Apply Enterprise Rigor for the most stable and reliable distribution
• 100% Open Source. No Holdbacks.
–Only true implementation of OSS Apache Hadoop
–Preferred by the software vendors that you rely on
–Proprietary Open Source = Lock-In
–Open communities always trump “open source”
• Flexible Deployment
–No License Fee for usage
22. © Hortonworks Inc. 2013
Hortonworks
• Who is Hortonworks
• Our approach
• Patterns of use
23. © Hortonworks Inc. 2013
Hadoop Common Patterns of Use
Big Data: Transactions, Interactions, Observations
HORTONWORKS DATA PLATFORM
• Refine (Batch)
• Explore (Interactive)
• Enrich (Online)
“Right-time” Access to Data
Business Cases
24. © Hortonworks Inc. 2013
Operational Data Refinery
Transform & refine ALL sources of data
Also known as Data Reservoir or Catch Basin
1 Capture
2 Process
3 Distribute & Retain
[Diagram layers; HORTONWORKS DATA PLATFORM patterns: Refine, Explore, Enrich]
• APPLICATIONS: Business Analytics, Custom Applications, Enterprise Applications
• DATA SYSTEMS: TRADITIONAL REPOS (RDBMS, EDW, MPP); HORTONWORKS DATA PLATFORM
• DATA SOURCES: Traditional Sources (RDBMS, OLTP, OLAP); New Sources (web logs, email, sensor data, social media)
25. © Hortonworks Inc. 2013
Big Data Exploration & Visualization
Leverage “data lake” to perform iterative investigation for value
1 Capture
2 Process
3 Explore & Visualize
[Diagram layers; HORTONWORKS DATA PLATFORM patterns: Refine, Explore, Enrich]
• APPLICATIONS: Business Analytics, Custom Applications, Enterprise Applications
• DATA SYSTEMS: TRADITIONAL REPOS (RDBMS, EDW, MPP); HORTONWORKS DATA PLATFORM
• DATA SOURCES: Traditional Sources (RDBMS, OLTP, OLAP); New Sources (web logs, email, sensor data, social media)
26. © Hortonworks Inc. 2013
Application Enrichment
Create intelligent applications
Collect data, create analytical models and deliver to online apps
1 Capture
2 Process & Compute
3 Deliver Model
[Diagram layers; HORTONWORKS DATA PLATFORM patterns: Refine, Explore, Enrich]
• APPLICATIONS: Custom Applications, Enterprise Applications
• DATA SYSTEMS: TRADITIONAL REPOS (RDBMS, EDW, MPP); NOSQL; HORTONWORKS DATA PLATFORM
• DATA SOURCES: Traditional Sources (RDBMS, OLTP, OLAP); New Sources (web logs, email, sensor data, social media)
27. © Hortonworks Inc. 2013
Flexible Support Subscription Programs
Leverage Hortonworks Expertise: Subscription and Support delivered and backed by Hadoop experts; subscriptions based on nodes or storage
• Developer Support: “How to” guidance for developers and architects
– 12 x 5, Web only
– All Sev: 1 business day
– Application Design Advice, Code Review
– 1 seat
• Essential Support*: Operations support for small research clusters
– 12 x 5, Web only
– All Sev: 1 business day
– Cluster Design, Install, Maintain, Performance
– 3 Contacts
– Patches & Updates
• Standard Support: Operations support for dev & test clusters
– 12 x 5, Web only
– All Sev: 1 business day
– Cluster Design, Install, Maintain, Performance
– 3 Contacts
– Patches & Updates
• Enterprise Support: Operations support for critical clusters
– 24 x 7, Phone & Web
– Sev 1: 1 Hour; Sev 2: 4 Business Hours
– Cluster Design, Install, Maintain, Performance
– 5 Contacts
– Patches & Updates
* Limited in size and no expansion
Additional Options
28. © Hortonworks Inc. 2013
Hortonworks: Best In Class Hadoop Support
• Experienced enterprise support team
– Experience supporting enterprise clients in production
– Core engineers have real operational experience: built and supported 44+K nodes in production
– Extensive experience in commercial big data offerings including HDP, MapR, Karmasphere
• Global 24x7 operation: support based in Sunnyvale, UK & India
• Stringent case management processes ensure high quality customer service & responsiveness
29. © Hortonworks Inc. 2013
Transferring Our Hadoop Expertise to You
The expert source for
Apache Hadoop training & certification
• World class training programs designed to help you learn fast
– Role-based hands on classes with 50% lab time
• Expert consulting services
– Programs designed to transfer knowledge
• Industry leading Hadoop Sandbox program
– Fastest way to learn Apache Hadoop
– Multi-level tutorials for wide applicability
– Customizable and updateable
30. © Hortonworks Inc. 2013
Introducing Hortonworks Data Platform for Windows
Enterprise Apache Hadoop
March 2013
31. © Hortonworks Inc. 2013
Why Apache Hadoop on Windows?
• According to IDC, Windows Server held 73% market share in 2012
– Hadoop was traditionally built for Linux servers, so there are a large number of underserved organizations
• According to a 2012 Barclays CIO study, big data outranks virtualization as the #1 trend driving spending initiatives
– Unstructured data growth exceeds 80% year/year in most enterprises
• Apache Hadoop is the de facto big data platform for processing massive amounts of unstructured data
– Complementary to existing Microsoft technologies
– There is a huge untapped community of Windows developers and ecosystem partners
• A strong Microsoft-Hortonworks partnership and 18 months of development make this a natural next step
32. © Hortonworks Inc. 2013
Hortonworks Data Platform for Windows
• Enterprise-grade Apache Hadoop on Windows
– Enables same experience for Hadoop on Windows & Linux
• More partners, more developers for Hadoop
– Makes native Apache Hadoop available to Windows ecosystem
– More options for Windows focused organizations
• Hortonworks focus: Enterprise Apache Hadoop for all platforms
– Trusted, reliable, production-ready distribution for on-premise Hadoop on Windows deployments
• Built with joint investment and contributions from Microsoft
– Deep engineering relationship ensures tight integration and maximum performance
HDP is the first and only distribution available on Windows & Linux
33. © Hortonworks Inc. 2013
Seamless Interoperability with Your Microsoft Tools
• Integrated with Microsoft tools for native big data analysis
– Bi-directional connectors for SQL Server and SQL Azure through SQOOP
– Excel ODBC integration through Hive
• Addressing demand for Hadoop on Windows
– Ideal for Windows customers with Hadoop operational experience
• Enables most common Hadoop workloads in the Enterprise
– Data refinement and ETL offload for high-volume data landing
– Data exploration for discovery of new business opportunities
– Data enrichment for fine-tuned delivery and recommendation engines
[Diagram layers]
• APPLICATIONS: Microsoft Applications
• DATA SYSTEMS: HORTONWORKS DATA PLATFORM For Windows
• DATA SOURCES: Traditional Sources (RDBMS, OLTP, OLAP); New Sources (web logs, email, sensor data, social media); MOBILE DATA; OLTP, POS SYSTEMS
34. © Hortonworks Inc. 2013
Inside HDP for Windows
Hortonworks Data Platform (HDP) For Windows
• 100% Open Source Enterprise Hadoop
• Component and version compatible with HDInsight
• Availability
• Beta release available now
[Diagram: HORTONWORKS DATA PLATFORM (HDP) For Windows]
• HADOOP CORE (Distributed Storage & Processing): HDFS, WEBHDFS, MAP REDUCE
• DATA SERVICES (Store, Process and Access Data): HCATALOG, HIVE, PIG, SQOOP
• OPERATIONAL SERVICES (Manage & Operate at Scale): OOZIE
• PLATFORM SERVICES
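The component stack above lists WEBHDFS, which exposes HDFS file operations over HTTP. As a hedged illustration of what a client call could look like, the Python sketch below only constructs a WebHDFS URL for a directory listing. It assumes the standard `/webhdfs/v1` path prefix, the `LISTSTATUS` operation and a NameNode HTTP port of 50070 (the usual default in Hadoop releases of this period); the host name, HDFS path, user and the helper name `webhdfs_url` are hypothetical.

```python
from urllib.parse import quote, urlencode


def webhdfs_url(namenode_host, hdfs_path, op, port=50070, user=None):
    """Return a WebHDFS REST URL for the given HDFS path and operation.

    Assumptions for illustration: plain HTTP, the /webhdfs/v1 prefix,
    and simple (non-Kerberos) authentication via the user.name parameter.
    """
    query = {"op": op}
    if user is not None:
        query["user.name"] = user
    # quote() keeps "/" intact and escapes characters that are unsafe in a URL path
    return f"http://{namenode_host}:{port}/webhdfs/v1{quote(hdfs_path)}?{urlencode(query)}"


if __name__ == "__main__":
    # Hypothetical NameNode and path; the URL is printed, not fetched.
    print(webhdfs_url("namenode.example.com", "/data/raw/weblogs", "LISTSTATUS", user="hdfs"))
```

Because the interface is plain HTTP, the same URL shape would apply whether the cluster runs on Linux or Windows, which is the interoperability point the deck is making.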
35. © Hortonworks Inc. 2013
Maximize Your Hadoop Deployment Choice
• Use HDP for Windows for on-premises deployment on Windows Server
– Ideal for Windows users with Hadoop experience
– Perfect next step for those who are ready to move from POC to production
• Use HDInsight for Microsoft tooling and Management and Provisioning
– HDInsight Service that offers full benefit of Windows Azure (e.g. elasticity & low cost): available in Preview today
– HDInsight Server for full integration of Hadoop with Microsoft tools on premises: Developer Preview available today
• Full interoperability and deployment choice across platforms
– Implement big data applications that run on-premise & cloud
– By leveraging open source HDP, enables seamless interoperability across environments: Linux, Windows, Windows Azure
36. © Hortonworks Inc. 2013
Summary
• Leading the Innovation in Core Hadoop
• Addressing the requirements for Enterprise usage
• Enabling interoperability of the ecosystem
• No lock-in. 100% Open Source.
• Best in industry support with flexible pricing model
• Find out more
–www.hortonworks.com
–http://hortonworks.com/hadoop-training/
Editor's Notes
In that capacity, Arun allows Hortonworks to be instrumental in working with the community to drive the roadmap for Core Hadoop, where the focus today is on things like YARN, MapReduce2, HDFS2 and more.
For Core Hadoop, in absolute terms, Hortonworkers have contributed more than twice as many lines of code as the next closest contributor, and even more if you include Yahoo, our development partner.
Taking such a prominent role also enables us to ensure that our distribution integrates deeply with the ecosystem: both in the choice of deployment platforms such as Windows, Azure and more, and in creating deeply engineered solutions with key partners such as Teradata.
And consistent with our approach, all of this is done in 100% open source.