State of the Union with Shaun Connolly

In 2012, we released Hortonworks Data Platform powered by Apache Hadoop and established partnerships with major enterprise software vendors including Microsoft and Teradata that are making enterprise ready Hadoop easier and faster to consume. As we start 2013, we invite you to join us for this live webinar where Shaun Connolly, VP of Strategy at Hortonworks, will cover the highlights of 2012 and the road ahead in 2013 for Hortonworks and Apache Hadoop.

  • “State of the Union” Webinar Features Hortonworks Executive Delivering 2012 Year-in-Review, Mapping Out Strategic Direction for 2013 and Highlighting Key Product Offerings
PALO ALTO, Calif., January 16, 2013: Hortonworks, a leading commercial vendor promoting the innovation, development and support of Apache Hadoop, today announced its “State of the Union and Vision for Apache Hadoop in 2013” webinar, taking place on Tuesday, January 22, 2013 at 1:00 p.m. ET. During the webinar, Vice President of Corporate Strategy Shaun Connolly will provide an overview of company highlights from 2012 as well as a strategic roadmap for Apache Hadoop in 2013.
What: “Hortonworks State of the Union and Vision for Apache Hadoop in 2013” webinar
Who: Shaun Connolly, vice president of corporate strategy, Hortonworks
When: Tuesday, January 22, 2013 at 1:00 p.m. ET / 10:00 a.m. PT
  • I can’t really talk about Hortonworks without first taking a moment to talk about the history of Hadoop.
What we now know of as Hadoop really started back in 2005, when Eric Baldeschwieler – known as “E14” – began work on a project to build a large-scale data storage and processing technology that would allow Yahoo to store and process massive amounts of data to underpin its most critical application, Search. The initial focus was on building out the technology – the key components being HDFS and MapReduce – that would become the core of what we think of as Hadoop today, and continuing to innovate it to meet the needs of this specific application.
By 2008, Hadoop usage had greatly expanded inside of Yahoo, to the point that many applications were using this data management platform. As a result, the team’s focus extended to include operations: now that applications were propagating around the organization, sophisticated capabilities for operating Hadoop at scale were necessary. It was also at this time that usage began to expand well beyond Yahoo, with many notable organizations (including Facebook and others) adopting Hadoop as the basis of their large-scale data processing and storage applications, necessitating a focus on operations to support what was by now a large variety of critical business applications.
In 2011, recognizing that more mainstream adoption of Hadoop was beginning to take off and with an objective of facilitating it, the core team left – with the blessing of Yahoo – to form Hortonworks. The goal of the group was to facilitate broader adoption by addressing the enterprise capabilities that would enable a larger number of organizations to adopt and expand their usage of Hadoop.
[Note: if useful as a talk track, Cloudera was formed in 2008, well BEFORE the operational expertise of running Hadoop at scale was established inside of Yahoo.]
  • At Hortonworks today, our focus is very clear: we develop, distribute and support a 100% open source distribution of Enterprise Apache Hadoop.
We employ the core architects, builders and operators of Apache Hadoop and drive the innovation in the open source community.
We distribute the only 100% open source Enterprise Hadoop distribution: the Hortonworks Data Platform.
Given our operational expertise of running some of the largest Hadoop infrastructure in the world at Yahoo, our team is uniquely positioned to support you.
Our approach is also uniquely endorsed by some of the biggest vendors in the IT market:
Yahoo is both an investor and a customer, and most importantly, a development partner. We partner to develop Hadoop, and no distribution of HDP is released without first being tested on Yahoo’s infrastructure, using the same regression suite they have used for years as they grew to have the largest production cluster in the world.
Microsoft has partnered with Hortonworks to include HDP in both its off-premise offering on Azure and its on-premise offering, under the product name HDInsight. This also includes integration with Visual Studio for application development and with System Center for operational management of the infrastructure.
Teradata includes HDP in its products in order to provide the broadest possible range of options for its customers.
  • Eric and team created the Hadoop project as open source, and that is and always will be central to our approach. We believe strongly that the technology needs to be community driven and open source.
In terms of open source mechanics, Apache Hadoop is governed by the Apache Software Foundation, which provides structure to what inside a commercial software company would be a tightly governed development, test and release process. When we think of Core Hadoop, the ASF has helped to manage this process for several years now.
However, as Hadoop has become more widely used, it has spawned a set of ancillary open source projects that introduce capabilities required for more mainstream use. These projects are generally classified as either:
“Data Services” – those that enable the storage, processing and accessing of data
“Operational Services” – those that enable the management and operations of the infrastructure
The projects within these categories are run as independent projects with their own teams, and include some of the technologies you likely know of: Data Services include projects such as Hive, Pig, HBase and HCatalog, while Operational Services include Apache Ambari and more.
Hortonworkers have always played a critical role in the development, test and release process for Core Apache Hadoop, but they also play leading roles in these ancillary projects that are required for enterprise usage. This includes every role from committer to release manager and, in many cases, project lead.
For example, Arun Murthy is the project lead for Core Hadoop.
Current Hortonworks PMC members by project:
Hadoop: Arun Murthy, Devaraj Das, Enis Soztutar, Giridharan Kesavan, Jitendra Nath Pandey, Mahadev Konar, Matt Foley, Owen O'Malley, Sanjay Radia, Suresh Srinivas, Nicholas Sze, Vinod Kumar Vavilapalli
Pig: Daniel Dai, Alan Gates, Giridharan Kesavan, Ashutosh Chauhan, Thejas Nair
Hive: Ashutosh Chauhan
HBase: None
Oozie: Devaraj Das, Alan Gates
Sqoop: None
Flume: None
Bigtop: Alan Gates, Steve Loughran, Owen O'Malley
Incubator (not a Hadoop project, but shows who's helping grow new projects in Apache): Arun Murthy, Devaraj Das, Alan Gates, Mahadev Konar, Steve Loughran, Owen O'Malley, Enis Soztutar
  • So how does this get brought together into our distribution? It is really pretty straightforward, but also very unique.
We start with the group of open source projects I described, which we are continually driving in the OSS community. [CLICK] We then package the appropriate versions of those open source projects, integrate and test them using a full suite – including all the IP for regression testing contributed by Yahoo – and [CLICK] contribute all of the bug fixes back to the open source tree. From there, we package and certify a distribution in the form of the Hortonworks Data Platform (HDP) that includes both Hadoop Core and the related projects required by the enterprise user, and provide it to our customers.
Through this application of enterprise software development process to the open source projects, the result is a 100% open source distribution that has been packaged, tested and certified by Hortonworks. It is also 100% in sync with the open source trees.
  • HDP tracks closely to Apache project releases. CDH forks early and patches CDH distributions off to the side of the Apache community projects, resulting in unnecessary drift and risk of lock-in.
The “+923.423” and “+541” parts of the version numbers represent how many patches these components have drifted away from the corresponding Apache projects. While some drift can be expected, patches and changes on the order of hundreds result in lock-in and actually eliminate the virtuous cycle that the upstream community should help drive.
  • We are believers in open source: for us, we believe it is the most efficient way to develop enterprise software.
But more importantly, we believe that 100% open source is the best approach for our customers. In particular, in the data management market, our customers are acutely aware of the implications of growing their database usage with a proprietary vendor who can then exert pricing pressure (Oracle).
Particularly when it comes to data storage, which we can all anticipate will continue to grow exponentially, you don’t want to be penalized for scale. By choosing an open source approach, organizations can build their operational processes on open technologies, without concern that they will be locked in to a particular vendor. And they can be confident that as their usage grows, they can choose from flexible pricing alternatives – by node or by storage – that align best to their needs.
It is ultimately about mitigating risk, and in this regard open source has been proven the safest approach. I would also caution you to look beyond the open source label used by some vendors: are they harvesting open source work, forking the code and then working independently (“fork early / patch often”)? Or, like Hortonworks, have they embraced and committed to the community open source approach, which allows them to stay in sync with the innovation of the community? In the Hadoop community, Hortonworks is unquestioned in taking the community-driven approach.
  • In summary, by addressing these elements, we can provide an Enterprise Hadoop distribution which includes the Core Services, Platform Services, Data Services and Operational Services required by the enterprise user.
And all of this is done in 100% open source, and tested at scale by our team (together with our partner Yahoo) to bring enterprise process to an open source approach. Finally, this is the distribution that is endorsed by the ecosystem to ensure interoperability in your environment.
  • While overly simplistic, this graphic represents what we commonly see as a general data architecture:
A set of data sources producing data
A set of data systems to capture and store that data: most typically a mix of RDBMS and data warehouses
A set of applications that leverage the data stored in those data systems. These could be packaged BI applications (Business Objects, Tableau, etc.), enterprise applications (e.g. SAP) or custom applications (e.g. custom web applications), ranging from ad-hoc reporting tools to mission-critical enterprise operations applications.
Your environment is undoubtedly more complicated, but conceptually it is likely similar.
  • As the volume of data has exploded, we increasingly see organizations acknowledge that not all data belongs in a traditional database. The drivers are both cost (as volumes grow, database licensing costs can become prohibitive) and technology (databases are not optimized for very large datasets).
Instead, we increasingly see Hadoop – and HDP in particular – being introduced as a complement to the traditional approaches. It is not replacing the database; it is a complement, and as such it must integrate easily with existing tools and approaches. This means it must interoperate with:
Existing applications, such as Tableau, SAS, Business Objects, etc.
Existing databases and data warehouses, for loading data to and from the data warehouse
Development tools used for building custom applications
Operational tools for managing and monitoring
  • It is for that reason that we focus on HDP interoperability across all of these categories:
Data systems: HDP is endorsed and embedded with SQL Server, Teradata and more
BI tools: HDP is certified for use with the packaged applications you already use, from Microsoft to Tableau, MicroStrategy, Business Objects and more
Development tools:
For .NET developers: Visual Studio, used to build more than half the custom applications in the world, certifies with HDP to enable Microsoft app developers to build custom apps with Hadoop
For Java developers: Spring for Apache Hadoop enables Java developers to quickly and easily build Hadoop-based applications with HDP
Operational tools: integration with System Center, and with Teradata Viewpoint
  • SQL-H integration between Aster and Hadoop nodes: analytics and discovery capabilities provided by Aster nodes while retaining archival data on Hadoop
High Performance: high performance due to parallel BYNET connections among the Aster nodes and Hadoop nodes
Management Interface: seamless integration with the Hortonworks Management Console; Teradata Viewpoint integration by March 2013
Troubleshooting and Supportability: seamless integration with Teradata Vital Infrastructure (TVI) for log collection and support case resolution
Out-of-the-box Experience: pre-tuned parameters for HDFS and MapReduce infrastructure
  • Enterprise Reports – your cell phone bill is an example
Dashboards – KPI tracking
Parameterized Reports – what are the hot prospects in my region?
Visualization – visual exploration of data
Data Mining – large-scale data processing and extraction, usually fed to other tools
How?
Improve Latency & Throughput: query engine improvements; new “Optimized RCFile” column store; next-gen runtime (eliminates MapReduce latency)
Extend Deep Analytical Ability: analytics functions; improved SQL coverage
Continued focus on core Hive use cases

    1. Hortonworks “State of the Union” – Shaun Connolly, VP Strategy (@shaunconnolly, @hortonworks) – January 22, 2013. © Hortonworks Inc. 2013
    2. Quick Housekeeping Rules
• The Q&A panel is available if you have any questions during the webinar
• There will be time for Q&A at the end
• We will record the webinar for future viewing
• All attendees will receive a copy of the slides and recording
    3. Hortonworks
• History of Apache Hadoop & Hortonworks’ Role
  – Genesis of Apache Hadoop
  – Role of Apache Software Foundation
  – Hortonworks Process for “Enterprise Hadoop”
• Key Areas of Focus in 2012
• The Road Ahead for Enterprise Hadoop
    4. A Brief History of Apache Hadoop (timeline, 2004–2013: Apache project established; Yahoo! operates at scale; Hortonworks Data Platform; Enterprise Hadoop)
• 2005: Yahoo! creates team under E14 to work on Hadoop – focus on INNOVATION
    5. A Brief History of Apache Hadoop (continued)
• 2007: Yahoo team extends focus to operations to support multiple projects & growing clusters – focus on OPERATIONS
    6. A Brief History of Apache Hadoop (continued)
• 2011: Hortonworks created to focus on “Enterprise Hadoop”; starts with 24 key Hadoop engineers from Yahoo – focus on STABILITY
    7. Hortonworks Snapshot
We develop, distribute and support the ONLY 100% open source Enterprise Hadoop distribution.
• Headquarters: Palo Alto, CA
• Employees: 180+ and growing
• Investors: Benchmark, Index, Yahoo
Develop: we employ the core architects, builders and operators of Apache Hadoop; we drive innovation within Apache Software Foundation projects
Distribute: we distribute the only 100% open source Enterprise Hadoop distribution, the Hortonworks Data Platform; we engineer, test & certify HDP for enterprise usage
Support: we are uniquely positioned to deliver the highest quality of Hadoop support; we enable the ecosystem to work better with Hadoop
Endorsed by Strategic Partners
    8. Apache Community Leadership
• Apache Software Foundation guiding principles: release early & often; transparency, respect, meritocracy
• Cycle for Apache Hadoop: Design & Develop → Test & Patch → Release
• “We have noticed more activity over the last year from Hortonworks’ engineers on building out Apache Hadoop’s more innovative features. These include YARN, Ambari and HCatalog.” – Jeff Kelly, Wikibon
    9. Apache Community Leadership (continued)
• Projects: Apache Hadoop, Apache Pig, Apache Hive, Apache HBase, Apache HCatalog, Apache Ambari and other Apache projects
    10. Apache Community Leadership (continued)
• Key roles held by Hortonworkers:
  – PMC Members: managing community projects, mentoring new incubator projects; about 20 Hortonworkers
  – Committers: authoring, reviewing & editing code; about 50 Hortonworkers across projects
  – Release Managers: testing & releasing projects; Hortonworkers across key projects like Hadoop, Hive, Pig, HCatalog, Ambari, HBase
11. Hortonworks Process for Enterprise Hadoop: Upstream community projects (Apache Hadoop, Pig, Hive, HCatalog, HBase, Ambari and other Apache projects) run the design & develop, test & patch, and release cycle; the downstream enterprise product is the Hortonworks Data Platform.
12. Hortonworks Process for Enterprise Hadoop (build): Downstream, Hortonworks integrates & tests, packages & certifies, and distributes the enterprise product.
13. Hortonworks Process for Enterprise Hadoop (build): Virtuous cycle: development and fixed issues are done upstream, and stable project releases flow downstream. No lock-in: an integrated, tested & certified distribution lowers risk by ensuring close alignment with the Apache projects.
14. HDP Certifies Latest Stable Components

| Apache Project | HDP 1.2 | CDH3u5         | CDH4.1.2        |
|----------------|---------|----------------|-----------------|
| Hadoop         | 1.1.2   | 0.20.2+923.418 | 2.0.0-alpha+541 |
| Pig            | 0.10.1  | 0.8.1+51.39    | 0.10.0+48       |
| Hive           | 0.10.0  | 0.7.1+42.56    | 0.9.0+148       |
| HCatalog       | 0.5.0   | n/a            | n/a             |
| HBase          | 0.94.2  | 0.90.6+84.73   | 0.92.1+154      |
| Sqoop          | 1.4.2   | 1.3.0+5.88     | 1.4.1+51        |
| Oozie          | 3.2.0   | 3.2.0          | 3.2.0           |
| Zookeeper      | 3.4.5   | 3.3.5+19.5     | 3.4.3+25        |
| Ambari         | 1.2.0   | n/a            | n/a             |
| Flume          | 1.3.0   | 0.9.4+25.46    | 1.2.0+119       |
| Mahout         | 0.7.0   | 0.5+9.7        | 0.7+4           |

Source: http://files.cloudera.com/pdf/datasheet/cdh4.1_spec_sheet.pdf
15. True Enterprise Class Open Source
• 100% open source, no holdbacks: the only true implementation of open source Apache Hadoop, preferred by the software vendors that you rely on.
• Flexible deployment: no license fee for usage.
• Community open source mitigates lock-in: proprietary "open source" equals lock-in, and open communities always trump mere "open source."
16. Hortonworks (Agenda): History of Apache Hadoop & Hortonworks' role; key areas of focus in 2012 (addressing "Enterprise Hadoop" requirements, enabling interoperability of the ecosystem); the road ahead for Enterprise Hadoop.
17. HDP: Enterprise Hadoop Distribution: Hortonworks Data Platform (HDP) starts from Hadoop Core: distributed storage & processing with HDFS and MapReduce, WebHDFS access, and YARN (in 2.0), plus Platform Services for enterprise readiness (HA, DR, snapshots, security, …). Enterprise Hadoop is the ONLY 100% open source and complete distribution; enterprise grade, proven and tested at scale; ecosystem endorsed to ensure interoperability.
18. HDP: Enterprise Hadoop Distribution (build): Adds Data Services for storing, processing and accessing data: HCatalog, Pig, Hive, HBase, Sqoop and Flume.
19. HDP: Enterprise Hadoop Distribution (build): Adds Operational Services for managing and operating at scale: Ambari and Oozie.
20. HDP: Enterprise Hadoop Distribution (build): The full stack ships together as the Hortonworks Data Platform (HDP).
21. HDP: Enterprise Hadoop Distribution (build): HDP deploys on a bare OS, in the cloud, on VMs, or as an appliance.
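Slide 17's Hadoop Core layer calls out WebHDFS, the HTTP/REST face of HDFS that lets tools outside the cluster read the file system. As a minimal sketch of how such a client addresses it, the snippet below only assembles WebHDFS-style URLs; the hostname, paths and user are hypothetical placeholders, and port 50070 is the Hadoop 1.x NameNode HTTP default.

```python
# Minimal sketch: composing WebHDFS REST URLs (the HTTP face of HDFS noted on
# slide 17). Host, paths and user below are hypothetical placeholders.

def webhdfs_url(host, path, op, port=50070, user=None):
    """Build a WebHDFS v1 URL for a file-system path and operation."""
    url = "http://{}:{}/webhdfs/v1{}?op={}".format(host, port, path, op)
    if user:
        url += "&user.name={}".format(user)  # caller identity as a query param
    return url

# List a directory, then read a file (an HTTP client would GET these URLs):
print(webhdfs_url("namenode.example.com", "/data/logs", "LISTSTATUS"))
print(webhdfs_url("namenode.example.com", "/data/logs/part-00000", "OPEN", user="hdfs"))
```

A GET on the LISTSTATUS URL returns directory metadata as JSON, while OPEN streams the file's bytes back, which is what makes HDFS data reachable from ordinary HTTP tooling.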
22. Latest Hortonworks Announcements: Two releases in January 2013:
• January 15: Hortonworks Data Platform 1.2. Hortonworks brings enterprise manageability to its 100% open source Apache Hadoop distribution.
• January 22: Hortonworks Sandbox. Hortonworks accelerates Hadoop skills development with an easy-to-use, flexible and extensible platform to learn, evaluate and use Apache Hadoop.
23. HDP 1.2 Summary: Hortonworks Data Platform 1.2 outpaces the competition to extend leadership through 100% open source Enterprise Apache Hadoop. Focus areas:
• Ambari: continued innovation with a complete, free and open cluster management tool to provision, manage and monitor your Hadoop infrastructure, with job diagnostics, usage heat maps and ecosystem integration.
• Enhanced security model for Hive and HCatalog.
• Performance and operational enhancements for HBase.
• Extended full-stack HA to the Hive & HCatalog Metastore.
24. HDP 1.2: Ambari Key Features (Apache Ambari dashboard)
• Job diagnostics: visualize and troubleshoot Hadoop job execution and performance.
• Cluster history: view historical job execution & performance.
• Instant insight: view the health of core Hadoop (HDFS, MapReduce) and related projects.
• Cluster navigation: "quick link" buttons jump into the NameNode web UI for a server.
• REST interface: provides external access to Ambari for existing tools, facilitating integration with Microsoft System Center and Teradata Viewpoint.
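The REST interface above is the hook that tools like System Center and Viewpoint integrate against. As a hedged sketch, not the documented client API, the snippet below separates URL construction from an authenticated GET; the host, cluster name and admin credentials are hypothetical placeholders, and port 8080 is Ambari's usual server default.

```python
# Sketch of reading cluster state through Ambari's REST interface (slide 24).
# Host, cluster name and credentials are hypothetical placeholders.
import base64
import json
import urllib.request

def ambari_url(host, resource, port=8080):
    """Build an Ambari API v1 URL for a resource path like 'clusters/hdp/services'."""
    return "http://{}:{}/api/v1/{}".format(host, port, resource)

def ambari_get(host, resource, user="admin", password="admin"):
    """GET an Ambari resource with HTTP basic auth and return the parsed JSON."""
    req = urllib.request.Request(ambari_url(host, resource))
    token = base64.b64encode("{}:{}".format(user, password).encode()).decode()
    req.add_header("Authorization", "Basic " + token)
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

# e.g. service state for a cluster named "hdp" (requires a live Ambari server):
# services = ambari_get("ambari.example.com", "clusters/hdp/services")
```

Because the interface is plain HTTP plus JSON, any monitoring tool that can issue authenticated GETs can consume the same cluster health data the Ambari dashboard shows.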
25. 0 to Big Data in 15 Minutes: Hands-on tutorials integrated into the Sandbox's HDP environment for learning and evaluation.
26. Hortonworks (Agenda): History of Apache Hadoop & Hortonworks' role; key areas of focus in 2012 (addressing "Enterprise Hadoop" requirements, enabling interoperability of the ecosystem); the road ahead for Enterprise Hadoop.
27. Traditional Data Architecture: Applications (business analytics, custom applications, enterprise applications) sit atop data systems (RDBMS, EDW, MPP: the traditional repositories), supported by dev & data tools (build & test) and operational tools (manage & monitor). Data sources are the traditional ones: RDBMS, OLTP, OLAP, and POS systems.
28. Next-Generation Data Architecture: The same stack adds the Hortonworks Data Platform alongside the traditional repositories, and new sources (web logs, email, sensor data, social media, mobile data) alongside the traditional RDBMS, OLTP, OLAP and POS systems.
29. Interoperating With Your Tools: The Hortonworks Data Platform sits in the data systems layer next to the traditional repositories, interoperating with Microsoft applications at the application layer and with Teradata Viewpoint among the operational tools.
30. Hortonworks & Teradata: Unified Data Architecture: the right technology on the right analytical problems using best-of-breed technologies.
• Viewpoint integration: common management console for Aster, Teradata and Apache Hadoop.
• TVI (Teradata Vital Infrastructure): proactive reliability, availability and manageability support service.
• Aster Connector for Hadoop: SQL-H integration.
• Teradata Connector for Hadoop: Sqoop integration.
• Pre-tuned HDFS and MapReduce parameters for big data workloads.
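The Teradata Connector for Hadoop rides on Sqoop, Hadoop's bulk data-transfer tool. To illustrate the shape of such a transfer, and only as a sketch (the JDBC URL, table and HDFS target are hypothetical placeholders, and a real connector adds its own options), the snippet below assembles a generic sqoop import command line:

```python
# Sketch only: the kind of 'sqoop import' invocation a Sqoop-based connector
# drives. JDBC URL, table and HDFS target are hypothetical placeholders.

def sqoop_import_cmd(jdbc_url, table, target_dir, mappers=4):
    """Assemble a sqoop import argument list pulling one table into HDFS."""
    return [
        "sqoop", "import",
        "--connect", jdbc_url,          # source database connection string
        "--table", table,               # table to copy
        "--target-dir", target_dir,     # HDFS destination directory
        "--num-mappers", str(mappers),  # parallelism of the MapReduce import
    ]

cmd = sqoop_import_cmd("jdbc:teradata://tdhost/DATABASE=sales", "orders", "/data/orders")
print(" ".join(cmd))
```

Under the hood Sqoop turns this into a MapReduce job, so the import parallelizes across the cluster rather than funneling through a single client.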
31. Hortonworks & Microsoft: Microsoft brings big data to the masses with HDInsight plus Excel, PowerPivot (BI) and PowerView (visualization).
• HDInsight simplifies Hadoop and makes it enterprise ready; the Hortonworks Data Platform is used for Hadoop on Windows Server and Azure.
• An engineered, open source solution: Hadoop engineered for Windows; Hadoop-powered Microsoft business tools; ops integration with MS System Center; bidirectional connectors for SQL Server; support for Hyper-V to deploy Hadoop on VMs; opens the .NET developer community to Hadoop; deploy on Azure in 10 minutes.
32. Hortonworks (Agenda): History of Apache Hadoop & Hortonworks' role; key areas of focus in 2012; the road ahead for Enterprise Hadoop (patterns of use, key areas of investment).
33. Market Transitioning into Early Majority: Enterprise adoption accelerates via repeatable horizontal patterns of use, an ecosystem-driven pull market, and vertical applications (aka bowling pins). The adoption curve (relative % of customers over time) runs from innovators (technology enthusiasts) and early adopters (visionaries), across the chasm, to the early majority (pragmatists), late majority (conservatives) and laggards (skeptics). Left of the chasm, customers want technology & performance; right of it, they want solutions & convenience. Source: Geoffrey Moore, Crossing the Chasm.
34. Patterns of Use: "Right-time Access": The business case spans batch (refine), interactive (explore) and online (enrich) access on the Hortonworks Data Platform to big data: transactions, interactions, observations.
35. Being Big Data Driven at Neustar: Create new business opportunities and save money with information analytics.
• Neustar provides real-time information and analysis to the Internet, telecommunications, entertainment and marketing industries as an information exchange throughout the world; it started off focused on number porting for carriers and now has 2,500+ employees.
• A traditional business heavy in data capture and data movement, aggregating data for industries. For instance, they used to store 1% of DNS data for 60 days to bill customers and identify DDoS attacks; with Hadoop they now store 100% for over a year. It was not economically feasible to use the existing DW for the new data.
• Eliminated politics with the creation of a "catch basin": Year 1, use Hadoop to capture everything they used to throw away while leaving existing systems intact; Year 2, make this data available for new business opportunities, but require the business to justify its use.
36. Customers Don't Want More Data Silos
• AVOID: systems separated by workload type due to contention, with separate batch (refine), interactive (explore) and online (enrich) silos each holding its own copy of the big data.
• GOAL: a platform that natively supports mixed workloads, refining, exploring and enriching one pool of big data (transactions, interactions, observations).
37. 2013 "Enterprise Hadoop" Initiatives: Invest across the HDP stack: Data Services, Operational Services, Hadoop Core, and Platform Services.
38. 2013 "Enterprise Hadoop" Initiatives (build): Platform Services, with "Continuum" for business continuity.
39. 2013 "Enterprise Hadoop" Initiatives (build): Data Services, with Hive/"Stinger" for interactive query, HBase for online data, and "Herd" for data integration.
40. 2013 "Enterprise Hadoop" Initiatives (build): Operational Services, with Ambari to manage & operate and "Knox" for secure access.
41. Top BI Vendors Support Hive Today
42. Stinger: Enhance Hive for BI Use Cases: Enterprise reports, parameterized reports, dashboards/scorecards, data mining and visualization, delivered through more SQL and better performance, moving Hive from batch toward interactive.
43. Our Focus Remains Unchanged
• Innovate core Hadoop: lead innovation within the Apache Hadoop community.
• Enhance Hadoop for enterprise-class usage: add the platform, data and operational services that enterprises need, and apply enterprise software rigor to the test & release process.
• Enable the data ecosystem: leverage Hadoop to enable partners to be successful.
• All open source, all the time: avoid proprietary open source, which locks you in.
44. Next Steps
• Download Hortonworks Sandbox: www.hortonworks.com/sandbox
• Download Hortonworks Data Platform: www.hortonworks.com/download
• Register for the Enterprise Hadoop webinar series: www.hortonworks.com/webinars
• Follow @shaunconnolly and @hortonworks
45. Power of Community is Key
• Hadoop Summit Amsterdam, March 20–21, 2013: register now at http://hadoopsummit.org/amsterdam/register/
• Hadoop Summit San Jose, CA, June 26–27, 2013: call for papers at http://hadoopsummit.org/san-jose/call-for-papers/
47. Questions? @shaunconnolly, @hortonworks
