Hortonworks: Paris Hadoop User Group (HUG), June 2012
Speaker Notes
  • Life used to be simple and very transactional in nature. In the early 90s, ERP transactions counted your sales by customer by location. The late 90s brought the age of segmentation and targeted offers, merging customer operations with marketing. Now life is more complex, connected, and interactional in nature. Digital marketing enables measurement of interactions across channels. Social networks, mobile commerce, and user-generated content increase the types and volumes of data generated by system-to-system communication and by data exhaust from customer behavior such as clickstreams. And big data is just beginning: we have not even listed all the sensors, telematics, and other machine-generated data predicted to eclipse even what social networks generate.
  • Facts that bolster this vision include: 80-90% of the world's data is unstructured or semi-structured (Forrester, IDC, and Gartner all agree); data volumes have increased exponentially over the past decade and continue to do so (IDC and McKinsey reports); Hadoop is uniquely designed to store and process this type of data, at scale, across commodity systems; and the major server and storage platform vendors are all creating Hadoop-focused strategies.
  • Apache Hadoop leadership: Sanjay Radia, HDFS core lead architect, 4+ years on Hadoop; major projects include Append v2, the Capacity Scheduler, Federation, and HA. Owen O'Malley, the leading committer of code to Hadoop, 5+ years on Hadoop; original Hadoop architect at Yahoo!, drove the implementation of security throughout the project. Arun Murthy, original MapReduce lead, 5+ years on Hadoop; currently lead architect and release manager of Apache Hadoop 0.23. Matt Foley, release manager of Apache Hadoop 0.20.205; former Director of Engineering for Yahoo! Mail, now running Hortonworks' quality and release efforts. Devaraj Das, built the original MapReduce development team at Yahoo!, 5+ years on Hadoop; now leading the Apache Ambari (Hadoop management) project. Alan Gates, lead of Pig and HCatalog, 3+ years on Hadoop.
  • Infrastructure Platform (servers, storage, network, operating system, virtualization, cloud); Systems Management (installation, configuration, administration, monitoring, performance, security management, capacity management, quality of service); Data Management Systems (SQL, NoSQL, NewSQL, EDW, data marts, MPP databases, search, indexing, MDM, etc.); Data Movement & Integration (ETL, data quality, integration middleware, event processing); Tools & Languages (IDEs, programming languages, other tools); Business Intelligence & Analytics (analytics, reporting, visualization, and dashboards); Applications & Solutions (SaaS offerings, bundled solutions, etc.)
  • In the graphic above, Apache Hadoop acts as the Big Data Refinery. It is great at storing, aggregating, and transforming multi-structured data into more useful and valuable formats. Apache Hive is a Hadoop-related component that fits within the Business Intelligence & Analytics category, since it is commonly used for querying and analyzing data within Hadoop in a SQL-like manner. Apache Hadoop can also be integrated with other EDW, MPP, and NewSQL components such as Teradata, Aster Data, HP Vertica, IBM Netezza, EMC Greenplum, SAP HANA, Microsoft SQL Server PDW, and many others. Apache HBase is a Hadoop-related NoSQL key/value store that is commonly used for building highly responsive next-generation applications. Apache Hadoop can also be integrated with other SQL, NoSQL, and NewSQL technologies such as Oracle, MySQL, PostgreSQL, Microsoft SQL Server, IBM DB2, MongoDB, DynamoDB, MarkLogic, Riak, Redis, Neo4j, Terracotta, GemFire, SQLFire, VoltDB, and many others. Finally, data movement and integration technologies help ensure data flows seamlessly between the systems in the diagrams; the lines in the graphic are powered by technologies such as WebHDFS, Apache HCatalog, Apache Sqoop, Talend Open Studio for Big Data, Informatica, Pentaho, SnapLogic, Splunk, Attunity, and many others.
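The WebHDFS integration point mentioned above is a plain HTTP/REST interface, so any tool that can issue HTTP requests can move data in and out of HDFS. A minimal sketch of building those request URLs follows; the hostname is hypothetical, and port 50070 is the Hadoop 1.x NameNode HTTP default:

```python
from urllib.parse import urlencode

def webhdfs_url(namenode_host, path, op, port=50070, **params):
    """Build a WebHDFS REST URL of the form
    http://<namenode>:<port>/webhdfs/v1/<path>?op=<OP>&..."""
    query = urlencode({"op": op, **params})
    return f"http://{namenode_host}:{port}/webhdfs/v1{path}?{query}"

# Listing a directory over HTTP, roughly equivalent to `hadoop fs -ls /data/logs`
# (namenode.example.com is a placeholder host):
url = webhdfs_url("namenode.example.com", "/data/logs", "LISTSTATUS")
print(url)
```

Fetching that URL with any HTTP client returns the directory listing as JSON, which is what lets ETL and BI tools integrate with Hadoop without linking against Java libraries.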
  • At the highest level, I describe three broad areas of data processing and outline how they interconnect. The three areas are: (1) Business Transactions & Interactions, (2) Business Intelligence & Analytics, and (3) the Big Data Refinery. The graphic illustrates a vision for how these three types of systems can interconnect in ways aimed at deriving maximum value from all forms of data.
    Enterprise IT has been connecting systems via classic ETL processing, as illustrated in Step 1, for many years in order to deliver structured and repeatable analysis. In this step, the business determines the questions to ask and IT collects and structures the data needed to answer them.
    The Big Data Refinery, highlighted in Step 2, is a new system capable of storing, aggregating, and transforming a wide range of multi-structured raw data sources into usable formats that help fuel new insights for the business. It provides a cost-effective platform for unlocking the potential value within data and discovering the business questions worth answering with that data. A popular example of big data refining is processing web logs, clickstreams, social interactions, social feeds, and other user-generated data sources into more accurate assessments of customer churn or more effective personalized offers. More interestingly, some businesses derive value from processing large video, audio, and image files. Retail stores, for example, are leveraging in-store video feeds to better understand how customers navigate the aisles as they find and purchase products; retailers that provide optimized shopping paths and intelligent product placement within their stores are able to drive more revenue. In this case, while the video files may be big in size, the refined output of the analysis is typically small in size but potentially big in value. The Big Data Refinery platform provides fertile ground for new types of tools and data processing workloads to emerge in support of rich multi-level data refinement solutions.
    With that as backdrop, Step 3 takes the model further by showing how the Big Data Refinery interacts with the systems powering Business Transactions & Interactions and Business Intelligence & Analytics. Interacting in this way opens up the ability for businesses to get a richer and more informed 360° view of customers, for example. By directly integrating the Big Data Refinery with existing Business Intelligence & Analytics solutions that contain much of the transactional information for the business, companies can more accurately understand the customer behaviors that lead to transactions. Moreover, systems focused on Business Transactions & Interactions can also benefit from connecting with the Big Data Refinery: complex analytics and calculations of key parameters can be performed in the refinery and flow downstream to fuel runtime models powering business applications, with the goal of more accurately targeting customers with the best and most relevant offers.
    Since the Big Data Refinery is great at retaining large volumes of data for long periods of time, the model is completed with the feedback loops illustrated in Steps 4 and 5. Retaining the past 10 years of historical "Black Friday" retail data, for example, can benefit the business, especially if it is blended with other data sources such as 10 years of weather data accessed from a third-party data provider. The point here is that the opportunities for creating value from multi-structured data sources available inside and outside the enterprise are virtually endless if you have a platform that can do it cost-effectively and at scale.
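The clickstream refinement described above maps naturally onto MapReduce. The sketch below is not from the deck; the log format and user IDs are invented for illustration. It emulates a Hadoop Streaming job in plain Python: the mapper emits one (user, 1) pair per click, a sort stands in for Hadoop's shuffle, and the reducer sums clicks per user:

```python
from itertools import groupby

def mapper(lines):
    """One tab-separated click-log line in, one (user_id, 1) pair out."""
    for line in lines:
        user_id, _url, _timestamp = line.split("\t")
        yield user_id, 1

def reducer(pairs):
    """Sum counts per user; input must be grouped by key, as Hadoop guarantees."""
    for user_id, group in groupby(pairs, key=lambda kv: kv[0]):
        yield user_id, sum(count for _, count in group)

# Hypothetical raw clickstream (in production this would live in HDFS):
log = [
    "alice\t/products/42\t2012-06-01T10:00:00",
    "bob\t/cart\t2012-06-01T10:00:05",
    "alice\t/checkout\t2012-06-01T10:01:00",
]
# Locally we emulate the shuffle phase with a sort, as Hadoop Streaming would.
counts = dict(reducer(sorted(mapper(log))))
print(counts)  # {'alice': 2, 'bob': 1}
```

The refined output (clicks per user) is tiny compared to the raw log, which is exactly the "big in size, small but valuable output" pattern the note describes.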
  • "Node" means a server or virtual machine capable of running the Software. "Server" means a single hardware system capable of running the Software; a hardware partition or blade is considered a separate hardware system. "Virtual Machine" means a software container that can run its own operating system and execute applications like a physical machine. "Cluster" means two or more Nodes that are interconnected for the purposes of executing application programs and sharing data. "Storage" means the total available storage space, also known as raw capacity, within the cluster.
  • I want to be careful with how we present services… they do want people to come onsite for extended engagements.

Transcript

  • 1. Hortonworks: Enabling Apache Hadoop to power next-generation enterprise data architectures. June 2012. © Hortonworks Inc. 2012
  • 2. Topics: Big Data Market Overview; Hortonworks Company & Strategy Overview; Hortonworks Offerings (Hortonworks Data Platform Subscriptions, Public & On-site Training, Expert Short-term Consulting Services).
  • 3. Big Data = Transactions + Interactions + Observations. [Graphic: data variety and complexity increase with volume, from megabytes of business data (purchase and payment records, support contacts, SMS/MMS, dynamic funnels) through gigabytes of ERP and CRM data (segmentation, search marketing, offer details and history, purchase detail, customer touches, product/service logs, behavioral targeting) and terabytes of web data (web logs, user clickstreams, A/B testing, dynamic pricing, external demographics, business data feeds, affiliate networks) up to petabytes of big data (sensors/RFID/devices, user-generated content, mobile web, sentiment, social interactions and feeds, spatial and GPS coordinates, HD video, audio, and images, speech-to-text). Source: graphic created in partnership with Teradata, Inc.]
  • 4. What is Apache Hadoop? A collection of open source projects under the Apache Software Foundation (ASF): loosely coupled, shipped early and often, and one of the best examples of open source driving innovation and creating a market. It is a foundation for big data solutions: it stores petabytes of data reliably (Hadoop Distributed File System), runs highly distributed computations (Hadoop MapReduce framework), enables a rational economic model (commodity servers and storage), and powers data-driven business.
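The slide's claim that HDFS "stores petabytes of data reliably" rests on two ideas: files are split into large fixed-size blocks, and each block is replicated across several commodity servers. The toy sketch below illustrates that bookkeeping only; real HDFS placement is rack-aware and dynamic, and the block size, replication factor, and node names here are illustrative defaults, not the deck's:

```python
def plan_blocks(file_size, block_size=64 * 1024**2, replication=3,
                datanodes=("dn1", "dn2", "dn3", "dn4")):
    """Split a file of `file_size` bytes into fixed-size blocks and assign
    each block to `replication` datanodes round-robin (toy placement)."""
    n_blocks = -(-file_size // block_size)  # ceiling division
    plan = []
    for b in range(n_blocks):
        replicas = [datanodes[(b + r) % len(datanodes)] for r in range(replication)]
        plan.append((b, replicas))
    return plan

# A 200 MB file with 64 MB blocks needs 4 blocks, each stored on 3 nodes,
# so losing any single server leaves every block with two live copies:
for block_id, replicas in plan_blocks(200 * 1024**2):
    print(block_id, replicas)
```

Because each block lives on multiple cheap machines, reliability comes from the software layer rather than from expensive storage hardware, which is the "rational economics model" the slide refers to.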
  • 5. 7 Key Drivers for Hadoop. Business pressure: (1) opportunity to enable innovative new business models; (2) potential new insights that drive competitive advantage. Technical pressure: (3) data collected and stored continues to grow exponentially; (4) data is increasingly everywhere and in many formats; (5) traditional solutions were not designed for the new requirements. Financial pressure: (6) cost of data systems, as a percentage of IT spend, continues to grow; (7) cost advantages of commodity hardware and open source.
  • 6. 3 Phases of Hadoop Adoption.
    Phase 1, Educate/Evaluate (1-12 months): awareness, adoption, and proof of enterprise viability. See it -> learn it -> do it: evaluation, exploration, POCs, developer and admin training. Key questions: What are the potential use cases, and which one should I focus on? How do I get value now? Where does Hadoop fit in my data architecture?
    Phase 2, Initial Production (9-24 months): departmental production usage; a single business use case with a focused solution architecture. Key questions: Can the solution enable future business models? Am I maximizing the value from the chosen use case? How does this solution interact within our departmental data architecture? How do I operationalize the solution?
    Phase 3, Wide-scale Production (18-36 months): enterprise-wide production usage; multiple use cases with a broader solution architecture. Key questions: How can the solution be leveraged enterprise-wide? What is required to enable, integrate, and operate at scale? What does our next-generation data architecture look like? Can I leverage my existing tools and platforms? How can I maximize access to data while minimizing risk? Can I replace any of my existing systems?
  • 7. What's Needed to Accelerate Adoption? Enterprise tooling to become a complete data platform: open deployment and provisioning, higher quality data loading, monitoring and management, and APIs for easy and efficient integration. Ecosystem support and development: existing infrastructure vendors need to continue to integrate, and apps need to continue to be developed on this infrastructure. The market should rally around core Apache Hadoop, both to avoid splintering and market fragmentation and to accelerate adoption.
  • 8. Topics: Big Data Market Overview; Hortonworks Company & Strategy Overview; Hortonworks Offerings (Hortonworks Data Platform Subscriptions, Public & On-site Training, Expert Architectural Services).
  • 9. Hortonworks Vision & Role. We believe that by the end of 2015, more than half the world's data will be processed by Apache Hadoop. (1) Make Hadoop easy to use and consume; (2) make Hadoop an enterprise-viable data platform; (3) provide open APIs and data services; (4) enable the ecosystem at each layer of the data stack; (5) be stewards of the core and innovators on the edges.
  • 10. Hortonworks Strategy. Lead within the Hadoop community: the team has delivered every major Hadoop release since 0.1, has experience managing the world's largest deployment, and has ongoing access to Yahoo!'s 1,000+ users and 40k+ nodes for testing, QA, etc. Embrace and enable the Hadoop ecosystem: 100% open source software, full-lifecycle support subscriptions, expert role-based training, and enablement of solution architectures.
  • 11. Enable Hadoop to be the Next-Gen Data Platform. [Diagram: the Hortonworks Data Platform sits at the center of a stack running from the Infrastructure Platform up through Data Movement & Integration, Data Management Systems, BI & Analytics, and Tools & Languages to Applications & Solutions, with four goals: enable the ecosystem at each layer; make Hadoop easy to use and consume (usability, ease of installation); make Hadoop an enterprise-viable platform (installation and configuration, administration, monitoring, data extract and load, and the ability to load and process data); and provide open APIs and enterprise data services.]
  • 12. Next-Generation Data Architecture. [Diagram: multi-structured sources (audio, video, images; docs, text, XML; web logs and clicks; social, graph, and feeds; sensors, devices, RFID; spatial and GPS; events and other data) flow into the Big Data Refinery, powered by Apache Hadoop. The refinery connects with Business Transactions & Interactions systems (web, mobile, CRM, ERP, SCM running on SQL, NoSQL, and NewSQL stores) and with Business Intelligence & Analytics systems (EDW, MPP, and NewSQL feeding dashboards, reports, and visualization).]
  • 13. Maximizing the Value from ALL of Your Data. [Diagram: the same architecture annotated with five steps: (1) classic ETL processing between Business Transactions & Interactions and Business Intelligence & Analytics; (2) store, aggregate, and transform multi-structured data in the refinery to unlock value; (3) share refined data and runtime models with the surrounding systems; (4) retain runtime models and historical data for ongoing refinement and analysis; (5) retain historical data to unlock additional value.]
  • 14. Topics: Big Data Market Overview; Hortonworks Company & Strategy Overview; Hortonworks Offerings (Hortonworks Data Platform Subscriptions, Public & On-site Training, Expert Short-term Consulting Services).
  • 15. Balancing Innovation & Stability. Apache: be aggressive, ship early and often; projects need to keep innovating and visibly improving, aim for big improvements on trunk, and make early, buggy releases. Hortonworks: be predictable, ship when stable; ship stable, working releases, make packaged binary releases available, do regular sustaining engineering releases, and QA stable Hadoop releases. HDP quarterly release trains sweep in stable Apache projects, enabling HDP to stay reasonably current and predictable while minimizing the risk of thrashing that coordinating a large number of Apache projects can cause.
  • 16. Hadoop Now, Next, and Beyond. The Apache community, including Hortonworks, is investing to improve Hadoop: make Hadoop an open, extensible, and enterprise-viable platform, and enable more applications to run on Apache Hadoop. "Hadoop.Now" (Hadoop 1.0, HDP 1) is the most stable Hadoop ever; "Hadoop.Next" (Hadoop 2.x, HDP 2) brings next-gen MapReduce and HDFS; "Hadoop.Beyond" integrates with the ecosystem.
  • 17. Hortonworks Support Subscriptions. Objective: help organizations successfully develop and deploy solutions based upon Apache Hadoop. Full-lifecycle technical support is available: developer support for design, development, and POCs; production support for staging and production environments; up to 24x7 with 1-hour response times. Delivered by the Apache Hadoop experts, backed by the development team that has released every major version of Apache Hadoop since 0.1. Forward compatibility: Hortonworks' leadership role helps ensure bug fixes and patches can be included in future versions of Hadoop projects.
  • 18. Cluster Subscriptions (Starter, Standard, Enterprise).
    Unit: Starter is per cluster for 3 months; Standard and Enterprise are per cluster, 20 nodes with 250 TB of storage (compute or storage expansion).
    Supported software: Hortonworks Data Platform (HDP) plus patches and updates for HDP, acquired via the Hortonworks website and Cluster Subscriptions.
    Support coverage: cluster operators can interact with the expert Hortonworks support staff during the proof-of-concept, staging, and deployment phases. We support: configuration and installation questions, explanation of routine maintenance, analysis of performance issues, diagnosis of system or application issues, and any bug fixes or patches that may be necessary. We don't support: production issues with customer code, end-to-end debugging of customer code, development of customer code, or 3rd-party products used during development and deployment.
    Access: web, Monday to Friday, 6am to 6pm PT (Starter and Standard); web and phone, 24x7 (Enterprise).
    Incidents: unlimited for all tiers.
    Response: business day (Starter and Standard); Priority 1: 1 hour, Priority 2: 4 hours, Priority 3: 8 hours per business day (Enterprise).
  • 19. Developer Subscription (priced per developer).
    Supported software: Hortonworks Data Platform (HDP) plus patches and updates for HDP, acquired via the Hortonworks website, Cluster Subscriptions, or virtual/cloud sandbox environments.
    Support coverage: developers can interact with the expert Hortonworks support staff to receive guidance on the use of the software and answers to "how-to" questions. We support: design advice, performance tuning advice, code snippet review and advice, problem diagnosis, bug reports, and other development-related questions. We don't support: production issues with customer code, end-to-end debugging of customer code, development of customer code, or 3rd-party products used during development and deployment.
    Access: web, Monday to Friday, 6am to 6pm PT. Incidents: unlimited. Response: 4 hours per business day.
  • 20. Hortonworks Training. Objective: help organizations overcome Hadoop knowledge gaps. Expert role-based training for developers, administrators, and data analysts, with a heavy emphasis on hands-on labs and an extensive schedule of public training courses (hortonworks.com/training). Comprehensive certification programs and customized, on-site courses are available.
  • 21. Hortonworks Architectural Services. A services team dedicated to Hadoop architecture and optimization: extensive cluster experience ranging from smaller (<100 node) clusters to the largest in the world, and recognized technical experts on Hadoop. We work closely with technical teams to understand the business need and use case, translate needs and use cases into technical requirements, and call out other considerations based on our extensive knowledge of growing and expanding clusters. Designed for short-term, high-impact knowledge transfer and assistance, complementing the internal technical team and SI.
  • 22. Thank You! Questions & Answers.