Data Discovery, Visualization, and Apache Hadoop

In this webinar, we will discuss how Apache Hadoop works with your current infrastructure and how you can use data discovery and visualization tools to gain deeper insights from new data types stored in Hadoop and your existing data center investments.


Slide notes
  • For the visual thinkers out there, let's expand our mathematical model with some concrete examples. ERP, SCM, CRM, and transactional web applications are classic examples of systems processing Transactions; the highly structured data in these systems is typically stored in SQL databases. Interactions are about how people and things interact with each other or with your business: web logs, user click streams, social interactions and feeds, and user-generated content are classic places to find Interaction data. Observational data tends to come from the "Internet of Things": sensors for heat, motion, and pressure, and the RFID and GPS chips inside mobile devices, ATMs, and even aircraft engines, are just some examples of "things" that output Observation data. Most folks would agree that video is "big" data. The analysis of what's happening in that video (i.e., what you, me, and others are doing in it) may not be "big," but it is valuable and it fits under our umbrella. Moreover, business data feeds and publicly available data sets are also big data, so we should not limit our thinking to data that flows through an organization. For example, the mortgage-related data you hold could benefit from being blended with external data from a source such as Zillow. The government's Open Data Initiative means more and more data is being made publicly available. One use case I find interesting is predictive policing, where state and local law enforcement apply analytics to crime databases and other publicly available data to help predict where and when pockets of crime might spring up; these proactive analytics efforts have yielded real reductions in crime. Anyhow, this is what big data means to me; hopefully it makes sense to you.
It is important to note that we think of big data as extending beyond the traditional concepts of volume, velocity, and variety to transactions, interactions, and observations. In reality, this IS the big data our customers are dealing with.
  • While overly simplistic, this graphic represents what we commonly see as a general data architecture: a set of data sources producing data; a set of data systems to capture and store that data, most typically a mix of RDBMSs and data warehouses; and a set of applications that leverage the data stored in those systems. These could be packaged BI applications (Business Objects, Tableau, etc.), enterprise applications (e.g., SAP), or custom applications (e.g., custom web applications), ranging from ad-hoc reporting tools to mission-critical enterprise operations applications. Your environment is undoubtedly more complicated, but conceptually it is likely similar.
  • As the volume of data has exploded, we increasingly see organizations acknowledge that not all data belongs in a traditional database. The drivers are both cost (as volumes grow, database licensing costs can become prohibitive) and technology (databases are not optimized for very large datasets). Instead, we increasingly see Hadoop, and HDP in particular, being introduced as a complement to the traditional approaches. It is not replacing the database; it is a complement, and as such it must integrate easily with existing tools and approaches. This means it must interoperate with: existing applications such as Tableau, SAS, and Business Objects; existing databases and data warehouses, for loading data to and from the warehouse; the development tools used for building custom applications; and the operational tools used for managing and monitoring.
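To make the "loading data to and from the warehouse" point concrete, here is a minimal sketch of the staging step such integration typically starts with. It uses Python's built-in sqlite3 as a stand-in for a production RDBMS and a local directory as a stand-in for HDFS; the `weblogs` table and its columns are hypothetical. In practice a tool such as Apache Sqoop (or `hdfs dfs -put` on exported files) would move the flat files into Hadoop.

```python
import csv
import sqlite3
import tempfile
from pathlib import Path

def export_table_to_tsv(conn, table, out_dir):
    """Dump every row of `table` into a tab-delimited file under out_dir,
    the kind of flat-file format Hadoop tools ingest most easily."""
    out_path = Path(out_dir) / f"{table}.tsv"
    cur = conn.execute(f"SELECT * FROM {table}")
    with open(out_path, "w", newline="") as f:
        writer = csv.writer(f, delimiter="\t")
        writer.writerow([col[0] for col in cur.description])  # header row
        writer.writerows(cur)                                 # data rows
    return out_path

# Hypothetical example: a tiny 'weblogs' table standing in for warehouse data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE weblogs (ts TEXT, url TEXT, status INTEGER)")
conn.executemany("INSERT INTO weblogs VALUES (?, ?, ?)",
                 [("2013-05-01T10:00", "/home", 200),
                  ("2013-05-01T10:01", "/cart", 500)])
staging = tempfile.mkdtemp()
path = export_table_to_tsv(conn, "weblogs", staging)
print(path.read_text())
```

The same function works in the other direction for results flowing back: refined output lands as delimited files that a bulk loader can push into the warehouse.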
  • It is for that reason that we focus on HDP interoperability across all of these categories. Data systems: HDP is endorsed by and embedded with SQL Server, Teradata, and more. BI tools: HDP is certified for use with the packaged applications you already use, from Microsoft to Tableau, MicroStrategy, Business Objects, and more. Development tools: for .NET developers, Visual Studio, used to build more than half the custom applications in the world, is certified with HDP, enabling Microsoft app developers to build custom apps with Hadoop; for Java developers, Spring for Apache Hadoop makes it quick and easy to build Hadoop-based applications with HDP. Operational tools: integration with System Center and with Teradata Viewpoint.
  • So now that we've covered the overall architecture and how Hadoop fits, let's discuss the patterns of use we're seeing for Hadoop. At a high level, we describe the three key patterns as Refine, Explore, and Enrich. Refine captures data into the platform and transforms (or refines) it into the desired formats. Explore is about creating lakes of data that you can interactively surf through to find valuable insights. Enrich is about leveraging analytics and models to influence your online applications, making them more intelligent. So while some categorize Hadoop as just a batch platform, it is increasingly being used, and evolving, to serve a wide range of usage patterns that span batch, interactive, and online needs. Let me cover these patterns in a little more detail.
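The Refine pattern described above is often a simple parse-and-restructure job. As a minimal sketch, here is a Hadoop Streaming-style mapper in Python (Streaming runs such scripts over each input split, reading raw lines on stdin and writing records on stdout); the Apache-style log format and field names here are assumptions for illustration, not taken from the webinar:

```python
import re
import sys

# Hypothetical Apache-style access-log line, e.g.:
# 127.0.0.1 - - [01/May/2013:10:00:00] "GET /home HTTP/1.1" 200 512
LOG_RE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<ts>[^\]]+)\] "(?P<method>\S+) (?P<url>\S+)[^"]*" '
    r'(?P<status>\d{3}) (?P<bytes>\d+)'
)

def refine(line):
    """Turn one raw log line into a tab-separated record, or None if unparseable."""
    m = LOG_RE.match(line)
    if not m:
        return None  # a real job would track these via a 'bad records' counter
    return "\t".join(m.group("ip", "ts", "method", "url", "status", "bytes"))

def main(stdin=sys.stdin, stdout=sys.stdout):
    # Hadoop Streaming contract: raw lines in on stdin, records out on stdout.
    for line in stdin:
        record = refine(line.rstrip("\n"))
        if record is not None:
            stdout.write(record + "\n")

if __name__ == "__main__":
    main()
```

The tab-separated output lands in HDFS in a shape that Hive can put a table over, which is exactly the handoff from Refine to Explore.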
  • In summary, by addressing these elements, we can provide an Enterprise Hadoop distribution that includes the core services, platform services, data services, and operational services required by the enterprise user. All of this is done in 100% open source and tested at scale by our team (together with our partner Yahoo) to bring enterprise process to an open source approach. And finally, this is the distribution endorsed by the ecosystem to ensure interoperability in your environment.
  • At Hortonworks today, our focus is very clear: we develop, distribute, and support a 100% open source distribution of enterprise Apache Hadoop. We employ the core architects, builders, and operators of Apache Hadoop and drive innovation in the open source community. We distribute the only 100% open source enterprise Hadoop distribution: the Hortonworks Data Platform. Given our operational expertise running some of the largest Hadoop infrastructure in the world at Yahoo, our team is uniquely positioned to support you. Our approach is also uniquely endorsed by some of the biggest vendors in the IT market. Yahoo is an investor, a customer, and, most importantly, a development partner: we partner to develop Hadoop, and no release of HDP ships without first being tested on Yahoo's infrastructure, using the same regression suite they have used for years as they grew the largest production cluster in the world. Microsoft has partnered with Hortonworks to include HDP in HDP for Windows, HDInsight Server, and HDInsight Service.

    1. Data Discovery, Visualization and Apache Hadoop. An InformationWeek Webcast. Sponsored by
    2. Webcast Logistics
    3. Today's Presenters: Ted J. Wasserman, Product Manager, Tableau Software; John Kreisa, VP Strategic Marketing, Hortonworks; Lenny Liebmann, Contributing Editor, InformationWeek
    4. © Hortonworks Inc. 2012. Agenda: • How Hadoop fits into the Modern Data Architecture • How it works with your existing data center infrastructure • Typical Hadoop patterns of use • The importance of data discovery for all business users • Get started with visual analytics software and Hadoop • Demo • Next Steps
    5. Insert Poll 1 HERE
    6. Big Data: Changing The Game for Organizations. [Graphic: data variety and complexity increasing from megabytes to petabytes; from ERP/CRM purchase, payment, and support records, through web data (web logs, user click streams, user-generated content, A/B testing, behavioral targeting, sentiment), to observational data (sensors/RFID/devices, spatial & GPS coordinates, HD video/audio/images, speech to text, social interactions & feeds, business data feeds).] Transactions + Interactions + Observations = BIG DATA
    7. Existing Data Architecture. [Diagram: traditional sources (RDBMS, OLTP, OLAP) feed traditional repositories (RDBMS, EDW, MPP) that serve business analytics, custom applications, and enterprise applications, with build & test and manage & monitor tooling alongside.]
    8. Emerging Data Architecture. [Diagram: the same architecture with an Enterprise Hadoop platform added to capture new sources (web logs, email, sensors, social media) alongside the traditional sources.]
    9. Interoperating With Your Tools. [Diagram: the Hortonworks Data Platform connects traditional sources (RDBMS, OLTP, OLAP) and new sources (web logs, email, sensors, social media) with traditional repositories, Microsoft applications, and Teradata Viewpoint.]
    10. [Diagram: big data (transactions, interactions, observations) flows into the Hortonworks Data Platform, where the common patterns of use (Refine, Explore, Enrich) serve batch, interactive, and online business cases with "right-time" access to data.]
    11. Business Cases of Hadoop, by vertical (Refine / Explore / Enrich patterns):
        Retail & Web: log analysis/site optimization; social network analysis; dynamic pricing; session & content optimization.
        Retail: loyalty program optimization; brand and sentiment analysis; dynamic pricing/targeted offers.
        Intelligence: threat identification; person-of-interest discovery; cross-jurisdiction queries.
        Finance: risk modeling & fraud identification; trade performance analytics; surveillance and fraud detection; customer risk analysis; real-time upsell and cross-sell marketing offers.
        Energy: smart grid production optimization; grid failure prevention; smart meters; individual power grid.
        Manufacturing: supply chain optimization; customer churn analysis; dynamic delivery; replacement parts.
        Healthcare & Payer: electronic medical records (EMPI); clinical trials analysis; insurance premium determination.
    12. HDP: Enterprise Hadoop Distribution. [Diagram: the Hortonworks Data Platform (HDP) layers the Hadoop core (distributed storage & processing), platform services, data services (store, process and access data), and operational services (manage & operate at scale), deployable on OS, cloud, VM, or appliance.] Enterprise readiness: the only 100% open source and complete distribution; enterprise grade, proven and tested at scale; ecosystem endorsed to ensure interoperability.
    13. What We Do: Develop, Distribute, Support. We develop, distribute, and support the only 100% open source Enterprise Hadoop distribution. • We distribute the only 100% open source Enterprise Hadoop distribution: Hortonworks Data Platform • We engineer, test & certify HDP for enterprise usage • We employ the core architects, builders and operators of Apache Hadoop • We drive innovation within Apache Software Foundation projects • We are uniquely positioned to deliver the highest quality of Hadoop support • We enable the ecosystem to work better with Hadoop. Endorsed by strategic partners. Headquarters: Palo Alto, CA. Employees: 200+ and growing. Investors: Benchmark, Index, Yahoo.
    14. Insert Poll 2 HERE
    15. DEMO
    16. Hortonworks Sandbox: the Fastest Onramp to Apache Hadoop. • What it is: a free download of a virtualized single-node implementation of the enterprise-ready Hortonworks Data Platform; a personal Hadoop environment; an integrated learning environment with easily updated, hands-on, step-by-step tutorials. • What it does: dramatically accelerates the process of learning Apache Hadoop; accelerates and validates the use of Hadoop within your unique data architecture; lets you use your own data to explore and investigate your use cases. • Zero to big data in 15 minutes. • Get started: download Hortonworks Sandbox, and sign up for training for in-depth learning.
    17. Hadoop Summit 2013. • June 26-27, 2013, San Jose Convention Center • Co-hosted by Hortonworks & Yahoo! • Theme: Enabling the Next Generation Enterprise Data Platform • 90+ sessions and 7 tracks • Community-focused event: sessions selected by a conference committee, with a Community Choice vote letting the public pick the sessions they want to see • Pre-event training classes: Apache Hadoop Essentials: A Technical Understanding for Business Users; Understanding Microsoft HDInsight and Apache Hadoop; Developing Solutions with Apache Hadoop – HDFS and MapReduce; Applying Data Science using Apache Hadoop
    18. Next Steps. • Try Tableau on Hortonworks Sandbox! • Download Sandbox – • Download Tableau trial – • Visit the Hortonworks blog on connecting Tableau to the Sandbox – hortonworks-sandbox/
    19. Q&A. Ted J. Wasserman, Product Manager, Tableau Software; John Kreisa, VP Strategic Marketing, Hortonworks; Lenny Liebmann, Contributing Editor, InformationWeek
    20. Resources. To view this or other events on-demand, please visit: For more information, please visit: