Data Discovery, Visualization, and Apache Hadoop

In this webinar, we will discuss how Apache Hadoop works with your current infrastructure and how you can use data discovery and visualization tools to gain deeper insights from new data types stored in Hadoop and your existing data center investments.

Published in: Technology
  • For the visual thinkers out there, let’s expand our mathematical model to show some concrete examples. ERP, SCM, CRM, and transactional Web applications are classic examples of systems processing Transactions. Highly structured data in these systems is typically stored in SQL databases. Interactions are about how people and things interact with each other or with your business. Web Logs, User Click Streams, Social Interactions & Feeds, and User-Generated Content are classic places to find Interaction data. Observational data tends to come from the “Internet of Things”. Sensors for heat, motion, and pressure, along with RFID and GPS chips within such things as mobile devices, ATMs, and even aircraft engines, are just some examples of “things” that output Observation data. Most folks would agree that video is “big” data. The analysis of what’s happening in that video (i.e. what you, me, and others are doing in the video) may not be “big”, but it is valuable and it does fit under our umbrella. Moreover, business data feeds and publicly available data sets are also “big data”, so we should not limit our thinking to just the data that flows through an organization. For example, the mortgage-related data you may have could benefit from being blended with external data found in Zillow. The government, for example, has the Open Data Initiative, which means that more and more data is being made publicly available. One of the use cases I find interesting is Predictive Policing, where state and local law enforcement apply analytics to crime databases and other publicly available data to help predict where and when pockets of crime might spring up. These proactive analytics efforts have yielded real reductions in crime! Anyhow, this is what Big Data means to me…hopefully it makes sense to you.
It is important to note that we think of big data beyond the traditional concepts of volume, velocity and variety, in terms of transactions, interactions and observations. In reality, this IS the big data our customers are dealing with.
  • While overly simplistic, this graphic represents what we commonly see as a general data architecture: a set of data sources producing data; a set of data systems to capture and store that data, most typically a mix of RDBMS and data warehouses; and a set of applications that leverage the data stored in those data systems. These could be packaged BI applications (Business Objects, Tableau, etc.), Enterprise Applications (e.g. SAP) or Custom Applications (e.g. custom web applications), ranging from ad-hoc reporting tools to mission-critical enterprise operations applications. Your environment is undoubtedly more complicated, but conceptually it is likely similar.
  • As the volume of data has exploded, we increasingly see organizations acknowledge that not all data belongs in a traditional database. The drivers are both cost (as volumes grow, database licensing costs can become prohibitive) and technology (databases are not optimized for very large datasets). Instead, we increasingly see Hadoop, and HDP in particular, being introduced as a complement to the traditional approaches. It is not replacing the database but complementing it, and as such it must integrate easily with existing tools and approaches. This means it must interoperate with: existing applications, such as Tableau, SAS, Business Objects, etc.; existing databases and data warehouses, for loading data to and from the data warehouse; development tools used for building custom applications; and operational tools for managing and monitoring.
  • It is for that reason that we focus on HDP interoperability across all of these categories. Data systems: HDP is endorsed and embedded with SQL Server, Teradata and more. BI tools: HDP is certified for use with the packaged applications you already use, from Microsoft to Tableau, MicroStrategy, Business Objects and more. Development tools: for .NET developers, Visual Studio, used to build more than half the custom applications in the world, certifies with HDP to enable Microsoft app developers to build custom apps with Hadoop; for Java developers, Spring for Apache Hadoop enables Java developers to quickly and easily build Hadoop-based applications with HDP. Operational tools: integration with System Center and with Teradata Viewpoint.
  • So we’ve covered the overall architecture and how Hadoop fits; let’s discuss the patterns of use that we’re seeing for Hadoop. At a high level, we describe the 3 key patterns of use as Refine, Explore, and Enrich. Refine captures the data into the platform and transforms (or refines) it into the desired formats. Explore is about creating lakes of data that you can interactively surf through to find valuable insights. Enrich is about leveraging analytics and models to influence your online applications, making them more intelligent. So while some categorize Hadoop as just a Batch platform, it is increasingly being used and evolving to serve a wide range of usage patterns that span Batch, Interactive, and Online needs. Let me cover these patterns in a little more detail.
  • In summary, by addressing these elements, we can provide an Enterprise Hadoop distribution which includes the Core Services, Platform Services, Data Services, and Operational Services required by the Enterprise user. All of this is done in 100% open source, and tested at scale by our team (together with our partner Yahoo) to bring Enterprise process to an open source approach. And finally, this is the distribution that is endorsed by the ecosystem to ensure interoperability in your environment.
  • At Hortonworks today, our focus is very clear: we Develop, Distribute and Support a 100% open source distribution of Enterprise Apache Hadoop. We employ the core architects, builders and operators of Apache Hadoop and drive the innovation in the open source community. We distribute the only 100% open source Enterprise Hadoop distribution: the Hortonworks Data Platform. Given our operational expertise of running some of the largest Hadoop infrastructure in the world at Yahoo, our team is uniquely positioned to support you. Our approach is also uniquely endorsed by some of the biggest vendors in the IT market. Yahoo is both an investor and a customer and, most importantly, a development partner. We partner to develop Hadoop, and no distribution of HDP is released without first being tested on Yahoo’s infrastructure, using the same regression suite that they have used for years as they grew to have the largest production cluster in the world. Microsoft has partnered with Hortonworks to include HDP in HDP for Windows, HDInsight Server, and HDInsight Service.
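The Refine, Explore, and Enrich patterns described in the notes above can be illustrated with a small sketch. This is not from the webcast: the log format, the sample data, and the function names `refine`, `explore`, and `enrich` are all hypothetical, and on a real cluster each stage would run on Hadoop tooling rather than in-process Python. It only shows how the three stages might hand data to one another.

```python
from collections import Counter

# Hypothetical raw web-log lines (illustrative data, not from the webcast).
RAW_LOGS = [
    "2013-05-01T10:00:00 user1 /home",
    "2013-05-01T10:00:05 user2 /pricing",
    "malformed line",
    "2013-05-01T10:01:00 user1 /pricing",
]


def refine(lines):
    """Refine: parse raw lines into structured records, dropping malformed ones."""
    records = []
    for line in lines:
        parts = line.split()
        if len(parts) != 3:
            continue  # skip lines that do not match "timestamp user page"
        timestamp, user, page = parts
        records.append({"timestamp": timestamp, "user": user, "page": page})
    return records


def explore(records):
    """Explore: summarize refined records, here as view counts per page."""
    return Counter(record["page"] for record in records)


def enrich(page_counts, current_page):
    """Enrich: feed the summary back into an application by suggesting
    the most-viewed page other than the one being shown."""
    candidates = [
        (count, page) for page, count in page_counts.items() if page != current_page
    ]
    if not candidates:
        return None
    return max(candidates)[1]


records = refine(RAW_LOGS)
page_counts = explore(records)
suggestion = enrich(page_counts, "/home")
```

In this toy version the stages are plain function calls; the point is only the division of labor: Refine produces clean records, Explore summarizes them, and Enrich feeds a result back to an application.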
  • Transcript of "Data Discovery, Visualization, and Apache Hadoop"

    1. Data Discovery, Visualization and Apache Hadoop. An InformationWeek Webcast. Sponsored by
    2. Webcast Logistics
    3. Today’s Presenters: Ted J. Wasserman, Product Manager, Tableau Software; John Kreisa, VP Strategic Marketing, Hortonworks; Lenny Liebmann, Contributing Editor, InformationWeek
    4. © Hortonworks Inc. 2012 Agenda • How Hadoop fits into the Modern Data Architecture • How it works with your existing data center infrastructure • Typical Hadoop patterns of use • The importance of data discovery for all business users • Get started with visual analytics software and Hadoop • Demo • Next Steps
    5. Insert Poll 1 HERE
    6. Big Data: Changing The Game for Organizations. [Figure: data types arranged by volume (Megabytes, Gigabytes, Terabytes, Petabytes) and by Increasing Data Variety and Complexity, with the labels ERP, CRM, WEB, and BIG DATA. Items shown: Purchase detail; Purchase record; Payment record; Offer details; Support Contacts; Customer Touches; Segmentation; Web logs; Offer history; A/B testing; Dynamic Pricing; Affiliate Networks; Search Marketing; Behavioral Targeting; Dynamic Funnels; User Generated Content; Mobile Web; SMS/MMS; Sentiment; External Demographics; HD Video, Audio, Images; Speech to Text; Product/Service Logs; Social Interactions & Feeds; Business Data Feeds; User Click Stream; Sensors / RFID / Devices; Spatial & GPS Coordinates.] Transactions + Interactions + Observations = BIG DATA
    7. © Hortonworks Inc. 2013 Existing Data Architecture. [Diagram labels: TRADITIONAL REPOS (RDBMS, EDW, MPP); OLTP, POS SYSTEMS; MANAGE & MONITOR; BUILD & TEST; Traditional Sources (RDBMS, OLTP, OLAP); Business Analytics; Custom Applications; Enterprise Applications]
    8. Emerging Data Architecture. [Diagram labels: TRADITIONAL REPOS (RDBMS, EDW, MPP); OLTP, POS SYSTEMS; MANAGE & MONITOR; BUILD & TEST; Traditional Sources (RDBMS, OLTP, OLAP); New Sources (web logs, email, sensors, social media); Business Analytics; Custom Applications; Enterprise Applications; ENTERPRISE HADOOP PLATFORM]
    9. Interoperating With Your Tools. [Diagram labels: TRADITIONAL REPOS; Viewpoint; Microsoft Applications; HORTONWORKS DATA PLATFORM; Traditional Sources (RDBMS, OLTP, OLAP); New Sources (web logs, email, sensors, social media)]
    10. Hadoop Common Patterns of Use. [Diagram labels: Big Data (Transactions, Interactions, Observations); Business Cases; HORTONWORKS DATA PLATFORM; Refine, Explore, Enrich; Batch, Interactive, Online; “Right-time” Access to Data]
    11. Business Cases of Hadoop. [Table with columns Vertical, Refine, Explore, Enrich; the column assignment of individual items was lost in extraction.] Retail & Web: Log Analysis/Site Optimization; Social Network Analysis; Dynamic Pricing; Session & Content Optimization. Retail: Loyalty Program Optimization; Brand and Sentiment Analysis; Dynamic Pricing/Targeted Offer. Intelligence: Threat Identification; Person of Interest Discovery; Cross Jurisdiction Queries. Finance: Risk Modeling & Fraud Identification; Trade Performance Analytics; Surveillance and Fraud Detection; Customer Risk Analysis; Real-time upsell, cross sales marketing offers. Energy: Smart Grid: Production Optimization; Grid Failure Prevention; Smart Meters; Individual Power Grid. Manufacturing: Supply Chain Optimization; Customer Churn Analysis; Dynamic Delivery; Replacement parts. Healthcare & Payer: Electronic Medical Records (EMPI); Clinical Trials Analysis; Insurance Premium Determination.
    12. HDP: Enterprise Hadoop Distribution. [Diagram labels: OS, Cloud, VM, Appliance; PLATFORM SERVICES; HADOOP CORE; DATA SERVICES; OPERATIONAL SERVICES; Manage & Operate at Scale; Store, Process and Access Data; Distributed Storage & Processing; Enterprise Readiness; HORTONWORKS DATA PLATFORM (HDP)] Hortonworks Data Platform (HDP) Enterprise Hadoop • The ONLY 100% open source and complete distribution • Enterprise grade, proven and tested at scale • Ecosystem endorsed to ensure interoperability
    13. What We Do… • We distribute the only 100% Open Source Enterprise Hadoop Distribution: Hortonworks Data Platform • We engineer, test & certify HDP for enterprise usage • We employ the core architects, builders and operators of Apache Hadoop • We drive innovation within Apache Software Foundation projects • We are uniquely positioned to deliver the highest quality of Hadoop support • We enable the ecosystem to work better with Hadoop. Develop, Distribute, Support: We develop, distribute and support the ONLY 100% open source Enterprise Hadoop distribution. Endorsed by Strategic Partners. Headquarters: Palo Alto, CA. Employees: 200+ and growing. Investors: Benchmark, Index, Yahoo
    14. Insert Poll 2 HERE
    15. DEMO
    16. Hortonworks Sandbox: Fastest Onramp to Apache Hadoop • What is it – A free download of a virtualized single-node implementation of the enterprise-ready Hortonworks Data Platform – A personal Hadoop environment – An integrated learning environment with frequently and easily updatable hands-on step-by-step tutorials • What it does – Dramatically accelerates the process of learning Apache Hadoop – Accelerates and validates the use of Hadoop within your unique data architecture – Use your data to explore and investigate your use cases • ZERO to big data in 15 minutes • Get Started! Download Hortonworks Sandbox: www.hortonworks.com/sandbox Sign up for Training for in-depth learning: hortonworks.com/hadoop-training/
    17. Hadoop Summit 2013 • June 26-27, 2013, San Jose Convention Center • Co-hosted by Hortonworks & Yahoo! • Theme: Enabling the Next Generation Enterprise Data Platform • 90+ Sessions and 7 Tracks • Community Focused Event – Sessions selected by a Conference Committee – Community Choice allowed public to vote for sessions they want to see • Training classes offered pre-event – Apache Hadoop Essentials: A Technical Understanding for Business Users – Understanding Microsoft HDInsight and Apache Hadoop – Developing Solutions with Apache Hadoop – HDFS and MapReduce – Applying Data Science using Apache Hadoop. hadoopsummit.org
    18. Next Steps • Try Tableau on Hortonworks Sandbox! • Download Sandbox – Hortonworks.com/sandbox • Download Tableau trial – Tableausoftware.com/trial • Visit Hortonworks blog on connecting Tableau to the Sandbox – http://hortonworks.com/kb/how-to-connect-tableau-to-hortonworks-sandbox/
    19. Q&A: Ted J. Wasserman, Product Manager, Tableau Software; John Kreisa, VP Strategic Marketing, Hortonworks; Lenny Liebmann, Contributing Editor, InformationWeek
    20. Resources: To View This or Other Events On-Demand Please Visit: http://www.informationweek.com/events http://www.netseminar.com For more information please visit: http://hortonworks.com/products/hortonworks-sandbox/