Big Data, Hadoop, Hortonworks and Microsoft HDInsight

Big Data is everywhere. And at the center of the big data discussion is Apache Hadoop, a next-generation enterprise data platform that allows you to capture, process and share the enormous amounts of new, multi-structured data that doesn’t fit into traditional systems.

With Microsoft HDInsight, powered by Hortonworks Data Platform, you can bridge this new world of unstructured content with the structured data we manage today. Together, we bring Hadoop to the masses as an addition to your current enterprise data architectures so that you can amass net new insight without net new headache.

  • For the visual thinkers out there, let’s expand our mathematical model to show some concrete examples.
    ERP, SCM, CRM, and transactional Web applications are classic examples of systems processing Transactions. Highly structured data in these systems is typically stored in SQL databases.
    Interactions are about how people and things interact with each other or with your business. Web Logs, User Click Streams, Social Interactions & Feeds, and User-Generated Content are classic places to find Interaction data.
    Observational data tends to come from the “Internet of Things”. Sensors for heat, motion, and pressure, and RFID and GPS chips within such things as mobile devices, ATMs, and even aircraft engines, are just some examples of “things” that output Observation data.
    Most folks would agree that video is “big” data. The analysis of what’s happening in that video (i.e., what you, I, and others are doing in the video) may not be “big”, but it is valuable and it does fit under our umbrella.
    Moreover, business data feeds and publicly available data sets are also “big data”, so we should not limit our thinking to just data that flows through an organization. For example, the mortgage-related data you may have COULD benefit from being blended with external data found in Zillow.
    The government, for example, has the Open Data Initiative, which means that more and more data is being made publicly available. One of the use cases I find interesting is Predictive Policing, where state and local law enforcement apply analytics to crime databases and other publicly available data to help predict where and when pockets of crime might spring up. These proactive analytics efforts have yielded real reductions in crime!
    Anyhow, this is what Big Data means to me…hopefully it makes sense to you. It is important to note that we think of big data beyond the traditional concepts of volume, velocity and variety, extending into transactions, interactions and observations. In reality, this IS the big data our customers are dealing with.
  • Gray Systems Lab, Dr. David DeWitt. Future of query processing:
    • One interface to query relational & Hadoop data
    • Query data without moving it
    • Expanding to other data sources in the future
    • Seamless integration with unstructured data & Hadoop
    • Breakthrough technology: it’s going to dramatically simplify how users query relational and Hadoop data
    Pioneered in the Jim Gray Systems Lab by David DeWitt, PolyBase is a federated query processor in SQL Server 2012 Parallel Data Warehouse (PDW). It goes beyond traditional query processing by joining structured data with unstructured data from Hadoop. Without manual intervention, the PolyBase query processor can accept a standard SQL query and combine tables from a relational source with tables from a Hadoop source directly through external tables. PolyBase also parallelizes the import and export of data to and from Hadoop, giving PDW speed, simplicity, and responsiveness in addressing these new types of queries.
    1) Ability to issue standard T-SQL that joins relational data with unstructured data in Hadoop
    2) PolyBase rapidly imports/exports data between Hadoop and PDW in parallel
    3) PolyBase can query data in Hadoop directly without movement (with external tables)
    4) Created in the “Gray Systems Lab” by David DeWitt
  • And that's the second thing I wanted to share with you this afternoon
  • We believe that Hadoop can be in a position to process more than half the world’s data. I’ve talked to a variety of industry analysts, and there’s not a big argument over Hadoop’s opportunity to achieve this. Some would argue it should be 2016 or 2017, rather than 2015. But we believe aggressive goals help focus people on the right things, so let’s keep it 2015 for now, and let’s see how close we can get. The point here is that this statement can act as our “north star” and help guide our way as we focus on our list of 5 items we can be doing:
    • Be diligent stewards of the open source core
    • Be tireless innovators beyond the core
    • Provide robust data platform services & open APIs
    • Enable ecosystem at each layer of the stack
    • Make platform enterprise-ready & easy to use
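
The PolyBase notes above describe one query that joins relational tables with Hadoop data read in place through external tables. The following is a minimal Python sketch of that idea only; it is not PolyBase or T-SQL. It assumes an in-memory SQLite table as a stand-in for the relational source and an in-memory delimited file as a stand-in for data sitting in Hadoop. The table, the file contents, and the function names (`external_rows`, `join_clicks_to_customers`) are all hypothetical.

```python
import csv
import io
import sqlite3

# Stand-in for the relational source (hypothetical schema and rows).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?)", [(1, "Acme"), (2, "Globex")]
)

# Stand-in for a delimited click log stored outside the database (hypothetical data).
CLICKS_FILE = io.StringIO("1,/home\n2,/pricing\n1,/cart\n3,/home\n")


def external_rows(handle):
    """Parse rows at query time, where they live, instead of loading them first."""
    for customer_id, path in csv.reader(handle):
        yield int(customer_id), path


def join_clicks_to_customers(connection, handle):
    """Inner-join the external click rows to the relational customers table."""
    names = dict(connection.execute("SELECT id, name FROM customers"))
    return [
        (names[customer_id], path)
        for customer_id, path in external_rows(handle)
        if customer_id in names
    ]


result = join_clicks_to_customers(conn, CLICKS_FILE)
print(result)
```

The click for customer 3 is dropped because the sketch performs an inner join. A real federated engine would also push work down and parallelize it, which this single-process sketch does not attempt.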

Big Data, Hadoop, Hortonworks and Microsoft HDInsight Presentation Transcript

  • 1. Polling Question: How important is Big Data to your business? ___ Very Important ___ Somewhat Important ___ Not Important. Page 1 © Hortonworks Inc. 2012
  • 2. Big Data, Hadoop, Hortonworks and Microsoft HDInsight
  • 3. Your Presenters: Jim Walker (Director, Product Marketing; Computer Security and MDM) and Saptak Sen (Senior Product Manager; Big Data & NoSQL Technology)
  • 4. Why Data Driven Business? “Data driven decisions are better decisions – it’s as simple as that. Using big data enables managers to decide on the basis of evidence rather than intuition. For that reason it has the potential to revolutionize management.” Harvard Business Review, October 2012
  • 5. Big Data: Organizational Game Changer. Transactions + Interactions + Observations = BIG DATA
    Diagram: data volume grows from Megabytes (ERP) through Gigabytes (CRM) and Terabytes (WEB) to Petabytes (BIG DATA) along an axis of increasing data variety and complexity.
    Data types shown: Mobile Web, Sentiment, SMS/MMS, User Click Stream, Speech to Text, Social Interactions & Feeds, Web logs, Spatial & GPS Coordinates, A/B testing, Sensors / RFID / Devices, Behavioral Targeting, Business Data Feeds, Dynamic Pricing, Segmentation, External Demographics, Search Marketing, Customer Touches, User Generated Content, Affiliate Networks, Purchase detail, Support Contacts, HD Video, Audio, Images, Dynamic Funnels, Purchase record, Offer details, Offer history, Product/Service Logs, Payment record.
  • 9. Polling Question: What tools are you using with Big Data? ___ Hadoop ___ NoSQL ___ Other ___ All the above
  • 10. Big Data: Optimize Outcomes at Scale
    • Sports optimize Championships
    • Intelligence optimize Detection
    • Finance optimize Algorithms
    • Advertising optimize Performance
    • Fraud optimize Prevention
    • Retail / Wholesale optimize Inventory turns
    • Manufacturing optimize Supply chains
    • Healthcare optimize Patient outcomes
    • Education optimize Learning outcomes
    • Government optimize Citizen services
    Source: Geoffrey Moore, Hadoop Summit 2012 keynote presentation.
  • 11. A little history… it’s 2005
  • 12. …and then there was MapReduce
  • 13. Apache Hadoop: Center of Big Data Strategy. Open source data management with scale-out storage & distributed processing.
    Storage (HDFS):
    • Distributed across “nodes”
    • Natively redundant
    • Name node tracks locations
    Processing (MapReduce):
    • Splits a task across processors “near” the data & assembles results
    • Self-Healing, High Bandwidth Clustered Storage
    Key Characteristics:
    • Scalable: efficiently store and process petabytes of data; linear scale driven by additional processing and storage
    • Reliable: redundant storage; failover across nodes and racks
    • Flexible: store all types of data in any format; apply schema on analysis and sharing of the data
    • Economical: use commodity hardware; open source software guards against vendor lock-in
  • 14. What is a Hadoop “Distribution”? A complementary set of open source technologies that make up a complete data platform.
    Components shown: Templeton, WebHDFS, Sqoop, Flume, HCatalog, HBase, Pig, Hive, MapReduce, HDFS, Ambari, Oozie, HA, ZooKeeper.
    • Tested and pre-packaged to ease installation and usage
    • Collects the right versions of the components that all have different release cycles and ensures they work together
  • 15. Apache Hadoop & Big Data Use Cases. Big Data (Transactions, Interactions, Observations); Refine, Explore, Enrich; Business Case
  • 16. 3 Patterns of Hadoop Use: Refine, Explore, Enrich
  • 17. 3 Patterns of Hadoop Use: Refine, Explore, Enrich. Einstein photo courtesy of Wikipedia, Creative Commons
  • 18. 3 Patterns of Hadoop Use: Refine, Explore, Enrich. Einstein photo courtesy of Wikipedia, Creative Commons
  • 19. Balancing Innovation & Stability
    • Hadoop is “pre-chasm”
    • Ecosystem still evolving
    • Enterprises endure 1-3 year adoption cycle
    Chart (relative % customers over time): Innovators, technology enthusiasts; Early adopters, visionaries; The CHASM; Early majority, pragmatists; Late majority, conservatives; Laggards, skeptics. Before the chasm, customers want technology & performance; after it, customers want solutions & convenience.
    Source: Geoffrey Moore, Crossing the Chasm
  • 23. Demonstration: Mining Market Data
    • Showcase back testing on Interactive Data
    • Leveraging Excel Tool & BI Tool
  • 24. Looking Ahead | Microsoft PolyBase. “I’ve said it before: Massively Parallel Processing (MPP) data warehouse appliances are Big Data databases.” (Andrew Brust)
    SQL Server PDW with PolyBase:
    • Single query for relational & Hadoop data
    • Process data in place
    • Seamless: regular T-SQL command
    • Future expansion to other data sources
  • 25. Hadoop Better on Windows
    • Active Directory
    • System Center
    Microsoft Data Connectivity
    • SQL Server / SQL Parallel Data Warehouse
    • Azure Storage / Azure Data Market
    Microsoft Business Intelligence (BI)
    • ODBC Connectivity
  • 26. Leading Innovation at the Core. We focus on innovating the core of Apache Hadoop.
    • Hortonworks employs the original Architects, Builders and Operators of Apache Hadoop
    • All Apache, NO holdbacks: 100% of all code contributed back to open source Apache projects
    Chart (Number of Apache Hadoop Committers by Company): MapR 1 17 Yahoo! 9 facebook 4 Cloudera 8
  • 27. What we do… We believe that by the end of 2015, more than half the world’s data will be processed by Apache Hadoop. Strategy: invest in Apache Hadoop to make it “The enterprise big data platform”.
    Distribution:
    • Hortonworks Data Platform (HDP)
    • Enterprise Ready, Stable, Reliable, Tested
    • 100% open source
    • Built by the architects, builders and operators of Apache Hadoop
    Ecosystem:
    • Enable an Ecosystem of Big Data Apps
    • Our goal is to make sure all your tools work WITH Hadoop
    • HDP is Hadoop for Microsoft and Teradata
    Support:
    • Deliver highest quality support and expertise
    • Access to Apache Hadoop Experts
    • Hadoop training and certification by the Hadoop experts (web, public, private)
  • 29. Hadoop in Enterprise Data Architectures
    Existing infrastructure and business tools shown: IDE & Dev Tools, ODS & Datamarts, Applications & Spreadsheets, Visualization & Intelligence, Operations, Discovery Tools, EDW, Low Latency/NoSQL, Custom, Existing.
    Web and new tech shown: Web Applications, Datameer, Tableau, Karmasphere, Splunk.
    Hadoop platform: Templeton, WebHDFS, Sqoop, Flume, HCatalog, HBase, Pig, Hive, MapReduce, HDFS, Ambari, Oozie, HA, ZooKeeper.
    Big Data Sources (transactions, observations, interactions): Social Media, Exhaust Data, logs, files, CRM, ERP, financials.
  • 30. Big Data: It’s About Scale & Structure. Each row compares RDBMS / EDW / MPP with NoSQL / Hadoop.
    • Data types: Structured vs. Multi and unstructured
    • Processing: Limited, no data processing vs. Processing coupled with data
    • Governance: Standards and structured vs. Loosely structured
    • Schema: Required on write vs. Required on read
    • Speed: Reads are fast vs. Writes are fast
    • Cost: Software License vs. Support only
    • Resources: Known entity vs. Growing, complexities, wide
    • Best fit use: Interactive OLAP Analytics, Complex ACID Transactions, Operational Data Store vs. Data Discovery, Processing unstructured data, Massive Storage/Processing
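
Slides 13 and 30 describe MapReduce as splitting a task across processors and assembling the results, with schema required on read rather than on write. The following is a minimal single-process Python sketch of that pattern, not Hadoop code. The log format, the sample lines, and the function names (`parse_line`, `map_phase`, `shuffle`, `reduce_phase`, `run_job`) are hypothetical and chosen only for illustration.

```python
from collections import defaultdict

# Hypothetical raw log lines, stored as-is: nothing validates them on write.
RAW_LOGS = [
    "2012-10-01 alice /home 200",
    "2012-10-01 bob /cart 500",
    "malformed line",
    "2012-10-02 alice /cart 200",
    "2012-10-02 carol /home 200",
]


def parse_line(line):
    """Schema on read: field names and types are imposed only at analysis time."""
    parts = line.split()
    if len(parts) != 4:
        return None  # lines that do not fit the schema are skipped when read
    date, user, path, status = parts
    return {"date": date, "user": user, "path": path, "status": int(status)}


def map_phase(split):
    """Map: emit a (path, 1) pair for every parseable record in one input split."""
    for line in split:
        record = parse_line(line)
        if record is not None:
            yield record["path"], 1


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce: assemble one result per key by summing its values."""
    return {key: sum(values) for key, values in groups.items()}


def run_job(lines, num_splits=2):
    """Split the input, map each split independently, then shuffle and reduce."""
    splits = [lines[i::num_splits] for i in range(num_splits)]
    pairs = [pair for split in splits for pair in map_phase(split)]
    return reduce_phase(shuffle(pairs))


hits_per_path = run_job(RAW_LOGS)
print(hits_per_path)
```

In a real cluster each split would be mapped on a node close to where its data is stored, and the shuffle would move data across the network; here the splits are processed one after another in a single process.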