Big Data Visualisation with Hadoop and PowerPivot

  1. 1. Website: http://www.jenstirrup.com Twitter: @jenstirrup Email: Jen.Stirrup@copper-blue.com
  2. 2. Data Scientists You’re Incredible! And you are…. People who like data to be correct.
  3. 3. Agenda
  4. 4. How did Big Data get Big?
  5. 5. As long as you’re gonna be thinking anyway, why not think big. (Donald Trump) Because we can imagine, we are free (Jean-Paul Sartre) What kind of modern world would we have if Edison, Green and Dixon had not developed cinematic technology before Hitchcock grew up? (Kevin Kelly, futurist)
  6. 6. The Unknown Unknowns • That is to say, there are things that we know we don't know. But there are also unknown unknowns. There are things we don't know we don't know. (Donald Rumsfeld)
  7. 7. Data Scientists You’re Incredible!
  8. 8. Examples of Big Data
  9. 9. Big Data Takeaways. V is for:
  10. 10. Data is Black Gold
  11. 11. What does it mean for Enterprises?
  12. 12. Agenda
  13. 13. Big Data.
  14. 14. Data Management Strategy OLTP Single Purpose DW Multi Purpose DW MapReduce Compute Trend
  15. 15. Increases ad revenue by processing 3.5 billion events per day Massive Volumes Processes 464 billion rows per quarter, with average query time under 10 secs. Measures and ranks online user influence by processing 3 billion signals per day Cloud Connectivity Connects across 15 social networks via the cloud for data and API access Uses sentiment analysis and web analytics for its internal cloud Real-Time Insight Improves operational decision making for IT managers and users
  16. 16. Hadoop is for Big Data.
  17. 17. What is Hadoop? “Flexible and Available Architecture for Large Scale computation and data processing on a network of highly available commodity hardware.”
  18. 18. Hadoop’s Lineage * Resource: Kerberos Konference (Yahoo) – 2010
  19. 19. Distributed Storage (HDFS) HDInsight Ecosystem Distributed Processing (Map Reduce) ODBC(Azure Data Marketplace) Windows Azure Storage
  20. 20. Hadoop Key Terms
  21. 21. Hadoop Capabilities Machine Learning Graph Processing Distributed Compute Extract Load Transform Predictive Analysis
  22. 22. Why Hadoop? Open Source Software Commodity Hardware = Reduction of Costs for IT
  23. 23. Hadoop vs RDBMS Apache Hadoop isn’t a substitute for a database • It is not Relational • Key Value pairs • Big Data
  24. 24. Hadoop vs RDBMS • Unstructured / Semi structured • Structured • Works together with RDBMS
  25. 25. Data Knowledge Action HDInsight
  26. 26. How can Microsoft help?
  27. 27. ..Bringing home all this technology, all your data in familiar packages
  28. 28. Big Agenda
  29. 29. BIG DATA REQUIRES AN END-TO-END APPROACH Discover Combine Refine Relational Non-relational Streaming INSIGHT DATA ENRICHMENT DATA MANAGEMENT Self-Service Collaboration Corporate Apps Devices Analytical
  30. 30. Data Knowledge Action HDInsight
  31. 31. Microsoft Hadoop Vision Runs on Windows and Azure • Active Directory • System Center • .Net Programmability Microsoft Data Connectivity • SQL Server / SQL Parallel Data Warehouse • Azure Storage / Azure Data Market
  32. 32. Microsoft Hadoop Vision Microsoft Business Intelligence • Hive ODBC Connectivity • BI Tools for Big Data Collaborate with and Contribute to OSS • Collaborate with Hortonworks • Provide improvements and Windows support back to OSS
  33. 33. On Premise • Comes with: •Hadoop command line (shell) •Hadoop Status for name node and map-reduce cluster •HDInsight Dashboard
  34. 34. On Premise • On prem: http://www.microsoft.com/bigdata/ • Single node cluster (onebox) install • C:\hadoop • Starts local services
  35. 35. On Azure • On Windows Azure: http://HadoopOnAzure.com/ • 3 node cluster running as a service in Azure • Can be used for 5 days • Provides samples and HDInsight Dashboard • TAP Program
  36. 36. Agenda •Big Data – What is it? • Big Data or Big Hype? • Big Data, Big Insights with Hadoop
  37. 37. Because we can imagine, we are free Jean-Paul Sartre We have the tools. All we’ve got to do is imagine what could be. We can reinvent the present; we can transform the world around us. Jason Silva
  38. 38. Recap

Editor's Notes

  • Courtesy of Bruno Aziza at @SiSense
  • Relational databases are pushed to the limit. Data management techniques haven't scaled. Traditional systems haven't scaled. Big data is about complexity as well as scalability. NoSQL as a paradigm shift. Hadoop can run and parallelise large-scale batch computations on large amounts of data. However, there is a high latency in returning the results, so it is not suitable for low-latency workloads. What are the features of a Big Data system? Robust; fault tolerant; human fault tolerant; data when you need it; scalable; general; extensible; reduced implementation complexity; error handling; auditing. In other words, no different from a little data solution. Think inserts.
  • Some things in life are so complicated and abstract that they’re awesome. Eternity, cosmic significance, and the infinite universe are just a few of these awesome, convoluted concepts that have kept us fascinated and confused since the beginning of human consciousness. Awe: perceptual expansion, such perceptual vastness that you literally have to configure your mental schemata just to accommodate, just to take in, the scale of the experience. Ontological awakening: realization of the connectedness of all things, and also the continuum from inanimate to animate matter; all of it is nature, all of it is inevitable, all of it is emerging as part of the same evolutionary process. Physicist Freeman Dyson speaks of a future where a new generation of artists will write genomes the way that Shakespeare used to write verses.
  • Courtesy of WIPRO
  • Teradata and Lynn Langit slide. We’ve got 7 billion people and 6 billion devices. 90% of the world’s data was created in the last two years alone. This is not the data that’s kept behind corporate walls. It is unstructured content, most of which didn’t even exist years ago: documents, tweets, images, videos posted to YouTube, data gathered from surveillance cameras. We post, we blog, we share, we tweet, we like or don’t like. We have a voice and we leave a digital trail. And every tweet we send is being followed, monitored, analyzed, acted on. Companies are analyzing social data to find out what you’re thinking, to know what new products and services you want even before you do. A new initiative by the U.N. is actually using sentiment analysis to help predict civil unrest, job losses, spending reductions and disease outbreaks.
  • Digital marketing optimisation: golden path analysis, clickthroughs. Digital exploration: discovery, new markets. Machine-generated analytics: logs, real time, telemetry, location, remote sensors. Data retention: archiving. Traditionally: physics experiments, sensor data, satellite data, … Now: operational logs, customer behavior, social interactions online, … From terabytes in the 1990s, through petabytes today, to zettabytes in the future.
  • What do we have now? It is like a vacuum tube: slow and expensive. Why did Big Data get big?
  • Volume: data comes in one size, large. Variety: structured and unstructured data. Veracity: good and bad data. Velocity: fast moving. Value: business value.
  • Unlike real crude oil, data can be re-used. It can be mined for profit. It needs to be re-shaped in order to be used. If you don’t have your data, you don’t have anything! You lose your business.
  • Thanks to @SiSense and Bruno Aziza
  • Actionable insight. Predictive insight. Business impact. Customer discernment.
  • Big Data. This is a picture down the center aisle of a shipping container from one of Microsoft’s datacenters. We put ~1800 computers inside one of these containers. Some of us had the privilege of working on the data storage and computational platform that powers Bing. We used 22 of these containers, spanning 40,000 machines, where we stored over 100PB of data. This was three years ago, and now these servers are almost obsolete. Big Data is in constant motion and growing at an incredible rate, with 90% of the world’s data generated in just the past two years. That's remarkable growth. Technology history has taught us that the one with the most data wins. The empires of data, like Twitter, Facebook and Yahoo, are all able to capitalize on the notion that data equates to power. More and more companies are using Hadoop to power Big Data analytics and drive revenue and profit. It’s all about your data.
  • Some examples of organizations that are delivering new value in the form of revenue growth, cost savings or entirely new business models. Yahoo: AS with Hive. Klout: AS with Hive (white paper). GE: Hive analytics. Yahoo! (Gartner BI Excellence Award winner) is driving growth for existing revenue streams: Yahoo! manages a powerful, scalable advertising exchange that includes publishers and advertisers. Advertisers want to get the most out of their investment by reaching their targeted audiences effectively and efficiently. Yahoo! needs visibility into how consumers are responding to ads along many dimensions (websites, creative, time of day, gender, age, location) to make the exchange work as efficiently and effectively as possible. Yahoo! doubled its revenue by allowing campaign managers to “tune” campaign targeting and creative. Yahoo! drove an increase in spending from advertisers since they got better performance by advertising through Yahoo!. Yahoo! TAO exposed customer segment performance to campaign managers and advertisers for the first time. Klout is creating new businesses and revenue streams: Klout’s mission is to help everyone understand and leverage their influence. Klout uses Big Data to unify the social web (consumers, brands, and partners) with social networking and activity, along with data to generate a Klout score and enable analysis, targeting, and social graphs. It helps consumers manage their “social brand.” It helps brands reach influencers at scale. It helps data partners enhance their services (customer loyalty, CRM, media and identity, and marketing). For example, the Palms uses Klout scores in addition to its normal customer rewards program to determine whether or not to upgrade customers to a better room during their stay. The Huffington Post uses Klout to help serve the best curated Twitter content. Klout case study: http://www.microsoft.com/casestudies/Microsoft-SQL-Server-2012-Enterprise/Klout/Data-Services-Firm-Uses-Microsoft-BI-and-Hadoop-to-Boost-Insight-into-Big-Data/710000000129 Case study on Thailand’s Department of Special Investigation: http://www.microsoft.com/casestudies/Microsoft-SQL-Server-2012-Enterprise/Department-of-Special-Investigation/Thai-Law-Enforcement-Agency-Optimizes-Investigations-with-Big-Data-Solution/710000001175 GE is driving operational efficiencies: GE is running several use cases on its Hadoop cluster while incorporating several disparate sources to produce results. Along with sentiment analysis, GE is running web analytics on its internal cloud structure and looking at load usage, user analytics, and failure mode analytics. GE built a recommendation engine for its intranet that suggests press releases users might be interested in, based on their function, user profiles, and prior visits to its site. GE is working with several types of remote monitoring and diagnostic data from its energy and wind businesses.
  • Business Users need data. There is a paradigm shift towards it, despite what the cartoon says.
  • Processing platform for Big Data processing, using the “Map-Reduce” processing paradigm. When people talk about Hadoop they are often talking about specific computational patterns including map reduce, which emerged as a method to process lots of unstructured data on top of a distributed storage system in a highly fault tolerant and embarrassingly scalable way. Hadoop allows us to store and process large amounts of data on commodity hardware. In the past you would spend large amounts of money on very specialized hardware. Today you can do this with off-the-shelf hardware running Hadoop. Now, Hadoop doesn’t have a monopoly on “big”, “real time” or “unstructured” but does provide some unique capabilities.
  • ACID: Atomicity, Consistency, Isolation, Durability
  • Assuming that the volumes of data are larger than those conventional relational database infrastructures can cope with, processing options break down broadly into a choice between massively parallel processing architectures (data warehouses or databases such as Greenplum) and Apache Hadoop-based solutions. This choice is often informed by the degree to which one of the other "Vs", variety, comes into play. Typically, data warehousing approaches involve predetermined schemas, suiting a regular and slowly evolving dataset. Apache Hadoop, on the other hand, places no conditions on the structure of the data it can process.
  • I see the real breakthrough insights coming through when you take what is the traditional "Business Intelligence" and add more capabilities like machine learning, predictive analysis, statistical analysis, large scale graph processing, pattern mining, trend analysis, economic modeling. All of which today are a reality in Hadoop. The implications of this are quite astounding when you think about it. This is huge.
  • Big Data, in terms of data volume, variability and velocity at scale, is the first problem. But Big Data solutions and technology by themselves don't solve business objectives. Businesses don't have a Hadoop problem; they have analytics, pattern mining, trend analysis, statistical inference, economic modeling and market regression problems. Data science starts where utility-class services like Big Data Hadoop end. The real opportunity is to expose data science to everyone. As powerful as Hadoop is, today it’s still more of a computer scientist’s or academically trained analyst’s tool than it is an enterprise analytics product. Hadoop itself is controlled through programming code rather than anything that looks like it was designed for business unit personnel. Hadoop data is often more “raw” and “wild” than data typically fed to data warehouse and OLAP (Online Analytical Processing) systems. This is where I and Microsoft see opportunity. Essentially, wouldn't it be cool if mere mortals could use this stuff and consume insights that are coming directly from Hadoop? Microsoft HDInsight enables you to gain insight from virtually any data, connect with the world of data, improve decision making, and enhance the development of the next generation of products and services. Nearly everyone in your organization can analyze and make more informed decisions with the right tools. PowerPivot for Microsoft Excel and Power View for SharePoint give nearly all users a view into structured and unstructured data. With the Hive Add-in for Excel and Hive ODBC Driver, almost anyone in your organization can directly access Hadoop data from end-user tools. Hadoop simplifies programming for developers with JavaScript for MapReduce jobs. The JavaScript implementation can also reduce your code by up to 10 times compared to Java.
  • The second thing I want to talk about is Hadoop and how Hadoop is set up to deliver breakthrough insights from your data. How many of you are familiar with Hadoop? How many of you are using Hadoop for projects today? How many are planning on using Hadoop in the next 12 months? How about in the cloud? When people talk about Hadoop they are often talking about specific computational patterns including map reduce, which emerged as a method to process lots of unstructured data on top of a distributed storage system in a highly fault tolerant and embarrassingly scalable way. Hadoop allows us to store and process large amounts of data on commodity hardware. In the past you would spend large amounts of money on very specialized hardware. Today you can do this with off-the-shelf hardware running Hadoop. Now, Hadoop doesn’t have a monopoly on “big”, “real time” or “unstructured” but does provide some unique capabilities.
  • There are other talks that will go into Big Data and Hadoop so we’ll only do a quick overview of that right now. We’ll spend most of our time on Hive.
  • Data democracy
  • Ask the audience first.
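
The notes above describe map reduce as a computational pattern over key-value pairs. As a rough illustration only, the pattern can be sketched in plain Python on a single machine. This is not Hadoop's API: the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) and the sample input are assumptions made up for this sketch, and a real cluster would distribute each phase across many nodes and add fault tolerance.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def map_phase(line: str) -> Iterator[tuple[str, int]]:
    """Map: emit a (word, 1) key-value pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Shuffle: group emitted values by key, standing in for the framework's
    grouping step between the map and reduce phases."""
    grouped: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    """Reduce: collapse all values for one key into a single total."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> dict[str, int]:
    """Run map, shuffle and reduce in sequence (hypothetical single-process driver)."""
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


if __name__ == "__main__":
    sample = ["big data is big", "hadoop is for big data"]  # made-up input
    print(word_count(sample))
```

Because each call to `map_phase` depends only on its own line, and each call to `reduce_phase` only on its own key, both phases could in principle run in parallel on separate machines, which is the property that lets this style of batch job spread across commodity hardware.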
