Big Data a big deal?


Why all the fuss over Big Data? And why now?

What CIOs and CEOs should understand about Big Data and how it may impact their business.

Published in: Technology

Big Data a big deal?

  1. 1. BIG DATA…A BIG DEAL? Organized by: Andrew Waitman
  2. 2. Big Data, Small Sound Bytes © 2009/2012 Pythian © All Rights Reserved
  3. 3. Big Data, Small Sound Bytes
  4. 4. Big Data, Small Sound Bytes
  5. 5. Why Big Data Now? VOLUME
      1. All online digital activity creates artifacts, or metadata. At terabyte-to-petabyte volumes and beyond, this is what is being called Big Data.
      2. Unstructured metadata is collected whenever digital activity occurs.
      3. Digital metadata volume has exploded with growing internet usage, and has accelerated as smartphone and iPad usage drives global mobile and social activity.
  6. 6. Why Big Data Now? HUMAN VOLUME
      1. In 1998, Google served 3.6 million searches in the year.
      2. In 2011, Google ran 1,722,071,000,000 searches in the year.
      3. In August 2008, there were 100 million Facebook users.
      4. In December 2012, there will be over 1 billion Facebook users.
      5. In August 2012, Twitter reached over 500,000,000 users.
      The digital volume of online user metadata has exploded with growing internet, mobile and social use.
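The growth implied by the Google figures can be made concrete with a little arithmetic. The sketch below is illustrative only: the helper names `growth_multiple` and `compound_annual_growth` are hypothetical, and the search counts are taken from the slide at face value.

```python
def growth_multiple(start, end):
    """How many times larger `end` is than `start`."""
    return end / start


def compound_annual_growth(start, end, years):
    """Constant yearly growth rate that turns `start` into `end` over `years`."""
    return (end / start) ** (1 / years) - 1


# Figures quoted on the slide (taken at face value, not independently verified).
google_1998 = 3_600_000
google_2011 = 1_722_071_000_000

multiple = growth_multiple(google_1998, google_2011)
rate = compound_annual_growth(google_1998, google_2011, years=2011 - 1998)

print(f"Searches grew about {multiple:,.0f}x between 1998 and 2011")
print(f"Implied compound annual growth: {rate:.0%}")
```

The same two functions can be applied to the Facebook user counts or any other pair of figures in the deck.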
  7. 7. Why Big Data Now? DEVICE VOLUME
      1. In 2005, there were 1.5 billion RFID tags.
      2. In 2012, there are 30 billion RFID tags.
      3. 350 billion smart meter transactions per year.
      4. 1 billion smartphones with location sensors by 2015.
      Digital sensor data volume has exploded with growing machine usage of sensor and measurement reporting.
  8. 8. Why Big Data Now? ZEITGEIST
      1. Data-driven decision making is mainstream thinking. Think Moneyball by Michael Lewis.
      2. Google demonstrated the value and importance of mining "Big Data" for search, ad placement, language translation and a myriad of other computing challenges with economic benefit.
      3. Data trumps smarter algorithms. It is the dawning of the age of real-time and near-real-time BIG impact analytics.
  9. 9. Why Big Data Now? ECONOMICS
      1. Collection and analysis of large volumes of metadata is now relatively simple, low cost and potentially highly valuable.
      2. Storage and computing power are relatively low cost, enabling the mining of massive metadata volumes in real time, near real time or later.
      3. The economic benefit or value of the insights can far exceed the costs of acquiring and storing the data.
      4. Big Data infrastructure tools have become simpler and more accessible.
  10. 10. Purpose of Data Analysis
      Analysis of data is required to understand (a) why consumers purchase a particular product, (b) how consumers purchase the product, (c) the demographics and psychographics of the purchaser of the product and (d) the ultimate user of the product.
  11. 11. An Alternative Perspective
      AVOID CONFUSING ABUNDANCE WITH INSIGHT
      "Big Data is just the new rallying cry for the same old stuff BI companies have been producing all along." - Stephen Few, Perceptual Edge
      This seems obvious, but almost no attention is being given to building the skills and technologies that help us glean insights from data more effectively. As Richards J. Heuer, Jr. argued in the Psychology of Intelligence Analysis (1999), the primary failures of analysis are less due to insufficient data than to flawed thinking. To succeed analytically, we must invest a great deal more of our resources in training people to think effectively and we must equip them with tools that support that effort. Heuer spent 45 years supporting the work of the CIA. Identifying a potential terrorist plot requires an analyst to sift through a lot of data (perhaps Big Data), but more importantly, it relies on their ability to connect the dots. Contrary to Heuer's emphasis on thinking skills, big data is merely about more, more, more; not smarter or better.
  12. 12. Is Big Data really new? NO
      What is new is that access to insights now comes at a cost, and with tools, available to almost anyone. Saving all data is now economically viable for everyone.
      Large public and private sector (Global 2000) enterprises have always generated, stored, processed and analyzed large volumes and a variety of structured and unstructured data:
      1. Particle physics research: the Large Hadron Collider generates 1 petabyte per second.
      2. Oil exploration: seismic sensor data.
      3. Bioinformatics: the Human Genome Project.
  13. 13. BIG DATA VS TRADITIONAL DATA
      BIG DATA
      • Petabytes at 1/10th the cost of pre-engineered storage
      • Semi-structured
      • Variety of sources
      • Store everything
      • Raw data
      • No data model/schema
      • Parallelize to handle volume
      • Simplicity at design/architecture stage
      • Complexity at insight stage
      TRADITIONAL DATA
      • Gigabytes to terabytes
      • SQL, structured
      • Engineered systems
      • Data model/schema
      • Selected data stored
      • Complexity at design/architecture stage
      • Simplicity at usage stage
      • Majority of $$ investment up front
  14. 14. Big Data is BI at Scale
      PHASE 1: Capture & Store
      • Petabyte scale
      • 300 Terabytes/Rack
      PHASE 2: Speculate and Investigate
      • Data Science
      • Analytics
      • MAP-R
      PHASE 3: Exploit Insights
      • Real Time
      • NRT Decisions
  15. 15. Big Data Phase 1 - Capture & Store
      Is the value of potential insights much greater than the cost of searching for them?
      BUSINESS QUESTIONS
      • What types of semi-structured data do you plan to store, and how?
      • What questions are you attempting to answer?
      • What data analysis is currently being done?
      • What are people asking questions about?
      • What disaster recovery (DR)? What compression? What storage is possible? Flash vs disk? What capacity, and how fast must access be?
      • How many people can access the data simultaneously?
      • KNOW THE DATA? SOURCE? RATE OF GENERATION?
  16. 16. Big Data Phase 1 - Capture & Store
      Is the value of potential insights much greater than the cost of searching for them?
      STORAGE REQUIREMENTS
      • Be scalable
      • Provide tiered storage
      • Be self managing
      • Ensure content is highly available
      • Ensure content is widely accessible
      • Support both analytical and content applications
      • Support workflow automation
      • Integrate with legacy applications
      • Enable integration with public, private and hybrid cloud ecosystems
      • Be self healing
  17. 17. Big Data Phase 2 - Speculate and Investigate
      Is the value of potential insights much greater than the cost of searching for them?
      BUSINESS QUESTIONS
      • What type of semi-structured data do I have?
      • What type of questions am I trying to answer?
      • Statistical? Correlation? Causal? Patterns?
      • How do I need to manipulate, translate, transform, cleanse, organize and visualize the data?
      • How much time do I have for analysis?
      • What tools do I have to perform transformation and analysis?
  18. 18. Big Data Phase 3 - Exploit Insights
      Is the value of potential insights much greater than the cost of searching for them?
      BUSINESS QUESTIONS
      • Are discovered patterns/insights available in real time, near real time or further out?
      • How do I systematically find patterns/insights going forward?
      • How do I integrate them into business-impacting decision processes?
  19. 19. Top 10 Reasons Why all the Hype around Big Data now?
      1. At tera and peta bytes it really does get interesting.
      2. All the Cool Kids are doing it. Once the Four Digerati Horsemen (Google, Facebook, Twitter, Amazon) say it's important, then it really is.
      3. BI folks needed a new marketing moniker.
      4. 'CLOUD' hype was already annoying and slowing.
      5. Gartner says it's near its peak!
      6. The term went viral!
      7. People thought you said Big Deal!
      8. Voluminous data could not be pronounced.
      9. User data mining is next to voyeurism.
      10. It's Google's Vault!
  20. 20. What is considered Big Data? VOLUME & VARIETY
      1. Any data stored digitally and at scale (terabytes and up) with potential for providing practical, useful insights, potentially with economic benefits.
      2. Very large volumes of unstructured information/data.
      3. Big Data is characterized by the volume, velocity and variety of large data sets.
      Every "connected" person or "connected" device is potentially a data generator.
  21. 21. What is considered Big Data? DIFFICULT & TIMELY
      1. Big Data, by the nature of its volume, hides or obscures valuable insights: a lot of noise, with critical and potentially valuable signals buried within.
      2. The value of a signal often perishes rapidly, requiring real-time or near-real-time analysis and action.
      Big Data is the quintessential signal vs noise problem.
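One simple way to picture the signal vs noise problem is flagging values that sit far from the rest of a series. The sketch below uses a plain z-score test; the function name `find_signals`, the threshold of 2.5 and the sample readings are all assumptions made for illustration, and real pipelines would typically use more robust methods.

```python
from statistics import mean, pstdev


def find_signals(readings, threshold=2.5):
    """Return indexes of readings more than `threshold` standard deviations from the mean."""
    mu = mean(readings)
    sigma = pstdev(readings)
    if sigma == 0:
        return []  # all readings identical: nothing stands out
    return [i for i, x in enumerate(readings) if abs(x - mu) / sigma > threshold]


# Hypothetical per-minute event counts; one value is an obvious spike.
readings = [10, 11, 9, 10, 12, 10, 11, 95, 10, 9]
print(find_signals(readings))
```

Because signal value perishes quickly, a check like this would in practice run over a sliding window of recent data rather than over a complete history.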
  22. 22. Examples of Big Data?
      • Local/regional weather information
      • Web traffic information
      • User search behavior
      • Social information: who connected to whom, who poked whom, etc.
      • Mobile user information: preferences, likes, habits
      • Application usage information
      • E-commerce transaction information
      • Physical retail customer transaction data
  23. 23. Who are the Top 15 'Big Data' 'Players'?
      1. Google
      2. Amazon
      3. Apple
      4. Yahoo
      5. Facebook
      6. Salesforce
      7. Twitter
      8. Cloudera
      9. LinkedIn
      10. Netflix
      11. Microsoft
      12. IBM
      13. Hortonworks
      14. Zynga
      15. eBay
  25. 25. What is the size of the BIG DATA Market?
      • Deloitte pegs the size of the big data market at about $1.3-$1.5 billion in 2012.
      • In March, IDC released a statement predicting that the worldwide big data technology services market would reach $16.9 billion in 2015.
      • The 2012 global BI software market is $35 billion.
  26. 26. Where do BI and Big Data coexist? PREDICTIVE ANALYTICS
  27. 27. How do Machine Learning and Big Data relate? PREDICTIVE ANALYTICS
  28. 28. When is Big Data valuable?
      1. When better business decisions result from practical insights provided by data that were unavailable to expert judgment or unknown to experts.
      2. When time-to-insight results in big returns or benefits, e.g. real-time book recommendations.
      3. Where precision of analysis results in specific alternative decisions.
      4. Where patterns from heterogeneous or seemingly disparate data sources provide material insights or advantage over the competition.
  29. 29. What is unique about Big Data Technology?
      MASSIVE PARALLELISM, AFFORDABLE HARDWARE, LOCAL PROCESSING
      1. The tools do not require the data to be first structured in a particular schema, as is required in relational databases.
      2. Data is analyzed in its native format, close to where it is stored, dramatically reducing the time and effort for retrieval and restore.
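The two points on this slide, no up-front schema and processing close to the data, can be sketched in a few lines. This is a toy illustration and not how any particular product works: the partitions stand in for data held on separate nodes, a thread pool stands in for a cluster, and `PARTITIONS`, `map_partition` and `count_actions` are hypothetical names.

```python
import json
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Hypothetical raw log lines, split into partitions as if stored on separate nodes.
# No schema is enforced when the data is written: fields vary and one line is malformed.
PARTITIONS = [
    ['{"user": "a", "action": "search"}', '{"user": "b", "action": "click"}', "not json at all"],
    ['{"user": "a", "action": "click", "ref": "ad"}', '{"action": "search"}'],
]


def map_partition(lines):
    """Interpret raw lines at read time and count actions within one partition."""
    counts = Counter()
    for line in lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip lines that cannot be parsed
        if isinstance(record, dict) and record.get("action"):
            counts[record["action"]] += 1
    return counts


def count_actions(partitions):
    """Process every partition in parallel, then merge the small partial results."""
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(map_partition, partitions))
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total


print(dict(count_actions(PARTITIONS)))
```

Only the per-partition counts travel to the merge step, which is the point of local processing: the bulky raw data stays where it is stored.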
  30. 30. Visualization may unlock the key to Big Data Insights
  31. 31. What skills do I need in my organization for Big Data?
      1. Data scientists
         • Identify what analysis makes sense in context. Typical background in math and statistics, as well as artificial intelligence and natural language processing.
      2. Data architects
         • Create data models and identify required data sources and analytical tools.
      3. Data visualizers
         • Use visualizations to explore what the data means and to present how it will impact the company.
      4. Data change agents
         • Good communicators with a Six Sigma background who understand how to apply statistics.
  32. 32. What skills do I need in my organization for Big Data?
      5. Data engineers/operators
         • Big Data infrastructure operations. Develop architecture that helps analyze and supply data in the way the business needs, and make sure systems are performing smoothly.
      6. Data stewards
         • Ensure that data sources are properly accounted for, and may also maintain a centralized repository as part of a Master Data Management approach, in which there is one "gold copy" of enterprise data to be referenced.
      7. Data virtualization/cloud specialists
         • Build and maintain a virtualized data service layer that can draw data from any source and make it available across organizations in a consistent, easy-to-access manner.
      8. Systems administrators
  33. 33. Six Steps to Big Data alchemy?
      1. Select the right data sets
         • Identify rich data sources which may contain insights into a particular problem you are trying to solve or insight you are trying to gain. Social media data is providing incredible insights into changes in brand positioning and new product introductions.
      2. Join the various sets of data
         • Combine rich, unstructured and sometimes incomplete data into a new set for manipulation and analysis.
      3. Clean the new large data set
         • Begin to discover important and relevant patterns, signatures, anomalies, correlations and outliers using advanced analytic models.
      4. Create models
         • These models predict outcomes using the data. Iterate on your hypothesis and keep experimenting.
      5. Use visualization tools
         • Visualization may assist in discovery or presentation of key insights from the data.
      6. Iterate
         • Keep varying your models and data sets to assist future planning or decision making.
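The six steps can be walked through end to end on a toy example. Everything below is an assumption made for illustration: the two tiny data sets (daily social mentions and units sold), the function names, and the choice of a straight-line least-squares fit as the "model". It is a sketch of the workflow, not a recommended toolchain.

```python
# Step 1: select data sets (both are hypothetical, keyed by day number).
mentions = {1: 10, 2: 20, 3: 30, 4: None, 5: 50}   # social media mentions
sales = {1: 25, 2: 45, 3: 65, 4: 80, 6: 130}       # units sold


def join(left, right):
    """Step 2: keep only the days present in both data sets."""
    return [(left[day], right[day]) for day in sorted(left.keys() & right.keys())]


def clean(pairs):
    """Step 3: drop incomplete records."""
    return [(x, y) for x, y in pairs if x is not None and y is not None]


def fit_line(pairs):
    """Step 4: ordinary least squares fit of y = intercept + slope * x."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def show(pairs):
    """Step 5: a crude text chart, one row per record."""
    for x, y in pairs:
        print(f"mentions={x:>3} | {'#' * (y // 5)} {y}")


pairs = clean(join(mentions, sales))
intercept, slope = fit_line(pairs)
show(pairs)
print(f"sales ~ {intercept:.1f} + {slope:.1f} * mentions")
# Step 6: iterate by changing the inputs or the model and running again.
```

Note how the join and clean steps shrink five days of input to three usable records; on real data, most of the effort tends to go into exactly these steps.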
  34. 34. How is Big Data providing Value today?
      • Online media and social sites mine user-behavior Big Data for what interests whom, when, why and how. Big Data gives sites insight into what people are interested in, with whom they share that information, and how long they stay engaged online.
      • Online retailers mine Big Data to predict consumers' buying behavior and purchase preferences, and to identify high-impact offers that drive up total spend per session.
      • Insurance companies mining Big Data can improve their overall performance by facilitating greater pricing accuracy, deeper relationships with customers, and more effective and efficient loss prevention.
  35. 35. How can Pythian help you with Big Data?
      1. First, get informed.
      2. Second, get started. Recognize an opportunity for competitive advantage within your company.
      3. Third, get the right team of people involved. Organize an internal task force to drive the Big Data initiative. Don't forget to find the critical Data Scientist: the person who will understand the data sources and know what questions to pose.
      4. Fourth, identify the key sources of Big Data, both external and internal.
      5. Fifth, with Pythian's assistance, evaluate the tools and technology that will help your Big Data program.
  36. 36. Key Questions for Executives
      • What does the data say?
      • Where did the data come from?
      • Has the data been sufficiently cleaned?
      • How was the data analyzed?
      • How confident can we be in our analysis?
      • Can we distinguish correlation from causality?
      • How much will the data influence the key decision makers?
  37. 37. A compelling balanced perspective on Big Data: Stephen Few, Perceptual Edge
  38. 38. Archive Slides
  39. 39. Big Data Start-ups
      • WeatherBill (which compiles large amounts of weather data from a variety of sources, then sells insurance based on statistical analysis), Klout (a controversial startup that processes large amounts of data to create every user's social influence score) and Wonga (which crunches data to grant financial loans) are some early examples of startups with big data as their core DNA.
      • Tokutek Inc. (president and CEO John Partridge) is a Lexington company founded in 2006 that makes databases run faster.
      • Trifacta raised $4.3 million from Accel's Big Data fund for a solution that doesn't just visualize insight, but also the analytics tools that produce it.
      • Platfora is a software company based in San Mateo, California, building a revolutionary BI and analytics platform that democratizes and simplifies use of big data and Hadoop. The company was founded by Ben Werther, former product head of Greenplum, an analytical database company acquired by EMC. Platfora is assembling a superb team of data and distributed systems architects/engineers, UI and UX developers, and data scientists.
  40. 40. Big Data Start-ups
      • MapR Technologies delivers on the promise of Hadoop, making managing and analyzing Big Data a reality for more business users. MapR enables customers to harness the power of Big Data analytics. Leading companies including Amazon, Cisco, EMC and Google partner with MapR to deliver an enterprise-grade Hadoop solution. Investors include Lightspeed Venture Partners, NEA and Redpoint Ventures.
      • Alteryx provides indispensable analytic solutions for enterprise and SMB companies making critical decisions about how to expand and grow. Its product, Alteryx Strategic Analytics, is a desktop-to-cloud Agile BI and analytics solution designed for data artisans and business leaders that brings together the market knowledge, location insight, and business intelligence today's organizations require. For more than a decade, Alteryx has enabled strategic planning executives to identify and seize market opportunities, outsmart their competitors, and drive more revenue.