Finding business value in Big Data


I often hear from clients: “We don’t know much about Big Data – can you tell us what it is and how it can help our business?” Yes! The first step is this vendor-free presentation, where I start with a business-level discussion, not a technical one. Big Data is an opportunity to re-imagine our world, to track new signals that were once impossible, to change the way we experience our communities, our places of work and our personal lives. I will help you identify the business value opportunity from Big Data and how to operationalize it. Yes, we will cover the buzzwords: modern data warehouse, Hadoop, cloud, MPP, Internet of Things, and Data Lake, but I will show use cases to better understand them. In the end, I will give you the ammo to go to your manager and say “We need Big Data and here is why!” Because if you are not utilizing Big Data to help you make better business decisions, you can bet your competitors are.

Published in: Technology

  1. 1. Finding business value in Big Data “What exactly is Big Data and why should I care?” James Serra Big Data Evangelist Microsoft
  2. 2. Other Presentations • Building an Effective Data Warehouse Architecture: Reasons for building a DW and the various approaches and DW concepts (Kimball vs Inmon) • Building a Big Data Solution (Building an Effective Data Warehouse Architecture with Hadoop, the cloud and MPP): Explains what Big Data is, its benefits including use cases, and how Hadoop, the cloud, and MPP fit in • Finding business value in Big Data (What exactly is Big Data and why should I care?): Very similar to “Building a Big Data Solution” but the target audience is business users/CxO instead of architects • How does Microsoft solve Big Data? Covers the Microsoft products that can be used to create a Big Data solution • Modern Data Warehousing with the Microsoft Analytics Platform System: The next step in data warehouse performance is APS, an MPP appliance • Power BI, Azure ML, Azure HDInsight, Azure Data Factory, etc.: Deep dives into the various Microsoft Big Data related products
  3. 3. About Me • Business Intelligence Consultant, in IT for 28 years • Microsoft, Big Data Evangelist • Owner of Serra Consulting Services, specializing in end-to-end Business Intelligence and Data Warehouse solutions using the Microsoft BI stack • Worked as desktop/web/database developer, DBA, BI and DW architect and developer, MDM architect, PDW developer • Been perm, contractor, consultant, business owner • Presenter at PASS Business Analytics Conference and PASS Summit • MCSE for SQL Server 2012: Data Platform and BI • SME for SQL Server 2012 certs • Contributing writer for SQL Server Pro magazine • Blog at • SQL Server MVP • Author of book “Reporting with Microsoft SQL Server 2012”
  4. 4. I tried understanding Big Data… And ended up passed-out drunk in a Denny’s parking lot Let’s prevent that from happening…
  5. 5. Agenda • Overview of Big Data and Analytics • Use cases • Data Lake • Hadoop and its role • IoT and real-time data • Modern data warehouse • Federated querying • Data warehouse and the cloud • Symmetric Multiprocessing (SMP) vs. Massively Parallel Processing (MPP)
  6. 6. Overview of Big Data and Analytics
  7. 7. What differentiates today’s thriving organizations? Data.
  8. 8. What is Big Data, really? Data in all forms & sizes is being generated faster than ever before. Capture & combine it for new insights & better, faster decisions.
  9. 9. Harness the growing and changing nature of data Collect any data: structured, unstructured, streaming. The challenge is combining transactional data stored in relational databases with less structured data. Big Data = All Data. “Get the right information to the right people at the right time in the right format”
  10. 10. An illustration of the velocity of data created Kalakota, R. (2012, October 22). Sizing “Mobile + Social” Big Data Stats. Retrieved from
  11. 11. The three V’s
  12. 12. Technology innovation accelerates value [Diagram labels, plotted as Value against Innovation: Transactional systems, ETL, Operational reporting, Enterprise data warehouse, Spreadmarts, Siloed data, Complex implementations, OLAP, Dashboards, Ad hoc analysis, In-memory, Hadoop, Machine learning, Any data, Internet of Things]
  13. 13. Discover and connect Answering new questions Value
  14. 14. Put data to work for everyone in your organization Inspire innovation. Accelerate decision-making. Learn from & share insights.
  15. 15. Embrace Big Data across your business Sales: Improve revenue performance. HR: Maximize employee engagement. Marketing: Build deeper customer relationships. Finance: Impact your company’s bottom line. [Dashboard charts: Units Sold, Discounts, and Profit before Tax; Revenue and Target by Region; Departments Headcount; XT2000 Status List]
  16. 16. The Data Divide 80% of data stored. 70% of data generated by customers. 3% prepared for analysis. 0.5% being analyzed. <0.5% being operationalized. IDC says that right now, about 22% of data is useful. By 2020 that number will climb to 37%.
  17. 17. Major Fail Gartner: “Through 2017, 60% of big-data projects will fail to go beyond piloting and experimentation” Paradigm4: 76% of those who have used Hadoop or Apache Spark complained of significant limitations
  18. 18. Analytics Solution Capture and integrate data from multiple internal and external sources Derive insight from data with rich, interactive dashboards and reports using the tools you know Put insight into action to increase efficiency and constituent satisfaction
  19. 19. Advanced Analytics Defined
  20. 20. The end result of Big Data - Icing on the cake
  21. 21. Use Cases
  22. 22. Let’s set off light bulbs in your head
  23. 23. Data Analytics is needed everywhere Recommendation engines, Smart meter monitoring, Equipment monitoring, Advertising analysis, Life sciences research, Fraud detection, Healthcare outcomes, Weather forecasting for business planning, Oil & Gas exploration, Social network analysis, Churn analysis, Traffic flow optimization, IT infrastructure & Web App optimization, Legal discovery and document archiving, Intelligence Gathering, Location-based tracking & services, Pricing Analysis, Personalized Insurance
  24. 24. Personalized Insurance Personalized policies can reduce costs & better meet customer needs. Insurance companies can help (and some have already started helping) their customers with truly personalized insurance plans tailored to their needs and risks. Insurance companies can collect real-time data from in-car sensors and combine it with geolocation and in-house systems. With information such as distance and speed, they can provide personalized insurance offers based on driving amount, risk, and other factors, for a truly personalized plan that may often save drivers money. $1,600/yr.: US national avg. car insurance premium
  25. 25. Recommendation Engines Significantly improve up-sell and cross-sell opportunities. The vast and ever-growing amount of customer purchase, rating and click data can all be collected and managed with a Hadoop-based solution, to pinpoint preferences based on purchase history and demographics, and to serve useful and compelling cross-sell and up-sell recommendations. Retailers can use customer purchase & rating information to serve recommendations to current customers, based on similarities across many dimensions. 158 Items sold/second by on 11/29/2010 (Cyber Monday)
  26. 26. Pricing Analysis Significantly improve sales and customer satisfaction. Retailers – whether large, small, online or in-store – can improve margins with more detailed pricing analysis. When a customer is in range of a transaction (either in the store, online or perhaps passing by), offer personalized offers, real-time price quotes, or other frequent-buyer perks to help bring more customers to the store and improve repeat business. Retailers can use customer past purchase, preference, and demographic information to serve real-time custom pricing and instant discounts when customers are near the store. Up to 30%: Additional price Mac users accepted for travel from Orbitz
  27. 27. Using data from the Weather Channel, Walmart can create targeted ads based on local weather, products in their nearby stores, and seasonal consumer desires. Walmart increased berry and steak sales as much as threefold when weather-targeted ads were run
  28. 28. Using Big Data to determine the best train schedules
  29. 29. Data Lake
  30. 30. What is a data lake? A storage repository, usually Hadoop, that holds a vast amount of raw data in its native format until it is needed. • A place to store unlimited amounts of data in any format inexpensively • Allows collection of data that you may or may not use later: “just in case” • A way to describe any large data pool in which the schema and data requirements are not defined until the data is queried: “just in time” or “schema on read” • Complements the EDW and can be seen as a data source for the EDW – capturing all data but only passing relevant data to the EDW • Frees up expensive EDW resources (storage and processing), especially for data refinement • Allows for data exploration to be performed without waiting for the EDW team to model and load the data • Some processing is better done on Hadoop than in ETL tools like SSIS • Also called bit bucket, staging area, landing zone or enterprise data hub (Cloudera)
  31. 31. Current state of a data warehouse Traditional Approaches CRM ERP OLTP LOB DATA SOURCES ETL DATA WAREHOUSE Star schemas, views, other read-optimized structures BI AND ANALYTICS Emailed, centrally stored Excel reports and dashboards Well-manicured, often relational sources Known and expected data volume and formats Little to no change Complex, rigid transformations Required extensive monitoring Transformed historical data into read structures Flat, canned or multi-dimensional access to historical data Many reports, multiple versions of the truth 24 to 48h delay MONITORING AND TELEMETRY
  32. 32. Current state of a data warehouse Traditional Approaches CRM ERP OLTP LOB DATA SOURCES ETL DATA WAREHOUSE Star schemas, views, other read-optimized structures BI AND ANALYTICS Emailed, centrally stored Excel reports and dashboards Increase in variety of data sources Increase in data volume Increase in types of data Pressure on the ingestion engine Complex, rigid transformations can no longer keep pace Monitoring is abandoned Delay in data, inability to transform volumes, or react to new sources Repair, adjust and redesign ETL Reports become invalid or unusable Delay in preserved reports increases Users begin to “innovate” to relieve starvation MONITORING AND TELEMETRY INCREASING DATA VOLUME NON-RELATIONAL DATA INCREASE IN TIME STALE REPORTING
  33. 33. Data Lake Transformation (ELT not ETL) New Approaches All data sources are considered Leverages the power of on-prem technologies and the cloud for storage and capture Native formats, streaming data, big data Extract and load, no/minimal transform Storage of data in near-native format Orchestration becomes possible Streaming data accommodation becomes possible Refineries transform data on read Produce curated data sets to integrate with traditional warehouses Users discover published data sets/services using familiar tools CRM ERP OLTP LOB DATA SOURCES FUTURE DATA SOURCES NON-RELATIONAL DATA EXTRACT AND LOAD DATA LAKE DATA REFINERY PROCESS (TRANSFORM ON READ) Transform relevant data into data sets BI AND ANALYTICS Discover and consume predictive analytics, data sets and other reports OTHER REFINERY PROCESSES DATA WAREHOUSE Star schemas, views, other read-optimized structures
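To make “schema on read” and “transform on read” from the Data Lake slides concrete, here is a minimal Python sketch. It is an illustration only, not part of the original deck: the smart-meter records, the `read_with_schema` helper and the field names are all assumptions made up for the example. Raw records are stored untouched, and a schema is applied only when someone queries them.

```python
import json

# Raw events land in the "lake" exactly as produced; nothing is validated on write.
# (Hypothetical smart-meter data, invented for this sketch.)
raw_lake = [
    '{"device": "meter-1", "kwh": "3.2", "ts": "2015-06-01T10:00:00"}',
    '{"device": "meter-2", "kwh": "4.1", "ts": "2015-06-01T10:00:00", "fw": "1.4"}',
    'not valid json',
]

def read_with_schema(lines, schema):
    """Apply a schema at query time: keep the requested fields, cast types, skip bad rows."""
    for line in lines:
        try:
            record = json.loads(line)
            row = {field: cast(record[field]) for field, cast in schema.items()}
        except (ValueError, KeyError, TypeError):
            continue  # malformed or incomplete record: ignored by this particular query
        yield row

# The schema belongs to the query, not to the storage layer.
usage_schema = {"device": str, "kwh": float}
curated = list(read_with_schema(raw_lake, usage_schema))
total_kwh = sum(row["kwh"] for row in curated)
```

A different question (say, firmware versions) would simply pass a different schema over the same raw data, which is the “just in case” storage idea from slide 30.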
  34. 34. Hadoop and its role
  35. 35. What is Hadoop? • Distributed, scalable system on commodity HW • Composed of a few parts: • HDFS – Distributed file system • MapReduce – Programming model • Other tools: Hive, Pig, SQOOP, HCatalog, HBase, Flume, Mahout, YARN, Tez, Spark, Stinger, Oozie, ZooKeeper, Storm • Main players are Hortonworks, Cloudera, MapR • WARNING: Hadoop, while ideal for processing huge volumes of data, is inadequate for analyzing that data in real time (companies do batch analytics instead) [Diagram: Hadoop cluster of compute & storage nodes. Core Services, OPERATIONAL SERVICES and DATA SERVICES: HDFS, SQOOP, FLUME, NFS, LOAD & EXTRACT, WebHDFS, OOZIE, AMBARI, YARN, MAP REDUCE, HIVE & HCATALOG, PIG, HBASE, FALCON] Hadoop clusters provide scale-out storage and distributed data processing on commodity hardware
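The “MapReduce – Programming model” bullet can be illustrated with a tiny word count. This is a single-process Python sketch of the idea only, not Hadoop’s actual API; the function names `map_phase`, `shuffle` and `reduce_phase` are hypothetical. On a real cluster the map and reduce steps would run in parallel on many nodes, and the framework would perform the shuffle.

```python
from collections import defaultdict

def map_phase(document):
    """Map: emit a (key, value) pair for every word in one input document."""
    for word in document.lower().split():
        yield word, 1

def shuffle(pairs):
    """Shuffle: group all emitted values by key (done by the framework on a cluster)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reduce: collapse all values for one key into a single result."""
    return key, sum(values)

def word_count(documents):
    pairs = (pair for doc in documents for pair in map_phase(doc))
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())

counts = word_count(["big data is big", "data lake"])
```

Because each document is mapped independently and each key is reduced independently, the work can be spread over commodity machines, which is also why the model suits batch jobs better than real-time analysis.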
  36. 36. Hortonworks Data Platform 2.2 Simply put, Hortonworks ties all the open source products together (20)
  37. 37. The real cost of Hadoop
  38. 38. Use cases using Hadoop and a DW in combination Bringing islands of Hadoop data together Archiving data warehouse data to Hadoop (move) (Hadoop as cold storage) Exporting relational data to Hadoop (copy) (Hadoop as backup/DR, analysis, cloud use) Importing Hadoop data into data warehouse (copy) (Hadoop as staging area, sandbox, Data Lake)
  39. 39. IoT and real-time data
  40. 40. What is the Internet of Things? Connectivity Data Analytics Things IoT = sensor-acquired data
  41. 41. What is the Internet of Things (IoT)? Internet-connected devices that can perceive the environment in some way, share their data, and communicate with you. IoT is just a catch-all term for ways of using machine-generated data to create something useful. - Has a processor and sensor to collect information - Examples: heart monitoring implants, biochip transponders on farm animals, automobiles with built-in sensors, field operation devices that assist firefighters in search and rescue - Excludes computers, tablets, and smart phones - But it’s in the sphere of business intelligence that IoT will really make a difference. Cool possibilities - When a milk carton is almost empty it will ping you when you are near a store - An alarm clock that signals your coffee maker to start brewing when you wake up - An embedded chip that monitors your vital signs and notifies a medical provider if they exceed a limit Gartner: 10 billion devices connected to the internet today, 26B by 2020 At some point in the future, nearly every manmade object will contain a device that transmits data!
  42. 42. Modern Data Warehouse
  43. 43. Modern Data Warehouse Think about future needs: • Increasing data volumes • Real-time performance • New data sources and types • Cloud-born data • Multi-platform solution • Hybrid architecture
  44. 44. Modern Data Warehouse Defined
  45. 45. Modern Data Warehouse: The Dream
  46. 46. The Reality
  47. 47. Federated Querying
  48. 48. Federated Querying Other names: Data virtualization, logical data warehouse, data federation, virtual database, and decentralized data warehouse. A model that allows a single query to retrieve and combine data as it sits from multiple data sources, so as to not need to use ETL or learn more than one retrieval technology
  49. 49. Select… Result set Federated Querying Relational Data DB2 Oracle MongoDB SQL Server Query Model Non-Relational Data Cloudera CDH Linux Hortonworks HDP Windows Azure HDInsight EDW
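The federated querying idea, one query that combines data “as it sits” in several sources without an ETL step, can be sketched in a few lines of Python. This is a toy illustration under stated assumptions: an in-memory SQLite table stands in for a relational source, a list of dictionaries stands in for a non-relational store, and the `clicks_per_customer` function, table and field names are all hypothetical. Real data virtualization products push work down to each source rather than joining in application code.

```python
import sqlite3

# Relational source: an in-memory SQLite database standing in for SQL Server, Oracle, etc.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "Contoso"), (2, "Fabrikam")])

# Non-relational source: click events kept as documents, standing in for Hadoop or MongoDB.
click_events = [
    {"customer_id": 1, "page": "/pricing"},
    {"customer_id": 1, "page": "/signup"},
    {"customer_id": 2, "page": "/pricing"},
]

def clicks_per_customer(relational, documents):
    """One 'federated' query: read both sources in place and combine the results."""
    counts = {}
    for doc in documents:
        counts[doc["customer_id"]] = counts.get(doc["customer_id"], 0) + 1
    rows = relational.execute("SELECT id, name FROM customers ORDER BY id")
    return [(name, counts.get(customer_id, 0)) for customer_id, name in rows]

result = clicks_per_customer(conn, click_events)
```

The caller sees a single result set and never copies the click data into the relational store, which is the point of the model described on slide 48.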
  50. 50. DW and the Cloud
  51. 51. Can I use the cloud with my DW? • Public and private cloud • Cloud-born data vs on-prem born data • Transfer cost from/to cloud and on-prem • Sensitive data on-prem, non-sensitive in cloud • Look at hybrid solutions
  52. 52. TDWI Best Practices Report (2015)
  53. 53. SMP vs MPP
  54. 54. SMP vs MPP MPP - Massively Parallel Processing • Uses many separate CPUs running in parallel to execute a single program • Shared Nothing: Each CPU has its own memory and disk (scale-out) • Segments communicate using high-speed network between nodes SMP - Symmetric Multiprocessing • Multiple CPUs used to complete individual processes simultaneously • All CPUs share the same memory, disks, and network controllers (scale-up) • All SQL Server implementations up until now have been SMP • Mostly, the solution is housed on a shared SAN
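The “shared nothing” idea behind MPP can be sketched as follows: each node owns its own slice of the data, computes a partial result from that slice alone, and a coordinator combines the partials. This Python sketch is an assumption-laden illustration, not how any particular MPP appliance is implemented; the names `partition`, `local_aggregate` and `mpp_average` are hypothetical, and threads merely stand in for separate machines.

```python
from concurrent.futures import ThreadPoolExecutor

def partition(rows, node_count):
    """Distribute rows round-robin; each node owns its own slice (shared nothing)."""
    nodes = [[] for _ in range(node_count)]
    for index, row in enumerate(rows):
        nodes[index % node_count].append(row)
    return nodes

def local_aggregate(node_rows):
    """Work done independently on one node, using only that node's data."""
    return sum(node_rows), len(node_rows)

def mpp_average(rows, node_count=4):
    """Coordinator: fan the query out to every node, then combine the partial results."""
    nodes = partition(rows, node_count)
    with ThreadPoolExecutor(max_workers=node_count) as pool:
        partials = list(pool.map(local_aggregate, nodes))
    total = sum(partial_sum for partial_sum, _ in partials)
    count = sum(partial_count for _, partial_count in partials)
    return total / count

sales = [120.0, 80.0, 200.0, 50.0, 150.0]
average_sale = mpp_average(sales)
```

Adding nodes shrinks each slice (scale-out), whereas an SMP system would have all CPUs work against the same shared memory and disks and could only grow by buying a bigger machine (scale-up).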
  55. 55. DW SCALABILITY SPIDER CHART [Spider chart with axes: “Data Volume”, “Query Data Volume”, “Query Concurrency”, “Query complexity”, “Query Freedom”, “Data Freshness”, “Mixed Workload”, “Schema Sophistication”] MPP – Multidimensional Scalability SMP – Tunable in one dimension at the cost of other dimensions The spiderweb depicts important attributes to consider when evaluating Data Warehousing options. Big Data support is the newest dimension.
  56. 56. When do you need an MPP solution? • We need at least a 3x query performance improvement • We are near disk capacity and see a lot of growth in the upcoming years • We need to support queries during our maintenance window • We need to load data outside of our maintenance window • We will otherwise spend a lot of money on FusionIO cards, SSDs, more SAN space, more memory, faster CPUs, clustering
  57. 57. Big Data is coming
  58. 58. Summary • We live in an increasingly data-intensive world • Much of the data stored online and analyzed today is more varied than the data stored in recent years • More of our data arrives in near-real time This presents a large business opportunity. Are you ready for it?
  59. 59. Resources • The Modern Data Warehouse: • Fast Track Data Warehouse Reference Architecture for SQL Server 2014: • Should you move your data to the cloud? • Presentation slides for Modern Data Warehousing: • Presentation slides for Building an Effective Data Warehouse Architecture: • Hadoop and Data Warehouses: • What is the Microsoft Analytics Platform System (APS)? • Parallel Data Warehouse (PDW) benefits made simple: • What is Advanced Analytics? • Azure Data Lake
  60. 60. Q & A ? James Serra, Big Data Evangelist Email me at: Follow me at: @JamesSerra Link to me at: Visit my blog at: (where this slide deck is posted under “Presentations”)