
Wake up and smell the data


Big data is a big part of the disruption hitting this market, but not in the way most people think. It's not replacing the data warehouse, but it is changing the technology stack. It doesn't eliminate data management, but it does redefine enterprise data architecture. Big data is and isn't many things. It's important to understand which information uses are well supported and which have yet to be addressed. Otherwise you risk replacing one set of problems with another. Come to this session to hear some observations on what big data is, isn't and aspires to be.
A video is available; the talk starts at 1:03 into this Strata online event:

Published in: Technology, Business

  1. Wake Up and Smell the Data
     February 2013, Mark
  2. Caveat
     The focus of this talk is on information processing and delivery, leaving out many aspects of big data in the automation / execution sense.
  3. Big Data, Big Hype
     $876 Gajillion (analyst estimates of the big data market)
  4. We’ve been here before
     Bill Schmarzo, EMC
  5. Big Data, Big Nonsense
     Big data is subjective, based on bigness at a point in time? McKinsey focused on the least interesting aspect of big data.
     Source: McKinsey
  6. Data volume is the oldest, easiest problem
     Image courtesy of Teradata
  7. Technology Capability and Data Volume
     Source: Noumenal, Inc.
  8. Origin of BI and data warehouse concepts
     The general concept of a separate architecture for BI has been around longer, but this paper by Devlin and Murphy is the first formal data warehouse architecture and definition published.
     “An architecture for a business and information system”, B. A. Devlin, P. T. Murphy, IBM Systems Journal, Vol. 27, No. 1 (1988)
     Copyright Third Nature, Inc.
  9. Our ideas about information and how it’s used are outdated.
  10. Metadata catalog
  11. Report
  12. Report library
  13. BI is using broken metaphors
      We think of BI as publishing, which it isn’t.
  14. When you first give people access to information that was unavailable…
      OH GOD I can see into forever
  15. After a while the response is more measured
  16. User autonomy is a tradeoff
      Autonomy is a tradeoff in most data warehouses: control at the expense of complexity. Complexity for casual users can lead to messes. So we err on the side of simplifying user access in three ways…
  17. Centralize: that solves all problems!
      Creates bottlenecks. Causes scale problems. Enforces a single model.
      In some organizations and areas of business “data warehouse” is a bad word.
  18. Standardize: it’s simpler for everyone
  19. The “E” in EDW was a lie…
  20. Measurement started with the convenient data
      The convenient data is transactional data.
      ▪ Goes in the DW and is used, even if it isn’t the right measurement.
      The difficult and misleading data is declarative data.
      ▪ What people say and what they do require ground truth.
      The inconvenient data is observational data.
      ▪ It’s not neat, clean, or designed into most systems of operation.
      We need to build data systems that integrate all three.
  21. Value: There’s a pony in there somewhere
  22. Many current views miss the point
      Using Big Data
  23. It’s not about “big”
      And “big” is often not as big as you think it is.
  24. It’s not really about data, either
      If there’s no process for applying information in a specific context then you are producing expensive trivia.
  25. Two keys to making big data worthwhile
      Value: Goal → solution, not solution → goal.
      Actionability: Simple “value” isn’t enough. Information has to be actionable, somehow.
  26. Planning data strategy means understanding the context of data use so we can provide infrastructure
      Diagram labels: Monitor, Analyze exceptions, Analyze causes, Decide, Act, No problem, No idea, Do nothing.
      We need to focus on what people do with data as the primary task, not on the data or the technology.
  27. General model for organizational use of data
      Diagram labels: Collect new data, Monitor, Analyze exceptions, Analyze causes, Decide, Act, No problem, No idea, Do nothing.
      Act on the process: usually days/longer timeframe.
      Act within the process: usually real-time to daily.
  28. You need to be able to support both paths
      Diagram labels: Collect new data, Monitor, Analyze exceptions, Analyze causes, Decide, Act.
      Act on the process. Act within the process. Conventional BI. Causal analysis, i.e. “data science”.
  29. How do you manage the business in today’s environment?
      Our simplistic notions of BI with stable models, ordered data and predictability are being replaced by concepts from decision support and complex adaptive systems (CAS).
      ▪ Simple. Assumption: Order. Cause and effect is repeatable & predictable. Known. Standard processes, clear metrics, best practice. Sense, categorize, respond. Reporting, dashboards.
      ▪ Complicated. Assumption: Unorder. Cause and effect is separated in time & space, repeatable, learnable. Knowable. Analytical techniques to determine options, effects. Sense, analyze, respond. Ad-hoc, OLAP, exploration.
      ▪ Complex. Assumption: Disorder. Cause and effect is coherent in retrospect only, modelable but changing. Unpredictable. Experiment to create possible options. Test, sense, respond. Data science, causal analysis.
      Situational context governs data use.
  30. BI/DW environment support varies for these contexts
      ▪ Basic BI: handles this really well (most of the time). Assumption: Order. Cause and effect is repeatable & predictable. Known. Standard processes, clear metrics, best practice. Sense, categorize, respond. Reporting, dashboards.
      ▪ Analysis: handles this sort of ok, sometimes. Assumption: Unorder. Cause and effect is separated in time & space, repeatable, learnable. Knowable. Analytical techniques to determine options, effects. Sense, analyze, respond. Ad-hoc, OLAP, data discovery.
      ▪ Data science, analytics: this, not so much. Assumption: Disorder. Cause and effect is coherent in retrospect only, modelable but changing. Unpredictable. Experiment to create possible options, test hypotheses. Test, sense, respond. Causal analysis, simulation.
  31. TANSTAAFL
      Technologies are not perfect replacements for one another. When replacing the old with the new (or ignoring the new over the old) you always make tradeoffs, and usually you won’t see them for a long time.
  32. The usage models for conventional BI
      Diagram labels: Collect new data, Monitor, Analyze exceptions, Analyze causes, Decide, Act, No problem, No idea, Do nothing.
      Act on the process: usually days/longer timeframe.
      Act within the process: usually real-time to daily.
      This is what we’ve been doing with BI so far: static reporting, dashboards, ad-hoc query, OLAP.
  33. The usage models for analytics and “big data”
      Diagram labels: Collect new data, Monitor, Analyze exceptions, Analyze causes, Decide, Act, No problem, No idea, Do nothing.
      Act on the process: usually days/longer timeframe.
      Act within the process: usually real-time to daily.
      Analytics and big data are focused on new use cases: deeper analysis, causes, prediction, optimizing decisions. This isn’t ad-hoc, reporting, or OLAP.
  34. Analytics embiggens the data volume problem
      Many of the processing problems are O(n²) or worse, so moderate data can be a problem for DB-based platforms.
  35. New and growing use cases drive the need to expand
      The use cases are now interactive applications, lower latency data, complex analytics and discovery rather than reporting.
  36. Big Data Shift in a Nutshell
      The old model for data
      ▪ Centralized publishing
      ▪ Read only
      ▪ Integrate before use
      ▪ Record only important data
      ▪ Retrieval-focused
      ▪ Single method of access
      ▪ Human-level latency
      The new model for data
      ▪ Community creation
      ▪ Read-write
      ▪ Integrate at time of use
      ▪ Record all the data
      ▪ Processing-focused
      ▪ Multiple methods of access
      ▪ Machine-level latency
      It’s an architectural reconfiguration, just like web 2.0.
  37. “The future, according to some scientists, will be exactly like the past, only far more expensive.” ~ John Sladek
  38. About the Presenter
      Mark Madsen is president of Third Nature, a research and advisory firm focused on analytics, business intelligence and data management. Mark is an award-winning author, architect and CTO whose work has been featured in numerous industry publications. Over the past ten years Mark received awards for his work from the American Productivity & Quality Center, TDWI, and the Smithsonian Institute. He is an international speaker, a contributor at Forbes Online and Information Management. For more information or to contact Mark, follow @markmadsen on Twitter or visit
  39. About Third Nature
      Third Nature is a research and consulting firm focused on new and emerging technology and practices in analytics, business intelligence, and performance management. If your question is related to data, analytics, information strategy and technology infrastructure then you’re at the right place.
      Our goal is to help companies take advantage of information-driven management practices and applications. We offer education, consulting and research services to support business and IT organizations as well as technology vendors.
      We fill the gap between what the industry analyst firms cover and what IT needs. We specialize in product and technology analysis, so we look at emerging technologies and markets, evaluating technology and how it is applied rather than vendor market positions.
  40. CC Image Attributions
      Thanks to the people who supplied the creative commons licensed images used in this presentation:
      Outdated gumshoe.jpg – catalog – of hours manuscript2.jpg ‐ library san lorenzo.jpg ‐ ‐ in field.jpg ‐ ‐
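
Slide 34 says that many analytic processing problems are O(n²) or worse, so even moderate data can be a problem. The sketch below is not from the talk. It is a minimal illustration, assuming an all-pairs workload such as naive duplicate detection or similarity scoring. The function name `pairwise_comparisons` and the row counts are hypothetical, chosen only to show the growth rate.

```python
def pairwise_comparisons(n: int) -> int:
    """Number of distinct record pairs among n records: n * (n - 1) / 2."""
    return n * (n - 1) // 2


if __name__ == "__main__":
    # Hypothetical row counts, chosen only to show how quickly the work grows.
    for rows in (1_000, 100_000, 10_000_000):
        pairs = pairwise_comparisons(rows)
        print(f"{rows:>12,} rows -> {pairs:>22,} comparisons")
```

Growing the row count by a factor of 100 grows the comparison count by roughly a factor of 10,000. A data set that is modest by storage standards can therefore still overwhelm a quadratic workload.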