Invited talk of Daragh O'Brien, Managing Director of Castlebridge Associates, at the European Data Forum 2013, 9 April 2013 in Dublin, Ireland: The Story of Maturity – How data in Business needs to pass the ‘So What’ tests
Tom gives the example of his early work in telecoms billing data. The emphasis was on sample bias, but the actual measurement error in the process (the data quality issues) was an order of magnitude greater than the error due to sample bias.
HISTORY
Or: How we came to have all this data anyway…
Ancient Sumeria
• Written in Akkadian
• Used pictographic representations of information and concepts baked/carved into tablets made of clay (high sand content)
Filing: The Birth of Big Data
(Image by Nic McPhee @ commons.wikimedia.com)
6 thousand years: Tablets to Tablets
Physical Data (5925 years approx.)
Electronic Data (c. 75 years)
• More information processed
• Information processed faster
• More ‘self service’ data processing
• Changed expectations of data and processing
Where is Big Data?
Maturity levels, highest to lowest (Crosby stage / maturity model level):
• Certainty / Optimising
• Wisdom / Managed
• Enlightenment / Defined
• Awakening / Repeatable
• Uncertainty / Initial
(Overlaying Crosby CMM model with DMBOK Maturity model)
Maturity: Answering So What Questions
So What…
• …is it?
• …problems will it solve?
• …will we be able to do differently?
• …legal / regulatory risks does all this pose?
• …do we need to do to tap this gold mine?
• …are we not doing today that this will enable?
• …are we not doing today that this will make worse?
Organisations don't manage data well
• Information Governance / Data Governance only now emerging as formal disciplines
• Information Quality / Data Quality also only beginning to be coherently tackled in many organisations
• Phone companies still get bills wrong
• Data Protection breaches still occur (note: this is more than just SECURITY breaches)
• Data Migrations, CRM, ERP still fail
• Metadata largely under-managed
Bottom Line Impact
• 88% (Deloitte): % of Risk Managers who see Information as “Significant” in their Risk Management plans
• 84% (Bloor): % of Data Migrations that FAIL (don't deliver, overrun time/budget, deliver reduced functionality)
• 75% (Forrester): % of Chief Financial Officers who see Information Management as a barrier to achieving Business goals
• 35% (Gartner): Estimated % of TURNOVER wasted by companies due to poor information quality
• 30% (IBM): Time lost to organisations from staff rechecking information
This is when dealing with “traditional” structured/semi-structured data.
“So far, for 50 years, the information revolution has centered on data: their collection, storage, transmission, analysis, and presentation. It has centered on the "T" in IT. The next information revolution asks, what is the MEANING of information, and what is its PURPOSE?”
Peter Drucker, Forbes ASAP, August 1998
Data Is the New Oil
[Images: Oil Slick; Water. Pic: US Coast Guard. Picture from NASA]
A REAL EXAMPLE
Names have been changed to protect the innocent (and the guilty)
The Pending Order Crisis of 2006
• If an order is not completed, it cannot be billed
• “OMG! There's MILLIONS of unbilled revenue out there. This is a CRISIS!!!”
• “The Sky is FALLING”
The Pending Orders Solution 2006
Elite Specialist Information Quality Agent, licensed to “Fix the Data by all means necessary” (firearms not actually used…)
What the investigation found:
• Orders could have multiple dependent products, which were double counted
• Orders for infrastructure had engineering statuses
• Revenue Assurance did not look at all relevant data sources
• Dependencies between process steps were not understood
There wasn't a Crisis situation: the Revenue Assurance hypothesis was flawed
• External factors affected order completion times
• Intra-order product dependencies led to double counting
• Context of the process was important
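The double counting described above can be illustrated with a minimal Python sketch. Everything here is hypothetical: the field names (`order_id`, `product`, `status`, `order_value`) and the sample rows are invented for illustration and are not the data from the talk. The assumption is that the order-level value is repeated on every dependent product row, so a naive sum over rows overstates the unbilled revenue.

```python
# Hypothetical pending-order extract: one row per product on an order.
# Assumption: the value of the whole order is repeated on each product row.
order_lines = [
    {"order_id": "A1", "product": "line", "status": "pending", "order_value": 100},
    {"order_id": "A1", "product": "broadband", "status": "pending", "order_value": 100},
    {"order_id": "A1", "product": "modem", "status": "pending", "order_value": 100},
    {"order_id": "B2", "product": "line", "status": "pending", "order_value": 50},
    {"order_id": "C3", "product": "line", "status": "completed", "order_value": 70},
    {"order_id": "C3", "product": "broadband", "status": "completed", "order_value": 70},
]


def naive_unbilled(lines):
    """Sum the order value over every pending row (double counts multi-product orders)."""
    return sum(row["order_value"] for row in lines if row["status"] == "pending")


def deduplicated_unbilled(lines):
    """Count each pending order's value once, however many product rows it has."""
    value_per_order = {}
    for row in lines:
        if row["status"] == "pending":
            value_per_order[row["order_id"]] = row["order_value"]
    return sum(value_per_order.values())


if __name__ == "__main__":
    print("Naive unbilled total:", naive_unbilled(order_lines))
    print("Deduplicated unbilled total:", deduplicated_unbilled(order_lines))
```

In this toy data the gap between the two totals comes entirely from how the rows are counted, which is the kind of process context the "crisis" figure ignored.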
Question 1: So What Data Do We Need?
“No doubt that more data helps, but don't for a minute think that you need all data to make an informed business decision. Organizations that are effectively leveraging the power of Big Data realize that they will never capture all relevant information.”
Phil Simon, Too Big to Ignore: The Business Case for Big Data
Question 1: So What Data Do We Need?
• What is the problem we are trying to solve?
• What is the Process Context for this problem?
• What is the “Information Environment” for this problem?
The Pending Orders Crisis
What is the problem we are trying to solve?
• Customers are not being billed for services they have
• Revenue from services is not being realised
• We have orders that are not being completed
What is the Process Context for this problem?
What is the “Information Environment” for this problem?
Question 1: So What Data Do We Need? To properly answer this question you need to have: A PLAN
Question 2: So What is Stopping us doing it?
Regulation:
• Data Protection Rules
• Industry Regulations re: Data Governance
Technology:
• Legacy architecture
• Technology Management (Silos)
Human Factors:
• Skills (technical/problem solving/analytical)
• Political (Change Management)
Question 2: So What is Stopping us doing it?
Data:
• Quality of internal data
  • Completeness, consistency, “transactability”
• Ability to link external data to internal data
• Governance of data
  • Decision rights
  • Supplier relationship management
  • Roles & Responsibilities
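Two of the data blockers above, completeness of internal data and the ability to link external data to it, can be measured before any analysis starts. The following is a rough sketch only: the record layout, the field names (`customer_id`, `email`, `postcode`) and the sample values are all hypothetical, and exact key matching is just one simple way to define a "link".

```python
def completeness(records, field):
    """Share of records that have a non-empty value for `field`."""
    if not records:
        return 0.0
    filled = sum(1 for r in records if r.get(field) not in (None, ""))
    return filled / len(records)


def link_rate(internal, external, key):
    """Share of external records whose `key` matches some internal record."""
    if not external:
        return 0.0
    internal_keys = {r[key] for r in internal if r.get(key)}
    matched = sum(1 for r in external if r.get(key) in internal_keys)
    return matched / len(external)


# Hypothetical internal customer records (one is missing an email address).
internal_customers = [
    {"customer_id": "C001", "email": "a@example.com"},
    {"customer_id": "C002", "email": ""},
    {"customer_id": "C003", "email": "c@example.com"},
    {"customer_id": "C004", "email": "d@example.com"},
]

# Hypothetical external records (one has no match in the internal data).
external_records = [
    {"customer_id": "C001", "postcode": "D01"},
    {"customer_id": "C003", "postcode": "D02"},
    {"customer_id": "C999", "postcode": "D03"},
]

if __name__ == "__main__":
    print("Email completeness:", completeness(internal_customers, "email"))
    print("External link rate:", link_rate(internal_customers, external_records, "customer_id"))
```

A low score on either measure is a reason to fix the data or narrow the question before trusting any analysis built on top of it.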
Example of Regulation: Location Data
• Use of Location Data in Telecommunications is affected by EU Data Protection rules
• Consent is required for it to be used for “Value Adding” services
Data Quality
“I am incredibly sceptical about claims that “Big Data” is immune to Data Quality problems. Statistically, Data Quality errors will skew your mean, and create outliers that affect your analysis. While “Big Data” might not be as prone to 'fat finger' errors, you still have to consider whether the mechanisms gathering the data are correctly calibrated and the algorithms for analysis are running correctly or whether you have measurement errors you don't know about.”
Dr Thomas C Redman, thought leader in Data Quality
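The point that data quality errors skew the mean and create outliers can be shown with a small Python sketch. The bill amounts below are invented for illustration, and the outlier rule (distance from the median measured in median absolute deviations, with a cut-off of 5) is one common choice, not something taken from the talk.

```python
import statistics

# Hypothetical monthly bill amounts, correctly recorded.
clean_bills = [42, 38, 45, 40, 41, 39, 44, 43, 37, 41]

# The same data with one 'fat finger' error: the last value 41 keyed in as 4100.
dirty_bills = clean_bills[:-1] + [4100]


def flag_outliers(values, cutoff=5):
    """Return values further than `cutoff` median absolute deviations from the median."""
    centre = statistics.median(values)
    mad = statistics.median(abs(v - centre) for v in values)
    if mad == 0:
        return []  # No spread to measure against, so nothing is flagged.
    return [v for v in values if abs(v - centre) > cutoff * mad]


if __name__ == "__main__":
    for label, bills in (("clean", clean_bills), ("dirty", dirty_bills)):
        print(label, "mean:", statistics.mean(bills))
        print(label, "median:", statistics.median(bills))
        print(label, "flagged:", flag_outliers(bills))
```

A single bad record moves the mean by an order of magnitude while the median barely shifts, which is why it is worth checking for errors like this before drawing conclusions from an average.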
Databases are like lakes
System A, System B, System C
Bias within the Data?
“The greatest number of tweets about Sandy came from Manhattan. This makes sense given the city's high level of smartphone ownership and Twitter use, but it creates the illusion that Manhattan was the hub of the disaster. Very few messages originated from more severely affected locations, such as Breezy Point, Coney Island and Rockaway. As extended power blackouts drained batteries and limited cellular access, even fewer tweets came from the worst hit areas.”
Kate Crawford, Hidden Biases in Big Data, HBR, 1st April 2013