Big Data - it's the big buzz. But is it dead on arrival?
In this presentation Daragh O Brien looks at the history of information management, the challenges of data quality and governance, and the implications for big data...
2. About Castlebridge Associates (www.castlebridge.ie)
• Data Protection
• Information Quality
• Data Governance
• Consulting
• Coaching/Mentoring
• Training
• Project Management
• Quality Assured
• Certified Trainers
• Qualified & Experienced
• External QA Audits
• Irish State Approved Training Provider
• IQCP & DP Certified
• Quality Assured Syllabus
• Certified PMs
• Many Industries: Fin. Svcs, Telco, Govt, Edu, Utilities, NonProfit
5. Ancient Sumeria
• Written in Akkadian
• Used pictographic representations of information and concepts baked/carved into tablets made of clay (high sand content)
7. Filing: The Birth of Big Data
Image by Nic McPhee @ commons.wikimedia.com
8. Physical Data (5925 years approx.)
6 thousand years
Tablets
Tablets
Electronic Data (c.75 years)
• More Information processed
• Information processed faster
• More ‘self service’ data processing
• Changed expectations of data and processing.
13. General Overview
Strong
Weak
Maturity models are (almost always) 5-step models that associate common characteristics of organisations that are doing things well or are on the way to doing them better.
14. Where is Big Data?
Certainty
Wisdom
Enlightenment
Awakening
Uncertainty
Optimising
Managed
Defined
Repeatable
Initial
(Overlaying Crosby CMM model with DMBOK Maturity model)
15. Where is Big Data?
Certainty
Wisdom
Enlightenment
Awakening
Uncertainty
Initial
Repeatable
Defined
Managed
Optimising
16. Maturity: Answering So What Questions
So What…
…is it?
…problems will it solve?
…will we be able to do differently?
…legal / regulatory risks does all this pose?
…do we need to do to tap this gold mine?
…are we not doing today that this will enable?
…are we not doing today that this will make worse?
18. Organisations don't manage data well
• Information Governance / Data Governance only now emerging as formal disciplines
• Information Quality / Data Quality also only beginning to be coherently tackled in many organisations
• Phone companies still get bills wrong
• Data Protection breaches still occur (note: this is more than just SECURITY breaches)
• Data Migrations, CRM, ERP still fail
• Metadata largely under-managed
19. Bottom Line Impact
• 88%: Risk Managers who see Information as “Significant” in their Risk Management plans (Deloitte)
• 84%: Data Migrations that FAIL (don't deliver, overrun time/budget, deliver reduced functionality) (Bloor)
• 75%: Chief Financial Officers who see Information Management as a barrier to achieving Business goals (Forrester)
• 35%: Estimated share of TURNOVER wasted by companies due to poor information quality (Gartner)
• 30%: Time lost to organisations from staff rechecking information (IBM)
This is when dealing with “traditional” structured/semi-structured data.
20. “So far, for 50 years, the information revolution has centered on
data—their collection, storage, transmission, analysis, and
presentation. It has centered on the "T" in IT.
The next information revolution asks, what is the MEANING of
information, and what is its PURPOSE?”
Peter Drucker, Forbes ASAP, August 1998
27. The Pending Orders Solution 2006
Elite Specialist Information Quality Agent
Licensed to “Fix the Data by all means necessary”
(firearms not actually used…)
28. The Pending Orders Solution 2006
• Orders for infrastructure had engineering statuses
• Orders could have multiple dependent products, which were double counted
• Revenue Assurance did not look at all relevant data sources
• Dependencies between process steps were not understood
29. The Pending Orders Solution 2006
There wasn't a Crisis situation
Revenue Assurance Hypothesis was flawed
• External Factors affected order completion times
• Intra-order product dependencies led to double counting
• Context of the process was important
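The double counting mentioned on this slide can be illustrated with a minimal sketch. It is not the analysis used in the actual case: the OrderLine structure, the order ids, the product names and the statuses are all hypothetical and exist only to show how counting product rows instead of distinct orders inflates a "pending orders" figure.

```python
from collections import namedtuple

# Hypothetical order lines: one row per product on an order.
# All ids, products and statuses are illustrative, not from the case.
OrderLine = namedtuple("OrderLine", ["order_id", "product", "status"])


def count_pending(lines):
    """Return (naive_count, distinct_order_count) for pending lines."""
    pending = [line for line in lines if line.status == "PENDING"]
    naive = len(pending)  # counts every dependent product row
    distinct = len({line.order_id for line in pending})  # counts each order once
    return naive, distinct


lines = [
    OrderLine("A1", "broadband", "PENDING"),
    OrderLine("A1", "line rental", "PENDING"),
    OrderLine("A1", "modem", "PENDING"),
    OrderLine("B2", "line rental", "COMPLETE"),
    OrderLine("C3", "broadband", "PENDING"),
]

naive_count, distinct_count = count_pending(lines)
print(naive_count, distinct_count)
```

In this made-up data, order A1 carries three dependent products, so a row count reports twice as many pending items as there are pending orders.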
32. Question 1: So What Data Do We Need?
No doubt that more data helps, but don't for a minute think that you need all data to make an informed business decision. Organizations that are effectively leveraging the power of Big Data realize that they will never capture all relevant information.
Phil Simon, Too Big to Ignore: The Business Case for Big Data
34. Question 1: So What Data Do We Need?
What is the problem we are trying to solve?
What is the Process Context for this problem?
What is the “Information Environment” for this problem?
35. The Pending Orders Crisis
What is the problem we are trying to solve?
• Customers are not being billed for services they have
• Revenue from services is not being realised
• We have orders that are not being completed
What is the Process Context for this problem?
What is the “Information Environment” for this problem?
36. Question 1: So What Data Do We Need?
To properly answer this question you need to have:
A PLAN
37. Question 2: So What is Stopping us doing it?
Regulation:
• Data Protection Rules
• Industry Regulations re: Data Governance
Technology:
• Legacy architecture
• Technology Management (Silos)
Human Factors:
• Skills (technical/problem solving/analytical)
• Political (Change Management)
38. Question 2: So What is Stopping us doing it?
Data:
• Quality of internal data
• Completeness, consistency, “transactability”
• Ability to link external data to internal data
• Governance of data
• Decision rights
• Supplier relationship management
• Roles & Responsibilities
39. Example of Regulation
Location Data
Use of Location Data in Telecommunications is affected by EU Data Protection rules
Consent is required for it to be used for “Value Adding” services
40. Data Quality
I am incredibly sceptical about claims that “Big Data” is immune to Data Quality problems. Statistically, Data Quality errors will skew your mean, and create outliers that affect your analysis.
While “Big Data” might not be as prone to 'fat finger' errors, you still have to consider whether the mechanisms gathering the data are correctly calibrated and the algorithms for analysis are running correctly or whether you have measurement errors you don't know about.
Dr Thomas C Redman, thought leader in Data Quality
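The point about errors skewing the mean can be seen in a small sketch. The bill amounts below are hypothetical values chosen for illustration, and the single bad record stands in for any uncaught measurement or keying error; none of this comes from the presentation itself.

```python
from statistics import mean, median

# Hypothetical monthly bill amounts in euro (illustrative values only).
clean_bills = [42.0, 45.0, 39.0, 41.0, 43.0]

# The same records with one error: 41.0 was captured as 4100.0.
dirty_bills = [42.0, 45.0, 39.0, 4100.0, 43.0]

clean_mean = mean(clean_bills)
dirty_mean = mean(dirty_bills)
clean_median = median(clean_bills)
dirty_median = median(dirty_bills)

# One bad record out of five moves the mean far more than the median.
print(f"mean:   {clean_mean:.1f} -> {dirty_mean:.1f}")
print(f"median: {clean_median:.1f} -> {dirty_median:.1f}")
```

The median barely moves, which is why it is often used as a quick check, but it does not remove the underlying error from the data.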
43. Bias within the Data?
The greatest number of tweets about Sandy came from
Manhattan. This makes sense given the city's high level of
smartphone ownership and Twitter use, but it creates the
illusion that Manhattan was the hub of the disaster. Very
few messages originated from more severely affected
locations, such as Breezy Point, Coney Island and
Rockaway. As extended power blackouts drained batteries
and limited cellular access, even fewer tweets came from
the worst hit areas.
Kate Crawford, "The Hidden Biases in Big Data", HBR, 1st April 2013
Tom gives the example of his early work on telecoms billing data. The emphasis was on sample bias, but the actual measurement error in the process (the data quality issues) was an order of magnitude greater than the error due to sample bias.