Big Data: why the Big fuss?
Volume, Variety, Velocity ... we know the three Vs of Big Data. But Big Data that yields little information is useless, so focus on the fourth V: Value.
If you haven't sorted out quality and data governance for your "little data", then seriously consider whether you want to venture into the world of Big Data.
2. Presenter
My blog: Information Management, Life & Petrol
http://infomanagementlifeandpetrol.blogspot.com
@InfoRacer
Chris Bradley
Chief Development Officer
chris.bradley@ipl.com
+44 1225 475000
3. Introductions
Chris has spent 32 years in the Information management field, working for
leading organisations in Data Management Strategy, Master Data Management,
Metadata Management, Data Warehouse and Business Intelligence.
After graduating in 1979, Chris worked for the MoD (Navy), Volvo, Thorn EMI (as Head
of Information Management), Reader's Digest Inc (as European CIO), and
Coopers and Lybrand Management Consultancy, where he established and ran
the International Data Management practice.
Chris heads IPL’s Business Consultancy practice and is advising several
Energy, Pharmaceutical, Finance and Government clients on Business Process
and Information Asset Management.
Chris is a member of the MPO, Director of DAMA UK and holds the CDMP
Master certification. He co-authored “Data Modelling For The Business – A
Handbook for aligning the business with IT using high-level data models”.
Chris is a columnist and frequent contributor to industry publications. He authors
an experts channel on the influential BeyeNETWORK, is a recognised thought-
leader in Information Management and regular key speaker at major
International Information Management conferences.
4. Who is IPL?
Trusted, independent consulting & solutions co
30 year track record
300 staff, £28m+ turnover
High-stakes, business & mission critical contexts
Consistently exceed expectations
Business Consulting Division
Information Management
- IM Strategy
- Information Security & Assurance
- Data Governance
- Information Exploitation
- Master Data Management
- Information Architecture
- Business Intelligence
... turning Information into a strategic asset
Enterprise Architecture
Business Process Management
Programme Management
IPL Consulting Clients
8. Volume
• Big data comes in one size: large. All enterprises are
awash with data, and can easily amass terabytes and
petabytes of information.
• Can systems scale up without degrading performance
intolerably?
Velocity
• Frequently time-sensitive, big data should be used as
it streams into the enterprise in order to maximise its
value to the business.
• How can you calculate mean values across a
constantly changing landscape?
Variety
• Big data extends beyond structured data to include
unstructured data of all varieties: text, audio, video,
click streams, log files and more.
• How do you apply the normal methods of analytics
and reporting with unknown structures?
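One common answer to the Velocity question of calculating mean values over constantly changing data is to update the mean incrementally as each reading arrives, rather than storing everything and recomputing. The sketch below is illustrative only and is not taken from the presentation; the class names RunningMean and ExponentialMean and the alpha parameter are assumptions made for this example.

```python
class RunningMean:
    """Mean of all values seen so far, updated one value at a time."""

    def __init__(self) -> None:
        self.count = 0
        self.mean = 0.0

    def update(self, value: float) -> float:
        # Incremental update: no history of past values is kept.
        self.count += 1
        self.mean += (value - self.mean) / self.count
        return self.mean


class ExponentialMean:
    """Exponentially weighted mean that favours recent values.

    alpha (hypothetical parameter name) is the weight given to each
    new value; larger alpha makes the mean react faster to change.
    """

    def __init__(self, alpha: float = 0.1) -> None:
        if not 0 < alpha <= 1:
            raise ValueError("alpha must be in (0, 1]")
        self.alpha = alpha
        self.mean = None  # no data seen yet

    def update(self, value: float) -> float:
        if self.mean is None:
            self.mean = float(value)
        else:
            self.mean += self.alpha * (value - self.mean)
        return self.mean


if __name__ == "__main__":
    overall = RunningMean()
    recent = ExponentialMean(alpha=0.5)
    for reading in [10, 20, 30]:
        overall.update(reading)
        recent.update(reading)
    print(overall.mean, recent.mean)
```

The plain running mean weights every reading equally, which suits a stable process. The exponentially weighted version lets old readings fade, which may be the better fit when the landscape really is changing.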
9. Data volume keeps growing
The total amount of global data is expected to grow to 2.7
zettabytes during 2012 (up 48% from 2011)*
Equivalent of every person sending 30 tweets/hour for the
next 1200 years!
Enterprises will manage 50 times more data and files will
grow 75 times in the next decade
80% of the world’s data is unstructured
* IDC Digital Universe Study 2011
13. Then and now
Dimension     Then                      Now
Users         IT in the workplace       Anywhere
Devices       3270 / Green screen       Fixed and mobile
Capacity      KBs and MBs               PBs, ZBs & YBs
Media         Expensive floppy disks    Cheap cards and sticks
Advances      Dedicated                 Multi-purpose
Software      Minimal/business          Complex/everything
Automation    Business processes        What isn’t?
18. Back to basics
Still all about good Information and Data Management
Driver = Need to act faster
Challenge = Joining it all up … and that’s getting harder
Objective = Remains the same … Information Exploitation
23. Remember “Garbage in…”
Quality is a key factor:
Unstructured – Homeland Security may not care
Structured – poorly calibrated meters = bigger garbage
Faults in the technology and processes produce
exaggerated errors
Bad decisions get made faster
It’s all about scale…
…get the IM basics for ‘little data’ right first
25. The fundamentals
Data Architecture
Data Governance
Master Data Management
Information Security
Data Quality
Metadata Management
Business Intelligence
Information Management Core Disciplines
Source: DAMA-I
29. Managing Big Data successfully
Data quality
Sort out your ‘little data’ first
Select the right technology solution(s)
Understand the analytics required:
Near real-time
Mining deeper than before
Design optimal presentation channels
Target the skills you need
Key/value Data Stores, e.g. Cassandra
Columnar/tabular NoSQL Data Stores, e.g. Hadoop, Hypertable
MPP Appliances, e.g. Greenplum, Netezza
XML Data Stores, e.g. CuDB, Marklogic
30. Conclusions
Keep it all in perspective, most of this is not new
True value comes from deep understanding of the three Vs
Remember the fourth V is the bottom line
More data does not necessarily mean better information or
wiser decisions
Apply data management fundamentals before the
technology for Big Data
31. Questions
My blog: Information Management, Life & Petrol
http://infomanagementlifeandpetrol.blogspot.com
@InfoRacer
Tel: +44 1225 475000
email: Chris.Bradley@ipl.com
34. Big Data sources
Key/value Data Stores such as Cassandra
Columnar/tabular NoSQL Data Stores such as Hadoop &
Hypertable
Massively Parallel Processing Appliances such as Greenplum
& Netezza
XML Data Stores such as CuDB & Marklogic
Data Federation/ Data Virtualisation approaches are stepping up to meet this
challenge
35. Don’t forget Data Quality
Managing the quality of the data is of the utmost
importance
What’s the use of this vast resource if its quality and
trustworthiness are questionable?
Driving your data quality capability up the maturity levels is
key
36. Data Quality Maturity Assessment
Level 1 - Initial
Limited awareness within the enterprise of the importance of information quality. Very few, if any, processes in place to measure quality of information. Data is often not trusted by business users.
Level 2 - Repeatable
The quality of a few data sources is measured in an ad hoc manner. A number of different tools are used to measure quality. The activity is driven by projects or departments. Limited understanding of good versus bad quality. Identified issues are not consistently managed.
Level 3 - Defined
Quality measures have been defined for some key data sources. Specific tools adopted to measure quality with some standards in place. The processes for measuring quality are applied at consistent intervals. Data issues are addressed where critical.
Level 4 - Managed
Data quality is measured for all key data sources on a regular basis. Quality metrics information is published via dashboards etc. Active management of data issues through the data ownership model ensures issues are often resolved. Quality considerations baked into the SDLC.
Level 5 - Optimised
The measurement of data quality is embedded in many business processes across the enterprise. Data quality issues addressed through the data ownership model. Data quality issues fed back to be fixed at source.
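As a hedged illustration of what "measuring quality" at the Defined and Managed levels might look like in practice, the sketch below computes one simple metric, completeness, for a set of records. It is not part of the original slides; the function name completeness, the field names and the sample records are all hypothetical, and a real assessment would cover more dimensions than this one.

```python
from typing import Dict, Iterable, List, Mapping


def completeness(records: Iterable[Mapping[str, object]],
                 required_fields: List[str]) -> Dict[str, float]:
    """Return, per field, the fraction of records with a non-empty value.

    A value counts as empty if the field is missing, None, or "".
    An empty record set scores 0.0 for every field.
    """
    rows = list(records)
    scores: Dict[str, float] = {}
    for field in required_fields:
        filled = sum(1 for row in rows if row.get(field) not in (None, ""))
        scores[field] = filled / len(rows) if rows else 0.0
    return scores


if __name__ == "__main__":
    # Hypothetical customer records, for illustration only.
    customers = [
        {"id": 1, "email": "a@example.com"},
        {"id": 2, "email": ""},
        {"id": 3},
    ]
    for field, score in completeness(customers, ["id", "email"]).items():
        print(f"{field}: {score:.0%} complete")
```

Run at consistent intervals against each key data source, a score like this could feed the dashboards mentioned at Level 4.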
Editor's Notes
On Christmas Day 1859, Thomas Austin brought out 24 rabbits, 5 hares and 72 partridges and released them on his property, ‘Barwon Park’, just outside Geelong in Victoria. Within 15 years, over 2 million per year were being shot or trapped without denting the population. Biological controls in the second half of the 20th century reduced the population to approx. 300M. The 1991 estimate was 600M, as resistance to the specific controls had built up.
Churchill: V for Victory
V: “visitors”, 1983 TV miniseries
V for Vendetta: originally a 1980s comic book, 2005 film; against a dystopian backdrop, V seeks to destroy a totalitarian govt.
Gibson Flying V guitar: first released 1958