Big Data: Why the big fuss?
BDA 2012 Big data why the big fuss?

Published on

Big Data: why the big fuss?
Volume, Variety, Velocity: we know the three Vs of Big Data. But Big Data that yields little information is useless, so focus on the fourth V: Value.
If you haven't sorted out quality and data governance for your "little data", seriously consider whether you want to venture into the world of Big Data.

Published in: Technology, Business

  1. Big Data: Why the big fuss?
  2. Presenter: Chris Bradley, Chief Development Officer. Blog: Information Management, Life & Petrol, http://infomanagementlifeandpetrol.blogspot.com. Twitter: @InfoRacer. Email: chris.bradley@ipl.com. Tel: +44 1225 475000.
  3. Introductions: Chris has spent 32 years in the Information Management field, working for leading organisations in Data Management Strategy, Master Data Management, Metadata Management, Data Warehouse and Business Intelligence. After graduating in 1979, Chris worked for the MoD (Navy), Volvo, Thorn EMI (as Head of Information Management), Reader's Digest Inc (as European CIO), and Coopers and Lybrand Management Consultancy, where he established and ran the International Data Management practice. Chris heads IPL's Business Consultancy practice and is advising several Energy, Pharmaceutical, Finance and Government clients on Business Process and Information Asset Management. Chris is a member of the MPO, a Director of DAMA UK, and holds the CDMP Master certification. He co-authored "Data Modelling for the Business: A Handbook for Aligning the Business with IT Using High-Level Data Models". Chris is a columnist and frequent contributor to industry publications. He authors an expert channel on the influential BeyeNETWORK, is a recognised thought leader in Information Management, and is a regular key speaker at major international Information Management conferences. Christopher Bradley, Chief Development Officer. Email: chris.bradley@ipl.com. Tel: +44 1225 475000. Blog: Information Management, Life & Petrol, http://infomanagementlifeandpetrol.blogspot.com. Twitter: @InfoRacer.
  4. Who is IPL?
     • Trusted, independent consulting & solutions company
     • 30-year track record
     • 300 staff, £28m+ turnover
     • High-stakes, business- and mission-critical contexts
     • Consistently exceed expectations
     • Business Consulting Division: Information Management (IM Strategy, Information Security & Assurance, Data Governance, Information Exploitation, Master Data Management, Information Architecture, Business Intelligence), turning information into a strategic asset; Enterprise Architecture; Business Process Management; Programme Management
     • IPL Consulting Clients
  5. Three V’s
  6. Three V’s
  7. Three V’s
  8. Volume, Velocity and Variety
     • Volume: Big data comes in one size: large. All enterprises are awash with data, and can easily amass terabytes and petabytes of information. Can systems scale up without degrading performance intolerably?
     • Velocity: Frequently time-sensitive, big data should be used as it streams into the enterprise in order to maximise its value to the business. How can you calculate mean values across a constantly changing landscape?
     • Variety: Big data extends beyond structured data to include unstructured data of all varieties: text, audio, video, click streams, log files and more. How do you apply the normal methods of analytics and reporting with unknown structures?
  9. Data volume keeps growing
     • The total amount of global data is expected to grow to 2.7 zettabytes during 2012 (up 48% from 2011)*
     • Equivalent to every person sending 30 tweets/hour for the next 1200 years!
     • Enterprises will manage 50 times more data, and files will grow 75 times, in the next decade
     • 80% of the world’s data is unstructured
     * IDC Digital Universe Study 2011
  10. Isn’t it all relative?
  11. The 7 dimensions of data: Users, Devices, Capacity, Media, Advances, Software, Automation
  12. Population increase; Computing demographic; Proliferation; Portability; Miniaturisation; Reducing costs; More choice; Temptation to fill; File sizes; New formats; Needs more space; More files; Solution fulfilment; Augmentation
  13. Then and now
     • Users: then, IT in the workplace; now, anywhere
     • Devices: then, 3270 / green screen; now, fixed and mobile
     • Capacity: then, KBs and MBs; now, PBs, ZBs & YBs
     • Media: then, expensive floppy disks; now, cheap cards and sticks
     • Advances: then, dedicated; now, multi-purpose
     • Software: then, minimal/business; now, complex/everything
     • Automation: then, business processes; now, what isn’t?
  14. Big data is not a new problem…
  15. Then / Now: Users, Devices, Capacity, Media, Advances, Software, Automation
  16. Then / Now: Users, Devices, Capacity, Media, Advances, Software, Automation, Data
  17. It’s all about scale … + the combination
  18. Back to basics
     • Still all about good Information and Data Management
     • Driver = Need to act faster
     • Challenge = Joining it all up … and that’s getting harder
     • Objective = Remains the same … Information Exploitation
  19. The three Vs
  20. The fourth V (Value): What is needed? In what quantity? And by when?
  21. What’s the point of Big Data yielding Little Information?
  22. Understand what it is that you need
  23. Remember “Garbage in…”
     • Quality is a key factor
     • Unstructured: Homeland Security may not care
     • Structured: poorly calibrated meters = bigger garbage
     • Faults in the technology and processes produce exaggerated errors
     • Bad decisions get made faster
     • It’s all about scale… get the IM basics for ‘little data’ right first
  24. More data isn’t necessarily better
  25. The fundamentals. Information Management Core Disciplines: Data Architecture, Data Governance, Master Data Management, Information Security, Data Quality, Metadata Management, Business Intelligence. Source: DAMA-I
  26. Managing Big Data successfully
     • Data quality
     • Sort out your ‘little data’ first
  27. Managing Big Data successfully
     • Data quality
     • Sort out your ‘little data’ first
     • Select the right technology solution(s)
     • Understand the analytics required: near real-time; mining deeper than before
     • Design optimal presentation channels
     • Target the skills you need
     • Key/value Data Stores, e.g. Cassandra
     • Columnar/tabular NoSQL Data Stores, e.g. Hadoop, Hypertable
     • MPP Appliances, e.g. Greenplum, Netezza
     • XML Data Stores, e.g. CuDB, MarkLogic
  28. Conclusions
     • Keep it all in perspective: most of this is not new
     • True value comes from deep understanding of the three Vs
     • Remember the fourth V is the bottom line
     • More data does not necessarily mean better information or wiser decisions
     • Apply data management fundamentals before the technology for Big Data
  29. Questions. Blog: Information Management, Life & Petrol, http://infomanagementlifeandpetrol.blogspot.com. Twitter: @InfoRacer. Tel: +44 1225 475000. Email: Chris.Bradley@ipl.com
  30. Financial Services Opportunities
     • Creating actionable intelligence: credit history
     • Customer insight
     • Fraud detection
     • Regulatory compliance
  31. Big Data sources
     • Key/value Data Stores such as Cassandra
     • Columnar/tabular NoSQL Data Stores such as Hadoop & Hypertable
     • Massively Parallel Processing Appliances such as Greenplum & Netezza
     • XML Data Stores such as CuDB & MarkLogic
     • Data Federation / Data Virtualisation approaches are stepping up to meet this challenge
  32. Don’t forget Data Quality
     • Managing the quality of the data is of the utmost importance
     • What’s the use of this vast resource if its quality and trustworthiness are questionable?
     • Driving your data quality capability up the maturity levels is key
  33. Data Quality Maturity Assessment
     • Level 1 - Initial: Limited awareness within the enterprise of the importance of information quality. Very few, if any, processes in place to measure quality of information. Data is often not trusted by business users.
     • Level 2 - Repeatable: The quality of a few data sources is measured in an ad hoc manner. A number of different tools are used to measure quality. The activity is driven by projects or departments. Limited understanding of good versus bad quality. Identified issues are not consistently managed.
     • Level 3 - Defined: Quality measures have been defined for some key data sources. Specific tools adopted to measure quality, with some standards in place. The processes for measuring quality are applied at consistent intervals. Data issues are addressed where critical.
     • Level 4 - Managed: Data quality is measured for all key data sources on a regular basis. Quality metrics information is published via dashboards etc. Active management of data issues through the data ownership model ensures issues are often resolved. Quality considerations are baked into the SDLC.
     • Level 5 - Optimised: The measurement of data quality is embedded in many business processes across the enterprise. Data quality issues are addressed through the data ownership model. Data quality issues are fed back to be fixed at source.
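
The maturity levels on slide 33 turn on whether quality is actually measured ("quality measures have been defined for some key data sources"). As a rough illustration only, here is a minimal Python sketch of two common measures, completeness and validity, applied to one field. The meter readings, the field name `kwh` and the "non-negative" rule are all hypothetical and do not come from the slides; real measures would be agreed with the data owners.

```python
from dataclasses import dataclass


@dataclass
class QualityReport:
    completeness: float  # share of records where the field is populated
    validity: float      # share of populated values that pass the rule


def measure_quality(records, field, is_valid):
    """Measure completeness and validity of one field across a list of dicts."""
    total = len(records)
    # A value counts as populated if it is neither missing, None, nor empty.
    present = [r[field] for r in records if r.get(field) not in (None, "")]
    valid = [v for v in present if is_valid(v)]
    completeness = len(present) / total if total else 0.0
    validity = len(valid) / len(present) if present else 0.0
    return QualityReport(completeness, validity)


# Hypothetical meter readings, invented for illustration only.
readings = [
    {"meter_id": "A1", "kwh": 12.5},
    {"meter_id": "A2", "kwh": None},   # missing reading
    {"meter_id": "A3", "kwh": -4.0},   # implausible negative reading
    {"meter_id": "A4", "kwh": 9.1},
]

# Assumed business rule: a consumption reading cannot be negative.
report = measure_quality(readings, "kwh", lambda v: v >= 0)
print(report)
```

Running the same measure at consistent intervals and publishing the result is roughly what separates Level 3 and Level 4 from the ad hoc measurement of Level 2.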
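
Slide 8 asks how you can calculate mean values across a constantly changing landscape. One common answer, sketched here under assumptions and not taken from the presentation, is to update the mean incrementally as each value arrives instead of storing the whole stream. The class name `RunningMean` and the sample values are illustrative.

```python
class RunningMean:
    """Incrementally updated mean: O(1) memory, one update per arriving value."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0

    def update(self, value):
        # new_mean = old_mean + (value - old_mean) / n
        self.count += 1
        self.mean += (value - self.mean) / self.count
        return self.mean


# Illustrative stream; in practice values would arrive from a feed or queue.
stream = [10.0, 20.0, 30.0, 40.0]

running = RunningMean()
for value in stream:
    running.update(value)

print(running.count, running.mean)
```

This gives the mean of everything seen so far. If recent values should count for more than old ones, a windowed or exponentially weighted mean would be a natural variation.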
