Data-Ed Webinar: Demystifying Big Data

We are in the middle of a data flood and we need to figure out how to tame it without drowning. Most of what has been written about Big Data is focused on selling hardware and services. But what about a Big Data strategy that guides hardware and software decisions? Virtually every major organization faces the challenge of figuring out the approach to, and the requirements of, this new development. Jumping into the fray hastily and unprepared will only reproduce the same dismal IT project results experienced before. Join Dr. Peter Aiken as he debunks a number of misconceptions about Big Data as your atypical IT project. He will provide guidance on how to establish realistic Big Data management plans and expectations, and help demonstrate the value of such actions to both internal and external decision makers without getting lost in the hype.

Takeaways:

- The means by which Big Data techniques can complement existing data management practices
- The prototyping nature of practicing Big Data techniques
- The distinct ways in which utilizing Big Data can generate business value
- Bigger Data isn’t always Better Data

Published in: Technology

Transcript of "Data-Ed Webinar: Demystifying Big Data"

  1. 1. Demystifying Big Data • “Every century, a new technology – steam power, electricity, atomic energy, or microprocessors – has swept away the old world with a vision of a new one. Today, we seem to be entering the era of Big Data” – Michael Coren • Date: May 14, 2013 • Time: 2:00 PM ET/11:00 AM PT • Presenter: Peter Aiken, Ph.D. • Copyright 2013 by Data Blueprint
  2. 2. Get Social with Us! • Live Twitter Feed: @datablueprint, @paiken, #dataed • Like Us: www.facebook.com/datablueprint • Join the Group: Data Management & Business Intelligence
  3. 3. Demystifying Big Data 2.0: Developing the Right Approach for Implementing Big Data Techniques • Presented by Peter Aiken, Ph.D.
  4. 4. Peter Aiken, Ph.D. • 30+ years of experience in data management • Multiple international awards & recognition • Founder, Data Blueprint (datablueprint.com) • Associate Professor of IS, VCU (vcu.edu) • Past President, DAMA International (dama.org) • 9 books and dozens of articles • Experienced w/ 500+ data management practices in 20 countries • Multi-year immersions with organizations as diverse as the US DoD, Nokia, Deutsche Bank, Wells Fargo, and the Commonwealth of Virginia
  5. 5. Outline • Big Data Context: Why the Big Deal about Big Data? • Big Data Challenges: Historical Perspective • Big Data Challenges: Today • Big Data Approach: Crawl, Walk, Run • Design Principles: Foundational & Technical • Take Aways and Q&A
  6. 6. Why the Big Deal about Big Data? • We are at an inflection point: The sheer volume of data generated, stored, and mined for insights has become economically relevant to businesses, government, and consumers (McKinsey) • We believe the same important principles still apply: – What problem are you trying to solve for your business? Your solution needs to fit your problem – Doing data for (big) data’s sake is not going to solve any problems – Risk of spending a lot of money on chasing Big Data that will realize little to no returns, especially at this stage of the hype cycle http://www.mckinsey.com/insights/business_technology/big_data_the_next_frontier_for_innovation?p=1
  7. 7. Myth #1: Everyone should invest in Big Data • Fact: • Not every company will benefit from Big Data • It depends on your size and your ability – Local pizza shop vs. state-wide or national chain
  8. 8. Big Data can create significant financial value across sectors • Some (not all) companies can take advantage of Big Data to create value if they want to compete
  9. 9. 5 Ways in which Big Data creates Big Business Value 1. Information is transparent and usable at much higher frequency 2. Expose variability and boost performance 3. Narrow segmentation of customers and more precisely tailored products or services 4. Sophisticated analytics and improved decision-making 5. Improved development of the next generation of products and services http://www.mckinsey.com/insights/business_technology/big_data_the_next_frontier_for_innovation?p=1
  10. 10. Myth #2: Big Data has a clear definition • Fact: • The term is used so often and in so many contexts that its meaning has become vague and ambiguous • Industry experts and scientists often disagree http://articles.washingtonpost.com/2013-08-16/opinions/41416362_1_big-data-data-crunching-marketing-analytics
  11. 11. Defining Big Data • Gartner: High-volume, high-velocity, and/or high-variety information assets that require new forms of processing to enable enhanced decision-making, insight discovery and process optimization. • IBM: Datasets whose size is beyond the ability of typical database software tools to capture, store, manage, and analyze. • NY Times: Shorthand for advancing trends in technology that open the door to a new approach to understanding the world and making decisions. • McKinsey: Large pools of data that can be brought together and analyzed to discern patterns and make better decisions.
  12. 12. Big Data Characteristics generally include: 1. Volume: The amount of data 2. Velocity: The speed of data going in and out 3. Variety: The range of data types & sources 4. Variability: Many options or variable interpretations confound analysis • Q: Would it be more useful to refer to “big data techniques”?
  13. 13. Big Data Gartner Hype Cycle
  14. 14. Some Big Data Limitations • Data analysis struggles with social cognition • Data struggles with context • Data creates bigger haystacks • Big data has trouble with big problems • Data favors memes over masterpieces • Data obscures values • David Brooks, New York Times: http://www.nytimes.com/2013/02/19/opinion/brooks-what-data-cant-do.html?_r=0
  15. 15. Business Information Market: $1.1 Trillion a Year • Enterprises spend an average of $38 million on information/year • Small and medium-sized businesses spend $332,000 on average http://www.cio.com.au/article/429681/five_steps_how_better_manage_your_data
  16. 16. Big Data = Big Spending • Enterprises are spending wildly on Big Data but don’t know if it’s worth it yet (Business Insider, 2012) • Big Data Technology Spending Trend: – 83% increase over the next 3 years (worldwide): • 2012: $28 billion • 2013: $34 billion • 2016: $232 billion • Caution: – Don’t fall victim to SOS (Shiny Object Syndrome) – A lot of money is being invested, but is it generating the expected return? – Gartner Hype Cycle suggests results are going to be disappointing http://www.businessinsider.com/enterprise-big-data-spending-2012-11#ixzz2cdT8shhe http://www.inc.com/kathleen-kim/big-data-spending-to-increase-for-it-industry.html http://www.gartner.com/DisplayDocument?id=2195915&ref=clientFriendlyUrl
  17. 17. Myth #3: Big Data is just another IT project • Fact: • Big Data is not your typical IT project – Does not answer typical IT questions – Trend analysis, agile, actionable, etc. – Fundamentally different approach • Big Data projects are exploratory • Big Data enables new capabilities • Big Data can be a disruptive technology • It might sound simple but that doesn’t mean it’s easy • Beware of SOS (Shiny Object Syndrome)
  18. 18. Healthcare Example: Patient Data • Clinical data: – Diagnosis/prognosis/treatment – Genetic data • Patient demographic data • Insurance data: – Insurance provider – Claims data • Prescriptions & pharmacy information • Physical fitness data – Activity tracking through smartphone apps & social media • Health history • Medical research data
  19. 19. Retail Example: Loyalty Programs & Big Data • Companies need to understand current wants and needs AND predict future tendencies • Customer -> Repeat Customer -> Brand Advocate • Customer loyalty programs & retention strategies – Track what is being purchased and how often – Coupons based on purchasing history – Targeted communications, campaigns & special offers – Social media for additional interactions – Personalize consumer interactions • Customer purchase history influences product placements – Retailers rapidly respond to consumer demands – Product placements, planogram optimization, etc. http://www.forbes.com/sites/xerox/2013/09/27/big-data-boosts-customer-loyalty-no-really/
  20. 20. Take Aways - Big Data Context • Technology continues to evolve at increasing speeds • Big Data is here – We have the potential to create insights • Spend wisely & strategically: – Big Data is not going to solve all your problems. • Fact: – Big Data is not for everyone • Fact: – Lack of a clear definition • Hype Cycle: – Current: Peak of Inflated Expectations – Soon: Trough of Disillusionment
  21. 21. Outline • Big Data Context: Why the Big Deal about Big Data? • Big Data Challenges: Historical Perspective • Big Data Challenges: Today • Big Data Approach: Crawl, Walk, Run • Design Principles: Foundational & Technical • Take Aways and Q&A
  22. 22. Myth #4: Big Data is new • Fact: • The term originated in Silicon Valley in the 1990s • The concept has been used previously – 800-year-old linguistic datasets – Use in sciences in the 1600s – Kepler, Sloan Digital Sky Survey, Statisticians’ view • Much harder to leverage Big Data when you lack appropriate techniques http://articles.washingtonpost.com/2013-08-16/opinions/41416362_1_big-data-data-crunching-marketing-analytics
  23. 23. Early Database • “The Human Face of Big Data”, Rick Smolan & Jennifer Erwitt
  24. 24. Mortality Geocoding • When is it happening? • Where is it happening? • Why is it happening? • “The Human Face of Big Data”, Rick Smolan & Jennifer Erwitt
  25. 25. Big Data Characteristics & the Plague 1. Volume – Plague data collection points 2. Velocity – Speed at which disease registers are updated 3. Variety – Who is collecting plague data points, how, and where? 4. Variability – Different ways of recording disease patterns and using that data – No social media yet but gossip existed
  26. 26. John Snow’s 1854 Cholera Map of London
  27. 27. Take Aways - Historic Big Data Challenges • Fact: Big Data is not new • Foundational data management challenges remain similar • Bills of Mortality by John Graunt – First true health data set – World’s first pattern of data – Foundation for probability industry, statistics, insurance
  28. 28. Outline • Big Data Context: Why the Big Deal about Big Data? • Big Data Challenges: Historical Perspective • Big Data Challenges: Today • Big Data Approach: Crawl, Walk, Run • Design Principles: Foundational & Technical • Take Aways and Q&A
  29. 29. Myth #5: Big Data is innovative • Fact: • Big Data techniques are innovative • ROI and insights depend on the size of the business and the amount of data used and produced, e.g. – Local pizza place vs. Papa John’s – Retail
  30. 30. Data Footprints • SQL Server – 47,000,000,000,000 bytes – Largest table: 34 billion records, 3.5 TB • Informix – 1,800,000,000 queries/day – 65,000,000 tables / 517,000 databases • Teradata – 117 billion records – 23 TB for one table • DB2 – 29,838,518,078 daily queries
  31. 31. Big Data Characteristics generally include: 1. Volume: The amount of data 2. Velocity: The speed of data going in and out 3. Variety: The range of data types & sources 4. Variability: Many options or variable interpretations confound analysis • Q: Would it be more useful to refer to “big data techniques”?
  32. 32. #1 VOLUME, The Amount of Data • 2012 London Summer Games • 60 GB of data/second • 200,000 hours of big data will be generated testing systems • 2,000 hours of media coverage daily • 845 million Facebook users averaging 15 TB/day • 13,000 tweets/second • 4 billion watching • 8.5 billion devices connected
  33. 33. #2 VELOCITY, The Speed of Data • Nanex 1/2 Second Trading Data, May 2, 2013, Johnson & Johnson • The European Union last year approved a new rule mandating that all trades must exist for at least a half-second; in this instance, 1,200 orders and 215 actual trades http://www.youtube.com/watch?v=LrWfXn_mvK8
  34. 34. #3 VARIETY, Range of Data Types & Sources • Increasingly, individuals make use of data-producing gadgets to perform services for them
  35. 35. #4 VARIABILITY, Many options or variable interpretations confound analysis • Historyflow: Wikipedia entry for the word “Islam”
  36. 36. Take Aways: Big Data Challenges Today • Fact: Big Data techniques are innovative but “Big Data” is not • Challenges are both foundational and technical, today as well as in the 1600s • Technology continues to advance rapidly (4 Vs) • Challenges associated with Big Data are not new: – Well-known foundational data management issues – Need to align data and business with rapidly changing environment – Duplication, accessibility, availability – Foundational business issues
  37. 37. Outline • Big Data Context: Why the Big Deal about Big Data? • Big Data Challenges: Historical Perspective • Big Data Challenges: Today • Big Data Approach: Crawl, Walk, Run • Design Principles: Foundational & Technical • Take Aways and Q&A
  38. 38. Myth #6: Big Data provides all the Answers • Fact: • Big Data does not mean the end of scientific theory • Be careful or you’ll end up with spurious correlations – Don’t just go fishing for correlations and hope they will explain the world • To get to the WHY of things, you need ideas, hypotheses and theories • Having more data does not substitute for thinking hard, recognizing anomalies and exploring deep truths • You need the right approach http://articles.washingtonpost.com/2013-08-16/opinions/41416362_1_big-data-data-crunching-marketing-analytics
  39. 39. (No slide text.)
  40. 40. • Identify business opportunity • How can data be leveraged in exploring: – External marketplace: analyze opportunities and threats – Internal efficiencies: analyze strengths and weaknesses
  41. 41. Example: 2012 Olympic Summer Games 1. Volume: 845 million FB users averaging 15 TB+ of data/day 2. Velocity: 60 GB of data per second 3. Variety: 8.5 billion devices connected 4. Variability: Sponsor data, athlete data, etc. 5. Vitality: Data Art project “Emoto” 6. Virtual: Social media
  42. 42. • Based on my 6 V analysis, do I need a Big Data solution or does my current BI solution address my business opportunity? – Do the 6 Vs indicate general Big Data characteristics? – What are the limitations of my current BI environment? (Technology constraint) – What are my budgetary restrictions? (Financial constraint) – What is my current Big Data knowledge base? (Knowledge constraint)
  43. 43. • MUST have both Foundational and Technical practice expertise
  44. 44. (No slide text.)
  45. 45. • Data Strategy • Data Governance • Data Architecture • Data Education
  46. 46. • Data Quality • Data Integration • Data Platforms • BI/Analytics
  47. 47. • Needs to be actionable • Generally well understood by business • Document what has been learned
  48. 48. • Perfect results are not necessary • Reiterate and refine • Iterative process to reach decision point • Use as feedback for next exploration
  49. 49. (No slide text.)
  50. 50. Take Aways - Approach: Crawl, Walk, Run • Crawl: – Identify business opportunity and determine whether you truly need a Big Data solution • Walk: – Apply a combination of foundational and technical data management practices. Document your insights and make sure they are actionable • Run: – Recycle and explore. Staying agile allows you to be exploratory.
  51. 51. Outline • Big Data Context: Why the Big Deal about Big Data? • Big Data Challenges: Historical Perspective • Big Data Challenges: Today • Big Data Approach: Crawl, Walk, Run • Design Principles: Foundational & Technical • Take Aways and Q&A
  52. 52. Foundational Practice: Data Strategy • Your data strategy must align to your organizational business strategy and operating model • As the marketplace becomes more data-driven, a data-focused business strategy is an imperative • You must have a data strategy before you have a Big Data strategy
  53. 53. Data Strategy Case Study • Enterprise Information Management Maturity
  54. 54. Data Strategy Considerations • What are the questions that you cannot answer today? • Is there a direct reliance on understanding customer behavior to drive revenue? • Do you have information overload and are you trying to find the signal in the noise? • Which is more important: – Establishing value from current data assets/data reporting? – Exploring Big Data opportunities?
  55. 55. Myth #7: You need Big Data for Insights • Fact: • Distinction between Big Data and doing analytics – Big Data is defined by the technology stack that you use – Big Data is used for predictive and prescriptive analytics • Use existing data for reporting, figure out bottlenecks and optimize current business model • Understand how your data is structured, architected, and stored
  56. 56. Foundational Practice: Data Architecture • Common vocabulary expressing integrated requirements ensuring that data assets are stored, arranged, managed, and used in systems in support of organizational strategy [Aiken 2010] • Most organizations have data assets that are not supportive of strategies • Big question: – How can organizations more effectively use their information architectures to support strategy implementation?
  57. 57. Data Architecture Considerations • Does your current architecture for BI and analytics support Big Data? • Are you getting enough value out of your current architecture? • Can you easily integrate and share information across your organization? • Do you struggle to extract the value from your data because it is too cumbersome to navigate and access? • Are you confident your data is organized to meet the needs of your business?
  58. 58. Technical Practice: Data Integration • A data-centric organization requires unified data • Integrating data across organizational silos creates new insights • It is also the biggest challenge • Big Data techniques can be used to complement existing integration efforts
  59. 59. Integrating Data Vault 2.0 with Big Data • Allowing connections between RDBMS and NoSQL data is beneficial • Examples: 1. Invoices 2. Passports 3. Stock shelving
  60. 60. Data Integration Considerations • The complexity of your data integration challenge depends on the questions you’re trying to answer • Integration requirements for Big Data are dependent on the types of questions you’re asking: – Integration here may be more fuzzy than discrete – Integration is domain-based (based on time, customer concept, geographic distribution) • Those requirements should evolve from your strategy
  61. 61. Technical Practice: Data Quality • Quality is driven by fit-for-purpose considerations • Big Data quality is different: – Basic Availability – Soft-state – Eventual consistency • Directional accuracy is the goal • Focus on your most important data assets and ensure your solutions address the root cause of any quality issues, so that your data is correct when it is first created • Experience has shown that organizations can never get in front of their data quality issues if they only use the ‘find-and-fix’ approach
  62. 62. Data Quality Considerations • Big Data is trying to be predictive • What are the questions you are trying to answer? – What level of accuracy are you looking for? – What confidence levels? – Example: Do I need to know exactly what the customer is going to buy or do I just need to know the range of products he/she is going to choose from?
  63. 63. Myth #8: Bigger Data is Better • Fact: • Better to have less data of good quality than more poor-quality Big Data • Analyze to reduce variables and increase manageability; otherwise Big Data = Quantity over Quality • Beware of Shiny Object Syndrome – What problem are we trying to solve? – The solution needs to fit the problem • Big Data may not be your answer; it may be your problem • Investments in foundational and technical approaches result in better outcomes for Big Data
  64. 64. Technical Practice: Data Platforms • Do you want to measure critical operational process performance? • No one data platform can answer all your questions. This is commonly misunderstood and often leads to very expensive, bloated and ineffective data platforms. • Understand the questions that need to be asked and how to build the right data platform or optimize an existing one
  65. 65. The Big Data Landscape (Copyright Dave Feinleib, bigdatalandscape.com)
  66. 66. Data Platforms Considerations • Most big data stacks share common components: file storage, columnar store, querying engine, etc. • The big data stack generally looks the same until you get into appliances – Algorithms are built into the appliances themselves (e.g. Netezza, Teradata) • Ask these questions: – Do you want insights on your customers’ behavior? – Do you need real-time customer transactional information? – Do you need historical data or just access to the latest transactions? – Where do you go to find the single version of the truth about your customers?
  67. 67. Take Aways - Design Principles: Foundational & Technical • Foundational data management principles still apply • Beware of SOS (Shiny Object Syndrome) • You must have a data strategy before you can have a Big Data strategy • Fact: You don’t need Big Data to gain insights • Big Data integration requirements evolve from your strategy • Fact: Bigger Data is not always better
  68. 68. Outline • Big Data Context: Why the Big Deal about Big Data? • Big Data Challenges: Historical Perspective • Big Data Challenges: Today • Big Data Approach: Crawl, Walk, Run • Design Principles: Foundational & Technical • Take Aways and Q&A
  69. 69. Take Aways: In Summary • Big data techniques are innovative but “Big Data” is not • Big Data characteristics: 6 Vs – Volume, Velocity, Variety, Variability, Vitality, Virtual • Approach: Crawl-Walk-Run • Big Data challenges require solutions that are based on foundational and technical data management practices • Beware of SOS (Shiny Object Syndrome): – Spend wisely and strategically – Big Data is not going to solve all your problems
  70. 70. References • The Human Face of Big Data, Rick Smolan & Jennifer Erwitt, First Edition (November 20, 2012) • McKinsey: Big Data: The next frontier for innovation, competition and productivity (http://www.mckinsey.com/insights/business_technology/big_data_the_next_frontier_for_innovation?p=1) • The Washington Post: Five Myths about Big Data (http://articles.washingtonpost.com/2013-08-16/opinions/41416362_1_big-data-data-crunching-marketing-analytics) • Gartner: Gartner’s 2013 Hype Cycle for Emerging Technologies Maps Out Evolving Relationship Between Humans and Machines (http://www.gartner.com/newsroom/id/2575515) • The New York Times | Opinion Pages: What Data Can’t Do (http://www.nytimes.com/2013/02/19/opinion/brooks-what-data-cant-do.html?_r=1&) • CIO.com: Five Steps for How to Better Manage Your Data (http://www.cio.com.au/article/429681/five_steps_how_better_manage_your_data/) • Business Insider: Enterprises Are Spending Wildly on ‘Big Data’ But Don’t Know If It’s Worth It Yet (http://www.businessinsider.com/enterprise-big-data-spending-2012-11#ixzz2cdT8shhe) • Inc.com: Big Data, Big Money: IT Industry to Increase Spending (http://www.inc.com/kathleen-kim/big-data-spending-to-increase-for-it-industry.html) • Forbes: Big Data Boosts Customer Loyalty. No, Really. (http://www.forbes.com/sites/xerox/2013/09/27/big-data-boosts-customer-loyalty-no-really/)
  71. 71. Questions? • It’s your turn! Use the chat feature or Twitter (#dataed) to submit your questions to Peter now.
  72. 72. Upcoming Events • Data-Centric Strategy & Roadmap: February 11, 2014 @ 2:00 PM ET/11:00 AM PT • Emerging Trends in Data Jobs: March 13, 2014 @ 2:00 PM ET/11:00 AM PT • Sign up here: www.datablueprint.com/webinar-schedule or www.dataversity.net
  73. 73. 10124 W. Broad Street, Suite C, Glen Allen, Virginia 23060 • 804.521.4056
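
Slide 38 (Myth #6) warns that fishing for correlations produces spurious ones. The following is a minimal Python sketch of why that happens, not something taken from the presentation: it compares one series of pure noise against a growing number of other noise series and reports the strongest correlation found by chance. The function names, the 20-row sample size, and the seed are all illustrative assumptions.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def strongest_chance_correlation(n_rows, n_candidates, seed=0):
    """Largest |r| between a random target and n_candidates random series.

    Every series is independent noise, so any correlation found is spurious.
    """
    rng = random.Random(seed)
    target = [rng.gauss(0, 1) for _ in range(n_rows)]
    best = 0.0
    for _ in range(n_candidates):
        candidate = [rng.gauss(0, 1) for _ in range(n_rows)]
        best = max(best, abs(pearson(target, candidate)))
    return best


if __name__ == "__main__":
    # Hypothetical scenario: 20 observations, ever more variables to "fish" in.
    for n_candidates in (1, 10, 100, 1000):
        r = strongest_chance_correlation(n_rows=20, n_candidates=n_candidates)
        print(f"{n_candidates:>5} candidate series -> strongest |r| = {r:.2f}")
```

The more variables are searched without a hypothesis, the stronger the best chance correlation tends to look, which is the slide's argument for starting from ideas, hypotheses, and theories.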