What is Big Data?


Can we see additional value in linking and exploiting big data for business and societal benefit?
If we bring together numerous data sources to provide a single reference point then we start to derive new value.
Until then, we simply risk creating new data silos.

Published in: Education, Technology

  1. What is Big Data? Rajendra Akerkar (rak@vestforsk.no). Presented at Université Jean Monnet, Université de Lyon, France, June 10, 2013.
  2. www.vestforsk.no Hype around Big Data: “Today, the difference between success and failure is the ability to monetize a new class of data. It’s ironic that, despite billions of dollars spent on business intelligence systems, we are still data-bankrupt.” (Roman Stanek, Founder and CEO of GoodData)
  3. Source: Bloor Group
  4. The rise and rise of Big Data
  5. Share of the digital universe by India and China
  7. What is Big Data? Data that is too big, moves too fast, or doesn’t fit the structures of your database architecture. (Diagram labels: videos & photos, email, mobile, GPS, social.)
  8. Fallacy!
     • Data does provide information.
     • But Big Data does not automatically mean Big Insight.
     • Information must be interpretable, relevant, and novel.
     • The insight we can derive is a tiny fraction of the data, so we need to collect even more data and use more powerful analytics to increase the likelihood of finding it.
  9. The amount of information one can extract from the data is always much less than the data volume.
  10. A step forward in business intelligence and analytics. Can we see additional value in linking and exploiting big data for business and societal benefit? If we bring together numerous data sources to provide a single reference point, then we start to derive new value. Until then, we simply risk creating new data silos.
  11. Detecting Financial & Insurance Fraud
     • Integrating information from a variety of sources yields significant intelligence.
     • Moving from a single-document view to a network view significantly improves risk-scoring effectiveness.
     • Benefits: major international banks detect fraud > $50M; reduced costs to end users; increased trust.
  12. Providing a competitive advantage in manufacturing
     • Volvo: rapidly deriving intelligence from vehicle sensor data (reduction in cycle time for fault rectification; predictive maintenance; location-specific design enhancements).
     • Procter and Gamble: using data to “digitise” operations (remove inefficiencies from production; reduce inventory across the supply chain; analytics and visualisation to aid decision making).
  13. Google predicted the spread of flu in real time
     • It analyzed two datasets: the 50 million most common search terms that Americans type, and data on the spread of seasonal flu from a public health agency.
     • It tested a mammoth 450 million different mathematical models of the search terms, comparing their predictions against the actual flu cases.
     • The model was put to the test when the H1N1 crisis struck in 2009, and it gave more meaningful and valuable real-time information than any official public health system.
     • Reference: http://www.amazon.com/Big-Data-Revolution-Transform-Think/dp/0544002695
  15. There has always been Big Data… It’s just that now we can actually capture and mine it effectively. (Image: Canadian Tar Fields)
  16. Knowledge is knowing when and how to use certain information and insights. If someone digests the information and insight, it becomes their knowledge.
  17. Not all Big Data is created equal. Planet Google and friends are the outliers. (Diagram labels: The Norm, Large Telco.) Google, Facebook, and Twitter are outliers in a class of their own, and their requirements are significantly different from those of large enterprise businesses, let alone the normal enterprise business and SME.
  18. Definition(s) of “big data”: Big Data is a term encompassing the use of techniques to capture, process, analyse and visualize potentially large datasets in a reasonable timeframe not accessible to standard IT technologies. By extension, the platform, tools and software used for this purpose are collectively called “Big Data technologies” (Networked European Software and Services Initiative, 2012).
  19. Or, in other words, Big Data is data in volumes too large to process by traditional methods. Traditional methods:
     • Unable to handle large data volumes and diversity of data
     • Iterative, brute-force, and slow process
     • Lack of ad-hoc data navigation across events and time
     • Focused on structured data that is warehoused
     • Web analytics solutions force real-time events into rigid schemas in databases
  20. What is a Big Data problem?
  21. 3 Vs
     • Volume: how to convert massive amounts of data into information, meaning, and insight useful for human decision-making.
     • Variety: use experience with ontologies, domain models, or vocabularies to support semantic interoperability and integration.
     • Velocity: how to use dynamically created models of new objects, concepts, and relationships to better understand new clues in the data that capture rapidly evolving events and situations.
  22. Additional attributes: Venue, Vocabulary, Veracity. What happens if the raw data you are injecting into your system is incomplete or formatted incorrectly from the get-go?
  23. People will be interested in value: extracting value from Big Data.
  24. Is it actionable?
     • The first three Vs are just measures of data: how much, how fast, and how diverse?
     • They are NOT an actionable, complete definition.
     • A definition of big data must concede that exponential data growth makes it continuously difficult to manage (store, process, and access).
     • It must concede that data contains non-obvious information that companies can discover to enhance business outcomes.
     • It must concede that measures of data are relative; one company’s big data is another company’s peanut.
  25. So, the pragmatic definition: Big Data is the frontier of a company’s ability to store, process, and access all the data it needs to operate effectively, make decisions, reduce risks, and serve customers.
  26. Big Data for Everyone
     • Big data is not just for data scientists and special projects.
     • It’s for decision makers and data consumers.
     • It needs to be anchored in the real world.
     • (Diagram labels: Analyst, Consumers.)
  27. Who is benefiting from Big Data?
  29. Correlation versus causation versus “what’s good enough for the job”. Oncologists might benefit from seeing the similarities among cells in a biopsy, but targeting certain markers doesn’t guarantee you can cure someone’s cancer. (Source: Columbia University)
  31. Big Data analytics: the need for a new approach
     • Scalability: traditional approach No; new approach Yes.
     • Ingest high volumes of data (all available data): traditional approach No; new approach Yes.
     • Sampling of data: traditional approach Yes; new approach No.
     • Taking unstructured data into account: traditional approach No; new approach Yes.
     • Variety of data (structured, semi-structured, unstructured), simultaneous data and query processing, faster access to all relevant information, and analyzing data at high rates (GB/sec): traditional approach No.
     • Accuracy in analytical models is listed as a further challenge.
     • Chart of competitive advantage against degree of intelligence, and the question each level answers, from lowest to highest: standard reports (What happened?); ad hoc reports (How many, how often, where?); query drilldown (Do you have an opportunity or a problem?); alerts (What actions are needed?); statistical analysis (Why is this happening?); forecasting (What if these trends continue?); predictive analysis (What will happen next?); optimization (What’s the best that can happen?).
  32. BIG DATA Research Focus
  33. VALUE from harnessing the challenges
     • Present-day focus is devoted to business intelligence and targeted analytics needs, not to serving complex personal and collective human requirements, e.g., empowering humans in health, fitness and well-being, or better emergency management that is highly personalized.
     • Integrate real-world complexity: the multi-modal and multi-sensory nature of the real world and human perception.
     • Need a deeper understanding of data and its role in information, e.g., skew, coverage.
     • Human involvement and guidance: heading to actionable information, understanding and insight right in the context of human activities.
     • Bottom-up and top-down processing: infusion of models and background knowledge (data + knowledge + reasoning).
  34. Data should provide VALUE from harnessing the challenges posed by the volume, velocity, variety and veracity of big data, to provide actionable information and improve decision making.
  35. Read this book! Publisher: Taylor & Francis Group/CRC Press. http://www.taylorandfrancis.com/books/details/9781466578371/
  36. Thank you!
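
Slide 13 describes comparing search-term data against actual flu cases to find terms that track the spread of flu. The idea can be illustrated with a minimal sketch. This is not Google's actual method: plain Pearson correlation is swapped in for the many mathematical models the slide mentions, and the weekly figures, the search terms, and the function names `pearson` and `rank_terms` are all hypothetical, made up for illustration.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("series must have the same length (at least 2)")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant series carries no signal
    return cov / sqrt(var_x * var_y)


def rank_terms(term_counts, flu_cases):
    """Rank search terms by how closely their weekly counts track flu cases."""
    scores = {term: pearson(counts, flu_cases)
              for term, counts in term_counts.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical weekly data (invented for this sketch, not real figures).
flu_cases = [10, 20, 40, 80, 60, 30]
term_counts = {
    "flu symptoms": [100, 210, 390, 820, 590, 310],
    "cough medicine": [50, 70, 120, 200, 170, 90],
    "football scores": [300, 280, 310, 290, 305, 295],
}

ranking = rank_terms(term_counts, flu_cases)
for term, r in ranking:
    print(f"{term}: r = {r:.3f}")
```

Terms whose counts rise and fall with the flu cases end up at the top of the ranking, while unrelated terms fall to the bottom. As slide 29 cautions, a high correlation only says that a term tracks flu cases, not that it explains them.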