Successfully reported this slideshow.

The Big Data - Same Humans Problem (CIDR 2015)

1

Share

Loading in …3
×
1 of 18
1 of 18

More Related Content

Related Books

Free with a 14 day trial from Scribd

See all

The Big Data - Same Humans Problem (CIDR 2015)

  1. 1. The Big Data – Same Humans Problem Alexandros Labrinidis Advanced Data Management Technologies Lab Department of Computer Science University of Pittsburgh CIDR 2015 – Gong Show – January 5, 2015
  2. 2. 2 You know Big Data is an important problem if... • It is featured on the cover of Nature and the Economist!
  3. 3. 3 You know Big Data is an even more important problem if... • It has a Dilbert cartoon!
  4. 4. What is Big Data? Definition #1: • Big data is like teenage sex: o everyone talks about it, o nobody really knows how to do it, o everyone thinks everyone else is doing it, o so everyone claims they are doing it... Definition #2: • Anything that Won't Fit in Excel! Definition #3: • Using the Vs 4
  5. 5. If you have been living under a rock for the last few years 5 The three Vs
  6. 6. • Volume - size does matter! • Velocity - data at speed, i.e., the data “fire-hose” • Variety - heterogeneity is the rule 6 The three Vs
  7. 7. 7 Five more Vs • Variability - rapid change of data characteristics over time • Veracity - ability to handle uncertainty, inconsistency, etc • Visibility – protect privacy and provide security • Value – usefulness & ability to find the right-needle in the stack • Voracity - strong appetite for data!
  8. 8. 8 Enter Moore’s Law [ Wikipedia Image ] Moore's law is the observation that, over the history of computing hardware, the number of transistors in a dense integrated circuit doubles approximately every two years. The law is named after Gordon E. Moore, co-founder of Intel Corporation, who described the trend in his 1965 paper. Source: http://en.wikipedia.org/wiki/Moore's_law
  9. 9. 9 Enter Bezos’ Law Photo: http://www.slashgear.com/google-data-center-hd-photos-hit-where-the-internet-lives-gallery-17252451/ Bezos' law is the observation that, over the history of cloud, a unit of computing power price is reduced by 50% approximately every 3 years Source: http://blog.appzero.com/blog/futureofcloud
  10. 10. 10 Storage capacity increase 0 1000 2000 3000 4000 5000 6000 7000 HDD Capacity (GB) [ Wikipedia Data ] Insert other exponentially increasing graphs here (e.g., data generation rates, world-wide smartphone access rates, Internet of Things, …)
  11. 11. 11 But • Human processing capacity remains roughly the same!
  12. 12. 12 We refer to this as the: Big Data – Same Humans Problem
  13. 13. Rethinking 13 Systems point of view Human point of view • Response time • Throughput • Scale-up • Scale-out • Making sure humans do not get lost in a sea of data! Scalability
  14. 14. Rethinking Scalability 14 Systems point of view Human point of view • Fantastic! • Terrible! Example: Data Stream Management System processing 1,000,000 events per second
  15. 15. So now what? • In addition to “traditional” scalability, it is important to consider “big data” research in: o Summarization o Personalization o Ranking o Recommender Systems o Visual Analytics o Crowd-sourcing o … 15
  16. 16. It is important! 16
  17. 17. 17
  18. 18. Thank you 18 Further reading: Big Data and Its Technical Challenges http://bit.ly/bigdatachallenges By Jagadish, Gehrke, Labrinidis, Papakonstantinou, Patel, Ramakrishnan, and Shahabi, Communications of the ACM, July 2014

Editor's Notes

  • We faced Big Data challenges for over four decades, though the meaning of “Big” has been evolving.
  • Unless you have been living under a rock for the last few years you must have already seen/heard the 3 Vs
  • Unless you have been living under a rock for the last few years you must have already seen/heard the 3 Vs
  • My take on the big data definition and problem is as follows
  • This is the main take-away from this talk
  • ×