Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.

The Big Data - Same Humans Problem (CIDR 2015)

1,475 views

Published on

Presentation given as part of the 7th biennial Conference on Innovative Data Systems Research (CIDR 2015) gong show.

Published in: Data & Analytics
  • Be the first to comment

  • Be the first to like this

The Big Data - Same Humans Problem (CIDR 2015)

  1. 1. The Big Data – Same Humans Problem Alexandros Labrinidis Advanced Data Management Technologies Lab Department of Computer Science University of Pittsburgh CIDR 2015 – Gong Show – January 5, 2015
  2. 2. 2 You know Big Data is an important problem if... • It is featured on the cover of Nature and the Economist!
  3. 3. 3 You know Big Data is an even more important problem if... • It has a Dilbert cartoon!
  4. 4. What is Big Data? Definition #1: • Big data is like teenage sex: o everyone talks about it, o nobody really knows how to do it, o everyone thinks everyone else is doing it, o so everyone claims they are doing it... Definition #2: • Anything that Won't Fit in Excel! Definition #3: • Using the Vs 4
  5. 5. If you have been living under a rock for the last few years 5 The three Vs
  6. 6. • Volume - size does matter! • Velocity - data at speed, i.e., the data “fire-hose” • Variety - heterogeneity is the rule 6 The three Vs
  7. 7. 7 Five more Vs • Variability - rapid change of data characteristics over time • Veracity - ability to handle uncertainty, inconsistency, etc • Visibility – protect privacy and provide security • Value – usefulness & ability to find the right-needle in the stack • Voracity - strong appetite for data!
  8. 8. 8 Enter Moore’s Law [ Wikipedia Image ] Moore's law is the observation that, over the history of computing hardware, the number of transistors in a dense integrated circuit doubles approximately every two years. The law is named after Gordon E. Moore, co-founder of Intel Corporation, who described the trend in his 1965 paper. Source: http://en.wikipedia.org/wiki/Moore's_law
  9. 9. 9 Enter Bezos’ Law Photo: http://www.slashgear.com/google-data-center-hd-photos-hit-where-the-internet-lives-gallery-17252451/ Bezos' law is the observation that, over the history of cloud, a unit of computing power price is reduced by 50% approximately every 3 years Source: http://blog.appzero.com/blog/futureofcloud
  10. 10. 10 Storage capacity increase 0 1000 2000 3000 4000 5000 6000 7000 HDD Capacity (GB) [ Wikipedia Data ] Insert other exponentially increasing graphs here (e.g., data generation rates, world-wide smartphone access rates, Internet of Things, …)
  11. 11. 11 But • Human processing capacity remains roughly the same!
  12. 12. 12 We refer to this as the: Big Data – Same Humans Problem
  13. 13. Rethinking 13 Systems point of view Human point of view • Response time • Throughput • Scale-up • Scale-out • Making sure humans do not get lost in a sea of data! Scalability
  14. 14. Rethinking Scalability 14 Systems point of view Human point of view • Fantastic! • Terrible! Example: Data Stream Management System processing 1,000,000 events per second
  15. 15. So now what? • In addition to “traditional” scalability, it is important to consider “big data” research in: o Summarization o Personalization o Ranking o Recommender Systems o Visual Analytics o Crowd-sourcing o … 15
  16. 16. It is important! 16
  17. 17. 17
  18. 18. Thank you 18 Further reading: Big Data and Its Technical Challenges http://bit.ly/bigdatachallenges By Jagadish, Gehrke, Labrinidis, Papakonstantinou, Patel, Ramakrishnan, and Shahabi, Communications of the ACM, July 2014

×