The Big Data – Same Humans Problem
Alexandros Labrinidis
Advanced Data Management Technologies Lab
Department of Computer Science
University of Pittsburgh
CIDR 2015 – Gong Show – January 5, 2015
You know Big Data is an important problem if...
• It is featured on the cover of Nature and the Economist!
You know Big Data is an even more important problem if...
• It has a Dilbert cartoon!
What is Big Data?
Definition #1:
• Big data is like teenage sex:
o everyone talks about it,
o nobody really knows how to do it,
o everyone thinks everyone else is doing it,
o so everyone claims they are doing it...
Definition #2:
• Anything that Won't Fit in Excel!
Definition #3:
• Using the Vs
The three Vs
(If you have been living under a rock for the last few years)
• Volume - size does matter!
• Velocity - data at speed, i.e., the data “fire-hose”
• Variety - heterogeneity is the rule
Five more Vs
• Variability - rapid change of data characteristics over time
• Veracity - ability to handle uncertainty, inconsistency, etc.
• Visibility – protect privacy and provide security
• Value – usefulness & ability to find the right needle in the haystack
• Voracity - strong appetite for data!
Enter Moore’s Law
[ Wikipedia Image ]
Moore's law is the observation that, over the
history of computing hardware, the number of
transistors in a dense integrated circuit doubles
approximately every two years. The law is
named after Gordon E. Moore, co-founder of
Intel Corporation, who described the trend in
his 1965 paper.
Source: http://en.wikipedia.org/wiki/Moore's_law
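To make the doubling rule concrete, here is a minimal sketch of the arithmetic. The function name and the default two-year period are illustrative assumptions taken from the definition quoted above, not measured data.

```python
def transistor_growth_factor(years: float, doubling_period_years: float = 2.0) -> float:
    """Return the multiplicative growth after `years`, assuming one
    doubling every `doubling_period_years` (hypothetical helper)."""
    return 2.0 ** (years / doubling_period_years)


# Ten years at one doubling every two years is five doublings.
ten_year_factor = transistor_growth_factor(10)
print(ten_year_factor)
```

Under this idealized model, a decade yields a 32x increase; real hardware only approximates the trend.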
Enter Bezos’ Law
Photo: http://www.slashgear.com/google-data-center-hd-photos-hit-where-the-internet-lives-gallery-17252451/
Bezos' law is the observation that, over the history of the cloud, the price of a unit of computing power is reduced by 50% approximately every 3 years.
Source: http://blog.appzero.com/blog/futureofcloud
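The halving rule can be sketched the same way. This is only an illustration of the stated 50%-every-3-years figure; the function name is hypothetical and the model ignores real pricing details.

```python
def relative_compute_price(years: float, halving_period_years: float = 3.0) -> float:
    """Return the price of a unit of computing power relative to today,
    assuming it halves every `halving_period_years` (hypothetical helper)."""
    return 0.5 ** (years / halving_period_years)


# Six years is two halvings, so the unit price falls to a quarter.
six_year_price = relative_compute_price(6)
print(six_year_price)
```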
Storage capacity increase
[ Figure: HDD capacity (GB) over time, vertical axis from 0 to 7000; source: Wikipedia data ]
Insert other exponentially increasing graphs here
(e.g., data generation rates, world-wide smartphone access rates,
Internet of Things, …)
But
• Human processing capacity remains roughly the same!
We refer to this as the:
Big Data – Same Humans Problem
Rethinking Scalability
Systems point of view:
• Response time
• Throughput
• Scale-up
• Scale-out
Human point of view:
• Making sure humans do not get lost in a sea of data!
Rethinking Scalability
Example: a Data Stream Management System processing 1,000,000 events per second
• Systems point of view: Fantastic!
• Human point of view: Terrible!
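One way to picture the human side of this example is to collapse each one-second window of events into a handful of numbers a person can actually read. The sketch below is an illustration only: the `WindowSummary` type, the `summarize_window` helper, and the synthetic event values are all assumptions, not part of any particular stream system.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class WindowSummary:
    count: int
    minimum: float
    maximum: float
    mean: float


def summarize_window(values: Iterable[float]) -> WindowSummary:
    """Reduce one window of event values to four numbers in a single pass."""
    count = 0
    total = 0.0
    minimum = float("inf")
    maximum = float("-inf")
    for value in values:
        count += 1
        total += value
        minimum = min(minimum, value)
        maximum = max(maximum, value)
    if count == 0:
        raise ValueError("cannot summarize an empty window")
    return WindowSummary(count, minimum, maximum, total / count)


# Synthetic stand-in for one second of events: values cycle through 0..999.
window = [float(i % 1000) for i in range(1_000_000)]
summary = summarize_window(window)
print(summary)
```

The system still processes every event; the human sees one summary line per second instead of a million rows.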
So now what?
• In addition to “traditional” scalability, it is important to consider “big data” research in:
o Summarization
o Personalization
o Ranking
o Recommender Systems
o Visual Analytics
o Crowd-sourcing
o …
It is important!
Thank you
Further reading:
Big Data and Its Technical Challenges
http://bit.ly/bigdatachallenges
By Jagadish, Gehrke, Labrinidis, Papakonstantinou,
Patel, Ramakrishnan, and Shahabi,
Communications of the ACM, July 2014

Editor's Notes

  • #2 We have faced Big Data challenges for over four decades, though the meaning of “Big” has been evolving.
  • #6, #7 Unless you have been living under a rock for the last few years, you must have already seen or heard of the 3 Vs.
  • #8 My take on the big data definition and problem is as follows
  • #12 This is the main take-away from this talk