THE ELEPHANT IN THE ROOM:
WHEN DID DATA GET SO BIG?
(c) 2013 Ian Brown 2
WE’LL TALK ABOUT
• What is Big Data? What makes it "Big"?
• Who needs Big Data? Where does it come from?
• How does Big Data work? What are the tools and the issues?
• What does the future of Big Data look like?
WHAT IS BIG DATA?
• To some extent “Big” really means
“Difficult to handle”
• Something of a misnomer: it is not only about size. Three things distinguish big data:
• Volume (how much capacity you need to process/store)
• Velocity (how quickly you need to process updates)
• Variety (how complicated/non-standard the data is)
[Figure: Volume, Velocity and Variety combining into BIG DATA]
source: datasciencecentral
UNITS
Source: www.wikipedia.com
VOLUME
• From pre-history to 2004 the world generated around 5 exabytes of data; we now produce that amount every 2 days
• Data volumes are huge and growing: 1.8 zettabytes in 2011
• = 1'800 exabytes
• = 1.8 billion terabytes
• Data is predicted to grow x44 by 2020
• >40% every year
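The unit and growth arithmetic on this slide can be checked in a few lines. This sketch assumes decimal SI prefixes (1 ZB = 10^21 bytes) and treats the x44 figure as compound annual growth. The slide does not say which year the x44 is measured from, so the base years in the loop are assumptions; any base year from 2009 onwards gives a rate above 40%.

```python
# Decimal (SI) byte prefixes, as used in the volume figures above.
UNITS = {"TB": 10**12, "PB": 10**15, "EB": 10**18, "ZB": 10**21}


def convert(amount, from_unit, to_unit):
    """Convert a byte quantity between SI-prefixed units."""
    return amount * UNITS[from_unit] / UNITS[to_unit]


def annual_growth_rate(multiple, years):
    """Compound annual growth rate that yields `multiple` after `years`."""
    return multiple ** (1 / years) - 1


if __name__ == "__main__":
    print(convert(1.8, "ZB", "EB"))  # about 1'800 exabytes
    print(convert(1.8, "ZB", "TB"))  # about 1.8 billion terabytes
    # Assumed base years: the slide only says "by 2020".
    for base_year in (2009, 2011):
        rate = annual_growth_rate(44, 2020 - base_year)
        print(base_year, f"{rate:.0%} per year")
```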
VOLUME
• Data has sometimes been "big" for some people in the past, but it is now potentially big for everyone, and it is getting bigger every day
• Sources include networks (voice/data/video), social networks, sensors & transducers, GPS, banking, logistics, trade etc.
• 90% of the world's digital data was generated in the last 2 years (source: IBM 2012)
VARIETY (Variability)
• Governments and corporates have always had big databases, but the data has always been structured: invoices, customers, inventory etc.
• Of the huge increase in data just mentioned, only 10-20% will be structured; the rest (80-90%) will be unstructured:
• Video, email, social media, audio, images/scanned material
• Traditional SQL databases (the clue is in the S, for "Structured") don't do well with this sort of mixed data
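Here is a toy sketch of the structured/unstructured contrast. A fixed schema, standing in for a SQL table, accepts only records with exactly the expected fields. The mixed records typical of social media, email and scanned material do not fit it. All field names and records are invented for illustration.

```python
# A fixed schema, standing in for the columns of a traditional SQL table.
INVOICE_SCHEMA = {"invoice_id", "customer", "amount"}

records = [
    {"invoice_id": 1, "customer": "ACME", "amount": 120.0},        # structured
    {"invoice_id": 2, "customer": "Globex", "amount": 75.5},       # structured
    {"tweet": "Loving the new store!", "hashtags": ["retail"]},    # social media
    {"email_subject": "Re: order", "body": "Can you resend it?"},  # email
    {"image_bytes": b"\x89PNG...", "caption": None},               # scanned/image (placeholder bytes)
]


def fits_schema(record, schema):
    """True if the record has exactly the schema's fields (no more, no fewer)."""
    return set(record) == schema


structured = [r for r in records if fits_schema(r, INVOICE_SCHEMA)]
unstructured = [r for r in records if not fits_schema(r, INVOICE_SCHEMA)]

print(f"{len(structured)} fit the table, {len(unstructured)} do not")
```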
VELOCITY
• Data now arrives constantly from global sources, which makes processing it a 24x7 problem.
• Q. When do you stop to summarise/analyse? At what point do you cut off for the day/week/period to run a report or plan the next action?
• A. Sometimes you can't! Analysis, processing and action may have to happen on the streaming data itself, with corrections or actions taken on-the-fly, sometimes without storing the data at all!
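One way to act on streaming data without keeping all of it is a fixed-size sliding window. Each new reading updates a running average and may trigger an action, and older readings are discarded. The window size, threshold and readings below are hypothetical.

```python
from collections import deque


class SlidingWindowMonitor:
    """Keeps only the last `size` readings and flags when their mean exceeds `threshold`."""

    def __init__(self, size, threshold):
        self.window = deque(maxlen=size)  # older readings fall off automatically
        self.total = 0.0
        self.threshold = threshold

    def observe(self, value):
        """Process one reading on-the-fly; return True if action is needed."""
        if len(self.window) == self.window.maxlen:
            self.total -= self.window[0]  # the oldest reading is about to be evicted
        self.window.append(value)
        self.total += value
        return self.total / len(self.window) > self.threshold


monitor = SlidingWindowMonitor(size=3, threshold=10.0)
for reading in [4, 8, 9, 15, 20, 2]:
    if monitor.observe(reading):
        print(f"act now: rolling mean above threshold after reading {reading}")
```

Only three readings are ever held in memory, however long the stream runs.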
source: datasciencecentral
HASN’T DATA ALWAYS
BEEN “BIG”?
• Maybe.
• Historically, computing was done in "batches": stacks of punchcards or reels of tape (first paper, then magnetic) were processed one file at a time. This had to be done while the business was "closed".
• If you closed at 18:00 and opened the next day at 09:00, you had a window of 15 hours to do all your calculations and reports before you had to stop and open for the next day's business.
• If you couldn't get it done in 15 hours, your data was "big"
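The batch-window test reduces to simple arithmetic: work out how long the overnight window is, then compare it with the time the job needs at a given throughput. The record counts and throughput figures here are invented for illustration.

```python
from datetime import time


def window_hours(close, open_next):
    """Hours between closing time and next day's opening (wraps past midnight)."""
    close_h = close.hour + close.minute / 60
    open_h = open_next.hour + open_next.minute / 60
    return (open_h - close_h) % 24


def is_big(records, records_per_hour, close=time(18, 0), open_next=time(9, 0)):
    """Data is 'big' (in the batch sense) if processing overruns the window."""
    return records / records_per_hour > window_hours(close, open_next)


print(window_hours(time(18, 0), time(9, 0)))                # 15.0
print(is_big(records=1_000_000, records_per_hour=100_000))  # needs 10 h: False
print(is_big(records=2_000_000, records_per_hour=100_000))  # needs 20 h: True
```

Doubling the throughput turns the second case back into "small" data, which is the relative point made next.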
• Hence this is a relative question of how much data vs how
much computing you can throw at it
• For more than three decades we have seen a constant
increase in computing power which made the data
generated by most businesses through their local customers
look “small”
• Then the Web happened ....
• Initially, Web 1.0 and eCommerce exposed servers to many millions of events: "hits" on web sites, logs and emails. They also applied a global multiplier to who could be a customer and access your system. Analysing who was searching for what, and who was buying what, absorbed a lot of computing capacity.
• Web 2.0 has added hundreds of millions of social networking users, all broadcasting data in the form of photos, tweets, status updates, blog posts etc. This has created a truly vast ocean of data which can be trawled to learn about our behaviours, beliefs and likely future actions.
• If you want to process this data, it certainly has volume. It doesn't stop coming at you when you close for the night, so it has tremendous velocity. And if you are pulling it in from several sources, it quickly starts to exhibit complexity and variety.
• Traditional hardware/software has not kept pace with the growth of volume/velocity/variety
WHO NEEDS BIG DATA?
• Generally: anyone who can derive a “big picture” insight by adding up all the small data
points and “zooming out”
• How much can you say about one tweet? A thousand tweets?
• Twitter is generating > 9'000 tweets/sec; at that rate another billion tweets arrive roughly every 31 hours.
Source: www.statisticbrain.com (2012)
• What you “reckon” changes into sentiment analysis
Source: Flickr
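Here is a toy illustration of "zooming out". Scoring each tweet against a tiny word list says little about any single tweet, but averaging over many tweets gives an aggregate sentiment figure. The lexicon and tweets are hypothetical, and real sentiment analysis uses far richer models than this word count.

```python
# Hypothetical sentiment lexicon: +1 for positive words, -1 for negative ones.
LEXICON = {"love": 1, "great": 1, "happy": 1, "hate": -1, "awful": -1, "late": -1}


def tweet_score(text):
    """Sum the lexicon scores of the words in one tweet."""
    words = text.lower().replace("!", "").replace(".", "").split()
    return sum(LEXICON.get(word, 0) for word in words)


def overall_sentiment(tweets):
    """Average per-tweet score: the 'zoomed out' view across many tweets."""
    return sum(tweet_score(t) for t in tweets) / len(tweets)


tweets = [
    "Love the new phone!",
    "Train is late again. Awful.",
    "Great service today",
    "Meh",
    "So happy with my order",
]
print(overall_sentiment(tweets))
```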
THE SCALE CHANGES THINGS
• Big Data may be analogous to the
difference between the insight in
a picture vs. a video
Source: slowmotionrunninghorse.com
WHY CARE?
• Governments - release of open data: McKinsey est. $300m per year savings in US, $100m savings in Europe
• Banks - fraud detection, algorithmic trading (losses/profits): algorithmic trading accounts for about 2/3 of the ~7 bn US shares traded a day
• Life Sciences - genomics, drug research: it took 10 years to sequence the human genome
• Retailers - buying patterns, CRM, "if you like this..." cross-selling
• Social - Google, Facebook, LinkedIn, Twitter, Amazon, eBay: insight!
• Networks - load management/routing, protecting networks
• Probabilistic outcomes - Google Flu predictions (Nature, 2009)
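The retail "if you like this..." idea can be sketched as item co-occurrence counting. Items that are frequently bought together with a given item become its cross-selling suggestions. This is only one simple technique among many, and the baskets below are invented.

```python
from collections import Counter
from itertools import combinations

baskets = [
    {"tent", "sleeping bag", "torch"},
    {"tent", "sleeping bag"},
    {"torch", "batteries"},
    {"tent", "torch"},
    {"tent", "sleeping bag", "stove"},
]

# Count how often each pair of items appears in the same basket (both directions).
pair_counts = Counter()
for basket in baskets:
    for a, b in combinations(sorted(basket), 2):
        pair_counts[(a, b)] += 1
        pair_counts[(b, a)] += 1


def recommend(item, n=2):
    """Items most often bought together with `item`."""
    scores = Counter({other: c for (first, other), c in pair_counts.items() if first == item})
    return [other for other, _ in scores.most_common(n)]


print(recommend("tent"))  # ['sleeping bag', 'torch']
```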
WHAT’S THE DIFFERENCE?
• EXHAUSTIVE
• SCRUFFY
• PRAGMATIC
Anything missing...?
Source: damfoundation.org
SO WHAT?
• Four key things have shifted:
• A shift from sampling to populations
• A shift from exactness to "gisting"
• A move from causality to correlation
• Data is no longer tied to the purpose for which it was collected
Data used to be small, exact and causal
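The shift "from sampling to populations" can be illustrated by comparing an estimate from a random sample with the exact figure computed over every record. The synthetic data (exponentially distributed purchase amounts with mean 50) and the sample size are assumptions made for this sketch. With real big data, the full-population pass is what a cluster makes affordable.

```python
import random

random.seed(42)  # fixed seed so the sketch is repeatable

# Synthetic "population": one purchase amount per customer (invented data, mean 50).
population = [random.expovariate(1 / 50) for _ in range(100_000)]

# Traditional approach: estimate the mean from a small random sample.
sample = random.sample(population, 500)
sample_mean = sum(sample) / len(sample)

# Big data approach: compute the mean over the whole population.
population_mean = sum(population) / len(population)

print(f"sample estimate:  {sample_mean:.2f}")
print(f"population value: {population_mean:.2f}")
```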
ASPECTS
Source: www.datasciencecentral.com
NEW SOURCES OF DATA
• Information is now gathered on events and values that were not
traditionally thought of as data:
• Current location (vs. address)
• Whether you “like” someone else’s post
• Things you nearly bought but didn’t
• How much energy your office needs now
• PLUS transactional systems, social media, sensors etc.
HOW DOES IT WORK?
• Is this just a big database running on a powerful machine?
• Not usually. Traditional databases don't scale to this level
• Many hands make light work: remember SETI@home?
• Split the work up and share it out between many nodes
• Key analysis perspectives:
• Real-time streaming data analysis (detect events and act)
• Business Intelligence (asking specific questions of the data)
• Data Mining (asking "is there anything interesting here?")
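The "split it up and share it out" idea can be sketched on one machine, with a pool of workers standing in for cluster nodes. The data is cut into chunks, each worker processes its own chunk, and the partial results are combined. Threads are used here only to keep the sketch self-contained. They show the structure of the approach rather than the speed-up, since a real cluster would run the chunks on separate machines. The log lines are invented.

```python
from concurrent.futures import ThreadPoolExecutor


def split(data, n_chunks):
    """Cut `data` into roughly equal chunks, one per worker."""
    size = -(-len(data) // n_chunks)  # ceiling division
    return [data[i:i + size] for i in range(0, len(data), size)]


def count_errors(chunk):
    """Work done on one 'node': count error lines in its chunk."""
    return sum(1 for line in chunk if "ERROR" in line)


log_lines = ["INFO ok", "ERROR disk full", "INFO ok", "ERROR timeout"] * 1000

with ThreadPoolExecutor(max_workers=4) as pool:
    partial_counts = list(pool.map(count_errors, split(log_lines, 4)))

print(partial_counts, sum(partial_counts))  # [500, 500, 500, 500] 2000
```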
PHYSICALLY
Source: Leons Petražickis, IBM Canada
WHAT ARE THE PIECES?
• HDFS, the Hadoop Distributed File System (modelled on Google's GFS)
• MapReduce (Google)
• Split the problem into chunks
• Spread it out over lots of (cheap) computing nodes
• Reassemble the answer from the parts
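The three steps above are classically illustrated with a word count. This single-process Python sketch mimics the map, shuffle and reduce phases. It is not Hadoop's API, only the same pattern in miniature.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit (word, 1) for every word in one chunk of input."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle: group all emitted values by key, as the framework would between nodes."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine the values for one key into a final answer."""
    return key, sum(values)


def map_reduce(chunks):
    emitted = (pair for chunk in chunks for pair in map_phase(chunk))
    return dict(reduce_phase(k, v) for k, v in shuffle(emitted).items())


chunks = ["big data is big", "data about data"]  # each chunk could live on a different node
print(map_reduce(chunks))  # {'big': 2, 'data': 3, 'is': 1, 'about': 1}
```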
LOGICALLY
Source: Leons Petražickis, IBM Canada
WHAT IS THE APPROACH?
• Somewhere to store it across different systems
• e.g. Distributed File System (HDFS) - batch mode
• Some way of specifying work in pieces/jobs
• e.g. MapReduce jobs on Hadoop (Yahoo), for low-level work
• e.g. Pig or Hive (for high-level apps/queries that translate to MapReduce), with Oozie to schedule and chain the jobs
• Some way of reading/processing in real time vs batch, e.g. HBase and Flume
• Some way of mining the data for trends/meaning (data mining/machine learning), e.g. Mahout
• Some way of getting data in/out of SQL databases, e.g. Sqoop
HOW MANY “CHUNKS”?
• eBay had 530 cores in 2010. It’s now in excess of 2’500
cores
• Yahoo has >4’000 cores
• Facebook have 23'000 cores with 20 PB of storage - be careful what you "like"...
• Google aren't telling... (24 PB of data / day)
• LinkedIn offer 100 bn recommendations / week
WHERE CAN I GET SOME!!
• IBM
• ORACLE
• MICROSOFT
• EMC
• Informatica
• Apache - Open source
• Amazon - elastic cloud computing / cloud-based Hadoop
• Small installations are free
THE FUTURE OF BIG DATA
THE FUTURE: HYPE CYCLE
WHERE ARE YOU?
TRENDS
• More data - MUCH MUCH MORE data
• Internet of Things (IoT) - instrumentation/measurement
• Smart energy meters (2005), RFID tags (1.3 bn in 2011, >30 bn in 2013)
• Each A380 engine generates 10 TB every 30 minutes: 640 TB on a JFK->London flight
• Big Science: genomics, pharmacology. The LHC experiment generates 40 TB/sec!!
• Much more video and unstructured data (~60% of Internet traffic will be video by 2015)
• The re-invention (or demise) of search/SEO
• The need to move from local big data to distributed big data and sense-making networks
• The rise of observation - the need to filter and gain more control
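The engine figure can be sanity-checked. At 10 TB per engine every 30 minutes, a four-engine A380 produces 80 TB per hour, so 640 TB implies about eight hours of flight, roughly consistent with a transatlantic crossing. The four-engine count is the A380's; the other numbers come from the slide.

```python
TB_PER_ENGINE_PER_HALF_HOUR = 10  # figure from the slide
ENGINES = 4  # the A380 is a four-engine aircraft


def flight_data_tb(hours):
    """Total engine data for a flight of the given length, in TB."""
    return TB_PER_ENGINE_PER_HALF_HOUR * 2 * ENGINES * hours


def hours_for(total_tb):
    """Flight time implied by a total data figure, in hours."""
    return total_tb / (TB_PER_ENGINE_PER_HALF_HOUR * 2 * ENGINES)


print(hours_for(640))  # 8.0
```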
Where does that leave your company?
source: sap.com
MAGIC BULLET?
• Hadoop probably won't replace your existing database
• It is very good at large files/data sets, so you may not see as much benefit with large volumes of small files/datasets
• It is very good at dealing with unstructured data, so if your data is largely structured, or can be made to look structured, you may be better off sticking with traditional databases
• It doesn't need to know how you want to query the data, which makes it very flexible; but if your queries are always the same, you may be able to stick with SQL databases and BI/DW systems
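Not needing to know the queries in advance is often called schema-on-read. Raw records are stored untouched, and each query applies its own interpretation when it reads them. Here is a minimal Python sketch of the idea, using an invented log format. It shows the principle only, not how Hadoop itself implements it.

```python
# Raw data stored as-is, with no schema imposed at load time (invented format).
raw_log = [
    "2013-05-01 10:02 user=alice action=view item=42",
    "2013-05-01 10:05 user=bob action=buy item=42",
    "2013-05-02 09:15 user=alice action=buy item=7",
]


def parse(line):
    """Interpret a raw line only when a query needs it."""
    date, _time, *fields = line.split()
    record = dict(field.split("=", 1) for field in fields)
    record["date"] = date
    return record


def purchases_by_user(lines):
    """Query 1: purchases per user."""
    counts = {}
    for rec in map(parse, lines):
        if rec["action"] == "buy":
            counts[rec["user"]] = counts.get(rec["user"], 0) + 1
    return counts


def events_per_day(lines):
    """Query 2, decided later: reads the same raw data differently."""
    counts = {}
    for rec in map(parse, lines):
        counts[rec["date"]] = counts.get(rec["date"], 0) + 1
    return counts


print(purchases_by_user(raw_log))  # {'bob': 1, 'alice': 1}
print(events_per_day(raw_log))     # {'2013-05-01': 2, '2013-05-02': 1}
```

If the queries never change, a fixed table designed around them up front, as in a traditional SQL database, is usually simpler.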
TWO THINGS WORTH REMEMBERING...
Questions?

Pollock and Snow "DEIA in the Scholarly Landscape, Session One: Setting Expec...Pollock and Snow "DEIA in the Scholarly Landscape, Session One: Setting Expec...
Pollock and Snow "DEIA in the Scholarly Landscape, Session One: Setting Expec...
National Information Standards Organization (NISO)
 
MARY JANE WILSON, A “BOA MÃE” .
MARY JANE WILSON, A “BOA MÃE”           .MARY JANE WILSON, A “BOA MÃE”           .
MARY JANE WILSON, A “BOA MÃE” .
Colégio Santa Teresinha
 
Advanced Java[Extra Concepts, Not Difficult].docx
Advanced Java[Extra Concepts, Not Difficult].docxAdvanced Java[Extra Concepts, Not Difficult].docx
Advanced Java[Extra Concepts, Not Difficult].docx
adhitya5119
 
বাংলাদেশ অর্থনৈতিক সমীক্ষা (Economic Review) ২০২৪ UJS App.pdf
বাংলাদেশ অর্থনৈতিক সমীক্ষা (Economic Review) ২০২৪ UJS App.pdfবাংলাদেশ অর্থনৈতিক সমীক্ষা (Economic Review) ২০২৪ UJS App.pdf
বাংলাদেশ অর্থনৈতিক সমীক্ষা (Economic Review) ২০২৪ UJS App.pdf
eBook.com.bd (প্রয়োজনীয় বাংলা বই)
 
Your Skill Boost Masterclass: Strategies for Effective Upskilling
Your Skill Boost Masterclass: Strategies for Effective UpskillingYour Skill Boost Masterclass: Strategies for Effective Upskilling
Your Skill Boost Masterclass: Strategies for Effective Upskilling
Excellence Foundation for South Sudan
 
How to Build a Module in Odoo 17 Using the Scaffold Method
How to Build a Module in Odoo 17 Using the Scaffold MethodHow to Build a Module in Odoo 17 Using the Scaffold Method
How to Build a Module in Odoo 17 Using the Scaffold Method
Celine George
 
Liberal Approach to the Study of Indian Politics.pdf
Liberal Approach to the Study of Indian Politics.pdfLiberal Approach to the Study of Indian Politics.pdf
Liberal Approach to the Study of Indian Politics.pdf
WaniBasim
 
South African Journal of Science: Writing with integrity workshop (2024)
South African Journal of Science: Writing with integrity workshop (2024)South African Journal of Science: Writing with integrity workshop (2024)
South African Journal of Science: Writing with integrity workshop (2024)
Academy of Science of South Africa
 
Pride Month Slides 2024 David Douglas School District
Pride Month Slides 2024 David Douglas School DistrictPride Month Slides 2024 David Douglas School District
Pride Month Slides 2024 David Douglas School District
David Douglas School District
 
June 3, 2024 Anti-Semitism Letter Sent to MIT President Kornbluth and MIT Cor...
June 3, 2024 Anti-Semitism Letter Sent to MIT President Kornbluth and MIT Cor...June 3, 2024 Anti-Semitism Letter Sent to MIT President Kornbluth and MIT Cor...
June 3, 2024 Anti-Semitism Letter Sent to MIT President Kornbluth and MIT Cor...
Levi Shapiro
 
Film vocab for eal 3 students: Australia the movie
Film vocab for eal 3 students: Australia the movieFilm vocab for eal 3 students: Australia the movie
Film vocab for eal 3 students: Australia the movie
Nicholas Montgomery
 
Azure Interview Questions and Answers PDF By ScholarHat
Azure Interview Questions and Answers PDF By ScholarHatAzure Interview Questions and Answers PDF By ScholarHat
Azure Interview Questions and Answers PDF By ScholarHat
Scholarhat
 

Recently uploaded (20)

DRUGS AND ITS classification slide share
DRUGS AND ITS classification slide shareDRUGS AND ITS classification slide share
DRUGS AND ITS classification slide share
 
Executive Directors Chat Leveraging AI for Diversity, Equity, and Inclusion
Executive Directors Chat  Leveraging AI for Diversity, Equity, and InclusionExecutive Directors Chat  Leveraging AI for Diversity, Equity, and Inclusion
Executive Directors Chat Leveraging AI for Diversity, Equity, and Inclusion
 
The basics of sentences session 5pptx.pptx
The basics of sentences session 5pptx.pptxThe basics of sentences session 5pptx.pptx
The basics of sentences session 5pptx.pptx
 
Exploiting Artificial Intelligence for Empowering Researchers and Faculty, In...
Exploiting Artificial Intelligence for Empowering Researchers and Faculty, In...Exploiting Artificial Intelligence for Empowering Researchers and Faculty, In...
Exploiting Artificial Intelligence for Empowering Researchers and Faculty, In...
 
S1-Introduction-Biopesticides in ICM.pptx
S1-Introduction-Biopesticides in ICM.pptxS1-Introduction-Biopesticides in ICM.pptx
S1-Introduction-Biopesticides in ICM.pptx
 
Thesis Statement for students diagnonsed withADHD.ppt
Thesis Statement for students diagnonsed withADHD.pptThesis Statement for students diagnonsed withADHD.ppt
Thesis Statement for students diagnonsed withADHD.ppt
 
A Independência da América Espanhola LAPBOOK.pdf
A Independência da América Espanhola LAPBOOK.pdfA Independência da América Espanhola LAPBOOK.pdf
A Independência da América Espanhola LAPBOOK.pdf
 
Natural birth techniques - Mrs.Akanksha Trivedi Rama University
Natural birth techniques - Mrs.Akanksha Trivedi Rama UniversityNatural birth techniques - Mrs.Akanksha Trivedi Rama University
Natural birth techniques - Mrs.Akanksha Trivedi Rama University
 
Pollock and Snow "DEIA in the Scholarly Landscape, Session One: Setting Expec...
Pollock and Snow "DEIA in the Scholarly Landscape, Session One: Setting Expec...Pollock and Snow "DEIA in the Scholarly Landscape, Session One: Setting Expec...
Pollock and Snow "DEIA in the Scholarly Landscape, Session One: Setting Expec...
 
MARY JANE WILSON, A “BOA MÃE” .
MARY JANE WILSON, A “BOA MÃE”           .MARY JANE WILSON, A “BOA MÃE”           .
MARY JANE WILSON, A “BOA MÃE” .
 
Advanced Java[Extra Concepts, Not Difficult].docx
Advanced Java[Extra Concepts, Not Difficult].docxAdvanced Java[Extra Concepts, Not Difficult].docx
Advanced Java[Extra Concepts, Not Difficult].docx
 
বাংলাদেশ অর্থনৈতিক সমীক্ষা (Economic Review) ২০২৪ UJS App.pdf
বাংলাদেশ অর্থনৈতিক সমীক্ষা (Economic Review) ২০২৪ UJS App.pdfবাংলাদেশ অর্থনৈতিক সমীক্ষা (Economic Review) ২০২৪ UJS App.pdf
বাংলাদেশ অর্থনৈতিক সমীক্ষা (Economic Review) ২০২৪ UJS App.pdf
 
Your Skill Boost Masterclass: Strategies for Effective Upskilling
Your Skill Boost Masterclass: Strategies for Effective UpskillingYour Skill Boost Masterclass: Strategies for Effective Upskilling
Your Skill Boost Masterclass: Strategies for Effective Upskilling
 
How to Build a Module in Odoo 17 Using the Scaffold Method
How to Build a Module in Odoo 17 Using the Scaffold MethodHow to Build a Module in Odoo 17 Using the Scaffold Method
How to Build a Module in Odoo 17 Using the Scaffold Method
 
Liberal Approach to the Study of Indian Politics.pdf
Liberal Approach to the Study of Indian Politics.pdfLiberal Approach to the Study of Indian Politics.pdf
Liberal Approach to the Study of Indian Politics.pdf
 
South African Journal of Science: Writing with integrity workshop (2024)
South African Journal of Science: Writing with integrity workshop (2024)South African Journal of Science: Writing with integrity workshop (2024)
South African Journal of Science: Writing with integrity workshop (2024)
 
Pride Month Slides 2024 David Douglas School District
Pride Month Slides 2024 David Douglas School DistrictPride Month Slides 2024 David Douglas School District
Pride Month Slides 2024 David Douglas School District
 
June 3, 2024 Anti-Semitism Letter Sent to MIT President Kornbluth and MIT Cor...
June 3, 2024 Anti-Semitism Letter Sent to MIT President Kornbluth and MIT Cor...June 3, 2024 Anti-Semitism Letter Sent to MIT President Kornbluth and MIT Cor...
June 3, 2024 Anti-Semitism Letter Sent to MIT President Kornbluth and MIT Cor...
 
Film vocab for eal 3 students: Australia the movie
Film vocab for eal 3 students: Australia the movieFilm vocab for eal 3 students: Australia the movie
Film vocab for eal 3 students: Australia the movie
 
Azure Interview Questions and Answers PDF By ScholarHat
Azure Interview Questions and Answers PDF By ScholarHatAzure Interview Questions and Answers PDF By ScholarHat
Azure Interview Questions and Answers PDF By ScholarHat
 

Big Data

  • 1. THE ELEPHANT IN THE ROOM: WHEN DID DATA GET SO BIG?
  • 2. (c) 2013 Ian Brown 2 we'll talk ABOUT
  • 5. WE’LL TALK ABOUT
    • What is Big Data? What makes it "Big"?
    • Who needs Big Data? Where does it come from?
    • How does Big Data work? What are the tools and the issues?
    • What does the future of Big Data look like?
  • 6. WHAT IS BIG DATA?
    • To some extent “Big” really means “Difficult to handle”
    • Something of a misnomer: it is not only about size. Three things distinguish big data:
      • Volume (how much capacity you need to process/store)
      • Velocity (how quickly you need to process updates)
      • Variety (how complicated/non-standard the data is)
    (Diagram labels: Volume, Velocity, Variety, BIG DATA)
  • 7. (Source: datasciencecentral)
  • 8. UNITS (Source: www.wikipedia.com)
  • 9. VOLUME
    • From pre-history to 2004 the world generated around 5 exabytes of data - we now produce that amount every 2 days
    • Data volumes are huge and growing: 1.8 zettabytes in 2011
      • = 1’800 exabytes (1.8 million petabytes)
      • = 1.8 billion terabytes
    • Data is predicted to grow 44-fold by 2020
      • >40% growth every year
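The unit conversions and the growth rate on this slide can be checked with simple arithmetic. This sketch makes two assumptions. First, it uses decimal (SI) units, in which each step from terabyte up to zettabyte is a factor of 1,000. Second, it takes the 44-fold growth to run over the 11 years from 2009 to 2020. The slide does not give a starting year, so treat that period as an assumption.

```python
# Decimal (SI) storage units in ascending order; each step is a factor of 1,000.
# Assumption: decimal units (1 ZB = 10**21 bytes), not binary ones (2**70 bytes).
UNITS = ["TB", "PB", "EB", "ZB"]


def convert(value: float, from_unit: str, to_unit: str) -> float:
    """Convert a quantity between decimal storage units."""
    shift = UNITS.index(from_unit) - UNITS.index(to_unit)
    return value * 1000 ** shift


def annual_growth_rate(multiple: float, years: int) -> float:
    """Compound annual growth rate implied by growing `multiple`-fold over `years`."""
    return multiple ** (1 / years) - 1


print(convert(1.8, "ZB", "EB"))  # 1800.0
print(convert(1.8, "ZB", "PB"))  # 1800000.0
print(convert(1.8, "ZB", "TB"))  # 1800000000.0
# Assumption: 44-fold growth over the 11 years 2009-2020.
print(f"{annual_growth_rate(44, 11):.0%}")  # 41%
```

Compounding about 41% a year for 11 years gives roughly a 44-fold increase, which matches the ">40% every year" figure on the slide.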
  • 10. VOLUME
    • Data has been “big” for some people at some times in the past, but it is now potentially big for everyone, and getting bigger every day
    • Sources include networks (voice/data/video), social networks, sensors & transducers, GPS, banking, logistics, trade etc.
    • 90% of the world’s digital data was created in the last 2 years (source: IBM 2012)
  • 11. VARIETY (Variability)
    • Governments and corporations have always had big databases, but that data has always been structured - invoices, customers, inventory etc.
    • Of the huge increase in data just mentioned, only 10-20% will be structured - the rest (80-90%) will be unstructured:
      • Video, email, social media, audio, images/scanned material
    • Traditional SQL databases (the clue is in the S, for “Structured”) don’t do well with this sort of mixed data
  • 12. VELOCITY
    • Data now comes at users constantly from global sources, which creates a 24x7 problem.
    • Q. When do you stop to summarise/analyse? At what point do you cut off for the day/week/period to run a report or plan the next action?
    • A. Sometimes you can’t! Analysis, processing and action may have to happen on the streaming data, with corrections or actions taken on the fly, sometimes without storing the data at all.
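Processing data "on the fly" usually means you keep only a small, bounded summary of recent values and never store the whole stream. The sketch below is one illustrative approach and is not taken from the talk. It flags a reading as an event when that reading exceeds a multiple of the moving average of the readings just before it. The function name `detect_spikes` and its `window` and `factor` parameters are hypothetical.

```python
from collections import deque
from typing import Iterable, Iterator, Tuple


def detect_spikes(
    stream: Iterable[float], window: int = 5, factor: float = 2.0
) -> Iterator[Tuple[int, float]]:
    """Yield (position, value) for each reading that exceeds `factor` times the
    mean of the previous `window` readings.

    Only the last `window` readings are held in memory, so the stream itself is
    never stored and may be unbounded.
    """
    recent: deque = deque(maxlen=window)
    total = 0.0  # running sum of the values currently in `recent`
    for position, value in enumerate(stream):
        full = len(recent) == window
        if full and value > factor * (total / window):
            yield position, value  # act on the event immediately
        if full:
            total -= recent[0]  # this value is about to fall out of the window
        recent.append(value)
        total += value


readings = [10, 11, 9, 10, 10, 35, 10, 12]
print(list(detect_spikes(readings)))  # [(5, 35)]
```

`detect_spikes` is a generator, so it can also consume an endless source such as a sensor feed. It reports each event as soon as the event arrives.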
  • 13. (Source: datasciencecentral)
  • 14. HASN’T DATA ALWAYS BEEN “BIG”?
    • Maybe.
    • Historically, computing was done in “batches”, where stacks of punchcards or reels of tape (first paper, then magnetic) were processed one file at a time. This had to be done while the business was “closed”.
    • If you closed at 18:00 and opened the next day at 09:00, you had a window of 15 hours to do all your calculations and reports before you had to stop and open for the next day’s business.
    • If you couldn’t get it done in 15 hours, your data was “big”.
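The overnight batch test on this slide is a simple comparison: the hours a job needs against the hours between closing and reopening. Only the 18:00 and 09:00 times come from the slide. The processing rate and record counts in the sketch below are made-up figures for illustration.

```python
from datetime import datetime, timedelta


def batch_window(close: str, reopen: str) -> timedelta:
    """Time available between closing and reopening (HH:MM, 24-hour clock)."""
    start = datetime.strptime(close, "%H:%M")
    end = datetime.strptime(reopen, "%H:%M")
    if end <= start:  # reopening happens the following morning
        end += timedelta(days=1)
    return end - start


def is_big(records: int, records_per_hour: float, window: timedelta) -> bool:
    """In the batch-era sense, data is 'big' if processing overruns the window."""
    hours_needed = records / records_per_hour
    return hours_needed > window / timedelta(hours=1)


window = batch_window("18:00", "09:00")
print(window)  # 15:00:00
# Hypothetical workload: one million records processed per hour.
print(is_big(10_000_000, 1_000_000, window))  # False: 10 hours fits
print(is_big(20_000_000, 1_000_000, window))  # True: 20 hours does not
```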
  • 15. HASN’T DATA ALWAYS BEEN “BIG”?
    • Hence this is a relative question: how much data you have vs how much computing you can throw at it
    • For more than three decades we have seen a constant increase in computing power, which made the data generated by most businesses through their local customers look “small”
    • Then the Web happened ....
  • 16. HASN’T DATA ALWAYS BEEN “BIG”?
    • Initially, Web 1.0 and eCommerce opened servers up to many millions of events (“hits” on web sites, logs, emails) and a global multiplier on who could be a customer and access your system. Analysing who was searching for what, and who was buying what, absorbed a lot of computing capacity.
    • Web 2.0 has added hundreds of millions of social networking users, all broadcasting data as photos, tweets, status updates, blog posts etc. This has created a truly vast ocean of data, which can be trawled to learn about our behaviours, beliefs and likely future actions.
    • If you want to process this data, it certainly has volume. It doesn’t stop coming at you when you close for the night, so it has tremendous velocity. If you are pulling it in from several sources, it quickly starts to exhibit complexity and variety.
    • Traditional hardware/software has not kept pace with the growth of volume/velocity/variety
  • 17. WHO NEEDS BIG DATA?
    • Generally: anyone who can derive a “big picture” insight by adding up all the small data points and “zooming out”
    • How much can you say about one tweet? A thousand tweets?
    • Twitter is generating > 9’000 tweets/sec (around 778 million a day), so another billion tweets arrive roughly every 1.3 days. Source: www.statisticbrain.com (2012)
    • What you “reckon” turns into sentiment analysis
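Turning what people "reckon" into sentiment analysis means scoring each small data point and then zooming out to the aggregate. The toy sketch below scores tweets against two tiny word lists. The word lists and function names are hypothetical. Real sentiment analysis uses far larger lexicons or trained models that can handle negation, sarcasm and context.

```python
import re
from collections import Counter
from typing import Iterable

# Hypothetical, deliberately tiny word lists for illustration only.
POSITIVE = {"love", "great", "happy", "good"}
NEGATIVE = {"hate", "awful", "sad", "bad"}


def tweet_score(text: str) -> int:
    """+1 for each positive word, -1 for each negative word."""
    words = re.findall(r"[a-z']+", text.lower())
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)


def overall_sentiment(tweets: Iterable[str]) -> Counter:
    """Zoom out: tally how many tweets lean positive, negative or neutral."""
    tally: Counter = Counter()
    for tweet in tweets:
        score = tweet_score(tweet)
        label = "positive" if score > 0 else "negative" if score < 0 else "neutral"
        tally[label] += 1
    return tally


tweets = ["I love this phone!", "Awful battery, bad screen", "Just arrived"]
print(overall_sentiment(tweets))
```

Any single tweet says little on its own. The tally becomes meaningful once it runs over millions of tweets.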
  • 18. WHO NEEDS BIG DATA? (Source: Flickr)
  • 19. THE SCALE CHANGES THINGS
    • Big Data may be analogous to the difference between the insight you get from a picture and from a video (Source: slowmotionrunninghorse.com)
  • 21. WHY CARE?
    • Governments - release of open data: McKinsey est. $300m per year savings in US, $100m savings in Europe
    • Banks - fraud detection, algorithmic trading (losses/profits): algorithmic trading accounts for about 2/3 of the 7bn US shares traded a day
    • Life Sciences - genomics, drug research: it took 10 years to sequence the human genome
    • Retailers - buying patterns, CRM, “if you like this ...” cross-selling
    • Social - Google, Facebook, LinkedIn, Twitter, Amazon, eBay: insight!
    • Networks - load management/routing, protecting networks
    • Probabilistic outcomes - Google Flu predictions (Nature: 2009)
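The retailers' "if you like this ..." cross-selling can be illustrated with the simplest possible technique: counting which items appear together in the same basket. This is a hypothetical sketch and does not describe how any particular retailer works. Production recommenders use far more sophisticated methods.

```python
from collections import Counter, defaultdict
from itertools import permutations
from typing import Dict, Iterable, List


def co_occurrence(baskets: Iterable[Iterable[str]]) -> Dict[str, Counter]:
    """For each item, count how often every other item shares a basket with it."""
    together: Dict[str, Counter] = defaultdict(Counter)
    for basket in baskets:
        # Each ordered pair of distinct items in the basket is counted once.
        for a, b in permutations(sorted(set(basket)), 2):
            together[a][b] += 1
    return together


def recommend(together: Dict[str, Counter], item: str, k: int = 2) -> List[str]:
    """'If you like this ...': the k items most often bought alongside `item`."""
    return [other for other, _ in together[item].most_common(k)]


baskets = [["tent", "stove"], ["tent", "stove", "lamp"], ["tent", "lamp"], ["tent", "stove"]]
pairs = co_occurrence(baskets)
print(recommend(pairs, "tent"))  # ['stove', 'lamp']
```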
  • 22. WHAT’S THE DIFFERENCE?
    • EXHAUSTIVE
    • SCRUFFY
    • PRAGMATIC
    • Anything missing ...? (Source: damfoundation.org)
  • 23. SO WHAT?
    • Three key things have shifted:
      • From sampling to populations
      • From exactness to “gisting”
      • From causality to correlation
    • Data is no longer tied to the purpose for which it was collected
    • Data used to be small, exact and causal
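The shift from sampling to populations can be shown in a few lines. Traditionally you estimated a figure from a random sample. With enough storage and computing, you can simply calculate it over every record. The population below is synthetic and the sample size is arbitrary; both are assumptions for illustration.

```python
import random
import statistics
from typing import List, Tuple


def sample_vs_population(
    population: List[float], sample_size: int, seed: int = 0
) -> Tuple[float, float]:
    """Return (estimate from a random sample, exact value over the whole population)."""
    rng = random.Random(seed)  # seeded so the example is repeatable
    sample = rng.sample(population, sample_size)
    return statistics.fmean(sample), statistics.fmean(population)


population = list(range(1, 1_000_001))  # synthetic data: the values 1..1,000,000
estimate, exact = sample_vs_population(population, 10_000)
print(f"sample estimate: {estimate:,.1f}  exact: {exact:,.1f}")
```

The sample estimate lands close to the true mean of 500,000.5 but does not match it exactly. The exhaustive calculation gives the exact answer, at the cost of reading every record.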
  • 24. ASPECTS (Source: www.datasciencecentral.com)
  • 25. NEW SOURCES OF DATA
    • Information is now gathered on events and values that were not traditionally thought of as data:
      • Current location (vs. address)
      • Whether you “like” someone else’s post
      • Things you nearly bought but didn’t
      • How much energy your office needs now
    • PLUS transactional systems, social media, sensors etc.
  • 26. HOW DOES IT WORK?
    • Is this just a big database running on a powerful machine?
    • Not usually. Traditional databases don’t scale to this
    • Many hands make light work: remember S.E.T.I.?
    • Split it up and share it out between many nodes
    • Key analysis perspectives:
      • Real-time streaming data analysis (detect events and act)
      • Business Intelligence (asking specific questions of the data)
      • Data Mining (asking “is there anything interesting here?”)
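"Split it up and share it out" is the core idea. You divide the data into chunks, let each worker compute a partial answer independently, then combine the partial answers. The sketch below shows the pattern on one machine with a thread pool, and the function names are hypothetical. In a real cluster each chunk would go to a separate node. In CPython, CPU-bound work would also need processes rather than threads to run truly in parallel.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Sequence


def chunked(data: Sequence[int], n_chunks: int) -> List[Sequence[int]]:
    """Split `data` into at most `n_chunks` roughly equal slices."""
    size = max(1, -(-len(data) // n_chunks))  # ceiling division, at least 1
    return [data[i:i + size] for i in range(0, len(data), size)]


def count_matches(chunk: Sequence[int], target: int) -> int:
    """The per-node job: count occurrences of `target` in one chunk."""
    return sum(1 for x in chunk if x == target)


def parallel_count(data: Sequence[int], target: int, workers: int = 4) -> int:
    """Share the chunks out to workers, then add up their partial counts."""
    chunks = chunked(data, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = pool.map(count_matches, chunks, [target] * len(chunks))
        return sum(partials)  # reassemble the answer from the parts


data = [i % 10 for i in range(1_000)]
print(parallel_count(data, 3))  # 100
```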
  • 27. PHYSICALLY (Source: Leons Petražickis, IBM Canada)
  • 28. WHAT ARE THE PIECES?
    • HDFS: the Hadoop Distributed File System, modelled on the Google File System (GFS)
    • MapReduce (Google):
      • Split the problem into chunks
      • Spread it out over lots of (cheap) computing nodes
      • Reassemble the answer from the parts
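The three MapReduce steps on this slide can be illustrated with the classic word-count example. This single-process sketch only mimics the shape of the model. In Hadoop, the map and reduce functions run on many nodes, and the framework groups the intermediate results between them (the "shuffle"). The function names here are illustrative and are not part of any Hadoop API.

```python
from collections import defaultdict
from itertools import chain
from typing import Dict, Iterable, List, Tuple


def map_phase(document: str) -> List[Tuple[str, int]]:
    """Map: emit (word, 1) for every word in one input split."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group all emitted values by key (done by the framework in Hadoop)."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce: combine all values for one key into a final result."""
    return key, sum(values)


def word_count(documents: Iterable[str]) -> Dict[str, int]:
    mapped = chain.from_iterable(map_phase(d) for d in documents)
    return dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())


print(word_count(["big data big", "data is big"]))  # {'big': 3, 'data': 2, 'is': 1}
```

Each `map_phase` call depends only on its own document, and each `reduce_phase` call depends only on one key's values. That independence is what lets the framework spread both steps across cheap nodes.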
  • 29. LOGICALLY (Source: Leons Petražickis, IBM Canada)
  • 30. WHAT IS THE APPROACH?
    • Somewhere to store it across different systems
      • e.g. Hadoop Distributed File System (HDFS) - batch mode
    • Some way of specifying work in pieces/jobs
      • e.g. Hadoop (Yahoo) or MapReduce (for low-level jobs)
      • e.g. Pig or Hive (for high-level apps/queries that translate to MapReduce), with Oozie to schedule workflows of jobs
    • Some way of reading/processing in real time vs batch, e.g. HBase and Flume
    • Some way of mining the data for trends/meaning (data mining/machine learning), e.g. Mahout
    • Some way of getting data in/out of SQL databases, e.g. Sqoop
  • 31. HOW MANY “CHUNKS”?
    • eBay had 530 cores in 2010; it now has in excess of 2’500 cores
    • Yahoo has >4’000 cores
    • Facebook has 23’000 cores with 20 PB of storage - be careful what you “like”...
    • Google aren’t telling .... (24 PB of data / day)
    • LinkedIn offers 100bn recommendations / week
  • 32. WHERE CAN I GET SOME!!
    • IBM
    • ORACLE
    • MICROSOFT
    • EMC
    • Informatica
    • Apache - open source
    • Amazon - elastic computing / cloud-based Hadoop
    • Small installations are free
  • 33. THE FUTURE OF BIG DATA
  • 34. THE FUTURE: HYPE CYCLE
  • 35.
  • 36. WHERE ARE YOU?
  • 37. TRENDS
    • More data - MUCH MUCH MORE data
    • Internet of Things (IoT) - instrumentation/measurement
      • Smart energy meters 2005, RFID tags (1.3bn 2011 >30bn 2013)
      • Each A380 engine generates 10 TB every 30 minutes: 640 TB on a JFK -> London flight
    • Big Science: genomics, pharmacology. The LHC experiment generates 40 TB/sec!!
    • Much more video and unstructured stuff (~60% of Internet traffic to be video by 2015)
    • The re-invention (or demise) of search/SEO
    • The need to move from local big data to distributed big data and sense-making networks
    • The rise of Observation - the need to filter and gain more control
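The A380 figure is consistent with simple multiplication, given two assumptions the slide does not spell out. The first is four engines, since the A380 is a four-engine aircraft. The second is a flight of about 8 hours. The flight time is only an assumption: a 7-hour crossing would give 560 TB instead.

```python
def flight_data_tb(
    hours: float, engines: int = 4, tb_per_engine_per_half_hour: float = 10.0
) -> float:
    """Engine data generated on a flight, given a per-engine rate per 30 minutes."""
    half_hours = hours * 2
    return tb_per_engine_per_half_hour * half_hours * engines


print(flight_data_tb(8))  # 640.0 (assumed 8-hour flight)
print(flight_data_tb(7))  # 560.0
```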
  • 38. Where does that leave your company? (Source: sap.com)
  • 39. MAGIC BULLET?
    • Hadoop probably won’t replace your existing database
    • It is very good with large files/data sets, so you may not see as much benefit from large volumes of small files/data sets
    • It is very good at dealing with unstructured data, so if your data is largely structured, or can be made to look structured, you may be better off sticking with traditional databases
    • It doesn’t need to know how you want to query the data, which makes it very flexible; but if your queries are always the same, you may be able to stick with SQL databases and BI/DW systems
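One reason Hadoop favours large files is that HDFS keeps metadata for every file and every block in the NameNode's memory. The sketch below counts those metadata objects for roughly a terabyte stored two ways: as one file, and as ten million small files. It assumes a 128 MiB block size, which is the default in Hadoop 2 and later. It also assumes the commonly quoted rule of thumb of about 150 bytes of NameNode memory per object. Neither figure comes from the talk.

```python
BLOCK_SIZE = 128 * 1024**2  # assumed HDFS block size: 128 MiB
BYTES_PER_OBJECT = 150      # rule-of-thumb NameNode memory per file or block (assumption)


def namenode_objects(n_files: int, file_size: int) -> int:
    """Metadata objects = one per file plus one per block the file occupies."""
    blocks_per_file = (file_size + BLOCK_SIZE - 1) // BLOCK_SIZE  # ceiling division
    return n_files * (1 + blocks_per_file)


one_big = namenode_objects(1, 10**12)              # a single 1 TB file
many_small = namenode_objects(10_000_000, 100_000)  # ten million 100 KB files (1 TB total)

print(f"one 1 TB file:      {one_big:,} objects, ~{one_big * BYTES_PER_OBJECT / 1e6:.1f} MB")
print(f"10M x 100 KB files: {many_small:,} objects, ~{many_small * BYTES_PER_OBJECT / 1e9:.1f} GB")
```

Both layouts hold the same total amount of data. The small-file layout, however, needs thousands of times more NameNode metadata.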
  • 40. TWO THINGS WORTH REMEMBERING ..
  • 41. Questions?