THE ELEPHANT IN THE ROOM:
WHEN DID DATA GET SO BIG?
(c) 2013 Ian Brown 2
WE’LL TALK ABOUT
• What is Big Data? What makes it "Big"?
• Who needs Big Data? Where does it come from?
• How does Big Data work? What are the tools and the issues?
• What does the future of Big Data look like?
WHAT IS BIG DATA?
• To some extent “Big” really means “difficult to handle”
• Something of a misnomer: it is not only about size. Three things distinguish big data:
• Volume (how much capacity you need to process/store it)
• Velocity (how quickly you need to process updates)
• Variety (how complicated/non-standard the data is)
[Figure: Volume, Velocity and Variety together make up BIG DATA]
source: datasciencecentral
UNITS
Source: www.wikipedia.com
VOLUME
• From pre-history to 2004 the world generated around 5 exabytes
of data - we now produce that amount every 2 days
• Data volumes are huge and growing: 1.8 zettabytes in 2011
• = 1’800 exabytes (1.8 million petabytes)
• = 1.8 billion terabytes
• Data is predicted to grow x44 by 2020
• >40% every year
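To check the volume figures, the conversions can be written out. This is a minimal sketch, assuming decimal (SI) units in which each step up is a factor of 1,000, and assuming the x44 forecast is measured over the 11 years from 2009 to 2020 (the slide does not state the baseline year); the function names are illustrative only.

```python
# Decimal (SI) byte units: each step up the list is a factor of 1000.
UNITS = ["B", "KB", "MB", "GB", "TB", "PB", "EB", "ZB"]


def to_bytes(value: float, unit: str) -> float:
    """Convert a value in the given SI unit to bytes."""
    return value * 1000 ** UNITS.index(unit)


def convert(value: float, src: str, dst: str) -> float:
    """Convert a value between two SI byte units."""
    return to_bytes(value, src) / 1000 ** UNITS.index(dst)


def annual_growth(multiple: float, years: int) -> float:
    """Compound annual growth rate implied by growing `multiple`-fold over `years`."""
    return multiple ** (1 / years) - 1


if __name__ == "__main__":
    for unit in ("EB", "PB", "TB"):
        print(f"1.8 ZB = {convert(1.8, 'ZB', unit):,.0f} {unit}")
    # Assumption: the x44 forecast runs from a 2009 baseline to 2020 (11 years).
    print(f"x44 over 11 years = {annual_growth(44, 11):.1%} per year")
```

Under these assumptions 1.8 ZB works out to 1,800 EB, 1.8 million PB or 1.8 billion TB, and 44-fold growth over 11 years is about 41% a year, consistent with the ">40% every year" bullet. If the baseline were 2011 instead, `annual_growth(44, 9)` gives a higher rate of roughly 52%.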
VOLUME
• Whilst data has been “big” for some people in the past, it is now potentially big for everyone, and getting bigger every day
• Sources are networks (voice/data/video), social networks,
sensors & transducers, GPS, banking, logistics, trade etc
• 90% of the World’s digital data was gathered in the last 2
years (source: IBM 2012)
VARIETY (Variability)
• Governments and corporations have always had big databases, but the data has always been structured: invoices, customers, inventory etc.
• Of the huge increase in data just mentioned, only 10-20% will be structured; the rest (80-90%) will be unstructured:
• Video, email, social media, audio, images/scanned material
• Traditional SQL databases (the clue is in the S) don’t do well
with this sort of mixed data
VELOCITY
• Data now comes at users constantly from global sources, which makes it a 24x7 problem.
• Q. When do you stop to summarise/analyse? At what point do you cut off for the day/week/period to run a report or plan the next action?
• A. Sometimes you can’t! Analysis, processing and action may have to happen on streaming data, with corrections or actions taken on the fly, sometimes without storing the data at all!
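Processing on the fly without storing the data usually means keeping a small running summary instead of the raw feed. As one illustration (not a method the slide prescribes), Welford's online algorithm updates a count, mean and variance from each new reading and then discards it; the generator name `running_stats` is hypothetical.

```python
from typing import Iterable, Iterator


def running_stats(stream: Iterable[float]) -> Iterator[tuple[int, float, float]]:
    """Yield (count, mean, sample variance) after each reading (Welford's algorithm).

    Only three numbers are kept in memory however long the stream runs,
    so the raw readings never need to be stored.
    """
    count, mean, m2 = 0, 0.0, 0.0
    for x in stream:
        count += 1
        delta = x - mean
        mean += delta / count
        m2 += delta * (x - mean)  # second factor uses the updated mean
        variance = m2 / (count - 1) if count > 1 else 0.0
        yield count, mean, variance


if __name__ == "__main__":
    # A short list stands in for an endless sensor or transaction feed.
    readings = iter([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
    for count, mean, var in running_stats(readings):
        pass  # in practice: act on the latest summary here
    print(count, round(mean, 3), round(var, 3))
```

Memory use stays constant whether the stream delivers eight readings or eight billion, which is the property that makes this kind of summary workable on a 24x7 feed.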
source: datasciencecentral
HASN’T DATA ALWAYS BEEN “BIG”?
• Maybe.
• Historically, computing was done in “batches”: stacks of punchcards or reels of tape (first paper, then magnetic) were processed one file at a time. This had to be done when the business was “closed”.
• If you closed at 18:00 and opened the next day at 09:00 you had a
window of 15 hours to do all your calculations and reports before
you had to stop and open for the next day’s business.
• If you couldn’t get it done in 15 hours your data was “big”
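The batch-window test can be stated as simple arithmetic: data is “big” if the time needed to process it exceeds the time the business is closed. The sketch below assumes a constant processing throughput; `records_per_hour`, the function names and the example figures are hypothetical.

```python
from datetime import datetime, timedelta


def batch_window(close: str, reopen: str) -> timedelta:
    """Length of the overnight window between closing and reopening (HH:MM, 24h clock)."""
    fmt = "%H:%M"
    start = datetime.strptime(close, fmt)
    end = datetime.strptime(reopen, fmt)
    if end <= start:  # reopening happens on the next day
        end += timedelta(days=1)
    return end - start


def is_big(records: int, records_per_hour: float, window: timedelta) -> bool:
    """Data is 'big' if processing it at the assumed rate overruns the window."""
    hours_needed = records / records_per_hour
    return hours_needed > window.total_seconds() / 3600


if __name__ == "__main__":
    window = batch_window("18:00", "09:00")
    print(window)
    print(is_big(2_000_000, 100_000, window))
```

Here `batch_window("18:00", "09:00")` gives 15 hours; at an assumed 100,000 records/hour, 2 million records need 20 hours, so `is_big` returns `True`, while 1 million records (10 hours) would fit. Doubling the computing power halves `hours_needed`, which is why the same data can stop being “big”.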
• Hence this is a relative question: how much data you have vs. how much computing you can throw at it
• For more than three decades we have seen a constant increase in computing power, which made the data generated by most businesses through their local customers look “small”
• Then the Web happened...
• Initially, Web 1.0 and eCommerce exposed servers to many millions of events (“hits” on web sites, log entries, emails) and made anyone in the world a potential customer with access to your systems. Analysing who was searching for what, and who was buying what, absorbed a lot of computing capacity.
• Web 2.0 has added hundreds of millions of social networking users, all broadcasting data in the form of photos, tweets, status updates, blog posts etc. This has created a truly vast ocean of data which can be trawled to learn about our behaviours, beliefs and likely future actions.
• If you want to process this data, it certainly has volume; it doesn’t stop coming at you when you close for the night, so it has tremendous velocity; and if you are pulling it in from several sources, it quickly starts to exhibit complexity and variety
• Traditional hardware/software has not kept pace with the growth in volume/velocity/variety
WHO NEEDS BIG DATA?
• Generally: anyone who can derive a “big picture” insight by adding up all the small data
points and “zooming out”
• How much can you say about one tweet? A thousand tweets?
• Twitter is generating > 9’000 tweets/sec, which means another billion tweets arrive in roughly 31 hours.
Source: www.statisticbrain.com (2012)
• What you “reckon” becomes sentiment analysis
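Two pieces of arithmetic sit behind this slide. At a sustained 9,000 tweets/sec, a billion tweets take 10^9 / 9,000 ≈ 111,000 seconds, or about 31 hours. “Zooming out” then means scoring each tweet and averaging over many. The lexicon-based scorer below is a deliberately naive, hypothetical stand-in for sentiment analysis (the word list and function names are invented for illustration); real systems use much richer models.

```python
SECONDS_PER_HOUR = 3600

# Hypothetical toy lexicon: +1 for positive words, -1 for negative ones.
LEXICON = {"love": 1, "great": 1, "good": 1, "bad": -1, "awful": -1, "hate": -1}


def hours_to_reach(total: int, per_second: float) -> float:
    """Hours needed to accumulate `total` items at a constant rate."""
    return total / per_second / SECONDS_PER_HOUR


def tweet_score(text: str) -> int:
    """Sum the lexicon scores of the words in one tweet (punctuation stripped)."""
    return sum(LEXICON.get(word.strip(".,!?").lower(), 0) for word in text.split())


def overall_sentiment(tweets: list[str]) -> float:
    """'Zoom out': the average score across many tweets."""
    return sum(tweet_score(t) for t in tweets) / len(tweets)


if __name__ == "__main__":
    print(f"{hours_to_reach(1_000_000_000, 9000):.1f} hours per billion tweets")
    sample = ["I love the new phone!", "Battery life is awful.", "Great camera, good screen"]
    print([tweet_score(t) for t in sample], round(overall_sentiment(sample), 2))
```

For the three sample tweets the individual scores are 1, -1 and 2, and `overall_sentiment` returns about 0.67. One tweet says little on its own; the aggregate over millions is what carries the “big picture” signal the slide describes.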
THE SCALE CHANGES THINGS
• Big Data may be analogous to the difference between the insight in a picture and in a video
Source: slowmotionrunninghorse.com
WHY CARE?
• Governments - release of open data: McKinsey estimates $300m per year savings in the US, $100m in Europe
• Banks - fraud detection, algorithmic trading (losses/profits): algorithmic trading accounts for about 2/3 of the 7bn US shares traded each day
• Life Sciences - genomics, drug research: it took 10 years to sequence the first human genome
• Retailers - buying patterns, CRM, “if you like this...”: cross-selling
• Social - Google, Facebook, LinkedIn, Twitter, Amazon, eBay: insight!
• Networks - load management/routing, protecting networks
• Probabilistic outcomes - Google Flu predictions (Nature, 2009)
WHAT’S THE DIFFERENCE?
• EXHAUSTIVE
• SCRUFFY
• PRAGMATIC
Anything missing ...?
Source: damfoundation.org
SO WHAT?
• Three key pieces have shifted:
• A shift from sampling to populations
• A shift from exactness to “gisting”
• A move from causality to correlation
• Data no longer tied to the purpose for which it was
collected
Data used to be small, exact and causal
ASPECTS
Source: www.datasciencecentral.com
NEW SOURCES OF DATA
• Information is now gathered on events and values that were not
traditionally thought of as data:
• Current location (vs. address)
• Whether you “like” someone else’s post
• Things you nearly bought but didn’t
• How much energy your office needs now
• PLUS transactional systems, social media, sensors etc.
HOW DOES IT WORK?
• Is this just a big database running on a powerful machine?
• Not usually. Traditional databases don’t scale to this
• Many hands make light work: remember SETI@home?
• Split the work up and share it out between many nodes
• Key analysis perspectives:
• Real-time streaming data analysis (detect events and act)
• Business Intelligence (asking specific questions of the data)
• Data Mining (asking “is there anything interesting here?”)
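For the first perspective, real-time streaming analysis, “detect events and act” can be as simple as comparing each new reading with a short window of recent ones. A minimal sketch, assuming a spike is any value more than `factor` times the average of the last `window` readings; both thresholds and the function name are hypothetical.

```python
from collections import deque
from typing import Iterable, Iterator


def detect_spikes(
    stream: Iterable[float], window: int = 5, factor: float = 2.0
) -> Iterator[tuple[int, float]]:
    """Yield (position, value) for readings above `factor` x the recent average.

    Only the last `window` readings are kept, so the check runs continuously
    on streaming data without storing the full history.
    """
    recent: deque = deque(maxlen=window)
    for i, value in enumerate(stream):
        if len(recent) == window and value > factor * (sum(recent) / window):
            yield i, value  # the "act" step: raise an alert, reroute, block...
        recent.append(value)


if __name__ == "__main__":
    print(list(detect_spikes([10, 10, 10, 10, 10, 50, 10])))
```

Here the reading of 50 at position 5 is flagged because it exceeds twice the average of the five readings before it. Business Intelligence and Data Mining, by contrast, typically run over data that has already been stored.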
PHYSICALLY
Source: Leons Petražickis, IBM Canada
WHAT ARE THE PIECES?
• HDFS: Hadoop Distributed File System (modelled on Google’s GFS)
• MapReduce (Google)
• Split the problem into chunks
• Spread it out over lots of (cheap) computing nodes
• Reassemble the answer from the parts
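The three MapReduce steps above are usually illustrated with a word count. The sketch below runs in a single Python process, with a thread pool standing in for the cluster's nodes, so it shows the map, shuffle and reduce pattern rather than Hadoop's actual API; all function names are illustrative.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from itertools import chain


def map_phase(chunk: str) -> list[tuple[str, int]]:
    """Map: emit (word, 1) for each word in one chunk of the input."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(pairs) -> dict[str, list[int]]:
    """Shuffle: group every emitted value under its key."""
    groups: defaultdict = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    """Reduce: combine all the values for one key."""
    return key, sum(values)


def word_count(chunks: list[str], nodes: int = 4) -> dict[str, int]:
    # Threads stand in for cluster nodes; Hadoop would run each map task on a
    # separate machine, ideally the one already holding that chunk of data.
    with ThreadPoolExecutor(max_workers=nodes) as pool:
        mapped = pool.map(map_phase, chunks)              # split and spread out
        groups = shuffle(chain.from_iterable(mapped))     # regroup by key
        reduced = pool.map(lambda kv: reduce_phase(*kv), groups.items())
        return dict(reduced)                              # reassemble the answer


if __name__ == "__main__":
    print(word_count(["the cat sat", "The dog sat"]))
```

The call above returns `{"the": 2, "cat": 1, "sat": 2, "dog": 1}`. In a real cluster the shuffle moves intermediate pairs across the network between machines, which is usually the most expensive step.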
LOGICALLY
Source: Leons Petražickis, IBM Canada
WHAT IS THE APPROACH?
• Somewhere to store it across different systems
• e.g. Hadoop Distributed File System (HDFS) - batch mode
• Some way of specifying work in pieces/jobs
• e.g. Hadoop (Yahoo) MapReduce (for low-level jobs)
• e.g. Pig or Hive (for high-level apps/queries that translate to MapReduce), with Oozie to schedule workflows of jobs
• Some way of reading/processing in real time vs. batch, e.g. HBase and Flume
• Some way of mining the data for trends/meaning (Data Mining/Machine Learning), e.g. Mahout
• Some way of getting data in/out of SQL databases, e.g. Sqoop
HOW MANY “CHUNKS”?
• eBay had 530 cores in 2010. It’s now in excess of 2’500
cores
• Yahoo has >4’000 cores
• Facebook have 23’000 cores with 20 PB of storage - be careful what you “like”...
• Google aren’t telling... (24 PB of data/day)
• LinkedIn offer 100Bn recommendations / week
WHERE CAN I GET SOME!!
• IBM
• ORACLE
• MICROSOFT
• EMC
• Informatica
• Apache - Open source
• Amazon - elastic computing / cloud-based Hadoop
• Small installations are free
THE FUTURE OF BIG DATA
THE FUTURE ..
HYPE CYCLE
WHERE ARE YOU?
TRENDS
• More data - MUCH MUCH MORE data
• Internet of Things (IoT) - instrumentation/measurement
• Smart energy meters (2005), RFID tags (1.3bn in 2011, >30bn in 2013)
• Each A380 engine generates 10 TB every 30 minutes: 640 TB on a JFK->London flight
• Big Science: Genomics, Pharmacology. The LHC experiment generates 40 TB/sec!!
• Much more video and unstructured data (~60% of Internet traffic will be video by 2015)
• The re-invention (or demise) of search/SEO
• The need to move from local big data to distributed big data and sense-making networks
• The rise of Observation - the need to filter and gain more control
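The A380 figure can be sanity-checked: the aircraft has four engines, so at 10 TB per engine every 30 minutes all four together produce 80 TB per flight hour. The helpers below (hypothetical names) assume that rate stays constant over the whole flight.

```python
def flight_data_tb(tb_per_engine_per_30min: float, engines: int, flight_hours: float) -> float:
    """Total engine sensor data (TB) over one flight at a constant rate."""
    return tb_per_engine_per_30min * engines * flight_hours * 2  # two 30-min periods per hour


def implied_hours(total_tb: float, tb_per_engine_per_30min: float, engines: int) -> float:
    """Flight time needed to produce `total_tb` at the quoted rate."""
    return total_tb / (tb_per_engine_per_30min * engines * 2)


if __name__ == "__main__":
    print(implied_hours(640, 10, 4))
    print(flight_data_tb(10, 4, 7))
```

`implied_hours(640, 10, 4)` gives 8.0 hours, roughly a transatlantic flight time, while a 7-hour flight would give `flight_data_tb(10, 4, 7)` = 560 TB.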
Where does that leave your company?
source: sap.com
MAGIC BULLET?
• Hadoop probably won’t replace your existing database
• It is very good at large files/data sets, so you may not see as much benefit with large volumes of small files/data sets
• It is very good at dealing with unstructured data, so if your data is largely structured (or can be made to look structured) you may be better off sticking with traditional databases
• It doesn’t need to know in advance how you want to query the data, which makes it very flexible; but if your queries are always the same, you may be able to stick with SQL databases and BI/DW systems
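Not needing to know in advance how the data will be queried is often called schema-on-read: raw records are stored as they arrive, and structure is imposed only when a query runs, whereas a SQL database fixes the schema when data is written. A minimal plain-Python sketch, assuming newline-delimited JSON records; the records and function names are hypothetical, and this is not how Hadoop itself is programmed.

```python
import json

# Raw records stored exactly as they arrived: no table definition up front,
# and not every record has the same fields.
RAW_LINES = [
    '{"user": "ann", "action": "like", "post": 17}',
    '{"user": "bob", "action": "share", "post": 17, "channel": "email"}',
    '{"user": "ann", "action": "like", "post": 42}',
]


def read_with_schema(lines, fields):
    """Apply a schema at query time: pick `fields`, filling gaps with None."""
    for line in lines:
        record = json.loads(line)
        yield {field: record.get(field) for field in fields}


def likes_per_post(lines) -> dict:
    """One query's view of the data: only `action` and `post` matter here."""
    counts: dict = {}
    for row in read_with_schema(lines, ["action", "post"]):
        if row["action"] == "like":
            counts[row["post"]] = counts.get(row["post"], 0) + 1
    return counts


if __name__ == "__main__":
    print(likes_per_post(RAW_LINES))
    print(list(read_with_schema(RAW_LINES, ["user", "channel"])))
```

Two queries read the same raw lines through different schemas: `likes_per_post` returns `{17: 1, 42: 1}`, while the `["user", "channel"]` view fills the missing `channel` field with `None`. If your queries never change, declaring the schema up front in a SQL database loses nothing and usually makes those queries faster.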
TWO THINGS WORTH REMEMBERING..
Questions?
 
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdfBASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdf
 
Accessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impactAccessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impact
 
_Math 4-Q4 Week 5.pptx Steps in Collecting Data
_Math 4-Q4 Week 5.pptx Steps in Collecting Data_Math 4-Q4 Week 5.pptx Steps in Collecting Data
_Math 4-Q4 Week 5.pptx Steps in Collecting Data
 
KSHARA STURA .pptx---KSHARA KARMA THERAPY (CAUSTIC THERAPY)————IMP.OF KSHARA ...
KSHARA STURA .pptx---KSHARA KARMA THERAPY (CAUSTIC THERAPY)————IMP.OF KSHARA ...KSHARA STURA .pptx---KSHARA KARMA THERAPY (CAUSTIC THERAPY)————IMP.OF KSHARA ...
KSHARA STURA .pptx---KSHARA KARMA THERAPY (CAUSTIC THERAPY)————IMP.OF KSHARA ...
 
CARE OF CHILD IN INCUBATOR..........pptx
CARE OF CHILD IN INCUBATOR..........pptxCARE OF CHILD IN INCUBATOR..........pptx
CARE OF CHILD IN INCUBATOR..........pptx
 
Presiding Officer Training module 2024 lok sabha elections
Presiding Officer Training module 2024 lok sabha electionsPresiding Officer Training module 2024 lok sabha elections
Presiding Officer Training module 2024 lok sabha elections
 
Mastering the Unannounced Regulatory Inspection
Mastering the Unannounced Regulatory InspectionMastering the Unannounced Regulatory Inspection
Mastering the Unannounced Regulatory Inspection
 
Model Call Girl in Bikash Puri Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Bikash Puri  Delhi reach out to us at 🔝9953056974🔝Model Call Girl in Bikash Puri  Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Bikash Puri Delhi reach out to us at 🔝9953056974🔝
 
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
 
mini mental status format.docx
mini    mental       status     format.docxmini    mental       status     format.docx
mini mental status format.docx
 
Introduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher EducationIntroduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher Education
 
TataKelola dan KamSiber Kecerdasan Buatan v022.pdf
TataKelola dan KamSiber Kecerdasan Buatan v022.pdfTataKelola dan KamSiber Kecerdasan Buatan v022.pdf
TataKelola dan KamSiber Kecerdasan Buatan v022.pdf
 
Hybridoma Technology ( Production , Purification , and Application )
Hybridoma Technology  ( Production , Purification , and Application  ) Hybridoma Technology  ( Production , Purification , and Application  )
Hybridoma Technology ( Production , Purification , and Application )
 
A Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy ReformA Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy Reform
 
Employee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptxEmployee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptx
 
URLs and Routing in the Odoo 17 Website App
URLs and Routing in the Odoo 17 Website AppURLs and Routing in the Odoo 17 Website App
URLs and Routing in the Odoo 17 Website App
 

Big Data

  • 1. THE ELEPHANT IN THE ROOM: WHEN DID DATA GET SO BIG?
  • 2. (c) 2013 Ian Brown 2 we'll talk ABOUT
  • 5. WE’LL TALK ABOUT • What is Big Data? What makes it "Big"? • Who needs Big Data? Where does it come from? • How does Big Data work? What are the tools and the issues? • What does the future of Big Data look like?
  • 6. WHAT IS BIG DATA? • To some extent “Big” really means “difficult to handle” • The name is something of a misnomer: it is not only about size. Three things distinguish big data: • Volume (how much capacity you need to process/store) • Velocity (how quickly you need to process updates) • Variety (how complicated/non-standard the data is) [Figure: Volume, Velocity and Variety overlapping as BIG DATA]
  • 7. [Figure] Source: datasciencecentral
  • 8. UNITS Source: www.wikipedia.com
  • 9. VOLUME • From pre-history to 2004 the world generated around 5 exabytes of data - we now produce that amount every 2 days • Data volumes are huge and growing: 1.8 zettabytes in 2011 • = 1’800 exabytes • = 1.8 million petabytes • = 1.8 billion terabytes • Data is predicted to grow 44-fold by 2020 • i.e. more than 40% every year
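The unit chain and the growth rate on this slide can be checked with a little arithmetic. This is a minimal sketch assuming decimal (SI) units, where each step up is a factor of 1,000, and assuming the 44-fold growth is measured over the eleven years 2009 to 2020 (the slide gives only the end year). The function names are illustrative.

```python
# Decimal (SI) byte units; each step up is a factor of 1,000.
UNITS = {
    "terabyte": 10**12,
    "petabyte": 10**15,
    "exabyte": 10**18,
    "zettabyte": 10**21,
}

def convert(amount: float, from_unit: str, to_unit: str) -> float:
    """Express `amount` of `from_unit` in `to_unit`."""
    return amount * UNITS[from_unit] / UNITS[to_unit]

def implied_annual_growth(multiple: float, years: int) -> float:
    """Compound annual growth rate that turns 1 into `multiple` over `years`."""
    return multiple ** (1 / years) - 1

for unit in ("exabyte", "petabyte", "terabyte"):
    print(f"1.8 zettabytes = {convert(1.8, 'zettabyte', unit):,.0f} {unit}s")

# Assumption: the 44-fold growth runs over 2009-2020, i.e. 11 years.
print(f"{implied_annual_growth(44, 11):.0%} a year")  # 41% a year
```

A shorter window (for example 2011 to 2020) would imply an even higher annual rate, so "more than 40% every year" holds either way.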
  • 10. VOLUME • Data has been “big” for some organisations at times in the past, but now it is potentially big for everyone, and getting bigger every day • Sources include networks (voice/data/video), social networks, sensors & transducers, GPS, banking, logistics, trade etc. • 90% of the world’s digital data was generated in the last 2 years (source: IBM 2012)
  • 11. VARIETY (Variability) • Governments and corporates have always had big databases, but the data has always been structured: invoices, customers, inventory etc. • Of the huge increase in data just mentioned, only 10-20% will be structured; the rest (80-90%) will be unstructured: • Video, email, social media, audio, images/scanned material • Traditional SQL databases (the clue is in the S, for Structured) don’t cope well with this sort of mixed data
  • 12. VELOCITY • Data now arrives constantly from global sources, which makes it a 24x7 problem. • Q. When do you stop to summarise/analyse? At what point do you cut off for the day/week/period to run a report or plan the next action? • A. Sometimes you can’t! Analysis, processing and action may have to happen on the streaming data itself, with corrections or actions taken on the fly, sometimes without storing the data at all!
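Acting on streaming data without storing it usually means keeping a small running summary instead of the raw readings. The sketch below is one illustration under stated assumptions: it uses Welford's method for a running mean and variance, and flags a reading as an "event" when it lies more than `k` standard deviations from everything seen so far. The threshold `k=3`, the `warmup` length and the function names are hypothetical choices, not a standard API.

```python
import math
from dataclasses import dataclass
from typing import Iterable, Iterator

@dataclass
class RunningStats:
    """Mean and variance updated one reading at a time (Welford's method), in O(1) memory."""
    count: int = 0
    mean: float = 0.0
    m2: float = 0.0  # running sum of squared deviations from the mean

    def update(self, x: float) -> None:
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (x - self.mean)

    @property
    def stdev(self) -> float:
        """Sample standard deviation of everything seen so far."""
        return math.sqrt(self.m2 / (self.count - 1)) if self.count > 1 else 0.0

def flag_anomalies(stream: Iterable[float], k: float = 3.0, warmup: int = 10) -> Iterator[tuple[int, float]]:
    """Yield (position, value) for readings more than k standard deviations from the
    running mean. Only the summary statistics are kept, never the readings themselves."""
    stats = RunningStats()
    for i, x in enumerate(stream):
        if stats.count >= warmup and stats.stdev > 0 and abs(x - stats.mean) > k * stats.stdev:
            yield i, x  # act on the event as it arrives
        stats.update(x)

readings = [10.0, 10.5, 9.5, 10.2, 9.8] * 4 + [50.0, 10.1]
print(list(flag_anomalies(iter(readings))))  # [(20, 50.0)]
```

Because the generator consumes one reading at a time, the same code works on an unbounded feed; memory use does not grow with the length of the stream.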
  • 13. [Figure] Source: datasciencecentral
  • 14. HASN’T DATA ALWAYS BEEN “BIG”? • Maybe. • Historically, computing was done in “batches”: stacks of punchcards or reels of tape (first paper, then magnetic) were processed one file at a time. This had to be done while the business was “closed”. • If you closed at 18:00 and opened the next day at 09:00, you had a 15-hour window to do all your calculations and reports before you had to stop and open for the next day’s business. • If you couldn’t get it done in 15 hours, your data was “big”
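The batch-era test for "big" is simple arithmetic: compare the hours a run needs with the hours between closing and reopening. A minimal sketch, assuming a steady processing throughput; the workload figures in the example are hypothetical.

```python
from datetime import datetime, timedelta

def batch_window_hours(close: str, reopen: str) -> float:
    """Hours from closing time to next opening time, given as HH:MM (may cross midnight)."""
    start = datetime.strptime(close, "%H:%M")
    end = datetime.strptime(reopen, "%H:%M")
    if end <= start:
        end += timedelta(days=1)  # the window runs past midnight
    return (end - start).total_seconds() / 3600

def is_big(records: int, records_per_hour: float, close: str = "18:00", reopen: str = "09:00") -> bool:
    """Batch-era definition: the data is "big" if the run doesn't fit in the overnight window."""
    return records / records_per_hour > batch_window_hours(close, reopen)

print(batch_window_hours("18:00", "09:00"))  # 15.0
# Hypothetical workload: 2 million records at 100,000 records/hour needs 20 hours.
print(is_big(2_000_000, 100_000))            # True
```

The same workload on a machine twice as fast would fit in the window again, which is the point of the next slide: "big" depends on data relative to computing power.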
  • 15. HASN’T DATA ALWAYS BEEN “BIG”? • So “big” is relative: it depends on how much data you have vs how much computing you can throw at it • For more than three decades, steady increases in computing power made the data that most businesses generated through their local customers look “small” • Then the Web happened ...
  • 16. HASN’T DATA ALWAYS BEEN “BIG”? • Initially, Web 1.0 and eCommerce exposed servers to many millions of events (“hits” on web sites, logs, emails) and applied a global multiplier to who could be a customer and access your system. Analysing who was searching for what and who was buying what absorbed a lot of computing capacity. • Web 2.0 has added hundreds of millions of social networking users, all broadcasting data as photos, tweets, status updates, blog posts etc. This has created a truly vast ocean of data that can be trawled to learn about our behaviours, beliefs and likely future actions. • If you want to process this data, it certainly has volume; it doesn’t stop coming at you when you close for the night, so it has tremendous velocity; and if you pull it in from several sources, it quickly starts to exhibit complexity and variety • Traditional hardware/software has not kept pace with the growth in volume, velocity and variety
  • 17. WHO NEEDS BIG DATA? • Generally: anyone who can derive a “big picture” insight by adding up all the small data points and “zooming out” • How much can you say about one tweet? A thousand tweets? • Twitter is generating > 9’000 tweets/sec, i.e. more than 775 million tweets a day, so another billion tweets arrive in about 1.3 days. Source: www.statisticbrain.com (2012) • What you “reckon” turns into sentiment analysis
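The tweets-per-day figure follows directly from the per-second rate. A minimal sketch, assuming a constant rate of exactly 9,000 tweets/sec; since the slide says "> 9’000", the real time to a billion is at most this.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

def per_day(rate_per_second: float) -> float:
    """Items per day at a steady per-second rate."""
    return rate_per_second * SECONDS_PER_DAY

def days_to_reach(total: float, rate_per_second: float) -> float:
    """Days needed to accumulate `total` items at a steady per-second rate."""
    return total / per_day(rate_per_second)

print(f"{per_day(9_000):,.0f} tweets/day")                             # 777,600,000 tweets/day
print(f"{days_to_reach(1_000_000_000, 9_000):.1f} days per billion")  # 1.3 days per billion
```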
  • 19. THE SCALE CHANGES THINGS • Big Data may be analogous to the difference between the insight you get from a single picture and from a video Source: slowmotionrunninghorse.com
  • 21. WHY CARE? • Governments - release of open data: McKinsey estimates $300bn of value per year in the US and €100bn of savings in Europe • Banks - fraud detection, algorithmic trading (losses/profits): algorithms drive about 2/3 of the 7bn US shares traded each day • Life sciences - genomics, drug research: it took 10 years to sequence the human genome • Retailers - buying patterns, CRM, “if you like this ...”: cross-selling • Social - Google, Facebook, LinkedIn, Twitter, Amazon, eBay: insight! • Networks - load management/routing, protecting networks • Probabilistic outcomes - Google Flu predictions (Nature, 2009)
  • 22. WHAT’S THE DIFFERENCE? • EXHAUSTIVE • SCRUFFY • PRAGMATIC • Anything missing ...? Source: damfoundation.org
  • 23. SO WHAT? • Several key things have shifted: • from sampling to populations • from exactness to “gisting” • from causality to correlation • and data is no longer tied to the purpose for which it was collected • Data used to be small, exact and causal
  • 24. ASPECTS Source: www.datasciencecentral.com
  • 25. NEW SOURCES OF DATA • Information is now gathered on events and values that were not traditionally thought of as data: • Current location (vs. address) • Whether you “like” someone else’s post • Things you nearly bought but didn’t • How much energy your office needs right now • PLUS transactional systems, social media, sensors etc.
  • 26. HOW DOES IT WORK? • Is this just a big database running on a powerful machine? • Not usually: traditional databases don’t scale to this • Many hands make light work: remember SETI@home? • Split the work up and share it out between many nodes • Key analysis perspectives: • Real-time streaming data analysis (detect events and act) • Business Intelligence (asking specific questions of the data) • Data Mining (asking “is there anything interesting here?”)
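"Split it up and share it out" can be sketched in a few lines: assign each record to a node by hashing its key, let every node work on its own share, then combine the partial answers. This is an illustration only, assuming a hypothetical cluster of four nodes; threads stand in for separate machines, and `zlib.crc32` is used because, unlike Python's built-in `hash()` for strings, it gives the same node assignment on every run.

```python
import zlib
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

def partition(records: list[tuple[str, int]], num_nodes: int) -> dict[int, list[tuple[str, int]]]:
    """Assign each (key, value) record to a node using a stable hash of its key."""
    chunks: dict[int, list[tuple[str, int]]] = defaultdict(list)
    for key, value in records:
        node = zlib.crc32(key.encode("utf-8")) % num_nodes
        chunks[node].append((key, value))
    return dict(chunks)

def local_total(chunk: list[tuple[str, int]]) -> int:
    """The work each node does on its own share of the data."""
    return sum(value for _, value in chunk)

def distributed_total(records: list[tuple[str, int]], num_nodes: int = 4) -> int:
    chunks = partition(records, num_nodes)
    # Threads stand in for separate machines in this sketch.
    with ThreadPoolExecutor(max_workers=num_nodes) as pool:
        partials = list(pool.map(local_total, chunks.values()))
    return sum(partials)  # combine the partial answers

records = [(f"user{i}", i) for i in range(100)]
print(distributed_total(records))  # 4950
```

Hashing by key means every record for the same key lands on the same node, so per-key work needs no coordination between nodes.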
  • 27. PHYSICALLY Source: Leons Petražickis, IBM Canada
  • 28. WHAT ARE THE PIECES? • HDFS: the Hadoop Distributed File System (modelled on Google’s GFS) • MapReduce (Google) • Split the problem into chunks • Spread it out over lots of (cheap) computing nodes • Reassemble the answer from the parts
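The three MapReduce steps on this slide are easiest to see in the classic word-count example. The sketch below runs the map, shuffle and reduce phases in a single Python process; it is not Hadoop's API, and the split of the input into `chunks` (one per hypothetical node) is an illustrative assumption.

```python
from collections import defaultdict
from itertools import chain
from typing import Iterable, Iterator

def map_phase(line: str) -> Iterator[tuple[str, int]]:
    """Map: emit (word, 1) for every word in one line of input."""
    for word in line.lower().split():
        yield word, 1

def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Shuffle: group every emitted value under its key."""
    groups: defaultdict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    """Reduce: combine all the values for one key into a single answer."""
    return key, sum(values)

def word_count(chunks: list[list[str]]) -> dict[str, int]:
    # Each chunk stands for the share of the input held by one (hypothetical) node.
    mapped = chain.from_iterable(map_phase(line) for chunk in chunks for line in chunk)
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())

chunks = [["big data is big"], ["data is everywhere"]]
print(word_count(chunks))  # {'big': 2, 'data': 2, 'is': 2, 'everywhere': 1}
```

In a real cluster the map calls run on the nodes that hold each chunk, and the shuffle moves the grouped pairs over the network to the reducers; the logic of each phase stays this simple.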
  • 29. LOGICALLY Source: Leons Petražickis, IBM Canada
  • 30. WHAT IS THE APPROACH? • Somewhere to store it across different systems • e.g. a distributed file system (HDFS), accessed in batch mode • Some way of specifying work in pieces/jobs • e.g. Hadoop MapReduce (developed largely at Yahoo) for low-level jobs • e.g. Pig or Hive for high-level scripts/queries that translate to MapReduce, with Oozie to schedule workflows of jobs • Some way of reading/processing in real time vs batch, e.g. HBase and Flume • Some way of mining the data for trends/meaning (data mining/machine learning), e.g. Mahout • Some way of getting data in/out of SQL databases, e.g. Sqoop
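"Somewhere to store it across different systems" in HDFS means cutting each file into large fixed-size blocks and copying every block to several nodes, so losing one cheap machine loses no data. A toy model, assuming a 128 MB block size and 3 copies per block (the Hadoop 2 defaults for `dfs.blocksize` and `dfs.replication`; Hadoop 1 used 64 MB blocks). Real HDFS lets the NameNode choose rack-aware replica locations; the round-robin placement here is a deliberate simplification, and the node names are hypothetical.

```python
MiB = 1024 ** 2

def split_into_blocks(file_size: int, block_size: int = 128 * MiB) -> list[int]:
    """Sizes of the fixed-size blocks a file is cut into; the last block may be shorter."""
    full, rest = divmod(file_size, block_size)
    return [block_size] * full + ([rest] if rest else [])

def place_replicas(num_blocks: int, nodes: list[str], replication: int = 3) -> list[list[str]]:
    """Naive round-robin placement: each block is copied to `replication` distinct nodes."""
    if replication > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    return [[nodes[(b + r) % len(nodes)] for r in range(replication)] for b in range(num_blocks)]

blocks = split_into_blocks(300 * MiB)
print([size // MiB for size in blocks])  # [128, 128, 44]
for i, replicas in enumerate(place_replicas(len(blocks), ["node1", "node2", "node3", "node4"])):
    print(f"block {i}: {replicas}")
```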
  • 31. HOW MANY “CHUNKS”? • eBay had 530 cores in 2010; it now has more than 2’500 • Yahoo has >4’000 cores • Facebook have 23’000 cores with 20 PB of storage - be careful what you “like”... • Google aren’t telling ... (24 PB of data/day) • LinkedIn serve 100bn recommendations/week
  • 32. WHERE CAN I GET SOME!! • IBM • ORACLE • MICROSOFT • EMC • Informatica • Apache - Open source • Amazon - Elastic computing / cloud-based hadoop • Small installations are free
  • 33. THE FUTURE OF BIG DATA
  • 34. THE FUTURE .. HYPE CYCLE
  • 36. WHERE ARE YOU?
  • 37. TRENDS • More data - MUCH MUCH MORE data • Internet of Things (IoT) - instrumentation/measurement • Smart energy meters (2005), RFID tags (1.3bn in 2011, >30bn in 2013) • Each A380 engine generates 10 TB every 30 minutes: 640 TB on a JFK->London flight • Big Science: genomics, pharmacology. The LHC experiment generates 40 TB/sec!! • Much more video and unstructured stuff (video expected to be ~60% of Internet traffic by 2015) • The re-invention (or demise) of search/SEO • The need to move from local big data to distributed big data and sense-making networks • The rise of observation - the need to filter and gain more control
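The 640 TB figure only makes sense once the number of engines and the flight time are included. A minimal worked check, assuming the slide's 10 TB per engine per 30 minutes applies to all four of the A380's engines; the function names are illustrative.

```python
TB_PER_ENGINE_PER_HALF_HOUR = 10  # the slide's figure
ENGINES = 4                       # the A380 has four engines

def flight_data_tb(flight_hours: float, engines: int = ENGINES) -> float:
    """Total engine data for one flight, in terabytes."""
    return TB_PER_ENGINE_PER_HALF_HOUR * engines * flight_hours * 2

def flight_hours_implied(total_tb: float, engines: int = ENGINES) -> float:
    """Flight length that would produce `total_tb` of engine data."""
    return total_tb / (TB_PER_ENGINE_PER_HALF_HOUR * engines * 2)

print(flight_hours_implied(640))  # 8.0
```

So 640 TB corresponds to four engines running for about eight hours, i.e. sixteen 30-minute intervals.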
  • 38. Where does that leave your company? Source: sap.com
  • 39. MAGIC BULLET? • Hadoop probably won’t replace your existing database • It is very good at large files/data sets, so you may not see as much benefit with large numbers of small files/datasets • It is very good at dealing with unstructured data, so if your data is largely structured (or can be made to look structured) you may be better off sticking with traditional databases • It doesn’t need to know in advance how you want to query the data, which makes it very flexible; but if your queries are always the same, you may be able to stick with SQL databases and BI/DW systems
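Not needing to know the queries up front is often called "schema-on-read": raw records are stored as they arrive, and structure is applied only when a query reads them. The sketch below illustrates the idea in plain Python rather than Hadoop or Hive; the event records and function names are made up for illustration.

```python
import json
from collections import Counter

# Made-up raw events, stored exactly as they arrive; no table schema is declared up front.
RAW_EVENTS = [
    '{"type": "like", "user": "ana", "post": 17}',
    '{"type": "purchase", "user": "ben", "item": "kettle", "price": 24.5}',
    '{"type": "like", "user": "ben", "post": 17}',
    '{"type": "search", "user": "ana", "query": "kettle"}',
]

def read(raw_lines):
    """Apply structure only at query time: parse each stored line as it is read."""
    for line in raw_lines:
        yield json.loads(line)

def likes_per_post(raw_lines) -> Counter:
    """One query: count likes per post, ignoring every other kind of event."""
    return Counter(e["post"] for e in read(raw_lines) if e.get("type") == "like")

def revenue(raw_lines) -> float:
    """A different query over the same raw data, using fields the first one never touched."""
    return sum(e.get("price", 0.0) for e in read(raw_lines) if e.get("type") == "purchase")

print(likes_per_post(RAW_EVENTS))  # Counter({17: 2})
print(revenue(RAW_EVENTS))         # 24.5
```

The trade-off is the one the slide describes: every query pays the parsing cost, so if the same questions are asked every day, a fixed SQL schema that does that work once at load time may be the better fit.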
  • 40. TWO THINGS WORTH REMEMBERING ..
  • 41. Questions?