Big Data
Øyvin Halfan Thuv
CTO Whitefox AS
e: oyvin@whitefox.no
t: @oyvinht
Thursday, 23 May 2013
Abstract
• Short wrap up of Big Data history
• But, what’s new? Why are we here?
• What can we do now (from our couch)?
Who am I... to talk about this?
• Ardent interest, B.Sc. in IT
  - Maths (I particularly recommend discrete maths for Big Data!)
  - Computational linguistics
  - AI stuff
  - Thesis on data mining Unix system logs for surveillance
• M.Sc. degree in Artificial Intelligence (AI)
  - Thesis on artificial life:
    «Incrementally Evolving a Dynamic Neural Network for Tactile-olfactory Insect Navigation»
  - Nature is packed with Big Data
• Intern at CERN
  - Developing the search engine
  - Indexing (and making sense of) more than 6 million documents
Mini-history
• Before ~2000
  - Save only the data that could prove useful. Query, filter and select it for presentation.
• After ~2000
  - Just store everything: it's cheap, and we can look into it later.
  - OLAP automates the «looking».
• Gartner, 2012:
  «Big data are high volume, high velocity, and/or high variety information assets that require new forms of processing to enable enhanced decision making, insight discovery and process optimization.»
  (phew!)
Neo: Do you always look at it encoded?
Cypher: Well, you have to (...) there's way too
much information to decode the Matrix. You get
used to it. I — I don't even see the code. All I see is
blonde, brunette, redhead...Hey, you want a drink?
Too much data
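One way to read «OLAP automates looking»: instead of writing a query for every question, an OLAP engine pre-aggregates a measure over every combination of dimensions (a data cube), so totals can be rolled up or drilled into on demand. Below is a minimal Python sketch of that idea, in the spirit of SQL's `GROUP BY CUBE`. The `cube` function, the `region`/`month`/`amount` fields and the numbers are hypothetical, made up for illustration.

```python
from collections import defaultdict
from itertools import combinations


def cube(rows, dimensions, measure):
    """Sum `measure` for every subset of `dimensions`, like SQL's GROUP BY CUBE.

    Returns {dimension_subset: {key_tuple: total}}; the empty subset holds the grand total.
    """
    result = {}
    for size in range(len(dimensions) + 1):
        for dims in combinations(dimensions, size):
            totals = defaultdict(float)
            for row in rows:
                # Group by the values of the chosen dimensions only.
                totals[tuple(row[d] for d in dims)] += row[measure]
            result[dims] = dict(totals)
    return result


# Hypothetical fact table.
sales = [
    {"region": "North", "month": "Jan", "amount": 10.0},
    {"region": "North", "month": "Feb", "amount": 5.0},
    {"region": "South", "month": "Jan", "amount": 7.0},
]
c = cube(sales, ("region", "month"), "amount")
print(c[("region",)])  # {('North',): 15.0, ('South',): 7.0}
```

With d dimensions there are 2^d groupings, which is why real OLAP systems often materialize only some of them.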
What’s new, then?
• Data storage capacity has doubled roughly every 3-4 years since the 1980s!
• We used to have a small amount of interesting data.
• Now we have tons of boring stuff!!
• We must handle it so that we «don't even see the code».
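To make the doubling claim concrete: if capacity doubles every T years, it grows by a factor of 2^(t/T) over t years. The sketch below only applies the slide's own 3-4 year figure to the 33 years from 1980 to 2013. It is arithmetic on that assumption, not measured data, and `growth_factor` is a name chosen for illustration.

```python
def growth_factor(years: float, doubling_period: float) -> float:
    """Total growth after `years` if capacity doubles every `doubling_period` years."""
    return 2 ** (years / doubling_period)


# The slide's range: doubling every 3 to 4 years, from 1980 to 2013 (33 years).
fast = growth_factor(33, 3)  # 2 ** 11
slow = growth_factor(33, 4)  # 2 ** 8.25
print(f"x{slow:.0f} to x{fast:.0f}")  # x304 to x2048
```

In other words, a few hundred to a couple of thousand times more capacity within one career.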
What’s new, then?
• We used algorithms such as Apriori and ID3 for log analysis.
  That was fine for 40 MB of data per day.
• In artificial life, there could easily be that amount of data ... per minute.
• Google processed ~24 PB of data per day in 2009.
• Your 1.4 kg brain can interpret this slide instantly.
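Apriori finds sets of items that occur together in at least `min_support` transactions, and it prunes any candidate set that has an infrequent subset. Below is a compact Python sketch of the textbook algorithm. Treating each log session as a "transaction" of event types is an assumption made for illustration, and the event names are hypothetical.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: count} for every itemset found in >= min_support transactions."""
    transactions = [frozenset(t) for t in transactions]
    # Level 1: count single items.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s: c for s, c in counts.items() if c >= min_support}
    result = dict(frequent)
    k = 2
    while frequent:
        # Join frequent (k-1)-itemsets into candidate k-itemsets.
        prev = list(frequent)
        candidates = set()
        for i in range(len(prev)):
            for j in range(i + 1, len(prev)):
                union = prev[i] | prev[j]
                # Apriori pruning: every (k-1)-subset must itself be frequent.
                if len(union) == k and all(
                    frozenset(sub) in frequent for sub in combinations(union, k - 1)
                ):
                    candidates.add(union)
        # One full pass over the data per level to count support.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        frequent = {s: c for s, c in counts.items() if c >= min_support}
        result.update(frequent)
        k += 1
    return result


# Hypothetical log sessions, each reduced to the set of event types seen.
sessions = [
    {"login_fail", "sudo", "ssh"},
    {"login_fail", "ssh"},
    {"login_fail", "sudo", "ssh"},
    {"cron"},
]
patterns = apriori(sessions, min_support=2)
```

Each level rescans every transaction. That is affordable at tens of megabytes per day but quickly becomes the bottleneck at the volumes mentioned above.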
This is new
• Your brain cells each solve one small problem, tell 10 other cells about the result, and those tell 10 others... you get it (fast!)
• Google distributes its computing ...somewhat like your brain.
• They called it MapReduce.
[Diagram: Map and Reduce phases distributed across Node 1 to Node n]
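MapReduce splits a job into three phases. A map step turns each input into (key, value) pairs, a shuffle groups those pairs by key, and a reduce step combines each group. The classic word-count example below runs all three phases in a single Python process. On a real cluster, each `map_phase` call and each reduce group could run on a different node. The function names are illustrative and are not Google or Hadoop APIs.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(document: str) -> Iterator[Tuple[str, int]]:
    """Map: emit (word, 1) for every word in one document."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group all emitted values by key."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce: combine all values for one key."""
    return key, sum(values)


def map_reduce(documents: Iterable[str]) -> Dict[str, int]:
    # Sequential here; each map and each reduce is independent, so they parallelize.
    intermediate = (pair for doc in documents for pair in map_phase(doc))
    grouped = shuffle(intermediate)
    return dict(reduce_phase(k, v) for k, v in grouped.items())


print(map_reduce(["the cat sat", "The dog sat"]))
# {'the': 2, 'cat': 1, 'sat': 2, 'dog': 1}
```

The design choice that matters is that map and reduce share no state. A framework can therefore spread the work across many machines and rerun any piece that fails.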
You have it at home
• Free MapReduce implementations such as Hadoop are cheap to run in the cloud.
• MySQL is probably not a good choice for Big Data analysis.
• There are free NoSQL databases (Cassandra, Berkeley DB, MongoDB and more) available.
• Lots of data is freely available to play with. Analyze it in the cloud.
• «The Matrix is everywhere. It is all around us. Even now, in this very room.»
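Hadoop Streaming lets any program that reads lines on stdin and writes tab-separated key/value lines on stdout act as a mapper or reducer, and the framework sorts the mapper output by key before it reaches the reducer. The Python sketch below counts syslog lines per host. It assumes the classic BSD syslog layout `Mon DD HH:MM:SS host process: message`. The `run` helper and the script name `hostcount.py` are hypothetical glue, not part of Hadoop.

```python
from itertools import groupby
from typing import Iterable, Iterator, TextIO


def mapper(lines: Iterable[str]) -> Iterator[str]:
    """Emit 'host<TAB>1' per syslog line (assumes host is the 4th whitespace field)."""
    for line in lines:
        fields = line.split()
        if len(fields) >= 4:  # skip blank or malformed lines
            yield f"{fields[3]}\t1"


def reducer(lines: Iterable[str]) -> Iterator[str]:
    """Sum counts per host; relies on input already sorted by key, as Hadoop provides."""
    pairs = (line.rstrip("\n").split("\t", 1) for line in lines)
    for host, group in groupby(pairs, key=lambda kv: kv[0]):
        yield f"{host}\t{sum(int(count) for _, count in group)}"


def run(role: str, stdin: TextIO, stdout: TextIO) -> None:
    """Hypothetical entry point: role is 'map' or 'reduce'."""
    step = mapper if role == "map" else reducer
    for out in step(stdin):
        stdout.write(out + "\n")
```

A script would call `run(sys.argv[1], sys.stdin, sys.stdout)`. Locally, `cat syslog | python hostcount.py map | sort | python hostcount.py reduce` imitates the cluster. On Hadoop, the same commands go to the streaming JAR's `-mapper` and `-reducer` options together with `-input` and `-output`, and the JAR's location varies by installation.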
That’s it
• Data is growing.
• More information, but harder to find among all
the garbage.
• Free software exists. You can make sense of your
data too.
• Unleash hidden knowledge and work smarter!