1) Big data has grown exponentially in recent decades, from megabytes to petabytes, requiring new techniques for processing and analyzing large, diverse datasets.
2) Distributed computing frameworks like MapReduce allow massive datasets to be processed in parallel across many servers, similarly to how the human brain solves problems.
3) Free and open source big data tools now exist, like Hadoop and NoSQL databases, allowing individuals to leverage large public datasets and gain insights from their own data using cloud computing resources.
Big data 2013-05-23
1. Big Data
Øyvin Halfan Thuv
CTO Whitefox AS
e: oyvin@whitefox.no
t: @oyvinht
2. Abstract
• Short wrap-up of Big Data history
• But, what’s new? Why are we here?
• What can we do now (from our couch)?
3. Who am I... to talk about this?
• Ardent interest, B.Sc. in IT
  Maths (I particularly recommend discrete maths for Big Data!)
  Computational Linguistics
  AI stuff
  Thesis on data mining Unix system logs for surveillance
• M.Sc. degree in Artificial Intelligence (AI)
  Thesis on artificial life:
  «Incrementally Evolving a Dynamic Neural Network for Tactile-olfactory Insect Navigation»
  Nature is packed with Big Data
• Intern at CERN
  Developing the search engine
  Indexing (and making sense of) > 6 million documents
4. Mini-history
• Before ~2000
  Save just the stuff that could prove useful. Query/filter/select data to present it.
• After ~2000
  Just store everything - it’s cheap and we can look into it later.
  OLAP automates «looking».
• Gartner 2012:
  «Big data are high volume, high velocity, and/or high variety information assets that require new forms of processing to enable enhanced decision making, insight discovery and process optimization.»
  (phew!)
Neo: Do you always look at it encoded?
Cypher: Well, you have to (...) there's way too much information to decode the Matrix. You get used to it. I... I don't even see the code. All I see is blonde, brunette, redhead... Hey, you want a drink?
Too much data
5. What’s new, then?
• Data capacity has doubled every 3-4 years since the 1980s!
• We used to have a small amount of interesting data
• Now we have tons of boring stuff!!
• We must handle it so that we «don’t even see the code»
6. What’s new, then?
• We used algorithms such as Apriori and ID3 for log analysis (a single-machine sketch follows at the end of this slide).
  Fine for 40MB of data per day.
• In artificial life, there could easily be this amount of data
... per minute.
• Google processed ~24PB of data per day in 2009.
• Your 1.4kg brain can interpret this slide instantly.
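
As a reference point for the scale problem, here is a minimal, assumed sketch (not the original log-analysis code) of Apriori-style frequent-itemset mining on a single machine. The log events and support threshold are invented for illustration; the point is that this style of analysis is fine at 40MB per day and hopeless at petabyte scale.

from collections import Counter

def apriori(transactions, min_support=2):
    """Return every itemset that occurs in at least `min_support` transactions."""
    transactions = [frozenset(t) for t in transactions]
    # Pass 1: count single items and keep the frequent ones.
    counts = Counter(item for t in transactions for item in t)
    frequent = {frozenset([i]) for i, c in counts.items() if c >= min_support}
    result = {s: counts[next(iter(s))] for s in frequent}
    k = 2
    while frequent:
        # Pass k: join frequent (k-1)-itemsets into k-item candidates, then count.
        candidates = {a | b for a in frequent for b in frequent if len(a | b) == k}
        counts = Counter(c for t in transactions for c in candidates if c <= t)
        frequent = {c for c, n in counts.items() if n >= min_support}
        result.update({c: counts[c] for c in frequent})
        k += 1
    return result

# Toy "syslog" transactions: events observed together in one session (made up).
sessions = [
    {"login", "sudo", "scp"},
    {"login", "sudo"},
    {"login", "scp"},
    {"login", "sudo", "scp"},
]
print(apriori(sessions, min_support=2))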
7. This is new
• Your brain cells each solve one little problem, tell 10 other cells about the result, and then those tell 10 others ... you get it (fast!)
• Google distributes its computing ... somewhat like your brain.
• They called it MapReduce (see the sketch below).
[Figure: a Map phase fanned out across Node 1 ... Node n, followed by a Reduce phase]
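
A minimal sketch of the idea on this slide, not Google's implementation: a word count in which each worker process plays the role of one node, maps over its own chunk of input, and a final reduce step merges the partial counts. All names and data here are made up for illustration.

from collections import Counter
from multiprocessing import Pool

def map_word_counts(chunk_of_lines):
    """Map step: each worker counts words in its own chunk, independently."""
    counts = Counter()
    for line in chunk_of_lines:
        counts.update(line.split())
    return counts

def reduce_word_counts(partial_counts):
    """Reduce step: merge the per-worker counts into a single result."""
    total = Counter()
    for c in partial_counts:
        total.update(c)
    return total

if __name__ == "__main__":
    lines = ["big data is big", "data is everywhere", "big big data"]
    chunks = [lines[0:1], lines[1:2], lines[2:3]]      # one chunk per "node"
    with Pool(processes=3) as pool:
        partials = pool.map(map_word_counts, chunks)   # map in parallel
    print(reduce_word_counts(partials))                # reduce on one node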
8. You have it at home
• Free MapReduce-a-likes (Hadoop) are cheap to run in the cloud.
• MySQL is probably not a good choice for Big Data analysis.
• There are free NoSQL databases (Cassandra, Berkeley DB, MongoDB, and more) available; a minimal sketch follows below.
• Lots of data is freely available to play with. Analyze it in the cloud.
• «The Matrix is everywhere. It is all around us. Even now, in this very room.»
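
As one concrete "from the couch" example, here is a minimal sketch of loading and aggregating a few documents with a free NoSQL database. It assumes a MongoDB instance running locally and the pymongo driver installed; the database, collection, and field names are invented for illustration.

from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017/")
events = client["playground"]["events"]          # hypothetical collection

# Load a few documents; the store is schemaless, so dicts go in as-is.
events.insert_many([
    {"user": "alice", "action": "login"},
    {"user": "alice", "action": "search"},
    {"user": "bob",   "action": "login"},
])

# Aggregate: count actions per user (MongoDB runs this server-side).
pipeline = [{"$group": {"_id": "$user", "n": {"$sum": 1}}}]
for row in events.aggregate(pipeline):
    print(row["_id"], row["n"])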
9. That’s it
• Data is growing.
• More information, but harder to find among all the garbage.
• Free software exists. You can make sense of your data too.
• Unleash hidden knowledge and work smarter!