Marko Grobelnik
marko.grobelnik@ijs.si
Jozef Stefan Institute
Ljubljana, Slovenia
Stavanger, May 8th 2012
 Introduction
◦ What is Big-Data?
◦ Why Big-Data?
◦ When is Big-Data really a problem?
 Techniques
 Tools
 Applications
 Literature
 ‘Big-data’ is similar to ‘Small-data’, but
bigger
 …but bigger data consequently
requires different approaches:
◦ techniques, tools & architectures
 …to solve:
◦ New problems…
◦ …and old problems in a better way.
From “Understanding Big Data” by IBM
Big-Data
 Key enablers for the growth of “Big Data” are:
◦ Increase of storage capacities
◦ Increase of processing power
◦ Availability of data
 NoSQL
◦ Databases: MongoDB, CouchDB, Cassandra, Redis, BigTable,
HBase, Hypertable, Voldemort, Riak, ZooKeeper
 MapReduce
◦ Hadoop, Hive, Pig, Cascading, Cascalog, mrjob, Caffeine,
S4, MapR, Acunu, Flume, Kafka, Azkaban, Oozie,
Greenplum
 Storage
◦ S3, Hadoop Distributed File System
 Servers
◦ EC2, Google App Engine, Elastic Beanstalk, Heroku
 Processing
◦ R, Yahoo! Pipes, Mechanical Turk, Solr/Lucene,
ElasticSearch, Datameer, BigSheets, Tinkerpop
 …when the operations on data are complex:
◦ …e.g. simple counting is not a complex problem
◦ Modeling and reasoning with data of different kinds
can get extremely complex
 Good news about big-data:
◦ Often, because of the vast amount of data, modeling
techniques can get simpler (e.g. smart counting can
replace complex model-based analytics)…
◦ …as long as we deal with the scale
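As a rough illustration of "smart counting" replacing a model, the sketch below predicts the next page a visitor opens purely from counted page-to-page transitions. The session data, page names and function names are hypothetical and are not taken from the tutorial.

```python
from collections import Counter, defaultdict


def train_transition_counts(sessions):
    """Count how often each page is directly followed by each other page."""
    counts = defaultdict(Counter)
    for pages in sessions:
        for current, following in zip(pages, pages[1:]):
            counts[current][following] += 1
    return counts


def most_likely_next(counts, page):
    """Return the most frequently observed follower of `page`, or None."""
    followers = counts.get(page)
    if not followers:
        return None
    return followers.most_common(1)[0][0]


# Hypothetical click sessions (ordered lists of visited pages).
sessions = [
    ["home", "sports", "scores"],
    ["home", "sports", "tickets"],
    ["home", "news"],
    ["sports", "scores"],
]
counts = train_transition_counts(sessions)
prediction = most_likely_next(counts, "home")
```

With enough sessions, a table of counts like this can stand in for a more elaborate predictive model; the hard part becomes storing and updating the counts at scale.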
 Research areas (such as IR, KDD, ML, NLP, SemWeb, …)
are sub-cubes within the data cube
Scalability
Dynamicity
Context
Quality
Usage
 Good recommendations can make a big difference
in keeping a user on a web site
◦ …the key is how rich a context model the system
uses to select information for a user
◦ With bad recommendations <1% of users click;
with good ones >5% of users click
Contextual personalized recommendations generated in ~20ms
 Domain
 Sub-domain
 Page URL
 URL sub-directories
 Page Meta Tags
 Page Title
 Page Content
 Named Entities
 Has Query
 Referrer Query
 Referring Domain
 Referring URL
 Outgoing URL
 GeoIP Country
 GeoIP State
 GeoIP City
 Absolute Date
 Day of the Week
 Day period
 Hour of the day
 User Agent
 Zip Code
 State
 Income
 Age
 Gender
 Country
 Job Title
 Job Industry
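A minimal sketch of how a few of the context features listed above could be derived from one click record. The URLs, the `q` query parameter, the field names and the naive "last two host labels" rule for the domain are assumptions made for illustration; the tutorial does not describe its extraction code.

```python
from datetime import datetime, timezone
from urllib.parse import parse_qs, urlsplit

DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]


def context_features(page_url, referrer_url, when):
    """Derive a handful of context features from one click record."""
    page = urlsplit(page_url)
    host_labels = (page.hostname or "").split(".")
    # Assumption: the last two labels form the domain (wrong for e.g. co.uk).
    domain = ".".join(host_labels[-2:])
    sub_domain = ".".join(host_labels[:-2])
    # Directory components of the path, without the final document name.
    directories = [part for part in page.path.split("/")[:-1] if part]

    referring_domain, referrer_query = "", ""
    if referrer_url:
        ref = urlsplit(referrer_url)
        referring_domain = ref.hostname or ""
        # Assumption: the search engine passes the query in a "q" parameter.
        referrer_query = parse_qs(ref.query).get("q", [""])[0]

    return {
        "domain": domain,
        "sub_domain": sub_domain,
        "url_sub_directories": directories,
        "has_query": bool(page.query),
        "referring_domain": referring_domain,
        "referrer_query": referrer_query,
        "day_of_week": DAY_NAMES[when.weekday()],
        "hour_of_day": when.hour,
    }


# Hypothetical click record.
features = context_features(
    "https://www.example.com/business/markets/story.html?page=2",
    "https://search.example.org/search?q=stock+market",
    datetime(2012, 5, 8, 10, 0, tzinfo=timezone.utc),
)
```

GeoIP and demographic features would need external lookups (an IP database, a user-profile store) and are left out here.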
[Diagram labels: Log Files (~100M page clicks per day); User profiles; NYT articles; Stream of profiles; Advertisers]
Segment        Keywords
Stock Market   Stock Market, mortgage, banking, investors, Wall Street, turmoil, New York Stock Exchange
Health         diabetes, heart disease, disease, heart, illness
Green Energy   Hybrid cars, energy, power, model, carbonated, fuel, bulbs
Hybrid cars    Hybrid cars, vehicles, model, engines, diesel
Travel         travel, wine, opening, tickets, hotel, sites, cars, search, restaurant
…              …
[Diagram labels: Segments; Trend Detection System; Stream of clicks; Trends and updated segments; Campaign to sell segments; $; Sales]
 50 GB of uncompressed log files
 10 GB of compressed log files
 0.5 GB of processed log files
 50-100M clicks
 4-6M unique users
 7000 unique pages with more than 100 hits
 Index size 2 GB
 Pre-processing & indexing time
◦ ~10 min on workstation (4 cores & 32 GB)
◦ ~1 hour on EC2 (2 cores & 16 GB)
 Alarms Explorer Server implements three
real-time scenarios on the alarms stream:
1. Root-Cause-Analysis – finding which device is
responsible for an occasional “flood” of alarms
2. Short-Term Fault Prediction – predict which
device will fail in the next 15 minutes
3. Long-Term Anomaly Detection – detect
unusual trends in the network
 …the system is used at British Telecom
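To make the first scenario concrete, here is a deliberately crude sketch: keep a sliding time window over the alarm stream and, whenever the window holds a "flood", name the device contributing the most alarms. The window length, the threshold, the device names and the heuristic itself are assumptions for illustration; this is not the method used by the Alarms Explorer Server, and the noisiest device is only a first guess at a root cause.

```python
from collections import Counter, deque


def flood_suspects(alarms, window_seconds=60, flood_threshold=5):
    """Yield (timestamp, device, count) while the window holds a flood.

    `alarms` is an iterable of (timestamp_seconds, device_id) pairs
    sorted by timestamp.
    """
    window = deque()          # alarms currently inside the time window
    per_device = Counter()    # alarm counts per device inside the window
    for timestamp, device in alarms:
        window.append((timestamp, device))
        per_device[device] += 1
        # Evict alarms that have fallen out of the window.
        while window and window[0][0] <= timestamp - window_seconds:
            _, old_device = window.popleft()
            per_device[old_device] -= 1
            if per_device[old_device] == 0:
                del per_device[old_device]
        if len(window) >= flood_threshold:
            suspect, count = per_device.most_common(1)[0]
            yield timestamp, suspect, count


# Hypothetical alarm stream: device "B" dominates a short burst.
alarms = [(0, "A"), (10, "B"), (20, "B"), (25, "B"),
          (30, "B"), (31, "C"), (200, "A")]
suspects = list(flood_suspects(alarms))
```

Memory stays bounded by the number of alarms in one window, which matters for a stream of ~10-100 alarms per second.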
[Diagram labels: Telecom Network (~25 000 devices); Alarms ~10-100/sec; Alarms Server; Live feed of data; Alarms Explorer Server; Operator; Big board display]
 Presented in “Planetary-Scale Views on a
Large Instant-Messaging Network” by Jure
Leskovec and Eric Horvitz (WWW 2008)
 Observe social and communication
phenomena at a planetary scale
 Largest social network analyzed to date
Research questions:
 How does communication change with user
demographics (age, sex, language, country)?
 How does geography affect communication?
 What is the structure of the communication
network?
 We collected the data for June 2006
 Log size: 150 GB/day (compressed)
 Total: 1 month of communication data:
4.5 TB of compressed data
 Activity over June 2006 (30 days)
◦ 245 million users logged in
◦ 180 million users engaged in conversations
◦ 17.5 million new accounts activated
◦ More than 30 billion conversations
◦ More than 255 billion exchanged messages
 Count the number of users logging in from
particular location on the earth
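Counting logins per location can be sketched as binning coordinates into a latitude/longitude grid and counting per cell. The one-degree cell size and the sample coordinates below are assumptions; the slide does not say how locations were resolved or aggregated.

```python
import math
from collections import Counter


def grid_cell(lat, lon, cell_degrees=1.0):
    """Map a coordinate to the index of the grid cell that contains it."""
    return (math.floor(lat / cell_degrees), math.floor(lon / cell_degrees))


def logins_per_cell(logins, cell_degrees=1.0):
    """Count logins per grid cell; `logins` yields (lat, lon) pairs."""
    return Counter(grid_cell(lat, lon, cell_degrees) for lat, lon in logins)


# Hypothetical login coordinates (latitude, longitude).
logins = [(46.05, 14.51), (46.9, 14.2), (58.97, 5.73), (-33.9, 151.2)]
cell_counts = logins_per_cell(logins)
```

Because each login only increments one counter, the same aggregation works as a single pass over the logs and is easy to split across machines.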
 Logins from Europe
 6 degrees of separation [Milgram ’60s]
 Average distance between two random users is 6.6
 90% of nodes can be reached in < 8 hops
Hops Nodes
1 10
2 78
3 396
4 8648
5 3299252
6 28395849
7 79059497
8 52995778
9 10321008
10 1955007
11 518410
12 149945
13 44616
14 13740
15 4476
16 1542
17 536
18 167
19 71
20 29
21 16
22 10
23 3
24 2
25 3
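The hop table above can be summarized with a few lines of code: the cumulative share of nodes reached within each hop count, and the mean hop distance over the table. The sketch assumes the table lists how many nodes are first reached at each hop from one starting point (the slide does not say), so its mean need not equal the 6.6 average quoted for random user pairs.

```python
# Hop histogram copied from the table above: hops -> nodes first reached.
hop_counts = {
    1: 10, 2: 78, 3: 396, 4: 8648, 5: 3299252, 6: 28395849,
    7: 79059497, 8: 52995778, 9: 10321008, 10: 1955007,
    11: 518410, 12: 149945, 13: 44616, 14: 13740, 15: 4476,
    16: 1542, 17: 536, 18: 167, 19: 71, 20: 29, 21: 16,
    22: 10, 23: 3, 24: 2, 25: 3,
}


def cumulative_fractions(counts):
    """Return hops -> share of all listed nodes reached within that many hops."""
    total = sum(counts.values())
    running = 0
    fractions = {}
    for hops in sorted(counts):
        running += counts[hops]
        fractions[hops] = running / total
    return fractions


def mean_hops(counts):
    """Return the node-weighted mean hop distance over the table."""
    total = sum(counts.values())
    return sum(hops * nodes for hops, nodes in counts.items()) / total


fractions = cumulative_fractions(hop_counts)
# Smallest hop count within which at least 90% of the listed nodes are reached.
hops_for_90_percent = min(h for h, share in fractions.items() if share >= 0.9)
average = mean_hops(hop_counts)
```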
Big data tutorial_part4
Certificate of completion android app development data persistence libraries
 
Certificate of completion android app development restful web services
Certificate of completion android app development restful web servicesCertificate of completion android app development restful web services
Certificate of completion android app development restful web services
 

Big data tutorial_part4

  • 1. Marko Grobelnik marko.grobelnik@ijs.si Jozef Stefan Institute Ljubljana, Slovenia Stavanger, May 8th 2012
  • 2.  Introduction ◦ What is Big data? ◦ Why Big-Data? ◦ When Big-Data is really a problem?  Techniques  Tools  Applications  Literature
  • 5.  ‘Big-data’ is similar to ‘Small-data’, but bigger  …but having bigger data consequently requires different approaches: ◦ techniques, tools & architectures  …to solve: ◦ New problems… ◦ …and old problems in a better way.
  • 6. From “Understanding Big Data” by IBM
  • 9.  Key enablers for the growth of “Big Data” are: ◦ Increase of storage capacities ◦ Increase of processing power ◦ Availability of data
  • 20.  NoSQL Databases ◦ MongoDB, CouchDB, Cassandra, Redis, BigTable, HBase, Hypertable, Voldemort, Riak, ZooKeeper  MapReduce ◦ Hadoop, Hive, Pig, Cascading, Cascalog, mrjob, Caffeine, S4, MapR, Acunu, Flume, Kafka, Azkaban, Oozie, Greenplum  Storage ◦ S3, Hadoop Distributed File System  Servers ◦ EC2, Google App Engine, Elastic Beanstalk, Heroku  Processing ◦ R, Yahoo! Pipes, Mechanical Turk, Solr/Lucene, ElasticSearch, Datameer, BigSheets, Tinkerpop
  • 22.  …when the operations on data are complex: ◦ …e.g. simple counting is not a complex problem ◦ Modeling and reasoning with data of different kinds can get extremely complex  Good news about big-data: ◦ Often, because of the vast amount of data, modeling techniques can get simpler (e.g. smart counting can replace complex model-based analytics)… ◦ …as long as we deal with the scale
  • 23.  Research areas (such as IR, KDD, ML, NLP, SemWeb, …) are sub-cubes within the data cube [Cube dimensions: Scalability, Dynamicity, Context, Quality, Usage]
  • 26.  Good recommendations can make a big difference in keeping a user on a web site ◦ …the key is how rich a context model the system uses to select information for a user ◦ Bad recommendations get clicks from <1% of users, good ones from >5% of users [Figure: Contextual personalized recommendations generated in ~20ms]
  • 27.  Domain  Sub-domain  Page URL  URL sub-directories  Page Meta Tags  Page Title  Page Content  Named Entities  Has Query  Referrer Query  Referring Domain  Referring URL  Outgoing URL  GeoIP Country  GeoIP State  GeoIP City  Absolute Date  Day of the Week  Day period  Hour of the day  User Agent  Zip Code  State  Income  Age  Gender  Country  Job Title  Job Industry
  • 28. [Diagram labels: Log Files (~100M page clicks per day); User profiles; NYT articles; Stream of profiles; Advertisers; Segments; Trend Detection System; Stream of clicks; Trends and updated segments; Campaign to sell segments; $ Sales]
    Segment       | Keywords
    Stock Market  | Stock Market, mortgage, banking, investors, Wall Street, turmoil, New York Stock Exchange
    Health        | diabetes, heart disease, disease, heart, illness
    Green Energy  | Hybrid cars, energy, power, model, carbonated, fuel, bulbs
    Hybrid cars   | Hybrid cars, vehicles, model, engines, diesel
    Travel        | travel, wine, opening, tickets, hotel, sites, cars, search, restaurant
    …             | …
  • 29.  50 GB of uncompressed log files  10 GB of compressed log files  0.5 GB of processed log files  50-100M clicks  4-6M unique users  7000 unique pages with more than 100 hits  Index size 2 GB  Pre-processing & indexing time ◦ ~10 min on a workstation (4 cores & 32 GB) ◦ ~1 hour on EC2 (2 cores & 16 GB)
  • 31.  Alarms Explorer Server implements three real-time scenarios on the alarms stream:
    1. Root-Cause-Analysis – finding which device is responsible for an occasional “flood” of alarms
    2. Short-Term Fault Prediction – predict which device will fail in the next 15 minutes
    3. Long-Term Anomaly Detection – detect unusual trends in the network
     …the system is used at British Telecom
    [Diagram labels: Telecom Network (~25 000 devices); Alarms ~10-100/sec; Alarms Server; Live feed of data; Alarms Explorer Server; Operator; Big board display]
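The root-cause scenario on slide 31 can be illustrated with a very simple sliding-window count. This is only a sketch under stated assumptions, not the method used in the Alarms Explorer Server: the function name `flood_suspects`, the 60-second window, and the flood threshold are all hypothetical, and alarms are assumed to arrive as `(timestamp_seconds, device_id)` pairs sorted by time.

```python
from collections import Counter, deque


def flood_suspects(alarms, window_s=60.0, flood_threshold=100):
    """Yield (timestamp, device_id, share) while the window holds a flood.

    alarms: iterable of (timestamp_seconds, device_id), sorted by timestamp.
    window_s and flood_threshold are illustrative values, not from the slides.
    """
    window = deque()    # alarms currently inside the time window
    counts = Counter()  # alarms per device inside the window
    for ts, device in alarms:
        window.append((ts, device))
        counts[device] += 1
        # Drop alarms that are older than the window.
        while window and window[0][0] <= ts - window_s:
            _, old_device = window.popleft()
            counts[old_device] -= 1
            if counts[old_device] == 0:
                del counts[old_device]
        # A "flood" here simply means many alarms inside one window.
        if len(window) >= flood_threshold:
            top_device, top_count = counts.most_common(1)[0]
            yield ts, top_device, top_count / len(window)
```

The device with the largest share of alarms in a flooded window is only a first candidate; a real system would also need the network topology to tell a faulty device from its affected neighbours.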
  • 32.  Presented in “Planetary-Scale Views on a Large Instant-Messaging Network” by Jure Leskovec and Eric Horvitz, WWW 2008
  • 33.  Observe social and communication phenomena at a planetary scale  Largest social network analyzed to date Research questions:  How does communication change with user demographics (age, sex, language, country)?  How does geography affect communication?  What is the structure of the communication network?
  • 34.  We collected the data for June 2006  Log size: 150 GB/day (compressed)  Total: 1 month of communication data: 4.5 TB of compressed data  Activity over June 2006 (30 days) ◦ 245 million users logged in ◦ 180 million users engaged in conversations ◦ 17.5 million new accounts activated ◦ More than 30 billion conversations ◦ More than 255 billion exchanged messages
  • 37.  Count the number of users logging in from a particular location on the earth
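Counting users per location, as on slide 37, is an instance of the "simple counting" mentioned earlier in the deck. A minimal sketch follows, assuming login records are available as `(user_id, latitude, longitude)` tuples and that locations are binned into a regular grid; the record layout, the function name `login_counts_by_cell`, and the 1-degree cell size are assumptions for illustration, not details from the original study.

```python
import math
from collections import Counter


def login_counts_by_cell(logins, cell_deg=1.0):
    """Count distinct users per latitude/longitude grid cell.

    logins: iterable of (user_id, latitude, longitude) tuples (assumed layout).
    cell_deg: grid cell size in degrees (illustrative default).
    """
    users_per_cell = {}
    for user_id, lat, lon in logins:
        # Map each coordinate to the index of the grid cell containing it.
        cell = (math.floor(lat / cell_deg), math.floor(lon / cell_deg))
        users_per_cell.setdefault(cell, set()).add(user_id)
    # Repeated logins by the same user in a cell are counted once.
    return Counter({cell: len(users) for cell, users in users_per_cell.items()})
```

At the scale described on slide 34, the per-cell user sets would not fit in one process; the same per-cell aggregation would be spread across machines, for example as a map step keyed by cell followed by a reduce step.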
  • 38.  Logins from Europe
  • 39.  6 degrees of separation [Milgram ’60s]  Average distance between two random users is 6.6  90% of nodes can be reached in < 8 hops
    Hops  Nodes
    1     10
    2     78
    3     396
    4     8648
    5     3299252
    6     28395849
    7     79059497
    8     52995778
    9     10321008
    10    1955007
    11    518410
    12    149945
    13    44616
    14    13740
    15    4476
    16    1542
    17    536
    18    167
    19    71
    20    29
    21    16
    22    10
    23    3
    24    2
    25    3
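The hop table on slide 39 can be summarized directly. The sketch below assumes each row gives the number of nodes first reached at exactly that many hops; the slide does not say how the table was sampled, so the mean computed from this one table need not equal the 6.6 average quoted above. The function name `summarize_hops` is hypothetical.

```python
# Hop counts copied from the slide: hops -> number of nodes at that distance.
HOP_COUNTS = {
    1: 10, 2: 78, 3: 396, 4: 8648, 5: 3299252,
    6: 28395849, 7: 79059497, 8: 52995778, 9: 10321008, 10: 1955007,
    11: 518410, 12: 149945, 13: 44616, 14: 13740, 15: 4476,
    16: 1542, 17: 536, 18: 167, 19: 71, 20: 29,
    21: 16, 22: 10, 23: 3, 24: 2, 25: 3,
}


def summarize_hops(hop_counts, coverage=0.9):
    """Return (total nodes, mean hop distance, hops needed to reach `coverage`)."""
    total = sum(hop_counts.values())
    mean = sum(hops * nodes for hops, nodes in hop_counts.items()) / total
    cumulative = 0
    hops_needed = None
    for hops in sorted(hop_counts):
        cumulative += hop_counts[hops]
        # Record the first hop count at which the cumulative share is reached.
        if hops_needed is None and cumulative / total >= coverage:
            hops_needed = hops
    return total, mean, hops_needed


total, mean, hops_needed = summarize_hops(HOP_COUNTS)
print(f"nodes: {total}, mean distance: {mean:.2f}, hops for 90%: {hops_needed}")
```

For this table the cumulative share of nodes first passes 90% at 8 hops.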