Big Data: Opportunities and Challenges
Raja Chiky – raja.chiky@isep.fr
24/10/2014

OUTLINE
• About me
• What is Big Data?
• Evolution of Business Intelligence
• Big Data Opportunities
• Big Data challenges
• Conclusion
About me
• Associate professor in Computer Science – LISITE-RDI
• Research interests: data stream mining, scalability and resource optimization in distributed architectures (e.g. cloud architectures), recommender systems
• Research field: large-scale data management
  1. Real-time and distributed processing of various data sources
  2. Use of semantic technologies to add a semantic layer
  3. Recommender systems and collaborative data mining
  4. Optimizing resources in large-scale systems
  5. Modeling and validation of complex systems
  Data handled: heterogeneous static data; heterogeneous dynamic data streams; sensors
What is Big Data?

Big Data: Buzzword!

New era

Where is all this data coming from?

More and more connected things
So, what is Big Data?
Big Data elements: Volume, Variety, Velocity, + Veracity (IBM): information uncertainty

Volume of data created worldwide
• Dawn of time to 2003: 5 EB
• 2012: 2.7 ZB
• 2015: 10 ZB (estimated)
• Units: 1 GB = 10^9 bytes; 1 TB = 10^12 bytes; 1 PB = 10^15 bytes; 1 EB = 10^18 bytes; 1 ZB = 10^21 bytes; 1 YB = 10^24 bytes

Variety of data
• Radio, TV, news, e-mails, Facebook posts, tweets, blogs, photos, videos (user and paid), RSS feeds, Wikipedia, GPS data, RFID, POS scanners, …

Velocity of data
• Walmart handles 1M transactions per hour
• Google processes 24 PB of data per day
• AT&T transfers 30 PB of data per day
• 90 trillion emails are sent per year
• World of Warcraft uses 1.3 PB of storage
• Facebook, when it had a user base of 900M users, had 25 PB of compressed data
• 400M tweets per day in June '12
• 72 hours of video are uploaded to YouTube every minute

Source: Big Data & Analytics - Why Should We Care?, Vishwa Kolla
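The unit list uses decimal (SI) prefixes, so each step up is a factor of 1,000. As a quick sanity check on the volume figures, the Python sketch below converts them to bytes (the helper names are illustrative, not from the slides): going from 5 EB by 2003 to 2.7 ZB in 2012 is roughly a 540-fold increase, and the 2015 estimate of 10 ZB is about 3.7 times the 2012 figure.

```python
# Decimal (SI) byte units, as listed on the slide: each prefix is 1,000x the previous one.
UNITS = {
    "GB": 10**9,
    "TB": 10**12,
    "PB": 10**15,
    "EB": 10**18,
    "ZB": 10**21,
    "YB": 10**24,
}


def to_bytes(value: float, unit: str) -> float:
    """Convert a quantity such as (2.7, "ZB") into bytes."""
    return value * UNITS[unit]


def growth(later: tuple, earlier: tuple) -> float:
    """Ratio between two (value, unit) volumes."""
    return to_bytes(*later) / to_bytes(*earlier)


by_2003 = (5, "EB")    # dawn of time to 2003
in_2012 = (2.7, "ZB")
in_2015 = (10, "ZB")   # estimate

print(f"2003 -> 2012: x{growth(in_2012, by_2003):.0f}")  # x540
print(f"2012 -> 2015: x{growth(in_2015, in_2012):.1f}")  # x3.7
```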
Key factors
• Cheap storage
  - Recording everything is no longer expensive
• Cloud computing
  - Cheap, on-demand computing resources, available from anywhere in the world and to everyone
• Business reasons
  - New insights that give a competitive advantage
• Data in various forms everywhere: IoT and IoE, social networks, Open Data
• The way we interact with each other and with data / information
• …
Transforming our daily lives
• Then: one size fits all. Now: personalization & targeted selling.
Source: Big Data Trends by David Feinleib

Fitness
• Then: manual tracking. Now: focus on the goal.
Source: Big Data Trends by David Feinleib

Customer service
• Then: reactive customer service. Now: proactive customer service.
Source: Big Data Trends by David Feinleib

Customer service: 360-degree view of the customer
[Figure: the questions Who?, What?, When/How?, Where? and Why? answered by descriptive, operational, behavioral, interaction and contextual data]
Big Data opportunities
Source: Big Data opportunities survey, Unisphere / SAP, May 2013.
Opportunities: big data use cases
• 360° view of the customer
  - Integration of data from social networks, CRM, transactional data, etc.
  - Example: T-Mobile, telecom operator -> 50% reduction in customer churn in one quarter
• E-reputation
  - Sentiment analysis, proactive monitoring of social networks
  - Example: Nestlé, food group -> gained 4 places in the Reputation Institute's index thanks to 24/7 interaction
• Optimisation
  - Predictive analysis for anomaly detection, process optimization using sensors and operational data
  - Example: Union Pacific Railroad -> fewer train derailments, increased train shipments, reduced carbon emissions
• Public security
  - Monitoring of social networks, integration of spatial data and sensors
  - Example: Serious Request 2012 -> monitoring of crowd movements with Twitter and sensors, localization of public forces, integration with GIS
Evolution of Business Intelligence
[Figure: evolution of BI architectures in three stages, layered from data sources through gathering, storage, information and user interaction to output.
• Static Data: databases -> ETL/batch processing -> data warehouse -> analytics -> ad-hoc queries -> static reports
• Semantic Data: structured/unstructured data -> semantic ETL/batch processing -> triple store -> flexible queries / SPARQL -> visual analytics
• Stream (Big) Data: sensors, data streams and static data -> load shedding -> stream processing / semantic ETL -> databases/triplestores with knowledge enrichment -> continuous queries / business rules -> real-time analytics -> real-time visual analytics and retro-action]
What are Big Data Challenges?
Big Data workflow
1. Capture
2. Store
3. Analyze
4. Visualize
Challenges arise at each of these steps.
Challenges: Data Collection
• Heterogeneity of sources
  - Company databases => silos
  - Sensor networks, intelligent objects
  - Data streams: social networks, financial information, etc.
• Data velocity
• Data provenance and quality
Type of data used in Big Data initiatives
[Figure: data used, grouped as internal data, traditional sources and « new data »]
Source: Big Data opportunities survey, Unisphere / SAP, May 2013.
Challenges: Data Collection – Velocity
Examples of high-velocity data: website logs, network monitoring, financial services, eCommerce, traffic control, weather forecasting, power consumption
What is a data stream?
• Golab & Özsu (2003): “A data stream is a real-time, continuous, ordered (implicitly by arrival time or explicitly by timestamp) sequence of items. It is impossible to control the order in which items arrive, nor is it feasible to locally store a stream in its entirety.”
• Massive volumes of data; items arrive at a high rate.
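Because a stream cannot be stored in its entirety, stream algorithms keep a bounded summary instead. One classic example, not named on the slides, is reservoir sampling ("Algorithm R"): it maintains a uniform random sample of k items in a single pass with O(k) memory, whatever the stream length. A minimal Python sketch (function name and parameters are illustrative):

```python
import random
from typing import Iterable, List, Optional, TypeVar

T = TypeVar("T")


def reservoir_sample(stream: Iterable[T], k: int,
                     rng: Optional[random.Random] = None) -> List[T]:
    """Keep a uniform random sample of k items from a stream of unknown length.

    Single pass, O(k) memory: the stream itself is never stored.
    """
    rng = rng or random.Random()
    reservoir: List[T] = []
    for i, item in enumerate(stream):
        if i < k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # Item i (0-based) replaces a random slot with probability k / (i + 1).
            j = rng.randint(0, i)
            if j < k:
                reservoir[j] = item
    return reservoir


# Usage: sample 5 "sensor readings" from a generator that can be consumed only once.
readings = (x * 0.1 for x in range(1_000_000))
print(reservoir_sample(readings, 5, random.Random(42)))
```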
Data Stream Management Systems (DSMS) vs. DBMS
• Data model
  - DBMS: permanent, updatable relations
  - DSMS: streams and permanent, updatable relations
• Storage
  - DBMS: data is stored on disk
  - DSMS: permanent relations are stored on disk; streams are processed on the fly
• Query
  - DBMS: SQL language: creating structures; inserting/updating/deleting data; retrieving data (one-time queries)
  - DSMS: SQL-like query language: standard SQL on permanent relations; extended SQL on streams with windowing; continuous queries
• Performance
  - DBMS: large volumes of data
  - DSMS: optimization of computing resources to deal with several streams and several queries; ability to cope with variations in arrival rates without crashing
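The windowing and continuous-query rows are the core difference: a continuous query is registered once and produces a new answer each time data arrives, computed over a window rather than over the whole stream. Below is a minimal Python sketch of a time-based sliding-window count; the (timestamp, key) event format and the function name are assumptions for illustration, whereas a real DSMS would express this declaratively in its query language.

```python
from collections import deque
from typing import Deque, Iterable, Iterator, Tuple


def sliding_window_count(events: Iterable[Tuple[float, str]], window: float,
                         key: str) -> Iterator[Tuple[float, int]]:
    """Continuous query: after each arrival, emit how many `key` events fall
    in the window (ts - window, ts].

    Assumes events arrive in timestamp order. Only timestamps still inside the
    window are kept, so memory is bounded by the window, not by the stream.
    """
    buf: Deque[float] = deque()
    for ts, k in events:
        if k == key:
            buf.append(ts)
        # Evict timestamps that have slid out of the window.
        while buf and buf[0] <= ts - window:
            buf.popleft()
        yield ts, len(buf)


events = [(0, "a"), (1, "b"), (2, "a"), (5, "a"), (6, "a")]
for ts, count in sliding_window_count(events, window=3, key="a"):
    print(ts, count)
```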
Challenges: Data Collection
Data provenance and quality
• Data provenance: provenance refers to the information that describes data in sufficient detail to facilitate reproduction and enable validation of results.
• Data quality: validity and consistency of the data. Is it up to date and fit for the targeted use case?
Source: Patrick McDaniel, Kevin Butler, Steve McLaughlin, Radu Sion, Erez Zadok, and Marianne Winslett, Towards a secure and efficient system for end-to-end provenance, 2010.
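To make the two definitions concrete, the Python sketch below attaches a provenance record to a batch of rows and runs simple validity and freshness checks. The record fields, the "ts" column and the thresholds are hypothetical choices, not taken from the cited paper; hashing the exact input is one common way to let a result be re-derived and validated later.

```python
import hashlib
import json
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Any, Dict, List


@dataclass
class ProvenanceRecord:
    """Hypothetical record describing how a derived dataset was produced."""
    source: str                 # where the input came from (e.g. a sensor id)
    operation: str              # transformation applied
    parameters: Dict[str, Any]  # parameters needed to reproduce it
    input_digest: str           # hash of the exact input rows
    created_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat())


def digest(rows: List[Dict[str, Any]]) -> str:
    """Stable SHA-256 of the rows; keys are sorted so field order does not matter."""
    payload = json.dumps(rows, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def quality_issues(rows: List[Dict[str, Any]], required: List[str],
                   now: float, max_age: float) -> List[str]:
    """Validity (required fields present) and freshness (up-to-date) checks."""
    issues = []
    for i, row in enumerate(rows):
        missing = [f for f in required if row.get(f) is None]
        if missing:
            issues.append(f"row {i}: missing {missing}")
        ts = row.get("ts")
        if ts is not None and now - ts > max_age:
            issues.append(f"row {i}: stale ({now - ts:.0f}s old)")
    return issues


rows = [
    {"id": 1, "value": 3.2, "ts": 100},
    {"id": 2, "value": None, "ts": 100},
    {"id": 3, "value": 1.0, "ts": 0},
]
record = ProvenanceRecord(source="sensor-42", operation="quality_check",
                          parameters={"max_age": 150}, input_digest=digest(rows))
issues = quality_issues(rows, ["id", "value"], now=200, max_age=150)
print(issues)
```

The thresholds that define "up to date" depend on the targeted use case, which is why they are parameters here rather than constants.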
Challenges in data storage
• Large amounts of data
  - Need for a highly distributed architecture
• Massive queries
  - Avoid joins, since they are very time-consuming
• Evolving schema
  - Flexibility and scalability
• Predictable, low latency
• High availability
• Elasticity: horizontal extensibility (scale out)
• Not needed: transactions, strong consistency, complex queries
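Scaling out means spreading keys over machines so that adding a machine moves only a small share of the data. One widely used technique for this, consistent hashing (used in Dynamo-style stores; the slide does not name a technique), is sketched below in Python. The class name and the number of virtual nodes are illustrative assumptions; MD5 is used only to spread keys, not for security.

```python
import bisect
import hashlib
from typing import Dict, List


def _point(key: str) -> int:
    """Map a string to a position on a 2^32 ring."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16) % (2**32)


class HashRing:
    """Consistent hashing with virtual nodes (vnodes) to even out the load."""

    def __init__(self, nodes: List[str], vnodes: int = 100) -> None:
        self.vnodes = vnodes
        self._ring: List[int] = []        # sorted positions on the ring
        self._owner: Dict[int, str] = {}  # position -> physical node
        for node in nodes:
            self.add(node)

    def add(self, node: str) -> None:
        for i in range(self.vnodes):
            p = _point(f"{node}#{i}")
            bisect.insort(self._ring, p)
            self._owner[p] = node

    def node_for(self, key: str) -> str:
        # A key belongs to the first node position clockwise from its hash.
        i = bisect.bisect_right(self._ring, _point(key)) % len(self._ring)
        return self._owner[self._ring[i]]


ring = HashRing(["node-1", "node-2", "node-3"])
keys = [f"user:{i}" for i in range(10_000)]
before = {k: ring.node_for(k) for k in keys}
ring.add("node-4")  # scale out by one machine
moved = [k for k in keys if ring.node_for(k) != before[k]]
# Only keys taken over by node-4 move; expected to be about a quarter of them.
print(f"{len(moved) / len(keys):.0%} of keys moved")
```

With plain `hash(key) % n_nodes` partitioning, going from 3 to 4 nodes would instead relocate most keys.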
Limitation of RDBMS
“If the only tool you have is a hammer, you tend to see every problem as a nail.”
Abraham Maslow
NoSQL: Not Only SQL
• NoSQL => Not Only SQL
• SQL must not die, but other storage solutions should be considered for specific applications
• Exact name: non-relational databases
CAP theorem (conjectured by E. Brewer in 2000; proved by S. Gilbert and N. Lynch in 2002)
[Figure: triangle with vertices C (Consistency), A (Availability) and P (Partition tolerance)]
“CAP theorem”: C-A-P, choose two.
Claim: every distributed system sits on one side of the triangle.
• CA: available and consistent, unless there is a partition.
• CP: always consistent, even in a partition, but a reachable replica may deny service without the agreement of the others.
• AP: a reachable replica provides service even in a partition, but may be inconsistent.
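The CP and AP behaviors can be illustrated with a toy two-replica model in Python. Everything here (class names, the "CP"/"AP" mode flag, synchronous replication while connected) is a hypothetical simplification rather than how any particular database works; the CA case corresponds to simply never cutting the link.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


class Unavailable(Exception):
    """Raised when a CP replica refuses a request it cannot coordinate."""


@dataclass
class Replica:
    name: str
    mode: str  # "CP" or "AP" (illustrative labels)
    data: Dict[str, int] = field(default_factory=dict)
    # repr/compare disabled: replicas point at each other, which would recurse.
    peer: Optional["Replica"] = field(default=None, repr=False, compare=False)
    partitioned: bool = False  # True while the link to the peer is cut

    def write(self, key: str, value: int) -> None:
        if self.partitioned and self.mode == "CP":
            # Consistency first: without the peer's agreement, refuse the write.
            raise Unavailable(f"{self.name}: peer unreachable, write refused")
        self.data[key] = value
        if not self.partitioned and self.peer is not None:
            # Connected: replicate synchronously so both copies agree.
            self.peer.data[key] = value
        # AP during a partition: the write is accepted locally only.

    def read(self, key: str) -> Optional[int]:
        if self.partitioned and self.mode == "CP":
            raise Unavailable(f"{self.name}: peer unreachable, read refused")
        return self.data.get(key)


def make_pair(mode: str) -> Tuple[Replica, Replica]:
    a, b = Replica("a", mode), Replica("b", mode)
    a.peer, b.peer = b, a
    return a, b


def set_partition(a: Replica, b: Replica, cut: bool) -> None:
    a.partitioned = b.partitioned = cut


ap1, ap2 = make_pair("AP")
ap1.write("x", 1)
set_partition(ap1, ap2, True)
ap1.write("x", 2)                     # accepted: the system stays available...
print(ap1.read("x"), ap2.read("x"))   # 2 1 -> ...but the replicas now disagree

cp1, cp2 = make_pair("CP")
cp1.write("x", 1)
set_partition(cp1, cp2, True)
try:
    cp1.write("x", 2)
except Unavailable as err:
    print(err)                        # consistent, but service is denied
```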
NoSQL Taxonomy
• Key-value
• Document
• Column
• Graph
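The four families differ mainly in how they shape the same data. The Python sketch below uses plain dictionaries and lists as stand-ins for real stores (no actual NoSQL API is used) to show one customer and one order in each model; the field names are invented for illustration.

```python
import json

# Key-value: the value is opaque to the store; access is by key only.
kv_store = {"customer:42": json.dumps({"name": "Alice", "city": "Paris"})}

# Document: nested, self-describing records that can be queried by field.
doc_store = [
    {"_id": 42, "name": "Alice", "city": "Paris",
     "orders": [{"id": 7, "total": 30.0}]},
]

# Column family: a row key maps to groups (families) of columns; rows can be sparse.
column_store = {
    "customer:42": {
        "profile": {"name": "Alice", "city": "Paris"},
        "orders": {"7:total": 30.0},
    },
}

# Graph: entities are nodes and relationships are first-class edges.
nodes = {"c42": {"label": "Customer", "name": "Alice"},
         "o7": {"label": "Order", "total": 30.0}}
edges = [("c42", "PLACED", "o7")]


def docs_where(field_name: str, value):
    """Document-style query on a field inside the stored records."""
    return [d for d in doc_store if d.get(field_name) == value]


def neighbours(node: str, rel: str):
    """Graph-style traversal: follow edges of one relationship type."""
    return [dst for src, r, dst in edges if src == node and r == rel]


print(json.loads(kv_store["customer:42"])["city"])  # Paris
print(neighbours("c42", "PLACED"))                   # ['o7']
```

Note that none of the lookups needs a join: each model stores the data already grouped the way it will be read.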
Challenges in Data Analytics
• Problems in large-scale analytics
  - Distributed computation efficiency
  - Evaluating performance gains from distribution
  - Bringing data to the processor
  - Efficient parallel algorithms (statistics, summaries)
• Speed analytics
  - Streaming computations
  - Load balancing
  - Load shedding
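"Efficient parallel algorithms (statistics, summaries)" usually means mergeable summaries: each worker summarizes its own partition, and only the small summaries travel over the network. A Python sketch for mean and variance, using the pairwise update usually attributed to Chan, Golub and LeVeque (the class and its names are illustrative):

```python
from dataclasses import dataclass
from functools import reduce
from typing import Iterable


@dataclass(frozen=True)
class Summary:
    """Mergeable summary of a partition: count, mean, sum of squared deviations."""
    n: int = 0
    mean: float = 0.0
    m2: float = 0.0

    @staticmethod
    def of(xs: Iterable[float]) -> "Summary":
        s = Summary()
        for x in xs:
            s = s.merge(Summary(1, float(x), 0.0))
        return s

    def merge(self, other: "Summary") -> "Summary":
        # Combine two partial summaries without revisiting the raw data.
        if self.n == 0:
            return other
        if other.n == 0:
            return self
        n = self.n + other.n
        delta = other.mean - self.mean
        mean = self.mean + delta * other.n / n
        m2 = self.m2 + other.m2 + delta * delta * self.n * other.n / n
        return Summary(n, mean, m2)

    @property
    def variance(self) -> float:
        """Population variance."""
        return self.m2 / self.n if self.n else 0.0


data = [float(x) for x in range(1, 101)]
partitions = [data[i::4] for i in range(4)]       # e.g. four workers
partials = [Summary.of(p) for p in partitions]     # computed where the data lives
total = reduce(Summary.merge, partials, Summary())  # three numbers per worker travel
print(total.n, total.mean, total.variance)
```

Because `merge` is associative, partial summaries can be combined in any tree shape, which is what makes this pattern suitable for distributed and streaming settings.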
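Load shedding means deliberately dropping part of the input when tuples arrive faster than they can be processed. The slides do not specify a policy; the Python sketch below assumes simple random shedding for a SUM query, keeping each tuple with probability capacity / arrival rate and scaling the result back up so the estimate stays unbiased. Names and rates are illustrative.

```python
import random
from typing import Iterable, Optional


def shed_and_sum(values: Iterable[float], arrival_rate: float, capacity: float,
                 rng: Optional[random.Random] = None) -> float:
    """Estimate SUM over a stream whose arrival rate may exceed processing capacity.

    Random load shedding: keep each tuple with probability keep = capacity / arrival_rate,
    then divide the partial sum by keep so the estimate is unbiased.
    """
    rng = rng or random.Random()
    keep = min(1.0, capacity / arrival_rate)
    total = 0.0
    for v in values:
        if rng.random() < keep:  # the tuple survives the shedder
            total += v
    return total / keep


stream = [1.0] * 100_000  # exact SUM is 100,000
estimate = shed_and_sum(stream, arrival_rate=1000, capacity=250, rng=random.Random(0))
print(round(estimate))    # close to 100,000, computed from about 25% of the tuples
```

Random shedding trades accuracy for throughput; semantic policies that drop the least useful tuples first are a common alternative.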
Challenges in Data Access and Visualization
• The main goal of data visualization is to communicate information clearly and effectively through graphical means
• Serve the results of the analytics workflow through faster systems, such as real-time query interfaces
“Visualization is a form of knowledge compression” - David McCandless
Big Data: Technological challenges
• Data infrastructure tools and platforms: data centers, cloud infrastructures, NoSQL databases, in-memory databases, the Hadoop/MapReduce ecosystem
• New generation of front-end tools for BI and analytic systems: data visualization and visual analytics, self-service BI, mobile BI
• Data processing: supercomputers, distributed or massively parallel computing
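The MapReduce model behind Hadoop splits a job into a map phase emitting key/value pairs, a shuffle grouping the pairs by key, and a reduce phase aggregating each group. A single-process Python sketch of the classic word count shows the three phases; it does not use the Hadoop API, and in Hadoop the shuffle is performed by the framework while map and reduce tasks run on different machines.

```python
from collections import defaultdict
from itertools import chain
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(doc: str) -> Iterator[Tuple[str, int]]:
    """Emit (word, 1) for every word of one input split."""
    for word in doc.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group intermediate values by key."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Aggregate all values for one key."""
    return key, sum(values)


def word_count(docs: List[str]) -> Dict[str, int]:
    mapped = chain.from_iterable(map_phase(d) for d in docs)
    return dict(reduce_phase(k, vs) for k, vs in shuffle(mapped).items())


print(word_count(["big data", "Big linked data"]))  # {'big': 2, 'data': 2, 'linked': 1}
```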
Conclusion: Big Data challenges
• Semantic information aggregation
  - Information aggregation: “too much data to assimilate but not enough knowledge to act”
• Distributed and real-time processing
  - Design of real-time and distributed algorithms for stream processing and information aggregation
  - Distribution and parallelization of data mining algorithms
  - Optimizing resources
• Visual analytics and user modeling
  - Dynamic user model
  - Novel visualizations for very large datasets
• Data protection
IEEE Metro Area Smart Tech Workshop on Distributed Data Streaming, Dec 5, 2014, Paris
08h00: Registration - Breakfast
08h50: Room L012 - Welcome
09h00: Room L012 - Introduction to Distributed Data Streaming - Speaker: Raja Chiky (ISEP)
10h15: Coffee break
10h45: Room L012 - Real World Issues in Supervised Classification for Data Streams - Speaker: Vincent Lemaire (Orange Labs)
11h30: Room L012 - Use Case 1 - Finance - Speaker: Antoine Chambille (Quartet FS)
12h00: Room L012 - Use Case 2 - Smart metering - Speaker: Marie-Luce Picard (EDF R&D)
12h30: Lunch offsite
14h00: Rooms L305-L306 - 2 parallel lab sessions: Real-Time Data processing with open source DSMS - Speakers: Raja Chiky and Sylvain Lefebvre - 1st part
15h30: Coffee break
16h00: Rooms L305-L306 - 2 parallel lab sessions: Real-Time Data processing with open source DSMS - Speakers: Raja Chiky and Sylvain Lefebvre - 2nd part
17h30: Reception onsite
Thanks to
Marie-Aude Aufaure, ECP
Sylvain Lefebvre, ISEP
Big Data
Volume, Variety, Velocity, Veracity, … Value

Linked Data
Web of data, Semantic Web
- A set of principles and good practices for linking, publishing and searching web data
- Structure and semantically enrich RDF data, with very high scalability
-> Big Linked Data
Integrate, aggregate, analyze and visualize large data sets, whatever their type, provenance or flow speed …

Big Linked Data / Linked Big Data
Semantic Technologies Living Lab – Linked & Big Data Academic Chair
Our value proposition
– Semantic aggregation from textual and non-textual streams
– Management of semantic heterogeneity; real-time and distributed processing
– Ensuring data quality and veracity
– Visual analytics

More Related Content

What's hot

challenges of big data to big data mining with their processing framework
challenges of big data to big data mining with their processing frameworkchallenges of big data to big data mining with their processing framework
challenges of big data to big data mining with their processing framework
KamleshKumar394
 

What's hot (20)

elgendy2014.pdf
elgendy2014.pdfelgendy2014.pdf
elgendy2014.pdf
 
Web search-metrics-tutorial-www2010-section-7of7-presentation
Web search-metrics-tutorial-www2010-section-7of7-presentationWeb search-metrics-tutorial-www2010-section-7of7-presentation
Web search-metrics-tutorial-www2010-section-7of7-presentation
 
Introduction to Data Science (Data Science Thailand Meetup #1)
Introduction to Data Science (Data Science Thailand Meetup #1)Introduction to Data Science (Data Science Thailand Meetup #1)
Introduction to Data Science (Data Science Thailand Meetup #1)
 
Data Activities in Austria
Data Activities in AustriaData Activities in Austria
Data Activities in Austria
 
Towards Visualization Recommendation Systems
Towards Visualization Recommendation SystemsTowards Visualization Recommendation Systems
Towards Visualization Recommendation Systems
 
MAKING SENSE OF IOT DATA W/ BIG DATA + DATA SCIENCE - CHARLES CAI
MAKING SENSE OF IOT DATA W/ BIG DATA + DATA SCIENCE - CHARLES CAIMAKING SENSE OF IOT DATA W/ BIG DATA + DATA SCIENCE - CHARLES CAI
MAKING SENSE OF IOT DATA W/ BIG DATA + DATA SCIENCE - CHARLES CAI
 
Rabobank - There is something about Data
Rabobank - There is something about DataRabobank - There is something about Data
Rabobank - There is something about Data
 
Big data deep learning: applications and challenges
Big data deep learning: applications and challengesBig data deep learning: applications and challenges
Big data deep learning: applications and challenges
 
challenges of big data to big data mining with their processing framework
challenges of big data to big data mining with their processing frameworkchallenges of big data to big data mining with their processing framework
challenges of big data to big data mining with their processing framework
 
Big data storage
Big data storageBig data storage
Big data storage
 
Intro to Data Science Concepts
Intro to Data Science ConceptsIntro to Data Science Concepts
Intro to Data Science Concepts
 
Chapter 1: Introduction to Data Mining
Chapter 1: Introduction to Data MiningChapter 1: Introduction to Data Mining
Chapter 1: Introduction to Data Mining
 
Data mining
Data miningData mining
Data mining
 
Dw 07032018-dr pl pradhan
Dw 07032018-dr pl pradhanDw 07032018-dr pl pradhan
Dw 07032018-dr pl pradhan
 
Big data-ppt
Big data-pptBig data-ppt
Big data-ppt
 
Big data
Big dataBig data
Big data
 
data mining
data miningdata mining
data mining
 
ESSnet Big Data WP8 Methodology (+ Quality, +IT)
ESSnet Big Data WP8 Methodology (+ Quality, +IT)ESSnet Big Data WP8 Methodology (+ Quality, +IT)
ESSnet Big Data WP8 Methodology (+ Quality, +IT)
 
MMDS 2014: Myria (and Scalable Graph Clustering with RelaxMap)
MMDS 2014: Myria (and Scalable Graph Clustering with RelaxMap)MMDS 2014: Myria (and Scalable Graph Clustering with RelaxMap)
MMDS 2014: Myria (and Scalable Graph Clustering with RelaxMap)
 
What Is DATA MINING(INTRODUCTION)
What Is DATA MINING(INTRODUCTION)What Is DATA MINING(INTRODUCTION)
What Is DATA MINING(INTRODUCTION)
 

Viewers also liked

A tutorial on deep learning at icml 2013
A tutorial on deep learning at icml 2013A tutorial on deep learning at icml 2013
A tutorial on deep learning at icml 2013
Philip Zheng
 
Data By The People, For The People
Data By The People, For The PeopleData By The People, For The People
Data By The People, For The People
Daniel Tunkelang
 
Artificial neural network
Artificial neural networkArtificial neural network
Artificial neural network
DEEPASHRI HK
 
Artificial Intelligence Presentation
Artificial Intelligence PresentationArtificial Intelligence Presentation
Artificial Intelligence Presentation
lpaviglianiti
 

Viewers also liked (20)

Big data Analytics opportunities in India
Big data Analytics opportunities in IndiaBig data Analytics opportunities in India
Big data Analytics opportunities in India
 
BIG DATA and USE CASES
BIG DATA and USE CASESBIG DATA and USE CASES
BIG DATA and USE CASES
 
Big Data Use Cases
Big Data Use CasesBig Data Use Cases
Big Data Use Cases
 
A tutorial on deep learning at icml 2013
A tutorial on deep learning at icml 2013A tutorial on deep learning at icml 2013
A tutorial on deep learning at icml 2013
 
10 R Packages to Win Kaggle Competitions
10 R Packages to Win Kaggle Competitions10 R Packages to Win Kaggle Competitions
10 R Packages to Win Kaggle Competitions
 
Deep Learning for Natural Language Processing
Deep Learning for Natural Language ProcessingDeep Learning for Natural Language Processing
Deep Learning for Natural Language Processing
 
Data By The People, For The People
Data By The People, For The PeopleData By The People, For The People
Data By The People, For The People
 
Hands-on Deep Learning in Python
Hands-on Deep Learning in PythonHands-on Deep Learning in Python
Hands-on Deep Learning in Python
 
Tutorial on Deep learning and Applications
Tutorial on Deep learning and ApplicationsTutorial on Deep learning and Applications
Tutorial on Deep learning and Applications
 
Machine Learning and Data Mining: 12 Classification Rules
Machine Learning and Data Mining: 12 Classification RulesMachine Learning and Data Mining: 12 Classification Rules
Machine Learning and Data Mining: 12 Classification Rules
 
Hadoop and Machine Learning
Hadoop and Machine LearningHadoop and Machine Learning
Hadoop and Machine Learning
 
How to Interview a Data Scientist
How to Interview a Data ScientistHow to Interview a Data Scientist
How to Interview a Data Scientist
 
Myths and Mathemagical Superpowers of Data Scientists
Myths and Mathemagical Superpowers of Data ScientistsMyths and Mathemagical Superpowers of Data Scientists
Myths and Mathemagical Superpowers of Data Scientists
 
A Statistician's View on Big Data and Data Science (Version 1)
A Statistician's View on Big Data and Data Science (Version 1)A Statistician's View on Big Data and Data Science (Version 1)
A Statistician's View on Big Data and Data Science (Version 1)
 
Introduction to Mahout and Machine Learning
Introduction to Mahout and Machine LearningIntroduction to Mahout and Machine Learning
Introduction to Mahout and Machine Learning
 
Artificial neural network
Artificial neural networkArtificial neural network
Artificial neural network
 
Deep neural networks
Deep neural networksDeep neural networks
Deep neural networks
 
Artificial Intelligence Presentation
Artificial Intelligence PresentationArtificial Intelligence Presentation
Artificial Intelligence Presentation
 
Tips for data science competitions
Tips for data science competitionsTips for data science competitions
Tips for data science competitions
 
How to Become a Data Scientist
How to Become a Data ScientistHow to Become a Data Scientist
How to Become a Data Scientist
 

Similar to Seminaire bigdata23102014

¿En qué se parece el Gobierno del Dato a un parque de atracciones?
¿En qué se parece el Gobierno del Dato a un parque de atracciones?¿En qué se parece el Gobierno del Dato a un parque de atracciones?
¿En qué se parece el Gobierno del Dato a un parque de atracciones?
Denodo
 
02 a holistic approach to big data
02 a holistic approach to big data02 a holistic approach to big data
02 a holistic approach to big data
Raul Chong
 
351315535-Module-1-Intro-to-Data-Science-pptx.pptx
351315535-Module-1-Intro-to-Data-Science-pptx.pptx351315535-Module-1-Intro-to-Data-Science-pptx.pptx
351315535-Module-1-Intro-to-Data-Science-pptx.pptx
XanGwaps
 
Big Data PPT by Rohit Dubey
Big Data PPT by Rohit DubeyBig Data PPT by Rohit Dubey
Big Data PPT by Rohit Dubey
Rohit Dubey
 

Similar to Seminaire bigdata23102014 (20)

3 Reasons Data Virtualization Matters in Your Portfolio
3 Reasons Data Virtualization Matters in Your Portfolio3 Reasons Data Virtualization Matters in Your Portfolio
3 Reasons Data Virtualization Matters in Your Portfolio
 
WWV2015: Jibes Paul van der Hulst big data
WWV2015: Jibes Paul van der Hulst big dataWWV2015: Jibes Paul van der Hulst big data
WWV2015: Jibes Paul van der Hulst big data
 
Big Data et eGovernment
Big Data et eGovernmentBig Data et eGovernment
Big Data et eGovernment
 
Geospatial Intelligence Middle East 2013_Big Data_Steven Ramage
Geospatial Intelligence Middle East 2013_Big Data_Steven RamageGeospatial Intelligence Middle East 2013_Big Data_Steven Ramage
Geospatial Intelligence Middle East 2013_Big Data_Steven Ramage
 
Introduction to Data streaming - 05/12/2014
Introduction to Data streaming - 05/12/2014Introduction to Data streaming - 05/12/2014
Introduction to Data streaming - 05/12/2014
 
Rethink Analytics with an Enterprise Data Hub
Rethink Analytics with an Enterprise Data HubRethink Analytics with an Enterprise Data Hub
Rethink Analytics with an Enterprise Data Hub
 
¿En qué se parece el Gobierno del Dato a un parque de atracciones?
¿En qué se parece el Gobierno del Dato a un parque de atracciones?¿En qué se parece el Gobierno del Dato a un parque de atracciones?
¿En qué se parece el Gobierno del Dato a un parque de atracciones?
 
02 a holistic approach to big data
02 a holistic approach to big data02 a holistic approach to big data
02 a holistic approach to big data
 
Bigger and Better: Employing a Holistic Strategy for Big Data toward a Strong...
Bigger and Better: Employing a Holistic Strategy for Big Data toward a Strong...Bigger and Better: Employing a Holistic Strategy for Big Data toward a Strong...
Bigger and Better: Employing a Holistic Strategy for Big Data toward a Strong...
 
Customer 360
Customer 360Customer 360
Customer 360
 
Crowdsourcing Approaches to Big Data Curation - Rio Big Data Meetup
Crowdsourcing Approaches to Big Data Curation - Rio Big Data MeetupCrowdsourcing Approaches to Big Data Curation - Rio Big Data Meetup
Crowdsourcing Approaches to Big Data Curation - Rio Big Data Meetup
 
The New Model
The New ModelThe New Model
The New Model
 
351315535-Module-1-Intro-to-Data-Science-pptx.pptx
351315535-Module-1-Intro-to-Data-Science-pptx.pptx351315535-Module-1-Intro-to-Data-Science-pptx.pptx
351315535-Module-1-Intro-to-Data-Science-pptx.pptx
 
Advanced Analytics and Machine Learning with Data Virtualization
Advanced Analytics and Machine Learning with Data VirtualizationAdvanced Analytics and Machine Learning with Data Virtualization
Advanced Analytics and Machine Learning with Data Virtualization
 
Data analytics introduction
Data analytics introductionData analytics introduction
Data analytics introduction
 
Big data Analytics
Big data AnalyticsBig data Analytics
Big data Analytics
 
Big Data PPT by Rohit Dubey
Big Data PPT by Rohit DubeyBig Data PPT by Rohit Dubey
Big Data PPT by Rohit Dubey
 
Big Data Expo 2015 - Barnsten Why Data Modelling is Essential
Big Data Expo 2015 - Barnsten Why Data Modelling is EssentialBig Data Expo 2015 - Barnsten Why Data Modelling is Essential
Big Data Expo 2015 - Barnsten Why Data Modelling is Essential
 
Neo4j GraphTalks - Einführung in Graphdatenbanken
Neo4j GraphTalks - Einführung in GraphdatenbankenNeo4j GraphTalks - Einführung in Graphdatenbanken
Neo4j GraphTalks - Einführung in Graphdatenbanken
 
Peter Elleby - Big Data, Big Noise, Big Hope - No Miracles
Peter Elleby - Big Data, Big Noise, Big Hope - No MiraclesPeter Elleby - Big Data, Big Noise, Big Hope - No Miracles
Peter Elleby - Big Data, Big Noise, Big Hope - No Miracles
 

Recently uploaded

Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
Klinik kandungan
 
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
nirzagarg
 
Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...
Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...
Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...
gajnagarg
 
Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...
nirzagarg
 
Computer science Sql cheat sheet.pdf.pdf
Computer science Sql cheat sheet.pdf.pdfComputer science Sql cheat sheet.pdf.pdf
Computer science Sql cheat sheet.pdf.pdf
SayantanBiswas37
 
+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...
+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...
+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...
Health
 
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
nirzagarg
 
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...
gajnagarg
 

Recently uploaded (20)

Top Call Girls in Balaghat 9332606886Call Girls Advance Cash On Delivery Ser...
Top Call Girls in Balaghat  9332606886Call Girls Advance Cash On Delivery Ser...Top Call Girls in Balaghat  9332606886Call Girls Advance Cash On Delivery Ser...
Top Call Girls in Balaghat 9332606886Call Girls Advance Cash On Delivery Ser...
 
Gomti Nagar & best call girls in Lucknow | 9548273370 Independent Escorts & D...
Gomti Nagar & best call girls in Lucknow | 9548273370 Independent Escorts & D...Gomti Nagar & best call girls in Lucknow | 9548273370 Independent Escorts & D...
Gomti Nagar & best call girls in Lucknow | 9548273370 Independent Escorts & D...
 
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
 
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
 
Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...
Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...
Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...
 
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
 
Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...
 
7. Epi of Chronic respiratory diseases.ppt
Seminaire bigdata23102014

  • 1. Big Data: Opportunities and Challenges Raja Chiky – raja.chiky@isep.fr
  • 2.
  • 3. OUTLINE ¡ About me ¡ What is Big Data? ¡ Evolution of Business Intelligence ¡ Big Data Opportunities ¡ Big Data challenges ¡ Conclusion 3 24/10/2014
  • 4. About me
      - Associate professor in Computer Science – LISITE-RDI
      - Research interests: data stream mining, scalability and resource optimization in distributed architectures (e.g. cloud architectures), recommender systems
      - Research field: large-scale data management, on heterogeneous static data and heterogeneous dynamic data streams (sensors), along five axes:
          1. Real-time and distributed processing of various data sources
          2. Using semantic technologies to add a semantic layer
          3. Recommender systems and collaborative data mining
          4. Optimizing resources in large-scale systems
          5. Modeling and validation of complex systems
  • 5. What is Big Data?
  • 6. Big Data: buzzword!
  • 7. New era
  • 8. Where is all this data coming from?
  • 9. More and more connected things
  • 10. So, what is Big Data?
      - Volume of data created worldwide: 5 EB from the dawn of time to 2003; 2.7 ZB in 2012; 10 ZB estimated for 2015
      - Units: 1 GB = 10^9 bytes; 1 TB = 10^12 bytes; 1 PB = 10^15 bytes; 1 EB = 10^18 bytes; 1 ZB = 10^21 bytes; 1 YB = 10^24 bytes
      - Variety of data: Wikipedia, GPS data, RFID, POS scanners, radio, TV, news, e-mails, Facebook posts, tweets, blogs, photos, videos (user and paid), RSS feeds, …
      - Velocity (and volume) of data:
          - Walmart handles 1M transactions per hour
          - Google processes 24 PB of data per day
          - AT&T transfers 30 PB of data per day
          - 90 trillion emails are sent per year
          - World of Warcraft uses 1.3 PB of storage
          - Facebook, when it had a user base of 900M users, had 25 PB of compressed data
          - 400M tweets per day in June 2012
          - 72 hours of video are uploaded to YouTube every minute
      - Big Data elements: Volume, Variety, Velocity, + Veracity (IBM): information uncertainty
      - Source: Big Data & Analytics - Why Should We Care?, Vishwa Kolla
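As a rough sanity check on the velocity figures in slide 10, the sketch below converts the per-day and per-hour numbers into average per-second rates. It assumes decimal (SI) units, as defined on the slide, and a uniform rate over time; real traffic is bursty, so peak rates are higher than these averages. The variable names are illustrative only.

```python
# Average rates implied by the per-day/per-hour figures on slide 10.
# Assumption: decimal (SI) units and a uniform arrival rate over time.
PB = 10**15                     # 1 PB = 10^15 bytes
GB = 10**9                      # 1 GB = 10^9 bytes
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 s
SECONDS_PER_HOUR = 60 * 60      # 3,600 s

google_bytes_per_s = 24 * PB / SECONDS_PER_DAY  # Google: 24 PB/day
att_bytes_per_s = 30 * PB / SECONDS_PER_DAY     # AT&T: 30 PB/day
tweets_per_s = 400e6 / SECONDS_PER_DAY          # 400M tweets/day (June 2012)
walmart_tx_per_s = 1e6 / SECONDS_PER_HOUR       # Walmart: 1M transactions/hour

# Ratio of data created in 2012 (2.7 ZB) to everything up to 2003 (5 EB).
growth_factor = 2.7e21 / 5e18

if __name__ == "__main__":
    print(f"Google:  ~{google_bytes_per_s / GB:.0f} GB/s")     # ~278 GB/s
    print(f"AT&T:    ~{att_bytes_per_s / GB:.0f} GB/s")        # ~347 GB/s
    print(f"Twitter: ~{tweets_per_s:.0f} tweets/s")            # ~4630 tweets/s
    print(f"Walmart: ~{walmart_tx_per_s:.0f} transactions/s")  # ~278 transactions/s
    print(f"2012 vs. up to 2003: ~{growth_factor:.0f}x")       # ~540x
```

Even averaged over a whole day, a single provider such as Google would need to ingest hundreds of gigabytes every second, which is why velocity is treated as a distinct dimension from volume.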
  • 11. Key factors
      - Cheap storage: recording everything is no longer expensive
      - Cloud computing: cheap, on-demand computing resources, accessible from anywhere in the world and by everyone
      - Business reasons: new insights that give a competitive advantage
      - Data in various forms everywhere: IoT and IoE, social networks, Open Data
      - The way we interact with each other and with data/information
      - …
  • 12. Transforming our daily lives: then, one size fits all; now, personalization and targeted selling (Source: Big Data Trends, David Feinleib)
  • 13. Fitness: then, manual tracking; now, focus on the goal (Source: Big Data Trends, David Feinleib)
  • 14. Customer service: then, reactive customer service; now, proactive customer service (Source: Big Data Trends, David Feinleib)
  • 15. Customer service: 360-degree view of the customer
      - Questions: Why? What? Who? When/How? Where?
      - Data: operational, behavioral, descriptive, interaction and contextual data
  • 16. Big Data opportunities (Source: Big Data opportunities survey, Unisphere / SAP, May 2013)
  • 17. Opportunities: Big Data use cases
      - 360° view of the customer
          - Integration of data from social networks, CRM, transactional data, etc.
          - Example: T-Mobile, telecom operator: 50% reduction in customer churn within one quarter
      - E-reputation
          - Sentiment analysis, proactive monitoring of social networks
          - Example: Nestlé, food group: gained 4 places in the Reputation Institute's index thanks to 24/7 interaction
      - Optimization
          - Predictive analysis for anomaly detection, process optimization using sensor and operational data
          - Example: Union Pacific Railroad: fewer train derailments, more train shipments, reduced carbon emissions
      - Public security
          - Monitoring social networks, integrating spatial and sensor data
          - Example: Serious Request 2012: monitoring crowd movements with Twitter and sensors, locating police forces, integration with GIS
  • 18. Evolution of Business Intelligence
  • 19. [Diagram: BI architectures for static data, semantic data and (big) data streams, organized in layers (data sources, gathering, store, information, output, user interaction). Static data: databases → ETL/batch processing → data warehouse → static reports, ad-hoc queries, analytics. Semantic data: structured/unstructured data → semantic ETL/batch processing → triple store → flexible queries/SPARQL, visual analytics. Data streams: sensors, data streams and static data → semantic ETL, stream processing with load shedding → databases/triplestores with knowledge enrichment → continuous queries/business rules, real-time analytics, real-time visual analytics, with retro-action.]
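The stream-processing path in this architecture includes load shedding: when items arrive faster than they can be processed, some are dropped on purpose rather than letting the system fall behind or crash. A minimal sketch of *random* load shedding follows, assuming the arrival rate and processing capacity are known; the function names and the workload figures are hypothetical, and real DSMSs may also use semantic (priority-based) shedding.

```python
import random
from typing import Iterable, Iterator, List


def shed_load(stream: Iterable[float], arrival_rate: float,
              capacity: float, seed: int = 0) -> Iterator[float]:
    """Random load shedding: keep each item independently with probability
    capacity / arrival_rate (capped at 1) and drop the rest."""
    keep_prob = min(1.0, capacity / arrival_rate)
    rng = random.Random(seed)  # seeded so runs are reproducible
    for item in stream:
        if rng.random() < keep_prob:
            yield item


def estimate_sum(kept: List[float], keep_prob: float) -> float:
    """Scale the sum of the kept items by 1 / keep_prob to obtain an
    unbiased estimate of the sum over the full stream."""
    return sum(kept) / keep_prob


if __name__ == "__main__":
    # Hypothetical workload: 100,000 readings arriving at 10,000 items/s,
    # while the operator can only process 2,500 items/s (keep_prob = 0.25).
    readings = (1.0 for _ in range(100_000))
    kept = list(shed_load(readings, arrival_rate=10_000, capacity=2_500))
    print(len(kept), round(estimate_sum(kept, keep_prob=0.25)))
```

Random dropping keeps aggregates such as counts and sums unbiased once rescaled, but it can discard rare, important events; that trade-off is why priority-based strategies exist.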
  • 22. What are the Big Data challenges?
  • 23. Big Data workflow: 1. Capture, 2. Store, 3. Analyze, 4. Visualize. Challenges arise at each of these steps.
  • 24. Challenges: Data collection
      - Heterogeneity of sources
          - Company databases => silos
          - Sensor networks, intelligent objects
          - Data streams: social networks, financial information, etc.
      - Data velocity
      - Data provenance and quality
  • 25. Type of data used in Big Data initiatives: internal data, traditional sources, "new data" (Source: Big Data opportunities survey, Unisphere / SAP, May 2013)
  • 26. Challenges: Data collection velocity, in domains such as website logs, network monitoring, financial services, e-commerce, traffic control, weather forecasting and power consumption
  • 27. What is a data stream?
      - Golab & Özsu (2003): "A data stream is a real-time, continuous, ordered (implicitly by arrival time or explicitly by timestamp) sequence of items. It is impossible to control the order in which items arrive, nor is it feasible to locally store a stream in its entirety."
      - Massive volumes of data; items arrive at a high rate.
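Because a stream cannot be stored in its entirety, stream algorithms work in a single pass with bounded memory. One classic example, not specific to this talk, is reservoir sampling (Vitter's Algorithm R), which maintains a uniform random sample of k items from a stream of unknown length. The function name below is illustrative.

```python
import random
from typing import Iterable, List, TypeVar

T = TypeVar("T")


def reservoir_sample(stream: Iterable[T], k: int, seed: int = 0) -> List[T]:
    """Single pass, O(k) memory: after i + 1 items have been seen, each of
    them is in the reservoir with probability k / (i + 1)."""
    rng = random.Random(seed)
    reservoir: List[T] = []
    for i, item in enumerate(stream):
        if i < k:
            reservoir.append(item)  # fill the reservoir with the first k items
        else:
            j = rng.randint(0, i)   # uniform over 0..i (both ends inclusive)
            if j < k:
                reservoir[j] = item  # replace with probability k / (i + 1)
    return reservoir


if __name__ == "__main__":
    # The generator is consumed once and never materialized in memory.
    sample = reservoir_sample((x for x in range(100_000)), k=5)
    print(sample)
```

The same one-pass, bounded-memory constraint drives most stream summaries (sketches, histograms, windowed aggregates).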
  • 28. Data Stream Management Systems (DBMS vs. DSMS)
      - Data model: DBMS: permanent, updatable relations; DSMS: streams and permanent, updatable relations
      - Storage: DBMS: data is stored on disk; DSMS: permanent relations are stored on disk, streams are processed on the fly
      - Query: DBMS: SQL language (creating structures; inserting, updating and deleting data; retrieving data with one-time queries); DSMS: SQL-like query language (standard SQL on permanent relations, extended SQL on streams with windowing, continuous queries)
      - Performance: DBMS: large volumes of data; DSMS: optimization of computing resources to handle several streams and several queries, and the ability to cope with variations in arrival rates without crashing
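The DSMS column above mentions continuous queries over windows. As a minimal sketch of the idea (plain Python, not any particular DSMS's query language), the generator below acts as a continuous query computing the average of the values whose timestamps fall in the last `window` time units, emitting a new result on every arrival. It assumes items arrive in timestamp order; the name `sliding_window_avg` is hypothetical.

```python
from collections import deque
from typing import Deque, Iterable, Iterator, Tuple


def sliding_window_avg(stream: Iterable[Tuple[float, float]],
                       window: float) -> Iterator[Tuple[float, float]]:
    """For each (timestamp, value) arrival, emit (timestamp, average of the
    values with timestamps in (timestamp - window, timestamp]).
    Assumes in-order arrival and window > 0."""
    buf: Deque[Tuple[float, float]] = deque()
    total = 0.0
    for ts, value in stream:
        buf.append((ts, value))
        total += value
        # Evict tuples that have fallen out of the time window.
        while buf and buf[0][0] <= ts - window:
            _, old = buf.popleft()
            total -= old
        yield ts, total / len(buf)  # buf always holds at least the new item


if __name__ == "__main__":
    readings = [(0, 10), (1, 20), (2, 30), (5, 40)]
    print(list(sliding_window_avg(readings, window=3)))
    # [(0, 10.0), (1, 15.0), (2, 20.0), (5, 40.0)]
```

Unlike a one-time SQL query, the result is updated incrementally, and only the tuples inside the window are kept in memory.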
  • 29. Challenges: Data collection, data provenance and quality
      - Data provenance: provenance refers to the information that describes data in sufficient detail to facilitate reproduction and enable validation of results.
      - Data quality: validity and consistency of the data. Is it up to date and fit for the targeted use case?
      - Source: Patrick McDaniel, Kevin Butler, Steve McLaughlin, Radu Sion, Erez Zadok, and Marianne Winslett, Towards a secure and efficient system for end-to-end provenance, 2010.
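To make the provenance definition concrete, here is a minimal sketch assuming each record carries its source, its capture time and the ordered list of transformations applied to it, plus one simple quality check (freshness, i.e. "is it up to date?"). The schema and function names are hypothetical; this is not the secure end-to-end provenance system of the cited paper.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Tuple


@dataclass(frozen=True)
class Record:
    """A value together with minimal provenance metadata (hypothetical schema)."""
    value: float
    source: str                    # where the raw data came from
    captured_at: datetime          # when it was captured (timezone-aware)
    lineage: Tuple[str, ...] = ()  # transformations applied, in order


def derive(rec: Record, operation: str, new_value: float) -> Record:
    """Derived data keeps its origin and records the operation that produced
    it, so the result can be traced back and recomputed for validation."""
    return Record(new_value, rec.source, rec.captured_at, rec.lineage + (operation,))


def is_fresh(rec: Record, max_age: timedelta, now: datetime) -> bool:
    """One simple quality check: is the data recent enough for the use case?"""
    return now - rec.captured_at <= max_age


if __name__ == "__main__":
    raw = Record(21.5, "sensor-42", datetime(2014, 10, 24, 9, 0, tzinfo=timezone.utc))
    f = derive(raw, "celsius_to_fahrenheit", raw.value * 9 / 5 + 32)
    print(f.source, f.lineage)  # sensor-42 ('celsius_to_fahrenheit',)
    now = datetime(2014, 10, 24, 9, 30, tzinfo=timezone.utc)
    print(is_fresh(f, timedelta(hours=1), now))  # True
```

Making records immutable (`frozen=True`) means a derivation can never silently overwrite the lineage of its input.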
  • 30. Challenges in data storage
      - Large amounts of data: need for a highly distributed architecture
      - Massive queries: avoid joins, since they are very time-consuming
      - Evolving schema: flexibility and scalability
      - Predictable, low latency
      - High availability
      - Elasticity: horizontal extensibility (scale-out)
      - Not needed: transactions, strong consistency, complex queries
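Elasticity (scale-out) requires a way to spread keys over nodes so that adding a node does not reshuffle all the data. One widely used technique, described for example in Amazon's Dynamo design, is consistent hashing. The sketch below is an illustrative in-memory version; the class name, node names and the choice of MD5 with 100 virtual nodes per server are assumptions.

```python
import bisect
import hashlib
from typing import Dict, List


def _hash(key: str) -> int:
    # Stable hash; Python's built-in hash() is randomized per process.
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Consistent hashing: a key belongs to the first node position found
    clockwise from the key's hash on the ring. Adding a node only moves
    the keys that now fall into the new node's arcs."""

    def __init__(self, nodes: List[str], vnodes: int = 100) -> None:
        self.vnodes = vnodes             # virtual nodes per server, for balance
        self._ring: List[int] = []       # sorted hash positions
        self._owner: Dict[int, str] = {} # position -> node name
        for node in nodes:
            self.add_node(node)

    def add_node(self, node: str) -> None:
        for i in range(self.vnodes):
            pos = _hash(f"{node}#{i}")
            bisect.insort(self._ring, pos)
            self._owner[pos] = node

    def node_for(self, key: str) -> str:
        # First position strictly after the key's hash, wrapping around.
        idx = bisect.bisect(self._ring, _hash(key)) % len(self._ring)
        return self._owner[self._ring[idx]]


if __name__ == "__main__":
    ring = HashRing(["node-a", "node-b", "node-c"])
    keys = [f"user:{i}" for i in range(10_000)]
    before = {k: ring.node_for(k) for k in keys}
    ring.add_node("node-d")
    moved = [k for k in keys if ring.node_for(k) != before[k]]
    print(f"{len(moved) / len(keys):.1%} of keys moved, all to node-d")
```

With naive `hash(key) % n` placement, going from 3 to 4 nodes would relocate most keys; here only roughly the new node's share moves.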
  • 31. Limitation of RDBMS: "If the only tool you have is a hammer, you tend to see every problem as a nail." (Abraham Maslow)
  • 32. Limitation of RDBMS
  • 33. Not Only SQL: NoSQL vs. relational
      - NoSQL => "Not Only SQL"
      - SQL must not die, but other storage solutions should be considered for specific applications
      - More accurate name: non-relational databases
  • 34. CAP theorem (conjectured by E. Brewer in 2000; proved by S. Gilbert and N. Lynch in 2002)
      - "CAP theorem": choose two of Consistency (C), Availability (A) and Partition tolerance (P)
      - Claim: every distributed system is on one side of the triangle
      - CA: available and consistent, unless there is a partition
      - CP: always consistent, even in a partition, but a reachable replica may deny service without the agreement of the others
      - AP: a reachable replica provides service even in a partition, but may be inconsistent
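The CP and AP behaviors above can be illustrated with a toy model: a register replicated on two nodes. Without a partition, writes reach both replicas; during a partition, a CP system refuses service, while an AP system serves the reachable replica and lets the copies diverge. This is a hypothetical sketch (no quorums, no conflict resolution), not a real replication protocol.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Replica:
    value: Optional[str] = None


class ReplicatedRegister:
    """Toy two-replica register illustrating the CAP trade-off."""

    def __init__(self, mode: str) -> None:
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.mode = mode
        self.replicas: List[Replica] = [Replica(), Replica()]
        self.partitioned = False  # True: the two replicas cannot communicate

    def write(self, replica_id: int, value: str) -> bool:
        if not self.partitioned:
            for r in self.replicas:  # synchronous replication to both nodes
                r.value = value
            return True
        if self.mode == "CP":
            return False  # cannot reach the other replica: deny the write
        self.replicas[replica_id].value = value  # AP: accept locally, diverge
        return True

    def read(self, replica_id: int) -> Optional[str]:
        if self.partitioned and self.mode == "CP":
            raise RuntimeError("unavailable: replicas cannot agree during a partition")
        return self.replicas[replica_id].value


if __name__ == "__main__":
    ap = ReplicatedRegister("AP")
    ap.write(0, "v1")
    ap.partitioned = True
    ap.write(0, "v2")
    print(ap.read(0), ap.read(1))  # v2 v1  (available but inconsistent)
```

With only two replicas there is no majority on either side of a partition, so a CP system must make both sides unavailable in this model.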
  • 35. NoSQL taxonomy (data models): key-value, document, column, graph
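To contrast the four NoSQL families, the sketch below expresses one hypothetical customer/order dataset in each data model using plain Python structures, with one small lookup per model. It only illustrates the shape of the data; it does not reflect the API of any particular database.

```python
import json

# Key-value: the store only knows keys; values are opaque blobs that the
# application must decode itself.
kv_store = {
    "customer:42": json.dumps({"name": "Alice", "city": "Paris"}),
    "order:1001": json.dumps({"customer": 42, "total": 59.90}),
}

# Document: nested, self-describing records; related data is embedded so
# one lookup returns it without a join.
document_store = {
    "customers": [
        {"_id": 42, "name": "Alice", "city": "Paris",
         "orders": [{"order_id": 1001, "total": 59.90}]},
    ],
}

# Column-family: rows identified by a row key, columns grouped into
# families; different rows may have different columns.
column_store = {
    "customers": {
        "42": {"profile": {"name": "Alice", "city": "Paris"},
               "orders": {"1001": 59.90}},
    },
}

# Graph: nodes plus typed edges; queries follow relationships.
graph_nodes = {"c42": {"label": "Customer", "name": "Alice"},
               "o1001": {"label": "Order", "total": 59.90}}
graph_edges = [("c42", "PLACED", "o1001")]


def customer_city_kv(customer_id: int) -> str:
    return json.loads(kv_store[f"customer:{customer_id}"])["city"]


def order_totals_document(customer_id: int) -> list:
    doc = next(c for c in document_store["customers"] if c["_id"] == customer_id)
    return [o["total"] for o in doc["orders"]]


def order_totals_column(row_key: str) -> list:
    return list(column_store["customers"][row_key]["orders"].values())


def orders_placed_graph(node_id: str) -> list:
    return [dst for src, rel, dst in graph_edges if src == node_id and rel == "PLACED"]
```

The document and column-family layouts denormalize the orders into the customer record, which is how these models avoid the costly joins mentioned under storage challenges.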
  • 36. Challenges in data analytics
      - Problems in large-scale analytics
          - Distributed computation efficiency
          - Evaluating performance gains from distribution
          - Bringing data to the processor
          - Efficient parallel algorithms (statistics, summaries)
      - Speed analytics
          - Streaming computations
          - Load balancing
          - Load shedding
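"Efficient parallel algorithms (statistics, summaries)" usually means *mergeable* summaries: each partition is summarized independently (possibly on a different machine), and the partial summaries are combined without revisiting the raw data. A minimal sketch for count, mean and variance follows, using the pairwise combination formula of Chan et al.; the `Summary` class is illustrative.

```python
from dataclasses import dataclass
from functools import reduce
from typing import Iterable


@dataclass(frozen=True)
class Summary:
    n: int = 0
    mean: float = 0.0
    m2: float = 0.0  # sum of squared deviations from the mean

    @staticmethod
    def of(values: Iterable[float]) -> "Summary":
        """Summarize one partition by merging single-value summaries."""
        s = Summary()
        for x in values:
            s = s.merge(Summary(1, float(x), 0.0))
        return s

    def merge(self, other: "Summary") -> "Summary":
        """Combine two partial summaries (Chan et al. pairwise formula)."""
        if self.n == 0:
            return other
        if other.n == 0:
            return self
        n = self.n + other.n
        delta = other.mean - self.mean
        mean = self.mean + delta * other.n / n
        m2 = self.m2 + other.m2 + delta * delta * self.n * other.n / n
        return Summary(n, mean, m2)

    @property
    def variance(self) -> float:
        return self.m2 / self.n if self.n else 0.0  # population variance


if __name__ == "__main__":
    # Each partition could be summarized on a separate worker.
    partitions = [[1, 2, 3], [4, 5], [6, 7, 8, 9]]
    total = reduce(Summary.merge, (Summary.of(p) for p in partitions))
    print(total.n, total.mean, round(total.variance, 4))  # 9 5.0 6.6667
```

Because `merge` is associative, partial results can be combined in any tree shape, which is exactly what MapReduce-style combiners and distributed stream operators need.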
  • 37. Challenges in data access and visualization
      - The main goal of data visualization is to communicate information clearly and effectively through graphical means
      - Provide the results of the analytics workflow to faster systems, such as real-time query interfaces
      - "Visualization is a form of knowledge compression" (David McCandless)
  • 38. Big Data: technological challenges
      - Data infrastructure tools and platforms: data centers, cloud infrastructures, NoSQL databases, in-memory databases, the Hadoop/MapReduce ecosystem
      - New generation of front-end tools for BI and analytics systems: data visualization and visual analytics, self-service BI, mobile BI
      - Data processing: supercomputers, distributed or massively parallel computing
  • 40. Conclusion: Big Data challenges
      - Semantic information aggregation
          - Information aggregation: "too much data to assimilate but not enough knowledge to act"
      - Distributed and real-time processing
          - Design of real-time and distributed algorithms for stream processing and information aggregation
          - Distribution and parallelization of data mining algorithms
          - Optimizing resources
      - Visual analytics and user modeling
          - Dynamic user model
          - Novel visualizations for very large datasets
      - Data protection
  • 41. IEEE Metro Area Smart Tech Workshop on Distributed Data Streaming, December 5, 2014, Paris
      - 08h00: Registration, breakfast
      - 08h50: Room L012: Welcome
      - 09h00: Room L012: Introduction to Distributed Data Streaming (Speaker: Raja Chiky, ISEP)
      - 10h15: Coffee break
      - 10h45: Room L012: Real World Issues in Supervised Classification for Data Streams (Speaker: Vincent Lemaire, Orange Labs)
      - 11h30: Room L012: Use Case 1: Finance (Speaker: Antoine Chambille, Quartet FS)
      - 12h00: Room L012: Use Case 2: Smart metering (Speaker: Marie-Luce Picard, EDF R&D)
      - 12h30: Lunch offsite
      - 14h00: Rooms L305-L306: two parallel lab sessions: Real-Time Data Processing with Open Source DSMS, part 1 (Speakers: Raja Chiky and Sylvain Lefebvre)
      - 15h30: Coffee break
      - 16h00: Rooms L305-L306: two parallel lab sessions: Real-Time Data Processing with Open Source DSMS, part 2 (Speakers: Raja Chiky and Sylvain Lefebvre)
      - 17h30: Reception onsite
  • 42. Thanks to Marie-Aude Aufaure (ECP) and Sylvain Lefebvre (ISEP)
  • 43. Big Data and Linked Data
      - Big Data: Volume, Variety, Velocity, Veracity, … Value. Integrate, aggregate, analyze and visualize large datasets, whatever their type, provenance or flow speed
      - Linked Data: Web of data, Semantic Web
          - A set of principles and good practices for linking, publishing and searching web data
          - Structure and semantically enrich RDF data, with very high scalability -> Big Linked Data
      - Big Linked Data / Linked Big Data
      - Semantic Technologies Living Lab; Linked & Big Data Academic Chair
      - Our value proposition:
          - Semantic aggregation from textual and non-textual streams
          - Managing semantic heterogeneity, real-time and distributed processing
          - Ensuring data quality and veracity
          - Visual analytics
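Linked Data represents information as RDF triples (subject, predicate, object), which triple stores query with SPARQL. The sketch below is a toy in-memory stand-in: a list of triples, a pattern matcher where `None` plays the role of a variable, and a two-pattern query joined on the shared subject. The `ex:` identifiers and function names are hypothetical; a real deployment would use an actual triple store and SPARQL engine.

```python
from typing import Iterable, List, Optional, Set, Tuple

Triple = Tuple[str, str, str]

triples: List[Triple] = [
    ("ex:sensor42", "rdf:type", "ex:TemperatureSensor"),
    ("ex:sensor42", "ex:locatedIn", "ex:Paris"),
    ("ex:sensor7", "rdf:type", "ex:TemperatureSensor"),
    ("ex:sensor7", "ex:locatedIn", "ex:Lyon"),
]


def match(store: Iterable[Triple], s: Optional[str] = None,
          p: Optional[str] = None, o: Optional[str] = None) -> List[Triple]:
    """Return the triples matching a pattern; None acts as a wildcard."""
    return [t for t in store
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]


def temperature_sensors_in(city: str) -> Set[str]:
    # Roughly the SPARQL basic graph pattern:
    #   SELECT ?s WHERE { ?s rdf:type ex:TemperatureSensor . ?s ex:locatedIn <city> }
    typed = {s for s, _, _ in match(triples, p="rdf:type", o="ex:TemperatureSensor")}
    located = {s for s, _, _ in match(triples, p="ex:locatedIn", o=city)}
    return typed & located  # join on the shared variable ?s


if __name__ == "__main__":
    print(temperature_sensors_in("ex:Paris"))  # {'ex:sensor42'}
```

The uniform triple shape is what lets heterogeneous sources be merged and enriched without a fixed schema, at the cost of many self-joins at query time, hence the scalability concern behind "Big Linked Data".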