Big-Data Computing on the Cloud
an Algorithmic Perspective
Andrea Pietracaprina
Dept. of Information Engineering (DEI)
University of Padova, Italy
andrea.pietracaprina@unipd.it
Supported in part by MIUR-PRIN
Project Amanda: Algorithmics for MAssive and Networked DAta
Roma, May 20, 2016 Data Driven Innovation 1
OUTLINE
From supercomputing to cloud computing
Paradigm shift
MapReduce
Big data algorithmics
  - Coresets
  - Decompositions of large networks
Conclusions
From Supercomputing to Cloud Computing
Supercomputing (‘70s – present)
Tianhe-2 (PRC)
Algorithm design: full knowledge and exploitation of the platform architecture
• Low productivity, high costs
• Grand Challenges
• Maximum performance (exascale in 2018?)
• Massively parallel systems
Cluster era (‘90s – present)
Algorithm design: exploitation of architectural features abstracted by a few parameters
• Higher productivity and lower costs
• Wide range of commercial/scientific applications
• Good cost/performance tradeoffs
• Distributed systems (e.g., clusters, grids)
[Figure: Network (bandwidth/latency)]
Cloud Computing (‘00s – present)
Algorithm design: architecture-oblivious design, data-centric perspective
• Novel computing environments: e.g., Hadoop, Spark, Google DF
• Popular for big-data applications
• Flexibility of usage, low costs, reliability
• Infrastructure and Software as a Service (IaaS, SaaS)
[Figure: INPUT DATA → Map - Shuffle - Reduce → OUTPUT DATA]
Paradigm Shift
Traditional Algorithmics | Big-Data Algorithmics
Best balance between computation, parallelism, communication | Few scans of the whole input data
Machine-conscious design | Machine-oblivious design
Noiseless, static input data | Noisy, dynamic input data
Polynomial complexity | (Sub-)Linear complexity
MAPREDUCE
MapReduce: single round
[Figure: INPUT → MAPPERs → SHUFFLE → REDUCERs → OUTPUT]
MAPPER: computation on individual data items
REDUCER: computation on small subsets of input
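The single round described above can be mimicked in memory with a few lines of Python. This is only a sketch for intuition, not how Hadoop or Spark implement it: the helper `map_reduce_round` and the word-count mapper and reducer are hypothetical names introduced here for illustration.

```python
from collections import defaultdict

def map_reduce_round(items, mapper, reducer):
    """Simulate one MapReduce round in memory: map, shuffle, reduce."""
    # Map: each input item is processed on its own and emits (key, value) pairs.
    pairs = [kv for item in items for kv in mapper(item)]
    # Shuffle: group the emitted values by key.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    # Reduce: each reducer sees only the group of values sharing one key.
    return {key: reducer(key, values) for key, values in groups.items()}

def word_mapper(line):
    # Illustrative mapper: one (word, 1) pair per word of the line.
    return [(word, 1) for word in line.split()]

def count_reducer(key, values):
    # Illustrative reducer: total number of occurrences of the word.
    return sum(values)

counts = map_reduce_round(["big data", "big cloud"], word_mapper, count_reducer)
```

A multiround algorithm would simply feed the output of one call into the next, which is why the number of rounds equals the number of shuffles.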
MapReduce: multiround
Key Performance Indicators (input size N):
• Memory requirements per reducer: << N
• #Rounds (i.e., #shuffles): 1, 2
• Aggregate space and communication: about N
[Figure: INPUT → ROUND 1 → ROUND 2 → ... → ROUND r → OUTPUT, each round made of MAPPERs followed by REDUCERs]
Big Data Algorithmics
Coresets
[Figure: INPUT reduced to a CORESET]
Coreset: a subset of the data (a summary) that maintains the characteristics of the whole input while filtering out redundancy
General 2-round MapReduce approach
Round 1: partition into small subsets and extraction of partial coresets
Round 2: perform analysis on aggregation of partial coresets
[Figure: INPUT → PARTIAL CORESETS → AGGREGATE CORESET]
CHALLENGE: composability of coresets
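The two-round scheme and the composability requirement can be sketched on a deliberately simple problem. The names `two_round` and `top_k` are hypothetical, and the problem (finding the K largest values) is chosen only because its coreset is exact: the K largest values of each subset always contain the K largest values overall, so the union of partial coresets is itself a valid coreset. For harder problems the coreset is usually only approximate.

```python
import heapq

def two_round(data, num_parts, extract_coreset, solve):
    """Generic 2-round scheme: partial coresets first, then a final analysis."""
    # Round 1: arbitrary partition (round-robin here), one partial coreset per subset.
    parts = [data[i::num_parts] for i in range(num_parts)]
    partial = [extract_coreset(part) for part in parts]
    # Round 2: one reducer solves the problem on the aggregation of partial coresets.
    aggregate = [x for core in partial for x in core]
    return solve(aggregate)

K = 3  # illustrative solution size

def top_k(values):
    # Serves both as coreset extractor and as final solver in this toy problem.
    return heapq.nlargest(K, values)

data = [7, 42, 3, 19, 88, 5, 61, 23, 10, 77]
result = two_round(data, num_parts=4, extract_coreset=top_k, solve=top_k)
```

With 4 subsets the final reducer handles at most 4·K values instead of the whole input, which is the memory saving the scheme is after.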
Example: diversity maximization
Goal: find k most diverse data objects
Applications: Recommendation systems, search engines
MapReduce solution
Round 1:
• Partition input data arbitrarily
• In each subset:
  - k'-clustering based on similarity (k' > k)
  - pick one representative per cluster (this yields the partial coreset)
[Figure: subset of partition → k'-clustering → partial coreset]
N.B. For enhanced accuracy, it is crucial to fix k' > k
Round 2:
• Aggregate partial coresets
• Compute output on aggregate coreset
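The two rounds can be sketched as follows. Several choices here are assumptions made for illustration and are not stated on the slides: points live in the Euclidean plane, the k'-clustering is done with a greedy farthest-first traversal whose centers serve as cluster representatives, diversity is measured as the minimum pairwise distance, and the same greedy rule is reused in Round 2. The function names are hypothetical.

```python
import math
import random

def farthest_first(points, k):
    """Greedy k-center: pick k points, each farthest from those already picked."""
    centers = [points[0]]
    dist = [math.dist(p, centers[0]) for p in points]
    while len(centers) < min(k, len(points)):
        i = max(range(len(points)), key=dist.__getitem__)
        centers.append(points[i])
        # Distance of every point to its closest chosen center.
        dist = [min(d, math.dist(p, points[i])) for d, p in zip(dist, points)]
    return centers

def diverse_subset(points, k, k_prime, num_parts):
    # Round 1: arbitrary partition; each subset yields k' representatives.
    parts = [points[i::num_parts] for i in range(num_parts)]
    partial_coresets = [farthest_first(part, k_prime) for part in parts if part]
    # Round 2: aggregate the partial coresets and select k points from them.
    aggregate = [p for core in partial_coresets for p in core]
    return farthest_first(aggregate, k)

def min_pairwise_distance(points):
    # Diversity measure assumed in this sketch.
    return min(math.dist(p, q) for i, p in enumerate(points) for q in points[i + 1:])

rng = random.Random(0)
points = [(rng.random(), rng.random()) for _ in range(2000)]
solution = diverse_subset(points, k=8, k_prime=32, num_parts=10)
```

Here the final reducer works on 10·32 = 320 points rather than 2000; raising `k_prime` trades reducer memory for accuracy.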
[Figure: Round 1: INPUT → PARTIAL CORESETS; Round 2: PARTIAL CORESETS → AGGREGATE CORESET → OUTPUT]
Experiments:
• N=64000 data objects
• Seek k=64 most diverse ones
• Final coreset size: [21024]∙k
• Measure: accuracy of solution
• 4 diversity measures
N.B. Same approach can be used in a streaming setting
Decompositions of Large Networks
Analysis of large networks in MapReduce must avoid:
• Long traversals
• Superlinear complexities
Known exact algorithms often do not meet these criteria
Network decomposition can provide a concise summary of network characteristics
Example: network diameter
Goal: determine the maximum distance between any two nodes
Applications: social networks, internet/web, linguistics, biology
[Figure: network with two farthest nodes, A and B]
MapReduce solution
• Cluster the network into few regions with small radius R, around random nodes
• About R rounds
• Network summary: one node per region
• Determine overlay network of selected nodes (few rounds)
• Compute diameter of overlay network
• Adjust for radius of original regions (1 round)
N.B. overlay network is a good summary of input network;
its size can be chosen to fit memory constraints of reducers
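The three steps can be sketched sequentially as below. This is a simplified illustration under stated assumptions, not the authors' implementation: the network is unweighted, undirected and connected, regions are grown by a plain multi-source breadth-first search from randomly chosen nodes, and the radius adjustment uses the bound "each overlay hop costs at most 2R + 1 original links". The names `bfs` and `diameter_bounds` are hypothetical.

```python
import random
from collections import deque

def bfs(adj, sources):
    """Multi-source BFS: return (distance, owning source) for every reachable node."""
    dist = {s: 0 for s in sources}
    owner = {s: s for s in sources}
    queue = deque(sources)
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v], owner[v] = dist[u] + 1, owner[u]
                queue.append(v)
    return dist, owner

def diameter_bounds(adj, num_centers, seed=0):
    # Step 1: grow regions around random nodes; R is the largest region radius.
    centers = random.Random(seed).sample(sorted(adj), num_centers)
    dist, owner = bfs(adj, centers)
    radius = max(dist.values())
    # Step 2: overlay network with one node per region; two regions are
    # adjacent if some original link crosses between them.
    overlay = {c: set() for c in centers}
    for u in adj:
        for v in adj[u]:
            if owner[u] != owner[v]:
                overlay[owner[u]].add(owner[v])
                overlay[owner[v]].add(owner[u])
    # Step 3: exact diameter of the small overlay, by BFS from each of its nodes.
    overlay_diam = max(max(bfs(overlay, [c])[0].values()) for c in centers)
    # Adjust for the radius of the original regions.
    lower = overlay_diam
    upper = overlay_diam * (2 * radius + 1) + 2 * radius
    return lower, upper

# Usage on a 10x10 grid, whose true diameter is 18.
grid = {(x, y): [(x + dx, y + dy) for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1))
                 if 0 <= x + dx < 10 and 0 <= y + dy < 10]
        for x in range(10) for y in range(10)}
lower, upper = diameter_bounds(grid, num_centers=8)
```

The true diameter always lies between `lower` and `upper` under the assumptions above; choosing more centers shrinks R and tightens the interval, at the price of a larger overlay.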
Experiments: 16-node cluster, 10 Gbit Ethernet, Apache Spark (10K nodes in overlay network)
Network | No. Nodes | No. Links | Time (s) | Rounds | Error
Roads-USA | 24M | 29M | 158 | 74 | 26%
Twitter | 42M | 1.5G | 236 | 5 | 19%
Artificial | 500M | 8G | 6000 | 5 | 30%
Roads-USA and Twitter: benchmarks. Artificial: scalability.
Efficient network partitioning
• Progressive node sampling
• Local cluster growth from sampled nodes
• #rounds = #cluster growing steps
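A sequential sketch of this partitioning strategy is given below. The sampling schedule (start at probability 1/n and double it at every step) and the function name `grow_clusters` are assumptions made for illustration; the slides do not specify them. Each loop iteration stands for one cluster-growing step, that is, one round.

```python
import random

def grow_clusters(adj, seed=0):
    """Partition nodes into clusters; return (owner map, number of growing steps)."""
    rng = random.Random(seed)
    owner = {}       # node -> center of the cluster it belongs to
    frontier = []    # nodes added to some cluster in the last step
    prob = 1.0 / max(1, len(adj))   # assumed initial sampling probability
    steps = 0
    while len(owner) < len(adj):
        # Progressive sampling: pick new centers among still-uncovered nodes.
        uncovered = [u for u in adj if u not in owner]
        new_centers = [u for u in uncovered if rng.random() < prob]
        if not new_centers and not frontier:
            new_centers = [rng.choice(uncovered)]   # guarantee progress
        for c in new_centers:
            owner[c] = c
        frontier.extend(new_centers)
        # Local growth: every active cluster expands by one hop.
        next_frontier = []
        for u in frontier:
            for v in adj[u]:
                if v not in owner:
                    owner[v] = owner[u]
                    next_frontier.append(v)
        frontier = next_frontier
        steps += 1
        prob = min(1.0, 2 * prob)   # assumed schedule: sample more as steps go on
    return owner, steps

# Usage on a ring of 200 nodes.
ring = {i: [(i - 1) % 200, (i + 1) % 200] for i in range(200)}
owner, steps = grow_clusters(ring)
```

Sampling few centers early lets clusters in well-connected areas grow large, while later samples cover the regions that are still hard to reach, which keeps the number of steps low.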
Example
[Figure: cluster growth over successive rounds, with snapshots labeled Round 2, Round 4, Round 6]
Coping with uncertainty
Links exist with certain probabilities
Applications: biology, social network analysis
• Network partitioning strategy suitable for this scenario
• cluster = region connected with high probability
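What "connected with high probability" means can be made concrete with a small Monte Carlo estimate. The sketch assumes that links exist independently of one another, which the slides do not state, and `connection_probability` is a hypothetical helper, not part of the authors' method.

```python
import random

def connection_probability(edges, source, target, samples=2000, seed=0):
    """Estimate Pr[source and target are connected] by sampling possible worlds."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(samples):
        # Sample one possible world: keep each link independently with its probability.
        adj = {}
        for u, v, p in edges:
            if rng.random() < p:
                adj.setdefault(u, []).append(v)
                adj.setdefault(v, []).append(u)
        # Depth-first search from the source in the sampled world.
        stack, seen = [source], {source}
        while stack:
            u = stack.pop()
            for v in adj.get(u, []):
                if v not in seen:
                    seen.add(v)
                    stack.append(v)
        hits += target in seen
    return hits / samples

# Links are (node, node, existence probability) triples.
edges = [("a", "b", 0.9), ("b", "c", 0.9), ("a", "c", 0.5)]
estimate = connection_probability(edges, "a", "c")
```

In this toy network the exact value is 1 - (1 - 0.5)(1 - 0.9·0.9) = 0.905, so nodes a and c would fall in the same region for any probability threshold below that.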
Example: identification of protein complexes from Protein-Protein Interaction (PPI) networks
• PPI viewed as uncertain network
• Hypothesis: a protein complex corresponds to a region with high connection probability
• Traditional general partitioning approaches slowed down by uncertainty
Experiments show effectiveness of approach
Conclusions
• Design of big data algorithms (on clouds) entails a paradigm shift
  - Data-centric view
  - Handling size through summarization
  - Giving up exact solutions
  - Coping with noisy/unreliable data
References
M. Ceccarello, A. Pietracaprina, G. Pucci, E. Upfal: Space and Time Efficient Parallel Graph Decomposition, Clustering, and Diameter Approximation. ACM SPAA 2015
M. Ceccarello, A. Pietracaprina, G. Pucci, E. Upfal: A Practical Parallel Algorithm for Diameter Approximation of Massive Weighted Graphs. IEEE IPDPS 2016
M. Ceccarello, A. Pietracaprina, G. Pucci, E. Upfal: MapReduce and Streaming Algorithms for Diversity Maximization in Metric Spaces of Bounded Doubling Dimension. ArXiv 1605.05590, 2016
M. Ceccarello, C. Fantozzi, A. Pietracaprina, G. Pucci, F. Vandin: Clustering in uncertain graphs. Work in progress. 2016
THANK YOU!

Una infrastruttura per l’accesso al patrimonio culturale: il Progetto del Por...
Data Driven Innovation
 
Utilizzo dei Big data per l’analisi dei flussi veicolari e della mobilità (Ma...
Utilizzo dei Big data per l’analisi dei flussi veicolari e della mobilità (Ma...Utilizzo dei Big data per l’analisi dei flussi veicolari e della mobilità (Ma...
Utilizzo dei Big data per l’analisi dei flussi veicolari e della mobilità (Ma...
Data Driven Innovation
 
I dati personali nell'analisi comportamentale della mobilità di dipendenti e ...
I dati personali nell'analisi comportamentale della mobilità di dipendenti e ...I dati personali nell'analisi comportamentale della mobilità di dipendenti e ...
I dati personali nell'analisi comportamentale della mobilità di dipendenti e ...
Data Driven Innovation
 
Estrarre valore dai dati: tecnologie per ottimizzare la mobilità del futuro (...
Estrarre valore dai dati: tecnologie per ottimizzare la mobilità del futuro (...Estrarre valore dai dati: tecnologie per ottimizzare la mobilità del futuro (...
Estrarre valore dai dati: tecnologie per ottimizzare la mobilità del futuro (...
Data Driven Innovation
 
Le piattaforme dati per la mobilità nelle città italiane (Marco Mena, EY)
Le piattaforme dati per la mobilità nelle città italiane (Marco Mena, EY)Le piattaforme dati per la mobilità nelle città italiane (Marco Mena, EY)
Le piattaforme dati per la mobilità nelle città italiane (Marco Mena, EY)
Data Driven Innovation
 
WiseTown, un ecosistema di applicazioni e strumenti per migliorare la qualità...
WiseTown, un ecosistema di applicazioni e strumenti per migliorare la qualità...WiseTown, un ecosistema di applicazioni e strumenti per migliorare la qualità...
WiseTown, un ecosistema di applicazioni e strumenti per migliorare la qualità...
Data Driven Innovation
 
CityOpenSource as a civic tech tool (Ilaria Vitellio, CityOpenSource)
CityOpenSource as a civic tech tool (Ilaria Vitellio, CityOpenSource)CityOpenSource as a civic tech tool (Ilaria Vitellio, CityOpenSource)
CityOpenSource as a civic tech tool (Ilaria Vitellio, CityOpenSource)
Data Driven Innovation
 
Big Data Confederation: toward the local urban data market place (Renzo Taffa...
Big Data Confederation: toward the local urban data market place (Renzo Taffa...Big Data Confederation: toward the local urban data market place (Renzo Taffa...
Big Data Confederation: toward the local urban data market place (Renzo Taffa...
Data Driven Innovation
 
Making citizens the eyes of policy makers: a sweet spot for hybrid AI? (Danie...
Making citizens the eyes of policy makers: a sweet spot for hybrid AI? (Danie...Making citizens the eyes of policy makers: a sweet spot for hybrid AI? (Danie...
Making citizens the eyes of policy makers: a sweet spot for hybrid AI? (Danie...
Data Driven Innovation
 
Dall'Agenda Digitale alla Smart City: il percorso di Roma Capitale verso il D...
Dall'Agenda Digitale alla Smart City: il percorso di Roma Capitale verso il D...Dall'Agenda Digitale alla Smart City: il percorso di Roma Capitale verso il D...
Dall'Agenda Digitale alla Smart City: il percorso di Roma Capitale verso il D...
Data Driven Innovation
 
Reusing open data: how to make a difference (Vittorio Scarano, Università di ...
Reusing open data: how to make a difference (Vittorio Scarano, Università di ...Reusing open data: how to make a difference (Vittorio Scarano, Università di ...
Reusing open data: how to make a difference (Vittorio Scarano, Università di ...
Data Driven Innovation
 
Gestire i beni culturali con i big data (Sandro Stancampiano, Istat)
Gestire i beni culturali con i big data (Sandro Stancampiano, Istat)Gestire i beni culturali con i big data (Sandro Stancampiano, Istat)
Gestire i beni culturali con i big data (Sandro Stancampiano, Istat)
Data Driven Innovation
 
Data Governance: cos’è e perché è importante? (Elena Arista, Erwin)
Data Governance: cos’è e perché è importante? (Elena Arista, Erwin)Data Governance: cos’è e perché è importante? (Elena Arista, Erwin)
Data Governance: cos’è e perché è importante? (Elena Arista, Erwin)
Data Driven Innovation
 
Data driven economy: bastano i dati per avviare una start up? (Gabriele Anton...
Data driven economy: bastano i dati per avviare una start up? (Gabriele Anton...Data driven economy: bastano i dati per avviare una start up? (Gabriele Anton...
Data driven economy: bastano i dati per avviare una start up? (Gabriele Anton...
Data Driven Innovation
 

More from Data Driven Innovation (20)

Integrazione della mobilità elettrica nei sistemi urbani (Stefano Carrese, Un...
Integrazione della mobilità elettrica nei sistemi urbani (Stefano Carrese, Un...Integrazione della mobilità elettrica nei sistemi urbani (Stefano Carrese, Un...
Integrazione della mobilità elettrica nei sistemi urbani (Stefano Carrese, Un...
 
La statistica ufficiale e i trasporti marittimi nell'era dei big data (Vincen...
La statistica ufficiale e i trasporti marittimi nell'era dei big data (Vincen...La statistica ufficiale e i trasporti marittimi nell'era dei big data (Vincen...
La statistica ufficiale e i trasporti marittimi nell'era dei big data (Vincen...
 
How can we realize the Mobility as a Service (Maas) (Andrea Paletti, London S...
How can we realize the Mobility as a Service (Maas) (Andrea Paletti, London S...How can we realize the Mobility as a Service (Maas) (Andrea Paletti, London S...
How can we realize the Mobility as a Service (Maas) (Andrea Paletti, London S...
 
Il DTC-Lazio e i dati del patrimonio culturale (Maria Prezioso, Università To...
Il DTC-Lazio e i dati del patrimonio culturale (Maria Prezioso, Università To...Il DTC-Lazio e i dati del patrimonio culturale (Maria Prezioso, Università To...
Il DTC-Lazio e i dati del patrimonio culturale (Maria Prezioso, Università To...
 
CHNet-DHLab: Servizi Cloud a supporto dei beni culturali (Fabio Proietti, INF...
CHNet-DHLab: Servizi Cloud a supporto dei beni culturali (Fabio Proietti, INF...CHNet-DHLab: Servizi Cloud a supporto dei beni culturali (Fabio Proietti, INF...
CHNet-DHLab: Servizi Cloud a supporto dei beni culturali (Fabio Proietti, INF...
 
Progetto EOSC-Pillar (Fulvio Galeazzi, GARR)
Progetto EOSC-Pillar (Fulvio Galeazzi, GARR)Progetto EOSC-Pillar (Fulvio Galeazzi, GARR)
Progetto EOSC-Pillar (Fulvio Galeazzi, GARR)
 
Una infrastruttura per l’accesso al patrimonio culturale: il Progetto del Por...
Una infrastruttura per l’accesso al patrimonio culturale: il Progetto del Por...Una infrastruttura per l’accesso al patrimonio culturale: il Progetto del Por...
Una infrastruttura per l’accesso al patrimonio culturale: il Progetto del Por...
 
Utilizzo dei Big data per l’analisi dei flussi veicolari e della mobilità (Ma...
Utilizzo dei Big data per l’analisi dei flussi veicolari e della mobilità (Ma...Utilizzo dei Big data per l’analisi dei flussi veicolari e della mobilità (Ma...
Utilizzo dei Big data per l’analisi dei flussi veicolari e della mobilità (Ma...
 
I dati personali nell'analisi comportamentale della mobilità di dipendenti e ...
I dati personali nell'analisi comportamentale della mobilità di dipendenti e ...I dati personali nell'analisi comportamentale della mobilità di dipendenti e ...
I dati personali nell'analisi comportamentale della mobilità di dipendenti e ...
 
Estrarre valore dai dati: tecnologie per ottimizzare la mobilità del futuro (...
Estrarre valore dai dati: tecnologie per ottimizzare la mobilità del futuro (...Estrarre valore dai dati: tecnologie per ottimizzare la mobilità del futuro (...
Estrarre valore dai dati: tecnologie per ottimizzare la mobilità del futuro (...
 
Le piattaforme dati per la mobilità nelle città italiane (Marco Mena, EY)
Le piattaforme dati per la mobilità nelle città italiane (Marco Mena, EY)Le piattaforme dati per la mobilità nelle città italiane (Marco Mena, EY)
Le piattaforme dati per la mobilità nelle città italiane (Marco Mena, EY)
 
WiseTown, un ecosistema di applicazioni e strumenti per migliorare la qualità...
WiseTown, un ecosistema di applicazioni e strumenti per migliorare la qualità...WiseTown, un ecosistema di applicazioni e strumenti per migliorare la qualità...
WiseTown, un ecosistema di applicazioni e strumenti per migliorare la qualità...
 
CityOpenSource as a civic tech tool (Ilaria Vitellio, CityOpenSource)
CityOpenSource as a civic tech tool (Ilaria Vitellio, CityOpenSource)CityOpenSource as a civic tech tool (Ilaria Vitellio, CityOpenSource)
CityOpenSource as a civic tech tool (Ilaria Vitellio, CityOpenSource)
 
Big Data Confederation: toward the local urban data market place (Renzo Taffa...
Big Data Confederation: toward the local urban data market place (Renzo Taffa...Big Data Confederation: toward the local urban data market place (Renzo Taffa...
Big Data Confederation: toward the local urban data market place (Renzo Taffa...
 
Making citizens the eyes of policy makers: a sweet spot for hybrid AI? (Danie...
Making citizens the eyes of policy makers: a sweet spot for hybrid AI? (Danie...Making citizens the eyes of policy makers: a sweet spot for hybrid AI? (Danie...
Making citizens the eyes of policy makers: a sweet spot for hybrid AI? (Danie...
 
Dall'Agenda Digitale alla Smart City: il percorso di Roma Capitale verso il D...
Dall'Agenda Digitale alla Smart City: il percorso di Roma Capitale verso il D...Dall'Agenda Digitale alla Smart City: il percorso di Roma Capitale verso il D...
Dall'Agenda Digitale alla Smart City: il percorso di Roma Capitale verso il D...
 
Reusing open data: how to make a difference (Vittorio Scarano, Università di ...
Reusing open data: how to make a difference (Vittorio Scarano, Università di ...Reusing open data: how to make a difference (Vittorio Scarano, Università di ...
Reusing open data: how to make a difference (Vittorio Scarano, Università di ...
 
Gestire i beni culturali con i big data (Sandro Stancampiano, Istat)
Gestire i beni culturali con i big data (Sandro Stancampiano, Istat)Gestire i beni culturali con i big data (Sandro Stancampiano, Istat)
Gestire i beni culturali con i big data (Sandro Stancampiano, Istat)
 
Data Governance: cos’è e perché è importante? (Elena Arista, Erwin)
Data Governance: cos’è e perché è importante? (Elena Arista, Erwin)Data Governance: cos’è e perché è importante? (Elena Arista, Erwin)
Data Governance: cos’è e perché è importante? (Elena Arista, Erwin)
 
Data driven economy: bastano i dati per avviare una start up? (Gabriele Anton...
Data driven economy: bastano i dati per avviare una start up? (Gabriele Anton...Data driven economy: bastano i dati per avviare una start up? (Gabriele Anton...
Data driven economy: bastano i dati per avviare una start up? (Gabriele Anton...
 

Recently uploaded

How to use Firebase Data Connect For Flutter
How to use Firebase Data Connect For FlutterHow to use Firebase Data Connect For Flutter
How to use Firebase Data Connect For Flutter
Daiki Mogmet Ito
 
“I’m still / I’m still / Chaining from the Block”
“I’m still / I’m still / Chaining from the Block”“I’m still / I’m still / Chaining from the Block”
“I’m still / I’m still / Chaining from the Block”
Claudio Di Ciccio
 
Microsoft - Power Platform_G.Aspiotis.pdf
Microsoft - Power Platform_G.Aspiotis.pdfMicrosoft - Power Platform_G.Aspiotis.pdf
Microsoft - Power Platform_G.Aspiotis.pdf
Uni Systems S.M.S.A.
 
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?Cosa hanno in comune un mattoncino Lego e la backdoor XZ?
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?
Speck&Tech
 
Data structures and Algorithms in Python.pdf
Data structures and Algorithms in Python.pdfData structures and Algorithms in Python.pdf
Data structures and Algorithms in Python.pdf
TIPNGVN2
 
Generative AI Deep Dive: Advancing from Proof of Concept to Production
Generative AI Deep Dive: Advancing from Proof of Concept to ProductionGenerative AI Deep Dive: Advancing from Proof of Concept to Production
Generative AI Deep Dive: Advancing from Proof of Concept to Production
Aggregage
 
Mind map of terminologies used in context of Generative AI
Mind map of terminologies used in context of Generative AIMind map of terminologies used in context of Generative AI
Mind map of terminologies used in context of Generative AI
Kumud Singh
 
RESUME BUILDER APPLICATION Project for students
RESUME BUILDER APPLICATION Project for studentsRESUME BUILDER APPLICATION Project for students
RESUME BUILDER APPLICATION Project for students
KAMESHS29
 
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdfObservability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Paige Cruz
 
20240609 QFM020 Irresponsible AI Reading List May 2024
20240609 QFM020 Irresponsible AI Reading List May 202420240609 QFM020 Irresponsible AI Reading List May 2024
20240609 QFM020 Irresponsible AI Reading List May 2024
Matthew Sinclair
 
Full-RAG: A modern architecture for hyper-personalization
Full-RAG: A modern architecture for hyper-personalizationFull-RAG: A modern architecture for hyper-personalization
Full-RAG: A modern architecture for hyper-personalization
Zilliz
 
UiPath Test Automation using UiPath Test Suite series, part 5
UiPath Test Automation using UiPath Test Suite series, part 5UiPath Test Automation using UiPath Test Suite series, part 5
UiPath Test Automation using UiPath Test Suite series, part 5
DianaGray10
 
Enchancing adoption of Open Source Libraries. A case study on Albumentations.AI
Enchancing adoption of Open Source Libraries. A case study on Albumentations.AIEnchancing adoption of Open Source Libraries. A case study on Albumentations.AI
Enchancing adoption of Open Source Libraries. A case study on Albumentations.AI
Vladimir Iglovikov, Ph.D.
 
Presentation of the OECD Artificial Intelligence Review of Germany
Presentation of the OECD Artificial Intelligence Review of GermanyPresentation of the OECD Artificial Intelligence Review of Germany
Presentation of the OECD Artificial Intelligence Review of Germany
innovationoecd
 
Uni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems Copilot event_05062024_C.Vlachos.pdfUni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems S.M.S.A.
 
Climate Impact of Software Testing at Nordic Testing Days
Climate Impact of Software Testing at Nordic Testing DaysClimate Impact of Software Testing at Nordic Testing Days
Climate Impact of Software Testing at Nordic Testing Days
Kari Kakkonen
 
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
Neo4j
 
UiPath Test Automation using UiPath Test Suite series, part 6
UiPath Test Automation using UiPath Test Suite series, part 6UiPath Test Automation using UiPath Test Suite series, part 6
UiPath Test Automation using UiPath Test Suite series, part 6
DianaGray10
 
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
SOFTTECHHUB
 
Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!
Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!
Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!
SOFTTECHHUB
 

Recently uploaded (20)

How to use Firebase Data Connect For Flutter
How to use Firebase Data Connect For FlutterHow to use Firebase Data Connect For Flutter
How to use Firebase Data Connect For Flutter
 
“I’m still / I’m still / Chaining from the Block”
“I’m still / I’m still / Chaining from the Block”“I’m still / I’m still / Chaining from the Block”
“I’m still / I’m still / Chaining from the Block”
 
Microsoft - Power Platform_G.Aspiotis.pdf
Microsoft - Power Platform_G.Aspiotis.pdfMicrosoft - Power Platform_G.Aspiotis.pdf
Microsoft - Power Platform_G.Aspiotis.pdf
 
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?Cosa hanno in comune un mattoncino Lego e la backdoor XZ?
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?
 
Data structures and Algorithms in Python.pdf
Data structures and Algorithms in Python.pdfData structures and Algorithms in Python.pdf
Data structures and Algorithms in Python.pdf
 
Generative AI Deep Dive: Advancing from Proof of Concept to Production
Generative AI Deep Dive: Advancing from Proof of Concept to ProductionGenerative AI Deep Dive: Advancing from Proof of Concept to Production
Generative AI Deep Dive: Advancing from Proof of Concept to Production
 
Mind map of terminologies used in context of Generative AI
Mind map of terminologies used in context of Generative AIMind map of terminologies used in context of Generative AI
Mind map of terminologies used in context of Generative AI
 
RESUME BUILDER APPLICATION Project for students
RESUME BUILDER APPLICATION Project for studentsRESUME BUILDER APPLICATION Project for students
RESUME BUILDER APPLICATION Project for students
 
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdfObservability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
 
20240609 QFM020 Irresponsible AI Reading List May 2024
20240609 QFM020 Irresponsible AI Reading List May 202420240609 QFM020 Irresponsible AI Reading List May 2024
20240609 QFM020 Irresponsible AI Reading List May 2024
 
Full-RAG: A modern architecture for hyper-personalization
Full-RAG: A modern architecture for hyper-personalizationFull-RAG: A modern architecture for hyper-personalization
Full-RAG: A modern architecture for hyper-personalization
 
UiPath Test Automation using UiPath Test Suite series, part 5
UiPath Test Automation using UiPath Test Suite series, part 5UiPath Test Automation using UiPath Test Suite series, part 5
UiPath Test Automation using UiPath Test Suite series, part 5
 
Enchancing adoption of Open Source Libraries. A case study on Albumentations.AI
Enchancing adoption of Open Source Libraries. A case study on Albumentations.AIEnchancing adoption of Open Source Libraries. A case study on Albumentations.AI
Enchancing adoption of Open Source Libraries. A case study on Albumentations.AI
 
Presentation of the OECD Artificial Intelligence Review of Germany
Presentation of the OECD Artificial Intelligence Review of GermanyPresentation of the OECD Artificial Intelligence Review of Germany
Presentation of the OECD Artificial Intelligence Review of Germany
 
Uni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems Copilot event_05062024_C.Vlachos.pdfUni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems Copilot event_05062024_C.Vlachos.pdf
 
Climate Impact of Software Testing at Nordic Testing Days
Climate Impact of Software Testing at Nordic Testing DaysClimate Impact of Software Testing at Nordic Testing Days
Climate Impact of Software Testing at Nordic Testing Days
 
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
 
UiPath Test Automation using UiPath Test Suite series, part 6
UiPath Test Automation using UiPath Test Suite series, part 6UiPath Test Automation using UiPath Test Suite series, part 6
UiPath Test Automation using UiPath Test Suite series, part 6
 
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
 
Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!
Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!
Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!
 

Big-Data Computing on the Cloud

  • 1. Big-Data Computing on the Cloud an Algorithmic Perspective Andrea Pietracaprina Dept. of Information Engineering (DEI) University of Padova, Italy andrea.pietracaprina@unipd.it Supported in part by MIUR-PRIN Project Amanda: Algorithmics for MAssive and Networked DAta Roma, May 20, 2016 Data Driven Innovation 1
  • 2. OUTLINE: From supercomputing to cloud computing • Paradigm shift • MapReduce • Big data algorithmics (Coresets; Decompositions of large networks) • Conclusions
  • 3. From Supercomputing to Cloud Computing. Supercomputing (‘70s – present). Tianhe-2 (PRC). Algorithm design: full knowledge and exploitation of platform architecture • Low productivity, high costs • Grand Challenges • Maximum performance (exascale in 2018?) • Massively parallel systems
  • 4. From Supercomputing to Cloud Computing. Cluster era (‘90s – present). Algorithm design: exploitation of architectural features abstracted by few parameters • Higher productivity and lower costs • Wide range of commercial/scientific applications • Good cost/performance tradeoffs • Distributed systems (e.g., clusters, grids). Network (bandwidth/latency)
  • 5. From Supercomputing to Cloud Computing. Cloud Computing (‘00s – present). Algorithm design: architecture-oblivious design, data-centric perspective • Novel computing environments: e.g., Hadoop, Spark, Google DF • Popular for big-data applications • Flexibility of usage, low costs, reliability • Infrastructure, Software as Services (IaaS, SaaS). INPUT DATA → Map – Shuffle – Reduce → OUTPUT DATA
  • 6. Paradigm Shift
       Traditional Algorithmics                 | Big-Data Algorithmics
       Best balance between computation,        | Few scans of the whole input data
         parallelism, communication             |
       Machine-conscious design                 | Machine-oblivious design
       Noiseless, static input data             | Noisy, dynamic input data
       Polynomial complexity                    | (Sub-)Linear complexity
  • 7. MAPREDUCE. MapReduce: single round. INPUT → MAPPERs → SHUFFLE → REDUCERs → OUTPUT. MAPPER: computation on individual data items. REDUCER: computation on small subsets of input
  • 8. MAPREDUCE. MapReduce: multiround. Key Performance Indicators (input size N): ▪ Memory requirements per reducer: << N ▪ #Rounds (i.e., #shuffles): 1,2 ▪ Aggregate space and communication ≈ N. Figure: INPUT → ROUND 1 → ROUND 2 → ... → ROUND r → OUTPUT, where each round is MAPPERs followed by REDUCERs
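The round structure of slides 7 and 8 can be illustrated with a small single-machine simulation. This is only a sketch: the names `map_reduce_round`, `wc_mapper`, and `wc_reducer` are hypothetical and do not come from the talk or from the Hadoop or Spark APIs, and the word-count task is chosen purely as an example.

```python
from collections import defaultdict

def map_reduce_round(items, mapper, reducer):
    """Simulate one round: map each item, shuffle by key, reduce each group."""
    groups = defaultdict(list)
    for item in items:                      # MAP: one data item at a time
        for key, value in mapper(item):
            groups[key].append(value)       # SHUFFLE: group values by key
    output = []
    for key, values in groups.items():      # REDUCE: one (small) group at a time
        output.extend(reducer(key, values))
    return output

# Illustrative task (an assumption, not from the slides): word count.
def wc_mapper(line):
    return [(word, 1) for word in line.split()]

def wc_reducer(word, ones):
    return [(word, sum(ones))]

counts = dict(map_reduce_round(["a b a", "b c"], wc_mapper, wc_reducer))
print(counts)
```

A multiround algorithm would simply feed the output of one call into the next; the number of such calls is the "#Rounds" indicator, and the largest list handed to a reducer is the per-reducer memory requirement.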
  • 9. Big Data Algorithmics: Coresets
  • 10. Big Data Algorithmics. INPUT → CORESET. Coreset: a subset of data (summary) which maintains the characteristics of the whole input, filtering out redundancy
  • 11. Big Data Algorithmics. General 2-round MapReduce approach. Round 1: partition into small subsets and extraction of partial coresets. Round 2: perform analysis on aggregation of partial coresets. INPUT → PARTIAL CORESET → AGGREGATE CORESET. CHALLENGE: composability of coresets
  • 12. Big Data Algorithmics. Example: diversity maximization
  • 13. Big Data Algorithmics. Example: diversity maximization. Goal: find k most diverse data objects. Applications: recommendation systems, search engines
  • 14. Big Data Algorithmics. MapReduce Solution, Round 1: • Partition input data arbitrarily
  • 15. Big Data Algorithmics: coresets. MapReduce Solution, Round 1: • Partition input data arbitrarily • In each subset: ▪ k’-clustering based on similarity (k’>k) ▪ pick one representative per cluster (→ partial coreset). Figure: subset of partition → k’-clustering → partial coreset. N.B. For enhanced accuracy, it is crucial to fix k’>k
  • 16. Big Data Algorithmics: coresets. MapReduce Solution, Round 2: • Aggregate partial coresets • Compute output on aggregate coreset. Figure: partial coresets → aggregate coreset → OUTPUT
  • 17. Big Data Algorithmics. Round 1, Round 2: INPUT → PARTIAL CORESET → AGGREGATE CORESET → OUTPUT
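The two-round coreset scheme of slides 14 to 17 can be sketched as follows. Several choices here are assumptions, not statements from the slides: points live in the Euclidean plane, the "k’-clustering" is done with the greedy farthest-first traversal (whose chosen centers double as the cluster representatives), diversity is measured by the minimum pairwise distance, and all function names are hypothetical.

```python
import math
import random

def farthest_first(points, k):
    """Greedy k-center: start anywhere, then keep adding the point
    farthest from the centers chosen so far."""
    centers = [points[0]]
    dist = [math.dist(p, centers[0]) for p in points]
    while len(centers) < min(k, len(points)):
        i = max(range(len(points)), key=dist.__getitem__)
        centers.append(points[i])
        dist = [min(d, math.dist(p, points[i])) for d, p in zip(dist, points)]
    return centers

def diverse_subset(points, k, k_prime, num_parts):
    # Round 1: arbitrary partition, one partial coreset of size k' per part.
    parts = [points[i::num_parts] for i in range(num_parts)]
    coreset = [c for part in parts if part for c in farthest_first(part, k_prime)]
    # Round 2: solve the problem on the aggregate coreset only.
    return farthest_first(coreset, k)

def min_pairwise_distance(pts):
    return min(math.dist(p, q) for i, p in enumerate(pts) for q in pts[i + 1:])

# Illustrative input: four tight groups of points near the corners of a square.
rng = random.Random(1)
corners = [(0, 0), (100, 0), (0, 100), (100, 100)]
pts = [(cx + rng.uniform(-1, 1), cy + rng.uniform(-1, 1))
       for cx, cy in corners for _ in range(25)]
best = diverse_subset(pts, k=4, k_prime=8, num_parts=4)
print(best, min_pairwise_distance(best))
```

Each reducer only ever sees one part (Round 1) or the aggregate coreset of at most `num_parts * k_prime` points (Round 2), which is what keeps the per-reducer memory well below N.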
  • 18. Big Data Algorithmics. Experiments: • N=64000 data objects • Seek k=64 most diverse ones • Final coreset size: [2÷1024]∙k • Measure: accuracy of solution • 4 diversity measures. N.B. Same approach can be used in a streaming setting
  • 19. Big Data Algorithmics: Decompositions of Large Networks
  • 20. Big Data Algorithmics. Analysis of large networks in MapReduce must avoid:
  • 21. Big Data Algorithmics. Analysis of large networks in MapReduce must avoid: • Long traversals • Superlinear complexities. Known exact algorithms often do not meet these criteria
  • 22. Big Data Algorithmics. Analysis of large networks in MapReduce must avoid: • Long traversals • Superlinear complexities. Known exact algorithms often do not meet these criteria. Network decomposition can provide concise summary of network characteristics
  • 23. Big Data Algorithmics. Example: network diameter. Goal: determine max distance. Applications: social networks, internet/web, linguistics, biology
  • 24. Big Data Algorithmics. MapReduce Solution • Cluster the network into few regions with small radius R, around random nodes • ≈ R rounds
  • 25. Big Data Algorithmics. MapReduce Solution • Network summary: one node per region • Determine overlay network of selected nodes • Few rounds
  • 26. Big Data Algorithmics. MapReduce Solution • Compute diameter of overlay network • Adjust for radius of original regions • 1 round. N.B. overlay network is a good summary of input network; its size can be chosen to fit memory constraints of reducers
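The three steps of slides 24 to 26 can be sketched sequentially as below. This is an illustration under stated assumptions, not the authors' implementation: the graph is unweighted, undirected, and connected; centers are drawn uniformly at random in one shot; the "adjustment for radius" is taken to be adding 2R, which yields an upper bound on the true diameter; and every function name is hypothetical.

```python
import heapq
import random
from collections import deque

def cluster(graph, centers):
    """Multi-source BFS: assign every node to its closest center.
    Each BFS level corresponds to one cluster-growing round."""
    owner = {c: c for c in centers}
    dist = {c: 0 for c in centers}
    queue = deque(centers)
    while queue:
        u = queue.popleft()
        for v in graph[u]:
            if v not in owner:
                owner[v], dist[v] = owner[u], dist[u] + 1
                queue.append(v)
    return owner, dist

def overlay(graph, owner, dist):
    """Weighted graph on the centers: the weight between two adjacent regions is
    the length of the shortest center-to-center path through one boundary link."""
    w = {c: {} for c in set(owner.values())}
    for u in graph:
        for v in graph[u]:
            a, b = owner[u], owner[v]
            if a != b:
                length = dist[u] + 1 + dist[v]
                if length < w[a].get(b, float("inf")):
                    w[a][b] = w[b][a] = length
    return w

def eccentricity(w, src):
    """Largest shortest-path distance from src in the overlay (Dijkstra)."""
    best = {src: 0}
    heap = [(0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > best[u]:
            continue
        for v, length in w[u].items():
            if d + length < best.get(v, float("inf")):
                best[v] = d + length
                heapq.heappush(heap, (best[v], v))
    return max(best.values())

def diameter_upper_bound(graph, num_centers, seed=0):
    centers = random.Random(seed).sample(sorted(graph), num_centers)
    owner, dist = cluster(graph, centers)
    radius = max(dist.values())                       # R
    w = overlay(graph, owner, dist)
    overlay_diameter = max(eccentricity(w, c) for c in w)
    return overlay_diameter + 2 * radius              # adjust for region radius

# Illustrative input: a path on 10 nodes, whose true diameter is 9.
path = {i: [j for j in (i - 1, i + 1) if 0 <= j < 10] for i in range(10)}
print(diameter_upper_bound(path, num_centers=3))
```

Fewer centers give a smaller overlay network (easier to fit in one reducer) at the cost of a larger radius R and therefore a looser estimate; with every node chosen as a center the estimate coincides with the exact diameter.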
  • 27. Big Data Algorithmics. Experiments: 16-node cluster, 10Gbit Ethernet, Apache Spark
       Network    | No. Nodes | No. Links | Time (s) | Rounds | Error
       Roads-USA  | 24M       | 29M       | 158      | 74     | 26%
       Twitter    | 42M       | 1.5G      | 236      | 5      | 19%
       Artificial | 500M      | 8G        | 6000     | 5      | 30%
       benchmarks / scalability (10K nodes in overlay network)
  • 28. Big Data Algorithmics. Efficient network partitioning • Progressive node sampling • Local cluster growth from sampled nodes • #rounds = #cluster growing steps
  • 29. Big Data Algorithmics. Example
  • 30. Big Data Algorithmics
  • 31. Big Data Algorithmics. Round 2
  • 32. Big Data Algorithmics
  • 33. Big Data Algorithmics. Round 4
  • 34. Big Data Algorithmics
  • 35. Big Data Algorithmics. Round 6
  • 36. Big Data Algorithmics. Coping with uncertainty. Links exist with certain probabilities. Applications: biology, social network analysis • Network partitioning strategy suitable for this scenario • cluster = region connected with high probability
  • 37. Big Data Algorithmics. Example: identification of protein complexes from Protein-Protein Interaction (PPI) networks • PPI viewed as uncertain network • Hp: protein complex ≡ region with high connection probability • Traditional general partitioning approaches slowed down by uncertainty. Experiments show effectiveness of approach
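The notion of "connection probability" in slides 36 and 37 can be made concrete with a small estimator. The slides do not say how the probability is computed; the sketch below assumes that links exist independently and uses plain Monte Carlo sampling of possible worlds, with a hypothetical function name and a toy three-node input.

```python
import random

def connection_probability(edges, u, v, samples=1000, seed=0):
    """Estimate Pr[u and v are connected] in an uncertain graph given as
    (node, node, probability) triples, by sampling possible worlds."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(samples):
        adj = {}
        for a, b, p in edges:
            if rng.random() < p:            # keep each link independently
                adj.setdefault(a, []).append(b)
                adj.setdefault(b, []).append(a)
        stack, seen = [u], {u}              # depth-first search from u
        while stack:
            x = stack.pop()
            for y in adj.get(x, []):
                if y not in seen:
                    seen.add(y)
                    stack.append(y)
        hits += v in seen
    return hits / samples

# Toy input: a two-link chain; both links must exist for A and C to be connected.
chain = [("A", "B", 0.5), ("B", "C", 0.5)]
estimate = connection_probability(chain, "A", "C", samples=4000)
print(estimate)
```

Under this reading, a cluster is a set of nodes whose pairwise (or node-to-center) connection probabilities all exceed a chosen threshold.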
  • 38. CONCLUSIONS • Design of big data algorithms (on clouds) entails paradigm shift ▪ Data centric view ▪ Handling size through summarization ▪ Give up exact solution ▪ Cope with noisy/unreliable data
  • 39. References
       M. Ceccarello, A.P., G. Pucci, E. Upfal: Space and Time Efficient Parallel Graph Decomposition, Clustering, and Diameter Approximation. ACM SPAA 2015
       M. Ceccarello, A.P., G. Pucci, E. Upfal: A Practical Parallel Algorithm for Diameter Approximation of Massive Weighted Graphs. IEEE IPDPS 2016
       M. Ceccarello, A.P., G. Pucci, E. Upfal: MapReduce and Streaming Algorithms for Diversity Maximization in Metric Spaces of Bounded Doubling Dimension. arXiv:1605.05590, 2016
       M. Ceccarello, C. Fantozzi, A.P., G. Pucci, F. Vandin: Clustering in uncertain graphs. Work in progress. 2016
  • 40. Conclusions: Thank you!
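
Slides 36 and 37 describe a cluster as a region connected with high probability. As a rough illustration of what "connection probability" means, the sketch below estimates it for a pair of nodes by sampling possible worlds. It assumes each link exists independently with its stated probability, which the slides do not state explicitly, and the function name `connection_probability` and the edge format `(u, v, p)` are hypothetical. It is not the partitioning strategy of the talk.

```python
import random


def connection_probability(edges, source, target, samples=1000, seed=0):
    """Monte Carlo estimate of Pr[source and target are connected].

    edges: iterable of (u, v, p) triples, where p is the probability that
    the undirected link u-v exists (assumed independent across links).
    """
    edges = list(edges)
    rng = random.Random(seed)
    hits = 0
    for _ in range(samples):
        # Sample one possible world: keep each link with probability p.
        adjacency = {}
        for u, v, p in edges:
            if rng.random() < p:
                adjacency.setdefault(u, []).append(v)
                adjacency.setdefault(v, []).append(u)
        # Depth-first search from source in the sampled world.
        seen = {source}
        stack = [source]
        while stack:
            node = stack.pop()
            for neighbor in adjacency.get(node, []):
                if neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
        if target in seen:
            hits += 1
    return hits / samples
```

For a two-link path `a-b-c` where each link has probability 0.5, the estimate should be close to 0.25, since both links must be present for `a` and `c` to be connected.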

Editor's Notes

  1-2. Title + authors
  3. Supercomputers in fact date back to the 1940s, but here we consider the modern era, or Cray era. Grand Challenges: automotive/aerospace, weather, energy, biology, neuroscience. High costs for infrastructure, maintenance, and software.
  4. High-level algorithmic and programming models lead to higher productivity and portability. Most existing clusters offer provisions for high availability and fault tolerance, and also provide load balancing.
  5. Functional paradigms: the placement of data and computation is not under the programmer's control; the focus is instead on the data and their transformations.
  6. The linearity constraint often implies giving up existing exact strategies in favor of novel approximate ones.
  7-8. Hadoop: industry standard with third-party infrastructure support (Amazon EMR, Cloudera); slow on communication-intensive or iterative algorithms because it relies on HDFS for communication. Spark: emerging industry standard (e.g., Amazon, IBM, Groupon, Yahoo!); good performance on iterative and communication-intensive algorithms.
  9-14. Classical IR scenario: retrieve the few documents most relevant to the user query. Diversity maximization scenario: retrieve relevant documents that present all the different angles of a query. When you don't know the user's intent, you must guess all possible intents and present a selection of results covering all of them. E-commerce, recommendation systems: return a "consideration set", hoping that the user will be attracted to at least one object in the set. Example: Google News selection.
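
To make the diversity maximization scenario in these notes concrete, here is a minimal sketch of a well-known greedy heuristic (farthest-first traversal) that picks `k` points so that the selected points are far apart from one another. The notes do not say which diversity measure or algorithm the talk uses, so this is only an illustration; the function name `farthest_first` and the use of Euclidean distance on coordinate tuples are assumptions.

```python
import math


def farthest_first(points, k):
    """Greedily select up to k mutually distant points.

    Starts from the first point, then repeatedly adds the point whose
    distance to its nearest already-selected point is largest.
    points: list of equal-length coordinate tuples (Euclidean distance assumed).
    """
    if k <= 0 or not points:
        return []
    chosen = [points[0]]
    # nearest[i] = distance from points[i] to its closest selected point
    nearest = [math.dist(p, points[0]) for p in points]
    while len(chosen) < min(k, len(points)):
        idx = max(range(len(points)), key=nearest.__getitem__)
        chosen.append(points[idx])
        # Update nearest-selected distances with the newly added point.
        nearest = [min(d, math.dist(p, points[idx]))
                   for d, p in zip(nearest, points)]
    return chosen
```

For example, `farthest_first([(0, 0), (1, 0), (10, 0), (0, 10), (0.5, 0.5)], 3)` skips the two points near the origin and returns the three corner points, which is the kind of "all different angles" selection the notes describe. In a retrieval setting the points would be document representations and the distance a dissimilarity measure, both of which are outside what the notes specify.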