STORIES ABOUT
SPARK, HPC & BARCELONA
Jordi Torres
Barcelona Supercomputing Center
UPC Barcelona Tech
www.JordiTorres.eu - @JordiTorresBCN
Why HPC?
Scientists have always needed the best instruments
that the technology of their time could build
Microscope (Santiago Ramón y Cajal)
Large Hadron Collider (CERN)
And supercomputers today can be
considered as the ultimate scientific
instrument that enables progress in science
The Evolution of The Research Paradigm
High Performance Computing means numerical
simulation and big data analysis, which allow us to:
• Reduce expense
• Avoid dangerous experiments
• Help to build knowledge where experiments are impossible or not affordable
HPC is an enabler for all scientific fields
Life Sciences &
Medicine
Earth
Sciences
Astro, High Energy &
Plasma Physics
Materials, Chemistry &
Nanoscience
Engineering Neuroscience
→ The emerging focus on big
data requires computing facilities
to move to a
data-centric paradigm too
However, traditional HPC systems are designed
according to the compute-centric paradigm
We have
experimented
with this in our
HPC facility in
Barcelona.
And this is what
I’m going to talk
about today!
How can existing traditional HPC infrastructure
evolve to meet the new demands?
What is HPC in
Barcelona like?
In Barcelona HPC is without doubt …
A team of
425 people
(from 40 countries)
BSC scientific departments
EARTH
SCIENCES
LIFE
SCIENCES
ENGINEERING
SCIENCE
COMPUTER
SCIENCES
Joint Research Centres with IT Companies
BSC-Microsoft Research Centre
BSC-IBM Technology Center
for Supercomputing
Intel-BSC Exascale Lab
BSC-NVIDIA CUDA
Center of Excellence
Our Supercomputer in Barcelona
Marenostrum
Supercomputer
Born inside a deconsecrated chapel
The MareNostrum 3 Supercomputer
Over 10^15 floating-point
operations per second
(petaflop)
– Nearly 50,000 cores
– 100.8 TB of memory
– 2000 TB disk storage
The third of three brothers
• 2004: MareNostrum 1
– Nearly 5×10^13 floating-point
operations per second
– Nearly 5,000 cores
– 236 TB disk storage
• 2006: MareNostrum 2
– Nearly 10^14 floating-point
operations per second
– Over 10,000 cores
– 460 TB disk storage
• 2012: MareNostrum 3
MareNostrum ancestors in the chapel
A parallel system inside the
same chapel:
Grandparent:
Processing capacity: Over
1000 operations-beats per
minute
Parallel system with 8
parallel typewriter units.
Grandmother:
Storage capacity: over
100Mb
Parallel Storage System with
14 drawer devices.
How could BSC meet
new Big Data demands?
Until now, the usual MN3 workloads
have been numerical applications
• MN3 Basic software Stack:
– OpenMP
– MPI
– Threads
– …
How can MN3 evolve to meet new
Big Data Analytics demand?
→ New module developed at BSC
Marenostrum
Supercomputer
SPARK4MN module
• Framework that enables Spark workloads on top of the
IBM Platform LSF workload manager on MN3
Spark4MN in action
Let's go!
Spark4MN in action
• We performed a system-level performance
evaluation and tuning on MN3
• Example of some results:
– Speed-up
– Scale-up
– Parallelism
Example 1: K-means speed-up
More dimensions → smaller speed-up
because of increased shuffling
(same number of centroids to shuffle, but each one is bigger)
• Times for running k-means for
10 iterations.
• Problem size constant =
100 GB (10M1000D = 10M
vectors of 1000 dimensions)
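The speed-up discussed in this example can be made concrete with a small calculation. What follows is a minimal sketch in plain Scala, not the benchmark code used on MN3; the object name `SpeedUp`, its functions, and any timings passed to them are hypothetical.

```scala
object SpeedUp {
  /** Speed-up of a run relative to a baseline run of the same problem size. */
  def speedUp(baselineSeconds: Double, seconds: Double): Double =
    baselineSeconds / seconds

  /** Parallel efficiency: achieved speed-up divided by the ideal (linear) one. */
  def efficiency(baselineNodes: Int, baselineSeconds: Double,
                 nodes: Int, seconds: Double): Double =
    speedUp(baselineSeconds, seconds) / (nodes.toDouble / baselineNodes)
}
```

With made-up timings of 800 s on 8 nodes and 200 s on 64 nodes, `SpeedUp.speedUp(800.0, 200.0)` gives 4.0 against an ideal of 8, so the efficiency is 0.5. Extra shuffling shows up as exactly this kind of gap.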
Example 2: K-means scale-up
• We vary both the number of
records and the number
of machines.
• Ideally, all the plots
should be horizontal
→ our system behaves
close to that.
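A scale-up test grows the data and the machines by the same factor, so the ideal running time is constant. The sketch below, in plain Scala, shows one way to quantify how flat a series of measurements is; the names `ScaleUp`, `efficiency` and `isNearlyFlat`, and the tolerance parameter, are assumptions made for illustration only.

```scala
object ScaleUp {
  /** Scale-up efficiency: baseline time over the time measured with
    * k times the data on k times the machines (1.0 is ideal). */
  def efficiency(baselineSeconds: Double, scaledSeconds: Double): Double =
    baselineSeconds / scaledSeconds

  /** True when every time stays within `tolerance` (a fraction) of the first one. */
  def isNearlyFlat(times: Seq[Double], tolerance: Double): Boolean =
    times.forall(t => math.abs(t - times.head) <= tolerance * times.head)
}
```

For example, hypothetical times of 100 s, 104 s and 109 s would count as flat under a 10% tolerance, while 100 s followed by 150 s would not.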
Example 3: Configuring task parallelism
We vary the number of tasks over the same number of cores.
For k-means, the best-performing configuration has
as many partitions as cores: 1 task per core is better!
• Median times for running k-means
for 10 iterations with different
number of partitions
• In our benchmarks the number of
tasks is equal to the number of
RDD partitions.
Example 3: Configuring task parallelism
• Using sort-by-key: a more shuffle-intensive scenario
– We sort 1 billion records using 64 nodes & different partition sizes
– Contrary to the previous case, we observe speed-ups when
there are 2 partitions per core
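The two findings above (1 partition per core for k-means, 2 per core for sort-by-key) translate into a partition count that depends on the cluster size. Here is a minimal sketch in plain Scala; the helper name `partitions` is hypothetical, and the 16 cores per node used in the note below is an assumption inferred from the executor layouts discussed in this talk.

```scala
object Partitioning {
  /** Total partitions for a job: nodes x cores per node x partitions per core. */
  def partitions(nodes: Int, coresPerNode: Int, partitionsPerCore: Int): Int =
    nodes * coresPerNode * partitionsPerCore
}
```

Assuming 16 cores per node, 64 nodes give 1024 partitions at 1 per core and 2048 at 2 per core. In Spark, such a value would typically be passed to `RDD.repartition` or as the `numPartitions` argument of `sortByKey`.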
Example 4: sort-by-key
• How many concurrent tasks can an executor
supervise?
Having 2 8-core executors
instead of 8 2-core ones
improves the running time by
a factor of 2.79, with all the
other parameters left the same.
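The candidate executor layouts for one node can be enumerated mechanically. The following sketch in plain Scala lists every way to split a node's cores evenly among executors; the object and function names are hypothetical, and 16 cores per node is an assumption consistent with the "2 8-core" and "8 2-core" layouts above.

```scala
object ExecutorLayouts {
  /** All (executors, coresPerExecutor) pairs that use every core of a node exactly once. */
  def layouts(coresPerNode: Int): Seq[(Int, Int)] =
    (1 to coresPerNode)
      .filter(coresPerNode % _ == 0)
      .map(executors => (executors, coresPerNode / executors))
}
```

For a 16-core node this yields (1,16), (2,8), (4,4), (8,2) and (16,1). The second element of each pair corresponds to what Spark exposes as the `spark.executor.cores` setting.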
More results on
Friday at the
Santa Clara
conference!
2015 IEEE International Conference
on Big Data October 29-November 1,
Santa Clara, CA, USA
Spark and node level
performance?
New Architecture Support for Big
Data Analytics
Exponential increase in core count
New promising technologies
(Hybrid Memory Cubes, NVRAM, etc)
Our Research Goal
Improve the node-level
performance of
state-of-the-art
scale-out data
processing frameworks
Speed-up vs Executor threads
(*) Processor Intel Xeon E5-2697 (24 cores) & Spark 1.3
Data processing capacity scaling with
large input datasets
The performance of Spark workloads degrades with
large volumes of data due to a substantial increase in
garbage collection and file I/O time.
Spark workloads do not saturate the available
bandwidth, and hence their performance is bound by
DRAM latency
More results in:
• A. J. Awan, M. Brorsson, V. Vlassov and E. Ayguade, "Performance Characterization of In-Memory Data Analytics on a Modern Cloud Server", in 5th IEEE International Conference on Big Data and Cloud Computing (BDCloud), Aug 2015, Dalian, China (Best Paper Award).
• A. J. Awan, M. Brorsson, V. Vlassov and E. Ayguade, "How Data Volume Affects Spark Based Data Analytics on a Scale-up Server", in 6th International Workshop on Big Data Benchmarks, Performance Optimization and Emerging Hardware (BPOE), held in conjunction with 41st International Conference on Very Large Data Bases, Sep 2015, Hawaii, USA.
Next generation of
HPC programming
models and Spark?
BSC programming model COMPSs
– Sequential programming
model
– Abstracts the
application from the
underlying distributed
infrastructure
– Exploit the inherent
parallelism at runtime
We are studying how these two programming
models compare and interact
on platforms like MareNostrum 3
Marenostrum
Supercomputer
Profiling Spark with BSC’s HPC tools
• Relying on over 20
years of HPC experience
and profiling tools
• Preliminary work:
Developed the Hadoop
Instrumentation Toolkit
CPU
Memory
Page Faults
processes and
communication
Project ALOJA: Benchmarking Spark
• Open initiative to Explore and
produce a systematic study of
Hadoop/Spark efficiency on
different SW and HW
• Online repository that allows
comparing, side by side, all
execution parameters (50,000+
runs over 100+ HW configurations)
Big Data Analytics
workloads at BSC?
(with Spark)
Preliminary work
• Multimedia Big Data Computing:
Work with three kinds of data at the same time
social
network
relationships
audiovisual
content
metadata
Preliminary case study
Multimodal Data
Analytics systems
E.g. Latent User
Attribute Inference
for Predicting
Desigual Followers
Example of tools created: Vectorization
Necessary for visual similarity search, visual clustering, classification, etc.
Available in our github: bsc.spark.image
scala> import bsc.spark.image.ImageUtils
scala> import org.apache.spark.mllib.classification.NaiveBayes
…
scala> val images = ImageUtils.seqFile("hdfs://...", sc)
scala> val dictionary = ImageUtils.BoWDictionary(images)  // BoW = bag of (visual) words
scala> val vectors = dictionary.getBags(images)
…
scala> val splits = vectors.randomSplit(Array(0.6, 0.4), seed = 11L)  // 60% training, 40% test
scala> val training = splits(0)
scala> val test = splits(1)
scala> val model = NaiveBayes.train(training, lambda = 1.0)  // lambda = additive smoothing
…
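To illustrate what a bag-of-words vectorization step does in general, here is a minimal sketch in plain Scala. It is a generic illustration and not the `bsc.spark.image` implementation: the object `BagOfVisualWords` and its functions are hypothetical, and the dictionary is assumed to be a list of already computed centroid vectors.

```scala
object BagOfVisualWords {
  type Vec = Array[Double]

  /** Squared Euclidean distance between two descriptors of equal length. */
  def sqDist(a: Vec, b: Vec): Double =
    a.zip(b).map { case (x, y) => (x - y) * (x - y) }.sum

  /** Index of the dictionary word closest to the descriptor. */
  def nearestWord(descriptor: Vec, dictionary: Seq[Vec]): Int =
    dictionary.indices.minBy(i => sqDist(descriptor, dictionary(i)))

  /** Histogram of word counts: one fixed-length vector per image. */
  def bag(descriptors: Seq[Vec], dictionary: Seq[Vec]): Array[Int] = {
    val counts = Array.fill(dictionary.length)(0)
    descriptors.foreach(d => counts(nearestWord(d, dictionary)) += 1)
    counts
  }
}
```

Each image, whatever its number of local descriptors, ends up as a vector with one entry per dictionary word, which is the fixed-length form that classifiers and clustering algorithms expect.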
Applications: Locality Sensitive Hashing
e.g. near-replica detection (visual spam detection, copyright infringement)
[Figure: patches (PATCH 1-4), keypoints (KP1-KP4), feature detection, feature description (SIFT, SURF, ORB, etc.), 4-bit binary codes (0000 ... 1111)]
Features are sketched and embedded into a Hamming space.
Similar features are hashed into similar buckets in a hash table.
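One common way to hash binary sketches so that similar ones collide is bit sampling, a classic locality-sensitive hashing scheme for Hamming space. The sketch below, in plain Scala, is an illustration of that scheme under stated assumptions and not necessarily the method used at BSC; the object `HammingLsh`, its functions, and the sampled bit positions are hypothetical.

```scala
object HammingLsh {
  /** Number of differing bits between two binary sketches. */
  def hamming(a: Int, b: Int): Int = Integer.bitCount(a ^ b)

  /** Bit-sampling hash: keep only the bits at the chosen positions. */
  def bucketKey(sketch: Int, positions: Seq[Int]): Int =
    positions.zipWithIndex.foldLeft(0) { case (key, (pos, i)) =>
      key | (((sketch >> pos) & 1) << i)
    }

  /** Group sketches by bucket key; near-duplicates tend to collide. */
  def buckets(sketches: Seq[Int], positions: Seq[Int]): Map[Int, Seq[Int]] =
    sketches.groupBy(s => bucketKey(s, positions))
}
```

With the 4-bit codes 0110 and 0111, which differ in a single bit, sampling bit positions 1 and 2 places both in the same bucket, while 0000 lands in a different one. In practice several independent sets of positions are used so that near-duplicates collide in at least one table.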
Current work: Computer Vision
• Makes very productive use of (convolutional) neural networks
• Hand-crafted SIFT features (used for decades) became unnecessary
What next at BSC?
BSC vision:
Giving computers a greater
ability to understand
information, and to learn, to
reason, and act upon it
Old wine in a new bottle?
• the term itself dates from the
1950s.
• periods of hype and high
expectations alternating with
periods of setback and
disappointment.
Artificial
Intelligence
plays an
important
role
Why Now?
1. Along with the explosion of data …
algorithms can now be “trained” by
exposing them to large data sets that
were previously unavailable.
2. And the computing power
necessary to implement these
algorithms is now available
Evolution of computing power
FLOP/second
1988: Cray Y-MP (8 processors)
1998: Cray T3E (1,024 processors)
2008: Cray XT5 (15,000 processors)
~2019: ? (1×10^7 processors)
This new type of computing requires
DATA
Supercomputers
Research
Big Data
Technologies
Advanced
Analytic
Algorithms
1. the continuous
development of
supercomputing
systems
2. enabling the
convergence of
advanced analytic
algorithms
3. and big data
technologies
Today technologies & focus at BSC
COMPUTER
VISION
Advanced
Analytics
Algorithms
Cognitive Computing requires a transition of
computing facilities into a new paradigm too
Name? … We use Cognitive Computing
Yesterday Today Tomorrow
And to finish…
Welcome to
Barcelona!
Welcome to our wonderful city
Welcome to our university
22 schools - 4K employees - 35K students
Welcome to our research center
Welcome to our everyday life
Welcome to our academic activities
• Teaching Spark @ Master courses
• Using Spark @ Final Master Thesis
• Using Spark @ Research activity
• NEW Spark Book in Spanish
• Editorial UOC
• Presentation November 3, 2015
Foreword by
Matei Zaharia
Welcome to our Spark Community
1000+ members
Thank you for your attention!
Jordi Torres @JordiTorresBCN www.JordiTorres.eu

Anusua Trivedi, Data Scientist at Texas Advanced Computing Center (TACC), UT ...Anusua Trivedi, Data Scientist at Texas Advanced Computing Center (TACC), UT ...
Anusua Trivedi, Data Scientist at Texas Advanced Computing Center (TACC), UT ...
 

More from Spark Summit

FPGA-Based Acceleration Architecture for Spark SQL Qi Xie and Quanfu Wang
FPGA-Based Acceleration Architecture for Spark SQL Qi Xie and Quanfu Wang FPGA-Based Acceleration Architecture for Spark SQL Qi Xie and Quanfu Wang
FPGA-Based Acceleration Architecture for Spark SQL Qi Xie and Quanfu Wang Spark Summit
 
VEGAS: The Missing Matplotlib for Scala/Apache Spark with DB Tsai and Roger M...
VEGAS: The Missing Matplotlib for Scala/Apache Spark with DB Tsai and Roger M...VEGAS: The Missing Matplotlib for Scala/Apache Spark with DB Tsai and Roger M...
VEGAS: The Missing Matplotlib for Scala/Apache Spark with DB Tsai and Roger M...Spark Summit
 
Apache Spark Structured Streaming Helps Smart Manufacturing with Xiaochang Wu
Apache Spark Structured Streaming Helps Smart Manufacturing with  Xiaochang WuApache Spark Structured Streaming Helps Smart Manufacturing with  Xiaochang Wu
Apache Spark Structured Streaming Helps Smart Manufacturing with Xiaochang WuSpark Summit
 
Improving Traffic Prediction Using Weather Data with Ramya Raghavendra
Improving Traffic Prediction Using Weather Data  with Ramya RaghavendraImproving Traffic Prediction Using Weather Data  with Ramya Raghavendra
Improving Traffic Prediction Using Weather Data with Ramya RaghavendraSpark Summit
 
A Tale of Two Graph Frameworks on Spark: GraphFrames and Tinkerpop OLAP Artem...
A Tale of Two Graph Frameworks on Spark: GraphFrames and Tinkerpop OLAP Artem...A Tale of Two Graph Frameworks on Spark: GraphFrames and Tinkerpop OLAP Artem...
A Tale of Two Graph Frameworks on Spark: GraphFrames and Tinkerpop OLAP Artem...Spark Summit
 
No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark Marcin ...
No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark Marcin ...No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark Marcin ...
No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark Marcin ...Spark Summit
 
Apache Spark and Tensorflow as a Service with Jim Dowling
Apache Spark and Tensorflow as a Service with Jim DowlingApache Spark and Tensorflow as a Service with Jim Dowling
Apache Spark and Tensorflow as a Service with Jim DowlingSpark Summit
 
Apache Spark and Tensorflow as a Service with Jim Dowling
Apache Spark and Tensorflow as a Service with Jim DowlingApache Spark and Tensorflow as a Service with Jim Dowling
Apache Spark and Tensorflow as a Service with Jim DowlingSpark Summit
 
MMLSpark: Lessons from Building a SparkML-Compatible Machine Learning Library...
MMLSpark: Lessons from Building a SparkML-Compatible Machine Learning Library...MMLSpark: Lessons from Building a SparkML-Compatible Machine Learning Library...
MMLSpark: Lessons from Building a SparkML-Compatible Machine Learning Library...Spark Summit
 
Next CERN Accelerator Logging Service with Jakub Wozniak
Next CERN Accelerator Logging Service with Jakub WozniakNext CERN Accelerator Logging Service with Jakub Wozniak
Next CERN Accelerator Logging Service with Jakub WozniakSpark Summit
 
Powering a Startup with Apache Spark with Kevin Kim
Powering a Startup with Apache Spark with Kevin KimPowering a Startup with Apache Spark with Kevin Kim
Powering a Startup with Apache Spark with Kevin KimSpark Summit
 
Improving Traffic Prediction Using Weather Datawith Ramya Raghavendra
Improving Traffic Prediction Using Weather Datawith Ramya RaghavendraImproving Traffic Prediction Using Weather Datawith Ramya Raghavendra
Improving Traffic Prediction Using Weather Datawith Ramya RaghavendraSpark Summit
 
Hiding Apache Spark Complexity for Fast Prototyping of Big Data Applications—...
Hiding Apache Spark Complexity for Fast Prototyping of Big Data Applications—...Hiding Apache Spark Complexity for Fast Prototyping of Big Data Applications—...
Hiding Apache Spark Complexity for Fast Prototyping of Big Data Applications—...Spark Summit
 
How Nielsen Utilized Databricks for Large-Scale Research and Development with...
How Nielsen Utilized Databricks for Large-Scale Research and Development with...How Nielsen Utilized Databricks for Large-Scale Research and Development with...
How Nielsen Utilized Databricks for Large-Scale Research and Development with...Spark Summit
 
Spline: Apache Spark Lineage not Only for the Banking Industry with Marek Nov...
Spline: Apache Spark Lineage not Only for the Banking Industry with Marek Nov...Spline: Apache Spark Lineage not Only for the Banking Industry with Marek Nov...
Spline: Apache Spark Lineage not Only for the Banking Industry with Marek Nov...Spark Summit
 
Goal Based Data Production with Sim Simeonov
Goal Based Data Production with Sim SimeonovGoal Based Data Production with Sim Simeonov
Goal Based Data Production with Sim SimeonovSpark Summit
 
Preventing Revenue Leakage and Monitoring Distributed Systems with Machine Le...
Preventing Revenue Leakage and Monitoring Distributed Systems with Machine Le...Preventing Revenue Leakage and Monitoring Distributed Systems with Machine Le...
Preventing Revenue Leakage and Monitoring Distributed Systems with Machine Le...Spark Summit
 
Getting Ready to Use Redis with Apache Spark with Dvir Volk
Getting Ready to Use Redis with Apache Spark with Dvir VolkGetting Ready to Use Redis with Apache Spark with Dvir Volk
Getting Ready to Use Redis with Apache Spark with Dvir VolkSpark Summit
 
Deduplication and Author-Disambiguation of Streaming Records via Supervised M...
Deduplication and Author-Disambiguation of Streaming Records via Supervised M...Deduplication and Author-Disambiguation of Streaming Records via Supervised M...
Deduplication and Author-Disambiguation of Streaming Records via Supervised M...Spark Summit
 
MatFast: In-Memory Distributed Matrix Computation Processing and Optimization...
MatFast: In-Memory Distributed Matrix Computation Processing and Optimization...MatFast: In-Memory Distributed Matrix Computation Processing and Optimization...
MatFast: In-Memory Distributed Matrix Computation Processing and Optimization...Spark Summit
 

More from Spark Summit (20)

FPGA-Based Acceleration Architecture for Spark SQL Qi Xie and Quanfu Wang
FPGA-Based Acceleration Architecture for Spark SQL Qi Xie and Quanfu Wang FPGA-Based Acceleration Architecture for Spark SQL Qi Xie and Quanfu Wang
FPGA-Based Acceleration Architecture for Spark SQL Qi Xie and Quanfu Wang
 
VEGAS: The Missing Matplotlib for Scala/Apache Spark with DB Tsai and Roger M...
VEGAS: The Missing Matplotlib for Scala/Apache Spark with DB Tsai and Roger M...VEGAS: The Missing Matplotlib for Scala/Apache Spark with DB Tsai and Roger M...
VEGAS: The Missing Matplotlib for Scala/Apache Spark with DB Tsai and Roger M...
 
Apache Spark Structured Streaming Helps Smart Manufacturing with Xiaochang Wu
Apache Spark Structured Streaming Helps Smart Manufacturing with  Xiaochang WuApache Spark Structured Streaming Helps Smart Manufacturing with  Xiaochang Wu
Apache Spark Structured Streaming Helps Smart Manufacturing with Xiaochang Wu
 
Improving Traffic Prediction Using Weather Data with Ramya Raghavendra
Improving Traffic Prediction Using Weather Data  with Ramya RaghavendraImproving Traffic Prediction Using Weather Data  with Ramya Raghavendra
Improving Traffic Prediction Using Weather Data with Ramya Raghavendra
 
A Tale of Two Graph Frameworks on Spark: GraphFrames and Tinkerpop OLAP Artem...
A Tale of Two Graph Frameworks on Spark: GraphFrames and Tinkerpop OLAP Artem...A Tale of Two Graph Frameworks on Spark: GraphFrames and Tinkerpop OLAP Artem...
A Tale of Two Graph Frameworks on Spark: GraphFrames and Tinkerpop OLAP Artem...
 
No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark Marcin ...
No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark Marcin ...No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark Marcin ...
No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark Marcin ...
 
Apache Spark and Tensorflow as a Service with Jim Dowling
Apache Spark and Tensorflow as a Service with Jim DowlingApache Spark and Tensorflow as a Service with Jim Dowling
Apache Spark and Tensorflow as a Service with Jim Dowling
 
Apache Spark and Tensorflow as a Service with Jim Dowling
Apache Spark and Tensorflow as a Service with Jim DowlingApache Spark and Tensorflow as a Service with Jim Dowling
Apache Spark and Tensorflow as a Service with Jim Dowling
 
MMLSpark: Lessons from Building a SparkML-Compatible Machine Learning Library...
MMLSpark: Lessons from Building a SparkML-Compatible Machine Learning Library...MMLSpark: Lessons from Building a SparkML-Compatible Machine Learning Library...
MMLSpark: Lessons from Building a SparkML-Compatible Machine Learning Library...
 
Next CERN Accelerator Logging Service with Jakub Wozniak
Next CERN Accelerator Logging Service with Jakub WozniakNext CERN Accelerator Logging Service with Jakub Wozniak
Next CERN Accelerator Logging Service with Jakub Wozniak
 
Powering a Startup with Apache Spark with Kevin Kim
Powering a Startup with Apache Spark with Kevin KimPowering a Startup with Apache Spark with Kevin Kim
Powering a Startup with Apache Spark with Kevin Kim
 
Improving Traffic Prediction Using Weather Datawith Ramya Raghavendra
Improving Traffic Prediction Using Weather Datawith Ramya RaghavendraImproving Traffic Prediction Using Weather Datawith Ramya Raghavendra
Improving Traffic Prediction Using Weather Datawith Ramya Raghavendra
 
Hiding Apache Spark Complexity for Fast Prototyping of Big Data Applications—...
Hiding Apache Spark Complexity for Fast Prototyping of Big Data Applications—...Hiding Apache Spark Complexity for Fast Prototyping of Big Data Applications—...
Hiding Apache Spark Complexity for Fast Prototyping of Big Data Applications—...
 
How Nielsen Utilized Databricks for Large-Scale Research and Development with...
How Nielsen Utilized Databricks for Large-Scale Research and Development with...How Nielsen Utilized Databricks for Large-Scale Research and Development with...
How Nielsen Utilized Databricks for Large-Scale Research and Development with...
 
Spline: Apache Spark Lineage not Only for the Banking Industry with Marek Nov...
Spline: Apache Spark Lineage not Only for the Banking Industry with Marek Nov...Spline: Apache Spark Lineage not Only for the Banking Industry with Marek Nov...
Spline: Apache Spark Lineage not Only for the Banking Industry with Marek Nov...
 
Goal Based Data Production with Sim Simeonov
Goal Based Data Production with Sim SimeonovGoal Based Data Production with Sim Simeonov
Goal Based Data Production with Sim Simeonov
 
Preventing Revenue Leakage and Monitoring Distributed Systems with Machine Le...
Preventing Revenue Leakage and Monitoring Distributed Systems with Machine Le...Preventing Revenue Leakage and Monitoring Distributed Systems with Machine Le...
Preventing Revenue Leakage and Monitoring Distributed Systems with Machine Le...
 
Getting Ready to Use Redis with Apache Spark with Dvir Volk
Getting Ready to Use Redis with Apache Spark with Dvir VolkGetting Ready to Use Redis with Apache Spark with Dvir Volk
Getting Ready to Use Redis with Apache Spark with Dvir Volk
 
Deduplication and Author-Disambiguation of Streaming Records via Supervised M...
Deduplication and Author-Disambiguation of Streaming Records via Supervised M...Deduplication and Author-Disambiguation of Streaming Records via Supervised M...
Deduplication and Author-Disambiguation of Streaming Records via Supervised M...
 
MatFast: In-Memory Distributed Matrix Computation Processing and Optimization...
MatFast: In-Memory Distributed Matrix Computation Processing and Optimization...MatFast: In-Memory Distributed Matrix Computation Processing and Optimization...
MatFast: In-Memory Distributed Matrix Computation Processing and Optimization...
 

Recently uploaded

Master's Thesis - Data Science - Presentation
Master's Thesis - Data Science - PresentationMaster's Thesis - Data Science - Presentation
Master's Thesis - Data Science - PresentationGiorgio Carbone
 
5 Ds to Define Data Archiving Best Practices
5 Ds to Define Data Archiving Best Practices5 Ds to Define Data Archiving Best Practices
5 Ds to Define Data Archiving Best PracticesDataArchiva
 
ChistaDATA Real-Time DATA Analytics Infrastructure
ChistaDATA Real-Time DATA Analytics InfrastructureChistaDATA Real-Time DATA Analytics Infrastructure
ChistaDATA Real-Time DATA Analytics Infrastructuresonikadigital1
 
Rock Songs common codes and conventions.pptx
Rock Songs common codes and conventions.pptxRock Songs common codes and conventions.pptx
Rock Songs common codes and conventions.pptxFinatron037
 
Mapping the pubmed data under different suptopics using NLP.pptx
Mapping the pubmed data under different suptopics using NLP.pptxMapping the pubmed data under different suptopics using NLP.pptx
Mapping the pubmed data under different suptopics using NLP.pptxVenkatasubramani13
 
TINJUAN PEMROSESAN TRANSAKSI DAN ERP.pptx
TINJUAN PEMROSESAN TRANSAKSI DAN ERP.pptxTINJUAN PEMROSESAN TRANSAKSI DAN ERP.pptx
TINJUAN PEMROSESAN TRANSAKSI DAN ERP.pptxDwiAyuSitiHartinah
 
Optimal Decision Making - Cost Reduction in Logistics
Optimal Decision Making - Cost Reduction in LogisticsOptimal Decision Making - Cost Reduction in Logistics
Optimal Decision Making - Cost Reduction in LogisticsThinkInnovation
 
CI, CD -Tools to integrate without manual intervention
CI, CD -Tools to integrate without manual interventionCI, CD -Tools to integrate without manual intervention
CI, CD -Tools to integrate without manual interventionajayrajaganeshkayala
 
How is Real-Time Analytics Different from Traditional OLAP?
How is Real-Time Analytics Different from Traditional OLAP?How is Real-Time Analytics Different from Traditional OLAP?
How is Real-Time Analytics Different from Traditional OLAP?sonikadigital1
 
Strategic CX: A Deep Dive into Voice of the Customer Insights for Clarity
Strategic CX: A Deep Dive into Voice of the Customer Insights for ClarityStrategic CX: A Deep Dive into Voice of the Customer Insights for Clarity
Strategic CX: A Deep Dive into Voice of the Customer Insights for ClarityAggregage
 
Cash Is Still King: ATM market research '2023
Cash Is Still King: ATM market research '2023Cash Is Still King: ATM market research '2023
Cash Is Still King: ATM market research '2023Vladislav Solodkiy
 
Virtuosoft SmartSync Product Introduction
Virtuosoft SmartSync Product IntroductionVirtuosoft SmartSync Product Introduction
Virtuosoft SmartSync Product Introductionsanjaymuralee1
 
Elements of language learning - an analysis of how different elements of lang...
Elements of language learning - an analysis of how different elements of lang...Elements of language learning - an analysis of how different elements of lang...
Elements of language learning - an analysis of how different elements of lang...PrithaVashisht1
 
The Universal GTM - how we design GTM and dataLayer
The Universal GTM - how we design GTM and dataLayerThe Universal GTM - how we design GTM and dataLayer
The Universal GTM - how we design GTM and dataLayerPavel Šabatka
 
Persuasive E-commerce, Our Biased Brain @ Bikkeldag 2024
Persuasive E-commerce, Our Biased Brain @ Bikkeldag 2024Persuasive E-commerce, Our Biased Brain @ Bikkeldag 2024
Persuasive E-commerce, Our Biased Brain @ Bikkeldag 2024Guido X Jansen
 
CCS336-Cloud-Services-Management-Lecture-Notes-1.pptx
CCS336-Cloud-Services-Management-Lecture-Notes-1.pptxCCS336-Cloud-Services-Management-Lecture-Notes-1.pptx
CCS336-Cloud-Services-Management-Lecture-Notes-1.pptxdhiyaneswaranv1
 

Recently uploaded (16)

Master's Thesis - Data Science - Presentation
Master's Thesis - Data Science - PresentationMaster's Thesis - Data Science - Presentation
Master's Thesis - Data Science - Presentation
 
5 Ds to Define Data Archiving Best Practices
5 Ds to Define Data Archiving Best Practices5 Ds to Define Data Archiving Best Practices
5 Ds to Define Data Archiving Best Practices
 
ChistaDATA Real-Time DATA Analytics Infrastructure
ChistaDATA Real-Time DATA Analytics InfrastructureChistaDATA Real-Time DATA Analytics Infrastructure
ChistaDATA Real-Time DATA Analytics Infrastructure
 
Rock Songs common codes and conventions.pptx
Rock Songs common codes and conventions.pptxRock Songs common codes and conventions.pptx
Rock Songs common codes and conventions.pptx
 
Mapping the pubmed data under different suptopics using NLP.pptx
Mapping the pubmed data under different suptopics using NLP.pptxMapping the pubmed data under different suptopics using NLP.pptx
Mapping the pubmed data under different suptopics using NLP.pptx
 
TINJUAN PEMROSESAN TRANSAKSI DAN ERP.pptx
TINJUAN PEMROSESAN TRANSAKSI DAN ERP.pptxTINJUAN PEMROSESAN TRANSAKSI DAN ERP.pptx
TINJUAN PEMROSESAN TRANSAKSI DAN ERP.pptx
 
Optimal Decision Making - Cost Reduction in Logistics
Optimal Decision Making - Cost Reduction in LogisticsOptimal Decision Making - Cost Reduction in Logistics
Optimal Decision Making - Cost Reduction in Logistics
 
CI, CD -Tools to integrate without manual intervention
CI, CD -Tools to integrate without manual interventionCI, CD -Tools to integrate without manual intervention
CI, CD -Tools to integrate without manual intervention
 
How is Real-Time Analytics Different from Traditional OLAP?
How is Real-Time Analytics Different from Traditional OLAP?How is Real-Time Analytics Different from Traditional OLAP?
How is Real-Time Analytics Different from Traditional OLAP?
 
Strategic CX: A Deep Dive into Voice of the Customer Insights for Clarity
Strategic CX: A Deep Dive into Voice of the Customer Insights for ClarityStrategic CX: A Deep Dive into Voice of the Customer Insights for Clarity
Strategic CX: A Deep Dive into Voice of the Customer Insights for Clarity
 
Cash Is Still King: ATM market research '2023
Cash Is Still King: ATM market research '2023Cash Is Still King: ATM market research '2023
Cash Is Still King: ATM market research '2023
 
Virtuosoft SmartSync Product Introduction
Virtuosoft SmartSync Product IntroductionVirtuosoft SmartSync Product Introduction
Virtuosoft SmartSync Product Introduction
 
Elements of language learning - an analysis of how different elements of lang...
Elements of language learning - an analysis of how different elements of lang...Elements of language learning - an analysis of how different elements of lang...
Elements of language learning - an analysis of how different elements of lang...
 
The Universal GTM - how we design GTM and dataLayer
The Universal GTM - how we design GTM and dataLayerThe Universal GTM - how we design GTM and dataLayer
The Universal GTM - how we design GTM and dataLayer
 
Persuasive E-commerce, Our Biased Brain @ Bikkeldag 2024
Persuasive E-commerce, Our Biased Brain @ Bikkeldag 2024Persuasive E-commerce, Our Biased Brain @ Bikkeldag 2024
Persuasive E-commerce, Our Biased Brain @ Bikkeldag 2024
 
CCS336-Cloud-Services-Management-Lecture-Notes-1.pptx
CCS336-Cloud-Services-Management-Lecture-Notes-1.pptxCCS336-Cloud-Services-Management-Lecture-Notes-1.pptx
CCS336-Cloud-Services-Management-Lecture-Notes-1.pptx
 

Stories About Spark, HPC and Barcelona by Jordi Torres

  • 1. STORIES ABOUT SPARK, HPC & BARCELONA Jordi Torres Barcelona Supercomputing Center UPC Barcelona Tech www.JordiTorres.eu - @JordiTorresBCN
  • 3. Scientists have always needed the best instruments that the technology of their time allowed them to build: the microscope (Santiago Ramón y Cajal), the Large Hadron Collider (CERN)
  • 4. And today supercomputers can be considered the ultimate scientific instrument, enabling progress in science
  • 5. The Evolution of the Research Paradigm. High Performance Computing means numerical simulation and big data analysis, which make it possible to: reduce expense; avoid dangerous experiments; help build knowledge where experiments are impossible or not affordable
  • 6. HPC is an enabler for all scientific fields: Life Sciences & Medicine; Earth Sciences; Astro, High Energy & Plasma Physics; Materials, Chemistry & Nanoscience; Engineering; Neuroscience
  • 7. The emergent focus on big data requires a transition of computing facilities into a data-centric paradigm too. However, traditional HPC systems are designed according to the compute-centric paradigm
  • 8. How can existing traditional HPC infrastructure evolve to meet the new demands? We have experimented with this in our HPC facility in Barcelona, and this is what I’m going to talk about today!
  • 9. What is HPC in Barcelona like?
  • 10. In Barcelona HPC is without doubt … A team of 425 people (from 40 countries)
  • 12. Joint Research Centres with IT Companies: BSC-Microsoft Research Centre; BSC-IBM Technology Center for Supercomputing; Intel-BSC Exascale Lab; BSC-NVIDIA CUDA Center of Excellence
  • 13. Our supercomputer in Barcelona: the MareNostrum supercomputer
  • 14. Born inside a deconsecrated chapel
  • 15. The MareNostrum 3 Supercomputer: over 10^15 floating-point operations per second (Petaflop) – Nearly 50,000 cores – 100.8 TB of memory – 2000 TB disk storage
  • 16. The third of three brothers • 2004: MareNostrum 1 – Nearly 5x10^13 floating-point operations per second – Nearly 5,000 cores – 236 TB disk storage • 2006: MareNostrum 2 – Nearly 10^14 floating-point operations per second – Over 10,000 cores – 460 TB disk storage • 2012: MareNostrum 3
  • 17. MareNostrum ancestors in the chapel. A parallel system inside the same chapel. Grandparent: processing capacity of over 1000 operations-beats per minute; parallel system with 8 parallel typewriter units. Grandmother: storage capacity of over 100Mb; parallel storage system with 14 drawer devices.
  • 18. How could BSC meet new Big Data demands?
  • 19. Until now, the usual MN3 workloads have been numerical applications • MN3 basic software stack: – OpenMP – MPI – Threads – …
  • 20. How can MN3 evolve to meet the new Big Data Analytics demand? → New module developed at BSC
  • 21. SPARK4MN module • A framework to enable Spark workloads over the IBM Platform LSF workload manager on MN3
  • 23. Spark4MN in action • We performed a system-level performance evaluation and tuning on MN3 • Examples of some results: – Speed-up – Scale-up – Parallelism
  • 24. Example 1: k-means speed-up. More dimensions → smaller speed-up because of increased shuffling (the same number of centroids to shuffle, but each one is bigger) • Times for running k-means for 10 iterations. • Problem size constant = 100 GB (10M1000D = 10M vectors of 1000 dimensions)
  • 25. Example 2: k-means scale-up • We modify both the number of records and the number of machines. • Ideally, all the plots should be horizontal → our system behaves close to that.
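The two metrics behind Examples 1 and 2 can be made concrete with a minimal sketch. It assumes the usual textbook definitions: speed-up compares run times for a fixed problem size, and scale-up compares run times when data and machines grow by the same factor. The object name and all timings are invented for illustration; they are not results from MN3.

```scala
// Minimal sketch of speed-up and scale-up, using the usual textbook
// definitions. All timings are made-up illustration values,
// NOT measurements from MareNostrum 3.
object ScalingMetrics {
  // Fixed problem size: how many times faster than the baseline run.
  def speedup(baselineSecs: Double, secs: Double): Double =
    baselineSecs / secs

  // Speed-up divided by the factor by which the node count grew (1.0 = ideal).
  def efficiency(baselineNodes: Int, baselineSecs: Double,
                 nodes: Int, secs: Double): Double =
    speedup(baselineSecs, secs) / (nodes.toDouble / baselineNodes)

  // Data and nodes grow together: ideally the time stays flat (ratio 1.0),
  // which is the "horizontal plot" mentioned on the slide.
  def scaleup(baselineSecs: Double, scaledSecs: Double): Double =
    baselineSecs / scaledSecs

  def main(args: Array[String]): Unit = {
    println(speedup(800.0, 250.0))             // 3.2
    println(efficiency(16, 800.0, 64, 250.0))  // 0.8
    println(scaleup(800.0, 1000.0))            // 0.8
  }
}
```

An efficiency or scale-up value close to 1.0 corresponds to near-ideal behaviour; values well below 1.0 point to overheads such as shuffling.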
  • 26. Example 3: Configuring task parallelism. Varying the number of tasks over the same number of cores for k-means, the best-performing configuration has as many partitions as cores, i.e. 1 task per core is better! • Median times for running k-means for 10 iterations with different numbers of partitions • In our benchmarks the number of tasks is equal to the number of RDD partitions.
  • 27. Example 3: Configuring task parallelism • Using sort-by-key: a more shuffling-intensive scenario – We sort 1 billion records using 64 nodes & different partition sizes – Contrary to the previous case, we observe speed-ups when there are 2 partitions per core
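As a rough illustration of the two findings on task parallelism, the sketch below derives a partition count from the cores available. The helper name and the value of 16 cores per node are assumptions made for the example; the slides do not state the per-node core count. The Spark calls mentioned in the trailing comment (`repartition`, `sortByKey` with `numPartitions`) are standard RDD operations, shown only as comments so the block runs without Spark.

```scala
// Sketch: choose the number of RDD partitions from the cores available.
// coresPerNode = 16 is an assumption for illustration only.
object PartitionSizing {
  def numPartitions(nodes: Int, coresPerNode: Int, partitionsPerCore: Int): Int = {
    require(nodes > 0 && coresPerNode > 0 && partitionsPerCore > 0,
      "all arguments must be positive")
    nodes * coresPerNode * partitionsPerCore
  }

  def main(args: Array[String]): Unit = {
    // k-means-like job: one partition (task) per core.
    val kmeansPartitions = numPartitions(nodes = 64, coresPerNode = 16, partitionsPerCore = 1)
    // Shuffle-heavy sort-by-key: two partitions per core.
    val sortPartitions = numPartitions(nodes = 64, coresPerNode = 16, partitionsPerCore = 2)
    println(kmeansPartitions)  // 1024
    println(sortPartitions)    // 2048

    // In a Spark job these values would be passed on, for example:
    //   points.repartition(kmeansPartitions)
    //   pairs.sortByKey(ascending = true, numPartitions = sortPartitions)
  }
}
```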
  • 28. Example 4: sort-by-key • How many concurrent tasks can an executor supervise? Having 2 8-core executors instead of 8 2-core ones improves the running time by a factor of 2.79, leaving all the other parameters the same.
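The two executor layouts compared in Example 4 use the same total number of cores and differ only in how the cores are grouped. A minimal sketch, assuming the standard Spark properties `spark.executor.cores` and `spark.cores.max`; how Spark4MN actually applied these settings is not shown in the slides, and the `Layout` type is hypothetical.

```scala
// Sketch: same total cores, grouped into few large or many small executors.
// The property names are standard Spark settings; the way Spark4MN sets
// them is not shown in the slides, so this is an assumption.
object ExecutorLayouts {
  final case class Layout(executors: Int, coresPerExecutor: Int) {
    def totalCores: Int = executors * coresPerExecutor

    // Settings one might pass through SparkConf or spark-submit --conf.
    def sparkConf: Map[String, String] = Map(
      "spark.executor.cores" -> coresPerExecutor.toString,
      "spark.cores.max"      -> totalCores.toString
    )
  }

  val fewLarge  = Layout(executors = 2, coresPerExecutor = 8)
  val manySmall = Layout(executors = 8, coresPerExecutor = 2)

  def main(args: Array[String]): Unit = {
    Seq(fewLarge, manySmall).foreach { l =>
      println(s"$l -> ${l.totalCores} cores, ${l.sparkConf}")
    }
  }
}
```

Both layouts expose 16 cores, so any difference in running time comes from how many concurrent tasks each executor handles, not from the amount of hardware.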
  • 29. More results on Friday at the Santa Clara conference: 2015 IEEE International Conference on Big Data, October 29-November 1, Santa Clara, CA, USA
  • 30. Spark and node level performance?
  • 31. New Architecture Support for Big Data Analytics: exponential increase in core count; new promising technologies (Hybrid Memory Cubes, NVRAM, etc.)
  • 32. Our Research Goal: improve the node-level performance of state-of-the-art scale-out data processing frameworks
  • 33. Speed-up vs Executor threads (*) Processor Intel Xeon E5-2697 (24 cores) & Spark 1.3
  • 34. Data processing capacity scaling with large input datasets. The performance of Spark workloads degrades with large volumes of data due to a substantial increase in garbage collection and file I/O time. Spark workloads do not saturate the available bandwidth, and hence their performance is bound by DRAM latency
  • 35. More results in: • A. J. Awan, M. Brorsson, V. Vlassov and E. Ayguade, "Performance Characterization of In-Memory Data Analytics on a Modern Cloud Server", in 5th IEEE International Conference on Big Data and Cloud Computing (BDCloud), Aug 2015, Dalian, China (Best Paper Award) • A. J. Awan, M. Brorsson, V. Vlassov and E. Ayguade, "How Data Volume Affects Spark Based Data Analytics on a Scale-up Server", in 6th International Workshop on Big Data Benchmarks, Performance Optimization and Emerging Hardware (BPOE), held in conjunction with the 41st International Conference on Very Large Data Bases, Sep 2015, Hawaii, USA.
  • 36. Next generation of HPC programming models and Spark?
  • 37. BSC programming model COMPSs – Sequential programming model – Abstracts the application from the underlying distributed infrastructure – Exploits the inherent parallelism at runtime
  • 38. We are studying the comparison and interaction between these two programming models on platforms like MareNostrum 3
  • 39. Profiling Spark with BSC’s HPC tools • Relying on over 20 years of HPC experience & tools for profiling • Preliminary work: developed the Hadoop Instrumentation Toolkit (CPU, memory, page faults, processes and communication)
  • 40. Project ALOJA: Benchmarking Spark • Open initiative to explore and produce a systematic study of Hadoop/Spark efficiency on different SW and HW • Online repository that allows comparing, side by side, all execution parameters (50,000+ runs over 100+ HW configurations)
  • 41. Big Data Analytics workloads at BSC? (with Spark)
  • 42. Preliminary work • Multimedia Big Data Computing: work with three kinds of data at the same time: social network relationships, audiovisual content, metadata
  • 43. Preliminary case study: Multimodal Data Analytics systems, e.g. Latent User Attribute Inference to Predict Desigual Followers
  • 44. Example of tools created: Vectorization. Necessary for visual similarity search, visual clustering, classification, etc.
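  • 45. Available in our github: bsc.spark.image
        scala> import bsc.spark.image.ImageUtils
        scala> import org.apache.spark.mllib.classification.NaiveBayes
        …
        scala> // Load the images from a sequence file (sc is the shell's SparkContext)
        scala> val images = ImageUtils.seqFile("hdfs://...", sc)
        scala> // Build the bag-of-words dictionary and vectorize every image
        scala> val dictionary = ImageUtils.BoWDictionary(images)
        scala> val vectors = dictionary.getBags(images)
        …
        scala> // 60/40 train/test split, then train a Naive Bayes classifier
        scala> val splits = vectors.randomSplit(Array(0.6, 0.4), seed = 11L)
        scala> val training = splits(0)
        scala> val test = splits(1)
        scala> val model = NaiveBayes.train(training, lambda = 1.0)
        …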
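To make the vectorization step more tangible, here is a minimal plain-Scala sketch of the general bag-of-visual-words idea: each local descriptor of an image is assigned to its nearest dictionary entry, and the image is represented by the resulting histogram. This is not the bsc.spark.image implementation, whose internals are not shown in the slides; the object name and the tiny 2-D descriptors are toy assumptions.

```scala
// Sketch of bag-of-visual-words vectorization in plain Scala.
// NOT the bsc.spark.image implementation; it only illustrates the general
// technique. The 2-D "descriptors" are made-up toy values.
object BagOfVisualWords {
  type Vec = Array[Double]

  // Squared Euclidean distance between two vectors of equal dimension.
  private def sqDist(a: Vec, b: Vec): Double = {
    require(a.length == b.length, "vectors must have the same dimension")
    a.indices.map(i => (a(i) - b(i)) * (a(i) - b(i))).sum
  }

  // Index of the dictionary entry (visual word) closest to the descriptor.
  def nearestWord(dictionary: Seq[Vec], descriptor: Vec): Int =
    dictionary.indices.minBy(i => sqDist(dictionary(i), descriptor))

  // Histogram: how many descriptors of the image fall on each visual word.
  def bag(dictionary: Seq[Vec], descriptors: Seq[Vec]): Array[Int] = {
    val counts = Array.fill(dictionary.length)(0)
    descriptors.foreach(d => counts(nearestWord(dictionary, d)) += 1)
    counts
  }

  def main(args: Array[String]): Unit = {
    val dictionary = Seq(Array(0.0, 0.0), Array(1.0, 1.0), Array(5.0, 5.0))
    val image = Seq(Array(0.1, 0.2), Array(0.9, 1.1), Array(1.2, 0.8), Array(4.7, 5.3))
    println(bag(dictionary, image).mkString(","))  // 1,2,1
  }
}
```

The fixed-length histogram is what makes images comparable and usable as input to a classifier, whatever the number of local features each image has.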
  • 46. Applications: Locality Sensitive Hashing, e.g. near-replica detection (visual spam detection, copyright infringement). Pipeline shown in the figure: feature detection (keypoints KP1 to KP4 on patches 1 to 4) → feature description (SIFT, SURF, ORB, etc.) → features are sketched, embedded into a Hamming space → similar features are hashed into similar buckets in a hash table
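The sketching step can be illustrated with one common LSH family, random hyperplanes (sign random projections). The slide does not say which scheme was used, so this choice is an assumption, as are the object name and the toy feature vectors.

```scala
import scala.util.Random

// Sketch of locality-sensitive hashing with random hyperplanes.
// This LSH family is an assumption; the slide does not name the scheme.
// The feature vectors are toy values, not real SIFT/SURF/ORB descriptors.
object HyperplaneLsh {
  type Vec = Array[Double]

  // Draw `bits` random hyperplanes (Gaussian normal vectors) of size `dim`.
  def hyperplanes(bits: Int, dim: Int, seed: Long): Array[Vec] = {
    val rnd = new Random(seed)
    Array.fill(bits)(Array.fill(dim)(rnd.nextGaussian()))
  }

  // One bit per hyperplane: '1' if the feature lies on its non-negative side.
  def sketch(planes: Array[Vec], feature: Vec): String =
    planes.map { p =>
      val dot = p.indices.map(i => p(i) * feature(i)).sum
      if (dot >= 0) '1' else '0'
    }.mkString

  // Number of differing bits between two sketches of equal length.
  def hamming(a: String, b: String): Int = {
    require(a.length == b.length, "sketches must have the same length")
    a.zip(b).count { case (x, y) => x != y }
  }

  def main(args: Array[String]): Unit = {
    val planes = hyperplanes(bits = 4, dim = 3, seed = 42L)
    val features = Map(
      "kp1" -> Array(1.0, 0.9, 0.1),
      "kp2" -> Array(1.0, 1.0, 0.1),    // near-duplicate of kp1
      "kp3" -> Array(-1.0, -0.9, -0.1)  // points the opposite way from kp1
    )
    // Features with the same sketch land in the same hash-table bucket.
    val buckets = features.groupBy { case (_, v) => sketch(planes, v) }
    buckets.foreach { case (code, fs) =>
      println(s"$code -> ${fs.keys.mkString(",")}")
    }
  }
}
```

With this family, a vector and any positive multiple of it always receive the same sketch, while vectors at a small angle differ in few bits with high probability, which is what allows near-replicas to be found by looking only inside nearby buckets.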
  • 47. Current work: Computer Vision • Makes very productive use of (convolutional) neural networks • SIFT features (used for decades) became unnecessary
  • 48. What next at BSC?
  • 49. BSC vision: Giving computers a greater ability to understand information, and to learn, to reason, and act upon it
  • 50. Artificial Intelligence plays an important role. Old wine in a new bottle? • The term itself dates from the 1950s. • Periods of hype and high expectations have alternated with periods of setback and disappointment.
  • 51. Why Now? 1. Along with the explosion of data … now algorithms can be “trained” by exposing them to large data sets that were previously unavailable. 2. And the computing power necessary to implement these algorithms is now available
  • 52. Evolution of computing power (FLOP/second): 1988: Cray Y-MP (8 processors); 1998: Cray T3E (1024 processors); 2008: Cray XT5 (15000 processors); ~2019: ? (1x10^7 processors)
  • 53. This new type of computing requires: 1. the continuous development of supercomputing systems 2. enabling the convergence of advanced analytic algorithms 3. and big data technologies (figure labels: Data, Supercomputers Research, Big Data Technologies, Advanced Analytic Algorithms)
  • 54. Today's technologies & focus at BSC: Computer Vision; Advanced Analytics Algorithms
  • 55. Cognitive Computing requires a transition of computing facilities into a new paradigm too. Name? … We use Cognitive Computing (timeline in the figure: Yesterday, Today, Tomorrow)
  • 56. And to finish… Welcome to Barcelona!
  • 57. Welcome to our wonderful city
  • 58. Welcome to our university 22 schools - 4K employees - 35K students
  • 59. Welcome to our research center
  • 60. Welcome to our everyday life
  • 61. Welcome to our academic activities • Teaching Spark @ Master courses • Using Spark @ Final Master Thesis • Using Spark @ Research activity • NEW Spark Book in Spanish • Editorial UOC • Foreword by Matei Zaharia • Presentation November 3, 2015
  • 62. Welcome to our Spark Community: 1000+ members
  • 63. Thank you for your attention! Jordi Torres, @JordiTorresBCN, www.JordiTorres.eu