Dr. Stephan Schenk
Dr. Frank Heilmann
Combining Big Data and HPC in a GRIDScaler environment
BASF’s segments
• Chemicals: Petrochemicals, Intermediates
• Materials: Performance Materials, Monomers
• Industrial Solutions: Dispersions & Pigments, Performance Chemicals
• Surface Technologies: Catalysts, Coatings, Construction Chemicals*
• Nutrition & Care: Nutrition & Health, Care Chemicals
• Agricultural Solutions
* We are considering the possibility of merging our construction chemicals business with a strong partner, as well as the option of divesting this business. The outcome of this review is open. The Construction Chemicals division will be reported under the Surface Technologies segment until signing of a transaction agreement.
Integrating digital technologies into BASF’s R&D operations will boost innovative power
Digital Capabilities
• Data and knowledge management
• Algorithms and statistical applications
• Scientific modeling and simulation
• Machine Learning
Research & Development
• Hypothesis
• Experiments
• Analysis
• Validation of models
Supercomputing at BASF
BASF HPC history (chart: peak performance in GFLOPS, 1996 to 2019)
Quriosity specifications
• Quriosity debuted at #65 in June 2017 with Rmax = 1.75 PFLOPS
• HPE Apollo 6000 Gen10, 888 nodes
• 2x Intel® Xeon Gold 6148 ("Skylake")
• 192/384/768/3072 GB RAM
• Intel® Omni-Path interconnect
• DDN GRIDScaler 5 PByte (GPFS)
• Red Hat Enterprise Linux 7
• Altair PBS Pro scheduler
Significant opportunity for BASF to establish leadership in R&D supercomputing
Chart: peak performance on a logarithmic axis from 10^0 to 10^9 GFLOPS. Series: #1 among TOP500 computers; largest computer system in BASF (Quriosity labeled).
Apache Spark on Quriosity and Spectrum Scale: Big-Data workflows to complement HPC
Example I: Image classification
• Train classifier (HPC/AI)
• Use classifier in a Spark job on a huge number of images
• Apache Spark job can use the complete API
• Spark job is scheduled and runs like any other job
• Job uses the existing global filesystem
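The second step of Example I could look roughly like the sketch below. It is not taken from the talk: the `Classifier` type, `classify_partition`, `run_on_spark`, and the toy model are hypothetical names chosen for illustration. The only Spark calls used are `SparkContext.binaryFiles`, `RDD.mapPartitions`, `RDD.map`, and `RDD.saveAsTextFile`. The idea shown is to load the trained model once per partition and then label each image file read from the global filesystem.

```python
from typing import Callable, Iterable, Iterator, Tuple

# Hypothetical type: a classifier maps raw image bytes to a label.
Classifier = Callable[[bytes], str]


def classify_partition(
    records: Iterable[Tuple[str, bytes]],
    load_model: Callable[[], Classifier],
) -> Iterator[Tuple[str, str]]:
    """Label every (path, image_bytes) record of one partition."""
    model = load_model()  # load once per partition, not once per image
    for path, image_bytes in records:
        yield path, model(image_bytes)


def run_on_spark(sc, image_glob: str, load_model, out_dir: str) -> None:
    """Wire the function into Spark; `sc` is an existing SparkContext."""
    images = sc.binaryFiles(image_glob)  # RDD of (path, bytes)
    labels = images.mapPartitions(
        lambda part: classify_partition(part, load_model)
    )
    labels.map(lambda kv: f"{kv[0]}\t{kv[1]}").saveAsTextFile(out_dir)


if __name__ == "__main__":
    # Local dry run without Spark, using a toy stand-in for a trained model.
    def load_toy_model() -> Classifier:
        return lambda data: "large" if len(data) > 4 else "small"

    sample = [("/gpfs/images/a.png", b"12345678"), ("/gpfs/images/b.png", b"12")]
    for path, label in classify_partition(sample, load_toy_model):
        print(path, label)
```

Keeping the per-partition logic in a plain function means it can be checked on a laptop before the job is submitted to the cluster.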
Example II: Full-text indexing and text mining
• Machine learning, e.g. document clustering
• Full-text indexing
Deploying Apache Spark on an HPC system
• Deploy Spark in standalone mode (untar)
• Spin up Spark cluster at beginning of HPC job
• Integration with PBS by setting appropriate environment variables
• Spark job has complete API available (Python, Scala, libraries)
• Files can be accessed directly:
    rdd = sc.textFile("/gpfs/big_data")
    rdd.saveAsTextFile("/gpfs/results")
• Multi-node jobs require a global filesystem of your choice
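As an illustration of direct file access, a `script.py` for such a job might look like the sketch below. It is an assumption, not the script used in the talk: the word-count task, the `tokenize` helper, the application name, and passing the two paths as command-line arguments are all invented for the example. Note that `saveAsTextFile` is a method of the RDD, not of the SparkContext.

```python
import re
import sys
from typing import List


def tokenize(line: str) -> List[str]:
    """Lower-case a line and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", line.lower())


def main(in_path: str, out_path: str) -> None:
    # Imported here so the helper above can be tested without Spark installed.
    from pyspark import SparkContext

    # The master URL is supplied by spark-submit via --master.
    sc = SparkContext(appName="spark-on-hpc-example")
    try:
        counts = (
            sc.textFile(in_path)  # read directly from the global filesystem
            .flatMap(tokenize)
            .map(lambda word: (word, 1))
            .reduceByKey(lambda a, b: a + b)
        )
        # RDD method; the output directory must not exist yet.
        counts.saveAsTextFile(out_path)
    finally:
        sc.stop()


if __name__ == "__main__" and len(sys.argv) == 3:
    main(sys.argv[1], sys.argv[2])
```

With this layout the job would be started as `spark-submit --master <url> script.py /gpfs/big_data /gpfs/results`; every executor reads and writes the same paths because the filesystem is mounted on all nodes.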
#!/bin/bash
#PBS -l select=2:ncpus=40:mem=160GB
#PBS -l place=scatter:excl
#PBS -N spark-on-hpc
module load spark
# Spawn the Spark cluster: master on this node, workers on all nodes of the job
export SPARK_MASTER_HOST="$(hostname -f)"
export SPARK_MASTER_PORT="7077"
export SPARK_SLAVES="${PBS_NODEFILE}"
"${SPARK_HOME}/sbin/start-all.sh"
sparkmaster="spark://${SPARK_MASTER_HOST}:${SPARK_MASTER_PORT}"
# Run the Spark script
"${SPARK_HOME}/bin/spark-submit" --master "${sparkmaster}" script.py
# Tear down the Spark cluster
"${SPARK_HOME}/sbin/stop-all.sh" --wait
• Inspired by https://github.com/glennklockwood/hpchadoop
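The PBS integration rests on two pieces of information: the list of nodes in `PBS_NODEFILE` and the master URL. The sketch below shows how they could be derived in Python, for example to log the cluster layout from inside a job. The helper names `unique_hosts` and `master_url` are hypothetical, and treating the first node-file entry as the master is an assumption. Depending on the resource request, the node file may list a host more than once, so the hosts are de-duplicated.

```python
import os
from typing import List


def unique_hosts(nodefile_text: str) -> List[str]:
    """Return hosts from a PBS node file, de-duplicated, in first-seen order."""
    seen = {}
    for line in nodefile_text.splitlines():
        host = line.strip()
        if host:
            seen.setdefault(host, None)
    return list(seen)


def master_url(host: str, port: int = 7077) -> str:
    """Build the standalone master URL that spark-submit expects."""
    return f"spark://{host}:{port}"


if __name__ == "__main__":
    path = os.environ.get("PBS_NODEFILE")
    if path:
        with open(path) as fh:
            hosts = unique_hosts(fh.read())
        if hosts:
            print("workers:", hosts)
            # Assumption: the first entry is the node running the job script.
            print("master: ", master_url(hosts[0]))
```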
Experimenting with HDFS Transparency in Spectrum Scale
• HDFS Transparency integrated with Hortonworks HDP
Architecture (diagram):
• Hadoop applications: Spark, MapReduce, Hive, HBase, ...
• ViewFS
• Namespace hdfs://quriosity-hdfs:8020: block management using Spectrum Scale HDFS NameNode; Spectrum Scale DataNode1, Spectrum Scale DataNode2
• Namespace hdfs://native-hdfs:8020: block management using native HDFS NameNode; Native HDFS DataNode1, Native HDFS DataNode2, Native HDFS DataNode3
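To make the role of ViewFS concrete, the sketch below builds the client-side properties of a ViewFS mount table that maps two paths onto the two namespaces from the diagram. The property keys `fs.defaultFS` and `fs.viewfs.mounttable.<name>.link.<path>` are the standard Hadoop ones. The mount table name `cluster` and the mount points `/gpfs` and `/native` are hypothetical, since the slides do not show the configuration that was actually used.

```python
from typing import Dict


def viewfs_mount_table(name: str, links: Dict[str, str]) -> Dict[str, str]:
    """Build Hadoop client properties for a ViewFS mount table."""
    props = {"fs.defaultFS": f"viewfs://{name}"}
    for mount_point, target in links.items():
        props[f"fs.viewfs.mounttable.{name}.link.{mount_point}"] = target
    return props


if __name__ == "__main__":
    props = viewfs_mount_table(
        "cluster",  # hypothetical mount table name
        {
            "/gpfs": "hdfs://quriosity-hdfs:8020/",  # hypothetical mount point
            "/native": "hdfs://native-hdfs:8020/",   # hypothetical mount point
        },
    )
    for key, value in props.items():
        print(f"{key} = {value}")
```

With such a table in `core-site.xml`, an application would address `viewfs://cluster/gpfs/...` or `viewfs://cluster/native/...` without needing to know which NameNode serves the data.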
Benchmarking HDFS Transparency on Quriosity
• Benchmark TestDFSIO executed on a single compute node
• Consistent performance across all test data sizes
• I/O rate essentially limited by 10G network used for communication
Avg I/O Rate TestDFSIO (I/O rate in MB/s by size of test files)
Size of test files    10GB     20GB     30GB     40GB     50GB
Avg I/O Rate Write    854.38   861.69   860.52   862      866.59
Avg I/O Rate Read     906.7    904.39   890.99   876.82   892.98
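The two claims on this slide, consistent performance and a network-bound I/O rate, can be checked with a few lines of arithmetic on the measured values. The sketch below assumes that "10G" means a 10 Gbit/s link (1250 MB/s before protocol overhead) and treats the reported MB/s as decimal megabytes. Both are assumptions, and the helper names are invented for the example.

```python
from statistics import mean

# Average I/O rates in MB/s from the benchmark (10 GB to 50 GB test files).
WRITE = [854.38, 861.69, 860.52, 862.0, 866.59]
READ = [906.7, 904.39, 890.99, 876.82, 892.98]

# Assumption: "10G" means 10 Gbit/s, i.e. 1250 MB/s before protocol overhead.
LINK_MB_PER_S = 10e9 / 8 / 1e6


def link_utilisation(rates):
    """Mean rate as a fraction of the theoretical link bandwidth."""
    return mean(rates) / LINK_MB_PER_S


def spread(rates):
    """Relative spread (max - min) / mean, a rough consistency measure."""
    return (max(rates) - min(rates)) / mean(rates)


if __name__ == "__main__":
    for name, rates in (("write", WRITE), ("read", READ)):
        print(
            f"{name}: mean {mean(rates):.1f} MB/s, "
            f"{link_utilisation(rates):.0%} of link, spread {spread(rates):.1%}"
        )
```

Under these assumptions the means come out near 861 MB/s (write) and 894 MB/s (read), roughly 70% of the nominal link rate, and the rates vary by only a few percent across file sizes.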
Further information: https://www.basf.com/supercomputer
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 

Combining Big Data and HPC in a GRIDScaler Environment

  • 1. Dr. Stephan Schenk, Dr. Frank Heilmann: Combining Big Data and HPC in a GRIDScaler environment
  • 2. BASF’s segments
      • Chemicals: Petrochemicals, Intermediates
      • Materials: Performance Materials, Monomers
      • Industrial Solutions: Dispersions & Pigments, Performance Chemicals
      • Surface Technologies: Catalysts, Coatings, Construction Chemicals*
      • Nutrition & Care: Nutrition & Health, Care Chemicals
      • Agricultural Solutions
      * We are considering the possibility of merging our construction chemicals business with a strong partner, as well as the option of divesting this business. The outcome of this review is open. The Construction Chemicals division will be reported under the Surface Technologies segment until signing of a transaction agreement.
  • 3. Integrating digital technologies into BASF’s R&D operations will boost innovative power
      • Digital Capabilities: Data and knowledge management; Algorithms and statistical applications; Scientific modeling and simulation; Machine Learning
      • Research & Development: Hypothesis; Experiments; Analysis; Validation of models
  • 4. Supercomputing at BASF
      Significant opportunity for BASF to establish leadership in R&D supercomputing
      [Chart: BASF HPC history, 1996 to 2019. Axis: peak performance (GFLOPS), from 10^0 to 10^9. Series: #1 among TOP500 computers; largest computer system in BASF. Quriosity is marked.]
      Quriosity Specifications
      • Quriosity debuted at #65 in June 2017 with Rmax = 1.75 PFLOPS
      • HPE Apollo 6000 Gen10, 888 nodes
      • 2x Intel® Xeon Gold 6148 ("Skylake")
      • 192/384/768/3072 GB RAM
      • Intel® Omnipath interconnect
      • DDN GRIDScaler 5 PByte (GPFS)
      • Red Hat Enterprise Linux 7
      • Altair PBSPro scheduler
  • 5. Apache Spark on Quriosity and Spectrum Scale: Big-Data workflows to complement HPC
      • Example I: Image classification. Train classifier (HPC/AI), then use the classifier in a Spark job on a huge number of images.
      • Example II: Full-text indexing and text mining. Full-text indexing; machine learning, e.g. document clustering.
      • Apache Spark job can use complete API
      • Spark job is scheduled and runs like any other job
      • Job uses existing global filesystem
  • 6. Deploying Apache Spark on an HPC system
      • Deploy Spark in standalone mode (untar)
      • Spin up Spark cluster at beginning of HPC job
      • Integration with PBS by setting appropriate environment variables
      • Spark job has complete API available (Python, Scala, Libraries)
      • Files can be accessed directly:
          rdd = sc.textFile("/gpfs/big_data")
          rdd.saveAsTextFile("/gpfs/results")
      • Multi-node jobs require global filesystem of your choice
      • Inspired by https://github.com/glennklockwood/hpchadoop
      Job script:
          #!/bin/bash
          #PBS -l select=2:ncpus=40:mem=160GB
          #PBS -l place=scatter:excl
          #PBS -N spark-on-hpc
          module load spark

          # Spawn the Spark cluster
          export SPARK_MASTER_HOST="$(hostname -f)"
          export SPARK_MASTER_PORT="7077"
          export SPARK_SLAVES="${PBS_NODEFILE}"
          ${SPARK_HOME}/sbin/start-all.sh
          sparkmaster="spark://${SPARK_MASTER_HOST}:${SPARK_MASTER_PORT}"

          # Run the Spark script
          ${SPARK_HOME}/bin/spark-submit --master "${sparkmaster}" script.py

          # Teardown the Spark cluster
          ${SPARK_HOME}/sbin/stop-all.sh --wait
  • 7. Experimenting with HDFS Transparency in Spectrum Scale
      • HDFS Transparency integrated with Hortonworks HDP
      [Diagram labels: Hadoop Applications (Spark, MapReduce, Hive, HBase, ...); ViewFS; namespace hdfs://quriosity-hdfs:8020 with block management using Spectrum Scale (HDFS NameNode, Spectrum Scale DataNode1, Spectrum Scale DataNode2); namespace hdfs://native-hdfs:8020 with block management using native HDFS (HDFS NameNode, Native HDFS DataNode1, Native HDFS DataNode2, Native HDFS DataNode3).]
  • 8. Benchmarking HDFS Transparency on Quriosity
      • Benchmark TestDFSIO executed on a single compute node
      • Consistent performance across all test data sizes
      • I/O rate essentially limited by 10G network used for communication
      Avg I/O rate, TestDFSIO (I/O rate in MB/s):
      Size of test files     10GB     20GB     30GB     40GB     50GB
      Avg I/O Rate Write     854.38   861.69   860.52   862      866.59
      Avg I/O Rate Read      906.7    904.39   890.99   876.82   892.98
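
The setup stage of the job script on slide 6 (collecting the worker hosts from the PBS node file and building the master URL) can be sketched in Python. This is a minimal sketch under stated assumptions, not the presenters' code: the helper names `worker_hosts` and `spark_master_url` and the sample host names are hypothetical, and the node file is assumed to hold one host name per line, possibly with repeats.

```python
"""Sketch: derive Spark standalone settings from a PBS node file.

Assumptions (not from the slides): the node file holds one host name per
line, possibly repeated; helper names and sample hosts are hypothetical.
"""


def worker_hosts(nodefile_text: str) -> list[str]:
    """Return the unique hosts from a node file, in first-seen order."""
    hosts: list[str] = []
    for line in nodefile_text.splitlines():
        host = line.strip()
        if host and host not in hosts:
            hosts.append(host)
    return hosts


def spark_master_url(master_host: str, port: int = 7077) -> str:
    """Build the spark:// URL that is passed to spark-submit --master."""
    return f"spark://{master_host}:{port}"


if __name__ == "__main__":
    # Hypothetical node file content for a two-node job.
    sample = "node001.example\nnode001.example\nnode002.example\n"
    hosts = worker_hosts(sample)
    print(hosts)
    print(spark_master_url(hosts[0]))
```

The default port 7077 mirrors the SPARK_MASTER_PORT value in the job script; everything else here is illustrative.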

Editor's Notes

  1. Inspired by a talk by Prof. Joel Zysman (Director, HPC, University of Miami) at the DDN User Group Meeting in 2017.
  2. As of January 1, 2019, we have grouped our twelve divisions into six segments:

     The Chemicals segment will remain the cornerstone of our Verbund structure. It supplies the other segments with basic chemicals and intermediates, contributing to the organic growth of our key value chains. Alongside internal accounts, our customers include the chemical and plastics industries. We aim to increase our competitiveness through technological leadership and operational excellence.

     The Materials segment’s portfolio comprises advanced materials and their precursors for new applications and systems. These include isocyanates and polyamides as well as inorganic basic products and specialties for the plastics and plastics processing industries. We aim to grow organically through differentiation via specific technological expertise, industry know-how and customer proximity to maximize value in the isocyanate and polyamide value chains.

     The Industrial Solutions segment develops and markets ingredients and additives for industrial applications such as polymer dispersions, pigments, resins, electronic materials, antioxidants and admixtures. We aim to drive organic growth in key industries such as automotive, plastics or electronics and expand our position in value-enhancing ingredients and solutions by leveraging our comprehensive industry expertise and application know-how.

     The Surface Technologies segment comprises our businesses that offer chemical solutions on and for surfaces. Its portfolio includes coatings, rust protection products, catalysts and battery materials for the automotive and chemical industries. The aim is to drive organic growth by leveraging our portfolio of technologies and know-how, and to establish BASF as a leading and innovative provider of battery materials as well.

     In the Nutrition & Care segment, we strive to expand our position as a leading provider of nutrition and care ingredients for consumer products in the area of nutrition, home and personal care. Customers include food and feed producers as well as the pharmaceutical, cosmetics, detergent and cleaner industries. We aim to enhance and broaden our product and technology portfolio. Our goal is to drive organic growth by focusing on emerging markets, new business models and sustainability trends in consumer markets, supported by targeted acquisitions.

     The Agricultural Solutions segment aims to further strengthen our market position as an integrated provider of crop protection products and seeds. Its portfolio comprises fungicides, herbicides, insecticides and biological crop protection products, as well as seeds and seed treatment products. We also offer farmers digital solutions combined with practical advice. Our main focus is on innovation-driven organic growth, targeted portfolio expansion as well as leveraging synergies from the acquired businesses.

     Source: BASF Report 2018, page 19
  3. Benchmark run on one compute node only; I/O bandwidth is limited by the 10G network.
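
To put the 10G remark in numbers, the TestDFSIO averages from slide 8 can be compared with the raw rate of a 10 Gbit/s link. This is a rough sketch: it assumes the link's raw rate is 10 Gbit/s = 1250 MB/s and ignores protocol overhead and any MB versus MiB difference in how the benchmark reports its figures, none of which the slides specify.

```python
"""Sketch: compare the TestDFSIO averages from slide 8 with 10 Gbit/s.

Assumption (not from the slides): the link's raw rate is taken as
10 Gbit/s = 1250 MB/s, ignoring protocol overhead and MB/MiB differences.
"""
from statistics import mean

# Average I/O rates in MB/s for 10, 20, 30, 40 and 50 GB test files.
WRITE_MBPS = [854.38, 861.69, 860.52, 862.0, 866.59]
READ_MBPS = [906.7, 904.39, 890.99, 876.82, 892.98]

LINK_MBPS = 10_000 / 8  # raw 10 Gbit/s expressed in MB/s


def link_fraction(rates: list[float]) -> float:
    """Mean rate as a fraction of the raw link rate."""
    return mean(rates) / LINK_MBPS


if __name__ == "__main__":
    for name, rates in (("write", WRITE_MBPS), ("read", READ_MBPS)):
        print(f"{name}: mean {mean(rates):.1f} MB/s, "
              f"{link_fraction(rates):.0%} of raw line rate")
```

Under these assumptions the measured averages come to roughly 70% of the raw line rate; how much of the remaining gap is protocol overhead is not stated in the slides.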