Dr. Stephan Schenk and Dr. Frank Heilmann discuss combining big data and high-performance computing (HPC) using GRIDScaler at BASF. BASF aims to boost its innovative power by integrating digital technologies like machine learning, algorithms, and scientific modeling into its research and development operations. BASF has a history of supercomputing and currently uses its Quriosity supercomputer along with DDN GRIDScaler and IBM Spectrum Scale to run Apache Spark jobs alongside HPC workloads, enabling new big data workflows.
Combining Big Data and HPC in a GRIDScaler Environment
1. Dr. Stephan Schenk
Dr. Frank Heilmann
Combining Big Data and HPC in a GRIDScaler environment
2. BASF’s segments
Chemicals: Petrochemicals, Intermediates
Materials: Performance Materials, Monomers
Industrial Solutions: Dispersions & Pigments, Performance Chemicals
Surface Technologies: Catalysts, Coatings, Construction Chemicals*
Nutrition & Care: Nutrition & Health, Care Chemicals
Agricultural Solutions
* We are considering the possibility of merging our construction chemicals business with a strong partner, as well as the option of divesting this business. The outcome of this review is open. The Construction Chemicals division will be reported under the Surface Technologies segment until signing of a transaction agreement.
3. Integrating digital technologies into BASF’s R&D operations will boost innovative power
Digital Capabilities
Data and knowledge management
Algorithms and statistical applications
Scientific modeling and simulation
Machine Learning
Research & Development
Hypothesis
Experiments
Analysis
Validation of models
4. Supercomputing at BASF
BASF HPC history
[Figure: peak performance (GFLOPS, logarithmic axis from 10^0 to 10^9) by year, 1996 to 2019, for the #1 among TOP500 computers and the largest computer system in BASF, with Quriosity marked]
Quriosity Specifications
Quriosity debuted at #65 in June 2017 with Rmax = 1.75 PFLOPS
HPE Apollo 6000 Gen10, 888 nodes
2x Intel® Xeon Gold 6148 ("Skylake")
192/384/768/3072 GB RAM
Intel® Omni-Path interconnect
DDN GRIDScaler 5 PByte (GPFS)
Red Hat Enterprise Linux 7
Altair PBSPro scheduler
Significant opportunity for BASF to establish leadership in R&D supercomputing
5. Apache Spark on Quriosity and Spectrum Scale: Big-Data workflows to complement HPC
Example I: Image classification
Train classifier (HPC/AI)
Use classifier in a Spark job on a huge number of images
Apache Spark job can use complete API
Spark job is scheduled and runs like any other job
Job uses existing global filesystem
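Example I could be sketched in PySpark roughly as follows. This is a minimal illustration, not the workflow actually used at BASF: `classify_bytes` is a hypothetical stand-in for a classifier trained on the HPC side, and `classify_images` is an assumed helper name. Only `binaryFiles` and `mapValues` are real Spark calls.

```python
def classify_bytes(image_bytes):
    """Hypothetical placeholder for a trained classifier.

    A real job would load the model trained in the HPC/AI step and run
    inference here; this stand-in only thresholds the mean byte value.
    """
    if not image_bytes:
        return "empty"
    mean = sum(image_bytes) / len(image_bytes)
    return "bright" if mean >= 128 else "dark"


def classify_images(sc, image_glob):
    """Return an RDD of (file path, label) pairs for all matching files.

    sc is an existing SparkContext. binaryFiles yields one
    (path, file content as bytes) pair per file, read from the global
    filesystem, so every worker sees the same paths.
    """
    return sc.binaryFiles(image_glob).mapValues(classify_bytes)
```

Inside the submitted script, the result of `classify_images(sc, ...)` could then be written back to the global filesystem with `saveAsTextFile`.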
Example II: Full-text indexing and text mining
Machine learning, e.g. document clustering
Full-text indexing
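For Example II, a full-text index can be approximated by an inverted index that maps each term to the documents containing it. The sketch below assumes plain-text files and a deliberately simple tokenizer. `tokenize`, `term_postings` and `build_index` are illustrative names, not part of the original slides.

```python
import re


def tokenize(text):
    """Lower-case the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def term_postings(doc):
    """Turn one (path, text) pair into unique (term, path) pairs."""
    path, text = doc
    return [(term, path) for term in sorted(set(tokenize(text)))]


def build_index(sc, text_glob):
    """Return an RDD of (term, sorted list of paths) pairs.

    sc is an existing SparkContext. wholeTextFiles yields one
    (path, content) pair per file, which suits many small documents.
    """
    return (
        sc.wholeTextFiles(text_glob)
        .flatMap(term_postings)
        .groupByKey()
        .mapValues(sorted)
    )
```

The same term vectors could feed a clustering step, for example with Spark's machine learning library, but that part is left out here.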
6. Deploying Apache Spark on an HPC system
Deploy Spark in standalone mode (untar)
Spin-up Spark cluster at beginning of HPC job
Integration with PBS by setting appropriate environment variables
Spark job has complete API available (Python, Scala, Libraries)
Files can be accessed directly
rdd = sc.textFile("/gpfs/big_data")
rdd.saveAsTextFile("/gpfs/results")
Multi-node jobs require global filesystem of your choice
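A job script submitted this way might look like the following word count, which reads from and writes to the global filesystem directly. It is a sketch under assumptions: `split_words` and `run` are hypothetical names, and the two paths are simply the ones shown on the slide.

```python
from operator import add


def split_words(line):
    """Split one input line into lower-case, whitespace-separated words."""
    return line.lower().split()


def run(sc, source="/gpfs/big_data", target="/gpfs/results"):
    """Count words in all files under source and write the counts to target.

    sc is an existing SparkContext. Both paths live on the global
    filesystem, so every worker node can read and write them.
    """
    lines = sc.textFile(source)
    counts = lines.flatMap(split_words).map(lambda w: (w, 1)).reduceByKey(add)
    # saveAsTextFile is an RDD method and fails if target already exists.
    counts.saveAsTextFile(target)
```

In the script itself, `run` would be called with a context obtained from `SparkContext.getOrCreate()`, which picks up the master URL passed by `spark-submit`.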
#!/bin/bash
# Request two exclusive nodes with 40 cores and 160 GB each
#PBS -l select=2:ncpus=40:mem=160GB
#PBS -l place=scatter:excl
#PBS -N spark-on-hpc
module load spark
# Spawn the Spark cluster: master on this node, workers from the PBS node file
export SPARK_MASTER_HOST="$(hostname -f)"
export SPARK_MASTER_PORT="7077"
export SPARK_SLAVES="${PBS_NODEFILE}"
"${SPARK_HOME}/sbin/start-all.sh"
sparkmaster="spark://${SPARK_MASTER_HOST}:${SPARK_MASTER_PORT}"
# Run the Spark script
"${SPARK_HOME}/bin/spark-submit" --master "${sparkmaster}" script.py
# Teardown the Spark cluster
"${SPARK_HOME}/sbin/stop-all.sh" --wait
Inspired by https://github.com/glennklockwood/hpchadoop
Inspired by a talk by Prof. Joel Zysman, Director, HPC, University of Miami, at the DDN User Group Meeting in 2017
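Depending on how chunks and processes are requested, a PBS node file may list the same host more than once, while a Spark worker list needs each host only once. The helper below is a small sketch of that clean-up step and of building the master URL. `unique_hosts` and `master_url` are hypothetical names, and the default port is the 7077 used in the job script.

```python
def unique_hosts(nodefile_text):
    """Return the hosts from a node file, de-duplicated, in first-seen order."""
    hosts = []
    for line in nodefile_text.splitlines():
        host = line.strip()
        if host and host not in hosts:
            hosts.append(host)
    return hosts


def master_url(host, port=7077):
    """Build the URL that spark-submit expects for a standalone master."""
    return f"spark://{host}:{port}"
```

The de-duplicated list could be written to a temporary file and that file's path exported in place of the raw node file.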
As of January 1, 2019, we have grouped our twelve divisions into six segments:
The Chemicals segment will remain the cornerstone of our Verbund structure. It supplies the other segments with basic chemicals and intermediates, contributing to the organic growth of our key value chains. Alongside internal accounts, our customers include the chemical and plastics industries. We aim to increase our competitiveness through technological leadership and operational excellence.
The Materials segment’s portfolio comprises advanced materials and their precursors for new applications and systems. These include isocyanates and polyamides as well as inorganic basic products and specialties for the plastics and plastics processing industries. We aim to grow organically through differentiation via specific technological expertise, industry know-how and customer proximity to maximize value in the isocyanate and polyamide value chains.
The Industrial Solutions segment develops and markets ingredients and additives for industrial applications such as polymer dispersions, pigments, resins, electronic materials, antioxidants and admixtures. We aim to drive organic growth in key industries such as automotive, plastics or electronics and expand our position in value-enhancing ingredients and solutions by leveraging our comprehensive industry expertise and application know-how.
The Surface Technologies segment comprises our businesses that offer chemical solutions on and for surfaces. Its portfolio includes coatings, rust protection products, catalysts and battery materials for the automotive and chemical industries. The aim is to drive organic growth by leveraging our portfolio of technologies and know-how, and to establish BASF as a leading and innovative provider of battery materials as well.
In the Nutrition & Care segment, we strive to expand our position as a leading provider of nutrition and care ingredients for consumer products in the area of nutrition, home and personal care. Customers include food and feed producers as well as the pharmaceutical, cosmetics, detergent and cleaner industries. We aim to enhance and broaden our product and technology portfolio. Our goal is to drive organic growth by focusing on emerging markets, new business models and sustainability trends in consumer markets, supported by targeted acquisitions.
The Agricultural Solutions segment aims to further strengthen our market position as an integrated provider of crop protection products and seeds. Its portfolio comprises fungicides, herbicides, insecticides and biological crop protection products, as well as seeds and seed treatment products. We also offer farmers digital solutions combined with practical advice. Our main focus is on innovation-driven organic growth, targeted portfolio expansion as well as leveraging synergies from the acquired businesses.
Source: BASF Report 2018, page 19
Benchmark with one compute node only
I/O bandwidth is limited by 10G network
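To put the 10G limit in perspective, the best-case transfer time for a given data volume can be estimated as below. This is a back-of-the-envelope sketch that ignores protocol overhead and filesystem effects. `transfer_seconds` is an illustrative name, and 10 Gbit/s is taken as the nominal line rate.

```python
def transfer_seconds(num_bytes, link_gbit_per_s=10.0):
    """Lower bound on the time to move num_bytes over one network link."""
    bytes_per_second = link_gbit_per_s * 1e9 / 8  # 10 Gbit/s is 1.25 GB/s
    return num_bytes / bytes_per_second
```

At this rate, a single node needs at least 800 seconds to read 1 TB, which is why a single-node benchmark says little about the aggregate bandwidth of the filesystem.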