C.W. Smith, S. Tran, O. Sahni, and M.S. Shephard,
Rensselaer Polytechnic Institute
Raminder Singh
Indiana University
ramifnu@iu.edu
Enabling HPC Simulation Workflows
for Complex Industrial Flow
Complex Flow Simulations
[Workflow diagram. Labels, Parallel Data & Services: Domain Topology, Mesh Topology/Shape, Dynamic Load Balancing, Simulation Fields, Partition Control. Labels, components: Input Domain Definition with Attributes; Physics and Model Parameters; non-manifold model construction (Parasolid or GeomSim); MeshSim and MeshSim Adapt; PHASTA (NS, FE, Level set); Solution Transfer; Hessian-based error indicator; ParaView. Labels, data exchanged: attributed topology, geometric interrogation, geometry updates, meshing operation, mesh, mesh size field, mesh with fields, meshes and fields, calculated fields, solution transfer constraints.]
Project challenges
High barrier to run HPC workflows
– Requires knowledge, for each HPC system, of the:
  – file system
  – scheduler
  – scripting
  – runtime environment
  – compilers …
Other challenges
– Must have a very high degree of automation: a human in the loop kills scalability and performance
– Need easy access to parallel computers
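
To make the per-system barrier concrete, the following is a minimal Python sketch of the kind of knowledge a user would otherwise need for each machine. Everything in it is an assumption for illustration: the `SystemProfile` class, the `cluster-a` and `cluster-b` profiles, the scratch paths, and the launcher names are hypothetical placeholders and do not describe the systems named in this talk. Only the `#SBATCH` and `#PBS` directive forms follow common scheduler conventions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SystemProfile:
    """Hypothetical description of one HPC system's conventions."""
    name: str
    directive: str    # prefix the scheduler expects on directive lines
    name_flag: str    # directive text that sets the job name
    nodes_flag: str   # directive text that requests nodes
    time_flag: str    # directive text that sets the wall-clock limit
    launcher: str     # command that starts the parallel executable
    scratch_dir: str  # where run data should live on this system


# Both profiles are illustrative placeholders, not real site settings.
SLURM_LIKE = SystemProfile(
    name="cluster-a", directive="#SBATCH",
    name_flag="--job-name={job}", nodes_flag="--nodes={nodes}",
    time_flag="--time={walltime}", launcher="srun",
    scratch_dir="/scratch/{user}",
)
PBS_LIKE = SystemProfile(
    name="cluster-b", directive="#PBS",
    name_flag="-N {job}", nodes_flag="-l nodes={nodes}",
    time_flag="-l walltime={walltime}", launcher="mpiexec",
    scratch_dir="/gpfs/work/{user}",
)


def render_job_script(profile, user, job, nodes, walltime, executable):
    """Build a batch script for one system from system-independent inputs."""
    values = {"user": user, "job": job, "nodes": nodes, "walltime": walltime}
    lines = ["#!/bin/bash"]
    for flag in (profile.name_flag, profile.nodes_flag, profile.time_flag):
        lines.append(f"{profile.directive} {flag.format(**values)}")
    lines.append(f"cd {profile.scratch_dir.format(**values)}/{job}")
    lines.append(f"{profile.launcher} {executable}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # The same request produces a different script on each system.
    for profile in (SLURM_LIKE, PBS_LIKE):
        print(f"--- {profile.name} ---")
        print(render_job_script(profile, "alice", "demo", 4, "01:00:00", "./solver"))
```

The point of the sketch is that the inputs to `render_job_script` are the same for both systems, while the directives, launcher, and file system layout are not. That system-specific part is what a gateway can hide from the user.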
Science gateway for PHASTA lowers the barrier
• User specifies the following through the experiment creation web page:
  – problem definition
  – simulation parameters
  – required compute resources
• Workflow steps are executed on the HPC system
• User is emailed
• Output is prepared for download, with the option to delete or archive
• Scales to multiple users and systems
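
The gateway flow on this slide can be sketched as a small data model. This is a minimal illustration under stated assumptions, not the gateway's or Airavata's actual API: `ExperimentRequest`, `State`, `run_workflow`, and `finish` are hypothetical names, and the email is represented as a string appended to a list rather than actually sent.

```python
from dataclasses import dataclass, field
from enum import Enum


class State(Enum):
    CREATED = "created"
    EXECUTING = "executing"
    COMPLETED = "completed"
    ARCHIVED = "archived"
    DELETED = "deleted"


@dataclass
class ExperimentRequest:
    """What the user enters on the experiment creation page (assumed fields)."""
    problem_definition: str
    simulation_parameters: dict
    compute_resource: str
    node_count: int
    user_email: str
    state: State = State.CREATED
    notifications: list = field(default_factory=list)


def run_workflow(request, steps):
    """Run each workflow step in order, then notify the user."""
    request.state = State.EXECUTING
    for step in steps:
        step(request)  # each step is a callable standing in for an HPC task
    request.state = State.COMPLETED
    request.notifications.append(
        f"To {request.user_email}: output is ready for download"
    )
    return request


def finish(request, archive):
    """Apply the user's choice to archive or delete the prepared output."""
    if request.state is not State.COMPLETED:
        raise ValueError("output is only available after the workflow completes")
    request.state = State.ARCHIVED if archive else State.DELETED
    return request
```

A usage example would create an `ExperimentRequest`, call `run_workflow` with a list of step functions, and then call `finish(request, archive=True)`. Because the user supplies only the request, no human action is needed between steps.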
PHASTA Solution
• Used the PHP Gateway framework with Airavata to develop the gateway and enable the PHASTA application
• Set up a community account to support the community
• Defined resources to run the application
  – TACC Stampede
  – CCI IBM Blue Gene
• Defined the PHASTA application
What is PGA?
• PGA is the sample gateway implemented to demonstrate Airavata middleware features.
• You can download and use it as is, or modify it according to your requirements.
• An Ansible script is available, along with a Docker image worked on by a GSoC student.
• PGA is developed in PHP.
• Visit PGA at:
– https://testdrive.airavata.org/
Landing page
User login and creation
Resource definition
Application definition
Create Experiment
Monitor Experiment
Experiment Statistics
Other Examples using PGA
Gateway Features for Default User
• In the gateway, a default user can:
– Create and launch experiments.
– Monitor experiments.
– Create projects (experiment grouping).
– Clone, cancel, and edit experiments.
– Report issues and provide feedback.
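
The default-user actions listed above can be sketched as operations on a project that groups experiments. This is a hypothetical model for illustration only: the class names, the status strings, and the rules about when an experiment may be edited or canceled are assumptions, not PGA's documented behavior.

```python
import copy
import itertools
from dataclasses import dataclass, field

_ids = itertools.count(1)  # simple in-memory id source for the sketch


@dataclass
class Experiment:
    name: str
    inputs: dict
    status: str = "CREATED"  # assumed states: CREATED, LAUNCHED, CANCELED
    exp_id: int = field(default_factory=lambda: next(_ids))


@dataclass
class Project:
    """A named grouping of experiments."""
    name: str
    experiments: list = field(default_factory=list)

    def create(self, name, inputs):
        exp = Experiment(name, dict(inputs))
        self.experiments.append(exp)
        return exp

    def clone(self, exp):
        # A clone copies the inputs but starts over as a new experiment.
        duplicate = Experiment(f"{exp.name} (clone)", copy.deepcopy(exp.inputs))
        self.experiments.append(duplicate)
        return duplicate

    def monitor(self):
        return {exp.exp_id: exp.status for exp in self.experiments}


def launch(exp):
    if exp.status != "CREATED":
        raise ValueError("only a newly created experiment can be launched")
    exp.status = "LAUNCHED"


def edit(exp, **changes):
    # Assumption: inputs are frozen once the experiment has been launched.
    if exp.status != "CREATED":
        raise ValueError("only an unlaunched experiment can be edited")
    exp.inputs.update(changes)


def cancel(exp):
    if exp.status != "LAUNCHED":
        raise ValueError("only a launched experiment can be canceled")
    exp.status = "CANCELED"
```

Cloning is useful for parameter studies in this model: a launched experiment cannot be edited, but its clone starts in the `CREATED` state and can be changed before launch.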
Apache Airavata
Future work
• Address user requests
• Allow staging data from the user's desktop to the resource and vice versa
• Tail remote application logs
• User key generation and CCI user accounts
Industry Challenge Talk Wednesday @ 4
Workflow Diagram for SEQC Transcriptome Assembly and Evaluation
Pre-processing, Input: Sequencing Reads FASTQ Files
• Adapter Trimming (cutadapt software)
• Poly A/T Trimming, and Removing mtRNA, rRNA (custom script)
• Error Correction for RNA-Seq reads (SEECER)
Transcriptome Assemblies, Input: Trimmed Sequencing Reads FASTQ Files
• Assembling Samples A and B for six centers, using different replicate combinations (Trinity software)
• ~60 Transcriptome Assemblies
Quality Control (QC), Input: Assembled Contigs Files (FASTA Format)
• DETONATE (DETONATE software, using human reference genome)
• CEGMA (CEGMA software)
• Assemblies statistical outputs (provided by Trinity for each assembly)
• Mapping reads back to the contigs (TopHat software)
Passed QC? (custom script needed to check the above QC criteria, e.g.: If (CEGMA_CEGs > 235) then CEGMA_flag = Passed)
• Yes: keep the assembly
• No: Discard the Assembly
Genome Coverage – SNP Detection for FASTQ Trimmed Input Reads
• Mapping Input Reads to the Reference Genome (TopHat software)
• SNP detection (GATK software): Output Called SNP_Reads
• Genome Coverage, using Mapped Reads (featureCounts – R Bioconductor Package)
Genome Coverage – SNP Detection for FASTA Assembled Contigs
• Mapping assembled contigs to the Reference Genome (GMAP software)
• SNP detection (GATK software): Output Called SNP_Contigs
• Genome Coverage, using Mapped Contigs (featureCounts – R Bioconductor Package)
SNP Comparison
• Comparing Detected SNP_Contigs with dbSNP (Custom Script and SnpSift)
• Comparing Detected SNP_Reads with dbSNP (Custom Script and SnpSift)
Statistical comparison of all the ~60 assemblies (Statistical Testing for population of Assemblies)
• Novel Score: Efficiently Covered Bases for All Genes (EC-BAG) Score (Custom Script)
• Statistical Testing, e.g. ANOVA
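
The QC gate in this workflow is described as a custom script, with one example criterion given on the slide (CEGMA_CEGs > 235). A minimal Python sketch of such a check follows. The threshold comes from the slide; the function names, the "Failed" label, and the sample counts in the usage note are assumptions for illustration, and a real script would presumably combine several criteria.

```python
CEGMA_CEG_THRESHOLD = 235  # example criterion from the slide


def cegma_flag(cegma_cegs):
    """Return the QC flag for one assembly from its CEGMA CEG count."""
    # "Failed" is an assumed label; the slide only names the "Passed" case.
    return "Passed" if cegma_cegs > CEGMA_CEG_THRESHOLD else "Failed"


def split_by_qc(ceg_counts):
    """Split assemblies into kept and discarded lists by the CEGMA flag.

    ceg_counts maps an assembly name to its CEGMA CEG count.
    """
    kept, discarded = [], []
    for name, count in ceg_counts.items():
        if cegma_flag(count) == "Passed":
            kept.append(name)
        else:
            discarded.append(name)
    return kept, discarded
```

For example, `split_by_qc({"A_rep1": 240, "B_rep1": 235})` keeps `A_rep1` and discards `B_rep1`, because the comparison is strict and a count of exactly 235 does not pass.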
Thanks!!!
Questions?
ramifnu@iu.edu
sgg@iu.edu
https://iu.box.com/xsede15

XSEDE15_PhastaGateway


Editor's Notes

  1. Need to use multiple CAD and CAE tools
  2. CAD (computer-aided design) and CAE (computer-aided engineering) tools. CAE includes finite element analysis (FEA) and computational fluid dynamics (CFD).