Pictures from: https://www.slideshare.net/AdrinBezOrtega/genome-big-data, http://astro-icore.phys.huji.ac.il/node/70, https://timesofindia.indiatimes.com/city/pune/ligo-observatorys-work-to-begin-in-2018-land-acquisition-underway/articleshow/60882081.cms, https://www.yumpu.com/en/document/view/3703243/large-geoscience-databases-big-data
 Introduction
 Big Data Genome Analysis
 Big Data Analysis Framework
 Big data applications and genome sequencing
 De novo genome assembly
 De novo genomic error correction
 Big data cyberinfrastructure
 Evaluation of different clusters
 Model for an optimally balanced cluster
[Chart: Cost per Genome vs. Moore's Law, Sep 2001 – Sep 2015 (log scale, $1,000 to $100,000,000)]
[Diagram: NGS turns a genome into short reads (TBs); HPC reconstructs the genome (a few MB/GB)]
 NGS technologies
 Have outpaced Moore’s Law
 Software challenges
 Extreme scalability
 Algorithmic complexity
 HPC platform challenges
 More compute cycles
 Extreme I/O performance
 Huge storage space
An I/O-, compute-, and memory-intensive application
Data collected from https://www.genome.gov/sequencingcosts/
 MapReduce
 Hadoop
 Vertex-centric graph processing
 Giraph, GraphX
 Distributed NoSQL
 In-memory: Hazelcast, Redis
 Disk-based: HBase, MongoDB
 Cost decreases
 Bandwidth increases
[Chart: FLOPS of the fastest supercomputer, 1993–2012 (10^10 to 10^17)]
[Chart: bandwidth (MB/s) for storage and network, 1995–2011 — I/O bandwidth per device and network bandwidth per cable]
 Hardware evolution
 Processor
 Storage
 Network
 Introduction
 Big Data Genome Analysis
 Big Data Analysis Framework
 Big data applications and genome sequencing
 De novo genome assembly
 De novo genomic error correction
 Big data cyberinfrastructure
 Evaluation of different clusters
 Model for an optimally balanced cluster
Big Data Genome Assembly
Picture from http://www.slideshare.net/torstenseemann/approaches-to-analysing-1000s-of-bacterial-isolates-iceid-2015-atlanta-usa-mon-24-aug-2015
 High-throughput sequencing machines
 High coverage to get the desired accuracy
 Complex algorithms
 A data-, compute-, and memory-intensive application
 Like restoring a damaged book from multiple copies torn at random places
 The problem can be mapped to a graph-analytic problem: the de Bruijn graph (a toy construction is sketched below)
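As a single-machine illustration of that mapping (not the distributed GiGA implementation; the reads and k value here are invented), a de Bruijn graph keyed by k-mers can be built in a few lines:

```python
# Toy sketch: build a de Bruijn graph from short reads. Each k-mer is a
# vertex; consecutive overlapping k-mers within a read share an edge.
# Coverage counts how many times each k-mer was observed across reads.
from collections import defaultdict

def build_de_bruijn(reads, k):
    coverage = defaultdict(int)   # k-mer -> observation count
    edges = defaultdict(set)      # k-mer -> set of successor k-mers
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            coverage[kmer] += 1
            if i + k < len(read):             # a next overlapping k-mer exists
                edges[kmer].add(read[i + 1:i + k + 1])
    return coverage, edges

# Two reads covering the same region of a toy genome:
cov, adj = build_de_bruijn(["ACGTAC", "CGTACG"], k=3)
print(adj["CGT"], cov["CGT"])   # {'GTA'} 2
```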
 A modified version of the parallel list-ranking algorithm
 Mark head (h) and tail (t) and merge the h-t link
 Number of rounds: O(log n) [n: #vertices in the longest path]
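The doubling idea behind the O(log n) bound can be shown with plain pointer jumping; this is only a sketch of the principle (rank accumulation and the h-t merging details are omitted):

```python
# Pointer jumping: each round, every vertex links to its successor's
# successor, so the longest chain roughly halves -> O(log n) rounds.
def pointer_jump(succ):
    """succ maps each vertex to its successor (None at the tail)."""
    rounds = 0
    while any(succ[v] is not None and succ[succ[v]] is not None for v in succ):
        succ = {v: (succ[succ[v]] if succ[v] is not None else None) for v in succ}
        rounds += 1
    return rounds

chain = {i: i + 1 for i in range(8)}   # 9-vertex chain 0 -> 1 -> ... -> 8
chain[8] = None
print(pointer_jump(chain))             # 3 == ceil(log2(8)) rounds
```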
[Figure: eight rounds of iterative graph simplification]
 Tips: vertices with only one incoming edge, no outgoing edge, and length < 100bp (detection sketched below)
 Bubbles: vertices with the same predecessor and the same successor
 A Levenshtein-like edit-distance algorithm is used
 If the distance is below a threshold, the vertex with minimum frequency is removed
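A minimal sketch of the tip rule above, assuming in/out adjacency maps and per-vertex sequence lengths like those from the earlier de Bruijn sketch (the threshold is illustrative):

```python
# Tips: exactly one incoming edge, no outgoing edge, and a short sequence.
def find_tips(in_edges, out_edges, seq_len, max_len=100):
    return [v for v in in_edges
            if len(in_edges[v]) == 1      # single incoming edge
            and not out_edges.get(v)      # no outgoing edge
            and seq_len[v] < max_len]     # shorter than 100 bp
```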
| | Staphylococcus aureus | Rhodobacter sphaeroides | Human Chromosome 13 (HCR13) | Yoruban Male |
|---|---|---|---|---|
| Source | GAGE | GAGE | GAGE | SRA000271 |
| Read size (bp) | 255×10^6 | 410×10^6 | 5.9×10^9 | 141.5×10^9 |
| Read length (bp) | 37 & 101 | 101 | 101 | 101 |
| Total reads | 4,791,974 | 4,105,236 | 59,414,772 | 2 billion |
| Ref. genome size (bp) | 2,871,915 | 4,603,060 | 88,289,540 | 3.3×10^9 |
| Dataset (GB) | 0.3 | 0.65 | 10.0 | 452.0 |

GAGE: Genome Assembly Gold-standard Evaluation (http://gage.cbcb.umd.edu/); Yoruban male: NCBI open dataset
| S. aureus (325MB) | GiGA | ABySS | Contrail |
|---|---|---|---|
| # Contigs | 298 | 300 | 309 |
| Corrected NG50 | 26,725 | 24,819 | 31,332 |
| NG50 count | 34 | 38 | 30 |
| Max contig size | 95,737 | 125,049 | 95,737 |
| Misassembled contigs | 0 | 4 | 0 |

 GiGA vs. ABySS: higher NG50, fewer misassembled contigs
 GiGA vs. Contrail: almost comparable accuracy
| R. sphaeroides (650MB) | GiGA | ABySS | Contrail |
|---|---|---|---|
| # Contigs | 737 | 1,912 | 727 |
| Corrected NG50 | 10,804 | 4,215 | 11,718 |
| NG50 count | 134 | 283 | 126 |
| Max contig size | 65,538 | 54,734 | 51,683 |
| Misassembled contigs | 1 | 78 | 1 |
| HCR13 (10GB) | GiGA | ABySS | Contrail |
|---|---|---|---|
| # Contigs | 76,049 | 51,790 | 76,209 |
| Corrected NG50 | 658 | 1,269 | 700 |
| NG50 count | 33,271 | 16,643 | 34,223 |
| Max contig size | 19,446 | 30,053 | 19,321 |
| Misassembled contigs | 3 | 17 | 3 |

 GiGA vs. ABySS: higher NG50 count, fewer misassembled contigs
 GiGA vs. Contrail: almost comparable accuracy
 XSEDE resource
The LSU SuperMic HPC cluster is used:

| Maximum #nodes | 128 |
|---|---|
| Cores/node | 20 (two 10-core Intel Ivy Bridge) |
| DRAM/node | 64GB |
| Disk/node | 250GB (hard disk drive) |
| Network | 56Gbps InfiniBand |
 ABySS processes failed repeatedly due to network issues
 Contrail, being disk-based, exceeded the maximum allocated time for a single job

| | GiGA | ABySS | Contrail |
|---|---|---|---|
| # Contigs | 3,032,297 | - | - |
| NG50 | 827 | - | - |
| Max contig size | 35,465 | - | - |
| # Cores | 512 | - | - |
| Time (hours) | 8.5 | Failed | Failed |
Genomic Error Correction
 Higher read length
 Better genome finishing
 More complete genome assembly
 Higher error rate
 Higher cost

| | Illumina short reads | PacBio long reads |
|---|---|---|
| Platform characteristics | High throughput | SMRT |
| Read length | 100–250bp | 5kbp–20kbp |
| Cost | $0.03/Mbp | $0.30/Mbp |
| Error rate | 1–2% | 10–15% |
| Error characteristics | Substitution | InDel |
 De Bruijn graph-based method
 More scalable than overlap-based methods
 The widest-path algorithm provides accuracy
 Theory behind using the widest-path algorithm
 Assume k-mer coverage is an IID random variable with CDF F(x)
 The minimum of n such variables has distribution 1 − (1 − F(x))^n
 Proof sketch: given that many reads sequence the same region of the genome, the minimum-coverage k-mer of an erroneous read is, with the highest probability, the erroneous one
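Restating the slide's formula: if the k-mer coverages along a read are modeled as i.i.d. variables $X_1,\dots,X_n$ with CDF $F(x)$, the minimum has the distribution

$\Pr\left[\min_{1 \le i \le n} X_i \le x\right] = 1 - \bigl(1 - F(x)\bigr)^{n}$

which tends to 1 as $n$ grows for any $x$ with $F(x) > 0$; with deep coverage, the read's minimum-coverage k-mer is therefore overwhelmingly likely to be the low-coverage (erroneous) one, motivating a path choice that maximizes the minimum coverage.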
 There may be many erroneous k-mers with high coverage
 But the probability of attaining the minimum coverage is significantly higher on the error path
 Hence, a widest-path algorithm is used to select the correct path
 Hadoop (MapReduce): for computation
 Hazelcast (In-memory NoSQL): for de Bruijn graph storage
 Map: Emits three k-mers
 First k-mer: incoming edge
 Middle k-mer: vertex
 Third k-mer: outgoing edge
 Coverage of middle: 1
 Reduce
 Group by vertex
 Aggregate incoming edges
 Aggregate outgoing edges
 Sum coverages
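In plain Python (a hedged stand-in for the actual Hadoop job; the boundary k-mers of a read are skipped here for brevity), the map and reduce steps look like:

```python
# Mapper: for each interior position of a read, emit the middle k-mer as the
# vertex key with its incoming-edge k-mer, outgoing-edge k-mer, and coverage 1.
from collections import defaultdict

def mapper(read, k):
    for i in range(1, len(read) - k):
        yield read[i:i + k], (read[i - 1:i + k - 1], read[i + 1:i + k + 1], 1)

# Reducer: group by vertex, aggregate incoming/outgoing edge sets, sum coverage.
def reducer(pairs):
    graph = defaultdict(lambda: {"in": set(), "out": set(), "cov": 0})
    for vertex, (inc, out, cov) in pairs:
        graph[vertex]["in"].add(inc)
        graph[vertex]["out"].add(out)
        graph[vertex]["cov"] += cov
    return graph

g = reducer(kv for read in ["ACGTACGTA"] for kv in mapper(read, 3))
print(g["CGT"])   # {'in': {'ACG'}, 'out': {'GTA'}, 'cov': 2}
```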
 Hadoop with Hazelcast
 Error detection
 Hadoop map-only job
 k-mer coverage < threshold → error k-mer
 Requires millions of searches over the entire dataset
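A simplified sketch of that map-only step, with a plain dict standing in for the distributed Hazelcast k-mer store (coverages and the threshold are invented):

```python
# Flag k-mer positions of a long read whose coverage is below the threshold.
def locate_errors(read, k, coverage, threshold):
    return [i for i in range(len(read) - k + 1)
            if coverage.get(read[i:i + k], 0) < threshold]

cov = {"ACG": 40, "CGT": 38, "GTA": 2, "TAC": 41}     # toy coverage store
print(locate_errors("ACGTAC", 3, cov, threshold=5))   # [2] -> 'GTA' is suspect
```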
 Widest path algorithm
 Maximizes the minimum k-mer coverage along the path of the de Bruijn graph
 A modified version of Dijkstra’s algorithm
 Similar time complexity
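A compact sketch of such a widest-path variant (illustrative only; the real implementation runs over the distributed graph): Dijkstra's min-heap becomes a max-heap on the bottleneck coverage seen so far.

```python
import heapq

def widest_path(adj, coverage, src, dst):
    """Best bottleneck (minimum k-mer coverage) over paths src -> dst."""
    best = {src: coverage[src]}
    heap = [(-coverage[src], src)]          # max-heap via negated widths
    while heap:
        width, v = heapq.heappop(heap)
        width = -width
        if v == dst:
            return width
        for u in adj.get(v, ()):
            w = min(width, coverage[u])     # bottleneck along this extension
            if w > best.get(u, 0):
                best[u] = w
                heapq.heappush(heap, (-w, u))
    return 0

adj = {"A": ["B", "C"], "B": ["D"], "C": ["D"]}
covg = {"A": 30, "B": 4, "C": 25, "D": 28}
print(widest_path(adj, covg, "A", "D"))     # 25, avoiding the low-coverage B
```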
| PacBio data | #Reads | Data size (GB) | Read length | %Reads aligned |
|---|---|---|---|---|
| E. coli | 1,129,576 | 1.032 | 1,120 | 78.97 |
| Yeast | 2,315,594 | 0.53 | 5,874 | 82.12 |
| Fruit fly | 6,701,498 | 55 | 4,328 | 51.14 |
| Human | 23,897,260 | 312 | 6,587 | 72.3 |

| Illumina data | #Reads | Data size (GB) | Read length | %Reads aligned |
|---|---|---|---|---|
| E. coli | 45,440,200 | 13.50 | 101 | 99.44 |
| Yeast | 4,503,422 | 1.20 | 101 | 93.75 |
| Fruit fly | 179,363,706 | 59 | 101 | 95.56 |
| Human | 1,420,689,270 | 452 | 101 | 79.60 |
 %Reads aligned: percentage of corrected long reads that align to the reference genome
 %ReadsAligned = AlignedReads / TotalReads × 100
 %Base pairs aligned: percentage of base pairs (out of total base pairs) of the corrected long reads that align to the reference genome
 %BasePairsAligned = AlignedBases / TotalBases × 100
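The two metrics transcribe directly to code (the sample numbers below are a back-calculation consistent with the E. coli row reported later, not measured output):

```python
def pct_reads_aligned(aligned_reads, total_reads):
    return aligned_reads / total_reads * 100

def pct_base_pairs_aligned(aligned_bases, total_bases):
    return aligned_bases / total_bases * 100

print(round(pct_reads_aligned(1_058_300, 1_129_576), 2))   # ~93.69
```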
 Widest path (WP): select the path in the de Bruijn graph that maximizes the minimum k-mer coverage
 Leverages coverage information while correcting the error
 Dijkstra’s shortest path (SP): select the shortest path without using any coverage information
 Coverage information is used only when the de Bruijn graph is constructed → k-mers below a threshold are removed from the graph
 1-step greedy (Gr): select the successor k-mer with the highest coverage
 High chance of selecting the wrong path
 Stopped after a predefined number of hops
 The widest-path algorithm shows the best performance
 The greedy algorithm shows the worst performance
 k is set to 15

| Data | Algorithm | %Reads aligned | %Base pairs aligned |
|---|---|---|---|
| E. coli | ParLECH WP | 93.69 | 92.15 |
| | ParLECH SP | 87.55 | 86.49 |
| | ParLECH Gr | 76.68 | 70.92 |
| Yeast | ParLECH WP | 86.07 | 89.31 |
| | ParLECH SP | 84.92 | 86.44 |
| | ParLECH Gr | 75.77 | 74.68 |
| Fruit fly | ParLECH WP | 65.92 | 62.42 |
| | ParLECH SP | 54.53 | 49.41 |
| | ParLECH Gr | 43.97 | 37.44 |
 ParLECH aligned more reads and base pairs to the reference genome compared to LoRDEC
 k is set to 15

| Data | Algorithm | %Reads aligned | %Base pairs aligned |
|---|---|---|---|
| E. coli | ParLECH | 93.69 | 92.15 |
| | LoRDEC | 87.55 | 86.49 |
| | Original | 78.97 | 75.07 |
| Yeast | ParLECH | 86.07 | 89.31 |
| | LoRDEC | 84.92 | 87.08 |
| | Original | 82.12 | 88.69 |
| Fruit fly | ParLECH | 65.92 | 62.42 |
| | LoRDEC | 54.53 | 49.69 |
| | Original | 51.14 | 46.04 |
 XSEDE resource
The LSU SuperMic HPC cluster is used:

| Maximum #nodes | 128 |
|---|---|
| Cores/node | 20 (two 10-core Intel Ivy Bridge) |
| DRAM/node | 64GB |
| Disk/node | 250GB (hard disk drive) |
| Network | 56Gbps InfiniBand |
 LoRDEC performs better on a single node
 ParLECH outperforms it as more nodes are added
[Chart: execution time (min) vs. #nodes (1–32) for ParLECH and LoRDEC]
 Almost linear scalability
[Chart: execution time vs. number of nodes (16–128, log-log scale) for KmerCount, LocateError, CorrectError, and the total]
 A total of 764GB of data is processed
 Appreciable accuracy
 LoRDEC could not process this dataset
 It could not produce the de Bruijn graph

| PacBio data size | 312GB |
|---|---|
| Illumina data size | 452GB |
| #Nodes used | 128 |
| k | 17 |
| Time | 28.6 hours |
| %Reads aligned | 78.3 |
| %Base pairs aligned | 75.43 |
 Desired software characteristics for big data genome analysis
 Distributed
 Scalable
 Low cost
 Considers data locality
 Capable of working on commodity hardware
 Develop algorithms using the big data analytics model
 Better performance than other MPI-based software on a traditional HPC environment
Can we get better performance by changing the hardware infrastructure?
 Introduction
 Big Data Genome Analysis
 Big Data Analysis Framework
 Big data applications and genome sequencing
 De novo genome assembly
 De novo genomic error correction
 Big data cyberinfrastructure
 Evaluation of different clusters
 Model for an optimally balanced cluster
Evaluating Different Distributed-HPC Infrastructure for Data-Driven Science
 Network issues
 Fat-tree architecture with 2:1 blocking
 Low effective bandwidth → current programming models need high bandwidth
 Storage issues
 Fewer directly attached devices (normally hard disk drives)
 Low I/O bandwidth → big data jobs become I/O-bound
 Memory issues
 Low RAM per core → significant tradeoff between the degree of data parallelism and the memory requirement
 Low buffer sizes increase data spilling to disk → causes a significant performance drop with HDDs
| | SuperMikeII | SwatIII-Basic | SwatIII-Basic-SSD | SwatIII-Memory | SwatIII-Scaleup | SwatIII-Medium |
|---|---|---|---|---|---|---|
| Cluster category | HPC cluster | Scaled-out datacenter | Scaled-out datacenter | Memory-optimized datacenter | Scaled-up datacenter | Medium-sized datacenter |
| #Pcores/node | 16 | 16 | 16 | 16 | 16 | 16 |
| DRAM (GB)/node | 32 | 32 | 32 | 256 | 256 | 64 |
| #Disks/node | 1 HDD | 1 HDD | 1 SSD | 1 SSD | 7 HDD/SSD | 2 HDD/SSD |
| Network | 40Gbps QDR IB | 10Gbps Eth. | 10Gbps Eth. | 10Gbps Eth. | 10Gbps Eth. | 10Gbps Eth. |
Bumble bee genome:

| Stage | Job type | Input | Final output | #Jobs | Shuffled data | HDFS data |
|---|---|---|---|---|---|---|
| Graph construction | Hadoop | 90GB (500M reads) | 95GB | 2 | 2TB | 136GB |
| Graph simplification | Series of Giraph jobs | 95GB (715M vertices) | 640MB (62K vertices) | 15 | - | 966GB |

 #Nodes used:
 SuperMikeII: 15
 SwatIII-Basic-HDD/SSD: 15
 SwatIII-Memory: 15
Human genome:

| Stage | Job type | Input | Final output | #Jobs | Shuffled data | HDFS data |
|---|---|---|---|---|---|---|
| Graph construction | Hadoop | 452GB (2B reads) | 3TB | 2 | 9.9TB | 3.2TB |
| Graph simplification | Series of Giraph jobs | 3TB (1.5B vertices) | 3.8GB (3M vertices) | 15 | - | 4.1TB |
 40Gbps IB with 2:1 blocking vs. 10Gbps Eth. with no blocking → similar performance
 SSD vs. HDD → Hadoop shows a 50% improvement
 256GB vs. 32GB DRAM → Hadoop shows a 70% and Giraph a 35% improvement
[Chart: effect of network (InfiniBand vs. Ethernet) on assembling the 90GB bumble bee genome; execution time of SwatIII-Basic-HDD normalized to SuperMikeII: 1.012 (graph construction), 1.033 (graph simplification), 1.025 (entire pipeline)]
[Chart: effect of storage type (HDD vs. SSD) and RAM size while assembling the bumble bee genome; execution time of SwatIII-Basic-SSD and SwatIII-Memory normalized to SuperMikeII; data labels: 0.5, 0.3, 0.96, 0.65, 0.79, 0.67]
[Charts: performance/$ and execution time for the 90GB bumble bee genome assembly across clusters, normalized to SuperMikeII, for graph construction, graph simplification, and the entire pipeline]
 Scaled-up cluster
 More execution time
 More performance/$
 HDD and SSD show almost the same execution time
 HDD shows better performance/$ than SSD
 A few scaled-up servers
 Better than a traditional HPC cluster (3–4× benefit in performance/$)
 HDD performs similarly to SSD
 HDD shows better performance/$ than SSD
[Charts: execution time and performance/$ for the human genome (452GB), normalized to SuperMikeII; execution-time labels: 1.006, 1.128, 0.898, 1.023, 0.999, 1.077; performance/$ labels: 3.17, 4.36, 3.88, 4.79, 3.65, 4.21]
 1 SSD performs similarly to 4 HDDs
 The disk controller saturates at ~500MB/s
 Adding more disks (HDD or SSD) does not improve performance any further

| Disks per DataNode | Execution time (s) |
|---|---|
| 1 HDD | 5,740 |
| 2 HDD | 4,429 |
| 4 HDD | 3,333 |
| 1 SSD | 2,939 |
| 2 SSD | 2,732 |
 Hyperscale system prototype
 32 low-power nodes: 2 cores, 1 SSD, and 16GB RAM per node
 10% better performance than SuperMikeII (16 cores, 1 HDD, and 32GB RAM per node)
 More than 2× improvement in performance/$
[Charts: execution time and performance/$ for the 90GB bumble bee genome assembly, normalized to SuperMikeII; execution time: 0.93, 0.89, 0.90 and performance/$: 2.16, 2.24, 2.215 for graph construction, graph simplification, and the entire pipeline]
 Increase compute bandwidth
 The Power8 processor has 8-way SMT
 16 memory controllers
 Increase I/O bandwidth
 Many HDDs per node
 I/O and compute distributed across SMT threads
 Increase network bandwidth
 Clos topology with no blocking
 Intel’s Knights Landing (KNL) cluster
 Low energy consumption
 Knights Landing processor with a lower clock speed
 Increased compute and I/O parallelism
 4-way SMT (instead of 2-way hyperthreading)
 Non-volatile RAM: high-bandwidth flash memory
 Nvidia GPU cluster
 General-purpose GPU (GPGPU)
 Works in conjunction with Intel or IBM Power8 processors
 NVLink (high-speed connection between the IBM Power processor and the GPU)
 Limitations in traditional HPC clusters and datacenters
 Network
 Storage
 Memory
 Huge tradeoff between performance and cost
 How can these observations be modeled to develop an optimal cluster architecture?
A Theoretical Model to Build Cost-Effective Balanced HPC Infrastructure for Data-Driven Science
 Amdahl’s I/O number for a balanced system
 1 bit (0.125 bytes) of I/O per second per instruction per second (IPS)
 Amdahl’s memory number for a balanced system
 1 byte of memory per IPS
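A quick worked instance of these two rules (the node size is illustrative):

$\text{For } I = 10^{10}\ \text{instructions/s (10 GIPS)}: \quad \text{I/O} = 0.125 \times 10^{10}\ \text{B/s} = 1.25\ \text{GB/s}, \qquad \text{memory} = 1 \times 10^{10}\ \text{B} = 10\ \text{GB}.$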
 Limitation
 One-size-fits-all: does not consider the impact of the application’s characteristics
 Modified Amdahl’s I/O number
 8 MIPS per MBps of I/O
 On the relevant applications
 Modified Amdahl’s memory number
 The MB/MIPS ratio is rising from 1 to 4
 Limitations
 Does not consider the cost component
 Observations only: no theoretical background
 Modified Amdahl’s I/O number: $\beta_{io}^{opt} = f_{io}(\gamma_{io}, \delta_{io})$
 Modified Amdahl’s memory number: $\beta_{mem}^{opt} = f_{mem}(\gamma_{mem}, \delta_{mem})$
 Ignores the overlap of work done by I/O and memory
 Ignores the CPU microarchitecture
 Considers the number of instructions executed per cycle (IPC) as proportional to the CPU core frequency
| Cluster | SuperMikeII | SwatIII | CeresII |
|---|---|---|---|
| Processor | 2× 8-core Xeon | 2× 8-core Xeon | 1× 6-core Xeon |
| CPU core speed | 2.6GHz | 2.6GHz | 2GHz |
| #Cores/node | 16 | 16 | 6 |
| Total CPU speed/node | 41.6GHz | 41.6GHz | 12GHz |
| #Disks/node | 1 HDD (SATA) | 4 HDD (SATA) | 1 SSD (NVMe) |
| Seq. I/O bandwidth/disk | 0.15GBps | 0.15GBps | 2GBps |
| Seq. I/O bandwidth/node | 0.15GBps | 0.60GBps | 2GBps |
| DRAM/node | 32GB | 256GB | 64GB |
| Max. nodes available | 128 | 16 | 40 |
| Cluster | SuperMikeII | SwatIII | CeresII |
|---|---|---|---|
| Cluster type | Traditional HPC | Datacenter | MicroBrick |
| β_io | 0.003 | 0.015 | 0.166 |
| β_mem | 0.77 | 6.15 | 5.33 |
| γ_io | 0.0005 | 0.01 | 1.03 |
| γ_mem | 0.06 | 1.47 | 1.25 |
| Optimized for | Only compute-intensive applications | Compute- and memory-intensive applications | I/O-, compute-, and memory-intensive applications |
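The β columns above can be sanity-checked from the node specifications, assuming (per the model's IPC simplification) that instruction rate is proxied by total GHz per node; this reproduces the table to within rounding. The γ terms involve the paper's cost model and are not rederived here.

```python
# beta_io  ~= sequential I/O bandwidth per node (GB/s) / total GHz per node
# beta_mem ~= DRAM per node (GB)                       / total GHz per node
clusters = {
    "SuperMikeII": {"ghz": 41.6, "io_gbps": 0.15, "dram_gb": 32},
    "SwatIII":     {"ghz": 41.6, "io_gbps": 0.60, "dram_gb": 256},
    "CeresII":     {"ghz": 12.0, "io_gbps": 2.00, "dram_gb": 64},
}
for name, c in clusters.items():
    beta_io = c["io_gbps"] / c["ghz"]
    beta_mem = c["dram_gb"] / c["ghz"]
    print(f"{name}: beta_io={beta_io:.3f} beta_mem={beta_mem:.2f}")
# SuperMikeII: beta_io=0.004 beta_mem=0.77
# SwatIII:     beta_io=0.014 beta_mem=6.15
# CeresII:     beta_io=0.167 beta_mem=5.33
```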
| Application | Terasort | Wordcount | Genome Assembly Ph1 | Genome Assembly Ph2 |
|---|---|---|---|---|
| Job type | Hadoop | Hadoop | Hadoop | Giraph |
| Input | 1TB | 1TB | 452GB (2bn short reads) | 3.2TB (1.5bn vertices) |
| Output | 1TB | 1TB | 3TB | 3.8GB |
| Shuffled data | 1TB | 1TB | 9.9TB | - |
| Application characteristics | Map: CPU-intensive; Reduce: I/O-intensive | Map and Reduce: I/O- and CPU-intensive | Map and Reduce: CPU- and I/O-intensive | Memory-intensive |
 Lower is better (price-to-performance of SuperMikeII is taken as 1)
 CeresII vs. SuperMikeII: >65% improvement for both applications
 CeresII vs. SwatIII: >50% improvement for both applications
[Chart: price-to-performance for TeraSort and WordCount on SuperMikeII, SwatIII, and CeresII, normalized to SuperMikeII; data labels: 0.76, 0.37, 0.79, 0.35]
 Lower is better (price-to-performance of SuperMikeII is taken as 1)
 CeresII vs. SuperMikeII: 88% and 85% improvement for phase-1 and phase-2, respectively
 CeresII vs. SwatIII: 50% and 20% improvement for phase-1 and phase-2, respectively
[Chart: price-to-performance for graph construction and graph simplification, normalized to SuperMikeII; data labels: 0.24, 0.12, 0.22, 0.15]
 For data-driven applications at current hardware prices
 Amdahl’s I/O number ($\beta_{io}^{opt}$) should be increased relative to Gray’s law (from 0.125 to 0.17)
 Amdahl’s memory number ($\beta_{mem}^{opt}$) should be decreased relative to Gray’s law (from 4 to 2.7)
 For HPC clusters
 $\beta_{io}$ and $\beta_{mem}$ provide an easy-to-use alternative to FLOPS for I/O- and memory-bound applications
 Enables an informed choice among hardware components when investing in an HPC cluster whose application characteristics are not known in advance
 Application of deep learning and AI methodologies to genomics
 Key-value memory networks
 Metagenomic assembly and error correction
 Transferring big genomic data over blockchain
 Security of the sensitive data
 High throughput
 Current collaborations
 San Diego Supercomputing Center
 IBM OpenPower
 Thanks to the faculty and staff of LSU and UW-Platteville
 Dr. Seung-Jong Park, Dr. Kisung Lee, Dr. Seungwon Yang, Dr. Jianhua Chen, Dr. Praveen Koppa, Dr. Sayan Goswami, Dr. Richard Platania, Dr. Chui-hui Chiu, Dipak Singh, Dr. Lisa Landgraf, etc.
 Samsung SSD team
▪ Jaeki Hong, Jay Seo, Jinki Kim, Wooseok Chang, etc.
 IBM Power8 and OpenPower team
▪ Terry Leatherland, Ravi Arimilli, Ganesan Narayanswami, etc.
 Other collaborators
▪ Dr. Ling Liu (GATECH)
 Bioscience experts
▪ Dr. Joohyun Kim, Dr. Nayong Kim, Dr. Maheshi Dassanayake, Dr. Dong-Ha Oh, etc.
 This work was supported in part by
 NIH-P20GM103424
 NSF-MRI-1338051
 NSF-CC-NIE-1341008
 NSF-IBSS-L-1620451
 LA BoR LEQSF(2016-19)-RD-A-08
 The HPC services are provided by
 LSU HPC
 LONI
 Samsung Research S. Korea
 IBM Research Austin
 “Developing a Meta Framework for Key-Value Memory Networks on HPC Clusters,” Choonhan Youn, Arghya Kusum Das, Seungwon Yang, Joohyun Kim. PEARC 2019. (Collaborative work: UW-Platteville, LSU, and San Diego Supercomputing Center)
 “ParLECH: Parallel Long-read Error Correction with Hadoop,” Arghya Kusum Das, Seung-Jong Park, Kisung Lee. IEEE BIBM 2018.
 “A High-Throughput Interoperability Architecture over Ethereum and Swarm for Big Biomedical Data,” Arghya Kusum Das, Seung-Jong Park, Kisung Lee. IEEE CHASE 2018 (Blockchain Workshop).
 “Large-scale parallel genome assembler over cloud computing environment,” Arghya Kusum Das, Praveen Kumar Koppa, Sayan Goswami, Richard Platania, Seung-Jong Park. JBCB, May 23, 2017 issue.
 “ParSECH: Parallel Sequencing Error Correction with Hadoop for Large-Scale Genome Sequences,” Arghya Kusum Das, Shayan Shams, Sayan Goswami, Richard Platania, Kisung Lee, Seung-Jong Park. BiCOB 2017.
 “Lazer: A Memory-Efficient Framework for Large-Scale Genome Assembly,” Sayan Goswami, Arghya Kusum Das, Richard Platania, Kisung Lee, Seung-Jong Park. IEEE Big Data 2016.
 “Evaluating Different Distributed-Cyber-Infrastructure for Data and Compute Intensive Scientific Application,” Arghya Kusum Das, Jaeki Hong, Sayan Goswami, Richard Platania, Wooseok Chang, Seung-Jong Park. IEEE Big Data 2015. (In collaboration with Samsung Electronics Ltd., S. Korea)
 “Augmenting Amdahl’s Second Law: A Theoretical Model for Cost-Effective Balanced HPC Infrastructure for Data-Driven Science,” Arghya Kusum Das, Jaeki Hong, Sayan Goswami, Richard Platania, Kisung Lee, Wooseok Chang, Seung-Jong Park. IEEE Cloud 2017. (In collaboration with Samsung Electronics Ltd., S. Korea)
 “IBM POWER8® HPC System Accelerates Genomics Analysis with SMT8 Multithreading,” Arghya Kusum Das, Sayan Goswami, Richard Platania, Seung-Jong Park, Ram Ramanujam, Gus Kousoulas, Frank Lee, Ravi Arimilli, Terry Leatherland, Joana Wong, John Simpson, Grace Liu, Jinchun Wang. Dynamic white paper for the Louisiana State University collaboration with IBM.
 “BIC-LSU: Big Data Research Integration with Cyberinfrastructure for LSU,” Chui-hui Chiu, Nathan Lewis, Dipak Kumar Singh, Arghya Kusum Das, Mohammad M. Jalazai, Richard Platania, Sayan Goswami, Kisung Lee, Seung-Jong Park. XSEDE 2016.
Questions
Thank You

More Related Content

What's hot

Joy Chatterjee Portfolio_2015
Joy Chatterjee Portfolio_2015Joy Chatterjee Portfolio_2015
Joy Chatterjee Portfolio_2015Joy Chatterjee
 
Galaxy RNA-Seq Analysis: Tuxedo Protocol
Galaxy RNA-Seq Analysis: Tuxedo ProtocolGalaxy RNA-Seq Analysis: Tuxedo Protocol
Galaxy RNA-Seq Analysis: Tuxedo ProtocolHong ChangBum
 
There's no place like 127.0.0.1 - Achieving "reliable" DNS rebinding in moder...
There's no place like 127.0.0.1 - Achieving "reliable" DNS rebinding in moder...There's no place like 127.0.0.1 - Achieving "reliable" DNS rebinding in moder...
There's no place like 127.0.0.1 - Achieving "reliable" DNS rebinding in moder...Luke Young
 
Linux Resource Management - Мариян Маринов (Siteground)
Linux Resource Management - Мариян Маринов (Siteground)Linux Resource Management - Мариян Маринов (Siteground)
Linux Resource Management - Мариян Маринов (Siteground)PlovDev Conference
 
Detecting Somatic Mutations in Impure Cancer Sample - Ensemble Approach
Detecting Somatic Mutations in Impure Cancer Sample - Ensemble ApproachDetecting Somatic Mutations in Impure Cancer Sample - Ensemble Approach
Detecting Somatic Mutations in Impure Cancer Sample - Ensemble ApproachHong ChangBum
 
Пример отчета по анализу вредоносного кода Zeus, подготовленного Cisco AMP Th...
Пример отчета по анализу вредоносного кода Zeus, подготовленного Cisco AMP Th...Пример отчета по анализу вредоносного кода Zeus, подготовленного Cisco AMP Th...
Пример отчета по анализу вредоносного кода Zeus, подготовленного Cisco AMP Th...Cisco Russia
 
CostFed: Cost-Based Query Optimization for SPARQL Endpoint Federation
CostFed: Cost-Based Query Optimization for SPARQL Endpoint FederationCostFed: Cost-Based Query Optimization for SPARQL Endpoint Federation
CostFed: Cost-Based Query Optimization for SPARQL Endpoint FederationMuhammad Saleem
 
Пример отчета по анализу вредоносного кода TeslaCrypt, подготовленного Cisco ...
Пример отчета по анализу вредоносного кода TeslaCrypt, подготовленного Cisco ...Пример отчета по анализу вредоносного кода TeslaCrypt, подготовленного Cisco ...
Пример отчета по анализу вредоносного кода TeslaCrypt, подготовленного Cisco ...Cisco Russia
 
Generating high-quality human reference genomes using PromethION nanopore seq...
Generating high-quality human reference genomes using PromethION nanopore seq...Generating high-quality human reference genomes using PromethION nanopore seq...
Generating high-quality human reference genomes using PromethION nanopore seq...Miten Jain
 

What's hot (12)

Joy Chatterjee Portfolio_2015
Joy Chatterjee Portfolio_2015Joy Chatterjee Portfolio_2015
Joy Chatterjee Portfolio_2015
 
Galaxy RNA-Seq Analysis: Tuxedo Protocol
Galaxy RNA-Seq Analysis: Tuxedo ProtocolGalaxy RNA-Seq Analysis: Tuxedo Protocol
Galaxy RNA-Seq Analysis: Tuxedo Protocol
 
AGBT 2016 Workshop Magrini
AGBT 2016 Workshop MagriniAGBT 2016 Workshop Magrini
AGBT 2016 Workshop Magrini
 
There's no place like 127.0.0.1 - Achieving "reliable" DNS rebinding in moder...
There's no place like 127.0.0.1 - Achieving "reliable" DNS rebinding in moder...There's no place like 127.0.0.1 - Achieving "reliable" DNS rebinding in moder...
There's no place like 127.0.0.1 - Achieving "reliable" DNS rebinding in moder...
 
Linux Resource Management - Мариян Маринов (Siteground)
Linux Resource Management - Мариян Маринов (Siteground)Linux Resource Management - Мариян Маринов (Siteground)
Linux Resource Management - Мариян Маринов (Siteground)
 
Linux resource limits
Linux resource limitsLinux resource limits
Linux resource limits
 
Detecting Somatic Mutations in Impure Cancer Sample - Ensemble Approach
Detecting Somatic Mutations in Impure Cancer Sample - Ensemble ApproachDetecting Somatic Mutations in Impure Cancer Sample - Ensemble Approach
Detecting Somatic Mutations in Impure Cancer Sample - Ensemble Approach
 
Пример отчета по анализу вредоносного кода Zeus, подготовленного Cisco AMP Th...
Пример отчета по анализу вредоносного кода Zeus, подготовленного Cisco AMP Th...Пример отчета по анализу вредоносного кода Zeus, подготовленного Cisco AMP Th...
Пример отчета по анализу вредоносного кода Zeus, подготовленного Cisco AMP Th...
 
CostFed: Cost-Based Query Optimization for SPARQL Endpoint Federation
CostFed: Cost-Based Query Optimization for SPARQL Endpoint FederationCostFed: Cost-Based Query Optimization for SPARQL Endpoint Federation
CostFed: Cost-Based Query Optimization for SPARQL Endpoint Federation
 
Пример отчета по анализу вредоносного кода TeslaCrypt, подготовленного Cisco ...
Пример отчета по анализу вредоносного кода TeslaCrypt, подготовленного Cisco ...Пример отчета по анализу вредоносного кода TeslaCrypt, подготовленного Cisco ...
Пример отчета по анализу вредоносного кода TeslaCrypt, подготовленного Cisco ...
 
Generating high-quality human reference genomes using PromethION nanopore seq...
Generating high-quality human reference genomes using PromethION nanopore seq...Generating high-quality human reference genomes using PromethION nanopore seq...
Generating high-quality human reference genomes using PromethION nanopore seq...
 
2015 osu-metagenome
2015 osu-metagenome2015 osu-metagenome
2015 osu-metagenome
 

Similar to Towards Ultra-Large-Scale System: Design of Scalable Software and Next-Gen HPC Cluster for Big Data Genome Analysis

BC-Cancer ChimeraScan Presentation
BC-Cancer ChimeraScan PresentationBC-Cancer ChimeraScan Presentation
BC-Cancer ChimeraScan PresentationElijah Willie
 
20100516 bioinformatics kapushesky_lecture08
20100516 bioinformatics kapushesky_lecture0820100516 bioinformatics kapushesky_lecture08
20100516 bioinformatics kapushesky_lecture08Computer Science Club
 
Bioinfo ngs data format visualization v2
Bioinfo ngs data format visualization v2Bioinfo ngs data format visualization v2
Bioinfo ngs data format visualization v2Li Shen
 
Multi-k-mer de novo transcriptome assembly and assembly of assemblies using 4...
Multi-k-mer de novo transcriptome assembly and assembly of assemblies using 4...Multi-k-mer de novo transcriptome assembly and assembly of assemblies using 4...
Multi-k-mer de novo transcriptome assembly and assembly of assemblies using 4...Jennifer Shelton
 
New data from giab genomes promethion
New data from giab genomes   promethionNew data from giab genomes   promethion
New data from giab genomes promethionGenomeInABottle
 
Handling Numeric Attributes in Hoeffding Trees
Handling Numeric Attributes in Hoeffding TreesHandling Numeric Attributes in Hoeffding Trees
Handling Numeric Attributes in Hoeffding Treesbutest
 
HPC-MAQ : A PARALLEL SHORT-READ REFERENCE ASSEMBLER
HPC-MAQ : A PARALLEL SHORT-READ REFERENCE ASSEMBLERHPC-MAQ : A PARALLEL SHORT-READ REFERENCE ASSEMBLER
HPC-MAQ : A PARALLEL SHORT-READ REFERENCE ASSEMBLERcscpconf
 
Making effective use of graphics processing units (GPUs) in computations
Making effective use of graphics processing units (GPUs) in computationsMaking effective use of graphics processing units (GPUs) in computations
Making effective use of graphics processing units (GPUs) in computationsOregon State University
 
deepswarm optimising convolutional neural networks using swarm intelligence (...
deepswarm optimising convolutional neural networks using swarm intelligence (...deepswarm optimising convolutional neural networks using swarm intelligence (...
deepswarm optimising convolutional neural networks using swarm intelligence (...Amir Shokri
 
2014 khmer protocols
2014 khmer protocols2014 khmer protocols
2014 khmer protocolsc.titus.brown
 
(Slides) A Method for Distributed Computaion of Semi-Optimal Multicast Tree i...
(Slides) A Method for Distributed Computaion of Semi-Optimal Multicast Tree i...(Slides) A Method for Distributed Computaion of Semi-Optimal Multicast Tree i...
(Slides) A Method for Distributed Computaion of Semi-Optimal Multicast Tree i...Naoki Shibata
 
MSR 2009
MSR 2009MSR 2009
MSR 2009swy351
 
BioBankCloud: Machine Learning on Genomics + GA4GH @ Med at Scale
BioBankCloud: Machine Learning on Genomics + GA4GH  @ Med at ScaleBioBankCloud: Machine Learning on Genomics + GA4GH  @ Med at Scale
BioBankCloud: Machine Learning on Genomics + GA4GH @ Med at ScaleAndy Petrella
 
2013 pag-equine-workshop
2013 pag-equine-workshop2013 pag-equine-workshop
2013 pag-equine-workshopc.titus.brown
 

Similar to Towards Ultra-Large-Scale System: Design of Scalable Software and Next-Gen HPC Cluster for Big Data Genome Analysis (20)

BC-Cancer ChimeraScan Presentation
BC-Cancer ChimeraScan PresentationBC-Cancer ChimeraScan Presentation
BC-Cancer ChimeraScan Presentation
 
20100516 bioinformatics kapushesky_lecture08
20100516 bioinformatics kapushesky_lecture0820100516 bioinformatics kapushesky_lecture08
20100516 bioinformatics kapushesky_lecture08
 
Bioinfo ngs data format visualization v2
Bioinfo ngs data format visualization v2Bioinfo ngs data format visualization v2
Bioinfo ngs data format visualization v2
 
Multi-k-mer de novo transcriptome assembly and assembly of assemblies using 4...
Multi-k-mer de novo transcriptome assembly and assembly of assemblies using 4...Multi-k-mer de novo transcriptome assembly and assembly of assemblies using 4...
Multi-k-mer de novo transcriptome assembly and assembly of assemblies using 4...
 
Sequence assembly
Sequence assemblySequence assembly
Sequence assembly
 
New data from giab genomes promethion
New data from giab genomes   promethionNew data from giab genomes   promethion
New data from giab genomes promethion
 
Handling Numeric Attributes in Hoeffding Trees
Handling Numeric Attributes in Hoeffding TreesHandling Numeric Attributes in Hoeffding Trees
Handling Numeric Attributes in Hoeffding Trees
 
HPC-MAQ : A PARALLEL SHORT-READ REFERENCE ASSEMBLER
HPC-MAQ : A PARALLEL SHORT-READ REFERENCE ASSEMBLERHPC-MAQ : A PARALLEL SHORT-READ REFERENCE ASSEMBLER
HPC-MAQ : A PARALLEL SHORT-READ REFERENCE ASSEMBLER
 
Making effective use of graphics processing units (GPUs) in computations
Making effective use of graphics processing units (GPUs) in computationsMaking effective use of graphics processing units (GPUs) in computations
Making effective use of graphics processing units (GPUs) in computations
 
deepswarm optimising convolutional neural networks using swarm intelligence (...
deepswarm optimising convolutional neural networks using swarm intelligence (...deepswarm optimising convolutional neural networks using swarm intelligence (...
deepswarm optimising convolutional neural networks using swarm intelligence (...
 
2014 khmer protocols
2014 khmer protocols2014 khmer protocols
2014 khmer protocols
 
Genome Assembly 2018
Genome Assembly 2018Genome Assembly 2018
Genome Assembly 2018
 
(Slides) A Method for Distributed Computaion of Semi-Optimal Multicast Tree i...
(Slides) A Method for Distributed Computaion of Semi-Optimal Multicast Tree i...(Slides) A Method for Distributed Computaion of Semi-Optimal Multicast Tree i...
(Slides) A Method for Distributed Computaion of Semi-Optimal Multicast Tree i...
 
MSR 2009
MSR 2009MSR 2009
MSR 2009
 
BioSB meeting 2015
BioSB meeting 2015BioSB meeting 2015
BioSB meeting 2015
 
Ramorum2016 final
Ramorum2016 finalRamorum2016 final
Ramorum2016 final
 
BioBankCloud: Machine Learning on Genomics + GA4GH @ Med at Scale
BioBankCloud: Machine Learning on Genomics + GA4GH  @ Med at ScaleBioBankCloud: Machine Learning on Genomics + GA4GH  @ Med at Scale
BioBankCloud: Machine Learning on Genomics + GA4GH @ Med at Scale
 
2013 pag-equine-workshop
2013 pag-equine-workshop2013 pag-equine-workshop
2013 pag-equine-workshop
 
Final doc of dna
Final  doc of dnaFinal  doc of dna
Final doc of dna
 
User biglm
User biglmUser biglm
User biglm
 

Recently uploaded

VIRUSES structure and classification ppt by Dr.Prince C P
VIRUSES structure and classification ppt by Dr.Prince C PVIRUSES structure and classification ppt by Dr.Prince C P
VIRUSES structure and classification ppt by Dr.Prince C PPRINCE C P
 
Call Girls in Munirka Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
Call Girls in Munirka Delhi 💯Call Us 🔝9953322196🔝 💯Escort.Call Girls in Munirka Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
Call Girls in Munirka Delhi 💯Call Us 🔝9953322196🔝 💯Escort.aasikanpl
 
Isotopic evidence of long-lived volcanism on Io
Isotopic evidence of long-lived volcanism on IoIsotopic evidence of long-lived volcanism on Io
Isotopic evidence of long-lived volcanism on IoSérgio Sacani
 
Spermiogenesis or Spermateleosis or metamorphosis of spermatid
Spermiogenesis or Spermateleosis or metamorphosis of spermatidSpermiogenesis or Spermateleosis or metamorphosis of spermatid
Spermiogenesis or Spermateleosis or metamorphosis of spermatidSarthak Sekhar Mondal
 
Hubble Asteroid Hunter III. Physical properties of newly found asteroids
Hubble Asteroid Hunter III. Physical properties of newly found asteroidsHubble Asteroid Hunter III. Physical properties of newly found asteroids
Hubble Asteroid Hunter III. Physical properties of newly found asteroidsSérgio Sacani
 
Animal Communication- Auditory and Visual.pptx
Animal Communication- Auditory and Visual.pptxAnimal Communication- Auditory and Visual.pptx
Animal Communication- Auditory and Visual.pptxUmerFayaz5
 
Call Girls in Munirka Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Munirka Delhi 💯Call Us 🔝8264348440🔝Call Girls in Munirka Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Munirka Delhi 💯Call Us 🔝8264348440🔝soniya singh
 
Unlocking the Potential: Deep dive into ocean of Ceramic Magnets.pptx
Unlocking  the Potential: Deep dive into ocean of Ceramic Magnets.pptxUnlocking  the Potential: Deep dive into ocean of Ceramic Magnets.pptx
Unlocking the Potential: Deep dive into ocean of Ceramic Magnets.pptxanandsmhk
 
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43b
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43bNightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43b
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43bSérgio Sacani
 
Artificial Intelligence In Microbiology by Dr. Prince C P
Artificial Intelligence In Microbiology by Dr. Prince C PArtificial Intelligence In Microbiology by Dr. Prince C P
Artificial Intelligence In Microbiology by Dr. Prince C PPRINCE C P
 
Call Girls in Mayapuri Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
Call Girls in Mayapuri Delhi 💯Call Us 🔝9953322196🔝 💯Escort.Call Girls in Mayapuri Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
Call Girls in Mayapuri Delhi 💯Call Us 🔝9953322196🔝 💯Escort.aasikanpl
 
SOLUBLE PATTERN RECOGNITION RECEPTORS.pptx
SOLUBLE PATTERN RECOGNITION RECEPTORS.pptxSOLUBLE PATTERN RECOGNITION RECEPTORS.pptx
SOLUBLE PATTERN RECOGNITION RECEPTORS.pptxkessiyaTpeter
 
PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...
PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...
PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...Sérgio Sacani
 
Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...
Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...
Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...anilsa9823
 
Bentham & Hooker's Classification. along with the merits and demerits of the ...
Bentham & Hooker's Classification. along with the merits and demerits of the ...Bentham & Hooker's Classification. along with the merits and demerits of the ...
Bentham & Hooker's Classification. along with the merits and demerits of the ...Nistarini College, Purulia (W.B) India
 
Luciferase in rDNA technology (biotechnology).pptx
Luciferase in rDNA technology (biotechnology).pptxLuciferase in rDNA technology (biotechnology).pptx
Luciferase in rDNA technology (biotechnology).pptxAleenaTreesaSaji
 
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...Lokesh Kothari
 
G9 Science Q4- Week 1-2 Projectile Motion.ppt
G9 Science Q4- Week 1-2 Projectile Motion.pptG9 Science Q4- Week 1-2 Projectile Motion.ppt
G9 Science Q4- Week 1-2 Projectile Motion.pptMAESTRELLAMesa2
 

Recently uploaded (20)

VIRUSES structure and classification ppt by Dr.Prince C P
VIRUSES structure and classification ppt by Dr.Prince C PVIRUSES structure and classification ppt by Dr.Prince C P
VIRUSES structure and classification ppt by Dr.Prince C P
 
The Philosophy of Science
The Philosophy of ScienceThe Philosophy of Science
The Philosophy of Science
 
Call Girls in Munirka Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
Call Girls in Munirka Delhi 💯Call Us 🔝9953322196🔝 💯Escort.Call Girls in Munirka Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
Call Girls in Munirka Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
 
Isotopic evidence of long-lived volcanism on Io
Isotopic evidence of long-lived volcanism on IoIsotopic evidence of long-lived volcanism on Io
Isotopic evidence of long-lived volcanism on Io
 
Spermiogenesis or Spermateleosis or metamorphosis of spermatid
Spermiogenesis or Spermateleosis or metamorphosis of spermatidSpermiogenesis or Spermateleosis or metamorphosis of spermatid
Spermiogenesis or Spermateleosis or metamorphosis of spermatid
 
Hubble Asteroid Hunter III. Physical properties of newly found asteroids
Hubble Asteroid Hunter III. Physical properties of newly found asteroidsHubble Asteroid Hunter III. Physical properties of newly found asteroids
Hubble Asteroid Hunter III. Physical properties of newly found asteroids
 
Animal Communication- Auditory and Visual.pptx
Animal Communication- Auditory and Visual.pptxAnimal Communication- Auditory and Visual.pptx
Animal Communication- Auditory and Visual.pptx
 
Call Girls in Munirka Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Munirka Delhi 💯Call Us 🔝8264348440🔝Call Girls in Munirka Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Munirka Delhi 💯Call Us 🔝8264348440🔝
 
Unlocking the Potential: Deep dive into ocean of Ceramic Magnets.pptx
Unlocking  the Potential: Deep dive into ocean of Ceramic Magnets.pptxUnlocking  the Potential: Deep dive into ocean of Ceramic Magnets.pptx
Unlocking the Potential: Deep dive into ocean of Ceramic Magnets.pptx
 
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43b
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43bNightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43b
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43b
 
Artificial Intelligence In Microbiology by Dr. Prince C P
Artificial Intelligence In Microbiology by Dr. Prince C PArtificial Intelligence In Microbiology by Dr. Prince C P
Artificial Intelligence In Microbiology by Dr. Prince C P
 
Call Girls in Mayapuri Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
Call Girls in Mayapuri Delhi 💯Call Us 🔝9953322196🔝 💯Escort.Call Girls in Mayapuri Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
Call Girls in Mayapuri Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
 
Engler and Prantl system of classification in plant taxonomy
Engler and Prantl system of classification in plant taxonomyEngler and Prantl system of classification in plant taxonomy
Engler and Prantl system of classification in plant taxonomy
 
SOLUBLE PATTERN RECOGNITION RECEPTORS.pptx
SOLUBLE PATTERN RECOGNITION RECEPTORS.pptxSOLUBLE PATTERN RECOGNITION RECEPTORS.pptx
SOLUBLE PATTERN RECOGNITION RECEPTORS.pptx
 
PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...
PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...
PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...
 
Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...
Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...
Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...
 
Bentham & Hooker's Classification. along with the merits and demerits of the ...
Bentham & Hooker's Classification. along with the merits and demerits of the ...Bentham & Hooker's Classification. along with the merits and demerits of the ...
Bentham & Hooker's Classification. along with the merits and demerits of the ...
 
Luciferase in rDNA technology (biotechnology).pptx
Luciferase in rDNA technology (biotechnology).pptxLuciferase in rDNA technology (biotechnology).pptx
Luciferase in rDNA technology (biotechnology).pptx
 
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
 
G9 Science Q4- Week 1-2 Projectile Motion.ppt
G9 Science Q4- Week 1-2 Projectile Motion.pptG9 Science Q4- Week 1-2 Projectile Motion.ppt
G9 Science Q4- Week 1-2 Projectile Motion.ppt
 

Towards Ultra-Large-Scale System: Design of Scalable Software and Next-Gen HPC Cluster for Big Data Genome Analysis

  • 1.
  • 2. Pictures from: https://www.slideshare.net/AdrinBezOrtega/genome-big-data, http://astro-icore.phys.huji.ac.il/node/70, https://timesofindia.indiatimes.com/city/pune/ligo-observatorys-work-to-begin-in-2018-land-acquisition- underway/articleshow/60882081.cms, https://www.yumpu.com/en/document/view/3703243/large-geoscience-databases-big-data
  • 3.  Introduction  Big Data Genome analysis  Big DataAnalysis Framework  Big data application and genome sequence  De novo Genome assembly  De novo Genomic error correction  Big data cyberinfrastructure  Evaluation of different cluster  Model for optimally balanced cluster
  • 4.  Introduction  Big Data Genome analysis  Big DataAnalysis Framework  Big data application and genome sequence  De novo Genome assembly  De novo Genomic error correction  Big data cyberinfrastructure  Evaluation of different cluster  Model for optimally balanced cluster
  • 5. $1,000 $10,000 $100,000 $1,000,000 $10,000,000 $100,000,000 Sep-01 Sep-02 Sep-03 Sep-04 Sep-05 Sep-06 Sep-07 Sep-08 Sep-09 Sep-10 Sep-11 Sep-12 Sep-13 Sep-14 Sep-15 Cost per Genome Moore’s Law Genome Short reads (TBs) Reconstructed genome (few MB/GB)NGS HPC  NGS technologies  Outpaced Moore’s Law  Software Challenges  Extreme scalabality  Algorithmic complexity  HPC Platform Challenges  More compute cycles  Extreme I/O performance  Huge Storage Space I/O, Compute and memory- intensive application Data collected from https://www.genome.gov/sequencingcosts/
  • 6.  MapReduce  Hadoop  Vertex-centric graph Processing  Giraph,GraphX  Distributed NoSQL  In-memory: Hazelcast, Redis  Disk-based: Hbase, MongoDB
  • 7.  Cost decreases  Bandwidth increases 1.00E+10 1.00E+11 1.00E+12 1.00E+13 1.00E+14 1.00E+15 1.00E+16 1.00E+17 1993 1994 1997 2000 2003 2005 2007 2009 2011 2012 Increase in FLOPS of fastest supercomputer 1 10 100 1000 10000 1995 1997 1999 2001 2003 2005 2007 2009 2011 Increase in Bandwidth (MB/s) for storage and network I/O bandwidth per device Network bandwidth per cable  Hardware evolution  Processor  Storage  Network
  • 8.  Introduction  Big Data Genome analysis  Big DataAnalysis Framework  Big data application and genome sequence  De novo Genome assembly  De novo Genomic error correction  Big data cyberinfrastructure  Evaluation of different cluster  Model for optimally balanced cluster
  • 9. Big Data Genome Assembly
  • 10. Picture from http://www.slideshare.net/torstenseemann/approaches-to-analysing-1000s-of-bacterial-isolates-iceid-2015- atlanta-usa-mon-24-aug-2015  High throughput sequencing machine  High coverage to get desired accuracy  Complex algorithms  Data, Compute, and Memory intensive application
  • 11.  Like restoring a damaged book from multiple copies torn at random places  The problem can be mapped as a graph analytic problem: De Bruijn Graph
  • 12.
  • 13.
  • 14.  Modified version of parallel list ranking algorithm  Mark head (h) and tail (t) and merge the h-t link  Number of rounds: O(log |n|) [where, n: #vertices in the longest path] Round #1 Round #2
  • 15. Round #3  Tips:Vertices with only one incoming edge, no outgoing edge and length < 100bp
  • 17. Round #6  Bubbles:Vertices with same predecessor and same successor  Levenshtein like edit distance algorithm is used  If the distance is less than a threshold, vertex with minimum frequency is removed
  • 19. Staphylococc us Aureus Rhodobactor Spharoides Human Chromosome (HCR 13) Yoruban Male Source GAGE GAGE GAGE SRA000271 Read size (bp) 255*106 410*106 5.9*109 141.5*109 Read length (bp) 37 & 101 101 101 101 Total Reads 4791974 4105236 59414772 2-Billion Ref. Genome Size 2871915 4603060 88289540 3.3*109 Dataset (GigaByte) 0.3 0.65 10.0 452.0 GAGE: Genome Assembly Goldstandard Evaluation (http://gage.cbcb.umd.edu/) NCBI Open dataset
  • 20. S. Sureus (325MB) GiGA ABySS Contrail # Contigs 298 300 309 Corrected NG50 26725 24819 31332 NG50 count 34 38 30 Max Contig size 95737 125049 95737 Misassembled contigs 0 4 0  GiGA vs ABySS: shows higher NG50, lower missassembled contigs  GiGA vs Contrail: almost comparable accuracy R.Spharoides (650MB) GiGA ABySS Contrail # Contigs 737 1912 727 Corrected NG50 10804 4215 11718 NG50 count 134 283 126 Max Contig size 65538 54734 51683 Missassembled contigs 1 78 1
  • 21. HCR13 (10GB) GiGA ABySS Contrail # Contigs 76049 51790 76209 Corrected NG50 658 1269 700 NG50 count 33271 16643 34223 Max Contig size 19446 30053 19321 Missassembled contigs 3 17 3  GiGA vs ABySS: shows higher NG50, lower missassembled contigs  GiGA vs Contrail: almost comparable accuracy
  • 22.  XSEDE resource LSU SuperMic HPC cluster is used Maximum #nodes 128 Cores/node 20 (Two 10-core Intel IvyBridge) DRAM/node 64GB Disk/node 250GB (Hard disk drive) Network 56Gbps InfiniBand
  • 23.
  • 24.  ABySS processes failed many times for network issues  Contrail, being disk-based took more than maximum allocated time for a single job GiGA ABySS Contrail # Contigs 3032297 - - NG50 827 - - Max Contig size 35465 - - # Cores 512 - - Time (hour) 8.5 Failed Failed
  • 26.  Higher read length  Better genome finishing  More complete genome assembly  Higher error rate  Higher cost Illumina short reads PacBio long reads Platform characteristics High throughput SMRT Read length 100 – 250bp 5kbp – 20kbp Cost $0.03/mbp $0.30/mbp Error rate 1-2% 10-15% Error characteristics Substitution InDel
  • 27.  De Bruijn graph-based method  More scalable than overlap-based method  Widest path algorithm provides accuracy
  • 28.  Theory behind using the widest path algorithm Assume K-mer coverage as random variable IID (F(x)) Theory of minimum probability distribution (1-(1-F(x))n) Proof:The probability of the minimum coverage k-mer is highest in the erroneous read given many reads sequenced the same region of the genome
  • 29.  There may be many error k-mers with high coverage  But the probability of finding minimum coverage is significantly higher in the error path  Hence, a widest path algorithm is used to select the correct path
  • 30.  Hadoop (MapReduce): for computation  Hazelcast (In-memory NoSQL): for de Bruijn graph storage
  • 31.  Map: Emits three k-mers  First k-mer: incoming edge  Middle k-mer: vertex  Third k-mer: outgoing edge  Coverage of middle: 1  Reduce  Group by vertex  Aggregate incoming edges  Aggregate outgoing edges  Sum coverages
  • 32.  Hadoop with Hazelcast  Error detection  Hadoop Map-only job  k-mer coverage < threshold  Error k-mer Millions of Searches over the entire dataset
  • 33.  Widest path algorithm  Maximize the minimum k-mer coverage in the path of the de Bruijn graph  Modified version of the Dijkstra’s Algorithm  Similar time complexity
  • 34. PacBio Data #Reads Data Size (GB) Read length %Reads Aligned E. coli 1129576 1.032 1120 78.97 Yeast 2315594 0.53 5874 82.12 Fruit fly 6701498 55 4328 51.14 Human 23897260 312 6587 72.3 Illumina Data #Reads Data Size (GB) Read length %Reads Aligned E. coli 45440200 13.50 101 99.44 Yeast 4503422 1.20 101 93.75 Fruit fly 179363706 59 101 95.56 Human 1420689270 452 101 79.60
  • 35.  %Read aligned: Percentage of corrected long reads and the base pairs aligned to the reference genome  %ReadsAligned = AlignedReads /TotalReads * 100  %Base pairs aligned: Percentage of base pairs (of total base pairs) of corrected long reads and the aligned to the reference genome  %BasePairAligned = AlignedBases /TotalBases * 100
  • 36.  Widest path (WP): Select the path in the de Bruijn graph which maximizes the minimum k-mer coverage  Leverages the coverage information while correcting the error  Dijkstra’s shortest path (SP): Select the shortest path without taking any coverage information  Coverage information is used only when the de Bruijn graph is constructed  K-mers below a threshold is removed from the graph  1-step Greedy (Gr): Select the successor k-mer with highest coverage  High chance of selecting the wrong path  Stopped after a predefined number of hops
  • 37.  Widest path shows the best performance  Greedy algorithm shows the worst performance  K is set to 15 Data Algorithm %Read aligned %Base pair aligned E. coli ParLECHWP 93.69 92.15 ParLECH sp 87.55 86.49 ParLECH Gr 76.68 70.92 Yeast ParLECHWP 86.07 89.31 ParLECH sp 84.92 86.44 ParLECH Gr 75.77 74.68 Fruit fly ParLECHWP 65.92 62.42 ParLECH sp 54.53 49.41 ParLECH Gr 43.97 37.44
  • 38.  ParLECH aligned more reads and basepairs to the reference genome comparing to LoRDEC  K is set to 15 Data Algorithm %Read aligned %Base pair aligned E. coli ParLECH 93.69 92.15 LoRDEC 87.55 86.49 Original 78.97 75.07 Yeast ParLECHWP 86.07 89.31 LoRDEC 84.92 87.08 Original 82.12 88.69 Fruit fly ParLECHWP 65.92 62.42 LoRDEC 54.53 49.69 Original 51.14 46.04
  • 39.  XSEDE resource LSU SuperMic HPC cluster is used Maximum #nodes 128 Cores/node 20 (Two 10-core Intel IvyBridge) DRAM/node 64GB Disk/node 250GB (Hard disk drive) Network 56Gbps InfiniBand
  • 40.  LoRDEC performs better in single node  ParLECH outperforms when multiple nodes are added 1 2 4 8 16 32 #Nodes Executiontime(min) 01020304050 ParLECH LoRDEC
  • 41.  Almost linear scalability 16 32 64 128 1020501002005002000 Number of Nodes in log scale Executiontimeinlogscale(min) KmerCount LocateError CorrectError Total
  • 42.  A total of 764GB data is processed  Appreciable accuracy  LoRDEC could not process  Could not produce the de Bruijn graph PacBio data size 312GB Illumina data size 452GB #nodes used 128 k 17 Time 28.6 hours %Read aligned 78.3 %base pair aligned 75.43
  • 43.  Desired software characteristics for big data genome analysis  Distributed  Scalable  Low cost  Consider data locality  Capable to work on commodity hardware  Develop algorithms using of big data analytics model  Better performance than other MPI-based software on traditional HPC environment Can we get better performance by changing the hardware infrastructure?
  • 44.  Introduction  Big Data Genome analysis  Big DataAnalysis Framework  Big data application and genome sequence  De novo Genome assembly  De novo Genomic error correction  Big data cyberinfrastructure  Evaluation of different cluster  Model for optimally balanced cluster
  • 46.
  • 47.  Network issues  Fat tree architecture with Blocking (2:1)  Low effective bandwidth  Current programming models needs bandwidth  Storage issues  Fewer directly attached device (normally hard disk drive)  Low I/O bandwidth  Big data job becomes I/O bound  Memory issues  Low RAM per core  Significant tradeoff between the degree of data- parallelism and memory requirement  Low buffer size increases data spilling to disks  Causes significant performance drop with HDD
  • 48. SuperMike II SwatIII- Basic SwatIII- Basic-SSD SwatIII Memory SwatIII- Scaleup SwatIII- Medium Cluster Category HPC- Cluster Scaled out datacenter Scaled out datacenter Memory Optimized datacenter Scaled up datacenter Medium sized datacenter #Pcores/n ode 16 16 16 16 16 16 DRAM(GB) /node 32 32 32 256 256 64 #Disks/no de 1-HDD 1-HDD 1-SSD 1-SSD 7HDD/SSD 2HDD/SSD Network 40Gbps QDR IB 10Gbps Eth. 10Gbps Eth. 10Gbps Eth. 10Gbps Eth. 10Gbps Eth.
  • 49.
  • 50. Bumble bee Job type Input Final output #Jobs Shuffled data HDFS data Graph construc tion Hadoop 90GB (500M reads) 95GB 2 2TB 136GB Graph simplific ation Series of Giraph jobs 95GB (715M vertices) 640MB (62K vertices) 15 - 966GB  #Nodes used:  SuperMikeII: 15  SwatIII-Basic-HDD/SSD: 15  SwatIII-Memory: 15 Human Job type Input Final output #Jobs Shuffled data HDFS data Graph construc tion Hadoop 452GB (2B reads) 3TB 2 9.9TB 3.2TB Graph simplific ation Series of Giraph jobs 3TB (1.5B vertices) 3.8GB (3M vertices) 15 - 4.1TB
  • 51. 0 0.5 1 1.5 Graph construction Graph simplification entire pipeline Executiontimenormalizedto SuperMikeII Assembly stagesAxis Effect of Network(InfiniBand vs Ethernet) to assemble 90GB Bumble Bee Genome SuperMikeII SwatIII-Basic-HDD  40Gbps IB + 2:1 blocking vs 10Gbps Eth. + no blocking  Similar performance  SSD vs HDD  Hadoop shows 50% improvement  256GB vs 32GB DRAM  Hadoop shows 70% and Giraph shows 35% improvement 1.012 1.033 1.025 0 0.5 1 1.5 Graph construction Graph simplification Entire pipeline Executiontimenormalizedto SuperMikeII Assembly stages Effect of storage type (HDD vs SSD) and size of RAM while assembling Bumble Bee Genome SuperMikeII SwatIII-Basic-SSD SwatIII-Memory 0.5 0.3 0.96 0.65 0.790.67
  • 52. 0 1 2 3 GraphConstruction GraphSimplification EntirePipeline Performance/$ normalizedto SuperMikeII Assembly stages Performance/$ with bumble bee (90GB) genome assembly 0 1 2 3 GraphConstruction GraphSimplification EntirePipeline Executiontime normalizedto SuperMikeII Execution time for 90GB bumble bee genome assembly  Scaled up cluster  More execution time  More Performance/$  HDD and SSD shows almost same execution time  HDD shows better Performance/$ than HDD
  • 53. 0 2 4 6 Graph construction Graph simplification Entire pipeline Performance/$ normalizedto SuperMikeII Performance/$ for human genome 0 0.5 1 1.5 Graph construction Graph simplification Entire pipeline Executiontime normalizedto SuperMikeII Execution time for human genome (452GB)  Fewer scaled up server  Better than traditional HPC cluster (3-4x benefit in performance/$)  HDD performs similar as SSD  HDD shows better performance/$ than SSD 1.006 1.128 0.898 1.023 0.999 1.077 3.17 4.36 3.88 4.79 3.65 4.21
  • 54.  1-SSD performs similar to 4-HDD  Disk controller saturates at ~500MB/s  Adding more disks (HDD/SSD) does not improve performance any more 0 1000 2000 3000 4000 5000 6000 7000 1HDD/DataNode 2HDD/DataNode 4HDD/DataNode 1SSD/DataNode 2SSD/DataNode Executiontime(s) #DAS/DN and type 5740 4429 3333 2939 2732
  • 55. 0 1 2 3 GraphConstruction GraphSimplification EntirePipeline Performance/$ normalizedto SuperMikeII Performance/$ with Bumble Bee Genome assembly  Hyperscale system prototype  32 low-power node: 2 cores, 1 SSD and 16GB RAM/node  10% better performance than SuperMikeII (16 cores, 1 HDD and 32GB RAM/node)  More than twice improvement in performance/$ 0.8 0.85 0.9 0.95 1 1.05 GraphConstruction GraphSimplification EntirePipeline Executiontime normalizedto SuperMikeII Execution time for 90GB Bumble Bee Genome Assembly 0.93 0.89 0.90 2.16 2.24 2.215
  • 56.  Increase compute bandwidth
         Power8 processor has 8-way SMT
         16 memory controllers
         Increase I/O bandwidth
         Many HDDs per node
         I/O and compute distributed across SMT threads
         Increase network bandwidth
         Clos topology with no blocking
  • 57.  Intel Knights Landing (KNL) cluster
         Low energy consumption
         Knights Landing processor with lower clock speed
         Increased compute and I/O parallelism
         4-way SMT (instead of 2-way hyperthreading)
         Non-volatile RAM: high-bandwidth flash memory
         Nvidia GPU cluster
         General-purpose GPU (GPGPU)
         Works in conjunction with Intel or IBM Power8 processors
         NVLink (high-speed connection between IBM Power processors and Nvidia GPUs)
  • 58.  Limitations in traditional HPC clusters and datacenters
         Network
         Storage
         Memory
         Huge tradeoff between performance and cost
         How can these observations be modeled to develop an optimal cluster architecture?
  • 59. A Theoretical Model to Build Cost-Effective Balanced HPC Infrastructure for Data-Driven Science
  • 60.  Amdahl's I/O number for a balanced system
         1 bit (0.125 bytes) of I/O per second per IPS
         Amdahl's memory number for a balanced system
         1 byte of memory per IPS
         Limitation: one-size-fits-all; does not consider the impact of the application's characteristics
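For concreteness: under these rules, a single 2.6GHz core (the core speed of the clusters evaluated later), assuming one instruction per cycle, would need

\[
0.125 \times 2.6\times10^{9}\ \text{B/s} = 325\ \text{MB/s of I/O}
\qquad\text{and}\qquad
1 \times 2.6\times10^{9}\ \text{B} = 2.6\ \text{GB of memory}
\]

to be balanced in Amdahl's sense.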
  • 61.  Modified Amdahl's I/O number
         8 MIPS per MBps of I/O
         Measured on the relevant application
         Modified Amdahl's memory number
         The MB/MIPS ratio is rising from 1 to 4
         Limitations
         Does not consider the cost component
         Observations only: no theoretical background
  • 62.  Modify Amdahl's I/O number: $\beta_{io}^{opt} = f_{io}(\gamma_{io}, \delta_{io})$
         Modify Amdahl's memory number: $\beta_{mem}^{opt} = f_{mem}(\gamma_{mem}, \delta_{mem})$
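Spelled out (restating the balance definitions from the speaker notes, with the units made explicit):

\[
\gamma_{io} = \frac{\text{required I/O bandwidth}}{\text{required CPU speed}},\qquad
\delta_{io} = \frac{\text{I/O cost per GBps}}{\text{CPU cost per GHz}},\qquad
\gamma_{mem} = \frac{\text{required memory}}{\text{required CPU speed}},\qquad
\delta_{mem} = \frac{\text{memory cost per GB}}{\text{CPU cost per GHz}}
\]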
  • 63.  Ignores overlap of work done by I/O and memory
         Ignores the CPU microarchitecture
         Considers the number of instructions executed per second proportional to the CPU core frequency (i.e., fixed IPC)
  • 64. [Derivation outline: take the cross product of the time spent at each hardware component and the cost of that component, rewrite it in terms of the balance ratios $\beta_{io}$ and $\beta_{mem}$, then set the partial derivatives of the price-to-performance ratio to zero]
  • 65. [Result: $\beta_{io}^{opt} = \sqrt{\gamma_{io}/\delta_{io}}$ and $\beta_{mem}^{opt} = \sqrt{\gamma_{mem}/\delta_{mem}}$]
  • 66. [Figure: model vs. Gray's law; the low price of disk pushes the optimal I/O number above Gray's value, while the high price of memory pushes the optimal memory number below it. Gray's law is the special case where the application balance is the inverse of the cost balance]
  • 67. [Worked example at current prices (amazon.com, newegg.com): for $\gamma_{io} = 1$, $\beta_{io}^{opt} = 0.17$; for $\gamma_{mem} = 1$, $\beta_{mem}^{opt} = 2.7$]
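A minimal sketch of how the square-root form on slide 65 falls out, with execution time and cost normalized to the CPU component (a reconstruction consistent with the derivation outline on slide 64; the paper's exact terms may differ):

\[
\min_{\beta_{io}}\;\underbrace{\Big(1 + \tfrac{\gamma_{io}}{\beta_{io}}\Big)}_{\text{execution time}}\cdot\underbrace{\big(1 + \delta_{io}\,\beta_{io}\big)}_{\text{system cost}}
\quad\Rightarrow\quad
\frac{\partial}{\partial\beta_{io}}\Big[\delta_{io}\,\beta_{io} + \frac{\gamma_{io}}{\beta_{io}}\Big] = \delta_{io} - \frac{\gamma_{io}}{\beta_{io}^{2}} = 0
\quad\Rightarrow\quad
\beta_{io}^{opt} = \sqrt{\frac{\gamma_{io}}{\delta_{io}}}
\]

Provisioning more I/O balance shrinks the I/O share of the runtime but inflates the hardware cost linearly; the product is minimized at the square root of the application balance over the cost balance. The memory term is symmetric.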
  • 68. Cluster | SuperMikeII | SwatIII | CeresII
        Processor | 2x 8-core Xeon | 2x 8-core Xeon | 1x 6-core Xeon
        CPU core speed | 2.6GHz | 2.6GHz | 2GHz
        #Cores/node | 16 | 16 | 6
        Total CPU speed/node | 41.6GHz | 41.6GHz | 12GHz
        #Disks/node | 1 HDD (SATA) | 4 HDD (SATA) | 1 SSD (NVMe)
        Seq. I/O bandwidth/disk | 0.15GBps | 0.15GBps | 2GBps
        Seq. I/O bandwidth/node | 0.15GBps | 0.60GBps | 2GBps
        DRAM/node | 32GB | 256GB | 64GB
        Max. nodes available | 128 | 16 | 40
  • 69. Cluster | SuperMikeII | SwatIII | CeresII
        Cluster type | Traditional HPC | Datacenter | MicroBrick
        β_io | 0.003 | 0.015 | 0.166
        β_mem | 0.77 | 6.15 | 5.33
        γ_io | 0.0005 | 0.01 | 1.03
        γ_mem | 0.06 | 1.47 | 1.25
        Optimized for | Only compute-intensive applications | Compute- and memory-intensive applications | I/O-, compute-, and memory-intensive applications
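The β rows above appear to follow directly from the per-node specs on slide 68, taking β_io as sequential I/O bandwidth (GBps) per GHz of aggregate CPU speed and β_mem as GB of DRAM per GHz. A quick sanity check in Python (a hypothetical script; the input values are copied from slide 68):

    clusters = {
        # name:        (seq. I/O GBps/node, DRAM GB/node, total CPU GHz/node)
        "SuperMikeII": (0.15,  32, 41.6),
        "SwatIII":     (0.60, 256, 41.6),
        "CeresII":     (2.00,  64, 12.0),
    }
    for name, (io_bw, dram, cpu) in clusters.items():
        print(f"{name}: beta_io = {io_bw / cpu:.4f}, beta_mem = {dram / cpu:.2f}")
    # SuperMikeII: beta_io = 0.0036, beta_mem = 0.77
    # SwatIII:     beta_io = 0.0144, beta_mem = 6.15
    # CeresII:     beta_io = 0.1667, beta_mem = 5.33

The printed values agree with the table to within rounding.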
  • 70. Application | Terasort | Wordcount | Genome Assembly Ph1 | Genome Assembly Ph2
        Job type | Hadoop | Hadoop | Hadoop | Giraph
        Input | 1TB | 1TB | 452GB (2bn short reads) | 3.2TB (1.5bn vertices)
        Output | 1TB | 1TB | 3TB | 3.8GB
        Shuffled data | 1TB | 1TB | 9.9TB | -
        Application characteristics | Map: CPU-intensive; Reduce: I/O-intensive | Map and Reduce: I/O- and CPU-intensive | Map and Reduce: CPU- and I/O-intensive | Memory-intensive
  • 71. [Bar chart: price-to-performance normalized to SuperMikeII for TeraSort and WordCount; SwatIII bars: 0.76, 0.79; CeresII bars: 0.37, 0.35]
         Lower is better (price-to-performance of SuperMikeII is taken as 1)
         CeresII vs. SuperMikeII: >65% improvement for both
         CeresII vs. SwatIII: >50% improvement for both
  • 72. [Bar chart: price-to-performance normalized to SuperMikeII for graph construction and graph simplification; SwatIII bars: 0.24, 0.22; CeresII bars: 0.12, 0.15]
         Lower is better (price-to-performance of SuperMikeII is taken as 1)
         CeresII vs. SuperMikeII: 88% and 85% improvement for phase-1 and phase-2 respectively
         CeresII vs. SwatIII: 50% and 20% improvement for phase-1 and phase-2 respectively
  • 73.  For data-driven applications at current hardware prices
         Amdahl's I/O number ($\beta_{io}^{opt}$) should be increased relative to Gray's law (from 0.125 to 0.17)
         Amdahl's memory number ($\beta_{mem}^{opt}$) should be decreased relative to Gray's law (from 4 to 2.7)
         For HPC clusters
         $\beta_{io}$ and $\beta_{mem}$ provide an easy-to-use alternative to FLOPS for I/O- and memory-bound applications
         Enables an informed choice among hardware components when investing in an HPC cluster, even when application characteristics are not known
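Reading those optima back through the model's square-root form gives a feel for the cost balances they imply (derived arithmetic, not figures from the paper):

\[
\beta_{io}^{opt} = \sqrt{\gamma_{io}/\delta_{io}} = 0.17,\ \gamma_{io}=1 \;\Rightarrow\; \delta_{io} = 1/0.17^{2} \approx 35,
\qquad
\beta_{mem}^{opt} = 2.7,\ \gamma_{mem}=1 \;\Rightarrow\; \delta_{mem} = 1/2.7^{2} \approx 0.14
\]

That is, at these prices a GBps of sequential I/O costs roughly 35x as much as a GHz of CPU, while a GB of DRAM costs about a seventh as much.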
  • 74.  Application of deep learning and AI methodologies to genomics
         Key-Value Memory Networks
         Metagenomic assembly and error correction
         Transferring big genomic data on blockchain
         Security of the sensitive data
         High throughput
         Current collaborations
         San Diego Supercomputing Center
         IBM OpenPower
  • 75.  Thanks to the faculty and staff of LSU and UW-Platteville
         Dr. Seung-Jong Park, Dr. Kisung Lee, Dr. Seungwon Yang, Dr. Jianhua Chen, Dr. Praveen Koppa, Dr. Sayan Goswami, Dr. Richard Platania, Dr. Chui-hui Chiu, Dipak Singh, Dr. Lisa Landgraf, etc.
         Samsung SSD team
        ▪ Jaeki Hong, Jay Seo, Jinki Kim, Wooseok Chang, etc.
         IBM Power8 and OpenPower team
        ▪ Terry Leatherland, Ravi Arimilli, Ganesan Narayanswami, etc.
         Other collaborators
        ▪ Dr. Ling Liu (GATECH)
         Bioscience experts
        ▪ Dr. Joohyun Kim, Dr. Nayong Kim, Dr. Maheshi Dassanayake, Dr. Dong-Ha Oh, etc.
  • 76.  This work was supported in part by
         NIH-P20GM103424
         NSF-MRI-1338051
         NSF-CC-NIE-1341008
         NSF-IBSS-L-1620451
         LA BoR LEQSF(2016-19)-RD-A-08
         The HPC services are provided by
         LSU HPC
         LONI
         Samsung Research S. Korea
         IBM Research Austin
  • 77.  "Developing a Meta Framework for Key-Value Memory Networks on HPC Clusters." Choonhan Youn, Arghya Kusum Das, Seungwon Yang, Joohyun Kim. PEARC 2019. (Collaborative work of UW-Platteville, LSU, and San Diego Supercomputing Center)
         "ParLECH: Parallel Long-read Error Correction with Hadoop." Arghya Kusum Das, Seung-Jong Park, Kisung Lee. IEEE BIBM 2018.
         "A High-Throughput Interoperability Architecture over Ethereum and Swarm for Big Biomedical Data." Arghya Kusum Das, Seung-Jong Park, Kisung Lee. IEEE CHASE 2018 (Blockchain Workshop).
         "Large-scale parallel genome assembler over cloud computing environment." Arghya Kusum Das, Praveen Kumar Koppa, Sayan Goswami, Richard Platania, Seung-Jong Park. JBCB, May 23, 2017 issue.
         "ParSECH: Parallel Sequencing Error Correction with Hadoop for Large-Scale Genome Sequences." Arghya Kusum Das, Shayan Shams, Sayan Goswami, Richard Platania, Kisung Lee, Seung-Jong Park. BiCOB 2017.
         "Lazer: A Memory-Efficient Framework for Large-Scale Genome Assembly." Sayan Goswami, Arghya Kusum Das, Richard Platania, Kisung Lee, Seung-Jong Park. IEEE Big Data 2016.
  • 78.  "Evaluating Different Distributed-Cyber-Infrastructure for Data and Compute Intensive Scientific Application." Arghya Kusum Das, Jaeki Hong, Sayan Goswami, Richard Platania, Wooseok Chang, Seung-Jong Park. IEEE Big Data 2015. [In collaboration with SAMSUNG Electronics Ltd., S. Korea]
         "Augmenting Amdahl's Second Law: A Theoretical Model for Cost-Effective Balanced HPC Infrastructure for Data-Driven Science." Arghya Kusum Das, Jaeki Hong, Sayan Goswami, Richard Platania, Kisung Lee, Wooseok Chang, Seung-Jong Park. IEEE Cloud 2017. [In collaboration with SAMSUNG Electronics Ltd., S. Korea]
         "IBM POWER8® HPC System Accelerates Genomics Analysis with SMT8 Multithreading." Arghya Kusum Das, Sayan Goswami, Richard Platania, Seung-Jong Park, Ram Ramanujam, Gus Kousoulas, Frank Lee, Ravi Arimilli, Terry Leatherland, Joana Wong, John Simpson, Grace Liu, Jinchun Wang. Dynamic white paper for the Louisiana State University collaboration with IBM.
         "BIC-LSU: Big Data Research Integration with Cyberinfrastructure for LSU." Chiu, Chui-hui, Nathan Lewis, Dipak Kumar Singh, Arghya Kusum Das, Mohammad M. Jalazai, Richard Platania, Sayan Goswami, Kisung Lee, and Seung-Jong Park. XSEDE 2016.

Editor's Notes

  1. Good morning everybody. I am Arghya Kusum Das from CCT, LSU. Today I am going to present our paper, "Augmenting Amdahl's Second Law: A Theoretical Model to Build Cost-Effective Balanced HPC Infrastructure for Data-Driven Science."
  4. The most popular and effective law was proposed by the computer scientist Gene Amdahl in the 1960s, who stated that a balanced system needs one bit of I/O per second per CPU instruction per second. This is known as Amdahl's I/O number. Regarding memory, he stated that a balanced system needs one byte of memory per CPU instruction per second. This is known as Amdahl's memory number. The major limitation of the law is that it proposes a one-size-fits-all type of design, which does not consider the impact of application characteristics, and those characteristics are changing frequently nowadays.
  5. To address this limitation, the computer scientist Jim Gray modified the original law. He kept the I/O number the same as in the original law but stated that it should be measured on the relevant application. Regarding the memory number, he observed that it is rising from 1 to 4. Although Jim Gray considered the application characteristics for the I/O number, he did not consider the cost component. Furthermore, the memory number is simply an observation which does not have any theoretical background.
  6. This slide shows the concrete problem definition. We need to modify Amdahl's I/O and memory numbers, that is, beta-io-opt and beta-mem-opt, as functions of the application balance and the cost balance. The application balance is a measurement of whether the application is I/O-intensive, CPU-intensive, or memory-intensive; basically, it is the ratio between the required I/O bandwidth and the required CPU speed, or the ratio between the required memory and the required CPU speed. On the other hand, the cost balances are the ratios between the I/O cost per GBps (or the memory cost per GB) and the CPU cost per GHz.
  7. This slide shows the model's assumptions. First, the model is additive in nature, that is, it ignores any overlap between I/O and memory operations. This is a valid assumption because when a CPU is busy doing I/O, it does not do any memory operation. Second, it ignores the CPU microarchitecture, which means it considers the number of instructions executed per second as proportional to the CPU core frequency. [Microarchitecture is likewise ignored in Alex Szalay's paper on Amdahl-balanced blades. Ref: Szalay, Alexander S., et al. "Low-power Amdahl-balanced blades for data intensive computing." ACM SIGOPS Operating Systems Review 44.1 (2010): 71-75.]
  8. While deriving the model, we first take the cross product of the time spent at each hardware component and the cost of the corresponding hardware component. Then we simply rewrite the resulting cross product in terms of the balance terminology, that is, beta-io and beta-mem. After that, we take the partial derivative of the price-to-performance ratio with respect to beta-io and beta-mem. Since our motivation is to minimize the price-to-performance, we solve that equation for zero.
  9. As the outcome, we get two modified Amdahl's numbers. According to our model, Amdahl's I/O number is the square root of the ratio of the application balance between I/O and CPU and the cost balance between the disk and the CPU. On the other hand, Amdahl's memory number is the square root of the ratio of the application balance between memory and CPU and the cost balance between the memory and the CPU.
  10. The actual implication of the model lies in its consideration of the cost component. As can be seen in the figure, considering the lower cost of disk, the model produces a higher value for Amdahl's I/O number compared to Gray's amendment. On the other hand, considering the higher cost of memory, it produces a lower value for Amdahl's memory number compared to Gray's law. In our model, Gray's law is a special case in which the application's resource requirement exactly compensates the corresponding hardware cost, that is, the application balance is the inverse of the cost balance. Also, using these figures you can easily say which system architecture is optimized for which type of application, or, the other way around, which type of application should perform best on which type of architecture. We will use these characteristics for cluster classification later in the presentation.
  11. This slide shows an example in the current scenario, where we analyzed amazon.com, newegg.com, etc. to get the average price of the different modules of the system, that is, I/O and CPU, and then directly fed those into our model. For an I/O- and compute-intensive application, which means gamma-io equals one, our modified Amdahl's I/O number comes out as 0.17. Similarly, considering a memory- and compute-intensive application, that is, gamma-mem equals one, we calculated Amdahl's memory number as 2.7. The price table for some hardware is given in the paper (Table II). Practical implication of the application balance: (gamma_io or gamma_mem) = (data read from I/O or memory) / (instructions per second).
  12. We have used three different types of clusters for this work. The first is SuperMikeII, which has 16 Xeon processing cores, one hard disk drive producing only 0.15GBps of bandwidth, and 32GB of RAM per node. The second cluster is SwatIII, which has the same processor configuration as SuperMikeII but four times more I/O bandwidth and eight times more memory. The third one is CeresII, which is a Samsung MicroBrick-based cluster powered by NVMe SSDs. Each node of this cluster has only 6 Xeon cores and 64GB of memory; however, because of the NVMe SSD, each node has 2GBps of I/O bandwidth, which is much higher than the other two clusters.
  13. Based on these configurations, that is, the I/O bandwidth, memory, and CPU speed, we calculated the beta-io and beta-mem of all of these clusters. As can be seen, CeresII's beta-io is almost equal to the optimum produced by our model, whereas SuperMikeII shows an extremely low value for it. SwatIII lies in between these two. In terms of memory, again SuperMikeII shows a very low value and SwatIII shows a very high value. Now, using the curve shown earlier (that is, the beta versus gamma plot), we can easily determine which kind of application each architecture is optimized for. This way, SuperMikeII can be classified as a traditional HPC cluster which is optimized only for compute-intensive applications, with very low values of gamma-io and gamma-mem. SwatIII can be classified as a regular datacenter which is optimized for both compute- and memory-intensive applications. CeresII, on the other hand, even with the lowest processing speed per node, turned out to be the best for all I/O-, compute-, and memory-intensive applications. So for today's Hadoop-based scientific applications, CeresII is expected to show much better performance.
  14. To prove that, we used three different types of benchmarks. The first two are the very common TeraSort and WordCount. TeraSort has a CPU-intensive map phase and then an I/O-intensive reduce phase. WordCount has both CPU- and I/O-intensive map and reduce phases depending on the data size. The third benchmark is a genome assembly application developed by us using Hadoop and Giraph. The first phase of the assembler is a shuffle-intensive Hadoop job producing almost 10TB of shuffled data, and the second phase is a memory-intensive Giraph job which processes a 3.2TB graph.
  15. This slide compares the different clusters. As can be seen, CeresII shows more than 65% improvement over SuperMikeII for both TeraSort and WordCount. Compared to SwatIII, CeresII shows more than 50% benefit for both applications.
  16. This slide compares the price-to-performance of the different clusters for human genome assembly. CeresII shows more than 85% benefit for both phases of the assembly compared to SuperMikeII, which is optimized for compute-intensive applications only. Compared to SwatIII, it shows 50% benefit in the Hadoop phase and 20% benefit in the Giraph phase.
  17. Now it is the last part of our presentation. In this work, we have provided a theoretical background for Amdahl's I/O number and memory number and modified them based on the application characteristics and the hardware price trend. According to our observation, Amdahl's I/O number should be increased compared to Gray's law because of the low price of disks. On the other hand, Amdahl's memory number should be decreased compared to Gray's law, as the memory price is high. The model also provides an easy-to-use alternative to FLOPS for expressing the capability of HPC clusters, with better expressive power for I/O- and memory-bound applications. In this work our focus was on simplicity so that the model can be used by system designers; however, many subtle parameters like CPU multithreading, I/O latency, etc. can be added to improve its accuracy.