SlideShare a Scribd company logo
1 of 76
Download to read offline
Operated by Los Alamos National Security, LLC for the U.S. Department of Energy’s NNSA U N C L A S S I F I E D – LA-UR-16-20235
Programming HPC systems is hard today and will only get
harder without new approaches
n  Aspects of programming future large-scale systems
•  Focusing on full-system, data-awareness and improved productivity
—  Programming in the small is not the challenge, we must take a broader view
—  Co-designing the full tool chain with applications
—  End-to-end awareness is required to avoid point solutions “non-solutions”
•  Targeting large scale dynamic computation environments
—  Hardware dynamics: frequency scaling, dark silicon, adaptive routing, …
—  Software dynamics: system services, in-situ/co-resident services, …
—  Application dynamics: multiscale andmultiphysics
n  Programming model goals (What must we deliver?)
•  High performance – we must be fast
•  Performance portability – across many kinds of machines over many generations
•  Programmability – sequential semantics, parallel execution
Operated by Los Alamos National Security, LLC for the U.S. Department of Energy’s NNSA U N C L A S S I F I E D – LA-UR-16-20235
Can we fulfill our programming model goals today?
We can, to some degree…
… but at great cost: Programmer Pain
Task graph for one time step on one node…
… of a mini-app
Do you want to schedule this graph?
(High Performance)
Do you want to re-schedule this graph
for every new machine?
(Performance Portability)
Do you want to be responsible
for generating this graph?
(Programmability)
Today: programmer’s responsibility
Tomorrow: programming system’s
responsibility
Operated by Los Alamos National Security, LLC for the U.S. Department of Energy’s NNSA U N C L A S S I F I E D – LA-UR-16-20235
§ AsyncRecv(X);
§ DoWork(Y);
§ Sync();
§ F(X);
  How	much	work	should	I	do?		
  Is	this	performance	portable?	
  When	does	forward	progress	really	occur?	
  What	if	I	have	more	work	and	data	
movement	happening	in	DoWork?	
  What	resources	are	in	use?	Where	is	the	
data?	Who	is	using	it	and	how?	
  Is	this	modular?	
  What	if	there	is	a	fault?	
§ 	Concept	from:		Mike	Bauer’s	Thesis	(Stanford),	
Legion:		Programming	Distributed	Heterogeneous	Architectures	with	Logical	Regions	
We need the right programming abstractions to
achieve our goals
§ 		Today’s	programming	environments:	
•  Are	using	the	wrong	abstrac:ons…	
•  Focus	on	control	flow,	parallelism	
and	have	low-level	data	abstrac:ons	
(i.e.	no	data	model)
Operated by Los Alamos National Security, LLC for the U.S. Department of Energy’s NNSA U N C L A S S I F I E D – LA-UR-16-20235
Our approach to meeting our programing model
goals is different
Functionally correct
application code
Mapping to target
machine
Extraction of
parallelism
Management of
data transfers
Task scheduling and
Latency hiding
Data-Dependent
Behavior
Compiler/
Runtime
understanding of
data
Legion Programs Legion Mappers
Legion
Programming
System
Los Alamos National Laboratory
2/8/17 | 6
Legion: Separation of Concerns
Tasks
(execution model)
Describe parallel execution
elements and algorithmic
operations with sequential
semantics, out-of-order execution
Regions
(data model)
Describe decomposition of
computational domain.
•  Privileges (read-write, read-only, reduce)
•  Coherence (exclusive, atomic)
Mapper Describes how tasks and regions
should be mapped to the target
architecture
Mapper allows architecture-specific optimization without
effecting the correctness of the task or domain descriptions
[=](int i) { rho(i) = … }
rho0
rho1
Mapper
Operated by Los Alamos National Security, LLC for the U.S. Department of Energy’s NNSA U N C L A S S I F I E D – LA-UR-16-20235
The Legion programming model: enabling the runtime to
manage the complexity of a dynamic environment
Legion
Program
Legion Legion
Mapper
Machine-Independent
Specification
Defines application
correctness
Sequential semantics
Programmatic interface for performing
machine-specific and app-specific
mapping
Only impacts performance
Yesterday: manual mapping
Today: programmatic mapping
Tomorrow: generated mappers
Analysis!
Slide 8
Data Model – Logical Regions
•  Unbounded set of rows (index space)
•  Bounded set of columns (fields)
•  Can be partitioned (disjoint or aliased)
•  Tasks operate on regions
–  Must specify which fields and how they
“use” them (fields and privileges: read,
write, read+write, exclusive, etc.)
–  Allows fields to be “sliced”
•  Tasks launched in program order
(execution order relaxed based on
dependencies)
–  “out of order” software processor
Index 0
Field 1 Field 2 Field 3 Field 4
Index 1
Index 2
Index 3
Index 4
Index 5
Index 6
Index 7
Index 8
…
Operated by Los Alamos National Security, LLC for the U.S. Department of Energy’s NNSA U N C L A S S I F I E D – LA-UR-16-20235
Our approach shows compelling results
with a full application
•  High-level application constructs remain
in MPI form… - essentially all the initial
setup was kept
•  Full right hand side function
implemented in Legion including all
communication (the full stencil
computation and all chemistry
calculations)
•  Ported production combustion
application (S3D)
§  Sufficient complexity to validate our
approach – beyond the “Proxy”
n ~2-3X faster than MPI+OpenACC
n ~7X faster than MPI only
n Enabling new science that is not
possible with current approaches
1 2 4 8 16 32 64 128 256 512 1024 2048 4096 8192
Nodes
0
20000
40000
60000
80000
100000
120000
140000
160000
Throughput(Points/s)
Legion
MPI+OpenACC
1.75
X
2.85X
7.2X
Legion & Regent
ATPESC 2017 10
•  Legion is
–  a C++ runtime
–  a programming model
•  Regent is a programming language
–  For the Legion programming model
–  Current implementation is embedded in Lua
–  Has an optimizing compiler
•  This tutorial will focus on Regent
Regent/Legion Design Goals
ATPESC 2017 11
•  Sequential semantics
–  The better to understand what you write
–  Parallelism is extracted automatically
•  Throughput-oriented
–  The latency of a single thread/process is (mostly) irrelevant
–  The overall time is what matters
•  Runtime decision making
–  Because machines are unpredictable/dynamic
Throughput-Oriented
ATPESC 2017 12
•  Keep the machine busy
•  How? Ideally,
–  Every core has a queue of independent work to do
–  Every memory unit has a queue of transfers to do
–  At all times
Consequences
ATPESC 2017 13
•  Highly asynchronous
–  Minimize synchronization
–  Esp. global synchronization
•  Sequential semantics but support for
parallelism
•  Emphasis on describing the structure of data
–  Later
Regent Stack
ATPESC 2017 14
Regent
Language and
compiler
Legion
High-level runtime
Realm
Low-level runtime
Terra
Sequential
performance
Lua
Host language
Regent in Lua
ATPESC 2017 15
•  Embedded in Lua
–  Popular scripting language in the graphics community
•  Excellent interoperation with C
–  And with other languages
•  Python-ish syntax
–  For both Lua and Regent
•  Examples Overview/1.rg & 2.rg
ATPESC 2017 16
•  To run:
–  ssh 	l USER bootcamp.regent-lang.org
–  cd Bootcamp/Overview
–  qsub r1.sh
ATPESC 2017 17
Tasks
Tasks
ATPESC 2017 18
•  Tasks are Regent’s unit of parallel execution
–  Distinguished functions that can be executed
asynchronously
•  No preemption
–  Tasks run until they block or terminate
–  And ideally they don’t block …
Blocking
ATPESC 2017 19
•  Blocking means a task cannot continue
–  So the task stops running
•  Blocking does not prevent independent work from
being done
–  If the processor has something else to do
–  Does prevent the task from continuing and
launching more tasks
•  Avoid blocking.
Subtasks
ATPESC 2017 20
•  Tasks can call subtasks
–  Nested parallelism
•  Terminology: parent and child tasks
Example
task tester(sum: int64)
end
task main()
var sum: int64 = summer(10)
sum = tester(sum)
c.printf("The answer is: %dn",sum)
end ATPESC 2017 21
If a parent task inspects the result of a child task,
the parent task blocks pending completion of
the child task.
ATPESC 2017 22
•  Examples Tasks/1.rg & 2.rg
ATPESC 2017 23
•  Reminder:
cd Bootcamp/Tasks
qsub r1.sh
ATPESC 2017 16
Legion Prof
Legion Prof
ATPESC 2017 25
•  A tool for showing performance timeline
–  Each processor is a timeline
–  Each operation is a time interval
–  Different kinds of operations have different colors
•  White space = idle time
Example 1: Legion Prof
ATPESC 2017 26
cd Bootcamp/Tasks qsub rp1.sh
make prof
http://bootcamp.regent-lang.org/~USER/prof1
Example 2: Legion Prof
ATPESC 2017 27
cd Bootcamp/Tasks qsub rp2.sh
make prof
http://bootcamp.regent-lang.org/~USER/prof2
Mapping
•  How does Regent/Legion decide on which
processor to run tasks?
•  This decision is under the mapper’s control
•  Here we are using the default mapper
–  Passes out tasks to which CPU on a node is not busy
–  Programmers can write their own mappers
ATPESC 2017 21
Parallelism
Example Tasks/3.rg
ATPESC 2017 30
•  “for all” style parallelism
•  Note the order of completion of the tasks
–  main() finishes first (or almost first)!
–  All subtasks managed by the runtime system
–  Subtasks execute in non-deterministic order
•  How?
–  Regent notices that the tasks are independent
–  No task depends on another task for its inputs
Runtime Dependence Analysis
ATPESC 2017 31
•  Example Tasks/4.rg is more involved
–  Positive tasks (print a positive integer)
–  Negative tasks (print a negative integer)
•  Some tasks are dependent
–  The task for -5 depends on the task for 5
–  Note loop in main() does not block on the value of j!
•  Some are independent
–  Positive tasks are independent of each other
–  Negative tasks are independent of each other
ATPESC 2017 24
Legion Spy
Legion Spy
•  A tool for showing ordering dependencies
•  Very useful for figuring out why things are not running
in parallel
ATPESC 2017 33
Example Tasks/4.rg: Legion Spy
ATPESC 2017 34
cd Bootcamp/Tasks qsub
rs4.sh
make spy
http://bootcamp.regent-lang.org/~USER/spy4.pdf
Workflow
ATPESC 2017 35
•  Use Legion Prof to find idle time
–  white space
•  Use Legion Spy to examine tasks that are
delayed
–  What are they waiting for?!
ATPESC 2017 28
Exercise 1
Computing the Area of a Unit Circle
ATPESC 2017 37
•  A Monte Carlo simulation to
compute the area of a unit
circle inscribed in a square
•  Throw darts
–  Fraction of darts landing in
the circle = ratio of circle’s
area to square’s area
1
y
x
Computing the Area of a Unit Circle
ATPESC 2017 38
•  Example Pi/1.rg
–  Slow!
–  Why?
Exercise 1
ATPESC 2017 39
•  Compare Pi/1.rg and Pi/x1.rg
–  Identify use of multiple trials per subtask
•  Modify Pi/x1.rg (change “terra hits” to “task hits”)
•  Uses
–  4 subtasks
–  2500 trials per subtask
• Which is faster? Why?
–  Hint: Use Legion Prof and Legion Spy
ATPESC 2017 32
Terra
Leaf Tasks
ATPESC 2017 41
•  Leaf tasks call no other tasks
–  The “leaves” of the task tree
•  Leaf tasks are sequential programs
–  And generally where the heavy compute will be
•  Thus, leaf tasks should be optimized for
latency, not throughput
–  Want them to finish as fast as possible!
Terra
ATPESC 2017 42
•  Terra is a low-level, typed language embedded in Lua
•  Designed to be like C
–  And to compile to similarly efficient code
•  Also supports vector intrinsics
–  Not illustrated today
Terra Example
ATPESC 2017 43
•  Terra/1.rg converts the hits task in Terra/ x1.rg to a
Terra function
•  Trivial in this example
–  Just change task to terra
–  Marginally faster
•  On average
Considerations in Writing Regent Programs
ATPESC 2017 44
•  The granularity of tasks must be sufficient
–  Don’t write very short running tasks
•  Don’t block in tasks that launch many subtasks
•  Terra is an option for heavy sequential
computations
ATPESC 2017 37
Structured Regions
Regions
ATPESC 2017
38
•  A region is a (typed) collection
•  Regions are the cross product of
–  An index space
–  A field space
StructuredRegions/1.rg
ion Bootcamp 201
397
0
1
2
3
4
5
6
7
8
Bit
false
false
false
false
false
true
true
true
true
false
Discussion
ATPESC 2017 48
•  Regions are the way to organize large data collections in
Regent
•  Regions can be
–  Structured (e.g., like arrays)
–  Unstructured (e.g., pointer data structures)
•  Any number of fields
•  Built-in support for 1D, 2D and 3D index spaces
Privileges
ATPESC 2017 49
•  A task that takes region arguments must
–  Declare its privileges on the region
–  Reads, Writes, Reduces
•  The task may only perform operations for which
it has privileges
–  Including any subtasks it calls
•  Example StructuredRegions/2.rg
•  Example StructuredRegions/3.rg
ATPESC 2017 50
Reduction Privileges
ATPESC 2017 51
•  StructuredRegions/4.rg
–  A sequence of tasks that increment elements of a region
–  With Read/Write privileges
•  StructuredRegions/5.rg
–  4.rg but with Reduction privileges
•  Note: Reductions can create additional copies
–  To get more parallelism
–  Under mapper control
–  Not always preferred to Read/Write privileges
ATPESC 2017 44
Partitioning
Partitioning
ATPESC 2017 53
•  To enable parallelism on a region, partition it into
smaller pieces
–  And then run a task on each piece
•  Legion/Regent have a rich set of partitioning
primitives
Partitioning Example
ATPESC 2017 54
0
1
2
3
4
5
6
7
8
9
Bit
false
false
false
false
false
true
true
true
true
false
Partitioning Example
ATPESC 2017 55
0
1
2
3
4
5
6
7
8
9
bit_region_partition[0]
bit_region_partition[1]
false
false
false
false
false
true
true
true
true
false
Bit
Equal Partitions
•  One commonly used primitive is to split a region
into a number of (nearly) equal size subregions
•  Partitioning/1.rg
•  Partitioning/2.rg
ATPESC 2017 56
Discussion
ATPESC 2017 57
•  Partitioning does not create copies
–  It names subsets of the data
•  Partitioning does not remove the parent region
–  It still exists and can be used
•  Regions and partitions are first-class values
–  Can be created, destroyed, stored in data structures, passed to and
returned from tasks
Region Trees
0
bit_region
1 2 3 4 5
ATPESC 2017 58
More Discussion
ATPESC 2017 59
•  The same data can be partitioned multiple ways
–  Again, these are just names for subsets
•  Subregions can themselves be partitioned
Dependence Analysis
ATPESC 2017 60
•  Regent uses tasks’ region arguments to
compute which tasks can run in parallel
–  What region is being accessed
•  Does it overlap with another region that is in use?
–  What field is being accessed
•  If a task is using an overlapping region, is it using the same field?
–  What are the privileges?
•  If two tasks are accessing the same field, are they both reading
or both reducing?
A Crucial Fact
ATPESC 2017 61
•  Regent analyzes sibling tasks
–  Tasks launched directly by the same parent task
•  Theorem: Analyzing dependencies between sibling tasks is
sufficient to guarantee sequential semantics
- Question: Why does this hold? (Intuitively)
•  Never check for dependencies otherwise
–  Crucial to the overall design of Regent
Consequences
ATPESC 2017 62
•  Dependence analysis is a source of runtime overhead
•  Can be reduced by reducing the number of sibling tasks
–  Group some tasks into subtasks
•  But beware!
–  This may also reduce the available parallelism
•  Partitioning/3.rg
Partitioning/3.rg
ATPESC 2017 63
•  Note that passing a region to a task does not mean
the data is copied to where that task runs
–  C.f., launcher task must name the parent region for type
checking reasons
•  If the task doesn’t touch a region/field, that data
doesn’t need to move
Fills
ATPESC 2017 64
•  A better way to initialize regions is to use fill
operations
fill(region.field, value)
•  Partitioning/4.rg
Multiple Partitions
ATPESC 2017 65
0
bit_region
1 2
10 elements each
3 4 5 0 1 2
20 elements each
Discussion
•  Different views onto the same data
•  Again, can have multiple views in use at the same
time
•  Regent will figure out the data dependencies
ATPESC 2017 66
Exercise 2
•  Modify Partitioning/4.rg to
•  Have two partitions of bit_region
–  One with 3 subregions of size 20
–  One with 6 subregions of size 10
•  In a loop, alternately launch subtasks on one
partition and then the other
•  Edit x2.rg ATPESC 2017 59
Aliased Partitions
ATPESC 2017 68
•  So far all of our examples have been disjoint
partitions
•  It is also possible for partitions to be aliased
–  The subregions overlap
•  Partitioning/5.rg
Partitioning Summary
ATPESC 2017 69
•  Significant Regent applications have interesting region trees
–  Multiple views
–  Aliased partitions
–  Multiple levels of nesting
•  And complex task dependencies
–  Subregions, fields, privileges, coherence
•  Regions express locality
–  Data that will be used together
–  An example of a “local address space” design
•  Tasks can only access their region arguments
Regions Review
ATPESC 2017 70
•  A region is a (typed) collection
•  Regions are the cross product of
–  An index space
–  A field space
•  A structured region has a structured index space
–  E.g., int1d, int2d, int3d
ATPESC 2017 70
Dependent Partitioning
Partitioning, Revisited
ATPESC 2017 72
•  Why do we want to partition data?
–  For parallelism
–  We will launch many tasks over many subregions
•  A problem
–  We often need to partition multiple data
structures in a consistent way
–  E.g., given that we have partitioned the nodes a
particular way, that will dictate the desired
partitioning of the edges
Dependent Partitioning
ATPESC 2017 73
•  Distinguish two kinds of partitions
•  Independent partitions
–  Computed from the parent region, using, e.g.,
•  partition(equals, … )
•  Dependent partitions
–  Computed using another partition
Dependent Partitioning Operations
ATPESC 2017 74
•  Image
–  Use the image of a field in a partition to define a new
partition
•  Preimage
–  Use the pre-image of a field in a partition …
•  Set operations
–  Form new partitions using the intersection, union, and
set difference of other partitions
Image
•  Computes elements reachable via a
field lookup
–  Can be applied to index space or
another partition
–  Computation is distributed based
on location of data
•  Regent understands relationship
between partitions
–  Can check safety of region relation
assertions at compile time IS 1 IS2
source partition
pointer
field
destination
index space
s1 s2 s3 s1 s2 s3
ATPESC 2017 75
Preimage
•  Inverse of image
–  Computes elements that
reach a given subspace
–  Preserves disjointness
•  Multiple images/preimages
can be combined
–  Can capture complex task
access patterns
IS1 IS2
source partition
pointer
field
destination
index space
s1 s2 s3 s1 s2 s3
ATPESC 2017 76

More Related Content

What's hot

Stream Loops on Flink - Reinventing the wheel for the streaming era
Stream Loops on Flink - Reinventing the wheel for the streaming eraStream Loops on Flink - Reinventing the wheel for the streaming era
Stream Loops on Flink - Reinventing the wheel for the streaming eraParis Carbone
 
Graph processing
Graph processingGraph processing
Graph processingyeahjs
 
Tech Days 2015: User Presentation Vermont Technical College
Tech Days 2015: User Presentation Vermont Technical CollegeTech Days 2015: User Presentation Vermont Technical College
Tech Days 2015: User Presentation Vermont Technical CollegeAdaCore
 
OpenMP tasking model: from the standard to the classroom
OpenMP tasking model: from the standard to the classroomOpenMP tasking model: from the standard to the classroom
OpenMP tasking model: from the standard to the classroomFacultad de Informática UCM
 
Scalable and Reliable Data Stream Processing - Doctorate Seminar
Scalable and Reliable Data Stream Processing - Doctorate SeminarScalable and Reliable Data Stream Processing - Doctorate Seminar
Scalable and Reliable Data Stream Processing - Doctorate SeminarParis Carbone
 
Aggregate Sharing for User-Define Data Stream Windows
Aggregate Sharing for User-Define Data Stream WindowsAggregate Sharing for User-Define Data Stream Windows
Aggregate Sharing for User-Define Data Stream WindowsParis Carbone
 
Model Transformation A Personal Perspective
Model Transformation A Personal PerspectiveModel Transformation A Personal Perspective
Model Transformation A Personal PerspectiveEdward Willink
 
SDN: Situação do mercado e próximos movimentos
SDN: Situação do mercado e próximos movimentosSDN: Situação do mercado e próximos movimentos
SDN: Situação do mercado e próximos movimentosChristian Esteve Rothenberg
 
Attach Me, Detach Me, Assemble Me like you Work
Attach Me, Detach Me, Assemble Me like you WorkAttach Me, Detach Me, Assemble Me like you Work
Attach Me, Detach Me, Assemble Me like you WorkJean Vanderdonckt
 
Practical virtual network functions with Snabb (8th SDN Workshop)
Practical virtual network functions with Snabb (8th SDN Workshop)Practical virtual network functions with Snabb (8th SDN Workshop)
Practical virtual network functions with Snabb (8th SDN Workshop)Igalia
 
2018-04 Kafka Summit London: Stephan Ewen - "Apache Flink and Apache Kafka fo...
2018-04 Kafka Summit London: Stephan Ewen - "Apache Flink and Apache Kafka fo...2018-04 Kafka Summit London: Stephan Ewen - "Apache Flink and Apache Kafka fo...
2018-04 Kafka Summit London: Stephan Ewen - "Apache Flink and Apache Kafka fo...Ververica
 
Cloud schedulers and Scheduling in Hadoop
Cloud schedulers and Scheduling in HadoopCloud schedulers and Scheduling in Hadoop
Cloud schedulers and Scheduling in HadoopPallav Jha
 

What's hot (13)

ULMAN-GUI
ULMAN-GUIULMAN-GUI
ULMAN-GUI
 
Stream Loops on Flink - Reinventing the wheel for the streaming era
Stream Loops on Flink - Reinventing the wheel for the streaming eraStream Loops on Flink - Reinventing the wheel for the streaming era
Stream Loops on Flink - Reinventing the wheel for the streaming era
 
Graph processing
Graph processingGraph processing
Graph processing
 
Tech Days 2015: User Presentation Vermont Technical College
Tech Days 2015: User Presentation Vermont Technical CollegeTech Days 2015: User Presentation Vermont Technical College
Tech Days 2015: User Presentation Vermont Technical College
 
OpenMP tasking model: from the standard to the classroom
OpenMP tasking model: from the standard to the classroomOpenMP tasking model: from the standard to the classroom
OpenMP tasking model: from the standard to the classroom
 
Scalable and Reliable Data Stream Processing - Doctorate Seminar
Scalable and Reliable Data Stream Processing - Doctorate SeminarScalable and Reliable Data Stream Processing - Doctorate Seminar
Scalable and Reliable Data Stream Processing - Doctorate Seminar
 
Aggregate Sharing for User-Define Data Stream Windows
Aggregate Sharing for User-Define Data Stream WindowsAggregate Sharing for User-Define Data Stream Windows
Aggregate Sharing for User-Define Data Stream Windows
 
Model Transformation A Personal Perspective
Model Transformation A Personal PerspectiveModel Transformation A Personal Perspective
Model Transformation A Personal Perspective
 
SDN: Situação do mercado e próximos movimentos
SDN: Situação do mercado e próximos movimentosSDN: Situação do mercado e próximos movimentos
SDN: Situação do mercado e próximos movimentos
 
Attach Me, Detach Me, Assemble Me like you Work
Attach Me, Detach Me, Assemble Me like you WorkAttach Me, Detach Me, Assemble Me like you Work
Attach Me, Detach Me, Assemble Me like you Work
 
Practical virtual network functions with Snabb (8th SDN Workshop)
Practical virtual network functions with Snabb (8th SDN Workshop)Practical virtual network functions with Snabb (8th SDN Workshop)
Practical virtual network functions with Snabb (8th SDN Workshop)
 
2018-04 Kafka Summit London: Stephan Ewen - "Apache Flink and Apache Kafka fo...
2018-04 Kafka Summit London: Stephan Ewen - "Apache Flink and Apache Kafka fo...2018-04 Kafka Summit London: Stephan Ewen - "Apache Flink and Apache Kafka fo...
2018-04 Kafka Summit London: Stephan Ewen - "Apache Flink and Apache Kafka fo...
 
Cloud schedulers and Scheduling in Hadoop
Cloud schedulers and Scheduling in HadoopCloud schedulers and Scheduling in Hadoop
Cloud schedulers and Scheduling in Hadoop
 

Similar to The Legion Programming Model for HPC

Swift: A parallel scripting for applications at the petascale and beyond.
Swift: A parallel scripting for applications at the petascale and beyond.Swift: A parallel scripting for applications at the petascale and beyond.
Swift: A parallel scripting for applications at the petascale and beyond.Nagasuri Bala Venkateswarlu
 
SecondPresentationDesigning_Parallel_Programs.ppt
SecondPresentationDesigning_Parallel_Programs.pptSecondPresentationDesigning_Parallel_Programs.ppt
SecondPresentationDesigning_Parallel_Programs.pptRubenGabrielHernande
 
Spark Overview and Performance Issues
Spark Overview and Performance IssuesSpark Overview and Performance Issues
Spark Overview and Performance IssuesAntonios Katsarakis
 
Abstractions and Directives for Adapting Wavefront Algorithms to Future Archi...
Abstractions and Directives for Adapting Wavefront Algorithms to Future Archi...Abstractions and Directives for Adapting Wavefront Algorithms to Future Archi...
Abstractions and Directives for Adapting Wavefront Algorithms to Future Archi...inside-BigData.com
 
Overview of Scientific Workflows - Why Use Them?
Overview of Scientific Workflows - Why Use Them?Overview of Scientific Workflows - Why Use Them?
Overview of Scientific Workflows - Why Use Them?inside-BigData.com
 
What is Distributed Computing, Why we use Apache Spark
What is Distributed Computing, Why we use Apache SparkWhat is Distributed Computing, Why we use Apache Spark
What is Distributed Computing, Why we use Apache SparkAndy Petrella
 
SDN Demystified, by Dean Pemberton [APNIC 38]
SDN Demystified, by Dean Pemberton [APNIC 38]SDN Demystified, by Dean Pemberton [APNIC 38]
SDN Demystified, by Dean Pemberton [APNIC 38]APNIC
 
Apache Spark: What's under the hood
Apache Spark: What's under the hoodApache Spark: What's under the hood
Apache Spark: What's under the hoodAdarsh Pannu
 
Swift Parallel Scripting for High-Performance Workflow
Swift Parallel Scripting for High-Performance WorkflowSwift Parallel Scripting for High-Performance Workflow
Swift Parallel Scripting for High-Performance WorkflowDaniel S. Katz
 
(Berkeley CS186 guest lecture) Big Data Analytics Systems: What Goes Around C...
(Berkeley CS186 guest lecture) Big Data Analytics Systems: What Goes Around C...(Berkeley CS186 guest lecture) Big Data Analytics Systems: What Goes Around C...
(Berkeley CS186 guest lecture) Big Data Analytics Systems: What Goes Around C...Reynold Xin
 
Lec 4 (program and network properties)
Lec 4 (program and network properties)Lec 4 (program and network properties)
Lec 4 (program and network properties)Sudarshan Mondal
 
Tutorial on SDN data plane evolution
Tutorial on SDN data plane evolutionTutorial on SDN data plane evolution
Tutorial on SDN data plane evolutionAntonio Capone
 
Stream, stream, stream: Different streaming methods with Spark and Kafka
Stream, stream, stream: Different streaming methods with Spark and KafkaStream, stream, stream: Different streaming methods with Spark and Kafka
Stream, stream, stream: Different streaming methods with Spark and KafkaItai Yaffe
 
MapReduce on Zero VM
MapReduce on Zero VM MapReduce on Zero VM
MapReduce on Zero VM Joy Rahman
 
Mk network programmability-03_en
Mk network programmability-03_enMk network programmability-03_en
Mk network programmability-03_enMiya Kohno
 

Similar to The Legion Programming Model for HPC (20)

Introduction to multicore .ppt
Introduction to multicore .pptIntroduction to multicore .ppt
Introduction to multicore .ppt
 
Swift: A parallel scripting for applications at the petascale and beyond.
Swift: A parallel scripting for applications at the petascale and beyond.Swift: A parallel scripting for applications at the petascale and beyond.
Swift: A parallel scripting for applications at the petascale and beyond.
 
Disco workshop
Disco workshopDisco workshop
Disco workshop
 
Map reducecloudtech
Map reducecloudtechMap reducecloudtech
Map reducecloudtech
 
SecondPresentationDesigning_Parallel_Programs.ppt
SecondPresentationDesigning_Parallel_Programs.pptSecondPresentationDesigning_Parallel_Programs.ppt
SecondPresentationDesigning_Parallel_Programs.ppt
 
Spark Overview and Performance Issues
Spark Overview and Performance IssuesSpark Overview and Performance Issues
Spark Overview and Performance Issues
 
Abstractions and Directives for Adapting Wavefront Algorithms to Future Archi...
Abstractions and Directives for Adapting Wavefront Algorithms to Future Archi...Abstractions and Directives for Adapting Wavefront Algorithms to Future Archi...
Abstractions and Directives for Adapting Wavefront Algorithms to Future Archi...
 
Overview of Scientific Workflows - Why Use Them?
Overview of Scientific Workflows - Why Use Them?Overview of Scientific Workflows - Why Use Them?
Overview of Scientific Workflows - Why Use Them?
 
try
trytry
try
 
What is Distributed Computing, Why we use Apache Spark
What is Distributed Computing, Why we use Apache SparkWhat is Distributed Computing, Why we use Apache Spark
What is Distributed Computing, Why we use Apache Spark
 
SDN Demystified, by Dean Pemberton [APNIC 38]
SDN Demystified, by Dean Pemberton [APNIC 38]SDN Demystified, by Dean Pemberton [APNIC 38]
SDN Demystified, by Dean Pemberton [APNIC 38]
 
Apache Spark: What's under the hood
Apache Spark: What's under the hoodApache Spark: What's under the hood
Apache Spark: What's under the hood
 
Swift Parallel Scripting for High-Performance Workflow
Swift Parallel Scripting for High-Performance WorkflowSwift Parallel Scripting for High-Performance Workflow
Swift Parallel Scripting for High-Performance Workflow
 
(Berkeley CS186 guest lecture) Big Data Analytics Systems: What Goes Around C...
(Berkeley CS186 guest lecture) Big Data Analytics Systems: What Goes Around C...(Berkeley CS186 guest lecture) Big Data Analytics Systems: What Goes Around C...
(Berkeley CS186 guest lecture) Big Data Analytics Systems: What Goes Around C...
 
Lec 4 (program and network properties)
Lec 4 (program and network properties)Lec 4 (program and network properties)
Lec 4 (program and network properties)
 
Hadoop and Spark
Hadoop and SparkHadoop and Spark
Hadoop and Spark
 
Tutorial on SDN data plane evolution
Tutorial on SDN data plane evolutionTutorial on SDN data plane evolution
Tutorial on SDN data plane evolution
 
Stream, stream, stream: Different streaming methods with Spark and Kafka
Stream, stream, stream: Different streaming methods with Spark and KafkaStream, stream, stream: Different streaming methods with Spark and Kafka
Stream, stream, stream: Different streaming methods with Spark and Kafka
 
MapReduce on Zero VM
MapReduce on Zero VM MapReduce on Zero VM
MapReduce on Zero VM
 
Mk network programmability-03_en
Mk network programmability-03_enMk network programmability-03_en
Mk network programmability-03_en
 

More from inside-BigData.com

Preparing to program Aurora at Exascale - Early experiences and future direct...
Preparing to program Aurora at Exascale - Early experiences and future direct...Preparing to program Aurora at Exascale - Early experiences and future direct...
Preparing to program Aurora at Exascale - Early experiences and future direct...inside-BigData.com
 
Transforming Private 5G Networks
Transforming Private 5G NetworksTransforming Private 5G Networks
Transforming Private 5G Networksinside-BigData.com
 
The Incorporation of Machine Learning into Scientific Simulations at Lawrence...
The Incorporation of Machine Learning into Scientific Simulations at Lawrence...The Incorporation of Machine Learning into Scientific Simulations at Lawrence...
The Incorporation of Machine Learning into Scientific Simulations at Lawrence...inside-BigData.com
 
How to Achieve High-Performance, Scalable and Distributed DNN Training on Mod...
How to Achieve High-Performance, Scalable and Distributed DNN Training on Mod...How to Achieve High-Performance, Scalable and Distributed DNN Training on Mod...
How to Achieve High-Performance, Scalable and Distributed DNN Training on Mod...inside-BigData.com
 
Evolving Cyberinfrastructure, Democratizing Data, and Scaling AI to Catalyze ...
Evolving Cyberinfrastructure, Democratizing Data, and Scaling AI to Catalyze ...Evolving Cyberinfrastructure, Democratizing Data, and Scaling AI to Catalyze ...
Evolving Cyberinfrastructure, Democratizing Data, and Scaling AI to Catalyze ...inside-BigData.com
 
HPC Impact: EDA Telemetry Neural Networks
HPC Impact: EDA Telemetry Neural NetworksHPC Impact: EDA Telemetry Neural Networks
HPC Impact: EDA Telemetry Neural Networksinside-BigData.com
 
Biohybrid Robotic Jellyfish for Future Applications in Ocean Monitoring
Biohybrid Robotic Jellyfish for Future Applications in Ocean MonitoringBiohybrid Robotic Jellyfish for Future Applications in Ocean Monitoring
Biohybrid Robotic Jellyfish for Future Applications in Ocean Monitoringinside-BigData.com
 
Machine Learning for Weather Forecasts
Machine Learning for Weather ForecastsMachine Learning for Weather Forecasts
Machine Learning for Weather Forecastsinside-BigData.com
 
HPC AI Advisory Council Update
HPC AI Advisory Council UpdateHPC AI Advisory Council Update
HPC AI Advisory Council Updateinside-BigData.com
 
Fugaku Supercomputer joins fight against COVID-19
Fugaku Supercomputer joins fight against COVID-19Fugaku Supercomputer joins fight against COVID-19
Fugaku Supercomputer joins fight against COVID-19inside-BigData.com
 
Energy Efficient Computing using Dynamic Tuning
Energy Efficient Computing using Dynamic TuningEnergy Efficient Computing using Dynamic Tuning
Energy Efficient Computing using Dynamic Tuninginside-BigData.com
 
HPC at Scale Enabled by DDN A3i and NVIDIA SuperPOD
HPC at Scale Enabled by DDN A3i and NVIDIA SuperPODHPC at Scale Enabled by DDN A3i and NVIDIA SuperPOD
HPC at Scale Enabled by DDN A3i and NVIDIA SuperPODinside-BigData.com
 
Versal Premium ACAP for Network and Cloud Acceleration
Versal Premium ACAP for Network and Cloud AccelerationVersal Premium ACAP for Network and Cloud Acceleration
Versal Premium ACAP for Network and Cloud Accelerationinside-BigData.com
 
Zettar: Moving Massive Amounts of Data across Any Distance Efficiently
Zettar: Moving Massive Amounts of Data across Any Distance EfficientlyZettar: Moving Massive Amounts of Data across Any Distance Efficiently
Zettar: Moving Massive Amounts of Data across Any Distance Efficientlyinside-BigData.com
 
Scaling TCO in a Post Moore's Era
Scaling TCO in a Post Moore's EraScaling TCO in a Post Moore's Era
Scaling TCO in a Post Moore's Erainside-BigData.com
 
CUDA-Python and RAPIDS for blazing fast scientific computing
CUDA-Python and RAPIDS for blazing fast scientific computingCUDA-Python and RAPIDS for blazing fast scientific computing
CUDA-Python and RAPIDS for blazing fast scientific computinginside-BigData.com
 
Introducing HPC with a Raspberry Pi Cluster
Introducing HPC with a Raspberry Pi ClusterIntroducing HPC with a Raspberry Pi Cluster
Introducing HPC with a Raspberry Pi Clusterinside-BigData.com
 

More from inside-BigData.com (20)

Major Market Shifts in IT
Major Market Shifts in ITMajor Market Shifts in IT
Major Market Shifts in IT
 
Preparing to program Aurora at Exascale - Early experiences and future direct...
Preparing to program Aurora at Exascale - Early experiences and future direct...Preparing to program Aurora at Exascale - Early experiences and future direct...
Preparing to program Aurora at Exascale - Early experiences and future direct...
 
Transforming Private 5G Networks
Transforming Private 5G NetworksTransforming Private 5G Networks
Transforming Private 5G Networks
 
The Incorporation of Machine Learning into Scientific Simulations at Lawrence...
The Incorporation of Machine Learning into Scientific Simulations at Lawrence...The Incorporation of Machine Learning into Scientific Simulations at Lawrence...
The Incorporation of Machine Learning into Scientific Simulations at Lawrence...
 
How to Achieve High-Performance, Scalable and Distributed DNN Training on Mod...
How to Achieve High-Performance, Scalable and Distributed DNN Training on Mod...How to Achieve High-Performance, Scalable and Distributed DNN Training on Mod...
How to Achieve High-Performance, Scalable and Distributed DNN Training on Mod...
 
Evolving Cyberinfrastructure, Democratizing Data, and Scaling AI to Catalyze ...
Evolving Cyberinfrastructure, Democratizing Data, and Scaling AI to Catalyze ...Evolving Cyberinfrastructure, Democratizing Data, and Scaling AI to Catalyze ...
Evolving Cyberinfrastructure, Democratizing Data, and Scaling AI to Catalyze ...
 
HPC Impact: EDA Telemetry Neural Networks
HPC Impact: EDA Telemetry Neural NetworksHPC Impact: EDA Telemetry Neural Networks
HPC Impact: EDA Telemetry Neural Networks
 
Biohybrid Robotic Jellyfish for Future Applications in Ocean Monitoring
Biohybrid Robotic Jellyfish for Future Applications in Ocean MonitoringBiohybrid Robotic Jellyfish for Future Applications in Ocean Monitoring
Biohybrid Robotic Jellyfish for Future Applications in Ocean Monitoring
 
Machine Learning for Weather Forecasts
Machine Learning for Weather ForecastsMachine Learning for Weather Forecasts
Machine Learning for Weather Forecasts
 
HPC AI Advisory Council Update
HPC AI Advisory Council UpdateHPC AI Advisory Council Update
HPC AI Advisory Council Update
 
Fugaku Supercomputer joins fight against COVID-19
Fugaku Supercomputer joins fight against COVID-19Fugaku Supercomputer joins fight against COVID-19
Fugaku Supercomputer joins fight against COVID-19
 
Energy Efficient Computing using Dynamic Tuning
Energy Efficient Computing using Dynamic TuningEnergy Efficient Computing using Dynamic Tuning
Energy Efficient Computing using Dynamic Tuning
 
HPC at Scale Enabled by DDN A3i and NVIDIA SuperPOD
HPC at Scale Enabled by DDN A3i and NVIDIA SuperPODHPC at Scale Enabled by DDN A3i and NVIDIA SuperPOD
HPC at Scale Enabled by DDN A3i and NVIDIA SuperPOD
 
State of ARM-based HPC
State of ARM-based HPCState of ARM-based HPC
State of ARM-based HPC
 
Versal Premium ACAP for Network and Cloud Acceleration
Versal Premium ACAP for Network and Cloud AccelerationVersal Premium ACAP for Network and Cloud Acceleration
Versal Premium ACAP for Network and Cloud Acceleration
 
Zettar: Moving Massive Amounts of Data across Any Distance Efficiently
Zettar: Moving Massive Amounts of Data across Any Distance EfficientlyZettar: Moving Massive Amounts of Data across Any Distance Efficiently
Zettar: Moving Massive Amounts of Data across Any Distance Efficiently
 
Scaling TCO in a Post Moore's Era
Scaling TCO in a Post Moore's EraScaling TCO in a Post Moore's Era
Scaling TCO in a Post Moore's Era
 
CUDA-Python and RAPIDS for blazing fast scientific computing
CUDA-Python and RAPIDS for blazing fast scientific computingCUDA-Python and RAPIDS for blazing fast scientific computing
CUDA-Python and RAPIDS for blazing fast scientific computing
 
Introducing HPC with a Raspberry Pi Cluster
Introducing HPC with a Raspberry Pi ClusterIntroducing HPC with a Raspberry Pi Cluster
Introducing HPC with a Raspberry Pi Cluster
 
Overview of HPC Interconnects
Overview of HPC InterconnectsOverview of HPC Interconnects
Overview of HPC Interconnects
 

Recently uploaded

A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
Azure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & ApplicationAzure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & ApplicationAndikSusilo4
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Alan Dix
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitecturePixlogix Infotech
 
How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?XfilesPro
 
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphNeo4j
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxOnBoard
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Allon Mureinik
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksSoftradix Technologies
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxKatpro Technologies
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure servicePooja Nehwal
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhisoniya singh
 

Recently uploaded (20)

A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Azure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & ApplicationAzure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & Application
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC Architecture
 
How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?
 
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptx
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other Frameworks
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
 

The Legion Programming Model for HPC

  • 1.
  • 2. Operated by Los Alamos National Security, LLC for the U.S. Department of Energy’s NNSA U N C L A S S I F I E D – LA-UR-16-20235 Programming HPC systems is hard today and will only get harder without new approaches n  Aspects of programming future large-scale systems •  Focusing on full-system, data-awareness and improved productivity —  Programming in the small is not the challenge, we must take a broader view —  Co-designing the full tool chain with applications —  End-to-end awareness is required to avoid point solutions “non-solutions” •  Targeting large scale dynamic computation environments —  Hardware dynamics: frequency scaling, dark silicon, adaptive routing, … —  Software dynamics: system services, in-situ/co-resident services, … —  Application dynamics: multiscale andmultiphysics n  Programming model goals (What must we deliver?) •  High performance – we must be fast •  Performance portability – across many kinds of machines over many generations •  Programmability – sequential semantics, parallel execution
  • 3. Operated by Los Alamos National Security, LLC for the U.S. Department of Energy’s NNSA U N C L A S S I F I E D – LA-UR-16-20235 Can we fulfill our programming model goals today? We can, to some degree… … but at great cost: Programmer Pain Task graph for one time step on one node… … of a mini-app Do you want to schedule this graph? (High Performance) Do you want to re-schedule this graph for every new machine? (Performance Portability) Do you want to be responsible for generating this graph? (Programmability) Today: programmer’s responsibility Tomorrow: programming system’s responsibility
  • 4. Operated by Los Alamos National Security, LLC for the U.S. Department of Energy’s NNSA U N C L A S S I F I E D – LA-UR-16-20235 § AsyncRecv(X); § DoWork(Y); § Sync(); § F(X);   How much work should I do?   Is this performance portable?   When does forward progress really occur?   What if I have more work and data movement happening in DoWork?   What resources are in use? Where is the data? Who is using it and how?   Is this modular?   What if there is a fault? §  Concept from: Mike Bauer’s Thesis (Stanford), Legion: Programming Distributed Heterogeneous Architectures with Logical Regions We need the right programming abstractions to achieve our goals §  Today’s programming environments: •  Are using the wrong abstrac:ons… •  Focus on control flow, parallelism and have low-level data abstrac:ons (i.e. no data model)
  • 5. Operated by Los Alamos National Security, LLC for the U.S. Department of Energy’s NNSA U N C L A S S I F I E D – LA-UR-16-20235 Our approach to meeting our programing model goals is different Functionally correct application code Mapping to target machine Extraction of parallelism Management of data transfers Task scheduling and Latency hiding Data-Dependent Behavior Compiler/ Runtime understanding of data Legion Programs Legion Mappers Legion Programming System
  • 6. Los Alamos National Laboratory 2/8/17 | 6 Legion: Separation of Concerns Tasks (execution model) Describe parallel execution elements and algorithmic operations with sequential semantics, out-of-order execution Regions (data model) Describe decomposition of computational domain. •  Privileges (read-write, read-only, reduce) •  Coherence (exclusive, atomic) Mapper Describes how tasks and regions should be mapped to the target architecture Mapper allows architecture-specific optimization without effecting the correctness of the task or domain descriptions [=](int i) { rho(i) = … } rho0 rho1 Mapper
  • 7. Operated by Los Alamos National Security, LLC for the U.S. Department of Energy’s NNSA U N C L A S S I F I E D – LA-UR-16-20235 The Legion programming model: enabling the runtime to manage the complexity of a dynamic environment Legion Program Legion Legion Mapper Machine-Independent Specification Defines application correctness Sequential semantics Programmatic interface for performing machine-specific and app-specific mapping Only impacts performance Yesterday: manual mapping Today: programmatic mapping Tomorrow: generated mappers Analysis!
  • 8. Slide 8 Data Model – Logical Regions •  Unbounded set of rows (index space) •  Bounded set of columns (fields) •  Can be partitioned (disjoint or aliased) •  Tasks operate on regions –  Must specify which fields and how they “use” them (fields and privileges: read, write, read+write, exclusive, etc.) –  Allows fields to be “sliced” •  Tasks launched in program order (execution order relaxed based on dependencies) –  “out of order” software processor Index 0 Field 1 Field 2 Field 3 Field 4 Index 1 Index 2 Index 3 Index 4 Index 5 Index 6 Index 7 Index 8 …
  • 9. Operated by Los Alamos National Security, LLC for the U.S. Department of Energy’s NNSA U N C L A S S I F I E D – LA-UR-16-20235 Our approach shows compelling results with a full application •  High-level application constructs remain in MPI form… - essentially all the initial setup was kept •  Full right hand side function implemented in Legion including all communication (the full stencil computation and all chemistry calculations) •  Ported production combustion application (S3D) §  Sufficient complexity to validate our approach – beyond the “Proxy” n ~2-3X faster than MPI+OpenACC n ~7X faster than MPI only n Enabling new science that is not possible with current approaches 1 2 4 8 16 32 64 128 256 512 1024 2048 4096 8192 Nodes 0 20000 40000 60000 80000 100000 120000 140000 160000 Throughput(Points/s) Legion MPI+OpenACC 1.75 X 2.85X 7.2X
  • 10. Legion & Regent ATPESC 2017 10 •  Legion is –  a C++ runtime –  a programming model •  Regent is a programming language –  For the Legion programming model –  Current implementation is embedded in Lua –  Has an optimizing compiler •  This tutorial will focus on Regent
  • 11. Regent/Legion Design Goals ATPESC 2017 11 •  Sequential semantics –  The better to understand what you write –  Parallelism is extracted automatically •  Throughput-oriented –  The latency of a single thread/process is (mostly) irrelevant –  The overall time is what matters •  Runtime decision making –  Because machines are unpredictable/dynamic
  • 12. Throughput-Oriented ATPESC 2017 12 •  Keep the machine busy •  How? Ideally, –  Every core has a queue of independent work to do –  Every memory unit has a queue of transfers to do –  At all times
  • 13. Consequences ATPESC 2017 13 •  Highly asynchronous –  Minimize synchronization –  Esp. global synchronization •  Sequential semantics but support for parallelism •  Emphasis on describing the structure of data –  Later
  • 14. Regent Stack ATPESC 2017 14 Regent Language and compiler Legion High-level runtime Realm Low-level runtime Terra Sequential performance Lua Host language
  • 15. Regent in Lua ATPESC 2017 15 •  Embedded in Lua –  Popular scripting language in the graphics community •  Excellent interoperation with C –  And with other languages •  Python-ish syntax –  For both Lua and Regent
  • 16. •  Examples Overview/1.rg & 2.rg ATPESC 2017 16 •  To run: –  ssh l USER bootcamp.regent-lang.org –  cd Bootcamp/Overview –  qsub r1.sh
  • 18. Tasks ATPESC 2017 18 •  Tasks are Regent’s unit of parallel execution –  Distinguished functions that can be executed asynchronously •  No preemption –  Tasks run until they block or terminate –  And ideally they don’t block …
  • 19. Blocking ATPESC 2017 19 •  Blocking means a task cannot continue –  So the task stops running •  Blocking does not prevent independent work from being done –  If the processor has something else to do –  Does prevent the task from continuing and launching more tasks •  Avoid blocking.
  • 20. Subtasks ATPESC 2017 20 •  Tasks can call subtasks –  Nested parallelism •  Terminology: parent and child tasks
  • 21. Example task tester(sum: int64) end task main() var sum: int64 = summer(10) sum = tester(sum) c.printf("The answer is: %dn",sum) end ATPESC 2017 21
  • 22. If a parent task inspects the result of a child task, the parent task blocks pending completion of the child task. ATPESC 2017 22
  • 23. •  Examples Tasks/1.rg & 2.rg ATPESC 2017 23 •  Reminder: cd Bootcamp/Tasks qsub r1.sh
  • 25. Legion Prof ATPESC 2017 25 •  A tool for showing performance timeline –  Each processor is a timeline –  Each operation is a time interval –  Different kinds of operations have different colors •  White space = idle time
  • 26. Example 1: Legion Prof ATPESC 2017 26 cd Bootcamp/Tasks qsub rp1.sh make prof http://bootcamp.regent-lang.org/~USER/prof1
  • 27. Example 2: Legion Prof ATPESC 2017 27 cd Bootcamp/Tasks qsub rp2.sh make prof http://bootcamp.regent-lang.org/~USER/prof2
  • 28. Mapping •  How does Regent/Legion decide on which processor to run tasks? •  This decision is under the mapper’s control •  Here we are using the default mapper –  Passes out tasks to which CPU on a node is not busy –  Programmers can write their own mappers
  • 30. Example Tasks/3.rg ATPESC 2017 30 •  “for all” style parallelism •  Note the order of completion of the tasks –  main() finishes first (or almost first)! –  All subtasks managed by the runtime system –  Subtasks execute in non-deterministic order •  How? –  Regent notices that the tasks are independent –  No task depends on another task for its inputs
  • 31. Runtime Dependence Analysis ATPESC 2017 31 •  Example Tasks/4.rg is more involved –  Positive tasks (print a positive integer) –  Negative tasks (print a negative integer) •  Some tasks are dependent –  The task for -5 depends on the task for 5 –  Note loop in main() does not block on the value of j! •  Some are independent –  Positive tasks are independent of each other –  Negative tasks are independent of each other
  • 33. Legion Spy •  A tool for showing ordering dependencies •  Very useful for figuring out why things are not running in parallel ATPESC 2017 33
  • 34. Example Tasks/4.rg: Legion Spy ATPESC 2017 34 cd Bootcamp/Tasks qsub rs4.sh make spy http://bootcamp.regent-lang.org/~USER/spy4.pdf
  • 35. Workflow ATPESC 2017 35 •  Use Legion Prof to find idle time –  white space •  Use Legion Spy to examine tasks that are delayed –  What are they waiting for?!
  • 37. Computing the Area of a Unit Circle ATPESC 2017 37 •  A Monte Carlo simulation to compute the area of a unit circle inscribed in a square •  Throw darts –  Fraction of darts landing in the circle = ratio of circle’s area to square’s area 1 y x
  • 38. Computing the Area of a Unit Circle ATPESC 2017 38 •  Example Pi/1.rg –  Slow! –  Why?
  • 39. Exercise 1 ATPESC 2017 39 •  Compare Pi/1.rg and Pi/x1.rg –  Identify use of multiple trials per subtask •  Modify Pi/x1.rg (change “terra hits” to “task hits”) •  Uses –  4 subtasks –  2500 trials per subtask • Which is faster? Why? –  Hint: Use Legion Prof and Legion Spy
  • 41. Leaf Tasks ATPESC 2017 41 •  Leaf tasks call no other tasks –  The “leaves” of the task tree •  Leaf tasks are sequential programs –  And generally where the heavy compute will be •  Thus, leaf tasks should be optimized for latency, not throughput –  Want them to finish as fast as possible!
  • 42. Terra ATPESC 2017 42 •  Terra is a low-level, typed language embedded in Lua •  Designed to be like C –  And to compile to similarly efficient code •  Also supports vector intrinsics –  Not illustrated today
  • 43. Terra Example ATPESC 2017 43 •  Terra/1.rg converts the hits task in Terra/ x1.rg to a Terra function •  Trivial in this example –  Just change task to terra –  Marginally faster •  On average
  • 44. Considerations in Writing Regent Programs ATPESC 2017 44 •  The granularity of tasks must be sufficient –  Don’t write very short running tasks •  Don’t block in tasks that launch many subtasks •  Terra is an option for heavy sequential computations
  • 46. Regions ATPESC 2017 38 •  A region is a (typed) collection •  Regions are the cross product of –  An index space –  A field space
  • 48. Discussion ATPESC 2017 48 •  Regions are the way to organize large data collections in Regent •  Regions can be –  Structured (e.g., like arrays) –  Unstructured (e.g., pointer data structures) •  Any number of fields •  Built-in support for 1D, 2D and 3D index spaces
  • 49. Privileges ATPESC 2017 49 •  A task that takes region arguments must –  Declare its privileges on the region –  Reads, Writes, Reduces •  The task may only perform operations for which it has privileges –  Including any subtasks it calls
  • 50. •  Example StructuredRegions/2.rg •  Example StructuredRegions/3.rg ATPESC 2017 50
  • 51. Reduction Privileges ATPESC 2017 51 •  StructuredRegions/4.rg –  A sequence of tasks that increment elements of a region –  With Read/Write privileges •  StructuredRegions/5.rg –  4.rg but with Reduction privileges •  Note: Reductions can create additional copies –  To get more parallelism –  Under mapper control –  Not always preferred to Read/Write privileges
  • 53. Partitioning ATPESC 2017 53 •  To enable parallelism on a region, partition it into smaller pieces –  And then run a task on each piece •  Legion/Regent have a rich set of partitioning primitives
  • 54. Partitioning Example ATPESC 2017 54 0 1 2 3 4 5 6 7 8 9 Bit false false false false false true true true true false
  • 55. Partitioning Example ATPESC 2017 55 0 1 2 3 4 5 6 7 8 9 bit_region_partition[0] bit_region_partition[1] false false false false false true true true true false Bit
  • 56. Equal Partitions •  One commonly used primitive is to split a region into a number of (nearly) equal size subregions •  Partitioning/1.rg •  Partitioning/2.rg ATPESC 2017 56
  • 57. Discussion ATPESC 2017 57 •  Partitioning does not create copies –  It names subsets of the data •  Partitioning does not remove the parent region –  It still exists and can be used •  Regions and partitions are first-class values –  Can be created, destroyed, stored in data structures, passed to and returned from tasks
  • 58. Region Trees 0 bit_region 1 2 3 4 5 ATPESC 2017 58
  • 59. More Discussion ATPESC 2017 59 •  The same data can be partitioned multiple ways –  Again, these are just names for subsets •  Subregions can themselves be partitioned
  • 60. Dependence Analysis ATPESC 2017 60 •  Regent uses tasks’ region arguments to compute which tasks can run in parallel –  What region is being accessed •  Does it overlap with another region that is in use? –  What field is being accessed •  If a task is using an overlapping region, is it using the same field? –  What are the privileges? •  If two tasks are accessing the same field, are they both reading or both reducing?
  • 61. A Crucial Fact ATPESC 2017 61 •  Regent analyzes sibling tasks –  Tasks launched directly by the same parent task •  Theorem: Analyzing dependencies between sibling tasks is sufficient to guarantee sequential semantics - Question: Why does this hold? (Intuitively) •  Never check for dependencies otherwise –  Crucial to the overall design of Regent
  • 62. Consequences ATPESC 2017 62 •  Dependence analysis is a source of runtime overhead •  Can be reduced by reducing the number of sibling tasks –  Group some tasks into subtasks •  But beware! –  This may also reduce the available parallelism •  Partitioning/3.rg
  • 63. Partitioning/3.rg ATPESC 2017 63 •  Note that passing a region to a task does not mean the data is copied to where that task runs –  C.f., launcher task must name the parent region for type checking reasons •  If the task doesn’t touch a region/field, that data doesn’t need to move
  • 64. Fills ATPESC 2017 64 •  A better way to initialize regions is to use fill operations fill(region.field, value) •  Partitioning/4.rg
  • 65. Multiple Partitions ATPESC 2017 65 0 bit_region 1 2 10 elements each 3 4 5 0 1 2 20 elements each
  • 66. Discussion •  Different views onto the same data •  Again, can have multiple views in use at the same time •  Regent will figure out the data dependencies ATPESC 2017 66
  • 67. Exercise 2 •  Modify Partitioning/4.rg to •  Have two partitions of bit_region –  One with 3 subregions of size 20 –  One with 6 subregions of size 10 •  In a loop, alternately launch subtasks on one partition and then the other •  Edit x2.rg ATPESC 2017 59
  • 68. Aliased Partitions ATPESC 2017 68 •  So far all of our examples have been disjoint partitions •  It is also possible for partitions to be aliased –  The subregions overlap •  Partitioning/5.rg
  • 69. Partitioning Summary ATPESC 2017 69 •  Significant Regent applications have interesting region trees –  Multiple views –  Aliased partitions –  Multiple levels of nesting •  And complex task dependencies –  Subregions, fields, privileges, coherence •  Regions express locality –  Data that will be used together –  An example of a “local address space” design •  Tasks can only access their region arguments
  • 70. Regions Review ATPESC 2017 70 •  A region is a (typed) collection •  Regions are the cross product of –  An index space –  A field space •  A structured region has a structured index space –  E.g., int1d, int2d, int3d
  • 71. ATPESC 2017 70 Dependent Partitioning
  • 72. Partitioning, Revisited ATPESC 2017 72 •  Why do we want to partition data? –  For parallelism –  We will launch many tasks over many subregions •  A problem –  We often need to partition multiple data structures in a consistent way –  E.g., given that we have partitioned the nodes a particular way, that will dictate the desired partitioning of the edges
  • 73. Dependent Partitioning ATPESC 2017 73 •  Distinguish two kinds of partitions •  Independent partitions –  Computed from the parent region, using, e.g., •  partition(equals, … ) •  Dependent partitions –  Computed using another partition
  • 74. Dependent Partitioning Operations ATPESC 2017 74 •  Image –  Use the image of a field in a partition to define a new partition •  Preimage –  Use the pre-image of a field in a partition … •  Set operations –  Form new partitions using the intersection, union, and set difference of other partitions
  • 75. Image •  Computes elements reachable via a field lookup –  Can be applied to index space or another partition –  Computation is distributed based on location of data •  Regent understands relationship between partitions –  Can check safety of region relation assertions at compile time IS 1 IS2 source partition pointer field destination index space s1 s2 s3 s1 s2 s3 ATPESC 2017 75
  • 76. Preimage •  Inverse of image –  Computes elements that reach a given subspace –  Preserves disjointness •  Multiple images/preimages can be combined –  Can capture complex task access patterns IS1 IS2 source partition pointer field destination index space s1 s2 s3 s1 s2 s3 ATPESC 2017 76