An Introduction to SequenceL
Auto-Parallelizing Programming Language and Toolset
www.texasmulticore.com
Brad Nemanich, PhD
Chief Technology Officer
Why is SequenceL Needed?
“The way the processor industry is going is
to add more and more cores, but nobody
knows how to program those things. I mean,
two, yeah; four, not really; eight, forget it.”
– Steve Jobs
© 2015 Texas Multicore Technologies, Inc.
All Rights Reserved
This shift now affects every software company,
large enterprise, and government agency that
develops software
Current (Manual) Approach to Multicore Programming
8 “Simple” Rules for Designing Threaded Applications
(0. Hire a team of “Parallel Ninjas”, PhD experts in computer architecture.)
1. Be sure you identify truly independent computations.
2. Implement concurrency at the highest level possible.
3. Plan early for scalability to take advantage of increasing numbers of cores.
4. Make use of thread-safe libraries wherever possible.
5. Use the right threading model.
6. Never assume a particular order of execution.
7. Use thread-local storage whenever possible; associate locks to specific data, if needed.
8. Don’t be afraid to change the algorithm for a better chance of concurrency.
“The significant problems we face cannot be solved using
the same level of thinking we used when we created them.”
– Albert Einstein
“Parallel Ninja” Approach Does Not Scale
• How do you:
─ find them?
─ afford them?
─ retain them?
─ support rapid innovation?
─ ensure accuracy and correctness?
─ keep them current on platform technologies?
─ do this for all your software?
Einstein was right; there’s a much better way…
It’s Time to Change the Game (Again)
[Figure: timeline of programming abstraction levels. Stages shown: wiring (netlist); machine code; assembly language; HLL + compiler (Fortran, COBOL, PL/I, Lisp, C, …); object oriented (Smalltalk, C++, Java, C#); and functional, auto-parallelizing (stacked above object-oriented C++ and machine code). Years marked: 1949, 1954, 1957, 1980, 2004 (multicore), 2014.]
SequenceL is a Game Changer
• Faster Performance; Uses all cores, GPUs
• 10X Faster Time to Innovation/Market
• Get it Right the First Time
• Quickly Leverage New Computing Platforms
• Built Upon Open Industry Standards; Works with Existing Tools & Methodologies
Customer Example: Industrial Control Networking (WirelessHART, IEC 62591, IEEE 802.15.4)
• New algorithm, developed for large, noisy industrial process control environments
─ Presented white paper to IEEE
─ Won an award
• Asked TMT to implement for comparison purposes
─ Finished in SequenceL in 3 weeks
    • 10X faster performance and right the first time
─ Java finished by the inventors in 3 months
    • Had errors and much slower; used SequenceL code to debug Java
    • Another month getting code correct
    • A 5th month improving performance that still fell short
• Bottom line
─ SL was finished in 15% of the time
─ SL was correct the first time
─ SL out-performed the Java code 1.5x-3.0x on a 2-core AMD APU
─ Robust and fast code, fast time to market
Customer Example: Video Processing Using SequenceL
• Goal: 30 Hz to keep up with input video feed
• Best performance (8-core x86 platform)
─ 58 Hz: SequenceL
─ 21 Hz: Matlab (Interpreter)
─ 1.2 Hz: Matlab (Coder/C-out)
Input video feed (e.g., Apache helicopter gyro camera)
Processed video (proprietary algorithms remove air turbulence, radiated heat, etc.)
What is SequenceL?
SequenceL is a…
• High-Abstraction
• Functional
• Self-Parallelizing
…programming language and tool set
…designed to work in concert with other popular programming languages and tools
High-Abstraction, High Performance
• Most common programming languages are imperative
─ Detailed sequence of commands for carrying out the computation; i.e., tell the computer both “what” to do and “how” to do it
─ Inherently sequential, written for classic von Neumann computers
─ e.g., C/C++, Java, C#, Python, Fortran
─ Some add explicit “directives” to manually enable low-level parallelism
• SequenceL is declarative & functional – higher abstraction
─ Describe the desired output in terms of the input, as functions; i.e., tell the computer only “what” to do, so no thinking about parallelism
─ Abstracts away complex multicore and many-core platforms
• Best analogy is the SQL database language
─ A programmer could write their own database procedures in low-level C
─ But these would be error-prone and would not perform as well as with Oracle or DB2
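The “what” versus “how” contrast above can be loosely illustrated in C++. This is only a sketch in a different language, not SequenceL; the function names `sum_squares_how` and `sum_squares_what` are made up for the illustration. The first version spells out the loop, the index order, and the accumulator; the second names the operation (an inner product of the vector with itself) and leaves the loop to the library.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// "How": the programmer dictates the loop, the index order, and the accumulator.
double sum_squares_how(const std::vector<double>& v) {
    double total = 0.0;
    for (std::size_t i = 0; i < v.size(); ++i) {
        total += v[i] * v[i];
    }
    return total;
}

// "What": the result is described as the inner product of v with itself.
double sum_squares_what(const std::vector<double>& v) {
    return std::inner_product(v.begin(), v.end(), v.begin(), 0.0);
}
```

Both versions run sequentially here; the sketch only contrasts the two styles of description.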
Drops Into Your Current Design Flow
• Designed to work in concert with other programming languages, legacy code and libraries
• Additive: works with existing design flows, tools, and training
• Builds upon open industry standards
• Adds a multicore “power tool” to the programmer’s toolbox
• Complete add-on solution
─ IDE plug-ins, debugger, interpreter, auto-parallelizing compiler, runtime environment
• Easy to modernize legacy applications
─ Parallel C++ output enables just a portion to be refactored in SequenceL and linked in
─ Uses vector (SIMD) processor instructions
─ Automatic OpenCL generation averts the need to learn and incorporate low-level CUDA or OpenCL code and associated scaffolding to exploit systems with (GP)GPUs
─ Often faster to refactor portions of code in SequenceL than to find and fix bugs in old code
The Problem With Directive-Based Programming
Example: 3-body problem
//P1
a1 = grav(P1, P2, m2) + grav(P1, P3, m3);
dv1 = a1*dt;
v1 = v1 + dv1;
dp1 = v1*dt;
//P2
a2 = grav(P2, P1, m1) + grav(P2, P3, m3);
dv2 = a2*dt;
v2 = v2 + dv2;
dp2 = v2*dt;
//P3
a3 = grav(P3, P2, m2) + grav(P3, P1, m1);
dv3 = a3*dt;
v3 = v3 + dv3;
dp3 = v3*dt;
Each body can be calculated at the same time to give, in theory, a 3x speedup.
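The independence claimed above can be made concrete with a small C++ sketch. Everything here is an assumption made for illustration: the `Vec2` and `Body` types, a two-dimensional `grav` with the gravitational constant taken as 1, and the helper `step_body`. The point is that the update for body i only reads the old state and returns a new value, so the three updates do not depend on one another.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>

struct Vec2 { double x, y; };
struct Body { Vec2 pos; Vec2 vel; double mass; };

// Acceleration on a body at p due to a mass m at q
// (gravitational constant taken as 1; an assumption for this sketch).
Vec2 grav(Vec2 p, Vec2 q, double m) {
    double dx = q.x - p.x, dy = q.y - p.y;
    double r2 = dx * dx + dy * dy;
    double inv_r3 = 1.0 / (r2 * std::sqrt(r2));
    return {m * dx * inv_r3, m * dy * inv_r3};
}

// One time step for body i, following the slide: a = sum of grav terms,
// dv = a*dt, v = v + dv, dp = v*dt. Reads old_state only, writes nothing shared.
Vec2 step_body(const std::array<Body, 3>& old_state, std::size_t i, double dt) {
    Vec2 a{0.0, 0.0};
    for (std::size_t j = 0; j < old_state.size(); ++j) {
        if (j == i) continue;
        Vec2 g = grav(old_state[i].pos, old_state[j].pos, old_state[j].mass);
        a.x += g.x;
        a.y += g.y;
    }
    Vec2 v{old_state[i].vel.x + a.x * dt, old_state[i].vel.y + a.y * dt};
    return {v.x * dt, v.y * dt};  // displacement dp_i
}
```

Because each call only reads `old_state`, the three calls could be handed to separate threads. Whether the theoretical 3x appears in practice depends on overheads this sketch ignores.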
#pragma omp parallel
#pragma omp single nowait
{
    // One thread creates the three tasks; the thread team executes them.
    // P1
    #pragma omp task
    {
        a1 = grav(P1, P2, m2) + grav(P1, P3, m3);
        dv1 = a1*dt;
        v1 = v1 + dv1;
        dp1 = v1*dt;
    }
    // P2
    #pragma omp task
    {
        a2 = grav(P2, P1, m1) + grav(P2, P3, m3);
        dv2 = a2*dt;
        v2 = v2 + dv2;
        dp2 = v2*dt;
    }
    // P3
    #pragma omp task
    {
        a3 = grav(P3, P2, m2) + grav(P3, P1, m1);
        dv3 = a3*dt;
        v3 = v3 + dv3;
        dp3 = v3*dt;
    }
    // Wait for all three tasks to finish.
    #pragma omp taskwait
}
Using directive-based approaches like OpenMP, the burden is on the programmer to identify where the program can be safely parallelized. The programmer then has to add the correct pragmas.
But maybe you could parallelize other things…
#pragma omp parallel
#pragma omp single nowait
{
    // Six independent force evaluations, one task each.
    #pragma omp task
    g1 = grav(P1, P2, m2);
    #pragma omp task
    g2 = grav(P1, P3, m3);
    #pragma omp task
    g3 = grav(P2, P1, m1);
    #pragma omp task
    g4 = grav(P2, P3, m3);
    #pragma omp task
    g5 = grav(P3, P2, m2);
    #pragma omp task
    g6 = grav(P3, P1, m1);
    #pragma omp taskwait
}
// Sequential remainder, now separated from the force evaluations.
a1 = g1 + g2;
dv1 = a1*dt;
v1 = v1 + dv1;
dp1 = v1*dt;
a2 = g3 + g4;
dv2 = a2*dt;
v2 = v2 + dv2;
dp2 = v2*dt;
a3 = g5 + g6;
dv3 = a3*dt;
v3 = v3 + dv3;
dp3 = v3*dt;
But now you have to start re-arranging the code, moving further away from the original description of the algorithm.
Possible Race Conditions!
If the grav function modifies its inputs or calls non-thread-safe functions, there could be hard-to-detect race conditions, leading to incorrect results.
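The hazard described above can be shown without any threads. The C++ sketch below is hypothetical: `grav_pure`, `grav_mutating`, `g1_x`, the `Vec2` type, and the sample positions are all invented for illustration, and the gravitational constant is taken as 1. `grav_mutating` reuses its second argument as scratch space, so the value of `grav(P1, P2, m2)` depends on whether the other call that shares `P2` ran before or after it. Under parallel execution that ordering is not fixed, which is what turns the hidden write into a race.

```cpp
#include <cassert>
#include <cmath>

struct Vec2 { double x, y; };

// Safe: takes its inputs by value and returns a fresh result.
Vec2 grav_pure(Vec2 p, Vec2 q, double m) {
    double dx = q.x - p.x, dy = q.y - p.y;
    double r2 = dx * dx + dy * dy;
    double inv_r3 = 1.0 / (r2 * std::sqrt(r2));
    return {m * dx * inv_r3, m * dy * inv_r3};
}

// Hazardous (hypothetical): overwrites q with the separation vector,
// silently changing the caller's copy of the position.
Vec2 grav_mutating(const Vec2& p, Vec2& q, double m) {
    q.x -= p.x;
    q.y -= p.y;
    double r2 = q.x * q.x + q.y * q.y;
    double inv_r3 = 1.0 / (r2 * std::sqrt(r2));
    return {m * q.x * inv_r3, m * q.y * inv_r3};
}

// x-component of g1 = grav(P1, P2, m2) when g5 = grav(P3, P2, m2)
// runs after it (p1_first == true) or before it (p1_first == false).
double g1_x(bool use_mutating, bool p1_first) {
    Vec2 P1{1.0, 0.0}, P2{3.0, 0.0}, P3{-1.0, 0.0};
    const double m2 = 1.0;
    Vec2 g1{}, g5{};
    if (use_mutating) {
        if (p1_first) { g1 = grav_mutating(P1, P2, m2); g5 = grav_mutating(P3, P2, m2); }
        else          { g5 = grav_mutating(P3, P2, m2); g1 = grav_mutating(P1, P2, m2); }
    } else {
        if (p1_first) { g1 = grav_pure(P1, P2, m2); g5 = grav_pure(P3, P2, m2); }
        else          { g5 = grav_pure(P3, P2, m2); g1 = grav_pure(P1, P2, m2); }
    }
    (void)g5;  // only g1 is compared in this sketch
    return g1.x;
}
```

With the pure version both orders give the same `g1`; with the mutating version they do not, even in this single-threaded run.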
SequenceL: Self-Parallelizes, Race-Free, Readable
Example: 3-body problem
threeBody(P1, m1, P2, m2, P3, m3, dt) :=
    let
        a1 := grav(P1, P2, m2) + grav(P1, P3, m3);
        dv1 := a1*dt;
        v1 := v1 + dv1;
        dp1 := v1*dt;
        a2 := grav(P2, P1, m1) + grav(P2, P3, m3);
        dv2 := a2*dt;
        v2 := v2 + dv2;
        dp2 := v2*dt;
        a3 := grav(P3, P2, m2) + grav(P3, P1, m1);
        dv3 := a3*dt;
        v3 := v3 + dv3;
        dp3 := v3*dt;
    in
        [dp1, dp2, dp3];
With SequenceL the programmer does not add any parallel constructs or pragmas. The program will self-parallelize if safe to do so (no race conditions).
Code clarity and intent remain, greatly improving correctness and quality. Subsequent enhancements and innovations are rapid.
This ease of reading/writing is not by accident.
Ease of Reading/Writing SequenceL
• Matrix Multiply:
─ The product of an m×p matrix A with a p×n matrix B is an m×n matrix denoted AB whose entries are given by:
$(AB)_{ij} = \sum_{k=1}^{p} A_{ik} B_{kj}$
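As a point of comparison for the definition above, here is how it is commonly written as an imperative triple loop. This is a C++ sketch rather than either listing from the slides; the `Matrix` alias and the `matmul` name are assumptions, indices are 0-based where the formula is 1-based, and the inputs are assumed non-empty and rectangular.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<double>>;

// C = A * B for an m x p matrix A and a p x n matrix B.
Matrix matmul(const Matrix& A, const Matrix& B) {
    const std::size_t m = A.size();
    const std::size_t p = B.size();
    const std::size_t n = B[0].size();
    Matrix C(m, std::vector<double>(n, 0.0));
    for (std::size_t i = 0; i < m; ++i) {
        for (std::size_t j = 0; j < n; ++j) {
            for (std::size_t k = 0; k < p; ++k) {
                C[i][j] += A[i][k] * B[k][j];  // (AB)_ij = sum over k of A_ik * B_kj
            }
        }
    }
    return C;
}
```

The one-line definition turns into three nested loops, index bookkeeping, and an explicit accumulator; that gap between definition and code is what this part of the deck is pointing at.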
[Figure: Matrix Multiply in Java, shown beside the definition above.]
[Figure: Matrix Multiply in SequenceL, in two alternative forms, shown beside the definition above.]
High-Abstraction, High Performance
• Parallel Matrix Multiply in SequenceL:
[Figure: Matrix Multiply Acceleration. Speedup (X, axis from 0 to 70) versus number of cores (1, 2, 4, 8, 16, 32); reference = sequential C++.]
Sample SequenceL Performance Speedups
[Figure: Times Faster (axis from 0 to 12) versus Number of Processor Cores (axis from 0 to 16). Series: Matrix Multiply, Game Of Life, 2D FFT, LU factorization, QuickSort, String Search, Barnes-Hut, n-Body, Matrix Inverse, Sparse Matrix, Compression, Adesk (DC), Adesk (LW), Matrix Multiply (blocking), Semblance, Speech filter, and a “Perfect” reference line.]
To learn more:
Watch a short 3-part video tutorial at:
http://www.texasmulticoretechnologies.com/resources/videos/
Email sales@texasmulticore.com for a free 45-day trial
www.texasmulticore.com

More Related Content

What's hot

Alley vsu functional_coverage_1f
Alley vsu functional_coverage_1fAlley vsu functional_coverage_1f
Alley vsu functional_coverage_1f
Obsidian Software
 
Coverage Solutions on Emulators
Coverage Solutions on EmulatorsCoverage Solutions on Emulators
Coverage Solutions on Emulators
DVClub
 
20081114 Friday Food iLabt Bart Joris
20081114 Friday Food iLabt Bart Joris20081114 Friday Food iLabt Bart Joris
20081114 Friday Food iLabt Bart Joris
imec.archive
 
TRACK F: OpenCL for ALTERA FPGAs, Accelerating performance and design product...
TRACK F: OpenCL for ALTERA FPGAs, Accelerating performance and design product...TRACK F: OpenCL for ALTERA FPGAs, Accelerating performance and design product...
TRACK F: OpenCL for ALTERA FPGAs, Accelerating performance and design product...
chiportal
 
Development of Signal Processing Algorithms using OpenCL for FPGA based Archi...
Development of Signal Processing Algorithms using OpenCL for FPGA based Archi...Development of Signal Processing Algorithms using OpenCL for FPGA based Archi...
Development of Signal Processing Algorithms using OpenCL for FPGA based Archi...
Pradeep Singh
 
Standard embedded c
Standard embedded cStandard embedded c
Standard embedded c
Tam Thanh
 
Track B- Advanced ESL verification - Mentor
Track B- Advanced ESL verification - MentorTrack B- Advanced ESL verification - Mentor
Track B- Advanced ESL verification - Mentor
chiportal
 

What's hot (19)

Design and Optimize your code for high-performance with Intel® Advisor and I...
Design and Optimize your code for high-performance with Intel®  Advisor and I...Design and Optimize your code for high-performance with Intel®  Advisor and I...
Design and Optimize your code for high-performance with Intel® Advisor and I...
 
Tools and Methods for Continuously Expanding Software Applications
Tools and Methods for Continuously Expanding Software ApplicationsTools and Methods for Continuously Expanding Software Applications
Tools and Methods for Continuously Expanding Software Applications
 
Computing Without Computers - Oct08
Computing Without Computers - Oct08Computing Without Computers - Oct08
Computing Without Computers - Oct08
 
Alley vsu functional_coverage_1f
Alley vsu functional_coverage_1fAlley vsu functional_coverage_1f
Alley vsu functional_coverage_1f
 
Coverage Solutions on Emulators
Coverage Solutions on EmulatorsCoverage Solutions on Emulators
Coverage Solutions on Emulators
 
MM-4092, Optimizing FFMPEG and Handbrake Using OpenCL and Other AMD HW Capabi...
MM-4092, Optimizing FFMPEG and Handbrake Using OpenCL and Other AMD HW Capabi...MM-4092, Optimizing FFMPEG and Handbrake Using OpenCL and Other AMD HW Capabi...
MM-4092, Optimizing FFMPEG and Handbrake Using OpenCL and Other AMD HW Capabi...
 
GPU Design on FPGA
GPU Design on FPGAGPU Design on FPGA
GPU Design on FPGA
 
Automatic License Plate Recognition using OpenCV
Automatic License Plate Recognition using OpenCVAutomatic License Plate Recognition using OpenCV
Automatic License Plate Recognition using OpenCV
 
The new reality and tremendous opportunity of open source processing
The new reality and tremendous opportunity of open source processingThe new reality and tremendous opportunity of open source processing
The new reality and tremendous opportunity of open source processing
 
Using Embedded Linux for Infrastructure Systems
Using Embedded Linux for Infrastructure SystemsUsing Embedded Linux for Infrastructure Systems
Using Embedded Linux for Infrastructure Systems
 
20081114 Friday Food iLabt Bart Joris
20081114 Friday Food iLabt Bart Joris20081114 Friday Food iLabt Bart Joris
20081114 Friday Food iLabt Bart Joris
 
Chris brown ti
Chris brown tiChris brown ti
Chris brown ti
 
SWEET - A Tool for WCET Flow Analysis - Björn Lisper
SWEET - A Tool for WCET Flow Analysis - Björn LisperSWEET - A Tool for WCET Flow Analysis - Björn Lisper
SWEET - A Tool for WCET Flow Analysis - Björn Lisper
 
TRACK F: OpenCL for ALTERA FPGAs, Accelerating performance and design product...
TRACK F: OpenCL for ALTERA FPGAs, Accelerating performance and design product...TRACK F: OpenCL for ALTERA FPGAs, Accelerating performance and design product...
TRACK F: OpenCL for ALTERA FPGAs, Accelerating performance and design product...
 
Hemanth_Krishnan_resume
Hemanth_Krishnan_resumeHemanth_Krishnan_resume
Hemanth_Krishnan_resume
 
Development of Signal Processing Algorithms using OpenCL for FPGA based Archi...
Development of Signal Processing Algorithms using OpenCL for FPGA based Archi...Development of Signal Processing Algorithms using OpenCL for FPGA based Archi...
Development of Signal Processing Algorithms using OpenCL for FPGA based Archi...
 
2014/07/17 Parallelize computer vision by GPGPU computing
2014/07/17 Parallelize computer vision by GPGPU computing2014/07/17 Parallelize computer vision by GPGPU computing
2014/07/17 Parallelize computer vision by GPGPU computing
 
Standard embedded c
Standard embedded cStandard embedded c
Standard embedded c
 
Track B- Advanced ESL verification - Mentor
Track B- Advanced ESL verification - MentorTrack B- Advanced ESL verification - Mentor
Track B- Advanced ESL verification - Mentor
 

Similar to SequenceL Auto-Parallelizing Toolset Intro slideshare

“Making Edge AI Inference Programming Easier and Flexible,” a Presentation fr...
“Making Edge AI Inference Programming Easier and Flexible,” a Presentation fr...“Making Edge AI Inference Programming Easier and Flexible,” a Presentation fr...
“Making Edge AI Inference Programming Easier and Flexible,” a Presentation fr...
Edge AI and Vision Alliance
 
SoftwareEngineer
SoftwareEngineerSoftwareEngineer
SoftwareEngineer
Todd Nguyen
 
tybsc it asp.net full unit 1,2,3,4,5,6 notes
tybsc it asp.net full unit 1,2,3,4,5,6 notestybsc it asp.net full unit 1,2,3,4,5,6 notes
tybsc it asp.net full unit 1,2,3,4,5,6 notes
WE-IT TUTORIALS
 
SoftwareEngineer
SoftwareEngineerSoftwareEngineer
SoftwareEngineer
Todd Nguyen
 
SoftwareEngineer
SoftwareEngineerSoftwareEngineer
SoftwareEngineer
Todd Nguyen
 

Similar to SequenceL Auto-Parallelizing Toolset Intro slideshare (20)

SequenceL gets rid of decades of programming baggage
SequenceL gets rid of decades of programming baggageSequenceL gets rid of decades of programming baggage
SequenceL gets rid of decades of programming baggage
 
C:\Alon Tech\New Tech\Embedded Conf Tlv\Prez\Sightsys Embedded Day
C:\Alon Tech\New Tech\Embedded Conf Tlv\Prez\Sightsys Embedded DayC:\Alon Tech\New Tech\Embedded Conf Tlv\Prez\Sightsys Embedded Day
C:\Alon Tech\New Tech\Embedded Conf Tlv\Prez\Sightsys Embedded Day
 
TMT SequenceL customer use cases and results
TMT SequenceL customer use cases and resultsTMT SequenceL customer use cases and results
TMT SequenceL customer use cases and results
 
“Making Edge AI Inference Programming Easier and Flexible,” a Presentation fr...
“Making Edge AI Inference Programming Easier and Flexible,” a Presentation fr...“Making Edge AI Inference Programming Easier and Flexible,” a Presentation fr...
“Making Edge AI Inference Programming Easier and Flexible,” a Presentation fr...
 
SoftwareEngineer
SoftwareEngineerSoftwareEngineer
SoftwareEngineer
 
Os Lamothe
Os LamotheOs Lamothe
Os Lamothe
 
Fine line between performance and security
Fine line between performance and securityFine line between performance and security
Fine line between performance and security
 
Ankit sarin
Ankit sarinAnkit sarin
Ankit sarin
 
Overview Of Parallel Development - Ericnel
Overview Of Parallel Development -  EricnelOverview Of Parallel Development -  Ericnel
Overview Of Parallel Development - Ericnel
 
Scalability for All: Unreal Engine* 4 with Intel
Scalability for All: Unreal Engine* 4 with Intel Scalability for All: Unreal Engine* 4 with Intel
Scalability for All: Unreal Engine* 4 with Intel
 
Introduction to C to Hardware (programming FPGAs and CPLDs in C)
Introduction to C to Hardware (programming FPGAs and CPLDs in C)Introduction to C to Hardware (programming FPGAs and CPLDs in C)
Introduction to C to Hardware (programming FPGAs and CPLDs in C)
 
Larson and toubro
Larson and toubroLarson and toubro
Larson and toubro
 
The Role of Standards in IoT Security
The Role of Standards in IoT SecurityThe Role of Standards in IoT Security
The Role of Standards in IoT Security
 
tybsc it asp.net full unit 1,2,3,4,5,6 notes
tybsc it asp.net full unit 1,2,3,4,5,6 notestybsc it asp.net full unit 1,2,3,4,5,6 notes
tybsc it asp.net full unit 1,2,3,4,5,6 notes
 
Lean Model-Driven Development through Model-Interpretation: the CPAL design ...
Lean Model-Driven Development through  Model-Interpretation: the CPAL design ...Lean Model-Driven Development through  Model-Interpretation: the CPAL design ...
Lean Model-Driven Development through Model-Interpretation: the CPAL design ...
 
Enabling Cross-platform Deep Learning Applications with Intel OpenVINO™
Enabling Cross-platform Deep Learning Applications with Intel OpenVINO™Enabling Cross-platform Deep Learning Applications with Intel OpenVINO™
Enabling Cross-platform Deep Learning Applications with Intel OpenVINO™
 
SoftwareEngineer
SoftwareEngineerSoftwareEngineer
SoftwareEngineer
 
“eXtending” the Automation Toolbox: Introduction to TwinCAT 3 Software and eX...
“eXtending” the Automation Toolbox: Introduction to TwinCAT 3 Software and eX...“eXtending” the Automation Toolbox: Introduction to TwinCAT 3 Software and eX...
“eXtending” the Automation Toolbox: Introduction to TwinCAT 3 Software and eX...
 
SoftwareEngineer
SoftwareEngineerSoftwareEngineer
SoftwareEngineer
 
Design of Software for Embedded Systems
Design of Software for Embedded SystemsDesign of Software for Embedded Systems
Design of Software for Embedded Systems
 

Recently uploaded

Recently uploaded (20)

Artyushina_Guest lecture_YorkU CS May 2024.pptx
Artyushina_Guest lecture_YorkU CS May 2024.pptxArtyushina_Guest lecture_YorkU CS May 2024.pptx
Artyushina_Guest lecture_YorkU CS May 2024.pptx
 
WSO2Con2024 - Enabling Transactional System's Exponential Growth With Simplicity
WSO2Con2024 - Enabling Transactional System's Exponential Growth With SimplicityWSO2Con2024 - Enabling Transactional System's Exponential Growth With Simplicity
WSO2Con2024 - Enabling Transactional System's Exponential Growth With Simplicity
 
WSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital Transformation
WSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital TransformationWSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital Transformation
WSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital Transformation
 
WSO2Con204 - Hard Rock Presentation - Keynote
WSO2Con204 - Hard Rock Presentation - KeynoteWSO2Con204 - Hard Rock Presentation - Keynote
WSO2Con204 - Hard Rock Presentation - Keynote
 
WSO2CON 2024 - Navigating API Complexity: REST, GraphQL, gRPC, Websocket, Web...
WSO2CON 2024 - Navigating API Complexity: REST, GraphQL, gRPC, Websocket, Web...WSO2CON 2024 - Navigating API Complexity: REST, GraphQL, gRPC, Websocket, Web...
WSO2CON 2024 - Navigating API Complexity: REST, GraphQL, gRPC, Websocket, Web...
 
WSO2Con2024 - From Code To Cloud: Fast Track Your Cloud Native Journey with C...
WSO2Con2024 - From Code To Cloud: Fast Track Your Cloud Native Journey with C...WSO2Con2024 - From Code To Cloud: Fast Track Your Cloud Native Journey with C...
WSO2Con2024 - From Code To Cloud: Fast Track Your Cloud Native Journey with C...
 
WSO2CON 2024 - How CSI Piemonte Is Apifying the Public Administration
WSO2CON 2024 - How CSI Piemonte Is Apifying the Public AdministrationWSO2CON 2024 - How CSI Piemonte Is Apifying the Public Administration
WSO2CON 2024 - How CSI Piemonte Is Apifying the Public Administration
 
WSO2CON2024 - Why Should You Consider Ballerina for Your Next Integration
WSO2CON2024 - Why Should You Consider Ballerina for Your Next IntegrationWSO2CON2024 - Why Should You Consider Ballerina for Your Next Integration
WSO2CON2024 - Why Should You Consider Ballerina for Your Next Integration
 
WSO2CON 2024 - Building a Digital Government in Uganda
WSO2CON 2024 - Building a Digital Government in UgandaWSO2CON 2024 - Building a Digital Government in Uganda
WSO2CON 2024 - Building a Digital Government in Uganda
 
WSO2CON 2024 - How to Run a Security Program
WSO2CON 2024 - How to Run a Security ProgramWSO2CON 2024 - How to Run a Security Program
WSO2CON 2024 - How to Run a Security Program
 
WSO2CON 2024 - Architecting AI in the Enterprise: APIs and Applications
WSO2CON 2024 - Architecting AI in the Enterprise: APIs and ApplicationsWSO2CON 2024 - Architecting AI in the Enterprise: APIs and Applications
WSO2CON 2024 - Architecting AI in the Enterprise: APIs and Applications
 
WSO2Con2024 - Navigating the Digital Landscape: Transforming Healthcare with ...
WSO2Con2024 - Navigating the Digital Landscape: Transforming Healthcare with ...WSO2Con2024 - Navigating the Digital Landscape: Transforming Healthcare with ...
WSO2Con2024 - Navigating the Digital Landscape: Transforming Healthcare with ...
 
WSO2CON 2024 - How CSI Piemonte Is Apifying the Public Administration
WSO2CON 2024 - How CSI Piemonte Is Apifying the Public AdministrationWSO2CON 2024 - How CSI Piemonte Is Apifying the Public Administration
WSO2CON 2024 - How CSI Piemonte Is Apifying the Public Administration
 
WSO2CON 2024 - IoT Needs CIAM: The Importance of Centralized IAM in a Growing...
WSO2CON 2024 - IoT Needs CIAM: The Importance of Centralized IAM in a Growing...WSO2CON 2024 - IoT Needs CIAM: The Importance of Centralized IAM in a Growing...
WSO2CON 2024 - IoT Needs CIAM: The Importance of Centralized IAM in a Growing...
 
%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfontein
%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfontein%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfontein
%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfontein
 
Evolving Data Governance for the Real-time Streaming and AI Era
Evolving Data Governance for the Real-time Streaming and AI EraEvolving Data Governance for the Real-time Streaming and AI Era
Evolving Data Governance for the Real-time Streaming and AI Era
 
WSO2CON 2024 Slides - Unlocking Value with AI
WSO2CON 2024 Slides - Unlocking Value with AIWSO2CON 2024 Slides - Unlocking Value with AI
WSO2CON 2024 Slides - Unlocking Value with AI
 
WSO2Con2024 - Low-Code Integration Tooling
WSO2Con2024 - Low-Code Integration ToolingWSO2Con2024 - Low-Code Integration Tooling
WSO2Con2024 - Low-Code Integration Tooling
 
WSO2CON 2024 - Cloud Native Middleware: Domain-Driven Design, Cell-Based Arch...
WSO2CON 2024 - Cloud Native Middleware: Domain-Driven Design, Cell-Based Arch...WSO2CON 2024 - Cloud Native Middleware: Domain-Driven Design, Cell-Based Arch...
WSO2CON 2024 - Cloud Native Middleware: Domain-Driven Design, Cell-Based Arch...
 
WSO2CON 2024 - Freedom First—Unleashing Developer Potential with Open Source
WSO2CON 2024 - Freedom First—Unleashing Developer Potential with Open SourceWSO2CON 2024 - Freedom First—Unleashing Developer Potential with Open Source
WSO2CON 2024 - Freedom First—Unleashing Developer Potential with Open Source
 

SequenceL Auto-Parallelizing Toolset Intro slideshare

  • 1. An Introduction to SequenceL Auto-Parallelizing Programming Language and Toolset www.texasmulticore.com Brad Nemanich, PhD Chief Technology Officer
  • 2. Why is SequenceL Needed? ”The way the processor industry is going is to add more and more cores, but nobody knows how to program those things. I mean, two, yeah; four, not really; eight, forget it.” – Steve Jobs © 2015 Texas Multicore Technologies, Inc. All Rights Reserved2 This shift now affects every software company, large enterprise, and government agency that develops software
  • 3. Current (Manual) Approach to Multicore Programming 1. Be sure you identify truly independent computations. 2. Implement concurrency at the highest level possible. 3. Plan early for scalability to take advantage of increasing numbers of cores. 4. Make use of thread-safe libraries wherever possible. 5. Use the right threading model. 6. Never assume a particular order of execution. 7. Use thread-local storage whenever possible; associate locks to specific data, if needed. 8. Don’t be afraid to change the algorithm for a better chance of concurrency. 8 “Simple” Rules for Designing Threaded Applications (0. Hire team of “Parallel Ninjas”, PhD experts in computer architecture.) © 2015 Texas Multicore Technologies, Inc. All Rights Reserved3
  • 4. Current (Manual) Approach to Multicore Programming 1. Be sure you identify truly independent computations. 2. Implement concurrency at the highest level possible. 3. Plan early for scalability to take advantage of increasing numbers of cores. 4. Make use of thread-safe libraries wherever possible. 5. Use the right threading model. 6. Never assume a particular order of execution. 7. Use thread-local storage whenever possible; associate locks to specific data, if needed. 8. Don’t be afraid to change the algorithm for a better chance of concurrency. 8 “Simple” Rules for Designing Threaded Applications (0. Hire team of “Parallel Ninjas”, PhD experts in computer architecture.) © 2015 Texas Multicore Technologies, Inc. All Rights Reserved4 “The significant problems we face cannot be solved using the same level of thinking we used when we created them.” -Albert Einstein
  • 5. “Parallel Ninja” Approach Does Not Scale  How do you: ─ find them? ─ afford them? ─ retain them? ─ support rapid innovation? ─ ensure accuracy and correctness? ─ keep them current on platform technologies? ─ do this for all your software? Einstein was right; There’s a much better way…. © 2015 Texas Multicore Technologies, Inc. All Rights Reserved5
  • 6. It’s Time to Change the Game (Again) 6 Wiring Machine CodeWiring Machine Code Machine Code Assembly Language Netlist Netlist 1954 1957 1980 Machine Code HLL + Compiler (Fortran, COBOL, PL/I, Lisp, C,…) Machine Code Object Oriented (SmallTalk, C++, Java, C#,) 19491949 © 2015 Texas Multicore Technologies, Inc. All Rights Reserved
  • 7. It’s Time to Change the Game (Again)
    [Same timeline figure as slide 6, with one added marker: 2004: Multicore.]
  • 8. It’s Time to Change the Game (Again)
    [Same timeline figure as slide 7 (year labels 1949, 1954, 1957, 1980, and 2004: Multicore), with one added stage labeled 2014: functional, auto-parallelizing, listed together with object-oriented C++ and machine code.]
  • 9. SequenceL is a Game Changer
      - Faster Performance; Uses all cores, GPUs
      - 10X Faster Time to Innovation/Market
      - Get it Right the First Time
      - Quickly Leverage New Computing Platforms
      - Built Upon Open Industry Standards; Works with Existing Tools & Methodologies
  • 10. Customer Example: Industrial Control Networking (WirelessHART, IEC 62591, IEEE 802.15.4)
      - New algorithm, developed for large, noisy industrial process control environments
          - Presented white paper to IEEE
          - Won an award
      - Asked TMT to implement for comparison purposes
          - Finished in SequenceL in 3 weeks
              - 10X faster performance and right the first time
          - Java finished by the inventors in 3 months
              - Had errors and was much slower; used SequenceL code to debug Java
              - Another month getting code correct
              - A 5th month improving performance that still fell short
      - Bottom line
          - SL was finished in 15% of the time
          - SL was correct the first time
          - SL out-performed the Java code 1.5x-3.0x on a 2-core AMD APU
          - Robust and fast code, fast time to market
  • 11. Customer Example: Video Processing Using SequenceL
      - Goal: 30 Hz to keep up with input video feed
      - Best performance (8-core x86 platform)
          - 58 Hz: SequenceL
          - 21 Hz: Matlab (Interpreter)
          - 1.2 Hz: Matlab (Coder/C-out)
    [Images: input video feed (e.g., Apache helicopter gyro camera) and processed video (proprietary algorithms remove air turbulence, radiated heat, etc.)]
  • 15. What is SequenceL?
    SequenceL is a…
      - High-Abstraction
      - Functional
      - Self-Parallelizing
    …programming language and tool set
    …designed to work in concert with other popular programming languages and tools
  • 16. High-Abstraction, High Performance
      - Most common programming languages are imperative
          - Detailed sequence of commands for carrying out the computation; i.e., tell the computer both “what” to do and “how” to do it
          - Inherently sequential, written for classic Von Neumann computers
          - e.g., C/C++, Java, C#, Python, Fortran
          - Some add explicit “directives” to manually enable low-level parallelism
      - SequenceL is declarative & functional: higher abstraction
          - Describe the desired output in terms of the input, as functions; i.e., tell the computer only “what” to do, with no need to think about parallelism
          - Abstracts away complex multicore and many-core platforms
      - Best analogy is the SQL database language
          - A programmer could write their own database procedures in low-level C
          - But the result would be error-prone and would not perform as well as Oracle or DB2
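To make the “what” versus “how” contrast on slide 16 concrete, here is a minimal C++ sketch (not SequenceL, and not taken from the slides). Both functions compute the mean of the squares of a list; the function names and the choice of computation are assumptions made only for illustration. The first spells out an ordered series of updates to an accumulator, while the second states the result as the dot product of the input with itself and leaves the iteration to the standard library.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Imperative style: says both what to compute and how to step through it.
// (Hypothetical example function, not from the slides.)
double mean_square_loop(const std::vector<double>& xs) {
    assert(!xs.empty());
    double total = 0.0;
    for (std::size_t i = 0; i < xs.size(); ++i) {
        total += xs[i] * xs[i];  // explicit, ordered update of one accumulator
    }
    return total / static_cast<double>(xs.size());
}

// More declarative style: the sum of squares is the dot product of xs with itself.
// (Hypothetical example function, not from the slides.)
double mean_square_declarative(const std::vector<double>& xs) {
    assert(!xs.empty());
    const double total = std::inner_product(xs.begin(), xs.end(), xs.begin(), 0.0);
    return total / static_cast<double>(xs.size());
}
```

`std::inner_product` is itself evaluated in order; the point is only that the second form no longer spells out a loop or an update order. C++17's `std::transform_reduce` is the unordered counterpart and accepts an execution policy, which is the kind of freedom an auto-parallelizing tool relies on.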
  • 17. Drops Into Your Current Design Flow
      - Designed to work in concert with other programming languages, legacy code and libraries
      - Additive: works with existing design flows, tools, and training
      - Builds upon open industry standards
  • 18. Drops Into Your Current Design Flow
      - Adds a multicore “power tool” to the programmer’s toolbox
      - Complete add-on solution
          - IDE plug-ins, debugger, interpreter, auto-parallelizing compiler, runtime environment
      - Easy to modernize legacy applications
          - Parallel C++ output enables just a portion to be refactored in SequenceL and linked in
          - Uses vector (SIMD) processor instructions
          - Automatic OpenCL generation averts the need to learn and incorporate low-level CUDA or OpenCL code and associated scaffolding to exploit systems with (GP)GPUs
          - Often faster to refactor portions of code in SequenceL than to find and fix bugs in old code
  • 19. The Problem With Directive-Based Programming
    Example: 3-body problem
      //P1
      a1 = grav(P1, P2, m2) + grav(P1, P3, m3);
      dv1 = a1*dt;
      v1 = v1 + dv1;
      dp1 = v1*dt;
      //P2
      a2 = grav(P2, P1, m1) + grav(P2, P3, m3);
      dv2 = a2*dt;
      v2 = v2 + dv2;
      dp2 = v2*dt;
      //P3
      a3 = grav(P3, P2, m2) + grav(P3, P1, m1);
      dv3 = a3*dt;
      v3 = v3 + dv3;
      dp3 = v3*dt;
  • 20. The Problem With Directive-Based Programming
    Example: 3-body problem (same code as slide 19)
    Each body can be calculated at the same time, giving a 3x speedup in theory.
  • 21. The Problem With Directive-Based Programming
    Example: 3-body problem
      #pragma omp parallel
      #pragma omp single nowait
      {
        #pragma omp task
        {
          a1 = grav(P1, P2, m2) + grav(P1, P3, m3);
          dv1 = a1*dt;
          v1 = v1 + dv1;
          dp1 = v1*dt;
        }
        #pragma omp task
        {
          a2 = grav(P2, P1, m1) + grav(P2, P3, m3);
          dv2 = a2*dt;
          v2 = v2 + dv2;
          dp2 = v2*dt;
        }
        #pragma omp task
        {
          a3 = grav(P3, P2, m2) + grav(P3, P1, m1);
          dv3 = a3*dt;
          v3 = v3 + dv3;
          dp3 = v3*dt;
        }
        #pragma omp taskwait
      }
    Using directive-based approaches like OpenMP, the burden is on the programmer to identify where the program can be safely parallelized. The programmer then has to add the correct pragmas.
  • 22. The Problem With Directive-Based Programming
    Example: 3-body problem (same OpenMP task code as slide 21)
    But maybe you could parallelize other things…
  • 23. The Problem With Directive-Based Programming
    Example: 3-body problem
      #pragma omp parallel
      #pragma omp single nowait
      {
        #pragma omp task
        g1 = grav(P1, P2, m2);
        #pragma omp task
        g2 = grav(P1, P3, m3);
        #pragma omp task
        g3 = grav(P2, P1, m1);
        #pragma omp task
        g4 = grav(P2, P3, m3);
        #pragma omp task
        g5 = grav(P3, P2, m2);
        #pragma omp task
        g6 = grav(P3, P1, m1);
        #pragma omp taskwait
      }
      a1 = g1 + g2;
      dv1 = a1*dt;
      v1 = v1 + dv1;
      dp1 = v1*dt;
      a2 = g3 + g4;
      dv2 = a2*dt;
      v2 = v2 + dv2;
      dp2 = v2*dt;
      a3 = g5 + g6;
      dv3 = a3*dt;
      v3 = v3 + dv3;
      dp3 = v3*dt;
    But now you have to start re-arranging the code, moving further away from the original description of the algorithm.
    Possible race conditions! If the grav function modifies its inputs or calls non-thread-safe functions, there could be hard-to-detect race conditions, leading to incorrect results.
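The race-condition warning on slide 23 depends on what `grav` does internally, which the slides never show. The following is a minimal C++ sketch under stated assumptions: the `Vec2` and `Body` types, the `step` function, the 2-D setting, and the choice of G = 1 are all hypothetical and introduced only for illustration. It shows a `grav` that takes its arguments by value and touches no shared state, so concurrent calls cannot interfere, and a time step in which each loop iteration writes only to its own output slot. The update order (new velocity first, then displacement) follows the slide code.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical 2-D vector type; the slides do not define one.
struct Vec2 {
    double x;
    double y;
};

inline Vec2 operator+(Vec2 a, Vec2 b) { return {a.x + b.x, a.y + b.y}; }
inline Vec2 operator-(Vec2 a, Vec2 b) { return {a.x - b.x, a.y - b.y}; }
inline Vec2 operator*(Vec2 a, double s) { return {a.x * s, a.y * s}; }

// Acceleration of a body at `self` caused by mass `m` at `other` (assumes G = 1).
// Pure function: arguments are copies and no global state is read or written.
inline Vec2 grav(Vec2 self, Vec2 other, double m) {
    const Vec2 d = other - self;
    const double r2 = d.x * d.x + d.y * d.y;
    assert(r2 > 0.0 && "bodies must not coincide");
    const double inv_r3 = 1.0 / (r2 * std::sqrt(r2));
    return d * (m * inv_r3);
}

// Hypothetical body record: position, velocity, mass.
struct Body {
    Vec2 pos;
    Vec2 vel;
    double mass;
};

// One time step. Every iteration reads only `in` and writes only out[i],
// so the iterations are independent of each other.
std::vector<Body> step(const std::vector<Body>& in, double dt) {
    std::vector<Body> out(in);
    const int n = static_cast<int>(in.size());
    #pragma omp parallel for
    for (int i = 0; i < n; ++i) {
        Vec2 a{0.0, 0.0};
        for (int j = 0; j < n; ++j) {
            if (j != i) a = a + grav(in[i].pos, in[j].pos, in[j].mass);
        }
        out[i].vel = in[i].vel + a * dt;           // v = v + a*dt
        out[i].pos = in[i].pos + out[i].vel * dt;  // p = p + v*dt
    }
    return out;
}
```

Built without an OpenMP flag, the pragma is ignored and the loop runs sequentially with the same result. The design choice here is to keep inputs read-only and give each iteration its own output, which removes the hazard the slide describes instead of guarding against it with locks.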
  • 24. SequenceL: Self-Parallelizes, Race-Free, Readable
    Example: 3-body problem
      threeBody(P1, m1, P2, m2, P3, m3, dt) :=
        let
          a1 := grav(P1, P2, m2) + grav(P1, P3, m3);
          dv1 := a1*dt;
          v1 := v1 + dv1;
          dp1 := v1*dt;
          a2 := grav(P2, P1, m1) + grav(P2, P3, m3);
          dv2 := a2*dt;
          v2 := v2 + dv2;
          dp2 := v2*dt;
          a3 := grav(P3, P2, m2) + grav(P3, P1, m1);
          dv3 := a3*dt;
          v3 := v3 + dv3;
          dp3 := v3*dt;
        in
          [dp1, dp2, dp3];
    With SequenceL the programmer does not add any parallel constructs or pragmas. The program will self-parallelize if safe to do so (no race conditions). Code clarity and intent remain, greatly improving correctness and quality. Subsequent enhancements and innovations are rapid. This ease of reading/writing is not by accident.
  • 25. Ease of Reading/Writing SequenceL
      - Matrix Multiply:
          - The product of an m×p matrix A with a p×n matrix B is an m×n matrix denoted AB whose entries are given by:
            $(AB)_{ij} = \sum_{k=1}^{p} A_{ik} B_{kj}$
  • 26. Ease of Reading/Writing SequenceL
      - Matrix Multiply in Java:
            $(AB)_{ij} = \sum_{k=1}^{p} A_{ik} B_{kj}$
  • 27. Ease of Reading/Writing SequenceL
      - Matrix Multiply in SequenceL:
          - The product of an m×p matrix A with a p×n matrix B is an m×n matrix denoted AB whose entries are given by:
            $(AB)_{ij} = \sum_{k=1}^{p} A_{ik} B_{kj}$
    [The slide joins two alternatives with “- or -”; the listings themselves are not in this transcript.]
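The matrix-product definition on slides 25 to 27 maps directly onto a triple loop. As a minimal sketch in C++ (this is not the Java or SequenceL listing from the slides, which this transcript does not contain), the function below computes each entry as the sum over k of A[i][k] * B[k][j]. The `Matrix` alias and the name `matmul` are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical alias: a matrix stored as a vector of rows.
using Matrix = std::vector<std::vector<double>>;

// Product of an m-by-p matrix A and a p-by-n matrix B, straight from the definition.
Matrix matmul(const Matrix& A, const Matrix& B) {
    const std::size_t m = A.size();
    const std::size_t p = B.size();
    assert(m > 0 && p > 0);
    const std::size_t n = B[0].size();
    Matrix C(m, std::vector<double>(n, 0.0));
    for (std::size_t i = 0; i < m; ++i) {
        assert(A[i].size() == p);  // inner dimensions must agree
        for (std::size_t j = 0; j < n; ++j) {
            double sum = 0.0;
            for (std::size_t k = 0; k < p; ++k) {
                sum += A[i][k] * B[k][j];
            }
            C[i][j] = sum;
        }
    }
    return C;
}
```

For example, multiplying [[1, 2], [3, 4]] by [[5, 6], [7, 8]] yields [[19, 22], [43, 50]]. Each entry of C depends only on one row of A and one column of B, which is why the computation is a common target for parallelization.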
  • 28. High-Abstraction, High Performance
      - Parallel Matrix Multiply in SequenceL:
    [Chart: “Matrix Multiply Acceleration”, reference = sequential C++. Vertical axis: speedup (X), scale up to 70. Horizontal axis: C++ Ref., then 1, 2, 4, 8, 16, 32 cores.]
  • 29. Sample SequenceL Performance Speedups
    [Chart. Horizontal axis: Number of Processor Cores (0 to 16). Vertical axis: Times Faster (0.00 to 12.00). Series: Matrix Multiply, Game Of Life, 2D FFT, LU factorization, QuickSort, String Search, Barnes-Hut n-Body, Matrix Inverse, Sparse Matrix Compression, Adesk (DC), Adesk (LW), Matrix Multiply (blocking), Semblance, Speech filter, Perfect.]
  • 30. To learn more:
    Watch a short 3-part video tutorial at: http://www.texasmulticoretechnologies.com/resources/videos/
    Email sales@texasmulticore.com for a free 45-day trial
    www.texasmulticore.com