1. Overview of Parallel Development
Visual Studio 2010 + a little on Axum and Concurrent Basic
Eric Nelson
Eric.nelson@microsoft.com
http://geekswithblogs.net/iupdateable
http://blogs.msdn.com/goto100
http://twitter.com/ericnel
2. Microsoft UK MSDN Flash Newsletter
Every two weeks, pure joy enters your Inbox
MSDN Flash Podcast Pilot
For feedback
http://bit.ly/flashpod1
http://msdn.microsoft.com/uk/flash
MSDN Flash eBook
13 of the “Best Technical Articles of 2008”
http://bit.ly/flashebook1
Technical Authors wanted
for the Flash – 400 to 500 words. Fancy it?
3. Agenda
Overview of what we are up to
Drill down into parallel programming for
managed developers
If we have time, “heads up” on Axum and CB
4. Things I learnt...
We have a very large investment in parallel computing
We have “something for everyone”
It is not all synced, it is sometimes overlapping
It is a big topic
Managed vs native vs client vs server vs task vs data...
Even with the investment, design/code/test for parallel is far
harder
Locking, Deadlocks, Livelocks
It is about getting ready for the future
Code today – run better tomorrow?
VS2010 CTP – not a great place for parallel
Single core in guest
Unsupported route to use Hyper-V
Easiest route to dabble – Microsoft Parallel Extensions June CTP
for VS2008
5. Buying a new Processor
£100 - £300
64-bit
2-3GHz
2 cores or 4
6. Buying a new Processor
£200 - £500
64-bit
2-3GHz
4 cores with HT
Memory Controller
QuickPath Interconnect
8. Was it a wise purchase?
App 1
My Code
.NET Framework
.NET CLR
App 1 App 2 ...
Windows OS
9. Was it a wise purchase?
Some environments scale to take advantage of
additional CPU cores (mostly server-side)
...
ASP.NET Web Forms/Services, WCF Services, WF Engine
.NET ThreadPool or Custom Threading Strategy
A lot of code does not (mostly client-side)
This code will see little benefit from future
hardware advances
10. What happened to “The Free Lunch”?
Bad sequential code will run faster on a faster processor
Bad parallel code WILL NOT run faster on more cores
Just using parallel code is not enough
[Chart: Speedup (vertical axis, 0 to 3) plotted against 1, 2, 4, 8, 16 and 32 on the horizontal axis]
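The slide does not say which model sits behind the chart. One common way to reason about a speedup curve that flattens is Amdahl's law, sketched below on that assumption. The function name and parameters are illustrative and are not taken from the deck.

```cpp
#include <cassert>

// Amdahl's law: an upper bound on speedup when only part of the work can
// run in parallel. `parallel_fraction` is the share of the run time that
// parallelises (0.0 to 1.0); `cores` is the number of cores available.
// Both names are hypothetical, chosen for this sketch.
double amdahl_speedup(double parallel_fraction, int cores) {
    assert(parallel_fraction >= 0.0 && parallel_fraction <= 1.0);
    assert(cores >= 1);
    double serial_fraction = 1.0 - parallel_fraction;
    return 1.0 / (serial_fraction + parallel_fraction / cores);
}
```

Under this model, code that is half serial approaches but never reaches a 2x speedup, however many cores are added.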
11. Applications Can Scale Well
[Chart: Parallel Speedup (0 to 64) against Cores (0 to 64)]
Series: Production Fluid, Production Face, Production Cloth, Game Fluid, Game Rigid Body, Game Cloth, Marching Cubes, Sports Video Analysis, Video Cast Indexing, Home Video Editing, Text Indexing, Ray Tracing, Foreground Estimation, Human Body Tracker, Portfolio Management, Geometric Mean
Graphics Rendering – Physical Simulation – Vision – Data Mining – Analytics
12. What's The Problem?
Multithreaded programming is “hard” today
Doable by only a subgroup of senior specialists
Parallel patterns are not prevalent, well known, nor
easy to implement
So many potential problems
Races, deadlocks, livelocks, lock convoys, cache coherency
overheads, lost event notifications, broken
serializability, priority inversion, and so on…
Businesses have little desire to “go deep”
Best developers should focus on business value,
not concurrency
Need simple ways to allow all developers to write
concurrent code
13. void MatrixMult(
        int size, double** m1, double** m2, double** result)
    {
        for (int i = 0; i < size; i++) {
            for (int j = 0; j < size; j++) {
                result[i][j] = 0;
                for (int k = 0; k < size; k++) {
                    result[i][j] += m1[i][k] * m2[k][j];
                }
            }
        }
    }
14. Static partitioning
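#include <windows.h>
#include <thread>

// NUMPROCS is assumed to be defined elsewhere (number of processors).
// The "slide callout" comments are the annotations from the original slide.
void MatrixMult(
    int size, double** m1, double** m2, double** result) {
    int N = size;
    // slide callout: Synchronization Knowledge
    int P = 2 * NUMPROCS;
    int Chunk = N / P;
    HANDLE hEvent = CreateEvent(NULL, TRUE, FALSE, NULL);
    // slide callout: Error prone
    long counter = P;
    for (int c = 0; c < P; c++) {
        std::thread t([&, c] {  // slide callout: Lots of boilerplate
            // The last chunk also takes any rows left over by N / P.
            for (int i = c * Chunk;
                 i < (c + 1 == P ? N : (c + 1) * Chunk); i++) {
                for (int j = 0; j < size; j++) {
                    result[i][j] = 0;
                    for (int k = 0; k < size; k++) {
                        result[i][j] += m1[i][k] * m2[k][j];  // slide callout: Tricks
                    }
                }
            }
            // slide callout: Lack of thread reuse
            // InterlockedDecrement takes the address of the counter.
            if (InterlockedDecrement(&counter) == 0)
                SetEvent(hEvent);
        });
        // A joinable std::thread must be joined or detached before it is
        // destroyed; completion is signalled through hEvent instead.
        t.detach();
    }  // slide callout: Heavy synchronization
    WaitForSingleObject(hEvent, INFINITE);
    CloseHandle(hEvent);
}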
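For contrast with the static partitioning version above, here is a minimal sketch of the same row split using only standard C++ threads and join(), with no event handle or interlocked counter. It is an illustration, not code from the deck: the name MatrixMultJoin and the `workers` parameter are assumptions, and it still creates a fresh thread per chunk, so the lack of thread reuse remains.

```cpp
#include <algorithm>
#include <cassert>
#include <thread>
#include <vector>

// Multiplies two size x size matrices, splitting the rows of `result`
// across up to `workers` threads. Each thread writes a disjoint set of
// rows, so no locking is needed; join() is the only synchronization.
void MatrixMultJoin(int size, double** m1, double** m2, double** result,
                    int workers) {
    assert(size >= 0 && workers >= 1);
    std::vector<std::thread> threads;
    int chunk = (size + workers - 1) / workers;  // rows per thread, rounded up
    for (int begin = 0; begin < size; begin += chunk) {
        int end = std::min(begin + chunk, size);
        threads.emplace_back([=] {
            for (int i = begin; i < end; i++) {
                for (int j = 0; j < size; j++) {
                    double sum = 0;
                    for (int k = 0; k < size; k++)
                        sum += m1[i][k] * m2[k][j];
                    result[i][j] = sum;
                }
            }
        });
    }
    for (std::thread& t : threads) t.join();  // wait for every chunk
}
```

Rounding the chunk size up removes the special case for the last chunk, at the cost of the final thread sometimes getting fewer rows.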
24. Higher Level Constructs
Even with Task there are common patterns that
build into higher level abstractions
The Parallel class
Invoke, For, For<T>, ForEach
Care needs to be taken with state, ordering
“This is not your Father’s for loop”
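To make the point about state and ordering concrete, here is a rough C++ sketch of a parallel loop. The parallel_for helper and sum_to are hypothetical names written for this example; they are not the .NET Parallel class and not a standard library API.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical helper: runs body(i) for every i in [first, last), spread
// across up to `workers` threads. Iterations may run concurrently and in
// any order, which is what separates it from an ordinary for loop.
void parallel_for(int first, int last, int workers,
                  const std::function<void(int)>& body) {
    assert(workers >= 1);
    if (first >= last) return;
    int chunk = (last - first + workers - 1) / workers;
    std::vector<std::thread> threads;
    for (int begin = first; begin < last; begin += chunk) {
        int end = std::min(begin + chunk, last);
        threads.emplace_back([begin, end, &body] {
            for (int i = begin; i < end; i++) body(i);
        });
    }
    for (std::thread& t : threads) t.join();
}

// Sums 1..n. The shared total is a std::atomic because several threads
// update it at once; a plain long here would be a data race.
long sum_to(int n, int workers) {
    std::atomic<long> total(0);
    parallel_for(1, n + 1, workers, [&total](int i) { total += i; });
    return total.load();
}
```

Addition gives the same answer in any order. A body that appended to a list or printed each value would produce results in an unpredictable order, which is the kind of care the slide is warning about.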
26. Declarative Data Parallelism
Parallel LINQ-to-Objects (PLINQ)
Enables LINQ devs to leverage multiple cores
Fully supports all .NET standard query operators
Minimal impact to existing LINQ model
var q = from p in people.AsParallel()
        where p.Name == queryInfo.Name &&
              p.State == queryInfo.State &&
              p.Year >= yearStart &&
              p.Year <= yearEnd
        orderby p.Year ascending
        select p;
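As a rough picture of what a data-parallel query has to do, the sketch below filters chunks of the input concurrently, merges the partial results, and then orders them by year. It is an illustration in C++, not a description of how PLINQ is implemented. The Person struct, the query function and the `parts` parameter are all hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <future>
#include <string>
#include <vector>

struct Person {  // hypothetical record mirroring the fields used on the slide
    std::string name;
    std::string state;
    int year;
};

// Filters `people` in up to `parts` concurrent chunks, then merges the
// matches and sorts them by year, ascending.
std::vector<Person> query(const std::vector<Person>& people,
                          const std::string& name, const std::string& state,
                          int yearStart, int yearEnd, std::size_t parts) {
    assert(parts >= 1);
    auto matches = [&](const Person& p) {
        return p.name == name && p.state == state &&
               p.year >= yearStart && p.year <= yearEnd;
    };
    std::vector<std::future<std::vector<Person>>> pending;
    std::size_t chunk = (people.size() + parts - 1) / parts;
    for (std::size_t begin = 0; begin < people.size(); begin += chunk) {
        std::size_t end = std::min(begin + chunk, people.size());
        pending.push_back(std::async(std::launch::async, [&, begin, end] {
            std::vector<Person> out;  // private to this task, so no locking
            for (std::size_t i = begin; i < end; i++)
                if (matches(people[i])) out.push_back(people[i]);
            return out;
        }));
    }
    std::vector<Person> result;
    for (auto& f : pending) {  // merge the partial results
        std::vector<Person> part = f.get();
        result.insert(result.end(), part.begin(), part.end());
    }
    std::stable_sort(result.begin(), result.end(),
                     [](const Person& a, const Person& b) {
                         return a.year < b.year;
                     });
    return result;
}
```

The amount of partitioning and merging code here is what the declarative form hides behind a single AsParallel() call.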
33. What Next?
http://geekswithblogs.net/iupdateable
Slides and links
http://blogs.msdn.com/pfxteam/
http://msdn.com/concurrency
Wait for the Beta of Visual Studio 2010
Or, for the most impatient:
Download VS 2010 CTP
Remember to set the clock back
Or
Download Parallel Extensions June 2008 CTP for VS2008
35. Heads up: Axum
Previously called Maestro
Incubation project!
New programming language
Lets you take advantage of parallelism without
“thinking about it”
Agent based programming vs Object based
programming
Model agents and their interactions via messages
No public methods, fields
36. Axum “Hello World”
using System;
agent Program :
    Microsoft.Axum.ConsoleApplication
{
    override int Run(String[] args)
    {
        Console.WriteLine("Hello, World!");
    }
}
37. Channels and Agents
using System;
using System.Concurrency;
using Microsoft.Axum;

channel Adder
{
    input int Num1;
    input int Num2;
    output int Sum;
}

agent AdderAgent : channel Adder
{
    public AdderAgent()
    {
        int result = receive(PrimaryChannel::Num1) +
                     receive(PrimaryChannel::Num2);
        PrimaryChannel::Sum <-- result;
    }
}

agent MainAgent : channel Microsoft.Axum.Application
{
    public MainAgent()
    {
        var adder = AdderAgent.CreateInNewDomain();
        adder::Num1 <-- 10;
        adder::Num2 <-- 20;
        // do something useful ...
        var sum = receive(adder::Sum);
        Console.WriteLine(sum);
        PrimaryChannel::ExitCode <-- 0;
    }
}
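For readers without Axum to hand, the same exchange can be approximated in C++, using std::promise and std::future as one-shot stand-ins for the channel ports. This is only an analogy: Axum channels and isolated domains do more than this, and the function names below are invented for the sketch.

```cpp
#include <cassert>
#include <future>
#include <thread>
#include <utility>

// Plays the role of AdderAgent: receive both inputs, then send the sum.
// Each future is a one-shot "input port"; the promise is the "output port".
void adder_agent(std::future<int> num1, std::future<int> num2,
                 std::promise<int> sum) {
    assert(num1.valid() && num2.valid());
    int result = num1.get() + num2.get();  // blocks until both have arrived
    sum.set_value(result);
}

// Plays the role of MainAgent: start the adder, send two numbers, and
// wait for the reply. The two sides share no state other than messages.
int run_main_agent(int a, int b) {
    std::promise<int> num1, num2, sum;
    std::future<int> sum_out = sum.get_future();
    std::thread agent(adder_agent, num1.get_future(), num2.get_future(),
                      std::move(sum));
    num1.set_value(a);  // like adder::Num1 <-- a
    num2.set_value(b);  // like adder::Num2 <-- b
    // do something useful ...
    int result = sum_out.get();  // like receive(adder::Sum)
    agent.join();
    return result;
}
```

Calling run_main_agent(10, 20) mirrors the values sent on the slide.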
38. Heads up: Concurrent Basic
Research Project
http://channel9.msdn.com/shows/Going+Deep/Claudio-Russo-and-Lucian-Wischik-Inside-Concurrent-Basic/
Added message passing primitives – channels
Module Buffer
    Public Asynchronous Put(ByVal s As String)
    Public Synchronous Take() As String
    Private Function CaseTakeAndPut(ByVal s As String) As String When Take, Put
        Return s
    End Function
End Module
Thread1: Put("Hello")
Thread2: result = Take()
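To show what the Buffer module saves you from writing, here is a hand-rolled C++ approximation, assuming that Put returns immediately and Take blocks until a Put is available. The class is a sketch for comparison. It is not generated from Concurrent Basic and not part of that project.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <queue>
#include <string>
#include <thread>
#include <utility>

class Buffer {
public:
    // Like the asynchronous Put: enqueue the string and return at once.
    void Put(std::string s) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            items_.push(std::move(s));
        }
        ready_.notify_one();
    }

    // Like the synchronous Take: block until a Put is pending, then
    // return its string (the "When Take, Put" case on the slide).
    std::string Take() {
        std::unique_lock<std::mutex> lock(mutex_);
        ready_.wait(lock, [this] { return !items_.empty(); });
        assert(!items_.empty());
        std::string s = std::move(items_.front());
        items_.pop();
        return s;
    }

private:
    std::mutex mutex_;
    std::condition_variable ready_;
    std::queue<std::string> items_;
};

// Mirrors the slide: one thread calls Put("Hello") while the caller,
// acting as Thread2, waits in Take().
std::string hello_round_trip() {
    Buffer buffer;
    std::thread thread1([&buffer] { buffer.Put("Hello"); });
    std::string result = buffer.Take();
    thread1.join();
    return result;
}
```

The mutex, condition variable and queue are the plumbing that the declarative Put/Take pattern expresses in a few lines.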