Java parallel programming made simple

With Ateji PX there is no need to be a specialist in Java threads: parallel programming on multicore, GPU, cloud and grid can be as simple as inserting a || operator in the source code.

Transcript

  • 1. Ateji PX: Java Parallel Programming made Simple
    © Ateji – All rights reserved.
  • 2. Ateji – the Company
    Specialized in parallelism & language technologies
    Founded by Patrick Viry in 2005
    Java extensions for optimization (OptimJ, 2008)
    and parallelism (Ateji PX, 2010)
    January 2010: first round of investment
    Ateji PX selected as a Disruptive Technology at SC10
    Member of HiPEAC, OpenGPU
  • 3. The Grand Challenge : Parallel Programming for All Application Developers
    [Figure: core counts in enterprise servers, from 4 cores in 2008 to 100 cores in 2010]
  • 4. Why Java?
    Increasingly used for HPC because:
    Most popular language today
    Good runtime performance
    Much better productivity and code quality
    Faster time-to-market, fewer bugs, less maintenance
    Much easier staffing
    Used in aerospace, bioinformatics, physics, finance,
    data mining, statistics, ...
    Details and references in our latest blog posting: ateji.blogspot.com
  • 5. How to parallelize Java code?
    The sequential matrix multiplication loop:
    for(int i : I) {
        for(int j : J) {
            for(int k : K) {
                C[i][j] += A[i][k] * B[k][j];
            }
        }
    }
    With Java threads:
    final int nThreads = Runtime.getRuntime().availableProcessors();
    final int blockSize = I / nThreads;
    Thread[] threads = new Thread[nThreads];
    for(int n = 0; n < nThreads; n++) {
        final int finalN = n;
        threads[n] = new Thread() {
            public void run() {
                final int beginIndex = finalN * blockSize;
                final int endIndex =
                    (finalN == nThreads-1) ? I : (finalN+1) * blockSize;
                for(int i = beginIndex; i < endIndex; i++) {
                    for(int j = 0; j < J; j++) {
                        for(int k = 0; k < K; k++) {
                            C[i][j] += A[i][k] * B[k][j];
                        }
                    }
                }
            }
        };
        threads[n].start();
    }
    for(int n = 0; n < nThreads; n++) {
        try {
            threads[n].join();
        } catch (InterruptedException e) {
            System.exit(-1);
        }
    }
    With Ateji PX, simply make the outer loop parallel with for||:
    for||(int i : I) {
        for(int j : J) {
            for(int k : K) {
                C[i][j] += A[i][k] * B[k][j];
            }
        }
    }
  • 6. It’s easy AND efficient:
    12.5x speedup on 16 cores
    See the whitepaper on www.ateji.com/px
    Ateji PX:
    for||(int i : I) {
        for(int j : J) {
            for(int k : K) {
                C[i][j] += A[i][k] * B[k][j];
            }
        }
    }
  • 7. “The problem with threads”
    [Technical Report, Edward A. Lee, EECS Berkeley]
    Threads are a hardware-level concept, not a practical
    abstraction for programming
    Threads do not compose
    Code correctness requires intricate thinking and
    inspection of the whole program
    Most multi-threaded programs are buggy ...
    … and debuggers do not help
    Not an option for most application programmers !
  • 8. Introducing Parallelism at the Language Level
    Sequential composition operator: “;”
    Parallel composition operator: “||”
    “Hello World!”
    [
        || System.out.println("Hello");
        || System.out.println("World");
    ]
    Runs the two branches in parallel and waits for their termination;
    prints either Hello then World, or World then Hello
  • 9. Data Parallelism
    Same operation on all elements:
    [
        // quantified branches
        || (int i : N) array[i]++;
    ]
    Multiple dimensions and filters,
    e.g. update the upper left triangle of a matrix:
    [
        || (int i : N, int j : N, i+j < N) m[i][j]++;
    ]
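    As a point of comparison, a rough plain-Java analogue of the quantified branch above can be written with Java 8 parallel streams (a sketch for illustration only, not Ateji PX output; the class name DataParallelSketch and the size N are ours):
    import java.util.stream.IntStream;

    class DataParallelSketch {
        public static void main(String[] args) {
            final int N = 1000;
            final int[] array = new int[N];
            // Each index is visited exactly once, so the concurrent
            // increments cannot race with one another.
            IntStream.range(0, N).parallel().forEach(i -> array[i]++);
            System.out.println(array[0] + " .. " + array[N - 1]); // prints "1 .. 1"
        }
    }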
  • 10. Task Parallelism
    int fib(int n) {
        if (n <= 1) return 1;
        int fib1, fib2;
        [
            || fib1 = fib(n-1);
            || fib2 = fib(n-2);
        ];
        return fib1 + fib2;
    }
    Note the recursion: || is compatible with all language constructs
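    For comparison, the same recursive task parallelism written by hand against the JDK fork/join framework might look as follows (a hedged sketch; the class name Fib is ours, and this is not code generated by Ateji PX):
    import java.util.concurrent.ForkJoinPool;
    import java.util.concurrent.RecursiveTask;

    class Fib extends RecursiveTask<Integer> {
        private final int n;
        Fib(int n) { this.n = n; }

        @Override
        protected Integer compute() {
            if (n <= 1) return 1;
            Fib f1 = new Fib(n - 1);
            f1.fork();                            // evaluate fib(n-1) asynchronously
            int fib2 = new Fib(n - 2).compute();  // evaluate fib(n-2) in this thread
            return f1.join() + fib2;
        }
        // Usage: int result = new ForkJoinPool().invoke(new Fib(10));
    }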
  • 11. Speculative Parallelism
    Stop when the fastest algorithm succeeds
    [
        || return algorithm1();
        || return algorithm2();
    ]
    Stops the sister branches, then returns
    Same behaviour for break, continue, throw
    Non-local exits are very difficult to get right with threads
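    In plain Java, the closest standard-library counterpart to this run-the-fastest pattern is ExecutorService.invokeAny, which returns the first task to complete normally and cancels the rest (a sketch under that assumption; algorithm1 and algorithm2 are stand-ins for the slide's placeholder algorithms):
    import java.util.List;
    import java.util.concurrent.Callable;
    import java.util.concurrent.ExecutionException;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;

    class SpeculativeSketch {
        static int solve() throws InterruptedException, ExecutionException {
            ExecutorService pool = Executors.newCachedThreadPool();
            try {
                // The first successful result wins; the losing task is cancelled.
                return pool.invokeAny(List.<Callable<Integer>>of(
                        SpeculativeSketch::algorithm1,
                        SpeculativeSketch::algorithm2));
            } finally {
                pool.shutdownNow();
            }
        }
        // Stand-ins for the two competing algorithms.
        static int algorithm1() { return 42; }
        static int algorithm2() { return 42; }
    }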
  • 12. Parallel reductions
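    To illustrate what a parallel reduction computes, here is a plain-Java sum reduction using parallel streams (an illustrative sketch only; Ateji PX's own reduction notation is not reproduced here):
    import java.util.stream.IntStream;

    class ReductionSketch {
        public static void main(String[] args) {
            int[] a = {3, 1, 4, 1, 5, 9, 2, 6};
            // Because + is associative, partial sums computed on
            // different cores can be combined in any order.
            int sum = IntStream.of(a).parallel().sum();
            System.out.println(sum); // prints 31
        }
    }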
  • 13. Message Passing
    Is an essential aspect of parallelism
    Must be part of the language
    Send a message: chan ! value
    Receive a message: chan ? value
    Typed channels:
    Chan<T> : synchronous (rendez-vous)
    AsyncChan<T> : asynchronous (buffered)
    User-defined serialization (Java, XML, ASN.1, ...)
    Can be mapped to I/O devices (files, sockets, MPI)
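    A rough JDK analogy for the two channel flavours (an analogy only, not the Ateji PX implementation): Chan<T> behaves much like a SynchronousQueue<T>, where put blocks until take and vice versa, while AsyncChan<T> behaves like a LinkedBlockingQueue<T>. For example:
    import java.util.concurrent.BlockingQueue;
    import java.util.concurrent.SynchronousQueue;

    class ChannelSketch {
        public static void main(String[] args) throws InterruptedException {
            // Rendez-vous channel: sender and receiver must meet.
            BlockingQueue<Integer> chan = new SynchronousQueue<>();
            Thread sender = new Thread(() -> {
                try { chan.put(42); }                 // ~ chan ! 42
                catch (InterruptedException e) { Thread.currentThread().interrupt(); }
            });
            sender.start();
            int value = chan.take();                  // ~ chan ? value
            System.out.println(value);                // prints 42
            sender.join();
        }
    }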
  • 14. Data Flow and Stream Parallelism
    [Diagram: channels in1 and in2 feed an adder process, which writes to out]
    An adder:
    void adder(Chan<Integer> in1, Chan<Integer> in2, Chan<Integer> out) {
        for(;;) {
            int value1, value2;
            [ in1 ? value1; || in2 ? value2; ];
            out ! (value1 + value2);
        }
    }
  • 15. Data Flow and Stream Parallelism
    [Diagram: two sources feed channels c1 and c2 into an adder; its output channel c3 feeds a sink]
    Compose processes:
    [
        || source(c1);          // generates values on c1
        || source(c2);          // generates values on c2
        || adder(c1, c2, c3);
        || sink(c3);            // reads values from c3
    ]
    Numeric values + sync = “data flow”
    Strings or tuples + async = “stream programming”,
    e.g. the MapReduce algorithm
  • 16. Expressing non-determinism
    Note the parallel reads:
    [ in1 ? value1 || in2 ? value2 ]
    Impossible to express in a sequential language
    || is there for performance, but also for expressivity
    See also the select construct
  • 17. Distributing branches
    Use #Remote indications:
    [
        || #Remote("192.168.20.1") source(c1);
        || #Remote("Amazon EC2") source(c2);
        || #Remote("GPU") adder(c1, c2, c3);
        || sink(c3);
    ]
    Targets: multicore desktop/server, multicore CPU/GPU cluster
  • 18. Compiler handles the boring stuff
    Passing parameters
    Returning results
    Throwing exceptions
    Accessing non-final fields
    Performing non-local exits
    Stopping branches properly
  • 19. Making it easy is also about tools: Eclipse Integration
  • 20. Ateji PX Summary
    Parallelism at the language level is simple and intuitive,
    efficient, compatible with source code and tools
    Most patterns in a single language:
    data, task, recursive and speculative parallelism
    shared memory and distributed memory
    Covers OpenMP, Cilk, MPI, Occam, Erlang, etc…
    Most hardware architectures from a single language:
    Manycore, grid, cloud, GPU
  • 21. Roadmap as of February 2011
    Ateji PX 1.1 (multicore version) available today
    Free evaluation version on www.ateji.com
    GPU version coming soon
    OpenGPU project
    Distributed version coming soon
    Grid / Cluster / Cloud
    Interactive correctness proofs
    Integration of profiling tools
  • 22. Call to Action
    Free download on www.ateji.com/px
    Read the whitepapers
    Play with the online demo
    Look at the samples library
    Benchmark your || code
    Contact: info@ateji.com
    Blog : ateji.blogspot.com
  • 23. © Ateji – All rights reserved.
