JavaOne2013: Implement a High Level Parallel API - Richard Ning

This session discusses how to implement a high-level parallel API (such as parallel_for, parallel_while, or parallel_scan) and math calculations based on a thread pool and tasks in OpenJDK, in line with the growth of multicore hardware and parallel computing. With the current Java concurrency package, programmers have to choose a scheduling strategy statically in code instead of choosing it dynamically based on the number of cores and the load of the machine. In the design presented in this session, the function parallel_for(array, task) is a high-level API that divides the task range dynamically, based on the conditions and load of different computers.

Presented by Richard Ning at JavaOne 2013

Presentation Transcript

  • Slide 1: Implement high-level parallel API in JDK – Richard Ning, Enterprise Developer, 9/24/2013 (© 2013 IBM Corporation)
  • Slide 2: Important Disclaimers – THE INFORMATION CONTAINED IN THIS PRESENTATION IS PROVIDED FOR INFORMATIONAL PURPOSES ONLY. – WHILST EFFORTS WERE MADE TO VERIFY THE COMPLETENESS AND ACCURACY OF THE INFORMATION CONTAINED IN THIS PRESENTATION, IT IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED. – ALL PERFORMANCE DATA INCLUDED IN THIS PRESENTATION HAVE BEEN GATHERED IN A CONTROLLED ENVIRONMENT. YOUR OWN TEST RESULTS MAY VARY BASED ON HARDWARE, SOFTWARE OR INFRASTRUCTURE DIFFERENCES. – ALL DATA INCLUDED IN THIS PRESENTATION ARE MEANT TO BE USED ONLY AS A GUIDE. – IN ADDITION, THE INFORMATION CONTAINED IN THIS PRESENTATION IS BASED ON IBM’S CURRENT PRODUCT PLANS AND STRATEGY, WHICH ARE SUBJECT TO CHANGE BY IBM, WITHOUT NOTICE. – IBM AND ITS AFFILIATED COMPANIES SHALL NOT BE RESPONSIBLE FOR ANY DAMAGES ARISING OUT OF THE USE OF, OR OTHERWISE RELATED TO, THIS PRESENTATION OR ANY OTHER DOCUMENTATION. – NOTHING CONTAINED IN THIS PRESENTATION IS INTENDED TO, OR SHALL HAVE THE EFFECT OF, CREATING ANY WARRANTY OR REPRESENTATION FROM IBM, ITS AFFILIATED COMPANIES OR ITS OR THEIR SUPPLIERS AND/OR LICENSORS.
  • Slide 3: About me – Richard Ning, IBM JDK development. Developing enterprise application software since 1999 (C++, Java). Contact: huaningnh@gmail.com
  • Slide 4: What should you get from this talk? By the end of this session, you should be able to understand the implementation of a high-level parallel API in the JDK, and understand how parallel computing works on multiple cores.
  • Slide 5: Agenda
    1. Introduction: multi-threading, multi-core, parallel computing
    2. Case study
    3. Other high-level parallel APIs
    4. Roadmap
  • Slide 6: Introduction – multi-threading, multi-core computers, parallel computing.
  • Slide 7: Case study – execute the same task for every element in a loop, and use multi-threading for the execution.
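    The case study's starting point can be sketched with raw threads. This code is not from the slides; it is a minimal illustration of splitting a loop over an array across a few manually created threads, with the thread count picked statically rather than per machine:

    public class RawThreadLoop {
        public static void main(String[] args) throws InterruptedException {
            double[] data = new double[1_000_000];
            int nThreads = 4;                          // chosen statically, not from the core count
            Thread[] threads = new Thread[nThreads];
            int chunk = data.length / nThreads;
            for (int i = 0; i < nThreads; i++) {
                final int from = i * chunk;
                final int to = (i == nThreads - 1) ? data.length : from + chunk;
                threads[i] = new Thread(() -> {
                    for (int j = from; j < to; j++) {
                        data[j] = Math.sqrt(j);        // the "same task" for every element
                    }
                });
                threads[i].start();
            }
            for (Thread t : threads) t.join();         // wait for every chunk to finish
            System.out.println(data[81]);              // 9.0
        }
    }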
  • Slide 8: Can it improve performance?
  • Slide 9: Multi-threading on a computer with one core (timeline diagram: threads t1 and t2 take turns on the single CPU over time).
  • Slide 10: CPU usage is already 100% with a single thread, and stays at 100% with multi-threading. Performance can even decrease because of the extra thread-switching overhead, so multi-threading (a parallel API) cannot improve performance on a single core.
  • Slide 11: Multi-threading on a computer with multiple cores.
  • Slide 12: Diagram: threads t1–t4 each run on a separate core (Core1–Core4), so the threads execute in parallel over time.
  • Slide 13: Raw threads: users need to create and manage them. Disadvantages: not flexible – the number of threads is hard to configure well (more threads than cores wastes resources on context switching and can even decrease performance; fewer threads than cores leaves some cores idle); no balancing – the calculation cannot be allocated to every core equally. Any improvement? Executor.
  • Slide 14: Separate the creation and execution of threads; use a thread pool to reuse threads.
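    Slide 14's idea maps directly onto the JDK's ExecutorService: the pool owns thread creation and reuse, and execution is just a matter of submitting tasks. A minimal example (not taken from the slides):

    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.TimeUnit;

    public class PoolExample {
        public static void main(String[] args) throws InterruptedException {
            ExecutorService pool = Executors.newFixedThreadPool(4);  // threads are created once and reused
            for (int i = 0; i < 100; i++) {
                final int id = i;
                pool.submit(() -> System.out.println("task " + id + " on " + Thread.currentThread().getName()));
            }
            pool.shutdown();                               // stop accepting new tasks
            pool.awaitTermination(1, TimeUnit.MINUTES);    // wait for the submitted tasks to finish
        }
    }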
  • Slide 15: A high-level API: concurrent_for.
  • Slide 16: (code example, image only; not transcribed – a hedged sketch follows below)
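    The code on slide 16 is not in the transcript. As a hedged sketch, a thread-pool-based concurrent_for along the lines of slide 15 might look like the following (the class, method name, and signature are assumptions). Note that it submits one task per element, which is exactly the inefficiency slide 17 points out:

    import java.util.ArrayList;
    import java.util.List;
    import java.util.concurrent.ExecutionException;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.Future;
    import java.util.function.IntConsumer;

    public class ConcurrentFor {
        // Apply 'task' to every index in [0, length) using a pool sized to the core count.
        public static void concurrentFor(int length, IntConsumer task)
                throws InterruptedException, ExecutionException {
            int cores = Runtime.getRuntime().availableProcessors();
            ExecutorService pool = Executors.newFixedThreadPool(cores);
            try {
                List<Future<?>> futures = new ArrayList<>();
                for (int i = 0; i < length; i++) {
                    final int index = i;
                    futures.add(pool.submit(() -> task.accept(index)));  // one task per element
                }
                for (Future<?> f : futures) f.get();       // wait for all elements
            } finally {
                pool.shutdown();
            }
        }

        public static void main(String[] args) throws Exception {
            double[] data = new double[100_000];
            concurrentFor(data.length, i -> data[i] = Math.sqrt(i));
            System.out.println(data[81]);                  // 9.0
        }
    }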
  • © 2013 IBM Corporation  The API is easy to use, users only need to input executed task and data range and don't care about how they are executed. However they still have disadvantages. 1. The number of thread in thread pool isn't aligned to core number 2. Task executes an entry once, which isn't sufficient 3. A task is targeted to a thread, which isn't flexible 17
  • Slide 18: Diagram: m tasks are mapped onto n pool threads running on 4 CPU cores. Overloading when n >> 4 (many more threads than cores); not flexible when m > n (tasks outnumber threads).
  • Slide 19: When the thread count does not align with the core count, use a fixed thread pool with thread number = core number (diagram: 4 pool threads on 4 CPU cores).
  • © 2013 IBM Corporation  Task division: another task division strategy ForkJoinPool Fork Join Task2 Task3 Task5 Task6 Task7 Divide and conquer 1. Divide big task into small tasks recursively 2. Execute the same operation for every task 3. Join result of every small task Task4 20 Task1
  • Slides 21–22: (ForkJoinPool code example, images only; not transcribed – a standard fork/join sketch follows below)
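    The code on slides 21–22 is not in the transcript. The divide-and-conquer pattern of slide 20 is what the standard ForkJoin API expresses; a typical RecursiveTask example (not the slides' code) looks like this:

    import java.util.concurrent.ForkJoinPool;
    import java.util.concurrent.RecursiveTask;

    public class SumTask extends RecursiveTask<Long> {
        private static final int THRESHOLD = 10_000;   // granularity chosen by the programmer
        private final long[] data;
        private final int from, to;

        SumTask(long[] data, int from, int to) { this.data = data; this.from = from; this.to = to; }

        @Override
        protected Long compute() {
            if (to - from <= THRESHOLD) {              // small enough: compute directly
                long sum = 0;
                for (int i = from; i < to; i++) sum += data[i];
                return sum;
            }
            int mid = (from + to) >>> 1;               // otherwise divide ...
            SumTask left = new SumTask(data, from, mid);
            SumTask right = new SumTask(data, mid, to);
            left.fork();                               // run the left half asynchronously
            return right.compute() + left.join();      // ... and join the results
        }

        public static void main(String[] args) {
            long[] data = new long[1_000_000];
            java.util.Arrays.fill(data, 1);
            long sum = new ForkJoinPool().invoke(new SumTask(data, 0, data.length));
            System.out.println(sum);                   // 1000000
        }
    }

    The fixed THRESHOLD here is the static, programmer-chosen granularity that slide 23 criticizes.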
  • © 2013 IBM Corporation  Better use for divide and conquer problem  Balancing: Work queue by thread and task stealing  Oversubscription and starvation: Configuring thread number Task dividing is static instead of dynamic. Task dividing granularity isn't configured properly according to running condition. Task daviding strategy is from programmers who need to design it themselves in different implementation scenarios. 23
  • © 2013 IBM Corporation  New parallel API based on task scheduler 24
  • Slide 25: Initial status (diagram: 4 pool threads on 4 CPU cores, each with its own task queue). Tasks are allocated equally; there is one thread per core; and every thread maintains its own task queue holding its affiliated tasks.
  • Slide 26: Unbalanced loading (diagram: some task queues have drained while others still hold tasks, so some threads run out of work).
  • Slide 27: The load is rebalanced by task stealing and by adding new tasks, which may have a different task granularity (diagram: idle threads take tasks from other threads' queues). A simplified sketch of per-thread queues with stealing follows below.
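    The per-thread queues and stealing on slides 25–27 can be illustrated with a deliberately simplified sketch. Everything below is an assumption for illustration only; the real scheduler also handles task submission while running, blocking, and termination:

    import java.util.Deque;
    import java.util.concurrent.ConcurrentLinkedDeque;

    class Worker implements Runnable {
        final Deque<Runnable> queue = new ConcurrentLinkedDeque<>();  // this worker's own task queue
        private final Worker[] all;

        Worker(Worker[] all) { this.all = all; }

        public void run() {
            Runnable task;
            while ((task = next()) != null) task.run();
        }

        private Runnable next() {
            Runnable t = queue.pollFirst();            // prefer tasks from the own queue
            if (t != null) return t;
            for (Worker other : all) {                 // otherwise steal from a peer's tail
                if (other != this && (t = other.queue.pollLast()) != null) return t;
            }
            return null;                               // all queues empty: stop
        }

        public static void main(String[] args) throws InterruptedException {
            Worker[] workers = new Worker[4];
            for (int i = 0; i < workers.length; i++) workers[i] = new Worker(workers);
            for (int i = 0; i < 20; i++) {             // deliberately uneven initial load: all tasks on worker 0
                final int id = i;
                workers[0].queue.addLast(() -> System.out.println("task " + id));
            }
            Thread[] threads = new Thread[workers.length];
            for (int i = 0; i < workers.length; i++) (threads[i] = new Thread(workers[i])).start();
            for (Thread t : threads) t.join();
        }
    }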
  • © 2013 IBM Corporation  Parallel API with new working mechanism - concurrent_for Range: the range of data set [0, n) Strategy: the strategy of dividing range: automatic, static with fixed granularity. In automatic case, task granularity is probably different Task: the task which executes the same operation on range 28
  • Slides 29–30: (code example for the new concurrent_for, images only; not transcribed – a hedged sketch follows below)
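    The code on slides 29–30 is not in the transcript. Below is a hedged sketch of how a range-dividing concurrent_for(range, strategy, task) could be built on a fixed pool; the names are assumptions. It uses a simple static granularity, whereas the talk's automatic strategy would size the chunks from the core count and the current load instead:

    import java.util.ArrayList;
    import java.util.List;
    import java.util.concurrent.ExecutionException;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.Future;
    import java.util.function.IntConsumer;

    public class RangeConcurrentFor {
        // Apply 'task' to every index in [from, to), dividing the range into chunks of 'granularity'.
        public static void concurrentFor(int from, int to, int granularity, IntConsumer task)
                throws InterruptedException, ExecutionException {
            int cores = Runtime.getRuntime().availableProcessors();
            ExecutorService pool = Executors.newFixedThreadPool(cores);  // one thread per core
            try {
                List<Future<?>> chunks = new ArrayList<>();
                for (int start = from; start < to; start += granularity) {
                    final int lo = start;
                    final int hi = Math.min(start + granularity, to);
                    chunks.add(pool.submit(() -> {          // one task per chunk, not per element
                        for (int i = lo; i < hi; i++) task.accept(i);
                    }));
                }
                for (Future<?> f : chunks) f.get();         // wait for the whole range
            } finally {
                pool.shutdown();
            }
        }

        public static void main(String[] args) throws Exception {
            double[] data = new double[1_000_000];
            concurrentFor(0, data.length, 10_000, i -> data[i] = Math.sqrt(i));
            System.out.println(data[81]);                    // 9.0
        }
    }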
  • Slide 31: Other high-level parallel APIs
    – concurrent_while: the data set can grow while it is being executed concurrently.
    – concurrent_reduce: uses a divide/join-based task to return a calculation result.
    – concurrent_sort: sorts a data set concurrently.
    – Math calculation, for example multiplying one matrix by another: given int[5][10] matrix1 and int[10][5] matrix2, int[5][5] matrix3 = matrix1 * matrix2 becomes int[5][5] matrix3 = concurrent_multiply(matrix1, matrix2).
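    As a hedged sketch of what a concurrent_multiply(matrix1, matrix2) from slide 31 might do, the rows of the result matrix can be computed in parallel on a fixed pool. Only the method name comes from the slide; the implementation below is an assumption:

    import java.util.concurrent.ExecutionException;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.Future;

    public class ConcurrentMultiply {
        public static int[][] concurrentMultiply(int[][] a, int[][] b)
                throws InterruptedException, ExecutionException {
            int n = a.length, k = b.length, m = b[0].length;     // a is n x k, b is k x m
            int[][] c = new int[n][m];
            ExecutorService pool =
                    Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors());
            try {
                Future<?>[] rows = new Future<?>[n];
                for (int i = 0; i < n; i++) {
                    final int row = i;
                    rows[i] = pool.submit(() -> {                // each result row is independent work
                        for (int j = 0; j < m; j++) {
                            int sum = 0;
                            for (int x = 0; x < k; x++) sum += a[row][x] * b[x][j];
                            c[row][j] = sum;
                        }
                    });
                }
                for (Future<?> f : rows) f.get();                // wait for every row
            } finally {
                pool.shutdown();
            }
            return c;
        }

        public static void main(String[] args) throws Exception {
            int[][] m1 = new int[5][10];                         // the slide's 5x10 and 10x5 shapes
            int[][] m2 = new int[10][5];
            for (int[] row : m1) java.util.Arrays.fill(row, 1);
            for (int[] row : m2) java.util.Arrays.fill(row, 1);
            int[][] m3 = concurrentMultiply(m1, m2);
            System.out.println(m3[0][0]);                        // 10
        }
    }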
  • Slide 32: In this way, we can achieve performance improvements through parallel computing on multi-core machines.
  • Slide 33: Roadmap – implement the high-level parallel API in the JDK based on the new task scheduler, with the goals of being scalable, correct, portable, and high performance.
  • Slide 34: Review of objectives. Now that you've completed this session, you are able to understand the design of the new task-based parallel API, and understand what parallel computing is and what it is good for.
  • Slide 35: Q & A
  • Slide 36: Thanks!