Kirill Mavrodiev, Intel: An Overview of Modern Capabilities for Parallelizing and Vectorizing Applications with Intel Parallel Composer

Transcript

  • 1. Intel® Parallel Studio 2011: Essential Performance
    Design | Build & Debug | Verify | Tune
    Kirill Mavrodiev, kirill.mavrodiev@intel.com
    EMEA Compiler TCE (Software Engineer), SSG DPD ICL
    Software & Services Group, Developer Products Division
    Copyright© 2010, Intel Corporation. All rights reserved. *Other brands and names are the property of their respective owners.
  • 2. AGENDA
    • Intel® Parallel Studio 2011
    • Intel® Parallel Composer
      – Intel® Cilk Plus keywords
      – Array Notation
      – Guided Auto-Parallelization (GAP)
      – Intel® Parallel Debugger Extension
  • 3. Three Product Lines for Diverse Needs
    – Essential Performance: C/C++ developers on Microsoft Visual Studio*; take advantage of multicore
    – Advanced Performance: C++ and Fortran developers on Windows* and Linux*; high-performance, cross-platform apps
    – Distributed Performance: C++ and Fortran developers on Windows* and Linux*; high-performance MPI clusters
    intel.com/software/products
  • 4. What's New in Intel® Parallel Studio 2011
    • Intel® Parallel Building Blocks: a comprehensive, portable, reliable, future-proof set of parallel models for both data and task parallelism
      – Intel® Threading Building Blocks
      – Intel® Cilk Plus
      – Intel® Array Building Blocks (Beta)
    • Intel® Parallel Advisor: parallelism design guide
      – Demystifies and speeds up parallel application design
      – Gives parallelism design insight and analysis through the Explorer and Modeler analysis tools
    • Now includes Intel® Premier Support
      – Unlimited technical support and upgrades for one year
    • Enhancements
      – Intel® Threading Building Blocks 3.0
      – Compiler improvements
      – Microsoft Visual Studio* 2010 integration
  • 5. Intel® Parallel Studio 2011
    • All-in-one toolset for the software development lifecycle
    • Plug-in for Microsoft Visual Studio* 2005, 2008 and 2010
  • 6. Intel® Parallel Advisor: Step-by-Step Guidance
    – Focus on the hot call trees and loops as locations to experiment with parallelism.
    – Insert Advisor annotations into the source code to describe the parallel experiment.
    – Evaluate the performance of the parallel experiment: Advisor displays a performance projection for each parallel site and shows how each site's performance affects the entire program.
    – Identify data issues (races) in each parallel experiment.
  • 7. Intel® Parallel Composer (Build & Debug phase)
    Improve performance: easier, faster performance for Windows* apps. Develop high-performance applications with an optimized C/C++ compiler and comprehensive threaded libraries.
    – Intel® Parallel Building Blocks: a comprehensive set of parallel development models that support multiple approaches to parallelism
    – Intel® Integrated Performance Primitives: an extensive library of highly optimized software functions for digital media and data-processing applications
  • 8. Intel's Family of Parallel Programming Models
    – Intel® Parallel Building Blocks (PBB): Intel® Threading Building Blocks (TBB), Intel® Cilk Plus, Intel® Array Building Blocks (ArBB)
    – Fixed Function Libraries: Intel® Math Kernel Library (MKL), Intel® Integrated Performance Primitives (IPP)
    – Established Standards: MPI, OpenMP*, OpenCL*
    – Research and Exploration: Intel® Concurrent Collections
    Intel® Cilk Plus and Intel® TBB are part of Intel® Parallel Studio 2011. Intel® Array Building Blocks is known by the code names "Intel Ct" and "Intel Firetown"; its public beta started around mid-September 2010.
  • 9. Intel® Parallel Building Blocks Details
  • 10. Intel® Parallel Inspector (Verify phase)
    Improve reliability: ensure application reliability with proactive memory and threading error checking.
    – Find threading errors: data races and deadlocks
    – Find memory errors: a wide variety of memory errors, identified in both serial and parallel applications, in addition to threading errors
  • 11. Intel® Parallel Amplifier (Tune phase)
    Optimize serial and parallel application performance with three easy-to-use, powerful analysis methods:
    – Hotspot analysis: where does my program spend most of its time?
    – Concurrency analysis: where and why doesn't my program use all available cores?
    – Locks & Wait analysis: where and why does my program wait?
  • 12. Intel® Parallel Studio 2011 Summary
    • For over a year, Intel Parallel Studio has made it easier for Windows* developers to create fast, reliable applications for multicore
    • This release is a major update
      – Intel® Parallel Building Blocks adds significant new parallelism models
      – Intel® Parallel Advisor gives software architects parallelism design insight and analysis for building reliable, high-performance applications for multicore
      – Other enhancements, including support for Visual Studio* 2010
    www.intel.com/go/parallel: try it right now!
  • 13. Intel® Parallel Composer
  • 14. Intel® Cilk™ Plus: language extensions to simplify task and data parallelism
    • C++ language extension that provides three simple keywords for writing parallel code
      – Loop-type data parallelism using cilk_for
      – General parallelism using cilk_spawn and cilk_sync
    • Unambiguous semantics: a strict fork-join model enforced through compiler support
    • Easiest for the programmer to understand the parallel control flow
    • Automatic load balancing via work stealing
    • Low-overhead task spawning encourages programmers to create many small tasks
    • A program with many small tasks gives the task scheduler the opportunity both to balance the load and to scale forward to larger core counts
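The loop form, cilk_for, can be approximated in standard C++ (C++11) to show the shape of the computation. This sketch uses a hypothetical helper named parallel_for, which is not part of Cilk Plus or the standard library. The helper splits the iteration range recursively, which is similar in spirit to how cilk_for divides its iterations. It runs on std::async threads instead of a work-stealing scheduler, so it illustrates only the fork-join structure and says nothing about Cilk's overheads.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <future>

// Hypothetical helper (not Cilk Plus): runs body(i) for every i in [lo, hi)
// by recursively halving the range. The left half is started with
// std::async (the "spawn"), the right half runs in the current thread, and
// get() waits for the left half (the "sync"). std::async starts real threads
// and does no work stealing, so `grain` must keep the thread count small.
void parallel_for(std::size_t lo, std::size_t hi, std::size_t grain,
                  const std::function<void(std::size_t)>& body) {
    assert(lo <= hi && grain > 0);
    if (hi - lo <= grain) {                  // small range: run it serially
        for (std::size_t i = lo; i < hi; ++i) body(i);
        return;
    }
    std::size_t mid = lo + (hi - lo) / 2;
    std::future<void> left = std::async(std::launch::async, parallel_for,
                                        lo, mid, grain, std::cref(body));
    parallel_for(mid, hi, grain, body);      // continue with the right half
    left.get();                              // join before returning
}
```

As with cilk_for, the body must not let one iteration depend on another. In this sketch each index is handled by exactly one task.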
  • 15. Intel® Cilk™ Plus
    Cilk adds the following new keywords:
      – _Cilk_spawn spawns a function call that executes asynchronously
      – _Cilk_sync is a synchronization point that waits for all children spawned by the current function
      – _Cilk_for is a parallel for-loop whose iterations may execute in parallel
    The lowercase spellings cilk_spawn, cilk_sync and cilk_for used elsewhere in this deck are macros for these keywords, defined in <cilk/cilk.h>.
    Cilk includes reducers for lock-free access to global data:
      – Use built-in reducers for common types: strings, summation, min/max, logical operations, and more
      – Write custom reducers to manage any data type
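The reducer idea can be sketched without Cilk. Each worker gets a private copy ("view") of the accumulator, and the copies are merged at the join point, so the hot loop needs neither locks nor atomics. The function name reduce_sum and the static split into equal chunks are assumptions for illustration. This is not the Cilk Plus reducer API.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Minimal sketch of a summation "reducer" with plain C++ threads:
// each worker adds into its own view, and the views are merged after join.
long long reduce_sum(const std::vector<int>& data, unsigned workers) {
    assert(workers > 0);
    std::vector<long long> views(workers, 0);          // one view per worker
    const std::size_t chunk = (data.size() + workers - 1) / workers;
    std::vector<std::thread> pool;
    for (unsigned w = 0; w < workers; ++w) {
        pool.emplace_back([&data, &views, chunk, w] {
            const std::size_t begin = std::min(data.size(), w * chunk);
            const std::size_t end = std::min(data.size(), begin + chunk);
            for (std::size_t i = begin; i < end; ++i)
                views[w] += data[i];                   // touches only its own view
        });
    }
    for (std::thread& t : pool) t.join();              // wait for all workers
    long long total = 0;
    for (long long v : views) total += v;              // merge the views in order
    return total;
}
```

A real Cilk reducer creates views lazily, only when work is actually stolen. It merges them in an order that preserves the serial result, even for non-commutative operations such as string concatenation. This sketch shows only the basic private-view-then-merge idea.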
  • 16. Simple Divide and Conquer Example
        #include <stdio.h>
        #include <cilk/cilk.h>

        int fib(int n)
        {
            int x, y;
            if (n < 2) return n;
            // Allow fib(n-1) to run in parallel with fib(n-2)
            x = cilk_spawn fib(n-1);
            y = fib(n-2);
            // Ensure that all parallel work is complete before using the result
            cilk_sync;
            return x + y;
        }

        int main()
        {
            printf("Fib of 40 is %d\n", fib(40));
            return 0;
        }
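For comparison, here is the same divide-and-conquer structure written with standard C++ futures. std::async plays the role of cilk_spawn, and future::get() plays the role of cilk_sync. The cutoff parameter (default 20) is an assumption added here. std::async with std::launch::async runs each call as if on a new thread, which is far more expensive than a Cilk spawn, so below the cutoff the recursion runs serially.

```cpp
#include <cassert>
#include <future>

// Plain serial Fibonacci, used below the cutoff.
int fib_serial(int n) {
    return n < 2 ? n : fib_serial(n - 1) + fib_serial(n - 2);
}

// Sketch of the slide's fib with std::async instead of Cilk keywords.
int fib(int n, int cutoff = 20) {
    assert(cutoff >= 2);                     // keeps the n < 2 base case in the serial path
    if (n < cutoff) return fib_serial(n);
    // Like cilk_spawn: fib(n-1) may run in parallel with fib(n-2).
    std::future<int> x = std::async(std::launch::async, fib, n - 1, cutoff);
    int y = fib(n - 2, cutoff);
    // Like cilk_sync: wait for the spawned call before using its result.
    return x.get() + y;
}
```

For example, fib(30) returns 832040. A larger cutoff, such as fib(40, 30), keeps the number of threads small.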
  • 17. Intel® Cilk Plus Tachyon Implementation
  • 18. Array Notation for Data Parallelism
    • New array extension to the C/C++ language
    • Specify parallel operations on arrays (instead of sequential loops)
    • Predictable performance, based on mapping the parallel constructs to the underlying multithreading/SIMD hardware
    • Works seamlessly with existing C/C++ frameworks and runtimes: Intel® TBB, OpenMP*, MPI, Intel® Cilk™ Plus, Pthreads, etc.
  • 19. Array Section Notation
    • Syntax: <array base>[<lower_bound> : <length> : <stride>], one bracket per dimension
      – ':<stride>' is optional (defaults to stride = 1)
      – Omitting ':<length>:<stride>' implies length = 1
      – A bare ':' selects all elements of that dimension
      – Note the difference from Fortran array sections, which are written lower_bound : upper_bound [: stride]
    • Samples:
        A[:]          // All elements of vector A
        B[2:6]        // Elements 2 to 7 of vector B
        C[:][5]       // Column 5 of matrix C
        D[0:3:2]      // Elements 0, 2, 4 of vector D
        E[0:3][0:4]   // 12 elements, from E[0][0] to E[2][3]
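To make the <lower_bound> : <length> : <stride> rule concrete, here is a small C++ helper that lists the indices selected by one dimension of a section. The helper, section_indices, is hypothetical and not part of Cilk Plus. The middle field is a length, not an upper bound, so B[2:6] ends at index 7.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: indices selected by one dimension of a Cilk Plus
// array section [lower_bound : length : stride].
std::vector<std::ptrdiff_t> section_indices(std::ptrdiff_t lower_bound,
                                            std::ptrdiff_t length,
                                            std::ptrdiff_t stride = 1) {
    assert(length >= 0);
    std::vector<std::ptrdiff_t> idx;
    for (std::ptrdiff_t n = 0; n < length; ++n)
        idx.push_back(lower_bound + n * stride);   // n-th selected element
    return idx;
}
```

A multi-dimensional section selects the Cartesian product of its dimensions, so E[0:3][0:4] covers 3 × 4 = 12 elements.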
  • 20. Intel® Cilk Plus Implementation
        // tmp (declaration not shown) must be private to each k iteration
        cilk_for (int k = 0; k < nz; k++) {
            for (int j = 0; j < ny; j++)
                for (int i = 0; i < nx; i += STRIDE) {
                    tmp[:] = rhs[ID(i,j,k) : STRIDE] + x[ID(i,j,k) : STRIDE] * 6.0
                           - (x[ID(i-1,j,k) : STRIDE] + x[ID(i+1,j,k) : STRIDE]
                            + x[ID(i,j-1,k) : STRIDE] + x[ID(i,j+1,k) : STRIDE]
                            + x[ID(i,j,k-1) : STRIDE] + x[ID(i,j,k+1) : STRIDE]);
                    residueConvergeStrongCReducer =
                        cilk::max_of(residueConvergeStrongCReducer,
                                     __sec_reduce_max(fabs(tmp[:])));
                    residueConvergeStrongL2Reducer += __sec_reduce_add(tmp[:] * tmp[:]);
                }
        }
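A serial reference version of this loop nest is useful for checking the array-notation code. The slide does not show how ID is defined, so the grid layout (a one-cell halo around nx × ny × nz interior points, x index fastest) and the id() function below are assumptions. The sketch processes one point at a time instead of STRIDE-wide sections, and two plain fields stand in for the max and sum reducers. The stencil expression is the same as on the slide.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Assumed layout: interior points plus a one-cell halo on every side.
struct Grid {
    int nx, ny, nz;
    std::size_t id(int i, int j, int k) const {  // hypothetical stand-in for ID()
        return (static_cast<std::size_t>(k + 1) * (ny + 2) + (j + 1)) * (nx + 2) + (i + 1);
    }
    std::size_t size() const {
        return static_cast<std::size_t>(nx + 2) * (ny + 2) * (nz + 2);
    }
};

struct Residual {
    double max_abs = 0.0;  // role of residueConvergeStrongCReducer (max)
    double sum_sq = 0.0;   // role of residueConvergeStrongL2Reducer (sum)
};

Residual residual(const Grid& g, const std::vector<double>& rhs,
                  const std::vector<double>& x) {
    assert(rhs.size() == g.size() && x.size() == g.size());
    Residual r;
    for (int k = 0; k < g.nz; ++k)
        for (int j = 0; j < g.ny; ++j)
            for (int i = 0; i < g.nx; ++i) {
                double tmp = rhs[g.id(i, j, k)] + x[g.id(i, j, k)] * 6.0
                    - (x[g.id(i - 1, j, k)] + x[g.id(i + 1, j, k)]
                     + x[g.id(i, j - 1, k)] + x[g.id(i, j + 1, k)]
                     + x[g.id(i, j, k - 1)] + x[g.id(i, j, k + 1)]);
                r.max_abs = std::max(r.max_abs, std::fabs(tmp));
                r.sum_sq += tmp * tmp;
            }
    return r;
}
```

A parallel variant would split the k loop across workers and combine per-worker max and sum values, which is what the two reducers on the slide do.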
  • 21. Intel® Cilk Plus Implementation
  • 22. GAP: Guided Auto-Parallelization
    • Targeted at mainstream and HPC users
    • Advice on changing code to enable more auto-vectorization, auto-parallelization and data transformations
    • Diagnostic guidance is generated when GAP is invoked
    • Advice may involve
      – suggestions for source changes
      – adding pragmas
      – adding new options
    • Simple source changes that assert new properties
      – Add a new pragma for a loop if its semantics are satisfied
      – Use a local variable for the upper bound of a loop
      – Initialize a scalar variable unconditionally at the top of a loop
      – Reorder the fields of a structure (or split it into two)
    • Desired behavior
      – Each piece of advice is specific and uses source-level variable names
      – The user does the semantic analysis and applies or rejects each piece of advice
      – Advice should be as localized as possible
      – Following the advice should result in better optimizations
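One of the listed source changes, using a local variable as the loop upper bound, can be shown in isolation. The global g_count and both function names are hypothetical. In the first version the compiler may have to assume that a store through out could modify g_count, unless alias analysis (for example, type-based rules) proves otherwise. In that case the trip count is not known to be loop-invariant. Copying the bound into a local removes that doubt, provided the user has confirmed that the loop really does not change it.

```cpp
#include <cassert>

// Hypothetical global loop bound, e.g. set elsewhere in the program.
int g_count = 0;

// Bound read from a global: unless the compiler can prove that stores through
// `out` never modify g_count, it must treat the bound as possibly changing,
// which can block vectorization or parallelization of the loop.
void scale_global_bound(float* out, const float* in, float f) {
    assert(out != nullptr && in != nullptr);
    for (int k = 0; k < g_count; ++k)
        out[k] = in[k] * f;
}

// GAP-style rewrite: the trip count is fixed before the loop starts.
// Valid only if the loop does not change g_count (the user must verify this).
void scale_local_bound(float* out, const float* in, float f) {
    assert(out != nullptr && in != nullptr);
    const int n = g_count;
    for (int k = 0; k < n; ++k)
        out[k] = in[k] * f;
}
```

Both functions produce the same output whenever out does not overlap g_count.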
  • 23. GAP: How It Works (selection of the most relevant switches)
    Multiple compiler switches activate and fine-tune the guidance analysis.
    • Activate messages individually for vectorization, parallelization or data transformations, or for all three:
          -guide-vec[=level]
          -guide-par[=level]
          -guide-data-trans[=level]
          -guide[=level]
      The optional argument level=1,2,3,4 controls the extent of the analysis; Intel Composer supports only up to level 3.
    • Control the part of the source code for which the analysis is done:
          -guide-opts=<arg>
      Samples:
          -guide-opts="convert.c,'funca(int)'"
          -guide-opts="bar.f90,'module_1::func_solve'"
    • Control where the messages go:
          -guide-file=<file_name>
          -guide-file-append=<file_name>
    The case study on the next slide uses the Windows* spelling of these options, -Qguide=4.
  • 24. GAP Case Study
    Command line:
        [c:/test2/usability2] icl -c distance.cpp -Qguide=4 -Qparallel
    Source:
        extern int num_nodes;
        typedef struct TEST_STRUCT {
            // Coordinates of city1
            float latitude1;
            float longitude1;
            int city_id1;
            int stops[10000];   // Currently unused field
            // Coordinates of city2
            float latitude2;
            float longitude2;
            int city_id2;
        } test_struct;
        extern float *distances;
        extern test_struct** nodes;

        void process_nodes(void)
        {
            float const R = 3964.0;
            float temp, lat1, lat2, long1, long2, result;
            int temp1 = num_nodes;
            //#pragma loop count min(16)
            //#pragma parallel
            // for (int k=0; k < temp1; k++) {
            for (int k=0; k < num_nodes; k++) {
                lat1 = nodes[k]->latitude1;
                lat2 = nodes[k]->latitude2;
                long1 = nodes[k]->longitude1;
                long2 = nodes[k]->longitude2;
                // Compute the distance between the two cities
                temp = sin(lat1) * sin(lat2) +
                       cos(lat1) * cos(lat2) * cos(long1-long2);
                result = 2.0 * R * atan(sqrt((1.0-temp)/(1.0+temp)));
                // Store the distance computed in the distances array
                distances[k] = result;
            }
        }
    GAP report:
        GAP REPORT LOG OPENED ON Wed Mar 03 18:34:01 2010
        c:\test2\usability2\distance.h(2): remark #30755: (DTRANS) Reordering the fields of the structure 'TEST_STRUCT' will improve data locality. Suggested field order: 'stops, latitude1, longitude1, latitude2, longitude2, city_id1, city_id2'. [VERIFY] The suggestion is based on the field references in current compilation. Please make sure that the restructured code satisfies the original program semantics.
        c:\test2\gap_examples\usability2\distance.cpp(30): remark #30534: (LOOP) Add -Qansi-alias option for better type-based disambiguation analysis by the compiler if appropriate (option will apply for entire compilation). This will improve optimizations for the loop at line 30 [VERIFY] Make sure that the semantics of this option is obeyed for entire compilation.
        c:\test2\usability2\distance.cpp(29): remark #30519: (PAR) Use "#pragma parallel" to parallelize the loop at line 29, if these arrays in the loop do not have cross-iteration dependencies: nodes, distances. [VERIFY] A cross-iteration dependency exists if a memory location is modified in an iteration of the loop and accessed (a read or a write) in another iteration of the loop. Make sure that there are no such dependencies.
        c:\test2\gap_examples\usability2\distance.cpp(29): remark #30525: (PAR) If the trip count of the loop at line 29 is greater than 16, then use "#pragma loop count min(16)" to parallelize this loop. [VERIFY] Make sure that the loop has a minimum of 16 iterations.
        c:\test2\gap_examples\usability2\distance.cpp(48): remark #30525: (PAR) If the trip count of the loop at line 48 is greater than 751, then use "#pragma loop count min(751)" to parallelize this loop. [VERIFY] Make sure that the loop has a minimum of 751 iterations.
        END OF GAP REPORT LOG
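Applying two of the remarks by hand gives something like the sketch below. It assumes that the [VERIFY] conditions hold for the real program. It also turns the globals num_nodes, nodes and distances into parameters. That change is made here only to keep the example self-contained; GAP did not suggest it. Moving the large stops array (10000 ints) to the front places latitude1, longitude1, latitude2 and longitude2 next to each other. This is presumably why the reordering improves locality, since those four fields are the only ones the loop reads. The loop bound is copied into a local, as in the commented-out loop header on the slide. Coordinates are treated as radians, as the sin/cos calls imply.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// Field order as suggested by remark #30755.
typedef struct TEST_STRUCT {
    int stops[10000];             // currently unused field
    float latitude1, longitude1;  // city 1
    float latitude2, longitude2;  // city 2
    int city_id1, city_id2;
} test_struct;

void process_nodes(test_struct* const* nodes, float* distances, int num_nodes) {
    assert(num_nodes >= 0);
    const float R = 3964.0f;      // same constant as on the slide
    const int n = num_nodes;      // local upper bound for the loop
    for (int k = 0; k < n; ++k) {
        // Loop-local scalars: each iteration gets its own copies.
        float lat1 = nodes[k]->latitude1, lat2 = nodes[k]->latitude2;
        float long1 = nodes[k]->longitude1, long2 = nodes[k]->longitude2;
        // Cosine of the central angle between the two cities.
        float temp = std::sin(lat1) * std::sin(lat2)
                   + std::cos(lat1) * std::cos(lat2) * std::cos(long1 - long2);
        distances[k] = 2.0f * R * std::atan(std::sqrt((1.0f - temp) / (1.0f + temp)));
    }
}
```

Since temp is the cosine of the central angle c, the expression 2R·atan(sqrt((1 − temp)/(1 + temp))) equals R·c, the great-circle distance for radius R.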
  • 25. Intel® Debugger & Intel® Parallel Debugger Extension
    – Intel® Debugger (IDB): Linux* and Mac* OS
    – Intel® Parallel Debugger Extension: Windows*
    Included with: Intel® C++ Composer XE (Linux*, Mac* OS, Windows*); Intel® Fortran Composer XE (Linux*, Mac* OS); Intel® Visual Fortran Composer XE (Windows*); Intel® Parallel Composer (Windows*); Intel® Cluster Toolkit Compiler Edition
  • 26. Key Features
    • Thread shared data event detection: break on thread shared data access (read/write)
    • Re-entrant function detection
    • SIMD SSE registers window
    • Enhanced OpenMP* support
      – Serialize OpenMP-threaded application execution on the fly
      – Insight into thread groups, barriers, locks, wait lists, etc.
  • 27. Questions?
  • 28. Legal Disclaimer INFORMATION IN THIS DOCUMENT IS PROVIDED “AS IS”. NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO ANY INTELLECTUAL PROPERTY RIGHTS IS GRANTED BY THIS DOCUMENT. INTEL ASSUMES NO LIABILITY WHATSOEVER AND INTEL DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY, RELATING TO THIS INFORMATION INCLUDING LIABILITY OR WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR INFRINGEMENT OF ANY PATENT, COPYRIGHT OR OTHER INTELLECTUAL PROPERTY RIGHT. Performance tests and ratings are measured using specific computer systems and/or components and reflect the approximate performance of Intel products as measured by those tests. Any difference in system hardware or software design or configuration may affect actual performance. Buyers should consult other sources of information to evaluate the performance of systems or components they are considering purchasing. For more information on performance tests and on the performance of Intel products, reference www.intel.com/software/products. Intel and the Intel logo are trademarks of Intel Corporation in the U.S. and other countries. *Other names and brands may be claimed as the property of others. Copyright © 2010. Intel Corporation. http://www.intel.com/software/products