• Share
  • Email
  • Embed
  • Like
  • Save
  • Private Content
BSC-Microsoft Research Center Overview

BSC-Microsoft Research Center Overview






Total Views
Views on SlideShare
Embed Views



1 Embed 2

http://www.slideshare.net 2



Upload Details

Uploaded via as Adobe PDF

Usage Rights

© All Rights Reserved

Report content

Flagged as inappropriate Flag as inappropriate
Flag as inappropriate

Select your reason for flagging this presentation as inappropriate.

  • Full Name Full Name Comment goes here.
    Are you sure you want to
    Your message goes here
Post Comment
Edit your comment

    BSC-Microsoft Research Center Overview BSC-Microsoft Research Center Overview Presentation Transcript

    • BSC-Microsoft Research Centre Overview Osman Unsal Barcelona 26.March.2010
    • BSC Departments • Computer Science • Computer Architecture • Life Sciences • Earth Sciences • Computer Applications
    • Barcelona Supercomputing Center • Mission – Investigate, develop and manage technology to facilitate the advancement of science. • Objectives – Operate national supercomputing facility – R&D in Supercomputing and Computer Architecture. – Collaborate in R&D e-Science • Consortium – the Spanish Government (MEC) – the Catalonian Government (DURSI) – the Technical University of Catalonia (UPC)
    • Location
    • Private/Public Partnership in Research • Two main models for company/university lablets – Closed – Open University Company Closed Generated IP is private Students highly motivated ? University Company Open Requires more effort from company ? Higher visibility for researchers • Finding the appropriate balance is key for success closed open
    • BSCMSRC experience (I) • Open model – No patents, public IP, papers and open source main focus • Making positive impact in the community • Contributing to standard setting bodies – Participation of Microsoft Cambridge Senior Researchers – Similar research agenda with parallel computing centers in US • Berkeley • Illinois • Rice • Stanford Combining efforts is absolutely necessary
    • BSCMSRC experience (II) • Internships/stays: – UPC students exposed to Microsoft Research through internships – Future extension: longer stays of Cambridge researchers in Barcelona • Collaboration between BSCMSRC and other Microsoft institutes – Cosbi in planning phase • BSC coordinating 9 partner, 5M Euro EC VELOX project – Microsoft Cambridge catalyst in introducing potential partners – Microsoft observers to project (as is Intel and SUN)
    • BSCMSR Centre: general overview • Microsoft-BSC joint project – Combining research expertise • Computer Architectures for Parallel Paradigms at BSC: computer architecture • Microsoft Research Cambridge: programming systems – Kickoff meeting in April 2006 – Total Effort: • 2 Senior BSC researchers, 23 PhD students – Support and part-time involvement of senior researchers and faculty from UPC • BSCMSRC inaugurated January 2008
    • BSCMSR Centre: research approach • Processor design historically bottom-up: – Build the chip and let the software guys sort it out – Difficult to defend in the era of multi- and many-core • We prefer a top-down approach: – Let software developments guide hardware design – “Coger el toro por los cuernos” – Utilize feedback from the OS, compiler, programmer language experts about the features they would like to see in future processors
    • The motivation • “It is better for Intel to architecture,soinclude: the Top-down get involved in this now when we get to point of having 10s and 100s of cores we will have the answers. – Application There is a lot of architecture work to do to release the potential, – Debugging and we will not bring these products to market until we have good solutions to the programming problem” – Programming models Justin Rattner Intel CTO Marenostrum – Programming languages Most beautiful supercomputer Fortune magazine, Sept. Source: Intel 2006 – Compilers #1 in Europe, #5 in the World – Operating Systems 100's of TeraFlops – Runtime environment with general purpose Linux supercluster of commodity • As design drivers PowerPC-based Blade Servers “Now, •80 grains inside thesedie of 300more and mm. will be multi-core type devices, the processors in a machines square more and so•Terabytesparallelization of memory bandwidth the idea of per second won't just be at the individual chip level, even inside that chip we need to explore new techniques like transactional memory 10.000 Pentium •Note: The barrier of the Teraflops was obtained by Intel in 1991 using TOP-DOWN COMPUTER ARCHITECTURE that will allow us to get the full benefit of all those transistors Pro processors contained in more than 85 cabinets occupying 200 square meters  and map that into higher and higher performance.” Bill Gates, Supercomputing 05 in 5 years from now •This will be possible keynote
    • BSC Contributors Mateo Valero Professor Spain Part-time Gulay Yalcin PhD Student Turkey Full-Time Eduard Ayguade Professor Spain Part-time Vesna Smiljovic PhD Student Serbia Full-Time Adrian Cristal Doctor Argentina Full-time Nebosja Miletic PhD Student Serbia Full-Time Ibrahim Hur Doctor Turkey/USA Full-time Oriol Arcas M.S. Student Spain Intern Osman Unsal Doctor Turkey Full-time Adria Armejach PhD Student Spain Full-Time Enrique Vallejo PhD Student Spain Part-time Otto Pflucker M.S. Student Spain Intern Srdjan Stipic PhD Student Serbia Full-time Vasilis Karakostas M.S. Student Greece Full-time Ferad Zyulkyarov PhD Student Bulgaria Full-time Oscar Palomar Ph.D. Student Spain Full-time Sasa Tomic PhD Student Serbia Full-time Milan Stanic PhD Student Serbia Full-time Nehir Sonmez PhD Student Turkey Full-time Milovan Djuric PhD Student Serbia Full-time Vladimir Gajinov PhD Student Serbia Full-time Timothy Hayes PhD Student Ireland Full-time Gokcen Kestor PhD Student Turkey Full-Time Ivan Ratkovic PhD Student Serbia Full-time Azam Seyedi PhD Student Iran Full-Time Nikola Bezanic PhD Student Serbia Full-time Nikola Markovic PhD Student Serbia Full-Time Sutirtha Sanyal PhD Student India Full-time
    • Microsoft- BSC Project: A quick snapshot • Project started in April 2006 • Two-year initial duration • Initial topic:Transactional Memory • BSC – Microsoft Research Centre inaugurated January 2008
    • Transactions vs.Locks Transactions Locks • Non-blocking • Blocking synchronization synchronization • Deadlock free • Deadlock risk • Composable • Non-composable • Easy programmable • Coarse grain locks • Efficiency of fine limit TLP grain locks • Fine grain locks are B1 difficult to program
    • Slide 13 B1 Osman added BSC-CNS; 02/09/2007
    • What is Transactional Memory (TM) ? • TM optimistically runs transactions in parallel, in the hope that they do not perform conflicting memory accesses. • If the transactions do not conflict then the optimism has paid off. • If transactions do attempt conflicting accesses, then the TM must delay or abort the work of one or the other. • The partial effects of aborted transactions are "rolled back" before they are re-executed .
    • Transaction Execution • The TM system lets different threads to execute the atomic regions speculatively. • The TM system guarantees: – Atomicity – all tentative memory changes become visible to the other threads simultaneously at the time when a transaction commits. – Isolation – while transaction executes the tentative memory updates are not visible by the other threads.
    • Readset / Writeset • Readset: The set of all the distict memory locations read by a transaction • Writeset: The set of all the distinct memory locations written by a transaction
    • atomic { atomic { … … read (a); read (a); read (b); write (c); … … write (a); } }
    • Versioning and Conflict resolution • Basic implementation requirements – Data versioning – Conflict detection & resolution • Conflicts happen if – One transaction (attemps to) reads a data item while another one tries to write to the item – At least two transactions (attempt to) write a data item • If conflict is detected one of the transactions could be aborted
    • atomic { atomic { … … read (a); read (a); read (b); write (c); … … write (a); } }
    • Conflict Detection and Resolution • Detect and handle conflicts between transaction – Read-Write and (often) Write-Write conflicts – For detection, a transactions tracks its read-set and write-set 1. Eager detection •Check for conflicts during loads or stores HW: check through coherence lookups SW: checks through locks and/or version numbers •Use contention manager to decide to stall or abort Various priority policies to handle common case fast 2.Lazy detection •Detect conflicts when a transaction attempts to commit HW: write-set of committing transaction compared to read-set of others –Committing transaction succeeds; others may abort SW: validate write-set and read-set using locks and version numbers
    • atomic { atomic { … … read (a); read (a); read (b); write (c); … … write (a); } }
    • Versioning • Manage uncommited(new) and commited(old) versions of data for concurrent transactions 1.Eager (undo-log based) •Update memory location directly; maintain undo info in a log +Faster commit, direct reads (SW) –Slower aborts, no fault tolerance, weak atomicity (SW) 2.Lazy (write-buffer based) •Buffer writes until commit; update memory location on commit +Faster abort, fault tolerance, strong atomicity (SW) –Slower commits, indirect reads (SW)
    • atomic { atomic { … … read (a); read (a); read (b); write (c); … … write (a); } }
    • Transactional Memory • Advent of multicore chips • Advent of multicore chips Architecture • Parallel programming • Parallel programming commonplace commonplace • Simplicity vs. wide applicability • Simplicity vs. wide applicability • Full performance not the • Full performance not the Transactional Memory initial driving force, but the initial driving force, but the final objective final objective STM HTM Programming model Functional Applications MSRC Imperative BSC
    • Research onTM •Haskell Transactional •Haskell Transactional Architecture Benchmark Benchmark •RMS applications •RMS applications •Quake-TM •Quake-TM •Atomic-Quake •Atomic-Quake Transactional Memory •Wormbench •Wormbench STM HTM Programming model Functional Applications MSRC Imperative BSC
    • Research on TM •Haskell STM •Haskell STM Architecture •OpenMP + TM •OpenMP + TM Transactional Memory STM HTM Programming model Functional Applications MSRC Imperative BSC
    • Research on TM •HW acceleration of STM •HW acceleration of STM Architecture •Simulators •Simulators •Dynamic filtering •Dynamic filtering •Eazy HTM •Eazy HTM Transactional Memory STM HTM Programming model Functional Applications MSRC Imperative BSC
    • Selected Publications on TM F. Zyulkyarov, T. Harris, O. Unsal, A. Cristal, M. Valero,” Debugging Programs that use Atomic Blocks and Transactional Memory “, 15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP’10), January 2010 S. Tomić, C. Perfumo, C. Kulkarni, A. Armejach, A. Cristal, O. Unsal, T. Harris, M. Valero, “EazyHTM, Eager- Lazy Hardware Transactional Memory”, 42nd International Symposium on Microarchitecture, Micro'09, December 2009 V. Gajinov, F. Zyulkyarov, A. Cristal, O. Unsal, T. Harris, M. Valero, “QuakeTM: Parallelizing a Complex Serial Application Using Transactional Memory”, 23rd International Conference on Supercomputing (ICS 2009), June 2009. N. Sonmez, A. Cristal, T. Harris, O. Unsal, M. Valero, “Taking the heat off transactions: dynamic selection of pessimistic concurrency control”, 23th International Parallel and Distributed Systems Symposium, (IPDPS 2009), May 2009, F. Zyulkyarov , V. Gajinov, O. Unsal , A. Cristal , E. Ayguade, T. Harris , M. Valero, “Atomic Quake: Use Case of Transactional Memory in an Interactive Multiplayer Game Server”, 14th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP), February 2009. C. Perfumo, N. Sonmez, S. Stipic, O. Unsal, A. Cristal, T. Harris, M. Valero, “The Limits of Software Transactional Memory (STM): Dissecting Haskell STM Applications on a Many-Core Environment”, ACM International Conference on Computing Frontiers, May 2008 E. Vallejo, T. Harris, A. Cristal, O. Unsal, M. Valero, “Hybrid Transactional Memory to Accelerate Safe Lock- based Transactions”, Third ACM SIGPLAN Workshop on Transactional Computing TRANSACT, February 2008 M. Milovanović, R. Ferrer, O. Unsal, A. Cristal, X. Martorell, E. Ayguadé, J. Labarta and M. Valero, “Transactional Memory and OpenMP”, International Journal of Parallel Programming. C. Perfumo, N. Sonmez, O. Unsal, A. Cristal, M. Valero, T. Harris, “Dissecting Transactional Executions in Haskell”, Second ACM Workshop on Transactional Computing TRANSACT, August 2007 T. Harris, A. Cristal, O. Unsal, E. Ayguade, F. Gagliardi, B. Smith, M. Valero, “Transactional Memory: An Overview”, IEEE Micro, May-June 2007
    • FP7 TM-Project VELOX • Goal: Unified TM system stack, Duration: 2008-2011, Budget: 5M Euros • BSC involvement initiated through BSC-MSR Centre • Project participants: – BSC – EPFL – University of Neuchatel – Technical University of Dresden – Tel Aviv University – Chalmers Institute of Technology – AMD – Red-Hat – VirtualLogix • Project observers: – Microsoft – Intel – SUN
    • Empowering TM across the system stack C TM-based Applications and Benchmarks TM-based Applications and Benchmarks (Legacy software migration, TM-aware development, etc.) (Legacy software migration, TM-aware development, etc.) A 7 Application Environments Application Environments B (Application servers, event processing engine, etc.) (Application servers, event processing engine, etc.) 6 Programming Language Programming Language (Contention, isolation, algorithms, etc.) (Adaptive TM, hybrid approaches, etc.) (Contention, isolation, algorithms, etc.) (Adaptive TM, hybrid approaches, etc.) (Java/C/C++ integration, exception handling, etc.) (Java/C/C++ integration, exception handling, etc.) 5 Compiler Compiler (Code generation, bytecode instrumentation, etc.) (Code generation, bytecode instrumentation, etc.) Integration Integration Theory Theory 4 TM Libraries TM Libraries (Word-based, object-based, PLS, TBB, etc.) (Word-based, object-based, PLS, TBB, etc.) 3 Native Libraries Native Libraries (System libraries) (System libraries) 2 Operating System Operating System (Low-level TM support) (Low-level TM support) 1 TM Hardware and Adaptation Layer TM Hardware and Adaptation Layer (Multi-core CPUs, HW+SW transactions, etc.) (Multi-core CPUs, HW+SW transactions, etc.) http://www.velox-project.eu/
    • Thank you! http://www.bsc.es http://www.bscmsrc.eu http://www.velox-project.eu