Conference on Adaptive Hardware and Systems (AHS'14) - FlexTiles Introductions

FlexTiles is an FP7 project whose goal is to design a tool-chain for developing a 3D SoC and to prototype it on an FPGA development platform. This presentation covers the "why, how, when and where" of the project, which is due to complete in 2015.

Published in: Engineering

  1. 1. Designing Sophisticated Signal Processing Architectures for Challenging Real-Time Applications: the FP7 FlexTiles project (www.flextiles.eu). Philippe MILLET, PhD, AHS 2014, philippe.millet@thalesgroup.com. Research & Technology, www.thalesgroup.com, 2014/07/14.
  2. 2. FlexTiles Workshop. FlexTiles: Self-Adaptive Heterogeneous Many-Core Technology Based on Flexible Tiles. Workshop on Friday 18th in the morning (9:00 - 13:00): • 3-D Stacked Chip Technology and Strategies for Optimal Usage of Through Silicon Vias (TSV) • FlexTiles Simulating Environment Based on Open Virtual Platform (OVP) • Low-Power DSP Accelerator Embedded in a Heterogeneous Many-Core Architecture • Dynamically Reconfigurable Embedded FPGA System • FPGA-Based Emulation of FlexTiles Platform • Demonstration: OVP Simulation of the FlexTiles Platform. Notice: The information contained in this document and any attachments are the property of THALES. You are hereby notified that any review, dissemination, distribution, copying or otherwise use of this document is strictly prohibited without Thales prior written approval. © THALES 2011. Template trtp version 7.0.8. 2014/07/14/PhM
  3. 3. Some challenging applications within THALES: cognitive radio. Continuously adapt the frequency and protocol to the available ones. Avoid jammers or obfuscated communications. Source: The India Economy Review
  4. 4. Some challenging applications within THALES: smart camera. Highway: follow cars, detect traffic jams or accidents. Airport: find and follow people, detect abandoned luggage and strange or dangerous behaviours. Dynamicity depends on the number of detections. Cameras have local processing capability so that they send data only when something "interesting" has been detected.
  5. 5. Some challenging applications within THALES: UAV. Autonomous: takes decisions with little or no external control. Reacts to the environment. Self-repairs. Adapts the mission to what the UAV finds. Activates software parts to match the actual situation. The software is dynamically activated and mapped to the available resources.
  6. 6. Real-time embedded products at THALES. Embedded real-time market: • Low power consumption: target range of 10 W to 40 W; some products are designed for <1 W (low adaptivity); general-purpose processors are too power-hungry • Low volumes (fewer than 1000 pieces/year): designing a dedicated ASIC is not an option • Long lifetime (~20 years): long life, no maintenance; a hardware upgrade or retrofit must cost as little as possible; a programmable device is preferred
  7. 7. Some challenging applications within THALES. Embedded real-time market: • low power consumption • low volumes • long lifetime (~20 years). Adapt to environment: • dynamicity, flexibility & dependability. Examples: smart camera, cognitive radio, UAV. We need more than static dataflow. We need adaptability in the software as well as in the hardware. Source: The India Economy Review
  8. 8. Homogeneous manycore: a solution? One way to get high performance per watt is parallelism. • Instead of one big core with high computation power but also high power consumption, use several "smaller" cores in parallel.
  9. 9. Homogeneous manycores: good at parallelism. Parallelisation raises computing power and lowers power consumption. Homogeneity eases programming (C-like languages + tools), but maximum performance is reached only with static applications: automatic optimisation (data parallelism), static allocation and scheduling. Otherwise => average performance, no guarantee. Examples: • Tilera Tile-Gx72, 72 cores => C/C++ (source: www.tilera.com) • Nvidia Kepler, 2000+ cores => OpenCL/CUDA (C-like + kernels) (source: www.nvidia.fr) • Kalray MPPA, 256/1024 cores => SigmaC (C++-like for dataflow) (source: http://www.kalray.eu)
  10. 10. Manycore is a main issue for the industry. • Programmability (industrial view): time to market; SW development costs; reuse of legacy code • What about manycores? Homogeneous? Heterogeneous?
  11. 11. Manycore is a main issue for the industry. • Programmability (industrial view): time to market; SW development costs; reuse of legacy code • What about manycores? Homogeneous? Heterogeneous? Why take risks with manycores? We want to continue as in the good old days: compile “without thinking” and get performance (keep it as simple as possible, for as long as possible)!
  12. 12. Manycore: no more choice, we HAVE TO jump to manycores! Problem solved...?
  13. 13. Manycore: no more choice, we HAVE TO jump to manycores! Problem solved...? WAIT!
  14. 14. Parallelisation is not enough: did we miss something? Homogeneous?
  15. 15. Challenge. PROCESSORS (GPPs), FPGA, DSP. Available architectures: already homogeneous systems. With manycores and integration, the architectures are changing...
  16. 16. Challenge. Hardware: PROCESSORS, FPGA, DSP. APPLICATIONS: computation-demanding applications. Usual way: provide as many resources as necessary to execute the application in any situation => the hardware must allow the hardest case to execute. Dynamicity => the hardest case is unknown => too costly, too heavy, too much power consumption. Sources: http://www.gamearenaph.com, http://www.vision.caltech.edu
  17. 17. Challenge. PROCESSORS, FPGA, DSP; APPLICATIONS; LIMITED BUDGET. How can we fit big applications in the hardware? How can complex applications be mapped efficiently to heterogeneous many-core architectures with a limited budget (power, performance, …)? Sources: http://www.gamearenaph.com, http://www.vision.caltech.edu, http://www.funtoosh.com, http://www.lnci.org.au
  18. 18. www.flextiles.eu. Philippe MILLET, philippe.millet@thalesgroup.com. Project coordinator: THALES. Funding budget: 3,670,000€. Starting date: 15/10/2011. Duration: 36 months (42). Research & Technology, www.thalesgroup.com, 2014/07/14.
  19. 19. Consortium and questions. 9 partners in 5 countries.
      Partners & Third Party | Country | Main scientific and technical contributions
      THALES | France | Infrastructure and applications
      KIT | Germany | Virtualisation layer
      TUE | Netherlands | Kernel; NoC
      CSEM | Switzerland | DSP
      CEA | France | NoC; 3D stacking
      UR1 | France | Reconfigurable technology
      SUNDANCE | United Kingdom | FPGA demonstrator
      ACE | Netherlands | Parallelisation and compilation tools
      RUB | Germany | Integration FPGA scheduling
  20. 20. Did I mention our FlexTiles Workshop? FlexTiles: Self-Adaptive Heterogeneous Many-Core Technology Based on Flexible Tiles. Workshop on Friday 18th in the morning (9:00 - 13:00): • 3-D Stacked Chip Technology and Strategies for Optimal Usage of Through Silicon Vias (TSV) • FlexTiles Simulating Environment Based on Open Virtual Platform (OVP) • Low-Power DSP Accelerator Embedded in a Heterogeneous Many-Core Architecture • Dynamically Reconfigurable Embedded FPGA System • FPGA-Based Emulation of FlexTiles Platform • Demonstration: OVP Simulation of the FlexTiles Platform
  21. 21. A town close to Madrid
  22. 22. Customized/customizable chips vs. FPGA. • Xilinx ZYNQ: FPGA with a dual-core ARM A9 => MPCore with reconfiguration capabilities. POOR parallelization, GOOD customization. • ST P2012, aka STHORM (heterogeneous manycore fabric; diagram: clusters and fabric controller cores). GOOD parallelization, POOR customization. Once done, it is dedicated to a specific domain of applications, and it is affordable only for large series of products. Main issue: domain dedication. The same applies to MPSoCs (TI OMAPs).
  23. 23. FlexTiles proposes a 3D stacked chip based on: • a manycore layer (GPPs, DSPs) • an FPGA layer • a 3D NoC. GOOD parallelization, GOOD customization. Customization at low price. Opportunity: self-adaptive capabilities => future application needs.
  24. 24. Self-adaptive? • Adapt the architecture to application requests in "real time" • Improve yield and extend the lifetime of sub-micron technologies • Fault tolerance • Increase energy efficiency: give the right task to the best available processor; finalize the mapping at runtime • Temperature management: re-mapping • Triplication, voting: fault / error detection • Self-repair: re-mapping that takes dead cores into account. How to program it?
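The run-time mapping, self-repair and voting ideas on this slide can be sketched in a few lines. This is only an illustration under stated assumptions, not the FlexTiles mechanism: the `Core` record, the `ENERGY` cost table, the task names and the functions `map_tasks` and `vote` are all hypothetical, and the greedy policy is just one possible choice.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Core:
    name: str
    kind: str            # "GPP", "DSP" or "eFPGA"
    alive: bool = True   # False once the core is diagnosed as dead
    busy: bool = False


# Hypothetical energy cost (arbitrary units) of a task type on each core kind.
ENERGY = {
    "fft": {"DSP": 1, "eFPGA": 2, "GPP": 5},
    "control": {"GPP": 1},
}


def map_tasks(tasks, cores):
    """Greedy run-time mapping: each task gets the cheapest free, live core."""
    for core in cores:
        core.busy = False
    mapping = {}
    for task_name, task_type in tasks:
        candidates = [c for c in cores
                      if c.alive and not c.busy and c.kind in ENERGY[task_type]]
        if not candidates:
            raise RuntimeError(f"no core available for {task_name}")
        best = min(candidates, key=lambda c: ENERGY[task_type][c.kind])
        best.busy = True
        mapping[task_name] = best.name
    return mapping


def vote(results):
    """Majority vote over replicated results (triplication)."""
    value, count = Counter(results).most_common(1)[0]
    if count <= len(results) // 2:
        raise RuntimeError("no majority among replicas")
    return value


cores = [Core("gpp0", "GPP"), Core("dsp0", "DSP"), Core("fpga0", "eFPGA")]
tasks = [("filter", "fft"), ("supervisor", "control")]

first = map_tasks(tasks, cores)    # mapping with all cores alive
cores[1].alive = False             # dsp0 fails
second = map_tasks(tasks, cores)   # re-mapping that avoids the dead core
print(first, second, vote([42, 42, 7]))
```

In this sketch the mapping is finalised only when `map_tasks` runs, so a dead core simply drops out of the candidate list on the next call.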
  25. 25. Holistic approach: Model of Execution. [Overview diagram. Goals: Programming Efficiency; Self-Adaptive Capabilities. Blocks: Model of Computation; Optimisation tools; Model of Programmation; Common Interfaces; Relocation strategies; Flexible Hardware; Model of Execution. Current topic: Model of Execution.]
  26. 26. Model of Execution. Master-slave execution model. Master nodes: GPP nodes. Slave nodes: eFPGA nodes, DSP nodes. [Diagram: a GPP node and an accelerator node, each attached to the NoC through an NI; the accelerator node sits behind an Accelerator Interface (AI) carrying acc requests, control / status, DMA requests and data.] AI: HW / SW independence regarding accelerator specificities.
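One way to picture the hardware/software independence that the Accelerator Interface is meant to give is a uniform request interface on the master side. The sketch below is hypothetical throughout: the class names, the `request` method and the `status` field are assumptions made for illustration, and the two accelerators are stand-ins that compute the same doubling function in different ways.

```python
from abc import ABC, abstractmethod


class AcceleratorInterface(ABC):
    """Hypothetical uniform interface that a master (GPP) node talks to."""

    def __init__(self):
        self.status = "idle"

    def request(self, data):
        # The master sees the same request/status protocol for every slave.
        self.status = "running"
        result = self._run(data)
        self.status = "done"
        return result

    @abstractmethod
    def _run(self, data):
        """Accelerator-specific processing, hidden from the master."""


class DspNode(AcceleratorInterface):
    def _run(self, data):
        return [2 * x for x in data]     # stand-in for a DSP kernel


class EfpgaNode(AcceleratorInterface):
    def _run(self, data):
        return [x << 1 for x in data]    # stand-in for a hardware block


def master(accelerator, samples):
    """Master-side code: identical whichever accelerator serves the request."""
    result = accelerator.request(samples)
    assert accelerator.status == "done"
    return result


print(master(DspNode(), [1, 2, 3]), master(EfpgaNode(), [1, 2, 3]))
```

Because `master` depends only on the interface, swapping a DSP node for an eFPGA node needs no change in the master code, which is the property the slide attributes to the AI.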
  27. 27. Model of Computation & Model of Programmation. [Overview diagram. Goals: Programming Efficiency; Self-Adaptive Capabilities. Blocks: Optimisation tools; Relocation strategies; Flexible Hardware; Common Interfaces; Model of Execution; Model of Computation; Model of Programmation. Current topics: Model of Computation, Model of Programmation.]
  28. 28. Model of Computation & Model of Programmation. Optimisation and parallelisation tools work on static applications: they find static clusters inside the applications, based on the SDF/CSDF MoC. Dynamicity is brought in at a higher hierarchical level. Actor: consumes and produces tokens of data with predefined and static rules (SDF, CSDF MoC). [Diagram: a cluster group with states 1, 2 and 3, selected by a states management block on an event. Legend: actor ~ task or tasks; static cluster; cluster input/output; cluster group managed by a state management; cluster group input/output.]
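Because SDF actors consume and produce fixed numbers of tokens, a tool can work out at compile time how often each actor must fire per iteration: for every edge, firings of the producer times tokens produced must equal firings of the consumer times tokens consumed (the SDF balance equations). The sketch below solves these equations for a small graph. The function name, the edge format and the actors `src`, `fir` and `sink` are illustrative assumptions, not part of the FlexTiles tools; `math.lcm` needs Python 3.9 or later.

```python
from fractions import Fraction
from math import lcm


def repetition_vector(actors, edges):
    """Smallest integer firing counts that balance every edge.

    edges: list of (producer, tokens_produced, consumer, tokens_consumed).
    """
    rate = {actors[0]: Fraction(1)}
    changed = True
    while changed:                      # propagate rates along the edges
        changed = False
        for src, prod, dst, cons in edges:
            if src in rate and dst not in rate:
                rate[dst] = rate[src] * prod / cons
                changed = True
            elif dst in rate and src not in rate:
                rate[src] = rate[dst] * cons / prod
                changed = True
    if len(rate) != len(actors):
        raise ValueError("graph is not connected")
    for src, prod, dst, cons in edges:  # check the balance equations
        if rate[src] * prod != rate[dst] * cons:
            raise ValueError("inconsistent token rates")
    scale = lcm(*(r.denominator for r in rate.values()))
    return {a: int(rate[a] * scale) for a in actors}


# src produces 2 tokens per firing, fir consumes 3; fir produces 1, sink consumes 2.
firings = repetition_vector(
    ["src", "fir", "sink"],
    [("src", 2, "fir", 3), ("fir", 1, "sink", 2)],
)
print(firings)
```

A graph whose rates cannot be balanced has no bounded-memory periodic schedule, which is why the sketch raises an error in that case rather than returning a vector.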
  29. 29. Model of Programmation. [Diagram: sensor data processed by cluster groups 1 to 5; each cluster group contains static clusters of actors (Act) and a states management block that reacts to events; scatter and gather stages; state labels shown: state 1, state 2, nop. Legend: actor; static cluster; cluster group managed by one state management; cluster group input/output; cluster input/output.]
  30. 30. Dynamicity at cluster group level. [Diagram: sensor data processed by cluster groups 1 to 5, each with a states management block that reacts to events; scatter and gather stages; state labels shown: state 1, state 1.1, state 1.2, state 2, nop. Legend: actor; static cluster; cluster group managed by one state management; cluster group input/output; cluster input/output.]
  31. 31. Start a new part of the application. [Diagram: sensor data processed by cluster groups 1 to 5, each with a states management block that reacts to events; scatter and gather stages; state labels shown: state 1, state 1.1, state 1.2, state 2 (an additional state 2 cluster appears). Legend: actor; static cluster; cluster group managed by one state management; cluster group input/output; cluster input/output.]
  32. 32. Modification of the behaviour. [Diagram: sensor data processed by cluster groups 1 to 5, each with a states management block that reacts to events; scatter and gather stages; state labels shown: state 1, state 1.1, state 1.2, state 2. Legend: actor; static cluster; cluster group managed by one state management; cluster group input/output; cluster input/output.]
  33. 33. Modification of the parallelisation level. [Diagram: sensor data processed by cluster groups 1 to 5, each with a states management block that reacts to events; scatter and gather stages; state labels shown: state 1, state 2. Legend: actor; static cluster; cluster group managed by one state management; cluster group input/output; cluster input/output.]
  34. 34. Model of Programmation. [Diagram: sensor data processed by cluster groups 1 to 5, each with a states management block that reacts to events; scatter and gather stages; state labels shown: state 1, state 1.1, state 1.2, state 2. Legend: actor; static cluster; cluster group managed by one state management; cluster group input/output; cluster input/output.]
  35. 35. Programming efficiency: Model of Computation. [Overview diagram. Goals: Programming Efficiency; Self-Adaptive Capabilities. Blocks: Relocation strategies; Model of Programmation; Flexible Hardware; Common Interfaces; Model of Execution; Model of Computation; Optimisation tools. Current topic: Optimisation tools.]
  36. 36. Programming efficiency: Model of Computation. Tool flow and MoC. [Tool-flow diagram, boxes: Application (C code); C to SpearDE representation conversion (Thales); Graphic input (manual) + C kernels; Data parallelisation, mapping (Thales); Streaming optimisation (ACE); Compilation & link (ACE); Acc compiler or C2VHDL tools (CSEM / UR1 / RUB); architecture representation; library of IPs; master cores; slave cores; binaries.]
  37. 37. Programming efficiency: Model of Computation. [Overview diagram. Goals: Programming Efficiency; Self-Adaptive Capabilities. Blocks: Relocation strategies; Model of Programmation; Flexible Hardware; Model of Execution; Model of Computation; Optimisation tools; Common Interfaces. Current topic: Common Interfaces.]
  38. 38. Modularity and scalability: common interfaces. Homogeneous GPP nodes; heterogeneous accelerator nodes. Generic interfaces. [Architecture diagram, labels: GPP nodes, DSP node, eFPGA domain (reconfigurable HW acc.), dedicated accelerator nodes, Config. Ctrl., DDR Ctrl., I/O, NoC, NI (network interfaces), AI (accelerator interfaces).]
  39. 39. Modularity and scalability: common interfaces. AI (Accelerator Interface): interprets requests from the GPP. NI (Network Interface): interfaces a node with the NoC. [Architecture diagram, labels: tiles, GPP nodes, DSP, eFPGA domain (reconfigurable HW acc.), Config. Ctrl., DDR Ctrl., I/O, NoC, NI, AI.]
  40. 40. Relocation Strategies. [Overview diagram. Goals: Programming Efficiency; Self-Adaptive Capabilities. Blocks: Model of Programmation; Flexible Hardware; Model of Execution; Model of Computation; Optimisation tools; Common Interfaces; Relocation Strategies. Current topic: Relocation Strategies.]
  41. 41. Relocation Strategies. [Timeline diagram: the same actor graph (A1.1 to A1.4, A2.1 to A2.4, A3, A4, A5) shown three times along a time axis, separated by relocations; resource labels shown: FPGA / GPP / FPGA, then DSP / GPP / DSP, then DSP / DSP / DSP.]
  42. 42. Self-adaptation. SYSTEM MONITORING; DIAGNOSIS, O = F(L); ACTION. Accelerator / virtual code; dynamic allocation / binding. [Architecture diagram, labels: GPP nodes, DSP node, eFPGA domain (reconfigurable HW acc.), dedicated accelerator nodes, Config. Ctrl., DDR Ctrl., I/O, NoC, NI, AI.]
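The monitoring, diagnosis and action steps named on this slide can be read as a control loop. The sketch below is a hypothetical reading, not the FlexTiles implementation: it uses temperature as the monitored quantity, and the tile dictionary layout, the 85 °C limit and the "move the task to the coolest idle tile" policy are all assumptions made for illustration.

```python
def monitor(tiles):
    """System monitoring: collect one temperature reading per tile."""
    return {name: tile["temp_c"] for name, tile in tiles.items()}


def diagnose(readings, limit_c=85.0):
    """Diagnosis: list the tiles whose reading exceeds the (assumed) limit."""
    return sorted(name for name, temp in readings.items() if temp > limit_c)


def act(tiles, hot_tiles):
    """Action: move each task on a hot tile to the coolest idle tile."""
    moves = []
    for hot in hot_tiles:
        task = tiles[hot]["task"]
        if task is None:
            continue
        idle = [name for name, tile in tiles.items()
                if tile["task"] is None and name not in hot_tiles]
        if not idle:
            continue                      # nowhere to go; leave the task in place
        target = min(idle, key=lambda name: tiles[name]["temp_c"])
        tiles[target]["task"], tiles[hot]["task"] = task, None
        moves.append((task, hot, target))
    return moves


tiles = {
    "tile0": {"temp_c": 91.0, "task": "fft"},
    "tile1": {"temp_c": 60.0, "task": None},
    "tile2": {"temp_c": 55.0, "task": None},
}
moves = act(tiles, diagnose(monitor(tiles)))
print(moves)
```

Keeping the three steps as separate functions mirrors the slide's split: a different diagnosis rule or action policy can be swapped in without touching the monitoring code.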
  43. 43. Flexible Hardware. [Overview diagram. Goals: Programming Efficiency; Self-Adaptive Capabilities. Blocks: Model of Programmation; Model of Execution; Model of Computation; Optimisation tools; Common Interfaces; Relocation strategies; Flexible Hardware. Current topic: Flexible Hardware.]
  44. 44. FlexTiles: a 3D stack chip. Homogeneous manycore (tiles) with a NoC, plus a 3D stacked reconfigurable layer: new dynamic reconfigurable technology. [Diagram: a grid of nine tiles under the reconfigurable layer.]
  45. 45. FlexTiles: a 3D stack chip. Homogeneous manycore (tiles) with a NoC, plus a 3D stacked reconfigurable layer: new dynamic reconfigurable technology. Map accelerated functions. [Diagram: a grid of nine tiles under the reconfigurable layer.]
  46. 46. FlexTiles: a 3D stack chip. Homogeneous manycore (tiles) with a NoC, plus a 3D stacked reconfigurable layer: new dynamic reconfigurable technology. Duplicate. [Diagram: a grid of nine tiles under the reconfigurable layer.]
  47. 47. FlexTiles: a 3D stacked chip. Homogeneous manycore (nine tiles) connected by a NoC; 3D stacked reconfigurable layer; new dynamic reconfigurable technology. Step shown: migrate
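The Map, Duplicate and Migrate steps on slides 45 to 47 can be pictured with a small model. The sketch below is illustrative only: the class `ReconfigurableLayer`, its numbered regions and its method names are assumptions made for this example and are not part of the FlexTiles toolchain or runtime.

```python
class ReconfigurableLayer:
    """Toy model of a reconfigurable layer split into numbered regions (hypothetical)."""

    def __init__(self, num_regions):
        # Each entry holds the name of the accelerated function loaded there, or None.
        self.regions = [None] * num_regions

    def _require_free(self, region):
        if self.regions[region] is not None:
            raise ValueError(f"region {region} is already in use")

    def map(self, function, region):
        """Load an accelerated function into a free region."""
        self._require_free(region)
        self.regions[region] = function

    def duplicate(self, src, dst):
        """Load a second copy of the function held in src into the free region dst."""
        if self.regions[src] is None:
            raise ValueError(f"region {src} is empty")
        self._require_free(dst)
        self.regions[dst] = self.regions[src]

    def migrate(self, src, dst):
        """Move a function: copy it to dst first, then release src."""
        self.duplicate(src, dst)
        self.regions[src] = None


layer = ReconfigurableLayer(4)
layer.map("fft", 0)      # map an accelerated function
layer.duplicate(0, 1)    # run a second instance next to it
layer.migrate(0, 3)      # relocate the first instance
print(layer.regions)
```

In this model, migrating is a duplicate followed by releasing the source region, so the function stays available during the move. Whether the real platform proceeds this way is not stated on the slides.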
  48. 48. 3D Network. Diagram labels: Programming Efficiency; Self-Adaptive Capabilities; Programming Model; Model of Execution; Model of Computation; Optimisation tools; Common Interfaces; Flexible Hardware; Relocation strategies. The flexibility of the tile is based on the capabilities of the 3D network
  49. 49. Did you say 3D?
  50. 50. NoC QoS. Diagram labels (chip): GPP; icache; dcache; iLMEM; dLMEM; NI; eFPGA; DSP; DDR; NI + DDR ctrl; on-chip shMEM. Separate NoCs: control NoC; bitstream NoC; data NoC; instruction NoC; test/debug NoC. Avoid bus contention; QoS depends on what you send through the NoC
  51. 51. ANoC (CEA). GALS: asynchronous logic in nodes, local synchronous cores; highly scalable; between nodes: no global clock, not even local clock; power efficient and dependable; packet switching; wormhole protocol; low latency
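Wormhole switching, as listed for ANoC, forwards a packet as a sequence of small flow-control units (flits): a head flit carries the routing information, body flits follow the path it opens, and a tail flit closes it. The minimal sketch below shows only this packetisation step. The function name `to_flits`, the 4-byte flit size and the tuple layout are assumptions for illustration, not the actual ANoC packet format.

```python
def to_flits(destination, payload, flit_bytes=4):
    """Split a payload into head, body and tail flits (illustrative layout)."""
    # Cut the payload into chunks of at most flit_bytes bytes.
    chunks = [payload[i:i + flit_bytes] for i in range(0, len(payload), flit_bytes)]
    # The head flit carries only the destination used for routing.
    flits = [("head", destination)]
    # Every chunk except the last travels in a body flit.
    flits += [("body", chunk) for chunk in chunks[:-1]]
    # The last chunk travels in the tail flit, which releases the path.
    flits.append(("tail", chunks[-1] if chunks else b""))
    return flits


packet = to_flits(5, b"abcdefghij")
print(packet)
```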
  52. 52. AElite NoC (TUe). Guaranteed levels of service and performance. Contention-free routing by construction: wormhole routing specified at design time. Globally synchronous with time slots
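One way to read "contention-free by construction" with time slots is as a design-time check that no two connections ever claim the same link in the same slot. The sketch below assumes a simple time-division scheme in which a flit injected in slot s reaches the h-th link of its path in slot (s + h) mod T, where T is the slot-table size. The function `find_conflicts`, the link names and the connection format are hypothetical; the slides do not describe the actual AElite allocation flow.

```python
def find_conflicts(connections, table_size):
    """Return the (link, slot) pairs claimed by more than one connection.

    connections maps a connection name to (path, slots), where path is the
    ordered list of link ids and slots is the set of injection slots.
    """
    usage = {}       # (link, slot) -> name of the connection that owns it
    conflicts = []
    for name, (path, slots) in connections.items():
        for slot in sorted(slots):
            for hop, link in enumerate(path):
                # Assumption: the reservation shifts by one slot per hop.
                key = (link, (slot + hop) % table_size)
                owner = usage.setdefault(key, name)
                if owner != name:
                    conflicts.append((key, owner, name))
    return conflicts


ok = {
    "A": (["L0", "L1"], {0}),   # uses L0 in slot 0 and L1 in slot 1
    "B": (["L2", "L1"], {1}),   # uses L2 in slot 1 and L1 in slot 2
}
bad = dict(ok, C=(["L1"], {1}))  # C also wants L1 in slot 1

print(find_conflicts(ok, 4))
print(find_conflicts(bad, 4))
```

An empty result means the schedule can be fixed at design time with no arbitration needed at run time, which is the property the slide attributes to AElite.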
  53. 53. Demonstration. On an FPGA board provided by Sundance, we demonstrate the self-adaptive capabilities of the solution. An OVP simulator is also available.
  54. 54. Demonstration: Building the HW platform
  55. 55. Demonstration: FlexTiles Development Board. Diagram labels: Virtex6 FPGA 1; Virtex6 FPGA 2; implementation of multicore; implementation of accelerators; Aurora or Ethernet
  56. 56. Demonstration: FlexTiles Board. Diagram labels: FPGA 1; FPGA 2; NoC; NI; AI; Acc; Multi GPP core; AURORA interface (×2)
  57. 57. FPGA board. Diagram labels: Tile 1; Tile 2; uBlaze (×2); DMA 2x (×2); 256 kbyte memories (×4); 8 kbyte memories (×4); NoC; Monitor; 256 kbyte Shared Memory; Debug Link; Host PC
  58. 58. Demonstration: monitoring
  59. 59. Conclusion: FlexTiles … a complete platform. Diagram labels: toolchain; operating library; heterogeneous manycore; Application; Parallelisation, partitioning; Compilation; relocatable binary code; Synthesis, P&R; relocatable bitstream; API; Operating Library; Virtualisation layer; Kernel; Resource Monitoring & Allocation; Hardware Abstraction Layer; Hardware Nodes. Loop: MONITORING; DIAGNOSIS; O = F(L); ACTION; SYSTEM
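The MONITORING, DIAGNOSIS, ACTION loop on this slide can be sketched as a small control step. Everything below is an assumption made for illustration: the load thresholds, the function names `diagnose` and `plan_actions`, and the choice of "migrate" as the only action. It does not reproduce the FlexTiles kernel, and the slide's "O = F(L)" is left uninterpreted.

```python
def diagnose(loads, high=0.8, low=0.2):
    """Classify monitored tiles as overloaded or idle (thresholds are arbitrary)."""
    # Busiest overloaded tiles first.
    overloaded = sorted((t for t, load in loads.items() if load > high),
                        key=lambda t: -loads[t])
    # Least loaded idle tiles first.
    idle = sorted((t for t, load in loads.items() if load < low),
                  key=lambda t: loads[t])
    return overloaded, idle


def plan_actions(loads, high=0.8, low=0.2):
    """Pair each overloaded tile with an idle one and propose a migration."""
    overloaded, idle = diagnose(loads, high, low)
    # zip stops at the shorter list, so spare overloaded tiles get no action.
    return [("migrate", src, dst) for src, dst in zip(overloaded, idle)]


# Monitoring step: a snapshot of per-tile load in the range 0.0 to 1.0.
loads = {"tile0": 0.95, "tile1": 0.10, "tile2": 0.50, "tile3": 0.85}
actions = plan_actions(loads)
print(actions)
```

With this snapshot only one tile is idle, so a single migration is proposed, from the busiest tile to the idle one.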
  60. 60. Conclusion. Parallelisation is the only way to reach HPC at low power consumption. But parallelism is not enough; customisation is also necessary, and it is only affordable for high volumes. Reconfigurable customisation is the solution: it increases accessibility to heterogeneous manycore technology and offers self-adaptive capabilities
  61. 61. Come visit us next Friday morning. FlexTiles: Self-Adaptive Heterogeneous Many-Core Technology Based on Flexible Tiles Workshop, on Friday 18th in the morning (9:00 - 13:00)
