The document discusses the problem of heterogeneous multicore processor (HMP) composition and configuration selection. Emerging HMPs will contain diverse core types, memories, and accelerators, leading to a very large design space of possible HMP configurations. The goal is to explore this design space and select the optimal configuration for objectives such as maximizing performance or energy efficiency, subject to constraints such as area. This is a challenging optimization problem because of the sheer number of combinations of core types and core counts. The paper proposes a cross-layer approach to tackle this problem.
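At its core, the configuration-selection problem described above is a constrained combinatorial search: pick core counts per type to maximize an objective under an area budget. A minimal brute-force sketch follows; the core types, per-core areas, and performance scores are hypothetical placeholders, not values from the paper:

```python
from itertools import product

# Hypothetical core types: (name, area per core, performance score per core)
CORE_TYPES = [("big", 4.0, 3.0), ("little", 1.0, 1.0), ("gpu", 6.0, 5.0)]
AREA_BUDGET = 16.0
MAX_CORES_PER_TYPE = 4

def best_configuration():
    """Exhaustively score every core-count mix that fits the area budget."""
    best = None
    for counts in product(range(MAX_CORES_PER_TYPE + 1), repeat=len(CORE_TYPES)):
        area = sum(n * a for n, (_, a, _) in zip(counts, CORE_TYPES))
        if area > AREA_BUDGET:
            continue  # violates the area constraint
        perf = sum(n * p for n, (_, _, p) in zip(counts, CORE_TYPES))
        if best is None or perf > best[0]:
            best = (perf, counts)
    return best

print(best_configuration())
```

Even this toy instance enumerates 125 configurations; real HMP design spaces grow far too fast for exhaustive search, which is what motivates smarter exploration strategies.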
Shift Remote: DevOps: Gitlab ci hands-on experience - Ivan Rimac (Barrage) - Shift Conference
DevOps tooling and practices are changing every day. Nowadays you can standardize and automate your infrastructure, application delivery, and policies as code. You’ll be ready to adapt quickly—helping your team do their best work faster while staying competitive. GitLab CI is a modern tool that can help you manage, package, configure, and do much more with your apps. You can get your infrastructure to play very nicely with it. It is designed to improve software development productivity. Topics we will be covering in this talk are pipeline configuration, DAG, components, controls, and job configuration.
Cross-Layer Frameworks for Constrained Power and Resources Management of Embe... - Patrick Bellasi
Power and resource management are key goals for the success of modern battery-supplied multimedia devices. Devices of this kind are usually based on SoCs with a wide range of subsystems that compete for shared resources and offer several power-saving capabilities, but they need adequate software support to exploit those capabilities.
This presentation introduces Constrained Power Management (CPM), a cross-layer formal model and framework for power and resource management, targeted to MPSoC-based devices. CPM allows coordination and communication, among applications and device drivers, to reduce energy consumption without compromising QoS. A dynamic and multi-objective optimization strategy is supported, which has been designed to have a negligible overhead on the development process and at run-time.
Speaker: Chris Du Preez
Host: Angel Alberici
Youtube: Virtual Muleys (https://www.youtube.com/c/VirtualMuleysOnline/videos)
Meetups: https://meetups.mulesoft.com/events/details/mulesoft-online-group-english-presents-runtime-fabric-rtf-foundations/
Runtime Fabric Foundations. Tune in this time to get a full overview of RTF: architecture, learning paths, tips, how to avoid pitfalls, and more. Time to learn. Chris Du Preez will be guiding us through this 50-minute session!
Anypoint Runtime Fabric is a container service that automates the deployment and orchestration of Mule applications and API gateways. Runtime Fabric runs within a customer-managed infrastructure on AWS, Azure, virtual machines (VMs), and bare-metal servers. (Find out more: https://docs.mulesoft.com/runtime-fabric/1.7/)
How to optimize Hortonworks Apache Spark ML workloads on Power - The POWER8/9 architecture is the latest offering from IBM and the OpenPOWER Foundation. It is the perfect platform for optimizing Hortonworks Spark's performance. During this presentation we will walk the audience through the steps required to optimize YARN, HDFS, and Spark on a Power cluster.
Steps required:
1) Classify the workload as CPU-, memory-, IO-, or mixed-intensive
2) Characterize the "out-of-box" Hortonworks Spark workload to understand its CPU, memory, IO, and network performance characteristics
3) Floor-plan cluster resources
4) Tune the "out-of-box" workload to navigate the "Roofline" performance space along the dimensions named above
5) If the workload is memory-, IO-, or network-bound, tune Spark to increase operational intensity (operations/byte) as much as possible to make it CPU-bound
6) Divide the search space into regions and perform an exhaustive search
7) Identify performance bottlenecks through resource monitoring, and tune the system, JVM, or application layer by profiling the application and hardware counters if required
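Step 5 above hinges on operational intensity relative to the machine's roofline. A small sketch of that classification follows; the peak FLOP/s and bandwidth figures are hypothetical, not measurements of any Power system:

```python
# Roofline-style classification of a workload (all numbers are hypothetical).
PEAK_FLOPS = 500e9        # peak compute throughput, FLOP/s
PEAK_BANDWIDTH = 100e9    # peak memory bandwidth, bytes/s

def classify(flops, bytes_moved):
    """Return the bound regime from operational intensity (FLOP/byte)."""
    intensity = flops / bytes_moved
    ridge = PEAK_FLOPS / PEAK_BANDWIDTH  # intensity where the two roofs meet
    return "CPU bound" if intensity >= ridge else "memory bound"

# A stage doing 2e12 FLOPs while moving 1e12 bytes has intensity 2,
# below the ridge point of 5, so it is memory bound.
print(classify(2e12, 1e12))
```

Tuning Spark to reuse data in memory (e.g. caching hot RDDs/DataFrames) raises FLOPs per byte moved, pushing the workload toward the CPU-bound region, which is the intent of step 5.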
PyMADlib - A Python wrapper for MADlib : in-database, parallel, machine learn... - Srivatsan Ramanujam
These are slides from my talk @ DataDay Texas, in Austin on 30 Mar 2013
(http://2013.datadaytexas.com/schedule)
Favorite and Fork PyMADlib on GitHub: https://github.com/gopivotal/pymadlib
MADlib: http://madlib.net
Speakers:
Jeff Chu (Director of Enterprise Solutions, ARM)
Kan Yan Rong (Technical Expert in Storage and Application Technology, WDC/SanDisk)
Summary:
Jeff from ARM will provide a brief update on the activities furthering Ceph on ARM, including some recent progress from ARM as well as some increased community activity. After that, Chris and Yan from Western Digital/SanDisk will present the topic of Ceph block performance on Cavium ARM and SATA SSDs.
Is Multicore Hardware For General-Purpose Parallel Processing Broken? : Notes - Subhajit Sahu
Highlighted notes of article while studying Concurrent Data Structures, CSE:
Is Multicore Hardware For General-Purpose Parallel Processing Broken?
By Uzi Vishkin
Communications of the ACM, April 2014, Vol. 57 No. 4, Pages 35-39
DOI: 10.1145/2580945
Massively Parallel RISC-V Processing with Transactional Memory - Netronome
In this talk, we discuss some of the background, and describe the example of a thousand RISC-V harts performing the processing required in a SmartNIC. We show how a RISC-V solution can be tailored with a suitable choice of instruction set features, privilege modes and debug methodology.
OS for AI: Elastic Microservices & the Next Gen of ML - Nordic APIs
AI has been a hot topic lately, and while advances are constantly being made in what is possible, there has not been as much discussion of the infrastructure and scaling challenges that come with it. How do you support dozens of different languages and frameworks, and make them interoperate invisibly? How do you scale to run abstract code from thousands of different developers, simultaneously and elastically, while maintaining less than 15ms of overhead?
At Algorithmia, we’ve built, deployed, and scaled thousands of algorithms and machine learning models, using every kind of framework (from scikit-learn to tensorflow). We’ve seen many of the challenges faced in this area, and in this talk I’ll share some insights into the problems you’re likely to face, and how to approach solving them.
In brief, we’ll examine the need for, and implementations of, a complete “Operating System for AI” – a common interface for different algorithms to be used and combined, and a general architecture for serverless machine learning which is discoverable, versioned, scalable and sharable.
New Business Applications Powered by In-Memory Technology @MIT Forum for Supp... - Paul Hofmann
This talk covers four leading-edge projects:
1) Optimal pricing for energy management, online pricing, and truck scheduling @Princeton University
2) Infinite DRAM - RAMCloud @Stanford University
Applications: Extremely low latency and very high bandwidth
a) Facebook-like problems with high read AND write rates,
b) advanced analytics, c) what-if scenarios for demand planning
3) Hybrid In-Memory Store @MIT CSAIL
4) Multithreading Real Time Event Platform @MIT Auto-ID Lab
500k events/s and millions of threads, in-memory or distributed, used for automatic meter reading, online billing, mobile billing, and Smart Grid
Performance of State-of-the-Art Cryptography on ARM-based Microprocessors - Hannes Tschofenig
Position paper for the NIST Lightweight Cryptography Workshop, 20th and 21st July 2015, Gaithersburg, US.
The link to the workshop is available at: http://www.nist.gov/itl/csd/ct/lwc_workshop2015.cfm
FUSION APU & TRENDS/ CHALLENGES IN FUTURE SoC DESIGN
1. VLSI Design & Embedded Systems Conference, January 2015, Bengaluru, India
Cross-Layer Exploration of Heterogeneous Multicore Processor Configurations
Santanu Sarma and N. Dutt
3. Examples of Existing HMPs
Trend towards heterogeneous multicore processors with different core specialization
Examples: ARM big.LITTLE, NVIDIA Tegra, and AMD GPGPU