Context-oriented programming is proposed as a solution for adapting cyber-physical systems to continuously changing environments. It represents environmental situations as contexts that define context-dependent functionality through layered functions. This allows decoupling functionality so that implementations are simpler. The approach is made language-independent through design concepts and programming support like ConesC. Evaluation shows it reduces complexity and coupling while incurring negligible overhead, enabling easier maintenance and verification against environmental evolutions. It has been applied to wildlife tracking and drone coordination.
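The layered-function idea described above can be sketched in plain Python (a hypothetical illustration only; ConesC itself is a dialect of nesC, and every class name below is invented):

```python
# Hypothetical sketch of context-oriented layering. A context group holds
# several contexts (environmental situations); a layered function call is
# dispatched to whichever context is currently active.
class Context:
    """A single environmental situation providing a layered implementation."""
    def read_position(self):
        raise NotImplementedError

class GpsAvailable(Context):
    def read_position(self):
        return "gps-fix"          # behavior while a GPS fix is available

class GpsLost(Context):
    def read_position(self):
        return "dead-reckoning"   # fallback behavior when the situation changes

class ContextGroup:
    """Collection of situations sharing the same characteristics;
    exactly one context is active at a time."""
    def __init__(self, initial):
        self.active = initial

    def transition(self, context):
        self.active = context

    def read_position(self):      # layered function call: dispatch to active layer
        return self.active.read_position()

tracker = ContextGroup(GpsAvailable())
assert tracker.read_position() == "gps-fix"
tracker.transition(GpsLost())
assert tracker.read_position() == "dead-reckoning"
```

The point of the decoupling is visible here: each context implements only its own behavior, and callers never branch on the environment themselves.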
Low Energy Task Scheduling based on Work Stealing - LEGATO project
Abstract: Optimizing the energy efficiency of parallel execution on computing systems, ranging from server farms and mobile devices to embedded systems, is increasingly a first-order concern. A common way to express a parallel application is as a directed acyclic graph (DAG) in which each node represents a task. The task scheduling problem on multiprocessor systems is to find the proper processors on which to execute each task. This matters especially for today's asymmetric multiprocessor systems, which feature different types of cores with different performance and power consumption, e.g. Arm big.LITTLE and Intel Lakefield. Naive task assignment that ignores core types and task features can result in inefficient resource utilization and detrimentally impact overall energy consumption. Dynamic task scheduling is a widely used strategy that requires no prior knowledge (e.g. architecture heterogeneity or task DAG structure) before execution but makes decisions at runtime. Work stealing has proven to be an effective dynamic scheduling method with better scalability on larger systems. DVFS is a common technique for improving energy efficiency; however, exploiting it incurs reconfiguration overhead ranging from tens of microseconds to one millisecond. With fine-grained tasks as small as milliseconds, as required to expose large parallelism, it is not realistic to apply DVFS at a per-task level. Moreover, the energy consumed during cores' under-utilized periods is significant.
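The work-stealing strategy mentioned in the abstract can be illustrated with a minimal, single-threaded sketch (names invented; a real runtime such as XiTAO uses concurrent lock-free deques rather than plain lists):

```python
# Minimal work-stealing sketch: each worker pushes and pops at the "bottom"
# of its own deque; an idle worker steals from the "top" of a victim's deque.
# This shows the scheduling idea only; real runtimes need synchronization.
from collections import deque

class Worker:
    def __init__(self):
        self.tasks = deque()

    def push(self, task):
        self.tasks.append(task)                 # owner works at the bottom

    def pop(self):
        return self.tasks.pop() if self.tasks else None

    def steal_from(self, victim):
        # thief takes the oldest task from the top of the victim's deque
        return victim.tasks.popleft() if victim.tasks else None

w0, w1 = Worker(), Worker()
for t in ["t1", "t2", "t3"]:
    w0.push(t)
assert w0.pop() == "t3"            # owner pops its most recently pushed task
assert w1.steal_from(w0) == "t1"   # idle worker steals the oldest task
```

Owner and thief operating on opposite ends of the deque is what keeps stealing cheap and preserves locality for the owner.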
Based on these problem statements, we propose a low-energy work-stealing task scheduling runtime based on XiTAO, in which the system environment configurations are either fixed or managed by the OS power governors or system administrators. The runtime contains a dynamic performance tracing module, an idleness tracing module, a power profiling module, and a task mapping algorithm. The dynamic performance model gives accurate predictions for future tasks given a set of resources; it is independent of platforms and frequencies and achieves scalability and portability. Power profiling helps the runtime understand CPU power consumption trends with respect to the number and type of cores and their frequencies. Idleness tracing presents the real-time status of cores and contributes to energy conservation during under-utilized periods. It also provides the real-time parallel slackness of active cores, which allows the task mapping algorithm to attribute the corresponding power consumption to each concurrently running task. The task mapping algorithm integrates the information from these three modules and outputs the predicted best resource placements for ready tasks.
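The task mapping idea, choosing the resource placement that minimizes predicted energy from the performance and power models, might look like the following toy sketch (all numbers and names are invented for illustration and are not taken from XiTAO):

```python
# Illustrative sketch, not the actual XiTAO implementation: for a ready task,
# pick the resource configuration minimizing predicted energy, where
# energy = predicted_power * predicted_runtime.
def map_task(perf_model, power_model, configs):
    """perf_model[c] -> predicted runtime on config c (seconds);
    power_model[c] -> predicted power draw of config c (watts)."""
    return min(configs, key=lambda c: power_model[c] * perf_model[c])

# Toy asymmetric platform: one 'big' and one 'little' core cluster.
runtime = {"big": 0.010, "little": 0.025}   # seconds per task
power   = {"big": 2.0,   "little": 0.6}     # watts while busy

best = map_task(runtime, power, ["big", "little"])
# little: 0.6 * 0.025 = 0.015 J  <  big: 2.0 * 0.010 = 0.020 J
assert best == "little"
```

With these invented numbers the slower core wins on energy, which is exactly the kind of trade-off a naive type-blind assignment would miss.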
Poster presented by Jing Chen at the LEGaTO Final Event: 'Low-Energy Heterogeneous Computing Workshop'
C-SAW: A Framework for Graph Sampling and Random Walk on GPUs - Pandey_G
Presentation for the paper C-SAW: A Framework for Graph Sampling and Random Walk on GPUs published in SC20.
Paper link: https://arxiv.org/pdf/2009.09103.pdf
Towards Exascale Simulations for Regional-Scale Earthquake Hazard and Risk - inside-BigData.com
In this deck from the HPC User Forum in Tucson, David McCallen from LBNL presents: Towards Exascale Simulations for Regional-Scale Earthquake Hazard and Risk.
"With the major advances occurring in high performance computing, the ability to accurately simulate the complex processes associated with major earthquakes is becoming a reality. High performance simulations offer a transformational approach to earthquake hazard and risk assessments that can dramatically increase our understanding of earthquake processes and provide improved estimates of the ground motions that can be expected in future earthquakes. This work will bring together a multidisciplinary team of earth scientists and earthquake engineers from the DOE national laboratory complex to develop advanced computational tools that will take full advantages of emerging, cutting-edge DOE computational platforms."
Watch the video: https://wp.me/p3RLHQ-ioE
Learn more: https://www.exascaleproject.org/advanced-simulations-for-earthquake-risk-assessment/
and
http://hpcuserforum.com
Artificial Neural Networks for Storm Surge Prediction in North Carolina - Anton Bezuglov
Feedforward Artificial Neural Network (FF ANN) for storm surge prediction in North Carolina. Presentation at the Coastal Resilience Center by Anton Bezuglov, Ph.D. Uses TensorFlow and Python, with links to the code on GitHub.
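As a rough illustration of what a feedforward pass computes (the talk itself uses TensorFlow; the layer sizes, weights, and input features below are invented):

```python
# Toy feedforward network in plain Python: two input features pass through one
# hidden tanh layer into a single linear output. Weights are made up and
# untrained; this only shows the structure of an FF ANN forward pass.
import math

def dense(x, W, b, act):
    """One fully connected layer: act(W @ x + b), computed row by row."""
    return [act(sum(wi * xi for wi, xi in zip(row, x)) + bi)
            for row, bi in zip(W, b)]

def forward(x):
    # 2 inputs (say, wind speed and pressure) -> 2 hidden units -> 1 output
    h = dense(x, W=[[0.5, -0.3], [0.8, 0.1]], b=[0.0, 0.1], act=math.tanh)
    y = dense(h, W=[[1.2, -0.7]], b=[0.05], act=lambda z: z)
    return y[0]                    # predicted surge level (toy units)

surge = forward([1.0, 0.5])
assert isinstance(surge, float)
```

In practice the weights are learned from historical storm data; the forward computation itself is the same.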
Create a Thermal Camera With Python On a Raspberry Pi - NUS-ISS
Guided by Mr Kenneth Pang, Senior Lecturer & Consultant of NUS-ISS' Software Systems Practice, learn how to use Python and MicroPython through this interesting workshop to build a thermal sensor that will be activated on pre-trained faces.
Pysense: wireless sensor computing in Python? - Davide Carboni
PySense aims at bringing wireless sensor (and "internet of things") macroprogramming to the audience of Python programmers. WSN macroprogramming is an emerging approach in which the network is seen as a whole and the programmer focuses only on the application logic. The PySense runtime environment partitions the code and transmits code snippets to the right nodes, finding a balance between energy consumption and computing performance.
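The node-versus-sink placement trade-off described above can be illustrated with a hypothetical cost model (the PySense API is not shown in this abstract; every name and energy constant below is invented):

```python
# Hypothetical placement decision in the spirit of the PySense description:
# running a snippet on the sensor node costs CPU energy but shrinks the data
# to transmit; shipping raw samples to the sink costs radio energy per byte.
def place_snippet(samples, bytes_per_sample, reduced_bytes,
                  cpu_nj_per_sample=50, radio_nj_per_byte=200):
    """Return 'node' if filtering locally and sending the reduced result
    is cheaper (in nanojoules) than transmitting all raw samples."""
    on_node = samples * cpu_nj_per_sample + reduced_bytes * radio_nj_per_byte
    on_sink = samples * bytes_per_sample * radio_nj_per_byte
    return "node" if on_node < on_sink else "sink"

# 100 samples of 4 bytes reduced to one 4-byte average: compute locally.
assert place_snippet(100, 4, 4) == "node"
# node = 100*50 + 4*200 = 5800 nJ  <  sink = 100*4*200 = 80000 nJ
```

A macroprogramming runtime can make this kind of decision per snippet, which is what "finding a balance between energy consumption and computing performance" amounts to.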
Updates on the Fake Object Pipeline for HSC Survey - Song Huang
This is the presentation on the development of SynPipe from Feb 2016. SynPipe is a synthetic object pipeline to test the photometric performance of the HSC survey. SynPipe can be found here: https://github.com/dr-guangtou/synpipe
AES encryption on modern consumer architectures - Grigore Lupescu
Specialized cryptographic processors target professional applications and offer both low latency and high throughput at the expense of cost. At the consumer level, a modern SoC embodies several accelerators and vector extensions (e.g. SSE, AES-NI), with a high degree of programmability through multiple APIs (OpenMP, OpenCL, etc). This work explains how a modern x86 system that encompasses several compute architectures (MIMD/SIMD) might perform well compared to a specialized cryptographic unit at a fraction of the cost. The analyzed algorithm is AES (AES-128, AES-256) and the mode of operation is ECB. The initial test system is built around the SoC AMD A6 5400K (CPU + integrated GPU), coupled with a discrete GPU, an AMD R7 250. Benchmark results compare CPU OpenSSL execution (no AES-NI), CPU AES-NI acceleration, the integrated GPU, the discrete GPU, and heterogeneous combinations of these processing units. Multiple test results are presented and inconsistencies are explained. Finally, based on the initial results, a system composed only of low-end, low-power consumer components is designed, built and tested.
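The ECB mode analyzed in this work encrypts each 16-byte block independently, which is precisely what makes it easy to spread across CPU and GPU lanes. A toy sketch of that structure follows, with a byte-wise XOR standing in for the real AES rounds; it is not cryptographically meaningful:

```python
# ECB structure sketch: cut the message into 16-byte blocks and encrypt each
# block independently of the others. Because no block depends on its
# neighbors, all blocks can be processed in parallel (the property the
# CPU/GPU benchmarks in this work exploit).
BLOCK = 16

def toy_block_encrypt(block, key):
    # Stand-in for the AES round function; NOT real encryption.
    return bytes(b ^ k for b, k in zip(block, key))

def ecb_encrypt(plaintext, key):
    assert len(plaintext) % BLOCK == 0 and len(key) == BLOCK
    return b"".join(
        toy_block_encrypt(plaintext[i:i + BLOCK], key)
        for i in range(0, len(plaintext), BLOCK)
    )

key = bytes(range(16))
msg = b"A" * 16 + b"A" * 16          # two identical plaintext blocks
ct = ecb_encrypt(msg, key)
# Identical plaintext blocks yield identical ciphertext blocks: the classic
# ECB weakness, and the flip side of its easy parallelism.
assert ct[:16] == ct[16:32]
```

The same block independence is why ECB is a natural first target for OpenCL offload, even though it is rarely appropriate for production encryption.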
State of the art time-series analysis with deep learning by Javier Ordóñez at... - Big Data Spain
Time-series-related problems have traditionally been solved using engineered features obtained through heuristic processes.
https://www.bigdataspain.org/2017/talk/state-of-the-art-time-series-analysis-with-deep-learning
Big Data Spain 2017
November 16th - 17th
Acceleration of the Longwave Rapid Radiative Transfer Module using GPGPU - Mahesh Khadatare
This poster presents the Weather Research and Forecasting (WRF) model, a next-generation mesoscale numerical weather prediction system designed to serve both operational forecasting and atmospheric research communities. WRF offers multiple physics options, one of which is the Long-Wave Rapid Radiative Transfer Model (RRTM). Even with the advent of large-scale parallelism in weather models, much of the performance increase has come from increasing processor speed rather than increased parallelism. We present an alternative method of scaling model performance by exploiting emerging architectures like GPGPUs using fine-grained parallelism. We obtain a performance gain of more than 23.71x by using asynchronous data transfers, texture memory, and techniques such as loop unrolling.
Reading: "Pi in the sky: Calculating a record-breaking 31.4 trillion digits o... - Kento Aoyama
(Journal Club at AIS Lab. on April 22, 2019)
Reading: “Pi in the sky: Calculating a record-breaking 31.4 trillion digits of Archimedes’ constant on Google Cloud”
We live in an era where the atomic building elements of silicon computers, e.g. transistors and wires, are no longer visible under traditional optical microscopes and their sizes are measured in just tens of angstroms. In addition, power dissipation per unit volume is bounded by the laws of physics, which has resulted, among other things, in stagnating processor clock frequencies. Adding more and more processor cores that perform simpler and simpler tasks, in an attempt to efficiently fill the available on-chip area, seems to be the current trend taken by industry.
The increasing demand for computing power in fields such as biology, finance and machine learning is pushing the adoption of reconfigurable hardware in order to keep up with the required performance levels at a sustainable power consumption. Within this context, FPGA devices represent an interesting solution as they combine the benefits of power efficiency, performance and flexibility. Nevertheless, the steep learning curve and the experience needed to develop efficient FPGA-based systems represent one of the main limiting factors for broad utilization of such devices.
In this talk, we present CAOS, a framework which helps the application designer in identifying acceleration opportunities and guides through the implementation of the final FPGA-based system. The CAOS platform targets the full stack of the application optimization process, starting from the identification of the kernel functions to accelerate, to the optimization of such kernels and to the generation of the runtime management and the configuration files needed to program the FPGA.
Cassandra is the dominant data store used at Netflix and its health is critical to many of its services. In this talk we will share details of the recent redesign of our health monitoring system and how we leveraged a reactive stream processing system to give us a real-time view of our entire fleet while dramatically improving accuracy and reducing false alarms in our alerting.
About the Speaker
Jason Cacciatore Senior Software Engineer, Netflix
Jason Cacciatore is a Senior Software Engineer at Netflix, where he's been working for the past several years. He's interested in stateful distributed systems and has a diverse background in technology. In his spare time he enjoys spending time with his wife and two sons, reading non-fiction, and watching Netflix documentaries.
Slides from the Linux Conference Australia 2021 conference https://linux.conf.au/schedule/presentation/64/ .
Tempesta TLS is an implementation of TLS handshakes for the Linux kernel. Since the kernel already provides symmetric ciphers, we focus on asymmetric cryptography only, elliptic curves in particular.
We used the mbed TLS library as the foundation and almost fully rewrote it to make it 40x faster. During development we also used parts of the WolfSSL library. While WolfSSL outperforms OpenSSL, it uses the same algorithms, which are 5-7 years old. Tempesta TLS uses newer and more efficient algorithms from modern cryptography research.
While we are still improving the performance of Tempesta TLS, the implementation already establishes 40-80% more TLS handshakes per second than OpenSSL/Nginx and provides up to 4x lower latency in several tests.
This talk covers the following topics, with plenty of benchmarks:
* The fundamentals of elliptic curve computations and the most "hot spots"
* Side channel attacks (SCA) and methods to prevent them
* How the recent CPU vulnerabilities impact TLS handshakes
* Basics of the new fast algorithms used in the Tempesta TLS
* The design trade-offs in OpenSSL, WolfSSL, mbed TLS, and Tempesta TLS
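The "fundamentals of elliptic curve computations" in the first bullet boil down to scalar point multiplication. Below is a deliberately simple, non-constant-time double-and-add sketch over a tiny invented curve; real TLS implementations use curves such as P-256 or x25519 with side-channel-hardened code:

```python
# Educational double-and-add scalar multiplication over the tiny curve
# y^2 = x^3 + 2x + 3 (mod 97). The branch on each key bit is exactly the
# kind of data-dependent behavior that side-channel attacks exploit, which
# is why production code must be constant-time.
P_MOD, A = 97, 2

def ec_add(P, Q):
    """Add two points in affine coordinates; None is the point at infinity."""
    if P is None: return Q
    if Q is None: return P
    (x1, y1), (x2, y2) = P, Q
    if x1 == x2 and (y1 + y2) % P_MOD == 0:
        return None                       # P + (-P) = infinity
    if P == Q:
        lam = (3 * x1 * x1 + A) * pow(2 * y1, -1, P_MOD) % P_MOD
    else:
        lam = (y2 - y1) * pow(x2 - x1, -1, P_MOD) % P_MOD
    x3 = (lam * lam - x1 - x2) % P_MOD
    return (x3, (lam * (x1 - x3) - y1) % P_MOD)

def scalar_mul(k, P):
    """Compute k*P: one doubling per bit, one addition per set bit of k."""
    R = None
    while k:
        if k & 1:
            R = ec_add(R, P)
        P = ec_add(P, P)
        k >>= 1
    return R

G = (3, 6)                                # a point on the curve
assert scalar_mul(2, G) == ec_add(G, G)
assert scalar_mul(3, G) == ec_add(G, ec_add(G, G))
```

Almost all handshake time goes into these field multiplications and inversions, which is why they are the "hot spots" the talk refers to.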
Slides from a presentation by Monal Daxini at Disney, Glendale CA about Netflix Open Source Software, Cloud Data Persistence, and Cassandra best practices
This presentation describes an intelligent IT monitoring solution that uses Nagios as the source of information, Esper as the CEP engine, and a PCA algorithm.
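The PCA step can be illustrated with a pure-Python toy: find the dominant principal component and score each observation by its residual distance from that one-dimensional subspace (the data below are invented; the talk's actual pipeline feeds Nagios events through Esper):

```python
# Toy PCA-based anomaly scoring. We estimate the top principal component with
# power iteration on the (uncentered-covariance) operator X^T X, then flag
# points whose residual from the principal subspace is large.
def pca_anomaly_scores(data, iters=100):
    n, d = len(data), len(data[0])
    mean = [sum(row[j] for row in data) / n for j in range(d)]
    X = [[row[j] - mean[j] for j in range(d)] for row in data]
    v = [1.0] * d
    for _ in range(iters):
        # (X^T X) v computed as X^T (X v), avoiding the d x d matrix
        Xv = [sum(X[i][j] * v[j] for j in range(d)) for i in range(n)]
        w = [sum(X[i][j] * Xv[i] for i in range(n)) for j in range(d)]
        norm = sum(c * c for c in w) ** 0.5
        v = [c / norm for c in w]
    scores = []
    for row in X:
        proj = sum(row[j] * v[j] for j in range(d))
        resid = sum((row[j] - proj * v[j]) ** 2 for j in range(d))
        scores.append(resid ** 0.5)      # distance from the principal axis
    return scores

# Points near the line y = x, plus one off-axis outlier at index 4.
data = [(0, 0), (1, 1), (2, 2), (3, 3), (4, 0)]
scores = pca_anomaly_scores(data)
assert max(range(len(scores)), key=lambda i: scores[i]) == 4
```

In a monitoring setting each row would be a vector of host metrics, and a large residual would signal behavior that breaks the usual correlation structure.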
High Performance & High Throughput Computing - EUDAT Summer School (Giuseppe ... - EUDAT
Giuseppe will present the differences between high-performance and high-throughput applications. High-throughput computing (HTC) refers to computations where individual tasks do not need to interact while running. It differs from high-performance computing (HPC), where frequent and rapid exchanges of intermediate results are required to perform the computations. HPC codes are based on tightly coupled MPI, OpenMP, GPGPU, and hybrid programs and require low-latency interconnected nodes. HTC makes use of unreliable components, distributing the work out to every node and collecting results at the end of all parallel tasks.
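The HTC pattern, independent tasks with no communication until results are collected, can be sketched in a few lines (local threads stand in for the distributed, unreliable nodes an HTC system would actually farm work out to):

```python
# HTC-style sketch: tasks never interact while running; results are only
# gathered at the end. Contrast with HPC, where tasks would exchange
# intermediate results (e.g. via MPI) throughout execution.
from concurrent.futures import ThreadPoolExecutor

def simulate(seed):
    # Stand-in for an independent job (a Knuth-style multiplicative hash
    # plays the role of a computed "result"); no shared state, no messages.
    return (seed * 2654435761) % 2**32

with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(simulate, range(8)))   # collect at the end

assert len(results) == 8
assert results[0] == 0
```

Because the tasks are independent, a failed node costs only a re-run of its own tasks, which is why HTC tolerates unreliable components so well.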
Visit: https://www.eudat.eu/eudat-summer-school
Natural Language Processing with CNTK and Apache Spark with Ali Zaidi - Databricks
Apache Spark provides an elegant API for developing machine learning pipelines that can be deployed seamlessly in production. However, one of the most intriguing and performant family of algorithms – deep learning – remains difficult for many groups to deploy in production, both because of the need for tremendous compute resources and also because of the inherent difficulty in tuning and configuring.
In this session, you'll discover how to deploy the Microsoft Cognitive Toolkit (CNTK) inside of Spark clusters on the Azure cloud platform. Learn about the key considerations for administering GPU-enabled Spark clusters, configuring such workloads for maximum performance, and techniques for distributed hyperparameter optimization. You'll also see a real-world example of training distributed deep learning algorithms for speech recognition and natural language processing.
Sharpen existing tools or get a new toolbox? Contemporary cluster initiatives... - Orkestra
UIIN Conference, Madrid, 27-29 May 2024
James Wilson, Orkestra and Deusto Business School
Emily Wise, Lund University
Madeline Smith, The Glasgow School of Art
Acorn Recovery: Restore IT infra within minutes - IP ServerOne
Introducing Acorn Recovery as a Service, a simple, fast, and secure managed disaster recovery (DRaaS) by IP ServerOne. A DR solution that helps restore your IT infra within minutes.
This presentation, created by Syed Faiz ul Hassan, explores the profound influence of media on public perception and behavior. It delves into the evolution of media from oral traditions to modern digital and social media platforms. Key topics include the role of media in information propagation, socialization, crisis awareness, globalization, and education. The presentation also examines media influence through agenda setting, propaganda, and manipulative techniques used by advertisers and marketers. Furthermore, it highlights the impact of surveillance enabled by media technologies on personal behavior and preferences. Through this comprehensive overview, the presentation aims to shed light on how media shapes collective consciousness and public opinion.
0x01 - Newton's Third Law: Static vs. Dynamic Abusers - OWASP Beja
If you offer a service on the web, odds are that someone will abuse it. Be it an API, a SaaS, a PaaS, or even a static website, someone somewhere will try to figure out a way to use it to their own ends. In this talk we'll compare measures that are effective against static attackers and how to battle a dynamic attacker who adapts to your counter-measures.
About the Speaker
===============
Diogo Sousa, Engineering Manager @ Canonical
An opinionated individual with an interest in cryptography and its intersection with secure software development.
Have you ever wondered how search works while visiting an e-commerce site, internal website, or searching through other types of online resources? Look no further than this informative session on the ways that taxonomies help end-users navigate the internet! Hear from taxonomists and other information professionals who have first-hand experience creating and working with taxonomies that aid in navigation, search, and discovery across a range of disciplines.
This presentation by Morris Kleiner (University of Minnesota), was made during the discussion “Competition and Regulation in Professions and Occupations” held at the Working Party No. 2 on Competition and Regulation on 10 June 2024. More papers and presentations on the topic can be found out at oe.cd/crps.
This presentation was uploaded with the author’s consent.
3. Problem
• CPSs are intimately tied to the real world
– Multiple environment dimensions at hand
– Continuously changing environment
• The software must adapt
• Missing design and programming support
10. Design Concepts
• Context represents a single environmental situation
• Context group as a collection of environmental situations sharing the same characteristics
[Diagram: a context group containing multiple contexts]
19. Evaluation: Run-time Overhead
[Chart: MCU overhead, in MCU cycles (0-30), of context transitions vs. plain function calls for the wildlife tracking, adaptive stack, and smart-home applications; for scale, turning an LED on takes 8 MCU cycles]
[Chart: memory overhead as binary and RAM overhead percentages for the same three applications; at most 2.5%]
20. Conclusions and Ongoing Work
• Context as a CPS programming concept:
– Language independent design
– Programming support: ConesC, IDE, Model-checker
• Key results:
– Easier to maintain and to understand
– Verification against environment evolutions
– Negligible performance overhead
• Current work:
– Extending on micro & nano aerial drones
21. Other Activities @POLIMI
• Programming Systems for Coordinating Drones
• Example:
– L. Mottola et al., "Team-level programming of drone sensor networks", in ACM SENSYS 2014
– Real deployment for aerial photogrammetry in archaeological sites (Aquileia, Italy)
– Currently being extended to indoor scenarios for tiny devices
[Photo: "Domus dei putti danzanti", Aquileia; video: youtu.be/PPDGO-jc0It]
22. Other Activities @POLIMI
• Integration and remote control
– Mission-level service-oriented interfaces (REST)
– Remote-control service-oriented interfaces (CoAP)
• Flight control loops
– Reactive programming techniques
– Testing and verification
Attached to collars of animals, which makes them as mobile as drones. They are battery operated as well, and, compared to drones, situations change even faster than in this example.
nesC is indeed a component-based language, but there are too many environment dimensions, changing independently, and the functionality relies on these dimensions.
Enabling it on resource-constrained platforms: limited multithreading and no memory protection, unlike in high-level languages.
We claim that our design concepts, inspired by COP, can significantly simplify the software development process for WSNs. It is worth noticing that these concepts are language independent and can be implemented in any language.
Layered function call
The team-level programming model provides a middle ground between programming individual devices and swarm programming, enabling the specification of coordinated actions based on global states (unlike swarm programming), but still without resorting to individual addressing (as one would when programming individual devices).
We built abstractions and prototypes allowing the integration of aerial drones into larger processes, also using graphical interfaces. The picture shows an example of a composition of individual processing blocks that instruct the drone to take pictures in a given area and then post-process the pictures looking for a certain pattern, as well as uploading them to Flickr.