The document discusses challenges for implementing persistent memory (PMEM) aware applications using the Persistent Memory Development Kit (PMDK). It describes how to use PMDK with Direct Access (DAX) filesystems and outlines challenges in rewriting PostgreSQL to use PMEM, including resizing checkpoint files and selecting appropriate sync functions for write ahead logging (WAL) files. Performance evaluation challenges are also discussed.
This document summarizes a presentation on introducing the Persistent Memory Development Kit (PMDK) into PostgreSQL to utilize persistent memory (PMEM). The presentation covers: (1) hacking the PostgreSQL write-ahead log (WAL) and relation files to directly memory copy to PMEM, (2) evaluating the hacks which showed a 3% improvement to transactions and 30% reduction to checkpoint time, and (3) tips for PMEM programming like cache flushing and avoiding volatile layers.
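PMDK itself is a C library, so the pattern the WAL hack relies on — pre-sizing a file, memory-copying records directly into a mapping, and explicitly flushing — can only be illustrated loosely here. Below is a minimal Python sketch using `mmap` on an ordinary file; the flow mimics PMDK's map/copy/persist sequence, but the names and structure are illustrative, not the PMDK API.

```python
import mmap
import os
import tempfile

# Pre-size a "WAL segment" file, as PostgreSQL WAL files are pre-allocated.
path = os.path.join(tempfile.mkdtemp(), "wal_segment")
SEG_SIZE = mmap.PAGESIZE
with open(path, "wb") as f:
    f.truncate(SEG_SIZE)

with open(path, "r+b") as f:
    buf = mmap.mmap(f.fileno(), SEG_SIZE)
    record = b"BEGIN;INSERT;COMMIT;"
    buf[0:len(record)] = record   # direct memory copy, no write() syscall
    buf.flush()                   # loosely analogous to cache flush + drain
    buf.close()

with open(path, "rb") as f:
    head = f.read(len(record))
print(head)
```

On real PMEM, PMDK replaces the `msync`-style flush with user-space cache-line flushes, which is where the checkpoint and transaction improvements come from.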
The document discusses applying RDMA (Remote Direct Memory Access) to improve performance in distributed deep learning frameworks. It describes implementing RDMA in MXNet, a distributed deep learning framework that uses a parameter-server model. The implementation reduces memory copies and network overhead. Subsequent optimizations showed a 1.5x speedup over the initial RDMA implementation, but the existing ZeroMQ-based implementation was still faster; further optimization of the RDMA path is needed to fully realize its performance benefits.
This document discusses using Fluentd and Norikra to collect, process, and summarize OpenStack logs. Fluentd is used to collect logs from OpenStack components like Nova and forward them to Norikra for processing. Norikra allows logs to be queried and aggregated using SQL. It can summarize logs by hostname, log level, and message to detect issues. Notifications of warnings or errors can then be sent via tools like Slack to alert operators. Together, Fluentd and Norikra provide a scalable log management system that makes it easier to monitor OpenStack deployments and detect problems in large, high-volume log streams.
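Norikra executes such summaries as continuous SQL queries over the log stream. The grouping it performs can be sketched in plain Python; the event field names below are illustrative, not a fixed Fluentd/Norikra schema.

```python
from collections import Counter

# Hypothetical log events, shaped like records Fluentd might forward.
events = [
    {"hostname": "compute01", "level": "ERROR", "message": "libvirt timeout"},
    {"hostname": "compute01", "level": "ERROR", "message": "libvirt timeout"},
    {"hostname": "compute02", "level": "WARNING", "message": "disk nearly full"},
]

# Equivalent in spirit to:
#   SELECT hostname, level, message, COUNT(*)
#   FROM nova_logs GROUP BY hostname, level, message
summary = Counter((e["hostname"], e["level"], e["message"]) for e in events)

for (host, level, msg), n in summary.items():
    if level in ("WARNING", "ERROR"):
        print(f"{host} {level} x{n}: {msg}")  # candidate for a Slack alert
```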
Using Storlets/Docker For Large Scale Image Processing - Kota Tsuyuzaki
OpenStack Storlets, one of the official OpenStack projects, provides a Function-as-a-Service-like computation environment on top of OpenStack Swift: user-defined code runs inside the object storage system in a secure and isolated manner through the use of Docker containers. Docker containers offer mechanisms well suited to this purpose, namely high-performance execution, security, and efficient network transfer of the user code. As a result, Storlets allows users to invoke their HPC applications much more efficiently, without downloading the data from the object storage system to separate computing resources.
In this presentation, we introduce the high-level architecture of OpenStack Storlets and use cases at NTT, a telecommunications company in Japan. In particular, we focus on our commercial image-processing applications for data analytics, which handle picture data growing to approximately petabyte scale.
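The presentation's core idea — shipping the computation to the object rather than the object to the computation — can be sketched as a tiny handler class. This is loosely modeled on the Storlets invocation interface (a user class receiving input and output streams plus parameters); the class name, parameter names, and the trivial "thumbnail" transform are all illustrative, not the real Storlets API or NTT's application.

```python
import io

class ThumbnailStorlet:
    """Hypothetical storlet-style handler: reads an input object stream,
    writes a transformed object stream, all inside the storage system."""

    def __call__(self, in_files, out_files, params):
        data = in_files[0].read()
        # A real image-processing storlet would decode and resize here;
        # this placeholder just truncates to simulate a "thumbnail".
        out_files[0].write(data[: int(params.get("max_bytes", 16))])

# Local simulation of the engine invoking the storlet in a container.
src, dst = io.BytesIO(b"x" * 1024), io.BytesIO()
ThumbnailStorlet()([src], [dst], {"max_bytes": 8})
print(len(dst.getvalue()))  # 8 bytes leave the storage node, not 1024
```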
NTT Docomo's Challenge looking ahead to the world of 5G × OpenStack - OpenStack最... - VirtualTech Japan Inc.
Title: NTT Docomo's Challenge looking ahead to the world of 5G × OpenStack
Agenda:
- Current Challenge
-- DOCOMO Cloud Platform
-- BizDevOps
- Challenge for the future
-- DOCOMO 5G Open Cloud
-- Next Challenge
Transport Layer Development Kit (built on top of DPDK by Intel).
Provides a set of libraries for L4 protocol processing (UDP, TCP, etc.), plus VPP graph nodes, plugins, and other components that use those libraries to implement a host stack.
The FD.io TLDK project scope is:
- Implementing a set of libraries for L4 protocol processing (UDP, TCP, etc.) for both IPv4 and IPv6.
- Creating VPP graph nodes, plugins, etc. using those libraries to implement a host stack.
- Providing the mechanisms (netlink agents, packaging, etc.) necessary to make the resulting host stack easily usable by existing non-VPP-aware software.
The document discusses new features of OpenStack Swift object storage and OpenStack Storlets. It summarizes global erasure coding in OpenStack Swift, which improves storage efficiency and reliability. It also discusses Storlets, which allow running compute logic directly on Swift storage nodes to process objects. The presentation provides an overview of these features and recommends related documentation for further reference.
Vector packet technologies such as DPDK and FD.io/VPP revolutionized software packet processing, initially for discrete appliances and then for NFV use cases. Container-based VNF deployments and their supporting NFV infrastructure are now the new frontier in packet processing, with a number of strong advocates among both traditional comms service providers and in the cloud. This presentation gives an overview of how the DPDK and FD.io/VPP projects are rising to meet the challenges of the container dataplane, covering the challenges themselves, recent new features, and what is coming soon in this exciting new area of the software dataplane, in both DPDK and FD.io/VPP.
About the speaker: Ray Kinsella has been working on Linux and various other open source technologies for about twenty years. He has recently been active in open source communities such as VPP and DPDK, and is a constant lurker in many others. He is interested in the software dataplane and optimization, virtualization, operating system design and implementation, communications, and networking.
In this deck from ATPESC 2019, Ken Raffenetti from Argonne presents an overview of HPC interconnects.
"The Argonne Training Program on Extreme-Scale Computing (ATPESC) provides intensive, two-week training on the key skills, approaches, and tools to design, implement, and execute computational science and engineering applications on current high-end computing systems and the leadership-class computing systems of the future."
Watch the video: https://wp.me/p3RLHQ-luc
Learn more: https://extremecomputingtraining.anl.gov/
Sign up for our insideHPC Newsletter: http://insidehpc.com/newsletter
In this talk, we outline a kernel and upstream centric approach to data plane acceleration using an upstream SmartNIC BPF JIT. This allows extended Berkeley Packet Filter (eBPF) bytecode to be transparently offloaded to the SmartNIC from either the Traffic Control (TC) or Express Data Path (XDP) hooks in the kernel and could be used for applications such as DoS protection, load balancing and software switching e.g., Open vSwitch (OVS). We then follow this by outlining the proposed ICONICS OCP contribution related to an open approach for reconfiguration using directly compiled SmartNIC programs in situations where BPF bytecode alone is not sufficient to accommodate changing semantics in the network.
In this deck, Paul Isaacs from Linaro presents: State of ARM-based HPC. This talk provides an overview of applications and infrastructure services successfully ported to AArch64 and benefiting from scale.
"With its debut on the TOP500, the 125,000-core Astra supercomputer at New Mexico’s Sandia Labs uses Cavium ThunderX2 chips to mark Arm’s entry into the petascale world. In Japan, the Fujitsu A64FX Arm-based CPU in the pending Fugaku supercomputer has been optimized to achieve high-level, real-world application performance, anticipating up to one hundred times the application execution performance of the K computer. K was the first computer to top 10 petaflops in 2011."
Watch the video: https://wp.me/p3RLHQ-lIT
Learn more: https://www.linaro.org/
Sign up for our insideHPC Newsletter: http://insidehpc.com/newsletter
CUDA-Python and RAPIDS for blazing fast scientific computing - inside-BigData.com
In this deck from the ECSS Symposium, Abe Stern from NVIDIA presents: CUDA-Python and RAPIDS for blazing fast scientific computing.
"We will introduce Numba and RAPIDS for GPU programming in Python. Numba allows us to write just-in-time compiled CUDA code in Python, giving us easy access to the power of GPUs from a powerful high-level language. RAPIDS is a suite of tools with a Python interface for machine learning and dataframe operations. Together, Numba and RAPIDS represent a potent set of tools for rapid prototyping, development, and analysis for scientific computing. We will cover the basics of each library and go over simple examples to get users started. Finally, we will briefly highlight several other relevant libraries for GPU programming."
Watch the video: https://wp.me/p3RLHQ-lvu
Learn more: https://developer.nvidia.com/rapids
and
https://www.xsede.org/for-users/ecss/ecss-symposium
Sign up for our insideHPC Newsletter: http://insidehpc.com/newsletter
Summit 16: ARM Mini-Summit - Efficient NFV solutions for Cloud and Edge - Cavium - OPNFV
The document discusses efficient NFV solutions using ARM technology for cloud and edge environments. It provides an overview of Cavium's ARM processor roadmap, NFV reference platforms, ecosystem enablement activities, and proof-of-concept demonstrations for centralized and distributed NFV deployments including virtualized baseband units and mobile-CORD.
In this deck from the Performance Optimisation and Productivity group, Lubomir Riha from IT4Innovations presents: Energy Efficient Computing using Dynamic Tuning.
"We now live in a world of power-constrained architectures and systems and power consumption represents a significant cost factor in the overall HPC system economy. For these reasons, in recent years researchers, supercomputing centers and major vendors have developed new tools and methodologies to measure and optimize the energy consumption of large-scale high performance system installations. Due to the link between energy consumption, power consumption and execution time of an application executed by the final user, it is important for these tools and the methodology used to consider all these aspects, empowering the final user and the system administrator with the capability of finding the best configuration given different high level objectives.
This webinar focused on tools designed to improve the energy-efficiency of HPC applications using a methodology of dynamic tuning of HPC applications, developed under the H2020 READEX project. The READEX methodology has been designed for exploiting the dynamic behaviour of software. At design time, different runtime situations (RTS) are detected and optimized system configurations are determined. RTSs with the same configuration are grouped into scenarios, forming the tuning model. At runtime, the tuning model is used to switch system configurations dynamically.
The MERIC tool, which implements the READEX methodology, is presented. It supports manual or binary instrumentation of the analysed applications to simplify the analysis. This instrumentation is used to identify and annotate the significant regions in the HPC application. Automatic binary instrumentation annotates regions with significant runtime; manual instrumentation, which can be combined with the automatic kind, allows the code developer to annotate regions of particular interest."
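MERIC instruments C/Fortran HPC codes, so the following is only a language-neutral illustration of the region-annotation idea: a start/stop pair wrapped around each significant region, accumulating per-region runtimes that a tuner could later act on. The context-manager name and the accumulator are hypothetical, not the MERIC API.

```python
import time
from collections import defaultdict
from contextlib import contextmanager

region_times = defaultdict(float)  # per-region accumulated runtime (seconds)

@contextmanager
def region(name):
    # Analogous to a manual start/stop annotation pair around a
    # significant region (names here are illustrative, not MERIC's API).
    t0 = time.perf_counter()
    try:
        yield
    finally:
        region_times[name] += time.perf_counter() - t0

with region("solver"):
    sum(i * i for i in range(100_000))  # stand-in for a compute kernel

print(dict(region_times))
```

A dynamic tuner would use such measurements to pick a system configuration (e.g. a CPU frequency) per region and switch configurations at region boundaries.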
Watch the video: https://wp.me/p3RLHQ-lJP
Learn more: https://pop-coe.eu/blog/14th-pop-webinar-energy-efficient-computing-using-dynamic-tuning
and
https://code.it4i.cz/vys0053/meric
Sign up for our insideHPC Newsletter: http://insidehpc.com/newsletter
Fueling the datasphere: how RISC-V enables the storage ecosystem - RISC-V International
This document summarizes Seagate's work with RISC-V processors for storage applications. It discusses Seagate's history with custom CPUs and reasons for adopting RISC-V. Seagate has developed two RISC-V cores - a high-performance out-of-order core currently powering a hard drive demonstration, and an area-optimized in-order core for auxiliary workloads. RISC-V allows innovation for real-time processing and security at the edge by enabling domain-specific architectures. The talk promotes collaboration and involvement in the open RISC-V ecosystem.
Andrew J Younge - Vanguard Astra - Petascale Arm Platform for U.S. DOE/ASC Su... - Linaro
Event: Arm Architecture HPC Workshop by Linaro and HiSilicon
Location: Santa Clara, CA
Speaker: Andrew J Younge
Talk Title: Vanguard Astra - Petascale Arm Platform for U.S. DOE/ASC Supercomputing
Talk Desc: The Vanguard program looks to expand the potential technology choices for leadership-class High Performance Computing (HPC) platforms, not only for the National Nuclear Security Administration (NNSA) but for the Department of Energy (DOE) and wider HPC community. Specifically, there is a need to expand the supercomputing ecosystem by investing and developing emerging, yet-to-be-proven technologies and address both hardware and software challenges together, as well as to prove-out the viability of such novel platforms for production HPC workloads.
The first deployment of the Vanguard program will be Astra, a prototype petascale Arm supercomputer to be sited at Sandia National Laboratories during 2018. This talk will focus on the architectural details of Astra and the significant investments being made towards maturing the Arm software ecosystem. Furthermore, we will share initial performance results based on our pre-general-availability testbed system and outline several planned research activities for the machine.
Bio: Andrew Younge is an R&D Computer Scientist at Sandia National Laboratories in the Scalable System Software group. His research interests include cloud computing, virtualization, distributed systems, and energy-efficient computing. Andrew has a Ph.D. in Computer Science from Indiana University, where he was the Persistent Systems fellow and a member of the FutureGrid project, an NSF-funded experimental cyberinfrastructure testbed. Over the years, Andrew has held visiting positions at the MITRE Corporation, the University of Southern California / Information Sciences Institute, and the University of Maryland, College Park. He received his Bachelor of Science and Master of Science degrees from the Computer Science Department at Rochester Institute of Technology (RIT) in 2008 and 2010, respectively.
Data Plane and VNF Acceleration Mini Summit - Open-NFP
This event will deliver technical sessions related to the acceleration of server-based networking data planes and VNFs using SmartNIC hardware within the framework of the NFVi and orchestration components in the OPNFV Danube software platform. Virtual switching data planes using Open vSwitch (OVS) and FD.IO (with VPP) will be discussed and offload architectures for such data planes into SmartNIC platforms will be presented. Extension of such data planes for VNF specific requirements utilizing micro-VNF sandbox functions utilizing P4 and/or C language programming will be presented and discussed. Early work defining VNF acceleration architectures and APIs for SmartNIC-accelerated data planes and VNF sandbox extensions will be presented and discussed, followed by a proposal to create a collaborative project to complete the definition of such an API within the frameworks of OPNFV and ETSI. In addition, a proposal for a Pharos community lab related VNF acceleration will be put forward by the Open-NFP organization.
In this deck, Ronald P. Luijten from IBM Research in Zurich presents: DOME 64-bit μDataCenter.
"I like to call it a datacenter in a shoebox. With the combination of power and energy efficiency, we believe the microserver will be of interest beyond the DOME project, particularly for cloud data centers and Big Data analytics applications."
The microserver team has designed and demonstrated a prototype 64-bit microserver using a PowerPC-based chip from Freescale Semiconductor, running Linux Fedora and IBM DB2. At 133 × 55 mm², the microserver contains all of the essential functions of today's servers, which are 4 to 10 times larger in size. Not only is the microserver compact, it is also very energy-efficient.
Watch the video: http://wp.me/p3RLHQ-gJM
Learn more: https://www.zurich.ibm.com/microserver/
Sign up for our insideHPC Newsletter: http://insidehpc.com/newsletter
Fast, Scalable Quantized Neural Network Inference on FPGAs with FINN and Logi... - KTN
This document provides a summary of a presentation about quantized neural network inference on FPGAs using FINN and LogicNets. It discusses:
- Xilinx Research Labs in Dublin and their work on quantized machine learning applications on Xilinx devices.
- How neural network quantization can improve efficiency by reducing precision while trading off accuracy, and how this is well-suited for FPGAs.
- The FINN toolflow which includes quantization-aware training in PyTorch with Brevitas, the FINN compiler to map networks to hardware, and deployment with PYNQ.
- LogicNets which further improves efficiency by unfolding DNNs into fully pipelined datapath circuits for
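The precision/accuracy trade-off at the heart of the toolflow can be made concrete with a minimal sketch of uniform quantization: floats are mapped to small integers via a scale factor, which is what makes the arithmetic cheap on FPGA fabric. This is a generic illustration, not Brevitas or FINN code.

```python
# Minimal sketch of uniform quantization to signed num_bits integers:
# precision is traded for cheaper integer arithmetic.
def quantize(xs, num_bits=8):
    qmax = 2 ** (num_bits - 1) - 1          # 127 for int8
    scale = max(abs(x) for x in xs) / qmax or 1.0
    return [round(x / scale) for x in xs], scale

def dequantize(qs, scale):
    return [q * scale for q in qs]

weights = [0.61, -0.30, 0.05, -0.94]        # toy layer weights
q, s = quantize(weights)
approx = dequantize(q, s)
print(q)                                    # small integers, FPGA-friendly
print([round(w - a, 4) for w, a in zip(weights, approx)])  # small errors
```

Quantization-aware training (as in Brevitas) goes further by simulating this rounding during training, so the network learns weights that survive the precision loss.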
NFV-SDN projects provide open source solutions for NFV and SDN. There are several initiatives focused on different components of the NFV architecture, including OpenStack Tacker for VNF management, OpenBaton for NFV orchestration, and ETSI's OSM project proposing an open source MANO implementation. OPNFV is expanding its scope to include all ETSI NFV functional blocks, including MANO. Individual open source projects like OpenMANO and Cloudify also provide NFV orchestration functionality.
Enabling accelerated networking - seminar by Enea at the Embedded Conference ... - EneaSoftware
The open source revolution brings a wealth of features and functionality at a rapid growth, and industry leaders come together to shape standardized interfaces, protocols, and ways of working. ARM is such a player and drives several collaborative projects under the Linaro (http://www.linaro.org) umbrella. One such initiative is the ODP (http://www.opendataplane.org/) open-source, cross-platform set of application programming interfaces for the networking data plane.
This document discusses Netronome's Agilio server-based networking solutions that use SmartNICs to offload networking functions from server CPUs. This allows more server cores to be used for applications and reduces data center costs. Specifically, it can achieve 5x higher throughput and use 80% less CPU resources compared to legacy server-based networking solutions. Netronome aims to help data center operators innovate more rapidly and lower costs through its intelligent server networking approach.
Data Plane Evolution: Towards Openness and Flexibility - APNIC
This document discusses data plane evolutions and future implementations. It summarizes a presentation on network virtualization overlays (NVO3) and encapsulation considerations. Programmable silicon that is field upgradable could simplify deployment of future encapsulations. The P4 programming language also aims to accelerate programmability and wider feature deployment in a target independent way. Overall, future data plane implementations require openness, flexibility, and careful consideration to avoid overly complex architectures.
Install FD.IO VPP On Intel(r) Architecture & Test with Trex*Michelle Holley
This demo/lab will guide you through installing and configuring FD.io Vector Packet Processing (VPP) on an Intel® Architecture (IA) server. You will also learn to install TRex* on another IA server to send packets to the VPP instance, and use some VPP commands to forward packets back to TRex*.
Speaker: Loc Nguyen. Loc is a Software Application Engineer on the Data Center Scale Engineering team. He joined Intel in 2005 and has worked on various projects. Before joining the network group, Loc worked in the high-performance computing area supporting the Intel® Xeon Phi™ product family. His interests include computer graphics, parallel computing, and computer networking.
A Look Inside Google’s Data Center Networks - Ryousei Takano
1) Google has been developing their own data center network architectures using merchant silicon switches and centralized network control since 2005 to keep up with increasing bandwidth demands.
2) Their network designs have evolved from Firehose and Watchtower to the current Saturn and Jupiter networks, increasing port speeds from 1/10Gbps to 40/100Gbps and aggregate bandwidth from terabits to petabits per second.
3) Their network architectures employ Clos topologies with merchant silicon switches at the top-of-rack, aggregation, and spine layers and centralized control of traffic routing.
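The scaling logic of a Clos fabric is simple arithmetic: when every leaf switch has one uplink to every spine, aggregate leaf-to-spine capacity is the product of leaf count, spine count, and link speed. The example parameters below are illustrative, not Google's actual configuration.

```python
# Back-of-the-envelope capacity for a two-stage (leaf-spine) Clos fabric
# built from merchant-silicon switches, in the spirit of the designs above.
def clos_aggregate_gbps(leaves, spines, link_gbps):
    # Each leaf has one uplink to every spine, so the total capacity
    # between the leaf and spine layers is leaves * spines * link speed.
    return leaves * spines * link_gbps

# e.g. 32 leaves, 16 spines, 40 Gb/s links:
print(clos_aggregate_gbps(32, 16, 40), "Gb/s")  # tens of terabits
```

Moving from 10 Gb/s to 40/100 Gb/s links, and growing the spine layer, is how such fabrics climbed from terabits to petabits of aggregate bandwidth.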
In this deck, Jean-Pierre Panziera from Atos presents: BXI - Bull eXascale Interconnect.
"Exascale entails an explosion of performance, of the number of nodes/cores, of data volume and data movement. At such a scale, optimizing the network that is the backbone of the system becomes a major contributor to global performance. The interconnect is going to be a key enabling technology for exascale systems. This is why one of the cornerstones of Bull’s exascale program is the development of our own new-generation interconnect. The Bull eXascale Interconnect or BXI introduces a paradigm shift in terms of performance, scalability, efficiency, reliability and quality of service for extreme workloads."
Watch the video: http://wp.me/p3RLHQ-gJa
Learn more: https://bull.com/bull-exascale-interconnect/
Sign up for our insideHPC Newsletter: http://insidehpc.com/newsletter
This document provides an overview and agenda for an Intel Ethernet product update presentation. It includes the following:
1) Trends in the Ethernet market showing adoption of higher port speeds like 25GbE and 50GbE in data centers over the next few years.
2) Intel's Ethernet product roadmaps covering upcoming offerings like the XXV710 25GbE adapter and X550 10GbE adapter.
3) Key advantages of Intel Ethernet solutions like reliability, validation, and support to enable new network architectures.
In Network Computing Prototype Using P4 at KSC/KREONET 2019 - Kentaro Ebisawa
A case study of applying P4 to CAN (Controller Area Network) data pre-processing, using an FPGA and the Netcope P4 compiler.
Presented at KSC / KREONET WORKSHOP 2019 | DAY 1 Session 1: SDN/NFV/P4
http://www.ksc2019.re.kr/
Vector Packet Technologies such as DPDK and FD.io/VPP revolutionized software packet processing initially for discrete appliances and then for NFV use cases. Container based VNF deployments and it's supporting NFV infrastructure is now the new frontier in packet processing and has number of strong advocates among both traditional Comms Service Providers and in the Cloud. This presentation will give an overview of how DPDK and FD.io/VPP project are rising to meet the challenges of the Container dataplane. The discussion will provide an overview of the challenges, recent new features and what is coming soon in this exciting new area for the software dataplane, in both DPDK and FD.io/VPP!
About the speaker: Ray Kinsella has been working on Linux and various other open source technologies for about twenty years. He is recently active in open source communities such as VPP and DPDK but is a constant lurker in many others. He is interested in the software dataplane and optimization, virtualization, operating system design and implementation, communications and networking.
In this deck from ATPESC 2019, Ken Raffenetti from Argonne presents an overview of HPC interconnects.
"The Argonne Training Program on Extreme-Scale Computing (ATPESC) provides intensive, two-week training on the key skills, approaches, and tools to design, implement, and execute computational science and engineering applications on current high-end computing systems and the leadership-class computing systems of the future."
Watch the video: https://wp.me/p3RLHQ-luc
Learn more: https://extremecomputingtraining.anl.gov/
Sign up for our insideHPC Newsletter: http://insidehpc.com/newsletter
In this talk, we outline a kernel and upstream centric approach to data plane acceleration using an upstream SmartNIC BPF JIT. This allows extended Berkeley Packet Filter (eBPF) bytecode to be transparently offloaded to the SmartNIC from either the Traffic Control (TC) or Express Data Path (XDP) hooks in the kernel and could be used for applications such as DoS protection, load balancing and software switching e.g., Open vSwitch (OVS). We then follow this by outlining the proposed ICONICS OCP contribution related to an open approach for reconfiguration using directly compiled SmartNIC programs in situations where BPF bytecode alone is not sufficient to accommodate changing semantics in the network.
In this deck, Paul Isaacs from Linaro presents: State of ARM-based HPC. This talk provides an overview of applications and infrastructure services successfully ported to Aarch64 and benefiting from scale.
"With its debut on the TOP500, the 125,000-core Astra supercomputer at New Mexico’s Sandia Labs uses Cavium ThunderX2 chips to mark Arm’s entry into the petascale world. In Japan, the Fujitsu A64FX Arm-based CPU in the pending Fugaku supercomputer has been optimized to achieve high-level, real-world application performance, anticipating up to one hundred times the application execution performance of the K computer. K was the first computer to top 10 petaflops in 2011."
Watch the video: https://wp.me/p3RLHQ-lIT
Learn more: https://www.linaro.org/
Sign up for our insideHPC Newsletter: http://insidehpc.com/newsletter
CUDA-Python and RAPIDS for blazing fast scientific computinginside-BigData.com
In this deck from the ECSS Symposium, Abe Stern from NVIDIA presents: CUDA-Python and RAPIDS for blazing fast scientific computing.
"We will introduce Numba and RAPIDS for GPU programming in Python. Numba allows us to write just-in-time compiled CUDA code in Python, giving us easy access to the power of GPUs from a powerful high-level language. RAPIDS is a suite of tools with a Python interface for machine learning and dataframe operations. Together, Numba and RAPIDS represent a potent set of tools for rapid prototyping, development, and analysis for scientific computing. We will cover the basics of each library and go over simple examples to get users started. Finally, we will briefly highlight several other relevant libraries for GPU programming."
Watch the video: https://wp.me/p3RLHQ-lvu
Learn more: https://developer.nvidia.com/rapids
and
https://www.xsede.org/for-users/ecss/ecss-symposium
Sign up for our insideHPC Newsletter: http://insidehp.com/newsletter
Summit 16: ARM Mini-Summit - Efficient NFV solutions for Cloud and Edge - CaviumOPNFV
The document discusses efficient NFV solutions using ARM technology for cloud and edge environments. It provides an overview of Cavium's ARM processor roadmap, NFV reference platforms, ecosystem enablement activities, and proof-of-concept demonstrations for centralized and distributed NFV deployments including virtualized baseband units and mobile-CORD.
In this deck from the Performance Optimisation and Productivity group, Lubomir Riha from IT4Innovations presents: Energy Efficient Computing using Dynamic Tuning.
"We now live in a world of power-constrained architectures and systems and power consumption represents a significant cost factor in the overall HPC system economy. For these reasons, in recent years researchers, supercomputing centers and major vendors have developed new tools and methodologies to measure and optimize the energy consumption of large-scale high performance system installations. Due to the link between energy consumption, power consumption and execution time of an application executed by the final user, it is important for these tools and the methodology used to consider all these aspects, empowering the final user and the system administrator with the capability of finding the best configuration given different high level objectives.
This webinar focused on tools designed to improve the energy-efficiency of HPC applications using a methodology of dynamic tuning of HPC applications, developed under the H2020 READEX project. The READEX methodology has been designed for exploiting the dynamic behaviour of software. At design time, different runtime situations (RTS) are detected and optimized system configurations are determined. RTSs with the same configuration are grouped into scenarios, forming the tuning model. At runtime, the tuning model is used to switch system configurations dynamically.
The MERIC tool, which implements the READEX methodology, is presented. It supports manual or binary instrumentation of the analysed applications to simplify the analysis. This instrumentation is used to identify and annotate the significant regions in the HPC application. Automatic binary instrumentation annotates regions with significant runtime. Manual instrumentation, which can be combined with automatic instrumentation, allows code developers to annotate regions of particular interest."
Watch the video: https://wp.me/p3RLHQ-lJP
Learn more: https://pop-coe.eu/blog/14th-pop-webinar-energy-efficient-computing-using-dynamic-tuning
and
https://code.it4i.cz/vys0053/meric
Sign up for our insideHPC Newsletter: http://insidehpc.com/newsletter
Fueling the Datasphere: How RISC-V Enables the Storage Ecosystem (RISC-V International)
This document summarizes Seagate's work with RISC-V processors for storage applications. It discusses Seagate's history with custom CPUs and reasons for adopting RISC-V. Seagate has developed two RISC-V cores - a high-performance out-of-order core currently powering a hard drive demonstration, and an area-optimized in-order core for auxiliary workloads. RISC-V allows innovation for real-time processing and security at the edge by enabling domain-specific architectures. The talk promotes collaboration and involvement in the open RISC-V ecosystem.
Andrew J Younge - Vanguard Astra - Petascale Arm Platform for U.S. DOE/ASC Supercomputing (Linaro)
Event: Arm Architecture HPC Workshop by Linaro and HiSilicon
Location: Santa Clara, CA
Speaker: Andrew J Younge
Talk Title: Vanguard Astra - Petascale Arm Platform for U.S. DOE/ASC Supercomputing
Talk Desc: The Vanguard program looks to expand the potential technology choices for leadership-class High Performance Computing (HPC) platforms, not only for the National Nuclear Security Administration (NNSA) but for the Department of Energy (DOE) and wider HPC community. Specifically, there is a need to expand the supercomputing ecosystem by investing and developing emerging, yet-to-be-proven technologies and address both hardware and software challenges together, as well as to prove-out the viability of such novel platforms for production HPC workloads.
The first deployment of the Vanguard program will be Astra, a prototype Petascale Arm supercomputer to be sited at Sandia National Laboratories during 2018. This talk will focus on the architectural details of Astra and the significant investments being made towards maturing the Arm software ecosystem. Furthermore, we will share initial performance results based on our pre-general availability testbed system and outline several planned research activities for the machine.
Bio: Andrew Younge is an R&D Computer Scientist at Sandia National Laboratories with the Scalable System Software group. His research interests include Cloud Computing, Virtualization, Distributed Systems, and energy efficient computing. Andrew has a Ph.D. in Computer Science from Indiana University, where he was the Persistent Systems fellow and a member of the FutureGrid project, an NSF-funded experimental cyberinfrastructure test-bed. Over the years, Andrew has held visiting positions at the MITRE Corporation, the University of Southern California / Information Sciences Institute, and the University of Maryland, College Park. He received his Bachelor's and Master's of Science from the Computer Science Department at Rochester Institute of Technology (RIT) in 2008 and 2010, respectively.
Data Plane and VNF Acceleration Mini Summit (Open-NFP)
This event will deliver technical sessions related to the acceleration of server-based networking data planes and VNFs using SmartNIC hardware within the framework of the NFVi and orchestration components in the OPNFV Danube software platform. Virtual switching data planes using Open vSwitch (OVS) and FD.IO (with VPP) will be discussed and offload architectures for such data planes into SmartNIC platforms will be presented. Extension of such data planes for VNF specific requirements utilizing micro-VNF sandbox functions utilizing P4 and/or C language programming will be presented and discussed. Early work defining VNF acceleration architectures and APIs for SmartNIC-accelerated data planes and VNF sandbox extensions will be presented and discussed, followed by a proposal to create a collaborative project to complete the definition of such an API within the frameworks of OPNFV and ETSI. In addition, a proposal for a Pharos community lab related VNF acceleration will be put forward by the Open-NFP organization.
In this deck, Ronald P. Luijten from IBM Research in Zurich presents: DOME 64-bit μDataCenter.
"I like to call it a datacenter in a shoebox. With the combination of power and energy efficiency, we believe the microserver will be of interest beyond the DOME project, particularly for cloud data centers and Big Data analytics applications."
The microserver team has designed and demonstrated a prototype 64-bit microserver using a PowerPC-based chip from Freescale Semiconductor running Linux Fedora and IBM DB2. At 133 × 55 mm² the microserver contains all of the essential functions of today's servers, which are 4 to 10 times larger in size. Not only is the microserver compact, it is also very energy-efficient.
Watch the video: http://wp.me/p3RLHQ-gJM
Learn more: https://www.zurich.ibm.com/microserver/
Sign up for our insideHPC Newsletter: http://insidehpc.com/newsletter
Fast, Scalable Quantized Neural Network Inference on FPGAs with FINN and LogicNets (KTN)
This document provides a summary of a presentation about quantized neural network inference on FPGAs using FINN and LogicNets. It discusses:
- Xilinx Research Labs in Dublin and their work on quantized machine learning applications on Xilinx devices.
- How neural network quantization can improve efficiency by reducing precision while trading off accuracy, and how this is well suited to FPGAs.
- The FINN toolflow, which includes quantization-aware training in PyTorch with Brevitas, the FINN compiler to map networks to hardware, and deployment with PYNQ.
- LogicNets, which further improves efficiency by unfolding DNNs into fully pipelined datapath circuits.
NFV-SDN projects provide open source solutions for NFV and SDN. There are several initiatives focused on different components of the NFV architecture including OpenStack Tacker for VNF management, OpenBaton for NFV orchestration, and ETSI OSG proposing an open source MANO implementation. OPNFV is expanding its scope to include all ETSI NFV functional blocks including MANO. Individual open source projects like OpenMANO and Cloudify also provide NFV orchestration functionality.
Enabling accelerated networking - seminar by Enea at the Embedded Conference ... (EneaSoftware)
The open source revolution brings a wealth of features and functionality at a rapid growth, and industry leaders come together to shape standardized interfaces, protocols, and ways of working. ARM is such a player and drives several collaborative projects under the Linaro (http://www.linaro.org) umbrella. One such initiative is the ODP (http://www.opendataplane.org/) open-source, cross-platform set of application programming interfaces for the networking data plane.
This document discusses Netronome's Agilio server-based networking solutions that use SmartNICs to offload networking functions from server CPUs. This allows more server cores to be used for applications and reduces data center costs. Specifically, it can achieve 5x higher throughput and use 80% less CPU resources compared to legacy server-based networking solutions. Netronome aims to help data center operators innovate more rapidly and lower costs through its intelligent server networking approach.
Data Plane Evolution: Towards Openness and Flexibility (APNIC)
This document discusses data plane evolutions and future implementations. It summarizes a presentation on network virtualization overlays (NVO3) and encapsulation considerations. Programmable silicon that is field upgradable could simplify deployment of future encapsulations. The P4 programming language also aims to accelerate programmability and wider feature deployment in a target independent way. Overall, future data plane implementations require openness, flexibility, and careful consideration to avoid overly complex architectures.
Install FD.IO VPP On Intel(r) Architecture & Test with Trex* (Michelle Holley)
This demo/lab will guide you through installing and configuring FD.io Vector Packet Processing (VPP) on an Intel® Architecture (IA) server. You will also learn to install TRex* on another IA server to send packets to the VPP, and use some VPP commands to forward packets back to the TRex*.
Speaker: Loc Nguyen. Loc is a Software Application Engineer in Data Center Scale Engineering Team. Loc joined Intel in 2005, and has worked in various projects. Before joining the network group, Loc worked in High-Performance Computing area and supported Intel® Xeon Phi™ Product Family. His interest includes computer graphics, parallel computing, and computer networking.
A Look Inside Google’s Data Center Networks (Ryousei Takano)
1) Google has been developing their own data center network architectures using merchant silicon switches and centralized network control since 2005 to keep up with increasing bandwidth demands.
2) Their network designs have evolved from Firehose and Watchtower to the current Saturn and Jupiter networks, increasing port speeds from 1/10Gbps to 40/100Gbps and aggregate bandwidth from terabits to petabits per second.
3) Their network architectures employ Clos topologies with merchant silicon switches at the top-of-rack, aggregation, and spine layers and centralized control of traffic routing.
In this deck, Jean-Pierre Panziera from Atos presents: BXI - Bull eXascale Interconnect.
"Exascale entails an explosion of performance, of the number of nodes/cores, of data volume and data movement. At such a scale, optimizing the network that is the backbone of the system becomes a major contributor to global performance. The interconnect is going to be a key enabling technology for exascale systems. This is why one of the cornerstones of Bull’s exascale program is the development of our own new-generation interconnect. The Bull eXascale Interconnect or BXI introduces a paradigm shift in terms of performance, scalability, efficiency, reliability and quality of service for extreme workloads."
Watch the video: http://wp.me/p3RLHQ-gJa
Learn more: https://bull.com/bull-exascale-interconnect/
Sign up for our insideHPC Newsletter: http://insidehpc.com/newsletter
This document provides an overview and agenda for an Intel Ethernet product update presentation. It includes the following:
1) Trends in the Ethernet market showing adoption of higher port speeds like 25GbE and 50GbE in data centers over the next few years.
2) Intel's Ethernet product roadmaps covering upcoming offerings like the XXV710 25GbE adapter and X550 10GbE adapter.
3) Key advantages of Intel Ethernet solutions like reliability, validation, and support to enable new network architectures.
In Network Computing Prototype Using P4 at KSC/KREONET 2019 (Kentaro Ebisawa)
Case study of applying P4 to CAN (Controller Area Network) data pre-processing using an FPGA and the Netcope P4 compiler.
Presented at KSC / KREONET WORKSHOP 2019 | DAY 1 Session 1: SDN/NFV/P4
http://www.ksc2019.re.kr/
OSMC 2018 | Why we recommend PMM to our clients by Matthias Crauwels (NETWAYS)
As service providers, one of our responsibilities is helping clients understand which causes contributed to a production downtime incident, and how to prevent them (as much as possible) from happening again. We do this with incident reports, and one common recommendation we make is to have a historical monitoring system in place. All our clients have point-in-time monitoring solutions in place: solutions that can alert them when a system is down or behaving in unacceptable ways. But historical monitoring is still not common, and we believe a lot of companies can benefit from deploying one. In most cases, we have recommended Percona Monitoring and Management (PMM) as a good, open source solution for this problem. In this session, we will talk about the reasons why we recommend PMM as a way to prevent incidents, and also to investigate their possible causes when one has happened.
Hyperconvergence: how it improves the economics of your IT (NetApp)
The document describes instructions for connecting audio to an online webinar. It provides three options for connecting audio: calling using a computer, calling a phone number, or having the system call back a provided number. It also includes the webinar title and information about asking questions.
Genomics Deployments - How to Get It Right with Software Defined Storage (Sandeep Patil)
This document discusses genomics workloads and the requirements for storage infrastructure to support them. It begins with an introduction to genomics and the growth of the field. It then examines the characteristics of genomic sequencing workloads, including the multi-step process and file-based nature. Key requirements for storage are outlined, such as high throughput, large ingestion of files, and support for POSIX and other access protocols. The document proposes a solution using a software-defined, clustered file system like IBM Spectrum Scale to provide scalable, high performance file storage as a building block of a composable infrastructure for genomics applications. It provides an example architecture and performance results for GATK-based analysis.
IBM Spectrum Protect and IBM Spectrum Protect Plus - What's New! June '18 (Pawel Maczka)
Subject: IBM Spectrum Protect, Plus - What's new! June '18
When: June 7th 10AM ET, 4PM CET
Speakers: Harley Puckett (IBM), Pawel Maczka (Storware)
Harley Puckett is the Program Director for IBM #SpectrumProtect development. He has over 30 years of experience in storage and data protection, regularly presenting at client briefings and user conferences. Harley spent 6 ½ years as an Executive Storage Software Consultant and manager of the Client Workshop Program in the Tucson Executive Briefing Center. He was the Solutions Architect for IBM’s Global Archive Solutions Center in Guadalajara, Mexico. Prior to a temporary assignment as the technical assistant to the GM of System Storage, he spent 9 ½ years as a senior development manager for IBM Tivoli Storage Manager (TSM). He led IBM’s efforts to implement Systems Managed Storage on its internal systems and outsourced accounts. He graduated from the University of Arizona with a BS in MIS.
Pawel Maczka - CTO and VP at Storware. Addicted to storage and data protection solutions in every combination: cloud, hybrid, and on-premise. More than 10 years of work with enterprise backup and recovery platforms. Technology evangelist, always in strong cooperation with the IBM Spectrum Storage team.
This document provides instructions for a P4 tutorial being conducted using a virtual machine (VM). It outlines how to download and set up the VM, including logging in and pulling the latest tutorial files. It describes the overall goals of learning the P4 language, tools, and future technology trends through a series of presentations and exercises. Finally, it provides an agenda with topics that will be covered over the course of the tutorial.
Are you ready for NVMe? IBM FlashSystem uses NVMe inside, and is NVMe-ready for use with FCP and Ethernet fabrics. This session explains FC-NVMe and NVMe-OF and how IBM FlashSystem uses NVMe inside.
In the big data world, it's not always easy for Python users to move huge amounts of data around. Apache Arrow defines a common format for data interchange, while Arrow Flight, introduced in version 0.11.0, provides a means to move that data efficiently between systems. Arrow Flight is a framework for Arrow-based messaging built with gRPC. It enables data microservices where clients can produce and consume streams of Arrow data to share it over the wire. In this session, I'll give a brief overview of Arrow Flight from a Python perspective, and show that it's easy to build high performance connections when systems can talk Arrow. I'll also cover some ongoing work in using Arrow Flight to connect PySpark with TensorFlow - two systems with great Python APIs but very different underlying internal data.
Container Attached Storage (CAS) with OpenEBS - SDC 2018 (OpenEBS)
The document discusses container attached storage (CAS), which aims to provide storage for containers in a container-native way. CAS is designed to run in containers for containers in user space, using the Kubernetes substrate. It addresses challenges like small working sets, ephemeral storage, and cloud lock-in by keeping data local to workloads and allowing per-workload optimization and migration. The document outlines the CAS design and implementation, including using an input/output container to handle storage IO in user space and leveraging technologies like SPDK, virtio, and Kubernetes custom resources.
Application Modernization with PKS / Kubernetes (Paul Czarkowski)
This document discusses strategies for modernizing applications and replatforming them using Pivotal Container Service (PKS). It outlines how companies have different options for packaging and running workloads, such as containers, microservices, serverless functions, and monolithic applications. PKS aims to provide the right runtime for each workload type. The document compares container orchestrators, application platforms, and serverless functions, noting that PKS aims to push workloads higher in the platform hierarchy for more flexibility and less enforcement of standards, while lowering development complexity and improving operational efficiency. It provides recommendations for getting started with migrating workloads to PKS, such as lifting and shifting applications with minimal modernization, leveraging platform capabilities, and fully modernizing applications.
Persistent Memory Programming: The Current State of the Ecosystem (inside-BigData.com)
In this presentation, Andy Rudoff from Intel reports on the latest developments around persistent memory programming. He describes current discussions in the SNIA NVM Programming Technical Work Group, the current state of operating system support, and recent tool and library development, and finally outlines some of the upcoming challenges for high-performance persistent memory use.
Watch the video: https://wp.me/p3RLHQ-gUP
Learn more: http://storageconference.us/
Sign up for our insideHPC Newsletter: http://insidehpc.com/newsletter
Backend Cloud Storage Access in Video Streaming (Rufael Mekuria)
Presentation about optimizing access to backend storage in cloud video streaming deployments, given at Packet Video 2018 in Amsterdam, the Netherlands; joint work with Christina Kylili (TU Delft).
Developer insight into why applications run amazingly fast in CF 2018 (Pavan Kumar)
One of the release goals of ColdFusion 2018 is to improve the out-of-the-box performance of the server to the extent that it becomes the best-performing CFML engine out there. This talk delves into the overall strategy adopted in measuring and improving performance, the design challenges we confronted and resolved, and the optimizations we have made across various CFML constructs. We shall also delve into how a developer can leverage server features and configuration to further improve application performance, and we shall discuss and share the performance metrics collected across various applications. In spite of having a high-performing server one can still face issues, so we shall also demonstrate how developers can track down performance bottlenecks in their applications using the available tools.
Optimizing your SparkML pipelines using the latest features in Spark 2.3 (DataWorks Summit)
The document discusses optimizing Spark machine learning pipelines. It describes using parallel model evaluation to speed up hyperparameter tuning by training multiple models simultaneously. This reduces the time spent on cross-validation for hyperparameter selection. The document also discusses optimizing tuning for pipeline models by treating the pipeline as a directed acyclic graph and parallelizing the fitting in breadth-first order to avoid duplicating work where possible.
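The parallel model evaluation described above can be sketched outside Spark with a small stdlib-only Python example; the `train` function, its scoring formula, and the hyperparameter grid are hypothetical stand-ins for a real estimator, not Spark's actual API:

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product

def train(reg, depth):
    # Hypothetical stand-in for fitting and evaluating one model;
    # a real CrossValidator would fit an estimator and score it here.
    return 1.0 / (1 + reg) + 0.1 * depth

# Candidate hyperparameter grid, analogous to a ParamGridBuilder output.
grid = list(product([0.01, 0.1, 1.0], [2, 4, 6]))

# max_workers=3 mirrors the idea of a parallelism knob: up to three
# candidate models are trained simultaneously instead of sequentially.
with ThreadPoolExecutor(max_workers=3) as pool:
    scores = list(pool.map(lambda p: train(*p), grid))

best_params = grid[max(range(len(grid)), key=scores.__getitem__)]
print(best_params)
```

Because the candidates are independent, wall-clock tuning time shrinks roughly by the parallelism factor, which is the same effect Spark 2.3 exposes for cross-validation.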
The document summarizes a POC conducted using an Oracle Exadata X7-2 system with Oracle VM (OVM) to evaluate performance against an existing IBM P8 system. The POC involved loading an 18TB database onto different Exadata configurations with varying numbers of vCPUs. Initial loads took 48 hours on Exadata compared to over 54 hours on IBM. Exadata achieved a 2x performance increase with 36 vCPUs and low CPU usage, while IBM achieved a 4x increase but required 14 cores and setting optimizer features to an older version.
FabricPool allows automatic tiering of inactive data blocks from performance storage to object storage in the cloud. The presentation covered the key capabilities and use cases of FabricPool, including tiering of primary and secondary data. New features in recent ONTAP releases were also discussed, such as an auto tiering policy, support for additional object storage providers, and client-side encryption.
Bridging Your Business Across the Enterprise and Cloud with MongoDB and NetAppMongoDB
This document discusses how NetApp solutions can help businesses bridge their MongoDB databases across on-premises and cloud environments. It provides an introduction to NetApp and describes how their storage solutions and data fabric can enable hybrid cloud for MongoDB. Specific solutions and technologies discussed include NetApp ONTAP for storage management and provisioning, FlexClone for development/testing, and SolidFire for high performance MongoDB deployments. Customer examples and performance benefits are also summarized.
Artificial intelligence, open source, and IBM Call for Code (Luciano Resende)
In this talk we will cover some of the trends in artificial intelligence and the difficulties in using it. We will also present some open-source tools that can help simplify the adoption of AI, and give a brief introduction to "Call for Code," an IBM initiative to build solutions for preventing and responding to natural disasters.
This document discusses Cloud4Media's IMF package management tools, which allow users to view and manage IMF packages. It describes the tools' abilities to parse metadata from AssetMaps, CPLs, and OPLs; generate previews; and integrate with Cloud4Media's workflow manager to orchestrate tasks like encoding and delivery. A demo is proposed to showcase how the IMF package explorer, proxy generator, and IMF web viewer allow browsing and interacting with IMF assets, and how the workflow manager can initiate processes based on package metadata.
Similar to Challenges for Implementing PMEM Aware Application with PMDK (20)
We have published a document, "A Global Data Infrastructure for Data Sharing Between Businesses".
This document introduces the current trends toward the implementation of digital management tools that support cross border data sharing between businesses, which will be indispensable for future business transformations and pandemic responses. Today we find ourselves at the confluence of multiple evolving global trends. These include the emergence of new data driven business models, the expansion of B2B platform business, the accelerating pace of digital transformation, the growing expectations for the fulfillment of Sustainable Development Goals (SDGs) and other social needs, the rise of New Glocalism, the growth of stakeholder capitalism, and the Great Reset. In this article, we discuss the challenges of establishing a global data infrastructure for data sharing between businesses as a key ICT infrastructure for the construction of a next generation society, and the efforts that are being made to address these challenges.
NTT Laboratories
J. Arai, S. Yagi, H. Uchiyama, T. Honjo, T. Inagaki, K. Inaba, T. Ikuta, H. Takesue, K. Horikawa
This material is a poster exhibited at the ITBL community booth in SC19 (The International Conference for High Performance Computing, Networking, Storage, and Analysis 2019).
NTT Software Innovation Center
Hiroki Miura, Kota Tsuyuzaki, Junya Arai, Kohei Yamaguchi, Kengo Okitsu, Shinji Morishita
This material is a poster exhibited at the ITBL community booth in SC19 (The International Conference for High Performance Computing, Networking, Storage, and Analysis 2019).
NTT is developing a hybrid sourcing approach to address Japan's projected shortage of 430,000 IT engineers by 2025, known as the "Digital Cliff 2025". Their approach combines crowdsourcing, using platforms like Topcoder, with innersourcing by decomposing projects into microtasks that can be completed by both internal and external workers. In a case study, they developed a B2B application using this hybrid model, with crowdsourced and innersourced workers completing 49% and 39% of the code respectively. They aim to create a framework to promote this hybrid sourcing approach within NTT to help organizations overcome skills shortages and achieve digital transformations.
1) The document proposes a method for layer-level pruning of ResNet models to reduce computation costs during inference.
2) It introduces weights to Residual Units to determine their importance, allowing less important units to be erased. Units with small absolute weight values on their nonlinear maps can be erased with little impact.
3) The method repeats training and erasing layers based on unit importance. It erases layers after training and retrains, iteratively erasing more layers until accuracy drops, to prune the model while maintaining performance.
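The erase-by-importance step can be illustrated with a purely toy sketch; the unit names, weight values, and threshold below are invented for illustration and do not come from the paper:

```python
# Toy sketch of importance-based layer pruning: each residual unit carries
# a scalar weight on its nonlinear branch; units whose |weight| falls below
# a threshold are erased (leaving only the identity skip connection),
# mimicking the erase-and-retrain loop described above.
units = {"unit1": 0.9, "unit2": 0.02, "unit3": 0.7, "unit4": -0.01}

def prune(unit_weights, threshold):
    # Keep only units whose nonlinear branch matters; in the real method
    # the model would be retrained after each round of erasing.
    return {name: w for name, w in unit_weights.items() if abs(w) >= threshold}

kept = prune(units, threshold=0.05)
print(sorted(kept))  # unit2 and unit4 are erased
```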
Edge computing solves issues with IoT deployment like data privacy and volume. It also allows companies to gain valuable customer and product data rather than relying on web giants. For CIOs, edge computing influences strategies around data infrastructure, organization, and IT architecture - shifting from offline to real-time analytics, human-readable to machine formats, and app-centric to data-centric designs.
BuildKit is a next-generation build system that provides efficient caching, multi-stage builds, and secure access to private assets without requiring root privileges. It can be deployed on Kubernetes using a DaemonSet or StatefulSet for caching benefits. Build definitions can be provided via Dockerfiles, Buildpacks, or CRDs like Tekton to build images on Kube nodes and push to a remote registry. Consistent hashing with StatefulSets ensures builds always hit the fastest daemon-local cache.
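The consistent-hashing routing mentioned above can be sketched in a few lines of stdlib Python; the pod names and virtual-node count are assumptions for illustration, not BuildKit's actual implementation:

```python
import bisect
import hashlib

def _h(key: str) -> int:
    # Stable hash so routing survives process restarts (unlike hash()).
    return int(hashlib.sha256(key.encode()).hexdigest(), 16)

class ConsistentHashRing:
    """Maps build-context keys to daemon pods so that a given project
    repeatedly lands on the same pod and hits its warm local cache."""

    def __init__(self, pods, vnodes=100):
        # Each pod contributes many virtual nodes for an even spread.
        self._ring = sorted(
            (_h(f"{pod}#{i}"), pod) for pod in pods for i in range(vnodes)
        )
        self._keys = [k for k, _ in self._ring]

    def pod_for(self, context_key: str) -> str:
        # First ring position clockwise of the key's hash (wrapping around).
        idx = bisect.bisect(self._keys, _h(context_key)) % len(self._ring)
        return self._ring[idx][1]

ring = ConsistentHashRing([f"buildkitd-{i}" for i in range(3)])  # StatefulSet pods
# The same repository always routes to the same daemon.
assert ring.pod_for("github.com/example/app") == ring.pod_for("github.com/example/app")
```

Adding or removing one pod only remaps the keys nearest its ring positions, which is why stable StatefulSet identities pair well with this scheme.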
The document discusses utilizing spatiotemporal data from IoT devices in Redis. It proposes using a technique called "ST-coding" to encode location and timestamp data into a single code. This addresses two problems: 1) ST range queries were slow due to searching many keys; and 2) data insertion was inefficient due to load concentration on a single Redis server. By splitting the ST-code into a "PRE-code" and "SUF-code", ST range queries can be performed on a single key, avoiding use of the slow KEYS command. This improves query performance and distributes load across Redis servers.
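The summary does not give the exact ST-coding scheme; one plausible reconstruction is Morton-style bit interleaving of quantized latitude, longitude, and timestamp, sketched here with an assumed 16-bit resolution per dimension:

```python
def st_code(lat: float, lon: float, t: int, bits: int = 16) -> int:
    """Toy spatiotemporal code: quantize lat/lon/time to `bits` bits each
    and interleave them Morton-style, so points that are close in space
    and time get numerically close codes (the exact ST-coding scheme in
    the talk may differ; this is an illustrative reconstruction)."""
    qlat = int((lat + 90) / 180 * ((1 << bits) - 1))
    qlon = int((lon + 180) / 360 * ((1 << bits) - 1))
    qt = t & ((1 << bits) - 1)
    code = 0
    for i in range(bits):
        code |= ((qlat >> i) & 1) << (3 * i + 2)
        code |= ((qlon >> i) & 1) << (3 * i + 1)
        code |= ((qt >> i) & 1) << (3 * i)
    return code

# Nearby points produce nearby codes, so an ST range query becomes a
# range scan over one sorted key space instead of a slow KEYS scan.
a = st_code(35.6812, 139.7671, 1000)   # Tokyo Station, t=1000
b = st_code(35.6813, 139.7672, 1001)   # a few metres / seconds away
c = st_code(-33.8688, 151.2093, 1000)  # Sydney
assert abs(a - b) < abs(a - c)
```

Splitting such a code into a prefix (to choose the server and key) and a suffix (to order entries within the key) matches the PRE-code/SUF-code idea described in the summary.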
8 Best Automated Android App Testing Tools and Frameworks in 2024 (kalichargn70th171)
Regarding mobile operating systems, two major players dominate our thoughts: Android and iPhone. With Android leading the market, software development companies are focused on delivering apps compatible with this OS. Ensuring an app's functionality across various Android devices, OS versions, and hardware specifications is critical, making Android app testing essential.
Unveiling the Advantages of Agile Software Development (brainerhub1)
Learn about the advantages of Agile software development and how to simplify your workflow to spur faster innovation. Jump right in!
E-Invoicing Implementation: A Step-by-Step Guide for Saudi Arabian Companies (Quickdice ERP)
Explore the seamless transition to e-invoicing with this comprehensive guide tailored for Saudi Arabian businesses. Navigate the process effortlessly with step-by-step instructions designed to streamline implementation and enhance efficiency.
What to do when you have a perfect model for your software but you are constrained by an imperfect business model?
This talk explores the challenges of bringing modelling rigour to the business and strategy levels, and talking to your non-technical counterparts in the process.
When it comes to ERP solutions, companies typically meet their needs with common ERP systems like SAP, Oracle, and Microsoft Dynamics. These big players have demonstrated that ERP systems can be either simple or highly comprehensive. This remains true today, but there are new factors to consider, including a promising new contender in the market: Odoo. This blog compares Odoo ERP with traditional ERP systems and explains why many companies now see Odoo ERP as the best choice.
What are ERP Systems?
An ERP, or Enterprise Resource Planning, system provides your company with valuable information to help you make better decisions and boost your ROI. You should choose an ERP system based on your company’s specific needs. For instance, if you run a manufacturing or retail business, you will need an ERP system that efficiently manages inventory. A consulting firm, on the other hand, would benefit from an ERP system that enhances daily operations. Similarly, eCommerce stores would select an ERP system tailored to their needs.
Because different businesses have different requirements, ERP system functionalities can vary. Among the various ERP systems available, Odoo ERP is considered one of the best in the ERP market, with more than 12 million users worldwide today.
Odoo is an open-source ERP system initially designed for small to medium-sized businesses but now suitable for a wide range of companies. Odoo offers a scalable and configurable point-of-sale management solution and allows you to create customised modules for specific industries. Odoo is gaining more popularity because it is built in a way that allows easy customisation, has a user-friendly interface, and is affordable. Here, you will cover the main differences and get to know why Odoo is gaining attention despite the many other ERP systems available in the market.
Need for Speed: Removing speed bumps from your Symfony projects ⚡️ (Łukasz Chruściel)
No one wants their application to drag like a car stuck in the slow lane! Yet it’s all too common to encounter bumpy, pothole-filled solutions that slow down any application. Symfony apps are no exception.
In this talk, I will take you for a spin around the performance racetrack. We’ll explore common pitfalls - those hidden potholes in your application that can cause unexpected slowdowns. Learn how to spot these performance bumps early, and more importantly, how to navigate around them to keep your application running at top speed.
We will focus in particular on tuning your engine at the application level, making the right adjustments to ensure that your system responds like a well-oiled, high-performance race car.
UI5con 2024 - Boost Your Development Experience with UI5 Tooling Extensions (Peter Muessig)
The UI5 tooling is the development and build tooling of UI5. It is built in a modular and extensible way so that it can easily be extended to your needs. This session will showcase various tooling extensions which can considerably boost your development experience, so that you can really work offline, transpile the code in your project to use even newer versions of EcmaScript (newer than 2022, which is supported right now by the UI5 tooling), consume any npm package of your choice in your project, use different kinds of proxies, and even stitch UI5 projects together during development to mimic your target environment.
Measures in SQL (SIGMOD 2024, Santiago, Chile)Julian Hyde
SQL has attained widespread adoption, but Business Intelligence tools still use their own higher level languages based upon a multidimensional paradigm. Composable calculations are what is missing from SQL, and we propose a new kind of column, called a measure, that attaches a calculation to a table. Like regular tables, tables with measures are composable and closed when used in queries.
SQL-with-measures has the power, conciseness and reusability of multidimensional languages but retains SQL semantics. Measure invocations can be expanded in place to simple, clear SQL.
To define the evaluation semantics for measures, we introduce context-sensitive expressions (a way to evaluate multidimensional expressions that is consistent with existing SQL semantics), a concept called evaluation context, and several operations for setting and modifying the evaluation context.
A talk at SIGMOD, June 9–15, 2024, Santiago, Chile
Authors: Julian Hyde (Google) and John Fremlin (Google)
https://doi.org/10.1145/3626246.3653374
UI5con 2024 - Keynote: Latest News about UI5 and it’s EcosystemPeter Muessig
Learn about the latest innovations in and around OpenUI5/SAPUI5: UI5 Tooling, UI5 linter, UI5 Web Components, Web Components Integration, UI5 2.x, UI5 GenAI.
Recording:
https://www.youtube.com/live/MSdGLG2zLy8?si=INxBHTqkwHhxV5Ta&t=0
Transform Your Communication with Cloud-Based IVR SolutionsTheSMSPoint
Discover the power of Cloud-Based IVR Solutions to streamline communication processes. Embrace scalability and cost-efficiency while enhancing customer experiences with features like automated call routing and voice recognition. Accessible from anywhere, these solutions integrate seamlessly with existing systems, providing real-time analytics for continuous improvement. Revolutionize your communication strategy today with Cloud-Based IVR Solutions. Learn more at: https://thesmspoint.com/channel/cloud-telephony
SOCRadar's Aviation Industry Q1 Incident Report is out now!
The aviation industry has always been a prime target for cybercriminals due to its critical infrastructure and high stakes. In the first quarter of 2024, the sector faced an alarming surge in cybersecurity threats, revealing its vulnerabilities and the relentless sophistication of cyber attackers.
SOCRadar’s Aviation Industry, Quarterly Incident Report, provides an in-depth analysis of these threats, detected and examined through our extensive monitoring of hacker forums, Telegram channels, and dark web platforms.
Hand Rolled Applicative User ValidationCode KataPhilip Schwarz
Could you use a simple piece of Scala validation code (granted, a very simplistic one too!) that you can rewrite, now and again, to refresh your basic understanding of Applicative operators <*>, <*, *>?
The goal is not to write perfect code showcasing validation, but rather, to provide a small, rough-and ready exercise to reinforce your muscle-memory.
Despite its grandiose-sounding title, this deck consists of just three slides showing the Scala 3 code to be rewritten whenever the details of the operators begin to fade away.
The code is my rough and ready translation of a Haskell user-validation program found in a book called Finding Success (and Failure) in Haskell - Fall in love with applicative functors.
UI5con 2024 - Bring Your Own Design SystemPeter Muessig
How do you combine the OpenUI5/SAPUI5 programming model with a design system that makes its controls available as Web Components? Since OpenUI5/SAPUI5 1.120, the framework supports the integration of any Web Components. This makes it possible, for example, to natively embed own Web Components of your design system which are created with Stencil. The integration embeds the Web Components in a way that they can be used naturally in XMLViews, like with standard UI5 controls, and can be bound with data binding. Learn how you can also make use of the Web Components base class in OpenUI5/SAPUI5 to also integrate your Web Components and get inspired by the solution to generate a custom UI5 library providing the Web Components control wrappers for the native ones.
Good afternoon, everyone. Thank you for participating in this presentation.
My name is Yoshimi Ichiyanagi. I’m an open-source software engineer at NTT labs in Japan.
I came from Japan a few days ago, so I have jet lag.
I’m very happy to speak to you today.
I’d like to talk about the challenges of implementing PMEM-aware applications with PMDK.
The purpose of my presentation is to share the know-how gained by rewriting applications to be PMEM-aware.
This is an overview of my talk.
Firstly, my introduction; secondly, our background and motivation; thirdly, challenges for implementing PMEM-aware applications; and finally, challenges in performance evaluation to get valid results.
First of all, let me introduce myself.
I have worked at Software Innovation Center of NTT labs for twelve years.
NTT is a Japanese telecommunication company.
NTT Software Innovation Center is undertaking promotion of open innovation centering on the development of open source platforms.
There are many committers and developers of open source software, such as OpenStack Swift and Apache projects, in NTT labs.
I worked on system software: distributed filesystems such as HDFS, and operating systems such as the Linux kernel.
Now, I work on storage applications.
Today I’d like to talk about rewriting open source software using a new storage and a new library.
Next, I’d like to talk about our research background and motivation.
Persistent memory, such as NVDIMM-N and Intel Optane DC Persistent Memory, is beginning to be supplied.
PMEM has several features. Its memory-like features are low latency and byte addressability, and its storage-like features are large capacity and non-volatility.
We are trying to rewrite storage applications so that these PMEM features are utilized. Candidate storage applications are RDBMSs such as PostgreSQL, message queue systems such as Apache Kafka, and so on.
Let me share my challenges to rewrite the PMEM-aware applications and evaluate their performance.
Next, I’d like to talk about how to use PMEM.
We used the following hardware and software.
There are 3 components: NVDIMM-N, Direct Access for files (DAX), and the Persistent Memory Development Kit, PMDK.
DAX enabled FS and PMDK are already supported on Linux and Windows.
The previous speakers have talked about them, so I think you already know them.
But I’m going to explain DAX FS and PMDK just in case someone doesn’t know them.
I’ll describe 3 patterns of I/O stacks.
First, the left figure shows the current I/O stack.
Many applications use file I/O APIs such as the read and write system calls.
Second, the center figure shows DAX FS.
A DAX filesystem is a filesystem which allows applications direct access to persistent memory without the page cache. So it can run faster than before, since unnecessary copies are removed.
But a context switch is still needed between user and kernel space, and this context switch becomes overhead.
[Move the pointer and advance the slide]
Finally, the right figure shows how to access PMEM with DAX FS and PMDK.
DAX filesystem also provides memory-mapped file feature.
DAX FS maps the file on PMEM directly into the virtual address space of the application.
The application can use CPU instructions to access the file data without context switches, so it can run much faster than before.
Here, PMDK provides primitive memory functions, such as CPU cache flushes and memory barriers, to make sure that the data reaches PMEM.
So, by using PMDK, we don’t need to write such functions ourselves. [end]
The biggest benefit of DAX-enabled FS is that it is not necessary to rewrite applications.
Memory-mapped files and PMDK can improve the performance of I/O-intensive workloads, because they reduce context switches and the overhead of API calls.
But PMDK cannot be used without changing the application. So I rewrote PostgreSQL and Apache Kafka in order to improve their performance with PMDK.
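To make the memory-mapped I/O model concrete, here is a minimal sketch in Python, whose mmap module wraps the same mmap/msync calls discussed here; the file path is made up for illustration, and on a DAX filesystem the equivalent C calls would go straight to PMEM without the page cache:

```python
import mmap
import os
import tempfile

# Create a small file (the path is made up for illustration).
path = os.path.join(tempfile.mkdtemp(), "datafile")
with open(path, "wb") as f:
    f.truncate(4096)  # a mapping needs a non-zero file size

# Map the file and store into it with plain memory writes --
# no read()/write() syscall per access.
with open(path, "r+b") as f:
    m = mmap.mmap(f.fileno(), 4096)
    m[0:5] = b"hello"   # CPU store instructions, not a write() syscall
    m.flush()           # msync(): make sure the data reaches the medium
    m.close()
```

In the PMDK version, the flush step is replaced by user-space cache-flush instructions, which is exactly the part PMDK takes care of.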
I think there are 3 features of PMDK.
First, as I said, an application accesses PMEM as memory-mapped I/O.
Second, application developers can select a fine-grained sync size. I’ll show the details in the next slides.
Finally, CPU instructions suitable for the copied data size are selected.
1. For example, when the server’s CPUs support SSE2, data is copied with 16-byte registers where possible.
2. When the server’s CPUs support AVX, data is copied with 32-byte registers where possible.
3. And if more recent CPUs are running on the server, data is copied with 64-byte registers where possible.
Furthermore, the MOVNT (non-temporal move) and SFENCE instructions store data to memory, bypassing the CPU caches.
Next, I explain on the second feature of PMDK.
The second feature is that PMDK provides some sync APIs. In particular, I’ll talk about pmem_msync() and pmem_drain(). Both are PMDK APIs.
The key difference between pmem_msync() and pmem_drain() is what kind of data is flushed.
pmem_msync() calls the msync syscall.
The most general sync function for a memory-mapped file is the msync syscall; by calling it, both the file metadata and the written file data are flushed.
On the other hand, by calling pmem_drain(), only the written file data is flushed.
So, pmem_drain() is faster than pmem_msync().
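The data-only versus data-plus-metadata split also exists in POSIX file I/O, which is an easy way to get a feel for it: fdatasync() flushes only the file data, roughly like pmem_drain(), while fsync() also flushes the metadata, roughly like pmem_msync(). A minimal Python sketch (the file name is illustrative, and the analogy is only approximate, since pmem_drain() drains CPU store buffers rather than calling into the kernel):

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "log")  # illustrative file name
fd = os.open(path, os.O_CREAT | os.O_RDWR, 0o600)

os.write(fd, b"record-1")
os.fdatasync(fd)  # flushes only the file data (cheaper, like pmem_drain())
os.write(fd, b"record-2")
os.fsync(fd)      # flushes data *and* metadata (like pmem_msync()/msync())
os.close(fd)
```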
We are trying to rewrite PostgreSQL and Apache Kafka, as storage applications, in order to improve their I/O performance with PMDK.
Let me share know-how gained by rewriting PostgreSQL.
We looked into PostgreSQL to find out what kinds of files it uses.
We chose 2 targets: checkpoint files and WAL files.
Checkpoint files were chosen because many writes occur during a checkpoint.
WAL files were chosen because they are critical for transaction performance.
I’ll talk about challenges for implementing PostgreSQL.
But before that, I’d like to talk about how to rewrite PostgreSQL.
We simply replaced standard file I/O (several syscalls) with mmap I/O (PMDK APIs).
As I said, we chose two points. That’s checkpoint files and WAL files.
First, I’ll show you how to hack checkpoint files.
In PostgreSQL, a huge table and so forth consists of multiple checkpoint files.
The size of a checkpoint file is up to 1GB, so it is necessary to resize checkpoint files.
PMDK provides APIs for memory-mapped files, and it is difficult to resize a memory-mapped file without overhead. I think the best practice is to access only fixed-size files. So we changed the code to access only 1GB checkpoint files.
Then, what’s this overhead?
It’s remapping a file. I’d like to talk about how to resize a memory-mapped file.
When the memory-mapped file is enlarged, only part of the file can be accessed with PMDK.
This range [pointing at the other part] can’t be accessed with PMDK, so remapping the file is needed.
The file is remapped so that the whole file can be accessed with PMDK.
Next, I’ll show you how to implement resizing the memory-mapped file.
Here, it’s how to enlarge a file.
On a DAX-enabled FS, the open and close syscalls are called.
Then the largest written data offset becomes the file size, so application developers don’t need to call the ftruncate syscall in their source code.
On the other hand, on a DAX-enabled FS with PMDK, pmem_map_file(), pmem_unmap(), the truncate syscall, pmem_map_file(), and pmem_unmap() are called.
Three function calls are added to use PMDK, and these three functions become overhead.
Here, it’s how to shrink a file.
On a DAX-enabled FS, the open, ftruncate, and close syscalls are called.
On the other hand, on a DAX-enabled FS with PMDK, pmem_map_file(), pmem_unmap(), the truncate syscall, pmem_map_file(), and pmem_unmap() are called. Two function calls are added to use PMDK, and these two functions become overhead.
Then, if data is written outside the mapped range, a segmentation fault happens.
I think that when the file is remapped, it’s necessary to manage the mapped address. Of course, while the file is unmapped, it can’t be accessed with PMDK. I think lock functions are needed, and the application performance drops.
In these cases, it would be better to use only DAX FS. [end]
So it’s difficult to use PMDK unless the file size is fixed. To resize with PMDK, munmap(), close(), open(), and mmap() are called again.
Repeating the remapping many times degrades performance, because remapping a file has a large overhead, and mapping large files may fill up the filesystem.
I think the best practice is to use only fixed-size files.
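The remapping sequence described above can be sketched with Python’s mmap module, with close(), truncate, and a fresh mmap() standing in for pmem_unmap(), the truncate syscall, and pmem_map_file(); the sizes and file name are illustrative:

```python
import mmap
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "segment")  # illustrative name
with open(path, "wb") as f:
    f.truncate(4096)

f = open(path, "r+b")
m = mmap.mmap(f.fileno(), 4096)
m[:4] = b"data"

# Enlarge the file: the old mapping covers only 4 KiB, so we must
# unmap, truncate to the new size, and map again -- the same
# pmem_unmap() / truncate / pmem_map_file() sequence.
m.close()                        # stands in for pmem_unmap()
os.truncate(path, 8192)          # the truncate syscall
m = mmap.mmap(f.fileno(), 8192)  # stands in for pmem_map_file()
m[8191:8192] = b"!"              # the enlarged range is now accessible
m.flush()
m.close()
f.close()
```

Between the unmap and the new map, the data is unreachable through the mapping, which is why the locking mentioned above becomes necessary.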
Next, I’ll show you how to hack WAL files.
The size of WAL files is fixed. Fixed-size files are highly suitable for memory-mapped I/O, because it is not necessary to either enlarge or shrink them.
For initialization of a WAL file, PostgreSQL creates the file and fills it with zeros, then flushes the file metadata at the end of initialization. This metadata includes the file size, the indirect blocks of the inode, and so on.
On the other hand, synchronous logging consists of sequential synchronous writes, where it is necessary to flush only the written data.
PMDK provides some sync APIs, for example pmem_msync() and pmem_drain().
Which sync function is better for initialization of the WAL file? And which sync function is better for "synchronous logging"?
It would be better to use pmem_msync() for initialization of the WAL file, because the file metadata should be flushed.
On the other hand, either is fine for synchronous logging. But pmem_drain() is faster than pmem_msync(), so we selected pmem_drain() for synchronous logging.
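A rough Python sketch of the two flush choices, using a small, hypothetical segment size (the real PostgreSQL WAL segment is 16MB) and os.fsync plus a ranged mmap.flush() as stand-ins for pmem_msync() and pmem_drain():

```python
import mmap
import os
import tempfile

WAL_SIZE = 64 * 1024  # hypothetical small segment; the real one is 16MB

path = os.path.join(tempfile.mkdtemp(), "wal_segment")  # illustrative name

# Initialization: create the fixed-size file, zero-fill it, and flush
# once including metadata (the pmem_msync()/msync() role).
with open(path, "wb") as f:
    f.write(b"\0" * WAL_SIZE)
    f.flush()
    os.fsync(f.fileno())

# Synchronous logging: copy each record into the mapping, then flush
# only the written range -- the data-only flush that pmem_drain() gives.
f = open(path, "r+b")
m = mmap.mmap(f.fileno(), WAL_SIZE)
offset = 0
for record in (b"BEGIN;", b"INSERT;", b"COMMIT;"):
    m[offset:offset + len(record)] = record
    offset += len(record)
m.flush(0, mmap.PAGESIZE)  # flush the dirty page only, not metadata
m.close()
f.close()
```

Because the segment size never changes, no remapping is ever needed on this path.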
Next, I’d like to talk about comparing the performance between pmem_msync() and pmem_drain().
I ran a micro-benchmark to compare the performance of pmem_msync() and pmem_drain(). It emulates the WAL file I/O inside PostgreSQL; I measured only synchronous logging.
Here is the detail of the micro-benchmark: I replaced the write syscall with the PMDK API pmem_memcpy_nodrain(), and I replaced the fdatasync syscall with a PMDK API, either pmem_msync() or pmem_drain().
This is the evaluation setup.
I used one HPE computer server with two NUMA nodes.
I ran the micro-benchmark on Node 1, because there was PMEM on Node 1.
The total size of the written data is 10GB and the block size is 8KB.
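The shape of such a micro-benchmark can be sketched in Python; the sizes here are scaled down from the 10GB / 8KB setup so the sketch runs quickly, the block size is rounded up to the mmap allocation granularity so the ranged flush stays aligned, and the per-block copy plus flush stand in for pmem_memcpy_nodrain() plus the sync call:

```python
import mmap
import os
import tempfile
import time

# Scaled-down sizes: 8 KiB blocks (rounded up to the mmap allocation
# granularity so the ranged flush stays aligned), a few MiB in total
# instead of the 10 GB used in the real evaluation.
BLOCK = max(8 * 1024, mmap.ALLOCATIONGRANULARITY)
TOTAL = BLOCK * 512

path = os.path.join(tempfile.mkdtemp(), "bench")  # illustrative name
with open(path, "wb") as f:
    f.truncate(TOTAL)

f = open(path, "r+b")
m = mmap.mmap(f.fileno(), TOTAL)
block = b"x" * BLOCK

start = time.perf_counter()
for off in range(0, TOTAL, BLOCK):
    m[off:off + BLOCK] = block  # stands in for pmem_memcpy_nodrain()
    m.flush(off, BLOCK)         # stands in for the per-block sync call
elapsed = time.perf_counter() - start
m.close()
f.close()

throughput_gb_s = TOTAL / elapsed / 2**30
print(f"{throughput_gb_s:.2f} GB/s")
```

Swapping the per-block flush between a full msync-style flush and a data-only flush is what produces the comparison on the next slide.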
The micro-benchmark I/O throughput on the DAX-enabled FS is 4.3 GB/sec, the throughput with pmem_msync() is 0.0023 GB/sec, and the throughput with pmem_drain() is 15 GB/sec.
pmem_drain() is the fastest of the three patterns, as expected. [end]
pmem_drain() greatly improves performance of I/O-intensive workload.
But you should use pmem_drain() with caution.
pmem_drain() can’t flush file metadata. So pmem_msync() should be called by applications that use file metadata such as the time of last modification, the time of last access, and so on.
In addition, pmem_drain() doesn’t work without pmem_memcpy_nodrain().
Now, I'd like to move on to the next topic, which is challenges for performance evaluation to get valid results.
It’s difficult to get valid results in PostgreSQL performance evaluation.
Results such as application performance depend on NUMA and the CPUs. In order to get valid results, it is better to avoid NUMA effects and to avoid the CPUs becoming a hotspot, and it’s necessary to tune the application for PMEM.
I’d like to talk about how to evaluate the performance to get the valid result.
Now, I’d like to talk about NUMA effect.
Would you please look at this figure [pointing], which shows our evaluation setup. For the CPU on node 1, access to local memory on node 1 is fast, while access to remote node 0 is slow.
This also applies to PCIe SSDs, but persistent memory is more sensitive.
I ran the micro-benchmark on the local NUMA node 1, and I also ran it on the remote node 0.
This benchmark is the same as before.
The I/O throughput on the local node is 15 GB/sec, and on the remote node it is 11 GB/sec.
Synchronous writes are about 1.4 times faster on the local NUMA node than on the remote NUMA node.
Finally, I’d like to talk about tuning an application for persistent memory.
It’s important to avoid calculation processing becoming a hotspot; for example, SQL processing in PostgreSQL.
Stored procedures improve PostgreSQL performance, since user-defined functions are pre-compiled and stored in the PostgreSQL server.
For that purpose, we used the pgbench command with the prepared option, which is this command line [pointing].
I ran the pgbench client with the prepared option, and also without it, and compared them.
pgbench is a popular PostgreSQL benchmarking tool used by many developers to run quick performance tests.
I ran this PostgreSQL server on Node 1 and I ran the client on Node 0.
I wrote patches for PMEM-aware PostgreSQL; they are available on the pgsql-hackers mailing list.
The improvement ratio from stored procedures is 12 points higher with PMEM than with the SSD.
On the Intel Optane SSD, with the original PostgreSQL, the pgbench result with stored procedures is 29,125 tps, and without stored procedures it is 18,396 tps.
The result with stored procedures is 1.58 times faster than without.
On persistent memory, with PMEM-aware PostgreSQL, the pgbench result with stored procedures is 36,449 tps, and without stored procedures it is 21,406 tps.
The result with stored procedures is 1.70 times faster than without.
The improvement ratio using PMEM is higher than the one using the Intel Optane SSD, as expected.
I’d like to give you a quick summary of what we’ve seen today.
The topics covered today were how to implement a PMEM-aware application and how to evaluate it to get valid results.
By applying PMDK to PostgreSQL, I learned that it is difficult both to implement PMEM-aware applications and to get valid results in performance evaluation.
I hope that my presentation answered your questions.
Thank you so much for your kind attention. Now, are there any questions?[end]
Q&A
That’s a good question. Or Thank you for your question.
Could you speak a little slower, please? Or Could you repeat the question?
Does that make sense? Or: Do you have any questions so far?
Most people will be satisfied with an Optane SSD; systems that require low latency need PMEM.
What is the difference between A and B?
If not, thank you very much.
DAX FS tries to map 2MB linear spaces of PMEM to each file where possible.
Linux does not know the virtual addresses where the application has written data, and Linux manages memory on a page basis.
Linux marks a page dirty when a write page fault occurs, and DAX FS uses huge pages where possible. So, when the msync syscall is called, a 2MB dirty page and the file metadata are flushed, which has a huge overhead.
I rewrote PostgreSQL this way, and it has a huge overhead.
So my coworker, Menjo, is developing this. I guess it will be published next year, maybe.