This document provides a historical overview of the evolution of FPGA technology and programming approaches over several decades. It discusses early theoretical foundations in the 1930s-40s and the development of integrated circuits, hardware description languages, and high-level synthesis tools from the 1950s onwards. More recently, it describes the rise of heterogeneous computing using GPUs, FPGAs and other accelerators, and the ongoing challenges around programming such systems at a suitable level of abstraction.
Huawei’s requirements for the ARM-based HPC solution readiness - Joshua Mora, Linaro
Huawei outlines requirements for developing a competitive ARM-based HPC solution. They plan a two-phase strategy using existing Hi1616 platforms followed by more powerful Hi1620 platforms. Requirements include high-performance CPUs, optimized software stack, support for applications and ISVs, and cloud deployment. Huawei aims to demonstrate ARM's value in HPC by 2018-2020 through partnerships and turnkey solutions.
Session ID: HKG18-312
Session Name: HKG18-312 - CMSIS-NN
Speaker: Rod Crawford
Track: LEG;Ecosystem Day
★ Session Summary ★
Deep learning algorithms are gaining popularity in IoT edge devices because of their human-level accuracy in many applications, such as image classification and speech recognition. There is increasing interest in deploying neural networks (NN) on always-on systems such as Arm Cortex-M microcontrollers. In this session, we introduce the challenges of deploying neural networks on microcontrollers with limited memory/compute resources and power budgets. We introduce CMSIS-NN, optimized software kernels that enable deployment of neural networks on Cortex-M cores. We further present NN architecture exploration, using a keyword spotting application as an example, to develop lightweight models suitable for resource-constrained systems.
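CMSIS-NN gets much of its footprint and speed advantage by operating on 8-bit fixed-point ("q7") weights and activations instead of 32-bit floats. The sketch below illustrates that quantization step in NumPy; it is a hypothetical example, not the CMSIS-NN API, and the function names and `frac_bits` parameter are invented for illustration.

```python
import numpy as np

def quantize_q7(weights, frac_bits):
    """Symmetric fixed-point quantization to 8-bit 'q7' values.

    Illustrative only -- this is not the CMSIS-NN API; CMSIS-NN
    consumes weights already converted offline to a q7-style format.
    """
    scaled = np.round(weights * (1 << frac_bits))
    return np.clip(scaled, -128, 127).astype(np.int8)

def dequantize_q7(q, frac_bits):
    return q.astype(np.float32) / (1 << frac_bits)

# A toy fully connected layer: float32 weights vs. 8-bit q7 weights.
rng = np.random.default_rng(0)
w = rng.uniform(-1.0, 1.0, size=(64, 64)).astype(np.float32)
wq = quantize_q7(w, frac_bits=7)

print(w.nbytes)   # 16384 bytes as float32
print(wq.nbytes)  # 4096 bytes as q7 -- a 4x memory saving
print(float(np.max(np.abs(w - dequantize_q7(wq, 7)))))  # small rounding error (about 1/128 worst case)
```

The 4x reduction in weight storage, plus integer-only arithmetic, is what makes inference feasible within a Cortex-M-class memory and power budget.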
---------------------------------------------------
★ Resources ★
Event Page: http://connect.linaro.org/resource/hkg18/hkg18-312/
Presentation: http://connect.linaro.org.s3.amazonaws.com/hkg18/presentations/hkg18-312.pdf
Video: http://connect.linaro.org.s3.amazonaws.com/hkg18/videos/hkg18-312.mp4
---------------------------------------------------
★ Event Details ★
Linaro Connect Hong Kong 2018 (HKG18)
19-23 March 2018
Regal Airport Hotel Hong Kong
---------------------------------------------------
Keyword: LEG;Ecosystem Day
http://www.linaro.org
http://connect.linaro.org
---------------------------------------------------
Follow us on Social Media
https://www.facebook.com/LinaroOrg
https://www.youtube.com/user/linaroorg?sub_confirmation=1
https://www.linkedin.com/company/1026961
High-Performance and Scalable Designs of Programming Models for Exascale Systems - inside-BigData.com
- The document discusses programming models and challenges for exascale systems. It focuses on MPI and PGAS models like OpenSHMEM.
- Key challenges include supporting hybrid MPI+PGAS programming, efficient communication for multi-core and accelerator nodes, fault tolerance, and extremely low memory usage.
- The MVAPICH2 project aims to address these challenges through its high performance MPI and PGAS implementation and optimization of communication for technologies like InfiniBand.
Recent Advances in Machine Learning: Bringing a New Level of Intelligence to ... - Brocade
Presentation by Brocade Chief Scientist and Fellow, David Meyer, given at Orange Gardens July 2016. What is Machine Learning and what is all the excitement about?
An associated blog is available here: http://community.brocade.com/t5/CTO-Corner/Networking-Meets-Artificial-Intelligence-A-Glimpse-into-the-Very/ba-p/88196
Sean Hefty from Intel presented this deck at the OpenFabrics Workshop.
“With its initial release two years ago, libfabric advanced the state of fabric software interfaces. One of the promises of OFI was extensibility: adapting to increased demands of fabric services from applications. This session explores the first major enhancements to the libfabric API in response to user demands and learnings.”
Watch the video: http://insidehpc.com/2017/04/video-advancing-open-fabrics-interfaces/
Learn more: http://hpcadvisorycouncil.com/events/2017/swiss-workshop/agenda.php
Sign up for our insideHPC Newsletter: http://insidehpc.com/newsletter
In this deck from the University of Houston CACDS HPC Workshop, Jeff Larkin from Nvidia presents: The Past, Present, and Future of OpenACC.
"OpenACC is an open specification for programming accelerators with compiler directives. It aims to provide a simple path for accelerating existing applications for a wide range of devices in a performance-portable way. This talk will discuss the history and goals of OpenACC, how it is being used today, and what challenges it will address in the future."
Watch the video presentation: http://wp.me/p3RLHQ-dTm
Edge Computing: A Unified Infrastructure for all the Different Pieces - Cloudify Community
Edge Computing along with 5G promises to revolutionize customer experience with immersive applications that we can only imagine at this point. The edge will include PNFs, VNFs, and mobile-edge applications; requiring containers, virtual machines and bare-metal compute. But while edge computing promises numerous new revenue streams, managing and orchestrating these edge infrastructure environments is not going to be a seamless, instant process. In this webinar, experts in NFV orchestration discuss the concerns you must address in the transition to the edge, and show how you can use available open source tools to create a single management environment for PNFs, VNFs, and mobile-edge applications.
In this deck from the 2017 MVAPICH User Group, DK Panda from Ohio State University presents: Overview of the MVAPICH Project and Future Roadmap.
"This talk will provide an overview of the MVAPICH project (past, present and future). Future roadmap and features for upcoming releases of the MVAPICH2 software family (including MVAPICH2-X, MVAPICH2-GDR, MVAPICH2-Virt, MVAPICH2-EA and MVAPICH2-MIC) will be presented. Current status and future plans for OSU INAM, OEMT and OMB will also be presented."
Watch the video: https://www.youtube.com/watch?v=wF7t-oH7wi4
Sign up for our insideHPC Newsletter: http://insidehpc.com/newsletter
Macromolecular crystallography is an experimental technique for exploring the 3D atomic structure of proteins, used by academics for research in biology and by pharmaceutical companies in rational drug design. While development of the technique was until recently limited by the performance of scientific instruments, computing performance has now become a key limitation. In my presentation I will present the computing challenge of handling an 18 GB/s data stream coming from the new X-ray detector. I will show PSI's experiences in applying conventional hardware to the task and why this attempt failed. I will then present how the IC 922 server with OpenCAPI-enabled FPGA boards allowed us to build a sustainable and scalable solution for high-speed data acquisition. Finally, I will give a perspective on how advances in hardware development will enable better science by users of the Swiss Light Source.
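To get a rough sense of scale, the 18 GB/s figure from the abstract can be sized with simple arithmetic. The frame size and per-link bandwidth below are illustrative assumptions, not figures from the talk:

```python
# Back-of-envelope sizing for a sustained 18 GB/s detector stream.
# The frame size and per-link bandwidth are illustrative assumptions.

stream_bytes_per_s = 18e9               # 18 GB/s, as stated in the abstract
frame_bytes = 4 * 1024 * 1024           # hypothetical 4 MiB per detector frame

frames_per_s = stream_bytes_per_s / frame_bytes
print(f"{frames_per_s:.0f} frames/s")   # ~4292 frames/s

hour_bytes = stream_bytes_per_s * 3600  # one hour of continuous acquisition
print(f"{hour_bytes / 1e12:.1f} TB/hour")  # 64.8 TB/hour

# A PCIe Gen3 x16 link sustains roughly 12 GB/s in practice, so a single
# conventional link cannot absorb the stream on its own.
links_gen3 = stream_bytes_per_s / 12e9
print(f"{links_gen3:.1f} Gen3 x16 links needed")  # 1.5
```

Numbers like these (tens of TB per hour, more than one conventional host link) are why direct FPGA ingest over a high-bandwidth coherent interconnect such as OpenCAPI becomes attractive.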
This document summarizes a presentation given at the London Open Source Meetup for RISC-V on April 19, 2021. The presentation introduced the RISC-V Online Tutor, an online course for learning RISC-V fundamentals from digital logic to C programming. It provided an overview of the course structure and lessons, which take students through RISC-V assembly, processor design, and application development. It also demonstrated the online learning platform and its ability to interact with remote FPGA hardware during lessons. The goal is to invite community participation and collaboration to further develop the Online Tutor.
This document lists several SDN projects, controllers, simulators, and research areas. It describes Python and C++ based OpenFlow controllers like SNAC and HELIOS. It also lists simulators that support SDN including POX with GEPHI for SDN virtualization and OpenFlow VM simulation. Finally, it outlines significant research areas for SDN PhD projects such as software-defined wireless networking and software-defined mobile networks.
This document discusses how HPC infrastructure is being transformed with AI. It summarizes that cognitive systems use distributed deep learning across HPC clusters to speed up training times. It also outlines IBM's hardware portfolio expansion for AI training, inference, and storage capabilities. The document discusses software stacks for AI like Watson Machine Learning Community Edition that use containers and universal base images to simplify deployment.
Challenges and Opportunities for HPC Interconnects and MPI - inside-BigData.com
In this video from the 2017 MVAPICH User Group, Ron Brightwell from Sandia presents: Challenges and Opportunities for HPC Interconnects and MPI.
"This talk will reflect on prior analysis of the challenges facing high-performance interconnect technologies intended to support extreme-scale scientific computing systems, how some of these challenges have been addressed, and what new challenges lay ahead. Many of these challenges can be attributed to the complexity created by hardware diversity, which has a direct impact on interconnect technology, but new challenges are also arising indirectly as reactions to other aspects of high-performance computing, such as alternative parallel programming models and more complex system usage models. We will describe some near-term research on proposed extensions to MPI to better support massive multithreading and implementation optimizations aimed at reducing the overhead of MPI tag matching. We will also describe a new portable programming model to offload simple packet processing functions to a network interface that is based on the current Portals data movement layer. We believe this capability will offer significant performance improvements to applications and services relevant to high-performance computing as well as data analytics."
Watch the video: https://wp.me/p3RLHQ-hhK
Learn more: http://mug.mvapich.cse.ohio-state.edu/program/
Sign up for our insideHPC Newsletter: http://insidehpc.com/newsletter
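The tag matching overhead mentioned in Brightwell's talk comes from MPI's matching rules: an incoming message must be matched against posted receives by (source, tag), with wildcard support, and messages that arrive before a matching receive is posted are parked on an unexpected-message queue. Below is a toy Python model of the two queues; real implementations such as MVAPICH2 use far more elaborate structures, and this sketch only illustrates where the linear-search cost comes from.

```python
MPI_ANY_TAG = -1  # wildcard, mirroring MPI's MPI_ANY_TAG

class ToyMatcher:
    """Toy model of MPI tag matching with two linearly searched queues."""

    def __init__(self):
        self.posted = []      # receives waiting for a message: (source, tag)
        self.unexpected = []  # messages that arrived early: (source, tag, payload)

    def post_recv(self, source, tag):
        # First check messages that arrived before the receive was posted.
        for i, (src, t, payload) in enumerate(self.unexpected):
            if src == source and (tag == MPI_ANY_TAG or t == tag):
                del self.unexpected[i]
                return payload            # matched an early arrival
        self.posted.append((source, tag))
        return None                       # receive must wait

    def arrive(self, source, tag, payload):
        # Walk the posted-receive queue looking for a match.
        for i, (src, t) in enumerate(self.posted):
            if src == source and (t == MPI_ANY_TAG or t == tag):
                del self.posted[i]
                return payload            # delivered straight to the receive
        self.unexpected.append((source, tag, payload))
        return None                       # parked on the unexpected queue

m = ToyMatcher()
assert m.post_recv(0, 42) is None         # nothing has arrived yet
assert m.arrive(0, 42, "data") == "data"  # matches the posted receive
assert m.arrive(1, 7, "early") is None    # no receive posted: unexpected
assert m.post_recv(1, MPI_ANY_TAG) == "early"
```

With massive multithreading, many threads grow these queues concurrently, so each match walks a longer list; that is the overhead the proposed MPI extensions aim to reduce.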
This presentation covers various partners and collaborators who are currently working with the OpenPOWER Foundation, use cases of OpenPOWER systems in multiple industries, OpenPOWER workgroups, and OpenCAPI features.
For the full video of this presentation, please visit:
https://www.embedded-vision.com/platinum-members/embedded-vision-alliance/embedded-vision-training/videos/pages/may-2018-embedded-vision-summit-trevett
For more information about embedded vision, please visit:
http://www.embedded-vision.com
Neil Trevett, President of the Khronos Group and Vice President at NVIDIA, presents the "APIs for Accelerating Vision and Inferencing: Options and Trade-offs" tutorial at the May 2018 Embedded Vision Summit.
The landscape of SDKs, APIs and file formats for accelerating inferencing and vision applications continues to rapidly evolve. Low-level compute APIs, such as OpenCL, Vulkan and CUDA are being used to accelerate inferencing engines such as OpenVX, CoreML, NNAPI and TensorRT. Inferencing engines are being fed via neural network file formats such as NNEF and ONNX. Some of these APIs, like OpenCV, are vision-specific, while others, like OpenCL, are general-purpose. Some engines, like CoreML and TensorRT, are supplier-specific, while others, such as OpenVX, are open standards that any supplier can adopt. Which ones should you use for your project?
In this presentation, Trevett presents the current landscape of APIs, file formats and SDKs for inferencing and vision acceleration, explaining where each one fits in the development flow. Trevett also highlights where these APIs overlap and where they complement each other, and previews some of the latest developments in these APIs.
Shrilesh Kathe has experience developing ETL solutions using Apache Spark, Python, and Hive. He has worked as an offshore ETL developer for Nike implementing their WEST BI CWA system using Spark SQL, Python, and Hive. Some of his responsibilities included developing a generic ETL solution to capture web traffic data, writing PySpark applications to parse raw data and store it in partitioned tables, and facilitating XML parsing and logical functions in Python. He also has a Bachelor's degree in Electronics and Telecommunication and certifications in Apache Spark and Java.
The document discusses porting OpenJ9 JDK to RISC-V architecture. It involves preparing the software toolchain for cross-compilation to RISC-V, preparing hardware like the HiFive Unleashed development board, and developing OpenJ9 JDK through a mix of local and cross compilation. The status shows OpenJ9 JDK can execute in interpreter mode on the RISC-V emulator and HiFive board running Debian, with future work planned on JIT support, different GC strategies, and supporting other Java versions.
Session ID: BUD17-503
Session Name: The HPE Machine and Gen-Z - BUD17-503
Speaker:
Grant Likely
Track:
★ Session Summary ★
With the exponential rise in quantity of data to manage, the modern data centre is increasingly limited by the capacity of individual machines. Since storage and compute demand more capacity than can be provided by a single machine, we distribute both over large clusters and use the network to transfer data between where it is stored and where it is processed. Moving all that data around uses deep storage stacks which incur a significant performance impact. If we could somehow flatten the storage stack and provide applications with direct access to data, then we could improve performance by orders of magnitude.
Hewlett Packard Enterprise recently demonstrated that we can do exactly that with their research project, "The Machine". Instead of moving data around with a network, The Machine uses multiple terabytes of persistent memory and a next-generation fabric-attached memory interconnect to provide a single pool of storage that can be accessed by any processor in the cluster. It shows that we can provide applications with immediate load/store access to huge data sets in a model called Memory-Driven Computing.
Proof in hand, now it is time to bring Memory-Driven Computing to the data centre. Gen-Z is an open systems interconnect designed to provide memory-semantic access to data and devices via direct-attached, switched or fabric topologies. HPE has joined the Gen-Z consortium and is using the knowledge gained with The Machine to help shape Gen-Z and set the stage for true Memory-Driven Computing. By putting memory at the centre, we can overcome the limitations of today's computing systems and power new innovations.
This session will cover two topics. It will start with a status update on The Machine and an overview of how it works. Then we'll shift into an introduction of Gen-Z, and how it can reshape the architecture of computing in the years to come.
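The load/store model described above can be mimicked on any POSIX system with `mmap`: the application gets byte-addressable access to a pool of bytes with no read()/write() storage stack in the data path. The sketch below uses an ordinary file as a stand-in for fabric-attached persistent memory; it illustrates the programming model only, not Gen-Z hardware.

```python
import mmap
import os
import struct
import tempfile

# Stand-in for a fabric-attached persistent memory pool: an mmap'ed
# file gives the application direct load/store access to bytes.
path = os.path.join(tempfile.mkdtemp(), "pool.bin")
with open(path, "wb") as f:
    f.truncate(4096)                    # a tiny 4 KiB "memory pool"

with open(path, "r+b") as f:
    pool = mmap.mmap(f.fileno(), 4096)
    struct.pack_into("<q", pool, 0, 123456789)   # a store
    value, = struct.unpack_from("<q", pool, 0)   # a load
    pool.flush()                                 # persist (msync)
    pool.close()

print(value)  # 123456789
```

In Memory-Driven Computing the same loads and stores land directly in a shared, persistent pool visible to every processor in the cluster, which is what lets the deep storage stack be flattened away.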
---------------------------------------------------
★ Resources ★
Event Page: http://connect.linaro.org/resource/bud17/bud17-503/
Presentation:
Video: https://youtu.be/1BVtChDQVyQ
---------------------------------------------------
★ Event Details ★
Linaro Connect Budapest 2017 (BUD17)
6-10 March 2017
Corinthia Hotel, Budapest,
Erzsébet krt. 43-49,
1073 Hungary
---------------------------------------------------
Keyword: HPE, Gen-Z
http://www.linaro.org
http://connect.linaro.org
---------------------------------------------------
Follow us on Social Media
https://www.facebook.com/LinaroOrg
https://twitter.com/linaroorg
https://www.youtube.com/user/linaroorg?sub_confirmation=1
https://www.linkedin.com/company/1026961"
For the full video of this presentation, please visit:
https://www.embedded-vision.com/platinum-members/mathworks/embedded-vision-training/videos/pages/may-2017-embedded-vision-summit-venkataramani
For more information about embedded vision, please visit:
http://www.embedded-vision.com
Avinash Nehemiah, Product Marketing Manager for Computer Vision, and Girish Venkataramani, Product Development Manager, both of MathWorks, present the "Deep Learning and Vision Algorithm Development in MATLAB Targeting Embedded GPUs" tutorial at the May 2017 Embedded Vision Summit.
In this presentation, you'll learn how to adopt a MATLAB-centric workflow to design, verify and deploy your computer vision and deep learning applications onto embedded NVIDIA Tegra-based platforms including Jetson TK1/TX1 and DrivePX boards. The workflow starts with algorithm design in MATLAB, which enjoys universal appeal among engineers and scientists because of its expressive power and ease-of-use. The algorithm may employ deep learning networks augmented with traditional computer vision techniques and can be tested and verified within MATLAB.
Next, a compiler auto-generates portable and optimized CUDA code from the MATLAB algorithm, which is then cross-compiled and deployed to the Tegra board. The workflow affords on-board real-time prototyping and verification controlled through MATLAB. Examples of common computer vision algorithms and deep learning networks are used to describe this workflow, and their performance benchmarks are presented.
Codasip is a leading provider of RISC-V processor IP and design tools. They introduced the first licensable RISC-V processor in 2015. This presentation discusses Codasip's portfolio of application class RISC-V processors, including their single-core and multiprocessor options. It also describes Codasip Studio, their tool for customizing RISC-V processors, and the P extension for digital signal processing applications.
Kubeflow: portable and scalable machine learning using Jupyterhub and Kuberne... - Akash Tandon
ML solutions in production start from data ingestion and extend up to the actual deployment step. We want this workflow to be scalable, portable and simple. Containers and Kubernetes are great at the former two, but not the latter if you aren't a DevOps practitioner. We'll explore how you can leverage the Kubeflow project to deploy best-of-breed open-source systems for ML to diverse infrastructures.
For the full video of this presentation, please visit:
https://www.embedded-vision.com/platinum-members/embedded-vision-alliance/embedded-vision-training/videos/pages/may-2019-embedded-vision-summit-trevett
For more information about embedded vision, please visit:
http://www.embedded-vision.com
Neil Trevett, President of the Khronos Group and Vice President at NVIDIA, presents the "APIs for Accelerating Vision and Inferencing: An Industry Overview of Options and Trade-offs" tutorial at the May 2019 Embedded Vision Summit.
The landscape of SDKs, APIs and file formats for accelerating inferencing and vision applications continues to evolve rapidly. Low-level compute APIs, such as OpenCL, Vulkan and CUDA are being used to accelerate inferencing engines such as OpenVX, CoreML, NNAPI and TensorRT, being fed by neural network file formats such as NNEF and ONNX.
Some of these APIs, like OpenCV, are vision-specific, while others, like OpenCL, are general-purpose. Some engines, like CoreML and TensorRT, are supplier-specific, while others, such as OpenVX, are open standards that any supplier can adopt. Which ones should you use for your project? Trevett answers these and other questions in this presentation.
High Performance Computing (HPC)
The HPC SIG was officially launched at Linaro Connect Las Vegas in September 2016 to drive the adoption of ARM in HPC through the creation of a data center ecosystem. It is a collaborative project comprised of members and an advisory board. Current members include ARM, HiSilicon, Qualcomm, Fujitsu, Cavium, Red Hat and HPE. CERN and Riken are on the advisory board.
https://www.linaro.org/sig/hpc/
Assisting User’s Transition to Titan’s Accelerated Architecture - inside-BigData.com
Oak Ridge National Lab is home to Titan, the largest GPU-accelerated supercomputer in the world. This alone can be intimidating for users new to leadership computing facilities. Our facility has gathered over four years of experience helping users port applications to Titan. This talk will explain common paths and tools for successfully porting applications, and expose common difficulties experienced by new users. Lastly, learn how our free and open training program can assist your organization in this transition.
Big Data Everywhere Chicago: High Performance Computing - Contributions Towar... - BigDataEverywhere
The document discusses the history and future of high performance computing (HPC). It outlines the key technologies and architectures that have enabled exponential increases in computational power over recent decades. These include vector processing, parallelization, GPUs, and interconnects like InfiniBand. The document also examines emerging technologies like exascale computing and quantum computing that could further push the boundaries of HPC. Overall, the document argues that HPC will remain indispensable for scientific discovery and engineering innovation into the future.
In this deck from the University of Houston CACDS HPC Workshop, Jeff Larkin from Nvidia presents: The Past, Present, and Future of OpenACC.
"OpenACC is an open specification for programming accelerators with compiler directives. It aims to provide a simple path for accelerating existing applications for a wide range of devices in a performance portable way. This talk with discuss the history and goals of OpenACC, how it is being used today, and what challenges it will address in the future."
Watch the video presentation: http://wp.me/p3RLHQ-dTm
Edge Computing: A Unified Infrastructure for all the Different PiecesCloudify Community
Edge Computing along with 5G promises to revolutionize customer experience with immersive applications that we can only imagine at this point. The edge will include PNFs, VNFs, and mobile-edge applications; requiring containers, virtual machines and bare-metal compute. But while edge computing promises numerous new revenue streams, managing and orchestrating these edge infrastructure environments is not going to be a seamless, instant process. In this webinar, experts in NFV orchestration discuss the concerns you must address in the transition to the edge, and show how you can use available open source tools to create a single management environment for PNFs, VNFs, and mobile-edge applications.
In this deck from the 2017 MVAPICH User Group, DK Panda from Ohio State University presents: Overview of the MVAPICH Project and Future Roadmap.
"This talk will provide an overview of the MVAPICH project (past, present and future). Future roadmap and features for upcoming releases of the MVAPICH2 software family (including MVAPICH2-X, MVAPICH2-GDR, MVAPICH2-Virt, MVAPICH2-EA and MVAPICH2-MIC) will be presented. Current status and future plans for OSU INAM, OEMT and OMB will also be presented."
Watch the video: https://www.youtube.com/watch?v=wF7t-oH7wi4
Sign up for our insideHPC Newsletter: http://insidehpc.com/newsletter
Macromolecular crystallography is an experimental technique allowing to explore 3D atomic structure of proteins, used by academics for research in biology and by pharmaceutical companies in rational drug design. While up to now development of the technique was limited by scientific instruments performance, recently computing performance becomes a key limitation. In my presentation I will present a computing challenge to handle 18 GB/s data stream coming from the new X-ray detector. I will show PSI experiences in applying conventional hardware for the task and why this attempt failed. I will then present how IC 922 server with OpenCAPI enabled FPGA boards allowed to build a sustainable and scalable solution for high speed data acquisition. Finally, I will give a perspective, how the advancement in hardware development will enable better science by users of the Swiss Light Source.
This document summarizes a presentation given at the London Open Source Meetup for RISC-V on April 19, 2021. The presentation introduced the RISC-V Online Tutor, an online course for learning RISC-V fundamentals from digital logic to C programming. It provided an overview of the course structure and lessons, which take students through RISC-V assembly, processor design, and application development. It also demonstrated the online learning platform and its ability to interact with remote FPGA hardware during lessons. The goal is to invite community participation and collaboration to further develop the Online Tutor.
This document lists several SDN projects, controllers, simulators, and research areas. It describes Python and C++ based OpenFlow controllers like SNAC and HELIOS. It also lists simulators that support SDN including POX with GEPHI for SDN virtualization and OpenFlow VM simulation. Finally, it outlines significant research areas for SDN PhD projects such as software-defined wireless networking and software-defined mobile networks.
This document discusses how HPC infrastructure is being transformed with AI. It summarizes that cognitive systems use distributed deep learning across HPC clusters to speed up training times. It also outlines IBM's hardware portfolio expansion for AI training, inference, and storage capabilities. The document discusses software stacks for AI like Watson Machine Learning Community Edition that use containers and universal base images to simplify deployment.
Challenges and Opportunities for HPC Interconnects and MPIinside-BigData.com
In this video from the 2017 MVAPICH User Group, Ron Brightwell from Sandia presents: Challenges and Opportunities for HPC Interconnects and MPI.
"This talk will reflect on prior analysis of the challenges facing high-performance interconnect technologies intended to support extreme-scale scientific computing systems, how some of these challenges have been addressed, and what new challenges lay ahead. Many of these challenges can be attributed to the complexity created by hardware diversity, which has a direct impact on interconnect technology, but new challenges are also arising indirectly as reactions to other aspects of high-performance computing, such as alternative parallel programming models and more complex system usage models. We will describe some near-term research on proposed extensions to MPI to better support massive multithreading and implementation optimizations aimed at reducing the overhead of MPI tag matching. We will also describe a new portable programming model to offload simple packet processing functions to a network interface that is based on the current Portals data movement layer. We believe this capability will offer significant performance improvements to applications and services relevant to high-performance computing as well as data analytics."
Watch the video: https://wp.me/p3RLHQ-hhK
Learn more: http://mug.mvapich.cse.ohio-state.edu/program/
Sign up for our insideHPC Newsletter: http://insidehpc.com/newsletter
This presentation covers various partners and collaborators who are currently working with OpenPOWER foundation ,Use cases of OpenPOWER systems in multiple Industries , OpenPOWER Workgroups and OpenCAPI features .
For the full video of this presentation, please visit:
https://www.embedded-vision.com/platinum-members/embedded-vision-alliance/embedded-vision-training/videos/pages/may-2018-embedded-vision-summit-trevett
For more information about embedded vision, please visit:
http://www.embedded-vision.com
Neil Trevett, President of the Khronos Group and Vice President at NVIDIA, presents the "APIs for Accelerating Vision and Inferencing: Options and Trade-offs" tutorial at the May 2018 Embedded Vision Summit.
The landscape of SDKs, APIs and file formats for accelerating inferencing and vision applications continues to rapidly evolve. Low-level compute APIs, such as OpenCL, Vulkan and CUDA are being used to accelerate inferencing engines such as OpenVX, CoreML, NNAPI and TensorRT. Inferencing engines are being fed via neural network file formats such as NNEF and ONNX. Some of these APIs, like OpenCV, are vision-specific, while others, like OpenCL, are general-purpose. Some engines, like CoreML and TensorRT, are supplier-specific, while others, such as OpenVX, are open standards that any supplier can adopt. Which ones should you use for your project?
In this presentation, Trevett presents the current landscape of APIs, file formats and SDKs for inferencing and vision acceleration, explaining where each one fits in the development flow. Trevett also highlights where these APIs overlap and where they complement each other, and previews some of the latest developments in these APIs.
Shrilesh Kathe has experience developing ETL solutions using Apache Spark, Python, and Hive. He has worked as an offshore ETL developer for Nike implementing their WEST BI CWA system using Spark SQL, Python, and Hive. Some of his responsibilities included developing a generic ETL solution to capture web traffic data, writing PySpark applications to parse raw data and store it in partitioned tables, and facilitating XML parsing and logical functions in Python. He also has a Bachelor's degree in Electronics and Telecommunication and certifications in Apache Spark and Java.
The document discusses porting OpenJ9 JDK to RISC-V architecture. It involves preparing the software toolchain for cross-compilation to RISC-V, preparing hardware like the HiFive Unleashed development board, and developing OpenJ9 JDK through a mix of local and cross compilation. The status shows OpenJ9 JDK can execute in interpreter mode on the RISC-V emulator and HiFive board running Debian, with future work planned on JIT support, different GC strategies, and supporting other Java versions.
Session ID: BUD17-503
Session Name: The HPE Machine and Gen-Z - BUD17-503
Speaker:
Grant Likely
Track:
★ Session Summary ★
With the exponential rise in quantity of data to manage, the modern data centre is increasingly limited by the capacity of individual machines. Since storage and compute demand more capacity than can be provided by a single machine, we distribute both over large clusters and use the network to transfer data between where it is stored and where it is processed. Moving all that data around uses deep storage stacks which incur a significant performance impact. If we could somehow flatten the storage stack and provide applications with direct access to data, then we could improve performance by orders of magnitude.
Hewlett Packard Enterprise recently demonstrated that we can do exactly that with their research project, "The Machine". Instead of moving data around with a network, The Machine uses multiple terabytes of persistent memory and a next-generation fabric-attached memory interconnect to provide a single pool of storage which can be accessed by any processor in the cluster. It shows that we can provide applications with immediate load/store access to huge data sets in a model called Memory-Driven Computing.
Proof in hand, now it is time to bring Memory-Driven Computing to the data centre. Gen-Z is an open systems interconnect designed to provide memory-semantic access to data and devices via direct-attached, switched or fabric topologies. HPE has joined the Gen-Z consortium and is using the knowledge gained with The Machine to help shape Gen-Z to set the stage for true Memory-Driven Computing. Putting memory at the centre enables us to overcome the limitations of today's computing systems and power new innovations.
This session will cover two topics. It will start with a status update on The Machine and an overview of how it works. Then we'll shift into an introduction of Gen-Z, and how it can reshape the architecture of computing in the years to come.
---------------------------------------------------
★ Resources ★
Event Page: http://connect.linaro.org/resource/bud17/bud17-503/
Presentation:
Video: https://youtu.be/1BVtChDQVyQ
---------------------------------------------------
★ Event Details ★
Linaro Connect Budapest 2017 (BUD17)
6-10 March 2017
Corinthia Hotel, Budapest,
Erzsébet krt. 43-49,
1073 Hungary
---------------------------------------------------
Keyword: HPE, Gen-Z
http://www.linaro.org
http://connect.linaro.org
---------------------------------------------------
Follow us on Social Media
https://www.facebook.com/LinaroOrg
https://twitter.com/linaroorg
https://www.youtube.com/user/linaroorg?sub_confirmation=1
https://www.linkedin.com/company/1026961
For the full video of this presentation, please visit:
https://www.embedded-vision.com/platinum-members/mathworks/embedded-vision-training/videos/pages/may-2017-embedded-vision-summit-venkataramani
For more information about embedded vision, please visit:
http://www.embedded-vision.com
Avinash Nehemiah, Product Marketing Manager for Computer Vision, and Girish Venkataramani, Product Development Manager, both of MathWorks, present the "Deep Learning and Vision Algorithm Development in MATLAB Targeting Embedded GPUs" tutorial at the May 2017 Embedded Vision Summit.
In this presentation, you'll learn how to adopt a MATLAB-centric workflow to design, verify and deploy your computer vision and deep learning applications onto embedded NVIDIA Tegra-based platforms including Jetson TK1/TX1 and DrivePX boards. The workflow starts with algorithm design in MATLAB, which enjoys universal appeal among engineers and scientists because of its expressive power and ease-of-use. The algorithm may employ deep learning networks augmented with traditional computer vision techniques and can be tested and verified within MATLAB.
Next, a compiler auto-generates portable and optimized CUDA code from the MATLAB algorithm, which is then cross-compiled and deployed to the Tegra board. The workflow affords on-board real-time prototyping and verification controlled through MATLAB. Examples of common computer vision algorithms and deep learning networks are used to describe this workflow, and their performance benchmarks are presented.
Codasip is a leading provider of RISC-V processor IP and design tools. They introduced the first licensable RISC-V processor in 2015. This presentation discusses Codasip's portfolio of application class RISC-V processors, including their single-core and multiprocessor options. It also describes Codasip Studio, their tool for customizing RISC-V processors, and the P extension for digital signal processing applications.
Kubeflow: portable and scalable machine learning using Jupyterhub and Kuberne... - Akash Tandon
ML solutions in production start from data ingestion and extend up to the actual deployment step. We want this workflow to be scalable, portable and simple. Containers and Kubernetes are great at the former two, but not the latter if you aren't a DevOps practitioner. We'll explore how you can leverage the Kubeflow project to deploy best-of-breed open-source systems for ML to diverse infrastructures.
For the full video of this presentation, please visit:
https://www.embedded-vision.com/platinum-members/embedded-vision-alliance/embedded-vision-training/videos/pages/may-2019-embedded-vision-summit-trevett
For more information about embedded vision, please visit:
http://www.embedded-vision.com
Neil Trevett, President of the Khronos Group and Vice President at NVIDIA, presents the "APIs for Accelerating Vision and Inferencing: An Industry Overview of Options and Trade-offs" tutorial at the May 2019 Embedded Vision Summit.
The landscape of SDKs, APIs and file formats for accelerating inferencing and vision applications continues to evolve rapidly. Low-level compute APIs, such as OpenCL, Vulkan and CUDA are being used to accelerate inferencing engines such as OpenVX, CoreML, NNAPI and TensorRT, being fed by neural network file formats such as NNEF and ONNX.
Some of these APIs, like OpenCV, are vision-specific, while others, like OpenCL, are general-purpose. Some engines, like CoreML and TensorRT, are supplier-specific, while others, such as OpenVX, are open standards that any supplier can adopt. Which ones should you use for your project? Trevett answers these and other questions in this presentation.
High Performance Computing (HPC)
The HPC SIG was officially launched at Linaro Connect Las Vegas in September 2016 to drive the adoption of ARM in HPC through the creation of a data center ecosystem. It is a collaborative project comprised of members and an advisory board. Current members include ARM, HiSilicon, Qualcomm, Fujitsu, Cavium, Red Hat and HPE. CERN and Riken are on the advisory board.
https://www.linaro.org/sig/hpc/
Assisting User’s Transition to Titan’s Accelerated Architecture - inside-BigData.com
Oak Ridge National Lab is home to Titan, the largest GPU-accelerated supercomputer in the world. This alone can be intimidating for users new to leadership computing facilities. Our facility has collected over four years of experience helping users port applications to Titan. This talk will explain common paths and tools for successfully porting applications, and expose common difficulties experienced by new users. Lastly, learn how our free and open training program can assist your organization in this transition.
Big Data Everywhere Chicago: High Performance Computing - Contributions Towar... - BigDataEverywhere
The document discusses the history and future of high performance computing (HPC). It outlines the key technologies and architectures that have enabled exponential increases in computational power over recent decades. These include vector processing, parallelization, GPUs, and interconnects like InfiniBand. The document also examines emerging technologies like exascale computing and quantum computing that could further push the boundaries of HPC. Overall, the document argues that HPC will remain indispensable for scientific discovery and engineering innovation into the future.
A Library for Emerging High-Performance Computing Clusters - Intel® Software
This document discusses the challenges of developing communication libraries for exascale systems using hybrid MPI+X programming models. It describes how current MPI+PGAS approaches use separate runtimes, which can lead to issues like deadlock. The document advocates for a unified runtime that can support multiple programming models simultaneously to avoid such issues and enable better performance. It also outlines MVAPICH2's work on designs like multi-endpoint that integrate MPI and OpenMP to efficiently support emerging highly threaded systems.
OpenPOWER Acceleration of HPCC Systems - HPCC Systems
JT Kellington, IBM and Allan Cantle, Nallatech present at the 2015 HPCC Systems Engineering Summit Community Day about porting HPCC Systems to the POWER8-based ppc64el architecture.
Abstractions and Directives for Adapting Wavefront Algorithms to Future Archi... - inside-BigData.com
In this deck from PASC18, Robert Searles from the University of Delaware presents: Abstractions and Directives for Adapting Wavefront Algorithms to Future Architectures.
"Architectures are rapidly evolving, and exascale machines are expected to offer billion-way concurrency. We need to rethink algorithms, languages and programming models among other components in order to migrate large scale applications and explore parallelism on these machines. Although directive-based programming models allow programmers to worry less about programming and more about science, expressing complex parallel patterns in these models can be a daunting task especially when the goal is to match the performance that the hardware platforms can offer. One such pattern is wavefront. This paper extensively studies a wavefront-based miniapplication for Denovo, a production code for nuclear reactor modeling.
We parallelize the Koch-Baker-Alcouffe (KBA) parallel-wavefront sweep algorithm in the main kernel of Minisweep (the miniapplication) using CUDA, OpenMP and OpenACC. Our OpenACC implementation running on NVIDIA's next-generation Volta GPU boasts an 85.06x speedup over serial code, which is larger than CUDA's 83.72x speedup over the same serial implementation. Our experimental platform includes SummitDev, an ORNL representative architecture of the upcoming Summit supercomputer. Our parallelization effort across platforms also motivated us to define an abstract parallelism model that is architecture independent, with a goal of creating software abstractions that can be used by applications employing the wavefront sweep motif."
Watch the video: https://wp.me/p3RLHQ-iPU
Read the Full Paper: https://doi.org/10.1145/3218176.3218228
and
https://pasc18.pasc-conference.org/
Sign up for our insideHPC Newsletter: http://insidehpc.com/newsletter
Floating Point Operations , Memory Chip Organization , Serial Bus Architectur... - KRamasamy2
This document discusses parallel computer architecture and challenges. It covers topics such as resource allocation, data access, communication, synchronization, performance and scalability for parallel processing. It also discusses different levels of parallelism that can be exploited in programs as well as the need for and feasibility of parallel computing given technology and application demands.
Azure Machine Learning - Deep Learning with Python, R, Spark, and CNTK - Herman Wu
The document discusses Microsoft's Cognitive Toolkit (CNTK), an open source deep learning toolkit developed by Microsoft. It provides the following key points:
1. CNTK uses computational graphs to represent machine learning models like DNNs, CNNs, RNNs in a flexible way.
2. It supports CPU and GPU training and works on Windows and Linux.
3. CNTK achieves state-of-the-art accuracy and is efficient, scaling to multi-GPU and multi-server settings.
SystemML is an Apache project that provides a declarative machine learning language for data scientists. It aims to simplify the development of custom machine learning algorithms and enable scalable execution on everything from single nodes to clusters. SystemML provides pre-implemented machine learning algorithms, APIs for various languages, and a cost-based optimizer to compile execution plans tailored to workload and hardware characteristics in order to maximize performance.
In this deck from the HPC Advisory Council Spain Conference, DK Panda from Ohio State University presents: Communication Frameworks for HPC and Big Data.
Watch the video presentation: http://insidehpc.com/2015/09/video-communication-frameworks-for-hpc-and-big-data/
Sign up for our insideHPC Newsletter: http://insidehpc.com/newsletter
Learn more: http://www.hpcadvisorycouncil.com/events/2015/spain-workshop/agenda.php
Designing HPC & Deep Learning Middleware for Exascale Systems - inside-BigData.com
DK Panda from Ohio State University presented this deck at the 2017 HPC Advisory Council Stanford Conference.
"This talk will focus on challenges in designing runtime environments for exascale systems with millions of processors and accelerators to support various programming models. We will focus on MPI, PGAS (OpenSHMEM, CAF, UPC and UPC++) and Hybrid MPI+PGAS programming models by taking into account support for multi-core, high-performance networks, accelerators (GPGPUs and Intel MIC), virtualization technologies (KVM, Docker, and Singularity), and energy-awareness. Features and sample performance numbers from the MVAPICH2 libraries will be presented."
Watch the video: http://wp.me/p3RLHQ-glW
Learn more: http://hpcadvisorycouncil.com
Sign up for our insideHPC Newsletter: http://insidehpc.com/newsletter
DARPA ERI Summit 2018: The End of Moore’s Law & Faster General Purpose Comput... - zionsaint
John Hennessy gave a talk outlining the end of Moore's law and faster general purpose computing, and opportunities for a new golden age. He discussed how three key changes - the end of Dennard scaling, slowing Moore's law transistor gains, and architectural limitations - have converged to end the steady performance increases of the past. This marks the end of an era of stunning microprocessor progress. Domain specific architectures and languages that better match applications to tailored hardware designs provide new opportunities for more efficient computing. Research into cheaper hardware development, new technologies, and the co-evolution of domains, languages and architectures could enable a new golden age.
MVAPICH2 and MVAPICH2-X Projects: Latest Developments and Future Plans - inside-BigData.com
In this video from the 2014 HPC Advisory Council Europe Conference, DK Panda from Ohio State University presents: MVAPICH2 and MVAPICH2-X Projects: Latest Developments and Future Plans.
"This talk will focus on the latest developments and future plans for the MVAPICH2 and MVAPICH2-X projects. For the MVAPICH2 project, we will focus on scalable and highly-optimized designs for pt-to-pt communication (two-sided and one-sided MPI-3 RMA), collective communication (blocking and MPI-3 non-blocking), support for GPGPUs and Intel MIC, support for the MPI-T interface and schemes for fault-tolerance/fault-resilience. For the MVAPICH2-X project, we will focus on efficient support for the hybrid MPI and PGAS (UPC and OpenSHMEM) programming model with a unified runtime."
Watch the video presentation: http://wp.me/p3RLHQ-coF
Esteban Hernandez is a PhD candidate researching heterogeneous parallel programming for weather forecasting, with 12 years of experience in software architecture, including Linux clusters, distributed file systems, and high performance computing (HPC). HPC involves applying the most efficient algorithms on high-performance computers to solve demanding problems, and is used for applications like weather prediction, fluid dynamics simulations, protein folding, and bioinformatics. Performance is often measured in floating point operations per second, and parallel computing techniques like OpenMP, MPI, and GPUs are key to HPC. HPC systems are used across industries for applications like supply chain optimization, seismic data processing, and drug development.
This document summarizes a presentation on HPC performance tools and the road to exascale computing. It discusses:
1) The challenges of exascale computing, including the need for unprecedented computing power for scientific simulations and big data, and issues around power consumption. New hardware architectures like multicore and manycore chips as well as new software paradigms will be required.
2) Performance tools that will be essential for exascale systems, including debugging tools, performance analysis tools, and correctness checking tools. Several European and U.S. initiatives are working to develop necessary performance tools.
3) Several European exascale initiatives including GENIUS, which aims to help exploit data from the Gaia
In this deck from the 2016 HPC Advisory Council Switzerland Conference, DK Panda from Ohio State University presents: High-Performance and Scalable Designs of Programming Models for Exascale Systems.
"This talk will focus on challenges in designing runtime environments for Exascale systems with millions of processors and accelerators to support various programming models. We will focus on MPI, PGAS (OpenSHMEM, CAF, UPC and UPC++) and Hybrid MPI+PGAS programming models by taking into account support for multi-core, high-performance networks, accelerators (GPUs and Intel MIC) and energy-awareness. Features and sample performance numbers from the MVAPICH2 libraries will be presented."
Watch the video presentation: http://wp.me/p3RLHQ-f7c
See more talks in the Swiss Conference Video Gallery: http://insidehpc.com/2016-swiss-hpc-conference/
Improving Efficiency of Machine Learning Algorithms using HPCC Systems - HPCC Systems
1) The document discusses improving the efficiency of machine learning algorithms using the HPCC Systems platform through parallelization.
2) It describes the HPCC Systems architecture and its advantages for distributed machine learning.
3) A parallel DBSCAN algorithm is implemented on the HPCC platform which shows improved performance over the serial algorithm, with execution times decreasing as more nodes are used.
Accelerating TensorFlow with RDMA for high-performance deep learning - DataWorks Summit
Google’s TensorFlow is one of the most popular deep learning (DL) frameworks. In distributed TensorFlow, gradient updates are a critical step governing the total model training time. These updates incur a massive volume of data transfer over the network.
In this talk, we first present a thorough analysis of the communication patterns in distributed TensorFlow. Then we propose a unified way of achieving high performance through enhancing the gRPC runtime with Remote Direct Memory Access (RDMA) technology on InfiniBand and RoCE. Through our proposed RDMA-gRPC design, TensorFlow only needs to run over the gRPC channel and gets the optimal performance. Our design includes advanced features such as message pipelining, message coalescing, zero-copy transmission, etc. The performance evaluations show that our proposed design can significantly speed up gRPC throughput by up to 1.5x compared to the default gRPC design. By integrating our RDMA-gRPC with TensorFlow, we are able to achieve up to 35% performance improvement for TensorFlow training with CNN models.
Speakers
Dhabaleswar K (DK) Panda, Professor and University Distinguished Scholar, The Ohio State University
Xiaoyi Lu, Research Scientist, The Ohio State University
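The per-message overhead argument behind features like message pipelining and coalescing can be illustrated with a toy model. This is a sketch under made-up constants and function names, not the authors' RDMA-gRPC design:

```python
# Toy model: many small gradient messages each pay a fixed per-message
# overhead, so packing them into fewer, larger sends reduces total time.
PER_MESSAGE_OVERHEAD_US = 5.0    # fixed cost per send (assumed)
BANDWIDTH_BYTES_PER_US = 1000.0  # payload transfer rate (assumed)

def transfer_time(message_sizes):
    """Total time to send a list of messages, in microseconds."""
    return sum(PER_MESSAGE_OVERHEAD_US + size / BANDWIDTH_BYTES_PER_US
               for size in message_sizes)

def coalesce(message_sizes, max_batch_bytes):
    """Greedily pack consecutive messages into batches up to a size cap."""
    batches, current = [], 0
    for size in message_sizes:
        if current and current + size > max_batch_bytes:
            batches.append(current)
            current = 0
        current += size
    if current:
        batches.append(current)
    return batches

updates = [128] * 100                    # 100 small gradient updates
naive = transfer_time(updates)           # one send per update
batched = transfer_time(coalesce(updates, 4096))
print(naive > batched)                   # coalescing wins: True
```

The same fixed-versus-variable cost trade-off is what makes batching small tensor updates attractive regardless of the transport underneath.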
Real time machine learning proposers day v3 - mustafa sarac
This document discusses DARPA's Real Time Machine Learning (RTML) program. The objective is to develop hardware generators and compilers that can automatically create application-specific integrated circuits for machine learning from high-level code. This would allow no-human-in-the-loop creation of efficient neural network hardware. The program has two phases: phase 1 develops an ML hardware compiler, and phase 2 demonstrates RTML systems for applications like wireless communication and image processing. Key goals are high performance, low power consumption, and support for a variety of neural network architectures and machine learning techniques.
Similar to FPGAs as Components in Heterogeneous HPC Systems (paraFPGA 2015 keynote)
Haku, a toy functional language based on literary Japanese - Wim Vanderbauwhede
Haku is a natural-language functional programming language based on literary Japanese. This talk discusses the motivation behind Haku and explains the language by example. You don't need to know Japanese or to have read the Haku documentation.
https://codeberg.org/wimvanderbauwhede/haku
On the need for low-carbon and sustainable computing and the path towards zero-carbon computing.
See https://wimvanderbauwhede.github.io/articles/frugal-computing/ for the complete article with references.
* The problem:
The current emissions from computing are about 2% of the world total but are projected to rise steeply over the next two decades. By 2040, emissions from computing alone will be close to 80% of the emissions level acceptable to keep global warming below the safe limit of 1.5°C. This growth in computing emissions is unsustainable: it would make it virtually impossible to stay within the warming limit.
The emissions from production of computing devices far exceed the emissions from operating them, so even if devices are more energy efficient producing more of them will make the emissions problem worse. Therefore we must extend the useful life of our computing devices.
* The solution:
As a society we need to start treating computational resources as finite and precious, to be utilised only when necessary, and as effectively as possible. We need frugal computing: achieving the same results for less energy.
* The vision:
Imagine we can extend the useful life of our devices and even increase their capabilities without any increase in energy consumption.
Meanwhile, we will develop the technologies for the next generation of devices, designed for energy efficiency as well as long life.
Every subsequent cycle will last longer, until finally the world will have computing resources that last forever and hardly use any energy.
NOTE: there is a small mistake in the presentation, the safe limit for 2040 is 13 GtCO2e, not 23. This makes it even more important to embrace frugal computing.
As Slideshare does not allow re-uploads, please find the corrected slides at https://wimvanderbauwhede.github.io/presentation/Zero-Carbon-Computing.pdf
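As a rough sanity check on the scale implied above, treating "close to 80%" as 0.8 and using the corrected safe-limit figure of 13 GtCO2e from the note:

```python
# Implied 2040 emissions from computing alone, under the talk's figures.
safe_level_2040 = 13.0      # GtCO2e acceptable in 2040 (corrected figure)
computing_share = 0.80      # computing at "close to 80%" of that level
computing_emissions = safe_level_2040 * computing_share
print(computing_emissions)  # roughly 10.4 GtCO2e
```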
Many people working in academia find it difficult to achieve or maintain a good work-life balance. This talk goes into the reasons for this, the consequences of working too much, the benefits of having the right balance, and ways of achieving a better balance. The talk is very much based on my personal views and experiences, but I hope there is some interest in sharing these.
In this talk I introduce Perl 6 and some of its exciting new features, especially gradual typing, roles and some functional programming features like lazy lists.
This talk was given at the Scottish Programming Languages Seminar on 24th Feb 2016 at the School of Computing Science of Glasgow University.
Perl and Haskell: Can the Twain Ever Meet? (tl;dr: yes) - Wim Vanderbauwhede
This talk is about two Perl modules (Call::Haskell and Functional::Types) I developed to call Haskell functions as transparently as possible.
In general, the only way to guarantee the correctness of the types of the function arguments in Haskell is to ensure they are well-typed in Perl. So I ended up writing a Haskell-inspired type system for Perl. In this talk I will first discuss the approach I took to call Haskell from Perl, and then the reasons why a type system is needed, and the actual type system I developed. The type system is based on "prototypes", functions that create type descriptors, and a small API of functions to create type constructors and manipulate the types. The system is type checked at run time and supports sum types, product types, function types and polymorphism. The approach is not Perl-specific and suitable for other dynamic languages.
https://github.com/wimvanderbauwhede
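The prototype idea can be sketched in Python: type descriptors are built by functions, and values are checked against descriptors at run time. This is a minimal illustration; the names (`Sum`, `Product`, `typecheck`) and the representation are assumptions for the sketch, not the actual Functional::Types API:

```python
def Product(*field_types):
    """Type descriptor for a product type (a fixed-arity tuple)."""
    return ("product", field_types)

def Sum(*variants):
    """Type descriptor for a sum type: value must match one variant."""
    return ("sum", variants)

Int = ("prim", int)
Str = ("prim", str)

def typecheck(desc, value):
    """Check a value against a type descriptor at run time."""
    kind, arg = desc
    if kind == "prim":
        return isinstance(value, arg)
    if kind == "product":
        return (isinstance(value, tuple) and len(value) == len(arg)
                and all(typecheck(t, v) for t, v in zip(arg, value)))
    if kind == "sum":
        return any(typecheck(t, value) for t in arg)
    return False

IntOrStr = Sum(Int, Str)         # a sum type
Pair = Product(Int, Str)         # a product type
print(typecheck(IntOrStr, 42), typecheck(Pair, (1, "x")), typecheck(Pair, (1, 2)))
# True True False
```

Function types and polymorphism would be further descriptor kinds in the same scheme.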
These are the slides of the talk I gave at the Dyla'14 workshop (http://conferences.inf.ed.ac.uk/pldi2014/). It's about monads for languages like Perl, Ruby and LiveScript.
The source code is available at
https://github.com/wimvanderbauwhede/Perl-Parser-Combinators
https://github.com/wimvanderbauwhede/parser-combinators-ls
Don't be put off by the word monad or the maths. This is basically a very practical way for doing tasks such as parsing.
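To give a flavour of the approach, here is a minimal parser-combinator sketch in Python (illustrative only, not the code from the repositories above): a parser is a function from input text to either a (result, rest) pair or None, and combinators glue parsers together.

```python
def char(c):
    """Parser that matches a single expected character."""
    def parse(s):
        return (c, s[1:]) if s.startswith(c) else None
    return parse

def seq(p, q):
    """Run p then q; succeed only if both succeed (the 'monadic' glue)."""
    def parse(s):
        r1 = p(s)
        if r1 is None:
            return None
        v1, rest = r1
        r2 = q(rest)
        if r2 is None:
            return None
        v2, rest2 = r2
        return ((v1, v2), rest2)
    return parse

def alt(p, q):
    """Try p; if it fails, try q."""
    def parse(s):
        return p(s) or q(s)
    return parse

ab_or_ac = seq(char("a"), alt(char("b"), char("c")))
print(ab_or_ac("ab!"))  # (('a', 'b'), '!')
print(ab_or_ac("ax"))   # None
```

The monadic part is the threading of the remaining input from one parser to the next, which `seq` hides from the caller.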
1) The author discusses how the late Scottish writer Iain Banks envisioned in his novels that advanced computing and availability of information could lead to a utopian society, as depicted in Banks' fictional culture.
2) The author notes that there are still many unsolved problems in their area of computer science research related to exploiting parallelism and heterogeneous systems.
3) The author wants their research to help address real-world problems like climate change, aging populations, energy security, and issues with the internet, such as by helping weather scientists improve severe weather predictions or linking rainfall to flooding simulations.
UI5con 2024 - Boost Your Development Experience with UI5 Tooling Extensions - Peter Muessig
The UI5 tooling is the development and build tooling of UI5. It is built in a modular and extensible way so that it can easily be extended to your needs. This session will showcase various tooling extensions that can significantly boost your development experience: truly work offline, transpile the code in your project to even newer versions of ECMAScript (beyond 2022, which the UI5 tooling currently supports), consume any npm package of your choice in your project, use different kinds of proxies, and even stitch UI5 projects together during development to mimic your target environment.
Workshop - Innovating with Generative AI and Knowledge Graphs - Neo4j
Workshop - Innovating with Generative AI and Knowledge Graphs
Go beyond the hype around AI and discover practical techniques for using AI responsibly across your organisation's data. Explore how to use knowledge graphs to increase accuracy, transparency, and explainability in generative AI systems. You will leave with hands-on experience combining data relationships with LLMs to bring domain-specific context and improve reasoning.
Bring your laptop and we will guide you through setting up your own generative AI stack, with practical, coded examples to get you started in a few minutes.
How Can Hiring A Mobile App Development Company Help Your Business Grow? - ToXSL Technologies
ToXSL Technologies is an award-winning Mobile App Development Company in Dubai that helps businesses reshape their digital possibilities with custom app services. As a top app development company in Dubai, we offer highly engaging iOS & Android app solutions. https://rb.gy/necdnt
8 Best Automated Android App Testing Tool and Framework in 2024.pdf - kalichargn70th171
Regarding mobile operating systems, two major players dominate: Android and iOS. With Android leading the market, software development companies are focused on delivering apps compatible with this OS. Ensuring an app's functionality across various Android devices, OS versions, and hardware specifications is critical, making Android app testing essential.
SMS API Integration in Saudi Arabia | Best SMS API Service - Yara Milbes
Discover the benefits and implementation of SMS API integration in the UAE and Middle East. This comprehensive guide covers the importance of SMS messaging APIs, the advantages of bulk SMS APIs, and real-world case studies. Learn how CEQUENS, a leader in communication solutions, can help your business enhance customer engagement and streamline operations with innovative CPaaS, reliable SMS APIs, and omnichannel solutions, including WhatsApp Business. Perfect for businesses seeking to optimize their communication strategies in the digital age.
Flutter is a popular open source, cross-platform framework developed by Google. In this webinar we'll explore Flutter and its architecture, delve into the Flutter Embedder and Flutter’s Dart language, discover how to leverage Flutter for embedded device development, learn about Automotive Grade Linux (AGL) and its consortium and understand the rationale behind AGL's choice of Flutter for next-gen IVI systems. Don’t miss this opportunity to discover whether Flutter is right for your project.
WWDC 2024 Keynote Review: For CocoaCoders Austin - Patrick Weigel
Overview of WWDC 2024 Keynote Address.
Covers: Apple Intelligence, iOS18, macOS Sequoia, iPadOS, watchOS, visionOS, and Apple TV+.
Understandable dialogue on Apple TV+
On-device app controlling AI.
Access to ChatGPT with a guest appearance by Chief Data Thief Sam Altman!
App Locking! iPhone Mirroring! And a Calculator!!
Unveiling the Advantages of Agile Software Development.pdf - brainerhub1
Learn about the advantages of Agile software development and simplify your workflow to spur quicker innovation. Jump right in!
UI5con 2024 - Bring Your Own Design System - Peter Muessig
How do you combine the OpenUI5/SAPUI5 programming model with a design system that makes its controls available as Web Components? Since OpenUI5/SAPUI5 1.120, the framework supports the integration of any Web Components. This makes it possible, for example, to natively embed your own design system's Web Components created with Stencil. The integration embeds the Web Components in a way that they can be used naturally in XMLViews, like standard UI5 controls, and can be bound with data binding. Learn how you can use the Web Components base class in OpenUI5/SAPUI5 to integrate your own Web Components, and get inspired by the solution that generates a custom UI5 library providing the control wrappers for the native Web Components.
When it comes to ERP solutions, companies typically meet their needs with established systems like SAP, Oracle, and Microsoft Dynamics. These big players have demonstrated that ERP systems can be either simple or highly comprehensive. This remains true today, but there are new factors to consider, including a promising new contender in the market: Odoo. This blog compares Odoo ERP with traditional ERP systems and explains why many companies now see Odoo ERP as the best choice.
What are ERP Systems?
An ERP, or Enterprise Resource Planning, system provides your company with valuable information to help you make better decisions and boost your ROI. You should choose an ERP system based on your company’s specific needs. For instance, if you run a manufacturing or retail business, you will need an ERP system that efficiently manages inventory. A consulting firm, on the other hand, would benefit from an ERP system that enhances daily operations. Similarly, eCommerce stores would select an ERP system tailored to their needs.
Because different businesses have different requirements, ERP system functionalities can vary. Among the various ERP systems available, Odoo ERP is considered one of the best in the ERp market with more than 12 million global users today.
Odoo is an open-source ERP system initially designed for small to medium-sized businesses but now suitable for a wide range of companies. Odoo offers a scalable and configurable point-of-sale management solution and allows you to create customised modules for specific industries. Odoo is gaining more popularity because it is built in a way that allows easy customisation, has a user-friendly interface, and is affordable. Here, you will cover the main differences and get to know why Odoo is gaining attention despite the many other ERP systems available in the market.
Measures in SQL (SIGMOD 2024, Santiago, Chile)Julian Hyde
SQL has attained widespread adoption, but Business Intelligence tools still use their own higher level languages based upon a multidimensional paradigm. Composable calculations are what is missing from SQL, and we propose a new kind of column, called a measure, that attaches a calculation to a table. Like regular tables, tables with measures are composable and closed when used in queries.
SQL-with-measures has the power, conciseness and reusability of multidimensional languages but retains SQL semantics. Measure invocations can be expanded in place to simple, clear SQL.
To define the evaluation semantics for measures, we introduce context-sensitive expressions (a way to evaluate multidimensional expressions that is consistent with existing SQL semantics), a concept called evaluation context, and several operations for setting and modifying the evaluation context.
A talk at SIGMOD, June 9–15, 2024, Santiago, Chile
Authors: Julian Hyde (Google) and John Fremlin (Google)
https://doi.org/10.1145/3626246.3653374
openEuler Case Study - The Journey to Supply Chain Security
FPGAs as Components in Heterogeneous HPC Systems (paraFPGA 2015 keynote)
1. FPGAs as Components in Heterogeneous High-Performance Computing Systems: Raising the Abstraction Level
Wim Vanderbauwhede
School of Computing Science, University of Glasgow
3. 80 Years ago: The Theory
Turing, Alan Mathison. "On computable numbers, with an application to the Entscheidungsproblem." J. of Math 58, no. 345-363 (1936): 5.
4. 1936: Universal machine (Alan Turing)
1936: Lambda calculus (Alonzo Church)
1936: Stored-program concept (Konrad Zuse)
1937: Church-Turing thesis
1945: The von Neumann architecture
Church, Alonzo. "A set of postulates for the foundation of logic." Annals of mathematics (1932): 346-366.
6. 1957: Fortran, John Backus, IBM
1958: First IC, Jack Kilby, Texas Instruments
1965: Moore's law
1971: First microprocessor, Texas Instruments
1972: C, Dennis Ritchie, Bell Labs
1977: Fortran-77
1977: von Neumann bottleneck, John Backus
8. 1984: Verilog
1984: First reprogrammable logic device, Altera
1985: First FPGA, Xilinx
1987: VHDL Standard IEEE 1076-1987
1989: Algotronix CAL1024, the first FPGA to offer random access to its control memory
9. 20 Years ago: High-level Synthesis
Page, Ian. "Closing the gap between hardware and software: hardware-software cosynthesis at Oxford." (1996): 2-2.
17. High-Level Synthesis
• For many years, Verilog/VHDL were good enough
• Then the complexity gap created the need for HLS
• This reflects the rationale behind VHDL: "A language with a wide range of descriptive capability that was independent of technology or design methodology."
• What is lacking in this requirement is "capability for scalable abstraction"
18. "C to Gates"
• "C-to-Gates" offered that higher abstraction level
• But it was in a way a return to the days before standardised VHDL/Verilog: the various components making up a system were designed and verified using a wide range of different and incompatible languages and tools.
19. The Choice of C
• C was designed by Ritchie for the specific purpose of writing the UNIX operating system,
• i.e. to create a control system for a RAM-based single-threaded system.
• It is basically a syntactic layer over assembly language.
• Very different semantics from HDLs
• But it became the lingua franca for engineers, and hence the de-facto language for HLS tools.
20. Really C?
• None of them was ever really C though:
• "C with restrictions and pragmas" (e.g. DIME-C)
• "C with restrictions and a CSP API" (e.g. Impulse-C)
• "C-syntax language with parallel and CSP semantics" (e.g. Handel-C)
• Typically no recursion, function pointers (no stack) or dynamic allocation (no OS)
21. The Odd Ones Out
• Mitrion-C is a functional dataflow language, very different from C in abstraction level and semantics;
• Bluespec too was a radical departure from "C-to-Gates".
• Both were inspired by functional languages like Haskell.
24. GPUs, Manycores and FPGAs
• Accelerators attached to host systems have become increasingly popular
• Mainly GPUs,
• But increasingly manycores (MIC, Tilera)
• And FPGAs
26. Heterogeneous Programming
• State of affairs today:
• Programmer must decide what to offload
• Write host-accelerator control and data movement code using a dedicated API
• Write accelerator code using a dedicated language
• Many approaches (CUDA, OpenCL, MaxJ, C++ AMP)
27. Programming Model
• All solutions assume data parallelism:
• Each kernel is single-threaded and works on a portion of the data
• Programmer must identify these portions and the amount of parallelism
• So not ideal for FPGAs
• Recent OpenCL specifications have kernel pipes allowing construction of pipelines
• Also support for unified memory space
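The data-parallel model described above can be sketched in plain Python: a hypothetical single-threaded kernel is applied to portions of an array, and the programmer explicitly chooses the partitioning and the degree of parallelism (a minimal illustration, not any specific accelerator API).

```python
from concurrent.futures import ThreadPoolExecutor

def kernel(chunk):
    # Single-threaded kernel: works on its own portion of the data
    return [x * x for x in chunk]

def run_data_parallel(data, num_portions):
    # The programmer identifies the portions and the amount of
    # parallelism: one kernel instance per portion.
    size = (len(data) + num_portions - 1) // num_portions
    chunks = [data[i:i + size] for i in range(0, len(data), size)]
    with ThreadPoolExecutor(max_workers=num_portions) as pool:
        results = pool.map(kernel, chunks)
    return [y for chunk in results for y in chunk]

print(run_data_parallel([1, 2, 3, 4], 2))  # [1, 4, 9, 16]
```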
28. Performance
Speed-up OpenCL-FPGA vs OpenCL-CPU:

                Nearest neighbour   LavaMD   Document Classification
  Kernel Speed  5.35                4.32     1.31
  Total Speed   1.60                4.23     1.04

Segal, Oren, Nasibeh Nasiri, Martin Margala, and Wim Vanderbauwhede. "High level programming of FPGAs for HPC and data centric applications." Proc. IEEE HPEC 2014, pp. 1-3.
29. Power Savings
CPU/FPGA Power Consumption Ratio:

                Nearest neighbour   LavaMD   Document Classification
  Kernel Power  5.24                4.23     1.28
  Total Power   1.57                4.15     1.02

Segal, Oren, Nasibeh Nasiri, Martin Margala, and Wim Vanderbauwhede. "High level programming of FPGAs for HPC and data centric applications." Proc. IEEE HPEC 2014, pp. 1-3.
31. Heterogeneous HPC Systems
• Modern HPC cluster node:
• Multicore/manycore host
• Accelerators: GPGPU, MIC and, increasingly, FPGAs
• HPC workloads:
• Very complex codebase
• Legacy code
33. Example: WRF
• Weather Research and Forecasting Model
• Fortran-90, support for MPI and OpenMP
• 1,263,320 lines of code
• So about ten thousand pages of code listings
• Parts of it have been accelerated manually on GPU (a few thousands of lines)
• Changing the code for a GPU/FPGA system would be a huge task, and the result would not be portable.
34. FPGAs in HPC
• FPGAs are good at some tasks, e.g.:
• Bit-level, integer and string operations
• Pipeline parallelism rather than data parallelism
• Superior internal memory bandwidth
• Streaming dataflow computations
• But not so good at others:
• Double-precision floating point computations
• Random memory access computations
35. "On the Capability and Achievable
Performance of FPGAs for HPC
Applications"
Wim Vanderbauwhede
School of Computing Science, University of Glasgow, UK
http://www.slideshare.net/WimVanderbauwhede
38. One Codebase, Many Components
• For complex HPC applications, FPGAs will never be optimal for the whole codebase
• But neither will multicores or GPUs
• So we need to be able to split the codebase automatically over the different components in the heterogeneous system.
• Therefore, we need to raise the abstraction level beyond "heterogeneous programming" and "high-level synthesis"
40. • Device-specific high-level abstraction is no longer good enough
• OpenCL is relatively high-level and device-independent, but it is still not good enough
• High-level synthesis languages and heterogeneous programming frameworks should be compilation targets!
• Just like assembly/IR languages and HDLs
43. • Starting from a complete, unoptimised program
• Compiler-based program transformations
• Correct-by-construction
• Component-based, hierarchical cost model for the full system
• Optimisation problem: find the optimal program variant given the system cost model
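The optimisation problem above can be sketched as a search over program variants against a per-component cost model. This is a minimal Python illustration with hypothetical variants, components and cost numbers, not the actual compiler.

```python
# Hypothetical cost model: estimated cost of each program variant
# on each component of the heterogeneous system (lower is better).
variants = {
    "v1-sequential": {"cpu": 10.0, "fpga": 25.0},
    "v2-pipelined":  {"cpu": 12.0, "fpga": 4.0},
    "v3-parallel":   {"cpu": 6.0,  "fpga": 8.0},
}

def best_assignment(variants):
    # For each variant, find its cheapest component; then pick the
    # variant with the lowest overall cost in the cost space.
    return min(
        ((name, min(costs, key=costs.get), min(costs.values()))
         for name, costs in variants.items()),
        key=lambda t: t[2],
    )

print(best_assignment(variants))  # ('v2-pipelined', 'fpga', 4.0)
```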
44. A Functional-Programming Approach
• For the particular case of scientific HPC codes
• Focus on array computations
• Express the program using higher-order functions
• Type-transformation-based program transformation
45. Functional Programming
• There are only functions
• Functions can operate on functions
• Functions can return functions
• Syntactic sugar over the λ-calculus
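As a small illustration of functions operating on and returning functions (in Python rather than a pure functional language):

```python
def compose(f, g):
    # A higher-order function: takes two functions as arguments
    # and returns a new function, their composition f . g
    return lambda x: f(g(x))

square = lambda x: x * x
increment = lambda x: x + 1

inc_then_square = compose(square, increment)
print(inc_then_square(3))  # (3 + 1)^2 = 16
```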
46. Types in Functional Programming
• Types are just labels to help us reason about the values in a computation
• More general than types in e.g. C
• For our purpose, we focus on types of functions that perform array operations
• Functions are values, so they need a type
47. Examples of Types
-- a function f taking a vector of n values of type a and returning a vector of m values of type b
f : Vec a n -> Vec b m
-- a function map taking a function from a to b and a vector of type a, and returning a vector of type b
map : (a -> b) -> Vec a n -> Vec b n
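The type of map can be approximated with Python type hints as a sketch; Python's type system cannot express the length index n, so that part of the type survives only as a comment.

```python
from typing import Callable, List, TypeVar

A = TypeVar("A")
B = TypeVar("B")

# map : (a -> b) -> Vec a n -> Vec b n
# The length-preserving property (the same n in and out) is not
# expressible in Python's type system, but holds by construction.
def vec_map(f: Callable[[A], B], xs: List[A]) -> List[B]:
    return [f(x) for x in xs]

print(vec_map(lambda x: x * x, [1, 2, 3]))  # [1, 4, 9]
```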
48. Type Transformations
• Transform the type of a function into another type
• The function transformation can be derived automatically from the type transformation
• The type transformations are provably correct
• Thus the transformed program is correct by construction!
49. Array Type Transformations
• For this talk, focus on:
• Vector (array) types
• FPGA cost model
• Programs must be composed using particular higher-order functions (correctness conditions)
• Transformations essentially reshape the arrays
50. Higher-order Functions
• map: perform a computation on all elements of an array independently, e.g. square all values.
• can be done sequentially, in parallel or using a pipeline if the computation is pipelined
• foldl: reduce an array to a value using an accumulator, e.g. sum all values.
• can be done sequentially or, if the computation is associative, using a binary tree
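The two patterns above can be sketched in Python, including the binary-tree reduction that is only valid when the operation is associative (illustrative names, not the TyTra implementation):

```python
def foldl(f, acc, xs):
    # Sequential left fold: reduce an array to a value via an accumulator
    for x in xs:
        acc = f(acc, x)
    return acc

def tree_reduce(f, xs):
    # Binary-tree reduction: valid only when f is associative
    # (e.g. addition). Combines elements pairwise, level by level.
    while len(xs) > 1:
        pairs = [f(xs[i], xs[i + 1]) for i in range(0, len(xs) - 1, 2)]
        if len(xs) % 2:            # carry an odd element up a level
            pairs.append(xs[-1])
        xs = pairs
    return xs[0]

data = [1, 2, 3, 4, 5]
squared = list(map(lambda x: x * x, data))       # map: independent per element
print(foldl(lambda a, x: a + x, 0, squared))     # 55
print(tree_reduce(lambda a, b: a + b, squared))  # 55, same result for associative +
```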
51. Example: SOR
• Successive Over-Relaxation (SOR) kernel from a Large-Eddy simulator (weather simulation) in Fortran:
52. Example: SOR using map
• Fortran code rewritten as a map of a function over a 1-D vector
53. Example: Type Transformation
• Transform the 1-D vector into a 2-D vector
• The program transformation is derived
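The reshaping idea can be illustrated in Python (a sketch of the principle, not the derived Fortran transformation): a map over a 1-D vector gives the same result as a map-of-maps over the same data viewed as a 2-D vector.

```python
def vec_map(f, xs):
    return [f(x) for x in xs]

def reshape_1d_to_2d(xs, cols):
    # Type transformation on the data:
    # Vec a (rows*cols) -> Vec (Vec a cols) rows
    return [xs[i:i + cols] for i in range(0, len(xs), cols)]

def flatten(xss):
    return [x for xs in xss for x in xs]

f = lambda x: x + 10
v1 = [1, 2, 3, 4, 5, 6]

# Derived program transformation: map f over the 1-D vector equals
# (flatten . map (map f) . reshape) over the 2-D view.
direct = vec_map(f, v1)
via_2d = flatten(vec_map(lambda row: vec_map(f, row),
                         reshape_1d_to_2d(v1, 3)))
print(direct == via_2d)  # True
```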
55. Cost Calculation
• Uses an Intermediate Representation Language, the TyTra-IR
• TyTra-IR uses LLVM syntax but can express sequential, parallel and pipeline semantics
• Thus a direct mapping to the higher-order functions
• But the cost of computations and communication can be computed directly from the TyTra-IR program
• No need for synthesis
57. Cost Space and Cost Estimation
[Figure: the cost space spanned by logic and memory resources, communication bandwidth (local memory, global memory, host) and performance (throughput), bounded by the Resource Wall (computation-bound) and the Bandwidth Wall (communication-bound)]
60. Full-Program Transformation
• Type Transformations are not FPGA-specific
• Compiler can create variants for the full program
• Then separate out parts of the program based on minimal cost on given components of the system
• For parallelisation over a cluster, use Multi-Party Session Types to transform the program into communicating processes
61. Conclusions
• FPGAs have reached maturity as HPC platforms
• High-Level Synthesis and Heterogeneous Programming are both very important steps forward, and performance is already impressive
• But we need to raise the abstraction level even more
• Full-system compilers for heterogeneous systems
• FPGAs are merely components in such systems
• Type Transformations are one possible way