The document describes the HercuLeS HLS environment, a new high-level synthesis tool marketed by Ajax Compilers. It provides the following key details:
- HercuLeS is an extensible, high-level synthesis environment that allows whole-program hardware compilation from C code through pluggable analyses and optimizations.
- It features an automatic flow from an algorithmic C description to customized digital hardware using an intermediate representation called N-Address Code and construction of control/data flow graphs.
- The generated VHDL code and self-checking testbenches aim to be human-readable, vendor- and technology-independent.
This document introduces Mahout Scala and Spark bindings, which aim to provide an R-like environment for machine learning on Spark. The bindings define algebraic expressions for distributed linear algebra using Spark and provide optimizations. They define data types for scalars, vectors, matrices and distributed row matrices. Features include common linear algebra operations, decompositions, construction/collection functions, HDFS persistence, and optimization strategies. The goal is a high-level semantic environment that can run interactively on Spark.
This document discusses bringing algebraic semantics to Apache Mahout by developing Scala and Spark bindings. It begins by covering Apache Mahout's history and current problems. The future plans include narrowing the focus, abandoning MapReduce, using Spark, and developing a DSL for linear algebra. Requirements for an ideal machine learning environment are outlined. The Scala/Spark bindings address these by providing an R-like DSL, Scala language qualities, and automatic distribution/parallelism. Core features include linear algebra operations on various data types. An example demonstrates distributed linear regression on a cereals dataset. Under the covers, optimization translates logical plans to physical operators, like rewriting matrix multiplication using more efficient formulations.
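The regression example in that abstract rests on the normal equations: solving (X'X) beta = X'y. As a minimal single-node sketch of the same computation (the distributed Mahout version partitions X by rows; names and data here are illustrative, not from the deck):

```python
import numpy as np

# Ordinary least squares via the normal equations, beta = (X'X)^-1 X'y.
# In the distributed setting, X'X is the Transpose-Times-Self product
# computed across partitions; here NumPy stands in on one node.
def fit_ols(X, y):
    XtX = X.T @ X
    Xty = X.T @ y
    return np.linalg.solve(XtX, Xty)

# Tiny synthetic example: y = 2*x0 + 3*x1 with no noise, so the fit is exact.
X = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0]])
y = X @ np.array([2.0, 3.0])
beta = fit_ols(X, y)
print(beta)  # close to [2. 3.]
```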
This document discusses Spark, an open-source cluster computing framework. It begins with an introduction to distributed computing problems related to processing large datasets. It then provides an overview of Spark, including its core abstraction of resilient distributed datasets (RDDs) and how Spark builds on the MapReduce model. The rest of the document demonstrates Spark concepts like transformations and actions on RDDs and the use of key-value pairs. It also discusses SparkSQL and shows examples of finding the most retweeted tweet using core Spark and SparkSQL.
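The "most retweeted tweet" example follows the classic transformation/action shape: map records to (count, tweet) pairs, then reduce with a max. A plain-Python sketch of that pattern (PySpark is not assumed here; the tweet records and field names are illustrative):

```python
from functools import reduce

# Hypothetical tweet records; field names are illustrative, not from the talk.
tweets = [
    {"text": "hello spark", "retweet_count": 3},
    {"text": "RDDs are resilient", "retweet_count": 42},
    {"text": "key-value pairs", "retweet_count": 7},
]

# Same shape as the RDD version: a map into (count, text) pairs,
# then a reduce that keeps the pair with the larger count.
pairs = map(lambda t: (t["retweet_count"], t["text"]), tweets)
most_retweeted = reduce(lambda a, b: a if a[0] >= b[0] else b, pairs)
print(most_retweeted)  # (42, 'RDDs are resilient')
```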
Big Data Day LA 2016 / Hadoop / Spark / Kafka track - Iterative Spark Developmen... (Data Con LA)
This presentation will explore how Bloomberg uses Spark, with its formidable computational model for distributed, high-performance analytics, to take this process to the next level, and look into one of the innovative practices the team is currently developing to increase efficiency: the introduction of a logical signature for datasets.
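The abstract does not spell out what a "logical signature" is; one plausible reading, sketched below purely as an assumption, is a deterministic fingerprint of a dataset's logical content (schema plus a canonical serialization of the rows), usable as a cache key so that logically identical datasets are recognized without recomputation:

```python
import hashlib
import json

# Assumed design: hash a canonical form of (schema, rows). Sorting the rows
# makes the signature insensitive to physical ordering, so two datasets with
# the same logical content map to the same key.
def logical_signature(rows, schema):
    canonical = json.dumps(
        {"schema": schema, "rows": sorted(map(tuple, rows))},
        sort_keys=True,
    )
    return hashlib.sha256(canonical.encode()).hexdigest()

a = logical_signature([(1, "x"), (2, "y")], ["id", "name"])
b = logical_signature([(2, "y"), (1, "x")], ["id", "name"])  # same content, different order
print(a == b)  # True
```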
Co-occurrence Based Recommendations with Mahout, Scala and Spark (sscdotopen)
This document discusses techniques for co-occurrence-based recommendations using Apache Mahout, Scala, and Spark. It describes how Mahout computes the co-occurrence matrix ATA using a row-outer product formulation that executes in a single pass over the row-partitioned matrix A. It also explains how the computation is optimized physically by using specialized operators like Transpose-Times-Self to avoid repartitioning the matrix. Finally, it provides examples of how the distributed computation of ATA is implemented across worker nodes.
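The row-outer-product formulation described there can be sketched in a few lines: A'A is the sum of outer(row, row) over the rows of A, so each worker can accumulate a partial sum over its partition in a single pass. A single-node NumPy illustration (not Mahout's actual code):

```python
import numpy as np

# Single-pass accumulation of A'A from the rows of A. Each row contributes
# one outer product, and the partial sums from row partitions simply add up,
# which is what makes the one-pass distributed computation possible.
def ata_by_rows(A):
    n = A.shape[1]
    acc = np.zeros((n, n))
    for row in A:  # one pass over the row-partitioned matrix
        acc += np.outer(row, row)
    return acc

A = np.array([[1.0, 0.0, 2.0],
              [0.0, 1.0, 1.0]])
result = ata_by_rows(A)
print(result)  # equals A.T @ A
```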
This is an updated version of the opening talk I gave on February 4, 2013 at the Dagstuhl Seminar on Fault Prediction, Localization, and Repair (http://goo.gl/vc6Cl).
Abstract: Software debugging, which involves localizing, understanding, and removing the cause of a failure, is a notoriously difficult, extremely time-consuming, and human-intensive activity. For this reason, researchers have invested a great deal of effort in developing automated techniques and tools for supporting various debugging tasks. Although potentially useful, most of these techniques have yet to fully demonstrate their practical effectiveness. Moreover, many current debugging approaches suffer from some common limitations and rely on several strong assumptions about both the characteristics of the code being debugged and how developers behave when debugging such code. This talk provides an overview of the state of the art in the broader area of software debugging, discusses strengths and weaknesses of the main existing debugging techniques, presents a set of open challenges in this area, and sketches future research directions that may help address these challenges.
The second part of the talk discusses joint work with Chris Parnin presented at ISSTA 2011: C. Parnin and A. Orso, "Are Automated Debugging Techniques Actually Helping Programmers?", in Proceedings of the International Symposium on Software Testing and Analysis (ISSTA 2011), pp 199-209, http://doi.acm.org/10.1145/2001420.2001445
The document presents a general framework for slicing programs and models. It defines a slicing framework with syntax, semantics, projections, and slicing criteria. It applies the framework to program slicing examples and proposes adapting it for model slicing. The framework is then demonstrated on a class model case study, showing how slicing can reduce models for comprehension and debugging.
The document discusses program dependence graphs (PDGs) and system dependence graphs (SDGs). A PDG represents a single program as a graph of nodes for statements connected by edges showing data and control dependencies. An SDG extends a PDG to represent relationships between procedures/processes in a program by adding nodes for calls, arguments, and results and edges connecting them. SDGs allow modeling interdependencies across multiple processes in a program.
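A PDG of that shape can be modeled as a plain adjacency structure, and a backward slice is then just reachability over the dependence edges. A toy sketch (statement numbers and dependencies are illustrative only):

```python
# A toy program dependence graph: node -> set of nodes it depends on
# (data or control dependencies combined, for brevity).
pdg = {
    1: set(),      # x = input()
    2: set(),      # y = input()
    3: {1},        # z = x + 1          (data-dependent on 1)
    4: {2, 3},     # if y > z: ...      (data-dependent on 2 and 3)
    5: {4, 3},     # print(z)           (control-dependent on 4)
}

# A backward slice on a criterion node is the set of nodes it transitively
# depends on -- the standard reachability view of slicing over a PDG.
def backward_slice(pdg, node):
    seen, stack = set(), [node]
    while stack:
        n = stack.pop()
        if n not in seen:
            seen.add(n)
            stack.extend(pdg[n])
    return seen

print(sorted(backward_slice(pdg, 5)))  # [1, 2, 3, 4, 5]
```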
Open Standards for ADAS: Andrew Richards, Codeplay, at AutoSens 2016 (Andrew Richards)
This presentation discusses the open standards and software tools needed to enable vision processing for autonomous vehicles. It focuses on the hardware and software platforms that can deliver results, the tools to build solutions for those platforms, and open standards that allow solutions to interoperate. The presentation advocates for using graph programming and C++-based approaches like SYCL, along with open standards like SPIR-V and HSA, to develop software today that can run on future platforms and achieve full autonomy.
A Future for R: Parallel and Distributed Processing in R for Everyone (inside-BigData.com)
In this deck from the 2018 European R Users Meeting, Henrik Bengtsson from the University of California San Francisco presents: A Future for R: Parallel and Distributed Processing in R for Everyone.
"The future package is a powerful and elegant cross-platform framework for orchestrating asynchronous computations in R. It's ideal for working with computations that take a long time to complete; that would benefit from using distributed, parallel frameworks to make them complete faster; and that you'd rather not have locking up your interactive R session."
Watch the video: https://wp.me/p3RLHQ-jJ4
Learn more: https://blog.revolutionanalytics.com/2019/01/future-package.html
Sign up for our insideHPC Newsletter: http://insidehpc.com/newsletter
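The talk is about R's future package, but the pattern it describes translates across languages. As a rough cross-language analogy (an assumption, not part of the talk), Python's concurrent.futures has the same shape: start a computation now, keep the session responsive, and collect the value only when it is needed:

```python
from concurrent.futures import ThreadPoolExecutor

# Analogous pattern to R futures (illustrative only): submit() starts the
# work in the background; result() blocks only at the point of use.
def slow_square(x):
    return x * x

with ThreadPoolExecutor() as pool:
    fut = pool.submit(slow_square, 12)  # starts running in the background
    # ... the interactive session stays responsive here ...
    result = fut.result()               # block only when the value is needed
print(result)  # 144
```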
Hadoop became the most common system for storing big data.
Alongside Hadoop, many supporting systems emerged to fill the gaps that Hadoop itself leaves open.
Together they form a large ecosystem.
This presentation covers some of those systems.
Since one presentation cannot cover them all, I focused on the most popular and the most interesting ones.
Presentation of NvFX: an effect layer that allows encapsulation of GLSL and/or the D3D shading language.
The basic concept follows in the footsteps of NVIDIA CgFX.
https://github.com/tlorach/nvFX
Implementing Near-Realtime Datacenter Health Analytics using Model-driven Ver... (Spark Summit)
This document discusses using Spark Streaming and GraphX to perform near-realtime analytics on large distributed systems. The authors present a model-driven approach to implement Pregel-style graph processing to handle heterogeneous graphs. They were able to achieve over 100,000 messages per second on a 4 node cluster by using sufficient batch sizes. Implementation challenges included scaling graph processing across nodes, dealing with graph heterogeneity, and hidden memory costs from intermediate RDDs. Lessons learned include the importance of partitioning, testing high availability, and addressing memory sinks.
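The Pregel style mentioned there boils down to a superstep loop: active vertices send messages along edges, receiving vertices update their state, and the computation ends when no vertex changes. A minimal illustrative sketch (not the authors' code) that propagates the maximum value through a graph:

```python
# Minimal Pregel-style superstep loop: vertices exchange messages each
# superstep until no vertex changes; here every vertex converges to the
# largest value in its connected component.
def pregel_max(edges, values):
    values = dict(values)
    active = set(values)
    while active:
        # Message passing: each active vertex sends its value to neighbors.
        inbox = {}
        for src in active:
            for dst in edges.get(src, ()):
                inbox.setdefault(dst, []).append(values[src])
        # Vertex program: adopt the largest incoming value if it is bigger.
        active = set()
        for v, msgs in inbox.items():
            best = max(msgs)
            if best > values[v]:
                values[v] = best
                active.add(v)
    return values

edges = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
final = pregel_max(edges, {"a": 5, "b": 1, "c": 3})
print(final)  # every vertex ends at 5
```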
Spark Training Institutes: Kelly Technologies is the best Spark classroom training institute in Bangalore, providing Spark training by real-time faculty.
BugSense is a company that analyzes data from 400 million mobile devices using custom databases built with Erlang, C, and Lisp. They process over 120,000 writes per second using their distributed, concurrent database called LDB. LDB is an in-memory time-series stream processor that allows for complex event processing and distributed, modular architecture through Erlang nodes. They use Lisp for custom data processing and views due to its expressive power for parallel computing on large datasets.
The document introduces VHDL to engineers who will use it to describe circuits for implementation in programmable logic or ASICs. It aims to provide enough information for engineers to quickly get started using VHDL while avoiding prolonged discussions more relevant for simulation developers. The document suggests coding styles appropriate for a variety of synthesis and simulation tools.
Overview of Apache SystemML by Berthold Reinwald and Nakul Jindal (Arvind Surve)
This document provides an overview of Apache SystemML, an open source machine learning framework for scalable machine learning. It discusses how SystemML allows data scientists to implement machine learning algorithms using a declarative language, and how SystemML then compiles and optimizes the algorithms to run efficiently on everything from single nodes to large clusters. It also provides examples of DML code used in SystemML and how to invoke SystemML through its APIs or command line.
Overview of Apache SystemML by Berthold Reinwald and Nakul Jindal (Arvind Surve)
This deck will provide SystemML architecture, how to get documentation for usage, algorithms etc. It will explain usage of it through command line or through notebook.
This document compares hardware implementations of three financial algorithms (Black-Scholes, Black, and binomial) using VHDL and high-level synthesis (HLS). HLS implementations achieved similar performance to VHDL in terms of throughput and latency but required fewer logic resources and used floating point for higher accuracy. However, HLS design effort was over an order of magnitude higher than VHDL. Implementing the algorithms on FPGAs provided up to 350x speedup compared to software.
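The Black-Scholes algorithm in that comparison has a well-known closed form; as a software reference point of the kind those hardware implementations accelerate, a minimal Python version (parameter names are the conventional ones: spot S, strike K, rate r, volatility sigma, maturity T):

```python
from math import log, sqrt, exp, erf

# Standard normal CDF via the error function.
def _norm_cdf(x):
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

# Closed-form Black-Scholes price of a European call option.
def black_scholes_call(S, K, r, sigma, T):
    d1 = (log(S / K) + (r + 0.5 * sigma**2) * T) / (sigma * sqrt(T))
    d2 = d1 - sigma * sqrt(T)
    return S * _norm_cdf(d1) - K * exp(-r * T) * _norm_cdf(d2)

price = black_scholes_call(S=100.0, K=100.0, r=0.05, sigma=0.2, T=1.0)
print(round(price, 2))  # about 10.45 (a standard textbook value)
```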
These are slides from the Dec 17 SF Bay Area Julia Users meeting [1]. Ehsan Totoni presented the ParallelAccelerator Julia package, a compiler that performs aggressive analysis and optimization on top of the Julia compiler. Ehsan is a Research Scientist at Intel Labs working on the High Performance Scripting project.
[1] http://www.meetup.com/Bay-Area-Julia-Users/events/226531171/
Big Data Day LA 2015 - Compiling DSLs for Diverse Execution Environments by Z... (Data Con LA)
Data transformation has traditionally required expertise in specialized data platforms and has typically been restricted to the domain of IT. A domain-specific language (DSL) separates the user's intent from a specific implementation while maintaining expressivity. A user interface can be used to produce these expressions, in the form of suggestions, without requiring the user to manually write code. This higher-level interaction, aided by transformation previews and suggestion ranking, allows domain experts such as data scientists and business analysts to wrangle data while leveraging the optimal processing framework for the data at hand.
Automatic Task-based Code Generation for High Performance DSEL (Joel Falcou)
Providing high-level tools for parallel programming while sustaining a high level of performance is a challenge that techniques like Domain Specific Embedded Languages try to solve. In previous work, we investigated the design of such a DSEL, NT2, which provides a Matlab-like syntax for parallel numerical computations inside a C++ library.
The main issue addressed here is how limitations of classical DSEL generation and of multithreaded code generation can be overcome.
Don't Call It a Comeback: Attribute Grammars for Big Data Visualization (Leo Meyerovich)
This talk overviews our web platform for big data visualization (Superconductor). It focuses on one of the core components: our domain specific language for writing custom layouts. We took a new look at the old idea of attribute grammars: I'll show live examples of writing attribute grammars, automatically finding parallelism in them, and then automatically compiling them down into efficient parallel JavaScript code that runs on GPUs.
See http://www.sc-lang.com for more. The video (which includes live demos) will be up in a couple of weeks. Thanks for Functional Monthly for hosting!
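The attribute-grammar idea behind that layout language can be shown in miniature: some attributes are synthesized bottom-up (a box's width from its children) while others are inherited top-down (a child's x position from what precedes it). An illustrative sketch, not Superconductor's actual DSL:

```python
# Attribute-grammar-style layout pass over a box tree: widths are
# synthesized bottom-up, x positions are inherited top-down. These are
# the two attribute flows such a layout language schedules automatically.
def layout(node, x=0):
    node["x"] = x                      # inherited: position flows down
    if node["kind"] == "leaf":
        return node["w"]               # synthesized: width flows up
    w = 0
    for child in node["children"]:
        w += layout(child, x + w)      # each child starts where the last ended
    return w

tree = {"kind": "hbox", "children": [
    {"kind": "leaf", "w": 30},
    {"kind": "leaf", "w": 50},
    {"kind": "leaf", "w": 20},
]}
total = layout(tree)
print(total, [c["x"] for c in tree["children"]])  # 100 [0, 30, 80]
```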
This document provides an outline for a lecture on software security. It introduces the lecturer, Roman Oliynykov, and covers various topics related to software vulnerabilities like buffer overflows, heap overflows, integer overflows, and format string vulnerabilities. It provides examples of vulnerable code and exploits, and recommendations for writing more secure code to avoid these vulnerabilities.
This document summarizes the key steps and optimizations required to develop an efficient FPGA accelerator for non-binary LDPC decoding using high-level synthesis. It describes how a naive implementation based solely on a software description achieves limited performance, and how careful tuning of the high-level application description through optimizations like directive-based transformations can achieve performance close to hand-optimized RTL. The document aims to guide readers on designing efficient HLS-based hardware accelerators by considering limitations of the target device and characteristics of the application.
Full-RAG: A modern architecture for hyper-personalization (Zilliz)
Mike Del Balso, CEO & Co-Founder at Tecton, presents "Full RAG," a novel approach to AI recommendation systems, aiming to push beyond the limitations of traditional models through a deep integration of contextual insights and real-time data, leveraging the Retrieval-Augmented Generation architecture. This talk will outline Full RAG's potential to significantly enhance personalization, address engineering challenges such as data management and model training, and introduce data enrichment with reranking as a key solution. Attendees will gain crucial insights into the importance of hyperpersonalization in AI, the capabilities of Full RAG for advanced personalization, and strategies for managing complex data integrations for deploying cutting-edge AI solutions.
HCL Notes and Domino license cost reduction in the world of DLAU (panagenda)
Webinar Recording: https://www.panagenda.com/webinars/hcl-notes-und-domino-lizenzkostenreduzierung-in-der-welt-von-dlau/
DLAU and licensing under the CCB and CCX models have been a hot topic for many in the HCL community since last year. As a Notes or Domino customer, you may be struggling with unexpectedly high user counts and license fees. You may be wondering how this new kind of licensing works and what benefit it brings you. Above all, you surely want to stay within your budget and save costs wherever possible. We understand that, and we want to help!
We explain how to resolve common configuration problems that can cause more users to be counted than necessary, and how to identify and remove superfluous or unused accounts to save money. There are also some practices that can lead to unnecessary expenses, for example using a person document instead of a mail-in database for shared mailboxes. We show you such cases and their solutions. And of course we explain the new license model.
Join this webinar, in which HCL Ambassador Marc Thomas and guest speaker Franz Walder introduce you to this new world. It gives you the tools and the know-how to keep track of everything. You will be able to reduce your costs through an optimized Domino configuration and keep them low in the future.
These topics are covered:
- Reducing license costs by finding and fixing misconfigurations and superfluous accounts
- How do CCB and CCX licenses really work?
- Understanding the DLAU tool and how best to use it
- Tips for common problem areas, such as team mailboxes, functional/test users, etc.
- Practical examples and best practices you can apply immediately
Nehmen Sie an diesem Webinar teil, bei dem HCL-Ambassador Marc Thomas und Gastredner Franz Walder Ihnen diese neue Welt näherbringen. Es vermittelt Ihnen die Tools und das Know-how, um den Überblick zu bewahren. Sie werden in der Lage sein, Ihre Kosten durch eine optimierte Domino-Konfiguration zu reduzieren und auch in Zukunft gering zu halten.
Diese Themen werden behandelt
- Reduzierung der Lizenzkosten durch Auffinden und Beheben von Fehlkonfigurationen und überflüssigen Konten
- Wie funktionieren CCB- und CCX-Lizenzen wirklich?
- Verstehen des DLAU-Tools und wie man es am besten nutzt
- Tipps für häufige Problembereiche, wie z. B. Team-Postfächer, Funktions-/Testbenutzer usw.
- Praxisbeispiele und Best Practices zum sofortigen Umsetzen
Monitoring and Managing Anomaly Detection on OpenShift.pdfTosin Akinosho
Monitoring and Managing Anomaly Detection on OpenShift
Overview
Dive into the world of anomaly detection on edge devices with our comprehensive hands-on tutorial. This SlideShare presentation will guide you through the entire process, from data collection and model training to edge deployment and real-time monitoring. Perfect for those looking to implement robust anomaly detection systems on resource-constrained IoT/edge devices.
Key Topics Covered
1. Introduction to Anomaly Detection
- Understand the fundamentals of anomaly detection and its importance in identifying unusual behavior or failures in systems.
2. Understanding Edge (IoT)
- Learn about edge computing and IoT, and how they enable real-time data processing and decision-making at the source.
3. What is ArgoCD?
- Discover ArgoCD, a declarative, GitOps continuous delivery tool for Kubernetes, and its role in deploying applications on edge devices.
4. Deployment Using ArgoCD for Edge Devices
- Step-by-step guide on deploying anomaly detection models on edge devices using ArgoCD.
5. Introduction to Apache Kafka and S3
- Explore Apache Kafka for real-time data streaming and Amazon S3 for scalable storage solutions.
6. Viewing Kafka Messages in the Data Lake
- Learn how to view and analyze Kafka messages stored in a data lake for better insights.
7. What is Prometheus?
- Get to know Prometheus, an open-source monitoring and alerting toolkit, and its application in monitoring edge devices.
8. Monitoring Application Metrics with Prometheus
- Detailed instructions on setting up Prometheus to monitor the performance and health of your anomaly detection system.
9. What is Camel K?
- Introduction to Camel K, a lightweight integration framework built on Apache Camel, designed for Kubernetes.
10. Configuring Camel K Integrations for Data Pipelines
- Learn how to configure Camel K for seamless data pipeline integrations in your anomaly detection workflow.
11. What is a Jupyter Notebook?
- Overview of Jupyter Notebooks, an open-source web application for creating and sharing documents with live code, equations, visualizations, and narrative text.
12. Jupyter Notebooks with Code Examples
- Hands-on examples and code snippets in Jupyter Notebooks to help you implement and test anomaly detection models.
AI 101: An Introduction to the Basics and Impact of Artificial IntelligenceIndexBug
Imagine a world where machines not only perform tasks but also learn, adapt, and make decisions. This is the promise of Artificial Intelligence (AI), a technology that's not just enhancing our lives but revolutionizing entire industries.
Fueling AI with Great Data with Airbyte WebinarZilliz
This talk will focus on how to collect data from a variety of sources, leveraging this data for RAG and other GenAI use cases, and finally charting your course to productionalization.
Driving Business Innovation: Latest Generative AI Advancements & Success StorySafe Software
Are you ready to revolutionize how you handle data? Join us for a webinar where we’ll bring you up to speed with the latest advancements in Generative AI technology and discover how leveraging FME with tools from giants like Google Gemini, Amazon, and Microsoft OpenAI can supercharge your workflow efficiency.
During the hour, we’ll take you through:
Guest Speaker Segment with Hannah Barrington: Dive into the world of dynamic real estate marketing with Hannah, the Marketing Manager at Workspace Group. Hear firsthand how their team generates engaging descriptions for thousands of office units by integrating diverse data sources—from PDF floorplans to web pages—using FME transformers, like OpenAIVisionConnector and AnthropicVisionConnector. This use case will show you how GenAI can streamline content creation for marketing across the board.
Ollama Use Case: Learn how Scenario Specialist Dmitri Bagh has utilized Ollama within FME to input data, create custom models, and enhance security protocols. This segment will include demos to illustrate the full capabilities of FME in AI-driven processes.
Custom AI Models: Discover how to leverage FME to build personalized AI models using your data. Whether it’s populating a model with local data for added security or integrating public AI tools, find out how FME facilitates a versatile and secure approach to AI.
We’ll wrap up with a live Q&A session where you can engage with our experts on your specific use cases, and learn more about optimizing your data workflows with AI.
This webinar is ideal for professionals seeking to harness the power of AI within their data management systems while ensuring high levels of customization and security. Whether you're a novice or an expert, gain actionable insights and strategies to elevate your data processes. Join us to see how FME and AI can revolutionize how you work with data!
Unlocking Productivity: Leveraging the Potential of Copilot in Microsoft 365, a presentation by Christoforos Vlachos, Senior Solutions Manager – Modern Workplace, Uni Systems
For the full video of this presentation, please visit: https://www.edge-ai-vision.com/2024/06/building-and-scaling-ai-applications-with-the-nx-ai-manager-a-presentation-from-network-optix/
Robin van Emden, Senior Director of Data Science at Network Optix, presents the “Building and Scaling AI Applications with the Nx AI Manager,” tutorial at the May 2024 Embedded Vision Summit.
In this presentation, van Emden covers the basics of scaling edge AI solutions using the Nx tool kit. He emphasizes the process of developing AI models and deploying them globally. He also showcases the conversion of AI models and the creation of effective edge AI pipelines, with a focus on pre-processing, model conversion, selecting the appropriate inference engine for the target hardware and post-processing.
van Emden shows how Nx can simplify the developer’s life and facilitate a rapid transition from concept to production-ready applications.He provides valuable insights into developing scalable and efficient edge AI solutions, with a strong focus on practical implementation.
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdfMalak Abu Hammad
Discover how MongoDB Atlas and vector search technology can revolutionize your application's search capabilities. This comprehensive presentation covers:
* What is Vector Search?
* Importance and benefits of vector search
* Practical use cases across various industries
* Step-by-step implementation guide
* Live demos with code snippets
* Enhancing LLM capabilities with vector search
* Best practices and optimization strategies
Perfect for developers, AI enthusiasts, and tech leaders. Learn how to leverage MongoDB Atlas to deliver highly relevant, context-aware search results, transforming your data retrieval process. Stay ahead in tech innovation and maximize the potential of your applications.
#MongoDB #VectorSearch #AI #SemanticSearch #TechInnovation #DataScience #LLM #MachineLearning #SearchTechnology
Things to Consider When Choosing a Website Developer for your Website | FODUUFODUU
Choosing the right website developer is crucial for your business. This article covers essential factors to consider, including experience, portfolio, technical skills, communication, pricing, reputation & reviews, cost and budget considerations and post-launch support. Make an informed decision to ensure your website meets your business goals.
Building Production Ready Search Pipelines with Spark and MilvusZilliz
Spark is the widely used ETL tool for processing, indexing and ingesting data to serving stack for search. Milvus is the production-ready open-source vector database. In this talk we will show how to use Spark to process unstructured data to extract vector representations, and push the vectors to Milvus vector database for search serving.
Programming Foundation Models with DSPy - Meetup SlidesZilliz
Prompting language models is hard, while programming language models is easy. In this talk, I will discuss the state-of-the-art framework DSPy for programming foundation models with its powerful optimizers and runtime constraint system.
Ivanti’s Patch Tuesday breakdown goes beyond patching your applications and brings you the intelligence and guidance needed to prioritize where to focus your attention first. Catch early analysis on our Ivanti blog, then join industry expert Chris Goettl for the Patch Tuesday Webinar Event. There we’ll do a deep dive into each of the bulletins and give guidance on the risks associated with the newly-identified vulnerabilities.
Removing Uninteresting Bytes in Software FuzzingAftab Hussain
Imagine a world where software fuzzing, the process of mutating bytes in test seeds to uncover hidden and erroneous program behaviors, becomes faster and more effective. A lot depends on the initial seeds, which can significantly dictate the trajectory of a fuzzing campaign, particularly in terms of how long it takes to uncover interesting behaviour in your code. We introduce DIAR, a technique designed to speedup fuzzing campaigns by pinpointing and eliminating those uninteresting bytes in the seeds. Picture this: instead of wasting valuable resources on meaningless mutations in large, bloated seeds, DIAR removes the unnecessary bytes, streamlining the entire process.
In this work, we equipped AFL, a popular fuzzer, with DIAR and examined two critical Linux libraries -- Libxml's xmllint, a tool for parsing xml documents, and Binutil's readelf, an essential debugging and security analysis command-line tool used to display detailed information about ELF (Executable and Linkable Format). Our preliminary results show that AFL+DIAR does not only discover new paths more quickly but also achieves higher coverage overall. This work thus showcases how starting with lean and optimized seeds can lead to faster, more comprehensive fuzzing campaigns -- and DIAR helps you find such seeds.
- These are slides of the talk given at IEEE International Conference on Software Testing Verification and Validation Workshop, ICSTW 2022.
2. The need for high-level synthesis (HLS)
Moore’s law anticipates an annual increase in chip complexity by 58%
At the same time, human designers’ productivity increase is limited to 21% per annum
This designer-productivity gap is a major problem in achieving time-to-market of hardware products
Solution: adoption of a high-level design and synthesis methodology, imposing user entry from a raised level of abstraction
• Hide low-level, time-consuming, error-prone details
• Drastically reduce human effort
• End-to-end automation from concept to production
HLS: an algorithmic description is automatically synthesized to a customized digital embedded system
Nikolaos Kavvadias nkavvadias@ajaxcompilers.com
The HercuLeS HLS environment
3. Current status of HLS tools
The HLS tree bears fruit of all sorts, academic and commercial
• Source releases, binary releases, web tools, vendor-dependent tools, ‘‘unreleases’’ (unavailable for testing)
Commercial (CatapultC, ImpulseC, Cadence C-to-Silicon, Synopsys Synphony, Xilinx Vivado HLS)
• ASIC-oriented tools priced within the 6-digit range ($100,000+)
• Xilinx Vivado HLS (formerly AutoESL) for FPGA prototyping is priced at $4,800
Tools with free source/binaries (ROCCC, GAUT, SPARK, PandA, LegUp)
• Unsupported in the long term; most of them abandoned after funding ends/Ph.D.s get completed
Web access tools (C-to-Verilog, TransC)
• Generate incomplete designs, no testbench, limited C support
4. Omissions, limitations and inefficiencies of current HLS tools
The devising and use of non-standard, idiosyncratic languages
• HercuLeS connects to external frontends via a simple interface. Apart from test frontends, work is underway for GCC/GIMPLE and clang/LLVM support
Insufficient, opaque representations, recording only partial information
• HercuLeS uses a universal typed-assembly language, called NAC, as an intermediate representation
Maintenance difficulties; code and API bloat as longevity threats
• Optimizations are added as self-contained external modules
Mandating the use of code templates
• HercuLeS does not rely on code templates since it uses a graph-based backend
Succumbing to vendor and technology dependence
• The generated HDL is human-readable and vendor- and technology-independent
5. What’s in a name
HercuLeS: an extensible, high-level synthesis (HLS) environment for whole-program hardware compilation with pluggable analyses and optimizations
Named after the homonymous constellation, not the mighty but flawed demigod
6. The HercuLeS environment
HercuLeS is a new high-level synthesis tool marketed by Ajax Compilers
Easy to use, extensible, high-level synthesis (HLS) environment for whole-program hardware compilation
In development since 2009
HercuLeS targets both hardware and software engineers/developers
• ASIC/SoC developers, FPGA-based/prototype/reference system engineers
• Algorithm developers (custom HW algorithm implementations)
• Application engineers (application acceleration)
7. The HercuLeS flow
Optimized C code is passed to GCC for GIMPLE generation
gimple2nac translates GIMPLE to the N-Address Code (NAC) IR
HercuLeS = nac2cdfg + cdfg2hdl
• nac2cdfg: SSA construction/CDFG extraction from NAC
• cdfg2hdl: automatic FSMD hardware and self-checking testbench generation
Modular and extensible flow; support for the basic GMP (multi-precision integer) API/DSL added in 24h of effort (over 3 days)
• mpint.vhd: MP integer and SLV operators: 500 LOC, 12h
• gimple2nac extensions: 50 LOC, 4h
• HercuLeS additions: 150 LOC, 8h
8. Features
Automatic RTL VHDL code and testbench generation
Automatic user-defined IP integration
C subset frontend
HercuLeS GUI
Parallel operation scheduling with chaining optimizations
Arithmetic optimizations, register optimization, C source code optimizer incl. array flattening optimizations
VHDL-2008 floating-point and fixed-point arithmetic support
GNU multi-precision integer extensions
C verification backend
GHDL/Modelsim support
9. How it works
The user supplies NAC for a translation unit and reference test data
nac2cdfg: translator from NAC to flat CDFGs; generates C backend files and VHDL packages for compound data types
Each source procedure is represented by a CDFG
cdfg2hdl: maps CDFGs (*.dot) to an extended FSMD MoC
All required scripts are automatically generated (GHDL/Modelsim simulation, logic synthesis, backend C compilation)
Diagnostic simulation output; tracing of VHDL signals/C variables
10. NAC (N-Address Code)
NAC is a procedural intermediate language
• Extensible typed-assembly language, similar in concept to GCC’s GIMPLE and LLVM IR but more generic and simple
• Arbitrary m-to-n mappings, virtual address space per array
• Statements: operations (atomic) and procedure calls (non-atomic)
• Bit-accurate data types (integer, fixed-point, single/double/custom floating-point arithmetic)
• Uses: RISC-like VM for static/dynamic analyses, CDFG extraction, graph-based data flow analyses, input to HLS kernels, software compilation
GCC GIMPLE: three-address code IR; actual semantics not yet complete/stable; GCC code base huge and cluttered
LLVM IR: low-level IR, not directly usable as a machine model; backward compatibility issues; provides access to a modern optimization infrastructure
12. Example translation flow: 2D Euclidean distance approximation (overview)
Approximating the Euclidean distance of a point (x, y) from the origin by:
eda = MAX((0.875 * x + 0.5 * y), x), where x = MAX(|a|, |b|) and y = MIN(|a|, |b|)
Average error against dist = sqrt(a^2 + b^2) is 4.7% when compared to the rounded-down and 3.85% to the rounded-up dist value
ANSI C:
#define ABS(x)   ((x) > 0 ? (x) : (-(x)))
#define MAX(x,y) ((x) > (y) ? (x) : (y))
#define MIN(x,y) ((x) < (y) ? (x) : (y))

int eda(int in1, int in2) {
    int t1, t2, t3, t4, t5;
    int t6, x, y;
    t1 = ABS(in1);
    t2 = ABS(in2);
    x = MAX(t1, t2);
    y = MIN(t1, t2);
    t3 = x >> 3;
    t4 = y >> 1;
    t5 = x - t3;
    t6 = t4 + t5;
    return MAX(t6, x);
}
NAC IR:
procedure eda (in s16 in1, in s16 in2, out u16 out1) {
    localvar u16 x, y, t1, t2, t3, t4, t5, t6, t7;
S_1:
    t1 <= abs in1;
    t2 <= abs in2;
    x <= max t1, t2;
    y <= min t1, t2;
    t3 <= shr x, 3;
    t4 <= shr y, 1;
    t5 <= sub x, t3;
    t6 <= add t4, t5;
    t7 <= max t6, x;
    out1 <= mov t7;
}
[Figure: CDFG of eda — in1 and in2 feed abs nodes; max and min produce x and y; shr by 3 and by 1, then sub and add combine them; a final max and mov drive out1]
13. Example translation flow: 2D Euclidean distance approximation (VHDL code)
VHDL:
when S_ENTRY =>
    ready <= '1';
    if (start = '1') then
        next_state <= S_001_001;
    else
        next_state <= S_ENTRY;
    end if;
when S_001_001 =>
    if (in1(15) = '1') then
        t1_next <= slv(not(unsigned(in1)) + "1");
    else
        t1_next <= in1;
    end if;
    if (in2(15) = '1') then
        t2_next <= slv(not(unsigned(in2)) + "1");
    else
        t2_next <= in2;
    end if;
    next_state <= S_001_002;
when S_001_002 =>
    if (t1_reg > t2_reg) then
        x_next <= t1_reg;
    else
        x_next <= t2_reg;
    end if;
    if (t1_reg < t2_reg) then
        y_next <= t1_reg;
    else
        y_next <= t2_reg;
    end if;
    next_state <= S_001_003;
when S_001_003 =>
    t3_next <= "000" & x_reg(15 downto 3);
    t4_next <= "0" & y_reg(15 downto 1);
    next_state <= S_001_004;
when S_001_004 =>
    t5_next <= slv(unsigned(x_reg) - unsigned(t3_reg));
    next_state <= S_001_005;
when S_001_005 =>
    t6_next <= slv(unsigned(t4_reg) + unsigned(t5_reg));
    next_state <= S_001_006;
when S_001_006 =>
    if (t6_reg > x_reg) then
        t7_next <= t6_reg;
    else
        t7_next <= x_reg;
    end if;
    next_state <= S_001_007;
when S_001_007 =>
    out1_next <= t7_reg;
    next_state <= S_EXIT;
when S_EXIT =>
    done <= '1';
    next_state <= S_ENTRY;
14. SSA (Static Single Assignment) form construction
Pseudo-statements called φ-functions join variable definitions from different control-flow paths
Enforce a single definition site for each variable
False dependencies are naturally removed
Many analyses and optimizations are simplified
HercuLeS supports minimal SSA (in the number of φs) and intrablock-only (pseudo) SSA
[Figure: a loop (i = 123; j = i * j; loop body with PRINT(j), t = j > 5, i = i + 1, back-edge test t = i <= 234) shown prior to SSA, in minimal SSA and in pseudo SSA; the example is reproduced on slide 16]
15. Introduction to the SSA (Static Single Assignment) form
SSA enforces a single definition site for each variable
Pseudo-statements called φ-functions join variable definitions from different control-flow paths
Many analyses and optimizations are simplified
HercuLeS supports minimal SSA (in the number of φs) and pseudo-SSA algorithms
• Minimal SSA uses a scan-based algorithm: φ-functions are inserted for each variable in each basic block (BB), followed by iterative removal of redundant φs
• Pseudo-SSA form (a kind of intrablock SSA): no φ-insertion; the unversioned variable names are restored prior to the exit of each BB
16. SSA construction example
[Figure: SSA construction example. The loop
    i = 123; j = i * j; loop { PRINT(j); t = j > 5; i = i + 1; t = i <= 234 }
is shown in three forms: prior to SSA; in minimal SSA, where the loop header carries i2 = phi(i1, i4), the increment becomes i4 = i2 + 1, the back-edge test is t1 = i4 <= 234, and the exit joins i7 = phi(i4, i2); and in pseudo SSA, where $-versioned names such as i$1, j$1, t0$1, t1$1 are used within each basic block and the unversioned names are restored at block exits.]
18. Representing hardware as FSMDs
I/O interface:
Port    Dir.  Description
clk     I     external clocking source
reset   I     asynchronous (or synchronous) reset
start   I     enable computation
din     I     data inputs
dout    O     data outputs
ready   O     the block is ready to accept new input
valid   O     a data output port is streamed out
done    O     end of computation for the block

FSMD (Finite-State Machine with Datapath) as a MoC is universal, well-defined and suitable for either data- or control-dominated applications
FSMDs: FSMs with embedded datapath actions
HercuLeS supports extended FSMDs (hierarchical calls, communication with on-chip memories, IP integration)
19. Representing hardware as FSMDs (Finite-State Machines with Datapath)
FSMD as a MoC is universal, well-defined and suitable for either data- or control-dominated applications
HercuLeS generates extended FSMD architectures
FSMDs are FSMs with embedded datapath actions within the next-state generation logic
HercuLeS FSMDs use fully-synchronous conventions and register all their outputs
Array data ports are supported; multi-dimensional data ports are feasible based on their equivalent single-dimensional flattened array type definition
Use of din: in std_logic_vector(M*N-1 downto 0); for M related ports of width N
A selection of the form din((i+1)*N-1 downto i*N) is typical for a for-generate loop in order to synthesize iterative structures
20. Basic FSMD I/O interface
clk    signal from external clocking source
reset  asynchronous (or synchronous) reset
start  enable computation
din    data inputs
dout   data outputs
ready  the block is ready to accept new input
valid  asserted when a certain data output port is streamed out from the block (generally it is a vector)
done   end of computation for the block
21. Hierarchical calls between FSMDs
Supported to arbitrary depth and complexity (apart from recursion)
Example of a caller FSMD, handing over computation to a callee superstate (a square root computation)
Variable-related quantities are represented by three signals: *_next (value to-be-written), *_reg (value currently read from register), *_eval (callee output)
22. Communication with embedded memories
load: requires a wait-state register for devising a dual-cycle substate (address + data cycles)
store: raises a block RAM write; stored data are made available in the subsequent machine cycle
[Figure: state diagrams of the store and load substates]
23. Communication with embedded memories
Only two memory communication primitives are needed in NAC: load and store
load: requires a wait-state register to devise a dual-cycle substate (address + data cycles)
store: raises a BRAM write; stored data are made available in the subsequent machine cycle

load example:
when STATE_1 =>
    mem_addr <= index;
    wstate_next <= not (wstate_reg);
    if (wstate_reg = '1') then
        mysignal_next <= mem_dout;
        next_state <= STATE_2;
    else
        next_state <= STATE_1;
    end if;
when STATE_2 =>
    ...

store example:
when STATE_1 =>
    mem_we <= '1';
    mem_addr <= index;
    mem_din <= mysignal_reg;
    next_state <= STATE_2;
when STATE_2 =>
    ...
24. Operation chaining
Assign multiple data-dependent operations to a single control step
Simple means for selective operation chaining involve merging ASAP states into compound states
Intermediate registers are eliminated
Basic block partitioning heuristic for critical path reduction

Before chaining:
when S_1_3 =>
    t3_next <= "000" & x_reg(15 downto 3);
    t4_next <= "0" & y_reg(15 downto 1);
    next_state <= S_1_4;
when S_1_4 =>
    t5_next <= x_reg - t3_reg;
    next_state <= S_1_5;
when S_1_5 =>
    t6_next <= t4_reg + t5_reg;
    next_state <= S_1_6;
...
when S_1_7 =>
    out1_next <= t7_reg;
    next_state <= S_EXIT;

After chaining:
when S_1_1 =>
    ...
    t3_next <= "000" & x_next(15 downto 3);
    t4_next <= "0" & y_next(15 downto 1);
    t5_next <= x_next - t3_next;
    t6_next <= t4_next + t5_next;
    ...
    out1_next <= t7_next;
    next_state <= S_EXIT;
25. Automatic IP integration
IPs: third-party components used in hardware systems (e.g. dividers, floating-point operators)
How to import and use your own IP:
1. Implement the IP and place it in the proper subdirectory
2. Add an entry in the text database
3. Replace operator uses by black-box function calls
4. HercuLeS creates a hierarchical FSMD with the requested callee(s)
26. The optimization space of HercuLeS
Source language level:
• Arithmetic: constant multiplication optimization, constant division optimization, optimization of polynomial expressions, low-level superoptimizations
• Loop-oriented: partial loop unrolling, full loop unrolling, loop coalescing, strip mining
• Restructuring: array flattening, statement vectorizer
N-Address Code level:
• SSA construction, if-conversion, peephole transformations, basic block partitioning, black box function insertion, GNU MP extensions
Graphviz CDFG and VHDL level:
• sequential scheduling, ASAP scheduling, dead node elimination, hardware interpretation of SSA form, synchronous embedded memories (block RAMs), operation chaining
27. Optimizations
Optimizers are implemented as external modules
ANSI/ISO C optimizations
• Basic loop optimizations: strip mining, loop coalescing, partial and full loop unrolling
• Syntactical transformations among iteration schemes
• Arithmetic optimizations: constant multiplication/division/modulo; optimization of linear systems; univariate polynomial evaluation (Horner scheme, Estrin scheme); multivariate polynomial evaluation (parallelized, optimized, brute-force variants)
NAC optimizations
• Single constant multiplication/division; peephole optimizations; use of superoptimized operation sequences
Graphviz/CDFG optimizations, e.g. redundancy removal
VHDL optimizations, e.g. operation chaining
28. Optimizations at the C level
Source-level optimizer targets speed improvement via generic code restructuring and loop-specific optimizations
Implemented as TXL transformations

Transformation   Description                              Params
bump             Alter loop boundaries by an offset       offset, step
extension        Extend loop boundaries                   lo, hi
reduction        Reverse the effect of extension          –
reversal         Reverse iteration direction              –
normalization    Convert arbitrary to well-behaved loops  –
fusion           Merge bodies of successive loops         –
coalescing       Nested loops into single loop            –
unswitching      Move invariant control code              –
strip mining     Single loop tiling                       tilesize
29. Optimizations at the NAC level
Examples include if-conversion and arithmetic optimizations (constant multiplication and division)
Integer constant division: replace a variable divider by a custom circuit
Use a multiplicative inverse followed by a number of compensation steps
Requires 2-9 states (amortized maximum)
Can be further optimized by operation chaining and by replacing the 64-bit (long long) constant multiply
30. Example: Prime factorization (1/4)
A streaming-output implementation of prime factorization
ANSI C:
void pfactor(unsigned int x, unsigned int *outp)
{
    unsigned int i = 2, n = x;
    while (i <= n) {
        while ((n % i) == 0) {
            n = n / i;
            *outp = i;
        }
        i = i + 1;
    }
}

NAC:
procedure pfactor (in u32 x, out u32 outp) {
    localvar u32 D_1369, i, n;
L0005:
    n <= mov x;
    i <= ldc 2;
    D_1366 <= jmpun;
D_1365:
    D_1363 <= jmpun;
D_1362:
    (n) <= divu(n, i);
    outp <= mov i;
    D_1363 <= jmpun;
D_1363:
    (D_1369) <= modu(n, i);
    D_1362, D_1364 <= jmpeq D_1369, 0;
D_1364:
    i <= add i, 1;
    D_1366 <= jmpun;
D_1366:
    D_1365, D_1367 <= jmple i, n;
D_1367:
    nop;
}
31. Example: Prime factorization (2/4)
[Figure: CFG and CDFG extracted by nac2cdfg for pfactor, with basic blocks BB1-BB5 plus exit, the mov/ldc/mod/div/add operation nodes, and SSA-versioned values such as i_2, n_3, t0_4 joined across the loop edges]
32. Example: Prime factorization (3/4)
VHDL (FSMD excerpt):
when S_002_001 =>
    modu_10_start <= '1';
    next_state <= S_004_001;
when S_003_001 =>
    if (divu_6_ready = '1' and divu_6_start = '0') then
        n_1_next <= n_1_eval;
        next_state <= S_003_002;
    else
        next_state <= S_003_001;
    end if;
...
when S_004_001 =>
    if (modu_10_ready = '1' and modu_10_start = '0') then
        D_1369_1_next <= D_1369_1_eval;
        next_state <= S_004_002;
    else
        next_state <= S_004_001;
    end if;
when S_004_002 =>
    if (D_1369_1_reg = CNST_0) then
        divu_6_start <= '1';
        next_state <= S_003_001;
    else
        next_state <= S_005_001;
    end if;
...

VHDL (cont.), callee instantiations:
divu_6 : entity WORK.divu(fsmd)
    generic map (W => 32)
    port map (clk, reset, divu_6_start, n_reg, i_reg,
              n_1_eval, divu_6_done, divu_6_ready);
modu_10 : entity WORK.modu(fsmd)
    generic map (W => 32)
    port map (clk, reset, modu_10_start, n_reg, i_reg,
              D_1369_1_eval, modu_10_done, modu_10_ready);
33. Example: Prime factorization (4/4)
Input vector data:
00000004 00000005 00000006 00000007 00000008 00000009 0000000a
(reference outputs per input: 00000002 00000002 | 00000005 |
00000002 00000003 | 00000007 | 00000002 00000002 00000002 |
00000003 00000003 | 00000002 00000005)
Diagnostic output:
x=00000004 outp=00000002 outp_ref=00000002
x=00000004 outp=00000002 outp_ref=00000002
PFACTOR OK: Number of cycles=212
x=00000005 outp=00000005 outp_ref=00000005
PFACTOR OK: Number of cycles=265
x=00000006 outp=00000002 outp_ref=00000002
x=00000006 outp=00000003 outp_ref=00000003
PFACTOR OK: Number of cycles=256
x=00000007 outp=00000007 outp_ref=00000007
PFACTOR OK: Number of cycles=353
x=00000008 outp=00000002 outp_ref=00000002
x=00000008 outp=00000002 outp_ref=00000002
x=00000008 outp=00000002 outp_ref=00000002
PFACTOR OK: Number of cycles=291
x=00000009 outp=00000003 outp_ref=00000003
x=00000009 outp=00000003 outp_ref=00000003
PFACTOR OK: Number of cycles=256
x=0000000A outp=00000002 outp_ref=00000002
x=0000000A outp=00000005 outp_ref=00000005
34. Example: Multi-function CORDIC
Universal multi-function CORDIC IP core automatically generated from
NAC
Supports all directions (ROTATION, VECTORING) and modes
(CIRCULAR, LINEAR, HYPERBOLIC); uses Q2.14 fixed-point arithmetic
I/O interface similar to Xilinx CORDIC ( xin, yin, zin; xout, yout,
zout; dir, mode)
Computes cos(xin), sin(yin), arctan(yin/xin), yin/xin, √w, 1/√w with
xin = w + 1/4, yin = w − 1/4 (in two stages: a. y = 1/w, b. z = √y)
Monolithic design, low area, requires external scaling (e.g. for the square
root)
Self-checking testbench autogenerated
Scheduler: ASAP + chaining w/o BB partitioning
Lines-of-code: C 29, NAC 56, Graphviz 178, VHDL 436

Design      Description                                Max. freq. (MHz)  Area (LUTs)
cordic1cyc  1-cycle/iteration; uses asynchronous-read  204.5             741
            LUT RAM
cordic5cyc  5-cycles/iteration; uses synchronous-read  271.5             571, 1 BRAM
            (block) RAM
35. Excerpt from ANSI C implementation of multi-function
CORDIC
Hand-optimized by Nikolaos Kavvadias
void cordicopt(dir, mode, xin, yin, zin, *xout, *yout, *zout) {
  ...
  x = xin; y = yin; z = zin;
  offset = ((mode == HYPER) ? 0 : ((mode == LIN) ? 14 : 28));
  kfinal = ((mode != HYPER) ? CNTAB : CNTAB + 1);
  for (k = 0; k < kfinal; k++) {
    d = ((dir == ROTN) ? ((z >= 0) ? 0 : 1) : ((y < 0) ? 0 : 1));
    kk = ((mode != HYPER) ? k : cordic_hyp_steps[k]);
    xbyk = (x >> kk);
    ybyk = ((mode == HYPER) ? -(y >> kk) : ((mode == LIN) ? 0 : (y >> kk)));
    tabval = cordic_tab[kk + offset];
    x1 = x - ybyk; x2 = x + ybyk;
    y1 = y + xbyk; y2 = y - xbyk;
    z1 = z - tabval; z2 = z + tabval;
    x = ((d == 0) ? x1 : x2);
    y = ((d == 0) ? y1 : y2);
    z = ((d == 0) ? z1 : z2);
  }
  *xout = x; *yout = y; *zout = z;
}
36. Multi-function CORDIC VHDL (partial)
architecture fsmd of cordicopt is
type state_type is (S_ENTRY , S_EXIT , S_001 , S_002 , S_003 , S_004 );
signal current_state , next_state : state_type ;
-- Scalar , vector signals and constants
...
begin
process (*)
begin
...
case current_state is ...
when S_003 =>
t1_next <= cordic_hyp_steps ( to_integer ( unsigned (k_reg (3 downto 0))));
if ( lmode_reg /= CNST_2 (15 downto 0)) then
kk_next <= k_reg (15 downto 0);
else
kk_next <= t1_next (15 downto 0);
end if;
t2_next <= shrv4 (y_reg , kk_next , '1');
...
x1_next <= slv( signed (x_reg) - signed ( ybyk_next (15 downto 0)));
y1_next <= slv( signed (y_reg) + signed ( xbyk_next (15 downto 0)));
z1_next <= slv( signed (z_reg) - signed ( tabval_next (15 downto 0)));
...
end process ;
zout <= zout_reg ;
yout <= yout_reg ;
xout <= xout_reg ;
end fsmd;
37. Floating-point example: lerp – linear interpolation (1/2)
lerp(t, a, b) = a ∗ (1 − t) + b ∗ t = a + t(b − a) computes a
value between two numbers a, b at a relative position
0.0 ≤ t ≤ 1.0
Used in computer graphics for drawing dotted lines and in
Perlin noise functions (for terrain generation)
#define LERP(t,a,b) (a+t*(b-a))
double lerp(double t,
            double x, double y) {
  double temp;
  temp = LERP(t, x, y);
  return (temp);
}
ANSI C
procedure lerp(in f1.11.52 t,
               in f1.11.52 x,
               in f1.11.52 y,
               out f1.11.52 D_1365) {
  localvar f1.11.52 D_1363;
  localvar f1.11.52 D_1364;
  localvar f1.11.52 temp;
L0005:
  D_1363 <= sub y, x;
  D_1364 <= mul D_1363, t;
  temp <= add D_1364, x;
  D_1365 <= mov temp;
}
NAC IR
[Figure: DFG of lerp — (y, x) → sub → D_1363; (D_1363, t) → mul → D_1364; (D_1364, x) → add → temp → mov → D_1365]
38. Floating-point example: lerp – linear interpolation (2/2)
entity lerp is
port (
clk : in std_logic ;
reset : in std_logic ;
start : in std_logic ;
t : in float64 ;
x : in float64 ;
y : in float64 ;
D_1365 : out float64 ;
done : out std_logic ;
ready : out std_logic
);
end lerp;
VHDL I/O interface
when S_001_001 =>
D_1363_next <= subtract (y, x);
next_state <= S_001_002 ;
when S_001_002 =>
D_1364_next <= multiply (D_1363_reg , t);
next_state <= S_001_003 ;
when S_001_003 =>
temp_next <= add(D_1364_reg , x);
next_state <= S_001_004 ;
when S_001_004 =>
D_1365_next <= temp_reg (11 downto -52);
next_state <= S_EXIT ;
VHDL FSMD excerpt
39. Design space exploration configurations
Explore different choices both in frontend translation (e.g.
SSA construction) and hardware optimization
Numerous configurations are possible
The following configuration sets will be used
O1 sequential scheduling
O2 ASAP scheduling using SSA
O3 O2 with operation chaining (collapsing dependent operations
to a single state)
O4 O3 with pseudo-SSA construction
O5 O3 with preserving φ functions
O6 O3 with block RAM inference
40. Fibonacci series example: Introduction
Fibonacci series computation is
defined as
F(n) = 0,                    n = 0
       1,                    n = 1
       F(n − 1) + F(n − 2),  n > 1
Three iterative variants
A Addition and subtraction in the
main loop
B Addition with one more
temporary
C Addition with an in-situ
register swap
uint32 fibo( uint32 x) {
uint32 f0=0, f1=1, k=2;
#ifdef B
uint32 f;
#endif
do {
k = k + 1;
#if A
f1 = f1 + f0;
f0 = f1 - f0;
#elif B
f = f1 + f0;
f0 = f1;
f1 = f;
#elif C
f0 = f1 + f0;
SWAP(f0 , f1 );
#endif
} while (k <= x);
#if A || B
return (f1);
#else
return (f0);
#endif
}
C code
41. Fibonacci series example: Machine cycles
Design  Cycles      Design  Cycles      Design  Cycles
A0      n           B0      n           C0      2n − 1
A1      4n + 3      B1      5n + 2      C1      7n + 1
A2      4n + 2      B2      4n + 2      C2      7n
A3-A5   n + 2       B3-B5   n + 2       C3-C5   2(n + 1)
Hand-optimized designs are A0, B0, C0
The benefit of ASAP is not significant due to the data
dependencies in the algorithm
Cycle reduction is achieved through operation chaining
HercuLeS can closely match the result of a human expert for
optimization schemes O3-O5
The slight differences in cycle performance are due to
specific design choices of the human expert
• initializing f0, f1 and k in the FSMD entry state
• passing the output data argument without use of an
intermediate register
42. Fibonacci series example: Execution time vs LUTs
Better results are placed near the bottom-left corner
Pareto-optimal designs by human expert: B0 and C0
Pareto-optimal designs by HercuLeS: B2, B3, B4
43. Fibonacci series example: Execution time vs Registers
Better results are placed near the bottom-left corner
Pareto-optimal designs by human expert: A0
Pareto-optimal designs by HercuLeS: A1 and B3
44. Benchmark results: Speed
Average computation time is reduced by 44.3% when comparing O1 to O3
This gain is limited to 37.3% for O6 due to block RAM timing
float2half and half2float achieve up to 4× execution time reduction
Maximum operating frequencies are in the range of 119-450 MHz
45. Benchmark results: Area
O1 generates (slower and) smaller hardware in terms of LUTs and registers
O3 introduces the highest LUT requirements; O2 the highest register demand
Registers are reduced by 17.5% between O2 and O6; LUTs by 14% between O3 and O6
Block RAM inference leads to significant LUT/register area reduction (smwat)
47. Against competition
New frontends, analyses and optimizations are easy to add
Maps character I/O and malloc/free to efficient hardware
Uses open IRs and formats (GIMPLE, NAC, Graphviz)
Vendor and technology-independent HDL code generation
Preliminary results against Vivado HLS 2013.1
                        Vivado HLS                  HercuLeS
Benchmark            LUTs  Regs  Time (ns)     LUTs  Regs  Time (ns)
Array sum             102   132       26.5      103    63       73.3
Bit reversal           67    39       72.0       42    40       11.6
Edge detection∗       246   130     1636.3      680   361     1606.4
Fibonacci series      138   131       60.2      137   197      102.7
FIR filter            102    52      833.4      217   140     2729.4
Greatest common       210    98       35.2      128    93       75.9
divisor
Cubic root approx.    239   207      260.6      365   201      400.5
Population count       45    65       19.4       53   102       26.1
Prime sieve∗          525   595     6108.4      565   523     3869.5
Sierpinski triangle    88   163    11326.5      230   200    16224.9
48. The HercuLeS GUI
Bundled with HercuLeS v1.0.0 (2013a) released on June 30
Specify code generation, simulation and synthesis options
Includes embedded results viewer
49. Summary
HercuLeS is an extensible HLS environment for hardware
and software engineers
Straightforward path from ANSI C, generic assembly (NAC) or
custom DSLs (Domain-Specific Languages) to compact VHDL
designs with competitive QoR
Product information
• HercuLeS GUI for specifying code generation, simulation and
synthesis options
• Commercial distribution: http://www.ajaxcompilers.com
• Technical details: http://www.nkavvadias.com/hercules
• FREE, BASIC, and FULL software licensing schemes
(2013.a version)
51. fibo.c: Iterative computation of the Fibonacci series
Setup the environment: source env-hercules-lin.sh
Inspect input ANSI C code: fibo function, main() used for
data vector generation
Compile C code; inspect fibo_test_data.txt
Use the run-fibo.sh script
After the simulation ends
• End-upon-assertion
• Inspect diagnostic output fibo_alg_test_results.txt
• Inspect generated NAC and Graphviz CDFG
• Visualize CFG and CDFG with gwenview
• Inspect generated VHDL code: code size, naming
conventions, readability of code
• Inspect waveform data (fibo_fsmd.ghw)
Open discussion (e.g. generation settings, more
optimizations, comparison to manual design, etc)
52. easter.c: Easter Sunday calculations
Easter Sunday calculations based on Gauss’ algorithm
No need to have any knowledge of the algorithm!
Compile C code; inspect easter_test_data.txt
Use the run-easter.sh script
After the simulation ends
• Inspect generated NAC
• Inspect generated VHDL code
• Automatic IP integration for modulo
• Automatic optimization of constant multiplications
• Inspect results; did we get last year's Easter date right?
Open discussion
53. fir.c: Finite Impulse Response filter
Small FIR example
Reads data from single-dimensional arrays
Compile C code; inspect fir_test_data.txt
Use the run-fir.sh script
After the simulation ends
• Inspect generated VHDL code: array access not to block
RAM
• Alter HercuLeS generation script ( run-fir.sh) to enable
block RAM inference ( -blockmem -read-first)
• Reiterate
Open discussion
54. FPGA demo 1: PLOTLINE IP
Demo IP that will be offered for free from Ajax Compilers
web store
Synthesizable RTL model of Bresenham’s line drawing
algorithm
Integer version of the algorithm using addition, subtraction,
bit shifting, comparison, absolute value and conditional
moves
Consists of a single loop that keeps track of the propagated
error over the x- and y-axis, which is then used for
calculating the subsequent pixel on the line
The loop terminates when the end point is reached
Invoke IMPACT and upload plotline_system.bit to the
Spartan-3AN Starter Kit board
Dimensions: 320×240, upscaled ×2 to VGA
55. FPGA demo 2: Pixel generation function
Maps the masked value of f (x, y) to the corresponding pixel
without using any memory
Generates either grey (same function) or colored (different
function per chromatic component) images
f(x, y):
mode == 0:  R ← x,          G ← y,          B ← x ⊕ y
mode == 1:  R ← x,          G ← x + y,      B ← x + y
mode == 2:  R ← x,          G ← x ∗ y,      B ← x ∗ y
mode == 3:  R ← x,          G ← y,          B ← 0
mode == 4:  R ← x² + y²,    G ← x² + y²,    B ← x² + y²
mode == 5:  R ← x + y,      G ← x − y,      B ← x ⊕ y
mode == 6:  R ← x² − y²,    G ← x² − y²,    B ← x² − y²
mode == 7:  R ← max(x, y),  G ← max(x, y),  B ← max(x, y)
56. Thank you
for your interest