This document describes the Taylor Decomposition System (TDS) data structures and optimization flow. TDS contains three main data structures: the Taylor Expansion Diagram (TED), which represents the algorithmic behavior; the Data Flow Graph (DFG), which visualizes the TED; and the Netlist (NTL), which interfaces with high-level synthesis tools. TDS takes in C/C++ designs through tools like GAUT, performs optimizations on the TED and DFG, and passes optimized netlists back to the synthesis tools, which process them further and generate HDL code.
This document discusses the GAUT digital synthesis tool. It describes GAUT's main window which allows compiling C code to a control/data flow graph (CDFG) and then synthesizing the CDFG to VHDL. The CDFG to VHDL synthesis can generate registers and muxes in the VHDL output based on the specified pipeline cadence. Pipelining is not allowed if the new cadence is not a multiple of the clock cycle.
This debugging session summarizes an issue where the TDS tool was producing inconsistent behavior when run multiple times on the same input, sometimes outputting a latency of 10 and other times a latency of 9. The debugging found that the inconsistent behavior was caused by memory allocation occurring in different locations each time, resulting in TedNodes having different pointers and getting mapped to DfgNodes in a different order, changing the output. The solution taken was to modify the associative container to traverse nodes in a consistent list order rather than by key.
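The effect described above can be illustrated with a minimal sketch. It is not TDS code: the `TedNode` class and both helper functions are hypothetical names introduced only for illustration, and Python's `id()` stands in for a node pointer. Ordering by address depends on where objects happen to be allocated, while walking the creation list gives the same order on every run.

```python
class TedNode:
    """Hypothetical stand-in for a TED node; it only stores a name."""

    def __init__(self, name):
        self.name = name


def order_by_address(nodes):
    # Mimics a container keyed by pointer: the order follows id(),
    # which depends on where each object happens to be allocated.
    return [n.name for n in sorted(nodes, key=id)]


def order_by_insertion(nodes):
    # Mimics the fix: walk the nodes in the order they were created.
    return [n.name for n in nodes]


nodes = [TedNode(name) for name in ("a", "b", "c", "d")]
print(order_by_address(nodes))    # order may differ between runs
print(order_by_insertion(nodes))  # always ['a', 'b', 'c', 'd']
```

Any later step that assigns numbers or schedules nodes in traversal order inherits this difference, which is consistent with the varying latency reported in the session.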
This debugging session addresses Bug 111 in TDS version 99. The bug occurred when extracting product terms from an expression during the decompose command. Specifically, when extracting the third product term (PT3), the sub-chain that was extracted as the first product term (PT1) was removed. The solution was to check if the top node of an extracted product term has parents still in the container, and if so, update the parents to point to the extracted variable rather than the removed nodes. This prevents extracted sub-chains from being deleted during later extractions.
This debugging session document summarizes fixing bug 119 in TDS software. The bug occurred when decomposing polynomials, where the top node of the removed chain was losing its backpointer information during the ted2dfg conversion. The solution was to copy the backpointer for the top node as well, matching the rest of the internal nodes. This ensured the full chain maintained connections when decomposed terms were removed.
This document describes debugging a reordering bug in the TDS software. The bug caused reordering to abort at 4% completion. By analyzing debug files, the author isolated the bug to an issue when nodes ai_12 and ai_4 were being swapped. The author then developed techniques like "print_cone" to reduce the test case size. Further analysis using visualization tools revealed the bug was caused by dangling node references during recursive reordering of parent nodes. The solution was to reorder nodes in a levelized manner to avoid reference issues.
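One way to picture a levelized traversal is the sketch below. It is an assumption-laden illustration, not the TDS implementation: the graph encoding, the `levelize` function, and the node names are all hypothetical. Each node gets a level one higher than its deepest child, and nodes are then visited one whole level at a time, so no node is handled through a recursive call that starts from one of its parents.

```python
from collections import defaultdict


def levelize(children):
    """Group the nodes of a DAG by level (leaves are level 0)."""
    level = {}

    def depth(node):
        # A node sits one level above its deepest child.
        if node not in level:
            kids = children.get(node, [])
            level[node] = 1 + max((depth(c) for c in kids), default=-1)
        return level[node]

    for node in children:
        depth(node)

    by_level = defaultdict(list)
    for node, lvl in level.items():
        by_level[lvl].append(node)
    # Sort within each level so the result is reproducible.
    return [sorted(by_level[lvl]) for lvl in sorted(by_level)]


# Hypothetical graph: each key lists the nodes it points to.
graph = {"root": ["x", "y"], "x": ["ONE"], "y": ["x", "ONE"], "ONE": []}
print(levelize(graph))  # [['ONE'], ['x'], ['y'], ['root']]
```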
CHAPTER 1
TAYLOR DECOMPOSITION SYSTEM MANUAL
Taylor Decomposition System
Compiled at: 15:56:30 (GMT), Sep 9 2012
Tds 01> help
- balance       Balance the DFG or Netlist to minimize latency.
- bbldown       Move down the given variable one position.
- bblup         Move up the given variable one position.
- bottom        Move the given variable just above node ONE.
- bottomdcse    [CSE Dynamic] Move candidates just above node ONE.
- bottomscse    [CSE Static] Move candidates just above node ONE.
- candidate     Show the candidates expression for CSE.
- cluster       Create partitions from multiple output TEDs.
- clusterexe    Execute a command on a TED paritions.
- clusterinfo   Print out information from the clusters
- compute       Annotate required bitwidth for exact computation.
- cost          Print out the cost associated to this TED.
- dcse          [CSE Dynamic] Extract all candidates available.
- decompose     Decompose the TED in its Normal Factor Form.
- dfactor       Dynamic factorization.
- dfg2ntl       Generate a Netlist from the DFG.
- dfg2ted       Generate a TED from the DFG.
- dfgarea       Balance the DFG to minimize the area.
- dfgevalconst  Evaluates explicit DFG constants
- dfgflatten    Smooths out all DFG outputs used as DFG inputs
- dfgschedule   Perform the scheduling of the DFG.
- dfgstrash     Perform a structural hash of the DFG.
- erase         Erase a primary output from the TED.
- eval          Evaluate a TED node.
- exchange      Exchange the position of two variables.
- extract       Extract primary outputs from the TED or Netlist.
- fixorder      Fix the order broken by a retime operation.
- flip          Flip the order of a linearized variable.
- info          Print out TED information: statistic, etc.
- jumpAbove     Move a variable above another one.
- jumpBelow     Move a variable below another one.
− l c s e CSE f or l i n e a r i z e d TED.
− l i n e a r i z e Transform a non l i n e a r TED i n t o a l i n e a r one .
− l i s t v a r s L i s t a l l v a r i a b l e s according to i t s or de r i ng .
1
3. − load Load the environment .
− n t l 2t e d E x t r a c t from a N e t l i s t a l l TEDs .
− optimize Minimize DFG b i t w i d t h s u b j e c t to an e r r o r bound .
− poly C ons t r uc t a TED from a polynomial e xpr e s s i on .
− print P r i n t out TED i nf or m a t i on : s t a t i s t i c , e t c .
− printenv P r i n t the environment v a r i a b l e s .
− p r i n t n t l P r i n t out s t a t i s t i c s of the N e t l i s t .
− purge Purge the TED, DFG and / or N e t l i s t .
− pushshifter Perform a s t r u c t u r a l hash of the DFG.
− quartus Generate , compile and r e p o r t Quartus p r o j e c t .
− read Read a s c r i p t , a CDFG, a TED or a DFG.
− r e c l u s t e r R e c l u s t e r s p a r i t i o n s according to an o b j e c t i v e .
− reloc Relocate the given v a r i a b l e to the d e s i r e d p o s i t i o n .
− remapshift Remap m u l t i p l i e r s and a d d i t i o n s by s h i f t e r s .
− reorder Reorder the v a r i a b l e s in the TED ( Pre−f i xe d cost )
− reorder∗ Reorder the v a r i a b l e s in the TED. ( User cost )
− retime Performs ( forward / backward ) re−timing in TED.
− save Save the environment .
− scse [CSE S t a t i c ] E x t r a c t candidate s , one a t a time .
− s e t Set the v a r i a b l e b i t w i d t h and ot he r opt i ons .
− setenv Set a environment v a r i a b l e .
− s h i f t e r Replace edge weight by c o n s t a n t nodes .
− show Show the TED, DFG or N e t l i s t graph .
− s i f t H e u r i s t i c a l l y optimize the l e v e l of the v a r i a b l e .
− sub S u b s t i t u t e an a r i t h m e t i c e xpr e s s i on by a v a r i a b l e
− ted2dfg Generate a DFG from the TED.
− top Move the given v a r i a b l e to the r oot .
− tr C ons t r uc t a TED from pr e de f i ne d DSP t r a ns f or m s .
− vars P r e s e t the order of v a r i a b l e s .
− ve r i f y V e r i f i e s t h a t two TED out put s are the same .
− write Write the e x i s t i n g NTL|DFG|TED i n t o a f i l e .
−−−−−−−− GENERAL
− ! [ bin ] System c a l l to execute bin .
− h [ elp ] P r i n t t h i s help .
− e [ x i t ] Exit the s h e l l .
− q [ u i t ] Quit the s h e l l .
− h i s t or y P r i n t a l l executed commands with e xe c ut i on times .
− time Show e l a ps e d time f or the l a s t command executed .
− man P r i n t the manual of the given command .
− Use <TAB> f or command completion . i . e . type fl <TAB> f or f l i p .
TDS system manual, by Daniel Gomez-Prado 2
1.1 Taylor Decomposition System Data Structure
The Taylor Decomposition System (TDS) is composed of a shell integrating three data
structures as seen in figure 1.1, the Taylor Expansion Diagram (TED), the Data Flow Graph
(DFG) and Netlist (NTL).
1. TED captures the functionality of an algebraic data path and performs optimizations
on the behavioral level.
2. DFG provides a mechanism to visualize the data-path being implemented by TED.
3. NTL provides a mechanism to communicate with a high level synthesis tool (GAUT).
[Figure 1.1 placeholder; labels recovered from the diagram: Shell (History, Command Facilities); three Managers; Convert, Retime, Reorder, Cost, Decompose; Quartus, tcl, VHDL, c/c++.]
Figure 1.1. Taylor Decomposition System’s internal data structures.
The TED manager is the main data structure containing the canonical TED graph. Most
of the algorithms related to TED are implemented within the TED manager by
visitor classes, as shown in Figure 1.2.
[Figure 1.2 placeholder; labels recovered from the diagram: Ted Manager; Ted Container (node, kids, kid, root1, root2, Primary Outputs, Primary Inputs, var1, var2); BFS iterator, DFS iterator; Linearization; Retiming; Binary, Constant, Variable; Grouping; Parents (Functor: collect Parents); Reorder; Cost; Strategy; Mux, Area, Nodes, Edges, delay; Annealing, Sift, Exhaustive; Convert; Viewer.]
Figure 1.2. Taylor Expansion Diagram’s internal data structure.
Figure 1.3 provides an additional view of how the different data structures in TDS are
integrated. The NTL data structure is used to import and export Control Data Flow Graphs
(CDFG) generated by a high level synthesis tool. The TDS system has been interfaced to
GAUT (which can be obtained for free at http://hls-labsticc.univ-ubs.fr/ ) as its primary high
level synthesis engine. Additionally, the TDS system supports importing and exporting
control data flow graphs in eXtensible Markup Language (XML) format. The high level
synthesis tool parses C/C++ designs, compiles them, and translates them into CDFG or XML
formats, which can then be used by TDS to perform different optimizations. In the TDS system,
optimizations are encoded into scripts which can be executed as a batch file or command by
command through the shell console.
[Figure 1.3 placeholder; labels recovered from the diagram: GAUT Flow (C, Behavioral HDL; DFG extraction; Architectural synthesis; GAUT; RTL VHDL); TDS Flow (Matrix transforms, Polynomials; Functional TED; Structural DFG; TDS network); Behavioral Transformations; TED-based Transformations (TED linearization; Variable ordering; TED factorization & decomposition; Constant multiplication & shifter generation; Common subexpression elimination (CSE)); DFG-based Transformations (Static timing analysis; Latency optimization; Resource constraints); Original DFG; Optimized DFG; Design objectives; Design constraints; Structural elements.]
Figure 1.3. Taylor Decomposition System flow.
Although CDFGs can be read into TDS and loaded into its NTL data structure, there are
structural elements in the NTL that cannot be transformed into TED. These structural elements
force a single design netlist to be represented by a set of TEDs, in which the inputs of some
TEDs are the outputs of other TEDs. Figure 1.4(a) shows a C design in GAUT, and figure
1.4(b) shows the NTL data structure with structural elements imported into TDS.
The optimized CDFG provided by TDS can afterwards be given back to the high level
synthesis tool to finish the synthesis process and generate a Register Transfer Level (RTL) description.
The final hardware cost after logic synthesis, mapping and routing can be obtained within the
TDS shell by compiling the design into an Altera project and running the Altera Quartus
tool (which can be downloaded for free at http://www.altera.com/products/software/sfw-index.jsp).
[Figure 1.4 placeholder; panel titles and labels recovered from the diagram: (a) Example behavioral design in C; (b) Initial TDS network, with Primary inputs, Functional operations, Structural element and Primary output.]
Figure 1.4. (a) GAUT system. (b) The control data flow graph imported into TDS.
1.1.1 TDS interface with high level synthesis tools
Besides importing and exporting CDFGs, the commands read and write can store
and load the TED and DFG internal data structures. These commands, as shown in listings
1.1 and 1.2, recognize the input and output format based on the filename extension.
Listing 1.1. Import Facility
Tds 01> read -h
NAME
  read - Read a script, a CDFG, a TED or a DFG.
SYNOPSIS
  read inputfile.[c|cpp|scr|poly|mx|ted|cdfg|xml]
OPTIONS
  -h, --help
    Print this message.
  inputfile
    The file to read could be any of the following extensions:
    - c|cpp     for c files
    - poly|scr  for script files
    - mx        for matrix files
    - ted       for ted data structure files
    - cdfg      for files generated from GAUT
    - xml       for files generated from GECO
  [--nodff]
    disables register "DFF" discovering when reading the cdfg
SEE ALSO
  write, show, purge
Listing 1.2. Export Facility
Tds 01> write -h
NAME
  write - Write the existing NTL|DFG|TED into a *.[cdfg|dfg|ted] file.
SYNOPSIS
  write [cdfg options] outputfile.[cdfg|dfg|ted|scr]
OPTIONS
  -h, --help
    Print this message.
  [cdfg options]
  -d, --dfg
    Uses the DFG data structure as starting point
  -t, --ted
    Uses the TED data structure as starting point
  -n, --ntl
    Uses the NTL data structure as starting point [DEFAULT BEHAVIOR]
  outputfile
    The desired output file name. The extension defines the format:
    - c         c language.
    - cdfg      current GAUT format.
    - dfg       internal DFG data structure.
    - ted       internal TED data structure.
    - gappa     GAPPA script for computing the accuracy of the DFG data structure.
    - xml       XML format for the DFG data structure.
    - scr|poly  generates a script file from the command history.
NOTE
  By default the file format determines the data structure from which
  the file will be writed: cdfg->NTL, dfg->DFG, ted->TED
EXAMPLE
  write poly2.cdfg
  ... writes the NTL in a cdfg file format
  write --ted poly1.cdfg
  ... converts the TED into NTL and then writes its into a cdfg file
  write poly1.dfg
  ... writes the DFG in file poly1.dfg
SEE ALSO
  read, purge
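The extension-based dispatch described in listings 1.1 and 1.2 can be pictured with the following Python sketch. It is not TDS code: the two tables are copied from the help messages, while the function names and the returned labels are made up for illustration.

```python
import os

# Extensions accepted by `read`, copied from listing 1.1.
READ_FORMATS = {
    "c": "C file",
    "cpp": "C file",
    "poly": "script file",
    "scr": "script file",
    "mx": "matrix file",
    "ted": "TED data structure file",
    "cdfg": "CDFG file generated by GAUT",
    "xml": "XML file generated by GECO",
}

# Default source structure used by `write`, from the NOTE in listing 1.2.
WRITE_DEFAULT_SOURCE = {"cdfg": "NTL", "dfg": "DFG", "ted": "TED"}


def extension(filename):
    """Return the extension of filename without the leading dot."""
    return os.path.splitext(filename)[1].lstrip(".")


def describe_read(filename):
    """Hypothetical helper: name the input format `read` would infer."""
    ext = extension(filename)
    if ext not in READ_FORMATS:
        raise ValueError("unsupported input extension: %r" % ext)
    return READ_FORMATS[ext]


def default_write_source(filename, override=None):
    """Hypothetical helper: pick the structure `write` starts from.

    `override` models the -d/-t/-n options, e.g. "TED" for `write --ted`.
    """
    if override is not None:
        return override
    return WRITE_DEFAULT_SOURCE.get(extension(filename))
```

With these tables, `default_write_source("poly1.cdfg", override="TED")` mirrors the `write --ted poly1.cdfg` example, where the TED rather than the NTL is the starting point.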
1.1.2 Transforming Internal Data Structures
The transformations ntl2ted, dfg2ted and dfg2ntl are one-to-one transformations. This is
not the case with the ted2dfg transformation, which might generate different DFG graphs from
the same TED, depending on how the TED is traversed. This is reflected in the options
--normal and --factor of the ted2dfg command shown in listing 1.3.
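Why one function can map to several data flow graphs is easiest to see on a tiny example. The Python sketch below is an illustration only and does not reproduce the TED traversal used by TDS: it encodes two operation trees for the same polynomial a*b + a*c as nested tuples (an assumed encoding), checks that they compute the same value, and counts their operators.

```python
from collections import Counter

# Two operation trees for the same function a*b + a*c.
EXPANDED = ("+", ("*", "a", "b"), ("*", "a", "c"))
FACTORED = ("*", "a", ("+", "b", "c"))


def evaluate(expr, env):
    """Evaluate a tree whose leaves are variable names looked up in env."""
    if isinstance(expr, str):
        return env[expr]
    op, left, right = expr
    lhs, rhs = evaluate(left, env), evaluate(right, env)
    return lhs + rhs if op == "+" else lhs * rhs


def count_ops(expr):
    """Count the '+' and '*' operators in a tree."""
    if isinstance(expr, str):
        return Counter()
    op, left, right = expr
    return Counter([op]) + count_ops(left) + count_ops(right)


values = {"a": 3, "b": 5, "c": 7}
same_result = evaluate(EXPANDED, values) == evaluate(FACTORED, values)
```

Both trees evaluate to the same number, but the expanded form needs two multiplications while the factored form needs one, which is the kind of difference that matters when the graph is later scheduled onto hardware.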
Listing 1.3. Transforming TED into DFG
Tds 01> ted2dfg -h
NAME
  ted2dfg - Generate a DFG from the TED.
SYNOPSIS
  ted2dfg [method]
OPTIONS
  -h, --help
    Print this message.
  [method]
  -l, --latency [--cluster, --clusterA, --clusterD] [--level]
    Generate a DFG by balancing all operations on it
    Sub option: --clusterA
      Generate the DFG using only TEDs in clusterA.
    Sub option: --clusterD
      Generate the DFG using only TEDs in clusterD.
    Sub option: --cluster
      Generate the DFG using TEDs in both clusterA and clusterD.
    Sub option: --level
      Maintain the delay level of extracted boxes in the Netlist.
  -n, --normal [--cluster, --clusterA, --clusterD]
    Generate a one to one translation of the DFG through a TED NFF traversal [DEFAULT BEHAVIOR].
  -f, --factor, --flatten [--show]
    Flatten the DFG by factorizing common terms in the TED graph.
    Sub option: --show
      Treat each factor found as a pseudo output in the DFG graph.
DETAILS
  Most of the times this construction is made implicit. For instance when
  an operation in a DFG is requested (i.e. show -d) and no DFG exist yet
  an implicit conversion occurs. If a DFG already exist this command will
  overwrite it.
EXAMPLE
  poly X = a-b+c
  poly Y = a+b-c
  poly F = X+Y
  dfg2ted --normal
  show --dfg
  echo produces a DFG with outputs X,Y and F
  purge --dfg
  dfg2ted --factor
  show --dfg
  echo polynomials X and Y disappear in the DFG as evaluation of F,
  echo the resulting polynomial F = 2*a, has no record of X or Y
SEE ALSO
  ted2ntl, ntl2ted, dfg2ted, dfg2ntl
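The claim in the EXAMPLE above, that F = X + Y collapses to 2*a, can be checked by hand or with a few lines of Python. The dictionary representation of a linear polynomial used here is an assumption made for the sketch and has nothing to do with how TDS stores a TED.

```python
def add(p, q):
    """Add two linear polynomials stored as {variable: coefficient}."""
    total = dict(p)
    for var, coeff in q.items():
        total[var] = total.get(var, 0) + coeff
    # Drop the terms that cancelled out.
    return {var: coeff for var, coeff in total.items() if coeff != 0}


X = {"a": 1, "b": -1, "c": 1}   # X = a - b + c
Y = {"a": 1, "b": 1, "c": -1}   # Y = a + b - c
F = add(X, Y)                   # b and c cancel, leaving 2*a
```

Once b and c cancel, nothing in F refers back to X or Y, which is why a factorized DFG keeps no record of them.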
1.2 TDS Environment
TDS has a set of environment variables that can be customized. The complete list of
environment variables is stored in a file named tds.env and shown in listing 1.4.
The current environment settings can be printed in TDS with the command
printenv:
Tds 01> printenv
Listing 1.4. TDS Environment
#######################################
# Environment file generated by TDS.  #
# http://incascout.ecs.umass.edu/main #
#######################################
# Environment variable  | Value
#-----------------------|---------------------
bitwidth_fixedpoint     | 4,28
bitwidth_integer        | 32
cdfg_bin_path           | /home/daniel/Gaut/GAUT_2_4_2/GautC/cdfgcompiler/bin/
const_as_vars           | false
const_cdfg_eval         | false
const_prefix            | const
default_design_name     | tds2ntl
delayADD                | 1
delayLSH                | 1
delayMPY                | 2
delayREG                | 1
delayRSH                | 1
delaySUB                | 1
dot_bin                 | dot
fpga_device             | AUTO
fpga_family             | "Stratix II"
gaut_allocate_strategy  | -distributed_th_lb
gaut_bin_path           | /home/daniel/Gaut/GAUT_2_4_2/GautC/bin/
gaut_cadency            | 200
gaut_clock              | 10
gaut_cost_extension     | .gcost
gaut_gantt_generation   | false
gaut_lib_path           | /home/daniel/Gaut/GAUT_2_4_2/GautC/lib/
gaut_mem                | 10
gaut_mem_generation     | false
gaut_optimize_operator  | true
gaut_register_strategy  | 0
gaut_schedule_strategy  | force_no_pipeline
gaut_soclib_generation  | false
gaut_tech_lib           | notech_16b.lib
gaut_tech_lib_vhd       | notech_lib.vhd
gaut_tech_vhd           | notech.vhd
negative_prefix         | moins
ps_bin                  | evince
quartus_bin_path        |
rADD                    | 0
rMPY                    | 0
rSUB                    | 0
reorder_type            | proper
show_bigfont            | false
show_directory          | ./dotfiles
show_level              | true
show_verbose            | false
#---------------------------------------------
# Other possible values |
#------------------------
# gaut_schedule_strategy {"", "force_no_pipeline", "force_no_mobility", "no_more_stage", ""}
# gaut_allocate_strategy {"-distributed_th_lb", "-distributed_reuse_th", "-distributed_reuse_pr_ub",
#                         "-distributed_reuse_pr", "-global_pr_lb", "-global_pr_ub"}
# gaut_register_strategy {"0", "1", "2", "3"}
#    0 MWBM [default]
#    1 MLEA
#    2 Left edge
#    3 None
# ps_bin {"evince", "gv", "gsview32.exe"}
# reorder_type {"proper", "swap", "reloc"}
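The layout of the environment file is simple enough to read with a few lines of code. The Python sketch below is not part of TDS; it assumes only what listing 1.4 shows, namely one "name | value" pair per line and "#" at the start of comment lines. The function name and the sample text are illustrative.

```python
def parse_env(text):
    """Parse 'name | value' lines; blank lines and '#' comments are skipped."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name, sep, value = line.partition("|")
        if not sep:
            continue  # not a 'name | value' line
        settings[name.strip()] = value.strip()
    return settings


# A few entries in the style of listing 1.4 (the last one has an empty value).
SAMPLE = """\
# Environment variable | Value
delayADD | 1
delayMPY | 2
rMPY | 0
quartus_bin_path |
"""

env = parse_env(SAMPLE)
```

All values are kept as strings here; converting entries such as delayMPY to integers is left to whichever code consumes them.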
Similarly, a particular setting can be modified with the command setenv:
Tds 02> setenv reorder_type = proper
The current environment can be saved:
Tds 03> save [optional filename argument, default is tds.env]
or loaded and re-loaded:
Tds 04> load [optional filename argument, default is tds.env]
The environment variables have a direct impact on how different commands work. For
instance, the environment variable const_prefix determines the string expected by the CDFG
parser to identify a constant value; if this value is modified, all constants read from the
CDFG file will fail to be identified and will be treated as variables.
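The effect of changing the constant prefix can be sketched as follows. This is an assumption about the mechanism, not the TDS parser: the sketch treats recognition as a plain string-prefix test, and the operand names and prefix values are hypothetical.

```python
def classify_operand(name, const_prefix):
    """Label an operand name as 'constant' or 'variable' by prefix only."""
    return "constant" if name.startswith(const_prefix) else "variable"


# Hypothetical operand names as they might appear in a CDFG file.
operands = ["const_5", "x", "const_12"]

# With the prefix the file was written with, constants are recognized.
matching = [classify_operand(name, "const_") for name in operands]

# After changing the prefix, the same names all fall through to 'variable'.
mismatched = [classify_operand(name, "k_") for name in operands]
```

The second call shows the failure mode described above: the file has not changed, but every constant is now read as a variable.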
1.3 ALIASES
In addition to the commands provided by TDS, one can use aliases to refer to a particular
command or to a group of commands. The list of aliases should be stored in a file named
tds.aliases with the format shown in listing 1.5. This file is loaded by TDS on startup,
so any modification to this file requires restarting the TDS system. It is worth noting that
these alias commands do not accept arguments, and therefore no help can be invoked on them.
Listing 1.5. Alias Commands
###############################################################
# reserved word   alias name   semi-colon separated commands  #
###############################################################
alias flatten   ted2dfg -n ; dfgflatten ; dfg2ted
alias stats     print -s
alias shifter*  shifter --ted ; remapshift
alias gautn     cost -n
alias gautd     cost -d
alias gautt     cost -t
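The alias file format can be illustrated with a small reader. The Python sketch below is not the TDS implementation; it assumes the three-field layout shown in listing 1.5 (the reserved word alias, a name, then commands separated by semicolons) and models the rule that aliases take no arguments by matching only a whole input line.

```python
def parse_aliases(text):
    """Map each alias name to its list of commands."""
    aliases = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split(None, 2)
        if len(parts) != 3 or parts[0] != "alias":
            continue  # not an alias definition
        name, body = parts[1], parts[2]
        aliases[name] = [cmd.strip() for cmd in body.split(";") if cmd.strip()]
    return aliases


def expand(command_line, aliases):
    """Replace a line that is exactly an alias name by its commands."""
    return aliases.get(command_line.strip(), [command_line.strip()])


SAMPLE = """\
# reserved word  alias name  semi-colon separated commands
alias flatten ted2dfg -n ; dfgflatten ; dfg2ted
alias stats print -s
"""

aliases = parse_aliases(SAMPLE)
```

Here `expand("flatten", aliases)` yields the three commands in order, while a line such as `cost -n` is passed through unchanged because it is not an alias name.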
1.4 Commands
1.4.1 Variable Ordering
An individual variable in the TED data structure can be moved with the following
commands: bblup, bbldown, bottom, top, flip, reloc. Reordering the TED
to optimize a specific metric is achieved by the commands reorder and reorder*. All these
commands except reorder* are subject to three reordering algorithms, as shown in listing 1.6.
Listing 1.6. Reordering Algorithms
[reorder algorithm]
  -p, --proper
    The proper algorithm to reorder the TED as described in the paper
    http://doi.ieeecomputersociety.org/10.1109/HLDVT.2004.1431235 [DEFAULT BEHAVIOR].
  --reloc (TO BE DEPRECATED IN A FUTURE RELEASE)
    Reconstructs the TED with the desired order of variables. This is
    slow, but serves as golden reference.
  --swap (TO BE DEPRECATED IN A FUTURE RELEASE)
    A Hybrid implementation, that re-structures the TED with the desired
    orden, but internally uses reconstructs and operations on the bottom
    graph.
In this particular setting the default algorithm is the swap; but this setting can always
be changed in the environment settings of TDS. The reason for so many implementations
is best understood with the following recap:
Brief history: There have been 4 different implementations of the TED package.
1. The first one internally used the ccud package; it was started in 2002 and never
finished.
2. The second one was a re-write of the package, called TED, in 2004 - 2005; it implemented
the construction of the TED and variable ordering.
3. The third version, named TEDify, was an optimized version of the second package
built from scratch to cope with memory problems and efficiency, 2006 - 2007.
4. The fourth version was also started in 2006 and was named TDS. The development
of the third and fourth versions overlapped in time, and because other people started
working on the fourth version, TEDify was abandoned and its algorithms ported into
TDS. Since 2008 substantial changes and improvements have been made to the TDS
package.
Coding the variable ordering algorithm is most likely one of the most troublesome parts to
write in the TED data structure. Although the proper algorithm built for TEDify has
been completely ported to TDS, new requirements on the data structure (retiming, bitwidth,
error) require new modifications to this algorithm. Therefore a quick and dirty implementation
called swap has been left; this implementation disregards any information other than
the variable name in a TED. The proper implementation performs a bit faster than swap
and produces the same results, but it is currently being modified to take into account the
register limitations imposed by retiming in TED.
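The single-variable moves listed at the start of this section can be pictured on a plain list of variable names. The Python sketch below models only the resulting order, not the restructuring of the TED graph that the proper, swap and reloc algorithms perform. It assumes that index 0 is the root and that the last index is the position just above node ONE; the 0-based position argument of reloc is also an assumption.

```python
def _move(order, var, new_index):
    """Return a new order with var placed at new_index (clamped to range)."""
    rest = [v for v in order if v != var]
    new_index = max(0, min(new_index, len(rest)))
    return rest[:new_index] + [var] + rest[new_index:]


def bblup(order, var):
    """Move var one position towards the root."""
    return _move(order, var, order.index(var) - 1)


def bbldown(order, var):
    """Move var one position towards node ONE."""
    return _move(order, var, order.index(var) + 1)


def top(order, var):
    """Move var to the root."""
    return _move(order, var, 0)


def bottom(order, var):
    """Move var just above node ONE."""
    return _move(order, var, len(order) - 1)


def reloc(order, var, position):
    """Move var to an arbitrary position."""
    return _move(order, var, position)


start = ["a", "b", "c", "d"]
```

For example, `bblup(start, "c")` gives a, c, b, d and `bottom(start, "a")` gives b, c, d, a; a variable already at the root or at the bottom stays where it is.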
1.4.2 Optimization
Ordering the TED to optimize a particular cost function is possible. The commands
reorder and reorder* evaluate the variable orderings of a TED against a given cost
function. For instance, each of the TEDs shown below corresponds to the same TED but
with a different ordering, chosen to minimize the number of nodes, the number of multipliers, the latency,
etc. Searching for the best TED ordering for a particular cost function is in the worst case
exponential in the number of nodes (an exhaustive search is not recommended); therefore
one can specify other heuristics with the commands reorder and reorder*, as shown in
listing 1.7.
Listing 1.7. Heuristic number of iterations
[iteration strategy]
  -a, --annealing [--nostride] [--stridebacktrack] [--pJumpAbove]
                  [--pJumpBelow] [--beta] [--alpha] [--ratio] [--adjustment]
    It minimizes the cost function by using the annealing algorithm
    Sub option: --nostride
      Prevents (groups of a) multiple output TEDs to be treated as individual TEDs
    Sub option: --stridebacktrack
      Enables backtracking after annealing each single stride
    Sub option: --probAnnealingJumpAbove, -paja double number between 0 and 1
      Defines the probability that one variable will jump above another one
    Sub option: --probAnnealingJumpBelow, -pajb double number between 0 and 1
      Defines the probability that one variable will jump below another one
    Sub option: --beta double number between 0 and 1
      Defines a scale factor for the initial temperature of the annealing algorithm
    Sub option: --alpha double number between 0 and 1
      Defines a scale factor for the new temperature of the annealing algorithm
    Sub option: --ratio integer
      Defines a scale factor for the number of levels
    Sub option: --adjustment integer
      Defines an adjustment to the initial cost of the annealing algorithm
  -e, --exhaustive [--end] [--nostride]
    Tries all possible orders by doing permutation, is O(N!)
    Sub option: --end
      Prevents the abortion of the permutation when 'ESC' is pressed
    Sub option: --nostride
      Prevents clustering multiple output TEDs
  -s, --sift [-g, --group]
    Moves each variable at a time through out the height of the TED
    graph till its best position is found, is O(N^2) [DEFAULT BEHAVIOR].
    Sub option: -g, --group
      This option only affects the SIFT algorithm. If grouping is selected,
      the sifting is done preserving the relative grouping of all variables
      that were linearized. If not every single variable is moved regardless
      of its grouping.
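The difference between the exhaustive and sift strategies can be sketched on a toy problem. The Python code below is an illustration under stated assumptions, not the TDS implementation: an ordering is a list of names, the cost function is an invented weighted-position sum standing in for a real TED cost, and grouping and strides are ignored.

```python
from itertools import permutations

# Hypothetical weights; a real cost would be measured on the TED or DFG.
WEIGHTS = {"a": 3, "b": 1, "c": 2}


def toy_cost(order):
    """Invented cost: heavy variables are cheaper near the front."""
    return sum(pos * WEIGHTS[var] for pos, var in enumerate(order))


def exhaustive(order, cost):
    """Try every permutation: N! cost evaluations."""
    return list(min(permutations(order), key=cost))


def sift(order, cost):
    """Move one variable at a time to its best position: about N^2 evaluations."""
    best = list(order)
    for var in list(order):
        rest = [v for v in best if v != var]
        candidates = [rest[:i] + [var] + rest[i:] for i in range(len(rest) + 1)]
        best = min(candidates, key=cost)
    return best


start = ["b", "c", "a"]
by_sift = sift(start, toy_cost)
by_search = exhaustive(start, toy_cost)
```

On this small example both strategies reach the same ordering. In general sifting is a heuristic and may stop at a local minimum, which is the price paid for avoiding the factorial search.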
The command reorder can optimize one of the following cost functions (only one cost
function at a time) as shown in listing 1.8.
Listing 1.8. Cost functions for command reorder
[cost functions]
  Definitions:
    ted_subexpr_candidates = # of product terms in the TED with 2+ parents connecting to ONE
    ted_nodes = # of nodes in the TED graph
    tLatency  = the latency computed from the TED graph
    nMUL      = # of multiplications in the DFG graph
    nADD      = # of additions in the DFG graph
    nSUB      = # of substractions in the DFG graph
    rMPY      = # of multipliers after scheduling the DFG
    rADD      = # of adders and subtractors after scheduling the DFG
    dLatency  = the latency of the DFG
    gLatency  = the latency of the Gaut implementation
  Environment variables used to set the:
    1) delay of the DFG operators:
       delayADD [Default value = 1]
       delaySUB [Default value = 1]
       delayMPY [Default value = 2]
       delayREG [Default value = 1]
    2) maximum number of resources used by the DFG scheduler:
       rMPY [Default value = 4294967295]
       rADD [Default value = 4294967295]
       rSUB [Default value = 4294967295]
  --node
    Minimizes the function "10*ted_nodes - ted_subexpr_candidates"
  -tl, --tLatency
    Minimizes the TED latency, its critical path, subject to the resources specified in the environment variables
  -nm, --nMUL {legacy -m, --mul}
    Minimizes the function "10*nMUL - ted_subexpr_candidates" [DEFAULT BEHAVIOR]
  --op
    Minimizes the function "10*(nMUL + nADD + nSUB) - ted_subexpr_candidates"
  --opscheduled
    Minimizes nMUL, followed by (nADD+nSUB), dfg latency, rMPY, rADD
  -dl, --dLatency {legacy --latency}
    Minimizes the DFG latency subject to the resources specified in the environment variables
  --bitwidth
    Minimizes the bitwidth of the HW implementation subject to unlimited latency|resources
  -gm, --gMUX {legacy --gmux}
    Minimizes the Gaut mux count in the Gaut implementation (each mux is considered 2 to 1)
  -gl, --gLatency {legacy --glatency}
    Minimizes the Gaut latency in the Gaut implementation
  -gr, --gREG {legacy --garch}
    Minimizes the Gaut register count in the Gaut implementation
  --gappa
    Minimizes the upper tighter bound found trough Gappa
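The composite cost functions quoted in listing 1.8 are plain arithmetic and can be written out directly. The Python sketch below copies the three formulas from the listing; the function names are made up, and reading "followed by" in --opscheduled as a lexicographic comparison is an assumption.

```python
def node_cost(ted_nodes, ted_subexpr_candidates):
    """--node: 10*ted_nodes - ted_subexpr_candidates."""
    return 10 * ted_nodes - ted_subexpr_candidates


def nmul_cost(nMUL, ted_subexpr_candidates):
    """--nMUL: 10*nMUL - ted_subexpr_candidates."""
    return 10 * nMUL - ted_subexpr_candidates


def op_cost(nMUL, nADD, nSUB, ted_subexpr_candidates):
    """--op: 10*(nMUL + nADD + nSUB) - ted_subexpr_candidates."""
    return 10 * (nMUL + nADD + nSUB) - ted_subexpr_candidates


def opscheduled_key(nMUL, nADD, nSUB, dfg_latency, rMPY, rADD):
    """--opscheduled as a tuple compared left to right (assumed reading)."""
    return (nMUL, nADD + nSUB, dfg_latency, rMPY, rADD)


# Counts of the kind reported by the cost command; the candidate count of 2
# is a made-up value, since the cost table does not print it.
example = op_cost(nMUL=4, nADD=4, nSUB=0, ted_subexpr_candidates=2)
```

The factor of 10 makes the operation or node count dominate, so the candidate count acts mainly as a tie-breaker that favours orderings exposing more common subexpressions.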
1.4.3 Gaut and Quartus
For example, the estimated latency and area of implementing the polynomial F = fbh +
a + cb + gfb + edb without TED optimization in GAUT are 100 ns and 91 units. In Altera Quartus,
the frequency obtained is 167.76 MHz with 287 ALUTs. See listing 1.9 for the commands used.
Listing 1.9. Design F = fbh + a + cb + gfb + edb without optimization
Tds 01> poly F= f*b*h+a+c*b+g*f*b+e*d*b
Tds 02> cost
Cadency is 200
OP delays: ADD=1 SUB=1 MPY=2
resources: ADD=4294967295 MPY=4294967295
|--------------------------------------------|
|                    TED                     |
|--------------------------------------------|
| node | edge0 | edgeN | factor | width      |
|--------------------------------------------|
| 9    | 4     | 4     | 0      | 0          |
|--------------------------------------------|
| DFG                | Schedule              |
|--------------------|-----------------------|
| nMUL | nADD | nSUB | Latency | rMPY | rADD |
| 4    | 4    | 0    | 6       | 2    | 1    |
|--------------------------------------------|
| Gaut                                       |
|--------------------------------------------|
| Muxes | Latency | Register | Area          |
| 96    | 100     | 6        | 91            |
design name: ted
Tds 03> setenv gaut_cadency =100
Tds 04> cost
Cadency is 100
OP delays: ADD=1 SUB=1 MPY=2
resources: ADD=4294967295 MPY=4294967295
|--------------------------------------------|
|                    TED                     |
|--------------------------------------------|
| node | edge0 | edgeN | factor | width      |
|--------------------------------------------|
| 9    | 4     | 4     | 0      | 0          |
|--------------------------------------------|
| DFG                | Schedule              |
|--------------------|-----------------------|
| nMUL | nADD | nSUB | Latency | rMPY | rADD |
| 4    | 4    | 0    | 6       | 2    | 1    |
|--------------------------------------------|
| Gaut                                       |
|--------------------------------------------|
| Muxes | Latency | Register | Area          |
| 96    | 100     | 6        | 91            |
design name: ted
Tds 05> ted2dfg
Tds 06> dfg2ntl
Tds 07> cost -n
Cadency is 100
OP delays: ADD=1 SUB=1 MPY=2
resources: ADD=4294967295 MPY=4294967295
| TED                        | DFG                | Schedule              | Gaut                              |
|----------------------------|--------------------|-----------------------|-----------------------------------|
| nodes | factors | bitwidth | nMUL | nADD | nSUB | latency | rMPY | rADD | Muxes | latency | Register | Area |
| 0     | 0       | 0        | 0    | 0    | 0    | 0       | 0    | 0    | 48    | 100     | 5        | 91   |
design name: dfg2ntl
Tds 08> quartus -p
Tds 09> quartus -c
Info: Report file saved in "dfg2ntl.quartus.report"
Info: Loading report file "dfg2ntl.quartus.report"
clock name | target freq | design freq
--------------------------------------
clk        | 1000.0 MHz  | 167.76 MHz
Resources                 | Synthesis | Fitter
--------------------------------------------------------------------
Combinational ALUTs       | 263       | 287 / 12,480 ( 2 % )
-- 7 input functions      | 0         | 0
-- 6 input functions      | 70        | 71
-- 5 input functions      | 115       | 114
-- 4 input functions      | 26        | 26
-- <=3 input functions    | 52        | 76
Dedicated logic registers | 90        | 90 / 12,480 ( 1 % )
--------------------------------------------------------------------
Assignment to | : PARTITION_HIERARCHY = root_partition
There are 510 ADDs in the design
There are 510 SUBs in the design
There are 0 shifts in the design
Optimizing the design using the command reorder, as shown in listing 1.10, gives an
estimated latency of 70 ns with an area of 348 units in GAUT, and a frequency of 182.58 MHz
with 720 ALUTs in Quartus.
Listing 1.10. Design F optimized with: reorder --annealing -gl
Tds 01> poly F= f*b*h+a+c*b+g*f*b+e*d*b
Tds 02> setenv gaut_cadency =100
Tds 03> reorder --annealing -gl
strade =111111110
[========================================] 100% Cost =80.00
Tds 04> setenv gaut_cadency =80
Tds 05> reorder --annealing -gl
strade =111111110
[========================================] 100% Cost =70.00
Tds 06> setenv gaut_cadency =70
Tds 07> reorder --annealing -gl
strade =111111110
[                                        ]   0% T=39.90 Window=49 Cost =70.00 too strong constraints
[=====                                   ]  14% T=33.91 Window=47 Cost =70.00 too strong constraints
[===========                             ]  27% T=28.83 Window=45 Cost =70.00 too strong constraints
[===============                         ]  38% T=24.50 Window=43 Cost =70.00 too strong constraints
[===================                     ]  47% T=20.83 Window=41 Cost =70.00 too strong constraints
[=================================       ]  83% T=6.68 Window=27 Cost =70.00 too strong constraints
[=====================================   ]  94% T=2.14 Window=13 Cost =70.00 too strong constraints
[========================================] 100%
Tds 08> ted2dfg
Tds 09> balance -d
Tds 10> dfg2ntl
Tds 11> cost -n
Cadency is 70
OP delays: ADD=1 SUB=1 MPY=2
resources: ADD=4294967295 MPY=4294967295
| TED                        | DFG                | Schedule              | Gaut                              |
|----------------------------|--------------------|-----------------------|-----------------------------------|
| nodes | factors | bitwidth | nMUL | nADD | nSUB | latency | rMPY | rADD | Muxes | latency | Register | Area |
| 0     | 0       | 0        | 0    | 0    | 0    | 0       | 0    | 0    | 48    | 70      | 8        | 348  |
design name: dfg2ntl
Tds 12> quartus -p
Tds 13> quartus -c
Info: Report file saved in "dfg2ntl.quartus.report"
Info: Loading report file "dfg2ntl.quartus.report"
clock name | target freq | design freq
--------------------------------------
clk        | 1000.0 MHz  | 182.58 MHz
Resources                 | Synthesis | Fitter
--------------------------------------------------------------------
Combinational ALUTs       | 696       | 720 / 12,480 ( 6 % )
-- 7 input functions      | 0         | 0
-- 6 input functions      | 67        | 67
-- 5 input functions      | 339       | 339
-- 4 input functions      | 100       | 102
-- <=3 input functions    | 190       | 212
Dedicated logic registers | 135       | 135 / 12,480 ( 1 % )
--------------------------------------------------------------------
Assignment to | : PARTITION_HIERARCHY = root_partition
There are 510 ADDs in the design
There are 510 SUBs in the design
There are 0 shifts in the design
The command reorder* differs from the command reorder in that it accepts a list of cost
functions to be optimized. The list of cost functions should be entered so that the
least important cost function is given first and the most important cost function is given
last. The available cost functions are shown in Listing 1.11.
TDS system manual, by Daniel Gomez-Prado 15
Listing 1.11. Cost functions for command reorder*
[list of cost functions] {LICF ... MICF}
Where LICF and MICF stand for the Least / Most Important Cost Function to optimize
--node
    Minimizes the number of TED nodes
-tl, --tLatency
    Minimizes the critical path computed in the TED graph
--edge
    Minimizes the total number of TED edges
--edge0
    Minimizes the number of additive TED edges
--edgeN
    Minimizes the number of multiplicative TED edges
-nm, --nMUL
    Minimizes the number of multiplications in the DFG
-na, --nADD
    Minimizes the number of additions (and subtractions) in the DFG
-rm, --rMPY
    Minimizes the number of multipliers erOfCandidates" [DEFAULT BEHAVIOR]
-dl, --dLatency
    Minimizes the latency in the DFG
--bitwidth
    Minimizes the bitwidth of the HW implementation subject to unlimited latency | resources
--gappa
    Minimizes the tighter upper bound found through Gappa
-gm, --gMUX
    Minimizes the number of muxes in the GAUT implementation
-gl, --gLatency
    Minimizes the latency in the GAUT implementation
-gr, --gREG
    Minimizes the number of registers in the GAUT implementation
-ga, --gArea
    Minimizes the total area of the operators in the GAUT implementation
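The manual states only that the list runs from the least to the most important cost
function; it does not spell out how reorder* combines them. One plausible reading is a
lexicographic comparison in which the last function decides and the earlier ones only
break ties. The Python sketch below illustrates that reading. The helper rank_key, the
candidate names and all cost values are hypothetical and are not TDS output.

```python
# Hypothetical sketch: lexicographic ranking of candidate variable orders.
# Flag names mirror Listing 1.11; candidates and numbers are invented.

def rank_key(costs, order):
    """Build a sort key in which the LAST entry of `order` matters most."""
    return tuple(costs[name] for name in reversed(order))

# Least important first, most important last, as in "-gr -gm -ga -gl".
order = ["gr", "gm", "ga", "gl"]

candidates = {
    "A": {"gl": 70, "ga": 300, "gm": 50, "gr": 9},
    "B": {"gl": 70, "ga": 200, "gm": 60, "gr": 8},
    "C": {"gl": 80, "ga": 150, "gm": 40, "gr": 6},
}

# The smallest key wins: compare gl first, then ga, then gm, then gr.
best = min(candidates, key=lambda name: rank_key(candidates[name], order))
print(best)
```

In this made-up data, A and B tie on the most important cost (gl), so the next one (ga)
selects B; C loses despite its smaller area because its latency is worse.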
Optimizing the design using the command reorder*, as shown in Listing 1.12, gives a
latency of 70 ns with an area of 174 units in GAUT, and a frequency of 193.12 MHz with
411 ALUTs in Quartus.
Listing 1.12. Design F optimized with: reorder* --annealing -gr -gm -ga -gl
Tds 01> poly F = f*b*h+a+c*b+g*f*b+e*d*b
Tds 02> setenv gaut cadency =100
Tds 03> reorder* --annealing -gr -gm -ga -gl
strade=111111110
[========================================] 100% Cost 80.00
Info: backtracking previous result with cost 80
Tds 04> setenv gaut cadency =80
Tds 05> reorder* --annealing -gr -gm -ga -gl
strade=111111110
[========================================] 100% Cost 70.00
Tds 06> setenv gaut cadency =70
Tds 07> ted2dfg
Tds 08> balance -d
Tds 09> dfg2ntl
Tds 10> cost -n
Cadency is 70
OP delays: ADD=1 SUB=1 MPY=2
resources: ADD=4294967295 MPY=4294967295
|            TED             |        DFG         |       Schedule        |               Gaut                |
|----------------------------|--------------------|-----------------------|-----------------------------------|
| nodes | factors | bitwidth | nMUL | nADD | nSUB | latency | rMPY | rADD | Muxes | latency | Register | Area |
|   0   |    0    |    0     |  0   |  0   |  0   |    0    |  0   |  0   |  80   |   70    |    7     | 174  |
design name: dfg2ntl
Tds 11> quartus -p
Tds 12> quartus -c
Info: Report file saved in "dfg2ntl.quartus.report"
Info: Loading report file "dfg2ntl.quartus.report"
clock name | target freq | design freq
--------------------------------------
clk        | 1000.0 MHz  | 193.12 MHz
Resources                 | Synthesis | Fitter
--------------------------------------------------------------------
Combinational ALUTs       | 409       | 411 / 12,480 (3 %)
-- 7 input functions      | 0         | 0
-- 6 input functions      | 66        | 66
-- 5 input functions      | 226       | 226
-- 4 input functions      | 25        | 27
-- <=3 input functions    | 92        | 92
Dedicated logic registers | 119       | 119 / 12,480 (1 %)
--------------------------------------------------------------------
Assignment to | : PARTITION HIERARCHY = root partition
There are 510 ADDs in the design
There are 510 SUBs in the design
There are 0 shifts in the design
1.4.4 Registers
To annotate registers in the TED, the polynomial operations used during construction have
been extended to deal with timing information. The operator used to denote a time delay is
the at sign @.
Tds 01> vars P M N a
Tds 02> poly N@3*[{a*(a*X)@1+a^2*(a*Y)@2+a^3*(a*Y)@4}@2+
a^3*{a*(a*X)@1+a^2*(a*Y)@2+a^3*(a*Y)@4}@1]+
M@4*[{a*(a*X)@1+a^2*(a*Y)@2+a^3*(a*Y)@4}@1]+
P@2*[{a*(a*X)@1+a^2*(a*Y)@2+a^3*(a*Y)@4}@3]
Tds 03> show unretimed ted
[Graph: TED for output F whose nodes carry delay-annotated variables (P@2, M@4, N@3,
a and a@1 to a@7, X@2 to X@4, Y@3 to Y@7), with register edges labeled 1R to 7R.]
Figure 1.5. Unretimed TED.
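The node labels in Figure 1.5 (for example a@3 and X@3 for the term {a*(a*X)@1}@2)
suggest that nested delays add up and that a delay applied to a product is pushed onto
each factor. The Python sketch below checks that behaviour on sample streams, assuming
that @k denotes a k-cycle delay and that registers reset to zero. The helpers delay and
mul are hypothetical and are not part of TDS.

```python
# Hypothetical model of the @ operator: a signal is a list of samples,
# and `delay(signal, k)` shifts it by k cycles (registers assumed to reset to 0).

def delay(signal, k):
    """Return `signal` delayed by k cycles, keeping the original length."""
    return ([0] * k + list(signal))[:len(signal)]

def mul(xs, ys):
    """Sample-wise product of two signals of equal length."""
    return [x * y for x, y in zip(xs, ys)]

a = [1, 2, 3, 4, 5, 6]
X = [2, 0, 1, 3, 5, 7]

# {(a*X)@1}@2 : delay the product by 1 cycle, then by 2 more cycles.
nested = delay(delay(mul(a, X), 1), 2)

# a@3 * X@3 : delay each operand by the accumulated 3 cycles, then multiply.
pushed = mul(delay(a, 3), delay(X, 3))

print(nested == pushed)
```

Under these assumptions both forms produce the same stream, which is why the unretimed
TED can show the accumulated delay directly on each variable.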
Tds 04> ted2dfg -n
Tds 05> show -d unretimed dfg.dot
[Graph: DFG for output F built from + and * operators over the inputs N, M, P, a, X
and Y, with register edges labeled 1R.]
Figure 1.6. DFG corresponding to the unretimed TED.
Tds 06> retime -up
Suggestion to restore ordering visualization:
jumpAbove -p a a
Tds 07> show retimed ted.dot
[Graph: TED for output F with nodes P@2, M@4, N@3, a, X, Y and ONE, with register
edges labeled 1R to 4R.]
Figure 1.7. Retimed TED.
Tds 08> ted2dfg -n
Tds 09> show -d retimed dfg.dot
Listing 1.13. Annotating bitwidth in TED
Tds 01> set -h
NAME
    set - Set the variable bitwidth and other options.
SYNOPSIS
    set [bitwidth] [range] [maximal error]
OPTIONS
    -h, --help
        Print this message.
    [bitwidth]
    -b, --bitwidth integer|int|fixedpoint|fxp [var1:bitwidth1 var2:bitwidth2 ...]
        Set the initial bitwidth of the variables in the TED.
        All other variables not specified in the list, take a
        default bitwidth depending on its type:
        * integer (int) -> 32
        * fixedpoint (fxp) -> 4,28
    [range]
    -r, --range var1:interval1 [var2:interval2 ...]
        Where interval has the syntax: [minval, maxval]
    [maximal error]
    -e, --error po1:maxerror1 [po2:maxerror2 ...]
        Where maxerror1 is the maximal error allowed at the primary output po1
EXAMPLE
    poly F1 = a-b+c+d
    poly F2 = (a+b)*(c+d)^2
    set -b fixedpoint a:4,16 c:2,10
    set -r d:[0.128, 1.123432] b:[-5.3223, 321.32e-3] c:[0, 1]
    set -e F1:1.324 F2:0.983
SEE ALSO
    listvars, compute
The command compute is then used to compute the bitwidth information across the
TED data structure.
Listing 1.14. Compute bitwidth required for exact computation
Tds 01> compute -h
NAME
    compute - Annotate the bit-widths required for exact computation.
SYNOPSIS
    compute [-t -b --snr] | [-d -b -g]
OPTIONS
    -h, --help
        Print this message.
    -b, --bitwidth
        Compute the bitwidth at each point in the graph [DEFAULT BEHAVIOR].
    -g, --gappa
        Compute a bound on the maximal error.
    --snr
        Compute the Signal to Noise Ratio of the architecture.
    -t, --ted
        In the TED graph [DEFAULT BEHAVIOR].
    -d, --dfg
        In the DFG graph.
SEE ALSO
    optimize
The synopsis above means that the bitwidth information can be computed on both the TED
and the DFG data structures, whereas the maximal error bound provided by Gappa can only
be computed on the DFG.
The following synthetic example shows how bitwidth optimization works.
Tds 01> poly F1=a-b+c+d
Tds 02> poly F2=(a+b)*(c+d)^2
Tds 03> show precision not annotated.dot
Tds 04> set -b fixedpoint a:4,16 c:2,10 b:4,12 d:4,12
Tds 05> show precision ted.dot
[Graph: two copies of the TED for F1 and F2 over the variables a, b, c and d; in
panel (b) the variables are annotated as 4:a Q(4,16), 3:b Q(4,12), 2:c Q(2,10) and
1:d Q(4,12).]
Figure 1.9. (a) Initial TED. (b) TED with bitwidth annotation in nodes.
Tds 06> compute -t -b
Tds 07> show precision computed ted.dot
[Graph: TED for F1 and F2 with variables 4:a Q(4,16), 3:b Q(4,12), 2:c Q(2,10) and
1:d Q(4,12); every edge carries a computed bitwidth, with Q(7,16) at the root of F1
and Q(15,40) at the root of F2.]
Figure 1.10. TED with bitwidth annotation for exact computation.
Tds 08> ted2dfg -f
Tds 09> compute -b -d
Tds 10> show --ids -d precision computed dfg.dot
[Graph: DFG for F1 and F2 over the inputs a Q(4,16), b Q(4,12), c Q(2,10), d Q(4,12)
and the constant 2 Q(2); intermediate operators are annotated with Q(5,12), Q(6,12),
Q(5,16), Q(4,10), Q(4,20), Q(8,22), Q(8,24), Q(9,24) and Q(10,24), and the outputs
with Q(7,16) for F1 and Q(15,40) for F2.]
Figure 1.11. DFG with bitwidth annotation for exact computation.
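The annotated values in Figures 1.10 and 1.11 are consistent with two simple growth
rules for a format Q(i,f) with i integer bits and f fractional bits: an addition or
subtraction yields Q(max(i1,i2)+1, max(f1,f2)), and a multiplication yields
Q(i1+i2, f1+f2). These rules are inferred from the figures and are an assumption, not a
statement of how compute is implemented. The Python sketch below applies them to F1 and
to the expanded form of F2.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Q:
    """Fixed-point format with i integer bits and f fractional bits (assumed model)."""
    i: int
    f: int

    def __add__(self, other):
        # One extra integer bit absorbs the carry; keep the finer fraction.
        return Q(max(self.i, other.i) + 1, max(self.f, other.f))

    __sub__ = __add__  # a subtraction grows exactly like an addition here

    def __mul__(self, other):
        # Integer and fractional widths both add up in an exact product.
        return Q(self.i + other.i, self.f + other.f)

a, b, c, d = Q(4, 16), Q(4, 12), Q(2, 10), Q(4, 12)
two = Q(2, 0)  # the constant 2, shown as Q(2) in the DFG

# F1 = a - b + c + d, evaluated as ((c + d) - b) + a
f1 = ((c + d) - b) + a

# F2 = (a + b) * (c + d)^2, with the square expanded as c^2 + d^2 + 2*c*d
square = (c * c + d * d) + (two * c) * d
f2 = (a + b) * square

print(f1, f2)
```

With these rules the evaluation order matters: a different association of the same
additions can give a different width, which is one reason the annotation is computed on
a concrete graph.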
Tds 11> set -r d:[0.128, 1.123432] b:[-5.3223, 321.32e-3] c:[0, 1]
Tds 12> set -e F1:1.324 F2:0.983
Tds 13> compute -d -g
Tds 14> optimize
[========================================] 100%
Done. Type "info -d" for more information
Tds 15> show --ids -d precision optimized dfg.dot
1.4.6 Linearization
A TED node can have several kinds of edges, including edges with a power greater than
one, as shown in the TED of Figure 1.13(a). Nonetheless, all internal nodes can be forced
to have only additive edges and multiplicative edges through linearization, as shown in
Figure 1.13(b).
Tds 01> poly a^2+b+c+(3*(b+c)*d+e)*a
Tds 02> show
Tds 03> linearize
Tds 04> bottom a
Tds 05> show
The node with variable a in Figure 1.13(a) has an edge connected to node ONE with
power ^2, which represents the term a^2. After linearizing the TED, it can be observed
that the edge has been replaced by two nodes a[1] and a[2], both of type a.
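[Graph: (a) TED for F0 over the variables a, b, c, d and e, with a ^2 edge from node a
to ONE and edge weights 3; (b) the same function with node a replaced by the nodes a[1]
and a[2].]
Figure 1.13. (a) TED for function F0 = a^2 + b + c + (3(b + c)d + e)a. (b) Linearized TED.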
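One way to read the linearized TED is as a Horner-style nesting in the variable a, where
each of the nodes a[1] and a[2] contributes a single multiplication by a instead of a
power. The Python sketch below evaluates F0 both ways for sample values. It is only an
illustration under that assumption; the function names and the sample values are
hypothetical and say nothing about how linearize is implemented.

```python
# Hypothetical illustration: power form versus Horner (linearized) form in `a`.

def eval_power_form(coeffs, a):
    """Evaluate sum(coeffs[k] * a**k), using explicit powers of a."""
    return sum(c * a**k for k, c in enumerate(coeffs))

def eval_linearized(coeffs, a):
    """Evaluate the same polynomial using only single multiplications by a."""
    acc = 0
    for c in reversed(coeffs):
        acc = acc * a + c
    return acc

# Sample values for the remaining variables (arbitrary).
b, c, d, e = 2, 3, 4, 5
a = 7

# F0 = a^2 + b + c + (3*(b+c)*d + e)*a, grouped by powers of a.
coeffs = [b + c, 3 * (b + c) * d + e, 1]

print(eval_power_form(coeffs, a), eval_linearized(coeffs, a))
```

Both evaluations agree with the original expression, while the second one never raises a
to a power.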
1.4.7 Decomposition
A TED can be further decomposed into chains of adders or chains of multipliers
by using the command decompose.
Listing 1.15. Decomposing the TED
Tds 01> decompose -h
NAME
    decompose - Decompose the TED in its Normal Factor Form.
SYNOPSIS
    decompose [-a|--pt|--st] [--force]
OPTIONS
    -h, --help
        Print this message.
    --force
        perform aggressive extraction by treating support as primary outputs
    --st
        Decompose all sum terms available in the current TED
    --pt
        Decompose all product terms available in the current TED
    -a, --all
        Decompose changing the given order if necessary until
        the entire TED is reduced to a single node
DETAILS
    The TED must be linearized first
SEE ALSO
    show
Tds 01>
The command decompose --all applied to the TED shown in Figure 1.13(b) results in
the TED shown in Figure 1.14. The resulting TED contains pseudo outputs labeled PT and
ST, corresponding to product terms and sum terms respectively.
[Graph: TED for F0 expressed through the pseudo outputs PT1 to PT4 (product terms) and
ST1 and ST2 (sum terms) over the nodes a[1], a[2], b, c, d, e and ONE.]
Figure 1.14. Decomposed TED.
1.4.8 TED to DFG Transformations
The TED data structure is canonical: given a fixed variable ordering, its representation
is unique. The DFG generated from a TED, however, is not unique.
Listing 1.16. Generating a DFG from the TED
Tds 01> ted2dfg -h
NAME
    ted2dfg - Generate a DFG from the TED.
SYNOPSIS
    ted2dfg [method]
OPTIONS
    -h, --help
        Print this message.
    [method]
    -l, --latency [--cluster, --clusterA, --clusterD] [--level]
        Generate a DFG by balancing all operations on it
        Sub option: --clusterA
            Generate the DFG using only TEDs in clusterA.
        Sub option: --clusterD
            Generate the DFG using only TEDs in clusterD.
        Sub option: --cluster
            Generate the DFG using TEDs in both clusterA and clusterD.
        Sub option: --level
            Maintain the delay level of extracted boxes in the Netlist.
    -n, --normal [--cluster, --clusterA, --clusterD]
        Generate a one to one translation of the DFG through a TED NFF traversal [DEFAULT BEHAVIOR].
    -f, --factor, --flatten [--show]
        Flatten the DFG by factorizing common terms in the TED graph.
        Sub option: --show
            Treat each factor found as a pseudo output in the DFG graph.
DETAILS
    Most of the times this construction is made implicit. For instance when
    an operation in a DFG is requested (i.e. show -d) and no DFG exist yet
    an implicit conversion occurs. If a DFG already exist this command will
    overwrite it.
EXAMPLE
    poly X = a-b+c
    poly Y = a+b-c
    poly F = X+Y
    dfg2ted --normal
    show --dfg
    echo produces a DFG with outputs X,Y and F
    purge --dfg
    dfg2ted --factor
    show --dfg
    echo polynomials X and Y disappear in the DFG as evaluation of F,
    echo the resulting polynomial F = 2*a, has no record of X or Y
SEE ALSO
    ted2ntl, ntl2ted, dfg2ted, dfg2ntl
Tds 01>
Continuing the example shown in Figure 1.13(a), the command ted2dfg can be used
to transform the TED data structure into a DFG data structure. Three different DFGs are
shown in Figure 1.15.
Tds 01> poly a^2+b+c+(3*(b+c)*d+e)*a
Tds 02> linearize
Tds 03> bottom a
Tds 04> ted2dfg --normal
Tds 05> purge -d
Tds 06> ted2dfg --factor
Tds 07> purge -d
Tds 08> ted2dfg --level
[Graph: three DFGs for F0, panels (a), (b) and (c), built from + and * operators over
the inputs a, b, c, d, e and the constants 1 and 3.]
Figure 1.15. DFG generated through: (a) a normal factor form transformation.
(b) factorization transformation. (c) levelized and balanced transformation.
1.4.9 Replacing constant multiplication with shifters
All constant multiplications, that is, multiplications represented by weights on edges
within the TED data structure, can be replaced by a series of shift operations. The first
step is to force the TED data structure to express all edge weights as factors of a
constant node with value 2 (const_2).
Tds 01> poly F1=a*91+(b*a)*77-b*7
Tds 02> show
Tds 03> shifter
Tds 04> show
[Graph: (a) TED for F1 over the variables a and b with constant weights on its edges;
(b) the same TED with an added const_2 node whose edges carry powers and -1 weights.]
Figure 1.16. (a) TED with implicit constant multiplication on edges. (b) TED with constant
multiplications explicitly represented by variable const_2.
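Figure 1.16(b) indicates that each constant is rewritten as a combination of signed
powers of two, but the text does not state which encoding the shifter command uses. As a
stand-in, the Python sketch below uses the standard non-adjacent form (a canonical
signed-digit encoding) and multiplies by positive constants with shifts, additions and
subtractions only. The function names are hypothetical and the resulting terms need not
match those chosen by TDS.

```python
# Hypothetical sketch: constant multiplication through shifts, using the
# non-adjacent form (NAF) as an assumed signed power-of-two encoding.

def signed_power_terms(n):
    """Return [(sign, shift), ...] such that n == sum(sign * 2**shift), for n > 0."""
    terms = []
    shift = 0
    while n != 0:
        if n & 1:
            sign = 2 - (n & 3)  # +1 when n % 4 == 1, -1 when n % 4 == 3
            terms.append((sign, shift))
            n -= sign
        n >>= 1
        shift += 1
    return terms

def multiply_by_shifts(x, n):
    """Compute x * n using only shifts, additions and subtractions."""
    total = 0
    for sign, shift in signed_power_terms(n):
        total += sign * (x << shift)
    return total

a, b = 5, 3
# F1 = a*91 + (b*a)*77 - b*7, with every constant multiplication replaced.
f1_shifts = multiply_by_shifts(a, 91) + multiply_by_shifts(b * a, 77) - multiply_by_shifts(b, 7)
print(f1_shifts, a * 91 + (b * a) * 77 - b * 7)
```

The non-adjacent form never uses two neighbouring powers, which keeps the number of
shifted terms low; for example 7 becomes 2^3 - 1 rather than 4 + 2 + 1.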
The DFG generated for the TED shown in Figure 1.16(b) is shown in Figure 1.17(a).
Tds 05> show -d
Tds 06> remapshift
Tds 07> show -d
Tds 08> balance -d
Tds 09> show -d
[Graph: three DFGs for F1 over the inputs a, b and the constant 1; panel (a) multiplies
by const_2, while panels (b) and (c) use shift operators (<<) with the shift amounts
2, 3, 4, 5 and 7.]
Figure 1.17. (a) DFG corresponding to TED in Figure 1.16(b). (b) DFG with replaced
shifters. (c) Balanced DFG.