Partition Problem in VLSI Algorithm Automation
1. Algorithms for VLSI Physical Design Automation (ME-III SEM)
prepared by: ANIL KUMAR SAHU (Assistant Professor, SSCET)
ANIL(AP-ETC),SSCET BHILAI
2. Partitioning
Efficient design of any complex system necessitates decomposition of the system into a set of smaller subsystems. Each subsystem can then be designed independently and simultaneously to speed up the design process.
The process of decomposition is called partitioning.
Partitioning efficiency can be enhanced within three broad parameters.
First of all, the system must be

3. decomposed carefully so that the original functionality of the system remains intact.
Secondly, an interface specification is generated during the decomposition, which is used to connect all the subsystems. The system decomposition should ensure minimization of the interface interconnections between any two subsystems.
Finally, the decomposition process should be simple and efficient, so that the time required for the decomposition is a small fraction of the total design time.
4. Further partitioning may be required in cases where the size of a subsystem remains too large to be designed efficiently.
Thus, partitioning can be used in a hierarchical manner until each subsystem created has a manageable size.
Partitioning is a general technique and finds application in diverse areas.
For example, in algorithm design, the divide and conquer approach is routinely used to partition complex problems into smaller and simpler problems.
The increasing popularity of parallel computation techniques promises innovative tools for the solution of complex problems, by combining partitioning and parallel processing techniques.
5. Partitioning plays a key role in the design of a computer system in general, and VLSI chips in particular.
A computer system is comprised of tens of millions of transistors. It is partitioned into several smaller modules/blocks to facilitate the design process.
Each block has terminals located at the periphery that are used to connect the blocks.
The connection is specified by a netlist, which is a collection of nets. A net is a set of terminals which have to be made electrically equivalent.
Figure 5.1(a) shows a circuit which has been partitioned into three subcircuits.
Note that the number of interconnections between any two partitions is four (as shown in Figure 5.1(b)).
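The netlist and cut count described above can be sketched in code. The following is a minimal illustration, not an actual partitioning algorithm; the net and block names are hypothetical, not the circuit of Figure 5.1:

```python
# A net is a set of terminals, here named "block.pin". A partition assigns
# each block to a subcircuit; a net is "cut" when its terminals span more
# than one subcircuit.

def cut_size(nets, block_of):
    """Count the nets whose terminals fall in more than one partition."""
    cut = 0
    for net in nets:
        partitions = {block_of[term.split(".")[0]] for term in net}
        if len(partitions) > 1:
            cut += 1
    return cut

# Three subcircuits with illustrative block names a..e.
block_of = {"a": 1, "b": 1, "c": 2, "d": 2, "e": 3}
nets = [
    {"a.out", "b.in"},          # internal to subcircuit 1 -> not cut
    {"b.out", "c.in"},          # spans subcircuits 1 and 2 -> cut
    {"c.out", "d.in", "e.in"},  # spans subcircuits 2 and 3 -> cut
]
print(cut_size(nets, block_of))  # -> 2
```

Minimizing this cut count over all ways of assigning blocks to subcircuits is exactly the objective the partitioning algorithms in this chapter pursue.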
6. A VLSI system is partitioned at several levels due to its complexity.
The partitioning of a system into a group of PCBs is called system level partitioning. The partitioning of a PCB into chips is called board level partitioning, while the partitioning of a chip into smaller subcircuits is called chip level partitioning.
At each level, the constraints and objectives of the partitioning process are different, as discussed below.
7. System Level Partitioning
The circuit assigned to a PCB must satisfy certain constraints. Each PCB usually has a fixed area and a fixed number of terminals to connect with other boards.
The number of terminals available on one board (component) to connect to other boards (components) is called the terminal count of the board (component).
For example, a typical board has dimensions 32 cm × 15 cm and a terminal count of 64. Therefore, the subcircuit allocated to a board must be manufacturable within the dimensions of the board.
In addition, the number of nets used to connect this board to the other boards must be within the terminal count of the board.
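The two board-level constraints above amount to a simple feasibility test. A sketch follows; the defaults reuse the example figures (32 cm × 15 cm, terminal count 64), while the function name and the sample values are our own illustrations:

```python
def board_feasible(subcircuit_area_cm2, external_nets,
                   board_w_cm=32.0, board_h_cm=15.0, terminal_count=64):
    """A subcircuit can be assigned to a board only if it fits the board's
    area and its external nets do not exceed the board's terminal count."""
    fits_area = subcircuit_area_cm2 <= board_w_cm * board_h_cm
    fits_terminals = external_nets <= terminal_count
    return fits_area and fits_terminals

print(board_feasible(400.0, 48))  # -> True  (400 <= 480 cm^2, 48 <= 64 nets)
print(board_feasible(400.0, 80))  # -> False (too many external nets)
```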
8. The reliability of the system is inversely proportional to the number of boards in the system.
Hence, one of the objectives of partitioning is to minimize the number of boards.
Another important objective is the optimization of system performance.
Partitioning must minimize any degradation of performance caused by the delay due to connections between components on different boards.
The signal carried by a net that is cut by partitioning at this level has to travel from one board to another through the system bus.
The system bus is very slow, as the bus has to adhere to strict specifications so that a variety

9. of different boards can share the same bus.
The delay caused by signals traveling between PCBs (off-board delay) plays a major role in determining system performance, as this delay is much larger than the on-board or the on-chip delay.
11. Board Level Partitioning
Board level partitioning faces a different set of constraints and fulfills a different set of objectives as compared to system level partitioning.
Unlike boards, chips can have different sizes and can accommodate different numbers of terminals.
Typically, the dimensions of a chip range from 2 mm × 2 mm to 25 mm × 25 mm.
The terminal count of a chip depends on the package of the chip.
A Dual In-line Package (DIP) allows only 64 pins, while a Pin Grid Array (PGA) package may allow as many as 300 pins.
12. While system level partitioning is geared towards
12
satisfying the area and the terminal constraints of
each partition,
board level partitioning ventures to minimize the
area of each chip.
The shift of emphasis is attributable to the cost of
manufacturing a chip that is proportional to its
area.
In addition, it is expedient that the number of
chips used for each board be minimized for
enhanced board reliability.
ANIL(AP-ETC),SSCET BHILAI the
Minimization of
13. number of chips is another important determinant of
13
performance because the off-chip delay is much
larger than the on-chip delay.
This differential in delay arises because the distance
between two adjacent transistors on a chip is a few
while the distance between two adjacent chips is in
mm.
In addition to traversing a longer distance, the signal
has to travel between chips through the connector.
The connector used to attach the chip to the board
typically has a high resistance and contributes
significantly to the signal delay.
Figure 5.3 shows the different kinds of delay in a
computer system. In Figure 5.3(b), the off-board delay
is compared with the on-board delay, while the off-chip
delay is compared with the on-chip delay in another
panel of Figure 5.3.
16. Chip Level Partitioning:
The circuit assigned to a chip can be fabricated as
a single unit; therefore, partitioning at this level is
not necessary for fabrication.
A chip can accommodate as many as three million or
more transistors. The fundamental objective of chip
level partitioning is to facilitate efficient design of the
chip.
After partitioning, each subcircuit, which is also called
a block, can be designed independently using either a
full custom or a standard cell design style.
Since partitioning is not constrained by physical
dimensions, there is no area constraint for any
partition. However, the partitions may be restrained by
user-specified area constraints for optimization of the
design process.
17. The terminal count for a partition is given by the
ratio of the perimeter of the partition to the
terminal pitch.
The minimum spacing between two adjacent
terminals is called the terminal pitch and is
determined by the design rules. The number of
nets which connect a partition to other
partitions cannot be greater than the terminal
count of the partition.
In addition, the number of nets cut by partitioning
should be minimized to simplify the routing task.
The minimization of the number of nets cut by
partitioning is one of the most important
objectives in partitioning.
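The terminal-count rule above can be sketched as a small feasibility check. This is a minimal illustration, not from the text: the dimensions, the 100 μm pitch, and the function names are hypothetical.

```python
# Sketch: terminal-count feasibility for a rectangular partition.
# All dimensions and the pitch value here are illustrative assumptions.

def terminal_count(width_um: float, height_um: float, pitch_um: float) -> int:
    """Terminal count = perimeter of the partition / terminal pitch."""
    perimeter = 2 * (width_um + height_um)
    return int(perimeter // pitch_um)

def fits(num_external_nets: int, width_um: float, height_um: float,
         pitch_um: float) -> bool:
    """The nets leaving a partition must not exceed its terminal count."""
    return num_external_nets <= terminal_count(width_um, height_um, pitch_um)

# A 2000 um x 2000 um block with a 100 um terminal pitch offers
# 2 * (2000 + 2000) / 100 = 80 terminals.
print(terminal_count(2000, 2000, 100))   # 80
print(fits(75, 2000, 2000, 100))         # True: 75 external nets fit
print(fits(81, 2000, 2000, 100))         # False: exceeds the terminal count
```

A partitioner would apply this check to every candidate partition before accepting a move, which is one reason cutsize minimization and terminal constraints are treated together.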
19. A disadvantage of the partitioning process is that it may degrade the
performance of the final design. Figure 5.4(a) shows two components
A and B which are critical to the chip performance, and therefore, must
be placed close together.
However, due to partitioning, components A and B may be assigned to
different partitions and may appear in the final layout as shown in Figure
5.4(b).
It is easy to see that the connection between A and B is very long,
leading to a very large delay and degraded performance.
Thus, during partitioning, these critical components should be assigned
to the same partition. If such an assignment is not possible,
then appropriate timing constraints must be generated to keep the two
critical components close together. Chip performance is determined by
several components forming a critical path. Assignment of these
components to different partitions extends the length of the critical path.
Thus, a major challenge for improvement of system performance is
minimization of the length of the critical path.
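The rule above — flag critical component pairs that land in different partitions so timing constraints can be generated for them — can be sketched as follows. The component names and the dictionary-based partition assignment are illustrative assumptions, not from the text.

```python
# Sketch: detect critical pairs (like A and B above) that a partitioning
# has split across partition boundaries. Data is illustrative.

def split_critical_pairs(assignment, critical_pairs):
    """Return the critical pairs whose endpoints lie in different partitions."""
    return [(a, b) for a, b in critical_pairs
            if assignment[a] != assignment[b]]

# Hypothetical assignment: component -> partition index.
assignment = {"A": 0, "B": 1, "C": 0, "D": 0}
critical_pairs = [("A", "B"), ("C", "D")]

# A and B were split, so a timing constraint would be generated for them.
print(split_critical_pairs(assignment, critical_pairs))  # [('A', 'B')]
```

In practice such a check would run after each partitioning pass, and the flagged pairs would either be merged into one partition or annotated with timing constraints for the placement stage.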
21. At any level of partitioning, the input to the partitioning algorithm is a
set of components and a netlist.
The output is a set of subcircuits which, when connected, function as
the original circuit, together with the terminals required by each
subcircuit to connect it to the other subcircuits.
In addition to maintaining the original functionality, the partitioning
process optimizes certain parameters subject to certain constraints.
The constraints for the partitioning problem include area constraints
and terminal constraints. The objective functions for a partitioning
problem include the minimization of the number of nets that cross the
partition boundaries, and the minimization of the maximum number
of times a path crosses the partition boundaries.
The constraints and the objective functions used in the partitioning
problem vary depending upon the partitioning level and the design
style used. The actual objective function and constraints chosen for
the partitioning problem may also depend on the specific problem.
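The main objective named above — minimizing the number of nets that cross partition boundaries (the cutsize) — can be computed directly from a netlist and a partition assignment. The following is a minimal sketch; the netlist and component names are made-up examples.

```python
# Sketch: cutsize of a partitioning. A netlist is modeled as a list of
# nets, where each net is the set of components it connects; the
# assignment maps each component to a partition index. Data is illustrative.

def cutsize(netlist, assignment):
    """Count the nets whose components span more than one partition."""
    return sum(1 for net in netlist
               if len({assignment[c] for c in net}) > 1)

netlist = [{"A", "B"}, {"B", "C", "D"}, {"C", "D"}]
assignment = {"A": 0, "B": 0, "C": 1, "D": 1}

# Only the net {B, C, D} spans both partitions, so the cutsize is 1.
print(cutsize(netlist, assignment))  # 1
```

Iterative-improvement partitioners (e.g., Kernighan–Lin-style moves) evaluate exactly this quantity — usually incrementally rather than from scratch — when deciding whether a candidate move improves the partition.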