Design of RISC Processor (64-bit) IP Core
Project Report Submitted in Partial Fulfillment of the Requirements for the
Degree of
BACHELOR OF TECHNOLOGY
IN
ELECTRONICS AND COMMUNICATION ENGINEERING
BY
ANUSHA VARSALA (09241A0463)
HEMA HARISHA PONNAM (09241A0474)
RADHIKA REDDY PEDDAMALLU (09241A0490)
YAMINI SINDHU BOTCHA (09241A04C0)
Under the Esteemed Guidance of
Mr. Mannem Kiran
Associate Professor
DEPARTMENT OF ELECTRONICS AND COMMUNICATION
ENGINEERING
GOKARAJU RANGARAJU INSTITUTE OF
ENGINEERING AND TECHNOLOGY
(Affiliated to Jawaharlal Nehru Technological University)
HYDERABAD 500 090
2013

Department of Electronics and Communication Engineering
Gokaraju Rangaraju Institute of Engineering and Technology
(Affiliated to Jawaharlal Nehru Technological University)
Hyderabad 500 090
2013
Certificate
This is to certify that this project report entitled "Design of RISC
Processor (64-bit) IP Core" by Anusha (09241A0463), Harisha (09241A0474), Radhika
Reddy (09241A0490) and Yamini Sindhu Botcha (09241A04C0), submitted in
partial fulfillment of the requirements for the degree of Bachelor of Technology in
Electronics and Communication Engineering of the Jawaharlal Nehru Technological
University, Hyderabad, during the academic year 2009-2013, is a bona fide record of work
carried out under our guidance and supervision.
M. Kiran                     Dr. Ravi Billa
Associate Professor          (Head of Department)
(Internal Guide)             (External Examiner)

Acknowledgement
It is our pleasure to express thanks to Mr. M. Kiran for the encouragement
and guidance throughout the course of this project.
We thank Mr. Manchalla Omkara Venkata Pavan Kumar, Associate
Professor, for helping us toward the successful completion of the project.
We thank Dr. Ravi Billa, HOD, ECE department, for helping us toward the
completion of the project.
V. Anusha __________________
P. Hema Harisha __________________
P. Radhika __________________
B. Yamini Sindhu __________________

Abstract
The RISC (Reduced Instruction Set Computer) is a CPU design strategy that uses a small
instruction set compared to a CISC (Complex Instruction Set Computer) processor. It is
designed to achieve faster execution of instructions, within one clock cycle. The RISC
processor is designed to incorporate basic instructions involving Arithmetic, Logical, Data
Transfer and Control instructions. All instructions have simple register addressing. An
important aspect of the instruction set is that it is easy to decode (fixed-length instruction
format). Thus the OpCode and Instruction Register fields can be accessed simultaneously. To
implement these instructions the design incorporates various design blocks such as the Control
Unit (CU), Arithmetic Logic Unit (ALU), Accumulator (ACC), Program Counter (PC),
Instruction Register (IR), Memory, Clock generator, Registers and additional glue logic. The
instruction format contains the four MSBs as the OPCODE and the remaining 28 bits as the
address. A 28-bit address can select 2^28 (256 M) memory locations, and the design has a
64-bit bi-directional data bus.
Implementation details
HDL : Verilog
Design : 64-bit (IP core)
Simulator : Cadence Tools

Contents
List of figures
Abbreviations
Chapter 1: INTRODUCTION
1.1 IP core
1.2 Aim of the Project
1.3 Methodology
Chapter 2: LITERATURE REVIEW
2.1 Introduction
2.2 Importance
2.3 Organization of the report
2.4 Application areas
2.5 Typical Design Flow
Chapter 3: IMPLEMENTATION
3.1 Description
3.2 Code
3.3 ALU (Arithmetic and Logic Unit)
3.3.1 Algorithm
3.3.2 Code
3.3.3 Code Explanation
3.3.4 Waveforms
3.3.5 Waveform Explanation
3.4 Memory
3.4.1 Algorithm
3.4.2 Code
3.4.4 Waveforms
3.5 Control and Decoder
3.5.1 Algorithm
3.5.2 Code
3.5.4 Waveforms
3.6 Program Counter
3.6.1 Algorithm
3.6.2 Code
3.6.4 Waveforms
3.7 Instruction Register
3.7.1 Algorithm
3.7.2 Code
3.7.4 Waveforms
3.8 Internal Register
3.8.1 Algorithm
3.8.2 Code
3.8.4 Waveforms
3.9 Tristate Buffer
3.9.1 Algorithm
3.9.2 Code
3.9.4 Waveforms
4.0 64-bit 8:1 Multiplexers
4.0.1 Multiplexer A
4.0.2 Multiplexer B
4.1 6-bit 2:1 Multiplexer
4.1.1 Algorithm
4.1.2 Code
4.1.4 Waveforms
4.2 Clock Generator
4.2.1 Algorithm
4.2.2 Code
4.2.4 Waveforms
4.3 Top Module
4.3.1 Algorithm
4.3.2 Code
4.3.4 Waveforms
4.4 Operation
Chapter 4: SCHEMATIC RESULTS
4.1 ALU
4.2 Memory
4.3 Control and Decoder
4.4 Program Counter
4.5 Instruction Register
4.6 Internal Register
4.7 Tristate Buffer
4.8 Multiplexer A
4.9 Multiplexer B
4.10 Clock Generator
4.11 Top Module
Conclusion
Future scope
References

List of figures
Figure 1.1: Multiply two numbers
Figure 2.1: VLSI Design Flow
Figure 4.1: RISC Architecture
Figure 5.1: ALU Block diagram
Figure 5.2: ALU Waveforms
Figure 5.3: Memory Block diagram
Figure 5.4: Memory Waveforms
Figure 5.5: Control & Decoder Block diagram
Figure 5.6: Control & Decoder Waveforms
Figure 5.7: Program Counter Block diagram
Figure 5.8: Program Counter Waveforms
Figure 5.9: Instruction Register Block diagram
Figure 5.10: Instruction Register Waveforms
Figure 5.11: Internal Register Block diagram
Figure 5.12: Internal Register Waveforms
Figure 5.13: Tristate Buffer Block diagram
Figure 5.14: Tristate Buffer Waveforms
Figure 5.15: Multiplexer A Block diagram
Figure 5.16: Multiplexer A Waveforms
Figure 5.17: Multiplexer B Block diagram
Figure 5.18: Multiplexer B Waveforms
Figure 5.19: Multiplexer Block diagram
Figure 5.20: Multiplexer Waveforms
Figure 5.21: Clock generator Block diagram
Figure 5.22: Clock generator Waveforms
Figure 5.23: Top Module Block diagram
Figure 5.24: Top Module Waveform (1)
Figure 5.25: Top Module Waveform (2)
Figure 6.1: ALU Schematic Diagram (1)
Figure 6.6: Memory Schematic Diagram (1)
Figure 6.11: Control & Decoder Schematic Diagram
Figure 6.12: Program Counter Schematic Diagram
Figure 6.13: Instruction Register Schematic Diagram
Figure 6.14: Internal Register Schematic Diagram (1)
Figure 6.15: Internal Register Schematic Diagram (2)
Figure 6.16: Multiplexer A Schematic Diagram
Figure 6.17: Multiplexer B Schematic Diagram
Figure 6.18: Multiplexer Schematic Diagram
Figure 6.19: Clock generator Schematic Diagram
Figure 6.20: Top Module Schematic Diagram

Abbreviations
RISC – Reduced Instruction Set Computer
CISC – Complex Instruction Set Computer
IP core – Intellectual Property core
HDL – Hardware Description Language
CAD – Computer Aided Design

Chapter 1
INTRODUCTION
The acronym RISC (pronounced "risk"), for reduced instruction set computing,
represents a CPU design strategy emphasizing the insight that simplified instructions
that "do less" may still provide higher performance if this simplicity can be
utilized to make instructions execute very quickly. Many proposals for a "precise"
definition have been attempted, and the term is slowly being replaced by the more
descriptive load-store architecture. Well-known RISC families include Alpha, ARC,
ARM, AVR, MIPS, PA-RISC, Power Architecture (including PowerPC), SuperH and
SPARC.
Being an old idea, some aspects attributed to the first RISC-labeled designs (around
1975) include the observations that the memory-restricted compilers of the time were
often unable to take advantage of features intended to facilitate coding, and that
complex addressing inherently takes many cycles to perform. It was argued that such
functions would better be performed by sequences of simpler instructions, if this
could yield implementations simple enough to cope with really high frequencies, and
small enough to leave room for many registers. Uniform, fixed-length instructions
with arithmetic restricted to registers were chosen to ease instruction
pipelining in these simple designs, with special load-store instructions accessing
memory.
1.1 IP Core:
Introduction: An IP (intellectual property) core is a block of logic or data that is used in
making a field-programmable gate array (FPGA) or application-specific integrated
circuit (ASIC) for a product. As essential elements of design reuse, IP cores are part
of the growing electronic design automation (EDA) industry trend towards repeated
use of previously designed components. Ideally, an IP core should be entirely portable,
that is, able to be easily inserted into any vendor technology or design methodology.
Universal asynchronous receiver/transmitters (UARTs), central processing units
(CPUs), Ethernet controllers and PCI interfaces are all examples of IP cores.
IP cores fall into one of three categories: hard cores, firm cores or soft cores. Hard
cores are physical manifestations of the design. These are best for plug-and-play
applications, and are less portable and flexible than the other two types of core. Like
the hard cores, firm (sometimes called semi-hard) cores also carry placement data but
are configurable to various applications. The most flexible of the three, soft cores
exist either as a netlist (a list of the logic gates and associated interconnections
making up an integrated circuit) or as hardware description language (HDL) code. The IP
core in this project is of the soft-core type. A number of organizations, such as the Free
IP Project and OpenCores, have formed to promote open sharing of IP cores.
1.2 Aim of the Project:
A RISC (Reduced Instruction Set Computer) processor is a computer arithmetic-logic unit
that uses a minimal instruction set, emphasizing the
instructions used most often and optimizing them for the fastest possible execution.
Software for RISC processors must handle more operations than software for traditional
CISC (Complex Instruction Set Computer) processors, but RISC processors have
advantages in applications that benefit from faster instruction execution, such as
engineering and graphics workstations and parallel-processing systems.
Objectives:
• The RISC processor is designed to incorporate 20 basic instructions involving
Arithmetic, Logical, Data Transfer and Control instructions.
• An important aspect of the instruction set is that it is easy to decode (fixed-length
instruction format). The striking feature of RISC is that it executes each instruction
within one clock cycle. This is achieved by carrying out most operations within the
processor and minimizing operations that require slower peripherals.
• To implement these instructions the design incorporates various design blocks such as the
Control Logic Unit (CLU), Arithmetic Logic Unit (ALU), Accumulator, Program
Counter (PC) and Instruction Register (IR).
• The instruction format contains the four MSBs as the OPCODE and the remaining
28 bits as the address.
1.3 Methodology:
This project is aimed at designing a Reduced Instruction Set Computer (RISC)
processor using the Verilog Hardware Description Language (HDL). An HDL allows
designers to model the concurrency of processes found in hardware elements.
RISC processors are easy to learn because they have a small but powerful
instruction set, and they can include many internal peripherals. With a RISC
processor the hardware design becomes very compact and cost effective. The
design steps for the RISC processor are listed below.
• The functioning of the RISC processor is described in Verilog HDL. This description is
called the design module.
• A test bench program is developed to test the design module. The test
bench applies inputs to the design module and verifies the outputs. The test bench has
to be written in such a way that it checks the design module in all possible conditions.
• A Verilog simulator tool is used to verify the functioning of the design (simulation).
• The ALU block of the design module is synthesized and the gate-level netlist is
generated.
The use of Verilog HDL has many advantages compared to traditional
schematic-based design, such as:
• Designs can be described at a very abstract level using an HDL. Designers can write
their design description without choosing any specific fabrication technology. If a
new technology emerges, designers do not need to redesign their circuit. They simply
input the design to the logic synthesis tool and create a new gate-level netlist
using the new fabrication technology. The logic synthesis tool will optimize the
circuit in area and timing for the new technology.
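The design-module and test-bench steps listed above can be sketched as follows. This is only a minimal illustration, not the project's code: the 64-bit adder `adder64` stands in for a real design block, and all module, task and signal names are hypothetical.

```verilog
// Hypothetical design module: a 64-bit adder standing in for a real block.
module adder64 (
    input  wire [63:0] a,
    input  wire [63:0] b,
    output wire [63:0] sum
);
    assign sum = a + b;
endmodule

// Self-checking test bench: drives the inputs, waits for the output to
// settle, and compares it with the expected value.
module adder64_tb;
    reg  [63:0] a, b;
    wire [63:0] sum;
    integer     errors;

    adder64 dut (.a(a), .b(b), .sum(sum));

    task check(input [63:0] in_a, input [63:0] in_b, input [63:0] expected);
        begin
            a = in_a;
            b = in_b;
            #10;  // allow the combinational output to settle
            if (sum !== expected) begin
                errors = errors + 1;
                $display("FAIL: %h + %h = %h (expected %h)", in_a, in_b, sum, expected);
            end
        end
    endtask

    initial begin
        errors = 0;
        check(64'd1, 64'd2, 64'd3);
        check(64'hFFFF_FFFF_FFFF_FFFF, 64'd1, 64'd0);  // 64-bit wrap-around
        check(64'd0, 64'd0, 64'd0);
        if (errors == 0) $display("All tests passed");
        $finish;
    end
endmodule
```

A test bench for a larger block would follow the same pattern, with more stimulus cases and a clock generator where the design is sequential.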

Chapter 2
LITERATURE REVIEW
2.1 Introduction:
Modern integrated circuits are actually three-dimensional. In the Cadence
system, several layers route lines diagonally while others run horizontally and
vertically. As in conventional chips, the multiple levels of wires are separated by
layers of insulating material and interconnected through holes referred to as vias.
Computer chips are among today's most complex machines. The complexity is
handled by software tools that allow chip engineers to use specialized programming
languages that directly instruct chip-making equipment.
The Cadence designers say they are confident that the benefits are there. "The
math is clear: if you can go diagonally, the wires will be 30 percent shorter," said Aki
Fujimura, a Cadence senior vice president, who helped develop the technology at
Simplex Solutions, which Cadence acquired in 2002.
Cadence's biggest challenge may ultimately be more cultural than technical, said G.
Daniel Hutcheson, president of VLSI Research, a semiconductor market research
firm in Santa Clara, Calif.
Although the industry has a reputation for innovation, the ruthless pace of
chip-making advances, requiring new systems at 18-month intervals, makes engineers
leery of trying alternative approaches, he said. "They're like penguins with the ice
melting around them," he said. "They keep doing the same thing."
Commands:
The commands used in Cadence for execution are:
1. First, start the server and establish a connection to the client.
2. Enter the C shell with the command "csh" (C shell).
3. Load the source file with the command "source cshrc".
4. Change to the workarea directory inside the cadence_digital_labs
directory:
#cd cadence_digital_labs/workarea
cd - change directory
5. Create a directory using the command #mkdir.
6. Add the source files to the directory just created.
7. Run the complete design with the command
"irun filename.v -access +rwc -message -gui".
rwc - read, write and connectivity access
gui - graphical user interface
8. After the program runs, the simulation window opens.
9. After the simulation, the waveforms are shown in a separate window.
2.2 Importance:
The main difference between RISC and CISC is that the instruction set of the first
kind of processor was explicitly designed to allow the sustained execution of
instructions in one cycle on average. CISC processors (in mainframes) can also
approach this objective, but only at the expense of much more hardware logic capable
of reproducing what RISC processors achieve through a streamlined design. Some
RISC processors, like the SPARC, achieve a sustained speedup of 2.8 running real
applications. This means that the SPARC is a parallel engine capable of working on
about three instructions simultaneously. Other RISC processors offer similar
performance.
The "official" definition of RISC processors should thus be: processors with an
instruction set whose individual instructions can be executed in one clock cycle
by exploiting pipelining. Pipelined supercomputers and large mainframes have used
pipelining intensively for years, but in a radically different way from RISC processors.
In IBM mainframes, for example, the instruction set was given by "tradition" and
pipelining was implemented in spite of an instruction set which was not designed for
it. Of course there are ways to accommodate pipelining, but at a much higher cost.
This is the reason why other pipelined mainframes, like the CDC 6600, are seen as
the precursors of RISC machines rather than the IBM/360 behemoths. In summary:
taking pipelining as the starting point, it is easy to deduce all other features of RISC
processors.
Non-RISC design philosophy:
In the early days of the computer industry, programming was done in
assembly language or machine code, which encouraged powerful and easy-to-use
instructions. CPU designers therefore tried to make instructions that would do as
much work as possible. With the advent of higher-level languages, computer
architects also started to create dedicated instructions to directly implement certain
central mechanisms of such languages. Another general goal was to provide every
possible addressing mode for every instruction, known as orthogonality, to ease compiler
implementation. Arithmetic operations could therefore often have results as well as
operands directly in memory (in addition to register or immediate).
CPUs also had relatively few registers, for several reasons:
• More registers also imply more time-consuming saving and restoring of register
contents on the machine stack.
• A large number of registers requires a large number of instruction bits as register
specifiers, meaning less dense code (see below).
• CPU registers are more expensive than external memory locations; large register
sets were cumbersome with limited circuit boards or chip integration.
RISC design philosophy:
In the mid-1970s researchers at IBM (and similar projects elsewhere)
demonstrated that the majority of combinations of these orthogonal addressing modes
and instructions were not used by most programs generated by compilers available at
the time. It proved difficult in many cases to write a compiler with more than limited
ability to take advantage of the features provided by conventional CPUs.
It was also discovered that, on microcoded implementations of
architectures, complex operations tended to be slower than a sequence of simpler
operations doing the same thing. This was in part an effect of the fact that many
designs were rushed, with little time to optimize or tune every instruction, but only
those used most often. As mentioned elsewhere, core memory had long since been
slower than many CPU designs. The advent of semiconductor memory reduced this
difference, but it was still apparent that more registers (and later caches) would allow
higher CPU operating frequencies. Additional registers would require sizable chip or
board areas which, at the time (1975), could be made available if the complexity of the
CPU logic was reduced.
The clock rate of a CPU is limited by the time it takes to execute the
slowest sub-operation of any instruction; decreasing that cycle time often accelerates
the execution of other instructions. The focus on "reduced instructions" led to the
resulting machine being called a "reduced instruction set computer" (RISC). The goal
was to make instructions so simple that they could easily be pipelined, in order to
achieve a single-clock throughput at high frequencies.
Instruction set size and alternative terminology:
A common misunderstanding of the phrase "reduced instruction set
computer" is the mistaken idea that instructions are simply eliminated, resulting in a
smaller set of instructions. In fact, over the years, RISC instruction sets have grown in
size, and today many of them have a larger set of instructions than many CISC CPUs.
Some RISC processors such as the INMOS Transputer have instruction sets as large
as, say, the CISC IBM System/370; and conversely, the DEC PDP-8 (clearly a CISC
CPU because many of its instructions involve multiple memory accesses) has only 8
basic instructions, plus a few extended instructions.
The term "reduced" in that phrase was intended to describe the fact
that the amount of work any single instruction accomplishes is reduced (at most a
single data memory cycle) compared to the "complex instructions" of CISC CPUs
that may require dozens of data memory cycles in order to execute a single
instruction.
Typical characteristic of RISC:
For any given level of general performance, a RISC chip will typically have far
fewer transistors dedicated to the core logic, which originally allowed designers to
increase the size of the register set and increase internal parallelism. Other features
which are typically found in RISC architectures are:
• Uniform instruction format, using a single word with the OPCODE in the same bit
positions in every instruction, demanding less decoding.
• Identical general-purpose registers, allowing any register to be used in any context,
simplifying compiler design (although normally there are separate floating-point
registers).
• Simple addressing modes. Complex addressing is performed via sequences of
arithmetic and/or load-store operations.
• Few data types in hardware. Some CISCs have byte string instructions, or support
complex numbers; this is so far unlikely to be found on a RISC.
Exceptions abound, of course, within both CISC and RISC.
RISC designs are also more likely to feature a Harvard memory
model, where the instruction stream and the data stream are conceptually separated;
this means that modifying the memory where code is held might not have any effect
on the instructions executed by the processor (because the CPU has a separate
instruction and data cache), at least until a special synchronization instruction is
issued. On the upside, this allows both caches to be accessed simultaneously, which
can often improve performance.

RISC and x86:
However, despite many successes, RISC has made few inroads into the
desktop PC and commodity server markets, where Intel's x86 platform remains the
dominant processor architecture (Intel is facing increased competition from AMD, but
even AMD's processors implement the x86 platform, or a 64-bit superset known as
x86-64). There are three main reasons for this.
1. The very large base of proprietary PC applications is written for x86, whereas no
RISC platform has a similar installed base, and this means PC users were locked into
the x86.
2. Although RISC was indeed able to scale up in performance quite quickly and
cheaply, Intel took advantage of its large market by spending vast amounts of money
on processor development. Intel could spend many times as much as any RISC
manufacturer on improving low-level design and manufacturing.
3. Later, more powerful processors such as the Intel P6 and AMD K6 had similar RISC-like
units that executed a stream of micro-operations generated from decoding stages
that split most x86 instructions into several pieces. Today, these principles have been
further refined and are used by modern x86 processors such as Intel Core 2 and AMD
K8. The first available chip deploying such techniques was the NexGen Nx586,
released in 1994 (while the AMD K5 was severely delayed and released in 1995).
Examining:
The simplest way to examine the advantages and disadvantages of RISC architecture
is by contrasting it with its predecessor, Complex Instruction Set Computer (CISC)
architecture.
Multiplying Two Numbers in Memory:
Figure 1.1 is a diagram representing the storage scheme for a generic computer.
The main memory is divided into locations numbered from (row) 1:(column) 1 to
(row) 6:(column) 4. The execution unit is responsible for carrying out all
computations. However, the execution unit can only operate on data that has been
loaded into one of the six registers (A, B, C, D, E or F). Let's say we want to find the
product of two numbers, one stored in location 2:3 and another stored in location 5:2,
and then store the product back in location 2:3.

The CISC Approach:
The primary goal of CISC architecture is to complete a task in as few lines of
assembly as possible. This is achieved by building processor hardware that is capable
of understanding and executing a series of operations. For this particular task, a CISC
processor would come prepared with a specific instruction (we'll call it "MULT").
When executed, this instruction loads the two values into separate registers, multiplies the
operands in the execution unit, and then stores the product in the appropriate register.
Thus, the entire task of multiplying two numbers can be completed with one
instruction:
MULT 2:3, 5:2
MULT is what is known as a "complex instruction". It operates directly on the
computer's memory banks and does not require the programmer to explicitly call any
loading or storing functions. It closely resembles a command in a higher-level
language. For instance, if we let "a" represent the value of 2:3 and "b" represent the
value of 5:2, then this command is identical to the C statement "a = a*b".
The RISC Approach:
RISC processors only use simple instructions that can be executed within
one clock cycle. Thus, the "MULT" command described above could be divided into
three separate commands: "LOAD", which moves data from the memory bank to a
register, "PROD", which finds the product of two operands located within the
registers, and "STORE", which moves data from a register to the memory banks. In
order to perform the exact series of steps described in the CISC approach, a
programmer would need to code four lines of assembly:
LOAD A, 2:3
LOAD B, 5:2
PROD A, B
STORE 2:3, A
At first, this may seem like a much less efficient way of completing the
operation. Because there are more lines of code, more RAM is needed to store the
assembly-level instructions. The compiler must also perform more work to convert a
high-level language statement into code of this form.
The two approaches can be compared point by point:
• CISC places the emphasis on hardware, whereas RISC places it on software.
• CISC includes multi-clock complex instructions; RISC includes single-clock, reduced
instructions only.
• CISC instructions work memory-to-memory, with loading and storing built into the
instruction; in RISC, "LOAD" and "STORE" are independent instructions.
• CISC has small code sizes and high cycles per second; RISC has low cycles per
second and large code sizes.
• In CISC, transistors are used for storing complex instructions; RISC spends more
transistors on memory registers.
Separating the "LOAD" and "STORE" instructions actually reduces the
amount of work that the computer must perform. After a CISC-style "MULT"
command is executed, the processor automatically erases the registers. If one of the
operands needs to be used for another computation, the processor must re-load the
data from the memory bank into a register. In RISC, the operand will remain in the
register until another value is loaded in its place.

The Performance Equation:
The following equation is commonly used for expressing a computer's performance
ability:
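\[
\frac{\text{time}}{\text{program}} = \frac{\text{time}}{\text{cycle}} \times \frac{\text{cycles}}{\text{instruction}} \times \frac{\text{instructions}}{\text{program}}
\]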
The CISC approach attempts to minimize the number of
instructions per program, sacrificing the number of cycles per instruction. RISC does
the opposite, reducing the cycles per instruction at the cost of the number of
instructions per program.
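As a purely illustrative calculation, take the multiplication example above and assume a 10 ns clock period for both machines, a single CISC "MULT" that needs 6 cycles, and four RISC instructions that need 1 cycle each. These figures are assumptions chosen for easy arithmetic, not measurements.

```latex
\[
t_{\text{CISC}} = 10\,\text{ns} \times 6 \times 1 = 60\,\text{ns},
\qquad
t_{\text{RISC}} = 10\,\text{ns} \times 1 \times 4 = 40\,\text{ns}
\]
```

Under these assumed numbers the RISC sequence finishes sooner although it contains more instructions. With a different cycle count for the complex instruction the comparison could go the other way.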
RISC success stories:
RISC designs have led to a number of successful platforms and architectures, some of
the larger ones being:
• ARM - The ARM architecture dominates the market for high-performance, low-power,
low-cost embedded systems (typically 100-500 MHz in 2008). ARM Ltd., which
licenses intellectual property rather than manufacturing chips, reported that 10 billion
licensed chips had been shipped as of early 2008. ARM is deployed in countless
mobile devices such as:
- Apple iPods (custom ARM7TDMI SoC)
- Apple iPhone (Samsung ARM1176JZF)
- Nintendo Game Boy Advance (ARM7TDMI)
- Nintendo DS (ARM7TDMI, ARM946E-S)
- Sony Network Walkman (Sony in-house ARM-based chip)
- Some Nokia and Sony Ericsson mobile phones (often Symbian OS-based devices)
• MIPS's MIPS line, found in most SGI computers and in the PlayStation, PlayStation
2, Nintendo 64 (discontinued) and PlayStation Portable game consoles, and in residential
gateways like the Linksys WRT54G series.
2.3 Organization of the report:
The report is organized in the following chapters:
• Chapter 2 gives a brief introduction to VLSI and the VLSI design flow.
• Chapter 3 gives a brief introduction to Verilog HDL: the keywords, operations,
data types, modeling, loops and procedures used in the project code.
• Chapter 4 presents and describes the architecture of the project.
• Chapter 5 gives the module descriptions, source code, code explanation,
waveforms and waveform explanations of all individual modules.
• Chapter 6 presents the results (RTL, schematic diagrams of all modules and
synthesis reports).
• Chapter 7 gives the advantages and disadvantages of the project.
• Chapter 8 summarizes the project with a conclusion and discusses thoughts
on the project.

2.4 Application areas:
The TX9956CXBG is the first standard 64-bit microprocessor to employ the
high-performance TX99/H4 CPU core and industry-leading 90 nm process technology.
With a maximum operating frequency of 533 to 666 MHz, the new TX9956CXBG is
currently the highest-performance microprocessor in the TX System RISC
general-purpose product line. It is targeted at diverse applications, including multifunction
printers and high-end set-top boxes. With the introduction of the TX9956CXBG,
Toshiba continues to grow its bus-compatible, general-purpose microprocessor
portfolio to provide scalability and higher performance to customers.
2.5 Typical IC Design Flow:
Fig. 2.1 VLSI Design Flow

Chapter 3
IMPLEMENTATION
3.1 Description:
The architecture mainly consists of the following units:
1. ALU unit
2. Memory unit
3. Control and Decoder unit
4. Program Counter unit
5. Instruction Register unit
6. Internal Register unit
7. Tristate Buffer unit
8. 64-bit 8:1 Multiplexer A & B unit
9. 6-bit 2:1 Multiplexer unit
10. Clock generator unit
The clock generator generates three clock cycles and supplies
them to the control and decoder unit. That unit then sets the LdIr port to
1, which tells the instruction register to load data from the
memory or an internal register. After loading, the data is passed to the program
counter and to the control and decoder unit. In this project the data is 64 bits wide. Within
that word, the 6 least significant bits hold the memory address, the 4 most significant
bits hold the OPCODE, the next 3 bits hold the operand source address and the
following 3 bits hold the operand destination address. According to these addresses, the
control and decoder unit loads the operands from the memory or the internal registers.
When an operand is loaded from the
memory, the unit sets MemRd to 1 and MemWr to 0, which indicates that a memory
read operation is in progress. The data is loaded into MuxA. Any
ALU operation needs two operands: one operand is already loaded into MuxA, and
the other operand is loaded into MuxB in the same way. When
internal registers are selected, the control and decoder unit indicates which register
to select using the operand source address. The outputs (data) of the two
multiplexers are then loaded into the ALU unit, where the operation is
executed. At that time the control and decoder unit indicates which operation to perform
using the OpCode.
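The field layout described above can be sketched in Verilog as follows. The bit positions are one reading of the description (four MSBs, then two 3-bit fields, with the address in the six LSBs), and the module and port names are hypothetical rather than taken from the project code.

```verilog
// Hypothetical sketch of the 64-bit instruction word fields. The exact bit
// positions are an interpretation of the text, not the project's own code.
module ir_decode (
    input  wire [63:0] instr,     // contents of the instruction register
    output wire [3:0]  opcode,    // four MSBs: operation code
    output wire [2:0]  src_sel,   // next three bits: operand source address
    output wire [2:0]  dst_sel,   // next three bits: operand destination address
    output wire [5:0]  mem_addr   // six LSBs: memory address
);
    assign opcode   = instr[63:60];
    assign src_sel  = instr[59:57];
    assign dst_sel  = instr[56:54];
    assign mem_addr = instr[5:0];
endmodule
```

A 3-bit select field matches the eight inputs of an 8:1 multiplexer, and a 6-bit address covers 64 memory locations.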
After the ALU operation, the output can be stored in either the internal
registers or the memory. To store the data in the internal registers, the
control and decoder unit selects the destination register using the operand
destination address. To store the data in the memory, the control
and decoder unit sets MemWr to 1 and MemRd to 0 so that the
data can be written into the memory. The address at which the data is stored in the
memory is taken from the memory address field of the instruction; if a
different address location is needed, it is selected by a 2:1 mux with
a fetch select line under the control of the control and decoder unit. While the output data
from the ALU is being stored in the internal registers, the buffer is in the high-impedance
state, which means that it does not allow the data to flow into the memory unit. While
the output data is being stored in the memory, the buffer enable is set to 1 so that the data
is loaded into the memory. The above process is repeated for each further
ALU operation.
The architecture contains 11 modules and 1 top module. The modules are:
1. ALU
2. Memory
3. Control and Decoder
4. Program Counter
5. Instruction Register
6. Internal Register
7. Tristate Buffer
8. 64-bit 8:1 Multiplexers(2)
9. 6-bit 2:1 Multiplexer
10. Clock generator

3.2 Code Implementation:
Verilog HDL is one of the two common Hardware Description Languages (HDL)
used by integrated circuit (IC) designers. The other one is VHDL. An HDL allows the
design to be simulated early in the design cycle in order to correct errors or
experiment with different architectures. Designs described in HDL are
technology-independent, easy to design and debug, and usually more readable than
schematics, particularly for large circuits.
3.3 ALU (Arithmetic and Logic Unit):
Fig. 5.1 ALU Block diagram
We designed an ALU that carries out the arithmetic operations ADD, SUB, MUL, INR and
DCR, and the logical operations AND, OR, XOR, left shift (LS), right shift (RS),
pass-through of the data and inversion (INV) of the data. The ALU has five inputs and
one output. The first input, OutA, comes from the output of MUXA; the second input,
OutB, comes from MUXB; the third input, SelC, comes from the control and decoder
module and carries the OpCode to the ALU; the fourth input is InClk; and the fifth is
Rst (reset). The output, AluOut, carries the final result. The control and decoder
section selects the data for OutA and OutB and at the same time provides the ALU
OpCode, so that the ALU takes the data from the two inputs OutA and OutB, performs the
operation indicated by the OpCode and puts the result in AluOut.
3.3.1 Algorithm:
1. Start
2. Inputs OutA and OutB (from Mux A and Mux B), Rst, SelC and InClk
3. Output is AluOut
4. On each negative edge of InClk, if Rst is 1:
5. If SelC is 0000, no operation is allocated
6. If SelC is 0001, add OutA and OutB and give the result to AluOut
7. If SelC is 0010, subtract OutB from OutA and give the result to AluOut
8. If SelC is 0011, multiply OutA by OutB and give the result to AluOut
9. If SelC is 0100, increment OutA by one and give the result to AluOut
10. If SelC is 0101, decrement OutA by one and give the result to AluOut
11. If SelC is 0110, AND OutA with OutB and give the result to AluOut
12. If SelC is 0111, OR OutA with OutB and give the result to AluOut
13. If SelC is 1000, EX-OR OutA with OutB and give the result to AluOut
14. If SelC is 1001, shift OutA left by 1 and give the result to AluOut
15. If SelC is 1010, shift OutA right by 1 and give the result to AluOut
16. If SelC is 1011, pass OutA to AluOut
17. If SelC is 1100, complement OutA and give the result to AluOut
18. If SelC is 1101, it is allocated to the skip operation
19. If SelC is 1110, it is allocated to the jump operation
20. If SelC is 1111, it is allocated to the halt operation
21. If SelC is default, pass OutA to AluOut
22. Stop
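3.3.2 Code:
`timescale 1ns/1ps
module alu_1 (OutA, OutB, Rst, SelC, InClk, AluOut);
input Rst, InClk;
input [3:0] SelC; // OpCode from the control and decoder unit
input [63:0] OutA, OutB; // operands from Mux A and Mux B
output reg [63:0] AluOut;
// The result is updated on the falling edge of InClk while Rst is high.
always @ (negedge InClk)
begin
if (Rst == 1'b1)
case (SelC)
// 4'b0000 (default), 4'b1101 (SKIP), 4'b1110 (JMP) and 4'b1111 (HALT)
// have no entry here, so AluOut keeps its previous value for them.
//4'b0000:AluOut=default;
4'b0001: AluOut=OutA+OutB; // ADD
4'b0010: AluOut=OutA-OutB; // SUB
4'b0011: AluOut=OutA*OutB; // MUL
4'b0100: AluOut=OutA+1'b1; // INR
4'b0101: AluOut=OutA-1'b1; // DCR
4'b0110: AluOut=OutA&OutB; // AND
4'b0111: AluOut=OutA|OutB; // OR
4'b1000: AluOut=OutA^OutB; // XOR
4'b1001: AluOut=OutA<<1; // left shift by 1
4'b1010: AluOut=OutA>>1; // right shift by 1
4'b1011: AluOut=OutA; // pass OutA
4'b1100: AluOut=~OutA; // complement
endcase
end
endmodule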
3.3.3 Code explanation:
As mentioned above, the ALU module takes two data inputs, the OpCode,
InClk and a reset. On each negative edge of InClk, if Rst is high, a case statement
with SelC (the OpCode) as its selector is evaluated. Each choice corresponds to a
specific OpCode (the operation to be performed) applied to the two input data words.
We left four choices for the DEFAULT, JMP, SKIP and HALT operations. We
assigned OpCode 0000 to DEFAULT, 1101 to SKIP, 1110 to JMP and 1111 to
HALT.

3.3.4 Waveforms:
Fig.5.2 ALU Waveforms
3.3.5 Waveform Explanation:
When the clock is applied, if Rst=1 and SelC carries an OpCode, the ALU
performs the operation corresponding to that OpCode; here 0001 stands for addition.
In the above waveform we can see that OutA and OutB are added and the result is
stored in AluOut.
3.4 Memory:
Fig.5.3 Memory Block diagram
In this module we have three inputs and one inout port. One input is Addr, the address
of the memory location to be accessed. The second input is MemRd (memory read); when
this input is 1, data can be read from the memory. The last input is MemWr (memory
write); when this input is 1, data can be written into the memory. The inout port,
DataBus, carries the data to and from the memory.
3.4.1 Algorithm:
1. Start
2. Inputs memory write, memory read and Addr.
3. Inout DataBus.
4. Allocate a memory array of 64-bit words and a 64-bit data register.
5. Load the contents of a text file into the memory array.
6. If memory write is 1 and memory read is 0, then the value on DataBus is stored
at location Addr and the memory output to DataBus is high impedance.
7. Else if memory write is 0 and memory read is 1, then the word at location Addr
is driven onto DataBus.
8. Else the memory output to DataBus is high impedance.
9. Stop.
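3.4.2 Code:
module memory_1 (DataBus, MemWr, MemRd, Addr);
inout [63:0] DataBus;
input MemWr;
input MemRd;
input [5:0] Addr;
reg [63:0] datareg; // value driven onto DataBus during a read
reg [63:0] Mem [0:63]; // 64 words of 64 bits
//initial $fread ("om.bin", Mem);
initial $readmemh ("om.txt", Mem); // load program and data from a hex text file
// DataBus is in the sensitivity list so that a write follows the bus value.
always @ (MemWr or MemRd or Addr or DataBus)
begin
if (MemWr==1'b1 && MemRd==1'b0)
begin
Mem [Addr] =DataBus; // write: store the bus value
datareg=64'hzzzzzzzzzzzzzzzz; // release the bus
end
else if (MemWr==1'b0 && MemRd==1'b1)
datareg= Mem [Addr]; // read: drive the stored word
else
datareg=64'hzzzzzzzzzzzzzzzz; // idle: release the bus
end
assign DataBus = datareg;
endmodule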
3.4.3 Code Explanation:
When MemWr==1'b1 and MemRd==1'b0, the data on the DataBus is written into the
memory at the specified address. When MemWr==1'b0 and MemRd==1'b1, the data from
the memory is loaded into datareg (an internal register) and from there driven onto
the DataBus.
3.4.4 Waveforms:
Fig. 5.4 Memory waveforms

When MemWr=1, MemRd=0 and Addr is given, the data on the DataBus is
written into that memory location. When MemWr=0, MemRd=1 and Addr is given, the
data stored at that address is driven from the memory onto the DataBus. In the above
waveform, when MemWr=0 and MemRd=1, the value 1111111100000000 is given to the
DataBus from the memory.
3.5 Control and Decoder:
Fig. 5.5 Control & Decoder Block diagram
In this module we have 7 inputs and 9 outputs. The inputs are OpCode, OpDesAddr,
OpSrcAddr, Clk1, Clk2, Fetch and Rst; the outputs are SelA, SelB, SelC, SelD, IncPc,
LdIr, LdPc, MemRd and MemWr. According to the values of Clk1, Clk2 and Fetch, the
output ports are driven with the values needed for the required operation.
3.5.1 Algorithm:
1. Start.
2. Inputs clk1, clk2, fetch, Rst, OpCode, OpSrcAddr and OpDesAddr.
3. Outputs LdPc, IncPc, LdIr, MemRd, MemWr, SelA, SelB, SelC and SelD.
4. A parameter (state code) is assigned to each combination of clk1, clk2 and fetch.
5. When reset is 0 and OpCode is 1111 then SelA is 000, SelB is 000, SelC is
0000, SelD is 000, LdPc is 0, IncPc is 0, LdIr is 0, MemRd is 0 and MemWr is
0.
6. If clk1 is 0, clk2 is 1 and fetch is 1 then SelA is 000, SelB is 000, SelC is
0000, SelD is 000, LdPc is 0, IncPc is 0, LdIr is 0, MemRd is 0, MemWr is 0.
7. If clk1 is 1, clk2 is 1, fetch is 1 then SelA is 000, SelB is 000, SelC is 0000,
SelD is 000, LdPc is 0, IncPc is 0, LdIr is 0, MemRd is 1, MemWr is 0.
12. If clk1 is 0, clk2 is 0, fetch is 0 then SelA is 111, SelB is 000, SelC is 0000,
SelD is 000, LdPc is 0, IncPc is 0, LdIr is 0, MemRd is 1, and MemWr is 0.
SelD is 000, LdPc is 0, IncPc is 0, LdIr is 0, MemRd is 0, and MemWr is 0.
14. Default is SelA is 000, SelB is 000, SelC is 0000, SelD is 000, LdPc is 0,
IncPc is 0, LdIr is 0, MemRd is 0, and MemWr is 0.
15. Stop.
3.5.2 Code:
module controller1 (Clk1, Clk2, Fetch, Rst, OpCode, OpSrcAddr, OpDesAddr,
LdIr, LdPc, IncPc, MemRd, MemWr, SelA, SelB, SelC, SelD);
input Clk1, Clk2, Fetch, Rst;
input [3:0] OpCode;
input [2:0] OpSrcAddr, OpDesAddr;
output reg LdIr, LdPc, IncPc, MemRd, MemWr;
output reg [2:0] SelA, SelB, SelD;
output reg [3:0] SelC;
// State codes: each state is one value of {Clk1, Clk2, Fetch}.
parameter AddrSetUp1 =3'b011;
parameter InstrFetch =3'b111;
parameter InstrLoad =3'b001;
parameter Idle =3'b101;
parameter AddrSetUp2 =3'b010;
parameter OperandFetch =3'b110;
parameter AluOperation =3'b000;
parameter StoreResult =3'b100;
wire [2:0] Control;
assign Control={Clk1,Clk2,Fetch};
always @ (Control or Rst or OpCode or OpSrcAddr or OpDesAddr)
begin
if(Rst==1'b0 && OpCode==4'b1111)
begin
SelA =3'b000;
SelB =3'b000;
SelC =4'b0000;
SelD =3'b000;
LdPc =1'b0;
IncPc=1'b0;
LdIr =1'b0;
MemRd =1'b0;
MemWr =1'b0;
end
else
begin
case (Control)
AddrSetUp1: begin
SelA =3'b000;

SelB =3'b000;
SelC =4'b0000;
SelD =3'b000;
LdPc =1'b0;
IncPc =1'b0;
LdIr =1'b0;
MemRd =1'b0;
MemWr =1'b0;
end
InstrFetch: begin
SelA =3'b000;
SelB =3'b000;
SelC =4'b0000;
SelD =3'b000;
LdPc =1'b0;
LdIr =1'b0;
IncPc =1'b0;
MemRd =1'b1;
MemWr =1'b0;
end
InstrLoad: begin
SelA =3'b000;
SelB =3'b000;
SelC =4'b0000;
SelD =3'b000;
LdPc =1'b0;
IncPc =1'b0;
LdIr =1'b1;
MemRd =1'b1;
MemWr =1'b0;
end
Idle: begin
SelA =3'b000;
SelB =3'b000;
SelC =4'b0000;
SelD =3'b000;
LdPc =1'b0;
IncPc =1'b0;
LdIr =1'b1;
MemRd =1'b0;
MemWr =1'b0;
end
AddrSetUp2: begin
SelA =3'b000;
SelB =3'b000;
SelC =4'b0000;
SelD =3'b000;
LdPc =1'b0;
IncPc =1'b1;
LdIr =1'b0;

MemRd =1'b0;
MemWr =1'b0;
end
OperandFetch: begin
SelA =3'b111;
SelB =OpSrcAddr;
SelC =OpCode;
SelD =3'b000;
LdPc =1'b0;
IncPc =1'b0;
LdIr =1'b0;
MemRd =1'b1;
MemWr =1'b0;
end
AluOperation: begin
SelA =3'b111;
SelB =OpSrcAddr;
SelC =OpCode;
SelD =3'b000;
LdPc =1'b0;
IncPc =1'b0;
LdIr =1'b0;
MemRd =1'b1;
MemWr =1'b0;
end
StoreResult: begin
SelA =3'b000;
SelB =3'b000;
SelC=4'b0000;
SelD =OpDesAddr; // the result is stored in the destination register
LdPc =1'b0;
IncPc =1'b0;
LdIr =1'b1;
MemRd =1'b1;
MemWr =1'b0;
end
default: begin
SelA =3'b000;
SelB =3'b000;
SelC =4'b0000;
SelD =3'b000;
LdPc =1'b0;
IncPc =1'b0;
LdIr =1'b0;
MemRd =1'b0;
MemWr =1'b0;
end
endcase
end
end

endmodule
According to the value of Control (Clk1, Clk2, Fetch), the corresponding
case is selected and the assignments in that block are made. After 8 clock cycles one
ALU operation is done.
3.5.4 Waveforms:
Fig. 5.6 Control & Decoder Waveforms
According to Clk1, Clk2 and Fetch, one of the parameter codes is selected;
corresponding to that code, LdPc, IncPc, LdIr, MemRd, MemWr, SelA, SelB,
SelC and SelD change so that the given operation is carried out.
3.6 Program Counter:
Fig. 5.7 Program Counter Block diagram
In this module we have 4 inputs and 1 output. The address to be loaded is given on
MemAddr, and according to the inputs IncPc, LdPc and Rst the output PcOut is
obtained.
3.6.1 Algorithm:
1. Start
2. Inputs MemAddr, IncPc, Rst, LdPc.
3. Output PcOut.
4. On a positive edge of IncPc or a negative edge of Rst:
5. If reset is 0 then PcOut is 0.
6. Else if LdPc is 1 then PcOut is MemAddr.
7. Else PcOut is incremented by one.
8. Stop.
3.6.2 Code:
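module pro_count (MemAddr, IncPc, LdPc, Rst, Pcout);
input [5:0] MemAddr;
input IncPc, LdPc, Rst;
output [5:0] Pcout;
reg [5:0] sreg; // internal program counter register
assign Pcout = sreg [5:0];
// IncPc acts as the clock; Rst is an active-low asynchronous reset.
always @(posedge IncPc or negedge Rst)
begin
if (Rst == 1'b0)
sreg = 6'b000000;
else if (LdPc == 1'b1)
begin
sreg = MemAddr; // load the address taken from the instruction
end
else
sreg = sreg + 1; // step to the next location
end
endmodule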
If Rst is 0 then sreg (the internal register) is set to 0. If Rst is 1 and LdPc
(load program counter) is 1, the memory address is loaded into sreg; if LdPc
is 0, sreg is incremented.
3.6.4 Waveforms:
Fig. 5.8 Program Counter Waveforms
When Rst is 1 and a rising edge arrives on IncPc (which acts as the clock
here), if LdPc is 1 then MemAddr is loaded into Pcout; if LdPc is 0 then Pcout is
incremented. In the above waveform, when LdPc is 1 the value 11 from MemAddr is
loaded into PcOut, and it is then incremented to 12, and so on.
3.7 Instruction Register:
Fig.5.9 Instruction Register Block diagram
In this module we have 4 inputs and 4 outputs. According to the inputs LdIr
(load instruction register) and Clk, the data from the DataBus is loaded into dreg
(the internal register); for a skip instruction its address field is also
incremented. If Rst is 0 then dreg is cleared to 0. The required bits of dreg are
given to the corresponding output ports OpCode, OpDesAddr, OpSrcAddr and MemAddr.
3.7.1 Algorithm:
1. Start.
2. Inputs Clk, Rst, LdIr, DataBus.
3. Outputs OpCode, OpDesAddr, OpSrcAddr, MemAddr.
4. Assign OpCode = dreg (63-60), OpSrcAddr = dreg (59-57), OpDesAddr = dreg
(56-54) and MemAddr = dreg (5-0), where dreg is the internal register.
5. On a positive edge of Clk or a negative edge of Rst: if Rst is 0 then dreg is
zero.
6. Else if LdIr is 1 then dreg is loaded from DataBus, and if OpCode is 1101 then
MemAddr is incremented by one.
7. Else dreg keeps its value.
8. Stop.
3.7.2 Code:
`timescale 1ns/1ps
module inst_reg (DataBus, Clk, LdIr, Rst, OpCode, OpSrcAddr, OpDesAddr, MemAddr);
input Clk;
input Rst;
input LdIr;
input [63:0] DataBus;
output [3:0] OpCode;
output [2:0] OpSrcAddr;
output [2:0] OpDesAddr;
output [5:0] MemAddr;
reg [63:0] dreg; // holds the current instruction word
assign OpCode=dreg [63:60];
assign OpSrcAddr=dreg [59:57];
assign OpDesAddr=dreg [56:54];
assign MemAddr=dreg [5:0];
always @ (posedge Clk or negedge Rst)
begin
if (Rst==1'b0)
dreg=64'h0000000000000000;
else if (LdIr==1'b1)
begin
dreg=DataBus;
if (dreg [63:60] ==4'b1101) // SKIP: advance the address field by one
dreg [5:0] =dreg [5:0] +1;
end
else
dreg=dreg;
end
endmodule
If Rst is 0 then dreg is 0. If Rst is 1 and LdIr is 1 then DataBus is
loaded into dreg; otherwise dreg is unchanged. The four most significant bits (63:60)
are given to OpCode, the next 3 bits (59:57) are given to OpSrcAddr, the next 3 bits
(56:54) are given to OpDesAddr and the six least significant bits (5:0) are given to
MemAddr.
3.7.4 Waveforms:
Fig. 5.10 Instruction Register Waveforms
When Clk and Rst are applied and LdIr is 1, the fields of the DataBus word
are given to OpCode, OpSrcAddr, OpDesAddr and MemAddr according to their bit
lengths. In the above waveform, when LdIr is 1, OpCode=B, OpSrcAddr=0,
OpDesAddr=5 and MemAddr=00.

3.8 Internal Registers:
Fig.5.11 Internal Register Block diagram
In this module we have 4 inputs and 6 outputs. Based on the input SelD, the output
of the ALU (AluOut) is loaded into the selected data register (dreg1 to dreg6). If
reset is 0 then all the registers are set to zero.
3.8.1 Algorithm:
1. Start
2. Inputs Clk, Rst, SelD, AluOut.
3. Outputs Reg1, Reg2, Reg3, Reg4, Reg5, Acc.
4. Assign the data registers dreg1 to dreg6 to the outputs Reg1 to Reg5 and Acc.
5. On a positive edge of Clk or a negative edge of Rst: if reset is zero then Reg1=0,
Reg2=0, Reg3=0, Reg4=0, Reg5=0 and Acc=0.
6. Else if SelD is 001 then Reg1=AluOut, if SelD is 010 then Reg2=AluOut, if SelD is
011 then Reg3=AluOut, if SelD is 100 then Reg4=AluOut, if SelD is 101 then
Reg5=AluOut, if SelD is 110 then Acc=AluOut.
7. Else (default) all six registers keep their values.
8. Stop
3.8.2 Code:
`timescale 1ns/1ps
module internal_reg_1 (Clk, Rst, SelD, AluOut, Reg1, Reg2, Reg3, Reg4, Reg5,
Acc);
input Clk;
input Rst;
input [2:0] SelD;
input [63:0] AluOut;
output [63:0] Reg1;
output [63:0] Reg2;
output [63:0] Reg3;
output [63:0] Reg4;
output [63:0] Reg5;
output [63:0] Acc;
reg [63:0] dreg1, dreg2, dreg3, dreg4, dreg5, dreg6;
assign Reg1=dreg1;
assign Reg2=dreg2;
assign Reg3=dreg3;
assign Reg4=dreg4;
assign Reg5=dreg5;
assign Acc=dreg6;
always @ (posedge Clk or negedge Rst)
begin
if (Rst==1'b0)
begin
dreg1=64'h0000000000000000;
dreg2=64'h0000000000000000;
dreg3=64'h0000000000000000;
dreg4=64'h0000000000000000;
dreg5=64'h0000000000000000;
dreg6=64'h0000000000000000;
end
else
case (SelD) // SelD selects the register that receives AluOut
3'b001:dreg1=AluOut;
3'b010:dreg2=AluOut;
3'b011:dreg3=AluOut;
3'b100:dreg4=AluOut;
3'b101:dreg5=AluOut;
3'b110:dreg6=AluOut; // accumulator
default:
begin dreg1=dreg1;
dreg2=dreg2;
dreg3=dreg3;
dreg4=dreg4;
dreg5=dreg5;
dreg6=dreg6;
end
endcase
end
endmodule
If Rst is 0 then all the registers are set to zero or else according to the
input of SelD the AluOut is loaded into the corresponding (based on SelD) register.

3.8.4 Waveforms:
Fig.5.12 Internal Register Waveforms
3.8.5 Waveforms Explanation:
When Clk and Rst are applied and SelD is given, the corresponding register
is selected and the data in AluOut is loaded into that register. In the waveforms SelD
is 2, which means that the data in AluOut (AAAAAAAAAAAAAAAA) is loaded into
register 2.
3.9 Tristate Buffer:
Fig.5.13 Tristate Buffer
In this module we have 4 inputs and 1 output. A NOR gate generates the enable signal.
The inputs to that NOR gate are Fetch, Clk2 and MemRd, and its output is the enable
(ena). If this enable is 1 then AluOut is driven onto the Databus.
3.9.1 Algorithm:
1. Start
2. Inputs Fetch, Clk2, MemRd and AluOut
3. Output DataBus
4. Wire ena
5. The nor operation between fetch, clk2 and MemRd is assigned to ena
6. If ena is 1 then AluOut is assigned to Databus
7. Else high impedance
8. Stop
3.9.2 Code:
module buffer (fetch, clk2, MemRd, AluOut, Databus);
input fetch;
input clk2;
input MemRd;
wire ena;
input [63:0] AluOut;
output [63:0] Databus;
reg [63:0] Databus;
nor n1 (ena, fetch, clk2, MemRd);
always@ (AluOut or ena)
begin
if (ena==1'b1)
Databus=AluOut;
else
Databus=64'hzzzzzzzzzzzzzzzz;
end
endmodule
If the output of the NOR gate (the enable) is 1 then the data in AluOut is
driven onto the Databus output, and if the enable is 0 then the Databus output is in
the high impedance state.
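The same behaviour can also be written with a continuous assignment. The following is a minimal sketch and not the module used in this project: the module name buffer_assign is hypothetical, and the port names are copied from the buffer module above. It forms ena as the NOR of fetch, clk2 and MemRd and drives Databus with AluOut only while ena is 1.

```verilog
// Illustrative alternative to the buffer module (hypothetical name).
module buffer_assign (
    input  wire        fetch,
    input  wire        clk2,
    input  wire        MemRd,
    input  wire [63:0] AluOut,
    output wire [63:0] Databus
);
    // ena is 1 only when fetch, clk2 and MemRd are all 0 (three-input NOR).
    wire ena = ~(fetch | clk2 | MemRd);

    // Drive the bus when enabled, otherwise release it to high impedance.
    assign Databus = ena ? AluOut : 64'bz;
endmodule
```

Because the bus is released whenever ena is 0, a net driven this way can be shared with the memory module, which releases its own driver whenever it is not being read.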
3.9.4 Waveforms:
Fig.5.14 Tristate buffer Waveforms
When Clk2, Fetch and MemRd are applied, ena becomes either 1 or 0.
If ena is 1 then the data in AluOut is given to the Databus; otherwise the Databus is
in the high impedance state. In the above example ena=1, so the data in AluOut
(101010101010110) is given to the Databus.

4.0 64-bit 8:1 Multiplexers:
4.0.1 Multiplexer A:
Fig.5.15 Multiplexer. A Block diagram
In this module we have 8 inputs and 1 output. The inputs are the Databus, the
internal registers and one select line. Based on the select input, the corresponding
source is selected and its data is loaded into the output port (OutA).
Algorithm:
1. Start
2. Input Databus, reg1, reg2, reg3, reg4, reg5, Acc and SelA.
3. Output OutA.
4. If SelA is 001, reg1 is assigned to OutA.
5. Else if SelA is 010, reg2 is assigned to OutA.
6. Else if SelA is 011, reg3 is assigned to OutA.
7. Else if SelA is 100, reg4 is assigned to OutA.
8. Else if SelA is 101, reg5 is assigned to OutA.
9. Else if SelA is 110, Acc is assigned to OutA.
10. Else if SelA is 111, Databus is assigned to OutA.
11. Stop
Code:
module mux_a (Databus, reg1, reg2, reg3, reg4, reg5, acc, SelA, OutA);
input [63:0] Databus;
input [63:0] reg1;
input [63:0] reg2;
input [63:0] reg3;
input [63:0] reg4;
input [63:0] reg5;
input [63:0] acc;
input [2:0] SelA;
output [63:0] OutA;
reg [63:0] OutA;
always @ (SelA or Databus or reg1 or reg2 or reg3 or reg4 or reg5 or acc)
begin
case (SelA)
3'b001: OutA=reg1;
3'b010: OutA=reg2;
3'b011: OutA=reg3;
3'b100: OutA=reg4;
3'b101: OutA=reg5;
3'b110: OutA=acc;
3'b111: OutA=Databus;
default: OutA=64'hzz;
endcase
end
endmodule
Code Explanation:
Based on the select line, the corresponding register is selected and the data in
that register is loaded into the output port (OutA). If the selection does not match
any of the cases, the output is in the high impedance state.
Waveforms Explanation:
According to the value of SelA, the data in Databus, Reg1, Reg2, Reg3,
Reg4, Reg5 or Acc is given to OutA. In the above waveforms SelA=1, so the data in
Reg1 (AAAAAAAAAAAAAAAAAA) is given to OutA.
Waveforms:
Fig.5.16 Multiplexer A Waveform
4.0.2 Multiplexer B:
Fig.5.17 Multiplexer B Block diagram
In this module we have 8 inputs and 1 output. The inputs are the Databus, the
internal registers and one select line. Based on the select input, the corresponding
source is selected and its data is loaded into the output port (OutB).
Algorithm:
1. Start
2. Inputs Databus, reg1, reg2, reg3, reg4, reg5, Acc and SelB.
3. Output OutB.
4. If SelB is 001, reg1 is assigned to OutB.
5. Else if SelB is 010, reg2 is assigned to OutB.
6. Else if SelB is 011, reg3 is assigned to OutB.
7. Else if SelB is 100, reg4 is assigned to OutB.
8. Else if SelB is 101, reg5 is assigned to OutB.
9. Else if SelB is 110, Acc is assigned to OutB.
10. Else if SelB is 111, Databus is assigned to OutB.
11. Stop
Code:
module mux_b (Databus, reg1, reg2, reg3, reg4, reg5, acc, SelB, OutB);
input [63:0] Databus;
input [63:0] reg1;
input [63:0] reg2;
input [63:0] reg3;
input [63:0] reg4;
input [63:0] reg5;
input [63:0] acc;
input [2:0] SelB;
output [63:0] OutB;
reg [63:0] OutB;
always@ (SelB or Databus or reg1 or reg2 or reg3 or reg4 or reg5 or acc)
begin
case (SelB)
3'b001: OutB=reg1;
3'b010: OutB=reg2;
3'b011: OutB=reg3;
3'b100: OutB=reg4;
3'b101: OutB=reg5;
3'b110: OutB=acc;
3'b111: OutB=Databus;
default: OutB=64'hzz; // no source selected: high impedance
endcase
end
endmodule
Code Explanation:
Based on the select line, the corresponding register is selected and the data in
that register is loaded into the output port (OutB). If the selection does not match
any of the cases, the output is in the high impedance state.
Waveforms:
Fig.5.18 Multiplexer B Waveforms
Waveforms Explanation:
According to the value of SelB, the data in Databus, Reg1, Reg2, Reg3,
Reg4, Reg5 or Acc is given to OutB. In the above waveforms SelB=1, so the data in
reg1 (BBBBBBBBBBBBBBBBBB) is given to OutB.

4.1 6-bit 2:1 Multiplexer:
Fig.5.19 Multiplexer Block diagram
In this module we have 3 inputs and 1 output. The output depends on the input Fetch.
The two possible outputs of this module are MemAddr and PcOut.
4.1.1 Algorithm:
1. Start
2. Inputs MemAddr, PcOut and Fetch
3. Output Addr
4. If Fetch is 0 then Addr is MemAddr
5. Else Addr is PcOut
6. Stop
4.1.2 Code:
`timescale 1ns/1ps
module mux_2x1 (MemAddr, Pcout, fetch, Addr);
input [5:0] MemAddr;
input [5:0] Pcout;
input fetch;
output [5:0] Addr;
reg [5:0] Addr;
always@ (fetch or Pcout or MemAddr)
begin
if (fetch==1'b0)
Addr=MemAddr; // operand access: address from the instruction
else
Addr=Pcout; // instruction fetch: address from the program counter
end
endmodule
If the input Fetch is set to 0 then the output from the port Addr is MemAddr
and if the input Fetch is set to 1 then the output from the port Addr is PcOut.
4.1.4 Waveforms:
Fig.5.20 Multiplexer Waveform
According to the selection of Fetch the data in MemAddr and PcOut is given
to Addr. In the above waveforms Fetch=1 then the data in PcOut (01) is given as
Addr.
4.2 Clock Generator:
Fig.5.21 Clock Generator Block Diagram
In this module we have 2 inputs and 3 outputs. The outputs clk1, clk2 and fetch are
derived from the input clock Clk under the control of Rst.
4.2.1 Algorithm:
1. Start
2. Inputs Clk and Rst.
3. Output clk1, clk2 and fetch
4. If Rst=0 then clk1, clk2 and fetch are 0
5. Else clk1 follows Clk
6. On a negative edge of clk1, clk2 is complemented.
7. On a positive edge of clk2, fetch is complemented.
8. Stop
4.2.2 Code:
`timescale 1ns/1ps
module clkgen (Clk, Rst, clk1, clk2, fetch);
input Clk, Rst;
output clk1;
output clk2;
output fetch;
reg clk1, clk2, fetch;
// clk1 follows Clk while Rst is high.
always @ (Clk or Rst)
begin
if (Rst==1'b0)
clk1=1'b0;
else
clk1=Clk;
end
// clk2 toggles on every falling edge of clk1.
always @ (negedge clk1 or negedge Rst)
begin
if (Rst==1'b0)
clk2=1'b0;
else
clk2=~clk2;
end
// fetch toggles on every rising edge of clk2.
always@ (posedge clk2 or negedge Rst)
begin
if (Rst==1'b0)
fetch=1'b0;
else
fetch=~fetch;
end
endmodule
Whenever Clk changes, if Rst is 0 then clk1 is 0, and if Rst is 1 then
clk1 is the same as Clk. On a negative edge of clk1 or of Rst, if Rst is 0 then
clk2 is 0, and if Rst is 1 then clk2 is inverted. On a positive edge of clk2 or a
negative edge of Rst, if Rst is 0 then fetch is 0, and if Rst is 1 then fetch is
inverted.
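To see how these three signals step the control and decoder unit through its states, the divider chain can be simulated on its own. The sketch below is a stand-alone illustration and not part of the project code: the module name phase_demo, the 10 ns clock period and the reset timing are assumptions made for the example. It prints the 3-bit value {clk1, clk2, fetch} whenever it changes.

```verilog
`timescale 1ns/1ps
// Stand-alone model of the clk1/clk2/fetch chain (hypothetical test module).
module phase_demo;
    reg Clk, Rst;
    reg clk2, fetch;

    // clk1 follows Clk while Rst is high, otherwise it is held at 0.
    wire clk1 = Rst ? Clk : 1'b0;
    // Same concatenation order as the Control wire of the control unit.
    wire [2:0] Control = {clk1, clk2, fetch};

    always #5 Clk = ~Clk;              // assumed 10 ns clock period

    always @(negedge clk1 or negedge Rst)
        if (!Rst) clk2 <= 1'b0;
        else      clk2 <= ~clk2;       // toggle on each falling edge of clk1

    always @(posedge clk2 or negedge Rst)
        if (!Rst) fetch <= 1'b0;
        else      fetch <= ~fetch;     // toggle on each rising edge of clk2

    initial begin
        Clk = 1'b0;
        Rst = 1'b1;
        #1  Rst = 1'b0;                // assert the active-low reset
        #11 Rst = 1'b1;                // release it while Clk is low
        $monitor("t=%0t Control=%b", $time, Control);
        #100 $finish;
    end
endmodule
```

After the reset is released, Control should step through 100, 011, 111, 001, 101, 010, 110 and 000 and then repeat. Read from 011 onwards, this is the order in which the parameters AddrSetUp1 to StoreResult are declared in the control and decoder module.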
4.2.4 Waveforms:
Fig.5.22 Clock Generator Waveforms

According to Clk and Rst, clk1, clk2 and fetch are generated. In the
above example, when Clk and Rst are 1, clk1=Clk, clk2=1 and fetch is 0.
4.3 Top Module:
Fig. 5.23 Top Module Block diagram
This module has only two inputs. The output is the data, which can be
stored in either the memory or an internal register. When data is taken from the
memory or an internal register, the same data is used as input, which means that the
data is used as both input and output. So there is no separate output port for this
module.
4.3.1 Algorithm:
1. Start
2. Inputs Clk, Rst
3. Declare all the inputs and outputs of the individual modules as wires.
4. Instantiate all the modules.
5. Stop.
4.3.2 Code:
`timescale 1ns/1ps
module topmod_1 (Clk, Rst);
input Clk, Rst;
wire Clk1, Clk2, Fetch, LdIr, LdPc, IncPc, MemRd, MemWr, InClk, ena;
wire [2:0] OpSrcAddr, OpDesAddr, SelA, SelB, SelD;
wire [3:0] OpCode, SelC;
wire [5:0] MemAddr, Pcout, Addr;
wire [63:0] DataBus, Acc, Reg1, Reg2, Reg3, Reg4, Reg5, AluOut, Data, OutA, OutB;
// The port order of each instance follows the module definitions given earlier.
inst_reg om1 (DataBus, Clk, LdIr, Rst, OpCode, OpSrcAddr, OpDesAddr, MemAddr);
pro_count om2 (MemAddr, IncPc, LdPc, Rst, Pcout);
mux_2x1 om3 (MemAddr, Pcout, Fetch, Addr);
mux_a om4 (DataBus, Reg1, Reg2, Reg3, Reg4, Reg5, Acc, SelA, OutA);
mux_b om5 (DataBus, Reg1, Reg2, Reg3, Reg4, Reg5, Acc, SelB, OutB);
alu_1 om6 (OutA, OutB, Rst, SelC, InClk, AluOut);
clkgen om7 (Clk, Rst, Clk1, Clk2, Fetch);
buffer om8 (Fetch, Clk2, MemRd, AluOut, DataBus);
memory_1 om9 (DataBus, MemWr, MemRd, Addr);
controller1 om10 (Clk1, Clk2, Fetch, Rst, OpCode, OpSrcAddr, OpDesAddr, LdIr, LdPc,
IncPc, MemRd, MemWr, SelA, SelB, SelC, SelD);
internal_reg_1 om12 (Clk, Rst, SelD, AluOut, Reg1, Reg2, Reg3, Reg4, Reg5, Acc);
// or1 and nor1 are small gate modules whose code is not listed in this report.
or1 om13 (Clk1, Clk2, Fetch, InClk);
nor1 om14 (Fetch, Clk2, MemRd, ena);
endmodule
In this module all the other modules are instantiated. During simulation all
the modules operate concurrently, and at the end the output waveforms are generated.
4.3.4 Waveforms:
Fig. 5.24 Top module waveform (1) 500ns
Fig 5.25 Top module waveforms (2) 4000ns
When Clk and Rst are applied, all the instantiated modules operate
together and finally the result is given to either the internal registers or the
memory.

4.4 Operation:
To perform any operation, we first have to type the instruction commands
and the input data in a text file. That file is loaded into the memory, and its path
has to be given in the code. The processor then reads the commands from that file.
As an example, take the file which we have stored in the memory.
B280000000000001 //insert DataBus to Reg2
2222222222222222
1111111111111111
14C0000000000005 //add DataBus & Reg2, save in Reg3
1111111111111111
2300000000000007 //Sub DataBus, Reg1 save in Reg4
5555555555555555
3540000000000009 //Multiple DataBus, Reg1 save in Reg5
6666666666666666
424000000000000b //inc DataBus save in Reg1
1111111111111111
528000000000000d //dec DataBus save in Reg2
2222222222222222
670000000000000f //and Reg3, DataBus save in Reg4
2222222222222222
7740000000000011 // or Reg3, DataBus save in Reg5
CCCCCCCCCCCCCCCC
8280000000000013 // xor Reg1, DataBus save in Reg2
5555555555555555
9040000000000015 //leftshift DataBus save on Reg1
1111111111111111
A080000000000017 //rightshift DataBus save on Reg2
1111111111111111
C0C0000000000019 //complement of DataBus save in Reg3
1111111100000000
D00000000000001B //skip
1111111111111111
B28000000000001D //insert DataBus to Reg2
2222222222222222
E000000000000001
0000000000000000
1111111111111111
1111111111110000
The first command is B280000000000001 (hexadecimal). Its upper 16 bits in binary are
1011 0010 1000 0000: the OpCode is 1011 and the destination address is 010, so this
code is used for storing the data (2222222222222222) in register 2. The next command is
B240000000000003 (hexadecimal), whose upper 16 bits are 1011 0010 0100 0000; this code
is used for storing the data (1111111111111111) in register 1. For an ALU operation,
take addition as an example: the command is 14C0000000000005 (hexadecimal), whose upper
16 bits in binary are 0001 0100 1100 0000. The first four MSB bits (0001) are the
OpCode; here 0001 stands for addition. The next 3 bits (010) are the source address,
which means that one of the operands is in that location, here register 2. The next 3
bits (011) are the destination address where the result has to be stored, here
register 3. The other operand, 1111111111111111, has to be supplied as data; the first
operand is the one in register 2. After executing the addition, the ALU module sends the
result (3333333333333333) to register 3 to be stored.
For any other operation the above process is repeated; only the
instruction command and the input data change.
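The field breakdown of the addition command can be checked with a few lines of simulation code. This is only an illustrative sketch: the module name decode_demo is hypothetical and is not part of the project. It applies the bit ranges used by the instruction register to the word 14C0000000000005 and prints each field.

```verilog
// Hypothetical stand-alone check of the instruction fields.
module decode_demo;
    reg [63:0] instr;

    initial begin
        instr = 64'h14C0000000000005;                  // the addition command
        $display("OpCode    = %b", instr[63:60]);      // 0001   (addition)
        $display("OpSrcAddr = %b", instr[59:57]);      // 010    (register 2)
        $display("OpDesAddr = %b", instr[56:54]);      // 011    (register 3)
        $display("MemAddr   = %b", instr[5:0]);        // 000101 (address 5)
        $finish;
    end
endmodule
```

Changing the constant to any other command from the file shows how that command is split into OpCode, source, destination and memory address.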

Chapter 4
SCHEMATIC RESULTS
4.1 ALU

4.3 Control and decoder
4.4 Program counter
4.5 Instruction Register:

4.6 Internal Register
4.7 Tristate Buffer
4.8 Multiplexer A:

4.9 Multiplexer B:
4.10 Multiplexer:
4.11 Clock Generator:

4.12 Top Module:
Advantages:
The processor development was launched with clear goals: to deliver
industry leading performance on an aggressive schedule, while reducing the total
system cost, power dissipation and system footprint of its predecessor. The processor
is targeted at both the technical and commercial markets, spanning the product space
from the uniprocessor workstation to greater than 32-way scalable shared memory
multiprocessors. With fewer than 100 system interface signals to route and no off-chip
cache wiring, the simplified system interface translates directly to a lower cost
circuit board design for uniprocessor applications and greater processor packing
density for multiprocessors. CPU power dissipation is reduced by over fifty percent
while performance is doubled, providing a fourfold increase in performance per watt.
This is accompanied by a seventy five percent reduction in CPU cost resulting from
the elimination of all off-chip high speed cache SRAM and the lower cost packaging.
Accuracy is also higher.
Disadvantages:
The only disadvantage with the project is, it is more costly.
Conclusion:
Various individual modules of the project have been designed, verified
functionally using a Verilog HDL simulator (Active-HDL) and synthesized with the
Xilinx ISE tool.
This design of the 64-bit RISC processor is capable of performing
arithmetic and logical operations with the help of the ALU block. The control and
decoder unit controls all the modules.
The designed processor is also capable of performing control
instructions such as JUMP, SKIP and HALT.
The functional simulation has been carried out successfully, with the
results matching the expected ones.
The design has been synthesized using FPGA technology from
Xilinx. This design has targeted the device family spartan3, device xc2s15, package
cs144 and speed grade -6. This device belongs to the Virtex-E group of FPGAs from
Xilinx. Reducing or simplifying the instruction set was not the primary goal of the
RISC architecture; it is a pleasant side effect of the techniques used to gain the
highest performance possible from the available technology.
Future scope:
We can extend this 64-bit RISC processor to a 128-bit RISC processor by
changing the instruction format length, and by increasing the number of registers we
can increase the storage of the processor. We can also include more instructions
(still fewer than in a CISC design) so that more ALU operations can be performed.
