Ibrahim Hazmi presents work on redesigning the CPU12 microcontroller using VHDL. The document outlines a methodology for partitioning the design into blocks and describes work done on the register block, ALU, control unit, and simulations. Future work includes finishing the control unit, connecting it to the data path, and designing the pre-fetch unit. The goal is to gain understanding of embedded system design and combine that with HDL skills.
1. Wednesday 01/07/2009, IBRAHIM HAZMI - SECE
Master’s Research Project 2009
HC12 Micro-controller Design Using VHDL
Presented by: Ibrahim Hazmi
Supervised by: Dr. Paul Beckett
[Block diagram of the CPU12: Control Unit, Address Calculation Circuit, Register Block, ALU, Pre-Fetch Unit, Memory Access Unit]
2. THE BLACK BOXES
[Block diagram: Control Unit, Address Calculation Circuit, Register Block, ALU, Pre-Fetch Unit, Memory Access Unit]
9. Introduction: What I am doing
I am redesigning (re-engineering) an existing microcontroller for the following reasons:
1. To gain a practical and detailed understanding of embedded system design.
2. To combine that understanding of embedded systems with skills in HDL design.
3. The HC12 is a microcontroller with everything on chip, so it is a worthwhile starting point.
4. To my knowledge, the HC12 is not publicly available as VHDL code, so it would be valuable to make it available for academic and research purposes.
I am redesigning the CPU12, the heart of the HC12.
(Block Diagram of HC12, MCU12, CPU12 – Black)
For me, the CPU12 was a black box. Let us see how I imagined that black box.
10. Methodology
• At the beginning, I treated every part of the microcontroller as a separate part and designed it with no guideline other than its functionality. This made it difficult to join all the parts together at the end, and it also led me to include unnecessary circuits.
• In this project, I have learnt a good methodology for designing a microprocessor.
• I owe thanks to two ideas in this regard:
11. Methodology
• 1st:
“In designing a CPU, we must first define its instruction set, and how the instructions are encoded and executed. We need to answer questions such as how many instructions do we want? What are the instructions? What operation code (opcode) do we assign to each of the instructions? How many bits do we use to encode an instruction? Once we have decided on the instruction set, we can proceed to designing a datapath that can execute all the instructions in the instruction set. In this step we are creating a custom datapath, so we need to answer questions such as what functional units do we need? How many registers do we need? Do we use a single register file or separate registers? How are the different units connected together?”
Enoch O. Hwang, 2005, Digital Logic and Microprocessor Design With VHDL, La Sierra University, Riverside
This quote guided my understanding of how to start designing a microprocessor. Dr. Hwang is from La Sierra University in the USA.
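The questions in the quote (which instructions, which opcode for each, how many bits per instruction) can be made concrete with a small encoding table. The following is a minimal Python sketch, not the CPU12 encoding: the opcode values, the operand sizes and the names `OPCODES`, `OPERAND_BYTES`, `encode` and `decode` are all assumptions made for illustration, and only the mnemonics are borrowed from the HC12.

```python
from dataclasses import dataclass

# Hypothetical 8-bit opcodes; these are NOT the real CPU12 opcode values.
OPCODES = {"NOP": 0x00, "LDAA": 0x01, "ADDA": 0x02, "STAA": 0x03}
# Number of operand bytes following each opcode (also an assumption).
OPERAND_BYTES = {"NOP": 0, "LDAA": 1, "ADDA": 1, "STAA": 2}
MNEMONICS = {code: name for name, code in OPCODES.items()}


@dataclass
class Instruction:
    mnemonic: str
    operand: int = 0


def encode(instr: Instruction) -> bytes:
    """Pack one instruction as an opcode byte plus big-endian operand bytes."""
    size = OPERAND_BYTES[instr.mnemonic]
    if not 0 <= instr.operand < (1 << (8 * size)):
        raise ValueError("operand does not fit the instruction format")
    return bytes([OPCODES[instr.mnemonic]]) + instr.operand.to_bytes(size, "big")


def decode(code: bytes, offset: int = 0):
    """Return the instruction at `offset` and the offset of the next one."""
    mnemonic = MNEMONICS[code[offset]]
    size = OPERAND_BYTES[mnemonic]
    operand = int.from_bytes(code[offset + 1:offset + 1 + size], "big")
    return Instruction(mnemonic, operand), offset + 1 + size
```

Fixing such a table first shows what the datapath must support: here, an 8-bit path for immediates and a 16-bit path for addresses.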
12. Methodology
2nd: Partitioning
I used to try to design everything at once, which wasted most of my time and usually ended in disappointment. Dr. Paul Beckett, my supervisor, suggested the idea of partitioning, with which I started to design the CPU12 block by block, considering the relations between all the blocks and the target instruction set.
What I have achieved:
• Imagined a block diagram of the whole design and partitioned it into 5 main blocks.
• Designed and coded the Data Path (all ALU and REG blocks).
• Designed some parts of the Control Unit (CU).
• Put in place the main parts of the Pre-Fetch Unit (PFU).
13. REGISTERs BLOCK
[Figure: schematic of the register block (R_Block). Registers: RA (accumulator A, high byte of D), RB (accumulator B, low byte of D), RD, RXI, RYI, RSP, RPC and RCC. Control inputs: CLK, CLRA, CLRB, LdA, LdB, LdX, LdY, LdSP, LdCC, LdPC and En. Other signal labels: Accin, Accout, MACC, MX, MY, MSP, MCC, MPC, MOUT1, MOUT2, MI1, MI2, MO1, MO2, MAB, INC_1_2. Connected blocks: ALU bus, BUS-MUX, ADDMUX, PC +1/+2, CAL_CC, CU_CCR and the control unit.]
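The register block can be summarised as a behavioural model: a set of registers that capture the bus value on a clock edge when their load signal is asserted, with A and B also readable together as the 16-bit accumulator D. This is a Python sketch under stated assumptions rather than the author's VHDL. The class `RegisterBlock`, its method names and the rule that clear takes priority over load are hypothetical. The register widths follow the HC12 programming model (A, B and CCR are 8 bits wide; X, Y, SP and PC are 16 bits wide).

```python
# Register widths in bits, following the HC12 programming model.
WIDTH = {"A": 8, "B": 8, "CC": 8, "X": 16, "Y": 16, "SP": 16, "PC": 16}


class RegisterBlock:
    """Behavioural model of a register block driven by one shared bus."""

    def __init__(self):
        self.regs = {name: 0 for name in WIDTH}

    @property
    def d(self):
        """Accumulator D is A (high byte) concatenated with B (low byte)."""
        return (self.regs["A"] << 8) | self.regs["B"]

    def load_d(self, bus):
        """Load a 16-bit value into D by splitting it across A and B."""
        self.regs["A"] = (bus >> 8) & 0xFF
        self.regs["B"] = bus & 0xFF

    def clock(self, bus=0, load=(), clear=()):
        """One clock edge: registers named in `load` capture the bus value."""
        for name in load:
            self.regs[name] = bus & ((1 << WIDTH[name]) - 1)
        for name in clear:  # assumption: clear has priority over load
            self.regs[name] = 0
```

For example, `load_d(0x1234)` leaves 0x12 in A and 0x34 in B, which mirrors the A(D_High) and B(D_Low) labels in the figure.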
16. REGISTERs BLOCK
[Figure: register block on the internal data bus. Registers: A, B, IXH, IXL, IYH, IYL, SPH, SPL, PCH, PCL and CCR. Multiplexers: AMUX, BMUX, RMUX1, RMUX2, BUS-MUX, PCMUX (+1/+2), MUXIX, MUXIY, MUXSP, MUXAcc, MUXCC. Other inputs: Pre/Post, Imm, Address.]
ALU: all arithmetic and logic operations + pre/post increment/decrement + CCR check!
18. REGISTERS AND ALU
[Figure: data path with accumulators A and B (D), the ALU, and the 16-bit buses ADDR15:0 and DATA15:0.]
PC: Program Counter
IR: Instruction Register
MDR: Memory Data Register
MAR: Memory Address Register
ALU: Arithmetic Logic Unit
T: Temporary Register
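The roles of PC, MAR, MDR and IR are easiest to see in a fetch step. The slide does not give the transfer sequence, so the Python sketch below uses the usual textbook one (MAR ← PC, MDR ← M[MAR], IR ← MDR, PC ← PC + 1). The class `DataPath`, the byte-wide memory read and the single-byte opcode are assumptions made for illustration.

```python
class DataPath:
    """Minimal model of the PC/MAR/MDR/IR registers around a memory."""

    def __init__(self, memory):
        self.memory = memory  # hypothetical byte-addressed memory
        self.pc = 0   # Program Counter
        self.mar = 0  # Memory Address Register
        self.mdr = 0  # Memory Data Register
        self.ir = 0   # Instruction Register

    def fetch(self):
        """Textbook fetch: address out via MAR, data back via MDR."""
        self.mar = self.pc                 # MAR <- PC
        self.mdr = self.memory[self.mar]   # MDR <- M[MAR]
        self.ir = self.mdr                 # IR  <- MDR
        self.pc = (self.pc + 1) & 0xFFFF   # PC  <- PC + 1 (16-bit wrap)
        return self.ir
```

Every memory access goes through MAR and MDR, so these two registers form the only interface between the data path and the address and data buses in this model.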
35. Conclusion
• As a designer, the first thing I defined at the beginning of the design was the partitioning.
• I then started with the simplest version and improved the design based on thinking, reading and other people's suggestions.
• The design consisted simply of two main parts, the Data Path and the Control Unit, including the address calculation unit and the memory interface.
Future Work
• Design the PFU (Pre-Fetch Unit), finish the CU (Control Unit) and connect them to the Data Path.
36. Conclusion - Details
Based on the work that has been done on the design of the CPU12:
The objective of the project was to design the two units of the CPU12 (Data Path and Control Unit) individually, then combine them to run a simple sequence of instructions.
An additional challenge was to make the design as structural as possible. This was achieved for the Data Path, whereas the Control Unit was designed as a state machine, entirely at the behavioural level.
It took many trials to arrive at the final Data Path circuits. For instance, the register block was designed several times under different considerations.
Answering the question "What is the first thing I should know in doing such a design?" was the key to understanding and imagining what is going on. In the end, it led to a reasonable block diagram that meets the requirements.
The idea of partitioning made it possible to design the CPU12 block by block, considering the relations between all the blocks and the targeted instruction set.
During this work, I adopted the idea of starting with the simplest version and then improving the design based on thinking, reading and other people's suggestions.
I have gained a good understanding of how embedded systems work, how to manage a microcontroller's processes and how to connect its parts together in order to run simple code.
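A control unit written as a behavioural state machine can be sketched as a state register, a next-state table and a table of control signals asserted in each state. The Python model below only illustrates that structure. The three states, the signal names `LdIR` and `IncPC`, and the choice of `LdA` in the execute state (as for a load into accumulator A) are assumptions, not the states of the actual CPU12 control unit.

```python
from enum import Enum


class State(Enum):
    FETCH = "fetch"
    DECODE = "decode"
    EXECUTE = "execute"


# Next-state table of a simple three-phase machine (an assumption).
NEXT_STATE = {
    State.FETCH: State.DECODE,
    State.DECODE: State.EXECUTE,
    State.EXECUTE: State.FETCH,
}

# Hypothetical control signals asserted in each state (Moore outputs).
CONTROL = {
    State.FETCH: {"LdIR", "IncPC"},
    State.DECODE: set(),
    State.EXECUTE: {"LdA"},
}


def run(cycles, state=State.FETCH):
    """Return the (state, asserted signals) pairs for `cycles` clock ticks."""
    trace = []
    for _ in range(cycles):
        trace.append((state, CONTROL[state]))
        state = NEXT_STATE[state]
    return trace
```

Keeping the next-state logic and the output logic in separate tables corresponds to the two parts of a behavioural state machine description, and makes it clear which signals the control unit must deliver to the structural Data Path.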
37. POINT OF DISCUSSION
“The "immediate" operands are part of the instruction, so they are loaded into the instruction queue by the fetch unit. Other operands have to be read during execution of the instruction. These operands cannot be loaded directly by the fetch unit because they may be changed while the instruction is waiting in the queue.”
My question is about the fetch unit, including the instruction queue: is it the only way the CPU12 talks to memory, or does the CPU12 access memory directly when it gets an instruction's operands?
I'm a bit confused!
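The distinction drawn in the quote can be illustrated with a toy model: immediate operands travel through the instruction queue with the opcode, while memory operands are read only when the instruction executes, so they reflect any change made while the instruction was waiting. This Python sketch is not a model of the real CPU12 queue. The opcode values `LDAA_IMM` and `LDAA_DIR`, the two-byte instruction format and the function names are assumptions.

```python
from collections import deque

# Hypothetical opcodes: load A with an immediate, or from a memory address.
LDAA_IMM, LDAA_DIR = 0x01, 0x02


def prefetch(memory, start, count):
    """Fetch unit: copy `count` instruction bytes into the queue."""
    return deque(memory[start:start + count])


def execute_next(queue, memory):
    """Execute one two-byte instruction and return the value loaded into A."""
    opcode = queue.popleft()
    operand = queue.popleft()
    if opcode == LDAA_IMM:
        return operand          # the value itself came through the queue
    if opcode == LDAA_DIR:
        return memory[operand]  # memory is read at execution time
    raise ValueError("unknown opcode")


memory = bytearray([LDAA_IMM, 0x2A, LDAA_DIR, 0x08, 0, 0, 0, 0, 0x11])
queue = prefetch(memory, 0, 4)
memory[8] = 0x99  # the data changes while the instructions wait in the queue
first = execute_next(queue, memory)   # immediate operand taken from the queue
second = execute_next(queue, memory)  # memory operand read at execution time
```

In this model `first` is 0x2A and `second` is 0x99, not the 0x11 that was in memory when the queue was filled, which is the behaviour the quote describes.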
38. Things that need to be checked
• Make sure that operations on the CCR affect New-CCR correctly.
• Shifts: memory versus accumulators A and D.
• Could some operations (Inc/Dec/Sh) be done inside the accumulators?
• Do all shifts need to go through the ALU, given that some shifts operate on memory data?
• Transfer: TFR abcdxys,abcdxys. Load and store paths?
• Apply the Max/Min functions using the comparator signals?
• A swap B is done through the accumulators circuit.
• What happens if a register is incremented from FFFF to 0000? Carry?
• Check the functionality of the divider!
Done in the final submission!
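The FFFF to 0000 question in the list above can be checked with plain arithmetic. The Python sketch below shows what a 16-bit incrementer produces: the result wraps to 0000, the adder generates a carry-out, and the result is zero. Whether that carry-out is copied into the C bit of the CCR is decided per instruction, so it should be checked against the CPU12 Reference Manual; the function name `inc16` is hypothetical.

```python
def inc16(value):
    """Increment a 16-bit value; return (result, carry_out, zero)."""
    total = (value & 0xFFFF) + 1
    result = total & 0xFFFF    # keep the low 16 bits (wraps FFFF -> 0000)
    carry_out = total >> 16    # 1 only when the addition overflows 16 bits
    zero = int(result == 0)    # candidate value for the Z flag
    return result, carry_out, zero
```

Calling `inc16(0xFFFF)` returns `(0, 1, 1)`, while `inc16(0x1234)` returns `(0x1235, 0, 0)`.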
39. References
1. Enoch O. Hwang, 2005, Digital Logic and Microprocessor Design With VHDL, La Sierra University, Riverside, USA.
2. Altera Co., 2009, AN 311: Standard Cell ASIC to FPGA Design Methodology and Guidelines.
3. E. Ostúa, J. Juan Chico, J. Viejo, M. J. Bellido, D. Guerrero, A. Millán & P. Ruiz-de-Clavijo, An SoC Design Methodology for LEON2 on FPGA, Universidad de Sevilla.
4. Volnei A. Pedroni, 2004, Circuit Design with VHDL, MIT Press, Cambridge, Massachusetts; London, England.
5. Pong P. Chu, 2006, RTL Hardware Design Using VHDL, Cleveland State University, Wiley-Interscience.
6. Peter J. Ashenden, 1990, The VHDL Cookbook, 1st Edition, Dept. Computer Science, University of Adelaide, SA.
7. M. Morris Mano, 1993, Computer System Architecture, 3rd Edition, Prentice Hall Int.
8. Digitaaltehnika erikursus, Digital Design Methodology.
9. Freescale Semiconductor, 2006, CPU12 Reference Manual (CPU12RM); Motorola M68HC12 and HCS12 Microcontrollers, Rev. 4.0.
10. HC11: Dr. Ricardo V., 2004, Lecture Notes; ENGR 478, http://online.sfsu.edu/~valverde/ENGR/ENGR478_s07/welcome.htm
11. LC3: http://www.et.byu.edu/groups/ece224web/lectures/LC3-2.pdf, 20/07/2009.
12. http://www.eng.auburn.edu/~nelson/, 13/03/2009.
13. http://oucsace.cs.ohiou.edu/~avinashk/, 13/03/2009.
14. http://cs.lasierra.edu/~ehwang/mybook/toc.html, 13/03/2009.
15. http://lap2.epfl.ch/courses/archord1/, 13/03/2009.
16. http://www.ece.tamu.edu/~vinith/ecen248/index_files/lab_manual_spring09.pdf, 13/03/2009.
17. http://www.eda-stds.org/rassp/, 13/03/2009.
18. http://vlsi.ee.hacettepe.edu.tr/links.html, 13/03/2009.
19. Reto Z., 1999, Lecture notes on Computer Arithmetic: Principles, Architectures, and VLSI Design, Integrated Systems Laboratory, Swiss Federal Institute of Technology (ETH), Switzerland.