2008
TERM PROJECT

   Design of Multilevel Cache
     Memory using VHDL

Anish Goel
216-67-817
FALL-08
NJIT

This project, implemented in VHDL, is the design of a multilevel cache memory for a uni-processor system. The document also includes some of the simulation and synthesis results.

Computer Systems Architecture
Instructor: Prof. S.G. Ziavras

CONTENTS

  1. Problem Statement
  2. Design Description and Block Diagram
  3. Design Approach
  4. Results
  5. Observations
  Appendix A: VHDL Code
  Appendix B: Simulation Results
  Appendix C: Synthesis Results
  References

List of Figures

  Figure 1: System block layout
  Figure 2: L1 Cache Block Diagram
  Figure 3: L2 Cache Block Diagram
  Figure 4: Cache Controller Signals
  Figure 5: Simulation Results L1 Cache
  Figure 6: Simulation Results L2 Cache
  Figure 7: Simulation Results Cache Controller

1. Problem Statement:

To design a multilevel cache memory for a uni-processor system using VHDL.

Cache Memory Specifications:

  CACHE       SIZE     MAPPING
  L1 Cache    16KB     4-way set associative
  L2 Cache    128KB    8-way set associative

Features:
  • Unified I & D cache at both levels L1 and L2
  • Set associative mapping
  • Write through policy
  • Common cache controller for L1 and L2

The project aims to design the above memory hierarchy of cache memories for a uni-processor system and to obtain the simulation results using the ModelSim platform. In addition, the Xilinx ISE platform is used to depict the synthesized system for the designed VHDL code.

2. Design Description:

The design consists of two levels of cache memory, Level 1 (L1) and Level 2 (L2), and a cache controller that communicates between the microprocessor and the cache memories to carry out all memory related operations. The size and specifications of the cache memories are stated in the problem statement above, and the design approach is described in the next section. Figure 1 shows the block diagram of the designed system.

[Figure 1: System block layout. Blocks shown: Microprocessor, Cache Controller, Level 1 Cache Memory, Level 2 Cache Memory, System Busses.]

The functionality of the design is explained below:
  1. The cache controller receives the address that the microprocessor wants to access.
  2. The cache controller looks for the address in the L1 cache.
  3. If the address is in the L1 cache (a cache hit occurs in L1), the data from that location is provided to the microprocessor via the data bus.
  4. If the address is not found in the L1 cache (a cache miss occurs in L1), the cache controller looks for the same address in the L2 cache.
  5. If the address is found in the L2 cache (a cache hit occurs in L2), the data from that location is provided to the microprocessor and the same data is also written into the L1 cache.
  6. If the address is not found in the L2 cache (a cache miss occurs in L2), the controller would have to request the same address from the main memory. This functionality is not modeled in the project. Instead, the cache controller signals to the microprocessor that a cache miss has occurred in the L2 cache, and the microprocessor should then take appropriate action.
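
The lookup sequence above can be sketched as a small state machine. This is only an illustration under stated assumptions, not the controller used in the project: the entity name lookup_fsm, the state names, and the single-bit l1_hit/l2_hit inputs (assumed to be the OR of the per-way hit lines and to be valid one clock after the request) are all hypothetical.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical sketch of the L1 -> L2 lookup sequence. All names here are
-- illustrative and are not taken from the project's cache_controller.
entity lookup_fsm is
  port (
    clk, reset : in  std_logic;
    req        : in  std_logic;  -- address request from the microprocessor
    l1_hit     : in  std_logic;  -- assumed: OR of the four L1 way hits
    l2_hit     : in  std_logic;  -- assumed: OR of the eight L2 way hits
    dav_l1     : out std_logic;  -- data valid from L1
    dav_l2     : out std_logic;  -- data valid from L2 (L1 would be refilled)
    l2_miss    : out std_logic   -- processor must go to main memory
  );
end entity lookup_fsm;

architecture rtl of lookup_fsm is
  type state_t is (IDLE, CHECK_L1, CHECK_L2);
  signal state : state_t := IDLE;
begin
  process (clk)
  begin
    if rising_edge(clk) then
      -- Status outputs are one-cycle pulses unless set again below.
      dav_l1  <= '0';
      dav_l2  <= '0';
      l2_miss <= '0';
      if reset = '1' then
        state <= IDLE;
      else
        case state is
          when IDLE =>                 -- steps 1 and 2: accept the request
            if req = '1' then
              state <= CHECK_L1;
            end if;
          when CHECK_L1 =>             -- L1 hit, or fall through to L2
            if l1_hit = '1' then
              dav_l1 <= '1';
              state  <= IDLE;
            else
              state <= CHECK_L2;
            end if;
          when CHECK_L2 =>             -- L2 hit, or report the miss
            if l2_hit = '1' then
              dav_l2 <= '1';
            else
              l2_miss <= '1';
            end if;
            state <= IDLE;
        end case;
      end if;
    end if;
  end process;
end architecture rtl;
```

Keeping every output assignment inside one clocked process with defaults at the top means the data-valid and miss flags cannot stay asserted from a previous request.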

L1 and L2 specifications:

Physical Address: 32-bit

L1 Cache:
Refer to figure 2 for the internal architecture of L1 cache.

Address Format (fields)
  Word Size: 32-bit (4 bytes)
  Tag: 22-bit
  Set Address: 8-bit
  Word: 2-bit

Physical Memory Address: 32-bit
  | TAG: 22-bit | SET: 8-bit | WORD: 2-bit |

L1 Cache Memory: 16KB 4-way set associative unified instruction and data cache.
Total number of cache lines: 256 sets x 4 ways = 1024 lines

L2 Cache:
Refer to figure 3 for the internal architecture of L2 cache.

Address Format (fields)
  Word Size: 32-bit (4 bytes)
  Tag: 20-bit
  Set Address: 10-bit
  Word: 2-bit

Physical Memory Address: 32-bit
  | TAG: 20-bit | SET: 10-bit | WORD: 2-bit |

L2 Cache Memory: 128KB 8-way set associative unified instruction and data cache.
Total number of cache lines: 1024 sets x 8 ways = 8192 lines
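
The field widths above follow from the cache sizes. The arithmetic below is a worked check, assuming the 128-bit (16-byte) line size given in the design approach and assuming, because the three fields already add up to 32 bits, that the address carries no separate byte-offset field.

```latex
\begin{align*}
\text{L1: lines} &= \frac{16\,\text{KB}}{16\,\text{B}} = 1024,
  & \text{sets} &= \frac{1024}{4} = 256,
  & \text{set bits} &= \log_2 256 = 8,
  & \text{tag bits} &= 32 - 8 - 2 = 22 \\
\text{L2: lines} &= \frac{128\,\text{KB}}{16\,\text{B}} = 8192,
  & \text{sets} &= \frac{8192}{8} = 1024,
  & \text{set bits} &= \log_2 1024 = 10,
  & \text{tag bits} &= 32 - 10 - 2 = 20
\end{align*}
```

In both caches the 2-bit word field selects one of the four 32-bit words in a line, since \log_2 4 = 2.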

L1 Cache Memory Architecture

[Figure 2: L1 Cache Block Diagram. The 32-bit address bus (A0-A31) is split into the word address (A0-A1), the set address (A2-A9) and the tag (A10-A31). Each of the four ways (Way 0 to Way 3) holds a tag cache (Set 0 to Set 255, T0-T21 per set) and a data memory (Set 0 to Set 255, D0-D127 per set). The tag address comparator compares A10-A31 with T0-T21 and produces Hit/Miss; when enabled, the data buffer uses A0-A1 to select one 32-bit word from the 4-word line.]

L2 Cache Memory Architecture

[Figure 3: L2 Cache Block Diagram. The 32-bit address bus (A0-A31) is split into the word address (A0-A1), the set address (A2-A11) and the tag (A12-A31, matching the 20-bit tag field). Each of the eight ways (Way 0 to Way 7) holds a tag cache (Set 0 to Set 1023, T0-T19 per set) and a data memory (Set 0 to Set 1023, D0-D127 per set). The tag address comparator produces Hit/Miss; when enabled, the data buffer uses A0-A1 to select one 32-bit word from the 4-word line.]

Cache Controller

The following diagram depicts all the signals of the cache controller that are used to carry out the memory related operations between the microprocessor and the L1 and L2 caches.

[Figure 4: Cache Controller Signals. Signals shown: Reset Controller, Controller Busy, DAV_L1, DAV_L2, Address Request (from microprocessor), Cache Address Bus A31-A0, To Main Memory, Cache Hit/Miss (L1) from each block, Data bus D0-D31, L1 Enable, Read, Write, L2 Enable, Cache Hit/Miss (L2) from each block, Address and Data Bus to L1 and L2 cache.]

DAV_L1/L2: Data valid from L1 or L2 cache memory on the system data bus when a cache hit occurs in the corresponding block.

3. Design Approach:

The project is designed using a mixed style of modeling in VHDL. The ModelSim SE PLUS 6.2c platform from Mentor Graphics is used as the design platform and simulator. The Xilinx ISE 9.1i platform is used to synthesize the design.

The basic storage element in the memory is modeled using a D flip-flop. Each D flip-flop stores a single bit. Arrays of this storage element are constructed using the structural style of modeling in VHDL to form registers (for example, the 22-bit tag register), and these registers are in turn used to create the complete memory array.

The L1 cache is arranged as follows:

L1 Cache capacity details
  Cache data memory:
    Word size = 32 bits
    Line size = 128 bits (4 words)
    No. of lines = 256 per block (way)
    Capacity per way = 256 lines x 4 words = 1 KWord (4KB)
    Total capacity for the 4-way set associative cache = 1 KWord x 4 = 4 KWords (16KB)
  Cache Tag memory: Tag size = 22 bits
  Cache Tag comparator: 22-bit comparator
  Input Output Buffer: 128 bits

The L2 cache is designed using the same concept, except that it is much larger than the L1 cache and is an 8-way set associative cache.

L2 Cache capacity details
  Cache data memory:
    Word size = 32 bits
    Line size = 128 bits (4 words)
    No. of lines = 1024 per block (way)
    Capacity per way = 1024 lines x 4 words = 4 KWords (16KB)
    Total capacity for the 8-way set associative cache = 4 KWords x 8 = 32 KWords (128KB)
  Cache Tag memory: Tag size = 20 bits
  Cache Tag comparator: 20-bit comparator
  Input Output Buffer: 128 bits

All the operations in the L1 and L2 caches are guided by a cache controller. Any address request from the microprocessor is first directed to the cache controller. The cache controller then looks for the address in the L1 cache; if a cache hit occurs in L1, the data from the requested location is transferred to the microprocessor. If a cache miss occurs in L1, the cache controller looks for the same address in the L2 cache, and if a cache hit occurs in L2, the controller transfers the data to the microprocessor as well as to the L1 cache.

4. Simulation Results:

The following figures depict the simulation results of the higher-level entities: the L1 cache, the L2 cache and the cache controller. The results for discrete blocks such as the memory decoder and the cache tag comparator are shown in Appendix B.

L1 Cache:

[Figure 5: Simulation Results L1 Cache. Annotation: cache miss in L1.]

L2 Cache:

[Figure 6: Simulation Results L2 Cache. Annotation: cache hit in Way 5 of the L2 cache (for the same address).]

Cache Controller:

[Figure 7: Simulation Results Cache Controller. Three instants are marked 1, 2 and 3 on the waveform.]

  1: Cache hit in the L1 cache for the specified address.
  2: Cache miss in the L1 cache for a different address than the one in instance 1.
  3: Cache hit in the L2 cache for the same address as in instance 2.

Important Note: The above simulation results are obtained for specific locations chosen to test the functionality of the memory hierarchy. The data was stored at these addresses beforehand. In a real system, however, the address requests from the microprocessor depend on the program code, and the microprocessor generates addresses continuously and unpredictably according to the nature of the program. Thus, to test the performance of this cache, complete hardware that carries out the functionality of the microprocessor is needed.

5. Observations:

  1. The Level 1 and Level 2 cache memories give the correct results on the output signals cache_hit and cache_miss, according to whether the tag part of the address requested by the microprocessor matches the corresponding entry in the cache directory.
  2. The read/write pins do not carry any signals (logic levels), because it is up to the microprocessor to specify whether a read or a write operation is to occur.
  3. The cache controller delivers the appropriate signals to the cache memories L1 and L2 to match the tag part of the address requested by the microprocessor. If a cache hit occurs, it indicates to the microprocessor, by means of DAV_L1 or DAV_L2 (Data Valid), that the data on the data bus is the valid data from the requested address.
  4. A cache hit in the L1 or L2 cache directly outputs the data from the requested address to the data bus. This is not visible in the above simulation results, because many of the signals are activated in the internal architecture and do not appear at the higher level of the hierarchy.
  5. To observe the results mentioned in point 4 above, some of the blocks, such as the tag comparator and the output buffer, need to be simulated separately. Some of these results are shown in Appendix B.
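
Observation 5 suggests simulating blocks such as the tag comparator on their own. The following is a minimal self-checking testbench sketch for that purpose. The tag_comparator entity is repeated from Appendix A so that the example is self-contained; the testbench name tag_comparator_tb, the 10 ns settling delay and the stimulus values are assumptions made for illustration and are not part of the project files.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Tag comparator as listed in Appendix A (repeated to keep this self-contained).
entity tag_comparator is
  port (Addr_req : in  std_logic_vector(21 downto 0);
        Addr_tag : in  std_logic_vector(21 downto 0);
        tag_hit  : out std_logic;
        EN       : in  std_logic);
end entity tag_comparator;

architecture Behavioral of tag_comparator is
begin
  process (Addr_req, Addr_tag, EN)
  begin
    tag_hit <= '0';
    if EN = '1' then
      if Addr_req = Addr_tag then
        tag_hit <= '1';
      end if;
    end if;
  end process;
end architecture Behavioral;

library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical testbench; names and timing are illustrative only.
entity tag_comparator_tb is
end entity tag_comparator_tb;

architecture sim of tag_comparator_tb is
  signal req_s, tag_s : std_logic_vector(21 downto 0) := (others => '0');
  signal hit_s        : std_logic;
  signal en_s         : std_logic := '0';
begin
  dut : entity work.tag_comparator
    port map (Addr_req => req_s, Addr_tag => tag_s,
              tag_hit => hit_s, EN => en_s);

  stimulus : process
  begin
    -- Equal tags, comparator disabled: no hit expected.
    wait for 10 ns;
    assert hit_s = '0' report "hit while disabled" severity error;

    -- Equal tags, comparator enabled: hit expected.
    en_s <= '1';
    wait for 10 ns;
    assert hit_s = '1' report "no hit for equal tags" severity error;

    -- Tags differ in one bit: miss expected.
    tag_s(0) <= '1';
    wait for 10 ns;
    assert hit_s = '0' report "hit for different tags" severity error;

    report "tag_comparator_tb finished" severity note;
    wait;  -- stop the stimulus process
  end process;
end architecture sim;
```

The same pattern (instantiate one block, drive its inputs, assert on its outputs) could be applied to the data buffer or the memory decoder.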

Appendix A: VHDL Codes

The following are the VHDL codes for all the .vhd files in the project design.

Files related to L1 and L2 cache memories.

D Flip-Flop

library IEEE;
use IEEE.STD_LOGIC_1164.ALL;

-- Single-bit storage element with a tri-stated output.
entity dff is
    Port ( d   : in  std_logic;
           clk : in  std_logic;
           q   : out std_logic;
           en  : in  std_logic);
end dff;

architecture Behavioral of dff is
begin
    -- en is in the sensitivity list so that q is released to 'Z'
    -- as soon as the element is deselected.
    -- Note: after re-enabling, q stays at 'Z' until the next rising
    -- clock edge loads d.
    process(clk, en)
    begin
        if en = '1' then
            if clk'event and clk = '1' then
                q <= d;
            end if;
        else
            q <= 'Z';
        end if;
    end process;
end Behavioral;

Cache Data Line: 128 bits (4, 32-bit words)

library IEEE;
use IEEE.STD_LOGIC_1164.ALL;

entity reg_128_data is
    Port ( D   : in  std_logic_vector(127 downto 0);
           clk : in  std_logic;
           Q   : out std_logic_vector(127 downto 0);
           en  : in  std_logic);
end reg_128_data;

architecture Behavioral of reg_128_data is
    component dff
        port(d   : in  std_logic;
             q   : out std_logic;
             clk : in  std_logic;
             en  : in  std_logic);
    end component;
begin
    -- One flip-flop per bit of the cache line.
    gen: for i in 0 to 127 generate
        mem: dff port map (d => D(i), q => Q(i), clk => clk, en => en);
    end generate;
end Behavioral;

Data Block L1 Cache (256 Cache Lines)

library IEEE;
use IEEE.STD_LOGIC_1164.ALL;

-- This block generates the 256 x 128-bit cache data memory
entity data_mem is
    Port ( Din  : in  std_logic_vector(127 downto 0);
           Dout : out std_logic_vector(127 downto 0);
           EN   : in  std_logic_vector(255 downto 0);
           clk  : in  std_logic);
end data_mem;

architecture Behavioral of data_mem is
    component reg_128_data
        Port ( D   : in  std_logic_vector(127 downto 0);
               clk : in  std_logic;
               Q   : out std_logic_vector(127 downto 0);
               en  : in  std_logic);
    end component;
begin
    -- All 256 line registers share Din and Dout; only the line selected
    -- by EN(i) drives Dout, the others are tri-stated.
    GEN_array: for i in 0 to 255 generate
        REGS: reg_128_data port map (Din(127 downto 0), clk, Dout(127 downto 0), EN(i));
    end generate;
end Behavioral;

4:16 decoder

library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.NUMERIC_STD.ALL;

entity decoder4to16 is
    Port ( D : in  std_logic_vector(3 downto 0);
           E : out std_logic_vector(15 downto 0);
           F : in  std_logic);   -- decoder enable
end decoder4to16;

architecture Behavioral of decoder4to16 is
begin
    process(D, F)
    begin
        -- Default: all outputs low, so a disabled decoder selects nothing
        -- and no latch is inferred.
        E <= (others => '0');
        if F = '1' then
            -- One-hot decode; replaces the original 16-branch case statement.
            E(to_integer(unsigned(D))) <= '1';
        end if;
    end process;
end Behavioral;

Memory Decoder: 8:256 decoder

library IEEE;
use IEEE.STD_LOGIC_1164.ALL;

-- This block generates an 8:2^8 decoder from two stages of 4:16 decoders
entity mem_decoder is
    port(S      : in  std_logic_vector(7 downto 0);
         EN     : out std_logic_vector(255 downto 0);
         Mem_EN : in  std_logic);
end mem_decoder;

architecture Behavioral of mem_decoder is
    component decoder4to16 is
        Port ( D : in  std_logic_vector(3 downto 0);
               E : out std_logic_vector(15 downto 0);
               F : in  std_logic);
    end component;
    signal C1: std_logic_vector(15 downto 0);
begin
    -- Stage 1 decodes the upper four bits and enables one stage-2 decoder.
    stage1: decoder4to16 port map (S(7 downto 4), C1(15 downto 0), Mem_EN);
    -- Stage 2 decodes the lower four bits within the selected group of 16.
    struct: for i in 16 downto 1 generate
        stage2: decoder4to16 port map (S(3 downto 0), EN(((16*i)-1) downto ((16*i)-16)), C1(i-1));
    end generate;
end Behavioral;

Input Output Buffer:

library IEEE;
use IEEE.STD_LOGIC_1164.ALL;

entity inout_buf is
    Port ( A  : inout std_logic_vector(127 downto 0);
           B  : inout std_logic_vector(127 downto 0);
           WR : in    std_logic;
           RD : in    std_logic);
end inout_buf;

architecture Behavioral of inout_buf is
begin
    -- Write: drive B from A. Read: drive A from B. Otherwise high impedance.
    -- As in the original, WR takes priority when both are asserted.
    B <= A when WR = '1' else (others => 'Z');
    A <= B when (RD = '1' and WR /= '1') else (others => 'Z');
end Behavioral;

Address Fields:

library IEEE;
use IEEE.STD_LOGIC_1164.ALL;

entity address_field is
    port(addr : in  std_logic_vector(31 downto 0);
         word : out std_logic_vector(1 downto 0);
         set  : out std_logic_vector(7 downto 0);
         tag  : out std_logic_vector(21 downto 0);
         sep  : in  std_logic);
end address_field;

architecture Behavioral of address_field is
begin
    -- sep is in the sensitivity list so that the fields update when the
    -- separator is enabled, not only when the address changes.
    process(addr, sep)
    begin
        if sep = '1' then
            word <= addr(1 downto 0);
            set  <= addr(9 downto 2);
            tag  <= addr(31 downto 10);
        end if;
    end process;
end Behavioral;

CACHE L1:

library ieee;
use ieee.std_logic_1164.all;

entity memory_L1 is
    port(Add  : in    std_logic_vector(7 downto 0);
         Data : inout std_logic_vector(127 downto 0);
         RD,WR,CLK,EN : in std_logic);
end memory_L1;

architecture struct of memory_L1 is
    component data_mem is
        Port ( Din  : in  std_logic_vector(127 downto 0);
               Dout : out std_logic_vector(127 downto 0);
               EN   : in  std_logic_vector(255 downto 0);
               clk  : in  std_logic);
    end component;
    component inout_buf is
        Port ( A  : inout std_logic_vector(127 downto 0);
               B  : inout std_logic_vector(127 downto 0);
               WR : in    std_logic;
               RD : in    std_logic);
    end component;
    component mem_decoder is
        port(S      : in  std_logic_vector(7 downto 0);
             EN     : out std_logic_vector(255 downto 0);
             Mem_EN : in  std_logic);
    end component;
    signal int: std_logic_vector(255 downto 0);   -- one-hot line select
    signal dat: std_logic_vector(127 downto 0);   -- buffered write data
begin
    decoder: mem_decoder port map (Add, int, EN);
    buff:    inout_buf   port map (Data, dat, WR, RD);
    mem:     data_mem    port map (dat, Data, int, CLK);
end struct;

Tag Register: 22-bit

library IEEE;
use IEEE.STD_LOGIC_1164.ALL;

entity tag_data_L1 is
    Port ( D   : in  std_logic_vector(21 downto 0);
           clk : in  std_logic;
           Q   : out std_logic_vector(21 downto 0);
           en  : in  std_logic);
end tag_data_L1;

architecture Behavioral of tag_data_L1 is
    component dff
        port(d   : in  std_logic;
             q   : out std_logic;
             clk : in  std_logic;
             en  : in  std_logic);
    end component;
begin
    -- One flip-flop per tag bit.
    gen: for i in 0 to 21 generate
        mem: dff port map (d => D(i), q => Q(i), clk => clk, en => en);
    end generate;
end Behavioral;

Tag Memory: 256 x 22 bit

library IEEE;
use IEEE.STD_LOGIC_1164.ALL;

-- This block generates the 256 x 22-bit cache tag memory
entity tag_mem is
    Port ( Din  : in  std_logic_vector(21 downto 0);
           Dout : out std_logic_vector(21 downto 0);
           EN   : in  std_logic_vector(255 downto 0);
           clk  : in  std_logic);
end tag_mem;

architecture Behavioral of tag_mem is
    component tag_data_L1
        Port ( D   : in  std_logic_vector(21 downto 0);
               clk : in  std_logic;
               Q   : out std_logic_vector(21 downto 0);
               en  : in  std_logic);
    end component;
begin
    GEN_array: for i in 0 to 255 generate
        REGS: tag_data_L1 port map (Din(21 downto 0), clk, Dout(21 downto 0), EN(i));
    end generate;
end Behavioral;

Cache Tag Buffer:

library IEEE;
use IEEE.STD_LOGIC_1164.ALL;

entity inout_buf_tag is
    Port ( A  : inout std_logic_vector(21 downto 0);
           B  : inout std_logic_vector(21 downto 0);
           WR : in    std_logic;
           RD : in    std_logic);
end inout_buf_tag;

architecture Behavioral of inout_buf_tag is
begin
    -- (others => 'Z') replaces the original 23-character "ZZ...Z" literal,
    -- which did not match the 22-bit port width.
    -- As in the original, WR takes priority when both are asserted.
    B <= A when WR = '1' else (others => 'Z');
    A <= B when (RD = '1' and WR /= '1') else (others => 'Z');
end Behavioral;

Cache Tag Memory:

library ieee;
use ieee.std_logic_1164.all;

entity cache_tag_data_L1 is
    port(Add : in    std_logic_vector(7 downto 0);
         tag : inout std_logic_vector(21 downto 0);
         RD,WR,CLK,EN : in std_logic);
end cache_tag_data_L1;

architecture struct of cache_tag_data_L1 is
    component tag_mem is
        Port ( Din  : in  std_logic_vector(21 downto 0);
               Dout : out std_logic_vector(21 downto 0);
               EN   : in  std_logic_vector(255 downto 0);
               clk  : in  std_logic);
    end component;
    component inout_buf_tag is
        Port ( A  : inout std_logic_vector(21 downto 0);
               B  : inout std_logic_vector(21 downto 0);
               WR : in    std_logic;
               RD : in    std_logic);
    end component;
    component mem_decoder is
        port(S      : in  std_logic_vector(7 downto 0);
             EN     : out std_logic_vector(255 downto 0);
             Mem_EN : in  std_logic);
    end component;
    signal int: std_logic_vector(255 downto 0);
    signal dat: std_logic_vector(21 downto 0);
begin
    decoder: mem_decoder   port map (Add, int, EN);
    buff:    inout_buf_tag port map (tag, dat, WR, RD);
    mem:     tag_mem       port map (dat, tag, int, CLK);
end struct;

Cache Tag Comparator:

library IEEE;
use IEEE.STD_LOGIC_1164.ALL;

-- This block compares the requested tag with the stored tag and produces a cache hit/miss signal
entity tag_comparator is
    port( Addr_req : in  std_logic_vector(21 downto 0);
          Addr_tag : in  std_logic_vector(21 downto 0);
          tag_hit  : out std_logic;
          EN       : in  std_logic);
end tag_comparator;

architecture Behavioral of tag_comparator is
begin
    process(Addr_req, Addr_tag, EN)
    begin
        tag_hit <= '0';   -- default: miss
        if EN = '1' then
            if Addr_req = Addr_tag then
                tag_hit <= '1';
            end if;
        end if;
    end process;
end Behavioral;

L1 DATA BUFFER:

library IEEE;
use IEEE.STD_LOGIC_1164.ALL;

-- 128-bit data buffer for the L1 data cache: selects one 32-bit word of the line
entity data_buff_L1 is
    Port ( Din  : in  std_logic_vector(127 downto 0);
           Dout : out std_logic_vector(31 downto 0);
           EN   : in  std_logic_vector(1 downto 0);   -- word select (A0-A1)
           clk  : in  std_logic);
end data_buff_L1;

architecture behaviour of data_buff_L1 is
begin
    -- Purely clocked process, so only clk is needed in the sensitivity list.
    process(clk)
    begin
        if clk'event and clk = '1' then
            case EN is
                when "00"   => Dout <= Din(127 downto 96);
                when "01"   => Dout <= Din(95 downto 64);
                when "10"   => Dout <= Din(63 downto 32);
                when others => Dout <= Din(31 downto 0);
            end case;
        end if;
    end process;
end behaviour;

L1 Data Cache: Way 0

library ieee;
use ieee.std_logic_1164.all;

entity L1_data_way0 is
    port( Add       : in  std_logic_vector(31 downto 0);
          data_out  : out std_logic_vector(31 downto 0);
          cache_hit : out std_logic;
          clk,EN,RD,WR : in std_logic);
end L1_data_way0;

architecture structure of L1_data_way0 is
    signal F1: std_logic_vector(1 downto 0);     -- word field
    signal F2: std_logic_vector(7 downto 0);     -- set field
    signal F3: std_logic_vector(21 downto 0);    -- tag field
    signal data: std_logic_vector(127 downto 0);
    signal tag: std_logic_vector(21 downto 0);
    signal select_add: std_logic_vector(255 downto 0);
    component memory_L1 is
        port(Add  : in    std_logic_vector(7 downto 0);
             Data : inout std_logic_vector(127 downto 0);
             RD,WR,CLK,EN : in std_logic);
    end component;
    component cache_tag_data_L1 is
        port(Add : in    std_logic_vector(7 downto 0);
             tag : inout std_logic_vector(21 downto 0);
             RD,WR,CLK,EN : in std_logic);
    end component;
    component address_field is
        port(addr : in  std_logic_vector(31 downto 0);
             word : out std_logic_vector(1 downto 0);
             set  : out std_logic_vector(7 downto 0);
             tag  : out std_logic_vector(21 downto 0);
             sep  : in  std_logic);
    end component;
    -- The remaining declarations and the architecture body of L1_data_way0
    -- are not included in the report.

L1 4-way set associative cache: Main hierarchy for L1 cache memory.

library ieee;
use ieee.std_logic_1164.all;

entity L1_cache is
    port( Add       : in    std_logic_vector(31 downto 0);
          data_out  : inout std_logic_vector(31 downto 0);
          cache_hit : out   std_logic_vector(3 downto 0);
          clk,RD,WR : in    std_logic;
          EN        : in    std_logic_vector(3 downto 0));
end L1_cache;

architecture structure of L1_cache is
    component L1_data_way0 is
        port( Add       : in    std_logic_vector(31 downto 0);
              data_out  : inout std_logic_vector(31 downto 0);
              cache_hit : out   std_logic;
              clk,EN,RD,WR : in std_logic);
    end component;
begin
    -- Four identical ways share the address and data buses; each way has
    -- its own enable and reports its own hit.
    sets: for i in 0 to 3 generate
        struct: L1_data_way0 port map (Add, data_out, cache_hit(i), clk, EN(i), RD, WR);
    end generate;
end structure;

The L2 cache memory uses all of the .vhd files listed above, with the register sizes and capacities increased accordingly.
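
One way to share a file between L1 and L2 without editing widths by hand is to parameterise it with generics. The sketch below shows this for the address field separator only. It is a hypothetical variant, not the project's code: the entity name address_field_g and the generic names WORD_BITS and SET_BITS are made up for illustration, and the sep enable of the original is left out to keep the example purely combinational.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical generic variant of address_field. The generic names are
-- illustrative and do not appear in the project files.
entity address_field_g is
  generic (
    WORD_BITS : natural := 2;  -- word select within a 4-word line
    SET_BITS  : natural := 8   -- 8 for L1 (256 sets), 10 for L2 (1024 sets)
  );
  port (
    addr : in  std_logic_vector(31 downto 0);
    word : out std_logic_vector(WORD_BITS - 1 downto 0);
    set  : out std_logic_vector(SET_BITS - 1 downto 0);
    tag  : out std_logic_vector(31 - WORD_BITS - SET_BITS downto 0)
  );
end entity address_field_g;

architecture rtl of address_field_g is
begin
  -- Lowest bits select the word, the next SET_BITS select the set,
  -- and everything above that is the tag.
  word <= addr(WORD_BITS - 1 downto 0);
  set  <= addr(WORD_BITS + SET_BITS - 1 downto WORD_BITS);
  tag  <= addr(31 downto WORD_BITS + SET_BITS);
end architecture rtl;
```

With the defaults this reproduces the L1 split (22-bit tag, 8-bit set, 2-bit word); instantiating it with generic map (SET_BITS => 10) gives the L2 split (20-bit tag, 10-bit set, 2-bit word).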
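
Cache Controller VHDL code:

library ieee;
use ieee.std_logic_1164.all;
use ieee.std_logic_arith.all;
use IEEE.STD_LOGIC_UNSIGNED.ALL;

entity cache_controller is
    Port (add_req      : in  std_logic_vector(31 downto 0);
          C_busy       : out std_logic;
          C_reset,clk,EN : in std_logic;
          L1_miss,L2_miss,RD : out std_logic;
          Add_L1       : out std_logic_vector(31 downto 0);
          DAV_L1       : out std_logic;
          cache_hit_l1 : in  std_logic_vector(3 downto 0);
          Add_L2       : out std_logic_vector(31 downto 0);
          DAV_L2       : out std_logic;
          cache_hit_l2 : in  std_logic_vector(7 downto 0));
end cache_controller;

architecture behaviour of cache_controller is
    -- Counter incremented on every L2 miss; initialised so that the
    -- increment is defined in simulation.
    signal C1: std_logic_vector(3 downto 0) := (others => '0');
begin
    -- C_busy and C_reset are declared in the entity but are not used in this listing.
    -- The process is clocked, so clk (not add_req and EN) is in the sensitivity list.
    process(clk)
    begin
        if clk'event and clk='1' then
            if EN = '1' then
                Add_L1 <= add_req;
                -- Data-valid flags are cleared by default and set only on a hit,
                -- so that they do not stay asserted from an earlier request.
                DAV_L1 <= '0';
                DAV_L2 <= '0';
                if (cache_hit_l1(0)='1' or cache_hit_l1(1)='1' or
                    cache_hit_l1(2)='1' or cache_hit_l1(3)='1') then
                    -- Hit in one of the four L1 ways.
                    L1_miss <= '0';
                    DAV_L1  <= '1';
                else
                    -- Miss in L1: forward the request to L2.
                    L1_miss <= '1';
                    Add_L2  <= add_req;
                    if (cache_hit_l2(0)='1' or cache_hit_l2(1)='1' or
                        cache_hit_l2(2)='1' or cache_hit_l2(3)='1' or
                        cache_hit_l2(4)='1' or cache_hit_l2(5)='1' or
                        cache_hit_l2(6)='1' or cache_hit_l2(7)='1') then
                        -- Hit in one of the eight L2 ways.
                        L2_miss <= '0';
                        DAV_L2  <= '1';
                    else
                        -- Miss in L2 as well. The nested clock test of the
                        -- original is redundant inside a clocked branch and
                        -- has been removed; its assignments are kept.
                        L2_miss <= '1';
                        C1      <= C1 + 1;
                        Add_L2  <= add_req + 1;
                        RD      <= '1';   -- original: RD <= clk, which is '1' at a rising edge
                    end if;
                end if;
            end if;
        end if;
    end process;
end behaviour;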

Appendix B: Simulation Results of Discrete Blocks

  • Data Buffer L1 Cache [simulation waveform: 32-bit word output selected using the A0-A1 address lines]
  • Cache Tag Comparator [simulation waveforms: cache hit and cache miss]
  • Address Field Separator [simulation waveform]
  • Memory Decoder [simulation waveform]

Appendix C: Synthesis Results

L1 Cache Memory

Signal Description
  Add(31:0): 32-bit address from microprocessor
  Clk: Clock input
  EN: Memory enable/select signal
  RD, WR: Read, Write signals
  Cache_hit: Cache hit/miss signal
  Data_out(31:0): Bi-directional data bus

[Figure: L1 Cache Memory Block generated using Synthesis Tool]

Internal Architecture: Includes Blocks:
  • Address field separator
  • Cache data memory
  • Cache tag memory
  • Cache tag comparator
  • Input/output buffer

[Figure: Internal Architecture of L1 Cache memory Block]

The L2 cache memory is identical to the L1 cache memory; the only differences are the number of sets per block and the total number of blocks.

Cache Controller

The figure depicts all the control signals and buses of the cache controller of the system.

  Add_req: Address request from the microprocessor
  Cache_hit_l1(3:0): Cache hit from L1 cache memory block
  Cache_hit_l2(7:0): Cache hit from L2 cache memory block
  Add_L1(31:0): Address bus to L1 cache
  Add_L2(31:0): Address bus to L2 cache
  C_busy: Cache controller busy (status signal)
  Clk: Clock input
  DAV_L1/L2: Data valid on data bus from respective cache memory
  C_reset: Reset cache controller
  L1/L2_miss: Cache miss from L1/L2 cache
  RD: Read Cache controller status

The figure below shows the internal architecture of the cache controller synthesized using the Xilinx ISE 9.1i platform.

[Figure: Cache Controller internal Architecture]

  • [Figure: A view of the Xilinx ISE 9.1i Synthesis Tool window]
  • [Figure: A view of the ModelSim SE Plus 6.2c Simulation Tool window]

References

  • John P. Hayes, Computer Architecture and Organization (McGraw-Hill)
  • Stephen Brown and Zvonko Vranesic, Fundamentals of Digital Logic with VHDL Design (Tata McGraw-Hill)
  • Toshiba Corporation and Toshiba Microcomputer Eng. Corp., A Circuit Design of 32KByte Integrated Cache Memory
  • http://www.ece.cmu.edu/~ece741
  • http://en.kioskea.net/pc/memoire.php3
  • John L. Hennessy and David A. Patterson, Computer Architecture: A Quantitative Approach, Fourth Edition
  • Kai Hwang, Advanced Computer Architecture: Parallelism, Scalability, Programmability
  • http://web.njit.edu/~rlopes/cache-performance.pdf
  • Lecture notes on memory hierarchy design by Prof. S.G. Ziavras, including http://web.njit.edu/~ziavras/ECE690-NEW/SYLLABUS-NOTES/CH-5-APP-C/AppC-ch-5-m1-Ziavras.pdf
  • http://cs.uccs.edu/~cs520/S99ch5.PDF
  • Betty Prince, High Performance Memories
