Memory Access Scheduling
Lakshmi Yasaswi Kamireddy
Koudinya Nidumolu
Venkata Harsha Akkipedi
List of Tables
Table 1: Sequence of issued requests
Table 2: Memory system configuration
Table 3: Comparison of key metrics on baseline and implemented schedulers

List of Figures
Figure 1: Internal organization of modern DRAMs
Figure 2: Total Execution Time-RF-Set1
Figure 3: Total Execution Time-RF-Set2
Figure 4: Total Execution Time-BF-Set1
Figure 5: Total Execution Time-BF-Set2
Figure 6: EDP-RF Set1
Figure 7: EDP-RF Set2
Figure 8: EDP-BF Set1
Figure 9: EDP-BF Set2
Figure 10: Max Slowdown-RF-Set1
Figure 11: Max Slowdown-RF-Set2
Figure 12: Max Slowdown-BF-Set1
Figure 13: Max Slowdown-BF-Set2

Contents
1 Introduction
    DRAM Configuration
    Memory Access Scheduling
2 Implementation
    Bank First Memory Scheduling
        Scheduler Implementation
        Algorithm
    Row First Memory Scheduling
        Scheduler Implementation
        Algorithm
3 Simulator Description
    DRAM Commands
4 Results
    Execution Time
    Energy Delay Product
    Maximum Slowdown
5 Conclusions
6 References
INTRODUCTION
DRAM Configuration:
DRAM (Dynamic Random-Access Memory), the most common technology for building main memory in
modern computer systems, has been a major performance bottleneck for decades. Throughout many
generations of DRAM, from DDR1 to DDR3, the internal architecture and performance-related
characteristics of DRAM have changed little, and most modern DRAM systems use dual in-line memory
modules (DIMMs). A basic DRAM system consists of one or more channels, and each channel has one or
more memory modules. A modern DDR3 channel typically supports 1-2 DIMMs; each DIMM typically
consists of 1-4 ranks; each rank can be partitioned into multiple (4-16) banks. Each bank operates
independently of the other banks and contains an array of memory cells that are accessed an entire row
at a time. When a row of this memory array is accessed (row activation), the entire row is transferred
into the bank's row buffer. The row buffer serves as a cache that reduces the latency of subsequent
accesses to that row. While a row is active in the row buffer, any number of reads or writes (column
accesses) may be performed, typically with a throughput of one per cycle. After the available column
accesses have completed, the cached row must be written back to the memory array by an explicit
operation (bank pre-charge), which prepares the bank for a subsequent row activation.
Figure 1: Internal organization of modern DRAMs.
Modern DRAMs impose several timing and resource constraints: a bank cannot be accessed during its
pre-charge/activate latency, a turnaround cycle is required on the data pins when switching between
read and write column accesses, and a single set of address lines is shared by all DRAM operations
(bank pre-charge, row activation, and column access). The amount of bank parallelism that is exploited
and the number of column accesses made per row access dictate the sustainable memory bandwidth of such
a DRAM. A memory access scheduler must generate a schedule that conforms to the timing and resource
constraints of these modern DRAMs. Each DRAM operation makes different demands on the three DRAM
resources: the internal banks, the single set of address lines, and the single set of data lines. The
scheduler must ensure that the required resources are available for each DRAM operation it schedules.
Each DRAM bank has two stable states: IDLE and ACTIVE. In the IDLE state, the DRAM is pre-charged
and ready for a row access. It will remain in this state until a row activate operation is issued to the bank.
To issue a row activation, the address lines must be used to select the bank and the row being activated.
Row activation takes three cycles in the timing model used here, during which no other operations may be issued to that bank; during
that time, however, operations may be issued to other banks of the DRAM. Once the DRAM’s row
activation latency has passed, the bank enters the ACTIVE state, during which the contents of the selected
row are held in the bank’s row buffer. Any number of pipelined column accesses may be performed while
the bank is in the ACTIVE state. To issue either a read or write column access, the address lines are
required to indicate the bank and the column of the active row in that bank. A write column access requires
the data to be transferred to the DRAM at the time of issue, whereas a read column access returns the
requested data three cycles later.
The bank will remain in the ACTIVE state until a pre-charge operation is issued to return it to the IDLE
state. The pre-charge operation requires the use of the address lines to indicate the bank which is to be
pre-charged. Like row activation, the pre-charge operation utilizes the bank resource for 3 cycles, during
which no new operations may be issued to that bank. Again, operations may be issued to other banks
during this time. After the DRAM’s pre-charge latency, the bank is returned to the IDLE state and is ready
for a new row activation operation. DRAMs typically also support column accesses with automatic pre-
charge, which implicitly pre-charges the DRAM bank as soon as possible after the column access. The
shared address and data resources serialize access to the different DRAM banks. While the state machines
for the individual banks are independent, only a single bank can perform a transition requiring a particular
shared resource each cycle. For many DRAMs, the bank, row, and column addresses share a single set of
lines. Hence, the scheduler must arbitrate between pre-charge, row, and column operations that all need
to use this single resource.
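The two-state bank machine described above can be sketched as a small model. This is an illustrative sketch, not part of any real simulator; the three-cycle activate and pre-charge latencies are the assumed values from the text, and the class and method names are made up for this example.

```python
# Minimal sketch of the IDLE/ACTIVE DRAM bank state machine described above.
# Latencies (3 cycles for activate and pre-charge) follow the text.

IDLE, ACTIVE = "IDLE", "ACTIVE"

class Bank:
    def __init__(self):
        self.state = IDLE
        self.open_row = None
        self.busy_until = 0   # cycle at which the current transition completes

    def activate(self, row, now, latency=3):
        # Only an IDLE (pre-charged) bank may activate a row.
        assert self.state == IDLE and now >= self.busy_until
        self.state, self.open_row = ACTIVE, row
        self.busy_until = now + latency

    def column_access(self, row, now):
        # Column accesses pipeline at one per cycle once the row is open.
        assert self.state == ACTIVE and row == self.open_row and now >= self.busy_until
        return now + 1   # cycle at which this access completes

    def precharge(self, now, latency=3):
        # Write the cached row back and return to IDLE.
        assert self.state == ACTIVE and now >= self.busy_until
        self.state, self.open_row = IDLE, None
        self.busy_until = now + latency

bank = Bank()
bank.activate(row=7, now=0)          # bank busy until cycle 3
done = bank.column_access(7, now=3)  # first column access completes at cycle 4
bank.precharge(now=done)             # return to IDLE for the next activation
print(bank.state)                    # -> IDLE
```

While one bank is serving its activate or pre-charge latency, a scheduler is free to issue operations to other `Bank` instances, which is exactly the bank parallelism the schedulers below try to exploit.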
Some DRAMs provide separate row and column address lines (each with its own associated bank
address) so that column and row accesses can be initiated simultaneously. To approach the peak data rate
with serialized resources, there must be enough column accesses to each row to hide the pre-
charge/activate latencies of other banks. Whether or not this can be achieved is dependent on the data
reference patterns and the order in which the DRAM is accessed to satisfy those references. The need to
hide the pre-charge/activate latency of the banks in order to sustain high bandwidth cannot be eliminated
by any DRAM architecture without reducing the pre-charge/activate latency, which would likely come at
the cost of decreased bandwidth or capacity, both of which are undesirable. So several scheduling schemes
are proposed to achieve the reduction in latency.
Memory Access Scheduling:
In a memory controller, the execution of a memory access instruction must adhere to the rules and timing
constraints of the hardware to access data in a modern DRAM. As shown in Figure 1, modern DRAMs
are three-dimensional memory devices with dimensions of bank, row and column. Thus, a location in the
DRAM is identified by an address that consists of bank, row and column fields. The steps of accessing a
location include a pre-charge, a row access, and then a column access. Due to the DRAM structure and its
hardware implementation, sequential accesses to different rows within one bank have high latency,
whereas accesses to different banks or to different words within a single row have low latency [9]. Memory
access scheduling can effectively reduce the average memory access latency and improve memory
bandwidth utilization by reducing cross-row data access. For example, prioritizing memory requests to
the same bank and the same row can improve performance.
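To illustrate the bank/row/column address split and the latency asymmetry it creates, the following sketch classifies a stream of addresses as row hits or row misses. The field widths (4 banks, 12-bit row, 7-bit column offset) are made up for this example; real address mappings depend on the memory controller.

```python
# Illustrative address decomposition into (bank, row, column) and
# row-hit/row-miss classification. Field widths are assumptions.

COL_BITS, BANK_BITS, ROW_BITS = 7, 2, 12

def decode(addr):
    col  = addr & ((1 << COL_BITS) - 1)
    bank = (addr >> COL_BITS) & ((1 << BANK_BITS) - 1)
    row  = (addr >> (COL_BITS + BANK_BITS)) & ((1 << ROW_BITS) - 1)
    return bank, row, col

def classify(addresses):
    """Label each access a 'hit' if its bank already has that row open."""
    open_rows = {}                      # bank -> currently open row
    labels = []
    for addr in addresses:
        bank, row, _ = decode(addr)
        labels.append("hit" if open_rows.get(bank) == row else "miss")
        open_rows[bank] = row           # this row is now the open one
    return labels

# Two accesses to the same row of bank 0, then a different row in bank 0.
stream = [0x00010, 0x00020, 0x40010]
print(classify(stream))   # -> ['miss', 'hit', 'miss']
```

A scheduler that reorders requests to turn misses into hits in this classification is exactly what the row-first policy below aims for.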
IMPLEMENTATION:
In this section we describe the two memory scheduling algorithms we implemented, both of which aim to
exploit the locality of DRAM memory systems: bank-first memory scheduling and row-first memory
scheduling.
Bank First Memory Scheduling:
The bank-first policy [9] arranges all memory requests by bank and schedules them in a round-robin
manner according to the bank identifier. This policy is beneficial because requests to different banks
can be carried out simultaneously. For the request sequence shown in Table 1, the sequence of issued
requests under the bank-first policy will be A-C-D-F-E-B-G-H-J-I.
Table 1: Sequence of Issued requests
Scheduler Implementation:
1. The scheduler gets a random sequence of memory requests from the memory controller.
2. The scheduler checks both the read queue and the write queue (for draining conditions).
3. Write drain is initiated if the write queue occupancy has reached the high watermark (HI_WM) or if
there are no pending read requests.
4. If not in write-drain mode, read-drain mode is initiated.
5. In both draining modes, the write or read queue is scheduled according to the bank-first policy.
6. In either draining mode, requests are issued in a round-robin fashion based on the bank ID:
a. First, look through all the requests (already arranged in order of arrival) in the queue
selected by the draining mode.
b. If the next request is from the same bank as the last one issued, do not schedule it;
instead schedule a request from a different bank.
c. The bank to be scheduled next is selected in round-robin order.
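The round-robin, bank-first selection in steps 5-6 can be sketched as follows. The request tuples and the `pick_bank_first` function are illustrative; USIMM's real queue entries carry more state (arrival time, command readiness, and so on).

```python
# Sketch of bank-first request selection: scan banks round-robin starting
# at next_bank and issue the oldest request to the first bank that has one.

NUM_BANKS = 4

def pick_bank_first(queue, next_bank):
    """Return (request, new_next_bank). queue is kept in arrival order."""
    for i in range(NUM_BANKS):
        bank = (next_bank + i) % NUM_BANKS
        for req in queue:                  # oldest request to this bank
            if req["bank"] == bank:
                queue.remove(req)
                return req, (bank + 1) % NUM_BANKS
    return None, next_bank                 # queue empty

queue = [{"id": "A", "bank": 0}, {"id": "B", "bank": 0},
         {"id": "C", "bank": 1}, {"id": "D", "bank": 2}]
order, nb = [], 0
while queue:
    req, nb = pick_bank_first(queue, nb)
    order.append(req["id"])
print(order)   # -> ['A', 'C', 'D', 'B']
```

Note how the two back-to-back requests to bank 0 (A and B) get separated so that C and D, which target other banks, can proceed in between, which is the point of the policy.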
Algorithm:
INPUT: Random sequence of memory access requests from m cores
OUTPUT: Scheduled sequence of memory requests to the memory controller
BEGIN
    B = 0
    if write_drain == true:
        foreach request in write_queue (in arrival order):
            if request.bank == B and request is issuable:
                issue request
                B = (B + 1) mod num_banks
            else:
                issue precharge
    else:
        foreach request in read_queue (in arrival order):
            if request.bank == B and request is issuable:
                issue request
                B = (B + 1) mod num_banks
            else:
                issue precharge
END
Row First Memory Scheduling:
The row-first policy gives the highest priority to accesses to the same row of the same bank [10]. It
essentially enhances the bank-first policy by grouping accesses to the same bank and the same row
together, which reduces row misses. For the request sequence shown in Table 1, the sequence of issued
requests under the row-first policy will be A-B-J-C-D-G-I-F-H-E.
Scheduler Implementation:
This algorithm aims to maximize row hits and thus increase the overall row-buffer hit rate across all
requests. Since row-buffer hits have much shorter latency and consume less power than row-buffer
misses, the scheduler exploits row-buffer hits as much as possible.
1. The scheduler gets a random sequence of memory requests from the memory controller.
2. The scheduler checks both the read queue and the write queue (for draining conditions).
3. Write drain is initiated if the write queue occupancy has reached the high watermark (HI_WM) or if
there are no pending read requests.
4. If not in write-drain mode, read-drain mode is initiated.
5. In both draining modes, the write or read queue is scheduled according to the row-first policy.
6. In either draining mode, requests are grouped by bank and row:
a. First, look through all the requests (already arranged in order of arrival) in the queue
selected by the draining mode.
b. If the next request is from the same bank and the same row as the current group, schedule it
for execution.
c. If the next request is not from the same bank and the same row, set a flag bit and record
the new bank and row.
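The row-first grouping in the steps above can be sketched like this: requests that hit the currently tracked (bank, row) pair are drained first, and when none remain the oldest request opens a new pair (the flag case). The request structures are illustrative, not USIMM's.

```python
# Sketch of row-first request selection: keep issuing requests that hit
# the currently open (bank, row) pair; otherwise open the oldest
# request's pair and continue.

def schedule_row_first(queue):
    order = []
    current = None                       # currently open (bank, row) pair
    while queue:
        hit = next((r for r in queue
                    if (r["bank"], r["row"]) == current), None)
        if hit is None:                  # flag case: open a new pair
            hit = queue[0]               # oldest request by arrival order
            current = (hit["bank"], hit["row"])
        queue.remove(hit)
        order.append(hit["id"])
    return order

queue = [{"id": "A", "bank": 0, "row": 5},
         {"id": "B", "bank": 1, "row": 2},
         {"id": "C", "bank": 0, "row": 5},
         {"id": "D", "bank": 1, "row": 2}]
print(schedule_row_first(queue))   # -> ['A', 'C', 'B', 'D']
```

Here C is pulled forward to ride on A's open row before the scheduler moves on to bank 1, trading some arrival-order fairness for row-buffer hits.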
Algorithm:
INPUT: Random sequence of memory access requests from m cores
OUTPUT: Scheduled sequence of memory requests to the memory controller
BEGIN
    B = 0; R = 0; flag = true
    if write_drain == true:
        foreach request in write_queue (in arrival order):
            if flag == true:                       // open a new (bank, row) group
                B = request.bank; R = request.row
                flag = false
                issue request
            else if request.bank == B and request.row == R and request is issuable:
                issue request
            else:
                flag = true                        // next request starts a new group
    else:
        foreach request in read_queue (in arrival order):
            if flag == true:
                B = request.bank; R = request.row
                flag = false
                issue request
            else if request.bank == B and request.row == R and request is issuable:
                issue request
            else:
                flag = true
END
SIMULATOR DESCRIPTION:
This project uses the USIMM simulation infrastructure, a trace-based memory system simulator, to build
and evaluate the memory schedulers. Table 2 gives the system configuration used in our evaluation.
Table 2: Memory System Configuration
DRAM Commands
The memory commands can be partitioned into two groups: commands that advance the execution of a
pending memory request (read or write), and commands that manage general DRAM state.
Four commands advance the execution of a memory request:
 PRE: Precharge the bitlines of a bank so that a new row can be read out.
 ACT: Bring the contents of a DRAM row into the bank's row buffer.
 COL-RD: Return a cache line from the row buffer to the processor.
 COL-WR: Write a cache line from the processor into the row buffer.
DRAM state management commands include five memory commands, as follows:
 PWR-DN-FAST: Puts a rank into the low-power-mode with quick exit times.
 PWR-DN-SLOW: Puts a rank into the precharge-powerdown (slow) mode with longer time to
transition into the activate state.
 PWR-UP: Brings a rank out of low-power mode.
 Refresh: Forces a refresh to multiple rows in all banks in a rank.
 PRE-ALL: Forces a precharge to all banks in a rank.
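For reference, the nine commands above can be collected into a single enumeration, grouped as in the text. The names follow this report's spelling, not necessarily USIMM's source; a real controller model would attach timing and power parameters to each command.

```python
# The nine DRAM commands from the text, grouped as described.

from enum import Enum

class DramCmd(Enum):
    PRE = "precharge a bank"
    ACT = "activate a row into the row buffer"
    COL_RD = "column read to the processor"
    COL_WR = "column write from the processor"
    PWR_DN_FAST = "low-power mode with quick exit"
    PWR_DN_SLOW = "precharge-powerdown (slow) mode"
    PWR_UP = "exit low-power mode"
    REFRESH = "refresh rows in all banks of a rank"
    PRE_ALL = "precharge all banks in a rank"

# Commands that advance a pending read/write vs. state-management commands.
REQUEST_ADVANCING = {DramCmd.PRE, DramCmd.ACT, DramCmd.COL_RD, DramCmd.COL_WR}
STATE_MANAGEMENT = set(DramCmd) - REQUEST_ADVANCING

print(len(REQUEST_ADVANCING), len(STATE_MANAGEMENT))   # -> 4 5
```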
When the memory system is not busy, the PWR-DN-FAST and PWR-DN-SLOW commands can put memory ranks into
low-power mode to save power; the PWR-UP command is needed to bring a rank back out of low-power mode.
This project uses workloads from the USIMM simulator with one-channel and four-channel configurations.
The workloads and their traces are divided into two sets, Set 1 and Set 2; evaluation metrics are
compared against FCFS on Set 1, and detailed results are described in the results section of this
report.
RESULTS:
Using USIMM traces, we simulated the Bank First (BF) and Row First (RF) memory scheduling algorithms
and compared them with the baseline FCFS algorithm. Three metrics are used in the evaluation: the sum
of thread execution times, the maximum thread slowdown, and the energy-delay product (EDP). We used
several mixed workloads: 1-, 2-, and 4-thread workloads in the 1-channel model and 1-, 2-, 4-, 8-, and
16-thread workloads in the 4-channel model. All results are tabulated in Table 3.
Table 3: Comparison of key metrics on baseline and implemented schedulers
Workload                                         Config   Sum of exec. times (10M cycles)   Max slowdown           EDP (J·s)
                                                          FCFS    BF      RF                FCFS   BF     RF       FCFS    BF      RF
MT-canneal                                       1 Ch     418     404     403               NA     NA     NA       4.23    3.98    3.97
MT-canneal                                       4 Ch     179     167     167               NA     NA     NA       1.78    1.56    1.56
bl-bl-fr-fr                                      1 Ch     149     147     147               1.20   1.18   1.18     0.50    0.48    0.48
bl-bl-fr-fr                                      4 Ch     80      75.9    75.7              1.11   1.05   1.05     0.36    0.32    0.32
c1-c1                                            1 Ch     83      82.3    82.6              1.12   1.10   1.10     0.41    0.40    0.40
c1-c1                                            4 Ch     51      46.7    46.4              1.05   0.95   0.94     0.44    0.36    0.36
c1-c1-c2-c2                                      1 Ch     242     235     235               1.48   1.45   1.45     1.52    1.43    1.44
c1-c1-c2-c2                                      4 Ch     127     117     117               1.18   1.10   1.10     1.00    0.85    0.84
c2                                               1 Ch     44      43      43.1              NA     NA     NA       0.38    0.36    0.36
c2                                               4 Ch     30      27      27                NA     NA     NA       0.50    0.39    0.39
fa-fa-fe-fe                                      1 Ch     228     224     224               1.52   1.47   1.47     1.19    1.15    1.14
fa-fa-fe-fe                                      4 Ch     106     99.5    99.2              1.22   1.14   1.14     0.64    0.56    0.55
fl-fl-sw-sw-c2-c2-fe-fe                          4 Ch     295     279     279               1.40   1.31   1.31     2.14    1.89    1.88
fl-fl-sw-sw-c2-c2-fe-fe-bl-bl-fr-fr-c1-c1-st-st  4 Ch     651     620     620               1.90   1.80   1.80     5.31    4.78    4.76
fl-sw-c2-c2                                      1 Ch     249     243     243               1.48   1.43   1.42     1.52    1.44    1.43
fl-sw-c2-c2                                      4 Ch     130     121     120               1.13   1.05   1.05     0.99    0.83    0.82
st-st-st-st                                      1 Ch     162     160     158               1.28   1.25   1.24     0.58    0.56    0.56
st-st-st-st                                      4 Ch     86      81.5    80                1.14   1.08   1.08     0.39    0.35    0.34
Overall (PFP)                                             3312    3173    3167              1.90   1.80   1.80     23.88   21.69   21.60
Execution time:
Figures 2 to 5 show the total execution time of the implemented BF and RF algorithms. BF and RF
outperform FCFS by 0.843% to 8.431% and by 0.4812% to 9.019%, respectively, as is evident from Table 3.
The overall execution time is reduced by 4.14% for the BF scheduler and by 4.32% for the RF scheduler.
For the multi-core cases, the total execution time is the sum of the execution times of all cores.
Figure 2: Total Execution Time-RF-Set1 Figure 3: Total Execution Time-RF-Set2
Figure 4: Total Execution Time-BF-Set1 Figure 5: Total Execution Time-BF-Set2
(Bar charts omitted: total execution time in cycles for each Set 1 workload and for the Set 2 workload
mixes (MTf, c3-c3-c3-c3, c4-c4-c5-c5, le-le-le-le, li-li, li-li-mu-mu, and ti-ti combinations) in 1-
and 4-channel configurations, under the RF and BF schedulers.)
Energy delay product:
Figures 6 to 9 show the energy-delay product of the BF and RF algorithms. The BF scheduler improves
EDP by 3.613% to 18.1815% over the FCFS scheduler, while the RF scheduler improves EDP by 4.201% to
18.1812%. The overall EDP is reduced by 9.17% for the BF scheduler and by 9.54% for the RF scheduler.
Figure 6: EDP-RF Set1 Figure 7: EDP-RF Set2
Figure 8: EDP-BF Set1 Figure 9: EDP-BF Set2
(Bar charts omitted: EDP for the Set 1 and Set 2 workload mixes under the RF and BF schedulers.)
Maximum Slowdown:
Figures 10 to 13 show the maximum-slowdown metric for the RF and BF algorithms. An improvement of
about 1.667% to 9.523% is seen for the BF algorithm compared to the FCFS scheduler, and an improvement
of about 1.667% to 10.476% is seen for the RF algorithm.
Figure 10: Max Slowdown-RF-Set1 Figure 11: Max Slowdown-RF-Set2
Figure 12: Max Slowdown-BF-Set1 Figure 13: Max Slowdown-BF-Set2
(Bar charts omitted: maximum slowdown for the Set 1 and Set 2 workload mixes under the RF and BF
schedulers.)
CONCLUSIONS:
We performed a comprehensive study of existing scheduling policies, and our experimental results
confirm that the memory scheduling policy has a large influence on memory waiting latency. We used
results from the 3rd JILP Workshop on Computer Architecture Competitions (JWAC-3) for comparison with
the schemes we implemented.
Our schemes perform better than FCFS and are on par with some of the schemes proposed in the
competition. The total EDP obtained is 49.4782 for the Row First scheme and 49.6698 for the Bank First
scheme, which is better than the Stride- and Global History-based DRAM Page Management scheme. The
execution time, PFP, and maximum slowdown of both schemes are on par with that scheme. Performance
could be improved further by adding a core-aware policy on top of these basic schemes.
REFERENCES:
[1] Thread-Fair Memory Request Reordering, Kun Fang, Nick Iliev, Ehsan Noohi, Suyu Zhang, and
Zhichun Zhu (University of Illinois at Chicago)
[2] The Compact Memory Scheduling Maximizing Row Buffer Locality, Young-Suk Moon, Yongkee
Kwon, Hong-Sik Kim, Dong-gun Kim, Hyungdong Hayden Lee, and Kunwoo Park (SK Hynix)
[3] High Performance Memory Access Scheduling using Compute-Phase Prediction and Writeback-Refresh
Overlap, Yasuo Ishii (The University of Tokyo, NEC Corporation) and Kouhei Hosokawa,
Mary Inaba, and Kei Hiraki (The University of Tokyo)
[4] Pre-Read and Write-Leak Memory Scheduling Algorithm, Long Chen, Yanan Cao, Sarah Kabala,
and Parijat Shukla (Iowa State University)
[5] Request Density Aware Fair Memory Scheduling, Takakazu Ikeda (Tokyo Institute of Technology),
Shinya Takamaeda-Yamazaki (Tokyo Institute of Technology / JSPS Research Fellow), and Naoki
Fujieda, Shimpei Sato, and Kenji Kise (Tokyo Institute of Technology)
[6] Priority Based Fair Scheduling: A Memory Scheduler Design for Chip-Multiprocessor
Systems, Chongmin Li, Dongsheng Wang, Haixia Wang, and Yibo Xue (Department of Computer
Science & Technology, Tsinghua University)
[7] Service Value Aware Memory Scheduler by Estimating Request Weight and Using per-Thread
Traffic Lights, Keisuke Kuroyanagi (INRIA/IRISA, The University of Tokyo) and Andre Seznec
(INRIA/IRISA)
[8] Stride- and Global History-based DRAM Page Management, Mushfique Junayed Khurshid, Mohit
Chainani, Alekhya Perugupalli, and Rahul Srikumar (University of Wisconsin-Madison)
[9] Scott Rixner, William J. Dally, Ujval J. Kapasi, Peter Mattson, and John D. Owens, “Memory
Access Scheduling”, Proceedings of the 27th International Symposium on Computer Architecture, 2000
[10] Jun Shao and Brian T. Davis, “A Burst Scheduling Access Reordering Mechanism”, Proceedings of
the 13th International Symposium on High-Performance Computer Architecture, 2007
 
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...Dr.Costas Sachpazis
 
Introduction to Multiple Access Protocol.pptx
Introduction to Multiple Access Protocol.pptxIntroduction to Multiple Access Protocol.pptx
Introduction to Multiple Access Protocol.pptxupamatechverse
 
Current Transformer Drawing and GTP for MSETCL
Current Transformer Drawing and GTP for MSETCLCurrent Transformer Drawing and GTP for MSETCL
Current Transformer Drawing and GTP for MSETCLDeelipZope
 
College Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
College Call Girls Nashik Nehal 7001305949 Independent Escort Service NashikCollege Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
College Call Girls Nashik Nehal 7001305949 Independent Escort Service NashikCall Girls in Nagpur High Profile
 
Porous Ceramics seminar and technical writing
Porous Ceramics seminar and technical writingPorous Ceramics seminar and technical writing
Porous Ceramics seminar and technical writingrakeshbaidya232001
 
Internship report on mechanical engineering
Internship report on mechanical engineeringInternship report on mechanical engineering
Internship report on mechanical engineeringmalavadedarshan25
 
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur EscortsHigh Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escortsranjana rawat
 
What are the advantages and disadvantages of membrane structures.pptx
What are the advantages and disadvantages of membrane structures.pptxWhat are the advantages and disadvantages of membrane structures.pptx
What are the advantages and disadvantages of membrane structures.pptxwendy cai
 
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...ranjana rawat
 
Software Development Life Cycle By Team Orange (Dept. of Pharmacy)
Software Development Life Cycle By  Team Orange (Dept. of Pharmacy)Software Development Life Cycle By  Team Orange (Dept. of Pharmacy)
Software Development Life Cycle By Team Orange (Dept. of Pharmacy)Suman Mia
 
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...Dr.Costas Sachpazis
 
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLSMANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLSSIVASHANKAR N
 

Recently uploaded (20)

IVE Industry Focused Event - Defence Sector 2024
IVE Industry Focused Event - Defence Sector 2024IVE Industry Focused Event - Defence Sector 2024
IVE Industry Focused Event - Defence Sector 2024
 
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
 
Coefficient of Thermal Expansion and their Importance.pptx
Coefficient of Thermal Expansion and their Importance.pptxCoefficient of Thermal Expansion and their Importance.pptx
Coefficient of Thermal Expansion and their Importance.pptx
 
HARMONY IN THE NATURE AND EXISTENCE - Unit-IV
HARMONY IN THE NATURE AND EXISTENCE - Unit-IVHARMONY IN THE NATURE AND EXISTENCE - Unit-IV
HARMONY IN THE NATURE AND EXISTENCE - Unit-IV
 
Model Call Girl in Narela Delhi reach out to us at 🔝8264348440🔝
Model Call Girl in Narela Delhi reach out to us at 🔝8264348440🔝Model Call Girl in Narela Delhi reach out to us at 🔝8264348440🔝
Model Call Girl in Narela Delhi reach out to us at 🔝8264348440🔝
 
Introduction to IEEE STANDARDS and its different types.pptx
Introduction to IEEE STANDARDS and its different types.pptxIntroduction to IEEE STANDARDS and its different types.pptx
Introduction to IEEE STANDARDS and its different types.pptx
 
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINE
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINEMANUFACTURING PROCESS-II UNIT-2 LATHE MACHINE
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINE
 
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
 
Introduction to Multiple Access Protocol.pptx
Introduction to Multiple Access Protocol.pptxIntroduction to Multiple Access Protocol.pptx
Introduction to Multiple Access Protocol.pptx
 
Current Transformer Drawing and GTP for MSETCL
Current Transformer Drawing and GTP for MSETCLCurrent Transformer Drawing and GTP for MSETCL
Current Transformer Drawing and GTP for MSETCL
 
College Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
College Call Girls Nashik Nehal 7001305949 Independent Escort Service NashikCollege Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
College Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
 
Porous Ceramics seminar and technical writing
Porous Ceramics seminar and technical writingPorous Ceramics seminar and technical writing
Porous Ceramics seminar and technical writing
 
Internship report on mechanical engineering
Internship report on mechanical engineeringInternship report on mechanical engineering
Internship report on mechanical engineering
 
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur EscortsHigh Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
 
Exploring_Network_Security_with_JA3_by_Rakesh Seal.pptx
Exploring_Network_Security_with_JA3_by_Rakesh Seal.pptxExploring_Network_Security_with_JA3_by_Rakesh Seal.pptx
Exploring_Network_Security_with_JA3_by_Rakesh Seal.pptx
 
What are the advantages and disadvantages of membrane structures.pptx
What are the advantages and disadvantages of membrane structures.pptxWhat are the advantages and disadvantages of membrane structures.pptx
What are the advantages and disadvantages of membrane structures.pptx
 
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
 
Software Development Life Cycle By Team Orange (Dept. of Pharmacy)
Software Development Life Cycle By  Team Orange (Dept. of Pharmacy)Software Development Life Cycle By  Team Orange (Dept. of Pharmacy)
Software Development Life Cycle By Team Orange (Dept. of Pharmacy)
 
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
 
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLSMANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
 

Memory Access Scheduling

  • 2. 1 | P a g e List of Tables Page Number Table 1: Sequence of Issued requests 4 Table 2: Memory System Configuration 7 Table 3: Comparison of key metrics on baseline and implemented schedulers 8 List of Figures Page Number Figure 1: Internal organization of modern DRAMs. 2 Figure 2: Total Execution Time-RF-Set1 9 Figure 3: Total Execution Time-RF-Set2 9 Figure 4: Total Execution Time-BF-Set1 9 Figure 5: Total Execution Time-BF-Set2 9 Figure 6: EDP-RF Set1 10 Figure 7: EDP-RF Set2 10 Figure 8: EDP-BF Set1 10 Figure 9: EDP-BF Set2 10 Figure 11: Max Slowdown-RF-Set2 11 Figure 10: Max Slowdown-RF-Set1 11 Figure 12: Max Slowdown-BF-Set1 11 Figure 13: Max Slowdown-BF-Set2 11 SLNO TITLE PAGE NUMBER 1 Introduction 2 DRAM Configuration 2 Memory Access Scheduling 3 2 Implementation 4 Bank First Memory Scheduling 4 Scheduler Implementation 4 Algorithm 5 Row First Memory Scheduling 5 Scheduler Implementation 5 Algorithm 6 3 Simulator Description 7 DRAM Commands 7 4 Results 8 Execution time 9 Energy delay product 10 Maximum Slowdown 11 5 Conclusions 12 6 References 13
accessed as an entire row at a time. When a row of this memory array is accessed (row activation), the entire row is transferred into the bank's row buffer. The row buffer serves as a cache that reduces the latency of subsequent accesses to that row. While a row is active in the row buffer, any number of reads or writes (column accesses) may be performed, typically with a throughput of one per cycle. After the available column accesses complete, the cached row must be written back to the memory array by an explicit operation (bank pre-charge), which prepares the bank for a subsequent row activation.

Figure 1: Internal organization of modern DRAMs.

A bank cannot be accessed during the pre-charge/activate latency, a single cycle is required on the data pins when switching between read and write column accesses, and a single set of address lines is shared by all DRAM operations (bank pre-charge, row activation, and column access).
The amount of bank parallelism that is exploited and the number of column accesses made per row access dictate the sustainable memory bandwidth out of such a DRAM. A memory access scheduler must generate a schedule that conforms to the timing and resource constraints of these modern DRAMs. Each DRAM operation makes different demands on the three DRAM resources: the internal banks, a single set of address lines, and a single set of data lines. The scheduler must ensure that the required resources are available for each DRAM operation it schedules.

Each DRAM bank has two stable states: IDLE and ACTIVE. In the IDLE state, the bank is pre-charged and ready for a row access. It will remain in this state until a row activate operation is issued to the bank. To issue a row activation, the address lines must be used to select the bank and the row being activated. Row activation requires, say, 3 cycles, during which no other operations may be issued to that bank; during that time, however, operations may be issued to other banks of the DRAM. Once the DRAM's row activation latency has passed, the bank enters the ACTIVE state, during which the contents of the selected row are held in the bank's row buffer. Any number of pipelined column accesses may be performed while the bank is in the ACTIVE state. To issue either a read or write column access, the address lines are required to indicate the bank and the column of the active row in that bank. A write column access requires the data to be transferred to the DRAM at the time of issue, whereas a read column access returns the requested data three cycles later. The bank will remain in the ACTIVE state until a pre-charge operation is issued to return it to the IDLE state. The pre-charge operation requires the use of the address lines to indicate the bank which is to be pre-charged. Like row activation, the pre-charge operation occupies the bank for 3 cycles, during which no new operations may be issued to that bank; again, operations may be issued to other banks during this time. After the DRAM's pre-charge latency, the bank returns to the IDLE state and is ready for a new row activation. DRAMs typically also support column accesses with automatic pre-charge, which implicitly pre-charges the DRAM bank as soon as possible after the column access.

The shared address and data resources serialize access to the different DRAM banks. While the state machines for the individual banks are independent, only a single bank can perform a transition requiring a particular shared resource each cycle. For many DRAMs, the bank, row, and column addresses share a single set of lines. Hence, the scheduler must arbitrate between pre-charge, row, and column operations that all need to use this single resource.
Some DRAMs provide separate row and column address lines (each with their own associated bank address) so that column and row accesses can be initiated simultaneously. To approach the peak data rate with serialized resources, there must be enough column accesses to each row to hide the pre-charge/activate latencies of other banks. Whether this can be achieved depends on the data reference patterns and the order in which the DRAM is accessed to satisfy those references. The need to hide the pre-charge/activate latency of the banks in order to sustain high bandwidth cannot be eliminated by any DRAM architecture without reducing that latency itself, which would likely come at the cost of decreased bandwidth or capacity, both of which are undesirable. Several scheduling schemes have therefore been proposed to reduce the effective latency.

Memory Access Scheduling:

In a memory controller, the execution of a memory access instruction must adhere to the rules and timing constraints of the hardware to access data in a modern DRAM. As shown in Figure 1, modern DRAMs are three-dimensional memory devices with dimensions of bank, row and column. Thus, a location in the DRAM is identified by an address that consists of bank, row and column fields. The steps of accessing a location are a pre-charge, a row access, and then a column access. Due to the DRAM structure and its hardware implementation, sequential accesses to different rows within one bank have high latency, whereas accesses to different banks or to different words within a single row have low latency [9]. Memory access scheduling can effectively reduce the average memory access latency and improve memory bandwidth utilization by reducing cross-row data access. For example, prioritizing memory requests to the same bank and the same row can improve performance.
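The row-hit/row-miss cost asymmetry described above can be sketched with a minimal bank model. This is an illustration only: the 3-cycle activate/pre-charge and 1-cycle column-access latencies are the example values used in the text, not real DDRx timings.

```python
# Minimal DRAM bank model illustrating row-hit vs. row-miss cost.
# Latencies are the illustrative values from the text (3-cycle activate
# and pre-charge, 1-cycle column access); real DDRx parts differ.
T_PRECHARGE, T_ACTIVATE, T_COLUMN = 3, 3, 1

class Bank:
    def __init__(self):
        self.state = "IDLE"      # IDLE: pre-charged, no open row
        self.open_row = None

    def access(self, row):
        """Return the cycles needed for a column access to `row`."""
        if self.state == "ACTIVE" and self.open_row == row:
            return T_COLUMN                    # row hit: row buffer already holds it
        cycles = T_ACTIVATE + T_COLUMN
        if self.state == "ACTIVE":             # row conflict: close the old row first
            cycles += T_PRECHARGE
        self.state, self.open_row = "ACTIVE", row
        return cycles

bank = Bank()
print(bank.access(5))   # miss on idle bank: activate + column = 4
print(bank.access(5))   # row hit: 1
print(bank.access(9))   # row conflict: pre-charge + activate + column = 7
```

The 7-to-1 gap between a row conflict and a row hit is exactly what the schedulers below try to exploit.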
IMPLEMENTATION:

In this section we describe the two implemented memory scheduling algorithms, both of which aim to exploit the locality of DRAM memory systems: Bank First memory scheduling and Row First memory scheduling.

Bank First Memory Scheduling:

The bank-first policy [9] arranges all memory requests by bank and schedules them in a round-robin manner according to the bank identifier. This policy is beneficial because requests to different banks can be carried out simultaneously. For the request sequence shown in Table 1, the sequence of issued requests under the bank-first policy is A-C-D-F-E-B-G-H-J-I.

Table 1: Sequence of Issued requests

Scheduler Implementation:
1. The scheduler gets a random sequence of memory requests from the memory controller.
2. The scheduler checks both the read queue and the write queue (for draining conditions).
3. A write drain is initiated if either the write queue occupancy has reached the high watermark (HI_WM) or there are no pending read requests.
4. If not in write drain mode, read drain mode is initiated.
5. In the write and read draining modes, the write and read queues are scheduled according to the bank-first policy.
6. In either draining mode, requests are issued in round-robin fashion based on the bank id:
   a. First, look through all the requests (already arranged in order of arrival) in the queue selected by the draining mode.
   b. If the next request is from the same bank as the last one issued, do not schedule it; instead schedule a request from a different bank.
   c. The bank to be serviced next is selected in round-robin fashion.
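One way to realize step 6 above is a round-robin scan over per-bank queues, issuing the oldest pending request of the current bank each turn. The request tuples and two-bank setup below are hypothetical, and drain-mode selection and DRAM timing are omitted.

```python
from collections import deque

def bank_first_schedule(requests, n_banks):
    """Round-robin across banks: each turn, issue the oldest pending
    request for the current bank; skip banks with nothing pending.
    `requests` is a list of (req_id, bank) in arrival order (a stand-in
    for the read/write queue selected by the draining mode)."""
    queues = [deque() for _ in range(n_banks)]
    for req_id, bank in requests:
        queues[bank].append(req_id)
    issued, b, pending = [], 0, len(requests)
    while pending:
        if queues[b]:
            issued.append(queues[b].popleft())
            pending -= 1
        b = (b + 1) % n_banks    # advance to the next bank, round robin
    return issued

# A burst to bank 0 gets interleaved with the lone request to bank 1.
order = bank_first_schedule([("A", 0), ("B", 0), ("C", 1), ("D", 0)], 2)
print(order)  # ['A', 'C', 'B', 'D']
```

Interleaving by bank lets the pre-charge/activate latency of one bank overlap with useful work on another, which is where the policy's benefit comes from.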
Algorithm:

INPUT: Random sequence of memory access requests from m cores
OUTPUT: Scheduled sequence of memory requests to the memory controller

BEGIN:
  B = 0
  if write_drain == true:
    for each request in write_queue:
      if B == current bank and request is issuable:
        issue request
        B = B + 1
      else:
        precharge
  else:
    for each request in read_queue:
      if B == current bank and request is issuable:
        issue request
        B = B + 1
      else:
        precharge
END

Row First Memory Scheduling:

The row-first policy gives the highest priority to accesses to the same row of the same bank [10]. It essentially enhances the bank-first policy by grouping accesses to the same bank and same row together, which reduces row misses. For the request sequence shown in Table 1, the sequence of issued requests under the row-first policy is A-B-J-C-D-G-I-F-H-E.

Scheduler Implementation:

This algorithm aims to maximize row hits and thus increase the overall hit rate across all requests. Because row buffer hits have much shorter latency and consume less power than row buffer misses, the scheduler exploits row buffer hits as much as possible.
1. The scheduler gets a random sequence of memory requests from the memory controller.
2. The scheduler checks both the read queue and the write queue (for draining conditions).
3. A write drain is initiated if either the write queue occupancy has reached the high watermark (HI_WM) or there are no pending read requests.
4. If not in write drain mode, read drain mode is initiated.
5. In the write and read draining modes, the write and read queues are scheduled according to the row-first policy.
6. In either draining mode, requests are issued in round-robin fashion based on the bank id:
   a. First, look through all the requests (already arranged in order of arrival) in the queue selected by the draining mode.
   b. If the next request is from the same bank and the same row, schedule it for execution.
   c. If the next request is not from the same bank and the same row, set a flag bit and record the new bank and row.
Algorithm:

INPUT: Random sequence of memory access requests from m cores
OUTPUT: Scheduled sequence of memory requests to the memory controller

BEGIN:
  B = 0; R = 0
  if write_drain == true:
    for each request in write_queue:
      if flag == true:
        B = current bank
        R = current row
        flag = false
        issue request
      if B == current bank and request is issuable:
        if R == current row:
          issue request
          flag = false
  else:
    for each request in read_queue:
      if flag == true:
        B = current bank
        R = current row
        flag = false
        issue request
      if B == current bank and request is issuable:
        if R == current row:
          issue request
          flag = false
END
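A behavioral sketch of the row-first grouping: issue the oldest request, then drain every queued request targeting the same (bank, row) — guaranteed row-buffer hits — before moving on. The (id, bank, row) tuples are hypothetical, and queue draining modes and timing are again omitted.

```python
from collections import deque

def row_first_schedule(requests):
    """Group same-(bank, row) requests behind the oldest request that
    opened that row. `requests` is a list of (req_id, bank, row) in
    arrival order (a stand-in for the read/write queue)."""
    queue = deque(requests)
    issued = []
    while queue:
        rid, bank, row = queue.popleft()
        issued.append(rid)                 # this request opens (bank, row)
        hits = [r for r in queue if r[1] == bank and r[2] == row]
        for r in hits:                     # drain all pending hits to that row
            queue.remove(r)
            issued.append(r[0])
    return issued

reqs = [("A", 0, 1), ("B", 1, 3), ("C", 0, 1), ("D", 0, 2)]
print(row_first_schedule(reqs))  # ['A', 'C', 'B', 'D'] — C rides A's open row
```

Pulling C forward turns what would have been a row conflict (A's row closed by D, then reopened for C) into a single-cycle row hit.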
SIMULATOR DESCRIPTION:

This project uses the USIMM simulation infrastructure, a trace-driven memory system simulator, to build and evaluate the memory schedulers. Table 2 gives the system configuration used in our evaluation.

Table 2: Memory System Configuration

DRAM Commands:

The memory commands can be partitioned into two groups: commands that advance the execution of a pending memory request (read or write), and commands that manage general DRAM state. Four commands advance the execution of a memory request:
• PRE: Precharge the bitlines of a bank so a new row can be read out.
• ACT: Bring the contents of a bank's DRAM row into the bank's row buffer.
• CLO-RD: Bring a cache line from the row buffer to the processor.
• CLO-WR: Write a cache line from the processor to the row buffer.

Five commands manage general DRAM state:
• PWR-DN-FAST: Puts a rank into the low-power mode with quick exit times.
• PWR-DN-SLOW: Puts a rank into the precharge-powerdown (slow) mode, with a longer transition back to the active state.
• PWR-UP: Brings a rank out of low-power mode.
• REFRESH: Forces a refresh to multiple rows in all banks of a rank.
• PRE-ALL: Forces a precharge to all banks in a rank.

When the memory system is not busy, the PWR-DN-FAST and PWR-DN-SLOW commands can put memory ranks into low-power mode to save power; the PWR-UP command is then needed to bring a rank back out of low-power mode.

This project uses workloads from the USIMM simulator with 1-channel and 4-channel configurations. The workloads and their traces are divided into two sets, Set 1 and Set 2. The comparison of evaluation metrics against FCFS is done on Set 1, and detailed results are described in the Results section of this report.
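The request-advancing commands above imply a per-bank legality check — ACT is only valid on a pre-charged (IDLE) bank, while column accesses and PRE require an open row. A minimal sketch of that rule, with power and refresh commands omitted; the state names mirror the bank states described earlier.

```python
from enum import Enum

class Cmd(Enum):
    PRE = "precharge"
    ACT = "activate"
    CLO_RD = "column read"
    CLO_WR = "column write"

# Which request-advancing commands a bank can accept in each state.
LEGAL = {
    "IDLE":   {Cmd.ACT},                         # must open a row first
    "ACTIVE": {Cmd.CLO_RD, Cmd.CLO_WR, Cmd.PRE},
}

def next_state(state, cmd):
    """Apply `cmd` to a bank in `state`; raise if the command is illegal."""
    if cmd not in LEGAL[state]:
        raise ValueError(f"{cmd.name} is illegal in state {state}")
    if cmd is Cmd.ACT:
        return "ACTIVE"
    if cmd is Cmd.PRE:
        return "IDLE"
    return state                                 # column accesses keep the row open
```

A scheduler built on this model would consult LEGAL each cycle before picking the next command to issue.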
RESULTS:

Using USIMM traces we simulated the Bank First (BF) and Row First (RF) memory scheduling algorithms and compared them with the baseline FCFS algorithm. Three metrics are used in the evaluation: the sum of thread execution times, the threads' maximum slowdown, and the energy-delay product (EDP). We used several mixed workloads for the evaluations: 1-, 2-, and 4-thread workloads in the 1-channel model and 1-, 2-, 4-, 8-, and 16-thread workloads in the 4-channel model. All results are tabulated in Table 3.

Table 3: Comparison of key metrics on baseline and implemented schedulers

Workload                      Config  Sum of exec times (10M cyc)   Max slowdown          EDP (J·s)
                                      FCFS    BF      RF            FCFS   BF     RF      FCFS   BF     RF
MT-canneal                    1 Ch    418     404     403           NA     NA     NA      4.23   3.98   3.97
MT-canneal                    4 Ch    179     167     167           NA     NA     NA      1.78   1.56   1.56
bl-bl-fr-fr                   1 Ch    149     147     147           1.20   1.18   1.18    0.50   0.48   0.48
bl-bl-fr-fr                   4 Ch    80      75.9    75.7          1.11   1.05   1.05    0.36   0.32   0.32
c1-c1                         1 Ch    83      82.3    82.6          1.12   1.10   1.10    0.41   0.40   0.40
c1-c1                         4 Ch    51      46.7    46.4          1.05   0.95   0.94    0.44   0.36   0.36
c1-c1-c2-c2                   1 Ch    242     235     235           1.48   1.45   1.45    1.52   1.43   1.44
c1-c1-c2-c2                   4 Ch    127     117     117           1.18   1.10   1.10    1.00   0.85   0.84
c2                            1 Ch    44      43      43.1          NA     NA     NA      0.38   0.36   0.36
c2                            4 Ch    30      27      27            NA     NA     NA      0.50   0.39   0.39
fa-fa-fe-fe                   1 Ch    228     224     224           1.52   1.47   1.47    1.19   1.15   1.14
fa-fa-fe-fe                   4 Ch    106     99.5    99.2          1.22   1.14   1.14    0.64   0.56   0.55
fl-fl-sw-sw-c2-c2-fe-fe       4 Ch    295     279     279           1.40   1.31   1.31    2.14   1.89   1.88
fl-fl-sw-sw-c2-c2-fe-fe-
bl-bl-fr-fr-c1-c1-st-st       4 Ch    651     620     620           1.90   1.80   1.80    5.31   4.78   4.76
fl-sw-c2-c2                   1 Ch    249     243     243           1.48   1.43   1.42    1.52   1.44   1.43
fl-sw-c2-c2                   4 Ch    130     121     120           1.13   1.05   1.05    0.99   0.83   0.82
st-st-st-st                   1 Ch    162     160     158           1.28   1.25   1.24    0.58   0.56   0.56
st-st-st-st                   4 Ch    86      81.5    80            1.14   1.08   1.08    0.39   0.35   0.34
Overall (PFP)                         3312    3173    3167          1.90   1.80   1.80    23.88  21.69  21.60
Execution time:

Figures 2 to 5 show the total execution time of the implemented BF and RF algorithms. As evident from Table 3, the BF and RF algorithms outperform FCFS by 0.843% to 8.431% and 0.4812% to 9.019% respectively. The overall execution time is reduced by 4.14% for the BF scheduler and 4.32% for the RF scheduler. For the multi-core cases, total execution time is calculated by summing the execution times of all cores.

Figure 2: Total Execution Time-RF-Set1
Figure 3: Total Execution Time-RF-Set2
Figure 4: Total Execution Time-BF-Set1
Figure 5: Total Execution Time-BF-Set2
Energy delay product:

Figures 6 to 9 show the energy-delay product of the BF and RF algorithms. The BF scheduler improves EDP by 3.613% to 18.18% compared to the FCFS scheduler, while the RF scheduler improves EDP by 4.201% to 18.18%. The overall EDP is reduced by 9.17% for the BF scheduler and 9.54% for the RF scheduler.

Figure 6: EDP-RF Set1
Figure 7: EDP-RF Set2
Figure 8: EDP-BF Set1
Figure 9: EDP-BF Set2
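The overall reduction figures follow directly from the EDP totals in Table 3; a quick check of the arithmetic:

```python
# Overall EDP totals from Table 3 (FCFS baseline vs. implemented schedulers).
edp = {"FCFS": 23.88, "BF": 21.69, "RF": 21.60}

def pct_reduction(base, new):
    """Percentage reduction of `new` relative to baseline `base`."""
    return 100.0 * (base - new) / base

bf_gain = pct_reduction(edp["FCFS"], edp["BF"])
rf_gain = pct_reduction(edp["FCFS"], edp["RF"])
print(f"BF EDP reduction: {bf_gain:.2f}%")   # 9.17%
print(f"RF EDP reduction: {rf_gain:.2f}%")   # 9.55%
```

The RF value computes to 9.55% against the quoted 9.54%, a difference attributable to rounding in the tabulated totals.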
Maximum Slowdown:

Figures 10 to 13 show the maximum-slowdown metric for the BF and RF algorithms. An improvement of about 1.667% to 9.523% is seen for the BF algorithm compared to the FCFS scheduler, and an improvement of about 1.667% to 10.476% for the RF algorithm.

Figure 10: Max Slowdown-RF-Set1
Figure 11: Max Slowdown-RF-Set2
Figure 12: Max Slowdown-BF-Set1
Figure 13: Max Slowdown-BF-Set2
CONCLUSIONS:

We performed a comprehensive study to analyze existing scheduling policies, and the experimental results confirm that memory scheduling policies have a great influence on memory waiting latency. We compared our implemented schemes against results from the 3rd JILP Workshop on Computer Architecture Competitions (JWAC-3). Our results show better performance than FCFS and are on par with some of the schemes proposed in the competition. The total EDP obtained is 49.4782 for the Row First scheme and 49.6698 for the Bank First scheme, both better than the Stride- and Global History-based DRAM Page Management scheme. The execution time, PFP, and maximum slowdown of both schemes are likewise on par with that scheme. Performance could be improved further by adding a core-aware scheme on top of the basic schemes implemented here.

REFERENCES:

[1] Thread-Fair Memory Request Reordering, Kun Fang, Nick Iliev, Ehsan Noohi, Suyu Zhang, and Zhichun Zhu (University of Illinois at Chicago)
[2] The Compact Memory Scheduling Maximizing Row Buffer Locality, Young-Suk Moon, Yongkee Kwon, Hong-Sik Kim, Dong-gun Kim, Hyungdong Hayden Lee, and Kunwoo Park (SK Hynix)
[3] High Performance Memory Access Scheduling using Compute-Phase Prediction and Writeback-Refresh Overlap, Yasuo Ishii (The University of Tokyo, NEC Corporation) and Kouhei Hosokawa, Mary Inaba, and Kei Hiraki (The University of Tokyo)
[4] Pre-Read and Write-Leak Memory Scheduling Algorithm, Long Chen, Yanan Cao, Sarah Kabala, and Parijat Shukla (Iowa State University)
[5] Request Density Aware Fair Memory Scheduling, Takakazu Ikeda (Tokyo Institute of Technology), Shinya Takamaeda-Yamazaki (Tokyo Institute of Technology / JSPS Research Fellow), and Naoki Fujieda, Shimpei Sato, and Kenji Kise (Tokyo Institute of Technology)
[6] Priority Based Fair Scheduling: A Memory Scheduler Design for Chip-Multiprocessor Systems, Chongmin Li, Dongsheng Wang, Haixia Wang, and Yibo Xue (Department of Computer Science & Technology, Tsinghua University)
[7] Service Value Aware Memory Scheduler by Estimating Request Weight and Using per-Thread Traffic Lights, Keisuke Kuroyanagi (INRIA/IRISA, The University of Tokyo) and Andre Seznec (INRIA/IRISA)
[8] Stride- and Global History-based DRAM Page Management, Mushfique Junayed Khurshid, Mohit Chainani, Alekhya Perugupalli, and Rahul Srikumar (University of Wisconsin-Madison)
[9] Scott Rixner, William J. Dally, Ujval J. Kapasi, Peter Mattson, and John D. Owens, "Memory Access Scheduling", Proceedings of the 27th International Symposium on Computer Architecture, 2000
[10] Jun Shao and Brian T. Davis, "A Burst Scheduling Access Reordering Mechanism", Proceedings of the 13th International Symposium on High-Performance Computer Architecture, 2007