Improving DRAM performance
Prithvi Kambhampati
Master of Science, Electrical and Computer Engineering
Michigan Technological University
Houghton, Michigan
pkambham@mtu.edu
Abstract—To narrow the growing gap between the clock speed of processors and that of memory, more research than ever is being devoted to improving memory performance. Dynamic Random Access Memory (DRAM) is being used in the cache to make memory accesses faster by reducing miss rate and latency, which makes DRAM performance improvement an important aspect of today's computation. DRAM cells are refreshed periodically at the rank level to keep data loss to a minimum, which prevents a complete rank from accepting memory requests while it refreshes; this is one of the major challenges facing DRAM technology. Improvements to DRAM can be made at four different levels, namely the chip level, bank level, subarray level, and row level. One method is to reorganize the structure of the banks and the row buffer to improve DRAM hit rates. Another method is to use light to transmit data between the processor and the memory system to reduce power consumption and increase bandwidth. We also look into different set mapping policies with which data is accessed from the DRAM rows and discuss the best solution for improving the hit rate and reducing latency. This paper shows that the methods implemented to improve DRAM performance are significantly effective. In addition, we discuss the errors that occur in DRAMs and describe error-resilient schemes, such as single-subarray memory systems with chipkill, that can overcome bit failures.
Index Terms—Dynamic Random Access Memory, chip level, bank level, subarray level, row level.
I. INTRODUCTION
In the past, the clock rates of microprocessors increased exponentially due to process improvements, longer pipelines, and circuit design techniques, but main memory speed did not grow as fast. At the same time, the number of cores on a single chip has been increasing and is expected to increase further, which raises the aggregate demand for off-chip memory and makes main memory access an even greater bottleneck. To address this problem, we need to design a memory system that is fast, big, and cheap. Static Random Access Memory (SRAM) is used in caches for its speed but not at large scale, due to its cost and low capacity, whereas DRAM is used in main memory for its large capacity and low cost. Therefore, improving the efficiency of DRAM has become a priority in recent years. Many methods have been proposed to reduce the loss of data and to improve throughput and power efficiency. One solution is to add a DRAM cache to the memory hierarchy: DRAM increases cache capacity through its higher density compared to SRAM cells.
cells. DRAM also has a higher bandwidth and lower
latency compared to the off-chip memory. DRAM
memory seems like a good solution to bring down the
memory wall (the gap between processor and memory speed). The increased use of DRAM caches has led to more and more research by both industry and academia, whose main aim is to improve the performance of DRAM memory in today's computation. To this end, many methods have been proposed for a given limited off-chip memory bandwidth. A DRAM chip has a well-defined structure (discussed below) and can be subdivided into many parts, which means there is an opportunity to improve the characteristics of each of these parts.
A DRAM chip is made of capacitor-based cells that represent data in the form of electric charge: to store data in a cell, charge is injected, and to retrieve data, the charge is extracted [2]. As shown in figure 1, a typical DRAM chip has a hierarchy consisting of multiple banks, a shared internal bus for reading/writing data, and a chip I/O through which data is transferred between the DRAM chip and other memory units. Each bank is sub-divided into subarrays and a bank I/O [10]. Furthermore, each subarray is a 2D array of DRAM cells with a common row buffer, built from SRAM cells, that buffers one row of the DRAM bank. Data can only be accessed after it is fetched into the row buffer; any subsequent read of the same row is served directly from the row buffer.
Accessing data (in the form of a cache line) from a
subarray involves multiple steps. First, the data can be read only through a row buffer, which means the row must be activated so that its data is transferred from the DRAM cells to the row buffer. Second, after activating the row, the cache line is read from or written to the row buffer, and the data moves to/from the corresponding cells through the chip's internal bus. Finally, the subarray is precharged, writing the row buffer back and preparing it for subsequent requests.
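To make this three-step sequence concrete, the following Python sketch models a subarray with a single row buffer: hits are served from the buffer, a conflicting access writes the old row back (precharge) and activates the new one. It is purely illustrative; the class, method names, and byte-level representation are assumptions, not part of any DRAM specification.

class Subarray:
    def __init__(self, num_rows, row_size):
        self.rows = [bytearray(row_size) for _ in range(num_rows)]
        self.open_row = None        # index of the currently activated row
        self.row_buffer = None      # contents of the activated row

    def access(self, row, col):
        if self.open_row == row:                  # row-buffer hit: no activation
            return self.row_buffer[col]
        if self.open_row is not None:             # precharge: write back old row
            self.rows[self.open_row][:] = self.row_buffer
        self.row_buffer = bytearray(self.rows[row])   # activate: row -> buffer
        self.open_row = row
        return self.row_buffer[col]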
Figure 1. Organization of a DRAM chip [10]
(Taken without permission)
In this paper, we are going to discuss the various
levels at which the DRAM performance can be
improved and the methods to do so. We observe four
different levels at which the modifications can be
done, with each level having multiple proposals to do
so. The first level is the chip level. At this level, there
is a memory channel with a memory controller, which
manages the set of DRAM banks present on the chip.
The memory channel has a three-bus system: a command bus, a read bus, and a write bus. Each of these buses has I/O pins as well. These buses
and I/O access points can be partially/completely
replaced by the Photonically Interconnected DRAM
(PIDRAM) [4] technology, which provides energy
efficient communication. The photonic technology
uses light to transmit data between the processor and
the memory. To transmit data/commands, external
light (typically from a laser) is passed through
resonators which give that light a unique wavelength.
This modulated light is received by a photodetector
and is converted to electricity and the data/commands
are transferred. The advantage of this technology is that multiple wavelengths can be transmitted at once, allowing more data than usual to be transmitted at low power. The second level is the bank level. At this level, PIDRAM technology can be used to reorganize the banks [4] to save energy. Another
method to improve DRAM performance at this level is to process DRAM requests in batches [9]. The third level is the subarray level. One
idea is to have a hierarchical multi-bank DRAM [3] in
which the subarrays are converted into semi-independent sub-banks to take advantage of the fact that most DRAM accesses occur locally within the subarrays. This allows the subarrays to act independently for such accesses and makes the process faster. The last level that can be modified is
the row level. In DRAM cache, to access memory
easily, memory blocks in the banks are mapped to a
particular set of a particular row of a particular bank.
These set-mapping policies [1] concentrate either on improving the hit rate or on decreasing the latency. Another
change that can be made to this level is dividing the
row buffer into multiple smaller row buffers [7].
Figure 2. DRAM Memory System – Each inset shows detail for a different level of current electrical DRAM
memory systems. [4]
(Taken without permission)
II. CHIP LEVEL
A DRAM chip consists of a shared internal bus,
multiple banks, a chip I/O and a memory channel
controlled by a memory controller. This section
describes different ways in which we can modify the
above parts of the chip to improve performance. One
such way is to use light to transmit data among the
parts of the DRAM chip. The following subsection introduces silicon photonic technology, which can partially replace the conventional electrical circuitry.
PHOTONICALLY INTERCONNECTED DRAM
Off-chip memory bandwidth is unlikely to keep up with processor performance, and this has been limiting the maximum achievable system performance since 2008. The number of pins on the board is limited by the area and power overheads of high-speed transceivers and package interconnect. The number of packets transferred per pin can be increased, but only at the expense of more energy. As described in the introduction, a DRAM
memory channel uses a memory controller to manage
a set of DRAM banks that are distributed across one
or more DRAM chips. We can overcome these
challenges by redesigning the DRAM memory using
Photonically Interconnected DRAM (PIDRAM) [4],
which uses a monolithically integrated silicon-
photonic technology. This technology uses light to
transfer data instead of electrical signals over wires. First, laser light is passed through a series of resonators, which modulate it onto distinct wavelengths as it travels from the processor to the PIDRAM chip. At the PIDRAM chip, the light is demodulated using filters and converted to an electrical signal by a photodetector. The advantages of this technology are that very little power is required to transmit data, that large off-chip bandwidths are supported at minimal power consumption, and that data can be transmitted on multiple wavelengths at once, allowing multiple data packets to be transferred simultaneously. This is called dense wavelength division multiplexing (DWDM) [4] and
allows multiple links (wavelengths) to share the same
media (fibre or waveguide). The electrical I/O in
DRAM chips can be replaced by these energy
efficient photonic links. By redesigning DRAM banks to provide greater bandwidth from an individual array core, we can meet the bandwidth demand; this also reduces the energy required to activate the banks. We should keep in mind that not all electrical circuits can be replaced by this technology, as photonics needs more area than a simple electrical circuit.
A. PIDRAM memory channel organization
A memory controller manages a set of DRAM banks that are distributed across many DRAM chips. This memory system has three logical buses: a command bus, a write data bus, and a read data bus. We can implement these buses using photonic components in three ways:
• Shared Photonic Bus:
All the three logical buses can be implemented
using a shared photonic bus, which works like a
standard electrical bus. In this implementation, the
memory controller first issues a command to all the banks, and each bank determines whether it is the target. For a write command, the target bank tunes in its photonic receiver on the write-data bus; the memory controller places the data on that bus, and the target bank receives it and performs the write. For a read command, the target bank performs its read operation and sends the data back over the read data bus.
Figure 3. Shared Photonic Buses [4]
(Taken without permission)
• Split Photonic Bus:
In this implementation, the long shared bus is
divided into multiple branches. The laser power is
sent to all the receivers of the command and write buses, and to the modulators of the read bus, so the total laser power is roughly a linear function of the number of banks. Compared to the shared photonic bus, this lowers the required optical laser power, at the cost of a reduced effective bandwidth density of the photonic devices.
Figure 4. Monolithically integrated silicon-photonic technology - Two DWDM links in opposite directions
between a memory controller in a processor chip and a bank in a PIDRAM chip. λ1 is used for the request and λ2 is
used for the response in the opposite direction on the same waveguides and fibre. [4]
(Taken without permission)
Figure 5.a. Split photonic buses [4]
(Taken without permission)
• Guided Photonic Bus:
The optical power can be further reduced by this
implementation. Guided photonic bus uses optical
power guiding in the form of demultiplexers to
actively direct power to just the target bank. This
allows the total power to be constant throughout, and
also independent of the number of banks.
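The scaling behaviours described for the three organizations can be summarized numerically. The sketch below is a rough illustrative model only: the loss and power constants are invented, and the exponential through-loss model for the shared bus is an assumption in the spirit of [4], not data from it.

# Rough comparison of laser-power scaling for the three photonic bus
# organizations (illustrative model; all constants are made up, not from [4]).
def shared_bus_power(num_banks, through_loss_db=0.1, base_mw=1.0):
    # Every drop point on the shared bus adds optical through loss, so the
    # required laser power grows exponentially with the number of banks.
    return base_mw * 10 ** (through_loss_db * num_banks / 10)

def split_bus_power(num_banks, per_branch_mw=0.2, base_mw=1.0):
    # Power is delivered to every branch, so it grows roughly linearly.
    return base_mw + per_branch_mw * num_banks

def guided_bus_power(num_banks, base_mw=1.2):
    # Demultiplexers steer power only to the target bank: roughly constant.
    return base_mw

for n in (8, 16, 64):
    print(n, shared_bus_power(n), split_bus_power(n), guided_bus_power(n))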
Figure 5.b. Guided photonic buses [4]
(Taken without permission)
B. PIDRAM Chip Organization
We have discussed above different ways in which
the buses can be implemented photonically. The
trade-off with this is that only a portion of the buses
can be implemented photonically and the rest,
electrically. The design choice is made based on the trade-offs in power and area. Photonics can be gradually extended deeper into the PIDRAM chip.
Figure 6. PIDRAM chip floorplan [4]
(Taken without permission)
The vertical electrical data bus can be partitioned into 'n' partitions, with the photonic circuits replicated at each data access point for each bus partition. Partitioning the data bus allows the DRAM chip to use an energy-efficient photonic interconnect, but it increases the fixed link power and incurs higher optical losses.
III. BANK LEVEL
Each bank consists of multiple subarrays and a bank I/O. Data is accessed in the form of cache lines from each subarray, which requires activating the row containing the cache line, reading/writing the cache line, and precharging the subarray to prepare for subsequent requests. This section deals with a novel way to organize the banks and with a request scheduling algorithm, both of which help increase the number of instructions executed.
A. PIDRAM Bank Organization
Most of the energy consumed in a DRAM chip is consumed by the banks themselves. Every array block in a bank access activates an array core, which activates an entire array core row, yet only a few bits of that row are actually used; most of the bank energy goes into waking up these unnecessary bits. This wasted energy can be reduced either by decreasing the array core row size, which reduces the number of unnecessary bits being activated, or by increasing the number of I/Os per array core and using fewer array cores in parallel. Since decreasing the array core row size leads to a large area penalty, the access efficiency should instead be improved by increasing the number of I/Os per array core. The motivation for this change is limited, because the energy consumed by the banks is small compared to that of the electrical inter-chip and intra-chip interconnect; also, the number of pins we can have on a chip is limited.
The increased bandwidth allows more banks per chip, enables energy savings, and does not significantly affect the area of the PIDRAM. This suggests that photonic technology will play an important role in future multiprocessor performance. Upcoming PIDRAMs should not only aim for high performance, low cost, and energy efficiency at the chip level, but also support a large range of multi-chip configurations with different capacities and bandwidths.
B. Parallelism-aware batch scheduling
In a chip multiprocessor (CMP) system, the DRAM is a frequently shared resource, and inter-thread interference can destroy the bank-level access parallelism of individual threads. Bank-level parallelism [8][9] is a method in which the requests made by threads are serviced in parallel in different banks. Parallelism-aware batch scheduling extends bank-level parallelism by dividing outstanding requests into batches and servicing one batch at a time. This method consists of two steps:
i. Request Batching
A number of DRAM requests are grouped into a
batch. These batches of requests are completed one
after the other and this step ensures that all the
requests in one batch are completed before the arrival
of the next batch. Once serviced, a batch's requests are removed from the memory request buffer, and only then is a new batch formed. When forming a
new batch, the batching component decides how
many requests issued by a thread for a certain bank
can be a part of a batch. Batching not only ensures that
all the requests are taken care of, but also provides a
uniform granularity due to which the performance
improves.
A fixed number of DRAM requests are grouped into a batch based on their arrival time. Even with interference from other threads, the bank-level access parallelism of each thread is preserved. Prioritizing the oldest requests guarantees that the oldest batch is served first and prevents any thread from being starved in the DRAM system by other, potentially aggressive threads. Batching reduces the serialization of thread requests by executing them in parallel rather than allowing them to run alone in the memory system.
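A minimal sketch of the batching step follows, assuming a per-thread, per-bank cap on how many requests may be marked into a batch (the marking-cap idea of [9]); the Request record and its field names are hypothetical.

from collections import defaultdict
from dataclasses import dataclass

@dataclass
class Request:              # illustrative request record (hypothetical fields)
    thread_id: int
    bank: int
    arrival_time: int
    marked: bool = False

def form_batch(request_buffer, marking_cap=5):
    # Mark up to `marking_cap` of the oldest requests per (thread, bank);
    # the marked requests form the batch serviced before any unmarked one.
    counts, batch = defaultdict(int), []
    for req in sorted(request_buffer, key=lambda r: r.arrival_time):
        key = (req.thread_id, req.bank)
        if counts[key] < marking_cap:
            counts[key] += 1
            req.marked = True
            batch.append(req)
    return batch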
ii. Parallelism-Aware Within-Batch Scheduling
In this step, the requests of every thread in a batch are serviced in parallel in the DRAM banks. This hides latency inside the batch and increases processor throughput, since many requests are serviced at once. Parallelism-aware within-batch scheduling tries to maximize:
• Row-buffer locality:
Bank accesses will have lower latencies if a high
row-hit rate is present within a batch.
• Intra-thread bank parallelism:
Scheduling multiple requests from a thread to
various banks in parallel reduces the thread’s stall
time.
This scheduling uses thread prioritization to exploit both row-buffer locality and bank parallelism. Thread ranking is done by the max rule, in which the scheduler finds, for each thread, the maximum number of marked requests it has outstanding for any single bank (its max-load), and by a tie-breaking total rule, in which the scheduler tracks each thread's total number of marked requests (its total-load); threads with lower loads are assigned higher rank [9].
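The max rule and total-load tie-breaker can be expressed compactly. The sketch below, reusing the hypothetical Request records from the previous example, ranks threads so that lower loads come first; it illustrates the rule as summarized above and is not code from [9].

from collections import defaultdict

def rank_threads(batch):
    # Rank threads by the max rule (lowest per-bank max-load first), with
    # ties broken by the total rule (lowest total-load first), as in [9].
    per_bank = defaultdict(lambda: defaultdict(int))
    total = defaultdict(int)
    for req in batch:
        per_bank[req.thread_id][req.bank] += 1
        total[req.thread_id] += 1
    return sorted(total, key=lambda t: (max(per_bank[t].values()), total[t]))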
IV. SUBARRAY LEVEL
Each subarray consists of a two-dimensional array of DRAM cells, and the data stored in these cells is accessed in terms of rows. A requested row is first transferred to a row buffer that is common to all the rows of the subarray, and the data is then accessed from that buffer. The following two sub-sections explain how accesses can be made faster by modifying the subarray.
A. Hierarchical multi-bank DRAM
Embedded DRAM or eDRAM is a dynamic
random-access memory integrated on the same die or
multi-chip module of an ASIC or microprocessor.
eDRAM allows for larger buses and higher operating speeds due to the higher density of DRAM, but it cannot handle the number of memory accesses generated by a high-performance processor, which creates a bottleneck: successive accesses that need the same bank must queue up and serialize. One solution is parallelism-aware batch scheduling (discussed in III.B). Another is simply to increase the number of independent DRAM banks to lower the probability of a conflict, but doing so requires a larger area. The number of independent banks can instead be increased without affecting the area much by allowing a subarray to act as a bank whenever the DRAM chip receives a request to that particular subarray. This allows the subarrays to act as semi-independent banks [3].
After dividing the DRAM banks into subarrays,
for the subarrays to act as semi-independent sub-
banks, some additions and modifications have to be
made to each subarray. The banks in a DRAM chip use registers and control logic to support data access, so a few pipeline registers and controls, a set-reset flip-flop to hold the subarray output, and buffers to hold the addresses for the access must be added to each subarray. The DRAM's access queues must also be modified to detect accesses that do not cause conflicts and to start those conflict-free accesses in parallel.
Figure 7.a. Modifications made to each subarray [3]
(Taken without permission)
Figure 7.b. Modifications made to the access queues [3]
(Taken without permission)
This is a useful approach, since a large part of each DRAM access actually occurs locally within individual DRAM subarrays. Individual subarrays within independent banks are controlled as semi-independent sub-banks that share the main bank's I/O circuitry and decoders. The sub-banks perform far better while incurring only a small area penalty.
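The modified access-queue behaviour can be sketched as a simple conflict check: requests targeting idle subarrays may start in parallel, while requests to a busy subarray wait. The queue entries and the `subarray` field are illustrative assumptions, not structures from [3].

# Sketch of conflict-free issue from a bank's access queue (in the spirit
# of [3]): requests to distinct, idle subarrays start in parallel; requests
# to a busy subarray stay queued until it becomes free.
def issue_parallel(queue, busy_subarrays):
    issued = []
    for req in list(queue):
        if req.subarray not in busy_subarrays:    # no conflict: start now
            busy_subarrays.add(req.subarray)
            queue.remove(req)
            issued.append(req)
    return issued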
B. Fault tolerance in DRAMs
Errors occur often in DRAMs, leading to significant downtime in datacentres, so the DRAM architecture has to be developed to provide a high standard of reliability. Error-resilient schemes, called chipkill [5], can be built to tolerate such bit failures. Isolating an entire cache line to a single small subarray on a single DRAM chip allows an entire cache line to be read out of a single DRAM array, but it increases the potential for correlated errors. To provide chipkill-level reliability in concert with single small subarrays, checksums [5] stored alongside each cache line in the DRAM are introduced, similar to those used in hard drives.
The checksum provides robust error detection, and chipkill-level reliability is provided through a Redundant Array of Inexpensive DRAMs [6]. In a Redundant Array of Inexpensive DRAMs, analogous to RAID for disks, a single chip serves as a parity check for several other chips; on an access, only one chip out of every 'n' is read, and the checksum associated with the read block lets the controller know whether the read is correct. This approach is more effective in terms of area and energy than prior chipkill approaches, and incurs a performance penalty only relative to a single-subarray memory system without chipkill.
Figure 8. Chipkill support in a single-subarray memory system (64KB) [5]
(Taken without permission)
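The checksum-plus-parity idea can be sketched as follows. The toy 8-bit checksum and the data layout (a list of per-chip blocks plus one parity block that is the XOR of all of them) are assumptions for illustration; [5] and [6] describe the actual schemes.

from functools import reduce

def checksum(line: bytes) -> int:
    return sum(line) & 0xFF            # toy 8-bit checksum (illustrative)

def xor_blocks(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

# Read one chip's block; if its stored checksum mismatches, rebuild the
# block from the remaining chips and the parity block (XOR of all blocks).
def read_line(chips, parity, idx, stored_sums):
    line = chips[idx]
    if checksum(line) == stored_sums[idx]:
        return line                    # common case: a single chip is read
    others = [c for i, c in enumerate(chips) if i != idx]
    return reduce(xor_blocks, others + [parity])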
V. ROW LEVEL
As the number of cores in a processor increases, the demand for off-chip memory increases as well, exacerbating the main memory access bottleneck. Many solutions have been proposed for this problem. One of them is to use on-chip DRAM as the last level of cache to improve performance for a given off-chip memory bandwidth. This on-chip DRAM cache increases cache capacity through the high density of DRAM, and it improves on-chip communication through a high-bandwidth, low-latency interconnect.
In a cache, the storage is mapped to the memory addresses it serves, and there are different ways this mapping can be done. The choice of mapping is so critical to the design that a cache is often named after its mapping, as in an N-way set-associative cache. The same goes for DRAM. Each row in a subarray of a bank of a DRAM chip holds a series of DRAM cells, and all the rows in a subarray share a common row buffer used to access data. Using DRAM as a cache requires a mapping between the DRAM rows and the main memory system. The following two sub-sections explain how: the first describes how set mapping works, and later in the section we see how the row buffer can be modified to make data accesses faster.
A. Set mapping policy
As explained in the introduction, the DRAM cache
is a multi-bank system, with each bank having a
number of rows. The DRAM cache uses a set mapping policy [1], in which memory blocks are mapped to a particular set of a particular row of a particular bank. The set mapping policy directly affects system throughput by affecting the DRAM cache miss rate and the DRAM cache hit latency, which makes it an important aspect of the cache.
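A set-mapping policy of this kind can be sketched as a decomposition of a memory block address into set, row, and bank fields. The field order below (set in the lowest bits, then row, then bank) makes consecutive blocks fall into the same row, favouring row-buffer hits; the geometry constants are illustrative, except that 29 sets per row echoes the 4KB-row example of [1] (Figure 10).

def map_block(block_addr, sets_per_row=29, num_rows=4096, num_banks=8):
    # Set index in the lowest bits: consecutive blocks land in the same
    # row, favouring row-buffer hits. Constants are illustrative only.
    set_idx = block_addr % sets_per_row
    row = (block_addr // sets_per_row) % num_rows
    bank = (block_addr // (sets_per_row * num_rows)) % num_banks
    return bank, row, set_idx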
Figure 9. DRAM cache hierarchy (Intel)
(Taken from the website)
New DRAM set mapping policies are proposed regularly to reduce the DRAM cache miss rate. Through higher associativity we can reduce DRAM latency via an improved row buffer hit rate. A typical DRAM cache organization has multiple banks, each with subarrays, and each subarray contains an array of rows and columns of DRAM cells. Each DRAM bank provides a row buffer, which buffers one row of that bank; data in a DRAM bank is accessed after it is fetched through the row buffer.
Figure 10. 29-way associativity for a 4KB row [1]
(Taken without permission)
Associativity trades off hit ratio against search speed. A direct-mapped cache has a fast search but a poorer hit ratio, while a fully associative cache has a better hit ratio but a slower search. In other words, as associativity increases, the hit ratio improves and the search speed decreases, so we need to choose a reasonable associativity. As said before, a
higher associativity decreases the cache miss rate
significantly. The DRAM cache row is divided into
tag blocks and cache lines. Each bank of the DRAM
cache is associated with a row buffer that holds the
last accessed row of that bank. If the associativity of the DRAM row organization is increased, a cache access first hits the tag block instead of the whole cache block, which reduces access latency. A higher-associativity cache may slightly increase tag latency compared to a lower-associativity one, but it benefits from the reduced conflict misses. It also provides a higher row buffer hit rate than a simple cache, because more consecutive memory blocks are mapped to the same set.
B. Modifying the row buffer
Present DRAM cache banks have a single row buffer. Having multiple smaller row buffers instead of the existing single large one helps improve row hit rates and also reduces the energy required for row activation [7]. As explained earlier, on a read request the data must first be brought into the row buffer, and the row data is then read from the buffer. Each row buffer is as wide as an entire row and holds a few KB of data. The precharge writes back the
row buffer to the appropriate row after a column
read/write of the selected words from/to the row
buffer. The precharge operation involves charging
and discharging of a large number of capacitors.
In a multi-core processor's memory, addresses are spread evenly across memory banks to compensate for the relatively slow speed of DRAM, which decreases the row buffer hit rate. We can recover this lost row buffer hit rate by dividing the row buffer into multiple smaller row buffers. The new organization requires sub-row activation in addition to row activation and row-buffer selection, so the controller has to send additional address bits to the DRAM cache to select the sub-row to activate and the row buffer to bring it into. The memory controller allocates and manages the row buffers, giving the DRAM logic the additional flexibility to implement many other buffer allocation policies.
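A sketch of the reorganized access path follows: a column address selects a sub-row within the row, and a simple policy maps it to one of several small buffers. The geometry and the allocation policy are illustrative assumptions; [7] leaves buffer allocation to the memory controller.

class SubRowBank:
    def __init__(self, num_buffers=4, subrows_per_row=4, row_size=4096):
        self.num_buffers = num_buffers
        self.subrow_size = row_size // subrows_per_row
        self.tags = [None] * num_buffers    # each holds a (row, subrow) tag

    def access(self, row, col):
        subrow = col // self.subrow_size           # sub-row within the row
        buf = (row + subrow) % self.num_buffers    # toy allocation policy
        if self.tags[buf] != (row, subrow):
            self.tags[buf] = (row, subrow)         # activate only this sub-row
            return buf, "activate"
        return buf, "hit"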
Figure 11. Reorganized DRAM bank structure to support sub-rows and buffer selection [7]
(Taken without permission)
VI. CONCLUSION
From the problems we discussed above, it is clear
that improving the memory system is the top priority
to achieve greater speeds. DRAM plays an important role in the memory system, and therefore more techniques should be developed to improve it. DRAM is a hierarchical system with four levels, and the components at each level can be improved by replacing, modifying, or reorganizing them. All of the performance techniques discussed above improve DRAM efficiency significantly, and they do so in different ways: some decrease power, some increase throughput, and some hide latency. The final taxonomy we obtained by analysing various techniques to improve DRAM performance is shown in figure 12. We can also conclude that photonic technology will play a crucial role in the future of processors and memory systems.
ACKNOWLEDGEMENT
I thank Dr. Soner Onder for his valuable comments on earlier drafts and for his patience throughout the process.
Figure 12. Resulting taxonomy of our analysis: at the chip level, PIDRAM memory channel organization and PIDRAM chip organization; at the bank level, PIDRAM bank organization and parallelism-aware batch scheduling; at the subarray level, hierarchical multi-bank DRAM and fault tolerance; at the row level, set mapping policy and row buffer modification.
REFERENCES
[1] Hameed, F., Bauer, L., Henkel, J., "Architecting On-Chip DRAM Cache for Simultaneous Miss Rate and Latency Reduction," in IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. PP, no. 99, pp. 1-1, Oct. 2015
[2] Donghyuk Lee, Yoongu Kim, Pekhimenko, G.,
Khan, S., Seshadri, V., Chang, K., Mutlu, O.,
"Adaptive-latency DRAM: Optimizing DRAM
timing for the common-case," in High Performance
Computer Architecture (HPCA), 2015 IEEE 21st
International Symposium on, pp.489-501, 7-11 Feb.
2015
[3] T. Yamauchi, L. Hammond and K. Olukotun, "The
hierarchical multi-bank DRAM: a high-performance
architecture for memory integrated with
processors," Advanced Research in VLSI, 1997.
Proceedings, Seventeenth Conference on, Ann Arbor,
MI, 1997, pp. 303-319.
[4] Scott Beamer, Chen Sun, Yong-Jin Kwon, Ajay
Joshi, Christopher Batten, Vladimir Stojanović, and
Krste Asanović, “Re-architecting DRAM memory
systems with monolithically integrated silicon
photonics,” in Proceedings of the 37th annual
international symposium on Computer
architecture (ISCA '10). ACM, New York, NY, USA,
pp. 129-140, 2010
[5] Aniruddha N. Udipi, Naveen Muralimanohar,
Niladrish Chatterjee, Rajeev Balasubramonian, Al
Davis, and Norman P. Jouppi, “Rethinking DRAM
design and organization for energy-constrained multi-
cores,” in Proceedings of the 37th annual
international symposium on Computer
architecture (ISCA '10). ACM, New York, NY, USA,
pp. 175-186, 2010
[6] J. L. Hennessy and D. A. Patterson. Computer
Architecture: A Quantitative Approach. Elsevier, 4th
edition, 2007.
[7] Gulur N., Manikantan R., Govindarajan R.,
Mehendale M., "Row-Buffer Reorganization:
Simultaneously Improving Performance and
Reducing Energy in DRAMs," in Parallel
Architectures and Compilation Techniques (PACT),
2011 International Conference on, pp.189-190, 10-14
Oct. 2011
[8] Chang, K.K.-W., Donghyuk Lee, Chishti, Z., Alameldeen, A.R., Wilkerson, C., Yoongu Kim, Mutlu, O., "Improving DRAM performance by parallelizing refreshes with accesses," in High Performance Computer Architecture (HPCA), 2014 IEEE 20th International Symposium on, pp. 356-367, 15-19 Feb. 2014
[9] Mutlu, O., Moscibroda, T., "Parallelism-Aware
Batch Scheduling: Enhancing both Performance and
Fairness of Shared DRAM Systems," in Computer
Architecture, 2008. ISCA '08. 35th International
Symposium on, pp. 63-74, 21-25 June 2008
[10] Vivek Seshadri, Yoongu Kim, Chris Fallin,
Donghyuk Lee, Rachata Ausavarungnirun, Gennady
Pekhimenko, Yixin Luo, Onur Mutlu, Phillip B.
Gibbons, Michael A. Kozuch, and Todd C. Mowry,
“RowClone: fast and energy-efficient in-DRAM bulk
data copy and initialization,” in Proceedings of the
46th Annual IEEE/ACM International Symposium on
Microarchitecture (MICRO-46). ACM, New York,
NY, USA, pp. 185-197, 2013

More Related Content

What's hot

Bai07 bo nho
Bai07   bo nhoBai07   bo nho
Bai07 bo nhoVũ Sang
 
Chapter 2 - Computer Evolution and Performance
Chapter 2 - Computer Evolution and PerformanceChapter 2 - Computer Evolution and Performance
Chapter 2 - Computer Evolution and Performance
César de Souza
 
Overview of-dram
Overview of-dramOverview of-dram
Overview of-dram
Thiên Nguyễn
 
Phân tích kiến trúc và nguyên lý làm việc của bộ nhớ RAM chuẩn DDRAM
Phân tích kiến trúc và nguyên lý làm việc của bộ nhớ RAM chuẩn DDRAMPhân tích kiến trúc và nguyên lý làm việc của bộ nhớ RAM chuẩn DDRAM
Phân tích kiến trúc và nguyên lý làm việc của bộ nhớ RAM chuẩn DDRAM
nataliej4
 
Memory hierarchy
Memory hierarchyMemory hierarchy
Memory hierarchy
Anurag Verma
 
Slide hệ điều hành học viện công nghệ Bưu Chính viễn thông.pdf
Slide hệ điều hành học viện công nghệ Bưu Chính viễn thông.pdfSlide hệ điều hành học viện công nghệ Bưu Chính viễn thông.pdf
Slide hệ điều hành học viện công nghệ Bưu Chính viễn thông.pdf
NhmL7
 
Xây dựng hệ thống quản lý sân bóng sử dụng Yii Framework
Xây dựng hệ thống quản lý sân bóng sử dụng Yii FrameworkXây dựng hệ thống quản lý sân bóng sử dụng Yii Framework
Xây dựng hệ thống quản lý sân bóng sử dụng Yii Framework
GMO-Z.com Vietnam Lab Center
 
ICMP-Học viện Kỹ thuật Mật mã
ICMP-Học viện Kỹ thuật Mật mãICMP-Học viện Kỹ thuật Mật mã
ICMP-Học viện Kỹ thuật Mật mã
Dũng Trần
 
Memory organization
Memory organizationMemory organization
Memory organization
ishapadhy
 
Luận văn: Phát hiện xâm nhập theo thời gian thực trong internet
Luận văn: Phát hiện xâm nhập theo thời gian thực trong internetLuận văn: Phát hiện xâm nhập theo thời gian thực trong internet
Luận văn: Phát hiện xâm nhập theo thời gian thực trong internet
Dịch vụ viết bài trọn gói ZALO 0917193864
 
Đề tài: Xây dựng phần mềm quản lý thông tin nhân sự ĐH Hải Phòng
Đề tài: Xây dựng phần mềm quản lý thông tin nhân sự ĐH Hải PhòngĐề tài: Xây dựng phần mềm quản lý thông tin nhân sự ĐH Hải Phòng
Đề tài: Xây dựng phần mềm quản lý thông tin nhân sự ĐH Hải Phòng
Dịch vụ viết bài trọn gói ZALO: 0909232620
 
bài giảng ký thuật vi xử lý PTIT
bài giảng ký thuật vi xử lý PTITbài giảng ký thuật vi xử lý PTIT
bài giảng ký thuật vi xử lý PTIT
NguynMinh294
 
Đồ án tốt nghiệp điện tử Điều khiển và giám sát thiết bị điện gia đình - sdt/...
Đồ án tốt nghiệp điện tử Điều khiển và giám sát thiết bị điện gia đình - sdt/...Đồ án tốt nghiệp điện tử Điều khiển và giám sát thiết bị điện gia đình - sdt/...
Đồ án tốt nghiệp điện tử Điều khiển và giám sát thiết bị điện gia đình - sdt/...
Viết thuê báo cáo thực tập giá rẻ
 
Phân tích thiết kế hệ thống của hàng bán điện thoại di động
Phân tích thiết kế hệ thống của hàng bán điện thoại di độngPhân tích thiết kế hệ thống của hàng bán điện thoại di động
Phân tích thiết kế hệ thống của hàng bán điện thoại di động
Nguyễn Danh Thanh
 
02 Computer Evolution And Performance
02  Computer  Evolution And  Performance02  Computer  Evolution And  Performance
02 Computer Evolution And Performance
Jeanie Delos Arcos
 
ỨNG DỤNG DEEP LEARNING ĐỂ ĐẾM SỐ LƯỢNG XE ÔTÔ TRONG NỘI THÀNH ĐÀ NẴNG 51920ed2
ỨNG DỤNG DEEP LEARNING ĐỂ ĐẾM SỐ LƯỢNG XE ÔTÔ TRONG NỘI THÀNH ĐÀ NẴNG 51920ed2ỨNG DỤNG DEEP LEARNING ĐỂ ĐẾM SỐ LƯỢNG XE ÔTÔ TRONG NỘI THÀNH ĐÀ NẴNG 51920ed2
ỨNG DỤNG DEEP LEARNING ĐỂ ĐẾM SỐ LƯỢNG XE ÔTÔ TRONG NỘI THÀNH ĐÀ NẴNG 51920ed2
nataliej4
 
Luận văn: Xây dựng website quản lý trả chứng chỉ ICDL, HAY
Luận văn: Xây dựng website quản lý trả chứng chỉ ICDL, HAYLuận văn: Xây dựng website quản lý trả chứng chỉ ICDL, HAY
Luận văn: Xây dựng website quản lý trả chứng chỉ ICDL, HAY
Dịch vụ viết bài trọn gói ZALO 0917193864
 
Đề tài: Hệ thống hỗ trợ đăng ký đề tài nghiên cứu khoa học trên web
Đề tài: Hệ thống hỗ trợ đăng ký đề tài nghiên cứu khoa học trên webĐề tài: Hệ thống hỗ trợ đăng ký đề tài nghiên cứu khoa học trên web
Đề tài: Hệ thống hỗ trợ đăng ký đề tài nghiên cứu khoa học trên web
Dịch vụ viết bài trọn gói ZALO: 0909232620
 
Tiếng anh chuyên ngành cntt
Tiếng anh chuyên ngành cnttTiếng anh chuyên ngành cntt
Tiếng anh chuyên ngành cntt
thientinh199
 
ĐỀ TÀI : ĐIỂM DANH BẰNG NHẬN DIỆN KHUÔN MẶT. Giảng viên : PGS.TS. HUỲNH CÔNG ...
ĐỀ TÀI : ĐIỂM DANH BẰNG NHẬN DIỆN KHUÔN MẶT. Giảng viên : PGS.TS. HUỲNH CÔNG ...ĐỀ TÀI : ĐIỂM DANH BẰNG NHẬN DIỆN KHUÔN MẶT. Giảng viên : PGS.TS. HUỲNH CÔNG ...
ĐỀ TÀI : ĐIỂM DANH BẰNG NHẬN DIỆN KHUÔN MẶT. Giảng viên : PGS.TS. HUỲNH CÔNG ...
nataliej4
 

What's hot (20)

Bai07 bo nho
Bai07   bo nhoBai07   bo nho
Bai07 bo nho
 
Chapter 2 - Computer Evolution and Performance
Chapter 2 - Computer Evolution and PerformanceChapter 2 - Computer Evolution and Performance
Chapter 2 - Computer Evolution and Performance
 
Overview of-dram
Overview of-dramOverview of-dram
Overview of-dram
 
Phân tích kiến trúc và nguyên lý làm việc của bộ nhớ RAM chuẩn DDRAM
Phân tích kiến trúc và nguyên lý làm việc của bộ nhớ RAM chuẩn DDRAMPhân tích kiến trúc và nguyên lý làm việc của bộ nhớ RAM chuẩn DDRAM
Phân tích kiến trúc và nguyên lý làm việc của bộ nhớ RAM chuẩn DDRAM
 
Memory hierarchy
Memory hierarchyMemory hierarchy
Memory hierarchy
 
Slide hệ điều hành học viện công nghệ Bưu Chính viễn thông.pdf
Slide hệ điều hành học viện công nghệ Bưu Chính viễn thông.pdfSlide hệ điều hành học viện công nghệ Bưu Chính viễn thông.pdf
Slide hệ điều hành học viện công nghệ Bưu Chính viễn thông.pdf
 
Xây dựng hệ thống quản lý sân bóng sử dụng Yii Framework
Xây dựng hệ thống quản lý sân bóng sử dụng Yii FrameworkXây dựng hệ thống quản lý sân bóng sử dụng Yii Framework
Xây dựng hệ thống quản lý sân bóng sử dụng Yii Framework
 
ICMP-Học viện Kỹ thuật Mật mã
ICMP-Học viện Kỹ thuật Mật mãICMP-Học viện Kỹ thuật Mật mã
ICMP-Học viện Kỹ thuật Mật mã
 
Memory organization
Memory organizationMemory organization
Memory organization
 
Luận văn: Phát hiện xâm nhập theo thời gian thực trong internet
Luận văn: Phát hiện xâm nhập theo thời gian thực trong internetLuận văn: Phát hiện xâm nhập theo thời gian thực trong internet
Luận văn: Phát hiện xâm nhập theo thời gian thực trong internet
 
Đề tài: Xây dựng phần mềm quản lý thông tin nhân sự ĐH Hải Phòng
Đề tài: Xây dựng phần mềm quản lý thông tin nhân sự ĐH Hải PhòngĐề tài: Xây dựng phần mềm quản lý thông tin nhân sự ĐH Hải Phòng
Đề tài: Xây dựng phần mềm quản lý thông tin nhân sự ĐH Hải Phòng
 
bài giảng ký thuật vi xử lý PTIT
bài giảng ký thuật vi xử lý PTITbài giảng ký thuật vi xử lý PTIT
bài giảng ký thuật vi xử lý PTIT
 
Đồ án tốt nghiệp điện tử Điều khiển và giám sát thiết bị điện gia đình - sdt/...
Đồ án tốt nghiệp điện tử Điều khiển và giám sát thiết bị điện gia đình - sdt/...Đồ án tốt nghiệp điện tử Điều khiển và giám sát thiết bị điện gia đình - sdt/...
Đồ án tốt nghiệp điện tử Điều khiển và giám sát thiết bị điện gia đình - sdt/...
 
Phân tích thiết kế hệ thống của hàng bán điện thoại di động
Phân tích thiết kế hệ thống của hàng bán điện thoại di độngPhân tích thiết kế hệ thống của hàng bán điện thoại di động
Phân tích thiết kế hệ thống của hàng bán điện thoại di động
 
02 Computer Evolution And Performance
02  Computer  Evolution And  Performance02  Computer  Evolution And  Performance
02 Computer Evolution And Performance
 
ỨNG DỤNG DEEP LEARNING ĐỂ ĐẾM SỐ LƯỢNG XE ÔTÔ TRONG NỘI THÀNH ĐÀ NẴNG 51920ed2
ỨNG DỤNG DEEP LEARNING ĐỂ ĐẾM SỐ LƯỢNG XE ÔTÔ TRONG NỘI THÀNH ĐÀ NẴNG 51920ed2ỨNG DỤNG DEEP LEARNING ĐỂ ĐẾM SỐ LƯỢNG XE ÔTÔ TRONG NỘI THÀNH ĐÀ NẴNG 51920ed2
ỨNG DỤNG DEEP LEARNING ĐỂ ĐẾM SỐ LƯỢNG XE ÔTÔ TRONG NỘI THÀNH ĐÀ NẴNG 51920ed2
 
Luận văn: Xây dựng website quản lý trả chứng chỉ ICDL, HAY
Luận văn: Xây dựng website quản lý trả chứng chỉ ICDL, HAYLuận văn: Xây dựng website quản lý trả chứng chỉ ICDL, HAY
Luận văn: Xây dựng website quản lý trả chứng chỉ ICDL, HAY
 
Đề tài: Hệ thống hỗ trợ đăng ký đề tài nghiên cứu khoa học trên web
Đề tài: Hệ thống hỗ trợ đăng ký đề tài nghiên cứu khoa học trên webĐề tài: Hệ thống hỗ trợ đăng ký đề tài nghiên cứu khoa học trên web
Đề tài: Hệ thống hỗ trợ đăng ký đề tài nghiên cứu khoa học trên web
 
Tiếng anh chuyên ngành cntt
Tiếng anh chuyên ngành cnttTiếng anh chuyên ngành cntt
Tiếng anh chuyên ngành cntt
 
ĐỀ TÀI : ĐIỂM DANH BẰNG NHẬN DIỆN KHUÔN MẶT. Giảng viên : PGS.TS. HUỲNH CÔNG ...
ĐỀ TÀI : ĐIỂM DANH BẰNG NHẬN DIỆN KHUÔN MẶT. Giảng viên : PGS.TS. HUỲNH CÔNG ...ĐỀ TÀI : ĐIỂM DANH BẰNG NHẬN DIỆN KHUÔN MẶT. Giảng viên : PGS.TS. HUỲNH CÔNG ...
ĐỀ TÀI : ĐIỂM DANH BẰNG NHẬN DIỆN KHUÔN MẶT. Giảng viên : PGS.TS. HUỲNH CÔNG ...
 

Viewers also liked

My Personal Leadership Model 2011
My Personal Leadership Model 2011My Personal Leadership Model 2011
My Personal Leadership Model 2011
Alishap
 
Componentes para el uso de la TIC en educaciòn
Componentes para el uso de la TIC en educaciònComponentes para el uso de la TIC en educaciòn
Componentes para el uso de la TIC en educaciòn
Angel Rivas
 
Xp
XpXp
What are they doing
What are they doingWhat are they doing
What are they doing
Fernando IBM
 
RESUME Final_updated on 15.02.2016
RESUME Final_updated on 15.02.2016RESUME Final_updated on 15.02.2016
RESUME Final_updated on 15.02.2016
CS Rakesh Kasar
 
John Gouthro 2016 Resume
John Gouthro 2016 ResumeJohn Gouthro 2016 Resume
John Gouthro 2016 Resume
John Gouthro
 
грипп 2016 и орит
грипп 2016 и оритгрипп 2016 и орит
грипп 2016 и орит
Ксения Емануилова
 
Can Stress Cause Heartburn?
Can Stress Cause Heartburn?Can Stress Cause Heartburn?
Can Stress Cause Heartburn?
elise Rivas
 
NCC_Protocol_WEB_2016-07-12
NCC_Protocol_WEB_2016-07-12NCC_Protocol_WEB_2016-07-12
NCC_Protocol_WEB_2016-07-12
Baptiste Cassan-Barnel
 
Risk dg 19 may 2016 presentation slides
Risk dg 19 may 2016 presentation slidesRisk dg 19 may 2016 presentation slides
Risk dg 19 may 2016 presentation slides
Nabila Gimadi
 
Swell Bottle Pitch2
Swell Bottle Pitch2Swell Bottle Pitch2
Swell Bottle Pitch2
Katie O'Hara
 

Viewers also liked (11)

My Personal Leadership Model 2011
My Personal Leadership Model 2011My Personal Leadership Model 2011
My Personal Leadership Model 2011
 
Componentes para el uso de la TIC en educaciòn
Componentes para el uso de la TIC en educaciònComponentes para el uso de la TIC en educaciòn
Componentes para el uso de la TIC en educaciòn
 
Xp
XpXp
Xp
 
What are they doing
What are they doingWhat are they doing
What are they doing
 
RESUME Final_updated on 15.02.2016
RESUME Final_updated on 15.02.2016RESUME Final_updated on 15.02.2016
RESUME Final_updated on 15.02.2016
 
John Gouthro 2016 Resume
John Gouthro 2016 ResumeJohn Gouthro 2016 Resume
John Gouthro 2016 Resume
 
грипп 2016 и орит
грипп 2016 и оритгрипп 2016 и орит
грипп 2016 и орит
 
Can Stress Cause Heartburn?
Can Stress Cause Heartburn?Can Stress Cause Heartburn?
Can Stress Cause Heartburn?
 
NCC_Protocol_WEB_2016-07-12
NCC_Protocol_WEB_2016-07-12NCC_Protocol_WEB_2016-07-12
NCC_Protocol_WEB_2016-07-12
 
Risk dg 19 may 2016 presentation slides
Risk dg 19 may 2016 presentation slidesRisk dg 19 may 2016 presentation slides
Risk dg 19 may 2016 presentation slides
 
Swell Bottle Pitch2
Swell Bottle Pitch2Swell Bottle Pitch2
Swell Bottle Pitch2
 

Similar to Improving DRAM performance

301378156 design-of-sram-in-verilog
301378156 design-of-sram-in-verilog301378156 design-of-sram-in-verilog
301378156 design-of-sram-in-verilog
Srinivas Naidu
 
Time and Low Power Operation Using Embedded Dram to Gain Cell Data Retention
Time and Low Power Operation Using Embedded Dram to Gain Cell Data RetentionTime and Low Power Operation Using Embedded Dram to Gain Cell Data Retention
Time and Low Power Operation Using Embedded Dram to Gain Cell Data Retention
IJMTST Journal
 
IRJET- Design And VLSI Verification of DDR SDRAM Controller Using VHDL
IRJET- Design And VLSI Verification of DDR SDRAM Controller Using VHDLIRJET- Design And VLSI Verification of DDR SDRAM Controller Using VHDL
IRJET- Design And VLSI Verification of DDR SDRAM Controller Using VHDL
IRJET Journal
 
Sram pdf
Sram pdfSram pdf
Intelligent ram
Intelligent ramIntelligent ram
Intelligent ram
Nitin Goyal
 
Chapter5 the memory-system-jntuworld
Chapter5 the memory-system-jntuworldChapter5 the memory-system-jntuworld
Chapter5 the memory-system-jntuworld
Praveen Kumar
 
Embedded dram
Embedded dramEmbedded dram
Embedded dram
Shrikrishna Parab
 
DESIGN AND IMPLEMENTATION OF 4T, 3T AND 3T1D DRAM CELL DESIGN ON 32 NM TECHNO...
DESIGN AND IMPLEMENTATION OF 4T, 3T AND 3T1D DRAM CELL DESIGN ON 32 NM TECHNO...DESIGN AND IMPLEMENTATION OF 4T, 3T AND 3T1D DRAM CELL DESIGN ON 32 NM TECHNO...
DESIGN AND IMPLEMENTATION OF 4T, 3T AND 3T1D DRAM CELL DESIGN ON 32 NM TECHNO...
VLSICS Design
 
Design and implementation of 4 t, 3t and 3t1d dram cell design on 32 nm techn...
Design and implementation of 4 t, 3t and 3t1d dram cell design on 32 nm techn...Design and implementation of 4 t, 3t and 3t1d dram cell design on 32 nm techn...
Design and implementation of 4 t, 3t and 3t1d dram cell design on 32 nm techn...
VLSICS Design
 
UNIT 3.docx
UNIT 3.docxUNIT 3.docx
UNIT 3.docx
Nagendrababu Vasa
 
unit4 and unit5.pptx
unit4 and unit5.pptxunit4 and unit5.pptx
unit4 and unit5.pptx
bobbyk11
 
Internal memory
Internal memoryInternal memory
Internal memory
Riya Choudhary
 
L010236974
L010236974L010236974
L010236974
IOSR Journals
 
MAC: A NOVEL SYSTEMATICALLY MULTILEVEL CACHE REPLACEMENT POLICY FOR PCM MEMORY
MAC: A NOVEL SYSTEMATICALLY MULTILEVEL CACHE REPLACEMENT POLICY FOR PCM MEMORYMAC: A NOVEL SYSTEMATICALLY MULTILEVEL CACHE REPLACEMENT POLICY FOR PCM MEMORY
MAC: A NOVEL SYSTEMATICALLY MULTILEVEL CACHE REPLACEMENT POLICY FOR PCM MEMORY
caijjournal
 
CA assignment group.pptx
CA assignment group.pptxCA assignment group.pptx
CA assignment group.pptx
HAIDERALICH3
 
Literature survey presentation
Literature survey presentationLiterature survey presentation
Literature survey presentation
Karthik Iyr
 
Nvram applications in the architectural revolutions of main memory implementa...
Nvram applications in the architectural revolutions of main memory implementa...Nvram applications in the architectural revolutions of main memory implementa...
Nvram applications in the architectural revolutions of main memory implementa...
IAEME Publication
 
Accelerix ISSCC 1998 Paper
Accelerix ISSCC 1998 PaperAccelerix ISSCC 1998 Paper
Accelerix ISSCC 1998 Paper
Imagination Technologies
 
DesignCon 2015-criticalmemoryperformancemetricsforDDR4
DesignCon 2015-criticalmemoryperformancemetricsforDDR4DesignCon 2015-criticalmemoryperformancemetricsforDDR4
DesignCon 2015-criticalmemoryperformancemetricsforDDR4
Barbara Aichinger
 
Bg4103362367
Bg4103362367Bg4103362367
Bg4103362367
IJERA Editor
 

Similar to Improving DRAM performance (20)

301378156 design-of-sram-in-verilog
301378156 design-of-sram-in-verilog301378156 design-of-sram-in-verilog
301378156 design-of-sram-in-verilog
 
Time and Low Power Operation Using Embedded Dram to Gain Cell Data Retention
Time and Low Power Operation Using Embedded Dram to Gain Cell Data RetentionTime and Low Power Operation Using Embedded Dram to Gain Cell Data Retention
Time and Low Power Operation Using Embedded Dram to Gain Cell Data Retention
 
IRJET- Design And VLSI Verification of DDR SDRAM Controller Using VHDL
IRJET- Design And VLSI Verification of DDR SDRAM Controller Using VHDLIRJET- Design And VLSI Verification of DDR SDRAM Controller Using VHDL
IRJET- Design And VLSI Verification of DDR SDRAM Controller Using VHDL
 
Sram pdf
Sram pdfSram pdf
Sram pdf
 
Intelligent ram
Intelligent ramIntelligent ram
Intelligent ram
 
Chapter5 the memory-system-jntuworld
Chapter5 the memory-system-jntuworldChapter5 the memory-system-jntuworld
Chapter5 the memory-system-jntuworld
 
Embedded dram
Embedded dramEmbedded dram
Embedded dram
 
DESIGN AND IMPLEMENTATION OF 4T, 3T AND 3T1D DRAM CELL DESIGN ON 32 NM TECHNO...
DESIGN AND IMPLEMENTATION OF 4T, 3T AND 3T1D DRAM CELL DESIGN ON 32 NM TECHNO...DESIGN AND IMPLEMENTATION OF 4T, 3T AND 3T1D DRAM CELL DESIGN ON 32 NM TECHNO...
DESIGN AND IMPLEMENTATION OF 4T, 3T AND 3T1D DRAM CELL DESIGN ON 32 NM TECHNO...
 
Design and implementation of 4 t, 3t and 3t1d dram cell design on 32 nm techn...
Design and implementation of 4 t, 3t and 3t1d dram cell design on 32 nm techn...Design and implementation of 4 t, 3t and 3t1d dram cell design on 32 nm techn...
Design and implementation of 4 t, 3t and 3t1d dram cell design on 32 nm techn...
 
UNIT 3.docx
UNIT 3.docxUNIT 3.docx
UNIT 3.docx
 
unit4 and unit5.pptx
unit4 and unit5.pptxunit4 and unit5.pptx
unit4 and unit5.pptx
 
Internal memory
Internal memoryInternal memory
Internal memory
 
L010236974
L010236974L010236974
L010236974
 
MAC: A NOVEL SYSTEMATICALLY MULTILEVEL CACHE REPLACEMENT POLICY FOR PCM MEMORY
MAC: A NOVEL SYSTEMATICALLY MULTILEVEL CACHE REPLACEMENT POLICY FOR PCM MEMORYMAC: A NOVEL SYSTEMATICALLY MULTILEVEL CACHE REPLACEMENT POLICY FOR PCM MEMORY
MAC: A NOVEL SYSTEMATICALLY MULTILEVEL CACHE REPLACEMENT POLICY FOR PCM MEMORY
 
CA assignment group.pptx
CA assignment group.pptxCA assignment group.pptx
CA assignment group.pptx
 
Literature survey presentation
Literature survey presentationLiterature survey presentation
Literature survey presentation
 
Nvram applications in the architectural revolutions of main memory implementa...
Nvram applications in the architectural revolutions of main memory implementa...Nvram applications in the architectural revolutions of main memory implementa...
Nvram applications in the architectural revolutions of main memory implementa...
 
Accelerix ISSCC 1998 Paper
Accelerix ISSCC 1998 PaperAccelerix ISSCC 1998 Paper
Accelerix ISSCC 1998 Paper
 
DesignCon 2015-criticalmemoryperformancemetricsforDDR4
DesignCon 2015-criticalmemoryperformancemetricsforDDR4DesignCon 2015-criticalmemoryperformancemetricsforDDR4
DesignCon 2015-criticalmemoryperformancemetricsforDDR4
 
Bg4103362367
Bg4103362367Bg4103362367
Bg4103362367
 

Improving DRAM performance

  • 1. Improving DRAM performance Prithvi Kambhampati Master of Science, Electrical and Computer Engineering Michigan Technological University Houghton, Michigan pkambham@mtu.edu Abstract—In order to reduce the growing gap between the clock speed of the processors and that of memory, more research is being done to improve the performance of memory than ever. Dynamic Random Access Memory (DRAM) is being used in the cache to make the memory accesses faster by reducing miss rate and latency. This makes the DRAM performance improvement an important aspect in today’s computation. DRAM cells are refreshed at the rank-level, periodically, in order to keep the data loss to a minimum, prevent a complete rank from accepting memory requests. This is one of the major challenges the DRAM technology is facing. The improvement to the DRAM can be made at four different levels, namely, chip level, bank level, subarray level, and row level. One of the methods to do so is by reorganizing the structure of the banks and the row buffer to improve the hit rates of DRAM. Another method is to use light to transmit data between the processor and the memory system to reduce power consumption and increase bandwidth. We also look into different set mapping policies with which data is accessed from the DRAM rows and discuss about the best solution to improve the hit rate and reduce latency. This paper shows that the methods implemented to improve the performance of DRAM are significantly affective. In addition, we also discuss about the errors that occur in DRAMs and describe the error-resilient schemes such as single subarray memory systems with chipkills that can overcome bit failures. Index terms—Dynamic Random access memory, chip level, bank level, subarray level, row level. I. INTRODUCTION In the past, the clock rates of microprocessors have increased exponentially due to process improvements, longer pipelines, and circuit design techniques. But the main memory speed did not grow as fast as the processors. Along with this, the number of cores on a single chip has been increasing and is expected to further increase in the future, and this increases the aggregate demand for off-chip memory which makes it worse to access the main memory. To address this problem, we need to design a memory system that is fast, big, and cheap. Static Random Access Memory (SRAM) is being used in cache for its speed but is not used in a large scale due its cost and low capacity. Whereas, DRAM is being used in the main memory for its large capacity and low cost. Therefore, improving the efficiency of DRAM has become a priority in the recent years. Many methods have been proposed to reduce the loss of data and improve the throughput and power efficiency. One solution is to have a DRAM memory in the memory hierarchy. In the recent past, DRAM has been employed in the memory hierarchy as it increases the capacity of cache memory via its higher density compared to the SRAM cells. DRAM also has a higher bandwidth and lower latency compared to the off-chip memory. DRAM memory seems like a good solution to bring down the memory wall (gap between the processor speed and memory speed). The increased implementation of DRAM memory has led to more and more research by both industry and the academic institutions. Their main aim is to improve the performance of DRAM memory in today’s computation. For this purpose, there have been many methods proposed for a given limited off-chip memory bandwidth. 
Like many things, a DRAM chip also has a structure (discussed below), and can be subdivided into many parts. This means that there is a possibility to improve the characteristics of each and every of these parts. A DRAM chip is made of capacitor based cells that represent the data in the form of electric charge. To store data in a cell, charge is injected, whereas to retrieve data, the charge is extracted [2]. As shown in figure 1, a typical DRAM chip has a hierarchy which consists of multiple banks, a shared internal bus for reading/writing data, and a chip I/O through which memory is transferred between DRAM chip and other memory units. Each bank is sub-divided into subarrays and a bank I/O [10]. Furthermore the subarrays are arranged into 2D arrays of DRAM cells
  • 2. along with a common row buffer that consists of SRAM cells and buffers one row of the DRAM bank. Data can only be accessed after it is fetched to the row buffer. Any attempt to read the data from the same row will result in directly reading from the row buffer. Accessing data (in the form of a cache line) from a subarray involves multiple steps. First, the data can be read only through a row buffer. This means that the row must first be activated so that the data from the rows of the DRAM cells can be transferred to the row buffer. Secondly, after activating the row, the cache line has to be read from/written to. This allows the data to be transferred from/to the corresponding cells through the internal bus that is present in the DRAM chip. Finally, the row buffer has to be cleared for the subsequent instructions. Figure 1. Organization of a DRAM chip [10] (Taken without permission) In this paper, we are going to discuss the various levels at which the DRAM performance can be improved and the methods to do so. We observe four different levels at which the modifications can be done, with each level having multiple proposals to do so. The first level is the chip level. At this level, there is a memory channel with a memory controller, which manages the set of DRAM banks present on the chip. The memory channel has a three bus system which includes a command bus, a read bus, and a write bus. Each of these buses have I/O pins as well. These buses and I/O access points can be partially/completely replaced by the Photonically Interconnected DRAM (PIDRAM) [4] technology, which provides energy efficient communication. The photonic technology uses light to transmit data between the processor and the memory. To transmit data/commands, external light (typically from a laser) is passed through resonators which give that light a unique wavelength. This modulated light is received by a photodetector and is converted to electricity and the data/commands are transferred. The advantage with this technology is that multiple wavelengths can be transmitted at once, allowing us to transmit more data that usual at low power usage. The second level is the bank level. At this level PIDRAM technology can be used to reorganizing the banks [4] to save energy. Another method to improve the performance of DRAM at this level is by processing DRAM requests in batches of requests [9]. The third level is the subarray level. One idea is to have a hierarchical multi-bank DRAM [3] in which the subarrays are converted to semi- independent sub-banks, to take an advantage of the fact that most of the DRAM accesses occur locally within the subarrays. This allows the subarrays to act independently for such accesses and makes the process faster. The last level that can be modified is the row level. In DRAM cache, to access memory easily, memory blocks in the banks are mapped to a particular set of a particular row of a particular bank. These set-mapping policies [1] either concentrate on improving the hit rate or decrease the latency. Another change that can be made to this level is dividing the row buffer into multiple smaller row buffers [7]. Figure 2. DRAM Memory System – Each inset shows detail for a different level of current electrical DRAM memory systems. [4] (Taken without permission)
  • 3. II. CHIP LEVEL A DRAM chip consists of a shared internal bus, multiple banks, a chip I/O and a memory channel controlled by a memory controller. This section describes different ways in which we can modify the above parts of the chip to improve performance. One such way is to use light to transmit data among the parts of the DRAM chip. The following introduction to the silicon photonic technology, which can replace the conventional electrical circuit partially. PHOTONICALLY INTERCONNECTED DRAM The off-chip memory bandwidths are not likely to match up to the performance of the processor. This has been reducing the maximum achievable system performance since 2008. The number of pins on the board is limited by the area and power over heads of high speed transceivers and package interconnect. The number of packets transferred per pin can be increased but only at the expense of using up more energy. As described in the introduction, a DRAM memory channel uses a memory controller to manage a set of DRAM banks that are distributed across one or more DRAM chips. We can overcome these challenges by redesigning the DRAM memory using Photonically Interconnected DRAM (PIDRAM) [4], which uses a monolithically integrated silicon- photonic technology. This technology uses light to transfer data instead of electrical circuits. Firstly, the light which is in the form of LASER is passed through a series of resonators. These resonators modulate the wavelength of the light which is transmitted from the processor to the PIDRAM chip. At the PIDRAM chip, this light is received and demodulated using filters and is converted to electrical signal using a photo detector. The advantages of this technology are: very less power is required to transmit data, larger off-chip bandwidths are supported at a minimum power consumption, and transmission of the data at multiple wavelengths at once, allowing multiple data packets to be transferred at once. This is called as dense wavelength division multiplexing (DWDM) [4] and allows multiple links (wavelengths) to share the same media (fibre or waveguide). The electrical I/O in DRAM chips can be replaced by these energy efficient photonic links. By redesigning DRAM banks to provide greater bandwidth from an individual array core, we can supply the bandwidth demands. This also reduces the energy required to activate the banks. We should keep in mind that all the electrical circuits cannot be replaced by this technology as it needs more area than a simple electrical circuit. A. PIDRAM memory channel organization A memory controller manages a set of DRAM banks that are distributed across many DRAM chips. This memory system has 3 logical buses: a command bus, a write data bus, and a read data bus. We can implement these buses using the photonic components in 3 ways:  Shared Photonic Bus: All the three logical buses can be implemented using a shared photonic bus, which works like a standard electrical bus. In this implementation, the memory controller first issues a command to all the banks, and these banks determine if they are the target bank. Once the target bank knows that it is the target, for a write command, it will tune-in its photonic receiver on the write-data bus. The memory controller places the data on that bus, and the target bank receives the data and performs the write operation, and for a read command, the target bank will perform its read operation and sends the data through the read data bus. Figure 3. 
• Split Photonic Bus: In this implementation, the long shared bus is divided into multiple branches. Laser power is delivered to the receivers of the command and write buses and to the modulators of the read bus on each branch, so the total laser power remains roughly a linear function of the number of banks. Splitting reduces the required optical laser power relative to the shared photonic bus, but replicating the photonic components across branches reduces the effective bandwidth density of the photonic devices.
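As a back-of-the-envelope illustration of how the bus organizations trade laser power against bank count (including the guided variant introduced next), the following sketch encodes only the scaling behaviours stated here and in [4]. The constants, and in particular the compounding tap-loss model for the shared bus, are our placeholder assumptions, not measured values.

# First-order laser-power scaling for the three bus organizations.
# Only the shapes follow the text: shared worst, split roughly linear
# in bank count, guided roughly constant. Constants are made up.
P0 = 1.0                       # placeholder per-receiver optical power unit

def shared_bus_power(n_banks, tap_loss=1.2):
    # Every receiver taps the same waveguide, so losses are modelled
    # here as compounding per drop (an assumption on our part).
    return P0 * tap_loss ** n_banks

def split_bus_power(n_banks):
    # Each branch is powered separately: roughly linear in bank count.
    return P0 * n_banks

def guided_bus_power(n_banks):
    # Demultiplexers steer power to just the target bank: ~constant.
    return P0

for n in (4, 16, 64):
    print(n, shared_bus_power(n), split_bus_power(n), guided_bus_power(n))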
Figure 4. Monolithically integrated silicon-photonic technology - Two DWDM links in opposite directions between a memory controller in a processor chip and a bank in a PIDRAM chip. λ1 is used for the request and λ2 is used for the response in the opposite direction on the same waveguides and fibre. [4] (Taken without permission)

Figure 5.a. Split photonic buses [4] (Taken without permission)

• Guided Photonic Bus: The optical power can be reduced further with this implementation. A guided photonic bus uses optical power guiding, in the form of demultiplexers, to actively direct power to just the target bank. This keeps the total power roughly constant and independent of the number of banks.

Figure 5.b. Guided photonic buses [4] (Taken without permission)

B. PIDRAM Chip Organization

We have discussed above the different ways in which the buses can be implemented photonically. The trade-off is that only a portion of the buses can be implemented photonically, with the rest remaining electrical; the design choice rests on the trade-offs in power and area, and the photonics can be gradually extended deeper into the PIDRAM chip.

Figure 6. PIDRAM chip floorplan [4] (Taken without permission)

The vertical electrical data bus can be partitioned into n partitions, with the photonic circuits replicated at each data access point for each bus partition. Partitioning the data bus allows the DRAM chip to use an energy-efficient photonic interconnect, although it increases the fixed link power and incurs higher optical losses.

III. BANK LEVEL

Each bank consists of multiple subarrays and a bank I/O. Data is accessed in the form of cache lines from each subarray, which requires activating the row containing the cache line, reading or writing the cache line, and precharging the subarray to prepare for subsequent requests. This section deals with a novel way to organize the banks and with a request-scheduling algorithm, both of which help increase the number of instructions executed.
A. PIDRAM Bank Organization

Most of the energy consumed in a DRAM chip is consumed by the banks themselves. Every array block access in a bank activates an array core, which in turn activates an entire array core row; of that row, only a few bits of data are actually used. Most of the bank energy therefore goes into waking up unnecessary bits. This waste can be reduced either by decreasing the array core row size, which reduces the number of unnecessary bits activated, or by increasing the number of I/Os per array core and using fewer array cores in parallel. Decreasing the array core row size leads to a greater area penalty, so access efficiency is better improved by increasing the number of I/Os per array core. On its own, the motivation for this change is modest, because the energy consumed by the banks is small compared to that of the electrical inter-chip and intra-chip interconnect, and the number of pins we can place on a chip is limited. The increased bandwidth, however, allows more banks per chip, enables energy savings, and does not significantly affect the area of the PIDRAM. This suggests that photonic technology will play an important role in future multiprocessor performance. Upcoming PIDRAMs should concentrate not only on high performance, low cost, and energy efficiency at the chip level, but should also support a large range of multi-chip configurations with different capacities and bandwidths.
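To see why waking an entire array core row is wasteful, here is a tiny worked calculation in Python. The row width and fetch width are illustrative assumptions, not figures from [4].

# Illustrative over-fetch arithmetic for one array core access.
# Assumed numbers: an 8192-bit array core row is activated to deliver
# a 64-bit slice of a cache line.
row_bits    = 8192           # bits activated by one array core row
useful_bits = 64             # bits actually read out per access

print(f"access efficiency: {useful_bits / row_bits:.2%}")   # ~0.78%

# Doubling the I/Os per array core doubles the useful bits per activation,
# which is the direction the PIDRAM bank reorganization takes.
print(f"with 2x I/O: {2 * useful_bits / row_bits:.2%}")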
B. Parallelism-aware batch scheduling

In a chip multiprocessor (CMP) system, the DRAM is a frequently used and contended resource, and inter-thread interference can destroy the bank-level access parallelism of individual threads. Bank-level parallelism [8][9] means that the requests made by a thread are serviced in parallel in different banks. Parallelism-aware batch scheduling builds on this by grouping outstanding requests into batches and servicing one batch at a time. The method has two parts:

i. Request Batching

A number of DRAM requests are grouped into a batch, based on their arrival times, and the batches are completed one after the other: all requests in the current batch are serviced before a new batch is formed, and the serviced batch is removed from the memory request buffer. When forming a new batch, the batching component limits how many requests issued by one thread for a given bank can be part of the batch. Batching ensures that all requests are eventually served, and it provides a uniform granularity that improves performance. Even in the presence of interference from other threads, the bank-level access parallelism of each thread is preserved. Prioritizing the oldest requests guarantees that the oldest batch is served first and prevents any thread from being starved in the DRAM system by other, potentially aggressive threads. Batching also reduces the serialization of a thread's requests, letting them execute in parallel rather than as if the thread were running alone in the memory system.

ii. Parallelism-Aware Within-Batch Scheduling

In this step, the requests of each thread in a batch are serviced in parallel across the DRAM banks. This hides latency inside the batch and increases processor throughput, since many requests are serviced in parallel. Within-batch scheduling tries to maximize:

• Row-buffer locality: bank accesses have lower latency when the row-hit rate within a batch is high.

• Intra-thread bank parallelism: scheduling multiple requests from one thread to different banks in parallel reduces that thread's stall time.

The scheduler uses thread prioritization to exploit both row-buffer locality and bank parallelism. Thread ranking follows the max rule, in which the scheduler ranks threads by their maximum number of marked requests to any single bank, with ties broken by the total rule, in which the scheduler tracks each thread's total number of marked requests, called its total load, and assigns the higher rank to the thread with the lower total load [9]. A compact sketch of the whole scheme follows.
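The following Python sketch condenses the batching and ranking rules above into runnable form. It is our own simplification of PAR-BS [9]; the request format, the per-thread, per-bank marking cap, and the row-hit tiebreak ordering are stated assumptions.

from collections import defaultdict

MARK_CAP = 2   # assumed per-thread, per-bank batching cap

def form_batch(requests):
    """Mark up to MARK_CAP oldest requests per (thread, bank).
    Unmarked requests would wait for the next batch in the real scheme."""
    marked, taken = [], defaultdict(int)
    for r in sorted(requests, key=lambda r: r["arrival"]):
        key = (r["thread"], r["bank"])
        if taken[key] < MARK_CAP:
            taken[key] += 1
            marked.append(r)
    return marked

def rank_threads(batch):
    """Max rule, then total rule: 'shorter jobs' get higher priority."""
    per_bank = defaultdict(lambda: defaultdict(int))
    total = defaultdict(int)
    for r in batch:
        per_bank[r["thread"]][r["bank"]] += 1
        total[r["thread"]] += 1
    threads = sorted(total, key=lambda t: (max(per_bank[t].values()), total[t]))
    return {t: rank for rank, t in enumerate(threads)}

def schedule(batch, open_rows):
    """Order one batch: row hits first, then thread rank, then age."""
    rank = rank_threads(batch)
    order, pending = [], list(batch)
    while pending:
        pending.sort(key=lambda r: (open_rows.get(r["bank"]) != r["row"],
                                    rank[r["thread"]], r["arrival"]))
        nxt = pending.pop(0)
        open_rows[nxt["bank"]] = nxt["row"]   # the row stays open after access
        order.append(nxt)
    return order

reqs = [{"thread": t, "bank": b, "row": w, "arrival": a}
        for a, (t, b, w) in enumerate([(0, 0, 1), (1, 0, 1), (0, 1, 2),
                                       (1, 1, 3), (0, 0, 1), (1, 0, 4)])]
for r in schedule(form_batch(reqs), {}):
    print(r)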
IV. SUBARRAY LEVEL

Each subarray consists of a two-dimensional array of DRAM cells whose data is accessed in terms of rows. A request is served through a row buffer that is common to all the rows of the subarray: the selected row's data is first transferred to the row buffer, and the data is then accessed from the row buffer. The following two sub-sections explain how accesses can be made faster by modifying the subarray.

A. Hierarchical multi-bank DRAM

Embedded DRAM (eDRAM) is dynamic random-access memory integrated on the same die or multi-chip module as an ASIC or microprocessor. eDRAM allows for wider buses and higher operation speeds, owing to the higher density of DRAM. However, eDRAM cannot handle the number of memory accesses generated by a high-performance processor, which creates a bottleneck: successive accesses that need the same bank must queue up and serialize. One solution is parallelism-aware batch scheduling (discussed in III.B). Another is simply to increase the number of independent DRAM banks in order to lower the probability of a conflict, but this requires a larger area. The number of independent banks can instead be increased without affecting the area much by letting a subarray act as a bank whenever the DRAM chip receives a request targeting that particular subarray. This allows the subarrays to act as semi-independent sub-banks [3].

After dividing the DRAM banks into subarrays, some additions and modifications have to be made to each subarray for it to act as a semi-independent sub-bank. The banks in a DRAM chip use registers and control logic to manage data accesses, so each subarray must gain a few pipeline registers and controls, a set-reset flip-flop to hold the subarray output, and buffers to hold the addresses for the access. The access queues of the DRAM must also be modified to detect accesses that do not conflict and to start those non-conflicting accesses in parallel.

Figure 7.a. Modifications made to each subarray [3] (Taken without permission)

Figure 7.b. Modifications made to the access queues [3] (Taken without permission)

This is a useful approach, since a large part of each DRAM access occurs only locally within an individual DRAM subarray. Individual subarrays within independent banks are controlled as semi-independent sub-banks that share the main bank's I/O circuitry and decoders. The sub-banks perform considerably better while creating only a small area penalty.

B. Fault tolerance in DRAMs

Errors occur often in DRAMs, leading to significant downtime in datacentres, so the DRAM architecture has to provide a high standard of reliability. Error-resilient schemes, called chipkills [5], can be built for such bit failures. Isolating an entire cache line to a single small subarray on a single DRAM chip allows us to read an entire cache line out of a single DRAM array, but it increases the potential for correlated errors. To provide chipkill-level reliability together with single-subarray access, checksums [5] stored for each cache line in the DRAM are introduced, similar to those used in hard drives. The checksum provides robust error detection, while chipkill-level correction is provided through a Redundant Array of Inexpensive DRAMs [6]: one chip serves as a parity check for several other chips, in the manner of a parity disk in a RAID. On an access, only one data chip out of every n is read, and the checksum associated with the read block tells the controller whether the read is correct. This approach is more effective in terms of area and energy than prior chipkill approaches, and its only cost is a performance penalty relative to a single-subarray memory system without chipkill.
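As a toy illustration of detection-plus-reconstruction along these lines, the following sketch pairs a per-line checksum with XOR parity across chips. The chip count, the checksum choice (CRC32), and the data layout are our assumptions, not the exact scheme of [5] or [6]; a single failed chip is assumed.

import zlib

N_DATA_CHIPS = 4                      # assumed: 4 data chips + 1 parity chip

def parity(lines):
    """XOR of the given cache lines (what the 'parity chip' stores)."""
    out = bytearray(len(lines[0]))
    for line in lines:
        for i, b in enumerate(line):
            out[i] ^= b
    return bytes(out)

def store(lines):
    """Store each line with its checksum, plus one parity line."""
    chips = [(line, zlib.crc32(line)) for line in lines]
    return chips, parity(lines)

def read(chips, par, chip_id):
    """Read one chip's line; on checksum mismatch, rebuild it from the
    surviving chips plus parity (assumes only this one chip failed)."""
    line, chk = chips[chip_id]
    if zlib.crc32(line) == chk:
        return line                              # checksum ok: normal read
    others = [l for i, (l, _) in enumerate(chips) if i != chip_id]
    return parity(others + [par])                # XOR of survivors + parity

lines = [bytes([i] * 8) for i in range(N_DATA_CHIPS)]
chips, par = store(lines)
chips[2] = (b"\xff" * 8, chips[2][1])            # corrupt chip 2's data
print(read(chips, par, 2) == lines[2])           # True: reconstructed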
Figure 8. Chipkill support in a Single Subarray memory system (64KB) [5] (Taken without permission)

V. ROW LEVEL

As the number of cores in a processor grows, so does the demand for off-chip memory, which exacerbates the main-memory access bottleneck. Many solutions have been proposed for this problem. One of them is to place an on-chip DRAM as the last level of cache to improve performance for a given off-chip memory bandwidth. This on-chip DRAM cache increases cache capacity through its high density, and it improves on-chip communication through a high-bandwidth, low-latency interconnect.

In a cache, the storage is mapped to the memory addresses it serves, and this mapping can be done in different ways. The choice of mapping is so critical to the design that the cache is often named after it, as in an N-way set-associative cache. The same goes for the DRAM. Each row in a subarray of a bank of a DRAM chip holds a series of DRAM cells, and all the rows in a subarray share a common row buffer that is used to access data. Implementing the DRAM in the cache requires a mapping between the DRAM rows and the main memory system. The first method below explains how set mapping works; later in the section we see how the row buffer can be modified to make data accesses faster.

A. Set mapping policy

As explained in the introduction, the DRAM cache is a multi-bank system, with each bank holding a number of rows. The DRAM cache uses a set mapping policy [1], in which memory blocks are mapped to a particular set of a particular row of a particular bank. The set mapping policy directly affects the throughput of the system through the DRAM cache miss rate and the DRAM cache hit latency, which makes it an important aspect of the cache. A small sketch of such a mapping follows Figure 9.

Figure 9. DRAM cache hierarchy (Intel) (Taken from the website)
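To make the block-to-(bank, row, set) mapping concrete, here is a minimal Python sketch of one simple policy. The geometry (banks, rows, sets per row) and the bit-slicing order are illustrative assumptions, not the policy proposed in [1].

# One simple set-mapping policy: slice the block address into
# set, bank, and row fields. Geometry is assumed for illustration.
N_BANKS, N_ROWS, SETS_PER_ROW = 8, 1024, 4

def map_block(block_addr):
    """Map a memory block address to (bank, row, set)."""
    set_idx = block_addr % SETS_PER_ROW          # low bits pick the set
    bank    = (block_addr // SETS_PER_ROW) % N_BANKS
    row     = (block_addr // (SETS_PER_ROW * N_BANKS)) % N_ROWS
    return bank, row, set_idx

# Consecutive blocks spread across sets and then banks, trading
# row-buffer locality for bank-level parallelism; a policy aiming at
# row-buffer hits would instead keep consecutive blocks in one row.
for addr in range(6):
    print(addr, map_block(addr))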
New DRAM set mapping policies are regularly proposed to reduce the DRAM cache miss rate, and higher associativity can also reduce DRAM latency via an improved row-buffer hit rate. A typical DRAM cache organization has multiple banks, each containing subarrays, and each subarray containing an array of rows and columns of DRAM cells. Each DRAM bank provides a row buffer, which buffers one row of the bank; data in a DRAM bank is accessed after it is fetched into the row buffer.

Figure 10. 29-way associativity for a 4KB row [1] (Taken without permission)

Associativity trades off hit ratio against search speed. A direct-mapped cache has a fast search but a poorer hit ratio, while a fully associative cache has a better hit ratio but a slower search. As associativity increases, the hit ratio improves and the search speed decreases, so a reasonable associativity must be chosen. As noted before, higher associativity decreases the cache miss rate significantly. The DRAM cache row is divided into tag blocks and cache lines, and each bank of the DRAM cache is associated with a row buffer that holds the last accessed row of that bank. If the associativity of the DRAM row organization is increased, a cache access first reads the tag blocks and then fetches only the matching cache line instead of the whole row, which reduces access latency. A higher-associativity cache may slightly increase tag latency compared to a lower-associativity one, but it benefits from fewer conflict misses. It also provides a higher row-buffer hit rate than a simple cache, because a larger number of consecutive memory blocks are mapped to the same set.

B. Modifying the row buffer

Present DRAM cache banks have a single row buffer. Replacing the single large row buffer with multiple smaller row buffers improves row hit rates and also reduces the energy required for row activation [7]. As explained earlier, on a read request the row's data is brought into the row buffer and read from there. Each row buffer is as wide as an entire row and holds a few KB of data, and the precharge operation, which writes the row buffer back to the appropriate row after a column read/write of the selected words, involves charging and discharging a large number of capacitors. In a multi-core processor's memory system, addresses are spread evenly across the memory banks to compensate for the relatively slow speed of DRAM, and this interleaving decreases the row-buffer hit rate. The reduced hit rate can be recovered by dividing the row buffer into multiple smaller row buffers. The new organization requires sub-row activation in addition to row-buffer selection, so the controller must supply additional address bits to the DRAM cache to select which sub-row to activate and which row buffer to load it into. The memory controller allocates and manages the row buffers, which gives the DRAM logic additional flexibility to implement many other buffer-allocation policies. A small simulation contrasting one large buffer with several sub-row buffers follows Figure 11.

Figure 11. Reorganized DRAM bank structure to support sub-rows and buffer selection [7] (Taken without permission)
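The following sketch compares the hit rate of one whole-row buffer against several smaller sub-row buffers of the same total capacity, on an interleaved access stream. The geometry, the LRU allocation policy, and the synthetic trace are our assumptions, not the exact design of [7].

from collections import OrderedDict
import random

ROW_SIZE, N_SUBROWS = 4096, 4          # assumed: 4KB row, four 1KB sub-rows

def hit_rate(trace, n_buffers, sub_size):
    """LRU over n_buffers buffers; each caches one sub-row (or full row)."""
    resident, hits = OrderedDict(), 0
    for row, offset in trace:
        key = (row, offset // sub_size)
        if key in resident:
            hits += 1
            resident.move_to_end(key)          # refresh LRU position
        else:
            resident[key] = True
            if len(resident) > n_buffers:
                resident.popitem(last=False)   # evict least recently used
    return hits / len(trace)

# Two threads with locality in different rows, interleaved by the controller.
random.seed(0)
trace = []
for _ in range(10000):
    if random.random() < 0.5:
        trace.append((3, random.randrange(1024)))         # hot 1KB in row 3
    else:
        trace.append((7, 2048 + random.randrange(1024)))  # hot 1KB in row 7

cfgs = {"one full-row buffer":      (1, ROW_SIZE),
        "four 1KB sub-row buffers": (4, ROW_SIZE // N_SUBROWS)}
for name, (n, size) in cfgs.items():
    print(name, f"{hit_rate(trace, n, size):.1%}")  # sub-rows hit far more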
VI. CONCLUSION

From the problems discussed above, it is clear that improving the memory system is a top priority for achieving greater speeds. DRAM plays an important role in the memory system, and therefore more techniques should be applied to improving it. DRAM is a hierarchical system with four levels, and the components at each level can be replaced, modified, or reorganized. All of the techniques discussed above improve DRAM efficiency significantly, and they do so in different ways: some decrease power, some increase throughput, and some hide latency. The final taxonomy we obtained by analysing the various techniques is shown in Figure 12. We can also conclude that photonic technology will play a crucial role in the future of processors and memory systems.

ACKNOWLEDGEMENT

I thank Dr. Soner Onder for his valuable comments on the earlier drafts and for being patient throughout the process.

Figure 12. Resulting taxonomy of our analysis:
• Chip level: PIDRAM memory channel organization; PIDRAM chip organization
• Bank level: PIDRAM bank organization; parallelism-aware batch scheduling
• Subarray level: hierarchical multi-bank DRAM; fault tolerance
• Row level: set mapping policy; row buffer modification

REFERENCES

[1] Hameed, F., Bauer, L., Henkel, J., "Architecting On-Chip DRAM Cache for Simultaneous Miss Rate and Latency Reduction," in IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. PP, no. 99, pp. 1-1, Oct. 2015.

[2] Donghyuk Lee, Yoongu Kim, Pekhimenko, G., Khan, S., Seshadri, V., Chang, K., Mutlu, O., "Adaptive-latency DRAM: Optimizing DRAM timing for the common-case," in High Performance Computer Architecture (HPCA), 2015 IEEE 21st International Symposium on, pp. 489-501, 7-11 Feb. 2015.

[3] T. Yamauchi, L. Hammond and K. Olukotun, "The hierarchical multi-bank DRAM: a high-performance architecture for memory integrated with processors," in Advanced Research in VLSI, 1997, Proceedings, Seventeenth Conference on, Ann Arbor, MI, 1997, pp. 303-319.
[4] Scott Beamer, Chen Sun, Yong-Jin Kwon, Ajay Joshi, Christopher Batten, Vladimir Stojanović, and Krste Asanović, "Re-architecting DRAM memory systems with monolithically integrated silicon photonics," in Proceedings of the 37th Annual International Symposium on Computer Architecture (ISCA '10), ACM, New York, NY, USA, pp. 129-140, 2010.

[5] Aniruddha N. Udipi, Naveen Muralimanohar, Niladrish Chatterjee, Rajeev Balasubramonian, Al Davis, and Norman P. Jouppi, "Rethinking DRAM design and organization for energy-constrained multi-cores," in Proceedings of the 37th Annual International Symposium on Computer Architecture (ISCA '10), ACM, New York, NY, USA, pp. 175-186, 2010.

[6] J. L. Hennessy and D. A. Patterson, Computer Architecture: A Quantitative Approach, 4th edition, Elsevier, 2007.

[7] Gulur N., Manikantan R., Govindarajan R., Mehendale M., "Row-Buffer Reorganization: Simultaneously Improving Performance and Reducing Energy in DRAMs," in Parallel Architectures and Compilation Techniques (PACT), 2011 International Conference on, pp. 189-190, 10-14 Oct. 2011.

[8] Chang K. K.-W., Donghyuk Lee, Chishti Z., Alameldeen A. R., Wilkerson C., Yoongu Kim, Mutlu O., "Improving DRAM performance by parallelizing refreshes with accesses," in High Performance Computer Architecture (HPCA), 2014 IEEE 20th International Symposium on, pp. 356-367, 15-19 Feb. 2014.

[9] Mutlu, O., Moscibroda, T., "Parallelism-Aware Batch Scheduling: Enhancing both Performance and Fairness of Shared DRAM Systems," in Computer Architecture, 2008, ISCA '08, 35th International Symposium on, pp. 63-74, 21-25 June 2008.

[10] Vivek Seshadri, Yoongu Kim, Chris Fallin, Donghyuk Lee, Rachata Ausavarungnirun, Gennady Pekhimenko, Yixin Luo, Onur Mutlu, Phillip B. Gibbons, Michael A. Kozuch, and Todd C. Mowry, "RowClone: fast and energy-efficient in-DRAM bulk data copy and initialization," in Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-46), ACM, New York, NY, USA, pp. 185-197, 2013.