SlideShare a Scribd company logo
1 of 20
Download to read offline
RAMinate:	
  Hypervisor-­‐based	
  Virtualization
for	
  Hybrid	
  Main	
  Memory	
  Systems
Takahiro	
  Hirofuchi	
  and	
  Ryousei	
  Takano
National	
  Institute	
  of	
  Advanced	
  Industrial	
  Science	
  and	
  Technology	
  (AIST)
ACM	
  Symposium	
  on	
  Cloud	
  Computing	
  2016,	
  Oct.	
  2016
Emerging	
  Non-­‐volatile	
  Memory	
  (NVM)
A	
  DRAM	
  cell	
  =~	
  a	
  capacitor
Pinned	
  layer
Free	
  layer
Low
resistance
High
resistance
A	
  MRAM	
  cell	
  =~	
  ferromagnetism
Energy	
  consuming
• Need	
  refresh	
  energy
• More	
  than	
  30%	
  of	
  energy	
  
consumption	
  of	
  a	
  data	
  center
Not	
  scalable	
  anymore
• Due	
  to	
  serious	
  energy	
  dissipation
Energy	
  efficient
• No	
  refresh	
  energy
Scalable	
  in	
  theory
• Higher	
  density	
  will	
  be	
  possible
2
Technology	
  Roadmap	
  on	
  STT-­‐MRAM
Spin	
  Transfer	
  Torque	
  Magnetoresistive	
  RAM	
  (STT-­‐MRAM,	
  in	
  short,	
  we	
  say	
  MRAM	
  here)	
  
will	
  achieve	
  the	
  same	
  level	
  of	
  read/write	
  latency	
  as	
  DRAM	
  around	
  2016.	
  
MRAM	
  can	
  rewrite	
  memory	
  cells	
  without	
  any	
  practical	
  degradation.	
  
However,	
  the	
  write	
  energy	
  will	
  be	
  10^2	
  times	
  larger	
  than	
  that	
  of	
  DRAM.	
  
Table	
  1.	
  Technology	
  roadmap	
  on	
  STT-­‐MRAM	
  and	
  DRAM,	
  according	
  to	
  
International	
  Technology	
  Roadmap	
  for	
  Semiconductor	
  2013
2013 2026
Read	
  Time	
  (ns)
DRAM <10 <10
STT-­‐MRAM 35 <10
Write/Erasure Time	
  (ns)
DRAM <10 <10
STT-­‐MRAM 35 <1
Write	
  Energy	
  (J/bit)
DRAM 4E-­‐15 2E-­‐15
STT-­‐MRAM 2.5E-­‐12 1.5E-­‐13
3
Hybrid	
  Main	
  Memory
• Achieve	
  large	
  main	
  memory	
  capacity	
  with	
  small	
  energy	
  consumption
• To	
  avoid	
  write	
  energy	
  problems	
  of	
  MRAM,	
  both	
  DRAM	
  and	
  MRAM	
  
must	
  be	
  combined	
  for	
  the	
  main	
  memory	
  of	
  a	
  computer.
• CPU	
  must	
  send	
  most	
  write	
  requests	
  to	
  DRAM.
DRAM
(Small)
MRAM
(Large)
CPU
Most	
  write	
  requests
4
Requirements	
  for	
  Hybrid	
  Memory	
  Systems	
  in	
  IaaS	
  Datacenters
• Transparent	
  to	
  guest	
  operating	
  systems
• A	
  guest	
  OS	
  is	
  under	
  customer’s	
  administrative	
  domain,
not	
  service	
  provider’s	
  administrative	
  domain.
• Easy	
  to	
  apply	
  the	
  current	
  server	
  systems
• Extending	
  a	
  memory	
  controller	
  is	
  costly,	
  and	
  should	
  be	
  avoided.
5
Assumption
Both	
  DRAM	
  and	
  STT-­‐MRAM	
  are	
  byte-­‐addressable.	
  
Both	
  are	
  mapped	
  to	
  different	
  physical	
  address	
  ranges.
MRAM
(Large)
DRAM
(Small)
Physical	
  Address	
  Mapping
Offset	
  0 Offset	
  M Offset	
  N
• Future	
  NVM	
  technologies	
  will	
  be	
  
also	
  attached	
  to	
  a	
  DIMM	
  interface
• Note:	
  Currently,	
  DIMM-­‐based	
  Flash	
  
modules	
  are	
  already	
  on	
  the	
  market.
6
2.	
  Determine	
  necessary	
  
page	
  swap	
  operations
1. Detect	
  written	
  pages
3.	
  Update	
  page	
  mapping
MRAM	
  PoolDRAM	
  Pool
1GB 3GB
VM	
  (DRAM	
  4GB)
RAMinate	
  (Qemu/KVM)
RAMinate:	
  Hybrid	
  Memory	
  Support	
  at	
  Hypervisor
Provide	
  transparency	
  to	
  guest	
  operating	
  systems	
  without	
  any	
  modification	
  to	
  
existing	
  memory	
  controllers.
7
Component	
  1.	
  RAMinate	
  periodically	
  scans	
  Extended	
  Page	
  Table	
  
entries	
  of	
  a	
  VM	
  and	
  finds	
  dirty	
  guest	
  pages.
9
EPT  maintains  mapping  between  guest  frame  numbers  and  physical  
frame  numbers.  When  a  guest  page  is  written,  CPU  sets  the  dirty  bit.
9
Component	
  2.	
  The	
  optimization	
  algorithm,	
  Corked	
  Multi	
  Queue,	
  
determines	
  which	
  guest	
  pages	
  must	
  be	
  migrated.
The	
  queue	
  level	
  relates	
  to	
  the	
  number	
  of	
  past	
  page	
  updates.	
  MRAM	
  
pages	
  in	
  higher	
  levels	
  of	
  queues	
  are	
  candidates	
  for	
  page	
  migration.
Q0
Q1
Q2
Q3
Q0
Q1
Q2
Q3
Promote a	
  page	
  when	
  the	
  number	
  of	
  
past	
  detected	
  updates	
  of	
  a	
  page	
  
exceeds	
  a	
  threshold	
  value.
Demote a	
  page	
  when	
  the	
  elapsed	
  time	
  
from	
  its	
  last	
  update	
  exceeds	
  a	
  threshold	
  
value.
Simulation
• Compare  CMQ,  LRU  and  nothing
• Use  trace  data  during  SPEC  CPU  2006
10
DRAM  hit  ratio The  number  of  swapped  pages
CMQ  drastically  reduces  the  
number  of  necessary  page  swaps.
11
Component	
  3.	
  RAMinate	
  dynamically	
  updates	
  page	
  mapping	
  
between	
  guest	
  and	
  physical	
  pages	
  without	
  disrupting	
  the	
  guest	
  
operating	
  system.
Page	
  Swap	
  in	
  Detail
• Temporary  stop  the  VM  during  optimization
1. Stop  the  VM
2. Repeat  page  swap  operations
1. Swap  the  PFNs  of  2  EPT entries
2. Swap  the  data  of  the  memory  pages
3. Restart  the  VM
12
Page	
  Swap	
  Overhead
• Downtime  is  approximately  30ms  for  1000  
swap  operations.
• Short  enough,  acceptable
– Page  mapping  optimization  is  performed  every  5  
second  in  the  default  setting.
13
Experiments
14
CH3
RAMinate
CH2CH1
Host  OS
CH0
• Use  one  memory  channel  for  MRAM
• Disable  memory  channel  interleaving
• Measure  read/write  traffic  per  memory  
channel  by  PMU
– CAS_̲COUNT.RD  Counter
– CAS_̲COUNT.WR  Counter
VM
PMUPMUPMUPMU
DRAM
Pool
MRAM
Pool
Kernel	
  Compile	
  (DRAM	
  40MB,	
  MRAM	
  3960MB)
15
First,  most  write  traffic  went  to  MRAM.  After  optimized,  only  50%.
The	
  Total	
  Amount	
  of	
  Written	
  Data	
  to	
  DRAM	
  and	
  MRAM	
  in	
  GB
16
The  DRAM  size  is  10%  of  VM  memory  (400MB).
A  smaller  interval  of  optimization  reduced  the  ratio  of  MRAM  traffic,  but  increased  
the  total  of  written  data  because  of  memory  copy  operations  for  page  swaps.  
(Also,  there  is  more  performance  overhead  in  a  smaller  interval.  See  the  paper.)
Energy	
  Estimation
• Explore	
  how	
  much	
  energy	
  can	
  be	
  saved	
  (or	
  wasted)
if	
  RAMinate	
  is	
  applied	
  to	
  future	
  STT-­‐MRAM	
  devices
• First,	
  develop	
  energy	
  models	
  of	
  DRAM	
  and	
  STT-­‐MRAM
• Next,	
  estimate	
  energy	
  consumption	
  by	
  applying
obtained	
  the	
  data	
  through	
  the	
  experiments	
  to	
  the	
  models
17
Energy	
  Models
• DRAM
– X:  Read  data  size  (MB)
– Y:  Written  data  size  (MB)
– P:  Idle  Power  (W)  per  VM
– T:  Duration  of  an  experiment  (s)
– Energy  (J)  =  a*X  +  a*Y  +  P*T
• STT-‐‑‒MRAM  with  100x  write  energy
– X:  Read  data  size  (MB)
– Y:  Written  data  size  (MB)
– Energy  (J)  =  a*X  +  (100*a)*Y
18
• Assume	
  a	
  simple	
  linear	
  correlation
between	
  read/write	
  throughput	
  and	
  energy	
  consumption
• Sufficient	
  for	
  discussing	
  orders	
  of	
  magnitudes	
  larger	
  write	
  energy
Estimated	
  Energy	
  Consumption
19
• Assume  MRAM  write  energy  x10^2  than  that  of  DRAM.
Other  values  are  discussed  in  the  paper.
• Hybrid  memory  with  10%  DRAM  outperformed  DRAM-‐‑‒only  memory  in  most  cases.
Conclusion
• RAMinate:  Hypervisor-‐‑‒based  hybrid  memory  mechanism
– Transparent  to  existing  applications
– No  modification  to  existing  memory  controllers
– AFAIK,  the  first  study  at  the  hypervisor  layer
• Reduce  energy  consumption  of  a  server  with  DRAM  and  
future  STT-‐‑‒MRAM
– For  tested  workloads,
• 70%  reduction  in  write  traffic  to  STT-‐‑‒MRAM
• 50%  reduction  in  total  energy  consumption
• Poster  Today!
20

More Related Content

What's hot

improve deep learning training and inference performance
improve deep learning training and inference performanceimprove deep learning training and inference performance
improve deep learning training and inference performances.rohit
 
Linux NUMA & Databases: Perils and Opportunities
Linux NUMA & Databases: Perils and OpportunitiesLinux NUMA & Databases: Perils and Opportunities
Linux NUMA & Databases: Perils and OpportunitiesRaghavendra Prabhu
 
Communication model of parallel platforms
Communication model of parallel platformsCommunication model of parallel platforms
Communication model of parallel platformsSyed Zaid Irshad
 
network ram parallel computing
network ram parallel computingnetwork ram parallel computing
network ram parallel computingNiranjana Ambadi
 
Memory Bandwidth QoS
Memory Bandwidth QoSMemory Bandwidth QoS
Memory Bandwidth QoSRohit Jnagal
 
Unit I Memory technology and optimization
Unit I Memory technology and optimizationUnit I Memory technology and optimization
Unit I Memory technology and optimizationK Gowsic Gowsic
 
Physical organization of parallel platforms
Physical organization of parallel platformsPhysical organization of parallel platforms
Physical organization of parallel platformsSyed Zaid Irshad
 
Introduction to parallel_computing
Introduction to parallel_computingIntroduction to parallel_computing
Introduction to parallel_computingMehul Patel
 
Numa (non uniform memory access)
Numa (non uniform memory access)Numa (non uniform memory access)
Numa (non uniform memory access)Mamesh
 
Term paper
Term paperTerm paper
Term paperBilal201
 

What's hot (20)

improve deep learning training and inference performance
improve deep learning training and inference performanceimprove deep learning training and inference performance
improve deep learning training and inference performance
 
Linux NUMA & Databases: Perils and Opportunities
Linux NUMA & Databases: Perils and OpportunitiesLinux NUMA & Databases: Perils and Opportunities
Linux NUMA & Databases: Perils and Opportunities
 
TPU paper slide
TPU paper slideTPU paper slide
TPU paper slide
 
48a tuning
48a tuning48a tuning
48a tuning
 
Communication model of parallel platforms
Communication model of parallel platformsCommunication model of parallel platforms
Communication model of parallel platforms
 
20090720 smith
20090720 smith20090720 smith
20090720 smith
 
SRAM DRAM
SRAM DRAMSRAM DRAM
SRAM DRAM
 
Cat @ scale
Cat @ scaleCat @ scale
Cat @ scale
 
Scope of parallelism
Scope of parallelismScope of parallelism
Scope of parallelism
 
NUMA
NUMANUMA
NUMA
 
network ram parallel computing
network ram parallel computingnetwork ram parallel computing
network ram parallel computing
 
Memory Bandwidth QoS
Memory Bandwidth QoSMemory Bandwidth QoS
Memory Bandwidth QoS
 
Unit I Memory technology and optimization
Unit I Memory technology and optimizationUnit I Memory technology and optimization
Unit I Memory technology and optimization
 
Notes on NUMA architecture
Notes on NUMA architectureNotes on NUMA architecture
Notes on NUMA architecture
 
Shadow paging
Shadow pagingShadow paging
Shadow paging
 
SRAM
SRAMSRAM
SRAM
 
Physical organization of parallel platforms
Physical organization of parallel platformsPhysical organization of parallel platforms
Physical organization of parallel platforms
 
Introduction to parallel_computing
Introduction to parallel_computingIntroduction to parallel_computing
Introduction to parallel_computing
 
Numa (non uniform memory access)
Numa (non uniform memory access)Numa (non uniform memory access)
Numa (non uniform memory access)
 
Term paper
Term paperTerm paper
Term paper
 

Viewers also liked

Case Study: Building a Culture of Analytics in HR at Micron
Case Study: Building a Culture of Analytics in HR at MicronCase Study: Building a Culture of Analytics in HR at Micron
Case Study: Building a Culture of Analytics in HR at MicronHuman Capital Media
 
2012.04.20 3기전석대 sk하이닉스 반도체기술동향
2012.04.20 3기전석대 sk하이닉스 반도체기술동향2012.04.20 3기전석대 sk하이닉스 반도체기술동향
2012.04.20 3기전석대 sk하이닉스 반도체기술동향Lucas Ju
 
Exploring the Performance Impact of Virtualization on an HPC Cloud
Exploring the Performance Impact of Virtualization on an HPC CloudExploring the Performance Impact of Virtualization on an HPC Cloud
Exploring the Performance Impact of Virtualization on an HPC CloudRyousei Takano
 
クラウド環境におけるキャッシュメモリQoS制御の評価
クラウド環境におけるキャッシュメモリQoS制御の評価クラウド環境におけるキャッシュメモリQoS制御の評価
クラウド環境におけるキャッシュメモリQoS制御の評価Ryousei Takano
 
Presentation STT-RAM Survey
Presentation STT-RAM SurveyPresentation STT-RAM Survey
Presentation STT-RAM SurveySwapnil Bhosale
 
不揮発メモリとOS研究にまつわる何か
不揮発メモリとOS研究にまつわる何か不揮発メモリとOS研究にまつわる何か
不揮発メモリとOS研究にまつわる何かRyousei Takano
 

Viewers also liked (6)

Case Study: Building a Culture of Analytics in HR at Micron
Case Study: Building a Culture of Analytics in HR at MicronCase Study: Building a Culture of Analytics in HR at Micron
Case Study: Building a Culture of Analytics in HR at Micron
 
2012.04.20 3기전석대 sk하이닉스 반도체기술동향
2012.04.20 3기전석대 sk하이닉스 반도체기술동향2012.04.20 3기전석대 sk하이닉스 반도체기술동향
2012.04.20 3기전석대 sk하이닉스 반도체기술동향
 
Exploring the Performance Impact of Virtualization on an HPC Cloud
Exploring the Performance Impact of Virtualization on an HPC CloudExploring the Performance Impact of Virtualization on an HPC Cloud
Exploring the Performance Impact of Virtualization on an HPC Cloud
 
クラウド環境におけるキャッシュメモリQoS制御の評価
クラウド環境におけるキャッシュメモリQoS制御の評価クラウド環境におけるキャッシュメモリQoS制御の評価
クラウド環境におけるキャッシュメモリQoS制御の評価
 
Presentation STT-RAM Survey
Presentation STT-RAM SurveyPresentation STT-RAM Survey
Presentation STT-RAM Survey
 
不揮発メモリとOS研究にまつわる何か
不揮発メモリとOS研究にまつわる何か不揮発メモリとOS研究にまつわる何か
不揮発メモリとOS研究にまつわる何か
 

Similar to RAMinate ACM SoCC 2016 Talk

Virtualization for Emerging Memory Devices
Virtualization for Emerging Memory DevicesVirtualization for Emerging Memory Devices
Virtualization for Emerging Memory DevicesTakahiro Hirofuchi
 
Literature survey presentation
Literature survey presentationLiterature survey presentation
Literature survey presentationKarthik Iyr
 
A survey on exploring memory optimizations in smartphones
A survey on exploring memory optimizations in smartphonesA survey on exploring memory optimizations in smartphones
A survey on exploring memory optimizations in smartphonesKarthik Iyr
 
lecture asdkvakm;bk;dv;advvAVHD;KASV;DVKHSVDK
lecture asdkvakm;bk;dv;advvAVHD;KASV;DVKHSVDKlecture asdkvakm;bk;dv;advvAVHD;KASV;DVKHSVDK
lecture asdkvakm;bk;dv;advvAVHD;KASV;DVKHSVDKofficeaiotfab
 
Nvram applications in the architectural revolutions of main memory implementa...
Nvram applications in the architectural revolutions of main memory implementa...Nvram applications in the architectural revolutions of main memory implementa...
Nvram applications in the architectural revolutions of main memory implementa...IAEME Publication
 
CA assignment group.pptx
CA assignment group.pptxCA assignment group.pptx
CA assignment group.pptxHAIDERALICH3
 
IRJET- Design And VLSI Verification of DDR SDRAM Controller Using VHDL
IRJET- Design And VLSI Verification of DDR SDRAM Controller Using VHDLIRJET- Design And VLSI Verification of DDR SDRAM Controller Using VHDL
IRJET- Design And VLSI Verification of DDR SDRAM Controller Using VHDLIRJET Journal
 
RAMCloud: Scalable Datacenter Storage Entirely in DRAM
RAMCloud: Scalable  Datacenter Storage  Entirely in DRAMRAMCloud: Scalable  Datacenter Storage  Entirely in DRAM
RAMCloud: Scalable Datacenter Storage Entirely in DRAMGeorge Ang
 
CPN302 your-linux-ami-optimization-and-performance
CPN302 your-linux-ami-optimization-and-performanceCPN302 your-linux-ami-optimization-and-performance
CPN302 your-linux-ami-optimization-and-performanceCoburn Watson
 
Chapter5 the memory-system-jntuworld
Chapter5 the memory-system-jntuworldChapter5 the memory-system-jntuworld
Chapter5 the memory-system-jntuworldPraveen Kumar
 
Time and Low Power Operation Using Embedded Dram to Gain Cell Data Retention
Time and Low Power Operation Using Embedded Dram to Gain Cell Data RetentionTime and Low Power Operation Using Embedded Dram to Gain Cell Data Retention
Time and Low Power Operation Using Embedded Dram to Gain Cell Data RetentionIJMTST Journal
 
RRAM Status and Opportunities
RRAM Status and Opportunities RRAM Status and Opportunities
RRAM Status and Opportunities Deepak Sekar
 
In datacenter performance analysis of a tensor processing unit
In datacenter performance analysis of a tensor processing unitIn datacenter performance analysis of a tensor processing unit
In datacenter performance analysis of a tensor processing unitJinwon Lee
 
Mram (magneticRAM)
Mram (magneticRAM)Mram (magneticRAM)
Mram (magneticRAM)Mohit Patel
 
Microelectronics U4.pptx.ppt
Microelectronics U4.pptx.pptMicroelectronics U4.pptx.ppt
Microelectronics U4.pptx.pptPavikaSharma3
 

Similar to RAMinate ACM SoCC 2016 Talk (20)

Virtualization for Emerging Memory Devices
Virtualization for Emerging Memory DevicesVirtualization for Emerging Memory Devices
Virtualization for Emerging Memory Devices
 
Literature survey presentation
Literature survey presentationLiterature survey presentation
Literature survey presentation
 
A survey on exploring memory optimizations in smartphones
A survey on exploring memory optimizations in smartphonesA survey on exploring memory optimizations in smartphones
A survey on exploring memory optimizations in smartphones
 
Emt
EmtEmt
Emt
 
Adaptive bank management[1]
Adaptive bank management[1]Adaptive bank management[1]
Adaptive bank management[1]
 
SRAM
SRAMSRAM
SRAM
 
lecture asdkvakm;bk;dv;advvAVHD;KASV;DVKHSVDK
lecture asdkvakm;bk;dv;advvAVHD;KASV;DVKHSVDKlecture asdkvakm;bk;dv;advvAVHD;KASV;DVKHSVDK
lecture asdkvakm;bk;dv;advvAVHD;KASV;DVKHSVDK
 
Nvram applications in the architectural revolutions of main memory implementa...
Nvram applications in the architectural revolutions of main memory implementa...Nvram applications in the architectural revolutions of main memory implementa...
Nvram applications in the architectural revolutions of main memory implementa...
 
CA assignment group.pptx
CA assignment group.pptxCA assignment group.pptx
CA assignment group.pptx
 
IRJET- Design And VLSI Verification of DDR SDRAM Controller Using VHDL
IRJET- Design And VLSI Verification of DDR SDRAM Controller Using VHDLIRJET- Design And VLSI Verification of DDR SDRAM Controller Using VHDL
IRJET- Design And VLSI Verification of DDR SDRAM Controller Using VHDL
 
RAMCloud: Scalable Datacenter Storage Entirely in DRAM
RAMCloud: Scalable  Datacenter Storage  Entirely in DRAMRAMCloud: Scalable  Datacenter Storage  Entirely in DRAM
RAMCloud: Scalable Datacenter Storage Entirely in DRAM
 
CPN302 your-linux-ami-optimization-and-performance
CPN302 your-linux-ami-optimization-and-performanceCPN302 your-linux-ami-optimization-and-performance
CPN302 your-linux-ami-optimization-and-performance
 
Chapter5 the memory-system-jntuworld
Chapter5 the memory-system-jntuworldChapter5 the memory-system-jntuworld
Chapter5 the memory-system-jntuworld
 
Time and Low Power Operation Using Embedded Dram to Gain Cell Data Retention
Time and Low Power Operation Using Embedded Dram to Gain Cell Data RetentionTime and Low Power Operation Using Embedded Dram to Gain Cell Data Retention
Time and Low Power Operation Using Embedded Dram to Gain Cell Data Retention
 
RRAM Status and Opportunities
RRAM Status and Opportunities RRAM Status and Opportunities
RRAM Status and Opportunities
 
In datacenter performance analysis of a tensor processing unit
In datacenter performance analysis of a tensor processing unitIn datacenter performance analysis of a tensor processing unit
In datacenter performance analysis of a tensor processing unit
 
Mram (magneticRAM)
Mram (magneticRAM)Mram (magneticRAM)
Mram (magneticRAM)
 
Processing-in-Memory
Processing-in-MemoryProcessing-in-Memory
Processing-in-Memory
 
Memory.pptx
Memory.pptxMemory.pptx
Memory.pptx
 
Microelectronics U4.pptx.ppt
Microelectronics U4.pptx.pptMicroelectronics U4.pptx.ppt
Microelectronics U4.pptx.ppt
 

Recently uploaded

Science&tech:THE INFORMATION AGE STS.pdf
Science&tech:THE INFORMATION AGE STS.pdfScience&tech:THE INFORMATION AGE STS.pdf
Science&tech:THE INFORMATION AGE STS.pdfjimielynbastida
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024BookNet Canada
 
Artificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraArtificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraDeakin University
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...shyamraj55
 
Unlocking the Potential of the Cloud for IBM Power Systems
Unlocking the Potential of the Cloud for IBM Power SystemsUnlocking the Potential of the Cloud for IBM Power Systems
Unlocking the Potential of the Cloud for IBM Power SystemsPrecisely
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machinePadma Pradeep
 
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptxMaking_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptxnull - The Open Security Community
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Enterprise Knowledge
 
costume and set research powerpoint presentation
costume and set research powerpoint presentationcostume and set research powerpoint presentation
costume and set research powerpoint presentationphoebematthew05
 
APIForce Zurich 5 April Automation LPDG
APIForce Zurich 5 April  Automation LPDGAPIForce Zurich 5 April  Automation LPDG
APIForce Zurich 5 April Automation LPDGMarianaLemus7
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsAndrey Dotsenko
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesSinan KOZAK
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 3652toLead Limited
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersThousandEyes
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 
Bluetooth Controlled Car with Arduino.pdf
Bluetooth Controlled Car with Arduino.pdfBluetooth Controlled Car with Arduino.pdf
Bluetooth Controlled Car with Arduino.pdfngoud9212
 

Recently uploaded (20)

Science&tech:THE INFORMATION AGE STS.pdf
Science&tech:THE INFORMATION AGE STS.pdfScience&tech:THE INFORMATION AGE STS.pdf
Science&tech:THE INFORMATION AGE STS.pdf
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
 
Artificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraArtificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning era
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
 
Unlocking the Potential of the Cloud for IBM Power Systems
Unlocking the Potential of the Cloud for IBM Power SystemsUnlocking the Potential of the Cloud for IBM Power Systems
Unlocking the Potential of the Cloud for IBM Power Systems
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machine
 
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptxMaking_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
 
costume and set research powerpoint presentation
costume and set research powerpoint presentationcostume and set research powerpoint presentation
costume and set research powerpoint presentation
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
APIForce Zurich 5 April Automation LPDG
APIForce Zurich 5 April  Automation LPDGAPIForce Zurich 5 April  Automation LPDG
APIForce Zurich 5 April Automation LPDG
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
Bluetooth Controlled Car with Arduino.pdf
Bluetooth Controlled Car with Arduino.pdfBluetooth Controlled Car with Arduino.pdf
Bluetooth Controlled Car with Arduino.pdf
 

RAMinate ACM SoCC 2016 Talk

  • 1. RAMinate:  Hypervisor-­‐based  Virtualization for  Hybrid  Main  Memory  Systems Takahiro  Hirofuchi  and  Ryousei  Takano National  Institute  of  Advanced  Industrial  Science  and  Technology  (AIST) ACM  Symposium  on  Cloud  Computing  2016,  Oct.  2016
  • 2. Emerging  Non-­‐volatile  Memory  (NVM) A  DRAM  cell  =~  a  capacitor Pinned  layer Free  layer Low resistance High resistance A  MRAM  cell  =~  ferromagnetism Energy  consuming • Need  refresh  energy • More  than  30%  of  energy   consumption  of  a  data  center Not  scalable  anymore • Due  to  serious  energy  dissipation Energy  efficient • No  refresh  energy Scalable  in  theory • Higher  density  will  be  possible 2
  • 3. Technology  Roadmap  on  STT-­‐MRAM Spin  Transfer  Torque  Magnetoresistive  RAM  (STT-­‐MRAM,  in  short,  we  say  MRAM  here)   will  achieve  the  same  level  of  read/write  latency  as  DRAM  around  2016.   MRAM  can  rewrite  memory  cells  without  any  practical  degradation.   However,  the  write  energy  will  be  10^2  times  larger  than  that  of  DRAM.   Table  1.  Technology  roadmap  on  STT-­‐MRAM  and  DRAM,  according  to   International  Technology  Roadmap  for  Semiconductor  2013 2013 2026 Read  Time  (ns) DRAM <10 <10 STT-­‐MRAM 35 <10 Write/Erasure Time  (ns) DRAM <10 <10 STT-­‐MRAM 35 <1 Write  Energy  (J/bit) DRAM 4E-­‐15 2E-­‐15 STT-­‐MRAM 2.5E-­‐12 1.5E-­‐13 3
  • 4. Hybrid  Main  Memory • Achieve  large  main  memory  capacity  with  small  energy  consumption • To  avoid  write  energy  problems  of  MRAM,  both  DRAM  and  MRAM   must  be  combined  for  the  main  memory  of  a  computer. • CPU  must  send  most  write  requests  to  DRAM. DRAM (Small) MRAM (Large) CPU Most  write  requests 4
  • 5. Requirements  for  Hybrid  Memory  Systems  in  IaaS  Datacenters • Transparent  to  guest  operating  systems • A  guest  OS  is  under  customer’s  administrative  domain, not  service  provider’s  administrative  domain. • Easy  to  apply  the  current  server  systems • Extending  a  memory  controller  is  costly,  and  should  be  avoided. 5
  • 6. Assumption Both  DRAM  and  STT-­‐MRAM  are  byte-­‐addressable.   Both  are  mapped  to  different  physical  address  ranges. MRAM (Large) DRAM (Small) Physical  Address  Mapping Offset  0 Offset  M Offset  N • Future  NVM  technologies  will  be   also  attached  to  a  DIMM  interface • Note:  Currently,  DIMM-­‐based  Flash   modules  are  already  on  the  market. 6
  • 7. 2.  Determine  necessary   page  swap  operations 1. Detect  written  pages 3.  Update  page  mapping MRAM  PoolDRAM  Pool 1GB 3GB VM  (DRAM  4GB) RAMinate  (Qemu/KVM) RAMinate:  Hybrid  Memory  Support  at  Hypervisor Provide  transparency  to  guest  operating  systems  without  any  modification  to   existing  memory  controllers. 7
  • 8. Component  1.  RAMinate  periodically  scans  Extended  Page  Table   entries  of  a  VM  and  finds  dirty  guest  pages. 9 EPT  maintains  mapping  between  guest  frame  numbers  and  physical   frame  numbers.  When  a  guest  page  is  written,  CPU  sets  the  dirty  bit.
  • 9. 9 Component  2.  The  optimization  algorithm,  Corked  Multi  Queue,   determines  which  guest  pages  must  be  migrated. The  queue  level  relates  to  the  number  of  past  page  updates.  MRAM   pages  in  higher  levels  of  queues  are  candidates  for  page  migration. Q0 Q1 Q2 Q3 Q0 Q1 Q2 Q3 Promote a  page  when  the  number  of   past  detected  updates  of  a  page   exceeds  a  threshold  value. Demote a  page  when  the  elapsed  time   from  its  last  update  exceeds  a  threshold   value.
  • 10. Simulation • Compare  CMQ,  LRU  and  nothing • Use  trace  data  during  SPEC  CPU  2006 10 DRAM  hit  ratio The  number  of  swapped  pages CMQ  drastically  reduces  the   number  of  necessary  page  swaps.
  • 11. 11 Component  3.  RAMinate  dynamically  updates  page  mapping   between  guest  and  physical  pages  without  disrupting  the  guest   operating  system.
  • 12. Page  Swap  in  Detail • Temporary  stop  the  VM  during  optimization 1. Stop  the  VM 2. Repeat  page  swap  operations 1. Swap  the  PFNs  of  2  EPT entries 2. Swap  the  data  of  the  memory  pages 3. Restart  the  VM 12
  • 13. Page  Swap  Overhead • Downtime  is  approximately  30ms  for  1000   swap  operations. • Short  enough,  acceptable – Page  mapping  optimization  is  performed  every  5   second  in  the  default  setting. 13
  • 14. Experiments 14 CH3 RAMinate CH2CH1 Host  OS CH0 • Use  one  memory  channel  for  MRAM • Disable  memory  channel  interleaving • Measure  read/write  traffic  per  memory   channel  by  PMU – CAS_̲COUNT.RD  Counter – CAS_̲COUNT.WR  Counter VM PMUPMUPMUPMU DRAM Pool MRAM Pool
  • 15. Kernel  Compile  (DRAM  40MB,  MRAM  3960MB) 15 First,  most  write  traffic  went  to  MRAM.  After  optimized,  only  50%.
  • 16. The  Total  Amount  of  Written  Data  to  DRAM  and  MRAM  in  GB 16 The  DRAM  size  is  10%  of  VM  memory  (400MB). A  smaller  interval  of  optimization  reduced  the  ratio  of  MRAM  traffic,  but  increased   the  total  of  written  data  because  of  memory  copy  operations  for  page  swaps.   (Also,  there  is  more  performance  overhead  in  a  smaller  interval.  See  the  paper.)
  • 17. Energy  Estimation • Explore  how  much  energy  can  be  saved  (or  wasted) if  RAMinate  is  applied  to  future  STT-­‐MRAM  devices • First,  develop  energy  models  of  DRAM  and  STT-­‐MRAM • Next,  estimate  energy  consumption  by  applying obtained  the  data  through  the  experiments  to  the  models 17
  • 18. Energy  Models • DRAM – X:  Read  data  size  (MB) – Y:  Written  data  size  (MB) – P:  Idle  Power  (W)  per  VM – T:  Duration  of  an  experiment  (s) – Energy  (J)  =  a*X  +  a*Y  +  P*T • STT-‐‑‒MRAM  with  100x  write  energy – X:  Read  data  size  (MB) – Y:  Written  data  size  (MB) – Energy  (J)  =  a*X  +  (100*a)*Y 18 • Assume  a  simple  linear  correlation between  read/write  throughput  and  energy  consumption • Sufficient  for  discussing  orders  of  magnitudes  larger  write  energy
  • 19. Estimated  Energy  Consumption 19 • Assume  MRAM  write  energy  x10^2  than  that  of  DRAM. Other  values  are  discussed  in  the  paper. • Hybrid  memory  with  10%  DRAM  outperformed  DRAM-‐‑‒only  memory  in  most  cases.
  • 20. Conclusion • RAMinate:  Hypervisor-‐‑‒based  hybrid  memory  mechanism – Transparent  to  existing  applications – No  modification  to  existing  memory  controllers – AFAIK,  the  first  study  at  the  hypervisor  layer • Reduce  energy  consumption  of  a  server  with  DRAM  and   future  STT-‐‑‒MRAM – For  tested  workloads, • 70%  reduction  in  write  traffic  to  STT-‐‑‒MRAM • 50%  reduction  in  total  energy  consumption • Poster  Today! 20