Transcendent Memory: Not Just for Virtualization Anymore
Avi Miller
Principal Program Manager




Further reading:
https://lwn.net/Articles/454795/




Objectives


• Utilise RAM more effectively
  – Lower capital costs
  – Lower power utilisation
  – Less I/O

• Better performance on many workloads
  – Negligible loss on others




Motivation: Memory-inefficient workloads




More motivation: memory capacity wall

[Chart: # cores and GB DRAM per socket, 2003-2017, log scale 1 to 1000]

Memory capacity per core drops ~30% every 2 years

Source: Disaggregated Memory for Expansion and Sharing in Blade Servers
http://isca09.cs.columbia.edu/pres/24.pptx



Slide from: Linux kernel support to exploit phase change memory, Linux Symposium 2010, Youngwoo Park, EE KAIST



Disaggregated memory


[Diagram: two compute blades (CPUs + DIMMs) connected through an exofabric to a shared memory blade]

Leverage fast, shared communication fabrics

Source: Disaggregated Memory for Expansion and Sharing in Blade Servers
http://isca09.cs.columbia.edu/pres/24.pptx

OS memory “demand”




Operating systems are memory hogs!

[Diagram: OS memory usage filling up to the memory constraint]




OS Physical Memory Management




If you give an operating system more memory…

[Diagram: the OS grows to fill the new, larger memory constraint]

OS Physical Memory Management




… it uses up any memory you give it!

[Cartoon: Tux confesses, "My name is Linux and I am a memory hog", as memory fills to the constraint]


OS Memory “Asceticism”


• ASSUME
  – We should use as little RAM as possible

• SUPPOSE
  – Mechanism to allow the OS to surrender RAM
  – Mechanism to allow the OS to obtain more RAM

• THEN
  – How does an OS decide how much RAM it actually needs?




 as-cet-i-cism, n. 1. extreme self-denial and austerity; rigorous self-discipline and
 active restraint; renunciation of material comforts so as to achieve a higher state


Impact on Linux Memory Subsystem




CAPACITY KNOWN
 Can read or write to
 any byte.

CAPACITY KNOWN
Can read or write to any byte.

CAPACITY UNKNOWN
and may change dynamically!

• CAPACITY: known
     • USES:
        • kernel memory
        • user memory
        • DMA
     • ADDRESSABILITY:
        • Read/write any byte


Type A:
• CAPACITY: known
• USES:
  • kernel memory
  • user memory
  • DMA
• ADDRESSABILITY:
  • Read/write any byte

Type B:
• CAPACITY: "unknowable", dynamic
  SO… kernel/CPU can't address directly!
  SO… need "permission" to access and need to "follow rules" (even the kernel!)


Type A:
• CAPACITY: known
• USES:
  • kernel memory
  • user memory
  • DMA
• ADDRESSABILITY:
  • Read/write any byte

Type B: THE RULES
1. "page"-at-a-time
2. to put data here, kernel MUST use a "put page call"
3. (more rules later)

We have a page that contains: [Tux]

And the kernel wants to "preserve" Tux in Type B memory.

We have a page that contains: [Tux]
(Type B may say NO to kernel!)

And the kernel wants to "preserve" Tux into Type B memory… but…

Kernel MUST ask permission and may get told NO!

We have a page that contains: [Tux]
(Type B may say NO to kernel!)

And the kernel wants to "preserve" Tux into Type B memory.
Two choices…
1. DEFINITELY want Tux back (e.g. "dirty" page)
   (Type B may commit to keeping the page around…)

We have a page that contains: [Tux]
(Type B may say NO to kernel!)

And the kernel wants to "preserve" Tux into Type B memory.
Two choices…
1. DEFINITELY want Tux back
   (Type B may commit to keeping the page around…)
2. PROBABLY want Tux back (but OK if disappears, e.g. "clean" pages)
   (…or may not!)
We have a page that contains: [Tux]

Two choices…
1. DEFINITELY want Tux back
2. PROBABLY want Tux back

tran-scend-ent, adj., … beyond the range of normal perception


We have a page that contains: [Tux]

Two choices…
1. DEFINITELY want Tux back → "PERSISTENT PUT"
2. PROBABLY want Tux back → "EPHEMERAL PUT"

eph-em-er-al, adj., … transitory, existing only briefly, short-lived (i.e. NOT persistent)
tran-scend-ent, adj., … beyond the range of normal perception



Core Transcendent Memory Operations:
• "PUT"
• "GET"
• "FLUSH"
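To make the three verbs concrete, here is a minimal sketch, in kernel-style C, of what a tmem backend interface could look like. Every name here (tmem_handle, tmem_backend_ops and their fields) is illustrative, not the kernel's: upstream Linux splits the interface into cleancache and frontswap (later slides), and Xen exposes it as hypercalls, but the PUT/GET/FLUSH shape is the same.

#include <linux/types.h>

struct page;                     /* defined for real in <linux/mm_types.h> */

struct tmem_handle {             /* hypothetical name; see addressing slides */
        s32 pool_id;             /* which pool: ephemeral or persistent */
        u64 object_id;           /* e.g. an inode number */
        u32 index;               /* e.g. a page offset within that object */
};

struct tmem_backend_ops {
        /* PUT: copy one page into tmem; the backend may refuse (< 0) */
        int  (*put_page)(struct tmem_handle *h, struct page *page);
        /* GET: copy a previously put page back; an ephemeral put may
         * legitimately miss, so callers must handle failure */
        int  (*get_page)(struct tmem_handle *h, struct page *page);
        /* FLUSH: the kernel disclaims interest; backend can free its copy */
        void (*flush_page)(struct tmem_handle *h);
};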


"Normal" RAM addressing
• byte-addressable
• virtual address: @fffff80001024580



"Normal" RAM addressing:
• byte-addressable
• virtual address: @fffff80001024580

Transcendent Memory:
• object-oriented addressing
  • object is a page
  • handle addresses a page
• kernel can (mostly) choose handle when a page is put
  • uses same handle to get
  • must ensure handle is and remains unique
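A hypothetical round trip built on the sketch above shows why handle uniqueness matters: the (pool, object, index) triple chosen at put time is the only name the page has at get time. Function and variable names here are invented for illustration.

static int preserve_tux(struct tmem_backend_ops *ops, struct page *tux,
                        u64 inode_no, u32 pgoff)
{
        struct tmem_handle h = {
                .pool_id   = 0,         /* say, an ephemeral pool       */
                .object_id = inode_no,  /* unique while the file exists */
                .index     = pgoff,     /* page offset keeps it unique  */
        };

        if (ops->put_page(&h, tux) < 0)
                return -1;              /* backend said NO: that's allowed */

        /* Later: an ephemeral get may miss, so the caller must always
         * be able to fall back to the filesystem/disk copy. */
        return ops->get_page(&h, tux);
}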



Why bother?




Once we’re behind the
curtain, we can do
interesting things…




Interesting thing #1




[Diagram: virtual machines (aka "guests") running on a hypervisor (aka "host"), with tmem pooling hypervisor RAM]

Tmem support:
• multiple guests
• compression
• deduplication

Tmem supported in Xen since 4.0 (2009). (future?)

Interesting thing #2



compress on put
decompress on get

Zcache (2.6.39 staging driver)




Interesting thing #3




Transparently move pre-compressed pages across a high-speed coherent interconnect

Interesting thing #3




     RAMster
     Peer-to-peer transcendent
     memory


Interesting thing #4




SSmem: Transcendent Memory as a "safe" access layer for SSD or NVRAM,
e.g. as a "RAM extension", not an I/O device
Interesting thing #3




     …maybe only one large
     memory server shared
     by many machines?


Cleancache
Merged in Linux 3.0


• A third-level victim cache for otherwise reclaimed clean page cache pages
  – Optionally load-balanced across multiple clients

• Cleancache patchset (hook sketch below):
  – VFS hooks to put clean page cache pages, get them back, maintain coherency
  – Per-filesystem opt-in hooks
  – Shim to zcache in 2.6.39
  – Shim to Xen tmem in 3.0
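The hot-path hooks are tiny. A simplified sketch modelled on the include/linux/cleancache.h API (the real hook sites live in the page cache reclaim and read paths; wrapper names here are hypothetical and error handling is trimmed):

#include <linux/cleancache.h>
#include <linux/pagemap.h>

/* Reclaim path: about to drop a clean page cache page, so offer it to
 * the cleancache backend (an ephemeral put; the backend may decline). */
static void reclaim_hook(struct page *page)
{
        cleancache_put_page(page);
}

/* Read path: before issuing disk I/O, ask cleancache for the page.
 * cleancache_get_page() returns 0 on a hit, so the disk read is skipped. */
static int read_hook(struct page *page)
{
        if (cleancache_get_page(page) == 0) {
                SetPageUptodate(page);
                return 0;               /* hit: one disk access saved */
        }
        return -1;                      /* miss: caller does the normal read */
}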




Frontswap
Merged in Linux 3.5


• Temporary emergency FAST swap page store
  – Optionally load-balanced across multiple clients

• Frontswap patchset (hook sketch below):
  – Swap subsystem hooks to put and get swap cache pages
  – Maintain coherency
  – Manages tracking data structures (1 bit/page)
  – Partial swapoff
  – Shim to zcache in 2.6.39
  – Shim to Xen tmem merged in 3.1
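The swap-side hooks follow the same pattern. A simplified sketch of the mm/page_io.c changes (the 3.5 API spells the operations frontswap_store/frontswap_load; do_swap_io() is a hypothetical stand-in for the real block I/O):

#include <linux/frontswap.h>
#include <linux/swap.h>

int do_swap_io(struct page *page);      /* hypothetical stand-in */

/* Swap-out: try the frontswap backend first (a persistent put). */
static int swap_out_hook(struct page *page)
{
        if (frontswap_store(page) == 0)
                return 0;               /* accepted: disk write skipped */
        return do_swap_io(page);        /* rejected: swap to disk as usual */
}

/* Swap-in: a page the backend accepted earlier MUST come back. */
static int swap_in_hook(struct page *page)
{
        if (frontswap_load(page) == 0)
                return 0;               /* hit: disk write+read saved */
        return do_swap_io(page);        /* not in frontswap: read the disk */
}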




Kernel changes


• Frontends require core kernel changes
  – Cleancache
  – Frontswap

• Backends do NOT require core kernel changes
  – Zcache, RAMster, Xen tmem all implemented as drivers (registration sketch below)
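This is why backends stay out of the core: at init time they simply register ops tables with the two frontends, roughly as the zcache staging driver does. A sketch (ops fields elided; their exact names changed across kernel versions):

#include <linux/module.h>
#include <linux/cleancache.h>
#include <linux/frontswap.h>

static struct cleancache_ops my_cleancache_ops = {
        /* put/get/invalidate hooks: filled in by a real backend */
};
static struct frontswap_ops my_frontswap_ops = {
        /* store/load hooks: filled in by a real backend */
};

static int __init my_backend_init(void)
{
        /* No core kernel changes needed: just hand the frontends our ops. */
        cleancache_register_ops(&my_cleancache_ops);
        frontswap_register_ops(&my_frontswap_ops);
        return 0;
}
module_init(my_backend_init);
MODULE_LICENSE("GPL");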




Transcendent Memory in Linux
Multi-year merge effort


Xen   non-Xen   name of patchset                   Linux version   notes

N     Y         zcache/zcache2                     2.6.39/3.7      staging driver
Y     Y         cleancache                         3.0             Linus decided!
Y     N         Xen-tmem, selfballooning           3.1
Y     ?         frontswap-selfshrinking            3.1
Y     Y         frontswap                          3.5             Linus decided!
?     Y         RAMster (merged w/zcache2)         3.4/3.7         staging driver
Y     Y         module support, frontswap unuse,   3.8?            under development
                frontswap admission improvements



Transcendent Memory
Oracle Product Plans


• Transcendent Memory now in upstream Linux kernel
  – cleancache, frontswap
  – guest kernel support (aka Xen tmem)
  – zcache
  – RAMster

• Transcendent Memory support has been in the Xen hypervisor for over 2 years
  – Available in Oracle VM 2.2 and 3.x

• Transcendent Memory in UEK2 for a year
  – cleancache, frontswap
  – guest kernel support (aka Xen tmem)
  – zcache2 coming soon



Pretty Graphs! Facts! Figures!




frontswap patchset diffstat


Documentation/vm/frontswap.txt | 210 +++++++++++++++++++
include/linux/frontswap.h      | 126 ++++++++++++
include/linux/swap.h           |    4
include/linux/swapfile.h       |   13 +
mm/Kconfig                     |   17 ++
mm/Makefile                    |    1
mm/frontswap.c                 | 273 +++++++++++++++++++
mm/page_io.c                   |   12 +
mm/swapfile.c                  |   64 +++++--
9 files changed, 707 insertions(+), 13 deletions(-)

• Low core maintenance impact
  – ~100 lines
• No impact if CONFIG_FRONTSWAP=n (see the header sketch below)
• Negligible impact if CONFIG_FRONTSWAP=y and no backend
• How much benefit per backend?
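"No impact if CONFIG_FRONTSWAP=n" comes from the standard kernel trick: with the option off, the header turns every hook into a no-op that the compiler deletes. Simplified from include/linux/frontswap.h:

#ifdef CONFIG_FRONTSWAP
extern int frontswap_store(struct page *page);
extern int frontswap_load(struct page *page);
#else
/* Compiled away entirely: the call sites in mm/page_io.c cost nothing. */
static inline int frontswap_store(struct page *page) { return -1; }
static inline int frontswap_load(struct page *page) { return -1; }
#endif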

A benchmark


• Workload:
  – make -jN on linux-3.1 source (after make clean)
  – Fresh reboot before each run
  – All tests run as root in multi-user mode
• Software:
  – Linux 3.2
• Hardware:
  – Dell OptiPlex 790 (~$500)
  – Intel Core i5-2400 @ 3.1GHz (quad-core/hyperthreaded, 6MB cache)
  – 1GB DDR3 RAM @ 1333MHz (limited by memmap)
  – One 7200rpm SATA 6Gb/s drive with 8MB cache
  – 10GB swap partition
  – 1Gb Ethernet

Workload objective
Changing N varies memory pressure


• Small N (4-12): no memory pressure
  – Page cache never fills to exceed RAM, no swapping
• Medium N (16-24): moderate memory pressure
  – Page cache fills so lots of reclaiming, but little to no swapping
• Large N (28-36): high memory pressure
  – Much page cache churn, lots of swapping
• Largest N (40): extreme memory pressure
  – Little space for page cache churn, swap storm occurs



Native/Baseline (no zcache registered)

[Chart: kernel compile "make -jN" elapsed time in seconds, log scale; smaller is better. The N=40 run did not complete (18000+ seconds).]

N           4    8    12   16    20    24    28    32    36    40
no zcache   879  858  858  1009  1316  2164  3293  4286  6516  DNC

Review: what is zcache?
Captures and compresses evicted clean page cache pages


• when clean pages are reclaimed (cleancache "put")
  – zcache compresses/stores the contents of evicted pages in RAM
  – zcache has a "shrinker hook" for when the kernel runs low on memory
• when the filesystem reads file pages (cleancache "get")
  – zcache checks whether it has a copy; if so, it decompresses/returns it
  – else it reads from the filesystem/disk as normal
• One disk access saved for every successful "get" (compression sketch below)
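A sketch of the compression step on the put side. Real zcache uses LZO plus a special allocator (xvmalloc, later zbud) and keeps the LZO workspace per-CPU; this trims that down to the essential policy of "compress, and say NO if it doesn't pay":

#include <linux/lzo.h>
#include <linux/highmem.h>
#include <linux/errno.h>

/* wrkmem must be LZO1X_1_MEM_COMPRESS bytes; real zcache keeps it per-CPU. */
static int zcache_style_compress(struct page *page, unsigned char *dst,
                                 size_t *dst_len, void *wrkmem)
{
        unsigned char *src = kmap_atomic(page);
        int ret = lzo1x_1_compress(src, PAGE_SIZE, dst, dst_len, wrkmem);

        kunmap_atomic(src);
        if (ret != LZO_E_OK)
                return -EINVAL;
        /* Policy sketch: a backend may reject pages that compress poorly;
         * a put is always allowed to fail. (This threshold is invented.) */
        if (*dst_len > PAGE_SIZE * 3 / 4)
                return -ENOSPC;
        return 0;
}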




Review: what is zcache?
Captures and compresses swap pages (in RAM)


• when a page needs to be swapped out (frontswap "put")
  – zcache compresses/stores the contents of the swap page in RAM
  – zcache enforces policies and may reject some (or all) pages
  – frontswap maintains a bit map of saved/rejected swap pages (sketch below)
• when a page needs to be swapped in (frontswap "get")
  – if the frontswap bit is set, zcache decompresses/returns it
  – else it reads from the swap disk as normal
• One disk write+read saved for every successful "get"
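The "1 bit/page" bookkeeping above is just a per-swap-device bitmap. A simplified sketch of how swap-in consults it (structure and function names abbreviated from the real mm/frontswap.c code):

#include <linux/bitops.h>
#include <linux/types.h>

/* One bit per page of the swap device: set = the page lives in the
 * backend, clear = the page (if any) is on the physical swap disk. */
struct frontswap_map_sketch {
        unsigned long *map;             /* allocated at swapon time */
};

static bool slot_in_backend(struct frontswap_map_sketch *fs,
                            unsigned long offset)
{
        return test_bit(offset, fs->map);
}

/* Successful put        -> set_bit(offset, fs->map)
 * Rejected put or flush -> clear_bit(offset, fs->map)
 * so swap-in knows whether to ask the backend or read the disk. */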




Zcache vs native/baseline

[Chart: kernel compile "make -jN" elapsed time in seconds, log scale; smaller is better. Up to 26-31% faster.]

N           4    8    12   16    20    24    28    32    36    40
no zcache   879  858  858  1009  1316  2164  3293  4286  6516  DNC
zcache      877  856  856  922   1154  1714  2500  4282  6602  13755

Benchmark analysis - zcache


• small N (4-12): no memory pressure
  – zcache has no effect, but apparently no measurable cost either
• medium N (16-20): moderate memory pressure
  – zcache increases total pages cached due to compression
  – performance improves 9%-14%
• large N (24-28): high memory pressure
  – zcache increases total pages cached due to compression
  – AND zcache uses RAM for compressed swap to avoid swap-to-disk
  – performance improves 26%-31%
• large N (32-36): very high memory pressure
  – compressed page cache gets reclaimed before use, no advantage
  – compressed in-RAM swap counteracted by smaller kernel page cache?
  – performance changes 0% to -1%
• largest N (40): extreme memory pressure
  – in-RAM swap compression reduces the worst-case swapstorm




Review: what is RAMster?
Locally compresses swap and clean page cache pages, but stores in
remote RAM

• Leverages zcache; adds cluster code using kernel sockets
• Same as zcache, but also "remotifies" compressed swap pages to another system's RAM
  – One disk write+read saved for every successful swap "get" (at the cost of some network traffic)
  – One disk access saved for every successful page cache "get" (at the cost of some network traffic)
• Peer-to-peer or client-server (currently up to 8 nodes)
• RAM management is entirely dynamic




Zcache and RAMster

[Chart: kernel compile "make -jN" elapsed time in seconds, log scale; smaller is better.]

N           4    8    12   16    20    24    28    32    36    40
no zcache   879  858  858  1009  1316  2164  3293  4286  6516  DNC
zcache      877  856  856  922   1154  1714  2500  4282  6602  13755
ramster     887  866  875  949   1162  1788  2177  3599  5394  8172


Workload Analysis - RAMster


• small N (4-12): no memory pressure
  – RAMster has no effect, but small cost
• medium N (16-20): moderate memory pressure
  – RAMster increases total pages cached due to compression
  – performance improves 6%-13% (somewhat slower than zcache)
• large N (24-28): high memory pressure
  – RAMster increases total pages cached (local) due to compression
  – and RAMster uses remote RAM to avoid swap-to-disk
  – performance improves 21%-51%
• large N (32-36): very high memory pressure
  – compressed page cache gets reclaimed before use, no advantage
  – but RAMster still uses remote (compressed) RAM to avoid swap-to-disk
  – performance improves 19%-22% (vs zcache and native)
• largest N (40): extreme memory pressure
  – use of remote RAM significantly reduces the worst-case swapstorm



Questions?





More Related Content

Similar to Transcendent Memory: Not Just for Virtualization Anymore

Partner facing vspex deck[1]
Partner facing vspex deck[1]Partner facing vspex deck[1]
Partner facing vspex deck[1]Arrow ECS UK
 
Flash performance tuning (EN)
Flash performance tuning (EN)Flash performance tuning (EN)
Flash performance tuning (EN)Andy Hall
 
Greenplum Database on HDFS
Greenplum Database on HDFSGreenplum Database on HDFS
Greenplum Database on HDFSDataWorks Summit
 
Managing IP Subsystems at the System Level
Managing IP Subsystems at the System LevelManaging IP Subsystems at the System Level
Managing IP Subsystems at the System LevelChipStart LLC
 
Lego Cloud SAP Virtualization Week 2012
Lego Cloud SAP Virtualization Week 2012Lego Cloud SAP Virtualization Week 2012
Lego Cloud SAP Virtualization Week 2012Benoit Hudzia
 
DB2 10 Universal Table Space - 2012-03-18 - no template
DB2 10 Universal Table Space - 2012-03-18 - no templateDB2 10 Universal Table Space - 2012-03-18 - no template
DB2 10 Universal Table Space - 2012-03-18 - no templateWillie Favero
 
From Eclipse to Document Management - Eclipse DemoCamp Grenoble 2012
From Eclipse to Document Management - Eclipse DemoCamp Grenoble 2012From Eclipse to Document Management - Eclipse DemoCamp Grenoble 2012
From Eclipse to Document Management - Eclipse DemoCamp Grenoble 2012Marc Dutoo
 
20121108 vmug london event nimble sorage for vdi
20121108 vmug london event nimble sorage for vdi20121108 vmug london event nimble sorage for vdi
20121108 vmug london event nimble sorage for vdisubtitle
 
Master Class: Integration in the world of Social Business (Lotusphere2012 JMP...
Master Class: Integration in the world of Social Business (Lotusphere2012 JMP...Master Class: Integration in the world of Social Business (Lotusphere2012 JMP...
Master Class: Integration in the world of Social Business (Lotusphere2012 JMP...John Head
 
NLJUG: Content Management, Standards, Opensource & JCP
NLJUG: Content Management, Standards, Opensource & JCPNLJUG: Content Management, Standards, Opensource & JCP
NLJUG: Content Management, Standards, Opensource & JCPDavid Nuescheler
 
Multi-physics with MotionSolve
Multi-physics with MotionSolveMulti-physics with MotionSolve
Multi-physics with MotionSolveAltair
 
Ria2010 workshop dev mobile
Ria2010 workshop dev mobileRia2010 workshop dev mobile
Ria2010 workshop dev mobileMichael Chaize
 
Transforming your Business with Scale-Out Flash: How MongoDB & Flash Accelera...
Transforming your Business with Scale-Out Flash: How MongoDB & Flash Accelera...Transforming your Business with Scale-Out Flash: How MongoDB & Flash Accelera...
Transforming your Business with Scale-Out Flash: How MongoDB & Flash Accelera...MongoDB
 
Journey to end user computing dallas vmug may 2013
Journey to end user computing   dallas vmug may 2013Journey to end user computing   dallas vmug may 2013
Journey to end user computing dallas vmug may 2013Tommy Trogden
 
Transform Your SAP Landscape Using EMC Technologies
Transform Your SAP Landscape Using EMC TechnologiesTransform Your SAP Landscape Using EMC Technologies
Transform Your SAP Landscape Using EMC TechnologiesCenk Ersoy
 
IP Expo 2012 Storage Lab Presentation - Nimble Storage
IP Expo 2012 Storage Lab Presentation - Nimble StorageIP Expo 2012 Storage Lab Presentation - Nimble Storage
IP Expo 2012 Storage Lab Presentation - Nimble Storageresponsedatacomms
 
Limewood Event - EMC
Limewood Event - EMC Limewood Event - EMC
Limewood Event - EMC BlueChipICT
 
S3 l6 db2 - memory model
S3 l6   db2 - memory modelS3 l6   db2 - memory model
S3 l6 db2 - memory modelMohammad Khan
 
Flash Storage Technology 101
Flash Storage Technology 101Flash Storage Technology 101
Flash Storage Technology 101Unitiv
 

Similar to Transcendent Memory: Not Just for Virtualization Anymore (20)

Partner facing vspex deck[1]
Partner facing vspex deck[1]Partner facing vspex deck[1]
Partner facing vspex deck[1]
 
Flash performance tuning (EN)
Flash performance tuning (EN)Flash performance tuning (EN)
Flash performance tuning (EN)
 
Greenplum Database on HDFS
Greenplum Database on HDFSGreenplum Database on HDFS
Greenplum Database on HDFS
 
Managing IP Subsystems at the System Level
Managing IP Subsystems at the System LevelManaging IP Subsystems at the System Level
Managing IP Subsystems at the System Level
 
Lego Cloud SAP Virtualization Week 2012
Lego Cloud SAP Virtualization Week 2012Lego Cloud SAP Virtualization Week 2012
Lego Cloud SAP Virtualization Week 2012
 
DB2 10 Universal Table Space - 2012-03-18 - no template
DB2 10 Universal Table Space - 2012-03-18 - no templateDB2 10 Universal Table Space - 2012-03-18 - no template
DB2 10 Universal Table Space - 2012-03-18 - no template
 
From Eclipse to Document Management - Eclipse DemoCamp Grenoble 2012
From Eclipse to Document Management - Eclipse DemoCamp Grenoble 2012From Eclipse to Document Management - Eclipse DemoCamp Grenoble 2012
From Eclipse to Document Management - Eclipse DemoCamp Grenoble 2012
 
20121108 vmug london event nimble sorage for vdi
20121108 vmug london event nimble sorage for vdi20121108 vmug london event nimble sorage for vdi
20121108 vmug london event nimble sorage for vdi
 
Master Class: Integration in the world of Social Business (Lotusphere2012 JMP...
Master Class: Integration in the world of Social Business (Lotusphere2012 JMP...Master Class: Integration in the world of Social Business (Lotusphere2012 JMP...
Master Class: Integration in the world of Social Business (Lotusphere2012 JMP...
 
NLJUG: Content Management, Standards, Opensource & JCP
NLJUG: Content Management, Standards, Opensource & JCPNLJUG: Content Management, Standards, Opensource & JCP
NLJUG: Content Management, Standards, Opensource & JCP
 
Multi-physics with MotionSolve
Multi-physics with MotionSolveMulti-physics with MotionSolve
Multi-physics with MotionSolve
 
Ria2010 workshop dev mobile
Ria2010 workshop dev mobileRia2010 workshop dev mobile
Ria2010 workshop dev mobile
 
RSG: ECM market
RSG: ECM marketRSG: ECM market
RSG: ECM market
 
Transforming your Business with Scale-Out Flash: How MongoDB & Flash Accelera...
Transforming your Business with Scale-Out Flash: How MongoDB & Flash Accelera...Transforming your Business with Scale-Out Flash: How MongoDB & Flash Accelera...
Transforming your Business with Scale-Out Flash: How MongoDB & Flash Accelera...
 
Journey to end user computing dallas vmug may 2013
Journey to end user computing   dallas vmug may 2013Journey to end user computing   dallas vmug may 2013
Journey to end user computing dallas vmug may 2013
 
Transform Your SAP Landscape Using EMC Technologies
Transform Your SAP Landscape Using EMC TechnologiesTransform Your SAP Landscape Using EMC Technologies
Transform Your SAP Landscape Using EMC Technologies
 
IP Expo 2012 Storage Lab Presentation - Nimble Storage
IP Expo 2012 Storage Lab Presentation - Nimble StorageIP Expo 2012 Storage Lab Presentation - Nimble Storage
IP Expo 2012 Storage Lab Presentation - Nimble Storage
 
Limewood Event - EMC
Limewood Event - EMC Limewood Event - EMC
Limewood Event - EMC
 
S3 l6 db2 - memory model
S3 l6   db2 - memory modelS3 l6   db2 - memory model
S3 l6 db2 - memory model
 
Flash Storage Technology 101
Flash Storage Technology 101Flash Storage Technology 101
Flash Storage Technology 101
 

Recently uploaded

From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure servicePooja Nehwal
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxKatpro Technologies
 
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...gurkirankumar98700
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Enterprise Knowledge
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024Results
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 

Recently uploaded (20)

From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
 
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 

Transcendent Memory: Not Just for Virtualization Anymore

  • 1. 1 Copyright © 2012, Oracle and/or its affiliates. All rights reserved.
  • 2. Transcendent Memory Avi Miller Principal Program Manager ORACLE PRODUCT LOGO
  • 3. Further reading: https://lwn.net/Articles/454795/ 3 Copyright © 2012, Oracle and/or its affiliates. All rights reserved.
  • 4. Objectives  Utilise RAM more effectively – Lower capital costs – Lower power utilisation – Less I/O  Better performance on many workloads – Negligible loss on others 4 Copyright © 2012, Oracle and/or its affiliates. All rights reserved.
  • 5. Motivation: Memory-inefficient workloads 5 Copyright © 2012, Oracle and/or its affiliates. All rights reserved.
  • 6. More motivation: memory capacity wall 1000 # Core GB DRAM 100 10 1 2003 2004 2005 2006 2007 2008 2009 2010 2012 2013 2014 2015 2016 2017 Memory capacity per core drops ~30% every 2 years 2011 Source: Disaggregated Memory for Expansion and Sharing in Blade Server http://isca09.cs.columbia.edu/pres/24.pptx 6 Copyright © 2012, Oracle and/or its affiliates. All rights reserved.
  • 7. 7 Copyright © 2012, Oracle and/or its affiliates. All rights reserved.
  • 8. Slide from: Linux kernel support to exploit phase change memory, Linux Symposium 2010, Youngwoo Park, EE KAIST 8 Copyright © 2012, Oracle and/or its affiliates. All rights reserved.
  • 9. Disaggregated memory DIMM DIMM DIMM DIMM DIMM CPUs CPUs DIMM DIMM DIMM Exofabric DIMM DIMM DIMM DIMM DIMM CPUs CPUs DIMM DIMM DIMM Leverage fast, shared Memory communication fabrics blade Source: Disaggregated Memory for Expansion and Sharing in Blade Server http://isca09.cs.columbia.edu/pres/24.pptx 9 Copyright © 2012, Oracle and/or its affiliates. All rights reserved.
  • 10. OS memory “demand” OS Operating systems are memory hogs! Memory constraint 10 Copyright © 2012, Oracle and/or its affiliates. All rights reserved.
  • 11. OS Physical Memory Management OS If you give an operating system more memory… New larger memory constraint 11 Copyright © 2012, Oracle and/or its affiliates. All rights reserved.
  • 12. OS Physical Memory Management My name is Linux and I am a … it uses up memory hog any memory you give it! Memory constraint 12 Copyright © 2012, Oracle and/or its affiliates. All rights reserved.
  • 13. OS Memory “Asceticism”  ASSUME – We should use as little RAM as possible  SUPPOSE – Mechanism to allow the OS to surrender RAM – Mechanism to allow the OS to obtain more RAM  THEN – How does an OS decide how much RAM it actually needs? as-cet-i-cism, n. 1. extreme self-denial and austerity; rigorous self-discipline and active restraint; renunciation of material comforts so as to achieve a higher state 13 Copyright © 2012, Oracle and/or its affiliates. All rights reserved.
  • 14. Impact on Linux Memory Subsystem 14 Copyright © 2012, Oracle and/or its affiliates. All rights reserved.
  • 15. 15 Copyright © 2012, Oracle and/or its affiliates. All rights reserved.
  • 16. 16 Copyright © 2012, Oracle and/or its affiliates. All rights reserved.
  • 17. 17 Copyright © 2012, Oracle and/or its affiliates. All rights reserved.
  • 18. CAPACITY KNOWN Can read or write to any byte. 18 Copyright © 2012, Oracle and/or its affiliates. All rights reserved.
  • 19. CAPACITY UNKOWN CAPACITY KNOWN and may change Can read or write to dynamically! any byte. 19 Copyright © 2012, Oracle and/or its affiliates. All rights reserved.
  • 20. • CAPACITY: known • USES: • kernel memory • user memory • DMA • ADDRESSABILITY: • Read/write any byte 20 Copyright © 2012, Oracle and/or its affiliates. All rights reserved.
  • 21. • CAPACITY: known • USES: • CAPACITY -“unknowable” • kernel memory - dynamic SO… • user memory kernel/CPU can’t • DMA SO… address directly! • ADDRESSABILITY: Need “permission” to access and need • Read/write any byte to “follow rules” (even the kernel!) 21 Copyright © 2012, Oracle and/or its affiliates. All rights reserved.
  • 22. • CAPACITY: known • USES: • THE RULES • kernel memory 1. “page”-at-a-time • user memory 2. to put data here, • DMA kernel MUST use a • ADDRESSABILITY: “put page call” • Read/write any byte 3. (more rules later) 22 Copyright © 2012, Oracle and/or its affiliates. All rights reserved.
  • 23. 23 Copyright © 2012, Oracle and/or its affiliates. All rights reserved.
  • 24. We have a page that contains: And the kernel wants to “preserve” Tux in Type B memory. 24 Copyright © 2012, Oracle and/or its affiliates. All rights reserved.
  • 25. We have a page that contains: may say NO to kernel! And the kernel wants to “preserve” Tux into Type B memory… but… Kernel MUST ask permission and may get told NO! 25 Copyright © 2012, Oracle and/or its affiliates. All rights reserved.
  • 26. We have a page that contains: may say NO to kernel! And the kernel wants to “preserve” Tux into Type B memory. may commit to Two choices… keeping the 1.DEFINITELY want Tux back page around… (e.g. “dirty” page) 26 Copyright © 2012, Oracle and/or its affiliates. All rights reserved.
  • 27. We have a page that contains: may say NO to kernel! And the kernel wants to “preserve” Tux into Type B memory. Two choices… may commit 1.DEFINITELY want Tux back to keeping the 2.PROBABLY want Tux back page around… (but OK if disappears, e.g. “clean” pages) or may not! 27 Copyright © 2012, Oracle and/or its affiliates. All rights reserved.
  • 28. We have a page that contains: Two choices… 1.DEFINITELY want Tux back 2.PROBABLY want Tux back tran-scend-ent, adj., … beyond the range of normal perception 28 Copyright © 2012, Oracle and/or its affiliates. All rights reserved.
  • 29. We have a page that contains: Two choices… 1.DEFINITELY want Tux back “PERSISTENT PUT” 2.PROBABLY want Tux back “EPHEMERAL PUT” eph-em-er-al, adj., … transitory, existing only briefly, short- tran-scend-ent, adj., … beyond the lived (i.e. NOT persistent) range of normal perception 29 Copyright © 2012, Oracle and/or its affiliates. All rights reserved.
  • 30. “PUT” “GET” “FLUSH” Core Transcendent Memory Operations 30 Copyright © 2012, Oracle and/or its affiliates. All rights reserved.
  • 31. “Normal” RAM addressing • byte-addressable • virtual address: @fffff8000102458 0 31 Copyright © 2012, Oracle and/or its affiliates. All rights reserved.
  • 32. “Normal” RAM Transcendent addressing Memory • byte-addressable • object-oriented addressing • virtual address: • object is a page • handle addresses a page @fffff80001024580 • kernel can (mostly) choose handle when a page is put • uses same handle to get • must ensure handle is and remains unique 32 Copyright © 2012, Oracle and/or its affiliates. All rights reserved.
  • 33. Why bother? 33 Copyright © 2012, Oracle and/or its affiliates. All rights reserved.
  • 34. Once we’re behind the curtain, we can do interesting things… 34 Copyright © 2012, Oracle and/or its affiliates. All rights reserved.
  • 35. Interesting thing #1 virtual machines (aka “guests”) hypervisor (aka “host”) hypervisor Tmem support: Tmem supported in RAM • multiple guests Xen since 4.0 (2009) • compression • deduplication future? 35 Copyright © 2012, Oracle and/or its affiliates. All rights reserved.
  • 36. Interesting thing #2 compress on put decompress on get Zcache (2.6.39 staging driver) 36 Copyright © 2012, Oracle and/or its affiliates. All rights reserved.
  • 37. Interesting thing #3 Transparently move pre- compressed pages cross a high-speed coherent interconnect 37 Copyright © 2012, Oracle and/or its affiliates. All rights reserved.
  • 38. Interesting thing #3 RAMster Peer-to-peer transcendent memory 38 Copyright © 2012, Oracle and/or its affiliates. All rights reserved.
  • 39. Interesting thing #4 SSmem: Transcendent Memory as a “safe” access layer for SSD or NVRAM e.g. as a “RAM extension” not I/O device 39 Copyright © 2012, Oracle and/or its affiliates. All rights reserved.
  • 40. Interesting thing #3 …maybe only one large memory server shared by many machines? 40 Copyright © 2012, Oracle and/or its affiliates. All rights reserved.
  • 41. Cleancache Merged in Linux 3.0  A third-level victim cache for otherwise reclaimed clean page cache pages – Optionally load-balanced across multiple clients  Cleancache patchset: – VFS hooks to put clean page cache pages, get them back, maintain coherency – Per filesystem opt-in hooks – Shim to zcache in 2.6.39 – Shim to Xen tmem in 3.0 41 Copyright © 2012, Oracle and/or its affiliates. All rights reserved.
  • 42. Frontswap Merged in Linux 3.5  Temporary emergency FAST swap page store – Optionally load-balanced across multiple clients  Frontswap patchset: – Swap subsystem hooks to put and get swap cache pages – Maintain coherency – Manages tracking data structures (1 bit/page) – Partial swapoff – Shim to zcache in 2.6.39 – Shim to Xen tmem merged in 3.1 42 Copyright © 2012, Oracle and/or its affiliates. All rights reserved.
  • 43. Kernel changes  Frontends require core kernel changes – Cleancache – Frontswap  Backends do NOT require core kernel chances – Zcache, RAMster, Xen tmem all implemented as drivers 43 Copyright © 2012, Oracle and/or its affiliates. All rights reserved.
  • 44. Transcendent Memory in Linux Multi-year merge effort Xen non- name of patchset Linux Xen version N Y zcache/zcache2 2.6.39/3.7 staging driver Y Y cleancache 3.0 Linus decided! Y N Xen-tmem, selfballooning 3.1 Y ? frontswap-selfshrinking 3.1 Y Y Frontswap 3.5 Linus decided! ? Y RAMster (merged w/zcache2) 3.4/3.7 staging driver Y Y module support, frontswap unuse, 3.8? under development frontswap admission improvements 44 Copyright © 2012, Oracle and/or its affiliates. All rights reserved.
  • 45. Transcendent Memory Oracle Product Plans  Transcendent Memory now in upstream Linux kernel – cleancache, frontswap – guest kernel support (aka Xen tmem) – zcache – RAMster  Transcendent Memory support has been in the Xen hypervisor for over 2 years. – Available in Oracle VM 2.2 and 3.x  Transcendent Memory in UEK2 for a year – cleancache, frontswap – guest kernel support (aka Xen tmem) – zcache2 coming soon 45 Copyright © 2012, Oracle and/or its affiliates. All rights reserved.
  • 46. Pretty Graphs! Facts! Figures! 46 Copyright © 2012, Oracle and/or its affiliates. All rights reserved.
  • 47. frontswap patchset diffstat Documentation/vm/frontswap.txt | 210 +++++++++++++++++++ include/linux/frontswap.h | 126 ++++++++++++ include/linux/swap.h | 4 include/linux/swapfile.h | 13 + mm/Kconfig | 17 ++ mm/Makefile | 1 mm/frontswap.c | 273 +++++++++++++++++++ mm/page_io.c | 12 + mm/swapfile.c | 64 +++++-- 9 files changed, 707 insertions(+), 13 deletions(-)  Low core maintenance impact – ~100 lines  No impact if CONFIG_FRONTSWAP=n  Negligible impact if CONFIG_FRONTSWAP=y and no backend  How much benefit per backend? 47 Copyright © 2012, Oracle and/or its affiliates. All rights reserved.
  • 48. A benchmark  Workload: – make --jN on linux-3.1 source (after make clean) – Fresh reboot before each run – All tests run as root in multi-user mode  Software: – Linux 3.2  Hardware: – Dell Optiplex 790 (~$500) – Intel Core i5-2400 @ 3.1Ghz (Quad Core/Hyperthreaded – 6M cache) – 1GB DDR3 RAM @ 1333Mhz (limited by memmap) – One 7200rpm SATA 6Gpbs drive with 8MB cache – 10GB swap partition – 1Gb ethernet 48 Copyright © 2012, Oracle and/or its affiliates. All rights reserved.
• 49. Workload objective: changing N varies memory pressure
  – Small N (4-12): no memory pressure. Page cache never fills to exceed RAM; no swapping.
  – Medium N (16-24): moderate memory pressure. Page cache fills, so lots of reclaiming, but little to no swapping.
  – Large N (28-36): high memory pressure. Much page cache churn; lots of swapping.
  – Largest N (40): extreme memory pressure. Little room for page cache churn; a swap storm occurs.
• 50. Native/Baseline (no zcache registered). Kernel compile “make -jN”, elapsed time in seconds (smaller is better); DNC = did not complete (18,000+ seconds).

  N           4    8    12    16    20    24    28    32    36    40
  no zcache   879  858  858   1009  1316  2164  3293  4286  6516  DNC
• 51. Review: what is zcache? It captures and compresses evicted clean page cache pages.
  When clean pages are reclaimed (cleancache “put”):
  – zcache compresses and stores the contents of the evicted pages in RAM
  – zcache has a “shrinker hook” in case the kernel runs low on memory
  When the filesystem reads file pages (cleancache “get”):
  – zcache checks whether it has a copy; if so, it decompresses and returns it
  – otherwise the page is read from the filesystem/disk as normal
  One disk access is saved for every successful “get”.
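In code, that flow reduces to something like the following conceptual, userspace-style sketch. It is NOT the staging driver's source: the zpool_* and lz_* helpers and the handle/zpage types are hypothetical stand-ins for zcache's internal tmem-pool and LZO-compression code.

```c
/* Conceptual sketch of zcache's cleancache path; all helpers and types
 * here are hypothetical stand-ins, not the staging driver's source. */
#include <stdlib.h>

#define PAGE_SIZE 4096

struct handle { int pool; unsigned long oid, index; }; /* see "handles" later */
struct zpage  { char *data; size_t len; };

extern char *lz_compress(const char *src, size_t len, size_t *zlen);
extern void  lz_decompress(const char *zdata, size_t zlen, char *dst);
extern int   zpool_insert(struct handle h, char *zdata, size_t zlen);
extern struct zpage *zpool_lookup(struct handle h);
extern void  zpool_remove(struct zpage *z);

/* cleancache "put": a clean page cache page is being evicted. */
void zcache_cc_put(struct handle h, const char *page)
{
	size_t zlen;
	char *zdata = lz_compress(page, PAGE_SIZE, &zlen); /* ~2:1 typical */

	if (zpool_insert(h, zdata, zlen) != 0)	/* the store may refuse... */
		free(zdata);	/* ...ephemeral data is allowed to be dropped */
}

/* cleancache "get": the filesystem is reading that page back in. */
int zcache_cc_get(struct handle h, char *page)
{
	struct zpage *z = zpool_lookup(h);

	if (z == NULL)
		return -1;		/* miss: caller reads from disk */
	lz_decompress(z->data, z->len, page);
	zpool_remove(z);		/* an ephemeral "get" acts as get-and-flush */
	return 0;			/* hit: one disk access saved */
}
```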
• 52. Review: what is zcache? It also captures and compresses swap pages (in RAM).
  When a page needs to be swapped out (frontswap “put”):
  – zcache compresses and stores the contents of the swap page in RAM
  – zcache enforces policies and may reject some (or all) pages
  – frontswap maintains a bitmap of saved/rejected swap pages
  When a page needs to be swapped in (frontswap “get”):
  – if the frontswap bit is set, zcache decompresses and returns the page
  – otherwise the page is read from the swap disk as normal
  One disk write+read is saved for every successful “get”.
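The swap-in side shows how that bitmap is consulted. The following is paraphrased and simplified from mm/frontswap.c as merged in Linux 3.5; details may differ slightly from the exact source.

```c
/* Paraphrased from mm/frontswap.c (Linux 3.5, simplified): each swap
 * area carries a 1-bit-per-page map recording which offsets the
 * backend accepted, so swap-in knows whether to ask the backend or
 * read the swap device. */
#include <linux/swap.h>
#include <linux/swapfile.h>
#include <linux/frontswap.h>

int __frontswap_load(struct page *page)
{
	swp_entry_t entry = { .val = page_private(page) };
	int type = swp_type(entry);
	struct swap_info_struct *sis = swap_info[type];
	pgoff_t offset = swp_offset(entry);
	int ret = -1;

	if (frontswap_test(sis, offset))	/* bit set: backend has it */
		ret = frontswap_ops.load(type, offset, page);
	return ret;	/* non-zero: caller falls back to the swap device */
}
```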
• 53. Zcache vs native/baseline. Kernel compile “make -jN”, elapsed time in seconds (smaller is better); up to 26-31% faster.

  N           4    8    12    16    20    24    28    32    36    40
  no zcache   879  858  858   1009  1316  2164  3293  4286  6516  DNC
  zcache      877  856  856   922   1154  1714  2500  4282  6602  13755
• 54. Benchmark analysis: zcache
  – Small N (4-12), no memory pressure: zcache has no effect, but apparently no measurable cost either
  – Medium N (16-20), moderate memory pressure: zcache increases the total pages cached due to compression; performance improves 9%-14%
  – Large N (24-28), high memory pressure: zcache increases the total pages cached due to compression AND uses RAM for compressed swap to avoid swap-to-disk; performance improves 26%-31%
  – Large N (32-36), very high memory pressure: compressed page cache gets reclaimed before use, so no advantage; compressed in-RAM swap is perhaps counteracted by a smaller kernel page cache; performance ranges from a 0% gain to a 1% loss
  – Largest N (40), extreme memory pressure: in-RAM swap compression reduces the worst-case swap storm
• 55. Review: what is RAMster? It compresses swap and clean page cache pages locally, but stores them in remote RAM.
  – Leverages zcache; adds cluster code using kernel sockets
  – Same as zcache, but it also “remotifies” compressed swap pages to another system’s RAM:
    – one disk write+read saved for every successful swap “get” (at the cost of some network traffic)
    – one disk access saved for every successful page cache “get” (at the cost of some network traffic)
  – Peer-to-peer or client-server (currently up to 8 nodes)
  – RAM management is entirely dynamic
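A conceptual sketch of the “remotify” decision follows. This is not the staging driver's code; every function named here is a hypothetical stand-in. Note that pages are compressed locally first, so only compressed data crosses the wire.

```c
/* Conceptual sketch of RAMster's "remotify" step; all functions named
 * below are hypothetical stand-ins, not the staging driver's code. */
#include <linux/types.h>

extern int zcache_local_store(unsigned long handle, void *zdata, size_t zlen);
extern int ramster_node_send(int node, unsigned long handle,
			     void *zdata, size_t zlen);
extern int ramster_pick_peer(void);	/* a peer with spare RAM, if any */

int ramster_store(unsigned long handle, void *zdata, size_t zlen)
{
	/* Prefer local RAM, exactly like plain zcache... */
	if (zcache_local_store(handle, zdata, zlen) == 0)
		return 0;

	/* ...otherwise ship the compressed page to a peer over a kernel
	 * socket. A successful later "get" then costs network traffic
	 * instead of a disk write plus a disk read. */
	return ramster_node_send(ramster_pick_peer(), handle, zdata, zlen);
}
```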
• 56. Zcache and RAMster. Kernel compile “make -jN”, elapsed time in seconds (smaller is better).

  N           4    8    12    16    20    24    28    32    36    40
  no zcache   879  858  858   1009  1316  2164  3293  4286  6516  DNC
  zcache      877  856  856   922   1154  1714  2500  4282  6602  13755
  ramster     887  866  875   949   1162  1788  2177  3599  5394  8172
• 57. Workload analysis: RAMster
  – Small N (4-12), no memory pressure: RAMster has no effect, but a small cost
  – Medium N (16-20), moderate memory pressure: RAMster increases the total pages cached due to compression; performance improves 6%-13%, somewhat slower than zcache
  – Large N (24-28), high memory pressure: RAMster increases the total (local) pages cached due to compression and uses remote RAM to avoid swap-to-disk; performance improves 21%-51%
  – Large N (32-36), very high memory pressure: compressed page cache gets reclaimed before use, so no advantage there, but RAMster still uses remote (compressed) RAM to avoid swap-to-disk; performance improves 19%-22% (vs both zcache and native)
  – Largest N (40), extreme memory pressure: the use of remote RAM significantly reduces the worst-case swap storm
• 58. Questions?

Editor's Notes

  1. Our agenda for today, I'm going to quickly review the motivation, the key problem, and identify the challenge of optimizing memory utilization in both a virtualized and non-virtualized environment. Transcendent memory (or you may hear me call it “tee-mem”) has a good number of different parts and jargon. If I bring up something that you didn’t hear me explain, or if I miss something that you’d like to hear about, feel free to speak up.
  2. If after the presentation, you’d like to hear more, I’d encourage you to read this article... just google for “Transcendent Memory in a Nutshell”.
  3. The overall objective of tmem is to utilize RAM more efficiently. There’s a number of possible benefits from that and we’ll talk about these a bit more.
  4. Many virtualization users have consolidated their data centers, but find that their CPUs are still spending a lot of time idle. Sometimes this is because the real bottleneck is that there isn’t enough RAM in their systems. One solution is to add more RAM to all of their systems but that can be very expensive and we’d like to first ensure that the memory we do have is being efficiently utilized, not wasted. But it's often not easy to recognize the symptoms of inefficiently utilized memory in a bare metal OS, and it's even harder in a virtualized system. Some of you may call this “memory overcommit” and transcendent memory is one way Oracle products may support “memory overcommit”.
5. If that problem weren't challenging enough, we are starting to see the ratio of memory per core go down over time. This graph shows that we can expect the average core, even as its throughput increases, to have roughly 30% less memory attached to it every two years.
  6. and with power consumption becoming more relevant to all of us, we see the percentage of energy in the data center that's used only for powering memory becoming larger.
  7. and we are starting to see new kinds of memory, kinda like RAM, but with some idiosyncrasies.
8. and we are also starting to see new architectures with memory fitting into a system differently than it has in the past. But in the context of this rapidly changing future memory environment, we carry forward with us a very old problem. (ALERT: PIG COMING!)
  9. and that is that OS’s are memory hogs. Why? Most OS’s were written many years ago when memory was a scarce and expensive resource and every bit of memory had to be put to what the OS thinks is a good use. So as a result,
  10. if you give an OS more memory
  11. it’s going to essentially grow fat and use up whatever memory you give it. So it's not very easy to tell if an OS needs more memory or not and similarly it’s not very easy to tell whether it's using the memory it does have efficiently or not. And in a virtualized environment, this creates a real challenge. So, as a first step, it sounds like we need to put those guest OS's on a diet. Which is something I call:
  12. memory asceticism. We assume that we'd like an OS not to use up every bit of memory available, but only what it needs. To do that, we need some kind of mechanism for an OS to donate memory to a bigger cause, and a way for an OS to get back some memory when it needs it. But how much memory does an OS "need"? We'll get back to that question in a few minutes, but first let's cover a little more background on one way this can be done.
  13. Assume you have a normal computer system with a certain amount of RAM.
  14. We're going to take that RAM and split it into two parts.
  15. And, for now, we're going to call the two parts Type A memory and Type B memory.
  16. To visually represent Type B memory we are going to place a curtain in front of it. This curtain can slide back and forth, meaning the amount of Type A memory -- memory not behind the curtain – may change when the curtain moves.
  17. Now you can see -- and measure -- how much Type A memory there is... you know its capacity, and you know how to enumerate the addresses so you know how to read and write to any byte in Type A memory.
18. BUT although you knew how much total memory was in the system, and you know how much Type A memory there is, and although you surely know how to do a simple subtraction, I'd like you to NOT assume you know how much Type B memory there is. Assume the amount of Type B memory is completely unknowable. It might be zero, or it might be a gazillion bytes. You just don't know. And even if you could know how much Type B memory there is right this moment, it might change in the next moment. It's all very dynamic.
  19. Since you do know how much Type A memory there is, let's just call that normal memory, or RAM. The OS kernel can decide how to make use of it just like normal. Some of it is used for the kernel itself, some of it to run applications, some for device DMA, etc etc. And the OS kernel decides what every byte is used for, can access any byte directly, and it has complete control over that memory, meaning it can change its mind about how any byte of memory is used whenever it wants. So this is just normal RAM for a normal OS kernel, right?
20. What about this Type B memory? Since you don't know how much there is, obviously you can NOT directly read and write to it using normal processor instructions. For example, if you want to write to byte number one-billion, how do you even know if there are a billion bytes? Instead, we are going to have an interface between the kernel and Type B memory where the kernel needs to ask "permission" and follow certain rules to read and write to Type B memory. Even the all-knowing, all-powerful kernel has to follow these rules. So what are those rules?
  21. First, Type B memory can only be read or written a page at a time. A page is usually 4K bytes, but we can be flexible and decide on another page size as long as we are consistently using the same page size. Next, when the kernel wants to write to Type B memory, the kernel must use a special interface that we will call a "put page" call.
22. OK, so we have a page full of data in RAM and the kernel wants to see if it can "put" that page to Type B memory, behind the curtain. Let's call the data in that page “Tux”.
  24. If the kernel wants to "put" a page full of data to Type B memory, it’s important to note that the kernel can be told NO. Kernels have big egos and don’t like it when they are told no, so we have to train them to be more well-mannered and gracious by using the defined “put page” call. Anyway, the kernel has two options.
25. First option is pretty normal: The kernel says "Here's a page of data called Tux... Mr Type B memory, can you take Tux? BUT if you say yes, I KNOW I'm going to need to get Tux back later, so you'd better keep him around. You can do whatever you want with him, BUT if I ask for him back, you'd damn well better give him back to me. BUT, one exception: if I reboot, you can throw him away. So, can you take him?"... and a reminder that the kernel is asking permission and Type B memory may say no.
  26. Or the kernel can say: "Here's a page of data called Tux... Mr Type B memory, can you take Tux for me and squirrel him away someplace? I may ask for him back later, or I may not. And if you have room for him now, and then you need to throw him away later, that's fine too." So for this kind of "put", the kernel has to accept that there is some probability that it might get the page of data back if it asks for it, and some probability that the data might completely disappear... So in the first of the two choices, the probability that the kernel might get the data back is 100%, and in the second case, the probability is less than 100%. It may be a lot less than 100%, we just don't know, because it's all very dynamic.
27. OK, although Tux can be very entertaining, let's take a step back for a moment and give these ideas some names. First, instead of "Type B memory", we are going to use the term "Transcendent Memory", or "tmem" for short. The word “transcendent” means "beyond the senses" and, by definition, Type B memory is beyond the senses of the kernel because, well, the kernel can't enumerate it and can't address it like it addresses normal memory, only a page of data at a time. And the kernel has to overcome its ego and ask for permission.
28. The two types of "puts" we are going to call "persistent" and "ephemeral". The kind of "put" where we know we can definitely get Tux back, 100% of the time, we are going to call a "persistent put". And the kind of "put" where we don't care if we get Tux back, where the probability is less than 100%, we are going to call an "ephemeral put".
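Seen as an interface contract, the distinction attaches to the pool a page is put into, and every put inherits the pool's guarantee; Xen's tmem interface, for instance, marks persistent pools with a TMEM_POOL_PERSIST flag at pool-creation time. A rough sketch of the contract, not an actual kernel or Xen header:

```c
/* Rough sketch of the two pool guarantees; illustrative only. */
enum tmem_pool_type {
	/* Ephemeral: a later "get" MAY miss; the backend is free to
	 * discard pages under pressure. Used for clean page cache. */
	TMEM_POOL_EPHEMERAL,

	/* Persistent: a later "get" MUST hit until the page is flushed
	 * or the machine reboots. Used for swap pages. */
	TMEM_POOL_PERSISTENT,
};
```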
  29. And when we ask Transcendent Memory for that page of data back, we are going to call that operation a "get." And if the kernel knows it isn’t going to need that page of data anymore and wants to tell tmem to throw it away, we will call that a flush. How does the kernel identify the page of data that it wants to put, get, or flush?
30. Well, for normal RAM there is a thing called a "physical address", with which you can access any byte of RAM. And the processor has a large fancy virtual address space that it can use. You can't do that with Transcendent Memory.
  31. For transcendent memory, for puts and gets, we need to provide a new kind of addressing, that we call a "handle", which is kind of an object-oriented name for a page. Within certain constraints, the OS kernel gets to decide what "handle" to use when "put"ing the page of data and then uses that same handle when it wants to "get" it.
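In code terms, a handle is just a small naming tuple. The sketch below is modeled loosely on the zcache/tmem sources, so the field widths are approximate rather than authoritative.

```c
#include <stdint.h>

/* Sketch of a tmem handle, modeled loosely on the zcache/tmem sources;
 * field widths are approximate. The kernel chooses the values, e.g.
 * deriving the object id from an inode and the index from a page offset. */
struct tmem_oid {
	uint64_t oid[3];	/* object id within a pool */
};

struct tmem_handle {
	int32_t pool_id;	/* which pool (ephemeral or persistent) */
	struct tmem_oid oid;	/* which object within that pool */
	uint32_t index;		/* which page within that object */
};
```

A "put" stores a page of data under such a handle; the matching "get" or "flush" presents the same handle back.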
32. One example: Maybe the kernel is running as a guest, a virtual machine, and that "different place" is special memory owned and managed by the hypervisor? This is actually where the concept of Transcendent Memory began over four years ago; the host side has been implemented in Xen for three years, and the guest side works today in Oracle’s Unbreakable Enterprise Kernel.
  33. So virtualization is one example. Another example, we could compress Tux. Since most data compresses by about a factor of two, that could potentially save a lot of RAM. This functionality is fully working today, is called "zcache" and has been merged in the upstream Linux kernel tree for about a year and a half. With zcache, Type A memory is normal addressable kernel memory and Type B memory consists entirely of compressed pages.
  34. Or... we could send Tux to a completely different place as long as we can get him back if and when we need to. Maybe that place is some underutilized RAM on a completely different machine.
35. That's a feature called RAMster, which went into Linux earlier this year. Rather than a guest and a hypervisor, we view multiple physical machines in a cluster as peers and allow them to work together to dynamically load-balance their memory demand. In this cluster, if one machine is overloaded and another is basically idle, the idle machine’s RAM can be used to store pages of data for the overloaded machine. Kind of a poor man’s virtualization. This actually works pretty well over any protocol that supports kernel sockets, even a 100Mbit Ethernet connection.
36. Or maybe it's some solid state device that we are using not as an I/O device but as a RAM extension. As you may know, solid state devices, or SSDs, are getting very fast, almost as fast as RAM, but they have a number of idiosyncrasies that make it difficult for them to be used instead of RAM. It turns out that the rules of Transcendent Memory might be a good way to work around those idiosyncrasies. Or maybe we can combine the last two ideas...
  37. maybe that "different place" is some solid state device on another machine, that serves as a shared RAM extension for any and all of a set of blades in a cabinet, depending on what blade at any given time is short on memory. These other ideas are in the early stages of exploration.
38. So this may seem like a lot of cool stuff, but doesn’t it require massive changes to the kernel? The answer, fortunately, is NO. The concept of an ephemeral put is a really good match for something the kernel does all the time, namely "evicting" clean page cache pages. A fairly simple, non-invasive patch to Linux called the cleancache patchset allows the kernel to use transcendent memory for these types of pages.
39. Similarly, something called “anonymous” pages represent the important data of running applications, and when the kernel is running short on memory, it starts swapping these anonymous pages out. And swap pages happen to be a really good match for transcendent memory’s “persistent” pages. We called this the frontswap patchset, and it too was very clean and non-invasive.
40. Since cleancache and frontswap feed pages to transcendent memory, we call them “frontends”. It’s no coincidence that page cache pages and anonymous pages constitute the vast majority of pages the kernel manages in a running system, so the frontends can pass a lot of pages to tmem. It’s also no coincidence that cleancache and frontswap interface cleanly to any of zcache, RAMster, Xen, or future transcendent memory implementations, which we call tmem “backends”.