Teradata memory management - A balancing act

  • 1 November 2012 Copyright © Teradata Corporation
  • The remaining memory is given back to UNIX to manage tasks and internal buffers for redistribution, duplication, hash join, etc.
  • Memory managed by the O.S. is referred to as “free memory” (UNIX MP-RAS). It is used by:
    Teradata vprocs: AMP worker tasks; Parsing Engine tasks, including the Dictionary cache and Request-to-Steps cache; messages (communication between vprocs).
    Administrative and/or user programs: kernel resources; administrative program text and data; redistribution buffers; aggregation buffers; hash join and join buffers; other applications (e.g., FastLoad); message buffers (e.g., TCP/IP).
  • Memory managed by PDE is called FSG cache. FSG cache is primarily used by the AMPs to access memory-resident database segments. When Teradata needs to read a database block, it checks FSG cache first. The memory allocated to FSG cache is for file system-related operations only; it does not include other Teradata operations that use memory outside of FSG cache.
    FSG cache is used by: AMPs; backup activity.
    FSG cache is used for: permanent data blocks; full cylinder read slots; cylinder indices (CIs) for permanent data blocks; spool data blocks and CIs for spool; transient journal (TJ) data; permanent journal (PJ) data; sync scan data blocks; buddy backup data blocks; hash join spool buffers.
  • Memory depletion is a common cause of performance degradation. In both performance troubleshooting and capacity planning, look at free memory availability, paging/swapping, and memory allocation failures. Examine ResUsage data regularly for memory availability, paging/swapping, and memory allocation failures (MP-RAS) with the same level of scrutiny as CPU and I/O utilization.
  • The low dips show instances when memory was low or completely depleted. The depletions happened during different hours of the day (not just daytime or nighttime) and on all days of the week (not just weekdays or weekends), with no particular pattern. Given the frequency of this depletion, we could correct it by adjusting FSGcache to a lower setting, since all nodes are already at the maximum of 4 GB. However, we first check whether hash join is enabled in dbscontrol: any concurrent steps eligible for hash join can potentially use a high amount of OS memory. In this case, we would first turn off hash join (a dynamic change that does not require a restart) and monitor memory to see how much can be recovered.
  • Measuring Memory Availability. Besides the amount of free memory available, it is also important to know:
    How many paging and swapping I/Os occur, and how frequently. Paging and swapping are necessary when the minimum memory available drops below 40 MB, to avoid panicking or hanging the node.
    How many memory allocation failures occur, and how frequently. High numbers of memory allocation failures can occur even when memory is not fully depleted, in the case of low/no BNS pages in the page pool or high numbers of concurrent requests for row redistribution buffers.
    Which nodes or group of nodes are showing this pattern. If it is a single node, what is different about its configuration (e.g., more Parsing Engines configured on the node, non-Teradata processes running on it, more AMPs, less physical memory).
    If there is a high number (many thousands or more) of memory allocation failures but MIN FREE MEM has not reached 0 during the logging period, then the failures are coming from some subset of memory being used in the OS or in the kernel itself.
  • However, the most common cause of sudden OS memory depletion can be seen on large systems during row redistribution. The amount of OS memory needed for row redistribution is 32 KB * the number of nodes, per AMP. In ResUsage, the utilization pattern of row redistribution shows up as Point-to-Points, and duplications show up as Broadcasts.
  • Because tuning memory allocation generally involves adjusting the FSGcache percent, all memory used inside FSGcache that is governed by a threshold (e.g., datablock caching) will also be affected. This is why it is important to measure cache effectiveness rates before and after making a change. Before tuning memory, it is important to see trends over time. Using the Higa macro ResPMA to take a 30-day sample, the following Free Mem chart shows the amount of memory that was available between node groups.
  • When enabled, the DBSCacheThr parameter only affects sequential processing, such as full-file scans or the creation of spool or sort work tables.
  • When the DBS Cache Threshold value is high, larger tables become candidates for caching. Larger tables may have the same probability of being accessed but, because only individual data blocks are cached, the likelihood of caching a large table in its entirety is low. As a consequence, the probability of a cache hit falls as the size of the table eligible for caching increases. The larger table's data blocks will be cached in memory and eventually aged out, often with a very low probability of being accessed again while in memory. We would therefore want to set the DBS Cache Threshold to support a file size equal to the smallest, most frequently used tables. The default of 10% is a percentage of the FSGcache memory, which is itself a percentage of the total memory taken after the system boots up.
  • Hash Join is an alternative join scheme that performs better than some cases of merge join and product join. Hash Join builds an in-memory hash table using the smaller of the two join relations.
    Hash Join instead of a Merge Join: saves the sorting of the left and right tables into row hash order. Merge Joins require both left and right tables to be sorted.
    Hash Join instead of an Equality Product Join: saves comparing every row in the right table with every row in the left table. Right table rows are compared to a single hash code in the indexed array.
    The memory used when hash join is enabled comes from the available OS memory. Do not turn this feature on if:
    The ResUsage data for the target system shows a trend of low memory conditions already present.
    Available free memory appears to be adequate but page swaps are frequent.
  • HTMemAllocBase = 10 MB: 10 MB of memory comes from UNIX free memory. Internal tunable parameter. Generally not recommended for change except for very large memory systems.
    HTMemAlloc = 2% (default): 2% is applied to HTMemAllocBase, which yields a Hash Buffer size of 200 KB. The default of 2% is chosen based on having 50 concurrent hash joins taking up a maximum of 100% of the HTMemAllocBase size.
    SkewAllowance = 75% (default): 75% is applied to the Hash Buffer size as an allowance for skewing of the left spool table used for the Hash Join. The remaining size after applying the SkewAllowance (25% of the Hash Buffer size) is the Hash Partition size (25% of 200 KB = 50 KB).

Transcript

  • 1. Memory Management – A Balancing Act. Shaheryar Iqbal. Best viewed in Microsoft PowerPoint.
  • 2. Agenda: Node Memory; Memory Partitioning Between OS and FSG Cache; FSG Cache Percent; Free Memory (OS Managed Memory); FSG Cache (PDE Managed Memory); Monitoring Memory (free memory availability, memory allocation failures, paging/swapping); Adjusting FSG Cache; Cache Effectiveness Rate; DBS Cache Threshold; Hash Join - HtMemAlloc; Redistribution Buffer.
  • 3. Node Memory. Memory on hand: the 4 GB maximum is a limitation of the current 32-bit OS. 64-bit Windows and Linux systems are shipping with 6 GB (54XX) or 8 GB (5500H/C). To see how much memory is on each node: # rallsh -sv memsize -k. Node memory is partitioned into OS Managed Memory and FSG Cache.
  • 4. Memory Partitioning between OS and FSG Cache. Example: 4 GB (4096 MB) of total memory.
    Free Memory: O.S. overhead 100 MB, plus 13 vprocs @ 40 MB each (1 PDE, 2 PEs, 10 AMPs) = 520 MB. Total Free Memory = 100 + 520 = 620 MB.
    FSG (File Segment Cache): remaining space = 4096 MB – 100 MB – 520 MB = 3476 MB available for FSG.
  • 5. What’s FSG Cache Percent. An xctl DBS Control Performance field. Reduces the amount of memory to be used for FSG Cache. The remaining memory is given back to UNIX.
  • 6. FSG Cache Percent (Contd). Let’s take an example of FSG Cache Percent = 80, i.e. 20% of the 3476 MB remaining space will be returned back to the OS. Example: 4 GB (4096 MB) of total memory.
    Remaining space = 4096 MB – 100 MB (O.S.) – 520 MB ((10+2+1) vprocs * 40 MB) = 3476 MB.
    Adjusted FSG size = 80% x 3476 MB = 2780 MB available for FSG.
    Added to UNIX free memory = 20% x 3476 MB = 696 MB returned back to O.S./Free Mem.
    Total Free Memory = 100 + 520 + 696 = 1316 MB.
    (The 2780 MB and 696 MB figures are rounded so that together they sum to 3476 MB.)
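The partitioning arithmetic on slides 4 and 6 can be sketched in a few lines of Python. This is only an illustration of the slide's numbers, not a Teradata utility: the function name and parameters are hypothetical, and rounding the FSG share down is an assumption chosen to match the 2780 MB / 696 MB figures.

```python
def partition_node_memory(total_mb, os_overhead_mb, vprocs, vproc_mb, fsg_cache_percent):
    """Split node memory (in MB) into OS free memory and FSG cache.

    Hypothetical helper: mirrors the slide arithmetic only.
    """
    vproc_total_mb = vprocs * vproc_mb
    remaining_mb = total_mb - os_overhead_mb - vproc_total_mb
    # Assumption: the FSG share is rounded down, as the slide's figures suggest.
    fsg_cache_mb = remaining_mb * fsg_cache_percent // 100
    returned_to_os_mb = remaining_mb - fsg_cache_mb
    free_memory_mb = os_overhead_mb + vproc_total_mb + returned_to_os_mb
    return {
        "fsg_cache_mb": fsg_cache_mb,
        "returned_to_os_mb": returned_to_os_mb,
        "free_memory_mb": free_memory_mb,
    }


# Slide 6: 4096 MB node, 100 MB O.S. overhead, 13 vprocs @ 40 MB, FSG Cache Percent = 80
example = partition_node_memory(4096, 100, 13, 40, 80)
print(example)
```

With FSG Cache Percent set to 100, the same call reproduces slide 4 (3476 MB of FSG cache and 620 MB of free memory).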
  • 7. What’s in Free Memory. Free Memory is memory managed by the O.S.
    Teradata vprocs: AMP worker tasks; PE tasks (Dictionary cache, Request-to-Steps cache); vproc communication.
    Administrative programs: kernel resources; program text and data; redistribution buffers; aggregation buffers; hash join and join buffers; other applications.
  • 8. What’s in FSGcache. FSG Cache is memory managed by the PDE, used to access database segments.
    FSG Cache is used by: AMPs; backup activity.
    FSG Cache is used for: permanent data blocks; full cylinder read slots; cylinder indices (CIs) for permanent data blocks; spool CIs and data blocks; transient journal data (TJs); permanent journal data (PJs); sync scan data blocks; buddy backup data blocks; hash join spool buffers.
  • 9. Physical memory on a Teradata node. [Figure: memory managed by O.S. and memory managed by PDE]
  • 10. Memory Monitoring.
    WHY: memory depletions are a common cause of performance degradation.
    WHAT to monitor: free memory availability; memory allocation failures; paging/swapping; Cache Effectiveness Rate.
    HOW: examine ResUsage data with the same level of scrutiny as CPU and I/O utilization.
  • 11. Memory Availability. The low dips show instances when memory was low or completely depleted. Depletions happened during different hours of the day and on all days of the week, with no particular pattern. Given the frequency of this depletion, it could be corrected by adjusting FSGcache to a lower value. Check if hash join is enabled; it can potentially use a high amount of OS memory. First turn off hash join and see how much memory can be recovered.
  • 12. Memory Allocation Failures. Two cases when memory allocation failures occur. Case 1: Free Mem not available. Case 2: Free Mem available. Check how many memory allocation failures occur and how frequently, and which nodes or group of nodes are showing this pattern.
  • 13. Memory Allocation Failures (Case 1). Memory allocation failures occur when FREE MEM reaches zero. [Charts: FREE MEM at zero at 14:10; memory allocation failures at 14:10]
  • 14. Memory Allocation Failures (Case 2). Memory allocation failures are occurring but MIN FREE MEM has not reached 0 during the log interval. Then the memory allocation failures are coming from some subset of memory being used in the OS. A common cause of memory allocation failures is concurrent row redistributions. Check peaks in redistributed blocks and correlate them to timeframes when large numbers of memory allocation failures occurred. [Charts: FREE MEM never reaches zero; redistribution peaks and memory allocation failures at 22:10 and 23:30]
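The two cases on slides 12 to 14 amount to a simple check on each logging interval. The sketch below is a hypothetical illustration: the function and argument names are invented for this example and are not real ResUsage column names.

```python
def classify_alloc_failures(min_free_mem_kb, alloc_failures):
    """Classify one logging interval using the two cases from the slides.

    Both arguments are hypothetical per-interval values, not actual ResUsage fields.
    """
    if alloc_failures == 0:
        return "no failures"
    if min_free_mem_kb == 0:
        # Case 1: free memory was fully depleted during the interval.
        return "case 1: free memory depleted"
    # Case 2: failures occurred while free memory was still available, so look at
    # a subset of OS memory (e.g. concurrent row redistribution buffers).
    return "case 2: free memory available"


print(classify_alloc_failures(0, 1200))
print(classify_alloc_failures(85000, 4000))
```

Intervals that fall into case 2 are the ones worth correlating against peaks in redistributed blocks.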
  • 15. Cache Effectiveness Rate. Cache Hit Ratio = (Logical Reads – Physical Reads) / Logical Reads. Check the cache hit ratio with the same level of scrutiny as CPU and I/O utilization. Cache utilization is a temporal (time-based) process: a 70% cache hit rate is not the same 70% of data, running all day long. It is also spatial. Cache Hit Ratio less than 40% for a particular workload.
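As a minimal sketch, the cache hit ratio formula from slide 15 can be written as a small Python function. The function name is hypothetical, and returning None when there are no logical reads is an assumption made here to avoid dividing by zero.

```python
def cache_hit_ratio(logical_reads, physical_reads):
    """Cache hit ratio = (logical reads - physical reads) / logical reads."""
    if logical_reads == 0:
        # Assumption: with no reads in the interval the ratio is undefined.
        return None
    return (logical_reads - physical_reads) / logical_reads


# 1000 logical reads, of which 300 had to go to disk
ratio = cache_hit_ratio(1000, 300)
print(f"{ratio:.0%}")
```

Computing the ratio per interval, rather than once per day, keeps the temporal nature of cache utilization visible.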
  • 16. Cache FSG Percent Tuning. Adjusting Cache FSG Percent: tuning memory generally involves adjusting the FSG cache percent. All memory used inside FSGcache that is governed by a threshold (e.g., datablock caching) will also be affected. This is why it is important to measure cache effectiveness rates before and after making a change. Before tuning memory, it is important to see trends over time, e.g. take a 30-day sample.
  • 17. Summary of Steps in Cache FSG Percent tuning.
    1: Free memory conditions. Look for low memory and total depletion.
    2: Occurrence of paging/swap I/Os. Look for a high number of I/Os and high memory allocation failures.
    3: Check for use of memory-intensive features. If OS memory is low/depleted often, turn off hash join first, as it does not require a restart.
    4: Check cache hits. Target effectiveness rates should be above 50% (the higher the better).
    5: Check the current FSGcache setting. The default is 80%.
    6: Lower FSGcache in 5-10% increments. On coexistent systems with frequent low-memory conditions, it may be necessary to go down to 65% at 4 GB.
    7: Recheck DBS cache hits. If the effectiveness rate went down after reducing FSGcache, the cache size can be increased by tuning DBSCACHETHR upward in dbscontrol.
    8: Repeat steps 5-7 as needed.
    Sudden onset of memory depletion.
  • 18. DBSCacheThr - Understanding DBS Cache.
    DBS Cache: FSG cache is the portion of main memory assumed available for table data.
    DBSCacheThr: a DBScontrol tunable parameter. It defines a percentage of the FSG cache and helps control when data blocks are cached. Only permanent and spool DBs can be affected by this parameter.
    Case 1, subtable/table fits entirely: when a subtable fits entirely in this percentage of memory, its data blocks are cached and aged out of memory normally. If involved in a full-file scan, the table is also eligible for sync scan.
    Case 2, subtable/table cannot fit: when the table cannot fit in this percentage of memory, its data blocks are discarded from memory as soon as possible. The table is not eligible for synchronized full-file scan (sync scan).
  • 19. DBSCacheThr - Understanding DBS Cache.
    DBS Cache: data in cache memory is aged based on residency and usage.
    DBSCacheThr purpose: we want to hold all frequently accessed reference tables in memory. The DBSCacheThr caching technique can prevent a large sequentially read or written table from pushing other data out of the cache, and encourages smaller (e.g. reference) tables to stay in memory longer. Since the large table’s data blocks probably won’t be accessed again until they have aged out of memory, caching them does little good and may cause other, more heavily accessed blocks to age out prematurely.
    Range and default values: the default value is 10 percent. The range of values is 0 - 100 percent. If the size of FSGcache changes, the size of DBS cache will change with it.
  • 20. HTMemAlloc - Understanding Hash Join.
    What is it: an alternative join scheme that performs better than some cases of merge and product join. HJ builds an in-memory hash table using the smaller of the two join relations.
    Hash Join vs Merge Join: hash join eliminates the sorting, and possible redistribution or copying, of the larger table. MJ requires both left and right tables to be sorted.
    Hash Join vs Product Join: saves comparing every row in the right table with every row in the left table. Right table rows are compared to a single hash code in the indexed array.
    Memory usage: the memory used when hash join is enabled is from the available OS memory.
    Do not turn this feature on if: the ResUsage data for the target system shows a trend of low memory conditions already present; or available free memory appears to be adequate but page swaps are frequent.
  • 21. Hash Join Parameters.
    HTMemAllocBase = 10 MB: 10 MB of memory comes from UNIX free memory.
    HTMemAlloc = 2% (default): 2% applied to HTMemAllocBase yields a Hash Buffer size of 200 KB (2% of 10 MB = 200 KB). The default of 2% is chosen based on having 50 concurrent hash joins taking up a maximum of 100% of the HTMemAllocBase size: 200 KB * 50 = 10 MB.
    Example: a system can have at most 25 concurrent users (e.g. throttling applied). Set HTMemAlloc = 4%. 4% of 10 MB = 400 KB, and 400 KB * 25 users = 10 MB. New Hash Partition size = 400 KB (old 200 KB). More opportunity for queries to use HJ instead of MJ or PJ. Result: faster executions.
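The sizing arithmetic on slide 21 can be checked with a short Python sketch. The function names are hypothetical, and the conversion of 1 MB to 1000 KB is an assumption taken from the slide's own rounding (2% of 10 MB = 200 KB); this is not how dbscontrol itself is queried or set.

```python
KB_PER_MB = 1000  # Assumption: the slide treats 10 MB as 10,000 KB.


def hash_buffer_kb(ht_mem_alloc_base_mb, ht_mem_alloc_percent):
    """Hash buffer size: HTMemAlloc percent applied to HTMemAllocBase."""
    return ht_mem_alloc_base_mb * KB_PER_MB * ht_mem_alloc_percent // 100


def max_concurrent_hash_joins(ht_mem_alloc_base_mb, ht_mem_alloc_percent):
    """How many hash buffers fit into 100% of HTMemAllocBase."""
    buffer_kb = hash_buffer_kb(ht_mem_alloc_base_mb, ht_mem_alloc_percent)
    if buffer_kb == 0:
        raise ValueError("HTMemAlloc of 0% leaves no hash buffer")
    return ht_mem_alloc_base_mb * KB_PER_MB // buffer_kb


for percent in (2, 4):
    print(percent, hash_buffer_kb(10, percent), max_concurrent_hash_joins(10, percent))
```

Doubling HTMemAlloc doubles each hash buffer and halves the number of concurrent hash joins that fit in the same 10 MB, which is the trade-off the 25-user example relies on.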
  • 22. Redistribution Buffer – Memory Requirement. Memory requirement for node-level row redistribution: each AMP uses a separate redistribution buffer for each node in the system. Default redistribution buffer size = 32 KB per target node. Total memory for 1 sending AMP = 32 KB * number of nodes in the system.
  • 23. Redistribution Buffer - Examples.
    Example 1: configuration of 8 nodes, 8 AMPs per node. Single node requirement, single user = 32 KB * 8 = 256 KB. Multi-user, 20 concurrent users = 20 * 256 KB = 5 MB (not a special problem).
    Example 2: configuration of 200 nodes, 8 AMPs per node. Single node requirement, single user = 32 KB * 200 = 6400 KB (about 6 MB). Multi-user, 20 users = 20 * 6400 KB = 128 MB (far exceeding 80-100 MB per AMP).
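The two examples above follow directly from the 32 KB per target node rule, as this small Python sketch shows. The function name and parameters are hypothetical; results are kept in KB to avoid the slide's MB rounding.

```python
def redistribution_memory_kb(nodes, concurrent_users=1, buffer_kb=32):
    """Redistribution buffer memory: one buffer per target node, per concurrent user.

    Hypothetical helper; buffer_kb defaults to the 32 KB stated on the slides.
    """
    return buffer_kb * nodes * concurrent_users


# Example 1: 8 nodes
print(redistribution_memory_kb(8), redistribution_memory_kb(8, 20))
# Example 2: 200 nodes
print(redistribution_memory_kb(200), redistribution_memory_kb(200, 20))
```

Passing buffer_kb=16 shows the effect of halving the buffer size on a 200-node system with 20 users: the requirement drops from 128,000 KB to 64,000 KB.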
  • 24. Redistribution Buffer - Recommendations. Symptoms: excessive paging/swapping. Recommendations for large configurations: set the node-level redistribution buffer size smaller, e.g. 16 KB (default is 32 KB); set FSG Cache Percent to less than 80%.
  • 25. Feedback. Shaheryar - This is an excellent presentation.
  • 26. Questions. The only bad question is the question never asked.