Regularly presented at the IQSS Workshop Series, Harvard University.
License: CC Attribution-ShareAlike License


- 1. High Performance Statistical Computing with Applications in the Social Sciences
  Micah Altman, Senior Research Scientist
  "Introduction to the RCE" by Earl Robert Kinney, Manager, Research Computing Environment
  Institute for Quantitative Social Science, Harvard University
- 2. Goals for today
  Analysis:
  - Describe performance goals
  - Identify resource use patterns
  - Identify resource bottlenecks
  - Identify performance hot-spots
  - Select a problem decomposition
  Application:
  - Connect to the RCE
  - Use the RCE to analyze larger data sets
  - Use the RCE to run interactive analyses more quickly
  - Use the RCE to run large numbers of analyses independently
  [Source: Wikimedia Commons]
  M. Altman & B. Kinney, High Perf. Stat. Computing (v.9/10/11)
- 3. Organization of this Workshop
  Motivation; Principles; Introduction to the RCE; Measuring Resource Use; Scaling Up; Tuning Up; Scaling Out (Parallelization); Additional Resources
- 4. Nine Steps to Faster Results
  1. Predict your resource needs through benchmarks, models, and algorithmic analysis
  2. Select alternate algorithms when resource needs grow very rapidly with problem size
  3. Identify resource bottlenecks using systems performance analysis tools
  4. Address bottlenecks by increasing resources and/or changing program resource management
  5. Discover hot-spots in programs using profiling tools
  6. Adapt hot-spots to the system architecture
  7. Decompose the problem into independent subproblems
  8. Distribute subproblems across pools of resources
  9. Repeat the analysis after making any changes
- 5. FREE! With every first class! Coffee! Chocolate!! Consulting!!! Time off for good behavior!!!!
- 6. IQSS (and affiliates) offer you support across all stages of your quantitative research:
  - Research design, including design of surveys and selection of statistical methods
  - Primary and secondary data collection, including the collection of geospatial and survey data
  - Data management, including storage, cataloging, permanent archiving, and distribution
  - Data analysis, including survey consulting, statistical software training, GIS consulting, and high performance research computing
  http://iq.harvard.edu/
- 7. The IQSS grants administration team helps with every aspect of the grant process. Contact us when you are planning your proposal.
  - Assisting in identifying research funding opportunities
  - Consulting on writing proposals
  - Assisting IQSS affiliates with preparation, review, and submission of all grant applications ("pre-award support") and management of their sponsored research portfolio ("post-award support")
  - Interpreting sponsor policies
  - Coordinating with FAS Research Administration and the Central Office for Sponsored Programs
  - And, of course, supporting seminars like this!
- 8. "One's reach should exceed one's grasp"
  - Leading-edge statistical methods (such as MCMC) can require lots of computing power
  - Ensuring robust results can multiply (and re-multiply) the number of analyses done: sensitivity analysis; parameterization studies; alternative models, Bayesian model averaging
  - Performance benchmarks provide information for budgeting computing $$$
- 9. "I Want It Now!"
  - Deadlines abound: conferences, trials, publication dates
  - New observations, variables, corrections, or model specifications may necessitate speedy reanalysis
- 10. "My strength is as the strength of ten because my heart is pure."
  - Selection of algorithms can change the nature of the computational resource usage
  - Tuning for a particular system can increase performance approximately ten-fold
  - In some circumstances work can be split across thousands of systems
- 11. Principles
  - Goals matter
  - Problems matter
  - Algorithms matter
  - Answers matter
  - Architecture matters
- 12. Types of Performance Goals
  - Task completion time: wait time to finish
  - Efficiency: resource use for the task
  - Throughput: work done by the system overall
  - Latency: delay before response
  - Responsiveness: perception of response
  - Reliability: probability the task/system will fail during a time interval
  "If you don't know where you're going, any road will take you." – Proverb
  "If you come to a fork in the road, take it." – Yogi Berra
- 13. Performance Goals – Rules of Thumb
  Rules of thumb:
  - Completion time: work(i)/resource(i)
  - Throughput: maximize (work/resource) for all jobs
  - Latency: time elapsed before first response to input
  - Real-time: complete the task within a fixed interval
  - "Responsiveness": perceived latency; task completion time; task progress indicators
  Different users want different things:
  - Users of interactive software want responsiveness
  - Users of batch jobs want small completion times
  - Systems administrators want maximum throughput and reliability
- 14. Size of Factors Affecting Performance
  If the runtime to solve a small instance of a problem (n=10) on a single system is one minute, how long will it take to solve a larger instance of n=1000?
  Run time for the large instance (n=1000):
  - NP-hard (worst case): 10^292 years
  - Very inefficient algorithm, O(N^3): 1.6 years
  - Inefficient algorithm, O(N^2): 16 hours
  - Very poor memory access patterns: 11 hours
  - Un-optimized code: 67 minutes
  - Optimized code: 7 minutes
  - Local multiprocessing: 2 minutes
  - Fully parallel / full cluster: 4 seconds
- 15. Problem Complexity Classes
  - A problem complexity class is the set of problems that can be solved in O(f(n)) for some f
  - More general than algorithmic complexity – encompasses all possible algorithms to solve the given problem
  - A polynomial-time algorithm is necessary for large problem instances
  Hierarchy (decision problems): Decidable vs. Undecidable; EXPSPACE ⊇ EXPTIME ⊇ PSPACE ⊇ NP, co-NP ⊇ P; also NP-complete, BQP, and P = BPP(?)
- 16. Some Problems Are HARD
  Traveling Salesperson Problem (weighted Hamiltonian cycle): plot a route through N locations, visiting each once, that minimizes cost.
  - NP-hard: worst-case instances require exponential time for an optimal, certain solution
  - NP-complete: equivalent to a large class of hard problems
  Source: Applegate, Bixby, Chvátal, and Cook (1998)
- 17. How to "Solve" the Unsolvable
  - Think small: use only a small number of cities; aggregate to regions and treat them as quasi-cities
  - Restrict the problem: Euclidean distances are easier than travel cost
  - Solve a different problem: minimum spanning tree
  - Approximate the solution: for Euclidean distances, there is an algorithm based on the minimum spanning tree that is at most 50% longer
  - Randomize: can a randomized algorithm find a solution with probability p? (No one knows… probably not)
  - Be lucky: maybe the "average" problem isn't that hard?
  - Heuristics: apply simulated annealing (etc.), cross fingers
- 18. How to Recognize Hard Problems
  - Is the problem routinely solved by existing systems?
  - Are efficient algorithms known?
  - Does it appear in lists of hard problems?
  - Is the problem universal? (Any computing problem, sufficiently generalized, is hard [Papadimitriou 1994])
  - Is run time growing exponentially in practice?
- 19. Algorithmic Complexity
  - Measures the complexity of a particular solution to a problem
  - Resource complexity: a measure of the resources used to solve a problem, as a function of input size
  - Common resource measures: time, usually represented as the number of operations executed; space, usually represented as the number of discrete scalar values stored
- 20. Algorithmic Complexity: Search
  bubbleSort(list)                      # O(n^2) operations
    while (not finished) {
      finished <- true
      for i in (1 to length(list)-1) {
        if (list[i] > list[i+1]) {
          swap(list[i], list[i+1])
          finished <- false
        }
      }
    }
  quicksort(list)                       # O(n log n) expected operations
    if (length(list) <= 1) return(list)
    pivot <- select from (list)
    for x in (list) {
      if x = pivot, add x to pivotList
      if x > pivot, add x to greaterList
      if x < pivot, add x to lessList
    }
    return(quicksort(lessList) + pivotList + quicksort(greaterList))
  *illustrations courtesy of Wikipedia
- 21. Search Complexity Continued
  Tally sort (items in a fixed range, no duplicates):
    inlist = logical(length = max - min)
    for (i in 1:length(items)) { inlist[items[i]] = TRUE }
    for (i in min:max) { if (inlist[i]) dowork(i) }
  How fast is this?
  Algorithm Recurse_sort(array L, i = 0, j = length(L)-1):
    if L[j] < L[i] then L[i] ↔ L[j]
    if j - i > 1 then
      t = (j - i + 1)/3
      Recurse_sort(L, i,   j-t)
      Recurse_sort(L, i+t, j)
      Recurse_sort(L, i,   j-t)
    return L
- 22. Answers Matter
  - Before optimization, verify the answer
  - "Right" can mean "right enough," if well-defined
  - Correct code may have different performance characteristics than incorrect code
  - Returning the wrong answer can always be done quickly
- 23. Simple von Neumann Architecture
  Processor ↔ Memory ↔ Input/Output
- 24. More Modern Architecture
  Multiple processors, each with multiple cores (each core with its own FPU and L1 cache, sharing an L2 cache), memory, a RAID controller with several disks, a network card, and a GPU.
- 25. Inside the Core
  [Processor die diagram, © Intel]
- 26. Deep Inside the Core
  [Core execution-pipeline diagram]
- 27. Resource Hierarchy: Big, Fast, Cheap*
  Registers (<1 KB) → Cache (1 MB) → RAM (10s of gigabytes) → Local storage (10s of terabytes) → Online storage (100s of petabytes) → Offline storage (10s of exabytes)
  - Big, fast, cheap – pick 2
  - Latency increases with each step down, and storage increases
  - Throughput decreases (except, with some offline storage)
- 28. Reading One Byte: x -= m[1,3]
  - CPU: 8 bytes loaded into a register
  - Cache: 256 bytes ← cache line
  - RAM: 4 KB ← page
  - Disk: 8 KB ← from NFS (networked file system)
- 29. General Performance Implications of Architecture
  - Talking to external devices can cause waits (latency)
  - Information transmitted to the CPU is limited by the bus (throughput); in practice, expect 80% of theoretical data-path bandwidth at best
  - Some optimizations are highly specific to architectural details
  - Hidden parallelism at low levels
  - Information travels in chunks (at least bus size)
  - Complexity makes theoretical performance analysis difficult – use benchmarks
- 30. From Principles to Practice
  Practice = Principles × Optimization Goals × Problem Type × Computing Environment
  - Optimization goals: throughput; latency; reliability; scaling up; scaling out
  - Problem decomposition: independent data; independent calculations; coupled calculations
- 31. Principled Preparation Checklist
  - Verify that your problem is tractable: substitute an easier problem; restrict or limit the problem; be lucky or clever
  - Establish performance goals
  - Identify possible algorithms: What is their resource complexity? Are better algorithms known?
  - Identify potential system characteristics: communications costs; systems resources
- 32. Lab 0: Problem Definition
  - Define your computing problem as formally as you can
  - What algorithms are you using to solve the problem?
  - What are your performance goals?
  [Source: http://andreymath.wikidot.com/ . Creative Commons ShareAlike License]
- 33. An Introduction to the IQSS RCE
  What is it? Why use it? How does it work? How do we use it?
- 34. What is the RCE?
  - Virtual desktop: a full virtual desktop environment – connect from anywhere; many research software packages available; persistent session – connect anytime
  - Interactive nodes: for large interactive jobs; large amounts of memory available on demand; Stata, Matlab, Mathematica; easy to run from your virtual desktop
  - Batch processing: run hundreds of jobs at once; optimized for non-interactive, independent work
- 35. Why Use the RCE?
  For research: an environment customized for quantitative social science research; a wide variety of research software packages are available
  For convenience: the RCE enables you to access a research desktop from almost any computer; sessions are persistent – disconnect from your office, reconnect from home; file storage is central – never worry about which computer has your files
  For resources: large analysis jobs are offloaded to high-powered resource pools (800 processors, 3.3 TB of memory, 40 TB of disk storage); regularly updated software
  For collaboration: an ideal environment for collaborative research projects; share project files, desktops, software
  For reliability: system performance and availability are constantly monitored; research files are regularly backed up and stored securely; IQSS has full-time staff dedicated to supporting the RCE
- 36. RCE Architecture
  Desktop client sessions connect through login nodes to interactive nodes and batch nodes, each with attached disk storage.
- 37. RCE Architecture Rules of Thumb
  - Connect to the interactive pool
  - Small problems: run directly (on an interactive node)
  - Large-memory problems: use interactive nodes
  - Interactive problems: use interactive nodes
  - Large-compute jobs: use batch submit – but the problem must be decomposed
- 38. RCE Powered Apps – How It Works
  1. The user clicks on an application from the menu
  2. The RCE checks for the availability of interactive nodes
  3. If a node is available: the RCE submits a special condor job to the interactive master node (~30 s); a window appears on the RCE desktop and the application runs on the node
  4. If no node is available: the user receives a notice and is offered a batch node to run their job; if the user hits "yes," the RCE submits a special condor job to the batch master node (~120 s)
- 39. RCE Desktop
  - Application Menu: application launching
  - Quick Launch: quick access to e-mail, web, and office applications
  - File Browser: graphical view of your home directory and files
  - HMDC Outage Notifier: updates to reflect the status of the environment
  - Desktop Shortcuts: shortcuts to your home directory and trash
  - Status Bar: shows open applications
- 40. Login Nodes
  Number of servers: 8; number of processors: 32; RAM per session: ~6 GB
- 41. Apps on Login Nodes
  Features: easiest way to launch applications
  Limitations: smaller amounts of RAM; competition for resources with interactive processes
- 42. Interactive Nodes
  Number of servers: 13; number of processors: 84; RAM per job: 1–64 GB
- 43. Apps on Interactive Nodes
  Features: more memory available for the application; a dedicated processor reduces competition for resources; multiple cores available (e.g., for Stata/MP)
  Limitations: interactive nodes are limited in number; time limit on applications (currently 72 hours; time can be extended by request)
- 44. Batch Nodes
  Number of servers: 61; number of processors: 258; RAM per job: 2–4 GB
- 45. Running Statistical Apps on Batch Nodes
  Features: nearly 400 jobs can run at the same time; well suited for loosely-coupled parallel problems
  Limitations: memory is more limited; the application must be designed to harness the power of all nodes; no failover to other pools
- 46. Memory Limitations
  Login nodes: each user on the machine is allowed to use a portion of available memory; no enforcement of login limits (can be oversubscribed)
  Interactive/batch nodes: each node has a share of memory based on the request; the physical hardware will only run a number of jobs equal to its processor cores (not oversubscribed)
- 47. Get Started with the RCE: Checklist
  - Apply for an RCE account: support@help.hmdc.harvard.edu
  - Install the free NX software
  - Connect to rce.hmdc.harvard.edu
  - Run interactive programs with menus
  - Run large interactive jobs with the "RCE Powered" menu
  - Run large batch jobs using a simple launcher script
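The "simple launcher script" in the checklist can be sketched in shell. This is only an illustration: the file names (`analysis.do`, `analysis.submit`) and the Stata path are hypothetical, and the exact submit-description syntax on the RCE may differ — check local documentation before use.

```shell
# Write a minimal Condor-style submit description for a batch Stata job.
# All names below are hypothetical placeholders.
cat > analysis.submit <<'EOF'
Universe   = vanilla
Executable = /usr/local/bin/stata
Arguments  = -b do analysis.do
Log        = analysis.log
Output     = analysis.out
Error      = analysis.err
Queue
EOF

# On a login node you would then submit it with:
#   condor_submit analysis.submit
cat analysis.submit
```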
- 48. Lab 1: Connecting to the RCE
  In this lab, we will log in to the RCE and launch Stata on an interactive node.
  [Source: http://andreymath.wikidot.com/ . Creative Commons ShareAlike License]
- 49. Systems Resource Use
  Benchmarks; timing; system resource monitoring; system resource limits
- 50. Benchmarks
  - What patterns of usage are likely to occur? What are the 80% cases?
  - Are there 10% cases that have unusual patterns of data access, or unusual input?
  - Can you construct a plausible worst case?
  - Parameterize benchmarks: parameterize problem size; vary order of magnitude
  - Create benchmarks based on real cases: use real problems for full benchmarking; miniaturize real problems for quick tests
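The "parameterize problem size, vary order of magnitude" advice can be sketched as a small shell harness. The workload here (numerically sorting n reversed integers) is only a stand-in; substitute your own analysis command.

```shell
# Time a stand-in workload at several orders of magnitude of problem size.
for n in 10000 100000 1000000; do
  start=$(date +%s)
  seq "$n" -1 1 | sort -n > /dev/null   # stand-in workload: sort n integers
  end=$(date +%s)
  echo "n=$n elapsed=$((end - start))s"
done
```

Plotting `n` against elapsed time gives the empirical performance curve discussed later in the deck.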
- 51. Common Benchmarks
  - Artificial benchmarks
  - Simple "unit" benchmarks
  - Real application + random data
  - Real application + real data
  - Real application + worst-case data
  - Mix of applications
- 52. Timing
  Why measure timings?
  - Direct or indirect measure of performance
  - Establish a baseline for changes
  - Empirical measure of scaling
  Limitations:
  - Timers are often imprecise for brief events
  - Other activity on the system adds "noise"
  - Many tools aggregate all phases of execution, and all areas of resource use
  - CPU timings may exclude system resource use
  - Must use condor_submit to run these on non-interactive nodes
  - Heisenbugs
- 53. Alternative: Queuing Models
  - A formalist's alternative to benchmarks; can be useful for capacity planning
  - Model services as a network of queues: different classes of "customers"; resources with different delay characteristics; transition probabilities; a distribution of "service events"
  - Poisson events: discrete, independent, no memory; the number of events is Poisson distributed; interarrival times are exponentially distributed
  - Little's law: length of queue = arrival rate × time in queue
  - Limitations: heroic assumptions are often required; state-space explosion; only the simplest models are solvable in closed form
  Source: Takefusa, et al. 1999
- 54. Wall-Clock Time
  Measure completion time; show phases of execution by inserting calls.
  - Linux: date | OS X: date | Windows: DATE
  - R: Sys.time() | Stata: display "$S_TIME $S_DATE" | Matlab: clock; tic | C: time(), getitimer()
  > print(Sys.time())
  [1] "2010-04-28 10:21:45 EDT"
  > res <- optim(sq, distance, genseq, method="SANN",
  +   control = list(maxit=30000, temp=2000))
  > print(Sys.time())
  [1] "2010-04-28 10:21:55 EDT"
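The same bracketing technique works at the shell level: record a timestamp before and after the command. The one-second `sleep` is just a stand-in for the real analysis command.

```shell
# Bracket a command with timestamps to measure wall-clock completion time.
start=$(date +%s)
sleep 1                      # stand-in for the real analysis command
end=$(date +%s)
echo "elapsed: $((end - start)) seconds"
```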
- 55. CPU Time
  Measure the CPU time used by a program; show "system"-state and "user"-state time. Some tools show other resources.
  - Linux: /usr/bin/time -v | OS X: /usr/bin/time -l | Windows: timeit.exe*
  - R: system.time() | Stata: timer | Matlab: cputime | C: getrusage()
  $ /usr/bin/time -v /usr/local/stata11/stata -b mycommand.do
    User time (seconds): 0.00
    System time (seconds): 0.01
    Percent of CPU this job got: 64%
    Elapsed (wall clock) time (h:mm:ss or m:ss): 0:00.03
    ...
  *Optional tool; may require installation on your system
- 56. Interpreting CPU Time
    User time (seconds): 0.00
    System time (seconds): 0.01
    Percent of CPU this job got: 64%
    Elapsed (wall clock) time (h:mm:ss or m:ss): 0:00.03
  - If (system)/(system + user) > .1: possibly inefficient use of system calls, I/O
  - If elapsed time >> (system + user): possible resource bottleneck; possible sleep
  - If CPU percent is low: possible CPU contention
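The (system)/(system + user) rule of thumb can be checked mechanically from a saved `/usr/bin/time -v` log. The two lines in the sample log below are fabricated for illustration.

```shell
# Flag possibly inefficient system-call/I/O use when sys/(sys+user) > 0.1.
# time.log holds a fabricated sample in /usr/bin/time -v format.
cat > time.log <<'EOF'
User time (seconds): 0.90
System time (seconds): 0.30
EOF
awk -F': ' '
  /User time/   { user = $2 }
  /System time/ { sys  = $2 }
  END {
    ratio = sys / (sys + user)
    printf "sys ratio = %.2f%s\n", ratio,
           (ratio > 0.1 ? " (suspect system calls / I/O)" : "")
  }' time.log
# → sys ratio = 0.25 (suspect system calls / I/O)
```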
- 57. Monitoring Running Processes
  Show the list of processes running; see current and accumulated CPU usage; see CPU utilization.
  - Linux: top; gnome-system-monitor
  - OS X: top; Utilities -> "Activity Monitor"; atMonitor (3rd party, highly recommended)
  - Windows: taskmgr.exe; top.exe*
  $ gnome-system-monitor &
  *Optional tool; may require installation on your system
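Where no GUI is available (for instance over a plain ssh session to a batch node), `ps` can show the same per-process CPU and memory view; the `--sort` flag shown here is GNU/Linux `ps` specific.

```shell
# Top 5 processes by CPU use: pid, %CPU, %memory, command name.
ps -eo pid,pcpu,pmem,comm --sort=-pcpu | head -5
```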
- 58. Interpreting Process Monitor Results
  $ gnome-system-monitor &
  - Show the list of processes running; sort processes by CPU use
  - Sort by the # of processes waiting to use the CPU
  - See current and accumulated CPU & memory usage; see CPU utilization
- 59. Sample Performance Curves
  - Best case: linear in the size of the problem
  - Nonlinearities could mean: an inefficient algorithm (case 2); a hard problem (case 3); poor data access patterns (case 4)
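A quick empirical check for these nonlinearities: estimate the scaling exponent between consecutive benchmark points as log(t2/t1) / log(n2/n1) — roughly 1 means linear, roughly 2 quadratic. The (size, seconds) pairs below are invented sample measurements.

```shell
# Estimate the empirical scaling exponent between consecutive points.
cat > curve.dat <<'EOF'
1000 1
2000 4
4000 16
EOF
awk 'NR > 1 { printf "n=%d exponent=%.1f\n", $1, log($2/pt)/log($1/pn) }
     { pn = $1; pt = $2 }' curve.dat
# → n=2000 exponent=2.0
# → n=4000 exponent=2.0
```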
- 60. System Resource Monitoring
  Why monitor system resources?
  - Identify bottlenecks
  - Identify processes using resources – may affect overall throughput and capacity
  - Identify processes actively using resources – may affect performance
  Limitations:
  - Tools are often imprecise for brief events
  - Other activity on the system adds "noise"
  - Many tools aggregate all phases of execution, all system use, and sub-resource use
  - Must use condor_submit to run these on non-interactive nodes
  - Heisenbugs
- 61. Monitoring System Resources
  See system-aggregated use and activity for memory, disk, and network; see memory use by process; see resource use by process (varies by platform).
  - Linux: gnome-system-monitor; /usr/bin/time -v; sar; iostat; vmstat
  - OS X: Utilities -> "Activity Monitor"; /usr/bin/time -v; sar; iostat
  - Windows: perfmon.exe; taskmgr.exe
  $ gnome-system-monitor &
  $ sar -A 1 10
  $ /usr/bin/time -v stata -b somefile.do
- 62. Detailed System Resource Tracing
  See system use/calls for a process as it runs.
  - Linux: strace; SystemTap (add-on)
  - OS X: dtrace
  - Windows: procmon.exe (add-on)
  $ strace -o strace.log myProgram
  $ sudo dtrace -n 'syscall:::entry { @[execname] = count(); }' -c ls
- 63. Interpreting Process Memory Use
  $ gnome-system-monitor &
  Use Monitor -> Preferences to add the "Resident Memory" column.
  - Memory: the amount of virtual memory requested
  - Resident Memory: the amount of memory currently in RAM for the process
- 64. Interpreting System Activity
  $ sar -bB 1 10
  System memory activity:
  01:37:57 PM  pgpgin/s  pgpgout/s  fault/s  majflt/s
  01:37:58 PM      0.00       0.00     14.71      0.00
  System disk activity:
  01:37:57 PM  tps  rtps  wtps  bread/s  bwrtn/s
  01:37:58 PM 0.00  0.00  0.00     0.00     0.00
  - Page faults indicate memory activity or resource contention
  - File I/O indicates file activity
- 65. Interpreting System Activity
  $ perfmon
  - Page faults indicate memory activity or resource contention
  - File I/O indicates file activity
- 66. Interpreting Process Resource Use
  $ /usr/bin/time -v stata -b command
    Major (requiring I/O) page faults: 0
    Minor (reclaiming a frame) page faults: 149
    Voluntary context switches: 1280
    Involuntary context switches: 460
    Swaps: 0
    File system inputs: 0
    File system outputs: 0
  - Page faults indicate memory activity or resource contention (often memory related)
  - Voluntary context switches indicate waiting on I/O or memory
  - Swaps indicate a severe system memory shortage
  - File system inputs/outputs indicate process disk I/O
  - If a number is always 0, it's a lie
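These counters can also be scanned mechanically from a saved `/usr/bin/time -v` log, flagging the memory-related symptoms the slide calls out. The sample log below is fabricated for illustration.

```shell
# Flag memory-related symptoms in a (fabricated) /usr/bin/time -v log.
cat > rusage.log <<'EOF'
Major (requiring I/O) page faults: 12
Swaps: 0
Voluntary context switches: 1280
EOF
awk -F': ' '
  /Major .* page faults/ && $2 > 0 { print "major faults: " $2 " (process waited on disk)" }
  /Swaps/ && $2 > 0                { print "swaps: " $2 " (severe memory shortage)" }
' rusage.log
# → major faults: 12 (process waited on disk)
```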
- 67. Symptoms of a CPU-Bound System/Problem
  - CPU user+sys activity near 100% while there are active processes (if # of procs > # of CPUs)
  - The performance curve for your problem is continuous
  This is usually good:
  - CPU is the most expensive resource
  - You can trust code profiling reports
  - More likely to have gains from parallelization
  However, if CPU %sys is high, suspect inefficient use of system calls, or borderline I/O or memory bottlenecks
- 68. Symptoms of a Resource Bottleneck
  Memory bottlenecks:
  - Severe: processes in the swap queue (or waiting on swap); lots of swap space in use (see swap -m); swapping activity; free memory low
  - Moderate: high context switches + high page (validity) faults + active processes with memory >> resident memory
  I/O bottlenecks:
  - Moderate: high %sys activity in the CPU; high # of system calls; # of interrupts
  - Severe: I/O rate high; context switches, waits on I/O, or processes sleeping on I/O; physical disk activity high
  Performance curve: discontinuous regions of accelerated performance decline
- 69. Tune Against Bottlenecks
  - Typically, a single resource will be the bottleneck point: CPU; memory; I/O (graphics, network, disk)
  - If you don't address the bottleneck, optimizations elsewhere won't matter
  - Bottlenecks may depend on the usage scenario and phase of operation
  - Fixing one bottleneck may reveal others
  - Don't expect speedup of the entire program to be proportional to the code you just tuned!
  - Programs interact; try to profile on a quiet system first
- 70. Resource Analysis: Checklist
  - Identify benchmarks: small instances of your problem; can vary size
  - Target an isolated system; minimize other activity
  - Time benchmarks at various sizes
  - Monitor systems resources
  - Look for non-linearities in the performance curve
  - Look for bottlenecks
- 71. Lab: Analyzing Resource Use
  In this lab, we will log in to the RCE, run a simple set of benchmarks, use timing tools and performance analysis, and identify bottlenecks and performance curves.
  [Source: http://andreymath.wikidot.com/ . Creative Commons ShareAlike License]
- 72. Scaling Up
  - Addressing resource bottlenecks
  - System and application limits
  - Storing/accessing large datasets
  - Visualizing large datasets
- 73. When to Scale Up
  - If resource analysis identifies a memory bottleneck
  - If resource analysis identifies an I/O bottleneck (maybe…)
  - If problem size prevents the program from starting
  - If the program crashes or hangs in the middle of solving large problems (maybe…)
  - If planning ahead for significant usage changes:
    - Size of problem data > ~1/2 available physical memory (RAM)
    - Change of algorithm
    - Change of data structure
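The "~1/2 of physical RAM" rule of thumb above is easy to check before loading data. A hedged sketch using `os.sysconf` (Unix-only); the `rows` and `cols` values describe a hypothetical numeric dataset stored as 8-byte doubles, so substitute your own dimensions.

```python
import os

# Total physical RAM, from POSIX sysconf names (Linux and OS X).
page_size = os.sysconf("SC_PAGE_SIZE")    # bytes per page
phys_pages = os.sysconf("SC_PHYS_PAGES")  # total physical pages
ram_bytes = page_size * phys_pages

# Hypothetical dataset: 2 million rows of 50 double-precision columns.
rows, cols = 2_000_000, 50
data_bytes = rows * cols * 8              # 8 bytes per double

print(f"physical RAM      : {ram_bytes / 2**30:.1f} GiB")
print(f"dataset (estimate): {data_bytes / 2**30:.1f} GiB")
if data_bytes > ram_bytes / 2:
    print("dataset exceeds ~1/2 of RAM: plan to scale up")
```

The estimate is a floor: statistical software often makes working copies of the data, so real peak use can be a small multiple of this figure.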
- 74. Addressing Memory Bottlenecks
  - Review: symptoms of a memory bottleneck
    - Discontinuity in the performance curve
    - Memory size of process increasing
    - Resident memory size of process relatively large
    - System activity shows memory activity
  - Principles of addressing memory bottlenecks: memory hierarchy; locality of reference
  - Programming patterns
    - Add more resources
    - Modify data types
    - Modify data structures
    - Modify algorithms
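"Modify data types" can cut memory substantially without touching the algorithm. An illustrative sketch: the same million doubles held as a Python list (a pointer per element plus a full float object for each value) versus a packed `array('d')` of raw 8-byte doubles.

```python
import sys
from array import array

n = 1_000_000
as_list = [float(i) for i in range(n)]   # boxed float objects
as_array = array("d", range(n))          # contiguous raw doubles

# For the list, count the container plus every float object it points to;
# the array stores its values inline, so its own size is the total.
list_bytes = sys.getsizeof(as_list) + sum(sys.getsizeof(x) for x in as_list)
array_bytes = sys.getsizeof(as_array)

print(f"list of floats: {list_bytes / 2**20:.1f} MiB")
print(f"array('d')    : {array_bytes / 2**20:.1f} MiB")
```

The same idea applies in statistical packages, for example storing integer codes instead of strings, or single instead of double precision when accuracy allows.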
- 75. Memory Hierarchy
  - Registers (<1 KB)
  - Cache (1 MB)
  - RAM (10 GB)
  - Local storage (10s of terabytes)
  - Online storage (100s of petabytes)
  - Offline storage (10s of exabytes)
  - If a register access took a second, a tape access would take a few centuries…
  - "Buy one, get 8092 free!"
- 76. Locality of Reference
  - Temporal locality: reuse the same data elements
  - Spatial locality: use elements that are "near" each other in memory
  - What is "near"?
    - For vectors and files: sequential ordering
    - For matrices: either row or column ordering, depending on the language
    - For complex data structures: use experimentation and analysis
  [Figure: row-major order]
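Spatial locality is easy to observe directly. A small sketch that traverses the same list-of-lists matrix in row-major and column-major order: the row-wise pass sweeps each inner list sequentially (good locality), while the column-wise pass hops between inner lists on every access. Both orders visit every element once, so the sums must agree.

```python
import time

n = 1000
matrix = [[i * n + j for j in range(n)] for i in range(n)]

# Row-major: inner index varies fastest, matching the storage layout.
start = time.perf_counter()
row_sum = sum(matrix[i][j] for i in range(n) for j in range(n))
row_time = time.perf_counter() - start

# Column-major: outer index varies fastest, jumping between rows.
start = time.perf_counter()
col_sum = sum(matrix[i][j] for j in range(n) for i in range(n))
col_time = time.perf_counter() - start

assert row_sum == col_sum  # same work, different access order
print(f"row-wise   : {row_time:.3f}s")
print(f"column-wise: {col_time:.3f}s")
```

The gap is far larger with dense numeric storage (C, Fortran, NumPy, R matrices) than with Python lists; note that R and Fortran store matrices column-major, so there the fast order is reversed.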
- 77. Adding More Resources: "$$$" Optimization
  - Buy more memory, or use the RCE to request a larger share
  - This is effective if local set size < share size
- 78. System and Application Resource Limits
  - Limits imposed by the system or application
  - Virtual memory
    - Logical memory space for a process
    - Virtual memory limits cap the maximum size of memory requested
    - Can prevent a program from starting, or from loading large data
  - Physical memory
    - Physical RAM installed in the system
    - Usually smaller than VM, but not always
    - Caps the maximum efficient local set
  - Resident size limits
    - Affect the maximum efficient local set, though not as severely as physical limits
- 79. Limits in Linux and OS X
  - Where limits are set:
    - At bootup
    - By the system at login (group/user-level total memory limits)
    - In the shell at process creation (request a new limit, up to the user maximum)
    - In code via setrlimit
    - In the application
  - Know your limits
    - Linux/OS X: /usr/bin/ulimit -a
    - R: none for Linux
    - Stata: query memory
  - Limits on 32- vs. 64-bit systems
    - A 32-bit OS has a limit of 4 GB for virtual and physical memory
    - A 64-bit OS has no practical limit on virtual memory
    - Physical memory is still limited by hardware configuration and design
    - Data structures may require more memory to store, since pointers and default data types are larger
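The limits above can also be queried from inside a program. A minimal sketch using Python's standard resource module (Unix-only), the programmatic counterpart of the shell-level `ulimit` settings listed on the slide:

```python
import resource

# RLIMIT_AS is the address-space (virtual memory) limit for this process.
soft, hard = resource.getrlimit(resource.RLIMIT_AS)

def fmt(limit):
    if limit == resource.RLIM_INFINITY:
        return "unlimited"
    return f"{limit / 2**30:.1f} GiB"

print("virtual memory soft limit:", fmt(soft))
print("virtual memory hard limit:", fmt(hard))

# An unprivileged process may raise its soft limit up to the hard limit,
# or lower either; re-asserting the current values is a safe no-op demo.
resource.setrlimit(resource.RLIMIT_AS, (soft, hard))
```

If a large allocation fails even though the machine has free RAM, a low soft limit here is a common cause; raising it (up to the hard limit) often fixes the problem without new hardware.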
- 80. Limits in Windows Systems
  - Where limits are set:
    - Limits implied by configuration at boot
    - Virtual memory typically depends on configured paging space on disk + pagefile
    - R: memory.limit()
  - Limits on 32- vs. 64-bit systems
    - Most 32-bit Windows OSes have a limit of 3 GB physical memory (32-bit addressing allows 4 GB, but 1 GB is reserved for memory-mapped hardware, leaving only 3 GB in most Windows configurations)
    - A 64-bit OS has no practical limit on virtual memory (8 TB)
    - Physical memory is still limited by hardware configuration and design
    - Data structures may require more memory to store, since pointers and default data types are larger
    - Some Windows applications are 32-bit versions, so they are still limited to 4 GB of virtual memory