Visualizing database performance   hotsos 13-v2
Upload Details

Uploaded via as Microsoft PowerPoint

Usage Rights

© All Rights Reserved

  • slide with rhino on treadmill is awesome!
    And yeah, that's how many dev's think about tuning.
  • The goal is to... Structure = trends, repetitions, outliers, etc. A high-bandwidth information channel. Apply pattern-matching skills and prior knowledge to the analysis of data.
  • Just a photo. Add a list of resources at the end. R is my favorite, but there are many, many others.
  • 3 data preparation techniques
  • You can also pivot and apply pre-analysis. The goal is, on one hand, to get all the data you are going to need, so you won't have to move back and forth between the database and R; on the other hand, to minimize the amount of data you have to copy over the network. And for DB experts who are R newbies, most cleanup activities are easier in the DB than elsewhere.
  • Example from Greg Rahn's blog post: http://structureddata.org/2011/12/20/visualizing-active-session-history-ash-data-with-r/
  • Reshape makes pivoting easy. Sometimes you don't know you should have filtered out data until you start working on it in R.
  • I don't really want the buffer cache data; it's too large and will distort all my charts.
  • Perl is awesome for processing lines of text and can be used to aggregate (with hash maps), filter, etc. So are sed and awk. Also, data that is not from the database sometimes doesn't look like a table, so you can't massage it with R easily. Frits Hoogland has a wonderful example of using sed to extract wait information out of a 10046 trace file: http://fritshoogland.wordpress.com/2012/01/18/using-r-and-oracle-tracefiles/
  • Shape of data – distribution, common values, outliers. Charts should be useful, but not necessarily sexy.
  • You also need at least two solutions, but that’s for later
  • We can see what look like failed exports (but we don't know when they failed), that our largest database has a large variance in export times, that most databases have export times far outside the average, and where the 75th percentile falls.
  • Yury Velikanov: http://www.pythian.com/blog/upgraded-to-11gr2-congrats-you-are-in-direct-reads-trouble/
  • Published by Greg Rahn: http://structureddata.org/2011/12/20/visualizing-active-session-history-ash-data-with-r/
  • http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2778755/
  • We pay attention to what is interesting. And what is interesting is the story, the outliers, the changes, the discoveries. http://headrush.typepad.com/creating_passionate_users/2005/12/but_is_it_inter.html
  • From Baron Schwartz's blog: http://www.xaprb.com/blog/2011/01/15/sleep-while-you-can-because-it-wont-last-long/ The chart shows the number of blog posts on MySQL over time. Clearly we are running out of blog posts. This is extrapolating without a model to explain what you are looking at. Just drawing a line through data is not enough; you need a model.

Visualizing database performance hotsos 13-v2: Presentation Transcript

  • Visualizing Database Performance with R
    Gwen Shapira, Senior Consultant
    February 2013
  • About Me
    – Oracle ACE Director
    – Member of Oak Table
    – 14 years of IT
    – Performance Tuning
    – Troubleshooting
    – Hadoop
    – Presents, Blogs, Tweets
    – @gwenshap
    © 2013 Pythian
  • About Pythian
    • Recognized Leader:
      – Global industry leader in remote database administration services and consulting for Oracle, Oracle Applications, MySQL and Microsoft SQL Server
      – Work with over 250 multinational companies such as Forbes.com, Fox Sports, Nordion and Western Union to help manage their complex IT deployments
    • Expertise:
      – Pythian's data experts are the elite in their field. We have the highest concentration of Oracle ACEs on staff (9, including 2 ACE Directors) and 2 Microsoft MVPs.
      – Pythian holds 7 Specializations under the Oracle Platinum Partner program, including Oracle Exadata, Oracle GoldenGate and Oracle RAC
    • Global Reach & Scalability:
      – Around-the-clock global remote support for DBA and consulting, systems administration, special projects or emergency response
  • Will Talk About:
    • Data pre-processing tools
    • Visualization tools and techniques
    • How to make great looking charts
    • What makes visuals effective
    • How to avoid visualization mistakes
  • Will NOT Talk About:
    • How to collect performance data
    • Cool ASH queries
    • How to program in R
    • Statistics
    • Machine Learning
    • What the data actually means
    • How to explain the results to your boss
  • Why Visualize?
    • Yet another analysis tool
    • But more fun
    • Highly effective
    • Communications tool, too
    • But not at the same time
  • Reveal Structure in Data
  • Visualization Tools
  • R Studio
  • Getting Data In Shape
  • Use the DB, Luke: Aggregate, Scale, Filter
  • Getting DB Data to R
        library(RJDBC)
        drv <- JDBC("oracle.jdbc.driver.OracleDriver", "/Users/grahn/code/jdbc/ojdbc6.jar")
        conn <- dbConnect(drv, "jdbc:oracle:thin:@zulu.us.oracle.com:1521:orcl", "grahn", "grahn")
        # import the data into a data.frame
        lfs <- dbGetQuery(conn, "select SAMPLE_ID, TIME_WAITED from ashdump where EVENT='log file sync' order by SAMPLE_ID")
  • With R
        "NAME","SNAP_TIME","BYTES"
        "free memory",12-03-09 00:00:00,645935368
        "KGH: NO ACCESS",12-03-09 00:00:00,325214880
        "db_block_hash_buckets",12-03-09 00:00:00,186650624
        "free memory",12-03-09 00:00:00,134211304
        "shared_io_pool",12-03-09 00:00:00,536870912
        "log_buffer",12-03-09 00:00:00,16924672
        "buffer_cache",12-03-09 00:00:00,21676163072
        "fixed_sga",12-03-09 00:00:00,2238472
        "JOXLE",12-03-10 04:00:01,27349056
        "free memory",12-03-10 04:00:01,105800192
        "free memory",12-03-10 04:00:01,192741376
        "PX msg pool",12-03-10 04:00:01,8192000
  • Reshape
        shared_pool <- read.csv("~/shapira/shared_pool.csv")
        install.packages("reshape")
        library(reshape)
        max_shared_pool <- cast(shared_pool, SNAP_TIME ~ NAME, max)

        Time               free memory  log_buffer  buffer_cache
        12-03-09 00:00:00  645935368    16924672    21676163072
        12-03-09 04:00:00  192741376
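For readers without the reshape package, the same long-to-wide pivot can be sketched in base R with tapply(). This is an illustrative alternative, not the code from the slides. The rows are a subset of the CSV sample shown earlier, embedded inline so the example runs on its own.

```r
# A subset of the CSV rows from the earlier slide, embedded inline.
csv_text <- '"NAME","SNAP_TIME","BYTES"
"free memory",12-03-09 00:00:00,645935368
"free memory",12-03-09 00:00:00,134211304
"log_buffer",12-03-09 00:00:00,16924672
"buffer_cache",12-03-09 00:00:00,21676163072
"free memory",12-03-10 04:00:01,105800192
"free memory",12-03-10 04:00:01,192741376'

shared_pool <- read.csv(text = csv_text)

# One row per SNAP_TIME, one column per NAME, each cell holding the max BYTES.
# Combinations with no data come back as NA.
max_shared_pool <- tapply(shared_pool$BYTES,
                          list(shared_pool$SNAP_TIME, shared_pool$NAME),
                          max)
print(max_shared_pool)
```

The result is a matrix rather than a data frame, which is usually enough for a quick look before plotting.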
  • With R (chart annotation: "out of scale")
  • Select Subset of data
        max_shared_pool <- subset(max_shared_pool, select = -c(buffer_cache))
        par(mar = c(4, 6, 2, 1))
        boxplot(max_shared_pool / 1024 / 1024, xlab = "Size in MBytes", horizontal = TRUE, las = 1)
  • With R
  • More Subsets
        SAMPLE_ID  TIME_WAITED  WAIT_CLASS   EVENT
        10526629   14929        User I/O     cell single block physical read
        10526629   5015         User I/O     cell single block physical read
        10465699   21572        Concurrency  library cache: mutex X
        10465699   65938        Concurrency  library cache: mutex X
  • Filtering Data
        new <- subset(old, row filter, column filter)
        phys_io <- subset(ash, WAIT_CLASS == "User I/O", select = -c(EVENT))

        SAMPLE_ID  TIME_WAITED  WAIT_CLASS
        10526629   14929        User I/O
        10526629   5015         User I/O
  • Another Filtering Syntax
        short_waits <- subset(ash, ash$TIME_WAITED < 10000)
        short_waits <- ash[ash$TIME_WAITED < 10000, ]   # Not a typo!

        SAMPLE_ID  TIME_WAITED  WAIT_CLASS  EVENT
        10526629   5015         User I/O    cell single block physical read
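A minimal sketch of why the trailing comma matters, using the four ASH rows from the "More Subsets" slide. The variable names by_subset, by_index and by_column are made up for this illustration.

```r
# The four ASH rows from the "More Subsets" slide.
ash <- data.frame(
  SAMPLE_ID   = c(10526629, 10526629, 10465699, 10465699),
  TIME_WAITED = c(14929, 5015, 21572, 65938),
  WAIT_CLASS  = c("User I/O", "User I/O", "Concurrency", "Concurrency"),
  EVENT       = c("cell single block physical read",
                  "cell single block physical read",
                  "library cache: mutex X",
                  "library cache: mutex X"),
  stringsAsFactors = FALSE
)

# Row filter via subset(): column names can be used directly.
by_subset <- subset(ash, TIME_WAITED < 10000)

# Row filter via indexing: the empty slot after the comma means "all columns".
by_index <- ash[ash$TIME_WAITED < 10000, ]

# Without the comma, the logical vector is applied to COLUMNS instead.
# Here its 4 elements happen to line up with the 4 columns, so R silently
# returns the second column rather than the matching row.
by_column <- ash[ash$TIME_WAITED < 10000]

print(by_index)
print(names(by_column))
```

The first two forms return the same single row; the third quietly returns something else, which is why the comma is worth a second look.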
  • Summarize with ddply
        install.packages("plyr")
        library(plyr)
        ash2 <- ddply(ash, "SAMPLE_ID", summarise, N = length(TIME_WAITED), mean = mean(TIME_WAITED), max = max(TIME_WAITED))

        SAMPLE_ID  N  MEAN   MAX
        10526629   2  9972   14929
        10465699   2  43755  65938
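The same per-sample summary can be sketched in base R, which may help when plyr is not installed. This is an alternative to the ddply call, not a description of what ddply does internally; the data are the four rows from the "More Subsets" slide.

```r
ash <- data.frame(
  SAMPLE_ID   = c(10526629, 10526629, 10465699, 10465699),
  TIME_WAITED = c(14929, 5015, 21572, 65938)
)

# Split TIME_WAITED by SAMPLE_ID and compute the three summaries per group.
groups <- split(ash$TIME_WAITED, ash$SAMPLE_ID)
ash2 <- data.frame(
  SAMPLE_ID = as.numeric(names(groups)),
  N         = sapply(groups, length),
  mean      = sapply(groups, mean),
  max       = sapply(groups, max),
  row.names = NULL
)
print(ash2)
```

The counts, means and maxima agree with the table on the slide, although the groups come back sorted by SAMPLE_ID.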
  • Cheating for DBAs
        library(sqldf)
        ash2 <- sqldf("select SAMPLE_ID, count(*) N, avg(TIME_WAITED), max(TIME_WAITED)
                       from ash
                       where WAIT_CLASS = 'User I/O'
                       group by SAMPLE_ID")
  • When all else fails
    Text is text. Frits Hoogland converts a 10046 trace to CSV for R with sed:
        s/^(WAIT) #([0-9]*): nam=(.*) ela= *([0-9]*)[0-9a-z #|]*=([0-9]*) [0-9a-z #|]*=([0-9]*) [0-9a-z #|]*=([0-9]*) obj#=([0-9-]*) tim=([0-9]*)$/\1|\2|\3|\4|\5|\6|\7|\8|\9/
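If you would rather stay in R, a similar extraction can be sketched with regexec() and regmatches(). The trace line below is made up for illustration and the pattern is a simplified variant of the sed expression (it keeps five fields rather than nine), so treat it as a starting point, not a drop-in replacement.

```r
# A made-up trace line for illustration (not taken from a real trace file).
trace_line <- "WAIT #3: nam='db file sequential read' ela= 1234 file#=1 block#=5678 blocks=1 obj#=12345 tim=1326900000000"

# Capture cursor number, event name, elapsed time, object id and timestamp.
pattern <- "^WAIT #([0-9]+): nam='(.*)' ela= *([0-9]+) .* obj#=(-?[0-9]+) tim=([0-9]+)$"
fields <- regmatches(trace_line, regexec(pattern, trace_line))[[1]]

# fields[1] is the whole match; the captured groups start at fields[2].
wait <- data.frame(
  cursor = as.integer(fields[2]),
  event  = fields[3],
  ela    = as.numeric(fields[4]),
  obj    = as.integer(fields[5]),
  tim    = as.numeric(fields[6])
)
print(wait)
```

For a whole file, the same pattern could be applied to the output of readLines() and the non-matching lines dropped.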
  • Exploring Data
  • Directions to Explore
    • Shape of data
    • Correlations
    • Changes over time
  • The Goal of Analysis is a Story
    • Who
    • What
    • When
    • Where
    • Why
    • Why
    • Why
    • Why
    • Why
  • Boxplot
    • Initial step
    • Identify outliers
    • Compare groups
    • Summarize
    Chart annotations: "75% of exports take less than 600m", "Fail?"
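To see which numbers a boxplot actually draws, base R exposes them through boxplot.stats(). The export durations below are made-up values, chosen only so that one run stands out.

```r
# Hypothetical export durations in minutes (made-up numbers).
export_minutes <- c(120, 200, 250, 300, 350, 400, 450, 500, 550, 2000)

# boxplot.stats() returns the five numbers a boxplot draws
# (lower whisker, lower hinge, median, upper hinge, upper whisker)
# plus any points that fall beyond the whiskers.
bs <- boxplot.stats(export_minutes)
print(bs$stats)
print(bs$out)
```

With these values the 2000-minute run is reported in bs$out, which is the kind of point the "Identify outliers" bullet refers to.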
  • For Example: WHAT?
  • How it's done?
        ash <- read.csv("~/Downloads/ash1.csv")
        boxplot(ash$TIME_WAITED / 1000000 ~ ash$WAIT_CLASS, xlab = "Wait Class", ylab = "Time Waited (s)", cex.axis = 1.2)
  • Scatter Plot
    • Incredibly versatile
    • Use to:
      – Show changes over time
      – Show correlations
      – Highlight trends
      – Find model
      – Pretty much everything
  • WHAT?
  • Log Data
  • How it's done?
        install.packages("ggplot2")
        library(ggplot2)
        ggplot(ash, aes(SAMPLE_ID, TIME_WAITED, color = factor(WAIT_CLASS))) + geom_point()
        ggplot(ash, aes(SAMPLE_ID, log(TIME_WAITED), color = factor(WAIT_CLASS))) + geom_point()
  • Only "Small Waits" (chart annotation: "500us Physical IO?")
  • Filtering
        small_waits <- ash[ash$TIME_WAITED < 15000, ]
        ggplot(small_waits, aes(SAMPLE_ID, TIME_WAITED, color = factor(WAIT_CLASS))) + geom_point()
  • Smoothing
  • Smoothing
        ggplot(ash, aes(SAMPLE_ID, TIME_WAITED / 1000000, color = factor(WAIT_CLASS))) + geom_smooth()
        ggplot(ash, aes(SAMPLE_ID, TIME_WAITED / 1000000, color = factor(WAIT_CLASS))) + geom_point() + geom_smooth()
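A rough idea of what a smoother does can be had without ggplot2 by using the base-R function lowess(). It is a related local-regression smoother and will not necessarily produce the same curve as geom_smooth(); the series below is synthetic.

```r
# Synthetic series: a linear trend with alternating +/-5 noise (made-up data).
x <- 1:50
y <- x + rep(c(5, -5), times = 25)

# lowess() fits local weighted regressions and returns smoothed (x, y) pairs.
sm <- lowess(x, y)

# On average the smoothed values sit closer to the underlying trend
# than the raw points do.
raw_error    <- mean(abs(y - x))
smooth_error <- mean(abs(sm$y - x))
print(c(raw = raw_error, smoothed = smooth_error))
```

To draw it, plot(x, y) followed by lines(sm) overlays the smoothed curve on the raw points.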
  • Data over Time (chart annotation: "11gR2!")
  • Finding Correlation
  • Regression (is not Causation)
  • How?
        concurr2 <- ddply(concurr, .(SAMPLE_ID), summarise, N = length(TIME_WAITED), mean = mean(TIME_WAITED), max = max(TIME_WAITED))
        ggplot(concurr2, aes(N, max / 1000000)) + geom_point() + geom_smooth(method = lm) + xlab("Number of Samples") + ylab("Max Time Waited (s)")
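The straight line that geom_smooth(method = lm) draws comes from an ordinary linear model, which can be inspected directly with lm(). The numbers below are made up to stand in for the per-sample summary; they are not taken from the presentation's data.

```r
# Made-up per-sample summary: number of waits and maximum wait in seconds.
concurr2 <- data.frame(
  N   = c(1, 2, 3, 4, 5, 6),
  max = c(0.9, 2.1, 2.9, 4.2, 4.8, 6.1)
)

# Fit max as a linear function of N.
fit <- lm(max ~ N, data = concurr2)

# Slope and intercept of the fitted line, and how much variance it explains.
print(coef(fit))
print(summary(fit)$r.squared)
```

A high R-squared only says the line describes these points well. As the previous slide notes, it says nothing about what causes what.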
  • Heatmap
    • Values as "blocks" in a matrix
    • Clearer than scatter plot for large amounts of data
    • Shows less information
    • Performance data made sexy
  • Heatmap
  • How?
        ash2 <- ddply(ash, .(SAMPLE_ID, WAIT_CLASS), summarise, N = length(TIME_WAITED), mean = mean(TIME_WAITED), max = max(TIME_WAITED))
        ash2 <- ash2[ash2$WAIT_CLASS %in% c("Concurrency", "User I/O", "Other"), ]
        ggplot(ash2, aes(SAMPLE_ID, WAIT_CLASS)) + geom_tile(aes(fill = log(N))) + scale_fill_gradient(low = "green", high = "red")
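The matrix behind a heatmap is just a table of counts per cell. A minimal base-R sketch, using made-up ASH-like rows, builds that matrix with table():

```r
# Made-up ASH-like samples: one row per wait observed in a sample.
ash <- data.frame(
  SAMPLE_ID  = c(1, 1, 1, 2, 2, 3, 3, 3, 3),
  WAIT_CLASS = c("User I/O", "User I/O", "Concurrency",
                 "User I/O", "Other",
                 "Concurrency", "Concurrency", "Concurrency", "User I/O")
)

# Count waits per (SAMPLE_ID, WAIT_CLASS) cell; absent pairs count as 0.
counts <- table(ash$SAMPLE_ID, ash$WAIT_CLASS)
print(counts)
```

Passing unclass(counts) to the base function image() draws the same matrix as colored blocks, which is a quick way to check the data before styling it with ggplot2.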
  • Presenting Your Data
  • FACT
    "Even irrelevant neuroscience information in an explanation of a psychological phenomenon may interfere with people's abilities to critically consider the underlying logic of this explanation."
  • Numerical quantities focus on expected values – graphical summaries on unexpected values. -- John Tukey
  • Our goal is an interesting presentation. What is "Interesting"?
    • Surprise
    • Beauty
    • Stories
    • Visuals
    • Counterintuitive
    • Variety
  • Bad Visualizations Lie
    1. Omit important data
    2. Distort data
    3. Misleading
    4. Confusing
    5. Fake correlations and bad models
  • Bad vs. Good Visuals
  • Eye-API
    • Good:
      – distances
      – locations
      – length
      – high contrast
    • Bad:
      – shades
      – relative area
      – angles
  • Good or Bad?
  • #1 Mistake – Throw a line on Data
  • Avoid Pie Charts
  • Infographics always have Pie Charts
  • Which is better?
  • Creativity is Allowed
  • Make it Beautiful – for Geeks
    • Contrast
    • Reduce noise
    • Few colors
    • Few fonts
    • Lots of Data
    • More Signal
    • Less Noise
  • IMPORTant R Libraries
    • reshape
    • plyr
    • ggplot2
    • sqldf
    • http://blog.revolutionanalytics.com/2013/02/10-r-packages-every-data-scientist-should-know-about.html
  • Other Visualization Tools
    • R + R Studio
    • Excel
    • Gephi
    • JIT, D3.js
    • ggobi
  • Thank you – Q&A
    To contact us: sales@pythian.com, 1-877-PYTHIAN
    To follow us:
      http://www.pythian.com/blog
      http://www.facebook.com/pages/The-Pythian-Group/163902527671
      @pythian
      http://www.linkedin.com/company/pythian