Visualizing Database Performance with R
Gwen Shapira, Senior Consultant
February, 2013
  • The goal is to... Structure = trends, repetitions, outliers, etc. High-bandwidth information channel. Apply pattern-matching skills and prior knowledge to the analysis of data.
  • Just a photo. Add a list of resources at the end. R is my favorite, but there are many, many others.
  • 3 data preparation techniques
  • You can also pivot and apply pre-analysis. The goal is, on one hand, to get all the data you are going to need, so you won’t have to move back and forth between the database and R. On the other hand, you want to minimize the amount of data you have to copy over the network. And as DB experts and R newbies, most cleanup activities are easier for us in the DB than elsewhere.
  • Example from Greg Rahn’s blog post: http://structureddata.org/2011/12/20/visualizing-active-session-history-ash-data-with-r/
  • Reshape makes pivoting easy. Sometimes you don’t know you should have filtered out data until you have started working on it in R.
  • I don’t really want the buffer cache data; it’s too large and will distort all my charts.
  • Perl is awesome for processing lines of text and can be used to aggregate (with hash maps), filter, etc. So are sed and awk. Also, data that is not from the database sometimes doesn’t look like a table, so you can’t massage it with R easily. Frits Hoogland has a wonderful example of using sed to extract wait information out of a 10046 trace file: http://fritshoogland.wordpress.com/2012/01/18/using-r-and-oracle-tracefiles/
  • Shape of data: distribution, common values, outliers. Charts should be useful, but not necessarily sexy.
  • You also need at least two solutions, but that’s for later.
  • We can see what looks like failed exports (but don’t know when they failed), we can see that our largest database has a large variance in export times, we can see that most databases have export times far outside the average, and we can see the 75th percentile.
  • Yury Velikanov: http://www.pythian.com/blog/upgraded-to-11gr2-congrats-you-are-in-direct-reads-trouble/
  • Published by Greg Rahn: http://structureddata.org/2011/12/20/visualizing-active-session-history-ash-data-with-r/
  • http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2778755/
  • We pay attention to what is interesting. And what is interesting is the story, the outliers, the changes, the discoveries. http://headrush.typepad.com/creating_passionate_users/2005/12/but_is_it_inter.html
  • From Baron Schwartz’s blog: http://www.xaprb.com/blog/2011/01/15/sleep-while-you-can-because-it-wont-last-long/ The chart shows the number of blog posts on MySQL over time. Clearly we are running out of blog posts. This is extrapolating without a model to explain what you are looking at. Just drawing a line through data is not enough; you need a model.
  • Visualizing database performance hotsos 13-v2

    1. Visualizing Database Performance with R
       Gwen Shapira, Senior Consultant
       February, 2013
    2. About Me
       – Oracle ACE Director
       – Member of Oak Table
       – 14 years of IT
       – Performance Tuning
       – Troubleshooting
       – Hadoop
       – Presents, Blogs, Tweets
       – @gwenshap
       © 2013 Pythian
    3. About Pythian
       • Recognized Leader:
         – Global industry leader in remote database administration services and consulting for Oracle, Oracle Applications, MySQL and Microsoft SQL Server
         – Work with over 250 multinational companies such as Forbes.com, Fox Sports, Nordion and Western Union to help manage their complex IT deployments
       • Expertise:
         – Pythian’s data experts are the elite in their field. We have the highest concentration of Oracle ACEs on staff (9, including 2 ACE Directors) and 2 Microsoft MVPs.
         – Pythian holds 7 Specializations under the Oracle Platinum Partner program, including Oracle Exadata, Oracle GoldenGate & Oracle RAC
       • Global Reach & Scalability:
         – Around-the-clock global remote support for DBA and consulting, systems administration, special projects or emergency response
    4. Will Talk About:
       • Data pre-processing tools
       • Visualization tools and techniques
       • How to make great looking charts
       • What makes visuals effective
       • How to avoid visualization mistakes
    5. Will NOT Talk About:
       • How to collect performance data
       • Cool ASH queries
       • How to program in R
       • Statistics
       • Machine Learning
       • What the data actually means
       • How to explain the results to your boss
    6. Why Visualize?
       • Yet another analysis tool
       • But more fun
       • Highly effective
       • Communications tool, too
       • But not at the same time
    7. Reveal Structure in Data
    8. Visualization Tools
    9. R Studio
    10. Getting Data In Shape
    11. Use the DB, Luke
        • Aggregate
        • Scale
        • Filter
    12. Getting DB Data to R
          library(RJDBC)
          drv <- JDBC("oracle.jdbc.driver.OracleDriver",
                      "/Users/grahn/code/jdbc/ojdbc6.jar")
          conn <- dbConnect(drv,
                            "jdbc:oracle:thin:@zulu.us.oracle.com:1521:orcl",
                            "grahn", "grahn")
          # import the data into a data.frame
          lfs <- dbGetQuery(conn,
                            "select SAMPLE_ID, TIME_WAITED from ashdump where EVENT='log file sync' order by SAMPLE_ID")
    13. With R
          "NAME","SNAP_TIME","BYTES"
          "free memory",12-03-09 00:00:00,645935368
          "KGH: NO ACCESS",12-03-09 00:00:00,325214880
          "db_block_hash_buckets",12-03-09 00:00:00,186650624
          "free memory",12-03-09 00:00:00,134211304
          "shared_io_pool",12-03-09 00:00:00,536870912
          "log_buffer",12-03-09 00:00:00,16924672
          "buffer_cache",12-03-09 00:00:00,21676163072
          "fixed_sga",12-03-09 00:00:00,2238472
          "JOXLE",12-03-10 04:00:01,27349056
          "free memory",12-03-10 04:00:01,105800192
          "free memory",12-03-10 04:00:01,192741376
          "PX msg pool",12-03-10 04:00:01,8192000
    14. Reshape
          shared_pool <- read.csv("~/shapira/shared_pool.csv")
          install.packages("reshape")
          library(reshape)
          # pivot: one row per SNAP_TIME, one column per NAME, max value per cell
          max_shared_pool <- cast(shared_pool, SNAP_TIME ~ NAME, max)

          Time               free memory  log_buffer  buffer_cache
          12-03-09 00:00:00  645935368    16924672    21676163072
          12-03-09 04:00:00  192741376
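The pivot on the Reshape slide can be tried without the CSV file. The following is a minimal sketch, assuming a handful of rows copied from the sample data on slide 13 and using base R's `tapply` in place of `cast`, so that nothing has to be installed. The subset of rows chosen here is illustrative only.

```r
# A few rows taken from the sample CSV on slide 13
shared_pool <- data.frame(
  NAME = c("free memory", "free memory", "log_buffer", "buffer_cache",
           "free memory", "free memory"),
  SNAP_TIME = c("12-03-09 00:00:00", "12-03-09 00:00:00", "12-03-09 00:00:00",
                "12-03-09 00:00:00", "12-03-10 04:00:01", "12-03-10 04:00:01"),
  BYTES = c(645935368, 134211304, 16924672, 21676163072, 105800192, 192741376),
  stringsAsFactors = FALSE
)

# One row per snapshot, one column per component, max(BYTES) in each cell.
# Snapshot/component combinations with no rows come back as NA.
max_shared_pool <- tapply(shared_pool$BYTES,
                          list(shared_pool$SNAP_TIME, shared_pool$NAME),
                          max)
print(max_shared_pool)
```

The result is a matrix rather than the data frame that `cast` returns, which is enough to check that the pivot does what you expect before running it on the full file.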
    15. With R
        [Chart annotation: out of scale]
    16. Select Subset of data
          max_shared_pool <- subset(max_shared_pool, select = -c(buffer_cache))
          # set the plot margins first, then draw the boxplot
          par(mar=c(4,6,2,1))
          boxplot((max_shared_pool)/1024/1024,
                  xlab="Size in MBytes",
                  horizontal=TRUE, las=1)
    17. With R
    18. More Subsets
          SAMPLE_ID  TIME_WAITED  WAIT_CLASS   EVENT
          10526629   14929        User I/O     cell single block physical read
          10526629   5015         User I/O     cell single block physical read
          10465699   21572        Concurrency  library cache: mutex X
          10465699   65938        Concurrency  library cache: mutex X
    19. Filtering Data
          new <- subset(old, row filter, column filter)
          phys_io <- subset(ash, WAIT_CLASS == "User I/O", select = -c(EVENT))

          SAMPLE_ID  TIME_WAITED  WAIT_CLASS
          10526629   14929        User I/O
          10526629   5015         User I/O
    20. Another Filtering Syntax
          short_waits <- subset(ash, ash$TIME_WAITED < 10000)
          short_waits <- ash[ash$TIME_WAITED < 10000,]   # the trailing comma is Not a Typo!

          SAMPLE_ID  TIME_WAITED  WAIT_CLASS  EVENT
          10526629   5015         User I/O    cell single block physical read
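To see why the trailing comma matters, here is a small sketch using the four sample rows from slide 18. The variable names `by_subset`, `by_bracket` and `no_comma` are illustrative only.

```r
# The four sample rows from the "More Subsets" slide
ash <- data.frame(
  SAMPLE_ID   = c(10526629L, 10526629L, 10465699L, 10465699L),
  TIME_WAITED = c(14929, 5015, 21572, 65938),
  WAIT_CLASS  = c("User I/O", "User I/O", "Concurrency", "Concurrency"),
  EVENT       = c(rep("cell single block physical read", 2),
                  rep("library cache: mutex X", 2)),
  stringsAsFactors = FALSE
)

# Two ways to keep the rows with short waits
by_subset  <- subset(ash, TIME_WAITED < 10000)
by_bracket <- ash[ash$TIME_WAITED < 10000, ]  # rows before the comma, columns after

# Without the comma the logical vector is applied to COLUMNS instead of rows
no_comma <- ash[ash$TIME_WAITED < 10000]

print(by_bracket)
print(names(no_comma))
```

With the comma, the condition picks rows. Without it, the same logical vector is matched against the columns, which silently returns a different shape of data rather than an error.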
    21. Summarize with DDPLY
          install.packages("plyr")
          library(plyr)
          # one output row per SAMPLE_ID with count, mean and max of TIME_WAITED
          ash2 <- ddply(ash, "SAMPLE_ID", summarise,
                        N=length(TIME_WAITED),
                        mean=mean(TIME_WAITED),
                        max=max(TIME_WAITED))

          SAMPLE_ID  N  MEAN   MAX
          10526629   2  9972   14929
          10465699   2  43755  65938
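The same per-sample summary can be reproduced with base R alone, which is handy for checking what `ddply` is doing. This is a sketch under the assumption that only the four sample rows from slide 18 are loaded; it swaps `ddply` for `split` plus `sapply` and is not the method used on the slide.

```r
# The four sample rows from slide 18 (only the columns needed here)
ash <- data.frame(
  SAMPLE_ID   = c(10526629L, 10526629L, 10465699L, 10465699L),
  TIME_WAITED = c(14929, 5015, 21572, 65938)
)

# Group the wait times by SAMPLE_ID, then summarise each group
by_sample <- split(ash$TIME_WAITED, ash$SAMPLE_ID)
ash2 <- data.frame(
  SAMPLE_ID = as.integer(names(by_sample)),
  N         = unname(sapply(by_sample, length)),
  mean      = unname(sapply(by_sample, mean)),
  max       = unname(sapply(by_sample, max))
)
print(ash2)
```

The numbers should match the table on the slide, although the row order follows the sorted SAMPLE_ID values.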
    22. Cheating for DBAs
          library(sqldf)
          # avg() is the standard SQL aggregate for the mean
          ash2 <- sqldf("select SAMPLE_ID, count(*) N,
                                avg(TIME_WAITED), max(TIME_WAITED)
                         from ash
                         where WAIT_CLASS = 'User I/O'
                         group by SAMPLE_ID")
    23. When all else fails
        Text is text.
        Frits Hoogland converts a 10046 trace to CSV for R with sed (the pattern as shown uses extended regex syntax, e.g. sed -E):
          s/^(WAIT) #([0-9]*): nam=(.*) ela= *([0-9]*)[0-9a-z #|]*=([0-9]*) [0-9a-z #|]*=([0-9]*) [0-9a-z #|]*=([0-9]*) obj#=([0-9-]*) tim=([0-9]*)$/\1|\2|\3|\4|\5|\6|\7|\8|\9/
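Before running a substitution like this over a whole trace file, it can help to try the pattern on a single line. The sketch below applies the same regular expression with R's `sub()`. The trace line is a made-up example in the general shape of a 10046 WAIT line, not real output, and `perl = TRUE` is chosen here only to get predictable greedy matching.

```r
# Hypothetical WAIT line, invented for illustration
line <- paste0("WAIT #3: nam='db file sequential read' ela= 123 ",
               "file#=1 block#=456 blocks=1 obj#=789 tim=1234567890")

# The pattern from the slide, split over several strings for readability
pattern <- paste0(
  "^(WAIT) #([0-9]*): nam=(.*) ela= *([0-9]*)",
  "[0-9a-z #|]*=([0-9]*) [0-9a-z #|]*=([0-9]*) [0-9a-z #|]*=([0-9]*)",
  " obj#=([0-9-]*) tim=([0-9]*)$"
)

# Replace the whole line with its nine captured fields, pipe-separated
csv_line <- sub(pattern, "\\1|\\2|\\3|\\4|\\5|\\6|\\7|\\8|\\9", line, perl = TRUE)
fields <- strsplit(csv_line, "|", fixed = TRUE)[[1]]
print(fields)
```

Once the fields look right, the pipe-separated output can be read into R with `read.csv(..., sep = "|")`.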
    24. Exploring Data
    25. Directions to Explore
        • Shape of data
        • Correlations
        • Changes over time
    26. The Goal of Analysis is a Story
        • Who
        • What
        • When
        • Where
        • Why
        • Why
        • Why
        • Why
        • Why
    27. Boxplot
        • Initial step
        • Identify outliers
        • Compare groups
        • Summarize
        [Chart annotations: "75% of exports take less than 600m"; "Fail?"]
    28. For Example:
        [Chart annotation: WHAT?]
    29. How it's done
          ash <- read.csv("~/Downloads/ash1.csv")
          # TIME_WAITED is divided by 1,000,000 to plot seconds
          boxplot(ash$TIME_WAITED/1000000 ~ ash$WAIT_CLASS,
                  xlab="Wait Class",
                  ylab="Time Waited (s)",
                  cex.axis=1.2)
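The numbers a boxplot draws can be inspected directly, which makes it easier to read the box, whiskers and outlier points. A minimal sketch, assuming a made-up vector of wait times (not data from the talk) and using `boxplot.stats` from base R:

```r
# Made-up wait times in seconds, with one obvious outlier
waits <- c(1, 2, 3, 4, 5, 6, 7, 8, 100)

b <- boxplot.stats(waits)

# Lower whisker, lower hinge, median, upper hinge, upper whisker
print(b$stats)

# Values beyond 1.5 box-lengths from the hinges, drawn as individual points
print(b$out)
```

Roughly 75% of the values fall at or below the upper hinge, which is the reading behind the "75% of exports" annotation on the Boxplot slide.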
    30. Scatter Plot
        • Incredibly versatile
        • Use to:
          – Show changes over time
          – Show correlations
          – Highlight trends
          – Find model
          – Pretty much everything
    31. [Chart annotation: WHAT?]
    32. Log Data
    33. How it's done
          install.packages("ggplot2")
          library(ggplot2)
          # raw wait times, colored by wait class
          ggplot(ash, aes(SAMPLE_ID, TIME_WAITED, color=factor(WAIT_CLASS))) +
            geom_point()
          # same plot on a log scale
          ggplot(ash, aes(SAMPLE_ID, log(TIME_WAITED), color=factor(WAIT_CLASS))) +
            geom_point()
    34. Only "Small Waits"
        [Chart annotation: 500us, Physical IO?]
    35. Filtering
          small_waits <- ash[ash$TIME_WAITED < 15000,]
          ggplot(small_waits, aes(SAMPLE_ID, TIME_WAITED, color=factor(WAIT_CLASS))) +
            geom_point()
    36. Smoothing
    37. Smoothing
          # smoothed trend only
          ggplot(ash, aes(SAMPLE_ID, TIME_WAITED/1000000, color=factor(WAIT_CLASS))) +
            geom_smooth()
          # points with the smoothed trend on top
          ggplot(ash, aes(SAMPLE_ID, TIME_WAITED/1000000, color=factor(WAIT_CLASS))) +
            geom_point() +
            geom_smooth()
    38. Data over Time
        [Chart annotation: 11gR2 !]
    39. Finding Correlation
    40. Regression (is not Causation)
    41. How?
          concurr2 <- ddply(concurr, .(SAMPLE_ID), summarise,
                            N=length(TIME_WAITED),
                            mean=mean(TIME_WAITED),
                            max=max(TIME_WAITED))
          # scatter plot with a fitted linear regression line
          ggplot(concurr2, aes(N, max/1000000)) +
            geom_point() +
            geom_smooth(method=lm) +
            xlab("Number of Samples") +
            ylab("Max Time Waited (s)")
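The line that `geom_smooth(method=lm)` draws comes from an ordinary linear model, so the fit can also be examined numerically. The sketch below assumes a made-up `concurr2` data frame in which the maximum wait grows by exactly 2 for each additional sample; the values are illustrative and are not from the talk.

```r
# Made-up per-sample summary with a perfectly linear relationship
concurr2 <- data.frame(N = 1:5, max = c(2, 4, 6, 8, 10))

# Fit max as a linear function of N
fit <- lm(max ~ N, data = concurr2)

# Intercept and slope of the fitted line
print(coef(fit))

# Strength of the linear association between the two columns
r <- cor(concurr2$N, concurr2$max)
print(r)
```

A strong correlation and a tidy slope describe how the two columns move together. As the slide title says, they do not show that one causes the other.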
    42. Heatmap
        • Values as "blocks" in a matrix
        • Clearer than scatter plot for large amounts of data
        • Shows less information
        • Performance data made sexy
    43. Heatmap
    44. How?
          # Group by WAIT_CLASS as well, so the column survives the summary.
          # Starting from ash rather than concurr is an assumption: the filter
          # below expects several wait classes to be present.
          ash2 <- ddply(ash, .(SAMPLE_ID, WAIT_CLASS), summarise,
                        N=length(TIME_WAITED),
                        mean=mean(TIME_WAITED),
                        max=max(TIME_WAITED))
          ash2 <- ash2[ash2$WAIT_CLASS %in% c("Concurrency","User I/O","Other"),]
          # one tile per (sample, wait class), colored by log of the count
          ggplot(ash2, aes(SAMPLE_ID, WAIT_CLASS)) +
            geom_tile(aes(fill = log(N))) +
            scale_fill_gradient(low = "green", high = "red")
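The matrix behind a heatmap like this is just a table of counts per sample and wait class. A minimal sketch with base R's `table`, assuming a tiny made-up `ash` data frame (the sample IDs and rows are invented for illustration):

```r
# Made-up ASH rows: three samples, three wait classes
ash <- data.frame(
  SAMPLE_ID  = c(1L, 1L, 1L, 2L, 2L, 3L),
  WAIT_CLASS = c("User I/O", "User I/O", "Concurrency",
                 "User I/O", "Other", "Concurrency"),
  stringsAsFactors = FALSE
)

# Rows are wait classes, columns are samples, cells are the number of waits
counts <- table(ash$WAIT_CLASS, ash$SAMPLE_ID)
print(counts)
```

Each cell of `counts` corresponds to one tile. Combinations that never occur show up as 0 here, whereas in a grouped summary they are simply absent and leave a gap in the plot.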
    45. Presenting Your Data
    46. FACT
        "Even irrelevant neuroscience information in an explanation of a psychological phenomenon may interfere with people’s abilities to critically consider the underlying logic of this explanation."
    47. "Numerical quantities focus on expected values, graphical summaries on unexpected values"
        -- John Tukey
    48. Our goal is an interesting presentation.
        What is "Interesting"?
        • Surprise
        • Beauty
        • Stories
        • Visuals
        • Counterintuitive
        • Variety
    49. Bad Visualizations Lie
        1. Omit important data
        2. Distort data
        3. Misleading
        4. Confusing
        5. Fake correlations and Bad models
    50. Bad vs. Good Visuals
    51. Eye-API
        • Good:
          – distances
          – locations
          – length
          – high contrast
        • Bad:
          – shades
          – relative area
          – angles
    52. Good or Bad?
    53. [Image-only slide]
    54. #1 Mistake – Throw a line on Data
    55. [Image-only slide]
    56. Avoid Pie Charts
    57. Infographics always have Pie Charts
    58. Which is better?
    59. Creativity is Allowed
    60. Make it Beautiful – for Geeks
        • Contrast
        • Reduce noise
        • Few colors
        • Few fonts
        • Lots of Data
        • More Signal
        • Less Noise
    61. IMPORTant R Libraries
        • reshape
        • plyr
        • ggplot2
        • sqldf
        • http://blog.revolutionanalytics.com/2013/02/10-r-packages-every-data-scientist-should-know-about.html
    62. Other Visualization Tools
        • R + R Studio
        • Excel
        • Gephi
        • JIT, D3.js
        • Excel
        • ggobi
    63. Thank you – Q&A
        To contact us
          sales@pythian.com
          1-877-PYTHIAN
        To follow us
          http://www.pythian.com/blog
          http://www.facebook.com/pages/The-Pythian-Group/163902527671
          @pythian
          http://www.linkedin.com/company/pythian