  • Visualization: the visual display of graphical information. I am going to show how to be more effective in analyzing and communicating information using graphical methods.
    Visualization is sometimes dismissed as a cop-out: newbies and managers use graphs because they are not manly enough. Real DBAs use numbers and the command line!
    In the excellent book “Lies, Damn Lies and Statistics” there is an entire chapter dedicated to graphs, and the author says something like: people use graphs because they are afraid of numbers, maybe a trauma from school.
    This is a bit like saying that people use cars because they are too lazy to walk. Sometimes it’s true, but it ignores the fact that cars really are more efficient.
    In the same way, graphs really are a more efficient way to display information. In fact, for reasons I’ll show soon, graphs are even more useful to experts than they are to beginners.
    What I’ll talk about:
    Why using graphics is so efficient
    New graphical methods
    Simple design principles
  • Structure = trends, repetitions, outliers, etc. High-bandwidth information channel. Apply pattern-matching skills and prior knowledge to the analysis of data.
  • We can easily find information in very ambiguous data. It’s an evolutionary thing.
  • First line of attack.
  • Quantifiable visual differences – comparative length of parallel lines. 2D location.
  • Differences between color shades and sizes of shapes are difficult to compare and quantify
  • Average describes normal distributions quite well. Give height as an example for why average is a good descriptor for normal distribution.
  • Extremely skewed distribution! It’s not even close to normal. The average does not really describe how slow export can get.
  • That looks like a good description. But wait!
  • Sometimes export doesn’t run at all. I can explain the outliers (both low and high): for those 5 days one NetApp head was down and we didn’t run exports, and when we did, performance was awful. Since I can explain the outliers, I know I can remove them.
  • Histogram. Looks kind of normal, but hard to tell.
  • qqnorm. Yep, looks normal with some noise. You don’t see a consistent skew.
  • Multiple boxplots
  • Scatter plot
  • Less is more. Be clear and to the point. Do not distort or mislead. Think of your data as a fashion model – you look at her and photograph her from all positions and angles, but only the best photos appear in the magazine – often hiding as much as they reveal!
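
The notes on averages, skew, outliers and qqnorm can be illustrated with a small sketch. Everything in it is an assumption made for illustration: the durations are made-up numbers (not the export data from the talk), the helper names `summarize`, `drop_outliers` and `qq_points` are hypothetical, and the 1.5 × IQR fence is the usual boxplot rule swapped in for the manual, explanation-based outlier removal described above. It uses only the Python standard library rather than the plotting tools from the slides.

```python
import statistics

# Made-up export durations in hours (NOT the data from the talk):
# mostly 3 to 6.5 hours, plus two very slow runs.
durations = [3.0, 3.5, 4.0, 4.0, 4.5, 5.0, 5.0, 5.5, 6.0, 6.5, 14.0, 20.0]


def summarize(values):
    """Return the mean, median and quartiles of a list of numbers."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "q1": q1,
        "q3": q3,
    }


def drop_outliers(values, k=1.5):
    """Keep values inside the boxplot fences (quartiles +/- k * IQR).

    Assumption: this automatic rule stands in for removing outliers
    that can be explained, as the notes describe.
    """
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if low <= v <= high]


def qq_points(values):
    """Pair each sorted value with a standard-normal quantile.

    Plotting these pairs gives a normal Q-Q plot, the same idea as
    R's qqnorm; points close to a straight line suggest normality.
    """
    n = len(values)
    normal = statistics.NormalDist()
    return [
        (normal.inv_cdf((i + 0.5) / n), v)
        for i, v in enumerate(sorted(values))
    ]


if __name__ == "__main__":
    print("all runs:    ", summarize(durations))
    cleaned = drop_outliers(durations)
    print("no outliers: ", summarize(cleaned))
    for theoretical, observed in qq_points(cleaned):
        print(f"{theoretical:6.2f}  {observed:5.1f}")
```

With a right-skewed sample like this one, the mean sits well above the median, which is why the average alone says little about how slow an export can get. Once the slow runs are dropped the two move much closer together, and the Q-Q pairs can then be plotted to judge whether what remains looks roughly normal.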


  • 1. Visualization
    For analysis and communication
    Chen “Gwen” Shapira
  • 2. Reveal Structure in Data
  • 3.
  • 4.
  • 5. Verify Your Findings
    Prior knowledge
    Statistical tools
    Graphs are only the starting point
  • 6. Not all visuals are created equal
  • 7.
  • 8.
  • 9. Numerical quantities focus on expected values – graphical summaries on unexpected values
    – John Tukey
  • 10. How long does it take to run full export on ITGDB10?
  • 11. 5 Hours and 45 minutes. On average.
  • 12.
  • 13. Most of the time it takes 3 to 6.5 hours.
    But it can take as long as 20 hours!
  • 14.
  • 15. 5 hours on average, when the storage works.
  • 16.
  • 17. I got rid of the outliers.
    Am I normal now?
  • 18.
  • 19.
  • 20. What about the rest of the servers?
  • 21.
  • 22.
  • 23. Does our maintenance have impact on response times?
  • 24.
  • 25.
  • 26.
  • 27.
  • 28.
  • 29.
  • 30.
  • 31.
  • 32. Communicating Information