Published in: Education, Technology

  • Visualization - visual display of graphical information.
    I am going to show how to be more effective in analyzing and communicating information using graphical methods.
    Visualization is sometimes dismissed as a cop-out: newbies and managers use graphs because they are not manly enough, while real DBAs use numbers and the command line!
    In the excellent book “Lies, Damn Lies and Statistics” there is an entire chapter dedicated to graphs, and the author says something like: people use graphs because they are afraid of numbers, maybe because of a trauma from school.

    This is a bit like saying that people use cars because they are too lazy to walk. Sometimes it’s true, but it ignores the fact that cars really are more efficient.
    In the same way, graphs really are a more efficient way to display information. In fact, for reasons I’ll show soon, graphs are even more useful for experts than they are for beginners.

    What I’ll talk about:
    Why using graphics is so efficient
    New graphical methods
    Simple design principles
  • Structure = Trends, repetitions and outliers, etc.
    High bandwidth information channel.
    Apply pattern matching skills and prior knowledge to analysis of data.
  • We can easily find information in very ambiguous data. It’s an evolutionary thing.
  • First line of attack.
  • Quantifiable visual differences – comparative length of parallel lines. 2D location.
  • Differences between color shades and sizes of shapes are difficult to compare and quantify
  • The average describes normal distributions quite well. Give height as an example of why the average is a good descriptor for a normal distribution.
  • Extremely skewed distribution! It’s not even close to normal. The average does not really describe how slow the export can get.
  • That looks like a good description. But wait!
  • Sometimes the export doesn’t run at all. I can explain the outliers (both low and high): on those 5 days one NetApp head was down and we didn’t run exports, and when we did, performance was awful. Since I can explain the outliers, I know I can remove them.
  • Histogram. Looks kind of normal, but it is hard to tell.
  • qqnorm. Yep, looks normal with some noise. You don’t see a consistent skew.
  • Multiple Boxplots
  • Scatter plot
  • Less is more. Be clear and to the point. Do not distort or mislead. Think of your data as a fashion model – you look at her and photograph her from all positions and angles, but only the best photos appear in the magazine – often hiding as much as they reveal!
  • Visualization
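
The notes above on averages, skew, explained outliers, and the qqnorm check can also be sketched numerically. What follows is a minimal Python sketch, not the analysis from the talk: the durations are made-up illustrative values, the function names are hypothetical, and the Q-Q points only approximate what R's qqnorm would draw (it uses a slightly different plotting position for small samples).

```python
import math
import statistics

# Hypothetical export durations in hours (made-up values, NOT the talk's data).
# Two days with no export are recorded as 0.0, and two very slow runs
# stretch the right tail.
durations = [3.2, 3.8, 4.1, 4.4, 4.6, 4.9, 5.0, 5.3, 5.7, 6.1, 6.4,
             0.0, 0.0, 18.5, 20.0]


def summarize(values):
    """Return the mean next to the order statistics a boxplot would show."""
    q1, median, q3 = statistics.quantiles(values, n=4)
    return {
        "mean": statistics.mean(values),
        "min": min(values),
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": max(values),
    }


def drop_explained_outliers(values, low, high):
    """Keep values inside [low, high].

    The bounds are an assumption supplied by the analyst. They should come
    from a known cause (such as a storage outage), not from the numbers alone.
    """
    return [v for v in values if low <= v <= high]


def qq_points(values):
    """Pair each sorted observation with the standard normal quantile
    expected at its rank. These are the points of a normal Q-Q plot."""
    n = len(values)
    std_normal = statistics.NormalDist()
    theoretical = [std_normal.inv_cdf((i + 0.5) / n) for i in range(n)]
    return list(zip(theoretical, sorted(values)))


def qq_correlation(values):
    """Pearson correlation of the Q-Q points.

    A value close to 1 means the points lie near a straight line,
    which is what a roughly normal sample looks like in a Q-Q plot.
    """
    xs, ys = zip(*qq_points(values))
    mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    cleaned = drop_explained_outliers(durations, low=1.0, high=12.0)
    for label, sample in (("raw", durations), ("cleaned", cleaned)):
        print(label, summarize(sample), round(qq_correlation(sample), 3))
```

With the raw sample the mean sits well above the median because the slow runs pull it to the right. After the explained outliers are removed, the two nearly coincide and the Q-Q correlation moves closer to 1. A number like this is a complement to the histogram and Q-Q plot, not a replacement for looking at them.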

    1. 1. Visualization • For analysis and communication • Chen “Gwen” Shapira
    2. 2. Reveal Structure in Data
    3. 3. Verify Your Findings • Prior knowledge • Statistical tools • Graphs are only the starting point
    4. 4. Not all visuals are created equal
    5. 5. Numerical quantities focus on expected values – graphical summaries on unexpected values – John Tukey
    6. 6. How long does it take to run full export on ITGDB10?
    7. 7. 5 Hours and 45 minutes. On average.
    8. 8. Most of the time it takes 3 to 6.5 hours. But it can take as long as 20 hours!
    9. 9. 5 hours on average, when the storage works.
    10. 10. I got rid of the outliers. Am I normal now?
    11. 11. What about the rest of the servers?
    12. 12. Does our maintenance have impact on response times?
    13. 13. [Chart: Series5, Series7 and Series9 at 4-hour intervals from 24-SEP-09 22:00 to 01-OCT-09 18:00; value axis 0 to 14]
    14. 14. Communicating Information
    15. 15. [Chart: series “oracle”; value axis 0 to 100; both axis titles left as the default “Axis Title”]
    16. 16. [Bar chart: series “oracle” by country; value axis 0 to 120; countries in listed order: India, Pakistan, Singapore, Kenya, Sri Lanka, Nigeria, Hong Kong, South Korea, Japan, El Salvador, Jordan, China, United Arab Emirates, Taiwan, United States, Guatemala, Costa Rica, Ecuador, Russian Federation, South Africa]
    17. 17. [Bar chart: “Oracle Google Searches - By Region, Normalized”; value axis 0 to 120; regions in listed order: Russian Federation, Costa Rica, Ecuador, United Arab Emirates, Taiwan, United States, Guatemala, China, Jordan, Japan, El Salvador, South Korea, Hong Kong, Nigeria, Kenya, Sri Lanka, Singapore, Pakistan, India]
    18. 18. [Bar chart: “Oracle Google Searches - By Region, Normalized”; same axis and regions as slide 17]