Visualization: For analysis and communication
Chen “Gwen” Shapira
Reveal Structure
in Data
Verify Your Findings
• Prior knowledge
• Statistical tools
• Graphs are only the starting point
Not all visuals are
created equal
Numerical quantities focus on
expected values – graphical
summaries on unexpected values
– John Tukey
How long does it take
to run full export on
ITGDB10?
5 Hours and 45
minutes. On average.
Most of the time it
takes 3 to 6.5 hours.
But it can take as long
as 20 hours!
5 hours on average,
when the storage works.
I got rid of the outliers.
Am I normal now?
What about the rest of
the servers?
Does our maintenance
have an impact on response
times?
[Chart: Series5, Series7 and Series9 at 4-hour intervals, 24-SEP-09 22.00.00 through 01-OCT-09 18.00.00; value axis 0 to 14]
Communicating
Information
[Chart: series "oracle"; value axis 0 to 100; axis titles left as the default "Axis Title"]
[Bar chart: "oracle" by country, India through South Africa; value axis 0 to 120]
[Bar chart: "Oracle Google Searches - By Region, Normalized"; regions listed from Russian Federation to India; value axis 0 to 120]
[Bar chart: "Oracle Google Searches - By Region, Normalized"; regions listed from Russian Federation to India; value axis 0 to 120]
Published in: Education, Technology
  • Visualization - visual display of graphical information.
    I am going to show how to be more effective in analyzing and communicating information using graphical methods.
    Visualization is sometimes dismissed as a cop-out. Newbies and managers use graphs because they are not manly enough. Real DBAs use numbers and the command line!
    In the excellent book “Lies, Damn Lies and Statistics” there is an entire chapter dedicated to graphs, and the author says something like: People use graphs because they are afraid of numbers, maybe a trauma from school.

    This is a bit like saying that people use cars because they are too lazy to walk. Sometimes it’s true. But it ignores the fact that cars are really more efficient.
    In the same way, graphs are really a more efficient way to display information. In fact, for reasons I’ll show soon, graphs are even more useful for experts than they are for beginners.

    What I’ll talk about:
    Why using graphics is so efficient
    New graphical methods
    Simple design principles
  • Structure = Trends, repetitions and outliers, etc.
    High bandwidth information channel.
    Apply pattern matching skills and prior knowledge to analysis of data.
  • We can easily find information in very ambiguous data. It’s an evolutionary thing.
  • First line of attack.
  • Quantifiable visual differences – comparative length of parallel lines. 2D location.
  • Differences between color shades and sizes of shapes are difficult to compare and quantify
  • Average describes normal distributions quite well. Give height as an example for why average is a good descriptor for normal distribution.
  • Extremely skewed distribution! It’s not even close to normal. The average does not really describe how slow export can get.
  • That looks like a good description. But wait!
  • Sometimes export doesn’t run at all. I can explain the outliers (both low and high): for those 5 days one NetApp head was down and we didn’t run exports, and when we did, performance was awful. Since I can explain the outliers, I know I can remove them.
  • Histogram. Looks kind of normal, but hard to tell.
  • qqnorm. Yep, looks normal with some noise. You don’t see a consistent skew.
  • Multiple Boxplots
  • Scatter plot
  • Less is more. Be clear and to the point. Do not distort or mislead. Think of your data as a fashion model – you look at her and photograph her from all positions and angles, but only the best photos appear in the magazine – often hiding as much as they reveal!
  • Visualization

    1. Visualization: For analysis and communication. Chen “Gwen” Shapira
    2. Reveal Structure in Data
    3. Verify Your Findings • Prior knowledge • Statistical tools • Graphs are only the starting point
    4. Not all visuals are created equal
    5. Numerical quantities focus on expected values – graphical summaries on unexpected values – John Tukey
    6. How long does it take to run full export on ITGDB10?
    7. 5 Hours and 45 minutes. On average.
    8. Most of the time it takes 3 to 6.5 hours. But it can take as long as 20 hours!
    9. 5 hours on average, when the storage works.
    10. I got rid of the outliers. Am I normal now?
    11. What about the rest of the servers?
    12. Does our maintenance have an impact on response times?
    13. [Chart: Series5, Series7 and Series9 at 4-hour intervals, 24-SEP-09 22.00.00 through 01-OCT-09 18.00.00; value axis 0 to 14]
    14. Communicating Information
    15. [Chart: series "oracle"; value axis 0 to 100; axis titles left as the default "Axis Title"]
    16. [Bar chart: "oracle" by country; value axis 0 to 120. Countries in order: India, Pakistan, Singapore, Kenya, Sri Lanka, Nigeria, Hong Kong, South Korea, Japan, El Salvador, Jordan, China, United Arab Emirates, Taiwan, United States, Guatemala, Costa Rica, Ecuador, Russian Federation, South Africa]
    17. [Bar chart: "Oracle Google Searches - By Region, Normalized"; value axis 0 to 120. Regions in order: Russian Federation, Costa Rica, Ecuador, United Arab Emirates, Taiwan, United States, Guatemala, China, Jordan, Japan, El Salvador, South Korea, Hong Kong, Nigeria, Kenya, Sri Lanka, Singapore, Pakistan, India]
    18. [Bar chart: "Oracle Google Searches - By Region, Normalized"; value axis 0 to 120. Regions in order: Russian Federation, Costa Rica, Ecuador, United Arab Emirates, Taiwan, United States, Guatemala, China, Jordan, Japan, El Salvador, South Korea, Hong Kong, Nigeria, Kenya, Sri Lanka, Singapore, Pakistan, India]
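
The export-timing notes make three points: the average hides a skewed distribution, outliers that can be explained can be removed, and the cleaned data should then be checked for normality (histogram, qqnorm). These steps can be sketched in Python. This is a minimal sketch using only the standard library, not the analysis from the talk. The durations are invented for illustration and are not the ITGDB10 measurements. The helper names `summarize`, `qq_points` and `qq_correlation` are hypothetical. The Q-Q points are the pairs a normal Q-Q plot would draw, assuming the common `(i + 0.5) / n` plotting positions.

```python
import statistics
from statistics import NormalDist

# Hypothetical export durations in hours (illustrative, not the talk's data).
# The zeros stand for days when the export did not run at all.
durations = [3.2, 3.8, 4.1, 4.4, 4.6, 4.9, 5.0, 5.1, 5.3, 5.6,
             5.8, 6.0, 6.3, 6.5, 0.0, 0.0, 17.5, 20.0]


def summarize(values):
    """Return the mean, median, quartiles and maximum of a sample."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "q1": q1,
        "q3": q3,
        "max": max(values),
    }


def qq_points(values):
    """Pair each sorted value with the matching standard normal quantile.

    Plotting these pairs gives a normal Q-Q plot; points close to a
    straight line suggest a roughly normal sample.
    """
    ordered = sorted(values)
    n = len(ordered)
    normal = NormalDist()
    return [(normal.inv_cdf((i + 0.5) / n), v) for i, v in enumerate(ordered)]


def qq_correlation(values):
    """Pearson correlation of the Q-Q points; near 1.0 means near-linear."""
    pts = qq_points(values)
    xs = [x for x, _ in pts]
    ys = [y for _, y in pts]
    mx, my = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in pts)
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)


# Drop only the runs we can explain (assumed here: the storage outage days).
explained = {0.0, 17.5, 20.0}
cleaned = [d for d in durations if d not in explained]

if __name__ == "__main__":
    print("all runs:", summarize(durations))
    print("cleaned: ", summarize(cleaned))
    print("Q-Q correlation, all runs:", round(qq_correlation(durations), 3))
    print("Q-Q correlation, cleaned: ", round(qq_correlation(cleaned), 3))
```

The correlation is only a numeric stand-in for eyeballing the Q-Q plot. As the slides say, the graph is the starting point and the statistics are there to verify what it suggests.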