Data visualisation is about telling picture stories using numbers. In the next few minutes, we’ll cover the history of data visualisation, and some examples I’ve been working on. My name is Anand. More about me at www.s-anand.net
This is a data-generated map of London. Red spots indicate where photos on Flickr were taken. Blue spots indicate where Twitter messages were sent from. From these alone, you can already see the streets, the River Thames, the tourist spots and the business districts.
One of the most famous early data visualisations is this diagram, which Florence Nightingale used to make her case to Queen Victoria and the government after the Crimean War. The red shows deaths from wounds. The blue shows deaths from preventable diseases. The Queen got the point, and the diagram helped drive sanitary reform in army hospitals.
When cholera struck London in 1854, water wasn’t known to carry it. Dr John Snow plotted a map of cholera deaths along with the locations of the water pumps. He identified a contaminated pump on Broad Street as the source of the outbreak, and his evidence that cholera spreads through water went on to save many thousands of lives.
That was how data visualisation began. But WHY do we need it? Aren’t numbers obvious? Take a look at the price and sales by city in this table. Every column has the same average and the same variance. So are the cities similar?
No. Each city has a very distinct pattern. But it’s not easy to spot this pattern from the numbers and the averages alone. If there’s ONE rule you want to remember from this talk, it is: do NOT trust averages. Always plot the data.
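The slide’s table is not reproduced in these notes. As a stand-in (an assumption, not the city data), here is the textbook example of the same effect, Anscombe’s quartet (F. J. Anscombe, 1973). This Python sketch computes each set’s summary statistics. They come out essentially identical, yet scatter plots of the four sets look completely different.

```python
from statistics import mean, variance

# Anscombe's quartet (1973), used in place of the slide's city table.
X_SHARED = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]  # x values shared by sets I-III
QUARTET = {
    "I": (X_SHARED, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X_SHARED, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X_SHARED, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}

def summarise(xs, ys):
    """Mean and sample variance of x and y, rounded as a summary table would show them."""
    return (round(mean(xs), 2), round(variance(xs), 2),
            round(mean(ys), 2), round(variance(ys), 1))

for name, (xs, ys) in QUARTET.items():
    # Every set gives mean x 9, variance x 11, mean y 7.5, variance y 4.1,
    # yet plotted, the four sets look nothing alike.
    print(name, summarise(xs, ys))
```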
One day, a senior electricity board official said to us, “We know our meter readings have a lot of fraud in them. But when we go to the Union, they ask for proof. It’s sure to be there somewhere in our data, but it’s too large, and we don’t know how to analyse it. Can YOU help us?”
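We plotted each of the 200 billion readings, and got what looks like a smooth lognormal curve. But there are spikes at exactly the slab boundaries. People with a reading of 100 pay bills at a lower rate than those with a reading of 101, so a pile-up at exactly 100 is what you would expect if readings were being nudged down into the cheaper slab.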
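Here is a minimal Python sketch of how spikes like these can be flagged: compare the count at a slab boundary with the average count of nearby values. The readings are synthetic (lognormal, with half of the readings between 101 and 110 nudged down to 100). The window, median and tampering rate are arbitrary assumptions. None of this is the utility’s data or our actual method.

```python
import math
import random
from collections import Counter

def boundary_spike_ratio(readings, boundary, window=5):
    """How many times more readings sit exactly at `boundary` than the average
    count at the values within +/- `window` units of it (boundary excluded)."""
    counts = Counter(readings)
    neighbours = [counts[boundary + d] for d in range(-window, window + 1) if d != 0]
    baseline = sum(neighbours) / len(neighbours)
    return counts[boundary] / baseline if baseline else float("inf")

def synthetic_readings(n=100_000, boundary=100, tamper_above=10, p_tamper=0.5, seed=42):
    """Hypothetical integer meter readings from a lognormal distribution (median 150),
    where some readings just above `boundary` are moved down to exactly `boundary`."""
    rng = random.Random(seed)
    readings = []
    for _ in range(n):
        r = round(rng.lognormvariate(math.log(150), 0.6))
        if boundary < r <= boundary + tamper_above and rng.random() < p_tamper:
            r = boundary
        readings.append(r)
    return readings

readings = synthetic_readings()
for value in (100, 150):  # a slab boundary vs. an ordinary value
    print(value, round(boundary_spike_ratio(readings, value), 2))
```

A ratio near 1 means the value is about as common as its neighbours. A ratio several times higher is the kind of spike seen at the slab boundaries.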
And this isn’t randomly spread out. There are SPECIFIC people whose meter readings are consistently at the slab boundaries. The first row, for instance, is a famous personality, and her reading shows 200, 200, 200, 200… So do many others’.
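Finding the specific consumers can be sketched by measuring how often each consumer’s readings fall exactly on a slab boundary. The boundaries, consumer IDs and thresholds below are hypothetical. A real analysis would also need a baseline for how often honest readings land on a boundary by chance.

```python
from typing import Dict, List, Tuple

# Hypothetical tariff slab limits; a real tariff would supply its own.
SLAB_BOUNDARIES = {100, 200, 500}

def boundary_share(readings: List[int]) -> float:
    """Fraction of a consumer's readings that land exactly on a slab boundary."""
    return sum(r in SLAB_BOUNDARIES for r in readings) / len(readings)

def suspicious_consumers(history: Dict[str, List[int]], min_share: float = 0.5,
                         min_readings: int = 4) -> List[Tuple[str, float]]:
    """Consumers with enough readings whose boundary share is at least
    `min_share`, most suspicious first."""
    flagged = [
        (consumer, boundary_share(readings))
        for consumer, readings in history.items()
        if len(readings) >= min_readings and boundary_share(readings) >= min_share
    ]
    return sorted(flagged, key=lambda item: item[1], reverse=True)

# Hypothetical monthly readings per consumer ID.
history = {
    "C001": [200, 200, 200, 200, 200, 200],
    "C002": [143, 167, 98, 201, 155, 176],
    "C003": [100, 87, 100, 100, 112, 100],
}
print(suspicious_consumers(history))
```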
We also showed the degree of fraud by geographic section. Section 1 has very high fraud. Section 5’s fraud fell dramatically in June, and shot back up in September. That happens to be EXACTLY when a particular section manager was transferred in, and then out.
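The talk doesn’t say how the degree of fraud was measured. Assume a simple proxy: the share of readings that fall exactly on a slab boundary. Under that assumption, a section-by-month table could be built like this (slab limits, sections, months and readings are all hypothetical):

```python
from collections import defaultdict

SLAB_BOUNDARIES = {100, 200}  # hypothetical slab limits

def fraud_index(records):
    """records: iterable of (section, month, reading) tuples.
    Returns {section: {month: share of readings exactly on a slab boundary}}."""
    totals = defaultdict(lambda: defaultdict(int))
    hits = defaultdict(lambda: defaultdict(int))
    for section, month, reading in records:
        totals[section][month] += 1
        hits[section][month] += reading in SLAB_BOUNDARIES  # True counts as 1
    return {s: {m: hits[s][m] / totals[s][m] for m in totals[s]} for s in totals}

# Hypothetical readings tagged with section and month.
records = [
    ("Section 5", "May", 100), ("Section 5", "May", 200),
    ("Section 5", "Jun", 143), ("Section 5", "Jun", 100),
    ("Section 1", "Jun", 200),
]
print(fraud_index(records))
```

Plotting each section’s row over time is what makes a sudden drop and recovery, like Section 5’s, stand out.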
In another example I worked on, a bank approached us and said: “We want to find patterns in currency, stock and commodity prices. Specifically, how do they move with each other? Are there blocks of securities that are related? Can you show this in a visually obvious way?”
This matrix shows 19 securities and their correlations with each other. The Australian Dollar and the Euro have a correlation of 68%, and this is the plot of their values over time. Green indicates a positive correlation; red indicates a negative one.
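A correlation matrix like this one can be sketched in plain Python, assuming Pearson correlation over aligned daily values and the green/red colouring described above. The series names and values are hypothetical.

```python
from math import sqrt
from typing import Dict, List

def pearson(xs: List[float], ys: List[float]) -> float:
    """Pearson correlation coefficient of two equal-length series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def correlation_matrix(series: Dict[str, List[float]]) -> Dict[str, Dict[str, float]]:
    """Correlation of every series with every other series (including itself)."""
    names = list(series)
    return {a: {b: pearson(series[a], series[b]) for b in names} for a in names}

def cell_colour(r: float) -> str:
    """Green for a positive correlation, red for a negative one."""
    if r > 0:
        return "green"
    if r < 0:
        return "red"
    return "white"

# Hypothetical daily values for three securities.
series = {
    "P": [100, 101, 103, 102, 105],
    "Q": [50, 50.4, 51.5, 51.1, 52.6],
    "R": [20, 19.9, 19.5, 19.7, 19.1],
}
matrix = correlation_matrix(series)
for a in series:
    print(a, [f"{matrix[a][b]:+.2f} {cell_colour(matrix[a][b])}" for b in series])
```

One design choice worth checking: many analysts correlate daily returns (day-on-day percentage changes) rather than raw price levels. Two series that both trend over six months can look strongly correlated even when their daily moves are unrelated.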
You can now see two big blocks of securities. One is the S&P, the FTSE, the BSE and, for some reason, the Pakistani Rupee. The other is the Singapore Dollar, the Japanese Yen, Gold, the Swiss Franc and the Chinese Yuan. Securities within a block move together, and when one block goes up, the other block tends to go down.
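The talk doesn’t say how the blocks were identified. One simple technique, shown here as an assumption, works as follows. Treat securities as nodes, link every pair whose correlation is at least a threshold, and take the connected components as blocks. The average correlation across two blocks then shows whether they move in opposite directions. The matrix below is made up for illustration. Hierarchical clustering on a correlation-based distance is a common, more robust alternative.

```python
from typing import Dict, List

def correlation_blocks(matrix: Dict[str, Dict[str, float]], threshold: float = 0.5) -> List[List[str]]:
    """Connected components of the graph linking pairs with correlation >= threshold."""
    names = list(matrix)
    seen, blocks = set(), []
    for start in names:
        if start in seen:
            continue
        block, stack = [], [start]
        seen.add(start)
        while stack:  # depth-first search from `start`
            a = stack.pop()
            block.append(a)
            for b in names:
                if b not in seen and matrix[a][b] >= threshold:
                    seen.add(b)
                    stack.append(b)
        blocks.append(sorted(block))
    return blocks

def mean_cross_correlation(matrix, block_a, block_b) -> float:
    """Average correlation between members of two different blocks."""
    pairs = [matrix[a][b] for a in block_a for b in block_b]
    return sum(pairs) / len(pairs)

# Made-up correlations for four hypothetical securities.
matrix = {
    "A1": {"A1": 1.0, "A2": 0.8, "B1": -0.6, "B2": -0.5},
    "A2": {"A1": 0.8, "A2": 1.0, "B1": -0.4, "B2": -0.7},
    "B1": {"A1": -0.6, "A2": -0.4, "B1": 1.0, "B2": 0.7},
    "B2": {"A1": -0.5, "A2": -0.7, "B1": 0.7, "B2": 1.0},
}
blocks = correlation_blocks(matrix)
print(blocks, mean_cross_correlation(matrix, blocks[0], blocks[1]))
```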
The Tamil Nadu education department shared with us the marks of every single student over the last 3 years. I tried to see if I could predict their marks. Does gender make a difference? Does the subject matter? And also silly things, like whether the first letter of the name matters, or whether the sun sign matters.
The sun sign (or rather, the month of birth) matters a LOT. August borns score a good 10% more than June borns, and this is highly statistically significant. You can see the same pattern every year, in every district, in every class and in every subject. The reason was clear in retrospect, but I’ll let you guess it.
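A sketch of the underlying calculation: average marks by birth month, and the gap between the best and worst months as a share of the 1,200-mark maximum. The student records are hypothetical. Establishing significance would need a proper statistical test (such as a one-way ANOVA across months), which is not shown.

```python
from collections import defaultdict
from statistics import mean

def marks_by_birth_month(students):
    """students: iterable of (birth_month, total_marks), with birth_month 1-12.
    Returns {month: average marks}, ordered by month."""
    by_month = defaultdict(list)
    for month, marks in students:
        by_month[month].append(marks)
    return {m: mean(by_month[m]) for m in sorted(by_month)}

def month_gap(averages, max_marks=1200):
    """Best month, worst month, and the gap between them as a share of max marks."""
    best = max(averages, key=averages.get)
    worst = min(averages, key=averages.get)
    return best, worst, (averages[best] - averages[worst]) / max_marks

# Hypothetical (birth month, marks out of 1,200) records.
students = [(6, 700), (6, 720), (8, 820), (8, 840), (9, 850)]
averages = marks_by_birth_month(students)
print(averages, month_gap(averages))
```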
We plotted the GitHub follower network of programmers across a number of Indian cities. Bangalore has a dense network with a large central connected component. Chennai and Pune are not as well connected, but not too bad either. The other cities barely have a network. And you can FEEL this if you visit these cities and talk to the geeks there.
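Assume the network is a list of GitHub follow relationships. One simple way to quantify “a dense network with a central connected component” is then the share of coders in the largest connected component, ignoring the direction of follows. The names and edges are hypothetical. Coders with no follow relationships at all would not appear in such an edge list.

```python
from collections import defaultdict, deque

def largest_component_share(follows):
    """follows: iterable of (follower, followee) pairs.
    Ignoring direction, returns the fraction of coders in the largest connected component."""
    graph = defaultdict(set)
    for a, b in follows:
        graph[a].add(b)
        graph[b].add(a)
    seen, best = set(), 0
    for start in graph:
        if start in seen:
            continue
        size, queue = 0, deque([start])
        seen.add(start)
        while queue:  # breadth-first search over one component
            node = queue.popleft()
            size += 1
            for nxt in graph[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        best = max(best, size)
    return best / len(graph) if graph else 0.0

# Hypothetical follow edges for a well-connected and a fragmented city.
dense_city = [("asha", "bala"), ("bala", "chen"), ("chen", "asha"), ("dev", "asha")]
sparse_city = [("asha", "bala"), ("chen", "dev"), ("esha", "farid")]
print(largest_component_share(dense_city), largest_component_share(sparse_city))
```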
Who’s the best Indian one-day batsman? The size represents every run they ever scored. The colour represents speed: red is slow, green is fast. Sehwag is very fast, but so was Kapil, especially for his time.
This is a drilldown showing every single match they played. With this, you can see who the consistent players are, and where exactly their runs came from. You can also click on a match to see its statistics.
All of these visualisations, and the picture stories they told, were generated entirely from numbers, by programs, without a single manual adjustment. For more visualisations like these, and more details about these ones, here are the links.
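A sketch of the two encodings, under stated assumptions. Speed is taken as strike rate (runs per 100 balls, the standard cricket measure), mapped linearly from red to green between arbitrary endpoints of 60 and 100. Size is scaled so that bubble area, not radius, is proportional to runs. The talk doesn’t specify either scale.

```python
from math import sqrt

def strike_rate(runs: int, balls: int) -> float:
    """Runs scored per 100 balls faced."""
    return 100 * runs / balls

def speed_colour(rate: float, slow: float = 60.0, fast: float = 100.0) -> str:
    """Hex colour from red (at or below `slow`) to green (at or above `fast`).
    The 60/100 endpoints are arbitrary assumptions."""
    t = min(max((rate - slow) / (fast - slow), 0.0), 1.0)
    red, green = round(255 * (1 - t)), round(255 * t)
    return f"#{red:02x}{green:02x}00"

def bubble_radius(runs: int, max_runs: int, max_radius: float = 50.0) -> float:
    """Radius such that bubble *area* is proportional to runs, so a player with
    twice the runs gets a bubble that looks twice as big, not four times."""
    return max_radius * sqrt(runs / max_runs)

# Hypothetical player: 5,000 runs off 6,250 balls, against a 10,000-run maximum.
rate = strike_rate(5000, 6250)
print(rate, speed_colour(rate), bubble_radius(5000, 10000))
```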
DETECTING FRAUD (ENERGY UTILITY)
“We know meter readings are incorrect, for various reasons. We don’t, however, have the concrete proof we need to start the process of meter reading automation. Part of our problem is the volume of data that needs to be analysed. The other is the inexperience in tools or analyses to identify such patterns.”
FINDING PATTERNS (SECURITIES)
Which securities move together? How should I diversify? What should I sell to reduce risk? What’s a reliable predictor of a security?
68% correlation between AUD & EUR
Plot of 6-month daily AUD & EUR values
PREDICTING MARKS (EDUCATION)
What determines a child’s marks? Do girls score better than boys? Does the choice of subject matter? Does the medium of instruction matter? Does community or religion matter? Does the first letter of their name matter? Does their sun sign matter?
Based on the results of the 20 lakh (2 million) students taking the Class XII exams in Tamil Nadu over the last 3 years, it appears that the month you were born in can make a difference of as much as 120 marks out of 1,200.
June borns score the lowest. The marks shoot up for Aug borns … and peak for Sep-borns: 120 marks out of 1,200 explainable by month of birth.
An identical pattern was observed in 2009 and 2010… and across districts, gender, subjects, and class X & XII.
“It’s simply that in Canada the eligibility cutoff for age-class hockey is January 1. A boy who turns ten on January 2, then, could be playing alongside someone who doesn’t turn ten until the end of the year—and at that age, in preadolescence, a twelve-month gap in age represents an enormous difference in physical maturity.” -- Malcolm Gladwell, Outliers
EXPLORING RELATIONS (NETWORKS)
This is the social network of programmers across various Indian cities, using the follower network at GitHub.com, a Facebook for developers. Each circle represents a coder. The size shows their number of followers. The colour shows the language they develop in. The lines show whom they follow.