Githubland
Luis Osa <luis.osa.gdc@gmail.com>
November 2014
Motivation: No code rule
“Data beers” rules do not allow showing code during talks.
But data is the same as code!
(cf. Lisp homoiconicity, Unix “rule of representation”)
Where is there a lot of code? GitHub
Git is the VCS developed by the Linux kernel team
GitHub is a web-based hosting service for Git repositories
Inspiration: Blatt maps
U.S. Census Bureau data on second languages in American households [1]
[1] http://gizmodo.com/the-most-common-languages-spoken-in-the-u-s-state-by-1575719698
European (programming) languages
Figure: Most popular languages
Figure: The problem is Octopress
Figure: Most popular languages excluding JavaScript
Figure: The problem is the web
Figure: Most popular languages excluding JavaScript and PHP
Processing Github information
GitHub offers a REST API, but it is rate-limited
GitHub Archive publishes the public GitHub event timeline in hourly archives
Google BigQuery hosts the GitHub timeline as a public dataset
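A minimal sketch of how one hourly archive could be processed on a laptop. The host name `data.gharchive.org` is the archive's current address and may differ from the one used for the talk; the `repository.language` field is an assumption about the event layout, and `archive_url` and `count_languages` are hypothetical helper names.

```python
import gzip
import io
import json
from collections import Counter


def archive_url(year, month, day, hour):
    """Build the URL of one hourly archive (the hour is not zero-padded)."""
    return f"https://data.gharchive.org/{year:04d}-{month:02d}-{day:02d}-{hour}.json.gz"


def count_languages(gz_bytes):
    """Count repository languages in a gzip-compressed, line-delimited JSON archive."""
    counts = Counter()
    with gzip.open(io.BytesIO(gz_bytes), mode="rt", encoding="utf-8") as lines:
        for line in lines:
            if not line.strip():
                continue
            event = json.loads(line)
            # Assumption: each event may carry a "repository" object
            # with a "language" field; events without one are skipped.
            language = (event.get("repository") or {}).get("language")
            if language:
                counts[language] += 1
    return counts
```

The bytes for `count_languages` could come from `urllib.request.urlopen(archive_url(2014, 11, 27, 9)).read()`; the download is left out of the block so that it runs offline.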
Which countries are there in Europe?
There may be new countries:
There may be fewer countries:
A solution: DBpedia and SPARQL
DBpedia exposes a SPARQL endpoint that accepts queries, and there
are wrapper libraries for it
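A hedged sketch of such a query using only the standard library instead of a wrapper. The query itself is an assumption (it supposes European countries are typed `dbo:Country` and filed under `dbc:Countries_in_Europe`), and `build_request_url` and `country_names` are hypothetical helper names.

```python
import json
from urllib.parse import urlencode

DBPEDIA_ENDPOINT = "https://dbpedia.org/sparql"

# Assumption: this category and class select the countries of interest.
EUROPEAN_COUNTRIES_QUERY = """
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX dbc: <http://dbpedia.org/resource/Category:>
PREFIX dct: <http://purl.org/dc/terms/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?country ?name WHERE {
  ?country a dbo:Country ;
           dct:subject dbc:Countries_in_Europe ;
           rdfs:label ?name .
  FILTER (lang(?name) = "en")
}
"""


def build_request_url(query, endpoint=DBPEDIA_ENDPOINT):
    """Encode a SPARQL query as a GET request asking for JSON results."""
    params = urlencode({"query": query, "format": "application/sparql-results+json"})
    return f"{endpoint}?{params}"


def country_names(response_text):
    """Extract the ?name bindings from a SPARQL JSON results document."""
    results = json.loads(response_text)
    return sorted(b["name"]["value"] for b in results["results"]["bindings"])
```

Fetching `build_request_url(EUROPEAN_COUNTRIES_QUERY)` and passing the response body to `country_names` would give the list of countries as DBpedia sees them on that day.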
No Twitter
Quite tired of people categorizing tweets. There are many
APIs out there!
Do not worry, we are still going to get rich! → using World
Bank macroeconomic data [2]
[2] Sherouse, Oliver (2014). Wbdata. Arlington, VA. Available from http://github.com/OliverSherouse/wbdata.
Google Correlations
Figure: “Clojure programming destroys jobs”, Del Cacho, Carlos, 2014
corr(GDP, language)
Figure: Pearson correlation of GDP with language preference [3]
[3] Negative values denote a language used in richer countries; a low value of the language preference rank means a
higher place in the country's language preference list.
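A small sketch of the Pearson correlation behind these figures, written with the standard library only. The numbers are hypothetical values for illustration, not data from the talk, and `pearson` is a hypothetical helper name.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Hypothetical values: one entry per country.
gdp_per_capita = [20_000, 30_000, 40_000, 50_000]
# Rank of one language in each country's preference list (1 = most used).
language_rank = [4, 3, 2, 1]

# The rank falls as GDP rises, so the coefficient is negative:
# the language is preferred in the richer countries.
r = pearson(gdp_per_capita, language_rank)
```

Because a low rank means a more preferred language, the sign reads the opposite way from what one might first expect: a negative coefficient pairs the language with high GDP.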
corr(unemployment, language)
Figure: Pearson correlation of unemployment with language preference [4]
[4] Positive values show preferred languages in countries with low unemployment
corr(debt, language)
Figure: Pearson correlation of total government debt as % of GDP with language preference [5]
[5] Positive values show preferred languages in countries with low debt
Take away messages
Data talk about code!
SPARQL and other APIs: all data is on your laptop
BigQuery and other tools: your laptop controls clusters
All languages are beautiful
but do not program in OCaml if you can avoid it

[Databeers] 27-11-2014 “Githubland”. Luis Osa