The document discusses analyzing code repositories on GitHub to examine programming language trends across European countries. It summarizes that GitHub data, along with public databases like DBpedia and BigQuery, can be queried through APIs to study the relationship between languages and macroeconomic indicators like GDP, unemployment, and debt. Programming languages are found to correlate with country wealth levels, with richer countries preferring languages like Clojure and poorer countries using languages like PHP and JavaScript more frequently.
2. Motivation: No code rule
“Data beers” rules do not allow showing code during talks.
But data is the same as code!
(cf. Lisp homoiconicity, Unix “rule of representation”)
3. Where is there a lot of code? GitHub
Git is the VCS developed by the Linux kernel team
GitHub is a web-based hosting service for Git repositories
4. Inspiration: Blatt maps
U.S. Census Bureau data on second languages in American households 1
1 http://gizmodo.com/the-most-common-languages-spoken-in-the-u-s-state-by-1575719698
10. Processing GitHub information
GitHub offers a REST API, but it has rate limits
GitHub Archive publishes all public events (commits included) in hourly archives
Google BigQuery has the GitHub timeline as public data
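A minimal sketch of the hourly-archive route, using only the Python standard library. The URL pattern, the one-JSON-event-per-line layout, and the `repository.language` field are assumptions about the GitHub Archive format, not details given in the slides; the helper names are hypothetical.

```python
import gzip
import io
import json
import urllib.request
from collections import Counter

# Assumed URL pattern for hourly GitHub Archive dumps (not from the slides).
ARCHIVE_URL = "https://data.gharchive.org/{date}-{hour}.json.gz"


def archive_url(date: str, hour: int) -> str:
    """Build the URL of one hourly archive, e.g. date='2015-01-01', hour=15."""
    return ARCHIVE_URL.format(date=date, hour=hour)


def download_archive(date: str, hour: int) -> bytes:
    """Fetch one gzipped hourly archive (needs network access)."""
    with urllib.request.urlopen(archive_url(date, hour)) as response:
        return response.read()


def count_languages(gzipped: bytes) -> Counter:
    """Count events per repository language in one archive.

    Assumes one JSON event per line and a `repository.language` field;
    events without that field are skipped.
    """
    counts = Counter()
    with gzip.open(io.BytesIO(gzipped), mode="rt", encoding="utf-8") as lines:
        for line in lines:
            if not line.strip():
                continue
            event = json.loads(line)
            language = (event.get("repository") or {}).get("language")
            if language:
                counts[language] += 1
    return counts
```

Something like `count_languages(download_archive("2015-01-01", 15))` would then tally one hour of activity. Downloading archives sidesteps the REST API rate limits, at the cost of parsing the raw events locally.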
11. Which countries are there in Europe?
There may be new countries:
There may be fewer countries:
A solution: DBpedia and SPARQL
DBpedia exposes a SPARQL endpoint that accepts queries, and there
are wrapper libraries for it
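A minimal sketch of asking DBpedia for the list of European countries. Wrapper libraries such as SPARQLWrapper hide the HTTP details; this version uses only the standard library so that no third-party signature is assumed. The query itself is an assumption: it treats membership in the Wikipedia category "Countries in Europe" as the definition of a European country, which may differ from what the talk used, and the function names are hypothetical.

```python
import json
import urllib.parse
import urllib.request

ENDPOINT = "https://dbpedia.org/sparql"  # public DBpedia SPARQL endpoint

# Assumption: the category "Countries in Europe" is a good enough
# definition of "European country" for this sketch.
QUERY = """
PREFIX dct: <http://purl.org/dc/terms/>
PREFIX dbc: <http://dbpedia.org/resource/Category:>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT DISTINCT ?country ?name WHERE {
  ?country dct:subject dbc:Countries_in_Europe ;
           rdfs:label ?name .
  FILTER (lang(?name) = "en")
}
ORDER BY ?name
"""


def run_query(query: str, endpoint: str = ENDPOINT) -> dict:
    """Send a SPARQL query over HTTP GET and return the JSON results."""
    params = urllib.parse.urlencode(
        {"query": query, "format": "application/sparql-results+json"}
    )
    with urllib.request.urlopen(f"{endpoint}?{params}") as response:
        return json.load(response)


def country_names(results: dict) -> list[str]:
    """Extract the ?name column from SPARQL JSON results."""
    return [row["name"]["value"] for row in results["results"]["bindings"]]
```

Calling `country_names(run_query(QUERY))` returns whatever DBpedia currently lists, so the answer to "which countries are there?" follows the data instead of a hard-coded list.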
12. No Twitter
Quite tired of people categorizing tweets. There are many
APIs out there!
Do not worry, we are still going to get rich! → using World
Bank macroeconomic data 2
2 Sherouse, Oliver (2014). Wbdata. Arlington, VA. Available from http://github.com/OliverSherouse/wbdata.
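A minimal sketch of pulling one macroeconomic indicator per country. The talk cites the wbdata library; to avoid guessing its signatures, this version calls the World Bank REST API directly with the standard library. The choice of indicator (`NY.GDP.PCAP.CD`, GDP per capita in current US$), the year, and the helper names are assumptions made for illustration.

```python
import json
import urllib.request

API = "https://api.worldbank.org/v2/country/{countries}/indicator/{indicator}"
GDP_PER_CAPITA = "NY.GDP.PCAP.CD"  # assumed indicator: GDP per capita, current US$


def fetch_indicator(countries: list[str], indicator: str, year: int) -> list:
    """Download one indicator for several ISO country codes (needs network)."""
    url = API.format(countries=";".join(countries), indicator=indicator)
    url += f"?format=json&date={year}&per_page=500"
    with urllib.request.urlopen(url) as response:
        return json.load(response)


def values_by_country(payload: list) -> dict[str, float]:
    """Map ISO3 country code to the indicator value, skipping missing data.

    The API answers with a two-element list: paging metadata, then records.
    An error response has no records, in which case the result is empty.
    """
    records = payload[1] if len(payload) > 1 and payload[1] else []
    return {
        record["countryiso3code"]: record["value"]
        for record in records
        if record.get("value") is not None
    }
```

For example, `values_by_country(fetch_indicator(["ES", "FR", "DE"], GDP_PER_CAPITA, 2013))` would give one number per country, ready to be joined with the per-country language rankings.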
14. corr(GDP, language)
Figure: Pearson correlation of GDP with language preference 3
3 Negative values denote a language used in richer countries; a low rank value means a higher place in a country's language preference list.
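A minimal sketch of the computation behind these figures, assuming each language gets a rank per country (1 = most used) that is correlated with a per-country indicator. The numbers below are made up for illustration and are not data from the talk.

```python
import math


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient of two equally long samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two samples of equal length, at least 2 items")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / math.sqrt(var_x * var_y)


# Made-up illustrative values, NOT data from the talk:
# one entry per country, in the same country order in both lists.
gdp = [10.0, 20.0, 30.0, 40.0]
rank_of_language = [4.0, 3.0, 2.0, 1.0]  # 1 = most used in that country

r = pearson(gdp, rank_of_language)
```

Here the rank number falls as GDP rises, so `r` comes out negative: under the rank convention assumed above, that marks a language preferred in richer countries.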
15. corr(unemployment, language)
Figure: Pearson correlation of unemployment with language preference 4
4 Positive values show preferred languages in countries with low unemployment
16. corr(debt, language)
Figure: Pearson correlation of total government debt as % of GDP with language preference 5
5 Positive values show preferred languages in countries with low debt
21. Take away messages
Data talk about code!
SPARQL and other APIs: all data is on your laptop
BigQuery and other tools: your laptop controls clusters
All languages are beautiful
but do not program in OCaml if you can avoid it