Slide deck covering four projects:
1. Susanne: susanne.bitballoon.com. World Recession Susceptibility Analysis. More information here: https://www.slideshare.net/AdityaGupta91/making-susanne
2. LinkedIn's Site Analytics Platform
3. India's (Planning Commission of India - Niti Aayog) energy planning tool: indiaenergy.gov.in -> http://iess2047.gov.in/
4. Finally, World-stats. Connecting the World Bank database to R.
2. OVERVIEW
Overview of the BTP
(and Recap)
IESS 2047 and Susanne
wo(R)ldata
Demo (Questions)
Architecture
Life of a bug
Key Challenges / Solutions
Snippets / Pain Points
Updates on Report
Roadmap / Ideas / Todo
4. IESS 2047
http://www.indiaenergy.gov.in/, Energy Department Project
Dashboard for policy makers focussing on long-term energy
sustainability.
Offers demand vs. supply (renewable and non-renewable) policy
handles (with 4 levels from status-quo to aggressive policy)
See real-time 5-year graphical projections (up to 2047)
Projections available for Environmental Impact, Import Dependency,
Supply-Demand Maps, Land-use (and requirement), etc.
5. CONTRIBUTION TO IESS 2047
Was given a UK code version and an Indian Model (Excel)
Was responsible for successful deployment of v1 for India in a
2-person engineering team
Evaluated
6. SUSANNE
World Recession SUSceptibility ANalysis
Project taken under Statistical Computation with Dr. Ashwin
Decided to study:
Can we predict recession susceptibility (based on World Bank data)?
What is recession susceptibility?
Effect of economic variables on Recession Susceptibility?
Built a discretised SVM model with more than 90% accuracy
Presented all results on susanne.bitballoon.com (no backend)
7. WO(R)LDATA
https://github.com/ca9/world-stats/
Concept: Be able to establish statistical relationships between economic
variables of choice within seconds.
Why: Aids policy-making, preliminary analysis.
Value Add: Easy and intuitive. No need to code or wrestle with
data or R (policy makers quite likely won't).
Under the Hood: Powerful R models invoked. Graphing, Charts,
Basic Analysis - all done for the user.
9. (SHORT, PLACEHOLDER, QUICK AND DIRTY)
ANSWERS
Corruption percentile rank seems to be very highly associated with high Income.
[High Percentile means low corruption].
It accounts for 55% of the variation in Income.
LDA predicts Income Quintile with 61% accuracy based on just Corruption
percentile.
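The "accounts for 55% of the variation" figure reads like a coefficient of determination (R²) from a one-predictor linear fit. That reading is an assumption, since the slides do not name the statistic. A minimal sketch of how such a share is computed, using made-up numbers rather than World Bank data:

```python
def r_squared(x, y):
    """Share of the variance in y explained by a least-squares line on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    # For a single predictor, R^2 equals the squared Pearson correlation.
    return (sxy * sxy) / (sxx * syy)


# Hypothetical values, invented for illustration only.
corruption_percentile = [10, 25, 40, 55, 70, 85, 95]
income = [2.0, 1.5, 6.0, 4.0, 12.0, 9.0, 20.0]

share = r_squared(corruption_percentile, income)
print(f"{share:.0%} of the variation in income is explained")
```

The same function applied to education expenditure would give the second, much smaller share quoted on the next slide, if that figure is also an R².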
10. (AND MORE)
ANSWERS
In contrast, education expenditure shows no great linkage to Income.
It accounts for roughly 1.4% of the variation in Income.
Unsurprisingly, US and EU have the highest income levels.
However, education expenditure level is more evenly distributed (e.g. Namibia)
11. OTHER QUESTIONS
Does Income predict corruption!?
How do these vary over the years?
Takes a moment to find out…
12. ANSWERS
A lot more data is instantly unlocked.
South African countries have very low
corruption too.
Income strongly related to low
corruption too.
Both values appear relatively
stable over the years.
Income shows a slight upward trend.
14. LIFE OF A BUG
Result Indicator was not the chosen one (tried multiple indicators after a while)
Did the angular code collect data correctly?
Was the right request sent to the server?
Did the flask server unpack correctly?
Did it siphon data to R correctly?
Am I reading R results correctly?
Am I resending R data correctly?
Am I unpacking back on the client correctly?
Is this merely a library/directive bug?
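One way to work through a checklist like the one above is to fingerprint the part of the payload that should stay the same at every hop (here, the user's selection) and report the first hop where it changes. This is only a sketch: the stage names, the payload shape, and both helper functions are hypothetical and are not taken from the project's code.

```python
import hashlib
import json


def fingerprint(payload):
    """Short hash of a JSON-serialisable payload, independent of dict key order."""
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:12]


def first_divergence(stages):
    """Return the name of the first stage whose payload differs from the
    previous stage's payload, or None if every stage agrees."""
    previous = None
    for name, payload in stages:
        current = fingerprint(payload)
        if previous is not None and current != previous:
            return name
        previous = current
    return None


# Hypothetical trace of one request; list order matters, dict key order does not.
hops = [
    ("angular_form", {"indicators": ["A", "B"], "years": [2000, 2010]}),
    ("http_request", {"years": [2000, 2010], "indicators": ["A", "B"]}),
    ("flask_unpacked", {"indicators": ["A", "B"], "years": [2000, 2010]}),
    ("sent_to_r", {"indicators": ["B", "A"], "years": [2000, 2010]}),
]

culprit = first_divergence(hops)
print("first stage that changed the selection:", culprit)
```

In this made-up trace the indicator order is swapped on the way to R, which is the kind of slip that could make the result indicator differ from the chosen one.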
15. KEY CHALLENGES
Flask-Angular Compatibility
Theft and repeated loss of work
Absence of a unix-only dev. machine (with 2GB+ RAM).
VM issues.
Unforeseen Engineering/Design Challenges:
Managing Years, Countries, Asynchronicity (Dynamic Models), Data Formatting, Size of
Data, Multi-software/library Dependencies, Manual Merges (From caching), Cookie Size,
Continuous Classification, Default Behaviour, Redesigns
Example: Dataframe Caching, NA removal support, Session/Caching redo.
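To make the caching and NA-removal items concrete, here is a rough sketch of the two ideas in plain Python: memoise a slow fetch by its arguments, and drop rows with missing values before modelling. The class, the function names, and the fake fetch are all hypothetical, and plain lists of dicts stand in for the pandas dataframes the project would have used.

```python
def drop_na_rows(rows):
    """Keep only rows where every field has a value (None stands in for NA)."""
    return [row for row in rows if all(value is not None for value in row.values())]


class FetchCache:
    """Memoise a slow fetch function by its (indicator, year range) arguments."""

    def __init__(self, fetch):
        self._fetch = fetch
        self._store = {}
        self.misses = 0

    def get(self, indicator, start_year, end_year):
        key = (indicator, start_year, end_year)
        if key not in self._store:
            self.misses += 1
            self._store[key] = self._fetch(indicator, start_year, end_year)
        return self._store[key]


def fake_fetch(indicator, start_year, end_year):
    # Hypothetical stand-in for a real download; the values are made up
    # and odd years are deliberately left missing.
    return [
        {"country": "AAA", "year": year,
         "value": None if year % 2 else float(year - start_year)}
        for year in range(start_year, end_year + 1)
    ]


cache = FetchCache(fake_fetch)
first = cache.get("indicator_x", 2000, 2004)
second = cache.get("indicator_x", 2000, 2004)  # served from the cache
clean = drop_na_rows(first)
print(cache.misses, len(first), len(clean))
```

Keying the cache on the full argument tuple keeps the sketch simple, at the cost of refetching when only the year range changes.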
16. KEY SOLUTIONS
Server Side sqlite Sessions
koding.com, Flask, rpy2, pandas, wbdata
flask-triangle
Cached Result for Frontend
Firebug (Scope), Flask Context,
Angular-Debugger, Pycharm-debugger
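The first item, server-side sqlite sessions, pairs naturally with the cookie-size challenge listed earlier: the cookie carries only a session id while the data lives in sqlite on the server. The sketch below illustrates that idea with the standard library alone. The class name, table layout, and methods are assumptions for illustration and do not show how the project wired this into Flask.

```python
import json
import sqlite3
import uuid


class SqliteSessionStore:
    """Keep session data in sqlite so the cookie only needs to hold an id."""

    def __init__(self, path=":memory:"):
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS sessions "
            "(id TEXT PRIMARY KEY, data TEXT NOT NULL)"
        )
        self.db.commit()

    def create(self):
        """Start an empty session and return its id (the only cookie value)."""
        session_id = uuid.uuid4().hex
        self.save(session_id, {})
        return session_id

    def save(self, session_id, data):
        self.db.execute(
            "INSERT OR REPLACE INTO sessions (id, data) VALUES (?, ?)",
            (session_id, json.dumps(data)),
        )
        self.db.commit()

    def load(self, session_id):
        row = self.db.execute(
            "SELECT data FROM sessions WHERE id = ?", (session_id,)
        ).fetchone()
        return json.loads(row[0]) if row else {}


store = SqliteSessionStore()
sid = store.create()
# A payload far too large for a cookie; the contents are made up.
store.save(sid, {"indicators": ["A", "B"], "rows": list(range(5000))})
restored = store.load(sid)
print(len(sid), len(restored["rows"]))
```

Storing the data as a JSON string keeps the table schema trivial, although it limits sessions to JSON-serialisable values.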
19. EXPLANATIONS
Possible causes of above error:
Unicode issue. R. An R library (‘kvsm’,‘MASS’). Python (or Flask).
A Python library (like Pandas). Pycharm (or the Debugger).
wbdata? rpy2?
A bad int/float/str? Bad NA values? Data specific?
Quintile variables? Exponentials?
System[Socket Broken] - Out of Memory?
20. UPDATES SINCE REPORT
Added LDA.
Added a world map, for visual effect.
Cleaned up UI significantly, including
selector for the map.
Added PCA support (Unused)
Tested SVM (currently not supported by
backend).
Set up a VM:
http://uakk127b233d.aditya11009.koding.io:5000/
Added a timeline average bar-graph
21. TODO, ROADMAP, IDEAS
Spreadsheets Plugin - export, and edit
Independent Tools (Collector / Editing / R-Console)
R Graphics (Non-interactive Bitmaps)
Country/Subregion Selector. Dissimilar Years. NA support.
More Result windows. Greater granularity on results.
Better variable accumulation and UI.