Primer: Data-Driven Startups



How can startups find data and use it to help their business?
Presentation for the Digital Incubation Center, Qatar Ministry of Transportation and Communications
Heather Leson
March 9, 2016

  1. 1. Primer: Data-Driven Startups Digital Incubation Centre, Ministry of Transportation and Communications Doha, Qatar Heather Leson March 9, 2016
  2. 2. Data Examples
  3. 3. Cultural: Data about cultural works and artefacts (for example titles and authors), generally collected and held by galleries, libraries, archives and museums. Science: Data that is produced as part of scientific research from astronomy to zoology. Finance: Data such as government accounts (expenditure and revenue) and information on financial markets (stocks, shares, bonds etc). Statistics: Data produced by statistical offices such as the census and key socioeconomic indicators. Weather: The many types of information used to understand and predict the weather and climate. Environment: Information related to the natural environment such as the presence and level of pollutants and the quality of rivers and seas. Transport: Data such as timetables, routes, on-time statistics. Types of Open Data (Source:
  4. 4. Kasra and QCRI: Connecting Startups & Research
  5. 5. Metis: Collaborating with CMU to get data working within the privacy/security guidelines. Academic Planning Made Easier.
  6. 6. Mumm: Connecting with the local Cairo data science community. Data for food.
  7. 7. Exantium: Strategy firm connecting open data to government and business. Part of a global network.
  8. 8. Data-Driven Recipes
  9. 9. 1. How to: Technical Training/Business for Data Literacy
  10. 10. 2. How to: Host a Data Expedition
  11. 11. Storyteller Role: Generate ideas and interesting questions, help define the questions and assist with the information products/story outputs. Scout Role: Scouts hunt down data from across the web. They can be non-technical or technical, depending on how difficult it is to obtain data (whether it is easily downloadable or needs to be scraped etc). Analyst Role: Analysts are the ones who crunch the data found by the scouts and test the hypotheses generated by the storytellers. “Engineers” (Optional) Role: Create information outputs (varying degrees of technical, from coding to using ‘off the shelf’ tools). Designers Role: Beautify the outputs and make sure the story really comes through the data.
  12. 12. 3. How to: Data Clinics to connect entrepreneurs, business and government
  13. 13. Data Discovery
  14. 14. DIY Data: BQ Magazine’s Faces of Qatar
  15. 15. DIY Data: QCRI Social Computing Groundtruth Data Collection Phones, photos and food consumption for Health Monitoring
  16. 16. You are a Smart City: Create a local map dataset
  17. 17. Data Pipeline
  18. 18. Qatar Data Expedition
  19. 19. What are the questions you seek to answer? What is the license? Can you reuse/publish the data? Is the source credible? Is the data credible? Where did they get their data? How much time do I have to search? How am I organizing my research? Keen to learn more about verification? (it is in Arabic too!) Consider
  20. 20. Who is publishing about Qatar...on biodiversity? United States 7,440 occurrences, 97.77% geo-referenced. United Kingdom 832 occurrences, 8.29% geo-referenced. Sweden 620 occurrences, 0.32% geo-referenced. Netherlands 298 occurrences, 5.03% geo-referenced. Source: Global Biodiversity Information Facility
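The percentages on the biodiversity slide are easier to compare once they are turned back into record counts. The following is a small worked sketch using only the four figures quoted on the slide; the variable and function names are illustrative, and the counts are rounded estimates rather than values reported by GBIF.

```python
# (publisher country, occurrence records, percent geo-referenced),
# copied from the slide above.
publishers = [
    ("United States", 7440, 97.77),
    ("United Kingdom", 832, 8.29),
    ("Sweden", 620, 0.32),
    ("Netherlands", 298, 5.03),
]


def georeferenced_count(occurrences, percent):
    """Estimate how many records carry coordinates (rounded)."""
    return round(occurrences * percent / 100)


counts = {name: georeferenced_count(n, pct) for name, n, pct in publishers}
total_records = sum(n for _, n, _ in publishers)
total_georeferenced = sum(counts.values())

for name, n, _ in publishers:
    print(f"{name}: about {counts[name]} of {n} records can be mapped")
print(f"All four publishers: about {total_georeferenced} of {total_records}")
```

Seen this way, almost all of the mappable records come from a single publisher, which is worth knowing before building a product on top of this data.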
  21. 21. What about data on tourism? Source: Knoema Data Atlas, which aggregates the World Development Indicators, 2015. $6,616,000,000 USD International Tourism expenditures for travel items (Time for more boutique travel startups)
  22. 22. World Bank, UN Data, UNESCO Institute for Statistics, HDX, WEF. Forbes: Top 35 big data sources. Visually: 30 places to find Open Data
  23. 23. Location Data OpenStreetMap: Free, open Dataset Get data: GADM: Administrative Boundaries Bing Imagery
  24. 24. Ministry of Development Planning and Statistics. In economic statistics: Quarterly and annual Gross Domestic Product (GDP, constant and current) by economic activity; monthly, quarterly and annual Consumer Price Index, Production Price Index (PPI), Foreign Trade Statistics (import and export), Building permits. In social statistics: Labor force statistics (through a labor force sample survey); marriage, health, birth, fertility, education, disability, mortality statistics (in coordination with other ministries). In environmental statistics: Monthly rainfall, monthly and annual average concentrations of air pollutants, capacities of urban wastewater treatment plants. In population statistics: Population growth rate, population sex ratio
  25. 25. QALM portal (Qatar Information Exchange) QALM is an ambitious national project, developed by a number of government partners including: The General Secretariat for Development Planning, The Statistics Authority, The Supreme Council of Health, The Supreme Education Council, Supreme Council of Family Affairs, ictQATAR, Ministerial Cabinet and the Permanent Population Committee. Data is available in multiple formats! To get data from the Ministry of Development, check their website. If you are looking for other data, they are an email away.
  26. 26. Using Data
  27. 27. Learn how:
  28. 28. "Expenditure Components Of GDP at Current Prices (Mn Qatari Riyal) Source - Ministry of Development Planning and Statistics "
      "",""," ",,,,,,,,,,,,,,,,,,
      "","","2004","2005","2006","2007","2008","2009","2010","2011","2012","2013",,,,,"2014",,,,
      "","","Total","Total","Total","Total","Total","Total","Total","Total","Total","Q1","Q2","Q3","Q4","Total","Q1","Q2","Q3","Q4","Total"
      "Gross Domestic product","B.1G",115512.376669,162091.018049205,221610.304141365,290151.574403828,419582.826273579,355986.474251774,455445,618089.239045503,692654.670488044,186654.189573065,177830.420532429,185433.336051801,189857.929208376,739776,193880.888003083,189653.51105388,193080.129441538,194397.657502752,771013.233251822
      "Household Final Consumption Expenditure","P.3a",20166,25889.8602243444,36186.326795032,49728.6119489121,64675.8351579253,68622.9919301139,73645.7899114015,79905.6820538706,87682.19979384,24130.4586981125,24802.4947262859,23572.4447936237,26368.9939206421,98874.3921386642,26807.1948166319,27414.3657651239,26424.7106136522,28729.6901996358,109375.961395044
      "Government Final Consumption Expenditure","P.3b",15094,23171.9888517611,32616.2047008325,35989.9119915317,42695.8750950427,55652.33697478,63689.0870608494,77007.4825664626,89527.4435418714,24336.9460716118,24384.7648280038,24240.4862291342,25297.5589689309,98259.7560976807,26593.3225341388,26861.3831859924,27030.5661941075,27714.3396569197,108199.611571158
      "Gross capital formation","P.5",36399.044558,55609.5389690997,92830.0390858622,133518.050463385,172523.116020611,152947.14534688,142449.123027749,177621.474425169,194347.357152333,49488.7848033409,49657.1609781394,58089.4050290433,60871.3763188034,218106.851763655,53389.3706523124,58868.7621027634,67296.8526337788,77579.6276461965,257731.66028562
      "Exports (Goods & Services)- F.O.B","P.6",74122.332111,105496.630004,139210.733559638,174896,257467,182033,283832,442959.8,520182,141152,131890,134332,131751,539125,146457,134748,131592,116481,528682
      "Imports (Goods & Services)-F.O.B","P.7",-30269,-48077,-79233,-103981,-117779,-103269,-108171,-159405.2,-199084.33,-52454,-52904,-54801,-54431,-214590,-59366,-58239,-59264,-56107,-232976
      "*Figures for 2013 & 2014 are Preliminary estimates Powered by © QALM"
      Census data extracted...not usable yet..
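An export like the one above becomes usable once the title, the blank row, the two-row header and the footnote are separated from the figures. Below is a minimal sketch of one way to do that with Python's standard `csv` module. `SAMPLE` is a shortened excerpt with the same layout (two indicators, 2012 and 2013 only, values copied from the slide), and the `tidy` function and its field names are assumptions made for illustration, not part of any QALM tooling.

```python
import csv
import io

# Shortened excerpt with the same layout as the export on the slide.
SAMPLE = '''"Expenditure Components Of GDP at Current Prices (Mn Qatari Riyal)"
"",""," ",,,,,
"","","2012","2013",,,,
"","","Total","Q1","Q2","Q3","Q4","Total"
"Exports (Goods & Services)- F.O.B","P.6",520182,141152,131890,134332,131751,539125
"Imports (Goods & Services)-F.O.B","P.7",-199084.33,-52454,-52904,-54801,-54431,-214590
"*Figures for 2013 & 2014 are Preliminary estimates"
'''


def tidy(raw):
    """Turn the wide, two-header-row export into one record per value."""
    rows = [r for r in csv.reader(io.StringIO(raw))
            if any(cell.strip() for cell in r)]  # drop blank rows
    # The year row is the first row whose third cell is a number.
    year_idx = next(i for i, r in enumerate(rows)
                    if len(r) > 2 and r[2].strip().isdigit())
    years, periods = rows[year_idx], rows[year_idx + 1]

    # A year is written once above its first column; carry it forward.
    filled, current = [], ""
    for cell in years[2:]:
        current = cell.strip() or current
        filled.append(current)

    records = []
    for row in rows[year_idx + 2:]:
        if len(row) < 3:  # single-cell footnote line
            continue
        for year, period, value in zip(filled, periods[2:], row[2:]):
            if value.strip():
                records.append({
                    "indicator": row[0].strip(),
                    "code": row[1].strip(),
                    "year": int(year),
                    "period": period.strip(),
                    "value": float(value),
                })
    return records


records = tidy(SAMPLE)
for rec in records[:3]:
    print(rec)
```

Each record now carries its own indicator, year and period, so the figures can be filtered, charted or joined with other sources without relying on column positions.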
  29. 29. Qatar Census (Source: Doha News 2016)
  30. 30. South African Census Data
  31. 31. Some tools to Clean Datasets: OpenRefine, Sublime Text. There are many tools for software developers and data scientists too. Note: you still need the Human API to analyze and make decisions for your business. Of course, if you can afford it, then you can get your business intelligence from KPMG, Gartner, Bloomberg, McKinsey or PwC. Until then…. Learn more with Lillian and her online courses.
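A large part of cleaning a dataset is spotting values that are spelled differently but mean the same thing. The sketch below illustrates the "fingerprint" key-collision idea that OpenRefine describes for its clustering feature: normalise each value to a key, then group values that share a key. It is a simplified stand-in written for this primer, not OpenRefine's own code, and the sample place names are made up for the example.

```python
import re
from collections import defaultdict


def fingerprint(value):
    """Build a normalised key: trim, lowercase, drop punctuation,
    then sort and de-duplicate the words."""
    text = value.strip().lower()
    text = re.sub(r"[^\w\s]", "", text)  # \w is Unicode-aware, so Arabic letters survive
    tokens = sorted(set(text.split()))
    return " ".join(tokens)


def cluster(values):
    """Group values sharing a fingerprint; keep groups with real variants."""
    groups = defaultdict(list)
    for value in values:
        groups[fingerprint(value)].append(value)
    return [group for group in groups.values() if len(set(group)) > 1]


# Hypothetical messy column of place names.
names = ["Doha", "doha ", "DOHA.", "Al Rayyan", "Rayyan, Al", "Al Wakrah"]
clusters = cluster(names)
for group in clusters:
    print(group)
```

A person still decides which spelling in each group is the right one; the code only narrows down where to look.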
  32. 32. Tools for Charts, Graphs and Infographics More LMGTFY: 712402 (source: TuktukDesign, Noun Project ccby)
  33. 33. Map tools: Mapbox, CartoDB, Leaflet, Google, ArcGIS, TimeMapper. Also: if you are collecting your own location data, try Field Papers or crowdsource map photos with Mapillary. (They just got 8M funding!) (source: Mister Pixel, Noun Project, ccby)
  34. 34. QCRI Combining Data Sources: Real-Time Traffic Monitoring
      ● Collection and classification of traffic-related tweets (script, research tool)
      ● Continuous real-time querying of Google Traffic API
      ● Qatar Traffic Profiling & Modeling
        ○ Geo: City, zone, district
        ○ Time: Hourly, daily, weekly, and monthly
      ● Usage:
        ○ Detection of abnormal behaviors
        ○ Predictions
        ○ Monthly public reports
          ■ Commute status
          ■ Deadpoints
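To make "profiling" and "detection of abnormal behaviors" concrete, here is a minimal sketch of one common approach: build an average travel time per zone and hour, then flag a new reading that sits far from that average. This is an illustration only and is not a description of QCRI's actual method; the zone name, the travel times and the threshold of three standard deviations are all invented for the example.

```python
from collections import defaultdict
from statistics import mean, stdev


def build_profile(history):
    """history: iterable of (zone, hour, travel_minutes) observations.
    Returns {(zone, hour): (average, standard deviation)}."""
    buckets = defaultdict(list)
    for zone, hour, minutes in history:
        buckets[(zone, hour)].append(minutes)
    # stdev needs at least two observations per bucket
    return {key: (mean(vals), stdev(vals))
            for key, vals in buckets.items() if len(vals) >= 2}


def is_abnormal(profile, zone, hour, minutes, threshold=3.0):
    """True when a reading is more than `threshold` standard deviations
    away from the usual value for that zone and hour."""
    if (zone, hour) not in profile:
        return False  # no baseline yet, so nothing to compare against
    average, spread = profile[(zone, hour)]
    if spread == 0:
        return minutes != average
    return abs(minutes - average) / spread > threshold


# Hypothetical history: morning and afternoon trips through one zone.
history = (
    [("West Bay", 8, m) for m in [22, 25, 24, 23, 26]]
    + [("West Bay", 14, m) for m in [12, 11, 13, 12, 12]]
)
profile = build_profile(history)

print(is_abnormal(profile, "West Bay", 8, 25))  # an ordinary morning
print(is_abnormal(profile, "West Bay", 8, 40))  # far slower than usual
```

The same profile table, grouped by day, week or month instead of hour, would feed the kind of public reports listed on the slide.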
  35. 35. The best way to learn is to find data and make data information products. Try to recreate the diagrams and track back the data. Track how other startups use data. Copy. Remix.
  36. 36. Social Entrepreneurship & Social Good
  37. 37. Impact of Data-Driven Business. You know your business. Data can give you a leading edge. Be a Data-Driven Startup. Some reading: ODI Report: The Economic Impact of Open Data; ODI - Open Data Means Business; How to build a business from Open Data (1); How to build a business from Open Data (2); OpenMENA - 19 studies on Open Data
  38. 38. ABC: Always be Charging How can you have a Data-Driven Career? What is your Data Plan for your startup? Can you use Data-Driven Journalism techniques to improve your business? What kind of data do you need to grow your business? What type of training do you want/need?
  39. 39. Thank you @heatherleson @qatarcomputing