Data Prospecting
a.k.a. Adventures in the first year of Sinar Project
Who am I?
•   Software developer at OnApp by day
•   Code monkey at Sinar Project by night
Who is Sinar Project?
What we are trying to do
Why?
We go to the field
•   API
•   Scraping
•   Crowdsourcing
API
If there is no API...
So a scraper we make
Scraperwiki
Output
There is this little problem...
That's where the cavalry comes in
MyMP
Crowd Computing!
This little problem
Lack of information.
Fatigue
Outside of crowdsourcing
•   Buy
•   Ask
Worst Case
After data is gathered
Result of Processed Data
How we use it
Even direct to app
Then we can use with
What next?
•   Maintain the existing projects for now
•   Add more datasets
•   Engage with other civil society groups
•   Engage volunteers, though we can be
    selective about whom
•   Find funding (we are working on it!)
Want to help?
•   Before this
•   Get to know the groups involved
•   Join meetups
•   Understand the issues at hand
•   It helps a lot.
Now you can start
Fix My Street + Crime Dataset
Contribute a scraper
Fork our code and add features
Thanks for listening
           Find us at
        sinarproject.org
     team@sinarproject.org

Editor's Notes

  1. We are a group of concerned citizens who decided to use technology to make government processes more transparent. We are also interested in open data and understand what open data can do.
  2. We try to use technology to create a transparent government and, with the help of the civil society groups we collaborate with, get citizens involved in the process. We are also interested in bringing the data to software developers, to be used to build apps or anything related.
  3. Because - It helps reduce corruption - We can use government data to do many things, such as providing data for apps. A good goal, except - The government doesn't have an open data policy - Nor do we have a freedom of information act - Some data is well hidden (not many people know about it) - Some is incomplete - Much doesn't exist
  4. To jump-start it, we follow this process
  5. - Let's start with APIs. APIs should be familiar to most. - There are not many APIs usable for Sinar's purposes; some are noisy, others are in free text (hard to parse). - Maps are somewhat of an exception, but we still lack a lot of the information needed for many projects, such as boundaries. - Business information on maps exists, but licensing is a concern, for example reusing Foursquare data, or reusing the Google geocoding API outside of a map. There are clauses against this. - The World Bank is the true exception: an excellent data source with a permissive licence.
  6. - Since the government doesn't have an API, - we are going to scrape it. You will be surprised what kind of information is available on websites. - Of the data that exists, much is incomplete, but some can be used as a seed for a bigger project. For example * Parliament, with MPs: http://www.parlimen.gov.my/index.php?modload=ahlidewan&uweb=dr and bills: http://www.parlimen.gov.my/index.php?modload=document&uweb=dr&doc=bills * The AG's Chambers site, with some court cases: http://www.agc.gov.my/index.php?option=com_content&view=article&id=175&Itemid=63&lang=en and the gazette: http://www.federalgazette.agc.gov.my/ * The Ministry of Health, with medical devices: http://www.mdb.gov.my/mdb/index.php?option=com_content&task=view&id=20&Itemid=65 and medicine prices: http://www.pharmacy.gov.my/index.cfm?&menuid=154&parentid=163&lang=EN
  7. - A scraper is a script that extracts data from a web page and converts it into a structured format. - It can be written in practically any programming language and can store its output as a file or in a database. - Most of our scrapers use Python, simply because it is a language we are comfortable with. Above is our early MP scraper.
  8. - Open data is one of our goals - Data needs to be shared outside. - ScraperWiki is a solution - It is free, provides storage, hosts scrapers, and schedules jobs to run them. - Many open data projects use it
  9. - Scraper output can be in JSON or CSV. - The first MP and CIDB data are in CSV form. - We also use a database; Billwatcher is an example. Billwatcher also uses Elasticsearch for search. - Above is one of our earlier scrapers: https://scraperwiki.com/scrapers/malaysian_mp_profile/ - The data can be downloaded from that link.
  10. Scraping can only get us so far. - The data can be incomplete. - But most of the time the data is simply not available; crime data is one example. - Sometimes the data exists but is in a hard-to-process format: PDF, Excel, video. - Some data is scattered around; MyMP is one such case. It is not very easy to write a scraper for this.
  11. That is when we ask for help. - People can help a lot better than computers. - The bonus of asking for help is that we can get people with real experience working on a problem, especially when we approach civil society groups working on an issue. Our first experiment in asking for help was MyMP.
  12. - MyMP is a project in collaboration with Undimsia. - It can be found at http://reps.sinarproject.org/ - We are collecting MP information for voter education. - A big part of the information comes from interviews and internet searches. It is powered by Plone, a CMS.
  13. We managed to get quite a lot of MP information out, so technology is not the issue.
  14. - Lack of information, however, is a big issue. In this case, MPs are not approachable, there is no information online, etc. - When it gets too hard, volunteers tend to leave. - We realized that this is a serious research task, the kind people pay researchers for. - This is still going on, though a bit slower.
  15. Other methods to get data - Some information can be bought; SSM again is a good example. It is not scalable if paid out of our own pocket. - We can try to ask; we know some initiatives have been successful in asking. But we are a very small group. - NGOs might have data somewhere, though, which is why we are trying to collaborate with more groups on this.
  16. It just means the data ends up in a black hole, or simply doesn't exist.
  17. After data gathering is completed. - We need to process the data a bit. - For example, the CIDB dataset on the screen is a list of documents, which is harder to process than, say, flat JSON or CSV. - In fact we are putting it into Google Fusion Tables, so it is nicer to flatten it. - This can be done in a few ways; we have a script for it.
  18. This is our CIDB data in a Google Fusion table, showing the CSV content generated from processing the JSON earlier. The script that generates the CSV is at https://github.com/Sinar/cidb_json2db/blob/master/json2db.php It is written in PHP, converts the field names, takes the JSON, and splits it into different CSV files.
  19. In the end we can feed this into an application; for example, that is our CIDB data in our Fusion tables, with Fusion Tables doing its magic. Project Dataset https://www.google.com/fusiontables/DataSource?docid=1nTiuWSBXqvqphUj9l5axW496WJiFa51Uhw18T7g Director Dataset https://www.google.com/fusiontables/DataSource?docid=10WxkMewqZS7i67Qg-Hyknwx2_UdTKjnVqU9sgzA Company Dataset https://www.google.com/fusiontables/data?docid=1D4uCH96DRabvOIkUTaAEVxNKvpoIcbQCFkf4OaQ
  20. Or make a new application from the data. Billwatcher is built on the bill dataset we scraped: http://billwatcher.sinarproject.org/ https://github.com/sinar/Malaysian-Bill-Watcher
  21. We encourage people to use the tools of their choice to make use of the data.
  22. Groups like Undimsia have been working on these issues for some time: Undimsia is involved in voter education, Transparency International in corruption, etc. Join the meetups, get involved, understand how they work. What we learned is that tech is not everything, but tech can help them a lot. But first understand these groups; don't just push tech because it is cool. Their events can be fun. http://www.undimsia.com/ http://www.loyarburok.com/
  23. - We need Malaysian contributions to OpenSpending, a project to keep track of government projects. - It is pretty easy but tedious: you need to read the budget and add it to a Google spreadsheet or produce a CSV. - openspending.org has a guide at http://openspending.org/help/index.html
  24. - We need a FixMyStreet-style project to look at issues on the street. - It is easy to start now: use Crowdmap, a hosted Ushahidi instance, which is well known in the open data community. - The same project can be used to track crime. - The image is from a Crowdmap project. - It is recommended because it has a proper API, which allows reuse. Crowdmap is at https://crowdmap.com/ The example project: https://klatm.crowdmap.com/
  25. Write a scraper and get the data released.
  26. - Fork our code and add features. - All our projects are open source, and we try to be clear about licenses. - Though we tend to be biased toward Python, Rails, and Plone. - Our focus is maintenance now. We are reluctant to add new apps. - But if you are willing to maintain one, join us! In fact, Billwatcher has a few enhancements that came from volunteers; for example, the model code was fixed by a volunteer.
  27. That's all from me. Q&A is at the end of the webcamp; find us at sinarproject.org or team@sinarproject.org
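
A rough sketch of what note 7 means by a scraper: a script that pulls data out of a web page and turns it into something structured. This is not the MP scraper from the slides. The HTML below is made-up sample markup rather than the real parliament site, and the names `SAMPLE_HTML`, `TableScraper`, `scrape`, and `to_csv` are invented for illustration. It uses only the Python standard library and emits the two output formats mentioned in note 9, JSON and CSV.

```python
import csv
import io
import json
from html.parser import HTMLParser

# Hypothetical markup: a stand-in for a page listing MPs, not the real site.
SAMPLE_HTML = """
<table>
  <tr><th>Name</th><th>Constituency</th></tr>
  <tr><td>Example MP One</td><td>Example Seat A</td></tr>
  <tr><td>Example MP Two</td><td>Example Seat B</td></tr>
</table>
"""


class TableScraper(HTMLParser):
    """Collect the text of every table cell, grouped by row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None        # cells of the row currently being read
        self._in_cell = False   # True while inside a <td> or <th>

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._in_cell = True
            self._row.append("")

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._in_cell = False
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        if self._in_cell:
            self._row[-1] += data.strip()


def scrape(html):
    """Return one dict per data row, keyed by the header row."""
    parser = TableScraper()
    parser.feed(html)
    header, *body = parser.rows
    return [dict(zip(header, row)) for row in body]


def to_csv(records):
    """Serialise the records as CSV text."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=list(records[0]), lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return out.getvalue()


records = scrape(SAMPLE_HTML)
print(json.dumps(records, indent=2))  # structured output as JSON
print(to_csv(records))                # the same records as CSV
```

A real scraper would first download the page and would need selectors matched to the actual markup of the site; both are left out here because they depend on the site being scraped.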
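
The flattening step in notes 17 and 18 (taking a list of nested JSON documents and splitting it into separate CSV files) could look roughly like the sketch below. It is not a port of the PHP script linked in note 18. The field names and sample records are invented for illustration, and the three output tables simply mirror the Company, Director, and Project datasets listed in note 19.

```python
import csv
import io

# Hypothetical records: the field names are invented for illustration
# and do not reflect the real CIDB dataset.
COMPANIES = [
    {
        "id": "C1",
        "name": "Example Builder",
        "directors": [{"name": "Director A"}, {"name": "Director B"}],
        "projects": [{"title": "Example Road", "value": 100}],
    },
    {
        "id": "C2",
        "name": "Example Contractor",
        "directors": [{"name": "Director C"}],
        "projects": [],
    },
]


def flatten(companies):
    """Split nested company documents into three flat tables."""
    tables = {"company": [], "director": [], "project": []}
    for company in companies:
        tables["company"].append({"id": company["id"], "name": company["name"]})
        # Each child row keeps the parent id so the tables can be joined again.
        for director in company.get("directors", []):
            tables["director"].append({"company_id": company["id"], **director})
        for project in company.get("projects", []):
            tables["project"].append({"company_id": company["id"], **project})
    return tables


def to_csv(rows):
    """Serialise a list of flat dicts as CSV text (empty string if no rows)."""
    if not rows:
        return ""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=list(rows[0]), lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


tables = flatten(COMPANIES)
for name, rows in tables.items():
    print(f"--- {name}.csv ---")
    print(to_csv(rows))
```

Carrying the parent `id` into each child row is what lets a spreadsheet-style tool relate directors and projects back to their company after the nesting is gone.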