2. Our Speakers
Dennis Brink, Executive Director of the Canadian Open Data Institute
Trish Garner, Manager of Open Data at the City of Toronto
Mark MacDonnell, Software Developer at SELA Canada
Adam Muise, Principal Architect at Hortonworks
Jason Lavigne, Founder & CEO, Black & White Logic
3. Open Data – What is it?
Dennis Brink
Executive Director, Canadian Open Data Institute
IAMCP Director and RIC Advisor
5. Open Data Definition from the Open Knowledge Foundation (OKF)
• https://okfn.org/opendata/
‘Open knowledge’ is any content,
information or data that people are
free to use, re-use and redistribute
— without any legal, technological
or social restriction.
6. Creative Commons & License Types
• Attribution-ShareAlike 4.0 International
• “CC BY-SA 4.0”
• Attribution — You must give appropriate
credit, provide a link to the license, and indicate
if changes were made.
• ShareAlike — If you remix, transform, or build
upon the material, you must distribute your
contributions under the same license as the
original.
7. Creative Commons & License Types
• Attribution-ShareAlike 4.0 International
• “CC BY-SA 4.0”
• You are free to:
• Share — copy and redistribute the material
in any medium or format
• Adapt — remix, transform, and build upon
the material for any purpose, even
commercially.
• The licensor cannot revoke these freedoms
as long as you follow the license terms.
8. Examples of Open Data
Not-for-profits
Sunlight Foundation (US): sunlightfoundation.com
The Open Data Institute (UK): theodi.org
10. Example of Federal Data - Census
http://data.gc.ca/data/en/dataset/e3586bbf-93b8-40e8-8b5a-14023e1e705e
Population by home language, by province and territory (2011 Census)
Language spoken most often at home[1] | Canada (number)
Total | 33,121,175
English | 21,457,075
French | 6,827,865
Non-official language | 3,673,865
English and French | 131,205
English and non-official language | 875,135
French and non-official language | 109,705
English, French and non-official language | 46,325
[E]: use with caution.
[1] Refers to the language spoken most often at home by the in
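The speaker notes describe this table as data that "all adds up." A minimal Python sketch, using only the counts transcribed from the table above, checks that the categories sum to the total and computes each group's share. Grouping the four mixed-language rows into a single "Multiple" bucket is our assumption, made to match the four categories in the notes.

```python
# Counts transcribed from the 2011 Census home-language table above.
HOME_LANGUAGE = {
    "English": 21_457_075,
    "French": 6_827_865,
    "Non-official language": 3_673_865,
    "English and French": 131_205,
    "English and non-official language": 875_135,
    "French and non-official language": 109_705,
    "English, French and non-official language": 46_325,
}
TOTAL = 33_121_175

# Single-language rows; every other row is grouped as "Multiple" (an assumption).
SINGLE = ["English", "French", "Non-official language"]


def grouped_shares(counts: dict[str, int]) -> dict[str, float]:
    """Return percentage shares for each single language plus one 'Multiple' group."""
    total = sum(counts.values())
    groups = {name: counts[name] for name in SINGLE}
    groups["Multiple"] = sum(v for k, v in counts.items() if k not in SINGLE)
    return {name: 100 * n / total for name, n in groups.items()}


# The categories should account for every respondent in the total.
assert sum(HOME_LANGUAGE.values()) == TOTAL
for group, pct in grouped_shares(HOME_LANGUAGE).items():
    print(f"{group}: {pct:.1f}%")
```

The shares come out at about 64.8% English, 20.6% French, 11.1% non-official and 3.5% multiple, so a slide that rounds them to whole numbers has to fudge one bucket to reach exactly 100%.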
11. Example of Provincial Data - baby names
http://www.ontario.ca/government/ontario-top-baby-names-male
12. Example of City Data - parking tickets
http://www1.toronto.ca/wps/portal/contentonly?vgnextoid=9e56e03bb8d1e310VgnVCM10000071d60f89RCRD
13. Shawn Petersen aka “Saint John Shawn”
• Propertize.ca, a New Brunswick property tax assessment comparison tool
• Location detection to see all nearby properties
Address | PID | PAN | Assessment (Year, Amount, Levy) | Change | Last Sale(s) | Property Description | Tax Class
228 Lancaster Avenue | 33357 | 1698246 | 2013 - $633,000.00 - $30,386.54; 2012 - $600,900.00 - $29,341.35; 2011 - $573,400.00 - $27,998.55 | 5.34% |  | PARTS DEPOT | Fully Taxable
266 Lancaster Avenue | 33225 | 1698115 | 2013 - $358,300.00 - $11,645.83; 2012 - $340,600.00 - $11,274.88; 2011 - $317,900.00 - $10,523.45 | 5.20% |  | APT HOUSES & LAND | Fully Taxable
Lancaster Avenue | 55146310 | 5103560 | 2013 - $113,500.00 - $3,633.94; 2012 - $111,900.00 - $3,649.85; 2011 - $103,900.00 - $3,388.92 | 1.43% |  | PARK AREA | Fully Taxable
248 Lancaster Avenue | 55012132 | 1701756 | 2013 - $736,900.00 - $35,374.15; 2012 - $732,800.00 - $35,781.89; 2011 - $722,100.00 - $35,259.43 | 0.56% |  | BOWLING CENTRE | Fully Taxable
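The Change column appears to be the 2012-to-2013 change in assessed value. That reading is an inference, and the minimal Python sketch below tests it by recomputing the column from the transcribed figures. It also computes the change in levy, which the tool does not show.

```python
# Assessment history transcribed from the Propertize.ca sample above,
# keyed by PID: {year: (assessment, levy)}.
HISTORY = {
    "33357": {2013: (633_000.00, 30_386.54), 2012: (600_900.00, 29_341.35), 2011: (573_400.00, 27_998.55)},
    "33225": {2013: (358_300.00, 11_645.83), 2012: (340_600.00, 11_274.88), 2011: (317_900.00, 10_523.45)},
    "55146310": {2013: (113_500.00, 3_633.94), 2012: (111_900.00, 3_649.85), 2011: (103_900.00, 3_388.92)},
    "55012132": {2013: (736_900.00, 35_374.15), 2012: (732_800.00, 35_781.89), 2011: (722_100.00, 35_259.43)},
}


def pct_change(new: float, old: float) -> float:
    """Percentage change from old to new."""
    return 100 * (new - old) / old


def latest_changes(history: dict[str, dict[int, tuple[float, float]]]) -> dict[str, tuple[float, float]]:
    """Assessment and levy change between the two most recent years for each PID."""
    result = {}
    for pid, years in history.items():
        latest, previous = sorted(years, reverse=True)[:2]
        (new_assessment, new_levy), (old_assessment, old_levy) = years[latest], years[previous]
        result[pid] = (pct_change(new_assessment, old_assessment), pct_change(new_levy, old_levy))
    return result


for pid, (assessment_pct, levy_pct) in latest_changes(HISTORY).items():
    print(f"PID {pid}: assessment {assessment_pct:+.2f}%, levy {levy_pct:+.2f}%")
```

Recomputing gives +5.34%, +5.20%, +1.43% and +0.56%, which matches the Change column. The levy figures show a different picture: the levies for PIDs 55146310 and 55012132 fell from 2012 to 2013 even though their assessments rose.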
14. Data Marketplace – Microsoft
http://datamarket.azure.com/browse/data?price=paid
15. Big Data vs. Open Data
How are they Different?
• Look at the 3 V’s in a Big Data world
16. Big Data vs. Open Data
• Volume
– ever increasing
– storage issue
• Variety
– unstructured
– external
– social/mobile/IoT
• Velocity
– daily or live streaming
17. Big Data vs. Open Data
How are they Different?
• All V’s are different
• Census:
– static, structured, and 60 GB in size
18. NYU’s GovLab Survey
• OpenData500.com
• Survey of companies using Open Data for commercial
purposes.
• Attempting to expand to other countries.
– We want Canada represented!
31. Making Money with Open Data
Mark MacDonnell, Software Developer, SELA
Diamond Program
32. What is (good) open data?
• Has a license declaring it open
• Can be accessed freely and easily
• Provided in a standard, structured way
• Reliable
33. Why publish your data?
• Provide a data subset or sample
• Allows you to focus on core services
• Bring attention to your services
• Better information sharing
• Foster more open data
34. Why consume open data?
• Avoid maintenance costs
• Amalgamate information
• Support your own data
• Foster more open data
37. Getting the word out
• To developers
– Blogs
– Social media
– Emails
• To the public
– Press releases
– News conferences
Diagram: three interlocking rings labelled Data Owners, Developers, and The Public
◦ To data owners: Emails, Social media, Comments
◦ To the public: Apps, Blog
◦ To data owners: Contact, FAQs
◦ To developers: Reviews, Downloads, Un-installs
38. What to do?
• Have a clear audience
– Don’t forget the other audience
• Be proactive
• Say where you’re going
– Secrets don’t help
• Always be communicating
– Make it easy (RSS, Twitter, etc.)
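The slide recommends making updates easy to follow, with RSS as one channel. Below is a minimal sketch of a data owner publishing an RSS 2.0 feed of data set updates with Python's standard library. The channel title, URLs and update entries are hypothetical placeholders, not a real portal's feed.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone
from email.utils import format_datetime


def build_rss(channel_title: str, channel_link: str, description: str, updates: list) -> str:
    """Return an RSS 2.0 document announcing data set updates.

    `updates` is a list of (title, link, summary, published_datetime) tuples.
    """
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    # title, link and description are the required channel elements in RSS 2.0.
    ET.SubElement(channel, "title").text = channel_title
    ET.SubElement(channel, "link").text = channel_link
    ET.SubElement(channel, "description").text = description
    for title, link, summary, published in updates:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = title
        ET.SubElement(item, "link").text = link
        ET.SubElement(item, "description").text = summary
        # RSS 2.0 expects RFC 822 dates, which format_datetime produces.
        ET.SubElement(item, "pubDate").text = format_datetime(published)
    return ET.tostring(rss, encoding="unicode")


# Hypothetical feed content for illustration only.
feed = build_rss(
    "Example Open Data Updates",
    "https://example.org/data",
    "New and updated data sets",
    [("Parking tickets 2012", "https://example.org/data/parking-2012",
      "Annual refresh", datetime(2013, 3, 1, tzinfo=timezone.utc))],
)
print(feed)
```

Any feed reader can subscribe to the resulting document, and the same list of updates could also drive Twitter posts, so one source of truth serves several channels.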
39. Provision
• Create data sets from webpages
– Automatically crawls site for data
– Currently growing
– Useful for internal activities
• Tools for building APIs from data
– Value added to the open data
– Live in the wild
– Good for exposing your data
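The slide does not name the tools that create data sets from webpages, so this is not how they work internally. It is a minimal standard-library sketch of the core step: turning the HTML tables on one page into CSV. Crawling across pages is left out, and the parser assumes well-formed tables with explicit closing `</td>`, `</th>` and `</tr>` tags.

```python
import csv
import io
from html.parser import HTMLParser


class TableExtractor(HTMLParser):
    """Collect the text of every <td>/<th> cell, row by row, from all tables on a page."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row being read, or None outside a row
        self._cell = None  # text fragments of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            # Collapse internal whitespace so each cell becomes one clean value.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:
                self.rows.append(self._row)
            self._row = None


def html_tables_to_csv(html: str) -> str:
    """Return the table rows found in `html` as CSV text."""
    parser = TableExtractor()
    parser.feed(html)
    parser.close()
    out = io.StringIO()
    csv.writer(out).writerows(parser.rows)
    return out.getvalue()


page = "<table><tr><th>Name</th><th>Count</th></tr><tr><td>Conrad</td><td> 6 </td></tr></table>"
print(html_tables_to_csv(page))
```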
40. Consumption
• Gather the data and do research
• Combine open data with closed data
• Process the data
• Value added
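Combining open data with closed data usually comes down to a join on a shared key. A minimal sketch, assuming the key is the property identifier (PID): the open side uses the 2013 assessments from the Propertize.ca sample, while the closed side is a hypothetical internal client list whose account IDs are invented for illustration.

```python
# Open data: 2013 assessments keyed by PID, from the Propertize.ca sample.
OPEN_ASSESSMENTS = {"33357": 633_000.00, "33225": 358_300.00, "55146310": 113_500.00, "55012132": 736_900.00}

# Closed data: a hypothetical internal client list (account IDs invented for illustration).
CLOSED_CLIENTS = [
    {"account": "A-001", "pid": "33357"},
    {"account": "A-002", "pid": "55012132"},
    {"account": "A-003", "pid": "00000000"},  # hypothetical PID with no open-data match
]


def enrich(clients: list[dict], assessments: dict[str, float]) -> list[dict]:
    """Left-join internal records with open data; unmatched records get None."""
    return [{**client, "assessment_2013": assessments.get(client["pid"])} for client in clients]


for row in enrich(CLOSED_CLIENTS, OPEN_ASSESSMENTS):
    print(row)
```

A left join keeps every internal record even when the open data has no match, which makes gaps in the public data visible instead of silently dropping rows.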
48. Benefits
1. 76 percent reported an increase in productivity
2. 93 percent said their social circle had increased a lot
3. 86 percent said their business network had grown
Source: Global Coworking Survey 2013
Three interlocking rings, because the three audiences are interconnected.
The federal government has updated its license to follow these standards.
Programs at Sunlight – backed by philanthropic funding
Program Spending, iPhone App for legislation, Influence Explorer for fundraising
Programs at ODI – Intersection of public, private, and citizens.
Evaluating and improving government open data programs, training journalists, helping incubate companies, and promoting open data usage and applications.
Available in CSV and usually with supporting documents in HTML
The federal site now also provides dataset metadata in JSON format.
Home language is roughly 65% English, 21% French, 11% other, and 3.5% multiple.
This is nice and neat data – it all adds up and is not ambiguous
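Because the metadata comes as JSON, a script can find a data set's CSV file and its HTML supporting documents without scraping the page. This is a minimal sketch built on a hypothetical metadata record. The field names (`title`, `resources`, `format`, `url`) are assumptions for illustration and are not taken from the federal portal's documentation.

```python
import json

# A hypothetical metadata record; field names and URLs are illustrative assumptions.
SAMPLE = """
{
  "title": "Population by home language (2011 Census)",
  "resources": [
    {"name": "Data", "format": "CSV", "url": "https://example.org/home-language.csv"},
    {"name": "Supporting documentation", "format": "HTML", "url": "https://example.org/notes.html"}
  ]
}
"""


def resource_urls(metadata_json: str, fmt: str = "CSV") -> list[str]:
    """Return the URLs of all resources whose format matches `fmt` (case-insensitive)."""
    record = json.loads(metadata_json)
    return [
        resource["url"]
        for resource in record.get("resources", [])
        if resource.get("format", "").upper() == fmt.upper()
    ]


print(resource_urls(SAMPLE))          # CSV data file(s)
print(resource_urls(SAMPLE, "html"))  # supporting documents
```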
The federal government has over 200,000 data sets, but more than 90% of them are geospatial data.
My son is named Conrad.
Only 6 babies born in Ontario in 1999 were given that name.
Names given to fewer than 5 babies are suppressed for privacy reasons.
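The privacy rule above is a small-count suppression threshold. The notes do not say whether suppressed names are dropped or masked. This minimal sketch assumes they are dropped and reports how many names were withheld. "Conrad" (6 births in 1999) comes from the notes, and "NameX" is a hypothetical placeholder.

```python
# Threshold from the notes: names given to fewer than 5 babies are withheld.
SUPPRESSION_THRESHOLD = 5


def suppress_small_counts(counts: dict[str, int], threshold: int = SUPPRESSION_THRESHOLD) -> tuple[dict[str, int], int]:
    """Split name counts into publishable rows and the number of names withheld."""
    published = {name: n for name, n in counts.items() if n >= threshold}
    withheld = len(counts) - len(published)
    return published, withheld


# "Conrad" is from the notes; "NameX" is a hypothetical name below the threshold.
published, withheld = suppress_small_counts({"Conrad": 6, "NameX": 3})
print(published, withheld)
```

With 6 births, Conrad clears the threshold by one. A name with 4 or fewer births would simply not appear in the published list.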
The province has only 200 data sets.
Data science problem:
2.4 million tickets in 2012.
Cannot be analyzed in Excel: 2.4 million rows exceed Excel's limit of 1,048,576 rows per worksheet.
The most-ticketed street was Yonge Street, but it is also the longest.
Now we need geo data to determine the density of tickets.
It is also an ETL problem, because the data is dirty: Yong, Yongge, Yonge, St., Street, Stret.
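The two problems in these notes, too many rows for a spreadsheet and inconsistent spellings, can both be handled by streaming the file and normalizing each value. This minimal sketch reads the CSV one row at a time, so it has no row limit. The `street` column name and the correction tables are hypothetical and built only from the variants listed above. Computing ticket density would also need street lengths from geo data, which are not included here.

```python
import csv
import io
import re
from collections import Counter

# Hypothetical correction tables, built from the variants mentioned in the notes.
STREET_FIXES = {"YONG": "YONGE", "YONGGE": "YONGE"}
SUFFIX_FIXES = {"ST.": "ST", "STREET": "ST", "STRET": "ST"}


def normalize_street(raw: str) -> str:
    """Upper-case, drop a leading house number, and map known misspellings to one form."""
    words = re.sub(r"^\d+\s+", "", raw.strip().upper()).split()
    return " ".join(STREET_FIXES.get(w, SUFFIX_FIXES.get(w, w)) for w in words)


def tickets_per_street(csv_file, column: str = "street") -> Counter:
    """Stream the CSV row by row and count tickets per normalized street name."""
    counts = Counter()
    for row in csv.DictReader(csv_file):
        counts[normalize_street(row[column])] += 1
    return counts


# In-memory demo; a real run would pass a file opened with open(path, newline="").
sample = io.StringIO("street\n123 Yonge St.\n45 yong street\nYongge Stret\nBay St\n")
print(tickets_per_street(sample).most_common())
```

A lookup table is easy to audit but only catches misspellings someone has already seen. Fuzzy matching would catch more variants at the risk of merging streets that really are different.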
There are over a thousand data sets.
Propertize.ca lets people compare their property tax assessments: search by address or nearby properties, view the last three years, see increases, etc.
This program was quite successful and shows how open data had limited use until a developer exposed it in a better fashion.
Easy-to-use tool – notice how every assessment went up!
Ongoing challenge with data restrictions.
Paid and unpaid data sets.
The paid ones are the most useful, e.g., ZIP+4 demographics.
The person who coined the 3 V’s is a Gartner employee who lives in Chicago. He tweeted at me a year ago.
Primer on how a data-centric environment changes things:
Volume – technology issue
Variety – organizational issue
Velocity – organizational issue
Static: collected every 5 years.
Structured: table-driven and from a multiple-choice questionnaire.
Analysis size is within current database capacities; calculations can be done in memory.
A project that we are undertaking.
The GovLab is part of NYU.
Takes after the Fortune 500 but not by revenue.
Sectors of companies vs. types of data sets by department.