Drawing on cutting-edge examples from the University of Bristol and the City of Bristol, Simon discusses innovative applications of data science that derive business value from open data by enriching and integrating it with confidential 'closed data'. He also highlights recent technological advances that enable open data science on highly sensitive closed data.
1. Adding Open Data Value to 'Closed Data' Problems
Dr Simon Price
Research Fellow, University of Bristol
Data Scientist, Capgemini Insights & Data
2. Who am I?
• 30 years in software development and leadership roles
• Moved into Data Science via PhD in Machine Learning (2014)
• Research Fellow in Machine Learning group
~20 Machine Learning researchers
• Led project to establish Bristol’s open research data repository
• One of the organisers of Open Data Institute (ODI) Bristol
• Data Scientist in Big Data Analytics team
~100 Data Scientists, Big Data Engineers and Data Analysts
• Focus on Open Source and Big Data technologies to solve client problems
3. Outline
1. Case study: open data + ‘closed data’
2. Deriving value from open data
3. Data Science with ‘closed data’
4. Case study: SubSift
Conferences using SubSift
• ECML-PKDD: European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases
• KDD: ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
• PAKDD: Pacific-Asia Conference on Knowledge Discovery and Data Mining
• SDM: SIAM International Conference on Data Mining
Journals using SubSift
• Machine Learning
• Data Mining and Knowledge Discovery
https://doi.org/10.1145/2979672
5. Initial problem addressed by SubSift
Matching submitted conference papers to possible reviewers in Programme Committee
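The slide does not spell out how the matching is done. One common way to approach this kind of problem is to represent each paper and each reviewer's publication record as a TF-IDF vector and rank reviewers by cosine similarity. The following is a minimal sketch under that assumption, not a description of SubSift's actual implementation; the reviewer names and texts are hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase, split on whitespace, keep alphabetic tokens only."""
    return [w for w in text.lower().split() if w.isalpha()]


def tfidf_vectors(docs):
    """Map each document name to a {term: tf-idf weight} dictionary."""
    tokens = {name: tokenize(text) for name, text in docs.items()}
    n_docs = len(tokens)
    # Document frequency: in how many documents does each term occur?
    doc_freq = Counter(t for toks in tokens.values() for t in set(toks))
    vectors = {}
    for name, toks in tokens.items():
        counts = Counter(toks)
        # Smoothed idf so that no weight is zero or undefined.
        vectors[name] = {
            t: c * (math.log((1 + n_docs) / (1 + doc_freq[t])) + 1.0)
            for t, c in counts.items()
        }
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse term-weight dictionaries."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_reviewers(paper_text, reviewer_profiles):
    """Return (reviewer, score) pairs, most similar reviewer first."""
    docs = dict(reviewer_profiles)
    docs["__paper__"] = paper_text
    vectors = tfidf_vectors(docs)
    paper_vec = vectors.pop("__paper__")
    scores = [(name, cosine(paper_vec, vec)) for name, vec in vectors.items()]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Hypothetical reviewer profiles (e.g. concatenated publication titles).
reviewers = {
    "reviewer_a": "kernel methods support vector machines classification",
    "reviewer_b": "frequent itemset mining association rules databases",
}
ranking = rank_reviewers("a new kernel for support vector classification", reviewers)
print(ranking)
```

In practice the reviewer profiles would come from a bibliographic source and the scores would feed a bidding or assignment step rather than being used directly.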
15. Open research data
• data.bris.ac.uk
• Research data storage facility
• Each researcher gets 10TB "forever"
17. Open government data
140+ datasets live on opendata.bristol.gov.uk
Mostly static but some real-time data
Examples
• Government: Elections since 2007
• Community: Quality of Life survey
• Education: School Results
• Energy: Installed PV, Energy Use in Council Buildings
• Environment: Real-time & Historic Air Quality, Flood Alerts (EA)
• Land use: 2013 Planning applications
• Health: Life expectancy / Mortality, Obesity, NHS Spend
19. Deriving value from open data
1. Data Science
2. Using open data to enrich and connect ‘closed data’
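As a rough illustration of point 2, enrichment often amounts to joining confidential records to an open dataset on a shared key such as a geographic area. The sketch below assumes a ward name as the join key; the field names and all values are invented for illustration and do not come from any real dataset.

```python
# Hypothetical closed (confidential) records held by an organisation.
closed_records = [
    {"client_id": "c1", "ward": "Ward A", "spend": 120.0},
    {"client_id": "c2", "ward": "Ward B", "spend": 75.5},
    {"client_id": "c3", "ward": "Ward Z", "spend": 40.0},  # no open data match
]

# Hypothetical open data, keyed by ward (values are made up).
open_ward_stats = {
    "Ward A": {"life_expectancy": 80.1},
    "Ward B": {"life_expectancy": 78.4},
}


def enrich(closed, open_lookup, key):
    """Left-join closed records with open attributes on a shared key.

    Records without a match in the open data are kept unchanged.
    """
    return [{**row, **open_lookup.get(row[key], {})} for row in closed]


enriched = enrich(closed_records, open_ward_stats, key="ward")
for row in enriched:
    print(row)
```

The join runs where the closed data lives, so the confidential records never need to leave their secure environment; only the open data is brought in.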
38. Data science with ‘closed data’
• Custom R server running inside secure data repository / warehouse
• Enables non-disclosive, remote analysis of sensitive research data
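The idea of non-disclosive remote analysis can be sketched as a server-side function that returns only aggregate results and refuses requests that could reveal individual records. The slide describes a custom R server; the Python sketch below is only an illustration of the principle, and the function name, exception, and minimum group size of 5 are assumptions rather than details from the talk.

```python
from statistics import mean

# Assumed disclosure-control threshold; real deployments set their own.
MIN_GROUP_SIZE = 5


class DisclosureError(Exception):
    """Raised when a request could reveal individual-level data."""


def safe_mean(values, min_group_size=MIN_GROUP_SIZE):
    """Return only the count and mean, never the underlying values.

    Runs inside the secure environment; the remote analyst sees just
    the returned summary.
    """
    if len(values) < min_group_size:
        raise DisclosureError(
            f"group has fewer than {min_group_size} records; result withheld"
        )
    return {"n": len(values), "mean": mean(values)}


# Hypothetical sensitive measurements that stay on the server.
sensitive_values = [1, 2, 3, 4, 5, 6]
summary = safe_mean(sensitive_values)
print(summary)

try:
    safe_mean([1, 2])
except DisclosureError as err:
    print("refused:", err)
```

A minimum group size is only one possible safeguard; a production system would typically also restrict which functions can be called and audit the requests it receives.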