In this presentation for the ESSnet Linked Open Statistics final event, Sergi Segiev presents the lessons learned from two use cases that applied open data to find valuable insights.
You can also refer to the presentation 'Data Reveals Corruption Practices' by Yasen Kiprov - http://bit.ly/2WsFxsP
5. 60+ meetups and conferences
Monthly events and conferences where people with expertise in different areas share their cases and problems in different industries from a Data Science perspective.
7 Datathons
A weekend-long online and physical international competition with real-world business cases, top experts, 350+ participants from 20+ countries.
Members from over 50 countries
• Senior developers
• Master's and PhD students in Data Science, Statistics, Business Analytics, etc.
• Domain experts passionate about data
• Mathematical and data geeks
Working with Universities
Organizing different events, Academia Datathons and monthly challenges, working closely with 10+ universities
150+ solutions from 28 cases
100+ teams with 1 to 25 years of expertise involved in solving various business cases
15+ training sessions
Various trainings, workshops, master classes, summer schools, etc. are organized with practical implications in the Data Science domain.
6. OUR OWN ENVIRONMENT
Data.Platform [THE FOUNDATION OF BEING GLOBAL]
Online Learn repositories: 230+ SIG + Scientific Articles
Data.Chat: 75 000+ Messages
Data.Cloud: Jupyter Notebook integration with R and Python
8. DATATHON 2017 CASE
DATA REVEALS CORRUPTION PRACTICES
Input data:
Bulgarian public procurement
EC Procurement
Trade Register
Open Government from Council of Ministers
Output:
The size of the uploaded data is approximately 12.5 million triples (more than 2 GB of uncompressed data).
An interesting question that can be explored concerns conflicts of interest.
9. DATATHON 2017 CASE
BULGARIAN TRADE REGISTER
Reference through company
or institution UIC ID (ЕИК):
Name
Address
Legal form
Open Government from Council of Ministers
People of interest are linked:
hasManager
hasActiveManager
hasPartner
hasActivePartner
hasActiveOwner
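The link model on this slide can be pictured as subject-predicate-object triples keyed by a company's UIC. The following is a minimal Python sketch under stated assumptions: only the link names (written here in camelCase) come from the slide, while the UIC values, person identifiers, and the helper `people_linked_to` are hypothetical and used for illustration only.

```python
from collections import defaultdict

# Link names as listed on the slide; everything else below is illustrative.
PREDICATES = {
    "hasManager",
    "hasActiveManager",
    "hasPartner",
    "hasActivePartner",
    "hasActiveOwner",
}

# Hypothetical sample triples: (company UIC, predicate, person id).
triples = [
    ("UIC-001", "hasManager", "person-1"),
    ("UIC-001", "hasActiveManager", "person-1"),
    ("UIC-001", "hasPartner", "person-2"),
    ("UIC-002", "hasActiveOwner", "person-2"),
]


def people_linked_to(uic, triples, only_active=False):
    """Return {person id: set of predicates} for one company UIC."""
    links = defaultdict(set)
    for subject, predicate, person in triples:
        if subject != uic or predicate not in PREDICATES:
            continue
        # Assumption: "active" links are those whose name contains "Active".
        if only_active and "Active" not in predicate:
            continue
        links[person].add(predicate)
    return dict(links)
```

Calling `people_linked_to("UIC-001", triples, only_active=True)` restricts the result to currently active roles, which is the distinction the slide draws between, for example, hasManager and hasActiveManager.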
10. DATATHON 2017 CASE
BULGARIAN PUBLIC PROCUREMENTS
Raw data in CSV format
Total of 207579 contracts for the period 2007 - 2016
Each Procurement has a Contract with
Title
Kind (delivery, service, construction)
Issuing Authority
Lots
Awarded tender
Contract Price
Actual Price
Dates
Reference: Data Reveals Corruption Practices, Yasen Kiprov
11. DATATHON 2017 CASE
EC PROCUREMENTS
Raw data in XML format
Total of 11798 projects with
Beneficiaries
Lots
Dates
Payments
Reference: Data Reveals Corruption Practices, Yasen Kiprov
12. DATATHON 2017 CASE
SO …
Person: Ясен
Company: Профай
Links:
hasManager
hasPartner
…
EGN is obfuscated
Reference: Data Reveals Corruption Practices, Yasen Kiprov
13. DATATHON 2017 CASE
FURTHER POSSIBLE QUERIES
A conflict of interest may arise if a person managing a government entity is also a related party (for example, an owner) of a private contractor of that entity.
Connected companies:
Companies which have a common active member
Influencers - people who are involved in many companies
People who are involved both in the Authority and the
Awarded Tender
Reference: Data Reveals Corruption Practices, Yasen Kiprov
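The queries listed on this slide could be prototyped without a triple store. The sketch below is a hedged illustration in plain Python, not the datathon team's implementation: the sample records, identifiers, the `threshold` parameter, and all function names are assumptions made for the example.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical sample data: (organisation id, person id) for active members.
memberships = [
    ("authority-A", "person-1"),
    ("company-X", "person-1"),
    ("company-X", "person-2"),
    ("company-Y", "person-2"),
    ("company-Z", "person-2"),
]

# Hypothetical contracts: (issuing authority, awarded company).
contracts = [("authority-A", "company-X"), ("authority-A", "company-Y")]


def orgs_by_person(memberships):
    """Index organisations by the people involved in them."""
    index = defaultdict(set)
    for org, person in memberships:
        index[person].add(org)
    return index


def connected_companies(memberships):
    """Pairs of organisations that share at least one common member."""
    pairs = set()
    for orgs in orgs_by_person(memberships).values():
        pairs.update(combinations(sorted(orgs), 2))
    return pairs


def influencers(memberships, threshold=3):
    """People involved in at least `threshold` organisations (assumed cut-off)."""
    index = orgs_by_person(memberships)
    return {person for person, orgs in index.items() if len(orgs) >= threshold}


def conflicts_of_interest(memberships, contracts):
    """(person, authority, company) where one person sits on both sides."""
    index = orgs_by_person(memberships)
    found = set()
    for authority, company in contracts:
        for person, orgs in index.items():
            if authority in orgs and company in orgs:
                found.add((person, authority, company))
    return found
```

On the real data the same joins would more naturally be expressed as graph queries over the triples; the in-memory index here only shows the shape of each question.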
16. WHAT DID WE DO?
Promote in media groups
Meetup and webinar
Participate in 2nd Hackathon
Company exposure and PR
DATATHON 2017 CASE
LESSONS
OUTREACH
No exposure from press
No support from Open Data
No one is using the result
17. DATATHON 2019 CASE
SOFIA AIR POLLUTION
17
Input data:
Weather
Topography
Industrial pollution data
Heating data
Construction data
Air Quality data
Output:
The different factors that affect air pollution levels
NIGGG
19. WHAT DID WE DO?
Involve Sofia Municipality
Community exposure
DATATHON 2019 CASE
LESSONS
OUTREACH
Exposure to press
BI tool project is initiated
Support from SDA
NIGGG
20. OPEN-SOURCE CULTURE
Accelerators: Metro, BMW Startup Garage, Coca-Cola, McDonald's
Open Source Challenges
Google DeepMind [www.deepmind.com/research/open-source/open-source-code/]
Amazon: over 1600 projects on GitHub [www.aws.amazon.com/opensource/]
Baidu [github.com/baidu]
Unilever: R&D €1 billion, staff: 6 200, 30 - 40 %
OpenAI [github.com/openai]
21. OPEN-SOURCE CULTURE
LESSONS LEARNED
OUTREACH
Exposure to press
BI tool project is initiated
Support from SDA
Focus
Find communities
Listen and work with them - open interesting datasets
Rely on open-source culture - you are not alone
Promote
Be an entrepreneur - praise the great results
PR Visibility
Support
Financial, Resources, Logistics, Media
22. Engaged community
Special Interest Groups
Keeping the community strong and active
A monthly meeting where two speakers present a Data Science topic, ending with a discussion over a beer in Sofia and streamed for the rest of the world
Meetups
A five-day summer training with more than 10 topics and several practical tasks.
Summer School
A weekly 3-4 hour meeting where a group of people work together on their own data science projects and discuss ideas on improving their progress.
Coding sessions
A comprehensive two-day training with presentations and workshops, focused on big data and data science
Data Science Master class
A 3 to 4 hour introductory presentation or demonstration with practical exercises on various topics (probabilistic programming, retailer time series analysis, BI intro with Power BI, machine learning intro, etc.)
Workshops
Data Science Society
23. Become Part of Data Science Society!
DSS:
See what we do and what we achieve on our website or social networks:
http://datasciencesociety.net
Contacts:
Reach out directly.
Email:
info@datasciencesociety.net
Phone: +359 888 400 290