2. ABOUT ME
Data Journalist. Graduated in Journalism and Communication. Self-taught coder with programming skills in Python and JavaScript, used for data mining and data visualisation.
ABOUT BIRN SRBIJA
A small independent non-profit newsroom oriented towards investigative journalism.
3. DATA JOURNALISM
THE CONTEXT
Journalism + a small set of tools from data and computer science
Data Journalists - journalists with an additional set of technical skills
The Purpose - find the story in data
4. TOOLS WE USE (MY EXPERIENCE, BY INVESTED TIME)
PYTHON - DATA MINING: 60%
FLOURISH - DATA VISUALISATION: 30%
SOMETHING ELSE: 10%
5. PYTHON FOR DATA MINING
Key frameworks and libraries for data mining:
JUPYTER NOTEBOOK
(GEO)PANDAS
BEAUTIFUL SOUP, SELENIUM...
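As a minimal sketch of how these libraries fit together, a fetched page can be parsed with Beautiful Soup and loaded into a pandas DataFrame. The HTML snippet and column names below are invented for illustration; in a real notebook the `html` string would come from `requests.get(url).text` or a Selenium-driven browser.

```python
import pandas as pd
from bs4 import BeautifulSoup

# Hypothetical HTML standing in for a fetched page
# (in practice: html = requests.get(url).text)
html = """
<table>
  <tr><th>company</th><th>debt</th></tr>
  <tr><td>Alpha d.o.o.</td><td>1200</td></tr>
  <tr><td>Beta d.o.o.</td><td>3400</td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")
rows = []
for tr in soup.find_all("tr")[1:]:  # skip the header row
    cells = [td.get_text(strip=True) for td in tr.find_all("td")]
    rows.append({"company": cells[0], "debt": int(cells[1])})

df = pd.DataFrame(rows)
print(df["debt"].sum())  # total debt across the scraped rows
```

From here the DataFrame can be cleaned, analysed, and exported to CSV for visualisation.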
6. FLOURISH FOR DATA VISUALISATION
[Example line chart made with Flourish: monthly values, Jan-Sep]
“Flourish was [created] to enable everyone to tell stories with data. Launched in 2018, the tool is used by a huge community of creators.”
website: https://flourish.studio
PREPARE DATA SET → CHOOSE TEMPLATE → EMBED ON WEBSITE
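The "prepare data set" step often means reshaping data into the "wide" layout that many Flourish templates expect (one row per label, one column per series). A minimal pandas sketch, with invented column names and values:

```python
import pandas as pd

# Hypothetical long-format data: one row per (month, series) measurement
long_df = pd.DataFrame({
    "month":  ["Jan", "Jan", "Feb", "Feb"],
    "source": ["tenders", "debts", "tenders", "debts"],
    "value":  [10, 4, 12, 7],
})

# Pivot to a wide layout: one row per month, one column per series
wide = long_df.pivot(index="month", columns="source", values="value").reset_index()

# The resulting CSV can be uploaded or pasted into the Flourish data tab
wide.to_csv("flourish_upload.csv", index=False)
```

After uploading, choosing a template and generating the embed code are done in the Flourish interface itself.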
7. DATA JOURNALISM WORKFLOW
1. Find the data - identify potential sources of data that can be useful
2. Get the data - find a way to get the data: a simple download (rare cases), a request via REST API, or web scraping
3. Clean the data - bring the data into a state that can be used for further analysis
4. Data analysis - find the story in the data
5. Data visualisation - find the most appropriate charts to represent the findings
6. Tell the story with the data - find the best ways and tools for data storytelling
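The middle steps of this workflow (get, clean, analyse, export) can be sketched as a short pandas pipeline. The CSV content and column names here are invented; the in-memory string stands in for a downloaded or scraped file:

```python
import io
import pandas as pd

# Steps 1-2, "get the data": a CSV standing in for a downloaded/scraped source
raw_csv = io.StringIO("""company,debt
Alpha d.o.o., 1200
beta d.o.o.,3400
Alpha d.o.o., 1200
""")
df = pd.read_csv(raw_csv)

# Step 3, clean: strip whitespace, normalise casing, drop duplicate rows
df["company"] = df["company"].str.strip().str.title()
df = df.drop_duplicates()

# Step 4, analyse: find the story, e.g. total debt per company
story = df.groupby("company", as_index=False)["debt"].sum()

# Steps 5-6: export the finding for a charting tool such as Flourish
story.to_csv("story.csv", index=False)
```

Real cleaning is usually the most time-consuming step; this sketch only hints at the kind of normalisation involved.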
10. DATA SOURCES
Lack of government data sources in machine-readable formats and via REST API services
Lack of sources useful for investigative journalism (mostly statistics)
Data requested on demand is delivered on paper
Some data sources are not free
Institutions refuse to give requested data
11. ENGAGING DATA STORYTELLING
[Example chart: yearly values, 2019-2021]
INTERESTING
UNDERSTANDABLE
LESS “DASHBOARDISH”
We produce stories for a broader audience, so they should be told in a different and simpler way than the data science approach. A definitive answer is still a work in progress.
13. MORE FLEXIBLE CMS
News websites mostly run on PHP/WordPress, which requires a lot of tweaking for data storytelling purposes
MORE AFFORDABLE DATA STORYTELLING PLATFORMS
There are specialised platforms for storytelling in digital environments, but they are too expensive for small newsrooms
TECHNICAL STAFF
From the perspective of a small newsroom, it is hard to budget for dedicated teams with skills in programming and web development
15. GOVERNMENT
Open data platforms exist, but the data should still be formatted more systematically
It would be nice to see more REST API services
Data requested on demand should be delivered in machine-readable formats
Some data sources and services should be free of charge for journalists
16. MEDIA INDUSTRY
Media stakeholders should be more aware of the potential of data for journalism
Newsrooms should have dedicated teams capable of finding stories in data
Newsrooms should incorporate data storytelling platforms into their content management systems (CMS) to deliver more engaging stories to their audiences
17. EDUCATION
Universities should offer programmes in data journalism
Newsrooms should give journalists interested in data journalism opportunities to learn the skills
The IT and media industries could be better connected for the purpose of delivering better data-driven journalism
19. TAX FRAUD SCHEME
The story in short: An organised group of people takes ownership of companies with debts. Example: you have a company with huge debts; you pay me to become the new owner, and how I resolve the debts is my problem, not yours. This “business model” was applied in hundreds and hundreds of cases. As a consequence, creditors couldn’t collect their debts.
20. HOW WE COLLECTED DATA
METHODS:
1. WEB SCRAPING
2. DATA ON DEMAND
We had only basic pieces of information about the individuals, so we needed to get more data from government sources such as the National Bank, the Tax Administration and the Trade Register.
21. WHAT WE DISCOVERED
After a lot of work on data cleaning, we had some findings:
The total debt for all companies in this scheme (money that the government and creditors lost)
Who the new fictitious owners are and how many companies are in their name (in some cases more than 200 companies)
Where those companies are now (fictitious addresses in residential areas, in some cases 300 companies per address)
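A finding like "more than 200 companies in one person's name" boils down to a group-and-count over the cleaned register data. A minimal sketch with invented names and addresses:

```python
import pandas as pd

# Hypothetical register extract: one row per company
companies = pd.DataFrame({
    "company": ["A d.o.o.", "B d.o.o.", "C d.o.o.", "D d.o.o."],
    "owner":   ["N. N.", "N. N.", "N. N.", "M. M."],
    "address": ["Ulica 1", "Ulica 1", "Ulica 1", "Ulica 2"],
})

# Companies per owner and per registered address, biggest first
per_owner = companies.groupby("owner").size().sort_values(ascending=False)
per_address = companies.groupby("address").size().sort_values(ascending=False)

print(per_owner.head())  # owners with the most companies in their name
```

The same pattern, applied to hundreds of scraped records, surfaces the outlier owners and addresses worth investigating.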
22. KILLING THE COMPETITION
The story in short: How two private companies with strong political connections became exclusive contractors of the government-owned electric power company (EPS) over a period of 10 years.
23. HOW WE COLLECTED DATA
METHODS:
WEB SCRAPING
The Public Procurement Office has data about all public procurements (tenders). At that time, the data were not downloadable, so we needed to scrape them.
24. WHAT WE DISCOVERED
After a lot of work on data cleaning, we had some findings:
These two companies drastically increased their income from public procurements after the change of government
The number of bidders in tenders where these companies won contracts decreased; in the last few years, they were the only bidders
During this period, the average number of bidders across all EPS public procurements fell from 3.14 to 1.77
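A figure like "average number of bidders fell from 3.14 to 1.77" is a per-period mean over the scraped tender records. A sketch with invented years and bidder counts:

```python
import pandas as pd

# Hypothetical tender data: one row per tender, with its number of bidders
tenders = pd.DataFrame({
    "year":    [2012, 2012, 2012, 2021, 2021],
    "bidders": [3, 4, 2, 1, 2],
})

# Average number of bidders per year
avg_bidders = tenders.groupby("year")["bidders"].mean()
print(avg_bidders)
```

Comparing the first and last periods of such a series is what shows whether competition is shrinking.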
25. HIDING THE RIVER
The story in short: The number of floating objects (splav) on Belgrade’s rivers increases year by year. We have all witnessed this, yet nobody knows the exact figures. We tried to calculate the increase in the area covered by floating objects, and in their number, over a period of five years.
26. HOW WE COLLECTED DATA
METHODS:
DRAWING POLYGONS ON SATELLITE IMAGES AND CALCULATING THEIR AREAS
The primary data sources were satellite images taken in 2015 and in 2021.
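Once each floating object is traced as a polygon, its area can be computed with shapely, the geometry library that geopandas builds on. The coordinates below are invented and assumed to be in a projected CRS (metres), since lon/lat degrees would give meaningless areas:

```python
from shapely.geometry import Polygon

# Hypothetical traced outlines of one raft in each year, coordinates in metres
raft_2015 = Polygon([(0, 0), (20, 0), (20, 10), (0, 10)])  # 20 m x 10 m
raft_2021 = Polygon([(0, 0), (25, 0), (25, 10), (0, 10)])  # 25 m x 10 m

area_2015 = raft_2015.area  # square metres
area_2021 = raft_2021.area
increase = (area_2021 - area_2015) / area_2015 * 100
print(f"Area increase: {increase:.0f}%")
```

Summing such areas over all traced polygons per year gives the totals behind the published percentage.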
27. WHAT WE DISCOVERED
After a lot of work on data cleaning and transforming, we had some findings:
The total area covered by commercial floating objects on Belgrade’s rivers increased by more than 25% from 2015 to 2021
The total number of floating objects increased by more than 20% over the same period