6. #DPL15 | @sarahcat21
Private companies do not have strong incentives
(e.g. legal obligations) to share data. Many may
have competitive incentives to obfuscate it.
Investors may request non-disclosure.
11. #DPL15 | @sarahcat21
Investors ask questions like:
Which startups might raise capital in the next 6 months?
What startups is
12. #DPL15 | @sarahcat21
Our data analysts seek to understand:
Why does this question matter?
What data is required to answer this question?
Where can this data be accessed?
13. #DPL15 | @sarahcat21
Next, data analysts:
Define repeatable processes for data collection.
Determine whether processes can be replicated
through web scraping and/or machine learning
algorithms to collect data at scale.
Write functional specifications, reviewed by
sales and engineering team members.
14. #DPL15 | @sarahcat21
Next, web and/or machine learning developers:
Write dev designs, reviewed by data analysts.
Upon implementation and marketing release,
this data becomes available to customers.
New questions arise and the cycle starts again.
17. #DPL15 | @sarahcat21
Problems with existing sources
Rely on wiki-style data collection (cannot confirm
the credibility of sources)
News reports are better, but:
○ facts are harder to extract
○ different sources report different figures
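
The last point can be illustrated with a minimal sketch, assuming each news report has already been reduced to a (source, amount) pair. The function name, source names, and figures are hypothetical and not taken from the talk; the sketch only shows one way to surface disagreement between sources so an analyst can review it.

```python
from collections import Counter

def summarize_reported_amounts(reports):
    """Group reported funding amounts and flag disagreement between sources.

    `reports` is a list of (source_name, amount_usd) pairs (assumed shape).
    """
    if not reports:
        raise ValueError("at least one report is required")
    counts = Counter(amount for _, amount in reports)
    # most_common(1) returns the amount reported by the most sources
    most_reported, support = counts.most_common(1)[0]
    return {
        "most_reported": most_reported,
        "support": support,            # number of sources agreeing on it
        "conflict": len(counts) > 1,   # True if sources disagree
    }

# Illustrative data only
reports = [
    ("outlet_a", 10_000_000),
    ("outlet_b", 10_000_000),
    ("outlet_c", 12_000_000),
]
summary = summarize_reported_amounts(reports)
```

Here `summary["conflict"]` is true because one outlet reports a different figure, which is the kind of case a human analyst would need to resolve.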
18. #DPL15 | @sarahcat21
Solution: funding automation
A new framework for collecting and synthesizing funding data:
News article fact extraction (machine learning)
Funding override system (web engineering)
Funding confirmation email campaign
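
As a rough sketch of how these three components might fit together: the slides do not describe the implementation, so everything below is an assumption. A regular expression stands in for the machine-learning extractor, and the precedence order (company confirmation, then analyst override, then extracted value) is a guess at a sensible policy. `extract_amount`, `FundingRecord`, and the example sentence are hypothetical names and data.

```python
import re
from dataclasses import dataclass
from typing import Optional, Tuple

# Simple stand-in for ML fact extraction: matches "$12.5 million", "$2 billion"
_AMOUNT = re.compile(r"\$(\d+(?:\.\d+)?)\s*(million|billion)", re.IGNORECASE)
_SCALE = {"million": 1_000_000, "billion": 1_000_000_000}

def extract_amount(sentence: str) -> Optional[int]:
    """Return the first dollar amount found in the sentence, or None."""
    match = _AMOUNT.search(sentence)
    if match is None:
        return None
    return round(float(match.group(1)) * _SCALE[match.group(2).lower()])

@dataclass
class FundingRecord:
    extracted: Optional[int] = None   # from news article fact extraction
    override: Optional[int] = None    # entered by an analyst
    confirmed: Optional[int] = None   # confirmed by the company via email

    def best_value(self) -> Optional[Tuple[str, int]]:
        # Assumed precedence: confirmed > override > extracted
        for label, value in (
            ("confirmed", self.confirmed),
            ("override", self.override),
            ("extracted", self.extracted),
        ):
            if value is not None:
                return label, value
        return None

record = FundingRecord(
    extracted=extract_amount("Acme raised $12.5 million in a Series A round.")
)
first = record.best_value()
record.override = 12_000_000          # analyst corrects the extracted figure
second = record.best_value()
```

Keeping all three values on the record, rather than overwriting one with another, leaves an audit trail of where each figure came from.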
26. #DPL15 | @sarahcat21
Where we struggled
Our initial implementation of a funding override
system was inefficient. Why?
Because our data analysts and developers were
not aligned on functional requirements.
27. #DPL15 | @sarahcat21
Analysts must work closely with developers
○ Pre-spec check-ins
○ Analysts review dev designs to ensure that
the system design addresses the use case.
Analysts must avoid being prescriptive
Analysts must understand data mining and
machine learning concepts
28. #DPL15 | @sarahcat21
Where we succeeded
Implementation of news article fact extraction
was successful. Why?
Because data analysts and developers worked as
service providers to each other.