A preview of my talk about Data Strategies - what they are, how to implement one and what to do if you need to tame your Data Chaos. Includes tools and architecture examples!
What we'll do
Define Data Strategy
Identify organisational symptoms of having no Data Strategy, or an insufficient one
Learn how to move towards data maturity
Understand Data Architecture Principles & Tooling
What we won't do
Bash BigQuery - we still love BigQuery
"It takes hours for me to put this report together"
"These reports are supposed to match but never do, it's just like that"
"This field is repeated in multiple places"
"Check with <ultimate data person here> - they know the data"
"Oh, I thought this data explained this but you're using it for that"
"The analytics are quite slow"
"Remember to refer to this extensive list of data quirks"
"We can't really trust the data"
All your data is in a tool, not in files
Data Systems are slowing down (e.g. custom reporting, dashboards, ETLs)
Data flow is not documented
Little to no Load Testing
No schema management, not even manual checks
Data Scientists, Analysts and others using the data spend more than half their time cleaning and normalising it, or there are no Engineers at all
Hardly any access control - relying on trust
The same data is being maintained in different places by different teams
Evidence in Data Systems
Having a data dumping ground
Re-implementing the same transformations all over the organisation
Putting up with painfully slow reports, queries and dashboards
Having no idea where the data is used
No clear data owners
Not being able to present your data in different formats easily
No planning, just dealing with issues and projects as they crop up
Keeping data to maybe use it one day
What a Data Strategy is NOT
Understand:
How data supports your
Business Strategy
Things we want to achieve with data
Things we have to do with our data
Offensive
Defensive
What not to do
Do not panic
Do not stop building things
Things to do first
Add timestamps and other audit & lineage metadata
Understand your org's data flow
Find data owners
Understand where access problems are - try to mitigate them with the access controls you currently have
Start a Data Dictionary
Start storing important Raw data files
Consider a Data Guild
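As a rough illustration of two of the items above (adding audit & lineage metadata, and starting a Data Dictionary), here is a minimal Python sketch. The metadata field names (`_ingested_at`, `_source_system`, `_job_name`, `_raw_sha256`), the `DictionaryEntry` structure and both helper functions are hypothetical, chosen for this example rather than taken from any particular tool, and records are assumed to be JSON-serialisable dicts.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
import hashlib
import json


@dataclass
class DictionaryEntry:
    """One row of a hypothetical Data Dictionary: what a field means and who owns it."""
    field_name: str
    description: str
    owner: str          # team or person accountable for this field
    source_system: str  # system the field originates from


def add_audit_metadata(record: dict, source_system: str, job_name: str) -> dict:
    """Return a copy of `record` with illustrative audit & lineage fields attached.

    Assumes `record` is JSON-serialisable; the original dict is not modified.
    """
    # Hash a canonical (key-sorted) serialisation of the raw record so the
    # loaded row can later be matched against a stored raw file.
    raw = json.dumps(record, sort_keys=True).encode("utf-8")
    enriched = dict(record)
    enriched["_ingested_at"] = datetime.now(timezone.utc).isoformat()
    enriched["_source_system"] = source_system
    enriched["_job_name"] = job_name
    enriched["_raw_sha256"] = hashlib.sha256(raw).hexdigest()
    return enriched


def undocumented_fields(record: dict, dictionary: list[DictionaryEntry]) -> set[str]:
    """Return business fields in `record` with no Data Dictionary entry.

    Fields starting with "_" are treated as metadata and skipped.
    """
    documented = {entry.field_name for entry in dictionary}
    return {name for name in record if not name.startswith("_") and name not in documented}


# Usage with made-up data: only `order_id` has been documented so far.
dictionary = [
    DictionaryEntry("order_id", "Unique order identifier", "Sales Ops", "shop_db"),
]
source_row = {"order_id": 42, "amount": 9.99}
row = add_audit_metadata(source_row, "shop_db", "daily_orders_load")
print(undocumented_fields(row, dictionary))  # {'amount'}
```

Keeping the metadata as extra fields on each record means it travels with the data into whatever tool stores it, and a check like `undocumented_fields` could run inside a pipeline to flag new fields that nobody has described or claimed ownership of yet.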