Data Con LA 2020
Description
Understand the data product lifecycle and ensure your data is set up for success
To get the most out of your data team, it is imperative to understand the infrastructure needs at every step of the data product lifecycle. In my presentation we'll cover:
● Collect the Right Data: Collect what you will want in the future, not just what you need now
● Silo to Warehouse: Consolidating disparate data sources and establishing a source of truth
● Setting Your Team Up for Success: Development Platform and DataOps
● Don't Forget to A.I.M.: Thinking about product adoption, implementation, and monitoring
● So What?: Tracking impact and making the case for more data
Speaker
Kisa Brostrom, boodleAI, Vice President of Data
2. Your Tour Guide
Kisa Brostrom - VP of Data at boodleAI
● Bachelor of Mechanical Engineering: May ‘15
● Data Engineer/Scientist @ Rolls-Royce: Oct ‘15 - Oct ‘17
● Senior Data Scientist/Head of Analytics @ Cannella Media: Oct ‘17 - Jan ‘19
● Senior Data Scientist @ PatientPop: Jan ‘19 - April ‘20
● Vice President of Data @ boodleAI: July ‘20 - Present
32yo, wife, mom of 3 (7, 4, 5mo), baker, dancer, entrepreneur, carpenter, mechanic, teacher, problem-solver
3. Come Along on the Journey - Data Project Lifecycle
Data Collection → Data Curation → Analysis & Development → Adoption, Implementation, & Monitoring → Impact & Education
Most companies focus on one or two aspects of data projects, but failure to support the entire lifecycle will result in a failure to use data to its maximum benefit.
4. Collect the Right Data
SYNOPSIS: Collect the data you want in the future, not what you’ll use now
Data collection requirements are more than what you need to accomplish your analysis. Spend time at the beginning of each project thinking about:
● What metrics can measure impact? What data do we need?
● What subsequent analyses can result from this project? What data do we need?
● Do we have historical records of this data or should we?
● Where can we get that data?
CONSIDERATIONS:
● Data Sources:
○ Web Scrapers
○ APIs
○ Databases
● Data Model
● Pipelines
● Personnel
○ Data Engineers
○ Data Ops
○ Product Engineers
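One way to make sure historical records exist when a later analysis needs them is to append a timestamped snapshot on every collection run instead of overwriting the latest value. The following is a minimal sketch under that assumption, using Python's built-in sqlite3 module; the table name `metric_history`, the source name `crm_api`, and the metric name are hypothetical and not from the talk.

```python
import sqlite3
from datetime import datetime, timezone


def make_store(path=":memory:"):
    """Open (or create) a SQLite store with an append-only history table."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS metric_history (
               source TEXT NOT NULL,
               metric TEXT NOT NULL,
               value REAL NOT NULL,
               collected_at TEXT NOT NULL
           )"""
    )
    return conn


def record_snapshot(conn, source, metric, value, collected_at=None):
    """Append one observation; earlier observations are never overwritten."""
    collected_at = collected_at or datetime.now(timezone.utc).isoformat()
    conn.execute(
        "INSERT INTO metric_history (source, metric, value, collected_at) "
        "VALUES (?, ?, ?, ?)",
        (source, metric, value, collected_at),
    )
    conn.commit()


def history(conn, metric):
    """Return every stored observation of a metric, oldest first."""
    rows = conn.execute(
        "SELECT collected_at, value FROM metric_history "
        "WHERE metric = ? ORDER BY collected_at",
        (metric,),
    )
    return rows.fetchall()


# Hypothetical usage: two daily runs of the same collector.
conn = make_store()
record_snapshot(conn, "crm_api", "active_customers", 120, "2020-10-01T00:00:00+00:00")
record_snapshot(conn, "crm_api", "active_customers", 135, "2020-10-02T00:00:00+00:00")
print(history(conn, "active_customers"))
```

Because rows are only ever appended, a trend analysis that nobody planned for on day one can still be run later from the same table.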
5. Silo to Warehouse
SYNOPSIS: Consolidate disparate data sources and establish a source of truth
When companies are first created, data is usually stored in silos. Careful consideration has to be taken when combining disparate data sources. You have to establish:
● Where is the data stored?
● In what format is the data stored?
● What data model best supports the types of analyses we do or will do?
● How do we maintain an accurate sync, and how often should it occur?
● What is the hierarchy of data sources?
CONSIDERATIONS:
● Data Sources
○ APIs, S3, Databases (NoSQL/SQL)
● Data Model
○ Time Series, Historical State, etc.
● Pipelines
○ Integrations with common tools: Salesforce, Shopify, Zuora, Marketo, etc.
○ Sync Schedule (Fivetran, Lambda, etc.)
● Source of Truth
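A hierarchy of data sources can be written down as an explicit precedence list that decides which source wins when two silos disagree about the same field. The sketch below is one possible way to do that, not a method from the talk; the source names (`billing_db`, `crm`, `marketing_tool`) and the field names are hypothetical.

```python
# Hypothetical precedence: earlier sources win when two sources disagree.
SOURCE_PRIORITY = ["billing_db", "crm", "marketing_tool"]


def resolve_record(records_by_source, priority=SOURCE_PRIORITY):
    """Merge one entity's records field by field.

    Each field is taken from the highest-priority source that has a
    non-empty value for it. Returns the merged record and, for every
    field, the name of the source it came from.
    """
    merged = {}
    provenance = {}
    for source in priority:
        record = records_by_source.get(source, {})
        for field, value in record.items():
            if field not in merged and value not in (None, ""):
                merged[field] = value
                provenance[field] = source
    return merged, provenance


# Hypothetical usage: three silos describing the same customer.
records = {
    "crm": {"email": "a@example.com", "plan": "basic", "phone": "555-0100"},
    "billing_db": {"email": "a@example.com", "plan": "pro", "phone": None},
    "marketing_tool": {"email": "old@example.com", "utm_source": "newsletter"},
}
merged, provenance = resolve_record(records)
print(merged)
print(provenance)
```

Keeping the provenance next to the merged record makes it possible to answer "where did this value come from?" when the warehouse and a silo disagree.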
6. Setting Up Your Team for Success
SYNOPSIS: Data teams need a platform for access and ownership
Data development takes many forms and, at its best, is dynamic, fast, and evolving. Security is important and shouldn’t be compromised. To ensure the proper balance between speed and security:
● Assign a central owner for ALL data within a company
● Create a secure development platform that SCALES
● Ensure easy access to and integration of data from the development platform
● Security shouldn’t be compromised, but neither should accessibility
CONSIDERATIONS:
● Development Platform
○ Scalability - more power for data-intensive projects
○ Accessible to and shared between multiple people
○ Easy transfer from development to production
● Ownership
● Security
○ Secure but easy access to data
● Personnel
○ Data Ops
7. Don’t Forget to A.I.M.
SYNOPSIS: At the beginning of the project, create an adoption plan, an implementation plan, and a maintenance plan
Adoption, Implementation, and Maintenance are perhaps the most forgotten aspects of data projects. A great model is WORTHLESS unless people use it. At the beginning of the project, ask:
● What is the final output of the project going to be?
● Who is going to use it?
● How do I get it to them? Is this a one-time or ongoing requirement?
● What happens when it goes wrong? How do we know if it goes wrong?
CONSIDERATIONS:
● ADOPTION:
○ Advocacy/Buy-in
○ Integration into existing tools
○ Training
● IMPLEMENTATION:
○ BI Tool / Dashboards
○ Reports
○ Productionalizing Code
○ Scheduling/Recurrence
● MAINTENANCE:
○ Retraining
○ Duplicating Code
○ Logs and Alerts
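The question "How do we know if it goes wrong?" can be answered in part by a small health check that runs after every scheduled job and logs a problem when the output looks stale or empty. The following is a minimal sketch using Python's standard logging module; the function name `check_health`, the logger name, and the thresholds (24 hours, at least 1 row) are assumptions chosen for illustration.

```python
import logging
from datetime import datetime, timedelta, timezone

logger = logging.getLogger("data_product_monitor")  # hypothetical logger name


def check_health(last_run_at, row_count, now=None,
                 max_age=timedelta(hours=24), min_rows=1):
    """Return a list of problems; an empty list means the job looks healthy."""
    now = now or datetime.now(timezone.utc)
    problems = []
    age = now - last_run_at
    if age > max_age:
        problems.append(f"stale: last successful run was {age} ago")
    if row_count < min_rows:
        problems.append(
            f"too few rows: got {row_count}, expected at least {min_rows}"
        )
    for problem in problems:
        # Attach an alerting handler (email, chat, pager) to this logger.
        logger.error(problem)
    if not problems:
        logger.info("healthy: %d rows, last run at %s",
                    row_count, last_run_at.isoformat())
    return problems


# Hypothetical usage: a job that last succeeded 30 hours ago and wrote nothing.
logging.basicConfig(level=logging.INFO)
now = datetime(2020, 10, 24, 12, 0, tzinfo=timezone.utc)
print(check_health(now - timedelta(hours=30), 0, now=now))
```

Returning the problems as a list, in addition to logging them, lets the scheduler decide whether to retry, page someone, or block a downstream dashboard refresh.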
8. So What?
SYNOPSIS: Track your impact and make the case for more data
At the end of the day, a data team is an expensive resource for the company. ROI, education, and advocacy play a HUGE part in the ongoing health of your team. At the end of your project, make sure you:
● Have a plan to measure impact: finite, defined impact metrics that tie back to $$$
● Publish and PARADE your results
● Explain why people should care (again, tie back to $$$ and how it helps them)
● Explore the possible: cast a vision for next steps and additional projects
CONSIDERATIONS:
● Educate
○ Info Sessions
○ Blog
○ White Papers
○ Case Studies
● Advocate
○ Stakeholders
○ Possibilities
● Publish
○ Dashboards
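An impact metric that ties back to dollars can be as simple as comparing a conversion rate with and without the data product and pricing the difference. The sketch below assumes that framing; the function name, its parameters, and every number in the usage example are hypothetical illustrations, not results from the talk.

```python
def impact_summary(baseline_rate, treated_rate, audience_size,
                   value_per_conversion, project_cost):
    """Translate a change in conversion rate into dollars and ROI.

    All inputs are supplied by the caller; nothing here is measured.
    ROI is (incremental value - project cost) / project cost.
    """
    if project_cost <= 0:
        raise ValueError("project_cost must be positive")
    extra_conversions = (treated_rate - baseline_rate) * audience_size
    incremental_value = extra_conversions * value_per_conversion
    roi = (incremental_value - project_cost) / project_cost
    return {
        "extra_conversions": round(extra_conversions, 2),
        "incremental_value": round(incremental_value, 2),
        "roi": round(roi, 2),
    }


# Hypothetical usage with made-up numbers.
summary = impact_summary(
    baseline_rate=0.02,       # conversion rate before the project
    treated_rate=0.03,        # conversion rate with the model in use
    audience_size=10_000,     # people reached
    value_per_conversion=50,  # dollars per conversion
    project_cost=2_000,       # dollars spent on the project
)
print(summary)
```

Agreeing on the inputs with stakeholders at the start of the project makes the resulting number much easier to publish and defend at the end.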
9. Come Along on the Journey - Data Project Lifecycle
● Data Collection: Public data collected via web scrapers and APIs
● Data Curation: Host data in Google Sheets or local SQLite
● Analysis & Development: SageMaker
● Adoption, Implementation, & Monitoring: Build your own dashboards in Data Studio & recurring jobs via SageMaker
● Impact & Education: Lunch and Learn sessions; engage early and often; tie to OKRs
FOR DATA ANALYSTS OR SCIENTISTS:
If you don’t have support externally, you can always support yourself.