An overview of the tools, engineering infrastructure, and decisions that we’ve been making in concert with our newsroom and executive leadership.
● What tools might you use and why? Python, Scala, Spark, AWS, Google Cloud Platform (GCP), BigQuery, Redshift, Tableau, SQL?
● What engineering tradeoffs do you have to make when building data science platforms and choosing big data tools?
● How can you empower analysts, engineers, data scientists, and privacy advocates to build a data-driven culture at a company of any size?
● How do you recruit, train, and retain world-class data talent against competitors like Netflix, Google, and Snap?
2. Introduction
Who am I?
● Stan Dyro, Lead Data Engineer at the Los Angeles Times
● Former Lead Engineer at VideoAmp
● Technologist, programmer
● Creative problem solver
Why should you care about data?
● Software is eating the world
● ...and “data is the fuel.”
● We’re hiring!
3. Technology Direction
● As technology leaders or individual contributors, we have a lot of decisions to make:
Leaders
● Who to hire
● Structure of teams
● What culture to set
Engineers
● Who to connect with
● Which companies to work for
● Which tools to learn
Product
● How to help your company
● How to make your company better
● How to help your teams
4. What’s important and what’s not
● Teams are important.
● People stick around for good bosses. Studies bear this out.
● People value the little things.
5. What’s important and what’s not
The opposite of microaggressions is microprotections:
● Highly paid tech professionals often value the impact of their contributions over money.
Examples
● Upward mobility
● Ability to learn
● Ability to be proud of what they do
For example:
● Our culture at the LA Times is to “inform, engage and empower.”
● Unless you’re FAANG or flush with VC cash, $400,000 salaries are not an option, so lean on what makes you different.
6. Measure all the things
So let’s talk about data.
Now that you’ve built, or helped build, a great team, what can you measure about your business?
● Your Customer
● Your Web Traffic
● Your Finances
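As a minimal sketch of measuring web traffic, the Python below counts page views and unique visitors per day. It assumes a hypothetical list of page-view events with `visitor_id`, `path`, and `day` fields; this event format and the `daily_traffic` helper are illustrative only, not part of any specific analytics product.

```python
from collections import defaultdict

# Hypothetical page-view events; in practice these would come from an
# analytics pipeline rather than a hard-coded list.
events = [
    {"visitor_id": "a", "path": "/news/1", "day": "2019-05-01"},
    {"visitor_id": "a", "path": "/news/2", "day": "2019-05-01"},
    {"visitor_id": "b", "path": "/news/1", "day": "2019-05-01"},
    {"visitor_id": "a", "path": "/sports/3", "day": "2019-05-02"},
]


def daily_traffic(events):
    """Return {day: (page_views, unique_visitors)}, ordered by day."""
    views = defaultdict(int)      # total page views per day
    visitors = defaultdict(set)   # distinct visitor ids per day
    for e in events:
        views[e["day"]] += 1
        visitors[e["day"]].add(e["visitor_id"])
    return {day: (views[day], len(visitors[day])) for day in sorted(views)}


for day, (pv, uv) in daily_traffic(events).items():
    print(f"{day}: {pv} page views, {uv} unique visitors")
```

The same two numbers (volume and distinct audience) are the starting point for most traffic dashboards; customer and finance metrics follow the same pattern of grouping raw events by a key.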
7. Structure of a data stack
● Presentation - visual tools, usually BI tools and static visualizations
● Databases - Relational data, big data, caching
● Workflow - Workflow tools, programming languages, code repositories
● Storage - Scalable storage, cloud storage options
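The four layers above can be illustrated end to end with a small sketch using only the Python standard library. Here an in-memory CSV string stands in for cloud storage, plain functions stand in for a workflow tool, `sqlite3` stands in for a real database or warehouse, and a printed summary stands in for a BI tool. The CSV columns, the `traffic` table, and the helper names are hypothetical; a production stack would swap each piece for tools like those listed in this talk.

```python
from __future__ import annotations

import csv
import io
import sqlite3

# Storage layer (hypothetical): raw CSV as it might land in cloud storage.
RAW_CSV = """date,section,page_views
2019-05-01,news,120
2019-05-01,sports,45
2019-05-02,news,98
"""


def extract(raw: str) -> list[tuple[str, str, int]]:
    """Workflow layer: parse the raw CSV and cast page_views to int."""
    reader = csv.DictReader(io.StringIO(raw))
    return [(r["date"], r["section"], int(r["page_views"])) for r in reader]


def load(rows: list[tuple[str, str, int]], conn: sqlite3.Connection) -> None:
    """Database layer: load the parsed rows into a relational table."""
    conn.execute("CREATE TABLE traffic (date TEXT, section TEXT, page_views INTEGER)")
    conn.executemany("INSERT INTO traffic VALUES (?, ?, ?)", rows)
    conn.commit()


def report(conn: sqlite3.Connection) -> list[tuple[str, int]]:
    """Presentation layer: the kind of summary a BI tool might chart."""
    return conn.execute(
        "SELECT section, SUM(page_views) FROM traffic "
        "GROUP BY section ORDER BY section"
    ).fetchall()


conn = sqlite3.connect(":memory:")
load(extract(RAW_CSV), conn)
for section, total in report(conn):
    print(f"{section}: {total}")
```

Keeping each layer behind its own function is the point of the design: you can replace the CSV string with object storage or `sqlite3` with a warehouse without rewriting the other layers.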
8. Why it’s easier than ever
It’s simple.
Open source.
We stand on the shoulders of giants in our field: Hadoop, Spark, Hive, Google Cloud, AWS. The infrastructure is at our fingertips.
It’s cheap and easy to launch a software business, so focus on what matters: your customers.
9. Data Tools
Databases
● Relational databases
● Data warehouses
● Data lakes
Programming Languages
● SQL
● Python
● Ruby
● JavaScript / Node.js
● C#
● Java
● Scala
13. Tools - Data Analytics
Business Intelligence
● Tableau Online
● Looker
● Power BI
Visualization
● Tableau
● RStudio
● Python libraries
● D3 / JavaScript