Databricks: A Tool That Empowers You To Do More With Data

In this talk, we present how Databricks has enabled the author to do more with data: it lets one person build a coherent data project spanning data engineering, data analysis, and data science, with better collaboration, better productionization methods, larger datasets, and faster turnaround.

The talk includes a demo illustrating how Databricks' features combine into one coherent data project: Databricks Jobs, Delta Lake, and Auto Loader for data engineering; SQL Analytics for data analysis; Spark ML and MLflow for data science; and Projects for collaboration.

  1. Databricks: Empowering you to do more with data. Francois Callewaert, Senior Data Scientist
  2. Agenda: Intro; Data Lifecycle; Demo
  3. My career in data tools (Before / Now): EDA-ML / EDA++, sharing; Build and refresh own datasets / Build and refresh own datasets in Python; Azure SQL DB, Azure Data Factory, Azure VM, Windows Scheduler; Model-serving and delivery: Azure ML
  4. Data lifecycle: Software (app, website, IoT…) → user signals → raw logs (Bronze) → clean logs: de-dup, PII removal… (Silver) → business data: aggregates, joins, metrics, features… (Gold) → analytics and modelling → business decisions and recommender systems. Roles: Software Engineer, Data Engineer, Data Analyst, Data Scientist, ML Engineer, Program Manager (DE/DA/DS).
  5. Demo
     • Dataset: https://www.kaggle.com/mkechinov/ecommerce-behavior-data-from-multi-category-store
     • User actions: VIEW, CART, PURCHASE
     • Data Engineer: Delta, Auto-loader
     • Data Analyst: Spark SQL, Notebooks, Jobs, SQL Analytics
     • Data Scientist: pySpark, Notebooks, Jobs
     • GitHub: https://github.com/databricks/tech-talks/tree/master/samples/2020-09-16%20%7C%20eCommerce%20demo
  6. Data Engineer: Software Engineer → cloud bucket (raw logs), where CSV files are appended every few seconds → Auto-loader + JOIN with a dimension table (id → name) → clean event logs of eCommerce events (BUY, VIEW, CART)
  7. Data Analyst: clean event logs (from the Data Engineer) → Notebook (EDA) → Job → product metrics (business data) → SQL Analytics dashboard
  8. Data Scientist: clean event logs (from the Data Engineer) → Notebook (EDA, feature engineering) → Job → feature table → MLflow experiment → prod model
  9. Conclusion: Do more with data. Contact: francois.callewaert@databricks.com
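Slide 4's lifecycle turns raw logs (Bronze) into clean logs (Silver) by de-duplicating and removing PII. The following is a minimal plain-Python sketch of that step, not the demo's code: the field names, the `PII_FIELDS` set, and the de-dup key are illustrative assumptions. On Databricks the same logic would more likely be a Spark job (for example with `DataFrame.dropDuplicates`) writing to a Delta table.

```python
from typing import Iterable

# Hypothetical PII columns present in raw (Bronze) records.
PII_FIELDS = {"email", "ip_address"}


def bronze_to_silver(raw_events: Iterable[dict]) -> list[dict]:
    """De-duplicate raw events and drop PII columns, keeping first occurrences in order."""
    seen = set()
    clean = []
    for event in raw_events:
        # Assumed de-dup key: identical time, action, product and user.
        key = (event["event_time"], event["event_type"],
               event["product_id"], event["user_id"])
        if key in seen:
            continue  # exact replay of an event already kept
        seen.add(key)
        clean.append({k: v for k, v in event.items() if k not in PII_FIELDS})
    return clean


raw = [
    {"event_time": "2019-10-01 00:00:00", "event_type": "VIEW",
     "product_id": 1, "user_id": 42, "email": "a@example.com"},
    {"event_time": "2019-10-01 00:00:00", "event_type": "VIEW",
     "product_id": 1, "user_id": 42, "email": "a@example.com"},  # duplicate
    {"event_time": "2019-10-01 00:01:00", "event_type": "CART",
     "product_id": 1, "user_id": 42, "email": "a@example.com"},
]
silver = bronze_to_silver(raw)
print(len(silver), sorted(silver[0]))  # 2 ['event_time', 'event_type', 'product_id', 'user_id']
```

Keeping Bronze untouched and writing the cleaned rows to a separate Silver table means the cleaning rules can be changed and re-run later without losing the raw history.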
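Slide 6 has Auto-loader pick up CSV files as they are appended to the cloud bucket and join them with a dimension table. Databricks Auto Loader is a streaming source that records already-ingested files in a checkpoint; the sketch below only imitates that idea in plain Python, with a `set` standing in for the checkpoint and a `dict` standing in for the dimension table. The column names, category ids, and the `ingest_new_files` helper are hypothetical.

```python
import csv
import tempfile
from pathlib import Path

# Hypothetical dimension table: category id -> human-readable name.
CATEGORY_NAMES = {"10": "electronics.smartphone", "20": "appliances.kitchen"}


def ingest_new_files(bucket: Path, processed: set[str]) -> list[dict]:
    """Read only CSV files not seen before and enrich each row via a lookup join."""
    enriched = []
    for path in sorted(bucket.glob("*.csv")):
        if path.name in processed:
            continue  # already ingested in an earlier run
        with path.open(newline="") as f:
            for row in csv.DictReader(f):
                # Left join: unknown ids map to None instead of dropping the row.
                row["category_name"] = CATEGORY_NAMES.get(row["category_id"])
                enriched.append(row)
        processed.add(path.name)  # stand-in for the checkpoint
    return enriched


# Usage: simulate a bucket that receives a new CSV file between two runs.
with tempfile.TemporaryDirectory() as tmp:
    bucket = Path(tmp)
    processed: set[str] = set()
    header = "event_time,event_type,product_id,category_id,user_id\n"
    (bucket / "part-000.csv").write_text(header + "2019-10-01 00:00:00,VIEW,1,10,42\n")
    first = ingest_new_files(bucket, processed)
    (bucket / "part-001.csv").write_text(header + "2019-10-01 00:05:00,PURCHASE,1,10,42\n"
                                         "2019-10-01 00:06:00,VIEW,2,99,7\n")
    second = ingest_new_files(bucket, processed)

print(len(first), len(second))  # 1 2
```

A left join was chosen so that events with a category missing from the dimension table are kept (with an empty name) rather than silently lost.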
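Slide 7 has the Data Analyst turn clean event logs into product metrics for a SQL Analytics dashboard. As an illustration only, one such metric could be a daily funnel of VIEW, CART, and PURCHASE counts; the metric names and the purchase-per-view ratio below are assumptions, and the plain Python stands in for what would be a `GROUP BY` date query in Spark SQL in the demo.

```python
from collections import Counter, defaultdict


def daily_funnel(events: list[dict]) -> dict[str, dict[str, float]]:
    """Count VIEW/CART/PURCHASE per day and derive a purchase-per-view ratio."""
    counts = defaultdict(Counter)
    for e in events:
        day = e["event_time"][:10]  # assumes "YYYY-MM-DD hh:mm:ss" timestamps
        counts[day][e["event_type"]] += 1
    metrics = {}
    for day, c in sorted(counts.items()):
        views = c["VIEW"]
        metrics[day] = {
            "views": views,
            "carts": c["CART"],
            "purchases": c["PURCHASE"],
            # Guard against division by zero on days without views.
            "purchase_per_view": c["PURCHASE"] / views if views else 0.0,
        }
    return metrics


events = [
    {"event_time": "2019-10-01 08:00:00", "event_type": "VIEW"},
    {"event_time": "2019-10-01 08:01:00", "event_type": "VIEW"},
    {"event_time": "2019-10-01 08:02:00", "event_type": "CART"},
    {"event_time": "2019-10-01 08:03:00", "event_type": "PURCHASE"},
    {"event_time": "2019-10-02 09:00:00", "event_type": "VIEW"},
]
metrics = daily_funnel(events)
print(metrics["2019-10-01"])
```

Scheduling this aggregation as a Job, as the slide suggests, keeps the dashboard reading a small precomputed table instead of rescanning the full event log.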
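Slide 8 has the Data Scientist build a feature table from the clean event logs before tracking model experiments in MLflow. A minimal sketch of the feature-engineering step, assuming per-user counts, total spend, and a cart rate as features (a hypothetical choice, not the demo's feature set); in the demo this would be pySpark writing a table whose rows feed model training logged to MLflow.

```python
from collections import defaultdict


def user_features(events: list[dict]) -> dict[str, dict[str, float]]:
    """Aggregate clean events into one feature row per user (hypothetical features)."""
    acc = defaultdict(lambda: {"n_views": 0, "n_carts": 0, "n_purchases": 0, "spend": 0.0})
    for e in events:
        row = acc[e["user_id"]]
        if e["event_type"] == "VIEW":
            row["n_views"] += 1
        elif e["event_type"] == "CART":
            row["n_carts"] += 1
        elif e["event_type"] == "PURCHASE":
            row["n_purchases"] += 1
            row["spend"] += float(e["price"])
    # Derived ratio feature; users with no views get 0.0.
    for row in acc.values():
        row["cart_rate"] = row["n_carts"] / row["n_views"] if row["n_views"] else 0.0
    return dict(acc)


events = [
    {"user_id": "u1", "event_type": "VIEW", "price": "10.0"},
    {"user_id": "u1", "event_type": "VIEW", "price": "10.0"},
    {"user_id": "u1", "event_type": "CART", "price": "10.0"},
    {"user_id": "u1", "event_type": "PURCHASE", "price": "10.0"},
    {"user_id": "u2", "event_type": "VIEW", "price": "5.5"},
]
features = user_features(events)
print(features["u1"])
```

Materializing features as a table, rather than recomputing them inside each training notebook, lets every MLflow experiment train on the same inputs.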
