Data Engineering & Data Science
• Please install or update to the latest Zoom app
• Otherwise you cannot join our breakout rooms
• https://zoom.us/download
• Start at 09:05 🕘
WELCOME!
Intro
LAURENS VIJNCK - DATA SHEET
• Data Engineer/Data Scientist
• 1 year at Selligent
• Master thesis on Streaming Analytics
• Interests: Streaming, Distributed Computing
JONNY DAENEN - DATA SHEET
• Data Engineer/Scientist
• 4 years at Selligent
• PhD Computer Science
• Focus = Data, Cloud
TAKEAWAY
• Cloud Technology alleviates operational burden
• DevOps is a state of mind
• Road to production: Devil is in the details
AGENDA
• Intro
• Selligent & Data
• Use Case 1
• Real-time Data Analysis
• Big Data Tech in Google Cloud
• Data Engineering @ Selligent
• Roles & Tools
• Use Case 2
• Visual Data Exploration
• Reports in Data Studio
• Short presentation
INTERACTIVITY
• Raise your hand
• Use the chat
Selligent
BUILDING CONTENT
Data
What is data?
What data types do you know?
Why is data useful?
When is data big data?
Big Data - the 3 V’s
Which data tech do you know?
DATA & AI LANDSCAPE 2020
[Figure: Data & AI Landscape 2020. Sections: Infrastructure; Analytics & Machine Intelligence; Applications (Enterprise); Applications (Industry); Open Source; Data Sources & APIs; Data Resources. © Matt Turck (@mattturck) & FirstMark (@firstmarkcap), mattturck.com/data2020, Version 1.0 - September 2020]
Facts
4k interactions per second
250M per day
150M emails per day
99.7% deliverability rate
Image courtesy of Monica Rogati
Use Case 1
Analyze interaction data in real time
DATA SOURCE: BEHAVIORAL DATA
Show interest
Receive notification
Walk in store
Buy in store
Visit homepage
Install app
Open app
Demo: Part 1
STORAGE - BIGQUERY
• Columnar Storage
• No ops (serverless)
• Pay for storage
• Pay per byte queried (columns & time touched)
• Data Market
BigQuery
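The "pay per byte queried" bullet follows from columnar storage: a query reads only the columns it references. A minimal plain-Python sketch of that idea follows. It is not BigQuery's storage format or billing logic, and the table, column names, and sizes are made up for illustration.

```python
# Toy columnar table: every column is stored as its own list.
# Column names and values are hypothetical.
TABLE = {
    "user_id": [1, 2, 3, 4],
    "channel": ["email", "push", "email", "sms"],
    "payload": ["x" * 100, "y" * 100, "z" * 100, "w" * 100],
}


def column_bytes(values):
    """Rough column size: UTF-8 bytes of the string form of each value."""
    return sum(len(str(value).encode("utf-8")) for value in values)


def bytes_scanned(table, columns):
    """Bytes a query would read if it touches only the given columns."""
    return sum(column_bytes(table[name]) for name in columns)


narrow = bytes_scanned(TABLE, ["user_id"])  # only one small column
wide = bytes_scanned(TABLE, list(TABLE))    # every column, like SELECT *
print(narrow, wide)
```

Selecting only the columns a question needs keeps the scanned bytes, and therefore the cost in this toy model, far below a query that reads every column.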
INGEST - PUB/SUB
• Event enters system
• Event is sent to Pub/Sub
• No ops (serverless)
• Globally available
• Pay as you go
• 7 day retention
• No ordering guarantee (ordering is an alpha feature)
• No server-side filtering
Pub/Sub
PROCESSING - DATAFLOW
• Aggregation of events per consumer per tenant
• Dataflow
• Managed (choose your machines)
• Serverless
• Auto-scaling
• In-flight pipeline updates
• Monitoring
• Exactly-once
• Batch and strEAMing (Apache Beam)
• SQL available
• Documentation Unclear
Dataflow
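The "aggregation of events per consumer per tenant" step can be illustrated in plain Python. This is only a stand-in for the grouping logic, not an Apache Beam or Dataflow pipeline, and the event tuples and field names are assumptions.

```python
from collections import Counter

# Hypothetical events: (tenant, consumer, event_type).
EVENTS = [
    ("tenant_a", "user_1", "CLICK"),
    ("tenant_a", "user_1", "OPEN"),
    ("tenant_a", "user_2", "CLICK"),
    ("tenant_b", "user_1", "CLICK"),
    ("tenant_a", "user_1", "CLICK"),
]


def aggregate(events):
    """Count events per (tenant, consumer) key."""
    counts = Counter()
    for tenant, consumer, _event_type in events:
        counts[(tenant, consumer)] += 1
    return dict(counts)


RESULT = aggregate(EVENTS)
print(RESULT)
```

Keying on the (tenant, consumer) pair keeps consumers with the same ID in different tenants apart, which matters in a multi-tenant setup.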
Assignment
DATA EXPLORATION
• How many teams and members are present?
• How large is the audience?
• When did every member become active?
• What is the activity timespan of every team?
• Which minute of the hour received the most clicks?
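Two of the questions above can be sketched in plain Python. The workshop dataset's schema is not shown in the slides, so the row layout (team, member, event type, ISO timestamp) and the sample values are assumptions.

```python
from collections import Counter
from datetime import datetime

# Hypothetical rows: (team, member, event_type, ISO timestamp).
ROWS = [
    ("team_1", "alice", "CLICK", "2020-10-01T12:07:00"),
    ("team_1", "bob", "CLICK", "2020-10-01T12:07:30"),
    ("team_2", "carol", "OPEN", "2020-10-01T12:15:00"),
    ("team_2", "carol", "CLICK", "2020-10-01T13:07:10"),
    ("team_1", "alice", "CLICK", "2020-10-01T13:20:00"),
]


def busiest_click_minute(rows):
    """Return (minute_of_hour, click_count) for the minute with most clicks."""
    clicks = Counter(
        datetime.fromisoformat(ts).minute
        for _team, _member, kind, ts in rows
        if kind == "CLICK"
    )
    return clicks.most_common(1)[0]


def first_seen(rows):
    """Map every member to the timestamp of their earliest event."""
    first = {}
    for _team, member, _kind, ts in rows:
        when = datetime.fromisoformat(ts)
        if member not in first or when < first[member]:
            first[member] = when
    return first


print(busiest_click_minute(ROWS))
print(first_seen(ROWS))
```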
The Daily Life of a Data Engineer @ Selligent
PROBLEM
HOW WOULD YOU SOLVE THIS GIVEN THE PREVIOUS DATASET?
user nr. 3
12:07:00
CLICK
MAX
THE KEY DATA CHALLENGES
Team roles
DATA ENGINEER
• Fault-tolerance
• What if pipeline fails?
• Streaming means: Re-execute, Re-execute, Re-execute
• Bundle
• Out of order processing of successive windows
• Can you deal with it?
• Depends on use case
• Exactly-once?
• Use native dataflow/beam operators
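Because a failed bundle is re-executed, a step that is not idempotent can count the same event twice. The sketch below shows one common remedy, deduplication by event ID, in plain Python. It is an illustration under assumed names (`IdempotentCounter`, the event IDs, the keys), not how Dataflow or Beam implement exactly-once internally.

```python
class IdempotentCounter:
    """Counts events per key and ignores event IDs it has already seen."""

    def __init__(self):
        self.seen_ids = set()
        self.counts = {}

    def process_bundle(self, bundle):
        """Apply a bundle of (event_id, key) pairs; replays have no effect."""
        for event_id, key in bundle:
            if event_id in self.seen_ids:
                continue  # replayed event from a re-executed bundle
            self.seen_ids.add(event_id)
            self.counts[key] = self.counts.get(key, 0) + 1


# Hypothetical bundle of (event_id, key) pairs.
BUNDLE = [("e1", "user_3"), ("e2", "user_3"), ("e3", "user_7")]

sink = IdempotentCounter()
sink.process_bundle(BUNDLE)
sink.process_bundle(BUNDLE)  # simulate a retry of the same bundle
print(sink.counts)
```

The retry leaves the counts unchanged, which is the property a custom sink needs when the runner may re-execute work.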
THE INTEGRATION WIZARD
• "Legacy application"
• Changes needed
• Release cycle of 6 weeks
• Alignment with other teams
SEND-TIME OPTIMIZATION
THE BUTCHER
• Unit tests
• Dataflow test framework
• Integration test
• Between services
• External components
• Mocking?
• Performance test
• Does it scale
• Multi-tenancy
THE AUTOMATION KING
• Infrastructure as Code
• Terraform
• Testing & Deployment (CI/CD)
• CircleCI
TERRAFORM
• Resources
• Automatic inference of dependencies
TERRAFORM
• Dependency graph
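The idea behind a dependency graph is that each resource is created only after the resources it references. A minimal sketch of that ordering with Python's standard `graphlib` module (Python 3.9+) follows. It is not Terraform's implementation, and the resource names and their dependencies are hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical resources mapped to the resources they reference.
DEPENDS_ON = {
    "pubsub_topic": set(),
    "bigquery_dataset": set(),
    "bigquery_table": {"bigquery_dataset"},
    "dataflow_job": {"pubsub_topic", "bigquery_table"},
}


def creation_order(depends_on):
    """Return the resources in an order where dependencies come first."""
    return list(TopologicalSorter(depends_on).static_order())


ORDER = creation_order(DEPENDS_ON)
print(ORDER)
```

Several orders can be valid for the same graph; what is guaranteed is that a resource never appears before something it depends on.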
CIRCLECI
• Everything as code
• Automation
• 1 file
THE MONITOR
• Create dashboards
• Create alerts
• Custom inspection app
• App Engine
• Simple Python Flask App
• Cost!
Stackdriver
THE BANDIT
• Security & Compliance
• GDPR
• Checking
• Handling
• Archival
• Audit Logs
• (DPIA, Threat analysis, ...)
THE MANAGER
• Onboarding & Clients
• How to create business value?
• How to measure success?
• Who does activations?
• Do we need initial data loads?
• Who triggers it?
• What documents need to be signed?
• What do clients expect?
THE DATA SCIENTIST
• Analysis
• Potential improvements
• Client feedback
• AI Notebooks
• JupyterLab
• One-click start
TEAM VALUES
Everything as code
• Traceable
• Reproducible
• Explicit
Cloud/Serverless
• Less management
• DevOps becomes easier
• Pay as you go
Automation
• Less ops work
• Reliable releases
• Continuous delivery
TOOLS
Cloud/Serverless
• PubSub
• BigQuery
• Datastore
• Dataflow (managed)
• Airflow (managed)
Everything as code
• git
• CI/CD
• infrastructure
Automation
• Airflow
• Azure pipelines
• Robot
TEAM DeLorean
• Hanne Van Briel: Product Management
• Tom Artoos: Back-end
• Yohan Laudelout: Scrum Master
• Timo Naessens: Quality
• Kirill Ismagulov: Data Engineer / Scientist
• Dirk Dupont: Data Engineer / Scientist
• Jonny Daenen: Data Engineer / Scientist
• Laurens Vijnck: Data Engineer / Scientist
Use Case 2
Visual Data Exploration
Assignment: Data Studio
BUSINESS QUESTIONS
• What hour of the day are most people active?
• Which users are active before 10am?
• What is the most clicks people have done in 1 hour?
• How many clicks does an average user do in 1 week?
• How is the channel usage distributed?
• ...?
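Three of these questions can be sketched in plain Python before building a report. The click records and user IDs are hypothetical, and "1 hour" is interpreted here as a clock hour rather than a sliding 60-minute window, which is an assumption.

```python
from collections import Counter
from datetime import datetime

# Hypothetical click records: (user, ISO timestamp).
CLICKS = [
    ("u1", "2020-10-05T08:15:00"),
    ("u1", "2020-10-05T08:45:00"),
    ("u2", "2020-10-05T14:05:00"),
    ("u3", "2020-10-05T14:30:00"),
    ("u1", "2020-10-06T14:10:00"),
    ("u2", "2020-10-06T09:59:00"),
]


def parse(clicks):
    """Turn ISO timestamp strings into datetime objects."""
    return [(user, datetime.fromisoformat(ts)) for user, ts in clicks]


def most_active_hour(clicks):
    """Hour of the day with the most distinct active users."""
    users_per_hour = {}
    for user, when in parse(clicks):
        users_per_hour.setdefault(when.hour, set()).add(user)
    return max(users_per_hour, key=lambda hour: len(users_per_hour[hour]))


def active_before(clicks, hour=10):
    """Sorted list of users with at least one click before the given hour."""
    return sorted({user for user, when in parse(clicks) if when.hour < hour})


def max_clicks_in_one_hour(clicks):
    """Largest number of clicks by one user within one clock hour."""
    per_bucket = Counter(
        (user, when.date(), when.hour) for user, when in parse(clicks)
    )
    return max(per_bucket.values())


print(most_active_hour(CLICKS), active_before(CLICKS), max_clicks_in_one_hour(CLICKS))
```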
VISUALIZATIONS
• Heatmap of user activity in a day
• Heatmap of activity per channel
• Year over year comparison of activity per month
• For a given user, the activity timeline
• For a given user, the average number of clicks per week
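A heatmap is a grid of counts with a color scale on top. The sketch below builds such a grid in plain Python. The weekday-by-hour layout and the sample timestamps are assumptions, and the result would still need a charting tool such as Data Studio to be drawn.

```python
from datetime import datetime

# Hypothetical activity timestamps (ISO format).
TIMESTAMPS = [
    "2020-10-05T08:15:00",  # a Monday
    "2020-10-05T08:45:00",
    "2020-10-06T14:10:00",  # a Tuesday
]


def activity_grid(timestamps):
    """Return a 7x24 grid: rows are weekdays (Monday=0), columns are hours."""
    grid = [[0] * 24 for _ in range(7)]
    for ts in timestamps:
        when = datetime.fromisoformat(ts)
        grid[when.weekday()][when.hour] += 1
    return grid


GRID = activity_grid(TIMESTAMPS)
print(GRID[0][8], GRID[1][14])
```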
Presentation
Image courtesy of Kris Peeters (Data Minded)
TAKEAWAY
• Cloud Technology alleviates operational burden
• DevOps is a state of mind
• Road to production: Devil is in the details
Feedback Form
Q&A
PXL Data Engineering Workshop By Selligent

Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
 
Assuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyesAssuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyes
 
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
 
How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...
 

PXL Data Engineering Workshop By Selligent

  • 1. Data Engineering & Data Science • Please install/update to the latest zoom app • Otherwise you cannot join our breakout rooms • https://zoom.us/download • Start at 09:05 🕘 WELCOME!
  • 3. LAURENS VIJNCK - DATA SHEET • Data Engineer/Data Scientist • 1 year at Selligent • Master thesis on Streaming Analytics • Interests: Streaming, Distributed Computing
  • 4. JONNY DAENEN - DATA SHEET • Data Engineer/Scientist • 4 years at Selligent • PhD Computer Science • Focus = Data, Cloud
  • 5. TAKEAWAY • Cloud Technology alleviates operational burden • Devops is a state of mind • Road to production: Devil is in the details
  • 6. AGENDA • Intro • Selligent & Data • Use Case 1 • Real-time Data Analysis • Big Data Tech in Google Cloud • Data Engineering @ Selligent • Roles & Tools • Use Case 2 • Visual Data Exploration • Reports in Data Studio • Short presentation
  • 7. INTERACTIVITY • Raise your hand • Use the chat
  • 12. Data
  • 14. What data types do you know?
  • 15. Why is data useful?
  • 16. When is data big data?
  • 17. Big Data - the 3 V’s
  • 18. Which data tech do you know?
  • 19. DATA & AI LANDSCAPE 2020 [Figure: vendor landscape grouped into Infrastructure, Analytics & Machine Intelligence, Applications (Enterprise and Industry), Open Source, Data Sources & APIs, and Data Resources] © Matt Turck (@mattturck) & FirstMark (@firstmarkcap), mattturck.com/data2020, Version 1.0 - September 2020
  • 20. Facts
  • 26. Image courtesy of Monica Rogati
  • 27. Use Case 1 Analyze interaction data in real time
  • 28. DATA SOURCE: BEHAVIORAL DATA Show interest Receive notification Walk in store Buy in store Visit homepage Install app Open app
  • 31. STORAGE - BIGQUERY • Columnar Storage • No ops (serverless) • Pay for storage • Pay per byte queried (columns & time touched) • Data Market BigQuery
  • 32. INGEST - PUB/SUB • Event enters system • Event is sent to Pub/Sub • No ops (serverless) • Globally available • Pay as you go • 7 day retention • No ordering (alpha feature) • No server-side filtering Pub/Sub
  • 33. PROCESSING - DATAFLOW • Aggregation of events per consumer per tenant • Dataflow • Managed (choose your machines) • Serverless • Auto-scaling • In-flight pipeline updates • Monitoring • Exactly-once • Batch and strEAMing (Apache Beam) • SQL available • Documentation Unclear DataFlow
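The "aggregation of events per consumer per tenant" step on this slide can be illustrated with a minimal sketch. It is plain Python rather than Apache Beam or Dataflow code, and the event layout `(tenant, consumer, timestamp)`, the one-minute window and all sample values are assumptions made for illustration only.

```python
from collections import Counter
from datetime import datetime, timezone

WINDOW_SECONDS = 60  # hypothetical fixed window size


def window_start(ts: datetime, size: int = WINDOW_SECONDS) -> datetime:
    """Round a timestamp down to the start of its fixed window."""
    epoch = int(ts.timestamp())
    return datetime.fromtimestamp(epoch - epoch % size, tz=timezone.utc)


def aggregate(events):
    """Count events per (tenant, consumer, window start)."""
    counts = Counter()
    for tenant, consumer, ts in events:
        counts[(tenant, consumer, window_start(ts))] += 1
    return counts


# Made-up sample events, not data from the workshop.
events = [
    ("tenant-a", "consumer-1", datetime(2020, 11, 1, 9, 5, 10, tzinfo=timezone.utc)),
    ("tenant-a", "consumer-1", datetime(2020, 11, 1, 9, 5, 50, tzinfo=timezone.utc)),
    ("tenant-a", "consumer-2", datetime(2020, 11, 1, 9, 5, 30, tzinfo=timezone.utc)),
    ("tenant-b", "consumer-1", datetime(2020, 11, 1, 9, 6, 5, tzinfo=timezone.utc)),
]
counts = aggregate(events)
```

In a real streaming pipeline the grouping key and windowing would be expressed with the framework's own operators; the sketch only shows the shape of the computation.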
  • 37. DATA EXPLORATION • How many teams and members are present? • How large is the audience? • When did every member become active? • What is the activity timespan of every team? • Which minute of the hour received the most clicks?
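Three of the exploration questions above can be sketched in a few lines of Python. The dataset used in the workshop is not part of this transcript, so the `(team, member, timestamp)` rows and the function names below are hypothetical.

```python
from collections import Counter
from datetime import datetime

# Made-up click rows: (team, member, timestamp).
clicks = [
    ("team-1", "alice", datetime(2020, 11, 1, 9, 5)),
    ("team-1", "bob", datetime(2020, 11, 1, 10, 5)),
    ("team-2", "carol", datetime(2020, 11, 1, 10, 17)),
    ("team-1", "alice", datetime(2020, 11, 2, 14, 5)),
]


def members_per_team(rows):
    """How many distinct members does each team have?"""
    teams = {}
    for team, member, _ in rows:
        teams.setdefault(team, set()).add(member)
    return {team: len(members) for team, members in teams.items()}


def first_seen(rows):
    """When did every member become active (earliest click)?"""
    first = {}
    for _, member, ts in rows:
        if member not in first or ts < first[member]:
            first[member] = ts
    return first


def busiest_minute(rows):
    """Which minute of the hour (0-59) received the most clicks?"""
    per_minute = Counter(ts.minute for _, _, ts in rows)
    minute, _ = per_minute.most_common(1)[0]
    return minute
```

The same questions map naturally onto `GROUP BY` queries when the data lives in a warehouse such as BigQuery.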
  • 38. The Daily Life of a Data Engineer @ Selligent
  • 42. HOW WOULD YOU SOLVE THIS GIVEN THE PREVIOUS DATASET?
  • 46. THE KEY DATA CHALLENGES
  • 48. DATA ENGINEER • Fault-tolerance • What if pipeline fails? • Streaming means: Re-execute, Re-execute, Re-execute • Bundle • Out of order processing of successive windows • Can you deal with it? • Depends on use case • Exactly-once? • Use native Dataflow/Beam operators
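Why re-execution matters can be shown with a toy example: if a bundle is retried after a failure, a naive counter counts its events twice. The sketch below makes processing idempotent by remembering event IDs. It is an illustration under assumed names (`IdempotentCounter`, `process_bundle`), not how Dataflow implements exactly-once, and the slide's advice to rely on native operators still applies.

```python
class IdempotentCounter:
    """Sums event amounts while ignoring events that were already applied."""

    def __init__(self):
        self.seen_ids = set()
        self.total = 0

    def process(self, event_id, amount):
        """Apply an event once; return False if it was a duplicate."""
        if event_id in self.seen_ids:
            return False
        self.seen_ids.add(event_id)
        self.total += amount
        return True


def process_bundle(counter, bundle):
    """Process a bundle of (event_id, amount) pairs."""
    for event_id, amount in bundle:
        counter.process(event_id, amount)


counter = IdempotentCounter()
bundle = [("e1", 1), ("e2", 1), ("e3", 1)]
process_bundle(counter, bundle)
process_bundle(counter, bundle)  # simulated retry of the same bundle
```

After the retry the total still reflects each event once, at the cost of keeping the set of seen IDs in memory.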
  • 49. THE INTEGRATION WIZARD • "Legacy application" • Changes needed • Release cycle of 6 weeks • Alignment with other teams
  • 51. THE BUTCHER • Unit tests • Dataflow test framework • Integration test • Between services • External components • Mocking? • Performance test • Does it scale • Multi-tenancy
  • 52. THE AUTOMATION KING • Infrastructure as Code • Terraform • Testing & Deployment (CI/CD) • CircleCI
  • 53. TERRAFORM • Resources • Automatic inference of dependencies
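The idea of "automatic inference of dependencies" can be sketched as follows: references between resources are collected, and the resources are then ordered so that each one is created after the things it refers to. This is a conceptual Python sketch, not Terraform's implementation; the resource names and the `${name.attr}` reference notation are simplified assumptions, and `graphlib` requires Python 3.9 or later.

```python
import re
from graphlib import TopologicalSorter

# Hypothetical resources; string values may reference other resources.
resources = {
    "topic": {"name": "events"},
    "dataset": {"name": "analytics"},
    "subscription": {"topic": "${topic.name}"},
    "pipeline": {"input": "${subscription.id}", "output": "${dataset.name}"},
}

# Captures the resource name in a "${resource.attribute}" reference.
REF = re.compile(r"\$\{(\w+)\.")


def infer_dependencies(resources):
    """Map each resource to the set of resources it references."""
    deps = {}
    for name, attrs in resources.items():
        found = set()
        for value in attrs.values():
            found.update(REF.findall(value))
        deps[name] = found
    return deps


deps = infer_dependencies(resources)
# static_order() yields every resource after all of its dependencies.
order = list(TopologicalSorter(deps).static_order())
```

Because the ordering is derived from the references themselves, nobody has to maintain a separate list of creation steps.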
  • 55. CIRCLECI • Everything as code • Automation • 1 file
  • 56. THE MONITOR • Create dashboards • Create alerts • Custom inspection app • App Engine • Simple Python Flask App • Cost! Stackdriver
  • 58. THE BANDIT • Security & Compliance • GDPR • Checking • Handling • Archival • Audit Logs • (DPIA, Threat analysis, ...)
  • 59. THE MANAGER • Onboarding & Clients • How to create business value? • How to measure success? • Who does activations? • Do we need initial data loads? • Who triggers it? • What documents need to be signed? • What do clients expect?
  • 60. THE DATA SCIENTIST • Analysis • Potential improvements • Client feedback • AI Notebooks • JupyterLab • One-click start
  • 61. TEAM VALUES Everything as code • Traceable • Reproducible • Explicit Cloud/Serverless • Less management • Devops becomes easier • Pay as you go Automation • Less ops work • Reliable releases • Continuous delivery
  • 62. TOOLS Cloud/Serverless • PubSub • BigQuery • Datastore • Dataflow (managed) • Airflow (managed) Everything as code • git • cicd • infrastructure Automation • Airflow • Azure pipelines • Robot
  • 63. Hanne Van Briel Product Management Tom Artoos Back-end Yohan Laudelout Scrum Master Timo Naessens Quality Kirill Ismagulov Data Engineer / Scientist Dirk Dupont Data Engineer / Scientist Jonny Daenen Data Engineer / Scientist TEAM DeLorean Laurens Vijnck Data Engineer / Scientist
  • 64. Use Case 2 Visual Data Exploration
  • 67. BUSINESS QUESTIONS • What hour of the day are most people active? • Which users are active before 10am? • What is the most clicks people have done in 1 hour? • How many clicks does an average user do in 1 week? • How is the channel usage distributed? • ...?
  • 68. VISUALIZATIONS • Heatmap of user activity in a day • Heatmap of activity per channel • Year over year comparison of activity per month • For a given user, the activity timeline • For a given user, the average number of clicks per week
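The data behind an activity heatmap is a grid of counts. A minimal sketch, assuming a weekday-by-hour layout (the slides do not specify the axes) and made-up timestamps:

```python
from datetime import datetime


def activity_heatmap(timestamps):
    """Return a 7 x 24 grid of counts: rows are weekdays (Monday = 0), columns are hours."""
    grid = [[0] * 24 for _ in range(7)]
    for ts in timestamps:
        grid[ts.weekday()][ts.hour] += 1
    return grid


def most_active_hour(grid):
    """Hour of the day with the highest total activity across all weekdays."""
    totals = [sum(row[hour] for row in grid) for hour in range(24)]
    return totals.index(max(totals))


# Made-up sample: 2 November 2020 was a Monday.
sample = [
    datetime(2020, 11, 2, 9, 15),
    datetime(2020, 11, 2, 9, 45),
    datetime(2020, 11, 3, 14, 0),
]
grid = activity_heatmap(sample)
```

A reporting tool would then only need to map each cell's count to a color; the same grid also answers the question of which hour of the day is busiest.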
  • 71. Image courtesy of Kris Peeters (Data Minded)
  • 72. TAKEAWAY • Cloud Technology alleviates operational burden • Devops is a state of mind • Road to production: Devil is in the details
  • 74. Q&A