Kubernetes as Data Platform
Riga DevOpsDays 2018-09-28
Eric Skoglund, Bonnier News
Lars Albertsson, Mimeria
Scoping the platform
Brand Scope
Data Scope
➔ Behavioral Data
➔ Technical Data
No Content Data
Cloud Selection
The Pragmatic Choice
➔ Known to people in the dev teams
➔ New base platform for all other applications within Bonnier News
Use Case Driven Development
➔ Use cases drive the development of the platform
➔ Focus on value and quality, not on slurping in all the data in the company
➔ Start with simple use cases!
Find use cases that provide value → bring new data into the platform → evolve the platform based on requirements
Data-centric innovation
● Need data from teams
○ willing?
○ backlog?
○ collected?
○ useful?
○ extraction?
○ data governance?
○ history?
A collaboration paradigm
[Diagram: stream storage, data lake, data democratised]
Onboard driven by use case
[Diagram: data lake]
Data platform == collaboration platform
[Diagram: data lake]
Data platform overview
[Diagram built up over four slides. Labels: Online services (Service, Service); Offline data platform; Cold store; Data lake; Batch processing. Later builds add Dataset and Job, then Pipeline and Workflow orchestration, then Data feature and Internal services.]
Life of a change, batch pipelines
● My pipeline, version 2!
○ Dual datasets during transition
● Run downstream parallel pipelines
○ Cheap
○ Low risk
○ Easy rollback
● Easy to test end-to-end
○ Upstream team can do the change
∆?
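The "∆?" on this slide presumably stands for comparing the output of the old and new pipeline versions while both datasets exist. A minimal sketch of such a comparison, assuming records are dicts matched on a hypothetical `id` field (the function name, field names, and sample data are illustrative, not from the talk):

```python
def diff_datasets(old_records, new_records, key="id"):
    """Compare two versions of a dataset, matching records on `key`."""
    old = {r[key]: r for r in old_records}
    new = {r[key]: r for r in new_records}
    return {
        "only_in_old": sorted(old.keys() - new.keys()),
        "only_in_new": sorted(new.keys() - old.keys()),
        "changed": sorted(k for k in old.keys() & new.keys() if old[k] != new[k]),
    }


# Hypothetical outputs of pipeline version 1 and version 2.
v1 = [{"id": 1, "views": 10}, {"id": 2, "views": 5}, {"id": 3, "views": 7}]
v2 = [{"id": 1, "views": 10}, {"id": 2, "views": 6}, {"id": 4, "views": 1}]
delta = diff_datasets(v1, v2)
print(delta)
```

Because both dataset versions are kept during the transition, a check like this can run against production data without touching the serving path.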
Egress target change
● Need output in different storage!
○ Adding egress target is easy
○ Egress target backfill is easy
● Facilitates cost limitation
○ Partially aggregate → BigQuery / Redshift
○ Limited retention in egress storage
Life of an error, batch pipelines
● My dataset, bad version!
1. Revert serving datasets to old
2. Fix bug
3. Remove faulty datasets
4. Backfill is automatic (Luigi)
Done!
● Low cost of error
○ Reactive QA
○ Production environment sufficient
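Step 4 works because the orchestrator treats a missing output as work still to be done. The sketch below imitates that behaviour in plain Python rather than calling Luigi itself; `missing_partitions` and the daily partition layout are assumptions made for illustration.

```python
from datetime import date, timedelta


def missing_partitions(existing, first_day, last_day):
    """Return the daily partitions in [first_day, last_day] that have no dataset."""
    days = (last_day - first_day).days + 1
    wanted = [first_day + timedelta(days=i) for i in range(days)]
    return [d for d in wanted if d not in existing]


# Datasets left after the faulty ones (3 and 4 September) were removed.
existing = {date(2018, 9, 1), date(2018, 9, 2), date(2018, 9, 5)}
to_rebuild = missing_partitions(existing, date(2018, 9, 1), date(2018, 9, 5))
print(to_rebuild)
```

Removing the faulty datasets is therefore the only manual step: the next scheduled run finds the gaps and fills them.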
Deployment example, on-premise
[Diagram: source repo (Luigi DSL, jars, config); standard deployment artifact my-pipe-7.tar.gz in a standard artifact store; Luigi daemon; workers]
All that a pipeline needs, installed atomically:
> pip install my-pipe-7.tar.gz
Redundant cron schedule, higher frequency:
10 * * * * luigi --module mymodule MyDaily
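A redundant, higher-frequency schedule is safe only if every run is idempotent: a run whose output already exists does nothing. Luigi gets this from checking task outputs. The sketch below shows the same idea in plain Python; the `run_daily` function and the one-file-per-day layout are assumptions, not part of the talk.

```python
import tempfile
from pathlib import Path


def run_daily(output_dir, day, compute):
    """Produce the dataset for `day` unless it already exists."""
    target = Path(output_dir) / f"{day}.txt"
    if target.exists():
        return False  # Already done; a repeated cron trigger is a no-op.
    tmp = target.with_name(target.name + ".tmp")
    tmp.write_text(compute(day))
    # Publish by rename so a partially written file is never
    # mistaken for a finished dataset.
    tmp.rename(target)
    return True


with tempfile.TemporaryDirectory() as out:
    # Three triggers for the same day, as an hourly cron entry would cause.
    results = [run_daily(out, "2018-09-28", lambda d: f"report for {d}\n")
               for _ in range(3)]
print(results)
```

Only the first trigger does any work, so a missed or failed run is picked up by the next trigger without manual intervention.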
Deployment example, cloud native
[Diagram: source repo (Luigi DSL, jars, config); Docker image my-pipe:7 in a Docker registry; Luigi daemon; workers; S3 / GCS; Dataproc / EMR]
Redundant cron schedule, higher frequency (abbreviated manifest):
kind: CronJob
spec:
  schedule: "10 * * * *"
  command: "luigi --module mymodule MyDaily"
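The manifest on the slide is abbreviated: a complete CronJob nests the container command under `spec.jobTemplate`. The sketch below builds such a manifest as a Python dict and prints it as JSON, which `kubectl apply -f` accepts as well as YAML. The image, schedule, and command come from the slide. The resource name, container name, and `restartPolicy` are assumptions, and `batch/v1` is the current API version (clusters from 2018 would have used `batch/v1beta1`).

```python
import json


def cronjob_manifest(name, image, schedule, command):
    """Build a Kubernetes CronJob manifest as a plain dict."""
    return {
        "apiVersion": "batch/v1",
        "kind": "CronJob",
        "metadata": {"name": name},
        "spec": {
            "schedule": schedule,
            "jobTemplate": {
                "spec": {
                    "template": {
                        "spec": {
                            "restartPolicy": "OnFailure",
                            "containers": [
                                {"name": name, "image": image, "command": command}
                            ],
                        }
                    }
                }
            },
        },
    }


manifest = cronjob_manifest(
    name="my-daily",  # hypothetical resource name
    image="my-pipe:7",
    schedule="10 * * * *",
    command=["luigi", "--module", "mymodule", "MyDaily"],
)
print(json.dumps(manifest, indent=2))
```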
Deployment, one cluster less
[Diagram: source repo (Luigi DSL, jars, config); Docker image my-pipe:7 in a Docker registry; Luigi daemon; workers running spark-submit --master=local; S3 / GCS]
Redundant cron schedule, higher frequency (abbreviated manifest):
kind: CronJob
spec:
  schedule: "10 * * * *"
  command: "luigi --module mymodule MyDaily"
Continuous deployment
[Diagram: mono-repo; PR build with affected CI tests; master branch with pipeline tests and doc build; image mymodule/mypipe:revtag in the Openshift registry; Luigi daemon; workers running spark-submit --master=local; S3]
kind: CronJob
spec:
  schedule: "10 * * * *"
  command: "luigi --module mymodule MyDaily"
Some pipelines are straightforward
Some are twisted
Autoscaling
GDPR
Article 17.
“The data subject shall have the right to obtain from the controller the erasure of personal data concerning him or her without undue delay and the controller shall have the obligation to erase personal data without undue delay where one of the following grounds applies:”
➔ the personal data are no longer necessary in relation to the purposes for which they were collected or otherwise processed - Data Retention
➔ the data subject withdraws consent on which the processing is based - Data Deletion Requests
GDPR
{
  id: ….
  pii: [...]
}
Create key for ID → encrypt personal data with key
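The two steps on this slide can be sketched as follows. To stay within the standard library, the example uses a toy keystream built from SHA-256 as a stand-in for a real cipher. That stand-in, the in-memory `keys` dict, and the string-valued `pii` field are all assumptions for illustration, and a production system would use a vetted authenticated cipher and a proper key store.

```python
import hashlib
import secrets


def _keystream(key, nonce, length):
    """Toy keystream from SHA-256 in counter mode. NOT a vetted cipher."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:length]


def encrypt(key, plaintext):
    """Return a fresh 16-byte nonce followed by the XOR-encrypted bytes."""
    nonce = secrets.token_bytes(16)
    stream = _keystream(key, nonce, len(plaintext))
    return nonce + bytes(a ^ b for a, b in zip(plaintext, stream))


def decrypt(key, blob):
    """Split off the nonce and reverse the XOR."""
    nonce, body = blob[:16], blob[16:]
    stream = _keystream(key, nonce, len(body))
    return bytes(a ^ b for a, b in zip(body, stream))


keys = {}  # hypothetical key store: user id -> key bytes


def protect(record):
    """Create a key for the record's id if needed, then encrypt its pii field."""
    key = keys.setdefault(record["id"], secrets.token_bytes(32))
    return {"id": record["id"], "pii": encrypt(key, record["pii"].encode())}


stored = protect({"id": "user-1", "pii": "alice@example.com"})
recovered = decrypt(keys["user-1"], stored["pii"]).decode()
print(recovered)
```

Once the entry for `user-1` is removed from the key store, the stored `pii` bytes can no longer be decrypted, even though the dataset itself is left untouched.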
GDPR - Retention
➔ Each dataset has a retention time set by the owners of the data
➔ Create new keys every 30 days
➔ Destroy keys older than the retention time
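A minimal sketch of this rotation scheme, assuming one key per user and 30-day period held in an in-memory dict. The function names, the period numbering, and the 90-day retention value are illustrative assumptions. Only the 30-day rotation comes from the slide.

```python
import secrets
from datetime import date, timedelta

ROTATION_DAYS = 30  # from the slide: new keys every 30 days


def period_of(day):
    """Index of the 30-day key period that contains `day`."""
    return day.toordinal() // ROTATION_DAYS


def key_for(keys, user_id, day):
    """Return the key for (user, period), creating it on first use."""
    return keys.setdefault((user_id, period_of(day)), secrets.token_bytes(32))


def destroy_expired(keys, today, retention_days):
    """Delete keys whose whole period lies before the retention cutoff."""
    cutoff = today - timedelta(days=retention_days)
    expired = [k for k in keys if (k[1] + 1) * ROTATION_DAYS <= cutoff.toordinal()]
    for k in expired:
        del keys[k]
    return expired


keys = {}
today = date(2018, 9, 28)
key_for(keys, "user-1", today - timedelta(days=200))  # data from 200 days ago
key_for(keys, "user-1", today)                        # data from today
removed = destroy_expired(keys, today, retention_days=90)
print(len(removed), len(keys))
```

With this policy the oldest data in a period can outlive the retention time by up to one rotation period. A stricter variant would destroy a key as soon as the start of its period passes the cutoff.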
GDPR - Right to be forgotten
List of users that have requested deletion → find keys for those users → destroy keys
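The three steps above can be sketched as a single function over the key store. The key layout, with one key per user id and key period, and the name `forget_users` are assumptions for illustration.

```python
def forget_users(keys, deletion_requests):
    """Destroy every key belonging to a user who has requested deletion."""
    requested = set(deletion_requests)
    doomed = [k for k in keys if k[0] in requested]
    for k in doomed:
        del keys[k]
    return doomed


# Hypothetical key store: (user id, key period) -> key bytes.
keys = {
    ("user-1", 0): b"k1",
    ("user-1", 1): b"k2",
    ("user-2", 0): b"k3",
}
destroyed = forget_users(keys, ["user-1"])
print(sorted(destroyed), sorted(keys))
```

No dataset is rewritten: data encrypted with the destroyed keys simply becomes unreadable.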
Use Cases in Use
➔ Machine Learning
◆ Built a system that tries to predict if a visitor will watch an ad in a video or not
➔ Creating Reports
◆ Daily reporting data for ad team
◆ Weekly report of ad viewing data for site team
➔ GDPR Registry Extract
◆ Collect data from multiple different sources
◆ Merge the data
◆ Send data to be viewed by the user
Lessons Learned
Cloud selection is influenced by data location
Most data for the use cases we started with was on Google Cloud Storage / BigQuery, incurring extra development time and cost to exfiltrate that data.
Kubernetes?
Same platform as other teams + great support from infrastructure platform team.
No Spark cluster maintenance, tweaking, debugging.
Autoscaling works, but some challenges for batch jobs.
Summary
Use case driven development == Short Time to Production
First pipeline in 3 weeks
Small team: 2-4 people
Keep it simple
10-15 Pipelines

More Related Content

What's hot

Analytics over Terabytes of Data at Twitter
Analytics over Terabytes of Data at TwitterAnalytics over Terabytes of Data at Twitter
Analytics over Terabytes of Data at Twitter
Imply
 
Workshop 20140522 BigQuery Implementation
Workshop 20140522   BigQuery ImplementationWorkshop 20140522   BigQuery Implementation
Workshop 20140522 BigQuery Implementation
Simon Su
 
Tracking the Performance of the Web with HTTP Archive
Tracking the Performance of the Web with HTTP ArchiveTracking the Performance of the Web with HTTP Archive
Tracking the Performance of the Web with HTTP Archive
Rick Viscomi
 
#SlimScalding - Less Memory is More Capacity
#SlimScalding - Less Memory is More Capacity#SlimScalding - Less Memory is More Capacity
#SlimScalding - Less Memory is More Capacity
Gera Shegalov
 
Akamai Edge: Tracking the Performance of the Web with HTTP Archive
Akamai Edge: Tracking the Performance of the Web with HTTP ArchiveAkamai Edge: Tracking the Performance of the Web with HTTP Archive
Akamai Edge: Tracking the Performance of the Web with HTTP Archive
Rick Viscomi
 
"What's New With Globus" Webinar: Spring 2018
"What's New With Globus" Webinar: Spring 2018"What's New With Globus" Webinar: Spring 2018
"What's New With Globus" Webinar: Spring 2018
Globus
 
Introduction to Google Cloud Platform
Introduction to Google Cloud PlatformIntroduction to Google Cloud Platform
Introduction to Google Cloud Platform
Pradeep Bhadani
 
Google Cloud Platform at Vente-Exclusive.com
Google Cloud Platform at Vente-Exclusive.comGoogle Cloud Platform at Vente-Exclusive.com
Google Cloud Platform at Vente-Exclusive.com
Alex Van Boxel
 
Dataflow shuffle service
Dataflow shuffle service Dataflow shuffle service
Dataflow shuffle service
Yuta Hono
 
Big Query Basics
Big Query BasicsBig Query Basics
Big Query Basics
Ido Green
 
Handle insane devices traffic using Google Cloud Platform - Andrea Ulisse - C...
Handle insane devices traffic using Google Cloud Platform - Andrea Ulisse - C...Handle insane devices traffic using Google Cloud Platform - Andrea Ulisse - C...
Handle insane devices traffic using Google Cloud Platform - Andrea Ulisse - C...
Codemotion
 
Taking Your Database Global with Kubernetes
Taking Your Database Global with KubernetesTaking Your Database Global with Kubernetes
Taking Your Database Global with Kubernetes
Christopher Bradford
 
Why Certify? Everything to know about Google Cloud Certifications
Why Certify? Everything to know about Google Cloud CertificationsWhy Certify? Everything to know about Google Cloud Certifications
Why Certify? Everything to know about Google Cloud Certifications
Ervin Weber
 
XDC demo: CTA
XDC demo: CTAXDC demo: CTA
XDC demo: CTA
EOSC-hub project
 
Archmage, Pinterest’s Real-time Analytics Platform on Druid
Archmage, Pinterest’s Real-time Analytics Platform on DruidArchmage, Pinterest’s Real-time Analytics Platform on Druid
Archmage, Pinterest’s Real-time Analytics Platform on Druid
Imply
 
[Cloud OnAir] Talks by DevRel Vol.4 データ管理とデータ ベース 2020年8月27日 放送
[Cloud OnAir] Talks by DevRel Vol.4 データ管理とデータ ベース 2020年8月27日 放送[Cloud OnAir] Talks by DevRel Vol.4 データ管理とデータ ベース 2020年8月27日 放送
[Cloud OnAir] Talks by DevRel Vol.4 データ管理とデータ ベース 2020年8月27日 放送
Google Cloud Platform - Japan
 
NoSQL on MySQL - MySQL Document Store by Vadim Tkachenko
NoSQL on MySQL - MySQL Document Store by Vadim TkachenkoNoSQL on MySQL - MySQL Document Store by Vadim Tkachenko
NoSQL on MySQL - MySQL Document Store by Vadim Tkachenko
Data Con LA
 
Building a Real-Time Gaming Analytics Service with Apache Druid
Building a Real-Time Gaming Analytics Service with Apache DruidBuilding a Real-Time Gaming Analytics Service with Apache Druid
Building a Real-Time Gaming Analytics Service with Apache Druid
Imply
 
Data Analytics and Processing at Snap - Druid Meetup LA - September 2018
Data Analytics and Processing at Snap - Druid Meetup LA - September 2018Data Analytics and Processing at Snap - Druid Meetup LA - September 2018
Data Analytics and Processing at Snap - Druid Meetup LA - September 2018
Charles Allen
 
InterPlanetary File System (IPFS)
InterPlanetary File System (IPFS)InterPlanetary File System (IPFS)
InterPlanetary File System (IPFS)
Gene Leybzon
 

What's hot (20)

Analytics over Terabytes of Data at Twitter
Analytics over Terabytes of Data at TwitterAnalytics over Terabytes of Data at Twitter
Analytics over Terabytes of Data at Twitter
 
Workshop 20140522 BigQuery Implementation
Workshop 20140522   BigQuery ImplementationWorkshop 20140522   BigQuery Implementation
Workshop 20140522 BigQuery Implementation
 
Tracking the Performance of the Web with HTTP Archive
Tracking the Performance of the Web with HTTP ArchiveTracking the Performance of the Web with HTTP Archive
Tracking the Performance of the Web with HTTP Archive
 
#SlimScalding - Less Memory is More Capacity
#SlimScalding - Less Memory is More Capacity#SlimScalding - Less Memory is More Capacity
#SlimScalding - Less Memory is More Capacity
 
Akamai Edge: Tracking the Performance of the Web with HTTP Archive
Akamai Edge: Tracking the Performance of the Web with HTTP ArchiveAkamai Edge: Tracking the Performance of the Web with HTTP Archive
Akamai Edge: Tracking the Performance of the Web with HTTP Archive
 
"What's New With Globus" Webinar: Spring 2018
"What's New With Globus" Webinar: Spring 2018"What's New With Globus" Webinar: Spring 2018
"What's New With Globus" Webinar: Spring 2018
 
Introduction to Google Cloud Platform
Introduction to Google Cloud PlatformIntroduction to Google Cloud Platform
Introduction to Google Cloud Platform
 
Google Cloud Platform at Vente-Exclusive.com
Google Cloud Platform at Vente-Exclusive.comGoogle Cloud Platform at Vente-Exclusive.com
Google Cloud Platform at Vente-Exclusive.com
 
Dataflow shuffle service
Dataflow shuffle service Dataflow shuffle service
Dataflow shuffle service
 
Big Query Basics
Big Query BasicsBig Query Basics
Big Query Basics
 
Handle insane devices traffic using Google Cloud Platform - Andrea Ulisse - C...
Handle insane devices traffic using Google Cloud Platform - Andrea Ulisse - C...Handle insane devices traffic using Google Cloud Platform - Andrea Ulisse - C...
Handle insane devices traffic using Google Cloud Platform - Andrea Ulisse - C...
 
Taking Your Database Global with Kubernetes
Taking Your Database Global with KubernetesTaking Your Database Global with Kubernetes
Taking Your Database Global with Kubernetes
 
Why Certify? Everything to know about Google Cloud Certifications
Why Certify? Everything to know about Google Cloud CertificationsWhy Certify? Everything to know about Google Cloud Certifications
Why Certify? Everything to know about Google Cloud Certifications
 
XDC demo: CTA
XDC demo: CTAXDC demo: CTA
XDC demo: CTA
 
Archmage, Pinterest’s Real-time Analytics Platform on Druid
Archmage, Pinterest’s Real-time Analytics Platform on DruidArchmage, Pinterest’s Real-time Analytics Platform on Druid
Archmage, Pinterest’s Real-time Analytics Platform on Druid
 
[Cloud OnAir] Talks by DevRel Vol.4 データ管理とデータ ベース 2020年8月27日 放送
[Cloud OnAir] Talks by DevRel Vol.4 データ管理とデータ ベース 2020年8月27日 放送[Cloud OnAir] Talks by DevRel Vol.4 データ管理とデータ ベース 2020年8月27日 放送
[Cloud OnAir] Talks by DevRel Vol.4 データ管理とデータ ベース 2020年8月27日 放送
 
NoSQL on MySQL - MySQL Document Store by Vadim Tkachenko
NoSQL on MySQL - MySQL Document Store by Vadim TkachenkoNoSQL on MySQL - MySQL Document Store by Vadim Tkachenko
NoSQL on MySQL - MySQL Document Store by Vadim Tkachenko
 
Building a Real-Time Gaming Analytics Service with Apache Druid
Building a Real-Time Gaming Analytics Service with Apache DruidBuilding a Real-Time Gaming Analytics Service with Apache Druid
Building a Real-Time Gaming Analytics Service with Apache Druid
 
Data Analytics and Processing at Snap - Druid Meetup LA - September 2018
Data Analytics and Processing at Snap - Druid Meetup LA - September 2018Data Analytics and Processing at Snap - Druid Meetup LA - September 2018
Data Analytics and Processing at Snap - Druid Meetup LA - September 2018
 
InterPlanetary File System (IPFS)
InterPlanetary File System (IPFS)InterPlanetary File System (IPFS)
InterPlanetary File System (IPFS)
 

Similar to DevOpsDaysRiga 2018: Eric Skoglund, Lars Albertsson - Kubernetes as data platform

Gaming analytics on gcp
Gaming analytics on gcpGaming analytics on gcp
Gaming analytics on gcp
Myunggeun Choi
 
GCP Gaming 2016 Seoul, Korea Gaming Analytics
GCP Gaming 2016 Seoul, Korea Gaming AnalyticsGCP Gaming 2016 Seoul, Korea Gaming Analytics
GCP Gaming 2016 Seoul, Korea Gaming Analytics
Chris Jang
 
22nd Athens Big Data Meetup - 1st Talk - MLOps Workshop: The Full ML Lifecycl...
22nd Athens Big Data Meetup - 1st Talk - MLOps Workshop: The Full ML Lifecycl...22nd Athens Big Data Meetup - 1st Talk - MLOps Workshop: The Full ML Lifecycl...
22nd Athens Big Data Meetup - 1st Talk - MLOps Workshop: The Full ML Lifecycl...
Athens Big Data
 
CodeCamp Iasi - Creating serverless data analytics system on GCP using BigQuery
CodeCamp Iasi - Creating serverless data analytics system on GCP using BigQueryCodeCamp Iasi - Creating serverless data analytics system on GCP using BigQuery
CodeCamp Iasi - Creating serverless data analytics system on GCP using BigQuery
Márton Kodok
 
Cassandra on Google Cloud Platform (Ravi Madasu, Google / Ben Lackey, DataSta...
Cassandra on Google Cloud Platform (Ravi Madasu, Google / Ben Lackey, DataSta...Cassandra on Google Cloud Platform (Ravi Madasu, Google / Ben Lackey, DataSta...
Cassandra on Google Cloud Platform (Ravi Madasu, Google / Ben Lackey, DataSta...
DataStax
 
Getting started with GCP ( Google Cloud Platform)
Getting started with GCP ( Google  Cloud Platform)Getting started with GCP ( Google  Cloud Platform)
Getting started with GCP ( Google Cloud Platform)
bigdata trunk
 
A Key to Real-time Insights in a Post-COVID World (ASEAN)
A Key to Real-time Insights in a Post-COVID World (ASEAN)A Key to Real-time Insights in a Post-COVID World (ASEAN)
A Key to Real-time Insights in a Post-COVID World (ASEAN)
Denodo
 
The Enterprise Guide to Building a Data Mesh - Introducing SpecMesh
The Enterprise Guide to Building a Data Mesh - Introducing SpecMeshThe Enterprise Guide to Building a Data Mesh - Introducing SpecMesh
The Enterprise Guide to Building a Data Mesh - Introducing SpecMesh
IanFurlong4
 
Introduction to PaaS and Heroku
Introduction to PaaS and HerokuIntroduction to PaaS and Heroku
Introduction to PaaS and Heroku
Tapio Rautonen
 
Cassandra Summit 2014: Internet of Complex Things Analytics with Apache Cassa...
Cassandra Summit 2014: Internet of Complex Things Analytics with Apache Cassa...Cassandra Summit 2014: Internet of Complex Things Analytics with Apache Cassa...
Cassandra Summit 2014: Internet of Complex Things Analytics with Apache Cassa...
DataStax Academy
 
002 Introducing Neo4j 5 for Administrators - NODES2022 AMERICAS Beginner 2 - ...
002 Introducing Neo4j 5 for Administrators - NODES2022 AMERICAS Beginner 2 - ...002 Introducing Neo4j 5 for Administrators - NODES2022 AMERICAS Beginner 2 - ...
002 Introducing Neo4j 5 for Administrators - NODES2022 AMERICAS Beginner 2 - ...
Neo4j
 
Accelerating workloads and bursting data with Google Dataproc & Alluxio
Accelerating workloads and bursting data with Google Dataproc & AlluxioAccelerating workloads and bursting data with Google Dataproc & Alluxio
Accelerating workloads and bursting data with Google Dataproc & Alluxio
Alluxio, Inc.
 
Splunk, SIEMs, and Big Data - The Undercroft - November 2019
Splunk, SIEMs, and Big Data - The Undercroft - November 2019Splunk, SIEMs, and Big Data - The Undercroft - November 2019
Splunk, SIEMs, and Big Data - The Undercroft - November 2019
Jonathan Singer
 
Peek into Neo4j Product Strategy and Roadmap
Peek into Neo4j Product Strategy and RoadmapPeek into Neo4j Product Strategy and Roadmap
Peek into Neo4j Product Strategy and Roadmap
Neo4j
 
Solving enterprise challenges through scale out storage & big compute final
Solving enterprise challenges through scale out storage & big compute finalSolving enterprise challenges through scale out storage & big compute final
Solving enterprise challenges through scale out storage & big compute final
Avere Systems
 
How DBAs can garner the power of the Oracle Public Cloud?
How DBAs can garner the  power of the Oracle Public  Cloud?How DBAs can garner the  power of the Oracle Public  Cloud?
How DBAs can garner the power of the Oracle Public Cloud?
Gustavo Rene Antunez
 
Discover PostGIS: Add Spatial functions to PostgreSQL
Discover PostGIS: Add Spatial functions to PostgreSQLDiscover PostGIS: Add Spatial functions to PostgreSQL
Discover PostGIS: Add Spatial functions to PostgreSQL
EDB
 
A Love Story with Kubevirt and Backstage from Cloud Native NoVA meetup Feb 2024
A Love Story with Kubevirt and Backstage from Cloud Native NoVA meetup Feb 2024A Love Story with Kubevirt and Backstage from Cloud Native NoVA meetup Feb 2024
A Love Story with Kubevirt and Backstage from Cloud Native NoVA meetup Feb 2024
Cloud Native NoVA
 
The Fastest Way to Redis on Pivotal Cloud Foundry
The Fastest Way to Redis on Pivotal Cloud FoundryThe Fastest Way to Redis on Pivotal Cloud Foundry
The Fastest Way to Redis on Pivotal Cloud Foundry
VMware Tanzu
 
Data Platform in the Cloud
Data Platform in the CloudData Platform in the Cloud
Data Platform in the Cloud
Amihay Zer-Kavod
 

Similar to DevOpsDaysRiga 2018: Eric Skoglund, Lars Albertsson - Kubernetes as data platform (20)

Gaming analytics on gcp
Gaming analytics on gcpGaming analytics on gcp
Gaming analytics on gcp
 
GCP Gaming 2016 Seoul, Korea Gaming Analytics
GCP Gaming 2016 Seoul, Korea Gaming AnalyticsGCP Gaming 2016 Seoul, Korea Gaming Analytics
GCP Gaming 2016 Seoul, Korea Gaming Analytics
 
22nd Athens Big Data Meetup - 1st Talk - MLOps Workshop: The Full ML Lifecycl...
22nd Athens Big Data Meetup - 1st Talk - MLOps Workshop: The Full ML Lifecycl...22nd Athens Big Data Meetup - 1st Talk - MLOps Workshop: The Full ML Lifecycl...
22nd Athens Big Data Meetup - 1st Talk - MLOps Workshop: The Full ML Lifecycl...
 
CodeCamp Iasi - Creating serverless data analytics system on GCP using BigQuery
CodeCamp Iasi - Creating serverless data analytics system on GCP using BigQueryCodeCamp Iasi - Creating serverless data analytics system on GCP using BigQuery
CodeCamp Iasi - Creating serverless data analytics system on GCP using BigQuery
 
Cassandra on Google Cloud Platform (Ravi Madasu, Google / Ben Lackey, DataSta...
Cassandra on Google Cloud Platform (Ravi Madasu, Google / Ben Lackey, DataSta...Cassandra on Google Cloud Platform (Ravi Madasu, Google / Ben Lackey, DataSta...
Cassandra on Google Cloud Platform (Ravi Madasu, Google / Ben Lackey, DataSta...
 
Getting started with GCP ( Google Cloud Platform)
Getting started with GCP ( Google  Cloud Platform)Getting started with GCP ( Google  Cloud Platform)
Getting started with GCP ( Google Cloud Platform)
 
A Key to Real-time Insights in a Post-COVID World (ASEAN)
A Key to Real-time Insights in a Post-COVID World (ASEAN)A Key to Real-time Insights in a Post-COVID World (ASEAN)
A Key to Real-time Insights in a Post-COVID World (ASEAN)
 
The Enterprise Guide to Building a Data Mesh - Introducing SpecMesh
The Enterprise Guide to Building a Data Mesh - Introducing SpecMeshThe Enterprise Guide to Building a Data Mesh - Introducing SpecMesh
The Enterprise Guide to Building a Data Mesh - Introducing SpecMesh
 
Introduction to PaaS and Heroku
Introduction to PaaS and HerokuIntroduction to PaaS and Heroku
Introduction to PaaS and Heroku
 
Cassandra Summit 2014: Internet of Complex Things Analytics with Apache Cassa...
Cassandra Summit 2014: Internet of Complex Things Analytics with Apache Cassa...Cassandra Summit 2014: Internet of Complex Things Analytics with Apache Cassa...
Cassandra Summit 2014: Internet of Complex Things Analytics with Apache Cassa...
 
002 Introducing Neo4j 5 for Administrators - NODES2022 AMERICAS Beginner 2 - ...
002 Introducing Neo4j 5 for Administrators - NODES2022 AMERICAS Beginner 2 - ...002 Introducing Neo4j 5 for Administrators - NODES2022 AMERICAS Beginner 2 - ...
002 Introducing Neo4j 5 for Administrators - NODES2022 AMERICAS Beginner 2 - ...
 
Accelerating workloads and bursting data with Google Dataproc & Alluxio
Accelerating workloads and bursting data with Google Dataproc & AlluxioAccelerating workloads and bursting data with Google Dataproc & Alluxio
Accelerating workloads and bursting data with Google Dataproc & Alluxio
 
Splunk, SIEMs, and Big Data - The Undercroft - November 2019
Splunk, SIEMs, and Big Data - The Undercroft - November 2019Splunk, SIEMs, and Big Data - The Undercroft - November 2019
Splunk, SIEMs, and Big Data - The Undercroft - November 2019
 
Peek into Neo4j Product Strategy and Roadmap
Peek into Neo4j Product Strategy and RoadmapPeek into Neo4j Product Strategy and Roadmap
Peek into Neo4j Product Strategy and Roadmap
 
Solving enterprise challenges through scale out storage & big compute final
Solving enterprise challenges through scale out storage & big compute finalSolving enterprise challenges through scale out storage & big compute final
Solving enterprise challenges through scale out storage & big compute final
 
How DBAs can garner the power of the Oracle Public Cloud?
How DBAs can garner the  power of the Oracle Public  Cloud?How DBAs can garner the  power of the Oracle Public  Cloud?
How DBAs can garner the power of the Oracle Public Cloud?
 
Discover PostGIS: Add Spatial functions to PostgreSQL
Discover PostGIS: Add Spatial functions to PostgreSQLDiscover PostGIS: Add Spatial functions to PostgreSQL
Discover PostGIS: Add Spatial functions to PostgreSQL
 
A Love Story with Kubevirt and Backstage from Cloud Native NoVA meetup Feb 2024
A Love Story with Kubevirt and Backstage from Cloud Native NoVA meetup Feb 2024A Love Story with Kubevirt and Backstage from Cloud Native NoVA meetup Feb 2024
A Love Story with Kubevirt and Backstage from Cloud Native NoVA meetup Feb 2024
 
The Fastest Way to Redis on Pivotal Cloud Foundry
The Fastest Way to Redis on Pivotal Cloud FoundryThe Fastest Way to Redis on Pivotal Cloud Foundry
The Fastest Way to Redis on Pivotal Cloud Foundry
 
Data Platform in the Cloud
Data Platform in the CloudData Platform in the Cloud
Data Platform in the Cloud
 

More from DevOpsDays Riga

DevOpsDaysRiga 2017: Mark Smalley - Kill DevOps
DevOpsDaysRiga 2017: Mark Smalley - Kill DevOpsDevOpsDaysRiga 2017: Mark Smalley - Kill DevOps
DevOpsDaysRiga 2017: Mark Smalley - Kill DevOps
DevOpsDays Riga
 
DevOpsDaysRiga 2018: Serhat Can - The Rocky Path to Migrating Production Appl...
DevOpsDaysRiga 2018: Serhat Can - The Rocky Path to Migrating Production Appl...DevOpsDaysRiga 2018: Serhat Can - The Rocky Path to Migrating Production Appl...
DevOpsDaysRiga 2018: Serhat Can - The Rocky Path to Migrating Production Appl...
DevOpsDays Riga
 
DevOpsDaysRiga 2018: Uldis Karlovs-Karlovskis - DevOpsDays Ignite Karaoke - S...
DevOpsDaysRiga 2018: Uldis Karlovs-Karlovskis - DevOpsDays Ignite Karaoke - S...DevOpsDaysRiga 2018: Uldis Karlovs-Karlovskis - DevOpsDays Ignite Karaoke - S...
DevOpsDaysRiga 2018: Uldis Karlovs-Karlovskis - DevOpsDays Ignite Karaoke - S...
DevOpsDays Riga
 
DevOpsDaysRiga 2018: Anton Babenko - What you see is what you get… for AWS in...
DevOpsDaysRiga 2018: Anton Babenko - What you see is what you get… for AWS in...DevOpsDaysRiga 2018: Anton Babenko - What you see is what you get… for AWS in...
DevOpsDaysRiga 2018: Anton Babenko - What you see is what you get… for AWS in...
DevOpsDays Riga
 
DevOpsDaysRiga 2018: Juris Puce - GDPR and other security regulation imposed ...
DevOpsDaysRiga 2018: Juris Puce - GDPR and other security regulation imposed ...DevOpsDaysRiga 2018: Juris Puce - GDPR and other security regulation imposed ...
DevOpsDaysRiga 2018: Juris Puce - GDPR and other security regulation imposed ...
DevOpsDays Riga
 
DevOpsDaysRiga 2018: Heather Wild - Keep Yourself Alive -Stopping the effects...
DevOpsDaysRiga 2018: Heather Wild - Keep Yourself Alive -Stopping the effects...DevOpsDaysRiga 2018: Heather Wild - Keep Yourself Alive -Stopping the effects...
DevOpsDaysRiga 2018: Heather Wild - Keep Yourself Alive -Stopping the effects...
DevOpsDays Riga
 
DevOpsDaysRiga 2018: Philipp Krenn - Building Distributed Systems in Distribu...
DevOpsDaysRiga 2018: Philipp Krenn - Building Distributed Systems in Distribu...DevOpsDaysRiga 2018: Philipp Krenn - Building Distributed Systems in Distribu...
DevOpsDaysRiga 2018: Philipp Krenn - Building Distributed Systems in Distribu...
DevOpsDays Riga
 
DevOpsDaysRiga 2018: Antonio Pigna - Put the brAIn into your DevOps workflow
DevOpsDaysRiga 2018: Antonio Pigna - Put the brAIn into your DevOps workflowDevOpsDaysRiga 2018: Antonio Pigna - Put the brAIn into your DevOps workflow
DevOpsDaysRiga 2018: Antonio Pigna - Put the brAIn into your DevOps workflow
DevOpsDays Riga
 
DevOpsDaysRiga 2018: Christina Aldan - Fearing the Robot Overlords
DevOpsDaysRiga 2018: Christina Aldan - Fearing the Robot OverlordsDevOpsDaysRiga 2018: Christina Aldan - Fearing the Robot Overlords
DevOpsDaysRiga 2018: Christina Aldan - Fearing the Robot Overlords
DevOpsDays Riga
 
DevOpsDaysRiga 2018: Jan de Vries - Realising the power of antifragility is l...
DevOpsDaysRiga 2018: Jan de Vries - Realising the power of antifragility is l...DevOpsDaysRiga 2018: Jan de Vries - Realising the power of antifragility is l...
DevOpsDaysRiga 2018: Jan de Vries - Realising the power of antifragility is l...
DevOpsDays Riga
 
DevOpsDaysRiga 2018: Ken Mugrage - DevOps and DevOpsDays - Where it started, ...
DevOpsDaysRiga 2018: Ken Mugrage - DevOps and DevOpsDays - Where it started, ...DevOpsDaysRiga 2018: Ken Mugrage - DevOps and DevOpsDays - Where it started, ...
DevOpsDaysRiga 2018: Ken Mugrage - DevOps and DevOpsDays - Where it started, ...
DevOpsDays Riga
 
DevOpsDaysRiga 2018: Matty Stratton - How Do You Infect Your Organization Wit...
DevOpsDaysRiga 2018: Matty Stratton - How Do You Infect Your Organization Wit...DevOpsDaysRiga 2018: Matty Stratton - How Do You Infect Your Organization Wit...
DevOpsDaysRiga 2018: Matty Stratton - How Do You Infect Your Organization Wit...
DevOpsDays Riga
 
DevOpsDaysRiga 2018: Jon Hall - DevOps in the enterprise: how "swarming" can ...
DevOpsDaysRiga 2018: Jon Hall - DevOps in the enterprise: how "swarming" can ...DevOpsDaysRiga 2018: Jon Hall - DevOps in the enterprise: how "swarming" can ...
DevOpsDaysRiga 2018: Jon Hall - DevOps in the enterprise: how "swarming" can ...
DevOpsDays Riga
 
DevOpsDaysRiga 2018: Stas Zvinyatskovsky - Transformation: how big can you dr...
DevOpsDaysRiga 2018: Stas Zvinyatskovsky - Transformation: how big can you dr...DevOpsDaysRiga 2018: Stas Zvinyatskovsky - Transformation: how big can you dr...
DevOpsDaysRiga 2018: Stas Zvinyatskovsky - Transformation: how big can you dr...
DevOpsDays Riga
 
DevOpsDaysRiga 2018: Joep Piscaer - Reducing inertia with Public Cloud and Op...
DevOpsDaysRiga 2018: Joep Piscaer - Reducing inertia with Public Cloud and Op...DevOpsDaysRiga 2018: Joep Piscaer - Reducing inertia with Public Cloud and Op...
DevOpsDaysRiga 2018: Joep Piscaer - Reducing inertia with Public Cloud and Op...
DevOpsDays Riga
 
DevOpsDaysRiga 2018: Andrey Adamovich - DevOps Transformations: Tools vs Culture
DevOpsDaysRiga 2018: Andrey Adamovich - DevOps Transformations: Tools vs CultureDevOpsDaysRiga 2018: Andrey Adamovich - DevOps Transformations: Tools vs Culture
DevOpsDaysRiga 2018: Andrey Adamovich - DevOps Transformations: Tools vs Culture
DevOpsDays Riga
 
DevOpsDaysRiga 2018: Thiago de Faria - Chaos while deploying ML and making su...
DevOpsDaysRiga 2018: Thiago de Faria - Chaos while deploying ML and making su...DevOpsDaysRiga 2018: Thiago de Faria - Chaos while deploying ML and making su...
DevOpsDaysRiga 2018: Thiago de Faria - Chaos while deploying ML and making su...
DevOpsDays Riga
 
DevOpsDaysRiga 2018: Anton Arhipov - Build pipelines with TeamCity
DevOpsDaysRiga 2018: Anton Arhipov - Build pipelines with TeamCityDevOpsDaysRiga 2018: Anton Arhipov - Build pipelines with TeamCity
DevOpsDaysRiga 2018: Anton Arhipov - Build pipelines with TeamCity
DevOpsDays Riga
 
DevOpsDaysRiga 2018: Neil Crawford - Trunk based development, continuous depl...
DevOpsDaysRiga 2018: Neil Crawford - Trunk based development, continuous depl...DevOpsDaysRiga 2018: Neil Crawford - Trunk based development, continuous depl...
DevOpsDaysRiga 2018: Neil Crawford - Trunk based development, continuous depl...
DevOpsDays Riga
 
DevOpsDaysRiga 2018: Michiel Rook - Database schema migrations with zero down...
DevOpsDaysRiga 2018: Michiel Rook - Database schema migrations with zero down...DevOpsDaysRiga 2018: Michiel Rook - Database schema migrations with zero down...
DevOpsDaysRiga 2018: Michiel Rook - Database schema migrations with zero down...
DevOpsDays Riga
 

DevOpsDaysRiga 2018: Eric Skoglund, Lars Albertsson - Kubernetes as data platform

  • 1. Kubernetes as Data Platform. Riga DevOpsDays, 2018-09-28. Eric Skoglund, Bonnier News; Lars Albertsson, Mimeria.
  • 5. Scoping the platform. Brand Scope. Data Scope ➔ Behavioral Data ➔ Technical Data. No Content Data.
  • 7. Cloud Selection. The Pragmatic Choice ➔ Known to people in the dev teams ➔ New base platform for all other applications within Bonnier News
  • 8. Use Case Driven Development ➔ Use cases drive the development of the platform ➔ Focus on value and quality, not on slurping in all data in the company ➔ Start with simple use cases!
  • 9. Use Case Driven Development. Diagram labels: FIND USE CASE THAT PROVIDES VALUE; NEW DATA INTO THE PLATFORM; EVOLVE THE PLATFORM BASED ON REQUIREMENTS
  • 10. Data-centric innovation ● Need data from teams ○ willing? ○ backlog? ○ collected? ○ useful? ○ extraction? ○ data governance? ○ history?
  • 11. A collaboration paradigm. Diagram labels: Stream storage; Data lake; Data democratised
  • 12. Onboard driven by use case. Diagram label: Data lake
  • 13. Data platform == collaboration platform. Diagram label: Data lake
  • 14. Data platform overview. Diagram labels: Data lake; Cold store; Service; Online services; Offline data platform; Batch processing
  • 15. Data platform overview. Diagram labels: Data lake; Cold store; Dataset; Job; Service; Online services; Offline data platform; Batch processing
  • 16. Data platform overview. Diagram labels: Data lake; Cold store; Dataset; Pipeline; Service; Online services; Offline data platform; Job; Batch processing; Workflow orchestration
  • 17. Data platform overview. Diagram labels: Data lake; Batch processing; Online services; Cold store; Service; Data feature; Dataset; Pipeline; Offline data platform; Internal services; Job
  • 18. Life of a change, batch pipelines ● My pipeline, version 2! ○ Dual datasets during transition ● Run downstream parallel pipelines ○ Cheap ○ Low risk ○ Easy rollback ● Easy to test end-to-end ○ Upstream team can do the change. Diagram label: ∆?
  • 19. Egress target change ● Need output in different storage! ○ Adding egress target is easy ○ Egress target backfill is easy ● Facilitates cost limitation ○ Partially aggregate → BigQuery / Redshift ○ Limited retention in egress storage
  • 20. Life of an error, batch pipelines ● My dataset, bad version! 1. Revert serving datasets to the old version 2. Fix bug 3. Remove faulty datasets 4. Backfill is automatic (Luigi). Done! ● Low cost of error ○ Reactive QA ○ Production environment sufficient
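
The recovery steps on slide 20 rest on one property of the workflow engine: a job counts as done when its output dataset exists, so removing a faulty dataset makes the next scheduled run produce it again. The sketch below shows that idea in plain Python rather than through Luigi's own API. The names `DailyJob` and `run_missing`, and the dataset path layout, are hypothetical and not taken from the talk.

```python
import datetime as dt
import tempfile
from pathlib import Path


class DailyJob:
    """One job instance per date; complete when its output dataset exists."""

    def __init__(self, root: Path, name: str, date: dt.date):
        self.date = date
        # Hypothetical layout: <root>/<dataset name>/date=YYYY-MM-DD/part-0.txt
        self.output = root / name / f"date={date.isoformat()}" / "part-0.txt"

    def complete(self) -> bool:
        return self.output.exists()

    def run(self, version: str) -> None:
        # Write to a temporary name and rename, so a crashed run never
        # leaves a half-written dataset that looks complete.
        self.output.parent.mkdir(parents=True, exist_ok=True)
        tmp = self.output.with_suffix(".tmp")
        tmp.write_text(f"{self.date} computed by {version}\n")
        tmp.rename(self.output)


def run_missing(root, name, start, days, version):
    """Run every job in the date range whose output is missing."""
    ran = []
    for offset in range(days):
        job = DailyJob(root, name, start + dt.timedelta(days=offset))
        if not job.complete():
            job.run(version)
            ran.append(job.date)
    return ran


root = Path(tempfile.mkdtemp())
start = dt.date(2018, 9, 24)

# Normal operation: the buggy version "v1" produces five daily datasets.
first_run = run_missing(root, "report", start, 5, "v1")

# Step 3 on the slide: remove the faulty datasets (here, the last two days).
for offset in (3, 4):
    DailyJob(root, "report", start + dt.timedelta(days=offset)).output.unlink()

# Step 4: the next scheduled run, now with fixed code, fills only the gaps.
backfilled = run_missing(root, "report", start, 5, "v2")
print(backfilled)
```

Nothing in the second call names the dates to repair; the missing outputs alone decide what runs, which is why the slide can describe the backfill as automatic.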
  • 21. Deployment example, on-premise. Diagram labels: source repo (Luigi DSL, jars, config); my-pipe-7.tar.gz (standard deployment artifact); standard artifact store; > pip install my-pipe-7.tar.gz (all that a pipeline needs, installed atomically); Luigi daemon; Workers. Redundant cron schedule, higher frequency: 10 * * * * luigi --module mymodule MyDaily
  • 22. Deployment example, cloud native. Diagram labels: source repo (Luigi DSL, jars, config); my-pipe:7 (Docker image); Docker registry; Luigi daemon; Workers; S3 / GCS; Dataproc / EMR. Redundant cron schedule, higher frequency: kind: CronJob spec: schedule: "10 * * * *" command: "luigi --module mymodule MyDaily"
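
The CronJob snippet on slide 22 is abbreviated to fit the slide; in a real manifest the schedule sits under `spec` and the command sits on a container inside the job template. A fuller version might look like the sketch below, built as a Python dict and printed as JSON, which `kubectl` accepts alongside YAML. The `batch/v1` API version (current clusters; clusters from 2018 would have used `batch/v1beta1`), the registry host `registry.example.com`, the resource name, and the `concurrencyPolicy` and `restartPolicy` choices are assumptions, not taken from the talk.

```python
import json


def luigi_cronjob(name, image, schedule, module, task):
    """Return a Kubernetes CronJob manifest that runs one Luigi task."""
    return {
        "apiVersion": "batch/v1",
        "kind": "CronJob",
        "metadata": {"name": name},
        "spec": {
            "schedule": schedule,
            # Assumption: skip a run if the previous one is still going.
            "concurrencyPolicy": "Forbid",
            "jobTemplate": {
                "spec": {
                    "template": {
                        "spec": {
                            # Assumption: rely on the next scheduled run
                            # instead of restarting a failed container.
                            "restartPolicy": "Never",
                            "containers": [
                                {
                                    "name": name,
                                    "image": image,
                                    "command": ["luigi", "--module", module, task],
                                }
                            ],
                        }
                    }
                }
            },
        },
    }


manifest = luigi_cronjob(
    name="my-pipe",
    image="registry.example.com/my-pipe:7",  # hypothetical registry host
    schedule="10 * * * *",
    module="mymodule",
    task="MyDaily",
)
print(json.dumps(manifest, indent=2))
```

Saved to a file, the output could be applied with `kubectl apply -f`. The schedule fires hourly while the task name suggests a daily job, which matches the slide's "redundant cron schedule, higher frequency": presumably most runs find the output already present and exit quickly.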
  • 23. Deployment, one cluster less. Diagram labels: source repo (Luigi DSL, jars, config); my-pipe:7 (Docker image); Docker registry; Luigi daemon; Workers; spark-submit --master=local; S3 / GCS. Redundant cron schedule, higher frequency: kind: CronJob spec: schedule: "10 * * * *" command: "luigi --module mymodule MyDaily"
  • 24. Continuous deployment. Diagram labels: mono-repo; PR build, affected CI tests; master branch, pipeline tests, doc build; mymodule/mypipe:revtag; Openshift registry; Luigi daemon; Workers; spark-submit --master=local; S3. kind: CronJob spec: schedule: "10 * * * *" command: "luigi --module mymodule MyDaily"
  • 25. Some pipelines are straightforward
  • 28. GDPR Article 17. “The data subject shall have the right to obtain from the controller the erasure of personal data concerning him or her without undue delay and the controller shall have the obligation to erase personal data without undue delay where one of the following grounds applies:” ➔ the personal data are no longer necessary in relation to the purposes for which they were collected or otherwise processed - Data Retention ➔ the data subject withdraws consent on which the processing is based - Data Deletion Requests
  • 29. GDPR. { id: …. pii: [...] } CREATE KEY FOR ID; ENCRYPT PERSONAL DATA WITH KEY
  • 30. GDPR - Retention. { id: …. pii: [...] } CREATE KEY FOR ID; ENCRYPT PERSONAL DATA WITH KEY ➔ Each dataset has a retention time set by the owners of the data ➔ Create new keys every 30 days ➔ Destroy keys older than the retention time
  • 31. GDPR - Right to be forgotten. List of users that have requested deletion; Find keys for those users; Destroy keys
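
Slides 29 to 31 describe what is often called crypto-shredding: personal data is stored encrypted under a key tied to the user ID, and erasure is done by destroying keys rather than rewriting datasets. The sketch below models only the key bookkeeping, in plain Python. The `KeyStore` class, its method names, and the way 30-day periods are computed are hypothetical. The XOR keystream in `toy_cipher` is a toy stand-in for an authenticated cipher from a vetted library and must not be used to protect real data.

```python
import datetime as dt
import hashlib
import secrets

PERIOD_DAYS = 30  # slide 30: create new keys every 30 days


class KeyStore:
    """Holds one key per (user id, 30-day period). Hypothetical design."""

    def __init__(self):
        self._keys = {}

    @staticmethod
    def period(day: dt.date) -> int:
        return day.toordinal() // PERIOD_DAYS

    def key_for(self, user_id: str, day: dt.date) -> bytes:
        """Create the key on first use, then keep returning the same key."""
        slot = (user_id, self.period(day))
        if slot not in self._keys:
            self._keys[slot] = secrets.token_bytes(32)
        return self._keys[slot]

    def lookup(self, user_id: str, day: dt.date):
        """Return the key, or None if it never existed or was destroyed."""
        return self._keys.get((user_id, self.period(day)))

    def expire(self, today: dt.date, retention_days: int) -> None:
        """Retention: destroy keys whose whole period is past the limit."""
        cutoff = (today - dt.timedelta(days=retention_days)).toordinal()
        for user_id, period in list(self._keys):
            last_day = (period + 1) * PERIOD_DAYS - 1
            if last_day < cutoff:
                del self._keys[(user_id, period)]

    def forget(self, user_ids) -> None:
        """Right to be forgotten: destroy every key of the listed users."""
        doomed = set(user_ids)
        for slot in list(self._keys):
            if slot[0] in doomed:
                del self._keys[slot]


def toy_cipher(key: bytes, data: bytes) -> bytes:
    """XOR with a SHAKE-256 keystream. Toy only: no nonce, no authentication."""
    stream = hashlib.shake_256(key).digest(len(data))
    return bytes(a ^ b for a, b in zip(data, stream))


store = KeyStore()
day = dt.date(2018, 9, 28)
record = {"id": "user-1",
          "pii": toy_cipher(store.key_for("user-1", day), b"alice@example.com")}
other = {"id": "user-2",
         "pii": toy_cipher(store.key_for("user-2", day), b"bob@example.com")}

# Readable while the key exists (XOR is its own inverse).
plain = toy_cipher(store.lookup("user-1", day), record["pii"])

# user-1 requests deletion: the ciphertext stays in the dataset, unreadable.
store.forget(["user-1"])
key_after_forget = store.lookup("user-1", day)

# A year later, a 90-day retention policy destroys the remaining old key.
store.expire(today=dt.date(2019, 9, 28), retention_days=90)
key_after_expiry = store.lookup("user-2", day)
```

In this sketch a key is destroyed only once every day it covers is past the retention limit, so some data can outlive the limit by up to one key period. Destroying a key as soon as its first day passes the limit would be the stricter alternative; the slides do not say which the platform uses.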
  • 32. Use Cases in Use ➔ Machine Learning ◆ Built a system that tries to predict whether a visitor will watch an ad in a video ➔ Creating Reports ◆ Daily reporting data for ad team ◆ Weekly report of ad viewing data for site team ➔ GDPR Registry Extract ◆ Collect data from multiple sources ◆ Merge the data ◆ Send data to be viewed by the user
  • 33. Lessons Learned. Cloud selection is influenced by data location: most data for the use cases we started with was on Google Cloud Storage / BigQuery, incurring extra development time and cost to exfiltrate that data. Kubernetes? Same platform as other teams + great support from infrastructure platform team. No Spark cluster maintenance, tweaking, debugging. Autoscaling works, but some challenges for batch jobs.
  • 34. Summary. Use case driven development == Short Time to Production: first pipeline in 3 weeks. Small team: 2-4 people. Keep it simple: 10-15 pipelines.