SlideShare a Scribd company logo
Metadata Driven Development
Content
• What is M3D?
• M3D Components
• Why we need it?
• M3D @ Adidas
• What’s to come?
What is it?
Framework
Cloud and platform agnostic
Easy development of new feats
Multiple source/target system
Creation of data lake envs
What is it?
M3D Components
M3D Engine M3D API
https://github.com/adidas/m3d-engine https://github.com/adidas/m3d-api
Features in action!
Creation of data lake envs
Inbound
Raw
File
Raw
File
Raw
File
Raw
File
Raw
File
Raw
File
Features in action!
Creation of data lake envs
Inbound
Landing
Raw
File
Raw
File
Raw
File
Raw
File
Raw
File
Raw
File
Features in action!
Creation of data lake envs
Inbound
Landing
Lake
Parqu
et
Parqu
et
Parqu
et
Parqu
et
Parqu
et
Parqu
et
Features in action!
Creation of data lake envs
Inbound
Landing
Lake Out
View
View View
View
Lake
Features in action!
Inbound
Landing
Lake Out
Lake
Multiple source/target system
M3D
Target
Exasol
Target Y
Target
Target Z
AWS
Cloud and platform agnostic
Cloudera
Features in action!
Inbound
Landing
Lake
Parqu
et
Parqu
et
Parqu
et
Parqu
et
Parqu
et
Parqu
et
Easy development of new feats
Easily define new
readers/writers based on traits
Introduce transformations on
datasets
Add new APIs to M3D Api
Why We Need It?
Simplification of definition of new tables in lake
Easy way to schedule loads
Clear semantics defined globally
Single extraction from source systems
Simple definition of views as interface for applications
M3D @ Adidas
We have the so called “Cockpit” for definition of global semantics
Orchestration with Jenkins as a Service
Loading data to lake (example):
Place files in
Inbound layer
python m3d_main.py -function
create_emr_cluster  -
core_instance_type m4.large  -
master_instance_type m4.large  -
core_instance_count 3  -
destination_system emr  -
destination_database emr_database 
-destination_environment test  -
config config.json  -emr_version emr-
5.23.0
python m3d_main.py -function
create_table  -config config.json  -
destination_system emr  -
destination_database emr_database  -
destination_environment test  -
destination_table table_name  -
emr_cluster_id id-of-started-cluster
python m3d_main.py -function load_table  -config config.json  -
destination_system emr  -destination_database emr_database  -
destination_environment test  -destination_table table_name  -
load_type FullLoad  -emr_cluster_id id-of-started-cluster
M3D @ Adidas
9 TB In Memory Lake Size
150 TB Lake Size on Big Data Platform
500+ tables in Data Lake
> 900 views consumed by Products
M3D @ Adidas
Table
M3D @ Adidas
M3D @ Adidas
What’s to come?
Airflow as orchestration for M3D
Publish M3D Api as library
UI for quick setup of M3D Api/Engine
PySpark version for M3D engine
Thank you!
Questions?

More Related Content

What's hot

Embulk and Machine Learning infrastructure
Embulk and Machine Learning infrastructureEmbulk and Machine Learning infrastructure
Embulk and Machine Learning infrastructure
Hiroshi Toyama
 
Fluentd - Flexible, Stable, Scalable
Fluentd - Flexible, Stable, ScalableFluentd - Flexible, Stable, Scalable
Fluentd - Flexible, Stable, Scalable
Shu Ting Tseng
 
Overview of the Hive Stinger Initiative
Overview of the Hive Stinger InitiativeOverview of the Hive Stinger Initiative
Overview of the Hive Stinger Initiative
Modern Data Stack France
 
Meet Solr For The Tirst Again
Meet Solr For The Tirst AgainMeet Solr For The Tirst Again
Meet Solr For The Tirst Again
Varun Thacker
 
Ali Asad Lotia (DevOps at Beamly) - Riemann Stream Processing at #DOXLON
Ali Asad Lotia (DevOps at Beamly) - Riemann Stream Processing at #DOXLONAli Asad Lotia (DevOps at Beamly) - Riemann Stream Processing at #DOXLON
Ali Asad Lotia (DevOps at Beamly) - Riemann Stream Processing at #DOXLON
Outlyer
 
Keeping Spark on Track: Productionizing Spark for ETL
Keeping Spark on Track: Productionizing Spark for ETLKeeping Spark on Track: Productionizing Spark for ETL
Keeping Spark on Track: Productionizing Spark for ETL
Databricks
 
RethinkDB - the open-source database for the realtime web
RethinkDB - the open-source database for the realtime webRethinkDB - the open-source database for the realtime web
RethinkDB - the open-source database for the realtime web
Alex Ivanov
 
Fluentd at Bay Area Kubernetes Meetup
Fluentd at Bay Area Kubernetes MeetupFluentd at Bay Area Kubernetes Meetup
Fluentd at Bay Area Kubernetes Meetup
Sadayuki Furuhashi
 
SparkR - Play Spark Using R (20160909 HadoopCon)
SparkR - Play Spark Using R (20160909 HadoopCon)SparkR - Play Spark Using R (20160909 HadoopCon)
SparkR - Play Spark Using R (20160909 HadoopCon)
wqchen
 
DOD 2016 - Rafał Kuć - Building a Resilient Log Aggregation Pipeline Using El...
DOD 2016 - Rafał Kuć - Building a Resilient Log Aggregation Pipeline Using El...DOD 2016 - Rafał Kuć - Building a Resilient Log Aggregation Pipeline Using El...
DOD 2016 - Rafał Kuć - Building a Resilient Log Aggregation Pipeline Using El...
PROIDEA
 
Redis: REmote DIctionary Server
Redis: REmote DIctionary ServerRedis: REmote DIctionary Server
Redis: REmote DIctionary Server
Ezra Zygmuntowicz
 
Distcp gobblin
Distcp gobblinDistcp gobblin
Distcp gobblin
Vasanth Rajamani
 
MySQL Slow Query log Monitoring using Beats & ELK
MySQL Slow Query log Monitoring using Beats & ELKMySQL Slow Query log Monitoring using Beats & ELK
MySQL Slow Query log Monitoring using Beats & ELK
YoungHeon (Roy) Kim
 
MongoFr : MongoDB as a log Collector
MongoFr : MongoDB as a log CollectorMongoFr : MongoDB as a log Collector
MongoFr : MongoDB as a log Collector
Pierre Baillet
 
Automating Workflows for Analytics Pipelines
Automating Workflows for Analytics PipelinesAutomating Workflows for Analytics Pipelines
Automating Workflows for Analytics Pipelines
Sadayuki Furuhashi
 
Hw09 Building Data Intensive Apps A Closer Look At Trending Topics.Org
Hw09   Building Data Intensive Apps  A Closer Look At Trending Topics.OrgHw09   Building Data Intensive Apps  A Closer Look At Trending Topics.Org
Hw09 Building Data Intensive Apps A Closer Look At Trending Topics.Org
Cloudera, Inc.
 
Big data Lambda Architecture - Batch Layer Hands On
Big data Lambda Architecture - Batch Layer Hands OnBig data Lambda Architecture - Batch Layer Hands On
Big data Lambda Architecture - Batch Layer Hands On
hkbhadraa
 
Building Hadoop Data Applications with Kite
Building Hadoop Data Applications with KiteBuilding Hadoop Data Applications with Kite
Building Hadoop Data Applications with Kite
huguk
 
ETL with SPARK - First Spark London meetup
ETL with SPARK - First Spark London meetupETL with SPARK - First Spark London meetup
ETL with SPARK - First Spark London meetup
Rafal Kwasny
 
Fast real-time approximations using Spark streaming
Fast real-time approximations using Spark streamingFast real-time approximations using Spark streaming
Fast real-time approximations using Spark streaming
huguk
 

What's hot (20)

Embulk and Machine Learning infrastructure
Embulk and Machine Learning infrastructureEmbulk and Machine Learning infrastructure
Embulk and Machine Learning infrastructure
 
Fluentd - Flexible, Stable, Scalable
Fluentd - Flexible, Stable, ScalableFluentd - Flexible, Stable, Scalable
Fluentd - Flexible, Stable, Scalable
 
Overview of the Hive Stinger Initiative
Overview of the Hive Stinger InitiativeOverview of the Hive Stinger Initiative
Overview of the Hive Stinger Initiative
 
Meet Solr For The Tirst Again
Meet Solr For The Tirst AgainMeet Solr For The Tirst Again
Meet Solr For The Tirst Again
 
Ali Asad Lotia (DevOps at Beamly) - Riemann Stream Processing at #DOXLON
Ali Asad Lotia (DevOps at Beamly) - Riemann Stream Processing at #DOXLONAli Asad Lotia (DevOps at Beamly) - Riemann Stream Processing at #DOXLON
Ali Asad Lotia (DevOps at Beamly) - Riemann Stream Processing at #DOXLON
 
Keeping Spark on Track: Productionizing Spark for ETL
Keeping Spark on Track: Productionizing Spark for ETLKeeping Spark on Track: Productionizing Spark for ETL
Keeping Spark on Track: Productionizing Spark for ETL
 
RethinkDB - the open-source database for the realtime web
RethinkDB - the open-source database for the realtime webRethinkDB - the open-source database for the realtime web
RethinkDB - the open-source database for the realtime web
 
Fluentd at Bay Area Kubernetes Meetup
Fluentd at Bay Area Kubernetes MeetupFluentd at Bay Area Kubernetes Meetup
Fluentd at Bay Area Kubernetes Meetup
 
SparkR - Play Spark Using R (20160909 HadoopCon)
SparkR - Play Spark Using R (20160909 HadoopCon)SparkR - Play Spark Using R (20160909 HadoopCon)
SparkR - Play Spark Using R (20160909 HadoopCon)
 
DOD 2016 - Rafał Kuć - Building a Resilient Log Aggregation Pipeline Using El...
DOD 2016 - Rafał Kuć - Building a Resilient Log Aggregation Pipeline Using El...DOD 2016 - Rafał Kuć - Building a Resilient Log Aggregation Pipeline Using El...
DOD 2016 - Rafał Kuć - Building a Resilient Log Aggregation Pipeline Using El...
 
Redis: REmote DIctionary Server
Redis: REmote DIctionary ServerRedis: REmote DIctionary Server
Redis: REmote DIctionary Server
 
Distcp gobblin
Distcp gobblinDistcp gobblin
Distcp gobblin
 
MySQL Slow Query log Monitoring using Beats & ELK
MySQL Slow Query log Monitoring using Beats & ELKMySQL Slow Query log Monitoring using Beats & ELK
MySQL Slow Query log Monitoring using Beats & ELK
 
MongoFr : MongoDB as a log Collector
MongoFr : MongoDB as a log CollectorMongoFr : MongoDB as a log Collector
MongoFr : MongoDB as a log Collector
 
Automating Workflows for Analytics Pipelines
Automating Workflows for Analytics PipelinesAutomating Workflows for Analytics Pipelines
Automating Workflows for Analytics Pipelines
 
Hw09 Building Data Intensive Apps A Closer Look At Trending Topics.Org
Hw09   Building Data Intensive Apps  A Closer Look At Trending Topics.OrgHw09   Building Data Intensive Apps  A Closer Look At Trending Topics.Org
Hw09 Building Data Intensive Apps A Closer Look At Trending Topics.Org
 
Big data Lambda Architecture - Batch Layer Hands On
Big data Lambda Architecture - Batch Layer Hands OnBig data Lambda Architecture - Batch Layer Hands On
Big data Lambda Architecture - Batch Layer Hands On
 
Building Hadoop Data Applications with Kite
Building Hadoop Data Applications with KiteBuilding Hadoop Data Applications with Kite
Building Hadoop Data Applications with Kite
 
ETL with SPARK - First Spark London meetup
ETL with SPARK - First Spark London meetupETL with SPARK - First Spark London meetup
ETL with SPARK - First Spark London meetup
 
Fast real-time approximations using Spark streaming
Fast real-time approximations using Spark streamingFast real-time approximations using Spark streaming
Fast real-time approximations using Spark streaming
 

Similar to M3D - Metadata Driven Development

Building a Scalable Asset Management (DAM) Platform in the AWS
Building a Scalable Asset Management (DAM) Platform in the AWSBuilding a Scalable Asset Management (DAM) Platform in the AWS
Building a Scalable Asset Management (DAM) Platform in the AWS
Rahul Shukla
 
Building a Scalable Digital Asset Management Platform in the Cloud (MED402) |...
Building a Scalable Digital Asset Management Platform in the Cloud (MED402) |...Building a Scalable Digital Asset Management Platform in the Cloud (MED402) |...
Building a Scalable Digital Asset Management Platform in the Cloud (MED402) |...
Amazon Web Services
 
Architetture serverless e pattern avanzati per AWS Lambda
Architetture serverless e pattern avanzati per AWS LambdaArchitetture serverless e pattern avanzati per AWS Lambda
Architetture serverless e pattern avanzati per AWS Lambda
Amazon Web Services
 
Running Presto and Spark on the Netflix Big Data Platform
Running Presto and Spark on the Netflix Big Data PlatformRunning Presto and Spark on the Netflix Big Data Platform
Running Presto and Spark on the Netflix Big Data Platform
Eva Tse
 
(BDT303) Running Spark and Presto on the Netflix Big Data Platform
(BDT303) Running Spark and Presto on the Netflix Big Data Platform(BDT303) Running Spark and Presto on the Netflix Big Data Platform
(BDT303) Running Spark and Presto on the Netflix Big Data Platform
Amazon Web Services
 
(BAC404) Deploying High Availability and Disaster Recovery Architectures with...
(BAC404) Deploying High Availability and Disaster Recovery Architectures with...(BAC404) Deploying High Availability and Disaster Recovery Architectures with...
(BAC404) Deploying High Availability and Disaster Recovery Architectures with...
Amazon Web Services
 
(BDT208) A Technical Introduction to Amazon Elastic MapReduce
(BDT208) A Technical Introduction to Amazon Elastic MapReduce(BDT208) A Technical Introduction to Amazon Elastic MapReduce
(BDT208) A Technical Introduction to Amazon Elastic MapReduce
Amazon Web Services
 
SnapDish AWS
SnapDish AWSSnapDish AWS
SnapDish AWS
Fumikazu Kiyota
 
Running Spark In Production in the Cloud is Not Easy with Nayur Khan
Running Spark In Production in the Cloud is Not Easy with Nayur KhanRunning Spark In Production in the Cloud is Not Easy with Nayur Khan
Running Spark In Production in the Cloud is Not Easy with Nayur Khan
Databricks
 
AWS re:Invent 2016: ↑↑↓↓←→←→ BA Lambda Start (SVR305)
AWS re:Invent 2016: ↑↑↓↓←→←→ BA Lambda Start (SVR305)AWS re:Invent 2016: ↑↑↓↓←→←→ BA Lambda Start (SVR305)
AWS re:Invent 2016: ↑↑↓↓←→←→ BA Lambda Start (SVR305)
Amazon Web Services
 
대용량 데이타 쉽고 빠르게 분석하기 :: 김일호 솔루션즈 아키텍트 :: Gaming on AWS 2016
대용량 데이타 쉽고 빠르게 분석하기 :: 김일호 솔루션즈 아키텍트 :: Gaming on AWS 2016대용량 데이타 쉽고 빠르게 분석하기 :: 김일호 솔루션즈 아키텍트 :: Gaming on AWS 2016
대용량 데이타 쉽고 빠르게 분석하기 :: 김일호 솔루션즈 아키텍트 :: Gaming on AWS 2016
Amazon Web Services Korea
 
My First Big Data Application
My First Big Data ApplicationMy First Big Data Application
My First Big Data Application
Amazon Web Services
 
REPEAT_1_Deep_dive_on_new_features_in_Amazon_RDS_for_SQL_Server_DAT364-R1(1).pdf
REPEAT_1_Deep_dive_on_new_features_in_Amazon_RDS_for_SQL_Server_DAT364-R1(1).pdfREPEAT_1_Deep_dive_on_new_features_in_Amazon_RDS_for_SQL_Server_DAT364-R1(1).pdf
REPEAT_1_Deep_dive_on_new_features_in_Amazon_RDS_for_SQL_Server_DAT364-R1(1).pdf
AkashGoel82
 
Effectively Scale and Operate AEM with MongoDB by Norberto Leite
Effectively Scale and Operate AEM with MongoDB by Norberto LeiteEffectively Scale and Operate AEM with MongoDB by Norberto Leite
Effectively Scale and Operate AEM with MongoDB by Norberto Leite
AEM HUB
 
M|18 What's New in the MariaDB AX Platform
M|18 What's New in the MariaDB AX PlatformM|18 What's New in the MariaDB AX Platform
M|18 What's New in the MariaDB AX Platform
MariaDB plc
 
M|18 Analyzing Data with the MariaDB AX Platform
M|18 Analyzing Data with the MariaDB AX PlatformM|18 Analyzing Data with the MariaDB AX Platform
M|18 Analyzing Data with the MariaDB AX Platform
MariaDB plc
 
Introduction to Apache Amaterasu (Incubating): CD Framework For Your Big Data...
Introduction to Apache Amaterasu (Incubating): CD Framework For Your Big Data...Introduction to Apache Amaterasu (Incubating): CD Framework For Your Big Data...
Introduction to Apache Amaterasu (Incubating): CD Framework For Your Big Data...
DataWorks Summit
 
Dok Talks #124 - Intro to Druid on Kubernetes
Dok Talks #124 - Intro to Druid on KubernetesDok Talks #124 - Intro to Druid on Kubernetes
Dok Talks #124 - Intro to Druid on Kubernetes
DoKC
 
Spark Saturday: Spark SQL & DataFrame Workshop with Apache Spark 2.3
Spark Saturday: Spark SQL & DataFrame Workshop with Apache Spark 2.3Spark Saturday: Spark SQL & DataFrame Workshop with Apache Spark 2.3
Spark Saturday: Spark SQL & DataFrame Workshop with Apache Spark 2.3
Databricks
 
Effectively Deploying MongoDB on AEM
Effectively Deploying MongoDB on AEMEffectively Deploying MongoDB on AEM
Effectively Deploying MongoDB on AEM
Norberto Leite
 

Similar to M3D - Metadata Driven Development (20)

Building a Scalable Asset Management (DAM) Platform in the AWS
Building a Scalable Asset Management (DAM) Platform in the AWSBuilding a Scalable Asset Management (DAM) Platform in the AWS
Building a Scalable Asset Management (DAM) Platform in the AWS
 
Building a Scalable Digital Asset Management Platform in the Cloud (MED402) |...
Building a Scalable Digital Asset Management Platform in the Cloud (MED402) |...Building a Scalable Digital Asset Management Platform in the Cloud (MED402) |...
Building a Scalable Digital Asset Management Platform in the Cloud (MED402) |...
 
Architetture serverless e pattern avanzati per AWS Lambda
Architetture serverless e pattern avanzati per AWS LambdaArchitetture serverless e pattern avanzati per AWS Lambda
Architetture serverless e pattern avanzati per AWS Lambda
 
Running Presto and Spark on the Netflix Big Data Platform
Running Presto and Spark on the Netflix Big Data PlatformRunning Presto and Spark on the Netflix Big Data Platform
Running Presto and Spark on the Netflix Big Data Platform
 
(BDT303) Running Spark and Presto on the Netflix Big Data Platform
(BDT303) Running Spark and Presto on the Netflix Big Data Platform(BDT303) Running Spark and Presto on the Netflix Big Data Platform
(BDT303) Running Spark and Presto on the Netflix Big Data Platform
 
(BAC404) Deploying High Availability and Disaster Recovery Architectures with...
(BAC404) Deploying High Availability and Disaster Recovery Architectures with...(BAC404) Deploying High Availability and Disaster Recovery Architectures with...
(BAC404) Deploying High Availability and Disaster Recovery Architectures with...
 
(BDT208) A Technical Introduction to Amazon Elastic MapReduce
(BDT208) A Technical Introduction to Amazon Elastic MapReduce(BDT208) A Technical Introduction to Amazon Elastic MapReduce
(BDT208) A Technical Introduction to Amazon Elastic MapReduce
 
SnapDish AWS
SnapDish AWSSnapDish AWS
SnapDish AWS
 
Running Spark In Production in the Cloud is Not Easy with Nayur Khan
Running Spark In Production in the Cloud is Not Easy with Nayur KhanRunning Spark In Production in the Cloud is Not Easy with Nayur Khan
Running Spark In Production in the Cloud is Not Easy with Nayur Khan
 
AWS re:Invent 2016: ↑↑↓↓←→←→ BA Lambda Start (SVR305)
AWS re:Invent 2016: ↑↑↓↓←→←→ BA Lambda Start (SVR305)AWS re:Invent 2016: ↑↑↓↓←→←→ BA Lambda Start (SVR305)
AWS re:Invent 2016: ↑↑↓↓←→←→ BA Lambda Start (SVR305)
 
대용량 데이타 쉽고 빠르게 분석하기 :: 김일호 솔루션즈 아키텍트 :: Gaming on AWS 2016
대용량 데이타 쉽고 빠르게 분석하기 :: 김일호 솔루션즈 아키텍트 :: Gaming on AWS 2016대용량 데이타 쉽고 빠르게 분석하기 :: 김일호 솔루션즈 아키텍트 :: Gaming on AWS 2016
대용량 데이타 쉽고 빠르게 분석하기 :: 김일호 솔루션즈 아키텍트 :: Gaming on AWS 2016
 
My First Big Data Application
My First Big Data ApplicationMy First Big Data Application
My First Big Data Application
 
REPEAT_1_Deep_dive_on_new_features_in_Amazon_RDS_for_SQL_Server_DAT364-R1(1).pdf
REPEAT_1_Deep_dive_on_new_features_in_Amazon_RDS_for_SQL_Server_DAT364-R1(1).pdfREPEAT_1_Deep_dive_on_new_features_in_Amazon_RDS_for_SQL_Server_DAT364-R1(1).pdf
REPEAT_1_Deep_dive_on_new_features_in_Amazon_RDS_for_SQL_Server_DAT364-R1(1).pdf
 
Effectively Scale and Operate AEM with MongoDB by Norberto Leite
Effectively Scale and Operate AEM with MongoDB by Norberto LeiteEffectively Scale and Operate AEM with MongoDB by Norberto Leite
Effectively Scale and Operate AEM with MongoDB by Norberto Leite
 
M|18 What's New in the MariaDB AX Platform
M|18 What's New in the MariaDB AX PlatformM|18 What's New in the MariaDB AX Platform
M|18 What's New in the MariaDB AX Platform
 
M|18 Analyzing Data with the MariaDB AX Platform
M|18 Analyzing Data with the MariaDB AX PlatformM|18 Analyzing Data with the MariaDB AX Platform
M|18 Analyzing Data with the MariaDB AX Platform
 
Introduction to Apache Amaterasu (Incubating): CD Framework For Your Big Data...
Introduction to Apache Amaterasu (Incubating): CD Framework For Your Big Data...Introduction to Apache Amaterasu (Incubating): CD Framework For Your Big Data...
Introduction to Apache Amaterasu (Incubating): CD Framework For Your Big Data...
 
Dok Talks #124 - Intro to Druid on Kubernetes
Dok Talks #124 - Intro to Druid on KubernetesDok Talks #124 - Intro to Druid on Kubernetes
Dok Talks #124 - Intro to Druid on Kubernetes
 
Spark Saturday: Spark SQL & DataFrame Workshop with Apache Spark 2.3
Spark Saturday: Spark SQL & DataFrame Workshop with Apache Spark 2.3Spark Saturday: Spark SQL & DataFrame Workshop with Apache Spark 2.3
Spark Saturday: Spark SQL & DataFrame Workshop with Apache Spark 2.3
 
Effectively Deploying MongoDB on AEM
Effectively Deploying MongoDB on AEMEffectively Deploying MongoDB on AEM
Effectively Deploying MongoDB on AEM
 

Recently uploaded

9 CEO's who hit $100m ARR Share Their Top Growth Tactics Nathan Latka, Founde...
9 CEO's who hit $100m ARR Share Their Top Growth Tactics Nathan Latka, Founde...9 CEO's who hit $100m ARR Share Their Top Growth Tactics Nathan Latka, Founde...
9 CEO's who hit $100m ARR Share Their Top Growth Tactics Nathan Latka, Founde...
saastr
 
Harnessing the Power of NLP and Knowledge Graphs for Opioid Research
Harnessing the Power of NLP and Knowledge Graphs for Opioid ResearchHarnessing the Power of NLP and Knowledge Graphs for Opioid Research
Harnessing the Power of NLP and Knowledge Graphs for Opioid Research
Neo4j
 
"Frontline Battles with DDoS: Best practices and Lessons Learned", Igor Ivaniuk
"Frontline Battles with DDoS: Best practices and Lessons Learned",  Igor Ivaniuk"Frontline Battles with DDoS: Best practices and Lessons Learned",  Igor Ivaniuk
"Frontline Battles with DDoS: Best practices and Lessons Learned", Igor Ivaniuk
Fwdays
 
Y-Combinator seed pitch deck template PP
Y-Combinator seed pitch deck template PPY-Combinator seed pitch deck template PP
Y-Combinator seed pitch deck template PP
c5vrf27qcz
 
Columbus Data & Analytics Wednesdays - June 2024
Columbus Data & Analytics Wednesdays - June 2024Columbus Data & Analytics Wednesdays - June 2024
Columbus Data & Analytics Wednesdays - June 2024
Jason Packer
 
Energy Efficient Video Encoding for Cloud and Edge Computing Instances
Energy Efficient Video Encoding for Cloud and Edge Computing InstancesEnergy Efficient Video Encoding for Cloud and Edge Computing Instances
Energy Efficient Video Encoding for Cloud and Edge Computing Instances
Alpen-Adria-Universität
 
The Microsoft 365 Migration Tutorial For Beginner.pptx
The Microsoft 365 Migration Tutorial For Beginner.pptxThe Microsoft 365 Migration Tutorial For Beginner.pptx
The Microsoft 365 Migration Tutorial For Beginner.pptx
operationspcvita
 
Freshworks Rethinks NoSQL for Rapid Scaling & Cost-Efficiency
Freshworks Rethinks NoSQL for Rapid Scaling & Cost-EfficiencyFreshworks Rethinks NoSQL for Rapid Scaling & Cost-Efficiency
Freshworks Rethinks NoSQL for Rapid Scaling & Cost-Efficiency
ScyllaDB
 
zkStudyClub - LatticeFold: A Lattice-based Folding Scheme and its Application...
zkStudyClub - LatticeFold: A Lattice-based Folding Scheme and its Application...zkStudyClub - LatticeFold: A Lattice-based Folding Scheme and its Application...
zkStudyClub - LatticeFold: A Lattice-based Folding Scheme and its Application...
Alex Pruden
 
Main news related to the CCS TSI 2023 (2023/1695)
Main news related to the CCS TSI 2023 (2023/1695)Main news related to the CCS TSI 2023 (2023/1695)
Main news related to the CCS TSI 2023 (2023/1695)
Jakub Marek
 
Introduction of Cybersecurity with OSS at Code Europe 2024
Introduction of Cybersecurity with OSS  at Code Europe 2024Introduction of Cybersecurity with OSS  at Code Europe 2024
Introduction of Cybersecurity with OSS at Code Europe 2024
Hiroshi SHIBATA
 
Dandelion Hashtable: beyond billion requests per second on a commodity server
Dandelion Hashtable: beyond billion requests per second on a commodity serverDandelion Hashtable: beyond billion requests per second on a commodity server
Dandelion Hashtable: beyond billion requests per second on a commodity server
Antonios Katsarakis
 
Leveraging the Graph for Clinical Trials and Standards
Leveraging the Graph for Clinical Trials and StandardsLeveraging the Graph for Clinical Trials and Standards
Leveraging the Graph for Clinical Trials and Standards
Neo4j
 
Northern Engraving | Nameplate Manufacturing Process - 2024
Northern Engraving | Nameplate Manufacturing Process - 2024Northern Engraving | Nameplate Manufacturing Process - 2024
Northern Engraving | Nameplate Manufacturing Process - 2024
Northern Engraving
 
Your One-Stop Shop for Python Success: Top 10 US Python Development Providers
Your One-Stop Shop for Python Success: Top 10 US Python Development ProvidersYour One-Stop Shop for Python Success: Top 10 US Python Development Providers
Your One-Stop Shop for Python Success: Top 10 US Python Development Providers
akankshawande
 
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdfHow to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf
Chart Kalyan
 
Connector Corner: Seamlessly power UiPath Apps, GenAI with prebuilt connectors
Connector Corner: Seamlessly power UiPath Apps, GenAI with prebuilt connectorsConnector Corner: Seamlessly power UiPath Apps, GenAI with prebuilt connectors
Connector Corner: Seamlessly power UiPath Apps, GenAI with prebuilt connectors
DianaGray10
 
AppSec PNW: Android and iOS Application Security with MobSF
AppSec PNW: Android and iOS Application Security with MobSFAppSec PNW: Android and iOS Application Security with MobSF
AppSec PNW: Android and iOS Application Security with MobSF
Ajin Abraham
 
"Choosing proper type of scaling", Olena Syrota
"Choosing proper type of scaling", Olena Syrota"Choosing proper type of scaling", Olena Syrota
"Choosing proper type of scaling", Olena Syrota
Fwdays
 
Deep Dive: AI-Powered Marketing to Get More Leads and Customers with HyperGro...
Deep Dive: AI-Powered Marketing to Get More Leads and Customers with HyperGro...Deep Dive: AI-Powered Marketing to Get More Leads and Customers with HyperGro...
Deep Dive: AI-Powered Marketing to Get More Leads and Customers with HyperGro...
saastr
 

Recently uploaded (20)

9 CEO's who hit $100m ARR Share Their Top Growth Tactics Nathan Latka, Founde...
9 CEO's who hit $100m ARR Share Their Top Growth Tactics Nathan Latka, Founde...9 CEO's who hit $100m ARR Share Their Top Growth Tactics Nathan Latka, Founde...
9 CEO's who hit $100m ARR Share Their Top Growth Tactics Nathan Latka, Founde...
 
Harnessing the Power of NLP and Knowledge Graphs for Opioid Research
Harnessing the Power of NLP and Knowledge Graphs for Opioid ResearchHarnessing the Power of NLP and Knowledge Graphs for Opioid Research
Harnessing the Power of NLP and Knowledge Graphs for Opioid Research
 
"Frontline Battles with DDoS: Best practices and Lessons Learned", Igor Ivaniuk
"Frontline Battles with DDoS: Best practices and Lessons Learned",  Igor Ivaniuk"Frontline Battles with DDoS: Best practices and Lessons Learned",  Igor Ivaniuk
"Frontline Battles with DDoS: Best practices and Lessons Learned", Igor Ivaniuk
 
Y-Combinator seed pitch deck template PP
Y-Combinator seed pitch deck template PPY-Combinator seed pitch deck template PP
Y-Combinator seed pitch deck template PP
 
Columbus Data & Analytics Wednesdays - June 2024
Columbus Data & Analytics Wednesdays - June 2024Columbus Data & Analytics Wednesdays - June 2024
Columbus Data & Analytics Wednesdays - June 2024
 
Energy Efficient Video Encoding for Cloud and Edge Computing Instances
Energy Efficient Video Encoding for Cloud and Edge Computing InstancesEnergy Efficient Video Encoding for Cloud and Edge Computing Instances
Energy Efficient Video Encoding for Cloud and Edge Computing Instances
 
The Microsoft 365 Migration Tutorial For Beginner.pptx
The Microsoft 365 Migration Tutorial For Beginner.pptxThe Microsoft 365 Migration Tutorial For Beginner.pptx
The Microsoft 365 Migration Tutorial For Beginner.pptx
 
Freshworks Rethinks NoSQL for Rapid Scaling & Cost-Efficiency
Freshworks Rethinks NoSQL for Rapid Scaling & Cost-EfficiencyFreshworks Rethinks NoSQL for Rapid Scaling & Cost-Efficiency
Freshworks Rethinks NoSQL for Rapid Scaling & Cost-Efficiency
 
zkStudyClub - LatticeFold: A Lattice-based Folding Scheme and its Application...
zkStudyClub - LatticeFold: A Lattice-based Folding Scheme and its Application...zkStudyClub - LatticeFold: A Lattice-based Folding Scheme and its Application...
zkStudyClub - LatticeFold: A Lattice-based Folding Scheme and its Application...
 
Main news related to the CCS TSI 2023 (2023/1695)
Main news related to the CCS TSI 2023 (2023/1695)Main news related to the CCS TSI 2023 (2023/1695)
Main news related to the CCS TSI 2023 (2023/1695)
 
Introduction of Cybersecurity with OSS at Code Europe 2024
Introduction of Cybersecurity with OSS  at Code Europe 2024Introduction of Cybersecurity with OSS  at Code Europe 2024
Introduction of Cybersecurity with OSS at Code Europe 2024
 
Dandelion Hashtable: beyond billion requests per second on a commodity server
Dandelion Hashtable: beyond billion requests per second on a commodity serverDandelion Hashtable: beyond billion requests per second on a commodity server
Dandelion Hashtable: beyond billion requests per second on a commodity server
 
Leveraging the Graph for Clinical Trials and Standards
Leveraging the Graph for Clinical Trials and StandardsLeveraging the Graph for Clinical Trials and Standards
Leveraging the Graph for Clinical Trials and Standards
 
Northern Engraving | Nameplate Manufacturing Process - 2024
Northern Engraving | Nameplate Manufacturing Process - 2024Northern Engraving | Nameplate Manufacturing Process - 2024
Northern Engraving | Nameplate Manufacturing Process - 2024
 
Your One-Stop Shop for Python Success: Top 10 US Python Development Providers
Your One-Stop Shop for Python Success: Top 10 US Python Development ProvidersYour One-Stop Shop for Python Success: Top 10 US Python Development Providers
Your One-Stop Shop for Python Success: Top 10 US Python Development Providers
 
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdfHow to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf
 
Connector Corner: Seamlessly power UiPath Apps, GenAI with prebuilt connectors
Connector Corner: Seamlessly power UiPath Apps, GenAI with prebuilt connectorsConnector Corner: Seamlessly power UiPath Apps, GenAI with prebuilt connectors
Connector Corner: Seamlessly power UiPath Apps, GenAI with prebuilt connectors
 
AppSec PNW: Android and iOS Application Security with MobSF
AppSec PNW: Android and iOS Application Security with MobSFAppSec PNW: Android and iOS Application Security with MobSF
AppSec PNW: Android and iOS Application Security with MobSF
 
"Choosing proper type of scaling", Olena Syrota
"Choosing proper type of scaling", Olena Syrota"Choosing proper type of scaling", Olena Syrota
"Choosing proper type of scaling", Olena Syrota
 
Deep Dive: AI-Powered Marketing to Get More Leads and Customers with HyperGro...
Deep Dive: AI-Powered Marketing to Get More Leads and Customers with HyperGro...Deep Dive: AI-Powered Marketing to Get More Leads and Customers with HyperGro...
Deep Dive: AI-Powered Marketing to Get More Leads and Customers with HyperGro...
 

M3D - Metadata Driven Development

  • 2. Content • What is M3D? • M3D Components • Why we need it? • M3D @ Adidas • What’s to come?
  • 3. What is it? Framework Cloud and platform agnostic Easy development of new feats Multiple source/target system Creation of data lake envs
  • 5. M3D Components M3D Engine M3D API https://github.com/adidas/m3d-engine https://github.com/adidas/m3d-api
  • 6. Features in action! Creation of data lake envs Inbound Raw File Raw File Raw File Raw File Raw File Raw File
  • 7. Features in action! Creation of data lake envs Inbound Landing Raw File Raw File Raw File Raw File Raw File Raw File
  • 8. Features in action! Creation of data lake envs Inbound Landing Lake Parqu et Parqu et Parqu et Parqu et Parqu et Parqu et
  • 9. Features in action! Creation of data lake envs Inbound Landing Lake Out View View View View Lake
  • 10. Features in action! Inbound Landing Lake Out Lake Multiple source/target system M3D Target Exasol Target Y Target Target Z AWS Cloud and platform agnostic Cloudera
  • 11. Features in action! Inbound Landing Lake Parqu et Parqu et Parqu et Parqu et Parqu et Parqu et Easy development of new feats Easily define new readers/writers based on traits Introduce transformations on datasets Add new APIs to M3D Api
  • 12. Why We Need It? Simplification of definition of new tables in lake Easy way to schedule loads Clear semantics defined globally Single extraction from source systems Simple definition of views as interface for applications
  • 13. M3D @ Adidas We have the so called “Cockpit” for definition of global semantics Orchestration with Jenkins as a Service Loading data to lake (example): Place files in Inbound layer python m3d_main.py -function create_emr_cluster - core_instance_type m4.large - master_instance_type m4.large - core_instance_count 3 - destination_system emr - destination_database emr_database -destination_environment test - config config.json -emr_version emr- 5.23.0 python m3d_main.py -function create_table -config config.json - destination_system emr - destination_database emr_database - destination_environment test - destination_table table_name - emr_cluster_id id-of-started-cluster python m3d_main.py -function load_table -config config.json - destination_system emr -destination_database emr_database - destination_environment test -destination_table table_name - load_type FullLoad -emr_cluster_id id-of-started-cluster
  • 14. M3D @ Adidas 9 TB In Memory Lake Size 150 TB Lake Size on Big Data Platform 500+ tables in Data Lake > 900 views consumed by Products
  • 18. What’s to come? Airflow as orchestration for M3D Publish M3D Api as library UI for quick setup of M3D Api/Engine PySpark version for M3D engine