Gobblin Configuration Management
Min Tu, Pradhan Cadabam
Gobblin Meetup June 2016
Agenda
1. Current Solutions and Motivation – Why did we build Gobblin config?
2. Architecture – Gobblin config internals
3. Retention Example – How is retention configured using Gobblin config?
Job Configs vs. Dataset Configs
Copy Job
- Permission for loginEvent: 700
- Permission for logoutEvent: 777
Source: /events/loginEvent, /events/logoutEvent
Destination: /events/loginEvent - 700, /events/logoutEvent - 777
Option 1: One job per dataset
Copy Job 1: dest.permission = 700, whitelist = loginEvent
Copy Job 2: dest.permission = 777, whitelist = logoutEvent
- Too many jobs
- Long whitelist
- Difficult to maintain
Option 2: Prefix
Copy Job with prefix: loginEvent.dest.permission = 700, logoutEvent.dest.permission = 777
- Too many configs
- Cannot have a single config for all datasets with the same permissions
Data Life Cycle Management Configs
[Diagram: /events/loginEvent_Avro -> Conversion Job -> /events/loginEvent_Orc -> Copy Job -> /events/loginEvent_Orc, with Retention Jobs on the /events/loginEvent_Orc paths]
• Shared configs across jobs
• Destination path of conversion job is source path of copy job
• Retention job works on destination path of copy job
• Dataset needs to be enabled in all jobs
Other Motivations
• New versions of configs should be deployable without deploying new binaries
• It should be easy to roll back to the previous stable version of configs
• Config changes should have an audit trail
• Support for complex value types and substitution resolution
Dataset Configuration Management
At a very high level, we extend Typesafe Config with:
• Abstraction of a Config Store
• Config versioning
• Support for logical “import” URIs
• Ability to traverse the “import” relationships
Architecture
Client Application
ConfigClient API
ConfigStore API
HadoopFS Store | Hive MetaStore Adapter | MySQL Adapter | Zookeeper Adapter | …
Data Model
Config Store
Dataset config key (URI): /events/loginEvent
    Key1: value1
    Key2: value2
    …
    KeyM: valueM
Dataset config key (URI): /events
Tag config key (URI): /tags
Tag config key (URI): /tags/highPriority
    keyA: valueX
    keyB: valueY
[Diagram arrows are labeled “imports”, “Imported by”, and “Implicit import” (twice).]
HOCON format
• Supports Java properties files
• Supports JSON files
• Value substitution
• “+=” syntax to append elements to arrays, e.g. path += "/bin"
• …
gobblin.retention : {
    selection {
        timeBased.lookbackTime=3y
    }
}
Using Configs in code
// Shown in signature form: parameter types stand in for the actual arguments.
ConfigClient client = ConfigClient.createConfigClient(VersionStabilityPolicy policy);
Config config = client.getConfig(URI uri);
Collection<URI> imports = client.getImports(URI dataset, boolean recursive);
Collection<URI> importedBy = client.getImportedBy(URI tag, boolean recursive);
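The recursive flag on getImports and getImportedBy can be illustrated with a small in-memory sketch. This is not the Gobblin ConfigClient: the class ImportGraphSketch, its addImport method, and the sample edges are hypothetical, and the implicit parent import (/tags/highPriority to /tags) is modeled as an ordinary edge for brevity.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical in-memory import graph; a stand-in for illustration only.
class ImportGraphSketch {
    // key -> keys it imports directly
    private final Map<URI, List<URI>> imports = new LinkedHashMap<>();

    void addImport(URI from, URI to) {
        imports.computeIfAbsent(from, k -> new ArrayList<>()).add(to);
    }

    // Direct imports of a key, or everything reachable when recursive is true.
    Collection<URI> getImports(URI key, boolean recursive) {
        Set<URI> result = new LinkedHashSet<>();
        collect(key, recursive, result);
        return result;
    }

    private void collect(URI key, boolean recursive, Set<URI> seen) {
        for (URI target : imports.getOrDefault(key, Collections.<URI>emptyList())) {
            // seen.add returns false for already-visited keys, which also stops cycles
            if (seen.add(target) && recursive) {
                collect(target, true, seen);
            }
        }
    }

    // Reverse lookup: keys that import the given key, directly or transitively.
    Collection<URI> getImportedBy(URI key, boolean recursive) {
        Set<URI> result = new LinkedHashSet<>();
        collectReverse(key, recursive, result);
        return result;
    }

    private void collectReverse(URI key, boolean recursive, Set<URI> seen) {
        for (Map.Entry<URI, List<URI>> e : imports.entrySet()) {
            if (e.getValue().contains(key) && seen.add(e.getKey()) && recursive) {
                collectReverse(e.getKey(), true, seen);
            }
        }
    }

    // Sample edges (assumed for illustration).
    static ImportGraphSketch example() {
        ImportGraphSketch g = new ImportGraphSketch();
        g.addImport(URI.create("/events/shareEvent"), URI.create("/tags/highPriority"));
        g.addImport(URI.create("/tags/highPriority"), URI.create("/tags"));
        g.addImport(URI.create("/events/clickEvent"), URI.create("/tags/blacklist"));
        return g;
    }

    public static void main(String[] args) {
        ImportGraphSketch g = example();
        URI share = URI.create("/events/shareEvent");
        System.out.println(g.getImports(share, false));
        System.out.println(g.getImports(share, true));
        System.out.println(g.getImportedBy(URI.create("/tags"), true));
    }
}
```

With recursive set to false only the direct edge is returned; with true the traversal follows imports of imports.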
Config lifecycle at LinkedIn
Example of a config store on HDFS
ROOT
├── _CONFIG_STORE              // contents = latest non-rolled-back version
└── 1.0.53                     // version directory
    ├── events
    │   ├── main.conf
    │   ├── loginEvent
    │   │   ├── main.conf      // configuration file for /events/loginEvent
    │   │   └── includes.conf  // specify import links for /events/loginEvent
    │   ├── shareEvent
    │   │   └── includes.conf
    │   └── clickEvent
    │       └── includes.conf
    │
    └── tags
        ├── highPriority
        │   ├── main.conf      // configuration file for /tags/highPriority
        │   └── includes.conf  // specify import links for /tags/highPriority
        ├── blacklist
        └── 10Days
Retention
• Deleting data that is no longer required
• The most common retention policy is to delete data older than a given number of days
Example
• Retention policy of 10 days for loginEvent
• Retention policy of 30 days for logoutEvent
Before Retention
events
├── loginEvent
│   ├── 2016-06-20.avro
│   └── 2016-06-25.avro
└── logoutEvent
    ├── 2016-05-10.avro
    └── 2016-06-10.avro
After Retention
events
├── loginEvent
│   └── 2016-06-25.avro
└── logoutEvent
    └── 2016-06-10.avro
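The before/after example can be reproduced with a minimal sketch of time-based selection. The class TimeBasedRetentionSketch and the run date of 2016-07-02 are assumptions chosen so that the result matches the slide; this is not Gobblin's SelectBeforeTimeBasedSelectionPolicy.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, for illustration only.
class TimeBasedRetentionSketch {
    // Returns the versions dated strictly before (today - lookbackDays),
    // i.e. the candidates for deletion.
    static List<LocalDate> selectForDeletion(List<LocalDate> versions, int lookbackDays, LocalDate today) {
        LocalDate cutoff = today.minusDays(lookbackDays);
        List<LocalDate> selected = new ArrayList<>();
        for (LocalDate version : versions) {
            if (version.isBefore(cutoff)) {
                selected.add(version);
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        LocalDate today = LocalDate.of(2016, 7, 2); // assumed run date
        List<LocalDate> login = Arrays.asList(LocalDate.of(2016, 6, 20), LocalDate.of(2016, 6, 25));
        List<LocalDate> logout = Arrays.asList(LocalDate.of(2016, 5, 10), LocalDate.of(2016, 6, 10));
        System.out.println(selectForDeletion(login, 10, today));  // prints [2016-06-20]
        System.out.println(selectForDeletion(logout, 30, today)); // prints [2016-05-10]
    }
}
```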
More complex use cases in Production
• Default retention policy of 30 days for all events
• Retention policy of 10 days for loginEvent
• Blacklist retention for clickEvent
• 3 years retention for high priority events like shareEvent
Retention Config
● “events” is the common parent block for “shareEvent”, “loginEvent”, “logoutEvent”, “clickEvent”
● Each block implicitly imports configs from the parent block, “logoutEvent” implicitly imports “events” (Dashed lines)
● Any block can explicitly import any other block (Solid lines)
● A child block overrides any key value pairs specified in the parent block
● “logoutEvent” inherits the default retention of 30 days from implicit import, “events”
logoutEvent: 30 Days
● “loginEvent” inherits the default retention of 30 days from implicit import, “events”
● “loginEvent” defines a 10 days policy which overrides the 30 days inherited from “events”
loginEvent: 10 Days
Retention Config for share/clickEvent
● “shareEvent” explicitly imports a high priority tag which has retention of 3 years
● “clickEvent” explicitly imports blacklist tag which disables retention for “clickEvent”
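The resolution rules from the retention slides can be sketched in a few lines. This is a simplified model, not Gobblin's resolver: the class ConfigResolutionSketch is hypothetical, the precedence (own values over explicit imports over the implicit parent) is inferred from the slides, the key example.retention.disabled is invented because the deck does not show the contents of the blacklist tag, and there is no cycle detection.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical resolver, for illustration only.
class ConfigResolutionSketch {
    static final String LOOKBACK = "gobblin.retention.selection.timeBased.lookbackTime";
    static final String DISABLED = "example.retention.disabled"; // invented key

    private final Map<String, Map<String, String>> ownValues = new HashMap<>();
    private final Map<String, List<String>> explicitImports = new HashMap<>();

    void put(String key, String name, String value) {
        ownValues.computeIfAbsent(key, k -> new LinkedHashMap<>()).put(name, value);
    }

    void addImport(String key, String imported) {
        explicitImports.computeIfAbsent(key, k -> new ArrayList<>()).add(imported);
    }

    // Parent of "/events/loginEvent" is "/events"; top-level keys have no parent.
    static String parentOf(String key) {
        int slash = key.lastIndexOf('/');
        return slash <= 0 ? null : key.substring(0, slash);
    }

    // Applies lowest precedence first so that later putAll calls override:
    // implicit parent import, then explicit imports, then the key's own values.
    Map<String, String> resolve(String key) {
        Map<String, String> resolved = new LinkedHashMap<>();
        String parent = parentOf(key);
        if (parent != null) {
            resolved.putAll(resolve(parent));
        }
        for (String imported : explicitImports.getOrDefault(key, Collections.<String>emptyList())) {
            resolved.putAll(resolve(imported));
        }
        resolved.putAll(ownValues.getOrDefault(key, Collections.<String, String>emptyMap()));
        return resolved;
    }

    // Store contents mirroring the retention slides.
    static ConfigResolutionSketch example() {
        ConfigResolutionSketch store = new ConfigResolutionSketch();
        store.put("/events", LOOKBACK, "30d");
        store.put("/events/loginEvent", LOOKBACK, "10d");
        store.put("/tags/highPriority", LOOKBACK, "3y");
        store.put("/tags/blacklist", DISABLED, "true");
        store.addImport("/events/shareEvent", "/tags/highPriority");
        store.addImport("/events/clickEvent", "/tags/blacklist");
        return store;
    }

    public static void main(String[] args) {
        ConfigResolutionSketch store = example();
        for (String event : new String[] {"logoutEvent", "loginEvent", "shareEvent", "clickEvent"}) {
            System.out.println(event + " -> " + store.resolve("/events/" + event));
        }
    }
}
```

Under these assumptions logoutEvent resolves to 30d, loginEvent to 10d, shareEvent to 3y, and clickEvent carries the disabling flag alongside the inherited default.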
HDFS Config store
├── events
│   ├── main.conf          // Default 30 Days
│   ├── loginEvent
│   │   └── main.conf      // 10 Days
│   ├── shareEvent
│   │   └── includes.conf  // Import /tags/highPriority
│   └── clickEvent
│       └── includes.conf  // Import /tags/blacklist
│
└── tags
    ├── highPriority
    │   └── main.conf      // Define 3 Years retention
    └── blacklist
Retention Config Examples
/events/main.conf
gobblin.retention : {
    dataset : {
        finder.class=gobblin.data.management.retention.CleanableDatasetFinder
        pattern="/events/*"
    }
    selection {
        policy.class = gobblin.data.management.SelectBeforeTimeBasedSelectionPolicy
        timeBased.lookbackTime=30d
    }
    version : {
        finder.class=gobblin.data.management.DateTimeDatasetVersionFinder
    }
}
/tags/highPriority/main.conf
gobblin.retention : {
    selection {
        timeBased.lookbackTime=3y
    }
}
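To make the lookbackTime values concrete, here is a rough sketch of turning "30d" or "3y" into a cutoff date. The parser is hypothetical and accepts only the two suffixes that appear in the examples; it does not claim to match how Gobblin itself parses this setting.

```java
import java.time.LocalDate;
import java.time.Period;

// Hypothetical parser, for illustration only.
class LookbackSketch {
    // Accepts "<number>d" (days) or "<number>y" (years); anything else is rejected.
    static Period parseLookback(String text) {
        String value = text.trim();
        if (value.length() < 2) {
            throw new IllegalArgumentException("Unsupported lookback: " + text);
        }
        int amount = Integer.parseInt(value.substring(0, value.length() - 1));
        char unit = value.charAt(value.length() - 1);
        switch (unit) {
            case 'd':
                return Period.ofDays(amount);
            case 'y':
                return Period.ofYears(amount);
            default:
                throw new IllegalArgumentException("Unsupported lookback unit: " + unit);
        }
    }

    // Data dated before the returned date would fall outside the lookback window.
    static LocalDate cutoff(LocalDate today, String lookback) {
        return today.minus(parseLookback(lookback));
    }

    public static void main(String[] args) {
        LocalDate today = LocalDate.of(2016, 7, 2); // assumed run date
        System.out.println(cutoff(today, "30d")); // prints 2016-06-02
        System.out.println(cutoff(today, "3y"));  // prints 2013-07-02
    }
}
```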
Supported Policies
• SelectBeforeTimeBasedSelectionPolicy
• NewestKSelectionPolicy
• DailyDependentHourlyPolicy
• CombineSelectionPolicy
More policies - http://gobblin.readthedocs.io/en/latest/data-management/Gobblin-Retention/
Future work
• Config stores other than the HDFS-based config store
• Improve tooling, validation and UI for config store deployment
Questions

More Related Content

What's hot

Building the Petcare Data Platform using Delta Lake and 'Kyte': Our Spark ETL...
Building the Petcare Data Platform using Delta Lake and 'Kyte': Our Spark ETL...Building the Petcare Data Platform using Delta Lake and 'Kyte': Our Spark ETL...
Building the Petcare Data Platform using Delta Lake and 'Kyte': Our Spark ETL...Databricks
 
In Search of Database Nirvana: Challenges of Delivering HTAP
In Search of Database Nirvana: Challenges of Delivering HTAPIn Search of Database Nirvana: Challenges of Delivering HTAP
In Search of Database Nirvana: Challenges of Delivering HTAPHBaseCon
 
Observability for Data Pipelines With OpenLineage
Observability for Data Pipelines With OpenLineageObservability for Data Pipelines With OpenLineage
Observability for Data Pipelines With OpenLineageDatabricks
 
Big Data Platform at Pinterest
Big Data Platform at PinterestBig Data Platform at Pinterest
Big Data Platform at PinterestQubole
 
eBay Experimentation Platform on Hadoop
eBay Experimentation Platform on HadoopeBay Experimentation Platform on Hadoop
eBay Experimentation Platform on HadoopTony Ng
 
How We Optimize Spark SQL Jobs With parallel and sync IO
How We Optimize Spark SQL Jobs With parallel and sync IOHow We Optimize Spark SQL Jobs With parallel and sync IO
How We Optimize Spark SQL Jobs With parallel and sync IODatabricks
 
Migrating Airflow-based Apache Spark Jobs to Kubernetes – the Native Way
Migrating Airflow-based Apache Spark Jobs to Kubernetes – the Native WayMigrating Airflow-based Apache Spark Jobs to Kubernetes – the Native Way
Migrating Airflow-based Apache Spark Jobs to Kubernetes – the Native WayDatabricks
 
Bridging the gap of Relational to Hadoop using Sqoop @ Expedia
Bridging the gap of Relational to Hadoop using Sqoop @ ExpediaBridging the gap of Relational to Hadoop using Sqoop @ Expedia
Bridging the gap of Relational to Hadoop using Sqoop @ ExpediaDataWorks Summit/Hadoop Summit
 
Keeping Identity Graphs In Sync With Apache Spark
Keeping Identity Graphs In Sync With Apache SparkKeeping Identity Graphs In Sync With Apache Spark
Keeping Identity Graphs In Sync With Apache SparkDatabricks
 
Near Real-Time Analytics with Apache Spark: Ingestion, ETL, and Interactive Q...
Near Real-Time Analytics with Apache Spark: Ingestion, ETL, and Interactive Q...Near Real-Time Analytics with Apache Spark: Ingestion, ETL, and Interactive Q...
Near Real-Time Analytics with Apache Spark: Ingestion, ETL, and Interactive Q...Databricks
 
NoSQL and Spatial Database Capabilities using PostgreSQL
NoSQL and Spatial Database Capabilities using PostgreSQLNoSQL and Spatial Database Capabilities using PostgreSQL
NoSQL and Spatial Database Capabilities using PostgreSQLEDB
 
Bullet: A Real Time Data Query Engine
Bullet: A Real Time Data Query EngineBullet: A Real Time Data Query Engine
Bullet: A Real Time Data Query EngineDataWorks Summit
 
Timeline Service v.2 (Hadoop Summit 2016)
Timeline Service v.2 (Hadoop Summit 2016)Timeline Service v.2 (Hadoop Summit 2016)
Timeline Service v.2 (Hadoop Summit 2016)Sangjin Lee
 
Designing the Next Generation of Data Pipelines at Zillow with Apache Spark
Designing the Next Generation of Data Pipelines at Zillow with Apache SparkDesigning the Next Generation of Data Pipelines at Zillow with Apache Spark
Designing the Next Generation of Data Pipelines at Zillow with Apache SparkDatabricks
 
Strata SG 2015: LinkedIn Self Serve Reporting Platform on Hadoop
Strata SG 2015: LinkedIn Self Serve Reporting Platform on Hadoop Strata SG 2015: LinkedIn Self Serve Reporting Platform on Hadoop
Strata SG 2015: LinkedIn Self Serve Reporting Platform on Hadoop Shirshanka Das
 
My Favorite PostgreSQL Books
My Favorite PostgreSQL BooksMy Favorite PostgreSQL Books
My Favorite PostgreSQL BooksEDB
 
Magnet Shuffle Service: Push-based Shuffle at LinkedIn
Magnet Shuffle Service: Push-based Shuffle at LinkedInMagnet Shuffle Service: Push-based Shuffle at LinkedIn
Magnet Shuffle Service: Push-based Shuffle at LinkedInDatabricks
 

What's hot (20)

Building the Petcare Data Platform using Delta Lake and 'Kyte': Our Spark ETL...
Building the Petcare Data Platform using Delta Lake and 'Kyte': Our Spark ETL...Building the Petcare Data Platform using Delta Lake and 'Kyte': Our Spark ETL...
Building the Petcare Data Platform using Delta Lake and 'Kyte': Our Spark ETL...
 
In Search of Database Nirvana: Challenges of Delivering HTAP
In Search of Database Nirvana: Challenges of Delivering HTAPIn Search of Database Nirvana: Challenges of Delivering HTAP
In Search of Database Nirvana: Challenges of Delivering HTAP
 
Cloud dwh
Cloud dwhCloud dwh
Cloud dwh
 
Observability for Data Pipelines With OpenLineage
Observability for Data Pipelines With OpenLineageObservability for Data Pipelines With OpenLineage
Observability for Data Pipelines With OpenLineage
 
Big Data Platform at Pinterest
Big Data Platform at PinterestBig Data Platform at Pinterest
Big Data Platform at Pinterest
 
eBay Experimentation Platform on Hadoop
eBay Experimentation Platform on HadoopeBay Experimentation Platform on Hadoop
eBay Experimentation Platform on Hadoop
 
How We Optimize Spark SQL Jobs With parallel and sync IO
How We Optimize Spark SQL Jobs With parallel and sync IOHow We Optimize Spark SQL Jobs With parallel and sync IO
How We Optimize Spark SQL Jobs With parallel and sync IO
 
Migrating Airflow-based Apache Spark Jobs to Kubernetes – the Native Way
Migrating Airflow-based Apache Spark Jobs to Kubernetes – the Native WayMigrating Airflow-based Apache Spark Jobs to Kubernetes – the Native Way
Migrating Airflow-based Apache Spark Jobs to Kubernetes – the Native Way
 
Bridging the gap of Relational to Hadoop using Sqoop @ Expedia
Bridging the gap of Relational to Hadoop using Sqoop @ ExpediaBridging the gap of Relational to Hadoop using Sqoop @ Expedia
Bridging the gap of Relational to Hadoop using Sqoop @ Expedia
 
What's new in SQL on Hadoop and Beyond
What's new in SQL on Hadoop and BeyondWhat's new in SQL on Hadoop and Beyond
What's new in SQL on Hadoop and Beyond
 
Keeping Identity Graphs In Sync With Apache Spark
Keeping Identity Graphs In Sync With Apache SparkKeeping Identity Graphs In Sync With Apache Spark
Keeping Identity Graphs In Sync With Apache Spark
 
Near Real-Time Analytics with Apache Spark: Ingestion, ETL, and Interactive Q...
Near Real-Time Analytics with Apache Spark: Ingestion, ETL, and Interactive Q...Near Real-Time Analytics with Apache Spark: Ingestion, ETL, and Interactive Q...
Near Real-Time Analytics with Apache Spark: Ingestion, ETL, and Interactive Q...
 
NoSQL and Spatial Database Capabilities using PostgreSQL
NoSQL and Spatial Database Capabilities using PostgreSQLNoSQL and Spatial Database Capabilities using PostgreSQL
NoSQL and Spatial Database Capabilities using PostgreSQL
 
Graph ql and enterprise
Graph ql and enterpriseGraph ql and enterprise
Graph ql and enterprise
 
Bullet: A Real Time Data Query Engine
Bullet: A Real Time Data Query EngineBullet: A Real Time Data Query Engine
Bullet: A Real Time Data Query Engine
 
Timeline Service v.2 (Hadoop Summit 2016)
Timeline Service v.2 (Hadoop Summit 2016)Timeline Service v.2 (Hadoop Summit 2016)
Timeline Service v.2 (Hadoop Summit 2016)
 
Designing the Next Generation of Data Pipelines at Zillow with Apache Spark
Designing the Next Generation of Data Pipelines at Zillow with Apache SparkDesigning the Next Generation of Data Pipelines at Zillow with Apache Spark
Designing the Next Generation of Data Pipelines at Zillow with Apache Spark
 
Strata SG 2015: LinkedIn Self Serve Reporting Platform on Hadoop
Strata SG 2015: LinkedIn Self Serve Reporting Platform on Hadoop Strata SG 2015: LinkedIn Self Serve Reporting Platform on Hadoop
Strata SG 2015: LinkedIn Self Serve Reporting Platform on Hadoop
 
My Favorite PostgreSQL Books
My Favorite PostgreSQL BooksMy Favorite PostgreSQL Books
My Favorite PostgreSQL Books
 
Magnet Shuffle Service: Push-based Shuffle at LinkedIn
Magnet Shuffle Service: Push-based Shuffle at LinkedInMagnet Shuffle Service: Push-based Shuffle at LinkedIn
Magnet Shuffle Service: Push-based Shuffle at LinkedIn
 

Similar to Gobbin config-meetup-june-2016

Running a Scalable And Reliable Symfony2 Application in Cloud (Symfony Sweden...
Running a Scalable And Reliable Symfony2 Application in Cloud (Symfony Sweden...Running a Scalable And Reliable Symfony2 Application in Cloud (Symfony Sweden...
Running a Scalable And Reliable Symfony2 Application in Cloud (Symfony Sweden...Ville Mattila
 
Building Grails Plugins - Tips And Tricks
Building Grails Plugins - Tips And TricksBuilding Grails Plugins - Tips And Tricks
Building Grails Plugins - Tips And TricksMike Hugo
 
Operations Support Workflow - Rundeck
Operations Support Workflow - RundeckOperations Support Workflow - Rundeck
Operations Support Workflow - RundeckNeil McCaughley
 
國民雲端架構 Django + GAE
國民雲端架構 Django + GAE國民雲端架構 Django + GAE
國民雲端架構 Django + GAEWinston Chen
 
Gits class #22: [ONLINE] Analyze Your User's Activities Using BigQuery and Da...
Gits class #22: [ONLINE] Analyze Your User's Activities Using BigQuery and Da...Gits class #22: [ONLINE] Analyze Your User's Activities Using BigQuery and Da...
Gits class #22: [ONLINE] Analyze Your User's Activities Using BigQuery and Da...GITS Indonesia
 
Managing data workflows with Luigi
Managing data workflows with LuigiManaging data workflows with Luigi
Managing data workflows with LuigiTeemu Kurppa
 
7\9 SSIS 2008R2_Training - Script Task
7\9 SSIS 2008R2_Training - Script Task7\9 SSIS 2008R2_Training - Script Task
7\9 SSIS 2008R2_Training - Script TaskPramod Singla
 
Spring Boot Workshop - January w/ Women Who Code
Spring Boot Workshop - January w/ Women Who CodeSpring Boot Workshop - January w/ Women Who Code
Spring Boot Workshop - January w/ Women Who CodePurnima Kamath
 
[Srijan Wednesday Webinars] Ruling Drupal 8 with #d8rules
[Srijan Wednesday Webinars] Ruling Drupal 8 with #d8rules[Srijan Wednesday Webinars] Ruling Drupal 8 with #d8rules
[Srijan Wednesday Webinars] Ruling Drupal 8 with #d8rulesSrijan Technologies
 
Web Standards Support in WebKit
Web Standards Support in WebKitWeb Standards Support in WebKit
Web Standards Support in WebKitJoone Hur
 
Odoo Experience 2018 - From a Web Controller to a Full CMS
Odoo Experience 2018 - From a Web Controller to a Full CMSOdoo Experience 2018 - From a Web Controller to a Full CMS
Odoo Experience 2018 - From a Web Controller to a Full CMSElínAnna Jónasdóttir
 
GitPro Whitepaper
GitPro WhitepaperGitPro Whitepaper
GitPro WhitepaperERP Buddies
 
What is Git | What is GitHub | Git Tutorial | GitHub Tutorial | Devops Tutori...
What is Git | What is GitHub | Git Tutorial | GitHub Tutorial | Devops Tutori...What is Git | What is GitHub | Git Tutorial | GitHub Tutorial | Devops Tutori...
What is Git | What is GitHub | Git Tutorial | GitHub Tutorial | Devops Tutori...Edureka!
 
iOSDevCamp 2011 - Getting "Test"-y: Test Driven Development & Automated Deplo...
iOSDevCamp 2011 - Getting "Test"-y: Test Driven Development & Automated Deplo...iOSDevCamp 2011 - Getting "Test"-y: Test Driven Development & Automated Deplo...
iOSDevCamp 2011 - Getting "Test"-y: Test Driven Development & Automated Deplo...Rudy Jahchan
 
Open Source ERP Technologies for Java Developers
Open Source ERP Technologies for Java DevelopersOpen Source ERP Technologies for Java Developers
Open Source ERP Technologies for Java Developerscboecking
 
Webinar: End to End Security & Operations with Chainguard and Weave GitOps
Webinar: End to End Security & Operations with Chainguard and Weave GitOpsWebinar: End to End Security & Operations with Chainguard and Weave GitOps
Webinar: End to End Security & Operations with Chainguard and Weave GitOpsWeaveworks
 
Data integration with embulk
Data integration with embulkData integration with embulk
Data integration with embulkTeguh Nugraha
 

Similar to Gobbin config-meetup-june-2016 (20)

Running a Scalable And Reliable Symfony2 Application in Cloud (Symfony Sweden...
Running a Scalable And Reliable Symfony2 Application in Cloud (Symfony Sweden...Running a Scalable And Reliable Symfony2 Application in Cloud (Symfony Sweden...
Running a Scalable And Reliable Symfony2 Application in Cloud (Symfony Sweden...
 
Building Grails Plugins - Tips And Tricks
Building Grails Plugins - Tips And TricksBuilding Grails Plugins - Tips And Tricks
Building Grails Plugins - Tips And Tricks
 
Operations Support Workflow - Rundeck
Operations Support Workflow - RundeckOperations Support Workflow - Rundeck
Operations Support Workflow - Rundeck
 
國民雲端架構 Django + GAE
國民雲端架構 Django + GAE國民雲端架構 Django + GAE
國民雲端架構 Django + GAE
 
Gits class #22: [ONLINE] Analyze Your User's Activities Using BigQuery and Da...
Gits class #22: [ONLINE] Analyze Your User's Activities Using BigQuery and Da...Gits class #22: [ONLINE] Analyze Your User's Activities Using BigQuery and Da...
Gits class #22: [ONLINE] Analyze Your User's Activities Using BigQuery and Da...
 
Managing data workflows with Luigi
Managing data workflows with LuigiManaging data workflows with Luigi
Managing data workflows with Luigi
 
7\9 SSIS 2008R2_Training - Script Task
7\9 SSIS 2008R2_Training - Script Task7\9 SSIS 2008R2_Training - Script Task
7\9 SSIS 2008R2_Training - Script Task
 
Spring Boot Workshop - January w/ Women Who Code
Spring Boot Workshop - January w/ Women Who CodeSpring Boot Workshop - January w/ Women Who Code
Spring Boot Workshop - January w/ Women Who Code
 
[Srijan Wednesday Webinars] Ruling Drupal 8 with #d8rules
[Srijan Wednesday Webinars] Ruling Drupal 8 with #d8rules[Srijan Wednesday Webinars] Ruling Drupal 8 with #d8rules
[Srijan Wednesday Webinars] Ruling Drupal 8 with #d8rules
 
Web Standards Support in WebKit
Web Standards Support in WebKitWeb Standards Support in WebKit
Web Standards Support in WebKit
 
Odoo Experience 2018 - From a Web Controller to a Full CMS
Odoo Experience 2018 - From a Web Controller to a Full CMSOdoo Experience 2018 - From a Web Controller to a Full CMS
Odoo Experience 2018 - From a Web Controller to a Full CMS
 
GitPro Whitepaper
GitPro WhitepaperGitPro Whitepaper
GitPro Whitepaper
 
R sharing 101
R sharing 101R sharing 101
R sharing 101
 
What is Git | What is GitHub | Git Tutorial | GitHub Tutorial | Devops Tutori...
What is Git | What is GitHub | Git Tutorial | GitHub Tutorial | Devops Tutori...What is Git | What is GitHub | Git Tutorial | GitHub Tutorial | Devops Tutori...
What is Git | What is GitHub | Git Tutorial | GitHub Tutorial | Devops Tutori...
 
Tips & Tricks for Maven Tycho
Tips & Tricks for Maven TychoTips & Tricks for Maven Tycho
Tips & Tricks for Maven Tycho
 
iOSDevCamp 2011 - Getting "Test"-y: Test Driven Development & Automated Deplo...
iOSDevCamp 2011 - Getting "Test"-y: Test Driven Development & Automated Deplo...iOSDevCamp 2011 - Getting "Test"-y: Test Driven Development & Automated Deplo...
iOSDevCamp 2011 - Getting "Test"-y: Test Driven Development & Automated Deplo...
 
Open Source ERP Technologies for Java Developers
Open Source ERP Technologies for Java DevelopersOpen Source ERP Technologies for Java Developers
Open Source ERP Technologies for Java Developers
 
Webinar: End to End Security & Operations with Chainguard and Weave GitOps
Webinar: End to End Security & Operations with Chainguard and Weave GitOpsWebinar: End to End Security & Operations with Chainguard and Weave GitOps
Webinar: End to End Security & Operations with Chainguard and Weave GitOps
 
Introduction to Django
Introduction to DjangoIntroduction to Django
Introduction to Django
 
Data integration with embulk
Data integration with embulkData integration with embulk
Data integration with embulk
 

Recently uploaded

Flutter the Future of Mobile App Development - 5 Crucial Reasons.pdf
Flutter the Future of Mobile App Development - 5 Crucial Reasons.pdfFlutter the Future of Mobile App Development - 5 Crucial Reasons.pdf
Flutter the Future of Mobile App Development - 5 Crucial Reasons.pdfMind IT Systems
 
03.2024_North America VMUG Optimizing RevOps using the power of ChatGPT in Ma...
03.2024_North America VMUG Optimizing RevOps using the power of ChatGPT in Ma...03.2024_North America VMUG Optimizing RevOps using the power of ChatGPT in Ma...
03.2024_North America VMUG Optimizing RevOps using the power of ChatGPT in Ma...jackiepotts6
 
VuNet software organisation powerpoint deck
VuNet software organisation powerpoint deckVuNet software organisation powerpoint deck
VuNet software organisation powerpoint deckNaval Singh
 
Technical improvements. Reasons. Methods. Estimations. CJ
Technical improvements.  Reasons. Methods. Estimations. CJTechnical improvements.  Reasons. Methods. Estimations. CJ
Technical improvements. Reasons. Methods. Estimations. CJpolinaucc
 
Leveling Up your Branding and Mastering MERN: Fullstack WebDev
Leveling Up your Branding and Mastering MERN: Fullstack WebDevLeveling Up your Branding and Mastering MERN: Fullstack WebDev
Leveling Up your Branding and Mastering MERN: Fullstack WebDevpmgdscunsri
 
8 key point on optimizing web hosting services in your business.pdf
8 key point on optimizing web hosting services in your business.pdf8 key point on optimizing web hosting services in your business.pdf
8 key point on optimizing web hosting services in your business.pdfOffsiteNOC
 
Boost Efficiency: Sabre API Integration Made Easy
Boost Efficiency: Sabre API Integration Made EasyBoost Efficiency: Sabre API Integration Made Easy
Boost Efficiency: Sabre API Integration Made Easymichealwillson701
 
Einstein Copilot Conversational AI for your CRM.pdf
Einstein Copilot Conversational AI for your CRM.pdfEinstein Copilot Conversational AI for your CRM.pdf
Einstein Copilot Conversational AI for your CRM.pdfCloudMetic
 
Unlocking AI: Navigating Open Source vs. Commercial Frontiers
Unlocking AI:Navigating Open Source vs. Commercial FrontiersUnlocking AI:Navigating Open Source vs. Commercial Frontiers
Unlocking AI: Navigating Open Source vs. Commercial FrontiersRaphaël Semeteys
 
Large Scale Architecture -- The Unreasonable Effectiveness of Simplicity
Large Scale Architecture -- The Unreasonable Effectiveness of SimplicityLarge Scale Architecture -- The Unreasonable Effectiveness of Simplicity
Large Scale Architecture -- The Unreasonable Effectiveness of SimplicityRandy Shoup
 
CYBER SECURITY AND CYBER CRIME COMPLETE GUIDE.pLptx
CYBER SECURITY AND CYBER CRIME COMPLETE GUIDE.pLptxCYBER SECURITY AND CYBER CRIME COMPLETE GUIDE.pLptx
CYBER SECURITY AND CYBER CRIME COMPLETE GUIDE.pLptxBarakaMuyengi
 
Telebu Social -Whatsapp Business API : Mastering Omnichannel Business Communi...
Telebu Social -Whatsapp Business API : Mastering Omnichannel Business Communi...Telebu Social -Whatsapp Business API : Mastering Omnichannel Business Communi...
Telebu Social -Whatsapp Business API : Mastering Omnichannel Business Communi...telebusocialmarketin
 
MinionLabs_Mr. Gokul Srinivas_Young Entrepreneur
MinionLabs_Mr. Gokul Srinivas_Young EntrepreneurMinionLabs_Mr. Gokul Srinivas_Young Entrepreneur
MinionLabs_Mr. Gokul Srinivas_Young EntrepreneurPriyadarshini T
 
Practical Advice for FDA’s 510(k) Requirements.pdf
Practical Advice for FDA’s 510(k) Requirements.pdfPractical Advice for FDA’s 510(k) Requirements.pdf
Practical Advice for FDA’s 510(k) Requirements.pdfICS
 
Steps to Successfully Hire Ionic Developers
Steps to Successfully Hire Ionic DevelopersSteps to Successfully Hire Ionic Developers
Steps to Successfully Hire Ionic Developersmichealwillson701
 
Mobile App Development company Houston
Mobile  App  Development  company HoustonMobile  App  Development  company Houston
Mobile App Development company Houstonjennysmithusa549
 
Revolutionize Your Field Service Management with FSM Grid
Revolutionize Your Field Service Management with FSM GridRevolutionize Your Field Service Management with FSM Grid
Revolutionize Your Field Service Management with FSM GridMathew Thomas
 
Unlocking the Power of IoT: A comprehensive approach to real-time insights
Unlocking the Power of IoT: A comprehensive approach to real-time insightsUnlocking the Power of IoT: A comprehensive approach to real-time insights
Unlocking the Power of IoT: A comprehensive approach to real-time insightsconfluent
 
Mobile App Development process | Expert Tips
Mobile App Development process | Expert TipsMobile App Development process | Expert Tips
Mobile App Development process | Expert Tipsmichealwillson701
 

Recently uploaded (20)

Flutter the Future of Mobile App Development - 5 Crucial Reasons.pdf
Flutter the Future of Mobile App Development - 5 Crucial Reasons.pdfFlutter the Future of Mobile App Development - 5 Crucial Reasons.pdf
Flutter the Future of Mobile App Development - 5 Crucial Reasons.pdf
 
03.2024_North America VMUG Optimizing RevOps using the power of ChatGPT in Ma...
03.2024_North America VMUG Optimizing RevOps using the power of ChatGPT in Ma...03.2024_North America VMUG Optimizing RevOps using the power of ChatGPT in Ma...
03.2024_North America VMUG Optimizing RevOps using the power of ChatGPT in Ma...
 
VuNet software organisation powerpoint deck
VuNet software organisation powerpoint deckVuNet software organisation powerpoint deck
VuNet software organisation powerpoint deck
 
Technical improvements. Reasons. Methods. Estimations. CJ
Technical improvements.  Reasons. Methods. Estimations. CJTechnical improvements.  Reasons. Methods. Estimations. CJ
Technical improvements. Reasons. Methods. Estimations. CJ
 
Leveling Up your Branding and Mastering MERN: Fullstack WebDev
Leveling Up your Branding and Mastering MERN: Fullstack WebDevLeveling Up your Branding and Mastering MERN: Fullstack WebDev
Leveling Up your Branding and Mastering MERN: Fullstack WebDev
 
8 key point on optimizing web hosting services in your business.pdf
8 key point on optimizing web hosting services in your business.pdf8 key point on optimizing web hosting services in your business.pdf
8 key point on optimizing web hosting services in your business.pdf
 
20140812 - OBD2 Solution
20140812 - OBD2 Solution20140812 - OBD2 Solution
20140812 - OBD2 Solution
 
Boost Efficiency: Sabre API Integration Made Easy
Boost Efficiency: Sabre API Integration Made EasyBoost Efficiency: Sabre API Integration Made Easy
Boost Efficiency: Sabre API Integration Made Easy
 
Einstein Copilot Conversational AI for your CRM.pdf
Einstein Copilot Conversational AI for your CRM.pdfEinstein Copilot Conversational AI for your CRM.pdf
Einstein Copilot Conversational AI for your CRM.pdf
 
Unlocking AI: Navigating Open Source vs. Commercial Frontiers
Unlocking AI:Navigating Open Source vs. Commercial FrontiersUnlocking AI:Navigating Open Source vs. Commercial Frontiers
Unlocking AI: Navigating Open Source vs. Commercial Frontiers
 
Large Scale Architecture -- The Unreasonable Effectiveness of Simplicity
Large Scale Architecture -- The Unreasonable Effectiveness of SimplicityLarge Scale Architecture -- The Unreasonable Effectiveness of Simplicity
Large Scale Architecture -- The Unreasonable Effectiveness of Simplicity
 
CYBER SECURITY AND CYBER CRIME COMPLETE GUIDE.pLptx
CYBER SECURITY AND CYBER CRIME COMPLETE GUIDE.pLptxCYBER SECURITY AND CYBER CRIME COMPLETE GUIDE.pLptx
CYBER SECURITY AND CYBER CRIME COMPLETE GUIDE.pLptx
 
Telebu Social -Whatsapp Business API : Mastering Omnichannel Business Communi...
Telebu Social -Whatsapp Business API : Mastering Omnichannel Business Communi...Telebu Social -Whatsapp Business API : Mastering Omnichannel Business Communi...
Telebu Social -Whatsapp Business API : Mastering Omnichannel Business Communi...
 
MinionLabs_Mr. Gokul Srinivas_Young Entrepreneur
MinionLabs_Mr. Gokul Srinivas_Young EntrepreneurMinionLabs_Mr. Gokul Srinivas_Young Entrepreneur
MinionLabs_Mr. Gokul Srinivas_Young Entrepreneur
 
Practical Advice for FDA’s 510(k) Requirements.pdf
Practical Advice for FDA’s 510(k) Requirements.pdfPractical Advice for FDA’s 510(k) Requirements.pdf
Practical Advice for FDA’s 510(k) Requirements.pdf
 
Steps to Successfully Hire Ionic Developers
Steps to Successfully Hire Ionic DevelopersSteps to Successfully Hire Ionic Developers
Steps to Successfully Hire Ionic Developers
 
Mobile App Development company Houston
Mobile  App  Development  company HoustonMobile  App  Development  company Houston
Mobile App Development company Houston
 
Revolutionize Your Field Service Management with FSM Grid
Revolutionize Your Field Service Management with FSM GridRevolutionize Your Field Service Management with FSM Grid
Revolutionize Your Field Service Management with FSM Grid
 
Unlocking the Power of IoT: A comprehensive approach to real-time insights
Unlocking the Power of IoT: A comprehensive approach to real-time insightsUnlocking the Power of IoT: A comprehensive approach to real-time insights
Unlocking the Power of IoT: A comprehensive approach to real-time insights
 
Mobile App Development process | Expert Tips
Mobile App Development process | Expert TipsMobile App Development process | Expert Tips
Mobile App Development process | Expert Tips
 

Gobbin config-meetup-june-2016

  • 1. Min Tu Pradhan Cadabam Gobblin Configuration Management Gobblin Meetup June 2016
  • 2. 1. Current Solutions and Motivation – Why we built Gobblin config? 2. Architecture – Gobblin config internals 3. Retention Example – How retention is configured using Gobblin config? Agenda
  • 3. 1. Current Solutions and Motivation – Why we built Gobblin config? 2. Architecture – Gobblin config internals 3. Retention Example – How retention is configured using Gobblin config? Agenda
  • 4. Job Configs Vs. Dataset Configs
      Copy Job
      - Permission for loginEvent 700
      - Permission for logoutEvent 777
      Source: /events/loginEvent, /events/logoutEvent
      Destination: /events/loginEvent - 700, /events/logoutEvent - 777
      Option 1 : One job per dataset
      - Too many jobs
      - Long whitelist
      - Difficult to maintain
      Copy Job 1: dest.permission = 700, whitelist = loginEvent
      Copy Job 2: dest.permission = 777, whitelist = logoutEvent
      Option 2 : Prefix
      - Too many configs
      - Cannot have a single config for all datasets with the same permissions
      Copy Job with prefix: loginEvent.dest.permission = 700, logoutEvent.dest.permission = 777
  • 5. Data Life Cycle Management Configs
      • Shared configs across jobs
      • Destination path of conversion job is source path of copy job
      • Retention job works on destination path of copy job
      • Dataset needs to be enabled in all jobs
      [Diagram labels: /events/loginEvent_Avro, /events/loginEvent_Orc, Conversion Job, Copy Job, Retention Job]
  • 6. Other Motivations
      • New versions of configs should be deployable without deploying new binaries
      • It should be easy to roll back to a previous stable version of configs
      • Config changes should have an audit trail
      • Support for complex value types and substitution resolution
  • 7. 1. Current Solutions and Motivation – Why we built Gobblin config? 2. Architecture – Gobblin config internals 3. Retention Example – How retention is configured using Gobblin config? Agenda
  • 8. Dataset Configuration Management
      At a very high level, we extend typesafe config with:
      • Abstraction of a Config Store
      • Config versioning
      • Support for logical “import” URIs
      • Ability to traverse the “import” relationships
  • 9. Architecture
      Client Application
      ConfigClient API
      ConfigStore API
      HadoopFS Store, Hive MetaStore Adapter, MySQL Adapter, Zookeeper Adapter, …
  • 10. Data Model
      Config Store
      Dataset config key (URI): /events/loginEvent
        Key1: value1
        Key2: value2
        …
        KeyM: valueM
      Dataset config key (URI): /events
      Tag config key (URI): /tags
      Tag config key (URI): /tags/highPriority
        keyA: valueX
        keyB: valueY
      [Diagram edge labels: imports, Imported by, Implicit import (twice)]
  • 11. HOCON format
      • Supports Java properties files
      • Supports JSON files
      • Value substitution
      • “+=” syntax to append elements to arrays, e.g. path += "/bin"
      • …
      gobblin.retention : {
        selection {
          timeBased.lookbackTime=3y
        }
      }
  • 12. Using Configs in code
      ConfigClient client = ConfigClient.createConfigClient(VersionStabilityPolicy policy);
      Config config = client.getConfig(URI uri);
      Collection<URI> imports = client.getImports(URI dataset, boolean recursive);
      Collection<URI> importedBy = client.getImportedBy(URI tag, boolean recursive);
  • 14. Example of a config store on HDFS
      ROOT
      └── _CONFIG_STORE // contents = latest non-rolled-back version
          └── 1.0.53 // version directory
              ├── events
              │   ├── main.conf
              │   ├── loginEvent
              │   │   ├── main.conf // configuration file for /events/loginEvent
              │   │   └── includes.conf // specify import links for /events/loginEvent
              │   ├── shareEvent
              │   │   └── includes.conf
              │   └── clickEvent
              │       └── includes.conf
              │
              └── tags
                  ├── highPriority
                  │   ├── main.conf // configuration file for /tags/highPriority
                  │   └── includes.conf // specify import links for /tags/highPriority
                  ├── blacklist
                  └── 10Days
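The layout on this slide can be read as a mapping from a config key (a URI path such as /events/loginEvent) to files in the store. The following is a minimal Java sketch of that mapping. It assumes that version directories sit under `_CONFIG_STORE` inside the store root, which the flattened slide text does not state outright. The class and method names are hypothetical and are not part of Gobblin's API.

```java
/** Hypothetical helper for illustration only; not part of Gobblin. */
final class ConfigStorePaths {

    // Assumed layout: <root>/_CONFIG_STORE/<version>/<config key>/{main.conf, includes.conf}
    static String nodeDir(String root, String version, String configKey) {
        String key = configKey.startsWith("/") ? configKey : "/" + configKey;
        return root + "/_CONFIG_STORE/" + version + key;
    }

    /** File holding the key-value pairs defined directly on this config node. */
    static String mainConf(String root, String version, String configKey) {
        return nodeDir(root, version, configKey) + "/main.conf";
    }

    /** File listing the explicit import links of this config node. */
    static String includesConf(String root, String version, String configKey) {
        return nodeDir(root, version, configKey) + "/includes.conf";
    }

    public static void main(String[] args) {
        System.out.println(mainConf("ROOT", "1.0.53", "/events/loginEvent"));
        System.out.println(includesConf("ROOT", "1.0.53", "/tags/highPriority"));
    }
}
```

Because the version is part of the path, switching to another config version (or rolling back) only changes which version directory is read, not the files inside it.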
  • 15. 1. Current Solutions and Motivation – Why we built Gobblin config? 2. Architecture – Gobblin config internals 3. Retention Example – How retention is configured using Gobblin config? Agenda
  • 16. Retention
      • Deleting data that is not required
      • The most common retention policy is to delete data older than a given number of days
      Example
      • Retention policy of 10 days for loginEvent
      • Retention policy of 30 days for logoutEvent
      Before Retention
      └── events
          ├── loginEvent
          │   ├── 2016-06-20.avro
          │   └── 2016-06-25.avro
          └── logoutEvent
              ├── 2016-05-10.avro
              └── 2016-06-10.avro
      After Retention
      └── events
          ├── loginEvent
          │   └── 2016-06-25.avro
          └── logoutEvent
              └── 2016-06-10.avro
  • 17. More complex use cases in Production
      • Default retention policy of 30 days for all events
      • Retention policy of 10 days for loginEvent
      • Blacklist retention for clickEvent
      • 3 years retention for high priority events like shareEvent
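The "delete data older than N days" rule from this slide can be sketched in a few lines of Java. This is an illustrative sketch, not Gobblin's retention code: the class name, the method name, and the run date of 2016-07-01 are assumptions. That run date is one of the dates that reproduces the before/after listing on the slide.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch of time-based retention selection; not Gobblin's implementation. */
final class RetentionSketch {

    /** Returns the versions dated strictly before (today - lookbackDays), i.e. the ones to delete. */
    static List<LocalDate> selectForDeletion(List<LocalDate> versions, int lookbackDays, LocalDate today) {
        LocalDate cutoff = today.minusDays(lookbackDays);
        List<LocalDate> toDelete = new ArrayList<>();
        for (LocalDate version : versions) {
            if (version.isBefore(cutoff)) {
                toDelete.add(version);
            }
        }
        return toDelete;
    }

    public static void main(String[] args) {
        LocalDate today = LocalDate.of(2016, 7, 1); // assumed run date, not stated on the slide
        List<LocalDate> loginEvent = Arrays.asList(
                LocalDate.parse("2016-06-20"), LocalDate.parse("2016-06-25"));
        List<LocalDate> logoutEvent = Arrays.asList(
                LocalDate.parse("2016-05-10"), LocalDate.parse("2016-06-10"));

        System.out.println(selectForDeletion(loginEvent, 10, today));  // prints [2016-06-20]
        System.out.println(selectForDeletion(logoutEvent, 30, today)); // prints [2016-05-10]
    }
}
```

The selection logic is identical for both datasets; only the lookback value differs, which is why it makes sense to hold that value in a per-dataset config rather than in the job.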
  • 18. Retention Config
      ● “events” is the common parent block for “shareEvent”, “loginEvent”, “logoutEvent”, “clickEvent”
      ● Each block implicitly imports configs from the parent block, “logoutEvent” implicitly imports “events” (Dashed lines)
      ● Any block can explicitly import any other block (Solid lines)
      ● A child block overrides any key value pairs specified in the parent block
  • 19. logoutEvent 30 Days
      ● “logoutEvent” inherits the default retention of 30 days from its implicit import, “events”
  • 20. loginEvent 10 Days
      ● “loginEvent” inherits the default retention of 30 days from its implicit import, “events”
      ● “loginEvent” defines a 10-day policy which overrides the 30 days inherited from “events”
  • 21. Retention Config for share/clickEvent
      ● “shareEvent” explicitly imports a high priority tag which has retention of 3 years
      ● “clickEvent” explicitly imports the blacklist tag, which disables retention for “clickEvent”
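The inheritance rules from slides 18 to 21 can be sketched as a small in-memory resolver in Java. This is a hypothetical illustration, not Gobblin's ConfigClient. It assumes a precedence order of parent (implicit import), then explicit imports, then the block's own values. The order between explicit imports and the parent is an assumption, chosen because it reproduces the 3 years for “shareEvent” on slide 21. The value "10d" for loginEvent is written by analogy with the "30d" and "3y" values on slide 23, and import cycles are not handled.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical in-memory sketch of implicit/explicit import resolution; not Gobblin's API. */
final class ImportResolutionSketch {
    static final String LOOKBACK = "gobblin.retention.selection.timeBased.lookbackTime";

    private final Map<String, Map<String, String>> ownConfigs = new HashMap<>();
    private final Map<String, List<String>> explicitImports = new HashMap<>();

    void put(String key, String name, String value) {
        ownConfigs.computeIfAbsent(key, k -> new LinkedHashMap<>()).put(name, value);
    }

    void importTags(String key, String... tags) {
        explicitImports.put(key, Arrays.asList(tags));
    }

    /** Assumed precedence, lowest to highest: parent block, explicit imports, own values. */
    Map<String, String> resolve(String key) {
        Map<String, String> resolved = new LinkedHashMap<>();
        String parent = parentOf(key);
        if (parent != null) {
            resolved.putAll(resolve(parent)); // implicit import of the parent block
        }
        for (String imported : explicitImports.getOrDefault(key, Collections.emptyList())) {
            resolved.putAll(resolve(imported)); // explicit import, e.g. a tag
        }
        resolved.putAll(ownConfigs.getOrDefault(key, Collections.emptyMap())); // child overrides
        return resolved;
    }

    /** "/events/loginEvent" -> "/events" -> "/" -> null. */
    static String parentOf(String key) {
        if (key.equals("/")) {
            return null;
        }
        int slash = key.lastIndexOf('/');
        return slash <= 0 ? "/" : key.substring(0, slash);
    }

    static ImportResolutionSketch sampleStore() {
        ImportResolutionSketch store = new ImportResolutionSketch();
        store.put("/events", LOOKBACK, "30d");
        store.put("/events/loginEvent", LOOKBACK, "10d");
        store.put("/tags/highPriority", LOOKBACK, "3y");
        store.importTags("/events/shareEvent", "/tags/highPriority");
        return store;
    }

    public static void main(String[] args) {
        ImportResolutionSketch store = sampleStore();
        System.out.println(store.resolve("/events/logoutEvent").get(LOOKBACK)); // prints 30d
        System.out.println(store.resolve("/events/loginEvent").get(LOOKBACK));  // prints 10d
        System.out.println(store.resolve("/events/shareEvent").get(LOOKBACK));  // prints 3y
    }
}
```

The blacklist tag for “clickEvent” is left out because the slides do not show which key it sets.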
  • 22. HDFS Config store
      ├── events
      │   ├── main.conf // Default 30 Days
      │   ├── loginEvent
      │   │   └── main.conf // 10 Days
      │   ├── shareEvent
      │   │   └── includes.conf // Import /tags/highPriority
      │   └── clickEvent
      │       └── includes.conf // Import /tags/blacklist
      │
      └── tags
          ├── highPriority
          │   └── main.conf // Define 3 Years retention
          └── blacklist
  • 23. Retention Config Examples
      /events/main.conf
      gobblin.retention : {
        dataset : {
          finder.class=gobblin.data.management.retention.CleanableDatasetFinder
          pattern="/events/*"
        }
        selection {
          policy.class = gobblin.data.management.SelectBeforeTimeBasedSelectionPolicy
          timeBased.lookbackTime=30d
        }
        version : {
          finder.class=gobblin.data.management.DateTimeDatasetVersionFinder
        }
      }
      /tags/highPriority/main.conf
      gobblin.retention : {
        selection {
          timeBased.lookbackTime=3y
        }
      }
  • 24. Supported Policies
      • SelectBeforeTimeBasedSelectionPolicy
      • NewestKSelectionPolicy
      • DailyDependentHourlyPolicy
      • CombineSelectionPolicy
      More policies - http://gobblin.readthedocs.io/en/latest/data-management/Gobblin-Retention/
  • 25. Future work
      • Config stores other than the HDFS-based config store
      • Improve tooling, validation and UI for config store deployment

Editor's Notes

  1. Config versioning – For a stable config store, once a version has been deployed it should not be changed
  2. - two logical types: dataset, tags - both in a tree hierarchy - inherit/override from parent/imported tags
  3. Fix font. Add HOCON example.
  4. CROSS_JVM_STABILITY, STRONG_LOCAL_STABILITY, WEAK_LOCAL_STABILITY, READ_FRESHEST. Handle case: config updated while version released (configStore is NOT stable). When calling getConfig multiple times, do we get the same value?
  5. Each directory is one config node (dataset, tags )