Gobbin config-meetup-june-2016

Config management techniques to allow data set and cluster overrides.

Published in: Software

  1. 1. Gobblin Configuration Management. Min Tu, Pradhan Cadabam. Gobblin Meetup, June 2016
  2. 2. Agenda
     1. Current Solutions and Motivation: why we built Gobblin config
     2. Architecture: Gobblin config internals
     3. Retention Example: how retention is configured using Gobblin config
  4. 4. Job Configs vs. Dataset Configs
     Scenario: a copy job copies /events/loginEvent and /events/logoutEvent from source to destination. At the destination, loginEvent needs permission 700 and logoutEvent needs permission 777.
     Option 1: One job per dataset
       Copy Job 1: dest.permission = 700, whitelist = loginEvent
       Copy Job 2: dest.permission = 777, whitelist = logoutEvent
       - Too many jobs
       - Long whitelist
       - Difficult to maintain
     Option 2: Prefix (a single copy job with dataset-prefixed keys)
       loginEvent.dest.permission = 700
       logoutEvent.dest.permission = 777
       - Too many configs
       - Cannot have a single config for all datasets with the same permissions
  5. 5. Data Life Cycle Management Configs
     Pipeline: /events/loginEvent_Avro -> Conversion Job -> /events/loginEvent_Orc -> Copy Job -> /events/loginEvent_Orc -> Retention Job
     • Configs are shared across jobs
     • The destination path of the conversion job is the source path of the copy job
     • The retention job works on the destination path of the copy job
     • A dataset needs to be enabled in all jobs
  6. 6. Other Motivations
     • A new version of configs should be deployable without deploying new binaries
     • It should be easy to roll back to a previous stable version of configs
     • Config changes should have an audit trail
     • Support for complex value types and substitution resolution
  7. 7. Agenda
     1. Current Solutions and Motivation: why we built Gobblin config
     2. Architecture: Gobblin config internals
     3. Retention Example: how retention is configured using Gobblin config
  8. 8. Dataset Configuration Management
     At a very high level, we extend Typesafe Config with:
     • Abstraction of a Config Store
     • Config versioning
     • Support for logical "import" URIs
     • Ability to traverse the "import" relationships
  9. 9. Architecture
     Client Application -> ConfigClient API -> ConfigStore API -> HadoopFS Store, Hive MetaStore Adapter, MySQL Adapter, Zookeeper Adapter, …
  10. 10. Data Model
      Config Store
      • Dataset config key (URI): /events/loginEvent
          Key1: value1
          Key2: value2
          …
          KeyM: valueM
        Implicit import: dataset config key (URI) /events
      • Tag config key (URI): /tags/highPriority
          keyA: valueX
          keyB: valueY
        Implicit import: tag config key (URI) /tags
      • The dataset key imports the tag key; conversely, the tag key is imported by the dataset key
  11. 11. HOCON format
      • Supports Java properties files
      • Supports JSON files
      • Value substitution
      • "+=" syntax to append elements to arrays, e.g. path += "/bin"
      • …
      Example:
        gobblin.retention : {
          selection {
            timeBased.lookbackTime=3y
          }
        }
  12. 12. Using Configs in code
      // policy is a VersionStabilityPolicy
      ConfigClient client = ConfigClient.createConfigClient(policy);
      // uri is the URI of a config key
      Config config = client.getConfig(uri);
      // dataset and tag are URIs; recursive is a boolean
      Collection<URI> imports = client.getImports(dataset, recursive);
      Collection<URI> importedBy = client.getImportedBy(tag, recursive);
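The getImports and getImportedBy calls on slide 12 walk the import relationships from the data model. The following is a minimal in-memory sketch of that traversal, not Gobblin's implementation: the class `ImportGraphSketch` and its methods are hypothetical, keys are plain strings rather than URIs, and it assumes that implicit imports (the parent path) are reported together with explicit ones.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical in-memory import graph; an illustration only, not Gobblin's ConfigClient. */
final class ImportGraphSketch {
    // config key -> keys it explicitly imports (what an includes.conf would list)
    private final Map<String, List<String>> explicitImports = new LinkedHashMap<>();
    // every config key mentioned so far, used for the reverse lookup
    private final Set<String> knownKeys = new LinkedHashSet<>();

    void addImport(String key, String imported) {
        explicitImports.computeIfAbsent(key, k -> new ArrayList<>()).add(imported);
        knownKeys.add(key);
        knownKeys.add(imported);
    }

    /** Implicit import target: the parent path, or null for the root "/". */
    static String parentOf(String key) {
        if (key.equals("/")) {
            return null;
        }
        int slash = key.lastIndexOf('/');
        return slash == 0 ? "/" : key.substring(0, slash);
    }

    /** Keys imported by the given key: explicit imports first, then the implicit parent. */
    Set<String> getImports(String key, boolean recursive) {
        Set<String> result = new LinkedHashSet<>();
        collect(key, recursive, result);
        return result;
    }

    private void collect(String key, boolean recursive, Set<String> seen) {
        List<String> direct =
            new ArrayList<>(explicitImports.getOrDefault(key, Collections.emptyList()));
        String parent = parentOf(key);
        if (parent != null) {
            direct.add(parent);
        }
        for (String next : direct) {
            // seen.add returns false for keys already visited, which also guards against cycles
            if (seen.add(next) && recursive) {
                collect(next, true, seen);
            }
        }
    }

    /** Known keys whose imports include the given key (simple quadratic scan). */
    Set<String> getImportedBy(String key, boolean recursive) {
        Set<String> result = new LinkedHashSet<>();
        for (String candidate : knownKeys) {
            if (getImports(candidate, recursive).contains(key)) {
                result.add(candidate);
            }
        }
        return result;
    }
}
```

With a single explicit import from /events/shareEvent to /tags/highPriority, the non-recursive imports of /events/shareEvent are the tag and the parent /events; the recursive call also adds /tags and the root.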
  13. 13. Config lifecycle at LinkedIn
  14. 14. Example of a config store on HDFS
      ROOT
      ├── _CONFIG_STORE            // contents = latest non-rolled-back version
      └── 1.0.53                   // version directory
          ├── events
          │   ├── main.conf
          │   ├── loginEvent
          │   │   ├── main.conf      // configuration file for /events/loginEvent
          │   │   └── includes.conf  // specify import links for /events/loginEvent
          │   ├── shareEvent
          │   │   └── includes.conf
          │   └── clickEvent
          │       └── includes.conf
          └── tags
              ├── highPriority
              │   ├── main.conf      // configuration file for /tags/highPriority
              │   └── includes.conf  // specify import links for /tags/highPriority
              ├── blacklist
              └── 10Days
  15. 15. Agenda
      1. Current Solutions and Motivation: why we built Gobblin config
      2. Architecture: Gobblin config internals
      3. Retention Example: how retention is configured using Gobblin config
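The layout on slide 14 suggests that a config key maps to a directory under a version directory, with main.conf holding the values and includes.conf holding the import links. The sketch below shows that mapping under the assumption that the version directory sits directly under the store root; the class `StoreLayoutSketch` and the store root `/configStore` are hypothetical, and the real store may arrange its paths differently.

```java
/** Hypothetical path mapping for the store layout on slide 14; an illustration only. */
final class StoreLayoutSketch {
    /** Directory that holds the files for one config key in one store version. */
    static String keyDirectory(String storeRoot, String version, String configKey) {
        if (!configKey.startsWith("/")) {
            throw new IllegalArgumentException("config key must start with '/': " + configKey);
        }
        // Drop a trailing slash on the root so the joined path has no "//"
        String root = storeRoot.endsWith("/")
            ? storeRoot.substring(0, storeRoot.length() - 1)
            : storeRoot;
        return root + "/" + version + configKey;
    }

    /** File with the key's own values. */
    static String mainConfPath(String storeRoot, String version, String configKey) {
        return keyDirectory(storeRoot, version, configKey) + "/main.conf";
    }

    /** File listing the key's explicit imports. */
    static String includesConfPath(String storeRoot, String version, String configKey) {
        return keyDirectory(storeRoot, version, configKey) + "/includes.conf";
    }
}
```

Because the version is just one path segment, reading a different deployed version in this sketch only changes the `version` argument, which matches the motivation of rolling back configs without redeploying binaries.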
  16. 16. Retention
      • Deleting data that is not required
      • The most common retention policy is to delete data older than some number of days
      Example
      • Retention policy of 10 days for loginEvent
      • Retention policy of 30 days for logoutEvent
      Before Retention
        events
        ├── loginEvent
        │   ├── 2016-06-20.avro
        │   └── 2016-06-25.avro
        └── logoutEvent
            ├── 2016-05-10.avro
            └── 2016-06-10.avro
      After Retention
        events
        ├── loginEvent
        │   └── 2016-06-25.avro
        └── logoutEvent
            └── 2016-06-10.avro
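The before/after trees on slide 16 can be reproduced with a small time-based selection. This is a sketch of the idea rather than Gobblin's retention code: the class `TimeBasedRetentionSketch` is hypothetical, and the run date is an assumption, since the slide does not state one (2016-07-01 is one date that yields the slide's result).

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical time-based selection; an illustration only, not Gobblin's policy class. */
final class TimeBasedRetentionSketch {
    /**
     * Returns the dataset versions that are strictly older than
     * (today - retentionDays), i.e. the ones a retention run would delete.
     */
    static List<LocalDate> selectForDeletion(
            List<LocalDate> versions, int retentionDays, LocalDate today) {
        LocalDate cutoff = today.minusDays(retentionDays);
        List<LocalDate> toDelete = new ArrayList<>();
        for (LocalDate version : versions) {
            if (version.isBefore(cutoff)) {
                toDelete.add(version);
            }
        }
        return toDelete;
    }
}
```

With an assumed run date of 2016-07-01, the 10-day policy for loginEvent selects only 2016-06-20, and the 30-day policy for logoutEvent selects only 2016-05-10, leaving the files shown under "After Retention".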
  17. 17. More complex use cases in Production • Default retention policy of 30 days for all events • Retention policy of 10 days for loginEvent • Blacklist retention for clickEvent • 3 years retention for high priority events like shareEvent
  18. 18. Retention Config
      ● "events" is the common parent block for "shareEvent", "loginEvent", "logoutEvent" and "clickEvent"
      ● Each block implicitly imports configs from its parent block; for example, "logoutEvent" implicitly imports "events" (dashed lines)
      ● Any block can explicitly import any other block (solid lines)
      ● A child block overrides any key-value pairs specified in the parent block
  19. 19. logoutEvent: 30 Days
      ● "logoutEvent" inherits the default retention of 30 days from its implicit import, "events"
  20. 20. loginEvent: 10 Days
      ● "loginEvent" inherits the default retention of 30 days from its implicit import, "events"
      ● "loginEvent" defines a 10-day policy, which overrides the 30 days inherited from "events"
  21. 21. Retention Config for share/clickEvent
      ● "shareEvent" explicitly imports a high-priority tag, which has a retention of 3 years
      ● "clickEvent" explicitly imports the blacklist tag, which disables retention for "clickEvent"
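The override rules on slides 18 to 21 amount to merging configs in precedence order. The sketch below assumes the order is implicit parent first, then explicit imports, then the block's own values, so that later sources override earlier ones; that order is inferred from the slides (shareEvent ends up with 3 years, not the inherited 30 days) and is not taken from Gobblin's code. The class `ConfigResolutionSketch` is hypothetical, values are flat strings, and clickEvent is left out because the slides do not show which key the blacklist tag sets.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical resolver for implicit and explicit imports; an illustration only. */
final class ConfigResolutionSketch {
    private final Map<String, Map<String, String>> ownValues = new HashMap<>();
    private final Map<String, List<String>> explicitImports = new HashMap<>();

    /** Sets one value in the block's own config (what its main.conf would hold). */
    void put(String key, String name, String value) {
        ownValues.computeIfAbsent(key, k -> new LinkedHashMap<>()).put(name, value);
    }

    /** Records an explicit import (what the block's includes.conf would list). */
    void addImport(String key, String imported) {
        explicitImports.computeIfAbsent(key, k -> new ArrayList<>()).add(imported);
    }

    /** Resolves the effective config for a key; assumes the import graph has no cycles. */
    Map<String, String> resolve(String key) {
        Map<String, String> resolved = new LinkedHashMap<>();
        // 1. Lowest precedence: the implicit parent block, e.g. /events for /events/loginEvent
        int slash = key.lastIndexOf('/');
        if (slash > 0) {
            resolved.putAll(resolve(key.substring(0, slash)));
        }
        // 2. Explicit imports override what was inherited from the parent
        for (String imported : explicitImports.getOrDefault(key, Collections.emptyList())) {
            resolved.putAll(resolve(imported));
        }
        // 3. Highest precedence: the block's own values
        resolved.putAll(ownValues.getOrDefault(key, Collections.emptyMap()));
        return resolved;
    }
}
```

Loading /events with a 30d lookback, /events/loginEvent with 10d, and /tags/highPriority with 3y (imported by /events/shareEvent) resolves to 30d for logoutEvent, 10d for loginEvent and 3y for shareEvent, matching the three slides.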
  22. 22. HDFS Config store
      ├── events
      │   ├── main.conf          // Default 30 Days
      │   ├── loginEvent
      │   │   └── main.conf      // 10 Days
      │   ├── shareEvent
      │   │   └── includes.conf  // Import /tags/highPriority
      │   └── clickEvent
      │       └── includes.conf  // Import /tags/blacklist
      └── tags
          ├── highPriority
          │   └── main.conf      // Define 3 Years retention
          └── blacklist
  23. 23. Retention Config Examples
      /events/main.conf
        gobblin.retention : {
          dataset : {
            finder.class=gobblin.data.management.retention.CleanableDatasetFinder
            pattern="/events/*"
          }
          selection {
            policy.class = gobblin.data.management.SelectBeforeTimeBasedSelectionPolicy
            timeBased.lookbackTime=30d
          }
          version : {
            finder.class=gobblin.data.management.DateTimeDatasetVersionFinder
          }
        }
      /tags/highPriority/main.conf
        gobblin.retention : {
          selection {
            timeBased.lookbackTime=3y
          }
        }
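The lookbackTime values on slide 23 (30d, 3y) are short period strings. The sketch below shows one way such a value could be turned into a cutoff date; the class `LookbackSketch` is hypothetical and handles only the `d` and `y` suffixes that appear in the slides, so it should not be read as the full format Gobblin accepts.

```java
import java.time.LocalDate;
import java.time.Period;

/** Hypothetical parser for lookback strings such as "30d" or "3y"; an illustration only. */
final class LookbackSketch {
    /** Parses a number followed by 'd' (days) or 'y' (years). */
    static Period parse(String lookback) {
        if (lookback == null || lookback.length() < 2) {
            throw new IllegalArgumentException("unsupported lookback: " + lookback);
        }
        int amount = Integer.parseInt(lookback.substring(0, lookback.length() - 1));
        char unit = lookback.charAt(lookback.length() - 1);
        switch (unit) {
            case 'd':
                return Period.ofDays(amount);
            case 'y':
                return Period.ofYears(amount);
            default:
                throw new IllegalArgumentException("unsupported lookback unit: " + unit);
        }
    }

    /** Versions dated before the returned day fall outside the lookback window. */
    static LocalDate cutoff(LocalDate today, String lookback) {
        return today.minus(parse(lookback));
    }
}
```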
  24. 24. Supported Policies
      • SelectBeforeTimeBasedSelectionPolicy
      • NewestKSelectionPolicy
      • DailyDependentHourlyPolicy
      • CombineSelectionPolicy
      More policies: http://gobblin.readthedocs.io/en/latest/data-management/Gobblin-Retention/
  25. 25. Future work
      • Config stores other than the HDFS-based config store
      • Improve tooling, validation and UI for config store deployment
  26. 26. Questions
