Getting data into Rudder



Rudder recently gained features for integrating data from various sources into configuration policies. This talk covers the data management workflow in Rudder, including the improvements in 4.0 and 4.1, with a focus on practical use cases.

In particular, we will go through the possible data flows: the data sources, which can be local to the server or the node, or fetched from a remote API or another node; the data manipulation tools, on the server or in the policies; and finally the ways to use this data in the policies (as directive parameters, templating data, etc.).

Published in: Technology


  1. Getting data into Rudder (CfgMgmtCamp 2017)
  2. Introduction
     ● Rudder developer at Normation
     ● Sysadmin
     ● Contact: amousset on #rudder
     I mainly work on:
     ● ncf
     ● Techniques
     ● Packaging and system integration
     ● Documentation
     ● Configuration Agent (CFEngine)
  3. Getting data into Rudder?
     ● Not a changelog, but the Big Picture
     ● Reminder about data flows in Rudder
     ● Explore new features
     ● Give examples and use cases
     ● From anywhere to applied policies
  4. Which data are we talking about?
     Code + Data => Desired State
     [Diagram labels: Techniques, ncf framework, Directives, Groups, Inventories, Node Properties, System State]
     What the agent actually does on the system
  5. Data Types
     ● String (everything is a string)
     ● Classes (~boolean): defined or not, represented by a string
       – package_present_htop_kept
       – centos_7
     ● Iterator (also called an array)
       – my_array = [ "item1", "item2" ]
       – ${my_array} will be evaluated twice, once with item1 and once with item2
     ● Dict (~JSON)
       – my_dict = { "key1": "value1", "key2": [...] }
       – ${my_dict[key1]} gives value1
       – Automatically mapped to an iterator when necessary
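As a rough mental model of the iterator and dict behavior described on this slide, here is a minimal Python sketch. It is not agent code: the `expand` helper and the `install ${my_array}` template are hypothetical, chosen only to show one evaluation per item.

```python
# Plain Python stand-ins for the slide's examples (illustrative only).
my_array = ["item1", "item2"]
my_dict = {"key1": "value1", "key2": ["a", "b"]}


def expand(template: str, name: str, values: list) -> list:
    """Return one rendered string per item, mimicking iterator expansion."""
    placeholder = "${" + name + "}"
    return [template.replace(placeholder, value) for value in values]


# The template is evaluated twice, once with item1 and once with item2.
lines = expand("install ${my_array}", "my_array", my_array)

# Dict lookup, the counterpart of ${my_dict[key1]}.
lookup = my_dict["key1"]
```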
  6. [Diagram: data flow overview. Labels: Code, Server Data, Generation, Local Data, Run, System]
  7. Demo Environment
     ● Rudder 4.1 beta 2
     ● The objective is to configure a load balancer with Nginx on a CentOS 7 server
     ● We have access to an API describing the backend servers to use, depending on the environment and location
     ● We have a Rudder root server (called server) and the load balancer (called lb1)
  8. Server Side
  9. Data Manipulation - Generation
     ● Before policy generation
     ● Link data to a Node
       – Inventory Data
       – Node Properties
       – Directive Parameters
     ● Transform data
       – Default values
       – Javascript Engine
  10. Inventory Data
      ● Used to define a group, which is based on a query on inventory data
      ● A very limited set can be used in configuration (node's hostname and id, information about its policy server, etc.)
  11. Node properties
      ● Key-Value Data
        – Key is a string
        – Value is a string or a JSON object
      ● Associated with a node
      ● Stored on the root server
  12. Node properties
      ● Let's define two simple node properties:
        – "env" = "production"
        – "location" = "eu"
      ● With the following command:
        curl -H "X-API-Token: token" -H "Content-Type: application/json" -k -X POST \
          https://server/rudder/api/latest/nodes/dba62005-061b-485e-a2d5-7fc750c8a8f3 \
          -d '{"properties": [ { "name": "env", "value": "production" }, { "name": "location", "value": "eu" } ]}'
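The same call can be prepared from a script. The following is a minimal sketch using only the Python standard library; the helper name `build_properties_request` is hypothetical, and the URL, token placeholder and node id are the ones from the slide. It only builds the request, it does not send it.

```python
import json
import urllib.request


def build_properties_request(base_url, node_id, token, properties):
    """Build (but do not send) the POST request that sets node properties."""
    payload = {
        "properties": [
            {"name": name, "value": value} for name, value in properties.items()
        ]
    }
    return urllib.request.Request(
        url=f"{base_url}/rudder/api/latest/nodes/{node_id}",
        data=json.dumps(payload).encode("utf-8"),
        headers={"X-API-Token": token, "Content-Type": "application/json"},
        method="POST",
    )


request = build_properties_request(
    "https://server",
    "dba62005-061b-485e-a2d5-7fc750c8a8f3",
    "token",
    {"env": "production", "location": "eu"},
)
# Sending is deliberately left out. urllib.request.urlopen(request) would
# perform the call; note that curl's -k disables certificate verification,
# which urlopen does not do by default.
```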
  13. Node Properties
      In the inventory of our node: [screenshot]
      Use them:
      ● with ${[key]} in directive parameters
      ● as group definition criteria
  14. Datasources
      ● Standard Node properties are push-based
      ● Datasources are pull-based:
        – Will automatically define Node properties
        – Using an HTTP API
        – Can select a part of the response
  15. Datasources
      ● We have an API at:
        http://api.rudder.local/environment/configuration.json
      ● It returns JSON containing our configuration, in the form:
        { "eu": …, "us": …. }
  16. Datasources
      ● Let's define a datasource called "Remote Configuration"
      ● URL: http://api.rudder.local/${[env]}/configuration.json
      ● JSON Path: $.${[location]}
        – $. is the JSON object root
        – $.key is the value of the key sub-object
      ● We could also have node-specific data using its id, or using the external id as a node property
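To make the URL expansion and JSON path selection concrete, here is a minimal Python sketch of what this datasource does for one node. The response body is invented for illustration (the real API content is not shown in the slides), and the helper names are hypothetical; no HTTP call is made.

```python
import json

# Node properties defined earlier in the talk.
node_properties = {"env": "production", "location": "eu"}


def datasource_url(env: str) -> str:
    """Expand the datasource URL with the node's env property."""
    return f"http://api.rudder.local/{env}/configuration.json"


def select_top_level(document: dict, key: str):
    """Counterpart of the JSON path $.<key>: the value under a root key."""
    return document[key]


# Hypothetical response body, in the { "eu": ..., "us": ... } form.
response_body = '{"eu": {"example": "eu-value"}, "us": {"example": "us-value"}}'

url = datasource_url(node_properties["env"])
selected = select_top_level(json.loads(response_body), node_properties["location"])
```

In this sketch, `selected` is what would end up stored as the node property defined by the datasource.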
  17. Datasources
      ● After generation, we get: [screenshot]
  18. Javascript Engine
      ● Allows transforming data in Directive Parameters
        – "${variable}".substring(0,3)
        – rudder.hash.sha512(string)
        – "SHA512", password , salt)
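For readers more familiar with Python than Javascript, the first two transformations have rough standard-library analogues. This is only an analogy: the sample values are made up, and whether `rudder.hash.sha512` returns a hexadecimal digest is an assumption here.

```python
import hashlib

variable = "production"  # hypothetical value of ${variable}

# Analogue of "${variable}".substring(0,3): the first three characters.
prefix = variable[0:3]

# Analogue of a SHA-512 hash of a string (hex output is assumed).
digest = hashlib.sha512("some string".encode("utf-8")).hexdigest()
```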
  19. Server Side Data - Summary
      ● Node properties, defined by an API call or a datasource
        – ${[key]}
        – ${[key] | default = "value" }
        – ${[key] | default = "${variable}" }
      ● Directive Parameters (that can include node properties)
        – ${generic_variable_definition.variable_name}
      ● Global Parameters
        – ${rudder.param.parameter_name}
      ● Inventory Data
        – ${}
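The `default` option on node properties can be read as a lookup with a fallback. A minimal Python sketch of that behavior follows; the `resolve` helper is hypothetical, and treating a missing property without a default as an error is an assumption.

```python
def resolve(properties: dict, key: str, default=None):
    """Return the property value, or the default when the key is absent."""
    if key in properties:
        return properties[key]
    if default is None:
        # Assumption: a missing property with no default is an error.
        raise KeyError(f"node property {key!r} is not defined")
    return default


defined = resolve({"env": "production"}, "env", default="dev")
fallback = resolve({}, "env", default="dev")
```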
  20. Server Side Data - Summary
      ● All those variables can be used in directive parameters, datasource parameters, and some of them in the Technique Editor.
      ● Javascript Engine allows easy text manipulation on variables
  21. [Diagram: server-side data flow. Labels: Generation, Techniques, External API, Rudder API, Web User, Node Properties, Directive Parameter, Global Parameters, Inventory Data, Groups/Rules, Policies]
  22. Node Side
  23. Data Manipulation - Runtime
      ● After policy generation, on the node
      ● Load dynamic data into policy
        – Local files
        – Remote files (from the server or another node)
        – System state
      ● Transform Data
        – Extract
        – Merge
  24. Define data on the host
      ● Generic Methods
        – variable_dict_from_file
        – variable_iterator_from_file
        – variable_string_from_file
      ● Technique
        – CFEngine variable definition using a JSON file
  25. Define data on the host
      ● Copy remote files:
        – sharedfile_from_node: new in 4.1, uses a new relay API to securely share files between nodes.
        – file_copy_from_remote_source: copies files from the root server.
      ● These files can be data files, loaded after being copied
  26. Managing data on the Node
      ● We will now override the server-defined data with local information. We have a port number in /etc/lb.json on lb1:
        { "port": 8080 }
      ● We will use variable_dict_merge to merge the configuration that comes from the API with the local configuration.
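The load-then-merge step can be sketched in Python as follows. This is not the agent's implementation: the helper names are hypothetical, the API-side configuration is invented, a temporary file stands in for /etc/lb.json, and the merge is assumed to be a shallow, top-level one where the local (override) dict wins.

```python
import json
import os
import tempfile


def dict_from_file(path: str) -> dict:
    """Load a JSON file into a dict, in the spirit of variable_dict_from_file."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def dict_merge(default: dict, override: dict) -> dict:
    """Shallow merge: top-level keys from override replace those in default."""
    merged = dict(default)
    merged.update(override)
    return merged


# Hypothetical configuration coming from the API via the datasource.
api_configuration = {"port": 80, "loadbalancer": {"upstreams": []}}

# A temporary file stands in for /etc/lb.json on lb1.
with tempfile.TemporaryDirectory() as directory:
    path = os.path.join(directory, "lb.json")
    with open(path, "w", encoding="utf-8") as handle:
        handle.write('{ "port": 8080 }')
    local_configuration = dict_from_file(path)

configuration = dict_merge(api_configuration, local_configuration)
```

The local port replaces the server-defined one, while keys that exist only on the server side are kept unchanged.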
  27. Using those parameters in a template
      Mustache template for our nginx configuration:
        http {
          {{#vars.configuration.loadbalancer.upstreams}}
          upstream {{{name}}} {
            {{#servers}}
            server {{{host}}} weight={{{weight}}};
            {{/servers}}
          }
          {{/vars.configuration.loadbalancer.upstreams}}
          ...
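To show which data shape this template expects, here is a minimal Python sketch that produces the upstream blocks with plain loops instead of a Mustache engine. The upstream name, hosts and weights are invented sample values, and `render_upstreams` is a hypothetical helper.

```python
# Hypothetical data in the shape the template iterates over.
configuration = {
    "loadbalancer": {
        "upstreams": [
            {
                "name": "backend",
                "servers": [
                    {"host": "10.0.0.1", "weight": 2},
                    {"host": "10.0.0.2", "weight": 1},
                ],
            }
        ]
    }
}


def render_upstreams(configuration: dict) -> str:
    """Render one upstream block per entry, one server line per server."""
    lines = []
    for upstream in configuration["loadbalancer"]["upstreams"]:
        lines.append(f"upstream {upstream['name']} {{")
        for server in upstream["servers"]:
            lines.append(f"  server {server['host']} weight={server['weight']};")
        lines.append("}")
    return "\n".join(lines)


rendered = render_upstreams(configuration)
```

Each `{{#section}}` in the template corresponds to one `for` loop here: the outer section iterates over upstreams, the inner one over that upstream's servers.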
  28. Templating in Rudder - Reminder
      ● Mustache: the default templating system, much faster in Rudder.
      ● Jinja2: allows much more logic inside the template than Mustache. Requires a recent agent and is considerably slower than Mustache templating.
  29. Node Side - Summary
      ● Load data from local or remote (policy server or other node) files
      ● Ability to override a dict with others
      ● Allows:
        – Using local data in policies
        – Communication with other local tools
      ● System variables and classes: inventory data
  30. [Diagram: node-side data flow. Labels: Local Files, Run, System, Shared File On Other Node, Shared File On Policy Server, Node, Node, Server, Policies]
  31. Best Practices
      ● Dicts are good for encapsulated data, to be passed from outside Rudder to the agent.
      ● Overriding:
        – Generic Variable Definition + Priority (generation)
        – ${variable | default = "default value"} (generation)
        – variable_dict_merge(default, override) (runtime)
      ● Define a naming convention for namespace + variable name.
      ● Execution order of runtime data manipulation is tricky, and relations are difficult to track. Try to encapsulate as much as possible.
  32. Best Practices
      ● Datasource properties are deleted on 404 errors
      ● Provide sane defaults
      ● Use templating (fails on error, maintainable), and add checks for other sensitive actions
      ● Generation time is safer; only use runtime data management when necessary (secrets, local communication, etc.)
  33. What's next?
      ● Directive parameters for techniques created with the Technique Editor
      ● Other levels of properties for easier override (Group Properties, etc.)
      ● Other sources of properties on the server (Databases, etc.)
      ● Integration with other tools
      ● Data explorability: autocompletion, etc.
      ● Security, encryption
  34. Questions?