Ever find yourself needing data pipelines to feed a hungry data-driven culture, but not sure where to start, or what features are essential? In this talk, I will demonstrate a baseline data pipeline infrastructure built with Jenkins and Docker EE that checks all the boxes. Data pipelines often exist as that mysterious plumbing buried underground: occasionally inspected, but largely prone to silent failures and the ensuing hot fixes. Join the quest to daylight the infrastructure and benefit!
2. Data pipelines are software constructs concerned with
transforming business data into actionable insights.
• Perform complex joins across multiple data sources
• Resolve data dependencies
• Clean and conform data across the organization
Data Pipelines
Primer
4. Wishlist
Data Pipelines
Systems with the following traits build trust with stakeholders and
ensure both developer and operational agility
• Visibility
• Manageability
• Alerting and Monitoring
• Extensibility (plugins)
5. • Visibility: multiple dashboard options
• Management: parameterized jobs, job history
• Extensibility: plugin support and 3rd-party integration
• Vibrant community and active development
• De facto Continuous Integration/Delivery engine
• Pipeline DSL
Jenkins
Data Pipelines
11. • Swarm resilient to node failures
• Declarative service model
• Robust secrets management
• Plugins (storage, network, and authentication)
Docker Swarm
12. Persistent Data
Jenkins stores its state on the
filesystem. A volume plugin is necessary
to ensure that data stays available to the
service wherever it is scheduled.
The REX-Ray plugin supports multiple cloud
storage platforms, including Azure,
AWS, and Google
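As a rough illustration of the point above, a Jenkins master service could mount a named volume through a volume plugin along these lines. The volume name `jenkins_home` and the image `jenkins/jenkins:lts` are assumptions chosen for the example, and the `run` wrapper is a hypothetical helper that only prints the command unless `DRY_RUN=0` is set.

```shell
#!/bin/sh
# Sketch only: volume name and image are assumptions, not taken from the talk.

# Print the command by default; execute it only when DRY_RUN=0.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Build a --mount value for a named volume backed by a volume plugin.
# $1 = volume name, $2 = path inside the container, $3 = volume driver
volume_mount() {
  printf 'type=volume,source=%s,target=%s,volume-driver=%s' "$1" "$2" "$3"
}

# /var/jenkins_home is where the official Jenkins image keeps its state.
run docker service create \
  --name mydemo_jenkins_master \
  --mount "$(volume_mount jenkins_home /var/jenkins_home rexray/efs)" \
  jenkins/jenkins:lts
```

Because the volume is resolved by the plugin rather than by a local path, the same mount definition should work on whichever node the task is placed.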
14. Docker Secrets
Jenkins production, staging and development
instances can employ an identical naming
scheme
• Consistent interface across services
• Secrets may be associated with multiple
services
• Secrets may be aliased
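The aliasing idea can be sketched with the long form of `--secret`, which maps a stored secret (`source`) to a fixed name inside the container (`target`). The naming scheme `<environment>_<name>`, the secret name `jenkins_admin_password`, the service name, and the image are all assumptions made for this example; `run` is a hypothetical dry-run helper.

```shell
#!/bin/sh
# Sketch only: secret names, service names, and image are assumptions.

# Print the command by default; execute it only when DRY_RUN=0.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Map an environment-specific secret onto one fixed in-container name.
# $1 = environment (prod, staging, dev), $2 = logical secret name
secret_alias() {
  printf 'source=%s_%s,target=%s' "$1" "$2" "$2"
}

ENVIRONMENT="${ENVIRONMENT:-staging}"

# Store the environment-specific value; the secret body is read from stdin.
printf '%s' "${ADMIN_PASSWORD:-changeme}" |
  run docker secret create "${ENVIRONMENT}_jenkins_admin_password" -

# Every environment sees the secret at /run/secrets/jenkins_admin_password.
run docker service create \
  --name "jenkins_${ENVIRONMENT}" \
  --secret "$(secret_alias "$ENVIRONMENT" jenkins_admin_password)" \
  jenkins/jenkins:lts
```

With this arrangement the container configuration can stay identical across production, staging, and development; only the `source` half of the mapping changes.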
19. Demo: Scaling Jenkins Workers
Using the Jenkins Swarm plugin (no relation to Docker Swarm) to
auto-register Jenkins workers with the Jenkins master
• UDP broadcast (auto-discovery) doesn’t work over an overlay network
• Requires credentials and matching Jenkins URL value
• JNLP-based mechanism registers Jenkins worker w/ master
• Usually run custom build image(s) and use labels to
disambiguate
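Since auto-discovery is not available, each worker has to be told where the master is. The sketch below assembles a Swarm plugin client command line for that purpose. The option names (`-master`, `-username`, `-passwordEnvVariable`, `-labels`, `-executors`) are those of Swarm client releases from around the time of this talk and should be checked against the help output of the version in use; the URL, user name, and label are placeholders.

```shell
#!/bin/sh
# Sketch only: URL, user, and label are placeholders; option names should be
# verified against the Swarm client version actually deployed.

# Assemble the Swarm plugin client command line.
# $1 = Jenkins URL (must match the URL configured on the master)
# $2 = Jenkins user allowed to register workers
# $3 = label that jobs use to select this kind of worker
swarm_client_cmd() {
  echo "java -jar swarm-client.jar -master $1 -username $2" \
       "-passwordEnvVariable JENKINS_PASSWORD -labels $3 -executors 2"
}

# Print the command a worker image's entrypoint might run.
swarm_client_cmd \
  "${JENKINS_URL:-http://jenkins:8080/}" \
  "${JENKINS_USER:-worker}" \
  "${WORKER_LABEL:-python}"
```

Reading the password from an environment variable keeps it off the command line, which pairs naturally with a Docker secret exported by the entrypoint.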
20. Demo: Scaling Workers
List Swarm Nodes
* aws ec2 describe-instances | jq -r '.Reservations[].Instances[] | select(.State.Name
== "running") | [.PublicIpAddress,.PrivateIpAddress,.InstanceId,[(.Tags |
sort_by(.Key) | .[] | select(.Key | contains("swarm-node-type",
"aws:cloudformation:stack-name"))) | .Value][0,1,2]] | @tsv'
Scale up to 6 Jenkins workers on 3 Swarm worker nodes
* docker service scale mydemo_jenkins_worker=6
Observe, then return to 3 Jenkins workers on 3 Swarm worker nodes
* watch -n 1 --differences docker service ps mydemo_jenkins_worker
* docker service scale mydemo_jenkins_worker=3
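For scripted use, the manual `watch` step could be replaced by polling until the running replica count matches the desired count. This is a minimal sketch, assuming the `Replicas` column of `docker service ls` has the form `running/desired` (optionally followed by extra text); the function names are hypothetical.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical.

# Succeed when a "running/desired" string such as "6/6" shows convergence.
replicas_converged() {
  running=${1%%/*}
  desired=${1##*/}
  desired=${desired%% *}   # drop any trailing annotation after the count
  [ -n "$running" ] && [ "$running" = "$desired" ]
}

# Poll once per second until the service converges or attempts run out.
# $1 = service name, $2 = maximum attempts (default 60)
# Note: the name filter matches by prefix, so only the first row is used.
wait_for_service() {
  attempts=${2:-60}
  while [ "$attempts" -gt 0 ]; do
    replicas=$(docker service ls --filter "name=$1" \
      --format '{{.Replicas}}' | head -n 1)
    if replicas_converged "$replicas"; then
      return 0
    fi
    attempts=$((attempts - 1))
    sleep 1
  done
  return 1
}
```

A typical use would be `docker service scale mydemo_jenkins_worker=6 && wait_for_service mydemo_jenkins_worker`.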
24. Demo: Data following service
Using Docker EE AWS + REX-Ray/efs volume plugin
• Ensure instance IAM role has permission to manage EFS volumes
• Ensure EFS_SECURITYGROUPS matches instance Security Group
• Use: “--alias” to insulate against tag version lock-in
• Use: swarm-exec docker plugin install --grant-all-permissions
rexray/efs EFS_SECURITYGROUPS=<SwarmWideSG>
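The alias point can be sketched as follows: the plugin is installed from a specific tag but registered under a stable local name, and volumes then refer only to that name. This shows the single-node form of the command (the slide wraps it in swarm-exec to reach every node). The alias `efs`, the default tag, the security group placeholder, and the volume name are assumptions for illustration; `run` is a hypothetical dry-run helper.

```shell
#!/bin/sh
# Sketch only: alias, tag, security group, and volume name are assumptions.

# Print the command by default; execute it only when DRY_RUN=0.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Compose a plugin reference pinned to a specific tag.
# $1 = plugin repository, $2 = tag
plugin_ref() {
  printf '%s:%s' "$1" "$2"
}

PLUGIN_TAG="${PLUGIN_TAG:-latest}"                       # assumed default
EFS_SECURITYGROUPS="${EFS_SECURITYGROUPS:-sg-00000000}"  # placeholder value

# Install under the stable alias "efs"; the tag can change later
# without touching anything that refers to the alias.
run docker plugin install --alias efs --grant-all-permissions \
  "$(plugin_ref rexray/efs "$PLUGIN_TAG")" \
  "EFS_SECURITYGROUPS=$EFS_SECURITYGROUPS"

# Volumes and service mounts then name the alias as their driver.
run docker volume create --driver efs jenkins_home
```

Without the alias, the driver name recorded on volumes and services would include the tag, so upgrading the plugin would mean editing every reference to it.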
25. Demo: Data following service
List Swarm Nodes
* aws ec2 describe-instances | jq -r '.Reservations[].Instances[] | select(.State.Name == "running") |
[.PublicIpAddress,.PrivateIpAddress,.InstanceId,[(.Tags | sort_by(.Key) | .[] | select(.Key |
contains("swarm-node-type", "aws:cloudformation:stack-name"))) | .Value][0,1,2]] | @tsv'
Update Jenkins Master to run on Swarm worker node
* docker service update --constraint-rm "node.role==manager" --constraint-add
"node.role==worker" mydemo_jenkins_master
Observe, then return Jenkins Master to Swarm manager node
* watch -n 1 --differences docker service ps mydemo_jenkins_master
* docker service update --constraint-rm "node.role==worker" --constraint-add
"node.role==manager" mydemo_jenkins_master
28. So much more to discuss…
Data Pipelines
• Incremental and Idempotent processing
• Operational metrics and Process
Intelligence
• Data Lineage using Jenkins
• Data Validation with xUnit