In Data Engineer’s Lunch #20: DataOps vs DevOps, we discuss the definitions and differences between DataOps (Data Operations) and DevOps (Dev Operations).
Accompanying Blog: https://blog.anant.us/data-engineers-lunch-20-dataops-vs-devops/
Accompanying YouTube: https://youtu.be/KEYO5DN9J1w
Sign Up For Our Newsletter: http://eepurl.com/grdMkn
Join Data Engineer’s Lunch Weekly at 12 PM EST Every Monday:
https://www.meetup.com/Data-Wranglers-DC/events/
Cassandra.Link:
https://cassandra.link/
Follow Us and Reach Us At:
Anant:
https://www.anant.us/
Awesome Cassandra:
https://github.com/Anant/awesome-cassandra
Email:
solutions@anant.us
LinkedIn:
https://www.linkedin.com/company/anant/
Twitter:
https://twitter.com/anantcorp
Eventbrite:
https://www.eventbrite.com/o/anant-1072927283
Facebook:
https://www.facebook.com/AnantCorp/
2. Topics to cover
● Data Operations (DataOps)
○ Data Engineering
○ Analytics
○ Data Science
○ BI
● Dev Operations (DevOps)
○ Software Engineering
○ Code Reviews
○ Continuous Testing
○ Monitoring
3. Dev Operations (DevOps)
● DevOps is the practice of operations and development engineers participating together in the entire service lifecycle, from
design through the development process to production support.
● DevOps is also characterized by operations staff making use of many of the same techniques as developers for their systems work.
● Values
○ People over Processes over Tools
● Principles
○ Infrastructure as Code
○ Do it Right / Do it Once
● Practices
○ Source Control
○ Config management
○ Metrics
○ Monitoring
4. Dev Operations (DevOps) Tools
● CICD (Jenkins, CircleCI, TeamCity, Azure DevOps (ADO), Google Cloud Build, AWS DevOps)
○ Continuous Integration (automated)
■ Pull down the code when code is committed
■ Build it
■ Unit Tests
■ Run it / run some tests
○ Continuous Delivery
■ One button deployment of the whole stack
■ Looks good?
■ Push to stage
■ Looks good?
■ Push to prod
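The CI and CD steps above can be sketched in a few lines of Python. This is a minimal illustration, not how Jenkins or any of the listed tools work internally: the stage functions (`checkout`, `build`, `unit_tests`, `smoke_run`) are hypothetical stand-ins that simply return True, and the `approve` callback stands in for the "Looks good?" gate before each environment.

```python
from typing import Callable, List, Tuple

Stage = Tuple[str, Callable[[], bool]]


def run_pipeline(stages: List[Stage]) -> List[str]:
    """Run stages in order and stop at the first one that fails."""
    completed = []
    for name, step in stages:
        if not step():
            break
        completed.append(name)
    return completed


def deliver(environments: List[str], approve: Callable[[str], bool]) -> List[str]:
    """Promote through environments, asking "Looks good?" before each push."""
    deployed = []
    for env in environments:
        if not approve(env):
            break
        deployed.append(env)
    return deployed


# Hypothetical stand-ins for real work (git checkout, compiler, test runner).
def checkout() -> bool:
    return True


def build() -> bool:
    return True


def unit_tests() -> bool:
    return True


def smoke_run() -> bool:
    return True


CI_STAGES: List[Stage] = [
    ("checkout", checkout),
    ("build", build),
    ("unit tests", unit_tests),
    ("smoke run", smoke_run),
]

if __name__ == "__main__":
    passed = run_pipeline(CI_STAGES)
    print("CI stages passed:", passed)
    if len(passed) == len(CI_STAGES):
        # Approve everything here; a real gate would be a human or a check.
        print("Deployed to:", deliver(["stage", "prod"], lambda env: True))
```

The point of the sketch is the ordering: nothing is promoted to stage or prod unless every CI stage passed, and a rejected gate stops promotion at that environment.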
6. Data Operations (DataOps)
● DataOps (data operations) is an Agile approach to designing, implementing and maintaining a distributed
data architecture that will support a wide range of open source tools and frameworks in production. The
goal of DataOps is to create business value from big data.
● Values
○ People over Processes over Tools (Agile)
● Principles
○ Infrastructure as Code
○ Do it Right / Do it Once
7. Data Operations (DataOps) cont.
● Practices
○ Inherited from DevOps
■ Source Control
■ Config Management
■ Metrics
■ Monitoring
○ Semantic Rules / Metadata
○ Feedback loops to Validate Data
○ Metrics for Execution
○ Automate as much of the data pipeline as possible
○ Data Profiling
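Two of the DataOps-specific practices above, semantic rules that validate data and data profiling, can be illustrated with plain Python. This is only a sketch under stated assumptions: the column names (`order_id`, `amount`), the rules, and the sample rows are made up for the example, and a real pipeline would more likely use a dedicated validation or profiling library.

```python
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]


def profile(rows: List[Row], column: str) -> Dict[str, Any]:
    """Summarize one column: row count, nulls, distinct values, min and max."""
    values = [row.get(column) for row in rows]
    present = [v for v in values if v is not None]
    return {
        "count": len(values),
        "nulls": len(values) - len(present),
        "distinct": len(set(present)),
        "min": min(present) if present else None,
        "max": max(present) if present else None,
    }


def validate(rows: List[Row], rules: Dict[str, Callable[[Any], bool]]) -> List[str]:
    """Apply per-column rules and return a message for each violation."""
    failures = []
    for i, row in enumerate(rows):
        for column, rule in rules.items():
            value = row.get(column)
            if not rule(value):
                failures.append(f"row {i}: bad {column}={value!r}")
    return failures


# Hypothetical semantic rules for an orders feed.
RULES: Dict[str, Callable[[Any], bool]] = {
    "order_id": lambda v: isinstance(v, int) and v > 0,
    "amount": lambda v: isinstance(v, (int, float)) and v >= 0,
}

# Made-up sample data containing one null and one invalid id.
ROWS: List[Row] = [
    {"order_id": 1, "amount": 9.5},
    {"order_id": 2, "amount": None},
    {"order_id": -3, "amount": 4.0},
]

if __name__ == "__main__":
    print(profile(ROWS, "amount"))
    for failure in validate(ROWS, RULES):
        print(failure)
```

Feeding the list of failures back to whoever owns the upstream source is one way to close the feedback loop the slide mentions.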
8. Data Operations (DataOps) Tools
● Tools
○ Inherited from DevOps
■ “CICD”
■ Config Management
■ Orchestration
■ Virtualization
■ Containerization
○ Scheduling (beyond cron)
■ Airflow
■ Jenkins
■ Luigi
■ Cloud Run
■ Kubernetes Pods for Jobs/CronJob
○ Data Catalogs
■ Giving the user one place to find the data
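One reason to schedule "beyond cron" is that pipeline tasks depend on each other, while cron only knows about clock times. The sketch below shows dependency-ordered execution with Python's standard-library `graphlib` (Python 3.9+). It is not Airflow or Luigi code, and the task names are hypothetical; it only illustrates the idea of a task graph that those tools build on.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set

# Hypothetical pipeline: each task maps to the set of tasks it depends on.
DEPENDENCIES: Dict[str, Set[str]] = {
    "extract": set(),
    "validate": {"extract"},
    "transform": {"validate"},
    "load": {"transform"},
    "profile": {"extract"},
}


def execution_order(deps: Dict[str, Set[str]]) -> List[str]:
    """Return an order in which every task runs after its dependencies.

    Raises graphlib.CycleError if the dependencies contain a cycle.
    """
    return list(TopologicalSorter(deps).static_order())


if __name__ == "__main__":
    for task in execution_order(DEPENDENCIES):
        print("run", task)
```

More than one valid order can exist (for example, `profile` may run before or after `validate`), which is also why a scheduler can run independent tasks in parallel.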