
Data Engineer's Lunch #60: Series - Developing Enterprise Consciousness

In Data Engineer's Lunch #60, Rahul Singh, CEO here at Anant, will discuss modern data processing/pipeline approaches.

Want to learn about modern data engineering patterns & practices for global data platforms? This session gives a high-level overview of the different types, frameworks, and workflows in data processing and pipeline design.

Data Engineer's Lunch #60: Series - Developing Enterprise Consciousness

  1. Developing Enterprise Consciousness: Global Data & Analytics Platforms on Cloud Neutral Systems. Rahul Xavier Singh, Anant Corporation. Confidential - Not to be Distributed Except to Pre-Approved Audiences. Copyright Anant 2022
  2. Data & Analytics platforms are the central nervous system of business platforms. Every business, small or large has a need to get connected.
  3. Challenge Business Platform Playbook Framework Approach Technology Management Solutions Data Engineering / Operations Cloud Engineering / Operations Implementation
  4. Business Challenge
  5. Platform Thinking
  6. Enterprise Consciousness: People, Processes, Information, and Systems are connected and in sync.
  7. Beyond 12 Factor … “Enterprise Consciousness”
     ● Current Business Information is available to the user / customer in the swiftest way possible within the bounds of reasonable costs.
     ● Business Information is generally available to the enterprise, siloed only by security and governance.
     ● Data platforms make use of appropriate resources for hot vs. cold, raw vs. enhanced data.
     ● Data platforms are always available, redundant, always trying to achieve an RPO/RTO of zero.
  8. Platform Challenge
  9. Data Platform Operations
  10. Optimized Core enabled Business Modularity: This process needs to be done in sequence. Otherwise we end up having to redo the work.
  11. Phases of Business Modularity: Business Silos, Standardized Platform, Optimized Core, Business Modularity
  12. How? Project Information Client Service Information Corporate Guides Collaborative Documents Assets & Files Corporate Assets Business Platform
     ● Curate framework of systems.
     ● Work with a vetted team of experts.
     ● Connect it all together.
     ● Focus on finding, analyzing, and acting on knowledge & communication towards business success.
  13. Platform Playbook Platform Contexts Responsibilities Approach Framework Tools Approach Setup Training Administration Configuration Knowledge Framework Distributed Realtime Extendable Automated Monitored
  14. Streamline. Organize. Unify.
  15. Technology & Management Challenge
  16. Current & Future State
     Current Tools & Issues
     ● RMQ, Redis, Mongo not scaling.
     ● C#, Java, Node etc. can’t do Big Data alone.
     ● Data Replication / Resiliency is difficult
     ● DevOps / DataOps?
     Future Goals
     ● Scalable & Resilient Message Delivery
     ● Fault Tolerant Data Processing
     ● Real Time Data Storage & Retrieval
     ● Automatic Deployment & Upgrades
     ● Predictable, Scalable Growth for Platform
     ● Customer Satisfaction of Data Quality & Freshness
     Example: Cloud neutral global data & analytics platform. Technologies in evaluation.
  17. Old Data & Analytics: ETL + Batch + Waiting. Much of the current thinking is that the state of the systems in an enterprise is synchronous and that analysis must be done sequentially, iteratively, from beginning to end in batch.
  18. New Data & Analytics: Events + Current Data. The growing trend in new thinking is that the state of the systems in an enterprise is dynamically asynchronous and that there is no “state” but everything is a stream of events.
  19. New Data & Analytics: Streams + Queues + Bus. Streams, Queues, Bus: These technologies have been around for a long time. What’s different today is that customer demand for realtime is forcing it across the board.
  20. Realtime Components
  21. Cassandra + Spark + Kafka: Use Cases. Image: http://manishsingh.net/post/lambda-architecture-with-kafka-spark-and-cassandra
  22. Cassandra + Spark + Kafka: Use Cases. Image: https://mesosphere.com/blog/kafka-dcos-tutorial/
     1. Lambda Architecture: Balances stream and batch processing for reliability.
     2. Machine Learning: Delivering predictive, and descriptive analytics in real-time.
     3. Master Data Management: Ensure that all the data is consistent all the time in all the systems.
     4. Realtime Customer Experience: Customers are always informed, recommendations are made, etc.
     5. Realtime Information Systems: Team members are always informed, etc.
  23. Framework
  24. Framework Components
     ● Major Components
       ○ Persistent Queues (RAM/BUS)
       ○ Queue Processing & Compute (CPU)
       ○ Persistent Storage (DISK/RAM)
       ○ Reporting Engine (Display)
       ○ Orchestration Framework (Motherboard)
       ○ Scheduler (Operating System)
     ● Strategies
       ○ Cloud Native on Google
       ○ Self-Managed Open Source
       ○ Self-Managed Commercial Source
       ○ Managed Commercial Source
  25. Framework
  26. Approach
  27. Approach
  28. Context
  29. Fast Data
  30. General Awareness
  31. Databases
  32. C* Databases
  33. Profession
  34. Infrastructure
  35. Sessions
  36. Framework - Core Concepts & Technology Engineering Data Pipelines Infrastructure Architecture Planning Operations Containers Architecture Planning Operations Orchestration Software Architecture Planning Operations Orchestration Automation DevOps DataOps
  37. Topics: Data Pipeline, Data Engineering Tools, Apache Spark*, Apache Kafka*, Kubernetes/Docker/Helm, Terraform/Ansible, GitOps for Dev/DataOps, Airflow, Argo/Kubeflow
  38. Sessions
     ● Presentation
       ○ Overview of how this fits into the larger picture
       ○ Concept of the topics
       ○ Any tour of the technology
     ● Discussion
       ○ Session - Two Way Q&A
       ○ Online - Slack
       ○ Offline - Slack / Email
     ● Assignment
       ○ Hands-on
       ○ Self-paced work to put into practice
       ○ Create portfolio items
       ○ Try out new technology
     Notes Slides ● Overview ● Concepts ● Tech Tour ● Q&A ● Online ● Offline Presentation Discussion Git Repo ● Design ● Engineering ● Automation Assignment
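
Slide 22 describes the Lambda Architecture as balancing stream and batch processing. The slides show no code, so the following is only a minimal sketch in plain Python of how that balance might look: an in-memory list stands in for the event log (Kafka in the slides), and plain functions stand in for the batch and speed layers (Spark) and the serving layer (Cassandra). The names `Event`, `batch_view`, `speed_view`, and `serve` are hypothetical and chosen for illustration.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """One immutable business event (hypothetical shape, not from the slides)."""
    offset: int      # position in the append-only log
    customer: str
    amount: float


def batch_view(log, up_to_offset):
    """Batch layer: recompute totals from the log up to a cutoff offset."""
    totals = Counter()
    for event in log:
        if event.offset <= up_to_offset:
            totals[event.customer] += event.amount
    return totals


def speed_view(log, after_offset):
    """Speed layer: totals for events the last batch run has not covered."""
    totals = Counter()
    for event in log:
        if event.offset > after_offset:
            totals[event.customer] += event.amount
    return totals


def serve(log, batch_cutoff):
    """Serving layer: merge both views so a query sees current data."""
    merged = Counter(batch_view(log, batch_cutoff))
    merged.update(speed_view(log, batch_cutoff))
    return dict(merged)


log = [
    Event(0, "acme", 100.0),
    Event(1, "globex", 40.0),
    Event(2, "acme", 25.0),   # arrived after the last batch run
]
current_totals = serve(log, batch_cutoff=1)
```

In this sketch the batch view can be thrown away and rebuilt from the log at any time, while the speed view only has to cover the gap since the last batch run.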

Editor's Notes

  • Challenge
    Large organizations and small businesses have the same problem.
    Large organizations need to integrate data first before they can extract value whether through business intelligence, analytics, or machine learning.
    Small organizations use online platforms to run their operations, and are likely to use software that gives them less control of their data.

  • Solution
    Enterprise consciousness - An eventually consistent enterprise where all data is synchronized between applications and back to a central translytical database.

  • Challenge
    Currently the components are broken up into different vendors and parts.
    Similar to building a computer every time for every client.
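
The Solution note describes an eventually consistent enterprise in which application data is synchronized back to a central database. The notes do not say how conflicts are resolved, so the sketch below assumes one common option, last-write-wins by timestamp, purely for illustration. The names `Change`, `apply_change`, and `synchronize` are hypothetical, and a plain dictionary stands in for the central database.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Change:
    """A change emitted by one application (hypothetical shape)."""
    key: str          # record identifier shared across applications
    value: str
    timestamp: int    # logical or wall-clock time of the change
    source: str       # which application produced it


def apply_change(central, change):
    """Last-write-wins (an assumption): keep the newest change per key."""
    existing = central.get(change.key)
    if existing is None or change.timestamp > existing.timestamp:
        central[change.key] = change
    return central


def synchronize(central, changes):
    """Apply changes in whatever order they happen to arrive."""
    for change in changes:
        apply_change(central, change)
    return central


changes = [
    Change("customer:42:email", "old@example.com", 1, "crm"),
    Change("customer:42:email", "new@example.com", 3, "billing"),
    Change("customer:42:email", "mid@example.com", 2, "support"),
]

in_order = synchronize({}, changes)
reversed_order = synchronize({}, list(reversed(changes)))
```

Both runs end in the same state even though the changes arrive in different orders, which is the property "eventually consistent" refers to. Changes with equal timestamps would need an extra tiebreak rule that this sketch leaves out.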