Developing Enterprise Consciousness
Global Data & Analytics Platforms on Cloud Neutral Systems
Rahul Xavier Singh Anant Corporation
Confidential - Not to be Distributed Except to Pre-Approved Audiences. Copyright Anant 2022
Data & Analytics platforms are the central nervous system of business platforms. Every business, small or large, needs to get connected.
Challenge
Business
Platform
Playbook
Framework
Approach
Technology
Management
Solutions
Data Engineering / Operations
Cloud Engineering / Operations
Implementation
Business Challenge
Platform Thinking
Enterprise Consciousness
People, Processes, Information, and Systems are connected and in sync.
Beyond 12 Factor … “Enterprise Consciousness”
● Current Business Information is available to the user / customer as quickly as possible within the bounds of reasonable cost.
● Business Information is generally available to the enterprise, siloed only by security and governance.
● Data platforms use appropriate resources for hot vs. cold and raw vs. enhanced data.
● Data platforms are always available and redundant, and aim for an RPO/RTO (recovery point / recovery time objective) of zero.
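The hot vs. cold and raw vs. enhanced distinction can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the `Record` type, the `choose_tier` function, the tier names, and the seven-day threshold do not come from the slides.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Assumed threshold: data touched within the last 7 days counts as "hot".
HOT_WINDOW = timedelta(days=7)


@dataclass
class Record:
    key: str
    last_accessed: datetime
    enhanced: bool  # True once raw data has been cleaned or enriched


def choose_tier(record: Record, now: datetime) -> str:
    """Return a hypothetical storage tier name for a record."""
    age = now - record.last_accessed
    temperature = "hot" if age <= HOT_WINDOW else "cold"
    stage = "enhanced" if record.enhanced else "raw"
    return f"{temperature}-{stage}"


now = datetime(2022, 1, 10)
recent = Record("orders-2022-01", datetime(2022, 1, 9), enhanced=True)
archive = Record("orders-2021-11", datetime(2021, 12, 1), enhanced=False)
print(choose_tier(recent, now))
print(choose_tier(archive, now))
```

In practice each tier name would map to a concrete store (for example, fast replicated storage for hot data and cheaper object storage for cold data), chosen to fit the cost bounds named above.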
Platform Challenge
Data Platform Operations
Optimized Core enabled Business Modularity
This process needs to be done in sequence. Otherwise we end up having to redo the work.
Business Silos
Standardized Platform
Optimized Core
Business Modularity
Phases of Business Modularity
How?
Project Information
Client Service Information
Corporate Guides
Collaborative Documents
Assets & Files
Corporate Assets
Business Platform
● Curate a framework of systems.
● Work with a vetted team of experts.
● Connect it all together.
● Focus on finding, analyzing, and acting on knowledge & communication towards business success.
Platform Playbook
Platform
Contexts
Responsibilities
Approach
Framework
Tools
Approach
Setup
Training
Administration
Configuration
Knowledge
Framework
Distributed
Realtime
Extendable
Automated
Monitored
Streamline. Organize. Unify.
Technology & Management Challenge
Current & Future State
Current Tools & Issues
● RMQ, Redis, Mongo not scaling.
● C#, Java, Node, etc. can’t do Big Data alone.
● Data Replication / Resiliency is difficult.
● DevOps / DataOps?
Future Goals
● Scalable & Resilient Message Delivery
● Fault Tolerant Data Processing
● Real Time Data Storage & Retrieval
● Automatic Deployment & Upgrades
● Predictable, Scalable Growth for Platform
● Customer Satisfaction with Data Quality & Freshness
Example: Cloud neutral global data & analytics platform.
Technologies in evaluation.
Old Data & Analytics: ETL + Batch + Waiting
Much of the current thinking assumes that the state of the systems
in an enterprise is synchronous and that analysis must be done
sequentially and iteratively, from beginning to end, in batch.
New Data & Analytics: Events + Current Data
The newer thinking is that the state of the systems in an
enterprise is dynamically asynchronous: there is no fixed
“state”, only a stream of events.
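One way to read "everything is a stream of events" is that current data is derived by folding over the events rather than stored as a mutable record. The sketch below assumes a hypothetical event shape (`type`, `account`, `amount`) and a hypothetical `current_balances` function. It is an illustration, not the platform's actual model.

```python
from collections import defaultdict


def current_balances(events):
    """Derive current state by replaying an ordered stream of events."""
    balances = defaultdict(int)
    for event in events:
        if event["type"] == "deposited":
            balances[event["account"]] += event["amount"]
        elif event["type"] == "withdrawn":
            balances[event["account"]] -= event["amount"]
        # Unknown event types are ignored in this sketch.
    return dict(balances)


# Hypothetical events, listed in arrival order.
events = [
    {"type": "deposited", "account": "a", "amount": 100},
    {"type": "deposited", "account": "b", "amount": 20},
    {"type": "withdrawn", "account": "a", "amount": 30},
]
print(current_balances(events))
```

Because the state is computed from the events, replaying the same stream always yields the same result, and a new view can be built later from the same history.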
New Data & Analytics : Streams + Queues + Bus
Streams, Queues, Bus: These technologies have been
around for a long time. What’s different today is that
customer demand for realtime is forcing their adoption across the board.
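The difference between a queue and a stream can be sketched with two simplified in-memory models. The `WorkQueue` and `EventStream` classes are hypothetical and are not the APIs of Kafka, RabbitMQ, or any other product. They only illustrate that a queue hands each message to one consumer, while a stream keeps a log that every consumer reads at its own offset.

```python
from collections import deque


class WorkQueue:
    """Each message is delivered once, then removed."""

    def __init__(self):
        self._items = deque()

    def publish(self, message):
        self._items.append(message)

    def consume(self):
        # Returns None when the queue is empty.
        return self._items.popleft() if self._items else None


class EventStream:
    """Messages stay in an append-only log; consumers track offsets."""

    def __init__(self):
        self._log = []
        self._offsets = {}

    def publish(self, message):
        self._log.append(message)

    def read(self, consumer):
        # Return everything this consumer has not yet seen.
        offset = self._offsets.get(consumer, 0)
        batch = self._log[offset:]
        self._offsets[consumer] = len(self._log)
        return batch


queue = WorkQueue()
stream = EventStream()
for message in ("order-created", "order-paid"):
    queue.publish(message)
    stream.publish(message)
```

With this setup, two calls to `queue.consume()` drain the queue, while `stream.read("billing")` and `stream.read("analytics")` each receive both messages independently.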
Realtime Components
Cassandra + Spark + Kafka : Use Cases
Image: http://manishsingh.net/post/lambda-architecture-with-kafka-spark-and-cassandra
Cassandra + Spark + Kafka : Use Cases
Image: https://mesosphere.com/blog/kafka-dcos-tutorial/
1. Lambda Architecture: Balances stream and batch processing for reliability.
2. Machine Learning: Delivering predictive and descriptive analytics in real time.
3. Master Data Management: Ensure that all the data is consistent all the time in all the systems.
4. Realtime Customer Experience: Customers are always informed, recommendations are made, etc.
5. Realtime Information Systems: Team members are always informed, etc.
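The Lambda Architecture item in the list above can be sketched as a batch view and a speed view that are merged at query time. This is a minimal in-memory illustration. The function names and the page-view events are assumptions, and in a stack like the one named here the batch view might be computed by Spark, the speed view fed by Kafka, and both served from Cassandra.

```python
from collections import Counter


def batch_view(master_events):
    """Slow, complete recomputation over all events up to the last batch run."""
    return Counter(event["page"] for event in master_events)


def speed_view(recent_events):
    """Fast, incremental counts for events that arrived after the last batch run."""
    return Counter(event["page"] for event in recent_events)


def query(page, batch, speed):
    """Serve a result by combining both views."""
    return batch[page] + speed[page]


# Hypothetical data: three events already batched, one still in flight.
master = [{"page": "home"}, {"page": "home"}, {"page": "pricing"}]
recent = [{"page": "home"}]

batch = batch_view(master)
speed = speed_view(recent)
print(query("home", batch, speed))
```

When the next batch run absorbs the recent events, the speed view is reset, so the batch layer corrects any error that the faster path introduced.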
Framework
Framework Components
● Major Components
○ Persistent Queues (RAM/BUS)
○ Queue Processing & Compute (CPU)
○ Persistent Storage (DISK/RAM)
○ Reporting Engine (Display)
○ Orchestration Framework (Motherboard)
○ Scheduler (Operating System)
● Strategies
○ Cloud Native on Google
○ Self-Managed Open Source
○ Self-Managed Commercial Source
○ Managed Commercial Source
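The computer analogy in the component list above can be made concrete with a small sketch that wires the parts together. The `Platform` class, its method names, and the doubling step in `process` are placeholders invented for illustration. A real platform would put a message broker, a compute engine, and a database behind the same roles.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Platform:
    queue: deque = field(default_factory=deque)  # persistent queue (RAM/BUS)
    storage: dict = field(default_factory=dict)  # persistent storage (DISK/RAM)

    def process(self, message):
        # Queue processing & compute (CPU); doubling is a placeholder transform.
        return message["key"], message["value"] * 2

    def run_once(self):
        # Orchestration (motherboard): move data from queue to compute to storage.
        handled = 0
        while self.queue:
            key, value = self.process(self.queue.popleft())
            self.storage[key] = value
            handled += 1
        return handled

    def report(self):
        # Reporting engine (display): a sorted snapshot of stored results.
        return sorted(self.storage.items())


platform = Platform()
platform.queue.append({"key": "a", "value": 1})
platform.queue.append({"key": "b", "value": 2})
handled = platform.run_once()  # a scheduler (operating system) would decide when to call this
print(handled, platform.report())
```

Keeping each role behind its own method is what lets one component be replaced (for example, a managed service instead of a self-managed one) without rebuilding the rest.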
Framework
Approach
Approach
Context
Fast Data
General Awareness
Databases
C* Databases
Profession
Infrastructure
Sessions
Framework - Core Concepts & Technology
Engineering
Data Pipelines
Infrastructure
Architecture
Planning
Operations
Containers
Architecture
Planning
Operations
Orchestration
Software
Architecture
Planning
Operations
Orchestration
Automation
DevOps
DataOps
Topics
Data Pipeline
Data Engineering Tools
Apache Spark*
Apache Kafka*
Kubernetes/Docker/Helm
Terraform/Ansible
GitOps for Dev/DataOps
Airflow
Argo/Kubeflow
Sessions
● Presentation
○ Overview of how this fits into the larger
picture
○ Concept of the topics
○ Any tour of the technology
● Discussion
○ Session - Two Way Q&A
○ Online - Slack
○ Offline - Slack / Email
● Assignment
○ Hands-on
○ Self-paced work to put into practice
○ Create portfolio items
○ Try out new technology
Presentation
Slides
● Overview
● Concepts
● Tech Tour
Discussion
Notes
● Q&A
● Online
● Offline
Assignment
Git Repo
● Design
● Engineering
● Automation

Data Engineer's Lunch #60: Series - Developing Enterprise Consciousness

Editor's Notes

  • #6 Challenge: Large organizations and small businesses have the same problem. Large organizations need to integrate data before they can extract value, whether through business intelligence, analytics, or machine learning. Small organizations use online platforms to run their operations, and so are likely to use software that gives them less control of their data.
  • #8 Solution: Enterprise consciousness. An eventually consistent enterprise where all data is synchronized between applications and back to a central translytical database.
  • #10 Challenge: Currently the components are broken up into different vendors and parts. This is similar to building a computer from scratch for every client.
  • #26 Challenge: Currently the components are broken up into different vendors and parts. This is similar to building a computer from scratch for every client.