A presentation given by Raghavan Sadagopan, Sr. Director at Capital One, and Lakshmi Narayana, Sr. Lead Software Engineer at Capital One, at our 2024 Austin API Summit, March 12-13.
Session Description: Managing risk is critical to the success of an organization. It starts with identifying potential risks, which in the digital world are signals emanating from various source systems. Identifying potential risks in real time enables organizations to mitigate, or better prepare for, potential exposures. The session will share our point of view on implementing an API-centric event mesh architecture that routes events in real time through a scalable and resilient cloud-native service on AWS.
Getting Better at Risk Management Using Event Driven Mesh Architecture - Raghavan Sadagopan & Lakshmi Narayana, CapitalOne
1. Public
Getting better at Risk Management through Event Driven Mesh Architecture
Raghavan Sadagopan & Lakshmi Narayana
2. We recognized early on that the winners in banking will be great technology companies
Shifted to 100% agile model for delivering software
Began externalizing solutions developed internally
Completed our move to the public cloud
Modernized our data ecosystem on the cloud
Began to modernize our architecture with RESTful APIs
Became Open Source first
Declared we are all in on the public cloud
Timeline years: 2013, 2014, 2015, 2016, 2017, 2020, 2021
3. We Are
● Tech is central to everything we do at Capital One: for nearly a decade, we have invested and invented like the very best technology companies
● Our world-class, in-house technology team now numbers 12,000+ people, most of whom are engineers
● We're all in on the cloud like no other bank out there, which enables us to create exceptional, innovative experiences for our customers
● Advanced, automated DevOps and CI/CD approaches mean engineers spend more time at the top of the tech stack building new and unique digital banking experiences
4. Cloud Computing
In 2020, we left our data centers to create exceptional banking experiences for our customers, becoming the first bank to go all-in on the public cloud
○ Going all in on the cloud, and embracing cloud-native services like serverless computing, has enabled instant provisioning of infrastructure and rapid innovation
○ Today, we're using real-time, streaming data at scale, machine learning, and the power of the cloud to solve unique, challenging technology problems and deliver intelligent, personalized solutions that benefit millions of customers
○ Capital One Shopping, built in the cloud with microservices architecture and streaming data, helps customers save money shopping online by automatically finding lower prices, coupons and online credits
5. Open Source Software
Capital One made an “open source first” declaration in 2014, and that’s when we made our first contributions to the open source community.
○ We sponsor FINOS, Python, Continuous Delivery and the Cloud Native Computing Foundations to help keep open source sustainable
○ Capital One’s contributions to the open source community have been significant, and we've released more than 40 of our own software projects
○ We’ve invested for years to build the culture and governance required to be open source-first in a highly regulated industry
Featured Open Source Projects: Data Profiler, Rubicon-ML and Hygieia
6. Want to learn more?
● Want to learn more about our Tech? Check out Capital One Tech to find out more about enterprise software solutions, ideas and stories.
● At Capital One, we celebrate and honor the differences that make us all unique, inside and outside of work. Help us create a more equitable future for all! Join us! Visit Capital One Careers to view our open roles.
● Follow us on Twitter at CapitalOneTech
7. Agenda
• Basics of Risk Management
• Event Mesh
• Risk Management use cases
• Point-of-view Architecture
8. What is Risk Management?
Risk management is the process of identifying, assessing and controlling threats to an
organization's capital, earnings and operations. These risks stem from a variety of sources,
including financial uncertainties, legal liabilities, technology issues, strategic management errors,
accidents and natural disasters.
A successful risk management program helps an organization consider the full range of risks it
faces. Risk management also examines the relationship between different types of business risks
and the cascading impact they could have on an organization's strategic goals.
Source: Blog from TechTarget.com
9. Some basics of Risk Management as defined by the International Organization for Standardization (ISO)
Objective: result to be achieved
Organization: person or group of people that has its own functions with responsibilities, authorities and relationships to achieve its objectives
Risk: effect of uncertainty on objectives
Risk Management: coordinated activities to direct and control an organization with regard to risk
Source: Risk management - Vocabulary (ISO 31073:2022)
10. The first step to get better at Risk Management is to get better at identifying the risks
Identify sources of the risk, areas of impact, events (including changes in circumstances) and their causes and potential consequences.
In identifying the risk, consider these kinds of questions:
• What could happen?
• How could it happen?
• Where could it happen?
• Why might it happen?
• What might be the impact?
11. What is an Event-Driven Mesh Architecture (EDMA)?
An event mesh is a communication layer that enables the seamless exchange of events (data/messages) between various applications, services, and systems in a distributed and decoupled manner. The primary purpose of an event mesh is to simplify and optimize event-driven communication within a complex ecosystem of applications.
12. Using an event mesh for risk management can enhance the real-time monitoring, analysis, and response to potential risks within an organization
Event Sources
Identify various sources of events within your organization that may be related to risks. These sources can include financial data, security logs, market data, IoT devices, customer interactions, and more.
Event Routing and Processing
Implement an event mesh to efficiently route events from different sources to the appropriate risk management applications and services. The event mesh can handle the complex routing logic based on predefined rules.
Real-time Monitoring
Use the event mesh to provide real-time monitoring capabilities. Events related to potential risks can be continuously streamed to risk monitoring dashboards and analytics systems.
Risk Detection
Implement risk detection algorithms and models that analyze incoming events in real time. These algorithms can identify patterns, anomalies, and indicators of potential risks. The event mesh ensures that relevant events are delivered promptly to the detection systems.
Alerting and Notification
Set up alerting mechanisms within your event mesh to notify risk managers or relevant stakeholders when a potential risk is detected. Alerts can be sent via various channels, such as email, SMS, or integration with collaboration tools.
Compliance and Reporting
Use the event mesh to facilitate compliance monitoring and reporting. It can capture and log events relevant to compliance requirements, making it easier to demonstrate adherence to regulations.
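As one illustration of the Risk Detection and Alerting and Notification steps above, here is a minimal sketch of a detector that flags an event whose value deviates sharply from recent history and hands the alert to a notifier. The rolling z-score rule, the window size, the threshold, and the names `RollingZScoreDetector`, `run`, and `notify` are hypothetical choices for illustration; the presentation does not say which detection models are used.

```python
from collections import deque
from dataclasses import dataclass
from statistics import mean, pstdev
from typing import Callable, Deque, Iterable, List, Optional, Tuple


@dataclass
class Alert:
    event_id: str
    value: float
    z_score: float


class RollingZScoreDetector:
    """Flags a value that lies far from the mean of a sliding window of recent values."""

    def __init__(self, window: int = 20, threshold: float = 3.0, min_history: int = 5) -> None:
        self.history: Deque[float] = deque(maxlen=window)  # hypothetical window size
        self.threshold = threshold                          # hypothetical z-score cutoff
        self.min_history = min_history                      # no scoring until a baseline exists

    def observe(self, event_id: str, value: float) -> Optional[Alert]:
        alert = None
        if len(self.history) >= self.min_history:
            mu = mean(self.history)
            sigma = pstdev(self.history)
            if sigma > 0:
                z = (value - mu) / sigma
                if abs(z) >= self.threshold:
                    alert = Alert(event_id, value, z)
        # Append after scoring so an outlier does not dilute its own score.
        self.history.append(value)
        return alert


def run(events: Iterable[Tuple[str, float]], detector: RollingZScoreDetector,
        notify: Callable[[Alert], None]) -> None:
    """Feed (event_id, value) pairs through the detector and notify on each alert."""
    for event_id, value in events:
        alert = detector.observe(event_id, value)
        if alert is not None:
            notify(alert)  # stand-in for email, SMS, or chat integrations


alerts: List[Alert] = []
amounts = [100, 102, 98, 101, 99, 100, 103, 5000]
run(((f"evt-{i}", v) for i, v in enumerate(amounts)), RollingZScoreDetector(), alerts.append)
print([a.event_id for a in alerts])  # ['evt-7']
```

A real deployment would swap the z-score rule for whatever models the risk team trusts; the point is that detection is just another consumer on the mesh.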
16. Building Scalable & Highly Resilient Architecture
• Serverless Technologies
• Multi-Availability Zone (AZ) deployment
• Multi-Region Active/Active mode
• Enhanced Monitoring
17. Benefits of Event-Driven Architecture
• Scalability
• Fault Tolerance
• Loose Coupling
• Modularity
• Responsiveness
• Reduced Cost
18. Best Practices of Event-Driven Architecture
• Security Measures
• Event Contracts
• Event Sourcing
• Observability
• Documentation
• Governance Process
https://www.iso.org/obp/ui/en/#iso:std:iso:31073:ed-1:v1:en
3.1.2
objective
result to be achieved
Note 1 to entry: An objective can be strategic, tactical or operational.
Note 2 to entry: Objectives can relate to different disciplines (such as financial, health and safety, and environmental goals) and can apply at different levels (such as strategic, organization-wide, project, product and process).
Note 3 to entry: An objective can be expressed in other ways, e.g. as an intended outcome, a purpose, an operational criterion, as a management system objective, or by the use of other words with similar meaning (e.g. aim, goal, target).
3.1.1
risk
effect of uncertainty (3.1.3) on objectives (3.1.2)
Note 1 to entry: An effect is a deviation from the expected. It can be positive, negative or both, and can address, create or result in opportunities (3.3.23) and threats (3.3.13).
Note 2 to entry: Objectives can have different aspects and categories, and can be applied at different levels.
Note 3 to entry: Risk is usually expressed in terms of risk sources (3.3.10), potential events (3.3.11), their consequences (3.3.18) and their likelihood (3.3.16).
3.2.1
risk management
coordinated activities to direct and control an organization (3.3.7) with regard to risk (3.1.1)
Thanks Raghavan! As we saw in previous slides, Capital One is leveraging AWS to build modern applications.
An event-driven architecture typically consists of event producers, event routers, and event consumers.
A producer publishes an event to the router, which filters and pushes the events to consumers. Producer services and consumer services are decoupled, which allows them to be scaled, updated, and deployed independently.
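To make the producer, router, and consumer roles concrete, here is a minimal in-memory sketch. `EventRouter`, its `subscribe`/`publish` methods, and the event fields are hypothetical names for illustration and are not the Amazon EventBridge API; in the architecture described here, EventBridge plays the router role and delivers to targets asynchronously, whereas this sketch calls consumers synchronously.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Event:
    detail_type: str
    detail: dict = field(default_factory=dict)


Handler = Callable[[Event], None]


class EventRouter:
    """In-memory stand-in for the router: producers publish, consumers subscribe by type."""

    def __init__(self) -> None:
        self._subscribers: Dict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, detail_type: str, handler: Handler) -> None:
        self._subscribers[detail_type].append(handler)

    def publish(self, event: Event) -> int:
        """Deliver the event to every matching consumer; return how many received it."""
        handlers = self._subscribers.get(event.detail_type, [])
        for handler in handlers:
            handler(event)  # synchronous here; a managed router delivers asynchronously
        return len(handlers)


router = EventRouter()
received: List[tuple] = []
router.subscribe("RiskIdentified", lambda e: received.append(("dashboard", e.detail["riskId"])))
router.subscribe("RiskIdentified", lambda e: received.append(("notifier", e.detail["riskId"])))

# The producer only knows the router, not the consumers.
delivered = router.publish(Event("RiskIdentified", {"riskId": "R-1"}))
print(delivered, received)  # 2 [('dashboard', 'R-1'), ('notifier', 'R-1')]
```

Because the producer never references a consumer, adding or removing a consumer is a single `subscribe` change, which is the decoupling the paragraph above describes.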
Implementing event-driven architecture using serverless technologies:
Risk Management is a very complex system, with several different types of users, system integrations, notifications, and complex approval flows in each component.
Here is one way to apply a modern serverless, event-driven model using Amazon EventBridge as the router:
Event producers can be Risk Officers, whose job is identifying and assessing risk. When a risk is identified in the complex risk management ecosystem, we want target systems to be notified under certain conditions so that those systems act in a decoupled, reactive way.
Rather than consumers polling for data in the traditional way, the system now reacts to events, and EventBridge rules remove a lot of boilerplate code and make it easy to configure and scale the targets.
In the future, if we want to add a new system that listens for high-risk events, we can add the integration quickly with a low-code solution.
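EventBridge rules select events with JSON event patterns: each field in a pattern lists the values it accepts, and nested objects match nested fields of the event. The sketch below shows a hypothetical pattern for high-risk events plus a simplified matcher that mimics only exact-value matching and nested objects. The `source` value `com.example.risk`, the `RiskIdentified` detail type, and the `riskLevel` field are assumptions for illustration, and real EventBridge patterns also support content filters (prefix, numeric ranges, anything-but, exists) that this sketch omits.

```python
import json
from typing import Any, Dict

# Hypothetical rule pattern: route only HIGH-severity risk events.
HIGH_RISK_PATTERN: Dict[str, Any] = {
    "source": ["com.example.risk"],
    "detail-type": ["RiskIdentified"],
    "detail": {"riskLevel": ["HIGH"]},
}


def matches(pattern: Dict[str, Any], event: Dict[str, Any]) -> bool:
    """Simplified EventBridge-style matching: exact values and nested objects only."""
    for key, expected in pattern.items():
        if key not in event:
            return False  # a field named in the pattern must be present in the event
        actual = event[key]
        if isinstance(expected, dict):
            if not isinstance(actual, dict) or not matches(expected, actual):
                return False
        else:
            # A pattern array means "any of these values"; if the event field is
            # itself an array, one matching element is enough.
            candidates = actual if isinstance(actual, list) else [actual]
            if not any(value in expected for value in candidates):
                return False
    return True


event = {
    "source": "com.example.risk",
    "detail-type": "RiskIdentified",
    "detail": {"riskId": "R-42", "riskLevel": "HIGH"},
}
low = {**event, "detail": {"riskId": "R-43", "riskLevel": "LOW"}}
print(matches(HIGH_RISK_PATTERN, event), matches(HIGH_RISK_PATTERN, low))  # True False
print(json.dumps(HIGH_RISK_PATTERN))  # the JSON pattern you would attach to a rule
```

In this model, onboarding a new system that cares about high-risk events means adding a target to a rule with this pattern, with no change to the producers.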
Serverless services like AWS Fargate, AWS Lambda, and Amazon EventBridge scale inherently and handle failures gracefully. They are cost-efficient, require zero server management, and provide high availability and high performance. EventBridge global endpoints help with resiliency because they maintain event replication and provide a replay mechanism for our consumers.
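Because events can be replayed, and delivery can occasionally produce duplicates, consumers should be idempotent. EventBridge events carry a unique `id` field in their envelope; the sketch below skips any event whose `id` it has already handled. The class name `IdempotentConsumer` and the in-memory `seen` set are assumptions for illustration; a production consumer running on several instances would need a durable, shared record of processed ids.

```python
from typing import Callable, List, Set


class IdempotentConsumer:
    """Processes each event id at most once, so replays do not repeat side effects."""

    def __init__(self, handle: Callable[[dict], None]) -> None:
        self._handle = handle
        self._seen: Set[str] = set()  # hypothetical; use a durable shared store in practice

    def on_event(self, event: dict) -> bool:
        event_id = event["id"]
        if event_id in self._seen:
            return False  # duplicate or replayed event: skip side effects
        self._handle(event)
        # Record the id only after handling succeeds, so a failed attempt can be retried.
        self._seen.add(event_id)
        return True


processed: List[str] = []
consumer = IdempotentConsumer(lambda e: processed.append(e["detail"]["riskId"]))
replayed = {"id": "a1", "detail": {"riskId": "R-7"}}
print(consumer.on_event(replayed), consumer.on_event(replayed), processed)  # True False ['R-7']
```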
Next is Multi-AZ deployment: in the event of an AZ1 failure, our application will still be operating. If the database writer instance fails, AWS automatically promotes an available reader instance to be the new writer.
Multi-Region deployment in Active/Active mode: we have deployed our infrastructure in two Regions, and the compute layer takes traffic in both Regions, making it highly available.
Enhanced Monitoring:
The success of an application relies heavily on effective application monitoring. We use:
Amazon CloudWatch for application logs and Container Insights
AWS X-Ray for distributed tracing across underlying services
Some external monitoring solutions
Given the critical nature of the platform, we conducted thorough, regular performance testing and adapted our infrastructure configurations to meet the specific requirements of our tenants.
Scalability: EDA allows for horizontal scaling, as components can be scaled independently based on their event processing needs. This makes it easier to manage resource allocation and handle varying loads.
Loose Coupling: Components are independent, which reduces interdependence and improves system flexibility. Loose coupling makes it easier to change, update, and maintain individual components without impacting others, and it simplifies communication between them.
Modularity: The modular nature of EDA allows teams to develop, deploy, and update components independently, leading to faster iteration and innovation.
Resilience and Fault Tolerance: The decoupled nature of EDA means that the failure of one component does not directly impact others. This isolation improves the overall system's resilience and fault tolerance.
Real-time Responsiveness: EDA enables systems to respond immediately to events as they occur, making it ideal for applications that require real-time processing, such as fraud detection, IoT systems, and user interaction scenarios. It also enables asynchronous processing.
Cost Reduction: efficient resource utilization, reduced development and maintenance costs, reduced integration costs, and automated scaling.
Define Clear Event Contracts: Establish well-defined contracts for events, including their structure, format, and metadata. This ensures consistency and interoperability across different components and services.
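A minimal sketch of what an explicit event contract can look like in code, assuming a hypothetical `RiskIdentified` event with a `schema_version`, a constrained `risk_level`, and two required string fields. Consumers parse through `from_json`, so malformed or out-of-contract events fail loudly at the boundary instead of deep inside business logic. The field names and version scheme are illustrative, not taken from the presentation.

```python
import json
from dataclasses import asdict, dataclass
from enum import Enum


class RiskLevel(str, Enum):
    LOW = "LOW"
    MEDIUM = "MEDIUM"
    HIGH = "HIGH"


@dataclass(frozen=True)
class RiskIdentified:
    """Hypothetical contract for a 'RiskIdentified' event, schema version 1."""

    schema_version: int
    risk_id: str
    risk_level: RiskLevel
    identified_by: str

    @classmethod
    def from_json(cls, raw: str) -> "RiskIdentified":
        data = json.loads(raw)
        if data.get("schema_version") != 1:
            raise ValueError(f"unsupported schema_version: {data.get('schema_version')!r}")
        missing = {"risk_id", "risk_level", "identified_by"} - set(data)
        if missing:
            raise ValueError(f"missing fields: {sorted(missing)}")
        # RiskLevel(...) raises ValueError for any level outside the contract.
        return cls(1, str(data["risk_id"]), RiskLevel(data["risk_level"]), str(data["identified_by"]))

    def to_json(self) -> str:
        payload = asdict(self)
        payload["risk_level"] = self.risk_level.value  # serialize the enum as its string value
        return json.dumps(payload, sort_keys=True)


evt = RiskIdentified.from_json(
    '{"schema_version": 1, "risk_id": "R-1", "risk_level": "HIGH", "identified_by": "risk-officer"}'
)
print(evt.risk_level is RiskLevel.HIGH, RiskIdentified.from_json(evt.to_json()) == evt)  # True True
```

Versioning the contract explicitly lets a producer introduce version 2 while consumers that only understand version 1 reject it clearly rather than misreading it.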
Event Sourcing: Store events as a source of truth for system state.
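To illustrate treating stored events as the source of truth, here is a minimal sketch in which the current status of each risk is never stored directly; it is rebuilt by replaying an ordered event log. The event kinds (`Identified`, `Escalated`, `Mitigated`), the status names, and the `StoredEvent`/`apply`/`rebuild` names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass(frozen=True)
class StoredEvent:
    sequence: int  # position in the append-only log
    risk_id: str
    kind: str      # hypothetical kinds: "Identified", "Escalated", "Mitigated"


TRANSITIONS = {"Identified": "OPEN", "Escalated": "ESCALATED", "Mitigated": "MITIGATED"}


def apply(state: Dict[str, str], event: StoredEvent) -> Dict[str, str]:
    """Return a new state reflecting one event; the log itself is never modified."""
    if event.kind not in TRANSITIONS:
        raise ValueError(f"unknown event kind: {event.kind}")
    new_state = dict(state)
    new_state[event.risk_id] = TRANSITIONS[event.kind]
    return new_state


def rebuild(log: Iterable[StoredEvent]) -> Dict[str, str]:
    """Derive the current state by replaying the log in sequence order."""
    state: Dict[str, str] = {}
    for event in sorted(log, key=lambda e: e.sequence):
        state = apply(state, event)
    return state


log = [
    StoredEvent(1, "R-1", "Identified"),
    StoredEvent(2, "R-2", "Identified"),
    StoredEvent(3, "R-1", "Escalated"),
    StoredEvent(4, "R-1", "Mitigated"),
]
print(rebuild(log))      # {'R-1': 'MITIGATED', 'R-2': 'OPEN'}
print(rebuild(log[:3]))  # {'R-1': 'ESCALATED', 'R-2': 'OPEN'}
```

Replaying a prefix of the log gives a point-in-time view, which is useful for the audit and compliance reporting mentioned earlier.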
Monitoring and Logging: Implement robust monitoring and logging for effective debugging.
Security Measures: Use encryption and authentication mechanisms to secure event communication.
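One way to add message-level authentication on top of transport encryption is to sign each event payload with an HMAC, so a consumer can check that the payload was not altered and was produced by a holder of the shared key. This is a minimal sketch under assumptions: the shared-key scheme, the `payload`/`signature` envelope, and the hard-coded demo key are illustrative only (a real key would come from a secrets store), and the sketch authenticates the payload but does not encrypt it.

```python
import hashlib
import hmac
import json


def canonical(payload: dict) -> bytes:
    # Sorted keys and fixed separators so producer and consumer hash identical bytes.
    return json.dumps(payload, sort_keys=True, separators=(",", ":")).encode("utf-8")


def sign(payload: dict, key: bytes) -> dict:
    signature = hmac.new(key, canonical(payload), hashlib.sha256).hexdigest()
    return {"payload": payload, "signature": signature}


def verify(envelope: dict, key: bytes) -> bool:
    expected = hmac.new(key, canonical(envelope["payload"]), hashlib.sha256).hexdigest()
    # compare_digest avoids leaking timing information about the signature.
    return hmac.compare_digest(expected, envelope["signature"])


key = b"demo-key-not-for-production"  # hypothetical; load from a secrets store in practice
envelope = sign({"riskId": "R-9", "riskLevel": "HIGH"}, key)
tampered = {"payload": {**envelope["payload"], "riskLevel": "LOW"}, "signature": envelope["signature"]}
print(verify(envelope, key), verify(tampered, key))  # True False
```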
Documentation: Maintain comprehensive documentation for understanding event flows and system architecture.