Decision platform strategies
1. June 2013
REAL-TIME DECISION STRATEGIES
Risk & Compliance Engineering, PayPal
Philip Wright
This deck contains generic architecture information, and does not
reflect the exact details of current or planned systems.
3. Confidential and Proprietary3
PROBLEM STATEMENT
In a connected and fast-changing online world, businesses are seeking new strategies to
deliver the best customer experiences possible. A key component of these strategies is
improved decision making. An optimal business decision-making strategy might be
measured against the following criteria:
• Fast – decisions are delivered quickly
• Accurate – correct decisions are generated
• Available – decisions are available when needed
Problem: It is challenging and costly to develop a single solution that meets all
business goals.
Goal: Explore major trade-offs that are typically considered when designing
architectures for real-time decisions.
ANALYTICS
Data is the input to the decision-making process. Optimal decisions are obtained by
having the right data for the problem domain. Data is often derived from domain-specific
sources such as business transactions or processes.
Determining what data should be used in a decision is often the result of statistical
research and analytical processes that narrow down the key attributes in a data set to
those that are most predictive.
A decision system may process the data into attributes or variables optimized for
decision making by aggregating, grouping or summarizing using statistical methods or
mathematical formulae.
Data processing may be performed offline or in real-time as needed.
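As a rough illustration of turning raw data into decision variables, the sketch below groups transactions by account and summarizes them. The transaction layout, the account identifiers and the variable names (`txn_count`, `avg_amount` and so on) are assumptions made for this example and do not come from the deck.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical raw transactions as (account_id, amount) pairs.
transactions = [
    ("acct-1", 20.0),
    ("acct-1", 35.0),
    ("acct-2", 500.0),
    ("acct-1", 5.0),
]


def build_variables(rows):
    """Group raw rows by account and summarize them into decision variables."""
    amounts = defaultdict(list)
    for account_id, amount in rows:
        amounts[account_id].append(amount)
    return {
        account_id: {
            "txn_count": len(values),
            "total_amount": sum(values),
            "avg_amount": mean(values),
            "max_amount": max(values),
        }
        for account_id, values in amounts.items()
    }


variables = build_variables(transactions)
```

The same function could run in an offline batch job or on demand at request time; only the freshness of its input differs.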
DATA SET SIZE
Better quality decisions can be generated from larger, richer data sets. Finding new,
more predictive variables is critical to many business strategies, and that process often
starts by collecting more data. Data sets in some business domains can easily exceed
a petabyte in size.
THE NEED FOR SPEED
Business advantage can be achieved through faster decisions. Making decisions
quickly sometimes requires pre-processing a decision in batch before it is needed. This
is how many systems are able to deliver very fast results.
ACCURACY
Pre-calculated results can only factor in the data that was available
when the pre-processing was performed. Decisions using this data could be made
minutes, hours or days later.
If there is any significant change in the data after pre-processing, a decision can be
made on out-of-date or stale data and could result in a poor business outcome or bad
customer experience.
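One way to guard against stale pre-calculated results is to record when each result was computed and compare that against the time of the latest data change. The sketch below is a minimal illustration of that idea; the `CachedDecision` type, the `is_stale` helper and the maximum-age rule are assumptions for this example, not part of the deck.

```python
from dataclasses import dataclass


@dataclass
class CachedDecision:
    outcome: str        # hypothetical outcome label, e.g. "approve"
    computed_at: float  # time of the batch run, in seconds since the epoch


def is_stale(entry, last_data_change, now, max_age_seconds):
    """Return True if the data changed after the batch run or the entry is too old."""
    changed_since = last_data_change > entry.computed_at
    too_old = (now - entry.computed_at) > max_age_seconds
    return changed_since or too_old


entry = CachedDecision(outcome="approve", computed_at=1_000.0)
fresh = is_stale(entry, last_data_change=900.0, now=1_100.0, max_age_seconds=3_600.0)
stale = is_stale(entry, last_data_change=1_050.0, now=1_100.0, max_age_seconds=3_600.0)
```

A caller could serve fresh entries directly and recompute, or fall back to a slower path, only when `is_stale` returns True.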
REAL-TIME PROCESSING
Processing data in real-time can significantly improve the accuracy of decisions, since
the system can factor in the most recent changes in data. Systems such as those used
by many financial institutions, banks and the stock markets rely heavily on real-time
data.
REAL-TIME PROCESSING
Processing large amounts of data in real-time to generate a decision can be costly in
terms of hardware infrastructure needed to execute hundreds or even thousands of
database queries. These systems typically do not scale linearly and can add significant
complexity and latency to the decision system.
SPEED VS ACCURACY
It is challenging to optimize for both speed and accuracy. When a system is optimized
for speed, processing time needs to be kept to a minimum. Reducing the input data to
the decision process is one common way of minimizing processing time, but this has
the potential of generating less accurate decisions than could otherwise be delivered.
When a system is optimized for accuracy, it usually requires more data and more complex
processing, which takes more time to generate the result. A trade-off must be found that
balances speed and accuracy based on the use case.
AVAILABILITY
The constant quest for richer data typically requires additional hardware infrastructure
and introduces new dependencies to the system, which can reduce the overall
availability of the decision solution. This sometimes defeats the benefit of adding the new
data source, since a more accurate decision isn’t useful if it can’t be generated when
needed.
EXAMPLE
We can combine best-in-class strategies to deliver a fast, accurate and reliable
decision infrastructure that can support diverse business solutions.
Consider a theoretical business decision system with the following
requirements:
Requirement              Goal         Priority
Average Response Time    <= 100ms     1
Decision Accuracy        >= 99%       1
Decision Availability    >= 99.99%    2
Here greater business priority has been placed on response time and
accuracy.
TRADE-OFFS
An optimal decision strategy may consider these trade-offs:
• Use real-time data for accuracy/correctness
• Optimize the data set size according to strategy
• Reduce system dependencies
• Pre-calculate and cache decisions whenever possible
• Expand and optimize data sets for deep analytics
DECISION OPTIMIZATION
Breaking a decision response down into its components may reveal properties that
allow us to design a system that can be further optimized. For example, consider a
system that generates decisions that fall into three broad segments: ‘Yes’, ‘No’ and ‘Not
sure’. The ‘Yes’ and ‘No’ segments contain decisions with a high degree of certainty
that can be made quickly using mostly pre-calculated data.
The ‘Not sure’ segment contains decisions that are neither a clear ‘Yes’ nor a clear ‘No’.
To minimize the risk of an incorrect decision, we may want to force all
decisions into the ‘Yes’ or ‘No’ segments. To do this, we may need to perform further
processing on the ‘Not sure’ segment.
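The three segments can be pictured as two cut-offs applied to a confidence score. The sketch below is only an illustration: the deck does not say how certainty is measured, so the score range of 0 to 1 and the thresholds `no_below` and `yes_above` are hypothetical.

```python
def segment(score, no_below=0.2, yes_above=0.8):
    """Map a confidence score in [0, 1] to 'Yes', 'No' or 'Not sure'.

    The thresholds are illustrative assumptions, not values from the deck.
    """
    if score >= yes_above:
        return "Yes"
    if score <= no_below:
        return "No"
    return "Not sure"


segments = [segment(s) for s in (0.95, 0.05, 0.5)]
```

Moving the two thresholds closer together shrinks the ‘Not sure’ segment, at the cost of less certainty in the ‘Yes’ and ‘No’ segments.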
Figure: decisions divided into ‘Yes’, ‘Not sure’ and ‘No’ segments. The ‘Not sure’ segment holds the decisions that fall between Yes and No.
LAYERED STRATEGY
One approach to solving this problem is a layered decision strategy.
Such a solution may combine these components:
• A high-performance tier that focuses on highly available, fast decisions. It covers
80% of business requests with limited data requirements and minimal system
dependencies, and is capable of delivering 99% correct decisions in an average
response time of 50ms.
• A deep analytics decision tier that leverages a larger, richer data set to process the
remaining 20% of requests. It is able to deliver 99% correct decisions in an
average response time of 300ms.
When combined, the two tiers should deliver 99% availability and a 100ms average
response time, with a decision accuracy of 99% or greater.
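The combined figures follow from a request-weighted average of the two tiers. The sketch below works through that arithmetic using the deck's numbers (an 80/20 split, 50ms and 300ms response times, 99.99% and 95% availability). It assumes each request is served by exactly one tier; if deep-tier requests first passed through the fast tier, their latency would be higher. The `blend` helper is a name introduced for this example.

```python
def blend(share_fast, fast_value, deep_value):
    """Request-weighted average of a metric across the fast and deep tiers."""
    return share_fast * fast_value + (1.0 - share_fast) * deep_value


# 80% of requests go to the fast tier, 20% to the deep analytics tier.
avg_response_ms = blend(0.80, 50.0, 300.0)  # 0.8 * 50 + 0.2 * 300
availability = blend(0.80, 0.9999, 0.95)    # 0.8 * 0.9999 + 0.2 * 0.95

print(f"average response time: {avg_response_ms:.1f} ms")
print(f"availability: {availability:.2%}")
```

Under these assumptions the average response time comes to about 100ms and the availability to roughly 98.99%, which rounds to the 99% quoted for the combined solution.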
FAST AND HIGHLY AVAILABLE
Continuing the layered concept, it may be feasible to design a system that supports fast
decisions that are 99% accurate for 80% of the requests entering the system. This
solution could leverage cached, pre-calculated data to satisfy the 50ms response time
with 99.99% availability.
Figure: Client; Simple Decision Logic (fast decisions, highly available); Pre-calculated Data (offline); Cached Data.
Average cache response time: 40ms. Availability: 99.99%.
System Characteristics
Availability*     >= 99.99%
Response Time*    <= 50ms
Accuracy          >= 99%
* Average
IMPROVED ACCURACY
To meet the accuracy requirements of our system, we need to include more data and
analytics processing for the 20% of requests not handled by the fast tier. To do this we may
choose to add real-time and external data sources. The system will be more accurate
as a result, but its availability will drop to 95% due to the increased number of
components, and the response time will rise to 300ms due to the
external data access.
Figure: Client; Advanced Decision Logic (rich analytics); Real-time Data; External Data.
Average service response time: 300ms. Service availability: 99.99%.
Average DB response time: 200ms. DB availability: 95%.
Average Response Time: 300ms. Availability: < 95%.
System Characteristics
Availability*     >= 95%
Response Time*    <= 300ms
Accuracy          >= 99%
* Average
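The "< 95%" availability figure is consistent with treating the service and the database as a chain in which both must be up for a request to succeed. The sketch below multiplies the two availabilities from the slide (99.99% and 95%). It assumes the components fail independently, which the deck does not state, and `serial_availability` is a helper name introduced for this example.

```python
from math import prod


def serial_availability(*components):
    """Availability of a chain in which every component must be up.

    Assumes independent failures, so the availabilities multiply.
    """
    return prod(components)


# Service availability 99.99% and DB availability 95%, as quoted on the slide.
tier_availability = serial_availability(0.9999, 0.95)
print(f"{tier_availability:.4%}")
```

Each extra dependency in the chain multiplies in another factor below 1, so the result can never exceed the weakest component.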
LAYERED SOLUTION DESIGN
Figure: layered solution design.
Fast tier: Client; Simple Decision Logic (fast decisions, highly available); Pre-calculated Data (offline); Cached Data. Average Response Time: 50ms. Availability: > 99.99%.
Deep analytics tier: Advanced Decision Logic (rich analytics); Real-time Data; External Data. Average Response Time: 300ms. Availability: > 95%.
Combined: Average Response Time: 100ms. Availability to client: > 99%.
Combined Characteristics
Availability*     >= 99%
Response Time*    <= 100ms
Accuracy          >= 99%
* Average
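A layered design of this kind can be sketched as a router that tries the fast tier first and escalates only when no confident answer is available. Everything in the sketch below is hypothetical: the dictionary standing in for the cache, the request fields, and the placeholder rule inside `deep_tier` are assumptions for illustration and do not describe any actual system.

```python
def fast_tier(request, cache):
    """Look up a pre-calculated 'Yes'/'No'; return 'Not sure' on a cache miss."""
    return cache.get(request["account_id"], "Not sure")


def deep_tier(request):
    """Placeholder for rich analytics over real-time and external data."""
    # Hypothetical rule standing in for advanced decision logic.
    return "Yes" if request["amount"] < 1000 else "No"


def decide(request, cache, deep=deep_tier):
    """Return (decision, tier_used), escalating only undecided requests."""
    decision = fast_tier(request, cache)
    if decision != "Not sure":
        return decision, "fast"
    return deep(request), "deep"


cache = {"acct-1": "Yes", "acct-2": "No"}
known = decide({"account_id": "acct-2", "amount": 50}, cache)
unknown = decide({"account_id": "acct-9", "amount": 50}, cache)
```

Passing the deep tier in as a parameter keeps the router independent of any particular analytics backend and makes it easy to substitute a stub in tests.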
CONCLUSION
It is often necessary to make trade-offs when designing a system to meet strict
performance and availability goals. As we have seen with this theoretical example, with
some research into the problem domain, solutions can be found that solve the most
critical business problems without major compromises of key requirements.
In this example we split the solution into two distinct components, each focused on
solving a specific part of the business problem. By layering a solution in this way we
were able to trade off 1% availability for improved average response time and
decisions with higher accuracy. Fully analyzing the business requirements will give the
designer the greatest flexibility and the appropriate basis for making sound trade-offs
that work for the problem domain.
Editor's Notes
Philip Wright is Director of Architecture at PayPal and has contributed to the strategy and development of major risk management platforms and solutions at eBay/PayPal since 2005.