- 2. Capital Markets Risk Management and Hadoop
Kevin Samborn and Nitin Agrawal
- 5. What is Risk Management
• Risk is a tool – the goal is to optimize and understand risk
o Too much risk is locally and systemically dangerous
o Too little risk means the firm may be “leaving profit on the table”
• Portfolio exposure
o Modern portfolios contain many different types of assets
o Simple instruments, complex instruments and derivatives
• Many types of risk measures
o Defined scenario-based stress testing
o Value at Risk (VaR)
o “Sensitivities”
• The key is valuation under different scenarios
• VaR is used in banking regulation, margin calculations and risk management
- 6. Value at Risk (VaR)
• VaR is a statistical measure of risk, expressed as an amount of loss at a given
probability, e.g. a 97.5% chance that the firm will not lose more than USD 1 million
over the next 5 days
• Computing VaR is a challenging, data-sourcing-heavy and compute-intensive process
• VaR calculation:
o Generate statistical scenarios of market behavior
o Revalue the portfolio for each scenario, compare returns to today’s value
o Sort the results and select the return at the desired percentile: VALUE AT RISK (see the sketch below)
• Different VaR techniques:
o Parametric – analytic approximation
o Historical – captures real (historical) market dynamics
o Monte Carlo – many scenarios, depends on statistical distributions
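The calculation steps above can be illustrated with a minimal, self-contained sketch. The SimpleVaR class below, with its toy P&L vector, is purely illustrative and not part of the deck: it sorts scenario results and picks the loss at the desired percentile, which is the core of every VaR technique listed.

import java.util.Arrays;

public class SimpleVaR {
    /**
     * Computes VaR from simulated or historical scenario P&L values.
     * pnl[i] is the portfolio gain/loss under scenario i (negative = loss).
     * confidence is e.g. 0.975 for 97.5% VaR.
     */
    public static double valueAtRisk(double[] pnl, double confidence) {
        double[] sorted = pnl.clone();
        Arrays.sort(sorted);                           // worst losses come first
        int index = (int) Math.floor((1.0 - confidence) * sorted.length);
        return -sorted[index];                         // report the loss as a positive number
    }

    public static void main(String[] args) {
        // Toy scenario P&L vector; a real run would hold thousands of revaluations.
        double[] pnl = {-1.2e6, -0.4e6, 0.1e6, 0.3e6, 0.8e6, -0.9e6, 0.2e6, 0.5e6};
        System.out.println("97.5% VaR: " + valueAtRisk(pnl, 0.975));
    }
}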
- 7. VaR Graphically
Source: An Introduction To Value at Risk (VAR), Investopedia, May 2010
- 8. Complexities
• For modern financial firms, computing VaR is complex. Calculation requirements:
o Different types of assets require different valuation models
• Risk-based approach
• Full revaluation
o With large numbers of scenarios, many thousands of calculations are required
o Monte Carlo simulations require significant calibration, which depends on large volumes of historical data
• Many different reporting dimensions
o VaR is not additive across dimensions (product/asset class, currency)
o Portfolio – including “what-if” and intraday activity
• Intraday market changes requiring new simulations
• Incremental VaR – how does a single (new) trade contribute to the total? (see the sketch below)
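Non-additivity is why incremental VaR usually means two full VaR runs rather than a sum of per-trade numbers. A minimal sketch; the class, the hard-coded P&L vectors and the 97.5% level are illustrative assumptions, not the deck's code:

import java.util.Arrays;

public class IncrementalVaRSketch {
    // Hypothetical scenario P&L vectors (one entry per scenario); in practice these
    // come from full portfolio revaluation, not from hard-coded arrays.
    static double[] basePnl      = {-1.2e6, -0.4e6, 0.1e6, 0.3e6, 0.8e6, -0.9e6, 0.2e6, 0.5e6};
    static double[] withTradePnl = {-1.5e6, -0.3e6, 0.2e6, 0.4e6, 0.9e6, -1.1e6, 0.3e6, 0.6e6};

    static double valueAtRisk(double[] pnl, double confidence) {
        double[] sorted = pnl.clone();
        Arrays.sort(sorted);
        return -sorted[(int) Math.floor((1.0 - confidence) * sorted.length)];
    }

    public static void main(String[] args) {
        // Because VaR is not additive, the new trade's contribution is the difference
        // between two full VaR runs: with and without the trade in the portfolio.
        double incremental = valueAtRisk(withTradePnl, 0.975) - valueAtRisk(basePnl, 0.975);
        System.out.println("Incremental VaR: " + incremental);
    }
}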
- 11. Hadoop Core
HDFS
• Data stored with REDUNDANCY on a Distributed File System
• Abstracts H/W FAILURES, delivering a highly-available service on COMMODITY H/W
• SCALES from a single node to thousands of nodes
• Data stored WITHOUT A SCHEMA
• Tuned for SEQUENTIAL DATA ACCESS

MapReduce
• Provides an EASY ABSTRACTION for processing large data sets
• Infrastructure for PARALLEL DATA PROCESSING across a huge commodity cluster
• Infrastructure for TASK and LOAD MANAGEMENT
• Framework achieves DATA-PROCESS LOCALITY

It makes two critical assumptions, though:
• Data doesn't need to be updated
• Data doesn't need to be accessed randomly
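Those two assumptions shape how applications touch HDFS: files are written once as a stream and read back sequentially. A minimal sketch using the standard Hadoop FileSystem API; the path and the sample record are made up for illustration:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsAccessSketch {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        Path prices = new Path("/marketdata/prices-2012-11-23.csv");  // illustrative path
        // Write once, sequentially; HDFS offers no in-place updates or random writes.
        FSDataOutputStream out = fs.create(prices);
        out.writeBytes("BP,23-Nov,435.25,435.5\n");
        out.close();
        // Reads are likewise streamed from the start of the file.
        fs.open(prices).close();
    }
}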
- 12. A Simple Map Reduce Job
Problem Statement: From historical price data, create a frequency distribution of the 1-day % change for various stocks.
Stock  Date    Open    Close
BP     23-Nov  435.25  435.5
NXT    23-Nov  3598    3620
MKS    23-Nov  378.5   380.7
BP     22-Nov  434.8   433.6
NXT    22-Nov  3579    3603
MKS    22-Nov  377.8   378
BP     21-Nov  430.75  433
NXT    21-Nov  3574    3582
MKS    21-Nov  375     376
BP     20-Nov  430.9   432.25
NXT    20-Nov  3592    3600
MKS    20-Nov  373.7   375.3
BP     19-Nov  422.5   431.6
NXT    19-Nov  3560    3600
MKS    19-Nov  368.5   372.6
BP     16-Nov  423.9   416.6
NXT    16-Nov  3575    3542
MKS    16-Nov  370.3   366.4
BP     15-Nov  422     425.4
NXT    15-Nov  3596    3550
MKS    15-Nov  376.5   370.6

[Diagram: Map 1 … Map M read the price rows and emit (ticker, 1-day % change); SORT/SHUFFLE groups the pairs by ticker; Reduce 1 … Reduce N write frequency counts such as BP|1, 33; BP|2, 64; NXT|81, 2; NXT|-20, 5.]

Mapper:
  public void map(LongWritable key, Text value, Context context)
          throws IOException, InterruptedException {
      SecurityAttributes sa = RecordsReadHelper.readAttribs(value.toString());
      context.write(new Text(sa.getTicker()), new IntWritable(sa.getPercentChange()));
  }

Reducer:
  public void reduce(Text key, Iterable<IntWritable> values, Context context)
          throws IOException, InterruptedException {
      Map<Integer, Long> freqDist = buildFreqDistribution(values);
      Set<Integer> percentChanges = freqDist.keySet();
      for (Integer percentChange : percentChanges) {
          context.write(new Text(key.toString() + "|" + percentChange.toString()),
                  new LongWritable(freqDist.get(percentChange)));
      }
  }
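The reduce method above calls a buildFreqDistribution helper that the slide does not show. A minimal sketch of what such a helper could look like (the name comes from the slide, but the body here is an assumption), using java.util.HashMap and Hadoop's IntWritable:

  // Assumed imports: java.util.HashMap, java.util.Map, org.apache.hadoop.io.IntWritable
  // Hedged sketch: counts how often each integer % change occurs for one ticker.
  private Map<Integer, Long> buildFreqDistribution(Iterable<IntWritable> values) {
      Map<Integer, Long> freqDist = new HashMap<Integer, Long>();
      for (IntWritable value : values) {
          int percentChange = value.get();
          Long count = freqDist.get(percentChange);
          freqDist.put(percentChange, count == null ? 1L : count + 1L);
      }
      return freqDist;
  }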
- 13. Hadoop Ecosystem | How/Where These Fit
[Diagram: how the ecosystem tools fit together. Sqoop, hiho, Scribe, Flume and HUE appear alongside ZooKeeper, arranged across LOAD, STORAGE, PROCESSING and SUPPORT layers, feeding a DATA WAREHOUSE, VISUALIZATION TOOLS and USERS.]
- 15. Monte Carlo VaR
2 Steps: SIMULATION and AGGREGATION

SIMULATION: each instrument in the portfolio (e.g. IBM, MSFT, IBM.CO, …) gets a set of simulated values V1, V2, V3, … V10,000 (see the random-walk sketch below).

AGGREGATION: for each hierarchy level, positions and simulated prices combine into simulated hierarchy-level values:
HLV1 = (∑AiVi)1
HLV2 = (∑AiVi)2
…
HLV10k = (∑AiVi)10k

Challenges
• Daily trade data could be massive
• Valuations are compute intensive
• VaR is not a simple arithmetic sum across hierarchies
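The deck does not say which stochastic model drives the 10,000 random walks; geometric Brownian motion is a common choice, so the sketch below uses it purely as an illustration, with made-up drift, volatility and horizon parameters:

import java.util.Random;

public class RandomWalkSketch {
    /**
     * Hedged sketch of the simulation step: generate one simulated end-of-horizon
     * price per scenario using geometric Brownian motion. This model choice and
     * every parameter here are illustrative assumptions, not the deck's code.
     */
    public static double[] simulatePrices(double spot, double drift, double vol,
                                          double horizonYears, int scenarios, long seed) {
        Random rng = new Random(seed);
        double[] prices = new double[scenarios];
        for (int i = 0; i < scenarios; i++) {
            double z = rng.nextGaussian();
            prices[i] = spot * Math.exp((drift - 0.5 * vol * vol) * horizonYears
                    + vol * Math.sqrt(horizonYears) * z);
        }
        return prices;
    }

    public static void main(String[] args) {
        // e.g. 10,000 simulated prices over a 5-day horizon (illustrative inputs)
        double[] simulated = simulatePrices(191.23, 0.05, 0.25, 5.0 / 252.0, 10_000, 42L);
        System.out.println("First simulated price: " + simulated[0]);
    }
}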
- 16. Simulation Step - MapReduce
[Diagram: portfolio rows (IBM, MSFT, IBM.CO, …) flow into the SIMULATION step, which outputs simulated values V1, V2, V3, …]

MAP
- Read through portfolio data
- Emit (K,V) as (Underlyer, InstrumentDetails), e.g. (IBM, IBM.CO.DEC14.225)

REDUCE
- For the underlyer, perform 10k random walks in parallel
- For each random walk output, simulate derivative prices
- Emit 10k sets of simulated prices of the stock and associated derivatives, i.e.
  IBM , [V1, V2, ….. V10000]
  IBM.CO.DEC14.225 , [V1, V2, ….. V10000]
Driver:
  Job job = new Job(getConf());
  job.setJobName("RandomValuationGenerator");
  job.setMapperClass(SecurityAttributeMapper.class);
  job.setReducerClass(PriceSimulationsReducer.class);
  …

Mapper:
  public void map(LongWritable key, Text value, Context context)
          throws IOException, InterruptedException {
      SecurityAttributes sa = RecordsReadHelper.readAttribs(value.toString());
      context.write(new Text(sa.getUnderlyer()), sa);
  }

Reducer (excerpt):
  SecurityAttributes stockAttrib = (SecurityAttributes) iter.next();
  simPricesStock = getSimPricesForStock(stockAttrib);
  writeReducerOutput(stockAttrib, simPricesStock, context);
  bsmp = new BlackScholesMertonPricingOption();
  while (iter.hasNext()) {
      SecurityAttributes secAttribs = iter.next();
      writeReducerOutput(secAttribs,
              getSimPricesForOptions(simPricesStock, bsmp, secAttribs), context);
  }
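The reducer above leans on a BlackScholesMertonPricingOption helper whose body the deck does not show. Below is a minimal sketch of European call pricing under the usual Black-Scholes assumptions; the class name, method names and sample inputs are illustrative, not the deck's implementation:

public class BlackScholesSketch {
    // Standard normal CDF via the Abramowitz & Stegun 7.1.26 polynomial approximation.
    static double normCdf(double x) {
        double t = 1.0 / (1.0 + 0.2316419 * Math.abs(x));
        double d = Math.exp(-x * x / 2.0) / Math.sqrt(2.0 * Math.PI);
        double p = d * t * (0.319381530 + t * (-0.356563782 + t * (1.781477937
                + t * (-1.821255978 + t * 1.330274429))));
        return x >= 0 ? 1.0 - p : p;
    }

    /** European call price for one simulated spot; repeated for every scenario. */
    static double callPrice(double spot, double strike, double rate, double vol, double tYears) {
        double d1 = (Math.log(spot / strike) + (rate + 0.5 * vol * vol) * tYears)
                / (vol * Math.sqrt(tYears));
        double d2 = d1 - vol * Math.sqrt(tYears);
        return spot * normCdf(d1) - strike * Math.exp(-rate * tYears) * normCdf(d2);
    }

    public static void main(String[] args) {
        // e.g. price a 225-strike call off one simulated spot (illustrative inputs)
        System.out.println(callPrice(191.23, 225.0, 0.02, 0.25, 2.0));
    }
}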
- 17. Aggregation Step - MapReduce
[Diagram: simulated prices feed the AGGREGATION step, which produces HLV1 = (∑AiVi)1, HLV2 = (∑AiVi)2, …]

MAP
- Read through de-normalized portfolio data
- Emit (K,V) as (Hierarchy-level, Position Details), e.g.
  US , [IBM, 225, 191.23]
  US|Tech , [IBM, 400, 191.23]
  US|Tech|Eric , [IBM, 400, 191.23]

REDUCE
• For the hierarchy level (e.g. US|ERIC), perform ∑AiVi for each simulation and get the simulated portfolio values, HLVi
• Sort HLVi, find the 1%, 5% and 10% values and emit position and VaR data
Mapper:
  protected void map(LongWritable key, HoldingWritable value, Context context)
          throws java.io.IOException, InterruptedException {
      SecurityAttributes sa = RecordsReadHelper.readAttribs(value.toString());
      Set<String> hierarchyLevels = sa.getHierarchyLevels();
      for (String hierarchyLevel : hierarchyLevels) {
          context.write(new Text(hierarchyLevel), new Text(sa.getPositionDtls()));
      }
  }

Reducer (excerpt):
  Map<String, Double> portfolioPositionData = combineInputForPFPositionData(rows);
  Map<String, Double[]> simulatedPrices = loadSimulatedPrices(portfolioPositionData.keySet());
  for (long i = 0; i < NO_OF_SIMULATIONS; i++) {
      simulatedPFValues.add(getPFSimulatedValue(i, portfolioPositionData, simulatedPrices));
  }
  Collections.sort(simulatedPFValues);
  emitResults(portfolioPositionData, simulatedPFValues);
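getPFSimulatedValue, combineInputForPFPositionData, loadSimulatedPrices and emitResults are not shown in the deck. A minimal sketch of the core aggregation, HLVi = ∑AiVi, under assumed data shapes (instrument to quantity, instrument to array of simulated prices):

import java.util.Map;

public class AggregationSketch {
    /**
     * Hedged sketch of getPFSimulatedValue: HLVi = sum over positions of Ai * Vi,
     * i.e. quantity held times the instrument's simulated price in scenario i.
     * positionQuantities maps instrument -> quantity (Ai); simulatedPrices maps
     * instrument -> array of simulated prices (one Vi per scenario). These data
     * shapes are assumptions, not the deck's actual classes.
     */
    static double getPFSimulatedValue(int scenario,
                                      Map<String, Double> positionQuantities,
                                      Map<String, Double[]> simulatedPrices) {
        double hlv = 0.0;
        for (Map.Entry<String, Double> position : positionQuantities.entrySet()) {
            hlv += position.getValue() * simulatedPrices.get(position.getKey())[scenario];
        }
        return hlv;
    }
}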
- 18. DEMO RUN
- 19. Observations
• As expected, the processing time of Map jobs increased only marginally when the input data volume was increased
• The process was I/O-bound in the Simulation step's Reduce job, as the intermediate data it emitted was huge
• Data replication factor needs to be chosen carefully
• MapReduce jobs should be designed so that Map/Reduce output is not huge (see the configuration sketch below)
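Two of these knobs can be set directly in the job configuration. A minimal sketch, assuming Hadoop 1.x-era property names (later releases rename mapred.compress.map.output to mapreduce.map.output.compress) and an illustrative replication factor:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class JobTuningSketch {
    public static Job configure(Configuration conf) throws Exception {
        // Compress intermediate map output so the shuffle in the simulation step is less I/O-bound.
        conf.setBoolean("mapred.compress.map.output", true);
        // Illustrative replication factor; the right value depends on cluster size and data criticality.
        conf.setInt("dfs.replication", 2);
        Job job = new Job(conf);
        job.setJobName("RandomValuationGenerator");
        return job;
    }
}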
- 22. Appendix
- 25. Let’s build a Simple Map Reduce Job
Problem Statement: Across a huge set of documents, find all locations (i.e. document, page, line) of every word longer than 10 characters (a mapper sketch follows the diagram below).
[Diagram: the documents are stored in blocks across DATA NODE 1 and DATA NODE 2 (the STORAGE step), and Map tasks then run against the blocks stored locally on each node (Store → Map).]
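A minimal mapper sketch for this appendix problem. The assumed input format (each line prefixed with "document|page|line" followed by a tab and the text) is an illustration, not the deck's actual layout:

import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

/**
 * Hedged sketch: emit (word, location) for every word longer than 10 characters.
 * The input layout and class name are assumptions made for this example.
 */
public class LongWordLocationMapper extends Mapper<LongWritable, Text, Text, Text> {
    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        String[] parts = value.toString().split("\t", 2);
        if (parts.length < 2) {
            return;                          // skip malformed lines
        }
        String location = parts[0];          // e.g. "doc42|page3|line17"
        for (String word : parts[1].split("\\W+")) {
            if (word.length() > 10) {
                context.write(new Text(word), new Text(location));
            }
        }
    }
    // A matching reducer would simply concatenate the locations seen for each word.
}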