This document summarizes a presentation on using MapReduce and Hadoop for financial risk exposure calculation. It discusses splitting the calculation by trade or by scenario to distribute the work across multiple machines, and the challenges of data locality, reliability, and fault tolerance that arise when aggregating results from many simulations run in parallel. The DataSynapse GridServer platform is proposed for scheduling jobs across engines to improve performance and scalability for these large-scale financial risk calculations.
3. Risk Management
Market activity is growing because of huge profits
Instability in global finance:
sophisticated financial products,
derivatives,
credit
Market regulation – Basel for Europe
- A capital ratio of n% must be blocked (held in reserve) by financial institutions
- It can be reduced if a high-performance risk measurement system is in place
4. Risk Management
Market Risk
Market volatility
Stress scenarios based on historical events, e.g. 11 September 2001
Value at Risk (VaR): the maximum loss over a period of time that is not exceeded
with a given confidence (percentile); the probability of exceeding it is (1 - percentile)%
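As a brief illustration (my sketch, not from the deck), VaR at a given percentile can be estimated by historical simulation: sort the simulated profit-and-loss outcomes and read off the (1 - percentile) quantile. The function name and interface are hypothetical:

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Historical-simulation VaR sketch: the loss that is not exceeded with
// probability `percentile` (e.g. 0.99). `pnl` holds simulated
// profit/loss outcomes (losses are negative). Hypothetical helper, not
// part of the risk system described in the deck.
double value_at_risk(std::vector<double> pnl, double percentile) {
    std::sort(pnl.begin(), pnl.end());  // worst outcomes first
    // Index of the (1 - percentile) quantile of the P&L distribution.
    std::size_t idx = static_cast<std::size_t>(
        std::floor((1.0 - percentile) * pnl.size()));
    return -pnl[idx];  // report VaR as a positive loss amount
}
```

With eight outcomes and a 75% confidence level, the function returns the loss at the 25% quantile of the sorted P&L vector.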
5. Risk Management
Credit Risk
Counterparty default: countries, organisations
Market conditions stressed at time points up to 50 years in the future
Credit VaR: a curve of exposures, one for each time point
7. Risk Management
Requirements
Availability – 99% of blades available
Performance – low latency, high throughput
Scalability – ideally linear scalability
Reliability – fault tolerance, retry strategies, engine black-listing
Maintainability – upgrades with minimal effort
Extensibility – models are unstable, so it must be possible to modify processing
Security – control over data access
Manageability – resource-consumption statistics,
policy and SLA management for multiple clients
2011 IPM - HPC4 7
8. Counterparty Exposure
• Between 6 and 40 daily runs
• Each calculation cube:
• 10,000,000 deals
• 150 time points
• 10,000 simulations
• 1,000,000 aggregations
• -> 15 trillion operations per calculation
• -> 450 terabytes of intermediate data
[Figure: Risk System IT calculation cube with axes Nb Deals × Nb Time Points × Nb Simulations; sample cell values {1.25, 2.33, 0.95}]
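The operation count above follows directly from the cube dimensions, assuming one operation per (deal, time point, simulation) cell; a quick sanity check of the arithmetic (mine, not from the deck):

```cpp
#include <cassert>
#include <cstdint>

// Sanity-check the slide's operation count: one operation per
// (deal, time point, simulation) cell of the calculation cube.
std::int64_t cube_operations(std::int64_t deals,
                             std::int64_t time_points,
                             std::int64_t simulations) {
    return deals * time_points * simulations;
}
```

With the deck's figures, 10,000,000 × 150 × 10,000 = 15 trillion operations per calculation.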
10. Counterparty Exposure
• Split by trade
[Figure: for each trade (Trade 0, Trade 1, …, Trade n), a (TimePoint × Simulation) slice of the cube, e.g. Simulation 200, Simulation 350]
Loop {Trades}
  Send Trade_i to Blade_n
  Fetch [Simulations] from Blade_n
next iteration
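The split-by-trade loop can be sketched as follows; `price_trade_on_blade`, `split_by_trade`, and the row-per-trade result shape are hypothetical stand-ins for the grid calls, not DataSynapse API:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical sketch of the split-by-trade dispatch loop: each trade
// is sent to a blade, which returns that trade's PVs across all
// simulations (one row of the PV matrix).
using PvRow = std::vector<double>;

PvRow price_trade_on_blade(int trade_id, std::size_t n_simulations) {
    // Stand-in for remote pricing: a real blade would run the pricing
    // model for trade_id under every simulation path.
    return PvRow(n_simulations, static_cast<double>(trade_id));
}

std::vector<PvRow> split_by_trade(std::size_t n_trades,
                                  std::size_t n_simulations) {
    std::vector<PvRow> pv_matrix;
    pv_matrix.reserve(n_trades);
    for (std::size_t t = 0; t < n_trades; ++t) {
        // "Send Trade_i to Blade_n; Fetch [Simulations] from Blade_n"
        pv_matrix.push_back(
            price_trade_on_blade(static_cast<int>(t), n_simulations));
    }
    return pv_matrix;
}
```

The loop structure makes the drawback visible: every iteration ships simulation data to (or results from) the blades, which motivates the data-affinity discussion on the next slide.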
11. Counterparty Exposure
• Split by trade
Simulations are too big to fit in blade memory.
If N simulations can fit in memory:
-> transfer over network = {Trades} * {Simulations} * size(simulation) / N
   = 5000 TB / N transferred
The PV matrix is generated on the blades.
Data affinity can considerably reduce the network transfer:
-> keep simulations on the blade as "state"
-> the client must then maintain orchestration and reliability
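Plugging in the deck's figures makes the formula concrete. The 50 KB per-simulation size below is my inference, chosen so that {Trades} × {Simulations} × size(simulation) matches the deck's 5000 TB total; the deck itself does not state a per-simulation size:

```cpp
#include <cassert>
#include <cstdint>

// Network transfer for the split-by-trade scheme when only N
// simulations fit in blade memory at once:
//   bytes = trades * simulations * size(simulation) / N
// The 50 KB per-simulation size is an assumption chosen to reproduce
// the deck's 5000 TB figure.
std::int64_t transfer_bytes(std::int64_t trades,
                            std::int64_t simulations,
                            std::int64_t sim_size_bytes,
                            std::int64_t n_in_memory) {
    return trades * simulations * sim_size_bytes / n_in_memory;
}
```

With 10,000,000 trades, 10,000 simulations, and the assumed 50 KB per simulation, the total is 5 × 10^15 bytes (5000 TB) divided by N, as on the slide.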
13. Counterparty Exposure
Split by scenario
[Figure: for each simulation (Simulation 0, Simulation 1, …, Simulation n), a (TimePoint × Trade) slice of the cube covering Trade 0 … Trade i]
Loop {Simulations}
  Send {Simulations n to m} to {Blades}
  Send {Trades} to {Blades}
next iteration
14. Counterparty Exposure
Split by scenario
Affinity -> data centric
Scenarios are sent only once to the blades.
Each trade is sent {scenarios} times, but size(trade) << size(scenario).
The PV matrix is constructed progressively on the client.
-> The client must maintain reliability and fault tolerance.
-> The client must maintain all the state for the generated PVs.
-> Too heavy: multiple clients will be needed.
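Why size(trade) << size(scenario) matters can be shown with a back-of-the-envelope comparison. All sizes below are illustrative assumptions (1 KB per trade, 50 KB per scenario), not figures from the deck:

```cpp
#include <cassert>
#include <cstdint>

// Illustrative transfer volume for the split-by-scenario scheme:
// each scenario crosses the network once, while each trade is re-sent
// once per scenario. The per-item sizes are assumptions, not deck data.
std::int64_t scenario_split_bytes(std::int64_t trades,
                                  std::int64_t scenarios,
                                  std::int64_t trade_bytes,
                                  std::int64_t scenario_bytes) {
    return scenarios * scenario_bytes + trades * scenarios * trade_bytes;
}
```

Under these assumptions, 10,000 scenarios × 50 KB plus 10,000,000 trades × 10,000 × 1 KB is roughly 100 TB, well below the 5000 TB worst case of the split-by-trade scheme, because the large scenarios travel only once.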
15. Counterparty Exposure
Aggregation
All PV matrices are required: point-to-point aggregation
-> too big to fit in client memory
-> constant disk access
Aggregator processes can also be distributed
-> with the underlying problems of reliability and fault tolerance
17. DataSynapse GridServer
Director
Entry access point, authentication
Engine balancer: moves engines between brokers depending on load, demand and
policies (weight-based, home/shared)
Routes clients to brokers
Broker
Handles client (Driver) requests
Schedules service sessions
Schedules service instances onto engines
The scheduling period takes into account engine states, discriminators and blacklisting
Maintains a pool of engines
18. DataSynapse GridServer
Engine Daemon
One daemon per machine
Manages engine instances
Interacts with the Director – migration between brokers
Engine Instance
One per CPU
Manages the application
Communicates with the client (Driver) for data and processing
Maintains state, checkpoints and init data
Interacts with the Broker; receives assignments and interruptions
19. DataSynapse GridServer
Driver
Embedded in the client
Provides APIs in C++, Java, .NET and SOAP:
- Service oriented: loose interaction between client code and engine code
- Object oriented: client and service are coupled and exchange data in intermediate objects
- PDriver: PDS scripting language – used for MPI jobs
Data is generally transferred directly between client and engines – HTTP server on the Driver
Data collection is immediate, later, or never
For COLLECT_LATER and COLLECT_NEVER, it is better to push the data to the broker: in case
of failure, the client may no longer be there, so the input data would be lost.
It is possible to push initial data – it is kept as static data on the engine machine and
used by different instances of the service.
20. DataSynapse GridServer
Architecture:
Multiple brokers – one per organisation
– with a Service Level Agreement (SLA)
defined for each Line of Business
(LOB).
Failover brokers – service states
are stored in a database and
resubmitted through the failover broker
if the live broker goes down.
A secondary director becomes active
on failure of the primary director.
21. DataSynapse GridServer
Brokers form a "partition" of the shared engines
-> home/shared or weight based
Home/Shared
Each engine has a default "home" broker and migrates to a "shared" broker if there is demand.
Engines are interrupted in the shared broker if there is demand in the home broker.
Different parameters define the pace of migration and the fluidity of the system.
Weight Based
Engines are homed to brokers indifferently, based on weights set by the Director (e.g.
60%-40%).
Movement between brokers follows the same procedure as Home/Shared.
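The weight-based policy can be sketched as a proportional split of the engine pool. This is an illustrative sketch of the idea, not DataSynapse code; the function name and remainder-handling rule are my own choices:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Illustrative weight-based engine partitioning: each broker receives
// a share of the engine pool proportional to its configured weight,
// e.g. 60%-40%. The last broker takes any rounding remainder.
// Hypothetical helper, not part of the GridServer API.
std::vector<std::size_t> partition_engines(std::size_t n_engines,
                                           const std::vector<int>& weights) {
    int total = 0;
    for (int w : weights) total += w;
    std::vector<std::size_t> shares;
    std::size_t assigned = 0;
    for (std::size_t i = 0; i < weights.size(); ++i) {
        std::size_t share =
            (i + 1 == weights.size())
                ? n_engines - assigned  // last broker absorbs the remainder
                : n_engines * static_cast<std::size_t>(weights[i]) /
                      static_cast<std::size_t>(total);
        shares.push_back(share);
        assigned += share;
    }
    return shares;
}
```

With 100 engines and weights {60, 40}, the pool splits 60/40, matching the slide's example.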
23. Counterparty Exposure
Split by Scenario (2)

    // asynchronous callback
    class InvocationHandler : public ServiceInvocationHandler {
    public:
        void handleResponse(const string &response, int id) {
            cout << "response from async callback " << id << ": " << response << endl;
        }
        void handleError(ServiceInvocationException &e, int id) {
            cout << "error from async callback " << id << ": " << e.getMessage() << endl;
        }
    };

All the invocations are queued in the broker.
The scheduler forwards them to the engine containing the right scenario.
The InvocationHandler callback is triggered by the broker after each invocation
completes; the result data is pushed directly by the engines to avoid a
bottleneck on the broker.