ScaleFast Grid And Flow

A Python-based grid computing project. With process workflow built in, deploy and manage simple through to complex business processes across a distributed network of dedicated or on-demand commodity computers. Run command-line apps and native Python, Java, and .NET code.

Slide notes:
  • Scheduling tools: Active Batch
  • Issues in the top grouping can be addressed by tools like Active Batch
  • Support staff can now see, in very granular detail, where a process failed. It is now easier to determine the cause of a failure: resource issues, bad static data, or bugs in code.
  • There is also an instance of Flow that can be easily integrated into an enterprise workflow/automation application.

Transcript:

    1. Grid and Flow
       By Robert Betts
       robert.betts@scalefast.com
    2. The Offering
       • A distributed, stable and well-synchronised platform for grid computing.
       • A methodology, toolset and library for deploying an efficient, co-ordinated and parallel platform.
    3. Current Operational Challenges
       • Processing of large datasets is open to high rates of failure
       • Processing takes a long time to complete
       • Processing is often executed sequentially
       • Technology bottlenecks, e.g. 32-bit software often only supports between 2 and 3 GB of RAM
       • 64-bit processes can hog available resources
       • Most tools can't exploit multi-core configurations
       • Dedicated hardware allocated to accommodate the maximum processing load
       • Difficult to audit and trace processing problems
    4. With ScaleFast you can...
       • Centralise all business processes
       • Define hierarchical processes with step inter-dependencies
       • Parallelise the running of processes and steps
       • Re-run failed processes from any point
       • Automatically split large process steps
       • Make use of multi-core/multi-processor computers
       • Distribute jobs across multiple computers
       • Make use of user workstations and other idle computing resources
    5. ScaleFast Grid – Distributed Computing Grid
       • A distributor and worker nodes
       • Implements map/reduce
       • Workers can run on user workstations or dedicated infrastructure
       • Can be easily deployed to a cloud platform
       • Supports the native running of Python, Java and .Net
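
The map/reduce model named on this slide can be illustrated with a short, standalone sketch. This is not the ScaleFast API; it uses only the Python standard library, and the map_task/reduce_results functions are invented for the example. It shows the pattern the Distributor applies: split a large job into independent tasks, run them in parallel, then combine the partial results (the real Grid runs the tasks on remote worker nodes rather than local processes).

    # Illustrative sketch only (standard library, not the ScaleFast API):
    # a job is mapped into independent tasks, run in parallel, and the
    # partial results are reduced.
    from concurrent.futures import ProcessPoolExecutor

    def map_task(chunk):
        # One task: process an independent slice of the dataset.
        return sum(x * x for x in chunk)

    def reduce_results(partials):
        # Combine the partial results returned by the workers.
        return sum(partials)

    if __name__ == "__main__":
        data = list(range(1_000_000))
        chunks = [data[i::8] for i in range(8)]    # split the job into 8 tasks
        with ProcessPoolExecutor() as pool:        # stand-in for the worker pool
            partials = list(pool.map(map_task, chunks))
        print(reduce_results(partials))
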
    6. ScaleFast Flow – Process Workflow Engine
       • Processes are made up of individual jobs which have inter-dependencies
       • The output of one job can be the input of the next job
       • Processes have notifications based on success or failure
       • Flow has a built-in scheduler which can be triggered by:
         - Time, with multiple time-zone support
         - User interface
         - API
       • Processes can be restarted from any point of failure
       • Processes can be made up of sub-processes
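
The step inter-dependencies described here form a dependency graph that is executed in order, with each step's output available to the steps that depend on it. The sketch below is a minimal illustration of that idea using only the Python standard library; the step functions and the wiring are invented for the example and are not Flow's API. Flow adds scheduling, notifications and restart-from-failure on top of this basic ordering.

    # Minimal dependency-ordered "process" in plain Python, illustrating the
    # idea behind Flow. Step names and functions are invented for the sketch.
    from graphlib import TopologicalSorter  # Python 3.9+

    def extract(ctx):   return {"rows": 1000}
    def transform(ctx): return {"rows": ctx["extract"]["rows"], "clean": True}
    def report(ctx):    return "report over %d rows" % ctx["transform"]["rows"]

    # step -> (callable, list of steps it depends on)
    steps = {
        "extract":   (extract,   []),
        "transform": (transform, ["extract"]),
        "report":    (report,    ["transform"]),
    }

    context = {}  # outputs of completed steps, passed on to dependants
    order = TopologicalSorter({name: deps for name, (_, deps) in steps.items()})
    for name in order.static_order():   # run steps in dependency order
        fn, _ = steps[name]
        context[name] = fn(context)
        print("step %r completed" % name)
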
    7. Common Use Cases
       • Reporting and data processing
         - Stabilising processes that fail due to resource constraints
         - Speeding up processes that take a long time to run
         - Improving and/or balancing resource utilisation
       • Process orchestration and scheduling
         - Co-ordinating processes with event-based synchronisation
         - Parameter and data flow between process steps
         - Centralisation and versioning of processes
         - Reducing support administration with full process auditing
       • General processing and application development
         - Any application/process that would benefit from parallelism
         - Risk management and P&L processing
         - Distributed computations
    8. Case Study 1 – Hedge Fund
       Before:
       • Trade volumes of 10,000 per day
       • Reports continuously failing
       • Reporting taking longer to run each day
       • System support occupies a full-time resource, with additional assistance frequently required
       • Overnight failures push EOD processing to T+2 (SLA at T+1 am)
       • Fund considers:
         - Adding head count with full-time EOD support resources
         - Purchasing additional hardware
         - Purchasing a scheduling and automation product
       After:
       • Process failures reduced significantly
       • Fine-grained audit trail of all processes
       • EOD processing time reduced from 8 hours to 50 minutes
       • Hardware freed up
       • No additional headcount required
    9. Case Study 2 – Hedge Fund
       Before:
       • Fixed Income Risk project
         - New trading system implementation for Risk Management and P&L
         - Requirement to take all Risk processing in house
       • After the trading system was implemented
         - EOD processing became more complex and onerous
         - Reports began to fail
       • Fund considers:
         - Head count required to support the new trading system
         - Purchasing of hardware for additional processing
       After:
       • Process failures reduced significantly
       • Fine-grained audit trail of all processes
       • EOD processing time reduced from 4 hours to 30 minutes
       • Hardware freed up
       • Risk Engine built on top of Grid and Flow
       • Scenario analysis report with 50 scenarios across 2,000 positions runs in under 5 minutes
    10. Case Study 3 – Bank
       Before:
       • Key EOD reports failing due to resource constraints
       • Tried shell scripts to split reports
       • Tried refactoring reports
       • Reports still took 7 hours to run, and were taking longer each day
       • A single failure required a complete restart
       • A failure and restart would result in SLA failures to all downstream systems
       • Bank considers:
         - Purchasing additional hardware
         - Re-assessing support requirements
       After:
       • Process failures reduced significantly
       • On failure, the reporting process can now resume from any step
       • Completed reports are now processed in 40 minutes
       • Fine-grained audit trail of processes to aid support staff
       • Hardware freed up for other projects
    11. Other Uses
       • Monte Carlo framework for pricing exotic structured credit instruments
       • Risk management processing
       • General application processing
       • Process synchronisation
       • Loading and parsing large datasets
    12. ScaleFast Architecture
       • Flow stores, versions and schedules workflows, which are predefined and synchronised grid jobs.
       • Grid Clients are any processes able to submit grid jobs.
       • The Grid Distributor receives job requests and maps the reduced jobs as tasks across workers.
       • Grid Workers request and process job tasks.
       [Architecture diagram: Flow and Grid Clients 0..Q submit jobs to the Grid Distributor, which farms tasks out to Workers 0..X (each hosting a Runner) on Servers 0..N, with access to local disk and shared storage.]
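
The pull model described on this slide (workers request tasks from the Distributor rather than having work pushed to them) can be sketched with a shared queue. This is an illustration of the interaction only, not the Distributor/Worker protocol itself: threads and an in-process queue stand in for the Distributor and the remote Workers.

    # Illustrative pull model: "workers" request tasks from a shared queue
    # until a sentinel tells them the job is drained.
    import queue
    import threading

    tasks = queue.Queue()
    results = queue.Queue()

    def worker(worker_id):
        # A worker repeatedly *requests* its next task, mirroring the pull
        # model in the architecture slide, until it receives the sentinel.
        while True:
            task = tasks.get()
            if task is None:          # sentinel: no more work
                return
            results.put((worker_id, task, task * task))

    # "Distributor": enqueue the mapped tasks, then one sentinel per worker.
    for t in range(10):
        tasks.put(t)
    workers = [threading.Thread(target=worker, args=(i,)) for i in range(3)]
    for w in workers:
        tasks.put(None)
        w.start()
    for w in workers:
        w.join()

    collected = []
    while not results.empty():
        collected.append(results.get())
    print(sorted(collected))          # (worker_id, task, result) tuples
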
    13. Flow GUI
       • A simple process example with 3 steps
       • Processes can have multiple branch dependencies, e.g. 1-to-many and many-to-1
       • Processes can be built up from sub-processes
       • Flow highlights the status of the individual steps
       • Clicking on a step redirects you to the Grid for further details
       • Processes can be paused and restarted
       • On a process failure, it can be restarted at any step in the process
    14. Grid Job Details GUI
       • Parameters, status and details of a grid job
       • Individual tasks can be drilled down into
       • Stderr and stdout can be accessed and queried across all tasks
       • Input parameters, context and output visible at job or task level
    15. Grid Summary GUI
       • High-level view of Grid status and activity
       • View active worker nodes
       • View job activity and history
