Data Aware Routing In The Cloud

Loading...

Flash Player 9 (or above) is needed to view presentations.
We have detected that you do not have it on your computer. To install it, go here.

0 comments

Post a comment

    Post a comment
    Embed Video
    Edit your comment Cancel

    Favorites, Groups & Events

    Data Aware Routing In The Cloud - Presentation Transcript

    1. Data Aware Routing in the Cloud
      Eugene Steinberg
      CTO, Grid Dynamics
    2. Traditional HPC approach:HPC Cluster with centralized data access
    3. Data Driven Scalability Challenges in traditional HPC approach
      • Data is far away
      • Latency of remote connection
      • Latency of data travelling through pipes
      • Chatty random data access algorithms are expensive
      • Data is centralized
      • Centralized HW resources are always limited
      • Centralized RAM is limited: disk I/O is inevitable
      • Connections are limited
      • Highly concurrent data access doesn’t scale well
    4. Usual solution: Data Grids
      • Classic Data Grid
      • Data is partitioned
      • Partitions are stored in memory
      • Data grid is deployed near to compute grid
      • Search is parallelized over partitions
      • Build-in replication, persistence, coherence, failover
      • What is Achieved?
      • Reduced latency and data moving cost
      • Improved connection scalability
      • Reduced data contention
      • No Disk I/O – 100% memory speed
      • Is this the best we can do?
    5. Limitations of Compute Grid + Data Grid
      • Two separate grid environments
      • Hardware, footprint and management costs of dual infrastructure
      • Segregated infrastructures cannot share resources
      • Sub-optimal resource utilization
      • Compute grid is CPU-bound, not RAM-bound
      • Data grid is RAM-bound, not CPU-bound
      • Still sub-optimal performance
      • Still paying for remote network calls and data movement
    6. Better answer: Compute-Data Grid
      • Shared hardware between compute & data grid
      • Data grid utilizes RAM of host machines
      • Compute grid runs HPC jobs on the same host machines
      • Opportunity for data aware routing
      • Many applications support compute-data affinity
      • Reduced overhead on remote calls and data movement
      • Opportunity to scale
      • As HPC application needs to scale in and out, data partitions are spread over larger or smaller pool of hosts
    7. Data aware routing in Compute-Data grid
      • What is data aware routing?
      • Route HPC tasks to machine hosting relevant data grid partition
      • Relevant partition is identified by data affinity key
      • Why data aware routing?
      • Task access data via loopback – reduced latency
      • Data doesn’t travel through pipes and switches – bandwidth savings
      • Data aware routing architecture goals
      • Pluggable: lightweight core + adapters for variety of grid products
      • Non-intrusive: neither compute grid, nor data grid are aware of being coordinated
    8. Data aware routing architecture
    9. Components
      • Core Components
      • Data Grid Adapter: service responsible for knowing Data Grid’s topology and state
      • Data Aware Wrapper: client side library extending Compute Grid’s scheduling API to support data-aware job scheduling
      • Variation on Configuration
      • Data Grid Adapter can be deployed as a remote service or embedded as a library
    10. Data aware routing workflow
      Client submits task + affinity key to Wrapper
      Wrapper requests host hint from Data Grid Adapter
      Data Grid Adapter gives host hint using affinity key
      Wrapper submits task with host hint using HPC API
    11. Cloud Factor: opportunities and challenges
      Emerging Cloud Computing technologies
      • Offer benefits for HPC architectures
      • Automated cluster provisioning and deployment
      • Ability to timely set up large clusters for once-in-a-blue-moon jobs
      • Ability to scale cluster up and down on demand
      • Easy to spawn isolated development/testing environments
      • … as well as some new challenges
      • Networking limitations (e.g. absence of multicast)
      • No strong promises on CPU, I/O and bandwidth shares
      • No real perimeter firewall – security considerations
    12. Demo: Simplistic Trading Analytics
      • Application characteristics
      • Data intensive application: N*10K trade objects (2Kb payload), for 4*N stock tickers
      • Processing algorithm
      • Job: to evaluate all tickers
      • Task: to process ticker trades with chatty random-access algorithm
      • Data source and task scheduling control
      • Data sources: RDBMS or IMDG
      • Scheduling mode: neutral and data-aware
      • Task and job latency measurement
    13. SGE + Oracle Coherence on AWS
      • AWS provisioning considerations
      • Customization of popular images is faster then spinning up a custom AMI
      • make sure sshd started on freshly started machine
      • SGE considerations
      • Reverse DNS lookup: /etc/hosts to contain all hostnames
      • pw-less ssh access
      • Oracle Coherence considerations
      • No multicast: unicast + well-known addresses
      • Many short-living clients: do not join TCMP cluster, use Coherence*Extend
    14. Data aware routing in the cloud
      Eugene Steinberg
      esteinberg@griddynamics.com

    + Grid DynamicsGrid Dynamics, 3 months ago

    custom

    139 views, 0 favs, 0 embeds more stats

    Large-scale HPC infrastructures such as Sun Grid En more

    More info about this document

    © All Rights Reserved

    Go to text version

    • Total Views 139
      • 139 on SlideShare
      • 0 from embeds
    • Comments 0
    • Favorites 0
    • Downloads 7
    Most viewed embeds

    more

    All embeds

    less

    Flagged as inappropriate Flag as inappropriate
    Flag as inappropriate

    Select your reason for flagging this presentation as inappropriate. If needed, use the feedback form to let us know more details.

    Cancel
    File a copyright complaint
    Having problems? Go to our helpdesk?

    Categories