SOSS Maximizes Ease of Use
   Grid servers self-aggregate, self-heal, and automatically load-balance.
   Tree list shows: ...
Real-time Performance Charting
SOSS Object Browser
   • Simplifies development.
   • Provides extremely useful visibility into grid usage.
   • Allows grid objects t...
SOSS Parallel Backup and Restore
   • Enables grid contents (or portions) to be backed up or restored in parallel either to: ...
Recap: Top 6 Reasons to Use a DDG
   1. Faster access time for business logic state or database data
   2. Scalable throughput to ...
Thank you for joining us today!
   Distributed Data Grids for Server Farms & High Performance Computing ...
Top 6 Reasons to Use a Distributed Data Grid

Covers the problems of achieving scalability in server farm environments and how distributed data grids provide in-memory storage and boost performance. Includes a summary of ScaleOut Software product offerings, including ScaleOut StateServer and the Grid Computing Edition.

Published in: Technology

  1. The Top Six Reasons to Use a Distributed Data Grid
     Webinar, December 2011
     Bill Bain (wbain@scaleoutsoftware.com)
     Copyright © 2011 by ScaleOut Software, Inc.
  2. Agenda
     • About ScaleOut Software
     • Overview of Products
     • What is a Distributed Data Grid (DDG)?
     • The Top Six Reasons
     • What to Look for in a DDG Product
  3. Company
     • Founded in September 2003, privately funded
     • Offices in Bellevue, WA and Beaverton, OR
     • Team:
       – Dr. William Bain, Founder & CEO
         • Career focused on parallel computing: Bell Labs, Intel, Microsoft
         • 3 prior start-ups; the last was acquired by Microsoft, and its product now ships as Network Load Balancing in Windows Server
       – David Brinker, COO
         • 20 years of software business and executive management experience
         • Mentor Graphics, Cadence, Webridge
     • Develops and markets Linux & Windows DDG products.
     • Seven years of market experience.
  4. It's All About Scaling Performance
     • Scaling performance by scaling out: one server (CPU, memory, storage) is replaced by many servers, each with its own CPU, memory, and storage.
     • Scaling out:
       – Has excellent scalability.
       – But is challenging to implement.
  5. What is a Distributed Data Grid? (aka "distributed cache", "in-memory data grid")
     • A new "vertical" storage tier:
       – Adds a missing layer to boost performance.
       – Uses in-memory, out-of-process storage.
       – Avoids repeated trips to backing storage.
     • A new "horizontal" storage tier:
       – Allows data sharing among servers.
       – Scales performance & capacity.
       – Adds high availability.
       – Can be used independently of backing storage.
     [Diagram: on each server, processor → cache → L2 cache → application memory ("in-process") → distributed data grid ("out-of-process"), above shared backing storage]
  6. Distributed Data Grids: A Closer Look
     • Incorporates a client-side, in-process cache ("near cache"):
       – Transparent to the application.
       – Holds recently accessed data.
     • Boosts performance:
       – Eliminates repeated network data transfers & deserialization.
       – Reduces access times to near "in-process" latency.
       – Is automatically updated if the distributed grid changes.
       – Supports various coherency models (coherent, polled, event-driven).
     [Diagram: application memory ("in-process") containing the client-side cache, above the distributed data grid ("out-of-process")]
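The near-cache behavior above can be illustrated with a small in-process model. This is a minimal sketch, not ScaleOut's implementation. `RemoteGrid` is a hypothetical stand-in for the out-of-process grid, and `NearCache` is a hypothetical client-side cache. Coherency uses the polled model: the client compares a version number before reusing its local copy. In a real grid the version check is itself a small request. An event-driven model would instead push invalidations to clients.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the out-of-process grid; every update bumps a version number.
class RemoteGrid {
    private final Map<String, String> values = new HashMap<>();
    private final Map<String, Long> versions = new HashMap<>();
    int fetchCount = 0; // number of full object transfers (network + deserialization)

    void put(String key, String value) {
        values.put(key, value);
        versions.merge(key, 1L, Long::sum);
    }

    // Cheap metadata request: returns the current version without the object's data.
    long version(String key) {
        return versions.getOrDefault(key, 0L);
    }

    String fetch(String key) {
        fetchCount++;
        return values.get(key);
    }
}

// Client-side near cache using a polled coherency model: reads are served from
// process memory unless the grid reports a newer version of the object.
class NearCache {
    private final RemoteGrid grid;
    private final Map<String, String> local = new HashMap<>();
    private final Map<String, Long> localVersions = new HashMap<>();

    NearCache(RemoteGrid grid) {
        this.grid = grid;
    }

    String get(String key) {
        long current = grid.version(key);
        Long cached = localVersions.get(key);
        if (cached != null && cached == current) {
            return local.get(key); // hit: no data transfer, no deserialization
        }
        String value = grid.fetch(key); // miss or stale copy: transfer the object
        local.put(key, value);
        localVersions.put(key, current);
        return value;
    }
}
```

Repeated reads of an unchanged object are served from process memory. Only the first read, and the first read after each update, transfers the object.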
  7. The Need for Memory-Based Storage
     Example: Web server farm:
     • A load-balancer directs incoming client requests to Web servers.
     • Web and app. server farms build Web pages and run business logic.
     • A database server holds all mission-critical, line-of-business (LOB) data.
     • Server farms share fast-changing data using a DDG to avoid bottlenecks and maximize scalability.
     [Diagram: Internet → load-balancer → Web servers and app. servers, each tier sharing a distributed, in-memory data grid → database servers and RAID disk array (bottleneck)]
  8. The Need for Memory-Based Storage
     Example: Cloud application:
     • Application runs as multiple virtual servers (VS).
     • Application instances store and retrieve LOB data from a cloud-based file system or database.
     • Applications need fast, scalable storage for fast-changing data.
     • Distributed data grid runs as multiple virtual servers to provide "elastic," in-memory storage.
     [Diagram: application VSs and grid VSs (the distributed data grid) in the cloud, above cloud-based storage]
  9. Scalability Challenges for Applications
     • "Scaled out" server applications repeatedly access two types of data:
       – Repeatedly referenced database data (e.g., stock prices) and
       – Fast-changing, business-logic data (e.g., session state, workflow state)
     • Database servers are not designed to meet this need:

       Characteristic        Typical DBMS data   Application data
       Volume                High                Low
       Lifetime/turnover     Long/slow           Short/fast
       Access patterns       Complex             Simple
       Data preservation     Critical            Less critical
       Fast access/update    Less important      More important

     • Scaled-out applications create additional challenges:
       – How to make shared application data quickly accessible by any server
       – How to maintain fast access and avoid bottlenecks as the server farm grows
       – How to keep application data highly available when a server fails
  10. Wide Range of Applications for DDGs
     Financial Services:
     • Portfolio risk analysis
     • VaR calculations
     • Monte Carlo simulations
     • Algorithmic trading
     • Market message caching
     • Derivatives trading
     • Pricing calculations
     E-commerce:
     • Session-state storage
     • Application state storage
     • Online banking
     • Loan applications
     • Wealth management
     • Online learning
     • Hotel reservations
     • News story caching
     • Shopping carts
     • Social networking
     • Service call tracking
     • Online surveys
     Other Applications:
     • Edge servers: chat, email
     • Online gaming servers
     • Scientific computations
     • Command and control
  11. Product: ScaleOut StateServer®
     Fully distributed data grid designed for storing application data on server farms, compute grids, and the cloud:
     • Runs in-memory directly on a farm or grid as a distributed service.
     • Automatically:
       – Distributes and shares SOSS data across the farm.
       – Reduces access time.
       – Scales when the farm grows.
       – Survives when a server fails.
     • Cost-effective
     • Complements & offloads the DBMS.
     • Portable across Windows and Linux.
     [Diagram: Internet → Web servers, each running the SOSS service → DBMS server (bottleneck)]
  12. Product: ScaleOut Remote Client Option
     • Allows hosting ScaleOut StateServer on a separate server farm.
     • Ensures highly available connectivity to the SOSS store.
     • Automatically load-balances access requests to minimize response times.
     • Uses multiple connections to maximize throughput.
     [Diagram: Web or application server farm (Windows and Linux client applications, each with a Remote Client) connected through load-balanced connections to a ScaleOut StateServer farm of Windows and Linux SOSS hosts]
  13. Products: Grid Computing Edition
     • Extends ScaleOut StateServer for use in high performance computing (HPC) applications.
     • Provides advanced capabilities for parallel data analysis.
     • Includes optional management tools.
     • Complements the extended support plans of ScaleOut Software, Inc. (SSI).
     [Diagram: master and compute servers running the SOSS service, in front of database servers (bottleneck)]
  14. Products: ScaleOut GeoServer Option
     Global, Multi-Site Data Grids
     • Extends SOSS across multiple sites.
     • Protects against site-wide failures.
     • Replicates data between SOSS farms.
     • Employs scalable, highly available connections.
     • Automatically handles membership changes at remote sites.
     • Can support both "push" and "pull" access models.
  15. Reason #1: Faster Access Time
     • Eliminates repeated network data transfers.
     • Eliminates repeated object deserialization.
     [Chart: average response time (microseconds) for 10 KB objects with a 20:1 read/update ratio, DDG vs. DBMS]
  16. Example of Faster API Read Access
     • Example for direct API access:
       – 10 KB objects, 20:1 read/update ratio
       – 3-host ScaleOut StateServer store with 3 clients
     • Results:
       – The distributed cache provided >6X faster read time than the database server.
  17. Reason #2: Linearly Scalable Throughput
     ScaleOut StateServer automatically scales its performance to match the size and workload of a server farm or HPC compute grid.
     [Chart: read/write throughput (accesses/second, up to 80,000) for 10 KB objects as the grid grows from 4 to 64 nodes and the object count grows from 16,000 to 256,000]
     Tests performed in the Microsoft Enterprise Engineering Center.
  18. What is Scalable Throughput?
     • What it is (a perfect fit for server farms):
       – Workload W takes time T on 1 server (throughput: W/T).
       – Workload 2W takes time T on 2 servers (throughput: 2W/T).
       – Workload nW takes time T on n servers (throughput: nW/T).
       – Total completion time (i.e., response time) stays fixed.
     • What it is not (a common misperception):
       – Workload W takes time T/2 on 2 servers (throughput: 2W/T).
       – Workload W takes time T/n on n servers (throughput: nW/T).
     • Why increase the workload with more servers?
       – Adding servers adds overhead (e.g., networking).
       – Increasing the workload hides overheads for linear scaling.
       – The DDG must keep overheads low for linear scaling.
       – The network must not saturate! (Its throughput is fixed.)
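The scaling rule above, together with the fixed network ceiling, can be written as a one-line model. Throughput grows as n·W/T until it reaches the network's fixed capacity divided by the object size. The sketch is illustrative only. The class name `ThroughputModel` and all numbers used with it are hypothetical assumptions, not measurements: 1,000 accesses/s per server, a 125,000,000 bytes/s network, and 10,000-byte objects.

```java
// Hypothetical scale-out throughput model: each server completes W/T accesses
// per second, but the shared network caps the total (its throughput is fixed).
class ThroughputModel {
    static double throughput(int servers, double accessesPerServer,
                             double networkBytesPerSec, double objectBytes) {
        double linear = servers * accessesPerServer;          // n * W / T
        double networkCap = networkBytesPerSec / objectBytes; // fixed ceiling
        return Math.min(linear, networkCap);
    }
}
```

With those assumed numbers, 4 servers deliver 4,000 accesses/s. Sixteen servers would be capped at 12,500 accesses/s by the network rather than reaching 16,000.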
  19. How SOSS Achieves Scalable Throughput
     • Fully peer-to-peer architecture to eliminate bottlenecks.
     • Automatically partitioned data storage with dynamic load-balancing.
     • Fixed number of replicas per stored object (1 or 2) to avoid order-n overhead (storage and latency).
     • Patented technique for scaling quorum updates to stored objects.
     • Patented, scalable heart-beating algorithm.
     [Diagram: a cache service on each Web or application server, exchanging heartbeats over Ethernet; each object's copy and its replica reside on different hosts]
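Partitioned storage with a fixed number of replicas can be sketched as follows. The scheme is simplified and hypothetical: the key is hashed to one of a fixed number of partitions, and the primary copy and 1 or 2 replicas go on consecutive hosts. The slide does not describe SOSS's actual placement, load-balancing, or patented quorum-update techniques. What the sketch shows is that each object always has the same small number of copies, regardless of farm size.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical placement: keys hash to a fixed number of partitions, and each
// partition lives on a primary host plus a fixed number of replicas (1 or 2).
class Placement {
    static final int PARTITIONS = 64; // hypothetical partition count

    static int partitionOf(String key) {
        return Math.floorMod(key.hashCode(), PARTITIONS);
    }

    // Returns [primary, replica1, ...]; hosts are numbered 0..hostCount-1.
    static List<Integer> hostsFor(String key, int hostCount, int replicas) {
        if (replicas + 1 > hostCount) {
            throw new IllegalArgumentException("not enough hosts for the replica count");
        }
        int p = partitionOf(key);
        List<Integer> hosts = new ArrayList<>();
        for (int i = 0; i <= replicas; i++) {
            hosts.add((p + i) % hostCount); // consecutive, distinct hosts hold the copies
        }
        return hosts;
    }
}
```

An update touches `replicas + 1` hosts whether the farm has 4 hosts or 64, which is what keeps storage and update latency from growing with farm size.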
  20. Integrated, Powerful Platform for Scaling
     • All product features benefit from the scalable, highly available architecture:
       – Ex. Parallel object eventing:
         • All hosts handle events.
         • Event delivery is highly available.
       – Ex. Global replication:
         • All hosts replicate objects.
         • Caches automatically handle membership changes.
     [Diagram: client applications with client libraries on the cache-service hosts of a local farm, replicating to a remote farm]
  21. Impact of Scalable Throughput on Access Latency
     • A scalable, distributed data grid scales throughput and thereby maintains low latency:
       – The DDG scales throughput by adding servers.
       – Avoids the throughput barrier of a DBMS or file system.
       – Maintains low latency as throughput increases.
       – Network bandwidth is the only throughput limit.
       – Also has inherently lower latency due to:
         • Memory-based storage
         • Client-side caching
     [Chart: access latency (msec) vs. throughput (accesses/sec), SOSS vs. DBMS]
  22. Putting it Together: How SOSS Works
     • Creating or updating an object:
       – Client connects to a SOSS service instance and makes a request.
       – Local SOSS service load-balances the request to a selected host.
       – Selected host creates the object and one or two remote replicas.
     [Diagram: a client and four SOSS servers]
  23. How SOSS Works
     • Reading an object:
       – Client connects to the SOSS service and makes a request.
       – Local SOSS service forwards the request to the selected host.
       – Selected host returns the object's data.
       – Requesting host caches the object for future reads.
     [Diagram: a client and four SOSS servers]
  24. How SOSS Works
     • Adding a new host:
       – Neighboring hosts detect SOSS on the new host.
       – Hosts automatically establish a new membership.
       – Neighboring hosts migrate objects to the new host to rebalance the load.
     [Diagram: five SOSS servers]
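Rebalancing when a host joins can be sketched with a hypothetical partition model. Hosts own partitions rather than individual keys. The most-loaded hosts hand partitions to the new host until loads differ by at most one. The class `Rebalancer` and its greedy policy are illustrative assumptions, not SOSS's algorithm.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical rebalancing: owner.get(p) is the host that owns partition p.
// When a host joins, the busiest hosts hand partitions to it until loads differ
// by at most one. Only the handed-over partitions (and their objects) migrate.
class Rebalancer {
    static int addHost(List<Integer> owner, int newHost) {
        Map<Integer, Integer> load = new HashMap<>();
        for (int h : owner) load.merge(h, 1, Integer::sum);
        load.put(newHost, 0);
        int moved = 0;
        while (true) {
            int busiest = -1;
            for (Map.Entry<Integer, Integer> e : load.entrySet()) {
                if (busiest < 0 || e.getValue() > load.get(busiest)) busiest = e.getKey();
            }
            if (load.get(busiest) - load.get(newHost) <= 1) break; // balanced
            int p = owner.indexOf(busiest); // any partition owned by the busiest host
            owner.set(p, newHost);          // migrate it to the new host
            load.merge(busiest, -1, Integer::sum);
            load.merge(newHost, 1, Integer::sum);
            moved++;
        }
        return moved;
    }
}
```

For example, suppose 64 partitions are spread evenly over 4 hosts (16 each). Adding a fifth host moves 12 partitions, leaving 13 on each original host and 12 on the new one. The other 52 partitions, and their objects, stay where they are.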
  25. Reason #3: High Availability
     • Recovering from a host failure:
       – Host or NIC fails.
       – Neighboring hosts detect the heartbeat failure.
       – Hosts establish a new membership.
       – A neighboring host creates a new object replica to "self-heal."
     [Diagram: four SOSS servers, one stopped]
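The self-healing step can be sketched with a hypothetical partition table in which each partition has a primary and one replica. When a host fails, a partition that lost its primary promotes the surviving replica. Every partition that lost a copy then gets a new replica on another live host. The sketch assumes heartbeat-based failure detection and membership agreement have already happened. `Partition` and `SelfHeal` are illustrative names, and placing the new replica on the first eligible live host ignores load balancing.

```java
import java.util.List;

// Hypothetical partition record: one primary copy and one replica (-1 = none).
class Partition {
    int primary;
    int replica;

    Partition(int primary, int replica) {
        this.primary = primary;
        this.replica = replica;
    }
}

class SelfHeal {
    // Restores one primary plus one replica for every partition touched by the failure.
    static void onHostFailure(List<Partition> partitions, int failed, List<Integer> liveHosts) {
        for (Partition p : partitions) {
            if (p.primary == failed) {
                p.primary = p.replica; // promote the surviving replica
                p.replica = -1;
            } else if (p.replica == failed) {
                p.replica = -1;        // replica lost; primary keeps serving requests
            } else {
                continue;              // partition unaffected by this failure
            }
            for (int h : liveHosts) {  // create a new replica on another live host
                if (h != p.primary) {
                    p.replica = h;
                    break;
                }
            }
        }
    }
}
```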
  26. SOSS: Integrated High Availability
     • Peer-to-peer architecture for maximum redundancy & scalability
     • Fully integrated data replication for data redundancy, scalability, and ease of use:
       – Partial replicas ensure scalable storage and throughput.
       – Per-server and per-client caches ensure fast access.
     • Self-discovery and self-healing for high availability and ease of use
     • Patented quorum algorithm for reliable updating with scalability
     [Diagram: a client application retrieves a cached copy through its client library; cache services hold the object's copy and its replica on different hosts]
  27. Reason #4: Sharing Data Across the Farm
     The first step for server farms (1998): load-balanced, stateless Web applications.
     • Without the ability to share data, we need "sticky" sessions (no high availability!).
     • Or we can overload the database server.
     • Or we can share data across the farm in a distributed data grid for both scalability & high availability.
     [Diagram: Internet → Web servers, each running the SOSS service → DBMS server]
  28. The Evolution in DDGs and Data Sharing
     Drivers:
     • Scaling data access & analysis is critical to competitiveness.
     • Server farms & the cloud are now mainstream computing platforms.
     • Data access is a key bottleneck.
     • Short dev. cycles are mandatory.
     • Standard APIs are emerging.
     [Chart: market penetration from 2005 to 2011, rising through session-state storage, application caching, grid computing, platform-wide usage, and data analysis. Phases: early adoption on Web and app. server farms for speed and high availability; expansion to new verticals (e.g., financial services) for data & compute grids; cloud computing using industry-standard APIs]
  29. Data Sharing: a Closer Look
     • Advantages of sharing data in a distributed data grid:
       – Boosts application performance and offloads the DBMS.
       – Advances & simplifies the programming model:
         • Allows "stateful" business objects
         • Keeps object/relational mapping at the data access layer
     • Examples: session & profile data, business objects, workflow state
     • Requirements of a distributed data grid:
       – Coherent storage so all clients see a consistent view
       – Easy-to-use APIs
       – Integrated object locking to enable coordinated updating
       – High availability to avoid data loss if a server fails
       – Advanced features to enable effective use of the grid (e.g., parallel query, map/reduce analysis)
  30. Basic APIs for Data Access
     • Are easy to use in C#, Java, or C/C++.
     • Store objects in the grid as serialized blobs.
     • Primarily use string or numeric keys to identify objects.
     • Group objects into name spaces ("named caches").

         // Read and update object (C#):
         MyClass retrievedObj;
         retrievedObj = cache["myObj"] as MyClass;
         retrievedObj.var1 = "Hello, again!";
         cache["myObj"] = retrievedObj;
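The basic data model described above can be modeled in process with standard Java serialization: named caches, string keys, and objects stored as serialized blobs. This is a hypothetical sketch for illustration. `BlobStore` is not part of the ScaleOut API, and a real grid would also partition, replicate, and lock these entries.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical model: named caches (name spaces) map string keys to serialized blobs.
class BlobStore {
    private final Map<String, Map<String, byte[]>> caches = new HashMap<>();

    void put(String cacheName, String key, Serializable obj) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj); // serialize the object into a blob
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        caches.computeIfAbsent(cacheName, name -> new HashMap<>()).put(key, bytes.toByteArray());
    }

    Object get(String cacheName, String key) {
        Map<String, byte[]> cache = caches.get(cacheName);
        byte[] blob = (cache == null) ? null : cache.get(key);
        if (blob == null) return null; // no such name space or key
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(blob))) {
            return in.readObject(); // every read deserializes the blob again
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

The same key can exist in different named caches. `put("MyCache", "myObj", ...)` and `put("OtherCache", "myObj", ...)` therefore store two independent objects. Every `get` deserializes the blob, which is the cost a client-side near cache avoids.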
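  31. Example: Named Cache Access (Java)

         import java.util.UUID;
         // SossCache, SossCacheFactory, and CachedObjectId are classes from the
         // ScaleOut Java client library; their imports are not shown on the slide.

         class NamedCacheExample {
             // "throws Exception" because the client API may throw checked exceptions.
             public static void main(String[] argv) throws Exception {
                 // Initialize string object to be stored:
                 String s = "Test string";
                 // Create a cache collection:
                 SossCache cache = SossCacheFactory.getCache("MyCache");
                 // Store object in ScaleOut StateServer (SOSS):
                 CachedObjectId id = new CachedObjectId(UUID.randomUUID());
                 cache.put(id, s);
                 // Read object stored in SOSS:
                 String answerJNC = (String) cache.get(id);
                 // Remove object from SOSS:
                 cache.remove(id);
             }
         }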
  32. Example: Named Cache Access (C#)

         static void Main(string[] args)
         {
             // Initialize object to be stored:
             SampleClass sampleObj = new SampleClass();
             sampleObj.var1 = "Hello, SOSS!";
             // Create a cache:
             NamedCache cache = CacheFactory.GetCache("myCache");
             // Store object in the distributed cache:
             cache["myObj"] = sampleObj;
             // Read and update object stored in cache:
             SampleClass retrievedObj = null;
             retrievedObj = cache["myObj"] as SampleClass;
             retrievedObj.var1 = "Hello, again!";
             cache["myObj"] = retrievedObj;
             // Remove object from the cache (assigning null removes it):
             cache["myObj"] = null;
         }
  33. Fully Distributed Locking
     • Goal: synchronize access to a stored object by multiple client threads.
     • Two mechanisms: pessimistic and optimistic locking.
     • Pessimistic locking uses read-modify-write semantics:
       – Can be set as the default for all objects within a named cache.
       – Reads to locked objects are automatically retried.
       – Locks have timeouts to handle client failures.
       – Simple reads and updates can bypass locks.

         string myObj = (string)cache.Retrieve("key", true); // read and lock
         ...
         cache.Update("key", "new value", true);             // update and unlock

     • Optimistic locking uses the object's version number to allow or inhibit an update:
       – The user supplies the version number from the read to a locking update.
       – Benefit: one trip to the server if there is a high probability of success.
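Optimistic locking, as described above, can be sketched with a version number per object. A read returns the value and its version, and an update succeeds only if the version supplied by the caller still matches. `VersionedStore` and `updateIfVersion` are hypothetical names used for illustration, not the ScaleOut API. Here `synchronized` stands in for the atomic check-and-update that the grid would perform on the server side.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical versioned store illustrating optimistic locking.
class VersionedStore {
    static final class Versioned {
        final String value;
        final long version;

        Versioned(String value, long version) {
            this.value = value;
            this.version = version;
        }
    }

    private final Map<String, Versioned> data = new HashMap<>();

    synchronized void insert(String key, String value) {
        data.put(key, new Versioned(value, 1));
    }

    synchronized Versioned read(String key) {
        return data.get(key);
    }

    // Single round trip: compare the caller's version and update atomically.
    synchronized boolean updateIfVersion(String key, String newValue, long expectedVersion) {
        Versioned cur = data.get(key);
        if (cur == null || cur.version != expectedVersion) {
            return false; // another client updated the object since it was read
        }
        data.put(key, new Versioned(newValue, cur.version + 1));
        return true;
    }
}
```

If `updateIfVersion` returns `false`, the caller re-reads the object and retries. When conflicts are rare, the common case needs only the single update round trip.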
  34. Advanced API Features
     • Object timeouts
     • Distributed locking for coordinating access
     • Object dependency relationships
     • Asynchronous events on object changes
     • Automatic access to a backing store
     • Object eviction on high memory usage
     • Object metadata
     • Bulk insertion
     • Authentication
     • Custom serialization for compression & encryption
     • Parallel query based on metadata or properties
  35. Parallel Data Analysis
     • The goal:
       – Quickly analyze a large set of data for patterns and trends.
       – Take advantage of scalable computing to shorten "time to insight."
     • Applications:
       – Search
       – Financial services
       – Business intelligence
       – Risk analysis
       – Weather simulation
       – Structural modeling
       – Fluid-flow analysis
       – Climate modeling
     [Image: NCAR Community Climate Model, http://www.vets.ucar.edu/vg/IPCC_CCSM3/index.shtml]
  36. Reason #5: Parallel Data Analysis
     • Rapid analysis of large data sets has become a top priority.
     • Distributed data grids enable fast parallel analysis:
       – Automatically harness the power of many servers and cores.
       – Offer a simple, easy-to-use development model.
       – Deliver top performance for memory-based datasets.
     • Key attributes of DDG-based data analysis:
       – Data is memory-based and data motion is minimized.
       – The programming model is object-oriented; parallelism is automatic.
     [Chart: PMI vs. random access throughput for 2 MB time-series objects, in objects per second (up to 600), SOSS PMI vs. random access, as the grid grows from 4 to 32 nodes and from 512 to 4,096 objects]
  37. Parallel Query
     • Goal: identify a set of objects with selected properties.
     • Uses all grid servers to scale query performance.
     • Uses fast, optimized lookup on each grid server.
     [Diagram: query the DDG in parallel; merge the keys into a list; sequentially analyze all queried objects]
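The parallel query pattern above can be sketched with a Java parallel stream over hypothetical in-memory partitions, one per "grid server." Every partition is queried concurrently, and the matching keys are merged into one list. This sketch scans each partition with a predicate. The slide's "fast, optimized lookup" suggests per-server indexes, which are not modeled here. `ParallelQuery.queryKeys` is an illustrative name and is not the ScaleOut `queryKeys` method used in the Java query example.

```java
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;
import java.util.stream.Collectors;

// Hypothetical sketch: each map stands for the objects held by one grid server.
class ParallelQuery {
    static <T> List<String> queryKeys(List<Map<String, T>> partitions, Predicate<T> filter) {
        return partitions.parallelStream()                  // query all partitions concurrently
                .flatMap(partition -> partition.entrySet().stream()
                        .filter(entry -> filter.test(entry.getValue())) // local match
                        .map(Map.Entry::getKey))
                .sorted()                                   // merge the keys into one list
                .collect(Collectors.toList());
    }
}
```

The result is only the list of matching keys. As on the slide, the queried objects can then be read and analyzed sequentially.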
  38. Parallel Query Example (Java)
     • Mark class properties as indexes for SOSS query:

         public class Stock implements Serializable {
             private String ticker;
             private int totalShares;
             private double price;

             @SossIndexAttribute
             public String getTicker() { return ticker; }
             …
         }

     • Define a query using these properties:

         NamedCache cache = CacheFactory.getCache("Stocks", false);
         Set keys = cache.queryKeys(Stock.class,
                 or(equal("ticker", "GOOG"), equal("ticker", "ORCL")));
  39. Parallel Query Example (C#)
     • Mark class properties as indexes for SOSS query:

         class Stock
         {
             [SossIndex]
             public string Ticker { get; set; }
             public decimal TotalShares { get; set; }
             public decimal Price { get; set; }
         }

     • Define a query using these properties. Objects are automatically read into memory:

         NamedCache cache = CacheFactory.GetCache("Stocks");
         var q = from s in cache.QueryObjects<Stock>()
                 where s.Ticker == "GOOG" || s.Ticker == "ORCL"
                 select s;
         Console.WriteLine("{0} Stocks found", q.Count());
  40. Parallel Method Invocation ("Map/Reduce")
     • Goal: analyze a set of objects with selected properties.
     • Executes the user's code in parallel across the grid.
     • Uses a parallel query to select objects for analysis.
     [Diagram: the in-memory distributed data grid runs the map/reduce analysis: analyze data (map), then combine results (reduce)]
  41. Example in Financial Services
     Analyze trading strategies across stock histories.
     Why?
     • Back-testing systems help guard against risks in deploying new trading strategies.
     • Performance is critical for "first to market" advantage.
     • Uses a significant amount of market data and computation time.
     How?
     • Write method E to analyze trading strategies across a single stock history.
     • Write method M to merge two sets of results.
     • Populate the data store with a set of stock histories.
     • Run method E in parallel on all stock histories.
     • Merge the results with method M to produce a report.
     • Refine and repeat…
  42. Stage the Data for Analysis
     • Step 1: Populate the distributed data grid with objects, each of which represents a price history for a ticker symbol.
  43. Code the Eval and Merge Methods
     • Step 2: Write a method to evaluate a stock history based on parameters:

         Results EvalStockHistory(StockHistory history, Parameters params)
         {
             <analyze trading strategy for this stock history>
             return results;
         }

     • Step 3: Write a method to merge the results of two evaluations:

         Results MergeResults(Results results1, Results results2)
         {
             <merge both results>
             return results;
         }

     • Notes:
       – This code can be run as a sequential calculation on in-memory data.
       – No explicit accesses to the distributed data grid are used.
  44. Run the Analysis
     • Step 4: Invoke parallel evaluation and merging of results:

         Results Invoke(EvalStockHistory, MergeResults, querySpec, params);

     [Diagram: EvalStockHistory() runs on every selected object; MergeResults() combines the results]
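The Invoke step can be approximated locally to show the programming model. The eval function runs on each selected object in parallel, and the merge function combines partial results until one result remains. This is a hypothetical sketch using a Java parallel stream's `reduce`, not ScaleOut's PMI engine, and `Pmi.invoke` is an illustrative name. The sketch assumes the query step has already produced the list of objects and that `params` is captured by the eval lambda. As with any parallel reduction, the merge function must be associative.

```java
import java.util.List;
import java.util.function.BinaryOperator;
import java.util.function.Function;

// Hypothetical local analogue of Invoke(EvalStockHistory, MergeResults, ...).
class Pmi {
    static <T, R> R invoke(List<T> selected, Function<T, R> eval, BinaryOperator<R> merge) {
        return selected.parallelStream()
                .map(eval)     // "map": evaluate each object independently, in parallel
                .reduce(merge) // "reduce": merge partial results (merge must be associative)
                .orElseThrow(() -> new IllegalArgumentException("no objects selected"));
    }
}
```

For example, eval can return 1 for a price history whose last price exceeds its first and 0 otherwise. With `Integer::sum` as the merge, `invoke` then counts the histories that gained in value.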
  45. [Diagram: the client starts the parallel analysis; .eval() runs on each stock history to produce results; pairs of results are combined with .merge() in successive rounds until a single result is returned to the client]
  46. Advantages of Using PMI
     • Fast:
       – Automatically scales application performance across grid servers.
       – Automatically uses all server cores.
       – Minimizes data motion between servers.
       – API-based invocation delivers very low latency.
     • Easy to use:
       – User writes simple, "in-memory" code; all grid accesses are implicit.
       – Matches the Java/C# model of object-oriented collections.
       – Requires no tuning.
     [Diagram: the PMI engine running on every core of each grid service host]
  47. Comparison of DDGs and File-Based M/R

       Attribute               DDG                            File-based M/R
       Data set size           Gigabytes -> terabytes         Terabytes -> petabytes
       Data repository         In-memory                      File / database
       Data view               Queried object collection      File-based key/value pairs
       Development time        Low                            High
       Automatic scalability   Yes                            Application dependent
       Best use                Quick-turn analysis of         Complex analysis of
                               memory-based data              large datasets
       I/O overhead            Low                            High
       Cluster mgt.            Simple                         Complex
       High availability       Memory-based                   File-based
  48. DDG Minimizes Data Motion
     • File-based map/reduce must move data to memory for analysis.
     • A memory-based DDG analyzes data in place.
     [Diagram: M/R servers run E on data D copied into server memory from a file system / database, vs. grid servers running E directly on data D held in the distributed data grid]
  49. [Diagram: the same parallel eval/merge flow as slide 45, with file I/O steps inserted before .eval(), between the .merge() rounds, and before results are returned to the client]
  50. Performance Impact of Data Motion
     Measured random access to DDG data to simulate file I/O.
  51. PMI Delivers 16X Speedup Over Hadoop
     [Chart: throughput comparison (objects/sec, up to 800) for SOSS PMI, Hadoop/SOSS, and Hadoop on 4, 6, and 8 servers]
  52. Reason #6: Simplify Data Migration
• DDGs enable seamless data migration across on-premises sites and the cloud:
  – Automatically access remote data as needed.
  – Efficiently manage WAN bandwidth.
  – Enable full data synchronization across sites.
[Diagram: In-Memory Distributed Data Grid]
  53. Example: Web Farm Cloud-Bursting
• DDGs bridge on-premises and cloud-based in-memory storage of Web session state.
• The DDG automatically migrates session-state objects into the cloud on demand.
• This enables seamless access to data across multiple sites.
[Diagram: a user's on-premises application servers running SOSS and cloud application virtual servers behind a Web load balancer; SOSS automatically migrates data between the on-premises distributed data grid (with its backing store) and a cloud-based distributed cache on virtual servers]
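The on-demand migration on slides 52 and 53 behaves like a read-through cache between sites. The sketch below is hypothetical and is not how ScaleOut GeoServer is implemented: `SiteGrid` is an invented class, a Python dict stands in for each site's in-memory grid, and the WAN is just a method call. It covers only "access remote data as needed"; keeping updates synchronized across sites is deliberately left out.

```python
class SiteGrid:
    """Hypothetical in-memory store for one site (not ScaleOut GeoServer's API)."""

    def __init__(self, name, remote=None):
        self.name = name
        self.local = {}          # objects currently held at this site
        self.remote = remote     # another SiteGrid reachable over the WAN, if any
        self.wan_fetches = 0     # how many objects this site pulled across the WAN

    def put(self, key, value):
        self.local[key] = value

    def get(self, key):
        if key in self.local:            # fast path: data already migrated here
            return self.local[key]
        if self.remote is None:
            raise KeyError(key)
        value = self.remote.get(key)     # pull from the remote site on demand
        self.wan_fetches += 1
        self.local[key] = value          # keep a local copy so later reads stay local
        return value


# Usage: a cloud site bursting from an on-premises web farm.
on_prem = SiteGrid("on-premises")
on_prem.put("session:42", {"cart": ["book"]})
cloud = SiteGrid("cloud", remote=on_prem)

first = cloud.get("session:42")   # crosses the WAN once
second = cloud.get("session:42")  # served from the cloud site's local store
print(cloud.wan_fetches)          # prints: 1
```

Fetching only on a miss is what keeps WAN traffic proportional to the objects actually used at the remote site rather than to the whole data set.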
  54. Example: Global Access to Shared Data
[Diagram: mirrored data centers and satellite data centers, each running SOSS servers in its own distributed data grid, linked into a global distributed data grid]
  55. What to Look for in a DDG Product
• Ease of Use: SSI's products have an unusually high level of integration and focus on automatic operation. This dramatically simplifies deployment and management of a distributed data grid.
• Performance: In direct comparison tests, SSI demonstrates faster access performance and scalability in key benchmarks.
• Architecture: SSI's architecture integrates both scalability and high availability and uniformly applies key architectural principles, such as peer-to-peer design.
• Portability: Seamless interoperability across Windows and Unix (Linux, Solaris, etc.) operating systems was designed into SSI's architecture from the outset.
• Data Analysis: Advanced capabilities for "map/reduce"-style parallel data analysis open up important new applications for distributed data grids.
• Manageability: SSI's comprehensive tools for managing distributed data grids, such as its object browser and parallel backup and restore utility, are unique in the industry.
  56. SOSS Maximizes Ease of Use
Grid servers self-aggregate, self-heal, and automatically load-balance.
• Tree list shows: store status, remote stores, and remote client configuration.
• Host list pane shows: host status.
• Host configuration: just select the subnet shared by all hosts.
  57. Real-Time Performance Charting
  58. SOSS Object Browser
• Simplifies development.
• Provides extremely useful visibility into grid usage.
• Allows grid objects to be analyzed and managed.
  59. SOSS Parallel Backup and Restore
• Enables grid contents (or portions of them) to be backed up or restored in parallel to either:
  – separate file systems on all caching servers, or
  – a single network file share.
• Creates backups or snapshots for later analysis.
• Makes full use of SOSS's parallel implementation to deliver highly scalable performance and high availability.
[Diagram: eight SOSS servers connected by Ethernet]
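Parallel backup scales because each server writes its own share of the grid at the same time. The sketch below is hypothetical and is not SOSS's backup utility: threads stand in for grid servers, each partition is a dict written as one JSON file, and a single temporary directory plays the role of the shared network file share. Restore reads the files in parallel and rebuilds the combined contents.

```python
import json
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor


def backup_partition(server_id, objects, directory):
    """Write one server's objects to its own file (hypothetical file layout)."""
    path = os.path.join(directory, f"server-{server_id}.json")
    with open(path, "w", encoding="utf-8") as f:
        json.dump(objects, f)
    return path


def parallel_backup(partitions, directory):
    """Back up every server's partition concurrently; returns the backup file paths."""
    with ThreadPoolExecutor(max_workers=len(partitions)) as pool:
        futures = [pool.submit(backup_partition, i, objs, directory)
                   for i, objs in enumerate(partitions)]
        return [f.result() for f in futures]


def load_partition(path):
    """Read one server's backup file."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def parallel_restore(paths):
    """Read all backup files concurrently and rebuild the combined grid contents."""
    restored = {}
    with ThreadPoolExecutor(max_workers=len(paths)) as pool:
        for part in pool.map(load_partition, paths):
            restored.update(part)
    return restored


# Usage: two simulated servers, backed up to one shared directory.
partitions = [{"a": 1, "b": 2}, {"c": 3}]
with tempfile.TemporaryDirectory() as share:
    paths = parallel_backup(partitions, share)
    restored = parallel_restore(paths)
print(sorted(restored.items()))  # prints: [('a', 1), ('b', 2), ('c', 3)]
```

Writing to separate file systems on each server would only change `directory` per partition; the one-file-per-server layout is what allows the writes to proceed independently.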
  60. Recap: Top 6 Reasons to Use a DDG
1. Faster access time for business logic state or database data
2. Scalable throughput to match a growing workload and keep response times low
3. High availability to prevent data loss if a grid server (or network link) fails
4. Shared access to data across the server farm
5. Advanced capabilities for quickly and easily mining data using scalable “map/reduce”-style analysis
6. Transparent data migration across multiple sites and the cloud
[Chart: Access Latency vs. Throughput. Access latency (msec) vs. throughput (accesses/sec) for Grid and DBMS]
  61. Thank you for joining us today!
Distributed Data Grids for Server Farms & High Performance Computing
www.scaleoutsoftware.com