Real-time analysis using an in-memory data grid - Cloud Expo 2013

ScaleOut technical session at Cloud Expo 2013 in NY. Covers the use of in-memory data grids for real-time analysis of fast-changing data. Includes a financial services example.

Published in: Technology, Business

Transcript

  • 1. Performing Real-Time Analytics with In-Memory Data Grids
      Cloud Expo, June 10, 2013
      Mikhail Sobolev (sobolev@scaleoutsoftware.com)
      David Brinker (daveb@scaleoutsoftware.com)
      Copyright © 2013 by ScaleOut Software, Inc.
  • 2. Agenda
      • What is an In-Memory Data Grid (IMDG)?
      • Top Benefits of IMDGs
      • The Need for Real-Time Analytics
      • Example: A Platform for Managing Hedging Strategies
      • Using an IMDG to Perform Real-Time Analysis
      • Benchmark Results
      • Integrating an IMDG into Hadoop
  • 3. About the Speakers
      • Dr. Mikhail Sobolev, Lead Java Architect
          • Ph.D. from Moscow Institute of Physics and Technology
          • Research and consulting focus in parallel computing
          • Responsible for development of scalable software services in Java
      • David Brinker, COO
          • 20 years software business and executive management experience
          • Mentor Graphics, Cadence, Webridge
      • Company: ScaleOut Software
          • Develops and markets IMDG products
          • Founded in September 2003
          • Offices in Bellevue, WA and Beaverton, OR
          • Eight years market experience in Windows & Linux
  • 4. ScaleOut Software Products
      • ScaleOut StateServer®
          • Flagship product
          • IMDG middleware for Windows and Linux
          • Industry-leading performance and ease of use
      • ScaleOut GeoServer® adds
          • WAN-based data replication for DR
          • Breakthrough technology for global data access
      • ScaleOut Analytics Server™ adds
          • Real-time data analysis for operational data
          • Comprehensive management tools
      • ScaleOut hServer™ adds
          • 1st step for Hadoop real-time analytics
          • Accelerates data access and execution.
      [Diagram: ScaleOut StateServer In-Memory Data Grid, shown as four grid services]
  • 5. What is an In-Memory Data Grid?
      In-memory storage for fast updates and retrieval of live data
      • Fits in the business logic layer:
          • Stores collections of Java/.NET objects shared by multiple clients.
          • Uses create/read/update/delete and query APIs to access data.
      • Implemented across a cluster of servers or VMs:
          • Scales storage and throughput by adding servers.
          • Provides high availability in case a server fails.
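The access pattern on slide 5 (create/read/update/delete plus queries over objects spread across several servers) can be illustrated with a small sketch. This is not ScaleOut's API: `ToyGrid`, its method names, and the hash-based placement are assumptions made for illustration, and each "server" is just a Python dictionary in one process.

```python
import hashlib


class ToyGrid:
    """Toy stand-in for an IMDG: objects are spread over several 'servers'."""

    def __init__(self, server_count):
        # Each dict plays the role of one grid server's memory.
        self.servers = [{} for _ in range(server_count)]

    def _server_for(self, key):
        # A stable hash of the key decides which server holds the object.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        index = int.from_bytes(digest[:4], "big") % len(self.servers)
        return self.servers[index]

    def create(self, key, obj):
        store = self._server_for(key)
        if key in store:
            raise KeyError(f"{key!r} already exists")
        store[key] = obj

    def read(self, key):
        return self._server_for(key).get(key)

    def update(self, key, obj):
        store = self._server_for(key)
        if key not in store:
            raise KeyError(f"{key!r} not found")
        store[key] = obj

    def delete(self, key):
        return self._server_for(key).pop(key, None)

    def query(self, predicate):
        # A query visits every server and keeps the matching objects.
        return [obj for store in self.servers
                for obj in store.values() if predicate(obj)]


grid = ToyGrid(server_count=3)
grid.create("strategy:tech", {"sector": "high-tech", "value": 120.0})
grid.create("strategy:energy", {"sector": "energy", "value": 80.0})
grid.update("strategy:energy", {"sector": "energy", "value": 95.0})
matches = grid.query(lambda s: s["value"] > 90)
```

Adding capacity in this toy model means adding dictionaries to `servers`; a real grid would also rebalance and replicate objects, which the sketch leaves out.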
  • 6. Scaling Data Access Using an IMDG
      Example: Cloud-Hosted App
      • Application runs as multiple virtual servers (VS).
      • Application instances store and retrieve LOB data from cloud-based file system or database.
      • Applications need fast, scalable storage for live data.
      • In-memory data grid runs as multiple virtual servers to provide “elastic” in-memory storage for live data.
  • 7. Where IMDGs Are Deployed
      • As a “vertical” storage tier:
          • Runs as middleware software.
          • Adds missing storage layer to boost performance.
          • Uses out-of-process memory.
          • Avoids repeated trips to a backing store.
      • As a “horizontal” storage tier:
          • Allows data sharing among servers.
          • Scales performance & capacity.
          • Adds high availability.
          • Can be used independently of backing storage.
      [Diagram labels: Processor Cache; L2 Cache; Application Memory (“In-Process”); In-Memory Data Grid (“Out-of-Process”); Backing Storage]
  • 8. The Secret to Fast Access Time
      • IMDG incorporates a client-side in-process cache (“near cache”):
          • Transparent to the application
          • Holds recently accessed data
      • Boosts performance:
          • Eliminates repeated network data transfers & deserialization
          • Reduces access times to near “in-process” latency
      • Is automatically updated if the grid is updated
      • Supports various coherency models (coherent, polled, event-driven)
      [Diagram labels: Application Memory (“In-Process”); Client-side Cache (“In-Process”); In-Memory Data Grid (“Out-of-Process”)]
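One way to picture the near cache on slide 8 is a client-side dictionary that is trusted only while the grid's version number for an object is unchanged. The sketch below assumes that version-check scheme; `Grid`, `NearCache`, and the `fetches` counter are hypothetical names, and the slides do not say how ScaleOut implements its coherency models.

```python
class Grid:
    """Hypothetical out-of-process grid that counts full object transfers."""

    def __init__(self):
        self._data = {}      # key -> (version, value)
        self.fetches = 0     # number of full transfers served

    def put(self, key, value):
        # Every write bumps the object's version number.
        version = self._data.get(key, (0, None))[0] + 1
        self._data[key] = (version, value)

    def version(self, key):
        # Cheap metadata check: no payload is transferred.
        return self._data[key][0]

    def fetch(self, key):
        # Full transfer; in a real grid this also costs deserialization.
        self.fetches += 1
        return self._data[key]


class NearCache:
    """Client-side cache that reuses a local copy while its version is current."""

    def __init__(self, grid):
        self.grid = grid
        self._local = {}     # key -> (version, value)

    def get(self, key):
        cached = self._local.get(key)
        if cached is not None and cached[0] == self.grid.version(key):
            return cached[1]             # hit: served from process memory
        entry = self.grid.fetch(key)     # miss or stale: refresh from the grid
        self._local[key] = entry
        return entry[1]


grid = Grid()
cache = NearCache(grid)
grid.put("price:ACME", 10.0)
first = cache.get("price:ACME")    # full fetch
second = cache.get("price:ACME")   # served locally
grid.put("price:ACME", 11.0)       # grid update makes the local copy stale
third = cache.get("price:ACME")    # refetched
```

The application only ever calls `get`, which is what makes the cache transparent; the version check still needs a small round trip, so it reduces transfer and deserialization cost rather than removing network access entirely.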
  • 9. Global Data Integration
      • IMDGs enable seamless data access across on-premise sites and cloud-based deployments:
          • Automatically access remote data as needed.
          • Efficiently manage WAN bandwidth.
          • Enable full data coherency across sites.
      • Supports multiple usage models:
          • Replication for DR
          • Remote access
          • Synchronized read/write
  • 10. Example: Web Farm Cloud-Bursting
      • IMDG bridges on-premise and cloud-based in-memory storage of Web session state.
      • IMDG automatically migrates session-state objects into the cloud on demand.
      • This enables seamless access to data across multiple sites.
  • 11. Top Benefits of IMDGs
      In-Memory Data Grid is middleware software which provides:
      1. Fast access time for fast-changing, “live” data
      2. Scalable throughput and storage capacity to match a growing workload and keep response times low
      3. High availability to prevent data loss if a grid server (or network link) fails
      4. Shared access to data across the server farm
      5. Global data access across multiple sites and the cloud
      6. And … fast data analysis for quickly and easily mining data using “map/reduce”
      [Chart: Access Latency vs. Throughput for Grid and DBMS, annotated “Faster” and “Scales”]
  • 12. IMDGs Analyze Live Data
      • Traditional “big data” analysis platforms analyze offline data:
          • Example: Hadoop
          • Very large, static datasets
          • Data is often copied from other disk-based storage systems to a distributed file system for analysis.
      • IMDGs store and analyze online data:
          • Fast-changing, operational data
          • Data storage is memory-based.
          • Data motion is minimized for fast, continuous analysis.
  • 13. Online Systems Need Real-Time Analysis
      A few examples:
      • Equity trading: to minimize risk during a trading day
      • Ecommerce: to optimize real-time shopping activity
      • Reservations systems: to identify issues, reroute, etc.
      • Credit cards: to detect fraud in real time
      • Smart grids: to optimize power distribution & detect issues
  • 14. Example in Financial Services
      A platform for managing hedging strategies:
      • A hedge fund manages a set of hedging strategies:
          • Strategies can cover various market sectors, such as high-tech, automotive, energy, consumer, real estate, etc.
          • Each strategy contains a list of holdings and rules for managing the holdings (such as target allocations).
      • Updates to market data continuously arrive during the trading day.
      • Challenge: The hedge fund must be able to quickly update and analyze its hedging strategies and provide alerts to traders.
  • 15. The Result: Real-Time Alerts
      • Deliver a stream of alerts to traders within a few seconds.
      • Enable the trader to examine strategy details in real time.
  • 16. The Solution: Real-Time Analytics Using an IMDG
      • The IMDG holds the set of strategy objects as an in-memory collection.
      • Updates to market data continuously flow through the IMDG.
      • The IMDG performs repeated map/reduce analysis on hedging strategies every second.
      • Each analysis iteration both updates and analyzes every strategy object.
      • The IMDG collects alerts after each analysis and delivers them to the trader.
  • 17. The Analysis Algorithm
      • Analyze every selected strategy object in parallel within the IMDG:
          • Update the strategy’s positions with latest market prices.
          • Evaluate the strategy’s rules to see if a trade is needed.
              • Example: Alert if current allocation exceeds target threshold.
          • Generate an alert if holdings need to be changed.
      • Merge the results across all strategy objects to create a set of alerts.
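The per-strategy analysis on slide 17 can be sketched as a pair of functions: one that updates and evaluates a single strategy, and one that merges alert lists. The `Strategy` fields, the 5% tolerance, and the ticker symbols below are illustrative assumptions; the slides do not give the actual rule set or data model.

```python
from dataclasses import dataclass, field
from functools import reduce


@dataclass
class Strategy:
    name: str
    shares: dict                 # symbol -> number of shares held
    targets: dict                # symbol -> target fraction of portfolio value
    tolerance: float = 0.05      # assumed allowed overshoot above the target
    prices: dict = field(default_factory=dict)


def analyze(strategy, market_prices):
    """Update one strategy with the latest prices and return its alerts."""
    # Update the strategy's positions with the latest market prices.
    for symbol in strategy.shares:
        if symbol in market_prices:
            strategy.prices[symbol] = market_prices[symbol]
    values = {s: n * strategy.prices[s] for s, n in strategy.shares.items()}
    total = sum(values.values())
    if total == 0:
        return []
    # Evaluate the rule: alert if the current allocation exceeds the target.
    alerts = []
    for symbol, value in values.items():
        allocation = value / total
        if allocation > strategy.targets[symbol] + strategy.tolerance:
            alerts.append((strategy.name, symbol, round(allocation, 3)))
    return alerts


def merge(left, right):
    """Combine two alert lists into one."""
    return left + right


strategies = [
    Strategy("tech", {"AAA": 100, "BBB": 100}, {"AAA": 0.5, "BBB": 0.5}),
    Strategy("energy", {"CCC": 50, "DDD": 50}, {"CCC": 0.5, "DDD": 0.5}),
]
market = {"AAA": 30.0, "BBB": 10.0, "CCC": 20.0, "DDD": 20.0}
alerts = reduce(merge, (analyze(s, market) for s in strategies), [])
```

Here the "tech" strategy drifts to a 75% allocation in AAA and produces one alert, while "energy" stays on target. In a grid, `analyze` would run on each server against its local strategy objects instead of in a single loop.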
  • 18. Shipping Analysis Code to the IMDG
      • IMDG creates Java or .NET execution environment for analysis:
          • Spans all IMDG servers.
          • Ensures tight integration with memory-based data storage.
      • IMDG client ships jars/assemblies to IMDG servers for execution:
          • Keeps development model simple.
          • Optionally allows pre-staging for multiple runs to shorten startup time.
          • Optionally allows automatic re-staging if code changes between runs.
      • Client starts analysis:
          • Sends invocation to the IMDG.
          • IMDG returns analysis results.
  • 19. Running the Analysis
      The parallel analysis executes in three steps:
      • Step 1: The application first selects all relevant objects in the collection with a parallel query run on all grid servers.
          • Note: Query spec matches data’s object-oriented properties.
  • 20. Running the Analysis: Step 2
      • Step 2: The IMDG automatically schedules analysis operations across all grid servers and cores.
          • The analysis runs on all objects selected by the parallel query.
          • Each grid server analyzes its locally stored objects to minimize data motion.
      • Parallel execution ensures fast completion time:
          • IMDG automatically distributes workload across servers/cores.
          • Scaling the IMDG automatically handles larger data sets.
  • 21. IMDG Minimizes Data Motion
      • File-based map/reduce must move data to memory for analysis.
      • IMDG’s memory-based computation engine analyzes data in place.
      [Diagram labels: D (data items); E; Grid Server (three); M/R Server (three); File System / Database; Server Memory; In-Memory Data Grid]
  • 22. Running the Analysis: Step 3
      • Step 3: The IMDG automatically merges all analysis results.
          • The IMDG first merges all results within each grid server in parallel.
          • It then merges results across all grid servers to create one combined result.
      • Efficient parallel merge minimizes the delay in combining all results.
      • The IMDG delivers the combined result to the trader’s display as one object.
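The three steps on slides 19 to 22 (parallel query, local analysis, two-level merge) can be put together in a minimal sketch. Threads stand in for grid servers here, and `run_on_server`, `parallel_analysis`, and the sample objects with a `drift` field are hypothetical; a real IMDG schedules this work across machines and cores itself.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def run_on_server(local_objects, selector, analyze, merge, empty):
    """Work done by one grid server on the objects it stores locally."""
    selected = [obj for obj in local_objects if selector(obj)]   # step 1: query
    results = [analyze(obj) for obj in selected]                 # step 2: analyze
    return reduce(merge, results, empty)                         # step 3a: local merge


def parallel_analysis(servers, selector, analyze, merge, empty):
    """Run the analysis on every server, then merge the per-server results."""
    with ThreadPoolExecutor(max_workers=len(servers)) as pool:
        partials = list(pool.map(
            lambda objs: run_on_server(objs, selector, analyze, merge, empty),
            servers))
    return reduce(merge, partials, empty)                        # step 3b: global merge


# Each inner list is the set of objects held by one (simulated) grid server.
servers = [
    [{"name": "tech", "sector": "high-tech", "drift": 0.08},
     {"name": "auto", "sector": "automotive", "drift": 0.01}],
    [{"name": "energy", "sector": "energy", "drift": 0.12}],
    [{"name": "retail", "sector": "consumer", "drift": 0.09}],
]
wanted = {"high-tech", "automotive", "energy"}
combined = parallel_analysis(
    servers,
    selector=lambda s: s["sector"] in wanted,
    analyze=lambda s: [s["name"]] if s["drift"] > 0.05 else [],
    merge=lambda a, b: a + b,
    empty=[],
)
```

Only the small per-server results cross the "network" in this model; the strategy objects themselves never leave the server that holds them, which is the point of merging locally first.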
  • 23. Benchmark Results
      Running a similar analysis algorithm (stock back-testing) within an IMDG:
      • IMDG hosted in Amazon cloud using 75 servers.
      • IMDG holds 1 TB of stock history data in memory.
      • IMDG handles continuous stream of updates (1.1 GB/s) while performing real-time analysis on live data.
      • Entire data set analyzed in 4.1 seconds (250 GB/s).
      • IMDG scales linearly by adding servers as workload grows.
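The benchmark figures on slide 23 can be cross-checked with a few lines of arithmetic. The per-server numbers below are derived values, not reported results: they assume 1 TB means 1024 GB and that data and work are spread evenly over the 75 servers.

```python
DATA_SET_GB = 1024        # 1 TB of stock history, taken as 1024 GB (assumption)
SCAN_SECONDS = 4.1        # time to analyze the entire data set
SERVERS = 75              # grid servers in the benchmark
UPDATE_GB_PER_S = 1.1     # continuous update stream

# Aggregate analysis throughput: matches the slide's 250 GB/s after rounding.
aggregate_gb_per_s = DATA_SET_GB / SCAN_SECONDS

# Derived per-server figures, assuming an even distribution.
per_server_gb_per_s = aggregate_gb_per_s / SERVERS
per_server_data_gb = DATA_SET_GB / SERVERS

# Volume of updates arriving while one full scan is running.
updates_during_scan_gb = UPDATE_GB_PER_S * SCAN_SECONDS

print(f"aggregate: {aggregate_gb_per_s:.1f} GB/s")
print(f"per server: {per_server_gb_per_s:.2f} GB/s over {per_server_data_gb:.1f} GB")
print(f"updates during one scan: {updates_during_scan_gb:.2f} GB")
```

Under these assumptions each server scans roughly 13.7 GB of local memory at about 3.3 GB/s, which is plausible for in-memory processing and consistent with the claim that the analysis runs on data in place.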
  • 24. Problem: Hadoop Cannot Efficiently Perform Real-Time Analytics
      • Typically used for very large, static, offline datasets
      • Data is held on disk in a file system (HDFS) or DBMS
      • Data is often copied from other disk-based storage systems to HDFS for analysis.
  • 25. Comparison of IMDGs and Hadoop
                               IMDG                             Hadoop
      Data set size            Gigabytes to terabytes           Terabytes to petabytes
      Data repository          In-memory                        File / database
      Data view                Queried object collection        File-based key/value pairs
      Development time         Low                              High
      Automatic scalability    Yes                              Application dependent
      Best use                 Real-time analysis of live,      Batch analysis of large,
                               memory-based data                static datasets
      I/O overhead             Low                              High
      Cluster mgt.             Simple                           Complex
      High availability        Memory-based                     File-based
  • 26. Enabling Hadoop to Perform Real-Time Analysis
      • Survey result from Strata 2013: 93% of Hadoop users would benefit from real-time data analytics.
      • Strategy: Integrate IMDG into Hadoop.
      • How:
          • Stage data in IMDG for fast access.
          • Thereby allow updates to data during Hadoop execution.
          • Automatically retrieve data from HDFS as necessary.
          • Enable unchanged Hadoop program structure.
          • Combine scalability of Hadoop map/reduce and IMDG.
  • 27. Accessing IMDG Data in Hadoop
      • IMDG adds Hadoop grid record reader for accessing key/value pairs held in the IMDG.
      • Hadoop programs optionally can output results to IMDG with grid record writer.
      • Applications can access and update key/value pairs as live data during analysis.
      • Grid record reader and writer optimize access to key/value pairs to eliminate network overhead.
  • 28. Using IMDG as an HDFS Cache
      • IMDG adds wrapper for HDFS record reader to cache HDFS data during program execution.
      • Hadoop automatically retrieves data from IMDG on subsequent runs.
      • Wrapper accesses IMDG to store and retrieve data with minimum network overhead.
      • Useful in multiple “what-if” analyses on one data set
      • Tests with Terasort benchmark have demonstrated 11X lower access latency over HDFS without IMDG.
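The caching wrapper on slide 28 follows a read-through pattern: the first run reads records from the slow source and stores them, and later runs are served from the cache. The sketch below shows only that pattern. `CachingRecordReader`, `open_source`, and the plain dictionary standing in for the IMDG are hypothetical, and this is not the Hadoop `RecordReader` interface or the ScaleOut hServer API.

```python
class CachingRecordReader:
    """Read-through wrapper: serve a split from the cache, else read and store it."""

    def __init__(self, split_id, open_source, grid_cache):
        self.split_id = split_id        # identifies one input split
        self.open_source = open_source  # callable returning an iterator of records
        self.grid_cache = grid_cache    # dict standing in for the IMDG

    def __iter__(self):
        cached = self.grid_cache.get(self.split_id)
        if cached is not None:
            # Subsequent runs: records come from memory, not from the source.
            yield from cached
            return
        records = []
        for record in self.open_source(self.split_id):
            records.append(record)
            yield record
        # Cache the split only after it has been read completely.
        self.grid_cache[self.split_id] = records


source_reads = []


def open_source(split_id):
    # Stand-in for opening an HDFS split; logs each time the source is read.
    source_reads.append(split_id)
    return iter([("k1", "v1"), ("k2", "v2")])


cache = {}
first_run = list(CachingRecordReader("split-0", open_source, cache))
second_run = list(CachingRecordReader("split-0", open_source, cache))
```

Because the wrapper exposes the same iteration interface as the source reader, the calling program does not change between runs, which mirrors the "unchanged Hadoop program structure" goal on slide 26.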
  • 29. Summary
      • IMDGs use in-memory storage to scale access to data for applications which process live, fast-changing data.
      • IMDGs can be deployed in the cloud and provide global data integration across sites.
      • Many applications need to perform real-time analytics on live data.
      • IMDGs can meet this need, delivering results in seconds instead of minutes or hours.
      • Hadoop was not designed for real-time analytics, but…
      • IMDGs can enable Hadoop to accelerate access to data.
  • 30. In-Memory Data Grids for Server Farms & Cloud Computing
      www.scaleoutsoftware.com