ScaleOut hServerv2: Enabling Real-Time Analytics Using Hadoop Map/Reduce

Welcome to real-time analytics for Hadoop! ScaleOut hServer V2 is the world's first in-memory execution engine for Hadoop MapReduce. Now you can analyze live data using standard Hadoop MapReduce code, in memory and in parallel, without installing and managing the Hadoop software stack. (Your Hadoop program needs only one small change.) Gone are disk I/O latencies, slow start-up times, and the headaches of managing a software environment. Benchmark tests have shown execution 20x faster than with the Apache Hadoop distribution. Now you can use Hadoop MapReduce in live applications in financial services, e-commerce, logistics, and countless other scenarios where results are needed in seconds instead of minutes or hours.

  • 1. Enabling Real-Time Analytics Using Hadoop Map/Reduce
    Briefing on New Product Release: ScaleOut hServer™ V2
    October 14, 2013
    Bill Bain, CEO; David Brinker, COO
    Copyright © 2013 by ScaleOut Software, Inc.
  • 2. What’s New Today
    ScaleOut hServer V2:
    • World’s first Hadoop MapReduce engine integrated with a scalable, in-memory data grid
    • Full Hadoop MapReduce support for “live,” fast-changing data
    • 20x performance improvement in benchmark tests
    • Significant new technology to simplify development and maximize ease of use
  • 3. About ScaleOut Software
    • Develops and markets software middleware for scaling application performance and performing real-time analytics using in-memory data storage and computing
    • Executive team:
      • Dr. William Bain, Founder & CEO
        • Career focused on parallel computing: Bell Labs, Intel, Microsoft
        • Three prior start-ups; the last was acquired by Microsoft, and its product now ships as Network Load Balancing in Windows Server
      • David Brinker, COO
        • 25 years of software business and executive management experience
        • Mentor Graphics, Cadence, Webridge
    • Eight years in the market on Windows and Linux; 400 customers
  • 4. ScaleOut Software Products
    • ScaleOut StateServer®
      • In-Memory Data Grid for Windows and Linux
      • Scales application performance
      • Industry-leading performance and ease of use
    • ScaleOut GeoServer® adds:
      • WAN-based data replication for disaster recovery (DR)
      • Breakthrough technology for global data access
    • ScaleOut Analytics Server® adds:
      • Real-time data analysis for “live” data
      • Comprehensive management tools
    • Introducing ScaleOut hServer™ V2:
      • Full Hadoop Map/Reduce engine (20X faster*)
      • Hadoop Map/Reduce on live, in-memory data
    *In benchmark testing.
    [Figure: ScaleOut StateServer In-Memory Data Grid, with a Grid Service running on each server]
  • 5. IMDGs Perform Real-Time Analytics
    ScaleOut Analytics Server stores and analyzes “live” data:
    • In-memory storage holds live data sets that are continuously updated and accessed within operational systems.
      • Examples: stock ticker data, business rules, order and inventory data
    • An integrated analytics engine tracks important patterns and trends.
    • Data-parallel analysis delivers results in milliseconds to seconds.
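Slide 5 mentions “data-parallel analysis,” which can be illustrated with a minimal Python sketch. This is not ScaleOut's engine. Each inner list is a hypothetical partition that stands in for the slice of a live data set held by one grid server. The map step runs on every partition concurrently, and a reduce step merges the small partial results. The names `map_partition`, `reduce_partials`, and `analyze` are illustrative only.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical in-memory data set split into partitions; in an IMDG each
# partition would live on a different grid server.
partitions = [
    [("ABC", 10.0), ("XYZ", 40.0)],
    [("ABC", 20.0)],
    [("XYZ", 50.0), ("ABC", 30.0)],
]


def map_partition(records):
    """Map step: summarize one partition as symbol -> (price total, count)."""
    partial = {}
    for symbol, price in records:
        total, count = partial.get(symbol, (0.0, 0))
        partial[symbol] = (total + price, count + 1)
    return partial


def reduce_partials(partials):
    """Reduce step: merge the per-partition summaries into average prices."""
    merged = {}
    for partial in partials:
        for symbol, (total, count) in partial.items():
            m_total, m_count = merged.get(symbol, (0.0, 0))
            merged[symbol] = (m_total + total, m_count + count)
    return {symbol: total / count for symbol, (total, count) in merged.items()}


def analyze(parts):
    """Run the map step on all partitions concurrently, then reduce."""
    # Threads stand in for grid servers analyzing their own data in place.
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(map_partition, parts))
    return reduce_partials(partials)


if __name__ == "__main__":
    print(analyze(partitions))  # {'ABC': 20.0, 'XYZ': 45.0}
```

Only the per-partition summaries are merged, so the raw records never leave their partitions. That property is what lets a grid answer quickly. In CPython the threads mainly show the structure of the computation rather than deliver true CPU parallelism.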
  • 6. Example in Financial Services
    Integrate analysis into a stock trading platform:
    • The IMDG holds market data and hedging strategies.
    • Updates to market data continuously flow through the IMDG.
    • The IMDG performs repeated map/reduce analysis on hedging strategies and alerts traders in real time.
    • The IMDG scales its throughput automatically and dynamically as servers are added, so it can handle new hedging strategies.
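The trading example on slide 6 amounts to rerunning an analysis every time market data changes. The Python sketch below shows that loop under simple assumptions:

- A plain dictionary stands in for the IMDG's market data.
- Each hypothetical `HedgingStrategy` holds share positions and an exposure limit.
- The analysis maps each strategy to its net exposure and reduces the results to a list of alerts.

The exposure rule is illustrative, not a real risk model.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, Iterator, List, Tuple


@dataclass
class HedgingStrategy:
    """Hypothetical strategy record held in the grid."""
    name: str
    positions: Dict[str, int] = field(default_factory=dict)  # symbol -> shares
    max_exposure: float = 0.0  # alert when |net exposure| exceeds this


def exposure(strategy: HedgingStrategy, prices: Dict[str, float]) -> float:
    """Map step: net dollar exposure of one strategy at current prices."""
    return sum(shares * prices[symbol] for symbol, shares in strategy.positions.items())


def find_alerts(strategies: Iterable[HedgingStrategy],
                prices: Dict[str, float]) -> List[Tuple[str, float]]:
    """Reduce step: collect strategies whose exposure breaches their limit."""
    alerts = []
    for strategy in strategies:
        value = exposure(strategy, prices)
        if abs(value) > strategy.max_exposure:
            alerts.append((strategy.name, value))
    return alerts


def monitor(strategies: List[HedgingStrategy], prices: Dict[str, float],
            updates: Iterable[Tuple[str, float]]) -> Iterator[List[Tuple[str, float]]]:
    """Apply each market-data update, then rerun the analysis on live data."""
    for symbol, price in updates:
        prices[symbol] = price  # the update lands in the in-memory store
        yield find_alerts(strategies, prices)


if __name__ == "__main__":
    book = [
        HedgingStrategy("A", {"ABC": 100, "XYZ": -50}, max_exposure=500.0),
        HedgingStrategy("B", {"ABC": 200}, max_exposure=2500.0),
    ]
    market = {"ABC": 10.0, "XYZ": 20.0}
    for alerts in monitor(book, market, [("ABC", 13.0), ("XYZ", 40.0)]):
        print(alerts)
```

Running it prints `[('B', 2600.0)]` after the first update and `[('A', -700.0), ('B', 2600.0)]` after the second. Because the analysis reruns after every update, an alert appears as soon as a price moves a strategy past its limit.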
  • 7. Customers
    • 400 unique customers
    • 35 Fortune 500 customers
    • 32 countries
    • 9,000 servers licensed
    • 50% have multiple deployments
    Customers by industry (% of revenue):
    • Financial & Insurance: 26%
    • Ecommerce Services: 19%
    • Ecommerce Sales: 17%
    • Entertainment & Communications: 13%
    • Government & Education: 10%
    • Software: 8%
    • Travel & Transportation: 4%
    • Other: 3%
    Example uses: online loan applications and banking, portfolio management, trading systems, reservations systems, ecommerce shopping, customer service sites, streaming entertainment, configuration engines, gaming
  • 8. IMDGs Seeing Wide Adoption
    • In-memory data grids have become key in several fast-growth markets.
    • Drivers:
      • Cloud computing / virtualization
      • Hardware enablement
      • Competitive pressure
      • Exploding workloads
      • Big data analysis
    • ScaleOut addresses scalability and analytics.
    Market sizes (2013):
    • Enterprise Software: $292B (2)
    • HPC / Grid Computing: $25B (3)
    • Big Data Analytics: $18B (1)
    • In-Memory Data Grids: $355M (4)
    Sources: (1) Wikibon 2013; (2) Gartner 2010, rolled forward to 2013; (3) Market Research Media 2015, rolled back to 2013; (4) Gartner 2011, rolled forward to 2013.
  • 9. Analytics Market
    Real-time (“Operational Intelligence”):
    • Live data sets
    • Gigabytes to terabytes
    • In-memory storage
    • Minutes to seconds
    • Best uses: tracking live data; immediately identifying trends and capturing opportunities
    Batch (“Business Intelligence”):
    • Static data sets
    • Petabytes
    • Disk storage
    • Hours to minutes
    • Best uses: analyzing warehoused data; mining for long-term trends
    [Figure: the $18B big data analytics market divided into real-time and batch segments, positioning ScaleOut Analytics Server and hServer alongside Hadoop, IBM, Teradata, SAS, and SAP]
  • 10. ScaleOut hServer Targeted Use Cases
    • Run continuous Hadoop on live data while it’s being updated.
      “Capture perishable business opportunities and identify issues.”
      Real-time risk analysis, credit card fraud detection, ...
    • Accelerate Hadoop on static data with a one-line code change.
      “Speed up Hadoop execution by >10X for faster business insights.”
      Financial modeling, process simulations, ...
    • Quickly prototype Hadoop code.
      “Validate your Hadoop code before it goes into batch processing.”
      No need to install the Hadoop stack; fast-turn debugging and tuning, ...
  • 11. Problem: Hadoop Cannot Efficiently Perform Real-Time Analytics
    • Typically used for very large, static, offline data sets
    • Data must be copied from disk-based storage (e.g., HDFS) into memory for analysis.
    • Hadoop Map/Reduce adds lengthy batch-scheduling overhead.
  • 12. Solution: Integrate Hadoop M/R into an In-Memory Data Grid
    Benefits:
    • Enables real-time analysis using Hadoop M/R APIs.
    • Accelerates data access by staging data in memory.
    • Eliminates the batch-scheduling and data-shuffling overheads of standard Hadoop distributions.
    • Analyzes “live” data.
    • Allows Hadoop M/R programs to run without change.
    • Eliminates complexity in Hadoop deployment.
    • Enables rapid prototyping.
  • 13. Introducing ScaleOut hServer™ V2
    Enables Hadoop Map/Reduce to perform real-time analysis:
    • Adds a full Map/Reduce engine to the ScaleOut Analytics Server (SOAS) IMDG.
    • Delivers results in milliseconds to seconds instead of minutes or hours.
    • Benchmark results show a 20X speedup.
    • Offers flexible options for data storage and access:
      • Hadoop programs can access and store key/value pairs using either the IMDG or HDFS.
      • HDFS data is automatically cached in the IMDG for fast access.
      • Key/value pairs can be updated dynamically during analysis to support “live” data.
    • Ships as an open-source Java library combined with the SOAS IMDG.
  • 14. Enabling Access to IMDG Data
    • ScaleOut hServer adds a Grid Record Reader for accessing key/value pairs held in the IMDG.
    • Hadoop programs can optionally output results to the IMDG with a Grid Record Writer.
    • The Grid Record Reader optimizes access to key/value pairs to eliminate network overhead.
    • Applications can access and update key/value pairs as operational data during analysis.
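Slide 14's point about eliminating network overhead comes down to locality: a reader running on a server should see only the key/value pairs that server already holds. The Python sketch below models that idea with three hypothetical helpers. None of these names come from the hServer API.

- `InMemoryGrid` hash-partitions keys across hosts. This is a simplification, not ScaleOut's partitioning scheme.
- `local_record_reader` yields one host's pairs.
- `write_results` plays the role of a record writer that stores output back into the grid.

```python
import zlib
from typing import Dict, Iterator, List, Tuple


class InMemoryGrid:
    """Hypothetical IMDG stand-in: keys are hash-partitioned across hosts."""

    def __init__(self, hosts: List[str]):
        self.hosts = list(hosts)
        self.stores: Dict[str, Dict[str, int]] = {h: {} for h in self.hosts}

    def owner(self, key: str) -> str:
        # A deterministic hash lets every node agree on where a key lives.
        return self.hosts[zlib.crc32(key.encode("utf-8")) % len(self.hosts)]

    def put(self, key: str, value: int) -> None:
        self.stores[self.owner(key)][key] = value

    def get(self, key: str) -> int:
        return self.stores[self.owner(key)][key]


def local_record_reader(grid: InMemoryGrid, host: str) -> Iterator[Tuple[str, int]]:
    """Yield only the pairs stored on `host`, so a map task scheduled there
    reads its input without a network hop."""
    # Iterate over a snapshot so applications may keep updating pairs meanwhile.
    yield from list(grid.stores[host].items())


def write_results(grid: InMemoryGrid, results: Dict[str, int]) -> None:
    """Record-writer role: store output pairs back into the grid."""
    for key, value in results.items():
        grid.put(key, value)


if __name__ == "__main__":
    grid = InMemoryGrid(["server-1", "server-2", "server-3"])
    for i in range(10):
        grid.put(f"order-{i}", i * 10)
    # Each "map task" sums the orders on its own server; the results are then combined.
    per_host = {h: sum(v for _, v in local_record_reader(grid, h)) for h in grid.hosts}
    write_results(grid, {"order-total": sum(per_host.values())})
    print(grid.get("order-total"))  # 450
```

The same hash function decides both where a pair is stored and which reader sees it. That is why no pair has to cross the network during the map phase.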
  • 15. Enabling Fast Access to HDFS Data
    • ScaleOut hServer adds a Dataset Record Reader (a wrapper) that caches HDFS data during program execution.
    • On subsequent runs, Hadoop automatically retrieves the data from the ScaleOut IMDG.
    • The Dataset Record Reader stores and retrieves data with minimal network and memory overhead.
    • Tests with the TeraSort benchmark have demonstrated 11X faster access than HDFS without the IMDG.
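The caching behavior on slide 15 follows a read-through pattern. The first run over an input split reads from the file system and keeps a copy in memory; later runs read the copy. The Python sketch below shows that pattern with a hypothetical `CachingRecordReader`. It wraps any split-reading function (standing in for an HDFS record reader) and uses a dictionary to stand in for the IMDG. It illustrates the idea only; it is not the Dataset Record Reader's actual interface.

```python
from typing import Callable, Dict, Iterator, List, Tuple

Record = Tuple[str, str]


class CachingRecordReader:
    """Hypothetical read-through wrapper around a slower record source."""

    def __init__(self, read_split: Callable[[str], Iterator[Record]],
                 cache: Dict[str, List[Record]]):
        self.read_split = read_split  # e.g. a reader over an HDFS split
        self.cache = cache            # in-memory store shared across runs

    def records(self, split_id: str) -> Iterator[Record]:
        cached = self.cache.get(split_id)
        if cached is not None:
            # Subsequent runs: serve the split from memory, no file-system read.
            yield from cached
            return
        loaded: List[Record] = []
        for record in self.read_split(split_id):
            loaded.append(record)
            yield record  # first run: pass records through while copying them
        # Publish only after the whole split was read, so a partially
        # consumed split is never cached.
        self.cache[split_id] = loaded


if __name__ == "__main__":
    reads = []

    def slow_reader(split_id: str) -> Iterator[Record]:
        reads.append(split_id)  # count trips to the underlying storage
        yield from [(f"{split_id}:0", "alpha"), (f"{split_id}:1", "beta")]

    reader = CachingRecordReader(slow_reader, cache={})
    first_run = list(reader.records("split-0"))
    second_run = list(reader.records("split-0"))
    print(first_run == second_run, reads)  # True ['split-0']
```

Caching a split only after a complete pass is a deliberate choice in this sketch. A production cache would also need to invalidate entries when the underlying file changes.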
  • 16. ScaleOut hServer Editions
    • Offered in community and commercial editions
    • Community Edition can be used for evaluation or production
    • Hybrid open-source / proprietary licensing
    Edition comparison (Community / Commercial):
    • Servers (max): up to 4 / 100s
    • Expected data set size: 256GB / GBs to TBs
    • Pricing: free / subscription and perpetual licenses
    • Support: community forum / full support
  • 17. Summary
    • IMDGs help scale application performance and analyze “live” data in real time.
    • Hadoop focuses on analyzing large, static (offline) data sets held in file systems.
    • ScaleOut hServer V2 introduces breakthrough technology enabling Hadoop applications to perform real-time analytics:
      • Integrates a Hadoop Map/Reduce engine with the SOAS IMDG.
      • Accelerates Map/Reduce execution by 20X in benchmark tests.
      • Enables Hadoop applications to analyze “live,” in-memory data.
      • Offers flexible access to both in-memory and file-based data.
      • Eliminates complex Hadoop deployment and tuning.
      • Offers a fast, easy-to-use platform for rapid prototyping.
  • 18. Online Systems Need Real-Time Analysis
    A few examples:
    • Equity trading: to minimize risk during a trading day
    • Ecommerce: to optimize real-time shopping activity
    • Reservations systems: to identify issues, reroute, etc.
    • Credit cards: to detect fraud in real time
    • Smart grids: to optimize power distribution and detect issues
  • 19. Hadoop Users Need Real-Time Analytics
    • ScaleOut Software conducted an informal survey at the Strata 2013 Conference (Santa Clara).
    • Based on 150 responses:
      • 78% of organizations generate fast-changing data.
      • 60% use Hadoop, and 78% plan to expand their use of Hadoop within 12 months.
      • Only 42% consider Hadoop an effective platform for real-time analysis, but…
      • 93% would benefit from real-time data analytics.
      • 71% consider a 10X improvement in performance meaningful.
    • Take-away: Hadoop users need real-time analytics.