NEW USAGE MODEL FOR REAL-TIME ANALYTICS 
WILLIAM L. BAIN 
CEO AT SCALEOUT SOFTWARE, INC.
Using In-Memory Models of 
Real-World Systems for 
Operational Intelligence 
Big Data Hispano 
November 17, 2014 
Bill Bain, CEO (wbain@scaleoutsoftware.com) 
Copyright © 2014 by ScaleOut Software, Inc.
Agenda 
• What Is Operational Intelligence? 
• Example: Tracking Cable Viewers 
• Implementing OI Using an In-Memory Data Grid: 
• Distributing the Data Across a Cluster 
• Integrating Data-Parallel Analysis 
• Building an In-Memory Model 
• More Examples of In-Memory Models 
• Comparison to Spark and Storm 
• Implementing an Example in Financial Services 
• Using In-Memory Hadoop MapReduce for OI 
2 ScaleOut Software, Inc.
About the Speaker 
• Dr. William Bain, Founder & CEO 
• Career focused on parallel computing – Bell Labs, Intel, Microsoft 
• 3 prior start-ups, last acquired by Microsoft and product now ships as 
Network Load Balancing in Windows Server 
• ScaleOut Software develops and markets In-Memory Data Grids, 
software middleware for: 
• Scaling application performance and 
• Providing operational intelligence using 
• In-memory data storage and computing 
• Nine years in the market, 400 customers, 10,000 servers 
Online Systems Need Operational 
Intelligence 
Goal: Provide immediate feedback to a system handling live data. 
A few examples: 
• Ecommerce: for personalized, real-time recommendations 
• Equity trading: to minimize risk during a trading day 
• Reservations systems: to identify issues, reroute, etc. 
• Credit cards & wire transfers: to detect fraud in real time 
• Smart grids: to optimize power distribution & detect issues 
Example: Track Cable TV Viewers 
• Goals: 
• Make real-time, personalized upsell offers. 
• Immediately respond to service issues. 
• Track aggregate behavior to identify patterns, e.g.: 
• Total instantaneous incoming event rate 
• Most popular programs and # viewers by zip code 
• Requirements: 
• Track events from 10M cable boxes with 25K events/sec (2.2B/day). 
• Correlate, cleanse, and enrich events per rules (e.g. ignore fast channel 
switches, match channels to programs). 
• Be able to feed enriched events to recommendation engine within 5 sec. 
• Immediately examine any cable box (e.g., box status) & track statistics. 
The Result: An OI Platform 
Based on a simulated 
workload for San Diego 
metropolitan area: 
• Continuously correlates and 
enriches telemetry from 10M 
simulated set-top boxes (from 
synthetic load generator). 
• Processes more than 30K 
events/second. 
• Enriches events with program 
information every second. 
• Tracks aggregate statistics 
(e.g., top 10 programs by zip 
code) every 10 secs. 
Real-Time Dashboard
Real-Time vs. Batch Analytics 
Big Data Analytics 
Real-time (“Operational Intelligence”): 
• Live data sets 
• Gigabytes to terabytes 
• In-memory storage 
• Seconds to minutes 
• Best uses: 
• Tracking live data 
• Immediately identifying trends and capturing opportunities 
• Providing immediate feedback 
• Examples: Analytics Server, hServer 
Batch (“Business Intelligence”): 
• Static data sets 
• Petabytes 
• Disk storage 
• Minutes to hours 
• Best uses: 
• Analyzing warehoused data 
• Mining for long-term trends 
• Examples: Hadoop, IBM, Teradata, SAS, SAP
Integrated View of Analytics 
• Operational intelligence can co-exist with business intelligence: 
• Processes streaming data close to its sources. 
• Provides real-time, “tactical” feedback (e.g., recommendations, alerts). 
• Transforms data for storage in the data warehouse (ETL). 
• Data warehouse provides “strategic” guidance. 
• Using the same tool set (e.g., Hadoop MapReduce) lowers TCO: 
• Leverages common skill set. 
• Simplifies design (e.g., loading data into HDFS). 
Challenges for Operational Intelligence 
• To keep up with fast-growing 
“live” workloads & 
maintain fast response times: 
• Track state of entities within a 
live system. 
• Reliably process updates to 
data set in real-time. 
• To identify and respond to 
trends in fast-changing data: 
• Enrich & evaluate “live” data set 
in real time. 
• Respond to identified 
patterns within seconds. 
[Chart: Growth in Web Servers, 1995 to 2010, in millions. Source: Netcraft] 
[Chart: Growth in “Big Data”, 2000 to 2013, in exabytes] 
“More data has been created in the past three years than in the past 40,000.”
In-Memory Architecture for 
Operational Intelligence 
• In-memory data grid 
(IMDG) holds active 
entities undergoing 
state changes in 
memory. 
• Backing store 
optionally holds large 
population of entities. 
• IMDG processes 
incoming stream of 
state changes. 
• Analytics engine 
examines entities in real 
time and generates 
alerts within seconds 
as needed. 
In-Memory Data Grid 
In-Memory Data Grid (IMDG) stores “live” data in a cluster: 
• Fits in the business logic layer: 
• Follows object-oriented view of data 
(vs. relational view). 
• Stores collections of Java/.NET/C++ 
objects shared by multiple clients. 
• Uses create/read/update/delete 
and query APIs to access data. 
• Implemented across a cluster of 
servers or VMs: 
• Scales storage and throughput 
by adding servers. 
• Provides high availability 
in case a server fails. 
IMDGs Use Object-Oriented Model 
• IMDG’s collections of objects act like 
process collections: 
• Unstructured, typically instances of a class 
(stored as serialized blobs) 
• Individually accessible / update-able 
• IMDG adds attributes: 
• Accessible by global key 
• Query-able by properties 
• Highly available 
• Optional timeouts 
• Distributed locking 
• Integration with a backing store 
• Optional dependency relationships 
• Asynchronous event handling 
Basic “CRUD” APIs: 
• Create(key, obj, tout) 
• Read(key) 
• Update(key, obj) 
• Delete(key) 
and… 
• Lock(key) 
• Unlock(key)
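The API list above can be sketched in Java as a single-process stand-in. This is only an illustration under stated assumptions: the class `GridStoreSketch` and its method names are hypothetical and are not the product's API, a plain `ConcurrentHashMap` replaces the distributed store, timeouts are checked lazily on access, and stored objects are assumed to be non-null.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantLock;

/** Single-process stand-in for an IMDG object store; all names are hypothetical. */
class GridStoreSketch {
    /** A stored object plus its absolute expiry time. */
    private static final class Entry {
        final Object value;
        final long expiresAtMillis; // Long.MAX_VALUE means no timeout
        Entry(Object value, long expiresAtMillis) {
            this.value = value;
            this.expiresAtMillis = expiresAtMillis;
        }
    }

    private final ConcurrentHashMap<String, Entry> objects = new ConcurrentHashMap<>();
    private final ConcurrentHashMap<String, ReentrantLock> locks = new ConcurrentHashMap<>();

    /** Create(key, obj, tout): fails if the key already holds a live object. */
    boolean create(String key, Object obj, long timeoutMillis) {
        long expires = timeoutMillis <= 0
                ? Long.MAX_VALUE
                : System.currentTimeMillis() + timeoutMillis;
        read(key); // drops the old entry first if it has timed out
        return objects.putIfAbsent(key, new Entry(obj, expires)) == null;
    }

    /** Read(key): returns null for a missing or expired object. */
    Object read(String key) {
        Entry e = objects.get(key);
        if (e == null) return null;
        if (System.currentTimeMillis() >= e.expiresAtMillis) {
            objects.remove(key, e);
            return null;
        }
        return e.value;
    }

    /** Update(key, obj): replaces a live object and keeps its expiry time. */
    boolean update(String key, Object obj) {
        if (read(key) == null) return false;
        return objects.computeIfPresent(
                key, (k, old) -> new Entry(obj, old.expiresAtMillis)) != null;
    }

    /** Delete(key): returns true if an object was removed. */
    boolean delete(String key) { return objects.remove(key) != null; }

    /** Lock(key): blocks until the calling thread owns the per-key lock. */
    void lock(String key) { locks.computeIfAbsent(key, k -> new ReentrantLock()).lock(); }

    /** Unlock(key): releases the per-key lock if the calling thread holds it. */
    void unlock(String key) {
        ReentrantLock l = locks.get(key);
        if (l != null && l.isHeldByCurrentThread()) l.unlock();
    }

    public static void main(String[] args) {
        GridStoreSketch grid = new GridStoreSketch();
        grid.create("box-42", "tuned to channel 7", 0);
        grid.lock("box-42");
        grid.update("box-42", "tuned to channel 9");
        grid.unlock("box-42");
        System.out.println(grid.read("box-42"));
        grid.delete("box-42");
    }
}
```

In a real grid the lock would be distributed and the timeout enforced by the servers; the sketch only shows how the calls relate to each other.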
In-Memory, Data-Parallel Computing 
• Integrates with IMDG data storage to minimize data motion. 
• Ex.: Parallel Method Invocation (PMI), an object-oriented version 
of data-parallel computing from the HPC community: 
• Selects objects using a parallel query on data hosted in the IMDG. 
• Runs user-defined methods in parallel across the cluster and merges 
results. 
Analyze Data (Eval) 
Combine Results 
(Merge) 
In-Memory Data Grid Runs 
Data-Parallel Computation.
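The query, eval, and merge steps above can be approximated on one machine with Java parallel streams. This is a minimal sketch, not the product's PMI API: `PmiSketch.invoke` is a hypothetical helper, the list stands in for objects hosted in the grid, and the merge function is assumed to be associative so that results can be combined in any order.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;
import java.util.function.BiFunction;
import java.util.function.BinaryOperator;
import java.util.function.Predicate;

/** Local approximation of "query, eval in parallel, merge"; names are hypothetical. */
class PmiSketch {
    /**
     * Selects objects with a query, runs eval on each selected object in
     * parallel, and pairwise-merges the results into one value.
     * Returns an empty Optional when the query selects nothing.
     */
    static <T, P, R> Optional<R> invoke(List<T> objects, Predicate<T> query, P param,
                                        BiFunction<T, P, R> eval, BinaryOperator<R> merge) {
        return objects.parallelStream()
                .filter(query)                  // parallel query
                .map(o -> eval.apply(o, param)) // eval: analyze each object
                .reduce(merge);                 // merge: combine results
    }

    public static void main(String[] args) {
        // Viewers per set-top box; boxes with no viewers are not selected.
        List<Integer> viewersPerBox = Arrays.asList(3, 0, 2, 5, 1);
        BiFunction<Integer, Integer, Integer> eval = (viewers, weight) -> viewers * weight;
        BinaryOperator<Integer> merge = Integer::sum;
        Optional<Integer> total = invoke(viewersPerBox, v -> v > 0, 10, eval, merge);
        System.out.println(total.get()); // prints 110
    }
}
```

On a grid, each server would run the filter and eval steps against its locally stored objects, which is what keeps data motion low.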
Achieving Linear Speedup 
Avoid data motion (network or disk I/O), which limits throughput. 
In-Memory Model of “Live” Entities 
Object-oriented model tracks and analyzes real-world entities: 
[Diagram: real-time data-parallel analysis over in-memory state in the IMDG, backed by NoSQL storage]
Example: Cable Set-Top Boxes 
• Each cable box is represented as an object in the IMDG: 
• Object holds raw & enriched event streams, viewer parameters, and 
statistics. 
• IMDG captures 
incoming events by 
updating objects. 
• IMDG uses data-parallel 
computation to: 
• immediately enrich box objects and generate alerts to the recommendation engine, and 
• continuously collect and report global statistics. 
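A minimal Java sketch of one cable-box object follows, covering the two rules named in the requirements (ignore fast channel switches, match channels to programs) and one aggregate statistic. Everything here is an assumption made for illustration: the class names, the 5-second dwell threshold, and the channel-to-program map are hypothetical and not taken from the deployed system.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.HashMap;
import java.util.Map;
import java.util.stream.Collectors;

/** Illustrative model of set-top boxes as in-memory objects; names are hypothetical. */
class CableBoxSketch {
    /** Assumed threshold: a viewer must stay this long before a switch counts. */
    static final long MIN_DWELL_MILLIS = 5_000;

    static final class SetTopBox {
        final String zip;
        int channel = -1;   // raw state from the latest event
        long tunedAtMillis; // time of the latest channel change
        String program;     // enriched state; null until the dwell threshold passes

        SetTopBox(String zip) { this.zip = zip; }

        /** Captures a raw channel-change event by updating the object. */
        void onChannelChange(int newChannel, long atMillis) {
            channel = newChannel;
            tunedAtMillis = atMillis;
            program = null; // not yet known to be more than a fast switch
        }

        /** Enriches the box: matches channel to program once the viewer has settled. */
        void enrich(Map<Integer, String> guide, long nowMillis) {
            if (channel >= 0 && nowMillis - tunedAtMillis >= MIN_DWELL_MILLIS) {
                program = guide.get(channel);
            }
        }
    }

    /** Global statistic: number of settled viewers per program within one zip code. */
    static Map<String, Long> viewersByProgram(Collection<SetTopBox> boxes, String zip) {
        return boxes.parallelStream()
                .filter(b -> b.zip.equals(zip) && b.program != null)
                .collect(Collectors.groupingBy(b -> b.program, Collectors.counting()));
    }

    public static void main(String[] args) {
        Map<Integer, String> guide = new HashMap<>();
        guide.put(7, "News");
        guide.put(9, "Movie");

        SetTopBox a = new SetTopBox("92101");
        SetTopBox b = new SetTopBox("92101");
        a.onChannelChange(7, 0);
        b.onChannelChange(7, 0);
        b.onChannelChange(9, 8_000); // switched 2 seconds before the analysis pass

        for (SetTopBox box : Arrays.asList(a, b)) box.enrich(guide, 10_000);
        System.out.println(viewersByProgram(Arrays.asList(a, b), "92101")); // prints {News=1}
    }
}
```

Box `b` is left out of the count because its latest switch is too recent to be treated as real viewing.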
Example in Ecommerce: Inventory 
Management 
Fast map/reduce reconciles inventory and order systems 
for an online retailer: 
• Challenge: Inventory and online 
order management are handled 
by different applications. 
• Reconciled once per day. 
• Inaccurate orders reduce margins. 
• Solution: 
• Host SKUs in IMDG updated in real 
time by order & inventory systems. 
• Use MapReduce to reconcile in two minutes. 
• Enables real-time reconciliation to ensure accurate orders. 
Example: Web Shopping 
• IMDG holds customer 
information for active 
Web users. 
• IMDG saves/retrieves 
customer information 
from backing store. 
• Web browsers send 
activity information to 
analytics engine. 
• IMDG updates customer history and 
preferences. 
• Analytics engine identifies browsing and 
buying patterns. 
• Analytics engine makes suggestions in 
real-time. Also sends email follow-ups. 
Example: Retail Shopping 
Brick-and-mortar stores use OI to compete with the online experience: 
• IMDG tracks opt-in customers to make recommendations. 
• RFID tags identify product selection and availability in showroom. 
• Analytics engine sends real-time advisories to sales staff via tablet. 
Comparison: IMDGs to Spark 
Focus: accelerating business intelligence 
using in-memory computing: 
• In-memory computing to accelerate and extend 
Hadoop MapReduce using data-parallel operators 
in Scala. 
• Stores data as “resilient 
distributed datasets” (RDDs): 
• Distributed across cluster 
• Immutable 
• Hold data from/output to HDFS. 
• Manages data stream as a sequence of RDDs. 
• Comparison to IMDG: 
• Not designed for operational systems: 
• Lacks high availability (uses lineage). 
• Intended for data-parallel operations: 
• Lacks CRUD APIs on individual objects. 
Comparison to Storm 
• Focus: continuous processing of input streams 
• Storm implements pipelined execution of tasks by “bolts” on 
incoming data streams. 
• Streams can be distributed to bolts with configurable mappings. 
• Developer controls the number of tasks per bolt. 
• Storm uses a centralized master node 
and ZooKeeper for fault tolerance. 
• Issues: 
• Managing global state 
• Minimizing data motion 
• Complexity / tuning 
Implementing an Example in FinServ 
• Hedge fund tracks a set of hedging strategies: 
• Strategies can cover various market 
sectors, such as high-tech, automotive, 
energy, consumer, real estate, etc. 
• Each strategy contains list of holdings 
and rules for managing the holdings 
(such as target allocations). 
• Updates to market data 
continuously arrive during 
the trading day. 
• The challenge: update and analyze a large population of 
hedging strategies to immediately alert traders. 
In-Memory Model 
• The IMDG holds hedging strategies as an object-oriented collection. 
• Updates to market data 
are managed as a series of 
snapshot objects. 
• The IMDG performs 
repeated data-parallel 
analysis on hedging 
strategies to generate 
alerts. 
• Merges alerts and feeds to 
traders in real time. 
• IMDG automatically and dynamically 
scales its throughput to handle new 
hedging strategies by adding servers. 
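The model above can be sketched in Java as follows. This is an illustration under stated assumptions, not the presenter's implementation: the `Strategy` class, its fields, and the rule (alert when a holding's share of total value drifts from its target allocation by more than a tolerance) are hypothetical, and a price map stands in for a market snapshot object.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.stream.Collectors;

/** Illustrative hedging-strategy model; class and field names are hypothetical. */
class StrategyAlertSketch {
    static final class Strategy {
        final long id;
        final Map<String, Double> shares;           // ticker -> shares held
        final Map<String, Double> targetAllocation; // ticker -> target fraction of value
        final double tolerance;                     // allowed absolute deviation

        Strategy(long id, Map<String, Double> shares,
                 Map<String, Double> targetAllocation, double tolerance) {
            this.id = id;
            this.shares = shares;
            this.targetAllocation = targetAllocation;
            this.tolerance = tolerance;
        }

        /** True if any holding deviates from its target allocation beyond the tolerance. */
        boolean violates(Map<String, Double> prices) {
            double total = 0;
            for (Map.Entry<String, Double> e : shares.entrySet()) {
                total += e.getValue() * prices.getOrDefault(e.getKey(), 0.0);
            }
            if (total == 0) return false;
            for (Map.Entry<String, Double> e : shares.entrySet()) {
                double actual = e.getValue() * prices.getOrDefault(e.getKey(), 0.0) / total;
                double target = targetAllocation.getOrDefault(e.getKey(), 0.0);
                if (Math.abs(actual - target) > tolerance) return true;
            }
            return false;
        }
    }

    /** Applies one market snapshot to all strategies in parallel; returns alerted ids. */
    static Set<Long> alerts(Collection<Strategy> strategies, Map<String, Double> snapshot) {
        return strategies.parallelStream()
                .filter(s -> s.violates(snapshot))
                .map(s -> s.id)
                .collect(Collectors.toCollection(TreeSet::new));
    }

    /** Small helper for building two-entry maps. */
    static Map<String, Double> mapOf(String k1, double v1, String k2, double v2) {
        Map<String, Double> m = new HashMap<>();
        m.put(k1, v1);
        m.put(k2, v2);
        return m;
    }

    public static void main(String[] args) {
        Strategy s = new Strategy(1L, mapOf("A", 100, "B", 100), mapOf("A", 0.5, "B", 0.5), 0.05);
        System.out.println(alerts(Arrays.asList(s), mapOf("A", 10, "B", 10))); // prints []
        System.out.println(alerts(Arrays.asList(s), mapOf("A", 20, "B", 10))); // prints [1]
    }
}
```

Each new snapshot triggers another pass over the collection, which corresponds to the repeated data-parallel analysis described above.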
Implementing the Analysis 
Step 1: Select all objects using parallel query of strategy 
objects: 
• Query spec matches data’s object-oriented properties. 
• Selected objects are fed to the analysis engine on each local server. 
Java Example: Parallel Query 
public class Portfolio { 
    private long id; 
    private Set<Stock> longPositions; 
    private Set<Stock> shortPositions; 
    private double totalValue; 
    private Region region; 
    private boolean alerted; // alert for trading 

    @SossIndexAttribute // query-able property 
    public double getTotalValue() {…} 

    @SossIndexAttribute // query-able property 
    public Region getRegion() {…} 

    // updates the portfolio and returns its id if alerted 
    public Set<Long> evalPositions(MarketSnapshot ms) {…} 
} 

NamedCache pset = CacheFactory.getCache("portfolios"); 
Set<Portfolio> res = pset.queryObjects(Portfolio.class, 
    and(greaterThan("totalValue", 1000000), 
        equals("region", Region.US))); 
Implementing the Analysis 
Step 2: Create parallel methods to update and analyze the 
queried collection of hedging strategies: 
• “Eval” method applies market snapshot to an instance of a strategy 
object: 
• Compare to a MapReduce mapper; adds an input parameter. 
• Updates the strategy object’s positions. 
• Analyzes the positions for a deviation from allowed rules. 
• Optionally generates an alert. 
• “Merge” method combines alerts across the collection of strategies: 
• Compare to a MapReduce combiner. 
• Uses binary combining. 
• Is applied globally to the object collection by the IMDG (unlike a MapReduce 
reducer). 
• Note: both methods operate on hydrated (deserialized) objects, which avoids the need for CRUD access. 
Java Example: Parallel Method Invocation 
• Create method to analyze a queried portfolio and another method to 
pair-wise merge the result sets of alerted portfolios: 
public class PortfolioAnalysis implements 
        Invokable<Portfolio, MarketSnapshot, Set<Long>> { 

    public Set<Long> eval(Portfolio p, MarketSnapshot ms) 
            throws InvokeException { 
        // update portfolio and return id if alerted: 
        return p.evalPositions(ms); 
    } 

    public Set<Long> merge(Set<Long> set1, Set<Long> set2) 
            throws InvokeException { 
        set1.addAll(set2); 
        return set1; // merged set of alerted portfolio ids 
    } 
} 
Java Example: Parallel Method Invocation 
• Run a parallel method invocation on a queried set of portfolios and 
return set of ids for alerted portfolios: 
NamedCache pset = CacheFactory.getCache("portfolios"); 
InvokeResult alertedPortfolios = pset.invoke( 
    PortfolioAnalysis.class, 
    Portfolio.class, 
    and(greaterThan("totalValue", 1000000), // query spec 
        equals("region", Region.US)), 
    marketSnapshot, // parameters 
    ... 
); 
System.out.println("The alerted portfolios are " + 
    alertedPortfolios.getResult()); 
Running the Analysis 
• IMDG ships user’s code and libraries to its servers. 
• IMDG automatically schedules analysis operations across all grid 
servers and cores: 
• The analysis runs on all objects selected 
by the parallel query. 
• Each grid server analyzes its locally stored 
objects to minimize data motion. 
• Parallel execution ensures fast 
completion time: 
• IMDG automatically distributes 
workload across servers/cores. 
• Scaling the IMDG automatically 
handles larger data sets. 
Merging the Results 
• The IMDG automatically merges all analysis results: 
• The IMDG first merges all results within each grid server in parallel. 
• It then merges results across all grid servers to create one combined 
result. 
• Efficient parallel merge 
minimizes the delay in 
combining all results. 
• The IMDG delivers the 
combined result to the 
invoking application as 
one object. 
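The two-level merge described above can be sketched in Java as a local merge per server followed by pairwise combining in rounds. This is only a sketch: `TreeMergeSketch` is a hypothetical name, the rounds run sequentially here, and the merge function is assumed to be associative. On a grid, the merges within one round could run in parallel on different servers, so N partial results would need about log2(N) rounds.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.BinaryOperator;

/** Illustrative two-level merge; names are hypothetical and rounds run sequentially. */
class TreeMergeSketch {
    /** Pairwise-merges partial results in rounds until one result remains. */
    static <R> R treeMerge(List<R> partials, BinaryOperator<R> merge) {
        if (partials.isEmpty()) throw new IllegalArgumentException("no results to merge");
        List<R> level = new ArrayList<>(partials);
        while (level.size() > 1) {
            List<R> next = new ArrayList<>();
            for (int i = 0; i < level.size(); i += 2) {
                // An unpaired last element is carried into the next round unchanged.
                next.add(i + 1 < level.size()
                        ? merge.apply(level.get(i), level.get(i + 1))
                        : level.get(i));
            }
            level = next;
        }
        return level.get(0);
    }

    public static void main(String[] args) {
        BinaryOperator<Integer> merge = Integer::sum;
        // Step 1: each server merges the results of its own objects.
        List<List<Integer>> perServer = Arrays.asList(
                Arrays.asList(3, 4), Arrays.asList(5), Arrays.asList(1, 1, 1));
        List<Integer> serverResults = new ArrayList<>();
        for (List<Integer> local : perServer) serverResults.add(treeMerge(local, merge));
        // Step 2: server results are merged into one combined result.
        Integer combined = treeMerge(serverResults, merge);
        System.out.println(combined); // prints 15
    }
}
```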
Output: Real-Time Alerts 
• In-memory analysis delivers a set of 
alerts to traders every 300 msec. 
• Enables the trader to examine strategy details in real time. 
Sample Performance Results for PMI 
• Measured a similar financial services application (back testing stock 
trading strategies on stock histories) 
• Hosted IMDG in Amazon EC2 using 75 servers holding 1 TB of stock 
history data in memory 
• IMDG handled a continuous stream of updates (1.1 GB/s) 
• Results: analyzed 1 TB in 4.1 seconds (250 GB/s) with linear scaling 
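As a quick check of the figures above, the aggregate and per-server rates work out as follows, assuming 1 TB is counted as 1024 GB (with 1000 GB the aggregate rate would be nearer 244 GB/s):

```latex
\[
\frac{1024\ \text{GB}}{4.1\ \text{s}} \approx 250\ \text{GB/s},
\qquad
\frac{250\ \text{GB/s}}{75\ \text{servers}} \approx 3.3\ \text{GB/s per server}
\]
```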
In-Memory MapReduce 
Benefits: 
• Enables use of standard Hadoop MapReduce for operational 
intelligence. 
• Accelerates data access by holding data in memory. 
• Analyzes and updates “live” data. 
• Reduces overheads of standard 
Hadoop distributions: 
• Batch scheduling 
• Disk access 
• Data shuffling 
• Mandatory key sorting 
• Enables new features, e.g.: 
• Global combining, optional sorting 
Running MapReduce on an IMDG 
• A Hadoop distribution does not have to be installed unless HDFS is used. 
• The developer starts MapReduce applications from a remote workstation. 
• The IMDG automatically builds a reusable “invocation grid” of JVMs on the 
grid’s servers for PMI and ships the application’s jars. 
• Results are stored in the IMDG, HDFS, or optionally globally merged and 
returned to the remote workstation. 
Run In-Memory MR with YARN 
• YARN transparently integrates batch and in-memory MapReduce into a 
single execution framework with shared access to HDFS. 
• For example, IMDG can transparently run Apache Hive in-memory. 
Example of ScaleOut hServer with Hortonworks 
Example of Hive 
Running on IMDG
Implementing MapReduce 
Run MapReduce as two PMI 
phases: 
• Data can be input from either the 
IMDG or an external data source. 
• Works with any input/output format 
compatible with the Apache 
distribution. 
• IMDG uses its data-parallel 
execution engine (PMI) to invoke 
the mappers and the reducers. 
• Eliminates batch scheduling 
overhead. 
• Intermediate results are stored 
within the IMDG. 
• Minimizes data motion between the 
mappers and reducers. 
• Allows optional sorting. 
• Output of a single reducer/combiner 
optionally can be globally merged. 
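The two-phase structure above can be illustrated with a word count written as two data-parallel passes over in-memory data. This is a local sketch using Java streams, not Hadoop or the product's execution engine: `TwoPhaseMapReduceSketch` is a hypothetical name, and a concurrent map stands in for intermediate results held in the grid. No key sorting takes place between the phases.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

/** Word count as two in-memory, data-parallel phases; names are hypothetical. */
class TwoPhaseMapReduceSketch {
    static Map<String, Integer> wordCount(List<String> lines) {
        // Phase 1 (mappers): emit (word, 1) pairs, grouped by key in memory without sorting.
        Map<String, List<Integer>> intermediate = lines.parallelStream()
                .flatMap(line -> Arrays.stream(line.toLowerCase().split("\\s+")))
                .filter(word -> !word.isEmpty())
                .collect(Collectors.groupingByConcurrent(
                        word -> word,
                        Collectors.mapping(word -> 1, Collectors.toList())));

        // Phase 2 (reducers): one reduce call per key, run in parallel across keys.
        return intermediate.entrySet().parallelStream()
                .collect(Collectors.toConcurrentMap(
                        Map.Entry::getKey,
                        e -> e.getValue().stream().mapToInt(Integer::intValue).sum()));
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = wordCount(Arrays.asList("live data", "Live feedback"));
        System.out.println(counts.get("live")); // prints 2
    }
}
```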
Accessing IMDG Data for M/R 
• IMDG adds grid input format for 
accessing key/value pairs held in 
the IMDG. 
• MapReduce programs optionally 
can output results to IMDG with 
grid output format. 
• Grid Record Reader optimizes 
access to key/value pairs to 
eliminate network overhead. 
• Applications can access and 
update key/value pairs as 
operational data during analysis. 
Optional Caching of HDFS Data 
• IMDG adds Dataset Record Reader (wrapper) to cache HDFS 
data during program execution. 
• Hadoop automatically retrieves data from IMDG on subsequent 
runs. 
• Dataset Record Reader 
stores and retrieves data 
with minimum network 
and memory overheads. 
• Tests with Terasort 
benchmark have 
demonstrated 11X 
faster access latency 
over HDFS without IMDG. 
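The caching behavior described above follows a read-through pattern, sketched here in Java. This is not the Dataset Record Reader itself: `CachingReaderSketch` is a hypothetical class, a function stands in for reading a split from HDFS, and a local concurrent map stands in for the IMDG.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

/** Read-through cache wrapped around a slow source; names are hypothetical. */
class CachingReaderSketch {
    private final Function<String, List<String>> source; // stands in for an HDFS read
    private final ConcurrentHashMap<String, List<String>> cache = new ConcurrentHashMap<>();
    final AtomicInteger sourceReads = new AtomicInteger(); // counts reads that reach the source

    CachingReaderSketch(Function<String, List<String>> source) { this.source = source; }

    /** First read of a split goes to the source; later reads are served from memory. */
    List<String> read(String splitId) {
        return cache.computeIfAbsent(splitId, id -> {
            sourceReads.incrementAndGet();
            return source.apply(id);
        });
    }

    public static void main(String[] args) {
        CachingReaderSketch reader =
                new CachingReaderSketch(id -> Arrays.asList(id + ":record-1", id + ":record-2"));
        reader.read("split-0"); // first run: loads from the source
        reader.read("split-0"); // subsequent run: served from the cache
        System.out.println(reader.sourceReads.get()); // prints 1
    }
}
```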
In-Memory Storage Models 
IMDG needs multiple in-memory 
storage models: 
• Named cache, optimized for 
rich semantics on large 
objects: 
• Property-based query 
• Distributed locking 
• Access from remote grids 
• Named map, optimized for 
efficient storage and bulk 
analysis (e.g., MapReduce): 
• Highly efficient object storage 
• Pipelined, bulk-access 
mechanisms 
In-Memory Storage Optimizations 
In-Memory Concurrent Map: 
• Stores key/value pairs in chunks. 
• Allows CRUD operations on kvps. 
• Automatically organizes chunks into 
splits. 
• Uses per-split hash table to access 
keys and manage multi-valued 
keys. 
• Stores shuffled data set between 
mappers and reducers. 
• Pipelines chunks to mappers and 
from reducers. 
• Optionally uses memory mapped 
files to reduce access latency. 
• Provides support for sorting keys. 
In-Memory M/R Optimizations 
• MapReduce optimizations: 
• Optional sorting 
• Optional multicast of parameters to mappers 
• Optional O(logN) global combining (avoids 
single, sequential reducer) 
• Optional HDFS caching 
• Optional reuse of JVMs across jobs 
• Measured performance: 
• Startup times reduced to a few milliseconds 
• Word count benchmark shows 20X speedup. 
• Real-world example shows >40X speedup. 
• Current limitations: 
• No specific security for multi-tenancy 
• Intermediate data must fit in the IMDG 
Accelerating Start-Up Times 
• Re-use in-memory context across MapReduce jobs: 
public static void main(String argv[]) throws Exception { 
    // Configure and load the invocation grid 
    InvocationGrid grid = HServerJob.getInvocationGridBuilder("myGrid") 
        // Add JAR files as IG dependencies 
        .addJar("main-job.jar") 
        .addJar("first-library.jar") 
        // Add classes as IG dependencies 
        .addClass(MyMapper.class) 
        .addClass(MyReducer.class) 
        // Define custom JVM parameters 
        .setJVMParameters("-Xms512M -Xmx1024M") 
        .load(); 

    // Run 10 jobs on the same invocation grid 
    for (int i = 0; i < 10; i++) { 
        Configuration conf = new Configuration(); 
        // The preloaded invocation grid is passed as the parameter to the job 
        Job job = new HServerJob(conf, "Job number " + i, false, grid); 
        //......Configure the job here......... 
        // Run the job 
        job.waitForCompletion(true); 
    } 

    // Unload the invocation grid when we are done 
    grid.unload(); 
} 
Recap 
• Online systems need operational 
intelligence on “live” data for 
immediate feedback. 
• Operational intelligence can be 
implemented using an IMDG 
integrated with data-parallel 
analysis. 
• IMDGs track “live” state: 
• Model real-world entities as a 
highly available object collection. 
• Enable updates to track changes. 
• Use data-parallel computation for 
immediate feedback with low 
latency. 
• Can run standard MapReduce. 
Thank you! 
Parallel Query Example (C#) 
• Mark class properties as indexes for query: 
class Stock 
{ 
    [SossIndex] 
    public string Ticker { get; set; } 
    public decimal TotalShares { get; set; } 
    public decimal Price { get; set; } 
} 
• Define a query using these properties: 
NamedCache cache = CacheFactory.GetCache("Stocks"); 
var q = from s in cache.QueryObjects<Stock>() 
        where s.Ticker == "GOOG" || s.Ticker == "ORCL" 
        select s; 
Console.WriteLine("{0} Stocks found", q.Count()); 
Example of Analysis Code (C#) 
• Create method to analyze each queried stock object: 
static decimal eval(Stock stock, StockCalcParams calcParams) 
{ 
    return stock.Price * stock.TotalShares; 
} 
• Create method to pair-wise merge the results: 
static decimal merge(decimal r1, decimal r2) 
{ 
    return r1 + r2; 
} 
Invoking the Parallel Analysis (C#) 
• Run a parallel method invocation: 
NamedCache cache = CacheFactory.GetCache("Stocks"); 
decimal valueOfSelectedStocks = 
    (from s in cache.QueryObjects<Stock>() 
     where s.Ticker == "GOOG" || s.Ticker == "ORCL" 
     select s) 
    .Invoke(new StockCalcParams(…), 
            new Func<Stock, StockCalcParams, decimal>(eval)) 
    .Merge(new Func<decimal, decimal, decimal>(merge)); 
Console.WriteLine("The value of selected stocks is {0}", 
    valueOfSelectedStocks); 
17TH ~ 18th NOV 2014 
MADRID (SPAIN)

More Related Content

What's hot

Tableau @ Spil Games
Tableau @ Spil GamesTableau @ Spil Games
Tableau @ Spil Games
Rob Winters
 
Hadoop and Enterprise Data Warehouse
Hadoop and Enterprise Data WarehouseHadoop and Enterprise Data Warehouse
Hadoop and Enterprise Data Warehouse
DataWorks Summit
 

What's hot (20)

Accelerating Big Data Analytics
Accelerating Big Data AnalyticsAccelerating Big Data Analytics
Accelerating Big Data Analytics
 
Digital Business Transformation in the Streaming Era
Digital Business Transformation in the Streaming EraDigital Business Transformation in the Streaming Era
Digital Business Transformation in the Streaming Era
 
Cassandra Summit 2014: Internet of Complex Things Analytics with Apache Cassa...
Cassandra Summit 2014: Internet of Complex Things Analytics with Apache Cassa...Cassandra Summit 2014: Internet of Complex Things Analytics with Apache Cassa...
Cassandra Summit 2014: Internet of Complex Things Analytics with Apache Cassa...
 
AWS User Group: Building Cloud Analytics Solution with AWS
AWS User Group: Building Cloud Analytics Solution with AWSAWS User Group: Building Cloud Analytics Solution with AWS
AWS User Group: Building Cloud Analytics Solution with AWS
 
Transforming The Customer Experience With Real-Time Insights
Transforming The Customer Experience With Real-Time InsightsTransforming The Customer Experience With Real-Time Insights
Transforming The Customer Experience With Real-Time Insights
 
Cloud-native Semantic Layer on Data Lake
Cloud-native Semantic Layer on Data LakeCloud-native Semantic Layer on Data Lake
Cloud-native Semantic Layer on Data Lake
 
Georgia Azure Event - Scalable cloud games using Microsoft Azure
Georgia Azure Event - Scalable cloud games using Microsoft AzureGeorgia Azure Event - Scalable cloud games using Microsoft Azure
Georgia Azure Event - Scalable cloud games using Microsoft Azure
 
Finding the needle in the haystack: how Nestle is leveraging big data to defe...
Finding the needle in the haystack: how Nestle is leveraging big data to defe...Finding the needle in the haystack: how Nestle is leveraging big data to defe...
Finding the needle in the haystack: how Nestle is leveraging big data to defe...
 
Modern Data Architecture for a Data Lake with Informatica and Hortonworks Dat...
Modern Data Architecture for a Data Lake with Informatica and Hortonworks Dat...Modern Data Architecture for a Data Lake with Informatica and Hortonworks Dat...
Modern Data Architecture for a Data Lake with Informatica and Hortonworks Dat...
 
Intuit Analytics Cloud 101
Intuit Analytics Cloud 101Intuit Analytics Cloud 101
Intuit Analytics Cloud 101
 
Building Modern Data Platform with AWS
Building Modern Data Platform with AWSBuilding Modern Data Platform with AWS
Building Modern Data Platform with AWS
 
Tableau @ Spil Games
Tableau @ Spil GamesTableau @ Spil Games
Tableau @ Spil Games
 
Building an Effective Data Warehouse Architecture
Building an Effective Data Warehouse ArchitectureBuilding an Effective Data Warehouse Architecture
Building an Effective Data Warehouse Architecture
 
2014.07.11 biginsights data2014
2014.07.11 biginsights data20142014.07.11 biginsights data2014
2014.07.11 biginsights data2014
 
2021 gartner mq dsml
2021 gartner mq dsml2021 gartner mq dsml
2021 gartner mq dsml
 
Bigdata Hadoop project payment gateway domain
Bigdata Hadoop project payment gateway domainBigdata Hadoop project payment gateway domain
Bigdata Hadoop project payment gateway domain
 
The Future of Data Warehousing: ETL Will Never be the Same
The Future of Data Warehousing: ETL Will Never be the SameThe Future of Data Warehousing: ETL Will Never be the Same
The Future of Data Warehousing: ETL Will Never be the Same
 
Hadoop and Enterprise Data Warehouse
Hadoop and Enterprise Data WarehouseHadoop and Enterprise Data Warehouse
Hadoop and Enterprise Data Warehouse
 
Big Data Day LA 2016/ Hadoop/ Spark/ Kafka track - Building an Event-oriented...
Big Data Day LA 2016/ Hadoop/ Spark/ Kafka track - Building an Event-oriented...Big Data Day LA 2016/ Hadoop/ Spark/ Kafka track - Building an Event-oriented...
Big Data Day LA 2016/ Hadoop/ Spark/ Kafka track - Building an Event-oriented...
 
Yellowbrick Webcast with DBTA for Real-Time Analytics
Yellowbrick Webcast with DBTA for Real-Time AnalyticsYellowbrick Webcast with DBTA for Real-Time Analytics
Yellowbrick Webcast with DBTA for Real-Time Analytics
 

Viewers also liked

2014 05-07-fr - add dev series - session 6 - deploying your application-2
2014 05-07-fr - add dev series - session 6 - deploying your application-22014 05-07-fr - add dev series - session 6 - deploying your application-2
2014 05-07-fr - add dev series - session 6 - deploying your application-2
MongoDB
 

Viewers also liked (20)

The BigMemory Revolution in Financial Services
The BigMemory Revolution in Financial ServicesThe BigMemory Revolution in Financial Services
The BigMemory Revolution in Financial Services
 
Terracotta Ditch the Disk webcast
Terracotta Ditch the Disk webcastTerracotta Ditch the Disk webcast
Terracotta Ditch the Disk webcast
 
5 Ways to Boost E-Commerce Site Performance with BigMemory
5 Ways to Boost E-Commerce Site Performance with BigMemory5 Ways to Boost E-Commerce Site Performance with BigMemory
5 Ways to Boost E-Commerce Site Performance with BigMemory
 
2014 05-07-fr - add dev series - session 6 - deploying your application-2
2014 05-07-fr - add dev series - session 6 - deploying your application-22014 05-07-fr - add dev series - session 6 - deploying your application-2
2014 05-07-fr - add dev series - session 6 - deploying your application-2
 
Alex Snaps JEEConf Presentation
Alex Snaps JEEConf PresentationAlex Snaps JEEConf Presentation
Alex Snaps JEEConf Presentation
 
NoSQL Analytics: JSON Data Analysis and Acceleration in MongoDB World
NoSQL Analytics: JSON Data Analysis and Acceleration in MongoDB WorldNoSQL Analytics: JSON Data Analysis and Acceleration in MongoDB World
NoSQL Analytics: JSON Data Analysis and Acceleration in MongoDB World
 
Choosing the right NOSQL database
Choosing the right NOSQL databaseChoosing the right NOSQL database
Choosing the right NOSQL database
 
Big Data Web applications for Interactive Hadoop by ENRICO BERTI at Big Data...
 Big Data Web applications for Interactive Hadoop by ENRICO BERTI at Big Data... Big Data Web applications for Interactive Hadoop by ENRICO BERTI at Big Data...
Big Data Web applications for Interactive Hadoop by ENRICO BERTI at Big Data...
 
Intro to the Big Data Spain 2014 conference
Intro to the Big Data Spain 2014 conferenceIntro to the Big Data Spain 2014 conference
Intro to the Big Data Spain 2014 conference
 
The top five questions to ask about NoSQL. JONATHAN ELLIS at Big Data Spain 2012
The top five questions to ask about NoSQL. JONATHAN ELLIS at Big Data Spain 2012The top five questions to ask about NoSQL. JONATHAN ELLIS at Big Data Spain 2012
The top five questions to ask about NoSQL. JONATHAN ELLIS at Big Data Spain 2012
 
State of Play. Data Science on Hadoop in 2015 by SEAN OWEN at Big Data Spain ...
State of Play. Data Science on Hadoop in 2015 by SEAN OWEN at Big Data Spain ...State of Play. Data Science on Hadoop in 2015 by SEAN OWEN at Big Data Spain ...
State of Play. Data Science on Hadoop in 2015 by SEAN OWEN at Big Data Spain ...
 
Big Data the potential for data to improve service and business management by...
Big Data the potential for data to improve service and business management by...Big Data the potential for data to improve service and business management by...
Big Data the potential for data to improve service and business management by...
 
Getting the best insights from your data using Apache Metamodel by Alberto Ro...
Getting the best insights from your data using Apache Metamodel by Alberto Ro...Getting the best insights from your data using Apache Metamodel by Alberto Ro...
New usage model for real-time analytics by Dr. WILLIAM L. BAIN at Big Data Spain 2014

  • 1. NEW USAGE MODEL FOR REAL-TIME ANALYTICS WILLIAM L. BAIN CEO AT SCALEOUT SOFTWARE, INC.
  • 2. Using In-Memory Models of Real-World Systems for Operational Intelligence Big Data Hispano November 17, 2014 Bill Bain, CEO (wbain@scaleoutsoftware.com) Copyright © 2014 by ScaleOut Software, Inc.
  • 3. Agenda • What Is Operational Intelligence? • Example: Tracking Cable Viewers • Implementing OI Using an In-Memory Data Grid: • Distributing the Data Across a Cluster • Integrating Data-Parallel Analysis • Building an In-Memory Model • More Examples of In-Memory Models • Comparison to Spark and Storm • Implementing an Example in Financial Services • Using In-Memory Hadoop MapReduce for OI
  • 4. About the Speaker • Dr. William Bain, Founder & CEO • Career focused on parallel computing – Bell Labs, Intel, Microsoft • 3 prior start-ups, last acquired by Microsoft and product now ships as Network Load Balancing in Windows Server • ScaleOut Software develops and markets In-Memory Data Grids, software middleware for: • Scaling application performance and • Providing operational intelligence using • In-memory data storage and computing • Nine years in the market, 400 customers, 10,000 servers
  • 5. Online Systems Need Operational Intelligence Goal: Provide immediate feedback to a system handling live data. A few examples: • Ecommerce: for personalized, real-time recommendations • Equity trading: to minimize risk during a trading day • Reservations systems: to identify issues, reroute, etc. • Credit cards & wire transfers: to detect fraud in real time • Smart grids: to optimize power distribution & detect issues
  • 6. Example: Track Cable TV Viewers • Goals: • Make real-time, personalized upsell offers. • Immediately respond to service issues. • Track aggregate behavior to identify patterns, e.g.: • Total instantaneous incoming event rate • Most popular programs and # viewers by zip code • Requirements: • Track events from 10M cable boxes with 25K events/sec (2.2B/day). • Correlate, cleanse, and enrich events per rules (e.g. ignore fast channel switches, match channels to programs). • Be able to feed enriched events to recommendation engine within 5 sec. • Immediately examine any cable box (e.g., box status) & track statistics.
  • 7. The Result: An OI Platform Based on a simulated workload for the San Diego metropolitan area: • Continuously correlates and enriches telemetry from 10M simulated set-top boxes (from a synthetic load generator). • Processes more than 30K events/second. • Enriches events with program information every second. • Tracks aggregate statistics (e.g., top 10 programs by zip code) every 10 seconds. [Figure: Real-Time Dashboard]
  • 8. Real-Time vs. Batch Analytics. Big data analytics divides into two styles:
      Real-time (“Operational Intelligence”): live data sets; gigabytes to terabytes; in-memory storage; seconds to minutes. Best uses: • Tracking live data • Immediately identifying trends and capturing opportunities • Providing immediate feedback. Products shown: Analytics Server, hServer.
      Batch (“Business Intelligence”): static data sets; petabytes; disk storage; minutes to hours. Best uses: • Analyzing warehoused data • Mining for long-term trends. Products shown: Hadoop, IBM, Teradata, SAS, SAP.
  • 9. Integrated View of Analytics • Operational intelligence can co-exist with business intelligence: • Processes streaming data close to its sources. • Provides real-time, “tactical” feedback (e.g., recommendations, alerts). • Transforms data for storage in the data warehouse (ETL). • Data warehouse provides “strategic” guidance. • Using the same tool set (e.g., Hadoop MapReduce) lowers TCO: • Leverages common skill set. • Simplifies design (e.g., loading data into HDFS).
  • 10. Challenges for Operational Intelligence • To keep up with fast-growing “live” workloads & maintain fast response times: • Track state of entities within a live system. • Reliably process updates to data set in real time. • To identify and respond to trends in fast-changing data: • Enrich & evaluate “live” data set in real time. • Respond to identified patterns within seconds. [Charts: Growth in Web Servers, in millions, 1995 to 2010 (source: Netcraft); Growth in “Big Data”, in exabytes, 2000 to 2013.] “More data has been created in the past three years than in the past 40,000.”
  • 11. In-Memory Architecture for Operational Intelligence • In-memory data grid (IMDG) holds active entities undergoing state changes in memory. • Backing store optionally holds large population of entities. • IMDG processes incoming stream of state changes. • Analytics engine examines entities in real time and generates alerts within seconds as needed.
  • 12. In-Memory Data Grid In-Memory Data Grid (IMDG) stores “live” data in a cluster: • Fits in the business logic layer: • Follows object-oriented view of data (vs. relational view). • Stores collections of Java/.NET/C++ objects shared by multiple clients. • Uses create/read/update/delete and query APIs to access data. • Implemented across a cluster of servers or VMs: • Scales storage and throughput by adding servers. • Provides high availability in case a server fails.
  • 13. IMDGs Use Object-Oriented Model • IMDG’s collections of objects act like process collections: • Unstructured, typically instances of a class (stored as serialized blobs) • Individually accessible / update-able • IMDG adds attributes: • Accessible by global key • Query-able by properties • Highly available • Optional timeouts • Distributed locking • Integration with a backing store • Optional dependency relationships • Asynchronous event handling • Basic “CRUD” APIs, addressed by object key: Create(key, obj, tout), Read(key), Update(key, obj), Delete(key), and also Lock(key) and Unlock(key)
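The CRUD-plus-locking surface listed on this slide can be illustrated with a toy, single-process stand-in. This is only a sketch under stated assumptions: `ToyGrid` and its method names are hypothetical and are not the ScaleOut API, the "grid" is one in-process map rather than a cluster, and a timeout of zero or less is assumed to mean "never expire".

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantLock;

/** Toy, single-process stand-in for an IMDG object collection (hypothetical API). */
final class ToyGrid<V> {
    /** A stored object plus its absolute expiry time (Long.MAX_VALUE means never). */
    private static final class Entry<T> {
        final T value;
        final long expiresAtMillis;
        Entry(T value, long expiresAtMillis) {
            this.value = value;
            this.expiresAtMillis = expiresAtMillis;
        }
    }

    private final Map<String, Entry<V>> store = new ConcurrentHashMap<>();
    private final Map<String, ReentrantLock> locks = new ConcurrentHashMap<>();

    /** Create(key, obj, tout): returns false if a live object already uses the key. */
    boolean create(String key, V obj, long timeoutMillis) {
        long expires = timeoutMillis <= 0
                ? Long.MAX_VALUE
                : System.currentTimeMillis() + timeoutMillis;
        evictIfExpired(key);
        return store.putIfAbsent(key, new Entry<V>(obj, expires)) == null;
    }

    /** Read(key): returns the stored object, or empty if absent or expired. */
    Optional<V> read(String key) {
        evictIfExpired(key);
        Entry<V> e = store.get(key);
        return e == null ? Optional.empty() : Optional.of(e.value);
    }

    /** Update(key, obj): replaces the value and keeps the expiry; false if absent. */
    boolean update(String key, V obj) {
        evictIfExpired(key);
        return store.computeIfPresent(
                key, (k, old) -> new Entry<V>(obj, old.expiresAtMillis)) != null;
    }

    /** Delete(key): returns true if an object was removed. */
    boolean delete(String key) {
        return store.remove(key) != null;
    }

    /** Lock(key): blocks until the calling thread owns the per-key lock. */
    void lock(String key) {
        locks.computeIfAbsent(key, k -> new ReentrantLock()).lock();
    }

    /** Unlock(key): releases the per-key lock if the calling thread holds it. */
    void unlock(String key) {
        ReentrantLock l = locks.get(key);
        if (l != null && l.isHeldByCurrentThread()) {
            l.unlock();
        }
    }

    /** Drops the entry for the key if its timeout has passed. */
    private void evictIfExpired(String key) {
        long now = System.currentTimeMillis();
        store.computeIfPresent(key, (k, e) -> e.expiresAtMillis <= now ? null : e);
    }
}
```

A caller would typically bracket a read-modify-write sequence with `lock(key)` and `unlock(key)`. A real grid would also replicate entries across servers for high availability, which this sketch does not attempt.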
  • 14. In-Memory, Data-Parallel Computing • Integrates with IMDG data storage to minimize data motion. • Ex.: Parallel Method Invocation (PMI), an object-oriented version of data-parallel computing from the HPC community: • Selects objects using a parallel query on data hosted in the IMDG. • Runs user-defined methods in parallel across the cluster and merges results. [Figure: the in-memory data grid runs the data-parallel computation: Analyze Data (Eval), then Combine Results (Merge).]
  • 15. Achieving Linear Speedup Avoid data motion (network or disk I/O), which limits throughput.
  • 16. In-Memory Model of “Live” Entities Object-oriented model tracks and analyzes real-world entities. [Figure layers: Real-Time Data-Parallel Analysis; In-Memory State in the IMDG; NoSQL Storage.]
  • 17. Example: Cable Set-Top Boxes • Each cable box is represented as an object in the IMDG: • Object holds raw & enriched event streams, viewer parameters, and statistics. • IMDG captures incoming events by updating objects. • IMDG uses data-parallel computation to: • immediately enrich box objects to generate alerts to the recommendation engine, and • continuously collect and report global statistics.
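To make the per-box object concrete, here is a minimal sketch of one set-top box object that applies the two enrichment rules mentioned earlier in the deck (ignore fast channel switches, match channels to programs). Everything named here is an assumption made for illustration: the `SetTopBox` and `ViewingEvent` classes, the 5-second dwell threshold, and the channel-to-program map are hypothetical and do not come from the talk.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical in-memory object for one cable box. */
final class SetTopBox {
    /** Assumed rule: a channel watched for less than this is a "fast switch". */
    static final long MIN_DWELL_MILLIS = 5_000;

    /** An enriched event: a channel, the program matched to it, and the viewing interval. */
    static final class ViewingEvent {
        final int channel;
        final String program;
        final long startMillis;
        final long endMillis;
        ViewingEvent(int channel, String program, long startMillis, long endMillis) {
            this.channel = channel;
            this.program = program;
            this.startMillis = startMillis;
            this.endMillis = endMillis;
        }
    }

    private final Map<Integer, String> programGuide;          // channel -> program
    private final List<ViewingEvent> enriched = new ArrayList<>();
    private int currentChannel = -1;                          // -1 means not tuned yet
    private long tunedAtMillis;

    SetTopBox(Map<Integer, String> programGuide) {
        this.programGuide = programGuide;
    }

    /** Applies one raw channel-change event to this box's state. */
    void onChannelChange(int newChannel, long timestampMillis) {
        if (currentChannel >= 0) {
            long dwell = timestampMillis - tunedAtMillis;
            // Cleanse: drop the interval if the viewer only flicked past the channel.
            if (dwell >= MIN_DWELL_MILLIS) {
                // Enrich: attach the program that the channel was showing.
                String program = programGuide.getOrDefault(currentChannel, "unknown");
                enriched.add(new ViewingEvent(
                        currentChannel, program, tunedAtMillis, timestampMillis));
            }
        }
        currentChannel = newChannel;
        tunedAtMillis = timestampMillis;
    }

    /** Returns a copy of the enriched events recorded so far. */
    List<ViewingEvent> enrichedEvents() {
        return new ArrayList<>(enriched);
    }
}
```

In a grid, one such object per box would be stored under the box's key, and incoming events would be routed to the server that holds it.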
  • 18. Example in Ecommerce: Inventory Management Fast map/reduce reconciles inventory and order systems for an online retailer: • Challenge: Inventory and online order management are handled by different applications. • Reconciled once per day. • Inaccurate orders reduce margins. • Solution: • Host SKUs in IMDG updated in real time by order & inventory systems. • Use MapReduce to reconcile in two minutes. • Enables real-time reconciliation to ensure accurate orders.
  • 19. Example: Web Shopping • IMDG holds customer information for active Web users. • IMDG saves/retrieves customer information from backing store. • Web browsers send activity information to analytics engine. • IMDG updates customer history and preferences. • Analytics engine identifies browsing and buying patterns. • Analytics engine makes suggestions in real time. Also sends email follow-ups.
  • 20. Example: Retail Shopping Brick-and-mortar stores use OI to compete with the online experience: • IMDG tracks opt-in customers to make recommendations. • RFID tags identify product selection and availability in showroom. • Analytics engine sends real-time advisories to sales staff via tablet.
  • 21. Comparison: IMDGs to Spark Focus: accelerating business intelligence using in-memory computing: • In-memory computing to accelerate and extend Hadoop MapReduce using data-parallel operators in Scala. • Stores data as “resilient distributed datasets” (RDDs): • Distributed across cluster • Immutable • Hold data from/output to HDFS. • Manages data stream as a sequence of RDDs. • Comparison to IMDG: • Not designed for operational systems: • Lacks high availability (uses lineage). • Intended for data-parallel operations: • Lacks CRUD APIs on individual objects.
  • 22. Comparison to Storm • Focus: continuous processing of input streams • Storm implements pipelined execution of tasks by “bolts” on incoming data streams. • Streams can be distributed to bolts with configurable mappings. • Developer controls the number of tasks per bolt. • Storm uses a centralized master node and Zookeeper for fault-tolerance. • Issues: • Managing global state • Minimizing data motion • Complexity / tuning
  • 23. Implementing an Example in FinServ • Hedge fund tracks a set of hedging strategies: • Strategies can cover various market sectors, such as high-tech, automotive, energy, consumer, real estate, etc. • Each strategy contains list of holdings and rules for managing the holdings (such as target allocations). • Updates to market data continuously arrive during the trading day. • The challenge: update and analyze a large population of hedging strategies to immediately alert traders.
  • 24. In-Memory Model • The IMDG holds hedging strategies as an object-oriented collection. • Updates to market data are managed as a series of snapshot objects. • The IMDG performs repeated data-parallel analysis on hedging strategies to generate alerts. • Merges alerts and feeds to traders in real time. • IMDG automatically and dynamically scales its throughput to handle new hedging strategies by adding servers.
  • 25. Implementing the Analysis Step 1: Select all objects using parallel query of strategy objects: • Query spec matches data’s object-oriented properties. • Selected objects are fed to the analysis engine on each local server.
  • 26. Java Example: Parallel Query

        public class Portfolio {
            private long id;
            private Set<Stock> longPositions;
            private Set<Stock> shortPositions;
            private double totalValue;
            private Region region;
            private boolean alerted; // alert for trading

            @SossIndexAttribute // query-able property
            public double getTotalValue() {…}

            @SossIndexAttribute // query-able property
            public Region getRegion() {…}

            public Set<Long> evalPositions(MarketSnapshot ms) {…}
        }

        NamedCache pset = CacheFactory.getCache("portfolios");
        Set<Portfolio> res = pset.queryObjects(Portfolio.class,
            and(greaterThan("totalValue", 1000000),
                equals("region", Region.US)));
  • 27. Implementing the Analysis Step 2: Create parallel methods to update and analyze the queried collection of hedging strategies: • “Eval” method applies market snapshot to an instance of a strategy object: • Compare to a MapReduce mapper; adds an input parameter. • Updates the strategy object’s positions. • Analyzes the positions for a deviation from allowed rules. • Optionally generates an alert. • “Merge” method combines alerts across the collection of strategies: • Compare to a MapReduce combiner. • Uses binary combining. • Is applied globally to the object collection by the IMDG (unlike a MapReduce reducer). • Note: both methods access hydrated objects, avoiding the need for CRUD access.
  • 28. Java Example: Parallel Method Invocation • Create method to analyze a queried portfolio and another method to pair-wise merge the result sets of alerted portfolios:

        public class PortfolioAnalysis
                implements Invokable<Portfolio, MarketSnapshot, Set<Long>> {

            public Set<Long> eval(Portfolio p, MarketSnapshot ms)
                    throws InvokeException {
                // update portfolio and return id if alerted:
                return p.evalPositions(ms);
            }

            public Set<Long> merge(Set<Long> set1, Set<Long> set2)
                    throws InvokeException {
                set1.addAll(set2);
                return set1; // merged set of alerted portfolio ids
            }
        }
  • 29. Java Example: Parallel Method Invocation • Run a parallel method invocation on a queried set of portfolios and return set of ids for alerted portfolios:

        NamedCache pset = CacheFactory.getCache("portfolios");
        InvokeResult alertedPortfolios = pset.invoke(
            PortfolioAnalysis.class,
            Portfolio.class,
            and(greaterThan("totalValue", 1000000), // query spec
                equals("region", Region.US)),
            marketSnapshot, // parameters
            ... );
        System.out.println("The alerted portfolios are " + alertedPortfolios.getResult());
  • 30. Running the Analysis • IMDG ships user’s code and libraries to its servers. • IMDG automatically schedules analysis operations across all grid servers and cores: • The analysis runs on all objects selected by the parallel query. • Each grid server analyzes its locally stored objects to minimize data motion. • Parallel execution ensures fast completion time: • IMDG automatically distributes workload across servers/cores. • Scaling the IMDG automatically handles larger data sets.
  • 31. Merging the Results • The IMDG automatically merges all analysis results: • The IMDG first merges all results within each grid server in parallel. • It then merges results across all grid servers to create one combined result. • Efficient parallel merge minimizes the delay in combining all results. • The IMDG delivers the combined result to the invoking application as one object.
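The eval-then-merge flow described on the last few slides can be sketched in plain Java, without any grid. This is an illustration under assumptions, not the IMDG's implementation: each inner list stands in for the objects stored on one grid server, `TwoLevelMerge` and `invoke` are hypothetical names, and the merge function is assumed to be associative (set union, as in the deck's example, qualifies).

```java
import java.util.List;
import java.util.Optional;
import java.util.function.BiFunction;
import java.util.function.BinaryOperator;

/** Single-process sketch of "eval everywhere, merge per server, then merge globally". */
final class TwoLevelMerge {
    /**
     * @param partitions one inner list per simulated grid server
     * @param param      the shared input parameter (e.g., a market snapshot)
     * @param eval       analyzes one object against the parameter
     * @param merge      associative pair-wise combiner of two results
     * @return the single combined result, or empty if there were no objects
     */
    static <T, P, R> Optional<R> invoke(List<List<T>> partitions,
                                        P param,
                                        BiFunction<T, P, R> eval,
                                        BinaryOperator<R> merge) {
        return partitions.parallelStream()
                // Level 1: each "server" evaluates its local objects and merges locally.
                .map(local -> local.stream()
                        .map(obj -> eval.apply(obj, param))
                        .reduce(merge))
                // Servers that hold no objects contribute nothing.
                .filter(Optional::isPresent)
                .map(Optional::get)
                // Level 2: merge the per-server results into one combined result.
                .reduce(merge);
    }
}
```

Merging locally first means only one result per server has to cross the (simulated) network, which is the point of the two-level scheme.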
  • 32. Output: Real-Time Alerts • In-memory analysis delivers a set of alerts to traders every 300 msec. • Enables the trader to examine strategy details in real time.
  • 33. Sample Performance Results for PMI • Measured a similar financial services application (back testing stock trading strategies on stock histories) • Hosted IMDG in Amazon EC2 using 75 servers holding 1 TB of stock history data in memory • IMDG handled a continuous stream of updates (1.1 GB/s) • Results: analyzed 1 TB in 4.1 seconds (250 GB/s) with linear scaling
  • 34. In-Memory MapReduce Benefits: • Enables use of standard Hadoop MapReduce for operational intelligence. • Accelerates data access by holding data in memory. • Analyzes and updates “live” data. • Reduces overheads of standard Hadoop distributions: • Batch scheduling • Disk access • Data shuffling • Mandatory key sorting • Enables new features, e.g.: • Global combining, optional sorting
  • 35. Running MapReduce on an IMDG • A Hadoop distribution does not have to be installed unless HDFS is used. • The developer starts MapReduce applications from a remote workstation. • The IMDG automatically builds a reusable “invocation grid” of JVMs on the grid’s servers for PMI and ships the application’s jars. • Results are stored in the IMDG, HDFS, or optionally globally merged and returned to the remote workstation.
  • 36. Run In-Memory MR with YARN • YARN transparently integrates batch and in-memory MapReduce into a single execution framework with shared access to HDFS. • For example, the IMDG can transparently run Apache Hive in memory. • [Figure: Example of ScaleOut hServer with Hortonworks] • [Figure: Example of Hive running on IMDG]
  • 37. Implementing MapReduce • Run MapReduce as two PMI phases: • Data can be input from either the IMDG or an external data source. • Works with any input/output format compatible with the Apache distribution. • The IMDG uses its data-parallel execution engine (PMI) to invoke the mappers and the reducers. • Eliminates batch scheduling overhead. • Intermediate results are stored within the IMDG. • Minimizes data motion between the mappers and reducers. • Allows optional sorting. • Output of a single reducer/combiner can optionally be globally merged.
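The two-phase structure described above can be sketched as a small in-memory word count. This is a hedged illustration in plain Java, not hServer code: `TwoPhaseMapReduceSketch` is a hypothetical name, a `ConcurrentHashMap` stands in for the intermediate results held in the IMDG, and parallel streams stand in for PMI.

```java
import java.util.List;
import java.util.Map;
import java.util.Queue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.stream.Collectors;

// Hypothetical sketch of MapReduce as two data-parallel phases with in-memory intermediate data.
final class TwoPhaseMapReduceSketch {

    static Map<String, Integer> wordCount(List<String> lines) {
        // Phase 1 (map): mappers run in parallel and emit (word, 1) pairs,
        // which are grouped by key in memory instead of being written to disk.
        Map<String, Queue<Integer>> intermediate = new ConcurrentHashMap<>();
        lines.parallelStream().forEach(line -> {
            for (String word : line.toLowerCase().split("\\s+")) {
                if (!word.isEmpty()) {
                    intermediate.computeIfAbsent(word, k -> new ConcurrentLinkedQueue<>()).add(1);
                }
            }
        });

        // Phase 2 (reduce): one reducer call per key, run in parallel; keys are never sorted.
        return intermediate.entrySet().parallelStream()
                .collect(Collectors.toConcurrentMap(
                        Map.Entry::getKey,
                        e -> e.getValue().stream().mapToInt(Integer::intValue).sum()));
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = wordCount(List.of("to be or not", "to be"));
        System.out.println(counts.get("to") + " " + counts.get("not")); // prints 2 1
    }
}
```

Because the result map is unordered, the sketch also shows why key sorting can be optional when the application does not need sorted output.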
  • 38. Accessing IMDG Data for M/R • The IMDG adds a grid input format for accessing key/value pairs held in the IMDG. • MapReduce programs can optionally output results to the IMDG with the grid output format. • The Grid Record Reader optimizes access to key/value pairs to eliminate network overhead. • Applications can access and update key/value pairs as operational data during analysis.
  • 39. Optional Caching of HDFS Data • The IMDG adds a Dataset Record Reader (wrapper) to cache HDFS data during program execution. • Hadoop automatically retrieves data from the IMDG on subsequent runs. • The Dataset Record Reader stores and retrieves data with minimum network and memory overheads. • Tests with the TeraSort benchmark have demonstrated 11X faster data access than HDFS without the IMDG.
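The wrapper idea above (read from the slow store once, serve later runs from memory) can be sketched as follows. This is a simplified illustration and does not use the Hadoop `RecordReader` interface or the actual Dataset Record Reader: the class `CachingReaderSketch`, the `slowSource` function standing in for HDFS, and the path string are all hypothetical.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

// Hypothetical sketch: wraps a slow record source and caches its records in memory by path.
final class CachingReaderSketch {
    private final Map<String, List<String>> cache = new ConcurrentHashMap<>();
    private final Function<String, List<String>> slowSource; // stands in for reading from HDFS
    private final AtomicInteger sourceReads = new AtomicInteger();

    CachingReaderSketch(Function<String, List<String>> slowSource) {
        this.slowSource = slowSource;
    }

    // The first call for a path reads the slow source; later calls are served from the cache.
    List<String> read(String path) {
        return cache.computeIfAbsent(path, p -> {
            sourceReads.incrementAndGet();
            return List.copyOf(slowSource.apply(p));
        });
    }

    int sourceReads() {
        return sourceReads.get();
    }

    public static void main(String[] args) {
        CachingReaderSketch reader =
                new CachingReaderSketch(path -> List.of(path + ":record-1", path + ":record-2"));
        reader.read("/data/part-0"); // first run: reads the slow source
        reader.read("/data/part-0"); // second run: served from memory
        System.out.println(reader.sourceReads()); // prints 1
    }
}
```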
  • 40. In-Memory Storage Models • The IMDG needs multiple in-memory storage models: • Named cache, optimized for rich semantics on large objects: • Property-based query • Distributed locking • Access from remote grids • Named map, optimized for efficient storage and bulk analysis (e.g., MapReduce): • Highly efficient object storage • Pipelined, bulk-access mechanisms
  • 41. In-Memory Storage Optimizations • In-Memory Concurrent Map: • Stores key/value pairs in chunks. • Allows CRUD operations on key/value pairs. • Automatically organizes chunks into splits. • Uses a per-split hash table to access keys and manage multi-valued keys. • Stores the shuffled data set between mappers and reducers. • Pipelines chunks to mappers and from reducers. • Optionally uses memory-mapped files to reduce access latency. • Provides support for sorting keys.
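A rough sketch of the per-split hash table with multi-valued keys is shown below. It is an assumption-laden simplification, not the actual data structure: `SplitMapSketch` is a hypothetical class, splits are chosen by key hash, each split is guarded by its own lock, and chunking, pipelining, memory-mapped files, and sorting are not modeled.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: keys are partitioned into splits, each with its own hash table.
final class SplitMapSketch<K, V> {
    private final List<Map<K, List<V>>> splits = new ArrayList<>();

    SplitMapSketch(int splitCount) {
        if (splitCount < 1) throw new IllegalArgumentException("splitCount must be >= 1");
        for (int i = 0; i < splitCount; i++) {
            splits.add(new HashMap<>());
        }
    }

    // Pick the split that owns a key; floorMod keeps the index non-negative.
    private Map<K, List<V>> splitFor(K key) {
        return splits.get(Math.floorMod(key.hashCode(), splits.size()));
    }

    // Create/update: a key may hold several values (multi-valued key).
    void add(K key, V value) {
        Map<K, List<V>> split = splitFor(key);
        synchronized (split) {
            split.computeIfAbsent(key, k -> new ArrayList<>()).add(value);
        }
    }

    // Read: returns an immutable snapshot of the values stored for the key.
    List<V> get(K key) {
        Map<K, List<V>> split = splitFor(key);
        synchronized (split) {
            return List.copyOf(split.getOrDefault(key, List.of()));
        }
    }

    // Delete: removes the key and all its values; returns true if the key existed.
    boolean remove(K key) {
        Map<K, List<V>> split = splitFor(key);
        synchronized (split) {
            return split.remove(key) != null;
        }
    }

    public static void main(String[] args) {
        SplitMapSketch<String, Integer> map = new SplitMapSketch<>(4);
        map.add("GOOG", 1);
        map.add("GOOG", 2);
        System.out.println(map.get("GOOG")); // prints [1, 2]
    }
}
```

Locking per split rather than per map lets operations on different splits proceed concurrently, which is one plausible reason to organize storage this way.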
  • 42. In-Memory M/R Optimizations • MapReduce optimizations: • Optional sorting • Optional multicast of parameters to mappers • Optional O(logN) global combining (avoids single, sequential reducer) • Optional HDFS caching • Optional reuse of JVMs across jobs • Measured performance: • Startup times reduced to a few milliseconds • Word count benchmark shows 20X speedup. • Real-world example shows >40X speedup. • Current limitations: • No specific security for multi-tenancy • Intermediate data must fit in the IMDG
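The O(logN) global combining listed above can be illustrated with a pair-wise tree merge: N partial results are combined in about log2(N) rounds instead of being fed one by one to a single reducer. The sketch below is a generic illustration, not ScaleOut's implementation. `TreeCombineSketch` and its methods are hypothetical names, and the pairs within a round are merged sequentially here, whereas a grid could merge them in parallel.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BinaryOperator;

// Hypothetical sketch: combine N partial results pair-wise in ceil(log2 N) rounds.
final class TreeCombineSketch {

    static <R> R combine(List<R> partials, BinaryOperator<R> merge) {
        if (partials.isEmpty()) {
            throw new IllegalArgumentException("no partial results to combine");
        }
        List<R> level = new ArrayList<>(partials);
        while (level.size() > 1) {
            List<R> next = new ArrayList<>();
            // Each pair in a round is independent of the others.
            for (int i = 0; i < level.size(); i += 2) {
                next.add(i + 1 < level.size()
                        ? merge.apply(level.get(i), level.get(i + 1))
                        : level.get(i)); // odd element carries over to the next round
            }
            level = next;
        }
        return level.get(0);
    }

    // Number of merge rounds needed for n partial results.
    static int roundsFor(int n) {
        int rounds = 0;
        while (n > 1) {
            n = (n + 1) / 2;
            rounds++;
        }
        return rounds;
    }

    public static void main(String[] args) {
        System.out.println(combine(List.of(1, 2, 3, 4, 5), Integer::sum)); // prints 15
        System.out.println(roundsFor(8)); // prints 3
    }
}
```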
  • 43. Accelerating Start-Up Times • Re-use in-memory context across MapReduce jobs:
        public static void main(String argv[]) throws Exception {
            // Configure and load the invocation grid
            InvocationGrid grid = HServerJob.getInvocationGridBuilder("myGrid").
                    // Add JAR files as IG dependencies
                    addJar("main-job.jar").
                    addJar("first-library.jar").
                    // Add classes as IG dependencies
                    addClass(MyMapper.class).
                    addClass(MyReducer.class).
                    // Define custom JVM parameters
                    setJVMParameters("-Xms512M -Xmx1024M").
                    load();

            // Run 10 jobs on the same invocation grid
            for (int i = 0; i < 10; i++) {
                Configuration conf = new Configuration();
                // The preloaded invocation grid is passed as the parameter to the job
                Job job = new HServerJob(conf, "Job number " + i, false, grid);
                // ......Configure the job here.........
                // Run the job
                job.waitForCompletion(true);
            }

            // Unload the invocation grid when we are done
            grid.unload();
        }
  • 44. Recap • Online systems need operational intelligence on “live” data for immediate feedback. • Operational intelligence can be implemented using an IMDG integrated with data-parallel analysis. • IMDGs track “live” state: • Model real-world entities as a highly available object collection. • Enable updates to track changes. • Use data-parallel computation for immediate feedback with low latency. • Can run standard MapReduce.
  • 46. Parallel Query Example (C#)
    • Mark class properties as indexes for query:
        class Stock
        {
            [SossIndex]
            public string Ticker { get; set; }
            public decimal TotalShares { get; set; }
            public decimal Price { get; set; }
        }
    • Define a query using these properties:
        NamedCache cache = CacheFactory.GetCache("Stocks");
        var q = from s in cache.QueryObjects<Stock>()
                where s.Ticker == "GOOG" || s.Ticker == "ORCL"
                select s;
        Console.WriteLine("{0} Stocks found", q.Count());
  • 47. Example of Analysis Code (C#)
    • Create method to analyze each queried stock object:
        // The parameter is named calcParams because "params" is a reserved keyword in C#.
        static decimal eval(Stock stock, StockCalcParams calcParams)
        {
            return stock.Price * stock.TotalShares;
        }
    • Create method to pair-wise merge the results:
        static decimal merge(decimal r1, decimal r2)
        {
            return r1 + r2;
        }
  • 48. Invoking the Parallel Analysis (C#)
    • Run a parallel method invocation:
        NamedCache cache = CacheFactory.GetCache("Stocks");
        decimal valueOfSelectedStocks =
            (from s in cache.QueryObjects<Stock>()
             where s.Ticker == "GOOG" || s.Ticker == "ORCL"
             select s)
            .Invoke(new StockCalcParams(…), new Func<Stock, StockCalcParams, decimal>(eval))
            .Merge(new Func<decimal, decimal, decimal>(merge));
        Console.WriteLine("The value of selected stocks is {0}", valueOfSelectedStocks);
  • 49. 17TH ~ 18th NOV 2014 MADRID (SPAIN)