Splice Machine Proprietary and Confidential
Powering Real-Time Applications & Analytics
Enabling Decisions in the Moment
John Leach, CTO & Co-Founder
Life Sciences
Digital Marketing
Fraud Detection
DECISIONS IN THE MOMENT
Supply Chain Optimization
Today’s Reality: Stale Data, Backward-Looking Decisions
How old is the data in your reports?
☐ 1 day +
☐ 1 day
☐ 4 hours +
☐ 1 hour +
☐ Real-time
Poll results: How old is the data in your reports?
[Pie chart of responses across the five options above: 24%, 50%, 7%, 9%, 9%]
* Source: Webinars on 11-3-15 and 12-10-15, 237 respondents
Data Gridlock: Complex, Outdated ETL Pipelines
[Diagram: OLTP systems (ERP, CRM, Supply Chain, HR, …) and an ODS, linked by stream or batch updates and by ETL (Extract, Transform, Load) to OLAP systems (Data Warehouse, Datamart) that serve ad hoc analytics, executive business reports, operational reports, and mixed-workload apps]
§ Pain
  § Separate OLTP & OLAP systems
  § Messy ETL “glue”
§ Why?
  § Different workloads
  § Different data structures
  § Hard to isolate workloads
§ No longer adequate
  § Can’t afford to wait days or hours to analyze data
Current architectures are unable to keep up
Nirvana: No More ETL
[Diagram: an OLTP app and OLAP reports served by a single OLTP/OLAP system running simultaneous OLTP & OLAP workloads]
§ Benefits
  § Faster, since Big Data does not need to be moved
  § Eliminate expensive ETL and data warehouse systems
  § Act on real-time data instead of yesterday’s
§ Why is it possible now?
Disruptive Technology Enablers
§ Scale-Out Technology
§ In-Memory Technology
Scale Up (increase server size) vs. Scale Out (more small servers)
[Diagram: relative cost of each approach, shown as $ symbols]
The Splice Machine RDBMS: Replace Oracle & MySQL
The First RDBMS Powered by Hadoop & Spark
SQL · Scale Out · Speed
§ ANSI SQL: No retraining or rewrites for SQL-based analysts, reports, and applications
§ ¼ the Cost: Scales out on commodity hardware
§ Transactions: Ensure reliable updates across multiple rows
§ Mixed Workloads: Simultaneously support OLTP and OLAP workloads
§ Elastic: Increase scale in just a few minutes
§ 10-20x Faster: Leverages Spark in-memory technology
Omni-Channel Marketing: Harte-Hanks
Overview
§ Digital marketing services provider
§ Unified Customer Profile
§ Real-time campaign management
§ Operational application with BI reports
Challenges
§ Oracle RAC too expensive to scale
§ Queries too slow: some took up to ½ hour
§ Getting worse: 30-50% data growth expected
§ Spent 9 months looking for a cost-effective solution
Solution Diagram
[Diagram: cross-channel campaigns, real-time personalization, real-time actions]
Initial Results
§ ¼ cost with commodity scale-out
§ 3-7x faster through parallelized queries
§ 10-20x price/performance with no application, BI, or ETL rewrites
Simultaneous OLTP & OLAP Workloads
Very few applications are OLTP only
[Diagram: Traditional RDBMSs run OLTP and OLAP in one engine, causing bottlenecks and delays; Splice Machine runs OLTP on HBase and OLAP on Spark, giving workload isolation]
Simultaneous OLTP & OLAP Workloads
Separate OLTP & OLAP processes isolate workloads
§ Traditional RDBMSs: as OLAP load rises, OLTP response times increase
§ Splice Machine: as OLAP load rises, OLTP response times remain flat
[Charts: OLTP response time vs. OLAP load for each system]
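The slides say that OLTP and OLAP run in separate processes. They do not say how a statement is assigned to one of them. The following is a minimal sketch that assumes the choice comes from the optimizer's row estimate compared against a cut-over threshold. `EngineRouter`, `Engine`, and any threshold value are hypothetical and are not Splice Machine's API or configuration.

```java
// Hypothetical names throughout: this is not Splice Machine's actual API.
enum Engine { HBASE_OLTP, SPARK_OLAP }

class EngineRouter {
    private final double rowThreshold; // hypothetical cut-over point

    EngineRouter(double rowThreshold) {
        this.rowThreshold = rowThreshold;
    }

    // Estimated rows touched: table cardinality scaled by each predicate's
    // selectivity (assumes the predicates are independent).
    static double estimateRows(long tableRows, double... selectivities) {
        double rows = tableRows;
        for (double s : selectivities) {
            rows *= s;
        }
        return rows;
    }

    // Small, selective statements stay on the low-latency HBase path;
    // large scans go to Spark so they do not compete with OLTP work.
    Engine route(double estimatedRows) {
        return estimatedRows > rowThreshold ? Engine.SPARK_OLAP : Engine.HBASE_OLTP;
    }
}
```

For example, `new EngineRouter(20_000).route(EngineRouter.estimateRows(1_000_000, 0.001))` yields `HBASE_OLTP`. A production optimizer would compare full plan costs rather than a single row count. The threshold form only keeps the sketch small.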
Proven Building Blocks: Spark, Hadoop and Derby
Apache Derby
§ ANSI SQL-99 RDBMS
§ Java-based
§ ODBC/JDBC compliant
Apache HBase/Hadoop
§ Auto-sharding
§ High availability
§ Scalability to 100s of PBs
Apache Spark
§ Analytical engine
§ Fast, in-memory technology
§ Memory resilient to node failure
HBase: Proven Scale-Out
§ Auto-sharding
§ Scales with commodity hardware
§ Cost-effective from GBs to PBs
§ High availability through failover and replication
§ LSM-trees
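Auto-sharding in HBase splits a table into regions. Each region owns a contiguous, sorted range of row keys, and a region is split when it grows too large. The sketch below models only the key-range lookup and a split. `RegionMap` and its methods are hypothetical stand-ins, not the HBase client API. The real split policy (size thresholds, reassigning regions to servers) is omitted.

```java
import java.util.TreeMap;

// Hypothetical model of range-based sharding: each region owns the row keys
// from its start key (inclusive) up to the next region's start key (exclusive).
class RegionMap {
    // start key -> region name; "" is the start key of the first region
    private final TreeMap<String, String> regionsByStartKey = new TreeMap<>();

    RegionMap(String firstRegion) {
        regionsByStartKey.put("", firstRegion);
    }

    // The owning region is the one with the greatest start key <= rowKey.
    // floorEntry never returns null because "" sorts before every key.
    String regionFor(String rowKey) {
        return regionsByStartKey.floorEntry(rowKey).getValue();
    }

    // Split: the region that contained splitKey keeps the keys below it,
    // and newRegion takes over the keys from splitKey upward.
    void split(String splitKey, String newRegion) {
        if (splitKey.isEmpty()) {
            throw new IllegalArgumentException("cannot split at the first region's start key");
        }
        regionsByStartKey.put(splitKey, newRegion);
    }

    int regionCount() {
        return regionsByStartKey.size();
    }
}
```

Because lookups depend only on sorted start keys, a split changes ownership of one key range and leaves every other region untouched.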
Apache Spark
Unmatched Performance
§ Fastest sort of 1PB of data
Advanced In-Memory Technology
§ Spill-to-disk for large datasets
§ Resilient against node failures
§ Pipelining for computation parallelism
Most Active Apache Community
§ Almost 500 contributors
Extensive Libraries
§ Over 140 and growing
§ Libraries for machine learning, streaming and graph processing
How is Spark aiding Splice Machine?
Address HBase Challenges
§ Compactions
§ Large Data Movements
RDBMS Features
§ Index Creation
§ Statistics Collection
§ Import
§ Admin UI
Analytic Processing
§ Pipelining for computation parallelism
§ Lineage
Machine Learning
§ Incorporating MLlib into the RDBMS
Splice Machine: Advanced Spark Integration
Compaction: LSM Tree (Deal with the Devil)
Splice Machine: Spark/HBase Integration: Compaction
[Diagram: minor compaction vs. major compaction]
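In HBase's LSM design, writes land in the Memstore and are flushed as immutable sorted HFiles. Compactions merge those files so that reads touch fewer of them. A minor compaction merges a subset of a store's HFiles. A major compaction rewrites all of them into one file and drops deleted cells.

The following is a minimal in-memory sketch under simplifying assumptions:
- string keys;
- one version per key;
- a `null` value acting as a delete marker;
- whole files held as `TreeMap`s instead of being streamed.

`LsmCompaction` is hypothetical and is not HBase's implementation.

```java
import java.util.List;
import java.util.Objects;
import java.util.TreeMap;

// Hypothetical, simplified LSM compaction. Each "file" is an immutable sorted
// map of row key -> value, where a null value is a delete tombstone.
class LsmCompaction {
    // Merge files ordered oldest first: a newer value (or tombstone) for the
    // same key overwrites the older one. Input maps are not modified.
    static TreeMap<String, String> merge(List<TreeMap<String, String>> filesOldestFirst) {
        TreeMap<String, String> merged = new TreeMap<>();
        for (TreeMap<String, String> file : filesOldestFirst) {
            merged.putAll(file);
        }
        return merged;
    }

    // Minor compaction: tombstones must be kept, because files outside this
    // compaction may still hold older values for the deleted keys.
    static TreeMap<String, String> minorCompact(List<TreeMap<String, String>> someFiles) {
        return merge(someFiles);
    }

    // Major compaction: every file of the store is merged, so nothing older
    // remains for a tombstone to hide and deleted keys can be dropped entirely.
    static TreeMap<String, String> majorCompact(List<TreeMap<String, String>> allFiles) {
        TreeMap<String, String> merged = merge(allFiles);
        merged.values().removeIf(Objects::isNull);
        return merged;
    }
}
```

The "deal with the devil" trade-off shows up here. Writes stay cheap because they only append new files, but data is rewritten later during compaction.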
Splice Machine: Advanced Spark Integration
Innovative, High-Performance RDD Creation
§ Fast access to HFiles in HDFS
§ Merged with deltas from Memstore
§ Avoids slower HBase API
Universal Execution Plan and Byte Code
§ Optimizer, plan and code shared across Spark and HBase execution
[Diagram: on each physical node, an HBase region server holds regions 1…N, each with a Memstore and HFiles in HDFS; a Spark worker on the same node creates RDD 1…N from those HFiles and Memstores]
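"Merged with deltas from Memstore" means combining two key-sorted inputs:
- rows read directly from the HFiles;
- newer changes still buffered in the Memstore.

The following is a minimal sketch of that merge. It assumes string keys, one version per key, and a `null` Memstore value meaning "deleted". `RegionScanMerge` is hypothetical and is not Splice Machine's or HBase's code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical two-way merge of key-sorted HFile rows with newer Memstore deltas.
class RegionScanMerge {
    // On equal keys the Memstore delta replaces the HFile row; a null delta
    // value is treated as a delete and produces no output row.
    static List<Map.Entry<String, String>> merge(List<Map.Entry<String, String>> hfileRows,
                                                 List<Map.Entry<String, String>> memstoreDeltas) {
        List<Map.Entry<String, String>> out = new ArrayList<>();
        int i = 0;
        int j = 0;
        while (i < hfileRows.size() || j < memstoreDeltas.size()) {
            int cmp;
            if (i == hfileRows.size()) {
                cmp = 1;  // only Memstore deltas remain
            } else if (j == memstoreDeltas.size()) {
                cmp = -1; // only HFile rows remain
            } else {
                cmp = hfileRows.get(i).getKey().compareTo(memstoreDeltas.get(j).getKey());
            }

            if (cmp < 0) {
                out.add(hfileRows.get(i++)); // key not touched since the flush
            } else {
                Map.Entry<String, String> delta = memstoreDeltas.get(j++);
                if (cmp == 0) {
                    i++; // skip the stale HFile version of this key
                }
                if (delta.getValue() != null) {
                    out.add(delta);
                }
            }
        }
        return out;
    }
}
```

The merge is a single linear pass over both inputs. That is why reading HFiles directly and merging deltas can avoid row-by-row calls through the HBase API.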
Splice Machine Architecture
1. Standard install of an HBase cluster (HBase, HDFS, ZooKeeper) with Spark
2. Distribute the Splice Machine JAR to each region server
3. Automatically invoke co-processors on each region
[Diagram (legend: HBase co-processor). Cluster services: HMaster, ZooKeeper, Spark Master. On each node: an HBase region server over HDFS whose regions run the Splice parser, planner, optimizer, and executor (snapshot isolation, indexes), next to a Spark worker whose executors run tasks with a cache over RDDs]
Splice Machine: Query Execution
1. Parse SQL
  • Generate Abstract Syntax Tree (AST)
  • Bind AST to Transactional Dictionary
2. Optimize query plan
  • Determine join order and storage structure (e.g., base table, index) using table statistics (e.g., cardinality estimates)
  • Push predicates
  • Unroll nested subqueries
3. Generate optimal byte code
OLTP Execution on HBase
4a. Execute OLTP query from byte code
5a. Use block cache and bloom filters to optimize data access
6a. Return results
OLAP Execution on Spark
4b. Generate Spark execution plan
5b. Submit Spark plan with byte code
6b. Fair scheduling of distributed tasks
7b. Generate RDD from HFiles and Memstore
8b. Execute query and return results
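Step 5a relies on bloom filters. HBase can keep a bloom filter per HFile, for example on row keys, so a point lookup skips any file whose filter answers "definitely not here". Below is a minimal bloom filter sketch. `BloomFilter` is hypothetical, and its double-hashing scheme built on `String.hashCode` is an illustrative choice, not HBase's hash functions or on-disk layout.

```java
import java.util.BitSet;

// Hypothetical bloom filter: k bit positions per key, chosen by double hashing.
class BloomFilter {
    private final BitSet bits;
    private final int numBits;
    private final int numHashes;

    BloomFilter(int numBits, int numHashes) {
        if (numBits <= 0 || numHashes <= 0) {
            throw new IllegalArgumentException("numBits and numHashes must be positive");
        }
        this.bits = new BitSet(numBits);
        this.numBits = numBits;
        this.numHashes = numHashes;
    }

    // i-th bit position: (h1 + i * h2) mod numBits, with h2 derived from h1.
    private int position(String key, int i) {
        int h1 = key.hashCode();
        int h2 = Integer.reverse(h1) | 1; // odd, so successive probes differ
        return Math.floorMod(h1 + i * h2, numBits);
    }

    void add(String key) {
        for (int i = 0; i < numHashes; i++) {
            bits.set(position(key, i));
        }
    }

    // false: the key was never added, so the file can be skipped.
    // true: the key may be present (false positives are possible).
    boolean mightContain(String key) {
        for (int i = 0; i < numHashes; i++) {
            if (!bits.get(position(key, i))) {
                return false;
            }
        }
        return true;
    }
}
```

A filter never gives a false "no". Skipping a file on a negative answer is therefore always safe, and a false positive only costs one unnecessary block read.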
Isolated Resource Management
Isolate Spark & HBase resources through Linux cgroups
Configurable Spark Resource Management
§ Prioritize Spark resources between Query, Admin & Import jobs
§ Custom resource pools through XML
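The slide names the pools (Query, Admin, Import) but not the policy among them. Spark's fair scheduler reads pool definitions (`weight`, `minShare`) from an XML allocation file. It orders pools roughly as follows:
1. A pool running fewer tasks than its `minShare` goes first.
2. Otherwise, the pool with the lowest ratio of running tasks to `weight` goes first.

The sketch below re-implements that ordering in simplified form for illustration. The `Pool` class, the chooser, and any weights are hypothetical, and Spark's own scheduler classes are not used.

```java
import java.util.Comparator;
import java.util.List;

// Hypothetical pool model; field names mirror the fair-scheduler concepts.
class Pool {
    final String name;
    final int weight;
    final int minShare;
    int runningTasks;

    Pool(String name, int weight, int minShare, int runningTasks) {
        this.name = name;
        this.weight = weight;
        this.minShare = minShare;
        this.runningTasks = runningTasks;
    }
}

class FairPoolChooser {
    // Pools below their minimum share come first (the one furthest below it wins).
    // Otherwise the pool with the fewest running tasks per unit of weight wins.
    // Ties are broken by pool name.
    static final Comparator<Pool> FAIR_ORDER = (a, b) -> {
        boolean aNeedy = a.runningTasks < a.minShare;
        boolean bNeedy = b.runningTasks < b.minShare;
        if (aNeedy != bNeedy) {
            return aNeedy ? -1 : 1;
        }
        int cmp = aNeedy
                ? Double.compare(a.runningTasks / (double) Math.max(a.minShare, 1),
                                 b.runningTasks / (double) Math.max(b.minShare, 1))
                : Double.compare(a.runningTasks / (double) a.weight,
                                 b.runningTasks / (double) b.weight);
        return cmp != 0 ? cmp : a.name.compareTo(b.name);
    };

    // The pool that should receive the next free task slot.
    static Pool next(List<Pool> pools) {
        return pools.stream().min(FAIR_ORDER)
                .orElseThrow(() -> new IllegalArgumentException("no pools"));
    }
}
```

Consider an admin pool with `minShare` 1 and no running tasks: it is served before query and import work. Between a query pool of weight 2 and an import pool of weight 1 with equal running tasks, the query pool is served first.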
Spark Query Management
§ Visualization of active and completed queries
§ Visualization of stages for each query, plus a kill function
§ Visualization of stages for the query plan, plus a kill function
§ Detailed metrics for tasks in each stage
Federated Query Support
Virtual Table Interface (VTI)
§ Execute federated queries against external files, libraries, or databases
§ External Databases
  § Use JDBC to access data in DBs such as Oracle and DB2
§ External Libraries
  § Access over 140 Spark libraries for machine learning and streaming
§ External Files
  § Pre-defined or dynamic schema
  § Access local FS, HDFS, AWS S3
§ Sample query:
MapReduce I/O Formats
§ Accept federated queries from MapReduce, Pig, and Hive
§ Register Splice Machine schema in HCatalog
§ Merge structured (Splice) and unstructured data in an ad-hoc query
§ Seamless integration with the Hadoop ecosystem
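Machine Learning: Adding Multivariate Statistics via a Stored Procedure
// Imports for the JDBC, Guava, and Spark MLlib types used below. LocatedRow,
// EmbedConnection, EmbedResultSet40, IteratorNoPutResultSet, StandardException,
// SparkMLibUtils, and the helpers ResultSetToRDD, getFieldsToConvert, wrapResults,
// and getColumnStatistics are Splice Machine/Derby internals not shown here.
// Both methods belong to an enclosing class (not shown).
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import com.google.common.base.Throwables;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.mllib.linalg.Vector;
import org.apache.spark.mllib.stat.MultivariateStatisticalSummary;
import org.apache.spark.mllib.stat.Statistics;

public static void getStatementStatistics(String statement, ResultSet[] resultSets) throws SQLException {
    try {
        // Run the SQL statement on the stored procedure's nested (default) connection
        Connection con = DriverManager.getConnection("jdbc:default:connection");
        PreparedStatement ps = con.prepareStatement(statement);
        ResultSet rs = ps.executeQuery();

        // Convert the result set to a Java RDD
        JavaRDD<LocatedRow> resultSetRDD = ResultSetToRDD(rs);

        // Collect column statistics with Spark MLlib
        int[] fieldsToConvert = getFieldsToConvert(ps);
        MultivariateStatisticalSummary summary =
                getColumnStatisticsSummary(resultSetRDD, fieldsToConvert);

        // Return the statistics as the procedure's dynamic result set
        IteratorNoPutResultSet resultsToWrap = wrapResults((EmbedConnection) con,
                getColumnStatistics(ps, summary, fieldsToConvert));
        resultSets[0] = new EmbedResultSet40((EmbedConnection) con, resultsToWrap, false, null, true);
    } catch (StandardException e) {
        throw new SQLException(Throwables.getRootCause(e));
    }
}

private static MultivariateStatisticalSummary getColumnStatisticsSummary(
        JavaRDD<LocatedRow> resultSetRDD, int[] fieldsToConvert) throws StandardException {
    // Turn the selected numeric columns of each row into an MLlib Vector
    JavaRDD<Vector> vectorJavaRDD = SparkMLibUtils.locatedRowRDDToVectorRDD(resultSetRDD, fieldsToConvert);
    // colStats computes per-column count, mean, variance, min, max, and related measures
    return Statistics.colStats(vectorJavaRDD.rdd());
}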
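`Statistics.colStats` returns a `MultivariateStatisticalSummary` with per-column count, mean, variance, min, max, and related measures. For readers without a Spark cluster, the JDK-only sketch below computes a subset of those values for an in-memory table using Welford's one-pass update. `ColumnStats` is hypothetical. Like MLlib, it reports the unbiased sample variance (dividing by n - 1).

```java
import java.util.Arrays;

// Hypothetical, JDK-only per-column summary (a subset of what colStats reports).
class ColumnStats {
    final long count;
    final double[] mean;
    final double[] variance; // unbiased sample variance
    final double[] min;
    final double[] max;

    private ColumnStats(long count, double[] mean, double[] variance, double[] min, double[] max) {
        this.count = count;
        this.mean = mean;
        this.variance = variance;
        this.min = min;
        this.max = max;
    }

    // One pass over the rows; Welford's update keeps mean and variance numerically stable.
    static ColumnStats of(double[][] rows) {
        if (rows.length == 0) {
            throw new IllegalArgumentException("no rows");
        }
        int d = rows[0].length;
        double[] mean = new double[d];
        double[] m2 = new double[d]; // running sum of squared deviations
        double[] min = new double[d];
        double[] max = new double[d];
        Arrays.fill(min, Double.POSITIVE_INFINITY);
        Arrays.fill(max, Double.NEGATIVE_INFINITY);

        long n = 0;
        for (double[] row : rows) {
            n++;
            for (int c = 0; c < d; c++) {
                double x = row[c];
                double delta = x - mean[c];
                mean[c] += delta / n;
                m2[c] += delta * (x - mean[c]);
                min[c] = Math.min(min[c], x);
                max[c] = Math.max(max[c], x);
            }
        }

        double[] variance = new double[d];
        for (int c = 0; c < d; c++) {
            variance[c] = n > 1 ? m2[c] / (n - 1) : 0.0;
        }
        return new ColumnStats(n, mean, variance, min, max);
    }
}
```

For rows `{1, 10}`, `{2, 20}`, `{3, 30}` the means are 2 and 20 and the sample variances are 1 and 100. The stored procedure gets the same kind of summary, computed in parallel across the RDD's partitions.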
ANSI SQL-99+ Coverage
§ Data types: e.g., INTEGER, REAL, CHARACTER, DATE, BOOLEAN, BIGINT
§ DDL: e.g., CREATE TABLE, CREATE SCHEMA, ALTER TABLE, DELETE, UPDATE TABLE
§ Predicates: e.g., IN, BETWEEN, LIKE, EXISTS
§ DML: e.g., INSERT, DELETE, UPDATE, SELECT
§ Query specification: e.g., GROUP BY, HAVING
§ SET functions: e.g., UNION, ABS, MOD, ALL, INTERSECT, EXCEPT
§ Aggregation functions: e.g., AVG, MAX, COUNT
§ String functions: e.g., SUBSTRING, concatenation, UPPER, LOWER, TRIM, LENGTH
§ Constraints: e.g., PRIMARY KEY, CHECK, FOREIGN KEY, UNIQUE, NOT NULL
§ Conditional functions: e.g., CASE, searched CASE
§ Privileges: e.g., privileges for SELECT, DELETE, INSERT, EXECUTE
§ Joins: e.g., INNER JOIN, LEFT OUTER JOIN
§ Transactions: e.g., COMMIT, ROLLBACK, Snapshot Isolation
§ Sub-queries
§ Triggers
§ User-defined functions (UDFs)
§ Views, including grouped views
§ Window Functions: e.g., FIRST_VALUE, LAST_VALUE, LEAD, LAG
High Concurrency, ACID Transactions
Required to support OLTP applications
share_quantity            share_price
TIMESTAMP   VALUE         TIMESTAMP   VALUE
T12         4,000         T7          $15.11
T7          2,000         T5          $15.65
T3          5,000         T2          $15.74
T1          3,000         T0          $15.27
A transaction @T6 reads a “virtual” snapshot: the newest version of each column written at or before T6, i.e., share_quantity from T3 (5,000) and share_price from T5 ($15.65).
value_held = share_quantity * share_price
@T6: value_held = 5,000 * $15.65 = $78,250
@T3: value_held = 5,000 * $15.74 = $78,700
§ State-of-the-art, distributed snapshot isolation
§ Form of Multi-Version Concurrency Control (MVCC)
§ Writers do not block readers
§ Fast, high concurrency
§ Delivers performance for small reads/writes & batch loads
§ Extends research from Google Percolator & Yahoo Labs
§ Patent pending technology
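The share_quantity/share_price example reduces to one rule. A reader with snapshot timestamp T sees, for each cell, the newest version committed at or before T, while writers only add new versions. The following is a minimal sketch of that visibility rule. `VersionedCell` is hypothetical. It leaves out what makes distributed snapshot isolation work in practice: commit and abort tracking, timestamp allocation, and write-write conflict detection.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical multi-version cell: each write is stored under its commit timestamp.
class VersionedCell<V> {
    private final TreeMap<Long, V> versions = new TreeMap<>();

    // A write adds a version; earlier versions stay readable for older snapshots,
    // which is why writers do not block readers.
    void write(long commitTimestamp, V value) {
        versions.put(commitTimestamp, value);
    }

    // Newest version committed at or before the reader's snapshot timestamp,
    // or null if the cell had no committed value yet.
    V readAt(long snapshotTimestamp) {
        Map.Entry<Long, V> version = versions.floorEntry(snapshotTimestamp);
        return version == null ? null : version.getValue();
    }
}
```

Loading the slide's versions, with prices in cents to avoid floating-point rounding, a read at T6 gives 5,000 shares at 1,565 cents. The product is 7,825,000 cents ($78,250), and the later writes at T7 and T12 do not affect it.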
BI and SQL tool support via ODBC/JDBC
No application rewrites needed
Application Framework Support
Advisory Board
The advisory board includes luminaries in databases and technology
Roger Bamford
§ Former Principal Architect at Oracle
§ Father of Oracle RAC
Mike Franklin
§ Computer Science Chair, UC Berkeley
§ Director, UC Berkeley AMPLab, where Apache Spark was created
Marie-Anne Neimat
§ Co-Founder, TimesTen Database
§ Former VP, Database Eng. at Oracle
Ken Rudin
§ Head of Growth and Analysis for Google Search
§ Head of Analytics at Facebook
Abhinav Gupta
§ Co-Founder, VP Engineering at Rocket Fuel
§ Runs 15PB HBase Cluster
Make Decisions in the Moment
Next Steps
Try Us!
Proof of Concept

More Related Content

What's hot

Change Data Capture with Data Collector @OVH
Change Data Capture with Data Collector @OVHChange Data Capture with Data Collector @OVH
Change Data Capture with Data Collector @OVHParis Data Engineers !
 
Big Data Day LA 2015 - The Big Data Journey: How Big Data Practices Evolve at...
Big Data Day LA 2015 - The Big Data Journey: How Big Data Practices Evolve at...Big Data Day LA 2015 - The Big Data Journey: How Big Data Practices Evolve at...
Big Data Day LA 2015 - The Big Data Journey: How Big Data Practices Evolve at...Data Con LA
 
Family data sheet HP Virtual Connect(May 2013)
Family data sheet HP Virtual Connect(May 2013)Family data sheet HP Virtual Connect(May 2013)
Family data sheet HP Virtual Connect(May 2013)E. Balauca
 
Using HBase Co-Processors to Build a Distributed, Transactional RDBMS - Splic...
Using HBase Co-Processors to Build a Distributed, Transactional RDBMS - Splic...Using HBase Co-Processors to Build a Distributed, Transactional RDBMS - Splic...
Using HBase Co-Processors to Build a Distributed, Transactional RDBMS - Splic...Chicago Hadoop Users Group
 
Exponea - Kafka and Hadoop as components of architecture
Exponea  - Kafka and Hadoop as components of architectureExponea  - Kafka and Hadoop as components of architecture
Exponea - Kafka and Hadoop as components of architectureMartinStrycek
 
Bullet: A Real Time Data Query Engine
Bullet: A Real Time Data Query EngineBullet: A Real Time Data Query Engine
Bullet: A Real Time Data Query EngineDataWorks Summit
 
HBaseConAsia2018 Track2-6: Scaling 30TB's of data lake with Apache HBase and ...
HBaseConAsia2018 Track2-6: Scaling 30TB's of data lake with Apache HBase and ...HBaseConAsia2018 Track2-6: Scaling 30TB's of data lake with Apache HBase and ...
HBaseConAsia2018 Track2-6: Scaling 30TB's of data lake with Apache HBase and ...Michael Stack
 
HBaseConAsia2018: Track2-5: JanusGraph-Distributed graph database with HBase
HBaseConAsia2018: Track2-5: JanusGraph-Distributed graph database with HBaseHBaseConAsia2018: Track2-5: JanusGraph-Distributed graph database with HBase
HBaseConAsia2018: Track2-5: JanusGraph-Distributed graph database with HBaseMichael Stack
 
HBaseConAsia2018 Track3-3: HBase at China Life Insurance
HBaseConAsia2018 Track3-3: HBase at China Life InsuranceHBaseConAsia2018 Track3-3: HBase at China Life Insurance
HBaseConAsia2018 Track3-3: HBase at China Life InsuranceMichael Stack
 
Embeddable data transformation for real time streams
Embeddable data transformation for real time streamsEmbeddable data transformation for real time streams
Embeddable data transformation for real time streamsJoey Echeverria
 
An evening with Jay Kreps; author of Apache Kafka, Samza, Voldemort & Azkaban.
An evening with Jay Kreps; author of Apache Kafka, Samza, Voldemort & Azkaban.An evening with Jay Kreps; author of Apache Kafka, Samza, Voldemort & Azkaban.
An evening with Jay Kreps; author of Apache Kafka, Samza, Voldemort & Azkaban.Data Con LA
 
Spark Summit EU talk by Ruben Pulido Behar Veliqi
Spark Summit EU talk by Ruben Pulido Behar VeliqiSpark Summit EU talk by Ruben Pulido Behar Veliqi
Spark Summit EU talk by Ruben Pulido Behar VeliqiSpark Summit
 
Real time fraud detection at 1+M scale on hadoop stack
Real time fraud detection at 1+M scale on hadoop stackReal time fraud detection at 1+M scale on hadoop stack
Real time fraud detection at 1+M scale on hadoop stackDataWorks Summit/Hadoop Summit
 
HBaseConAsia2018 Track2-4: HTAP DB-System: AsparaDB HBase, Phoenix, and Spark
HBaseConAsia2018 Track2-4: HTAP DB-System: AsparaDB HBase, Phoenix, and SparkHBaseConAsia2018 Track2-4: HTAP DB-System: AsparaDB HBase, Phoenix, and Spark
HBaseConAsia2018 Track2-4: HTAP DB-System: AsparaDB HBase, Phoenix, and SparkMichael Stack
 
Building Realtim Data Pipelines with Kafka Connect and Spark Streaming
Building Realtim Data Pipelines with Kafka Connect and Spark StreamingBuilding Realtim Data Pipelines with Kafka Connect and Spark Streaming
Building Realtim Data Pipelines with Kafka Connect and Spark StreamingGuozhang Wang
 
Streaming all over the world Real life use cases with Kafka Streams
Streaming all over the world  Real life use cases with Kafka StreamsStreaming all over the world  Real life use cases with Kafka Streams
Streaming all over the world Real life use cases with Kafka Streamsconfluent
 
Spark Summit EU talk by Mike Percy
Spark Summit EU talk by Mike PercySpark Summit EU talk by Mike Percy
Spark Summit EU talk by Mike PercySpark Summit
 
Performance Tuning RocksDB for Kafka Streams' State Stores (Dhruba Borthakur,...
Performance Tuning RocksDB for Kafka Streams' State Stores (Dhruba Borthakur,...Performance Tuning RocksDB for Kafka Streams' State Stores (Dhruba Borthakur,...
Performance Tuning RocksDB for Kafka Streams' State Stores (Dhruba Borthakur,...confluent
 
Big Data Day LA 2016/ Use Case Driven track - Hydrator: Open Source, Code-Fre...
Big Data Day LA 2016/ Use Case Driven track - Hydrator: Open Source, Code-Fre...Big Data Day LA 2016/ Use Case Driven track - Hydrator: Open Source, Code-Fre...
Big Data Day LA 2016/ Use Case Driven track - Hydrator: Open Source, Code-Fre...Data Con LA
 

What's hot (20)

Change Data Capture with Data Collector @OVH
Change Data Capture with Data Collector @OVHChange Data Capture with Data Collector @OVH
Change Data Capture with Data Collector @OVH
 
Big Data Day LA 2015 - The Big Data Journey: How Big Data Practices Evolve at...
Big Data Day LA 2015 - The Big Data Journey: How Big Data Practices Evolve at...Big Data Day LA 2015 - The Big Data Journey: How Big Data Practices Evolve at...
Big Data Day LA 2015 - The Big Data Journey: How Big Data Practices Evolve at...
 
Family data sheet HP Virtual Connect(May 2013)
Family data sheet HP Virtual Connect(May 2013)Family data sheet HP Virtual Connect(May 2013)
Family data sheet HP Virtual Connect(May 2013)
 
Using HBase Co-Processors to Build a Distributed, Transactional RDBMS - Splic...
Using HBase Co-Processors to Build a Distributed, Transactional RDBMS - Splic...Using HBase Co-Processors to Build a Distributed, Transactional RDBMS - Splic...
Using HBase Co-Processors to Build a Distributed, Transactional RDBMS - Splic...
 
Exponea - Kafka and Hadoop as components of architecture
Exponea  - Kafka and Hadoop as components of architectureExponea  - Kafka and Hadoop as components of architecture
Exponea - Kafka and Hadoop as components of architecture
 
What's new in SQL on Hadoop and Beyond
What's new in SQL on Hadoop and BeyondWhat's new in SQL on Hadoop and Beyond
What's new in SQL on Hadoop and Beyond
 
Bullet: A Real Time Data Query Engine
Bullet: A Real Time Data Query EngineBullet: A Real Time Data Query Engine
Bullet: A Real Time Data Query Engine
 
HBaseConAsia2018 Track2-6: Scaling 30TB's of data lake with Apache HBase and ...
HBaseConAsia2018 Track2-6: Scaling 30TB's of data lake with Apache HBase and ...HBaseConAsia2018 Track2-6: Scaling 30TB's of data lake with Apache HBase and ...
HBaseConAsia2018 Track2-6: Scaling 30TB's of data lake with Apache HBase and ...
 
HBaseConAsia2018: Track2-5: JanusGraph-Distributed graph database with HBase
HBaseConAsia2018: Track2-5: JanusGraph-Distributed graph database with HBaseHBaseConAsia2018: Track2-5: JanusGraph-Distributed graph database with HBase
HBaseConAsia2018: Track2-5: JanusGraph-Distributed graph database with HBase
 
HBaseConAsia2018 Track3-3: HBase at China Life Insurance
HBaseConAsia2018 Track3-3: HBase at China Life InsuranceHBaseConAsia2018 Track3-3: HBase at China Life Insurance
HBaseConAsia2018 Track3-3: HBase at China Life Insurance
 
Embeddable data transformation for real time streams
Embeddable data transformation for real time streamsEmbeddable data transformation for real time streams
Embeddable data transformation for real time streams
 
An evening with Jay Kreps; author of Apache Kafka, Samza, Voldemort & Azkaban.
An evening with Jay Kreps; author of Apache Kafka, Samza, Voldemort & Azkaban.An evening with Jay Kreps; author of Apache Kafka, Samza, Voldemort & Azkaban.
An evening with Jay Kreps; author of Apache Kafka, Samza, Voldemort & Azkaban.
 
Spark Summit EU talk by Ruben Pulido Behar Veliqi
Spark Summit EU talk by Ruben Pulido Behar VeliqiSpark Summit EU talk by Ruben Pulido Behar Veliqi
Spark Summit EU talk by Ruben Pulido Behar Veliqi
 
Real time fraud detection at 1+M scale on hadoop stack
Real time fraud detection at 1+M scale on hadoop stackReal time fraud detection at 1+M scale on hadoop stack
Real time fraud detection at 1+M scale on hadoop stack
 
HBaseConAsia2018 Track2-4: HTAP DB-System: AsparaDB HBase, Phoenix, and Spark
HBaseConAsia2018 Track2-4: HTAP DB-System: AsparaDB HBase, Phoenix, and SparkHBaseConAsia2018 Track2-4: HTAP DB-System: AsparaDB HBase, Phoenix, and Spark
HBaseConAsia2018 Track2-4: HTAP DB-System: AsparaDB HBase, Phoenix, and Spark
 
Building Realtim Data Pipelines with Kafka Connect and Spark Streaming
Building Realtim Data Pipelines with Kafka Connect and Spark StreamingBuilding Realtim Data Pipelines with Kafka Connect and Spark Streaming
Building Realtim Data Pipelines with Kafka Connect and Spark Streaming
 
Streaming all over the world Real life use cases with Kafka Streams
Streaming all over the world  Real life use cases with Kafka StreamsStreaming all over the world  Real life use cases with Kafka Streams
Streaming all over the world Real life use cases with Kafka Streams
 
Spark Summit EU talk by Mike Percy
Spark Summit EU talk by Mike PercySpark Summit EU talk by Mike Percy
Spark Summit EU talk by Mike Percy
 
Performance Tuning RocksDB for Kafka Streams' State Stores (Dhruba Borthakur,...
Performance Tuning RocksDB for Kafka Streams' State Stores (Dhruba Borthakur,...Performance Tuning RocksDB for Kafka Streams' State Stores (Dhruba Borthakur,...
Performance Tuning RocksDB for Kafka Streams' State Stores (Dhruba Borthakur,...
 
Big Data Day LA 2016/ Use Case Driven track - Hydrator: Open Source, Code-Fre...
Big Data Day LA 2016/ Use Case Driven track - Hydrator: Open Source, Code-Fre...Big Data Day LA 2016/ Use Case Driven track - Hydrator: Open Source, Code-Fre...
Big Data Day LA 2016/ Use Case Driven track - Hydrator: Open Source, Code-Fre...
 

Viewers also liked

The Fast Path to Building Operational Applications with Spark
The Fast Path to Building Operational Applications with SparkThe Fast Path to Building Operational Applications with Spark
The Fast Path to Building Operational Applications with SparkSingleStore
 
CWIN17 Frankfurt / Cloudera
CWIN17 Frankfurt / ClouderaCWIN17 Frankfurt / Cloudera
CWIN17 Frankfurt / ClouderaCapgemini
 
The Evolution of Data Architecture
The Evolution of Data ArchitectureThe Evolution of Data Architecture
The Evolution of Data ArchitectureWei-Chiu Chuang
 
Partner Ecosystem Showcase for Apache Ranger and Apache Atlas
Partner Ecosystem Showcase for Apache Ranger and Apache AtlasPartner Ecosystem Showcase for Apache Ranger and Apache Atlas
Partner Ecosystem Showcase for Apache Ranger and Apache AtlasDataWorks Summit
 
Cloudera and Qlik: Big Data Analytics for Business
Cloudera and Qlik: Big Data Analytics for BusinessCloudera and Qlik: Big Data Analytics for Business
Cloudera and Qlik: Big Data Analytics for BusinessData IQ Argentina
 
Put Alternative Data to Use in Capital Markets

Put Alternative Data to Use in Capital Markets
Put Alternative Data to Use in Capital Markets

Put Alternative Data to Use in Capital Markets (Cloudera, Inc.)
Building the Ideal Stack for Real-Time Analytics (SingleStore)
Webinar - Sehr empfehlenswert: wie man aus Daten durch maschinelles Lernen We... (Cloudera, Inc.)
Real-Time Analytics Visualized w/ Kafka + Streamliner + MemSQL + ZoomData, An... (confluent)
Apache Spark—Apache HBase Connector: Feature Rich and Efficient Access to HBa... (Spark Summit)
Using Big Data to Transform Your Customer’s Experience - Part 1 (Cloudera, Inc.)
Softnix Messaging Server
빅데이터윈윈 컨퍼런스_데이터시각화자료 (ABRC_DATA)
Security implementation on hadoop (Wei-Chiu Chuang)
MatFast: In-Memory Distributed Matrix Computation Processing and Optimization... (Spark Summit)
Benefits of Transferring Real-Time Data to Hadoop at Scale (Hortonworks)
[Big Data Spain] Apache Spark Streaming + Kafka 0.10: an Integration Story (Joan Viladrosa Riera)

Similar to Spark as part of a Hybrid RDBMS Architecture-John Leach Cofounder Splice Machine

Trafodion overview (Rohit Jain)
Engineering Machine Learning Data Pipelines Series: Streaming New Data as It ... (Precisely)
Bring Your SAP and Enterprise Data to Hadoop, Kafka, and the Cloud (DataWorks Summit)
October 2016 HUG: Architecture of an Open Source RDBMS powered by HBase and ... (Yahoo Developer Network)
Building Scalable Big Data Infrastructure Using Open Source Software Presenta... (ssuserd3a367)
Big Data Analytics Platforms by KTH and RISE SICS (Big Data Value Association)
Hadoop and the Data Warehouse: Point/Counter Point (Inside Analysis)
SQL Engines for Hadoop - The case for Impala (markgrover)
Hadoop Infrastructure and SoftServe Experience by Vitaliy Bashun, Data Architect (SoftServe)
SnappyData Toronto Meetup Nov 2017 (SnappyData)
Intelligent Integration OOW2017 - Jeff Pollock (Jeffrey T. Pollock)
Pacemaker hadoop infrastructure and soft serve experience (Vitaliy Bashun)
Navigating SAP’s Integration Options (Mastering SAP Technologies 2013) (Sascha Wenninger)
2017 OpenWorld Keynote for Data Integration (Jeffrey T. Pollock)
Using real time big data analytics for competitive advantage (Amazon Web Services)
Hp Converged Systems and Hortonworks - Webinar Slides (Hortonworks)
Bring your SAP and Enterprise Data to Hadoop, Apache Kafka and the Cloud (DataWorks Summit/Hadoop Summit)
Modernizing Business Processes with Big Data: Real-World Use Cases for Produc... (DataWorks Summit/Hadoop Summit)
OOP 2014
Hortonworks.bdb

More from Data Con LA

Data Con LA 2022 Keynotes
Data Con LA 2022 Keynotes
Data Con LA 2022 Keynote
Data Con LA 2022 - Startup Showcase
Data Con LA 2022 Keynote
Data Con LA 2022 - Using Google trends data to build product recommendations
Data Con LA 2022 - AI Ethics
Data Con LA 2022 - Improving disaster response with machine learning
Data Con LA 2022 - What's new with MongoDB 6.0 and Atlas
Data Con LA 2022 - Real world consumer segmentation
Data Con LA 2022 - Modernizing Analytics & AI for today's needs: Intuit Turbo...
Data Con LA 2022 - Moving Data at Scale to AWS
Data Con LA 2022 - Collaborative Data Exploration using Conversational AI
Data Con LA 2022 - Why Database Modernization Makes Your Data Decisions More ...
Data Con LA 2022 - Intro to Data Science
Data Con LA 2022 - How are NFTs and DeFi Changing Entertainment
Data Con LA 2022 - Why Data Quality vigilance requires an End-to-End, Automat...
Data Con LA 2022-Perfect Viral Ad prediction of Superbowl 2022 using Tease, T...
Data Con LA 2022- Embedding medical journeys with machine learning to improve...
Data Con LA 2022 - Data Streaming with Kafka

Spark as Part of a Hybrid RDBMS Architecture - John Leach, Co-Founder, Splice Machine

  • 1. Powering Real-Time Applications & Analytics: Enabling Decisions in the Moment
    John Leach, CTO & Co-Founder
    Splice Machine Proprietary and Confidential
  • 2. Decisions in the Moment: Life Sciences, Digital Marketing, Fraud Detection, Supply Chain Optimization
  • 3. Today's Reality: Stale Data, Backward-Looking Decisions
    How old is the data in your reports?
    - 1 day +
    - 1 day
    - 4 hours +
    - 1 hour +
    - Real-time
  • 4. Today's Reality: Stale Data, Backward-Looking Decisions
    Poll: How old is the data in your reports? (1 day +, 1 day, 4 hours +, 1 hour +, real-time)
    Chart values: 24%, 50%, 7%, 9%, 9%
    * Source: Webinars on 11-3-15 and 12-10-15, 237 respondents
  • 5. Data Gridlock: Complex, Outdated ETL Pipelines
    [Diagram: OLTP systems (ERP, CRM, Supply Chain, HR, …) feed an ODS, a data warehouse and datamarts (OLAP systems) through ETL (Extract, Transform, Load) with stream or batch updates; the OLAP side serves operational reports, executive business reports, ad hoc analytics and mixed-workload apps.]
    - Pain
      - Separate OLTP & OLAP systems
      - Messy ETL "glue"
    - Why?
      - Different workloads
      - Different data structures
      - Hard to isolate workloads
    - No longer adequate
      - Can't afford to wait days or hours to analyze data
    Current architectures are unable to keep up
  • 6. Nirvana: No More ETL
    [Diagram: an OLTP app and an OLAP report both run against a single OLTP/OLAP system]
    Simultaneous OLTP & OLAP workloads
    - Benefits
      - Faster, since Big Data does not need to be moved
      - Eliminate expensive ETL and data warehouse systems
      - Act on real-time data instead of yesterday's
    - Why is it possible now?
  • 7. Disruptive Technology Enablers
    - Scale-out technology: scale up (increase server size) vs. scale out (more small servers)
    - In-memory technology
  • 8. The Splice Machine RDBMS: Replace Oracle & MySQL
    The First RDBMS Powered by Hadoop & Spark
    - SQL: ANSI SQL; no retraining or rewrites for SQL-based analysts, reports, and applications
    - Scale Out: ¼ the cost; scales out on commodity hardware
    - Speed: 10-20x faster; leverages Spark in-memory technology
    - Transactions: ensure reliable updates across multiple rows
    - Mixed Workloads: simultaneously support OLTP and OLAP workloads
    - Elastic: increase scale in just a few minutes
  • 9. Omni-Channel Marketing: Harte-Hanks
    - Overview: digital marketing services provider; unified customer profile; real-time campaign management; operational application with BI reports
    - Challenges: Oracle RAC too expensive to scale; queries too slow (up to ½ hour); getting worse, with 30-50% data growth expected; looked for 9 months for a cost-effective solution
    - Solution diagram: cross-channel campaigns, real-time personalization, real-time actions
    - Initial results: ¼ the cost with commodity scale-out; 3-7x faster through parallelized queries; 10-20x price/performance with no application, BI or ETL rewrites
  • 10. Simultaneous OLTP & OLAP Workloads
    Very few applications are OLTP only.
    [Diagram: in traditional RDBMSs, OLTP and OLAP work contend, causing bottlenecks and delays; Splice Machine provides workload isolation, with OLTP on HBase and OLAP on Spark.]
  • 11. Simultaneous OLTP & OLAP Workloads
    Separate OLTP & OLAP processes isolate workloads.
    [Charts: OLTP response time vs. OLAP load]
    - Traditional RDBMSs: as OLAP load rises, OLTP response times increase
    - Splice Machine: as OLAP load rises, OLTP response times remain flat
  • 12. Proven Building Blocks: Spark, Hadoop and Derby
    - Apache Derby: ANSI SQL-99 RDBMS; Java-based; ODBC/JDBC compliant
    - Apache HBase/Hadoop: auto-sharding; high availability; scalability to 100s of PBs
    - Apache Spark: analytical engine; fast, in-memory technology; memory resilient to node failure
  • 13. HBase: Proven Scale-Out
    - Auto-sharding
    - Scales with commodity hardware
    - Cost-effective from GBs to PBs
    - High availability through failover and replication
    - LSM-trees
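The LSM-tree bullet describes HBase's write path. The Java sketch below is a minimal, single-process illustration under stated assumptions: `LsmSketch` is a hypothetical class (not HBase's API), keys and values are strings, the "files" are in-memory immutable maps, and the flush threshold is an arbitrary small number. Writes land in a sorted in-memory buffer (the role the Memstore plays); when it fills, it is frozen into an immutable sorted run (the role of an HFile). Reads consult the buffer first and then the runs from newest to oldest.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical toy LSM store: a sorted in-memory buffer plus immutable sorted runs. */
public class LsmSketch {
    private final int flushThreshold;                 // assumption: flush after this many entries
    private TreeMap<String, String> memstore = new TreeMap<>();
    // Immutable sorted runs, oldest first (stand-ins for HFiles).
    private final List<Map<String, String>> runs = new ArrayList<>();

    public LsmSketch(int flushThreshold) {
        this.flushThreshold = flushThreshold;
    }

    /** Writes only touch memory; existing runs are never rewritten in place. */
    public void put(String key, String value) {
        memstore.put(key, value);
        if (memstore.size() >= flushThreshold) {
            flush();
        }
    }

    /** Freezes the current buffer into a new immutable sorted run. */
    public void flush() {
        if (memstore.isEmpty()) return;
        runs.add(Collections.unmodifiableMap(memstore));
        memstore = new TreeMap<>();
    }

    /** Reads check the buffer, then runs from newest to oldest; the first hit is the newest version. */
    public String get(String key) {
        String v = memstore.get(key);
        if (v != null) return v;
        for (int i = runs.size() - 1; i >= 0; i--) {
            v = runs.get(i).get(key);
            if (v != null) return v;
        }
        return null;
    }

    public int runCount() {
        return runs.size();
    }

    public static void main(String[] args) {
        LsmSketch store = new LsmSketch(2);
        store.put("row1", "a");
        store.put("row2", "b");   // triggers a flush: run 1 = {row1=a, row2=b}
        store.put("row1", "c");   // the newer version stays in the buffer
        System.out.println(store.get("row1") + " " + store.get("row2") + " runs=" + store.runCount());
        // prints "c b runs=1"
    }
}
```

Because runs are never rewritten in place, writes stay cheap, but a read may have to consult several runs; that read amplification is what compaction exists to reduce.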
  • 14. Apache Spark
    - Unmatched performance: fastest sort of 1PB of data
    - Advanced in-memory technology: spill-to-disk for large datasets; resilient against node failures; pipelining for computation parallelism
    - Most active Apache community: almost 500 committers
    - Extensive libraries: over 140 and growing; libraries for machine learning, streaming and graph processing
  • 15. How is Spark aiding Splice Machine?
    - Address HBase challenges: compactions; large data movements
    - RDBMS features: index creation; statistics collection; import; admin UI
    - Analytic processing: pipelining for computation parallelism; lineage
    - Machine learning: incorporating MLlib into the RDBMS
  • 16. Splice Machine: Advanced Spark Integration
    Compaction: LSM Tree (Deal with the Devil)
  • 17. Splice Machine: Spark/HBase Integration: Compaction
    [Diagram: minor compaction and major compaction]
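A minimal sketch of the two compaction kinds on this slide, assuming a toy model in which each run is a sorted `TreeMap` listed oldest first and deletes are written as a hypothetical `TOMBSTONE` marker (HBase itself uses typed delete cells, not a magic value). A minor compaction merges only some recent runs, so delete markers must survive because an older, unmerged run may still hold the deleted key. A major compaction rewrites every run into one and can finally drop them. `CompactionSketch` and its methods are illustrative names only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

/** Hypothetical toy minor vs. major compaction over sorted runs listed oldest first. */
public class CompactionSketch {
    /** Assumed delete marker for this sketch only. */
    public static final String TOMBSTONE = "__TOMBSTONE__";

    /** Merges runs oldest to newest so newer values overwrite older ones for the same key. */
    static TreeMap<String, String> merge(List<TreeMap<String, String>> runs, boolean dropTombstones) {
        TreeMap<String, String> merged = new TreeMap<>();
        for (TreeMap<String, String> run : runs) {
            merged.putAll(run);
        }
        if (dropTombstones) {
            merged.values().removeIf(TOMBSTONE::equals);
        }
        return merged;
    }

    /** Minor: merge only the newest `count` runs and keep tombstones, since older runs remain. */
    public static List<TreeMap<String, String>> minor(List<TreeMap<String, String>> runs, int count) {
        int split = Math.max(0, runs.size() - count);
        List<TreeMap<String, String>> result = new ArrayList<>(runs.subList(0, split));
        result.add(merge(runs.subList(split, runs.size()), false));
        return result;
    }

    /** Major: merge every run into one; no older data is left, so tombstones can be discarded. */
    public static List<TreeMap<String, String>> major(List<TreeMap<String, String>> runs) {
        List<TreeMap<String, String>> result = new ArrayList<>();
        result.add(merge(runs, true));
        return result;
    }

    /** Builds a run from alternating key/value arguments. */
    public static TreeMap<String, String> run(String... kv) {
        TreeMap<String, String> m = new TreeMap<>();
        for (int i = 0; i + 1 < kv.length; i += 2) {
            m.put(kv[i], kv[i + 1]);
        }
        return m;
    }

    public static void main(String[] args) {
        List<TreeMap<String, String>> runs = List.of(
                run("a", "1", "b", "2"),
                run("b", TOMBSTONE),        // "b" deleted after it was flushed
                run("c", "3"));
        System.out.println(minor(runs, 2)); // [{a=1, b=2}, {b=__TOMBSTONE__, c=3}]
        System.out.println(major(runs));    // [{a=1, c=3}]
    }
}
```

The minor result still needs the tombstone: dropping it would let the older `b=2` in the unmerged run reappear on the next read.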
  • 18. Splice Machine: Advanced Spark Integration
    - Innovative, high-performance RDD creation: fast access to HFiles in HDFS; merged with deltas from the Memstore; avoids the slower HBase API
    - Universal execution plan and byte code: optimizer, plan and code shared across Spark or HBase execution
    [Diagram: on each physical node, an HBase Region Server hosts Regions 1…N, each with a Memstore and HFiles in HDFS; a co-located Spark Worker builds RDDs 1…N from those HFiles and Memstores.]
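The RDD-creation bullet describes reading HFiles directly and merging in Memstore deltas. One way to picture that merge is a k-way merge of sorted sources in which the newest version of each key wins. The sketch assumes sorted `TreeMap`s as stand-ins for the Memstore and HFiles, passed newest first; `MergedScanSketch` is a hypothetical name and says nothing about how Splice Machine's reader is actually built.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;
import java.util.TreeMap;

/** Hypothetical k-way merge of sorted sources; for duplicate keys the newest source wins. */
public class MergedScanSketch {
    /** Cursor over one sorted source; age 0 is the newest source. */
    private static final class Cursor {
        final Iterator<Map.Entry<String, String>> it;
        final int age;
        Map.Entry<String, String> head;

        Cursor(Iterator<Map.Entry<String, String>> it, int age) {
            this.it = it;
            this.age = age;
            advance();
        }

        void advance() {
            head = it.hasNext() ? it.next() : null;
        }
    }

    /** Sources must be ordered newest first (e.g. Memstore, then HFiles from newest to oldest). */
    public static List<Map.Entry<String, String>> scan(List<TreeMap<String, String>> newestFirst) {
        PriorityQueue<Cursor> pq = new PriorityQueue<>((x, y) -> {
            int c = x.head.getKey().compareTo(y.head.getKey());
            return c != 0 ? c : Integer.compare(x.age, y.age);   // same key: newer source first
        });
        for (int i = 0; i < newestFirst.size(); i++) {
            Cursor cur = new Cursor(newestFirst.get(i).entrySet().iterator(), i);
            if (cur.head != null) pq.add(cur);
        }
        List<Map.Entry<String, String>> out = new ArrayList<>();
        String lastKey = null;
        while (!pq.isEmpty()) {
            Cursor cur = pq.poll();
            String key = cur.head.getKey();
            if (!key.equals(lastKey)) {   // first time we see a key, it comes from the newest source
                out.add(Map.entry(key, cur.head.getValue()));
                lastKey = key;
            }
            cur.advance();
            if (cur.head != null) pq.add(cur);
        }
        return out;
    }

    /** Builds a sorted source from alternating key/value arguments. */
    public static TreeMap<String, String> of(String... kv) {
        TreeMap<String, String> m = new TreeMap<>();
        for (int i = 0; i + 1 < kv.length; i += 2) m.put(kv[i], kv[i + 1]);
        return m;
    }

    public static void main(String[] args) {
        List<TreeMap<String, String>> sources = List.of(
                of("b", "B2", "d", "D"),    // memstore deltas (newest)
                of("a", "A", "b", "B1"),    // newer HFile
                of("b", "B0", "c", "C"));   // older HFile
        System.out.println(scan(sources));  // [a=A, b=B2, c=C, d=D]
    }
}
```

The output stays in key order, which is what lets each partition be produced as a single sorted stream without materializing every source first.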
  • 19. Splice Machine Architecture
    1. Standard install of an HBase cluster (HBase, HDFS, ZooKeeper) with Spark
    2. Distribute the Splice Machine JAR to each region server
    3. Automatically invoke co-processors on each region
    [Diagram: each node runs an HBase Region Server over HDFS whose regions carry the Splice parser, planner, optimizer and executor (snapshot isolation, indexes) as HBase co-processors, next to a Spark Worker whose executors run tasks over cached RDDs; a Spark Master, HMaster and ZooKeeper coordinate the cluster.]
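Steps 2 and 3 rely on HBase splitting each table into key-range regions and running co-processor code next to the data in each region. A minimal sketch of both ideas, assuming string row keys and an in-memory `TreeMap` per region; `RegionRoutingSketch` and `onEachRegion` are hypothetical names, and `onEachRegion` only mimics the shape of a co-processor call (run locally per region, collect results at the caller), not HBase's co-processor API.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;

/** Hypothetical range-sharded table: a row belongs to the region with the greatest start key <= row key. */
public class RegionRoutingSketch {
    // Region start key -> rows held by that region; "" is the first region's start key.
    private final TreeMap<String, TreeMap<String, String>> regions = new TreeMap<>();

    public RegionRoutingSketch(List<String> splitKeys) {
        regions.put("", new TreeMap<>());
        for (String k : splitKeys) regions.put(k, new TreeMap<>());
    }

    /** floorKey finds the owning range; it never returns null because "" sorts first. */
    public String regionFor(String rowKey) {
        return regions.floorKey(rowKey);
    }

    public void put(String rowKey, String value) {
        regions.get(regionFor(rowKey)).put(rowKey, value);
    }

    /** Runs `task` against each region's local rows and collects per-region results. */
    public <R> Map<String, R> onEachRegion(Function<TreeMap<String, String>, R> task) {
        Map<String, R> results = new TreeMap<>();
        regions.forEach((start, rows) -> results.put(start, task.apply(rows)));
        return results;
    }

    public static void main(String[] args) {
        RegionRoutingSketch table = new RegionRoutingSketch(List.of("g", "p"));
        for (String k : List.of("apple", "kiwi", "grape", "zebra", "plum")) table.put(k, "v");
        // Per-region row counts, computed where the rows live.
        Map<String, Integer> counts = table.onEachRegion(TreeMap::size);
        System.out.println(counts);   // {=1, g=2, p=2}
    }
}
```

Pushing the work to each region and returning only small results is the reason a per-region execution model avoids shipping whole tables to a single node.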
  • 20. Splice Machine: Query Execution
  • 21. Splice Machine: Query Execution
    1. Parse SQL
       - Generate an abstract syntax tree (AST)
       - Bind the AST to the transactional dictionary
  • 22. Splice Machine: Query Execution
    1. Parse SQL
    2. Optimize query plan
       - Determine join order and storage structure (e.g., base table, index) using table statistics (e.g., cardinality estimates)
       - Push predicates
       - Unroll nested subqueries
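To make "using table statistics" concrete, the sketch applies two deliberately simple rules, assuming hypothetical per-table statistics (row count, predicate selectivity, whether a matching index exists) and an invented cost constant `INDEX_LOOKUP_PENALTY`: choose an index when its estimated lookup cost beats a full scan, and join tables in ascending order of estimated post-predicate cardinality. This smallest-first heuristic is a stand-in, not Splice Machine's cost-based optimizer, which would also weigh join selectivity and join strategy.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical statistics-driven planning: pick an access path per table, then order joins. */
public class JoinOrderSketch {
    /** Assumed statistics: row count, selectivity of local predicates, index availability. */
    public record TableStats(String name, long rowCount, double predicateSelectivity, boolean hasMatchingIndex) {
        long estimatedRows() {
            return Math.max(1, Math.round(rowCount * predicateSelectivity));
        }
    }

    /** Invented cost model: a scan reads every row; each index hit costs this many row reads. */
    static final double INDEX_LOOKUP_PENALTY = 4.0;

    public static String accessPath(TableStats t) {
        double scanCost = t.rowCount();
        double indexCost = t.hasMatchingIndex() ? t.estimatedRows() * INDEX_LOOKUP_PENALTY : Double.MAX_VALUE;
        return indexCost < scanCost ? "index" : "base table";
    }

    /** Greedy heuristic: join in ascending order of estimated cardinality after predicates. */
    public static List<String> joinOrder(List<TableStats> tables) {
        List<TableStats> sorted = new ArrayList<>(tables);
        sorted.sort(Comparator.comparingLong(TableStats::estimatedRows));
        List<String> order = new ArrayList<>();
        for (TableStats t : sorted) order.add(t.name() + " via " + accessPath(t));
        return order;
    }

    public static void main(String[] args) {
        List<TableStats> tables = List.of(
                new TableStats("orders", 10_000_000, 0.001, true),   // ~10,000 rows survive the filter
                new TableStats("customers", 1_000_000, 1.0, false),  // unfiltered
                new TableStats("regions", 50, 1.0, false));
        System.out.println(joinOrder(tables));
        // [regions via base table, orders via index, customers via base table]
    }
}
```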
  • 23. Splice Machine: Query Execution
    1. Parse SQL
    2. Optimize query plan
    3. Generate optimal byte code
  • 24. Splice Machine: Query Execution
    1. Parse SQL
    2. Optimize query plan
    3. Generate optimal byte code
    OLTP execution on HBase:
    4a. Execute OLTP query from byte code
    5a. Use block cache and bloom filters to optimize data access
    6a. Return results
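Step 5a mentions Bloom filters. A Bloom filter lets a reader skip a file or block when a key is definitely absent, at the cost of occasional false positives but never false negatives. A minimal Java sketch, assuming `java.util.BitSet` storage and double hashing built from `Arrays.hashCode` and FNV-1a; `BloomFilterSketch` is a hypothetical class, and HBase ships its own Bloom filter implementation that this does not reproduce.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.BitSet;

/** Hypothetical Bloom filter: answers "definitely absent" or "possibly present" for a key. */
public class BloomFilterSketch {
    private final BitSet bits;
    private final int numBits;
    private final int numHashes;

    public BloomFilterSketch(int numBits, int numHashes) {
        this.bits = new BitSet(numBits);
        this.numBits = numBits;
        this.numHashes = numHashes;
    }

    /** 32-bit FNV-1a, used as the second base hash. */
    private static int fnv1a(byte[] key) {
        int h = 0x811C9DC5;
        for (byte b : key) {
            h ^= (b & 0xFF);
            h *= 0x01000193;
        }
        return h;
    }

    /** i-th bit position via double hashing: h1 + i * h2, reduced modulo the filter size. */
    private int position(byte[] key, int i) {
        int h1 = Arrays.hashCode(key);
        int h2 = fnv1a(key);
        return Math.floorMod(h1 + i * h2, numBits);
    }

    public void add(String key) {
        byte[] k = key.getBytes(StandardCharsets.UTF_8);
        for (int i = 0; i < numHashes; i++) bits.set(position(k, i));
    }

    /** false means the key was never added, so the block can be skipped without reading it. */
    public boolean mightContain(String key) {
        byte[] k = key.getBytes(StandardCharsets.UTF_8);
        for (int i = 0; i < numHashes; i++) {
            if (!bits.get(position(k, i))) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        BloomFilterSketch filter = new BloomFilterSketch(1024, 3);
        for (int i = 0; i < 100; i++) filter.add("row-" + i);
        System.out.println(filter.mightContain("row-42"));   // true: added keys are never reported absent
        System.out.println(filter.mightContain("row-9999")); // usually false; a false positive is possible
    }
}
```

The filter size and hash count trade memory for false-positive rate; the values here are arbitrary.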
  • 25. Splice Machine: Query Execution
    1. Parse SQL
    2. Optimize query plan
    3. Generate optimal byte code
    OLTP execution on HBase:
    4a. Execute OLTP query from byte code
    5a. Use block cache and bloom filters to optimize data access
    6a. Return results
    OLAP execution on Spark:
    4b. Generate Spark execution plan
    5b. Submit Spark plan with byte code
    6b. Fair scheduling of distributed tasks
    7b. Generate RDD from HFiles and Memstore
    8b. Execute query and return results
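Steps 4a and 4b start from the same parsed, optimized plan and then diverge by engine. A minimal sketch of that dispatch decision, assuming a hypothetical `PlanEstimate` record and an invented row-count threshold `OLAP_ROW_THRESHOLD`; the slides do not state the actual rule, so treat this only as an illustration of the idea that small, keyed work stays on HBase region servers while large scans go to Spark.

```java
/** Hypothetical dispatcher: one optimized plan, two execution engines chosen by estimated size. */
public class EngineRouterSketch {
    public enum Engine { HBASE_OLTP, SPARK_OLAP }

    /** Invented cut-over point; a real optimizer would use its full cost model. */
    public static final long OLAP_ROW_THRESHOLD = 20_000;

    /** Illustrative summary of what the optimizer estimated for a plan. */
    public record PlanEstimate(String sql, long estimatedRowsScanned, boolean usesPrimaryKeyLookup) {}

    public static Engine route(PlanEstimate plan) {
        if (plan.usesPrimaryKeyLookup()) {
            return Engine.HBASE_OLTP;   // point reads and writes stay on the region servers
        }
        return plan.estimatedRowsScanned() >= OLAP_ROW_THRESHOLD ? Engine.SPARK_OLAP : Engine.HBASE_OLTP;
    }

    public static void main(String[] args) {
        System.out.println(route(new PlanEstimate("SELECT * FROM orders WHERE order_id = 42", 1, true)));
        System.out.println(route(new PlanEstimate(
                "SELECT region, SUM(amount) FROM orders GROUP BY region", 50_000_000, false)));
        // HBASE_OLTP
        // SPARK_OLAP
    }
}
```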
  • 26. Isolated Resource Management
    Isolate Spark & HBase resources through Linux cgroups
  • 27. Isolated Resource Management
    Isolate Spark & HBase resources through Linux cgroups
  • 28. Configurable Spark Resource Management
    - Prioritize Spark resources between query, admin & import jobs
    - Custom resource pools through XML
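Resource pools of this kind correspond to Spark's fair scheduler, where each pool has a `weight` and a `minShare`. The sketch computes a static split of a fixed number of cores across hypothetical Query, Admin and Import pools: each pool first receives its minimum share, and the remaining cores are divided in proportion to weight, with rounding leftovers going to the heaviest pool. This is a simplification; Spark's scheduler applies these settings dynamically when deciding which pool's tasks to launch next rather than partitioning cores up front. `FairShareSketch` and the pool values are illustrative.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical static weighted fair share with minimum guarantees. */
public class FairShareSketch {
    public record Pool(String name, int weight, int minShare) {}

    public static Map<String, Integer> allocate(List<Pool> pools, int totalCores) {
        Map<String, Integer> alloc = new LinkedHashMap<>();
        if (pools.isEmpty()) return alloc;
        int remaining = totalCores;
        int totalWeight = 0;
        // 1. Honor each pool's minimum share while cores last.
        for (Pool p : pools) {
            int guaranteed = Math.min(p.minShare(), remaining);
            alloc.put(p.name(), guaranteed);
            remaining -= guaranteed;
            totalWeight += p.weight();
        }
        // 2. Split what is left in proportion to weight (integer division rounds down).
        int spare = remaining;
        for (Pool p : pools) {
            int extra = totalWeight == 0 ? 0 : spare * p.weight() / totalWeight;
            alloc.merge(p.name(), extra, Integer::sum);
            remaining -= extra;
        }
        // 3. Give any rounding leftovers to the heaviest pool.
        Pool heaviest = pools.get(0);
        for (Pool p : pools) if (p.weight() > heaviest.weight()) heaviest = p;
        alloc.merge(heaviest.name(), remaining, Integer::sum);
        return alloc;
    }

    public static void main(String[] args) {
        List<Pool> pools = List.of(new Pool("query", 3, 4), new Pool("admin", 1, 1), new Pool("import", 1, 2));
        System.out.println(allocate(pools, 32));   // {query=19, admin=6, import=7}
    }
}
```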
  • 29. Spark Query Management
    Visualization of active and completed queries
  • 30. Spark Query Management (cont'd)
    Visualization of stages for each query, plus kill function
  • 31. Spark Query Management (cont'd)
    Visualization of stages for the query plan, plus kill function
  • 32. Spark Query Management (cont'd)
    Detailed metrics for tasks in each stage
  • 33. Spark Query Management (cont'd)
  • 34. Federated Query Support
    - Virtual Table Interface (VTI): execute federated queries against external files, libraries or databases
      - External databases: use JDBC to access data in DBs such as Oracle and DB2
      - External libraries: access over 140 Spark libraries for machine learning and streaming
      - External files: pre-defined or dynamic schema; access local FS, HDFS, AWS S3
    - MapReduce I/O Formats
      - Accept federated queries from MapReduce, Pig, and Hive
      - Register Splice Machine schema in HCatalog
      - Merge structured (Splice) and unstructured data in ad-hoc queries
      - Seamless integration with the Hadoop ecosystem
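A virtual table lets the SQL layer treat an external source as if it were a table. The sketch assumes a hypothetical two-method `VirtualTable` interface (Splice Machine's real VTI classes are different) and shows the dynamic-schema case: column names come from a CSV header at read time, and a `select` helper plays the role of `SELECT col FROM vt WHERE ...`. The CSV parsing is intentionally naive (no quoting or escaping).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

/** Hypothetical "virtual table" over external data, with a schema discovered at read time. */
public class VirtualTableSketch {
    /** Assumed minimal contract for this sketch. */
    public interface VirtualTable {
        List<String> columns();
        List<String[]> rows();
    }

    /** External CSV text; the first line supplies the column names (dynamic schema). */
    public static final class CsvTable implements VirtualTable {
        private final List<String> columns;
        private final List<String[]> rows = new ArrayList<>();

        public CsvTable(String csv) {
            String[] lines = csv.strip().split("\\R");
            columns = List.of(lines[0].split(","));
            for (int i = 1; i < lines.length; i++) rows.add(lines[i].split(",", -1));
        }

        public List<String> columns() { return columns; }
        public List<String[]> rows() { return rows; }
    }

    /** Projection plus filter over any virtual table. */
    public static List<String> select(VirtualTable vt, String column, Predicate<String[]> where) {
        int idx = vt.columns().indexOf(column);
        if (idx < 0) throw new IllegalArgumentException("unknown column " + column);
        List<String> out = new ArrayList<>();
        for (String[] row : vt.rows()) if (where.test(row)) out.add(row[idx]);
        return out;
    }

    public static void main(String[] args) {
        VirtualTable vt = new CsvTable("id,city\n1,Austin\n2,Boston\n3,Austin\n");
        int cityIdx = vt.columns().indexOf("city");
        System.out.println(select(vt, "id", row -> row[cityIdx].equals("Austin")));   // [1, 3]
    }
}
```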
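  • 35. Machine Learning: Adding Multivariate Statistics via a Stored Procedure
```java
// JDBC, Guava and Spark MLlib imports. LocatedRow, StandardException, EmbedConnection,
// EmbedResultSet40, IteratorNoPutResultSet, SparkMLibUtils and the helpers ResultSetToRDD,
// getFieldsToConvert, wrapResults and getColumnStatistics are Splice Machine internals
// whose imports and definitions are not shown on the slide.
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import com.google.common.base.Throwables;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.mllib.linalg.Vector;
import org.apache.spark.mllib.stat.MultivariateStatisticalSummary;
import org.apache.spark.mllib.stat.Statistics;

// Methods of the stored-procedure class:
public static void getStatementStatistics(String statement, ResultSet[] resultSets) throws SQLException {
    try {
        // Run the SQL statement on the procedure's nested (default) connection
        Connection con = DriverManager.getConnection("jdbc:default:connection");
        PreparedStatement ps = con.prepareStatement(statement);
        ResultSet rs = ps.executeQuery();
        // Convert the result set to a Java RDD
        JavaRDD<LocatedRow> resultSetRDD = ResultSetToRDD(rs);
        // Collect column statistics for the selected fields
        int[] fieldsToConvert = getFieldsToConvert(ps);
        MultivariateStatisticalSummary summary = getColumnStatisticsSummary(resultSetRDD, fieldsToConvert);
        // Wrap the statistics as the result set returned by the procedure
        IteratorNoPutResultSet resultsToWrap = wrapResults((EmbedConnection) con,
                getColumnStatistics(ps, summary, fieldsToConvert));
        resultSets[0] = new EmbedResultSet40((EmbedConnection) con, resultsToWrap, false, null, true);
    } catch (StandardException e) {
        throw new SQLException(Throwables.getRootCause(e));
    }
}

private static MultivariateStatisticalSummary getColumnStatisticsSummary(JavaRDD<LocatedRow> resultSetRDD,
        int[] fieldsToConvert) throws StandardException {
    // Project the selected columns of each row into an MLlib Vector
    JavaRDD<Vector> vectorJavaRDD = SparkMLibUtils.locatedRowRDDToVectorRDD(resultSetRDD, fieldsToConvert);
    // Column-wise summary statistics computed by Spark MLlib
    MultivariateStatisticalSummary summary = Statistics.colStats(vectorJavaRDD.rdd());
    return summary;
}
```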
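`Statistics.colStats` returns a `MultivariateStatisticalSummary` exposing per-column figures such as `count()`, `mean()`, `variance()`, `min()` and `max()`. To show what those numbers are without a Spark cluster, the sketch computes them in plain Java with Welford's single-pass algorithm, using the sample variance (n - 1 denominator). `ColumnStatsSketch` is a hypothetical illustration of the quantities, not how MLlib computes them internally.

```java
import java.util.Arrays;

/** Hypothetical column-wise summary via Welford's online algorithm (numerically stable, one pass). */
public class ColumnStatsSketch {
    public static final class Summary {
        public long count;
        public final double[] mean, m2, min, max;

        Summary(int columns) {
            mean = new double[columns];
            m2 = new double[columns];   // running sum of squared deviations from the mean
            min = new double[columns];
            max = new double[columns];
            Arrays.fill(min, Double.POSITIVE_INFINITY);
            Arrays.fill(max, Double.NEGATIVE_INFINITY);
        }

        /** Sample variance with an n - 1 denominator; 0 when fewer than two rows were seen. */
        public double variance(int col) {
            return count < 2 ? 0.0 : m2[col] / (count - 1);
        }
    }

    public static Summary colStats(double[][] rows, int columns) {
        Summary s = new Summary(columns);
        for (double[] row : rows) {
            s.count++;
            for (int c = 0; c < columns; c++) {
                double x = row[c];
                double delta = x - s.mean[c];
                s.mean[c] += delta / s.count;
                s.m2[c] += delta * (x - s.mean[c]);   // second factor uses the updated mean
                s.min[c] = Math.min(s.min[c], x);
                s.max[c] = Math.max(s.max[c], x);
            }
        }
        return s;
    }

    public static void main(String[] args) {
        double[][] rows = { {1, 10}, {2, 20}, {3, 60} };
        Summary s = colStats(rows, 2);
        System.out.printf("mean=[%.1f, %.1f] var=[%.1f, %.1f]%n",
                s.mean[0], s.mean[1], s.variance(0), s.variance(1));
        // prints "mean=[2.0, 30.0] var=[1.0, 700.0]"
    }
}
```

A single pass matters for the same reason it does in Spark: each partition can be summarized independently without holding all rows in memory.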
  • 36. ANSI SQL-99+ Coverage
    - Data types: e.g., INTEGER, REAL, CHARACTER, DATE, BOOLEAN, BIGINT
    - DDL: e.g., CREATE TABLE, CREATE SCHEMA, ALTER TABLE, DELETE, UPDATE TABLE
    - Predicates: e.g., IN, BETWEEN, LIKE, EXISTS
    - DML: e.g., INSERT, DELETE, UPDATE, SELECT
    - Query specification: e.g., GROUP BY, HAVING
    - SET functions: e.g., UNION, ABS, MOD, ALL, INTERSECT, EXCEPT
    - Aggregation functions: e.g., AVG, MAX, COUNT
    - String functions: e.g., SUBSTRING, concatenation, UPPER, LOWER, TRIM, LENGTH
    - Constraints: e.g., PRIMARY KEY, CHECK, FOREIGN KEY, UNIQUE, NOT NULL
    - Conditional functions: e.g., CASE, searched CASE
    - Privileges: e.g., privileges for SELECT, DELETE, INSERT, EXECUTE
    - Joins: e.g., INNER JOIN, LEFT OUTER JOIN
    - Transactions: e.g., COMMIT, ROLLBACK, snapshot isolation
    - Sub-queries
    - Triggers
    - User-defined functions (UDFs)
    - Views, including grouped views
    - Window functions: e.g., FIRST_VALUE, LAST_VALUE, LEAD, LAG
  • 37. High Concurrency, ACID Transactions
    Required to support OLTP applications
    [Figure: versioned cells read through a "virtual" snapshot]
    - share_quantity versions: T12 = 4,000; T7 = 2,000; T3 = 5,000; T1 = 3,000
    - share_price versions: T7 = $15.11; T5 = $15.65; T2 = $15.74; T0 = $15.27
    value_held = share_quantity * share_price
    @T6: value_held = 5,000 * $15.65 (a transaction at T6 sees share_quantity from T3 and share_price from T5)
    @T3: value_held = 5,000 * $15.74
    - State-of-the-art, distributed snapshot isolation
    - A form of Multi-Version Concurrency Control (MVCC)
    - Writers do not block readers
    - Fast, high concurrency
    - Delivers performance for small reads/writes & batch loads
    - Extends research from Google Percolator & Yahoo Labs
    - Patent-pending technology
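The share example can be replayed in a few lines. A minimal single-node sketch, assuming each column keeps a `TreeMap` from commit timestamp to value and prices are stored in cents: a reader at snapshot time T sees, for each column, the newest version committed at or before T. `SnapshotReadSketch` is hypothetical and omits what makes the slide's version distributed and transactional (timestamp allocation, write-write conflict detection, commit and rollback).

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical multi-version store: writes add versions; readers see a consistent snapshot. */
public class SnapshotReadSketch {
    private final Map<String, TreeMap<Long, Long>> versions = new HashMap<>();

    /** Writers append a new version instead of overwriting, so they never block readers. */
    public void write(String column, long commitTs, long value) {
        versions.computeIfAbsent(column, c -> new TreeMap<>()).put(commitTs, value);
    }

    /** Snapshot read: the newest version committed at or before the reader's timestamp. */
    public long read(String column, long snapshotTs) {
        TreeMap<Long, Long> history = versions.get(column);
        Map.Entry<Long, Long> e = history == null ? null : history.floorEntry(snapshotTs);
        if (e == null) throw new IllegalStateException(column + " has no version visible at T" + snapshotTs);
        return e.getValue();
    }

    public static void main(String[] args) {
        SnapshotReadSketch db = new SnapshotReadSketch();
        // Versions from the slide; prices in cents to avoid floating-point rounding.
        db.write("share_quantity", 1, 3_000);
        db.write("share_quantity", 3, 5_000);
        db.write("share_quantity", 7, 2_000);
        db.write("share_quantity", 12, 4_000);
        db.write("share_price", 0, 1527);
        db.write("share_price", 2, 1574);
        db.write("share_price", 5, 1565);
        db.write("share_price", 7, 1511);

        for (long t : new long[] {6, 3}) {
            long cents = db.read("share_quantity", t) * db.read("share_price", t);
            System.out.printf("@T%d: value_held = $%d.%02d%n", t, cents / 100, cents % 100);
        }
        // @T6: value_held = $78250.00
        // @T3: value_held = $78700.00
    }
}
```

Later writes at T7 and T12 do not disturb the T6 reader, which is the "writers do not block readers" property in miniature.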
  • 38. BI and SQL Tool Support via ODBC/JDBC
    No application rewrites needed
  • 39. Application Framework Support
  • 40. Advisory Board
    The advisory board includes luminaries in databases and technology:
    - Roger Bamford: former Principal Architect at Oracle; father of Oracle RAC
    - Mike Franklin: Computer Science Chair, UC Berkeley; Director, UC Berkeley AMPLab; founder of Apache Spark
    - Marie-Anne Neimat: Co-Founder, TimesTen database; former VP, Database Engineering at Oracle
    - Ken Rudin: Head of Growth and Analysis for Google Search; Head of Analytics at Facebook
    - Abhinav Gupta: Co-Founder, VP Engineering at Rocket Fuel; runs a 15PB HBase cluster
  • 41. The Splice Machine RDBMS: Replace Oracle & MySQL (recap of slide 8)
    The First RDBMS Powered by Hadoop & Spark
    - SQL: ANSI SQL; no retraining or rewrites for SQL-based analysts, reports, and applications
    - Scale Out: ¼ the cost; scales out on commodity hardware
    - Speed: 10-20x faster; leverages Spark in-memory technology
    - Transactions: ensure reliable updates across multiple rows
    - Mixed Workloads: simultaneously support OLTP and OLAP workloads
    - Elastic: increase scale in just a few minutes
  • 42. Make Decisions in the Moment
  • 43. Next Steps
    Try us! Proof of concept