SlideShare a Scribd company logo
Analytique temps réel sur des données transactionnelles
= Cassandra + Spark 20/02/15
Victor Coustenoble Ingénieur Solutions
victor.coustenoble@datastax.com
@vizanalytics
Comment utilisez vous Cassandra?
En contrôlant votre
consommation d’énergie
En regardant des films
en streaming
En naviguant
sur des sites Internet
En achetant
en ligne
En effectuant un règlement
via Smart Phone
En jouant à des
jeux-vidéo très
connus
• Collections/Playlists
• Recommandation/Pe
rsonnalisation
• Détection de Fraude
• Messagerie
• Objets Connectés
Aperçu
Fondé en avril 2010
~35 500+
Santa Clara, Austin, New York, London, Paris, Sydney
400+
Employés Pourcent Clients
4
Straightening the road
RELATIONAL DATABASES
CQL SQL
OpsCenter / DevCenter Management tools
DSE for search & analytics Integration
Security Security
Support, consulting & training 30 years ecosystem
Apache Cassandra™
• Apache Cassandra™ est une base de données NoSQL, Open Source, Distribuée et créée pour
les applications en ligne, modernes, critiques et avec des montée en charge massive.
• Java , hybride entre Amazon Dynamo et Google BigTable
• Sans Maître-Esclave (peer-to-peer), sans Point Unique de Défaillance (No SPOF)
• Distribuée avec la possibilité de Data Center
• 100% Disponible
• Massivement scalable
• Montée en charge linéaire
• Haute Performance (lecture ET écriture)
• Multi Data Center
• Simple à Exploiter
• Language CQL (comme SQL)
• Outils OpsCenter / DevCenter
6
Dynamo
BigTable
BigTable: http://research.google.com/archive/bigtable-osdi06.pdf
Dynamo: http://www.allthingsdistributed.com/files/amazon-dynamo-sosp2007.pdf
Node 1
Node 2
Node 3Node 4
Node 5
Haute Disponibilité et Cohérence
• La défaillance d’un seul noeud ne doit pas entraîner de défaillance du système
• Cohérence choisie au niveau du client
• Facteur de Réplication (RF) + Niveau de Cohérence (CL) = Succès
• Exemple:
• RF = 3
• CL = QUORUM (= 51% des replicas)
©2014 DataStax Confidential. Do not distribute without consent. 7
Node 1
1st copy
Node 4
Node 5
Node 2
2nd copy
Node 3
3rd copy
Parallel
Write
Write
CL=QUORUM
5 μs ack
12 μs ack
12 μs ack
> 51% de réponses – donc la requête est réussie
CL(Read) + CL(Write) > RF => Cohérence Immédiate/Forte
DataStax Enterprise
Cassandra
Certifié,
Prêt pour
l’Entreprise
8
Security Analytics Search Visual
Monitoring
Management
Services
In-Memory
Dev.IDE&
Drivers
Professional
Services
Support&
Training
Confiance
d’utilisation
Fonctionnalités
d’Entreprise
DataStax Enterprise - Analytique
• Conçu pour faire des analyses sur des données Cassandra
• Il y a 4 façons de faire de l’Analytique sur des données Cassandra:
1. Recherche (Solr)
2. Analytique en mode Batch (Hadoop)
3. Analytique en mode Batch avec des outils Externe (Cloudera, Hortonworks)
4. Analytique Temps Réel
©2014 DataStax Confidential. Do not distribute without consent.
Partenariat
©2014 DataStax Confidential. Do not distribute without consent. 10
Why Spark on Cassandra?
• Analytics on transactional data and operational applications
• Data model independent queries
• Cross-table operations (JOIN, UNION, etc.)
• Complex analytics (e.g. machine learning)
• Data transformation, aggregation, etc.
• Stream processing
• Better performances than Hadoop Map/Reduce
Real-time Big Data
©2014 DataStax Confidential. Do not distribute without consent. 12
Data Enrichment
Batch Processing
Machine Learning
Pre-computed
aggregates
Data
NO ETL
Real-Time Big Data Use Cases
• Recommendation Engine
• Internet of Things
• Fraud Detection
• Risk Analysis
• Buyer Behaviour Analytics
• Telematics, Logistics
• Business Intelligence
• Infrastructure Monitoring
• …
©2014 DataStax Confidential. Do not distribute without consent. 13
Composants Sparks
Shark
or
Spark SQL
Structured
Spark
Streaming
Real-time
MLlib
Machine learning
Spark (General execution engine)
GraphX
Graph
Cassandra
Compatible
Isolation des ressources
Cassandra
Executor
ExecutorSpark
Worker
(JVM)
Cassandra
Executor
ExecutorSpark
Worker
(JVM)
DSE Spark Integration Architecture
Node 1
Node 2
Node 3
Node 4
Cassandra
Executor
ExecutorSpark
Worker
(JVM)
Cassandra
Executor
ExecutorSpark
Worker
(JVM)
Spark
Master
(JVM)
App
Driver
Spark Cassandra Connector
C*
C*
C*C*
Spark Executor
C* Java Driver
Spark-Cassandra Connector
User Application
Cassandra
Cassandra Spark Driver
•Cassandra tables exposed as Spark RDDs
•Load data from Cassandra to Spark
•Write data from Spark to Cassandra
•Object mapper : Mapping of C* tables and rows to Scala objects
•Type conversions : All Cassandra types supported and converted to Scala types
•Server side data selection
•Virtual Nodes support
•Scala and Java APIs
DSE Spark Interactive Shell
$ dse spark
...
Spark context available as sc.
HiveSQLContext available as hc.
CassandraSQLContext available as csc.
scala> sc.cassandraTable("test", "kv")
res5: com.datastax.spark.connector.rdd.CassandraRDD
[com.datastax.spark.connector.CassandraRow] =
CassandraRDD[2] at RDD at CassandraRDD.scala:48
scala> sc.cassandraTable("test", "kv").collect
res6: Array[com.datastax.spark.connector.CassandraRow] =
Array(CassandraRow{k: 1, v: foo})
cqlsh> select * from
test.kv;
k | v
---+-----
1 | foo
(1 rows)
Connecting to Cassandra
// Import Cassandra-specific functions on SparkContext and RDD objects
import com.datastax.driver.spark._
// Spark connection options
val conf = new SparkConf(true)
.setMaster("spark://192.168.123.10:7077")
.setAppName("cassandra-demo")
.set("cassandra.connection.host", "192.168.123.10") // initial
contact
.set("cassandra.username", "cassandra")
.set("cassandra.password", "cassandra")
val sc = new SparkContext(conf)
Reading Data
val table = sc
.cassandraTable[CassandraRow]("db", "tweets")
.select("user_name", "message")
.where("user_name = ?", "ewa")
row
representation keyspace table
server side column
and row selection
Writing Data
CREATE TABLE test.words(word TEXT PRIMARY KEY, count INT);
val collection = sc.parallelize(Seq(("foo", 2), ("bar", 5)))
collection.saveToCassandra("test", "words", SomeColumns("word", "count"))
cqlsh:test> select * from words;
word | count
------+-------
bar | 5
foo | 2
(2 rows)
Mapping Rows to Objects
CREATE TABLE test.cars (
id text PRIMARY KEY,
model text,
fuel_type text,
year int
);
case class Vehicle(
id: String,
model: String,
fuelType: String,
year: Int
)
sc.cassandraTable[Vehicle]("test", "cars").toArray
//Array(Vehicle(KF334L, Ford Mondeo, Petrol, 2009),
// Vehicle(MT8787, Hyundai x35, Diesel, 2011)

* Mapping rows to Scala Case Classes
* CQL underscore case column mapped to Scala camel case property
* Custom mapping functions (see docs)
Type Mapping
CQL Type Scala Type
ascii String
bigint Long
boolean Boolean
counter Long
decimal BigDecimal, java.math.BigDecimal
double Double
float Float
inet java.net.InetAddress
int Int
list Vector, List, Iterable, Seq, IndexedSeq, java.util.List
map Map, TreeMap, java.util.HashMap
set Set, TreeSet, java.util.HashSet
text, varchar String
timestamp Long, java.util.Date, java.sql.Date, org.joda.time.DateTime
timeuuid java.util.UUID
uuid java.util.UUID
varint BigInt, java.math.BigInteger
*nullable values Option
Shark
• SQL query engine on top of Spark
• Not part of Apache Spark
• Hive compatible (JDBC, UDFs, types, metadata, etc.)
• Supports in-memory tables
• Available as a part of DataStax Enterprise
Spark SQL
• Spark SQL supports a subset of SQL-92 language
• Spark SQL optimized for Spark internals (e.g. RDDs) , better performances than Shark
• Support for in-memory computation
•From Spark command line
•Mapping of Cassandra keyspaces and tables
•Read and write on Cassandra tables
Usage of Spark SQL & HiveQL query
import com.datastax.spark.connector._
// Connect to the Spark cluster
val conf = new SparkConf(true)...
val sc = new SparkContext(conf)
// Create Cassandra SQL context
val cc = new CassandraSQLContext(sc)
// Execute SQL query
val rdd = cc.sql("INSERT INTO ks.t1 SELECT c1,c2 FROM ks.t2")
// Execute HQL query
val rdd = cc.hql("SELECT * FROM keyspace.table JOIN ... WHERE ...")
Spark Streaming
• For real time analytics
• Push or pull model
• Stream TO and FROM Cassandra
• Micro batching (each batch represented as RDD)
• Fault tolerant
• Data processed in small batches
• Exactly-once processing
• Unified stream and batch processing framework
• Supports Kafka, Flume, ZeroMQ, Kinesis, MQTT
producers
Usage of Spark Streaming
• Due to the unifying Spark architecture,
portions of batch and streaming
development can be reused
• Given that Spark Streaming is backed by
Cassandra, no need to depend upon
solutions like Apache Zookeeper ™ in
production
import com.datastax.spark.connector.streaming._
// Spark connection options
val conf = new SparkConf(true)...
// streaming with 1 second batch window
val ssc = new StreamingContext(conf, Seconds(1))
// stream input
val lines = ssc.socketTextStream(serverIP, serverPort)
// count words
val wordCounts = lines.flatMap(_.split(" ")).map(word =>
(word, 1)).reduceByKey(_ + _)
// stream output
wordCounts.saveToCassandra("test", "words")
// start processing
ssc.start()
ssc.awaitTermination()
Python API
$ dse pyspark
Python 2.7.8 (default, Oct 20 2014, 15:05:19)
[GCC 4.9.1] on linux2
Type "help", "copyright", "credits" or "license" for more information.
Welcome to
____ __
/ __/__ ___ _____/ /__
_ / _ / _ `/ __/ '_/
/__ / .__/_,_/_/ /_/_ version 1.1.0
/_/
Using Python version 2.7.8 (default, Oct 20 2014 15:05:19)
SparkContext available as sc.
>>> sc.cassandraTable("test", "kv").collect()
[Row(k=1, v=u'foo')]
DataStax Enterprise + Spark Special Features
•Easy setup and config
• no need to setup a separate Spark cluster
• no need to tweak classpaths or config files
•High availability of Spark Master
•Enterprise security
• Password / Kerberos / LDAP authentication
• SSL for all Spark to Cassandra connections
•CFS integration (no SPOF distributed file system)
•Cassandra access through Spark Python API
•Certified and Supported on Cassandra
•Shark availability
DataStax Enterprise - High Availability
• All nodes are Spark Workers
• By default resilient to Worker failures
• First Spark node promoted as Spark Master (state saved
in CFS, no SPOF)
• Standby Master promoted on failure (New Spark Master
reconnects to Workers and the driver app and continues the job)
Without DataStax Enterprise
33
C*
SparkM
SparkW
C* SparkW
C* SparkWC* SparkW
C* SparkW
With DataStax Enterprise
34
C*
SparkM
SparkW
C*
SparkW*
C* SparkWC* SparkW
C* SparkW
Master state in C*
Spare master for H/A
Spark Use Cases
35
Load data from various
sources
Analytics (join, aggregate, transform, …)
Sanitize, validate, normalize data
Schema migration,
Data conversion
DataStax Enterprise
© 2014 DataStax, All Rights Reserved. Company Confidential
External Hadoop Distribution
Cloudera, Hortonworks
OpsCenter
Services
Hadoop
Monitoring
Operations
Operational
Application
Real Time
Search
Real Time
Analytics
Batch
Analytics
SGBDR
Analytics
Transformation
s
36
Cassandra Cluster – Nodes Ring – Column Family Storage
High Performance – Alway Available – Massive Scalability
Advanced
Security
In-Memory
How to Spark on Cassandra?
DataStax Cassandra Spark driver
https://github.com/datastax/cassandra-driver-spark
Compatible with
•Spark 1.2
•Cassandra 2.0.x and 2.1.x
•DataStax Enterprise 4.5 et 4.6
DataStax Enterprise 4.6 = Cassandra 2.0 + Driver + Spark 1.1
Spark 1.2 in next DSE 4.7 version (March)
Merci Questions ?
We power the big data apps
that transform business.
©2013 DataStax Confidential. Do not distribute without consent.
victor.coustenoble@datastax.com
@vizanalytics

More Related Content

What's hot

Hadoop Query Performance Smackdown
Hadoop Query Performance SmackdownHadoop Query Performance Smackdown
Hadoop Query Performance Smackdown
DataWorks Summit
 
Apache Spark Data Source V2 with Wenchen Fan and Gengliang Wang
Apache Spark Data Source V2 with Wenchen Fan and Gengliang WangApache Spark Data Source V2 with Wenchen Fan and Gengliang Wang
Apache Spark Data Source V2 with Wenchen Fan and Gengliang Wang
Databricks
 
Simplify CDC Pipeline with Spark Streaming SQL and Delta Lake
Simplify CDC Pipeline with Spark Streaming SQL and Delta LakeSimplify CDC Pipeline with Spark Streaming SQL and Delta Lake
Simplify CDC Pipeline with Spark Streaming SQL and Delta Lake
Databricks
 
Spark + Parquet In Depth: Spark Summit East Talk by Emily Curtin and Robbie S...
Spark + Parquet In Depth: Spark Summit East Talk by Emily Curtin and Robbie S...Spark + Parquet In Depth: Spark Summit East Talk by Emily Curtin and Robbie S...
Spark + Parquet In Depth: Spark Summit East Talk by Emily Curtin and Robbie S...
Spark Summit
 
Apache Spark Overview
Apache Spark OverviewApache Spark Overview
Apache Spark Overview
Vadim Y. Bichutskiy
 
Near Real-Time Data Warehousing with Apache Spark and Delta Lake
Near Real-Time Data Warehousing with Apache Spark and Delta LakeNear Real-Time Data Warehousing with Apache Spark and Delta Lake
Near Real-Time Data Warehousing with Apache Spark and Delta Lake
Databricks
 
Query and audit logging in cassandra
Query and audit logging in cassandraQuery and audit logging in cassandra
Query and audit logging in cassandra
Vinay Kumar Chella
 
Apache HBase™
Apache HBase™Apache HBase™
Apache HBase™
Prashant Gupta
 
Data Discovery at Databricks with Amundsen
Data Discovery at Databricks with AmundsenData Discovery at Databricks with Amundsen
Data Discovery at Databricks with Amundsen
Databricks
 
Dynamic Partition Pruning in Apache Spark
Dynamic Partition Pruning in Apache SparkDynamic Partition Pruning in Apache Spark
Dynamic Partition Pruning in Apache Spark
Databricks
 
Apache Spark Introduction
Apache Spark IntroductionApache Spark Introduction
Apache Spark Introduction
sudhakara st
 
Owning Your Own (Data) Lake House
Owning Your Own (Data) Lake HouseOwning Your Own (Data) Lake House
Owning Your Own (Data) Lake House
Data Con LA
 
Cassandra Data Modeling - Practical Considerations @ Netflix
Cassandra Data Modeling - Practical Considerations @ NetflixCassandra Data Modeling - Practical Considerations @ Netflix
Cassandra Data Modeling - Practical Considerations @ Netflix
nkorla1share
 
Introduction to Spark Streaming
Introduction to Spark StreamingIntroduction to Spark Streaming
Introduction to Spark Streaming
datamantra
 
Hive Tutorial | Hive Architecture | Hive Tutorial For Beginners | Hive In Had...
Hive Tutorial | Hive Architecture | Hive Tutorial For Beginners | Hive In Had...Hive Tutorial | Hive Architecture | Hive Tutorial For Beginners | Hive In Had...
Hive Tutorial | Hive Architecture | Hive Tutorial For Beginners | Hive In Had...
Simplilearn
 
The Parquet Format and Performance Optimization Opportunities
The Parquet Format and Performance Optimization OpportunitiesThe Parquet Format and Performance Optimization Opportunities
The Parquet Format and Performance Optimization Opportunities
Databricks
 
NoSQL databases
NoSQL databasesNoSQL databases
NoSQL databases
Harri Kauhanen
 
What Is Apache Spark? | Introduction To Apache Spark | Apache Spark Tutorial ...
What Is Apache Spark? | Introduction To Apache Spark | Apache Spark Tutorial ...What Is Apache Spark? | Introduction To Apache Spark | Apache Spark Tutorial ...
What Is Apache Spark? | Introduction To Apache Spark | Apache Spark Tutorial ...
Simplilearn
 
Performance Tuning RocksDB for Kafka Streams' State Stores (Dhruba Borthakur,...
Performance Tuning RocksDB for Kafka Streams' State Stores (Dhruba Borthakur,...Performance Tuning RocksDB for Kafka Streams' State Stores (Dhruba Borthakur,...
Performance Tuning RocksDB for Kafka Streams' State Stores (Dhruba Borthakur,...
confluent
 
Delta from a Data Engineer's Perspective
Delta from a Data Engineer's PerspectiveDelta from a Data Engineer's Perspective
Delta from a Data Engineer's Perspective
Databricks
 

What's hot (20)

Hadoop Query Performance Smackdown
Hadoop Query Performance SmackdownHadoop Query Performance Smackdown
Hadoop Query Performance Smackdown
 
Apache Spark Data Source V2 with Wenchen Fan and Gengliang Wang
Apache Spark Data Source V2 with Wenchen Fan and Gengliang WangApache Spark Data Source V2 with Wenchen Fan and Gengliang Wang
Apache Spark Data Source V2 with Wenchen Fan and Gengliang Wang
 
Simplify CDC Pipeline with Spark Streaming SQL and Delta Lake
Simplify CDC Pipeline with Spark Streaming SQL and Delta LakeSimplify CDC Pipeline with Spark Streaming SQL and Delta Lake
Simplify CDC Pipeline with Spark Streaming SQL and Delta Lake
 
Spark + Parquet In Depth: Spark Summit East Talk by Emily Curtin and Robbie S...
Spark + Parquet In Depth: Spark Summit East Talk by Emily Curtin and Robbie S...Spark + Parquet In Depth: Spark Summit East Talk by Emily Curtin and Robbie S...
Spark + Parquet In Depth: Spark Summit East Talk by Emily Curtin and Robbie S...
 
Apache Spark Overview
Apache Spark OverviewApache Spark Overview
Apache Spark Overview
 
Near Real-Time Data Warehousing with Apache Spark and Delta Lake
Near Real-Time Data Warehousing with Apache Spark and Delta LakeNear Real-Time Data Warehousing with Apache Spark and Delta Lake
Near Real-Time Data Warehousing with Apache Spark and Delta Lake
 
Query and audit logging in cassandra
Query and audit logging in cassandraQuery and audit logging in cassandra
Query and audit logging in cassandra
 
Apache HBase™
Apache HBase™Apache HBase™
Apache HBase™
 
Data Discovery at Databricks with Amundsen
Data Discovery at Databricks with AmundsenData Discovery at Databricks with Amundsen
Data Discovery at Databricks with Amundsen
 
Dynamic Partition Pruning in Apache Spark
Dynamic Partition Pruning in Apache SparkDynamic Partition Pruning in Apache Spark
Dynamic Partition Pruning in Apache Spark
 
Apache Spark Introduction
Apache Spark IntroductionApache Spark Introduction
Apache Spark Introduction
 
Owning Your Own (Data) Lake House
Owning Your Own (Data) Lake HouseOwning Your Own (Data) Lake House
Owning Your Own (Data) Lake House
 
Cassandra Data Modeling - Practical Considerations @ Netflix
Cassandra Data Modeling - Practical Considerations @ NetflixCassandra Data Modeling - Practical Considerations @ Netflix
Cassandra Data Modeling - Practical Considerations @ Netflix
 
Introduction to Spark Streaming
Introduction to Spark StreamingIntroduction to Spark Streaming
Introduction to Spark Streaming
 
Hive Tutorial | Hive Architecture | Hive Tutorial For Beginners | Hive In Had...
Hive Tutorial | Hive Architecture | Hive Tutorial For Beginners | Hive In Had...Hive Tutorial | Hive Architecture | Hive Tutorial For Beginners | Hive In Had...
Hive Tutorial | Hive Architecture | Hive Tutorial For Beginners | Hive In Had...
 
The Parquet Format and Performance Optimization Opportunities
The Parquet Format and Performance Optimization OpportunitiesThe Parquet Format and Performance Optimization Opportunities
The Parquet Format and Performance Optimization Opportunities
 
NoSQL databases
NoSQL databasesNoSQL databases
NoSQL databases
 
What Is Apache Spark? | Introduction To Apache Spark | Apache Spark Tutorial ...
What Is Apache Spark? | Introduction To Apache Spark | Apache Spark Tutorial ...What Is Apache Spark? | Introduction To Apache Spark | Apache Spark Tutorial ...
What Is Apache Spark? | Introduction To Apache Spark | Apache Spark Tutorial ...
 
Performance Tuning RocksDB for Kafka Streams' State Stores (Dhruba Borthakur,...
Performance Tuning RocksDB for Kafka Streams' State Stores (Dhruba Borthakur,...Performance Tuning RocksDB for Kafka Streams' State Stores (Dhruba Borthakur,...
Performance Tuning RocksDB for Kafka Streams' State Stores (Dhruba Borthakur,...
 
Delta from a Data Engineer's Perspective
Delta from a Data Engineer's PerspectiveDelta from a Data Engineer's Perspective
Delta from a Data Engineer's Perspective
 

Similar to Spark + Cassandra = Real Time Analytics on Operational Data

Jump Start with Apache Spark 2.0 on Databricks
Jump Start with Apache Spark 2.0 on DatabricksJump Start with Apache Spark 2.0 on Databricks
Jump Start with Apache Spark 2.0 on Databricks
Databricks
 
5 Ways to Use Spark to Enrich your Cassandra Environment
5 Ways to Use Spark to Enrich your Cassandra Environment5 Ways to Use Spark to Enrich your Cassandra Environment
5 Ways to Use Spark to Enrich your Cassandra Environment
Jim Hatcher
 
BI, Reporting and Analytics on Apache Cassandra
BI, Reporting and Analytics on Apache CassandraBI, Reporting and Analytics on Apache Cassandra
BI, Reporting and Analytics on Apache Cassandra
Victor Coustenoble
 
Lightning fast analytics with Spark and Cassandra
Lightning fast analytics with Spark and CassandraLightning fast analytics with Spark and Cassandra
Lightning fast analytics with Spark and Cassandra
nickmbailey
 
DataStax - Analytics on Apache Cassandra - Paris Tech Talks meetup
DataStax - Analytics on Apache Cassandra - Paris Tech Talks meetupDataStax - Analytics on Apache Cassandra - Paris Tech Talks meetup
DataStax - Analytics on Apache Cassandra - Paris Tech Talks meetup
Victor Coustenoble
 
Cassandra & Spark for IoT
Cassandra & Spark for IoTCassandra & Spark for IoT
Cassandra & Spark for IoT
Matthias Niehoff
 
Lambda Architecture with Spark Streaming, Kafka, Cassandra, Akka, Scala
Lambda Architecture with Spark Streaming, Kafka, Cassandra, Akka, ScalaLambda Architecture with Spark Streaming, Kafka, Cassandra, Akka, Scala
Lambda Architecture with Spark Streaming, Kafka, Cassandra, Akka, Scala
Helena Edelson
 
Lambda Architecture with Spark, Spark Streaming, Kafka, Cassandra, Akka and S...
Lambda Architecture with Spark, Spark Streaming, Kafka, Cassandra, Akka and S...Lambda Architecture with Spark, Spark Streaming, Kafka, Cassandra, Akka and S...
Lambda Architecture with Spark, Spark Streaming, Kafka, Cassandra, Akka and S...
Helena Edelson
 
Scylla Summit 2016: Analytics Show Time - Spark and Presto Powered by Scylla
Scylla Summit 2016: Analytics Show Time - Spark and Presto Powered by ScyllaScylla Summit 2016: Analytics Show Time - Spark and Presto Powered by Scylla
Scylla Summit 2016: Analytics Show Time - Spark and Presto Powered by Scylla
ScyllaDB
 
Spark Saturday: Spark SQL & DataFrame Workshop with Apache Spark 2.3
Spark Saturday: Spark SQL & DataFrame Workshop with Apache Spark 2.3Spark Saturday: Spark SQL & DataFrame Workshop with Apache Spark 2.3
Spark Saturday: Spark SQL & DataFrame Workshop with Apache Spark 2.3
Databricks
 
Writing Continuous Applications with Structured Streaming in PySpark
Writing Continuous Applications with Structured Streaming in PySparkWriting Continuous Applications with Structured Streaming in PySpark
Writing Continuous Applications with Structured Streaming in PySpark
Databricks
 
Kafka spark cassandra webinar feb 16 2016
Kafka spark cassandra   webinar feb 16 2016 Kafka spark cassandra   webinar feb 16 2016
Kafka spark cassandra webinar feb 16 2016
Hiromitsu Komatsu
 
Kafka spark cassandra webinar feb 16 2016
Kafka spark cassandra   webinar feb 16 2016 Kafka spark cassandra   webinar feb 16 2016
Kafka spark cassandra webinar feb 16 2016
Hiromitsu Komatsu
 
Lightning fast analytics with Cassandra and Spark
Lightning fast analytics with Cassandra and SparkLightning fast analytics with Cassandra and Spark
Lightning fast analytics with Cassandra and Spark
Victor Coustenoble
 
3 Dundee-Spark Overview for C* developers
3 Dundee-Spark Overview for C* developers3 Dundee-Spark Overview for C* developers
3 Dundee-Spark Overview for C* developers
Christopher Batey
 
Analytics with Cassandra & Spark
Analytics with Cassandra & SparkAnalytics with Cassandra & Spark
Analytics with Cassandra & Spark
Matthias Niehoff
 
Cassandra 2.0 to 2.1
Cassandra 2.0 to 2.1Cassandra 2.0 to 2.1
Cassandra 2.0 to 2.1
Johnny Miller
 
Lightning Fast Analytics with Cassandra and Spark
Lightning Fast Analytics with Cassandra and SparkLightning Fast Analytics with Cassandra and Spark
Lightning Fast Analytics with Cassandra and Spark
Tim Vincent
 
Writing Continuous Applications with Structured Streaming Python APIs in Apac...
Writing Continuous Applications with Structured Streaming Python APIs in Apac...Writing Continuous Applications with Structured Streaming Python APIs in Apac...
Writing Continuous Applications with Structured Streaming Python APIs in Apac...
Databricks
 
A Tale of Two APIs: Using Spark Streaming In Production
A Tale of Two APIs: Using Spark Streaming In ProductionA Tale of Two APIs: Using Spark Streaming In Production
A Tale of Two APIs: Using Spark Streaming In Production
Lightbend
 

Similar to Spark + Cassandra = Real Time Analytics on Operational Data (20)

Jump Start with Apache Spark 2.0 on Databricks
Jump Start with Apache Spark 2.0 on DatabricksJump Start with Apache Spark 2.0 on Databricks
Jump Start with Apache Spark 2.0 on Databricks
 
5 Ways to Use Spark to Enrich your Cassandra Environment
5 Ways to Use Spark to Enrich your Cassandra Environment5 Ways to Use Spark to Enrich your Cassandra Environment
5 Ways to Use Spark to Enrich your Cassandra Environment
 
BI, Reporting and Analytics on Apache Cassandra
BI, Reporting and Analytics on Apache CassandraBI, Reporting and Analytics on Apache Cassandra
BI, Reporting and Analytics on Apache Cassandra
 
Lightning fast analytics with Spark and Cassandra
Lightning fast analytics with Spark and CassandraLightning fast analytics with Spark and Cassandra
Lightning fast analytics with Spark and Cassandra
 
DataStax - Analytics on Apache Cassandra - Paris Tech Talks meetup
DataStax - Analytics on Apache Cassandra - Paris Tech Talks meetupDataStax - Analytics on Apache Cassandra - Paris Tech Talks meetup
DataStax - Analytics on Apache Cassandra - Paris Tech Talks meetup
 
Cassandra & Spark for IoT
Cassandra & Spark for IoTCassandra & Spark for IoT
Cassandra & Spark for IoT
 
Lambda Architecture with Spark Streaming, Kafka, Cassandra, Akka, Scala
Lambda Architecture with Spark Streaming, Kafka, Cassandra, Akka, ScalaLambda Architecture with Spark Streaming, Kafka, Cassandra, Akka, Scala
Lambda Architecture with Spark Streaming, Kafka, Cassandra, Akka, Scala
 
Lambda Architecture with Spark, Spark Streaming, Kafka, Cassandra, Akka and S...
Lambda Architecture with Spark, Spark Streaming, Kafka, Cassandra, Akka and S...Lambda Architecture with Spark, Spark Streaming, Kafka, Cassandra, Akka and S...
Lambda Architecture with Spark, Spark Streaming, Kafka, Cassandra, Akka and S...
 
Scylla Summit 2016: Analytics Show Time - Spark and Presto Powered by Scylla
Scylla Summit 2016: Analytics Show Time - Spark and Presto Powered by ScyllaScylla Summit 2016: Analytics Show Time - Spark and Presto Powered by Scylla
Scylla Summit 2016: Analytics Show Time - Spark and Presto Powered by Scylla
 
Spark Saturday: Spark SQL & DataFrame Workshop with Apache Spark 2.3
Spark Saturday: Spark SQL & DataFrame Workshop with Apache Spark 2.3Spark Saturday: Spark SQL & DataFrame Workshop with Apache Spark 2.3
Spark Saturday: Spark SQL & DataFrame Workshop with Apache Spark 2.3
 
Writing Continuous Applications with Structured Streaming in PySpark
Writing Continuous Applications with Structured Streaming in PySparkWriting Continuous Applications with Structured Streaming in PySpark
Writing Continuous Applications with Structured Streaming in PySpark
 
Kafka spark cassandra webinar feb 16 2016
Kafka spark cassandra   webinar feb 16 2016 Kafka spark cassandra   webinar feb 16 2016
Kafka spark cassandra webinar feb 16 2016
 
Kafka spark cassandra webinar feb 16 2016
Kafka spark cassandra   webinar feb 16 2016 Kafka spark cassandra   webinar feb 16 2016
Kafka spark cassandra webinar feb 16 2016
 
Lightning fast analytics with Cassandra and Spark
Lightning fast analytics with Cassandra and SparkLightning fast analytics with Cassandra and Spark
Lightning fast analytics with Cassandra and Spark
 
3 Dundee-Spark Overview for C* developers
3 Dundee-Spark Overview for C* developers3 Dundee-Spark Overview for C* developers
3 Dundee-Spark Overview for C* developers
 
Analytics with Cassandra & Spark
Analytics with Cassandra & SparkAnalytics with Cassandra & Spark
Analytics with Cassandra & Spark
 
Cassandra 2.0 to 2.1
Cassandra 2.0 to 2.1Cassandra 2.0 to 2.1
Cassandra 2.0 to 2.1
 
Lightning Fast Analytics with Cassandra and Spark
Lightning Fast Analytics with Cassandra and SparkLightning Fast Analytics with Cassandra and Spark
Lightning Fast Analytics with Cassandra and Spark
 
Writing Continuous Applications with Structured Streaming Python APIs in Apac...
Writing Continuous Applications with Structured Streaming Python APIs in Apac...Writing Continuous Applications with Structured Streaming Python APIs in Apac...
Writing Continuous Applications with Structured Streaming Python APIs in Apac...
 
A Tale of Two APIs: Using Spark Streaming In Production
A Tale of Two APIs: Using Spark Streaming In ProductionA Tale of Two APIs: Using Spark Streaming In Production
A Tale of Two APIs: Using Spark Streaming In Production
 

More from Victor Coustenoble

Préparation de Données pour la Détection de Fraude
Préparation de Données pour la Détection de FraudePréparation de Données pour la Détection de Fraude
Préparation de Données pour la Détection de Fraude
Victor Coustenoble
 
Préparation de Données dans le Cloud
Préparation de Données dans le CloudPréparation de Données dans le Cloud
Préparation de Données dans le Cloud
Victor Coustenoble
 
Préparation de Données Hadoop avec Trifacta
Préparation de Données Hadoop avec TrifactaPréparation de Données Hadoop avec Trifacta
Préparation de Données Hadoop avec Trifacta
Victor Coustenoble
 
Webinaire Business&Decision - Trifacta
Webinaire  Business&Decision - TrifactaWebinaire  Business&Decision - Trifacta
Webinaire Business&Decision - Trifacta
Victor Coustenoble
 
DataStax Enterprise BBL
DataStax Enterprise BBLDataStax Enterprise BBL
DataStax Enterprise BBL
Victor Coustenoble
 
DataStax et Apache Cassandra pour la gestion des flux IoT
DataStax et Apache Cassandra pour la gestion des flux IoTDataStax et Apache Cassandra pour la gestion des flux IoT
DataStax et Apache Cassandra pour la gestion des flux IoT
Victor Coustenoble
 
DataStax et Cassandra dans Azure au Microsoft Techdays
DataStax et Cassandra dans Azure au Microsoft TechdaysDataStax et Cassandra dans Azure au Microsoft Techdays
DataStax et Cassandra dans Azure au Microsoft Techdays
Victor Coustenoble
 
Webinar Degetel DataStax
Webinar Degetel DataStaxWebinar Degetel DataStax
Webinar Degetel DataStax
Victor Coustenoble
 
Quelles stratégies de Recherche avec Cassandra ?
Quelles stratégies de Recherche avec Cassandra ?Quelles stratégies de Recherche avec Cassandra ?
Quelles stratégies de Recherche avec Cassandra ?
Victor Coustenoble
 
Cassandra 2.2 & 3.0
Cassandra 2.2 & 3.0Cassandra 2.2 & 3.0
Cassandra 2.2 & 3.0
Victor Coustenoble
 
DataStax Enterprise - La plateforme de base de données pour le Cloud
DataStax Enterprise - La plateforme de base de données pour le CloudDataStax Enterprise - La plateforme de base de données pour le Cloud
DataStax Enterprise - La plateforme de base de données pour le Cloud
Victor Coustenoble
 
Datastax Cassandra + Spark Streaming
Datastax Cassandra + Spark StreamingDatastax Cassandra + Spark Streaming
Datastax Cassandra + Spark Streaming
Victor Coustenoble
 
DataStax Enterprise et Cas d'utilisation de Apache Cassandra
DataStax Enterprise et Cas d'utilisation de Apache CassandraDataStax Enterprise et Cas d'utilisation de Apache Cassandra
DataStax Enterprise et Cas d'utilisation de Apache Cassandra
Victor Coustenoble
 

More from Victor Coustenoble (13)

Préparation de Données pour la Détection de Fraude
Préparation de Données pour la Détection de FraudePréparation de Données pour la Détection de Fraude
Préparation de Données pour la Détection de Fraude
 
Préparation de Données dans le Cloud
Préparation de Données dans le CloudPréparation de Données dans le Cloud
Préparation de Données dans le Cloud
 
Préparation de Données Hadoop avec Trifacta
Préparation de Données Hadoop avec TrifactaPréparation de Données Hadoop avec Trifacta
Préparation de Données Hadoop avec Trifacta
 
Webinaire Business&Decision - Trifacta
Webinaire  Business&Decision - TrifactaWebinaire  Business&Decision - Trifacta
Webinaire Business&Decision - Trifacta
 
DataStax Enterprise BBL
DataStax Enterprise BBLDataStax Enterprise BBL
DataStax Enterprise BBL
 
DataStax et Apache Cassandra pour la gestion des flux IoT
DataStax et Apache Cassandra pour la gestion des flux IoTDataStax et Apache Cassandra pour la gestion des flux IoT
DataStax et Apache Cassandra pour la gestion des flux IoT
 
DataStax et Cassandra dans Azure au Microsoft Techdays
DataStax et Cassandra dans Azure au Microsoft TechdaysDataStax et Cassandra dans Azure au Microsoft Techdays
DataStax et Cassandra dans Azure au Microsoft Techdays
 
Webinar Degetel DataStax
Webinar Degetel DataStaxWebinar Degetel DataStax
Webinar Degetel DataStax
 
Quelles stratégies de Recherche avec Cassandra ?
Quelles stratégies de Recherche avec Cassandra ?Quelles stratégies de Recherche avec Cassandra ?
Quelles stratégies de Recherche avec Cassandra ?
 
Cassandra 2.2 & 3.0
Cassandra 2.2 & 3.0Cassandra 2.2 & 3.0
Cassandra 2.2 & 3.0
 
DataStax Enterprise - La plateforme de base de données pour le Cloud
DataStax Enterprise - La plateforme de base de données pour le CloudDataStax Enterprise - La plateforme de base de données pour le Cloud
DataStax Enterprise - La plateforme de base de données pour le Cloud
 
Datastax Cassandra + Spark Streaming
Datastax Cassandra + Spark StreamingDatastax Cassandra + Spark Streaming
Datastax Cassandra + Spark Streaming
 
DataStax Enterprise et Cas d'utilisation de Apache Cassandra
DataStax Enterprise et Cas d'utilisation de Apache CassandraDataStax Enterprise et Cas d'utilisation de Apache Cassandra
DataStax Enterprise et Cas d'utilisation de Apache Cassandra
 

Recently uploaded

Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdfSmart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
91mobiles
 
Leading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdfLeading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdf
OnBoard
 
UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3
DianaGray10
 
Key Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdfKey Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdf
Cheryl Hung
 
The Future of Platform Engineering
The Future of Platform EngineeringThe Future of Platform Engineering
The Future of Platform Engineering
Jemma Hussein Allen
 
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdfFIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance
 
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
Product School
 
Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...
Product School
 
Search and Society: Reimagining Information Access for Radical Futures
Search and Society: Reimagining Information Access for Radical FuturesSearch and Society: Reimagining Information Access for Radical Futures
Search and Society: Reimagining Information Access for Radical Futures
Bhaskar Mitra
 
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Tobias Schneck
 
Connector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a buttonConnector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a button
DianaGray10
 
Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*
Frank van Harmelen
 
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdfFIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance
 
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 previewState of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
Prayukth K V
 
Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........
Alison B. Lowndes
 
The Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and SalesThe Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and Sales
Laura Byrne
 
Assuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyesAssuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyes
ThousandEyes
 
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
BookNet Canada
 
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
DanBrown980551
 
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Product School
 

Recently uploaded (20)

Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdfSmart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
 
Leading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdfLeading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdf
 
UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3
 
Key Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdfKey Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdf
 
The Future of Platform Engineering
The Future of Platform EngineeringThe Future of Platform Engineering
The Future of Platform Engineering
 
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdfFIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
 
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
 
Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...
 
Search and Society: Reimagining Information Access for Radical Futures
Search and Society: Reimagining Information Access for Radical FuturesSearch and Society: Reimagining Information Access for Radical Futures
Search and Society: Reimagining Information Access for Radical Futures
 
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
 
Connector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a buttonConnector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a button
 
Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*
 
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdfFIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
 
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 previewState of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
 
Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........
 
The Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and SalesThe Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and Sales
 
Assuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyesAssuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyes
 
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
 
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
 
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
 

Spark + Cassandra = Real Time Analytics on Operational Data

  • 1. Analytique temps réel sur des données transactionnelles = Cassandra + Spark 20/02/15 Victor Coustenoble Ingénieur Solutions victor.coustenoble@datastax.com @vizanalytics
  • 2.
  • 3. Comment utilisez vous Cassandra? En contrôlant votre consommation d’énergie En regardant des films en streaming En naviguant sur des sites Internet En achetant en ligne En effectuant un règlement via Smart Phone En jouant à des jeux-vidéo très connus • Collections/Playlists • Recommandation/Pe rsonnalisation • Détection de Fraude • Messagerie • Objets Connectés
  • 4. Aperçu Fondé en avril 2010 ~35 500+ Santa Clara, Austin, New York, London, Paris, Sydney 400+ Employés Pourcent Clients 4
  • 5. Straightening the road RELATIONAL DATABASES CQL SQL OpsCenter / DevCenter Management tools DSE for search & analytics Integration Security Security Support, consulting & training 30 years ecosystem
  • 6. Apache Cassandra™ • Apache Cassandra™ est une base de données NoSQL, Open Source, Distribuée et créée pour les applications en ligne, modernes, critiques et avec des montée en charge massive. • Java , hybride entre Amazon Dynamo et Google BigTable • Sans Maître-Esclave (peer-to-peer), sans Point Unique de Défaillance (No SPOF) • Distribuée avec la possibilité de Data Center • 100% Disponible • Massivement scalable • Montée en charge linéaire • Haute Performance (lecture ET écriture) • Multi Data Center • Simple à Exploiter • Language CQL (comme SQL) • Outils OpsCenter / DevCenter 6 Dynamo BigTable BigTable: http://research.google.com/archive/bigtable-osdi06.pdf Dynamo: http://www.allthingsdistributed.com/files/amazon-dynamo-sosp2007.pdf Node 1 Node 2 Node 3Node 4 Node 5
  • 7. Haute Disponibilité et Cohérence • La défaillance d’un seul noeud ne doit pas entraîner de défaillance du système • Cohérence choisie au niveau du client • Facteur de Réplication (RF) + Niveau de Cohérence (CL) = Succès • Exemple: • RF = 3 • CL = QUORUM (= 51% des replicas) ©2014 DataStax Confidential. Do not distribute without consent. 7 Node 1 1st copy Node 4 Node 5 Node 2 2nd copy Node 3 3rd copy Parallel Write Write CL=QUORUM 5 μs ack 12 μs ack 12 μs ack > 51% de réponses – donc la requête est réussie CL(Read) + CL(Write) > RF => Cohérence Immédiate/Forte
  • 8. DataStax Enterprise Cassandra Certifié, Prêt pour l’Entreprise 8 Security Analytics Search Visual Monitoring Management Services In-Memory Dev.IDE& Drivers Professional Services Support& Training Confiance d’utilisation Fonctionnalités d’Entreprise
  • 9. DataStax Enterprise - Analytique • Conçu pour faire des analyses sur des données Cassandra • Il y a 4 façons de faire de l’Analytique sur des données Cassandra: 1. Recherche (Solr) 2. Analytique en mode Batch (Hadoop) 3. Analytique en mode Batch avec des outils Externe (Cloudera, Hortonworks) 4. Analytique Temps Réel ©2014 DataStax Confidential. Do not distribute without consent.
  • 10. Partenariat ©2014 DataStax Confidential. Do not distribute without consent. 10
  • 11. Why Spark on Cassandra? • Analytics on transactional data and operational applications • Data model independent queries • Cross-table operations (JOIN, UNION, etc.) • Complex analytics (e.g. machine learning) • Data transformation, aggregation, etc. • Stream processing • Better performances than Hadoop Map/Reduce
  • 12. Real-time Big Data ©2014 DataStax Confidential. Do not distribute without consent. 12 Data Enrichment Batch Processing Machine Learning Pre-computed aggregates Data NO ETL
  • 13. Real-Time Big Data Use Cases • Recommendation Engine • Internet of Things • Fraud Detection • Risk Analysis • Buyer Behaviour Analytics • Telematics, Logistics • Business Intelligence • Infrastructure Monitoring • … ©2014 DataStax Confidential. Do not distribute without consent. 13
  • 14. Composants Sparks Shark or Spark SQL Structured Spark Streaming Real-time MLlib Machine learning Spark (General execution engine) GraphX Graph Cassandra Compatible
  • 16. Cassandra Executor ExecutorSpark Worker (JVM) Cassandra Executor ExecutorSpark Worker (JVM) DSE Spark Integration Architecture Node 1 Node 2 Node 3 Node 4 Cassandra Executor ExecutorSpark Worker (JVM) Cassandra Executor ExecutorSpark Worker (JVM) Spark Master (JVM) App Driver
  • 17. Spark Cassandra Connector C* C* C*C* Spark Executor C* Java Driver Spark-Cassandra Connector User Application Cassandra
  • 18. Cassandra Spark Driver •Cassandra tables exposed as Spark RDDs •Load data from Cassandra to Spark •Write data from Spark to Cassandra •Object mapper : Mapping of C* tables and rows to Scala objects •Type conversions : All Cassandra types supported and converted to Scala types •Server side data selection •Virtual Nodes support •Scala and Java APIs
  • 19. DSE Spark Interactive Shell $ dse spark ... Spark context available as sc. HiveSQLContext available as hc. CassandraSQLContext available as csc. scala> sc.cassandraTable("test", "kv") res5: com.datastax.spark.connector.rdd.CassandraRDD [com.datastax.spark.connector.CassandraRow] = CassandraRDD[2] at RDD at CassandraRDD.scala:48 scala> sc.cassandraTable("test", "kv").collect res6: Array[com.datastax.spark.connector.CassandraRow] = Array(CassandraRow{k: 1, v: foo}) cqlsh> select * from test.kv; k | v ---+----- 1 | foo (1 rows)
  • 20. Connecting to Cassandra // Import Cassandra-specific functions on SparkContext and RDD objects import com.datastax.driver.spark._ // Spark connection options val conf = new SparkConf(true) .setMaster("spark://192.168.123.10:7077") .setAppName("cassandra-demo") .set("cassandra.connection.host", "192.168.123.10") // initial contact .set("cassandra.username", "cassandra") .set("cassandra.password", "cassandra") val sc = new SparkContext(conf)
  • 21. Reading Data val table = sc .cassandraTable[CassandraRow]("db", "tweets") .select("user_name", "message") .where("user_name = ?", "ewa") row representation keyspace table server side column and row selection
  • 22. Writing Data CREATE TABLE test.words(word TEXT PRIMARY KEY, count INT); val collection = sc.parallelize(Seq(("foo", 2), ("bar", 5))) collection.saveToCassandra("test", "words", SomeColumns("word", "count")) cqlsh:test> select * from words; word | count ------+------- bar | 5 foo | 2 (2 rows)
  • 23. Mapping Rows to Objects CREATE TABLE test.cars ( id text PRIMARY KEY, model text, fuel_type text, year int ); case class Vehicle( id: String, model: String, fuelType: String, year: Int ) sc.cassandraTable[Vehicle]("test", "cars").toArray //Array(Vehicle(KF334L, Ford Mondeo, Petrol, 2009), // Vehicle(MT8787, Hyundai x35, Diesel, 2011)  * Mapping rows to Scala Case Classes * CQL underscore case column mapped to Scala camel case property * Custom mapping functions (see docs)
  • 24. Type Mapping CQL Type Scala Type ascii String bigint Long boolean Boolean counter Long decimal BigDecimal, java.math.BigDecimal double Double float Float inet java.net.InetAddress int Int list Vector, List, Iterable, Seq, IndexedSeq, java.util.List map Map, TreeMap, java.util.HashMap set Set, TreeSet, java.util.HashSet text, varchar String timestamp Long, java.util.Date, java.sql.Date, org.joda.time.DateTime timeuuid java.util.UUID uuid java.util.UUID varint BigInt, java.math.BigInteger *nullable values Option
  • 25. Shark • SQL query engine on top of Spark • Not part of Apache Spark • Hive compatible (JDBC, UDFs, types, metadata, etc.) • Supports in-memory tables • Available as a part of DataStax Enterprise
  • 26. Spark SQL • Spark SQL supports a subset of SQL-92 language • Spark SQL optimized for Spark internals (e.g. RDDs) , better performances than Shark • Support for in-memory computation
  • 27. •From Spark command line •Mapping of Cassandra keyspaces and tables •Read and write on Cassandra tables Usage of Spark SQL & HiveQL query import com.datastax.spark.connector._ // Connect to the Spark cluster val conf = new SparkConf(true)... val sc = new SparkContext(conf) // Create Cassandra SQL context val cc = new CassandraSQLContext(sc) // Execute SQL query val rdd = cc.sql("INSERT INTO ks.t1 SELECT c1,c2 FROM ks.t2") // Execute HQL query val rdd = cc.hql("SELECT * FROM keyspace.table JOIN ... WHERE ...")
  • 28. Spark Streaming • For real time analytics • Push or pull model • Stream TO and FROM Cassandra • Micro batching (each batch represented as RDD) • Fault tolerant • Data processed in small batches • Exactly-once processing • Unified stream and batch processing framework • Supports Kafka, Flume, ZeroMQ, Kinesis, MQTT producers
  • 29. Usage of Spark Streaming • Due to the unifying Spark architecture, portions of batch and streaming development can be reused • Given that Spark Streaming is backed by Cassandra, no need to depend upon solutions like Apache Zookeeper ™ in production import com.datastax.spark.connector.streaming._ // Spark connection options val conf = new SparkConf(true)... // streaming with 1 second batch window val ssc = new StreamingContext(conf, Seconds(1)) // stream input val lines = ssc.socketTextStream(serverIP, serverPort) // count words val wordCounts = lines.flatMap(_.split(" ")).map(word => (word, 1)).reduceByKey(_ + _) // stream output wordCounts.saveToCassandra("test", "words") // start processing ssc.start() ssc.awaitTermination()
  • 30. Python API $ dse pyspark Python 2.7.8 (default, Oct 20 2014, 15:05:19) [GCC 4.9.1] on linux2 Type "help", "copyright", "credits" or "license" for more information. Welcome to ____ __ / __/__ ___ _____/ /__ _ / _ / _ `/ __/ '_/ /__ / .__/_,_/_/ /_/_ version 1.1.0 /_/ Using Python version 2.7.8 (default, Oct 20 2014 15:05:19) SparkContext available as sc. >>> sc.cassandraTable("test", "kv").collect() [Row(k=1, v=u'foo')]
  • 31. DataStax Enterprise + Spark Special Features •Easy setup and config • no need to setup a separate Spark cluster • no need to tweak classpaths or config files •High availability of Spark Master •Enterprise security • Password / Kerberos / LDAP authentication • SSL for all Spark to Cassandra connections •CFS integration (no SPOF distributed file system) •Cassandra access through Spark Python API •Certified and Supported on Cassandra •Shark availability
  • 32. DataStax Enterprise - High Availability • All nodes are Spark Workers • By default resilient to Worker failures • First Spark node promoted as Spark Master (state saved in CFS, no SPOF) • Standby Master promoted on failure (New Spark Master reconnects to Workers and the driver app and continues the job)
  • 33. Without DataStax Enterprise 33 C* SparkM SparkW C* SparkW C* SparkWC* SparkW C* SparkW
  • 34. With DataStax Enterprise 34 C* SparkM SparkW C* SparkW* C* SparkWC* SparkW C* SparkW Master state in C* Spare master for H/A
  • 35. Spark Use Cases 35 Load data from various sources Analytics (join, aggregate, transform, …) Sanitize, validate, normalize data Schema migration, Data conversion
  • 36. DataStax Enterprise © 2014 DataStax, All Rights Reserved. Company Confidential External Hadoop Distribution Cloudera, Hortonworks OpsCenter Services Hadoop Monitoring Operations Operational Application Real Time Search Real Time Analytics Batch Analytics SGBDR Analytics Transformation s 36 Cassandra Cluster – Nodes Ring – Column Family Storage High Performance – Alway Available – Massive Scalability Advanced Security In-Memory
  • 37. How to Spark on Cassandra? DataStax Cassandra Spark driver https://github.com/datastax/cassandra-driver-spark Compatible with •Spark 1.2 •Cassandra 2.0.x and 2.1.x •DataStax Enterprise 4.5 et 4.6 DataStax Enterprise 4.6 = Cassandra 2.0 + Driver + Spark 1.1 Spark 1.2 in next DSE 4.7 version (March)
  • 38. Merci Questions ? We power the big data apps that transform business. ©2013 DataStax Confidential. Do not distribute without consent. victor.coustenoble@datastax.com @vizanalytics

Editor's Notes

  1. Qui nous connait parmi vous. En fait dans votre vie quotienne, vous utilisez la technologie DataStax sans le savoir : ebay pour les recommandations produit, bientot NetFlix pour visonner des films en streaming, un achat par SmartSphone grace à nouveau un service offert par un grande banque mutualiste, un échange de de message instantanée avec un service du plus gros opérateur de téléphonie en France etc… Finallement vous utilisez dans votre vie de tous les jours les différents types d’applications proposées par nos 500 clients et qui s’appuie sur notre technologie de base de données We are growing so fast, and in so many ways, I'm willing to bet you’ve used our technology several times in just the past few days and don’t even realize it.  Whether you did some online banking, browsed news sites, did a bit of retail shopping, filled a few prescriptions, or watched movies online -- basically, if you lived your life -- you used the kinds of applications that we power for over 400 customers, including over 20 of the Fortune 100.
  2. Key Takeaway- Introduce the company, our incredible growth and global presence, that we are in about 25% of the FORTUNE 100, and the fact that many of the online and mobile applications you already use every day are actually built on DataStax. Talk Track- DataStax, the leading distributed database technology, delivers Apache Cassandra to the world’s most innovative companies such as Netflix, Rackspace, Pearson Education and Constant Contact. DataStax is built to be agile, always-on, and predictably scalable to any size. We were founded in April 2010, so we are a little over 4 years old. We are headquartered in Santa Clara, California and have offices in Austin TX, New York, London, England and Sydney Australia. We now have over 330 employees; this number will reach well over 400 by the end of our fiscal year (Jan 31 2015) and double by the end of FY16. Currently 25% of the Fortune 100 use us, and our success has been built on our customers success and today and we have over 500 customers worldwide, in over 40 countries. The logos you see here are ones that you are already using every day. These applications are all built on DataStax and Apache Cassandra. So how have we come so far in such a short time…..?
  3. En fait la mission de DataStax est de vos libérer de ces incertitudes et vous faciliter la route sur cette nouvelle voie. A cette fin, nous vous offrons un DML DDL appelé CQL très proche du SQL maitrisé par vos équipes, des outils complets d’administration et de monitoring, So, What DataStax is doing is trying to straightened that bend in the road. We are providing things like CQL, and management tools called DevCenter and OpsCenter. DataStax Enterprise provides integration into analytics and search capabilities and we do it all within a secure environment. We also provide consultants and training courses, including free virtual training to help get you up to speed.
  4. Cassandra is designed to handle big data workloads across multiple data centers with no single point of failure, providing enterprises with continuous availability without compromising performance. It uses aspects of Dynamos partitioning and replication and a log-structured data model similar to Bigtable’s. It takes its distribution algorithm from Dynamo and its data model from Bigtable. Cassandra is a reinvented database which is lightening fast and always on ideal for todays online applications where relational databases like Oracle can’t keep up. This means that in todays world, cassandra stores and processes real time information at fast, predictive performance and built in fault tolerance
  5. Key Takeaway- DataStax Enterprise delivers the commercial confidence and the additional enterprise functionality that you need to support your online business applications. Talk Track- DataStax takes the latest version of open source Apache Cassandra, certifies and prepares it for bullet-proof enterprise deployment. We deliver commercial confidence in the form of training and support, development tools and drivers and professional implementation services to ensure that you have everything you need to successfully deploy Cassandra on support of your mainstream business applications. We also offer additional functionality such as Management Services, that allow you to automatically manage administration and performance Security and encryption to ensure that your data remains perfectly safe and free from corruption In-Memory option that allows you to deliver online applications with lightening fast response times Analytics that allow you to gain valuable insights into data center performance Search which easily allows you search you database, and Visual Monitoring, our Ops Center product that allows you to easily manage and monitor data center performance from anywhere, and on any device
  6. Databricks is the company behind Apache Spark.
  7. Predictive analytics Does this simple architecture look familiar to you? Lambda Nathan Marz
  8. Shark is hive compatible – you can run the same application on Shark Shark integration is only on DSE, otherwise you have to wait for Spark SQL Separate projects – Spark is totally different project Spark SQL has borrowed from Spark Both promising to be Hive compatible
  9. Cassandra spark driver will NOT connect to remote DC Different nodes, profile etc..
  10. Master HA out of the box with DSE A Spark Master controls the workflow, and a Spark Worker launches executors responsible for executing part of the job submitted to the Spark master.
  11. DUYHAI