Delivering Meaning In Near-Real Time At High Velocity In Massive Scale with Apache: Spark, Kafka, Cassandra and Akka

Streaming Big Data: Delivering Meaning In Near-Real Time At High Velocity At Massive Scale with Apache Spark, Apache Kafka, Apache Cassandra, Akka and the Spark Cassandra Connector. Why this pairing of technologies, and how easy it is to implement. Example application: https://github.com/killrweather/killrweather

  1. Streaming Big Data with Apache Spark, Kafka and Cassandra
     Helena Edelson
     @helenaedelson
  2. Agenda
     1 Delivering Meaning In Near-Real Time At High Velocity
     2 Overview Of Spark Streaming, Cassandra, Kafka and Akka
     3 The Spark Cassandra Connector
     4 Integration In Big Data Applications
  3. Who Is This Person?
     • Using Scala in production since 2010
     • Spark Cassandra Connector Committer
     • Akka (Cluster) Contributor
     • Scala Driver for Cassandra Committer
     • @helenaedelson
     • https://github.com/helena
     • Senior Software Engineer, Analytics Team, DataStax
     • Spent the last few years as a Senior Cloud Engineer
  4. Spark WordCount
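The WordCount slide was an image and did not survive extraction. A minimal sketch of the classic Spark WordCount in Scala, assuming Spark 1.x APIs; the file name, app name and master URL are illustrative:

    import org.apache.spark.{SparkConf, SparkContext}

    object WordCount extends App {
      val conf = new SparkConf().setAppName("WordCount").setMaster("local[2]")
      val sc   = new SparkContext(conf)

      // Read lines, split into words, count each word across the cluster.
      val counts = sc.textFile("input.txt")
        .flatMap(_.split("""\s+"""))
        .map(word => (word, 1))
        .reduceByKey(_ + _)

      counts.take(10).foreach(println)
      sc.stop()
    }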
  5. Delivering Meaning In Near-Real Time At High Velocity At Massive Scale
  6. Strategies
     • Partition For Scale
     • Replicate For Resiliency
     • Share Nothing
     • Asynchronous Message Passing
     • Parallelism
     • Isolation
     • Location Transparency
  7. What We Need
     • Fault Tolerant
     • Failure Detection
     • Fast - low latency, distributed, data locality
     • Masterless, Decentralized Cluster Membership
     • Span Racks and DataCenters
     • Hashes The Node Ring
     • Partition-Aware
     • Elasticity
     • Asynchronous - message-passing system
     • Parallelism
     • Network Topology Aware
  8. Why Akka Rocks
     • location transparency
     • fault tolerant
     • async message passing
     • non-deterministic
     • share nothing
     • actor atomicity (w/in actor)
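A minimal sketch of the share-nothing, async model in Akka classic actors; the actor and message names are illustrative, not from the deck. State is private to the actor, and the only way in is an asynchronous message:

    import akka.actor.{Actor, ActorSystem, Props}

    // Each actor owns its state; nothing is shared across threads.
    class Counter extends Actor {
      private var count = 0
      def receive = {
        case "increment" => count += 1        // mutate private state only
        case "report"    => sender() ! count  // reply asynchronously
      }
    }

    object Demo extends App {
      val system  = ActorSystem("demo")
      val counter = system.actorOf(Props[Counter], "counter")
      counter ! "increment"   // fire-and-forget: returns immediately
      counter ! "increment"
    }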
  9. Apache Kafka
     [Diagram from Joe Stein]
  10. Lambda Architecture
      [Diagram: Apache Kafka feeding Spark Core, with Spark SQL (structured), Spark Streaming (real-time), MLlib (machine learning) and GraphX (graph) on top, backed by Cassandra]
  11. Apache Spark and Spark Streaming
      The Drive-By Version
  12. What Is Apache Spark
      • Fast, general cluster compute system
      • Originally developed in 2009 in UC Berkeley's AMPLab
      • Fully open sourced in 2010 - now at the Apache Software Foundation
      • Distributed, Scalable, Fault Tolerant
  13. Apache Spark - Easy to Use & Fast
      • 10x faster on disk, 100x faster in memory than Hadoop MR
      • Works out of the box on EMR
      • Fault Tolerant Distributed Datasets
      • Batch, iterative and streaming analysis
      • In-Memory Storage and Disk
      • Integrates with Most File and Storage Options
      Up to 100× faster (2-10× on disk), 2-5× less code
  14. Spark Components
      [Diagram: Spark Core, with Spark SQL (structured), Spark Streaming (real-time), MLlib (machine learning) and GraphX (graph) on top]
  15. Why Scala?
      • Functional
      • On the JVM
      • Capture functions and ship them across the network
      • Static typing - easier to control performance
      • Leverage the Scala REPL for the Spark REPL
      http://apache-spark-user-list.1001560.n3.nabble.com/Why-Scala-tp6536p6538.html
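A small illustration of "capture functions and ship them across the network", assuming an existing SparkContext sc; the variable names are illustrative:

    val threshold = 42                   // plain local value on the driver
    val rdd = sc.parallelize(1 to 1000)

    // The lambda captures `threshold`; Spark serializes the closure
    // and ships it to the executors holding the partitions.
    val big = rdd.filter(_ > threshold)
    println(big.count())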
  16. Spark Versus Spark Streaming
      Spark processes zillions of bytes; Spark Streaming processes gigabytes per second
  17. Input & Output Sources
  18. Spark Streaming
      [Diagram: input sources such as Kinesis and S3 flowing through Spark Streaming]
  19. Common Use Cases
      Sources: applications, sensors, web, mobile phones
      Uses: intrusion detection, malfunction detection, site analytics, network metrics analysis, fraud detection, dynamic process optimisation, recommendations, location based ads, log processing, supply chain planning, sentiment analysis, …
  20. DStream - Micro Batches
      • Continuous sequence of micro batches
      • More complex processing models are possible with less effort
      • Streaming computations as a series of deterministic batch computations on small time intervals
      DStream = μBatch (ordinary RDD), μBatch (ordinary RDD), μBatch (ordinary RDD), …
      Processing of DStream = Processing of μBatches, RDDs
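A minimal sketch of the micro-batch model, assuming an existing SparkContext sc and a text source on a local socket; each batch interval yields an ordinary RDD:

    import org.apache.spark.streaming.{Seconds, StreamingContext}

    val ssc = new StreamingContext(sc, Seconds(1))   // 1-second micro batches

    // Every second, the lines received become one ordinary RDD.
    val lines = ssc.socketTextStream("localhost", 9999)
    lines.foreachRDD { rdd =>
      println(s"μBatch with ${rdd.count()} lines")   // each batch is a plain RDD
    }

    ssc.start()
    ssc.awaitTermination()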
  21. InputDStreams
      DStreams representing the stream of raw data received from streaming sources:
      • Basic sources: sources directly available in the StreamingContext API
      • Advanced sources: sources available through external modules, available in Spark as artifacts
  22. Spark Streaming Modules
      GroupId           ArtifactId                        Latest Version
      org.apache.spark  spark-streaming-kinesis-asl_2.10  1.1.0
      org.apache.spark  spark-streaming-mqtt_2.10         1.1.0
      org.apache.spark  spark-streaming-zeromq_2.10       1.1.0
      org.apache.spark  spark-streaming-flume_2.10        1.1.0
      org.apache.spark  spark-streaming-flume-sink_2.10   1.1.0
      org.apache.spark  spark-streaming-kafka_2.10        1.1.0
      org.apache.spark  spark-streaming-twitter_2.10      1.1.0
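Pulling one of these modules into an sbt build might look like this (Kafka shown; coordinates and versions taken from the table above):

    // build.sbt
    libraryDependencies ++= Seq(
      "org.apache.spark" %% "spark-streaming"       % "1.1.0",
      "org.apache.spark" %% "spark-streaming-kafka" % "1.1.0"
    )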
  23. Streaming Window Operations
      // where pairs are (word, count)
      pairsStream
        .flatMap { case (k, v) => (k, v.value) }
        .reduceByKeyAndWindow((a: Int, b: Int) => (a + b), Seconds(30), Seconds(10))
        .saveToCassandra(keyspace, table)
  24. Streaming Window Operations
      • window(Duration, Duration)
      • countByWindow(Duration, Duration)
      • reduceByWindow(Duration, Duration)
      • countByValueAndWindow(Duration, Duration)
      • groupByKeyAndWindow(Duration, Duration)
      • reduceByKeyAndWindow((V, V) => V, Duration, Duration)
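In each signature the first Duration is the window length and the second is the slide interval; both must be multiples of the batch interval. A sketch reusing the socket stream from the slide-20 example:

    // Count all events seen in the last 30 seconds, recomputed every 10 seconds.
    ssc.checkpoint("/tmp/checkpoint")  // windowed ops with inverse reduce need a checkpoint dir
    val counts = lines.countByWindow(Seconds(30), Seconds(10))
    counts.print()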
  25. Ten Things About Cassandra
      You always wanted to know but were afraid to ask
  26. Quick intro to Cassandra
      • Shared nothing
      • Masterless peer-to-peer
      • Based on Dynamo
      • Highly Scalable
      • Spans DataCenters
  27. Apache Cassandra
      • Elasticity - scale to as many nodes as you need, when you need
      • Always On - no single point of failure, continuous availability
      • Masterless peer-to-peer architecture
      • Designed for Replication
      • Flexible Data Storage
      • Read and write to any node syncs across the cluster
      • Operational simplicity - with all nodes in a cluster being the same, there is no complex configuration to manage
  28. Easy to use
      • CQL - familiar syntax
      • Friendly to programmers
      • Paxos for locking

      CREATE TABLE users (
        username varchar,
        firstname varchar,
        lastname varchar,
        email list<varchar>,
        password varchar,
        created_date timestamp,
        PRIMARY KEY (username)
      );

      INSERT INTO users (username, firstname, lastname, email, password, created_date)
      VALUES ('hedelson','Helena','Edelson',['helena.edelson@datastax.com'],
              'ba27e03fd95e507daf2937c937d499ab','2014-11-15 13:50:00');

      INSERT INTO users (username, firstname, lastname, email, password, created_date)
      VALUES ('pmcfadin','Patrick','McFadin',['patrick@datastax.com'],
              'ba27e03fd95e507daf2937c937d499ab','2011-06-20 13:50:00')
      IF NOT EXISTS;
  29. Spark Cassandra Connector
  30. Spark On Cassandra
      • Server-Side filters (where clauses)
      • Cross-table operations (JOIN, UNION, etc.)
      • Data locality-aware (speed)
      • Data transformation, aggregation, etc.
      • Natural Time Series Integration
  31. Spark Cassandra Connector
      • Loads data from Cassandra to Spark
      • Writes data from Spark to Cassandra
      • Implicit Type Conversions and Object Mapping
      • Implemented in Scala (offers a Java API)
      • Open Source
      • Exposes Cassandra Tables as Spark RDDs + Spark DStreams
  32. Spark Cassandra Connector
      https://github.com/datastax/spark-cassandra-connector
      [Diagram: User Application -> Spark-Cassandra Connector -> Spark Executor -> Java (soon Scala) Driver -> Cassandra (C*) nodes]
  33. Use Cases: KillrWeather
      • Get data by weather station
      • Get data for a single date and time
      • Get data for a range of dates and times
      • Compute, store and quickly retrieve daily, monthly and annual aggregations of data
      Data model to support the queries:
      • Store raw data per weather station
      • Store time series in order: most recent to oldest
      • Compute and store aggregate data in the stream
      • Set TTLs on historic data
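A sketch of the first three query patterns through the connector, assuming an existing SparkContext sc with the connector imported, and a hypothetical raw_weather_data table keyed by weather station id (wsid) with (year, month, day, hour) clustering columns, roughly as in the KillrWeather app:

    import com.datastax.spark.connector._

    // By weather station: a single-partition read.
    val byStation = sc.cassandraTable("isd_weather_data", "raw_weather_data")
      .where("wsid = ?", "725030:14732")

    // A single date and time.
    val oneHour = sc.cassandraTable("isd_weather_data", "raw_weather_data")
      .where("wsid = ? AND year = ? AND month = ? AND day = ? AND hour = ?",
             "725030:14732", 2014, 9, 10, 12)

    // A range of days within the partition: the clustering columns let
    // Cassandra answer the range server side, in time-series order.
    val range = sc.cassandraTable("isd_weather_data", "raw_weather_data")
      .where("wsid = ? AND year = ? AND month = ? AND day >= ? AND day <= ?",
             "725030:14732", 2014, 9, 1, 7)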
  34. Cassandra Data Model
      user_name   tweet_id          publication_date      message
      pkolaczk    4b514a41-9cf...   2014-08-21 10:00:04   ...
      pkolaczk    411bbe34-11d...   2014-08-24 16:03:47   ...
      ewa         73847f13-771...   2014-08-21 10:00:04   ...
      ewa         41106e9a-5ff...   2014-08-21 10:14:51   ...
      Diagram labels: user_name is the partitioning key, and rows sharing it form a partition; (user_name, tweet_id) is the primary key; publication_date and message are data columns.
  35. Spark Data Model
      [Diagram: A1..A8 --map--> B1..B8 --filter--> B1, B2, B5, B7, B8 --reduce--> C]
      Resilient Distributed Dataset - a Scala collection:
      • immutable
      • iterable
      • serializable
      • distributed
      • parallel
      • lazy evaluation
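The diagram as code: a sketch of the same map/filter/reduce chain, assuming an existing SparkContext sc (element values are illustrative):

    val a    = sc.parallelize(1 to 8)        // A1..A8
    val b    = a.map(_ * 10)                 // map:    B1..B8
    val kept = b.filter(_ % 3 != 0)          // filter: a subset survives
    val c    = kept.reduce(_ + _)            // reduce: single result C

    // Nothing runs until `reduce` forces evaluation - RDDs are lazy.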
  36. Spark Cassandra Example
      // Initialization
      val conf = new SparkConf(loadDefaults = true)
        .set("spark.cassandra.connection.host", "127.0.0.1")
        .setMaster("spark://127.0.0.1:7077")
      val sc = new SparkContext(conf)

      // CassandraRDD
      val table: CassandraRDD[CassandraRow] = sc.cassandraTable("keyspace", "tweets")

      // Stream initialization
      val ssc = new StreamingContext(sc, Seconds(30))
      val stream = KafkaUtils.createStream[String, String, StringDecoder, StringDecoder](
        ssc, kafka.kafkaParams, Map(topic -> 1), StorageLevel.MEMORY_ONLY)

      // Transformations and action
      stream.map(_._2).countByValue().saveToCassandra("demo", "wordcount")
      ssc.start()
      ssc.awaitTermination()
  37. Spark Cassandra Example
      val sc = new SparkContext(..)
      val ssc = new StreamingContext(sc, Seconds(5))

      val stream = TwitterUtils.createStream(
        ssc, auth, filters, StorageLevel.MEMORY_ONLY_SER_2)

      val transform = (cruft: String) =>
        Pattern.findAllIn(cruft).flatMap(_.stripPrefix("#"))

      /** Note that Cassandra is doing the sorting for you here. */
      stream.flatMap(_.getText.toLowerCase.split("""\s+"""))
        .map(transform)
        .countByValueAndWindow(Seconds(5), Seconds(5))
        .transform((rdd, time) => rdd.map { case (term, count) => (term, count, now(time)) })
        .saveToCassandra(keyspace, suspicious, SomeColumns("suspicious", "count", "timestamp"))
  38. Reading: From C* To Spark
      val table = sc
        .cassandraTable[CassandraRow]("keyspace", "tweets")  // row representation, keyspace, table
        .select("user_name", "message")                      // server-side column selection
        .where("user_name = ?", "ewa")                       // server-side row selection
  39. CassandraRDD
      class CassandraRDD[R](..., keyspace: String, table: String, ...)
        extends RDD[R](...) {

        // Splits the table into multiple Spark partitions,
        // each processed by single Spark Task
        override def getPartitions: Array[Partition]

        // Returns names of hosts storing given partition (for data locality!)
        override def getPreferredLocations(split: Partition): Seq[String]

        // Returns iterator over Cassandra rows in the given partition
        override def compute(split: Partition, context: TaskContext): Iterator[R]
      }
  40. CassandraStreamingRDD
      /** RDD representing a Cassandra table for Spark Streaming.
        * @see [[com.datastax.spark.connector.rdd.CassandraRDD]] */
      class CassandraStreamingRDD[R] private[connector] (
          sctx: StreamingContext,
          connector: CassandraConnector,
          keyspace: String,
          table: String,
          columns: ColumnSelector = AllColumns,
          where: CqlWhereClause = CqlWhereClause.empty,
          readConf: ReadConf = ReadConf())(
          implicit ct: ClassTag[R], @transient rrf: RowReaderFactory[R])
        extends CassandraRDD[R](sctx.sparkContext, connector, keyspace, table, columns, where, readConf)
  41. Paging Reads with .cassandraTable
      • Page size is configurable
      • Controls how many CQL rows to fetch at a time, when fetching a single partition
      • Connector returns an iterator for rows to Spark
      • Spark iterates over this, lazily
      • Handled by the Java driver as well as Spark
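A sketch of tuning that page size; spark.cassandra.input.page.row.size is my best recollection of the 1.x-era property name, so treat it as an assumption and check the connector docs for your version:

    val conf = new SparkConf(loadDefaults = true)
      .set("spark.cassandra.connection.host", "127.0.0.1")
      // Assumed 1.x-era property: CQL rows fetched per round trip.
      .set("spark.cassandra.input.page.row.size", "1000")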
  42. ResultSet Paging and Pre-Fetching
      [Diagram: on each node, the client requests a page from Cassandra, receives the data, and processes it while the next page is already being requested - fetching and processing overlap on Node 1 and Node 2]
  43. Co-locate Spark and C* for Best Performance
      [Diagram: Spark Master plus Spark Workers running on the same nodes as the C* cluster]
      Running Spark Workers on the same nodes as your C* Cluster will save network hops when reading and writing
  44. The Key To Speed - Data Locality
      • LocalNodeFirstLoadBalancingPolicy
      • Decides what node will become the coordinator for the given mutation/read
      • Selects local node first and then nodes in the local DC in random order
      • Once that node receives the request it will be distributed
      • Proximal Node Sort defined by the C* snitch:
        https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/locator/DynamicEndpointSnitch.java#L155-L190
  45. Column Type Conversions
      trait TypeConverter[T] extends Serializable {
        def targetTypeTag: TypeTag[T]
        def convertPF: PartialFunction[Any, T]
      }

      class CassandraRow(...) {
        def get[T](index: Int)(implicit c: TypeConverter[T]): T = …
        def get[T](name: String)(implicit c: TypeConverter[T]): T = …
        …
      }

      implicit object IntConverter extends TypeConverter[Int] {
        def targetTypeTag = implicitly[TypeTag[Int]]
        def convertPF = {
          case x: Number => x.intValue
          case x: String => x.toInt
        }
      }
  46. Rows to Case Class Instances
      val row: CassandraRow = table.first

      case class Tweet(
        userName: String,
        tweetId: UUID,
        publicationDate: Date,
        message: String) extends Serializable

      val tweets = sc.cassandraTable[Tweet]("db", "tweets")

      Recommended convention:
      Scala       Cassandra
      message     message
      column1     column_1
      userName    user_name

      Scala       Cassandra
      Message     Message
      column1     column1
      userName    userName
  47. Convert Rows To Tuples
      val tweets = sc
        .cassandraTable[(String, String)]("db", "tweets")
        .select("userName", "message")

      When returning tuples, always use select to specify the column order!
  48. Handling Unsupported Types
      case class Tweet(
        userName: String,
        tweetId: my.private.implementation.of.UUID,
        publicationDate: Date,
        message: String)

      val tweets: CassandraRDD[Tweet] = sc.cassandraTable[Tweet]("db", "tweets")

      Current behavior: runtime error
      Future: compile-time error (macros!)
      Macros could even parse Cassandra schema!
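Until then, one workaround is registering a custom converter built on the TypeConverter trait from slide 45. A sketch under the assumption that TypeConverter.registerConverter is available in your connector version; MyUUID is a hypothetical stand-in for the private type:

    import com.datastax.spark.connector.types.TypeConverter
    import scala.reflect.runtime.universe._

    // Hypothetical wrapper standing in for my.private.implementation.of.UUID.
    case class MyUUID(value: String)

    object MyUUIDConverter extends TypeConverter[MyUUID] {
      def targetTypeTag = implicitly[TypeTag[MyUUID]]
      def convertPF = { case x: java.util.UUID => MyUUID(x.toString) }
    }

    // Assumed registration hook: make the converter visible to the row mapper.
    TypeConverter.registerConverter(MyUUIDConverter)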
  49. Connector Code and Docs
      https://github.com/datastax/spark-cassandra-connector

      Add It To Your Project:
      val connector = "com.datastax.spark" %% "spark-cassandra-connector" % "1.1.0-beta2"
  50. What's New In Spark?
      • Petabyte sort record
      • Myth busting:
        • Spark is in-memory. It doesn't work with big data
        • It's too expensive to buy a cluster with enough memory to fit our data
      • Application Integration: Tableau, Trifacta, Talend, ElasticSearch, Cassandra
      • Ongoing development for Spark 1.2: Python streaming, new MLlib API, YARN scaling, …
  51. Recent / Pending Spark Updates
      • Usability, Stability & Performance Improvements
      • Improved Lightning Fast Shuffle
      • Monitoring the performance of long running or complex jobs
      • Public types API to allow users to create SchemaRDDs
      • Support for registering Python, Scala, and Java lambda functions as UDFs
      • Dynamic bytecode generation significantly speeding up execution for queries that perform complex expression evaluation
  52. Recent Spark Streaming Updates
      • Apache Flume: a new pull-based mode (simplifying deployment and providing high availability)
      • The first of a set of streaming machine learning algorithms is introduced with streaming linear regression
      • Rate limiting has been added for streaming inputs
  53. Recent MLlib Updates
      • New library of statistical packages which provides exploratory analytic functions (stratified sampling, correlations, chi-squared tests, creating random datasets, …)
      • Utilities for feature extraction (Word2Vec and TF-IDF) and feature transformation (normalization and standard scaling)
      • Support for nonnegative matrix factorization and SVD via Lanczos
      • Decision tree algorithm has been added in Python and Java
      • Tree aggregation primitive
      • Performance improves across the board, with improvements of around …
  54. Recent/Pending Connector Updates
      • Spark SQL (came in 1.1)
      • Python API
      • Official Scala Driver for Cassandra
      • Removing last traces of Thrift!
      • Performance Improvements:
        • Token-aware data repartitioning
        • Token-aware saving
        • Wide-row support - no costly groupBy call
  55. Resources
      • Spark Cassandra Connector: https://github.com/datastax/spark-cassandra-connector
      • Apache Cassandra: http://cassandra.apache.org
      • Apache Spark: http://spark.apache.org
      • Apache Kafka: http://kafka.apache.org
      • Akka: http://akka.io
  56. www.spark-summit.org
  57. Thanks for listening!
      Cassandra Summit
      SEPTEMBER 10 - 11, 2014 | SAN FRANCISCO, CALIF. | THE WESTIN ST. FRANCIS HOTEL