Apache cassandra & apache spark for time series data

47,223 views

Published on

Published in: Technology
  1. 1. Apache Cassandra & Apache Spark for Time Series Data. Patrick McFadin, Chief Evangelist for Apache Cassandra, DataStax (@PatrickMcFadin). ©2013 DataStax Confidential. Do not distribute without consent.
  2. 2. Cassandra for Applications APACHE CASSANDRA
  3. 3. Cassandra is… • Shared nothing • Masterless peer-to-peer • Based on Dynamo
  4. 4. Scaling • Add nodes to scale • Millions of ops/s [Chart: throughput (ops/sec) for Cassandra, HBase, Redis and MySQL]
  5. 5. Uptime • Built to replicate • Resilient to failure • Always on NONE
  6. 6. Replication
        [Diagram: a client inserts data into DC1; asynchronous local replication within each data center, asynchronous WAN replication between DC1 and DC2]
        DC1 (RF=3): 10.0.0.1 owns 00-25, 10.0.0.2 owns 26-50, 10.0.0.3 owns 51-75, 10.0.0.4 owns 76-100
        DC2 (RF=3): 10.10.0.1 owns 00-25, 10.10.0.2 owns 26-50, 10.10.0.3 owns 51-75, 10.10.0.4 owns 76-100
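The ring on the replication slide can be sketched in plain Scala. This is only an illustration under simplifying assumptions: the token ranges 00-100 are the ones drawn on the slide, and replicas are placed on the owning node plus the next nodes clockwise, which is roughly what a simple placement strategy does. A real cluster hashes the partition key into a far larger token space. The names `RingSketch`, `Node` and `replicas` are hypothetical.

```scala
object RingSketch {
  // Each node owns one inclusive token range, as drawn on the slide.
  final case class Node(address: String, low: Int, high: Int)

  val dc1: Vector[Node] = Vector(
    Node("10.0.0.1", 0, 25),
    Node("10.0.0.2", 26, 50),
    Node("10.0.0.3", 51, 75),
    Node("10.0.0.4", 76, 100)
  )

  // The owner of the token plus the next (rf - 1) nodes clockwise hold the replicas.
  def replicas(ring: Vector[Node], token: Int, rf: Int): Vector[String] = {
    val owner = ring.indexWhere(n => token >= n.low && token <= n.high)
    require(owner >= 0, s"token $token is outside the ring")
    (0 until math.min(rf, ring.size))
      .map(i => ring((owner + i) % ring.size).address)
      .toVector
  }
}
```

With RF=3, a row whose token is 60 would land on 10.0.0.3 and then wrap around the ring to the two following nodes.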
  7. 7. Data Model
        • Familiar syntax
        • Collections
        • PRIMARY KEY for uniqueness

        CREATE TABLE videos (
          videoid uuid,
          userid uuid,
          name varchar,
          description varchar,
          location text,
          location_type int,
          preview_thumbnails map<text,text>,
          tags set<varchar>,
          added_date timestamp,
          PRIMARY KEY (videoid)
        );
  8. 8. Data Model - User Defined Types
        • Complex data in one place
        • No multi-gets (multi-partitions)
        • Nesting!

        CREATE TYPE address (
          street text,
          city text,
          zip_code int,
          country text,
          cross_streets set<text>
        );
  9. 9. Data Model - Updated
        • Now video_metadata is embedded in videos

        CREATE TYPE video_metadata (
          height int,
          width int,
          video_bit_rate set<text>,
          encoding text
        );

        CREATE TABLE videos (
          videoid uuid,
          userid uuid,
          name varchar,
          description varchar,
          location text,
          location_type int,
          preview_thumbnails map<text,text>,
          tags set<varchar>,
          metadata set<frozen<video_metadata>>,
          added_date timestamp,
          PRIMARY KEY (videoid)
        );
  10. 10. Data Model - Storing JSON

        {
          "productId": 2,
          "name": "Kitchen Table",
          "price": 249.99,
          "description": "Rectangular table with oak finish",
          "dimensions": { "units": "inches", "length": 50.0, "width": 66.0, "height": 32 },
          "categories": {
            "Home Furnishings": { "catalogPage": 45, "url": "/home/furnishings" },
            "Kitchen Furnishings": { "catalogPage": 108, "url": "/kitchen/furnishings" }
          }
        }

        CREATE TYPE dimensions (
          units text,
          length float,
          width float,
          height float
        );

        CREATE TYPE category (
          catalogPage int,
          url text
        );

        CREATE TABLE product (
          productId int,
          name text,
          price float,
          description text,
          dimensions frozen<dimensions>,
          categories map<text, frozen<category>>,
          PRIMARY KEY (productId)
        );
  11. 11. Why… Cassandra for Time Series? Spark as a great addition to Cassandra?
  12. 12. Example 1: Weather Station • Weather station collects data • Cassandra stores in sequence • Application reads in sequence
  13. 13. Use case
        Needed queries:
        • Get all data for one weather station
        • Get data for a single date and time
        • Get data for a range of dates and times
        Data model to support queries:
        • Store data per weather station
        • Store time series in order: first to last
  14. 14. Data Model
        • Weather Station Id and Time are unique
        • Store as many as needed

        CREATE TABLE temperature (
          weatherstation_id text,
          year int,
          month int,
          day int,
          hour int,
          temperature double,
          PRIMARY KEY ((weatherstation_id),year,month,day,hour)
        );

        INSERT INTO temperature(weatherstation_id,year,month,day,hour,temperature)
        VALUES ('10010:99999',2005,12,1,7,-5.6);
        INSERT INTO temperature(weatherstation_id,year,month,day,hour,temperature)
        VALUES ('10010:99999',2005,12,1,8,-5.1);
        INSERT INTO temperature(weatherstation_id,year,month,day,hour,temperature)
        VALUES ('10010:99999',2005,12,1,9,-4.9);
        INSERT INTO temperature(weatherstation_id,year,month,day,hour,temperature)
        VALUES ('10010:99999',2005,12,1,10,-5.3);
  15. 15. Storage Model - Logical View

        SELECT weatherstation_id,hour,temperature
        FROM temperature
        WHERE weatherstation_id='10010:99999'
        AND year = 2005 AND month = 12 AND day = 1;

        weatherstation_id | hour         | temperature
        10010:99999       | 2005:12:1:7  | -5.6
        10010:99999       | 2005:12:1:8  | -5.1
        10010:99999       | 2005:12:1:9  | -4.9
        10010:99999       | 2005:12:1:10 | -5.3
  16. 16. Storage Model - Disk Layout

        SELECT weatherstation_id,hour,temperature
        FROM temperature
        WHERE weatherstation_id='10010:99999'
        AND year = 2005 AND month = 12 AND day = 1;

        [Diagram: one partition, merged, sorted and stored sequentially]
        10010:99999 | 2005:12:1:7 = -5.6 | 2005:12:1:8 = -5.1 | 2005:12:1:9 = -4.9 | 2005:12:1:10 = -5.3 | 2005:12:1:11 = -4.9 | 2005:12:1:12 = -5.4
  17. 17. Primary key relationship PRIMARY KEY (weatherstation_id,year,month,day,hour)
  18. 18. Primary key relationship PRIMARY KEY (weatherstation_id,year,month,day,hour) Partition Key
  19. 19. Primary key relationship PRIMARY KEY (weatherstation_id,year,month,day,hour) Partition Key Clustering Columns
  20. 20. Primary key relationship PRIMARY KEY (weatherstation_id,year,month,day,hour) Partition Key Clustering Columns 10010:99999
  21. 21. Primary key relationship
        PRIMARY KEY (weatherstation_id,year,month,day,hour)
        Partition Key: weatherstation_id
        Clustering Columns: year, month, day, hour

        10010:99999 | 2005:12:1:7 = -5.6 | 2005:12:1:8 = -5.1 | 2005:12:1:9 = -4.9 | 2005:12:1:10 = -5.3
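One way to picture the primary key relationship is a map of sorted maps: the partition key picks the partition, and the clustering columns keep the rows inside it in order. The sketch below is a plain Scala model of that idea, not Cassandra's storage engine. `PartitionSketch`, `Clustering`, `Table` and `insert` are hypothetical names.

```scala
import scala.collection.immutable.TreeMap

object PartitionSketch {
  // Clustering columns (year, month, day, hour), compared left to right.
  type Clustering = (Int, Int, Int, Int)
  // Partition key -> rows sorted by clustering columns.
  type Table = Map[String, TreeMap[Clustering, Double]]

  def insert(table: Table, station: String, key: Clustering, temp: Double): Table = {
    val partition = table.getOrElse(station, TreeMap.empty[Clustering, Double])
    table.updated(station, partition.updated(key, temp))
  }

  private val empty: Table = Map.empty

  // Rows arrive out of order but are kept sorted inside the partition.
  val rows: Table = Seq(
    ((2005, 12, 1, 9), -4.9),
    ((2005, 12, 1, 7), -5.6),
    ((2005, 12, 1, 10), -5.3),
    ((2005, 12, 1, 8), -5.1)
  ).foldLeft(empty) { case (t, (k, v)) => insert(t, "10010:99999", k, v) }
}
```

Reading `PartitionSketch.rows("10010:99999")` walks the hours in ascending order regardless of insert order, which mirrors the "merged, sorted and stored sequentially" picture from the disk layout slide.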
  22. 22. Data Locality. Where is weatherstation_id='10010:99999'? [Diagram: 1000 node cluster with one node marked "You are here!"]
  23. 23. Query patterns
        • Range queries
        • "Slice" operation on disk

        SELECT weatherstation_id,hour,temperature
        FROM temperature
        WHERE weatherstation_id='10010:99999'
        AND year = 2005 AND month = 12 AND day = 1
        AND hour >= 7 AND hour <= 10;

        [Diagram: partition key for locality, single seek on disk]
        10010:99999 | 2005:12:1:7 = -5.6 | 2005:12:1:8 = -5.1 | 2005:12:1:9 = -4.9 | 2005:12:1:10 = -5.3 | 2005:12:1:11 = -4.9 | 2005:12:1:12 = -5.4
  24. 24. Query patterns
        • Range queries
        • "Slice" operation on disk
        • Sorted by event_time (programmers like this)

        SELECT weatherstation_id,hour,temperature
        FROM temperature
        WHERE weatherstation_id='10010:99999'
        AND year = 2005 AND month = 12 AND day = 1
        AND hour >= 7 AND hour <= 10;

        weatherstation_id | hour         | temperature
        10010:99999       | 2005:12:1:7  | -5.6
        10010:99999       | 2005:12:1:8  | -5.1
        10010:99999       | 2005:12:1:9  | -4.9
        10010:99999       | 2005:12:1:10 | -5.3
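The range query above amounts to a slice of an already sorted partition. A minimal sketch with a plain Scala `TreeMap` standing in for one day of one station's partition: the values for hours 7 to 10 come from the slides, while the values for hours 11 and 12 are read off the disk layout diagram and should be treated as approximate. `SliceSketch` and `slice` are hypothetical names.

```scala
import scala.collection.immutable.TreeMap

object SliceSketch {
  // One partition: hour -> temperature for station 10010:99999 on 2005-12-01.
  val day: TreeMap[Int, Double] =
    TreeMap(7 -> -5.6, 8 -> -5.1, 9 -> -4.9, 10 -> -5.3, 11 -> -4.9, 12 -> -5.4)

  // Equivalent of: hour >= from AND hour <= to.
  // TreeMap.range takes an exclusive upper bound, hence to + 1.
  def slice(partition: TreeMap[Int, Double], from: Int, to: Int): List[(Int, Double)] =
    partition.range(from, to + 1).toList
}
```

Because the data is already ordered by the clustering key, the slice is a contiguous run and comes back sorted without any extra sorting step.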
  25. 25. Apache Spark
  26. 26. Apache Spark
        • 10x faster on disk, 100x faster in memory than Hadoop MR
        • Works out of the box on EMR
        • Fault Tolerant Distributed Datasets
        • Batch, iterative and streaming analysis
        • In Memory Storage and Disk
        • Integrates with Most File and Storage Options
        [Chart captions: up to 100× faster (2-10× on disk); 2-5× less code]
  27. 27. Spark Components Spark SQL structured Spark Core Spark Streaming real-time MLlib machine learning GraphX graph
  28. 28. org.apache.spark.rdd.RDD Resilient Distributed Dataset (RDD) •Created through transformations on data (map,filter..) or other RDDs •Immutable •Partitioned •Reusable
  29. 29. RDD Operations
        • Transformations
          - Similar to the Scala collections API
          - Produce new RDDs
          - filter, flatMap, map, distinct, groupBy, union, zip, reduceByKey, subtract
        • Actions
          - Require materialization of the records to generate a value
          - collect: Array[T], count, fold, reduce, ...
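The split between transformations and actions can be felt with ordinary Scala iterators, which are also lazy. This is an analogy only: it uses no Spark API, and unlike an RDD an iterator can be consumed just once. `LazySketch`, `Counter` and `pipeline` are hypothetical names.

```scala
object LazySketch {
  // Counts how many times the map function actually runs.
  final class Counter { var calls = 0 }

  // "Transformations": map and filter only describe the work; nothing runs yet.
  def pipeline(c: Counter): Iterator[Int] =
    (1 to 5).iterator
      .map { n => c.calls += 1; n * 2 }
      .filter(_ > 4)
}
```

Calling `LazySketch.pipeline(counter)` leaves `counter.calls` at zero. Only an "action" such as `.sum` on the returned iterator forces the records through the map and filter steps.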
  30. 30. RDD Operations [Diagram: Transformation, then Action]
  31. 31. Collections and Files To RDD

        scala> val distData = sc.parallelize(Seq(1,2,3,4,5))
        distData: spark.RDD[Int] = spark.ParallelCollection@10d13e3e

        val distFile: RDD[String] = sc.textFile("directory/*.txt")
        val distFile = sc.textFile("hdfs://namenode:9000/path/file")
        val distFile = sc.sequenceFile("hdfs://namenode:9000/path/file")
  32. 32. Spark and Cassandra
  33. 33. Spark on Cassandra • Server-Side filters (where clauses) • Cross-table operations (JOIN, UNION, etc.) • Data locality-aware (speed) • Data transformation, aggregation, etc. • Natural Time Series Integration
  34. 34. Spark Cassandra Connector • Loads data from Cassandra to Spark • Writes data from Spark to Cassandra • Implicit Type Conversions and Object Mapping • Implemented in Scala (offers a Java API) • Open Source • Exposes Cassandra Tables as Spark RDDs + Spark DStreams
  35. 35. Spark Cassandra Connector [Diagram: a user application on a Spark executor uses the Spark-Cassandra Connector, which talks to the Cassandra (C*) nodes through the Java (soon Scala) driver] https://github.com/datastax/spark-cassandra-connector
  36. 36. Spark Cassandra Example

        // Initialization
        val conf = new SparkConf(loadDefaults = true)
          .set("spark.cassandra.connection.host", "127.0.0.1")
          .setMaster("spark://127.0.0.1:7077")
        val sc = new SparkContext(conf)

        // CassandraRDD
        val table: CassandraRDD[CassandraRow] = sc.cassandraTable("keyspace", "tweets")

        // Stream initialization
        val ssc = new StreamingContext(sc, Seconds(30))
        val stream = KafkaUtils.createStream[String, String, StringDecoder, StringDecoder](
          ssc, kafka.kafkaParams, Map(topic -> 1), StorageLevel.MEMORY_ONLY)

        // Transformations and action
        stream.map(_._2).countByValue().saveToCassandra("demo", "wordcount")

        ssc.start()
        ssc.awaitTermination()
  37. 37. Weather Station Analysis • Weather station collects data • Cassandra stores in sequence • Spark rolls up data into new tables Windsor California July 1, 2014 High: 73.4F Low : 51.4F
  38. 38. Roll-up table
        • Weather Station Id (wsid) is unique
        • High and low temp for each day

        CREATE TABLE daily_aggregate_temperature (
          wsid text,
          year int,
          month int,
          day int,
          high double,
          low double,
          PRIMARY KEY ((wsid), year, month, day)
        );
  39. 39. Setup connection

        def main(args: Array[String]): Unit = {
          // the setMaster("local") lets us run & test the job right in our IDE
          val conf = new SparkConf(true)
            .set("spark.cassandra.connection.host", "127.0.0.1")
            .setMaster("local")

          // "local" here is the master, meaning we don't explicitly have a spark master set up
          val sc = new SparkContext("local", "weather", conf)

          val connector = CassandraConnector(conf)

          val cc = new CassandraSQLContext(sc)
          cc.setKeyspace("isd_weather_data")
  40. 40. Get data and aggregate

          // Case class to store row data
          case class daily_aggregate_temperature (wsid: String, year: Int, month: Int, day: Int, high: Double, low: Double)

          // Create SparkSQL statement
          val aggregationSql = "SELECT wsid, year, month, day, max(temperature) high, min(temperature) low " +
            "FROM raw_weather_data " +
            "WHERE month = 6 " +
            "GROUP BY wsid, year, month, day;"

          val srdd: SchemaRDD = cc.sql(aggregationSql);

          val resultSet = srdd.map(row => (
            new daily_aggregate_temperature(
              row.getString(0), row.getInt(1), row.getInt(2), row.getInt(3),
              row.getDouble(4), row.getDouble(5))))
            .collect()
  41. 41. Store back into Cassandra

          connector.withSessionDo(session => {
            // Create a single prepared statement
            val prepared = session.prepare(insertStatement)
            val bound = prepared.bind

            // Iterate over result set and bind variables
            for (row <- resultSet) {
              bound.setString("wsid", row.wsid)
              bound.setInt("year", row.year)
              bound.setInt("month", row.month)
              bound.setInt("day", row.day)
              bound.setDouble("high", row.high)
              bound.setDouble("low", row.low)

              // Insert new row in database
              session.execute(bound)
            }
          })
  42. 42. Result

        SELECT wsid, year, month, day, high, low
        FROM daily_aggregate_temperature
        WHERE wsid = '725300:94846' AND year=2012 AND month=9 ;

         wsid         | year | month | day | high | low
        --------------+------+-------+-----+------+------
         725300:94846 | 2012 |     9 |  30 | 18.9 | 10.6
         725300:94846 | 2012 |     9 |  29 | 25.6 |  9.4
         725300:94846 | 2012 |     9 |  28 | 19.4 | 11.7
         725300:94846 | 2012 |     9 |  27 | 17.8 |  7.8
         725300:94846 | 2012 |     9 |  26 | 22.2 | 13.3
         725300:94846 | 2012 |     9 |  25 |   25 | 11.1
         725300:94846 | 2012 |     9 |  24 | 21.1 |  4.4
         725300:94846 | 2012 |     9 |  23 | 15.6 |    5
         725300:94846 | 2012 |     9 |  22 |   15 |  7.2
         725300:94846 | 2012 |     9 |  21 | 18.3 |  9.4
         725300:94846 | 2012 |     9 |  20 | 21.7 | 11.7
         725300:94846 | 2012 |     9 |  19 | 22.8 |  5.6
         725300:94846 | 2012 |     9 |  18 | 17.2 |  9.4
         725300:94846 | 2012 |     9 |  17 |   25 | 12.8
         725300:94846 | 2012 |     9 |  16 |   25 | 10.6
         725300:94846 | 2012 |     9 |  15 | 26.1 | 11.1
         725300:94846 | 2012 |     9 |  14 | 23.9 | 11.1
         725300:94846 | 2012 |     9 |  13 | 26.7 | 13.3
         725300:94846 | 2012 |     9 |  12 | 29.4 | 17.2
         725300:94846 | 2012 |     9 |  11 | 28.3 | 11.7
         725300:94846 | 2012 |     9 |  10 | 23.9 | 12.2
         725300:94846 | 2012 |     9 |   9 | 21.7 | 12.8
         725300:94846 | 2012 |     9 |   8 | 22.2 | 12.8
         725300:94846 | 2012 |     9 |   7 | 25.6 | 18.9
         725300:94846 | 2012 |     9 |   6 |   30 | 20.6
         725300:94846 | 2012 |     9 |   5 |   30 | 17.8
         725300:94846 | 2012 |     9 |   4 | 32.2 | 21.7
         725300:94846 | 2012 |     9 |   3 | 30.6 | 21.7
         725300:94846 | 2012 |     9 |   2 | 27.2 | 21.7
         725300:94846 | 2012 |     9 |   1 | 27.2 | 21.7
  43. 43. What just happened?
        • Data is read from raw_weather_data table
        • Transformed
        • Inserted into the daily_aggregate_temperature table
        [Diagram: Table raw_weather_data → read data from table → transform → insert data into table → Table daily_aggregate_temperature]
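The read, transform, insert flow can be mimicked without Spark or Cassandra to see what the roll-up computes. The sketch below uses plain Scala collections in place of the SparkSQL GROUP BY; it is not the deck's job. `RollupSketch`, `Raw`, `Daily` and `rollUp` are hypothetical names, and any hourly readings fed to it are made-up sample data.

```scala
object RollupSketch {
  // One hourly reading, standing in for a row of raw_weather_data.
  final case class Raw(wsid: String, year: Int, month: Int, day: Int, hour: Int, temperature: Double)
  // One row of daily_aggregate_temperature.
  final case class Daily(wsid: String, year: Int, month: Int, day: Int, high: Double, low: Double)

  // Group by (wsid, year, month, day), then take max and min per group.
  def rollUp(rows: Seq[Raw]): Seq[Daily] =
    rows
      .groupBy(r => (r.wsid, r.year, r.month, r.day))
      .map { case ((wsid, y, m, d), group) =>
        val temps = group.map(_.temperature)
        Daily(wsid, y, m, d, temps.max, temps.min)
      }
      .toSeq
      .sortBy(d => (d.wsid, d.year, d.month, d.day))
}
```

Given several hourly readings for one station and day, `rollUp` returns a single `Daily` row per day holding the highest and lowest temperature, which is the shape of the rows in the result table.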
  44. 44. Weather Station Stream Analysis • Weather station collects data • Data processed in stream • Data stored in Cassandra Windsor California Today Rainfall total: 1.2cm High: 73.4F Low : 51.4F
  45. 45. Spark Versus Spark Streaming: zillions of bytes versus gigabytes per second
  46. 46. Spark Streaming [Diagram: input sources, including Kinesis and S3]
  47. 47. DStream - Micro Batches
        • Continuous sequence of micro batches
        • More complex processing models are possible with less effort
        • Streaming computations as a series of deterministic batch computations on small time intervals
        [Diagram: a DStream is a sequence of μBatches, each an ordinary RDD. Processing of DStream = processing of μBatches, RDDs]
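The micro-batch idea can be sketched with plain Scala collections: cut the stream into fixed time intervals and run an ordinary batch computation on each piece. This is an illustration under assumptions, not Spark Streaming itself. In particular it buckets readings by a timestamp carried in the data, whereas a real streaming engine typically batches by arrival time. `MicroBatchSketch`, `Reading`, `microBatches` and `highPerBatch` are hypothetical names.

```scala
object MicroBatchSketch {
  final case class Reading(timestampMs: Long, temperature: Double)

  // Assign each reading to the start of its interval, then order the batches by time.
  def microBatches(events: Seq[Reading], intervalMs: Long): Seq[(Long, Seq[Reading])] =
    events
      .groupBy(e => e.timestampMs / intervalMs * intervalMs)
      .toSeq
      .sortBy(_._1)

  // An ordinary batch computation (max temperature) applied to every micro batch.
  def highPerBatch(events: Seq[Reading], intervalMs: Long): Seq[(Long, Double)] =
    microBatches(events, intervalMs).map { case (start, batch) =>
      start -> batch.map(_.temperature).max
    }
}
```

With a 5000 ms interval, readings at 1000 ms and 4000 ms fall into the batch starting at 0, and a reading at 6000 ms falls into the batch starting at 5000.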
  48. 48. Spark Streaming Reduce Example (even machine learning!)

        val sc = new SparkContext(..)
        val ssc = new StreamingContext(sc, Seconds(5))

        val stream = TwitterUtils.createStream(ssc, auth, filters, StorageLevel.MEMORY_ONLY_SER_2)

        val transform = (cruft: String) =>
          Pattern.findAllIn(cruft).flatMap(_.stripPrefix("#"))

        /** Note that Cassandra is doing the sorting for you here. */
        stream.flatMap(_.getText.toLowerCase.split("""\s+"""))
          .map(transform)
          .countByValueAndWindow(Seconds(5), Seconds(5))
          .transform((rdd, time) => rdd.map { case (term, count) => (term, count, now(time)) })
          .saveToCassandra(keyspace, suspicious, SomeColumns("suspicious", "count", "timestamp"))
  49. 49. Temperature High/Low Stream [Diagram: weather stations, a receive API, an Apache Kafka producer and consumer, and three TemperatureActor instances]
  50. 50. You can do this at home! https://github.com/killrweather/killrweather
  51. 51. Databricks & Datastax Apache Spark is packaged as part of Datastax Enterprise Analytics 4.5 Databricks & Datastax Have Partnered for Apache Spark Engineering and Support http://www.datastax.com/
  52. 52. Resources
        • Spark Cassandra Connector: https://github.com/datastax/spark-cassandra-connector
        • Apache Cassandra: http://cassandra.apache.org
        • Apache Spark: http://spark.apache.org
        • Apache Kafka: http://kafka.apache.org
        • Akka: http://akka.io
  53. 53. FREE tickets to our Annual Cassandra Summit Europe taking place in London in early December (3rd and 4th). The 4th is a full conference day with free admission to all attendees and will feature presentations by companies like ING, Credit Suisse, Target, UBS, The Noble Group as well as other top Cassandra experts in the world. There will be content for those entirely new to Cassandra all the way to the most seasoned Cassandra veteran, spanning development, architecture, and operations as well as how to integrate Cassandra with analytics and search technologies like Apache Spark and Apache Solr. December 3rd is a paid training day. If you are interested in getting a discount on paid training, please speak with Diego - dferreira@datastax.com
  54. 54. Munich Cassandra Users. Join your local Cassandra meetup group: http://www.meetup.com/Munchen-Cassandra-Users/ © 2014 DataStax, All Rights Reserved.
  55. 55. Thank you Follow me on twitter for more updates @PatrickMcFadin
