Escape from Hadoop

Apache Cassandra and Spark, when combined, can give powerful OLTP and OLAP functionality for your data. We'll walk through the basics of both of these platforms before diving into applications combining the two. Usually joins, changing a partition key, or importing data can be difficult in Cassandra, but we'll see how to do these and other operations in a set of simple Spark shell one-liners!

Transcript

  1. Escape From Hadoop: Spark One-Liners for C* Ops. Kurt Russell Spitzer, DataStax
  2. Who am I? • Bioinformatics Ph.D. from UCSF • Works on the integration of Cassandra (C*) with Hadoop, Solr, and SPARK!! • Spends a lot of time spinning up clusters on EC2, GCE, Azure, … http://www.datastax.com/dev/blog/testing-cassandra-1000-nodes-at-a-time • Developing new ways to make sure that C* scales
  3. Why escape from Hadoop? HADOOP: many moving pieces, MapReduce, single points of failure, lots of overhead. And there is a way out!
  4. Spark provides a simple and efficient framework for distributed computations. Node roles: 2. In-memory caching: yes! Generic DAG execution: yes! Great abstraction for datasets? RDD! (Diagram: a Spark Master coordinating Spark Workers, each running a Spark Executor holding a Resilient Distributed Dataset.)
  5. Spark is Compatible with HDFS, Parquet, CSVs, ….
  6. Spark is Compatible with HDFS, Parquet, CSVs, …. AND APACHE CASSANDRA
  7. Apache Cassandra is a linearly scaling and fault-tolerant NoSQL database. Linearly scaling: the power of the database increases linearly with the number of machines; 2x machines = 2x throughput. http://techblog.netflix.com/2011/11/benchmarking-cassandra-scalability-on.html Fault tolerant: nodes down != database down; datacenter down != database down.
  8. Apache Cassandra architecture is very simple. Node roles: 1. Replication: tunable. Consistency: tunable. (Diagram: a client talking to a ring of C* nodes.)
  9. DataStax OSS Connector: Spark to Cassandra. https://github.com/datastax/spark-cassandra-connector Maps Cassandra (Keyspace, Table) to Spark (RDD[CassandraRow], RDD[Tuples]). Bundled and supported with DSE 4.5!
  10. The Spark Cassandra Connector uses the DataStax Java Driver to read from and write to C*. Each Spark Executor maintains a connection to the C* cluster through the driver, and RDDs are read into different splits based on sets of tokens (e.g. tokens 1-1000, 1001-2000, … covering the full token range).
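The executors find the cluster through Spark configuration. A minimal sketch, assuming the connector's documented `spark.cassandra.connection.host` property; the host value and app name here are illustrative, not from the talk:

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Point the connector at the C* cluster; 127.0.0.1 is a placeholder host.
val conf = new SparkConf()
  .setAppName("connector-sketch")
  .set("spark.cassandra.connection.host", "127.0.0.1")
val sc = new SparkContext(conf)
```

In the DSE spark shell this is done for you, which is why the one-liners below can use `sc` directly.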
  11. Co-locate Spark and C* for best performance. Running Spark Workers on the same nodes as your C* cluster will save network hops when reading and writing.
  12. Setting up C* and Spark. DSE > 4.5.0: just start your nodes with dse cassandra -k. Apache Cassandra: follow the excellent guide by Al Tobey http://tobert.github.io/post/2014-07-15-installing-cassandra-spark-stack.html
  13. We need a distributed system for analytics and batch jobs. But it doesn't have to be complicated!
  14. Even count needs to be distributed. Ask me to write a MapReduce for word count, I dare you. You could make this easier by adding yet another technology to your Hadoop stack (Hive, Pig, Impala), or we could just do one-liners on the Spark shell.
  15.–17. Basics: Getting a Table and Counting

      CREATE KEYSPACE newyork WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1};
      use newyork;
      CREATE TABLE presidentlocations ( time int, location text, PRIMARY KEY (time) );
      INSERT INTO presidentlocations (time, location) VALUES ( 1, 'White House' );
      INSERT INTO presidentlocations (time, location) VALUES ( 2, 'White House' );
      INSERT INTO presidentlocations (time, location) VALUES ( 3, 'White House' );
      INSERT INTO presidentlocations (time, location) VALUES ( 4, 'White House' );
      INSERT INTO presidentlocations (time, location) VALUES ( 5, 'Air Force 1' );
      INSERT INTO presidentlocations (time, location) VALUES ( 6, 'Air Force 1' );
      INSERT INTO presidentlocations (time, location) VALUES ( 7, 'Air Force 1' );
      INSERT INTO presidentlocations (time, location) VALUES ( 8, 'NYC' );
      INSERT INTO presidentlocations (time, location) VALUES ( 9, 'NYC' );
      INSERT INTO presidentlocations (time, location) VALUES ( 10, 'NYC' );

      scala> sc.cassandraTable("newyork","presidentlocations").count
      res3: Long = 10
  18.–21. Basics: take() and toArray

      scala> sc.cassandraTable("newyork","presidentlocations").take(1)
      res2: Array[com.datastax.spark.connector.CassandraRow] = Array(CassandraRow{time: 9, location: NYC})

      scala> sc.cassandraTable("newyork","presidentlocations").toArray
      res3: Array[com.datastax.spark.connector.CassandraRow] = Array(
        CassandraRow{time: 9, location: NYC},
        CassandraRow{time: 3, location: White House},
        …,
        CassandraRow{time: 6, location: Air Force 1})
  22.–24. Basics: Getting Row Values out of a CassandraRow

      scala> sc.cassandraTable("newyork","presidentlocations").take(1)(0).get[Int]("time")
      res5: Int = 9

      Typed getters: get[Int], get[String], … get[Any]. Got null? Use get[Option[Int]].
      http://www.datastax.com/documentation/datastax_enterprise/4.5/datastax_enterprise/spark/sparkSupportedTypes.html
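The Option getter mentioned on the slide is the null-safe variant of the same call. A minimal sketch, assuming the same spark-shell session and table as above:

```scala
scala> sc.cassandraTable("newyork","presidentlocations")
         .take(1)(0)
         .get[Option[Int]]("time")
// yields Some(<time>) when the column is set; a null column yields None
```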
  25.–30. Copy a Table. Say we want to restructure our table or add a new column?

      CREATE TABLE characterlocations (
        time int,
        character text,
        location text,
        PRIMARY KEY (time, character)
      );

      sc.cassandraTable("newyork","presidentlocations")
        .map( row => (
          row.get[Int]("time"),
          "president",
          row.get[String]("location")
        )).saveToCassandra("newyork","characterlocations")

      cqlsh:newyork> SELECT * FROM characterlocations;

       time | character | location
      ------+-----------+-------------
          5 | president | Air Force 1
         10 | president |         NYC
      …
  31.–36. Filter a Table. What if we want to filter based on a non-clustering key column?

      scala> sc.cassandraTable("newyork","presidentlocations")
        .filter( _.get[Int]("time") > 7 )
        .toArray

      res9: Array[com.datastax.spark.connector.CassandraRow] =
      Array(
        CassandraRow{time: 9, location: NYC},
        CassandraRow{time: 10, location: NYC},
        CassandraRow{time: 8, location: NYC}
      )
  37.–40. Backfill a Table with a Different Key! If we actually want to have quick access to timelines, we need a C* table with a different structure.

      CREATE TABLE timelines (
        time int,
        character text,
        location text,
        PRIMARY KEY ((character), time)
      );

      sc.cassandraTable("newyork","characterlocations")
        .saveToCassandra("newyork","timelines")

      cqlsh:newyork> select * from timelines;

       character | time | location
      -----------+------+-------------
       president |    1 | White House
       president |    2 | White House
       president |    3 | White House
       president |    4 | White House
       president |    5 | Air Force 1
       president |    6 | Air Force 1
       president |    7 | Air Force 1
       president |    8 |         NYC
       president |    9 |         NYC
       president |   10 |         NYC
  41.–46. Import a CSV. I have some data in another source which I could really use in my Cassandra table.

      sc.textFile("file:///Users/russellspitzer/ReallyImportantDocuments/PlisskenLocations.csv")
        .map(_.split(","))
        .map( line =>
          (line(0), line(1), line(2)))
        .saveToCassandra("newyork","timelines")

      cqlsh:newyork> select * from timelines where character = 'plissken';

       character | time | location
      -----------+------+-----------------
        plissken |    1 | Federal Reserve
        plissken |    2 | Federal Reserve
        plissken |    3 | Federal Reserve
        plissken |    4 |           Court
        plissken |    5 |           Court
        plissken |    6 |           Court
        plissken |    7 |           Court
        plissken |    8 |  Stealth Glider
        plissken |    9 |             NYC
        plissken |   10 |             NYC
  47.–51. Perform a Join with MySQL. Maybe a little more than one line … MySQL table "quotes" in "escape_from_ny".

      import java.sql._
      import org.apache.spark.rdd.JdbcRDD
      Class.forName("com.mysql.jdbc.Driver").newInstance() // Connector/J added to Spark Shell classpath
      val quotes = new JdbcRDD(
        sc,
        () => {
          DriverManager.getConnection("jdbc:mysql://Localhost/escape_from_ny?user=root")},
        "SELECT * FROM quotes WHERE ? <= ID and ID <= ?",
        0,
        100,
        5,
        (r: ResultSet) => {
          (r.getInt(2), r.getString(3))
        }
      )

      quotes: org.apache.spark.rdd.JdbcRDD[(Int, String)] = JdbcRDD[9] at JdbcRDD at <console>:23

      The Cassandra side needs to be in the form of RDD[K,V] for the join:

      quotes.join(
        sc.cassandraTable("newyork","timelines")
          .filter( _.get[String]("character") == "plissken")
          .map( row => (row.get[Int]("time"), row.get[String]("location"))))
        .take(1)
        .foreach(println)

      (5,
        (Bob Hauk: There was an accident. About an hour ago, a small jet went down inside New York City. The President was on board.
         Snake Plissken: The president of what?,
        Court)
      )
  52.–57. Easy Objects with Case Classes. We have the technology to make this even easier!

      case class timelineRow(character: String, time: Int, location: String)

      sc.cassandraTable[timelineRow]("newyork","timelines")
        .filter( _.character == "plissken")
        .filter( _.time == 8)
        .toArray

      res13: Array[timelineRow] = Array(timelineRow(plissken,8,Stealth Glider))
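The case-class column mapping also works in the other direction for writes. A hedged sketch, assuming the same session; the extra row here is invented purely for illustration:

```scala
case class timelineRow(character: String, time: Int, location: String)

// Field names map onto the timelines columns (character, time, location).
sc.parallelize(Seq(timelineRow("plissken", 11, "Harbor"))) // hypothetical row
  .saveToCassandra("newyork", "timelines")
```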
  58.–63. A Map Reduce for Word Count …

      scala> sc.cassandraTable("newyork","presidentlocations")
        .map( _.get[String]("location") )
        .flatMap( _.split(" ") )
        .map( (_, 1) )
        .reduceByKey( _ + _ )
        .toArray

      res17: Array[(String, Int)] = Array((1,3), (House,4), (NYC,3), (Force,3), (White,4), (Air,3))
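The same pipeline can be sketched locally with plain Scala collections to see what reduceByKey is doing, no cluster required (the sample data below is a small invented subset of the table):

```scala
val locations = Seq("White House", "White House", "Air Force 1")

val counts = locations
  .flatMap(_.split(" "))                         // split into words
  .map((_, 1))                                   // (word, 1) pairs
  .groupBy(_._1)                                 // group pairs by word
  .map { case (w, ps) => (w, ps.map(_._2).sum) } // sum per word, like reduceByKey(_ + _)

// counts("White") == 2
```

On the cluster, reduceByKey does the per-word summing in parallel across partitions instead of in one local Map.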
  64. Stand Alone App Example https://github.com/RussellSpitzer/spark-cassandra-csv Reads a CSV (Car, Model, Color: Dodge, Caravan, Red; Ford, F150, Black; Toyota, Prius, Green) through Spark and the Spark Cassandra Connector as an RDD[CassandraRow], and writes it into the FavoriteCars table using a Cassandra column mapping.
  65. Thanks for listening! There is plenty more we can do with Spark but … Questions?
  66. Getting started with Cassandra? DataStax Academy offers free online Cassandra training! Planet Cassandra has resources for learning the basics, from 'Try Cassandra' tutorials to in-depth language and migration pages! Find a way to contribute back to the community: talk at a meetup, or share your story on PlanetCassandra.org! Need help? Get questions answered with Planet Cassandra's free virtual office hours, running weekly! Email us: Community@DataStax.com In production? Tweet us: @PlanetCassandra Thanks for coming to the meetup!
  67. Thanks for your time, and come to C* Summit! SEPTEMBER 10-11, 2014 | SAN FRANCISCO, CALIF. | THE WESTIN ST. FRANCIS HOTEL. Cassandra Summit Link
