Kiji cassandra la   june 2014 - v02 clint-kelly

Big Data Camp LA 2014: Don't Re-invent the Big-Data Wheel. Building real-time, Big Data applications on Cassandra with the open-source Kiji project, by Clint Kelly of WibiData.

Kiji cassandra la june 2014 - v02 clint-kelly Presentation Transcript

  • 1. Don’t Reinvent the Big-Data Wheel! Building real-time, Big Data applications on Cassandra with the open-source Kiji project. Clint Kelly (@clintwkelly), WibiData. Big Data Camp LA, 14 June 2014
  • 2-6. Agenda: The problem; How Kiji works; Kiji in production; Kiji on Cassandra
  • 7. The problem.
  • 8-16. [Figure slides; the only recoverable label is "Open source software"]
  • 17-19. Data in (REST)
  • 20-24. Inspect
  • 25-31. Train: “Trained model”
  • 32-34. Model
  • 35-39. Score (batch)
  • 40-46. Data out (REST)
  • 51-54. Experiments / Deployment
  • 55. 3
  • 56-57. Data in / out (REST)
  • 58. Inspect and train
  • 59-60. Score (real-time)
  • 61-62. Kiji
  • 63. How Kiji works
  • 64-66. Kiji History
  • 67-103. How does it work? [Architecture diagram built up over these slides. Labels: Engineering, Data Science, Channels, Logs, DBs, Write, Read, Stream, Query, Data, KijiREST, KijiMR, KijiHive, KijiExpress, Scorer, KijiScoring, Kiji Model Repository, KijiSchema (Cassandra), User 1 / User 2 / User 3]
  • 104. 3
  • 105. Data in / out: KijiREST, KijiMR
  • 106. Inspect and train: KijiHive, KijiMR, KijiExpress
  • 107. Score (real-time): KijiModelRepository, KijiScoring
  • 108. Modular
  • 109. Kiji in production
  • 110. In production now. Fortune 500 retailer: personalized recommendations. Opower: energy usage and analytics reporting.
  • 111. Fortune 500 retailer: serving personalized recommendations
  • 112. Bulk load [diagram labels: Engineering, Channels, Logs, DBs, Write, KijiMR, Kiji]
  • 113. Train [diagram labels: Data Science, Data, KijiExpress, KijiMR, KijiSchema (Cassandra), User 1 / User 2 / User 3]
  • 114. Score [diagram labels: Engineering, Data Science, Channels, Write, Read, Stream, KijiREST, Scorer, KijiScoring, Kiji Model Repository, KijiSchema (Cassandra), User 1 / User 2 / User 3]
  • 115. Kiji on Cassandra
  • 116-120. KijiSchema on Cassandra; KijiSchema on HBase
  • 121. Kiji ~ BigTable
  • 122-124. Table: a collection of rows
  • 125. Row key = entity ID; each entity ID maps to that row's data
  • 126. Composite entity IDs, e.g. (0xfa, “bob”)
  • 127. Column families: payment, interactions, recommendations
  • 128. Columns: inter:clicks, inter:search, payment:cardnum, payment:address, rec:scorer1, rec:scorer2
  • 129. Timestamped versions: a column can hold several versions of a value, each tagged with a timestamp (e.g. 1396560123, 1395650231)
  • 130. Complex data types: record Search { string search_term; long session_id; device_type device; }
  • 131-141. Locality group: a set of column families stored together. Example groups: locality_group_batch (batch columns) and locality_group_real_time (real-time columns). Storage options shown: on disk, compressed; in memory.
  • 142. Row ➔ transactional consistency
  • 143. Locality group ➔ Column family: CREATE TABLE loc_grp
  • 144. Entity ID ➔ Primary key: CREATE TABLE loc_grp (city text, user text, PRIMARY KEY (city, user) ) WITH CLUSTERING ORDER BY (user ASC);
  • 145. Family, Qualifier, Version ➔ Clustering columns: CREATE TABLE loc_grp (city text, user text, family text, qualifier text, version bigint, PRIMARY KEY (city, user, family, qualifier, version) ) WITH CLUSTERING ORDER BY (user ASC, family ASC, qualifier ASC, version DESC);
  • 146. Column values ➔ Blobs: CREATE TABLE loc_grp (city text, user text, family text, qualifier text, version bigint, value blob, PRIMARY KEY (city, user, family, qualifier, version) ) WITH CLUSTERING ORDER BY (user ASC, family ASC, qualifier ASC, version DESC);
  • 147-151. Implementation notes: DataStax Java driver; Cassandra 2.0.6; Async API; New MapReduce InputFormat
  • 152. Issues
  • 153-162. Operations across locality groups. Kiji locality group ➔ C* column family. Read across locality groups ➔ multiple C* reads (async API!). Compare-and-set across locality groups ➔ not allowed in C* Kiji. Lose transactional consistency.
  • 163-164. Filters. HBase ➔ rich server-side filters. Cassandra ➔ WHERE clauses. Client-side filtering.
  • 165. Project status
  • 166. Components working with Cassandra: KijiSchema, KijiMR, KijiREST, KijiExpress
  • 167. KijiSchema available for download / tutorial: https://github.com/kijiproject/kiji-schema/blob/cassandra/cassandra_tutorial.md (tinyurl.com/mmubg5o)
  • 168. All code available with tutorial within 1-2 months
  • 169. Summary
  • 170. 3
  • 171. Data in / out: KijiREST, KijiMR
  • 172. Inspect and train: KijiHive, KijiMR, KijiExpress
  • 173. Score (real-time): KijiModelRepository, KijiScoring
  • 174. Thanks to the Cassandra community: mailing lists, meetups, webinars, conferences
  • 175. Try it now! www.kiji.org tinyurl.com/mmubg5o @clintwkelly
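
The table layout on slides 143-146 can be illustrated with a small in-memory sketch. This is not Kiji or Cassandra code: `Cell`, `LocalityGroupTable`, `put`, `get`, and the sample values are hypothetical names chosen for illustration, and a Python dictionary stands in for the Cassandra partitions. It only mimics how the clustering order (user ASC, family ASC, qualifier ASC, version DESC) places the newest version of a column first, so that reading the latest value means taking the first matching row.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple


class Cell(NamedTuple):
    """Hypothetical Python mirror of one row of the loc_grp table (slide 146)."""
    city: str       # entity ID component, the CQL partition key
    user: str       # entity ID component, first clustering column
    family: str
    qualifier: str
    version: int    # timestamp of this version
    value: bytes    # serialized column value, stored as a blob


def clustering_key(cell: Cell):
    # Mirrors CLUSTERING ORDER BY (user ASC, family ASC, qualifier ASC, version DESC):
    # negating the version sorts newer timestamps first.
    return (cell.user, cell.family, cell.qualifier, -cell.version)


class LocalityGroupTable:
    """In-memory stand-in for one locality group stored as one C* table."""

    def __init__(self) -> None:
        self._partitions: Dict[str, List[Cell]] = defaultdict(list)

    def put(self, cell: Cell) -> None:
        # Writing an existing primary key replaces the old row, as a CQL INSERT would.
        rows = [c for c in self._partitions[cell.city]
                if clustering_key(c) != clustering_key(cell)]
        rows.append(cell)
        rows.sort(key=clustering_key)
        self._partitions[cell.city] = rows

    def get(self, city: str, user: str, family: str, qualifier: str,
            max_versions: int = 1) -> List[Cell]:
        # Rows are already in clustering order, so matches come back newest first.
        rows = self._partitions.get(city, [])
        matches = [c for c in rows
                   if (c.user, c.family, c.qualifier) == (user, family, qualifier)]
        return matches[:max_versions]


if __name__ == "__main__":
    table = LocalityGroupTable()
    table.put(Cell("LA", "bob", "rec", "scorer3", 1395650231, b"old"))
    table.put(Cell("LA", "bob", "rec", "scorer3", 1396560123, b"new"))
    latest = table.get("LA", "bob", "rec", "scorer3")[0]
    print(latest.version, latest.value)
```

Keeping the version as the last clustering column in descending order is what makes "give me the most recent value" cheap in this layout; asking for `max_versions=2` returns the two newest versions in the same order.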