Kiji cassandra la june 2014 - v02 clint-kelly

Big Data Camp LA 2014: “Don’t Reinvent the Big-Data Wheel! Building real-time, Big Data applications on Cassandra with the open-source Kiji project” by Clint Kelly of WibiData.

  1. Don’t Reinvent the Big-Data Wheel! Building real-time, Big Data applications on Cassandra with the open-source Kiji project. Clint Kelly (@clintwkelly), WibiData. Big Data Camp LA, 14 June 2014.
  2-6. Agenda: The problem; How Kiji works; Kiji in production; Kiji on Cassandra
  7-54. The problem. [Illustrated build: open source software; Data in (REST); Inspect; Train (“Trained model”); Model; Score (Batch); Data out (REST); Experiments / Deployment]
  55-60. 3: Data in / out (REST); Inspect and train; Score (real-time)
  61-62. ! ? / !! Kiji
  63. How Kiji works
  64-66. Kiji History
  67-103. How does it work? [Architecture diagram built up over these slides. Engineering channels write into Kiji: logs and DBs are loaded through KijiMR, and streams are written and read through KijiREST. KijiSchema (Cassandra) holds a row per user (User 1, User 2, User 3). Data science queries the data with KijiHive and builds models with KijiExpress and KijiMR. The resulting Scorer is stored in the Kiji Model Repository and applied by KijiScoring.]
  104-107. 3:
        Data in / out: KijiREST, KijiMR
        Inspect and train: KijiHive, KijiMR, KijiExpress
        Score (real-time): KijiModelRepository, KijiScoring
  108. Modular
  109. Kiji in production
  110. In production now: Fortune 500 retailer (personalized recommendations); Opower (energy usage and analytics reporting)
  111. Fortune 500 retailer: serving personalized recommendations
  112. Bulk load: engineering channels (logs, DBs) write into Kiji through KijiMR
  113. Train: data science works on per-user data (User 1, User 2, User 3) in KijiSchema (Cassandra) with KijiExpress and KijiMR
  114. Score: streams write and read through KijiREST, while KijiScoring applies the Scorer from the Kiji Model Repository to KijiSchema (Cassandra) data
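Slides 112-114 split the retailer pipeline into bulk load, training, and real-time scoring. The sketch below illustrates the score-on-read idea. It assumes a scorer is looked up by name and version in a model repository and applied to a user's row when that row is read. `MODEL_REPOSITORY`, `read_with_score`, and the toy scorer are hypothetical. They are not the KijiScoring or Kiji Model Repository APIs.

```python
from typing import Callable, Dict, Tuple

Row = Dict[str, object]
Scorer = Callable[[Row], object]

# Hypothetical model repository keyed by (model name, version).
MODEL_REPOSITORY: Dict[Tuple[str, str], Scorer] = {
    # Toy scorer: "recommend" the first two clicked songs in sorted order.
    ("recommender", "1.0"): lambda row: sorted(row.get("inter:clicks", []))[:2],
}


def read_with_score(row: Row, model: Tuple[str, str], output_column: str) -> Row:
    """Return a copy of the row with a score computed at read time."""
    scorer = MODEL_REPOSITORY[model]
    scored = dict(row)  # leave the stored row untouched
    scored[output_column] = scorer(row)
    return scored


bob = {"inter:clicks": ["song-9", "song-42", "song-17"]}
scored = read_with_score(bob, ("recommender", "1.0"), "rec:scorer1")
print(scored["rec:scorer1"])  # ['song-17', 'song-42']
```

The deck contrasts this approach with batch scoring. A batch job would precompute the same output column for every row ahead of time rather than on each read.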
  115. Kiji on Cassandra
  116-120. KijiSchema, backed by Cassandra or by HBase
  121. Kiji ~ BigTable
  122-124. A table is a set of rows
  125. Row key = entity ID (entity ID ➔ data)
  126. Composite entity IDs, e.g. 0xfa “bob”
  127. Column families, e.g. payment, interactions, recommendations for row 0xfa “bob”
  128. Columns (family:qualifier): inter:clicks, inter:search, payment:cardnum, payment:address, rec:scorer1, rec:scorer2
  129. Timestamped versions: a cell can hold several versions, each tagged with a timestamp (e.g. 1396560123, 1395650231); the slide shows stacked versions of songs:let it be and rec:scorer3
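Slides 121-129 describe Kiji's BigTable-style model: rows keyed by entity ID, column families, family:qualifier columns, and timestamped versions per cell. The minimal in-memory sketch below shows that nesting. `KijiRowModel` and its methods are hypothetical names, not the KijiSchema API, and the cell values are made up.

```python
from collections import defaultdict


class KijiRowModel:
    """Hypothetical in-memory model of a Kiji table:
    entity ID -> family -> qualifier -> {timestamp: value}."""

    def __init__(self):
        self._rows = defaultdict(lambda: defaultdict(lambda: defaultdict(dict)))

    def put(self, entity_id, family, qualifier, timestamp, value):
        # Each write adds (or overwrites) one timestamped version of the cell.
        self._rows[entity_id][family][qualifier][timestamp] = value

    def versions(self, entity_id, family, qualifier):
        # All (timestamp, value) pairs for one column, newest first.
        cell = self._rows.get(entity_id, {}).get(family, {}).get(qualifier, {})
        return sorted(cell.items(), key=lambda kv: kv[0], reverse=True)

    def latest(self, entity_id, family, qualifier):
        # Most recent version, or None if the cell was never written.
        versions = self.versions(entity_id, family, qualifier)
        return versions[0][1] if versions else None


table = KijiRowModel()
bob = (0xFA, "bob")  # composite entity ID; 0xFA assumed to be a hash prefix
table.put(bob, "inter", "clicks", 1395650231, "song-17")
table.put(bob, "inter", "clicks", 1396560123, "song-42")
print(table.latest(bob, "inter", "clicks"))  # song-42
```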
  130. Complex data types: a cell value can be a structured record, e.g.
        record Search {
          string search_term;
          long session_id;
          device_type device;
        }
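The `record Search` on slide 130 uses Avro IDL syntax, and slide 146 stores each cell value as a blob. The stdlib-only sketch below encodes a Search record into such a blob. It assumes Avro's binary encoding: zig-zag varint longs, length-prefixed UTF-8 strings, and enums written as their symbol index. The `device_type` symbols are hypothetical because the slide does not define them. KijiSchema's real cell format may also add schema metadata ahead of this payload, so the sketch covers only the record body.

```python
DEVICE_TYPE_SYMBOLS = ["DESKTOP", "MOBILE", "TABLET"]  # hypothetical enum symbols


def zigzag_varint(n: int) -> bytes:
    """Avro long encoding: zig-zag, then base-128 varint (low 7 bits first)."""
    z = (n << 1) ^ (n >> 63)
    out = bytearray()
    while z > 0x7F:
        out.append((z & 0x7F) | 0x80)
        z >>= 7
    out.append(z)
    return bytes(out)


def read_zigzag_varint(buf: bytes, pos: int):
    """Decode one zig-zag varint starting at pos; return (value, next_pos)."""
    shift = 0
    z = 0
    while True:
        b = buf[pos]
        pos += 1
        z |= (b & 0x7F) << shift
        shift += 7
        if not b & 0x80:
            break
    return (z >> 1) ^ -(z & 1), pos


def encode_search(search_term: str, session_id: int, device: str) -> bytes:
    """Encode Search fields in declaration order: string, long, enum."""
    term = search_term.encode("utf-8")
    return (zigzag_varint(len(term)) + term
            + zigzag_varint(session_id)
            + zigzag_varint(DEVICE_TYPE_SYMBOLS.index(device)))


def decode_search(buf: bytes):
    length, pos = read_zigzag_varint(buf, 0)
    term = buf[pos:pos + length].decode("utf-8")
    session_id, pos = read_zigzag_varint(buf, pos + length)
    device_index, pos = read_zigzag_varint(buf, pos)
    return term, session_id, DEVICE_TYPE_SYMBOLS[device_index]


blob = encode_search("let it be", 1396560123, "MOBILE")
print(decode_search(blob))  # ('let it be', 1396560123, 'MOBILE')
```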
  131-141. Locality group: a set of column families with shared storage settings, e.g. locality_group_batch (batch families: on disk, compressed) and locality_group_real_time (real-time families: in memory)
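Slides 131-141 group column families into locality groups with different storage settings. The sketch below models that grouping with hypothetical `LocalityGroup` objects and reports which groups a read touches. This matters once each group becomes its own Cassandra table. The class, its field names, and the assignment of families to groups are all assumptions loosely following the slides.

```python
from dataclasses import dataclass, field


@dataclass
class LocalityGroup:
    name: str
    in_memory: bool
    compressed: bool
    families: set = field(default_factory=set)


# Hypothetical layout: batch families on disk and compressed,
# real-time families kept in memory.
LAYOUT = [
    LocalityGroup("locality_group_batch", in_memory=False, compressed=True,
                  families={"payment", "songs"}),
    LocalityGroup("locality_group_real_time", in_memory=True, compressed=False,
                  families={"inter", "rec"}),
]


def groups_for(columns):
    """Return names of locality groups touched by 'family:qualifier' columns."""
    touched = []
    for group in LAYOUT:
        if any(col.split(":", 1)[0] in group.families for col in columns):
            touched.append(group.name)
    return touched


print(groups_for(["inter:clicks", "rec:scorer1"]))      # one group
print(groups_for(["inter:clicks", "payment:address"]))  # two groups
```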
  142. Row ➔ transactional consistency
  143-146. Mapping Kiji onto Cassandra:
        Locality group ➔ Column family (a CQL table, here loc_grp)
        Entity ID ➔ Primary key (here the composite entity ID city, user)
        Family, Qualifier, Version ➔ Clustering Columns
        Column values ➔ Blobs

        CREATE TABLE loc_grp (
          city text, user text,
          family text, qualifier text, version bigint,
          value blob,
          PRIMARY KEY (city, user, family, qualifier, version)
        ) WITH CLUSTERING ORDER BY (user ASC, family ASC, qualifier ASC, version DESC);
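Slides 143-146 map each Kiji cell onto one CQL row of `loc_grp`. The entity ID components (city, user) lead the primary key, family/qualifier/version are clustering columns, and the value is a blob. The minimal sketch below shows that flattening and emulates `version DESC` by sorting. `cell_to_row` and the sample data are hypothetical, and the INSERT text is illustrative CQL rather than Kiji's own implementation.

```python
INSERT_CQL = (
    "INSERT INTO loc_grp (city, user, family, qualifier, version, value) "
    "VALUES (?, ?, ?, ?, ?, ?)"
)


def cell_to_row(entity_id, column, version, value):
    """Flatten one Kiji cell into the bind values for INSERT_CQL."""
    city, user = entity_id                    # composite entity ID (city, user)
    family, qualifier = column.split(":", 1)  # "family:qualifier"
    return (city, user, family, qualifier, version, value)


def clustering_sort_key(row):
    """Emulate user ASC, family ASC, qualifier ASC, version DESC.
    Cassandra orders partitions by token, not by city, so only the order
    within a single city partition is meaningful here."""
    city, user, family, qualifier, version, _value = row
    return (city, user, family, qualifier, -version)


rows = [
    cell_to_row(("LA", "bob"), "inter:clicks", 1395650231, b"old"),
    cell_to_row(("LA", "bob"), "inter:search", 1396560123, b"let it be"),
    cell_to_row(("LA", "bob"), "inter:clicks", 1396560123, b"new"),
]
rows.sort(key=clustering_sort_key)
for row in rows:
    print(row)  # newest version of each column comes first
```

With `version DESC`, a read of a partition returns each column's newest version first. This is presumably why the slide chooses that order.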
  147-151. Implementation notes: DataStax Java driver; Cassandra 2.0.6; Async API; New MapReduce InputFormat
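The async API matters once a Kiji row spans several locality groups. Each group is a separate Cassandra table and needs its own query. The minimal sketch below issues those per-table reads concurrently with Python's asyncio and merges the results, using an in-memory stand-in for Cassandra. `fetch_from_table` and `FAKE_TABLES` are hypothetical. In the DataStax Java driver, the corresponding call is `Session.executeAsync`, which returns a `ResultSetFuture`.

```python
import asyncio

# Hypothetical stand-in for two Cassandra tables, one per locality group.
FAKE_TABLES = {
    "locality_group_batch": {("LA", "bob"): {"payment:address": b"1 Main St"}},
    "locality_group_real_time": {("LA", "bob"): {"inter:clicks": b"song-42"}},
}


async def fetch_from_table(table, entity_id):
    """Hypothetical async query against one locality group's table."""
    await asyncio.sleep(0.01)  # stands in for network latency
    return FAKE_TABLES[table].get(entity_id, {})


async def read_row(entity_id, tables):
    # Issue one read per locality group concurrently, then merge the results.
    results = await asyncio.gather(
        *(fetch_from_table(t, entity_id) for t in tables))
    merged = {}
    for partial in results:
        merged.update(partial)
    return merged


row = asyncio.run(read_row(("LA", "bob"), list(FAKE_TABLES)))
print(row)
```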
  152. Issues
  153-162. Operations across locality groups
        Kiji locality group ➔ C* column family
        Read across locality groups ➔ multiple C* reads (async API!)
        Compare-and-set across locality groups ➔ not allowed in C* Kiji
        Lose row-level transactional consistency across locality groups
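Cassandra's lightweight transactions (conditional statements with an `IF` clause, added in Cassandra 2.0) are scoped to a single partition of a single table. Each Kiji locality group is its own table here. A compare-and-set that checks or writes columns in two groups therefore cannot be one atomic operation, which is why it is disallowed. The minimal sketch below enforces that rule. The family-to-group map, the error class, and `cas_target_table` are hypothetical, and the CQL string is illustrative for the single-cell case under the loc_grp schema.

```python
class CrossLocalityGroupError(ValueError):
    """Raised when a compare-and-set would span more than one locality group."""


# Hypothetical assignment of column families to locality groups (one C* table each).
FAMILY_TO_GROUP = {
    "payment": "locality_group_batch",
    "songs": "locality_group_batch",
    "inter": "locality_group_real_time",
    "rec": "locality_group_real_time",
}

# Illustrative single-cell compare-and-set against the loc_grp schema.
CAS_CQL = (
    "UPDATE {table} SET value = ? "
    "WHERE city = ? AND user = ? AND family = ? AND qualifier = ? AND version = ? "
    "IF value = ?"
)


def cas_target_table(check_column, put_columns):
    """Return the single table a compare-and-set may touch, or raise."""
    groups = {FAMILY_TO_GROUP[col.split(":", 1)[0]]
              for col in [check_column, *put_columns]}
    if len(groups) != 1:
        # A lightweight transaction cannot span two tables, so refuse outright.
        raise CrossLocalityGroupError(
            f"compare-and-set spans locality groups: {sorted(groups)}")
    return groups.pop()


target = cas_target_table("rec:scorer1", ["rec:scorer1"])
print(CAS_CQL.format(table=target))

try:
    cas_target_table("rec:scorer1", ["payment:cardnum"])
except CrossLocalityGroupError as err:
    print("rejected:", err)
```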
  163-164. Filters: HBase ➔ rich server-side filters; Cassandra ➔ WHERE clauses, with client-side filtering for anything they cannot express
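In Cassandra 2.0, a CQL WHERE clause mainly restricts primary-key columns (partition-key equality, clustering-column prefixes and ranges). Richer HBase-style filters therefore have to run in the client. The minimal sketch below shows that split, assuming the loc_grp schema from slide 146. The query pins the partition, user, and family server-side. A hypothetical qualifier regex and value predicate are then applied client-side. `client_side_filter` and the sample rows are illustrative only.

```python
import re

# Server side: only primary-key restrictions (partition key + clustering prefix).
SERVER_SIDE_CQL = ("SELECT family, qualifier, version, value FROM loc_grp "
                   "WHERE city = ? AND user = ? AND family = ?")


def client_side_filter(rows, qualifier_regex, value_predicate):
    """Apply filters CQL cannot express to rows fetched by SERVER_SIDE_CQL."""
    pattern = re.compile(qualifier_regex)
    return [r for r in rows
            if pattern.fullmatch(r[1]) and value_predicate(r[3])]


# Rows as (family, qualifier, version, value), shaped like the SELECT's result.
fetched = [
    ("rec", "scorer1", 1396560123, b"song-42"),
    ("rec", "scorer2", 1396560123, b""),
    ("rec", "popular", 1395650231, b"song-7"),
]
kept = client_side_filter(fetched, r"scorer\d+", lambda v: len(v) > 0)
print(kept)  # [('rec', 'scorer1', 1396560123, b'song-42')]
```

The trade-off is that every row in the pinned range crosses the network before being discarded. Restrictions that can be expressed in the WHERE clause should stay there.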
  165. Project status
  166. Components working with Cassandra: KijiSchema, KijiMR, KijiREST, KijiExpress
  167. KijiSchema available for download / tutorial: https://github.com/kijiproject/kiji-schema/blob/cassandra/cassandra_tutorial.md (tinyurl.com/mmubg5o)
  168. All code, with a tutorial, available within 1-2 months
  169. Summary
  170-173. 3:
        Data in / out: KijiREST, KijiMR
        Inspect and train: KijiHive, KijiMR, KijiExpress
        Score (real-time): KijiModelRepository, KijiScoring
  174. Thanks to the Cassandra community: mailing lists; meetups, webinars, conferences
  175. Try it now! www.kiji.org, tinyurl.com/mmubg5o, @clintwkelly