Building a Flexible, Real-time
Big Data Applications Platform
on Cassandra with Kiji
Clint Kelly
Member of Technical Staff...
South Bay Cassandra Meetup 4/23: Building a flexible, real-time Big Data Applications platform on Cassandra
The Kiji Project is a modular, open-source framework that enables developers to efficiently build real-time Big Data applications. Kiji is built upon popular open-source technologies such as Cassandra, HBase, Hadoop, and Scalding, and contains components that implement functionality critical for Big Data applications, including the following:
• Support for evolvable schemas of complex data types

• Batch training of machine learning models with Hadoop

• Real-time scoring with trained models

• Integration with Hive and R

• A REST endpoint

Recently, we have updated Kiji to use Cassandra as a backing data store (previously, Kiji worked only with HBase). In this talk, we describe the process of integrating Cassandra and Kiji. Topics we cover include the following:

• The Kiji architecture and data model

• Implementing the Kiji data model in Cassandra using the Java driver and CQL3

• Integrating Cassandra with Hadoop 2.x

• Building a flexible middleware platform that supports Cassandra and HBase (including projects that use both simultaneously)

• Exposing unique features of Cassandra (e.g., variable consistency) to Kiji users

  1. 1. Building a Flexible, Real-time Big Data Applications Platform on Cassandra with Kiji Clint Kelly Member of Technical Staff WibiData Cassandra Meetup 23 April 2014
  2. 2. Agenda
  3. 3. Agenda The problem
  4. 4. Agenda The problem How Kiji works
  5. 5. Agenda The problem How Kiji works Kiji on Cassandra
  6. 6. !
  7. 7. !
  8. 8. ! Open source software
  9. 9. !
  10. 10. !
  11. 11. !
  12. 12. !
  13. 13. !
  14. 14. ! ?
  15. 15. Data in
  16. 16. Data in
  17. 17. Data in REST
  18. 18. Inspect
  19. 19. Inspect
  20. 20. Inspect
  21. 21. Inspect
  22. 22. Inspect
  23. 23. Train
  24. 24. Train
  25. 25. Train “Trained model”
  26. 26. Train “Trained model”
  27. 27. Train “Trained model”
  28. 28. Train “Trained model”
  29. 29. Train “Trained model”
  30. 30. Model
  31. 31. Model AaBb
  32. 32. Model AaBb
  33. 33. Model
  34. 34. Model
  35. 35. Model
  36. 36. Apply
  37. 37. Apply
  38. 38. Apply AaBb AaBb AaBb AaBb AaBb AaBb AaBb AaBb AaBb
  39. 39. Apply AaBb AaBb AaBb AaBb AaBb AaBb AaBb AaBb AaBb
  40. 40. Apply Batch AaBb AaBb AaBb AaBb AaBb AaBb AaBb AaBb AaBb
  41. 41. Data out
  42. 42. Data out
  43. 43. Data out REST
  44. 44. Data out REST
  45. 45. REST
  46. 46. REST
  47. 47. REST
  48. 48. AaBb
  49. 49. AaBb
  50. 50. AaBb
  51. 51. Experiments / Deployment
  52. 52. Experiments / Deployment
  53. 53. Experiments / Deployment c d c d
  54. 54. Experiments / Deployment c d c d
  55. 55. 3
  56. 56. Data in / out
  57. 57. Data in / out (REST)
  58. 58. Inspect and train
  59. 59. Apply
  60. 60. Apply (real-time)
  61. 61. ! ?
  62. 62. !! Kiji
  63. 63. How Kiji works
  64. 64. Kiji History
  65. 65. Kiji History
  66. 66. Kiji History
  67. 67. Kiji History
  68. 68. Kiji History
  69. 69. Kiji History
  70. 70. Kiji History
  71. 71. Kiji History
  72. 72. In production now Fortune 500 retailer: Personalized recommendations Opower: Energy usage and analytics reporting
  73. 73. How does it work? Kiji
  74. 74. How does it work? Kiji Engineering Data Science
  75. 75. How does it work? Kiji Data Science Write Engineering
  76. 76. How does it work? Kiji Data Science Write Channels Engineering
  77. 77. How does it work? Kiji Data Science Write Logs DBs Engineering Channels
  78. 78. How does it work? Kiji Data Science Write Logs DBs KijiMR Engineering Channels
  79. 79. How does it work? Kiji Data Science Write KijiREST Stream Engineering Channels
  80. 80. How does it work? Kiji Data Science Write Read KijiREST Stream Engineering Channels
  81. 81. How does it work? KijiSchema (Cassandra) Data Science Write Read KijiREST Stream Engineering Channels
  82. 82. KijiSchema (Cassandra) How does it work? Data Science Write Read KijiREST Stream User 1 User 2 User 3 Engineering Channels
  83. 83. KijiSchema (Cassandra) How does it work? Data Science Write Read KijiREST Stream User 1 User 2 User 3 C C C Engineering Channels
  84. 84. KijiSchema (Cassandra) How does it work? Data Science Write Read KijiREST Stream User 1 User 2 User 3 C C C Engineering Channels
  85. 85. KijiSchema (Cassandra) How does it work? Data Science Write Read KijiREST Stream User 1 User 2 User 3 C C C Engineering Channels
  86. 86. KijiSchema (Cassandra) How does it work? Data Science Write Read KijiREST Stream User 1 User 2 User 3 Query KijiHive C C C Engineering Channels
  87. 87. KijiSchema (Cassandra) How does it work? Data Science Write Read KijiREST Stream User 1 User 2 User 3 Query KijiHive Data C C C Engineering Channels
  88. 88. KijiSchema (Cassandra) How does it work? Data Science Write Read KijiREST Stream User 1 User 2 User 3 Query KijiHive Data C C C Engineering Channels
  89. 89. KijiSchema (Cassandra) How does it work? Data Science Write Read KijiREST Stream User 1 User 2 User 3 Query KijiHive Data C C C Engineering Channels
  90. 90. KijiSchema (Cassandra) How does it work? Data Science Write Read KijiREST Stream User 1 User 2 User 3 Query KijiHive Data C C C Engineering Channels
  91. 91. KijiSchema (Cassandra) How does it work? Data Science Write Read KijiREST Stream User 1 User 2 User 3 Query KijiHive KijiMR C C C Engineering Channels Data
  92. 92. KijiSchema (Cassandra) How does it work? Data Science Write Read KijiREST Stream User 1 User 2 User 3 Query KijiHive KijiExpress KijiMR C C C Engineering Channels Data
  93. 93. KijiSchema (Cassandra) How does it work? Data Science Write Read KijiREST Stream User 1 User 2 User 3 Query KijiHive KijiExpress KijiMR Scorer C C C Engineering Channels Data
  94. 94. KijiSchema (Cassandra) How does it work? Data Science Write Read KijiREST Stream User 1 User 2 User 3 Query KijiHive KijiExpress KijiMR Scorer C C C Engineering Channels Data
  95. 95. KijiSchema (Cassandra) How does it work? Data Science Write Read KijiREST Stream User 1 User 2 User 3 Query KijiHive KijiExpress KijiMR Scorer C C C R Engineering Channels Data
  96. 96. KijiSchema (Cassandra) How does it work? Data Science Write Read KijiREST Stream User 1 User 2 User 3 Query KijiHive KijiExpress KijiMR Scorer C C C Engineering Channels Data
  97. 97. KijiSchema (Cassandra) How does it work? Data Science Write Read KijiREST Stream User 1 User 2 User 3 Query KijiHive KijiExpress KijiMR Scorer C C C Engineering Channels Data
  98. 98. KijiSchema (Cassandra) How does it work? Data Science Write Read KijiREST Stream User 1 User 2 User 3 Query KijiHive KijiExpress KijiMR Scorer C C C R R R Engineering Channels Data
  99. 99. KijiSchema (Cassandra) How does it work? Data Science Write Read KijiREST Stream User 1 User 2 User 3 Query KijiHive KijiExpress KijiMR KijiScoring C C C R Kiji Model Repository Engineering Channels Data Scorer
  100. 100. KijiSchema (Cassandra) How does it work? Data Science Write Read KijiREST Stream User 1 User 2 User 3 Query KijiHive KijiExpress KijiMR KijiScoring C C C R Kiji Model Repository Engineering Channels Data Scorer
  101. 101. KijiSchema (Cassandra) How does it work? Data Science Write Read KijiREST Stream User 1 User 2 User 3 Query KijiHive KijiExpress KijiMR KijiScoring C C C R Kiji Model Repository Engineering Channels Data Scorer
  102. 102. KijiSchema (Cassandra) How does it work? Data Science Write Read KijiREST Stream User 1 User 2 User 3 Query KijiHive KijiExpress KijiMR KijiScoring C C C R Kiji Model Repository Engineering Channels Data Scorer
  103. 103. KijiSchema (Cassandra) How does it work? Data Science Write Read KijiREST Stream User 1 User 2 User 3 Query KijiHive KijiExpress KijiMR KijiScoring C C C R Kiji Model Repository Engineering Channels Data Scorer
  104. 104. KijiSchema (Cassandra) How does it work? Data Science Write Read KijiREST Stream User 1 User 2 User 3 Query KijiHive KijiExpress KijiMR KijiScoring C C C R Kiji Model Repository Engineering Channels Data Scorer
  105. 105. KijiSchema (Cassandra) How does it work? Data Science Write Read KijiREST Stream User 1 User 2 User 3 Query KijiHive KijiExpress KijiMR KijiScoring C C C R Kiji Model Repository Engineering Channels Data Scorer
  106. 106. KijiSchema (Cassandra) How does it work? Data Science Write Read KijiREST Stream User 1 User 2 User 3 Query KijiHive KijiExpress KijiMR KijiScoring C C C R Kiji Model Repository Engineering Channels Data Scorer R
  107. 107. KijiSchema (Cassandra) How does it work? Data Science Write Read KijiREST Stream User 1 User 2 User 3 Query KijiHive KijiExpress KijiMR KijiScoring C C C R Kiji Model Repository Engineering Channels Data Scorer R R
  108. 108. KijiSchema (Cassandra) How does it work? Data Science Write Read KijiREST Stream User 1 User 2 User 3 Query KijiHive KijiExpress KijiMR KijiScoring C C C R Kiji Model Repository Engineering Channels Data Scorer R R R
  109. 109. KijiSchema (Cassandra) How does it work? Data Science Write Read KijiREST Stream User 1 User 2 User 3 Query KijiHive KijiExpress KijiMR KijiScoring C C C R Kiji Model Repository Engineering Channels Data Scorer R R R c d c d
  110. 110. KijiSchema (Cassandra) How does it work? Data Science Write Read KijiREST Stream User 1 User 2 User 3 Query KijiHive KijiExpress KijiMR Kiji Model Repository KijiScoring Freshness Policy C C C R Engineering Channels Data
  111. 111. 3
  112. 112. Data in / out KijiREST KijiMR
  113. 113. Inspect and train KijiHive KijiMR KijiExpress
  114. 114. Apply (real-time) KijiModelRepository KijiScoring
  115. 115. Modular
  116. 116. Kiji on Cassandra
  117. 117. Kiji ~ BigTable
  118. 118. table
  119. 119. table row row row row row row row row row row row row
  120. 120. row
  121. 121. Row key = entity ID entity ID data
  122. 122. Composite entity IDs 0xfa “bob” data
  123. 123. Column families 0xfa “bob” payment interactions recommendations
  124. 124. Columns 0xfa “bob” payment: cardnum payment: address inter: clicks inter: search rec: scorer1 rec: scorer2
  125. 125. Timestamped versions songs: let it be inter: search 0xfa “bob” songs: let it be songs: let it be songs: let it be inter: clicks 1396560123 payment: cardnum payment: address rec: scorer2 rec: scorer3 rec: scorer3 rec: scorer3 rec: scorer1 1395650231
  126. 126. Complex data types record Search { string search_term; long session_id; device_type device; } [row diagram repeated from slide 125]
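
The data model on slides 121 to 125 (entity ID, column family, qualifier, timestamped versions) can be pictured with a small in-memory sketch. This is not Kiji's API: the `SketchRow` class and the cell values `"song-page"` and `"album-page"` are hypothetical, and only the entity ID components and the two timestamps come from the slides.

```python
from collections import defaultdict


class SketchRow:
    """Toy stand-in for one row: (family, qualifier) -> {timestamp: value}."""

    def __init__(self, entity_id):
        self.entity_id = entity_id
        self._cells = defaultdict(dict)

    def put(self, family, qualifier, timestamp, value):
        # Each write adds a new version instead of overwriting the cell.
        self._cells[(family, qualifier)][timestamp] = value

    def get(self, family, qualifier, max_versions=1):
        # Return (timestamp, value) pairs, newest version first.
        versions = self._cells.get((family, qualifier), {})
        newest_first = sorted(versions.items(), reverse=True)
        return newest_first[:max_versions]


row = SketchRow(entity_id=("0xfa", "bob"))
row.put("inter", "clicks", 1395650231, "song-page")   # hypothetical value
row.put("inter", "clicks", 1396560123, "album-page")  # hypothetical value

latest = row.get("inter", "clicks")
history = row.get("inter", "clicks", max_versions=2)
```

Reading with `max_versions=1` returns only the newest cell, while a larger value walks back through older versions of the same column.
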
  127. 127. Locality group
  128. 128. Locality group Column families
  129. 129. Locality group
  130. 130. Locality group Batch Batch Batch
  131. 131. Locality group Batch Batch Batch Real-time Real-time Real-time
  132. 132. Locality group Batch Batch Real-time Real-time Real-time Batch
  133. 133. locality_group_real_time locality_group_batch Locality group Batch Batch Real-time Real-time Real-time Batch
  134. 134. locality_group_real_time locality_group_batch Locality group Batch Batch Real-time Real-time Real-time Batch
  135. 135. locality_group_real_time locality_group_batch Locality group Batch Batch Real-time Real-time Real-time Batch
  136. 136. locality_group_real_time locality_group_batch Locality group Batch Batch Real-time Real-time Real-time Batch On disk. Compressed.
  137. 137. locality_group_real_time locality_group_batch Locality group Batch Batch Real-time Real-time Real-time Batch On disk. Compressed. In memory.
  138. 138. Row ➔ transactional consistency
  139. 139. Locality group ➔ Column family CREATE TABLE loc_grp [row diagram repeated from slide 125]
  140. 140. Entity ID ➔ Primary key CREATE TABLE loc_grp (city text, user text, PRIMARY KEY (city, user)) WITH CLUSTERING ORDER BY (user ASC); [row diagram repeated from slide 125]
  141. 141. Family, Qualifier, Version ➔ Clustering columns CREATE TABLE loc_grp (city text, user text, family text, qualifier text, version bigint, PRIMARY KEY (city, user, family, qualifier, version)) WITH CLUSTERING ORDER BY (user ASC, family ASC, qualifier ASC, version DESC); [row diagram repeated from slide 125]
  142. 142. Column values ➔ Blobs CREATE TABLE loc_grp (city text, user text, family text, qualifier text, version bigint, value blob, PRIMARY KEY (city, user, family, qualifier, version)) WITH CLUSTERING ORDER BY (user ASC, family ASC, qualifier ASC, version DESC); [row diagram repeated from slide 125]
  143. 143. bob:pay:cardnum:t AMEX1234... bob:pay:addr:t5 1234 Main St, SF bob:inter:clicks:t9 ... bob:inter:clicks:t7 ... bob:inter:clicks:t6 ... 0xfa
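
The `CLUSTERING ORDER BY (user ASC, family ASC, qualifier ASC, version DESC)` clause on slides 141 and 142 determines how cells are ordered within a partition. The sketch below imitates that ordering in plain Python so the effect is visible without a cluster. It does not talk to Cassandra; the `Cell` tuple, the city value `"sf"`, the short cell payloads, and version `1` for the card number are all illustrative assumptions.

```python
from typing import NamedTuple


class Cell(NamedTuple):
    city: str       # partition key in the slide's example schema
    user: str       # first clustering column
    family: str
    qualifier: str
    version: int
    value: bytes    # column values are stored as blobs


def clustering_sort_key(cell):
    # user ASC, family ASC, qualifier ASC, version DESC
    return (cell.user, cell.family, cell.qualifier, -cell.version)


def partition_layout(cells, city):
    """Return the cells of one partition in clustering order."""
    return sorted((c for c in cells if c.city == city), key=clustering_sort_key)


cells = [
    Cell("sf", "bob", "inter", "clicks", 6, b"c6"),
    Cell("sf", "bob", "pay", "cardnum", 1, b"AMEX1234"),
    Cell("sf", "bob", "inter", "clicks", 9, b"c9"),
    Cell("sf", "bob", "inter", "clicks", 7, b"c7"),
    Cell("sf", "bob", "pay", "addr", 5, b"1234 Main St, SF"),
]

layout = partition_layout(cells, "sf")
```

Because `version` sorts descending, the newest value of each column comes first within its (family, qualifier) group, which suits the common "read the latest version" access pattern.
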
  144. 144. Implementation notes
  145. 145. Implementation notes DataStax Java driver
  146. 146. Implementation notes DataStax Java driver Cassandra 2.0.6
  147. 147. Implementation notes DataStax Java driver Cassandra 2.0.6 Async API
  148. 148. Implementation notes DataStax Java driver Cassandra 2.0.6 Async API New MapReduce InputFormat
  149. 149. Issues
  150. 150. Operations across locality groups
  151. 151. Operations across locality groups Kiji locality group ➔ C* column family
  152. 152. Operations across locality groups Kiji locality group ➔ C* column family
  153. 153. Operations across locality groups Kiji locality group ➔ C* column family Read across locality groups
  154. 154. Operations across locality groups Kiji locality group ➔ C* column family Read across locality groups ➔ multiple C* reads (async API!)
  155. 155. Operations across locality groups Kiji locality group ➔ C* column family Read across locality groups ➔ multiple C* reads (async API!)
  156. 156. Operations across locality groups Kiji locality group ➔ C* column family Read across locality groups ➔ multiple C* reads (async API!) Compare-and-set across locality groups
  157. 157. Operations across locality groups Kiji locality group ➔ C* column family Read across locality groups ➔ multiple C* reads (async API!) Compare-and-set across locality groups ➔ not allowed in C* Kiji
  158. 158. Operations across locality groups Kiji locality group ➔ C* column family Read across locality groups ➔ multiple C* reads (async API!) Compare-and-set across locality groups ➔ not allowed in C* Kiji
  159. 159. Operations across locality groups Kiji locality group ➔ C* column family Read across locality groups ➔ multiple C* reads (async API!) Compare-and-set across locality groups ➔ not allowed in C* Kiji Lose transactional consistency
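
Slides 153 to 159 note that a read spanning several locality groups becomes several Cassandra reads, issued through the async API. The following sketch shows the general fan-out-and-merge shape using Python's `asyncio` rather than the DataStax Java driver the talk used. The `FAKE_TABLES` dictionary and both helper functions are hypothetical stand-ins, with table names borrowed from the locality group slides.

```python
import asyncio

# Hypothetical in-memory stand-ins for two Cassandra tables,
# one per Kiji locality group.
FAKE_TABLES = {
    "locality_group_batch": {"bob": {"rec:scorer1": b"0.7"}},
    "locality_group_real_time": {"bob": {"inter:clicks": b"album-page"}},
}


async def read_locality_group(table, entity_id):
    # sleep(0) yields to the event loop, standing in for a network round trip.
    await asyncio.sleep(0)
    return dict(FAKE_TABLES[table].get(entity_id, {}))


async def read_row(entity_id, tables):
    # Issue one read per locality group concurrently, then merge the results.
    partials = await asyncio.gather(
        *(read_locality_group(table, entity_id) for table in tables)
    )
    merged = {}
    for partial in partials:
        merged.update(partial)
    return merged


row = asyncio.run(read_row("bob", sorted(FAKE_TABLES)))
```

The merged result is assembled from independent reads, so it is not a consistent snapshot of the row. That matches the slide's point that operations across locality groups lose transactional consistency.
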
  160. 160. Filters HBase ➔ Rich server-side filters Cassandra ➔ WHERE clauses
  161. 161. Filters HBase ➔ Rich server-side filters Cassandra ➔ WHERE clauses Client-side filtering
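
Slide 161 pairs Cassandra's `WHERE` clauses with client-side filtering to cover what HBase's server-side filters did. One way to picture the split is shown below, as a sketch only. The row dictionaries, the `server_side_where` and `client_side_filter` helpers, and the sample search terms are assumptions, and no real query is executed.

```python
# Hypothetical rows; in practice these would come back from a CQL query.
ROWS = [
    {"city": "sf", "user": "alice", "search_term": "let it be"},
    {"city": "sf", "user": "bob", "search_term": "yesterday"},
    {"city": "nyc", "user": "cathy", "search_term": "let it bleed"},
]


def server_side_where(rows, city):
    """Stand-in for the restriction a WHERE clause on the key can express."""
    return (row for row in rows if row["city"] == city)


def client_side_filter(rows, predicate):
    """Apply an arbitrary predicate after the rows reach the client."""
    return [row for row in rows if predicate(row)]


candidates = server_side_where(ROWS, "sf")
matches = client_side_filter(
    candidates, lambda row: row["search_term"].startswith("let it")
)
```

The trade-off is that every candidate row crosses the network before the predicate runs, so narrowing the key restriction as far as possible matters more than it would with server-side filters.
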
  162. 162. Entity IDs with unhashed components
  163. 163. EntityId(state, city, username)
  164. 164. EntityId(state, city, username) hashed
  165. 165. EntityId(state, city, username) hashed unhashed
  166. 166. EntityId(state, city, username) hashed unhashed 0x235af-alice 0x235af-bob 0x235af-cathy 0x235af-dave 0x38e0a-andy 0x38e0a-jane 0x38e0a-lucy 0x38e0a-nancy HBase
  167. 167. EntityId(state, city, username) hashed unhashed 0x235af-alice 0x235af-bob 0x235af-cathy 0x235af-dave 0x38e0a-andy 0x38e0a-jane 0x38e0a-lucy 0x38e0a-nancy HBase 0x235af | alice | bob | cathy | dave 0x38e0a | andy | jane | lucy | nancy Cassandra
  168. 168. EntityId(state, city, username) hashed unhashed 0x235af-alice 0x235af-bob 0x235af-cathy 0x235af-dave 0x38e0a-andy 0x38e0a-jane 0x38e0a-lucy 0x38e0a-nancy HBase 0x235af | alice | bob | cathy | dave 0x38e0a | andy | jane | lucy | nancy Cassandra Limited to width of C* wide row!
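
Slides 163 to 168 contrast how an entity ID with hashed and unhashed components is laid out: HBase gets one row key per user, while Cassandra gets one partition per hashed prefix with users as clustering values. The sketch below illustrates that difference. The MD5-based `hash_prefix` is an assumption made for illustration and is not claimed to be Kiji's hashing scheme; the sample states and cities are also made up.

```python
import hashlib


def hash_prefix(*components, length=5):
    """Illustrative hash of the hashed components (not Kiji's real scheme)."""
    digest = hashlib.md5("|".join(components).encode("utf-8")).hexdigest()
    return "0x" + digest[:length]


def hbase_row_key(state, city, username):
    # HBase: hashed prefix and unhashed suffix form a single sorted row key.
    return f"{hash_prefix(state, city)}-{username}"


def cassandra_key(state, city, username):
    # Cassandra: hashed part is the partition key, unhashed part clusters.
    return {"partition_key": hash_prefix(state, city), "clustering": username}


users = [
    ("CA", "SF", "bob"),
    ("CA", "SF", "alice"),
    ("NY", "NYC", "jane"),
    ("NY", "NYC", "andy"),
]

hbase_keys = sorted(hbase_row_key(*user) for user in users)

partitions = {}
for user in users:
    key = cassandra_key(*user)
    partitions.setdefault(key["partition_key"], []).append(key["clustering"])
for names in partitions.values():
    names.sort()
```

In the Cassandra layout every user sharing a hashed prefix lands in the same partition, which is why the slide warns that the number of such users is limited to the width of a C* wide row.
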
  169. 169. Project status
  170. 170. KijiSchema (alpha) ready now. https://github.com/kijiproject/kiji-schema/blob/cassandra/cassandra_tutorial.md (tinyurl.com/mmubg5o)
  171. 171. Next quarter Cassandra in all Kiji components Run MapReduce jobs with KijiExpress Expose Cassandra-specific features
  172. 172. 3
  173. 173. Data in / out KijiREST KijiMR
  174. 174. Inspect and train KijiHive KijiMR KijiExpress
  175. 175. Apply (real-time) KijiModelRepository KijiScoring
  176. 176. Thanks to Cassandra community Mailing lists Meetups, webinars, conferences
  177. 177. Try it now! www.kiji.org/getstarted tinyurl.com/mmubg5o
