Cassandra python driver
Benchmarking concurrency for nyt⨍aбrik
Michael.Laing@nytimes.com
A Global Mesh with a Memory
Message-based: WebSocket, AMQP, SockJS
If in doubt:
• Resend
• Reconnect
• Reread
Idempotent:
...
Message: an event with data
CREATE TABLE source_data (
hash_key int, -- real ones are more complex
message_id timeuuid,
bo...
1-10kb
1-10kb
Ack
Ack
Push
1kb
1kb
10-150kb
10-150kb
Pull
Synchronous:
C* Thrift or
CQL Native
Concurrent
Degree = 3
(using the
Libev event
Loop)
Asynchronous:
CQL Native only
More Concurrency
Can also try:
• DC Aware
• Token Aware
• Subprocessing
Build one
def build_message(self):
message = {
"message_id": str(uuid.uuid1()),
"hash_key": randint(0, self._hash_key_rang...
Kick-off
def push_message(self):
if self._submitted_count.next() < self._message_count:
message = self.build_message()
sel...
Put it in the pipeline
def submit_query(self, message):
body = message.pop('body')
substitution_args = (
json.dumps(messag...
Maintain concurrency or finish
def push_or_finish(self, _):
try:
if (
self._unfinished and
self._confirmed_count.next() < ...
1-10kb
1-10kb
Ack
Ack
Push
Push some messages
usage: bm_push.py [-h] [-c [CQL_HOST [CQL_HOST ...]]] [-d LOCAL_DC]
[--remote-dc-hosts REMOTE_DC_HOSTS]...
Push messages many times
usage: run_push.py [-h] [-c [CQL_HOST [CQL_HOST ...]]] [-i ITERATIONS]
[-d LOCAL_DC] [-w [worker_...
1kb
1kb
10-150kb
10-150kb
Pull
Cassandra Day NY 2014: Apache Cassandra & Python for the The New York Times ⨍aбrik Platform
Cassandra Day NY 2014: Apache Cassandra & Python for the The New York Times ⨍aбrik Platform
Cassandra Day NY 2014: Apache Cassandra & Python for the The New York Times ⨍aбrik Platform
Cassandra Day NY 2014: Apache Cassandra & Python for the The New York Times ⨍aбrik Platform
Cassandra Day NY 2014: Apache Cassandra & Python for the The New York Times ⨍aбrik Platform
Cassandra Day NY 2014: Apache Cassandra & Python for the The New York Times ⨍aбrik Platform
Cassandra Day NY 2014: Apache Cassandra & Python for the The New York Times ⨍aбrik Platform
Cassandra Day NY 2014: Apache Cassandra & Python for the The New York Times ⨍aбrik Platform
Cassandra Day NY 2014: Apache Cassandra & Python for the The New York Times ⨍aбrik Platform
Cassandra Day NY 2014: Apache Cassandra & Python for the The New York Times ⨍aбrik Platform
Cassandra Day NY 2014: Apache Cassandra & Python for the The New York Times ⨍aбrik Platform
Cassandra Day NY 2014: Apache Cassandra & Python for the The New York Times ⨍aбrik Platform
Cassandra Day NY 2014: Apache Cassandra & Python for the The New York Times ⨍aбrik Platform
Cassandra Day NY 2014: Apache Cassandra & Python for the The New York Times ⨍aбrik Platform
Cassandra Day NY 2014: Apache Cassandra & Python for the The New York Times ⨍aбrik Platform
Cassandra Day NY 2014: Apache Cassandra & Python for the The New York Times ⨍aбrik Platform
Cassandra Day NY 2014: Apache Cassandra & Python for the The New York Times ⨍aбrik Platform
Cassandra Day NY 2014: Apache Cassandra & Python for the The New York Times ⨍aбrik Platform
Cassandra Day NY 2014: Apache Cassandra & Python for the The New York Times ⨍aбrik Platform
Cassandra Day NY 2014: Apache Cassandra & Python for the The New York Times ⨍aбrik Platform
Cassandra Day NY 2014: Apache Cassandra & Python for the The New York Times ⨍aбrik Platform
Cassandra Day NY 2014: Apache Cassandra & Python for the The New York Times ⨍aбrik Platform
Cassandra Day NY 2014: Apache Cassandra & Python for the The New York Times ⨍aбrik Platform
Cassandra Day NY 2014: Apache Cassandra & Python for the The New York Times ⨍aбrik Platform
Cassandra Day NY 2014: Apache Cassandra & Python for the The New York Times ⨍aбrik Platform
Cassandra Day NY 2014: Apache Cassandra & Python for the The New York Times ⨍aбrik Platform
Cassandra Day NY 2014: Apache Cassandra & Python for the The New York Times ⨍aбrik Platform
Cassandra Day NY 2014: Apache Cassandra & Python for the The New York Times ⨍aбrik Platform
Cassandra Day NY 2014: Apache Cassandra & Python for the The New York Times ⨍aбrik Platform
Cassandra Day NY 2014: Apache Cassandra & Python for the The New York Times ⨍aбrik Platform
Cassandra Day NY 2014: Apache Cassandra & Python for the The New York Times ⨍aбrik Platform
Cassandra Day NY 2014: Apache Cassandra & Python for the The New York Times ⨍aбrik Platform
Cassandra Day NY 2014: Apache Cassandra & Python for the The New York Times ⨍aбrik Platform
Cassandra Day NY 2014: Apache Cassandra & Python for the The New York Times ⨍aбrik Platform
Cassandra Day NY 2014: Apache Cassandra & Python for the The New York Times ⨍aбrik Platform
Cassandra Day NY 2014: Apache Cassandra & Python for the The New York Times ⨍aбrik Platform
Cassandra Day NY 2014: Apache Cassandra & Python for the The New York Times ⨍aбrik Platform
Upcoming SlideShare
Loading in …5
×

Cassandra Day NY 2014: Apache Cassandra & Python for the The New York Times ⨍aбrik Platform

2,237 views
2,107 views

Published on

In this session, you’ll learn about how Apache Cassandra is used with Python in the NY Times ⨍aбrik messaging platform. Michael will start his talk off by diving into an overview of the NYT⨍aбrik global message bus platform and its “memory” features and then discuss their use of the open source Apache Cassandra Python driver by DataStax. Progressive benchmark to test features/performance will be presented: from naive and synchronous to asynchronous with multiple IO loops; these benchmarks tailored to usage at the NY Times. Code snippets, followed by beer, for those who survive. All code available on Github!

Published in: Technology
0 Comments
0 Likes
Statistics
Notes
  • Be the first to comment

  • Be the first to like this

No Downloads
Views
Total views
2,237
On SlideShare
0
From Embeds
0
Number of Embeds
1,563
Actions
Shares
0
Downloads
9
Comments
0
Likes
0
Embeds 0
No embeds

No notes for slide

Cassandra Day NY 2014: Apache Cassandra & Python for the The New York Times ⨍aбrik Platform

  1. 1. Cassandra python driver Benchmarking concurrency for nyt⨍aбrik Michael.Laing@nytimes.com
  2. 2. A Global Mesh with a Memory Message-based: WebSocket, AMQP, SockJS If in doubt: • Resend • Reconnect • Reread Idempotent: • Replicating • Racy • Resolving Classes of service: • Gold: replicate/race • Silver: prioritize • Bronze: queueable Millions of users
  3. 3. Message: an event with data CREATE TABLE source_data ( hash_key int, -- real ones are more complex message_id timeuuid, body blob, -- whatever metadata text, -- JSON PRIMARY KEY (hash_key, message_id) );
  4. 4. 1-10kb 1-10kb Ack Ack Push
  5. 5. 1kb 1kb 10-150kb 10-150kb Pull Synchronous: C* Thrift or CQL Native
  6. 6. Concurrent Degree = 3 (using the Libev event Loop) Asynchronous: CQL Native only
  7. 7. More Concurrency Can also try: • DC Aware • Token Aware • Subprocessing
  8. 8. Build one def build_message(self): message = { "message_id": str(uuid.uuid1()), "hash_key": randint(0, self._hash_key_range), # int(e ** 8) "app_id": self._app_id, "timestamp": datetime.utcnow().isoformat() + 'Z', "content_type": "application/binary", "body": os.urandom(randint(1, self._body_range)) # int(e ** 9) }
  9. 9. Kick-off def push_message(self): if self._submitted_count.next() < self._message_count: message = self.build_message() self.submit_query(message) def push_initial_data(self): self._start_time = time() try: with self._lock: for i in range( 0, min(CONCURRENCY, self._message_count) ): self.push_message()
  10. 10. Put it in the pipeline def submit_query(self, message): body = message.pop('body') substitution_args = ( json.dumps(message, **JSON_DUMPS_ARGS), body, message['hash_key'], uuid.UUID(message['message_id']) ) future = self._cql_session.execute_async( self._query, substitution_args ) future.add_callback(self.push_or_finish) future.add_errback(self.note_error)
  11. 11. Maintain concurrency or finish def push_or_finish(self, _): try: if ( self._unfinished and self._confirmed_count.next() < self._message_count ): with self._lock: self.push_message() else: self.finish()
  12. 12. 1-10kb 1-10kb Ack Ack Push
  13. 13. Push some messages usage: bm_push.py [-h] [-c [CQL_HOST [CQL_HOST ...]]] [-d LOCAL_DC] [--remote-dc-hosts REMOTE_DC_HOSTS] [-p PREFETCH_COUNT] [-w WORKER_COUNT] [-a] [-t] [-n {ONE, TWO, THREE, QUORUM, ALL, LOCAL_QUORUM, EACH_QUORUM, SERIAL, LOCAL_SERIAL, LOCAL_ONE}] [-r] [-j] [-l {CRITICAL,ERROR,WARNING,INFO,DEBUG,NOTSET}] Push messages from a RabbitMQ queue into a Cassandra table.
  14. 14. Push messages many times usage: run_push.py [-h] [-c [CQL_HOST [CQL_HOST ...]]] [-i ITERATIONS] [-d LOCAL_DC] [-w [worker_count [worker_count ...]]] [-p [prefetch_count [prefetch_count ...]]] [-n [level [level ...]]] [-a] [-t] [-m MESSAGE_EXPONENT] [-b BODY_EXPONENT] [-l {CRITICAL,ERROR,WARNING,INFO,DEBUG,NOTSET}] Run multiple test cases based upon the product of worker_counts, prefetch_counts, and consistency_levels. Each test case may be run with up to 4 variations reflecting the use or not of the dc_aware and token_aware policies. The results are output to stdout as a JSON object.
  15. 15. 1kb 1kb 10-150kb 10-150kb Pull

×