Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.
Cassandra python driver
Benchmarking concurrency for nyt⨍aбrik
Michael.Laing@nytimes.com
A Global Mesh with a Memory
Message-based: WebSocket, AMQP, SockJS
If in doubt:
• Resend
• Reconnect
• Reread
Idempotent:
...
Message: an event with data
CREATE TABLE source_data (
hash_key int, -- real ones are more complex
message_id timeuuid,
bo...
1-10kb
1-10kb
Ack
Ack
Push
1kb
1kb
10-150kb
10-150kb
Pull
Synchronous:
C* Thrift or
CQL Native
Concurrent
Degree = 3
(using the
Libev event
Loop)
Asynchronous:
CQL Native only
More Concurrency
Can also try:
• DC Aware
• Token Aware
• Subprocessing
Build one
def build_message(self):
message = {
"message_id": str(uuid.uuid1()),
"hash_key": randint(0, self._hash_key_rang...
Kick-off
def push_message(self):
if self._submitted_count.next() < self._message_count:
message = self.build_message()
sel...
Put it in the pipeline
def submit_query(self, message):
body = message.pop('body')
substitution_args = (
json.dumps(messag...
Maintain concurrency or finish
def push_or_finish(self, _):
try:
if (
self._unfinished and
self._confirmed_count.next() < ...
1-10kb
1-10kb
Ack
Ack
Push
Push some messages
usage: bm_push.py [-h] [-c [CQL_HOST [CQL_HOST ...]]] [-d LOCAL_DC]
[--remote-dc-hosts REMOTE_DC_HOSTS]...
Push messages many times
usage: run_push.py [-h] [-c [CQL_HOST [CQL_HOST ...]]] [-i ITERATIONS]
[-d LOCAL_DC] [-w [worker_...
1kb
1kb
10-150kb
10-150kb
Pull
Cassandra Day NY 2014: Apache Cassandra & Python for the The New York Times ⨍aбrik Platform
Cassandra Day NY 2014: Apache Cassandra & Python for the The New York Times ⨍aбrik Platform
Cassandra Day NY 2014: Apache Cassandra & Python for the The New York Times ⨍aбrik Platform
Cassandra Day NY 2014: Apache Cassandra & Python for the The New York Times ⨍aбrik Platform
Cassandra Day NY 2014: Apache Cassandra & Python for the The New York Times ⨍aбrik Platform
Cassandra Day NY 2014: Apache Cassandra & Python for the The New York Times ⨍aбrik Platform
Cassandra Day NY 2014: Apache Cassandra & Python for the The New York Times ⨍aбrik Platform
Cassandra Day NY 2014: Apache Cassandra & Python for the The New York Times ⨍aбrik Platform
Cassandra Day NY 2014: Apache Cassandra & Python for the The New York Times ⨍aбrik Platform
Cassandra Day NY 2014: Apache Cassandra & Python for the The New York Times ⨍aбrik Platform
Cassandra Day NY 2014: Apache Cassandra & Python for the The New York Times ⨍aбrik Platform
Cassandra Day NY 2014: Apache Cassandra & Python for the The New York Times ⨍aбrik Platform
Cassandra Day NY 2014: Apache Cassandra & Python for the The New York Times ⨍aбrik Platform
Cassandra Day NY 2014: Apache Cassandra & Python for the The New York Times ⨍aбrik Platform
Cassandra Day NY 2014: Apache Cassandra & Python for the The New York Times ⨍aбrik Platform
Cassandra Day NY 2014: Apache Cassandra & Python for the The New York Times ⨍aбrik Platform
Cassandra Day NY 2014: Apache Cassandra & Python for the The New York Times ⨍aбrik Platform
Cassandra Day NY 2014: Apache Cassandra & Python for the The New York Times ⨍aбrik Platform
Cassandra Day NY 2014: Apache Cassandra & Python for the The New York Times ⨍aбrik Platform
Cassandra Day NY 2014: Apache Cassandra & Python for the The New York Times ⨍aбrik Platform
Cassandra Day NY 2014: Apache Cassandra & Python for the The New York Times ⨍aбrik Platform
Cassandra Day NY 2014: Apache Cassandra & Python for the The New York Times ⨍aбrik Platform
Cassandra Day NY 2014: Apache Cassandra & Python for the The New York Times ⨍aбrik Platform
Cassandra Day NY 2014: Apache Cassandra & Python for the The New York Times ⨍aбrik Platform
Cassandra Day NY 2014: Apache Cassandra & Python for the The New York Times ⨍aбrik Platform
Cassandra Day NY 2014: Apache Cassandra & Python for the The New York Times ⨍aбrik Platform
Cassandra Day NY 2014: Apache Cassandra & Python for the The New York Times ⨍aбrik Platform
Cassandra Day NY 2014: Apache Cassandra & Python for the The New York Times ⨍aбrik Platform
Cassandra Day NY 2014: Apache Cassandra & Python for the The New York Times ⨍aбrik Platform
Cassandra Day NY 2014: Apache Cassandra & Python for the The New York Times ⨍aбrik Platform
Cassandra Day NY 2014: Apache Cassandra & Python for the The New York Times ⨍aбrik Platform
Cassandra Day NY 2014: Apache Cassandra & Python for the The New York Times ⨍aбrik Platform
Cassandra Day NY 2014: Apache Cassandra & Python for the The New York Times ⨍aбrik Platform
Cassandra Day NY 2014: Apache Cassandra & Python for the The New York Times ⨍aбrik Platform
Cassandra Day NY 2014: Apache Cassandra & Python for the The New York Times ⨍aбrik Platform
Cassandra Day NY 2014: Apache Cassandra & Python for the The New York Times ⨍aбrik Platform
Cassandra Day NY 2014: Apache Cassandra & Python for the The New York Times ⨍aбrik Platform
Upcoming SlideShare
Loading in …5
×

Cassandra Day NY 2014: Apache Cassandra & Python for the The New York Times ⨍aбrik Platform

2,393 views

Published on

In this session, you’ll learn about how Apache Cassandra is used with Python in the NY Times ⨍aбrik messaging platform. Michael will start his talk off by diving into an overview of the NYT⨍aбrik global message bus platform and its “memory” features and then discuss their use of the open source Apache Cassandra Python driver by DataStax. Progressive benchmark to test features/performance will be presented: from naive and synchronous to asynchronous with multiple IO loops; these benchmarks tailored to usage at the NY Times. Code snippets, followed by beer, for those who survive. All code available on Github!

Published in: Technology
  • I think you need a perfect and 100% unique academic essays papers have a look once this site i hope you will get valuable papers, ⇒ www.WritePaper.info ⇐
       Reply 
    Are you sure you want to  Yes  No
    Your message goes here
  • Hi there! I just wanted to share a list of sites that helped me a lot during my studies: .................................................................................................................................... www.EssayWrite.best - Write an essay .................................................................................................................................... www.LitReview.xyz - Summary of books .................................................................................................................................... www.Coursework.best - Online coursework .................................................................................................................................... www.Dissertations.me - proquest dissertations .................................................................................................................................... www.ReMovie.club - Movies reviews .................................................................................................................................... www.WebSlides.vip - Best powerpoint presentations .................................................................................................................................... www.WritePaper.info - Write a research paper .................................................................................................................................... www.EddyHelp.com - Homework help online .................................................................................................................................... www.MyResumeHelp.net - Professional resume writing service .................................................................................................................................. www.HelpWriting.net - Help with writing any papers ......................................................................................................................................... Save so as not to lose
       Reply 
    Are you sure you want to  Yes  No
    Your message goes here
  • Be the first to like this

Cassandra Day NY 2014: Apache Cassandra & Python for the The New York Times ⨍aбrik Platform

  1. 1. Cassandra python driver Benchmarking concurrency for nyt⨍aбrik Michael.Laing@nytimes.com
  2. 2. A Global Mesh with a Memory Message-based: WebSocket, AMQP, SockJS If in doubt: • Resend • Reconnect • Reread Idempotent: • Replicating • Racy • Resolving Classes of service: • Gold: replicate/race • Silver: prioritize • Bronze: queueable Millions of users
  3. 3. Message: an event with data CREATE TABLE source_data ( hash_key int, -- real ones are more complex message_id timeuuid, body blob, -- whatever metadata text, -- JSON PRIMARY KEY (hash_key, message_id) );
  4. 4. 1-10kb 1-10kb Ack Ack Push
  5. 5. 1kb 1kb 10-150kb 10-150kb Pull Synchronous: C* Thrift or CQL Native
  6. 6. Concurrent Degree = 3 (using the Libev event Loop) Asynchronous: CQL Native only
  7. 7. More Concurrency Can also try: • DC Aware • Token Aware • Subprocessing
  8. 8. Build one def build_message(self): message = { "message_id": str(uuid.uuid1()), "hash_key": randint(0, self._hash_key_range), # int(e ** 8) "app_id": self._app_id, "timestamp": datetime.utcnow().isoformat() + 'Z', "content_type": "application/binary", "body": os.urandom(randint(1, self._body_range)) # int(e ** 9) }
  9. 9. Kick-off def push_message(self): if self._submitted_count.next() < self._message_count: message = self.build_message() self.submit_query(message) def push_initial_data(self): self._start_time = time() try: with self._lock: for i in range( 0, min(CONCURRENCY, self._message_count) ): self.push_message()
  10. 10. Put it in the pipeline def submit_query(self, message): body = message.pop('body') substitution_args = ( json.dumps(message, **JSON_DUMPS_ARGS), body, message['hash_key'], uuid.UUID(message['message_id']) ) future = self._cql_session.execute_async( self._query, substitution_args ) future.add_callback(self.push_or_finish) future.add_errback(self.note_error)
  11. 11. Maintain concurrency or finish def push_or_finish(self, _): try: if ( self._unfinished and self._confirmed_count.next() < self._message_count ): with self._lock: self.push_message() else: self.finish()
  12. 12. 1-10kb 1-10kb Ack Ack Push
  13. 13. Push some messages usage: bm_push.py [-h] [-c [CQL_HOST [CQL_HOST ...]]] [-d LOCAL_DC] [--remote-dc-hosts REMOTE_DC_HOSTS] [-p PREFETCH_COUNT] [-w WORKER_COUNT] [-a] [-t] [-n {ONE, TWO, THREE, QUORUM, ALL, LOCAL_QUORUM, EACH_QUORUM, SERIAL, LOCAL_SERIAL, LOCAL_ONE}] [-r] [-j] [-l {CRITICAL,ERROR,WARNING,INFO,DEBUG,NOTSET}] Push messages from a RabbitMQ queue into a Cassandra table.
  14. 14. Push messages many times usage: run_push.py [-h] [-c [CQL_HOST [CQL_HOST ...]]] [-i ITERATIONS] [-d LOCAL_DC] [-w [worker_count [worker_count ...]]] [-p [prefetch_count [prefetch_count ...]]] [-n [level [level ...]]] [-a] [-t] [-m MESSAGE_EXPONENT] [-b BODY_EXPONENT] [-l {CRITICAL,ERROR,WARNING,INFO,DEBUG,NOTSET}] Run multiple test cases based upon the product of worker_counts, prefetch_counts, and consistency_levels. Each test case may be run with up to 4 variations reflecting the use or not of the dc_aware and token_aware policies. The results are output to stdout as a JSON object.
  15. 15. 1kb 1kb 10-150kb 10-150kb Pull

×