This upload requires better support for ODP format


I uploaded this version in OpenOffice .ODP format, which is presumably the reason SlideShare messed up the formatting. SlideShare, can we get some better support for open formats, stat?

If you'd like to view these slides, I've re-uploaded this talk in .ppt format.

Published in: Technology
  • Software architecture at MNN, the first and largest community broadcast facility, which produces more original content than any network in the world, including major networks such as ABC and HBO. With thousands of producers, city approval to expand to 9 on-air channels, and every episode stored in a database (currently metadata only), there is a roadmap to archive all broadcast content in a system that integrates with both the playout system and an on-demand web delivery system. And we're doing it with Drupal, with which we're starting to use MongoDB. Despite the surprising lack of information out there, MongoDB is getting a lot of traction in Drupal, and is increasingly seen as a natural match for Drupal for a number of reasons. Document-oriented non-relational DB + Drupal = the Shiznit
  • Using MongoDB with Drupal. The social graph is not a term you hear, but it's uber apt. It's surprising that more isn't being written about MongoDB for Drupal, given how exciting it is. (I know exciting here is a relative term dependent on your level of geekiness.) In fact if you're reading this, that's pretty much it, ATM. And if you're reading this, it's assumed you know what Drupal is, while you may or may not know that MongoDB is a stand-alone database server, barely a year old, that gains enormous improvements in speed by virtue of its schema-free design, trading off resource-hungry features of RDBMS (joins, transactions) to achieve much better performance. It's optimized for web development and makes a great match with Drupal for a number of reasons. One of the first things you may ask about Mongo and Drupal, apart from “why do all these softwarez have such funny names,” is why you would want to use them together. The answer in a word is speed, or rather availability. More esoterically, MongoDB can give Drupal sites a rather high CAP coefficient* (a metric measuring performance in CAP space) while providing functional isomorphism between application and database layers that bears more of a resemblance to Internet programming patterns. (“Hey, can we add some social media to our website?”)
  • Essentially this is the web application architecture issue that Sun ran into when they tried to build web apps in Java. They had to *adjust* the meanings of some terms to gloss over architectural shortcomings, since Internet programming patterns turned out to differ in important ways from traditional, even distributed, applications. Today we see the same thing with RDBMS (NoSQL is catchier than NoRDBMS). OOB MySQL is not: lack of relational features (1. foreign constraints, 2. ???, 3. row-level locking). But as it was the only tool in the toolbox (object DBs ran out of steam 10 years ago when they hit the I/O wall), that's what we've been using, and we've managed to make it work, problems notwithstanding. In particular, schema-based RDBMS don't play well with web apps (>>>slide>>>) as your requirements for availability and scalability grow.
  • However, schema-based RDBMS tend to be the worst offenders; SQL in particular has a number of issues: > MySQL / InnoDB has a very unfriendly method of doing schema changes: it rebuilds the table and blocks all writes until the new table is ready. > Big price: transactions. We worked so hard to get transactions in; wait, why are we working hard? > Rich feature set: string functions you've never used. > More abstractly, reality, or at least the reality of the Interwebs, doesn't play nice with SQL, as the former doesn't have the rigid schema that the latter wants to enforce. A thousand blog posts bear witness to the list of reasons, which essentially boil down to the big price you have to pay to support features of a system that doesn't map well to reality. And this of course is one of the most important things these systems/machines/programs do: map reality. And while it may seem a fairly high level of abstraction to compare database models to reality ('itself'), there is in fact a crucial dialogue in database programming between abstraction and actuality that is exemplified by my signature quote: In Theory, ...
  • Behaviour you experience comes from fundamental decisions made long ago in t programming. Relations in theory and objects in practice. If someone is getting too religious about the lack of schemas, they're going to be more on the theory side, even if MongoDB itself has more affinity with the practical side. Mongo encourages you to have no data model and admit it.
  • Sidebar: CAP theorem – availability, consistency, partition tolerance. “Available and scalable.” Don't want to spend too much time, but devs still don't seem to have the full sense of why MongoDB is making big waves despite only being a year old.... Atomic: everything you need is in 1 doc. Simple: one query, one thing. Fast: no cross-table queries, one collection. MySQL trades availability for speed. In asking why Mongo you may as well be asking why not SQL, or simply, why NoSQL? Don't hit the database? Then how do you keep it consistent? Without going into too much detail, we can list the main benefits realised in MongoDB as follows:
  • 1. It is just blindingly, blazingly, insanely fast. It also gives you faster development times (no data mapping to speak of, no schemas to manage…). 50 hour SQL import in 2 hours? Believe it.
    2. It's tableless: based on collections/documents.
    3. Goodbye queries, hello save/find arrays (querying inside arrays). There's no need to learn/use a querying language (such as there is with CouchDB); you can arbitrarily query inside collections, and it doesn't matter what's in the document. Mongo's indexing is one reason it's so fast. (The querying part still needs some structure added to the unstructured collection, in the form of indexes, because without them the result sets have to be filtered client-side.)
    4. It's schemaless: there is no place to define types, which is good because, contrary to what relational DB devs may believe, reality is schemaless. It has order of course, but not a rigid schema. Yet it supports full indexing; in fact you can index across documents that don't share the field you're indexing on.
    5. Devel time is faster too: no schemas to manage, very little (if any) data mapping.
    6. Learning curves are less steep, because there is no new query language to learn.
    7. Code is trimmer: no need for another ORM, just thin wrappers.
    8. Future-proof: trivially easy to add more fields, even complex fields, to objects. As requirements change, you can adapt code quickly. Sharding is built in. When volumes start increasing, indexes are necessary, and these in some sense are based on an underlying schema assumption: that of the fields to index. However, when volumes start increasing, you can be sure that MongoDB is going to serve this data up in a highly available and highly scalable way.
  • Whitehouse direct citizen engagement – 100K questions, 1.7M votes just on one topic. Experimenting with Drupal 6 + MongoDB: comments experiment, 7X improvement over SQL. Not completely magical: to scale out you will need to learn how to write performant queries.
  • Not completely magical: to scale out you will need to learn how to write performant queries.
  • Installing Mongo. Obviously if you are serious you will build from source and not install from the package at . Actually, even if you weren't serious you wouldn't have to anyway, because it's in the latest Debian, not to mention the last few Ubuntus: aptitude install mongodb-stable. After you have mongod running you'll need to install the driver for your language, which in Drupal's case is PHP of course. If you have the dh-make-php tools installed you can easily invoke pecl install mongo to do that. Right away you'll notice you don't even need half as many MongoDB functions as you do MySQL ones (18 vs. 54), and that's if you only had one SQL driver loaded. One thing you will need to do, however, is load the extension in PHP: if you haven't added the extension = line to php.ini (requires a reload), Drupal will complain when the module's install file checks for this (PECL doesn't do this when it builds). In fact, that's pretty much the only thing the installer does, quite unlike the relational-based modules that have to pre-create their entire schemas. Installing mongodb (shoutout to AOP*) into your Drupal site (using drush of course) will give you mongodb.module, essentially a collections manager that implements hook_help (sort of) and gives you mongo(), collection_name() & get sequential ID (the latter since MongoDB uses pointers/k-v's).
  • Rejected approach: denormalise data (do you really want to duplicate your data any time you need to index a type-mismatched query?). MongoDB solution: the node is a document, and all comments are stored directly on the node document.
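To make the "node is a document" idea concrete, here is a minimal sketch in JavaScript (the mongo shell's language), run against plain in-memory objects rather than a live server. The field names (`uid`, `comments`, `comment_count`) and the helper `nodesOfUserByCommentCount` are assumptions for illustration, not the module's actual schema.

```javascript
// Hypothetical node documents: comments are embedded, and comment_count
// lives on the same document as uid (all field names are assumptions).
const nodes = [
  { _id: 1, uid: 7, title: "About", comments: [{ cid: 1, body: "First" }], comment_count: 1 },
  { _id: 2, uid: 7, title: "Schedule", comments: [{ cid: 2, body: "Hi" }, { cid: 3, body: "+1" }], comment_count: 2 },
  { _id: 3, uid: 9, title: "Contact", comments: [], comment_count: 0 },
];

// Because both fields sit in one document, a single compound index can
// serve the filter and the sort. This is only the shape of that index spec.
const indexSpec = { uid: 1, comment_count: -1 };

// In-memory stand-in for: find({uid: uid}).sort({comment_count: -1})
function nodesOfUserByCommentCount(docs, uid) {
  return docs
    .filter((doc) => doc.uid === uid) // filter() copies, so docs is not mutated
    .sort((a, b) => b.comment_count - a.comment_count);
}

const mostCommented = nodesOfUserByCommentCount(nodes, 7);
console.log(mostCommented.map((doc) => doc.title)); // most commented first
```

The point of the sketch is only that the filter column and the sort column end up in the same record, which is what the two-table SQL version of "nodes of a user ordered by comment count" lacks.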
  • Installing the dev version also gives you a host of submodules implementing various MongoDB related handlers (such as blocks, fields, sessions & cache.) Some of these are already available in the Drupal 6 version, and in fact MongoDB is used in production Drupal sites, however if you are planning a new deployment there's no good reason not to use Seven. [sic]
  • Mongo watchdog and blocks are drop-in replacements for their core counterparts. Enable watchdog on a server running MongoDB and you've just created a watchdog collection. Disable the database logging (dblog) module to stop writing the same information to SQL and, my favourite part, drop the watchdog table. (Mongo watchdog is also rumoured to work in 6, although it doesn't seem to be included in the Drupal 6 release.) What you get: the ability to send off structured content to a server that is - completely separated from the MySQL cluster, hence saving on MySQL load; - faster to log to from Drupal than going through ksyslogd then a custom logger; - still queryable from the standard Drupal UI, which a custom logger wouldn't have been so easily; - convenient auto-trimming with fixed-size partitions without a need for client code, which maps very nicely to logging applications.
  • Great for statistics. Capping a collection. Despamming a capped collection.
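The auto-trimming behaviour of a capped collection can be pictured with a small in-memory model. This is a sketch only: the class name `CappedCollection` is hypothetical, and it models just the document-count cap (`max`), whereas a real capped collection is also bounded by a byte `size`.

```javascript
// Hypothetical in-memory model of a capped collection (count cap only).
class CappedCollection {
  constructor(max) {
    this.max = max; // maximum number of documents kept
    this.docs = [];
  }

  insert(doc) {
    this.docs.push(doc);
    // Once the cap is exceeded, the oldest document is discarded;
    // no client-side cleanup job is needed.
    if (this.docs.length > this.max) {
      this.docs.shift();
    }
  }

  find() {
    return this.docs.slice(); // insertion order, oldest first
  }
}

const watchdogLog = new CappedCollection(3);
for (let i = 1; i <= 5; i++) {
  watchdogLog.insert({ message: "event " + i });
}
// Only the three newest entries remain: event 3, event 4, event 5.
console.log(watchdogLog.find().map((doc) => doc.message));
```

This is why the pattern maps well to logging: writes never grow the store past its cap, and the oldest entries age out on their own.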
  • That's all there is to it, as chx wryly noted.... There's a reason people use Drupal: ease of admin, leveraging the (forgive the expression) Humongous backend of the contrib community & testing framework. Because your databases and collections (collections are the equivalent of tables in MongoDB) are created on demand as you write to them, there's no schema to load in. There's no setup to do. You simply start writing your data. You say "I want to write to this database, this collection, write out this data structure." You can instantly start writing PHP objects to it, reading them and finding them. It's so amazingly easy.
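The "created on demand as you write" behaviour can be modelled in a few lines. This is an in-memory sketch, not driver code; the class `OnDemandDatabase` and its method names are assumptions chosen for illustration.

```javascript
// Hypothetical model of a database whose collections appear on first write.
class OnDemandDatabase {
  constructor() {
    this.collections = new Map();
  }

  // Writing to a collection that does not exist yet creates it.
  save(collectionName, doc) {
    if (!this.collections.has(collectionName)) {
      this.collections.set(collectionName, []);
    }
    this.collections.get(collectionName).push(doc);
  }

  // Reading a missing collection yields nothing and creates nothing.
  find(collectionName) {
    return this.collections.get(collectionName) || [];
  }

  listCollections() {
    return [...this.collections.keys()];
  }
}

const demoDb = new OnDemandDatabase();
demoDb.save("watchdog", { type: "php", message: "Notice" }); // no setup step
console.log(demoDb.listCollections());
console.log(demoDb.find("sessions").length);
```

There is no step equivalent to loading a schema before the first `save`, which is the property the note above is praising.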
  • Conf variable in settings.php ported to 6 -
  • Problem: more users = slower lookups in Drupal. One workaround has been "no anonymous sessions"; of course this kills your reporting… (also anon functions & targeted advertising). Better: caching (APC/memcached). So enable the sessions sub-module and override in settings.php: $conf['session_inc'] = 'sites/all/modules/mongodb/mongodb_session/'; (if all goes well you will need to log in again). And yes, delete the sessions table...
  • This + simpletest + user roles (sql)
  • Assigned by session_set_save_handler() and called automatically by PHP. These functions should not be called directly. Session data should instead be accessed via the $_SESSION superglobal. perlscript:
  • Install block & block UI; disable block (core) & drop the table. > Can store arrays, such as the paths the block is visible on, and I think it's patched to give you the paths the block is INvisible on. > Again, block.install gives only a warning (since writes = creates in Mongo). > How it differs: for display, so Block has a tipplefip. Has API instead of include. > There was a time in the past when there was no dependency in the UI module (on block). > Of course implements (a hook for) block_view_alter. > Note that instead of hook_block_view_alter(), which is called for all blocks, you can also use hook_block_view_MODULE_DELTA_alter() to alter a specific block.
  • block rebuild quirk
  • Render main content block via mongodb_block Patch 725444 Actually a few months old
  • Storing fielded entities in MongoDB: storing whole fielded entities. Everything is an entity, incl. taxonomy (it's just not fieldable). D7's big, big, big thing = field storage; the D6 field storage engine is not pluggable. JSON objects are not something we tend to manipulate within Drupal or swap unchanged between browser and DB. IIRC, this has led to M devs implementing something to make it easier for our field storage to map more cleanly to the API.
  • Why this works: * Drupal's node_load() function can load a node as an object * Drush can let me write a quick import/export script to run on the command line * And MongoDB can store just about any old array of primitives
  • Do it in PHP! The data stored in MongoDB is basically exactly the data that one would get from doing a (array)node_load(). (There will be exceptions to this where non-primitives are stored in a node.) Note this is not a solution for manipulating nodes IN Drupal
  • new Drupal MongoDB module will manage persistence of data across two servers
  • Write a class that extends the MongoCollection class and pass an instance of it back from mongodb_collection(). You need to implement MOST methods, collecting the arguments and passing them to the parent.
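The subclass-and-delegate pattern described here can be sketched as follows. This is JavaScript rather than the PHP the module would use, and `Collection` is a hypothetical stand-in for the driver's collection class; `loggedCollection` plays the role the note gives to mongodb_collection.

```javascript
// Hypothetical stand-in for the driver's collection class.
class Collection {
  constructor(name) {
    this.name = name;
    this.docs = [];
  }
  insert(doc) {
    this.docs.push(doc);
    return doc;
  }
  find(query = {}) {
    // Exact-match filter on top-level keys only.
    return this.docs.filter((doc) =>
      Object.keys(query).every((key) => doc[key] === query[key]));
  }
}

// Each overridden method collects its arguments, records them, and then
// hands them to the parent implementation unchanged.
class LoggingCollection extends Collection {
  constructor(name, log) {
    super(name);
    this.log = log;
  }
  insert(...args) {
    this.log.push({ collection: this.name, method: "insert", args });
    return super.insert(...args);
  }
  find(...args) {
    this.log.push({ collection: this.name, method: "find", args });
    return super.find(...args);
  }
}

const queryLog = [];
// Factory returning the logging subclass instead of the plain class.
function loggedCollection(name) {
  return new LoggingCollection(name, queryLog);
}

const blocks = loggedCollection("block");
blocks.insert({ module: "system", delta: "main" });
blocks.find({ module: "system" });
console.log(queryLog.map((entry) => entry.collection + "." + entry.method));
```

Callers keep using the same methods, so logging can be switched on by changing only what the factory returns.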
  • DAL api: “The intent of this layer is to preserve the syntax and power of SQL as much as possible”
  • But the new twist with some of the new NoSQL stores is storage in JSON. They are more scalable than an RDBMS, but to me the real attraction seems to be the innate hierarchical storage structure that JSON (or even XML) allows. Much of our data comes in a hierarchical format, so simply converting that to JSON may be easier than the gymnastics required by conversion to a relational format. JSON objects are not something we tend to manipulate within Drupal or swap unchanged between browser and DB. IIRC, this has led to M devs implementing something to make it easier for our field storage to map more cleanly to the API. page callback => 'drupal_json': cross it with the object loading notation of the new menu system and you can get your objects in JSON with remarkable ease: $items['node/%node/json'] = array('page callback' => 'drupal_json', 'page arguments' => array(1), 'type' => MENU_CALLBACK); will get you the node in JSON format, access checked and all. No one stops you from writing foo_object_load and using %foo_object to return anything in JSON...
  • If you want to list nodes belonging to users whose usernames start with "Ab", then you are back in denormalization land -- you need to store the username in the node collection. Only one example, but it typifies something we do a lot: "it's not relational data...." except when it is. Solution: do joins in the app layer, or fail. Overall, the types of Drupal queries that can easily be done in Mongo are going to be far more common than the exceptions that don't index well in Mongo.... (Informed) Banking:
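A join in the app layer for the "Ab" example amounts to two lookups stitched together in code. The sketch below uses in-memory arrays; the sample data and the helper `nodesByUsernamePrefix` are hypothetical. Against a real server the two steps would roughly correspond to a prefix query on the users collection followed by an `$in` query on the nodes collection.

```javascript
// Hypothetical users and nodes collections, held in memory.
const users = [
  { uid: 1, name: "Abby" },
  { uid: 2, name: "Bob" },
  { uid: 3, name: "Abel" },
];
const userNodes = [
  { nid: 10, uid: 1, title: "Spring schedule" },
  { nid: 11, uid: 2, title: "Studio rules" },
  { nid: 12, uid: 3, title: "Archive plan" },
  { nid: 13, uid: 1, title: "Producer FAQ" },
];

function nodesByUsernamePrefix(userDocs, nodeDocs, prefix) {
  // Step 1: find the matching users and keep only their ids.
  const uids = new Set(
    userDocs.filter((user) => user.name.startsWith(prefix)).map((user) => user.uid)
  );
  // Step 2: fetch the nodes whose uid is in that set.
  return nodeDocs.filter((node) => uids.has(node.uid));
}

const abNodes = nodesByUsernamePrefix(users, userNodes, "Ab");
console.log(abNodes.map((node) => node.nid)); // nodes owned by Abby and Abel
```

The cost is a second round trip and an id list that grows with the number of matching users, which is the trade-off against storing the username on every node.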
  • I have seen the future and it is Mongo(DB)

    1. 1. Drupal + Mongo: Bigger is Better? A lightning talk by Forest Mars
    2. 2. Drupal (FTW!) Drupal: Aspect-oriented (modular) “social-publishing” framework, written in PHP (PDO), that allows easy creation and integration of multiple data-rich social networking sites and robust web applications & services. Used on large high-performance websites. Roadmap anticipates future web (RDF, GGG) & emerging technologies (MongoDB!)
    3. 3. The Problem with SQL* Webapps > Once you have related data, you need joins > Indexing on joins does not work that well ...So you: > Introduce denormalisation > Build extra tables *(RDBMS)
    4. 4. The Problem with Schema > Changes broadly lock data, requiring downtime or putting excessive, short-term load on the system. > The data integrity constraints don’t quite support application integrity constraints. For example, there’s no standard way in SQL to require a column to contain only valid URLs. > Coordinating schema changes with application code changes is difficult, largely because schema changes lack convenient coupling with code.
    5. 5. “In theory, theory and practice are exactly the same. In practice, they're completely different.”
        > Practice: WiP (works in practice), Scales well, Future-proof
        > Theory: Clean abstractions, Strong semantics, Smart-proof
    6. 6. ACID / BASE
        > BASE: Basically Available, Scales well, Eventually consistent
        > ACID: Atomic, Consistent, Isolated, Durable
    7. 7. (Why Mongo?) Highly Available, Easily Scalable, Partition Tolerant
    8. 8. (Why Mongo?) Tableless, Queryless, Schemaless, Blazingly fast, Faster development times, Nicer learning curves, Code is trimmer, Future-proof
    9. 9. Performance / Scaling / direct engagement 15K/day contact requests 2M records in db 4GB db: replication risks MongoDB: 180M+ documents in 1 collection
    10. 10. Writing Performant Queries
        > Sort column must be the last column used in the index.
        > Range query must also be the last column in an index.
        > Only use a range query or sort on one column.
        > Conserve indexes by re-ordering columns used in straight = queries.
        > Never use Mongo's $ne or $nin operators.
        > Never use Mongo's $exists operator.
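The column-ordering rules on this slide can be expressed as a small helper that builds an index specification. This is a sketch under assumptions: `buildIndexSpec` is a hypothetical function, and sorting the equality columns alphabetically is just one possible convention for the slide's "re-ordering columns" advice.

```javascript
// Build an index spec that follows the slide's ordering rules:
// equality columns first (in a canonical order), then at most one
// range-or-sort column, which must come last.
function buildIndexSpec(equalityFields, rangeOrSortFields = []) {
  if (rangeOrSortFields.length > 1) {
    throw new Error("Only use a range query or sort on one column");
  }
  const spec = {};
  // Canonical (alphabetical) order means queries that test the same
  // columns in a different order can share one index.
  for (const field of [...equalityFields].sort()) {
    spec[field] = 1;
  }
  for (const field of rangeOrSortFields) {
    spec[field] = 1; // the range/sort column goes last
  }
  return spec;
}

const specA = buildIndexSpec(["uid", "status"], ["created"]);
const specB = buildIndexSpec(["status", "uid"], ["created"]);
console.log(JSON.stringify(specA)); // {"status":1,"uid":1,"created":1}
console.log(JSON.stringify(specA) === JSON.stringify(specB)); // true
```

Both calls yield the same specification, so a query on `uid` and `status` and a query on `status` and `uid` need only one index between them.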
    11. 11. Install Mongo
        public array authenticate ( string $username , string $password )
        public array command ( array $data )
        __construct ( Mongo $conn , string $name )
        public MongoCollection createCollection ( string $name [, bool $capped = FALSE [, int $size = 0 [, int $max = 0 ]]] )
        public array createDBRef ( string $collection , mixed $a )
        public array drop ( void )
        public array dropCollection ( mixed $coll )
        public array execute ( mixed $code [, array $args = array() ] )
        public bool forceError ( void )
        public MongoCollection __get ( string $name )
        public array getDBRef ( array $ref )
        public MongoGridFS getGridFS ([ string $prefix = "fs" ] )
        public int getProfilingLevel ( void )
        public array lastError ( void )
        public array listCollections ( void )
        public array prevError ( void )
        public array repair ([ bool $preserve_cloned_files = FALSE [, bool $backup_original_files = FALSE ]] )
        public array resetError ( void )
        public MongoCollection selectCollection ( string $name )
        public int setProfilingLevel ( int $level )
        public string __toString ( void )
    12. 12. Real World Example list the nodes of a user ordered by comment count uid is stored in the node table and the comment count is in node_comment_statistics > thus query cannot be indexed (Comparison of dissimilar columns may prevent use of indexes if values cannot be compared directly without conversion.)
    13. 13. What's already in Drupal * mongodb: support library for the other modules (D7/D6) * mongodb_block: Store block information in mongodb. Very close to the core block API. * mongodb_cache: Store cache items in mongodb. * mongodb_session: Store sessions in mongodb. * mongodb_watchdog: Store watchdog messages in mongodb * mongodb_queue: DrupalQueueInterface implementation using mongodb. * mongodb_field_storage: Store the fields in mongodb.
    14. 14. Mongo Watchdog
    15. 15. mongodb_watchdog
        mongo> db.watchdog.drop();
        mongo> db.createCollection("watchdog", {capped: true, size: 1000000, max: 10000});
    16. 16. "It's really incredible how much you don't have to do." -chx
    17. 17. mongodb_cache $conf['page_cache_without_database'] = TRUE;
    18. 18. mongodb_sessions $conf['session_inc'] = 'sites/all/modules/mongodb/mongodb_session/';
    19. 19. mongodb_sessions
        function mongodb_session_user_update($edit, $account) {
          if (!module_exists('mongodb_field_storage')) {
            $roles = _mongodb_session_get_roles($account);
            $save = (array) $account + array(
              '_id' => (int) $account->uid,
              '@bundle' => 'user',
              '@fields' => array(),
              'roles' => $roles,
            );
            foreach (array('uid', 'created', 'access', 'login', 'status', 'picture') as $key) {
              $save[$key] = (int) $save[$key];
            }
            mongodb_collection('fields_current', 'user')->save($save);
          }
          return $roles;
        }
    20. 20. mongodb_sessions
        * The user-level session storage handlers:
        * - _drupal_session_open()
        * - _drupal_session_close()
        * - _drupal_session_read()
        * - _drupal_session_write()
        * - _drupal_session_destroy()
        * - _drupal_session_garbage_collection()
        * are assigned by session_set_save_handler() in
    21. 21. mongodb_block
        function hook_block_view_alter(&$data, $block) {
          // Remove the contextual links on all blocks that provide them.
          if (is_array($data['content']) && isset($data['content']['#contextual_links'])) {
            unset($data['content']['#contextual_links']);
          }
          // Add a theme wrapper function defined by the current module to all blocks
          // provided by the "somemodule" module.
          if (is_array($data['content']) && $block->module == 'somemodule') {
            $data['content']['#theme_wrappers'][] = 'mymodule_special_block';
          }
        }
    22. 22. Block rebuild
        Notice: Undefined variable: block_html_id in include() (line 4 of /var/www/Drupal/drupal-7.0-alpha4/themes/garland/block.tpl.php).
        Notice: Undefined variable: block_html_id in include() (line 4 of /var/www/Drupal/drupal-7.0-alpha4/themes/garland/block.tpl.php).
        Notice: Undefined variable: block_html_id in include() (line 4 of /var/www/Drupal/drupal-7.0-alpha4/themes/garland/block.tpl.php).
        Notice: Undefined variable: block_html_id in include() (line 4 of /var/www/Drupal/drupal-7.0-alpha4/themes/garland/block.tpl.php).
        Notice: Undefined variable: block_html_id in include() (line 4 of /var/www/Drupal/drupal-7.0-alpha4/themes/garland/block.tpl.php).
    23. 23. Render main content block
        function mongodb_block_theme() {
          // hook_theme() implementations return an array of theme definitions.
          return array(
            'block' => array(
              'render element' => 'elements',
              'template' => 'block',
              'path' => drupal_get_path('module', 'block'),
            ),
          );
        }
        function mongodb_block_mongodb_block_info_alter(&$blocks) {
          // Enable the main content block.
          $blocks['system_main']['region'] = 'content';
          $blocks['system_main']['weight'] = 0;
          $blocks['system_main']['status'] = 1;
        }
        function mongodb_block_rehash($redirect = FALSE) {
          $collection = mongodb_collection('block');
          $theme = variable_get('theme_default', 'garland');
          // ...
    24. 24. mongodb_field_storage
        don't: variable_set('field_storage_default', 'mongodb_field_storage');
        instead: $conf['field_storage_default'] = 'mongodb_field_storage'; in settings.php
        ESP. for session/caching backends
    25. 25. Drupal 7 Everything In MongoDB
    26. 26. Import all Nodes > MongoDB* (* in 14 l.o.c.)
        // Connect
        $mongo = new Mongo();
        // Get the database (it is created automatically)
        $db = $mongo->testDatabase;
        // Get the collection for nodes (it is created automatically)
        $collection = $db->nodes;
        // Get a listing of all of the node IDs
        $r = db_query('SELECT nid FROM {node}');
        // Loop through all of the nodes...
        while ($row = db_fetch_object($r)) {
          print "Writing node $row->nid\n";
          // Load each node and convert it to an array.
          $node = (array) node_load($row->nid);
          // Store the node in MongoDB
          $collection->save($node);
        }
    27. 27. Import all Nodes > MongoDB* (* in 14 l.o.c.)
        # drush script mongoimport.php
        # use testDatabase;
        # db.nodes.find( {title: /about/i} , {title: true}).limit(4);
    28. 28. Import all Nodes > MongoDB* (* in 14 l.o.c.)
        <?php
        // Connect
        $mongo = new Mongo();
        // Write our search filter (same as shell example above)
        $filter = array(
          'title' => new MongoRegex('/about/i'),
        );
        // Run the query against the database the import script wrote to,
        // getting only 5 results.
        $res = $mongo->testDatabase->nodes->find($filter)->limit(5);
        // Loop through and print the title of each article.
        foreach ($res as $row) {
          print $row['title'] . PHP_EOL;
        }
        ?>
    29. 29. What's Next?
        > Multiple DB servers – Data Persistence
        > Query logging – Devel support
        > Query builder – Views integration
        > DBTNG – Full DB Abstraction
        > MongoDB API
    30. 30. Query Logging
        > Extend Mongo collection class
        > Pass instance back from mongodb_collection
        > Implement all collection methods
    31. 31. Drupal Mongo API
        $collection = mongodb_collection('myname');
        $collection->find(array('key' => $value));
        $collection->insert($object);
        $collection->remove(array('_id' => $item->id));
    32. 32. Full DBTNG Implementation DO NOT USE !!!
    33. 33. awesomesauce
        page callback => 'drupal_json'
        $items['node/%node/json'] = array(
          'page callback' => 'drupal_json',
          'page arguments' => array(1),
          'type' => MENU_CALLBACK,
        );
    34. 34. Where Mongo Won't Work Ex. list nodes belonging to users whose usernames start with 'Ab'
    35. 35. Thanks! Comments & questions to: ForestMars ForestMars Facebook, LinkedIn, etc. twitter: @elvetica (identica@forest)