You can get far by caching Drupal's content feeds, and there are a lot of caching layers available. But when you need a bit of intelligence in your caching layer, diving deep into the world of Varnish VCL configurations isn't the only option.
We started out trying to optimize Drupal's ability to deliver JSON feeds, using MongoDB field storage and SOLR-backed Views behind a Varnish caching layer, and ended up with a performance-optimized standalone Node.JS/MongoDB stack.
In this presentation we'll show a real-world case where Drupal's content is optimized and indexed into MongoDB and then delivered as JSON at astonishing speed by a very simple Node.JS layer.
The setup serves most of the video content to Finland's biggest media corporation, Sanoma. It's the sole source of video content to their online TV service, Ruutu.fi.
The same setup could serve as a backend for high-volume JavaScript applications, replicate a lot of content around the world, or improve the UX of a Drupal site by adding super-fast asynchronous APIs.
In the presentation we'll look at the architecture, the development phases, the performance optimizations and the lessons learnt in storing complicated data structures in Drupal and MongoDB. We'll also look at the current development efforts to get the system in shape for a Drupal 8 upgrade in the near future.
The session video (slides with audio) can be viewed on YouTube: https://www.youtube.com/watch?v=VmTd6hITVVA
2. Pre-requisites
Moderate understanding of PHP and Javascript
Moderate understanding of Drupal’s internals
Some idea of how MySQL, NoSQL, SOLR and Varnish
work
3. What you’ll learn
How to speed up Drupal content delivery with Node.JS
in real life
Why you would speed up Drupal content delivery with Node.JS
How Node.JS scales compared to Drupal/PHP
4. About me
Kalle Varisvirta
Technology Director at Exove
Exove is a technology company based in Finland,
Estonia and the UK with over 70 professionals
5. The project
The customer is Nelonen, a Finnish TV broadcaster
owned by Sanoma
The platform is used by their subscription video and
catchup video on-demand service
7. The architecture
A video content management system
It gets fed by linear TV programming as well as
uploaded content
It’s supposed to deliver all content to downstream
clients, multiple websites, mobile and TV apps and
other CMSs
8. The architecture
The system handles only the metadata of videos
Any uploaded videos are moved away from the CMS,
to a separate stream controlled by the CMS
Videos are streamed from multiple streaming locations
as directed by the CMS
9. [Architecture diagram: Video content management system (Drupal 7); Linear television data (ERP); Drupal 7 site; Wordpress site]
10. Drupal
Focusing solely on content management, not delivering
any web pages outside the admin
There are custom modules for
Integrating to linear TV ERP system
Controlling the video binary management system
Marking videos ready to display
11. [Architecture diagram: Video content management system (Drupal 7); Linear television data (ERP); Drupal 7 site; Wordpress site]
12. [Architecture diagram: Video content management system (Drupal 7); Linear television data (ERP); Drupal 7 site; Drupal 7 site; iOS app; Wordpress site; Samsung SmartTV app; Android app]
13. Drupal optimizations
It’s not like we didn’t think about performance early on
The Drupal 7 site was built on MongoDB field storage,
so it was standing on a fast database
The Views feeds (JSON) came from a SOLR
backend
14. Drupal optimizations
MongoDB field storage
Is faster
Isn’t compatible with Views, unless used with Entity
Field Queries (EFQs)
EFQs didn’t originally work well with MongoDB,
e.g. booleans just didn’t work
In the end, it isn’t worth the trouble
15. Syndicating content
Big feeds are always too slow when they come from Drupal
The storage backend doesn’t make a difference (MySQL,
MongoDB, SOLR)
The Field API is extensible and flexible, and thus slow
16. Field API
40 fields per item, 1000 items in feed page
40 000 calls to every hook per page load
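The arithmetic behind that figure can be sketched in a few lines. This is only a back-of-the-envelope model: it assumes each hook is invoked once per field per item, and the function name and the optional hook count are made up for illustration.

```javascript
// Rough model of Field API overhead for one feed page.
// Assumption: every hook runs once for each field of each item.
function hookCallsPerPage(fieldsPerItem, itemsPerPage, hookCount = 1) {
  return fieldsPerItem * itemsPerPage * hookCount;
}

// The numbers from the slide: 40 fields, 1000 items.
const callsPerHook = hookCallsPerPage(40, 1000);
console.log(callsPerHook); // 40000

// Hypothetical: three different hooks firing per field.
console.log(hookCallsPerPage(40, 1000, 3)); // 120000
```

Even if a single call is cheap, the multiplication is what makes big feeds slow regardless of the storage backend.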
18. Caching feeds
Downstream clients want integration feeds limited by
time
The time attribute is given in seconds, based on the client’s last fetch
To deal with existing content
To deal with changed content
Caching just doesn’t cut it
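One way to see why caching struggles here is to simulate it. The sketch below assumes the time attribute is a Unix timestamp of the client's last fetch and that the cache is keyed by the full URL; the `since` parameter name, the feed path and the polling times are all hypothetical.

```javascript
// Hypothetical feed URL; "since" is an assumed parameter name.
function feedCacheKey(sinceEpochSeconds) {
  return `/feed.json?since=${sinceEpochSeconds}`;
}

const cache = new Map();
let hits = 0;
let misses = 0;

// A URL-keyed cache in front of an expensive feed generator.
function fetchFeed(sinceEpochSeconds) {
  const key = feedCacheKey(sinceEpochSeconds);
  if (cache.has(key)) {
    hits += 1;
    return cache.get(key);
  }
  misses += 1;
  // Stand-in for the expensive Drupal feed generation.
  const body = JSON.stringify({ since: sinceEpochSeconds, items: [] });
  cache.set(key, body);
  return body;
}

// Five clients whose last fetch happened a few seconds apart.
const base = 1400000000;
for (let client = 0; client < 5; client += 1) {
  fetchFeed(base + client * 7);
}
console.log({ hits, misses }); // { hits: 0, misses: 5 }
```

Because nearly every request carries a different timestamp, nearly every request is a cache miss and ends up hitting the backend.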
20. We need a new approach
Indexing outside of Drupal felt like the only way out
We decided to go with SOLR via the ApacheSOLR
integration
Distribution by a simple REST API
21. We need a new approach
Due to very frequent updates in popularity data and the
need to order by popularity, SOLR indexing was found
to be too slow
All data was moved to MongoDB, except for the full-text
search backend
22. [Architecture diagram: Video content management system (Drupal 7); Linear television data (ERP); Drupal 7 site; Drupal 7 site; MongoDB, SOLR and Node.JS REST API; iOS app; Wordpress site; Samsung SmartTV app; Android app]
23. [Architecture diagram: Video content management system (Drupal 7); Linear television data (ERP); Drupal 7 site; Drupal 7 site; MongoDB, SOLR and Node.JS REST API; iOS app; Wordpress site; Samsung SmartTV app; Android app. Caption: Indexing is done using a Drupal module, MongoDB indexer]
24. MongoDB indexer
MongoDB indexer uses a direct connection to
MongoDB
We have an indexing API on the roadmap, but
currently don’t have a need for it
MongoDB indexer also de-normalizes the data for
optimized distribution by the Node.JS REST API
It’s a contributed module, currently awaiting approval
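To illustrate what de-normalizing for delivery can mean, here is a minimal sketch. It is not the module's code (the real indexer is a Drupal module), and every field name and data shape is hypothetical. The idea is that references to other entities are resolved at index time, so the API can return a stored document without any joins.

```javascript
// Hypothetical shapes: a node referencing a series and taxonomy terms by id.
function denormalizeVideo(node, lookups) {
  const series = lookups.series[node.seriesId];
  return {
    _id: node.nid,
    title: node.title,
    published: node.status === 1,
    // Embed the referenced series instead of storing only its id.
    series: series ? { id: series.nid, title: series.title } : null,
    // Resolve term ids to names; unknown ids are dropped.
    categories: (node.categoryIds || [])
      .map((tid) => lookups.terms[tid])
      .filter(Boolean),
  };
}

const lookups = {
  series: { 7: { nid: 7, title: 'Example series' } },
  terms: { 3: 'drama', 5: 'comedy' },
};
const node = {
  nid: 101,
  title: 'Episode 1',
  status: 1,
  seriesId: 7,
  categoryIds: [3, 5, 99],
};

const doc = denormalizeVideo(node, lookups);
console.log(JSON.stringify(doc, null, 2));
```

The trade-off is that a change to a series title means re-indexing every video that embeds it, which is acceptable when reads vastly outnumber writes.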
28. Node.JS
Node.JS is JavaScript running on the server
The JavaScript is run by V8, Chrome’s JavaScript
engine
It’s non-blocking, event-based and, when used
correctly, blazing fast
30. Node.JS
No Node.JS framework used
No fronting Nginx used, requests are passed from the
main program to separate sub-programs manually
Running on three nodes, sharing a MongoDB replica
set and fronted by an F5 load balancer
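Passing requests to sub-programs without a framework could look roughly like the sketch below. It uses only Node's built-in `http` module; the service names (`videos`, `search`) and the idea of routing on the first path segment are assumptions for illustration, not the project's actual layout.

```javascript
// Write a JSON response with the given status code.
function sendJson(res, status, payload) {
  res.writeHead(status, { 'Content-Type': 'application/json' });
  res.end(JSON.stringify(payload));
}

// Hypothetical sub-programs, keyed by the first path segment.
const services = {
  videos: (req, res) => sendJson(res, 200, { service: 'videos' }),
  search: (req, res) => sendJson(res, 200, { service: 'search' }),
};

// The "main program": pick a sub-program by hand, no framework involved.
function handleRequest(req, res) {
  const { pathname } = new URL(req.url, 'http://localhost');
  const [firstSegment] = pathname.split('/').filter(Boolean);
  const known = Object.prototype.hasOwnProperty.call(services, firstSegment);
  if (!known) {
    return sendJson(res, 404, { error: 'not found' });
  }
  return services[firstSegment](req, res);
}

// Not called here; e.g. startServer(8080) would begin listening.
function startServer(port) {
  const http = require('http');
  return http.createServer(handleRequest).listen(port);
}
```

With something like this, a request to `/videos/123` goes to the `videos` handler and anything unknown gets a 404.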
31. Node.JS
The code is very simple, as it’s mostly just passing
information out of MongoDB
The API has filters that are validated and then used
for filtering; other than that, very little processing is
done
The backend also has some special services
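Validating filters before they reach the database might be sketched like this. The filter names and their validation rules are hypothetical. The point is that only whitelisted, well-formed parameters end up in the query object, and everything else is rejected rather than passed through.

```javascript
// Hypothetical whitelist: filter name -> parser returning a value or undefined.
const allowedFilters = {
  series: (raw) => (/^\d+$/.test(raw) ? Number(raw) : undefined),
  published: (raw) => {
    if (raw === 'true') return true;
    if (raw === 'false') return false;
    return undefined;
  },
  category: (raw) => (/^[a-z0-9-]{1,64}$/.test(raw) ? raw : undefined),
};

// Turn raw request parameters into a plain query object.
function buildQuery(params) {
  const query = {};
  const rejected = [];
  for (const [name, raw] of Object.entries(params)) {
    const known = Object.prototype.hasOwnProperty.call(allowedFilters, name);
    const value = known ? allowedFilters[name](String(raw)) : undefined;
    if (value === undefined) {
      rejected.push(name);
    } else {
      query[name] = value;
    }
  }
  return { query, rejected };
}

const result = buildQuery({
  series: '42',
  published: 'true',
  $where: 'sleep(1000)', // unknown name, so it never reaches the query
});
console.log(result);
```

The resulting `query` object is a plain filter document of the kind the MongoDB Node.js driver's `collection.find()` accepts, which keeps the API layer thin.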
34. Node.JS
For a PHP programmer, it’s quite a change
Due to the asynchronous nature, an understanding of
parallel programming is needed
There are npm modules to help
During the lifetime of the project (18 months) both
Node.JS and MongoDB have evolved quite a bit
36. Node.JS optimizations
Separating the different Node.JS services
Moving the MongoDB instances off the Node.JS servers
MongoDB instances are very, very I/O intensive
37. Drupal optimizations
Moving integrations outside Drupal for clarity
Using a fake Drupal 8 REST API to connect the
integrations to Drupal
To be able to upgrade some time soon
We released our ‘fake Drupal 8 REST API’ module,
it’s in a sandbox pending approval