How we maintain 200+ Drupal sites in Georgetown University
Vadym Myrgorod
vm386@georgetown.edu
@dealancer
About me
About Georgetown University
258 sites that look the same
Session plan
Infrastructure
Drupal CMS: Feature A, Feature B, Feature C, Feature D, Feature E
Drupal CMS: Feature A, Feature B, Feature C, Feature D, Feature E; Site 1, Site 2, Site 3, Site 4, Site 5, Site 6, Site 7
Multisiting
sites.php 
<?php 
#biology 
$sites['biology.georgetown.edu'] = 'biology'; 
$sites['biology.gudrupalstg.georgetown.edu'] = ...
Multisiting
Site 1 Site 2 Site 3 Site 4 Site 5 Site 6 Site 7
Well known Drupal cloud
Dev Test Prod
Local development
Set up any subsite locally within a few minutes
Sitesync script
Script highlights 
# Getting DB from the remote server 
ssh $HOST "drush @gudrupal.$ENV -l $SUBSITE sql-dump --skip-tables...
local.settings.inc 
if (file_exists('sites/local.settings.inc')) { 
require_once('sites/local.settings.inc'); 
}
local.settings.inc 
$remote_url = 'http://' . $sitedir . '.georgetown.edu'; 
$conf['stage_file_proxy_origin'] = $remote_ur...
Development workflow
Deploy new sites, don’t deploy untested code
A successful Git branching model
http://nvie.com/posts/a-successful-git-branching-model/
Environments
Dev Test Prod
develop
hotfixes
personal branches
feature branches
master branch
tag
tag
Change Control
Be careful with Features, do not overuse them
Features are not a panacea
DevOps
Run a certain number of tasks almost every day across 258 sites
How I did it 6 years ago 
Advice from my PM: do as described above if it is urgent
What I’ve discovered 
$ drush updb -y 
$ drush @gudrupal.prod -l biology fra -y 
$ drush @sites dis devel
What I am doing now 
$ ./gudrupal-bulk-drush.sh subsites.txt prod "updb -y" 
"fra -y" "cc all" > ~/deployment-2014-07-21.t...
What the script can do that Drush can’t
What else can be done
Creating a new site
We needed to create 80 new websites in less than a week
Before, we launched at least 5 sites a week
It took me more than a day to create my first new site
4 scripts to automate site creation
Site creation script
Site installation script
Script that adds a new SAML record
Sync script
Finally, bulk scripts that do all previous tasks in a batch
Then and now
Why do producers of hardware make it slower than it actually could be?
Add a delay to your bulk scripts to protect the server from high load
Introducing Druml 
github.com/georgetown-university/druml
Questions? 
Email: vm386@georgetown.edu 
Twitter: @dealancer
Thanks! 
Email: vm386@georgetown.edu 
Twitter: @dealancer
Published in: Software
  • Hi, my name is Vadym Myrgorod, and welcome to my session. I will talk about our best practices for maintaining more than 200 Drupal sites at Georgetown University.
  • I have been doing Drupal for more than 6 years.

    I am originally from Ukraine. I moved to the US 4 months ago.

    And right now I work at Georgetown University.
  • There are many interesting facts about Georgetown University.
  • Georgetown University was founded in 1789, one year before Washington, DC was founded. It was named in honour of King George II.

    Georgetown University was attended by Bill Clinton and by José Manuel Barroso, president of the European Commission.

    There are approx. 100 employees working in the IT department.

    There is a huge migration to Drupal going on right now.
  • We actually have 258 sites. All sites look the same, and changes need to be pushed across all of them.
  • This session won’t answer the question of why we have so many sites, but it will tell you:

    What infrastructure we have
    How we develop websites
    How we perform daily tasks
    And how we create new sites
  • I would like to start with our infrastructure, as it is important for understanding everything else.
  • It’s obvious we are using Drupal content management system :)

    We have configuration, views, content types and even custom code stored inside features. Each feature represents a specific part of the functionality, such as the Homepage, Image rotator, List pages, etc.

    When a new site is installed, features are enabled by install profiles. An install profile also provides some basic configuration that cannot be stored in features and creates Lorem Ipsum content. We have Unit and School install profiles, which provide slightly different functionality.
  • There are several ways to organise multiple Drupal sites. We use Drupal multisiting, which comes out of the box. It allows us to keep all of our sites physically within the same Drupal directory. This also saves space on the server. However, each site uses its own files directory and database.
  • A few words about multisiting for those who don’t know it. It is a standard Drupal feature. To add a new site you have to create a subsite directory in the sites directory. You also need to specify the domains associated with this site name in the sites.php file.
  • This is sample content of the sites.php file. Here we assign production, test and dev domains to subsites. This lets Drupal know which site you are accessing when you hit a URL with a specific domain.

    In reality we have more than 200 blocks like this in sites.php ;)
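The two manual steps described above (create the subsite directory, then register its domains in sites.php) can be sketched as a small shell helper. This is only an illustration: the function names and arguments are hypothetical, and the domain patterns are copied from the sites.php sample on the slide.

```shell
#!/bin/sh
# Hypothetical helper: register a new subsite in a Drupal multisite tree.
# Domain patterns follow the sites.php sample from the slides; the function
# names and arguments are illustrative, not the team's actual script.

# Print the sites.php block for one subsite (prod, test and dev domains).
sites_php_block() {
    name=$1
    printf "#%s\n" "$name"
    printf "\$sites['%s.georgetown.edu'] = '%s';\n" "$name" "$name"
    printf "\$sites['%s.gudrupalstg.georgetown.edu'] = '%s';\n" "$name" "$name"
    printf "\$sites['%s.gudrupaldev.georgetown.edu'] = '%s';\n" "$name" "$name"
}

# Create sites/<name> with its own files directory and append the domains.
create_subsite() {
    root=$1
    name=$2
    mkdir -p "$root/sites/$name/files"
    # Start sites.php with a PHP open tag if it does not exist yet.
    [ -f "$root/sites/sites.php" ] || printf '<?php\n' > "$root/sites/sites.php"
    sites_php_block "$name" >> "$root/sites/sites.php"
}
```

Calling `create_subsite /path/to/drupal biology` would add a block like the one on the slide. A real site still needs its own settings.php and database, which this sketch does not create.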
  • We moved common site settings out of settings.php into common.settings.inc, which is included in every settings file. That trick saves a lot of time if you need to update settings for all of the subsites, and it makes it possible to override a setting for a specific website.

    Also, all modules that are common to all the sites can be stored in the sites/all directory.
  • And now an interesting topic. We have a Single Sign On feature for all university services. User accounts are created via LDAP.

    We use the simpleSAMLphp library and the Drupal module of the same name to implement Single Sign On. Anybody who has a university account can log in to any Drupal site.

    When a GU user logs in to a website via SAML, a corresponding Drupal user is created automatically.
  • And of course we use a Drupal cloud. It is a very well known Drupal cloud and it is really hard not to guess its name :) Of course it is the Boston-based Drupal company :) I am not doing any advertisement ;)
  • Let’s go further and talk about how we do local development. This is actually where some automation starts.
  • As there are more than 200 sites, certain bugs can only be reproduced on a few of them, so we needed the ability to quickly set up any subsite locally in just a few minutes.
  • Do you know that many developers are very lazy, and me too :) Each time I start a new job I create a sitesync script :) This script copies a site from the server to the local environment.
  • Here is a demo of how it works. Here we are. We have a site ready for local development in just a few seconds.

    I shortened this video. In reality it takes 2 minutes, though that is significantly less than doing it manually, which can take up to 30 minutes or even 1 hour.
  • On this slide I highlighted only the important parts of the script.

    The first snippet shows how we get the DB from the remote server using Drush. There is a reason why I run Drush through SSH: when I specify a subsite using the -l option from my machine, it does not work correctly. I think it depends on the operating system, so maybe in the future we will use Vagrant.

    Also, when you sync a site from the remote machine to the local machine, CSS stylesheets are missing for some themes, like Adaptive themes, because they are generated and stored in the files directory. So you need to regenerate them by hitting the Save button on the theme settings page. The second snippet shows how the PHP code that re-saves theme settings is called via a Drush command.
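A minimal sketch of how such a sitesync helper might be laid out is shown here. Only the remote sql-dump command comes from the slide. The local import through `drush sql-cli`, the dump file location, the `DRY_RUN` switch and the function names are assumptions, and the theme re-save step from the slide is left out.

```shell
#!/bin/sh
# Sketch of a sitesync-style helper. The remote dump command mirrors the
# slide; DUMP_FILE, DRY_RUN and the import step are assumptions.

# Run a command line, or only print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        printf '%s\n' "$1"
    else
        sh -c "$1"
    fi
}

# Usage: sitesync <ssh-host> <env> <subsite>
sitesync() {
    host=$1
    env=$2
    subsite=$3
    dump_file=${DUMP_FILE:-/tmp/$subsite.sql.gz}

    # 1. Dump the remote database over SSH (command taken from the slide).
    run "ssh $host \"drush @gudrupal.$env -l $subsite sql-dump --skip-tables-key=common --gzip\" > $dump_file"

    # 2. Load the dump into the local copy of the subsite (assumed step).
    run "gunzip -c $dump_file | drush @gudrupal.local -l $subsite sql-cli"

    # 3. Clear caches so the local site picks up the imported data.
    run "drush @gudrupal.local -l $subsite cc all"
}
```

Running `DRY_RUN=1 sitesync some.host prod biology` prints the three command lines without executing them, which is a cheap way to check the aliases before touching a real database.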
  • We use a local.settings.inc file where we store local credentials and local settings.

    This file is included from common.settings.inc. Keep this file on the local machine only. If this file exists, it is included and the local settings are applied.
  • We use the Stage File Proxy module, which automatically fetches files such as images in the files directory from the remote site to your local site. This is extremely useful: you don’t need to copy files via SSH or FTP.

    We also use the Reroute Email module, which reroutes any outgoing email to your email address. If you are a Mac user, you should know that a Mac is configured to send emails by default. If you don’t want site users to see some weird emails, you can use this module.
  • Our development workflow does not look like the one in this image.

    But this is how it could have looked if we had not made some big improvements.
  • Deploy new sites, don’t deploy untested code. This is the major problem we were facing.

    Because creating a new site involves deploying code to production, we need to be careful about what we are deploying. We need to deploy new features, bug fixes and new sites separately.
  • Eventually we came to this solution.

    We use "A successful Git branching model", also known as Git Flow. There is also a Git extension that implements this model. Git Flow simplifies the way 2 or more developers work together on new features and maintain existing code.

    There are 3 main branches: master, hotfixes and develop. The master branch contains tested code ready for release; to make a release you simply create a tag on this branch and deploy this tag to production.

    If you have something urgent to fix, you make the fix in the hotfixes branch, test it and then merge it into master.

    All development goes into the develop branch. If you work on a big feature you can use a separate branch for it. I have had many cases in my life where a feature branch was merged into the develop branch only after several releases had been done. So it is important to have feature branches. Doing it that way, you prevent unstable code from going into production.
  • We use 3 environments: dev, test and prod. They are completely separate environments. Each site has its own database, files directory and local git repo in each environment.

    The dev environment usually contains the develop branch, but it can also be set to work with hotfixes, personal or feature branches.

    The test environment usually contains the master branch or a tag from the master branch. In general it contains any code that should be tested before it goes to production.

    The prod environment always points to a tag of the master branch that was previously tested.
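The hotfix path described above (fix on hotfixes, merge into master, tag master, point prod at the tag) could be sketched with plain git commands as follows. The function name and the tag format are assumptions; how the tag is then switched on in the hosting environment is not covered by the talk and is not shown.

```shell
#!/bin/sh
# Sketch of the release step: merge a tested hotfix into master and tag it.
# The function name and the tag naming are assumptions, not the team's script.

# Usage: release_hotfix <repo-dir> <tag>
release_hotfix() {
    repo=$1
    tag=$2
    # Switch to master, merge the tested hotfixes branch, then tag the result.
    git -C "$repo" checkout -q master &&
    git -C "$repo" merge -q --no-ff -m "Merge hotfixes for $tag" hotfixes &&
    git -C "$repo" tag -a "$tag" -m "Release $tag"
}
```

The `--no-ff` merge keeps an explicit merge commit on master, so every release tag points at a commit that records which branch it came from.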
  • We also use some Change Control procedures that require us to have:

    A deployment script and a backup plan ready. These scripts must be approved by several people before they can be used.

    A deployment scheduled in the outage window. The outage window is the time during which the university is not working, e.g. nights or weekends.

    Then we perform the deployment using this script, and if something goes wrong we need to run the backup script to roll back the changes. There can be exceptions that change the deployment plan during the deployment, but they should be discussed with the deployment manager.

    We also need to test changes. This change control is very important because we have lots of sites and we should keep any serious problems from going to production.

    What about minor deployments such as CSS changes? Well, it is up to you to decide which changes should go through change control.
  • And be careful with Features; try not to overuse them.
  • Really, Features are not a panacea; they cannot solve all the deployment problems in the world.

    Now I know that we should strongly avoid “featurifying” any objects or entities that contain numeric IDs. It could be anything that references content, vocabularies or roles. If the content is changed or even deleted, it will mess up a feature and we will see the feature in the overridden state. There is no guarantee that the same object will have the same ID across multiple sites, because content can be created in a different order.

    Moreover, the AUTO_INCREMENT delta (MySQL's auto_increment_increment setting) can be changed, and this is something we cannot control. We depend on our hosting, which can change the MySQL configuration at any moment. If the delta changes from 1 to 5, role IDs that went 1, 2, 3 before the change could go 1, 6 and 11 after it. This is why new sites will have overridden features.

    We had this problem: we lost IMCE profile settings for new sites because of an auto increment delta change. It happened because the IMCE module was storing settings in Drupal variables, so after the delta changed, the IMCE module could not find the configuration for the affected profiles on new sites ;) That was fun to debug :) We had many “positive” emotions ;)

    However, the solution was quite easy. We moved the IMCE settings out of the feature export into hook_install and hook_update_N. Physically they are stored in the same feature, but in the install file.
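To make the ID drift concrete, here is a small arithmetic sketch of the IDs a fresh table would hand out under MySQL's `auto_increment_increment` and `auto_increment_offset` settings. The function name is made up for illustration, and an offset of 1 is assumed.

```shell
#!/bin/sh
# Print the first <count> IDs MySQL would assign in a fresh table, given
# auto_increment_increment (<increment>) and auto_increment_offset (<offset>).
# Assumes offset <= increment, where IDs follow offset + N * increment.
auto_increment_ids() {
    count=$1
    increment=$2
    offset=$3
    i=0
    ids=""
    while [ "$i" -lt "$count" ]; do
        ids="$ids $((offset + i * increment))"
        i=$((i + 1))
    done
    # Strip the leading space before printing.
    printf '%s\n' "${ids# }"
}

auto_increment_ids 3 1 1   # prints "1 2 3"
auto_increment_ids 3 5 1   # prints "1 6 11"
```

A feature exported on a site created under the first setting refers to role IDs 2 and 3, which simply do not exist on a site created under the second.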
  • Now let’s move to the next part. I am going to show you how we do DevOps and daily tasks such as clearing the cache, executing DB updates or reverting features.

    (Because we do deployments every day. Of course not, it is a joke, but we do deploy often :)
  • Another problem we had is that there is a certain number of tasks that should be run each day across 258 sites. Doing it through the Drupal UI would not be very effective.
  • If I had been hired at Georgetown University 6 years ago, when I started doing Drupal development, I would have done it this way.

    To perform DB updates I would go to the update.php URL for every site and click next, next and next ;)

    If I needed to revert features I would use the Features UI to revert features one by one for 200 websites :)

    And to enable a module I would go to the modules page and enable the module I needed. I would repeat it 200 times :) because I did not know a better way to do it.

    And now I will show you a better and faster way to do it!

    Though my product manager says you can do it the ”wrong” way, but only if you have something urgent to fix ;)
  • Over years of development I’ve discovered many useful ways to save my own time, so I will share them with you.

    Use Drush, which allows you to perform Drupal operations from the command line. The first command performs a DB update.

    Using Drush aliases and the -l option you can access a specific subsite on a specific environment. The second command reverts features.

    We can even run a command for all subsites using the @sites alias. It does not run fast and may load your server, but it works for all sites.
  • And this is what I am doing right now.

    This is a bash script that runs specific Drush commands for the subsites on a specific environment. The subsites.txt file contains the list of sites that are affected.

    A regular deployment procedure looks like this. We can pass multiple Drush commands to this script. In this case we are performing DB updates, reverting features and clearing the cache.

    We are also saving the output into a file.

    One small trick I can teach you for overnight deployments: this output file can be placed in a Dropbox directory, so both you and your team members know what’s happening.

    And one small thing: we are not using Dropbox ;) We are just using some other tools ;)
  • And here is a demo of how this script works.
  • Here are some benefits of this approach:

    We can perform multiple operations in a batch
    We can perform operations on multiple sites (not necessarily all subsites) - you can have different groups of sites to perform operations on
    We can control the order of execution, so the important sites go first and you can test them first

    We can set a delay between iterations, which reduces the load on the servers
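The benefits listed above can be illustrated with a short sketch in the spirit of gudrupal-bulk-drush.sh. This is not the actual script: the `DRUSH` and `DELAY` variables and the function name are assumptions, and only the `@gudrupal.<env>` alias pattern and the `-l` option come from the slides.

```shell
#!/bin/sh
# Sketch of a bulk Drush runner. DRUSH (the command to run) and DELAY
# (seconds to pause between sites) are assumed knobs for illustration.

# Usage: bulk_drush <sites-file> <env> <drush command>...
bulk_drush() {
    sites_file=$1
    env=$2
    shift 2
    # Sites are processed in file order, so list the important sites first.
    while IFS= read -r site; do
        # Skip blank lines and comments in the sites file.
        case $site in ''|'#'*) continue ;; esac
        for cmd in "$@"; do
            echo "== $site ($env): drush $cmd"
            # $cmd is intentionally unquoted so "updb -y" splits into words.
            # stdin comes from /dev/null so Drush cannot eat the sites list.
            ${DRUSH:-drush} "@gudrupal.$env" -l "$site" $cmd < /dev/null
        done
        # Pause between sites to keep the load on the servers down.
        sleep "${DELAY:-5}"
    done < "$sites_file"
}
```

With this sketch, the deployment from the slide would look like `bulk_drush subsites.txt prod "updb -y" "fra -y" "cc all" > ~/deployment-2014-07-21.txt`. Different site groups are just different text files.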
  • 1. There are also additional scripts that I have created. There is a script that runs PHP code for every site and uses the returned result to build a report. For example, we get the total number of nodes for each site in CSV format.

    2. There is also a script that runs multiple bash commands on every server running your site. This can be useful if you need to analyse logs that are stored on multiple servers.
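The node-count report mentioned above might look roughly like this. The function name, the `DRUSH` variable and the PHP one-liner are assumptions for illustration; `drush php-eval` and Drupal 7's `db_query()` are real, but the actual script may be built differently.

```shell
#!/bin/sh
# Sketch of a per-site report: run a PHP snippet on every subsite via Drush
# and collect the results as CSV. Names here are illustrative assumptions.

# Usage: node_count_report <sites-file> <env>
node_count_report() {
    sites_file=$1
    env=$2
    # Drupal 7 query that returns the total number of nodes on a site.
    php='print db_query("SELECT COUNT(*) FROM {node}")->fetchField();'
    echo "site,nodes"
    while IFS= read -r site; do
        [ -n "$site" ] || continue
        count=$(${DRUSH:-drush} "@gudrupal.$env" -l "$site" php-eval "$php" < /dev/null)
        echo "$site,$count"
    done < "$sites_file"
}
```

Redirecting the output, for example `node_count_report subsites.txt prod > nodes.csv`, gives a file that opens directly in a spreadsheet.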
  • And now the most interesting part. It was a big challenge to automate site creation, but finally it is done.
  • We met a new challenge recently: we needed to create 80 new websites in less than a week.
  • When I started to work at Georgetown University we were adding up to five sites each week.
  • It took me more than a day to create my first site. Then I got this down to hours. And then I decided to automate the process and eventually created several scripts that help me do this.
  • Here is a demo of how it works.
  • We have a sync script that synchronises files and the database from one environment to another. This can be useful in various scenarios such as deployment or some kind of backup. When we create a new site we need to make sure it is the same on all environments.
  • Finally, we have bulk scripts that do all the previous tasks in a batch.
  • Let’s compare what we had before with what we have now.

    It took ~3 hours of work to create a website manually.

    Now we can create as many websites as we need by running just 3 scripts.

    Creating 80 new websites by running these scripts overnight saved me more than 1.5 months of work. I just launched them and went to sleep.
  • However, there is an important thing to know.

    Let me ask you a question. Why do producers of hardware like processors or video cards make it slower than it actually could be?

    ?? Guess why?

    I remember the video card I had in 2000. After playing computer games on high graphics settings, the video card could hang for a second, several times.

    !!

    So producers want hardware to stay in use longer and be more reliable. They do not want it to overheat and finally break.
  • This is exactly the reason why we should add a delay to our bulk scripts: to protect the server from high load.
  • I would like to introduce Druml. Druml is a Drupal multisite tool that allows you to perform the various operations you have learnt about today during my presentation.

    The reason we created this project is that dealing with multiple sites is a common problem for many organisations and universities.

    Also, there hasn’t been a single good solution that fits perfectly into a Drupal environment, so we developed this one.

    Also keep in mind that it is a very fresh Open Source project in an active development phase and it may not cover all use cases, but our aim is to create a tool that will be the best match for various development workflows and infrastructures.

    So you are welcome to test it and send me your feedback.
  • How we maintain 200+ Drupal sites in Georgetown University

    1. How we maintain 200+ Drupal sites in Georgetown University Vadym Myrgorod vm386@georgetown.edu @dealancer
    2. About me
    3. About Georgetown University
    4. 258 sites that looks the same
    5. Session plan
    6. Infrastructure
    7. Drupal CMS Feature A Feature B Feature C Feature D Feature E
    8. Feature C Drupal CMS Feature A Feature B Feature D Feature E Site 1 Site 2 Site 3 Site 4 Site 5 Site 6 Site 7
    9. Multisiting
    10. sites.php <?php #biology $sites['biology.georgetown.edu'] = 'biology'; $sites['biology.gudrupalstg.georgetown.edu'] = 'biology'; $sites['biology.gudrupaldev.georgetown.edu'] = 'biology';
    11. Multisiting
    12. Site 1 Site 2 Site 3 Site 4 Site 5 Site 6 Site 7
    13. Well known Drupal cloud Dev Test Prod
    14. Local development
    15. Setup any subsite locally within a few minutes
    16. Sitesync script
    17. Script highlights # Getting DB from the remote server ssh $HOST "drush @gudrupal.$ENV -l $SUBSITE sql-dump --skip-tables-key=common --gzip" > $# Resave theme settings drush @gudrupal.local -l $SUBSITE php-eval "# module_load_include('inc', 'system', 'system.admin'); foreach (array('at_georgetown') as $theme_name) { $form_state = form_state_defaults(); $form_state['build_info']['args'][0] = $theme_name; $form_state['values'] = array(); drupal_form_submit('system_theme_settings', $form_state); } "
    18. local.settings.inc if (file_exists('sites/local.settings.inc')) { require_once('sites/local.settings.inc'); }
    19. local.settings.inc $remote_url = 'http://' . $sitedir . '.georgetown.edu'; $conf['stage_file_proxy_origin'] = $remote_url; $files_dir = 'sites/' . $sdir . '/files'; $conf['stage_file_proxy_origin_dir'] = $files_dir; $conf['reroute_email_enable'] = 1; $conf['reroute_email_address'] = 'email@example.com'; $conf['reroute_email_enable_message'] = 1;
    20. Development workflow
    21. Deploy new sites, don’t deploy untested code
    22. A successful Git branching model http://nvie.com/posts/a-successful-git-branching-model/
    23. Environments Dev Test Prod develop hotfixes personal branches features branches master branch tag tag
    24. Change Control
    25. Be careful with Features do not overuse it
    26. Features are not a panacea
    27. DevOps
    28. Run certain amount of tasks almost each day across 258 sites
    29. How I did it 6 years ago Advice from my PM: do as described above if it is urgent
    30. What I’ve discovered $ drush updb -y $ drush @gudrupal.prod -l biology fra -y $ drush @sites dis devel
    31. What I am doing now $ ./gudrupal-bulk-drush.sh subsites.txt prod "updb -y" "fra -y" "cc all" > ~/deployment-2014-07-21.txt
    32. What script can do that Drush can’t
    33. What else can be done
    34. Creating a new site
    35. We needed to create 80 new websites for less then a week
    36. Before we launched at least 5 sites a week
    37. It took me more then a day to create my first new site
    38. 4 scripts to automate site creation
    39. Site creation script
    40. Site installation script
    41. Script that adds new SAML record
    42. Sync script
    43. Finally, bulk scripts that do all previous tasks in a batch
    44. Then and now
    45. Why do producers of the hardware make it slower than it actually could be?
    46. Add a delay into your bulk scripts to prevent server from the high load
    47. Introducing Druml github.com/georgetown-university/druml
    48. Questions? Email: vm386@georgetown.edu Twitter: @dealancer
    49. Thanks! Email: vm386@georgetown.edu Twitter: @dealancer