ACLU Partners with Tag1 to Raise Most-Ever $120M in Donations at Mission-Critical Moments

Tag1 Consulting, Inc.
Apr. 2, 2020

  1. ACLU.org in 2017 Patrick Jensen (ACLU), Narayan Newton (Tag1 Consulting), & Matthew Cheney (Pantheon) Handling a Big Year
  2. ACLU ● Nonprofit founded 1920 with over 3 million supporters ● Defend individual rights and liberties ● Famous cases ○ Led fight against Japanese-American internment camps ○ 1996 Communications Decency Act ○ Marriage equality
  3. ACLU Action Website ● Act ○ Sign petitions ○ Send messages ○ Request legal aid ● Support ○ Donate ○ Sign up to volunteer ● Accomplished via form submissions ● Drupal 6 (now Drupal 7)
  4. Before Pantheon Instability and Uncertainty 2013 ● Database Strain ○ Using core Drupal search ● Hardware upgrades took weeks ● Maintenance was onerous ○ test and development environments ○ infrastructure (e.g. varnish)
  5. Hosting Websites is Hard Work ● Need to Know Lots of Technology ○ Linux, LXC, NGINX, MariaDB, PHP, Redis, Solr, Git, Varnish, New Relic ● Need to Do Lots of Things ○ Workflow, Branches, Backups, Scalability, Performance, Security ● 24 hours a day, 7 days a week
  6. What Does Git Have to do with Civil Rights?
  7. Putting Organizational Mission at Top of Stack There is already so much to do! ● The World is Already Full of Challenges ● Don’t be “ambitious” about a backup system or your load balancers ● Leverage the Experience of Others ● Be the Pyramidion you want to be in the world!
  8. That Is Why Folks Like the ACLU Use Drupal Stand on the Shoulders of Giants ● Leverage the Expertise of Others ○ Drupal Core ○ Contrib Modules ○ External Libraries ● Benefit from Community of Practice ○ Best Practices, Security Process, Performance, Documentation
  9. And Why Folks Use Managed Cloud Services Free up Time & Resources to Focus ● Drupal is Getting More Complicated & The Web is Getting More Ambitious ● Leverage Pre-Built Feature Sets ○ Redis (Object Caching), Solr (Search Indexing), Dev->Test->Live (Workflow) ● Use Best In Class Security Processes + Performance/Scalability Tooling
  10. And Be Prepared. Now and in the Future. Behold the Power of Containerization!
  11. And Be Prepared. Now and in the Future. Behold the Power of Containerization!
  12. And Be Prepared. Now and in the Future. Behold the Power of Containerization!
  13. Be Prepared. You Never Know What Is Going to Happen Andrew Lowery “ “
  14. Donald Trump Elected ● Donations in the 5 days after election ■ 2012: $25,000 ■ 2016: $7,200,000 ● Page views Nov. 9 - 13 ■ 2015: 400,000 ■ 2016: 4,250,000
  15. Nov 16, 2016: The wake-up call Site outage Form submissions per minute
  16. 300 form submissions per minute Nov 16, 2016: The wake-up call
  17. Post-Maddow Emergency Improvements
  18. Outage Review Tag1 Consulting brought in to review outage after Rachel Maddow interview Specifically -- ● Fabian Franz (d.o.: fabianx) ● Narayan Newton (d.o.: nnewton) ● Jeremy Andrews (d.o.: Jeremy) Overall issue was clear and was somewhat ongoing. Immediately transitioned into developing and deploying fixes.
  19. Example Query Fix
      +----+-------------+-------+--------+----------------------+---------+---------+---------------------+--------+----------------------------------------------+
      | id | select_type | table | type   | possible_keys        | key     | key_len | ref                 | rows   | Extra                                        |
      +----+-------------+-------+--------+----------------------+---------+---------+---------------------+--------+----------------------------------------------+
      |  1 | SIMPLE      | fo    | ALL    | NULL                 | NULL    | NULL    | NULL                | 282880 | Using where; Using temporary; Using filesort |
      |  1 | SIMPLE      | o     | eq_ref | PRIMARY,order_status | PRIMARY | 4       | aclu.fo.oid         |      1 | Using where                                  |
      |  1 | SIMPLE      | os    | eq_ref | PRIMARY              | PRIMARY | 98      | aclu.o.order_status |      1 |                                              |
      +----+-------------+-------+--------+----------------------+---------+---------+---------------------+--------+----------------------------------------------+
      SELECT o.order_id, o.uid, o.billing_first_name, o.billing_last_name,
             o.order_total, o.order_status, o.created, os.title
      FROM uc_orders o
      INNER JOIN fundraiser_og fo ON fo.oid = o.order_id AND fo.gid IN (8888,9999)
      LEFT JOIN uc_order_statuses os ON o.order_status = os.order_status_id
      WHERE o.order_status IN ('refunded', 'pending', 'processing', 'payment_received', 'completed')
      ORDER BY o.order_id DESC
      LIMIT 0, 30;
  20. Index Solution
      +----+-------------+-------+--------+----------------------+--------------+---------+---------------------+------+---------------------------------------+
      | id | select_type | table | type   | possible_keys        | key          | key_len | ref                 | rows | Extra                                 |
      +----+-------------+-------+--------+----------------------+--------------+---------+---------------------+------+---------------------------------------+
      |  1 | SIMPLE      | o     | range  | PRIMARY,order_status | order_status | 98      | NULL                |   76 | Using index condition; Using filesort |
      |  1 | SIMPLE      | os    | eq_ref | PRIMARY              | PRIMARY      | 98      | aclu.o.order_status |    1 |                                       |
      |  1 | SIMPLE      | fo    | ref    | test                 | test         | 4       | aclu.o.order_id     |    1 | Using where; Using index              |
      +----+-------------+-------+--------+----------------------+--------------+---------+---------------------+------+---------------------------------------+
      + db_add_primary_key($ret, 'fundraiser_og', array('oid', 'gid', 'nid'));
      + db_add_index($ret, 'fundraiser_og', 'idx_gid', array('gid'));
      + db_add_index($ret, 'fundraiser_og', 'idx_nid', array('nid'));
      ALTER TABLE fundraiser_og ADD INDEX test (oid,gid,nid);
  21. Result
  22. Patchset Results Before Patchset: ~ 1400ms response time After Patchset: ~ 650ms response time
  23. It Works on My Local (cluster) Performance Testing For Complex Sites ● Performance Testing is Complicated ○ Varnish/CDN ○ Redis/APC ○ PHP, MariaDB ● Production Parity Testing! ● But Replicating a Cluster is Hard Work ○ Nobody has time for that!
  24. Let the Robots Do the Work! They already do so much. What’s a little more SysAdmin?
  25. On Demand Environments are Solution
  26. Surviving and Learning from Even Bigger Traffic Spikes
  27. Source: http://fortune.com/2017/01/31/uber-boycott-trump/
  28. Traffic spiked to 85x normal levels
  29. How did our site handle the traffic? Site outage Form submissions per minute
  30. Mitigating a Site Outage
  31. Load Testing
  32. Results Before Code Changes After Code Changes
  33. Payment Gateway Toolkit ● curl_log ○ Adding verbose logging to the curl requests ○ Logging to a table in the DB ○ In-flight sanitization of user information ● curl_loadbalance ○ Decaying ticket-based curl endpoints load balancer ○ Removes failing endpoints for a window of time after X failures ○ Specifically designed to always have at least one endpoint
  34. Performance Next Steps ● query_cache ○ Caching “shim” to adding db_query caching to contrib modules without patching them ○ Ability to map queries to a single base query ○ Moves read-only traffic from the DB to the object cache ● rate_limit ○ An in-drupal solution to rate limiting specific types of requests ○ Webform protection ○ Search protection
  35. The Payoff
  36. Source: http://fortune.com/2018/01/06/google-microsoft-amazon-internet-association-net-neutrality/
  37. Site outage Form submissions per minute Previous Failures
  38. Site outage Form submissions per minute Dec 2017: No Failure at 1,900 submissions/min
  39. “The ACLU is ready. We have to be. We’re in for the fight of our lives.” Anthony Romero, ACLU Exec. Dir.
  40. Questions
  41. Join us for contribution sprints Friday, April 13, 2018 ● Mentored Core sprint 9:00-12:00 Room: Stolz 2 ● First time sprinter workshop 9:00-12:00 Room: Stolz 2 ● General sprint 9:00-12:00 Room: Stolz 2 #drupalsprint
  42. What did you think? Locate this session at the DrupalCon Nashville website: http://nashville2018.drupal.org/schedule Take the Survey! https://www.surveymonkey.com/r/DrupalConNashville
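
Slide 33 describes curl_loadbalance as a decaying, ticket-based balancer that benches failing endpoints for a window of time after X failures while always keeping at least one endpoint available. The deck does not show the module's code, so what follows is only a minimal Python sketch of that idea under stated assumptions: the class name `EndpointBalancer`, its parameters, and the exact decay and ticket rules are hypothetical and are not taken from the Drupal module.

```python
import random
import time


class EndpointBalancer:
    """Hypothetical sketch: pick an endpoint, benching ones that fail repeatedly."""

    def __init__(self, endpoints, max_failures=3, window=60.0, clock=time.monotonic):
        if not endpoints:
            raise ValueError("at least one endpoint is required")
        if max_failures < 1:
            raise ValueError("max_failures must be at least 1")
        self.endpoints = list(endpoints)
        self.max_failures = max_failures
        self.window = window
        self.clock = clock
        # Timestamps of recent failures, per endpoint.
        self.failures = {e: [] for e in self.endpoints}

    def _recent_failures(self, endpoint):
        # Assumed decay rule: failures older than the window are forgotten.
        cutoff = self.clock() - self.window
        recent = [t for t in self.failures[endpoint] if t > cutoff]
        self.failures[endpoint] = recent
        return recent

    def report_failure(self, endpoint):
        self.failures[endpoint].append(self.clock())

    def report_success(self, endpoint):
        self.failures[endpoint].clear()

    def pick(self):
        counts = {e: len(self._recent_failures(e)) for e in self.endpoints}
        healthy = [e for e in self.endpoints if counts[e] < self.max_failures]
        if healthy:
            # Assumed ticket rule: fewer recent failures means more tickets.
            tickets = [self.max_failures - counts[e] for e in healthy]
            return random.choices(healthy, weights=tickets, k=1)[0]
        # Every endpoint is benched: still return one, preferring the
        # endpoint whose most recent failure is the oldest.
        return min(self.endpoints, key=lambda e: self.failures[e][-1])
```

A caller would wrap each gateway request with `pick()`, then call `report_success()` or `report_failure()` depending on the outcome. Passing a fake `clock` makes the window behaviour easy to test without sleeping.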

Editor's Notes

  1. Narayan Newton, Lead systems engineer at Tag1 Consulting Matthew Cheney, Chaos Wizard at Pantheon
  2. A non-profit founded almost 100 years ago and we have over 3 million supporters Our mission is to defend and preserve the individual rights and liberties guaranteed by the Constitution and laws of the United States. To put it succinctly, we consider ourselves to be the first responders for the Constitution We take on issues like: Voting rights Reproductive Freedom the intersection of privacy and technology For example, Led fight against Japanese-American internment camps during WWII took on and defeated 1996 Communications Decency Act, which censored the Internet by banning "indecent" speech Marriage equality - We brought the first lawsuit in the country seeking the freedom to marry for same-sex couples in 1970. We appear before the Supreme Court more than any other organization except the Department of Justice.
  3. We maintain about 40 Drupal websites at the ACLU but today we’re going to talk about just one really important website: action.aclu.org Take action online Sign an online petition Send a letter to an elected official Request legal aid from the ACLU Where our members can go to support us Fundraising Sign up to volunteer action.aclu.org is currently on Drupal 7 but for the time period we’re talking about it was on Drupal 6 Critical for our organization that our websites are available and performant.
  4. Before Pantheon in like 2013 on dedicated hosting There was an initiative at the ACLU to build our online presence But we found our infrastructure wasn’t quite up to the task of handling the increased traffic site slowness some site outages Problems with our infra Database strain (using core drupal search bc Solr wasn’t set up) hardware upgrades took weeks and weeks maintaining test and development environments and varnish involved a lot of developer time ACLU CTO, Marco Carbone who is an old-school drupal dev heard about Pantheon by attending a DrupalCon event We did our research and decided they’d be a great host for us. Matt’s going to tell you why.
  5. -- This may not be surprising, but hosting websites is hard work. -- Not as hard as resisting executive overreach through constitutional law, of course. -- Need to Use Lots of Technologies and Do Lots of Things 365 days a year -- Plus you need to keep it all up to date and adopt NEW stuff when it comes
  6. -- What does knowing Git have to do with civil rights? -- It’s about as necessary as this guinea pig wearing sunglasses. -- I mean it’s great to know how Git works, but it’s not necessary
  7. -- The world is full of challenges, why add to things you need to do! -- Things move quickly. Organizations need to be able to respond. -- Time/Resources need to be focused on organizational goals. -- Even more true with “Ambitious Digital Experiences”. -- Be the Pyramidion you want to be in the world
  8. -- Leverage the expertise of others through reusable modules/libraries -- Benefit from a community of practice around web development
  9. -- Leveraging the expertise of others is why people use CLOUD -- Drupal is getting more complicated. Web is getting more ambitious. -- Features You Need Require Specialized Knowledge to Make. Even More to Maintain. -- Security is Ongoing Challenge Requiring Lots of Knowledgeable People -- Performance/Scalability Takes a Village
  10. -- Horizontally Scaling PHP is Hard Work -- Hosting Platforms That Have This Tech Work Really Hard To Make it Awesome -- It Won’t Solve All Your Performance Problems -- But It will Provide you a SOLID Starting Point
  11. -- Be prepared, you never know what is going to happen -- It’s Not About Having all The Answers, It’s About Having the Right Tools Pat: After switching to Pantheon, our site was quite stable… until Nov. 8th 2016
  12. After switching to Pantheon, our site was quite stable… until Nov. 8th 2016 We received $7.2 million in the 5 days after the 2016 election. Compare that to the $25k in donations we received in the 5 days after the 2012 election In the 5 days after the election our websites saw over 4 million page views Compare that to 400,000 page views the year before Our web traffic increased to more than 10x what we were used to seeing in the days after the election, essentially overnight This was a great outpouring of support for our organization but we started seeing small performance issues
  13. Those small performance issues turned into a really big performance issue on Nov 16, 2016 The ACLU’s executive director appeared on the Rachel Maddow show. Rachel Maddow Appearance Nov. 16 2016 500 peak form submissions per minute ~15 minutes site outage Only able to sustain ~300 submissions per minute
  14. This graph shows the spike in HTTP 500 errors our site was returning during the Maddow appearance Huge missed opportunity for us. Supporters were trying to donate to us, send letters to their elected officials via our site and sign up for our email lists, but they were being met by errors Luckily ACLU mgt realized this wouldn’t be a one-time spike They realized the Trump era meant that we’d be seeing spikes like this on the regular for the next 4 - 8 years But we didn’t have 4 - 8 years to fix these performance issues The next spike could come at ANY time so we called in Tag1 to do emergency weekend
  15. Tag1 Brought in to look at outage period Issue was clearly that we were DB bound, brought in 3 engineers including myself to review New Relic traces Developed indexes, fixed queries, worked in concert with the ACLU team to deploy fixes.
  16. An example of what type of thing we were doing. This is a fairly typical Ubercart-esque query, with the addition of an og table. An interesting quirk of this additional table is that it lacks all indexes. This is more common than you might think.
  17. Looked at the table to find the dataset’s natural key and pushed a primary key and some additional keys for filtering and joining. Note, we have a key on oid, gid, nid but then I have indexes on specifically gid and nid. Why? Because of the order of gid and nid in the primary index: a composite index is generally only usable for lookups that constrain its leading column, so filtering by gid or nid alone needs its own index. As you can see, we went from roughly 280k rows to 76.
  18. And here is the result of just that change. You can see the green query being marked fundraiser_og, that is this query and you can see it basically dropping out of the graph.
  19. Put together our fixes as a patchset and tested against multidev. At this point we wanted to ensure that the ACLU site would survive larger traffic spikes and find other issues. Turned to Pantheon to set up a production-like environment to enable testing at that capacity
  20. -- performance testing is complicated. just ask narayan. -- important to test in as “close a production parity” as possible -- but setting all this stuff up is hard!
  21. -- robots will drive our cars. raise our children on ipads. tell us what to believe politically -- is it really too much to ask that they can create production parity development environments on demand?
  22. -- on demand environments are the answer -- at pantheon we call this “Multidev”, but it’s basically ONE ENVIRONMENT PER GIT BRANCH ---- integrated with new relic, production parity -- made possible by Containers and Robots -- Allowed Tag1 and ACLU to quickly iterate and test features
  23. The emergency improvements Tag1 put in over that weekend in late Nov 2016 were very effective. Made it through: Giving Tuesday 2016 end of year fundraising pushes received 15x more donations in our end of year fundraising than previous year (20,548 gifts) But we weren’t out of the woods yet
  24. Jan 27 2017, Executive Order 13769 issued (AKA Muslim travel ban) Barred people from 7 Muslim-majority countries from entering the US Thousands protested the executive order at airports across the country The ACLU fulfilled our reputation as first responders for the Constitution Within hours, the ACLU, and partnering organizations nationwide, obtained the first injunction to block the order
  25. When news broke of what the ACLU had accomplished People rushed to our websites That top line on the graph there shows page views before, during and after the executive order The line at the bottom of the screen shows the same dates from the previous year The big spike is at almost 4 million hits, on the same day the previous year is at 44,000 85x traffic spike… almost 2 orders of magnitude Donations over the weekend after the executive order were six times the organization’s yearly average
  26. So how did our websites hold up during this crazy post-executive order weekend? Rachel Maddow Appearance Able to sustain 300 submissions per minute ~15 minutes site outage Executive Order 900 peak form submissions per minute Sustained 500 submissions per minute for ~8 hours We did have a 10 minute ‘site outage’
  27. We did 2 smart things to mitigate this outage New Relic alerts when traffic got high or response times increased Static CDN-hosted donation page After the dust settled, we took some time and confirmed what we previously suspected slow responses from one of our payment gateways was the root cause of the site outage we still had some issues with database performance to address Once again, we handed the reins over to Tag1
  28. So at this point we know things are better, but that we are still having issues at very high load. We are past the easy fixes you can detect at low load situations and need actual traffic. I built a botnet
  29. initial results of the patchset started seeing issues with DB and external request/curl requests to the payment gateway We took a two pronged approach to fix these issues
  30. First, we started looking at the external requests. It was very unclear what was actually happening with our payment gateways Developed curl_log to log the actual responses from the gateway, but also to sanitize Finally found that there was an issue with CDN curl_loadbalance was developed
  31. Turned towards DB issues, which were more over-all load and less bad queries specifically Legacy deployment, don’t want to patch every module. Fabian developed query_cache We also developed rate_limit, which is sort of a performance tool and sort of a security tool. It allows us to rate_limit specific actions in Drupal itself.
  32. How well did this second and final round of changes serve us?
  33. We got a chance to find out when the Trump administration’s FCC repealed 'Net Neutrality' rules for internet providers in mid-December 2017 The internet reacted with outrage and once again the action.aclu.org website was a conduit for that outrage This time, we nailed it.
  34. Nov 2016 Before changes Rachel Maddow Appearance Maxed out at 300 form submissions per minute ~15 minute outage January 2017 After first round of changes Executive Order 900 peak form submissions per minute ~10 minute mitigated outage
  35. After the second round of changes we were able to hit a peak of 1,900 form submissions per minute and easily sustained 500 submissions per minute for 10 hours (probably indefinitely) This was a big victory for us… 100s of thousands
  36. In his Nov 2016 appearance on the Maddow show our exec dir said about Pres. Trump’s election While the rest of the organization was ready, the website wasn’t quite prepared. But after a year of: work by the developers and management at the ACLU leveraging Tag1’s expertise and Pantheon’s infrastructure having our backs We’re now confident that our websites are really ready to be used in their full capacity to defend civil liberties in the US
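
Note 17 explains that separate indexes on gid and nid were added because of where those columns sit in the composite key on (oid, gid, nid). As a rough illustration of that leading-column behaviour, the Python sketch below builds a toy fundraiser_og table and compares query plans before and after adding idx_gid. It uses SQLite through the standard library purely as a stand-in, which is an assumption: the ACLU site ran on MariaDB, whose EXPLAIN output (shown on slides 19 and 20) looks different, and the helper name `query_plan` is hypothetical.

```python
import sqlite3


def query_plan(conn, sql):
    """Return SQLite's EXPLAIN QUERY PLAN detail strings joined into one line."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    # The human-readable detail text is the last column of each row.
    return " | ".join(row[-1] for row in rows)


conn = sqlite3.connect(":memory:")
# Toy table with the same composite key as the slide: (oid, gid, nid).
conn.execute(
    "CREATE TABLE fundraiser_og ("
    " oid INTEGER NOT NULL, gid INTEGER NOT NULL, nid INTEGER NOT NULL,"
    " PRIMARY KEY (oid, gid, nid))"
)

# Filter on gid only, which is not the leading column of the composite key.
by_gid = "SELECT oid FROM fundraiser_og WHERE gid IN (8888, 9999)"

plan_before = query_plan(conn, by_gid)
conn.execute("CREATE INDEX idx_gid ON fundraiser_og (gid)")
plan_after = query_plan(conn, by_gid)

print("before:", plan_before)
print("after: ", plan_after)
```

With only the composite key in place the planner has to scan, because gid is not the key's first column; once idx_gid exists it can search that index directly. The exact plan wording varies between SQLite versions.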