What You Missed in Computer Science

This presentation explains what Computer Science actually entails. It covers how to describe code performance using Big-Oh notation, comparing different post meta and taxonomy queries, and it discusses concurrency as it applies to WordPress, specifically data races and how they can occur while counting post views.

Published in: Software, Technology

  1. 1. Computer Science in WordPress - Taylor Lovett
  2. 2. My name is Taylor Lovett - Senior Strategic Web Engineer at 10up - Core Contributor - Plugin Author (Safe Redirect Manager) - Plugin Contributor - BS in Computer Science from the University of Maryland, College Park
  3. 3. What is Computer Science? - It can mean a lot of things. It is really the study of computational theory, computer software, and hardware.
  4. 4. Theory of Computation - General Mathematics (Calculus, linear algebra, general computational theory, statistics) - Algorithms (a method to solve a problem) - Data structures (which data structure will allow us to access our data the quickest?) - Graph theory
  5. 5. Computer Software - Programming techniques and design patterns (e.g., a singleton class) - Concurrent design patterns (data races) - Mobile software development - Operating system software - Web development - Databases - Networking - Benchmarking
  6. 6. Computer Hardware - Motherboards - Memory types (solid state, RAM, etc.) - Benchmarking (processor execution time) - Pipelining - Processors
  7. 7. Big-Oh Notation - "Big O notation is used to classify algorithms by how they respond (e.g., in their processing time or working space requirements) to changes in input size." -- Wikipedia - Very useful to describe how performant your code may or may not be - Big-Oh usually describes the upper bound of a function (worst-case)
  8. 8. Big-Oh Notation (cont.) - Big-Oh notation is concerned with measuring the rate of growth of the amount of processing that your code might do on an unknown input size - In Big-Oh we are only concerned with how our code performs as the input size approaches infinity. Mathematically speaking, this means we only care about the highest order term: e.g. O(3n^2 + 5n) = O(n^2), since as n approaches infinity the only thing that matters is the n^2
  9. 9. Let's look at some examples!
  10. 10. // $fruits contains a non-empty array of strings
      function contains_orange( $fruits = array() ) {
          for ( $i = 0; $i < count( $fruits ); $i++ ) {
              if ( 'orange' == $fruits[$i] )
                  return true;
          }
          return false;
      }
      Best Case Scenario: Loop executes once, orange is found, and it returns.
      Worst Case Scenario: Loop executes n times (where n is the number of elements in $fruits)
      Performance: contains_orange() is in O(n)
  11. 11. Remember! - With Big-Oh we are only concerned with what happens in the worst case. Sometimes knowing what happens in the best case is useful, but we are mostly worried about the performance hit our code could take in the worst possible situation.
  12. 12. // $fruits contains a non-empty array of strings. For educational
      // purposes, $fruits is guaranteed to have at least one duplicate.
      function contains_duplicate_fruit( $fruits = array() ) {
          for ( $i = 0; $i < count( $fruits ); $i++ ) {
              for ( $z = 0; $z < count( $fruits ); $z++ ) {
                  if ( $i != $z && $fruits[$z] == $fruits[$i] )
                      return true;
              }
          }
          return false;
      }
      What does everyone think?
  13. 13. Best Case Scenario: Outer loop executes once, inner loop executes twice, duplicate is found, function returns
      Worst Case Scenario: Outer loop executes n - 1 times (where n is the size of $fruits), inner loop executes n times for each outer loop execution... n * (n - 1) = n^2 - n
      Performance: contains_duplicate_fruit is in O(n^2 - n) = O(n^2)
  14. 14. An important reminder - We dropped the (-n) from our final Big-Oh evaluation because, as n approaches infinity, n^2 dominates and (-n) becomes insignificant.
  15. 15. But seriously... How is this useful?
  16. 16. Big-Oh Notation and Databases - Big-Oh notation is used a lot in conjunction with SQL operations. - We've all heard that indexing a column in MySQL makes search on that column faster. - But why? What does that actually mean?
  17. 17. MySQL Indexes - An index is a data structure that speeds up search time for information. - Without an index, searching for a specific column value is O(n) because in the worst case scenario every single row in the table must be examined.
  18. 18. MySQL Indexes - When a column is indexed, MySQL takes the data across all of the rows in that column and stores references to that data in a B-tree (this structure is used for the majority of index types). - A B-tree is just what it sounds like: A tree of data that speeds up search time. In the worst case, a B-tree search processes on the order of log n items, so it is O(log n). A log is a mathematical function that grows more slowly than n: for large n, n^2 > n > log n http://en.wikipedia.org/wiki/B-tree
  19. 19. Post Meta Queries - The full Big-Oh analysis of a post meta query is pretty complex because of the join operation and therefore is outside the scope of this talk. - For our purposes, searching for posts based on a meta key is O(n) where n is the number of posts that have that key. - Let's frame this in terms of featured posts. Featured posts refers to the situation where a website needs to mark select posts as featured and query for them.
  20. 20. Featured Posts Solution #1
      On post update:
      if ( isset( $_POST['meta_box_feature'] ) )
          update_post_meta( $post_id, 'featured', 1 );
      else
          update_post_meta( $post_id, 'featured', 0 );
      Query:
      $args = array(
          'meta_key'   => 'featured',
          'meta_value' => 1,
      );
      $featured_posts = new WP_Query( $args );
  21. 21. Solution #1 Analysis - Using this code, every time a post is saved, it will have post meta attached to it such that 'featured' = 1 or 0. This will create a ton of unnecessary post meta rows. - Remember, searching for posts based on a meta key is O(n) where n is the number of posts that have that key. Therefore saving meta when a post is not featured is not only unnecessary but will really slow us down. Because every post now has the key, this results in O(m) performance where m is the total number of posts!
  22. 22. Featured Posts Solution #2
      On post update:
      if ( isset( $_POST['meta_box_feature'] ) )
          update_post_meta( $post_id, 'featured', 1 );
      else
          delete_post_meta( $post_id, 'featured' );
      Query:
      $args = array(
          'meta_key'   => 'featured',
          'meta_value' => 1,
      );
      $featured_posts = new WP_Query( $args );
  23. 23. Solution #2 Analysis - This solution is a major improvement over our first one. This will result in O(n) search time where n is the number of featured posts. - However, we can still do better.
  24. 24. Featured Posts Solution #3
      Let's create a tag called 'featured' and attach it to all our featured posts:
      On init:
      $args = array( ... );
      register_taxonomy( 'featured', 'post', $args );
      Query:
      // 'tag' is the WP_Query parameter that matches a post tag by slug
      $args = array( 'tag' => 'featured' );
      $featured_posts = new WP_Query( $args );
  25. 25. Solution #3 Analysis - For our purposes, searching for posts based on a tag is O(log n) since there is an index on the tag id column. The full Big-Oh analysis of our tag solution is pretty complex due to SQL join operations and therefore is beyond the scope of this talk.
  26. 26. Concurrency - In Computer Science, concurrency is the property of multiple computations executing simultaneously, sometimes interacting with each other.
  27. 27. Concurrency - With concurrent programming we can, among other things, force each core in a computer to process a piece of a larger problem or handle separate tasks. This is extremely powerful. - When not properly accounted for, concurrency can result in unexpected bugs that are difficult to reproduce.
  28. 28. Concurrency in WordPress - Concurrency takes a slightly different form in WordPress. We don't solve problems by starting new threads/processes. However, since behind the scenes servers can run multiple processes at the same time and thus multiple users can execute the same code simultaneously, issues surrounding concurrency can arise.
  29. 29. Tracking Postviews in WordPress - A common request in WordPress is to display the number of views for each post on the frontend. - There are many different ways to approach this problem; the most common is to increment an integer stored in post meta each time a post is viewed, then to display this number for each post. - This implementation can lead to data races.
  30. 30. Here is the code that executes on each post request
      $views = get_post_meta( $id, 'views', true );
      $views++;
      update_post_meta( $id, 'views', $views );
  31. 31. Data Races - A data race is the situation where two or more threads access a shared memory location, at least one of those accesses is a write, and the order of the accesses is unknown (meaning there are no explicit locking mechanisms used). - Think of each page request as a thread on the server. If two users request a post at the same time, a data race for pageviews occurs since both accesses are writing to the postmeta table.
  32. 32. A Possible Ordering of Events (each line is marked with the user whose request executes it)
      $views = get_post_meta( $id, 'views', true ); // User A: $views = 0
      $views++;                                     // User A: $views = 1
      update_post_meta( $id, 'views', $views );     // User A: stored 'views' = 1
      $views = get_post_meta( $id, 'views', true ); // User B: $views = 1
      $views++;                                     // User B: $views = 2
      update_post_meta( $id, 'views', $views );     // User B: stored 'views' = 2
      In this ordering of events, the stored view count ends up with a value of 2, which is what we want. However, these events could occur in any order...
  33. 33. Another Ordering of Events
      $views = get_post_meta( $id, 'views', true ); // User A: $views = 0
      $views = get_post_meta( $id, 'views', true ); // User B: $views = 0
      $views++;                                     // User A: $views = 1
      $views++;                                     // User B: $views = 1
      update_post_meta( $id, 'views', $views );     // User A: stored 'views' = 1
      update_post_meta( $id, 'views', $views );     // User B: stored 'views' = 1
      In this ordering of events, the stored view count ends up with a value of 1, which is NOT what we want: two views happened but only one was counted.
  34. 34. Conclusion: This algorithm won't work!
  35. 35. Solution to Pageview Problem? Solution 1: Jetpack plugin. We can install Jetpack and leverage its stats API to query information on specific posts. Solution 2: Google Analytics. Using a website's Google Analytics account, we can set custom variables on a per-post basis and query the API based on those variables.
  36. 36. Questions?
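
The two orderings of events on the page-view slides can be replayed without WordPress. The following is a minimal sketch, not code from the talk: the `$store` array is a hypothetical in-memory stand-in for the postmeta table, and `read_views()`, `write_views()`, and `increment_views()` are illustrative helpers that do not exist in WordPress. It runs the read, increment, and write steps for two users first one after the other, then interleaved, and finally as a single combined step.

```php
<?php
// Hypothetical in-memory stand-in for the postmeta table (an assumption, not WordPress).
$store = array( 'views' => 0 );

// Illustrative helper: read the stored count, like get_post_meta() on the slides.
function read_views( array $store ) {
    return $store['views'];
}

// Illustrative helper: write the count back, like update_post_meta() on the slides.
function write_views( array &$store, $views ) {
    $store['views'] = $views;
}

// Ordering 1: User A finishes all three steps before User B starts.
$a = read_views( $store );
$a++;
write_views( $store, $a );
$b = read_views( $store );
$b++;
write_views( $store, $b );
$sequential_result = $store['views'];

// Ordering 2: both users read before either one writes.
$store['views'] = 0;
$a = read_views( $store );
$b = read_views( $store );
$a++;
$b++;
write_views( $store, $a );
write_views( $store, $b );
$interleaved_result = $store['views'];

// Illustrative helper: read, increment, and write as one step that cannot be split.
function increment_views( array &$store ) {
    $store['views']++;
}

$store['views'] = 0;
increment_views( $store ); // User A
increment_views( $store ); // User B
$atomic_result = $store['views'];

printf( "%d %d %d\n", $sequential_result, $interleaved_result, $atomic_result );
```

The interleaved run loses one view because both users increment the same stale value. The combined step counts both views in this simulation; whether a real site can get the same guarantee depends on the storage layer performing the increment in a single operation, which is an assumption beyond what the slides cover.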
