2. The Problem
How do you quickly filter data
represented by multiple ActiveRecord
associations and calculations?
3. Data Model
City
• Philly
• Boston
Technology
• Ruby
• Python
Club
• Philly.rb
Talk
• PostgreSQL Materialized Views
Feedback
• “Super rad talk!!”
• “The whole world is now dumber”
Author
• David Roberts
4. View all Comments
class Feedback < ActiveRecord::Base
  belongs_to :talk

  INVALID_COMMENTS = ['', 'NA', 'N/A', 'not applicable']

  scope :filled_out,
        -> { where.not(comment: INVALID_COMMENTS) }
end
Feedback.filled_out
Feedback Load (2409.6ms) SELECT "feedbacks".*
FROM "feedbacks" WHERE ("feedbacks"."comment"
NOT IN ('', 'NA', 'N/A', 'not applicable'))
5. Highest Scoring Talk with
Valid Comments
Feedback.filled_out
  .select('talk_id, avg(score) as overall_score')
  .group('talk_id').order('overall_score desc')
  .limit(10)
# 665ms
SELECT talk_id, avg(score) as overall_score FROM "feedbacks"
WHERE ("feedbacks"."comment" NOT IN ('', 'NA', 'N/A', 'not applicable'))
GROUP BY talk_id ORDER BY overall_score desc LIMIT 10;
7. Highest Scoring Talks in PA
by Authors named Parker
# The ? placeholder must stand alone; the % wildcards travel in the bound value.
Feedback.filled_out.joins(talk: [:author, { club: :city }])
  .select('feedbacks.talk_id, avg(feedbacks.score) as overall_score')
  .where("cities.state_abbr = ?", 'PA')
  .where("authors.name LIKE ?", '%Parker%')
  .group('feedbacks.talk_id')
  .order('overall_score desc')
  .limit(10)
# 665ms
SELECT feedbacks.talk_id, avg(feedbacks.score) as overall_score FROM "feedbacks"
INNER JOIN "talks" ON "talks"."id" = "feedbacks"."talk_id"
INNER JOIN "authors" ON "authors"."id" = "talks"."author_id"
INNER JOIN "clubs" ON "clubs"."id" = "talks"."club_id"
INNER JOIN "cities" ON "cities"."id" = "clubs"."city_id"
WHERE ("feedbacks"."comment" NOT IN ('', 'NA', 'N/A', 'not applicable'))
AND (cities.state_abbr = 'PA')
AND (authors.name LIKE '%Parker%')
GROUP BY feedbacks.talk_id ORDER BY overall_score desc LIMIT 10;
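A minimal sketch of building the LIKE pattern as a bound value, in plain Ruby with no Rails loaded. The helper name contains_pattern is hypothetical and not from the talk; it assumes backslash as the LIKE escape character, which is the PostgreSQL default. Inside Rails, ActiveRecord's sanitize_sql_like covers the escaping step.

```ruby
# Hypothetical helper (not part of the talk): wrap a search term in % wildcards
# for a "contains" match, escaping LIKE metacharacters already in the term.
def contains_pattern(term, escape_char = "\\")
  metachars = /[#{Regexp.escape(escape_char)}%_]/
  escaped = term.gsub(metachars) { |char| escape_char + char }
  "%#{escaped}%"
end

pattern = contains_pattern('Parker')
# The pattern is then passed as the bound value, for example:
#   .where("authors.name LIKE ?", pattern)
```

Keeping the wildcards in the value means the placeholder is quoted once, as a whole string, instead of being quoted inside an already quoted literal.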
8. What’s Wrong with these
Examples?
• Long ugly queries
• Slow queries are bad for Web Applications
• Fighting ActiveRecord framework
• SQL aggregates are difficult to access
• Returned object no longer corresponds to model
• Repetitive code to set up joins and filter invalid comments
10. Views
• Uses a stored / pre-defined Query
• Uses live data from corresponding tables when
queried
• Can reference data from many tables
• Great for hiding complex SQL statements
• Allows you to push functionality to database
12. Create the View in a Migration
class CreateTalkView < ActiveRecord::Migration
  def up
    connection.execute <<-SQL
      CREATE VIEW v_talks_report AS
        SELECT cities.id as city_id,
               cities.name as city_name,
               cities.state_abbr as state_abbr,
               technologies.id as technology_id,
               clubs.id as club_id,
               clubs.name as club_name,
               talks.id as talk_id,
               talks.name as talk_name,
               authors.id as author_id,
               authors.name as author_name,
               feedback_agg.overall_score as overall_score
        FROM (
          SELECT talk_id, avg(score) as overall_score
          FROM feedbacks
          WHERE feedbacks.comment NOT IN ('', 'NA', 'N/A', 'not applicable')
          GROUP BY talk_id
        ) as feedback_agg
        INNER JOIN talks ON feedback_agg.talk_id = talks.id
        INNER JOIN authors ON talks.author_id = authors.id
        INNER JOIN clubs ON talks.club_id = clubs.id
        INNER JOIN cities ON clubs.city_id = cities.id
        INNER JOIN technologies ON clubs.technology_id = technologies.id
    SQL
  end

  def down
    connection.execute 'DROP VIEW IF EXISTS v_talks_report'
  end
end
13. Encapsulate in ActiveRecord Model
class TalkReport < ActiveRecord::Base
  # Use associations just like any other ActiveRecord object
  belongs_to :author
  belongs_to :talk
  belongs_to :club
  belongs_to :city
  belongs_to :technology

  # take advantage of talks has_many relationship
  delegate :feedbacks, to: :talk

  self.table_name = 'v_talks_report'

  # views cannot be changed since they are virtual;
  # ActiveRecord checks readonly? (with the question mark) before saving
  def readonly?
    true
  end
end
15. Highest Scoring Talks in PA by Authors
named Parker
TalkReport.where(state_abbr: 'PA')
.where("author_name LIKE '%Parker%'")
.order(overall_score: :desc).limit(10)
17. Materialized Views
• Acts like a Database View, but the results persist for future queries
• Creates a table on disk with the result set
• Can be indexed
• Ideal for capturing frequently used joins and aggregations
• Lets you optimize the base tables for updating and the Materialized Views for reporting
• Must be refreshed to pick up the most recent data
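The difference between a view and a materialized view can be sketched in plain Ruby, without a database. This is only an analogy under stated assumptions: the Feedback struct, the average_scores method, and the MaterializedScores class are hypothetical stand-ins, not code from the talk. The method plays the role of a view (recomputed on every call) and the class plays the role of a materialized view (a stored snapshot that stays stale until refreshed).

```ruby
# Hypothetical in-memory stand-in for rows of the feedbacks table.
Feedback = Struct.new(:talk_id, :score, :comment)
INVALID_COMMENTS = ['', 'NA', 'N/A', 'not applicable'].freeze

# "View": the aggregate is recomputed from the live rows on every call.
def average_scores(feedbacks)
  feedbacks
    .reject { |f| INVALID_COMMENTS.include?(f.comment) }
    .group_by(&:talk_id)
    .transform_values { |rows| rows.sum(&:score).to_f / rows.size }
end

# "Materialized view": the aggregate is computed once and stored;
# reads are cheap, but new rows are invisible until refresh is called.
class MaterializedScores
  def initialize(feedbacks)
    @feedbacks = feedbacks
    refresh
  end

  def refresh
    @snapshot = average_scores(@feedbacks).freeze
  end

  def to_h
    @snapshot
  end
end

feedbacks = [Feedback.new(1, 4, 'Great'), Feedback.new(1, 2, 'NA')]
scores = MaterializedScores.new(feedbacks)
feedbacks << Feedback.new(1, 5, 'Loved it')
live_result  = average_scores(feedbacks) # sees the new row
stale_result = scores.to_h               # still the old snapshot
fresh_result = scores.refresh            # snapshot rebuilt from current rows
```

The trade is the same one the bullets describe: reads from the snapshot skip the aggregation work, at the cost of an explicit refresh step.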
18. Create the Materialized View in a Migration
class CreateTalkReportMv < ActiveRecord::Migration
  def up
    connection.execute <<-SQL
      CREATE MATERIALIZED VIEW mv_talks_report AS
        SELECT cities.id as city_id,
               cities.name as city_name,
               cities.state_abbr as state_abbr,
               technologies.id as technology_id,
               clubs.id as club_id,
               clubs.name as club_name,
               talks.id as talk_id,
               talks.name as talk_name,
               authors.id as author_id,
               authors.name as author_name,
               feedback_agg.overall_score as overall_score
        FROM (
          SELECT talk_id, avg(score) as overall_score
          FROM feedbacks
          WHERE feedbacks.comment NOT IN ('', 'NA', 'N/A', 'not applicable')
          GROUP BY talk_id
        ) as feedback_agg
        INNER JOIN talks ON feedback_agg.talk_id = talks.id
        INNER JOIN authors ON talks.author_id = authors.id
        INNER JOIN clubs ON talks.club_id = clubs.id
        INNER JOIN cities ON clubs.city_id = cities.id
        INNER JOIN technologies ON clubs.technology_id = technologies.id;
      CREATE INDEX ON mv_talks_report (overall_score);
    SQL
  end

  def down
    connection.execute 'DROP MATERIALIZED VIEW IF EXISTS mv_talks_report'
  end
end
19. ActiveRecord Model Change
class TalkReport < ActiveRecord::Base
  # No changes to associations
  belongs_to …

  self.table_name = 'mv_talks_report'

  def self.repopulate
    connection.execute("REFRESH MATERIALIZED VIEW #{table_name}")
  end

  # Materialized Views cannot be modified;
  # ActiveRecord checks readonly? (with the question mark) before saving
  def readonly?
    true
  end
end
20. Highest Scoring Talks
99% reduction in runtime
577ms
Feedback.filled_out
.select('talk_id, avg(score) as score')
.group('talk_id').order('score desc').limit(10)
1ms
TalkReport.order(overall_score: :desc).limit(10)
21. Highest Scoring Talks in PA
by Authors named “Parker”
95% reduction in runtime
400ms
Feedback.filled_out.joins(talk: [:author, { club: :city }])
  .select('feedbacks.talk_id, avg(feedbacks.score) as overall_score')
  .where("cities.state_abbr = ?", 'PA')
  .where("authors.name LIKE ?", '%Parker%')
  .group('feedbacks.talk_id')
  .order('overall_score desc')
  .limit(10)
19ms
TalkReport.where(state_abbr: 'PA')
  .where("author_name LIKE ?", '%Parker%')
  .order(overall_score: :desc).limit(10)
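The reduction figures follow from the timings quoted on these two slides. A small worked check, where percent_reduction is a hypothetical helper name and the inputs are the millisecond values shown above:

```ruby
# Percentage by which the runtime dropped, rounded to two decimal places.
def percent_reduction(before_ms, after_ms)
  ((before_ms - after_ms) * 100.0 / before_ms).round(2)
end

top_talks   = percent_reduction(577, 1)  # => 99.83
parker_in_pa = percent_reduction(400, 19) # => 95.25
```

Both values round down to the 99% and 95% quoted in the slide headings.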
22. Why Use Materialized Views
in your Rails application?
• ActiveRecord models allow for easy representation in Rails
• Capture commonly used joins / filters
• Allows for fast, live filtering and sorting of complex associations
or calculated fields
• Push data intensive processing out of Ruby to Database
• Make use of advanced Database functions
• Can Optimize Indexes for reporting only
• When Performance is more important than Storage
23. Downsides
• Requires PostgreSQL 9.3 or later
• Entire Materialized View must be refreshed to update
• Bad when Live Data is required
  • For this use case, roll your own Materialized View using standard tables
24. Downsides
• Migrations are painful!
• Recommend writing the view definition in SQL, so existing scopes cannot be reused
• Entire Materialized View must be dropped and redefined for any change to the View or the tables it references
• Hard to read and track what changed
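One way to make the drop-and-redefine cycle less repetitive is to generate the statements from a single stored SELECT. The sketch below is an assumption, not the approach used in the talk: the MaterializedViewSql module and its redefine method are hypothetical, and the view and column names are interpolated unescaped, so they must be trusted constants from your own migrations.

```ruby
# Hypothetical helper: build the statements needed to redefine a
# materialized view from one SELECT, in the order they should run.
module MaterializedViewSql
  module_function

  def redefine(name, select_sql, index_columns: [])
    statements = [
      "DROP MATERIALIZED VIEW IF EXISTS #{name}",
      "CREATE MATERIALIZED VIEW #{name} AS\n#{select_sql.strip}"
    ]
    index_columns.each do |column|
      statements << "CREATE INDEX ON #{name} (#{column})"
    end
    statements
  end
end

statements = MaterializedViewSql.redefine(
  'mv_talks_report',
  'SELECT talk_id, avg(score) AS overall_score FROM feedbacks GROUP BY talk_id',
  index_columns: ['overall_score']
)
# In a migration, each statement could then be passed to connection.execute.
```

Keeping the SELECT in one place does not remove the need to rebuild the view, but it leaves a single definition to diff when the report changes.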
25. Resources
• Source Code used in talk
  • https://github.com/droberts84/materialized-view-demo
• PostgreSQL Documentation
  • https://wiki.postgresql.org/wiki/Materialized_Views