To improve the performance of database-centric, cloud-based web applications, developers usually use caching frameworks to speed up database accesses. Such caching frameworks require extensive knowledge of the application to operate effectively. However, all too often developers have limited knowledge about the intricate details of their own application. Hence, most developers find configuring caching frameworks a challenging and time-consuming task that requires extensive and scattered code changes. Furthermore, developers may also need to change such configurations frequently to accommodate the ever-changing workload.
In this paper, we propose CacheOptimizer, a lightweight approach that helps developers optimize the configuration of caching frameworks for web applications that are implemented using Hibernate. CacheOptimizer leverages readily-available web logs to create mappings between a workload and database accesses. Given the mappings, CacheOptimizer discovers the optimal cache configuration using coloured Petri nets, and automatically adds the appropriate cache configurations to the application. We evaluate CacheOptimizer on three open-source web applications. We find that i) CacheOptimizer improves the throughput by 27-138%; and ii) after considering both the memory cost and throughput improvement, CacheOptimizer still brings statistically significant gains (with mostly large effect sizes) in comparison to the application's default cache configuration and to blindly enabling all possible caches.
FSE2016 - CacheOptimizer: Helping Developers Configure Caching Frameworks for Hibernate-based Database-centric Web Applications
1. CacheOptimizer: Helping Developers Configure Caching Frameworks for Hibernate-based Database-Centric Web Applications
Tse-Hsun (Peter) Chen, Weiyi Shang, Ahmed E. Hassan, Mohamed Nasser, Parminder Flora
2. Modern Database-Centric Web Applications: Millions of Users, Billions of Transactions
– Over 1 billion page views per day
– 44 billion SQL executions per day
– 8 billion minutes online every day
– Over 1.2 million photos a second at peak
3. Downtime of large-scale applications is very costly
– Gmail’s 25 to 55 minute outage affected 42 million users.
– Azure service was interrupted for 11 hrs, affecting Azure users world-wide.
– Facebook went down for 35 minutes, losing $854,700.
[Timeline, 2014: Jan 24th, Oct 28th, Nov 19th]
4. Downtime of large systems is very costly
Often caused by performance problems
8. Over 67% of Java developers use Hibernate to access databases
[Chart: usage shares of 67% and 22%]
We focus on Hibernate due to its popularity, but our approach should be applicable to other database technologies.
9. An example class with Hibernate code
Group.java
    @Entity
    @Table(name = "group")
    @Cacheable
    public class Group {
        @Column(name = "id")
        private int id;

        @Column(name = "name")
        String groupName;

        // Simplified: "session" stands for an open Hibernate Session.
        Group findGroupById(int id) {
            return session
                .createQuery("select g from Group g where g.id = :id", Group.class)
                .setParameter("id", id)
                .setCacheable(true)   // cache the query result
                .uniqueResult();
        }
    }
– The Group class is mapped to the "group" table in the DB
– id is mapped to the column "id" in the group table
– @Cacheable on the class: object-level cache (cache retrieval by id)
– Caching on the query: query-level cache (cache query result)
There can be thousands of possible cache configurations
11. Caching helps improve performance
[Diagram: the application server (Hibernate with an app-level cache) sits in front of the database. Group g = findGroupById(1); is issued twice, and Group1 is held in the app-level cache.]
12. Sub-optimal cache configurations are harmful to performance
[Diagram: the application server (Hibernate with an app-level cache holding Group1) sits in front of the database. The requests shown are:]
    Group g = findGroupById(1);
    g.setName("FSE");
    Group g = findGroupById(1);
    …
It is important to understand user behaviors in order to find the optimal cache configuration.
13. Problem: Understanding user behavior in production is very difficult
[Diagram: User, Application server, Hibernate]
– Instrumentation adds too much overhead!
– The optimal cache configuration evolves in production, which requires regular updates
14. Our solution: Recover user behaviors by analyzing readily-available logs
[Diagram: User, Source Code, Application server, Database, CacheOptimizer. CacheOptimizer steps shown: apply optimal cache config, update executable.]
16. Apply static analysis to extract database access information
    @Get
    @Path("/group/{id}")
    Group getGroup(id) {
        getGroupById(id);
        …
    }

    Group getGroupById(id) {
        select g from Group g
        where g.id = id …
    }
– Find HTTP request handler methods by analyzing annotations
– Apply inter-procedural data flow analysis to see if inputs from the HTTP request are used as querying criteria
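The first step above, locating request handlers through their annotations, can be illustrated with a minimal sketch. It uses runtime reflection rather than the static analysis the slides describe, purely to keep the example self-contained. The `Get` and `Path` annotations, the `GroupResource` class, and `findHandlers` are hypothetical stand-ins defined here for illustration; they are not the annotations of any particular web framework, and the data flow analysis step is not covered.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;
import java.util.Map;
import java.util.TreeMap;

class HandlerScan {
    // Hypothetical stand-ins for the routing annotations shown on the slide.
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.METHOD)
    @interface Get {}

    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.METHOD)
    @interface Path { String value(); }

    // Hypothetical resource class mirroring the slide's example.
    static class GroupResource {
        @Get
        @Path("/group/{id}")
        public String getGroup(int id) { return getGroupById(id); }

        // Not annotated, so it is not reported as a request handler.
        String getGroupById(int id) { return "group-" + id; }
    }

    // Maps each "GET <url template>" to the name of the method handling it.
    static Map<String, String> findHandlers(Class<?> resource) {
        Map<String, String> handlers = new TreeMap<>();
        for (Method m : resource.getDeclaredMethods()) {
            Path path = m.getAnnotation(Path.class);
            if (m.isAnnotationPresent(Get.class) && path != null) {
                handlers.put("GET " + path.value(), m.getName());
            }
        }
        return handlers;
    }

    public static void main(String[] args) {
        System.out.println(findHandlers(GroupResource.class));
    }
}
```

A static analyzer would read the same annotations from source or bytecode without loading the classes; the mapping it produces (URL template to handler method) is what later steps rely on.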
18. Example: Recovered database access
Web access log:
    10.10.10.1 - - [11/Apr/2015:12:19:30] 200 "GET /app/group/1 "
    10.10.10.1 - - [11/Apr/2015:12:19:31] 200 "GET /app/group/2 "
    10.10.10.1 - - [11/Apr/2015:12:19:32] 200 "GET /app/group/1 "
Request handler:
    @Get
    @Path("/group/{id}")
    Group getGroup(id) {
        …
        select g from Group g
        where g.id = id …
    }
Recovered database accesses:
– Read operation on Group table, record with id 1, time is 11/Apr/2015:12:19:30
– Read operation on Group table, record with id 2, time is 11/Apr/2015:12:19:31
– Read operation on Group table, record with id 1, time is 11/Apr/2015:12:19:32
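The recovery step in this example can be sketched as a small log parser. This is an illustration under stated assumptions, not CacheOptimizer's implementation: the log line layout is taken from the slide's excerpt, the `/app` prefix and the rule "GET /app/group/{id} reads the Group record with that id" are assumed to come from the static analysis step, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AccessLogRecovery {
    // Matches lines shaped like the slide's excerpt:
    // 10.10.10.1 - - [11/Apr/2015:12:19:30] 200 "GET /app/group/1 "
    private static final Pattern LOG_LINE =
        Pattern.compile("\\[([^\\]]+)\\]\\s+\\d{3}\\s+\"(\\w+)\\s+(\\S+)\\s*\"");

    // Assumed result of static analysis: this URL shape reads Group by id.
    private static final Pattern GROUP_BY_ID = Pattern.compile("/app/group/([^/]+)");

    // Turns raw log lines into one description per recovered database access.
    static List<String> recover(List<String> logLines) {
        List<String> accesses = new ArrayList<>();
        for (String line : logLines) {
            Matcher log = LOG_LINE.matcher(line);
            if (!log.find()) {
                continue; // skip lines that are not in the expected format
            }
            String time = log.group(1);
            String method = log.group(2);
            String url = log.group(3);
            Matcher route = GROUP_BY_ID.matcher(url);
            if (method.equals("GET") && route.matches()) {
                accesses.add("Read Group id=" + route.group(1) + " at " + time);
            }
        }
        return accesses;
    }

    public static void main(String[] args) {
        List<String> lines = List.of(
            "10.10.10.1 - - [11/Apr/2015:12:19:30] 200 \"GET /app/group/1 \"",
            "10.10.10.1 - - [11/Apr/2015:12:19:31] 200 \"GET /app/group/2 \"");
        recover(lines).forEach(System.out::println);
    }
}
```

Because only the existing web access log is read, nothing has to be instrumented in the running application, which is the point the slides make about overhead.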
19. Overview of CacheOptimizer
[Diagram:
– Source code → static analysis → database access information, e.g. @Get @Path("/group/{id}") with "select g from Group g where g.id = id …"
– System running in production → web logs, e.g. 10.10.10.1 - - [11/Apr/2015:12:19:30] 200 "GET /app/group/1 " …
– Database access information + web logs → user database accesses → cache configuration → build system]
20. Calculating optimal cache configuration via workload simulation
[Diagram: a timeline of incoming requests ("Read group with id 1", "Update group with id 1"), with each point labelled as cache consideration, cache hit, invalidated cache, or no longer considered for caching]
Miss ratio is ½ (one cache hit)
We keep track of the cache miss ratio for each potential cache location
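The bookkeeping described on this slide can be sketched as a simple replay loop. The abstract states that CacheOptimizer uses coloured Petri nets for this; the sketch below swaps in a plain loop instead, treats each table as one potential cache location, and assumes an unbounded cache where a read of an uncached record is a miss and an update invalidates the cached record. All names are hypothetical.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class CacheSimulation {
    enum Op { READ, UPDATE }

    // One recovered database access, already sorted by time.
    static final class Access {
        final Op op;
        final String table;
        final int id;

        Access(Op op, String table, int id) {
            this.op = op;
            this.table = table;
            this.id = id;
        }
    }

    // Replays the workload and returns misses / reads for each table.
    static Map<String, Double> missRatios(List<Access> workload) {
        Map<String, Set<Integer>> cached = new HashMap<>();
        Map<String, int[]> counts = new HashMap<>(); // per table: {reads, misses}
        for (Access a : workload) {
            Set<Integer> entries = cached.computeIfAbsent(a.table, t -> new HashSet<>());
            if (a.op == Op.UPDATE) {
                entries.remove(a.id); // a write invalidates the cached record
                continue;
            }
            int[] c = counts.computeIfAbsent(a.table, t -> new int[2]);
            c[0]++;
            boolean miss = entries.add(a.id); // add() is true if it was not cached yet
            if (miss) {
                c[1]++;
            }
        }
        Map<String, Double> ratios = new HashMap<>();
        for (Map.Entry<String, int[]> e : counts.entrySet()) {
            ratios.put(e.getKey(), (double) e.getValue()[1] / e.getValue()[0]);
        }
        return ratios;
    }

    public static void main(String[] args) {
        List<Access> workload = List.of(
            new Access(Op.READ, "Group", 1),
            new Access(Op.READ, "Group", 1),
            new Access(Op.UPDATE, "Group", 1));
        System.out.println(missRatios(workload));
    }
}
```

With two reads of the same record followed by an update, one read misses and one hits, which matches the slide's miss ratio of ½. A location whose reads are mostly preceded by updates would approach a miss ratio of 1 and would be a poor caching candidate.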
21. Studied applications
– Performance benchmarking e-commerce application, > 35K LOC
– Medical record application, > 3.8M LOC
– Simple open-source application for a pet clinic, 3.3K LOC
• We use JMeter tests to simulate user behaviours
• Database is pre-populated with hundreds of MB of data
22. Comparing throughput improvements under different cache configs
We compare three different cache configurations against having no cache (baseline):
• CacheAll: Enable all caches
• Default: Cache configurations that are already added in the application (what developers think should be cached)
• CacheOptimizer: The optimal cache config discovered using CacheOptimizer