The document discusses decoupling application logic from data storage by implementing a plugin-like data layer. It describes how Pixable migrated their data storage from MySQL to MongoDB using this approach, allowing them to move data incrementally while maintaining service. The document outlines the benefits of this decoupled architecture, including easy integration of new data sources and balancing load across storage engines. It also notes some cons like increased queries and code complexity.
The document discusses new features and improvements in Grails 2.0, including enhanced plugin support, NoSQL integration, improved unit testing, static resource handling, GORM updates, and SQL database migration tools. It provides an overview of several new capabilities and summarizes key areas that received attention in the latest Grails release.
This document outlines the topics covered in an Edureka course on MongoDB. The course contains 8 modules that cover MongoDB fundamentals, CRUD operations, schema design, administration, scaling, indexing and aggregation, application integration, and additional concepts and case studies. Each module contains multiple topics that will be taught through online instructor-led classes, recordings, quizzes, assignments, and support.
First slide of the Hadoop deck:
* Introduction to Big Data and Hadoop:
- Presenting and defining big data
- Introducing Hadoop and its history
- How Hadoop works
- HDFS
The document describes CUBRID Cluster, a project to enable CUBRID database to scale out across multiple servers. Key points:
- The goal is to provide dynamic scalability, location transparency, and increased volume/performance compared to a single CUBRID database.
- The design includes a global schema, distributed data partitioning across nodes, global transactions, and dynamic scalability through adding/removing nodes.
- Milestones included implementing the global schema, distributed partitioning, performance improvements, and dynamic scalability features.
- Demo and performance testing showed increased throughput from data and query distribution across multiple servers compared to a single CUBRID database.
Architecting virtualized infrastructure for big data presentation (Vlad Ponomarev)
This document discusses architecting virtualized infrastructure for big data. It summarizes that big data is growing exponentially and new frameworks like Hadoop are enabling analysis of large, diverse data sets. Virtualization can simplify and optimize big data platforms by providing a unified analytics cloud that elastically provisions various data systems like Hadoop and SQL clusters on shared hardware infrastructure. This improves utilization and makes big data platforms faster and easier to deploy and manage.
MongoDB (from humongous) is a cross-platform document-oriented database. Classified as a NoSQL database, MongoDB eschews the traditional table-based relational database structure in favor of JSON-like documents with dynamic schemas (MongoDB calls the format BSON), making the integration of data in certain types of applications easier and faster. Released under a combination of the GNU Affero General Public License and the Apache License, MongoDB is free and open-source software.
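The "dynamic schemas" point above is easiest to see in code. The sketch below is a hypothetical illustration using plain Python dicts to stand in for BSON documents (no MongoDB driver involved); the collection, field names, and `find` helper are all invented for the example.

```python
# Two documents in one "collection" need not share a schema: the second
# document lacks 'languages' and adds 'email', and nothing breaks.
users = [
    {"_id": 1, "name": "Ada", "languages": ["Python", "C"]},
    {"_id": 2, "name": "Linus", "email": "linus@example.com"},
]

def find(collection, **criteria):
    """Mimic a simple equality query, like collection.find({...})."""
    return [doc for doc in collection
            if all(doc.get(k) == v for k, v in criteria.items())]

print(find(users, name="Ada"))  # matches only the first document
```

A relational table would force every row into the same columns; here each document simply carries whatever fields it has.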
This document discusses integrating Scala and MongoDB. It begins with an introduction to MongoDB, describing it as a document-oriented database that uses JSON-like documents and favors embedding related data over foreign keys. It supports various platforms and has a rich query interface. The document then outlines several Scala drivers for MongoDB, including mongo-scala-driver, lift-mongo, and casbah. It also discusses using STM with MongoDB via Akka. Finally, it notes that MongoDB provides native APIs for various languages to interact with it in a way adapted to each language.
Introduction to Web Application Technologies
CGI Programs on the Web Server
What is servlet?
Jobs of servlet
Advantages over CGI
Why are pages built dynamically?
Servlet container
Installation & configuration
- Type 1: Integration of Tomcat server and Eclipse
- Type 2: Java Servlet
Servlet Sample Example
Servlet Overview And Architecture
- Servlet Life cycle/Single Thread Model
- Interface Servlet
- HttpServlet Class
- HttpServletRequest, HttpServletResponse
Slide deck presented at http://devternity.com/ on MongoDB internals. We review the usage patterns of MongoDB, the different storage engines and persistence models, as well as the definition of documents and general data structures.
This document provides an overview of integrating Scala and MongoDB. It begins with introductions to Scala and MongoDB individually, including their key features. It then discusses how Scala and MongoDB can be combined effectively, covering Scala driver libraries for MongoDB like lift-mongo and casbah. It also mentions how Scala's software transactional memory (STM) can be used with MongoDB via Akka. The document aims to illustrate how Scala and MongoDB can work well together.
NYC Amazon Web Services Meetup: How Glue uses AWS (Alex Iskold)
AdaptiveBlue presented on their social networking product called Glue. Glue is a contextual network that automatically connects people around common interests like books, music, movies using semantic technologies. It operates by distributing people's interactions around these things across Amazon Web Services like SimpleDB for data storage, S3 for object metadata, and EC2 for processing. The system scales by partitioning data and requests equally across load balanced EC2 instances behind round robin DNS.
The document introduces NoSQL databases and Couchbase. It notes that rigid schemas and inability to scale data drove many organizations to adopt NoSQL solutions. It then provides an overview of different NoSQL database models and focuses on document databases. The document explains that document databases store data as collections of JSON documents, each with a unique key but flexible structure. This allows embedding all information about an entity into a single document rather than requiring joins across tables as in a relational database. An example shows how user profile information can be stored in one document rather than across multiple tables.
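The embedding-versus-joins contrast described above can be sketched concretely. This is a hypothetical example (the tables, fields, and `build_document` helper are invented): it assembles the kind of single self-contained document a document database would store, from data that a relational design would spread across three tables.

```python
# Relational style: three tables linked by user_id, requiring joins to read.
users_table     = [{"user_id": 7, "name": "Sam"}]
emails_table    = [{"user_id": 7, "email": "sam@example.com"}]
addresses_table = [{"user_id": 7, "city": "Boston"}]

def build_document(user_id):
    """Fold the joined rows into one embedded document."""
    user = next(u for u in users_table if u["user_id"] == user_id)
    return {
        "_id": user_id,
        "name": user["name"],
        "emails": [e["email"] for e in emails_table
                   if e["user_id"] == user_id],
        "addresses": [{"city": a["city"]} for a in addresses_table
                      if a["user_id"] == user_id],
    }

doc = build_document(7)  # everything about the user in one place
```

Reading the profile is now a single document fetch instead of a three-way join.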
Web Application Technologies
What is servlet?
Jobs of servlet
Advantages over CGI
Why are pages built dynamically?
Servlet container
Installation & configuration
- Type 1: Integration of Tomcat server and Eclipse
- Type 2: Java Servlet
Servlet Sample Example
Servlet Overview And Architecture
- Servlet Life cycle/Single Thread Model
- Interface Servlet
- HttpServlet Class
- HttpServletRequest, HttpServletResponse
Handling client requests: HTTP request
Generating server response: HTTP status codes
Handling Session
- Cookies
- Session Tracking
- URL rewriting
- Hidden Form fields
Responsive & Responsible: Implementing Responsive Design at Scale (Scott Jehl)
Scott Jehl of Filament Group discussed building responsive and responsible websites. He advocated a layered approach using progressive enhancement: a basic mobile-first experience, enhanced for newer browsers. Images and layout adapt to different screen sizes using responsive design principles. Accessibility, performance, and usability were highlighted as key areas of responsibility.
The document provides a summary of a database administrator's skills and work experience. It summarizes over 4 years of experience as a DBA working with technologies including MS SQL Server, Oracle, MongoDB, Hadoop and others. Specific experiences are listed, including roles as a database engineer for Cathay Pacific Airways and Aviva Investors, where responsibilities involved installing, configuring and maintaining MongoDB databases in various environments. The summary highlights skills in SQL Server administration, database design, performance tuning, automation, and working on production issues for various projects.
This document summarizes the architecture and infrastructure of Douban.com. It describes how Douban uses Gentoo Linux, MySQL, Memcached, Quixote (Python web framework), and Lighttpd across multiple servers. It discusses strategies for scaling including replication, load balancing, and adding additional application servers. It also describes the evolution of Douban's storage systems from Memcache to DoubanFS to DoubanDB.
Navigating the Transition from relational to NoSQL - CloudCon Expo 2012 (Dipti Borkar)
For more deep NoSQL content from Couchbase, check out http://www.couchbase.com/webinars
NoSQL databases have emerged as a better match than relational systems for modern interactive applications, offering cost-effective data management at “Big Data” scale. But there are significant differences between structured and schema-less database technology. What should architects and technical managers know as they explore NoSQL solutions for their teams?
In this workshop you will learn:
- How to evaluate NoSQL (both technical advantages and limitations) as a potential data management approach
- Critical differences between NoSQL and RDBMS for designing, building and running production applications
- Ideal use cases for NoSQL technology and sample reference architectures
Transition from relational to NoSQL, Philly DAMA Day (Dipti Borkar)
The document discusses the transition from relational databases to NoSQL databases. It notes that the two main drivers for adopting NoSQL databases are the lack of flexibility in relational schemas and the inability to scale out data. It provides examples of different types of NoSQL databases like key-value, document, columnar, and graph databases. The document specifically focuses on distributed document databases, explaining their structure where each record is a self-describing JSON or XML document that can have complex, nested data structures. It compares the relational and document data models, providing an example of how user profile data would be structured in each. Finally, it demonstrates how making changes to data is simpler with a document database by embedding all related information in a single document.
The document discusses Java Database Connectivity (JDBC), which provides a standard interface for connecting to relational databases from Java applications. It describes the JDBC model and programming steps, which include loading a JDBC driver, connecting to a database, executing SQL statements via a Statement object, processing query results stored in a ResultSet, and closing connections. It also covers JDBC driver types, the roles of core classes like Connection and Statement, and transaction handling with JDBC.
JDBC (Java Database Connectivity) is an API that allows Java programs to connect to databases. It provides methods for querying and updating data in a database. The document discusses the history and types of JDBC drivers, how to connect to a database using JDBC, and provides sample code for executing queries and processing result sets.
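The JDBC steps named above (connect, create a Statement, execute SQL, walk the ResultSet, close) have a direct analogue in Python's DB-API, which the sketch below uses purely as an illustration of the same workflow; the table and data are invented for the example.

```python
import sqlite3  # Python's DB-API plays the role JDBC plays in Java

# 1. Connect (DB-API folds driver loading and connection into one call).
conn = sqlite3.connect(":memory:")
try:
    cur = conn.cursor()          # 2. Analogue of a JDBC Statement
    cur.execute("CREATE TABLE person (id INTEGER, name TEXT)")
    cur.execute("INSERT INTO person VALUES (?, ?)", (1, "Ada"))
    conn.commit()                # transaction handling

    cur.execute("SELECT id, name FROM person")
    rows = cur.fetchall()        # 3. Process the "ResultSet"
finally:
    conn.close()                 # 4. Always close the connection
```

The shape is identical in Java; only the class names (`Connection`, `Statement`, `ResultSet`) differ.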
This file explains what DAO and JDBC are in the Spring Framework, and the components present in the Spring Framework.
This file contains several examples of Spring JDBC.
Integration and Batch Processing on Cloud Foundry (Joshua Long)
This talk explores the new possibilities for scale by using Spring Integration, Spring Batch and RabbitMQ on Cloud Foundry, the open source PaaS from VMware.
Java servlets are small programs that run on a web server and dynamically generate web page content. They extend the functionality of web servers and allow for more complex interactions than CGI programs. Servlets support multithreading and sharing of resources, making them faster than CGI programs, which require creating a new process for each request. Common ways to handle form data submitted to servlets include using the getParameter() method to retrieve parameter values by name. Sessions can be tracked across requests using cookies, which are small pieces of data stored in the user's browser by the web server. There are two main architectures for developing JSP applications - page-centric and servlet-centric, with servlet-centric following the MVC pattern and separating business logic from presentation.
This document discusses the future of data storage and the rise of NoSQL databases. It notes that while SQL databases have dominated for decades, their suitability is cracking due to limitations in scaling and integration. NoSQL databases are designed to run on clusters across many machines, have flexible schemas, and are open source. They allow for embracing large scale and reducing development drag. However, relational databases are still relevant for some use cases. The future is one of "polyglot persistence" using the best data storage technology for each application's needs.
MongoDB is a scalable, high-performance, open-source NoSQL database that uses documents with dynamic schemas instead of tables. It supports embedded documents and arrays, replication, and sharding. MongoDB is commonly used for web applications, content management, real-time analytics, and caching due to its fast performance for typical web operations. Some key companies using MongoDB in production include eBay, Craigslist, Foursquare, and Sourceforge.
JavaOne 2016
JMS is pretty simple, right? Once you’ve mastered topics and queues, the rest can appear trivial, but that isn’t the case. The queuing system, whether ActiveMQ, OpenMQ, or WebLogic JMS, provides many more features and settings than appear in the Java EE documentation. This session looks at some of the important extended features and configuration settings. What would you need to optimize if your messages are large or you need to minimize prefetching? What is the best way to implement time-delayed messages? The presentation also looks at dangerous bugs that can be introduced via simple misconfigurations with pooled beans. The JMS APIs are deceptively simple, but getting an implementation into production and tuned correctly can be a bit trickier.
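One question the session poses, time-delayed messages, comes down to a small scheduling idea: stamp each message with a due time and only deliver the ones whose time has come. The sketch below shows that idea in Python; it is a hypothetical in-process illustration, not the JMS API (real brokers implement this server-side, e.g. via ActiveMQ's scheduled-delivery support).

```python
import heapq
import time

class DelayQueue:
    """Minimal sketch of time-delayed delivery using a min-heap."""

    def __init__(self):
        self._heap = []  # (due_time, message) pairs, smallest due_time first

    def send(self, message, delay_seconds=0.0, now=None):
        now = time.monotonic() if now is None else now
        heapq.heappush(self._heap, (now + delay_seconds, message))

    def receive_ready(self, now=None):
        """Pop every message whose due time has passed, in due order."""
        now = time.monotonic() if now is None else now
        ready = []
        while self._heap and self._heap[0][0] <= now:
            ready.append(heapq.heappop(self._heap)[1])
        return ready

q = DelayQueue()
q.send("later", delay_seconds=60, now=0.0)
q.send("soon", delay_seconds=1, now=0.0)
print(q.receive_ready(now=5.0))  # only "soon" is due yet
```

A broker adds persistence and fairness on top, but the due-time ordering is the core of it.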
This document analyzes the performance of MongoDB and HBase databases. It describes the architectures and key characteristics of each database, including MongoDB's document model, auto-sharding, and replication features. It also covers HBase's use of HDFS for storage and Zookeeper for coordination. The document examines the security features of each database, such as authentication, authorization, and encryption. Finally, it discusses findings from literature that NoSQL databases sacrifice ACID properties for scalability and performance.
How to boost performance of your Rails app using DynamoDB and Memcached (Andolasoft Inc)
DynamoDB and Memcached are a powerful combination for your Rails app. If you're looking to improve the performance of your Rails application, this may be the solution for you.
MongoDB is a cross-platform document-oriented database program that uses JSON-like documents with dynamic schemas, commonly referred to as a NoSQL database. It allows for embedding of documents and arrays within documents, hierarchical relationships between data, and indexing of data for efficient queries. MongoDB is developed by MongoDB Inc. and commonly used for big data and content management applications due to its scalability and ease of horizontal scaling.
Introduction to MongoDB and its best practices (AshishRathore72)
This document provides a summary of a presentation on MongoDB best practices. It discusses MongoDB concepts like data modeling, CRUD operations, querying, and aggregation. It also covers topics like MongoDB security, scaling options, real-world use cases, and best practices for hardware, schema design, indexing, and scalability. The presentation provides an overview of using MongoDB effectively.
MongoDB World 2018: Bumps and Breezes: Our Journey from RDBMS to MongoDB (MongoDB)
The document summarizes the journey of migrating from an RDBMS to MongoDB. It describes the pre-MongoDB RDBMS environment, reasons for choosing MongoDB, and the evolution of the MongoDB environment over time. The evolution involved some bumps in configuring databases and applications, but also many breezes like improved performance, flexibility and scalability. Benchmarking showed MongoDB could handle more concurrent users. Future plans include using MongoDB 4.0 features and further optimizing sharding performance.
New generations of database technologies are allowing organizations to build applications never before possible, at a speed and scale that were previously unimaginable. MongoDB is the fastest growing database on the planet, and the new 3.2 release will bring the benefits of modern database architectures to an ever broader range of applications and users.
Spring Data provides a unified model for data access and management across different data access technologies such as relational, non-relational and cloud data stores. It includes utilities such as repository support, object mapping and templating to simplify data access layers. Spring Data MongoDB provides specific support for MongoDB including configuration, mapping, querying and integration with Spring MVC. It simplifies MongoDB access through MongoTemplate and provides a repository abstraction layer.
This document provides an overview of the MEAN stack and demonstrates how to build a sample application with it. It begins with defining each component of the MEAN stack: MongoDB as the database, Express as the web application framework, AngularJS for the frontend framework, and Node.js as the runtime environment. It then demonstrates setting up a basic Express app, integrating authentication with Passport, and interacting with MongoDB using Mongoose. The document also discusses key concepts like asynchronous I/O in Node.js and model-view-controller patterns in AngularJS. Overall, it serves as a high-level introduction to the technologies that make up the MEAN stack.
Node Js, AngularJs and Express Js Tutorial (PHP Support)
This document provides an overview of the Node.js, Express.js, AngularJS, and MongoDB technologies and how they can be used together. It discusses what each technology is, its features and uses. Node.js is a JavaScript runtime built on Chrome's V8 engine for building fast network applications. Express.js is a web framework built on Node.js that simplifies building web apps. AngularJS is a JavaScript framework for building dynamic web apps. MongoDB is a popular open-source NoSQL database that stores data in JSON-like documents.
MEAN is a full-stack JavaScript solution that helps you build fast, robust, and maintainable production web applications.
MEAN stands for:
M – MongoDB (database system)
E – Express (back-end framework)
A – Angular.js (front-end framework)
N – Node.js (back-end runtime environment)
This document summarizes strategies for scaling a Ruby on Rails application. It discusses starting with shared hosting and moving to dedicated servers, scaling the database horizontally using replication or clustering, scaling the web servers by adding more application servers behind a load balancer, implementing user clusters to shard user data, adding caching at various levels using solutions like Squid, Memcached, and fragment caching, and using elastic cloud architectures on services like Amazon EC2. The key steps are horizontal scaling of databases, vertical and horizontal scaling of application servers, implementing user sharding and caching to optimize performance, and using elastic cloud services for on-demand scaling.
MongoDB is a cross-platform, document-oriented database that is free, open source, and scalable. It stores data in flexible, JSON-like documents, allowing for easy addition of new document structures, and uses map-reduce functions for complex data processing and aggregation. Key features include horizontal scalability, replication for high availability, and the ability to handle structured, semi-structured, and unstructured data.
The document discusses how to scale a Ruby on Rails application from a single server setup to a more complex architecture using multiple application servers, load balancing, database replication and clustering, caching with Squid and Memcached, and splitting users and data across multiple databases and servers based on factors like location. It provides an overview of the stages of scaling and considerations at each step from basic hosting to a global deployment with millions of users.
SQL vs NoSQL, an experiment with MongoDB (Marco Segato)
A simple experiment with MongoDB compared to Oracle classic RDBMS database: what are NoSQL databases, when to use them, why to choose MongoDB and how we can play with it.
GoComet is a logistics company that uses data science and machine learning to maximize efficiency. The intern worked on their website using the MERN stack (MongoDB, Express, React, Node). Technologies used include MongoDB for data storage, Express for the backend API, React for the frontend UI, and Node.js as the runtime environment. The intern completed tasks like building forms with React Hook Form, implementing real-time functionality with Socket.io, integrating Redux for state management, and sending emails with Nodemailer.
MongoDB Developer's Notebook, March 2016 -- MongoDB Connector for Business In... (Daniel M. Farrell)
This document provides instructions for configuring MongoDB, the MongoDB Connector for BI, Eclipse, and Toad to allow running SQL queries against MongoDB from within Eclipse. It describes downloading and installing a Postgres JDBC driver, MongoDB, and the MongoDB Connector for BI. It also covers creating a sample MongoDB database and collection with documents, and configuring Eclipse and Toad to connect to MongoDB via the Connector using the JDBC driver. This will allow running SQL queries from within Eclipse to interact with MongoDB data.
MongoDB is a document-oriented and very flexible database, as it offers horizontal scalability.
This presentation describes a basic study of MongoDB, covering installation steps and basic commands.
MongoDB SoCal 2020: Migrate Anything* to MongoDB Atlas (MongoDB)
This presentation discusses migrating data from other data stores to MongoDB Atlas. It begins by explaining why MongoDB and Atlas are good choices for data management. Several preparation steps are covered, including sizing the target Atlas cluster, increasing the source oplog, and testing connectivity. Live migration, mongomirror, and dump/restore options are presented for migrating between replicasets or sharded clusters. Post-migration steps like monitoring and backups are also discussed. Finally, migrating from other data stores like AWS DocumentDB, Azure CosmosDB, DynamoDB, and relational databases are briefly covered.
MongoDB SoCal 2020: Go on a Data Safari with MongoDB Charts! (MongoDB)
These days, everyone is expected to be a data analyst. But with so much data available, how can you make sense of it and be sure you're making the best decisions? One great approach is to use data visualizations. In this session, we take a complex dataset and show how the breadth of capabilities in MongoDB Charts can help you turn bits and bytes into insights.
MongoDB SoCal 2020: Using MongoDB Services in Kubernetes: Any Platform, Devel... (MongoDB)
MongoDB Kubernetes operator and MongoDB Open Service Broker are ready for production operations. Learn about how MongoDB can be used with the most popular container orchestration platform, Kubernetes, and bring self-service, persistent storage to your containerized applications. A demo will show you how easy it is to enable MongoDB clusters as an External Service using the Open Service Broker API for MongoDB
MongoDB SoCal 2020: A Complete Methodology of Data Modeling for MongoDB (MongoDB)
Are you new to schema design for MongoDB, or are you looking for a more complete or agile process than what you are following currently? In this talk, we will guide you through the phases of a flexible methodology that you can apply to projects ranging from small to large with very demanding requirements.
MongoDB SoCal 2020: From Pharmacist to Analyst: Leveraging MongoDB for Real-T... (MongoDB)
Humana, like many companies, is tackling the challenge of creating real-time insights from data that is diverse and rapidly changing. This is our journey of how we used MongoDB to combine traditional batch approaches with streaming technologies to provide continuous alerting capabilities from real-time data streams.
MongoDB SoCal 2020: Best Practices for Working with IoT and Time-series Data (MongoDB)
Time series data is increasingly at the heart of modern applications - think IoT, stock trading, clickstreams, social media, and more. With the move from batch to real time systems, the efficient capture and analysis of time series data can enable organizations to better detect and respond to events ahead of their competitors or to improve operational efficiency to reduce cost and risk. Working with time series data is often different from regular application data, and there are best practices you should observe.
This talk covers:
Common components of an IoT solution
The challenges involved with managing time-series data in IoT applications
Different schema designs, and how these affect memory and disk utilization – two critical factors in application performance.
How to query, analyze and present IoT time-series data using MongoDB Compass and MongoDB Charts
At the end of the session, you will have a better understanding of key best practices in managing IoT time-series data with MongoDB.
MongoDB .local San Francisco 2020: Powering the new age data demands [Infosys] (MongoDB)
Our clients have unique use cases and data patterns that mandate the choice of a particular strategy. To implement these strategies, it is mandatory that we unlearn a lot of relational concepts while designing and rapidly developing efficient applications on NoSQL. In this session, we will talk about some of our client use cases, the strategies we have adopted, and the features of MongoDB that assisted in implementing these strategies.
MongoDB .local San Francisco 2020: Using Client Side Encryption in MongoDB 4.2 (MongoDB)
Encryption is not a new concept to MongoDB. Encryption may occur in-transit (with TLS) and at-rest (with the encrypted storage engine). But MongoDB 4.2 introduces support for Client Side Encryption, ensuring the most sensitive data is encrypted before ever leaving the client application. Even full access to your MongoDB servers is not enough to decrypt this data. And better yet, Client Side Encryption can be enabled at the "flick of a switch".
This session covers using Client Side Encryption in your applications. This includes the necessary setup, how to encrypt data without sacrificing queryability, and what trade-offs to expect.
MongoDB .local San Francisco 2020: Using MongoDB Services in Kubernetes: any ... (MongoDB)
The MongoDB Kubernetes operator is ready for prime time. Learn how MongoDB can be used with the most popular orchestration platform, Kubernetes, to bring self-service, persistent storage to your containerized applications.
MongoDB .local San Francisco 2020: Go on a Data Safari with MongoDB Charts! (MongoDB)
These days, everyone is expected to be a data analyst. But with so much data available, how can you make sense of it and be sure you're making the best decisions? One great approach is to use data visualizations. In this session, we take a complex dataset and show how the breadth of capabilities in MongoDB Charts can help you turn bits and bytes into insights.
MongoDB .local San Francisco 2020: From SQL to NoSQL -- Changing Your Mindset (MongoDB)
When you need to model data, is your first instinct to start breaking it down into rows and columns? Mine used to be too. When you want to develop apps in a modern, agile way, NoSQL databases can be the best option. Come to this talk to learn how to take advantage of all that NoSQL databases have to offer and discover the benefits of changing your mindset from the legacy, tabular way of modeling data. We’ll compare and contrast the terms and concepts in SQL databases and MongoDB, explain the benefits of using MongoDB compared to SQL databases, and walk through data modeling basics so you feel confident as you begin using MongoDB.
MongoDB .local San Francisco 2020: MongoDB Atlas Jumpstart (MongoDB)
Join this talk and test session with a MongoDB Developer Advocate where you'll go over the setup, configuration, and deployment of an Atlas environment. Create a service that you can take back in a production-ready state and prepare to unleash your inner genius.
MongoDB .local San Francisco 2020: Tips and Tricks++ for Querying and Indexin... (MongoDB)
The document discusses guidelines for ordering fields in compound indexes to optimize query performance. It recommends the E-S-R approach: placing equality fields first, followed by sort fields, and range fields last. This allows indexes to leverage equality matches, provide non-blocking sorts, and minimize scanning. Examples show how indexes ordered by these guidelines can support queries more efficiently by narrowing the search bounds.
MongoDB .local San Francisco 2020: Aggregation Pipeline Power++ (MongoDB)
Aggregation pipeline has been able to power your analysis of data since version 2.2. In 4.2 we added more power and now you can use it for more powerful queries, updates, and outputting your data to existing collections. Come hear how you can do everything with the pipeline, including single-view, ETL, data roll-ups and materialized views.
MongoDB .local San Francisco 2020: A Complete Methodology of Data Modeling fo... (MongoDB)
The document describes a methodology for data modeling with MongoDB. It begins by recognizing the differences between document and tabular databases, then outlines a three step methodology: 1) describe the workload by listing queries, 2) identify and model relationships between entities, and 3) apply relevant patterns when modeling for MongoDB. The document uses examples around modeling a coffee shop franchise to illustrate modeling approaches and techniques.
MongoDB .local San Francisco 2020: MongoDB Atlas Data Lake Technical Deep Dive (MongoDB)
MongoDB Atlas Data Lake is a new service offered by MongoDB Atlas. Many organizations store long term, archival data in cost-effective storage like S3, GCP, and Azure Blobs. However, many of them do not have robust systems or tools to effectively utilize large amounts of data to inform decision making. MongoDB Atlas Data Lake is a service allowing organizations to analyze their long-term data to discover a wealth of information about their business.
This session will take a deep dive into the features that are currently available in MongoDB Atlas Data Lake and how they are implemented. In addition, we'll discuss future plans and opportunities and offer ample Q&A time with the engineers on the project.
MongoDB .local San Francisco 2020: Developing Alexa Skills with MongoDB & Golang (MongoDB)
Virtual assistants are becoming the new norm when it comes to daily life, with Amazon’s Alexa being the leader in the space. As a developer, not only do you need to make web and mobile compliant applications, but you need to be able to support virtual assistants like Alexa. However, the process isn’t quite the same between the platforms.
How do you handle requests? Where do you store your data and work with it to create meaningful responses with little delay? How much of your code needs to change between platforms?
In this session we’ll see how to design and develop applications known as Skills for Amazon Alexa powered devices using the Go programming language and MongoDB.
MongoDB .local Paris 2020: Realm: the secret ingredient for better app... (MongoDB)
...to Core Data, appreciated by hundreds of thousands of developers. Learn what makes Realm special and how it can be used to build better applications faster.
MongoDB .local Paris 2020: Upply @MongoDB: When Machine Learning... (MongoDB)
It has never been easier to order online and get delivery in under 48 hours, very often free of charge. This ease of use hides a complex market worth more than $8 trillion.
Data is well known in the Supply Chain world (routes, information on goods, customs, ...), but the value of this operational data remains largely untapped. By combining business expertise with Data Science, Upply is redefining the fundamentals of the Supply Chain, enabling every player to overcome market volatility and inefficiency.
3. Pixable in numbers
~5 million users
9 billion photos (~5 Terabytes)
35 million new photos a day
80 million categories
16 million writes/hour (~40GB/hour)
30 million reads/hour (~120GB/hour)
Logging and profiling
- 15k inserts/sec
7. Presentation
At Pixable we have migrated from/to different data storage solutions. To accomplish this, we built a plugin-like data layer that allows complete separation between application code and data storage. In fact, our whole migration from MySQL to MongoDB was performed over this layer, helping us move chunks of data little by little while learning how the system behaved under the new configuration. During the process, we maintained duplicate copies in MySQL and Mongo for a while, until the transition was complete. All of this happened in a way almost transparent to the application code, requiring very few changes.
During this talk, we are going to show how we built this architecture and how easy it is to integrate other data storages (memcached, S3, etc.) into it. We will also share some tips we've learned along the road, and the pros/cons of working under this scheme.
8. Initial infrastructure
LAMMP (Linux-Apache-Memcache-MySQL-PHP)
[Diagram: Frontend and API backend talking directly to the MySQL user DB.]
class user {
    public $id;
    public $first_name;
    public $last_name;

    public function getUser($id) {
        // Cast to int to avoid SQL injection via string concatenation.
        $sql = 'SELECT * FROM users WHERE id = ' . (int)$id;
        $userRS = $this->db->fetchArray($sql);
        $user = $this->buildUser($userRS);
        return $user;
    }
}
9. Issues encountered
• Limit on the DB connections to the Master.
• Not able to hit the DB hard without generating lag on the slave servers.
• Adding a field to an existing table with billions of records would mean downtime of the App.
• Adding new DB servers was slow (in some cases it required downtime of the app) and high in server costs.
…so we needed a DB engine that was easy to grow, schema-less, and low in server cost.
10. Solution found
MongoDB
• Has built-in sharding.
• ReplicaSet features automatic data cloning, synchronization, and PRIMARY failover.
• Our data fits perfectly in the MongoDB document paradigm.
• Schema-less.
• Easier to have many small machines; failures or maintenance windows are less traumatic.
• Background index creation.
…now we needed a way to start the migration without downtime or data loss.
11. Implementing solution
Migrating code from classes/functions with SQL queries scattered all around the project to the new flexible plugin-like data layer.
[Diagram, before: Frontend and API backend talking directly to the MySQL user DB. After: Frontend and API backend going through a User Data Source that routes to the MySQL user DS and the MongoDB user DS.]
12. Implementing solution – Step 1
User Data Source (plugin manager)
• Gets the call from the backend for the user data source.
• Evaluates the conditions defined by us to decide which Data Source to return, and looks for the user in the correct data source.
• If the conditions for migrating are activated, migrates the user (if not already migrated) to the new DB Engine.
• Returns the Data Source selected by the conditions above.
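The routing-plus-lazy-migration logic of this plugin manager can be sketched as follows. This is a hypothetical sketch in Python (the original code was PHP); class and method names like `UserDataSource.get_user_ds` and the `rollout_percent` knob are illustrative, not Pixable's actual API:

```python
class UserDataSource:
    """Hypothetical plugin manager: picks a data source per user and
    lazily migrates users to the new engine when conditions allow."""

    def __init__(self, mysql_ds, mongo_ds, rollout_percent=0):
        self.mysql_ds = mysql_ds
        self.mongo_ds = mongo_ds
        self.rollout_percent = rollout_percent  # 0..100

    def should_migrate(self, user_id):
        # One condition mentioned in the talk: compare the last two
        # digits of the user id against a configured percentage.
        return (user_id % 100) < self.rollout_percent

    def get_user_ds(self, user_id):
        # Already migrated? Keep serving that user from MongoDB.
        if self.mongo_ds.has_user(user_id):
            return self.mongo_ds
        # Migration condition active: copy the user over, then switch.
        if self.should_migrate(user_id):
            user = self.mysql_ds.get_user(user_id)
            if user is not None:
                self.mongo_ds.save_user(user_id, user)
                return self.mongo_ds
        return self.mysql_ds


class DictDS:
    """Toy stand-in for a real engine plugin, backed by a dict."""

    def __init__(self, users=None):
        self.users = dict(users or {})

    def has_user(self, user_id):
        return user_id in self.users

    def get_user(self, user_id):
        return self.users.get(user_id)

    def save_user(self, user_id, user):
        self.users[user_id] = user
```

With `rollout_percent=50`, users whose id ends in 00-49 are migrated on first access, while the rest keep hitting MySQL; flipping the knob to 100 migrates everyone as they come in.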
13. Implementing solution – Step 2
Building each DB engine plugin
[Diagram: the User Data Source on top of the MySQL, MongoDB, Memcached, … plugins.]
Requirements:
• All plugins have to implement the same set of public methods/functions.
• All have to reply in the exact same data structure and format.
• All plugin constructors may accept another plugin as a parameter so we can chain them together if needed.
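The requirements above amount to a small common interface. A minimal sketch (in Python for brevity; the original plugins were PHP, and the method names here are assumptions) might look like this:

```python
from abc import ABC, abstractmethod


class DataPlugin(ABC):
    """Contract every engine plugin implements: same public methods,
    same normalized reply format, and an optional chained plugin."""

    def __init__(self, next_plugin=None):
        # Constructors may accept another plugin so plugins can be chained.
        self.next_plugin = next_plugin

    @abstractmethod
    def get_user(self, user_id):
        """Return the user as one normalized structure (same keys, same
        types, e.g. dates as Unix timestamps) regardless of the engine."""

    @abstractmethod
    def save_user(self, user_id, user):
        """Persist the normalized user structure."""


class MemoryPlugin(DataPlugin):
    """Toy in-memory plugin standing in for MySQL/MongoDB/Memcached."""

    def __init__(self, next_plugin=None):
        super().__init__(next_plugin)
        self.store = {}

    def get_user(self, user_id):
        user = self.store.get(user_id)
        if user is None and self.next_plugin is not None:
            user = self.next_plugin.get_user(user_id)  # fall through the chain
        return user

    def save_user(self, user_id, user):
        self.store[user_id] = user
```

Because every plugin answers in the same shape, the app can ignore which engine actually served the call, and a cache plugin can simply wrap a persistent one.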
14. Implementing solution – Step 3
Moving all SQL queries from the different classes' methods/functions to the new Data Source infrastructure.

Old Class code:

    class user {
        public $id;
        public $first_name;
        public $last_name;

        public function getUser($id) {
            $sql = 'SELECT * FROM users WHERE id = ' . $id;
            $userRS = $this->db->fetchArray($sql);
            $user = $this->buildUser($userRS);
            return $user;
        }
    }

New Class code:

    class user {
        public $id;
        public $first_name;
        public $last_name;

        public function getUser($id) {
            $uDS = UserDataSource::getUserDS($id);
            $userRS = $uDS->getUser();
            $user = $this->buildUser($userRS);
            return $user;
        }
    }
15. Example 1
Condition:
• Read operations are served from Memcached when the user is found there.
• Write operations write to both MySQL and MongoDB.
[Diagram: Backend → User Data Source → Memcached plugin → MongoMySQL DS → MySQL plugin and MongoDB plugin; read and write paths shown.]
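Wired up, the condition above becomes a short chain: reads try Memcached first, and writes fan out to both engines during the migration. A hypothetical Python sketch (all names illustrative, not the original PHP):

```python
class Engine:
    """Toy stand-in for a MySQL/MongoDB/Memcached plugin."""

    def __init__(self):
        self.store = {}

    def get(self, key):
        return self.store.get(key)

    def put(self, key, value):
        self.store[key] = value


class CachedDualWriteDS:
    """Example 1: serve reads from Memcached when present;
    write to MySQL and MongoDB (and refresh the cache)."""

    def __init__(self, memcached, mysql, mongo):
        self.memcached = memcached
        self.mysql = mysql
        self.mongo = mongo

    def read(self, key):
        value = self.memcached.get(key)
        if value is not None:
            return value                    # cache hit
        value = self.mysql.get(key)         # fall back to a persistent store
        if value is not None:
            self.memcached.put(key, value)  # warm the cache
        return value

    def write(self, key, value):
        self.mysql.put(key, value)          # dual write during migration
        self.mongo.put(key, value)
        self.memcached.put(key, value)
```

The same wiring with different routing rules gives Examples 2 and 3: only the conditions inside the data source change, never the application code.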
16. Example 2
Condition:
• Read and write to MySQL, but use MongoDB as backup.
[Diagram: same chain as Example 1; MySQL serves reads and writes, MongoDB receives backup copies of writes.]
17. Example 3
Condition:
• Only new users are migrated; MongoDB is used as backup for all read operations from existing users.
[Diagram: same chain as Example 1; new users go to MongoDB, existing users are served from MySQL with MongoDB as backup.]
18. Conclusion
Pros:
• Separates your app's code from the Data Storage engines' query languages.
• New Data Engines can be added easily.
• Lets you balance the load generated on each Data Engine.
• As the company grows, one team can be dedicated to Data Plugin development and optimization, while another team develops the application itself.
Cons:
• Your App will generate more queries to the Data Engines.
• You will have to write more lines of code when implementing these plugins than when using only one Data Engine.
19. Final Recap
[Diagram: the App on top of the User Data Source, which fans out to MySQL, MongoDB, Memcached, … plugins.]
Nowadays, fast-changing markets/products force one to evolve during the process.
Q: How many of you started with LAMP?
Easy query, but usually queries are much more complicated than this. Tear them down once the new plugin structure is applied.
To do so, we had to normalize all the objects to one single structure with default values and data types (dates to Unix time format, etc.), as all Data Layers have to reply to the App in the exact same format to guarantee compatibility if we decide to deactivate one of the layers.
Conditions for migrating can range from a simple true/false flag (to start migrating users as they come to the app) to more complex conditionals, like comparing the last two digits of a user ID against a percentage defined by us to decide whether that user should be migrated to the new Data Source. Chain Data Sources.
Plugins IMPLEMENT an interface class, for a minimal public-functions check. Data format = reply with an instance of the object already.
Backend calls insertUser(). The User DS checks conditions and replies with the data source for the user. Memcached won't find it, so it calls the MongoMySQL DS. The MongoMySQL DS then runs the migration logic and determines that, because it's a new user, we store it only in MongoDB.
----- Meeting Notes (10/23/12 18:59) -----
More queries will only happen during the migration process.
You can chain them together. You can have different Data Sources in different geographic locations, for testing a new DB Engine in a location where you have fewer users. Went from LAMMP to LADP (D = Data = multiple Data Storages).