Application architecture for the rest of us - php xperts devcon 2012
APPLICATION ARCHITECTURE FOR THE REST OF US
Presented by M N Islam Shihan
Introduction
• Target audience
• What is architecture?
  - Architecture is the foundation of your application
  - Applications are not like skyscrapers
  - Enterprise vs. personal architecture
• Why look ahead in architecture?
  - Adaptability with growth
  - Maintainability
  - Requirements never end
Security (cont…)
• Think about security first of all
• Network security: implement a firewall & reverse proxy for your network
• SQL injection: never forget to escape field values in your queries
• XSS (Cross-Site Scripting): never display user-provided data (or data grabbed from third-party sources) without sanitizing/escaping it
• CSRF (Cross-Site Request Forgery): never let your forms be submitted from third-party sites
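A minimal sketch of the SQL injection and XSS points, using Python's standard `sqlite3` and `html` modules as stand-ins (in PHP the same ideas map to PDO prepared statements and `htmlspecialchars()`). Binding values through a `?` placeholder is the robust form of "escaping field values": the driver keeps data separate from SQL text. The `users` table and both function names are hypothetical.

```python
import html
import sqlite3

def find_user(conn: sqlite3.Connection, username: str) -> list:
    # The "?" placeholder makes the driver bind the value, so quotes in
    # `username` can never change the structure of the SQL statement.
    cur = conn.execute("SELECT id, name FROM users WHERE name = ?", (username,))
    return cur.fetchall()

def render_comment(comment: str) -> str:
    # Escape &, <, > and quotes before putting untrusted text into HTML.
    return "<p>" + html.escape(comment, quote=True) + "</p>"

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany("INSERT INTO users (name) VALUES (?)", [("alice",), ("bob",)])
    print(find_user(conn, "' OR '1'='1"))  # the payload is just an odd name: []
    print(find_user(conn, "alice"))        # [(1, 'alice')]
    print(render_comment("<script>alert(1)</script>"))
```

The injected string matches no row because it is compared as a literal value, and the `<script>` tag is rendered as harmless text.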
Security (cont…)
• DDoS (Distributed Denial of Service): enable real-time monitoring of access to detect and prevent DDoS attacks
• Session fixation: implement session key regeneration for every request (at the very least on login)
• Always hash your security tokens/cookies with new random salts on a per-request or per-session basis (or at an interval)
• Stay up to date with security news and with releases of all the tools and technologies you use
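A sketch of session key regeneration and salted, per-session security tokens, which also covers the CSRF point on the previous slide. It assumes a hypothetical in-memory `SESSIONS` store; PHP's own equivalent of `regenerate_session` is `session_regenerate_id(true)`. Tokens are derived with HMAC-SHA256 from a random per-session secret that is rotated whenever the session key changes.

```python
import hashlib
import hmac
import secrets
from typing import Any, Dict

# Hypothetical in-memory session store: session_id -> session data.
SESSIONS: Dict[str, Dict[str, Any]] = {}

def new_session() -> str:
    sid = secrets.token_urlsafe(32)
    # Fresh random secret ("salt") per session, used to derive form tokens.
    SESSIONS[sid] = {"csrf_secret": secrets.token_bytes(32), "user": None}
    return sid

def regenerate_session(old_sid: str) -> str:
    """Move session data to a brand-new key and drop the old one, so an
    attacker-planted session ID stops working (defeats session fixation)."""
    data = SESSIONS.pop(old_sid)
    data["csrf_secret"] = secrets.token_bytes(32)  # rotate the salt as well
    new_sid = secrets.token_urlsafe(32)
    SESSIONS[new_sid] = data
    return new_sid

def login(sid: str, user: str) -> str:
    new_sid = regenerate_session(sid)  # always regenerate on privilege change
    SESSIONS[new_sid]["user"] = user
    return new_sid

def csrf_token(sid: str, form_id: str) -> str:
    secret = SESSIONS[sid]["csrf_secret"]
    return hmac.new(secret, form_id.encode(), hashlib.sha256).hexdigest()

def check_csrf(sid: str, form_id: str, submitted: str) -> bool:
    # Constant-time comparison, so the token cannot be guessed via timing.
    return hmac.compare_digest(csrf_token(sid, form_id), submitted)
```

A third-party site cannot read the victim's session secret, so it cannot produce a token that passes `check_csrf` for a forged form post.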
Extendibility
• Implement and use a robust data access interface, so that it can easily be exposed via web services (such as REST, SOAP, JSONP)
• Use architectural patterns & best practices
  - SOA (Service-Oriented Architecture)
  - MVC (Model-View-Controller)
  - Modular architecture with pluggability
• Allow hooks and overrides through events
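One way to "allow hooks and overrides through events" is a small dispatcher that core code calls at well-known points and plugins subscribe to. This is a minimal sketch, similar in spirit to WordPress's `add_filter()`/`apply_filters()`; the class and event names are hypothetical.

```python
from collections import defaultdict
from typing import Any, Callable, DefaultDict, List, Tuple

Listener = Callable[..., Any]

class EventDispatcher:
    """Minimal hook registry: plugins subscribe to named events, core code fires them."""

    def __init__(self) -> None:
        self._listeners: DefaultDict[str, List[Tuple[int, Listener]]] = defaultdict(list)

    def on(self, event: str, listener: Listener, priority: int = 10) -> None:
        self._listeners[event].append((priority, listener))
        # Lower numbers run first; list.sort is stable, so equal priorities
        # keep their registration order.
        self._listeners[event].sort(key=lambda pair: pair[0])

    def filter(self, event: str, value: Any, **context: Any) -> Any:
        """Pass `value` through every listener in turn; each one may override it."""
        for _, listener in self._listeners[event]:
            value = listener(value, **context)
        return value


events = EventDispatcher()
events.on("page_title", lambda title, **_: title.upper())
events.on("page_title", lambda title, **_: "[" + title + "]", priority=20)
print(events.filter("page_title", "home"))  # [HOME]
```

Core code only ever calls `filter("page_title", ...)`; a module can change the behavior without touching that code, which is what keeps the architecture pluggable.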
Availability (cont…)
• Implement a well-planned disaster recovery policy
• Use version control for your sources
• Use RAID for your storage devices
• Keep a hot-standby fallback for each of your primary data/content servers
• Perform periodic backups of your source repository, files & data
• Implement periodic archiving of your old data
• Provide a mechanism for users to switch between current and archived data when possible
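The last two bullets (periodic archiving plus a user-facing switch) can be sketched with SQLite. It assumes hypothetical `orders` and `orders_archive` tables with identical columns; the copy and the delete run in one transaction so a failure cannot lose or duplicate rows.

```python
import sqlite3

def archive_old_orders(conn: sqlite3.Connection, cutoff: str) -> int:
    """Move orders created before `cutoff` (ISO date string) into orders_archive.
    Assumes both tables have the same column layout."""
    with conn:  # commits on success, rolls back on any exception
        conn.execute(
            "INSERT INTO orders_archive SELECT * FROM orders WHERE created_at < ?",
            (cutoff,),
        )
        cur = conn.execute("DELETE FROM orders WHERE created_at < ?", (cutoff,))
    return cur.rowcount  # number of rows moved

def fetch_orders(conn: sqlite3.Connection, include_archive: bool = False) -> list:
    # The flag is the "switch between current and archived data".
    sql = "SELECT id, created_at FROM orders"
    if include_archive:
        sql += " UNION ALL SELECT id, created_at FROM orders_archive"
    return conn.execute(sql + " ORDER BY id").fetchall()
```

A scheduler (cron, for example) would call `archive_old_orders` periodically; the hot `orders` table stays small while archived rows remain reachable on demand.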
Caching
• To cache or not to cache?
  - Analyze the nature of the content and responses your application generates very well
• What to cache?
  - Analyze and set proper expiry times
  - Invalidate the cache whenever content changes
  - Partial caching will also bring you speed
• When is caching bad?
• Understand the various types of web caches
  - Browser cache
  - Proxy cache
  - Gateway cache
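"Set proper expiry time" and "invalidate whenever content changes" are two separate mechanisms, and a cache usually needs both. A minimal in-memory sketch, with a hypothetical `TTLCache` class; the injectable clock only exists to make expiry testable.

```python
import time
from typing import Any, Callable, Dict, Optional, Tuple

class TTLCache:
    """In-memory cache with per-entry expiry and explicit invalidation."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._data: Dict[str, Tuple[float, Any]] = {}

    def set(self, key: str, value: Any, ttl: float) -> None:
        self._data[key] = (self._clock() + ttl, value)

    def get(self, key: str) -> Optional[Any]:
        """Return the cached value, or None on a miss or an expired entry."""
        entry = self._data.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._data[key]  # lazily drop the stale entry
            return None
        return value

    def invalidate(self, key: str) -> None:
        # Call this whenever the underlying content changes.
        self._data.pop(key, None)
```

The TTL bounds how stale an entry can get even if an invalidation is missed, while `invalidate` makes edits visible immediately.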
Caching (cont…)
• Implement server-side caching
  - Runtime in-memory cache
      Per request: global variables
      Shared: Memcached
  - Persistent cache
      Per server: file based, APC
      Shared: DB based, Redis
  - Optimizers and accelerators: eAccelerator, XCache
  - Reverse proxy/gateway cache: Varnish Cache
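The per-request and shared tiers above are typically stacked as a read-through chain: check the request-local dict, then the shared cache, and only then hit the database. In this sketch a plain dict stands in for Memcached/Redis (a hypothetical substitute, not a real client) and `loader` stands in for the expensive query.

```python
from typing import Any, Callable, Dict

class LayeredCache:
    """Read-through lookup: per-request dict -> shared cache -> loader (e.g. DB)."""

    def __init__(self, shared: Dict[str, Any], loader: Callable[[str], Any]) -> None:
        self.request_cache: Dict[str, Any] = {}  # lives for one request only
        self.shared = shared                     # stand-in for Memcached/Redis
        self.loader = loader                     # the expensive path, e.g. a SQL query

    def get(self, key: str) -> Any:
        if key in self.request_cache:
            return self.request_cache[key]
        if key in self.shared:
            value = self.shared[key]
        else:
            value = self.loader(key)
            self.shared[key] = value  # populate the shared tier for other requests
        self.request_cache[key] = value
        return value
```

Creating one `LayeredCache` per request and passing the same `shared` object to each mimics a PHP request reading from a global array before falling back to Memcached.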
Scalability
Scaling up (vertical) vs. scaling out (horizontal)
Scalability
• Database scalability
  - Vertical: add resources to the server as needed; in most cases this produces a single point of failure
  - Horizontal: distribute/replicate data among multiple servers
  - Cloud services: store your data in third-party data centers and pay according to your usage
Scalability (cont…)
Scaling the database: scaling options
• Master/slave: master for writes, slaves for reads
• Cluster computing: single storage with multiple server nodes
• Table partitioning: large tables are split among partitions
• Federated tables: tables are shared among multiple servers
• Distributed key-value stores
• Distributed object DB
• Database sharding
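"Master for write, slaves for read" needs a routing decision somewhere in the data access layer. A minimal sketch that inspects the leading SQL verb; the server names are hypothetical labels and the verb list is a simplifying assumption (it does not handle cases such as `SELECT ... FOR UPDATE`).

```python
import itertools
from typing import Sequence

class ReplicationRouter:
    """Route writes to the master and spread reads over the slaves round-robin.
    Server names are hypothetical labels; a real app would map them to connections."""

    WRITE_VERBS = {"INSERT", "UPDATE", "DELETE", "REPLACE", "CREATE", "ALTER", "DROP"}

    def __init__(self, master: str, slaves: Sequence[str]) -> None:
        self.master = master
        self._slaves = itertools.cycle(slaves) if slaves else None

    def route(self, sql: str) -> str:
        words = sql.split()
        verb = words[0].upper() if words else ""
        if verb in self.WRITE_VERBS or self._slaves is None:
            return self.master  # writes (or no slaves configured) go to the master
        return next(self._slaves)
```

Replication is usually asynchronous, so a read that must see a row the same request just wrote should be sent to the master explicitly.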
Scalability (cont…)
Database sharding
• Smaller databases are easier to manage
• Smaller databases are faster
• Database sharding can reduce costs
• Needs one or more well-defined shard functions
• "Don't do it if you don't need to!" (37signals.com)
• "Shard early and often!" (startuplessonslearned.blogspot.com)
Scalability (cont…)
Database sharding
When appropriate?
• High-transaction database applications
• Mixed workload database usage
  - Frequent reads, including complex queries and joins
  - Write-intensive transactions (CRUD statements, including INSERT, UPDATE, DELETE)
  - Contention for common tables and/or rows
• General business reporting
  - Typical "repeating segment" report generation
  - Some data analysis (mixed with other workloads)
What to analyze?
• Identify all transaction-intensive tables in your schema.
• Determine the transaction volume your database is currently handling (or is expected to handle).
• Identify all common SQL statements (SELECT, INSERT, UPDATE, DELETE) and the volumes associated with each.
• Develop an understanding of the "table hierarchy" contained in your schema; in other words, the main parent-child relationships.
• Determine the "key distribution" for transactions on high-volume tables, to see whether they are evenly spread or concentrated in narrow ranges.
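The last analysis step can be approximated by bucketing the keys touched by transactions into fixed ranges and checking how much traffic the busiest range receives. A sketch, assuming integer keys sampled from a transaction or query log; the bucket size and both function names are hypothetical.

```python
from collections import Counter
from typing import Iterable

def key_distribution(keys: Iterable[int], bucket_size: int) -> Counter:
    """Count transactions per key range [k * bucket_size, (k + 1) * bucket_size)."""
    return Counter(key // bucket_size for key in keys)

def hot_range_share(dist: Counter) -> float:
    """Fraction of all transactions that hit the single busiest key range.
    Near 1 / len(dist) means evenly spread; near 1.0 means concentrated."""
    total = sum(dist.values())
    return max(dist.values()) / total if total else 0.0
```

A high share warns that sharding by key range would leave one shard doing most of the work, which favors a hashing or modulus scheme instead.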
Scalability (cont…)
Database sharding challenges (cont…)
• Avoidance of cross-shard joins
• Auto-increment key management
• Support for multiple shard schemes
  - Session-based sharding
  - Transaction-based sharding
  - Statement-based sharding
• Determine the optimum method for sharding the data
  - Shard by a primary key on a table
  - Shard by the modulus of a key value
  - Maintain a master shard index table
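Two of the sharding methods listed above, plus one way to manage auto-increment keys across shards, sketched with a hypothetical shard count of 4. The ID generator interleaves sequences so each shard's IDs never collide, similar in idea to MySQL's `auto_increment_increment`/`auto_increment_offset` settings; here it is arranged so that `id % NUM_SHARDS` equals the shard that issued the ID.

```python
from typing import Dict

NUM_SHARDS = 4  # hypothetical shard count

def shard_by_modulus(key: int, num_shards: int = NUM_SHARDS) -> int:
    """Shard by the modulus of a key value. Simple, but changing
    num_shards remaps most keys, so plan the count up front."""
    return key % num_shards

class MasterShardIndex:
    """Master shard index table: an explicit key -> shard mapping.
    A dict stands in for a small, replicated lookup table."""

    def __init__(self) -> None:
        self._index: Dict[int, int] = {}

    def assign(self, key: int, shard: int) -> None:
        self._index[key] = shard  # lets hot keys be moved individually

    def lookup(self, key: int) -> int:
        return self._index[key]  # raises KeyError if the key was never assigned

class ShardIdGenerator:
    """Collision-free auto-increment IDs per shard: shard s issues IDs
    congruent to s modulo num_shards (shard 0 starts at num_shards)."""

    def __init__(self, shard: int, num_shards: int = NUM_SHARDS) -> None:
        self._next = shard if shard > 0 else num_shards
        self._step = num_shards

    def next_id(self) -> int:
        value = self._next
        self._next += self._step
        return value
```

The modulus scheme needs no lookups; the index table costs an extra read per query but allows rebalancing individual keys without rehashing everything.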
Scalability (cont…)
Database sharding example
[Figure: Bookstore schema showing how data is sharded]
Think Ahead (cont…)
• Understand the business model
• Analyze requirements in the greatest detail
• Plan for extendibility
• Be agile: do incremental architecture
• Create/use frameworks
• SQL or NoSQL?
• Sharding or clustering or both?
• Cloud services?
Guidelines
• Enrich your knowledge: read, read & read. Read anything available, from jokes to religion.
• Follow patterns & best practices
• Mix technologies
• Don't let your tools/technologies limit your vision
• Invent/customize technology if required
• Use FOSS
  - Don't expect ready-made solutions
  - Find the closest match
  - Customize as needed
Guidelines (cont…)
Database optimization
• Use established & proven solutions: MySQL, PostgreSQL, MongoDB, Redis, Memcached, CouchDB
• Understand and utilize indexing & full-text search
• Use optimized DB structures & algorithms
  - Modified Preorder Tree Traversal (MPTT)
  - MapReduce
• ORM or not?
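Modified Preorder Tree Traversal stores each tree node with a `lft`/`rgt` pair assigned during a depth-first walk, so an entire subtree can be fetched with one non-recursive range query. A sketch using SQLite; the `category` table and the sample tree are hypothetical.

```python
import sqlite3
from typing import Dict, List, Tuple

def mptt_numbers(tree: Dict[str, List[str]], root: str) -> Dict[str, Tuple[int, int]]:
    """Assign (lft, rgt) with a preorder walk: a node's descendants are exactly
    the nodes whose lft lies strictly between its own lft and rgt."""
    numbers: Dict[str, Tuple[int, int]] = {}
    counter = 1

    def visit(node: str) -> None:
        nonlocal counter
        lft = counter
        counter += 1
        for child in tree.get(node, []):
            visit(child)
        numbers[node] = (lft, counter)
        counter += 1

    visit(root)
    return numbers

def load_tree(conn: sqlite3.Connection, numbers: Dict[str, Tuple[int, int]]) -> None:
    conn.execute("CREATE TABLE category (name TEXT PRIMARY KEY, lft INTEGER, rgt INTEGER)")
    conn.executemany("INSERT INTO category VALUES (?, ?, ?)",
                     [(name, l, r) for name, (l, r) in numbers.items()])

def descendants(conn: sqlite3.Connection, name: str) -> List[str]:
    # One range query fetches the whole subtree, in preorder.
    rows = conn.execute(
        "SELECT child.name FROM category AS parent JOIN category AS child "
        "ON child.lft > parent.lft AND child.lft < parent.rgt "
        "WHERE parent.name = ? ORDER BY child.lft", (name,))
    return [row[0] for row in rows]
```

The trade-off is on writes: inserting a node shifts the `lft`/`rgt` values of everything to its right, so MPTT suits read-heavy trees such as category menus.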
Guidelines (cont…)
Database optimization
• Optimize your queries
  - One big query is faster than repeated smaller queries
  - Never be too lazy to write optimized queries
  - One Ring to Rule 'em All
• Use a runtime in-memory cache
  - Filtering an in-memory cached dataset is much faster than executing a query in the DB
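"One big query is faster than repeated smaller queries" is often met as the N+1 query problem. This sketch contrasts a per-author loop with a single `IN (...)` query whose rows are then grouped in application memory; the `book` table and function names are hypothetical, and query counts are tracked by hand to make the difference visible.

```python
import sqlite3
from typing import Dict, List, Sequence, Tuple

Result = Tuple[Dict[int, List[str]], int]  # (titles per author, queries issued)

def titles_n_plus_one(conn: sqlite3.Connection, author_ids: Sequence[int]) -> Result:
    """One query per author: N round trips to the database."""
    result: Dict[int, List[str]] = {}
    for aid in author_ids:
        rows = conn.execute(
            "SELECT title FROM book WHERE author_id = ? ORDER BY id", (aid,)
        ).fetchall()
        result[aid] = [row[0] for row in rows]
    return result, len(author_ids)

def titles_one_query(conn: sqlite3.Connection, author_ids: Sequence[int]) -> Result:
    """One query for all authors, then grouping in memory."""
    if not author_ids:
        return {}, 0
    # Only "?" markers are interpolated into the SQL; values are still bound.
    placeholders = ",".join("?" * len(author_ids))
    rows = conn.execute(
        f"SELECT author_id, title FROM book WHERE author_id IN ({placeholders}) ORDER BY id",
        list(author_ids),
    ).fetchall()
    result: Dict[int, List[str]] = {aid: [] for aid in author_ids}
    for aid, title in rows:
        result[aid].append(title)
    return result, 1
```

Both return the same mapping; the second pays one round trip instead of N, which matters most when the database sits on another server.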
Guidelines (cont…)
One Ring to Rule 'em All
Perform selection, then projection, then join
[Figure: tables A (1,000 records), B (1,000,000 records) and C (1,000,000,000 records), linked by the foreign keys a_id and b_id]
A simple example
Write a standard SQL query to find all records with fields A.a1, B.b1 and C.c1 from tables A(id, a1, a2, a3, …, aP), B(id, a_id, b1, b2, b3, …, bQ) and C(id, b_id, c1, c2, c3, …, cR), given that A.aX, B.bY and C.cZ match the values 'X', 'Y' and 'Z' respectively. Assume that tables A, B and C all have primary keys defined by the id column, and that a_id and b_id are the foreign keys in B (referencing A) and in C (referencing B) respectively.
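The rule "selection, then projection, then join" can be written in relational algebra (σ selection, π projection, ⋈ join). The first expression reads a query literally as a cross product followed by one big selection; the second filters and trims each table before joining. Both return the same rows, because each filter condition refers to a single table and each inner projection keeps the key columns that the joins need.

```latex
% Literal reading: cross product, then selection, then projection
\pi_{A.a1,\,B.b1,\,C.c1}\Big(\sigma_{A.id = B.a\_id \,\wedge\, B.id = C.b\_id \,\wedge\, A.aX = \text{'X'} \,\wedge\, B.bY = \text{'Y'} \,\wedge\, C.cZ = \text{'Z'}}\,(A \times B \times C)\Big)

% Selection, then projection, then join
\pi_{A.a1,\,B.b1,\,C.c1}\Big(
  \pi_{id,\,a1}\big(\sigma_{aX = \text{'X'}}(A)\big)
  \bowtie_{A.id = B.a\_id}
  \pi_{id,\,b1,\,a\_id}\big(\sigma_{bY = \text{'Y'}}(B)\big)
  \bowtie_{B.id = C.b\_id}
  \pi_{id,\,c1,\,b\_id}\big(\sigma_{cZ = \text{'Z'}}(C)\big)
\Big)
```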
Guidelines: One Ring to Rule 'em All (cont…)
Solution 1
    SELECT A.a1, B.b1, C.c1
    FROM A, B, C
    WHERE A.id = B.a_id AND B.id = C.b_id
      AND A.aX = 'X' AND B.bY = 'Y' AND C.cZ = 'Z'
Why does it suck?
• Remember the sizes of tables A, B and C?
• Evaluated literally, a cross product of tables is extremely memory intensive. Why?
• A x B x C will have 1,000 x 1,000,000 x 1,000,000,000 records with (P + 1) + (Q + 2) + (R + 2) fields
• Can you imagine the size of the in-memory result set of the joined tables?
• It will be HUGE
Guidelines: One Ring to Rule 'em All (cont…)
Solution 2
    SELECT A.a1, B.b1, C.c1
    FROM A
    INNER JOIN B ON A.id = B.a_id
    INNER JOIN C ON B.id = C.b_id
    WHERE A.aX = 'X' AND B.bY = 'Y' AND C.cZ = 'Z'
Why does it still suck?
• A ⋈ B ⋈ C will produce (1,000 x 1,000,000) records to perform A ⋈ B, then produce another (1,000 x 1,000,000,000) records to compute (A ⋈ B) ⋈ C, and only then filter the records defined by the WHERE clause.
• The number of fields, that is P+1 in A, Q+2 in B and R+2 in C, also contributes to memory consumption.
• It is better optimized, but still HUGE in terms of memory consumption and computation.
Guidelines: One Ring to Rule 'em All (cont…)
Optimal Solution
    SELECT A.a1, B.b1, C.c1
    FROM (SELECT id, a1 FROM A WHERE aX = 'X') AS A
    INNER JOIN (SELECT id, b1, a_id FROM B WHERE bY = 'Y') AS B ON A.id = B.a_id
    INNER JOIN (SELECT id, c1, b_id FROM C WHERE cZ = 'Z') AS C ON B.id = C.b_id
Why does this solution outperform the others?
• Let's keep the explanation as an exercise
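To experiment with the three queries, this sketch loads a tiny hypothetical data set (only the columns the queries use) into SQLite and checks that all three return the same rows. It does not measure speed: how much the rewrite helps on a real engine depends on its optimizer, which is worth checking with `EXPLAIN` in MySQL or `EXPLAIN QUERY PLAN` in SQLite.

```python
import sqlite3

SOLUTION_1 = """SELECT A.a1, B.b1, C.c1 FROM A, B, C
WHERE A.id = B.a_id AND B.id = C.b_id
  AND A.aX = 'X' AND B.bY = 'Y' AND C.cZ = 'Z'"""

SOLUTION_2 = """SELECT A.a1, B.b1, C.c1 FROM A
INNER JOIN B ON A.id = B.a_id
INNER JOIN C ON B.id = C.b_id
WHERE A.aX = 'X' AND B.bY = 'Y' AND C.cZ = 'Z'"""

OPTIMAL = """SELECT A.a1, B.b1, C.c1
FROM (SELECT id, a1 FROM A WHERE aX = 'X') AS A
INNER JOIN (SELECT id, b1, a_id FROM B WHERE bY = 'Y') AS B ON A.id = B.a_id
INNER JOIN (SELECT id, c1, b_id FROM C WHERE cZ = 'Z') AS C ON B.id = C.b_id"""

def build_sample(conn: sqlite3.Connection) -> None:
    # Hypothetical rows; only the shape of the schema matters here.
    conn.executescript("""
        CREATE TABLE A (id INTEGER PRIMARY KEY, a1 TEXT, aX TEXT);
        CREATE TABLE B (id INTEGER PRIMARY KEY, a_id INTEGER, b1 TEXT, bY TEXT);
        CREATE TABLE C (id INTEGER PRIMARY KEY, b_id INTEGER, c1 TEXT, cZ TEXT);
        INSERT INTO A VALUES (1, 'a-one', 'X'), (2, 'a-two', 'no');
        INSERT INTO B VALUES (10, 1, 'b-ten', 'Y'), (11, 2, 'b-eleven', 'Y'),
                             (12, 1, 'b-twelve', 'no');
        INSERT INTO C VALUES (100, 10, 'c-100', 'Z'), (101, 11, 'c-101', 'Z'),
                             (102, 12, 'c-102', 'Z'), (103, 10, 'c-103', 'no');
    """)

def run(conn: sqlite3.Connection, sql: str) -> list:
    return sorted(conn.execute(sql).fetchall())

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    build_sample(conn)
    for name, sql in [("Solution 1", SOLUTION_1), ("Solution 2", SOLUTION_2), ("Optimal", OPTIMAL)]:
        print(name, run(conn, sql))
```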