The document provides an overview of application architecture concepts for non-enterprise applications. It discusses topics like security, responsiveness, extensibility, availability, load management, caching, distributed computing, and scalability. Specific techniques are recommended, such as implementing firewalls and reverse proxies for security, optimizing queries and caching for performance, and database sharding or clustering for scalability. The document emphasizes planning for growth, adaptability, and maintainability from the beginning.
2. Introduction
Target Audience
What is Architecture?
Architecture is the foundation of your application
Applications are not like skyscrapers
Enterprise vs. Personal Architecture
Why look ahead in Architecture?
Adaptability with growth
Maintainability
Requirements never end
5. Security (cont…)
Think about security first of all
Network Security: Implement a firewall and a reverse proxy for your network
SQL Injection: Never forget to escape field values in your queries
XSS (Cross-Site Scripting): Never trust user-provided data (or data grabbed from third-party sources), and never display it without sanitizing/escaping
CSRF (Cross-Site Request Forgery): Never let your forms be submitted from third-party sites
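The SQL injection and XSS points above can be sketched in a few lines. This is a minimal illustration, not the deck's own code: it assumes Python's standard sqlite3 and html modules, a hypothetical users table, and hypothetical helper names (find_user, render_comment). It uses bound parameters rather than manual escaping, which is a swapped-in technique that serves the same goal.

```python
import html
import sqlite3


def find_user(conn, username):
    # The "?" placeholder makes the driver bind the value separately,
    # so user input is never spliced into the SQL text.
    cur = conn.execute("SELECT id, name FROM users WHERE name = ?", (username,))
    return cur.fetchall()


def render_comment(text):
    # Escape <, >, & and quotes before placing user input into HTML.
    return "<p>" + html.escape(text) + "</p>"


# Hypothetical in-memory table used only for this illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO users (name) VALUES (?)", [("alice",), ("bob",)])

rows_ok = find_user(conn, "alice")
rows_attack = find_user(conn, "alice' OR '1'='1")  # treated as a literal name
safe_html = render_comment("<script>alert(1)</script>")
```

With bound parameters the injection attempt matches no row, because the whole string is compared as a name instead of being parsed as SQL.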
6. Security (cont…)
DDoS (Distributed Denial of Service): Enable real-time monitoring of access to detect and prevent DDoS attacks
Session fixation: Implement session key regeneration for every request
Always hash your security tokens/cookies with new random salts per request or per session (or at a fixed interval)
Stay up to date with security news and releases for all the tools and technologies you use
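One way to read the salted-token and session-regeneration advice is sketched below. It is an assumption-laden illustration: SERVER_SECRET, issue_token, verify_token and regenerate_session_id are hypothetical names, and the use of HMAC-SHA256 from Python's standard library is a choice made here, not something the slides specify.

```python
import hashlib
import hmac
import secrets

# Hypothetical server-side secret; a real deployment would load it from config.
SERVER_SECRET = secrets.token_bytes(32)


def _sign(session_id, salt):
    # Keyed hash over the salt and the session id.
    message = (salt + ":" + session_id).encode("utf-8")
    return hmac.new(SERVER_SECRET, message, hashlib.sha256).hexdigest()


def issue_token(session_id):
    # A fresh random salt per call means the token differs every time.
    salt = secrets.token_hex(16)
    return salt, _sign(session_id, salt)


def verify_token(session_id, salt, digest):
    # compare_digest avoids leaking information through timing differences.
    return hmac.compare_digest(_sign(session_id, salt), digest)


def regenerate_session_id():
    # Issue a new unpredictable session key, e.g. after login or per request.
    return secrets.token_urlsafe(32)
```

Regenerating the session id on privilege changes (or on every request, as the slide suggests) is what defeats a fixated id; the salted token only protects the cookie value itself.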
8. Responsiveness (cont…)
Web applications should be as responsive as desktop applications
Plan well and make good use of JavaScript to achieve responsiveness
Detect browsers and provide a separate response/interface depending on the detected browser type
Use JavaScript unobtrusively
Make optimal use of Ajax
Use Comet programming instead of polling
Implement deferred/asynchronous processing of large computations using a job queue
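The job-queue idea can be sketched with an in-process queue and one worker thread. This is only an illustration under assumptions: a production setup would more likely use a separate queue service and worker processes, and the names jobs, results, worker and submit are hypothetical.

```python
import queue
import threading

jobs = queue.Queue()
results = {}


def worker():
    # Pull jobs off the queue and run the expensive part outside the request.
    while True:
        job = jobs.get()
        if job is None:  # sentinel: stop the worker
            jobs.task_done()
            break
        job_id, n = job
        results[job_id] = sum(i * i for i in range(n))  # stand-in for heavy work
        jobs.task_done()


def submit(job_id, n):
    # The request handler only enqueues and returns immediately.
    jobs.put((job_id, n))
    return job_id


threading.Thread(target=worker, daemon=True).start()
submit("report-1", 1000)
jobs.join()  # only for the demo: wait until the queued work is finished
```

In a web application the handler would return the job id right after submit, and the client would fetch the result later.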
9. Extendibility
Implement and use a robust data access interface, so that it can be exposed easily via web services (like REST, SOAP, JSONP)
Use architectural patterns & best practices
  SOA (Service-Oriented Architecture)
  MVC (Model View Controller)
Modular architecture with pluggability
Allow hooks and overrides through events
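A small event registry shows what "hooks and overrides through events" might look like in practice. The class name EventHooks, its on/fire methods and the "before_render" event are hypothetical, chosen only for this sketch.

```python
from collections import defaultdict


class EventHooks:
    """Minimal registry: plugins attach handlers, the core fires events."""

    def __init__(self):
        self._handlers = defaultdict(list)

    def on(self, event, handler):
        # Register a handler; handlers run in registration order.
        self._handlers[event].append(handler)

    def fire(self, event, value):
        # Each handler may transform the value and must return it.
        for handler in self._handlers[event]:
            value = handler(value)
        return value


hooks = EventHooks()
hooks.on("before_render", lambda title: title.strip())
hooks.on("before_render", lambda title: title.upper())
rendered = hooks.fire("before_render", "  hello ")
```

The core code only calls fire at well-known points, so modules can change behavior without the core being edited.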
11. Availability (cont…)
Implement a well-planned disaster recovery policy
Use version control for your sources
Use RAID for your storage devices
Keep a hot standby fallback for each of your primary data/content servers
Perform periodic backups of your source repository, files & data
Implement periodic archiving of your old data
Provide a mechanism for users to switch between current and archived data when possible
13. Load Management (cont…)
Monitor and benchmark your servers periodically and find the peak usage time
Optimize to support at least 150% of peak-time load
Use web servers with high I/O performance
Introduce a load balancer to distribute load among multiple application servers
Start with a software load balancer (i.e. a reverse proxy), then move to a hardware load balancer only if necessary
Use CDNs to serve your static content
Use public CDNs to serve open source JavaScript or CSS files when possible
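The 150% headroom rule above turns into simple arithmetic once a peak load and a per-server capacity have been measured. The function name servers_needed and the request rates below are hypothetical numbers used only to illustrate the calculation.

```python
import math


def servers_needed(peak_rps, per_server_rps, headroom=1.5):
    # Target capacity is the measured peak multiplied by the headroom factor;
    # round up because a fraction of a server cannot be provisioned.
    return math.ceil(peak_rps * headroom / per_server_rps)


# Hypothetical measurements: 800 requests/s at peak, 250 requests/s per server.
count = servers_needed(800, 250)
```

Here the target is 1200 requests/s, which needs five servers of 250 requests/s each.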
14. Caching
To cache or not to cache?
  Analyze the nature of the content and responses generated by your application very well
What to cache?
  Analyze and set a proper expiry time
  Invalidate the cache whenever content changes
  Partial caching will also bring you speed
When is caching bad?
Understand the various types of web caches
  Browser cache
  Proxy cache
  Gateway cache
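Expiry times and explicit invalidation can be sketched with a tiny in-memory cache. This is an illustration, not a recommendation of a particular library: the class name TTLCache and the injectable clock parameter are assumptions made so the behavior is easy to follow.

```python
import time


class TTLCache:
    """In-memory cache with a fixed expiry time and explicit invalidation."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so expiry can be simulated
        self._store = {}

    def set(self, key, value):
        # Remember the value together with the moment it stops being valid.
        self._store[key] = (value, self._clock() + self._ttl)

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._store[key]  # expired: drop it and report a miss
            return None
        return value

    def invalidate(self, key):
        # Call this whenever the underlying content changes.
        self._store.pop(key, None)
```

A page handler would call get first, rebuild and set on a miss, and call invalidate from whatever code path edits the content.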
15. Caching (cont…)
Implement server-side caching
  Runtime in-memory cache
    Per request: Global variables
    Shared: Memcached
  Persistent cache
    Per server: File based, APC
    Shared: DB based, Redis
  Optimizers and accelerators: eAccelerator, XCache
Reverse proxy/gateway cache
  Varnish cache
21. Scalability
Scaling up (vertical) vs. scaling out (horizontal)
22. Scalability
Database Scalability
Vertical: Add resources to the server as needed
  In most cases this produces a single point of failure
Horizontal: Distribute/replicate data among multiple servers
Cloud Services: Store your data in third-party data centers and pay according to your usage
23. Scalability (cont…)
Scaling Database
Scaling options
  Master/Slave
    Master for writes, slaves for reads
  Cluster Computing
    Single storage with multiple server nodes
  Table Partitioning
    Large tables are split among partitions
  Federated Tables
    Tables are shared among multiple servers
  Distributed Key Value Stores
  Distributed Object DB
  Database Sharding
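The "master for writes, slaves for reads" option can be sketched as a small statement router. This is a simplified illustration under assumptions: the class name ReplicatedRouter is hypothetical, connections are represented by plain strings, and only statements starting with SELECT are treated as reads.

```python
import itertools


class ReplicatedRouter:
    """Send writes to the master and spread reads over the replicas."""

    def __init__(self, master, replicas):
        self.master = master
        # Fall back to the master when no replica is configured.
        self._replicas = itertools.cycle(replicas or [master])

    def route(self, sql):
        # Classify the statement by its first keyword.
        words = sql.split()
        verb = words[0].upper() if words else ""
        if verb == "SELECT":
            return next(self._replicas)  # round-robin over read servers
        return self.master


router = ReplicatedRouter("master-db", ["replica-1", "replica-2"])
```

A real router also has to account for replication lag: a read that must see a just-committed write may still need to go to the master.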
24. Scalability (cont…)
Database Sharding
Smaller databases are easier to manage
Smaller databases are faster
Database sharding can reduce costs
Need one or multiple well-defined shard functions
"Don't do it, if you don't need to!" (37signals.com)
"Shard early and often!" (startuplessonslearned.blogspot.com)
25. Scalability (cont…)
Database Sharding
When appropriate?
  High-transaction database applications
  Mixed workload database usage
    Frequent reads, including complex queries and joins
    Write-intensive transactions (CRUD statements, including INSERT, UPDATE, DELETE)
    Contention for common tables and/or rows
  General Business Reporting
    Typical "repeating segment" report generation
    Some data analysis (mixed with other workloads)
What to analyze?
  Identify all transaction-intensive tables in your schema.
  Determine the transaction volume your database is currently handling (or is expected to handle).
  Identify all common SQL statements (SELECT, INSERT, UPDATE, DELETE), and the volumes associated with each.
  Develop an understanding of the "table hierarchy" contained in your schema; in other words, the main parent-child relationships.
  Determine the "key distribution" for transactions on high-volume tables, to determine if they are evenly spread or are concentrated in narrow ranges.
27. Scalability (cont…)
Database Sharding
Challenges (cont…)
Avoidance of cross-shard joins
Auto-increment key management
Support for multiple Shard Schemes
Session-based sharding
Transaction-based sharding
Statement-based sharding
Determine the optimum method for sharding the
data
Shard by a primary key on a table
Shard by the modulus of a key value
Maintain a master shard index table
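Two of the sharding methods listed above, the modulus of a key value and a master shard index table, can be sketched as follows. This is a simplified illustration: the function shard_by_modulus and the class ShardIndex are hypothetical names, and the index is a plain dictionary standing in for a real lookup table.

```python
from collections import Counter


def shard_by_modulus(key, shard_count):
    """Return the shard number (0 .. shard_count - 1) for an integer key."""
    if shard_count <= 0:
        raise ValueError("shard_count must be positive")
    return key % shard_count


class ShardIndex:
    """Stand-in for a master shard index table: key -> shard name."""

    def __init__(self):
        self._index = {}

    def assign(self, key, shard_name):
        """Record which shard holds the given key."""
        self._index[key] = shard_name

    def lookup(self, key):
        """Return the shard holding the key; raises KeyError if unassigned."""
        return self._index[key]


# Sequential keys spread evenly across shards under the modulus scheme.
distribution = Counter(shard_by_modulus(k, 4) for k in range(100))
print(distribution)
```

The modulus scheme needs no lookup but forces data movement when the shard count changes; the index table costs an extra lookup but lets individual keys be moved between shards.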
31. Think Ahead (cont…)
Understand business model
Analyze requirements in the greatest detail
Plan for extensibility
Be agile, do incremental architecture
Create/use frameworks
SQL or NoSQL?
Sharding or clustering or both?
Cloud services?
32. Guidelines
Enrich your knowledge: Read, read & read. Read
anything available, from jokes to religions.
Follow patterns & best practices
Mix technologies
Don’t let your tools/technologies limit your vision
Invent/customize technology if required
Use FOSS
Don’t expect ready solutions
Find the closest match
Customize as needed
33. Guidelines (cont…)
Database Optimization
Use established & proven solutions
MySQL
PostgreSQL
MongoDB
Redis
Memcached
CouchDB
Understand and utilize indexing & full-text search
Use optimized DB structure & algorithms
Modified Preorder Tree Traversal (MPTT)
Map Reduce
ORM or not?
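Modified Preorder Tree Traversal, mentioned above, stores a left and a right number on every node so that a whole subtree can be fetched with one range comparison. The sketch below numbers a small tree in memory; the function names and the sample category tree are assumptions for illustration, and in a database the two numbers would be columns.

```python
def build_mptt(tree, root):
    """Return {node: (left, right)} for a {parent: [children]} mapping."""
    bounds = {}
    counter = 1

    def visit(node):
        nonlocal counter
        left = counter            # number assigned on the way down
        counter += 1
        for child in tree.get(node, []):
            visit(child)
        bounds[node] = (left, counter)   # number assigned on the way back up
        counter += 1

    visit(root)
    return bounds


def descendants(bounds, node):
    """All nodes whose (left, right) interval lies strictly inside the node's."""
    left, right = bounds[node]
    return sorted(n for n, (lo, hi) in bounds.items() if left < lo and hi < right)


categories = {"Food": ["Fruit", "Meat"], "Fruit": ["Red", "Yellow"]}
bounds = build_mptt(categories, "Food")
print(bounds["Fruit"])               # (2, 7)
print(descendants(bounds, "Fruit"))  # ['Red', 'Yellow']
```

The trade-off is that reads become a single range query, while inserting or moving a node requires renumbering part of the tree.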
34. Guidelines (cont…)
Database Optimization
Optimize your queries
One big query is usually faster than many repetitive smaller
queries
Never be too lazy to write optimized queries
One Ring to Rule 'em All
Use Runtime In Memory Cache
Filtering an in-memory cached dataset is often much
faster than executing a query in the DB
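The two points above can be illustrated with an in-memory SQLite database. This is a sketch under assumptions: the users table and its columns are hypothetical, and it only shows the query shapes, not timings, since the actual gain depends on the database and on network round trips.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, active INTEGER)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?, ?)",
    [(1, "ann", 1), (2, "bob", 0), (3, "cy", 1)],
)

wanted_ids = [1, 3]

# Repetitive smaller queries: one round trip to the database per id.
one_by_one = [
    conn.execute("SELECT name FROM users WHERE id = ?", (user_id,)).fetchone()[0]
    for user_id in wanted_ids
]

# One bigger query: a single round trip fetches every wanted row.
placeholders = ",".join("?" for _ in wanted_ids)
batched = [
    row[0]
    for row in conn.execute(
        f"SELECT name FROM users WHERE id IN ({placeholders}) ORDER BY id",
        wanted_ids,
    )
]

# Runtime in-memory cache: load the dataset once, then filter it in Python.
cached_users = conn.execute("SELECT id, name, active FROM users ORDER BY id").fetchall()
active_names = [name for (_id, name, active) in cached_users if active]
```

Filtering the cached rows avoids further queries, which suits small, rarely changing datasets that fit comfortably in memory.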
35. Guidelines (cont…)
One Ring to Rule 'em All
Perform Selection, then Projection, then Join
Tables: A (1,000 records), B (1,000,000 records), C (1,000,000,000 records)
Links: B.a_id references A.id, C.b_id references B.id
A simple example
Write a standard SQL query to find all records with fields A.a1, B.b1 and
C.c1 from tables A (id, a1, a2, a3, …, aP), B (id, a_id, b1, b2, b3, …, bQ),
and C (id, b_id, c1, c2, c3, …, cR), given that A.aX, B.bY and C.cZ
match the values 'X', 'Y' and 'Z' respectively.
Assume tables A, B and C all have primary keys defined by the id column, and
that a_id and b_id are foreign keys in B referencing A and in C referencing B respectively.
36. Guidelines
One Ring to Rule 'em All (cont…)
Solution 1
SELECT A.a1, B.b1, C.c1
FROM A, B, C
WHERE A.id = B.a_id AND B.id = C.b_id
AND A.aX = 'X' AND B.bY = 'Y' AND C.cZ = 'Z'
Why does it suck?
•Remember the sizes of tables A, B and C?
•A cross product of tables is always memory-intensive. Why?
•Evaluated as a cross product, A x B x C has 1,000 x 1,000,000 x 1,000,000,000 records with (P
+1) + (Q +2) + (R +2) fields
•Can you imagine the size of the in-memory result set of the joined tables?
•It will be HUGE
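The size of that cross product can be worked out directly. The arithmetic below uses the record counts from the example; the values passed for P, Q and R are made-up placeholders, since the slides leave them unspecified.

```python
# Record counts for tables A, B and C as given in the example.
rows_a, rows_b, rows_c = 1_000, 1_000_000, 1_000_000_000

# A full cross product pairs every row of A with every row of B and of C.
cross_product_rows = rows_a * rows_b * rows_c


def joined_field_count(p, q, r):
    """Fields per joined row: (P + 1) in A, (Q + 2) in B, (R + 2) in C."""
    return (p + 1) + (q + 2) + (r + 2)


print(f"{cross_product_rows:.0e} rows")   # 1e+18 rows
print(joined_field_count(3, 3, 3))        # 14 fields when P = Q = R = 3 (assumed)
```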
37. Guidelines
One Ring to Rule 'em All (cont…)
Solution 2
SELECT A.a1, B.b1, C.c1
FROM A
INNER JOIN B ON A.id = B.a_id
INNER JOIN C ON B.id = C.b_id
WHERE A.aX = 'X' AND B.bY = 'Y' AND C.cZ = 'Z'
Why does it still suck?
•A JOIN B JOIN C will produce (1,000 x 1,000,000) records to perform A JOIN B,
then another (1,000 x 1,000,000,000) records to compute (A JOIN B) JOIN C,
and only then filter the records by the WHERE clause.
•The number of fields, that is P+1 in A, Q+2 in B and R+2 in C, also
contributes to memory consumption.
•It is optimized but still HUGE with respect to memory consumption and
computation
38. Guidelines
One Ring to Rule 'em All (cont…)
Optimal Solution
SELECT A.a1, B.b1, C.c1
FROM (SELECT id, a1 FROM A WHERE aX = 'X') AS A
INNER JOIN (SELECT id, b1, a_id FROM B WHERE bY = 'Y') AS B ON A.id = B.a_id
INNER JOIN (SELECT id, c1, b_id FROM C WHERE cZ = 'Z') AS C ON B.id = C.b_id
Why does this solution outperform the others?
•Let’s keep the explanation as an exercise
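As a starting point for that exercise, the sketch below runs all three solutions against a tiny in-memory SQLite database and checks that they return the same rows. The sample data and the reduced column set (only a1, aX, b1, bY, c1, cZ plus the keys) are assumptions for illustration. It demonstrates equivalence only; how much faster each form runs depends on the database's query planner, so it is worth inspecting the plan on your own engine.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE A (id INTEGER PRIMARY KEY, a1 TEXT, aX TEXT);
CREATE TABLE B (id INTEGER PRIMARY KEY, a_id INTEGER REFERENCES A(id), b1 TEXT, bY TEXT);
CREATE TABLE C (id INTEGER PRIMARY KEY, b_id INTEGER REFERENCES B(id), c1 TEXT, cZ TEXT);
INSERT INTO A VALUES (1, 'a-one', 'X'), (2, 'a-two', 'other');
INSERT INTO B VALUES (1, 1, 'b-one', 'Y'), (2, 1, 'b-two', 'other'), (3, 2, 'b-three', 'Y');
INSERT INTO C VALUES (1, 1, 'c-one', 'Z'), (2, 1, 'c-two', 'other'), (3, 3, 'c-three', 'Z');
""")

# Solution 1: implicit join over the comma-separated table list.
solution_1 = """
SELECT A.a1, B.b1, C.c1
FROM A, B, C
WHERE A.id = B.a_id AND B.id = C.b_id
  AND A.aX = 'X' AND B.bY = 'Y' AND C.cZ = 'Z'
"""

# Solution 2: explicit joins, filtering in the WHERE clause.
solution_2 = """
SELECT A.a1, B.b1, C.c1
FROM A
INNER JOIN B ON A.id = B.a_id
INNER JOIN C ON B.id = C.b_id
WHERE A.aX = 'X' AND B.bY = 'Y' AND C.cZ = 'Z'
"""

# Optimal solution from the slides: select and project each table, then join.
optimal = """
SELECT A.a1, B.b1, C.c1
FROM (SELECT id, a1 FROM A WHERE aX = 'X') AS A
INNER JOIN (SELECT id, b1, a_id FROM B WHERE bY = 'Y') AS B ON A.id = B.a_id
INNER JOIN (SELECT id, c1, b_id FROM C WHERE cZ = 'Z') AS C ON B.id = C.b_id
"""

results = [sorted(conn.execute(q).fetchall()) for q in (solution_1, solution_2, optimal)]
print(results[0])
```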
42. Thank You
Join phpXperts [http://bit.ly/phpxperts]
Follow me on Twitter [http://twitter.com/mnishihan]
Subscribe on Facebook [http://fb.me/mnishihan]