3. SQL Azure model
Each account has zero or more servers
Account Azure wide, provisioned in a common portal
Billing instrument
Each server has one or more databases
Zone for authentication: userId+password
Zone for administration and billing
Server Metadata about the databases and usage
Network access control based on client IP
Has unique DNS name and unit of geo-location
Each database has standard SQL objects
Unit of consistency and high availability
(autonomous replication)
Database Contains Users, Tables, Views, Indices, etc…
Most granular unit of usage reports
Three SKUs available (1GB, 10GB and 50GB)
3 12/25/2011
4. SQL Azure Network Topology use standard SQL
Applications
Application client libraries: ODBC,
ADO.Net, PHP, …
Internet
Azure
Cloud
TDS (tcp)
Security Boundary Load balancer forwards „sticky‟
LB sessions to TDS protocol tier
TDS (tcp)
Gateway Gateway Gateway Gateway Gateway Gateway
Gateway: TDS protocol gateway, enforces AUTHN/AUTHZ policy; proxy to CloudDB
TDS (tcp)
L SQL SQL SQL SQL
4 12/25/2011
Scalability and Availability: Fabric, Failover, Replication, and Load balancing
5. Replication
All reads are
completed at
the primary Ack
Read
Write
Writes Value
Ack
replicated to Ack
write quorum of
replicas Ack P Ack
Commit on
secondaries S Write Write S
first then
primary
S Write Write S
Each
transaction has
a commit
5
sequence 12/25/2011
number (epoch,
6. Reconfiguration
Types of reconfiguration
Primary failover
Removing a failed
secondary Failed
Adding recovered replica
Building a new secondary
S
B P
Assumes S
Failed
Failure detector
Leader election Safe in the presence
Both services provided by of cascading failures
Fabric layer
6 12/25/2011
7. Partition Management
Partition Manager (PM) is a highly available service
running in the Master cluster
Ensures all partitions are operational
Places replicas across failure domains
(rack/switch/server)
Ensures all partitions have target replica count
Balances the load across all the nodes
Each node manages multiple partitions
Global state maintained by the PM can be recreated
from the local node state in event of disaster (GPM
rebuild)
7 12/25/2011
8. System in Operation
Primary master node
SQL Server Partition Manager
Partition
Global Load Balancer
Managemen
Partition t
Partition Placement
map
Advisor
Leader
Fabric
Elector
Data Node Data Node Data Node Data Node Data Node Data Node
100 101 102 103 104 105
P S P P S P
S S S S S S
S P S S S S
S S S S
S
Fabric
Secondary Primary Secondary
8 12/25/2011
9. SQL node Architecture
Single physical DB for entire Machine SQL
node Instance
DB files and log shared master
across every logical
database/partition tempdb
Allows better logging msdb
throughput with sequential
IO/group commits CloudNode
No auto-growth on demand
stalls DB1 DB2 master
Uniform manageability and DB5 DB1 DB7
backup
Each partition is a “silo” with DB3 DB4 DB7
its own independent schema
Local SQL backup guards
against software bugs
9 12/25/2011
10. Recap
Two kinds of nodes:
Data nodes store application data
Master nodes store cluster metadata
Node failures are reliably detected
On every node, SQL and Fabric processes monitor each
other
Fabric processes monitor each other across nodes
Local failures cause nodes to fail-fast
Failures cause reconfiguration and placement
changes
10 12/25/2011
11. Hardware Architecture
Each rack hosts 2 pods of
L2 Switch 20 machines each
Each pod has a TOR mini-
switch
10GB uplink to L2 switch
Each SQL Azure machine
runs on commodity box
Example:
8 cores
32 GB RAM
1TB+ SATA drives
Programmable power
1Gb NIC
Machine spec changes as
hardware (pricing) evolves
11 12/25/2011
12. Hardware Challenges
SATA drives
On-disk cache and lack of true "write through" results
in Write Ahead Logging violations
DB requires in-order writes to be honored
Can force flush cache, but causes performance
degradation
Disk failures happen daily (at scale), fail-fast on
those
Bit-flips (enabled page checksums)
Drives just disappear
IOs are misdirected
Faulty NIC
Encountered message corruption
12 Enabled message signing and checksums 12/25/2011
13. Software Deployment
OS is automatically imaged via deployment
All the services are setup using file copy
Guarantees on which version is running
Provides fast switch to new version
Minimal global state allows running side by side
Yes, that includes the SQL Server DB engine
Rollout is monitored to ensure high availability
Knowledge of replica state health ensure SLA is met
Two phase rollouts for data or protocol changes
Leverages internal Autopilot technologies with SQL
Azure extensions
13 12/25/2011
14. Software Challenges
Lack of real-time OS features
CPU priority
High priority for Fabric lease traffic
Page Faults/GC
Locked pages for SQL and Fabric (in managed
code)
Fail fast or not?
Yes, for corruption/AV
No, for other issues unless centrally controlled
What is really considered failed?
Some failures are non-deterministic or hangs
Multiple protocols / channels means partial failures too
14 12/25/2011
15. Monitoring
Health model w/repair actions
Reboot -> Re-deploy -> Re-image (OS) -> RMA cycle
Additional monitoring for SQL tier
Connect / network probes
Memory leaks / hung worker processes
Database corruption detection
Trace and performance stats capture
Sourced from regular SQL trace and support mechanisms
Stored locally and pushed to a global cluster wide store
Global cluster used for service insight and problem tracking
15 12/25/2011
16. SQL Azure Login Process
7
1
TDS Gateway
Front-end Node
TDS Session Protocol Parser
6 2
Gateway Logic
Global Partition Map
Master Node
8 3
Master Node
Components
4 5
Backend Node 1 Backend Node 2 Backend Node 3
SQL Instance SQL Instance SQL Instance
SQL DB SQL DB SQL DB
16 Scalability and and Availability: Fabric,Failover,Replication, and Load balancing
Scalability Availability: Fabric, Failover, Replication, and Load balancing
12/25/2011
17. Security/Attack Considerations
Service
Secure channel required (SSL)
Denial Of Service trend tracking
Packet Inspection
Server
IP allow list (Firewall)
Idle connection culling
Generated server names
Database
Disallow the most commonly attacked user id‟s (SA,
Admin, root, guest, etc)
Standard SQL Authn/Authz mode
17 12/25/2011
22. Summary
Cloud is different
Not a different place to host code
SQL Azure IS SQL Server…a TDS endpoint
Create DB‟s and manage using what we already
know
Considerations and futures paint exciting picture of
what to expect looking forward
Opportunities are great
Customers want a utility approach to storage
New businesses and abilities in scale, availability, etc
But the price must be paid
Which is a good thing, otherwise everyone would be
22 12/25/2011
doing it!