2. Abstract
● Introduction to Cloud Computing
● Cloud Characteristics
● Data Analysis in the Cloud
● Replication
● Master-slave election
● References
● Q&A
3. Introduction to Cloud Computing
● Encompasses computer processing, storage, and software delivery
● Removes the need for large IT investments and their management
○ No need for configuration or for extra employees to handle it
● Gives professionals access to powerful computing resources
○ Powerful computers are hard to buy
○ Maintenance is expensive
● The pay-as-you-go model suits startups
○ Pay only for what you use
4. Cloud Characteristics
● Elasticity lets the database grow with demand
○ New resources can be added quickly
● Security risk for data
○ Governments may have a legal right to access servers
● Replication across large geographic distances
○ Latency in data transfer
● Heterogeneous infrastructure
○ Different resource usage for VMs in same cloud
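
The latency point above can be made concrete with a rough lower bound. This is a minimal sketch, assuming signals travel through optical fibre at about 200,000 km/s (roughly two thirds of the speed of light in vacuum). The function name `min_round_trip_ms` is hypothetical, and real networks add routing and processing delays on top of this bound.

```python
# Rough lower bound on the round-trip time between two replicas.
# Assumption: propagation speed in optical fibre is about 200,000 km/s.
FIBRE_SPEED_KM_PER_S = 200_000.0


def min_round_trip_ms(distance_km: float) -> float:
    """Return the minimum round-trip time in milliseconds over distance_km."""
    one_way_seconds = distance_km / FIBRE_SPEED_KM_PER_S
    return 2 * one_way_seconds * 1000.0


if __name__ == "__main__":
    for km in (100, 1_000, 10_000):
        print(f"{km:>6} km: at least {min_round_trip_ms(km):.1f} ms per round trip")
```

Every synchronous write that must be acknowledged by a distant replica pays at least this delay, which is why replication distance matters.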
5. Data Analysis in the Cloud
● Wish List
○ Efficiency
○ Fault tolerance
■ Hard to guarantee ACID properties in transactional data management over large geographical distances
■ Complex queries can take a long time on weak processors
○ Ability to run in a heterogeneous environment
■ Nodes differ in performance
○ Ability to encrypt data
■ Decrypt data before sending to avoid high bandwidth use
○ Ability to interface with business products
■ ODBC or JDBC
6. Replication (1)
● Master-slave
○ Master: controller node
○ Slaves: read-only nodes
● Write operations are performed on the master node; slaves replicate the changes.
● Multi-master replication
○ If one master fails, the others continue
○ Masters at different physical locations can shorten the distance to slaves
○ Loosely consistent
○ Violates ACID
○ Complex and increases latency
○ Requires conflict resolution
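
The master-slave scheme above can be sketched in a few lines. This is an illustrative in-memory model, not any particular database's implementation: the class names `Master`, `Slave`, and `ReadOnlyError` are hypothetical, and replication is assumed to be synchronous for simplicity.

```python
class ReadOnlyError(Exception):
    """Raised when a client tries to write directly to a slave."""


class Slave:
    def __init__(self):
        self.data = {}

    def read(self, key):
        return self.data.get(key)

    def write(self, key, value):
        # Clients may only read from a slave.
        raise ReadOnlyError("slaves accept reads only")

    def apply(self, key, value):
        # Called by the master to replicate a change.
        self.data[key] = value


class Master:
    def __init__(self, slaves):
        self.data = {}
        self.slaves = list(slaves)

    def read(self, key):
        return self.data.get(key)

    def write(self, key, value):
        # The write happens on the master first, then every slave copies it.
        # Assumption: replication is synchronous and never fails.
        self.data[key] = value
        for slave in self.slaves:
            slave.apply(key, value)


if __name__ == "__main__":
    replicas = [Slave(), Slave()]
    master = Master(replicas)
    master.write("user:1", "alice")
    print([r.read("user:1") for r in replicas])
```

With several masters, two nodes could each accept a different write for the same key, which is where conflict resolution becomes necessary.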
7. Replication (2)
● Multi-master replication (cont.)
○ e.g. CouchDB, Cloudant, Oracle, MySQL, etc.
○ Multiversion Concurrency Control (MVCC)
● Replication types
○ Storage level replication
■ guarantees ‘zero data loss’
■ copies disk blocks
○ File level replication
■ Uses less bandwidth
■ Knows what to replicate
■ Uses CPU
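
File-level replication can be illustrated with a checksum comparison: only files that are new or changed are copied, which saves bandwidth at the cost of CPU time spent hashing. This is a minimal sketch using the Python standard library; the function names `file_digest` and `replicate_files` are hypothetical, and deletions on the source are not handled.

```python
import hashlib
import shutil
from pathlib import Path


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's contents (costs CPU)."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def replicate_files(source: Path, replica: Path) -> list[str]:
    """Copy new or changed files from source to replica.

    Returns the relative paths of the files that were copied.
    """
    copied = []
    for src in sorted(source.rglob("*")):
        if not src.is_file():
            continue
        rel = src.relative_to(source)
        dst = replica / rel
        # Skip files whose content is already identical on the replica.
        if dst.is_file() and file_digest(dst) == file_digest(src):
            continue
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dst)
        copied.append(rel.as_posix())
    return copied
```

Storage-level replication, by contrast, copies disk blocks without knowing which files they belong to.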
8. Replication (3)
● Replication types (cont.)
○ Journaling
■ Operation logs
■ See which operations were performed and apply them on the secondaries
■ May be preferable for sensitive data
● Database size may differ between replicas
○ Different pre-allocation
○ Different disk fragmentation
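
Journal-based replication can be sketched as a primary that records each operation in an ordered log and a secondary that replays the entries it has not yet applied. The classes `Primary` and `Secondary` and the tuple format of the log entries are assumptions made for illustration only.

```python
class Primary:
    def __init__(self):
        self.data = {}
        self.journal = []  # ordered operation log: (op, key, value)

    def set(self, key, value):
        self.data[key] = value
        self.journal.append(("set", key, value))

    def delete(self, key):
        self.data.pop(key, None)
        self.journal.append(("delete", key, None))


class Secondary:
    def __init__(self):
        self.data = {}
        self.applied = 0  # number of journal entries already replayed

    def sync(self, journal):
        # Replay only the operations this secondary has not seen yet.
        for op, key, value in journal[self.applied:]:
            if op == "set":
                self.data[key] = value
            elif op == "delete":
                self.data.pop(key, None)
        self.applied = len(journal)


if __name__ == "__main__":
    primary, secondary = Primary(), Secondary()
    primary.set("a", 1)
    primary.set("b", 2)
    secondary.sync(primary.journal)
    primary.delete("a")
    secondary.sync(primary.journal)
    print(secondary.data == primary.data)
```

Because the secondary replays logical operations instead of copying files, its contents match the primary's even though its on-disk size may not.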
10. Master-slave election
● Needs to be immediate and fast
○ The absence of a primary should be detected quickly
○ The election must start immediately
○ Without a primary node, the replica set is read-only
● An odd number of nodes is recommended
○ The master will be the node that can connect to a majority.
○ Accept and reject votes cannot be tied.
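
The reason for recommending an odd number of nodes follows from the majority arithmetic. The sketch below uses hypothetical helper names (`majority`, `tolerated_failures`, `can_elect_primary`) and assumes one vote per node.

```python
def majority(n_nodes: int) -> int:
    """Smallest number of votes that is more than half of n_nodes."""
    return n_nodes // 2 + 1


def tolerated_failures(n_nodes: int) -> int:
    """How many nodes can be lost while a majority is still reachable."""
    return n_nodes - majority(n_nodes)


def can_elect_primary(reachable: int, n_nodes: int) -> bool:
    """A group of nodes can elect a primary only if it forms a majority."""
    return reachable >= majority(n_nodes)


if __name__ == "__main__":
    for n in (3, 4, 5):
        print(n, "nodes: majority", majority(n),
              "tolerates", tolerated_failures(n), "failure(s)")
```

Under these definitions a 4-node set tolerates one failure, the same as a 3-node set, and an even 2-2 split leaves neither side with a majority, so the extra node adds cost without adding availability.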
11. Master-slave election (2)
● Assign priorities for a quick election
○ The node with the highest priority will be elected.
○ A high-priority node can cancel the candidacy of a lower-priority node.
● Network partitions
○ Put the majority of nodes in the same cloud
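
Priorities and network partitions interact as follows: only the side of a partition that holds a majority may elect, and within it the highest-priority node wins. This is a simplified sketch; the `Node` class and `elect` function are hypothetical, and real systems typically weigh further criteria such as how up to date each node's data is.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    name: str
    priority: int


def elect(partition, total_nodes):
    """Return the elected node, or None if the partition lacks a majority.

    Assumption: every node in the partition votes for the reachable node
    with the highest priority.
    """
    if len(partition) < total_nodes // 2 + 1:
        return None
    return max(partition, key=lambda node: node.priority)


if __name__ == "__main__":
    cloud_a = [Node("a1", 3), Node("a2", 2), Node("a3", 1)]
    cloud_b = [Node("b1", 5), Node("b2", 1)]
    total = len(cloud_a) + len(cloud_b)
    # If the link between the clouds fails, each side votes on its own.
    print(elect(cloud_a, total))
    print(elect(cloud_b, total))
```

In this example the cloud holding three of the five nodes can still elect a primary after the partition, while the other side cannot, even though it contains the node with the highest priority overall.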