Spanner: Google’s Globally-Distributed Database
James C. Corbett, Jeffrey Dean, Michael Epstein, Andrew Fikes, Christopher Frost, JJ Furman, Sanjay Ghemawat, Andrey Gubarev, Christopher Heiser, Peter Hochschild, Wilson Hsieh,
Sebastian Kanthak, Eugene Kogan, Hongyi Li, Alexander Lloyd, Sergey Melnik, David Mwaura,
David Nagle, Sean Quinlan, Rajesh Rao, Lindsay Rolig, Yasushi Saito, Michal Szymaniak, Christopher Taylor, Ruth Wang, Dale Woodford
Google, Inc.
Published in the Proceedings of OSDI 2012
Abstract
Spanner is Google’s scalable, multi-version, globally-distributed, and synchronously-replicated database. It is the first system to distribute data at global scale and support externally-consistent distributed transactions. This paper describes how Spanner is structured, its feature set, the rationale underlying various design decisions, and a novel time API that exposes clock uncertainty. This API and its implementation are critical to supporting external consistency and a variety of powerful features: nonblocking reads in the past, lock-free read-only transactions, and atomic schema changes, across all of Spanner.
Introduction
Spanner is a scalable, globally-distributed database designed, built, and deployed at Google. At the highest level of abstraction, it is a database that shards data across many sets of Paxos [21] state machines in datacenters spread all over the world. Replication is used for global availability and geographic locality; clients automatically failover between replicas. Spanner automatically reshards data across machines as the amount of data or the number of servers changes, and it automatically migrates data across machines (even across datacenters) to balance load and in response to failures. Spanner is designed to scale up to millions of machines across hundreds of datacenters and trillions of database rows.
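The sharding and resharding described above can be illustrated with a toy sketch. This is not Spanner's implementation — the class and method names are hypothetical, and each "Paxos group" is modeled as a plain dict rather than a replicated state machine — but it shows the basic idea of range-partitioning a key space across groups and splitting a range as data grows:

```python
import bisect

class ShardedStore:
    """Toy model of range-sharding keys across groups.

    Each group stands in for a Paxos-replicated state machine;
    real Spanner replicates each group across datacenters.
    """

    def __init__(self, split_points):
        # split_points define half-open key ranges; group i holds
        # keys k with splits[i-1] <= k < splits[i].
        self.splits = sorted(split_points)
        self.groups = [dict() for _ in range(len(self.splits) + 1)]

    def _group_for(self, key):
        return self.groups[bisect.bisect_right(self.splits, key)]

    def put(self, key, value):
        self._group_for(key)[key] = value

    def get(self, key):
        return self._group_for(key).get(key)

    def split(self, new_point):
        """Reshard: split the range containing new_point, migrating
        keys >= new_point into a newly created group."""
        i = bisect.bisect_right(self.splits, new_point)
        old = self.groups[i]
        moved = {k: v for k, v in old.items() if k >= new_point}
        for k in moved:
            del old[k]
        self.splits.insert(i, new_point)
        self.groups.insert(i + 1, moved)
```

In the real system the analogue of `split` also migrates data between machines and datacenters to balance load, not just to bound range size.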
Applications can use Spanner for high availability, even in the face of wide-area natural disasters, by replicating their data within or even across continents. Our initial customer was F1 [35], a rewrite of Google’s advertising backend. F1 uses five replicas spread across the United States. Most other applications will probably replicate their data across 3 to 5 datacenters in one geographic region, but with relatively independent failure modes. That is, most applications will choose lower latency over higher availability, as long as they can survive 1 or 2 datacenter failures.
Spanner’s main focus is managing cross-datacenter replicated data, but we have also spent a great deal of time in designing and implementing important database features on top of our distributed-systems infrastructure. Even though many projects happily use Bigtable [9], we have also consistently received complaints from users that Bigtable can be difficult to use for some kinds of applications: those that have complex, evolving schemas, or those that want strong consistency in the presence of wide-area replication.
Spanner is Google's globally distributed database that provides externally consistent distributed transactions across data centers worldwide. It uses a novel TrueTime API to assign globally meaningful timestamps to transactions despite distribution, enabling features like consistent backups and schema updates at global scale. The database replicates data across multiple zones using the Paxos consensus protocol and provides a SQL-like query interface along with a schematized semi-relational data model.
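The TrueTime idea can be sketched in a few lines. This is a toy model under stated assumptions — the uncertainty bound is a fixed constant here, whereas real TrueTime derives it from GPS and atomic-clock time masters plus local drift, and the `commit_wait` helper is hypothetical. It shows the two essentials: `now()` returns an interval guaranteed to contain absolute time, and a writer waits out its own uncertainty ("commit wait") before exposing a commit timestamp:

```python
import time

class TrueTime:
    """Toy TrueTime: wraps the local clock with a fixed
    uncertainty bound EPSILON (assumed here to be 7 ms)."""
    EPSILON = 0.007

    def now(self):
        t = time.time()
        # (earliest, latest): absolute time lies inside this interval.
        return (t - self.EPSILON, t + self.EPSILON)

    def after(self, s):
        """True once s is definitely in the past on every clock."""
        return self.now()[0] > s

def commit_wait(tt):
    """Pick a commit timestamp no earlier than absolute commit time,
    then wait until that timestamp is guaranteed past, so any
    transaction starting afterwards sees a strictly larger one."""
    s = tt.now()[1]
    while not tt.after(s):
        time.sleep(0.001)
    return s
```

Because no node releases a timestamp until it is provably in the past everywhere, timestamp order matches real-time order — which is exactly the external-consistency guarantee described above.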
Summarize the key ideas of each of these texts and explain how they .docxrafbolet0
Summarize the key ideas of each of these texts and explain how they shed light on our study of American religious diversity. Point out some key citations and explain the most important thing you learned from these readings and how these readings helped you achieve the educational goals of our course
US Bill of Rights, UN DECLARATION OF HUMAN RIGHTS, and UNESCO on Diversity and Tolerance; Dignitatis Humanae, and Nostra Aetate
Clash of civilizations, Civil Religion (Reader, pp288-289), and Dominus Iesus
Dynamics of Prejudice (Reader, pp.32-39; 111-114; 295-309)
“Die Judenfrage” (Reader, pp.178-209)
The Irish case (Reader, pp.169-177)
Idolatry (Cantwell Smith, Reader, pp.259-266) and Tolerant Gods ( by Wole Soyinka, text on moodle)
Sacred Texts, Christian and Islamic vision of Religious Tolerance (Reader, p.44, and moodle)
The Real Kant, Multiculturalism, Eurocentrism and the Columbus paradigm (Reader, pp.93-103; 352-358; and pp.282-289)
“Calore-Colore” Paradigm (Reader, pp. 323-346) and scholarship on ATR, and scientific theories or mythologies of otherness (pp, 111-128; 295-346)
AAR article on Egyptology and “Egypt and Israel”
Choose 3 questions from the list above :
the papers should be clear and professonal, answer questions and explain the points that you wants to explain with examples from SACRED TEXTS (BIBLE AND KORAN). I want the writer to do the papers professionally, and to be neutral and non-racist, I want him explain that the examples of the Koran show the positive side, which is commensurate with the topic you will write, And, if possible, that there is a positive similarity between the Koran and the Bible. I already provide additional file can help the writer and you can looking for Koran and Bible to use it
.
Submit, individually, different kinds of data breaches, the threats .docxrafbolet0
This document provides instructions for an assignment to submit a paper analyzing different types of data breaches, the threats that enable them, and their severity. The paper should be APA formatted with 1-inch margins, consistent font, and double spacing, include a 1-page title page, 2-3 pages of body text, and a 1-page references section.
Submit your personal crimes analysis using Microsoft® PowerPoi.docxrafbolet0
Submit
your personal crimes analysis using Microsoft
®
PowerPoint
®
or another pre-approved presentation tool.
Create
a 10- to 15-slide presentation that includes a reference slide with at least four references cited throughout the presentation. Include the following:
·
Differentiate between assault, battery, and mayhem.
·
Identify and explain kidnapping and false imprisonment.
·
Compare and contrast between rape and statutory rape.
·
Choose two states and compare the definitions and punishment for these crimes.
Include
appropriate photos, short videos, or headlines, as needed, to represent your analysis.
Format
your presentation consistent with APA guidelines.
.
Submit two pages (double spaced, 12 point font) describing a musical.docxrafbolet0
Submit two pages (double spaced, 12 point font) describing a musical concert of your choosing, suggested in the syllabus or approved by instructor. Describe as many factors as possible: who/what/where/ when, how many musicians performed, what instruments did they play, name several of the musical pieces, how did they sound (use some of the terms we learned in the course), what did the musicians wear, describe the audience, describe the music (how did it make you feel, etc.), what did you enjoy most about the event? Share your reflections.
.
Submit the rough draft of your geology project. Included in your rou.docxrafbolet0
Submit the rough draft of your geology project. Included in your rough draft should be the text as close as possible to the way you intend on submitting it as well as data tables and rough sketches of figures.
Proofread everything and check your work according to the
Evaluation
guidelines in the original assignment in Week 02.
Geology Project Requirements
**Please review your paper for all of the below before submitting your Week 8 Rough Draft or Week 10 Final Paper.**
-
Length
·
Paper is to be 7 pages, at a minimum, in length:
o
One Cover Page
o
One Reference page
o
5 pages of written text (which does not include space taken up by photos, illustrations or charts).
-
Formatting
·
All paragraphs need to be indented.
·
Font should be Times New Roman and size 12.
·
The line spacing should be double spaced.
·
Make sure there is an introduction paragraph, thebody paragraphs are well organized and a conclusion paragraph.
·
Stay away from many short sentences in a paragraph, as the paragraph needs to flow. (These can be fragment sentences and can make the paper confusing when reading.)
·
Also stay away from many short paragraphs in the body of the paper, if organized well, then there will be medium length paragraphs.
·
Paper should be aligned to the left margin – not center or wide across.
-
Writing
·
This is a science research paper about a geology topic and must be in third person, therefore words such as we, me, I you, our, or us are not allowed to be used. Make sure these are not in your paper.
o
This also pertains to let’s. (Let’s short for let us.)
·
Make sure that all of your sentences are strong and independent.
·
Paper needs to be written using proper mechanics (clear, concise, complete sentences and paragraphs), proper spelling, grammar and punctuation.
·
Do not start your introduction or paper off with ‘This paper will look at…’ or ‘This paper will cover…’ Your thesis should not contain these words and should be a stand alone sentence with a passive lead in.
·
Spell Check Spell Check Spell Check.
·
Any introduction of a new word or scientific word that your reader may not know the definition of, be sure to include the definition for better understanding.
·
Acronyms. The first time an acronym is used, be sure to define what it stands for – such as USGS (United States Geological Survey). Then each subsequent time this acronym is used in the paper, you can just write USGS since it has already been defined to the reader.
·
Make sure to capitalize proper nouns such as Earth.
·
Make sure paragraphs transition and flow well between each other. Read the paper out loud to yourself before final submission to make sure these transitions are in place.
·
Please do not be a casual writer in this paper. What I mean by that is do not write how you would talk in a casual conversation, text on your phone or email a friend. This is a research paper and therefore the presentation and writing style needs to be.
Submit your paper of Sections III and IV of the final project. Spe.docxrafbolet0
Submit your paper of Sections III and IV of the final project. Specifically, the following critical elements must be addressed:
III. Billing and Reimbursement
A. Analyze the collection of data by patient access personnel and its importance to the billing and collection process. Be sure to address the importance of exceptional customer service.
B. Analyze how third-party policies would be used when developing billing guidelines for patient financial services (PFS) personnel and administration when determining the payer mix for maximum reimbursement.
C. Organize the key areas of review in order of importance for timeliness and maximization of reimbursement from third-party payers. Explain your rationale on the order.
D. Describe a way to structure your follow-up staff in terms of effectiveness. How can you ensure that this structure will be effective?
E. Develop a plan for periodic review of procedures to ensure compliance. Include explicit steps for this plan and the feasibility of enacting this plan within this organization.
IV. Marketing and Reimbursement
A. Analyze the strategies used to negotiate new managed care contracts. Support your analysis with research.
B. Communicate the important role that each individual within this healthcare organization plays with regard to managed care contracts. Be sure to include the different individuals within the healthcare organization.
C. Explain how new managed care contracts impact reimbursement for the healthcare organization. Support your explanation with concrete evidence or research.
D. Discuss the resources needed to ensure billing and coding compliance with regulations and ethical standards. What would happen if these resources were not obtained? Describe the consequences of noncompliance with regulations and ethical standards.
.
Submit the finished product for your Geology Project. Please include.docxrafbolet0
Submit the finished product for your Geology Project. Please include all figures, data tables, and text in the same document.
Before you submit, please proofread once more as you check the
Evaluation
guidelines from the original assignment in Week 02.
I need the sources in-text citations please and sources throughout the paper with quotation marks!!! THIS IS NECESSARY. I have the rough draft I can send it.
Geology Project Requirements
**Please review your paper for all of the below before submitting your Week 10 Final Paper.**
-
Length
·
Paper is to be 7 pages, at a minimum, in length:
o
One Cover Page
o
One Reference page
o
5 pages of written text (which does not include space taken up by photos, illustrations or charts).
-
Formatting
·
All paragraphs need to be indented.
·
Font should be Times New Roman and size 12.
·
The line spacing should be double spaced.
·
Make sure there is an introduction paragraph, the body paragraphs are well organized and a conclusion paragraph.
·
Stay away from many short sentences in a paragraph, as the paragraph needs to flow. (These can be fragment sentences and can make the paper confusing when reading.)
·
Also stay away from many short paragraphs in the body of the paper, if organized well, then there will be medium length paragraphs.
·
Paper should be aligned to the left margin – not center or wide across.
-
Writing
·
This is a science research paper about a geology topic and must be in third person, therefore words such as we, me, I you, our, or us are not allowed to be used. Make sure these are not in your paper.
o
This also pertains to let’s. (Let’s short for let us.)
·
Make sure that all of your sentences are strong and independent.
·
Paper needs to be written using proper mechanics (clear, concise, complete sentences and paragraphs), proper spelling, grammar and punctuation.
·
Do not start your introduction or paper off with ‘This paper will look at…’ or ‘This paper will cover…’ Your thesis should not contain these words and should be a stand alone sentence with a passive lead in.
·
Spell Check Spell Check Spell Check.
·
Any introduction of a new word or scientific word that your reader may not know the definition of, be sure to include the definition for better understanding.
·
Acronyms. The first time an acronym is used, be sure to define what it stands for – such as USGS (United States Geological Survey). Then each subsequent time this acronym is used in the paper, you can just write USGS since it has already been defined to the reader.
·
Make sure to capitalize proper nouns such as Earth.
·
Make sure paragraphs transition and flow well between each other. Read the paper out loud to yourself before final submission to make sure these transitions are in place.
·
Please do not be a casual writer in this paper. What I mean by that is do not write how you would talk in a casual conversation, text on your phone or email a friend. This is a research paper.
Submit the Background Information portion of the final project, desc.docxrafbolet0
Submit the Background Information portion of the final project, describing the company and business product, service, or other idea from the business pla. In the description, make sure that you include the target stakeholders and their relationship to the mission, vision, and values of the company. Concisely describe the company and business product or service. Be sure to include the company’s publicly traded name and stock symbol if these exist.
2-3 pages. APA
.
Submit Files - Assignment 1 Role of Manager and Impact of Organizati.docxrafbolet0
Submit Files - Assignment 1 Role of Manager and Impact of Organizational Theories on Managers
Assignment 1 Role of Manager and Impact of Organizational Theories on Managers (Week 3)
Purpose:
In the first assignment, students are given a scenario in which the shipping manager who has worked for Galaxy Toys, Inc. since 1969. The scenario serves to set the stage for students to demonstrate how management theories have changed over time. For example, managing 30 years ago is different than managing in the 21
st
century.
Outcome Met by Completing This Assignment:
integrate management theories and principles into management practices
Instructions:
In Part One of this case study analysis, students are to use the facts from the case study to determine two different organization theories that are demonstrated. For Part Two, students will compare the 21
st
century manager to that of the main character in the case study and the implications of change in being a 21
st
century manager.
In selecting a school of thought and an organizational theory that best describes the current shipping manager, students will use the timeline to select a school of thought and a theory or theories of that time frame. Students will to use the course material to respond to most of the assignment requirements but will also need to research the theorist(s) and theories to complete the assignment. Students are expected to be thorough in responding.
In Part Two, students are going to take what they have learned and compare the management skills of the 21st century shipping manager to the skills of the current shipping manager.
Step 1:
Review “How to Analyze a Case Study” under Week 3 Content.
Step 2:
Create a Word or Rich Text Format (RTF) document that is double-spaced, 12-point font. The final product will be between 4-6 pages in length excluding the title page and reference page.
Step 3:
Review the grading rubric for the assignment.
Step 4:
In addition to providing an introduction, students will use headings following this format:
Title page with title, your name, the course, the instructor’s name;
Background;
Part One;
Part Two.
Step 5
: In writing a case study, the writing is in the third person. What this means is that there are no words such as “I, me, my, we, or us” (first person writing), nor is there use of “you or your” (second person writing). If uncertain how to write in the third person, view this link:
http://www.quickanddirtytips.com/education/grammar/first-second-and-third-person
. Also note that students are not to provide personal commentary.
Step 6:
In writing this assignment, students are expected to support the reasoning using in-text citations and a reference list. If any material is used from a source document, it must be cited and referenced. A reference within a reference list cannot exist without an associated in-text citation and vice versa. View the sample APA paper under Week 1 content.
Step 7:
In writing thi.
Chapter 12: Simple Regression

Chapter Contents
12.1 Visual Displays and Correlation Analysis
12.2 Simple Regression
12.3 Regression Terminology
12.4 Ordinary Least Squares Formulas
12.5 Tests for Significance
12.6 Analysis of Variance: Overall Fit
12.7 Confidence and Prediction Intervals for Y
12.8 Residual Tests
12.9 Unusual Observations
12.10 Other Regression Problems

Chapter Learning Objectives (LO's)
LO12-1: Calculate and test a correlation coefficient for significance.
LO12-2: Interpret the slope and intercept of a regression equation.
LO12-3: Make a prediction for a given x value using a regression equation.
LO12-4: Fit a simple regression on an Excel scatter plot.
LO12-5: Calculate and interpret confidence intervals for regression coefficients.
LO12-6: Test hypotheses about the slope and intercept by using t tests.
LO12-7: Perform regression with Excel or other software.
LO12-8: Interpret the standard error, R², ANOVA table, and F test.
LO12-9: Distinguish between confidence and prediction intervals.
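The ordinary least squares formulas behind LO12-2 and LO12-3 are b1 = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)² for the slope and b0 = ȳ − b1·x̄ for the intercept. A minimal sketch in Java; the class name and the data points are hypothetical, chosen only to illustrate the calculation:

```java
// Minimal ordinary least squares fit for y = b0 + b1*x (hypothetical data).
public class OlsDemo {
    // Returns {intercept b0, slope b1} for paired observations x, y.
    static double[] fit(double[] x, double[] y) {
        int n = x.length;
        double xBar = 0, yBar = 0;
        for (int i = 0; i < n; i++) { xBar += x[i]; yBar += y[i]; }
        xBar /= n; yBar /= n;
        double sxy = 0, sxx = 0;
        for (int i = 0; i < n; i++) {
            sxy += (x[i] - xBar) * (y[i] - yBar);
            sxx += (x[i] - xBar) * (x[i] - xBar);
        }
        double b1 = sxy / sxx;          // slope (LO12-2)
        double b0 = yBar - b1 * xBar;   // intercept (LO12-2)
        return new double[] { b0, b1 };
    }

    public static void main(String[] args) {
        double[] x = { 1, 2, 3, 4 };
        double[] y = { 2.1, 3.9, 6.0, 8.0 };
        double[] coef = fit(x, y);
        // Prediction for a given x value (LO12-3):
        double yHat = coef[0] + coef[1] * 5;
        System.out.println("b0 = " + coef[0] + ", b1 = " + coef[1] + ", yHat(5) = " + yHat);
    }
}
```

The same slope and intercept are what Excel reports when a trendline is fitted to a scatter plot (LO12-4).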
SRF Journal Entries (Reference / Account Titles / Debits / Credits)
Type journal entries in the space provided. Link these to the T-accounts, and link the T-account balances to the financial statements provided on the tabs at the bottom of the page.
City of Monroe - Street and Highway Fund Journal Entries

SRF T-accounts (City of Monroe, Street and Highway Fund - General Ledger)
Beginning balances: Cash 6,500; Investments 55,000; Due from State Gov't 200,000; Accounts Payable 6,300; Fund Balance 255,200.
Accounts provided: Budgetary Fund Balance; Reserve for Encumbrances (beginning of year); Revenues - Intergovernmental; Revenues - Investment Interest; Expenditures - Street & Highway Maintenance; Encumbrances; and the budgetary accounts (Estimated Revenues, Appropriations, Budgetary Fund Balance).

Street & Highway Maintenance Fund - Closing Entries (Account Title / Debits / Credits)
Budgetary closing entry: Budgetary Fund Balance (preclosing); Fund Balance 255,200 (preclosing), closing entry 255,200, ending balance.
Complete the following table for Fund Balance and Budgetary Fund Balance - Reserve for Encumbrances: Non-spendable / Restricted / Committed / Assigned / Unassigned / Total.

City of Monroe
Statement of Revenues, Expenditures and Changes in Fund Balance
Street and Highway Maintenance Fund
For the year ended December 31, 2014
Revenues: Intergovernmental Revenues; Interest on Investments; Total Revenues $ -
Expenditures (Current): Street & Highway Maintenance; Total Expenditures -
Excess (Deficiency) of Revenues Over Expenditures -
Fund Balance, January 1; Fund Balance, December 31 $ -

City of Monroe
Street & Highway Maintenance Fund
Balance Sheet
As of December 31, 2014
Assets: Cash; Investments; Due from State Government; Total Assets $ -
Liabilities and Fund Equity: Liabilities: Accounts Payable; Fund Equity: Fund Balance - Restricted for Street and Highway Maintenance; Total Liabilities and Fund Equity $ -
src/CommissionCalculation.java

import java.util.Scanner;
import java.text.NumberFormat;

public class CommissionCalculation
{
    public static void main(String args[])
    {
        final double salesTarget = 600000;
        // create an object of the Scanner class to get the keyboard input
        Scanner keyInput = new Scanner(System.in);
        // for currency format
        NumberFormat numberFormat = NumberFormat.getCurrencyInstance();
        // create an object of the SalesPerson class
        SalesPerson salesPerson = new SalesPerson();
        // prompt the user to enter the annual sales
        System.out.print("Enter the annual sales : ");
        double sale = keyInput.nextDouble();
        // calculate the normal commission until the sales target is reached
        if (sale <= salesTarget)
        {
            // set the value of the annual sales on the SalesPerson object
            salesPerson.setAnnualSales(sale);
            // display the report
            System.out.println("The total annual compensation : "
                + numberFormat.format(salesPerson.getAnnualCompensation()));
        }
        // show the compensation table with the acceleration factor when the sales target is exceeded
        else
        {
            // method to show a compensation table if sales exceed 600000
            salesPerson.getCompensationTable(sale);
        }
    }
}
src/SalesPerson.java

public class SalesPerson {
    private final double fixedSalary = 120000.00;
    private final double commissionRate = 1.2;      // percent of annual sales
    private final double salesTarget = 600000;
    private final double accelerationFactor = 1.20;
    private double annualSales;

    // default constructor
    public SalesPerson() {
        annualSales = 0.0;
    }

    // parameterized constructor
    public SalesPerson(double aSale) {
        annualSales = aSale;
    }

    // getter method for the annual sales
    public double getAnnualSales() {
        return annualSales;
    }

    // method to set the value of the annual sales
    public void setAnnualSales(double aSale) {
        annualSales = aSale;
    }

    // method to calculate and get the commission
    public double getCommission() {
        if (annualSales < (0.80 * salesTarget)) {
            return 0;
        } else {
            return annualSales * (commissionRate / 100.0);
        }
    }

    // method to calculate the compensation with the accelerated commission and display the table
    void getCompensationTable(double annualSales) {
        int count = 0;
        System.out.println("Annual Sales\t Total Compensation");
        for (annualSales = salesTarget; annualSales <= (salesTarget + 0.5 * salesTarget); annualSales += 5000) {
            count = count + 1;
            double comm = annualSales * (commissionRate * Math.pow(accelerationFactor, count) / 100.0);
            System.out.println(annualSales + "\t" + (fixedSalary + comm));
        }
    }

    // method to calculate and get the annual compensation
    public double getAnnualCompensation() {
        return fixedSalary + getCommission();
    }
}
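The commission rule above pays nothing below 80% of the sales target and 1.2% of annual sales at or above it, on top of the fixed salary. A quick sanity check of that logic; the trimmed harness class below re-declares the constants for self-containment and is hypothetical, not part of the assignment code:

```java
// Self-contained check of the SalesPerson commission rules described above.
public class SalesPersonCheck {
    static final double FIXED_SALARY = 120000.00;
    static final double COMMISSION_RATE = 1.2;   // percent
    static final double SALES_TARGET = 600000;

    // Commission is zero below 80% of target, otherwise 1.2% of annual sales.
    static double commission(double annualSales) {
        if (annualSales < 0.80 * SALES_TARGET) return 0;
        return annualSales * (COMMISSION_RATE / 100.0);
    }

    static double annualCompensation(double annualSales) {
        return FIXED_SALARY + commission(annualSales);
    }

    public static void main(String[] args) {
        System.out.println(annualCompensation(400000)); // below 80% of target: salary only
        System.out.println(annualCompensation(500000)); // salary + 500000 * 1.2%
    }
}
```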
The development of any marketing mix depends on positioning, a process that influences potential customers' overall perception of a brand, product line, or organization in general. Position is the place a product, brand, or group of products occupies in consumers' minds relative to competing offerings. Review positioning in your text. There are many examples to illustrate this concept. Then:
1. Describe the position .
SQLServerFiles/Cars.mdf
SQLServerFiles/Contacts.mdf
SQLServerFiles/Cottages.mdf
SQLServerFiles/KataliClub.mdf
SQLServerFiles/Northwind.mdf
SQLServerFiles/Northwind.sdf
SQLServerFiles/Pubs.mdf
SQLServerFiles/ReadMe.doc
SQLServerFiles/RnrBooks.mdf

SQL Server Files
Make sure to copy the SQL Server files to a read/write medium before attempting to use them in a program. The act of selecting a file for a connection creates an .ldf file, which fails on a read-only CD.
“Subsea pipeline connectors”
Subsea pipelines are very common around the world. Almost every body of water has a pipeline, whether it transports distilled or spring water, gas, or crude oil. Pipelines of great length are broken into segments, with a connector between each segment; this methodology is used to control damage and makes manufacturing and maintenance easier. However, these devices are not perfect and have different aspects that need to be considered when choosing one, such as pressure drop, installation, repair, and material used.
Different types of subsea pipeline connectors are being developed and used every day in different parts of the world. Manufacturers are racing to stay ahead of technological advancement and rule the market. Starting with a fundamental article about the advancement and market availability of subsea pipeline connectors in 1976 and moving to current technology, this paper reviews the literature on present solutions for subsea pipeline connectors.
Connector technology in 1976
This fundamental article, written by H. Mohr, discusses the subsea pipe connectors available in 1976 [1]. The article offers solutions applicable to a specific period of time; once the technology of that period expired and new solutions were offered, the article was hardly discussed anymore, which made it nearly impossible to find online or in a nearby library. In general, however, the solutions offered and the way they were discussed are very relatable to this paper.
The paper rests on three major methods of connection, then goes on to examine the commercial products available at that time. The three methods are basic welding, elastomeric connectors, and advanced engineered horizontal systems. H. Mohr then moves to the market demand for these methods; only two, welding and mechanical connectors, are discussed.
Square, Inc. is a financial services, merchant services aggregator and mobile payment company based in San Francisco, California. The company markets several software and hardware payments products, including Square Register and Square Reader, and has expanded into small business services such as Square Capital, a financing program, and Square Payroll. The company was founded in 2009 by Jack Dorsey and Jim McKelvey and launched its first app and service in 2010.
• Square Register allows individuals and merchants in the United States, Canada, and Japan to accept offline debit and credit cards on their iOS or Android smartphone or tablet computer. The application software ("app") supports manually entering the card details or swiping the card through the Square Reader, a small plastic device that plugs into the audio jack of a supported smartphone or tablet and reads the magnetic stripe. On the iPad version of the Square Register app, the interface resembles a traditional cash register.
Download and read the documents in Edgar.
– http://www.sec.gov/edgar.shtml
– Find all the files that are filed (especially the S-1)
• Find the information relevant to future sales.
• Construct the Pro‐forma income statement.
• Estimate future free cash flows for the next five years (account for investments, change in working capital, depreciation and taxes)
• Make a reasonable assumption about the growth rate of cash flows until infinity.
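The last bullet, a constant growth rate of cash flows until infinity, is usually handled with the Gordon growth (perpetuity) formula, TV = FCF × (1 + g) / (r − g). A small sketch with hypothetical numbers; the discount rate r and the growth rate g below are assumptions for illustration, not values taken from the filing:

```java
// Terminal (continuing) value under constant perpetual growth: TV = FCF*(1+g)/(r-g).
public class TerminalValue {
    static double terminalValue(double lastFcf, double growth, double discountRate) {
        // The formula is only meaningful when the discount rate exceeds the growth rate.
        if (discountRate <= growth)
            throw new IllegalArgumentException("discount rate must exceed growth");
        return lastFcf * (1 + growth) / (discountRate - growth);
    }

    public static void main(String[] args) {
        // Hypothetical: year-5 free cash flow of 100, 2% growth forever, 10% discount rate.
        System.out.println(terminalValue(100, 0.02, 0.10));
    }
}
```

The resulting terminal value is then discounted back five years along with the explicit-period free cash flows.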
Information and society
Since the advent of easy access to the internet and the World Wide Web, society has a different attitude towards information and access to information. The technology changes – from slow desk-tops with dial-up access to smartphones – have also changed our interaction with information.
This is also an area in which generational differences show up. Those of us born before the mid-1980s or 1990s have followed all of these changes and have had to adapt to them. For those born in the 1990s (the millennials or digital natives), these methods of getting information have always existed. The millennials have seen some of the technology changes but don't remember the "old" way. Keep this in mind as you read these notes.
An information society
At the beginning of the semester we talked about the many different ways we get information and the definitions of information. Now we’re going to look more at how information and information technologies have changed society.
Lester and Koehler talk about defining an information society in an economic sense. While this is important, I don't think we need to look at the percentage of our GNP to see that we live in an information society. Think of all the companies that are based on information: computer technologies, web-based businesses, cell phone technologies, GPS, etc. There are also jobs that rely on information, such as customer service and the stock markets.
Our relationship with information .
SQL 2) Add 25 CUSTOMERs so that you now have 50 total.
The document contains SQL code that inserts 25 new customers and 25 new vehicles into database tables, increasing the total number of customers and vehicles to 50 each. The code provides the details of the customer and vehicle records being inserted, such as names, addresses, and vehicle details.
SPSS Input
Stephanie Crookston, Dominique Garrett-Smith, Latesha Simpson, Jannie Tollvier,
PSYCH/625
November 25, 2013
Mary Farmer
SPSS Input
After looking at the data and putting it through the ANOVA test, the conclusions are as follows:
There is a large difference between the groups with regard to degrees of freedom, and the use of ANOVA is appropriate because the samples were taken from the same people at different points in time. The reported significance is .000, meaning the probability of obtaining an F this large if the null hypothesis were true is essentially zero; the F score is compared against the critical value to determine whether the null hypothesis is rejected or fails to be rejected.
ANOVA
Score
                  Sum of Squares    df    Mean Square         F     Sig.
Between Groups        609265.938     1     609265.938   2495.987    .000
Within Groups          53213.402   218        244.098
Total                 662479.340   219
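Each mean square in an ANOVA table is a sum of squares divided by its degrees of freedom, and F is the ratio of the two mean squares. A quick recomputation from the values reported above (the class name is illustrative):

```java
// Recompute the ANOVA table entries from the sums of squares and degrees of freedom.
public class AnovaCheck {
    public static void main(String[] args) {
        double ssBetween = 609265.938, dfBetween = 1;
        double ssWithin = 53213.402, dfWithin = 218;

        double msBetween = ssBetween / dfBetween; // mean square between groups
        double msWithin = ssWithin / dfWithin;    // mean square within groups
        double f = msBetween / msWithin;          // F statistic

        System.out.println("MS within = " + msWithin);   // about 244.098
        System.out.println("F = " + f);                  // about 2495.987
        System.out.println("SS total = " + (ssBetween + ssWithin)); // 662479.340
        System.out.println("df total = " + (dfBetween + dfWithin)); // 219
    }
}
```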
In a 1000-1250 word essay, explain the meaning of one visual symbol in American Beauty and the relation of that symbol to the message of the film as a whole. Since context forms meaning, you should analyze several instances in which the symbol appears in the film, explaining the meaning of the symbol in each appearance and showing how each instance contributes to the meaning of the symbol in the film as a whole. Since film is a visual medium, I have intentionally asked you to analyze a visual element for this assignment. Therefore, while you certainly should utilize dialogue or other elements of the film’s narrative, please do not neglect to interpret the specifically filmic aspects of this text, such as (but not limited to) camera work (framing, shot length, etc.), editing (“cutting” or “splicing”), sound effects, wardrobe, and lighting.
Here are some visual symbols from which you may choose, but please don’t feel limited to these:
· Plastic bags
· Roses
· Cameras
· Windows or Mirrors
· Guns
· Extreme darkness or bright light
· Specific colors or color combinations
Please note that the task here is twofold: you should present an interpretation of the particular symbol you choose and show how that symbol helps construct the overall message of the film as you see it. A successful thesis statement will present a clear articulation of the meaning of your chosen symbol, a succinct statement as to the overall message of the film, and an explanation of the relationship between these two. As with the other essays for this class, please avoid rendering value judgments. You should not present an evaluation of whether or not you like the film (or whether it’s “good” or “bad”).
Since a successful analysis will require more viewing of the film than what we have time for in class, you may find it advantageous to rent/purchase/download a copy for yourself. For those who would rather not obtain their own copy, I have also put a copy of the film on reserve in the library.
Due Dates:
Four copies of your rough draft due: Tuesday 3 December
Workshop: Thursday 5 December
Final draft due: Thursday 12/12 (the day of th.
Spring 2015 – MAT 137 – Luedeker    Name: ________________________________

Quiz #1 – Introduction to Sigma Notation

Directions: Please print out this assignment or rewrite the problems on another sheet of paper. Write the final answer as an integer or an improper fraction. You must show all work to receive credit. This assignment is due Wednesday, January 14, at the start of class.

The notation Σ (n = a to m) f(n) is called Sigma Notation. The symbol Σ means to sum a sequence of numbers. The first number in the sequence is f(a), the second number in the sequence is f(a + 1), the third number in the sequence is f(a + 2), etc., and the last number in the sequence is f(m). Here are two examples:

Σ (n = 2 to 7) n = 2 + 3 + 4 + 5 + 6 + 7 = 27
Σ (n = 3 to 6) (n² + 1) = (3² + 1) + (4² + 1) + (5² + 1) + (6² + 1) = 10 + 17 + 26 + 37 = 90

Problems: Simplify. Write your answer as an integer or improper fraction. Show all work.

1. Σ n
2. Σ 1/2ⁿ
3. Σ 1/n
4. Σ (−1)ⁿ (1/n)
5. Σ 1/n²
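The two worked examples above can be verified by brute-force summation. A small Java sketch; the class name and the helper interface are illustrative, and the limits are taken from the examples:

```java
// Verify the two Sigma Notation examples by summing term by term.
public class SigmaExamples {
    // f(n) supplied as a simple functional interface.
    interface F { long apply(long n); }

    // Sum of f(n) for n = a, a+1, ..., m.
    static long sigma(long a, long m, F f) {
        long total = 0;
        for (long n = a; n <= m; n++) total += f.apply(n);
        return total;
    }

    public static void main(String[] args) {
        System.out.println(sigma(2, 7, n -> n));         // 2 + 3 + ... + 7 = 27
        System.out.println(sigma(3, 6, n -> n * n + 1)); // 10 + 17 + 26 + 37 = 90
    }
}
```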
Spring 2015 – MAT 137 – Luedeker    Name: ________________________________

Quiz #2 – Numerical Integration

Directions: Please print out this assignment or rewrite the problems on another sheet of paper. Write the final answer as a decimal rounded to three decimal places. You must show all work to receive credit. This assignment is due Friday, January 16, at the start of class.

Consider the definite integral ∫ e^(x²) dx. Use n = 4 and the following methods to estimate the value of the definite integral.

1. Left Rule
2. Right Rule
3. Midpoint Rule
4. Trapezoid Rule
5. Simpson's Rule
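The five rules can be sketched generically. The integrand e^(x²) and n = 4 come from the quiz; the interval [0, 1] below is an assumption for illustration, since the limits of integration were garbled in the original:

```java
// Left, Right, Midpoint, Trapezoid, and Simpson estimates with n subintervals.
// The integrand e^(x^2) and n = 4 come from the quiz; the interval [0, 1] is assumed.
import java.util.function.DoubleUnaryOperator;

public class NumericalIntegration {
    static double left(DoubleUnaryOperator f, double a, double b, int n) {
        double h = (b - a) / n, s = 0;
        for (int i = 0; i < n; i++) s += f.applyAsDouble(a + i * h);
        return s * h;
    }
    static double right(DoubleUnaryOperator f, double a, double b, int n) {
        double h = (b - a) / n, s = 0;
        for (int i = 1; i <= n; i++) s += f.applyAsDouble(a + i * h);
        return s * h;
    }
    static double midpoint(DoubleUnaryOperator f, double a, double b, int n) {
        double h = (b - a) / n, s = 0;
        for (int i = 0; i < n; i++) s += f.applyAsDouble(a + (i + 0.5) * h);
        return s * h;
    }
    static double trapezoid(DoubleUnaryOperator f, double a, double b, int n) {
        return 0.5 * (left(f, a, b, n) + right(f, a, b, n));
    }
    // Composite Simpson's rule with weights 1, 4, 2, 4, ..., 4, 1 (n must be even).
    static double simpson(DoubleUnaryOperator f, double a, double b, int n) {
        double h = (b - a) / n, s = f.applyAsDouble(a) + f.applyAsDouble(b);
        for (int i = 1; i < n; i++) s += (i % 2 == 1 ? 4 : 2) * f.applyAsDouble(a + i * h);
        return s * h / 3;
    }

    public static void main(String[] args) {
        DoubleUnaryOperator f = x -> Math.exp(x * x);
        System.out.printf("Left      %.3f%n", left(f, 0, 1, 4));
        System.out.printf("Right     %.3f%n", right(f, 0, 1, 4));
        System.out.printf("Midpoint  %.3f%n", midpoint(f, 0, 1, 4));
        System.out.printf("Trapezoid %.3f%n", trapezoid(f, 0, 1, 4));
        System.out.printf("Simpson   %.3f%n", simpson(f, 0, 1, 4));
    }
}
```

For this increasing, convex integrand the estimates bracket the true value: Left < Midpoint < Simpson < Trapezoid < Right.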
Spring 2015 – MAT 137 – Luedeker    Name: ________________________________

Quiz #3

Directions: Please print out this assignment or rewrite the problems on another sheet of paper. You must show all work to receive credit. This assignment
Springdale Shopping Survey*
The major shopping areas in the community of Springdale include Springdale Mall, West Mall, and the downtown area on Main Street. A telephone survey has been conducted to identify strengths and weaknesses of these areas and to find out how they fit into the shopping activities of local residents. The 150 respondents were also asked to provide information about themselves and their shopping habits. The data are provided in the file SHOPPING. The variables in the survey were as follows:
A. How Often Respondent Shops at Each Area (Variables 1–3)
   1. Springdale Mall   2. Downtown   3. West Mall
   Responses for each area: 6 or more times/wk. (1); 4–5 times/wk. (2); 2–3 times/wk. (3); 1 time/wk. (4); 2–4 times/mo. (5); 0–1 times/mo. (6)

B. How Much the Respondent Spends during a Trip to Each Area (Variables 4–6)
   4. Springdale Mall   5. Downtown   6. West Mall
   Responses for each area: $200 or more (1); $150–under $200 (2); $100–under $150 (3); $50–under $100 (4); $25–under $50 (5); $15–under $25 (6); less than $15 (7)

C. General Attitude toward Each Shopping Area (Variables 7–9)
   7. Springdale Mall   8. Downtown   9. West Mall
   Responses for each area: Like very much (5); Like (4); Neutral (3); Dislike (2); Dislike very much (1)

D. Which Shopping Area Best Fits Each Description (Variables 10–17)
   Responses: Springdale Mall (1); Downtown (2); West Mall (3); No Opinion (4)
   10. Easy to return/exchange goods
   11. High quality of goods
   12. Low prices
   13. Good variety of sizes/styles
   14. Sales staff helpful/friendly
   15. Convenient shopping hours
   16. Clean stores and surroundings
   17. A lot of bargain sales

E. Importance of Each Item in Respondent's Choice of a Shopping Area (Variables 18–25)
   Responses on a 7-point scale from (1) = Not Important to (7) = Very Important
   18. Easy to return/exchange goods
   19. High quality of goods
   20. Low prices
   21. Good variety of sizes/styles
   22. Sales staff helpful/friendly
   23. Convenient shopping hours
   24. Clean stores and surroundings
   25. A lot of bargain sales

F. Information about the Respondent (Variables 26–30)
   26. Gender: (1) = Male (2) = Female
   27. Number of years of school completed: (1) = less than 8 years; (2) = 8–under 12 years; (3) = 12–under 16 years; (4) = 16 years or more
   28. Marital status: (1) = Married (2) = Single or other
   29. Number of people in household: pe.
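Each survey variable is stored as a small integer code, and before analysis those codes are usually mapped back to their labels. A hypothetical decoder for variable 27 (number of years of school completed); the class and method names are illustrative, not part of the SHOPPING data file:

```java
// Decode variable 27 (number of years of school completed) per the codebook above.
public class SurveyDecode {
    static String education(int code) {
        switch (code) {
            case 1:  return "less than 8 years";
            case 2:  return "8-under 12 years";
            case 3:  return "12-under 16 years";
            case 4:  return "16 years or more";
            default: return "invalid code";
        }
    }

    public static void main(String[] args) {
        System.out.println(education(3)); // prints "12-under 16 years"
    }
}
```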
Springfield assignment Instruction
From the given information, you are required to make a functional network. In Springfield, we have a router and four switches connected in a daisy-chain topology. We then have the output of show commands. It is clearly a non-functional network, and you have to implement a solution to make it functional.
Task in Springfield assignment
· From the show command output, you can identify the problems and then provide solutions.
· Configure all the tasks as in Springfield assignment as per instructions
· Create Server VLAN, Instructional VLAN, and Administrative VLAN
· Configure Access method of VLANs
· Configure Switch 1 as root bridge
· Configure trunking on all switches
· Configure default gateway
· Create and configure interface VLAN1
First of all, allow me to thank you for your email of offer dated September 2, 2015. I am writing to inform you of my acceptance of your kind offer and my enrollment in CMIT 350/6380. This class has one technical writing assignment broken into three parts: Draft 1, Draft 2, and Draft 3. I do not have any sample assignment; however, I am reviewing the student's draft version and providing feedback. To help you in this regard, I am submitting the outline of the paper below.
In the beginning, please give a brief description of the project: why you are doing it, what the problems are, and the possible solutions.
Background information:
The Springfield site network was assigned to me to investigate the problems and find solutions. From the site topology and show command output, I determined that the spanning-tree protocol is misconfigured and is blocking a few ports. These are the reasons the network is non-functional.
Implementing
Solution
:
The following are required information for configuring the network
IP address range 10.30.x.x/16
Device to be configured
Configuring commands
Device Names
Configuration Required
Configuring command
Switch#1
All devices
Host name
Hostname Switch_Springfield1
Switch#2
Host name
Hostname Switch_Springfield2
Switch1
All devices
Create console password
Create vty password
Only on Switch1
Create VLANs
Access vlan
Interface fa0/0
Switchport mode access
Switchport access vlan 11
Switch1
All Switches
Create trunk connections between switches
Int gi0/0
Switchport mode trunk
Switchport trunk encapsulation dot1q
Switchport trunk native vlan 1
Router
Configure ip address
Int fa0/0
Ip address 10.30.1.1 255.255.255.0
Switch1
Configure default gateway
Ip default-gateway 10.30.1.1
Switches
Configure spanning-tree protocol
Spanning-tree mode rapid-pvst
Switch1
Make Switch 1 the root bridge of the network
Configurations
Rough Draft
This paper will focus on the four main theoretical perspectives within sociology, which include conflict theory, functionalism, utilitarianism, and symbolic interactionism, in an attempt to explain why groups of people choose to perform certain actions and how societies function or change in certain ways.
Socio.
Main Java [All of the Base Concepts].docx
This is part 1 of my Java learning journey. It covers custom methods, classes, constructors, packages, multithreading, try-catch blocks, finally blocks, and more.
Walmart Business+ and Spark Good for Nonprofits.pdf
"Learn about all the ways Walmart supports nonprofit organizations.
You will hear from Liz Willett, the Head of Nonprofits, about what Walmart is doing to help nonprofits, including Walmart Business and Spark Good. Walmart Business+ is a new offer for nonprofits that provides discounts and also streamlines nonprofit order and expense tracking, saving time and money.
The webinar may also give some examples of how nonprofits can best leverage Walmart Business+.
The event will cover the following:
Walmart Business+ (https://business.walmart.com/plus) is a new shopping experience for nonprofits, schools, and local business customers that connects an exclusive online shopping experience to stores. Benefits include free delivery and shipping, a "Spend Analytics" feature, special discounts, deals, and tax-exempt shopping.
A special TechSoup offer for a free 180-day membership, and up to $150 in discounts on eligible orders.
Spark Good (walmart.com/sparkgood) is a charitable platform that enables nonprofits to receive donations directly from customers and associates.
Answers about how you can do more with Walmart!"
How to Make a Field Mandatory in Odoo 17
In Odoo, making a field required can be done through both Python code and XML views. When you set the required attribute to True in Python code, it makes the field required across all views where it's used. Conversely, when you set the required attribute in XML views, it makes the field required only in the context of that particular view.
Leveraging Generative AI to Drive Nonprofit Innovation
In this webinar, participants learned how to utilize Generative AI to streamline operations and elevate member engagement. Amazon Web Services (AWS) experts presented customer-specific use cases and dived into low/no-code tools that are quick and easy to deploy through AWS.
This slide is special for master's students (MIBS & MIFB) at UUM. It is also useful for readers interested in the topic of contemporary Islamic banking.
LAND USE LAND COVER AND NDVI OF MIRZAPUR DISTRICT, UP
This Dissertation explores the particular circumstances of Mirzapur, a region located in the
core of India. Mirzapur, with its varied terrains and abundant biodiversity, offers an optimal
environment for investigating the changes in vegetation cover dynamics. Our study utilizes
advanced technologies such as GIS (Geographic Information Systems) and Remote sensing to
analyze the transformations that have taken place over the course of a decade.
The complex relationship between human activities and the environment has been the focus
of extensive research and worry. As the global community grapples with swift urbanization,
population expansion, and economic progress, the effects on natural ecosystems are becoming
more evident. A crucial element of this impact is the alteration of vegetation cover, which plays a
significant role in maintaining the ecological equilibrium of our planet. Land serves as the foundation for all human activities and provides the necessary materials for
these activities. As the most crucial natural resource, its utilization by humans results in different
'Land uses,' which are determined by both human activities and the physical characteristics of the
land.
The utilization of land is impacted by human needs and environmental factors. In countries
like India, rapid population growth and the emphasis on extensive resource exploitation can lead
to significant land degradation, adversely affecting the region's land cover.
Therefore, human intervention has significantly influenced land use patterns over many
centuries, evolving its structure over time and space. In the present era, these changes have
accelerated due to factors such as agriculture and urbanization. Information regarding land use and
cover is essential for various planning and management tasks related to the Earth's surface,
providing crucial environmental data for scientific, resource management, policy purposes, and
diverse human activities.
Accurate understanding of land use and cover is imperative for the development planning
of any area. Consequently, a wide range of professionals, including earth system scientists, land
and water managers, and urban planners, are interested in obtaining data on land use and cover
changes, conversion trends, and other related patterns. The spatial dimensions of land use and
cover support policymakers and scientists in making well-informed decisions, as alterations in
these patterns indicate shifts in economic and social conditions. Monitoring such changes with the
help of Advanced technologies like Remote Sensing and Geographic Information Systems is
crucial for coordinated efforts across different administrative levels. Advanced technologies like
Remote Sensing and Geographic Information Systems
9
Changes in vegetation cover refer to variations in the distribution, composition, and overall
structure of plant communities across different temporal and spatial scales. These changes can
occur natural.
Spanner’s main focus is managing cross-datacenter replicated
data, but we have also spent a great deal of time in designing
and implementing important database features on top of our
distributed-systems infrastructure. Even though many projects
happily use Bigtable [9], we have also consistently received
complaints from users that Bigtable can be difficult to use for
some kinds of applications: those that have complex, evolving
schemas, or those that want strong consistency in the presence
of wide-area replication. (Similar claims have been made by
other authors [37].) Many applications at Google have chosen to
use Megastore [5] because of its semirelational data model and
support for synchronous replication, despite its relatively poor
write throughput. As a consequence, Spanner has evolved from
a Bigtable-like versioned key-value store into a temporal multi-
version database. Data is stored in schematized semi-relational
tables; data is versioned, and each version is automatically
timestamped with its commit time; old versions of data are
subject to configurable garbage-collection policies; and
applications can read data at old timestamps. Spanner supports
general-purpose transactions, and provides a SQL-based query
language.
As a globally-distributed database, Spanner provides several
interesting features. First, the replication configurations for
data can be dynamically controlled at a fine grain by
applications. Applications can specify constraints to control
which datacenters contain which data, how far data is from its
users (to control read latency), how far replicas are from each
other (to control write latency), and how many replicas are
maintained (to control durability, availability, and read
performance). Data can also be dynamically and transparently
moved between datacenters by the system to balance resource
usage across datacenters. Second, Spanner has two features that
are difficult to implement in a distributed database: it provides
externally consistent [16] reads and writes, and globally-
consistent reads across the database at a timestamp. These
features enable Spanner to support consistent backups,
consistent MapReduce executions [12], and atomic schema
updates, all at global scale, and even in the presence of ongoing
transactions.
These features are enabled by the fact that Spanner assigns
globally-meaningful commit timestamps to transactions, even
though transactions may be distributed. The timestamps reflect
serialization order. In addition, the serialization order satisfies
external consistency (or equivalently, linearizability [20]): if a
transaction T1 commits before another transaction T2 starts,
then T1’s commit timestamp is smaller than T2’s. Spanner is the
first system to provide such guarantees at global scale.
The key enabler of these properties is a new TrueTime API and
its implementation. The API directly exposes clock uncertainty,
and the guarantees on Spanner’s timestamps depend on the
bounds that the implementation provides. If the uncertainty is
large, Spanner slows down to wait out that uncertainty.
Google’s cluster-management software provides an
implementation of the TrueTime API. This implementation
keeps uncertainty small (generally less than 10ms) by using
multiple modern clock references (GPS and atomic clocks).
Section 2 describes the structure of Spanner’s implementation,
its feature set, and the engineering decisions that went into their
design. Section 3 describes our new TrueTime API and sketches
its implementation. Section 4 describes how Spanner uses
TrueTime to implement externally-consistent distributed
transactions, lock-free read-only transactions, and atomic
schema updates. Section 5 provides some benchmarks on
Spanner’s performance and TrueTime behavior, and discusses
the experiences of F1. Sections 6, 7, and 8 describe related and
future work, and summarize our conclusions.

Implementation
This section describes the structure of and rationale underlying
Spanner’s implementation. It then describes the directory
abstraction, which is used to manage replication and locality,
and is the unit of data movement. Finally, it describes our data
model, why Spanner looks like a relational database instead of a
key-value store, and how applications can control data locality.
A Spanner deployment is called a universe. Given that Spanner
manages data globally, there will be only a handful of running
universes. We currently run a test/playground universe, a
development/production universe, and a production-only
universe.
Spanner is organized as a set of zones, where each zone is the
rough analog of a deployment of Bigtable
Figure 1: Spanner server organization.
servers [9]. Zones are the unit of administrative deployment.
The set of zones is also the set of locations across which data
can be replicated. Zones can be added to or removed from a
running system as new datacenters are brought into service and
old ones are turned off, respectively. Zones are also the unit of
physical isolation: there may be one or more zones in a
datacenter, for example, if different applications’ data must be
partitioned across different sets of servers in the same
datacenter.
Figure 1 illustrates the servers in a Spanner universe. A zone
has one zonemaster and between one hundred and several
thousand spanservers. The former assigns data to spanservers;
the latter serve data to clients. The per-zone location proxies
are used by clients to locate the spanservers assigned to serve
their data. The universe master and the placement driver are
currently singletons. The universe master is primarily a console
that displays status information about all the zones for
interactive debugging. The placement driver handles automated
movement of data across zones on the timescale of minutes. The
placement driver periodically communicates with the
spanservers to find data that needs to be moved, either to meet
updated replication constraints or to balance load. For space
reasons, we will only describe the spanserver in any detail.
Spanserver Software Stack
This section focuses on the spanserver implementation to
illustrate how replication and distributed transactions have been
layered onto our Bigtable-based implementation. The software
stack is shown in Figure 2. At the bottom, each spanserver is
responsible for between 100 and 1000 instances of a data
structure called a tablet. A tablet is similar to Bigtable’s tablet
abstraction, in that it implements a bag of the following
mappings:
(key:string, timestamp:int64) → string
Unlike Bigtable, Spanner assigns timestamps to data, which is
an important way in which Spanner is more like a multi-version
database than a key-value store.

Figure 2: Spanserver software stack.

A tablet’s state is stored in a set of B-tree-like files and a write-
ahead log, all on a distributed file system called Colossus (the
successor to the Google File System [15]).
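The (key:string, timestamp:int64) → string mapping above can be illustrated with a toy multi-version store (a sketch for exposition only, not Spanner’s actual tablet format): a read at timestamp t returns the newest version whose timestamp is at most t.

```python
import bisect
from collections import defaultdict

class Tablet:
    """Toy multi-version map implementing (key, timestamp) -> value.
    Real tablets are B-tree-like files plus a write-ahead log."""
    def __init__(self):
        # Per key: (timestamp, value) pairs kept sorted by timestamp.
        self.versions = defaultdict(list)

    def write(self, key, timestamp, value):
        bisect.insort(self.versions[key], (timestamp, value))

    def read(self, key, timestamp):
        """Newest version with ts <= timestamp, or None if none exists."""
        vs = self.versions[key]
        i = bisect.bisect_right(vs, (timestamp, chr(0x10FFFF)))
        return vs[i - 1][1] if i > 0 else None

t = Tablet()
t.write("user/2/album/1", 10, "Hawaii")
t.write("user/2/album/1", 20, "Hawaii 2012")
```

Reads at timestamps 15 and 25 return the versions written at 10 and 20 respectively; a read at timestamp 5 predates every version and returns None.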
To support replication, each spanserver implements a single
Paxos state machine on top of each tablet. (An early Spanner
incarnation supported multiple Paxos state machines per tablet,
which allowed for more flexible replication configurations. The
complexity of that design led us to abandon it.) Each state
machine stores its metadata and log in its corresponding tablet.
Our Paxos implementation supports long-lived leaders with
time-based leader leases, whose length defaults to 10 seconds.
The current Spanner implementation logs every Paxos write
twice: once in the tablet’s log, and once in the Paxos log. This
choice was made out of expediency, and we are likely to remedy
this eventually. Our implementation of Paxos is pipelined, so as
to improve Spanner’s throughput in the presence of WAN
latencies; but writes are applied by Paxos in order (a fact on
which we will depend in Section 4).
The Paxos state machines are used to implement a consistently
replicated bag of mappings. The key-value mapping state of
each replica is stored in its corresponding tablet. Writes must
initiate the Paxos protocol at the leader; reads access state
directly from the underlying tablet at any replica that is
sufficiently up-to-date. The set of replicas is collectively a
Paxos group.
At every replica that is a leader, each spanserver implements a
lock table to implement concurrency control. The lock table
contains the state for two-phase locking: it maps ranges of keys
to lock states. (Note that having a long-lived Paxos leader is
critical to efficiently managing the lock table.) In both Bigtable
and Spanner, we designed for long-lived transactions (for
example, for report generation, which might take on the order of
minutes), which perform poorly under optimistic concurrency
control in the presence of conflicts.

Figure 3: Directories are the unit of data movement between
Paxos groups.

Operations that require synchronization, such as transactional reads,
acquire locks in the lock table; other operations bypass the lock
table.
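The lock table’s mapping from key ranges to lock states can be sketched as follows (an illustrative structure only; deadlock handling, wound/wait policies, and persistence are omitted):

```python
class LockTable:
    """Toy lock table: maps key ranges to shared ('S') or exclusive
    ('X') lock states held by transactions. Ranges are [start, end)."""
    def __init__(self):
        self.locks = []  # list of (start, end, mode, txn_id)

    def _conflicts(self, start, end, mode, txn_id):
        for s, e, m, t in self.locks:
            overlap = start < e and s < end
            # Two shared locks are compatible; any other overlapping
            # pair held by a different transaction conflicts.
            if overlap and t != txn_id and not (m == "S" and mode == "S"):
                return True
        return False

    def acquire(self, start, end, mode, txn_id):
        if self._conflicts(start, end, mode, txn_id):
            return False
        self.locks.append((start, end, mode, txn_id))
        return True

    def release_all(self, txn_id):
        # Two-phase locking: all locks released together at the end.
        self.locks = [l for l in self.locks if l[3] != txn_id]

lt = LockTable()
```

Two readers can share an overlapping range, while a writer must wait until both release.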
At every replica that is a leader, each spanserver also
implements a transaction manager to support distributed
transactions. The transaction manager is used to implement a
participant leader; the other replicas in the group will be
referred to as participant slaves. If a transaction involves only
one Paxos group (as is the case for most transactions), it can
bypass the transaction manager, since the lock table and Paxos
together provide transactionality. If a transaction involves more
than one Paxos group, those groups’ leaders coordinate to
perform two-phase commit. One of the participant groups is
chosen as the coordinator: the participant leader of that group
will be referred to as the coordinator leader, and the slaves of
that group as coordinator slaves. The state of each transaction
manager is stored in the underlying Paxos group (and therefore
is replicated).
Directories and Placement
On top of the bag of key-value mappings, the Spanner
implementation supports a bucketing abstraction called a
directory, which is a set of contiguous keys that share a
common prefix. (The choice of the term directory is a historical
accident; a better term might be bucket.) We will explain the
source of that prefix in Section 2.3. Supporting directories
allows applications to control the locality of their data by
choosing keys carefully.
A directory is the unit of data placement. All data in a directory
has the same replication configuration. When data is moved
between Paxos groups, it is moved directory by directory, as
shown in Figure 3. Spanner might move a directory to shed load
from a Paxos group; to put directories that are frequently
accessed together into the same group; or to move a directory
into a group that is closer to its accessors. Directories can be
moved while client operations are ongoing. One could expect
that a 50MB directory can be moved in a few seconds.
The fact that a Paxos group may contain multiple directories
implies that a Spanner tablet is different from a Bigtable tablet:
the former is not necessarily a single lexicographically
contiguous partition of the row space. Instead, a Spanner tablet
is a container that may encapsulate multiple partitions of the
row space. We made this decision so that it would be possible to
colocate multiple directories that are frequently accessed
together.
Movedir is the background task used to move directories
between Paxos groups [14]. Movedir is also used to add or
remove replicas to Paxos groups [25], because Spanner does not
yet support in-Paxos configuration changes. Movedir is not
implemented as a single transaction, so as to avoid blocking
ongoing reads and writes on a bulky data move. Instead,
movedir registers the fact that it is starting to move data and
moves the data in the background. When it has moved all but a
nominal amount of the data, it uses a transaction to atomically
move that nominal amount and update the metadata for the two
Paxos groups.
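Movedir’s two-step structure can be sketched as below (a toy model: `Group` and the lock standing in for the final metadata transaction are illustrative assumptions, not Spanner’s interfaces):

```python
import threading

class Group:
    """Toy Paxos group: a dict of key-value data plus a lock that
    stands in for the closing transaction."""
    def __init__(self):
        self.data = {}
        self.lock = threading.Lock()

def movedir(src, dst, prefix):
    """Sketch of movedir: copy in the background, then one small
    'transaction' moves the remaining delta and flips ownership."""
    # Phase 1: copy a snapshot while reads/writes continue at src.
    snapshot = {k: v for k, v in src.data.items() if k.startswith(prefix)}
    dst.data.update(snapshot)
    # Phase 2: atomically move whatever remains (including writes
    # that arrived during the copy) and update both groups' state.
    with src.lock, dst.lock:
        for k in [k for k in src.data if k.startswith(prefix)]:
            dst.data[k] = src.data.pop(k)

g1, g2 = Group(), Group()
g1.data.update({"dir1/a": 1, "dir1/b": 2, "dir2/c": 3})
movedir(g1, g2, "dir1/")
```

Only the second phase holds locks, so a bulky move never blocks ongoing reads and writes for its full duration.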
A directory is also the smallest unit whose
geographic-replication properties (or placement, for short) can
be specified by an application. The design of our placement-
specification language separates responsibilities for managing
replication configurations. Administrators control two
dimensions: the number and types of replicas, and the
geographic placement of those replicas. They create a menu of
named options in these two dimensions (e.g., North America,
replicated 5 ways with 1 witness). An application controls how
data is replicated, by tagging each database and/or individual
directories with a combination of those options. For example, an
application might store each end-user’s data in its own
directory, which would enable user A’s data to have three
replicas in Europe, and user B’s data to have five replicas in
North America.
For expository clarity we have over-simplified. In fact, Spanner
will shard a directory into multiple fragments if it grows too
large. Fragments may be served from different Paxos groups
(and therefore different servers). Movedir actually moves
fragments, and not whole directories, between groups.
Data Model
Spanner exposes the following set of data features to
applications: a data model based on schematized semi-relational
tables, a query language, and general-purpose transactions. The
move towards supporting these features was driven by many
factors. The need to support schematized semi-relational tables
and synchronous replication is supported by the popularity of
Megastore [5]. At least 300 applications within Google use
Megastore (despite its relatively low performance) because its
data model is simpler to manage than Bigtable’s, and because of
its support for synchronous replication across datacenters.
(Bigtable only supports eventually-consistent replication across
datacenters.) Examples of well-known Google applications that
use Megastore are Gmail, Picasa, Calendar, Android Market,
and AppEngine. The need to support a SQL-like query language
in Spanner was also clear, given the popularity of Dremel [28]
as an interactive data-analysis tool. Finally, the lack of cross-
row transactions in Bigtable led to frequent complaints;
Percolator [32] was in part built to address this failing. Some
authors have claimed that general two-phase commit is too
expensive to support, because of the performance or availability
problems that it brings [9, 10, 19]. We believe it is better to
have application programmers deal with performance problems
due to overuse of transactions as bottlenecks arise, rather than
always coding around the lack of transactions. Running two-
phase commit over Paxos mitigates the availability problems.
The application data model is layered on top of the directory-
bucketed key-value mappings supported by the implementation.
An application creates one or more databases in a universe.
Each database can contain an unlimited number of schematized
tables. Tables look like relational-database tables, with rows,
columns, and versioned values. We will not go into detail about
the query language for Spanner. It looks like SQL with some
extensions to support protocol-buffer-valued fields.
Spanner’s data model is not purely relational, in that rows must
have names. More precisely, every table is required to have an
ordered set of one or more primary-key columns. This
requirement is where Spanner still looks like a key-value store:
the primary keys form the name for a row, and each table
defines a mapping from the primary-key columns to the non-
primary-key columns. A row has existence only if some value
(even if it is NULL) is defined for the row’s keys. Imposing this
structure is useful because it lets applications control data
locality through their choices of keys.
Figure 4 contains an example Spanner schema for storing photo
metadata on a per-user, per-album basis. The schema language
is similar to Megastore’s, with the additional requirement that
every Spanner database must be partitioned by clients into one
or more hierarchies of tables. Client applications declare the
hierarchies in database schemas via the INTERLEAVE IN
declarations. The table at the top of a hierarchy is a directory
table. Each row in a directory table with key K, together with
all of the rows in descendant tables that start with K in
lexicographic order, forms a directory. ON DELETE CASCADE
says that deleting a row in the directory table deletes any
associated child rows.

CREATE TABLE Users {
  uid INT64 NOT NULL,
  email STRING
} PRIMARY KEY (uid), DIRECTORY;

CREATE TABLE Albums {
  uid INT64 NOT NULL,
  aid INT64 NOT NULL,
  name STRING
} PRIMARY KEY (uid, aid),
  INTERLEAVE IN PARENT Users ON DELETE CASCADE;

Figure 4: Example Spanner schema for photo metadata, and the
interleaving implied by INTERLEAVE IN.

The figure also illustrates the interleaved layout for the example
database: for example, Albums(2,1) represents the row from the
Albums table for user id 2, album id 1. This interleaving of tables to form
directories is significant because it allows clients to describe
the locality relationships that exist between multiple tables,
which is necessary for good performance in a sharded,
distributed database. Without it, Spanner would not know the
most important locality relationships.

TrueTime
Method | Returns
TT.now() | TTinterval: [earliest, latest]
TT.after(t) | true if t has definitely passed
TT.before(t) | true if t has definitely not arrived

Table 1: TrueTime API. The argument t is of type TTstamp.
This section describes the TrueTime API and sketches its
implementation. We leave most of the details for another paper:
our goal is to demonstrate the power of having such an API.
Table 1 lists the methods of the API. TrueTime explicitly
represents time as a TTinterval, which is an interval with
bounded time uncertainty (unlike standard time interfaces that
give clients no notion of uncertainty). The endpoints of a
TTinterval are of type TTstamp. The TT.now() method returns a
TTinterval that is guaranteed to contain the absolute time during
which TT.now() was invoked. The time epoch is analogous to
UNIX time with leap-second smearing. Define the instantaneous
error bound as ε, which is half of the interval’s width, and the
average error bound as ε̄. The TT.after() and TT.before()
methods are convenience wrappers around TT.now().
Denote the absolute time of an event e by the function t_abs(e).
In more formal terms, TrueTime guarantees that for an
invocation tt = TT.now(), tt.earliest ≤ t_abs(e_now) ≤ tt.latest,
where e_now is the invocation event.
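The API’s semantics follow directly from this guarantee and can be sketched as below (a toy model: the constant ε here is an assumed bound, whereas the real implementation derives it from the time masters):

```python
import time
from collections import namedtuple

TTinterval = namedtuple("TTinterval", ["earliest", "latest"])

class TrueTime:
    """Toy TrueTime: wraps the local clock with an assumed error
    bound epsilon (seconds)."""
    def __init__(self, epsilon=0.007):  # ~7 ms, near the sawtooth peak
        self.epsilon = epsilon

    def now(self):
        """Interval guaranteed (by assumption) to contain absolute time."""
        t = time.time()
        return TTinterval(t - self.epsilon, t + self.epsilon)

    def after(self, t):
        """True only if t has definitely passed."""
        return self.now().earliest > t

    def before(self, t):
        """True only if t has definitely not arrived."""
        return self.now().latest < t

tt = TrueTime()
```

Note that after(t) and before(t) can both be false for a timestamp inside the current uncertainty interval; that gap is exactly what Spanner waits out when uncertainty matters.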
The underlying time references used by TrueTime are GPS and
atomic clocks. TrueTime uses two forms of time reference
because they have different failure modes. GPS reference-
source vulnerabilities include antenna and receiver failures,
local radio interference, correlated failures (e.g., design faults
such as incorrect leap-second handling and spoofing), and GPS
system outages. Atomic clocks can fail in ways uncorrelated to
GPS and each other, and over long periods of time can drift
significantly due to frequency error.
TrueTime is implemented by a set of time master machines per
datacenter and a timeslave daemon per machine. The majority of
masters have GPS receivers with dedicated antennas; these
masters are separated physically to reduce the effects of antenna
failures, radio interference, and spoofing. The remaining
masters (which we refer to as Armageddon masters) are
equipped with atomic clocks. An atomic clock is not that
expensive: the cost of an Armageddon master is of the same
order as that of a GPS master. All masters’ time references are
regularly compared against each other. Each master also cross-
checks the rate at which its reference advances time against its
own local clock, and evicts itself if there is substantial
divergence. Between synchronizations, Armageddon masters
advertise a slowly increasing time uncertainty that is derived
from conservatively applied worst-case clock drift. GPS masters
advertise uncertainty that is typically close to zero.
Every daemon polls a variety of masters [29] to reduce
vulnerability to errors from any one master. Some are GPS
masters chosen from nearby datacenters; the rest are GPS
masters from farther datacenters, as well as some Armageddon
masters. Daemons apply a variant of Marzullo’s algorithm [27]
to detect and reject liars, and synchronize the local machine
clocks to the nonliars. To protect against broken local clocks,
machines that exhibit frequency excursions larger than the
worst-case bound derived from component specifications and
operating environment are evicted.
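The interval-agreement step underlying Marzullo’s algorithm, which the daemons apply in variant form, can be sketched as follows (the textbook version, not Google’s variant):

```python
def marzullo(intervals):
    """Marzullo's algorithm: given [lo, hi] time estimates from several
    masters, return (k, interval) where interval is the smallest range
    contained in the largest number k of source intervals. Sources
    whose intervals miss that range are treated as liars."""
    events = []
    for lo, hi in intervals:
        events.append((lo, -1))  # interval start
        events.append((hi, +1))  # interval end
    events.sort()  # starts sort before ends at the same offset
    best, count, best_interval = 0, 0, None
    for i, (offset, typ) in enumerate(events):
        count -= typ  # +1 at a start, -1 at an end
        if count > best and i + 1 < len(events):
            best = count
            best_interval = (offset, events[i + 1][0])
    return best, best_interval
```

For example, with estimates [8, 12], [11, 13], and [14, 15], two sources agree on [11, 12] and the third is rejected as a liar.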
Between synchronizations, a daemon advertises a slowly
increasing time uncertainty ε. ε is derived from conservatively
applied worst-case local clock drift. ε also depends on time-
master uncertainty and communication delay to the time
masters. In our production environment, ε is typically a sawtooth
function of time, varying from about 1 to 7 ms over each poll
interval. ε̄ is therefore 4 ms most of the time. The daemon’s poll
interval is currently 30 seconds, and the current applied drift
rate is set at 200 microseconds/second, which together account
for the sawtooth bounds from 0 to 6 ms. The remaining 1 ms
comes from the communication delay to the time masters.
Excursions from this sawtooth are possible in the presence of
failures. For example, occasional time-master unavailability can
cause datacenter-wide increases in ε. Similarly, overloaded
machines and network links can result in occasional localized
spikes.

Operation | Timestamp Discussion | Concurrency Control | Replica Required
Read-Write Transaction | § 4.1.2 | pessimistic | leader
Read-Only Transaction | § 4.1.4 | lock-free | leader for timestamp; any for read, subject to § 4.1.3
Snapshot Read, client-provided timestamp | — | lock-free | any, subject to § 4.1.3
Snapshot Read, client-provided bound | § 4.1.3 | lock-free | any, subject to § 4.1.3

Table 2: Types of reads and writes in Spanner, and how they
compare.

Concurrency Control
This section describes how TrueTime is used to guarantee the
correctness properties around concurrency control, and how
those properties are used to implement features such as
externally consistent transactions, lock-free read-only
transactions, and non-blocking reads in the past. These features
enable, for example, the guarantee that a whole-database audit
read at a timestamp t will see exactly the effects of every
transaction that has committed as of t.
Going forward, it will be important to distinguish writes as seen
by Paxos (which we will refer to as Paxos writes unless the
context is clear) from Spanner client writes. For example, two-
phase commit generates a Paxos write for the prepare phase that
has no corresponding Spanner client write.
Timestamp Management
Table 2 lists the types of operations that Spanner supports. The
Spanner implementation supports read-write transactions, read-
only transactions (predeclared snapshot-isolation transactions),
and snapshot reads. Standalone writes are implemented as read-
write transactions; non-snapshot standalone reads are
implemented as read-only transactions. Both are internally
retried (clients need not write their own retry loops).
A read-only transaction is a kind of transaction that has the
performance benefits of snapshot isolation [6]. A read-only
transaction must be predeclared as not having any writes; it is
not simply a read-write transaction without any writes. Reads in
a read-only transaction execute at a system-chosen timestamp
without locking, so that incoming writes are not blocked. The
execution of the reads in a read-only transaction can proceed on
any replica that is sufficiently up-to-date (Section 4.1.3).
A snapshot read is a read in the past that executes without
locking. A client can either specify a timestamp for a snapshot
read, or provide an upper bound on the desired timestamp’s
staleness and let Spanner choose a timestamp. In either case, the
execution of a snapshot read proceeds at any replica that is
sufficiently up-to-date.
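Timestamp selection for a snapshot read can be sketched as below; `t_safe` stands in for the replica’s up-to-date bound from Section 4.1.3, and the function and parameter names are illustrative, not Spanner’s API:

```python
def choose_snapshot_timestamp(now, t_safe, exact=None, max_staleness=None):
    """Toy sketch: a client supplies either an exact timestamp or an
    upper bound on staleness. t_safe is the maximum timestamp at which
    this replica is sufficiently up-to-date."""
    if exact is not None:
        if exact > t_safe:
            raise RuntimeError("replica not sufficiently up-to-date")
        return exact
    if max_staleness is not None:
        # Serve as fresh as the replica allows, but no staler than the bound.
        t = min(now, t_safe)
        if t < now - max_staleness:
            raise RuntimeError("replica too stale for the requested bound")
        return t
    raise ValueError("need an exact timestamp or a staleness bound")
```

With a staleness bound the system has freedom to pick any timestamp in the allowed window, which lets it route the read to a replica that can serve it without blocking.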
For both read-only transactions and snapshot reads, commit is
inevitable once a timestamp has been chosen, unless the data at
that timestamp has been garbage-collected. As a result, clients
can avoid buffering results inside a retry loop. When a server
fails, clients can internally continue the query on a different
server by repeating the timestamp and the current read
position.

Paxos Leader Leases
Spanner’s Paxos implementation uses timed leases to make
leadership long-lived (10 seconds by default). A potential leader
sends requests for timed lease votes; upon receiving a quorum
of lease votes the leader knows it has a lease. A replica extends
its lease vote implicitly on a successful write, and the leader
requests lease-vote extensions if they are near expiration.
Define a leader’s lease interval as starting when it discovers it
has a quorum of lease votes, and as ending when it no longer
has a quorum of lease votes (because some have expired).
Spanner depends on the following disjointness invariant: for
each Paxos group, each Paxos leader’s lease interval is disjoint
from every other leader’s. Appendix A describes how this
invariant is enforced.
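The lease-vote bookkeeping on the leader's side can be sketched as follows (an illustrative model, not Spanner's implementation; only the 10-second default comes from the text):

```python
class LeaseTracker:
    """Leader-side view of timed lease votes from its replicas."""

    def __init__(self, replica_ids, lease_length=10.0):
        self.lease_length = lease_length
        self.vote_expiry = {r: 0.0 for r in replica_ids}  # no votes yet

    def record_vote(self, replica, now):
        # An explicit vote, or the implicit extension that rides on a
        # successful write, renews that replica's vote for a full
        # lease length.
        self.vote_expiry[replica] = now + self.lease_length

    def has_lease(self, now):
        # The lease interval lasts exactly while a quorum of votes is
        # unexpired.
        live = sum(1 for exp in self.vote_expiry.values() if exp > now)
        return live > len(self.vote_expiry) // 2
```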
The Spanner implementation permits a Paxos leader to abdicate
by releasing its slaves from their lease votes. To preserve the
disjointness invariant, Spanner constrains when abdication is
permissible. Define smax to be the maximum timestamp used by
a leader. Subsequent sections will describe when smax is
advanced. Before abdicating, a leader must wait until
TT.after(smax) is true.

Assigning Timestamps to RW Transactions
Transactional reads and writes use two-phase locking.
As a result, they can be assigned timestamps at any time when
all locks have been acquired, but before any locks have been
released. For a given transaction, Spanner assigns it the
timestamp that Paxos assigns to the Paxos write that represents
the transaction commit.
Spanner depends on the following monotonicity invariant:
within each Paxos group, Spanner assigns timestamps to Paxos
writes in monotonically increasing order, even across leaders. A
single leader replica can trivially assign timestamps in
monotonically increasing order. This invariant is enforced
across leaders by making use of the disjointness invariant: a
leader must only assign timestamps within the interval of its
leader lease. Note that whenever a timestamp s is assigned,
smax is advanced to s to preserve disjointness.
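The two invariants combine into a simple rule for a leader handing out timestamps, sketched below (a hypothetical helper; the clamp to the lease interval is what disjointness requires, and advancing the running maximum is what monotonicity requires):

```python
class LeaderClock:
    """Assigns Paxos-write timestamps that are monotonic and confined
    to the leader's lease interval; s_max tracks the highest one."""

    def __init__(self, lease_start, lease_end):
        self.lease_start = lease_start
        self.lease_end = lease_end
        self.s_max = lease_start

    def assign(self, proposed):
        # Monotonicity: strictly above every earlier assignment.
        # Disjointness: never outside [lease_start, lease_end].
        s = max(proposed, self.s_max + 1, self.lease_start)
        if s > self.lease_end:
            raise RuntimeError("extend the lease before assigning")
        self.s_max = s  # advancing s_max also constrains abdication
        return s
```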
Spanner also enforces the following external-consistency
invariant: if the start of a transaction T2 occurs after the
commit of a transaction T1, then the commit timestamp of T2
must be greater than the commit timestamp of T1. Define the
start and commit events for a transaction Ti by ei^start and
ei^commit, and the commit timestamp of a transaction Ti by si.
The invariant becomes tabs(e1^commit) < tabs(e2^start) ⇒ s1 < s2.
The protocol for executing transactions and assigning
timestamps obeys two rules, which together guarantee this
invariant, as shown below. Define the arrival event of the
commit request at the coordinator leader for a write Ti to be
ei^server.
Start The coordinator leader for a write Ti assigns a commit
timestamp si no less than the value of
TT.now().latest, computed after ei^server. Note that the
participant leaders do not matter here; Section 4.2.1 describes
how they are involved in the implementation of the next rule.
Commit Wait The coordinator leader ensures that clients cannot
see any data committed by Ti until TT.after(si) is true. Commit
wait ensures that si is less than the absolute commit time of Ti,
or si < tabs(ei^commit). The implementation of commit wait is
described in Section 4.2.1. Proof:

s1 < tabs(e1^commit)                  (commit wait)
tabs(e1^commit) < tabs(e2^start)      (assumption)
tabs(e2^start) <= tabs(e2^server)     (causality)
tabs(e2^server) <= s2                 (start)
s1 < s2                               (transitivity)

Serving Reads at a Timestamp
The monotonicity invariant described in Section 4.1.2 allows
Spanner to correctly determine whether a replica’s state is
sufficiently up-to-date to satisfy a read. Every replica tracks a
value called safe time tsafe which is the maximum timestamp at
which a replica is up-to-date. A replica can satisfy a read at a
timestamp t if t <= tsafe.
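Using the two components defined in the following paragraphs (the Paxos safe time and the transaction-manager safe time), the check might be sketched as (an illustration under those definitions, not Spanner's code):

```python
def t_safe(t_paxos_safe, prepared_ts):
    """tsafe = min(tPaxos_safe, tTM_safe).

    The transaction-manager safe time is infinite when nothing is
    prepared but uncommitted; otherwise a prepared transaction may
    still commit at any timestamp at or above its prepare timestamp,
    so reads are safe only strictly below the minimum prepare
    timestamp.
    """
    t_tm_safe = min(prepared_ts) - 1 if prepared_ts else float("inf")
    return min(t_paxos_safe, t_tm_safe)

def can_serve(read_ts, t_paxos_safe, prepared_ts):
    # A replica can satisfy a read at timestamp t iff t <= tsafe.
    return read_ts <= t_safe(t_paxos_safe, prepared_ts)
```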
Define tsafe = min(tPaxos_safe, tTM_safe), where each Paxos
state machine has a safe time tPaxos_safe and each transaction
manager has a safe time tTM_safe. tPaxos_safe is simpler: it is
the timestamp of the highest-applied Paxos write. Because
timestamps increase monotonically and writes are applied in
order, writes will no longer occur at or below tPaxos_safe with
respect to Paxos.
tTM_safe is ∞ at a replica if there are zero prepared (but not
committed) transactions—that is, transactions in between the
two phases of two-phase commit. (For a participant slave,
tTM_safe actually refers to the replica’s leader’s transaction
manager, whose state the slave can infer through metadata
passed on Paxos writes.) If there are any such transactions, then
the state affected by those transactions is indeterminate: a
participant replica does not know yet whether such transactions
will commit. As we discuss in Section 4.2.1, the commit
protocol ensures that every participant knows a lower bound on
a prepared transaction’s timestamp. Every participant leader
(for a group g) for a transaction Ti assigns a prepare timestamp
si,g^prepare to its prepare record. The coordinator leader
ensures that the transaction’s commit timestamp si >=
si,g^prepare over all participant groups g. Therefore, for every
replica in a group g, tTM_safe = min_i(si,g^prepare) - 1 over
all transactions Ti prepared at g.

Assigning Timestamps to RO Transactions
A read-only transaction executes in two phases: assign a
timestamp sread [8], and then execute the transaction’s reads as
snapshot reads at sread. The snapshot reads can execute at any
replicas that are sufficiently up-to-date.
The simple assignment of sread = TT.now().latest, at any time
after a transaction starts, preserves external consistency by an
argument analogous to that presented for writes in Section
4.1.2. However, such a timestamp may require the execution of
the data reads at sread to block if tsafe has not advanced
sufficiently. (In addition, note that choosing a value of sread
may also advance smax to preserve disjointness.) To reduce the
chances of blocking, Spanner should assign the oldest
timestamp that preserves external consistency. Section 4.2.2
explains how such a timestamp can be chosen.
Details
This section explains some of the practical details of read-write
transactions and read-only transactions elided earlier, as well as
the implementation of a special transaction type used to
implement atomic schema changes.
It then describes some refinements of the basic schemes as
described.

Read-Write Transactions
Like Bigtable, writes that occur in a transaction are buffered at
the client until commit. As a result, reads in a transaction do not
see the effects of the transaction’s writes. This design works
well in Spanner because a read returns the timestamps of any
data read, and uncommitted writes have not yet been assigned
timestamps.
Reads within read-write transactions use wound-wait [33] to
avoid deadlocks. The client issues reads to the leader replica of
the appropriate group, which acquires read locks and then reads
the most recent data. While a client transaction remains open, it
sends keepalive messages to prevent participant leaders from
timing out its transaction. When a client has completed all reads
and buffered all writes, it begins two-phase commit. The client
chooses a coordinator group and sends a commit message to
each participant’s leader with the identity of the coordinator and
any buffered writes. Having the client drive two-phase commit
avoids sending data twice across wide-area links.
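The wound-wait policy those reads rely on can be sketched as (illustrative; "older" means a smaller transaction timestamp):

```python
def on_lock_conflict(requester_ts, holder_ts):
    """Wound-wait [33] deadlock avoidance: an older requester wounds
    (forces the abort and restart of) a younger lock holder, while a
    younger requester simply waits. Because a transaction only ever
    waits on older transactions, no cycle of waiters can form."""
    if requester_ts < holder_ts:
        return "wound"  # older wins: abort the younger holder
    return "wait"       # younger requester blocks on the older holder
```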
A non-coordinator-participant leader first acquires write locks.
It then chooses a prepare timestamp that must be larger than any
timestamps it has assigned to previous transactions (to preserve
monotonicity), and logs a prepare record through Paxos. Each
participant then notifies the coordinator of its prepare
timestamp.
The coordinator leader also first acquires write locks, but skips
the prepare phase. It chooses a timestamp for the entire
transaction after hearing from all other participant leaders. The
commit timestamp s must be greater or equal to all prepare
timestamps (to satisfy the constraints discussed in Section
4.1.3), greater than TT.now().latest at the time the coordinator
received its commit message, and greater than any timestamps
the leader has assigned to previous transactions (again, to
preserve monotonicity). The coordinator leader then logs a
commit record through Paxos (or an abort if it timed out while
waiting on the other participants).
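The coordinator's timestamp choice, together with the commit wait that follows, can be sketched with a toy TrueTime (EPSILON, the tie-breaking increment, and all names here are illustrative assumptions, not Spanner's values):

```python
import time

EPSILON = 0.004  # assumed instantaneous uncertainty bound, in seconds

def tt_now():
    """Toy TrueTime: an interval [earliest, latest] around real time."""
    t = time.time()
    return t - EPSILON, t + EPSILON

def choose_commit_timestamp(prepare_ts, last_assigned):
    """Pick s >= every prepare timestamp, > TT.now().latest at
    commit-message arrival, and > any timestamp this leader has
    assigned before (the tiny increment enforces strictness)."""
    _, latest = tt_now()
    return max(max(prepare_ts), latest, last_assigned) + 1e-9

def commit_wait(s):
    """Block until TT.after(s): once the whole uncertainty interval
    has passed s, s is guaranteed to be in the past everywhere."""
    while True:
        earliest, _ = tt_now()
        if earliest > s:
            return
        time.sleep(max(s - earliest, 1e-4))
```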
Before allowing any coordinator replica to apply the commit
record, the coordinator leader waits until TT.after(s), so as to
obey the commit-wait rule described in Section 4.1.2. Because
the coordinator leader chose s based on TT.now().latest, and
now waits until that timestamp is guaranteed to be in the past,
the expected wait is at least 2 * ε̄. This wait is typically overlapped
with Paxos communication. After commit wait, the coordinator
sends the commit timestamp to the client and all other
participant leaders. Each participant leader logs the
transaction’s outcome through Paxos. All participants apply at
the same timestamp and then release locks.

Read-Only Transactions
Assigning a timestamp requires a negotiation phase between all
of the Paxos groups that are involved in the reads. As a result,
Spanner requires a scope expression for every read-only
transaction, which is an expression that summarizes the keys
that will be read by the entire transaction. Spanner
automatically infers the scope for standalone queries.
If the scope’s values are served by a single Paxos group, then
the client issues the read-only transaction to that group’s leader.
(The current Spanner implementation only chooses a timestamp
for a read-only transaction at a Paxos leader.) That leader
assigns sread and executes the read. For a single-site read,
Spanner generally does better than TT.now().latest. Define
LastTS() to be the timestamp of the last committed write at a
Paxos group. If there are no prepared transactions, the
assignment sread = LastTS() trivially satisfies external
consistency: the transaction will see the result of the last write,
and therefore be ordered after it.
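The single-site choice can be sketched as (a hypothetical helper; `last_ts` plays the role of LastTS()):

```python
def single_site_s_read(last_ts, prepared_ts, tt_now_latest):
    """With no prepared transactions, reading at the last committed
    write's timestamp is externally consistent and never blocks:
    the transaction sees that write and is ordered after it.
    Otherwise fall back to the conservative TT.now().latest."""
    if not prepared_ts:
        return last_ts
    return tt_now_latest
```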
If the scope’s values are served by multiple Paxos groups, there
are several options. The most complicated option is to do a
round of communication with all of the groups’ leaders to
negotiate sread based on LastTS(). Spanner currently
implements a simpler choice. The client avoids a negotiation
round, and just has its reads execute at sread = TT.now().latest
(which may wait for safe time to advance). All reads in the
transaction can be sent to replicas that are sufficiently
up-to-date.

Schema-Change Transactions
TrueTime enables Spanner to support atomic schema changes. It
would be infeasible to use a standard transaction, because the
number of participants (the number of groups in a database)
could be in the millions. Bigtable supports atomic schema
changes in one datacenter, but its schema changes block all
operations.
A Spanner schema-change transaction is a generally non-
blocking variant of a standard transaction. First, it is explicitly
assigned a timestamp in the future, which is registered in the
prepare phase. As a result, schema changes across thousands of
servers can complete with minimal disruption to other
concurrent activity. Second, reads and writes, which implicitly
depend on the schema, synchronize with any registered schema-
change timestamp at time t: they may proceed if their
timestamps precede t, but they must block behind the
schema-change transaction if their timestamps are after t.
Without TrueTime, defining the schema change to happen at t
would be meaningless.
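The synchronization rule for reads and writes against a registered schema-change timestamp can be sketched as:

```python
def may_proceed(op_ts, schema_change_ts):
    """An operation whose timestamp precedes the registered
    schema-change timestamp t proceeds under the old schema;
    an operation at or after t must block behind the schema
    change (sketch, not Spanner's code)."""
    if schema_change_ts is None:
        return True  # no schema change registered
    return op_ts < schema_change_ts
```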
                latency (ms)                        throughput (Kops/sec)
replicas   write      read-only    snapshot    write     read-only    snapshot
                      transaction  read                  transaction  read
1D         9.4±.6     —            —           4.0±.3    —            —
1          14.4±1.0   1.4±.1       1.3±.1      4.1±.05   10.9±.4      13.5±.1
3          13.9±.6    1.3±.1       1.2±.1      2.2±.5    13.8±3.2     38.5±.3
5          14.4±.4    1.4±.05      1.3±.04     2.8±.3    25.3±5.2     50.0±1.1

Table 3: Operation microbenchmarks. Mean and standard
deviation over 10 runs. 1D means one replica with commit wait
disabled.

Refinements
tTM_safe as defined above has a weakness, in that a single
prepared transaction prevents tsafe from advancing. As a result,
no reads can occur at later timestamps, even if the reads do not
conflict with the transaction. Such false conflicts can be
removed by augmenting tTM_safe with a fine-grained mapping
from key ranges to prepared-transaction timestamps. This
information can be stored in the lock table, which already maps
key ranges to lock metadata. When a read arrives, it only needs
to be checked against the fine-grained safe time for key ranges
with which the read conflicts.
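The fine-grained check can be sketched as follows (illustrative; key ranges are inclusive (lo, hi) pairs, as the lock table might store them):

```python
def fine_grained_t_safe(read_ranges, prepared, t_paxos_safe):
    """Only prepared transactions whose key ranges overlap the read
    constrain its safe time; non-conflicting prepares are ignored.

    `prepared` is a list of ((lo, hi), prepare_ts) entries."""
    def overlaps(a, b):
        return a[0] <= b[1] and b[0] <= a[1]

    t_tm = float("inf")
    for prep_range, prep_ts in prepared:
        if any(overlaps(r, prep_range) for r in read_ranges):
            t_tm = min(t_tm, prep_ts - 1)
    return min(t_paxos_safe, t_tm)
```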
LastTS() as defined above has a similar weakness: if a
transaction has just committed, a non-conflicting read-only
transaction must still be assigned sread so as to follow that
transaction. As a result, the execution of the read could be
delayed. This weakness can be remedied similarly by
augmenting LastTS() with a fine-grained mapping from key
ranges to commit timestamps in the lock table. (We have not yet
implemented this optimization.) When a read-only transaction
arrives, its timestamp can be assigned by taking the maximum
value of LastTS() for the key ranges with which the transaction
conflicts, unless there is a conflicting prepared transaction
(which can be determined from fine-grained safe time).
tPaxos_safe as defined above has a weakness in that it cannot
advance in the absence of Paxos writes. That is, a snapshot read
at t cannot execute at Paxos groups whose last write happened
before t. Spanner addresses this problem by taking advantage of
the disjointness of leader-lease intervals. Each Paxos leader
advances tPaxos_safe by keeping a threshold above which future
writes’ timestamps will occur: it maintains a mapping
MinNextTS(n) from Paxos sequence number n to the minimum
timestamp that may be assigned to Paxos sequence number n +
1. A replica can advance to MinNextTS(n) − 1 when it has
applied through n.
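The MinNextTS() mechanism can be sketched from a replica's side (a hypothetical structure; a real implementation would piggyback the promises on Paxos writes):

```python
class ReplicaSafeTime:
    """Advances tPaxos_safe past the last applied write using the
    leader's MinNextTS(n) promises."""

    def __init__(self):
        self.min_next_ts = {}     # Paxos seq n -> min timestamp for n+1
        self.applied_through = -1
        self.last_write_ts = 0

    def on_apply(self, n, write_ts, promise_for_next):
        # Apply Paxos write n, carrying the leader's promise that
        # write n+1 will get a timestamp >= promise_for_next.
        self.applied_through = n
        self.last_write_ts = write_ts
        self.min_next_ts[n] = promise_for_next

    def t_paxos_safe(self):
        promise = self.min_next_ts.get(self.applied_through)
        if promise is not None:
            # Even while the group is idle, reads strictly below the
            # promise are safe: the replica can advance to promise - 1.
            return max(self.last_write_ts, promise - 1)
        return self.last_write_ts
```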
A single leader can enforce its MinNextTS() promises easily.
Because the timestamps promised by MinNextTS() lie within a
leader’s lease, the disjointness invariant enforces MinNextTS()
promises across leaders. If a leader wishes to advance
MinNextTS() beyond the end of its leader lease, it must first
extend its lease. Note that smax is always advanced to the
highest value in MinNextTS() to preserve disjointness.
A leader by default advances MinNextTS() values every 8
seconds. Thus, in the absence of prepared transactions, healthy
slaves in an idle Paxos group can serve reads at timestamps
greater than 8 seconds old in the worst case. A leader may also
advance MinNextTS() values on demand from slaves.

Evaluation
We first measure Spanner’s performance with respect to
replication, transactions, and availability. We then provide
some data on TrueTime behavior, and a case study of our first
client, F1.
Microbenchmarks
Table 3 presents some microbenchmarks for Spanner. These
measurements were taken on timeshared machines: each
spanserver ran on scheduling units of 4GB RAM and 4 cores
(AMD Barcelona 2200MHz). Clients were run on separate
machines. Each zone contained one spanserver. Clients and
zones were placed in a set of datacenters with network distance
of less than 1ms. (Such a layout should be commonplace: most
applications do not need to distribute all of their data
worldwide.) The test database was created with 50 Paxos groups
with 2500 directories. Operations were standalone reads and
writes of 4KB. All reads were served out of memory after a
compaction, so that we are only measuring the overhead of
Spanner’s call stack. In addition, one unmeasured round of
reads was done first to warm any location caches.
For the latency experiments, clients issued sufficiently few
operations so as to avoid queuing at the servers. From the 1-
replica experiments, commit wait is about 5ms, and Paxos
latency is about 9ms. As the number of replicas increases, the
latency stays roughly constant with less standard deviation
because Paxos executes in parallel at a group’s replicas. As the
number of replicas increases, the latency to achieve a quorum
becomes less sensitive to slowness at one slave replica.
For the throughput experiments, clients issued sufficiently many
operations so as to saturate the servers’ CPUs.

participants   mean latency (ms)   99th-percentile latency (ms)
1              17.0 ±1.4           75.0 ±34.9
2              24.5 ±2.5           87.6 ±35.9
5              31.5 ±6.2           104.5 ±52.2
10             30.0 ±3.7           95.6 ±25.4
25             35.5 ±5.6           100.4 ±42.7
50             42.7 ±4.1           93.7 ±22.9
100            71.4 ±7.6           131.2 ±17.6
200            150.5 ±11.0         320.3 ±35.1

Table 4: Two-phase commit scalability. Mean and standard
deviations over 10 runs.

Snapshot reads can execute at any up-to-date replicas, so
their throughput increases almost linearly with the number of
replicas. Single-read read-only transactions only execute at
leaders because timestamp assignment must happen at leaders.
Read-only-transaction throughput increases with the number of
replicas because the number of effective spanservers increases:
in the experimental setup, the number of spanservers equaled
the number of replicas, and leaders were randomly distributed
among the zones. Write throughput benefits from the same
experimental artifact (which explains the increase in throughput
from 3 to 5 replicas), but that benefit is outweighed by the
linear increase in the amount of work performed per write, as
the number of replicas increases.
Table 4 demonstrates that two-phase commit can scale to a
reasonable number of participants: it summarizes a set of
experiments run across 3 zones, each with 25 spanservers.
Scaling up to 50 participants is reasonable in both mean and
99th-percentile, and latencies start to rise noticeably at 100
participants.
Availability
Figure 5 illustrates the availability benefits of running Spanner
in multiple datacenters. It shows the results of three
experiments on throughput in the presence of datacenter failure,
all of which are overlaid onto the same time scale. The test
universe consisted of 5 zones Zi, each of which had 25
spanservers. The test database was sharded into 1250 Paxos
groups, and 100 test clients constantly issued non-snapshot
reads at an aggregate rate of 50K reads/second. All of the
leaders were explicitly placed in Z1. Five seconds into each
test, all of the servers in one zone were killed: non-leader kills
Z2; leader-hard kills Z1; leader-soft kills Z1, but it gives
notifications to all of the servers that they should handoff
leadership first.
Figure 5: Effect of killing servers on throughput.

Killing Z2 has no effect on read throughput. Killing Z1 while
giving the leaders time to handoff leadership to a different
zone has a minor effect: the throughput drop is not visible in
the graph, but is around 3-4%. On the other hand,
killing Z1 with no warning has a severe effect: the rate of
completion drops almost to 0. As leaders get re-elected, though,
the throughput of the system rises to approximately 100K
reads/second because of two artifacts of our experiment: there is
extra capacity in the system, and operations are queued while
the leader is unavailable. As a result, the throughput of the
system rises before leveling off again at its steady-state rate.
We can also see the effect of the fact that Paxos leader leases
are set to 10 seconds. When we kill the zone, the leader-lease
expiration times for the groups should be evenly distributed
over the next 10 seconds. Soon after each lease from a dead
leader expires, a new leader is elected. Approximately 10
seconds after the kill time, all of the groups have leaders and
throughput has recovered. Shorter lease times would reduce the
effect of server deaths on availability, but would require greater
amounts of lease-renewal network traffic. We are in the process
of designing and implementing a mechanism that will cause
slaves to release Paxos leader leases upon leader failure.
TrueTime
Two questions must be answered with respect to TrueTime: is ε
truly a bound on clock uncertainty, and how bad does ε get? For
the former, the most serious problem would be if a local clock’s
drift were greater than 200us/sec: that would break assumptions
made by TrueTime. Our machine statistics show that bad CPUs
are 6 times more likely than bad clocks. That is, clock issues
are extremely infrequent, relative to much more serious
hardware problems. As a result, we believe that TrueTime’s
implementation is as trustworthy as any other piece of software
upon which Spanner depends.
Figure 6 presents TrueTime data taken at several thousand
spanserver machines across datacenters up to 2200 km apart. It
plots the 90th, 99th, and 99.9th percentiles of ε, sampled at
timeslave daemons immediately after polling the time masters.
This sampling elides the sawtooth in ε due to local-clock
uncertainty, and therefore measures time-master uncertainty
(which is generally 0) plus communication delay to the time
masters.

Figure 6: Distribution of TrueTime ε values, sampled right after
the timeslave daemon polls the time masters. 90th, 99th, and
99.9th percentiles are graphed.
The data shows that these two factors in determining the base
value of ε are generally not a problem. However, there can be
significant tail-latency issues that cause higher values of ε. The
reduction in tail latencies beginning on March 30 was due to
networking improvements that reduced transient network-link
congestion. The increase in ε on April 13, approximately one hour
in duration, resulted from the shutdown of 2 time masters at a
datacenter for routine maintenance. We continue to investigate
and remove causes of TrueTime spikes.
F1
Spanner started being experimentally evaluated under
production workloads in early 2011, as part of a rewrite of
Google’s advertising backend called F1 [35]. This backend was
originally based on a MySQL database that was manually
sharded many ways. The uncompressed dataset is tens of
terabytes, which is small compared to many NoSQL instances,
but was large enough to cause difficulties with sharded MySQL.
The MySQL sharding scheme assigned each customer and all
related data to a fixed shard. This layout enabled the use of
indexes and complex query processing on a per-customer basis,
but required some knowledge of the sharding in application
business logic. Resharding this revenue-critical database as it
grew in the number of customers and their data was extremely
costly. The last resharding took over two years of intense effort,
and involved coordination and testing across dozens of teams to
minimize risk. This operation was too complex to do regularly:
as a result, the team had to limit growth on the MySQL database
by storing some data in external Bigtables, which compromised
transactional behavior and the ability to query across all data.

# fragments   # directories
1             >100M
2–4           341
5–9           5336
10–14         232
15–99         34
100–500       7

Table 5: Distribution of directory-fragment counts in F1.
The F1 team chose to use Spanner for several reasons. First,
Spanner removes the need to manually reshard. Second, Spanner
provides synchronous replication and automatic failover. With
MySQL master-slave replication, failover was difficult, and
risked data loss and downtime. Third, F1 requires strong
transactional semantics, which made using other NoSQL
systems impractical. Application semantics requires
transactions across arbitrary data, and consistent reads. The F1
team also needed secondary indexes on their data (since
Spanner does not yet provide automatic support for secondary
indexes), and was able to implement their own consistent global
indexes using Spanner transactions.
All application writes are now by default sent through F1 to
Spanner, instead of the MySQL-based application stack. F1 has
2 replicas on the west coast of the US, and 3 on the east coast.
This choice of replica sites was made to cope with outages due
to potential major natural disasters, and also the choice of their
frontend sites. Anecdotally, Spanner’s automatic failover has
been nearly invisible to them. Although there have been
unplanned cluster failures in the last few months, the most that
the F1 team has had to do is update their database’s schema to
tell Spanner where to preferentially place Paxos leaders, so as
to keep them close to where their frontends moved.
Spanner’s timestamp semantics made it efficient for F1 to
maintain in-memory data structures computed from the database
state. F1 maintains a logical history log of all changes, which is
written into Spanner itself as part of every transaction. F1 takes
full snapshots of data at a timestamp to initialize its data
structures, and then reads incremental changes to update them.
Table 5 illustrates the distribution of the number of fragments
per directory in F1. Each directory typically corresponds to a
customer in the application stack above F1. The vast majority of
directories (and therefore customers) consist of only 1 fragment,
which means that reads and writes to those customers’ data are
guaranteed to occur on only a single server. The directories with
more than 100 fragments are all tables that contain F1
secondary indexes: writes to more than a few fragments of such
tables are extremely uncommon. The F1 team has only seen such
behavior when they do untuned bulk data loads as transactions.

operation            mean latency (ms)   std dev   count
all reads            8.7                 376.4     21.5B
single-site commit   72.3                112.8     31.2M
multi-site commit    103.0               52.2      32.1M

Table 6: F1-perceived operation latencies measured over the
course of 24 hours.
Table 6 presents Spanner operation latencies as measured from
F1 servers. Replicas in the east-coast data centers are given
higher priority in choosing Paxos leaders. The data in the table
is measured from F1 servers in those data centers. The large
standard deviation in write latencies is caused by a pretty fat
tail due to lock conflicts. The even larger standard deviation in
read latencies is partially due to the fact that Paxos leaders are
spread across two data centers, only one of which has machines
with SSDs. In addition, the measurement includes every read in
the system from two datacenters: the mean and standard
deviation of the bytes read were roughly 1.6KB and 119KB,
respectively.

Related Work
Consistent replication across datacenters as a storage service
has been provided by Megastore [5] and DynamoDB [3].
DynamoDB presents a key-value interface, and only replicates
within a region. Spanner follows Megastore in providing a semi-
relational data model, and even a similar schema language.
Megastore does not achieve high performance. It is layered on
top of Bigtable, which imposes high communication costs. It
also does not support long-lived leaders: multiple replicas may
initiate writes. All writes from different replicas necessarily
conflict in the Paxos protocol, even if they do not logically
conflict: throughput collapses on a Paxos group at several
writes per second. Spanner provides higher performance,
general-purpose transactions, and external consistency.
Pavlo et al. [31] have compared the performance of databases
and MapReduce [12]. They point to several other efforts that
have been made to explore database functionality layered on
distributed key-value stores [1, 4, 7, 41] as evidence that the
two worlds are converging. We agree with the conclusion, but
demonstrate that integrating multiple layers has its advantages:
integrating concurrency control with replication reduces the
cost of commit wait in Spanner, for example.
The notion of layering transactions on top of a replicated store
dates at least as far back as Gifford’s dissertation [16]. Scatter
[17] is a recent DHT-based key-value store that layers
transactions on top of consistent replication. Spanner focuses on
providing a higher-level interface than Scatter does. Gray and
Lamport [18] describe a non-blocking commit protocol based on
Paxos. Their protocol incurs more messaging costs than
two-phase commit, which would aggravate the cost of commit
over widely distributed groups. Walter [36] provides a variant
of snapshot isolation that works within, but not across
datacenters. In contrast, our read-only transactions provide a
more natural semantics, because we support external
consistency over all operations.
There has been a spate of recent work on reducing or
eliminating locking overheads. Calvin [40] eliminates
concurrency control: it pre-assigns timestamps and then
executes the transactions in timestamp order. HStore [39] and
Granola [11] each supported their own classification of
transaction types, some of which could avoid locking. None of
these systems provides external consistency. Spanner addresses
the contention issue by providing support for snapshot isolation.
VoltDB [42] is a sharded in-memory database that supports
master-slave replication over the wide area for disaster
recovery, but not more general replication configurations. It is
an example of what has been called NewSQL, which is a
marketplace push to support scalable SQL [38]. A number of
commercial databases implement reads in the past, such as
MarkLogic [26] and Oracle’s Total Recall [30]. Lomet and Li
[24] describe an implementation strategy for such a temporal
database.
Farsite derived bounds on clock uncertainty (much looser than
TrueTime’s) relative to a trusted clock reference [13]: server
leases in Farsite were maintained in the same way that Spanner
maintains Paxos leases. Loosely synchronized clocks have been
used for concurrency-control purposes in prior work [2, 23]. We
have shown that TrueTime lets one reason about global time
across sets of Paxos state machines.

Future Work
We have spent most of the last year working with the F1 team to
transition Google’s advertising backend from MySQL to
Spanner. We are actively improving its monitoring and support
tools, as well as tuning its performance. In addition, we have
been working on improving the functionality and performance
of our backup/restore system. We are currently implementing
the Spanner schema language, automatic maintenance of
secondary indices, and automatic load-based resharding. Longer
term, there are a couple of features that we plan to investigate.
Optimistically doing reads in parallel may be a valuable
strategy to pursue, but initial experiments have indicated that
the right implementation is non-trivial. In addition, we plan to
eventually support direct changes of Paxos configurations [22,
34].
Given that we expect many applications to replicate their data
across datacenters that are relatively close to each other,
TrueTime may noticeably affect performance. We see no
insurmountable obstacle to reducing ε below 1ms. Time-master-
query intervals can be reduced, and better clock crystals are
relatively cheap. Time-master query latency could be reduced
with improved networking technology, or possibly even avoided
through alternate time-distribution technology.
Finally, there are obvious areas for improvement. Although
Spanner is scalable in the number of nodes, the node-local data
structures have relatively poor performance on complex SQL
queries, because they were designed for simple key-value
accesses. Algorithms and data structures from DB literature
could improve single-node performance a great deal. Second,
moving data automatically between datacenters in response to
changes in client load has long been a goal of ours, but to make
that goal effective, we would also need the ability to move
client-application processes between datacenters in an
automated, coordinated fashion. Moving processes raises the
even more difficult problem of managing resource acquisition
and allocation between datacenters.

Conclusions
To summarize, Spanner combines and extends ideas from
two research communities: from the database community, a
familiar, easy-to-use, semi-relational interface, transactions,
and an SQL-based query language; from the systems
community, scalability, automatic sharding, fault tolerance,
consistent replication, external consistency, and wide-area
distribution. Since Spanner’s inception, we have taken more
than 5 years to iterate to the current design and implementation.
Part of this long iteration phase was due to a slow realization
that Spanner should do more than tackle the problem of a
globally-replicated namespace, and should also focus on
database features that Bigtable was missing.
One aspect of our design stands out: the linchpin of Spanner’s
feature set is TrueTime. We have shown that reifying clock
uncertainty in the time API makes it possible to build
distributed systems with much stronger time semantics. In
addition, as the underlying system enforces tighter bounds on
clock uncertainty, the overhead of the stronger semantics
decreases. As a community, we should no longer depend on
loosely synchronized clocks and weak time APIs in designing
distributed algorithms.

Acknowledgements
Many people have helped to improve this paper: our shepherd
Jon Howell, who went above and beyond his responsibilities;
the anonymous referees; and many Googlers: Atul Adya, Fay
Chang, Frank Dabek, Sean
Dorward, Bob Gruber, David Held, Nick Kline, Alex Thomson,
and Joel Wein. Our management has been very supportive of
both our work and of publishing this paper: Aristotle Balogh,
Bill Coughran, Urs Hölzle, Doron Meyer, Cos Nicolaou, Kathy Polizzi, Sridhar Ramaswamy, and Shivakumar Venkataraman.
We have built upon the work of the Bigtable and Megastore
teams. The F1 team, and Jeff Shute in particular, worked closely
with us in developing our data model and helped immensely in
tracking down performance and correctness bugs. The Platforms
team, and Luiz Barroso and Bob Felderman in particular, helped
to make TrueTime happen. Finally, a lot of Googlers used to be
on our team: Ken Ashcraft, Paul Cychosz, Krzysztof Ostrowski,
Amir Voskoboynik, Matthew Weaver, Theo Vassilakis, and Eric
Veach; or have joined our team recently: Nathan Bales, Adam
Beberg, Vadim Borisov, Ken Chen, Brian Cooper, Cian
Cullinan, Robert-Jan Huijsman, Milind Joshi, Andrey Khorlin,
Dawid Kuroczko, Laramie Leavitt, Eric Li, Mike Mammarella,
Sunil Mushran, Simon Nielsen, Ovidiu Platon, Ananth
Shrinivas, Vadim Suvorov, and Marcel van der Holst.

References
[1] Azza Abouzeid et al. “HadoopDB: an architectural hybrid of
MapReduce and DBMS technologies for analytical workloads”.
Proc. of VLDB. 2009, pp. 922–933.
[2] A. Adya et al. “Efficient optimistic concurrency control using loosely synchronized clocks”. Proc. of SIGMOD. 1995, pp. 23–34.
[3] Amazon. Amazon DynamoDB. 2012.
[4] Michael Armbrust et al. “PIQL: Success-Tolerant Query
Processing in the Cloud”. Proc. of VLDB. 2011, pp. 181–192.
[5] Jason Baker et al. “Megastore: Providing Scalable, Highly
Available Storage for Interactive Services”. Proc. of CIDR.
2011, pp. 223–234.
[6] Hal Berenson et al. “A critique of ANSI SQL isolation
levels”. Proc. of SIGMOD. 1995, pp. 1–10.
[7] Matthias Brantner et al. “Building a database on S3”. Proc.
of SIGMOD. 2008, pp. 251–264.
[8] A. Chan and R. Gray. “Implementing Distributed Read-Only
Transactions”. IEEE TOSE SE-11.2 (Feb. 1985), pp. 205–212.
[9] Fay Chang et al. “Bigtable: A Distributed Storage System
for Structured Data”. ACM TOCS 26.2 (June 2008), 4:1–4:26.
[10] Brian F. Cooper et al. “PNUTS: Yahoo!’s hosted data
serving platform”. Proc. of VLDB. 2008, pp. 1277–1288.
[11] James Cowling and Barbara Liskov. “Granola: Low-Overhead Distributed Transaction Coordination”. Proc. of USENIX ATC. 2012, pp. 223–236.
[12] Jeffrey Dean and Sanjay Ghemawat. “MapReduce: a flexible data processing tool”. CACM 53.1 (Jan. 2010), pp. 72–77.
[13] John Douceur and Jon Howell. Scalable Byzantine-Fault-Quantifying Clock Synchronization. Tech. rep. MSR-TR-2003-67. MS Research, 2003.
[14] John R. Douceur and Jon Howell. “Distributed directory
service in the Farsite file system”. Proc. of OSDI. 2006, pp.
321–334.
[15] Sanjay Ghemawat, Howard Gobioff, and Shun-Tak Leung.
“The Google file system”. Proc. of SOSP. Dec. 2003, pp. 29–43.
[16] David K. Gifford. Information Storage in a Decentralized
Computer System. Tech. rep. CSL-81-8. PhD dissertation.
Xerox PARC, July 1982.
[17] Lisa Glendenning et al. “Scalable consistency in Scatter”.
Proc. of SOSP. 2011.
[18] Jim Gray and Leslie Lamport. “Consensus on transaction
commit”. ACM TODS 31.1 (Mar. 2006), pp. 133–160.
[19] Pat Helland. “Life beyond Distributed Transactions: an
Apostate’s Opinion”. Proc. of CIDR. 2007, pp. 132–141.
[20] Maurice P. Herlihy and Jeannette M. Wing. “Linearizability: a correctness condition for concurrent objects”. ACM TOPLAS 12.3 (July 1990), pp. 463–492.
[21] Leslie Lamport. “The part-time parliament”. ACM TOCS
16.2 (May 1998), pp. 133–169.
[22] Leslie Lamport, Dahlia Malkhi, and Lidong Zhou. “Reconfiguring a state machine”. SIGACT News 41.1 (Mar. 2010), pp. 63–73.
[23] Barbara Liskov. “Practical uses of synchronized clocks in
distributed systems”. Distrib. Comput. 6.4 (July 1993), pp. 211–
219.
[24] David B. Lomet and Feifei Li. “Improving Transaction-Time DBMS Performance and Functionality”. Proc. of ICDE (2009), pp. 581–591.
[25] Jacob R. Lorch et al. “The SMART way to migrate
replicated stateful services”. Proc. of EuroSys. 2006, pp. 103–
115.
[26] MarkLogic. MarkLogic 5 Product Documentation. 2012.
[27] Keith Marzullo and Susan Owicki. “Maintaining the time in a distributed system”. Proc. of PODC. 1983, pp. 295–305.
[28] Sergey Melnik et al. “Dremel: Interactive Analysis of Web-Scale Datasets”. Proc. of VLDB. 2010, pp. 330–339.
[29] D.L. Mills. Time synchronization in DCNET hosts. Internet
Project Report IEN–173. COMSAT Laboratories, Feb. 1981.
[30] Oracle. Oracle Total Recall. 2012.
[31] Andrew Pavlo et al. “A comparison of approaches to large-
scale data analysis”. Proc. of SIGMOD. 2009, pp. 165–178.
[32] Daniel Peng and Frank Dabek. “Large-scale incremental
processing using distributed transactions and notifications”.
Proc. of OSDI. 2010, pp. 1–15.
[33] Daniel J. Rosenkrantz, Richard E. Stearns, and Philip M.
Lewis II. “System level concurrency control for distributed
database systems”. ACM TODS 3.2 (June 1978), pp. 178–198.
[34] Alexander Shraer et al. “Dynamic Reconfiguration of Primary/Backup Clusters”. Proc. of USENIX ATC. 2012, pp. 425–438.
[35] Jeff Shute et al. “F1 — The Fault-Tolerant Distributed
RDBMS Supporting Google’s Ad Business”. Proc. of SIGMOD.
May 2012, pp. 777–778.
[36] Yair Sovran et al. “Transactional storage for geo-replicated
systems”. Proc. of SOSP. 2011, pp. 385–400.
[37] Michael Stonebraker. Why Enterprises Are Uninterested in
NoSQL. 2010.
[38] Michael Stonebraker. Six SQL Urban Myths. 2010.
[39] Michael Stonebraker et al. “The end of an architectural era: (it’s time for a complete rewrite)”. Proc. of VLDB. 2007, pp. 1150–1160.
[40] Alexander Thomson et al. “Calvin: Fast Distributed Transactions for Partitioned Database Systems”. Proc. of SIGMOD. 2012, pp. 1–12.
[41] Ashish Thusoo et al. “Hive — A Petabyte Scale Data
Warehouse Using Hadoop”. Proc. of ICDE. 2010, pp. 996–1005.
[42] VoltDB. VoltDB Resources. 2012.

A Paxos Leader-Lease Management
The simplest means to ensure the disjointness of Paxos-leader-lease intervals would be for a leader to issue a synchronous Paxos write of the lease interval whenever it would be extended. A subsequent leader would read the interval and wait until that interval has passed.
TrueTime can be used to ensure disjointness without these extra log writes. The potential ith leader keeps a lower bound on the start of a lease vote from replica r as v^{leader}_{i,r} = TT.now().earliest, computed before e^{send}_{i,r} (defined as when the lease request is sent by the leader). Each replica r grants a lease at event e^{grant}_{i,r}, which happens after e^{receive}_{i,r} (when the replica receives a lease request); the lease ends at t^{end}_{i,r} = TT.now().latest + 10, computed after e^{receive}_{i,r}. A replica r obeys the single-vote rule: it will not grant another lease vote until TT.after(t^{end}_{i,r}) is true. To enforce this rule across different incarnations of r, Spanner logs a lease vote at the granting replica before granting the lease; this log write can be piggybacked upon existing Paxos-protocol log writes.
When the ith leader receives a quorum of votes (event e^{quorum}_i), it computes its lease interval as lease_i = [TT.now().latest, min_r(v^{leader}_{i,r}) + 10]. The lease is deemed to have expired at the leader when TT.before(min_r(v^{leader}_{i,r}) + 10) is false. To prove disjointness, we make use of the fact that the ith and (i + 1)th leaders must have one replica in common in their quorums. Call that replica r0. Proof:

lease_i.end = min_r(v^{leader}_{i,r}) + 10          (by definition)
min_r(v^{leader}_{i,r}) + 10 ≤ v^{leader}_{i,r0} + 10   (min)
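The lease protocol above can be sketched in Python. This is a minimal single-process model, not Spanner's implementation: TT stands in for the TrueTime API with an assumed fixed uncertainty bound, the names Replica, request_lease, and LEASE_SECONDS are illustrative, and a real deployment would log each vote through Paxos rather than keep it in memory.

```python
import time

LEASE_SECONDS = 10.0  # lease length from the paper

class TT:
    """Hypothetical TrueTime stand-in: now() returns an interval
    [earliest, latest] assumed to contain absolute time."""
    EPSILON = 0.004  # assumed instantaneous uncertainty (4 ms)

    @staticmethod
    def now():
        t = time.monotonic()
        return (t - TT.EPSILON, t + TT.EPSILON)

    @staticmethod
    def after(t):
        # True only once t has definitely passed.
        return TT.now()[0] > t

class Replica:
    """Obeys the single-vote rule: no new lease vote until the
    previously granted lease interval has definitely ended."""
    def __init__(self):
        self.t_end = float("-inf")  # end of last granted lease

    def grant_vote(self):
        if not TT.after(self.t_end):
            return False  # earlier vote may still be live
        # t^end_{i,r} is computed after receiving the request.
        self.t_end = TT.now()[1] + LEASE_SECONDS
        return True

def request_lease(replicas, quorum):
    """Leader side: collect votes, then compute
    lease_i = [TT.now().latest, min_r(v^leader_{i,r}) + 10]."""
    v_leader = []
    for r in replicas:
        v = TT.now()[0]  # lower bound taken before sending the request
        if r.grant_vote():
            v_leader.append(v)
    if len(v_leader) < quorum:
        return None
    return (TT.now()[1], min(v_leader) + LEASE_SECONDS)
```

Because every granting replica refuses a second vote until its logged t^end has definitely passed, two leaders whose quorums share a replica cannot hold overlapping intervals, which is exactly the disjointness property the proof establishes.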