Logging Last Resource Optimization for Distributed Transactions in Oracle WebLogic Server describes optimizing distributed transactions by enlisting one non-XA resource as the "logging last resource" (LLR), which lets the transaction manager skip the XA protocol calls for that resource and thereby reduce latency. The transaction manager writes its commit record to a table on the LLR itself, so the LLR table together with the local log forms the transaction log. If the LLR commit succeeds, the remaining resources are committed; if it fails, the transaction is rolled back globally. This preserves the ACID guarantees of standard two-phase commit at lower overhead.
Logging Last Resource Optimization for Distributed Transactions in Oracle WebLogic Server
1. Logging Last Resource Optimization
for Distributed Transactions in
Oracle WebLogic Server
T. Barnes, A. Messinger, P. Parkinson, A. Ganesh,
G. Shegalov, S. Narayan, S. Kareenhalli
2. OLTP: Online Transaction Processing
Transaction is an ACID contract
● Atomic – all or nothing
● Consistent – from the application perspective
● Isolated – concurrency is masked through locking or snapshots
● Durable – once committed changes survive subsequent failures
Checking = 2000      begin               Checking = 1000
Savings  = 8000      c -= 1000           Savings  = 9000
                     s += 1000
                     commit
------------------------- time ------------------------->
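To make the contract concrete, here is a minimal single-resource JDBC sketch of the transfer above; the connection URL, credentials, and the Accounts schema are illustrative assumptions:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;

public class Transfer {
    public static void main(String[] args) throws Exception {
        // Hypothetical connection; URL and credentials are placeholders.
        try (Connection con = DriverManager.getConnection(
                "jdbc:oracle:thin:@//db:1521/orcl", "scott", "tiger")) {
            con.setAutoCommit(false);                       // begin
            try {
                try (PreparedStatement ps = con.prepareStatement(
                        "UPDATE Accounts SET balance = balance + ? WHERE Number = ?")) {
                    ps.setInt(1, -1000); ps.setInt(2, 1);   // c -= 1000
                    ps.executeUpdate();
                    ps.setInt(1, 1000); ps.setInt(2, 2);    // s += 1000
                    ps.executeUpdate();
                }
                con.commit();   // D: once this returns, both updates survive failures
            } catch (Exception e) {
                con.rollback(); // A: on any error, neither update is applied
                throw e;
            }
        }
    }
}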
3. OLTP: Single Resource
● A and D are typically implemented using Write-Ahead Logging
● Transaction recovery is “simple”: REDO phase, UNDO phase.
BEGIN TRANSACTION
/* LSN = 1: log undo and redo records in the main-memory buffer */
UPDATE Accounts SET balance = balance - 1000 WHERE Number = 1
/* LSN = 2: log undo and redo records in the main-memory buffer */
UPDATE Accounts SET balance = balance + 1000 WHERE Number = 2
/* LSN = 3: log commit and force to disk (5-6 orders of magnitude slower) */
COMMIT TRANSACTION
Accounts table state over time:

Number | balance @ LSN=0 | balance @ LSN=1 | balance @ LSN=2
-------+-----------------+-----------------+----------------
   1   |      2000       |      1000       |      1000
   2   |      8000       |      8000       |      9000
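The REDO/UNDO recovery idea sketched in Java; LogRecord and its Runnable actions are hypothetical stand-ins (real recovery managers, e.g. ARIES-style, are considerably more involved):

import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical log record: an update carries redo/undo actions; a commit
// record carries only its transaction id.
record LogRecord(long lsn, long txId, boolean isCommit, Runnable redo, Runnable undo) {}

class Recovery {
    static void recover(List<LogRecord> log) {  // log is in LSN order
        // REDO phase: repeat history by replaying every logged update,
        // and remember which transactions reached their commit record.
        Set<Long> committed = new HashSet<>();
        for (LogRecord r : log) {
            if (r.isCommit()) committed.add(r.txId());
            else r.redo().run();
        }
        // UNDO phase: walk the log backwards and roll back every update
        // belonging to a transaction that never committed.
        for (int i = log.size() - 1; i >= 0; i--) {
            LogRecord r = log.get(i);
            if (!r.isCommit() && !committed.contains(r.txId())) r.undo().run();
        }
    }
}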
4. OLTP: Distributed / Two-Phase Commit
Like a wedding ceremony
Coordinator: Will you ...? (prepare)
Resource: I will (OK)
Coordinator: I pronounce you … (commit)
Transaction Coordinator      Resource 1           Resource 2
prepare -->                  force-log prepare    force-log prepare
                             <-- OK               <-- OK
commit -->                   force-log commit     force-log commit
                             <-- ACK              <-- ACK
5. 2PC is A CI D
● 2PC is not about Concurrency Control.
● 2PC transaction is therefore
○ Globally Atomic
○ Locally Isolated
○ Locally Consistent
○ Globally Durable
6. OLTP: Queued Transactions
client:
  begin transaction
  req_q.enqueue(req1)
  commit transaction

app server (the work executes against the database):
  begin transaction
  creq = req_q.dequeue()
  resp = creq.execute()
  res_q.enqueue(resp)
  commit transaction

client:
  begin transaction
  resp = res_q.dequeue()
  process(resp)
  commit transaction
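A hedged sketch of the app-server step using a transacted JMS session, so the dequeue, the work, and the reply enqueue commit or roll back as one unit; the JNDI names and the execute() placeholder are assumptions:

import javax.jms.Connection;
import javax.jms.ConnectionFactory;
import javax.jms.Message;
import javax.jms.Queue;
import javax.jms.Session;
import javax.jms.TextMessage;
import javax.naming.InitialContext;

public class QueuedWorker {
    public static void handleOne() throws Exception {
        InitialContext ctx = new InitialContext();
        ConnectionFactory cf = (ConnectionFactory) ctx.lookup("jms/cf"); // assumed JNDI name
        Connection con = cf.createConnection();
        try {
            con.start();
            // Transacted session: receive and send form one atomic unit of work.
            Session session = con.createSession(true, Session.SESSION_TRANSACTED);
            Queue reqQ = (Queue) ctx.lookup("jms/req_q");  // assumed JNDI name
            Queue resQ = (Queue) ctx.lookup("jms/res_q");  // assumed JNDI name
            Message creq = session.createConsumer(reqQ).receive();       // dequeue
            TextMessage resp = session.createTextMessage(execute(creq));
            session.createProducer(resQ).send(resp);                     // enqueue
            session.commit();  // on failure/rollback the request returns to req_q
        } finally {
            con.close();
        }
    }

    // Placeholder for the real work; on the slide this also updates the database,
    // which is exactly where a distributed (XA) transaction becomes necessary.
    private static String execute(Message req) { return "response"; }
}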
12. “Real Life” XA 2PC
2n+1 writes, 8n messages
Timeline (TM on the left, each of the n resources on the right):

TM                               Resource (x n)
xa_start -->
                                 <-- ack_started
xa_end -->
                                 <-- ack_ended
xa_prepare -->
                                 force-log prepared
                                 <-- ack_prepared
all-prepared: force-log commit
xa_commit -->
                                 force-log commit
                                 <-- ack_committed
all-commit: log end
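The same ceremony expressed against the standard javax.transaction.xa API; a minimal single-threaded sketch, assuming the Xid and the enlisted resources are supplied by the TM (error paths and xa_rollback omitted):

import javax.transaction.xa.XAResource;
import javax.transaction.xa.Xid;

public class XaTwoPhaseCommit {
    static void run(Xid xid, XAResource[] resources) throws Exception {
        for (XAResource r : resources) r.start(xid, XAResource.TMNOFLAGS); // xa_start
        // ... application work against each resource goes here ...
        for (XAResource r : resources) r.end(xid, XAResource.TMSUCCESS);   // xa_end
        for (XAResource r : resources) r.prepare(xid);  // xa_prepare: each resource force-logs
        // all-prepared: the TM force-logs its own commit record before going on
        for (XAResource r : resources) r.commit(xid, false); // xa_commit (onePhase = false)
        // all-committed: the TM writes its end record lazily, without a force
    }
}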
13. Standard 2PC Optimizations
● 1PC: if only one resource enlisted, prepare skipped
● Read-Only: if voted read-only, commit skipped
● The xa_start/xa_end ceremony is still required in both cases
14. Nested 2PC: Coordinator Role Transfer
[Gray’78]
prepare:  TC --> Res2 --> Res3
commit:   TC <-- Res2 <-- Res3  (the last resource in the chain commits first)
● Last Resource is committed in one phase
● 2n messages/ 2n-1 forced writes
● Known topology: linked Databases
15. WebLogic Design Constraints and Goals
● No control over foreign XAResource, TM and topology
● Broadband: minimize blocking RPC, not messages
● Unneeded XA on Res3: save xa_start, xa_end
16. Typical WLS Deployment
● JMS and TM share the same FileStore
● Collocated JMS connection cost is negligible
● JDBC Datasource is remote: blocking RPC
● DB internal resources (locks, latches, etc.) are more
expensive and JEE is not a single client
● Outbound JMS notifies about a JDBC update
● Ideally: JDBC updates visible before JMS updates
17. JDBC as Logging Last Resource
● User enables a non-XA JDBC Datasource as LLR
○ LLR table WL_LLR_<server> in the DS schema
○ No XA overhead for the LLR
● TM log is local log UNION LLR table log
○ WLS does not boot if any LLR table is unavailable
● Restriction: 1 LLR datasource / Transaction
● No coordinator transfers as in Nested 2PC
18. XA 2PC Commit with LL Resource
1. Prepare concurrently all non-LLR XAResources
2. Insert XID into the LLR table
3. Commit the LLR-Resource
4. If 3 is successful, commit non-LLR XAResources
5. Lazy garbage-collection of 2PC records of completed
transactions is piggybacked on future LLR transactions
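Steps 1-4 sketched with plain JDBC/XA types; the LLR table name and column layout are illustrative assumptions (the real WL_LLR_<server> schema is WebLogic-internal), and error handling is elided:

import java.sql.Connection;
import java.sql.PreparedStatement;
import javax.transaction.xa.XAResource;
import javax.transaction.xa.Xid;

public class LlrCommit {
    // llr is the non-XA local JDBC connection (autoCommit disabled).
    static void commit(Xid xid, XAResource[] xaResources, Connection llr)
            throws Exception {
        for (XAResource r : xaResources) r.prepare(xid);  // 1. prepare non-LLR resources
        // 2. insert the commit record into the LLR table (assumed layout)
        try (PreparedStatement ps = llr.prepareStatement(
                "INSERT INTO WL_LLR_MYSERVER (XID, RECORD) VALUES (?, ?)")) {
            ps.setBytes(1, xid.getGlobalTransactionId());
            ps.setBytes(2, encode(xid));                  // hypothetical helper
            ps.executeUpdate();
        }
        llr.commit();  // 3. local one-phase commit: this write decides the global outcome
        for (XAResource r : xaResources) r.commit(xid, false); // 4. commit the rest
        // 5. records of completed transactions are garbage-collected lazily
        //    by later LLR transactions.
    }

    private static byte[] encode(Xid xid) { return xid.getGlobalTransactionId(); }
}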
19. LLR Failure Recovery
● Failure before LLR.commit() => global abort
● Failure during LLR.commit() => similar to media failure
○ Wait until LLR Datasource / table is available for read
○ Presence of the LLR commit log decides the global outcome
○ If it remains unavailable for AbandonTimeoutSeconds, the transaction is logged as abandoned
● JVM/OS crash: the TM scans the local log UNION the LLR table
○ Usual transaction-outcome resolution follows
● 2PC recovery guarantees are not compromised
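The decision rule reduces to a single lookup; a hedged sketch against the same assumed table layout as above:

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

public class LlrRecovery {
    // True if the in-doubt transaction committed globally: the presence of
    // the commit record in the LLR table decides the outcome.
    static boolean committed(Connection llr, byte[] globalTxId) throws Exception {
        try (PreparedStatement ps = llr.prepareStatement(
                "SELECT 1 FROM WL_LLR_MYSERVER WHERE XID = ?")) {
            ps.setBytes(1, globalTxId);
            try (ResultSet rs = ps.executeQuery()) {
                return rs.next();  // found => commit non-LLR resources; else roll back
            }
        }
    }
}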
20. LLR Savings
Back-of-the-envelope estimate for the single-threaded case,
using Jeff Dean's latency numbers [Google keynotes]:
● xa_start (RPC),
● xa_end (RPC),
● xa_prepare (RPC + force-log)
● Insert into LLR table + commit done via single RPC
------------------------------------------------
4xRTT + 1xDiskSeek
= 4x500,000ns + 10,000,000ns = 12 milliseconds
29. WebLogic FileStore
● XA-capable KV store on local file system
● Mime design: allocate under write-head
○ fast writes
○ slow recovery
○ works well up to a couple of GiB
● Transactional use: for JMS messages and JTA logs
● Non-transactional use: Diagnostics and Config