In 2009 RBS set out to build a single store of trade and risk data that all applications in the bank could use. This talk discusses a number of novel techniques that were developed as part of this work. Based on Oracle Coherence, the ODC departs from the trend set by most caching solutions by holding its data in normalised form, making it both memory-efficient and easy to change. However, it does this in a novel way that supports most arbitrary queries without the usual problems associated with distributed joins. We'll be discussing these patterns as well as others that allow linear scalability, fault tolerance and millisecond latencies.
See the full talk here: https://www.infoq.com/presentations/lsm-append-data-structures
This talk is about the beauty of sequential access and append-only data structures. We'll do this in the context of a little-known paper entitled “Log Structured Merge Trees”. LSM describes a surprisingly counterintuitive approach to storing and accessing data in a sequential fashion. It came to prominence in Google's Big Table paper and today, the use of Logs, LSM and append-only data structures drive many of the world's most influential storage systems: Cassandra, HBase, RocksDB, Kafka and more. Finally we'll look at how the beauty of sequential access goes beyond database internals, right through to how applications communicate, share data and scale.
The world of Microservices is a little different to standard service-oriented architectures. They play to an opinionated score of decentralisation, isolation and automation. Stream processing comes from a different angle though: one where analytic function is melded to a firehose of in-flight events. Yet business applications increasingly need to be data intensive, nimble and fast to market. This isn’t as easy as it sounds.
This talk will look at the implications of mixing toolsets from the stream processing world into realtime business applications: how to effectively handle infinite streams, how to leverage a high-throughput, persistent Log, and how to deploy dynamic, fault-tolerant streaming services. By mixing these disparate domains we’ll see how we can build application architectures that can withstand the full force and immediacy of the data generated.
Balancing Replication and Partitioning in a Distributed Java Database - Ben Stopford
This talk, presented at JavaOne 2011, describes the ODC, a distributed, in-memory database built in Java that holds objects in a normalized form in a way that alleviates the traditional degradation in performance associated with joins in shared-nothing architectures. The presentation describes the two patterns that lie at the core of this model. The first is an adaptation of the Star Schema model, used to hold data either replicated or partitioned, depending on whether the data is a fact or a dimension. In the second pattern, the data store tracks arcs on the object graph to ensure that only the minimum amount of data is replicated. Through these mechanisms, almost any join can be performed across the various entities stored in the grid, without the need for key shipping or iterative wire calls.
Streaming, Database & Distributed Systems: Bridging the Divide - Ben Stopford
There is something very interesting about stream processing. It builds upon messaging, rather than using a file system as a more typical database does. But stream processing engines themselves are really a type of database: a database designed specifically to blend streams and tables so you can query continuous results. As such they span an architectural no-man's-land between the Database and Distributed Systems fields.
This talk will look at Stateful Stream Processing. Can a streaming engine provide the guarantees of a database? When is a streaming engine best? How do they work, under the covers?
Building Event-Driven Systems with Apache Kafka - Brian Ritchie
Event-driven systems provide simplified integration, easy notifications, inherent scalability and improved fault tolerance. In this session we'll cover the basics of building event-driven systems and then dive into utilizing Apache Kafka for the infrastructure. Kafka is a fast, scalable, fault-tolerant publish/subscribe messaging system developed by LinkedIn. We will cover the architecture of Kafka and demonstrate code that utilizes this infrastructure including C#, Spark, ELK and more.
Sample code: https://github.com/dotnetpowered/StreamProcessingSample
NoSQL databases get a lot of press coverage, but there seems to be a lot of confusion surrounding them: in which situations do they work better than a relational database, and how should you choose one over another? This talk will give an overview of the NoSQL landscape and a classification of the different architectural categories, clarifying the base concepts and the terminology, and will provide a comparison of the features, strengths and drawbacks of the most popular projects (CouchDB, MongoDB, Riak, Redis, Membase, Neo4j, Cassandra, HBase, Hypertable).
Rolling presentation during Couchbase Day, including:
Introduction to NoSQL
Why NoSQL?
Introduction to Couchbase
Couchbase Architecture
Single Node Operations
Cluster Operations
HA and DR
Availability and XDCR
Backup/Restore
Security
Developing with Couchbase
Couchbase SDKs
Couchbase Indexing
Couchbase GSI and Views
Indexing and Query
Couchbase Mobile
Database Virtualization: The Next Wave of Big Data - exponential-inc
Servers, storage and networking have all been virtualized; the next big wave is the database. SQL databases are the one thing in the cloud that still requires single dedicated instances. Database virtualization changes all of this, enabling full elasticity without sacrificing functionality.
Test-Oriented Languages: Is it time for a new era? - Ben Stopford
Where is TDD going next? Mock Methods not Objects?
There are many good reasons for using TDD in its ‘mockist’ form, and you tend to build better software as a result. However there’s still a lot of overhead involved in creating mock and stub objects and passing them around. It makes the approach less accessible, more elitist and harder to integrate into a project if all the devs aren’t indoctrinated with the approach.
The slides below throw around some ideas for how we might do things differently. At its core is the concept of mocking methods rather than objects, along with a compilation process that forces isolation.
Ideas for Distributing Skills Across a Continental Divide - Ben Stopford
Key Points:
- Learning needs to be collaborative and bi-directional
- Use the code base as a primary channel for communication. Encourage this in your practices.
- Select different learning practices for different phases in the project.
- Appraisal of four practices: Abridged Pairing, Collaborative Refactoring, Code Review Blitz, Follow-the-Sun Pairing.
Refactoring tested code - has mocking gone wrong? - Ben Stopford
Mock Driven Development, or Interaction Based Testing, has been gaining traction. This talk looks at some of the issues around the state-based and interaction-based approaches.
Robin Bloor and Mark Madsen offer their theories on where the rapidly-changing database market stands today: What’s new? What’s standard? What is the trajectory of this evolving market? Each Analyst will present for 10-15 minutes, then will engage in a dialogue with the moderator and attendees.
The webcast audio and video archive can be found at https://bloorgroup.webex.com/bloorgroup/lsr.php?AT=pb&SP=EC&rID=4695777&rKey=4b284990a1db4ec0
Slides from the Live Webcast on Jan. 18, 2012
The purpose of this event is to allow the Analysts, Robin Bloor and Mark Madsen, to offer their theories on where the database market stands today: What’s new? What’s standard? What is the trajectory of this changing market? Each Analyst will present for 10-15 minutes, then will engage in a dialogue with Host Eric Kavanagh and all attendees.
For more information visit: http://www.databaserevolution.com
Watch this and the entire series at: http://www.youtube.com/playlist?list=PLE1A2D56295866394
Big Data Cloud Meetup - Jan 29 2013 - Mike Stonebraker & Scott Jarr of VoltDB - BigDataCloud
"Navigating the Database Universe" was the topic of the Big Data Cloud meetup held on Jan 24th 2013 in Santa Clara, CA. This is the presentation made by Mike Stonebraker & Scott Jarr of VoltDB.
This meetup was sponsored by VoltDB.
"Navigating the Database Universe" by Dr. Michael Stonebraker and Scott Jarr,...lisapaglia
Webinar presentation delivered by Dr. Michael Stonebraker and Scott Jarr of VoltDB on December 11, 2012. www.voltdb.com
The design decisions you make today will have a huge performance impact down the line. Until recently, when it came to databases, the choice was easy. Essentially, you had one option: the RDBMS. Today, there's a new universe of databases being thrown into production — and not always with the greatest success. How do you make the right choice for your next application? Database pioneer Dr. Michael Stonebraker and VoltDB co-founder Scott Jarr have some thoughts.
NoSQL is not a buzzword anymore. The array of non-relational technologies has found wide-scale adoption even in non-Internet-scale focus areas. With the advent of the Cloud, the churn has increased even more, yet there is no crystal-clear guidance on adoption techniques and architectural choices surrounding the plethora of options available. This session initiates you into the whys & wherefores, architectural patterns, caveats and techniques that will augment your decision-making process & boost your perception of architecting scalable, fault-tolerant & distributed solutions.
CodeFutures - Scaling Your Database in the Cloud - RightScale
RightScale Conference Santa Clara 2011: Scaling an application in the cloud often hits the most common bottleneck – the database tier. Not only is database performance the number one cause of poor application performance, but the issue is also magnified in cloud environments, where I/O and bandwidth are generally slower and less predictable than in dedicated data centers. Database sharding is a highly effective method of removing the database scalability barrier, operating on top of proven RDBMS products such as MySQL and Postgres – as well as the new NoSQL database platforms. One critical aspect often given too little consideration is monitoring and continuous operation of your databases, including the full lifecycle, to ensure that they stay up.
NoSQL Data Stores in Research and Practice - ICDE 2016 Tutorial - Extended Ve... - Felix Gessert
The unprecedented scale at which data is consumed and generated today has shown a large demand for scalable data management and given rise to non-relational, distributed "NoSQL" database systems. Two central problems triggered this process: 1) vast amounts of user-generated content in modern applications and the resulting request loads and data volumes 2) the desire of the developer community to employ problem-specific data models for storage and querying. To address these needs, various data stores have been developed by both industry and research, arguing that the era of one-size-fits-all database systems is over. The heterogeneity and sheer amount of these systems - now commonly referred to as NoSQL data stores - make it increasingly difficult to select the most appropriate system for a given application. Therefore, these systems are frequently combined in polyglot persistence architectures to leverage each system in its respective sweet spot. This tutorial gives an in-depth survey of the most relevant NoSQL databases to provide comparative classification and highlight open challenges. To this end, we analyze the approach of each system to derive its scalability, availability, consistency, data modeling and querying characteristics. We present how each system's design is governed by a central set of trade-offs over irreconcilable system properties. We then cover recent research results in distributed data management to illustrate that some shortcomings of NoSQL systems could already be solved in practice, whereas other NoSQL data management problems pose interesting and unsolved research challenges.
If you'd like to use these slides for e.g. teaching, contact us at gessert at informatik.uni-hamburg.de - we'll send you the PowerPoint.
Performance Management in ‘Big Data’ Applications - Michael Kopp
Do applications using NoSQL still require performance management? Is it always the best option to throw more hardware at a MapReduce job? In both cases, performance management is still about the application, but "Big Data" technologies have added a new wrinkle.
Similar to Beyond The Data Grid: Coherence, Normalisation, Joins and Linear Scalability
10 Principals for Effective Event-Driven Microservices with Apache Kafka - Ben Stopford
This talk includes an introduction to the Kafka ecosystem as well as event-driven microservices, culminating with 10 rules that help with the design of such systems:
1. Don’t use Kafka for shopping carts!
2. Pick Topics with Business Significance
3. Decouple publishers from subscribers
4. Use the log to regenerate state
5. Apply the Single Writer Principle
6. Leverage keeping datasets inside the broker
7. Prefer stream processing over maintaining historic views
8. Sometimes you need historic views. => Replicate Read Only
9. Use Schemas
10. Consider “Stream Management” Services
The Future of Streaming: Global Apps, Event Stores and Serverless - Ben Stopford
Stream processing affects a wide range of industries today: capturing sensor data, connecting microservices, processing the workloads of internet giants and giving us a real-time alternative to batch analytics.
While these use cases are exciting and valuable they are only a taste of what is to come. In this talk we look at three areas that are likely to become more prominent: Global Apps, Event Stores and Serverless Stream Processing
A Global Source of Truth for the Microservices Generation - Ben Stopford
One of the biggest challenges for today’s microservice generation is data, which gets split into fragments that are spread across a company, making it hard to get a joined-up view. One solution is to have a single, shared database that all services can access, but sharing databases across different services is a well-known anti-pattern. What if instead you shared a replayable commit log? This is the basic notion behind one of the most interesting and provocative ideas to arise from the stream-processing community.
Ben Stopford explains how an event stream—stored in a replayable log—can be used as a source of truth, incorporating the retentive properties of a database in a system designed to share data across many teams, cloud providers, or geographies. This leads to the idea of a database turned inside out: a central commit log spawning many continuously updated caches and views, embedded in different microservices. Ben examines the subtler, systemic effects that the pattern leads to—better autonomy, easier evolution and a more ephemeral approach to data—and explores the use of logs that span geographical regions and cloud providers. Along the way, he reflects on the practicalities of using logs as a distributed storage system and looks at some of the real-world applications of this approach.
Building Event Driven Services with Kafka Streams - Ben Stopford
The Kafka Summit version of this talk is more practical and includes code examples which walk through how to build a streaming application with Kafka Streams.
NDC London 2017 - The Data Dichotomy: Rethinking Data and Services with Streams - Ben Stopford
When building service-based systems, we don’t generally think too much about data. If we need data from another service, we ask for it. This pattern works well for whole swathes of use cases, particularly ones where datasets are small and requirements are simple. But real business services have to join and operate on datasets from many different sources. This can be slow and cumbersome in practice.
These problems stem from an underlying dichotomy. Data systems are built to make data as accessible as possible—a mindset that focuses on getting the job done. Services, instead, focus on encapsulation—a mindset that allows independence and autonomy as we evolve and grow. But these two forces inevitably compete in most serious service-based architectures.
Ben Stopford explains why understanding and accepting this dichotomy is an important part of designing service-based systems at any significant scale. Ben looks at how companies make use of a shared, immutable sequence of records to balance data that sits inside their services with data that is shared, an approach that allows the likes of Uber, Netflix, and LinkedIn to scale to millions of events per second.
Ben concludes by examining the potential of stream processors as a mechanism for joining significant, event-driven datasets across a whole host of services and explains why stream processing provides much of the benefits of data warehousing but without the same degree of centralization.
Building Event Driven Services with Apache Kafka and Kafka Streams - Devoxx B... - Ben Stopford
Event Driven Services come in many shapes and sizes, from tiny functions that dip into an event stream right through to heavy, stateful services. This talk makes the case for building such services, be they large or small, using a streaming platform. We will walk through a number of patterns for putting these together using a distributed log, the Kafka Streams API and its transactional guarantees.
Building Event Driven Services with Stateful Streams - Ben Stopford
Event Driven Services come in many shapes and sizes from tiny event driven functions that dip into an event stream, right through to heavy, stateful services which can facilitate request response. This practical talk makes the case for building this style of system using Stream Processing tools. We also walk through a number of patterns for how we actually put these things together.
Devoxx London 2017 - Rethinking Services With Stateful Streams - Ben Stopford
Microservices are one of those polarising concepts that technologists either love or hate. Splitting applications into autonomous units clearly has advantages, but larger service-based systems tend to struggle as the interactions between the services grow.
At the core of this sits a dichotomy: Data systems are designed to make data as accessible as possible. Services, on the other hand, actively encapsulate. These two forces inevitably compete in the architectures we build. By understanding this dichotomy we can better reason about how services should be sewn together. We strike a balance between the ability to adapt quickly and the loose coupling we need to retain autonomy, long term.
In this talk we'll examine how Stateful Stream Processing can be used to build Event Driven Services, using a distributed log like Apache Kafka. In doing so this Data-Dichotomy is balanced with an architecture that exhibits demonstrably better scaling properties, be it increased complexity, team size, data volume or velocity.
Event Driven Services Part 2: Building Event-Driven Services with Apache Kafka - Ben Stopford
The second online talk in the Confluent Event Driven Services series covers how we build a shared narrative, using an event driven architecture, to conflate communication protocol and state transfer. We investigate how this leads to CQRS and discuss patterns for attaching Read-centric views to our canonical set of streams.
Strata Software Architecture NY: The Data Dichotomy - Ben Stopford
Ben Stopford is an engineer and architect working on the Apache Kafka Core Team at Confluent (the company behind Apache Kafka). A specialist in data, both from a technology and an organizational perspective, Ben previously spent five years leading data integration at a large investment bank, using a central streaming database. His earlier career spanned a variety of projects at Thoughtworks and UK-based enterprise companies. He writes at Benstopford.com.
Beyond The Data Grid: Coherence, Normalisation, Joins and Linear Scalability
1. ODC
Beyond The Data Grid: Coherence, Normalisation, Joins and Linear Scalability
Ben Stopford : RBS
2. The Story…
The internet era has moved us away from traditional database architecture, now a quarter of a century old. Industry and academia have responded with a variety of solutions that leverage distribution, use of a simpler contract and RAM storage. We introduce ODC, a NoSQL store with a unique mechanism for efficiently managing normalised data. We show how we adapt the concept of a Snowflake Schema to aid the application of replication and partitioning and avoid problems with distributed joins. Finally we introduce the ‘Connected Replication’ pattern as a mechanism for making the star schema practical for in-memory architectures. The result is a highly scalable, in-memory data store that can support both millisecond queries and high-bandwidth exports over a normalised object model.
3. Database Architecture is Old
Most modern databases still follow a 1970s architecture (for example IBM’s System R).
4. “Because RDBMSs can be beaten by more than an order of magnitude on the standard OLTP benchmark, then there is no market where they are competitive. As such, they should be considered as legacy technology more than a quarter of a century in age, for which a complete redesign and re-architecting is the appropriate next step.”
Michael Stonebraker (Creator of Ingres and Postgres)
5. What steps have we taken to improve the performance of this original architecture?
11. These approaches are converging
[Diagram: regular databases (Oracle, Sybase, MySql), shared-nothing databases (Teradata, Vertica, NoSQL…), shared-nothing in-memory databases (VoltDB, Hstore) and distributed caching (Coherence, Gemfire, Gigaspaces) all converge on the space occupied by ODC.]
12. So how can we make a data store go even faster?
• Distributed Architecture
• Drop ACID: Simplify the Contract
• Drop disk
13. (1) Distribution for Scalability: The Shared Nothing Architecture
• Originated in 1990 (Gamma DB) but popularised by Teradata / BigTable / NoSQL
• Massive storage potential
• Massive scalability of processing
• Commodity hardware
• Limited by cross-partition joins
[Diagram: each node is an autonomous processing unit for a data subset.]
14. (2) Simplifying the Contract
• For many users ACID is overkill.
• Implementing ACID in a distributed architecture has a significant effect on performance.
• NoSQL Movement: CouchDB, MongoDB, 10gen, Basho, CouchOne, Cloudant, Cloudera, GoGrid, InfiniteGraph, Membase, Riptano, Scality…
15. Databases have huge operational overheads
[Chart taken from “OLTP Through the Looking Glass, and What We Found There”, Harizopoulos et al.]
16. (3) Memory is 100x faster than disk
[Chart: access latencies on a log scale from picoseconds to milliseconds: L1 cache ref, L2 cache ref, main memory ref, 1MB from main memory, cross-network round trip, 1MB from disk/network, cross-continental round trip.]
* L1 ref is about 2 clock cycles or 0.7ns. This is the time it takes light to travel 20cm.
17. Avoid all that overhead
RAM means:
• No IO
• Single threaded ⇒ no locking / latching
• Rapid aggregation etc.
• Query plans become less important
18. We were keen to leverage these three factors in building the ODC: distribution, a simplified contract, and memory-only storage.
19. What is the ODC?
A highly distributed, in-memory, normalised data store designed for scalable data access and processing.
20. The Concept
Originating from Scott Marcar’s concept of a central brain within the bank:
“The copying of data lies at the root of many of the bank’s problems. By supplying a single real-time view that all systems can interface with we remove the need for reconciliation and promote the concept of truly shared services” - Scott Marcar (Head of Risk and Finance Technology)
21. This is quite a tricky problem
• High-bandwidth access to lots of data
• Low-latency access to small amounts of data
• Scalability to lots of users
22. ODC Data Grid: Highly Distributed Physical Architecture
In-memory storage. Lots of parallel processing. Oracle Coherence. Messaging (topic based) used as the system of record (persistence).
23. The Layers
[Diagram: Java clients call in through an API to the Access Layer; beneath it sit the Query Layer, the Data Layer (Transactions, Mtms, Cashflows) and the Persistence Layer.]
28. Traditional Distributed Caching Approach
[Diagram: big denormalised objects (Trade with its Trader and Party) are spread across a distributed cache by key range (keys Aa-Ap, Fs-Fz, Xa-Yd).]
29. But we believe a data store needs to be more than this: it needs to be normalised!
30. So why is that? Surely denormalisation is going to be faster?
33. … as parts of the object graph will be copied multiple times
[Diagram: periphery objects (e.g. Party A) that are denormalised onto core objects (Trade, Trader, Party) will be duplicated multiple times across the data grid.]
34. …and all the duplication means you run out of space really quickly
35. Space issues are exaggerated further when data is versioned
[Diagram: versions 1 to 4 of the same denormalised Trade/Trader/Party graph held side by side.]
…and you need versioning to do MVCC.
36. And reconstituting a previous time slice becomes very difficult.
[Diagram: overlapping denormalised Trade/Trader/Party copies that must be pieced back together.]
37. Why Normalisation?
• Easy to change data (no distributed locks / transactions)
• Better use of memory
• Facilitates versioning, MVCC and bi-temporal views
38. OK, OK, let’s normalise our data then. What does that mean?
56. Looking at the data:
[Bar chart of entity row counts, x-axis 0 to 150,000,000. Facts (Valuation Legs, Valuations, Transaction Mapping, Cashflow Mapping, Party Alias, Transaction, Cashflows, Legs, Parties) are big, with common keys. Dimensions (Ledger Book, Source Book, Cost Centre, Product, Risk Organisation Unit, Business Unit, HCS Entity, Set of Books) are small, with crosscutting keys.]
57. We remember we are a grid. We should avoid the distributed join.
is in the same process
Coherence’s
KeyAssociation
gives us this
Trades
MTMs
Common
Key
59. So we prescribe different physical storage for Facts and Dimensions
[Diagram: dimensions (Trader, Party) are replicated; facts (Trade) are partitioned. Key association ensures joins are in process.]
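As a concrete illustration of the collocation half of this, here is a minimal sketch of key association in Coherence. The MtmKey class, its fields and the Trade cache keyed by trade id are hypothetical; KeyAssociation is Coherence's standard interface for steering related entries into the same partition.

```java
import com.tangosol.net.cache.KeyAssociation;
import java.io.Serializable;

// Hypothetical composite key for an MTM entry. Returning the parent trade's
// id from getAssociatedKey() asks Coherence to store this entry in the same
// partition as the Trade keyed by that id, so the Trade->MTM join never
// leaves the process.
public class MtmKey implements KeyAssociation, Serializable {
    private final long mtmId;
    private final long tradeId; // foreign key to the owning Trade

    public MtmKey(long mtmId, long tradeId) {
        this.mtmId = mtmId;
        this.tradeId = tradeId;
    }

    @Override
    public Object getAssociatedKey() {
        return tradeId; // collocate with the Trade entry keyed by tradeId
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof MtmKey)) return false;
        MtmKey that = (MtmKey) o;
        return mtmId == that.mtmId && tradeId == that.tradeId;
    }

    @Override
    public int hashCode() {
        return 31 * Long.hashCode(mtmId) + Long.hashCode(tradeId);
    }
}
```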
60. Facts are held distributed, Dimensions are replicated
[The bar chart again: facts (Valuation Legs, Valuations, Transaction Mapping, Cashflow Mapping, Party Alias, Transaction, Cashflows, Legs, Parties) are big ⇒ distribute; dimensions (Ledger Book, Source Book, Cost Centre, Product, Risk Organisation Unit, Business Unit, HCS Entity, Set of Books) are small ⇒ replicate.]
61. Facts are partitioned across the Data Layer; Dimensions are replicated across the Query Layer
[Diagram: the Query Layer holds replicated dimension caches (e.g. Trader, Party); the Data Layer holds partitioned fact storage (Transactions, Mtms, Cashflows).]
62. Key Point
We use a variant on a Snowflake Schema to partition big stuff that has the same key, and replicate small stuff that has crosscutting keys.
63. So how does this help us to run queries without distributed joins?
Select Transaction, MTM, ReferenceData From MTM, Transaction, Ref Where Cost Centre = ‘CC1’
This query involves:
• Joins between Dimensions: to evaluate the where clause
• Joins between Facts: Transaction joins to MTM
• Joins between all facts and dimensions needed to construct the return result
64. Stage 1: Focus on the where clause: Where Cost Centre = ‘CC1’
65. Stage 1: Get the right keys to query the Facts
Select Transaction, MTM, ReferenceData From MTM, Transaction, Ref Where Cost Centre = ‘CC1’
LBs[] = getLedgerBooksFor(CC1)
SBs[] = getSourceBooksFor(LBs[])
So we have all the bottom-level dimensions needed to query the facts (Transactions, Mtms, Cashflows), which are partitioned.
66. Stage 2: Cluster Join to get Facts
Select Transaction, MTM, ReferenceData From MTM, Transaction, Ref Where Cost Centre = ‘CC1’
LBs[] = getLedgerBooksFor(CC1)
SBs[] = getSourceBooksFor(LBs[])
Get all Transactions and MTMs (cluster-side join) for the passed Source Books from the partitioned fact caches.
67. Stage 2: Join the facts together efficiently as we know they are collocated.
68. Stage 3: Augment raw Facts with relevant Dimensions
Select Transaction, MTM, ReferenceData From MTM, Transaction, Ref Where Cost Centre = ‘CC1’
Populate the raw facts (Transactions) with dimension data before returning to the client.
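Putting the three stages together, a client-side sketch might look something like the following. The cache names and getter names are assumptions, and the Transaction-to-MTM join of stage 2 is elided behind a plain filter query; the real cluster-side join uses the aggregator technique on the next slide.

```java
import com.tangosol.net.CacheFactory;
import com.tangosol.net.NamedCache;
import com.tangosol.util.filter.EqualsFilter;
import com.tangosol.util.filter.InFilter;
import java.util.Set;

public class CostCentreQuery {
    // Hypothetical cache names and accessor names throughout.
    public Set queryByCostCentre(String costCentre) {
        NamedCache ledgerBooks  = CacheFactory.getCache("LedgerBook");   // replicated dimension
        NamedCache sourceBooks  = CacheFactory.getCache("SourceBook");   // replicated dimension
        NamedCache transactions = CacheFactory.getCache("Transaction");  // partitioned fact

        // Stage 1: evaluate the where clause against replicated dimensions,
        // entirely in-process, to derive the keys the facts are queried by.
        Set lbIds = ledgerBooks.keySet(new EqualsFilter("getCostCentre", costCentre));
        Set sbIds = sourceBooks.keySet(new InFilter("getLedgerBookId", lbIds));

        // Stage 2: a single parallel query against the partitioned facts.
        // Key association keeps each Transaction and its MTMs in the same
        // partition, so the fact-to-fact join can run cluster-side.
        Set rawFacts = transactions.entrySet(new InFilter("getSourceBookId", sbIds));

        // Stage 3: the caller enriches these raw facts with dimension data
        // from the replicated caches before returning them to the client.
        return rawFacts;
    }
}
```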
71. Coherence Voodoo: Joining Distributed Facts across the Cluster
Related Trades and MTMs (Facts) are collocated on the same machine with Key Affinity and joined there by an Aggregator. Direct backing map access must be used due to threading issues in Coherence: http://www.benstopford.com/2009/11/20/how-to-perform-efficient-cross-cache-joins-in-coherence/
72. So we are normalised. And we can join without extra network hops.
73. We get to do this…
[Diagram: the normalised object graph, with Trade referencing shared Trader and Party entries.]
74. …and this…
[Diagram: versions 1 to 4 of a Trade sharing the same Trader and Party entries.]
75. ..and this..
[Diagram: reconstituting a time slice across the normalised Trade/Trader/Party graph.]
81. I lied earlier. These aren’t all Facts.
[Bar chart, x-axis 0 to 125,000,000: Parties sits among the facts (Valuation Legs, Valuations, Transaction Mapping, Cashflow Mapping, Party Alias, Transaction, Cashflows, Legs), but it is really a dimension: it has a different key to the Facts, and it’s BIG.]
82. We can’t replicate really big stuff… we’ll run out of space ⇒ Big Dimensions are a problem.
84. We noticed that whilst there are lots of these big dimensions, we didn’t actually use a lot of them. They are not all “connected”.
85. If there are no Trades for Barclays in the data store then a Trade Query will never need the Barclays Counterparty.
86. Looking at the All Dimension Data, some are quite large
[Bar chart of dimension sizes, x-axis 0 to 5,000,000: Party Alias, Parties, Ledger Book, Source Book, Cost Centre, Product, Risk Organisation Unit, Business Unit, HCS Entity, Set of Books.]
87. But Connected Dimension Data is tiny by comparison
[The same chart restricted to connected data: every bar collapses to a sliver at the left of the axis.]
88. So we only replicate ‘Connected’ or ‘Used’ dimensions.
89. As data is written to the data store we keep our ‘Connected Caches’ up to date
[Diagram: the Processing Layer holds replicated dimension caches; the Data Layer holds partitioned fact storage (Transactions, Mtms, Cashflows). As new Facts are added, the relevant Dimensions that they reference are moved to the processing-layer caches.]
91. The Replicated Layer is updated by recursing through the arcs on the domain model when facts change.
92. Saving a trade causes all its 1st-level references to be triggered
[Diagram: a Trade is saved to the partitioned store in the Data Layer (all normalised); a trigger fires for the dimensions it references directly (Party Alias, Source Book, Ccy), which are pushed to the connected dimension caches in the Query Layer.]
93. This updates the connected caches
[Diagram: Party Alias, Source Book and Ccy now appear in the Query Layer’s connected dimension caches.]
94. The process recurses through the object graph
[Diagram: the recursion follows the arcs onward from Party Alias and Source Book to Party and Ledger Book, replicating those too.]
95. ‘Connected Replication’
A simple pattern which recurses through the foreign keys in the domain model, ensuring only ‘Connected’ dimensions are replicated.
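A minimal sketch of the recursion at the heart of the pattern, assuming a hypothetical domain-model facade (Entity, ForeignKey, ReplicatedDimensions are all stand-ins); the real trigger wiring is Coherence-specific, but the shape is this:

```java
import java.util.List;

public class ConnectedReplicator {

    // Hypothetical domain-model facade.
    interface Entity {
        List<ForeignKey> foreignKeys(); // the arcs on the domain model
        boolean isDimension();
    }

    interface ForeignKey {
        Entity resolve(); // load the referenced entity from the data layer
    }

    interface ReplicatedDimensions {
        boolean contains(Entity e);
        void push(Entity e); // publish into the replicated dimension caches
    }

    private final ReplicatedDimensions replicated;

    public ConnectedReplicator(ReplicatedDimensions replicated) {
        this.replicated = replicated;
    }

    // Called by a trigger whenever a fact (e.g. a Trade) is written.
    public void onFactWritten(Entity fact) {
        for (ForeignKey fk : fact.foreignKeys()) {
            replicateConnected(fk.resolve());
        }
    }

    // Recurse through the object graph, pushing each newly 'connected'
    // dimension into the replicated caches exactly once.
    private void replicateConnected(Entity entity) {
        if (entity.isDimension() && !replicated.contains(entity)) {
            replicated.push(entity);
            for (ForeignKey fk : entity.foreignKeys()) {
                replicateConnected(fk.resolve());
            }
        }
    }
}
```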
96. Limitations of this approach
• Data set size: the size of the connected dimensions limits scalability.
• Joins are only supported between “Facts” that can share a partitioning key (but any dimension join can be supported).
97. Performance is very sensitive to serialisation costs: avoid them with POF
[Diagram: with POF you can deserialise just one field (e.g. an integer id) from the binary value in the object stream.]
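For illustration, a sketch of reading a single field during a query with Coherence's PofExtractor, so the server never deserialises the whole object; the cache name and the POF index of the extracted field are assumptions:

```java
import com.tangosol.net.CacheFactory;
import com.tangosol.net.NamedCache;
import com.tangosol.util.extractor.PofExtractor;
import com.tangosol.util.filter.EqualsFilter;
import java.util.Set;

public class PofFieldQuery {
    // Hypothetical POF index of the sourceBookId field within the
    // serialised Transaction object.
    private static final int SOURCE_BOOK_ID = 0;

    public Set findBySourceBook(long sourceBookId) {
        NamedCache transactions = CacheFactory.getCache("Transaction");

        // The extractor navigates the binary POF stream directly and pulls
        // out just this one field to evaluate the filter.
        PofExtractor extractor = new PofExtractor(Long.class, SOURCE_BOOK_ID);
        return transactions.keySet(new EqualsFilter(extractor, sourceBookId));
    }
}
```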
99. Everything is Java
• Java client API
• Java schema
• Java ‘Stored Procedures’ and ‘Triggers’
100. Messaging as a System of Record
ODC provides a realtime view over any part of the dataset, as messaging is used as the system of record. Messaging provides a more scalable system of record than a database would.
[Diagram: Java clients call the Access Layer API; beneath it sit the Processing Layer, the Data Layer (Transactions, Mtms, Cashflows) and the messaging-based Persistence Layer.]
101. Being event based changes the programming model.
• The system provides both real-time and query-based views on the data.
• The two are linked using versioning.
• Replication to DR, DB, fact aggregation.
103. Performance
Query with more than twenty join conditions: 2GB per min (250Mb/s), 3ms latency (per client).
104. Conclusion
Data warehousing, OLTP and distributed caching are all converging on in-memory architectures to get away from disk-induced latencies.
106. Conclusion
We present a novel mechanism for avoiding the distributed join problem by using a Star Schema to define whether data should be replicated or partitioned.
107. Conclusion
We make the pattern applicable to ‘real’ data models by only replicating objects that are actually used: the Connected Replication pattern.
108. The End
• Further details online at http://www.benstopford.com (linked from my QCon bio)
• A big thanks to the team in both India and the UK who built this thing.
• Questions?
Editor's Notes
This avoids distributed transactions that would be needed when updating dimensions in a denormalised model. It also reduces memory consumption, which can become prohibitive in MVCC systems. This then facilitates a storage topology that eradicates cross-machine joins (joins between data sets held on different machines, which require that entire key sets are shipped to perform the join operation).
Big data sets are held distributed and only joined on the grid to collocated objects. Small data sets are held in replicated caches so they can be joined in process (only ‘active’ data is held).