This document describes Databus, a system used at LinkedIn for distributed data replication and change data capture. Some key points:
- Databus provides timeline consistency across distributed data systems by applying a logical clock to data changes and using a pull-based model for replication.
- It addresses the challenges of specialization in distributed data systems through standardization, isolation of consumers from sources, and handling slow consumers without impacting fast ones.
- The architecture includes fetchers that extract changes from databases, a relay for buffering changes, log and snapshot stores, and client libraries that allow applications to consume changes.
- Performance is optimized through partitioning, filtering, and scaling of consumers independently of sources.
3. LinkedIn by Numbers
• World's largest professional network
• 187M+ members worldwide as of Q3 2012
• Growing at the rate of two per second
• 85 of the Fortune 100 companies use Talent Solutions to hire
• > 2.6M company pages
• > 4B search queries
• 75K+ developers leveraging our APIs
• 1.3M unique publishers
4. The Consequence of Specialization in Data Systems
Data flow is essential
Data consistency is critical!!!
5. Solution: Databus
[Diagram: a primary DB emits data change events into Databus, which fans them out to standardization services, search indexes, graph indexes, and read replicas.]
6. Two Ways
Option 1: Application code dual-writes to the database and a pub-sub system.
  Easy on the surface. Consistent? (see the sketch below)
Option 2: Extract changes from the database commit log.
  Tough but possible. Consistent!!!
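The consistency question above is easiest to see in code. Below is a minimal sketch of the dual-write option with hypothetical names (Database, PubSub, and updateProfile are illustrative, not Databus APIs); the failure window between the two writes is exactly why Databus extracts changes from the commit log instead.

```java
// Minimal sketch of the "dual write" approach (hypothetical names; not a
// Databus API). The failure window between the two writes is the point.
class DualWriteExample {
    interface Database { void update(String key, String value); }
    interface PubSub   { void publish(String event); }

    private final Database db;
    private final PubSub pubSub;

    DualWriteExample(Database db, PubSub pubSub) {
        this.db = db;
        this.pubSub = pubSub;
    }

    void updateProfile(String memberId, String profileJson) {
        db.update(memberId, profileJson);  // write 1: source of truth commits
        // A crash or publish failure here leaves the change committed in the
        // DB but invisible to subscribers: the two systems silently diverge.
        // Mining the commit log has no such window.
        pubSub.publish(memberId + ":" + profileJson); // write 2: notification
    }
}
```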
7. Key Design Decisions: Semantics
• Logical clocks attached to the source
  – Physical offsets could be used for internal transport
  – Simplifies data portability
• Pull model
  – Restarts are simple
  – Derived State = f(Source state, Clock)
  – + Idempotence = Timeline Consistent! (see the sketch below)
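To make the last bullet concrete, here is a minimal sketch, with hypothetical names rather than actual Databus client code, of how a logical clock plus idempotent application yields timeline consistency: a consumer that remembers the last SCN it applied can replay any prefix of the change stream after a restart and converge to the same state.

```java
// Sketch of timeline-consistent, idempotent event application
// (hypothetical names; not the actual Databus client code).
import java.util.HashMap;
import java.util.Map;

class DerivedStore {
    record ChangeEvent(long scn, String key, String value) {}

    private final Map<String, String> state = new HashMap<>();
    private long lastAppliedScn = -1; // the consumer's logical clock

    void apply(ChangeEvent event) {
        if (event.scn() <= lastAppliedScn) {
            return; // already applied: replay after a restart is a no-op
        }
        state.put(event.key(), event.value()); // f(source state, clock)
        lastAppliedScn = event.scn();          // checkpoint this to resume pulls
    }
}
```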
8. Key Design Decisions: Systems
• Isolate fast consumers from slow consumers
  – Workload separation between online, catch-up, bootstrap
• Isolate sources from consumers
  – Schema changes
  – Physical layout changes
  – Speed mismatch
• Schema-aware
  – Filtering, projections
  – Typically network-bound → can burn more CPU
9. Requirements
• Timeline consistency
• Guaranteed, at-least-once delivery
• Low latency
• Schema evolution
• Source independence
• Scalable consumers
• Handle slow/new consumers without affecting happy ones (look-back requirements)
11. Initial Design (2007)
[Diagram: the source DB stamps each change with an SCN from a logical clock. Happy consumers pull directly from a relay holding an in-memory buffer (~3 hrs of changes); slow consumers whose SCN has aged out of the buffer fall back to a proxied pull against the DB itself.]
Pros:
1. Consumer scaling
2. Some isolation
Cons:
Slow consumers overwhelming the DB
12. Software Architecture
Four Logical Components
• Fetcher
  – Fetch from db, relay…
• Log Store
  – Store log snippet
• Snapshot Store
  – Store moving data snapshot
• Subscription Client
  – Orchestrate pull across these (see the interface sketch below)
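As a rough illustration of how these four components might fit together, here is a hypothetical interface sketch; the real Databus codebase uses different names and signatures.

```java
// Hypothetical interface sketch of the four logical components; the real
// Databus codebase uses different names and signatures.
import java.util.List;

interface Fetcher {
    // Pull changes after the given SCN from a DB or an upstream relay.
    List<ChangeEvent> fetchSince(long scn);
}

interface LogStore {
    void append(List<ChangeEvent> events);   // store a log snippet
    List<ChangeEvent> readRange(long fromScn, long toScn);
}

interface SnapshotStore {
    void upsert(ChangeEvent event);          // maintain a moving snapshot
    Iterable<ChangeEvent> scan();            // full scan for bootstrap
}

interface SubscriptionClient {
    // Orchestrate pulls across relay, log store, and snapshot store,
    // picking the cheapest source that can serve the consumer's SCN.
    void poll(long consumerScn);
}

record ChangeEvent(long scn, String key, String value) {}
```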
13. The Databus System
[Diagram: the source DB stamps changes with an SCN. Happy consumers pull from relays holding an in-memory buffer (~3 hrs of changes); a bootstrap service combining log storage (~10 days) with an infinite snapshot store serves slow consumers, so they never touch the source DB.]
14. The Relay
• Change event buffering (~2–7 days; sketched below)
• Low latency (10-15 ms)
• Filtering, projection
• Hundreds of consumers per relay
• Scale-out, high availability through redundancy
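Conceptually, the relay's buffer is a bounded, SCN-ordered window over the change stream. The sketch below is a deliberately simplified hypothetical (the real relay implementation differs): old events are evicted as new ones arrive, and consumers that have fallen behind the window must go to the bootstrap service instead.

```java
// Hypothetical bounded relay buffer; the real relay implementation differs.
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;

class RelayBuffer {
    record ChangeEvent(long scn, String key, String value) {}

    private final ArrayDeque<ChangeEvent> buffer = new ArrayDeque<>();
    private final int capacity;

    RelayBuffer(int capacity) { this.capacity = capacity; }

    synchronized void append(ChangeEvent event) {
        if (buffer.size() == capacity) {
            buffer.removeFirst(); // evict oldest; consumers older than this
        }                         // window must fall back to the bootstrap
        buffer.addLast(event);
    }

    // Serve a pull: everything after the consumer's SCN, oldest first.
    synchronized List<ChangeEvent> pullSince(long scn) {
        List<ChangeEvent> out = new ArrayList<>();
        for (ChangeEvent e : buffer) {
            if (e.scn() > scn) out.add(e);
        }
        return out;
    }
}
```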
16. The Bootstrap Service
• Catch-all for slow / new consumers
• Isolate source OLTP instance from large scans
• Log Store + Snapshot Store
• Optimizations
  – Periodic merge
  – Predicate push-down
  – Catch-up versus full bootstrap
• Guaranteed progress for consumers via chunking (sketched below)
• Implementations
  – Database (MySQL)
  – Raw files
• Bridges the continuum between stream and batch systems
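To unpack the chunking bullet: rather than one long-running scan, the snapshot can be read in key-ordered, bounded chunks, so a consumer that fails mid-bootstrap resumes from the last chunk boundary. The sketch below is hypothetical (readChunk and Row are illustrative, not the actual bootstrap protocol) and also shows where predicate push-down happens.

```java
// Hypothetical chunked-bootstrap sketch; not the actual Databus protocol.
import java.util.List;
import java.util.function.Predicate;

class BootstrapReader {
    record Row(String key, String value) {}

    interface SnapshotStore {
        // Returns up to `limit` rows with key > afterKey, in key order.
        List<Row> readChunk(String afterKey, int limit);
    }

    // Streams the snapshot in bounded chunks; predicate push-down means the
    // filter runs here, in the bootstrap service, not in the consumer.
    static void bootstrap(SnapshotStore store, Predicate<Row> filter,
                          java.util.function.Consumer<Row> sink) {
        final int chunkSize = 10_000;
        String lastKey = "";                          // resume token
        while (true) {
            List<Row> chunk = store.readChunk(lastKey, chunkSize);
            if (chunk.isEmpty()) break;               // snapshot exhausted
            chunk.stream().filter(filter).forEach(sink);
            lastKey = chunk.get(chunk.size() - 1).key(); // guaranteed progress
        }
    }
}
```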
17. The Consumer Client Library
• Glue between Databus infra and business logic in the consumer
• Isolates the consumer from changes in the Databus layer
• Switches between relay and bootstrap as needed
• API (see the callback sketch below)
  – Callback with transactions
  – Iterators over windows
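A sketch of what the transaction-callback style could look like from application code; the names here (DatabusConsumer, onDataEvent, and friends) are a simplified hypothetical, not the exact client API.

```java
// Hypothetical consumer-callback sketch; the real client API differs.
interface DatabusConsumer {
    void onStartTransaction(long scn);   // window/transaction boundary
    void onDataEvent(ChangeEvent event); // one change within the window
    void onEndTransaction(long scn);     // safe point to checkpoint the SCN
}

class SearchIndexUpdater implements DatabusConsumer {
    public void onStartTransaction(long scn) { /* begin index batch */ }

    public void onDataEvent(ChangeEvent event) {
        // Business logic only; relay-vs-bootstrap switching, retries, and
        // reconnects are handled by the client library underneath.
        System.out.println("index " + event.key());
    }

    public void onEndTransaction(long scn) { /* commit batch, save scn */ }
}

record ChangeEvent(long scn, String key, String value) {}
```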
18. Fetcher Implementations
• Oracle
  – Trigger-based
• MySQL
  – Custom-storage-engine based
• In Labs
  – Alternative implementations for Oracle
  – OpenReplicator integration for MySQL
19. Meta-data Management
• Event definition, serialization and transport
  – Avro
• Oracle, MySQL
  – Avro definition generated from the table schema
• Schema evolution
  – Only backwards-compatible changes allowed (example below)
• Isolation between upgrades on producer and consumer
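As a concrete example of a backwards-compatible change: Avro allows adding a field as long as it carries a default value, which is what lets producer and consumer upgrade independently. The schemas below are hypothetical illustrations, not generated LinkedIn schemas.

```java
// Backwards-compatible Avro evolution: adding a field with a default.
// Hypothetical schemas for illustration; real Databus schemas are
// generated from the table definition.
import org.apache.avro.Schema;

class SchemaEvolutionExample {
    static final String V1 = """
        {"type": "record", "name": "Member", "fields": [
          {"name": "memberId", "type": "long"},
          {"name": "name",     "type": "string"}
        ]}""";

    // v2 adds "headline" WITH a default, so readers holding the v1 schema
    // still decode v2 events, and v2 readers fill in the default for old
    // events: producer and consumer can upgrade independently.
    static final String V2 = """
        {"type": "record", "name": "Member", "fields": [
          {"name": "memberId", "type": "long"},
          {"name": "name",     "type": "string"},
          {"name": "headline", "type": "string", "default": ""}
        ]}""";

    public static void main(String[] args) {
        Schema v1 = new Schema.Parser().parse(V1);
        Schema v2 = new Schema.Parser().parse(V2);
        System.out.println(v1.getFields().size() + " -> " + v2.getFields().size());
    }
}
```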
20. Scaling the Consumers (Partitioning)
• Server-side filtering
  – Range, mod, hash (see the filter sketch below)
  – Allows client to control partitioning function
• Consumer groups
  – Distribute partitions evenly across a group
  – Move partitions to available consumers on failure
  – Minimize re-processing
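A minimal sketch of server-side filtering with a mod partitioning function (hypothetical class; the actual filter implementation differs): because the relay evaluates the filter before sending, each member of a consumer group receives only its share of the stream and no bandwidth is wasted on unmatched events.

```java
// Hypothetical server-side partition filter; real Databus filters differ.
class PartitionFilter {
    private final int partition;      // this consumer's slot in the group
    private final int numPartitions;  // size of the consumer group

    PartitionFilter(int partition, int numPartitions) {
        this.partition = partition;
        this.numPartitions = numPartitions;
    }

    // The relay applies this per event, so unmatched events never cross
    // the network; mod is shown here, but range or hash work the same way.
    boolean matches(long key) {
        return Math.floorMod(key, numPartitions) == partition;
    }
}
```

Range and hash filters follow the same shape; the key point from the slide is that the client chooses the partitioning function while the server applies it.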