This document discusses data-intensive applications and some of the challenges, tools, and best practices related to them. The key challenges with data-intensive applications include large quantities of data, complex data structures, and rapidly changing data. Common tools mentioned include NoSQL databases, message queues, caches, search indexes, and batch/stream processing frameworks. The document also discusses concepts like distributed systems architectures, outage case studies, and strategies for improving reliability, scalability, and maintainability in data systems. Engineers working in this field need an accurate understanding of the available tools, how to apply the right tool for each use case, and how to avoid common pitfalls.
2016 Mastering SAP Tech - 2 Speed IT and lessons from an Agile Waterfall eCom... – Eneko Jon Bilbao
A recent clash of worlds occurred when a local client asked us to deliver their Hybris eCommerce portal on top of their global-template SAP system. The backend SAP team jogged along at the traditional waterfall pace whilst the frontend Hybris team sought to sprint along in agile fashion. This is the story of how we managed the different worlds, the skills required, and the lessons learned by both teams.
For more information on NTA, visit: http://www.solarwinds.com/products/network-traffic-analyzer/info.aspx
Watch this webcast: http://www.solarwinds.com/resources/videos/video-tutorial-netflow-training-part-i.html
This video tutorial covers NetFlow best practices for planning and deployment and is Part 1 of the NetFlow training series.
Executive Briefing: What Is Fast Data And Why Is It Important – Lightbend
[About This Webinar]
Streaming data systems, so-called Fast Data, promise accelerated access to information, leading to new innovations and competitive advantages. These systems, however, aren’t just faster versions of Big Data; they force architectural changes to meet new demands for reliability and dynamic scalability, much like microservices.
This means new challenges for your organization. Whereas a batch job might run for hours, a stream processing application might run for weeks or months. This raises the bar for making these systems resilient against traffic spikes, hardware and network failures, and so forth. The good news is that there is a strong history of facing these demands in the world of microservices.
In this webinar by Dr. Dean Wampler, VP of Fast Data Architecture at Lightbend, Inc., we will cut through the buzz around Fast Data and explore how to successfully exploit this new opportunity for innovation in how your organization leverages data. Specifically, Dean will review:
* The business justification for transitioning from batch-oriented big data to stream-oriented fast data
* The architectural and organizational changes that streaming systems require to meet their higher demands for reliability, resiliency, dynamic scalability, etc.
* How some of these requirements can be met by leveraging what your organization already knows about microservice architectures
From Obvious to Ingenious: Incrementally Scaling Web Apps on PostgreSQL – Konstantin Gredeskoul
In this exciting and informative talk, presented at PgConf Silicon Valley 2015, Konstantin cuts through the theory to deliver a clear set of practical solutions for scaling applications atop PostgreSQL, eventually supporting millions of active users, tens of thousands of them concurrent, with an application stack that responds to requests in 100ms on average. He shares how his team solved one of the biggest challenges they faced: effectively storing and retrieving over 3B rows of "saves" (a Wanelo equivalent of Instagram's "like" or Pinterest's "pin"), all in PostgreSQL, with highly concurrent random access.
Over the last three years, the team at Wanelo optimized the hell out of their application and database stacks. Using PostgreSQL version 9 as their primary data store and Joyent Public Cloud as a hosting environment, the team re-architected their backend for rapid expansion several times over as the unrelenting traffic kept climbing. This ultimately resulted in a highly efficient, horizontally scalable, fault-tolerant application infrastructure. Unimpressed? Now try getting there without dedicated ops or DBA teams, all while deploying to production seven times per day, with an application maintaining 99.999% uptime over the last six months.
This is a high level presentation on how to develop a monitoring improvement program. The topic of what to monitor is covered in a separate presentation.
A concise report on the state of your environment, with prescriptive recommendations for how you can improve performance, efficiency and security in up to 14 vital IT domains.
PCI: Building Compliant Applications in the Public Cloud - RightScale Compute... – RightScale
Speaker: Phil Cox, Director Security and Compliance, RightScale
Over the past few years, PCI compliance in the public cloud has been a growing topic of concern and interest. Like us, you probably have heard assertions from both sides of the topic – some stating that one can be a PCI-compliant merchant using a public IaaS cloud, others stating that it is impossible. We’ll discuss foundational principles and mindsets for PCI compliance, how to determine system/application scope and requirement applicability, and how to meet top-level PCI DSS (Data Security Standard) requirements in the public IaaS cloud.
Downtime is Not an Option: Integrating IBM Z into ServiceNow and Splunk – Precisely
Support critical enterprise initiatives without burdening your mainframe staff.
In today's always-on digital world, downtime isn’t an option. Applications span multiple platforms and networks, requiring an enterprise-wide view of security, critical incidents and outages that can bring business to a halt.
Organizations are investing in Splunk and ServiceNow for real-time enterprise-wide visibility for faster identification, mitigation and resolution of issues that can impact the business. However, without the mainframe, these solutions have a glaring blind spot.
Learn how leading IT organizations support critical security and operational enterprise initiatives by integrating the mainframe with these platforms, without disrupting the mainframe, or the teams that support it.
We’ll cover:
- Top use cases and benefits for including mainframe data in Splunk and ServiceNow
- What happens to your mainframe data in each of these platforms
- Challenges of integration… and how to solve them
VMware remains the go-to option for virtualization for the majority of organizations and has been for some time. The longer it has been around, the more focus there is on making efficiency savings. This is where the Capacity Manager really needs to understand the technology, how to monitor it, and how to decide what headroom exists.
View this webinar with CMG on-demand where we looked at some of the hot topics in understanding VMware Capacity:
• Why OS Monitoring can be misleading
• 5 Key Metrics
• Measuring Processor Capacity
• Measuring Memory Capacity
• Calculating Headroom in VMs
Distributed systems involve complex interactions among many components. This increases the possibility of failures that could bring a whole system down. Software architects, designers, and developers need to architect, design, and program functional requirements with the possibility of failures in mind, and with the need for a system to keep running despite failures. This presentation tackles part of the problem, focusing on redundancy, different types of groups, replication, and eventual consistency, and finishes with a presentation of the CAP theorem.
Presentation delivered at the IV Cloud Computing and Big Data Ent at Universidad Nacional de La Plata: http://www.jcc.info.unlp.edu.ar/jcc2016/wordpress/index.php/cronograma/
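The replication and eventual-consistency ideas named in that abstract can be made concrete with a quorum-replicated store. A minimal, purely illustrative sketch (the class and parameter names are hypothetical, not from the presentation): with N replicas, W write acknowledgements, and R read acknowledgements, the condition R + W > N guarantees that every read quorum overlaps every write quorum, so a read always sees the latest acknowledged write.

```python
import random

class QuorumStore:
    """Toy quorum-replicated key-value store: N replicas, write quorum W, read quorum R.
    If R + W > N, every read quorum overlaps every write quorum, so a read
    always observes the latest acknowledged write."""

    def __init__(self, n=5, w=3, r=3):
        assert r + w > n, "need R + W > N for overlapping quorums"
        self.n, self.w, self.r = n, w, r
        # each replica holds {key: (version, value)}; unwritten replicas lag behind
        self.replicas = [dict() for _ in range(n)]
        self.version = 0

    def write(self, key, value):
        self.version += 1
        # write to W randomly chosen replicas; the others stay stale
        for i in random.sample(range(self.n), self.w):
            self.replicas[i][key] = (self.version, value)

    def read(self, key):
        # query R replicas and keep the value carrying the highest version
        answers = [self.replicas[i].get(key, (0, None))
                   for i in random.sample(range(self.n), self.r)]
        return max(answers)[1]

store = QuorumStore()
store.write("x", "v1")
store.write("x", "v2")
print(store.read("x"))  # always "v2": some queried replica saw the latest write
```

Dropping the quorum condition (say W = 1, R = 1) turns this into an eventually consistent store, where a read may return a stale value until replication catches up.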
The ExtraHop wire data analytics platform enables IT teams to answer questions they hadn't known to ask before, such as "Which SSL servers are receiving heartbeats?" and "Where are heartbeat messages coming from?"
IBM and Lightbend Build Integrated Platform for Cognitive Development – Lightbend
By now you have likely heard the news that IBM has made a strategic investment in Lightbend to bring Reactive solutions to IBM Platforms. So, what does this mean for developers?
In this 30-minute conversation, Karl Wehden, Director of Product Management at Lightbend, and Sebastian Hassinger, from the Developer Partners and Ecosystems team at IBM, explore the following questions:
1. Why did IBM choose to partner with Lightbend, and vice versa: what intrigued Lightbend about partnering with IBM?
2. Why is Scala important to this vision of the “Cognitive Era”?
3. What types of companies are creating these types of cognitive applications, and what do you see this partnership doing to help them accelerate their efforts?
4. What tools and technologies will we see begin to collaborate first?
5. In which other IBM products and services will we see Lightbend technologies appear as a joint solution?
6. What is the impact on JVM developers, the tools they use and how they get started with these technologies?
Most organisations think that they have poor data quality, but don’t know how to measure it or what to do about it. Teams of data scientists, analysts, and ETL developers are either blindly taking a “garbage in -> garbage out” approach or, worse still, “cleansing” data to fit their limited perspectives. DataOps is a systematic approach to measuring data quality and planning mitigations for bad data.
It's harder than ever to predict the load your application will need to handle, so how do you design your architecture so that you can afford to implement as you go and be ready for whatever comes your way? It's easy to focus on optimizing each part of your application, but your application architecture determines the options you have for making big leaps in scalability. In this talk we'll cover practical patterns you can build today to meet the needs of rapid development while still creating systems that can scale up and out. Specific code examples focus on .NET, but the principles apply across many technologies. Real-world systems will be discussed, based on our experience helping customers around the world optimize their enterprise applications.
Deliver Best-in-Class HPC Cloud Solutions Without Losing Your Mind – Avere Systems
While cloud computing offers virtually unlimited capacity, harnessing that capacity in an efficient, cost effective fashion can be cumbersome and difficult at the workload level. At the organizational level, it can quickly become chaos.
You must make choices around cloud deployment, and these choices could have a long-lasting impact on your organization. It is important to understand your options and avoid incomplete, complicated, locked-in scenarios. Data management and placement challenges make having the ability to automate workflows and processes across multiple clouds a requirement.
In this webinar, you will:
• Learn how to leverage cloud services as part of an overall computation approach
• Understand data management in a cloud-based world
• Hear what options you have to orchestrate HPC in the cloud
• Learn how cloud orchestration works to automate and align computing with specific goals and objectives
• See an example of an orchestrated HPC workload using on-premises data
From computational research to financial back testing, and research simulations to IoT processing frameworks, decisions made now will not only impact future manageability, but also your sanity.
Azure architecture design patterns - proven solutions to common challenges – Ivo Andreev
Building reliable, scalable, secure applications can happen either by following verified design patterns or the hard way, through trial and error. Azure architecture patterns are tested and accepted solutions to common challenges, reducing technical risk for a project by not having to employ a new and untested design. Moreover, most of the patterns are relevant to any distributed system, whether hosted on Azure or on other cloud platforms.
Cloud Design Patterns - Hong Kong Codeaholics – Taswar Bhatti
Talk on Cloud Design Patterns at the Hong Kong Codeaholics Meetup Group. The talk covers the External Config, Cache-Aside, Federated Identity, Valet Key, Gatekeeper, Circuit Breaker, Retry, and Strangler patterns. These patterns address common problems in designing cloud-hosted applications and offer design guidance.
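As a flavour of one of the patterns listed above, here is a minimal sketch of the Retry pattern with exponential backoff (illustrative only; the function and parameter names are hypothetical, not taken from the talk):

```python
import time

def retry(operation, max_attempts=4, base_delay=0.05):
    """Retry Pattern: re-invoke an operation prone to transient failures,
    doubling the delay between attempts, and re-raise once attempts run out."""
    for attempt in range(max_attempts):
        try:
            return operation()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # exhausted: surface the failure to the caller
            time.sleep(base_delay * (2 ** attempt))  # 0.05s, 0.1s, 0.2s, ...

# usage: a flaky operation that succeeds on its third invocation
calls = {"n": 0}
def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("transient failure")
    return "ok"

print(retry(flaky))  # "ok" after two retried failures
```

The Circuit Breaker pattern complements this: rather than retrying forever, it stops calling a dependency that keeps failing and fails fast until a cool-down period passes.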
Visualizing Your Network Health - Know your Network – DellNMS
An old adage states that you cannot manage what you don’t know. Do you know what devices are on your network, where they are located, how they are configured, what they are connected to, and how they are affected by changes and failures?
Today’s network infrastructure is becoming more and more complex, while demands on the network administrator to ensure network availability and performance are higher than ever. Business-critical systems depend upon you managing your entire network infrastructure and delivering high-quality service 24/7, 365 days a year. So how do you keep pace?
Learn how real-time visibility into your entire network infrastructure provides the power to manage your assets with greater control.
Engineering Machine Learning Data Pipelines Series: Streaming New Data as It ... – Precisely
Tackling the challenge of designing a machine learning model and putting it into production is the key to getting value back – and the roadblock that stops many promising machine learning projects. After the data scientists have done their part, engineering robust production data pipelines has its own set of challenges. Syncsort software helps the data engineer every step of the way.
Building on the process of finding and matching duplicates to resolve entities, the next step is to set up a continuous streaming flow of data from data sources so that as the sources change, new data automatically gets pushed through the same transformation and cleansing data flow – into the arms of machine learning models.
Some of your sources may already be streaming, but the rest sit in transactional databases that change hundreds or thousands of times a day. The challenge is that you can’t affect the performance of data sources that run key applications, so putting something like database triggers in place is not the best idea. Using Apache Kafka or similar technologies as the backbone for moving data around doesn’t by itself solve the problem: you still need to grab changes from the source, push them into Kafka, and consume the data from Kafka for processing. If something unexpected happens, like connectivity being lost on either the source or the target side, you don’t want to have to fix it or start over because the data is out of sync.
View this 15-minute webcast on-demand to learn how to tackle these challenges in large scale production implementations.
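The resilience concern described above, resuming after a connectivity loss without skipping or reprocessing records, is commonly handled by durably checkpointing consumer offsets. A minimal sketch in plain Python standing in for a Kafka-style consumer (the class and method names are illustrative assumptions, not Syncsort's or Kafka's API):

```python
import json
import os
import tempfile

class CheckpointedConsumer:
    """Persist the offset of the last successfully processed record, so a
    restart resumes exactly where it left off instead of starting over."""

    def __init__(self, checkpoint_path):
        self.checkpoint_path = checkpoint_path

    def load_offset(self):
        if os.path.exists(self.checkpoint_path):
            with open(self.checkpoint_path) as f:
                return json.load(f)["offset"]
        return 0  # no checkpoint yet: start from the beginning

    def commit_offset(self, offset):
        with open(self.checkpoint_path, "w") as f:
            json.dump({"offset": offset}, f)

    def consume(self, log, process):
        # process records strictly after the last committed offset
        for offset in range(self.load_offset(), len(log)):
            process(log[offset])
            self.commit_offset(offset + 1)  # commit only after success

# usage: "crash" after two records, then restart against the full log
path = os.path.join(tempfile.mkdtemp(), "offsets.json")
consumer = CheckpointedConsumer(path)
seen = []
log = ["r0", "r1", "r2", "r3"]

consumer.consume(log[:2], seen.append)  # first run stops after two records
consumer.consume(log, seen.append)      # restart resumes at offset 2
print(seen)  # ['r0', 'r1', 'r2', 'r3'], with no loss and no duplicates
```

Committing the offset only after a record is fully processed gives at-least-once delivery; committing before processing would risk losing records on a crash.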
Cloud Design Patterns at Carleton University
The talk covers the External Config, Cache-Aside, Federated Identity, Valet Key, Gatekeeper, Circuit Breaker, Retry, and Strangler patterns. These patterns address common problems in designing cloud-hosted applications and offer design guidance.
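Among the patterns listed, Cache-Aside is the simplest to sketch. A minimal, purely illustrative example (the class name and the in-memory dict standing in for a real cache and database are assumptions, not from the talk):

```python
class CacheAside:
    """Cache-Aside Pattern: the application checks the cache first, loads
    from the backing store on a miss, and populates the cache for later reads."""

    def __init__(self, load_from_store):
        self.cache = {}
        self.load_from_store = load_from_store  # e.g. a database query
        self.misses = 0

    def get(self, key):
        if key not in self.cache:  # cache miss: go to the store
            self.misses += 1
            self.cache[key] = self.load_from_store(key)
        return self.cache[key]     # cache hit on subsequent reads

    def invalidate(self, key):
        # on writes, evict the entry so the next read reloads fresh data
        self.cache.pop(key, None)

# usage: a dict stands in for the database
db = {"user:1": "Ada"}
cache = CacheAside(lambda k: db[k])
print(cache.get("user:1"))  # miss: loads "Ada" from the store
print(cache.get("user:1"))  # hit: served from the cache
```

The invalidate-on-write step is what keeps the cache from serving stale data after the backing store changes.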
Development of concurrent services using In-Memory Data Grids – jlorenzocima
Presented as part of OTN Tour 2014, this session covers the basics of an In-Memory Data Grid (IMDG) solution, explains how it works and how it can be used within an architecture, and shows some use cases. Enjoy!
Tech for the Non Technical - Anatomy of an Application Stack – Intelligent_ly
Building technology is a practiced skill and indeed an art, but it's not magic. You hire craftsmen and you trust them with the details. But in order to command their respect and sleep well at night, you deserve to have a big-picture understanding of what they're building and why. A little knowledge will go a long way towards confidently leading your technical product team.
Building high performance and scalable SharePoint applications – Talbott Crowell
SharePoint custom application development can sometimes be challenging. This presentation at SPS New Hampshire on October 18th, 2014 covers some techniques and strategies on improving performance and scalability of your applications.
Operating System - Types Of Operating System Unit-1 – abhinav baba
This slide deck covers operating systems and their types:
Batch Operating System
Network Operating System
Time Sharing Operating System
Real Time Operating System
Distributed Operating System
Grails has great performance characteristics, but as with all full-stack frameworks, attention must be paid to optimizing performance. In this talk, Lari discusses common missteps that can easily be avoided and shares tips and tricks that help profile and tune Grails applications.
Neuro-symbolic is not enough, we need neuro-*semantic* – Frank van Harmelen
Neuro-symbolic (NeSy) AI is on the rise. However, simply machine learning on just any symbolic structure is not sufficient to really harvest the gains of NeSy. These will only be gained when the symbolic structures have an actual semantics. I give an operational definition of semantics as “predictable inference”.
All of this illustrated with link prediction over knowledge graphs, but the argument is general.
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova... – Ramesh Iyer
In today's fast-changing business world, companies that fail to adapt and embrace new ideas often struggle to keep up with the competition. However, fostering a culture of innovation takes much work. It takes vision, leadership, and a willingness to take risks in the right proportion. Sachin Dev Duggal, co-founder of Builder.ai, has perfected the art of this balance, creating a company culture where creativity and growth are nurtured at each stage.
Key Trends Shaping the Future of Infrastructure.pdf – Cheryl Hung
Keynote at DIGIT West Expo, Glasgow on 29 May 2024.
Cheryl Hung, ochery.com
Sr Director, Infrastructure Ecosystem, Arm.
The key trends across hardware, cloud and open-source; exploring how these areas are likely to mature and develop over the short and long-term, and then considering how organisations can position themselves to adapt and thrive.
State of ICS and IoT Cyber Threat Landscape Report 2024 preview – Prayukth K V
The IoT and OT threat landscape report has been prepared by the Threat Research Team at Sectrio using data from Sectrio's cyber threat intelligence farming facilities, spread across over 85 cities around the world. In addition, Sectrio also runs AI-based advanced threat and payload engagement facilities that serve as sinks to attract and engage sophisticated threat actors and newer malware, including new variants and latent threats at an earlier stage of development.
The latest edition of the OT/ICS and IoT security Threat Landscape Report 2024 also covers:
State of global ICS asset and network exposure
Sectoral targets and attacks as well as the cost of ransom
Global APT activity, AI usage, actor and tactic profiles, and implications
Rise in volumes of AI-powered cyberattacks
Major cyber events in 2024
Malware and malicious payload trends
Cyberattack types and targets
Vulnerability exploit attempts on CVEs
Attacks on counties – USA
Expansion of bot farms – how, where, and why
In-depth analysis of the cyber threat landscape across North America, South America, Europe, APAC, and the Middle East
Why are attacks on smart factories rising?
Cyber risk predictions
Axis of attacks – Europe
Systemic attacks in the Middle East
Download the full report from here:
https://sectrio.com/resources/ot-threat-landscape-reports/sectrio-releases-ot-ics-and-iot-security-threat-landscape-report-2024/
Let's dive deeper into the world of ODC! Ricardo Alves (OutSystems) will join us to tell all about the new Data Fabric. After that, Sezen de Bruijn (OutSystems) will get into the details on how to best design a sturdy architecture within ODC.
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -... – DanBrown980551
Do you want to learn how to model and simulate an electrical network from scratch in under an hour?
Then welcome to this PowSyBl workshop, hosted by Rte, the French Transmission System Operator (TSO)!
During the webinar, you will discover the PowSyBl ecosystem as well as handle and study an electrical network through an interactive Python notebook.
PowSyBl is an open source project hosted by LF Energy, which offers a comprehensive set of features for electrical grid modelling and simulation. Among other advanced features, PowSyBl provides:
- A fully editable and extendable library for grid component modelling;
- Visualization tools to display your network;
- Grid simulation tools, such as power flows, security analyses (with or without remedial actions) and sensitivity analyses;
The framework is mostly written in Java, with a Python binding so that Python developers can access PowSyBl functionalities as well.
What you will learn during the webinar:
- For beginners: discover PowSyBl's functionalities through a quick general presentation and the notebook, without needing any expert coding skills;
- For advanced developers: master the skills to efficiently apply PowSyBl functionalities to your real-world scenarios.
UiPath Test Automation using UiPath Test Suite series, part 3 – DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 3. In this session, we will cover desktop automation along with UI automation.
Topics covered:
UI automation Introduction,
UI automation Sample
Desktop automation flow
Pradeep Chinnala, Senior Consultant Automation Developer @WonderBotz and UiPath MVP
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do... – UiPathCommunity
💥 Speed, accuracy, and scaling – discover the superpowers of GenAI in action with UiPath Document Understanding and Communications Mining™:
See how to accelerate model training and optimize model performance with active learning
Learn about the latest enhancements to out-of-the-box document processing – with little to no training required
Get an exclusive demo of the new family of UiPath LLMs – GenAI models specialized for processing different types of documents and messages
This is a hands-on session specifically designed for automation developers and AI enthusiasts seeking to enhance their knowledge in leveraging the latest intelligent document processing capabilities offered by UiPath.
Speakers:
👨🏫 Andras Palfi, Senior Product Manager, UiPath
👩🏫 Lenka Dulovicova, Product Program Manager, UiPath
GraphRAG is All You need? LLM & Knowledge GraphGuy Korland
Guy Korland, CEO and Co-founder of FalkorDB, will review two articles on the integration of language models with knowledge graphs.
1. Unifying Large Language Models and Knowledge Graphs: A Roadmap.
https://arxiv.org/abs/2306.08302
2. Microsoft Research's GraphRAG paper and a review paper on various uses of knowledge graphs:
https://www.microsoft.com/en-us/research/blog/graphrag-unlocking-llm-discovery-on-narrative-private-data/
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf91mobiles
91mobiles recently conducted a Smart TV Buyer Insights Survey in which we asked over 3,000 respondents about the TV they own, aspects they look at on a new TV, and their TV buying preferences.
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024Tobias Schneck
As AI technology is pushing into IT I was wondering myself, as an “infrastructure container kubernetes guy”, how get this fancy AI technology get managed from an infrastructure operational view? Is it possible to apply our lovely cloud native principals as well? What benefit’s both technologies could bring to each other?
Let me take this questions and provide you a short journey through existing deployment models and use cases for AI software. On practical examples, we discuss what cloud/on-premise strategy we may need for applying it to our own infrastructure to get it to work from an enterprise perspective. I want to give an overview about infrastructure requirements and technologies, what could be beneficial or limiting your AI use cases in an enterprise environment. An interactive Demo will give you some insides, what approaches I got already working for real.
Accelerate your Kubernetes clusters with Varnish CachingThijs Feryn
A presentation about the usage and availability of Varnish on Kubernetes. This talk explores the capabilities of Varnish caching and shows how to use the Varnish Helm chart to deploy it to Kubernetes.
This presentation was delivered at K8SUG Singapore. See https://feryn.eu/presentations/accelerate-your-kubernetes-clusters-with-varnish-caching-k8sug-singapore-28-2024 for more details.
5. Engineer's Job
• Accurate understanding of tools
• Dig deeper into the buzzwords and mine out the trade-offs
• Understand the principles and algorithms, and check:
• Where each tool fits in
• How to make good use of each tool
• How to avoid pitfalls
6. Big outages
• Facebook - https://www.facebook.com/notes/facebook-engineering/more-details-on-todays-outage/431441338919/
• Amazon - http://money.cnn.com/2011/04/22/technology/amazon_ec2_cloud_outage/index.htm
• Google - https://www.cnet.com/news/google-outage-reportedly-caused-big-drop-in-global-traffic/
• Sweden dropped off the internet - http://www.networkworld.com/article/2232047/data-center/missing-dot-drops-sweden-off-the-internet.html
• EBS impact - https://aws.amazon.com/message/65648/
7. Flipkart Big Billion 2015, 2014
Crashes,
No search results,
“Please try after sometime”
What went wrong?
9. AWS Problems
• Whole-zone failures happen
• Virtual hardware has a shorter life than real hardware, ~200 days on average
• Better to be in more than one zone, with redundancy across zones
• Multi-zone failures happen too, so go multi-region as well
• To maintain high uptime, EBS is not the best option
• I/O rates on EBS are poor
• EBS fails at the region level, not on a per-volume basis
• Failure of an EBS volume can lock up the entire Linux machine, leaving it inaccessible and affecting even operations with no direct disk activity
• Other AWS services that use EBS may fail when EBS fails
• Services like ELB, RDS and Elastic Beanstalk use EBS
• EC2 and S3 don't use EBS
Ref: http://www.talisman.org/~erlkonig/misc/aws-the-good-the-bad+the-ugly/
10. A typical system architecture
1. Does this architecture ensure that the data remains correct and complete, even when things go wrong internally?
2. Does it provide consistently good performance, even when parts of the system are degraded?
3. Does it scale to handle an increase in load?
4. What does an API for this kind of service look like?
12. Reliability
• The system should work correctly in the face of adversity
• Correctly - performing the correct function at the desired level of performance
• Tolerate user mistakes, prevent unauthorised access …
• Adversity - hardware faults, software faults, and even human error
• Anticipate faults and design for them
• Even AWS has problems and needs its own contingency planning
13. Software Errors
• Errors
• A runaway process that uses up a shared resource like CPU, memory, disk, or network bandwidth
• A service that has slowed down or become unresponsive
• Cascading failures of components
• Fixes
• Careful analysis of assumptions and interactions in the system
• Thorough testing
• Process isolation
• Allowing processes to crash and restart, e.g. Chaos Monkey by Netflix
• Measuring, monitoring and analysing system behaviour in production
• Constantly checking the guarantees a system provides, and raising an alert in case of discrepancies
14. Human Errors
• Well-defined abstractions, APIs, and admin interfaces
• These make it easy to do the "right thing" and discourage the "wrong thing"
• Set up a fully featured non-production sandbox environment
• Here people can explore and experiment with real data without affecting real users
• Unit, integration, automated and manual testing
• Automated testing is particularly good for covering corner cases
• Allow quick and easy recovery from human errors
• Make it fast to roll back config changes, gradually roll out new code, and provide tools to recompute data
• Set up metrics, monitoring and error-rate tracking
• These give early warning signals, show whether any assumption is being violated, and help diagnose issues in case of errors/faults/failures
15. Scalability
• As the system grows, there should be reasonable ways of dealing with the growth
• Growth - in data volume, traffic volume or complexity
16. Describing Load
• Load parameters, for example:
• Requests/sec to a web server
• Ratio of reads to writes to a database
• Number of simultaneously active users in a chat room
• Hit rate on a cache
• Twitter - 2 main operations
• Post tweet - 4.6K requests/sec on average, 12K requests/sec at peak (2012)
• Home timeline - 300K requests/sec
• Hybrid approach to implementation
• Users with few followers - fan out the tweet immediately to the home-timeline caches of all the user's followers
• Celebrities (30M followers) - fetch the celebrity's tweets separately and merge them into a follower's timeline only when that follower loads their home timeline
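The hybrid fanout approach can be sketched in a few lines of Python. This is a toy model, not Twitter's actual code: the follower threshold and the in-memory data structures are illustrative assumptions.

```python
from collections import defaultdict

CELEBRITY_THRESHOLD = 1000  # hypothetical cutoff; the real value would be tuned

followers = defaultdict(set)          # user -> set of followers
timelines = defaultdict(list)         # per-user home-timeline cache (fanout-on-write)
celebrity_tweets = defaultdict(list)  # celebrity tweets, merged at read time

def post_tweet(user, tweet):
    if len(followers[user]) >= CELEBRITY_THRESHOLD:
        # Celebrity: store once, don't fan out to millions of caches
        celebrity_tweets[user].append(tweet)
    else:
        # Normal user: push into every follower's cached timeline now
        for f in followers[user]:
            timelines[f].append(tweet)

def home_timeline(user, following):
    # Cheap read for normal users; celebrity tweets merged on demand
    merged = list(timelines[user])
    for u in following:
        if len(followers[u]) >= CELEBRITY_THRESHOLD:
            merged.extend(celebrity_tweets[u])
    return merged
```

The trade-off: fanout-on-write makes reads cheap but writes expensive, which breaks down for accounts with tens of millions of followers, hence the hybrid.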
17. Describing Performance
• Performance parameters, for example:
• Throughput - in batch processing systems like Hadoop
• Response time - in online systems
• Response time does not always stay the same, for reasons like:
• Context switch to a background process
• Loss of a network packet and TCP retransmission
• Garbage collection pause
• Page fault forcing a read from disk
• Mechanical vibrations in the server rack
18. Measuring Performance
• Use the median and percentiles (p95, p99, p99.9) of performance metrics
• Plot them on a histogram
• Aggregate raw histograms across all servers (raw counts can be combined; precomputed percentiles cannot)
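For illustration, a minimal percentile over a list of response times, using the nearest-rank definition (one of several common definitions; real monitoring systems use more elaborate estimators):

```python
def percentile(samples, p):
    """p-th percentile (0-100) by the nearest-rank method: the smallest
    sample such that at least p% of all samples are <= it."""
    ordered = sorted(samples)
    rank = -(-len(ordered) * p // 100)  # ceil(len * p / 100) without math.ceil
    return ordered[max(int(rank), 1) - 1]

# ten response times from a made-up load test, in milliseconds
response_times_ms = [15, 14, 12, 16, 15, 18, 20, 25, 40, 250]
median = percentile(response_times_ms, 50)  # half the requests were faster
p95 = percentile(response_times_ms, 95)     # 1 in 20 requests was slower
p99 = percentile(response_times_ms, 99)
```

Note how one slow outlier (250 ms) dominates the high percentiles while leaving the median untouched; that is why tail percentiles are tracked separately.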
19. Maintainability
• Over time, many people will work on the system, and they should be able to work productively
• Fix bugs, investigate failures
• Keep the system operational
• Implement new use cases
• Repay technical debt
• 3 design principles for a maintainable system
• Operability
• Simplicity
• Evolvability
20. Operability
• Operational tasks
• Health monitoring and restoring a service from a bad state
• Tracking down the cause of failures or degraded performance
• Updates, security patches
• Capacity planning
• Setting up tools for deployment and configuration management
• Moving applications from one platform to another
• Preserving knowledge as people come and go
• How data systems can support the effectiveness of operational tasks
• Good monitoring - visibility into runtime behaviour and system internals
• Support for automation
• Avoiding dependency on individual machines
• Provide good documentation
• Provide good default behaviour, with the option to override defaults
• Self-healing where appropriate, with the option to manually control system state
21. Operations-friendly services - best practices
• Expect failures, handle all failures gracefully
• A component may crash or stop
• A dependent component may crash or stop
• Network failures happen
• Disks can run out of space
• Keep things simple
• Avoid unnecessary dependencies
• Installation should be simple
• Failures on one server should have no impact on the rest of the data centre
• Automate everything
• People make mistakes, they need sleep, they forget things
• Automated processes are testable and fixable, and therefore more reliable
Ref: https://www.usenix.org/legacy/events/lisa07/tech/full_papers/hamilton/hamilton.pdf
22. Latency
• Understand latency from the entire latency distribution curve
• Simply looking at the 95th or 99th percentile is not sufficient
• Tail latency matters
• The median is not representative of the common case; the average is even worse
• No single metric can describe the behaviour of latency
• Be conscious of your monitoring tools and the data they report
• Percentiles can't be averaged
• Latency is not service time
• Plot your data: with coordinated omission there is often a quick, high rise in the curve
• A non-omitted test often has a smoother curve
• Very few tools actually correct for coordinated omission
• HdrHistogram
• Is additive, uses log buckets, and is helpful for capturing high-volume data in production
Ref: http://bravenewgeek.com/everything-you-know-about-latency-is-wrong/
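The "additive" property that makes HdrHistogram suitable for fleet-wide latency can be sketched with a toy power-of-two-bucket histogram. This is not HdrHistogram's actual API; it only illustrates why raw bucket counts, unlike precomputed percentiles, can be merged across servers.

```python
class LogHistogram:
    """Toy latency histogram with power-of-two buckets.

    Percentiles computed on each server cannot be averaged, but raw
    bucket counts from many servers can be summed, and percentiles
    derived from the merged counts are then correct."""
    def __init__(self, num_buckets=32):
        self.counts = [0] * num_buckets

    def record(self, value_us):
        # bucket i holds values in [2**i, 2**(i+1))
        self.counts[max(0, int(value_us).bit_length() - 1)] += 1

    def merge(self, other):
        out = LogHistogram(len(self.counts))
        out.counts = [a + b for a, b in zip(self.counts, other.counts)]
        return out

    def percentile(self, p):
        """Upper bound of the bucket containing the p-th percentile."""
        threshold = sum(self.counts) * p / 100
        running = 0
        for i, c in enumerate(self.counts):
            running += c
            if running >= threshold:
                return 2 ** (i + 1)
        return 2 ** len(self.counts)
```

Log-scaled buckets keep memory constant while preserving relative (not absolute) precision, which is usually what matters for latency.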
24. Document vs Relational
• Document databases store one-to-many relationships, i.e. nested records, within the parent record (not in a separate table)
• One-to-many - one person can have many contact details
• Both document and relational databases store many-to-one and many-to-many relationships using unique identifiers: foreign keys in the relational model, document references in the document model
• Many-to-one - many persons can live at one address
• Many-to-many - many persons can have many skills
25. Document vs Relational cont.
Data model
• Document: closer to the data structures used by the application; schema flexibility; better performance due to locality
• Relational: better support for joins, for many-to-one and for many-to-many relationships
Fault tolerance
Concurrency
Good for
• Document: analytics apps where many-to-many relationships are not needed
Bad for
• Reading a small portion of a large document
• Writes that increase the size of a large document
Recommended use
• Keep documents fairly small
• Avoid writes that increase document size
27. Facebook Thundering Herd Problem
• Problem:
• Millions of people tune in to a celebrity Live broadcast simultaneously; potentially hundreds of thousands of video requests see a cache miss at the Edge Cache servers
• This results in excessive queries to the Origin Cache and Live processing servers, which are not designed to handle high concurrent loads
• Solution:
• Create request queues at the Edge Cache servers
• Allow one request to go through to the livestream server and return the content to the Edge Cache, where it is distributed to the rest of the queue all at once
Ref: https://code.facebook.com/posts/1653074404941839/under-the-hood-broadcasting-live-video-to-millions/
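The queue-and-coalesce idea can be sketched as a "single flight" wrapper: concurrent misses for the same key wait on the one request already in flight and share its result. A simplified model, not Facebook's implementation (error handling omitted).

```python
import threading

class Coalescer:
    """Collapse concurrent cache misses for the same key into one
    origin fetch; the other requests wait and share the result."""
    def __init__(self, fetch):
        self.fetch = fetch            # expensive call to the origin
        self.lock = threading.Lock()
        self.inflight = {}            # key -> (Event, result holder)
        self.origin_calls = 0         # for observability in this demo

    def get(self, key):
        with self.lock:
            if key in self.inflight:
                event, holder = self.inflight[key]   # join the queue
                leader = False
            else:
                event, holder = threading.Event(), {}
                self.inflight[key] = (event, holder)
                leader = True                        # we go to the origin
        if leader:
            self.origin_calls += 1
            holder['value'] = self.fetch(key)
            with self.lock:
                del self.inflight[key]
            event.set()                              # wake the queue
        else:
            event.wait()
        return holder['value']
```

With N concurrent misses, the origin sees one request instead of N, which is exactly the protection the Edge Cache queues provide.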
28. PostgreSQL vs MongoDB
• Flexibility: PostgreSQL - have to match the schema; MongoDB - put anything in any document
• Integrity: PostgreSQL - read valid data only; MongoDB - read anything out
• Consistency: PostgreSQL - written means written, no exceptions (except disk failure; use RAID); MongoDB - written means written, unless something goes wrong (e.g. server crash, network partition, disk failure)
• Availability: PostgreSQL - if the master dies, stop to avoid corruption; MongoDB - if the master dies, rebalance to avoid downtime
• Bigger servers (expensive, can't use cloud): PostgreSQL - good, up to 64 cores, 1TB RAM; MongoDB - bad, per-database write lock
• Sharding (cheaper, works in cloud): PostgreSQL - bad, hard to choose shards that maintain integrity; MongoDB - good, built-in support with mongos
• Replication: doesn't help write throughput, as writes always hit the master; MongoDB offers faster failover
• Ideal use case: MongoDB is good for storing arbitrary pieces of JSON when you don't care at all what is inside that JSON. If your code expects something to be present in the JSON, then MongoDB is the wrong choice. Never use MongoDB if one document has conceptual links to another document(s).
Refs: https://speakerdeck.com/conradirwin/mongodb-confessions-of-a-postgresql-lover
http://www.sarahmei.com/blog/2013/11/11/why-you-should-never-use-mongodb/
https://www.infoq.com/presentations/data-types-issues
29. Storage Engines
• Optimised for one of:
• Transaction processing
• Analytics, such as column-oriented engines
• Belong to one of two families:
• Log-structured storage engines
• Page-oriented storage engines such as B-trees
30. Data structure behind databases
#!/bin/bash
db_set () {
  echo "$1,$2" >> database
}
db_get () {
  grep "^$1," database | sed -e "s/^$1,//" | tail -n 1
}
$ db_set 81 '{"x":"11","places":["London Eye"]}'
$ db_set 42 '{"x":"23","places":["Exploratorium"]}'
$ db_set 42 '{"x":"35","places":["Golden Gate"]}'
$ db_get 42
{"x":"35","places":["Golden Gate"]}
$ cat database
81,{"x":"11","places":["London Eye"]}
42,{"x":"23","places":["Exploratorium"]}
42,{"x":"35","places":["Golden Gate"]}
Many dbs use a log, an append-only data file, similar to what db_set does
But a real database has to deal with more issues:
• Concurrency control
• Reclaiming disk space
• Log size control
• Handling errors, crash recovery
• Partially written records
• File format
• Deleting records
An append-only log is efficient:
• Appending and segment merging are sequential operations, much faster than random writes
• Concurrency and crash recovery are much simpler if segment files are append-only or immutable
• Merging old segments avoids fragmentation problems
32. Indexes
• Hash Index
• Must fit in memory; for a very large number of keys, a hash index won't work
• Range queries won't work efficiently
• SSTables and LSM-Trees
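A hash index over an append-only log, in the style of log-structured engines such as Bitcask, can be sketched as follows. Toy model only: the file format and class names are illustrative assumptions.

```python
class HashIndexLog:
    """Append-only log file with an in-memory hash index mapping each
    key to the byte offset of its latest record.

    Lookups seek straight to the newest value for a key. The trade-offs
    named on the slide apply: every key must fit in memory, and a range
    query would have to scan the whole key space."""
    def __init__(self, path):
        self.path = path
        self.index = {}                            # key -> byte offset
        open(path, 'ab').close()                   # ensure the log exists

    def set(self, key, value):
        with open(self.path, 'ab') as f:
            offset = f.tell()                      # where this record starts
            f.write(f"{key},{value}\n".encode())
        self.index[key] = offset                   # newest offset wins

    def get(self, key):
        offset = self.index.get(key)
        if offset is None:
            return None
        with open(self.path, 'rb') as f:
            f.seek(offset)                         # jump straight to the record
            record = f.readline().decode().rstrip("\n")
        return record.split(",", 1)[1]
```

Compared with the grep-based db_get on slide 30, a read is O(1) instead of a full scan, at the cost of keeping every key in RAM.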
33. Traditional RDBMS wisdom
• Row store
• Data is in disk-block format (heavily encoded)
• With a main-memory buffer pool of blocks
• Query plans
• Optimise CPU and I/O
• The fundamental operation is reading a row
• Indexing via B-Trees
• Clustered or unclustered
• Dynamic row-level locking
• ARIES-style write-ahead log
• Replication (sync or async)
• Update the primary first
• Then move the log to the other sites
• And roll forward at the secondaries
• MySQL, Oracle, Postgres, SQL Server, DB2
• Traditional wisdom is now obsolete
34. DBMS marketplace
Data warehouses (~1/3 of the market)
• Lots of big reads; bulk-loaded from OLTP systems
• Market already moving towards column stores, which are not based on traditional wisdom (e.g. HP Vertica, Amazon ParAccel)
• Column stores are 50-100 times faster than row stores
OLTP (~1/3 of the market)
• Lots of small updates, and a few reads
• Not clear who will win, but NewSQL dbs are wildly faster (e.g. VoltDB, Google Spanner)
• See slide 36, OLTP and NewSQL
Everything else (~1/3 of the market)
• Hadoop, NoSQL, graph dbs, array dbs …
35. Why column-stores are faster
• A typical warehouse query reads 4-5 attributes from a 100-column fact table
• Row store - reads all 100 attributes
• Column store - reads just the ones you need
• Compression is much easier and more effective in a column store
• Each column has data of the same type, so each block contains data of one kind of attribute; bitmaps can be used
• No big record headers in a column store
• Record headers don't compress well
• A column executor is wildly faster than a row executor
• Because of vector processing
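The read-only-what-you-need advantage can be illustrated with plain Python lists standing in for disk blocks. A toy model: real column stores operate on compressed on-disk blocks, and the speedup comes from I/O and vectorised execution, not interpreter tricks.

```python
# Row store: each record is stored together, so a query touching 2 of
# 100 columns still has to materialise every row in full.
rows = [{f"col{i}": n * 100 + i for i in range(100)} for n in range(1000)]

# Column store: one array per column. The same query touches only the
# two arrays it needs, and same-typed values compress well.
columns = {f"col{i}": [n * 100 + i for n in range(1000)] for i in range(100)}

def query_row_store():
    # must decode all 100 attributes of each row to use just two
    return sum(r["col3"] for r in rows if r["col7"] % 2 == 1)

def query_column_store():
    # reads exactly two of the 100 columns
    c3, c7 = columns["col3"], columns["col7"]
    return sum(v3 for v3, v7 in zip(c3, c7) if v7 % 2 == 1)
```

Both queries return the same answer; the difference on real hardware is the 50x less data the column version pulls from disk.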
36. OLTP and NewSQL
What the future holds for OLTP
• Main-memory DBMS
• With anti-caching
• Deterministic concurrency control
• HA via active-active replication
OLTP databases - 3 big decisions
• Main memory vs disk orientation
• Concurrency control strategy
• Replication strategy
Ref: http://slideshot.epfl.ch/play/suri_stonebraker
37. Data format or schema changes
• A data format/schema change often needs a change in application code
• Code changes often cannot happen instantaneously
• Server-side apps - staged rollout (install the new code on a few nodes, then gradually on the other nodes once the new code is found to work fine)
• Client-side apps - some users may not install the upgrade for some time
• Hence old & new versions of code, and old & new data formats, may coexist in the system at the same time
• Backward compatibility - newer code can read data that was written by older code
• Forward compatibility (trickier) - older code can read data that was written by newer code
• Data encoding formats help achieve the above requirements
• JSON, XML, Protocol Buffers, Thrift, Avro
38. Encoding formats
• Programs generally work with data in 2 representations
• In-memory representation - objects, structs, lists, arrays, hash tables, trees and so on
• These data structures are optimised for efficient access by the CPU, typically using pointers
• Disk-file and/or over-the-network representation - a self-contained sequence of bytes to be stored in a disk file or transferred over the network
• Since a pointer wouldn't make sense to any other process, this representation is quite different from the in-memory one
• Encoding
• Translation from the in-memory representation to a byte sequence
• Also called marshalling or serialisation
• Decoding
• Translation from a byte sequence to the in-memory representation
• Also called unmarshalling, deserialisation, or parsing
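As a concrete illustration of the two representations, encoding an in-memory structure to a self-contained byte sequence with JSON and decoding it back:

```python
import json

# In-memory representation: Python objects linked by pointers
user = {"name": "Amélie", "favourites": [81, 42]}

# Encoding (marshalling/serialisation): a self-contained byte sequence
# suitable for a disk file or the network
encoded = json.dumps(user).encode("utf-8")
assert isinstance(encoded, bytes)

# Decoding (unmarshalling/deserialisation/parsing): bytes back to objects
decoded = json.loads(encoded.decode("utf-8"))
assert decoded == user
```

Note that the round trip rebuilds equivalent objects, not the same ones; the pointers from the original in-memory structure never leave the process.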
39. Language-specific vs Standard formats
Language-specific (java.io.Serializable, Ruby Marshal, Python pickle, PHP serialize/unserialize)
• Encoding is tied to a programming language
• To restore data in the same object types, the decoding process needs to instantiate arbitrary classes, which has security issues
• Data versioning is not taken care of; backward and forward compatibility is always an issue
• Efficiency - CPU time and size of the encoded data are always an afterthought
Standard (JSON, XML, CSV)
• Lots of ambiguity in number encoding: XML and CSV can't distinguish between a number and a string; JSON distinguishes numbers from strings, but doesn't distinguish integers from floating point
• JSON and XML support Unicode character strings, i.e. human-readable text, but don't support binary strings, i.e. sequences of bytes without a character encoding; Base64 is generally used as a workaround
• There is optional schema support for XML and JSON; schemas are powerful but quite complicated. CSV doesn't have a schema
• CSV is a vague format: confusion arises if a value contains a comma or newline character, and its escaping rules are not correctly implemented by all parsers
40. Security issue with arbitrary class instantiation
• A vulnerability in Java environments
• Any application that accepts serialized Java objects is likely vulnerable, even if a framework or library is responsible and not your custom code
• There's no easy way to protect applications en masse; it will take organizations a long time to find and fix all the different variants of this vulnerability
• There's no way to know what you're deserializing before you've decoded it
• An attacker can serialize a bunch of malicious objects and send them to your application
• ObjectInputStream in = new ObjectInputStream( inputStream );
• return (Data)in.readObject();
• Once you call readObject(), it's too late: the attacker's malicious objects have already been instantiated and have taken over your entire server
• Solution: allow deserialization, but make it impossible for attackers to create instances of arbitrary classes
• List<Class<?>> safeClasses = Arrays.asList( BitSet.class, ArrayList.class );
• Data data = safeReadObject( Data.class, safeClasses, 10, 50, inputStream );
• Limit the input to a maximum of 10 embedded objects and 50 bytes of input
41. JavaScript - working with large numbers
• JS supports only 53-bit integers
• All numbers in JS are floating-point numbers
• Numbers, integers and floating point alike, are represented as sign x mantissa x 2^exponent
• The mantissa has 53 bits
• The exponent can be used to reach higher numbers, but they won't be contiguous
• Twitter uses 64-bit integer ids for statuses, users, direct messages, searches
• Due to the JS integer limitation, the JSON returned by the Twitter API includes each id twice: once as a JSON number and once as a decimal string
• {"id": 10765432100123456789, "id_str": "10765432100123456789", ...}
• Languages that have 64-bit unsigned integers can use the id property and don't need id_str
• JavaScript can use id_str along with a library like strint to do all kinds of math operations on id_str
Refs: http://2ality.com/2012/07/large-integers.html
https://groups.google.com/forum/#!topic/twitter-development-talk/ahbvo3VTIYI
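Python floats are IEEE 754 doubles, the same representation JavaScript uses for all its numbers, so the 53-bit limit can be demonstrated directly (the tweet id below is the illustrative value from the slide, not a real id):

```python
# Doubles have a 53-bit mantissa, so integers are exact only up to 2^53.
MAX_SAFE = 2 ** 53  # 9007199254740992

assert float(MAX_SAFE) == MAX_SAFE
assert float(MAX_SAFE + 1) == float(MAX_SAFE)   # 2^53 + 1 rounds back down
assert float(MAX_SAFE + 2) == MAX_SAFE + 2      # representable ints are now 2 apart

# A 64-bit id loses its low digits once it passes through a double,
# which is why the API also ships the id as a decimal string (id_str).
tweet_id = 10765432100123456789
assert int(float(tweet_id)) != tweet_id          # JSON-number path: corrupted
assert int("10765432100123456789") == tweet_id   # id_str path: exact
```

This is exactly what happens to a JS client that reads the numeric id field: the value silently changes, hence "non-contiguous" integers beyond 2^53.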
42. Scaling to Higher Load
Shared memory
• Many CPUs, many RAM chips and many disks joined together under one OS; a fast interconnect allows any CPU to access any part of the memory or disk
• Also called vertical scaling or scaling up; the simplest approach, i.e. buy a more powerful machine
• Cost is super-linear: a machine twice the size may not necessarily handle twice the load
• Crash recovery is easiest, but concurrency control is a little difficult because the lock table becomes a hot spot
Shared disk
• Several machines with independent CPUs and RAM, but data is stored on an array of disks that is shared between the machines, connected via a fast network
• Concurrency control is most difficult because of coordinating multiple copies of the same lock table, and syncing writes to a common log or logs
Shared nothing
• Each machine running the database software is called a node; each node has its own CPU, RAM, and disks. Any coordination between nodes is done at the software level, using a conventional network
• Also called horizontal scaling or scaling out
• The application developer needs to be very cautious: since the data is distributed over multiple nodes, constraints and trade-offs need to be handled at the software level
• Concurrency control is more difficult because it requires a distributed deadlock detector and a multi-phase commit protocol
Ref: http://www.benstopford.com/2009/11/24/understanding-the-shared-nothing-architecture/
43. Scaling to Higher Load cont.
Ranking: 1 - best, 2 - 2nd best, 3 - 3rd best (Shared memory / Shared disk / Shared nothing)
• Difficulty of concurrency control: 2 / 3 / 2
• Difficulty of crash recovery: 1 / 3 / 2
• Difficulty of database design: 2 / 2 / 3
• Difficulty of load balancing: 1 / 2 / 3
• Difficulty of high availability: 3 / 2 / 1
• Number of messages: 1 / 2 / 3
• Bandwidth required: 3 / 2 / 1
• Ability to scale to a large number of machines: 3 / 2 / 1
• Ability to scale to large distances between machines: 3 / 2 / 1
• Susceptibility to critical sections: 3 / 2 / 1
• Number of system images: 1 / 3 / 3
• Susceptibility to hot spots: 3 / 3 / 3
Ref: http://db.cs.berkeley.edu/papers/hpts85-nothing.pdf
44. Replication
Use cases
• Reduce latency
• Increase availability
• Increase read throughput
Scenarios
• Small dataset - can be stored on a single machine
• Large dataset - partitioning or sharding, stored across multiple machines
• Faults
• Synchronous vs asynchronous replication
• Handling failed replicas
• Eventual consistency
• Setting up new followers
Replicating changes
• Single leader
• Multi leader
• Leaderless
45. Leader-based replication
• Also known as active/passive or master/slave replication
• Built-in feature of:
• Postgres, MySQL, Oracle Data Guard, SQL Server Availability Groups
• MongoDB, RethinkDB, Espresso
• Kafka, RabbitMQ
• Network file systems, replicated block devices like DRBD
• Synchronous replication
• The leader waits for confirmation from the follower before reporting success to its client
• Guarantees an up-to-date copy on the follower
• All followers can never be synchronous: any one node outage would cause the whole system to grind to a halt
• Asynchronous replication
• The leader sends the message to the follower and reports success to its client (does not wait for confirmation from the follower)
• Often, leader-based replication is asynchronous
• Non-durable: if the leader fails and is not recoverable, all un-replicated writes are lost
• It is inevitable with many followers or geographically distributed followers
• Semi-synchronous replication
• If the sync follower becomes unavailable or slow, an async follower is made synchronous
46. Setting up new followers
• Take a consistent snapshot of the leader's db (without taking a lock on the entire db)
• Most dbs have this built in; 3rd-party tools like innobackupex for MySQL can also be used
• The snapshot should record the exact position in the leader's replication log. This position is called the log sequence number (Postgres) or binlog coordinates (MySQL)
• Copy the snapshot to the new follower node
• The follower connects to the leader and requests all the data changes that happened after the log sequence number
• After the follower has processed the backlog of data changes, it is said to have caught up. Now the follower can continue processing data changes from the leader as they happen
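The catch-up procedure can be sketched with an in-memory leader and follower. An illustrative model only: the log format and method names are assumptions, and a real system would stream changes continuously rather than pull once.

```python
class Leader:
    def __init__(self):
        self.data = {}
        self.log = []  # replication log: (sequence_number, key, value)

    def write(self, key, value):
        self.data[key] = value
        self.log.append((len(self.log), key, value))

    def snapshot(self):
        # a consistent snapshot plus the log position it corresponds to
        # (the "log sequence number" / "binlog coordinates" of the slide)
        return dict(self.data), len(self.log)

    def changes_since(self, position):
        return self.log[position:]

class Follower:
    def __init__(self, snapshot, position):
        self.data, self.position = snapshot, position

    def catch_up(self, leader):
        # request everything after our log sequence number, then replay
        for seq, key, value in leader.changes_since(self.position):
            self.data[key] = value
            self.position = seq + 1
```

Recording the log position with the snapshot is the key step: it lets writes continue on the leader during the copy without the follower missing or double-applying any of them.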
47. Handling node outages
• Leader failure handling is trickier than follower failure handling:
• Determining that the leader has failed
• A timeout is the most popular strategy to detect a leader's failure (nodes bounce messages back and forth between each other, and when a node doesn't respond for e.g. 30 secs it is assumed to be dead)
• Choosing a new leader
• Either an election process or a previously chosen controller node. The best candidate is usually the replica with the most up-to-date data changes from the old leader
• Reconfiguring the system to use the new leader
• Using request routing, clients now send writes to the new leader. When the old leader comes back, the system has to ensure that it becomes a follower and recognises the new leader
• Failover is subject to things that may go wrong
• Async replication: the new leader may not have all the writes from the old leader
• If the old leader rejoins the cluster, what should happen to those writes?
• The new leader may have received conflicting writes in the meantime!
• Commonly, these writes are discarded, which has its own problems
• Violation of clients' durability expectations
• Dangerous situations may arise if other storage systems outside the database need to be coordinated with the database contents
• Ex. the GitHub incident when an out-of-date MySQL follower was promoted to leader. Some auto-increment primary keys were reused by the old and new leaders. The same keys were used in a Redis store. This resulted in private data of some users being shared with other users
• Split brain: 2 nodes both believing they are the leader
• Sometimes this leads to a shutdown of both systems
• Timeout: choosing the right timeout for a leader to be declared dead
• A short timeout can lead to unnecessary failovers
• A long timeout can be due to load on the network or a traffic spike; any failure during such a situation can worsen it further
Due to the lack of easy solutions to these problems, many devops teams prefer manual failover even if the software supports automatic failover
48. Implementations of replication logs
Statement-based
• Every write statement is logged and sent to followers, i.e. every insert, update and delete statement is forwarded. The follower parses and executes the statement as if it had been received from a client
• Cons: a statement that calls a non-deterministic function like NOW() or RAND() is likely to generate a different value on each replica. Auto-increment may have a different effect if statements execute in a different order
• Used in MySQL before version 5.1. Now MySQL switches to row-based replication whenever any non-determinism is present in a statement
Write-ahead log (WAL) shipping
• The database's own log is used to build a replica on another node - both log-structured storage engines and B-trees use a log in some way to store the data
• Cons: the log describes data at a very low level, including details like which bytes were changed in which disk block. This creates tight coupling with the storage engine - a zero-downtime upgrade of the database software by first upgrading followers and then making one of them the leader is not possible
• Used in PostgreSQL and Oracle
Logical log
• Uses different log formats for replication and for the storage engine. A transaction that modifies several rows generates several such log records
• Pros: decouples the replication log from the storage engine, so a zero-downtime upgrade is possible. The logical log can also be sent to external systems such as a data warehouse, custom indexes, or caches
• MySQL's binlog, when configured to use row-based replication, uses this approach
Trigger-based
• Involves application code; replication is moved up to the application layer, e.g. when only a subset of the data is to be replicated, or when replicating from one kind of database to another. Triggers and stored procedures are used to achieve this
• Cons: greater overhead than other replication methods; more prone to bugs and limitations
• Databus for Oracle and Bucardo for Postgres
49. Multi-leader replication
• A somewhat retrofitted feature in many databases
• Often causes pitfalls and problems with other database features
• Auto-incrementing keys
• Triggers
• Integrity constraints
• Multi-leader replication is often considered dangerous territory that should be avoided if possible
50. Use cases for multi-leader replication
• Multi-datacenter operation
• Performance is better, as every write can be processed in the local/nearest datacenter
• Datacenter outages can be better tolerated
• Network problems can be better tolerated
• Clients with offline operation
• Calendar apps on mobile phones, laptops and other devices need to allow creating/editing/viewing calendar events even when not connected to the internet
• All offline changes need to be synced with the server and other devices when the device is next online
• Each device's local database acts as a leader, and there is an async multi-leader replication process between the replicas of the calendar on all devices
• There is a rich history of broken calendar sync implementations; multi-leader replication is a tricky thing to get right
• CouchDB is designed to make this use case easier
• Collaborative editing
• Google Docs, Etherpad
• Changes are instantly applied to the local replica and asynchronously replicated to the server and to other users editing the same document
• To guarantee no editing conflicts, a user must obtain a lock before editing the document
• For faster collaboration, the unit of change is made very small, e.g. a single keystroke
51. Conflict resolution in multi-leader replication
• Custom conflict resolution
• On write
• Bucardo works this way
• When a conflict is detected, the database calls a conflict handler; in Bucardo it can be a Perl script
• The handler runs in the background and cannot prompt the user
• On read
• CouchDB works this way
• When a conflict is detected, all conflicting writes are stored
• The next time the data is read, all the versions of the data are returned to the application code
• The application may prompt the user, or resolve the conflict automatically, and write the result back to the database
• Automatic conflict resolution
• Used by Amazon
• Products removed from the cart frequently reappear in the cart due to errors in the conflict resolution logic
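Resolution-on-read, the CouchDB-style approach above, can be sketched as follows. A toy model: the sibling storage and the resolver signature are illustrative assumptions.

```python
class MultiVersionStore:
    """Keep conflicting writes as 'siblings' and let the application
    resolve them on read (CouchDB-style, simplified)."""
    def __init__(self):
        self.versions = {}  # key -> list of conflicting sibling values

    def write_from_replica(self, key, value):
        # a concurrent write arriving from another leader adds a sibling
        self.versions.setdefault(key, []).append(value)

    def read(self, key, resolve):
        siblings = self.versions.get(key, [])
        if len(siblings) <= 1:
            return siblings[0] if siblings else None
        winner = resolve(siblings)     # application-supplied resolution
        self.versions[key] = [winner]  # write the resolved value back
        return winner

store = MultiVersionStore()
store.write_from_replica("cart:alice", {"items": ["book"]})
store.write_from_replica("cart:alice", {"items": ["pen"]})

# Resolve by taking the union of items: neither write is lost, but - as
# with the Amazon cart example above - a deleted item can reappear.
merged = store.read(
    "cart:alice",
    lambda sibs: {"items": sorted({i for s in sibs for i in s["items"]})},
)
```

The resolver is where the Amazon-style bugs live: a merge that unions items never loses an add, but it also resurrects removals, which is exactly the reappearing-cart symptom.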
52. Ensuring consistency in multi-leader replication
• Pessimistic locking
• Wait for your turn
• Optimistic locking
• Early bird gets the worm
• Conflict resolution
• Your mother cleans up later
• Conflict avoidance
• Solve the problem by not having it
Ref: https://www.percona.com/live/mysql-conference-2013/sessions/state-art-mysql-multi-master-replication (slide 7)
53. Microservices at Uber
• Microservices bring benefits like:
• Each team owning its own release cycle
• Each team responsible for its own uptime
• Microservices have challenges like:
• The aggregate velocity can be much slower; e.g. the Java team has to figure out how to talk to the metrics system, and so do the Node people and the Go people
• A hard-fought bug on one platform also has to be fought on the other platforms
• "I hadn't expected the cost of multiple languages to be as high as it was" — Matt Ranney (Uber's Chief Systems Architect)
Architecture notes (from the accompanying diagram):
• Present in lots of data centres around the world
• TLS termination at the front end
• Riak clusters manage the state of all in-progress jobs
• Completed jobs travel from Marketplace to other logic systems through Kafka
• Marketplace - the dispatch system, which supports all sorts of logistics including rides, UberEATS etc.; written in Node.js, Java, Go
• Other queues execute other workflows, e.g. prompting the user to get the receipt and rate the trip
• Map services compute the ETAs and routes for the trip; some high-throughput systems are written in Java
• All Kafka streams go to Hadoop for analytical processing
• Moving towards type-safe and verifiable interfaces between services, as the cost of type-unsafe JSON is too high
• A lot of early code used JSON over HTTP, which makes it hard to validate interfaces
• An army of mobile phones around the world does black-box testing