The document discusses various MySQL indexing concepts like primary key indexes, secondary indexes, clustered indexes and hash indexes. It explains how indexes are used based on the left prefix rule and selectivity. It also covers storage engines like InnoDB and MyISAM. The document discusses locking errors like lock wait timeouts and deadlocks. It explains isolation levels like repeatable read, read committed and serializable. It provides details about the Aurora undo log and how it differs from vanilla MySQL. It emphasizes monitoring MySQL using the error log, slow query log and metrics. It also briefly discusses Aurora parallel queries.
CTF3, Stripe's third Capture-the-Flag, focused on distributed systems engineering with a goal of learning to build fault-tolerant, performant software while playing around with a bunch of cool cutting-edge technologies.
More here: https://stripe.com/blog/ctf3-launch.
During the continuous mORMot refactoring, some core part of the framework was rewritten. In this session, we propose a journey to a refactoring of a single loop. It will take us from a naïve but working approach, to a 10 times faster Pascal rewrite, and then introduce how SSE2 and AVX2 assembly could boost the process even further – to reach more than 30 times improvement! No previous knowledge of assembly is needed: we will try to introduce how modern CPUs work, and will have some fun with algorithms and SIMD parallelism.
Search at Twitter: Presented by Michael Busch (Twitter) - Lucidworks
Twitter processes over 500 million tweets per day and more than 2 billion search queries per day. The company uses a search architecture based on Lucene with custom extensions. This includes an in-memory real-time index optimized for concurrency without locks, and a schema-based document factory. Future work includes support for parallel index segments and additional Lucene features.
This document provides an overview of concurrency concepts including:
- The speaker discusses some basic JVM knowledge, concurrency concepts, and personal design suggestions.
- Key concurrency topics like reorder, happens-before relationships, volatile fields, and CopyOnWriteArrayList are summarized.
- Common concurrency patterns like producer-consumer, readers-writers, and dining philosophers problems are explained.
- The goal is to make developers aware of concurrency issues and provide resources to study the topic further.
This document summarizes the basics of memory management in Python. It discusses key concepts like variables, objects, references, and reference counting. It explains how Python uses reference counting with generational garbage collection to manage memory and clean up unused objects. The document also covers potential issues with reference counting like cyclic references and threads, and how the global interpreter lock impacts multi-threading in Python.
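The reference-counting and cyclic-garbage behavior described above can be observed directly from the standard library. A minimal sketch using `sys.getrefcount` and the `gc` module (the exact counts printed are interpreter-dependent):

```python
import sys
import gc

# Every Python object carries a reference count; binding a name adds a reference.
data = [1, 2, 3]
alias = data                      # second reference to the same list object
# getrefcount reports one extra reference for its own argument.
rc = sys.getrefcount(data)
print(rc)                         # typically 3: data, alias, and the argument

del alias                         # dropping a reference decrements the count

# Reference counting alone cannot reclaim cycles; the generational GC handles them.
class Node:
    def __init__(self):
        self.partner = None

a, b = Node(), Node()
a.partner, b.partner = b, a       # reference cycle: counts never reach zero
del a, b
collected = gc.collect()          # cycle detector reclaims the pair
print(collected >= 2)             # True: the unreachable Node objects were found
```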
Approximate "Now" is Better Than Accurate "Later" - NUS-ISS
How does Twitter track the top trending topics?
How does Amazon keep track of the top-selling items for the day?
How many cabs have been booked this month using your App?
Is the password that a new user is choosing a common/compromised password?
Modern web-scale systems process billions of transactions and generate terabytes of data every single day. To answer questions against this data, one would typically initiate a multi-minute query against a NoSQL datastore or kick off a batch job in a distributed processing framework such as Spark or Flink. These jobs are throughput-heavy and not suited to realtime low-latency queries. Yet you and your customers would like to have all this information "right now".
At the end of this talk, you'll realize that you can power these low-latency queries with an incredibly low memory footprint, "IF" you are willing to accept answers that are, say, 96-99% accurate. This talk introduces some of the go-to probabilistic data structures used by organisations with large amounts of data - specifically the Bloom filter, Count-Min Sketch and HyperLogLog.
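The password-check question above is exactly what a Bloom filter answers: "definitely not seen" or "probably seen" in a few bits per item. A minimal illustrative sketch (the sizes `m_bits` and `k_hashes` are arbitrary here, and real deployments would size them for a target false-positive rate):

```python
import hashlib

class BloomFilter:
    """Toy Bloom filter: k hash positions over an m-bit array.
    No false negatives; small chance of false positives."""

    def __init__(self, m_bits=1024, k_hashes=3):
        self.m = m_bits
        self.k = k_hashes
        self.bits = 0                     # a Python int doubles as a bit array

    def _positions(self, item):
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, item):
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def might_contain(self, item):
        return all(self.bits >> pos & 1 for pos in self._positions(item))

# The "compromised password" use case: one membership test, no password list scan.
compromised = BloomFilter()
for pw in ["123456", "password", "qwerty"]:
    compromised.add(pw)

print(compromised.might_contain("password"))        # True (was added)
print(compromised.might_contain("correct horse"))   # almost certainly False
```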
The document discusses data-oriented design principles for game engine development in C++. It emphasizes understanding how data is represented and used to solve problems, rather than focusing on writing code. It provides examples of how restructuring code to better utilize data locality and cache lines can significantly improve performance by reducing cache misses. Booleans packed into structures are identified as having extremely low information density, wasting cache space.
The Ring programming language version 1.9 book - Part 100 of 210 - Mahmoud Samir Fayed
This document provides reference information about the Ring programming language, including lists of its keywords, built-in functions, and types of errors. It describes the language's architecture as having Ring applications that use Ring libraries and extensions written in C/C++. It also covers Ring's memory management using garbage collection, and how data like strings, numbers, and binary data are represented internally.
Back to Basics 3: Scaling 30,000 Requests a Second with MongoDB - MongoDB
Mike Chesnut of Crittercism discussed how they scaled their MongoDB deployment to handle over 30,000 requests per second. He explained how they initially started with a single mongos process which caused problems as load increased. They overcame this by introducing a separate mongos tier for client connections. Chesnut also discussed how not running the balancer properly led to an unbalanced cluster, and the steps they took to manually balance chunks until the balancer could be safely re-enabled. He emphasized designing for scalability from the start and adapting architecture over time.
Title: Rethink Package Components on De-Duplication: From Logical Sharing to Physical Sharing
URL: http://events.linuxfoundation.org/2010/linuxcon-japan/suzaki
Abstract: Linux distributions include many logical sharing techniques (shared library, symbolic link, etc) on memory and storage. Unfortunately they cause security and management problems, such as GOT (Global Offset Table) overwrite attack, TOCTTOU (Time Of Check To Time Of Use) attack, Dependency hell, etc. In order to mitigate the problems, we propose the replacement of logical sharing by physical sharing (memory and disk deduplication; e.g., KSM: Kernel Samepage Merging, Content Addressable Storage, etc). Original ELF binaries are transformed into self-contained binaries which include dynamically linked shared libraries as "pseudo-static". The binaries become fat, but physical resource usage is mitigated by deduplication. We have investigated the effect on Debian and Ubuntu and confirmed that the physical impact is low. Data centers of Cloud Computing utilize the deduplication techniques. Users and administrators should consider Linux images in terms of security and maintenance, and the usage of deduplicated fat binaries.
Managing Data and Operation Distribution In MongoDB - Jason Terpko
In a sharded MongoDB cluster, scale and data distribution are defined by your shard keys. Even when choosing the correct shard key, ongoing maintenance and review can still be required to maintain optimal performance.
This presentation will review shard key selection and how the distribution of chunks can create scenarios where you may need to manually move, split, or merge chunks in your sharded cluster. Scenarios requiring these actions can exist with both optimal and sub-optimal shard keys. Example use cases will provide tips on selection of shard key, detecting an issue, reasons why you may encounter these scenarios, and specific steps you can take to rectify the issue.
Signed and unsigned variables can represent positive and negative numbers, or just positive numbers respectively. Declaring a variable as unsigned increases its maximum positive range. Storage classes like auto, static, extern and register provide information about a variable's location and visibility. Bitwise operators operate on binary representations of numbers and are commonly used for tasks like compression/encryption that require bit-level manipulation.
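The bit-level manipulation that summary refers to is easiest to show concretely. A short sketch, in Python for consistency with the other examples here (the operators `<<`, `>>`, `|`, `&`, `^` behave the same way in C):

```python
# Packing two 4-bit values into one byte, then unpacking them - the kind of
# bit-level manipulation bitwise operators are used for.
hi, lo = 0b1010, 0b0011

packed = (hi << 4) | lo              # shift hi into the upper nibble, OR in lo
print(bin(packed))                   # 0b10100011

unpacked_hi = (packed >> 4) & 0x0F   # shift down, mask off everything above
unpacked_lo = packed & 0x0F          # mask keeps only the lower nibble
print(unpacked_hi == hi, unpacked_lo == lo)   # True True

# XOR underlies many simple ciphers: applying the same key twice
# restores the original value.
key = 0b01011010
cipher = packed ^ key
print(cipher ^ key == packed)        # True
```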
This document provides an overview of managing data and operations distribution in MongoDB sharded clusters. It discusses shard key selection, applying and reverting shard keys, automatic and manual chunk splitting and merging, balancing data distribution across shards, and cleaning up orphaned documents. The goal is to optimally distribute data and load across shards in the cluster.
This document provides an introduction and overview of the Python programming language. It discusses what Python is, its features, applications, and how to install Python on Windows and Linux systems. It also covers Python basics like variables, data types, operators, comments, conditional statements like if/else, and loops like for, while, and nested loops. Examples are provided for key concepts. The document is intended as a beginner tutorial for learning Python.
Counting and sorting are basic tasks that distributed systems rely on. The document discusses different approaches for distributed counting and sorting, including software combining trees, counting networks, and sorting networks. Counting networks like bitonic and periodic networks have depth O(log² w), where w is the network width. Sorting networks can sort in the same time complexity by exploiting an isomorphism between counting and sorting networks. Sample sorting is also discussed as a way to sort large datasets across multiple threads.
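What makes a sorting network different from an ordinary sort is that its compare-exchange steps are fixed in advance, independent of the data, which is why they parallelize well. A tiny sketch using the standard optimal 5-comparator network for 4 inputs:

```python
from itertools import permutations

# A sorting network is a fixed sequence of compare-exchange operations:
# the same comparators run regardless of the input values.
COMPARATORS_4 = [(0, 1), (2, 3), (0, 2), (1, 3), (1, 2)]

def network_sort(values, comparators=COMPARATORS_4):
    v = list(values)
    for i, j in comparators:
        if v[i] > v[j]:              # compare-exchange: swap if out of order
            v[i], v[j] = v[j], v[i]
    return v

# The five fixed comparators sort every permutation of four elements.
print(all(network_sort(p) == sorted(p) for p in permutations([3, 1, 4, 2])))  # True
```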
Sangam 18 - Database Development: Return of the SQL Jedi - Connor McDonald
A look at the techniques that middle tier developers can employ to get greater value out of their applications, simply by having an understanding of how the database works and how to make it sing.
What every web and app developer should know about multithreading - Ilya Haykinson
Multithreading allows executing multiple tasks simultaneously by splitting a program into multiple threads. It is useful for improving performance on web applications and when disk/network I/O is involved. However, threads introduce complexity as code and data may be accessed concurrently. Synchronization techniques like mutexes are needed to coordinate thread access and prevent race conditions from causing unpredictable behavior. Developers must carefully manage shared resources and avoid deadlocks when using multithreading.
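The mutex coordination described above looks much the same in any language; here is a minimal Python sketch, where `threading.Lock` plays the role of the mutex guarding a shared counter:

```python
import threading

counter = 0
lock = threading.Lock()

def increment(n):
    global counter
    for _ in range(n):
        with lock:               # the mutex makes read-modify-write atomic
            counter += 1

# Four threads hammer the same shared variable.
threads = [threading.Thread(target=increment, args=(100_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter)  # 400000 - without the lock, interleaved updates could be lost
```

Acquiring locks via `with` also sidesteps one common deadlock source: the lock is always released, even if the guarded code raises.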
APEX Connect 2019 - array/bulk processing in PLSQL - Connor McDonald
A beginner-level talk on the syntax for bulk processing in PL/SQL, why it is so important for performance and scalability, and how to diagnose errors when things go wrong.
The document describes a cache-aware hybrid sorter that is faster than the STL sort. It first radix sorts input streams into substreams that fit into the CPU cache. This is done in a cache-friendly manner by splitting streams based on cache size. The substreams are then merged using a loser tree merge, which has better memory access patterns than a heap-based priority queue. Testing showed the hybrid sort was 2-6 times faster than STL sort and scaled well on multi-core CPUs.
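The radix-partitioning step is the heart of that design: each pass buckets values by one byte, so a pass touches only small, predictable ranges of memory. A compact LSD radix sort sketch (in Python for illustration; the cache argument of course only bites in a systems language):

```python
def radix_sort(values, byte_width=4):
    """LSD radix sort for non-negative integers: one stable bucketing
    pass per byte, least-significant byte first."""
    for shift in range(0, byte_width * 8, 8):
        buckets = [[] for _ in range(256)]
        for v in values:
            buckets[(v >> shift) & 0xFF].append(v)      # bucket by current byte
        values = [v for bucket in buckets for v in bucket]  # stable concatenation
    return values

data = [90210, 7, 512, 7, 303, 65536]
print(radix_sort(data))  # [7, 7, 303, 512, 65536, 90210]
```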
Cassandra introduction, ApacheCon 2014 Budapest - Duyhai Doan
This document provides an introduction and summary of Cassandra presented by Duy Hai Doan. It discusses Cassandra's history as a NoSQL database created at Facebook and open sourced in 2008. The key architecture of Cassandra including its data distribution across nodes, replication for failure tolerance, and consistency models for reads and writes is summarized.
This presentation from ROBO INDIA covers all the essentials needed to learn the C programming language.
The slides also explain each of these topics in detail.
We welcome all your views and queries. Please write to us; we can be found at:
website: http://roboindia.com
mail: info@roboindia.com
The nightmare of locking, blocking and isolation levels! - Boris Hristov
I am sure you all know that troubleshooting problems related to locking and blocking (hey, sometimes there are deadlocks too) can be a real nightmare! In this session, you will see and understand why and how locking actually works, what problems it causes, and how we can use isolation levels and various other techniques to resolve them!
The JVM memory model describes how threads in the Java ecosystem interact through memory. While the memory model's impact on developing for the JVM may not be obvious, it is the cause of a certain number of "anomalies" that are, well, by design.
In this presentation we will explore the aspects of the memory model, including things like reordering of instructions, volatile members, monitors, atomics and JIT.
Basics in algorithms and data structures - Eman Magdy
The document discusses data structures and algorithms. It notes that good programmers focus on data structures and their relationships, while bad programmers focus on code. It then provides examples of different data structures like trees and binary search trees, and algorithms for searching, inserting, deleting, and traversing tree structures. Key aspects covered include the time complexity of different searching algorithms like sequential search and binary search, as well as how to implement operations like insertion and deletion on binary trees.
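The binary search tree operations mentioned there fit in a few lines. A minimal sketch of insert, search, and in-order traversal:

```python
class BSTNode:
    def __init__(self, key):
        self.key, self.left, self.right = key, None, None

def insert(root, key):
    """Walk down comparing keys; O(log n) on a balanced tree, O(n) worst case."""
    if root is None:
        return BSTNode(key)
    if key < root.key:
        root.left = insert(root.left, key)
    elif key > root.key:
        root.right = insert(root.right, key)
    return root

def search(root, key):
    """Binary search: each comparison discards one subtree."""
    while root is not None and root.key != key:
        root = root.left if key < root.key else root.right
    return root is not None

def inorder(root):
    """In-order traversal visits keys in sorted order."""
    return inorder(root.left) + [root.key] + inorder(root.right) if root else []

root = None
for k in [8, 3, 10, 1, 6]:
    root = insert(root, k)

print(inorder(root))      # [1, 3, 6, 8, 10]
print(search(root, 6))    # True
print(search(root, 7))    # False
```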
This document provides an overview of the Python programming language in under 90 minutes. It covers Python basics like Hello World, variables, data types, objects, functions, conditionals, and more. The goal is to teach readers enough Python to read, write, and understand basic Python programs in a short period of time. It also provides references to additional resources like the author's book for learning Python in more depth.
- Publishers send messages to topics in Apache Kafka which are partitioned across brokers
- Brokers append messages to the ends of partitions and subscribers can request messages from specific offsets in partitions
- This allows subscribers to replay processing from any point in time as they request messages based on offset rather than relying on brokers to deliver messages
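The offset-based pull model in the points above can be sketched as a toy in-memory partition. This is an illustration of the idea, not the Kafka client API:

```python
class Partition:
    """A toy append-only log: one Kafka-style partition on the broker side."""

    def __init__(self):
        self._log = []

    def append(self, message):
        self._log.append(message)          # messages only ever go on the end
        return len(self._log) - 1          # offset of the stored message

    def read_from(self, offset, max_messages=10):
        """Consumers pull by offset, so replay is just re-reading from an
        earlier position; the broker never tracks delivery state."""
        return self._log[offset:offset + max_messages]

p = Partition()
for event in ["signup", "login", "purchase"]:
    p.append(event)

print(p.read_from(0))   # ['signup', 'login', 'purchase']  (full replay)
print(p.read_from(2))   # ['purchase']                     (resume mid-stream)
```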
This document discusses using the relay log as a solution for failover in a multi-source replication scenario where the binary log positions and transaction IDs are different across slaves. The relay log contains the same position and transaction ID for a transaction on all slaves, allowing one to be used as a global transaction ID for failover. Specifically, a modified client was created to dump the relay log, and the MySQL server was updated to support relay log dumping, providing a hero - the unsung relay log - for high availability when GTIDs were not available.
The document discusses data storage and processing. Data can be stored in memory or on disk using file systems like local XFS/ZFS or distributed systems like HDFS, S3, or Ceph. Distributed file systems allow for parallel processing of data by moving computation to the data locations. This map-reduce framework involves mapping functions to distributed data segments followed by reducing the results. Hadoop uses HDFS for storage and the MapReduce framework for distributed computation on large datasets across clusters.
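The map-then-reduce flow described above is usually introduced with word count. A single-process sketch of the two phases (in a real cluster, the map phase runs in parallel next to each data segment and the framework handles the shuffle):

```python
from collections import defaultdict

# Word count, the canonical MapReduce example.
segments = ["the quick brown fox", "the lazy dog", "the fox"]

def map_phase(segment):
    # Map: turn one data segment into (word, 1) pairs.
    return [(word, 1) for word in segment.split()]

def reduce_phase(pairs):
    # Reduce: group pairs by key and sum the counts.
    totals = defaultdict(int)
    for word, count in pairs:
        totals[word] += count
    return dict(totals)

mapped = [pair for seg in segments for pair in map_phase(seg)]
counts = reduce_phase(mapped)
print(counts["the"])   # 3
print(counts["fox"])   # 2
```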
My attempt to demystify datastores.
How do you choose a store that fits your needs, and what questions do you need to ask?
Covers HBase, Hadoop, MySQL, Cassandra, Vertica, etc.
Docker allows for the delivery of applications using containers. Containers are lightweight and allow multiple applications to run on the same host, unlike virtual machines, which each require their own operating system. Docker images contain the contents and configuration needed to run an application; images are built from manifests, with layers of content and configuration added on top. Running containers from images allows applications to be easily delivered and run. Containers can be connected to volumes to preserve data when the container is deleted. Docker networking allows containers to communicate, and ports can be exposed to the host.
StormWars - when the data stream shrinks - Vishnu Rao
Apache Storm is a stream processing framework that can be used to process real-time data from data streams like Apache Kafka or Amazon Kinesis. When data in Amazon Kinesis is repartitioned into new shards, the partition metadata used by Storm becomes invalid. To address this, a solution is to define a white list of shards for each Storm topology, so that individual topologies are not affected when shards are added or removed from the stream.
The document proposes a "Punch Clock" concept to help debug Apache Storm transactional topologies. A Punch Clock would record when batches of tuples enter and exit spouts and bolts. Each spout/bolt would have a Punch Card ID to track the batch. Punching in would add the ID to a data structure, punching out would remove it. This would help identify batches stuck in specific spouts/bolts on hosts. It could be exposed via JMX to aggregate data across worker JVMs running the spouts/bolts. The goal is to determine batch flow through the topology and find any that are stuck.
a wild Supposition: can MySQL be Kafka? - Vishnu Rao
Apache Kafka is a distributed publish-subscribe messaging system that uses a distributed commit log to store messages. It allows applications to publish and subscribe to streams of records. The document discusses some similarities and differences between using MySQL and Kafka for messaging, such as Kafka's ability to horizontally scale by adding more brokers with multiple partitions, versus the single partition in a MySQL instance. While unconventional, the document proposes that MySQL could potentially be used as a messaging system like Kafka.
Build your own Real Time Analytics and Visualization, Enable Complex Event Pr... - Vishnu Rao
At the 5th Elephant BigData conference in Bangalore, India, 27 July 2012.
https://fifthelephant.talkfunnel.com/2012/384-build-your-own-real-time-analytics-and-visualization-enable-complex-event-processing-event-patterns-and-aggregates
Essentials of Automations: Exploring Attributes & Automation Parameters - Safe Software
Building automations in FME Flow can save time, money, and help businesses scale by eliminating data silos and providing data to stakeholders in real-time. One essential component to orchestrating complex automations is the use of attributes & automation parameters (both formerly known as “keys”). In fact, it’s unlikely you’ll ever build an Automation without using these components, but what exactly are they?
Attributes & automation parameters enable the automation author to pass data values from one automation component to the next. During this webinar, our FME Flow Specialists will cover leveraging the three types of these output attributes & parameters in FME Flow: Event, Custom, and Automation. As a bonus, they’ll also be making use of the Split-Merge Block functionality.
You’ll leave this webinar with a better understanding of how to maximize the potential of automations by making use of attributes & automation parameters, with the ultimate goal of setting your enterprise integration workflows up on autopilot.
Freshworks Rethinks NoSQL for Rapid Scaling & Cost-Efficiency - ScyllaDB
Freshworks creates AI-boosted business software that helps employees work more efficiently and effectively. Managing data across multiple RDBMS and NoSQL databases was already a challenge at their current scale. To prepare for 10X growth, they knew it was time to rethink their database strategy. Learn how they architected a solution that would simplify scaling while keeping costs under control.
Conversational agents, or chatbots, are increasingly used to access all sorts of services using natural language. While open-domain chatbots - like ChatGPT - can converse on any topic, task-oriented chatbots - the focus of this paper - are designed for specific tasks, like booking a flight, obtaining customer support, or setting an appointment. Like any other software, task-oriented chatbots need to be properly tested, usually by defining and executing test scenarios (i.e., sequences of user-chatbot interactions). However, there is currently a lack of methods to quantify the completeness and strength of such test scenarios, which can lead to low-quality tests, and hence to buggy chatbots.
To fill this gap, we propose adapting mutation testing (MuT) for task-oriented chatbots. To this end, we introduce a set of mutation operators that emulate faults in chatbot designs, an architecture that enables MuT on chatbots built using heterogeneous technologies, and a practical realisation as an Eclipse plugin. Moreover, we evaluate the applicability, effectiveness and efficiency of our approach on open-source chatbots, with promising results.
inQuba Webinar Mastering Customer Journey Management with Dr Graham Hill - LizaNolte
HERE IS YOUR WEBINAR CONTENT! 'Mastering Customer Journey Management with Dr. Graham Hill'. We hope you find the webinar recording both insightful and enjoyable.
In this webinar, we explored essential aspects of Customer Journey Management and personalization. Here’s a summary of the key insights and topics discussed:
Key Takeaways:
Understanding the Customer Journey: Dr. Hill emphasized the importance of mapping and understanding the complete customer journey to identify touchpoints and opportunities for improvement.
Personalization Strategies: We discussed how to leverage data and insights to create personalized experiences that resonate with customers.
Technology Integration: Insights were shared on how inQuba’s advanced technology can streamline customer interactions and drive operational efficiency.
"What does it really mean for your system to be available, or how to define w... - Fwdays
We will talk about system monitoring from a few different angles. We will start by covering the basics, then discuss SLOs, how to define them, and why understanding the business well is crucial for success in this exercise.
This talk will cover ScyllaDB Architecture from the cluster-level view and zoom in on data distribution and internal node architecture. In the process, we will learn the secret sauce used to get ScyllaDB's high availability and superior performance. We will also touch on the upcoming changes to ScyllaDB architecture, moving to strongly consistent metadata and tablets.
What is an RPA CoE? Session 1 – CoE Vision - DianaGray10
In the first session, we will review the organization's vision and how this has an impact on the COE Structure.
Topics covered:
• The role of a steering committee
• How do the organization’s priorities determine CoE Structure?
Speaker:
Chris Bolin, Senior Intelligent Automation Architect Anika Systems
The Department of Veteran Affairs (VA) invited Taylor Paschal, Knowledge & Information Management Consultant at Enterprise Knowledge, to speak at a Knowledge Management Lunch and Learn hosted on June 12, 2024. All Office of Administration staff were invited to attend and received professional development credit for participating in the voluntary event.
The objectives of the Lunch and Learn presentation were to:
- Review what KM ‘is’ and ‘isn’t’
- Understand the value of KM and the benefits of engaging
- Define and reflect on your “what’s in it for me?”
- Share actionable ways you can participate in Knowledge Capture & Transfer
Northern Engraving | Nameplate Manufacturing Process - 2024 - Northern Engraving
Manufacturing custom quality metal nameplates and badges involves several standard operations. Processes include sheet prep, lithography, screening, coating, punch press and inspection. All decoration is completed in the flat sheet with adhesive and tooling operations following. The possibilities for creating unique durable nameplates are endless. How will you create your brand identity? We can help!
"Frontline Battles with DDoS: Best practices and Lessons Learned", Igor Ivaniuk - Fwdays
In this talk we will discuss DDoS protection tools and best practices, network architectures, and what AWS has to offer. We will also look into one of the largest DDoS attacks on Ukrainian infrastructure, which happened in February 2022. We'll see what techniques helped keep web resources available for Ukrainians, and how AWS improved DDoS protection for all customers based on the Ukraine experience.
ScyllaDB is making a major architecture shift. We’re moving from vNode replication to tablets – fragments of tables that are distributed independently, enabling dynamic data distribution and extreme elasticity. In this keynote, ScyllaDB co-founder and CTO Avi Kivity explains the reason for this shift, provides a look at the implementation and roadmap, and shares how this shift benefits ScyllaDB users.
High performance Serverless Java on AWS - GoTo Amsterdam 2024 - Vadym Kazulkin
Java has been one of the most popular programming languages for many years, but it used to have a hard time in the Serverless community. Java is known for its high cold start times and high memory footprint compared to other programming languages like Node.js and Python. In this talk I'll look at the general best practices and techniques we can use to decrease memory consumption and cold start times for Java Serverless development on AWS, including GraalVM (Native Image) and AWS's own offering SnapStart, based on Firecracker microVM snapshot and restore and CRaC (Coordinated Restore at Checkpoint) runtime hooks. I'll also provide a lot of benchmarking on Lambda functions, trying out various deployment package sizes, Lambda memory settings, Java compilation options and HTTP (a)synchronous clients, and measuring their impact on cold and warm start times.
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf - Chart Kalyan
A Mix Chart displays historical data of numbers in a graphical or tabular form. The Kalyan Rajdhani Mix Chart specifically shows the results of a sequence of numbers over different periods.
[OReilly Superstream] Occupy the Space: A grassroots guide to engineering (an... - Jason Yip
The typical problem in product engineering is not bad strategy, so much as “no strategy”. This leads to confusion, lack of motivation, and incoherent action. The next time you look for a strategy and find an empty space, instead of waiting for it to be filled, I will show you how to fill it in yourself. If you’re wrong, it forces a correction. If you’re right, it helps create focus. I’ll share how I’ve approached this in the past, both what works and lessons for what didn’t work so well.
What is an RPA CoE? Session 2 – CoE Roles - DianaGray10
In this session, we will review the players involved in the CoE and how each role impacts opportunities.
Topics covered:
• What roles are essential?
• What place in the automation journey does each role play?
Speaker:
Chris Bolin, Senior Intelligent Automation Architect Anika Systems
18. Index use
● Left prefix rule
○ Composite index on (Name, ID, country)
■ Select … where ID = 3 and country = us → cannot use the index (leftmost column Name is absent)
■ Select … where ID = 3 → cannot use the index
■ Select … where country = us → cannot use the index
■ Select … where Name = ‘blah’ and country = us → uses the Name prefix only
■ Select … where Name = ‘blah’ and ID = 4 → uses the (Name, ID) prefix
● Duplicate indexes
○ Name, country → redundant: it is a left prefix of the next index
○ Name, country, zip
○ Name, zip → not redundant: zip is not in a usable prefix position above
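The left-prefix behaviour above can be verified directly with EXPLAIN. A minimal sketch, assuming a hypothetical `users` table (all names here are illustrative, not from the deck):

```sql
-- Hypothetical table with a composite index on (name, id_no, country)
CREATE TABLE users (
  pk      INT PRIMARY KEY AUTO_INCREMENT,
  name    VARCHAR(50),
  id_no   INT,
  country CHAR(2),
  zip     VARCHAR(10),
  KEY idx_name_id_country (name, id_no, country)
);

-- Cannot use idx_name_id_country: the leftmost column (name) is missing
EXPLAIN SELECT * FROM users WHERE id_no = 3 AND country = 'us';

-- Uses the (name) prefix only; country is filtered after the index lookup
EXPLAIN SELECT * FROM users WHERE name = 'blah' AND country = 'us';

-- Uses the (name, id_no) prefix
EXPLAIN SELECT * FROM users WHERE name = 'blah' AND id_no = 4;
```

In the EXPLAIN output, the `key` column shows whether the index was chosen, and `key_len` hints at how much of the prefix was actually usable.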
20. Index use
● Selectivity rule
○ Select … where gender = male
○ Say there is a secondary index on gender (cardinality is only 2: male/female)
○ Say 70% of rows are male
○ => in the B-tree, 70% of entries will sit under the male node
○ => since it is a secondary index, that node only holds pointers to 70% of the rows
○ What does MySQL do?
■ Does it read the index, traverse it, and then go to disk for the 70% of rows?
■ It does not - it bypasses the index and goes to disk directly
■ It does a table scan! (a sequential scan is cheaper than 70% random row lookups)
● Explain might indicate use of the index, but in practice it is not used!
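The selectivity point can be observed the same way. A sketch with a hypothetical `people` table (the table and index names are assumptions for illustration):

```sql
-- Hypothetical table with a low-cardinality secondary index
CREATE TABLE people (
  pk     INT PRIMARY KEY AUTO_INCREMENT,
  name   VARCHAR(50),
  gender ENUM('male', 'female'),
  KEY idx_gender (gender)
);

-- With ~70% of rows matching, the optimizer typically prefers a full
-- table scan (type: ALL) over the index plus 70% random row lookups
EXPLAIN SELECT * FROM people WHERE gender = 'male';

-- Forcing the index on a non-selective predicate usually makes
-- the query slower, not faster
SELECT * FROM people FORCE INDEX (idx_gender) WHERE gender = 'male';
```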
31. Lock errors
1. Lock wait timeout exceeded:
a. set global innodb_lock_wait_timeout = x; (the default is 50 sec)
b. Show engine innodb status; (when a txn is blocked, you can see what it is blocked on)
c. Show processlist; (list of connections and what they are doing)
2. Deadlocks
a. You have to do nothing. Auto resolved by MySQL - one txn wins (InnoDB usually rolls back the one that modified fewer rows).
b. Show engine innodb status; shows you the latest deadlock that occurred
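The diagnostics above map to a handful of statements. A sketch (the timeout value is an arbitrary example; `sys.innodb_lock_waits` is available from MySQL 5.7 onward):

```sql
-- Shorten the lock wait timeout (default is 50 seconds);
-- GLOBAL affects new sessions, SESSION affects only the current one
SET GLOBAL innodb_lock_wait_timeout = 10;
SET SESSION innodb_lock_wait_timeout = 10;

-- Inspect current lock waits and the latest detected deadlock
SHOW ENGINE INNODB STATUS\G

-- See all connections and what they are running
SHOW PROCESSLIST;

-- MySQL 5.7+: who is blocking whom, in one readable view
SELECT * FROM sys.innodb_lock_waits;
```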
34. Read locks
1. Select for update
a. Use carefully.
i. You might end up locking part of the index tree. (select .. where cost > 50)
b. Good practice is to select row ids first and then update (i.e. specific row locks)
i. Select id where cost > 50
ii. Update where id = x
2. Select for share
a. It’s a pure read lock: other readers can share it, but writers wait for the read to complete
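The two-step pattern from the slide can be sketched as follows, assuming a hypothetical `orders` table (names and ids are illustrative). The range predicate under FOR UPDATE would lock a chunk of the index tree; selecting ids first keeps the locks to specific rows:

```sql
-- Risky: locks part of the index range covered by the predicate
-- SELECT * FROM orders WHERE cost > 50 FOR UPDATE;

-- Step 1: plain read, no locks taken
SELECT id FROM orders WHERE cost > 50;

-- Step 2: update only the specific ids collected above;
-- row locks are taken on exactly those rows
UPDATE orders SET status = 'flagged' WHERE id IN (3, 7, 42);
```

Note the trade-off: between step 1 and step 2 other transactions may change the matching set, so this pattern fits workloads where that race is acceptable.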
38. Isolation levels
1. Repeatable read (the default in MySQL and Aurora)
a. Repeated reads inside a txn see the same snapshot, even if other txns commit in between
2. Read Committed (newly added)
a. Every read in the txn sees the latest committed state
3. Read un-Committed (not recommended)
a. A read in the txn can see the dirty state of uncommitted txns
4. Serializable
a. All txns effectively run in sequence
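Isolation levels can be inspected and set per session or per transaction. A sketch (the `orders` table is a hypothetical example; the variable is `@@transaction_isolation` in MySQL 8.0, `@@tx_isolation` in 5.7):

```sql
-- Check the current level
SELECT @@transaction_isolation;

-- Set the level for all subsequent transactions in this session
SET SESSION TRANSACTION ISOLATION LEVEL READ COMMITTED;

-- Or for just the next transaction
SET TRANSACTION ISOLATION LEVEL REPEATABLE READ;
START TRANSACTION;
SELECT COUNT(*) FROM orders;  -- reads a consistent snapshot
COMMIT;
```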
42. Why is this important ?
For every transaction:
1. Rows updated:
a. The before-image of each row is stored in the undo log. If the txn is rolled back, the before-image is restored.
b. Even newly inserted rows get undo log entries (so the insert can be undone).
c. Once the txn completes (rolled back / committed), the undo log purges the relevant entries.
2. Rows being read:
a. A snapshot is reconstructed from the undo log.
b. It helps satisfy the isolation level of the txn.
c. Once the txn completes (rolled back / committed), the undo log purges the relevant entries.
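The undo log is exactly what makes the following sketch work (the `accounts` table and values are hypothetical):

```sql
START TRANSACTION;

-- The before-image of the row goes to the undo log
UPDATE accounts SET balance = balance - 100 WHERE id = 1;

-- The insert is also undo-logged so it can be undone
INSERT INTO accounts (id, balance) VALUES (99, 0);

-- Before-images are restored, the inserted row is removed
ROLLBACK;
```

Meanwhile, a concurrent REPEATABLE READ transaction reading `accounts` is served the pre-update row versions reconstructed from the same undo log.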
49. Aurora Undo Log
● Normally in vanilla MySQL,
○ each node has its own storage.
● In Aurora, storage is shared across nodes
○ => a single undo log that is shared between the writer & the readers
● Imagine long running transactions
○ Based on the isolation level,
○ the undo log might keep growing …
○ purging / garbage collection will not occur …
○ at some point, the cluster can stall under the backlog of overdue GC
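Undo growth can be watched from SQL as well as from Aurora's RollbackSegmentHistoryListLength metric. A sketch using standard `information_schema` tables (available in MySQL 5.6+/5.7+):

```sql
-- History list length: how much undo is waiting to be purged
SELECT name, count
FROM information_schema.INNODB_METRICS
WHERE name = 'trx_rseg_history_len';

-- Find the long-running transactions that keep the purge from advancing
SELECT trx_id, trx_started, trx_mysql_thread_id
FROM information_schema.INNODB_TRX
ORDER BY trx_started
LIMIT 5;
```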
56. Writes to Aurora & Cost
● Keep an eye on IOPS
○ IOPS ++ == $ ++
● Batch your writes if possible
● Compress your data before sending.
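Batching writes, as the slide suggests, can be as simple as a multi-row INSERT. A sketch with a hypothetical `events` table:

```sql
-- One row per statement: per-statement and I/O overhead paid four times
INSERT INTO events (payload) VALUES ('a');
INSERT INTO events (payload) VALUES ('b');
INSERT INTO events (payload) VALUES ('c');
INSERT INTO events (payload) VALUES ('d');

-- Batched multi-row insert: the same rows, the overhead amortized once
INSERT INTO events (payload) VALUES ('a'), ('b'), ('c'), ('d');
```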
58. Monitoring
1. Never ignore mysql error logs. It might have something critical mentioned. Its
your best friend !
2. Can Enable slow query logs to keep track of slow running queries
3. Metrics
a. Recommend Percona PMM (available metrics are graphed nicely)
b. Buffer pool usage metrics
c. Undo log history growth / RollbackSegmentHistoryListLength metric
d. Insert latencies
e. IOPS usage
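The slow query log can be enabled at runtime without a restart. A sketch (the 1-second threshold is an arbitrary example):

```sql
-- Enable the slow query log
SET GLOBAL slow_query_log = 'ON';

-- Queries running longer than this many seconds get logged
SET GLOBAL long_query_time = 1;

-- Optional, and can be noisy on busy systems
SET GLOBAL log_queries_not_using_indexes = 'ON';

-- Where is it being written?
SHOW VARIABLES LIKE 'slow_query_log_file';
```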
59. Aurora Parallel query
1. A feature the other MySQL variants are missing.
2. Allows for parallelism within a single select query
3. Pushes work down to the storage layer, bypassing the in-memory buffer pool and scanning the table on disk :)
a. => more IOPS => more $ :)
4. Supposedly good for your reporting queries
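A rough sketch of turning it on and checking whether it applies. Caveat: the session variable name has differed across Aurora MySQL versions (`aurora_pq` in older releases, `aurora_parallel_query` in newer ones), and the `orders` table here is a hypothetical example - check the docs for your engine version:

```sql
-- Discover the variable name your Aurora version exposes
SHOW VARIABLES LIKE '%parallel%';

-- Enable for this session (newer Aurora MySQL naming assumed)
SET SESSION aurora_parallel_query = ON;

-- The Extra column of EXPLAIN mentions parallel query when it is used
EXPLAIN SELECT SUM(total) FROM orders WHERE created_at > '2024-01-01';
```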
63. Other helpful stuff
1. Use START TRANSACTION READ ONLY; (less bookkeeping for read-only txns)
2. Run an explain on your query; be aware if an index is used.
a. Explains are not always accurate
3. Show processlist; (I used it to kill long running / sleeping transactions - no mercy :) )
4. Show engine innodb status;
5. You have an index on the group by columns, but the order by columns are not in the index?
6. Joining 2 tables - think of 2 for loops (keep the outer for loop short)
7. Query Cache - apparently works well in Aurora! (discouraged in RDS/MySQL)
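Items 1 and 3 above can be sketched as follows (the `orders` table and the connection id are hypothetical examples):

```sql
-- Read-only transaction: InnoDB skips transaction-id and undo bookkeeping
START TRANSACTION READ ONLY;
SELECT COUNT(*) FROM orders;
COMMIT;

-- Find a long-running or sleeping connection (note the Id and Time columns),
-- then terminate it
SHOW PROCESSLIST;
KILL 12345;
```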
64. Finally
● Make 1 change at a time
○ Change
○ See effect
○ Make next change
● Keep an eye on $ cost
66. Select QNS from you;
select Thank you from me;
Who am I ?
Ex-MySQL guy at Flipkart / Data guy at Trustana
linkedin.com/in/213vishnu/
mash213.wordpress.com/conferences/
https://twitter.com/sweetweet213