This presentation is from the Gophercon-India where we talked about how to design a concurrent high performance database client in go language. We talked about how we use goroutines and channels to our advantages. we also talked about how to use pools for efficient memory utilization.
Let's Compare: A Benchmark review of InfluxDB and ElasticsearchInfluxData
In this webinar, Ivan K will compare the performance and features of InfluxDB and Elasticsearch for common time-series workloads, specifically looking at the rates of data ingestion, on-disk data compression, and query performance. Come hear about how Ivan conducted his tests to determine which time-series db would best fit your needs. We will reserve 15 minutes at the end of the talk for you to ask Ivan directly about his test processes and independent viewpoint.
Kafka Summit SF 2017 - One Day, One Data Hub, 100 Billion Messages: Kafka at ...confluent
LINE is a messaging service with 200+ million active users. I will introduce why we feed 100+ billion daily messages into Kafka and how various systems such as data sync, abuse detection and analysis are depending on and leveraging it. It will be also introduced how we leverage dynamic tracing tools like SystemTap to inspect broker’s performance on production system, which led me to fix KAFKA-4614.
Presented by Yuto Kawamura, LINE Corporation
Systems Track
HBaseCon 2015: OpenTSDB and AsyncHBase UpdateHBaseCon
OpenTSDB continues to scale along with HBase. A number of updates have been implemented to push writes over 2 million data points a second. Here we will discuss about HBase schema improvements, including salting, random UI assignment, and using append operations instead of puts. You'll also get AsyncHBase development updates about rate limiting, statistics, and security.
Let's Compare: A Benchmark review of InfluxDB and ElasticsearchInfluxData
In this webinar, Ivan K will compare the performance and features of InfluxDB and Elasticsearch for common time-series workloads, specifically looking at the rates of data ingestion, on-disk data compression, and query performance. Come hear about how Ivan conducted his tests to determine which time-series db would best fit your needs. We will reserve 15 minutes at the end of the talk for you to ask Ivan directly about his test processes and independent viewpoint.
Kafka Summit SF 2017 - One Day, One Data Hub, 100 Billion Messages: Kafka at ...confluent
LINE is a messaging service with 200+ million active users. I will introduce why we feed 100+ billion daily messages into Kafka and how various systems such as data sync, abuse detection and analysis are depending on and leveraging it. It will be also introduced how we leverage dynamic tracing tools like SystemTap to inspect broker’s performance on production system, which led me to fix KAFKA-4614.
Presented by Yuto Kawamura, LINE Corporation
Systems Track
HBaseCon 2015: OpenTSDB and AsyncHBase UpdateHBaseCon
OpenTSDB continues to scale along with HBase. A number of updates have been implemented to push writes over 2 million data points a second. Here we will discuss about HBase schema improvements, including salting, random UI assignment, and using append operations instead of puts. You'll also get AsyncHBase development updates about rate limiting, statistics, and security.
Red Hat Enterprise Linux OpenStack Platform on Inktank Ceph EnterpriseRed_Hat_Storage
This session describes how to get the most out of OpenStack Cinder volumes on Ceph.
We’ll discuss:
Performance configuration, tuning, and workloads.
Performance test results of Red Hat Enterprise Linux OpenStack Platform 5, Red Hat Enterprise Linux OpenStack Platform 6, Red Hat Ceph Storage 1.2.3, and Firefly.
Anticipated improvements in performance for Red Hat Ceph Storage 1.3.
The Cassandra architecture shines at ensuring a very high availability of data even while nodes are failing or are overloaded. On the other hand, query latency will often rise during these events, especially on the higher percentiles. Many improvements have been made to reduce this effect over the past years. This talk will focus on one in particular: Speculative Retries. Introduced in Cassandra 2.0 on the server side and in the Java Driver 3.0 on the client side, this strategy remains complex to fully understand and to finely tune. This talk will deep dive into theoretical and practical aspects of Speculative Retries, showing the effect of tuning strategies with ad-hoc benchmarks.
About the Speakers
Michael Figuiere Cloud Platform Engineer, Netflix
Michael is a senior software engineer at Netflix where he works on improving the cloud storage infrastructure. He previously worked at Apple and DataStax where he worked for several years on creating Drivers and Developer Tools for Cassandra. At ease with both enterprise applications and lower level technologies, he specializes in distributed architectures and topics such as databases, search engines, and cloud.
Minh Do Senior Distributed Engineer, Netflix
Minh Do has been working at Netflix for the last several years to run, patch, and troubleshoot Cassandra on both server and client sides, and is also a co-creator of Dynomite project. Prior to Netflix, at Tango, he spearheaded its Big Data pipeline system from the ground using Spark/Hadoop. Before that, at Qualys, he built a distributed queue system that bridges traffics between all major components. He has passion in distributed system, machine learning/deep learning, and data storages.
Discussion about the evolution of metrics in Cassandra from 1.0 to 3.0, how the metric changes impact operational tooling, pros and cons for different metric representations, and how and why DataStax OpsCenter collects and stores metrics. Includes a deep dive on how DataStax OpsCenter represents and stores the different kinds of metrics to provide visibility beyond simple cluster averages both behind the scenes and in the rendering.
About the Speaker
Chris Lohfink Software Engineer, DataStax
I am a Java, Python, and Clojure developer who has been using Cassandra in an application development and operational context for the last five years. The last nearly two years I have been working with the OpsCenter Monitoring team at DataStax to improve the accuracy and breadth of the visualization tooling available.
HBaseCon2017 Improving HBase availability in a multi tenant environmentHBaseCon
Infrastructure failures are a given in the cloud, but in a multi-tenant environment separating those failures from usage can be a challenge. I'll be presenting data gathered from over a hundred region server failures at HubSpot along with what we've done to improve our MTTR and what we're contributing back to the community. Covered topics will include separating usage-related failures from infrastructure and hardware failures, as well as steps we've taken to improve MTTR in both scenarios.
This talk was given during Lucene Revolution 2017 and has two goals: first, to discuss the tradeoffs for running Solr on Docker. For example, you get dynamic allocation of operating system caches, but you also get some CPU overhead. We'll keep in mind that Solr nodes tend to be different than your average container: Solr is usually long running, takes quite some RSS and a lot of virtual memory. This will imply, for example, that it makes more sense to use Docker on big physical boxes than on configurable-size VMs (like Amazon EC2).
The second goal is to discuss issues with deploying Solr on Docker and how to work around them. For example, many older (and some of the newer) combinations of Docker, Linux Kernel and JVM have memory leaks. We'll go over Docker operations best practices, such as using container limits to cap memory usage and prevent the host OOM killer from terminating a memory-consuming process - usually a Solr node. Or running Docker in Swarm mode over multiple smaller boxes to limit the spread of a single issue.
Terror & Hysteria: Cost Effective Scaling of Time Series Data with Cassandra ...DataStax
Time series data has long been a natural use case for Cassandra with plenty of write ups showing you how to store mock data ""at scale"". Unfortunately warnings of wide rows and examples of storing numeric-only data aren't sufficient to guide your organization through the realities of running these workloads. Instead you find yourself implementing anti patterns like rotating clusters, only to be taken in by the siren song of DTCS, your hopes dashed across the rocks of ever expanding disk utilization.
We will be taking the lessons learned at Threat Stack - a continuous security monitoring platform - about how to scale a large volume of bulky transactions totaling terabytes and petabytes on AWS, but while holding yourself to a sane budget and DBA-free operational life.
Specifics to include ""break the glass"" operational maneuvers, making DTCS function properly, data modeling, and living in a polyglot data platform.
About the Speaker
Sam Bisbee CTO, Threat Stack
As the CTO at Threat Stack, Sam is responsible for leading the Company's strategic technology roadmap for its continuous security monitoring service, purpose-built for cloud environments. Sam brings highly-relevant experience in distributed systems in public, private, and hybrid cloud environments, as well as proven success scaling SaaS startups. Sam was most recently the CXO at Cloudant (acquired by IBM in Feb. 2014), a leader in the Database-as-a-Service space.
Chris Larsen (Yahoo!)
This year we'll talk about the joys of the HBase Fuzzy Row Filter, new TSDB filters, expression support, Graphite functions and running OpenTSDB on top of Google’s hosted Bigtable. AsyncHBase now includes per-RPC timeouts, append support, Kerberos auth, and a beta implementation in Go.
SignalFx: Making Cassandra Perform as a Time Series DatabaseDataStax Academy
SignalFx ingests, processes runs analytics against, (and ultimately stores) massive numbers of time series streaming in parallel into our service which provides an analytics-based monitoring platform for modern applications.
We've chose to build our time series database (TSDB) on Cassandra for it's read and write performance at high load. This presentation will go over our evolution of optimizations to squeeze the most performance out of the TSDB to date and some steps we'll be taking in the future.
Amazon EC2 provides a broad selection of instance types to accommodate a diverse mix of workloads. In this session, we provide an overview of the Amazon EC2 instance platform, key platform features, and the concept of instance generations. We dive into the current generation design choices of the different instance families, including General Purpose, Compute Optimized, Storage Optimized, Memory Optimized, and GPU instance. We also detail best practices and share performance tips for getting the most out of your Amazon EC2 instances.
This presentation provides an overview of the Dell PowerEdge R730xd server performance results with Red Hat Ceph Storage. It covers the advantages of using Red Hat Ceph Storage on Dell servers with their proven hardware components that provide high scalability, enhanced ROI cost benefits, and support of unstructured data.
Latency is one of the most common Service Level Indicators (SLI), but where should it be measured from?
There are three main ways to measure latency:
•Server-side latency: Precise and high cardinality but missing the big picture
•Client-side latency: Big picture but noisy
•Blackbox monitoring latency: Good trade-off between the other two
In this talk, we will dive deeper into each perspective and how all of them can be leveraged. We will use Criteo’s large-scale key/value infrastructure as a case study
An overview of Cassandra drivers for Java, Ruby, Python with tips and tricks for getting the most performance from Cassandra. Tune your application for low latency or high throughput.
Red Hat Enterprise Linux OpenStack Platform on Inktank Ceph EnterpriseRed_Hat_Storage
This session describes how to get the most out of OpenStack Cinder volumes on Ceph.
We’ll discuss:
Performance configuration, tuning, and workloads.
Performance test results of Red Hat Enterprise Linux OpenStack Platform 5, Red Hat Enterprise Linux OpenStack Platform 6, Red Hat Ceph Storage 1.2.3, and Firefly.
Anticipated improvements in performance for Red Hat Ceph Storage 1.3.
The Cassandra architecture shines at ensuring a very high availability of data even while nodes are failing or are overloaded. On the other hand, query latency will often rise during these events, especially on the higher percentiles. Many improvements have been made to reduce this effect over the past years. This talk will focus on one in particular: Speculative Retries. Introduced in Cassandra 2.0 on the server side and in the Java Driver 3.0 on the client side, this strategy remains complex to fully understand and to finely tune. This talk will deep dive into theoretical and practical aspects of Speculative Retries, showing the effect of tuning strategies with ad-hoc benchmarks.
About the Speakers
Michael Figuiere Cloud Platform Engineer, Netflix
Michael is a senior software engineer at Netflix where he works on improving the cloud storage infrastructure. He previously worked at Apple and DataStax where he worked for several years on creating Drivers and Developer Tools for Cassandra. At ease with both enterprise applications and lower level technologies, he specializes in distributed architectures and topics such as databases, search engines, and cloud.
Minh Do Senior Distributed Engineer, Netflix
Minh Do has been working at Netflix for the last several years to run, patch, and troubleshoot Cassandra on both server and client sides, and is also a co-creator of Dynomite project. Prior to Netflix, at Tango, he spearheaded its Big Data pipeline system from the ground using Spark/Hadoop. Before that, at Qualys, he built a distributed queue system that bridges traffics between all major components. He has passion in distributed system, machine learning/deep learning, and data storages.
Discussion about the evolution of metrics in Cassandra from 1.0 to 3.0, how the metric changes impact operational tooling, pros and cons for different metric representations, and how and why DataStax OpsCenter collects and stores metrics. Includes a deep dive on how DataStax OpsCenter represents and stores the different kinds of metrics to provide visibility beyond simple cluster averages both behind the scenes and in the rendering.
About the Speaker
Chris Lohfink Software Engineer, DataStax
I am a Java, Python, and Clojure developer who has been using Cassandra in an application development and operational context for the last five years. The last nearly two years I have been working with the OpsCenter Monitoring team at DataStax to improve the accuracy and breadth of the visualization tooling available.
HBaseCon2017 Improving HBase availability in a multi tenant environmentHBaseCon
Infrastructure failures are a given in the cloud, but in a multi-tenant environment separating those failures from usage can be a challenge. I'll be presenting data gathered from over a hundred region server failures at HubSpot along with what we've done to improve our MTTR and what we're contributing back to the community. Covered topics will include separating usage-related failures from infrastructure and hardware failures, as well as steps we've taken to improve MTTR in both scenarios.
This talk was given during Lucene Revolution 2017 and has two goals: first, to discuss the tradeoffs for running Solr on Docker. For example, you get dynamic allocation of operating system caches, but you also get some CPU overhead. We'll keep in mind that Solr nodes tend to be different than your average container: Solr is usually long running, takes quite some RSS and a lot of virtual memory. This will imply, for example, that it makes more sense to use Docker on big physical boxes than on configurable-size VMs (like Amazon EC2).
The second goal is to discuss issues with deploying Solr on Docker and how to work around them. For example, many older (and some of the newer) combinations of Docker, Linux Kernel and JVM have memory leaks. We'll go over Docker operations best practices, such as using container limits to cap memory usage and prevent the host OOM killer from terminating a memory-consuming process - usually a Solr node. Or running Docker in Swarm mode over multiple smaller boxes to limit the spread of a single issue.
Terror & Hysteria: Cost Effective Scaling of Time Series Data with Cassandra ...DataStax
Time series data has long been a natural use case for Cassandra with plenty of write ups showing you how to store mock data ""at scale"". Unfortunately warnings of wide rows and examples of storing numeric-only data aren't sufficient to guide your organization through the realities of running these workloads. Instead you find yourself implementing anti patterns like rotating clusters, only to be taken in by the siren song of DTCS, your hopes dashed across the rocks of ever expanding disk utilization.
We will be taking the lessons learned at Threat Stack - a continuous security monitoring platform - about how to scale a large volume of bulky transactions totaling terabytes and petabytes on AWS, but while holding yourself to a sane budget and DBA-free operational life.
Specifics to include ""break the glass"" operational maneuvers, making DTCS function properly, data modeling, and living in a polyglot data platform.
About the Speaker
Sam Bisbee CTO, Threat Stack
As the CTO at Threat Stack, Sam is responsible for leading the Company's strategic technology roadmap for its continuous security monitoring service, purpose-built for cloud environments. Sam brings highly-relevant experience in distributed systems in public, private, and hybrid cloud environments, as well as proven success scaling SaaS startups. Sam was most recently the CXO at Cloudant (acquired by IBM in Feb. 2014), a leader in the Database-as-a-Service space.
Chris Larsen (Yahoo!)
This year we'll talk about the joys of the HBase Fuzzy Row Filter, new TSDB filters, expression support, Graphite functions and running OpenTSDB on top of Google’s hosted Bigtable. AsyncHBase now includes per-RPC timeouts, append support, Kerberos auth, and a beta implementation in Go.
SignalFx: Making Cassandra Perform as a Time Series DatabaseDataStax Academy
SignalFx ingests, processes runs analytics against, (and ultimately stores) massive numbers of time series streaming in parallel into our service which provides an analytics-based monitoring platform for modern applications.
We've chose to build our time series database (TSDB) on Cassandra for it's read and write performance at high load. This presentation will go over our evolution of optimizations to squeeze the most performance out of the TSDB to date and some steps we'll be taking in the future.
Amazon EC2 provides a broad selection of instance types to accommodate a diverse mix of workloads. In this session, we provide an overview of the Amazon EC2 instance platform, key platform features, and the concept of instance generations. We dive into the current generation design choices of the different instance families, including General Purpose, Compute Optimized, Storage Optimized, Memory Optimized, and GPU instance. We also detail best practices and share performance tips for getting the most out of your Amazon EC2 instances.
This presentation provides an overview of the Dell PowerEdge R730xd server performance results with Red Hat Ceph Storage. It covers the advantages of using Red Hat Ceph Storage on Dell servers with their proven hardware components that provide high scalability, enhanced ROI cost benefits, and support of unstructured data.
Latency is one of the most common Service Level Indicators (SLI), but where should it be measured from?
There are three main ways to measure latency:
•Server-side latency: Precise and high cardinality but missing the big picture
•Client-side latency: Big picture but noisy
•Blackbox monitoring latency: Good trade-off between the other two
In this talk, we will dive deeper into each perspective and how all of them can be leveraged. We will use Criteo’s large-scale key/value infrastructure as a case study
An overview of Cassandra drivers for Java, Ruby, Python with tips and tricks for getting the most performance from Cassandra. Tune your application for low latency or high throughput.
Video and slides synchronized, mp3 and slide download available at URL http://bit.ly/1VfmmLC.
David Riddoch talks about the technologies that make very high performance networking possible on commodity servers and networks, with a special focus on kernel bypass technologies including sockets acceleration and NFV. These techniques give user-space applications direct access the network adapter hardware, making possible sub-microsecond latencies and millions of messages per second per thread. Filmed at qconlondon.com.
David Riddoch leads the development of the Solarflare Open IP stack.
When it Absolutely, Positively, Has to be There: Reliability Guarantees in Ka...confluent
In the financial industry, losing data is unacceptable. Financial firms are adopting Kafka for their critical applications. Kafka provides the low latency, high throughput, high availability, and scale that these applications require. But can it also provide complete reliability? As a system architect, when asked “Can you guarantee that we will always get every transaction,” you want to be able to say “Yes” with total confidence.
In this session, we will go over everything that happens to a message – from producer to consumer, and pinpoint all the places where data can be lost – if you are not careful. You will learn how developers and operation teams can work together to build a bulletproof data pipeline with Kafka. And if you need proof that you built a reliable system – we’ll show you how you can build the system to prove this too.
Using Compuware Strobe to Save CPU: 4 Real-life Cases from the Files of CPT G...Compuware
See four real-life examples of how CPT Global--a worldwide IT consulting services firm specializing in capacity planning, performance tuning and testing--uses Compuware Strobe for application performance monitoring and analysis to help customers save CPU by identifying inefficiencies in:
• COBOL code
• VSAM access
• A vendor product
• CICS system settings
XDP in Practice: DDoS Mitigation @CloudflareC4Media
Video and slides synchronized, mp3 and slide download available at URL https://bit.ly/2NtlaER.
Gilberto Bertin discusses the architecture of Cloudflare’s automatic DDoS mitigation pipeline, the initial packet filtering solution based on Iptables, and why Cloudflare had to introduce userspace offload. Bertin also describes how they switched from a proprietary offload technology to XDP for network stack bypass and how they are using XDP to load balance traffic. Filmed at qconlondon.com.
Gilberto Bertin works as a System Engineer at Cloudflare London. After working on variety of technologies like P2P VPNs and userspace TCP/IP stacks, he joined the Cloudflare DDoS team in London to help filter all the bad internet traffic.
Kernel Recipes 2014 - NDIV: a low overhead network traffic diverterAnne Nicolas
NDIV is a young, very simple, yet efficient network traffic diverter. Its purpose is to help build network applications that intercept packets at line rate with a very low processing overhead. A first example application is a stateless HTTP server reaching line rate on all packet sizes.
Willy Tarreau, HaproxyTech
BSides LV 2016 - Beyond the tip of the iceberg - fuzzing binary protocols for...Alexandre Moneger
This presentation shows that code coverage guided fuzzing is possible in the context of network daemon fuzzing.
Some fuzzers are blackbox while others are protocol aware. Even ones which are made protocol aware, fuzzer writers typically model the protocol specification and implement packet awareness logic in the fuzzer. Unfortunately, just because the fuzzer is protocol aware, it does not guarantee that sufficient code paths have been reached.
The presentation deals with specific scenarios where the target protocol is completely unknown (proprietary) and no source code or protocol specs are accessible. The tool developed builds a feedback loop between the client and the server components using the concept of "gate functions". A gate function triggers monitoring. The pintool component tracks the binary code coverage for all the functions untill it reaches an exit gate. By instrumenting such gated functions, the tool is able to measure code coverage during packet processing.
Scaling ingest pipelines with high performance computing principles - Rajiv K...SignalFx
By Rajiv Kurian, software engineer at SignalFx.
At SignalFx, we deal with high-volume high-resolution data from our users. This requires a high performance ingest pipeline. Over time we’ve found that we needed to adapt architectural principles from specialized fields such as HPC to get beyond performance plateaus encountered with more generic approaches. Some key examples include:
* Write very simple single threaded code, instead of complex algorithms
* Parallelize by running multiple copies of simple single threaded code, instead of using concurrent algorithms
* Separate the data plane from the control plane, instead of slowing data for control
* Write compact, array-based data structures with minimal indirection, instead of pointer-based data structures and uncontrolled allocation
Organizations continue to adopt Solr because of its ability to scale to meet even the most demanding workflows. Recently, LucidWorks has been leading the effort to identify, measure, and expand the limits of Solr. As part of this effort, we've learned a few things along the way that should prove useful for any organization wanting to scale Solr. Attendees will come away with a better understanding of how sharding and replication impact performance. Also, no benchmark is useful without being repeatable; Tim will also cover how to perform similar tests using the Solr-Scale-Toolkit in Amazon EC2.
InfluxEnterprise Architecture Patterns by Tim Hall & Sam DillardInfluxData
In this InfluxDays NYC 2019 presentation, InfluxData VP of Products Tim Hall and Sales Engineer Sam Dillard discuss architecture patterns with InfluxEnterprise time series platform. They cover an overview of InfluxEnterprise, features, ingestion and query rates, deployment examples, replication patterns, and general advice. Presentation highlights include InfluxEnterprise cluster architecture and how to determine if you're ready for adopting InfluxEnterprise.
UiPath Test Automation using UiPath Test Suite series, part 6DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 6. In this session, we will cover Test Automation with generative AI and Open AI.
UiPath Test Automation with generative AI and Open AI webinar offers an in-depth exploration of leveraging cutting-edge technologies for test automation within the UiPath platform. Attendees will delve into the integration of generative AI, a test automation solution, with Open AI advanced natural language processing capabilities.
Throughout the session, participants will discover how this synergy empowers testers to automate repetitive tasks, enhance testing accuracy, and expedite the software testing life cycle. Topics covered include the seamless integration process, practical use cases, and the benefits of harnessing AI-driven automation for UiPath testing initiatives. By attending this webinar, testers, and automation professionals can gain valuable insights into harnessing the power of AI to optimize their test automation workflows within the UiPath ecosystem, ultimately driving efficiency and quality in software development processes.
What will you get from this session?
1. Insights into integrating generative AI.
2. Understanding how this integration enhances test automation within the UiPath platform
3. Practical demonstrations
4. Exploration of real-world use cases illustrating the benefits of AI-driven test automation for UiPath
Topics covered:
What is generative AI
Test Automation with generative AI and Open AI.
UiPath integration with generative AI
Speaker:
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
zkStudyClub - Reef: Fast Succinct Non-Interactive Zero-Knowledge Regex ProofsAlex Pruden
This paper presents Reef, a system for generating publicly verifiable succinct non-interactive zero-knowledge proofs that a committed document matches or does not match a regular expression. We describe applications such as proving the strength of passwords, the provenance of email despite redactions, the validity of oblivious DNS queries, and the existence of mutations in DNA. Reef supports the Perl Compatible Regular Expression syntax, including wildcards, alternation, ranges, capture groups, Kleene star, negations, and lookarounds. Reef introduces a new type of automata, Skipping Alternating Finite Automata (SAFA), that skips irrelevant parts of a document when producing proofs without undermining soundness, and instantiates SAFA with a lookup argument. Our experimental evaluation confirms that Reef can generate proofs for documents with 32M characters; the proofs are small and cheap to verify (under a second).
Paper: https://eprint.iacr.org/2023/1886
Climate Impact of Software Testing at Nordic Testing DaysKari Kakkonen
My slides at Nordic Testing Days 6.6.2024
Climate impact / sustainability of software testing discussed on the talk. ICT and testing must carry their part of global responsibility to help with the climat warming. We can minimize the carbon footprint but we can also have a carbon handprint, a positive impact on the climate. Quality characteristics can be added with sustainability, and then measured continuously. Test environments can be used less, and in smaller scale and on demand. Test techniques can be used in optimizing or minimizing number of tests. Test automation can be used to speed up testing.
Pushing the limits of ePRTC: 100ns holdover for 100 daysAdtran
At WSTS 2024, Alon Stern explored the topic of parametric holdover and explained how recent research findings can be implemented in real-world PNT networks to achieve 100 nanoseconds of accuracy for up to 100 days.
Removing Uninteresting Bytes in Software FuzzingAftab Hussain
Imagine a world where software fuzzing, the process of mutating bytes in test seeds to uncover hidden and erroneous program behaviors, becomes faster and more effective. A lot depends on the initial seeds, which can significantly dictate the trajectory of a fuzzing campaign, particularly in terms of how long it takes to uncover interesting behaviour in your code. We introduce DIAR, a technique designed to speedup fuzzing campaigns by pinpointing and eliminating those uninteresting bytes in the seeds. Picture this: instead of wasting valuable resources on meaningless mutations in large, bloated seeds, DIAR removes the unnecessary bytes, streamlining the entire process.
In this work, we equipped AFL, a popular fuzzer, with DIAR and examined two critical Linux libraries -- Libxml's xmllint, a tool for parsing xml documents, and Binutil's readelf, an essential debugging and security analysis command-line tool used to display detailed information about ELF (Executable and Linkable Format). Our preliminary results show that AFL+DIAR does not only discover new paths more quickly but also achieves higher coverage overall. This work thus showcases how starting with lean and optimized seeds can lead to faster, more comprehensive fuzzing campaigns -- and DIAR helps you find such seeds.
- These are slides of the talk given at IEEE International Conference on Software Testing Verification and Validation Workshop, ICSTW 2022.
Unlocking Productivity: Leveraging the Potential of Copilot in Microsoft 365, a presentation by Christoforos Vlachos, Senior Solutions Manager – Modern Workplace, Uni Systems
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024Albert Hoitingh
In this session I delve into the encryption technology used in Microsoft 365 and Microsoft Purview. Including the concepts of Customer Key and Double Key Encryption.
Dr. Sean Tan, Head of Data Science, Changi Airport Group
Discover how Changi Airport Group (CAG) leverages graph technologies and generative AI to revolutionize their search capabilities. This session delves into the unique search needs of CAG’s diverse passengers and customers, showcasing how graph data structures enhance the accuracy and relevance of AI-generated search results, mitigating the risk of “hallucinations” and improving the overall customer journey.
Enchancing adoption of Open Source Libraries. A case study on Albumentations.AIVladimir Iglovikov, Ph.D.
Presented by Vladimir Iglovikov:
- https://www.linkedin.com/in/iglovikov/
- https://x.com/viglovikov
- https://www.instagram.com/ternaus/
This presentation delves into the journey of Albumentations.ai, a highly successful open-source library for data augmentation.
Created out of a necessity for superior performance in Kaggle competitions, Albumentations has grown to become a widely used tool among data scientists and machine learning practitioners.
This case study covers various aspects, including:
People: The contributors and community that have supported Albumentations.
Metrics: The success indicators such as downloads, daily active users, GitHub stars, and financial contributions.
Challenges: The hurdles in monetizing open-source projects and measuring user engagement.
Development Practices: Best practices for creating, maintaining, and scaling open-source libraries, including code hygiene, CI/CD, and fast iteration.
Community Building: Strategies for making adoption easy, iterating quickly, and fostering a vibrant, engaged community.
Marketing: Both online and offline marketing tactics, focusing on real, impactful interactions and collaborations.
Mental Health: Maintaining balance and not feeling pressured by user demands.
Key insights include the importance of automation, making the adoption process seamless, and leveraging offline interactions for marketing. The presentation also emphasizes the need for continuous small improvements and building a friendly, inclusive community that contributes to the project's growth.
Vladimir Iglovikov brings his extensive experience as a Kaggle Grandmaster, ex-Staff ML Engineer at Lyft, sharing valuable lessons and practical advice for anyone looking to enhance the adoption of their open-source projects.
Explore more about Albumentations and join the community at:
GitHub: https://github.com/albumentations-team/albumentations
Website: https://albumentations.ai/
LinkedIn: https://www.linkedin.com/company/100504475
Twitter: https://x.com/albumentations
Epistemic Interaction - tuning interfaces to provide information for AI supportAlan Dix
Paper presented at SYNERGY workshop at AVI 2024, Genoa, Italy. 3rd June 2024
https://alandix.com/academic/papers/synergy2024-epistemic/
As machine learning integrates deeper into human-computer interactions, the concept of epistemic interaction emerges, aiming to refine these interactions to enhance system adaptability. This approach encourages minor, intentional adjustments in user behaviour to enrich the data available for system learning. This paper introduces epistemic interaction within the context of human-system communication, illustrating how deliberate interaction design can improve system understanding and adaptation. Through concrete examples, we demonstrate the potential of epistemic interaction to significantly advance human-computer interaction by leveraging intuitive human communication strategies to inform system design and functionality, offering a novel pathway for enriching user-system engagements.
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdfPaige Cruz
Monitoring and observability aren’t traditionally found in software curriculums and many of us cobble this knowledge together from whatever vendor or ecosystem we were first introduced to and whatever is a part of your current company’s observability stack.
While the dev and ops silo continues to crumble….many organizations still relegate monitoring & observability as the purview of ops, infra and SRE teams. This is a mistake - achieving a highly observable system requires collaboration up and down the stack.
I, a former op, would like to extend an invitation to all application developers to join the observability party will share these foundational concepts to build on:
GraphRAG is All You need? LLM & Knowledge GraphGuy Korland
Guy Korland, CEO and Co-founder of FalkorDB, will review two articles on the integration of language models with knowledge graphs.
1. Unifying Large Language Models and Knowledge Graphs: A Roadmap.
https://arxiv.org/abs/2306.08302
2. Microsoft Research's GraphRAG paper and a review paper on various uses of knowledge graphs:
https://www.microsoft.com/en-us/research/blog/graphrag-unlocking-llm-discovery-on-narrative-private-data/
The Art of the Pitch: WordPress Relationships and SalesLaura Byrne
Clients don’t know what they don’t know. What web solutions are right for them? How does WordPress come into the picture? How do you make sure you understand scope and timeline? What do you do if sometime changes?
All these questions and more will be explored as we talk about matching clients’ needs with what your agency offers without pulling teeth or pulling your hair out. Practical tips, and strategies for successful relationship building that leads to closing the deal.
12. Aerospike
Database Nodes
Application
1) No Hotspots
– DHT simplifies data
partitioning
2) Smart Client – 1 hop to data,
no load balancers needed
3) Shared Nothing Architecture,
every node identical - no
coordinators needed
7) XDR – sync’d replication across
data centers ensures Zero Downtime
8) Scale linearly as data-sizes and
workloads increase
9) Add capacity with no service
interruption
4) Single row ACID
– sync’d replication in cluster
5) Smart Cluster, Zero Touch
– auto-failover, rebalancing,
rack aware, rolling upgrades...
6) Transactions and long running
tasks prioritized real-time
Benefits of Design:
13. Client Libraries Need To Do More - TRANSPARENTLY
■ Implements Aerospike API
■ Optimistic row locking
■ Optimized binary protocol
■ Cluster Tracking
■ Learns about cluster changes,
Partition Map
■ Transaction Semantics
■ Global Transaction ID
■ Retransmit and timeout
■ Linearly Scales
■ No extra hop to data
■ No load balancers in the way
15. Database Nodes
Application
Cluster Tracking
Data is distributed evenly across nodes in a
cluster using the Aerospike Smart Partitions™
algorithm.
■ Even distribution of
■Partitions across nodes
■Records across Partitions
■Data across Flash device
? Need to know Where each key is!
16. func NewCluster(policy *ClientPolicy, hosts []*Host) (*Cluster, error) {
newCluster := &Cluster{...}
// start up cluster maintenance go routine
newCluster.wgTend.Add(1)
go newCluster.clusterBoss(policy)
return newCluster, nil
}
func (clstr *Cluster) clusterBoss(policy *ClientPolicy) {
defer clstr.wgTend.Done()
Loop:
for {
select {
case <-clstr.tendChannel:
// tend channel closed
break Loop
case <-time.After(tendInterval):
if err := clstr.tend(); err != nil {
Logger.Warn(err.Error())
}
}
}
// cleanup code goes here
// close the nodes, ...
}
func (clstr *Cluster) Close() {
if !clstr.closed.Get() {
// send close signal to maintenance channel
close(clstr.tendChannel)
// wait until tend is over
clstr.wgTend.Wait()
}
}
Cluster Tracking
Spawn Goroutine To
Track Cluster Changes
{
17. func NewCluster(policy *ClientPolicy, hosts []*Host) (*Cluster, error) {
newCluster := &Cluster{...}
// start up cluster maintenance go routine
newCluster.wgTend.Add(1)
go newCluster.clusterBoss(policy)
return newCluster, nil
}
func (clstr *Cluster) clusterBoss(policy *ClientPolicy) {
defer clstr.wgTend.Done()
Loop:
for {
select {
case <-clstr.tendChannel:
// tend channel closed
break Loop
case <-time.After(tendInterval):
if err := clstr.tend(); err != nil {
Logger.Warn(err.Error())
}
}
}
// cleanup code goes here
// close the nodes, ...
}
func (clstr *Cluster) Close() {
if !clstr.closed.Get() {
// send close signal to maintenance channel
close(clstr.tendChannel)
// wait until tend is over
clstr.wgTend.Wait()
}
}
Cluster Tracking
On Intervals, Update
Cluster Status
{
18. func NewCluster(policy *ClientPolicy, hosts []*Host) (*Cluster, error) {
newCluster := &Cluster{...}
// start up cluster maintenance go routine
newCluster.wgTend.Add(1)
go newCluster.clusterBoss(policy)
return newCluster, nil
}
func (clstr *Cluster) clusterBoss(policy *ClientPolicy) {
defer clstr.wgTend.Done()
Loop:
for {
select {
case <-clstr.tendChannel:
// tend channel closed
break Loop
case <-time.After(tendInterval):
if err := clstr.tend(); err != nil {
Logger.Warn(err.Error())
}
}
}
// cleanup code goes here
// close the nodes, ...
}
func (clstr *Cluster) Close() {
if !clstr.closed.Get() {
// send close signal to maintenance channel
close(clstr.tendChannel)
// wait until tend is over
clstr.wgTend.Wait()
}
}
Cluster Tracking
Broadcast Closing Of
Cluster
{
19. func NewCluster(policy *ClientPolicy, hosts []*Host) (*Cluster, error) {
newCluster := &Cluster{...}
// start up cluster maintenance go routine
newCluster.wgTend.Add(1)
go newCluster.clusterBoss(policy)
return newCluster, nil
}
func (clstr *Cluster) clusterBoss(policy *ClientPolicy) {
defer clstr.wgTend.Done()
Loop:
for {
select {
case <-clstr.tendChannel:
// tend channel closed
break Loop
case <-time.After(tendInterval):
if err := clstr.tend(); err != nil {
Logger.Warn(err.Error())
}
}
}
// cleanup code goes here
// close the nodes, ...
}
func (clstr *Cluster) Close() {
if !clstr.closed.Get() {
// send close signal to maintenance channel
close(clstr.tendChannel)
// wait until tend is over
clstr.wgTend.Wait()
}
}
Cluster Tracking
Break the loop to clean up
{
25. 7
{
Defer on all producers will ensure Close() is called at least once
26. Queries
Consumer
Producer #N
{Will not deadlock, all
producers will return
{
The Above code will never panic, or block indefinitely:
• defer on Execute() will ensure Close() is called at least once
• When cancelled channel is closed, even if the Records channel is
full and blocked, select will go through and return from goroutine
27. Queries
The Above code will never panic, or block indefinitely:
• defer on Execute() will ensure Close() is called at least once
• When cancelled channel is closed, even if the Records channel is
full and blocked, select will go through and return from goroutine
Consumer
Producer #N
{Will not panic, all
producers have already
stopped
29. Take away
• Channels are a good abstraction for data streaming
• Easier to avoid deadlocks
• Use the least number of channels and goroutines
• Multiplex on demand
• Goroutines are cheap, but not free
• Pass channels throughout API, don’t ask for new ones
31. Don’t Create Garbage
• Avoid Memory Allocations
• Might be trickier than you think!
• Pool All Objects
• Buffers
• Hash Objects
• Network Connections
• Reusable Intermediate / Temporary Objects
• Random Number Generators
• sync.Pool is not ideal to pool long living objects
• Integrated into GC, sweeps and deallocates objects too often
• LIFO pools tend to do better regarding CPU cache
• Channels for pool implementation still perform well though
32. Don’t Reallocate Memory
• Avoid Memory Allocations
• Might be trickier than you think!
• All following code snippets allocate memory:
b := new(bytes.Buffer)
longBuf := bytes.Repeat([]byte{32}, 100)
// Allocation happens in Write to Grow the buffer
b.Write([]byte{32})
// Re-allocation happens in Write to Grow the buffer
b.Write(longBuf)
// allocation happens inside Trim
strings.Trim(" Trim this string ", " ")
h := ripemd160.New()
h.Write(data)
// allocation happens inside Sum()
h.Sum(nil)
// Why? To support the general case
func (d *digest) Sum(in []byte) []byte {
// Make a copy of d0 so that caller can keep writing and summing.
d := *d0
35. Pools
• Make Pools Deterministic
• Let User Decide About Parameters Per Use-Case
• Document The Above!
type BufferPool struct {
pool [][]byte
poolSize int
pos int64
maxBufSize int
initBufSize int
mutex sync.Mutex
}
func (bp *BufferPool) Put(buf []byte) {
if len(buf) <= bp.maxBufSize {
...
}
// buffer too big to go back into the pool
}
// a custom buffer pool with fine grained control over its contents
// maxSize: 128KiB
// initial bufferSize: 16 KiB
// maximum buffer size to keep in the pool: 128K
var bufPool = NewBufferPool(512, 16*1024, 128*1024)
// SetCommandBufferPool can be used to customize the command Buffer Pool parameters to calib
// the pool for different workloads
func SetCommandBufferPool(poolSize, initBufSize, maxBufferSize int) {
bufPool = NewBufferPool(poolSize, initBufSize, maxBufferSize)
}
Set & Enforce Limits for
Pooled Buffers
{
{
Set Sensible Defaults
Let Users Tweak Per
Use-Case
39. Comparison vs C and Java
100% WRITE 100% READ
50% READ
50% WRITE
C 261K 308K 273K
JAVA 180K 280K 220K
GO 306K 238K 276K
Data Type: String of size 2000
Fastest Slowest
Transactions Per Second
40. Comparison vs C and Java
100% WRITE 100% READ
50% READ
50% WRITE
C 299K 334K 307K
JAVA 308K 339K 325K
GO 338K 328K 330K
Data Type: Integer (8 Bytes)
Fastest Slowest
Transactions Per Second