Chicago Data Summit: Flume: An IntroductionCloudera, Inc.
Flume is an open-source, distributed, streaming log collection system designed for ingesting large quantities of data into large-scale data storage and analytics platforms such as Apache Hadoop. It was designed with four goals in mind: Reliability, Scalability, Extensibility, and Manageability. Its horizontally scalable architecture offers fault-tolerant end-to-end delivery guarantees, supports low-latency event processing, provides a centralized management interface, and exposes metrics for ingest monitoring and reporting. It natively supports writing data to Hadoop's HDFS but also has a simple extension interface that allows it to write to other scalable data systems such as low-latency datastores or incremental search indexers.
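Flume's real extension interface is a Java API; as a language-neutral sketch of the idea, any backend that implements a small sink contract can receive the event stream (class and method names here are invented for illustration, not Flume's actual API):

```python
# Hypothetical sketch of Flume's sink-extension idea: any backend that
# implements open/append/close can be plugged in as a delivery target.
class EventSink:
    def open(self): ...
    def append(self, event): ...
    def close(self): ...

class InMemorySink(EventSink):
    """Toy sink that collects events in memory (stand-in for HDFS)."""
    def __init__(self):
        self.events = []

    def append(self, event):
        self.events.append(event)

def deliver(events, sink):
    """Drive any sink through its lifecycle with a batch of events."""
    sink.open()
    for e in events:
        sink.append(e)
    sink.close()
```

Swapping `InMemorySink` for, say, a search-indexer sink is then a purely local change, which is the point of the extension interface.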
This presentation describes Flume, a distributed log collection system for shipping data to frameworks such as Hadoop and HBase. It provides an overview and describes updates and emerging stories from the community since its open source release. These are the slides from the 2/18/11 Austin, TX HUG.
During the talk, we will build a simple web app using Lift and then introduce Akka (http://akkasource.org) to help scale it. Specifically, we will demonstrate Remote Actors, "Let it crash" failover, and the Dispatcher. Other Scala-oriented tools we will use include sbt and ENSIME mode for Emacs.
Building Tungsten Clusters with PostgreSQL Hot Standby and Streaming ReplicationLinas Virbalas
Hot standby and streaming replication move the needle forward for high availability and scaling for a wide number of applications. Continuent Tungsten has earlier supported PostgreSQL clustering using warm standby. In this talk we will describe how to build clusters using the PostgreSQL 9 features and give our report from the trenches.
This talk will cover how PostgreSQL 9 hot standby and streaming replication work from a user perspective, then dive into a description of how Tungsten really makes them shine.
We'll cover the following issues:
* Configuration of warm standby and streaming replication
* Provisioning new standby instances
* Strategies for balancing reads across primary and standby databases
* Managing failover
Please join us for an enlightening presentation on a set of PostgreSQL 9 features that are interesting to a wide range of PostgreSQL users.
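The read-balancing strategy from the list above can be sketched as a toy router: writes always go to the primary, reads are spread round-robin across primary and standbys (host names and the routing policy are illustrative, not Tungsten's actual implementation):

```python
import itertools

class ReadWriteRouter:
    """Toy read/write splitter for a primary plus hot standbys."""
    def __init__(self, primary, standbys):
        self.primary = primary
        # Reads cycle over every node; writes must hit the primary.
        self._readers = itertools.cycle([primary] + list(standbys))

    def route(self, sql):
        is_read = sql.lstrip().upper().startswith("SELECT")
        return next(self._readers) if is_read else self.primary
```

A real router also has to consider replication lag on standbys; this sketch ignores that deliberately.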
Living the Easy Life with Rules-Based Autonomic Database ClustersLinas Virbalas
Our business at Continuent is development of database clusters with highly simplified management and the ability to be operated unattended for prolonged periods of time. As part of our Tungsten Clustering product we developed an extensible, low-latency, fault-tolerant management framework for database clusters built around a core of group communications and business rules. We have found that our system is easy to maintain and to extend. For example, an extension to switch virtual IP addresses in the event of a database node failure was implemented in an afternoon as a set of two rules and a single bash script. In our talk we will cover the following:
* Basic architecture of a rules-based management framework for databases
* Introduction to business rules, with code examples showing how they work to repair problems ranging from simple process failures to network partitions
* A quick demo of business rules in operation.
* Finally, some thoughts on the benefits of the approach and our experiences (good and bad) with autonomic management of database clusters.
This is an approach to management that we believe will be of interest to anyone who cares about keeping important data highly available, as well as anyone interested in learning about rules technology.
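A minimal sketch of the rules-based idea described above, assuming each rule is a (condition, action) pair evaluated against a dictionary of facts about the cluster (the names and the virtual-IP example are illustrative, not Tungsten's actual rule language):

```python
class RuleEngine:
    """Minimal forward-chaining sketch: fire every rule whose
    condition holds for the current cluster facts."""
    def __init__(self):
        self.rules = []

    def rule(self, condition, action):
        self.rules.append((condition, action))

    def run(self, facts):
        fired = []
        for condition, action in self.rules:
            if condition(facts):
                action(facts)
                fired.append(action.__name__)
        return fired

# Example rule: move the virtual IP when the primary database is down.
def move_vip(facts):
    facts["vip_host"] = facts["standby"]

engine = RuleEngine()
engine.rule(lambda f: not f["primary_alive"], move_vip)
```

The appeal of the approach is visible even at this scale: the VIP-failover behavior is one rule, added without touching the engine.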
Apache MXNet for IoT with Apache NiFi. Using Apache MXNet with Apache NiFi and MiniFi for IoT use cases. Ingesting, managing, orchestration and running IoT workloads.
IoT with Apache MXNet and Apache NiFi and MiniFiDataWorks Summit
A hands-on deep dive on using Apache MiniFi with Apache MXNet on edge devices, including a Raspberry Pi with Movidius and an NVIDIA Jetson TX1. We run deep learning models on the edge device and send images, sensor data, and deep learning results if values exceed norms. Using S2S (site-to-site), data is sent to NiFi for further processing, additional deep learning processing, and data augmentation. A stream of data is landed as ORC files in HDFS with Hive tables on top.
Processed data is stored in Avro format with a schema kept in the Schema Registry. Visualization is shown in Zeppelin.
Use Cases: Security Camera Monitoring, Utility Asset Anomaly Detection, Temperature and Humidity filtering for devices.
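The "send only if values exceed norms" edge filtering described above can be sketched as a simple range check (the threshold values are invented for illustration):

```python
def exceeds_norms(reading, norms):
    """Return True when any sensor value falls outside its allowed
    range, i.e. the reading is worth shipping upstream via S2S."""
    for key, (low, high) in norms.items():
        value = reading.get(key)
        if value is not None and not (low <= value <= high):
            return True
    return False

# Illustrative thresholds only; real deployments would tune per device.
NORMS = {"temperature_c": (0, 45), "humidity_pct": (20, 80)}
```

On a constrained edge device, dropping in-range readings like this is what keeps the S2S link from being saturated with uninteresting data.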
This talk builds on several existing articles I have written:
https://community.hortonworks.com/articles/142686/real-time-ingesting-and-transforming-sensor-and-so.html
https://community.hortonworks.com/articles/121916/controlling-big-data-flows-with-gestures-minifi-ni.html
https://community.hortonworks.com/articles/83100/deep-learning-iot-workflows-with-raspberry-pi-mqtt.html
https://community.hortonworks.com/articles/103863/using-an-asus-tinkerboard-with-tensorflow-and-pyth.html
https://community.hortonworks.com/articles/130814/sensors-and-image-capture-and-deep-learning-analys.html
https://community.hortonworks.com/articles/107379/minifi-for-image-capture-and-ingestion-from-raspbe.html
https://community.hortonworks.com/articles/118132/minifi-capturing-converting-tensorflow-inception-t.html
https://community.hortonworks.com/articles/136039/integrating-nvidia-jetson-tx1-running-tensorrt-int-3.html
https://community.hortonworks.com/articles/101904/part-2-iot-augmenting-gps-data-with-weather.html
Speaker
Timothy Spann, Solutions Engineer, Hortonworks
Building Data Pipelines for Solr with Apache NiFiBryan Bende
Apache NiFi is an easy to use, powerful, and reliable system to process and distribute data. It supports highly configurable directed graphs of data routing, transformation, and system mediation logic. Some of NiFi's key features include a web-based user interface for monitoring and controlling data flows, guaranteed delivery, data provenance, and easy extensibility through custom processor development.
These features make NiFi a perfect candidate for building production quality data pipelines that interact with Apache Solr. This talk will demonstrate how to use a NiFi processor that delivers data to a Solr update handler, as well as a processor for extracting data from Solr on regular intervals for delivery to down-stream systems. In addition we will show how these processors can be combined with other built-in NiFi processors to solve a variety of use cases, including log aggregation, and indexing messages received from Kafka.
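As a hedged sketch of what delivering to Solr's JSON update handler involves (the URL layout follows Solr's standard `/update` endpoint; the helper function itself is hypothetical and is not a NiFi processor):

```python
import json

def solr_update_request(collection, docs,
                        base_url="http://localhost:8983/solr"):
    """Build the URL, JSON body, and headers for a Solr update.
    base_url and collection are illustrative; POST the result with
    any HTTP client."""
    url = f"{base_url}/{collection}/update?commit=true"
    body = json.dumps(docs)
    headers = {"Content-Type": "application/json"}
    return url, body, headers
```

In the NiFi flow described in the talk, a processor performs this same shaping of flowfile content into an update-handler request.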
Data-Center Replication with Apache AccumuloJosh Elser
Talk given at Accumulo Summit 2014 in Hyattsville, MD.
Apache Accumulo presently lacks the ability to automatically replicate its contents to another Accumulo instance with low latency and no administrator intervention. This talk will outline the problems in designing a low-latency replication system for Accumulo tables, describe an implementation that leverages some useful features of Accumulo, and outline future work in the area.
Accumulo Summit 2014: Data-Center Replication with Apache AccumuloAccumulo Summit
Speaker: Josh Elser
Apache Accumulo presently lacks the ability to automatically replicate its contents to another Accumulo instance with low latency. The only options currently available involve quiescing a table, exporting that table, copying it to the remote instance and importing it. This is unacceptable for a few reasons, the most important being the required unavailability of the given table during export. This talk will outline the problems in designing a low-latency replication system for Accumulo tables, describe an implementation that leverages some useful features of Accumulo, and outline future work in the area.
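The replication idea, applying mutations locally while queuing them for asynchronous shipment to a peer, can be sketched as follows (a toy model for intuition only, not Accumulo's actual write-ahead-log-based design):

```python
from collections import deque

class ReplicatedTable:
    """Toy log-shipping sketch: every mutation is applied locally and
    queued; a background step drains the queue to the peer, so the
    table never has to be quiesced or exported."""
    def __init__(self):
        self.data = {}
        self.pending = deque()

    def put(self, key, value):
        self.data[key] = value          # local write stays available
        self.pending.append((key, value))  # queued for replication

    def replicate_to(self, peer):
        while self.pending:
            key, value = self.pending.popleft()
            peer.data[key] = value
```

The key property, mirrored from the talk's motivation, is that replication latency is decoupled from local write availability.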
If you want a 10,000 ft overview of the concept of flow analysis, take a look at these 6 slides. It discusses the concept of a “flow”, the role of exporters and collectors, and the characteristics of various flow formats.
5 Things Cucumber Is Bad At by Richard LawrenceSkills Matter
This talk will look at 5 things Cucumber’s bad at, why that’s a good thing, and what it tells us about Cucumber’s sweet spot in a team’s toolkit.
Many times, when people complain about something Cucumber’s not good at, they’re unwittingly describing something Cucumber shouldn't be good at. They’re revealing that they don’t quite understand BDD and Cucumber’s role in it.
Cucumber is the world's most misunderstood collaboration tool and people need to hear this over and over again.
Patterns for slick database applicationsSkills Matter
Slick is Typesafe's open source database access library for Scala. It features a collection-style API, compact syntax, type-safe, compositional queries and explicit execution control. Community feedback helped us to identify common problems developers are facing when writing Slick applications. This talk suggests particular solutions to these problems. We will be looking at reducing boiler-plate, re-using code between queries, efficiently modeling object references and more.
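Slick's query reuse is expressed in Scala; as a language-neutral sketch of the "re-using code between queries" theme, small query fragments can be written once and composed (all names here are invented for illustration):

```python
def by_active(rows):
    """Reusable fragment: keep only active rows."""
    return (r for r in rows if r["active"])

def by_country(country):
    """Parameterized fragment: keep rows for one country."""
    def q(rows):
        return (r for r in rows if r["country"] == country)
    return q

def compose(*filters):
    """Chain fragments so common predicates are written once."""
    def run(rows):
        for f in filters:
            rows = f(rows)
        return list(rows)
    return run

active_in_us = compose(by_active, by_country("US"))
```

In Slick the same composition happens over `Query` values and is translated to SQL, but the boilerplate-reduction payoff is the same.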
Scala eXchange 2013: Haoyi Li on Metascala, a tiny DIY JVMSkills Matter
Metascala is a tiny metacircular Java Virtual Machine (JVM) written in the Scala programming language. Metascala is barely 3000 lines of Scala, and is complete enough that it is able to interpret itself metacircularly. Being written in Scala and compiled to Java bytecode, the Metascala JVM requires a host JVM in order to run.
The goal of Metascala is to create a platform to experiment with the JVM: a 3000 line JVM written in Scala is probably much more approachable than the 1,000,000 lines of C/C++ which make up HotSpot, the standard implementation, and more amenable to implementing fun features like continuations, isolates or value classes. The 3000 lines of code gives you:
The bytecode interpreter, together with all the run-time data structures
A stack-machine to SSA register-machine bytecode translator
A custom heap, complete with a stop-the-world, copying garbage collector
Implementations of parts of the JVM's native interface
Although it is far from a complete implementation, Metascala already provides the ability to run untrusted bytecode securely (albeit slowly), since every operation which could potentially cause harm (including memory allocations and CPU usage) is virtualized and can be controlled. Ongoing work includes tightening of the security guarantees, improving compatibility and increasing performance.
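The heart of the list above, the bytecode interpreter, is a loop dispatching on opcodes over an operand stack. A tiny stack-machine interpreter in that spirit, with an invented three-instruction set for illustration:

```python
def interpret(code, stack=None):
    """Minimal stack-machine interpreter loop, in the spirit of
    Metascala's bytecode interpreter (instruction set is invented)."""
    stack = [] if stack is None else stack
    pc = 0  # program counter
    while pc < len(code):
        op, *args = code[pc]
        if op == "push":
            stack.append(args[0])
        elif op == "add":
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == "mul":
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
        pc += 1
    return stack[-1]
```

Metascala additionally translates such stack code to an SSA register form before interpreting, which this sketch omits.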
Progressive F# Tutorials NYC: Dmitry Mozorov & Jack Pappas on Code Quotations ...Skills Matter
Code Quotations: Code-as-Data for F#
This tutorial will cover F# Code Quotations in-depth. You'll learn what Code Quotations are, how to use them, and where to apply them in your applications. We'll work through several real-world examples to highlight the important features -- and potential pitfalls -- of Code Quotations.
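F# quotations capture code as data so programs can inspect and transform it; a rough Python analogue uses the standard `ast` module to parse an expression, rewrite the tree, and re-evaluate it (the constant-doubling rewrite is an invented example, not from the tutorial):

```python
import ast

# Quote: parse the expression into a tree instead of evaluating it.
tree = ast.parse("x * x + 1", mode="eval")

class DoubleConstants(ast.NodeTransformer):
    """Example transform over quoted code: double numeric literals."""
    def visit_Constant(self, node):
        if isinstance(node.value, (int, float)):
            return ast.Constant(node.value * 2)
        return node

rewritten = ast.fix_missing_locations(DoubleConstants().visit(tree))

# Splice: compile and evaluate the transformed tree with x bound to 3,
# so x*x + 1 becomes x*x + 2 = 11.
result = eval(compile(rewritten, "<quoted>", "eval"), {"x": 3})
```

F# quotations are typed and integrated into the language (`<@ ... @>`), which makes this pattern far safer there than the string-based version shown here.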
CukeUp NYC: Ian Dees on Elixir, Erlang, and CucumberlSkills Matter
Elixir, Erlang, and Cucumberl
Elixir is a new Ruby-inspired programming language that uses the powerful concurrent machinery of Erlang behind the scenes. Cucumberl is a port of Cucumber to Erlang. Let's see what happens when we put them together.
In this talk, we'll discuss:
How Erlang's concurrency makes it easier to write robust programs
Elixir's approachable syntax
How to test Erlang and Elixir programs using Cucumberl
Attendees will walk away with a solid introduction to the principles of Erlang, and an appreciation of the way Elixir brings the joy of Ruby to the solidity of the Erlang runtime.
CukeUp NYC: Peter Bell on Getting Started with cucumber.jsSkills Matter
Cukeup NYC. Peter Bell on Getting started with cucumber.js
Ever wished you could use cucumber in your javascript apps? In this talk we'll look at the current state of play of cucumber js, when you should and shouldn't use it, and how to get started writing your step definitions in javascript.
Agile Testing & BDD eXchange NYC 2013: Jeffrey Davidson & Lav Pathak & Sam Ho...Skills Matter
In this engaging experience report, we will present 3 different views – Developer, Tester, Business Analyst – of implementing Acceptance Test Driven Development in a complex, data-driven domain. Hear how we used ATDD for building a ubiquitous language across the entire team, promoting faster feedback, and cultivating a culture where product owners were deeply invested in the quality of both every deliverable and the system as a whole.
Progressive F# Tutorials NYC: Rachel Reese & Phil Trelford on Try F# from Zero...Skills Matter
In this tutorial, Phil and Rachel will introduce you to the Try F# samples giving you exposure to, and an understanding of, how F# tackles some real-world scenarios. We'll help you explore, generate, and just play around with code samples, as well as talk you through some of the key principles of F#. By the end of this session, you'll have gone from zero to data science in only a few hours!
Progressive F# Tutorials NYC: Don Syme keynote on F# in the Open Source WorldSkills Matter
F# is a powerful open-source language which Microsoft, other companies and the F# community all contribute to. In this talk, Don will discuss how the “F# space” has recently opened up significantly in interesting ways. F# now includes contributions that range from Cloud IDE platforms, Cloud Compute frameworks, Data interoperability components, Cross-platform execution, Try F#, MonoDevelop, and even Emacs editor integration with surprising tooling support, as well as the Visual F# tools from Microsoft and the broader NuGet package ecosystem. Don will also talk about some of the latest contributions from Microsoft Research, including new type provider components for F#, and describe how his team works with the Visual F# team and other teams around Microsoft. There will also be demos of some fun new stuff that’s been going on with F# at MSR and the community.
Agile Testing & BDD eXchange NYC 2013: Gojko Adzic on Bond Villain Guide to S...Skills Matter
Would you like to learn how to make your software testing practices more effective? And how to use your testing strategy to better capture and reflect customer requirements? Gojko Adzic takes a critical look at the effectiveness of current software testing practices and proposes strategies to make it much more effective.
Dmitry Mozorov on Code Quotations: Code-as-Data for F#Skills Matter
Code Quotations: Code-as-Data for F#
This tutorial will cover F# Code Quotations in-depth. You'll learn what Code Quotations are, how to use them, and where to apply them in your applications. We'll work through several real-world examples to highlight the important features -- and potential pitfalls -- of Code Quotations.
Simon Peyton Jones: Managing parallelismSkills Matter
If you want to program a parallel computer, it obviously makes sense to start with a computational paradigm in which parallelism is the default (ie functional programming), rather than one in which computation is based on sequential flow of control (the imperative paradigm). And yet, and yet ... functional programmers have been singing this tune since the 1980s, but do not yet rule the world. In this talk I’ll say why I think parallelism is too complex a beast to be slain at one blow, and how we are going to be driven, willy-nilly, towards a world in which side effects are much more tightly controlled than now. I’ll sketch a whole range of ways of writing parallel program in a functional paradigm (implicit parallelism, transactional memory, data parallelism, DSLs for GPUs, distributed processes, etc, etc), illustrating with examples from the rapidly moving Haskell community, and identifying some of the challenges we need to tackle.
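The data-parallel end of the spectrum Simon sketches rests on one observation: when the mapped function is pure, the runtime may evaluate elements in any order, or in parallel, without changing the result. A toy sketch of that contract using threads (illustrative only, not one of the Haskell mechanisms discussed in the talk):

```python
from concurrent.futures import ThreadPoolExecutor

def pmap(f, xs, workers=4):
    """Parallel map over a pure function f: because f has no side
    effects, evaluation order is unobservable and the executor is
    free to run elements concurrently. Result order is preserved."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(f, xs))
```

The moment `f` has side effects, this equivalence with sequential `map` breaks, which is exactly the argument for tightly controlling effects.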
Climate Impact of Software Testing at Nordic Testing DaysKari Kakkonen
My slides at Nordic Testing Days 6.6.2024
The climate impact and sustainability of software testing are discussed in this talk. ICT and testing must carry their part of the global responsibility to help with climate warming. We can minimize the carbon footprint, but we can also have a carbon handprint, a positive impact on the climate. Quality characteristics can be extended with sustainability and then measured continuously. Test environments can be used less, at a smaller scale, and on demand. Test techniques can be used to optimize or minimize the number of tests. Test automation can be used to speed up testing.
Generative AI Deep Dive: Advancing from Proof of Concept to ProductionAggregage
Join Maher Hanafi, VP of Engineering at Betterworks, in this new session where he'll share a practical framework to transform Gen AI prototypes into impactful products! He'll delve into the complexities of data collection and management, model selection and optimization, and ensuring security, scalability, and responsible use.
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024Albert Hoitingh
In this session I delve into the encryption technology used in Microsoft 365 and Microsoft Purview, including the concepts of Customer Key and Double Key Encryption.
UiPath Test Automation using UiPath Test Suite series, part 5DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 5. In this session, we will cover CI/CD with DevOps.
Topics covered:
CI/CD within UiPath
End-to-end overview of the CI/CD pipeline with Azure DevOps
Speaker:
Lyndsey Byblow, Test Suite Sales Engineer @ UiPath, Inc.
The Art of the Pitch: WordPress Relationships and SalesLaura Byrne
Clients don’t know what they don’t know. What web solutions are right for them? How does WordPress come into the picture? How do you make sure you understand scope and timeline? What do you do if something changes?
All these questions and more will be explored as we talk about matching clients’ needs with what your agency offers without pulling teeth or pulling your hair out. Practical tips, and strategies for successful relationship building that leads to closing the deal.
Elevating Tactical DDD Patterns Through Object CalisthenicsDorra BARTAGUIZ
After immersing yourself in the blue book and its red counterpart, attending DDD-focused conferences, and applying tactical patterns, you're left with a crucial question: How do I ensure my design is effective? Tactical patterns within Domain-Driven Design (DDD) serve as guiding principles for creating clear and manageable domain models. However, achieving success with these patterns requires additional guidance. Interestingly, we've observed that a set of constraints initially designed for training purposes remarkably aligns with effective pattern implementation, offering a more ‘mechanical’ approach. Let's explore together how Object Calisthenics can elevate the design of your tactical DDD patterns, offering concrete help for those venturing into DDD for the first time!
GridMate - End to end testing is a critical piece to ensure quality and avoid...ThomasParaiso2
End to end testing is a critical piece to ensure quality and avoid regressions. In this session, we share our journey building an E2E testing pipeline for GridMate components (LWC and Aura) using Cypress, JSForce, FakerJS…
UiPath Test Automation using UiPath Test Suite series, part 4DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 4. In this session, we will cover Test Manager overview along with SAP heatmap.
The UiPath Test Manager overview with SAP heatmap webinar offers a concise yet comprehensive exploration of the role of a Test Manager within SAP environments, coupled with the utilization of heatmaps for effective testing strategies.
Participants will gain insights into the responsibilities, challenges, and best practices associated with test management in SAP projects. Additionally, the webinar delves into the significance of heatmaps as a visual aid for identifying testing priorities, areas of risk, and resource allocation within SAP landscapes. Through this session, attendees can expect to enhance their understanding of test management principles while learning practical approaches to optimize testing processes in SAP environments using heatmap visualization techniques.
What will you get from this session?
1. Insights into SAP testing best practices
2. Heatmap utilization for testing
3. Optimization of testing processes
4. Demo
Topics covered:
Execution from the test manager
Orchestrator execution result
Defect reporting
SAP heatmap example with demo
Speaker:
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
Transcript: Selling digital books in 2024: Insights from industry leaders - T...BookNet Canada
The publishing industry has been selling digital audiobooks and ebooks for over a decade and has found its groove. What’s changed? What has stayed the same? Where do we go from here? Join a group of leading sales peers from across the industry for a conversation about the lessons learned since the popularization of digital books, best practices, digital book supply chain management, and more.
Link to video recording: https://bnctechforum.ca/sessions/selling-digital-books-in-2024-insights-from-industry-leaders/
Presented by BookNet Canada on May 28, 2024, with support from the Department of Canadian Heritage.
DevOps and Testing slides at DASA ConnectKari Kakkonen
My slides, together with Rik Marselis, from the 30.5.2024 DASA Connect conference. We discuss what testing is, then what agile testing is, and finally what Testing in DevOps is. Finally, we had a lovely workshop with the participants, trying to find different ways to think about quality and testing in different parts of the DevOps infinity loop.
Removing Uninteresting Bytes in Software FuzzingAftab Hussain
Imagine a world where software fuzzing, the process of mutating bytes in test seeds to uncover hidden and erroneous program behaviors, becomes faster and more effective. A lot depends on the initial seeds, which can significantly dictate the trajectory of a fuzzing campaign, particularly in terms of how long it takes to uncover interesting behaviour in your code. We introduce DIAR, a technique designed to speed up fuzzing campaigns by pinpointing and eliminating those uninteresting bytes in the seeds. Picture this: instead of wasting valuable resources on meaningless mutations in large, bloated seeds, DIAR removes the unnecessary bytes, streamlining the entire process.
In this work, we equipped AFL, a popular fuzzer, with DIAR and examined two critical Linux libraries -- Libxml's xmllint, a tool for parsing xml documents, and Binutil's readelf, an essential debugging and security analysis command-line tool used to display detailed information about ELF (Executable and Linkable Format). Our preliminary results show that AFL+DIAR does not only discover new paths more quickly but also achieves higher coverage overall. This work thus showcases how starting with lean and optimized seeds can lead to faster, more comprehensive fuzzing campaigns -- and DIAR helps you find such seeds.
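A deliberately simplified sketch of the idea, not the paper's exact algorithm: treat a byte position as uninteresting if mutating it never changes the coverage the target program produces, then trim those positions from the seed (`coverage_of` stands in for actually running the instrumented target):

```python
def uninteresting_bytes(seed, coverage_of):
    """Flag byte positions whose mutation leaves coverage unchanged.
    Simplified one-mutation probe; a real tool would try several
    mutations per position before declaring it uninteresting."""
    baseline = coverage_of(seed)
    boring = []
    for i in range(len(seed)):
        mutated = seed[:i] + bytes([seed[i] ^ 0xFF]) + seed[i + 1:]
        if coverage_of(mutated) == baseline:
            boring.append(i)
    return boring

def trim(seed, positions):
    """Drop the flagged positions, yielding a leaner seed."""
    skip = set(positions)
    return bytes(b for i, b in enumerate(seed) if i not in skip)
```

The payoff mirrors the results above: mutations in the trimmed seed are spent only on bytes the target actually reacts to.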
- These are slides of the talk given at IEEE International Conference on Software Testing Verification and Validation Workshop, ICSTW 2022.
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...James Anderson
Effective Application Security in Software Delivery lifecycle using Deployment Firewall and DBOM
The modern software delivery process (or the CI/CD process) includes many tools, distributed teams, open-source code, and cloud platforms. Constant focus on speed to release software to market, along with the traditional slow and manual security checks has caused gaps in continuous security as an important piece in the software supply chain. Today organizations feel more susceptible to external and internal cyber threats due to the vast attack surface in their applications supply chain and the lack of end-to-end governance and risk management.
The software team must secure its software delivery process to avoid vulnerability and security breaches. This needs to be achieved with existing tool chains and without extensive rework of the delivery processes. This talk will present strategies and techniques for providing visibility into the true risk of the existing vulnerabilities, preventing the introduction of security issues in the software, resolving vulnerabilities in production environments quickly, and capturing the deployment bill of materials (DBOM).
Speakers:
Bob Boule
Robert Boule is a technology enthusiast with PASSION for technology and making things work along with a knack for helping others understand how things work. He comes with around 20 years of solution engineering experience in application security, software continuous delivery, and SaaS platforms. He is known for his dynamic presentations in CI/CD and application security integrated in software delivery lifecycle.
Gopinath Rebala
Gopinath Rebala is the CTO of OpsMx, where he has overall responsibility for the machine learning and data processing architectures for Secure Software Delivery. Gopi also has a strong connection with our customers, leading design and architecture for strategic implementations. Gopi is a frequent speaker and well-known leader in continuous delivery and integrating security into software delivery.
2. Scenario
• Situation:
  – You have hundreds of services producing logs
  – You're running a daily cron job on the logs
    • Rotating the logs
    • Maybe compressing or otherwise processing them
    • Transferring them to HDFS (the Hadoop Distributed File System)
• Problem:
  – As the amount of data increases, it takes longer and longer to run the cron job
7/15/2010 2
3. You need a "Flume"
• Flume is a distributed system that gets your logs from their source and aggregates them to where you want to process them
• Open source, Apache v2.0 License
• Goals:
  – Reliability
  – Scalability
  – Extensibility
  – Manageability
(Photo: Columbia Gorge, Broughton Log Flume)
4. Use cases
• Collecting logs from nodes in your Hadoop cluster
• Collecting logs from services such as httpd, mail, etc.
• Collecting impressions from custom apps for an ad network
• But wait, there's more!
  – Basic online in-stream analysis
  – Online in-stream file processing and manipulation
(Caption: It's log, log ... Everyone wants a log!)
5. Key abstractions
• Data path and control path
• Nodes are in the data path
  – Nodes have a source and a sink
  – They can take different roles
• A typical topology has agent nodes and collector nodes
• Optionally it has processor nodes
• Masters are in the control path
  – Centralized point of configuration
  – Specify sources and sinks
  – Can control flows of data between nodes
  – Use one master or use many with a ZooKeeper-backed quorum
(Diagram: Agent and Collector nodes on the data path; Master on the control path)
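The agent/collector split maps directly onto Flume's dataflow configuration language. A minimal sketch, with syntax and source/sink names recalled from the Flume 0.9.x user guide (exact spellings may differ; hostnames, paths, and the port are illustrative):

```
agent1 : tail("/var/log/httpd/access.log") | agentSink("collector1", 35853) ;
collector1 : collectorSource(35853) | collectorSink("hdfs://namenode/flume/weblogs/", "web-") ;
```

The master pushes this mapping of node name to `source | sink` out to the corresponding nodes, so changing a flow is a configuration update rather than a redeploy.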
8. Outline
• What is Flume?
  – Goals and architecture
• Reliability
  – Fault-tolerance and high availability
• Scalability
  – Horizontal scalability of all nodes and masters
• Extensibility
  – Unix principle; all kinds of data, all kinds of sources, all kinds of sinks
• Manageability
  – Centralized management supporting dynamic reconfiguration
9. RELIABILITY
The logs will still get there…
10. Tunable data reliability levels
• Best effort
  – Fire and forget
  (Agent → Collector → HDFS)
• Store on failure + retry
  – Local acks, local errors detectable
  – Failover when faults detected
  (Agent → Collector → HDFS)
• End-to-end reliability
  – End-to-end acks
  – Data survives compound failures, and may be retried multiple times
  (Agent → Collector → HDFS)
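In the configuration language, the three levels correspond to different agent sinks. A sketch, assuming the `agentBESink` / `agentDFOSink` / `agentE2ESink` names as recalled from the Flume 0.9.x documentation:

```
bestEffort : tail("/var/log/app.log") | agentBESink("collector1", 35853) ;
diskFailover : tail("/var/log/app.log") | agentDFOSink("collector1", 35853) ;
endToEnd : tail("/var/log/app.log") | agentE2ESink("collector1", 35853) ;
```

BE fires and forgets; DFO buffers to local disk on failure and retries; E2E keeps the event in the agent's write-ahead log until the final destination acknowledges delivery.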
13. Data path is horizontally scalable
(Diagram: multiple Agents → Collector → HDFS)
• Add collectors to increase availability and to handle more data
  – Assumes a single agent will not dominate a collector
  – Fewer connections to HDFS
  – Larger, more efficient writes to HDFS
• Agents have mechanisms for machine resource tradeoffs
  • Write log locally to avoid collector disk IO bottleneck and catastrophic failures
  • Compression and batching (trade CPU for network)
  • Push computation into the event collection pipeline (balance IO, memory, and CPU resource bottlenecks)
14. Load balancing
(Diagram: Agents logically partitioned across multiple Collectors)
• Agents are logically partitioned and can send to different collectors
• Use randomization to pre-specify failovers when many collectors exist
• Spread load if a collector goes down
• Spread load if new collectors are added to the system
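Failover between collectors can be spelled out explicitly in the sink specification. A sketch using the `< primary ? backup >` failover form as recalled from the Flume 0.9.x guide (node names and port are illustrative):

```
agent1 : tail("/var/log/app.log") | < agentSink("collectorA", 35853) ? agentSink("collectorB", 35853) > ;
```

In larger deployments the master can generate randomized failover chains of this shape automatically, which is what spreads load when a collector dies or when new collectors join.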
16. Control plane is horizontally scalable
(Diagram: Nodes → Masters → ZooKeeper quorum ZK1/ZK2/ZK3)
• A master controls dynamic configurations of nodes
  – Uses a consensus protocol to keep state consistent
  – Scales well for configuration reads
  – Allows for adaptive repartitioning in the future
• Nodes can talk to any master
• Masters can talk to any ZooKeeper member
19. EXTENSIBILITY
Turn raw logs into something useful…
20. Flume is easy to extend
• Simple source and sink APIs
  – Event-granularity streaming design
  – Have many simple operations and compose them for complex behavior
• End-to-end principle
  – Put smarts and state at the end points; keep the middle simple
• Flume deals with reliability
  – Just add a new source or a new sink, and Flume has primitives to deal with reliability
21. Variety of data sources
• Can deal with push and pull sources
  (Diagram: App → Agent via push, poll, and embed)
• Supports many legacy event sources
  – Tailing a file
  – Output from a periodically exec'ed program
  – Syslog, syslog-ng
  – Experimental: IRC / Twitter / Scribe / AMQP
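Each of these source types plugs into the same `source | sink` slot in a node's configuration. A sketch, with the `tail` and `syslogTcp` source names as recalled from the 0.9.x docs (ports and paths are illustrative):

```
webNode : tail("/var/log/httpd/access.log") | agentSink("collector1", 35853) ;
syslogNode : syslogTcp(5140) | agentSink("collector1", 35853) ;
```

Swapping the source changes where events come from; everything downstream, including the reliability mechanisms, stays the same.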
22. Variety of data output
• Send data to many sinks
  – HDFS, files, console, RPC
  – Experimental: HBase, Voldemort, S3, etc.
• Supports an extensible variety of output formats and destinations
  – Output to language-neutral and open data formats (JSON, Avro, text)
  – Compressed output files in development
• Uses decorators to process event data in-flight
  – Sampling, attribute extraction, filtering, projection, checksumming, batching, wire compression, etc.
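Decorators wrap a sink and transform events on the way through, so in-flight processing composes in the configuration itself. A sketch assuming `batch` and `gzip` decorator names (illustrative; the exact decorator catalog should be checked against the documentation):

```
agent1 : tail("/var/log/app.log") | { batch(100) => { gzip => agentE2ESink("collector1", 35853) } } ;
```

Here events would be grouped 100 at a time and compressed before being shipped, trading agent CPU for network bandwidth as described above.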
24. Centralized data flow management
• Master specifies node sources, sinks, and data flows
  – Simply specify the role of the node: collector, agent
  – Or specify a custom configuration for a node
• Control interfaces:
  – Flume Shell
  – Basic Web
  – HUE + Flume Manager App (Enterprise users)
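With the Flume Shell, a role or custom configuration is pushed from the command line. A hedged transcript sketch (the `exec config` command form and the master port 35873 are as recalled from the 0.9.x guide; hostnames are illustrative):

```
$ flume shell -c masterhost:35873
[flume masterhost:35873] exec config agent1 'tail("/var/log/app.log")' 'agentSink("collector1", 35853)'
[flume masterhost:35873] exec config collector1 'collectorSource(35853)' 'collectorSink("hdfs://namenode/flume/", "log-")'
```

Because the master holds the configurations centrally, the same commands reconfigure a node whether it is one machine or one of hundreds.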
26. For advanced users
• A concise and precise configuration language for specifying arbitrary data paths
  – Dataflows are essentially DAGs
  – Control specific event flows
    • Enable durability and failover mechanisms
    • Tune the parameters of these mechanisms
  – Dynamic updates of configurations
    • Allows for live failover changes
    • Allows for handling newly provisioned machines
    • Allows for changing analytics
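Because dataflows are DAGs, a single source can fan out to several sinks in one specification. A sketch using the `[ sink1, sink2 ]` fan-out form as recalled from the 0.9.x guide (the `console` sink name is illustrative):

```
agent1 : tail("/var/log/app.log") | [ console, agentE2ESink("collector1", 35853) ] ;
```

Updating such a spec at the master re-wires the flow live, which is how failover changes, newly provisioned machines, and changed analytics are handled without restarting nodes.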
28. Summary
• Flume is a distributed, reliable, scalable system for collecting and delivering high-volume continuous event data such as logs
  – Tunable data reliability levels
  – Reliable master backed by ZooKeeper
  – Writes data to HDFS into buckets ready for batch processing
  – Dynamically configurable nodes
  – Simplified automated management for agent+collector topologies
• Open source, Apache v2.0 license