Talk Abstract
Fluo provides a framework to incrementally process large datasets stored in Accumulo. Using Fluo, developers can write applications that maintain a large-scale computation using a series of small transactional updates. Compared to batch processing frameworks, Fluo enables lower-latency, continuous analysis of data at the cost of throughput. This talk will provide an overview of the Fluo project, touching on its design, use cases, and API. It will show how developers can write Fluo applications to solve problems in a new way, highlight the benefits of using Fluo, and cover the trade-offs and potential problems developers may face when writing Fluo applications. The talk will end with a discussion of the current status and future direction of the Fluo project.
Speaker
Michael Walch
Software Engineer, Peterson Technologies
Mike is a software engineer and committer on the Fluo project. He has a background in distributed systems and data science. He holds a Master's in Computer Science from Johns Hopkins University and a B.S. in Electrical & Computer Engineering from Carnegie Mellon University.
4. Solution 2 - Maintain counts using Fluo

Website        # Inbound Links
fluo.io                    53
github.com          1,385,192
apache.org          2,528,190
nytimes.com        53,395,000

(Diagram: a web crawler pulls pages from the Internet into a web cache; +1/-1 updates are applied to the inbound-link counts in the Fluo table.)
5. Solution 3 - Use both: update popular sites using batch
processing & update long tail using Fluo

(Chart: website distribution by # inbound links. The popular head of the distribution (e.g. nytimes.com) is updated every hour using MapReduce; the long tail (e.g. github.com, fluo.io) is updated in real time using Fluo.)
6. Fluo 101 - Basics
- Provides cross-row transactions and snapshot isolation, which
make it safe to do concurrent updates
- Allows for incremental processing of data
- Based on Google’s Percolator paper
- Started as a side project by Keith Turner in 2013
- Originally called Accismus
- Tested using synthetic workloads
- Almost ready for production environments
7. Fluo 101 - Accumulo vs Fluo
- Fluo is a transactional API built on top of Accumulo
- Fluo stores its data in Accumulo
- Fluo uses Accumulo conditional mutations for transactions
- Fluo has a table structure (row, column, value) similar to Accumulo's,
except that Fluo exposes no timestamp
- Each Fluo application runs its own processes:
- An oracle allocates timestamps for transactions
- Workers run user code (called observers) that performs transactions
9. Fluo 101 - Client API
Used by developers to ingest data or interact with Fluo from
external applications (REST services, crawlers, etc.)

public void addDocument(FluoClient fluoClient, String docId, String content) {
  TypeLayer typeLayer = new TypeLayer(new StringEncoder());
  // Open a transaction; try-with-resources closes it when done
  try (TypedTransaction tx1 = typeLayer.wrap(fluoClient.newTransaction())) {
    // Only add the document if it has not been seen before
    if (tx1.get().row(docId).col(CONTENT_COL).toString() == null) {
      tx1.mutate().row(docId).col(CONTENT_COL).set(content);
      tx1.commit();
    }
  }
}
10. Fluo 101 - Observers
- Developers can write observers, which Fluo workers run when an
observed column is modified
- Best practice: do work and transactions in observers rather than in client code
public class DocumentObserver extends TypedObserver {
@Override
public void process(TypedTransactionBase tx, Bytes row, Column column) {
// do work here
}
@Override
public ObservedColumn getObservedColumn() {
return new ObservedColumn(CONTENT_COL, NotificationType.STRONG);
}
}
11. Example Fluo Application
- Problem: Maintain word and document counts in real time as
documents are added to and deleted from Fluo
- Fluo client performs two actions:
1. Add document to table
2. Mark document for deletion
- Which triggers two observers:
- Add Observer - increase word and document counts
- Delete Observer - decrease counts and clean up
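The bookkeeping these two observers perform can be sketched in plain Java, with no Fluo dependency (the class and method names below are invented for illustration):

```java
import java.util.HashMap;
import java.util.Map;

// Simulation of the Add/Delete observer bookkeeping described above.
public class WordCounts {
    private final Map<String, Long> counts = new HashMap<>();
    private long totalDocs = 0;

    // Mirrors the Add Observer: bump the count of every word in the document
    public void addDocument(String content) {
        for (String word : content.split("\\s+")) {
            counts.merge("w:" + word, 1L, Long::sum);
        }
        totalDocs++;
    }

    // Mirrors the Delete Observer: decrement counts, dropping rows at zero
    public void deleteDocument(String content) {
        for (String word : content.split("\\s+")) {
            counts.computeIfPresent("w:" + word,
                (k, v) -> v <= 1 ? null : Long.valueOf(v - 1));
        }
        totalDocs--;
    }

    public long count(String word) {
        return counts.getOrDefault("w:" + word, 0L);
    }

    public long totalDocs() {
        return totalDocs;
    }
}
```

In an actual Fluo application each increment or decrement runs inside its own transaction, so concurrent updates to the same word count are detected as collisions and retried.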
12. Add first document to table

Fluo Table:
Row        Column   Value
d : doc1   doc      my first hello world

(Diagram: a Fluo client on the client cluster adds the document; the Add and Delete observers watch the table.)
13. An observer increments word counts

Fluo Table:
Row            Column   Value
d : doc1       doc      my first hello world
w : first      cnt      1
w : hello      cnt      1
w : my         cnt      1
w : world      cnt      1
total : docs   cnt      1
14. A second document is added

Fluo Table:
Row           Column   Value
d : doc1      doc      my first hello world
d : doc2      doc      second hello world
w : first     cnt      1
w : hello     cnt      2
w : my        cnt      1
w : second    cnt      1
w : world     cnt      2
total : doc   cnt      2
15. First document is marked for deletion

Fluo Table:
Row           Column   Value
d : doc1      doc      my first hello world
d : doc1      delete
d : doc2      doc      second hello world
w : first     cnt      1
w : hello     cnt      2
w : my        cnt      1
w : second    cnt      1
w : world     cnt      2
total : doc   cnt      2
16. Observer decrements counts and deletes document

Fluo Table:
Row           Column   Value
d : doc1      doc      my first hello world
d : doc1      delete
d : doc2      doc      second hello world
w : first     cnt      1
w : hello     cnt      1
w : my        cnt      1
w : second    cnt      1
w : world     cnt      1
total : doc   cnt      1
17. Things to watch out for...
- Collisions occur when two transactions update the same data at the
same time
- Only one transaction succeeds; the others must be retried
- A few collisions are OK, but too many can slow computation
- Avoid collisions by not updating the same row/column in every transaction
- Write skew occurs when two transactions read an overlapping data set
and make disjoint updates without seeing each other's changes
- The result is different than if the transactions had been serialized
- Prevent write skew by making both transactions update the same row/column;
if they run concurrently, a collision occurs and only one succeeds
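The collision-and-retry pattern can be illustrated with a plain-Java compare-and-set loop. In Fluo itself a failed commit() plays the role of the failed compareAndSet below; RetryCounter is an invented name for illustration:

```java
import java.util.concurrent.atomic.AtomicLong;

// Optimistic read-modify-write with retry: when two updates race,
// one "commits" and the loser detects the conflict and retries.
public class RetryCounter {
    private final AtomicLong value = new AtomicLong();

    // Retry until our update commits without a conflicting write
    public long increment() {
        while (true) {
            long seen = value.get();                    // "snapshot" read
            if (value.compareAndSet(seen, seen + 1)) {  // attempt to "commit"
                return seen + 1;                        // success
            }
            // collision: another writer won; loop and retry
        }
    }

    public long get() {
        return value.get();
    }
}
```

As the slide notes, a few retries are harmless, but if every transaction touches the same cell this loop becomes the bottleneck.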
18. How does Fluo fit in?
(Chart: frameworks plotted by large-join throughput (higher to lower) against processing latency (slower to faster). Batch processing (MapReduce, Spark) has the highest throughput and slowest latency; incremental processing (Fluo, Percolator) sits in between; stream processing (Storm) has the lowest throughput and fastest latency.)
19. Don’t use Fluo if...
1. You want to do ad-hoc analysis on your data
(use batch processing instead)
2. Your incoming data is being joined with a small data set
(use stream processing instead)
20. Use Fluo if...
1. You want to maintain a large scale computation
using a series of small transactional updates
2. Periodic batch processing jobs are taking too long to
join new data with existing data
21. Fluo Application Lifecycle
1. Use batch processing to seed the computation with historical data
2. Use Fluo to process incoming data and maintain the computation in
real time
3. While processing, Fluo can be queried and notifications can be
sent to users
22. Major Progress
- 2010: Google releases the Percolator paper
- 2013: Keith Turner starts work on a Percolator implementation for
Accumulo as a side project (originally called Accismus)
- 2014: Fluo can process transactions; oracle and worker can be run
in YARN; project name changed to Fluo; 1.0.0-alpha released
- 2015: Solidified the Fluo client/observer API; improved how observer
notifications are found; automated running a Fluo cluster on Amazon EC2;
added multi-application support; created the stress test;
1.0.0-beta releasing soon
23. Fluo Stress Test
- Motivation: needed a test that stresses Fluo
and is easy to verify for correctness
- The stress test computes the number of
unique integers by building a bitwise trie
- New integers are added at leaf nodes
- Observers watch all nodes, create parents,
and percolate totals up to the root node
- The test passes if the count at the root is the
same as the number of leaf nodes
- Multiple transactions can operate on the same
nodes, causing collisions
(Diagram: example 4-bit trie - leaves such as 1110, 1100, 0101 roll up through interior nodes 11xx = 3, 10xx = 0, 01xx = 1, 00xx = 1 to the root xxxx = 5.)
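The trie's counting invariant can be sketched in plain Java using 4-bit integers (class and method names are invented; this sequential sketch ignores the collisions a concurrent Fluo run must handle):

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Bitwise trie over 4-bit integers: each node counts the distinct
// integers beneath it, so the root count equals the number of leaves.
public class BitwiseTrie {
    private final Map<String, Integer> nodes = new HashMap<>(); // label -> count
    private final Set<Integer> leaves = new HashSet<>();

    public void add(int value) {
        if (!leaves.add(value)) {
            return; // only unique integers are counted
        }
        // zero-padded 4-bit label for the leaf, e.g. 5 -> "0101"
        String bits = String.format("%4s",
            Integer.toBinaryString(value)).replace(' ', '0');
        // walk from leaf to root, creating parents and bumping counts
        for (int keep = 4; keep >= 0; keep--) {
            String label = bits.substring(0, keep) + "xxxx".substring(keep);
            nodes.merge(label, 1, Integer::sum);
        }
    }

    public int count(String label) {
        return nodes.getOrDefault(label, 0);
    }

    public int rootCount() {
        return count("xxxx");
    }

    public int leafCount() {
        return leaves.size();
    }
}
```

The verification step is then a single comparison of rootCount() against leafCount(); in the real stress test the percolation is done by observers, one transaction per node update.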
24. Easy to run Fluo
1. On a machine with Maven and Git, clone the fluo-dev and fluo repos
2. Follow some basic configuration steps
3. Run the following commands

fluo-dev download # Downloads the Accumulo, Hadoop, and Zookeeper tarballs
fluo-dev setup # Sets up Accumulo, Hadoop, etc. locally
fluo-dev deploy # Builds a Fluo distribution and deploys it locally
fluo new myapp # Creates configuration for the ‘myapp’ Fluo application
fluo init myapp # Initializes ‘myapp’ in Zookeeper
fluo start myapp # Starts the oracle and worker processes of ‘myapp’ in YARN
fluo scan myapp # Prints a snapshot of the data in the Fluo table of ‘myapp’

It’s just as easy to run a Fluo cluster on Amazon EC2
25. Fluo Ecosystem
- fluo: main project repo
- fluo-quickstart: simple Fluo example
- phrasecount: in-depth Fluo example
- fluo-stress: stresses Fluo on a cluster
- fluo-deploy: runs Fluo on an EC2 cluster
- fluo-dev: helps developers run Fluo locally
- fluo-io.github.io: Fluo project website
26. Future Direction
- Primary focus: ship a production-ready 1.0 release with a stable API
- Other possible work:
- Fluo-32: Real world example application
- Possibly using CommonCrawl data
- Fluo-58: Support writing observers in Python
- Fluo-290: Support running Fluo on Mesos
- Fluo-478: Automatically scale up & down Fluo workers based on
workload
27. Get involved!
1. Experiment with Fluo
- API has stabilized
- Tools and development process make it easy
- Not recommended for production yet (wait for 1.0)
2. Contribute to Fluo
- ~85 open issues on GitHub
- Review-then-commit process