Storing time series data with Apache CassandraPatrick McFadin
If you are looking to collect and store time series data, it's probably not going to be small. Don't get caught without a plan! Apache Cassandra has proven itself as a solid choice now you can learn how to do it. We'll look at possible data models and the the choices you have to be successful. Then, let's open the hood and learn about how data is stored in Apache Cassandra. You don't need to be an expert in distributed systems to make this work and I'll show you how. I'll give you real-world examples and work through the steps. Give me an hour and I will upgrade your time series game.
A look at what HA is and what PostgreSQL has to offer for building an open source HA solution. Covers various aspects in terms of Recovery Point Objective and Recovery Time Objective. Includes backup and restore, PITR (point in time recovery) and streaming replication concepts.
Capacity Planning Your Kafka Cluster | Jason Bell, DigitalisHostedbyConfluent
"There's little talk about capacity planning Kafka clusters, it's very much learn as you go, every cluster is different. In this talk Kafka DevOps Engineer Jason Bell takes you through the things that will help you, from broker capacity, thinking about topics and how the other Confluent components can affect throughput and performance. With a number of production deployments under his watchful gaze for over six years Jason has plenty of experience, stories and useful information that will help you.
By the end of the talk you'll have a good understanding of designing the cluster for various scenarios, where the points of latency are to watch and monitor. And also how to prevent teams breaking the cluster behind your back.
This talk is designed for everyone, anyone who is just starting to those who are operating Kafka on a daily basis."
Storing time series data with Apache CassandraPatrick McFadin
If you are looking to collect and store time series data, it's probably not going to be small. Don't get caught without a plan! Apache Cassandra has proven itself as a solid choice now you can learn how to do it. We'll look at possible data models and the the choices you have to be successful. Then, let's open the hood and learn about how data is stored in Apache Cassandra. You don't need to be an expert in distributed systems to make this work and I'll show you how. I'll give you real-world examples and work through the steps. Give me an hour and I will upgrade your time series game.
A look at what HA is and what PostgreSQL has to offer for building an open source HA solution. Covers various aspects in terms of Recovery Point Objective and Recovery Time Objective. Includes backup and restore, PITR (point in time recovery) and streaming replication concepts.
Capacity Planning Your Kafka Cluster | Jason Bell, DigitalisHostedbyConfluent
"There's little talk about capacity planning Kafka clusters, it's very much learn as you go, every cluster is different. In this talk Kafka DevOps Engineer Jason Bell takes you through the things that will help you, from broker capacity, thinking about topics and how the other Confluent components can affect throughput and performance. With a number of production deployments under his watchful gaze for over six years Jason has plenty of experience, stories and useful information that will help you.
By the end of the talk you'll have a good understanding of designing the cluster for various scenarios, where the points of latency are to watch and monitor. And also how to prevent teams breaking the cluster behind your back.
This talk is designed for everyone, anyone who is just starting to those who are operating Kafka on a daily basis."
Understanding SQL Trace, TKPROF and Execution Plan for beginnersCarlos Sierra
The three fundamental steps of SQL Tuning are: 1) Diagnostics Collection; 2) Root Cause Analysis (RCA); and 3) Remediation. This introductory session on SQL Tuning is for novice DBAs and Developers that are required to investigate a piece of an application performing poorly.
On this session participants will learn about producing a SQL Trace then a summary TKPROF report. A sample TKPROF is navigated with the audience, where the trivial and the no so trivial is exposed and explain. Execution Plans are also navigated and explained, so participants can later untangle complex Execution Plans and start diagnosing SQL performing badly.
This session encourages participants to ask all kind of questions that could be potential road-blocks for deeper understanding of how to address a SQL performing poorly.
ISO SQL:2016 introduced Row Pattern Matching: a feature to apply (limited) regular expressions on table rows and perform analysis on each match. As of writing, this feature is only supported by the Oracle Database 12c.
Almost Perfect Service Discovery and Failover with ProxySQL and OrchestratorJean-François Gagné
Of course there is no such thing as perfect service discovery, and we will see why in the talk. However, the way ProxySQL is deployed in this case minimizes the risk for split-brains, and this is why I qualify it as almost perfect. But let’s step back a little...
MySQL alone is not a high availability solution. To provide resilience to primary failure, other components need to be integrated with MySQL. At MessageBird, these additional components are ProxySQL and Orchestrator. In this talk, we describe how ProxySQL is architectured to provide close to perfect Service Discovery and how this, combined with Orchestrator, allows for automatic failover. The talk presents the details of the integration of MySQL, ProxySQL and Orchestrator in Google Cloud (and it would be easy to re-implement a similar architecture at other cloud vendors or on-premises). We will also cover lessons learned for the 2 years this architecture has been in production. Come to this talk to learn more about MySQL high availability, ProxySQL and Orchestrator.
Performance Tuning RocksDB for Kafka Streams' State Stores (Dhruba Borthakur,...confluent
RocksDB is the default state store for Kafka Streams. In this talk, we will discuss how to improve single node performance of the state store by tuning RocksDB and how to efficiently identify issues in the setup. We start with a short description of the RocksDB architecture. We discuss how Kafka Streams restores the state stores from Kafka by leveraging RocksDB features for bulk loading of data. We give examples of hand-tuning the RocksDB state stores based on Kafka Streams metrics and RocksDB’s metrics. At the end, we dive into a few RocksDB command line utilities that allow you to debug your setup and dump data from a state store. We illustrate the usage of the utilities with a few real-life use cases. The key takeaway from the session is the ability to understand the internal details of the default state store in Kafka Streams so that engineers can fine-tune their performance for different varieties of workloads and operate the state stores in a more robust manner.
Common Strategies for Improving Performance on Your Delta LakehouseDatabricks
The Delta Architecture pattern has made the lives of data engineers much simpler, but what about improving query performance for data analysts? What are some common places to look at for tuning query performance? In this session we will cover some common techniques to apply to our delta tables to make them perform better for data analysts queries. We will look at a few examples of how you can analyze a query, and determine what to focus on to deliver better performance results.
All About JSON and ClickHouse - Tips, Tricks and New Features-2022-07-26-FINA...Altinity Ltd
JSON is the king of data formats and ClickHouse has a plethora of features to handle it. This webinar covers JSON features from A to Z starting with traditional ways to load and represent JSON data in ClickHouse. Next, we’ll jump into the JSON data type: how it works, how to query data from it, and what works and doesn’t work. JSON data type is one of the most awaited features in the 2022 ClickHouse roadmap, so you won’t want to miss out. Finally, we’ll talk about Jedi master techniques like adding bloom filter indexing on JSON data.
PostgreSQL (or Postgres) began its life in 1986 as POSTGRES, a research project of the University of California at Berkeley.
PostgreSQL isn't just relational, it's object-relational.it's object-relational. This gives it some advantages over other open source SQL databases like MySQL, MariaDB and Firebird.
Best Practices for Supercharging Cloud Analytics on Amazon RedshiftSnapLogic
In this webinar, we discuss how the secret sauce to your business analytics strategy remains rooted on your approached, methodologies and the amount of data incorporated into this critical exercise. We also address best practices to supercharge your cloud analytics initiatives, and tips and tricks on designing the right information architecture, data models and other tactical optimizations.
To learn more, visit: http://www.snaplogic.com/redshift-trial
Databases Have Forgotten About Single Node Performance, A Wrongheaded Trade OffTimescale
The earliest relational databases were monolithic on-premise systems that were powerful and full-featured. Fast forward to the Internet and NoSQL: BigTable, DynamoDB and Cassandra. These distributed systems were built to scale out for ballooning user bases and operations. As more and more companies vied to be the next Google, Amazon, or Facebook, they too "required" horizontal scalability.
But in a real way, NoSQL and even NewSQL have forgotten single node performance where scaling out isn't an option. And single node performance is important because it allows you to do more with much less. With a smaller footprint and simpler stack, overhead decreases and your application can still scale.
In this talk, we describe TimescaleDB's methods for single node performance. The nature of time-series workloads and how data is partitioned allows users to elastically scale up even on single machines, which provides operational ease and architectural simplicity, especially in cloud environments.
Understanding SQL Trace, TKPROF and Execution Plan for beginnersCarlos Sierra
The three fundamental steps of SQL Tuning are: 1) Diagnostics Collection; 2) Root Cause Analysis (RCA); and 3) Remediation. This introductory session on SQL Tuning is for novice DBAs and Developers that are required to investigate a piece of an application performing poorly.
On this session participants will learn about producing a SQL Trace then a summary TKPROF report. A sample TKPROF is navigated with the audience, where the trivial and the no so trivial is exposed and explain. Execution Plans are also navigated and explained, so participants can later untangle complex Execution Plans and start diagnosing SQL performing badly.
This session encourages participants to ask all kind of questions that could be potential road-blocks for deeper understanding of how to address a SQL performing poorly.
ISO SQL:2016 introduced Row Pattern Matching: a feature to apply (limited) regular expressions on table rows and perform analysis on each match. As of writing, this feature is only supported by the Oracle Database 12c.
Almost Perfect Service Discovery and Failover with ProxySQL and OrchestratorJean-François Gagné
Of course there is no such thing as perfect service discovery, and we will see why in the talk. However, the way ProxySQL is deployed in this case minimizes the risk for split-brains, and this is why I qualify it as almost perfect. But let’s step back a little...
MySQL alone is not a high availability solution. To provide resilience to primary failure, other components need to be integrated with MySQL. At MessageBird, these additional components are ProxySQL and Orchestrator. In this talk, we describe how ProxySQL is architectured to provide close to perfect Service Discovery and how this, combined with Orchestrator, allows for automatic failover. The talk presents the details of the integration of MySQL, ProxySQL and Orchestrator in Google Cloud (and it would be easy to re-implement a similar architecture at other cloud vendors or on-premises). We will also cover lessons learned for the 2 years this architecture has been in production. Come to this talk to learn more about MySQL high availability, ProxySQL and Orchestrator.
Performance Tuning RocksDB for Kafka Streams' State Stores (Dhruba Borthakur,...confluent
RocksDB is the default state store for Kafka Streams. In this talk, we will discuss how to improve single node performance of the state store by tuning RocksDB and how to efficiently identify issues in the setup. We start with a short description of the RocksDB architecture. We discuss how Kafka Streams restores the state stores from Kafka by leveraging RocksDB features for bulk loading of data. We give examples of hand-tuning the RocksDB state stores based on Kafka Streams metrics and RocksDB’s metrics. At the end, we dive into a few RocksDB command line utilities that allow you to debug your setup and dump data from a state store. We illustrate the usage of the utilities with a few real-life use cases. The key takeaway from the session is the ability to understand the internal details of the default state store in Kafka Streams so that engineers can fine-tune their performance for different varieties of workloads and operate the state stores in a more robust manner.
Common Strategies for Improving Performance on Your Delta LakehouseDatabricks
The Delta Architecture pattern has made the lives of data engineers much simpler, but what about improving query performance for data analysts? What are some common places to look at for tuning query performance? In this session we will cover some common techniques to apply to our delta tables to make them perform better for data analysts queries. We will look at a few examples of how you can analyze a query, and determine what to focus on to deliver better performance results.
All About JSON and ClickHouse - Tips, Tricks and New Features-2022-07-26-FINA...Altinity Ltd
JSON is the king of data formats and ClickHouse has a plethora of features to handle it. This webinar covers JSON features from A to Z starting with traditional ways to load and represent JSON data in ClickHouse. Next, we’ll jump into the JSON data type: how it works, how to query data from it, and what works and doesn’t work. JSON data type is one of the most awaited features in the 2022 ClickHouse roadmap, so you won’t want to miss out. Finally, we’ll talk about Jedi master techniques like adding bloom filter indexing on JSON data.
PostgreSQL (or Postgres) began its life in 1986 as POSTGRES, a research project of the University of California at Berkeley.
PostgreSQL isn't just relational, it's object-relational.it's object-relational. This gives it some advantages over other open source SQL databases like MySQL, MariaDB and Firebird.
Best Practices for Supercharging Cloud Analytics on Amazon RedshiftSnapLogic
In this webinar, we discuss how the secret sauce to your business analytics strategy remains rooted on your approached, methodologies and the amount of data incorporated into this critical exercise. We also address best practices to supercharge your cloud analytics initiatives, and tips and tricks on designing the right information architecture, data models and other tactical optimizations.
To learn more, visit: http://www.snaplogic.com/redshift-trial
Databases Have Forgotten About Single Node Performance, A Wrongheaded Trade OffTimescale
The earliest relational databases were monolithic on-premise systems that were powerful and full-featured. Fast forward to the Internet and NoSQL: BigTable, DynamoDB and Cassandra. These distributed systems were built to scale out for ballooning user bases and operations. As more and more companies vied to be the next Google, Amazon, or Facebook, they too "required" horizontal scalability.
But in a real way, NoSQL and even NewSQL have forgotten single node performance where scaling out isn't an option. And single node performance is important because it allows you to do more with much less. With a smaller footprint and simpler stack, overhead decreases and your application can still scale.
In this talk, we describe TimescaleDB's methods for single node performance. The nature of time-series workloads and how data is partitioned allows users to elastically scale up even on single machines, which provides operational ease and architectural simplicity, especially in cloud environments.
A presentation I made for Apache Spark and Apache Cassandra Integration.
First I present what are some of the differences between RDBMS and NoSQL, then I proceed with the Cassandra infrastructure and usual errors when creating a Cassandra Data Model.
Finally, I provide the Spark underlying main concepts and some settings for proper configuration.
Leveraging the Power of Solr with SparkQAware GmbH
Lucene Revolution 2016, Boston: Talk by Johannes Weigend (@JohannesWeigend, CTO at QAware).
Abstract: Solr is a distributed NoSQL database with impressive search capabilities. Spark is the new megastar in the distributed computing universe. In this code-intense session we show you how to combine both to solve real-time search and processing problems. We will show you how to set up a Solr/Spark combination from scratch and develop first jobs with runs distributed on shared Solr data. We will also show you how to use this combination for your next-generation BI platform.
In this presentation, you will get a look under the covers of Amazon Redshift, a fast, fully-managed, petabyte-scale data warehouse service for less than $1,000 per TB per year. Learn how Amazon Redshift uses columnar technology, optimized hardware, and massively parallel processing to deliver fast query performance on data sets ranging in size from hundreds of gigabytes to a petabyte or more. You¹ll also hear from Dan Wagner, CEO at Civis Analytics, as he discusses why the Civis data science platform was designed on top of Amazon Redshift and the AWS platform in order to help smart organizations bridge their data silos, build 360 degree view of their customer relationships, and identify opportunities for driving their companies forward by leveraging enormous datasets, the power of analytics, and economies of scale on the AWS platform.
Learn how Amazon Redshift, our fully managed, petabyte-scale data warehouse, can help you quickly and cost-effectively analyze all of your data using your existing business intelligence tools. Get an introduction to how Amazon Redshift uses massively parallel processing, scale-out architecture, and columnar direct-attached storage to minimize I/O time and maximize performance. Learn how you can gain deeper business insights and save money and time by migrating to Amazon Redshift. Take away strategies for migrating from on-premises data warehousing solutions, tuning schema and queries, and utilizing third party solutions.
Time Series data is proliferating with literally every step that we take, just think about things like Fit Bit bracelets that track your every move and financial trading data all of which is timestamped.
Time series data requires high performance reads and writes even with a huge number of data sources. Both speed and scale are integral to success, which makes for a unique challenge for your database.
A time series NoSQL data model requires flexibility to support unstructured, and semi-structured data as well as the ability to write range queries to analyze your time series data. So how can you tackle speed, scale and flexibility all at once?
Join Professional Services Architect Drew Kerrigan and Developer Advocate Matt Brender for a discussion of:
Examples of time series data sets, from IoT to Finance to jet engines
What makes time series queries different from other database queries
How to model your dataset to answer the right questions about your data
How to store, query and analyze a set of time series data points
Learn how a NoSQL database model and Riak TS can help you address the unique challenges of time series data.
In this presentation, you will get a look under the covers of Amazon Redshift, a fast, fully-managed, petabyte-scale data warehouse service for less than $1,000 per TB per year. Learn how Amazon Redshift uses columnar technology, optimized hardware, and massively parallel processing to deliver fast query performance on data sets ranging in size from hundreds of gigabytes to a petabyte or more. We'll also walk through techniques for optimizing performance and, you’ll hear from a specific customer and their use case to take advantage of fast performance on enormous datasets leveraging economies of scale on the AWS platform.
Massive Data Processing in Adobe Using Delta LakeDatabricks
At Adobe Experience Platform, we ingest TBs of data every day and manage PBs of data for our customers as part of the Unified Profile Offering. At the heart of this is a bunch of complex ingestion of a mix of normalized and denormalized data with various linkage scenarios power by a central Identity Linking Graph. This helps power various marketing scenarios that are activated in multiple platforms and channels like email, advertisements etc. We will go over how we built a cost effective and scalable data pipeline using Apache Spark and Delta Lake and share our experiences.
What are we storing?
Multi Source – Multi Channel Problem
Data Representation and Nested Schema Evolution
Performance Trade Offs with Various formats
Go over anti-patterns used
(String FTW)
Data Manipulation using UDFs
Writer Worries and How to Wipe them Away
Staging Tables FTW
Datalake Replication Lag Tracking
Performance Time!
Amazon Redshift는 속도가 빠른 페타바이트 규모의 완전관리형 데이터 웨어하우스로, 간편하고 비용 효율적으로 모든 데이터를 기존 비즈니스 인텔리전스 도구를 사용하여 분석할 수 있게 해줍니다. 이 강연에서는 RedShift를 활용해 데이터 웨어하우스를 구축하고 데이터를 분석할 때의 모범사례과 다양한 고려사항에 대해 알아보고, Amazon S3에 있는 엑사바이트 규모의 데이터에 대해 복잡한 쿼리를 실행할 직접 수행할 수 있는 RedShift Spectrum을 실제로 사용할 때 고려사항에 대해 함께 다룰 예정입니다.
연사: 정영준, 아마존 웹서비스 솔루션즈 아키텍트
Apache Hive is a rapidly evolving project which continues to enjoy great adoption in the big data ecosystem. As Hive continues to grow its support for analytics, reporting, and interactive query, the community is hard at work in improving it along with many different dimensions and use cases. This talk will provide an overview of the latest and greatest features and optimizations which have landed in the project over the last year. Materialized views, the extension of ACID semantics to non-ORC data, and workload management are some noteworthy new features.
We will discuss optimizations which provide major performance gains as well as integration with other big data technologies such as Apache Spark, Druid, and Kafka. The talk will also provide a glimpse of what is expected to come in the near future.
Dyn delivers exceptional Internet Performance. Enabling high quality services requires data centers around the globe. In order to manage services, customers need timely insight collected from all over the world. Dyn uses DataStax Enterprise (DSE) to deploy complex clusters across multiple datacenters to enable sub 50 ms query responses for hundreds of billions of data points. From granular DNS traffic data, to aggregated counts for a variety of report dimensions, DSE at Dyn has been up since 2013 and has shined through upgrades, data center migrations, DDoS attacks and hardware failures. In this webinar, Principal Engineers Tim Chadwick and Rick Bross cover the requirements which led them to choose DSE as their go-to Big Data solution, the path which led to SPARK, and the lessons that we’ve learned in the process.
Pollfish is a survey platform which provides access to millions of targeted users. Pollfish allows easy distribution and targeting of surveys through existing mobile apps. (https://www.pollfish.com/). At pollfish we use Cassandra for difference use cases, eg. for application data store to maximize write throughput when appropriate and for our analytics project to find insights in application generated data. As a medium to accomplish our success so far, we use the Datastax's DSE 4.6 environment which integrates Appache Cassadra, Spark and a hadoop compatible file system (CFS). We will discuss how we started, how the journey was and the impressions gained so far along with some tips learned the hard way. This is a result of joint work of an excellent team here at Pollfish.
Webinar: MongoDB Use Cases within the Oil, Gas, and Energy IndustriesMongoDB
In this session we will dive into some of the use-cases companies are currently deploying MongoDB for in the energy space. It is becoming more important for companies to make data driven decisions, and MongoDB can often be the right tool for analyzing the massive amounts of data coming in. Whether tracking oil well site statistics, power meter data, or feeds from sensors, MongoDB can be a great fit for tracking and analyzing that data, using it to make smart, informed business decisions.
Similar to Re-Engineering PostgreSQL as a Time-Series Database (20)
Building Reliability - The Realities of ObservabilityAll Things Open
Presented at the ATO RTP Meetup
Presented by Jeremy Proffit, Director of DevSecOps & SRE for Customer Care and Communications, Ally
Title: Building Reliability - The Realities of Observability
Abstract: Join me as we discuss true observability, learn what works and what doesn't. We'll not only discuss dashboards, monitoring and alerting, but how these can be built by automation or included in your IAC modules. We'll talk about how to properly alert staff based on priority to keep your staff and yourself sane. And even discuss architecture and how it impacts reliably and why serverless isn't always the best at being reliable.
Presented at the ATO RTP Meetup
Presented by Peter Zaitsev, Founder of Percona
Title: Modern Database Best Practices
Abstract: There are now more Database choices available for developers than ever before - there are general purpose databases and specialized databases, single node and distributed databases, Open Source, Proprietary databases and databases available exclusively in the cloud. In this presentation we will cover the best practices of choosing database(s) for your applications, best practices as it comes to application development as well as managing those databases to achieve best possible performance, security, availability at the lowest cost.
All Things Open 2023
Presented at All Things Open 2023
Presented by Deb Bryant - Open Source Initiative, Patrick Masson - Apereo Foundation, Stephen Jacobs - Rochester Institute of Technology, Ruth Suehle - SAS, & Greg Wallace - FreeBSD Foundation
Title: Open Source and Public Policy
Abstract: New regulations in the software industry and adjacent areas such as AI, open science, open data, and open education are on the rise around the world. Cyber Security, societal impact of AI, data and privacy are paramount issues for legislators globally. At the same time, the COVID-19 pandemic drove collaborative development to unprecedented levels and took Open Source software, open research, open content and data from mainstream to main stage, creating tension between public benefit and citizen safety and security as legislators struggle to find a balance between open collaboration and protecting citizens.
Historically, the open source software community and foundations supporting its work have not engaged in policy discussions. Moving forward, thoughtful development of these important public policies whilst not harming our complex ecosystems requires an understanding of how our ecosystem operates. Ensuring stakeholders without historic benefit of representation in those discussions becomes paramount to that end.
Please join our open discussion with open policy stakeholders working constructively on current open policy topics. Our panelists will provide a view into how oss foundations and other open domain allies are now rising to this new challenge as well as seizing the opportunity to influence positive changes to the public’s benefit.
Topics: Public Policy, Open Science, Open Education, current legislation in the US and EU, US interest in OSS sustainability, intro to the Open Policy Alliance
Find more info about All Things Open:
On the web: https://www.allthingsopen.org/
Twitter: https://twitter.com/AllThingsOpen
LinkedIn: https://www.linkedin.com/company/all-things-open/
Instagram: https://www.instagram.com/allthingsopen/
Facebook: https://www.facebook.com/AllThingsOpen
Mastodon: https://mastodon.social/@allthingsopen
Threads: https://www.threads.net/@allthingsopen
2023 conference: https://2023.allthingsopen.org/
Weaving Microservices into a Unified GraphQL Schema with graph-quilt - Ashpak...All Things Open
Presented at All Things Open 2023
Presented by Ashpak Shaikh & Lucy Shen - Intuit
Title: Weaving Microservices into a Unified GraphQL Schema with graph-quilt
Abstract: The magic of GraphQL is that it provides data access through a single endpoint—clean and easy. But as the number of GraphQL microservices your tech stack depends on starts to grow, that single-endpoint purpose becomes a new multi-endpoint problem. Ideally, we would have an orchestrator that could aggregate schemas from multiple microservices into a unified GraphQL schema and route the requests to the appropriate microservice.
Enter graph-quilt, an open source Java library that provides recursive schema stitching and Apollo Federation style schema composition. In this talk, we’ll walk through our GraphQL journey and show you how to use graph-quilt to simplify your data orchestration needs. We will also share our open sourced reference implementation of a highly performant graph-quilt gateway currently being used in production here at Intuit, where we’ve had incredible success in scaling the gateway with 50+ microservices and 150+ clients.
Find more info about All Things Open:
On the web: https://www.allthingsopen.org/
Twitter: https://twitter.com/AllThingsOpen
LinkedIn: https://www.linkedin.com/company/all-things-open/
Instagram: https://www.instagram.com/allthingsopen/
Facebook: https://www.facebook.com/AllThingsOpen
Mastodon: https://mastodon.social/@allthingsopen
Threads: https://www.threads.net/@allthingsopen
2023 conference: https://2023.allthingsopen.org/
The State of Passwordless Auth on the Web - Phil NashAll Things Open
Presented at All Things Open 2023
Presented by Phil Nash - Sonar
Title: The State of Passwordless Auth on the Web
Abstract: Can we get rid of passwords yet? They make for a poor user experience and users are notoriously bad with them. The advent of WebAuthn has brought a passwordless world closer, but where do we really stand?
In this talk we'll explore the current user experience of WebAuthn and the requirements a user has to fulfil to authenticate without a password. We'll also explore the fallbacks and safeguards we can use to make the password experience better and more secure. By the end of the session you'll have a vision of how authentication could look in the future and a blueprint for how to build the best auth experience today.
Find more info about All Things Open:
On the web: https://www.allthingsopen.org/
Twitter: https://twitter.com/AllThingsOpen
LinkedIn: https://www.linkedin.com/company/all-things-open/
Instagram: https://www.instagram.com/allthingsopen/
Facebook: https://www.facebook.com/AllThingsOpen
Mastodon: https://mastodon.social/@allthingsopen
Threads: https://www.threads.net/@allthingsopen
2023 conference: https://2023.allthingsopen.org/
Total ReDoS: The dangers of regex in JavaScriptAll Things Open
Presented at All Things Open 2023
Presented by Phil Nash - Sonar
Title: Total ReDoS: The dangers of regex in JavaScript
Abstract: Regular expressions are complicated and can be hard to learn. On top of that, they can also be a security risk; writing the wrong pattern can open your application up to denial of service attacks. One token out of place and you invite in the dreaded ReDoS.
But how can a regular expression cause this? In this talk we’ll track down the patterns that can cause this trouble, explain why they are an issue and propose ways to fix them now and avoid them in the future. Together we’ll demystify these powerful search patterns and keep your application safe from expressions that behave in a way that is anything but regular.
Find more info about All Things Open:
On the web: https://www.allthingsopen.org/
Twitter: https://twitter.com/AllThingsOpen
LinkedIn: https://www.linkedin.com/company/all-things-open/
Instagram: https://www.instagram.com/allthingsopen/
Facebook: https://www.facebook.com/AllThingsOpen
Mastodon: https://mastodon.social/@allthingsopen
Threads: https://www.threads.net/@allthingsopen
2023 conference: https://2023.allthingsopen.org/
What Does Real World Mass Adoption of Decentralized Tech Look Like?All Things Open
Presented at All Things Open 2023
Presented by Karl Mozurkewich - Storj
Title: What Does Real World Mass Adoption of Decentralized Tech Look Like?
Abstract: We delve into the transformative potential of decentralized technology. Beginning with a brief overview of the rise of centralization with the advent of the internet and the counter-shift marked by blockchain we explore the intrinsic characteristics of decentralized and distributed systems, such as trustless operations, peer-to-peer networks, and enterprise application scalability. Various sectors, including finance, supply chains, media and entertainment, data science and cloud infrastructure are on the brink of disruption. The societal implications are vast, with the potential for greater individual empowerment, a greener planet and more viable resource utilization, but concerns about data security persist.
Find more info about All Things Open:
On the web: https://www.allthingsopen.org/
Twitter: https://twitter.com/AllThingsOpen
LinkedIn: https://www.linkedin.com/company/all-things-open/
Instagram: https://www.instagram.com/allthingsopen/
Facebook: https://www.facebook.com/AllThingsOpen
Mastodon: https://mastodon.social/@allthingsopen
Threads: https://www.threads.net/@allthingsopen
2023 conference: https://2023.allthingsopen.org/
Presented at All Things Open 2023
Presented by Anastasia Lalamentik - Kaleido
Title: How to Write & Deploy a Smart Contract
Abstract: In this talk, Anastasia Lalamentik, Full Stack Engineer at Kaleido, will walk through how Ethereum smart contracts work and go over related concepts like gas fees, the Ethereum Virtual Machine (EVM), the block explorer, and the Solidity programming language. This is vital to anyone who wants to build a blockchain app and is a great introduction to blockchain technology for newcomers to the space.
By the end of the talk, attendees will better understand how to:
- Write a simple smart contract
- Deploy their smart contract to an Ethereum test network through the latest tools like Hardhat and the MetaMask wallet
- Test interactions with their deployed smart contract and ensure that everything is working properly
Additionally, participants will get to interact with Anastasia's deployed smart contract at the end of the talk. Anastasia’s past talks have attracted and have been attended by a diverse group of participants with a range of experience in the space.
Find more info about All Things Open:
On the web: https://www.allthingsopen.org/
Twitter: https://twitter.com/AllThingsOpen
LinkedIn: https://www.linkedin.com/company/all-things-open/
Instagram: https://www.instagram.com/allthingsopen/
Facebook: https://www.facebook.com/AllThingsOpen
Mastodon: https://mastodon.social/@allthingsopen
Threads: https://www.threads.net/@allthingsopen
2023 conference: https://2023.allthingsopen.org/
Spinning Your Drones with Cadence Workflows, Apache Kafka and TensorFlowAll Things Open
Presented at All Things Open 2023
Presented by Paul Brebner - Instaclustr (by Spot by NetApp)
Title: Spinning Your Drones with Cadence Workflows, Apache Kafka and TensorFlow
Abstract: In this talk we’ll build a Drone delivery application, and then use it to do some Machine Learning “on the fly”.
In the 1st part of the talk, we'll build a real-time Drone Delivery demonstration application using a combination of two open-source technologies: Uber’s Cadence (for stateful, scheduled, long-running workflows), and Apache Kafka (for fast streaming data).
With up to 2,000 (simulated) drones and deliveries in progress at once this application generates a vast flow of spatio-temporal data.
In the 2nd part of the talk, we'll use this platform to explore Machine Learning (ML) over streaming and drifting Kafka data with TensorFlow to try and predict which shops will be busy in advance.
Find more info about All Things Open:
On the web: https://www.allthingsopen.org/
Twitter: https://twitter.com/AllThingsOpen
LinkedIn: https://www.linkedin.com/company/all-things-open/
Instagram: https://www.instagram.com/allthingsopen/
Facebook: https://www.facebook.com/AllThingsOpen
Mastodon: https://mastodon.social/@allthingsopen
Threads: https://www.threads.net/@allthingsopen
2023 conference: https://2023.allthingsopen.org/
Presented at the All Things Open 2023 Inclusion and Diversity in Open Source Event
Presented by Efraim Marquez-Arreaza - Red Hat
Title: DEI Challenges and Success
Abstract: In today's world, many companies and organizations have Diversity, Equity and Inclusion (DEI) communities. Red Hat Unidos is a DEI community focused on advocating for the Hispanic/Latine community. In this talk, we would like to share our challenges and success during the past 4-years and plans for the future.
Find more info about All Things Open:
On the web: https://www.allthingsopen.org/
Twitter: https://twitter.com/AllThingsOpen
LinkedIn: https://www.linkedin.com/company/all-things-open/
Instagram: https://www.instagram.com/allthingsopen/
Facebook: https://www.facebook.com/AllThingsOpen
Mastodon: https://mastodon.social/@allthingsopen
Threads: https://www.threads.net/@allthingsopen
2023 conference: https://2023.allthingsopen.org/
Presented at All Things Open 2023
Presented by Lydia Cupery - HubSpot
Title: Scaling Web Applications with Background Jobs: Takeaways from Generating a Huge PDF
Abstract: Do you need to perform time-consuming or CPU-intensive processes in your web application but are concerned about performance? That’s where background jobs come in. By offloading resource-intensive tasks to separate worker processes, you can improve the scalability of your web application.
In this talk, I'll share my experience of using background jobs to scale our web application. I'll discuss the challenges my team faced that led us to adopt background jobs. Then, I'll share practical tips on how to design background jobs for CPU-intensive or time-consuming processes, such as generating huge PDFs and batch emailing. I'll wrap up by going over the performance and cost tradeoffs of background jobs.
I'll use Typescript, Express, and Heroku as examples in this talk, but the concepts and best practices that I'll share are applicable to other languages and tools.
Find more info about All Things Open:
On the web: https://www.allthingsopen.org/
Twitter: https://twitter.com/AllThingsOpen
LinkedIn: https://www.linkedin.com/company/all-things-open/
Instagram: https://www.instagram.com/allthingsopen/
Facebook: https://www.facebook.com/AllThingsOpen
Mastodon: https://mastodon.social/@allthingsopen
Threads: https://www.threads.net/@allthingsopen
2023 conference: https://2023.allthingsopen.org/
Presented at All Things Open 2023
Presented by Robert Aboukhalil - CZI
Title: Supercharging tutorials with WebAssembly
Abstract: sandbox.bio is a free platform that features interactive command-line tutorials for bioinformatics. This talk is a deep-dive into how sandbox.bio was built, with a focus on how WebAssembly enabled bringing command-line tools like awk and grep to the web. Although these tools were originally written in C/C++, they all run directly in the browser, thanks to WebAssembly! And since the computations run on each user's computer, this makes the application highly scalable and cost-effective.
Along the way, I'll discuss how WebAssembly works and how to get started using it in your own applications. The talk will also cover more advanced WebAssembly features such as threads and SIMD, and will end with a discussion of WebAssembly's benefits and pitfalls (it's a powerful technology, but it's not always the right tool!).
Find more info about All Things Open:
On the web: https://www.allthingsopen.org/
Twitter: https://twitter.com/AllThingsOpen
LinkedIn: https://www.linkedin.com/company/all-things-open/
Instagram: https://www.instagram.com/allthingsopen/
Facebook: https://www.facebook.com/AllThingsOpen
Mastodon: https://mastodon.social/@allthingsopen
Threads: https://www.threads.net/@allthingsopen
2023 conference: https://2023.allthingsopen.org/
Presented at All Things Open 2023
Presented by K.S. Bhaskar - YottaDB LLC
Title: Using SQL to Find Needles in Haystacks
Abstract: Database journal files capture every update to a database. A database of a few hundred GB can generate GBs worth of journal files every minute at busy times. Troubleshooting and forensices, especially of rare and intermittent problems, such as which process made what update and when, is an exercise of finding needles in haystacks. A similar problem exists with syslogs. A solution is to load the journal files and syslogs into a database, and use SQL to query the database. Bhaskar will present and demonstrate this with a 100% FOSS stack.
Find more info about All Things Open:
On the web: https://www.allthingsopen.org/
Twitter: https://twitter.com/AllThingsOpen
LinkedIn: https://www.linkedin.com/company/all-things-open/
Instagram: https://www.instagram.com/allthingsopen/
Facebook: https://www.facebook.com/AllThingsOpen
Mastodon: https://mastodon.social/@allthingsopen
Threads: https://www.threads.net/@allthingsopen
2023 conference: https://2023.allthingsopen.org/
Configuration Security as a Game of Pursuit InterceptAll Things Open
Presented at All Things Open 2023
Presented by Wes Widner - Automox
Title: Configuration Security as a Game of Pursuit Intercept
Abstract: In this session we will take a look at the emerging field of cloud security posture management and how we can approach the problem space using a class of board games known as pursuit/intercept. Using the game Scotland Yard as a visual illustration we'll explore the cognitive and technical limitations that all CSPM systems face and what you should look for when evaluating the strengths and weakness of CSPM vendors and approaches.
Find more info about All Things Open:
On the web: https://www.allthingsopen.org/
Twitter: https://twitter.com/AllThingsOpen
LinkedIn: https://www.linkedin.com/company/all-things-open/
Instagram: https://www.instagram.com/allthingsopen/
Facebook: https://www.facebook.com/AllThingsOpen
Mastodon: https://mastodon.social/@allthingsopen
Threads: https://www.threads.net/@allthingsopen
2023 conference: https://2023.allthingsopen.org/
Presented at All Things Open 2023
Presented by Carol Huang & Mike Fix - Stripe
Title: Scaling an Open Source Sponsorship Program
Abstract: We already know this: the open-source ecosystem needs further monetary investment from the companies that benefit most from it. Likewise, companies say they want to participate in these initiatives, but find it hard to dedicate resources to open source funding when there isn’t a clear ROI.
This talk discusses how the Open Source Program Office at Stripe built a scalable, sustainable open source sponsorship model that aligns internal company incentives with those of open source maintainers and the community at large. We go over the unique “platformization” of our OSPO that allowed us to create multiple funding models, such as BYOB (Bring Your Own Budget), and share lessons learned from this experience as well as other OSPOs.
Find more info about All Things Open:
On the web: https://www.allthingsopen.org/
Twitter: https://twitter.com/AllThingsOpen
LinkedIn: https://www.linkedin.com/company/all-things-open/
Instagram: https://www.instagram.com/allthingsopen/
Facebook: https://www.facebook.com/AllThingsOpen
Mastodon: https://mastodon.social/@allthingsopen
Threads: https://www.threads.net/@allthingsopen
2023 conference: https://2023.allthingsopen.org/
Build Developer Experience Teams for Open SourceAll Things Open
Presented at All Things Open 2023
Presented by Arundeep Nagaraj - Amazon Web Services (AWS)
Title: Build Developer Experience Teams for Open Source
Abstract: Open Source has become the default strategy for many IT organizations and Enterprises. However, the constant challenge with Open Source leaders of these organizations has been -
How is my product's developer experience?
Is this the right metric to track?
How can I scale my team to support our products better?
How can I add automation to scale redundant workflows?
If my product involves working with developers, how can I scale to the complexity of the requests and reduce Engineering bandwidth?
The challenges within support of open source products continues to magnify depending on the end user persona whether they are consumers or contributors to your product. Consumers utilize your product, SDK's and API's and are blocked with using it or run into issues, whereas contributors are advanced users of your software that understands the codebase to provide a meaningful contribution back to the product.
The answer to the above is to look at Open Source support as a first-class citizen of your corporate support strategy. To employ the right level of developer focused support as opposed to traditional infrastructure based support is key to scale to the amount of developers using your product. Supporting customers in the open involves more than pure support - building customer / developer experiences (DX) in the open (across platforms and communities) that pivots over the ability of your product's users or developers to be focused on the end-to-end value add. This helps with your active developer growth and retention of users.
Key Takeaways:
- IT leaders of Open Source will learn to employ strategies to build a DX team that engages on multiple platforms
- Work on identifying accurate metrics for product and organization
- Innovate on platforms such as Discord to build a bot and a dashboard
- Ability to leverage customer feedback and iterate over the customer success flywheel
- Distinguish between DX and Developer Advocacy (DA)
Find more info about All Things Open:
On the web: https://www.allthingsopen.org/
Twitter: https://twitter.com/AllThingsOpen
LinkedIn: https://www.linkedin.com/company/all-things-open/
Instagram: https://www.instagram.com/allthingsopen/
Facebook: https://www.facebook.com/AllThingsOpen
Mastodon: https://mastodon.social/@allthingsopen
Threads: https://www.threads.net/@allthingsopen
2023 conference: https://2023.allthingsopen.org/
Presented at All Things Open 2023
Presented by Danny McCormick - Google
Title: Deploying Models at Scale with Apache Beam
Abstract: Apache Beam is an open source tool for building distributed scalable data pipelines. This talk will explore how Beam can be used to perform common machine learning tasks, with a heavy focus on running inference at scale. The talk will include a demo component showing how Beam can be used to deploy and update models efficiently on both CPUs and GPUs for inference workloads.
An attendee can expect to leave this talk with a high level understanding of Beam, the challenges of deploying models at scale, and the ability to use Beam to easily parallelize their inference workloads.
Find more info about All Things Open:
On the web: https://www.allthingsopen.org/
Twitter: https://twitter.com/AllThingsOpen
LinkedIn: https://www.linkedin.com/company/all-things-open/
Instagram: https://www.instagram.com/allthingsopen/
Facebook: https://www.facebook.com/AllThingsOpen
Mastodon: https://mastodon.social/@allthingsopen
Threads: https://www.threads.net/@allthingsopen
2023 conference: https://2023.allthingsopen.org/
Sudo – Giving access while staying in controlAll Things Open
Presented at All Things Open 2023
Presented by Peter Czanik - One Identity
Title: Sudo – Giving access while staying in control
Abstract: Sudo is used by millions to control and log administrator access to systems, but using the default configuration only, there are plenty of blind spots. Using the latest features in sudo let you watch some previously blind spots and control access to them. Here are four major new features, which arrived since the 1.9.0 release, allowing you see your blind spots:
- configuring a working directory or chroot within sudo often makes full shell access redundant
- JSON-formatted logs give you more details on events and are easier to act on
- relays in sudo_logsrvd make session recording collection more secure and reliable
- you can log and control sub-commands executed by the command run through sudo
Let us take a closer look at each of these.
Previously, there were quite a few situations where you had to give users full shell access through sudo. Typical examples include when you need to run a command from a given directory, or running commands in a chroot environment. You can now configure the working directory or the chroot directory and give access only to the command the user really needs.
Logging is a central role of sudo, to see who did what on the system. Using JSON-formatted log messages gives you even more information about events. What is even more: structured logs are easier to act on. Setting up alerting for suspicious events is much easier when you have a single parser to configure for any kind of sudo logs. You can collect sudo logs not only by local syslog, but also by using sudo_logsrvd, the same application used to collect session recordings.
Speaking of session recordings: instead of using a single central server, you can now have multiple levels of sudo_logsrvd relays between the client and the final destination. This allows session collection even if the central server is unavailable, providing you with additional security. It also makes your network configuration simpler.
Finally, you can log sub-commands executed from the command started through sudo. You can see commands started from a shell. No more unnoticed shell access from text editors. Best of all: you can also intercept sub-commands.
These are just a few of the most prominent features helping you to watch and control previous blind spots on your systems. See these and other possibilities in action in some live demos during our presentation.
Find more info about All Things Open:
On the web: https://www.allthingsopen.org/
Twitter: https://twitter.com/AllThingsOpen
LinkedIn: https://www.linkedin.com/company/all-things-open/
Instagram: https://www.instagram.com/allthingsopen/
Facebook: https://www.facebook.com/AllThingsOpen
Mastodon: https://mastodon.social/@allthingsopen
Threads: https://www.threads.net/@allthingsopen
2023 conference: https://2023.allthingsopen.org/
Fortifying the Future: Tackling Security Challenges in AI/ML ApplicationsAll Things Open
Presented at All Things Open 2023
Presented by Christine Abernathy - F5, Inc.
Title: Fortifying the Future: Tackling Security Challenges in AI/ML Applications
Abstract: As Artificial Intelligence (AI) and Machine Learning (ML) applications continue to surge, it is crucial to be aware of and address the security risks associated with these technologies. In this talk, Christine will explore AI/ML failure modes, threats, and mitigation strategies. She will guide you through the fundamentals of ML models then introduce you to key security challenges such as adversarial attacks, data poisoning, model inversion, model stealing, and membership inference attacks, using real-world examples to demonstrate their potential impact.
Christine will also discuss privacy and ethical considerations in ML, touching upon techniques like federated learning and shedding light on the current regulatory landscape surrounding security risks. If you are developing AI/ML applications or incorporating AI/ML components into your technology stack, check out this talk. You will walk away with a deeper understanding of the current AI/ML security landscape and a toolkit to help you address these risks, enabling you to build safer, more secure, and privacy-aware applications.
Find more info about All Things Open:
On the web: https://www.allthingsopen.org/
Twitter: https://twitter.com/AllThingsOpen
LinkedIn: https://www.linkedin.com/company/all-things-open/
Instagram: https://www.instagram.com/allthingsopen/
Facebook: https://www.facebook.com/AllThingsOpen
Mastodon: https://mastodon.social/@allthingsopen
Threads: https://www.threads.net/@allthingsopen
2023 conference: https://2023.allthingsopen.org/
Securing Cloud Resources Deployed with Control Planes on Kubernetes using Gov...All Things Open
Presented at All Things Open 2023
Presented by Carlos Santana - AWS
Title: Securing Cloud Resources Deployed with Control Planes on Kubernetes using Governance and Policy as Code
Abstract: Are you concerned about the security of your cloud resources deployed on Kubernetes? Are you struggling to ensure compliance with regulatory requirements while managing your cloud infrastructure? If yes, then this talk is for you!
We will discuss how to secure cloud resources deployed with Crossplane on Kubernetes using Governance and Policy as Code. We will explore how to leverage Governance and Policy as Code tools like Rego, Kyverno, and OPA to ensure security and compliance.
By the end of this talk, you will have a better understanding of the challenges associated with securing cloud resources deployed with Crossplane or ACK on Kubernetes, the importance of Governance and Policy as Code in ensuring security and compliance, and why it is critical to use open source and open standards in these technologies.
Find more info about All Things Open:
On the web: https://www.allthingsopen.org/
Twitter: https://twitter.com/AllThingsOpen
LinkedIn: https://www.linkedin.com/company/all-things-open/
Instagram: https://www.instagram.com/allthingsopen/
Facebook: https://www.facebook.com/AllThingsOpen
Mastodon: https://mastodon.social/@allthingsopen
Threads: https://www.threads.net/@allthingsopen
2023 conference: https://2023.allthingsopen.org/
GraphRAG is All You need? LLM & Knowledge GraphGuy Korland
Guy Korland, CEO and Co-founder of FalkorDB, will review two articles on the integration of language models with knowledge graphs.
1. Unifying Large Language Models and Knowledge Graphs: A Roadmap.
https://arxiv.org/abs/2306.08302
2. Microsoft Research's GraphRAG paper and a review paper on various uses of knowledge graphs:
https://www.microsoft.com/en-us/research/blog/graphrag-unlocking-llm-discovery-on-narrative-private-data/
JMeter webinar - integration with InfluxDB and GrafanaRTTS
Watch this recorded webinar about real-time monitoring of application performance. See how to integrate Apache JMeter, the open-source leader in performance testing, with InfluxDB, the open-source time-series database, and Grafana, the open-source analytics and visualization application.
In this webinar, we will review the benefits of leveraging InfluxDB and Grafana when executing load tests and demonstrate how these tools are used to visualize performance metrics.
Length: 30 minutes
Session Overview
-------------------------------------------
During this webinar, we will cover the following topics while demonstrating the integrations of JMeter, InfluxDB and Grafana:
- What out-of-the-box solutions are available for real-time monitoring JMeter tests?
- What are the benefits of integrating InfluxDB and Grafana into the load testing stack?
- Which features are provided by Grafana?
- Demonstration of InfluxDB and Grafana using a practice web application
To view the webinar recording, go to:
https://www.rttsweb.com/jmeter-integration-webinar
State of ICS and IoT Cyber Threat Landscape Report 2024 previewPrayukth K V
The IoT and OT threat landscape report has been prepared by the Threat Research Team at Sectrio using data from Sectrio, cyber threat intelligence farming facilities spread across over 85 cities around the world. In addition, Sectrio also runs AI-based advanced threat and payload engagement facilities that serve as sinks to attract and engage sophisticated threat actors, and newer malware including new variants and latent threats that are at an earlier stage of development.
The latest edition of the OT/ICS and IoT security Threat Landscape Report 2024 also covers:
State of global ICS asset and network exposure
Sectoral targets and attacks as well as the cost of ransom
Global APT activity, AI usage, actor and tactic profiles, and implications
Rise in volumes of AI-powered cyberattacks
Major cyber events in 2024
Malware and malicious payload trends
Cyberattack types and targets
Vulnerability exploit attempts on CVEs
Attacks on counties – USA
Expansion of bot farms – how, where, and why
In-depth analysis of the cyber threat landscape across North America, South America, Europe, APAC, and the Middle East
Why are attacks on smart factories rising?
Cyber risk predictions
Axis of attacks – Europe
Systemic attacks in the Middle East
Download the full report from here:
https://sectrio.com/resources/ot-threat-landscape-reports/sectrio-releases-ot-ics-and-iot-security-threat-landscape-report-2024/
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...Jeffrey Haguewood
Sidekick Solutions uses Bonterra Impact Management (fka Social Solutions Apricot) and automation solutions to integrate data for business workflows.
We believe integration and automation are essential to user experience and the promise of efficient work through technology. Automation is the critical ingredient to realizing that full vision. We develop integration products and services for Bonterra Case Management software to support the deployment of automations for a variety of use cases.
This video focuses on the notifications, alerts, and approval requests using Slack for Bonterra Impact Management. The solutions covered in this webinar can also be deployed for Microsoft Teams.
Interested in deploying notification automations for Bonterra Impact Management? Contact us at sales@sidekicksolutionsllc.com to discuss next steps.
PHP Frameworks: I want to break free (IPC Berlin 2024)Ralf Eggert
In this presentation, we examine the challenges and limitations of relying too heavily on PHP frameworks in web development. We discuss the history of PHP and its frameworks to understand how this dependence has evolved. The focus will be on providing concrete tips and strategies to reduce reliance on these frameworks, based on real-world examples and practical considerations. The goal is to equip developers with the skills and knowledge to create more flexible and future-proof web applications. We'll explore the importance of maintaining autonomy in a rapidly changing tech landscape and how to make informed decisions in PHP development.
This talk is aimed at encouraging a more independent approach to using PHP frameworks, moving towards a more flexible and future-proof approach to PHP development.
Search and Society: Reimagining Information Access for Radical FuturesBhaskar Mitra
The field of Information retrieval (IR) is currently undergoing a transformative shift, at least partly due to the emerging applications of generative AI to information access. In this talk, we will deliberate on the sociotechnical implications of generative AI for information access. We will argue that there is both a critical necessity and an exciting opportunity for the IR community to re-center our research agendas on societal needs while dismantling the artificial separation between the work on fairness, accountability, transparency, and ethics in IR and the rest of IR research. Instead of adopting a reactionary strategy of trying to mitigate potential social harms from emerging technologies, the community should aim to proactively set the research agenda for the kinds of systems we should build inspired by diverse explicitly stated sociotechnical imaginaries. The sociotechnical imaginaries that underpin the design and development of information access technologies needs to be explicitly articulated, and we need to develop theories of change in context of these diverse perspectives. Our guiding future imaginaries must be informed by other academic fields, such as democratic theory and critical theory, and should be co-developed with social science scholars, legal scholars, civil rights and social justice activists, and artists, among others.
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...UiPathCommunity
💥 Speed, accuracy, and scaling – discover the superpowers of GenAI in action with UiPath Document Understanding and Communications Mining™:
See how to accelerate model training and optimize model performance with active learning
Learn about the latest enhancements to out-of-the-box document processing – with little to no training required
Get an exclusive demo of the new family of UiPath LLMs – GenAI models specialized for processing different types of documents and messages
This is a hands-on session specifically designed for automation developers and AI enthusiasts seeking to enhance their knowledge in leveraging the latest intelligent document processing capabilities offered by UiPath.
Speakers:
👨🏫 Andras Palfi, Senior Product Manager, UiPath
👩🏫 Lenka Dulovicova, Product Program Manager, UiPath
Essentials of Automations: Optimizing FME Workflows with ParametersSafe Software
Are you looking to streamline your workflows and boost your projects’ efficiency? Do you find yourself searching for ways to add flexibility and control over your FME workflows? If so, you’re in the right place.
Join us for an insightful dive into the world of FME parameters, a critical element in optimizing workflow efficiency. This webinar marks the beginning of our three-part “Essentials of Automation” series. This first webinar is designed to equip you with the knowledge and skills to utilize parameters effectively: enhancing the flexibility, maintainability, and user control of your FME projects.
Here’s what you’ll gain:
- Essentials of FME Parameters: Understand the pivotal role of parameters, including Reader/Writer, Transformer, User, and FME Flow categories. Discover how they are the key to unlocking automation and optimization within your workflows.
- Practical Applications in FME Form: Delve into key user parameter types including choice, connections, and file URLs. Allow users to control how a workflow runs, making your workflows more reusable. Learn to import values and deliver the best user experience for your workflows while enhancing accuracy.
- Optimization Strategies in FME Flow: Explore the creation and strategic deployment of parameters in FME Flow, including the use of deployment and geometry parameters, to maximize workflow efficiency.
- Pro Tips for Success: Gain insights on parameterizing connections and leveraging new features like Conditional Visibility for clarity and simplicity.
We’ll wrap up with a glimpse into future webinars, followed by a Q&A session to address your specific questions surrounding this topic.
Don’t miss this opportunity to elevate your FME expertise and drive your projects to new heights of efficiency.
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf91mobiles
91mobiles recently conducted a Smart TV Buyer Insights Survey in which we asked over 3,000 respondents about the TV they own, aspects they look at on a new TV, and their TV buying preferences.
Transcript: Selling digital books in 2024: Insights from industry leaders - T...BookNet Canada
The publishing industry has been selling digital audiobooks and ebooks for over a decade and has found its groove. What’s changed? What has stayed the same? Where do we go from here? Join a group of leading sales peers from across the industry for a conversation about the lessons learned since the popularization of digital books, best practices, digital book supply chain management, and more.
Link to video recording: https://bnctechforum.ca/sessions/selling-digital-books-in-2024-insights-from-industry-leaders/
Presented by BookNet Canada on May 28, 2024, with support from the Department of Canadian Heritage.
4. Of every type
• Regular: Machines and sensors
• Irregular: Web and machine events
• Forward looking: Logistics and forecasting
• Derived data: Inferences from AI/ML models
7. Existing databases don’t work for time series
Relational Databases NoSQL Databases
Every other time-series database today is NoSQL
Hard to scale
Underperform on complex queries,
are hard to use, and lead to data silos
12. Postgres 9.6.2 on Azure standard DS4 v2 (8 cores), SSD (premium LRS storage)
Each row has 12 columns (1 timestamp, indexed 1 host ID, 10 metrics)
Hard to scale
13. Postgres 9.6.2 on Azure standard DS4 v2 (8 cores), SSD (premium LRS storage)
Each row has 12 columns (1 timestamp, indexed 1 host ID, 10 metrics)
Hard to scale
14. B-tree Insert Pain
1 2010
1 10 13 24 2925
5Insert batch: 178
Memory Capacity: 2 NODES
IN MEMORY
WRITE TO DISK
15. B-tree Insert Pain
1 2010
1 10 13 24 2925
5Insert batch: 178
Memory Capacity: 2 NODES
IN MEMORY
WRITE TO DISK
16. 1 2010
1 10 13 24 2925
Insert batch: 8
5
17
B-tree Insert Pain
Memory Capacity: 2 NODES
IN MEMORY
WRITE TO DISK
17. 10 13
B-tree Insert Pain
1 2010
1 24 2925
Insert batch: 8
5 17
Memory Capacity: 2 NODES
IN MEMORY
WRITE TO DISK
18. Challenge in scaling up
• Indexes write to random parts of B-tree
• As table grows large
– Indexes no longer fit in memory
– Random writes cause swapping
Device: A
Time: 01:01:01
Device: Z
Time: 01:01:01
Device, Time DESC
20. • Ingest millions of datapoint
per second
• Scale to 100s billions of rows
• Elastically scale up and out
• Faster than Influx, Cassandra,
Mongo, vanilla Postgres
Scale &
Performance
• Inherits 20+ years of
PostgreSQL reliability
• Streaming replication,
HA, backup/recovery
• Data lifecycle: continuous
rollups, retention, archiving
• Enterprise-grade security
Proven &
Enterprise Ready
• Zero learning curve
• Zero friction: Existing tools
and connectors work
• Enrich understanding: JOIN
against relational data
• Freedom for data model, no
cardinality issues
SQL for
time series
TimescaleDB
Scalable time-series database, full SQL
Packaged as a PostgreSQL extension
21. >20x
TimescaleDB vs. PostgreSQL
(batch inserts)
TimescaleDB 0.5, Postgres 9.6.2 on Azure standard DS4 v2 (8 cores), SSD (LRS storage)
Each row has 12 columns (1 timestamp, indexed 1 host ID, 10 metrics)
1.11M
METRICS / S
22. TimescaleDB vs.
PostgreSQL
SPEEDUP
Table scans, simple
column rollups
~0-20%
GROUPBYs 20-200%
Time-ordered
GROUPBYs
400-10000x
DELETEs 2000x
TimescaleDB 0.5, Postgres 9.6.2 on Azure standard DS4 v2 (8 cores), SSD (LRS storage)
Each row has 12 columns (1 timestamp, indexed 1 host ID, 10 metrics)
24. Key-value store with
indexed key lookup at
high-write rates
NoSQL champion: Log-Structured Merge Trees
• Compressed data storage
• Common approach for time series:
use key <name, tags, field, time>
+
25. NoSQL + LSMTs Come at a Cost
• Significant memory overhead
• Lack of secondary indexes / tag lock-in
• Less powerful queries
• Weaker consistency (no ACID)
• No JOINS
• Loss of SQL ecosystem
+
39. But treat it like a single table
Chunks
• Indexes
• Triggers
• Constraints
• Foreign keys
• UPSERTs
• Table mgmt
Hypertable
40. TimescaleDB: Easy to Get Started
CREATE TABLE conditions (
time timestamptz,
temp float,
humidity float,
device text
);
SELECT create_hypertable('conditions', 'time', ‘device', 4,
chunk_time_interval => interval '1 week’);
INSERT INTO conditions
VALUES ('2017-10-03 10:23:54+01', 73.4, 40.7, 'sensor3');
SELECT * FROM conditions;
time | temp | humidity | device
------------------------+------+----------+---------
2017-10-03 11:23:54+02 | 73.4 | 40.7 | sensor3
41. Create partitions
automatically at runtime.
Avoid a lot of manual
work.
CREATE TABLE conditions (
time timestamptz,
temp float,
humidity float,
device text
);
CREATE TABLE conditions_p1 PARTITION OF conditions
FOR VALUES FROM (MINVALUE) TO ('g')
PARTITION BY RANGE (time);
CREATE TABLE conditions_p2 PARTITION OF conditions
FOR VALUES FROM ('g') TO ('n')
PARTITION BY RANGE (time);
CREATE TABLE conditions_p3 PARTITION OF conditions
FOR VALUES FROM ('n') TO ('t')
PARTITION BY RANGE (time);
CREATE TABLE conditions_p4 PARTITION OF conditions
FOR VALUES FROM ('t') TO (MAXVALUE)
PARTITION BY RANGE (time);
-- Create time partitions for the first week in each device partition
CREATE TABLE conditions_p1_y2017m10w01 PARTITION OF conditions_p1
FOR VALUES FROM ('2017-10-01') TO ('2017-10-07');
CREATE TABLE conditions_p2_y2017m10w01 PARTITION OF conditions_p2
FOR VALUES FROM ('2017-10-01') TO ('2017-10-07');
CREATE TABLE conditions_p3_y2017m10w01 PARTITION OF conditions_p3
FOR VALUES FROM ('2017-10-01') TO ('2017-10-07');
CREATE TABLE conditions_p4_y2017m10w01 PARTITION OF conditions_p4
FOR VALUES FROM ('2017-10-01') TO (‘2017-10-07');
-- Create time-device index on each leaf partition
CREATE INDEX ON conditions_p1_y2017m10w01 (time);
CREATE INDEX ON conditions_p2_y2017m10w01 (time);
CREATE INDEX ON conditions_p3_y2017m10w01 (time);
CREATE INDEX ON conditions_p4_y2017m10w01 (time);
INSERT INTO conditions VALUES ('2017-10-03 10:23:54+01',
73.4, 40.7, ‘sensor3');
44. Single node: Scaling up via adding disks
• Faster inserts
• Parallelized queries
How Benefit
Chunks spread across many disks (elastically!)
either RAIDed or via distinct tablespaces
46. Multi-node: Scaling out across sharded primaries
U
nderdevelopm
ent
• Chunks spread across servers
• Insert/query to any server
• Distributed query optimizations
(push-down LIMITs and aggregates, etc.)
48. SELECT time, temp FROM data
WHERE time > now() - interval ‘7 days’
AND device_id = ‘12345’
Avoid querying chunks via constraint exclusion
49. Avoid querying chunks via constraint exclusion
SELECT time, device_id, temp FROM data
WHERE time > ‘2017-08-22 18:18:00+00’
50. Avoid querying chunks via constraint exclusion
SELECT time, device_id, temp FROM data
WHERE time > now() - interval ’24 hours’
51. Additional time-based query optimizations
PG doesn’t
know to use
the index
CREATE INDEX ON readings(time);
SELECT date_trunc(‘minute’, time) as bucket,
avg(cpu)
FROM readings
GROUP BY bucket
ORDER BY bucket DESC
LIMIT 10;
Timescale
understands
time
52. Global queries but local indexes
• Constraint exclusion selects chunks globally
• Local indexes speed up queries on chunks
– B-tree, Hash, GiST, SP-GiST, GIN and BRIN
– Secondary and composite columns, UNIQUE* constraints
53. Optimized for many chunks
• Faster chunk exclusion
– Avoid opening / gather stats on all chunks during constraint exclusion:
Decreased planning on 4000 chunks from 600ms to 36ms
• Better LIMITs across chunks
– Avoid requiring one+ tuple per chunk during MergeAppend / LIMIT
54. “ We've been using TimescaleDB for over a year to
store all kinds of sensor and telemetry data as part of
our Power Management database.
We've scaled to 500 billion rows and the performance
we're seeing is monstrous, almost 70% faster queries.”
- Sean Wallace, Software Engineer
500B
ROWS
400K
ROWS / SEC
50K
CHUNKS
5min
INTERVALS
55. Efficient retention policies
SELECT time, device_id, temp FROM data
WHERE time > now() - interval ’24 hours’
Drop chunks, don’t delete rows
avoids vacuuming
60. Data Retention + Aggregations
Granularity raw 15 min day
Retention 1 week 1 month forever
61. Unlock the richness of your monitoring data
TimescaleDB
+
PostgreSQL
Prometheus
Remote Storage Adapter
+
pg_prometheus
Prometheus Grafana
62. pg_prometheus
Prometheus Data Model in TimescaleDB / PostgreSQL
CREATE TABLE metrics (sample prom_sample);
INSERT INTO metrics
VALUES (‘cpu_usage{service=“nginx”,host=“machine1”} 34.6 1494595898000’);
• Scrape metrics with CURL:
curl http://myservice/metrics | grep -v “^#” | psql -c “COPY metrics FROM STDIN”
• New data type prom_sample: <time, name, value, labels>
63. Automate normalized storage
SELECT create_prometheus_table(‘metrics’);
Time
01:02:00
01:03:00
01:04:00
01:04:00
01:04:00
Value
90
1024
70
900
70
Label
{host: “h001”}
{host: “h002”}
{host: “1984” }
{host: “super”}
{host: “marshal”}
Id
1
2
3
4
5
Label Id
1
1
2
2
5
Name
CPU
Mem
CPU
Mem
IO
Labels stored in separate host metadata table
64. Easily query auto-created view
SELECT sample
FROM metrics
WHERE time > NOW() - interval ’10 min’ AND
name = ‘cpu_usage’ AND
Labels @> ‘{“service”: “nginx”}’;
Columns: | sample | time | name | value | labels |