The document discusses using Bloom filters and cache sketches to enable caching of dynamic content across the web caching hierarchy. A cache sketch is a compact probabilistic data structure that allows clients and servers to track cache invalidations and revalidate cached data. This approach aims to keep cached data fresh while minimizing network requests. It could enable low-latency delivery of dynamic content from ubiquitous caches like content delivery networks and browsers.
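To make the mechanism concrete, here is a minimal Python sketch (an illustration under stated assumptions, not code from the original work) of a Bloom-filter-based cache sketch: the server adds the key of every record that was invalidated before its cached copies expired, and a client revalidates an entry only when its key hits the filter. The filter parameters and key format are invented for illustration.

import hashlib

class BloomFilter:
    # Minimal Bloom filter: m bits, k hash functions derived from SHA-256.
    def __init__(self, m=8192, k=4):
        self.m, self.k, self.bits = m, k, bytearray(m // 8)

    def _positions(self, key):
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, key):
        for p in self._positions(key):
            self.bits[p // 8] |= 1 << (p % 8)

    def __contains__(self, key):
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(key))

# Server side: record keys invalidated before their TTL expired (key names invented).
sketch = BloomFilter()
sketch.add("/db/posts/42")

# Client side: a hit means "possibly stale, revalidate"; a miss is a guaranteed-fresh
# cache read, since Bloom filters have no false negatives.
for url in ("/db/posts/42", "/db/posts/7"):
    print(url, "-> revalidate" if url in sketch else "-> serve from cache")

False positives only cost an unnecessary revalidation, never a stale read, which is why a compact, lossy structure is acceptable here.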
Bloom Filters for Web Caching - Lightning Talk (Felix Gessert)
This document discusses using Bloom filters to cache dynamic web content for low latency. It describes how Bloom filters can be used to proactively revalidate cached data and check if it is still fresh. An end-to-end example is provided showing how Bloom filters could work from the browser cache to a CDN to check cached objects and estimate their time-to-live. Code snippets demonstrate integrating Bloom filters into querying and loading data from a backend database to leverage caching. The goal is to serve dynamic content from ubiquitous web caches for low latency with less processing.
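The abstract mentions code snippets for loading data through the caching hierarchy; the following is a speculative reconstruction of such a read path, reusing the BloomFilter sketch above. browser_cache and fetch_fresh are hypothetical stand-ins for the local cache and a revalidating network request, not APIs from the talk.

import time

browser_cache = {}  # url -> (payload, expires_at); stand-in for the browser cache

def fetch_fresh(url):
    # Placeholder for a revalidating request that bypasses possibly-stale copies.
    payload = f"<data for {url}>"
    browser_cache[url] = (payload, time.time() + 60)  # server-estimated TTL
    return payload

def load(url, sketch):
    cached = browser_cache.get(url)
    if cached and time.time() < cached[1] and url not in sketch:
        return cached[0]     # within its TTL and not flagged in the sketch
    return fetch_fresh(url)  # expired, unknown, or possibly invalidated

A read that passes both checks never touches the network, which is the low-latency win the talk is after.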
Web Performance – The Most Effective Techniques from Practice (Felix Gessert)
An average web page loads 2299KB of data and issues 100 HTTP requests to do so. That load times have an immense impact on user satisfaction and business metrics is no longer in doubt these days. But opinions diverge widely on which techniques effectively minimize load times. This talk gives a detailed overview of the most important web performance optimization techniques, from the critical rendering path to distributed caching infrastructures, illustrated with a real-world example.
Cloud Databases in Research and Practice (Felix Gessert)
Felix Gessert is a PhD student in the database group at the University of Hamburg; his PhD research focuses on cloud database startups. In the presentation, Gessert categorizes different types of cloud databases, including database-as-a-service, infrastructure-as-a-service, platform-as-a-service, managed RDBMS/DWH/NoSQL databases, proprietary databases/object stores, backend-as-a-service, analytics-as-a-service, and cloud-deployed databases. He then discusses common aspects of database-as-a-service, including multi-tenancy approaches, billing models, authentication, authorization, and service level agreements.
NoSQL Data Stores in Research and Practice - ICDE 2016 Tutorial - Extended Ve... (Felix Gessert)
This document provides an overview of NoSQL data stores and techniques for scalable data management. It begins with an introduction to NoSQL and the motivations for using specialized data systems instead of traditional relational databases. It then covers the four main classes of NoSQL databases - key-value stores, wide-column stores, document stores, and graph databases. The document also discusses the CAP theorem and its implications, as well as common techniques like sharding, replication, and query processing that NoSQL databases employ to achieve scalability and high availability. The goal is to help readers understand how to approach decisions around which database system may be best for their needs and requirements.
MongoDB World 2016: NOW TV and Linear Streaming: Scaling MongoDB for High Loa... (MongoDB)
The document discusses improvements made to NOW TV's streaming platform to better handle unpredictable load from live linear streaming events. Key issues addressed include:
- Heartbeats were changed so that streams are not terminated on non-OK responses, making playback more resilient during outages.
- Concurrency tracking was improved by keying playout slots on device ID rather than just ID, so slots can be reclaimed after app crashes.
- Product data storage was optimized by storing entitlements rather than duplicating product documents.
- Viewing history APIs were improved by merging the viewings and bookmark collections and adding indexes.
- MongoDB indexing was optimized to speed up viewing history and other API queries (a minimal index sketch follows below).
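As a rough illustration of that last point (database, collection, and field names are invented here, not taken from the talk), a compound index in pymongo that serves a "viewing history for one user, newest first" query might look like this:

from pymongo import MongoClient, ASCENDING, DESCENDING

client = MongoClient("mongodb://localhost:27017")
history = client.nowtv.viewing_history  # hypothetical merged collection

# Equality on the user first, then sort by recency, so the common
# "history for this user, newest first" query is fully indexed.
history.create_index([("user_id", ASCENDING), ("viewed_at", DESCENDING)])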
Pachyderm: Building a Big Data Beast On Kubernetes (KubeAcademy)
Pachyderm is a containerized data analytics solution that's completely deployed using Kubernetes. We take all the amazing tools and potential in the container ecosystem and unlock that power for massive-scale data processing. In this talk we'll show you how to leverage Docker, Kubernetes, and Pachyderm to build incredibly robust and scalable data infrastructure. We'll start by discussing the key components of a modern data-driven company and how your infrastructure choices can have a massive impact on your product and scalability roadmap. We'll then dive into some architecture details to show how Kubernetes, Docker, and Pachyderm all work in tandem to create a cohesive data infrastructure stack. Finally, we will demonstrate some high-level use cases and powerful benefits you get from the architecture we've outlined.
KubeCon schedule link: http://sched.co/4WWA
Data Analytics and Processing at Snap - Druid Meetup LA - September 2018 (Charles Allen)
Charles Allen covers the data processing, analytics, and insights systems at Snap. Strengths of Druid for these use cases are called out, as are differences among some of the processing systems used.
This is the slide collection from the second talk from:
https://www.meetup.com/druidio-la/events/254080924/
How Thermo Fisher is Reducing Data Analysis Times from Days to Minutes with M... (MongoDB)
Speaker: Joseph Fluckiger, Senior Software Architect, ThermoFisher Scientific
Level: 200 (Intermediate)
Track: Atlas
Mass spectrometry is the gold standard for determining chemical compositions, with spectrometers often measuring the mass of a compound down to a single electron. This level of granularity produces an enormous amount of hierarchical data that doesn't fit well into rows and columns. In this talk, learn how Thermo Fisher is using MongoDB Atlas on AWS to allow their users to get near real-time insights from mass spectrometry experiments – a process that used to take days. We also share how the underlying database service used by Thermo Fisher was built on AWS.
What You Will Learn:
- How we modeled mass spectrometry data to enable us to write and read an enormous amount of experimental data efficiently (see the sketch after this list).
- Learn about the best MongoDB tools and patterns for .NET applications.
- Live demo of scaling a MongoDB Atlas cluster with zero downtime and visualizing live data from a million-dollar mass spectrometer stored in MongoDB.
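As a sketch of the modeling idea from the first bullet (invented field names, not Thermo Fisher's actual schema), hierarchical scan data maps naturally onto nested MongoDB documents, keeping each scan's peaks embedded rather than spread across joined rows:

from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
scans = client.massspec.scans  # hypothetical database and collection

# One nested document per scan keeps the hierarchy together, so an
# experiment can be read back without reassembling it from joined rows.
scans.insert_one({
    "experiment_id": "exp-001",   # all values below are made up
    "instrument": "orbitrap",
    "scan_number": 1842,
    "peaks": [
        {"mz": 445.12003, "intensity": 90312.5},
        {"mz": 446.12310, "intensity": 11872.1},
    ],
})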
Plazma - Treasure Data's distributed analytical database (Treasure Data, Inc.)
This document summarizes Plazma, Treasure Data's distributed analytical database that can import 40 billion records per day. It discusses how Plazma reliably imports and processes large volumes of data through its scalable architecture with real-time and archive storage. Data is imported using Fluentd and processed using its column-oriented, schema-on-read design to enable fast queries. The document also covers Plazma's transaction API and how it is optimized for metadata operations.
New generations of database technologies are enabling organizations to build previously impossible applications, at a speed and scale unimaginable before. MongoDB is the fastest-growing database in the world. The new version 3.2 brings the benefits of modern database architectures to an ever wider range of applications and users.
Best Practices for Using Alluxio with Apache Spark with Cheng Chang and Haoyu... (Databricks)
Alluxio, formerly Tachyon, is a memory speed virtual distributed storage system that leverages memory for storing data and accelerating access to data in different storage systems. Many organizations and deployments use Alluxio with Apache Spark, and some of them scale out to over petabytes of data. Alluxio can enable Spark to be even more effective, in both on-premise deployments and public cloud deployments. Alluxio bridges Spark applications with various storage systems and further accelerates data intensive applications. This session will briefly introduce Alluxio and present different ways that Alluxio can help Spark jobs. Get best practices for using Alluxio with Spark, including RDDs and DataFrames, as well as on-premise deployments and public cloud deployments.
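One common pattern, sketched below with a made-up Alluxio master address rather than the session's exact examples, is to point Spark reads and writes at alluxio:// paths so hot data is served from Alluxio's memory tier (this assumes the Alluxio client jar is on Spark's classpath):

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("alluxio-demo").getOrCreate()

# Write once through Alluxio; later reads can be served from memory
# instead of the slower underlying store (S3, HDFS, ...).
df = spark.range(1000).withColumnRenamed("id", "value")
df.write.mode("overwrite").parquet("alluxio://alluxio-master:19998/demo/values")

cached = spark.read.parquet("alluxio://alluxio-master:19998/demo/values")
print(cached.count())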
MongoDB has grown significantly since 2009, with over 20 million downloads and 20,000-30,000 downloads per day. MongoDB aims to continually improve, extend, and innovate its database platform. Some recent improvements include the WiredTiger storage engine, replica set elections, and document validation. MongoDB also aims to extend its capabilities with features like the $lookup aggregation operator, BI connectors, and read-only views. MongoDB is innovating in areas like mixed storage engine sets, zones for geographical distribution, and cloud services like MongoDB Atlas.
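For instance, $lookup performs a left outer join across collections; a minimal pymongo illustration with invented collection and field names:

from pymongo import MongoClient

db = MongoClient("mongodb://localhost:27017").shop  # hypothetical database

# $lookup: left outer join of orders against customers on customer_id.
pipeline = [
    {"$lookup": {
        "from": "customers",
        "localField": "customer_id",
        "foreignField": "_id",
        "as": "customer",
    }},
    {"$unwind": "$customer"},  # flatten the joined array for convenience
]
for order in db.orders.aggregate(pipeline):
    print(order["_id"], order["customer"]["name"])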
MongoDB Europe 2016 - Warehousing MongoDB Data using Apache Beam and BigQuery (MongoDB)
What happens when you need to combine data from MongoDB along with other systems into a cohesive view for business intelligence? How do you extract, transform, and load MongoDB data into a centralized data warehouse? In this session we’ll talk about Google BigQuery, a managed, petabyte-scale data warehouse, and the various ways to get MongoDB data into it. We’ll cover managed options like Apache Beam and Cloud Dataflow as well as other tools that can help make moving and using MongoDB data easy for business intelligence workloads.
MongoDB .local Bengaluru 2019: MongoDB Atlas Data Lake Technical Deep Dive (MongoDB)
MongoDB Atlas Data Lake is a new service offered by MongoDB Atlas. Many organizations store long term, archival data in cost-effective storage like S3, GCP, and Azure Blobs. However, many of them do not have robust systems or tools to effectively utilize large amounts of data to inform decision making. MongoDB Atlas Data Lake is a service allowing organizations to analyze their long-term data to discover a wealth of information about their business.
This session will take a deep dive into the features that are currently available in MongoDB Atlas Data Lake and how they are implemented. In addition, we'll discuss future plans and opportunities and offer ample Q&A time with the engineers on the project.
Speed Kit: Getting Websites out of the Web Performance Stone Age (Felix Gessert)
Page load time is money. This is not only true for companies like Amazon, which lose more than $1.3B in revenue per year if their website is a tenth of a second slower. It is also true for publishers, whose business model depends on a user experience that facilitates consumption of as much content as possible. However, many publishers have heterogeneous and complex technology stacks that make it extremely hard to tackle performance, scalability, and page load time.
Novel browser technologies now offer a means of making web performance as simple as including a script. The research spin-off Baqend has developed a "Speed Kit" that directly hooks into existing publisher, e-commerce, and brand websites and makes them 50-300% faster. In this pitch, we will explain how the technology works for large websites.
MongoDB Ops Manager and Kubernetes - James Broadhead (MongoDB)
This document provides an overview of how to deploy and manage MongoDB in Kubernetes using the MongoDB Enterprise Kubernetes Operator. It begins with introductions to Kubernetes, Operators, and MongoDB Ops Manager. It then demonstrates deploying a sharded MongoDB cluster with the Operator, including examples of operational tasks like version upgrades, horizontal and vertical scaling. The document also covers topics like fault tolerance, performance, and automation benefits of the Operator. It concludes by stating the current beta status of the Operator and encouraging questions.
Securing MongoDB to Serve an AWS-Based, Multi-Tenant, Security-Fanatic SaaS A... (MongoDB)
MongoDB introduces new capabilities that change the way microservices interact with the database, capabilities that are either absent or exist only partially in high-end commercial databases such as Oracle. In this session I will share my experiences building a cloud-based, multi-tenant SaaS application with extreme security requirements. We will cover topics including considerations for storing multi-tenant data in the database, best practices for authentication and authorization, and performance considerations specific to security in MongoDB.
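One storage consideration such sessions typically cover is the shared-collection model, where every document carries a tenant key and every query is scoped by it; a minimal hypothetical sketch, not the speaker's actual code:

from pymongo import MongoClient, ASCENDING

db = MongoClient("mongodb://localhost:27017").saas  # hypothetical database

# Shared-collection multi-tenancy: tenant_id leads every index and query,
# so one tenant's reads never scan another tenant's documents.
db.records.create_index([("tenant_id", ASCENDING), ("created_at", ASCENDING)])

def records_for(tenant_id, **filters):
    # All queries funnel through here, so the tenant scope cannot be forgotten.
    return db.records.find({"tenant_id": tenant_id, **filters})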
The document discusses Docker and how DataStax Enterprise (DSE) can be run within Docker containers. It provides background on Docker, key Docker concepts like images and containers, and the benefits of containers for application development and deployment. It then covers specifics of running DSE in Docker, including configuring processes, networking, storage, and best practices. It concludes with potential future work like splitting DSE processes across containers and integrating with orchestration platforms.
Creating Real-Time Data Mashups with Node.js and Adobe CQ by Josh Miller (rtpaem)
This document discusses using Node.js to create real-time data mashups with Adobe CQ. Node.js is well-suited for handling real-time data due to its high throughput capabilities, while Adobe CQ excels at content management. The document provides an overview of integrating the two platforms by consuming CQ data via its REST API and combining it with real-time data sources in Node.js applications. Examples of common use cases for Node.js and gotchas to be aware of when working with the technology are also presented.
Presto is a distributed SQL query engine that allows for interactive analysis of large datasets across various data sources. It was created at Facebook to enable interactive querying of data in HDFS and Hive, which were too slow for interactive use. Presto addresses problems with existing solutions like Hive being too slow, the need to copy data for analysis, and high costs of commercial databases. It uses a distributed architecture with coordinators planning queries and workers executing tasks quickly in parallel.
The document discusses using spot instances with Druid for cost savings. It describes that spot instances provide lower costs but less availability than on-demand instances. The document outlines how Druid is configured to use Terraform and Helm for infrastructure setup and deployment. It also discusses how Druid's stateless architecture and redundancy across middle managers and historical nodes allows it to withstand spot instance interruptions without data loss.
How To Use Kafka and Druid to Tame Your Router Data (Rachel Pedreschi, Imply ...) (confluent)
Do you know who is knocking on your network's door? Have new regulations left you scratching your head over how to handle what is happening in your network? Network flow data helps answer many questions across a multitude of use cases, including network security, performance, capacity planning, routing, operational troubleshooting, and more. Today's streaming data pipelines need tools that can scale to meet the demands of these service providers while continuing to provide responsive answers to difficult questions. In addition to stream processing, data needs to be stored in a redundant, operationally focused database to provide fast, reliable answers to critical questions. Kafka and Druid work together to create such a pipeline.
In this talk Eric Graham and Rachel Pedreschi will discuss these pipelines and cover the following topics:
- Network flow use cases and why this data is important.
- Reference architectures from production systems at a major international bank.
- Why Kafka and Druid and other OSS tools for network flows.
- A demo of one such system.
Couchbase Sydney Meetup #1: Couchbase Architecture and Scalability (Karthik Babu Sekar)
This document provides an overview of a Couchbase meetup presentation on Couchbase architecture and scalability. The presentation covers the history of Couchbase, its key capabilities as a data management solution, architecture including services for data, indexing and querying, common use cases, what makes Couchbase unique, always-on availability features, administration and management, performance and scalability including auto sharding and cross data center replication, and examples of using its query language N1QL.
Cloud Dataflow is a fully managed service and SDK from Google that allows users to define and run data processing pipelines. The Dataflow SDK defines the programming model used to build streaming and batch processing pipelines. Google Cloud Dataflow is the managed service that will run and optimize pipelines defined using the SDK. The SDK provides primitives like PCollections, ParDo, GroupByKey, and windows that allow users to build unified streaming and batch pipelines.
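A toy Beam pipeline in Python showing several of those primitives together (a word count of our own, not Google's sample code; windowing is omitted because this is a bounded batch source):

import apache_beam as beam

with beam.Pipeline() as p:
    (
        p
        | "Create" >> beam.Create(["the cat", "the hat"])   # a PCollection
        | "Split" >> beam.FlatMap(str.split)                 # a ParDo-style transform
        | "Pair" >> beam.Map(lambda w: (w, 1))
        | "Group" >> beam.GroupByKey()                       # shuffle by key
        | "Count" >> beam.MapTuple(lambda w, ones: (w, sum(ones)))
        | "Print" >> beam.Map(print)
    )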
Couchbase Singapore Meetup #2: Why Developing with Couchbase is easy!! (Karthik Babu Sekar)
The document discusses new features and improvements in Couchbase 4.6, including timestamp-based conflict resolution for cross datacenter replication, secret management and pluggable authentication modules for security, and new CBImport and CBExport tools. It also covers updates to search and query functionality.
MongoDB .local Bengaluru 2019: The Journey of Migration from Oracle to MongoD... (MongoDB)
Find out more about our journey of migrating to MongoDB after using Oracle for our hotel search database for over ten years.
- How did we solve the synchronization problem with the Master Database?
- How to get fast search results (even with massive write operations)?
- How other issues were solved
MongoDB World 2016: Scaling Targeted Notifications in the Music Streaming Wor... (MongoDB)
This document summarizes key information about Saavn, India's largest music streaming service. Some key points:
- Saavn has 18 million global monthly active users, with 14 million in India. The majority (64%) use Android devices to access over 25 million tracks.
- Push notifications are a primary driver of mobile app growth for Saavn. They send over 30 million notifications per day and see 3x more engagement from targeted notifications.
- Saavn stores user notification messages and activity data in MongoDB. They upgraded to WiredTiger for its document-level locking and high performance. Maintaining over 500GB of user data required implementing sharding and migrating the data (a minimal sharding sketch follows below).
- Tools like
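To illustrate the sharding step mentioned above (generic MongoDB commands against a hypothetical cluster, not Saavn's actual setup), enabling hashed sharding on a user-activity collection looks roughly like this when run against a mongos router:

from pymongo import MongoClient

# Must connect to a mongos router of a sharded cluster, not a plain mongod.
admin = MongoClient("mongodb://mongos:27017").admin  # hypothetical host

admin.command("enableSharding", "saavn")
# A hashed shard key spreads one user's hot activity evenly across shards.
admin.command("shardCollection", "saavn.user_activity",
              key={"user_id": "hashed"})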
Building a Global-Scale Multi-Tenant Cloud Platform on AWS and Docker: Lesson... (Felix Gessert)
In this talk we share the lessons learned while building out the Baqend Cloud platform on AWS and Docker. Baqend's AWS-hosted architecture consists of a caching CDN layer, global and local load balancing, a group of REST and Node.js servers, and a database cluster with Redis and MongoDB. As customers have their own set of containerized REST and Node servers, we needed a cluster that is horizontally scalable on the one hand and easily manageable and fault-tolerant from an operational perspective on the other. Today there are at least four popular systems that claim to support this:
- Kubernetes
- Apache Mesos
- Docker Swarm
- AWS Elastic Container Service (ECS)
Thinking that ECS would certainly be the easiest option on AWS, we started building our cluster on it. We quickly came to realize that, while ECS was astoundingly stable and easy to use, there were inherent limitations that could not be worked around. An old Docker version, missing network isolation, no means of parameterizing tasks, and forced memory constraints are the major ECS limitations we will talk about. Seeing the daunting operational overhead of running Kubernetes or Mesos in practice, we turned to Docker's native clustering solution, Swarm. We will present how Swarm works with both Docker and AWS and highlight the advantages and downsides compared to Amazon's ECS.
MongoDB World 2016: Get MEAN and Lean with MongoDB and Kubernetes (MongoDB)
1) The document discusses using MongoDB and Kubernetes to reduce impedance mismatches in software stacks and deployment processes.
2) It proposes using a MEAN stack with MongoDB as the database to align the client, server, and data layers. Docker is used to package the application and Kubernetes manages deploying containers across a cluster.
3) The presentation includes demos of deploying a MEAN app to Kubernetes and running MongoDB on Kubernetes, including recovering from node failures through replication and services.
AWS Partner Webcast - Analyze Big Data for Consumer Applications with Looker ... (Amazon Web Services)
Analyze Big Data for Consumer Applications with Looker BI and Amazon Redshift. Customizing the customer experience based on user behavior is a constant challenge for today's consumer apps. Business intelligence helps analyze and model large amounts of data. Looker offers a modern approach to BI, leveraging AWS, that's fast, agile, and easy to manage. Join this webinar to learn how MessageMe, which provides emotionally engaging messaging apps to consumers, leverages Looker business intelligence software and the Amazon Redshift data warehouse service to analyze billions of rows of customer data in seconds.
Webinar topics include:
• How MessageMe turns billions of rows of customer data stored in Amazon Redshift into actionable insights
• How Looker connects directly to Amazon Redshift in just a few clicks, enabling MessageMe to build modern big data analytics in the cloud.
Who should attend:
• Information or Solution Architects, Data Analysts, BI Directors, DBAs, Development Leads, Developers, or Technical IT Leaders.
Presenters:
• Justin Rosenthal, CTO, MessageMe
• Keenan Rice, VP, Marketing & Alliances, Looker
• Tina Adams, Senior Product Manager, AWS
The title of this talk is a crass attempt to be catchy and topical, by referring to the recent victory of Watson in Jeopardy.
My point (perhaps confusingly) is not that new computer capabilities are a bad thing. On the contrary, these capabilities represent a tremendous opportunity for science. The challenge that I speak to is how we leverage these capabilities without computers and computation overwhelming the research community in terms of both human and financial resources. The solution, I suggest, is to get computation out of the lab—to outsource it to third party providers.
Abstract follows:
We have made much progress over the past decade toward effective distributed cyberinfrastructure. In big-science fields such as high energy physics, astronomy, and climate, thousands benefit daily from tools that enable the distributed management and analysis of vast quantities of data. But we now face a far greater challenge. Exploding data volumes and new research methodologies mean that many more--ultimately most?--researchers will soon require similar capabilities. How can we possibly supply information technology (IT) at this scale, given constrained budgets? Must every lab become filled with computers, and every researcher an IT specialist?
I propose that the answer is to take a leaf from industry, which is slashing both the costs and complexity of consumer and business IT by moving it out of homes and offices to so-called cloud providers. I suggest that by similarly moving research IT out of the lab, we can realize comparable economies of scale and reductions in complexity, empowering investigators with new capabilities and freeing them to focus on their research.
I describe work we are doing to realize this approach, focusing initially on research data lifecycle management. I present promising results obtained to date, and suggest a path towards large-scale delivery of these capabilities. I also suggest that these developments are part of a larger "revolution in scientific affairs," as profound in its implications as the much-discussed "revolution in military affairs" resulting from more capable, low-cost IT. I conclude with some thoughts on how researchers, educators, and institutions may want to prepare for this revolution.
A talk at the RPI-NSF Workshop on Multiscale Modeling of Complex Data, September 12, 2011, Troy NY, USA.
We have made much progress over the past decade toward effectively harnessing the collective power of IT resources distributed across the globe. In fields such as high-energy physics, astronomy, and climate, thousands benefit daily from tools that manage and analyze large quantities of data produced and consumed by large collaborative teams. But we now face a far greater challenge: Exploding data volumes and powerful simulation tools mean that far more--ultimately most?--researchers will soon require capabilities not so different from those used by these big-science teams. How is the general population of researchers and institutions to meet these needs? Must every lab be filled with computers loaded with sophisticated software, and every researcher become an information technology (IT) specialist? Can we possibly afford to equip our labs in this way, and where would we find the experts to operate them?
Consumers and businesses face similar challenges, and industry has responded by moving IT out of homes and offices to so-called cloud providers (e.g., GMail, Google Docs, Salesforce), slashing costs and complexity. I suggest that by similarly moving research IT out of the lab, we can realize comparable economies of scale and reductions in complexity. More importantly, we can free researchers from the burden of managing IT, giving them back their time to focus on research and empowering them to go beyond the scope of what was previously possible.
I describe work we are doing at the Computation Institute to realize this approach, focusing initially on research data lifecycle management. I present promising results obtained to date and suggest a path towards large-scale delivery of these capabilities.
IWMW 2003: C7 Bandwidth Management Techniques: Technical And Policy Issues (IWMW)
Slides used in workshop session C7 on "Bandwidth Management Techniques: Technical And Policy Issues" at the IWMW 2003 event held at the University of Kent on 11-13 June 2003.
See http://www.ukoln.ac.uk/web-focus/events/workshops/webmaster-2003/sessions/#workshops-c
Creating a Modern Data Architecture for Digital Transformation (MongoDB)
By managing Data in Motion, Data at Rest, and Data in Use differently, modern Information Management Solutions are enabling a whole range of architecture and design patterns that allow enterprises to fully harness the value in data flowing through their systems. In this session we explored some of the patterns (e.g. operational data lakes, CQRS, microservices and containerisation) that enable CIOs, CDOs and senior architects to tame the data challenge, and start to use data as a cross-enterprise asset.
This document summarizes a presentation on patterns for cloud computing using Microsoft Azure. It discusses key concepts like roles, storage, and messaging in Azure. It provides an example of using cloud computing for dynamic scaling of web applications. The presentation also covers development tools and experiences for Azure, different storage options like blobs, tables and SQL Azure, and lessons learned.
The document discusses Microsoft Azure, a cloud computing platform. It provides an overview of key Azure concepts like scalability, flexible pricing models, and global datacenter infrastructure. It also describes Azure services like compute, storage, SQL databases, and AppFabric that help developers build and scale applications in the cloud. Commercial pricing information is included to show how Azure offers flexible consumption-based pricing based on actual usage.
MongoDB Days UK: Building an Enterprise Data Fabric at Royal Bank of Scotland... (MongoDB)
Presented by Michael Fulke, Development Team Lead, Royal Bank of Scotland
Experience level: Beginner
When addressing common investment banking use-cases, incumbent application architectures have proven themselves to be complex, difficult to maintain and expensive. Driven by the apparently competing pressures of cost and agility, RBS used MongoDB to build a common enterprise data fabric which is underpinning several core trading platforms. In this session, you will learn how RBS has successfully integrated MongoDB into a wider Java-based architecture, built with a strong open source bias.
Scalable Architectures - Microsoft Finland DevDays 2014 (Kallex)
The document discusses scaling a digital service called TeamUp to serve tens of millions of users. TeamUp allows talents, fans, and sponsors to connect. It was originally built using ASP.NET MVC but faced challenges scaling to large numbers of users. The proposed solution migrates to a scalable architecture that stores data as JSON files in Azure Blob Storage and serves content directly from the blobs to improve performance and reduce costs. Caching at various levels, from the mobile apps to CDNs, is also discussed to further improve scalability.
Big Data Analytics from Azure Cloud to Power BI Mobile (Roy Kim)
This document discusses using Azure services for big data analytics and data insights. It provides an overview of Azure services like Azure Batch, Azure Data Lake, Azure HDInsight, and Power BI. It then describes a demo solution that uses these Azure services to analyze job posting data, including collecting data with a .NET application, storing it in Azure Data Lake Store, processing it with Azure Data Lake Analytics and Azure HDInsight, and visualizing the results in Power BI. The presentation includes architecture diagrams and discusses implementation details.
Experiences using CouchDB inside Microsoft's Azure team (Brian Benz)
Co-presented with Will Perry (@willpe). Real-world experiences using CouchDB inside Microsoft, and also how to get started with CouchDB on Microsoft Azure.
Machine Learning for Smarter Apps - Jacksonville Meetup (Sri Ambati)
Machine Learning for Smarter Apps with Tom Kraljevic
- Powered by the open source machine learning software H2O.ai. Contributors welcome at: https://github.com/h2oai
- To view videos on H2O open source machine learning software, go to: https://www.youtube.com/user/0xdata
The document discusses various techniques for optimizing UI performance, including caching, minimizing round-trip times, minimizing request and payload sizes, and optimizing browser rendering. Specific techniques mentioned include leveraging browser and proxy caching, minimizing DNS lookups and redirects, combining external JavaScript, minimizing cookie and request size, enabling gzip compression, and optimizing images. Profiling and heap-analysis tools are also discussed for diagnosing backend performance issues.
The document provides an overview of data mesh principles and hands-on examples for implementing a data mesh. It discusses key concepts of a data mesh including data ownership by domain, treating data as a product, making data available everywhere through self-service, and federated governance of data wherever it resides. Hands-on examples are provided for creating a data mesh topology with Apache Kafka as the underlying infrastructure, developing data products within domains, and exploring consumption of real-time and historical data from the mesh.
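As a trivial illustration of the Kafka-backed topology idea (a hypothetical topic name and a local broker, not the document's own example), a domain team might publish its data product to a topic that any other domain can consume:

import json
from kafka import KafkaProducer  # kafka-python client

# Hypothetical setup: a local broker and a domain-owned "data product" topic.
producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

# The orders domain publishes an event from its data product; consumers
# read the topic without coupling to the producer's internals.
producer.send("orders.order-events.v1", {"order_id": "o-17", "status": "shipped"})
producer.flush()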
This document discusses optimizing the client-side performance of websites. It describes how reducing HTTP requests through techniques like image maps, CSS sprites, and combining scripts and stylesheets can improve response times. It also recommends strategies like using a content delivery network, adding expiration headers, compressing components, correctly structuring CSS and scripts, and optimizing JavaScript code and Ajax implementations. The benefits of a performant front-end are emphasized, as client-side optimizations often require less time and resources than back-end changes.
AJAX allows web pages to be updated asynchronously by exchanging data with a web server behind the scenes, allowing parts of a page to change without reloading the entire page. Tuenti uses AJAX extensively to update parts of their single-page application, caching content on both client and server sides for scalability. They route requests to different server farms based on client location and cache content to improve performance. Tuenti serves billions of images per day using multiple CDNs and pre-fetches content to minimize load times.
Similar to Cache Sketches: Using Bloom Filters and Web Caching Against Slow Load Times
Have you ever been confused by the myriad of choices offered by AWS for hosting a website or an API?
Lambda, Elastic Beanstalk, Lightsail, Amplify, S3 (and more!) can each host websites + APIs. But which one should we choose?
Which one is cheapest? Which one is fastest? Which one will scale to meet our needs?
Join me in this session as we dive into each AWS hosting service to determine which one is best for your scenario and explain why!
zkStudyClub - LatticeFold: A Lattice-based Folding Scheme and its Application... (Alex Pruden)
Folding is a recent technique for building efficient recursive SNARKs. Several elegant folding protocols have been proposed, such as Nova, Supernova, Hypernova, Protostar, and others. However, all of them rely on an additively homomorphic commitment scheme based on discrete log and are therefore not post-quantum secure. In this work we present LatticeFold, the first lattice-based folding protocol based on the Module SIS problem. This folding protocol naturally leads to an efficient recursive lattice-based SNARK and an efficient PCD scheme. LatticeFold supports folding low-degree relations, such as R1CS, as well as high-degree relations, such as CCS. The key challenge is to construct a secure folding protocol that works with the Ajtai commitment scheme. The difficulty is ensuring that extracted witnesses are low-norm through many rounds of folding. We present a novel technique using the sumcheck protocol to ensure that extracted witnesses are always low-norm no matter how many rounds of folding are used. Our evaluation of the final proof system suggests that it is as performant as Hypernova while providing post-quantum security.
Paper Link: https://eprint.iacr.org/2024/257
Freshworks Rethinks NoSQL for Rapid Scaling & Cost-Efficiency (ScyllaDB)
Freshworks creates AI-boosted business software that helps employees work more efficiently and effectively. Managing data across multiple RDBMS and NoSQL databases was already a challenge at their current scale. To prepare for 10X growth, they knew it was time to rethink their database strategy. Learn how they architected a solution that would simplify scaling while keeping costs under control.
Connector Corner: Seamlessly power UiPath Apps, GenAI with prebuilt connectors (DianaGray10)
Join us to learn how UiPath Apps can directly and easily interact with prebuilt connectors via Integration Service--including Salesforce, ServiceNow, Open GenAI, and more.
The best part is you can achieve this without building a custom workflow! Say goodbye to the hassle of using separate automations to call APIs. By seamlessly integrating within App Studio, you can now easily streamline your workflow, while gaining direct access to our Connector Catalog of popular applications.
We’ll discuss and demo the benefits of UiPath Apps and connectors including:
Creating a compelling user experience for any software, without the limitations of APIs.
Accelerating the app creation process, saving time and effort
Enjoying high-performance CRUD (create, read, update, delete) operations for seamless data management.
Speakers:
Russell Alfeche, Technology Leader, RPA at qBotic and UiPath MVP
Charlie Greenberg, host
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf (Chart Kalyan)
A Mix Chart displays historical data of numbers in a graphical or tabular form. The Kalyan Rajdhani Mix Chart specifically shows the results of a sequence of numbers over different periods.
For the full video of this presentation, please visit: https://www.edge-ai-vision.com/2024/06/how-axelera-ai-uses-digital-compute-in-memory-to-deliver-fast-and-energy-efficient-computer-vision-a-presentation-from-axelera-ai/
Bram Verhoef, Head of Machine Learning at Axelera AI, presents the “How Axelera AI Uses Digital Compute-in-memory to Deliver Fast and Energy-efficient Computer Vision” tutorial at the May 2024 Embedded Vision Summit.
As artificial intelligence inference transitions from cloud environments to edge locations, computer vision applications achieve heightened responsiveness, reliability and privacy. This migration, however, introduces the challenge of operating within the stringent confines of resource constraints typical at the edge, including small form factors, low energy budgets and diminished memory and computational capacities. Axelera AI addresses these challenges through an innovative approach of performing digital computations within memory itself. This technique facilitates the realization of high-performance, energy-efficient and cost-effective computer vision capabilities at the thin and thick edge, extending the frontier of what is achievable with current technologies.
In this presentation, Verhoef unveils his company’s pioneering chip technology and demonstrates its capacity to deliver exceptional frames-per-second performance across a range of standard computer vision networks typical of applications in security, surveillance and the industrial sector. This shows that advanced computer vision can be accessible and efficient, even at the very edge of our technological ecosystem.
Taking AI to the Next Level in Manufacturing.pdfssuserfac0301
Read Taking AI to the Next Level in Manufacturing to gain insights on AI adoption in the manufacturing industry, such as:
1. How quickly AI is being implemented in manufacturing.
2. Which barriers stand in the way of AI adoption.
3. How data quality and governance form the backbone of AI.
4. Organizational processes and structures that may inhibit effective AI adoption.
6. Ideas and approaches to help build your organization's AI strategy.
Monitoring and Managing Anomaly Detection on OpenShift.pdfTosin Akinosho
Monitoring and Managing Anomaly Detection on OpenShift
Overview
Dive into the world of anomaly detection on edge devices with our comprehensive hands-on tutorial. This SlideShare presentation will guide you through the entire process, from data collection and model training to edge deployment and real-time monitoring. Perfect for those looking to implement robust anomaly detection systems on resource-constrained IoT/edge devices.
Key Topics Covered
1. Introduction to Anomaly Detection
- Understand the fundamentals of anomaly detection and its importance in identifying unusual behavior or failures in systems.
2. Understanding Edge (IoT)
- Learn about edge computing and IoT, and how they enable real-time data processing and decision-making at the source.
3. What is ArgoCD?
- Discover ArgoCD, a declarative, GitOps continuous delivery tool for Kubernetes, and its role in deploying applications on edge devices.
4. Deployment Using ArgoCD for Edge Devices
- Step-by-step guide on deploying anomaly detection models on edge devices using ArgoCD.
5. Introduction to Apache Kafka and S3
- Explore Apache Kafka for real-time data streaming and Amazon S3 for scalable storage solutions.
6. Viewing Kafka Messages in the Data Lake
- Learn how to view and analyze Kafka messages stored in a data lake for better insights.
7. What is Prometheus?
- Get to know Prometheus, an open-source monitoring and alerting toolkit, and its application in monitoring edge devices.
8. Monitoring Application Metrics with Prometheus
- Detailed instructions on setting up Prometheus to monitor the performance and health of your anomaly detection system.
9. What is Camel K?
- Introduction to Camel K, a lightweight integration framework built on Apache Camel, designed for Kubernetes.
10. Configuring Camel K Integrations for Data Pipelines
- Learn how to configure Camel K for seamless data pipeline integrations in your anomaly detection workflow.
11. What is a Jupyter Notebook?
- Overview of Jupyter Notebooks, an open-source web application for creating and sharing documents with live code, equations, visualizations, and narrative text.
12. Jupyter Notebooks with Code Examples
- Hands-on examples and code snippets in Jupyter Notebooks to help you implement and test anomaly detection models.
What is an RPA CoE? Session 1 – CoE VisionDianaGray10
In the first session, we will review the organization's vision and how this has an impact on the COE Structure.
Topics covered:
• The role of a steering committee
• How do the organization’s priorities determine CoE Structure?
Speaker:
Chris Bolin, Senior Intelligent Automation Architect Anika Systems
In the realm of cybersecurity, offensive security practices act as a critical shield. By simulating real-world attacks in a controlled environment, these techniques expose vulnerabilities before malicious actors can exploit them. This proactive approach allows manufacturers to identify and fix weaknesses, significantly enhancing system security.
This presentation delves into the development of a system designed to mimic Galileo's Open Service signal using software-defined radio (SDR) technology. We'll begin with a foundational overview of both Global Navigation Satellite Systems (GNSS) and the intricacies of digital signal processing.
The presentation culminates in a live demonstration. We'll showcase the manipulation of Galileo's Open Service pilot signal, simulating an attack on various software and hardware systems. This practical demonstration serves to highlight the potential consequences of unaddressed vulnerabilities, emphasizing the importance of offensive security practices in safeguarding critical infrastructure.
Programming Foundation Models with DSPy - Meetup SlidesZilliz
Prompting language models is hard, while programming language models is easy. In this talk, I will discuss the state-of-the-art framework DSPy for programming foundation models with its powerful optimizers and runtime constraint system.
Digital Marketing Trends in 2024 | Guide for Staying AheadWask
https://www.wask.co/ebooks/digital-marketing-trends-in-2024
Feeling lost in the digital marketing whirlwind of 2024? Technology is changing, consumer habits are evolving, and staying ahead of the curve feels like a never-ending pursuit. This e-book is your compass. Dive into actionable insights to handle the complexities of modern marketing. From hyper-personalization to the power of user-generated content, learn how to build long-term relationships with your audience and unlock the secrets to success in the ever-shifting digital landscape.
Best 20 SEO Techniques To Improve Website Visibility In SERPPixlogix Infotech
Boost your website's visibility with proven SEO techniques! Our latest blog dives into essential strategies to enhance your online presence, increase traffic, and rank higher on search engines. From keyword optimization to quality content creation, learn how to make your site stand out in the crowded digital landscape. Discover actionable tips and expert insights to elevate your SEO game.
Building Production Ready Search Pipelines with Spark and MilvusZilliz
Spark is the widely used ETL tool for processing, indexing and ingesting data to serving stack for search. Milvus is the production-ready open-source vector database. In this talk we will show how to use Spark to process unstructured data to extract vector representations, and push the vectors to Milvus vector database for search serving.
Dandelion Hashtable: beyond billion requests per second on a commodity serverAntonios Katsarakis
This slide deck presents DLHT, a concurrent in-memory hashtable. Despite efforts to optimize hashtables, that go as far as sacrificing core functionality, state-of-the-art designs still incur multiple memory accesses per request and block request processing in three cases. First, most hashtables block while waiting for data to be retrieved from memory. Second, open-addressing designs, which represent the current state-of-the-art, either cannot free index slots on deletes or must block all requests to do so. Third, index resizes block every request until all objects are copied to the new index. Defying folklore wisdom, DLHT forgoes open-addressing and adopts a fully-featured and memory-aware closed-addressing design based on bounded cache-line-chaining. This design offers lock-free index operations and deletes that free slots instantly, (2) completes most requests with a single memory access, (3) utilizes software prefetching to hide memory latencies, and (4) employs a novel non-blocking and parallel resizing. In a commodity server and a memory-resident workload, DLHT surpasses 1.6B requests per second and provides 3.5x (12x) the throughput of the state-of-the-art closed-addressing (open-addressing) resizable hashtable on Gets (Deletes).
Ivanti’s Patch Tuesday breakdown goes beyond patching your applications and brings you the intelligence and guidance needed to prioritize where to focus your attention first. Catch early analysis on our Ivanti blog, then join industry expert Chris Goettl for the Patch Tuesday Webinar Event. There we’ll do a deep dive into each of the bulletins and give guidance on the risks associated with the newly-identified vulnerabilities.
14. Achieve a fast render of the page by:
◦ Reducing the critical resources needed
◦ Reducing the critical bytes which must be transferred
◦ Loading JS, CSS and HTML templates asynchronously
◦ Rendering the page progressively
◦ Minifying & Concatenating CSS, JS and images
Frontend Performance
Break-down of the Critical Rendering Path
Google Developers, Web Fundamentals
https://developers.google.com/web/fundamentals/performance/critical-rendering-path/analyzing-crp.
15. Well-known problem & good tooling:
◦ Optimizing CSS (postcss)
◦ Concatenating CSS and JS (processhtml)
◦ Minification and compression (cssmin, UglifyJS, Google Closure, imagemin)
◦ Inlining the critical CSS (addyosmani/critical)
◦ Hashing assets to make them cacheable (gulp-rev-all)
Frontend Performance
Tools to improve your page load
16. Network Performance
Break-down of a single resource load
DNS Lookup
◦ Every domain has its own DNS lookup
Initial connection
◦ TCP makes a three-way handshake → 2 roundtrips
◦ SSL connections have a more complex handshake → +2 roundtrips
Time to First Byte
◦ Depends heavily on the distance between client and backend
◦ Includes the time the backend needs to render the page (session lookups, database queries, template rendering, …)
Content Download
◦ Files have a high transfer time on new connections, since the initial congestion window is small → many roundtrips (a rough model of these phases follows below)
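To make the roundtrip arithmetic concrete, here is a toy back-of-the-envelope model of these phases (not from the talk; all constants are illustrative assumptions):

# Toy model of a single resource load over a fresh connection.
# All numbers are illustrative assumptions, not measurements.

def load_time_ms(rtt_ms, dns_ms, ttfb_server_ms, size_kb,
                 init_cwnd_kb=14, tls=True):
    """DNS + TCP handshake + (TLS) + server rendering + transfer."""
    t = dns_ms
    t += rtt_ms          # TCP handshake costs a roundtrip before the request
    if tls:
        t += 2 * rtt_ms  # classic TLS handshake: +2 roundtrips
    t += ttfb_server_ms  # backend rendering time
    # Slow start: the congestion window roughly doubles per roundtrip.
    window, remaining = init_cwnd_kb, size_kb
    while remaining > 0:
        t += rtt_ms
        remaining -= window
        window *= 2
    return t

# Example: at 100 ms RTT, handshakes and slow start dominate a 100 KB file.
print(load_time_ms(rtt_ms=100, dns_ms=50, ttfb_server_ms=80, size_kb=100))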
17. Network Performance
Common Tuning Knobs
Persistent connections, if possible HTTP/2
Avoid redirects
Explicit caching headers, no heuristic caching (see the sketch below)
Content Delivery Networks
◦ To reduce the distance between client and server
◦ To cache images, CSS, JS
◦ To terminate SSL early and optimized
Single Page Apps:
◦ Small initial page that loads additional parts asynchronously
◦ Cacheable HTML templates + loading of dynamic data
◦ Only update sections of the page during navigation
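As a sketch of the "explicit caching headers" knob, a minimal Flask handler could set them like this (the route and the one-hour TTL are hypothetical):

from flask import Flask, jsonify

app = Flask(__name__)

@app.route("/db/posts/<post_id>")
def get_post(post_id):
    # Hypothetical handler serving a record with explicit freshness info.
    resp = jsonify({"id": post_id, "title": "..."})
    # Explicit TTL; without it, caches fall back to heuristic caching.
    resp.headers["Cache-Control"] = "public, max-age=3600"
    return resp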
19. Network Latency: Impact
I. Grigorik, High Performance Browser Networking. O'Reilly Media, 2013.
2× Bandwidth = Same Load Time
½ Latency ≈ ½ Load Time
21. Polaris:
Idea: construct a graph that captures the real read/write and write/write JS/CSS dependencies
Improvement: ~30%, depending on RTT and bandwidth
Limitation: cannot deal with non-determinism; requires the server to generate a dependency graph for each client view
Research Approaches
Two Examples
Netravali, Ravi, James Mickens, et al. "Polaris: Faster Page Loads Using Fine-grained Dependency Tracking," NSDI 2016.
22. Shandian:
Idea: the proxy is more powerful than the browser, especially on mobile → evaluate the page on the proxy
Improvement: ~50% for a slow Android device
Limitation: needs a modified browser; only useful for slow devices
Research Approaches
Two Examples
Wang, Xiao Sophia, Arvind Krishnamurthy, and David Wetherall. "Speeding up Web Page Loads with Shandian." NSDI 2016.
[Figure: page evaluation split between client and proxy]
23. Many good ideas in current research, but:
o Only applicable to very few use cases
o Mostly require modified browsers
o Small performance improvements
24. Performance: State of the Art
Summarized
Frontend: doable with the right set of best practices; good support through build tools.
Latency: caching and CDNs help, but at considerable effort and only for static content.
Backend: many frameworks and platforms, but horizontal scalability is very difficult.
25. Performance: State of the Art
Summarized
Good Resources:
https://developers.google.com/web/fundamentals/performance/?hl=en
https://www.udacity.com/course/website-performance-optimization--ud884
chimera.labs.oreilly.com/books/1230000000545
shop.oreilly.com/product/0636920033578.do
Good Tools:
https://developers.google.com/speed/pagespeed/
https://gtmetrix.com
http://www.webpagetest.org/
26. Performance: State of the Art
Summarized
How to cache & scale dynamic content?
31. In a nutshell
Solution: Proactively Revalidate Data
Cache Sketch (Bloom filter)
[Figure: the client asks the Bloom filter 1 0 1 1 0 0 1 0 1 1 "Is it still fresh?"; the server updates the filter]
32. Innovation
Solution: Proactively Revalidate Data
F. Gessert, F. Bücklers, and N. Ritter, "ORESTES: a Scalable Database-as-a-Service Architecture for Low Latency," in CloudDB 2014.
F. Gessert and F. Bücklers, "ORESTES: ein System für horizontal skalierbaren Zugriff auf Cloud-Datenbanken," in Informatiktage 2013.
F. Gessert, S. Friedrich, W. Wingerath, M. Schaarschmidt, and N. Ritter, "Towards a Scalable and Unified REST API for Cloud Data Stores," in 44. Jahrestagung der GI, vol. 232, pp. 723–734.
F. Gessert, M. Schaarschmidt, W. Wingerath, S. Friedrich, and N. Ritter, "The Cache Sketch: Revisiting Expiration-based Caching in the Age of Cloud Data Management," in BTW 2015.
F. Gessert and F. Bücklers, Performanz- und Reaktivitätssteigerung von OODBMS vermittels der Web-Caching-Hierarchie. Bachelorarbeit, 2010.
F. Gessert and F. Bücklers, Kohärentes Web-Caching von Datenbankobjekten im Cloud Computing. Masterarbeit, 2012.
W. Wingerath, S. Friedrich, and F. Gessert, "Who Watches the Watchmen? On the Lack of Validation in NoSQL Benchmarking," in BTW 2015.
M. Schaarschmidt, F. Gessert, and N. Ritter, "Towards Automated Polyglot Persistence," in BTW 2015.
S. Friedrich, W. Wingerath, F. Gessert, and N. Ritter, "NoSQL OLTP Benchmarking: A Survey," in 44. Jahrestagung der Gesellschaft für Informatik, 2014, vol. 232, pp. 693–704.
F. Gessert, "Skalierbare NoSQL- und Cloud-Datenbanken in Forschung und Praxis," BTW 2015.
33. Web Caching Concepts
Invalidation- and expiration-based caches
[Figure: the request path runs from the client through expiration-based caches (browser caches, forward proxies, ISP caches) and invalidation-based caches (content delivery networks, reverse proxies) to the server/DB; cache hits are served along the way]
Expiration-based Caches:
◦ An object x is considered fresh for TTL_x seconds
◦ The server assigns a TTL for each object
Invalidation-based Caches:
◦ Expose an object eviction operation to the server
34. Classic Web Caching: Example
A tiny image resizer
[Figure: an image is resized once, then cached and delivered many times to desktop, mobile, and tablet clients]
35. Bloom filter Concepts
Compact Probabilistic Sets
The "Bloom filter principle":
"Wherever a list or set is used, and space is at a premium, consider using a Bloom filter if the effect of false positives can be mitigated."
A. Broder and M. Mitzenmacher, "Network Applications of Bloom Filters: A Survey," Internet Mathematics, 2004.
Bit array of length m
k independent hash functions
insert(obj): add to set
contains(obj):
◦ Always returns true if the element was inserted
◦ Might return true even though it was not inserted (false positive)

def insert(obj):
    for position in hashes(obj):
        bits[position] = 1

def contains(obj):
    for position in hashes(obj):
        if bits[position] == 0:
            return False
    return True
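A self-contained, runnable version of this pseudocode (my sketch; it derives the k positions by double hashing from SHA-256 rather than using k truly independent hash functions):

import hashlib

class BloomFilter:
    def __init__(self, m, k):
        self.m, self.k = m, k          # m bits, k hash functions
        self.bits = bytearray(m)       # one byte per bit, for simplicity

    def _hashes(self, obj):
        # Double hashing: derive k positions from two base hashes.
        h = hashlib.sha256(str(obj).encode()).digest()
        h1 = int.from_bytes(h[:8], "big")
        h2 = int.from_bytes(h[8:16], "big")
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def insert(self, obj):
        for position in self._hashes(obj):
            self.bits[position] = 1

    def contains(self, obj):
        return all(self.bits[position] for position in self._hashes(obj))

bf = BloomFilter(m=1000, k=7)
bf.insert("/db/posts/42")
print(bf.contains("/db/posts/42"))   # True
print(bf.contains("/db/posts/99"))   # False (with high probability)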
36. Bloom filter Concepts
Visualized
[Figure: a Bloom filter of m bits, initially all 0. Inserting x sets the bits at positions h1(x), h2(x), h3(x) to 1; inserting y sets three more bits. Querying x checks whether all of h1(x), h2(x), h3(x) are 1 → contained]
37. Bloom filter Concepts
False Positives
[Figure: querying z finds all bits at h1(z), h2(z), h3(z) set to 1 although z was never inserted → a false positive for z]
The false-positive rate depends on the bits m and the inserted elements n; with an optimal number of hash functions it is approximately
f ≈ (1 − e^(−kn/m))^k ≈ 0.6185^(m/n)
For f = 1% the required bits per element are 2.081 · ln(1/0.01) ≈ 9.6 (the constant 2.081 is 1/(ln 2)²).
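These sizing formulas are easy to check numerically; a small sketch (mine, not from the talk):

import math

def bloom_parameters(n, f):
    """Bits m and hash count k for n elements at false-positive rate f."""
    m = math.ceil(-n * math.log(f) / math.log(2) ** 2)
    k = round((m / n) * math.log(2))
    return m, k

m, k = bloom_parameters(n=10_000, f=0.01)
print(m / 10_000)   # ~9.6 bits per element, matching the figure above
print(k)            # 7 hash functions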
39. Our Bloom filters
Example: Redis-backed Counting Bloom Filter
Redis-backed Bloom filters:
◦ Can be shared by many servers
◦ Highly efficient through Redis' bitwise operations
◦ Tunable persistence
Counting Bloom Filters: use counters instead of bits to also allow removals
◦ The materialized Bloom filter is stored alongside the counters for fast retrieval
COUNTS: 0 2 0 0 1 0 3 0 1 1
BITS:   0 1 0 0 1 0 1 0 1 1
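A minimal sketch of such a Redis-backed counting filter with redis-py (my illustration; the key names, filter size, and the hash-field counter layout are assumptions, not the production design):

import hashlib
import redis

r = redis.Redis()
M, K = 8192, 7  # assumed filter size and hash count

def positions(key):
    h = hashlib.sha256(key.encode()).digest()
    h1, h2 = int.from_bytes(h[:8], "big"), int.from_bytes(h[8:16], "big")
    return [(h1 + i * h2) % M for i in range(K)]

def add(key):
    pipe = r.pipeline()
    for pos in positions(key):
        pipe.hincrby("cbf:counts", pos, 1)   # counting layer
        pipe.setbit("cbf:bits", pos, 1)      # materialized Bloom filter
    pipe.execute()

def remove(key):
    pipe = r.pipeline()
    for pos in positions(key):
        pipe.hincrby("cbf:counts", pos, -1)
    counts = pipe.execute()
    # Clear a bit only when its counter drops to zero.
    pipe = r.pipeline()
    for pos, c in zip(positions(key), counts):
        if c <= 0:
            pipe.setbit("cbf:bits", pos, 0)
    pipe.execute()

def materialized():
    return r.get("cbf:bits")  # raw bits, ready to ship to clients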
40. The Cache Sketch approach
Caching Dynamic Data
Idea: use standard HTTP caching for query results and records
Problems:
◦ How to keep the browser cache up-to-date?
◦ How to automatically cache dynamic data in a CDN?
◦ When is data cacheable, and for how long approximately?
46. The Cache Sketch approach
Letting the client handle cache coherence
[Figure: the request path runs from the client through expiration-based caches (browser caches, forward proxies, ISP caches) and invalidation-based caches (content delivery networks, reverse proxies) to the server/DB; cache hits are served along the way]
Clients fetch the Cache Sketch (1) at connect, (2) periodically every Δ seconds, and (3) at transaction begin.
Client Cache Sketch: a Bloom filter answering "Needs revalidation?" (staleness minimization).
Server Cache Sketch: a Counting Bloom Filter over the non-expired record keys, maintained from reported expirations and writes, answering "Needs invalidation?" (invalidation minimization).
47. The End-to-End Path of Requests
The Caching Hierarchy
[Figure, built up over slides 47–51: Client (browser) cache → proxy caches → ISP caches → CDN caches → reverse-proxy cache → Orestes; a hit in any cache is returned immediately, a miss is forwarded to the next tier]
The client calls DB.posts.get(id) (JavaScript)…
48. …which is translated to GET /db/posts/{id} (HTTP).
49. Each cache on the path: on a cache hit, return the object; on a cache miss or a revalidation, forward the request.
50. On a full miss, the record is returned from the DB with a caching TTL.
51. The expiration-based caches are updated by the Cache Sketch; the invalidation-based caches are updated by the server.
52. The Client Cache Sketch
Let c_t be the client Cache Sketch generated at time t, containing the key key_x of every record x that was written before it expired in all caches, i.e. every x for which:
∃ r(x, t_r, TTL), w(x, t_w): t_r + TTL > t > t_w > t_r
(k hash functions, m Bloom filter bits)
[Figure: find(key) hashes the key with h1 … hk and checks the filter. If not all bits are 1, a plain GET request is sent and answered as a cache hit; if all bits are 1, the GET is sent as a revalidation, i.e. a deliberate cache miss]
53. JavaScript Bloom filter: ~100 LOCs, ~1M lookups per second
54. Guarantee: data is never stale for more than the age of the Cache Sketch
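In client code this decision boils down to a few lines; a sketch (mine; the URL scheme and the sketch API are assumed from the slides):

import requests

def read(record_id, sketch, base_url="https://example.baqend.com"):
    """Read a record, using the Cache Sketch to decide between a normal
    (cache-served) GET and a revalidation that bypasses stale copies."""
    url = f"{base_url}/db/posts/{record_id}"
    if sketch.contains(url):
        # Possibly stale in some cache: force end-to-end revalidation.
        headers = {"Cache-Control": "max-age=0"}
    else:
        # Guaranteed fresh: any cache on the path may answer.
        headers = {}
    return requests.get(url, headers=headers).json()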
55. The Server Cache Sketch
Scalable Implementation
Performance > 200k ops per second:
◦ Add key_x if x is unexpired and a write occurred
◦ Remove x from the Bloom filter when it expires
◦ Load the Bloom filter
56. Faster Page Loads (use case 1)
[Figure, built up over slides 56–65: browser cache and CDN sit in front of the server. The server maintains a Counting Bloom Filter (e.g. counters 1 4 0 2 0); flattened to the bit vector 1 1 1 1 0, it becomes the Cache Sketch the clients consult. A record key oid is looked up via hashA(oid) and hashB(oid)]
Clients load the Cache Sketch at connection. Every non-stale cached record can then be reused without degraded consistency.
65. False-Positive Rate: f ≈ (1 − e^(−kn/m))^k
Hash Functions: k = ln 2 · (m/n)
With 20,000 distinct updates and a 5% error rate: 11 KByte.
66. Faster CRUD Performance (use case 2)
Solution: Δ-Bounded Staleness
◦ Clients refresh the Cache Sketch so its age never exceeds Δ
→ Consistency guarantee: Δ-atomicity
[Figure: at time t the client queries the Cache Sketch c_t; fresh records are served as cache hits from the expiration- and invalidation-based caches. By time t + Δ the client revalidates stale records and refreshes the sketch, receiving the fresh record and a new Cache Sketch from the server]
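A client-side sketch of the Δ-bound (my illustration; the endpoint name is assumed):

import time
import requests

DELTA = 60  # staleness bound in seconds

class CacheSketchClient:
    def __init__(self, base_url="https://example.baqend.com"):
        self.base_url = base_url
        self.sketch, self.fetched_at = None, float("-inf")

    def refresh(self):
        # Hypothetical endpoint returning the serialized Bloom filter.
        resp = requests.get(f"{self.base_url}/cache-sketch")
        self.sketch, self.fetched_at = resp.content, time.monotonic()

    def ensure_fresh(self):
        # Guarantee: the sketch (and thus any read) is never older than Δ.
        if time.monotonic() - self.fetched_at > DELTA:
            self.refresh()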
67. Scalable ACID Transactions (use case 3)
Solution: Conflict-Avoidant Optimistic Transactions
◦ Cache Sketch fetched at transaction begin
◦ Cached reads → shorter transaction duration → fewer aborts
[Figure: (1) the client begins the transaction at the coordinator and receives the Bloom filter; (2) reads are served by caches or the REST servers; (3) at commit the client sends its readset versions and writeset; (4) the coordinator validates against the DB, preventing conflicting validations; (5) writes become public; the client is told "committed" or "aborted + stale objects"]
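The client's side of this protocol, sketched (mine; the endpoint names and payload shapes are assumptions, not the Orestes API):

import requests

BASE = "https://example.baqend.com"  # assumed

def run_transaction(read_ids, writes):
    # (1) Begin: the coordinator hands out the current Bloom filter,
    #     assumed here to arrive as a list of possibly-stale keys.
    tx = requests.post(f"{BASE}/transaction").json()
    sketch, tx_id = tx["bloomFilter"], tx["id"]

    # (2) Reads may be served by caches; track the versions we saw.
    readset = {}
    for rid in read_ids:
        headers = {"Cache-Control": "max-age=0"} if rid in sketch else {}
        obj = requests.get(f"{BASE}/db/posts/{rid}", headers=headers).json()
        readset[rid] = obj["version"]

    # (3) Commit: send readset versions and writeset for validation.
    result = requests.post(f"{BASE}/transaction/{tx_id}/commit",
                           json={"readset": readset, "writes": writes}).json()
    # (4)+(5) The coordinator validated and published the writes, or
    # aborted and returned the stale objects for a retry.
    return result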
68. Scalable ACID Transactions (use case 3)
Novelty: ACID transactions on sharded DBs like MongoDB
Current work: DESY and dCache are building a scalable namespace for their file system on this
[Figure: transaction latency with caching vs. without caching]
69. TTL Estimation
Determining the best TTL and cacheability
Problem: if TTL ≫ time to next write, the record stays in the Cache Sketch unnecessarily long
TTL Estimator: finds the "best" TTL
Trade-off:
◦ Longer TTLs: higher cache-hit rates, but more invalidations
◦ Shorter TTLs: fewer invalidations and fewer stale reads
70. TTL Estimation
Determining the best TTL
Idea:
1. Estimate the average time to the next write E[T_w] for each record
2. Weight E[T_w] using the cache miss rate
[Figure: clients send reads and writes to the server; caches report misses. Writes and misses are modeled as Poisson processes with miss rate λ_m and write rate λ_w, collected per record and fed into the TTL Estimator]
TTL Estimator objective: maximize cache hits, minimize purges, minimize stale reads, and bound the Cache Sketch's false-positive rate.
71. Good TTLs → small Bloom filter; TTL < TTL_min → no caching of write-heavy objects.
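One plausible reading of this estimator as code (my sketch; the exact weighting used by Orestes is not given on the slide):

import statistics

def estimate_ttl(write_times, miss_count, read_count,
                 ttl_min=10, ttl_max=86_400):
    """Estimate a record's TTL from observed writes and cache misses.

    write_times: timestamps of past writes (Poisson assumption)
    miss_count/read_count: per-record cache statistics
    """
    if len(write_times) < 2:
        return ttl_max  # rarely written: cache as long as possible
    # E[T_w]: mean inter-write time, i.e. 1/λ_w under Poisson arrivals.
    gaps = [b - a for a, b in zip(write_times, write_times[1:])]
    expected_time_to_write = statistics.mean(gaps)
    # Weight by the miss rate: records that miss often profit from longer
    # TTLs; rarely missed records need not occupy the Cache Sketch.
    miss_rate = miss_count / max(read_count, 1)
    ttl = expected_time_to_write * miss_rate
    # TTL < TTL_min means: do not cache write-heavy objects at all.
    if ttl < ttl_min:
        return 0
    return min(ttl, ttl_max)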
72. End-to-End Example
[Message sequence between Browser, Browser Cache, CDN, and Cache Server; b is the Bloom filter, c the cache contents, t the server's TTL table]
Initially the server holds b = {x2} and t = {(x2, t2), (x3, t3), (x1, t1)}; the browser cache holds c = {(x2, t2), (x3, t3)} and the CDN holds c = {(x1, t1)}.
CONNECT: the browser initializes its client sketch with b_t0 = {x2}.
READ x3: QUERY x3 against the sketch → RESPONSE false → plain GET x3, answered from the browser cache (RESPONSE x3).
READ x2: QUERY x2 → RESPONSE true → REVALIDATE x2 past the caches. The server responds with x2 and a new TTL t4; the caches store (x2, t4), and the read is reported (REPORT READ x2, t4 → RESPONSE inv=true), so the server's table becomes t = {(x2, t4), (x3, t3), (x1, t1)}.
WRITE x1: PUT x1=v → RESPONSE ok. The server records the write (REPORT WRITE x1), adds x1 to the Bloom filter (b = {x1, x2}), and sends INVALIDATE x1 to the invalidation-based caches.
73. Consistency
What are the guarantees?
Always:
◦ Δ-atomicity (staleness never exceeds Δ seconds) — controlled by the age of the Cache Sketch
◦ Monotonic Writes — guaranteed by the database
◦ Read-Your-Writes and Monotonic Reads — cache written data and most recent versions
Opt-in:
◦ Causal Consistency — if the timestamp is older than the Cache Sketch, the cached version is served; else revalidate
◦ Strong Consistency (Linearizability) — explicit revalidation (cache miss)
75. Varnish and Fastly
What we do on the edge
◦ Cache all GET requests
◦ Authorize the user on protected resources
◦ Validate & renew session tokens of users
◦ Reject rate-limited users
◦ Handle CORS pre-flight requests (Access-Control-*)
◦ Collect access logs & report failures
76. The Cache Sketch
Summary
Static Data: immutability is ideal for static web caching (max-age=31557600)
Mutable Objects: Cache Sketch for browser cache, proxies, and ISP caches; invalidations for CDNs and reverse proxies

{
  "id": "/db/Todo/b5d9bef9-6c1f-46a5-…",
  "version": 1,
  "acl": null,
  "listId": "7b92c069-…",
  "name": "Test",
  "activities": [],
  "active": true,
  "done": false
}

Queries/Aggregates: SELECT TOP 4 WHERE tag='x' — how to do this?
77. Continuous Query Matching
Generalizing the Cache Sketch to query results
Main challenge: when to invalidate?
◦ Objects: on every update and delete
◦ Queries: as soon as the query result changes
How to detect query result changes in real time?
78. Query Caching
Example
Query predicate P: SELECT * FROM posts WHERE tags CONTAINS 'b'
Cached query result Q: obj1 ∈ Q, obj2 ∈ Q
An Add (an object enters Q), a Change (an object in Q is updated), and a Remove (an object leaves Q) all entail an invalidation and an addition to the Cache Sketch.
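As a sketch, real-time invalidation detection amounts to matching every write against the predicates of the currently cached queries (my illustration; the object and callback shapes are assumed):

cached_queries = {
    "q1": lambda post: "b" in post.get("tags", []),  # tags CONTAINS 'b'
}

def on_write(before, after, invalidate):
    """Match a write against all cached query predicates.

    before/after: object state before and after the write (None = absent).
    invalidate(qid): purge the query result and add it to the Cache Sketch.
    """
    for qid, predicate in cached_queries.items():
        was_in = before is not None and predicate(before)
        is_in = after is not None and predicate(after)
        # Add (enters the result), Remove (leaves it), and Change (stays,
        # but with new values) all change the result -> invalidate.
        if was_in != is_in or (was_in and before != after):
            invalidate(qid)

# Example: adding tag 'b' to a post moves it into q1's result.
on_write({"tags": ["a"]}, {"tags": ["a", "b"]},
         invalidate=lambda qid: print("invalidate", qid))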
81. Query Matching Performance
Latency of detecting invalidations
Latency is mostly < 15 ms and scales linearly with the number of servers and the number of tables.
82. Learning Representations
Determining Optimal TTLs and Cacheability
Setting: query results can be represented either as references (id-lists) or as full results (object-lists)
Approach: a cost-based decision model that weighs expected round-trips against expected invalidations
Ongoing research: reinforcement learning of these decisions
Id-Lists: [id1, id2, id3] → fewer invalidations
Object-Lists: [{id: 1, val: 'a'}, {id: 2, val: 'b'}, {id: 3, val: 'c'}] → fewer round-trips
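The cost-based decision could look roughly like this (my sketch; the cost weights are illustrative assumptions, not the model from the talk):

def choose_representation(expected_misses_per_s, write_rate_per_s,
                          result_size, rtt_cost=1.0, invalidation_cost=0.2):
    """Weigh expected round-trips (id-lists) against expected
    invalidations (object-lists) and pick the cheaper representation."""
    # Id-list: each miss needs extra round-trips to fetch the objects.
    id_list_cost = expected_misses_per_s * result_size * rtt_cost
    # Object-list: every write to a contained object invalidates the result.
    object_list_cost = write_rate_per_s * invalidation_cost
    return "id-list" if id_list_cost < object_list_cost else "object-list"

print(choose_representation(expected_misses_per_s=0.5,
                            write_rate_per_s=20, result_size=4))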
90. Team: Felix Gessert, Florian Bücklers, Hannes Kuhlmann, Malte Lauenroth, Michael Schaarschmidt
19 August 2014
Orestes Caching Technology as a Backend-as-a-Service
91. Page-Load Times
What impact does caching have in practice?
[Figure: measured page-load times from four locations; per the speaker notes below, the first value is Baqend, followed by four competing Backend-as-a-Service products]
California: 0.7s vs. 1.8s / 2.8s / 3.6s / 3.4s
Frankfurt: 0.5s vs. 1.8s / 2.9s / 1.5s / 1.3s
Sydney: 0.6s vs. 3.0s / 7.2s / 5.0s / 5.7s
Tokyo: 0.5s vs. 2.4s / 4.0s / 5.7s / 4.7s
Speaker notes:
-Surprised
-Actually this should have loaded by now; it doesn't normally take this long
-That was 9.3 s – the load time of an average e-commerce website
-Even the smallest load-time delays have an immense impact on user satisfaction and business success
-Countless studies have quantified these effects; here are some of the best-known ones
-Amazon: 800 million in lost revenue
-Yahoo: because users become less satisfied
-Google: 500 ms through more attractive content, and still a drop in revenue and page views
-Aberdeen Group, a market research institute: i.e. 7% fewer visitors become customers
-Beyond these abstract numbers, we spoke with companies such as GoodGame Studios and WLW
-These and other studies prove that slow apps and websites hurt all key metrics
-So the question is: if the problem has been known for so long and so precisely, why is it still technically unsolved?
-Here we see an end user from the USA accessing a European application
-Poor load times arise from two technical bottlenecks:
1. When a user accesses a web application, a web server – the so-called "backend" – must first generate the page. Assembling the content from a database incurs a high processing time.
2. The page and all resources it contains (e.g. images) must then be transferred over the internet to the user. This causes a transfer time called "latency", which grows with the user's distance from the backend. This latency, not bandwidth, determines perceived load times.
-Our product Baqend solves both of these bottlenecks for arbitrary applications
-With our approach, all data is cached not only in the backend but also in globally distributed caches, positioned as close to end users as possible. When a user opens the web application, latency is therefore low regardless of location. Thanks to these caches, no processing effort arises in the backend. Our algorithms make these existing and plentiful caches usable for all application data for the first time.
-Baqend's technical and scientific basis rests on several innovations in databases and cloud computing. The new algorithms and architectures emerged from 4.5 years of research and development at the Information Systems group of the University of Hamburg. The core innovation is a technique that allows the existing caching infrastructure to be used in an entirely new way. With classic caching, requested data is automatically stored in caches; a problem arises when data is changed, because afterwards the cache entries are stale and users receive outdated data.
-There are two types of caches: those a backend can update, and those outside any control
-Our solution uses a special data structure that efficiently determines which data is stale; additionally, on a change, data in controllable caches is updated automatically. The web application then automatically uses the Bloom filter to reload stale data, so the caches refresh themselves. All users are thus guaranteed to always receive current data.
-The details of these techniques are published in scientific papers
-Our years of groundwork give us a lead over competitors that is hard to catch up with
What changes client-side, i.e. for application developers?
How does a client-side object request differ between the "fresh" and "stale" cases?
Key message (left): with rising cache-hit rates, page-load times drop drastically, even at higher Bloom filter false-positive rates. Assuming the average cache-hit rate of Facebook photos (static data), Orestes achieves a 2.5× load-time improvement (dynamic data).
Key message (right): the Yahoo Cloud Serving Benchmark shows that significant latency improvements are possible even for relatively write-heavy workloads; for read-heavy workloads the effect is even clearer.
Should this slide really be titled "Architecture"? Maybe "Principle", "Basic Idea", or "Basic Principle" instead…
Why pub-sub twice?
Why do the arrows point in these directions?
Can "WebSockets" be left out?
What is a continuous query?
What is the basic principle of maintaining queries through the Cache Sketch?
What exactly does "query" mean here (statement text, query ID, query result set, …)?
How are queries represented in the Cache Sketch? Only through their result objects, or does the query need entries of its own? How are "query state updates" represented?
Why Redis on the distribution layer?
Is matching pure predicate evaluation?
What exactly is meant by the terms to the right of "Distribution Layer"?
When a query is sent initially, does it have to pass through the complete loop?
…
Key message: the streaming layer has relatively constant latencies up to a throughput of 2 million matches per server per second. At 40 updates/s that suffices for 100K simultaneously cached or continuous queries.
Discuss?
Walk through everything once more?
The queries (and the polyglot views) are missing here
-Introductions: Felix, Baqend GmbH, everyone else
-To demonstrate the load-time advantages in practice, we built a news site as an example application. We implemented and evaluated it on Baqend and the four most important competing products, measuring load times for users from four geographic locations. Thanks to our new techniques, Baqend is consistently at least 2.5× faster across all locations. Even when user and provider are in the same region, the processing time Baqend saves yields a significant speed advantage.