Fast data ingest – collecting, storing, and processing streaming data at high volume and velocity – is a common requirement and use case. Production use requires resilience to data loss and high enough performance to prevent a massive data backlog. See how Redis enables you to address these needs with very little effort.
Xiaomi is a Chinese technology company that sold more than 100 million smartphones worldwide in 2018 and owns one of the world's largest IoT device platforms. Xiaomi builds dozens of mobile apps and Internet services based on intelligent devices, including ads, news feeds, financial services, games, music, video, personal cloud services, and more. The rapid growth of the business has driven exponential growth of the data analytics infrastructure: the amount of data has grown more than 20-fold in the past 3 years, which poses big challenges for HDFS scalability.
In this talk, we introduce how we scale HDFS to support hundreds of petabytes of data across thousands of nodes:
1. How Xiaomi uses Hadoop and the characteristics of our usage
2. How we made an HDFS federation cluster usable as if it were a single cluster: most applications need no code changes to migrate from a single cluster to a federation cluster. Our work includes a wrapper FileSystem compatible with DistributedFileSystem, support for rename across namespaces, and a ZooKeeper-based mount table renewer
3. Our experience tuning the NameNode to improve scalability
4. How we maintain hundreds of HDFS clusters, and the client-side optimizations that let users and programs access them easily and with high performance
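The wrapper FileSystem described above hinges on mount-table resolution: mapping each client path to the namespace that owns it. A minimal, hypothetical Python sketch of longest-prefix mount-table routing (illustrative only, not Xiaomi's actual code; the namespaces and paths are made up):

```python
def resolve(path, mount_table):
    """Map a client path to (namespace, remaining path) by longest-prefix
    match, mimicking how a ViewFs-style mount table routes requests."""
    best = None
    for prefix in mount_table:
        if path == prefix or path.startswith(prefix.rstrip("/") + "/"):
            if best is None or len(prefix) > len(best):
                best = prefix
    if best is None:
        raise ValueError(f"no mount point for {path}")
    suffix = path[len(best.rstrip("/")):] or "/"
    return mount_table[best], suffix

mounts = {
    "/user": "hdfs://ns1",
    "/user/logs": "hdfs://ns2",
    "/data": "hdfs://ns3",
}
print(resolve("/user/logs/2019/app.log", mounts))  # ('hdfs://ns2', '/2019/app.log')
```

The longest-prefix rule is what lets nested mount points (here `/user/logs` inside `/user`) route to different namespaces.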
High-Volume Data Collection and Real-Time Analytics Using Redis (cacois)
In this talk, we describe using Redis, an open source, in-memory key-value store, to capture large volumes of data from numerous remote sources while also allowing real-time monitoring and analytics. With this approach, we were able to capture a high volume of continuous data from numerous remote environmental sensors while consistently querying our database for real time monitoring and analytics.
* See more of my work at http://www.codehenge.net
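In Redis, this capture-and-query pattern is commonly built on sorted sets scored by timestamp (ZADD to insert, ZREMRANGEBYSCORE to expire old readings). The sketch below emulates that sliding-window logic in plain Python, with no Redis server assumed, purely to show the idea:

```python
import bisect

class SlidingWindow:
    """Timestamp-scored event window, mirroring the Redis sorted-set pattern
    (ZADD to insert, ZREMRANGEBYSCORE to expire, ZCARD/ZCOUNT to query)."""
    def __init__(self, window_seconds):
        self.window = window_seconds
        self.events = []  # sorted list of (timestamp, reading)

    def add(self, ts, reading):
        bisect.insort(self.events, (ts, reading))
        cutoff = ts - self.window
        # expire everything older than the window
        while self.events and self.events[0][0] < cutoff:
            self.events.pop(0)

    def count(self):
        return len(self.events)

w = SlidingWindow(window_seconds=60)
for t in range(0, 120, 10):            # one sensor reading every 10 seconds
    w.add(t, reading=20.0 + t * 0.01)
print(w.count())  # 7 readings fall inside the final 60-second window
```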
Lightning Talk: What You Need to Know Before You Shard in 20 Minutes (MongoDB)
Curious about the benefits of sharding your MongoDB deployments? Do you need help deciding when you should shard, or which collections to shard first? Or maybe you just need some guidance on finding the right shard key. This session will cover these topics and give you a primer on MongoDB sharding and why it makes the database so compelling for so many applications. This is an entry-level to medium-level talk with references and links to more advanced material on sharding MongoDB.
Running Analytics at the Speed of Your Business (Redis Labs)
The speed at which you can extract insights from your data is increasingly a competitive edge for your business. Analytics has to run at lightning speed to seriously impact your user acquisition.
Join this webinar featuring Forrester analyst Noel Yuhanna and Leena Joshi, VP Product Marketing at Redis Labs to learn how you can glean insights faster with new open source data processing frameworks like Spark and Redis.
In this webinar you will learn:
* Why analytics has to run at the real time speed of business
* How this can be achieved with next generation Big Data tools
* How data structures can optimize your hybrid transaction-analytics processing scenarios
At ProtectWise we've successfully deployed DSE Search/Solr as our platform for indexing and searching billions of time-series events a day. However, keeping petabytes of data around for months and years on expensive servers, largely untouched, can quickly become a cost burden. We'll cover the building of a custom "cold storage" solution using Cassandra, Solr, Spark, Parquet, and S3. This solution employs probabilistic data structures, customizations to Parquet, and a specialized streaming S3 client to allow our customers to securely search petabytes of data in seconds without breaking the bank.
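One common way probabilistic data structures make cold-storage search cheap is a per-partition Bloom filter, consulted before fetching anything from S3: a Bloom filter never misses a true match, so partitions it rules out can be skipped safely. A minimal Python sketch of that idea (illustrative only, not ProtectWise's implementation; the partition names are made up):

```python
import hashlib

class BloomFilter:
    """Tiny Bloom filter: no false negatives, tunable false-positive rate."""
    def __init__(self, nbits=1 << 16, nhashes=4):
        self.nbits, self.nhashes = nbits, nhashes
        self.bits = bytearray(nbits // 8)

    def _positions(self, item):
        for i in range(self.nhashes):
            h = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(h[:8], "big") % self.nbits

    def add(self, item):
        for p in self._positions(item):
            self.bits[p // 8] |= 1 << (p % 8)

    def might_contain(self, item):
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(item))

# One filter per archived partition: only fetch partitions that might match.
partitions = {}
for name, ips in [("2016-01", ["10.0.0.1", "10.0.0.2"]), ("2016-02", ["10.0.9.9"])]:
    bf = BloomFilter()
    for ip in ips:
        bf.add(ip)
    partitions[name] = bf

to_fetch = [name for name, bf in partitions.items() if bf.might_contain("10.0.9.9")]
print(to_fetch)  # the matching partition is always included; others almost never
```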
About the Speaker
Joshua Hollander, Principal Engineer, ProtectWise
Josh is a Principal Engineer at ProtectWise. He is a functional programming devotee, stream whisperer, and big data wrangler, and sees "monoids everywhere". He holds a master's degree in Computer Science from the University of Colorado.
Learn about the various approaches to sharding your data with MongoDB. This presentation will help you answer questions such as when to shard and how to choose a shard key.
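A core part of choosing a shard key is understanding how ranged versus hashed keys distribute monotonically increasing values such as timestamps or sequential ids. A toy Python illustration of the contrast (the chunk count and range split here are invented for demonstration; MongoDB manages chunks itself):

```python
import hashlib

def chunk_for(value, nchunks, hashed):
    """Assign a document to a chunk: a ranged key keeps insertion order
    together, a hashed key (like MongoDB's hashed shard keys) scatters it."""
    if hashed:
        h = int(hashlib.md5(str(value).encode()).hexdigest(), 16)
        return h % nchunks
    # naive range split of the key space 0..9999 into equal chunks
    return min(value * nchunks // 10000, nchunks - 1)

def distribution(hashed):
    counts = [0] * 4
    for doc_id in range(1000):   # monotonically increasing ids
        counts[chunk_for(doc_id, 4, hashed)] += 1
    return counts

print(distribution(hashed=False))  # all 1000 inserts land on the first chunk
print(distribution(hashed=True))   # spread roughly evenly across all chunks
```

This is why a monotonically increasing ranged shard key creates a "hot" shard on insert-heavy workloads, while a hashed key trades away efficient range queries for even write distribution.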
This will be a reprise of my popular talk from last year, updated and expanded for Cassandra 3.0. I'll discuss the general approach to troubleshooting Cassandra, then give a guided tour of what to look for in nodetool, logs, and OpsCenter, highlighting the most useful topics for troubleshooting real-world Cassandra issues.
About the Speaker
J.b. Langston Principal Support Engineer, DataStax
I've been with DataStax support for over 4 years, helping customers troubleshoot problems and keep their Cassandra clusters running smoothly. Prior to that, I had 8 years of experience supporting a Java-based grid computing platform.
Architecting InfluxEnterprise for Success (InfluxData)
In this session, you will learn how to architect your own InfluxEnterprise clusters to be performant and resilient, whether in a single data center or spread across multiple data centers.
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark (DataWorks Summit)
Whole-genome shotgun based next-generation transcriptomics and metagenomics studies often generate 100 to 1000 gigabytes (GB) of sequence data derived from tens of thousands of different genes or microbial species. De novo assembly of these data requires a solution that both scales with data size and optimizes for individual genes or genomes. Here we developed an Apache Spark-based scalable sequence clustering application, SparkReadClust (SpaRC), that partitions reads based on their molecule of origin to enable downstream assembly optimization. SpaRC produces high clustering performance on transcriptomics and metagenomics test datasets from both short-read and long-read sequencing technologies. It achieved near-linear scalability with respect to input data size and number of compute nodes. SpaRC can run on different cloud computing environments without modification while delivering similar performance. In summary, our results suggest SpaRC provides a scalable solution for clustering billions of reads from next-generation sequencing experiments, and Apache Spark represents a cost-effective solution with rapid development/deployment cycles for similar big data genomics problems.
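Partitioning reads by molecule of origin can be pictured as union-find over reads that share k-mers (short subsequences): reads with overlapping sequence end up in the same cluster. The toy Python sketch below shows that general idea only; it is not SpaRC's actual algorithm, and the reads and k-mer length are invented:

```python
def kmers(seq, k=4):
    """All length-k substrings of a read."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def cluster_reads(reads, k=4):
    """Union-find over reads: any two reads sharing a k-mer join one cluster."""
    parent = list(range(len(reads)))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    owner = {}  # k-mer -> first read seen containing it
    for i, read in enumerate(reads):
        for km in kmers(read, k):
            if km in owner:
                parent[find(i)] = find(owner[km])
            else:
                owner[km] = i
    return [find(i) for i in range(len(reads))]

reads = ["ACGTACGT", "TACGTTTT", "GGGGCCCC", "CCCCAAAA"]
labels = cluster_reads(reads)
print(len(set(labels)))  # 2 clusters: reads 0-1 overlap, reads 2-3 overlap
```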
Red Hat's own Sr. Cloud Storage Solutions Architect Narendra Narang took the podium at Red Hat Storage Day New York 1/19/16 to highlight emerging use cases for Red Hat's software-defined-storage products.
Unified Batch & Stream Processing with Apache Samza (DataWorks Summit)
The traditional lambda architecture has been a popular solution for joining offline batch operations with real time operations. This setup incurs a lot of developer and operational overhead since it involves maintaining code that produces the same result in two, potentially different distributed systems. In order to alleviate these problems, we need a unified framework for processing and building data pipelines across batch and stream data sources.
Based on our experience running and developing Apache Samza at LinkedIn, we have enhanced the framework to support: a) pluggable data sources and sinks; b) a deployment model supporting different execution environments such as YARN or VMs; c) a unified processing API for developers to work seamlessly with batch and stream data. In this talk, we will cover how these design choices in Apache Samza help tackle the overhead of the lambda architecture. We will use some real production use cases to elaborate how LinkedIn leverages Apache Samza to build unified data processing pipelines.
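The unified-processing idea can be pictured as one processing function applied to both a bounded batch source and a stream-like source, both consumed as iterators. A toy Python sketch of the concept (this is not Samza's API; the sources and messages are made up):

```python
def word_counts(source):
    """Identical processing logic regardless of source: batch file lines
    or a live stream of messages, both consumed as plain iterators."""
    counts = {}
    for message in source:
        for word in message.split():
            counts[word] = counts.get(word, 0) + 1
    return counts

batch_source = ["page view home", "page view cart"]   # e.g. files on HDFS

def stream_source():                                   # e.g. a Kafka topic
    for msg in ["page click home", "page view home"]:
        yield msg

print(word_counts(batch_source)["page"])     # 2
print(word_counts(stream_source())["home"])  # 2
```

The point of a unified API is exactly this: one code path, so batch and stream results cannot drift apart the way they can in a two-system lambda architecture.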
Speaker
Navina Ramesh, Sr. Software Engineer, LinkedIn
Day 4 - Big Data on AWS - RedShift, EMR & the Internet of Things (Amazon Web Services)
Big Data is everywhere these days. But what is it and how can you use it to fuel your business? Data is as important to organizations as labour and capital, and if organizations can effectively capture, analyze, visualize and apply big data insights to their business goals, they can differentiate themselves from their competitors and outperform them in terms of operational efficiency and the bottom line.
Join this session to understand the different AWS Big Data and Analytics services such as Amazon Elastic MapReduce (Hadoop), Amazon Redshift (Data Warehouse) and Amazon Kinesis (Streaming), when to use them and how they work together.
Reasons to attend:
- Learn how AWS can help you process and make better use of your data with meaningful insights.
- Learn about Amazon Elastic MapReduce and Amazon Redshift, fully managed petabyte-scale data warehouse solutions.
- Learn about real time data processing with Amazon Kinesis.
MongoDB has taken a clear lead in adoption among the new generation of databases, including the enormous variety of NoSQL offerings. A key reason for this lead has been a unique combination of agility and scalability. Agility provides business units with a quick start and flexibility to maintain development velocity, despite changing data and requirements. Scalability maintains that flexibility while providing fast, interactive performance as data volume and usage increase. We'll address the key organizational, operational, and engineering considerations to ensure that agility and scalability stay aligned at increasing scale, from small development instances to web-scale applications. We will also survey some key examples of highly-scaled customer applications of MongoDB.
AWS Partner Webcast - Analyze Big Data for Consumer Applications with Looker ... (Amazon Web Services)
Analyze Big Data for Consumer Applications with Looker BI and Amazon Redshift. Customizing the customer experience based on user behavior is a constant challenge for today’s consumer apps. Business intelligence helps analyze and model large amounts of data. Looker offers a modern approach to BI leveraging AWS that’s fast, agile, and easy to manage. Join this webinar to learn how MessageMe, which provides emotionally engaging messaging apps to consumers, leverages Looker business intelligence software and the Amazon Redshift data warehouse service to analyze billions of rows of customer data in seconds.
Webinar topics include:
• How MessageMe turns billions of rows of customer data stored in Amazon Redshift into actionable insights
• How Looker connects directly to Amazon Redshift in just a few clicks, enabling MessageMe to build modern big data analytics in the cloud.
Who should attend:
• Information or Solution Architects, Data Analysts, BI Directors, DBAs, Development Leads, Developers, or Technical IT Leaders.
Presenters:
• Justin Rosenthal, CTO, MessageMe
• Keenan Rice, VP, Marketing & Alliances, Looker
• Tina Adams, Senior Product Manager, AWS
ScyllaDB V Developer Deep Dive Series: Resiliency and Strong Consistency via ... (ScyllaDB)
ScyllaDB’s implementation of the Raft consensus protocol translates to strong, immediately consistent schema updates, topology changes, tables and indexes, and more. This eliminates schema and data conflicts, enables rapid and safe increases in cluster capacity, and provides a leap forward in manageability. Join this webinar to learn how the Raft consensus algorithm has been implemented, what you can do with it today, and what radical new capabilities it will enable in the days ahead.
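At the heart of Raft is a simple commit rule: a log entry is committed once a majority of replicas hold it. That rule can be sketched in a few lines (a conceptual illustration of the algorithm, not ScyllaDB's code):

```python
def commit_index(match_index, cluster_size):
    """Raft's commit rule: the highest log index replicated on a majority.
    match_index[i] is the highest entry known to be on replica i."""
    quorum = cluster_size // 2 + 1
    # Sort descending; the quorum-th best replica bounds what a majority holds.
    return sorted(match_index, reverse=True)[quorum - 1]

# 5-node cluster: the leader has entry 7, followers lag at various points.
print(commit_index([7, 7, 6, 4, 2], cluster_size=5))  # 6 is on 3 of 5 nodes
```

Because any two majorities overlap, a committed entry survives the failure of any minority of nodes, which is what makes Raft-backed schema and topology changes safe.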
Building Cloud-Native App Series - Part 4 of 11
Microservices Architecture Series
NoSQL vs SQL
Redis, MongoDB, AWS DynamoDB
Big Data Design Patterns
Sharding, Partitions
Best Practices for Supercharging Cloud Analytics on Amazon Redshift (SnapLogic)
In this webinar, we discuss how the secret sauce of your business analytics strategy remains rooted in your approach, methodologies, and the amount of data incorporated into this critical exercise. We also address best practices to supercharge your cloud analytics initiatives, plus tips and tricks on designing the right information architecture, data models, and other tactical optimizations.
To learn more, visit: http://www.snaplogic.com/redshift-trial
Traditional data warehouses become expensive and slow down as the volume of your data grows. Amazon Redshift is a fast, petabyte-scale data warehouse that makes it easy to analyze all of your data using existing business intelligence tools for 1/10th the traditional cost. This session will provide an introduction to Amazon Redshift and cover the essentials you need to deploy your data warehouse in the cloud so that you can achieve faster analytics and save costs. We’ll also cover the recently announced Redshift Spectrum, which allows you to query unstructured data directly from Amazon S3.
Calculating dynamic pricing, estimating travel times, or detecting fraud in real time: these are all cases where real-time analytics creates the differentiation between experiences. Redis comes with built-in types to enable real-time processing of complex analytics, with data types like sorted sets, HyperLogLog, Bloom and cuckoo filters, and more.
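The fixed-memory distinct counting that HyperLogLog provides can be illustrated with its simpler relative, linear counting: hash every item to one bit of a fixed-size bitmap and estimate the distinct count from the fraction of bits still zero. A Python sketch (purely illustrative; Redis's actual HyperLogLog is more sophisticated and uses far less memory per unique item):

```python
import hashlib
import math

def estimate_distinct(items, nbits=1 << 16):
    """Linear counting: a fixed-size bitmap estimates cardinality,
    regardless of how many times each item repeats."""
    bits = bytearray(nbits // 8)
    for item in items:
        h = int(hashlib.sha256(str(item).encode()).hexdigest(), 16) % nbits
        bits[h // 8] |= 1 << (h % 8)
    ones = sum(bin(b).count("1") for b in bits)
    zero_fraction = (nbits - ones) / nbits
    return -nbits * math.log(zero_fraction)

# 5,000 distinct sensor ids, each seen ten times, counted in 8 KB of memory.
readings = [f"sensor-{i % 5000}" for i in range(50000)]
print(round(estimate_distinct(readings)))  # close to 5000
```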
New to MongoDB? We'll provide an overview of installation, high availability through replication, scale out through sharding, and options for monitoring and backup. No prior knowledge of MongoDB is assumed. This session will jumpstart your knowledge of MongoDB operations, providing you with context for the rest of the day's content.
Data warehousing is a critical component for analysing and extracting actionable insights from your data. Amazon Redshift allows you to deploy a scalable data warehouse in a matter of minutes and start analysing your data right away using your existing business intelligence tools.
Learn how Amazon Redshift, our fully managed, petabyte-scale data warehouse, can help you quickly and cost-effectively analyze all of your data using your existing business intelligence tools. Get an introduction to how Amazon Redshift uses massively parallel processing, scale-out architecture, and columnar direct-attached storage to minimize I/O time and maximize performance. Learn how you can gain deeper business insights and save money and time by migrating to Amazon Redshift. Take away strategies for migrating from on-premises data warehousing solutions, tuning schema and queries, and utilizing third party solutions.
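Why columnar storage minimizes I/O for analytics can be shown with a toy contrast between row and column layouts: an aggregate over one field only needs to read that field's column. A plain-Python illustration (conceptual only; real engines add compression, encoding, and block-level pruning on top):

```python
# Row layout: every record is a dict; summing one field touches all fields.
rows = [{"user": i, "region": i % 4, "spend": i * 0.5} for i in range(1000)]
row_total = sum(r["spend"] for r in rows)

# Columnar layout: one array per column; SUM(spend) reads only that array.
columns = {
    "user": list(range(1000)),
    "region": [i % 4 for i in range(1000)],
    "spend": [i * 0.5 for i in range(1000)],
}
col_total = sum(columns["spend"])

print(row_total == col_total)  # True: same answer, far less data scanned
```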
Choosing the Right Database: Exploring MySQL Alternatives for Modern Applicat... (Mydbops)
Choosing the Right Database: Exploring MySQL Alternatives for Modern Applications by Bhanu Jamwal, Head of Solution Engineering, PingCAP at the Mydbops Opensource Database Meetup 14.
This presentation discusses the challenges in choosing the right database for modern applications, focusing on MySQL alternatives. It highlights the growth of new applications, the need to improve infrastructure, and the rise of cloud-native architecture.
The presentation explores alternatives to MySQL, such as MySQL forks, database clustering, and distributed SQL. It introduces TiDB as a distributed SQL database for modern applications, highlighting its features and top use cases.
Case studies of companies benefiting from TiDB are included. The presentation also outlines TiDB's product roadmap, detailing upcoming features and enhancements.
Powering Real-Time Apps with ScyllaDB: Low Latency & Linear Scalability (ScyllaDB)
Discover how your team can achieve low latency at the extreme scale that your data-intensive applications require. We’ll walk you through an example of how ScyllaDB scales linearly to achieve 1M and then 2M OPS – with <1ms P99 latency. We’ll cover how this works on a sample real-time app (an ML feature store), share best practices for performance, and talk about the most important tradeoffs you’ll need to negotiate.
Join us to learn:
- Why and how to ensure your database takes full advantage of your cloud infrastructure
- What architectural considerations matter most for high throughput and low latency
- Key factors to consider when selecting a high-performance database
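P99 latency, the figure quoted above, means 99% of requests complete at or below that value. A quick nearest-rank computation in Python (illustrative; the latency samples are made up):

```python
import math

def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample >= p% of all samples."""
    ordered = sorted(samples)
    rank = math.ceil(p / 100 * len(ordered))
    return ordered[rank - 1]

# 1,000 request latencies in ms: mostly fast, with a slow tail.
latencies = [0.4] * 950 + [0.8] * 40 + [5.0] * 10
print(percentile(latencies, 99))  # 0.8 ms; the slowest 1% sits beyond P99
```

Tail percentiles matter because the median hides the slow requests that users actually notice.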
Unlock the Future of Search with MongoDB Atlas: Vector Search Unleashed (Malak Abu Hammad)
Discover how MongoDB Atlas and vector search technology can revolutionize your application's search capabilities. This comprehensive presentation covers:
* What is Vector Search?
* Importance and benefits of vector search
* Practical use cases across various industries
* Step-by-step implementation guide
* Live demos with code snippets
* Enhancing LLM capabilities with vector search
* Best practices and optimization strategies
Perfect for developers, AI enthusiasts, and tech leaders. Learn how to leverage MongoDB Atlas to deliver highly relevant, context-aware search results, transforming your data retrieval process. Stay ahead in tech innovation and maximize the potential of your applications.
#MongoDB #VectorSearch #AI #SemanticSearch #TechInnovation #DataScience #LLM #MachineLearning #SearchTechnology
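Under the hood, a vector search ranks documents by the similarity of their embeddings to a query vector. A brute-force cosine-similarity sketch in Python (Atlas Vector Search uses approximate indexes rather than a full scan, and real embeddings have hundreds of dimensions; the titles and 3-d vectors below are invented):

```python
import math

def cosine(a, b):
    """Cosine similarity: the angle between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def vector_search(query, docs, top_k=2):
    """Return the titles of the top_k documents most similar to the query."""
    scored = sorted(docs, key=lambda d: cosine(query, d["embedding"]), reverse=True)
    return [d["title"] for d in scored[:top_k]]

docs = [
    {"title": "cooking pasta", "embedding": [0.9, 0.1, 0.0]},
    {"title": "italian recipes", "embedding": [0.8, 0.2, 0.1]},
    {"title": "car maintenance", "embedding": [0.0, 0.1, 0.9]},
]
print(vector_search([1.0, 0.0, 0.0], docs))  # ['cooking pasta', 'italian recipes']
```

Semantic relevance falls out of the geometry: nearby vectors mean related meaning, even with no shared keywords.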
Communications Mining Series - Zero to Hero - Session 1 (DianaGray10)
This session provides an introduction to UiPath Communication Mining, its importance, and a platform overview. You will acquire a good understanding of the phases in Communication Mining as we go over the platform with you. Topics covered:
• Communication Mining Overview
• Why is it important?
• How can it help today’s business and the benefits
• Phases in Communication Mining
• Demo on Platform overview
• Q/A
Threats to mobile devices are more prevalent and increasing in scope and complexity. Users of mobile devices want to take full advantage of the features available on those devices, but many features trade security for convenience and capability. This best-practices guide outlines steps users can take to better protect personal devices and information.
Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0! (SOFTTECHHUB)
As the digital landscape continually evolves, operating systems play a critical role in shaping user experiences and productivity. The launch of Nitrux Linux 3.5.0 marks a significant milestone, offering a robust alternative to traditional systems such as Windows 11. This article delves into the essence of Nitrux Linux 3.5.0, exploring its unique features, advantages, and how it stands as a compelling choice for both casual users and tech enthusiasts.
Maruthi Prithivirajan, Head of ASEAN & IN Solution Architecture, Neo4j
Get an inside look at the latest Neo4j innovations that enable relationship-driven intelligence at scale. Learn more about the newest cloud integrations and product enhancements that make Neo4j an essential choice for developers building apps with interconnected data and generative AI.
UiPath Test Automation using UiPath Test Suite series, part 6 (DianaGray10)
Welcome to UiPath Test Automation using UiPath Test Suite series part 6. In this session, we will cover Test Automation with generative AI and Open AI.
The UiPath Test Automation with generative AI and Open AI webinar offers an in-depth exploration of leveraging cutting-edge technologies for test automation within the UiPath platform. Attendees will delve into the integration of generative AI, as a test automation solution, with Open AI's advanced natural language processing capabilities.
Throughout the session, participants will discover how this synergy empowers testers to automate repetitive tasks, enhance testing accuracy, and expedite the software testing life cycle. Topics covered include the seamless integration process, practical use cases, and the benefits of harnessing AI-driven automation for UiPath testing initiatives. By attending this webinar, testers, and automation professionals can gain valuable insights into harnessing the power of AI to optimize their test automation workflows within the UiPath ecosystem, ultimately driving efficiency and quality in software development processes.
What will you get from this session?
1. Insights into integrating generative AI.
2. Understanding how this integration enhances test automation within the UiPath platform
3. Practical demonstrations
4. Exploration of real-world use cases illustrating the benefits of AI-driven test automation for UiPath
Topics covered:
What is generative AI
Test Automation with generative AI and Open AI.
UiPath integration with generative AI
Speaker:
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
Epistemic Interaction - tuning interfaces to provide information for AI supportAlan Dix
Paper presented at SYNERGY workshop at AVI 2024, Genoa, Italy. 3rd June 2024
https://alandix.com/academic/papers/synergy2024-epistemic/
As machine learning integrates deeper into human-computer interactions, the concept of epistemic interaction emerges, aiming to refine these interactions to enhance system adaptability. This approach encourages minor, intentional adjustments in user behaviour to enrich the data available for system learning. This paper introduces epistemic interaction within the context of human-system communication, illustrating how deliberate interaction design can improve system understanding and adaptation. Through concrete examples, we demonstrate the potential of epistemic interaction to significantly advance human-computer interaction by leveraging intuitive human communication strategies to inform system design and functionality, offering a novel pathway for enriching user-system engagements.
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor... (SOFTTECHHUB)
The choice of an operating system plays a pivotal role in shaping our computing experience. For decades, Microsoft's Windows has dominated the market, offering a familiar and widely adopted platform for personal and professional use. However, as technological advancements continue to push the boundaries of innovation, alternative operating systems have emerged, challenging the status quo and offering users a fresh perspective on computing.
One such alternative that has garnered significant attention and acclaim is Nitrux Linux 3.5.0, a sleek, powerful, and user-friendly Linux distribution that promises to redefine the way we interact with our devices. With its focus on performance, security, and customization, Nitrux Linux presents a compelling case for those seeking to break free from the constraints of proprietary software and embrace the freedom and flexibility of open-source computing.
Building RAG with self-deployed Milvus vector database and Snowpark Container...Zilliz
This talk will give hands-on advice on building RAG applications with an open-source Milvus database deployed as a docker container. We will also introduce the integration of Milvus with Snowpark Container Services.
Dr. Sean Tan, Head of Data Science, Changi Airport Group
Discover how Changi Airport Group (CAG) leverages graph technologies and generative AI to revolutionize their search capabilities. This session delves into the unique search needs of CAG’s diverse passengers and customers, showcasing how graph data structures enhance the accuracy and relevance of AI-generated search results, mitigating the risk of “hallucinations” and improving the overall customer journey.
Securing your Kubernetes cluster_ a step-by-step guide to success !KatiaHIMEUR1
Today, after several years of existence, an extremely active community and an ultra-dynamic ecosystem, Kubernetes has established itself as the de facto standard in container orchestration. Thanks to a wide range of managed services, it has never been so easy to set up a ready-to-use Kubernetes cluster.
However, this ease of use means that the subject of security in Kubernetes is often left for later, or even neglected. This exposes companies to significant risks.
In this talk, I'll show you step-by-step how to secure your Kubernetes cluster for greater peace of mind and reliability.
UiPath Test Automation using UiPath Test Suite series, part 5DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 5. In this session, we will cover CI/CD with devops.
Topics covered:
CI/CD with in UiPath
End-to-end overview of CI/CD pipeline with Azure devops
Speaker:
Lyndsey Byblow, Test Suite Sales Engineer @ UiPath, Inc.
In his public lecture, Christian Timmerer provides insights into the fascinating history of video streaming, starting from its humble beginnings before YouTube to the groundbreaking technologies that now dominate platforms like Netflix and ORF ON. Timmerer also presents provocative contributions of his own that have significantly influenced the industry. He concludes by looking at future challenges and invites the audience to join in a discussion.
Climate Impact of Software Testing at Nordic Testing DaysKari Kakkonen
My slides at Nordic Testing Days 6.6.2024
Climate impact / sustainability of software testing discussed on the talk. ICT and testing must carry their part of global responsibility to help with the climat warming. We can minimize the carbon footprint but we can also have a carbon handprint, a positive impact on the climate. Quality characteristics can be added with sustainability, and then measured continuously. Test environments can be used less, and in smaller scale and on demand. Test techniques can be used in optimizing or minimizing number of tests. Test automation can be used to speed up testing.
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024Albert Hoitingh
In this session I delve into the encryption technology used in Microsoft 365 and Microsoft Purview. Including the concepts of Customer Key and Double Key Encryption.
4. Docker Hub: The World’s Most Popular Database
# of containers launched as of Feb 2018:
(chart: 630M+ Redis launches – 1.87M/day, 78K/hr, 1.28K/min – versus 308M+, 263M+ and 24M+ for the next most popular database images)
5. Redis Enterprise
DBaaS
• Available since mid 2013
• 8,100+ enterprise customers
Software
• Available since early 2015
• 300+ enterprise customers
550K+ databases managed worldwide
Customers
• 6 of top Fortune 10 companies
• 3 of top 5 communications companies
• 3 of top 4 credit card issuers
• 3 of top 5 healthcare companies
29. Lessons Learned
• 5+ years in production
• 550K+ databases created
• 50+ data-centers/zones
• 2,000+ node failure events
• 100+ complete data-center outages
30. HA Concept #1 – Quorum by Nodes, not by Shards
(diagram: open-source Redis keeps quorum by shards, so a 90GB shard needs 3 replicas – M1, S1, S2 – each on its own r4.4xlarge node; Redis Enterprise keeps quorum by nodes, so the same shard needs only M1 and S1 on two r4.4xlarge nodes plus a small m4.large quorum node holding no data)
32. HA Concept #1 – Quorum by Nodes, not by Shards
• ~30% infrastructure cost savings
• Less network traffic
• Easy to manage
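A rough sanity check on the ~30% figure. The hourly prices below are assumed approximations of on-demand AWS pricing for these instance types, not numbers from the deck:

```python
# Rough check of the ~30% savings claim. Prices are ASSUMED approximations.
R4_4XLARGE = 1.064   # $/hr, 90GB data node (assumed price)
M4_LARGE = 0.10      # $/hr, quorum-only node holding no data (assumed price)

oss = 3 * R4_4XLARGE                     # quorum by shards: 3 full data nodes
enterprise = 2 * R4_4XLARGE + M4_LARGE   # quorum by nodes: 2 data nodes + tiny quorum node
savings = 1 - enterprise / oss
print(f"{savings:.0%}")                  # → 30%
```

The intuition: the third full-size replica exists only to break ties, so replacing it with a data-free quorum node recovers almost a third of the hardware cost.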
33. HA Concept #2 – Pure In-Memory Replication
(diagram: three M→S replication flows – disk-based replication, the OSS default, takes 3 steps; OSS diskless replication takes 2; Redis Enterprise’s pure in-memory replication takes 1)
35. HA Concept #2 – Pure In-Memory Replication
• x2 faster
38. HA Concept #4 – How to Deploy a Multi-AZ/Rack Cluster
1. At least 3 AZs/racks
2. Round-trip latency between AZs/racks < 10msec
3. The master and slave of the same shard must be deployed in different AZs/racks
4. For every i, j, k: #_of_nodes(AZi + AZj) > #_of_nodes(AZk) – any two AZs together must hold more nodes than the third, so the cluster keeps a node majority when one AZ fails
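Rule 4 can be checked mechanically. A minimal sketch, with a hypothetical helper that is not part of any Redis Enterprise tooling:

```python
# Check rule 4: for every ordered triple of AZs (i, j, k), the nodes in
# AZi plus AZj must outnumber the nodes in AZk, so losing any single AZ
# still leaves a majority of cluster nodes alive.
from itertools import permutations

def quorum_safe(nodes_per_az):
    # nodes_per_az: list of node counts, one entry per AZ/rack
    return all(nodes_per_az[i] + nodes_per_az[j] > nodes_per_az[k]
               for i, j, k in permutations(range(len(nodes_per_az)), 3))

print(quorum_safe([3, 3, 3]))   # True  – balanced deployment survives any AZ loss
print(quorum_safe([5, 1, 1]))   # False – losing the big AZ loses the node majority
```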
41. Data-Persistence – The Wrong Way
(diagram: AOF/snapshot files kept on the instance’s local SSD, which holds both persistent and ephemeral data; when the instance fails, its replacement comes up as a new empty instance → data loss)
42. Data-Persistence – The Right Way
(diagram: the wrong way above loses data because persistence lives on the failed instance’s SSD; the right way keeps AOF/snapshot files on network-attached persistent storage rather than ephemeral SSD, so the replacement reloads the data and comes up as a new populated instance → no data loss)
45. Tunable Data-Persistence Configuration
• Non-replicated: M only
• Tuned for speed: data-persistence at the slave (M → S)
• Tuned for reliability: data-persistence at the master & slave (M → S)
Persistence options: AOF-every-sec, AOF-every-write, Snapshot (RDB)
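In open-source Redis terms, the three persistence options listed above map to the following redis.conf directives (illustrative; Redis Enterprise exposes these as per-database policies rather than a config file):

```conf
# AOF-every-sec – fsync the append-only file once per second
appendonly yes
appendfsync everysec

# AOF-every-write – fsync on every write: strongest durability, slowest
# appendfsync always

# Snapshot (RDB) – e.g. dump when at least 100 keys changed within 60 seconds
# save 60 100
```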
46. Two Main Challenges with Redis Data-Persistence
• Redis performance during AOF rewrite
• Data-persistence when multiple Redis instances reside on the same node
52. And It’s Still Fast (Extremely Fast) with Modules
(benchmark charts, latency in msec):
• RedisSearch – x5
• ReBloom – x17
• Redis-ML – x2000
55. And It’s Still Fast (Extremely Fast) with Modules
• Redis-Graph – wait for RedisConf (Pier 27, San Francisco, April 26-88)
56. It Uses a Different Approach for Active-Active
59. We Need Something Faster than the Speed of Light
• Light: > 20msec RTT
• Network: > 70msec RTT
• Redis: < 1msec RTT
60. Conflict Resolution is Hard
• Application-level solution → too complex to write
• LWW (Last Write Wins) → doesn’t work for many of the Redis use cases, e.g.:
  • Counters
  • Sets
  • Sorted Sets
  • Lists
  • Modules’ new datatypes
61. CRDT
• Years of academic research
• Based on a consensus-free protocol
• Strong eventual consistency
• Built to resolve conflicts with complex data types
63. Solving Conflicts – Counters
Three replicas start with c = 500, then diverge concurrently:
• Replica A: INCRBY 200
• Replica B: DECRBY 300
• Replica C: INCRBY 1000
Convergence function (commutative): 500 + ∑c(i) = 500 + 200 - 300 + 1000 = 1400
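The commutative merge above can be sketched as a toy counter CRDT. This is illustrative only, not Redis Enterprise's actual CRDT implementation; all class and function names are made up:

```python
# Toy counter CRDT: each replica tracks only its own local delta;
# convergence sums all deltas onto the shared base value, and because
# addition is commutative the merge order does not matter.
BASE = 500

class CounterReplica:
    def __init__(self, name):
        self.name = name
        self.delta = 0          # this replica's local increments/decrements

    def incrby(self, n):
        self.delta += n

    def decrby(self, n):
        self.delta -= n

def converge(base, replicas):
    # Commutative merge: any replica ordering yields the same result.
    return base + sum(r.delta for r in replicas)

a, b, c = CounterReplica("A"), CounterReplica("B"), CounterReplica("C")
a.incrby(200)       # Replica A: INCRBY 200
b.decrby(300)       # Replica B: DECRBY 300
c.incrby(1000)      # Replica C: INCRBY 1000

print(converge(BASE, [a, b, c]))   # 500 + 200 - 300 + 1000 = 1400
print(converge(BASE, [c, a, b]))   # same result in any order: 1400
```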
68. Solving Conflicts – Sets
Three replicas start with S = {A, B, C}, then diverge concurrently:
• Replica A: SADD D
• Replica B: SADD A
• Replica C: SREM A
Convergence function (associative):
• S = S + D + A - A = {A, B, C, D}
• Observed Removed + Add Wins
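The observed-remove, add-wins behavior can be sketched as a toy OR-set. Again illustrative only, not Redis Enterprise's actual CRDT implementation, and all names are made up:

```python
# Toy observed-remove set: every add gets a unique tag; a remove only
# tombstones the tags that replica has OBSERVED, so a concurrent re-add
# (with a new, unobserved tag) survives the merge – "add wins".
import copy
import itertools

_tag = itertools.count()

class ORSet:
    def __init__(self):
        self.adds = {}           # element -> set of unique add tags
        self.tombstones = set()  # tags whose adds were observed and removed

    def add(self, e):
        self.adds.setdefault(e, set()).add(next(_tag))

    def remove(self, e):
        self.tombstones |= self.adds.get(e, set())

    def value(self):
        return {e for e, tags in self.adds.items() if tags - self.tombstones}

def merge(replicas):
    # Associative merge: union all add-tags and all tombstones.
    out = ORSet()
    for r in replicas:
        for e, tags in r.adds.items():
            out.adds.setdefault(e, set()).update(tags)
        out.tombstones |= r.tombstones
    return out

# Shared starting state S = {A, B, C}, then three diverging replicas.
base = ORSet()
for e in "ABC":
    base.add(e)
ra, rb, rc = (copy.deepcopy(base) for _ in range(3))

ra.add("D")      # Replica A: SADD D
rb.add("A")      # Replica B: SADD A (concurrent re-add, new tag)
rc.remove("A")   # Replica C: SREM A (tombstones only the tags it observed)

print(sorted(merge([ra, rb, rc]).value()))   # ['A', 'B', 'C', 'D']
```

Without the concurrent re-add on Replica B, the remove would win: merging only A and C yields {B, C, D}.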
80. Multi-Tenant from Day One
• Single tenant, multiple shards/DBs, OR
• Multi-tenant, multiple shards/DBs (Customer A, Customer B, …, Customer N)
81. 10TB Deployment on AWS with 2 Replicas
50 x r3.8xlarge instances per copy of the data:
• Shards #1–#50, 200GB each
• Shards #51–#100, 200GB each – 1st replica for HA
• Shards #101–#150, 200GB each – 2nd replica for quorum
Total cost (reserved instances) = $2,132,250/yr
82. 10TB Deployment on AWS with 1 Replica + a Quorum Server
50 x r3.8xlarge instances per copy of the data:
• Shards #1–#50, 200GB each
• Shards #51–#100, 200GB each – 1 replica for HA
• 1 quorum server (#101, 15GB)
Total cost (reserved instances) = $1,421,500/yr; Savings = $710,750/yr
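The two cost figures are consistent with a flat per-instance price. A quick check, assuming 50 r3.8xlarge instances per copy of the data and ignoring the small quorum server (an assumption; the deck does not break the price down):

```python
# Back-of-the-envelope check of the deployment cost figures.
TOTAL_2_REPLICAS = 2_132_250           # $/yr for 150 instances (primary + 2 replicas)
per_instance = TOTAL_2_REPLICAS / 150
print(per_instance)                    # 14215.0 $/instance-year

total_1_replica = per_instance * 100   # primary + 1 replica = 100 instances
print(total_1_replica)                 # 1421500.0 – matches the slide
print(TOTAL_2_REPLICAS - total_1_replica)  # 710750.0 – matches the stated savings
```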
83. Redis on Flash – Built for a Tiered Memory Architecture
Each cluster node tiers its data:
• DRAM: keys & hot values
• SSD: cold values
• Persistent storage: entire dataset (AOF, snapshot)
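The tiering above can be illustrated with a toy two-tier store. This is a sketch only; Redis on Flash's real data layout and eviction policy are far more sophisticated, and every name here is made up:

```python
# Toy two-tier store: keys and hot values live in a fast dict ("DRAM"),
# cold values are demoted to a slower dict standing in for flash/SSD,
# and a value accessed from the cold tier is promoted back to hot.
class TieredStore:
    def __init__(self, hot_capacity):
        self.hot = {}        # DRAM tier: keys & hot values
        self.cold = {}       # SSD tier: cold values
        self.hot_capacity = hot_capacity

    def set(self, key, value):
        self.hot[key] = value
        if len(self.hot) > self.hot_capacity:
            # demote the oldest-inserted hot value to the SSD tier
            victim = next(iter(self.hot))
            self.cold[victim] = self.hot.pop(victim)

    def get(self, key):
        if key in self.hot:
            return self.hot[key]
        if key in self.cold:
            # promote on access: the value is hot again
            self.set(key, self.cold.pop(key))
            return self.hot[key]
        return None

s = TieredStore(hot_capacity=2)
s.set("a", 1); s.set("b", 2); s.set("c", 3)   # "a" demoted to the cold tier
print("a" in s.cold, s.get("a"))              # True 1 ("a" promoted back)
```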
84. RoF – Designed for the New Persistent Memory Technology
(benchmark chart: NVMe vs. SATA)
85. RoF – Designed for the New Persistent Memory Technology
(benchmark chart: Optane (3DXP) vs. NVMe)