The document discusses Apache Cassandra's SASI (SSTable Attached Secondary Index). It provides a 5 minute introduction to Cassandra, introduces SASI and how it follows the SSTable lifecycle, describes how SASI works at the cluster level for distributed queries and indexing, and details the local read/write process including data structures and query planning. Some benchmarks are shown for full table scans on a large dataset using SASI with Spark. The key advantages and use cases for SASI are discussed along with its limitations compared to dedicated search engines.
SASI, Cassandra on the full text search ride at Voxxed Day Belgrade 2016
1. SASI, Cassandra on the full text search ride
DuyHai DOAN – Apache Cassandra evangelist
2. @doanduyhai
1. 5-minute introduction to Apache Cassandra™
2. SASI introduction
3. SASI cluster-wide
4. SASI local read/write path
5. Query planner
6. Some benchmarks
7. Take away
8. @doanduyhai
Coordinator node
Responsible for handling requests (read/write)
Every node can be coordinator
• masterless
• no SPOF
• proxy role
[Diagram: ring of eight nodes A to H. The request reaches the coordinator node, which forwards it to the nodes labelled 1, 2 and 3.]
11. @doanduyhai
What is SASI ?
• SSTable-Attached Secondary Index → new 2nd index implementation that follows the SSTable life-cycle
• Objective: provide a more performant & capable 2nd index
13. @doanduyhai
Why is it better than native 2nd index ?
• follows the SSTable life-cycle (flush, compaction, rebuild …) → more optimized
• new data structures
• range queries (<, ≤, >, ≥) possible
• full text search options
16. @doanduyhai
Distributed index
On cluster level, SASI works exactly like native 2nd index
[Diagram: ring of eight nodes A to H, with index entries spread across the nodes:]
UK user1 user102 … user493
US user54 user483 … user938
UK user87 user176 … user987
UK user17 user409 … user787
23. @doanduyhai
Caveat 1: non restrictive filters
[Diagram: ring of eight nodes A to H. The coordinator ends up hitting all nodes eventually ☹]
24. @doanduyhai
Caveat 1 solution : always use LIMIT
[Diagram: ring of eight nodes A to H with a coordinator]
SELECT *
FROM …
WHERE ...
LIMIT 1000
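Why LIMIT helps can be sketched with a toy model. This is only an illustration under stated assumptions, not Cassandra's actual range-scan algorithm: the ring contents and the helper `scan_with_limit` are hypothetical, and nodes are contacted strictly one after another.

```python
from typing import Dict, List, Tuple


def scan_with_limit(nodes: Dict[str, List[str]], limit: int) -> Tuple[List[str], int]:
    """Ask nodes one by one and stop as soon as `limit` rows are collected.

    Returns the rows found and the number of nodes contacted.
    """
    rows: List[str] = []
    contacted = 0
    for node_rows in nodes.values():
        contacted += 1
        # Take only as many rows as are still missing to reach the limit.
        rows.extend(node_rows[: limit - len(rows)])
        if len(rows) >= limit:
            break
    return rows, contacted


# Hypothetical 8-node ring (A to H); every node holds 500 matching rows.
ring = {name: [f"{name}-row{i}" for i in range(500)] for name in "ABCDEFGH"}

limited_rows, limited_contacted = scan_with_limit(ring, limit=1000)
all_rows, all_contacted = scan_with_limit(ring, limit=10**9)  # effectively no LIMIT
```

With a non-restrictive filter, the bounded query can stop after a couple of nodes, while the unbounded one has to visit the whole ring.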
25. @doanduyhai
Caveat 2: 1-to-1 index (user_email)
[Diagram: ring of eight nodes A to H with a coordinator. Result so far: not found]
WHERE user_email = 'xxx'
26. @doanduyhai
Caveat 2: 1-to-1 index (user_email)
[Diagram: ring of eight nodes A to H with a coordinator. Result so far: still no result]
WHERE user_email = 'xxx'
27. @doanduyhai
Caveat 2: 1-to-1 index (user_email)
[Diagram: ring of eight nodes A to H with a coordinator]
At best 1 user found
At worst 0 users found
WHERE user_email = 'xxx'
28. @doanduyhai
Caveat 2 solution: materialized views
For 1-to-1 index/relationship, use materialized views instead
CREATE MATERIALIZED VIEW user_by_email AS
SELECT * FROM users
WHERE user_id IS NOT NULL AND user_email IS NOT NULL
PRIMARY KEY (user_email, user_id);
But range queries (<, >, ≤, ≥) are not possible …
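The difference between the two access paths can be sketched as follows. This is a toy model, not Cassandra code: the node list, the sample users, and the helpers `owner`, `find_via_scan` and `find_via_view` are hypothetical, and an MD5-based hash merely stands in for the real token-based placement.

```python
import hashlib
from typing import Dict, Optional, Tuple

NODES = list("ABCDEFGH")  # hypothetical 8-node ring


def owner(partition_key: str) -> str:
    """Map a partition key to one node (stand-in for token-based placement)."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return NODES[int.from_bytes(digest[:8], "big") % len(NODES)]


users = {"u1": "alice@example.com", "u2": "bob@example.com", "u3": "carol@example.com"}

# Base table: rows are placed by user_id, so the e-mail says nothing about the node.
base_table: Dict[str, Dict[str, str]] = {node: {} for node in NODES}
# View: the same rows re-keyed by user_email, so the e-mail itself locates the node.
view: Dict[str, Dict[str, str]] = {node: {} for node in NODES}
for user_id, email in users.items():
    base_table[owner(user_id)][user_id] = email
    view[owner(email)][email] = user_id


def find_via_scan(email: str) -> Tuple[Optional[str], int]:
    """Index-style lookup: ask nodes in turn until the e-mail is found."""
    contacted = 0
    for node in NODES:
        contacted += 1
        for user_id, stored_email in base_table[node].items():
            if stored_email == email:
                return user_id, contacted
    return None, contacted


def find_via_view(email: str) -> Tuple[Optional[str], int]:
    """View-style lookup: exactly one node is contacted."""
    return view[owner(email)].get(email), 1
```

A miss through the scan path touches all eight nodes, whereas the view path always touches one, which is the motivation for the materialized view on a 1-to-1 column.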
32. @doanduyhai
SASI Life-cycle: in-memory
[Diagram: in-memory write path. (1) The write goes to the commit log (Commit log1 … Commit logn). (2) It is applied in memory to the table's MemTable (Table1 … TableN). (3) The matching Index MemTable (1 … N) is updated. Then ACK the client.]
33. @doanduyhai
Local write path data structures
Index mode, data type | Data structure | Usage
PREFIX, text | ConcurrentRadixTree (concurrent-trees library) | name LIKE 'John%'
CONTAINS, text | ConcurrentSuffixTree (concurrent-trees library) | name LIKE '%John%', name LIKE '%ny'
PREFIX, other | JDK ConcurrentSkipListSet | age = 20, age >= 20 AND age <= 30
SPARSE, other | JDK ConcurrentSkipListSet | age = 20, age >= 20 AND age <= 30; suitable for 1-to-N index with N ≤ 5
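The query shapes in the table can be sketched with plain sorted lists. This is a minimal sketch, not the SASI implementation: a sorted list with `bisect` stands in for the skip list and the radix/suffix trees, and the classes `SortedIndex` and `TextIndex` are hypothetical names.

```python
import bisect
from typing import List, Set, Tuple


class SortedIndex:
    """Sorted (value, key) pairs; stand-in for a skip-list-backed numeric index."""

    def __init__(self) -> None:
        self._entries: List[Tuple[int, str]] = []

    def add(self, value: int, key: str) -> None:
        bisect.insort(self._entries, (value, key))

    def range(self, low: int, high: int) -> List[str]:
        """Keys whose value v satisfies low <= v <= high."""
        start = bisect.bisect_left(self._entries, (low,))
        keys = []
        for value, key in self._entries[start:]:
            if value > high:
                break
            keys.append(key)
        return keys


class TextIndex:
    """PREFIX mode indexes the term itself; CONTAINS mode indexes every suffix."""

    def __init__(self, contains: bool = False) -> None:
        self._contains = contains
        self._entries: List[Tuple[str, str]] = []

    def add(self, term: str, key: str) -> None:
        parts = [term[i:] for i in range(len(term))] if self._contains else [term]
        for part in parts:
            bisect.insort(self._entries, (part, key))

    def like(self, fragment: str) -> Set[str]:
        """PREFIX mode: LIKE 'fragment%'. CONTAINS mode: LIKE '%fragment%'."""
        start = bisect.bisect_left(self._entries, (fragment,))
        keys = set()
        for part, key in self._entries[start:]:
            if not part.startswith(fragment):
                break
            keys.add(key)
        return keys


ages = SortedIndex()
for age, key in [(20, "u1"), (25, "u2"), (31, "u3"), (30, "u4")]:
    ages.add(age, key)

prefix_names = TextIndex()
contains_names = TextIndex(contains=True)
for name, key in [("John", "u1"), ("Johnny", "u2"), ("Danny", "u3")]:
    prefix_names.add(name, key)
    contains_names.add(name, key)
```

Indexing every suffix is what makes a substring search a prefix search over suffixes; it also shows why CONTAINS mode costs more space than PREFIX mode.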
36. @doanduyhai
Local write path summary
Index files are built
• on memtable flush
• on compaction flush
To avoid OOM, index files are split into chunks of
• 1 GB for memtable flush
• max_compaction_flush_memory_in_mb for compaction flush
→ consequence: SASI has an impact on write bandwidth (CPU & disk I/O)
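The chunking idea can be sketched as a size-bounded flush. This is an illustration only, not SASI's on-disk format: the helper `build_segments` and the byte budget are hypothetical, and term size is approximated by its UTF-8 length.

```python
from typing import Iterable, List


def build_segments(terms: Iterable[str], max_segment_bytes: int) -> List[List[str]]:
    """Group terms into sorted segments that each stay within a memory budget."""
    segments: List[List[str]] = []
    current: List[str] = []
    size = 0
    for term in terms:
        term_size = len(term.encode("utf-8"))
        # Flush the in-memory buffer before it would exceed the budget.
        if current and size + term_size > max_segment_bytes:
            segments.append(sorted(current))
            current, size = [], 0
        current.append(term)
        size += term_size
    if current:
        segments.append(sorted(current))
    return segments


segments = build_segments(["bb", "aa", "dd", "cc", "ee"], max_segment_bytes=4)
```

Each flush bounds memory use at the price of extra sorting and writing, which is in line with the slide's point about CPU and disk I/O.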
37. @doanduyhai
Local read path
• first, optimize the query using the Query Planner (see later)
• then load chunks (4k) of index files from disk into memory
• perform binary search to find the indexed value(s)
• retrieve the corresponding partition keys and push them into the Partition Key Cache
→ Yes, currently SASI only keeps partition key(s), so on wide partitions it's not very optimized ...
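The binary-search step can be sketched as a two-level lookup over sorted blocks. This is a minimal sketch under assumptions: the in-memory block layout and the function `find_partition_keys` are hypothetical and do not reflect SASI's real file format.

```python
from bisect import bisect_left, bisect_right
from typing import List, Tuple

# A block is a sorted list of (indexed value, partition keys) pairs.
Block = List[Tuple[int, List[str]]]


def find_partition_keys(blocks: List[Block], value: int) -> List[str]:
    """Binary-search the block boundaries, then binary-search inside the block."""
    if not blocks:
        return []
    first_terms = [block[0][0] for block in blocks]
    # Last block whose first term is <= value; -1 means value precedes everything.
    block_pos = bisect_right(first_terms, value) - 1
    if block_pos < 0:
        return []
    block = blocks[block_pos]
    terms = [term for term, _ in block]
    i = bisect_left(terms, value)
    if i < len(terms) and terms[i] == value:
        return block[i][1]
    return []


blocks: List[Block] = [
    [(18, ["pk7"]), (20, ["pk1", "pk4"])],
    [(25, ["pk2"]), (30, ["pk3"])],
]
```

Note that the result is a list of partition keys only; any filtering inside a wide partition still has to happen after the rows are read, which is the limitation the last bullet points out.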
54. @doanduyhai
Hardware specs
13 bare-metal machines
• 6 CPU HT (12 vcores)
• 64 GB RAM
• 4 SSDs in RAID0 for a total of 1.5 TB
Data set
• 13 billion rows
• 1 numerical index with 36 distinct values
• 2 text indexes with 7 distinct values
• 1 text index with 3 distinct values
55. @doanduyhai
Benchmark results
Full table scan using co-located Spark (no LIMIT)
Predicate count | Fetched rows | Query time in sec
1 | 36 109 986 | 609
2 | 2 781 492 | 330
3 | 1 044 547 | 372
4 | 360 334 | 116
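To compare the runs, fetched rows per second can be derived from the figures above. This is plain arithmetic on the published numbers; the variable names are illustrative, and the ratio ignores everything the slide does not state (cluster load, Spark parallelism, and so on).

```python
# predicate count -> (fetched rows, query time in seconds), copied from the table
results = {
    1: (36_109_986, 609),
    2: (2_781_492, 330),
    3: (1_044_547, 372),
    4: (360_334, 116),
}

# Fetched rows per second for each run.
throughput = {predicates: rows / seconds for predicates, (rows, seconds) in results.items()}

for predicates, rows_per_sec in throughput.items():
    print(f"{predicates} predicate(s): {rows_per_sec:,.0f} rows/s")
```

Query time does not shrink in proportion to the number of fetched rows, so the rows-per-second figure drops sharply once the predicates become selective.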
59. @doanduyhai
SASI vs search engines
SASI vs Solr/ElasticSearch ?
• Cassandra is not a search engine !!! (database = durability)
• always slower because 2 passes (SASI index read + original Cassandra data)
• no scoring
• no ordering (ORDER BY)
• no grouping (GROUP BY) → Apache Spark for analytics
If you don't need the above features, SASI is for you!
60. @doanduyhai
SASI sweet spots
SASI is a relevant choice if
• you need multi criteria search and you don't need ordering/grouping/scoring
• you mostly need 100 to 10,000 rows for your search queries
• you always know the partition keys of the rows to be searched for (this one applies to native secondary index too)
• you want to index static columns (SASI has no penalty since it indexes the whole partition)
61. @doanduyhai
SASI blind spots
SASI is a poor choice if
• you have a strong SLA on search latency, for example a requirement of a few milliseconds
• ordering of the search results is important for you