A moderate dive into Apache Cassandra data modelling, how to do it and how CQL differs from SQL. Presented by Alex Thompson at the Sydney Cassandra Meetup.
This document provides an overview of new features introduced in ES2015 including arrow functions, classes, modules, larger standard library, smarter object expressions, destructuring, default arguments, template strings, block scoping, symbols, iterators, generators, and async/await. It also mentions using Babel to transpile newer JavaScript syntax for older environments and lists some popular libraries that use ES2015 features like ImmutableJS, Flow, React, and React Native.
Percona Live 4/15/15: Transparent sharding database virtualization engine (DVE), by Tesora
Amrith Kumar of Tesora and Peter Boros of Percona present an in-depth exploration of transparent database scale-out using the Tesora DVE framework for MySQL.
Assistant:
(defrecord Assistant [name id])
(updatePersonalInfo )

Manager:
(defrecord Manager [name id employees])
(raise )

;; Employee is a protocol with a roles method, implied by the forms below:
(extend-type Assistant
  Employee
  (roles [this] "assistant"))

(extend-type Manager
  Employee
  (roles [this] (str "manager of " (count (:employees this)))))
The Expression Problem

Add a new data type
Add a new operation
Without changing:
- Existing data types
- Existing operations

Applied here:
Add Employee
Add raise()
Without changing:
- Assistant
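The Clojure snippet above uses protocols to cover both axes of the Expression Problem. As a rough cross-language sketch (not from the slides), Python's functools.singledispatch offers a similar escape hatch: a new operation can be registered for existing types without editing them. The Assistant and Manager classes below simply mirror the slide's records.

```python
from dataclasses import dataclass, field
from functools import singledispatch

# Data types mirroring the slide's defrecords.
@dataclass
class Assistant:
    name: str
    id: int

@dataclass
class Manager:
    name: str
    id: int
    employees: list = field(default_factory=list)

# A new operation, added without changing the classes above.
@singledispatch
def roles(emp):
    raise TypeError(f"no roles() implementation for {type(emp).__name__}")

@roles.register
def _(emp: Assistant) -> str:
    return "assistant"

@roles.register
def _(emp: Manager) -> str:
    return f"manager of {len(emp.employees)}"

print(roles(Manager("Ada", 1, [Assistant("Bob", 2)])))  # → manager of 1
```

Adding a new data type later needs only one more roles.register, which is the other axis of the problem.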
This document provides an overview of Python data structures including lists, tuples, dictionaries, and sets. It discusses common list methods like append(), insert(), remove(), and sort(). It provides examples of using lists as stacks and queues. It also covers list comprehensions, the del statement, tuple packing and unpacking, set operations, looping through dictionaries, and comparing sequences.
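The operations that summary lists condense into a few lines; this is a generic illustration, not code from the document itself:

```python
from collections import deque

# Lists as stacks: append/pop operate at the end in O(1).
stack = [1, 2]
stack.append(3)
assert stack.pop() == 3

# A deque as a queue: popleft() avoids the O(n) cost of list.pop(0).
queue = deque(["a", "b"])
queue.append("c")
assert queue.popleft() == "a"

# List comprehension, then tuple/star unpacking.
squares = [x * x for x in range(5)]
first, *rest = squares
assert first == 0 and rest == [1, 4, 9, 16]

# Set operations and looping through a dictionary.
evens = {0, 2, 4}
assert evens & set(squares) == {0, 4}
for key, value in {"one": 1}.items():
    assert key == "one" and value == 1
```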
This document discusses various built-in functions in MySQL including date functions, string functions, and numeric functions. It provides examples of functions such as DATE_FORMAT() to format dates, CONCAT() to concatenate strings, and FLOOR() to round numbers down. Various functions are demonstrated on sample data from a cds table to manipulate and extract date, string, and numeric values.
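Those MySQL functions have close SQLite analogues, which makes the idea easy to try from Python's sqlite3 module: strftime() for DATE_FORMAT(), the || operator for CONCAT(), and CAST(... AS INTEGER) for FLOOR() on non-negative values. The cds table below is an assumption modeled on the summary, not the talk's actual data:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cds (artist TEXT, title TEXT, bought TEXT, price REAL)")
conn.execute("INSERT INTO cds VALUES ('Miles Davis', 'Kind of Blue', '1999-08-17', 9.99)")

# SQLite stand-ins for the MySQL functions named in the summary:
#   DATE_FORMAT() -> strftime(), CONCAT() -> ||, FLOOR() -> CAST(... AS INTEGER)
# (CAST truncates toward zero, which matches FLOOR for non-negative prices.)
row = conn.execute(
    """SELECT strftime('%d/%m/%Y', bought),
              artist || ' - ' || title,
              CAST(price AS INTEGER)
       FROM cds"""
).fetchone()
print(row)  # → ('17/08/1999', 'Miles Davis - Kind of Blue', 9)
```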
The document discusses the basics of creating GUIs in Python using the tkinter module. It presents code examples that demonstrate how to create windows, add labels, buttons, frames for layout, entry boxes, file selection dialogs, color pickers, and radio buttons. The examples gradually build upon each other to showcase different tkinter widgets and features like callbacks, layouts, and user input/output.
The Ring programming language version 1.5.2 book - Part 45 of 181, by Mahmoud Samir Fayed
1. Initialize Allegro and load necessary addons like images.
2. Create a display and show a message box to initialize the window.
3. Draw shapes and bitmaps to the display and flip periodically to animate.
4. Set up event handling for input from keyboard, mouse, and timer to control animation.
5. Inside the game loop: handle input, update object positions, redraw, and flip display continuously.
Tracking Data Updates in Real-time with Change Data Capture, by ScyllaDB
Change Data Capture is one of the important new features coming to the Scylla NoSQL database. It enables the user to track updates to a table (or tables) in real time, supporting a variety of use cases, such as:
- Real-time updates of microservices based on Scylla updates
- Replicating the data, or part of it, between loosely coupled Scylla clusters
- Soft real-time analytics updates based on the stream of Scylla updates
This document discusses MySQL 5.7's JSON datatype. It introduces JSON and why it is useful for integrating relational and schemaless data. It covers creating JSON columns, inserting and selecting JSON data using functions like JSON_EXTRACT. It discusses indexing JSON columns using generated columns. Performance is addressed, showing JSON tables can be 40% larger with slower inserts and selects compared to equivalent relational tables without indexes. Options for stored vs virtual generated columns are presented.
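MySQL's JSON_EXTRACT has a near-identical analogue in SQLite's JSON1 functions, so the core idea is easy to try from Python. The docs table and its columns here are illustrative assumptions, not the schema from the talk:

```python
import sqlite3

# SQLite's json_extract() stands in for MySQL 5.7's JSON_EXTRACT().
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE docs (id INTEGER PRIMARY KEY, body TEXT)")
db.execute(
    "INSERT INTO docs (body) VALUES (?)",
    ('{"name": "widget", "price": 9.5}',),
)

# Path expressions pull scalar values out of the stored JSON text.
name, price = db.execute(
    "SELECT json_extract(body, '$.name'), json_extract(body, '$.price') FROM docs"
).fetchone()
print(name, price)  # → widget 9.5
```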
The document discusses various built-in functions in MySQL for manipulating date, time, string, and numeric data. It describes functions for formatting dates, extracting date elements, adding or subtracting times, concatenating and modifying strings. Common functions covered include DATE_FORMAT(), NOW(), CURDATE(), CONCAT(), REPLACE(), LEFT(), RIGHT(), and MID().
Beyond PHP - it's not (just) about the code, by Wim Godden
Most PHP developers focus on writing code. But creating web applications is about much more than just writing PHP. Take a step outside the PHP cocoon and into the big PHP ecosphere to find out how small code changes can make a world of difference on servers and network. This talk is an eye-opener for developers who spend over 80% of their time coding, debugging and testing.
ClickHouse Unleashed 2020: Our Favorite New Features for Your Analytical Appl..., by Altinity Ltd
Robert Hodges is the Altinity CEO with over 30 years of experience in DBMS, virtualization, and security. ClickHouse is the 20th DBMS he has worked with. Alexander Zaitsev is the Altinity CTO and founder with decades of experience designing and operating petabyte-scale analytic systems. Vitaliy Zakaznikov is the QA Architect with over 13 years of testing hardware and software and is the author of the TestFlows open source testing tool.
This document provides an introduction to Cassandra including:
1) An overview of Cassandra's key architecture including its linear scalability, continuous availability across data centers, and operational simplicity.
2) A discussion of Cassandra's data model including its use of Last Write Wins for conflict resolution and examples of modeling one-to-many relationships using clustered tables.
3) Details on Cassandra's consistency levels and how they impact availability and durability of writes and reads.
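The Last Write Wins rule mentioned in point 2 is easy to sketch: each cell carries a write timestamp, and on read the copy with the highest timestamp wins. A toy illustration of the merge rule (not Cassandra code):

```python
# Last-write-wins merge: each replica holds a (value, timestamp) pair per
# cell; on read, the cell with the highest timestamp wins.
def lww_merge(*replica_cells):
    return max(replica_cells, key=lambda cell: cell[1])

replica_a = ("alice@old.example", 100)   # (value, write timestamp)
replica_b = ("alice@new.example", 250)
value, ts = lww_merge(replica_a, replica_b)
print(value)  # → alice@new.example
```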
Presentation given at the 2013 Clojure Conj on core.matrix, a library that brings multi-dimensional array and matrix programming capabilities to Clojure.
MySQL is a popular open-source database that allows users to create, query, and manage relational database tables. The document introduces how to use MySQL, including how to connect to MySQL, create databases and tables, insert and query data using SQL statements like SELECT, and sort and filter query results. Pattern matching operators like LIKE and wildcard characters can be used to search for specific strings within table columns.
This document provides an overview of MySQL and SQL commands. It discusses:
1) Different database architectures like 2-tier and 3-tier that separate the database, web server, and application logic.
2) How to connect to a MySQL server using the mysql command line client and send SQL statements.
3) Common SQL commands like SELECT, INSERT, UPDATE, DELETE etc and how to retrieve, modify and delete data from database tables.
4) Examples of creating a database, table, inserting and querying data using the marks table example.
5) Additional SQL functions, wildcards, limiting results and other advanced query features.
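The marks example in point 4 translates directly to any SQL engine. Here is a sketch against SQLite from Python; the column names are assumptions, not the document's actual schema:

```python
import sqlite3

# The summary's "marks" table, run against in-memory SQLite for portability.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE marks (student TEXT, subject TEXT, mark INTEGER)")
db.executemany(
    "INSERT INTO marks VALUES (?, ?, ?)",
    [("Ann", "maths", 72), ("Ann", "physics", 65), ("Bob", "maths", 58)],
)

# UPDATE, then a filtered, sorted, limited SELECT with a wildcard match.
db.execute("UPDATE marks SET mark = mark + 5 WHERE student = 'Bob'")
rows = db.execute(
    "SELECT student, mark FROM marks WHERE subject LIKE 'ma%' "
    "ORDER BY mark DESC LIMIT 2"
).fetchall()
print(rows)  # → [('Ann', 72), ('Bob', 63)]

# DELETE removes every row failing the predicate; only Ann's maths row stays.
db.execute("DELETE FROM marks WHERE mark < 70")
assert db.execute("SELECT COUNT(*) FROM marks").fetchone()[0] == 1
```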
The document defines variables for use in dashboards for MySQL, Elasticsearch, Graphite, InfluxDB, and Prometheus. It includes variable definitions to select values like hostnames, filters, sources, and time intervals that can be used for filtering and aggregating metrics from each system.
- The document discusses setting up LaTeX to write statistical reports, including data, code, graphics, and a written report
- It covers subsetting data, handling missing values, presenting code clearly with comments and formatting, saving graphics in PDF and PNG formats, and using LaTeX to write the written report portion
- Students are instructed to recreate graphics from previous lectures, modify a template to include them, and start on homework 2 which involves writing a statistical report
This document provides an introduction and overview of MySQL. It discusses that MySQL is a popular open-source database management system, pronounced "my ess que ell". The document then outlines topics like connecting to MySQL, entering queries, creating and using databases and tables, and provides examples of basic queries for selecting, sorting, and counting rows of data.
The document discusses various database topics like collecting email addresses, using functions in queries, collations, generating API keys, using primary keys, and partitioning tables. It provides code examples and explanations for selecting unique records based on email address with different casing, generating unique IDs, and configuring partitions in MySQL tables.
This document contains R code examples for response surface methodology. It includes examples for reading in data, fitting different regression models including first-order and second-order polynomial models, and interpreting the results, including lack of fit tests and plots of estimated response surfaces. Sections cover two-level factorial designs, fractional factorial designs, steepest ascent, and other advanced topics in response surface methodology.
This document discusses the architecture of web applications using MySQL and PHP. It describes the main components including the relational database (MySQL), middleware (PHP), web server (Apache, IIS), and web browser. It also provides examples of SQL statements like SELECT, INSERT, WHERE to query and manipulate data in a MySQL database.
The document summarizes new features and improvements in Cassandra 2.0, including enhanced performance, scalability, and ease of use. Key updates include improved cursors for paging through large result sets, batching of prepared statements, simplified parameterized queries, additional CQL3 functionality, and lightweight transactions. Future plans outlined are secondary indexes on collections, more efficient repairs, custom data types, and aggregate functions in CQL. The document provides examples and explanations of new capabilities such as tracing, tombstone handling, and rapid read protection.
Beyond the Query – Bringing Complex Access Patterns to NoSQL with DataStax - ..., by StampedeCon
Learn how to model beyond traditional direct access in Apache Cassandra. Utilizing the DataStax platform to harness the power of Spark and Solr to perform search, analytics, and complex operations in place on your Cassandra data!
The document provides an overview of a presentation on Apache Cassandra and Spark. It introduces the speaker and their background with Cassandra. The presentation will cover a recap of Cassandra, replication, fault tolerance, data modeling, and Spark integration. It will also look at a potential use case with KillrWeather. Common Cassandra use cases include ordered data like time series for events, financial transactions, and sensor data.
- The document discusses optimization techniques for SQL injection attacks, including reducing injection length, improving data retrieval speed, leveraging data compression, and exploiting vulnerabilities through blind SQL injection.
- Specific techniques mentioned include using shorter SQL functions like SUBSTR() instead of SUBSTRING(), retrieving hashed data a byte at a time using logical AND operations, and ordering queries randomly to retrieve data in a non-sequential manner.
- The document provides examples of exploiting a blind SQL injection vulnerability through techniques like ordering results based on random number seeds and retrieving multi-byte values with a binary search approach.
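The binary-search approach in the last bullet is general: each true/false response halves the candidate range, so one byte costs at most eight probes. A sketch with an abstract oracle standing in for the injected comparison (the SQL fragment in the comment is illustrative, not from the document):

```python
# Binary-search recovery of one byte via a boolean oracle, as in blind
# SQL injection: each probe asks "is the secret byte greater than mid?".
def recover_byte(oracle):
    lo, hi = 0, 255
    probes = 0
    while lo < hi:
        mid = (lo + hi) // 2
        probes += 1
        if oracle(mid):          # e.g. "... AND ASCII(SUBSTR(secret,1,1)) > mid"
            lo = mid + 1
        else:
            hi = mid
    return lo, probes

secret = ord("S")
value, probes = recover_byte(lambda guess: secret > guess)
print(value, probes)  # recovers 83 in exactly 8 probes
```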
DEF CON 27 - Omer Gull - SELECT code_execution FROM * USING SQLite, by Felipe Prado
The document discusses gaining code execution using SQLite database queries. It provides background on SQLite, examines its attack surface when querying an untrusted database, and explores previous work exploiting memory corruptions. The author proposes a technique called "Query Oriented Programming" to leverage SQL queries to implement memory leakage and other exploitation primitives to achieve remote code execution without using traditional scripting languages.
Cassandra By Example: Data Modelling with CQL3, by Eric Evans
CQL is the query language for Apache Cassandra that provides an SQL-like interface. The document discusses the evolution from the older Thrift RPC interface to CQL and provides examples of modeling tweet data in Cassandra using tables like users, tweets, following, followers, userline, and timeline. It also covers techniques like denormalization, materialized views, and batch loading of related data to optimize for common queries.
This document provides an overview of Apache Cassandra including:
- What Cassandra is and how it differs from an RDBMS by not supporting joins, having an optional schema, and being transactionless.
- Cassandra's data model using keyspaces, column families, and static vs dynamic column families.
- How to integrate Cassandra with Java applications using the Hector client and ColumnFamilyTemplate for querying, updating, and deleting data.
- Additional topics covered include the CAP theorem, data storage and compaction, and using CQL via JDBC.
SQL is a language used to interface with relational database systems. It was developed by IBM in the 1970s and is now an industry standard. SQL has three main sublanguages: DDL for defining database schemas, DML for manipulating data, and DCL for controlling access.
Some key points about SQL include:
- DDL commands like CREATE, ALTER, and DROP are used to define and modify database structures.
- DML commands like SELECT, INSERT, UPDATE, and DELETE are used to query and manipulate the data.
- DCL commands like COMMIT, ROLLBACK, GRANT and REVOKE control transactions and user privileges.
- SQL can be used
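The sublanguage split can be demonstrated with SQLite from Python. SQLite has no GRANT/REVOKE, so the control side is represented here only by COMMIT/ROLLBACK-style transaction handling; the staff table is an illustrative assumption:

```python
import sqlite3

# isolation_level=None puts sqlite3 in autocommit mode, so we can issue
# BEGIN/ROLLBACK ourselves without the module opening implicit transactions.
db = sqlite3.connect(":memory:", isolation_level=None)

# DDL: define and modify structure.
db.execute("CREATE TABLE staff (id INTEGER PRIMARY KEY, name TEXT)")
db.execute("ALTER TABLE staff ADD COLUMN role TEXT")

# DML: query and manipulate the data.
db.execute("INSERT INTO staff (name, role) VALUES ('Ada', 'engineer')")
db.execute("UPDATE staff SET role = 'lead' WHERE name = 'Ada'")

# Transaction control (COMMIT/ROLLBACK): an aborted change leaves no trace.
db.execute("BEGIN")
db.execute("DELETE FROM staff")
db.execute("ROLLBACK")
rows = db.execute("SELECT name, role FROM staff").fetchall()
print(rows)  # → [('Ada', 'lead')]
```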
This document discusses third party patches for MySQL that provide quick wins and new features. It summarizes five such patches: 1) Slow query filtering which helps identify expensive queries, 2) Index statistics which helps determine unused indexes, 3) An InnoDB dictionary limit which constrains memory usage, 4) A global long query time setting, and 5) A "fix" for InnoDB group commit performance regressions in MySQL 5.0. The document encourages using third party patches to gain features and improvements not yet available in the MySQL core.
The document summarizes Cassandra developments over the past 5 years, including keynote details from Jonathan Ellis on Cassandra 1.2 and 2.0. Some highlights include improvements to scalability, performance and reliability in Cassandra 1.2, and the introduction of new features in Cassandra 2.0 like lightweight transactions (CAS), improved compaction, and experimental triggers. The keynote outlines changes and removals between the two versions to ease the transition for developers and operators.
1. The document discusses using graphics and data visualization to improve understanding of database performance issues and SQL tuning. It provides examples of how visualizations can clearly show relationships in complex SQL queries and data that are difficult to understand from text or code alone.
2. Key steps in visual SQL tuning are laid out, including drawing tables as nodes, joins as connection lines, and filters as markings on tables. This helps identify optimization opportunities like missing indexes or stale statistics.
3. The document emphasizes that a lack of clarity in visualizing complex data and queries can have devastating consequences, while graphics enable easy understanding and effective problem-solving.
A Cassandra + Solr + Spark Love Triangle Using DataStax Enterprise, by Patrick McFadin
Wait! Back away from the Cassandra secondary index. It's ok for some use cases, but it's not an easy button. "But I need to search through a bunch of columns to look for the data and I want to do some regression analysis… and I can't model that in C*, even after watching all of Patrick McFadin's videos. What do I do?" The answer, dear developer, is in DSE Search and Analytics. With its easy Solr API and Spark integration, you can search and analyze data stored in your Cassandra database to your heart's content. Take our hand. We will show you how.
The document provides an overview of different index types in Postgres including B-Tree, GIN, GiST, and BRIN indexes. It discusses what each index type is best suited for, how to create each type of index, and their internal data structures. Specifically, it covers that B-Tree indexes are good for equality comparisons, GIN indexes store unique values efficiently for arrays/JSON and are useful for containment operators, GiST indexes allow overlapping ranges and are useful for nearest neighbor searches, and BRIN indexes provide scalable indexing for large tables.
Cassandra Community Webinar | Introduction to Apache Cassandra 1.2, by DataStax
Title: Introduction to Apache Cassandra 1.2
Details: Join Aaron Morton, DataStax MVP for Apache Cassandra, and learn the basics of the massively scalable NoSQL database. This webinar will examine C*'s architecture and its strengths for powering mission-critical applications. Aaron will introduce you to core concepts such as Cassandra's data model, multi-datacenter replication, and tunable consistency. He'll also cover new features in Cassandra version 1.2, including virtual nodes, the CQL 3 language, and query tracing.
Speaker: Aaron Morton, Apache Cassandra Committer
Aaron Morton is a Freelance Developer based in New Zealand, and a Committer on the Apache Cassandra project. In 2010, he gave up the RDBMS world for the scale and reliability of Cassandra. He now spends his time advancing the Cassandra project and helping others get the best out of it.
Cassandra Community Webinar - Introduction To Apache Cassandra 1.2, by aaronmorton
This document provides an introduction to Apache Cassandra, including an overview of key concepts like the cluster, nodes, data model, and data modeling best practices. It discusses Cassandra's origins and popularity. The presentation covers the cluster architecture with consistent hashing and token ranges, replication strategies, consistency levels, and more. It also summarizes the Cassandra data model including tables, columns, SSTables, caching, compaction and discusses building a Twitter-like data model in CQL.
[Maclean Liu Tech Share] Clearing the fog of the Oracle CBO optimizer: exploring the secrets of histograms, 0321, by maclean liu
The document discusses histograms in Oracle's cost-based optimizer (CBO). Histograms help improve cardinality estimates when data is skewed, leading to better query plans. They were introduced in Oracle 8 and are now automatically collected, with the number of buckets and type (frequency or height balanced) depending on the number of distinct values. The document provides background on histograms and how the CBO uses them to estimate selectivity and cardinality.
Jonathan Ellis, "Apache Cassandra 2.0 and 2.1". Presented at Cassandra conf ..., by it-people
Modern Apache Cassandra provides a highly scalable and available database. Some key points covered in the document include:
- Cassandra has been under active development since 2008 and is now at version 2.0, with 2.1 upcoming.
- It is used by many companies for applications such as social media features, logging, notifications, and more due to its abilities around scalability, high availability, and tunable consistency.
- Cassandra uses a decentralized architecture with no single point of failure and dynamic partitioning of data across nodes using a token ring approach for high availability without a single point of failure.
- It provides tunable consistency levels, lightweight transactions, and other features for flexibility while maintaining high
This document discusses several features introduced in Oracle Database 12c and 18c that improve the handling of SQL and PL/SQL code. It covers longer identifier names, compile-time resolvable expression sizes, improved overflow handling for listagg, column-level collation, and deprecating code. Examples are provided to demonstrate the usage and benefits of each feature.
Drivers connect applications to Cassandra clusters and maintain connections to nodes. They probe clusters to discover nodes, token ranges, and latency. Drivers are data-aware and can route queries to appropriate replicas or fail over if needed. Cassandra clusters can span multiple data centers for redundancy, workload separation, and geographic distribution of data and queries. Configuration files like cassandra.yaml and cassandra-env.sh are used to configure memory, data storage, caching, and other settings. Cassandra clusters should be provisioned on commodity servers using tools like cassandra-stress to test workloads and estimate needed nodes.
A look at the distributed features of Apache Cassandra drivers and best practices in using those drivers. Presented by Alex Thompson at the Sydney Cassandra Meetup.
Apache Cassandra - Diagnostics and monitoringAlex Thompson
This presentation is intended as a field guide for users of Apache Cassandra.
This guide specifically covers an explanation of diagnostics tools and monitoring tools and methods used in conjunction with Apache Cassandra. It is written in a pragmatic order with the most important tools first. Presented by Alex Thompson at the Sydney Cassandra Meetup
Each Cassandra node is independent but cooperates with other nodes via gossip protocols to share information. The gossip protocol allows nodes to learn about the topology of the cluster and state of other nodes without any single point of failure. Data is replicated across multiple nodes in the cluster according to the replication strategy configured at the keyspace level to ensure resiliency even if multiple nodes fail.
Powerful big data processing and storage combined, this presentation walks thru the basics of integrating Apache Spark and Apache Cassandra. Presented by Alex Thompson at the Sydney Cassandra Meetup.
Building Apache Cassandra clusters for massive scaleAlex Thompson
Covering theory and operational aspects of bring up Apache Cassandra clusters - this presentation can be used as a field reference. Presented by Alex Thompson at the Sydney Cassandra Meetup.
Learn SQL from basic queries to Advance queriesmanishkhaire30
Dive into the world of data analysis with our comprehensive guide on mastering SQL! This presentation offers a practical approach to learning SQL, focusing on real-world applications and hands-on practice. Whether you're a beginner or looking to sharpen your skills, this guide provides the tools you need to extract, analyze, and interpret data effectively.
Key Highlights:
Foundations of SQL: Understand the basics of SQL, including data retrieval, filtering, and aggregation.
Advanced Queries: Learn to craft complex queries to uncover deep insights from your data.
Data Trends and Patterns: Discover how to identify and interpret trends and patterns in your datasets.
Practical Examples: Follow step-by-step examples to apply SQL techniques in real-world scenarios.
Actionable Insights: Gain the skills to derive actionable insights that drive informed decision-making.
Join us on this journey to enhance your data analysis capabilities and unlock the full potential of SQL. Perfect for data enthusiasts, analysts, and anyone eager to harness the power of data!
#DataAnalysis #SQL #LearningSQL #DataInsights #DataScience #Analytics
Global Situational Awareness of A.I. and where its headedvikram sood
You can see the future first in San Francisco.
Over the past year, the talk of the town has shifted from $10 billion compute clusters to $100 billion clusters to trillion-dollar clusters. Every six months another zero is added to the boardroom plans. Behind the scenes, there’s a fierce scramble to secure every power contract still available for the rest of the decade, every voltage transformer that can possibly be procured. American big business is gearing up to pour trillions of dollars into a long-unseen mobilization of American industrial might. By the end of the decade, American electricity production will have grown tens of percent; from the shale fields of Pennsylvania to the solar farms of Nevada, hundreds of millions of GPUs will hum.
The AGI race has begun. We are building machines that can think and reason. By 2025/26, these machines will outpace college graduates. By the end of the decade, they will be smarter than you or I; we will have superintelligence, in the true sense of the word. Along the way, national security forces not seen in half a century will be un-leashed, and before long, The Project will be on. If we’re lucky, we’ll be in an all-out race with the CCP; if we’re unlucky, an all-out war.
Everyone is now talking about AI, but few have the faintest glimmer of what is about to hit them. Nvidia analysts still think 2024 might be close to the peak. Mainstream pundits are stuck on the wilful blindness of “it’s just predicting the next word”. They see only hype and business-as-usual; at most they entertain another internet-scale technological change.
Before long, the world will wake up. But right now, there are perhaps a few hundred people, most of them in San Francisco and the AI labs, that have situational awareness. Through whatever peculiar forces of fate, I have found myself amongst them. A few years ago, these people were derided as crazy—but they trusted the trendlines, which allowed them to correctly predict the AI advances of the past few years. Whether these people are also right about the next few years remains to be seen. But these are very smart people—the smartest people I have ever met—and they are the ones building this technology. Perhaps they will be an odd footnote in history, or perhaps they will go down in history like Szilard and Oppenheimer and Teller. If they are seeing the future even close to correctly, we are in for a wild ride.
Let me tell you what we see.
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Data and AI
Round table discussion of vector databases, unstructured data, ai, big data, real-time, robots and Milvus.
A lively discussion with NJ Gen AI Meetup Lead, Prasad and Procure.FYI's Co-Found
The Building Blocks of QuestDB, a Time Series Databasejavier ramirez
Talk Delivered at Valencia Codes Meetup 2024-06.
Traditionally, databases have treated timestamps just as another data type. However, when performing real-time analytics, timestamps should be first class citizens and we need rich time semantics to get the most out of our data. We also need to deal with ever growing datasets while keeping performant, which is as fun as it sounds.
It is no wonder time-series databases are now more popular than ever before. Join me in this session to learn about the internal architecture and building blocks of QuestDB, an open source time-series database designed for speed. We will also review a history of some of the changes we have gone over the past two years to deal with late and unordered data, non-blocking writes, read-replicas, or faster batch ingestion.
STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...sameer shah
"Join us for STATATHON, a dynamic 2-day event dedicated to exploring statistical knowledge and its real-world applications. From theory to practice, participants engage in intensive learning sessions, workshops, and challenges, fostering a deeper understanding of statistical methodologies and their significance in various fields."
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Data and AI
Discussion on Vector Databases, Unstructured Data and AI
https://www.meetup.com/unstructured-data-meetup-new-york/
This meetup is for people working in unstructured data. Speakers will come present about related topics such as vector databases, LLMs, and managing data at scale. The intended audience of this group includes roles like machine learning engineers, data scientists, data engineers, software engineers, and PMs.This meetup was formerly Milvus Meetup, and is sponsored by Zilliz maintainers of Milvus.
1. Cassandra Data Modelling
CQL is not SQL
Querying simple tables + CQL TRACE (your new best friend)
C* columns and disk storage
C* column nesting (clustering)
Querying clustering columns
RDBMS data modelling: normalize tables then define queries
C* data modelling: define queries then define denormalized tables
C* data modelling: use-case
3. Where is my data?
[Ring diagram: 8 nodes, each owning one token range: 01->09, 10->19, 20->29, 30->39, 40->49, 50->59, 60->69, 70->79]
CREATE KEYSPACE...
CREATE TABLE users (
username text,
password text,
address text,
PRIMARY KEY(username)
);
username password address
john@site.com xxx 35 Arthur St
bill@yahoo.com xxx 21 Jump St
james@gmail.com xxx 18 Smith St
4. Where is my data?
[Ring diagram: 8 nodes, each owning one token range: 01->09, 10->19, 20->29, 30->39, 40->49, 50->59, 60->69, 70->79]
username password address
john@site.com xxx 35 Arthur St
bill@yahoo.com xxx 21 Jump St
james@gmail.com xxx 18 Smith St
Each node owns a range of tokens, and the ring
ALWAYS forms a complete token range.
hash(primary key) -> token
The token produced always falls between the upper
bound and lower bound of the complete token range
(0->79)*
*It doesn’t matter if the PK is a string, int, float, GUID,
blob...it always falls within the token range.
The hash produced is randomized,
so the token for hash(john@site.com) could be
any number between 0-79, but will always be
the same number
-> consistent hashing
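The consistent-hashing behaviour above can be sketched in a few lines of Python. This is an illustrative model only: the 0–79 token range and the 8 node ranges mirror the slides, while real Cassandra hashes with Murmur3 over a vastly larger range.

```python
# Toy consistent-hashing model mirroring the 8-node, 0-79 ring in the slides.
# Real Cassandra uses Murmur3 over a 128-bit token range.
import hashlib

TOKEN_RANGE = 80                                        # tokens 0..79
NODE_RANGES = [(i * 10, i * 10 + 9) for i in range(8)]  # 01->09, 10->19, ...

def token_for(primary_key: str) -> int:
    # The hash is deterministic: the same key always yields the same token.
    digest = hashlib.md5(primary_key.encode()).hexdigest()
    return int(digest, 16) % TOKEN_RANGE

def node_for(primary_key: str) -> int:
    # Walk the ring and find the node whose range owns this token.
    token = token_for(primary_key)
    for node, (lo, hi) in enumerate(NODE_RANGES):
        if lo <= token <= hi:
            return node
    raise AssertionError("token ranges must cover the whole ring")

# Same key -> same token, every time: that is consistent hashing.
assert token_for("john@site.com") == token_for("john@site.com")
```

Whatever the PK type, the token always lands inside the complete range, so exactly one node owns it.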
5. Where is my data?
[Ring diagram: james@gmail.com lands in range 01->09, john@site.com in 20->29, bill@yahoo.com in 70->79]
token = hash(primary key)
eg
hash(john@site.com) = 26
hash(bill@yahoo.com) = 79
hash(james@gmail.com) = 5
username password address
john@site.com xxx 35 Arthur St
bill@yahoo.com xxx 21 Jump St
james@gmail.com xxx 18 Smith St
6. What is the difference between:
SQL> SELECT * FROM users;
and
CQL> SELECT * FROM users;
Answer: Where is my data?
CQL is not SQL
7. SELECT * FROM users;
1. query node 3
username password address
john@site.com xxx 35 Arthur St
bill@yahoo.com xxx 21 Jump St
james@gmail.com xxx 18 Smith St
Where is my data?
[Ring diagram: 8 numbered nodes; james@gmail.com in range 01->09, john@site.com in 20->29, bill@yahoo.com in 70->79]
8. SELECT * FROM users;
1. query node 3
2. query node 8
username password address
john@site.com xxx 35 Arthur St
bill@yahoo.com xxx 21 Jump St
james@gmail.com xxx 18 Smith St
Where is my data?
[Ring diagram: 8 numbered nodes; james@gmail.com in range 01->09, john@site.com in 20->29, bill@yahoo.com in 70->79]
9. SELECT * FROM users;
1. query node 3
2. query node 8
3. query node 1
username password address
john@site.com xxx 35 Arthur St
bill@yahoo.com xxx 21 Jump St
james@gmail.com xxx 18 Smith St
Where is my data?
[Ring diagram: 8 numbered nodes; james@gmail.com in range 01->09, john@site.com in 20->29, bill@yahoo.com in 70->79]
10. Node range 40->49, partition keys stored in sorted order:
Username (PK) | Password | Address
aaaaaaaa | xxx | xxx
aaaaaaab | xxx | xxx
bbbbbbbb | xxx | xxx
ccccccccc | xxx | xxx
zzzzzzzzz | xxx | xxx
SELECT * FROM users;
1. query node 3
2. query node 8
3. query node 1
4. query node 2
This is called a table scan in C* parlance and is
a performance anti-pattern; you can see that you
will very quickly time out the query.
Proper design means this query is unnecessary.
Test all queries by running a TRACE in DevCenter
or at the CQLSH prompt as a sanity check.
Where is my data?
[Ring diagram: 8 numbered nodes; james@gmail.com in range 01->09, john@site.com in 20->29, bill@yahoo.com in 70->79]
11. C* columns and disk storage
C* is very slow at scanning down a list of partition_keys because they are distributed over many partitions / nodes
C* is very fast at scanning across columns for a specific partition_key because they are on a single partition.
aaaaaaab: col1 | col2 | col3 | col4 | col5 | col6 | col7 | col8 | col9 | col10 | col11 | col12 | col13 | col14 | col15 | col16...
-> fast scan (across the columns of one partition_key)
-> slow scan (down the list of partition_keys)
OK, I get that the partition_keys are spread out on different nodes and that's why scanning down them is slow, but that doesn't explain why
scanning across columns is fast for a specific partition_key (e.g. john@site.com).
It all comes down to the on-disk storage of the columns for a specific partition_key.
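The locality argument can be made concrete with a toy Python model (not Cassandra's real storage engine): every column of a partition lives together on one node, while the partitions themselves are spread across nodes.

```python
# Toy model: each partition's columns are contiguous on a single node, so
# reading all columns of one partition_key touches one node, while reading
# one column across many partition_keys touches many nodes.
cluster = {
    0: {"aaaaaaab": {"col1": "xx", "col2": "xx", "col3": "xx"}},
    1: {"bbbbbbbb": {"col1": "xx"}},
    2: {"cccccccc": {"col1": "xx"}},
}

def nodes_touched_for_partition(pk):
    # All columns of one partition: at most one node involved.
    return [n for n, parts in cluster.items() if pk in parts]

def nodes_touched_for_column(col):
    # One column across all partitions: every node involved.
    return [n for n, parts in cluster.items()
            if any(col in cols for cols in parts.values())]

print(nodes_touched_for_partition("aaaaaaab"))  # [0] -> fast scan
print(nodes_touched_for_column("col1"))         # [0, 1, 2] -> slow scan
```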
13. C* column nesting (clustering)
CREATE TABLE sessions (
username text,
session_id text,
url text,
time_spent int,
browser text,
PRIMARY KEY(username, session_id)
);
PRIMARY KEY(<partition_key>, <clustering column>)
john@site.com
  session2: url=xxx, time_spent=xxx, browser=xxx
  session3: ...
SELECT * FROM sessions WHERE username='john@site.com' AND session_id='session2';
The partition_key dictates which node the data is stored on; the clustering column dictates how the data is stored and sorted under the partition key.
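The nesting above can be sketched as a toy Python model: one partition per username, with rows kept in sorted order by the clustering column session_id (hypothetical data; Cassandra's real engine differs).

```python
# Toy model of PRIMARY KEY(username, session_id): the partition key picks
# the partition, the clustering column sorts rows inside it.
from collections import defaultdict

table = defaultdict(dict)

def insert(username, session_id, **cols):
    table[username][session_id] = cols

insert("john@site.com", "session2", url="xxx", time_spent=42, browser="xxx")
insert("john@site.com", "session1", url="xxx", time_spent=7, browser="xxx")

# Reading one partition returns its rows in clustering (sorted) order,
# regardless of insertion order.
rows = sorted(table["john@site.com"].items())
print([sid for sid, _ in rows])  # ['session1', 'session2']
```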
15. C* column nesting (clustering) - queries
CREATE TABLE sessions (
username text,
session_id text,
url text,
time_spent int,
browser text,
PRIMARY KEY(username, session_id)
);
GOOD:
SELECT * FROM sessions WHERE username='john@site.com';
SELECT * FROM sessions WHERE username='john@site.com' AND session_id='session2';
WRONG:
SELECT * FROM sessions WHERE session_id='session2';
RULE: The partition_key and every clustering column prior to the most granular clustering column you restrict must be present in the query.
16. C* column nesting (clustering) - queries
CREATE TABLE timeline (
day text,
hour int,
min int,
sec int,
reading text,
PRIMARY KEY (day, hour, min, sec)
);
PRIMARY KEY(<partition_key>, <cl. column>, <cl. column>, <cl. column>)
day1
  hour1
    min1
      sec1: reading
      sec2: reading
day2
  ...
SELECT * FROM timeline WHERE day=day1 AND hour=hour1 AND min=min1 AND sec=sec1;
17. C* column nesting (clustering) - queries
CREATE TABLE timeline (
day text,
hour int,
min int,
sec int,
value text,
PRIMARY KEY (day, hour, min, sec)
);
GOOD:
SELECT * FROM timeline WHERE day=day1;
SELECT * FROM timeline WHERE day=day1 AND hour=hour1;
SELECT * FROM timeline WHERE day=day1 AND hour=hour1 AND min=min1;
SELECT * FROM timeline WHERE day=day1 AND hour=hour1 AND min=min1 AND sec=sec1;
WRONG:
SELECT * FROM timeline WHERE day=day1 AND min=min1;
RULE: Clustering columns must be present in the query in the same order as the PRIMARY KEY
CAREFUL: be aware how much data you are returning !!
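The prefix rule above can be expressed as a small checker for the timeline table's key columns. This is a toy illustration of the rule, not real CQL validation.

```python
# Prefix rule for PRIMARY KEY (day, hour, min, sec): a query must restrict
# the partition key plus a contiguous prefix of the clustering columns.
KEY_COLUMNS = ["day", "hour", "min", "sec"]

def is_valid_restriction(restricted: set) -> bool:
    # Walk key columns in PRIMARY KEY order; once one is missing,
    # no later column may be restricted.
    seen_gap = False
    for col in KEY_COLUMNS:
        if col in restricted:
            if seen_gap:
                return False
        else:
            seen_gap = True
    # The partition key itself must always be restricted.
    return KEY_COLUMNS[0] in restricted

assert is_valid_restriction({"day"})                        # GOOD
assert is_valid_restriction({"day", "hour", "min", "sec"})  # GOOD
assert not is_valid_restriction({"day", "min"})             # skips hour: WRONG
assert not is_valid_restriction({"sec"})                    # no partition key: WRONG
```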
18. C* column nesting (clustering) - queries
Notes and limitations
(partition_key:aaaaaaab)
(column=col1, value=xx, timestamp=1357866010549000)
(column=col2, value=xx, timestamp=1357866010549000)
(column=col3, value=xx, timestamp=1357866010549000)
(column=col4, value=xx, timestamp=1357866010549000)
(column=col5, value=xx, timestamp=1357866010549000)
(column=col6, value=xx, timestamp=1357866010549000)
(column=col7, value=xx, timestamp=1357866010549000)
(column=col8, value=xx, timestamp=1357866010549000)
(column=col9, value=xx, timestamp=1357866010549000)
(column=col10, value=xx, timestamp=1357866010549000)
(column=col11, value=xx, timestamp=1357866010549000)
(column=col12, value=xx, timestamp=1357866010549000)
(column=col13, value=xx, timestamp=1357866010549000)
(column=col14, value=xx, timestamp=1357866010549000)
(column=col15, value=xx, timestamp=1357866010549000)
(column=col16, value=xx, timestamp=1357866010549000)
RULE: Always design your tables so that you limit the amount of data stored under a single partition_key to the size of the
in_memory_compaction_limit_in_mb that is set in cassandra.yaml (default 64 MB).
Why? Compaction (which we will cover later) needs to be able to process a complete partition_key and all its underlying data in memory;
swapping to and from disk introduces serious performance degradation and poor JVM GC behaviour.
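A quick back-of-envelope check against the 64 MB default makes the rule concrete. The 200-byte average row size below is a made-up example figure; substitute your own measured row size.

```python
# How many clustered rows fit under one partition_key before it outgrows
# the default in_memory_compaction_limit_in_mb of 64 MB?
LIMIT_BYTES = 64 * 1024 * 1024   # cassandra.yaml default, in bytes
AVG_ROW_BYTES = 200              # hypothetical average clustered-row size

max_rows_per_partition = LIMIT_BYTES // AVG_ROW_BYTES
print(max_rows_per_partition)    # 335544
```

At one reading per second in the timeline table, a day-sized partition (86,400 rows) stays comfortably under that ceiling; a month-sized one would not at larger row sizes, which is why the partition key granularity matters.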
19. Where is my data?
CQL TRACE will show you where your data is,
how costly it is to get it in terms of time and
how many node hops it is going to take to get it.
20. Sane queries on simple tables + introducing indexes
CREATE TABLE users (
username text,
password text,
address text,
age int,
PRIMARY KEY(username)
);
SELECT * FROM users WHERE username='john@site.com';
SELECT address, age FROM users WHERE username='bill@yahoo.com';
CREATE INDEX age_key ON users(age);
SELECT * FROM users WHERE age=35;
CAREFUL: think about how much data you are returning and where that data is coming from...if you don't
know, or can't work it out, run a TRACE at the CQLSH console or run the query under DevCenter 1.3...
21. CQL TRACE - your new best friend
TRACE provides a description of each step taken to satisfy the request, the names of the nodes affected, the time for each step,
and the total time for the request. TRACE is the most powerful tool in a data modeller's hands. (The trace below is of an INSERT.)
activity | timestamp | source | source_elapsed (microseconds)
-------------------------------------+--------------+-----------+----------------
execute_cql3_query | 16:41:00,754 | 127.0.0.1 | 0
Parsing statement | 16:41:00,754 | 127.0.0.1 | 48
Preparing statement | 16:41:00,755 | 127.0.0.1 | 658
Determining replicas for mutation | 16:41:00,755 | 127.0.0.1 | 979
Message received from /127.0.0.1 | 16:41:00,756 | 127.0.0.3 | 37
Acquiring switchLock read lock | 16:41:00,756 | 127.0.0.1 | 1848
Sending message to /127.0.0.3 | 16:41:00,756 | 127.0.0.1 | 1853
Appending to commitlog | 16:41:00,756 | 127.0.0.1 | 1891
Sending message to /127.0.0.2 | 16:41:00,756 | 127.0.0.1 | 1911
Adding to emp memtable | 16:41:00,756 | 127.0.0.1 | 1997
Acquiring switchLock read lock | 16:41:00,757 | 127.0.0.3 | 395
Message received from /127.0.0.1 | 16:41:00,757 | 127.0.0.2 | 42
Appending to commitlog | 16:41:00,757 | 127.0.0.3 | 432
Acquiring switchLock read lock | 16:41:00,757 | 127.0.0.2 | 168
Adding to emp memtable | 16:41:00,757 | 127.0.0.3 | 522
Appending to commitlog | 16:41:00,757 | 127.0.0.2 | 211
Adding to emp memtable | 16:41:00,757 | 127.0.0.2 | 359
Enqueuing response to /127.0.0.1 | 16:41:00,758 | 127.0.0.3 | 1282
Enqueuing response to /127.0.0.1 | 16:41:00,758 | 127.0.0.2 | 1024
Sending message to /127.0.0.1 | 16:41:00,758 | 127.0.0.3 | 1469
Sending message to /127.0.0.1 | 16:41:00,758 | 127.0.0.2 | 1179
Message received from /127.0.0.2 | 16:41:00,765 | 127.0.0.1 | 10966
Message received from /127.0.0.3 | 16:41:00,765 | 127.0.0.1 | 10966
Processing response from /127.0.0.2 | 16:41:00,765 | 127.0.0.1 | 11063
Processing response from /127.0.0.3 | 16:41:00,765 | 127.0.0.1 | 11066
Request complete | 16:41:00,765 | 127.0.0.1 | 11139
22. CQL TRACE - how do I invoke it?
Option 1: CQLSH
All C* installs come with a command-line client
called cqlsh. You can run any CQL command against
a Cassandra cluster using cqlsh; to invoke TRACE:
cqlsh>TRACE ON;
cqlsh>SELECT * FROM mytable WHERE id=1;
After running the query, cqlsh will return with
both the query results and the TRACE results.
Option 2: DevCenter 1.3+
For the GUI-inclined (like me), DevCenter
automatically runs a TRACE on every query, in a tab
behind the execution/results screen; you can see
the formatted results there.
24. Cassandra data modelling
1. Define CQL queries
2. Design de-normalised tables for each query
3. Build the consuming application
no JOINs -> queries first -> then denormalize tables
26. Data modelling use-case #1 - Music data
Q1
CREATE TABLE performers_by_style (
style TEXT,
name TEXT,
PRIMARY KEY(style, name)
)
WITH CLUSTERING ORDER BY (name ASC);
(partition_key:style1)
(column=name1:, value=, timestamp=1357866010549000)
(column=name2:, value=, timestamp=1357866010549000)
(column=name3:, value=, timestamp=1357866010549000)
(partition_key:style2)
(column=name4:, value=, timestamp=1357866010549000)
(column=name5:, value=, timestamp=1357866010549000)
(column=name6:, value=, timestamp=1357866010549000)
SELECT * FROM performers_by_style WHERE style='rock';
27. Data modelling use-case #1 - Music data
Q2
CREATE TABLE performer (
name TEXT,
type TEXT,
country TEXT,
style LIST<TEXT>,
founded INT,
born INT,
died TEXT,
PRIMARY KEY (name)
);
SELECT * FROM performer WHERE name='someName';
(partition_key:someName)
(column=type, value=, timestamp=1357866010549000)
...
28. Data modelling use-case #1 - Music data
Q3
CREATE TABLE album (
title TEXT,
year INT,
performer TEXT,
genre TEXT,
tracks map<INT,TEXT>,
PRIMARY KEY((title,year))
);
(partition_key:myTitle:2014)
(column=performer, value=Blondie, timestamp=1357866010549000)
(column=genre, value=rock, timestamp=1357866010549000)
(column=tracks, value={1:track1, 2:track2, 3:track3}, timestamp=1357866010549000)
(partition_key:title56:1999)
...
SELECT * FROM album WHERE title='myTitle' AND year=2014;
29. Data modelling use-case #1 - Music data
Q4
CREATE TABLE albums_by_performer (
performer TEXT,
year INT,
title TEXT,
genre TEXT,
PRIMARY KEY(performer, year, title)
)
WITH CLUSTERING ORDER BY (year DESC, title ASC);
SELECT * FROM albums_by_performer WHERE performer='myPerformer';
(partition_key:myPerformer)
(column=year1:, value=, timestamp=1357866010549000)
(column=title1:, value=, timestamp=1357866010549000)
(column=genre, value=rock, timestamp=1357866010549000)
(partition_key:performer2)
30. Data modelling use-case #1 - Music data
Q5
CREATE TABLE albums_by_genre (
genre TEXT,
performer TEXT,
year INT,
title TEXT,
PRIMARY KEY(genre, performer, year, title)
)
WITH CLUSTERING ORDER BY (performer ASC, year DESC, title ASC);
(partition_key:myGenre)
(column=performer1, value=, timestamp=1357866010549000)
(column=year1, value=, timestamp=1357866010549000)
(column=title1, value=, timestamp=1357866010549000)
(partition_key:genre2)
SELECT * FROM albums_by_genre WHERE genre='myGenre';
31. Data modelling use-case #1 - Music data
Q6
CREATE TABLE albums_by_track (
track TEXT,
performer TEXT,
year INT,
title TEXT,
PRIMARY KEY(track, performer, year, title)
)
WITH CLUSTERING ORDER BY (performer ASC, year DESC, title ASC);
(partition_key:myTrack)
(column=performer1, value=, timestamp=1357866010549000)
(column=year1:, value=, timestamp=1357866010549000)
(column=title1, value=, timestamp=1357866010549000)
(partition_key:track2)
SELECT * FROM albums_by_track WHERE track='myTrack';
32. Data modelling use-case #1 - Music data
Q7
CREATE TABLE tracks_by_album (
album TEXT,
year INT,
number INT,
performer TEXT,
genre TEXT,
title TEXT,
PRIMARY KEY((album, year), number)
)
WITH CLUSTERING ORDER BY (number ASC);
(partition_key:myAlbum:2014)
(column=number1, value=, timestamp=1357866010549000)
(column=performer, value=performer1, timestamp=1357866010549000)
(column=genre, value=genre1, timestamp=1357866010549000)
(column=title, value=title1, timestamp=1357866010549000)
(column=number2, value=, timestamp=1357866010549000)
(column=performer, value=performer1, timestamp=1357866010549000)
(column=genre, value=genre1, timestamp=1357866010549000)
(column=title, value=title1, timestamp=1357866010549000)
SELECT title, year FROM tracks_by_album WHERE album='myAlbum' AND year=2015;
34. Cassandra is not an RDBMS. Cassandra is vastly more powerful than any RDBMS in existence, with the
proven ability in production to run 1000+ node clusters.
But as a Cassandra data modeller you need to *think different*: you need to think distributed and
denormalized, but ultimately you need to ask the question:
“Where is my data?”