Dare to build vertical design with relational data (Entity-Attribute-Value), by Ivo Andreev
The Entity-Attribute-Value model is often called an “anti-pattern” by its critics, and they are probably right if one fails to read the “Handle with Care” label on it. Enthusiastic but inexperienced developers can easily compromise the benefits of a relational DB, but the coin has another side: hierarchical objects with thousands of properties, an unknown schema, flexibility, and millions of records. As always, we have to sacrifice one thing in order to gain another, and then it all comes down to priorities and the ability to make decisions. In this lecture you will not get a step-by-step manual, but rather ideas for how to build one yourself. A challenge, a proof of concept, hard work and a successful project for millions: that is the story to share.
Designing an extensible, flexible schema that supports user customization is a common requirement, but it's easy to paint yourself into a corner.
Examples of extensible database requirements:
- A database that allows users to declare new fields on demand.
- Or an e-commerce catalog with many products, each with distinct attributes.
- Or a content management platform that supports extensions for custom data.
The solutions we use to meet these requirements are overly complex and perform terribly. How should we find the right balance between schema and schemaless database design?
I'll briefly cover the disadvantages of Entity-Attribute-Value (EAV), a problematic design that's an example of the antipattern called the Inner-Platform Effect. That is, modeling an attribute-management system on top of the RDBMS architecture, which already provides attributes through columns, data types, and constraints.
Then we'll discuss the pros and cons of alternative data modeling patterns, with respect to developer productivity, data integrity, storage efficiency and query performance, and ease of extensibility.
- Class Table Inheritance
- Serialized BLOB
- Inverted Indexing
Finally we'll show tools like pt-online-schema-change and new features of MySQL 5.6 that take the pain out of schema modifications.
By Pedro Martins
In this session we will cover how to identify bottlenecks and how to analyze execution plans and the performance of SQL Server 2012. We will also compare the different index types and how they can help improve server performance. Finally, we will look at some stored procedure tricks.
Agenda:
Execution plans
Indexes
Stored procedure optimization
Find and fix SQL Server performance problems faster, by SolarWinds
Great DBAs must be able to quickly identify problems with SQL Server instances. In this presentation, you will learn how to quickly identify where your problems are using tools such as:
*Dynamic Management Views
*Query Execution Plans
*Windows Performance Monitor
*Extended Events
*Third-party tools (including SolarWinds Database Performance Analyzer)
This presentation covers the fundamentals of SQL tuning, such as SQL Processing, Optimizer and Execution Plan, Accessing Tables, Performance Improvement Considerations, and Partition Technique. Presented by Alphalogic Inc: https://www.alphalogicinc.com/
Brad McGehee's presentation on "How to Interpret Query Execution Plans in SQL Server 2005/2008".
Presented to the San Francisco SQL Server User Group on March 11, 2009.
There are many data modeling and database design terms and jargon that use the word "key." Do you know the difference between a surrogate key and a primary key? A super key and a candidate key? Could you explain them to a technical audience? A business user or an auditor?
In this presentation, Karen Lopez covers the concepts of primary keys, foreign keys, candidate keys, surrogate keys, and more.
A tutorial for SQL learners, written in a very easy way. It contains all the SQL commands, such as DDL, DML, etc., with suitable examples.
At the end there are 3 sets of questions with their solutions and explanations. Each set contains 40+ questions.
I will begin with a brief overview of SQL. Then I will cover the five major topics a data scientist should understand when working with relational databases: basic statistics in SQL, data preparation in SQL, advanced filtering and data aggregation, window functions, and preparing data for use with analytics tools.
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ..., by Subhajit Sahu
Abstract: Levelwise PageRank is an alternative method of PageRank computation which decomposes the input graph into a directed acyclic block-graph of strongly connected components, and processes them in topological order, one level at a time. This enables calculation of ranks in a distributed fashion without per-iteration communication, unlike the standard method where all vertices are processed in each iteration. It does, however, come with the precondition that the input graph has no dead ends. Here, the native non-distributed performance of Levelwise PageRank was compared against Monolithic PageRank on a CPU as well as a GPU. To ensure a fair comparison, Monolithic PageRank was also performed on a graph where vertices were split by components. Results indicate that Levelwise PageRank is about as fast as Monolithic PageRank on the CPU, but quite a bit slower on the GPU. The slowdown on the GPU is likely caused by the submission of a large number of small workloads, and is expected to be a non-issue when the computation is performed on massive graphs.
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23..., by John Andrews
SlideShare Description for "Chatty Kathy - UNC Bootcamp Final Project Presentation"
Title: Chatty Kathy: Enhancing Physical Activity Among Older Adults
Description:
Discover how Chatty Kathy, an innovative project developed at the UNC Bootcamp, aims to tackle the challenge of low physical activity among older adults. Our AI-driven solution uses peer interaction to boost and sustain exercise levels, significantly improving health outcomes. This presentation covers our problem statement, the rationale behind Chatty Kathy, synthetic data and persona creation, model performance metrics, a visual demonstration of the project, and potential future developments. Join us for an insightful Q&A session to explore the potential of this groundbreaking project.
Project Team: Jay Requarth, Jana Avery, John Andrews, Dr. Dick Davis II, Nee Buntoum, Nam Yeongjin & Mat Nicholas
Adjusting primitives for graph : SHORT REPORT / NOTES, by Subhajit Sahu
Graph algorithms, like PageRank Compressed Sparse Row (CSR) is an adjacency-list based graph representation that is
Multiply with different modes (map)
1. Performance of sequential execution based vs OpenMP based vector multiply.
2. Comparing various launch configs for CUDA based vector multiply.
Sum with different storage types (reduce)
1. Performance of vector element sum using float vs bfloat16 as the storage type.
Sum with different modes (reduce)
1. Performance of sequential execution based vs OpenMP based vector element sum.
2. Performance of memcpy vs in-place based CUDA based vector element sum.
3. Comparing various launch configs for CUDA based vector element sum (memcpy).
4. Comparing various launch configs for CUDA based vector element sum (in-place).
Sum with in-place strategies of CUDA mode (reduce)
1. Comparing various launch configs for CUDA based vector element sum (in-place).
Adjusting OpenMP PageRank : SHORT REPORT / NOTES, by Subhajit Sahu
For massive graphs that fit in RAM, but not in GPU memory, it is possible to take
advantage of a shared memory system with multiple CPUs, each with multiple cores, to
accelerate pagerank computation. If the NUMA architecture of the system is properly taken
into account with good vertex partitioning, the speedup can be significant. To take steps in
this direction, experiments are conducted to implement pagerank in OpenMP using two
different approaches, uniform and hybrid. The uniform approach runs all primitives required
for pagerank in OpenMP mode (with multiple threads). On the other hand, the hybrid
approach runs certain primitives in sequential mode (i.e., sumAt, multiply).
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Data and AI
Discussion on Vector Databases, Unstructured Data and AI
https://www.meetup.com/unstructured-data-meetup-new-york/
This meetup is for people working in unstructured data. Speakers will present on related topics such as vector databases, LLMs, and managing data at scale. The intended audience of this group includes roles like machine learning engineers, data scientists, data engineers, software engineers, and PMs. This meetup was formerly Milvus Meetup, and is sponsored by Zilliz, maintainers of Milvus.
2. AGENDA
About my company – ANS-ASIA, and me
About SQL Performance
Indexing
Rewrite query
SQL Server tools to improve performance
3. ABOUT
100% subsidiary of ANS Japan
Address : 10F CMC Tower, Duy Tan Street
Foundation : November 2012
Employees : 30
Business function :
Software development
Enterprise system (100% Japanese customers up to now)
(Sales management system, Enterprise system for transportation
industry, Tuition management system for university...)
IT consulting
4. ABOUT ME
Name: Trinh Hong Chuong
Skill: 6 years of experience in software development
(.Net (VB.Net, C#), T-SQL, PL-SQL, VBA)
Beginner in PHP, PostgreSQL
Interests: reading books, listening to music,
walking alone, travelling, …
5. SQL PERFORMANCE
Assess the problem and establish numeric values
that categorize acceptable behavior.
Measure the performance of the system before
modification.
Identify the part of the system that is critical for
improving the performance. This is called
the bottleneck.
Modify that part of the system to remove the
bottleneck.
Measure the performance of the system after
modification.
If the modification makes the performance better,
adopt it. If the modification makes the
performance worse, put it back the way it was.
6. WHAT’S INDEXING
An index is a shortcut to the real data
Data structure: B-Tree
Types of indexes: Clustered, Non-Clustered, XML
index, Fulltext index
7. WHY’S INDEXING
An index is used to speed up searching in the
database.
Indexes can be helpful for a variety of queries
that contain SELECT, UPDATE, DELETE, or
MERGE statements.
Fewer items in the primary key
8. CLUSTERED INDEX
Clustered indexes sort and store the data rows in
the table or view based on their key values.
root
  Id (from 1 to 4) | Id (from 5 to 7)
Leaf level (data rows):
  Id 1, Name Bill, Dept Dev
  Id 2, Name Jobs, Dept HR
  Id 7, Name Gate, Dept R&D
9. NON-CLUSTERED INDEX
A nonclustered index contains the nonclustered
index key values and each key value entry has a
pointer to the data row that contains the key
value.
root
  Name (from A to F) | Name (from G to M) | Name (from N to Z)
Leaf level (key value with pointer to the data row):
  Name Bill, Id 1
  Name Gate, Id 7
  Name Jobs, Id 2
10. IMPROVE INDEX
Create Highly-Selective Indexes
Indexing on columns used in the WHERE clause of
your critical queries frequently improves
performance.
Selectivity is the ratio of qualifying rows to total
rows. If the ratio is low, the index is highly selective.
Create Multiple-Column Indexes
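The selectivity ratio defined above can be computed directly from row counts. The following is a minimal sketch, assuming SQLite (via Python's sqlite3 module) as a stand-in for SQL Server; the sample rows, the four department names, and the `selectivity` helper are hypothetical and only serve the illustration.

```python
import sqlite3

# SQLite stands in for SQL Server here; the data is made up for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Employees (Id INTEGER PRIMARY KEY, Name TEXT NOT NULL, Dept TEXT)"
)
depts = ["Dev", "HR", "R&D", "Sales"]
conn.executemany(
    "INSERT INTO Employees (Id, Name, Dept) VALUES (?, ?, ?)",
    [(i, f"A{i:05d}", depts[i % 4]) for i in range(1, 1001)],
)


def selectivity(where_clause, params=()):
    """Return qualifying rows / total rows (a lower ratio is more selective)."""
    total = conn.execute("SELECT COUNT(*) FROM Employees").fetchone()[0]
    matching = conn.execute(
        f"SELECT COUNT(*) FROM Employees WHERE {where_clause}", params
    ).fetchone()[0]
    return matching / total


id_ratio = selectivity("Id = ?", (42,))        # one row out of 1000
dept_ratio = selectivity("Dept = ?", ("HR",))  # a quarter of the table
print(id_ratio, dept_ratio)
```

With this data the predicate on Id has a much lower ratio than the predicate on Dept, so by the definition above an index on Id is the more selective of the two.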
11. REWRITE QUERY
Use a search argument (SARG)
SARG operators include =, >, <, >=, <=, IN,
BETWEEN, and sometimes LIKE (in cases of prefix
matching, such as LIKE 'Bill%')
Non-SARG operators include NOT, <>, NOT EXISTS,
NOT IN, NOT LIKE, and intrinsic functions
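One way to see the difference between SARG and non-SARG predicates is to ask the engine for its plan. The sketch below is an assumption-laden illustration: it uses SQLite's EXPLAIN QUERY PLAN rather than SQL Server's execution plans, and the index name `IX_Employees_Name` and the `access_method` helper are hypothetical. SQLite reports SEARCH when it can seek into an index and SCAN when it has to visit every row.

```python
import sqlite3

# SQLite stands in for SQL Server; names below are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Employees (Id INTEGER PRIMARY KEY, Name TEXT NOT NULL, Dept TEXT)"
)
conn.execute("CREATE INDEX IX_Employees_Name ON Employees (Name)")


def access_method(sql):
    """Return the first word of SQLite's plan detail: 'SEARCH' or 'SCAN'."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    # The last column of each plan row is a human-readable detail string.
    return rows[0][-1].split()[0]


# Equality is a search argument, so the index on Name can be used for a seek.
sarg_plan = access_method("SELECT Id FROM Employees WHERE Name = 'Bill'")
# <> and a function wrapped around the column are not search arguments.
not_equal_plan = access_method("SELECT Id FROM Employees WHERE Name <> 'Bill'")
function_plan = access_method("SELECT Id FROM Employees WHERE upper(Name) = 'BILL'")
print(sarg_plan, not_equal_plan, function_plan)
```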
12. REWRITE QUERY
Rewrite sub-query into JOIN
Bad sample:
SELECT "Order ID"
FROM Orders O
WHERE EXISTS (SELECT "Order ID"
              FROM "Order Details" OD
              WHERE O."Order ID" = OD."Order ID"
              AND Discount >= 0.25)
Good sample:
SELECT DISTINCT O."Order ID"
FROM Orders O
INNER JOIN "Order Details" OD
ON O."Order ID" = OD."Order ID"
WHERE Discount >= 0.25
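When rewriting a sub-query into a JOIN, it is worth checking that both forms return the same rows. The following sketch runs the two samples against a tiny made-up data set, assuming SQLite as a stand-in for SQL Server; the three orders and their discounts are hypothetical. It also shows why the JOIN form needs DISTINCT: an order with several qualifying detail lines would otherwise appear more than once.

```python
import sqlite3

# SQLite stands in for SQL Server; the rows are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Orders ("Order ID" INTEGER PRIMARY KEY);
CREATE TABLE "Order Details" ("Order ID" INTEGER NOT NULL, Discount REAL NOT NULL);
INSERT INTO Orders VALUES (1), (2), (3);
-- Order 1 has two discounted lines, order 2 a small discount, order 3 no lines.
INSERT INTO "Order Details" VALUES (1, 0.25), (1, 0.30), (2, 0.10);
""")

exists_sql = '''
SELECT "Order ID" FROM Orders O
WHERE EXISTS (SELECT "Order ID" FROM "Order Details" OD
              WHERE O."Order ID" = OD."Order ID" AND Discount >= 0.25)
'''
join_sql = '''
SELECT DISTINCT O."Order ID" FROM Orders O
INNER JOIN "Order Details" OD ON O."Order ID" = OD."Order ID"
WHERE Discount >= 0.25
'''
# Same JOIN without DISTINCT, to show the duplicate rows it would produce.
join_no_distinct_sql = join_sql.replace("DISTINCT ", "")

exists_ids = sorted(r[0] for r in conn.execute(exists_sql))
join_ids = sorted(r[0] for r in conn.execute(join_sql))
dup_ids = sorted(r[0] for r in conn.execute(join_no_distinct_sql))
print(exists_ids, join_ids, dup_ids)
```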
13. REWRITE QUERY
Don't use intrinsic functions or type conversions on an indexed column
Bad sample:
DECLARE @limitId BIGINT = 10
SELECT Name FROM Employees
WHERE Id - 1 = @limitId
Good sample:
DECLARE @limitId BIGINT = 10
SELECT Name FROM Employees
WHERE Id = @limitId + 1
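The effect of moving the arithmetic off the indexed column can be checked in a small experiment. This is a sketch under stated assumptions: SQLite replaces SQL Server, a bound parameter replaces `@limitId`, and the 100 sample rows and the `first_plan_word` helper are hypothetical. Both queries return the same row, but only the second lets the engine seek on the primary key.

```python
import sqlite3

# SQLite stands in for SQL Server; the rows are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employees (Id INTEGER PRIMARY KEY, Name TEXT NOT NULL)")
conn.executemany(
    "INSERT INTO Employees VALUES (?, ?)",
    [(i, f"A{i:05d}") for i in range(1, 101)],
)

limit_id = 10  # plays the role of @limitId
bad_sql = "SELECT Name FROM Employees WHERE Id - 1 = ?"   # expression on the column
good_sql = "SELECT Name FROM Employees WHERE Id = ? + 1"  # bare column on one side


def first_plan_word(sql):
    """Return 'SEARCH' (index seek) or 'SCAN' (every row visited)."""
    row = conn.execute("EXPLAIN QUERY PLAN " + sql, (limit_id,)).fetchone()
    return row[-1].split()[0]


bad_rows = conn.execute(bad_sql, (limit_id,)).fetchall()
good_rows = conn.execute(good_sql, (limit_id,)).fetchall()
bad_plan = first_plan_word(bad_sql)
good_plan = first_plan_word(good_sql)
print(bad_rows, good_rows, bad_plan, good_plan)
```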
15. REWRITE QUERY
Index the ORDER-BY / GROUP-BY
CREATE INDEX Emp_Name ON Employees ("Last Name" ASC, "First Name" ASC)
Can help optimize:
... ORDER BY / GROUP BY "Last Name" ...
... ORDER BY / GROUP BY "Last Name", "First Name" ...
Will not help optimize:
... ORDER BY / GROUP BY "First Name" ...
... ORDER BY / GROUP BY "First Name", "Last Name" ...
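The leading-column rule for ORDER BY can be observed by checking whether the engine adds a separate sort step. The sketch below assumes SQLite as a stand-in for SQL Server, so the plan text differs from what SQL Server shows; the `needs_sort` helper is hypothetical. SQLite reports a temporary B-tree in its plan when the index cannot supply the requested order.

```python
import sqlite3

# SQLite stands in for SQL Server; the index mirrors the Emp_Name example.
conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE Employees (Id INTEGER PRIMARY KEY, "Last Name" TEXT, "First Name" TEXT)'
)
conn.execute('CREATE INDEX Emp_Name ON Employees ("Last Name" ASC, "First Name" ASC)')


def needs_sort(order_by):
    """True if SQLite adds an explicit sort step (a temp B-tree) for this ORDER BY."""
    sql = f'SELECT "Last Name", "First Name" FROM Employees ORDER BY {order_by}'
    plan = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return any("TEMP B-TREE" in row[-1] for row in plan)


results = {
    "last": needs_sort('"Last Name"'),
    "last_first": needs_sort('"Last Name", "First Name"'),
    "first": needs_sort('"First Name"'),
    "first_last": needs_sort('"First Name", "Last Name"'),
}
print(results)
```

Orderings that start with the index's leading column are read straight from the index, while orderings that start with "First Name" need the extra sort.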
16. REWRITE QUERY
Index the DISTINCT
CREATE INDEX Emp_Name ON Employees ("Last Name" ASC, "First Name" ASC)
Can help optimize:
... DISTINCT "Last Name", "First Name" ...
... DISTINCT "First Name", "Last Name" ...
Will not help optimize:
... DISTINCT "First Name" ...
... DISTINCT "Last Name" ...
17. SQL Server tools to improve performance
Execution plan
CREATE TABLE Employees
(
Id BIGINT NOT NULL,
Name VARCHAR(20) NOT NULL,
Dept VARCHAR(10),
CONSTRAINT [PK_Employee] PRIMARY KEY
CLUSTERED
(Id ASC)
)
CREATE TABLE Employees_Mid
(
Id BIGINT NOT NULL,
Name VARCHAR(20) NOT NULL,
Dept VARCHAR(10),
CONSTRAINT [PK_Employee_Mid] PRIMARY KEY
CLUSTERED
(Id ASC)
)
Query 01
INSERT INTO Employees(Id, Name, Dept)
SELECT Id, Name, Dept FROM Employees_Mid
WHERE Employees_Mid.Id = 1000
Query 02
INSERT INTO Employees(Id, Name, Dept)
SELECT Id, Name, Dept FROM Employees_Mid
WHERE Employees_Mid.Name = 'A00001'