Victor shares developer-focused, database-agnostic performance tips from his experience consulting on a database migration project for a major telecom.
2. Me
ThoughtWorks Consultant
Supported Database Customer Migration Project at Shaw
Cable (Major Canadian telecom) for ~2 years
OOP background
Limited (typical?) SQL experience prior to Shaw
Today: sharing lessons learned from the project
3. Customer Migration @ Shaw in a Nutshell
Large Data Volume
Tables with 10 million+ rows
80+ GB of data per city
Full build (migration) took ~6hrs
MSSQL, MySQL, Oracle, proprietary databases, and flat files
4. Database Development Misconceptions
Only testing with small datasets, but using large ones in production
.. expecting the same performance
5. Reality
Everything changes when large data is involved
Ideally, performance test against comparable real-world volume
80/20 rule
6. Database Development Misconceptions
ORMs will “automagically” take care of everything
var sessionFactory = Fluently.Configure()
.Mappings(m => m.AutoMappings
.Add(AutoMap.AssemblyOf<Product>()))
.BuildSessionFactory();
7. Reality
ORMs are great for small-data greenfield projects
ORMs hide optimization abilities and database-specific features
Performance tweaks become increasingly important the larger the data you are dealing with
8. Reduce Row Size
Use the smallest possible datatype
(SMALLINT instead of BIGINT, etc.)
Use INTEGER instead of GUID for keys
Use fixed length where possible
(CHAR instead of VARCHAR)
Prefer NOT NULL columns over nullable ones
10. NULL != NULL in SQL
SELECT 'SOME RESULT' WHERE NULL = NULL
No rows returned
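This is easy to see for yourself with Python's built-in sqlite3 module; the behaviour is the same in mainstream SQL databases, where `NULL = NULL` evaluates to "unknown" rather than true:

```python
import sqlite3

con = sqlite3.connect(":memory:")

# NULL = NULL evaluates to NULL ("unknown"), not TRUE, so the WHERE filters the row out
rows = con.execute("SELECT 'SOME RESULT' WHERE NULL = NULL").fetchall()

# IS NULL is the correct way to test for NULL
rows_is = con.execute("SELECT 'SOME RESULT' WHERE NULL IS NULL").fetchall()

print(rows)     # []
print(rows_is)  # [('SOME RESULT',)]
```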
11. Empty String != Empty String in Oracle
SELECT 'SOME RESULT' FROM DUAL WHERE '' = ''
No rows returned
(Oracle treats the empty string as NULL)
12. Avoid N+1 Problem
// N+1: one query for the employees, then one more query per employee
for each (SELECT * FROM EMPLOYEES)
    for each (SELECT * FROM SALES WHERE SALES.EMP_ID = employee.EMP_ID)
        // do something with sale

// Better: a single JOIN
for each (SELECT * FROM SALES JOIN EMPLOYEES USING (EMP_ID))
    // do something with sale
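The two approaches can be sketched with Python's sqlite3 module (the table and column names here are illustrative, not from the Shaw project); both produce the same rows, but the N+1 version issues one query per employee:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE EMPLOYEES (EMP_ID INTEGER PRIMARY KEY, NAME TEXT);
    CREATE TABLE SALES (SALE_ID INTEGER PRIMARY KEY, EMP_ID INTEGER, AMOUNT REAL);
    INSERT INTO EMPLOYEES VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO SALES VALUES (10, 1, 99.0), (11, 1, 25.0), (12, 2, 10.0);
""")

# N+1: one query for the employees, then one more query (round trip) per employee
n_plus_1 = []
for (emp_id, _name) in con.execute("SELECT * FROM EMPLOYEES").fetchall():
    for sale in con.execute("SELECT * FROM SALES WHERE EMP_ID = ?", (emp_id,)):
        n_plus_1.append(sale)

# Single JOIN: one query and one round trip for the same rows
joined = con.execute("""
    SELECT SALES.SALE_ID, SALES.EMP_ID, SALES.AMOUNT
    FROM SALES JOIN EMPLOYEES ON SALES.EMP_ID = EMPLOYEES.EMP_ID
""").fetchall()
```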
13. Give the database as much work as possible
Reduce SQL database calls / network round trips
Defer query decisions to the database; let it choose the optimum evaluation plan
Prefer SQL code to procedural logic (loops, cursors, separate calls)
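A minimal sketch of procedural versus set-based work, using Python's sqlite3 (the tables and the 1.1 multiplier are invented for illustration); both end with the same data, but the loop issues one UPDATE per row while the set-based form is a single statement the database can plan as a whole:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE A (ID INTEGER PRIMARY KEY, AMOUNT REAL);
    CREATE TABLE B (ID INTEGER PRIMARY KEY, AMOUNT REAL);
""")
rows = [(i, float(i)) for i in range(100)]
con.executemany("INSERT INTO A VALUES (?, ?)", rows)
con.executemany("INSERT INTO B VALUES (?, ?)", rows)

# Procedural: fetch every row, compute in the application, write back one by one
# (in a client/server database this is one network round trip per row)
for (row_id, amount) in con.execute("SELECT ID, AMOUNT FROM A").fetchall():
    con.execute("UPDATE A SET AMOUNT = ? WHERE ID = ?", (amount * 1.1, row_id))

# Set-based: one statement; the database chooses the evaluation plan
con.execute("UPDATE B SET AMOUNT = AMOUNT * 1.1")
```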
14. Use BULK operations
sqlldr.exe (Oracle SQL*Loader), bcp.exe (MSSQL), BULK INSERT
FAST insertion of static data
Cleaner code (CSV files instead of INSERT INTO statements)
~10x performance gains observed
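sqlldr and bcp are external command-line tools, but the same idea — loading static data from a CSV in one batched call inside one transaction, instead of row-by-row INSERTs — can be sketched with sqlite3's `executemany` (the city data is invented for illustration):

```python
import csv
import io
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE CITIES (NAME TEXT, POPULATION INTEGER)")

# Static data kept as CSV (as with sqlldr/bcp input files) rather than INSERT statements
csv_data = "Calgary,1306784\nEdmonton,1010899\nVancouver,662248\n"
reader = csv.reader(io.StringIO(csv_data))

# executemany batches all rows through one prepared statement;
# the with-block wraps them in a single transaction (no commit per row)
with con:
    con.executemany("INSERT INTO CITIES VALUES (?, ?)", reader)
```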
15. Add Indexes where needed
Index analogy: the index at the end of a book
Indexes SIGNIFICANTLY speed up searching
(using 'WHERE' criteria)
ORMs don’t add indexes for non-key columns
Determine common searches
~100x performance gains observed
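The effect is easy to observe with EXPLAIN QUERY PLAN; a sketch in Python's sqlite3 (schema invented for illustration) shows the plan switch from a full table scan to an index search once the non-key column is indexed:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE EMPLOYEES (EMP_ID INTEGER PRIMARY KEY, NAME TEXT)")
con.executemany("INSERT INTO EMPLOYEES VALUES (?, ?)",
                [(i, f"emp{i}") for i in range(1000)])

# Without an index on NAME, the WHERE clause scans every row
plan_before = con.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM EMPLOYEES WHERE NAME = 'emp42'").fetchall()

# ORMs typically index only keys; common non-key search columns need explicit indexes
con.execute("CREATE INDEX IDX_EMP_NAME ON EMPLOYEES (NAME)")
plan_after = con.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM EMPLOYEES WHERE NAME = 'emp42'").fetchall()

print(plan_before[0][-1])  # a SCAN of the whole table
print(plan_after[0][-1])   # a SEARCH using IDX_EMP_NAME
```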
16. Gotcha - Indexes ignored with function usage
Example:
...WHERE UPPER(name) = 'BOB'
The 'name' index will not be used!!
ORMs sometimes insert lower/upper behind the
scenes!!
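The same EXPLAIN QUERY PLAN check shows the gotcha in sqlite3 (illustrative schema): wrapping the indexed column in a function forces a full scan, while the bare column comparison uses the index:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE EMPLOYEES (EMP_ID INTEGER PRIMARY KEY, NAME TEXT)")
con.execute("CREATE INDEX IDX_EMP_NAME ON EMPLOYEES (NAME)")

# Wrapping the column in a function hides it from the index: full table scan
plan_fn = con.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM EMPLOYEES WHERE UPPER(NAME) = 'BOB'").fetchall()

# The bare column comparison can use the index
plan_plain = con.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM EMPLOYEES WHERE NAME = 'Bob'").fetchall()

print(plan_fn[0][-1])     # SCAN: IDX_EMP_NAME is ignored
print(plan_plain[0][-1])  # SEARCH using IDX_EMP_NAME
```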
18. Statistics
Databases use table statistics (row counts, data ranges, data distribution, etc.) to determine the optimum query evaluation plan
Statistics are not automatically updated when data is changed/inserted!! (unlike indexes)
19. Manually update Statistics on large data changes
Out-of-date statistics can cause the database to make inefficient query evaluation decisions
#1 performance optimization at Shaw
~100x performance gains observed in some situations
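The exact command is database-specific (e.g. DBMS_STATS in Oracle, UPDATE STATISTICS in MSSQL); SQLite's equivalent is ANALYZE, sketched here with an invented schema. ANALYZE records row counts and index selectivity in the `sqlite_stat1` table for the query planner to use:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE SALES (SALE_ID INTEGER PRIMARY KEY, EMP_ID INTEGER)")
con.execute("CREATE INDEX IDX_SALES_EMP ON SALES (EMP_ID)")
con.executemany("INSERT INTO SALES VALUES (?, ?)",
                [(i, i % 10) for i in range(1000)])

# Manually gather statistics after the large insert
con.execute("ANALYZE")

# The gathered stats are visible in sqlite_stat1:
# 'stat' starts with the total row count, followed by index selectivity figures
stats = con.execute("SELECT tbl, idx, stat FROM sqlite_stat1").fetchall()
print(stats)
```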
20. Other Performance Options
Disable/Remove Constraints
Remove/Minimize Row-based Trigger Usage
Disable Logging
~1.5x performance gains observed
Good choice for test environments
(Be careful!!!)
Use SSDs
~2x performance gains observed
Use RAM
(Be careful!!!)
21. SQL is not Dead
NoSQL is an alternative to, not a replacement for, SQL/RDBMS
22. SQL / RDBMS best for:
Row based data
Static table definitions
Complex Table Relationships / Joins
Transactions