2. Overview
Database systems are designed to cope with tables in excess of hundreds of millions of rows.
Over time, tables bloat, so indexes need to be assessed continually or created proactively.
PostgreSQL and SQL Server use different terminology, but the methods are fundamentally the same.
Indexes provide pointers to rows or ranges of data in a table.
FILLFACTOR controls how much free space is left on the data pages so that rows can be moved around.
5. Finding the slow query
nHibernate and others cause “problems” only if tables and indexes aren’t updated to reflect what nHibernate will do.
In PostgreSQL it is sometimes difficult to trap the query causing the problem; setting log_min_duration_statement makes the server log trap long-running queries.
In SQL Server, Profiler can be used in real time to catch the query.
Transactions: are any open?
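As a minimal sketch of the PostgreSQL side, assuming a superuser session: the parameter is named log_min_duration_statement, and the 500 ms threshold here is only illustrative. The pg_stat_activity column names are those of PostgreSQL 9.2 and later; earlier releases expose procpid and current_query instead.

```sql
-- Log every statement in this session that runs for 500 ms or longer
-- (the value is in milliseconds; requires superuser).
-- To cover all sessions, set the same parameter in postgresql.conf and reload.
SET log_min_duration_statement = 500;

-- List sessions that currently have a transaction open, oldest first.
-- Column names are those of PostgreSQL 9.2 and later.
SELECT pid, usename, xact_start, state, query
FROM pg_stat_activity
WHERE xact_start IS NOT NULL
ORDER BY xact_start;
```

A long-lived row at the top of the second result is a candidate for the "any open?" check before blaming the query itself.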
8. Not the worst… SELECT this_.id as id172_4_, this_.email as email172_4_, this_.postcode as postcode172_4_, this_.annual_mileage as annual4_172_4_, this_.created_at as created5_172_4_, this_.modified_at as modified6_172_4_, this_.vehicle_year as vehicle7_172_4_, this_.years_no_claims as years8_172_4_, this_.no_claims_protected as no9_172_4_, this_.vehicle_value as vehicle10_172_4_, this_.voluntary_excess as voluntary11_172_4_, this_.is_completed as is12_172_4_, this_.is_callcentre as is13_172_4_, this_.renewal_date as renewal14_172_4_, this_.vehicle_registration as vehicle15_172_4_, this_.policy_start_date as policy16_172_4_, this_.cap_id as cap17_172_4_, this_.cover_type_id as cover18_172_4_, this_.overnight_location_id as overnight19_172_4_, this_.vehicle_usage_id as vehicle20_172_4_, this_.access_point_id as access21_172_4_, this_.status_id as status22_172_4_, paymentdet3_.id as id164_0_, paymentdet3_.quote_id as quote2_164_0_, paymentdet3_.account_name as account3_164_0_, paymentdet3_.account_number as account4_164_0_, paymentdet3_.sort_code as sort5_164_0_, paymentdet3_.bank_name as bank6_164_0_, paymentdet3_.branch as branch164_0_, paymentdet3_.charge_percentage as charge8_164_0_, paymentdet3_.number_of_installments as number9_164_0_, paymentdet3_.start_date as start10_164_0_, paymentdet3_.renewal_date as renewal11_164_0_, paymentdet3_.bank_address_id as bank12_164_0_, paymentdet3_.loan_amount as loan13_164_0_, paymentdet3_.deposit as deposit164_0_, paymentdet3_.installment_amount as install15_164_0_, personalde4_.id as id167_1_, personalde4_.telephone as telephone167_1_, personalde4_.quote_id as quote3_167_1_, personalde4_.address_id as address4_167_1_, covernoten5_.id as id76_2_, covernoten5_.quote_id as quote2_76_2_, covernoten5_.campaign_id as campaign3_76_2_, covernoten5_.sequence_number as sequence4_76_2_, d1_.id as id86_3_, d1_.forename as forename86_3_, d1_.surname as surname86_3_, d1_.date_of_birth as date4_86_3_, d1_.is_female as is5_86_3_, 
d1_.length_of_licence as length6_86_3_, d1_.accidents_count as accidents7_86_3_, d1_.ordinal as ordinal86_3_, d1_.employers_business_id as employers9_86_3_, d1_.occupation_id as occupation10_86_3_, d1_.quote_id as quote11_86_3_, d1_.licence_type_id as licence12_86_3_, d1_.title_id as title13_86_3_ FROM quotes this_ left outer join payment_details paymentdet3_ on this_.id=paymentdet3_.quote_id left outer join personal_details personalde4_ on this_.id=personalde4_.quote_id left outer join covernote_numbers covernoten5_ on this_.id=covernoten5_.quote_id inner join drivers d1_ on this_.id=d1_.quote_id WHERE d1_.surname ilike 'test%' and this_.id in ( SELECT distinct this_0_.id as y0_ FROM quotes this_0_ inner join campaign_quotes campaignqu1_ on this_0_.id=campaignqu1_.quote_id inner join campaigns campaign2_ on campaignqu1_.campaign_id=campaign2_.id WHERE this_0_.modified_at between '01/01/2010 00:00:00' and '15/01/2010 00:00:00' and (this_0_.status_id = 1 or campaignqu1_.selected_quote = True) and campaign2_.is_drive_away = True) ORDER BY this_.modified_at desc
12. When indexes don’t work…
Each database system uses what’s called an “optimiser”.
Factors influencing the optimiser’s choice of execution plan: statistics, trivial plan match, caching strategies, available indexes.
The spread of data is important.
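To make "spread of data" concrete, the sketch below reads the statistics PostgreSQL keeps for one column. It assumes the quotes table and status_id column from the example query, and that ANALYZE has been run on the table at least once.

```sql
-- Inspect what the planner knows about the distribution of quotes.status_id.
-- n_distinct, most_common_vals and most_common_freqs describe the spread of data.
SELECT attname, null_frac, n_distinct, most_common_vals, most_common_freqs
FROM pg_stats
WHERE tablename = 'quotes'
  AND attname = 'status_id';
```

If one value dominates most_common_freqs, an index on that column is unlikely to be chosen for predicates on that value.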
13. Further Reading http://developer.postgresql.org/pgdocs/postgres/indexes-examine.html http://wiki.postgresql.org/wiki/Image:Explaining_EXPLAIN.pdf http://www.simple-talk.com/sql/learn-sql-server/sql-server-index-basics/ http://www.vbforums.com/showthread.php?t=361513 – when to use index hints in sql server http://www.mssqltips.com/tip.asp?tip=1206 – understanding sql server indexing
Editor's Notes
Hopefully introduce some concepts around index performance and tuning.
Many tables can exceed 100 million rows without performance issues. There are mechanisms to split a table up over different disks (partitioning), but this is for VLDBs (yes, Very Large Databases).
By planning the database and queries effectively, indexes (with frequent stats updates) will stand up for themselves.
Be aware that the terminology differs between SQL Server and PostgreSQL, even though both adhere to SQL:2008 “as far as possible”.
A clustered index is akin to the entries of a dictionary, whereas a non-clustered index is like the index in the back of a book. There is one clustered index per table, since the data is physically sorted on disk by the clustered index. Non-clustered indexes per table: SQL 2008: up to 999 / PostgreSQL 8.3:
Fill factor governs the percentage of each data or index page kept free for data movement operations. Heavily updated indexes may need around 20% of each page left free (a fill factor of 80), whereas static indexes can get away with 0% free (a fill factor of 100).
Transact-SQL versus PostgreSQL's PL/pgSQL (based on Oracle PL/SQL).
PostgreSQL conforms as far as possible to SQL:2008: “PostgreSQL development aims for conformance with the latest official version of the standard where such conformance does not contradict traditional features or common sense”.
The statements for creating non-clustered indexes differ. The index naming convention is now the convention at Codeweavers.
Creating a primary key on a table in SQL Server creates a clustered index by default; in PostgreSQL it creates a unique index, since a primary key constraint is simply a combination of a unique constraint and a not-null constraint. Only one primary key is allowed per table in both systems.
FILLFACTOR: a fill factor of 80 means leaving 20% of each index page empty for updates and inserts, so that minimal reshuffling of existing data is needed as new records are added or indexed fields are updated.
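The difference in index creation syntax can be sketched as follows. The index names are hypothetical (they are not the Codeweavers convention, which these notes do not spell out), and the drivers table and surname column are borrowed from the example query.

```sql
-- PostgreSQL: a plain (non-clustered) B-tree index, leaving 20% of each page free.
CREATE INDEX ix_drivers_surname
    ON drivers (surname)
    WITH (fillfactor = 80);
```

```sql
-- SQL Server (T-SQL): the equivalent non-clustered index with the same fill factor.
CREATE NONCLUSTERED INDEX IX_drivers_surname
    ON dbo.drivers (surname)
    WITH (FILLFACTOR = 80);
```

The two statements target different servers and are not meant to run as one script.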
Profiler can capture all traffic in real time; at the time of writing, PostgreSQL v9.0 offers limited functionality.
Both error logs can show long-running transactions, but both need various settings turned on at boot time.
Both systems can show current activity on the server and the resources needed.
In PostgreSQL, due to multi-version concurrency control, VACUUM is a necessary evil: it ensures that dead rows are cleaned up and, with ANALYZE, that stats are updated.
Statistics are very important because they show the spread of data. Often these are checked before looking at the underlying tables. Tables with large updates/deletes need to have their stats updated.
EXPLAIN with ANALYZE shows the actual “time”; if the actual figures are very different from the estimates, this is an indication that the stats are out of date.
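A short PostgreSQL sketch of the maintenance and checking steps described here, using the drivers table from the example query (the selected columns are only illustrative):

```sql
-- Reclaim dead rows and refresh the planner statistics in one pass.
VACUUM ANALYZE drivers;

-- Run the query for real and report actual times and row counts
-- next to the planner's estimates for each plan node.
EXPLAIN ANALYZE
SELECT id, surname
FROM drivers
WHERE surname ILIKE 'test%';
```

In the output, a large gap between the estimated rows and the actual rows on a node is the usual hint that the statistics are stale.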
“Hidden” query generation sometimes makes it harder (read: slower) to extract slow or long-running queries.
PG: parse the log for long-running queries.
SQL: run Profiler to see where the long queries are, based on the duration (ms) column.
Open transactions may be causing locking, so your query might not be slow, but blocked.
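On the SQL Server side, one way to check whether a query is blocked rather than slow is sketched below. It assumes SQL Server 2005 or later and permission to view server state; it is an illustration, not the procedure used in the talk.

```sql
-- Report the oldest active transaction in the current database, if any.
DBCC OPENTRAN;

-- Show requests that are currently blocked and the session blocking each one.
SELECT r.session_id,
       r.blocking_session_id,
       r.wait_type,
       r.wait_time,          -- milliseconds spent waiting so far
       t.text AS sql_text
FROM sys.dm_exec_requests AS r
CROSS APPLY sys.dm_exec_sql_text(r.sql_handle) AS t
WHERE r.blocking_session_id <> 0;
```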
Reads left to right
Reads right to left
Example nhibernate query for YCMI2 – QuoteSeach functionality
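The example query filters on d1_.surname with ILIKE 'test%'. A plain B-tree index on surname generally cannot serve a case-insensitive prefix match of that kind in PostgreSQL, so one possible approach (an assumption here, not something the slides prescribe) is an expression index on lower(surname) together with a rewritten predicate. The index name is hypothetical, and with nHibernate the predicate change would have to be made in the criteria that generate the query.

```sql
-- Hypothetical expression index for case-insensitive prefix searches on surname.
-- text_pattern_ops lets LIKE 'prefix%' use the index regardless of locale.
CREATE INDEX ix_drivers_surname_lower
    ON drivers (lower(surname) text_pattern_ops);

-- The predicate must match the indexed expression for the index to be considered.
SELECT id, surname
FROM drivers
WHERE lower(surname) LIKE 'test%';
```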
Reads left to right.
Thick lines early on, thinning to thin lines later on, indicate incorrect index usage.
Table scans are bad.
Index scans are slightly better.
Index seeks are what we’re after.
Log into the VM PG server.
Show the server log from PGAdmin. Make the point about needing PGAdmin 1.10.3: always use an up-to-date version.
Show the postgresql.conf values, with log_min_duration_statement set to 500 ms.
Follow the SQL script now.
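The same values can also be checked from a query window instead of opening postgresql.conf. This is a small sketch; the LIKE pattern is just one way to narrow the list to logging parameters.

```sql
-- Current value of the slow-query logging threshold for this session.
SHOW log_min_duration_statement;

-- Logging-related settings, with their units and where each value came from.
SELECT name, setting, unit, source
FROM pg_settings
WHERE name LIKE 'log_%'
ORDER BY name;
```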
Using the same script will show the SQL Server differences.
First show the table future_residuals and the auto growth settings.
Show that when BCPing data in there are little pauses as the log file is expanded.
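The auto growth settings can be inspected and changed in T-SQL as sketched below. The database name ExampleDb, the logical file name ExampleDb_log, and the 512 MB increment are all hypothetical; the right increment depends on the load being bulk copied in.

```sql
-- Size and growth settings for the files of the current database.
-- size and growth are in 8 KB pages unless is_percent_growth = 1.
SELECT name, type_desc, size, growth, is_percent_growth
FROM sys.database_files;

-- Use a larger fixed growth increment for the log so that it expands less often.
-- ExampleDb and ExampleDb_log are hypothetical names.
ALTER DATABASE ExampleDb
MODIFY FILE (NAME = ExampleDb_log, FILEGROWTH = 512MB);
```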
The optimiser is responsible for interpreting the query and deciding on the best course of action:
Statistics: the spread of the data in the tables.
Trivial plan match: this means checking the query to see if it’s “simple”, so that an in-depth plan is not required.
Caching strategies: execution plan caching.
Available indexes: rebuilding and reindexing.
An index or covering index may cover the columns you’re interested in, but if the number of rows coming back is a large percentage of the total row count, the optimiser may decide it’s quicker to table scan.
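The last point can be observed in PostgreSQL with a sketch like the following. The index name is hypothetical, the table and column come from the example query, and which plan appears depends entirely on the actual data, so no particular output is claimed.

```sql
-- Hypothetical index on a column used for filtering in the example query.
CREATE INDEX ix_quotes_status_id ON quotes (status_id);
ANALYZE quotes;

-- If status_id = 1 matches only a small fraction of rows, the planner
-- will typically pick an index scan or bitmap scan here.
EXPLAIN SELECT * FROM quotes WHERE status_id = 1;

-- A predicate matching most of the table will typically produce a
-- sequential scan, even though the index is available.
EXPLAIN SELECT * FROM quotes WHERE status_id >= 0;
```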