Performance by Design
Guy Harrison
Director, R&D Melbourne
www.guyharrison.net
Introductions
http://www.motivatedphotos.com/?id=17760
Not worrying, just wondering...
• How will Oracle respond to Hadoop?
• Will Oracle play in the NoSQL database world?
• What will happen to MySQL?
• What will happen to red-shirt TOAD?
Core message
• Design limits performance
• Architecture maps requirements to design
• Make sure performance requirements are specified
• Make sure architecture allows for performance
• Make sure performance requirements are realized
Elements of Performance by Design
Methodology
High performance can mean different things
Speed: response time
Efficiency: power consumption
Power: throughput
Not usually easy to change architectures
Poorly defined requirements lead to this:
The fail whale
Twitter growth
“Twitter is, fundamentally, a messaging system. Twitter was not architected as a messaging system, however. For expediency's sake, Twitter was built with technologies and practices that are more appropriate to a content management system.”
Patterns of database performance
• Hard to distinguish patterns at low levels
Database Design
Normalize, but not too far!
Other logical design thoughts
• Artificial keys
  • Generally more efficient than long composite keys
• Null values
  • Not a good idea if you intend to search for “unknown” or “incomplete” values
  • Null should not mean something
  • But beneficial as long as you don’t need to look for them
• Data types
  • Constraints on precision can sometimes reduce row lengths
  • Variable length strings usually better
  • Carefully consider CLOBs vs long VARCHARs
Logical to Physical: Subtypes
“Customers are people too”
Indexing, clustering and weird table types
• Lots of options: B*-Tree index, Bitmap index, Hash cluster, Index cluster, Nested table, Index Organized Table
• Most often useful: B*-Tree (concatenated) indexes, Bitmap indexes, Hash clusters
Concatenated index effectiveness
    SELECT cust_id
    FROM sh.customers c
    WHERE cust_first_name = 'Connor'
    AND cust_last_name = 'Bishop'
    AND cust_year_of_birth = 1976;
Concatenated indexing guidelines
• Create a concatenated index for columns from a table that appear together in the WHERE clause.
• If columns sometimes appear on their own in a WHERE clause, place them at the start of the index.
• The more selective a column is, the more useful it will be at the leading end of the index (better single key lookups).
• But indexes compress better when the leading columns are less selective (better scans).
• Index skip scans can make use of an index even if the leading columns are not specified, but a skip scan is a poor second choice to a “normal” index range scan.
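The effect of column order in a concatenated index can be seen in a small experiment. The sketch below is an illustration only: it uses Python's built-in sqlite3 module as a stand-in for Oracle (an assumption, since the deck targets Oracle), and the index name cust_name_yob_idx and the plan() helper are hypothetical. SQLite's optimizer differs from Oracle's, but the leading-column principle is the same.

```python
import sqlite3

# In-memory SQLite database standing in for Oracle (assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE customers (
           cust_id            INTEGER PRIMARY KEY,
           cust_first_name    TEXT,
           cust_last_name     TEXT,
           cust_year_of_birth INTEGER)"""
)
# Hypothetical concatenated index covering all three WHERE-clause columns.
conn.execute(
    """CREATE INDEX cust_name_yob_idx
           ON customers (cust_last_name, cust_first_name, cust_year_of_birth)"""
)


def plan(sql):
    """Return the 'detail' text of each step in SQLite's query plan."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


# All indexed columns are supplied: the index can be used for a direct lookup.
all_columns = plan(
    "SELECT cust_id FROM customers "
    "WHERE cust_first_name = 'Connor' AND cust_last_name = 'Bishop' "
    "AND cust_year_of_birth = 1976"
)

# Only the trailing column is supplied: no lookup on the leading columns is possible.
trailing_only = plan(
    "SELECT cust_id FROM customers WHERE cust_year_of_birth = 1976"
)

print(all_columns)
print(trailing_only)
```

The first plan should report a SEARCH using the index, while the second falls back to a SCAN, which is why columns that appear on their own in WHERE clauses belong at the leading end.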
Bitmap indexes
Bitmap indexes
Bitmap join performance
    SELECT SUM (amount_sold)
    FROM customers JOIN sales s USING (cust_id)
    WHERE cust_email='flint.jeffreys@company2.com';
Index overhead
Hash Cluster
• Cluster key determines physical location on disk
• Single IO lookup by cluster key
• Misconfiguration leads to overflow or sparse tables
(Figure panels: Sparse, Overflow)
Hash Cluster vs B-tree index
Hash cluster table scan
Denormalization and partitioning
• Repeating groups: VARRAYS, nested tables
• Summary tables: Materialized Views, Result cache
• Horizontal partitioning: Oracle Partition Option
• In-line aggregations: Dimensions
• Derived columns: Virtual columns
• Vertical partitioning
• Replicated columns: triggers
Summary tables
• Aggregate queries on big tables are often the most expensive
• Pre-computing them makes a lot of sense
• Balance accuracy with overhead
(Figure: options arranged between accuracy and efficiency: aggregate query, MV on COMMIT, manual summary, result set cache, MV stale tolerated)
Vertical partitioning
Physical storage options
• LOB storage
• PCTFREE
• Compression
• Block size
• Partitioning
Application Architecture and implementation
The best SQL is no SQL
• Avoid asking for the same data twice.
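One way to avoid asking for the same data twice is to cache slowly changing reference data in the application. The sketch below is an assumption-laden illustration rather than the deck's own code: it uses Python's sqlite3 module in place of Oracle, and the countries table and country_name() function are hypothetical.

```python
import sqlite3
from functools import lru_cache

# In-memory SQLite database standing in for Oracle (assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE countries (code TEXT PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO countries VALUES ('AU', 'Australia')")


@lru_cache(maxsize=1024)
def country_name(code):
    """Look up a country name; repeated calls for the same code skip the database."""
    row = conn.execute(
        "SELECT name FROM countries WHERE code = ?", (code,)
    ).fetchone()
    return row[0] if row else None


# 1,000 lookups of the same key result in a single database query.
names = [country_name("AU") for _ in range(1000)]
info = country_name.cache_info()
print(info.hits, info.misses)
```

A cache like this is only appropriate when slightly stale data is acceptable, since it is never refreshed unless country_name.cache_clear() is called.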
11g client side cache
• CLIENT_RESULT_CACHE_SIZE: the amount of memory each client program will dedicate to the cache
• Use the RESULT_CACHE hint or (11g R2) table property
• Optionally set CLIENT_RESULT_CACHE_LAG
Parse overhead
• It’s easy enough in most programming languages to create a unique SQL for every query:
Bind variables are preferred
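The difference between literal SQL and bind variables can be sketched as follows. This is a minimal illustration under stated assumptions: Python's sqlite3 module stands in for Oracle, the `?` placeholder is SQLite's bind syntax (Oracle drivers typically use named binds such as :cust_id), and the table contents are made up.

```python
import sqlite3

# In-memory SQLite database standing in for Oracle (assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers (cust_id INTEGER PRIMARY KEY, cust_last_name TEXT)"
)
conn.executemany(
    "INSERT INTO customers VALUES (?, ?)",
    [(i, f"name{i}") for i in range(1, 101)],
)

literal_sql = set()  # distinct statement texts produced by concatenation
bound_sql = set()    # distinct statement texts produced with bind variables

for cust_id in range(1, 101):
    # Anti-pattern: the value is embedded in the text, so every statement is unique
    # and must be parsed separately.
    sql = f"SELECT cust_last_name FROM customers WHERE cust_id = {cust_id}"
    literal_sql.add(sql)
    conn.execute(sql).fetchone()

    # Preferred: one statement text, with the value supplied as a bind variable.
    sql = "SELECT cust_last_name FROM customers WHERE cust_id = ?"
    bound_sql.add(sql)
    conn.execute(sql, (cust_id,)).fetchone()

print(len(literal_sql), len(bound_sql))
```

The literal form generates 100 distinct statements for the database to parse, while the bind form generates one that can be reused. Bind variables also remove the SQL injection risk that comes with string concatenation.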
Parse overhead reduction
Identifying similar SQLs
• See force_matching.sql at www.guyharrison.net
Transaction design
• Optimistic vs. Pessimistic
Using ORA_ROWSCN
• Setting ROWDEPENDENCIES will reduce false failures
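The optimistic pattern can be sketched without Oracle. In the illustration below, an explicit row_version column plays the role that ORA_ROWSCN plays in Oracle; this is a swapped-in technique, not the ORA_ROWSCN mechanism itself. SQLite (via Python's sqlite3 module), the accounts table, and the function names are all assumptions made for the example.

```python
import sqlite3

# In-memory SQLite database standing in for Oracle (assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "id INTEGER PRIMARY KEY, balance INTEGER, row_version INTEGER)"
)
conn.execute("INSERT INTO accounts VALUES (1, 100, 1)")


def read_account(account_id):
    """Read the row without locking it, remembering the version that was seen."""
    return conn.execute(
        "SELECT balance, row_version FROM accounts WHERE id = ?", (account_id,)
    ).fetchone()


def optimistic_update(account_id, new_balance, version_seen):
    """Update only if nobody changed the row since it was read."""
    cur = conn.execute(
        "UPDATE accounts SET balance = ?, row_version = row_version + 1 "
        "WHERE id = ? AND row_version = ?",
        (new_balance, account_id, version_seen),
    )
    return cur.rowcount == 1  # zero rows updated means another session got there first


# Two sessions read the same row, then both try to update it.
balance_a, version_a = read_account(1)
balance_b, version_b = read_account(1)

first_ok = optimistic_update(1, balance_a + 50, version_a)
second_ok = optimistic_update(1, balance_b - 30, version_b)
final_balance = read_account(1)[0]
print(first_ok, second_ok, final_balance)
```

The second update fails because the version it read is stale, so the application must re-read and retry. The pessimistic alternative would lock the row at read time (SELECT ... FOR UPDATE in Oracle), trading concurrency for guaranteed success.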
Network – stored procedures
Network traffic example
Array processing - Fetch
Network overhead – Array processing
Array Insert (Java)
Array Insert: (.NET)
Array Insert – PL/SQL
Array Insert Performance
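The array fetch and array insert ideas above can be sketched in a few lines. This is an illustration under assumptions: Python's sqlite3 module stands in for an Oracle driver, the sales table and ARRAY_SIZE value are made up, and because SQLite runs in-process there is no real network round trip here, so the sketch counts calls rather than measuring time.

```python
import sqlite3

# In-memory SQLite database standing in for Oracle (assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (sale_id INTEGER PRIMARY KEY, amount_sold REAL)")

rows = [(i, i * 1.5) for i in range(1, 1001)]

# Array insert: hand the driver the whole batch in one call
# instead of issuing 1,000 single-row INSERT calls.
conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)

# Array fetch: pull rows back in batches of ARRAY_SIZE rather than one at a time.
ARRAY_SIZE = 100
cur = conn.execute("SELECT sale_id, amount_sold FROM sales ORDER BY sale_id")
fetch_calls = 0
fetched = []
while True:
    batch = cur.fetchmany(ARRAY_SIZE)
    if not batch:
        break
    fetch_calls += 1
    fetched.extend(batch)

print(len(fetched), fetch_calls)
```

Against a networked database each batch typically maps to one round trip, so 1,000 rows at an array size of 100 cost roughly 10 round trips instead of 1,000.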

Editor's Notes

  • #4 I’m worried about the Toad in the red shirt – we all know that red-shirt crewmen die in Star Trek!
  • #6 So while I worry about the red-shirt TOAD, I’m not really worried about Oracle. Oracle remains a highly technically innovative company as well as being skilled in the business of software. I’ve certainly got no regrets specializing in Oracle technology all those years ago. Quest is a fairly diversified company and has no vested interest in Oracle per se. We aim to be a strategic partner across all of your technologies: Oracle, Microsoft, VMware and emerging technologies.
  • #7 Tesla vs Maserati. Latency vs throughput. Volume? Economics?
  • #8 Methods: requirements, measurement, prototype, benchmark. Data model: optimize physical model to query, denormalize, indexing, clustering, partition. Application: minimize database access, optimize database access.
  • #10 http://www.steveschmidtracing.com/our-team-5.html
  • #13 Generally not easy to change architectures.....
  • #21 Normal form is the best starting point for an efficient database design. However, don’t go overboard in eliminating redundancy. A normalized data model is one in which any data redundancy has been eliminated and in which data and relationships are uniquely identifiable by primary and foreign keys. Although the normalized data model is rarely the final destination from a performance point of view, the normalized data model is almost always the best starting point. Indeed, failing to normalize your logical model is frequently a recipe for a poorly performing physical model. Relational theory provides for various levels of normal form, some of which are of academic interest only. Third normal form is the most commonly adopted normalization level, and it has the following characteristics:
    ❏ All data in an entity (table) is dependent on the primary key.
    ❏ There should be no repeating groups of attributes (columns).
    ❏ No data in an entity is dependent on only part of the key.
    ❏ No data in an entity is dependent on any nonkey attribute.
    These characteristics are often remembered through the adage “the key, the whole key, and nothing but the key.”
  • #23 Subtypes categorize or partition a logical entity and help to classify the types of information that is within the entity. A subtype usually has a set of attributes that are held in common with the parent entity (the super-type) and other attributes that are not shared with the super-type or other subtypes. Figure 4-1 shows how a PEOPLE entity could be split into subtypes of CUSTOMERS and EMPLOYEES. When translating entity subtypes into tables, we have the following options:
    ❏ Create tables for the super-type and for each subtype. The super-type table contains only columns that are common to both subtypes.
    ❏ Create a table for the super-type only. Attributes from all subtypes become columns in this super-table. Typically, columns from subtype attributes will be nullable, and a category column indicates the subtype in which a row belongs.
    ❏ Create separate tables for each subtype without creating a table for the super-type. Attributes from the super-type are duplicated in each table.
    The three solutions result in very different performance outcomes. In particular, creating tables for the super-type and each subtype is likely to reduce performance in most circumstances, except where only the super-type is subject to a full table scan. Table 4-1 compares the performance of each of the three solutions for common database operations.
  • #50 Remember, you don’t always have control over the network. In particular, client-side code may be located anywhere.
  • #51 The further the code is (in network terms) from the database, the more the network effects will magnify. You can’t get any closer to the database than being inside the database as PL/SQL.