SQLPASS AD404-M Spatial Index MRys


SQLPASS 2011 Presentation on Spatial Indexing in SQL Server 2008, 2008R2 and what is new in SQL Server 2012.

Published in: Technology

  • Add USING syntax to show the new tessellation scheme
  • Procedure: construct 4 points/ranges for each cell in T; remove duplicates; sort (optionally); seek
  • Clustering imposes ordering on index
  • Add tessellation
  • Experimentation: For instance, consider this dataset: US Highways. In this dataset some of the LineStrings are quite long (over 2000 miles) and others are quite short (400 meters or less). For optimal performance, the following two indexes were roughly equivalent:
      Geography Index: MEDIUM, MEDIUM, MEDIUM, MEDIUM 1024
      Geometry Index: LOW, LOW, LOW, LOW 1024

    1. Performance Tuning of Spatial Queries in SQL Server
        Deep Dive into Spatial Indexing
        Michael Rys (@SQLServerMike), Principal Program Manager, Microsoft Corp.
        October 11-14, Seattle, WA
    2. DEMO: A spatial query…
    3. Q: Why is my query so slow?
        A: Usually because the index isn't being used.
        Q: How do I tell?
        A: Check the query plan, for example for:
            SELECT * FROM T WHERE g.STIntersects(@x) = 1
        AD404-M | Spatial Performance
    4. Hinting the Index
        Spatial indexes can be forced if needed:
            SELECT *
            FROM T WITH (INDEX(T_g_idx))
            WHERE g.STIntersects(@x) = 1
        Use SQL Server 2008 SP1 or 2008 R2!
    5. But Why Isn't My Index Used?
        Plan choice is cost-based
        • QO uses various information, including cardinality
        Literal:
            SELECT * FROM T
            WHERE T.g.STIntersects('POINT (0 0)') = 1
        Variable:
            DECLARE @x geometry = 'POINT (0 0)'
            SELECT * FROM T
            WHERE T.g.STIntersects(@x) = 1
        Parameter:
            EXEC sp_executesql
                N'SELECT * FROM T WHERE T.g.STIntersects(@x) = 1',
                N'@x geometry', N'POINT (0 0)'
        When can we estimate cardinality?
        • Variables: never
        • Literals: not for spatial since they are not literals under the covers
        • Parameters: yes, but cached, so first call matters
    6. Spatial Indexing Basics
        [Figure: objects A to E pass through the primary filter (index lookup) and then the secondary filter (original predicate)]
        In general, split predicates in two
        • Primary filter finds all candidates, possibly with false positives (but never false negatives)
        • Secondary filter removes false positives
        The index provides our primary filter
        Original predicate is our secondary filter
        Some tweaks to this scheme
        • Sometimes possible to skip secondary filter
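The primary/secondary filter split on slide 6 can be illustrated with a small Python sketch. This is not SQL Server's implementation: it assumes circular objects and uses a bounding-box overlap test as a stand-in for the index lookup. The `Circle` class and all function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Circle:
    """Hypothetical spatial object: a circle with a center and a radius."""
    name: str
    x: float
    y: float
    r: float

    def bbox(self):
        # Axis-aligned bounding box as (min_x, min_y, max_x, max_y).
        return (self.x - self.r, self.y - self.r, self.x + self.r, self.y + self.r)


def bbox_overlap(a, b):
    """Cheap test standing in for the index lookup (primary filter)."""
    ax0, ay0, ax1, ay1 = a
    bx0, by0, bx1, by1 = b
    return ax0 <= bx1 and bx0 <= ax1 and ay0 <= by1 and by0 <= ay1


def exact_intersects(a, b):
    """Exact test standing in for the original predicate (secondary filter)."""
    return (a.x - b.x) ** 2 + (a.y - b.y) ** 2 <= (a.r + b.r) ** 2


def query(objects, window):
    # Primary filter: may return false positives, never false negatives.
    candidates = [o for o in objects if bbox_overlap(o.bbox(), window.bbox())]
    # Secondary filter: removes the false positives.
    matches = [o for o in candidates if exact_intersects(o, window)]
    return candidates, matches


objects = [Circle("A", 0, 0, 1), Circle("B", 2.4, 2.4, 1), Circle("C", 10, 10, 1)]
window = Circle("W", 0.5, 0.5, 1)
candidates, matches = query(objects, window)
print([o.name for o in candidates])  # A and B survive the primary filter
print([o.name for o in matches])     # only A survives the secondary filter
```

Here B is a false positive: its bounding box overlaps the window's, but the circles themselves do not touch, so only the exact test rejects it.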
    7. Using B+-Trees for Spatial Index
        SQL Server has B+-Trees
        Spatial indexing is usually done through other structures
        • Quad tree, R-Tree
        Challenge: How do we repurpose the B+-Tree to handle spatial queries?
        • Add a level of indirection!
    8. Mapping to the B+-Tree
        B+-Trees handle linearly ordered sets well
        We need to somehow linearly order 2D space
        • Either the plane or the globe
        We want a locality-preserving mapping from the original space to the line
        • i.e., close objects should be close in the index
        • Can't be done, but we can approximate it
    9. SQL Server Spatial Indexing Story
        Planar Index
        • Requires bounding box
        • Only one grid
        Geographic Index
        • No bounding box
        • Two top-level projection grids
        Indexing Phase
        1. Overlay a grid on the spatial object
        2. Identify grids for spatial object to store in index
        Primary Filter
        3. Identify grids for query object(s)
        4. Intersecting grids identifies candidates
        Secondary Filter
        5. Apply actual CLR method on candidates to find matches
        [Figure: 4x4 grid with numbered cells]
             1  2 15 16
             4  3 14 13
             5  8  9 12
             6  7 10 11
    10. SQL Server Spatial Indexing Story
        Multi-Level Grid
        • Much more flexible than a simple grid
        • Hilbert numbering
        • Modified adaptable QuadTree
        Grid index features
        • 4 levels
        • Customizable grid subdivisions
        • Customizable maximum number of cells per object (default 16)
        • NEW IN SQL Server Codename “DENALI”: New default tessellation with 8 levels of cell nesting
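To make "Hilbert numbering" concrete, the sketch below uses the textbook distance-to-coordinate conversion for a Hilbert curve. It is not SQL Server's internal cell-numbering code. Under the assumption that x is the column and y is the row (row 0 at the top), it reproduces the 4x4 numbering shown in the slide 9 figure. The function names are illustrative.

```python
def _rotate(s, x, y, rx, ry):
    """Rotate/flip a quadrant of side s so the curve stays continuous."""
    if ry == 0:
        if rx == 1:
            x = s - 1 - x
            y = s - 1 - y
        x, y = y, x
    return x, y


def hilbert_d2xy(n, d):
    """Map distance d along the curve to (x, y) in an n x n grid (n a power of 2)."""
    x = y = 0
    t = d
    s = 1
    while s < n:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        x, y = _rotate(s, x, y, rx, ry)
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y


def hilbert_grid(n):
    """Return the grid as rows of 1-based cell numbers, row 0 first."""
    grid = [[0] * n for _ in range(n)]
    for d in range(n * n):
        x, y = hilbert_d2xy(n, d)
        grid[y][x] = d + 1
    return grid


for row in hilbert_grid(4):
    print(" ".join(f"{v:2d}" for v in row))
```

Consecutive cell numbers are always edge-adjacent, which is the locality property slide 8 asks for: objects that are close in space tend to receive nearby keys in the B+-Tree.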
    11. Multi-Level Grid
        [Figure: nested grid cells, labeled with the cell path /4/2/3/1 and the root / (“cell 0”)]
        Deepest-cell Optimization: Only keep the lowest level cell in index
        Covering Optimization: Only record higher level cells when all lower cells are completely covered by the object
        Cell-per-object Optimization: User restricts max number of cells per object
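The deepest-cell optimization can be pictured as dropping every cell that is an ancestor of another recorded cell, since the deeper cell already implies its ancestors. The sketch below is one reading of that rule, not the actual tessellation code. It assumes cells are represented as tuples of per-level numbers, so the slide's /4/2/3/1 becomes `(4, 2, 3, 1)`; `deepest_cells` is a hypothetical helper.

```python
def deepest_cells(cells):
    """Keep only cells that have no descendant in the same set.

    A cell is a tuple of per-level cell numbers; a descendant is any
    longer tuple that starts with the same numbers.
    """
    cell_set = set(cells)

    def has_descendant(cell):
        depth = len(cell)
        return any(len(other) > depth and other[:depth] == cell for other in cell_set)

    return sorted(c for c in cell_set if not has_descendant(c))


touched = [(4,), (4, 2), (4, 2, 3), (4, 2, 3, 1), (4, 1)]
print(deepest_cells(touched))  # the three ancestors of (4, 2, 3, 1) are dropped
```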
    12. Implementation of the Index
        Persist a table-valued function
        • Internally rewrite queries to use the table

        Base Table T
            Prim_key  geography
            1         g1
            2         g2
            3         g3

        CREATE SPATIAL INDEX sixd ON T(geography)

        Internal Table for sixd
            Prim_key  cell_id  srid  cell_attr
            1         0x00007  42    0
            3         0x00007  42    1
            3         0x0000A  42    2
            3         0x0000B  42    0
            3         0x0000C  42    1
            1         0x0000D  42    0
            2         0x00014  42    1

        Column notes:
        • Prim_key: 15 columns and 895 byte limitation
        • cell_id: Varbinary(5) encoding of grid cell id
        • srid: Spatial Reference ID; have to be the same to produce match
        • cell_attr:
            0 – cell at least touches the object (but not 1 or 2)
            1 – guarantee that object partially covers cell
            2 – object covers cell
    13. New AUTO GRID Index
        • NEW IN SQL Server Codename “DENALI”
        • Has 8 levels of cell nesting
        • No manual grid density selection:
            • Fixed at HLLLLLLL
        • Default number of cells per object:
            • 8 for geometry
            • 12 for geography
        • More stable performance
            • for windows of different size
            • for data with different spatial density
        • For default values:
            • Up to 2x faster for longer queries (> 500 ms)
                • More efficient primary filter
                • Fewer rows returned
            • 10 ms slower for very fast queries (< 50 ms)
                • Increased tessellation time, which is constant
    14. Spatial Index Performance
        New grid gives much more stable performance for query windows of different sizes
        Better grid coverage gives fewer high peaks
    15. Index Creation and Maintenance
        Create index example GEOMETRY:
            CREATE SPATIAL INDEX sixd
            ON spatial_table(geom_column)
            WITH (
                BOUNDING_BOX = (0, 0, 500, 500),
                GRIDS = (LOW, LOW, MEDIUM, HIGH),
                CELLS_PER_OBJECT = 20)
        Create index example GEOGRAPHY:
            CREATE SPATIAL INDEX sixd
            ON spatial_table(geogr_column)
            USING GEOGRAPHY_GRID
            WITH (
                GRIDS = (LOW, LOW, MEDIUM, HIGH),
                CELLS_PER_OBJECT = 20)
        NEW IN SQL Server “DENALI” (equivalent to default creation):
            CREATE SPATIAL INDEX sixd
            ON spatial_table(geogr_column)
            USING GEOGRAPHY_AUTO_GRID
            WITH (CELLS_PER_OBJECT = 20)
        Use ALTER INDEX and DROP INDEX for maintenance.
    16. DEMO: Indexing and Performance
    17. Spatial Methods supported by Index
        Geometry:
        • STIntersects() = 1
        • STOverlaps() = 1
        • STEquals() = 1
        • STTouches() = 1
        • STWithin() = 1
        • STContains() = 1
        • STDistance() < val
        • STDistance() <= val
        • Nearest Neighbor (new in Denali)
        • Filter() = 1
        Geography:
        • STIntersects() = 1
        • STOverlaps() = 1
        • STEquals() = 1
        • STWithin() = 1
        • STContains() = 1
        • STDistance() < val
        • STDistance() <= val
        • Nearest Neighbor (new in Denali)
        • Filter() = 1
    18. How Costing is Done
        • The stats on the index contain a trie constructed on the string form of the packed binary(5) typed Cell ID.
        • When a window query is compiled with a sniffable window object, the tessellation function on the window object is run at compile time. The results are used to construct a trie for use during compilation.
            • May lead to wrong compilation for later objects
        • No costing on:
            • Local variables, constants, results of expressions
        • Use different indices and different stored procs to account for different query characteristics
    19. Understanding the Index Query Plan
    20. Seeking into a Spatial Index
        Minimize I/O and random I/O
        Intuition: small windows should touch small portions of the index
        A cell 7.2.4 matches
        • Itself
        • Ancestors
        • Descendants
        [Figure: cells 7, 7.2 and 7.2.4 located in spatial index S]
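One way to see why a cell such as 7.2.4 needs only a few seeks: each ancestor is a single point lookup, while the cell and all of its descendants form one contiguous key range. The sketch below models that idea over a sorted Python list. It assumes cell IDs are tuples that sort so a parent comes directly before its descendants, and that keys are unique (a real index has many rows per cell). The function names are hypothetical and this is not SQL Server's seek logic.

```python
import bisect


def lookups_for_cell(cell):
    """Return (ancestor point lookups, half-open key range for cell + descendants)."""
    ancestors = [cell[:i] for i in range(1, len(cell))]
    low = cell
    high = cell[:-1] + (cell[-1] + 1,)  # first key after every descendant
    return ancestors, (low, high)


def seek(index_keys, cell):
    """index_keys is a sorted list of unique cell tuples."""
    ancestors, (low, high) = lookups_for_cell(cell)
    hits = []
    for ancestor in ancestors:  # point seeks
        i = bisect.bisect_left(index_keys, ancestor)
        if i < len(index_keys) and index_keys[i] == ancestor:
            hits.append(ancestor)
    start = bisect.bisect_left(index_keys, low)  # one range seek
    end = bisect.bisect_left(index_keys, high)
    hits.extend(index_keys[start:end])
    return hits


index = sorted([(7,), (7, 1), (7, 2), (7, 2, 3), (7, 2, 4),
                (7, 2, 4, 1), (7, 2, 4, 3), (7, 2, 5), (8,)])
print(seek(index, (7, 2, 4)))
```

Siblings such as `(7, 2, 3)` and `(7, 2, 5)` fall outside both the point lookups and the range, so a small window touches only a small part of the index.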
    21. Understanding the Index Query Plan
        [Plan diagram: T(@g) → Ranges → Remove dup ranges → Optional Sort → Spatial Index Seek]
    22. Other Query Processing Support
        • Index intersection
            • Enables efficient mixing of spatial and non-spatial predicates
        • Matching
            • New in SQL Server “Denali”: Nearest Neighbor query
            • Distance queries: convert to STIntersects
            • Commutativity: a.STIntersects(b) = b.STIntersects(a)
            • Dual: a.STContains(b) = b.STWithin(a)
        • Multiple spatial indexes on the same column
            • Various bounding boxes, granularities
        • Outer references as window objects
            • Enables spatial join to use one index
    23. Other Spatial Performance Improvements in SQL Server Codename “Denali”
        • Spatial index build time for point data can be as much as four to five times faster
        • Optimized spatial query plan for STDistance and STIntersects-like queries
        • Faster point data queries
        • Optimized STBuffer, lower memory footprint
    24. Spatial Nearest Neighbor (Denali)
        Main scenario
        • Give me the closest 5 Italian restaurants
        Execution plan
        • SQL Server 2008/2008 R2: table scan
        • SQL Server Codename “Denali”: uses spatial index
        Specific query pattern required
            SELECT TOP(5) *
            FROM Restaurants r
            WHERE r.type = 'Italian'
              AND r.pos.STDistance(@me) IS NOT NULL
            ORDER BY r.pos.STDistance(@me)
    25. DEMO: Nearest Neighbor performance
    26. Nearest Neighbor Performance
        Find the closest 50 business points (22 million in total)
        NN query vs best current workaround (sort all points in 10km radius)
        * Average time for NN query is ~236ms
    27. Limitations of Spatial Plan Selection
        • Off whenever window object is not a parameter:
            • Spatial join (window is an outer reference)
            • Local variable, string constant, or complex expression
        • Has the classic SQL Server parameter-sensitivity problem
            • SQL compiles once for one parameter value and reuses the plan for all parameter values
            • Different plans for different sizes of window require application logic to bucketize the windows
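The "application logic to bucketize the windows" could look like the sketch below: the client classifies each query window by size and calls a different stored procedure per bucket, so each procedure keeps a plan compiled for windows of similar size. The area thresholds and the procedure names (`dbo.FindInWindow_Small` and so on) are invented for illustration and would need tuning against real data.

```python
# Hypothetical stored procedures, one per window-size bucket.
PROC_BY_BUCKET = {
    "small": "dbo.FindInWindow_Small",
    "medium": "dbo.FindInWindow_Medium",
    "large": "dbo.FindInWindow_Large",
}


def window_bucket(min_x, min_y, max_x, max_y, small_max=1.0, medium_max=100.0):
    """Classify a rectangular query window by its area (thresholds are assumptions)."""
    area = (max_x - min_x) * (max_y - min_y)
    if area <= small_max:
        return "small"
    if area <= medium_max:
        return "medium"
    return "large"


def procedure_for_window(min_x, min_y, max_x, max_y):
    """Pick the stored procedure whose cached plan suits this window size."""
    return PROC_BY_BUCKET[window_bucket(min_x, min_y, max_x, max_y)]


print(procedure_for_window(0, 0, 0.5, 0.5))
print(procedure_for_window(0, 0, 50, 50))
```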
    28. Index Support
        • Can be built in parallel
        • Can be hinted
        • File groups/Partitioning
            • Aligned to base table or separate file group
            • Full rebuild only
        • New catalog views, DDL Events
        • DBCC Checks
        • Supportability stored procedures
        • New in SQL Server “Denali”: Index Page and Row Compression
            • Ca. 50% smaller indices, 0-15% slower queries
        • Not supported
            • Online rebuild
            • Database Tuning Advisor
    30. Index Hinting
        FROM T WITH (INDEX (<Spatial_idxname>))
        • Spatial index is treated the same way a non-clustered index is
            • the order of the hint is reflected in the order of the indexes in the plan
            • multiple index hints are concatenated
            • no duplicates are allowed
        • The following restrictions exist:
            • The spatial index must be either first in the first index hint or last in the last index hint for a given table.
            • Only one spatial index can be specified in any index hint for a given table.
    31. Query Window Hinting (Denali)
            SELECT *
            FROM table t WITH (SPATIAL_WINDOW_MAX_CELLS = 1024)
            WHERE t.geom.STIntersects(@window) = 1
        • Used if an index is chosen (does not force an index)
        • Overrides the default (512 for geometry, 768 for geography)
        • Rule of thumb:
            • Higher value makes primary filter phase longer but reduces work in secondary filter phase
            • Set higher for dense spatial data
            • Set lower for sparse spatial data
    32. DEMO: Query hinting
    33. Spatial Catalog Views
        • sys.spatial_indexes catalog view
        • sys.spatial_index_tessellations catalog view
        • Entries in sys.indexes for a spatial index:
            • A clustered index on the internal table of the spatial index
            • A spatial index (type = 4) for the spatial index
        • An entry in sys.internal_tables
        • An entry in sys.index_columns
    34. New Spatial Histogram Helpers (Denali)
        sp_help_spatial_geometry_histogram
        sp_help_spatial_geography_histogram
        Used for spatial data and index analysis
        [Figure: histogram of 22 million business points over the US. Left: SSMS view of a histogram. Right: custom drawing on top of Bing Maps]
    35. Indexing Support Procedures
        sys.sp_help_spatial_geometry_index
        sys.sp_help_spatial_geometry_index_xml
        sys.sp_help_spatial_geography_index
        sys.sp_help_spatial_geography_index_xml
        Provide information about index:
        • 64 properties
        • 10 of which are considered core
    36. sys.sp_help_spatial_geometry_index
        Arguments
            Parameter       Type           Description
            @tabname        nvarchar(776)  The name of the table for which the index has been specified
            @indexname      sysname        The index name to be investigated
            @verboseoutput  tinyint        0: core set of properties is reported; 1: all properties are reported
            @query_sample   geometry       A representative query sample that will be used to test the usefulness of the index. It may be a representative object or a query window.
        Results in a property name/value pair table of the format:
            PropName: nvarchar(256)
            PropValue: sql_variant
    37. Some of the returned Properties
        Each entry: Property (type, Core): description
        • Number_Of_Rows_Selected_By_Primary_Filter (bigint, Core): P = Number of rows selected by the primary filter.
        • Number_Of_Rows_Selected_By_Internal_Filter (bigint, Core): S = Number of rows selected by the internal filter. For these rows, the secondary filter is not called.
        • Number_Of_Times_Secondary_Filter_Is_Called (bigint, Core): Number of times the secondary filter is called.
        • Percentage_Of_Rows_NotSelected_By_Primary_Filter (float, Core): Suppose there are N rows in the base table, and suppose P are selected by the primary filter. This is (N-P)/N as a percentage.
        • Percentage_Of_Primary_Filter_Rows_Selected_By_Internal_Filter (float, Core): This is S/P as a percentage. The higher the percentage, the better the index is at avoiding the more expensive secondary filter.
        • Number_Of_Rows_Output (bigint, Core): O = Number of rows output by the query.
        • Internal_Filter_Efficiency (float, Core): This is S/O as a percentage.
        • Primary_Filter_Efficiency (float, Core): This is O/P as a percentage. The higher the efficiency, the fewer false positives have to be processed by the secondary filter.
    38. DEMO: Indexing Supportability
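The percentage properties on slide 37 are simple ratios of N, P, S and O. The sketch below recomputes them from those four counts so the definitions are easy to check against procedure output. The function name is hypothetical, and returning 0.0 for an empty denominator is an assumption made here, not documented behavior.

```python
def filter_metrics(n, p, s, o):
    """Recompute the percentage properties from slide 37.

    n: rows in the base table
    p: rows selected by the primary filter
    s: rows selected by the internal filter
    o: rows output by the query
    """
    def pct(numerator, denominator):
        # Assumption: report 0.0 when the denominator is zero.
        return 100.0 * numerator / denominator if denominator else 0.0

    return {
        "Percentage_Of_Rows_NotSelected_By_Primary_Filter": pct(n - p, n),
        "Percentage_Of_Primary_Filter_Rows_Selected_By_Internal_Filter": pct(s, p),
        "Internal_Filter_Efficiency": pct(s, o),
        "Primary_Filter_Efficiency": pct(o, p),
    }


metrics = filter_metrics(n=1000, p=200, s=50, o=100)
for name, value in metrics.items():
    print(f"{name}: {value:.1f}%")
```

In this made-up example the primary filter discards 80% of the table, but only half of its candidates are real matches, which would suggest trying a finer grid or a tighter bounding box.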
    39. Spatial Tips on index settings
        Some best practice recommendations (YMMV):
        • Start out with the new default tessellation
        • Point data: always use HIGH for all 4 levels. CELLS_PER_OBJECT is not relevant in this case.
        • Simple, relatively consistent polygons: set all levels to LOW or MEDIUM, MEDIUM, LOW, LOW
        • Very complex LineString or Polygon instances:
            • High number of CELLS_PER_OBJECT (often 8192 is best)
            • Setting all 4 levels to HIGH may be beneficial
        • Polygons or line strings which have highly variable sizes: experimentation is needed.
        • Rule of thumb for GEOGRAPHY: if MMMM is not working, try HHMM
    40. What to do if my Spatial Query is slow?
        • Make sure you are running SQL Server 2008 SP1, 2008 R2 or “Denali”
        • Check query plan for use of index
        • Make sure it is a supported operation
        • Hint the index (and/or a different join type)
        • Do not use a spatial index when there is a highly selective non-spatial predicate
        • Run above index support procedure:
            • Assess effectiveness of primary filter (Primary_Filter_Efficiency)
            • Assess effectiveness of internal filter (Internal_Filter_Efficiency)
            • Redefine or define a new index with better characteristics
                • More appropriate bounding box for GEOMETRY
                • Better grid densities
    41. Related Content
        Weblog
        • http://blogs.msdn.com/isaac
        • http://blogs.msdn.com/edkatibah
        • http://johanneskebeck.spaces.live.com/
        • http://sqlblog.com/blogs/michael_rys/
        Forum: http://forums.microsoft.com/MSDN/ShowForum.aspx?ForumID=1629&SiteID=1
        Whitepapers, Websites & Code
        • Denali CTP3: http://sqlcat.com/sqlcat/b/whitepapers/archive/2011/08/08/new-spatial-features-in-sql-server-code-named-denali-community-technology-preview-3.aspx
        • Spatial Wiki: http://social.technet.microsoft.com/wiki/contents/articles/4136.aspx
        • SQL Server 2008 Spatial Site: http://www.microsoft.com/sqlserver/2008/en/us/spatial-data.aspx
        • SQL Spatial Codeplex: http://www.codeplex.com/sqlspatialtools
        • http://www.sharpgis.net/page/SQL-Server-2008-Spatial-Tools.aspx
        • http://www.codeplex.com/ProjNET
        • http://www.geoquery2008.com/
        • SIGMOD 2008 Paper: Spatial Indexing in Microsoft SQL Server 2008
        • And of course Books Online!
    43. Thank you for attending this session and the 2011 PASS Summit in Seattle