SQLBits X SQL Server 2012 Spatial Indexing
1.
2. How do I tell?
A: SELECT * FROM T WHERE g.STIntersects(@x) = 1
3. SELECT *
FROM T WITH(INDEX(T_g_idx))
WHERE g.STIntersects(@x) = 1
4. Plan choice is cost-based
The QO (query optimizer) uses various information, including cardinality.

Variable:
  DECLARE @x geometry = 'POINT (0 0)'
  SELECT *
  FROM T
  WHERE T.g.STIntersects(@x) = 1

Literal:
  SELECT *
  FROM T
  WHERE T.g.STIntersects('POINT (0 0)') = 1

Parameter:
  EXEC sp_executesql
    N'SELECT *
      FROM T
      WHERE T.g.STIntersects(@x) = 1',
    N'@x geometry', N'POINT (0 0)'

When can we estimate cardinality?
- Variables: never
- Literals: not for spatial, since they are not literals under the covers
- Parameters: yes, but cached, so the first call matters
5. [Diagram: candidate objects A–E passing first through the Primary Filter (index lookup) and then through the Secondary Filter (original predicate)]
In general, split predicates in two:
- The primary filter finds all candidates, possibly with false positives (but never false negatives)
- The secondary filter removes the false positives
- The index provides our primary filter
- The original predicate is our secondary filter
Some tweaks to this scheme: it is sometimes possible to skip the secondary filter.
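The primary/secondary split described above can be sketched in a few lines. This is a hypothetical Python illustration, not SQL Server code: the primary filter tests cheap bounding boxes and may return false positives (never false negatives); the exact predicate then removes them.

```python
# Hypothetical sketch of the primary/secondary filter split.
# Shapes are circles (center, radius); the "index" knows only
# their axis-aligned bounding boxes.

def bbox(circle):
    (cx, cy), r = circle
    return (cx - r, cy - r, cx + r, cy + r)

def bboxes_intersect(a, b):
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

def circles_intersect(c1, c2):
    (x1, y1), r1 = c1
    (x2, y2), r2 = c2
    return (x1 - x2) ** 2 + (y1 - y2) ** 2 <= (r1 + r2) ** 2

def query(shapes, probe):
    # Primary filter: cheap bbox test; may yield false positives,
    # never false negatives.
    candidates = [s for s in shapes if bboxes_intersect(bbox(s), bbox(probe))]
    # Secondary filter: exact (more expensive) predicate on candidates only.
    return [s for s in candidates if circles_intersect(s, probe)]

shapes = [((0, 0), 1), ((5, 5), 1), ((2, 0), 1)]
probe = ((1, 0), 0.5)
print(query(shapes, probe))  # -> [((0, 0), 1), ((2, 0), 1)]
```

The same shape appears in the slides that follow: the spatial index plays the role of the bounding-box test, and STIntersects plays the role of the exact predicate.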
6.
7.
8. Indexing Phase / Primary Filter / Secondary Filter

[Diagram: a 4x4 grid with cells numbered along a space-filling curve:
   1  2 15 16
   4  3 14 13
   5  8  9 12
   6  7 10 11]

1. Overlay grids on the CLR spatial object(s)
2. Identify the grid cells for the object to store in the index
3. The intersecting-grids method identifies candidates
4. Apply the actual predicate on the spatial candidates to find matches
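The grid-overlay step can be sketched as follows. This is a hypothetical, single-level illustration in Python; SQL Server's actual tessellation uses up to four nested grid levels and real geometries, while this sketch works only on bounding boxes.

```python
# Hypothetical single-level tessellation sketch: compute which cells of a
# uniform grid an object's bounding box touches. SQL Server uses up to four
# nested grid levels; one level is enough to show the idea.

def tessellate(obj_bbox, domain, cells_per_side):
    """Return the (row, col) grid cells touched by obj_bbox inside domain."""
    (dx0, dy0, dx1, dy1) = domain
    cw = (dx1 - dx0) / cells_per_side   # cell width
    ch = (dy1 - dy0) / cells_per_side   # cell height
    (x0, y0, x1, y1) = obj_bbox
    c0 = max(0, int((x0 - dx0) // cw))
    c1 = min(cells_per_side - 1, int((x1 - dx0) // cw))
    r0 = max(0, int((y0 - dy0) // ch))
    r1 = min(cells_per_side - 1, int((y1 - dy0) // ch))
    return [(r, c) for r in range(r0, r1 + 1) for c in range(c0, c1 + 1)]

# A 4x4 grid over the unit square; an object near the lower-left corner
# touches four cells.
cells = tessellate((0.0, 0.0, 0.3, 0.3), (0.0, 0.0, 1.0, 1.0), 4)
print(cells)  # -> [(0, 0), (0, 1), (1, 0), (1, 1)]
```

The list of touched cells is exactly what gets stored per object in the index's internal table, as the later slides show.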
9.
10. /4/2/3/1
/
(“cell 0”)
Deepest-cell Optimization: Only keep the lowest level cell in index
Covering Optimization: Only record higher level cells when all lower
cells are completely covered by the object
Cell-per-object Optimization: User restricts max number of cells per object
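Treating cell ids as paths makes the deepest-cell optimization a prefix test: a deeper cell's id starts with all of its ancestors' ids. A hypothetical Python sketch (the path syntax follows the slide; the engine actually stores a compact varbinary encoding):

```python
# Hypothetical sketch of the deepest-cell optimization: if an object's cell
# set contains both a cell and one of its ancestors, keep only the deepest
# cell -- the ancestor is implied by the path prefix.

def is_ancestor(a, b):
    """True if cell path a is a strict ancestor of cell path b."""
    if a == "/":
        return b != "/"
    return b.startswith(a + "/")   # "/4" is an ancestor of "/4/2", not "/42"

def deepest_cells(cells):
    return sorted(c for c in cells
                  if not any(is_ancestor(c, other) for other in cells))

cells = {"/4", "/4/2", "/4/2/3/1", "/7"}
print(deepest_cells(cells))  # -> ['/4/2/3/1', '/7']
```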
11. cell_attr of a grid cell:
- 0 – the cell at least touches the object (but not 1 or 2)
- 1 – the object partially covers the cell
- 2 – the object covers the cell

Notes: srid is the Spatial Reference ID; cell_id is a varbinary(5) encoding of the grid cell id (the encoding has to be the same to produce a match); the internal table is subject to the 15-column and 895-byte limitation.

Base Table T:
  Prim_key | geography
  1        | g1
  2        | g2
  3        | g3

Internal table for sixd:
  Prim_key | cell_id | srid | cell_attr
  1        | 0x00007 | 42   | 0
  3        | 0x00007 | 42   | 1
  3        | 0x0000A | 42   | 2
  3        | 0x0000B | 42   | 0
  3        | 0x0000C | 42   | 1
  1        | 0x0000D | 42   | 0
  2        | 0x00014 | 42   | 1

CREATE SPATIAL INDEX sixd
ON T(geography)
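The three cell_attr values can be illustrated with a hypothetical classifier for axis-aligned rectangles. Real tessellation works on arbitrary geometry/geography instances, so this is only a sketch of the semantics:

```python
# Hypothetical sketch of cell_attr classification for rectangles:
#   2 -- the object completely covers the cell
#   1 -- the object partially covers the cell
#   0 -- the cell merely touches the object (neither 1 nor 2)

def cell_attr(obj, cell):
    """obj, cell: (x0, y0, x1, y1) rectangles; returns 2, 1, 0 or None."""
    ox0, oy0, ox1, oy1 = obj
    cx0, cy0, cx1, cy1 = cell
    if ox0 <= cx0 and oy0 <= cy0 and ox1 >= cx1 and oy1 >= cy1:
        return 2                     # object covers the whole cell
    disjoint = ox1 < cx0 or cx1 < ox0 or oy1 < cy0 or cy1 < oy0
    if disjoint:
        return None                  # cell not stored in the index at all
    # Overlapping but not covering: partial cover if the interiors overlap,
    # otherwise the shapes only touch at an edge or a corner.
    interior = ox1 > cx0 and cx1 > ox0 and oy1 > cy0 and cy1 > oy0
    return 1 if interior else 0

obj = (0, 0, 3, 3)
print(cell_attr(obj, (1, 1, 2, 2)))  # -> 2: cell lies inside the object
print(cell_attr(obj, (2, 2, 4, 4)))  # -> 1: partial overlap
print(cell_attr(obj, (3, 0, 4, 1)))  # -> 0: shares only an edge
```

cell_attr = 2 is what powers the internal filter: for fully covered cells the exact predicate never needs to run.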
12.
13.
14. Create index example GEOMETRY:
CREATE SPATIAL INDEX sixd ON spatial_table(geom_column)
WITH (BOUNDING_BOX = (0, 0, 500, 500),
GRIDS = (LOW, LOW, MEDIUM, HIGH),
CELLS_PER_OBJECT = 20)
Create index example GEOGRAPHY:
CREATE SPATIAL INDEX sixd ON spatial_table(geogr_column)
USING GEOGRAPHY_GRID
WITH (GRIDS = (LOW, LOW, MEDIUM, HIGH),
CELLS_PER_OBJECT = 20)
NEW IN SQL Server 2012 (equivalent to default creation):
CREATE SPATIAL INDEX sixd ON spatial_table(geogr_column)
USING GEOGRAPHY_AUTO_GRID
WITH (CELLS_PER_OBJECT = 20)
Use ALTER INDEX and DROP INDEX for maintenance.
22. SPATIAL_WINDOW_MAX_CELLS
The optimal value (theoretical) is somewhere between two extremes; one of the costs involved is the time needed to process false positives.
Default values:
- 512 – Geometry AUTO grid
- 768 – Geography AUTO grid
- 1024 – MANUAL grids

SELECT * FROM table t WITH (SPATIAL_WINDOW_MAX_CELLS=256)
WHERE t.geom.STIntersects(@window)=1;
23.
24.
25.
26.
27.
28. Give me the closest 5 Italian restaurants
SQL Server 2008/2008 R2: table scan
SQL Server 2012: uses spatial index
SELECT TOP(5) *
FROM Restaurants r
WHERE r.type = 'Italian'
AND r.pos.STDistance(@me) IS NOT NULL
ORDER BY r.pos.STDistance(@me)
29.
30. Find the closest 50 business points to a specific location (out of 22 million in total)
39. Arguments

Parameter      | Type          | Description
@tabname       | nvarchar(776) | the name of the table for which the index has been specified
@indexname     | sysname       | the index name to be investigated
@verboseoutput | tinyint       | 0 = core set of properties is reported; 1 = all properties are reported
@query_sample  | geometry      | a representative query sample used to test the usefulness of the index; it may be a representative object or a query window

Output rowset: PropName nvarchar(256), PropValue sql_variant
40. Parameter      | Type          | Description
@tabname       | nvarchar(776) | the name of the table for which the index has been specified
@indexname     | sysname       | the index name to be investigated
@verboseoutput | tinyint       | 0 = core set of properties is reported; 1 = all properties are reported
@query_sample  | geography     | a representative query sample used to test the usefulness of the index; it may be a representative object or a query window
@xml_output    | xml           | an output parameter that contains the returned properties in an XML fragment
41. Property | Type | Set | Description
Base_Table_Rows | bigint | All | number of rows in the base table
Index properties | | All | all index properties: bounding box, grid densities, cells per object
Total_Primary_Index_Rows | bigint | All | number of rows in the index
Total_Primary_Index_Pages | bigint | All | number of pages in the index
Total_Number_Of_ObjectCells_In_Level0_For_QuerySample | bigint | Core | indicates whether the representative query sample falls outside the bounding box of the geometry index and into the root cell (level 0 cell); either 0 (not in the level 0 cell) or 1. If it is in the level 0 cell, the investigated index is not an appropriate index for the query sample.
Total_Number_Of_ObjectCells_In_Level0_In_Index | bigint | Core | number of cell instances of indexed objects that are tessellated in level 0. For geometry indexes this happens if the bounding box of the index is smaller than the data domain. A high number of objects in level 0 may require a costly application of secondary filters if the query window falls partially outside the bounding box; if the query window falls inside the bounding box, a high number of objects in level 0 may actually improve performance.
42. Property | Type | Set | Description
Number_Of_Rows_Selected_By_Primary_Filter | bigint | Core | P = number of rows selected by the primary filter
Number_Of_Rows_Selected_By_Internal_Filter | bigint | Core | S = number of rows selected by the internal filter; for these rows the secondary filter is not called
Number_Of_Times_Secondary_Filter_Is_Called | bigint | Core | number of times the secondary filter is called
Percentage_Of_Rows_NotSelected_By_Primary_Filter | float | Core | with N rows in the base table and P selected by the primary filter, this is (N-P)/N as a percentage
Percentage_Of_Primary_Filter_Rows_Selected_By_Internal_Filter | float | Core | S/P as a percentage; the higher the percentage, the better the index is at avoiding the more expensive secondary filter
Number_Of_Rows_Output | bigint | Core | O = number of rows output by the query
Internal_Filter_Efficiency | float | Core | S/O as a percentage
Primary_Filter_Efficiency | float | Core | O/P as a percentage; the higher the efficiency, the fewer false positives have to be processed by the secondary filter
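The ratio properties above are plain arithmetic over the counter properties. A small Python sketch with made-up counts (N, P, S, O as defined in the table):

```python
# Hypothetical computation of the ratio properties from the counters:
#   N = rows in the base table, P = rows passing the primary filter,
#   S = rows accepted by the internal filter (secondary filter skipped),
#   O = rows output by the query.

def index_stats(N, P, S, O):
    return {
        "Percentage_Of_Rows_NotSelected_By_Primary_Filter": 100.0 * (N - P) / N,
        "Percentage_Of_Primary_Filter_Rows_Selected_By_Internal_Filter": 100.0 * S / P,
        "Internal_Filter_Efficiency": 100.0 * S / O,
        "Primary_Filter_Efficiency": 100.0 * O / P,
    }

# Made-up example: 1M rows, 5k pass the primary filter, 3k are settled by
# the internal filter, 4k rows come out of the query.
stats = index_stats(N=1_000_000, P=5_000, S=3_000, O=4_000)
for name, value in stats.items():
    print(f"{name}: {value:.1f}")
```

With these numbers the primary filter discards 99.5% of the table, and 80% of its candidates turn out to be real hits (Primary_Filter_Efficiency = O/P = 80.0).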
Procedure: construct 4 points/ranges for each cell in T; remove duplicates; sort (optionally); seek.
Clustering imposes an ordering on the index.
Experimentation: for instance, consider the US Highways dataset. Some of the LineStrings are quite long (over 2000 miles) and others are quite short (400 meters or less). For optimal performance, the following two indexes were roughly equivalent:
- Geography index: MEDIUM, MEDIUM, MEDIUM, MEDIUM, 1024
- Geometry index: LOW, LOW, LOW, LOW, 1024