The document discusses using geodata and geospatial functions in MySQL. It provides examples of finding zip codes and points of interest within a given distance, converting a table to use spatial data types and indexes for more efficient geospatial queries, and examples querying weather data from a Mesonet database using spatial functions.
Vector Tiles with GeoServer and OpenLayersJody Garnett
The latest release of GeoServer adds support for creating Vector Tiles in GeoJSON, TopoJSON, and MapBox Vector Tiles format through its WMS service for all the vector data formats it supports. These tiles can be cached using GeoWebCache (built into GeoServer), and served with the various tiling protocols (TMS, WMTS, and WMS-C). Thanks to very recent OpenLayers 3 development, these Vector Tiles can be easily and efficiently styled on a map.
This technical talk will look at how GeoServer makes Vector Tiles accessible through standard OGC services and how they differ from normal WMS and WFS usage. It will also look at how OpenLayers 3 - as a simple-to-use vector tiles client - interacts with GeoServer to retrieve tiles and effectively manage and style them. OpenLayer 3’s extensive style infrastructure will be investigated.
State of GeoServer provides an update on our community and reviews the new and noteworthy features for the Project. The community keeps an aggressive six month release cycle with GeoServer 2.8 and 2.9 being released this year.
Each releases bring together exciting new features. This year a lot of work has been done on the user interface, clustering, security and compatibility with the latest Java platform. We will also take a look at community research into vector tiles, multi-resolution raster support and more.
Attend this talk for a cheerful update on what is happening with this popular OSGeo project. Whether you are an expert user, a developer, or simply curious what these projects can do for you, this talk is for you.
Vector Tiles with GeoServer and OpenLayersJody Garnett
The latest release of GeoServer adds support for creating Vector Tiles in GeoJSON, TopoJSON, and MapBox Vector Tiles format through its WMS service for all the vector data formats it supports. These tiles can be cached using GeoWebCache (built into GeoServer), and served with the various tiling protocols (TMS, WMTS, and WMS-C). Thanks to very recent OpenLayers 3 development, these Vector Tiles can be easily and efficiently styled on a map.
This technical talk will look at how GeoServer makes Vector Tiles accessible through standard OGC services and how they differ from normal WMS and WFS usage. It will also look at how OpenLayers 3 - as a simple-to-use vector tiles client - interacts with GeoServer to retrieve tiles and effectively manage and style them. OpenLayer 3’s extensive style infrastructure will be investigated.
State of GeoServer provides an update on our community and reviews the new and noteworthy features for the Project. The community keeps an aggressive six month release cycle with GeoServer 2.8 and 2.9 being released this year.
Each releases bring together exciting new features. This year a lot of work has been done on the user interface, clustering, security and compatibility with the latest Java platform. We will also take a look at community research into vector tiles, multi-resolution raster support and more.
Attend this talk for a cheerful update on what is happening with this popular OSGeo project. Whether you are an expert user, a developer, or simply curious what these projects can do for you, this talk is for you.
Slides from my Introduction to PostGIS workshop at the FOSS4G conference in 2009. The material is available at http://revenant.ca/www/postgis/workshop/
What's the great thing about a database? Why, it stores data of course! However, one feature that makes a database useful is the different data types that can be stored in it, and the breadth and sophistication of the data types in PostgreSQL is second-to-none, including some novel data types that do not exist in any other database software!
This talk will take an in-depth look at the special data types built right into PostgreSQL version 9.4, including:
* INET types
* UUIDs
* Geometries
* Arrays
* Ranges
* Document-based Data Types:
* Key-value store (hstore)
* JSON (text [JSON] & binary [JSONB])
We will also have some cleverly concocted examples to show how all of these data types can work together harmoniously.
Creating Stunning Maps in GeoServer: mastering SLD and CSS stylesGeoSolutions
Various software can style maps and generate a proper SLD document for OGC compliant WMS like GeoServer to use. However, in most occasions, the styling allowed by the graphical tools is pretty limited and not good enough to achieve good looking, readable and efficient cartographic output. For those that like to write their own styles CSS also represents a nice alternatives thanks to its compactness and expressiveness.
Several topics will be covered, providing examples in both SLD and CSS for each, including: mastering multi-scale styling, using GeoServer extensions to build common hatch patterns, line styling beyond the basics, such as cased lines, controlling symbols along a line and the way they repeat, leveraging TTF symbol fonts and SVGs to generate good looking point thematic maps, using the full power of GeoServer label lay-outing tools to build pleasant, informative maps on both point, polygon and line layers, including adding road plates around labels, leverage the labeling subsystem conflict resolution engine to avoid overlaps in stand alone point symbology, blending charts into a map, dynamically transform data during rendering to get more explicative maps without the need to pre-process a large amount of views.
The presentation aims to provide the attendees with enough information to master SLD/CSS documents and most of GeoServer extensions to generate appealing, informative, readable maps that can be quickly rendered on screen.
Accelerating Local Search with PostgreSQL (KNN-Search)Jonathan Katz
KNN-GiST indexes were added in PostgreSQL 9.1 and greatly accelerate some common queries in the geospatial and textual search realms. This presentation will demonstrate the power of KNN-GiST indexes on geospatial and text searching queries, but also their present limitations through some of my experimentations. I will also discuss some of the theory behind KNN (k-nearest neighbor) as well as some of the applications this feature can be applied too.
To see a version of the talk given at PostgresOpen 2011, please visit http://www.youtube.com/watch?v=N-MD08QqGEM
ClickHouse and the Magic of Materialized Views, By Robert Hodges and Altinity...Altinity Ltd
Presented at the webinar, June 26, 2019
Materialized views are a killer feature of ClickHouse that can speed up queries 20X or more. Our webinar will teach you how to use this potent tool starting with how to create materialized views and load data. We'll then walk through cookbook examples to solve practical problems like deriving aggregates that outlive base data, answering last point queries, and using AggregateFunctions to handle problems like counting unique values, which is a special ClickHouse feature. There will be time for Q&A at the end. At that point you'll be a wizard of ClickHouse materialized views and able to cast spells of your own.
Big Data in Real-Time: How ClickHouse powers Admiral's visitor relationships ...Altinity Ltd
Slides for the webinar presented on June 16, 2020
By James Hartig, Co-Founders of Admiral and Robert Hodges, Altinity CEO
Advertising is dying in the wake of privacy and adblockers. Join us for a conversation with James Hartig, a Co-Founder at Admiral (getadmiral.com), who helps publishers diversify their revenue and build more meaningful relationships with users. We'll start with an overview of Admiral's platform and how they use large scale session data to power their engagement engine. We'll then discuss the ClickHouse features that Admiral uses to power these real-time decisions. Finally, we'll walk through how Admiral migrated from MongoDB to ClickHouse and some of their plans for future projects. Join us to learn how ClickHouse drives cutting edge real-time applications today!
Speaker Bios:
James Hartig is one of the Co-Founders of Admiral working on distributed systems in Golang. Before this, he worked at the online music streaming platform, Grooveshark.
Robert Hodges is CEO of Altinity, which offers enterprise support for ClickHouse. He has over three decades of experience in data management spanning 20 different DBMS types. ClickHouse is his current favorite. ;)
Webinar: Secrets of ClickHouse Query Performance, by Robert HodgesAltinity Ltd
From webinars September 11 and September 17, 2019
ClickHouse is famous for speed. That said, you can almost always make it faster! This webinar uses examples to teach you how to deduce what queries are actually doing by reading the system log and system tables. We'll then explore standard ways to increase query speed: data types and encodings, filtering, join reordering, skip indexes, materialized views, session parameters, to name just a few. In each case we'll circle back to query plans and system metrics to demonstrate changes in ClickHouse behavior that explain the boost in performance. We hope you'll enjoy the first step to becoming a ClickHouse performance guru!
Speaker Bio:
Robert Hodges is CEO of Altinity, which offers enterprise support for ClickHouse. He has over three decades of experience in data management spanning 20 different DBMS types. ClickHouse is his current favorite. ;)
Slides from my Introduction to PostGIS workshop at the FOSS4G conference in 2009. The material is available at http://revenant.ca/www/postgis/workshop/
What's the great thing about a database? Why, it stores data of course! However, one feature that makes a database useful is the different data types that can be stored in it, and the breadth and sophistication of the data types in PostgreSQL is second-to-none, including some novel data types that do not exist in any other database software!
This talk will take an in-depth look at the special data types built right into PostgreSQL version 9.4, including:
* INET types
* UUIDs
* Geometries
* Arrays
* Ranges
* Document-based Data Types:
* Key-value store (hstore)
* JSON (text [JSON] & binary [JSONB])
We will also have some cleverly concocted examples to show how all of these data types can work together harmoniously.
Creating Stunning Maps in GeoServer: mastering SLD and CSS stylesGeoSolutions
Various software can style maps and generate a proper SLD document for OGC compliant WMS like GeoServer to use. However, in most occasions, the styling allowed by the graphical tools is pretty limited and not good enough to achieve good looking, readable and efficient cartographic output. For those that like to write their own styles CSS also represents a nice alternatives thanks to its compactness and expressiveness.
Several topics will be covered, providing examples in both SLD and CSS for each, including: mastering multi-scale styling, using GeoServer extensions to build common hatch patterns, line styling beyond the basics, such as cased lines, controlling symbols along a line and the way they repeat, leveraging TTF symbol fonts and SVGs to generate good looking point thematic maps, using the full power of GeoServer label lay-outing tools to build pleasant, informative maps on both point, polygon and line layers, including adding road plates around labels, leverage the labeling subsystem conflict resolution engine to avoid overlaps in stand alone point symbology, blending charts into a map, dynamically transform data during rendering to get more explicative maps without the need to pre-process a large amount of views.
The presentation aims to provide the attendees with enough information to master SLD/CSS documents and most of GeoServer extensions to generate appealing, informative, readable maps that can be quickly rendered on screen.
Accelerating Local Search with PostgreSQL (KNN-Search)Jonathan Katz
KNN-GiST indexes were added in PostgreSQL 9.1 and greatly accelerate some common queries in the geospatial and textual search realms. This presentation will demonstrate the power of KNN-GiST indexes on geospatial and text searching queries, but also their present limitations through some of my experimentations. I will also discuss some of the theory behind KNN (k-nearest neighbor) as well as some of the applications this feature can be applied too.
To see a version of the talk given at PostgresOpen 2011, please visit http://www.youtube.com/watch?v=N-MD08QqGEM
ClickHouse and the Magic of Materialized Views, By Robert Hodges and Altinity...Altinity Ltd
Presented at the webinar, June 26, 2019
Materialized views are a killer feature of ClickHouse that can speed up queries 20X or more. Our webinar will teach you how to use this potent tool starting with how to create materialized views and load data. We'll then walk through cookbook examples to solve practical problems like deriving aggregates that outlive base data, answering last point queries, and using AggregateFunctions to handle problems like counting unique values, which is a special ClickHouse feature. There will be time for Q&A at the end. At that point you'll be a wizard of ClickHouse materialized views and able to cast spells of your own.
Big Data in Real-Time: How ClickHouse powers Admiral's visitor relationships ...Altinity Ltd
Slides for the webinar presented on June 16, 2020
By James Hartig, Co-Founders of Admiral and Robert Hodges, Altinity CEO
Advertising is dying in the wake of privacy and adblockers. Join us for a conversation with James Hartig, a Co-Founder at Admiral (getadmiral.com), who helps publishers diversify their revenue and build more meaningful relationships with users. We'll start with an overview of Admiral's platform and how they use large scale session data to power their engagement engine. We'll then discuss the ClickHouse features that Admiral uses to power these real-time decisions. Finally, we'll walk through how Admiral migrated from MongoDB to ClickHouse and some of their plans for future projects. Join us to learn how ClickHouse drives cutting edge real-time applications today!
Speaker Bios:
James Hartig is one of the Co-Founders of Admiral working on distributed systems in Golang. Before this, he worked at the online music streaming platform, Grooveshark.
Robert Hodges is CEO of Altinity, which offers enterprise support for ClickHouse. He has over three decades of experience in data management spanning 20 different DBMS types. ClickHouse is his current favorite. ;)
Webinar: Secrets of ClickHouse Query Performance, by Robert HodgesAltinity Ltd
From webinars September 11 and September 17, 2019
ClickHouse is famous for speed. That said, you can almost always make it faster! This webinar uses examples to teach you how to deduce what queries are actually doing by reading the system log and system tables. We'll then explore standard ways to increase query speed: data types and encodings, filtering, join reordering, skip indexes, materialized views, session parameters, to name just a few. In each case we'll circle back to query plans and system metrics to demonstrate changes in ClickHouse behavior that explain the boost in performance. We hope you'll enjoy the first step to becoming a ClickHouse performance guru!
Speaker Bio:
Robert Hodges is CEO of Altinity, which offers enterprise support for ClickHouse. He has over three decades of experience in data management spanning 20 different DBMS types. ClickHouse is his current favorite. ;)
Postgres Conference (PgCon) New York 2019Ibrar Ahmed
PostgreSQL provides a way to communicate with external data sources. This could be another PostgreSQL instance or any other database. The other database might be a relational database such as MySQL or Oracle; or any NoSQL database such as MongoDB or Hadoop. To achieve this, PostgreSQL implements ISO Standard call SQL-MED in the form of Foreign Data Wrappers (FDW). This presentation will explain in detail how PostgreSQL FDWs work. It will include a detailed explanation of simple features and will introduce more advanced features that were added in recent versions of PostgreSQL. Examples of these would be to show how aggregate pushdown and join pushdown work in PostgreSQL. The talk will include working examples of these advanced features and demonstrating their use with different databases. These examples show how data from different database flavors can be used by PostgreSQL, including those from heterogeneous relational databases, and showing NoSQL column store joins.
Yes, you read it correctly, we are jumping from 5.7 to 8.0 (that sounds familiar, doesn't it?). The new version doesn't only change the number but also changes how you write SQL. Recursive queries will allow you to generate series and work with hierarchical data. New JSON functions and performance improvements were also added to 8.0 to help you work on non-relational data. Expect to see what is new and improved in this talk to power up your application even more.
GridSQL is commonly thought of as a replication solution along the likes of Slony and Bucardo, but the open source GridSQL project actually allows PostgreSQL queries to be parallelized across many servers allowing performance to scale nearly linearly. In this session, we will discuss the advantages to using GridSQL for large multi-terabyte data warehouses and how to design your PostgreSQL schemas and queries to leverage GridSQL. We will dig into how GridSQL plans a query capable of spanning multiple PostgreSQL servers and executes across those nodes. We will delve into some performance expectations and where GridSQL should be deployed.
Beyond php - it's not (just) about the codeWim Godden
Most PHP developers focus on writing code. But creating Web applications is about much more than just wrting PHP. Take a step outside the PHP cocoon and into the big PHP ecosphere to find out how small code changes can make a world of difference on servers and network. This talk is an eye-opener for developers who spend over 80% of their time coding, debugging and testing.
In this session, I will discuss some of the practical uses of JSON in MySQL, focusing on version 5.7 but also discussing options for previous versions, and briefly discussing MySQL 8.0. I will discuss several specific use cases, as well as some JSON antipatterns that should be avoided.
Some topics I will address:
- The evolution of JSON parsing in MySQL: from stored routines to UDFs to native functions
- Real-life use cases: Custom fields, flexible rollups, nested objects, etc.
- The power of JSON + virtual columns
- Storing JSON as text versus using new JSON data types
- Read/write balance considerations
- Disk storage implications
- Indexing JSON documents in MySQL
- Additional JSON features in MySQL 8.0
What SQL functionality was added in the past year or so. The presentation covers default expressions, functional key parts, lateral derived tables, CHECK constraints, JSON and spatial improvements. Also some other small SQL and other improvements.
Where the %$#^ Is Everybody? Geospatial Solutions For Oracle APEXJim Czuprynski
Geospatial use cases are common – closest coffeeshop, efficient delivery routing, other stores near me – and I’ll show you how to use Oracle’s Spatial and Graph feature set to tackle them within a simple-to-build APEX application.
Unleashing the Power of Data_ Choosing a Trusted Analytics Platform.pdfEnterprise Wired
In this guide, we'll explore the key considerations and features to look for when choosing a Trusted analytics platform that meets your organization's needs and delivers actionable intelligence you can trust.
Enhanced Enterprise Intelligence with your personal AI Data Copilot.pdfGetInData
Recently we have observed the rise of open-source Large Language Models (LLMs) that are community-driven or developed by the AI market leaders, such as Meta (Llama3), Databricks (DBRX) and Snowflake (Arctic). On the other hand, there is a growth in interest in specialized, carefully fine-tuned yet relatively small models that can efficiently assist programmers in day-to-day tasks. Finally, Retrieval-Augmented Generation (RAG) architectures have gained a lot of traction as the preferred approach for LLMs context and prompt augmentation for building conversational SQL data copilots, code copilots and chatbots.
In this presentation, we will show how we built upon these three concepts a robust Data Copilot that can help to democratize access to company data assets and boost performance of everyone working with data platforms.
Why do we need yet another (open-source ) Copilot?
How can we build one?
Architecture and evaluation
Global Situational Awareness of A.I. and where its headedvikram sood
You can see the future first in San Francisco.
Over the past year, the talk of the town has shifted from $10 billion compute clusters to $100 billion clusters to trillion-dollar clusters. Every six months another zero is added to the boardroom plans. Behind the scenes, there’s a fierce scramble to secure every power contract still available for the rest of the decade, every voltage transformer that can possibly be procured. American big business is gearing up to pour trillions of dollars into a long-unseen mobilization of American industrial might. By the end of the decade, American electricity production will have grown tens of percent; from the shale fields of Pennsylvania to the solar farms of Nevada, hundreds of millions of GPUs will hum.
The AGI race has begun. We are building machines that can think and reason. By 2025/26, these machines will outpace college graduates. By the end of the decade, they will be smarter than you or I; we will have superintelligence, in the true sense of the word. Along the way, national security forces not seen in half a century will be un-leashed, and before long, The Project will be on. If we’re lucky, we’ll be in an all-out race with the CCP; if we’re unlucky, an all-out war.
Everyone is now talking about AI, but few have the faintest glimmer of what is about to hit them. Nvidia analysts still think 2024 might be close to the peak. Mainstream pundits are stuck on the wilful blindness of “it’s just predicting the next word”. They see only hype and business-as-usual; at most they entertain another internet-scale technological change.
Before long, the world will wake up. But right now, there are perhaps a few hundred people, most of them in San Francisco and the AI labs, that have situational awareness. Through whatever peculiar forces of fate, I have found myself amongst them. A few years ago, these people were derided as crazy—but they trusted the trendlines, which allowed them to correctly predict the AI advances of the past few years. Whether these people are also right about the next few years remains to be seen. But these are very smart people—the smartest people I have ever met—and they are the ones building this technology. Perhaps they will be an odd footnote in history, or perhaps they will go down in history like Szilard and Oppenheimer and Teller. If they are seeing the future even close to correctly, we are in for a wild ride.
Let me tell you what we see.
Learn SQL from basic queries to Advance queriesmanishkhaire30
Dive into the world of data analysis with our comprehensive guide on mastering SQL! This presentation offers a practical approach to learning SQL, focusing on real-world applications and hands-on practice. Whether you're a beginner or looking to sharpen your skills, this guide provides the tools you need to extract, analyze, and interpret data effectively.
Key Highlights:
Foundations of SQL: Understand the basics of SQL, including data retrieval, filtering, and aggregation.
Advanced Queries: Learn to craft complex queries to uncover deep insights from your data.
Data Trends and Patterns: Discover how to identify and interpret trends and patterns in your datasets.
Practical Examples: Follow step-by-step examples to apply SQL techniques in real-world scenarios.
Actionable Insights: Gain the skills to derive actionable insights that drive informed decision-making.
Join us on this journey to enhance your data analysis capabilities and unlock the full potential of SQL. Perfect for data enthusiasts, analysts, and anyone eager to harness the power of data!
#DataAnalysis #SQL #LearningSQL #DataInsights #DataScience #Analytics
The Building Blocks of QuestDB, a Time Series Databasejavier ramirez
Talk Delivered at Valencia Codes Meetup 2024-06.
Traditionally, databases have treated timestamps just as another data type. However, when performing real-time analytics, timestamps should be first class citizens and we need rich time semantics to get the most out of our data. We also need to deal with ever growing datasets while keeping performant, which is as fun as it sounds.
It is no wonder time-series databases are now more popular than ever before. Join me in this session to learn about the internal architecture and building blocks of QuestDB, an open source time-series database designed for speed. We will also review a history of some of the changes we have gone over the past two years to deal with late and unordered data, non-blocking writes, read-replicas, or faster batch ingestion.
State of Artificial intelligence Report 2023kuntobimo2016
Artificial intelligence (AI) is a multidisciplinary field of science and engineering whose goal is to create intelligent machines.
We believe that AI will be a force multiplier on technological progress in our increasingly digital, data-driven world. This is because everything around us today, ranging from culture to consumer products, is a product of intelligence.
The State of AI Report is now in its sixth year. Consider this report as a compilation of the most interesting things we’ve seen with a goal of triggering an informed conversation about the state of AI and its implication for the future.
We consider the following key dimensions in our report:
Research: Technology breakthroughs and their capabilities.
Industry: Areas of commercial application for AI and its business impact.
Politics: Regulation of AI, its economic implications and the evolving geopolitics of AI.
Safety: Identifying and mitigating catastrophic risks that highly-capable future AI systems could pose to us.
Predictions: What we believe will happen in the next 12 months and a 2022 performance review to keep us honest.
Analysis insight about a Flyball dog competition team's performanceroli9797
Insight of my analysis about a Flyball dog competition team's last year performance. Find more: https://github.com/rolandnagy-ds/flyball_race_analysis/tree/main
ViewShift: Hassle-free Dynamic Policy Enforcement for Every Data LakeWalaa Eldin Moustafa
Dynamic policy enforcement is becoming an increasingly important topic in today’s world where data privacy and compliance is a top priority for companies, individuals, and regulators alike. In these slides, we discuss how LinkedIn implements a powerful dynamic policy enforcement engine, called ViewShift, and integrates it within its data lake. We show the query engine architecture and how catalog implementations can automatically route table resolutions to compliance-enforcing SQL views. Such views have a set of very interesting properties: (1) They are auto-generated from declarative data annotations. (2) They respect user-level consent and preferences (3) They are context-aware, encoding a different set of transformations for different use cases (4) They are portable; while the SQL logic is only implemented in one SQL dialect, it is accessible in all engines.
#SQL #Views #Privacy #Compliance #DataLake
2. OKC MySQL
● Discuss topics about MySQL and related open source RDBMS
● Discuss complementary topics (big data, NoSQL, etc)
● Help to grow the local ecosystem through meetups and events
4. What is Geodata?
● Encompasses a wide array of topics
● Revolves around geo-positioning data (latitude/longitude)
● Point data (single lat/lon)
● Bounded areas (think radius from point)
● Defined area (think city limits outline on map)
● Often includes some function of distance
● Distance between points
● All points within x, y
5. Why do we care?
● Here are some commonly asked questions based around
geodata:
● What are the 5 closest BBQ restaurants to my hotel?
● How far is it from here to the airport?
● How many restaurants are there within 25 miles of my
hotel?
● These are all fairly common questions – especially with the
prevalence of geo-enabled devices (anyone here ever enable
“Location Services”?)
6. Other Industries
● Oil/gas exploration
● Meteorology
● Logistics companies
● < INSERT YOUR INDUSTRY HERE >
● The point: geodata is so readily available, you are likely already
using it or will be soon!
7. High Level theory and formulas...
… everyone's favorite
● Distance between points on sphere (The Haversine Formula)
● Ok, everyone get out your calculators and slide rules...
8. en MySQL por favor...
● MySQL prior to 5.6
● Get out your calculators
● MySQL 5.6
● Introduced st_distance (built-in)
9. Enough with the theory already!
SET @src_lat = 37; SET @src_lng = -133;
SET @dest_lat = 38; SET @dest_lng = -133;
SELECT (3959 * acos(cos(radians(@src_lat)) *
cos(radians(@dest_lat)) * cos(radians(@dest_lng) -
radians(@src_lng)) + sin(radians(@src_lat)) *
sin( radians(@dest_lat)))) as GreatCircleDistance
+---------------------+
| GreatCircleDistance |
+---------------------+
| 69.09758508647379 |
+---------------------+
● Huzzah!! The distance between two latitude lines is 69 miles!
10. … lets put it all together
Find me all zip codes within 25 miles of my current zip code...
● Table Structure (pre 5.6)
CREATE TABLE `zipcodes_indexed` (
`ID` int(11) NOT NULL AUTO_INCREMENT,
`ZIP` varchar(255) DEFAULT NULL,
`Latitude` decimal(10,8) DEFAULT NULL,
`Longitude` decimal(11,8) DEFAULT NULL,
`City` varchar(255) DEFAULT NULL,
`State` varchar(255) DEFAULT NULL,
`County` varchar(255) DEFAULT NULL,
`Type` varchar(255) DEFAULT NULL,
PRIMARY KEY (`ID`),
KEY `lat` (`Latitude`,`Longitude`),
KEY `city` (`City`,`State`),
KEY `state` (`State`)
) ENGINE=InnoDB
11. … lets put it all together
● Helper Function (pre 5.6)
DELIMITER $$
DROP FUNCTION IF EXISTS DistanceInMiles$$
CREATE FUNCTION DistanceInMiles (src_lat decimal(10,8), src_lng
decimal(11,8), dest_lat decimal(10,8), dest_lng decimal(11,8)) RETURNS
decimal(15,8) DETERMINISTIC
BEGIN
RETURN CAST((3959 * acos(cos(radians(src_lat)) * cos(radians(dest_lat)) *
cos(radians(dest_lng) - radians(src_lng)) + sin( radians(src_lat)) * sin(
radians(dest_lat)))) as decimal(15,8));
END $$
DELIMITER ;
12. … and results!
mysql> # Norman, OK Post Office (73071)
mysql> SET @srcLat = 35.254049;
Query OK, 0 rows affected (0.00 sec)
mysql> SET @srcLng = -97.300313;
Query OK, 0 rows affected (0.00 sec)
mysql> SET @dist = 25;
Query OK, 0 rows affected (0.00 sec)
mysql> SELECT z.ZIP, z.City, z.State,
-> DistanceInMiles(@srcLat, @srcLng, z.Latitude, z.Longitude) as distance
-> FROM zipcodes_indexed z
-> HAVING distance < @dist
-> ORDER BY distance
-> LIMIT 10;
+-------+---------------+-------+-------------+
| ZIP | City | State | distance |
+-------+---------------+-------+-------------+
| 73071 | NORMAN | OK | 0.00000000 |
| 73026 | NORMAN | OK | 1.45586169 |
| 73072 | NORMAN | OK | 4.30645795 |
| 73165 | OKLAHOMA CITY | OK | 5.84183161 |
| 73070 | NORMAN | OK | 7.15379176 |
| 73068 | NOBLE | OK | 7.15998471 |
| 73160 | OKLAHOMA CITY | OK | 7.83641939 |
| 73069 | NORMAN | OK | 7.91904816 |
| 73019 | NORMAN | OK | 8.72433716 |
| 73150 | OKLAHOMA CITY | OK | 10.62626848 |
+-------+---------------+-------+-------------+
10 rows in set (0.79 sec)
13. Not so great...
mysql> EXPLAIN SELECT z.ZIP, z.City, z.State,
-> DistanceInMiles(@srcLat, @srcLng, z.Latitude, z.Longitude) as distance
-> FROM zipcodes_indexed z
-> HAVING distance < @dist
-> ORDER BY distance
-> LIMIT 10G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: z
partitions: NULL
type: ALL
possible_keys: NULL
key: NULL
key_len: NULL
ref: NULL
rows: 42894
filtered: 100.00
Extra: Using temporary; Using filesort
1 row in set, 1 warning (0.00 sec)
14. More math to the rescue!
● Rather than scan the whole table, lets just look at a small
rectangle of data (i.e. a bounding box):
1o
Latitude ~= 69 miles
1o
Longitude ~= cos(lat) * 69
$lat_range = radius / 69
$lng_range = abs(radius / (cos(lat) * 69))
$lon1 = $mylng + $lng_range
$lon2 = $mylng - $lng_range
$lat1 = $mylat +$lat_range
$lat2 = $mylat - $lat_range
15. … and respectable!
mysql> SELECT z.ZIP, z.City, z.State,
-> DistanceInMiles(@srcLat, @srcLng, z.Latitude, z.Longitude) as distance
-> FROM zipcodes_indexed z
-> WHERE z.Longitude BETWEEN -97.744004 AND -96.856621
-> AND z.Latitude BETWEEN 34.891730 AND 35.616367
-> HAVING distance < @dist
-> ORDER BY distance
-> LIMIT 10;
+-------+---------------+-------+-------------+
| ZIP | City | State | distance |
+-------+---------------+-------+-------------+
| 73071 | NORMAN | OK | 0.00000000 |
| 73026 | NORMAN | OK | 1.45586169 |
| 73072 | NORMAN | OK | 4.30645795 |
| 73165 | OKLAHOMA CITY | OK | 5.84183161 |
| 73070 | NORMAN | OK | 7.15379176 |
| 73068 | NOBLE | OK | 7.15998471 |
| 73160 | OKLAHOMA CITY | OK | 7.83641939 |
| 73069 | NORMAN | OK | 7.91904816 |
| 73019 | NORMAN | OK | 8.72433716 |
| 73150 | OKLAHOMA CITY | OK | 10.62626848 |
+-------+---------------+-------+-------------+
10 rows in set (0.00 sec)
17. Converting our table to geospatial...
● Non-Spatial way:
● Lat / Lon as individual decimal columns
● Composite B-Tree index covering each column
● Spatial way:
● Use the geometry type with a POINT object (single
lat/lon)
● Use the geometry type with a POLYGON object
(defined border)
● Use a SPATIAL index on the geometry column
(MyISAM only through 5.6, added to InnoDB in 5.7)
19. So what?
● We had already written a query do just that
● And this one is still doing the same amount of handler
operations!
● This opens up the opportunity to run other GIS based
calculations as we'll see now...
20. Not just points!
● In earlier slides, we based everything on a single lat/lon
coordinate
● Several “zip code” databases follow this model
● In recent years, the prevalence of full Polygon region databases
has increased
● Enter the US Census provided tl_2014_us_zcta510 schema...
21. Not just points...
CREATE TABLE `tl_2014_us_zcta510` (
`OGR_FID` int(11) NOT NULL AUTO_INCREMENT,
`SHAPE` geometry NOT NULL,
`zcta5ce10` varchar(5) DEFAULT NULL,
`geoid10` varchar(5) DEFAULT NULL,
`classfp10` varchar(2) DEFAULT NULL,
`mtfcc10` varchar(5) DEFAULT NULL,
`funcstat10` varchar(1) DEFAULT NULL,
`aland10` double DEFAULT NULL,
`awater10` double DEFAULT NULL,
`intptlat10` varchar(11) DEFAULT NULL,
`intptlon10` varchar(12) DEFAULT NULL,
UNIQUE KEY `OGR_FID` (`OGR_FID`),
SPATIAL KEY `SHAPE` (`SHAPE`),
KEY `zcta5ce10` (`zcta5ce10`)
) ENGINE=InnoDB
● This table defines full regions for zip codes as opposed to a
centered lat/lon for the post office
22. Difference in storage
● Old version for Norman zip code 73071:
● New version in region based version:
● Note that now, we have a full region as opposed to a point...
astext(geo): POINT(-97.300313 35.254049)
astext(shape): POLYGON((-97.44109 35.228675,-97.442413
35.228673,-97.442713 35.228669,-97.442722 35.229179,-
97.442729 35.229632,-97.442727 35.230083,-97.442726
35.230542,-97.442725 35.230998,-97.441071 35.23099,-
97.441072 35.231275,-97.441072 35.231437,-97.442724
35.231464,-97.442722 35.231912,-97.442647 35.231898,-
97.441073 35.231884,-97.441073 35.232346,...
23. GIS Functions
● st_contains
● Check if an object is entirely within another object
● st_within
● Check if an object is spatially within another object
● st_*
● Detailed list here:
http://dev.mysql.com/doc/refman/5.7/en/spatial-
relation-functions-object-shapes.html
24. Sample GIS Queries
● # Find me all the zipcodes (polygons) for my points of interest
SELECT raw.zcta5ce10 AS zipcode, pts.name
FROM my_points pts
JOIN tl_2014_us_zcta510 raw ON st_within(pts.loc, raw.SHAPE);
● # Find me all points of interest within a zip code (polygon)
SELECT raw.zcta5ce10 AS zipcode, pts.name
FROM my_points pts
JOIN tl_2014_us_zcta510 raw ON st_contains(raw.SHAPE, pts.loc)
WHERE raw.zcta5ce10 = "73071";
25. Sample GIS Queries
● # Find me all the points of interest within a county
SELECT raw.zcta5ce10 AS zipcode, pts.name
FROM my_points pts
JOIN tl_2014_us_zcta510 raw ON st_contains(raw.SHAPE, pts.loc)
WHERE raw.zcta5ce10 IN (SELECT ZIP from legacy.zipcodes_indexed
where County = "Cleveland");
26. Impact of SPATIAL Key
● I mentioned earlier that SPATIAL keys will be available within
InnoDB starting in 5.7
● Here is just quick sample of the same query, with and without a
SPATIAL key around a polygon
● Lets grab one of the earlier queries:
SELECT raw.zcta5ce10 AS zipcode, pts.name
FROM my_points pts
JOIN tl_2014_us_zcta510 raw ON st_within(pts.loc, raw.SHAPE);
29. Sample Usage
● Living in Oklahoma, the natural first choice for a quick
database generally revolves around... weather!
● This sample uses data from the readily available “current
conditions” flat files available from http://mesonet.org
30. Mesonet Collector
● Every five minutes (when new observations are posted), grab
the current CSV of all sites
● Load the raw CSV data into a “staging” table
● Clean up the data and move it to the main “observations” table
● Also, an intermediate step that was only run once extracted all
of the site information (lat/lon, name, etc) from the raw data to
populate a base site table (mesonet.sites)
31. Some Fun Queries
# Get all the mesonet sites in a county (based on GIS coords)
SELECT raw.zcta5ce10 AS zipcode, sites.name
FROM mesonet.sites
JOIN geopoly.tl_2014_us_zcta510 raw ON st_contains(raw.SHAPE, sites.pt)
WHERE raw.zcta5ce10 IN (
SELECT ZIP from legacy.zipcodes_indexed where County IN ("Oklahoma")
);
+---------+---------------------+
| zipcode | name |
+---------+---------------------+
| 73084 | Spencer |
| 73107 | Oklahoma City West |
| 73114 | Oklahoma City North |
| 73117 | Oklahoma City East |
+---------+---------------------+
4 rows in set (0.00 sec)
32. Some Fun Queries
# Get the AVG air temp for all sites in a county (based on GIS)
SELECT raw.zcta5ce10 AS zipcode, sts.name, AVG(TAIR)
FROM mesonet.sites sts
JOIN geopoly.tl_2014_us_zcta510 raw ON st_contains(raw.SHAPE, sts.pt)
JOIN mesonet.observations obs ON obs.stid = sts.stid
WHERE raw.zcta5ce10 IN (
SELECT ZIP from legacy.zipcodes_indexed where County IN ("Oklahoma")
)
GROUP BY sts.name;
+---------+---------------------+-----------+
| zipcode | name | AVG(TAIR) |
+---------+---------------------+-----------+
| 73117 | Oklahoma City East | 44.7393 |
| 73114 | Oklahoma City North | 44.3037 |
| 73107 | Oklahoma City West | 44.6729 |
| 73084 | Spencer | 44.0250 |
+---------+---------------------+-----------+
4 rows in set (0.01 sec)
35. GPX Parsing
● As I wanted another sample, I looked through my phone to find
an app I use with GPS frequently: MotionX-GPS (mainly for
hiking/hunting)
● This app lets you:
● Record waypoints
● Record full tracks
● Export tracks in GPX format (bonus)
36. GPX Parsing App
● Now to come up with something useful :)
● Off the top of my head, I thought it would be cool to record a
drive, then map those points out and see how many miles I had
driven in each zip code
● Exciting stuff, I know!
● Note some possible other uses would be real logistics tracking
(think fleet vehicles), tornado tracks, pipeline, etc