This document discusses extending PostgreSQL's BRIN index type to support geospatial data in PostGIS. BRIN indexes store summarized data for ranges of table pages (blocks), which keeps the indexes small. The authors implemented BRIN operator classes and support functions for the PostGIS geometry and geography data types, storing only bounding boxes in the index. Tests on real-world geospatial datasets show BRIN indexes outperforming GiST indexes in size and query performance, especially for sorted data. Future work could include supporting more operators and nearest-neighbor searches.
1. J. Rouhaud - julien.rouhaud@dalibo.com G. Broccolo – giuseppe.broccolo@2ndquadrant.it
PGConf EU 2016, 8th edition
Tallinn, Nov. 2nd, 2016
Block Range INdexing on geospatial data
extend BRIN support to PostGIS
Who we are
Giuseppe Broccolo
@giubro, gbroccolo7, gbroccolo, gemini__81
Julien Rouhaud
@rjuju123, rjuju, rjuju
PostgreSQL index types
• Also known as access methods
• User-defined access methods can be added
– Fully supported since 9.6 (thanks to postgrespro & 2ndQuadrant)
• CREATE ACCESS METHOD (no manual catalog update needed)
• Generic WAL interface (crash safe)
• Some external access methods are available
– Bloom (postgrespro)
– RUM (postgrespro)
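To see which access methods a server knows about, and one of the external ones in action, here is a minimal sketch assuming PostgreSQL 9.6 or later with the contrib `bloom` module available (the `amtype` column of `pg_am` appears in 9.6). The table `bloom_demo` and its columns are hypothetical.

```sql
-- Index access methods known to this server (amtype = 'i' means index AM).
SELECT amname FROM pg_am WHERE amtype = 'i' ORDER BY amname;

-- The extension registers its access method through CREATE ACCESS METHOD,
-- so no catalog row has to be inserted by hand.
CREATE EXTENSION IF NOT EXISTS bloom;

-- Hypothetical demo table.
DROP TABLE IF EXISTS bloom_demo;
CREATE TABLE bloom_demo (a int, b int, c int);
INSERT INTO bloom_demo
SELECT (random() * 1000)::int, (random() * 1000)::int, (random() * 1000)::int
FROM generate_series(1, 10000);

-- One bloom index can serve equality tests on any subset of its columns;
-- length is the signature size in bits, colN the bits per column.
CREATE INDEX bloom_demo_idx ON bloom_demo USING bloom (a, b, c)
  WITH (length = 80, col1 = 2, col2 = 2, col3 = 2);
```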
Btree
• Best-known access method, and the default one
• Balanced tree
• Keys are sorted
• Only handles “standard” comparison operators (=, <, <=, >, >=)
• Supports Index Scan and Index Only Scan
• Supports unique constraints
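A minimal Btree sketch, assuming PostgreSQL 9.5 or later. The table `btree_demo` is hypothetical, and whether the planner actually picks an Index Only Scan depends on statistics and on the visibility map.

```sql
-- Hypothetical demo table.
DROP TABLE IF EXISTS btree_demo;
CREATE TABLE btree_demo (id int, email text);
INSERT INTO btree_demo
SELECT i, 'user' || i || '@example.com' FROM generate_series(1, 100000) AS i;

-- Btree is the default access method, so USING btree is optional.
CREATE UNIQUE INDEX btree_demo_email_idx ON btree_demo USING btree (email);
CREATE INDEX btree_demo_id_idx ON btree_demo (id);

-- Sets the visibility map, which index-only scans rely on.
VACUUM ANALYZE btree_demo;

-- A range predicate on the sorted keys, selecting only the indexed column:
-- a candidate for an Index Only Scan.
EXPLAIN SELECT id FROM btree_demo WHERE id BETWEEN 100 AND 200;
```

A second row with an existing `email` value is then rejected with a `unique_violation` error, because the unique check is enforced by the Btree index itself.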
Extend native access methods
• Access methods use operator classes (opclasses)
CREATE INDEX idx_name
ON tbl USING method (col opclass_name);
• For each type and access method, an opclass defines
– the operators for the needed types
– the support functions required by the access method
• Can be extended by third-party extensions
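The catalog `pg_opclass` lists the operator classes that exist for each access method, and an opclass can be named explicitly in `CREATE INDEX`. Here is a sketch assuming PostgreSQL 9.5 or later. `opclass_demo` is a hypothetical table, and `text_pattern_ops` is used only as a convenient built-in Btree example.

```sql
-- BRIN operator classes on this server and the types they index
-- (PostGIS adds its own once installed).
SELECT c.opcname, c.opcintype::regtype AS indexed_type, c.opcdefault
FROM pg_opclass c
JOIN pg_am a ON a.oid = c.opcmethod
WHERE a.amname = 'brin'
ORDER BY c.opcname;

-- Hypothetical demo table.
DROP TABLE IF EXISTS opclass_demo;
CREATE TABLE opclass_demo (name text);
INSERT INTO opclass_demo SELECT 'item' || i FROM generate_series(1, 1000) AS i;

-- Non-default opclass named explicitly: text_pattern_ops lets a Btree index
-- serve LIKE 'prefix%' searches whatever the database collation is.
CREATE INDEX opclass_demo_name_idx
  ON opclass_demo USING btree (name text_pattern_ops);
```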
BRIN operator class for geometry
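-- a family named explicitly in FAMILY must already exist
CREATE OPERATOR FAMILY brin_geometry_inclusion_ops_2d USING brin;
CREATE OPERATOR CLASS brin_geometry_inclusion_ops_2d
  DEFAULT FOR TYPE geometry
  USING brin
  FAMILY brin_geometry_inclusion_ops_2d AS
    -- strategies: 3 = overlaps, 7 = contains, 8 = is contained by
    OPERATOR 3 &&(geometry, geometry),
    OPERATOR 7 ~(geometry, geometry),
    OPERATOR 8 @(geometry, geometry),
    -- BRIN support functions: 1 opcinfo, 2 add_value, 3 consistent, 4 union;
    -- only add_value is PostGIS-specific, the others are core inclusion ones
    FUNCTION 1 brin_inclusion_opcinfo(internal),
    FUNCTION 2 geom2d_brin_inclusion_add_value(internal, internal, internal, internal),
    FUNCTION 3 brin_inclusion_consistent(internal, internal, internal),
    FUNCTION 4 brin_inclusion_union(internal, internal, internal),
  -- summaries are stored as float-precision 2D bounding boxes
  STORAGE box2df;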
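Once PostGIS is installed, the operators bound to this family and their strategy numbers can be read back from `pg_amop`, and the opclass is used like any other. Here is a sketch assuming PostGIS 2.3 or later (the first release shipping these BRIN opclasses). The table `brin_geom_demo` is hypothetical.

```sql
CREATE EXTENSION IF NOT EXISTS postgis;

-- Operators registered in the 2D BRIN family, with their strategy numbers.
SELECT o.amopstrategy, o.amopopr::regoperator
FROM pg_amop o
JOIN pg_opfamily f ON f.oid = o.amopfamily
WHERE f.opfname = 'brin_geometry_inclusion_ops_2d'
ORDER BY o.amopstrategy;

-- Hypothetical table: one point per integer lon/lat pair.
DROP TABLE IF EXISTS brin_geom_demo;
CREATE TABLE brin_geom_demo (id serial PRIMARY KEY, geom geometry(Point, 4326));
INSERT INTO brin_geom_demo (geom)
SELECT ST_SetSRID(ST_MakePoint(x, y), 4326)
FROM generate_series(-179, 179) AS x, generate_series(-89, 89) AS y;

-- The opclass is DEFAULT FOR TYPE geometry, so naming it is optional.
CREATE INDEX brin_geom_demo_idx ON brin_geom_demo
  USING brin (geom brin_geometry_inclusion_ops_2d);
```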
BRIN - Overview
• Block Range Index
– Introduced in 9.5 by S. Riggs, Á. Herrera, H. Linnakangas
• “Summarized” index
– Splits the table into ranges
• A range is a set of pages (blocks)
• 128 pages by default
• Can be changed with the pages_per_range storage parameter
– Really small index, fast to create
CREATE INDEX ON tbl USING brin (col) WITH (pages_per_range = 64);
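To see how small a BRIN index is, you can build a BRIN and a Btree index on the same naturally ordered column and compare their sizes. Here is a sketch assuming PostgreSQL 9.5 or later. `brin_size_demo` and the one-million-row volume are arbitrary choices, and the exact sizes will vary.

```sql
-- Hypothetical table, rows inserted in timestamp order so that the
-- physical order of the table follows the indexed column.
DROP TABLE IF EXISTS brin_size_demo;
CREATE TABLE brin_size_demo (ts timestamptz, val int);
INSERT INTO brin_size_demo
SELECT '2016-01-01'::timestamptz + i * interval '1 second', i
FROM generate_series(1, 1000000) AS i;

CREATE INDEX brin_size_demo_brin ON brin_size_demo USING brin (ts)
  WITH (pages_per_range = 64);
CREATE INDEX brin_size_demo_btree ON brin_size_demo USING btree (ts);

-- One summary tuple per 64 heap pages versus one entry per row.
SELECT relname, pg_size_pretty(pg_relation_size(oid)) AS size
FROM pg_class
WHERE relname IN ('brin_size_demo_brin', 'brin_size_demo_btree');
```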
BRIN – How data are stored
• For each range, a summary of its blocks is computed
• Two kinds of opclass:
– minmax: for types with a linear sort order (numbers, timestamps, ...). Key values are added following sorting criteria + sorting operators
– inclusion: for more exotic data. Key values are added following inclusion criteria + inclusion operators
• Extensibility
– Provide support functions + operators
http://www.postgresql.org/docs/9.5/static/brin-extensibility.html
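The stored summaries can be inspected with the `pageinspect` contrib module. Its `brin_page_items()` function shows one summary per range: a min/max pair for a minmax opclass, or a bounding value for an inclusion opclass. Here is a sketch assuming PostgreSQL 9.5 or later and superuser rights, which `pageinspect` requires. The table is hypothetical. Page 2 is typically the first regular page only for an index this small (page 0 is the metapage, page 1 the range map).

```sql
CREATE EXTENSION IF NOT EXISTS pageinspect;

-- Hypothetical table: an integer and a box, both growing with i.
DROP TABLE IF EXISTS brin_store_demo;
CREATE TABLE brin_store_demo (n int, b box);
INSERT INTO brin_store_demo
SELECT i, box(point(i, i), point(i + 1, i + 1))
FROM generate_series(1, 10000) AS i;

CREATE INDEX brin_store_n ON brin_store_demo USING brin (n);  -- int4_minmax_ops
CREATE INDEX brin_store_b ON brin_store_demo USING brin (b);  -- box_inclusion_ops

-- Summaries of the first ranges (page 2 assumed to be the first regular page).
SELECT itemoffset, blknum, value
FROM brin_page_items(get_raw_page('brin_store_n', 2), 'brin_store_n');
SELECT itemoffset, blknum, value
FROM brin_page_items(get_raw_page('brin_store_b', 2), 'brin_store_b');
```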
BRIN – internal overview
BRIN – Search the index
• Finding the matching rows
– Scan the whole index for potentially matching ranges
– Scan all table blocks of these ranges and recheck every row to keep only the matching ones
• Can be seen as a partial/enhanced sequential scan
– Requires rows to be physically ordered according to the indexed values, to avoid scanning too many table blocks
– Useless on random data
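You can check the effect of physical ordering with `pg_stats.correlation` and `EXPLAIN ANALYZE`. Here is a sketch assuming PostgreSQL 9.5 or later. The table `brin_scan_demo` is hypothetical. On the random column the planner may well prefer a sequential scan, which illustrates the point.

```sql
-- Hypothetical table: one column follows the physical order, one does not.
DROP TABLE IF EXISTS brin_scan_demo;
CREATE TABLE brin_scan_demo (sorted_val int, random_val int);
INSERT INTO brin_scan_demo
SELECT i, (random() * 1000000)::int FROM generate_series(1, 1000000) AS i;

CREATE INDEX brin_scan_sorted ON brin_scan_demo USING brin (sorted_val);
CREATE INDEX brin_scan_random ON brin_scan_demo USING brin (random_val);
ANALYZE brin_scan_demo;

-- Correlation close to 1 (or -1) favours BRIN; close to 0 means every
-- range covers almost the whole value domain.
SELECT attname, correlation FROM pg_stats
WHERE tablename = 'brin_scan_demo';

-- Bitmap Index Scan selects candidate ranges; Bitmap Heap Scan reads their
-- blocks and rechecks each row ("Rows Removed by Index Recheck").
EXPLAIN (ANALYZE, COSTS OFF)
SELECT count(*) FROM brin_scan_demo WHERE sorted_val BETWEEN 1000 AND 2000;
EXPLAIN (ANALYZE, COSTS OFF)
SELECT count(*) FROM brin_scan_demo WHERE random_val BETWEEN 1000 AND 2000;
```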
BRIN - tips
• Tuning pages_per_range: a bigger value gives
– Smaller index
– More false positives
– More table blocks to recheck
• Needs to be tuned depending on the queries
– Less selective queries can usually afford a bigger pages_per_range
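Each range costs one summary tuple, so the number of ranges for a given `pages_per_range` is the heap size in pages divided by that value, rounded up. Here is a sketch assuming PostgreSQL 9.5 or later. `brin_ppr_demo` and the candidate values are hypothetical. On a table this size, both indexes may still fit in the same few pages.

```sql
-- Hypothetical table of about 4,400 heap pages.
DROP TABLE IF EXISTS brin_ppr_demo;
CREATE TABLE brin_ppr_demo (val int);
INSERT INTO brin_ppr_demo SELECT i FROM generate_series(1, 1000000) AS i;
ANALYZE brin_ppr_demo;  -- refreshes pg_class.relpages

-- Number of ranges (hence summary tuples) for several settings: fewer
-- ranges mean a smaller index but more blocks read per matching range.
SELECT ppr AS pages_per_range,
       ceil(c.relpages / ppr::numeric) AS ranges
FROM pg_class c, unnest(ARRAY[16, 32, 64, 128, 256]) AS ppr
WHERE c.relname = 'brin_ppr_demo'
ORDER BY ppr;

CREATE INDEX brin_ppr_16 ON brin_ppr_demo USING brin (val)
  WITH (pages_per_range = 16);
CREATE INDEX brin_ppr_256 ON brin_ppr_demo USING brin (val)
  WITH (pages_per_range = 256);
```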
BRIN – More tips
• New blocks (not in existing ranges) are not summarized automatically
– Run VACUUM
– Or call brin_summarize_new_values('index_name')
• BRIN never “shrinks” summarized data
– If you update/delete boundary data, you need to REINDEX
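Why a REINDEX is needed can be shown with a minmax summary in C (a hypothetical sketch, not PostgreSQL code): inserts can only widen a range summary, and nothing narrows it when a boundary value is deleted, so the stale bounds keep producing false positives until the range is summarized again from the remaining rows.

```c
#include <stdbool.h>
#include <stddef.h>

typedef struct { int min, max; } RangeSummary;

/* Full (re)summarization of a range, as CREATE INDEX or REINDEX would do.
 * Assumes n >= 1. */
RangeSummary summarize(const int *vals, size_t n)
{
    RangeSummary s = { vals[0], vals[0] };
    for (size_t i = 1; i < n; i++) {
        if (vals[i] < s.min) s.min = vals[i];
        if (vals[i] > s.max) s.max = vals[i];
    }
    return s;
}

/* Incremental maintenance on INSERT/UPDATE: the summary can only widen. */
void summary_add(RangeSummary *s, int v)
{
    if (v < s->min) s->min = v;
    if (v > s->max) s->max = v;
}

/* There is no "summary_remove": a DELETE leaves the old bounds in place. */
bool may_match(const RangeSummary *s, int lo, int hi)
{
    return s->min <= hi && lo <= s->max;
}
```

With rows {10, 20, 1000}, deleting 1000 leaves the summary at [10, 1000], so a search for 500..600 still visits the range; summarizing only {10, 20} again gives [10, 20] and the range is skipped.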
BRIN for PostGIS
• Original idea and POC by Giuseppe Broccolo
• C implementation and debugging during the OSGeo Code Sprint in Paris
– Bug in PostGIS fixed (cast to BOX3D)
– Thanks a lot to Ronan Dunklau
• Code cleanup and some improvements since
• Committed to the PostGIS repo on July 31st (2b3c01b)
• Bug found a few days ago
– Fix already submitted, see trac issue #3665
– Wait for PostGIS 2.3.1
BRIN for PostGIS – features
• Uses the PostGIS infrastructure for storing bounding boxes
– Some helper functions and casts added
• 3 kinds of opclass
– 2D (default), 3D, 4D geometry
– geography
– box2d/box3d (cross-operators defined in the opfamily)
• Operators: &&, @, ~ (2D), &&& (3D, 4D)
• No kNN support, since BRIN doesn't handle kNN
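The 2D operators do not all map onto a range's union box the same way. Below is a hedged C sketch of that mapping, modelled on our reading of the strategy handling in PostgreSQL's brin_inclusion.c (the types and enum names are hypothetical). For && and ~ the union itself must overlap or contain the query box. For @ only an overlap can be required: a member box inside the query box is also inside the union, so the union and the query must intersect, but the union itself may be much bigger than the query.

```c
#include <stdbool.h>

typedef struct { double xmin, ymin, xmax, ymax; } Box;

static bool box_overlaps(Box a, Box b)
{
    return a.xmin <= b.xmax && b.xmin <= a.xmax &&
           a.ymin <= b.ymax && b.ymin <= a.ymax;
}

static bool box_contains(Box a, Box b)  /* a contains b */
{
    return a.xmin <= b.xmin && a.ymin <= b.ymin &&
           a.xmax >= b.xmax && a.ymax >= b.ymax;
}

typedef enum { OP_OVERLAPS /* && */, OP_CONTAINS /* ~ */, OP_WITHIN /* @ */ } BoxOp;

/* Can any row of a range whose union box is u satisfy "row OP q"? */
bool range_consistent(Box u, BoxOp op, Box q)
{
    switch (op) {
    case OP_OVERLAPS: return box_overlaps(u, q);  /* some member may overlap q */
    case OP_CONTAINS: return box_contains(u, q);  /* a member containing q lies in u */
    case OP_WITHIN:   return box_overlaps(u, q);  /* only an overlap is implied */
    }
    return true;  /* unknown operator: be conservative, scan the range */
}
```

For a range with union (0 0, 10 10), a query box (-5 -5, 20 20) can still hold rows for @ but cannot for ~, since no member of the range can contain a box larger than the union.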
BRIN for PostGIS – storage datatype
• Storing only the bounding box, float-precision (same as GiST)
– gidx (3D, 4D geometry)
– box2df (2D geometry, geography)
• Fixed length, and keeps index as small as possible
• Allows indexing of big geometries (> 8 kB)
• CPU savings: comparing bounding boxes is really cheap
– Check done twice with BRIN (Bitmap Index Scan and Bitmap Heap Scan)
• A recheck with exact arithmetic is needed for exact results
– Hopefully most rows are discarded by the index
– Done in the ST_* functions (same as GiST)
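Storing float-precision boxes raises one subtlety: rounding a double coordinate to the nearest float could shrink the box and make the index miss rows. A safe conversion rounds minima down and maxima up. The C sketch below shows one way to do that; all names are hypothetical, finite coordinates are assumed, and it is not taken from the PostGIS sources.

```c
#include <stdint.h>
#include <string.h>

/* Next representable float toward -infinity (finite inputs only). */
static float float_next_down(float f)
{
    uint32_t u;
    if (f == 0.0f) {
        u = 0x80000001u;               /* smallest negative subnormal */
    } else {
        memcpy(&u, &f, sizeof u);
        if (f > 0.0f) u -= 1;          /* positive: smaller magnitude */
        else u += 1;                   /* negative: larger magnitude */
    }
    memcpy(&f, &u, sizeof f);
    return f;
}

/* Next representable float toward +infinity (finite inputs only). */
static float float_next_up(float f)
{
    uint32_t u;
    if (f == 0.0f) {
        u = 0x00000001u;               /* smallest positive subnormal */
    } else {
        memcpy(&u, &f, sizeof u);
        if (f > 0.0f) u += 1;
        else u -= 1;
    }
    memcpy(&f, &u, sizeof f);
    return f;
}

/* Largest float <= d, and smallest float >= d. */
float float_down(double d) { float f = (float)d; return ((double)f > d) ? float_next_down(f) : f; }
float float_up(double d)   { float f = (float)d; return ((double)f < d) ? float_next_up(f) : f; }

typedef struct { double xmin, ymin, xmax, ymax; } BoxD;  /* 32 bytes */
typedef struct { float  xmin, ymin, xmax, ymax; } BoxF;  /* 16 bytes */

/* Min corner rounded down, max corner rounded up: the float box always
 * contains the exact box, at half the storage size. */
BoxF box_to_float(BoxD b)
{
    BoxF r = { float_down(b.xmin), float_down(b.ymin),
               float_up(b.xmax),   float_up(b.ymax) };
    return r;
}
```

The slightly larger float box can only add false positives, never lose a row, and those false positives are removed by the exact recheck in the ST_* functions.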
BRIN for PostGIS – C infrastructure
• Use the BRIN inclusion infrastructure as much as possible
– Support functions don't handle storing a datatype different from the indexed one
• A new “CAST” support function would be nice :)
• brin_inclusion_add_value() needs to be redefined
– Datatype is known, which saves indirect function calls
– Takes a geometry/geography and stores (or merges) its bounding box
• Fixed number of dimensions, depending on the operator class
• Handles NULL and empty geometries
• Still needs to import 3 private defines (from PostgreSQL's brin_inclusion.c)
– #define INCLUSION_UNION 0
– #define INCLUSION_UNMERGEABLE 1
– #define INCLUSION_CONTAINS_EMPTY 2
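A simplified C sketch of what a specialised add_value has to handle: NULLs, empty geometries (which have no bounding box), the first box of a range, and merging. The Summary struct is a hypothetical stand-in for a BRIN column summary, not the PostGIS code; the three private defines are used as slot indices, mirroring their role in brin_inclusion.c.

```c
#include <stdbool.h>
#include <stddef.h>

/* Private defines, as imported from PostgreSQL's brin_inclusion.c */
#define INCLUSION_UNION          0
#define INCLUSION_UNMERGEABLE    1
#define INCLUSION_CONTAINS_EMPTY 2

typedef struct { float xmin, ymin, xmax, ymax; } Box2DF;

/* Hypothetical summary: the union is a typed field, the two boolean slots
 * are flags[] indexed by the same defines (flags[INCLUSION_UNION] unused). */
typedef struct {
    bool   hasnulls;   /* a NULL was seen in the range */
    bool   has_box;    /* union_box holds a valid box */
    Box2DF union_box;  /* INCLUSION_UNION */
    bool   flags[3];   /* INCLUSION_UNMERGEABLE, INCLUSION_CONTAINS_EMPTY */
} Summary;

/* Returns true when the summary changed (the index tuple must be updated). */
bool summary_add_value(Summary *s, const Box2DF *bbox, bool isnull, bool isempty)
{
    if (isnull) {
        if (s->hasnulls)
            return false;
        s->hasnulls = true;
        return true;
    }
    if (isempty) {  /* an empty geometry has no bounding box: record a flag */
        if (s->flags[INCLUSION_CONTAINS_EMPTY])
            return false;
        s->flags[INCLUSION_CONTAINS_EMPTY] = true;
        return true;
    }
    if (!s->has_box) {  /* first non-empty value: store its box as is */
        s->union_box = *bbox;
        s->has_box = true;
        return true;
    }
    Box2DF *u = &s->union_box;
    if (bbox->xmin >= u->xmin && bbox->ymin >= u->ymin &&
        bbox->xmax <= u->xmax && bbox->ymax <= u->ymax)
        return false;  /* already covered by the union */
    /* Two boxes can always be merged, so UNMERGEABLE is never set here. */
    if (bbox->xmin < u->xmin) u->xmin = bbox->xmin;
    if (bbox->ymin < u->ymin) u->ymin = bbox->ymin;
    if (bbox->xmax > u->xmax) u->xmax = bbox->xmax;
    if (bbox->ymax > u->ymax) u->ymax = bbox->ymax;
    return true;
}
```

Returning false when nothing changed matters: it lets BRIN skip rewriting the summary tuple for most inserts into an already-summarized range.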
BRIN for PostGIS – What's next?
• Handle more operators for 3D and 4D (with SFCGAL lib)
• kNN? Would require new infrastructure in PostgreSQL
[Figure: table split into 8kB blocks, with a kNN query]
ORDER BY geoms <-> point
LIMIT N
Benchmarks:
• A simple test on a sorted dataset
• OSM data
– Speed up searches of lines intersecting the buffer of a set of points
• LiDAR dataset
– Speed up searches of 3D points with X,Y coordinates included inside a 2D polygon
1st example: “sorted” geometry points
~10M points, 3D, SRID=0, distribution range: 0 to 100k units
=# CREATE INDEX idx_gist ON points USING gist
-# (geom gist_geometry_ops_nd);
=# CREATE INDEX idx_brin_128 ON points USING brin
-# (geom brin_geometry_inclusion_ops_3d);
=# CREATE INDEX idx_brin_10 ON points USING brin
-# (geom brin_geometry_inclusion_ops_3d)
-# WITH (pages_per_range=10);
                 Sizes    Building times
BRIN(128)/GiST   1/2000   1/30
BRIN(10)/GiST    1/1000   1/20
1st example: “sorted” geometry points
~10M points, 3D, SRID=0, distribution range: 0 to 100k units
=# SELECT * FROM points
-# WHERE 'BOX3D(10 10 10, 12 12 12)'::box3d &&& geom;
Execution times:
BRIN(128)/GiST     50/1
BRIN(10)/GiST      6/1
BRIN(128)/SeqScan  1/60
BRIN(10)/SeqScan   1/350
2nd example: “unsorted” geometry points
~10M points, 3D, SRID=0, distribution range: 0 to 100k units
=# CREATE TABLE unsorted_points AS
-# SELECT * FROM points ORDER BY random();
=# SELECT * FROM unsorted_points
-# WHERE 'BOX3D(10 10 10, 12 12 12)'::box3d &&& geom;
Execution times:
BRIN(128)/GiST     2800/1
BRIN(10)/GiST      400/1
BRIN(128)/SeqScan  1/1
BRIN(10)/SeqScan   1/12
...what about INSERTs?
INSERT INTO points
SELECT ST_MakePoint(a, a, a)
FROM generate_series(1, 1000) AS f(a);
GiST: 6 times slower than insertions without an index
BRIN (10): 3 times slower than insertions without an index
BRIN (128): 2 times slower than insertions without an index
...blocks may be “not sorted” anymore!
Earthquakes in the USA in 2016
~100k lat/lon WGS84 points (earthquakes in the world), from http://earthquake.usgs.gov/
~1M EPSG 102008 linestrings (railways on the west side), from http://www.openstreetmap.org/
Which railway was close (<10km) to an earthquake epicenter?
Earthquakes in the USA in 2016
GiST:
=# CREATE INDEX idx_rl_gist ON westrailways USING gist
-# (ST_Buffer(Geography(ST_Transform(geom, 4326)), 10000));
=# CREATE INDEX idx_eq_gist ON worldearthquakes USING gist (coord);
BRIN:
=# CREATE INDEX idx_rl_brin ON westrailways USING brin
-# (ST_Buffer(Geography(ST_Transform(geom, 4326)), 10000))
-# WITH (pages_per_range=10);
=# CREATE INDEX idx_eq_brin ON worldearthquakes USING brin (coord)
-# WITH (pages_per_range=10);
BRIN/GiST sizes: ~1/100
BRIN/GiST build times: ~1/200
Earthquakes in the USA in 2016
=# WITH us_eq AS (
-# SELECT coord FROM worldearthquakes
-# WHERE coord && 'BOX(-126.90 24.73, -65.83 49.73)'::box2d
-# ) SELECT * FROM westrailways r, us_eq u
-# WHERE ST_Buffer(Geography(ST_Transform(r.geom, 4326)), 10000) && u.coord;
without indexes: ~60s
with GiST: ~20ms
with BRIN: ~400ms
BRIN: 20x slower than GiST, 150x faster than a full scan
LiDAR dataset
• Build a Digital Elevation Model (DEM) of the terrain from a set of points (point cloud)
• Resolution: ~1 to 50 returns/m²
→ billions of points!
Relational approach to LiDAR data with PG
• PostgreSQL 9.3 + pg_pointcloud + PostGIS
• GiST indexing in PostGIS, achieved performance:
– RAM 16GB, 1 billion points (~80GB)
– index size ~O(table size)
– Index was used:
• up to ~300M points in bbox inclusion searches
• up to ~10M points in kNN searches
LiDAR size: ~O(10^9) to ~O(10^11) points → only a few % can be properly indexed!
http://www.slideshare.net/GiuseppeBroccolo/gbroccolo-foss4-geugeodbindex
The LiDAR dataset: the ahn2 project
• 3D point cloud, coverage: almost the whole Netherlands
– EPSG: 28992, 8 to 25 points/m²
• 1.6TB, ~250G points in ~560M patches (compression: ~10x)
– .las files imported through the PDAL driver (filters.chipper)
• Available RAM: 16GB
• Database based on the pg_pointcloud extension
– The point structure:
X    Y    Z    scan  LAS  time  RGB  chipper
32b  32b  32b  40b   16b  64b   48b  32b
X, Y, Z: the “indexed” part (can be converted to a PostGIS datatype)
(thanks to Tom Van Tilburg)
Typical searches on ahn2 - x3d_viewer
• Intersection with a polygon (PostGIS)
– The most time-consuming part: indexes are needed here!
• Patch “explosion” + NN sorting (pg_pointcloud + PostGIS)
• DEM: constrained Delaunay triangulation (SFCGAL)
WITH patches AS (
  SELECT patch FROM ahn2
  WHERE patch && ST_GeomFromText('POLYGON(...)')
), points AS (
  -- explode each pcpatch into pcpoints, cast to PostGIS geometry
  SELECT Geometry(PC_Explode(patch)) AS points
  FROM patches
), poly_pts AS (
  SELECT (ST_DumpPoints(ST_GeomFromText('POLYGON(...)'))).geom AS poly_pt
), sorted_points AS (
  SELECT p.points, v.poly_pt
  FROM points p, poly_pts v
  ORDER BY p.points <#> v.poly_pt LIMIT 1
), sel AS (
  SELECT points FROM sorted_points
  WHERE points && ST_GeomFromText('POLYGON(...)')
)
SELECT ST_Dump(ST_Triangulate2DZ(ST_Collect(points))) FROM sel;
All in the DB & just with one query!
patches && polygons - GiST performance
Index building: 2.5 days (GiST)
Index size: 29GB (GiST)
Searches based on GiST:
polygon size  timing
~O(10m)       ~20ms
~O(100m)      ~60ms
~O(1km)       ~3s
~O(10km)      hours
Index not contained in RAM anymore (~5G points → ~3%)
patches && polygons - BRIN performance
Index building: 4 h (BRIN)
Index size: 15MB (BRIN)
Searches based on BRIN:
polygon size  timing
~O(10m)       ~150s
How data was inserted...
[Figure: .las files → PDAL driver with chipper (parallel processes) → ahn2]
Spatial sorting of a LiDAR dataset
CREATE INDEX patch_geohash ON ahn2
USING btree (ST_GeoHash(ST_Transform(Geometry(patch), 4326), 20));
CLUSTER ahn2 USING patch_geohash;
Morton order [http://geohash.org/]
Needs more than 2x the table size in free space
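Sorting on a geohash works because geohash follows a Z-order (Morton) curve: points that are close in space tend to get close keys, so after sorting they land in nearby heap blocks, which is exactly what BRIN needs. Geohash interleaves longitude and latitude bits and then base32-encodes them; the minimal C sketch below shows only the bit interleaving on integer grid coordinates (function names are hypothetical; this is not the ST_GeoHash implementation).

```c
#include <stdint.h>

/* Spread the 16 bits of v so they occupy the even bit positions of a
 * 32-bit word (bit i of v moves to bit 2*i). */
static uint32_t spread_bits(uint16_t v)
{
    uint32_t x = v;
    x = (x | (x << 8)) & 0x00FF00FFu;
    x = (x | (x << 4)) & 0x0F0F0F0Fu;
    x = (x | (x << 2)) & 0x33333333u;
    x = (x | (x << 1)) & 0x55555555u;
    return x;
}

/* Morton (Z-order) key: x bits on even positions, y bits on odd positions. */
uint32_t morton2d(uint16_t x, uint16_t y)
{
    return spread_bits(x) | (spread_bits(y) << 1);
}
```

The four cells of any aligned 2x2 block get four consecutive keys (for example (0,0), (1,0), (0,1), (1,1) map to 0, 1, 2, 3), and the pattern repeats recursively at every scale, which is what keeps spatial neighbours mostly together once the table is sorted on the key.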
Spatial sorting of a LiDAR dataset
CREATE TABLE ahn2_sorted AS
SELECT * FROM ahn2
ORDER BY ST_GeoHash(ST_Transform(Geometry(patch), 4326), 10)
COLLATE "C";
Morton order [http://geohash.org/]
Only needs space for a second table
Time to build the sorted table: 45 hours
patches && polygons – BRIN performance
polygon size  BRIN timing  GiST timing
~O(10m)       ~380ms       ~20ms
~O(100m)      ~400ms       ~60ms
~O(1km)       ~2.7s        ~3s
~O(10km)      ~4.7s        hours
• After sorting: 150s → 380ms
• GiST 20x faster than BRIN [r~O(10m)]
– BRIN 200x faster than full scan
• BRIN 1000x faster than full scan [r~O(100m)]
• BRIN ~ GiST [r~O(1km)]
• Only BRIN is used for searches with r > 1km
Is the drop in performance acceptable?
BRIN searches = x20 GiST scan
patch explosion + NN sorting = x10 to x100 GiST scan
constrained triangulation = x10 to x100 GiST scan
Conclusions
• BRIN indexes can be successfully used in geo DBs based on PostgreSQL
– Full support for PostGIS datatypes, starting with PostGIS 2.3.0
– Easier index maintenance, low indexing time
– Less precise than GiST... but not by much!
• Really small indexes!
– GiST performance drops as soon as the index can no longer be fully contained in RAM
• Can PostgreSQL be used to manage LiDAR data?
– Yes, at least for bbox inclusion searches
– Make sure the data is stored in the same order as in the .las files