We will discuss the recently added geospatial search features in Stratio's Cassandra Lucene index through a series of applied use cases. These features include indexing complex polygons, nearest-neighbour search, and chained geometrical transformations such as bounding box, convex hull, centroid, union, intersection, exclusion and distance buffer.
To demonstrate these new features, we will use a Cassandra cluster that stores and indexes several million geographical shapes taken from the US census database. The use cases include searching for census blocks inside a geographical area, building heat maps from distances to fire stations, and finding properties that lie in the trajectory of a hurricane.
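As a concrete illustration of the bounding-box search described above, the sketch below builds the JSON search expression that Stratio's Lucene index accepts inside a CQL `expr(...)` clause. The condition type and option names (`geo_bbox`, `min_latitude`, etc.) follow the project's documented syntax, but treat the exact field names here as assumptions rather than a reference:

```python
import json

# Sketch of a geospatial search for Stratio's Cassandra Lucene index.
# The "geo_bbox" condition and its option names are taken from the
# project's documentation; "place" is a hypothetical indexed geo_point
# column, and the coordinates are invented for illustration.
def bbox_search(field, min_lat, max_lat, min_lon, max_lon):
    """Build a bounding-box filter over an indexed geo_point field."""
    return {
        "filter": {
            "type": "geo_bbox",
            "field": field,
            "min_latitude": min_lat,
            "max_latitude": max_lat,
            "min_longitude": min_lon,
            "max_longitude": max_lon,
        }
    }

# As it would be embedded in CQL:
#   SELECT * FROM blocks WHERE expr(blocks_idx, '<json below>');
search = bbox_search("place", 40.2, 40.5, -3.9, -3.6)
print(json.dumps(search))
```

The same shape extends to the chained transformations mentioned above: the index's other condition types (distance, shape) follow the same JSON structure, with the transformation chain expressed inside the shape condition.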
About the Speakers
Andrés de la Peña, Big Data Architect, Stratio
Big Data Architect at Stratio. Author of Stratio's Lucene index for Cassandra. DataStax Apache Cassandra MVP, 2015–16.
Jonathan Nappee, IT Lead for Weather at Nephila
Stratio's Cassandra Lucene index: Geospatial use cases - Big Data Spain 2016 (Stratio)
This document discusses Stratio's Cassandra Lucene index and its geospatial search features. It introduces Lucene-based secondary indexes in Cassandra that allow nodes to index their own data while maintaining Cassandra's distributed architecture. It describes geospatial mapping, search operations like bounding boxes and distance searches, and shape transformations. Business use cases are presented for an investment fund, including searching census blocks affected by natural disasters and their proximity to stations.
Geospatial and bitemporal search in Cassandra with pluggable Lucene index (Andrés de la Peña)
Stratio presented its open source Lucene-based implementation of Cassandra’s secondary indexes at Cassandra Summit London 2014, which provided several search engine features. It used to be distributed as a fork of Apache Cassandra, which was a huge problem both for users and maintainers. Nowadays, thanks to changes introduced in C* 2.1.6, we are proud to announce that it has become a plugin that can be attached to the official Apache Cassandra. With the plugin we have been able to provide C* with geospatial capabilities, making it possible to index geographical positions and perform bounding box and radial distance queries. This is achieved through Lucene’s geospatial module. Another feature we have provided with our plugin is the possibility of indexing bitemporal data models, which distinguish between system time and business time. This way, it is possible to make queries over C* such as “give me what the system thought at a certain instant about what happened at another instant”. The implementation combines range prefix trees with the 4R-tree approach proposed by Bliujūtė et al. Full-text, geospatial and bitemporal queries can all be combined with Apache Spark to avoid systematic full scans, dramatically reducing the amount of data to be processed.
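The bitemporal query described above ("what the system thought at one instant about what happened at another") can be made concrete with a toy model. Each fact carries a valid-time interval (business time) and a transaction-time interval (system time); the names and tuple layout below are illustrative only, not the index's actual data model:

```python
# Toy bitemporal lookup: facts are (value, vt_from, vt_to, tt_from, tt_to),
# where [vt_from, vt_to] is business (valid) time and [tt_from, tt_to] is
# system (transaction) time. All names here are invented for illustration.
def as_of(facts, system_instant, business_instant):
    """Return what the system believed at `system_instant` about
    what was true at `business_instant`."""
    return [
        value
        for (value, vt_from, vt_to, tt_from, tt_to) in facts
        if vt_from <= business_instant <= vt_to
        and tt_from <= system_instant <= tt_to
    ]

facts = [
    ("rate=A", 0, 10, 5, 8),   # belief held by the system during [5, 8]
    ("rate=B", 0, 10, 8, 99),  # correction, believed from t=8 onwards
]
print(as_of(facts, system_instant=6, business_instant=3))  # old belief
print(as_of(facts, system_instant=9, business_instant=3))  # current belief
```

The plugin answers the same kind of question at scale by indexing both time dimensions with Lucene range prefix trees instead of scanning every fact.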
Escape from Hadoop: Ultra Fast Data Analysis with Spark & Cassandra (Piotr Kolaczkowski)
The document discusses using Apache Spark and Apache Cassandra together for fast data analysis as an alternative to Hadoop. It provides examples of basic Spark operations on Cassandra tables like counting rows, filtering, joining with external data sources, and importing/exporting data. The document argues that Spark on Cassandra provides a simpler distributed processing framework compared to Hadoop.
Horizontally Scalable Relational Databases with Spark: Spark Summit East talk... (Spark Summit)
Scaling out doesn’t have to mean giving up transactions and efficient joins! Relational databases can scale horizontally, and using them as a store for Spark Streaming or batch computations can help cover areas in which Spark is typically weaker. Examples will be drawn from our experience using Citus (https://github.com/citusdata/citus), an open-source extension to Postgres, but lessons learned should be applicable to many databases.
Time Series Processing with Solr and Spark: Presented by Josef Adersberger, Q... (Lucidworks)
The document discusses time series processing with Solr and Spark. It describes a use case of monitoring data analysis for a distributed software system that generates over 6 trillion observations per year. The Chronix stack is presented as an easy-to-use solution for big time series data storage and processing on Spark. It provides a scale-out time series database with efficient storage and interactive queries by integrating with existing Solr and Spark installations. The Chronix Spark API and internals are covered, focusing on distributed data retrieval, efficient data formats and processing, and best practices for aligning Spark and Solr parallelism.
Spark + Cassandra = Real Time Analytics on Operational Data (Victor Coustenoble)
This document discusses using Apache Spark and Cassandra together for real-time analytics on transactional data. It provides an overview of Cassandra and how it can be used for operational applications like recommendations, fraud detection, and messaging. It then discusses how the Spark Cassandra Connector allows reading and writing Cassandra data from Spark, enabling real-time analytics on streaming and batch data using Spark SQL, MLlib, and Spark Streaming. It also covers some DataStax Enterprise features for high availability and integration of Spark and Cassandra.
This document provides an introduction to anomaly detection using Apache Spark. It discusses techniques like clustering, K-means clustering, and using labels to evaluate clustering results. The document demonstrates performing K-means clustering on a network intrusion detection dataset from the KDD Cup 1999. It explores different approaches to clustering like normalization, handling categorical variables, and using entropy with labels to choose the optimal number of clusters. The goal is to detect anomalies that are far from any cluster of normal data points.
Spark streaming can be used for near-real-time data analysis of data streams. It processes data in micro-batches and provides windowing operations. Stateful operations like updateStateByKey allow tracking state across batches. Data can be obtained from sources like Kafka, Flume, HDFS and processed using transformations before being saved to destinations like Cassandra. Fault tolerance is provided by replicating batches, but some data may be lost depending on how receivers collect data.
Couchbase Tutorial: Big Data Open Source Systems: VLDB 2018 (Keshav Murthy)
The document provides an agenda and introduction to Couchbase and N1QL. It discusses Couchbase architecture, data types, data manipulation statements, query operators like JOIN and UNNEST, indexing, and query execution flow in Couchbase. It compares SQL and N1QL, highlighting how N1QL extends SQL to query JSON data.
N1QL is a developer favorite because it’s SQL for JSON. Developers’ lives are going to get easier with the upcoming N1QL features. We have exciting features in many areas, from language to performance, indexing to search, and tuning to transactions. This session will preview the new features for both new and advanced users.
This document discusses SQL on Druid. It provides an overview of Druid, benchmarks comparing Druid to Spark, and details how SQL can be used with Druid through Hive integration and Druid's built-in SQL functionality. Hive allows SQL queries over Druid data through a Druid storage handler and by translating Hive queries into the appropriate Druid query format. Druid also natively supports SQL queries through its Avatica server, enabling SQL queries directly against Druid data sources.
The document describes a generic arithmetic system that allows uniform access to number packages with different data representations. It defines generic arithmetic procedures like add, sub, mul, and div that apply the corresponding operation for the specific number package. A scheme-number package for integer arithmetic is also installed. Generic tags are attached to values to identify their representation, and a mapping table is used to dispatch operations to appropriate handler procedures based on tags.
From the original abstract:
If you're already using Cassandra you're already aware of its strengths of high availability and linear scalability. The downside to this power is less query flexibility. For an OLTP system with an SLA this is an acceptable tradeoff, but for a data scientist it’s extremely limiting.
Enter Apache Spark. Apache Spark complements an existing Cassandra cluster by providing a means of executing arbitrary queries, filters, sorting and aggregation. It’s possible to use functional constructs like map, filter, and reduce, as well as SQL and DataFrames.
In this presentation I’ll show you how to process Cassandra data in bulk or through a Kafka stream using Python. Then we’ll visualize our data using IPython notebooks, leveraging Pandas and matplotlib.
This is an advanced talk. We will assume existing knowledge of Cassandra and CQL.
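The functional constructs mentioned in the abstract can be sketched on plain Python data; in PySpark the same map/filter/reduce calls run distributed over an RDD backed by a Cassandra table (the sensor records below are made up for illustration):

```python
from functools import reduce

# Plain-Python version of the map/filter/reduce pipeline the talk
# describes; in PySpark these would be RDD transformations over rows
# read through the Spark Cassandra Connector.
readings = [
    {"sensor": "a", "temp": 21.5},
    {"sensor": "b", "temp": 38.2},
    {"sensor": "a", "temp": 40.1},
]

hot = filter(lambda r: r["temp"] > 30.0, readings)   # keep hot readings
temps = map(lambda r: r["temp"], hot)                # project the value
total = reduce(lambda acc, t: acc + t, temps, 0.0)   # aggregate

print(round(total, 1))
```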
GeoMesa on Apache Spark SQL with Anthony Fox (Databricks)
This document discusses location intelligence and GeoMesa. It begins with an introduction to location intelligence and GeoMesa. It then covers spatial data types, spatial SQL, and optimizing spatial SQL queries by extending Spark's Catalyst optimizer. Examples are provided to demonstrate calculating density of activity in San Francisco and generating a speed profile of a metro area using location data. Spatial analysis techniques like spatial joins, buffers, and geohashing are explored to extract insights from spatial data at scale.
A Fast and Efficient Time Series Storage Based on Apache Solr (QAware GmbH)
OSDC 2016, Berlin: Talk by Florian Lautenschlager (@flolaut, Senior Software Engineer at QAware)
Abstract: How to store billions of time series points and access them within a few milliseconds? Chronix! Chronix is a young but mature open source project that allows one for example to store about 15 GB (csv) of time series in 238 MB with average query times of 21 ms. Chronix is built on top of Apache Solr a bulletproof distributed NoSQL database with impressive search capabilities. In this code-intense session we show how Chronix achieves its efficiency in both respects by means of an ideal chunking, by selecting the best compression technique, by enhancing the stored data with (pre-computed) attributes, and by specialized query functions.
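The storage idea behind those numbers can be shown in miniature: serialize a chunk of time-series points as deltas, then compress the chunk. Chronix's actual formats and codecs differ; this only illustrates why chunking plus compression shrinks regular time series so dramatically:

```python
import zlib

# Miniature chunking-plus-compression demo. A regularly spaced series
# delta-encodes into a highly repetitive stream, which a general-purpose
# codec like zlib then crushes. Numbers are invented for illustration.
timestamps = [1000 + 10 * i for i in range(1000)]   # regular 10 ms spacing

# Delta-encode: constant spacing collapses to "1000,10,10,10,...".
deltas = [timestamps[0]] + [b - a for a, b in zip(timestamps, timestamps[1:])]
raw = ",".join(str(t) for t in timestamps).encode()
encoded = ",".join(str(d) for d in deltas).encode()

compressed = zlib.compress(encoded, level=9)
print(len(raw), len(encoded), len(compressed))
```

Real time-series stores add pre-computed attributes per chunk (min, max, start/end time) so most queries never need to decompress anything.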
My Hadoop Ecosystem presentation at the 2011 BreizhCamp.
See the talk video (in French):
http://mediaserver.univ-rennes1.fr/videos/?video=MEDIA110628093346744
This document summarizes Doug Cutting's presentation on using Hadoop for scalable web crawling and indexing with the Nutch project. It describes how Nutch algorithms like crawling, parsing, link inversion, and indexing were converted to MapReduce jobs that can scale to billions of web pages. The document outlines the key Nutch algorithms and how they were adapted to the Hadoop framework using MapReduce.
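The link-inversion step mentioned above maps naturally onto MapReduce, and a single-process sketch shows the shape of the job: the map phase emits a (target, source) pair for every outlink, and the reduce phase groups pairs by target so each page ends up with its inlinks (page names below are invented):

```python
from collections import defaultdict

# Single-process sketch of Nutch-style link inversion. In Hadoop the
# pair emission is the map phase and the grouping is the shuffle/reduce;
# here both run locally over a tiny invented crawl.
crawl = {
    "a.html": ["b.html", "c.html"],
    "b.html": ["c.html"],
}

# Map: (page, outlinks) -> (target, source) pairs.
pairs = [(target, source)
         for source, outlinks in crawl.items()
         for target in outlinks]

# Reduce: group pairs by target page to get its inlinks.
inlinks = defaultdict(list)
for target, source in pairs:
    inlinks[target].append(source)

print(sorted(inlinks["c.html"]))
```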
Leveraging the Power of Solr with Spark (QAware GmbH)
Lucene Revolution 2016, Boston: Talk by Johannes Weigend (@JohannesWeigend, CTO at QAware).
Abstract: Solr is a distributed NoSQL database with impressive search capabilities. Spark is the new megastar in the distributed computing universe. In this code-intense session we show you how to combine both to solve real-time search and processing problems. We will show you how to set up a Solr/Spark combination from scratch and develop first jobs that run distributed over shared Solr data. We will also show you how to use this combination for your next-generation BI platform.
Server-side geo tools in Drupal, PNW 2012 (Mack Hardy)
Mack Hardy (@mackaffinity) from Affinity Bridge (@affinitybridge) discusses server-side mapping tools for Drupal: using PostGIS as a spatial backend, generating tiles, managing large sets of geodata, and displaying it in the Drupal CMS.
• Distributed datasets loaded into named columns (similar to relational DBs or Python DataFrames).
• Can be constructed from existing RDDs or external data sources.
• Can scale from small datasets to TBs/PBs on multi-node Spark clusters.
• APIs available in Python, Java, Scala and R.
• Bytecode generation and optimization using Catalyst Optimizer.
• Simpler DSL to perform complex and data heavy operations.
• Faster runtime performance than vanilla RDDs.
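The "named columns" idea in the bullets above can be mimicked with a few lines of plain Python. Spark's real DataFrame API distributes this over a cluster and optimizes the query plan with Catalyst; this stdlib-only `MiniFrame` (an invented name) just shows the shape of the API:

```python
# Conceptual stand-in for a DataFrame: rows with named columns plus a
# chainable select/where API. Purely illustrative; it shares only the
# call shape with Spark's DataFrame, none of the distribution or
# Catalyst optimization.
class MiniFrame:
    def __init__(self, rows):
        self.rows = rows  # list of dicts: column name -> value

    def select(self, *cols):
        return MiniFrame([{c: r[c] for c in cols} for r in self.rows])

    def where(self, predicate):
        return MiniFrame([r for r in self.rows if predicate(r)])

    def count(self):
        return len(self.rows)

df = MiniFrame([
    {"city": "Madrid", "pop": 3_200_000},
    {"city": "Boston", "pop": 690_000},
])
big = df.where(lambda r: r["pop"] > 1_000_000).select("city")
print(big.count(), big.rows)
```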
Stratosphere System Overview, Big Data Beers Berlin, 20.11.2013 (Robert Metzger)
Stratosphere is a next-generation big data processing engine.
These slides introduce the most important features of Stratosphere by comparing it with Apache Hadoop.
For more information, visit stratosphere.eu
Based on university research, it is now a completely open-source, community-driven development with a focus on stability and usability.
This document provides an introduction to Apache Spark, including its core components, architecture, and programming model. Some key points:
- Spark uses Resilient Distributed Datasets (RDDs) as its fundamental data structure, which are immutable distributed collections that allow in-memory computing across a cluster.
- RDDs support transformations like map, filter, reduce, and actions like collect that return results. Transformations are lazy while actions trigger computation.
- Spark's execution model involves a driver program that coordinates tasks on worker nodes using an optimized scheduler.
- Spark SQL, MLlib, GraphX, and Spark Streaming extend the core Spark API for structured data, machine learning, graph processing, and stream processing.
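The lazy-transformation point above is worth seeing in miniature: "transformations" only build a plan, and the "action" walks it once. The `LazySeq` name and API below are illustrative, not Spark's:

```python
# Toy lazy pipeline: map/filter record operations into a plan, and only
# the collect() action executes it. Mirrors the RDD contract described
# above in spirit, with none of the distribution or fault tolerance.
class LazySeq:
    def __init__(self, data, ops=()):
        self.data, self.ops = data, ops

    def map(self, f):                       # transformation: no work yet
        return LazySeq(self.data, self.ops + (("map", f),))

    def filter(self, p):                    # transformation: no work yet
        return LazySeq(self.data, self.ops + (("filter", p),))

    def collect(self):                      # action: executes the plan
        items = iter(self.data)
        for kind, fn in self.ops:
            items = map(fn, items) if kind == "map" else filter(fn, items)
        return list(items)

pipeline = LazySeq(range(10)).map(lambda x: x * x).filter(lambda x: x % 2 == 0)
# Nothing has run yet; only collect() triggers computation.
print(pipeline.collect())
```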
Monitoring Your ISP Using InfluxDB Cloud and Raspberry Pi (InfluxData)
When a large group of people change their habits, it can be tricky for infrastructure! Working from home and spending time indoors means attending video calls and streaming movies and TV shows. This leads to increased internet traffic that can create congestion on the network infrastructure. So how do you get real-time visibility into your ISP connection? In this meetup, Mirko presents his setup, based on a time series database and a Raspberry Pi, to better understand his ISP connection quality and speed, including upload and download speeds. Join us to discover how he does it using Telegraf, InfluxDB Cloud, Astro Pi, Telegram and Grafana! Finally, proof that your ISP connection is (or is not) as fast as promised.
Stratio: Geospatial and bitemporal search in Cassandra with pluggable Lucene ... (DataStax Academy)
The document describes a method for indexing and searching bitemporal data in Cassandra using Lucene indexes. It proposes using four R-trees, with each R-tree represented using two DateRangePrefixTrees in Lucene to index the data by valid and transaction time ranges. Queries are transformed and distributed to search the appropriate R-trees and DateRangePrefixTrees to retrieve bitemporal data within the specified time ranges.
Geospatial and bitemporal search in C* with pluggable Lucene index by Andrés ... (Big Data Spain)
Stratio presented its open source Lucene-based implementation of Cassandra’s secondary indexes at Cassandra Summit London 2014, which provided several search engine features. It used to be distributed as a fork of Apache Cassandra, which was a huge problem both for users and maintainers. Nowadays, due to some changes introduced at C* 2.1.6, we are proud to announce that it has become a plugin that can be attached to the official Apache Cassandra.
With the plugin we have been able to provide C* with geospatial capabilities, making it possible to index geographical positions and perform bounding box and radial distance queries. This is achieved through Lucene’s geospatial module.
Session presented at Big Data Spain 2015 Conference
15th Oct 2015
Kinépolis Madrid
http://www.bigdataspain.org
Event promoted by: http://www.paradigmatecnologico.com
Abstract: http://www.bigdataspain.org/program/thu/slot-9.html
Querying Nested JSON Data Using N1QL and Couchbase (Brant Burnett)
There are a lot of solutions for querying JSON data available, most of which are proprietary and require a steep learning curve. Couchbase's N1QL (Non-First Normal Form Query Language) is a very powerful query language built on top of the SQL we all know and love (well, mostly love). It's really amazing how easy N1QL is for current SQL users.
In this session, we'll delve into the differences between SQL and N1QL, learning how it layers new features on top of ANSI SQL to support nested data and JSON types. We'll also go in depth into indexing JSON data using Couchbase, covering how to design and troubleshoot your indexes to drive spectacular performance at scale.
Geo distance search with MySQL presentation (GSMboy)
The document discusses various techniques for performing geo-spatial searches with MySQL to find points of interest near a given location. It covers calculating distance between points using the Haversine formula, optimizing queries by limiting the search area, and using spatial extensions, full-text search, or external search engines like Sphinx to enable both geo and text searching. Demo examples show finding nearby POIs matching a keyword within a radius of the user's GPS point.
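The Haversine formula mentioned above computes great-circle distance from two latitude/longitude pairs. A minimal Python version (the city coordinates below are illustrative):

```python
from math import radians, sin, cos, asin, sqrt

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two WGS84 points."""
    r = 6371.0  # mean Earth radius, km
    dlat = radians(lat2 - lat1)
    dlon = radians(lon2 - lon1)
    a = sin(dlat / 2) ** 2 \
        + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2
    return 2 * r * asin(sqrt(a))

# Madrid to Barcelona is roughly 500 km as the crow flies.
print(haversine_km(40.4168, -3.7038, 41.3874, 2.1686))
```

The optimization described in the talk, limiting the search area first, works because a cheap bounding-box predicate (which can use an index) prunes most rows before the trigonometric distance is computed.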
The document discusses new features and capabilities of Bing Maps including 3D models, map styling, extensions, offline maps, and more. It provides examples of using the Bing Maps API and Spatial Data Services to perform tasks like finding nearby locations, drawing isochrones, reverse geocoding addresses, and customizing maps. Finally, it mentions some organizations that utilize Bing Maps and links to additional Bing Maps resources.
The document describes a summer project analyzing distance-related variables at the block level in New York City. The project aims to calculate distances from city blocks to various points of interest, such as subway stations, parks, and other amenities, and analyze how these distances impact property values. The methodology uses GIS network analysis to calculate walking distances and Euclidean distances to measure externalities. Distances will be classified into groups for future hedonic modeling. The results can be used as variables to understand how proximity to amenities affects property values.
Location Analytics - Real-Time Geofencing using Kafka (Guido Schmutz)
An important underlying concept behind location-based applications is called geofencing. Geofencing is a process that allows acting on users and/or devices that enter or exit a specific geographical area, known as a geo-fence. A geo-fence can be dynamically generated, as in a radius around a point location, or it can be a predefined set of boundaries (such as secured areas, buildings, or the borders of counties, states or countries). Geofencing lays the foundation for realising use cases around fleet monitoring, asset tracking, phone tracking across cell sites, connected manufacturing, ride-sharing solutions and many others. Many of the use cases mentioned above require low-latency actions to be taken when a device enters, leaves or approaches a geo-fence. That’s where streaming data ingestion and streaming analytics, and therefore the Kafka ecosystem, come into play. This session will present how location analytics applications can be implemented using Kafka and KSQL & Kafka Streams. It highlights the existing features available out of the box and then shows how easy it is to extend them with custom defined functions (UDFs).
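For a polygonal geo-fence, the enter/exit decision reduces to a point-in-polygon test on each position update. A simplified Python sketch (the fence coordinates, device names and event strings are invented for illustration; a real deployment would run this logic inside KSQL or Kafka Streams as updates arrive):

```python
def point_in_polygon(lat, lon, polygon):
    """Ray-casting test: is (lat, lon) inside the polygon?
    `polygon` is a list of (lat, lon) vertices in order."""
    inside = False
    n = len(polygon)
    for i in range(n):
        lat1, lon1 = polygon[i]
        lat2, lon2 = polygon[(i + 1) % n]
        # Does a ray cast westward from the point cross this edge?
        if (lat1 > lat) != (lat2 > lat):
            cross_lon = lon1 + (lat - lat1) * (lon2 - lon1) / (lat2 - lat1)
            if lon < cross_lon:
                inside = not inside
    return inside

# Hypothetical rectangular geo-fence around a depot.
fence = [(40.0, -3.0), (40.0, -2.0), (41.0, -2.0), (41.0, -3.0)]

def on_position(device, lat, lon, was_inside):
    """Emit enter/exit events as position updates stream in (e.g. from Kafka)."""
    now_inside = point_in_polygon(lat, lon, fence)
    if now_inside and not was_inside:
        return f"{device} ENTERED fence"
    if was_inside and not now_inside:
        return f"{device} EXITED fence"
    return None

print(on_position("truck-1", 40.5, -2.5, was_inside=False))
```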
This document provides an agenda for a presentation on integrating Apache Cassandra and Apache Spark. The presentation will cover RDBMS vs NoSQL databases, an overview of Cassandra including data model and queries, and Spark including RDDs and running Spark on Cassandra data. Examples will be shown of performing joins between Cassandra and Spark DataFrames for both simple and complex queries.
SenchaCon 2016: Integrating Geospatial Maps & Big Data Using CartoDB via Ext ... (Sencha)
Come explore CartoDB's Ext JS components with us (www.cartodb.com). These new components allow you, as developers, to visualize and interact with geospatial data using up to a billion data points in real time. We will show you how easy it is to enable visualizations, filter dynamically, create time-lapse animations, and explore large location datasets at unprecedented scale. Come learn how to use these new open source components to build interactive geospatial visualizations that deliver solutions, value, and insights to your customers.
The document summarizes geospatial capabilities in Elasticsearch and Kibana. It covers topics like:
1. Geospatial indexing, search, and visualizations in Kibana including coordinate maps and region maps.
2. Geo field mappings for geo_point and geo_shape fields.
3. How geospatial data is indexed and searched in Elasticsearch, including improvements in Elasticsearch 5.0+.
4. Geo aggregations like geo_distance, geo_grid, and geo_centroid aggregations.
The document provides examples and discusses future improvements to geospatial features in Elasticsearch and Kibana.
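As an example of the search side, Elasticsearch's geo_distance query restricts hits to documents whose geo_point field lies within a given radius of a coordinate. The request body, written here as a Python dict (the index name and the `location` field name are assumptions for the example):

```python
# Elasticsearch query body: documents whose `location` geo_point
# lies within 10 km of the given coordinate.
query = {
    "query": {
        "bool": {
            "filter": {
                "geo_distance": {
                    "distance": "10km",
                    "location": {"lat": 40.4168, "lon": -3.7038},
                }
            }
        }
    }
}

# With the official Python client this would run roughly as
# (connection details assumed):
#   es.search(index="places", body=query)
```

Putting the geo_distance clause in a bool filter (rather than the query context) skips scoring, which is usually what you want for pure geographic filtering.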
A Century Of Weather Data - Midwest.io (Randall Hunt)
This document summarizes the key considerations and performance tests for storing and querying a large weather dataset containing over 2.5 billion data points. It describes the schema design using MongoDB to embed data and index on location. Bulk loading of data took 10 hours on a single server but only 3 hours on a sharded cluster. Queries for a single data point were fastest on the cluster, at under 1 ms, while worldwide queries ran at 310 per second. Analytics like maximum temperature took 2.5 hours on a single server but only 2 minutes on the cluster. The cluster provided much higher throughput and better performance for complex queries, while being more expensive.
This webinar will give an overview of CREATE STATISTICS in PostgreSQL. This command allows the database to collect multi-column statistics, helping the optimizer understand dependencies between columns, produce more accurate estimates, and better query plans.
The following key topics will be covered during the webinar:
- Why CREATE STATISTICS may be needed at all
- How the command works
- Which cases CREATE STATISTICS already addresses
- What improvements are in the queue for future PostgreSQL versions (either already committed to PostgreSQL 13 or beyond)
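The motivation for CREATE STATISTICS is easy to reproduce: for correlated columns, multiplying per-column selectivities (the planner's default independence assumption) misestimates the combined selectivity, which is exactly what multi-column statistics correct. A toy Python illustration (the table and values are made up):

```python
# Perfectly correlated columns: each row's city determines its zip code.
rows = [("Madrid", "28001")] * 50 + [("Barcelona", "08001")] * 50

def selectivity(pred):
    """Fraction of rows matching a predicate."""
    return sum(1 for r in rows if pred(r)) / len(rows)

s_city = selectivity(lambda r: r[0] == "Madrid")            # 0.5
s_zip = selectivity(lambda r: r[1] == "28001")              # 0.5
s_both = selectivity(lambda r: r == ("Madrid", "28001"))    # 0.5, not 0.25

# Without multi-column statistics the planner multiplies the two:
naive_estimate = s_city * s_zip   # expects 25 of 100 rows
actual = s_both                   # 50 rows actually match
print(naive_estimate, actual)
```

A functional-dependency statistic (e.g. `CREATE STATISTICS ... (dependencies) ON city, zip FROM ...`) lets the planner detect that `city` implies `zip` and use the correct combined selectivity.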
This document discusses geographic information systems (GIS) and how to work with geospatial data using Python and related tools. It introduces common geospatial data formats like KML, GML, and GeoJSON. It also discusses storing geospatial data in spatial databases like PostGIS. The document then covers how to obtain open geospatial data from OpenStreetMap and load it into a database. It demonstrates rendering geospatial data to maps using the Mapnik library and Python. Finally, it briefly discusses tile-based map services and front-end mapping libraries like OpenLayers that can display rendered geospatial data on web maps.
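As a taste of working with one of those formats, a GeoJSON feature is plain JSON and can be read with Python's standard library alone (the sample feature below is made up):

```python
import json

# A minimal GeoJSON Feature, the interchange format discussed above.
geojson = '''{
  "type": "Feature",
  "geometry": {"type": "Point", "coordinates": [-3.7038, 40.4168]},
  "properties": {"name": "Madrid"}
}'''

feature = json.loads(geojson)
lon, lat = feature["geometry"]["coordinates"]  # GeoJSON order is [lon, lat]
print(feature["properties"]["name"], lat, lon)
```

Note the coordinate order: GeoJSON stores [longitude, latitude], the opposite of the lat/lon convention most mapping APIs use, and a classic source of bugs.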
PGDay.Amsterdam 2018 - Bruce Momjian - Will Postgres live forever (PGDay.Amsterdam)
Bruce will explain how open source software can live for a very long time, and covers the differences between proprietary and open source software life cycles. He will also cover the increased adoption of open source, and many of the ways that Postgres is innovating to continue to be relevant.
Visualizing large datasets with JS using deck.gl (Marko Letic)
Slides from a talk presented at code.talks 2019 conference in Hamburg, Germany.
Note: This is a keynote presentation converted to PDF. Originally it has videos that are not included here.
Talk description:
When talking about data visualization and JavaScript, your mind usually goes to D3.js. But if our data has a location-based representation, we are faced with a limited choice. The main topic of this talk is to introduce the audience to deck.gl, an open-source WebGL-powered library developed by Uber that allows us to create beautiful visualizations of large datasets and take user interactivity to a whole new level. A short introduction to the library and its API will be given, along with practical use cases, live-code examples and its integration with popular frameworks such as Angular and React.
Video: https://www.youtube.com/watch?v=sG25WdhbsFg
JS Fest 2019/Autumn. Marko Letic. Saving the world with JavaScript: A Data Vi... (JSFestUA)
Did you know that the beginnings of data visualization are strongly tied to solving some of the biggest problems humanity has ever faced? Wouldn’t it be more interesting to say that you’re not a doctor, but you do save lives than to say you’re just a developer?
When talking about data visualization and JavaScript, your mind usually goes to D3.js. But if our data has a location-based representation, we are faced with a limited choice. The main topic of this talk is to introduce the audience to deck.gl, an open-source WebGL-powered library developed by Uber that allows us to create beautiful visualizations of large datasets and take user interactivity to a whole new level. We’ll see how our code can tell a story and how that story can potentially save lives. A short introduction to the library and its API will be given, along with practical use cases, live-code examples and its integration with popular frameworks such as Angular and React.
IBM Insight 2015 - 1823 - Geospatial analytics with dashDB in the cloud (Torsten Steinbach)
This document discusses geospatial analytics capabilities in IBM dashDB. It describes how dashDB supports geospatial data types and functions that allow spatial queries and analysis. This includes functions for spatial predicates, constructors, and calculations. GeoJSON and other formats can be loaded and dashDB implements OGC and ISO spatial standards. Predictive analytics is also possible using the R extension to dashDB. Overall the summary discusses dashDB's geospatial and predictive analytic capabilities for spatial data.
Building a real time big data analytics platform with Solr (Trey Grainger)
Having “big data” is great, but turning that data into actionable intelligence is where the real value lies. This talk will demonstrate how you can use Solr to build a highly scalable data analytics engine to enable customers to engage in lightning fast, real-time knowledge discovery.
At CareerBuilder, we utilize these techniques to report the supply and demand of the labor force, compensation trends, customer performance metrics, and many live internal platform analytics. You will walk away from this talk with an advanced understanding of faceting, including pivot faceting, geo/radius faceting, time-series faceting, function faceting, and multi-select faceting. You’ll also get a sneak peek at some new faceting capabilities just wrapping up development, including distributed pivot facets and percentile/stats faceting, which will be open-sourced.
The presentation will be a technical tutorial, along with real-world use-cases and data visualizations. After this talk, you'll never see Solr as just a text search engine again.
Similar to Stratio's Cassandra Lucene index: Geospatial Use Cases (Andrés de la Peña & Jonathan Nappee, Nephila) | C* Summit 2016 (20)
Is Your Enterprise Ready to Shine This Holiday Season? (DataStax)
Be a holiday hero—not a sorry statistic. View this on-demand webinar to learn how to drive revenue, business growth, customer satisfaction, and loyalty during the holiday season, and achieve operational excellence (and sanity!) at the same time. You’ll also hear real-world stories of companies that have experienced Black Friday nightmares—and learn how they turned things back around.
View webinar: https://pages.datastax.com/20191003-NAM-Webinar-IsYourEnterpriseReadytoShinethisHolidaySeason_1-Registration-LP.html
Explore all DataStax webinars: www.datastax.com/webinars
Designing Fault-Tolerant Applications with DataStax Enterprise and Apache Cas... (DataStax)
Data resiliency and availability are mission-critical for enterprises today—yet we live in a world where outages are an everyday occurrence. Whether the problem is a single server failure or losing connectivity to an entire data center, if your applications aren’t designed to be fault tolerant, recovery from an outage can be painful and slow. Watch this on-demand webinar to look at best practices for developing fault-tolerant applications with DataStax Drivers for Apache Cassandra and DataStax Enterprise (DSE).
View recording: https://youtu.be/NT2-i3u5wo0
Explore all DataStax webinars: https://www.datastax.com/resources/webinars
Running DataStax Enterprise in VMware Cloud and Hybrid Environments (DataStax)
To simplify deploying and managing modern applications, enterprises have been combining the benefits of hyperconverged infrastructure (HCI) with the performance and scale of a NoSQL database — and the results have been remarkable. With this combination, IT organizations have experienced more agility, improved reliability, and better application performance. Watch this on-demand webinar where you’ll learn specifically how VMware HCI with DataStax Enterprise (DSE) and Apache Cassandra™ are transforming the enterprise.
View recording: https://youtu.be/FCLGHMIB0L4
Explore all DataStax Webinars: https://www.datastax.com/resources/webinars
Best Practices for Getting to Production with DataStax Enterprise Graph (DataStax)
The document provides five tips for getting DataStax Enterprise Graph into production:
1) Know your data distributions and important relationships.
2) Understand your access patterns and model the data for common queries.
3) Optimize query performance by filtering vertices, choosing starting points to reduce edges traversed, and adding shortcuts.
4) Design a supernode strategy such as modeling supernodes as properties, adding edge indexes, or making vertices more granular.
5) Embrace a multi-model approach using the best tool like DSE Graph for complex connected data queries.
Webinar | Data Management for Hybrid and Multi-Cloud: A Four-Step Journey (DataStax)
Data management may be the hardest part of making the transition to the cloud, but enterprises including Intuit and Macy’s have figured out how to do it right. So what do they know that you might not? Join Robin Schumacher, Chief Product Officer at DataStax as he explores best practices for defining and implementing data management strategies for the cloud. He outlines a four-step journey that will take you from your first deployment in the cloud through to a true intercloud implementation and walk through a real-world use case where a major retailer has evolved through the four phases over a period of four years and is now benefiting from a highly resilient multi-cloud deployment.
View webinar: https://youtu.be/RrTxQ2BAxjg
Webinar | How to Understand Apache Cassandra™ Performance Through Read/Writ... (DataStax)
In this webinar, you will leverage free and open source tools as well as enterprise-grade utilities developed by DataStax to get a solid grasp on the performance of a masterless distributed database like Cassandra. You’ll also get the opportunity to walk through DataStax Enterprise Insights dashboards and see exactly how to identify performance bottlenecks.
View Recording: https://youtu.be/McZg_MMzVjI
Webinar | Better Together: Apache Cassandra and Apache Kafka (DataStax)
In this webinar, you’ll also be introduced to DataStax Apache Kafka Connector, and get a brief demonstration of this groundbreaking technology. You’ll directly experience how this tool can help you stream data from Kafka topics into DataStax Enterprise versions of Cassandra. The future of your organization won’t wait. Register now to reserve your spot in this exciting new webinar.
Youtube: https://youtu.be/HmkNb8twUNk
Top 10 Best Practices for Apache Cassandra and DataStax Enterprise (DataStax)
No matter how diligent your organization is at driving toward efficiency, databases are complex and it’s easy to make mistakes on your way to production. The good news is, these mistakes are completely avoidable. In this webinar, Jeff Carpenter shares with you exactly how to get started in the right direction — and stay on the path to a successful database launch.
View recording: https://youtu.be/K9Zj3bhjdQg
Explore all DataStax webinars: https://www.datastax.com/resources/webinars
Introduction to Apache Cassandra™ + What’s New in 4.0 (DataStax)
Apache Cassandra has been a driving force for applications that scale for over 10 years. This open-source database now powers 30% of the Fortune 100. Now is your chance to get an inside look, guided by the company that’s responsible for 85% of the code commits. You won’t want to miss this deep dive into the database that has become the power behind the moment — the force behind game-changing, scalable cloud applications. Patrick McFadin, VP Developer Relations at DataStax, is going behind the Cassandra curtain in an exclusive webinar.
View recording: https://youtu.be/z8fLn8GL5as
Explore all DataStax webinars: https://www.datastax.com/resources/webinars
Webinar: How Active Everywhere Database Architecture Accelerates Hybrid Cloud... (DataStax)
In this webinar, we’ll discuss how an Active Everywhere database—a masterless architecture where multiple servers (or nodes) are grouped together in a cluster—provides a consistent data fabric between on-premises data centers and public clouds, enabling enterprises to effortlessly scale their hybrid cloud deployments and easily transition to the new hybrid cloud world, without changes to existing applications.
View recording: https://youtu.be/ob6tr-9YiF4
Webinar | Aligning GDPR Requirements with Today's Hybrid Cloud Realities (DataStax)
This webinar discussed how DataStax and Thales eSecurity can help organizations comply with GDPR requirements in today's hybrid cloud environments. The key points are:
1) GDPR compliance and hybrid cloud are realities organizations must address
2) A single "point solution" is insufficient - partnerships between data platform and security services providers are needed
3) DataStax and Thales eSecurity can provide the necessary access controls, authentication, encryption, auditing and other capabilities across disparate environments to meet the 7 key GDPR security requirements.
Designing a Distributed Cloud Database for Dummies (DataStax)
Join Designing a Distributed Cloud Database for Dummies—the webinar. The webinar “stars” industry vet Patrick McFadin, best known among developers for his seven years at Apache Cassandra, where he held pivotal community roles. Register for the webinar today to learn: why you need distributed cloud databases, the technology you need to create the best user experience, the benefits of data autonomy and much more.
View the recording: https://youtu.be/azC7lB0QU7E
To explore all DataStax webinars: https://www.datastax.com/resources/webinars
How to Power Innovation with Geo-Distributed Data Management in Hybrid Cloud (DataStax)
Most enterprises understand the value of hybrid cloud. In fact, your enterprise is already working in a multi-cloud or hybrid cloud environment, whether you know it or not. View this SlideShare to gain a greater understanding of the requirements of a geo-distributed cloud database in hybrid and multi-cloud environments.
View recording: https://youtu.be/tHukS-p6lUI
Explore all DataStax webinars: https://www.datastax.com/resources/webinars
How to Evaluate Cloud Databases for eCommerce (DataStax)
The document discusses how ecommerce companies need to evaluate cloud databases to handle high transaction volumes, real-time processing, and personalized customer experiences. It outlines how DataStax Enterprise (DSE), which is built on Apache Cassandra, provides an always-on, distributed database designed for hybrid cloud environments. DSE allows companies to address the five key dimensions of contextual, always-on, distributed, scalable, and real-time requirements through features like mixed workloads, multi-model flexibility, advanced security, and faster performance. Case studies show how large ecommerce companies like eBay use DSE to power recommendations and handle high volumes of traffic and data.
Webinar: DataStax Enterprise 6: 10 Ways to Multiply the Power of Apache Cassa... (DataStax)
Today’s customers want experiences that are contextual, always on, and above all — delightful. To be able to provide this, enterprises need a distributed, hybrid cloud-ready database that can easily crunch massive volumes of data from disparate sources while offering data autonomy and operational simplicity. Don’t miss this webinar, where you’ll learn how DataStax Enterprise 6 maintains hybrid cloud flexibility with all the benefits of a distributed cloud database, delivers all the advantages of Apache Cassandra with none of the complexities, doubles performance, and provides additional capabilities around robust transactional analytics, graph, search, and more.
View recording: https://youtu.be/tuiWAt2jwBw
Explore all DataStax webinars: https://www.datastax.com/resources/webinars
Webinar: DataStax and Microsoft Azure: Empowering the Right-Now Enterprise wi... (DataStax)
This document discusses the partnership between DataStax and Microsoft Azure to empower enterprises with real-time applications in the cloud. It outlines how hybrid cloud is a strategic imperative, and how the DataStax Enterprise platform combined with Azure provides a hybrid cloud data platform for always-on applications. Examples are given of Microsoft Office 365, Komatsu, and IHS Markit using this solution to power use cases and gain benefits like increased performance, scalability, and cost savings.
Webinar - Real-Time Customer Experience for the Right-Now Enterprise featurin... (DataStax)
Welcome to the Right-Now Economy. To win in the Right-Now Economy, your enterprise needs to be able to provide delightful, always-on, instantaneously responsive applications via a data layer that can handle data rapidly, in real time, and at cloud scale. Don’t miss our upcoming webinar in which Forrester Principal Analyst Brendan Witcher will discuss why a singular, contextual, 360-degree view of the customer in real-time is critical to CX success and how companies are using data to deliver real-time personalization and recommendations.
View recording: https://youtu.be/e6prezfIGMY
Explore all DataStax webinars: https://www.datastax.com/resources/webinars
Datastax - The Architect's guide to customer experience (CX) (DataStax)
The document discusses how DataStax Enterprise can help companies deliver superior customer experiences in the "right-now economy" by providing a unified data layer for customer-related use cases. It describes how DSE provides contextual customer views in real-time, hybrid cloud capabilities, massive scalability and continuous availability, integrated security, and a flexible data model to support evolving customer data needs. The document also provides an example of how Macquarie Bank uses DSE to drive their customer experience initiatives and transform their digital presence.
An Operational Data Layer is Critical for Transformative Banking Applications (DataStax)
Customer expectations are changing fast, while customer-related data is pouring in at an unprecedented rate and volume. Join this webinar, to hear leading experts from DataStax, discuss how DataStax Enterprise, the data management platform trusted by 9 out of the top 15 global banks, enables innovation and industry transformation. They’ll cover how the right data management platform can help break down data silos and modernize old systems of record as an operational data layer that scales to meet the distributed, real-time, always available demands of the enterprise. Register now to learn how the right data management platform allows you to power innovative banking applications, gain instant insight into comprehensive customer interactions, and beat fraud before it happens.
Video: https://youtu.be/319NnKEKJzI
Explore all DataStax webinars: https://www.datastax.com/resources/webinars
Becoming a Customer-Centric Enterprise Via Real-Time Data and Design Thinking (DataStax)
Customer expectations are changing fast, while customer-related data is pouring in at an unprecedented rate and volume. How can you contextualize and analyze all this customer data in real time to meet increasingly demanding customer expectations? Join Mike Rowland, Director and National Practice Leader for CX Strategy at West Monroe Partners, and Kartavya Jain, Product Marketing Manager at DataStax, for an in-depth conversation about how customer experience frameworks, driven by Design Thinking, can help enterprises: understand their customers and their needs, define their strategy for real-time CX, create value from contextual and instant insights.
IBM watsonx Code Assistant for Z, our latest Generative AI-assisted mainframe application modernization solution. Mainframe (IBM Z) application modernization is a topic that every mainframe client is addressing to various degrees today, driven largely from digital transformation. With generative AI comes the opportunity to reimagine the mainframe application modernization experience. Infusing generative AI will enable speed and trust, help de-risk, and lower total costs associated with heavy-lifting application modernization initiatives. This document provides an overview of the IBM watsonx Code Assistant for Z which uses the power of generative AI to make it easier for developers to selectively modernize COBOL business services while maintaining mainframe qualities of service.
5. Apache Lucene
• General purpose search library
• Created by Doug Cutting in 1999
• Core of popular search engines:
‒ Apache Nutch, Compass, Apache Solr, Elasticsearch
• Tons of features:
‒ Full-text search, inequalities, sorting, geospatial, aggregations…
• Rich implementation:
‒ Multiple index structures, smart query planning, cool merge policy…
6. A Lucene-based C* 2i implementation
• Each node indexes its own data
• Keep P2P architecture
• Distribution managed by C*
• Replication managed by C*
• Just a single pluggable JAR file
[Diagram: a client talks to a three-node Cassandra cluster; each node's JVM embeds its own Lucene index]
7. Creating Lucene indexes
CREATE TABLE tweets (
user text,
date timestamp,
message text,
hashtags set<text>,
PRIMARY KEY (user, date));
• Built in the background
• Dynamic updates
• Immutable mapping schema
• Many columns per index
• Many indexes per table
CREATE CUSTOM INDEX tweets_idx ON tweets()
USING 'com.stratio.cassandra.lucene.Index'
WITH OPTIONS = {
'refresh_seconds': '1',
'schema': '{fields : {
user : {type: "string"},
date : {type: "date", pattern: "yyyy-MM-dd"},
message : {type: "text", analyzer: "english"},
hashtags: {type: "string"}}}'};
8. Querying Lucene indexes
SELECT * FROM tweets WHERE expr(tweets_idx, '{
filter: {
must: {type: "phrase", field: "message", value: "cassandra is cool"},
not: {type: "wildcard", field: "hashtags", value: "*cassandra*"}
},
sort: {field: "date", reverse: true}
}') AND user = 'adelapena' AND date >= '2016-01-01';
• Custom JSON syntax
• Multiple query types
• Multivariable conditions
• Multivariable sorting
• Separate filtering and relevance queries
9. Java query builder
import static com.datastax.driver.core.querybuilder.QueryBuilder.*;
import static com.stratio.cassandra.lucene.builder.Builder.*;
{…}
String search = search().filter(phrase("message", "cassandra is cool"))
.filter(not(wildcard("hashtags", "*cassandra*")))
.sort(field("date").reverse(true))
.build();
session.execute(select().from("tweets")
.where(eq("lucene", search))
.and(eq("user", "adelapena"))
.and(gte("date", "2016-01-01")));
• Available for JVM languages: Java, Scala, Groovy…
• Compatible with most Cassandra clients
10. Apache Spark integration
• Compute large amount of data
• Maximizes parallelism
• Filtering push-down
• Avoid full-scan
[Diagram: a Spark master coordinates workers co-located with three Cassandra nodes, each JVM embedding its own Lucene index]
12. Geo point mapper
CREATE CUSTOM INDEX restaurants_idx
ON restaurants (lucene)
USING 'com.stratio.cassandra.lucene.Index'
WITH OPTIONS = {
'refresh_seconds' : '1',
'schema' : '{
fields : {
location : {
type : "geo_point",
latitude : "lat",
longitude : "lon"
},
stars: {type : "integer" }
}
}
'};
CREATE TABLE restaurants(
name text PRIMARY KEY,
stars int,
lat double,
lon double);
13. Bounding box search
SELECT * FROM restaurants
WHERE lucene =
'{
filter :
{
type : "geo_bbox",
field : "location",
min_latitude : 40.425978,
max_latitude : 40.445886,
min_longitude : -3.808252,
max_longitude : -3.770999
}
}';
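Conceptually, the geo_bbox filter is just a double range check on latitude and longitude. The following sketch (plain Python with invented restaurant rows; not the plugin's actual code) shows the equivalent predicate:

```python
def in_bbox(lat, lon, min_lat, max_lat, min_lon, max_lon):
    # A point matches when both coordinates fall inside the box.
    return min_lat <= lat <= max_lat and min_lon <= lon <= max_lon

# Hypothetical rows from the restaurants table.
restaurants = [
    ("Casa Lucio", 40.430000, -3.790000),   # inside the visible screen
    ("Far Away",   41.000000, -3.500000),   # outside
]
matches = [name for name, lat, lon in restaurants
           if in_bbox(lat, lon, 40.425978, 40.445886, -3.808252, -3.770999)]
```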
14. Distance search
SELECT * FROM restaurants
WHERE lucene =
'{
filter :
{
type : "geo_distance",
field : "location",
latitude : 40.443270,
longitude : -3.800498,
min_distance : "100m",
max_distance : "2km"
}
}';
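A geo_distance filter keeps the rows whose great-circle distance to the reference point lies within the given range. Here is an illustrative stdlib sketch of the equivalent predicate using the haversine formula (the sample points are made up, and Lucene's internal implementation differs):

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    # Great-circle distance in kilometres (haversine formula).
    r = 6371.0
    p1, p2 = math.radians(lat1), math.radians(lat2)
    a = (math.sin(math.radians(lat2 - lat1) / 2) ** 2
         + math.cos(p1) * math.cos(p2) * math.sin(math.radians(lon2 - lon1) / 2) ** 2)
    return 2 * r * math.asin(math.sqrt(a))

ref_lat, ref_lon = 40.443270, -3.800498
places = {"near": (40.4433, -3.8006),   # a few metres away: below min_distance
          "mid":  (40.4520, -3.7950),   # about 1 km away: matches
          "far":  (40.6000, -3.5000)}   # tens of km away: beyond max_distance
hits = [name for name, (lat, lon) in places.items()
        if 0.1 <= haversine_km(ref_lat, ref_lon, lat, lon) <= 2.0]
```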
15. Distance sorting
SELECT * FROM restaurants
WHERE lucene =
'{
sort:
{
type : "geo_distance",
field : "location",
reverse : false,
latitude : 40.442163,
longitude : -3.784519
}
}' LIMIT 10;
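Distance sorting with a LIMIT behaves like ordering the candidate rows by their distance to the reference point and keeping the first k. A sketch with invented rows, using a cheap equirectangular approximation (adequate for ranking nearby points):

```python
import math

def approx_km(lat1, lon1, lat2, lon2):
    # Equirectangular approximation: fine for ranking points that are close together.
    x = math.radians(lon2 - lon1) * math.cos(math.radians((lat1 + lat2) / 2))
    y = math.radians(lat2 - lat1)
    return 6371.0 * math.hypot(x, y)

user_lat, user_lon = 40.442163, -3.784519
restaurants = [("a", 40.4500, -3.7900),
               ("b", 40.4425, -3.7846),
               ("c", 40.5000, -3.7000)]
# reverse=False means ascending distance, i.e. closest first; [:10] plays the LIMIT role.
closest = sorted(restaurants,
                 key=lambda r: approx_km(user_lat, user_lon, r[1], r[2]))[:10]
names = [name for name, _, _ in closest]
```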
16. Indexing complex geospatial shapes
CREATE TABLE places(
id uuid PRIMARY KEY,
shape text -- WKT formatted
);
CREATE CUSTOM INDEX places_idx ON places()
USING 'com.stratio.cassandra.lucene.Index'
WITH OPTIONS = {
'schema': '{
fields: {
shape: {
type: "geo_shape",
max_levels: 15,
transformations: []
}
}
}'
};
• Points, lines, polygons & multiparts
• JTS index-time transformations
17. Index-time shape transformations
• Example: Index only the centroid of shapes
CREATE CUSTOM INDEX places_idx ON places()
USING 'com.stratio.cassandra.lucene.Index'
WITH OPTIONS = {
'refresh_seconds': '1',
'schema': '{
fields: {
shape: {
type: "geo_shape",
max_levels: 15,
transformations: [{type: "centroid"}]
}
}
}'
};
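For intuition, the centroid transformation replaces a polygon with its area-weighted centre before indexing. A plain-Python sketch of that computation (the real index delegates this to JTS):

```python
def polygon_centroid(ring):
    # Area-weighted centroid of a simple polygon ring of (x, y) vertices,
    # computed from the shoelace cross products.
    a = cx = cy = 0.0
    n = len(ring)
    for i in range(n):
        x0, y0 = ring[i]
        x1, y1 = ring[(i + 1) % n]
        cross = x0 * y1 - x1 * y0
        a += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    a *= 0.5
    return cx / (6 * a), cy / (6 * a)

# A 2x2 square: its centroid is the middle point.
square = [(0.0, 0.0), (2.0, 0.0), (2.0, 2.0), (0.0, 2.0)]
center = polygon_centroid(square)
```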
18. Index-time shape transformations
• Example: Index 50 km buffer zone around shapes
CREATE CUSTOM INDEX places_idx ON places()
USING 'com.stratio.cassandra.lucene.Index'
WITH OPTIONS = {
'schema': '{
fields: {
shape: {
type: "geo_shape",
max_levels: 15,
transformations: [{
type: "buffer",
max_distance: "50km"}]
}
}
}'
};
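A buffer transformation inflates a shape by a fixed distance. The sketch below approximates the simplest case, a circular 50 km buffer around a single point, as a 32-vertex ring (the plugin delegates real buffering to JTS; the point is invented):

```python
import math

def point_buffer(lat, lon, radius_km, n=32):
    # Approximate the circular buffer around a point as an n-gon of (lat, lon) vertices.
    # Uses ~111.32 km per degree of latitude; a rough sketch, fine away from the poles.
    ring = []
    for i in range(n):
        t = 2 * math.pi * i / n
        dlat = (radius_km / 111.32) * math.cos(t)
        dlon = (radius_km / (111.32 * math.cos(math.radians(lat)))) * math.sin(t)
        ring.append((lat + dlat, lon + dlon))
    return ring

ring = point_buffer(40.0, -3.7, 50.0)
```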
19. Index-time shape transformations
• Example: Index the convex hull of the shape
CREATE CUSTOM INDEX places_idx ON places()
USING 'com.stratio.cassandra.lucene.Index'
WITH OPTIONS = {
'refresh_seconds': '1',
'schema': '{
fields: {
shape: {
type: "geo_shape",
max_levels: 8,
transformations:
[{type: "convex_hull"}]
}
}
}'
};
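The convex hull transformation keeps only the outer envelope of a shape, which is what lets a polygon with thousands of points collapse to a handful of vertices. A self-contained sketch using Andrew's monotone chain algorithm (not Stratio's JTS-backed code):

```python
def convex_hull(points):
    # Andrew's monotone chain: returns hull vertices in counter-clockwise order.
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]

# Interior points like (2, 2) disappear; only the envelope is kept.
shape = [(0, 0), (4, 0), (4, 4), (0, 4), (2, 2), (1, 3)]
hull = convex_hull(shape)
```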
20. Search by geo shape
• Can search points and shapes using shapes
• Operations define how you search: intersects, is_within, contains
• Can use transformations before searching
‒ Bounding box
‒ Buffer
‒ Centroid
‒ Convex Hull
‒ Difference
‒ Intersection
‒ Union
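For a point, the is_within relation reduces to the classic point-in-polygon test. A stdlib sketch using ray casting, with an invented triangle roughly matching the Bermuda Triangle ((lon, lat) pairs):

```python
def point_in_polygon(pt, ring):
    # Ray casting: count how many polygon edges a rightward ray from pt crosses.
    x, y = pt
    inside = False
    n = len(ring)
    for i in range(n):
        x0, y0 = ring[i]
        x1, y1 = ring[(i + 1) % n]
        if (y0 > y) != (y1 > y) and x < x0 + (x1 - x0) * (y - y0) / (y1 - y0):
            inside = not inside
    return inside

# Rough (lon, lat) corners: Miami, San Juan, Bermuda.
bermuda = [(-80.19, 25.76), (-66.12, 18.47), (-64.78, 32.30)]
inside = point_in_polygon((-70.0, 25.0), bermuda)
```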
23. • Investment fund with large exposures to natural catastrophe insurance on properties
• Many geographical data sets:
‒ properties details
‒ natural catastrophe event data
o Hurricane tracks and affected zones
o Earthquakes impact zones
• Risks and portfolios
24. Use cases data set
• We indexed all the US census block shapes from the Hazus Database
‒ https://www.fema.gov/hazus
‒ These blocks contain revenue and building stats that are useful for
pricing insurance premiums and potential losses
o Average revenue
o Number of stories
‒ Some of them are very complex
o First attempt with convex hull
o Composite indexing strategy with ±2km geohash and doc values in
borders
• We also indexed all police and fire stations in the US
25. Use cases data set
CREATE TABLE blocks (
state text,
bucket int,
id int,
area double,
type text,
income_ratio double,
latitude double,
longitude double,
shape text,
...
lucene text,
PRIMARY KEY ((state, bucket),
id)
);
CREATE CUSTOM INDEX block_idx ON blocks(lucene)
USING 'com.stratio.cassandra.lucene.Index'
WITH OPTIONS = {
'refresh_seconds': '1',
'schema': '{
fields : {
state : {type: "string"},
type : {type: "string"},
...
center: {type: "geo_point",
max_levels: 11,
latitude: "latitude",
longitude: "longitude"},
shape : {type: "geo_shape",
max_levels: 5}
}
}'};
26. Use cases data set
CREATE TABLE fire_stations(
state text,
id text,
city text,
latitude double,
longitude double,
shape text,
...
lucene text,
PRIMARY KEY (state, id)
);
CREATE TABLE police_stations(
state text,
id text,
city text,
latitude double,
longitude double,
shape text,
...
lucene text,
PRIMARY KEY (state, id)
);
• Analogous indexing for police and fire stations tables
27. Composite spatial strategy
• Meant for indexing complex polygons
• Two spatial strategies combined
‒ GeoHash recursive prefix tree for speed
‒ Serialized doc values for accuracy
• Reduced number of geohash terms
• Doc values only for polygon borders
David Smiley blog post:
http://opensourceconnections.com/blog/2014/04/1
1/indexing-polygons-in-lucene-with-accuracy
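The prefix-tree half of the composite strategy rests on geohashes: each extra character bisects the cell alternately in longitude and latitude, so shared prefixes mean nearby cells. A minimal sketch of standard geohash encoding:

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"

def geohash(lat, lon, precision=6):
    # Standard geohash: alternate longitude/latitude bisection, 5 bits per character.
    lat_rng, lon_rng = [-90.0, 90.0], [-180.0, 180.0]
    code, bits, bit_count, even = [], 0, 0, True
    while len(code) < precision:
        rng, val = (lon_rng, lon) if even else (lat_rng, lat)
        mid = (rng[0] + rng[1]) / 2
        bits <<= 1
        if val >= mid:
            bits |= 1
            rng[0] = mid
        else:
            rng[1] = mid
        even = not even
        bit_count += 1
        if bit_count == 5:
            code.append(BASE32[bits])
            bits = bit_count = 0
    return "".join(code)

cell = geohash(57.64911, 10.40744, precision=6)
```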
28. Use cases: Search blocks in a shape
• We search which census blocks intersect with a shape
SELECT * FROM blocks
WHERE expr(block_idx, '{
filter: {
type: "geo_shape",
field: "shape",
operation: "intersects",
shape: {
type: "buffer",
max_distance: "10km",
shape: {
type: "wkt",
value: "LINESTRING(-80.90 29.05 ...)"
}
}
}
}';
29. Use cases: Search blocks far from police and fire stations
• Proximity to police and fire stations can have an impact on damage when a
natural catastrophe event happens
• We can use this information to search for blocks in our portfolio that are more
than 8 miles from any station to highlight their risk
30. Use cases: Search blocks far from fire stations
SELECT * FROM fire_stations WHERE lucene = '{
filter : {
type: "geo_shape",
field: "centroid",
shape: {value: "POLYGON(…)"}}
}';
SELECT * FROM blocks WHERE lucene = '{
filter : {
must: {
type: "geo_shape",
field: "shape",
shape: {value: "POLYGON(…)"}},
not: {
type: "geo_shape",
field: "shape",
shape: {
type: "buffer",
max_distance: "8mi",
shape: {value: "MULTIPOINT(…)"}}}
}}';
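The second query above combines a positive filter (blocks inside the zone of interest) with a negative one (blocks inside any station's 8-mile buffer). A plain-Python sketch of the same logic over invented coordinates, using a haversine distance in miles:

```python
import math

def miles(lat1, lon1, lat2, lon2):
    # Great-circle distance in miles (haversine formula).
    r = 3958.8
    p1, p2 = math.radians(lat1), math.radians(lat2)
    a = (math.sin(math.radians(lat2 - lat1) / 2) ** 2
         + math.cos(p1) * math.cos(p2) * math.sin(math.radians(lon2 - lon1) / 2) ** 2)
    return 2 * r * math.asin(math.sqrt(a))

stations = [(30.0, -90.0), (30.2, -89.5)]   # invented fire station locations
blocks = {"block_1": (30.01, -90.01),        # ~1 mile from a station
          "block_2": (30.80, -90.30)}        # far from every station
# must: inside the zone of interest (all sample blocks already are);
# not: within 8 miles of any station.
at_risk = [b for b, (lat, lon) in blocks.items()
           if all(miles(lat, lon, s_lat, s_lon) > 8 for s_lat, s_lon in stations)]
```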
31. Use cases:
Find which blocks are affected by a moving hurricane and their
maximum wind speed exposures
• If we are modelling a hurricane, we end up with a shape that changes every 6
hours, with a different location and wind speeds
• We want to find for each state which blocks are hit and at which maximum
wind speed
• We use transformations to represent the moving hurricane and, within it, the
different wind speed zones
34. Conclusions
• New pluggable geospatial features in Cassandra
‒ Complex polygon search
‒ Geometrical transformations API
• Can be combined with other search predicates
• Compatible with MapReduce frameworks
• Preserves Cassandra's functionality
36. THANK YOU
UNITED STATES
Tel: (+1) 408 5998830
EUROPE
Tel: (+34) 91 828 64 73
contact@stratio.com
www.stratio.com
Editor's Notes
Hello everyone, my name is Andrés de la Peña, from Stratio, and this is Jonathan Nappée, from Nephila Capital.
Today, we are going to talk about how to index geospatial data in Cassandra using Stratio's pluggable Lucene secondary index, and we will show some examples of how to apply these features to several of Nephila's use cases.
To begin with, I'd like to introduce Stratio.
Stratio is a big data company founded in 2013, that currently has more than 200 employees.
Our technical team is currently located in Madrid but we also have offices in San Francisco and Bogotá.
We focus on offering a big data platform based on the Spark ecosystem, and we are one of the certified Spark distributions.
The presentation has three main points:
At first, a quick overview of Stratio's Lucene secondary indexes.
Then, we will review the geospatial search features of the plugin.
And, finally, Jonathan will show how these geospatial features are applied to three of Nephila's business use cases.
Stratio's Lucene index is an open source implementation of Cassandra's secondary indexes based on Lucene. It was first created in 2014 as a fork of Cassandra, and it became a plugin for Apache Cassandra last year.
It extends Cassandra's index functionality to provide near real time search, like Elasticsearch or Solr, including full text search capabilities and multidimensional, geospatial and bitemporal search.
Rather than building our own index structures, we chose using Apache Lucene as the underlying technology for several reasons:
- It is a proven, stable and fast indexing solution.
- It has a lot of interesting features, such as boolean queries, range queries or relevance search.
- Solr and Elasticsearch are successful examples of distributed search engines built on top of Lucene.
- We also like that Lucene is just a small library that can be embedded directly in Cassandra, and not an external service.
- Finally, it is an Apache project, like Cassandra, fully open source and with a large user community.
Here we can see how the integration between Cassandra and Lucene works.
There is a Lucene index embedded in each Cassandra node, so each node indexes its own data.
This way, Lucene doesn't have anything to do with distribution and replication, which are responsibility of Cassandra.
The peer-to-peer architecture is preserved, so each node is able to coordinate any query. So, no master nodes or external coordinators are required.
A cool feature of the Lucene index is that it allows paginating over rows sorted in a different order than the one defined by the partitioner. Sorted pagination is possible thanks to a custom CQL query handler able to intercept and rewrite Cassandra's internal read commands.
If we are performing one of these top-k queries, then all the involved nodes will be queried in a parallel fashion. Otherwise, nodes will be sequentially scanned until we find the requested results.
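The parallel top-k case can be pictured as a lazy merge of per-node result streams, each already sorted by the index, with the coordinator stopping after k rows. A stdlib sketch with invented per-node rows:

```python
import heapq

# Each node returns its own matching rows already sorted by date, newest first.
node_a = [("2016-09-20", "tweet 1"), ("2016-05-02", "tweet 4")]
node_b = [("2016-08-11", "tweet 2"), ("2016-01-15", "tweet 5")]
node_c = [("2016-07-30", "tweet 3")]

# The coordinator lazily merges the sorted streams and keeps only the first k rows.
k = 3
top_k = list(heapq.merge(node_a, node_b, node_c,
                         key=lambda row: row[0], reverse=True))[:k]
```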
Now we are going to see how Lucene indexes are created using the Cassandra Query Language.
Let's say that we have a table containing tweets. We store the user name, the creation date, the message body and a set of hashtags.
Then, we create the index in CQL specifying the Stratio index class and the indexing properties.
We specify an index reader refresh period of one second, and which columns are going to be indexed and how.
The creation date will be indexed as a date, using a pattern composed of year, month and day, defining a precision of days.
The message field will be treated as English tokenized text.
And the user and the hashtags will be indexed as untokenized text.
Here we have an example showing how to do searches using the Lucene index.
It includes filtering, negative filtering, sorting and routing.
We embed the Lucene index JSON syntax inside Cassandra Query Language using the recently created clause for custom expressions.
The JSON Lucene expression specifies that we are searching for all the tweets containing the phrase 'cassandra is cool'.
We also require that the matched tweets must not be labeled with the hashtag 'cassandra'.
Additionally, results should be returned sorted by descending creation date.
Also, we add some CQL regular clauses specifying that we are only interested in tweets created by a specific user during this year.
'user' is the partition key of the table, so the search will be routed to a single node, avoiding the performance problems of unrestricted secondary indexes queries.
So, with this example we can see how the Lucene index allows performing quite complex searches.
The index is distributed together with a fluent Java query builder that allows programmatically building index-related JSON queries.
The built Lucene index clauses are managed as plain strings, so the query builder should be compatible with most JVM-based Cassandra clients, including the popular DataStax Java driver.
This is useful because it can be easily integrated in your existing programs.
In this example, we show how to use the query builder to write the query that we saw in the previous slide. The produced JSON string can be easily used as a clause of the query builder provided by the DataStax Java driver.
A very important feature of our indexes is that they can be combined with MapReduce frameworks, especially with Spark.
The usage of Lucene predicates with Spark allows filtering the rows at the database level.
This way, we can retrieve from Cassandra only the information that we need.
It avoids the unnecessary reads produced by the usual systematic full table scan.
And, it can reduce the amount of data to be processed, speeding up the jobs.
As you know, in this kind of deployments there is a Spark Worker running in each Cassandra node.
This is done in order to parallelize jobs preserving data locality.
Since each node indexes its own data, the locality is guaranteed when using Lucene indexes.
Now we are going to talk about the spatial search capabilities that the Lucene plugin adds to Cassandra.
These features are based on the Lucene spatial module and on the Java Topology Suite.
A small set of these spatial capabilities was presented during our talk in the last Cassandra Summit. During this year, the number of spatial features has grown significantly, and Nephila has had an important role in this.
Here we have an example showing how to index latitude-longitude points stored in Cassandra using CQL.
Imagine an application where we want to find restaurants around you.
In this example, the table (on the left) contains the restaurant name, and its location.
There isn't a native point type in Cassandra, so we will use two numeric columns, latitude and longitude, to represent the location. A tuple or a UDT could also have been used.
Then, we create the index using the statement on the right.
In order to index the location, we add a 'geo_point' mapper named 'location'.
With this mapper we must specify which columns of the indexed table store the latitude and the longitude.
We may combine the geo point mapper with any other non-geospatial mapper.
For example, we index the 'stars' column as an integer.
Now that the locations have been indexed, we can start searching for geospatial data.
The simplest type of query that we have is Bounding box.
In this example, we search for restaurants placed inside the visible screen of a hypothetical mobile application.
To define the bounding box, we specify, in the query, the minimum and maximum latitude and longitude values.
This way, we will collect all the restaurants within the specified coordinates
Another possible query is to search for restaurants placed inside a specific distance range from a fixed point.
For this, we must specify the latitude and the longitude of the reference point, and the desired distance range.
Max distance is mandatory, whereas min distance is optional.
Along with the distance value we can specify a distance unit.
In our example, we search for restaurants located at least one hundred meters away but no more than two kilometers away from our position.
Additionally, it is also possible to sort the results of any search by their distance to a specific location.
In the example we request the restaurants closest to the user's location.
The 'reverse' attribute controls whether the order should be ascending or descending, that is, if we are going to retrieve the closest or the farthest locations.
Finally, we use the CQL limit clause to select only the ten closest restaurants.
Although pure sorting queries are perfectly possible, it is usually a good idea to combine sorting with any other filter. This way, we will reduce the number of matched rows, and so the number of locations to be sorted.
One of the most exciting Lucene features, that we have recently added, is the ability to index complex geographical shapes, and not only latitude-longitude points.
The shapes should be stored in Cassandra text columns in the WKT format. WKT is a popular text representation able to express points, line strings, polygons and their multiparts.
The indexing is based on the JTS library and its integration with Spatial4j. Although WKT is the only currently supported format, we plan to add support for other popular formats such as GeoJSON.
In the example we can see a table storing places, where the text column 'shape' stores a WKT geographical shape. We create a Lucene index that maps this column as a geographical shape, specifying the maximum number of levels in the geohash search prefix tree.
It is also possible to specify a sequence of geometric transformations to be sequentially applied to the shape before indexing it. Now we will see some examples to demonstrate the utility of these transformations.
One of the available index-time transformations is calculating the centroid of the indexed shape. In this example the indexed table stores polygons, but we are only interested in indexing the center of the shapes.
Another very useful transformation is indexing a buffer around the initial shape. In the example we are applying a 'buffer' transformation to index the region which is 50km around the stored shape.
This transformation could be used, for example, for storing the coverage area of a set of antennas given their lat-lon locations. It could also be used for storing the area around roads or borders defined as line strings.
Index size and performance greatly depend on the complexity of the indexed shapes. Shapes with a lot of points and precision decimals will produce many terms in the search tree. This increases the size of the index and reduces performance. So, if your use case allows it, it can be worth using transformations to simplify the indexed shapes. Both centroid and bounding box are typical transformations for this. Convex hull can also be an interesting, more accurate, precision reducer.
In this example we show how convex hull transformation is used to reduce a complex shape with more than two thousand points, to a simple polygon with only eight points, dramatically increasing both indexing and searching performance.
The indexed polygons can be retrieved by the previously shown bounding box and distance searches.
We have also recently added a geo-shape search type that allows searching for shapes that are related to other shapes (and points). The currently supported spatial relations are intersects, is_within and contains.
It is also possible to apply transformations to the shapes used in the search. This allows building complex shapes at search time from other WKT shapes.
In this example we are searching for all the indexed shapes within a triangle, in this case places within the Bermuda Triangle. Please note how we define both the spatial operation to be applied and the format of the search shape, which is WKT.
In addition, we can recursively define the search shape as transformations of other shapes, as we will see in the Nephila's business use cases that Jonathan will show us.
Thank you Andres, I am Jonathan Nappée, I have been working with Nephila Capital for a year now in the Bermuda office.
I first started talking to Andres while trying to solve some of our geospatial challenges with Stratio Lucene index in Cassandra.
It already contained basic point indexing and distance search features but I had more complex indexing and search cases to implement.
It turned out Andres also wanted to improve this aspect of the index.
So when we met in London in February we very quickly came up with this idea of transformations.
I now want to show you a couple of simplified examples of how these features can be used in the context of Nephila.
To begin with, let me introduce Nephila Capital and briefly explain its business.
We are an investment fund that specializes in natural catastrophe property insurance. That means we deal with house or building insurance against disasters such as hurricanes, earthquakes or floods.
As you can imagine, we manipulate many different kinds of geospatial data sets, from the properties we insure to the impacts of natural catastrophes.
Let me explain now the setup for these examples.
We started by indexing all US census blocks, that is to say shapes of blocks of buildings, inside a Cassandra cluster.
Some of the block shapes are very complex and can contain hundreds of points and multiple polygons.
To improve efficiency we first tried indexing the convex hull, but then switched to a composite indexing strategy. I'll go into more detail on this strategy in a few slides.
We also indexed all US fire and police stations locations.
This is what the blocks table looks like.
Each block contains its shape, centroid location, state, income ratio and other information.
We then indexed the different fields and, in particular, the blocks' shapes.
This is what the stations tables look like.
Each station contains its shape, centroid location, state, city and other information.
We then indexed the different fields and, in particular, the locations.
Before I start showing the examples, let me explain a bit more the composite indexing strategy.
This strategy uses two separate index structures, one to achieve speed and another to achieve accuracy. The first search structure is a geohash recursive prefix tree, usually with low precision. This geohash tree is used to quickly discard most of the non-matching documents. Then, the second search structure, which is a simple covering index, is used to discard the false positives produced by the geohash tree.
This composite approach allows retrieving results quickly and with complete accuracy while keeping the index relatively small. It is especially useful with our dataset, which is composed of very complex polygons that would produce too many terms in a regular high-accuracy geohash search tree.
For our first use case we want to perform a search of blocks that intersect with a given shape.
This is an important feature for us as the most damaged properties when a hurricane landfall happens are usually the ones closest to the shore.
We can index the coast line, but we usually want to see different buffer zones: 1 km, 5 km and so on. So we use the buffer transformation to give us this flexibility.
In this second use case, we are interested in finding which blocks are far from any police or fire station.
A property far from a fire station will, on average, probably suffer more damage from a fire than one close to a station.
Thus we can use that information in evaluating the insurance risk.
We consider that beyond an 8-mile radius the fire station's response becomes slower and thus less effective.
So we define a rectangular zone of interest in which we find all the fire stations.
We then build the 8-mile radius shapes around the fire stations and merge them.
Finally, we search for all blocks in the rectangular zone that are not within the stations' safe zones.
For the last use case, we consider that a hurricane has just hit the US; in this case we use Hurricane Katrina as an example.
Every 6 hours we know the location of the hurricane and two zones: a larger zone with medium wind speeds and a smaller zone with the highest wind speeds.
In this example we search for all the blocks in our portfolio of insurance that are hit by the hurricane.
So we merge the two wind speed zones together, then merge with all the other shapes of the hurricane at different times, and finally look for the blocks inside.
And last but not least, our implementation is completely open source.
It is published under the Apache License and it can be found at GitHub.
We encourage you to take a look at it and, of course, any contribution is more than welcome.