RESTo implements a search service with semantic query analysis on an Earth Observation metadata database. It conforms to the OGC 13-026 standard, the OpenSearch Extension for Earth Observation.
Larry Page and Sergey Brin created Google in 1998 after developing a search engine called BackRub at Stanford. In 2000, Google introduced AdWords and their toolbar. They became AOL's search partner that year. Google's services beyond search include Gmail, Maps, Drive, and more. Their PageRank algorithm and use of anchor text helped make Google a popular search engine.
Bridging Batch and Real-time Systems for Anomaly Detection (DataWorks Summit)
This document discusses using a stack of Hadoop, Spark, and Elasticsearch to perform anomaly detection on large datasets in both batch and real-time. Hadoop is used for large-scale data storage and preprocessing. Spark is used to perform in-depth analysis to identify common entities and build models. Elasticsearch allows searching the data in real-time and performing aggregations to identify uncommon entities. A live loop continuously adapts the models to react to streaming data and improve anomaly detection over time.
Introduction to Apache Pig.
Apache Pig is a platform that provides two modes for analyzing datasets: local mode, over the local file system, and MapReduce mode, over HDFS. Apache Pig includes a high-level query language called Pig Latin.
This document defines a Heap class that implements a max heap data structure using an array to store string elements. The Heap class constructor initializes the max size and empty array. The insertItem method adds an element to the heap and bubbles it up to maintain the max heap property. The Extract_Max method removes the maximum element from the root and bubbles down the new root to maintain the heap. It also defines printHeap to output the elements.
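The class described above can be sketched as follows. This is a minimal Python reconstruction based only on the summary (the original stores strings in an array and uses the method names `insertItem`, `Extract_Max`, and `printHeap`); the actual document's code may differ in detail.

```python
class Heap:
    """Max heap over strings, backed by a plain list."""

    def __init__(self, max_size):
        self.max_size = max_size
        self.items = []

    def insertItem(self, item):
        if len(self.items) >= self.max_size:
            raise OverflowError("heap is full")
        self.items.append(item)
        i = len(self.items) - 1
        # Bubble up while the new item is larger than its parent.
        while i > 0:
            parent = (i - 1) // 2
            if self.items[i] > self.items[parent]:
                self.items[i], self.items[parent] = self.items[parent], self.items[i]
                i = parent
            else:
                break

    def Extract_Max(self):
        if not self.items:
            raise IndexError("heap is empty")
        top = self.items[0]
        last = self.items.pop()
        if self.items:
            self.items[0] = last
            i, n = 0, len(self.items)
            # Bubble down: swap with the larger child until heap order holds.
            while True:
                left, right = 2 * i + 1, 2 * i + 2
                largest = i
                if left < n and self.items[left] > self.items[largest]:
                    largest = left
                if right < n and self.items[right] > self.items[largest]:
                    largest = right
                if largest == i:
                    break
                self.items[i], self.items[largest] = self.items[largest], self.items[i]
                i = largest
        return top

    def printHeap(self):
        print(self.items)
```

Repeated calls to `Extract_Max` yield the elements in descending order, which is also the basis of heapsort.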
This document discusses processing large datasets stored in text files using chunked, an R package that allows working with data in chunks to overcome memory limitations. It presents chunked as Option 4 for working with large data, describing how it reads and writes data chunk-by-chunk using lazy processing. Scenarios demonstrated include preprocessing a text file and writing output (TXT->TXT), importing data into a database (TXT->DB), and extracting from a database to a text file (DB->TXT). While many dplyr verbs work chunkwise, it notes summarize, group_by, arrange, right_join, and full_join currently do not.
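chunked itself is an R package, but the chunk-by-chunk idea it implements is language-agnostic. A minimal Python sketch of the TXT->TXT scenario (the data source, column layout, and `chunk_size` here are invented for illustration):

```python
import csv, io

def process_chunked(reader, writer, transform, chunk_size=2):
    """Read rows chunk-by-chunk, transform each chunk, write it out.
    Memory use stays bounded by chunk_size, not by the file size."""
    chunk = []
    for row in reader:
        chunk.append(row)
        if len(chunk) >= chunk_size:
            for r in transform(chunk):
                writer.writerow(r)
            chunk = []
    if chunk:  # flush the final partial chunk
        for r in transform(chunk):
            writer.writerow(r)

# Keep only rows whose value column exceeds 10 (a filter works chunkwise,
# because it never needs to see two chunks at once).
src = io.StringIO("a,5\nb,20\nc,7\nd,30\n")
dst = io.StringIO()
process_chunked(csv.reader(src), csv.writer(dst),
                lambda rows: [r for r in rows if int(r[1]) > 10])
```

This also illustrates why `summarize` or `arrange` cannot run chunkwise: they need the whole dataset, not one chunk at a time.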
Frequent itemset mining on big data involves finding frequently occurring patterns in large datasets. Hadoop is an open-source framework for distributed storage and processing of big data using MapReduce. MapReduce allows distributed frequent itemset mining algorithms to scale to large datasets by partitioning the search space across nodes. Common approaches include single-pass counting, fixed and dynamic pass combined counting, and parallel FP-Growth algorithms. Distribution of the prefix tree search space and balanced partitioning are important for adapting algorithms to the MapReduce framework.
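The single-pass counting approach mentioned above can be sketched as a map/reduce pair: each node counts candidate itemsets in its partition, and a reducer merges the partial counts and applies the support threshold. The transactions and threshold below are made up for illustration:

```python
from itertools import combinations
from collections import Counter

def map_count(transactions, k):
    """Mapper: emit counts of all k-itemsets in one data partition."""
    c = Counter()
    for t in transactions:
        for itemset in combinations(sorted(set(t)), k):
            c[itemset] += 1
    return c

def reduce_counts(partials, min_support):
    """Reducer: sum partial counts and keep itemsets meeting min_support."""
    total = Counter()
    for p in partials:
        total.update(p)
    return {s: n for s, n in total.items() if n >= min_support}

# Two "nodes" each count their partition; the reducer merges the results.
part1 = [["bread", "milk"], ["bread", "butter"]]
part2 = [["bread", "milk", "butter"], ["milk"]]
frequent = reduce_counts([map_count(part1, 2), map_count(part2, 2)],
                         min_support=2)
```

Real distributed algorithms add pruning (a k-itemset can only be frequent if all its (k-1)-subsets are), which is where balanced partitioning of the search space matters.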
SH 2 - SES 3 - MongoDB Aggregation Framework.pptx (MongoDB)
The document provides an overview of MongoDB's aggregation framework. It explains that the aggregation framework allows users to process data from MongoDB collections and databases using aggregation pipeline stages similar to data aggregation operations in SQL like GROUP BY, JOIN, and filtering. The document then discusses several aggregation pipeline stages like $project, $lookup, $match, and $group. It also provides an example comparing an aggregation pipeline to a SQL query with GROUP BY and HAVING.
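The SQL-to-pipeline correspondence can be illustrated with a pymongo-style pipeline. The `orders` collection and its fields are hypothetical, chosen only to mirror a GROUP BY/HAVING query:

```python
# Rough SQL equivalent of the pipeline below:
#   SELECT customer, SUM(total) AS spent
#   FROM orders WHERE status = 'paid'
#   GROUP BY customer HAVING SUM(total) > 100
pipeline = [
    {"$match": {"status": "paid"}},                  # WHERE
    {"$group": {"_id": "$customer",
                "spent": {"$sum": "$total"}}},       # GROUP BY + SUM
    {"$match": {"spent": {"$gt": 100}}},             # HAVING
    {"$project": {"customer": "$_id", "spent": 1}},  # column selection
]
# With pymongo this would run as: db.orders.aggregate(pipeline)
```

Note that `$match` appears twice: before `$group` it plays the role of WHERE, after it the role of HAVING.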
Docopt, beautiful command-line options for R, useR! 2014 (Edwin de Jonge)
Docopt is a utility library for R that allows programmers to define command line interfaces through documentation strings. It parses command line options, arguments, switches and help messages based on usage patterns defined in the documentation. This avoids having to write complex command line parsing code. The documentation string is the specification, and docopt handles generating a fully functioning parser from it. It provides an elegant way to build command line tools in R with automatically generated help and argument handling.
This document discusses the integration of digital papyrological data from several sources into a linked data framework on the Papyri.info website. It describes the background databases being integrated, including their different types of data. It then explains the URI design that allows retrieving data from the various sources individually or aggregated. Finally, it provides examples of how relationships between items are represented in RDF.
The Directions Pipeline at Mapbox - AWS Meetup Berlin June 2015 (Johan)
The Mapbox Directions Pipeline aims to always have the freshest map data available for routing. It involves getting the latest OpenStreetMap data, pre-processing it for directions, loading the new data into API servers, and then repeating the process. Each step uses its own CloudFormation stack. The pipeline downloads planet files from OpenStreetMap, pre-processes them for different transport profiles, uploads the results to S3, and updates the API CloudFormation stacks to fetch the new data.
InfluxDB is an open source time series database designed to handle high write and query speeds for real-time metrics, events, and sensor data. It uses a schemaless data model and stores data as time-stamped points in measurements, which can be queried using a SQL-like language. InfluxDB excels at aggregating and analyzing time series data for use cases like monitoring, analytics, and alerting.
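A point in InfluxDB is written using its line protocol: a measurement name, optional comma-separated tags, a space, the fields, and a timestamp. A minimal sketch of building one such line (real client libraries also escape spaces and commas in names, which this omits):

```python
def to_line_protocol(measurement, tags, fields, ts_ns):
    """Build one InfluxDB line-protocol point:
    measurement,tag=val field=val timestamp"""
    tag_str = ",".join(f"{k}={v}" for k, v in sorted(tags.items()))
    field_str = ",".join(f"{k}={v}" for k, v in sorted(fields.items()))
    return f"{measurement},{tag_str} {field_str} {ts_ns}"

point = to_line_protocol("cpu", {"host": "server01"}, {"usage": 0.64},
                         1434055562000000000)
```

The tags are indexed and used for filtering; the fields hold the actual time series values.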
This document discusses enabling data-intensive biology through superior software and algorithms. It proposes a distributed graph database server that would allow querying across multiple public and private data sets. This would help address the growing data challenge in biology by providing a way to explore, query and mine large datasets in an open and collaborative manner. The goal is to incentivize data sharing and enable new types of data-driven investigations.
What is Dictionary In Python? Python Dictionary Tutorial | Edureka (Edureka!)
YouTube Link: https://youtu.be/rZjhId0VkuY
** Python Certification Training: https://www.edureka.co/python **
This Edureka PPT on 'Dictionary In Python' will help you understand the concept of dictionary, why and how we can use dictionary in python and various operations that we can perform on a dictionary. Below are the topics covered in this PPT:
What Is A Dictionary In Python?
Why Use A Python Dictionary?
Lists vs Dictionary
How To Implement A Dictionary In Python?
Operations In Python Dictionary
Use Case - Nested Dictionary
Python Tutorial Playlist: https://goo.gl/WsBpKe
Blog Series: http://bit.ly/2sqmP4s
Follow us to never miss an update in the future.
YouTube: https://www.youtube.com/user/edurekaIN
Instagram: https://www.instagram.com/edureka_learning/
Facebook: https://www.facebook.com/edurekaIN/
Twitter: https://twitter.com/edurekain
LinkedIn: https://www.linkedin.com/company/edureka
Castbox: https://castbox.fm/networks/505?country=in
The document discusses how MapReduce can be used for various tasks related to search engines, including detecting duplicate web pages, processing document content, building inverted indexes, and analyzing search query logs. It provides examples of MapReduce jobs for normalizing document text, extracting entities, calculating ranking signals, and indexing individual words, phrases, stems and synonyms.
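The inverted-index job described above has a classic map/reduce shape: the mapper normalizes text and emits (word, doc_id) pairs, and the reducer groups postings by word. A minimal single-process sketch (document contents are invented for illustration):

```python
from collections import defaultdict

def map_doc(doc_id, text):
    """Mapper: normalize text and emit (word, doc_id) pairs."""
    for word in text.lower().split():
        yield word, doc_id

def build_inverted_index(docs):
    """Reducer: group postings by word into an inverted index."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word, d in map_doc(doc_id, text):
            index[word].add(d)
    return {w: sorted(ds) for w, ds in index.items()}

docs = {1: "MapReduce builds indexes", 2: "Inverted indexes power search"}
index = build_inverted_index(docs)
```

A production indexer would extend the mapper with the steps the document lists: stemming, phrase extraction, and synonym expansion, each emitting extra (term, doc_id) pairs.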
“BIG DATA” is data that is big in:
* Volume
* Velocity
* Variety
“TODAY’S BIG MAY BE TOMORROW’S NORMAL”
Variety covers a wide range of data types:
* Structured data - RDBMS
* Semi-structured data - HTML, XML
* Unstructured data - audio, video, emails, photos, PDFs, social media
Hadoop
It was created by Doug Cutting and Michael Cafarella in 2005.
2003 - Nutch, an open source search engine (Lucene, Sphinx, etc.)
(Google published papers describing its distributed file system and MapReduce.)
Yahoo then took the initiative, and the creation of Hadoop followed.
Hadoop 0.1.0 was released in April 2006.
As of now, Hadoop 2.8 is available.
Data structure concepts: heap data structure, max heap, min heap, heap construction, max heap implementation, hashing technique, graphs, graph traversal algorithms (breadth-first traversal, depth-first traversal), a C program for hashing using the linear probing technique, depth-first search implementation in C, interview concepts in data structures, Kruskal's algorithm, Prim's algorithm, with explanations.
The document summarizes a Think Big Bootcamp project involving the ingestion and preliminary analysis of aircraft registry data from the FAA. It describes how the data was ingested using Python and Hadoop, then loaded into Hive tables. Initial exploration found the most frequently reported crafts and analyzed acceptance rates. Site comparison showed differences in average speed and altitude between two sites. Master data queries were created to summarize models, aircraft, and owners. Data visualizations analyzed fastest planes, speed vs altitude by make, unique flights by airline, and number of sightings by aircraft make.
Big Data Day LA 2015 - Large Scale Distinct Count -- The HyperLogLog algorithm (Data Con LA)
"At OpenX we not only use the tools in big data ecosystems to solve our business problems, but also explore cutting-edge algorithms for practical uses. HyperLogLog is one of the algorithms that we use intensively in our internal systems. It has really low computation cost and can easily plug into map-reduce frameworks (Hadoop or Spark). Some of the applications worth highlighting are:
* high cardinality test
* distinct count of unique users over time
* Visualize hyperloglog for fraud detection"
Android Lab Test: Reading the foot file list (English) (Bruno Delb)
Video of tutorial on : https://www.youtube.com/playlist?list=PLL2Z3bzdO25yHwIV3XdMzKs61At0Ldh6L
Visit http://www.AndroidLabTest.com
This document provides an overview of big data, Hadoop, and Hive. It discusses big data characteristics, how Hadoop allows distributed storage and processing of big data, key Hadoop components and ecosystem tools, and features of Hive. It then describes a project analyzing airline data stored on Hadoop using Hive queries. The project involves querying airport, airline, and route data to analyze operating airports and airlines by country, routes by stops and code sharing, and highest airports by country.
Cscope and ctags are Linux tools that allow developers to browse and navigate source code. Cscope interactively examines C programs and allows browsing source code in a terminal. It builds a reference database from source files that can then be queried. Ctags generates an index tag file for language objects that allows them to be quickly located by text editors. It recursively processes source files to create a tags file containing entries with the tag name, file, and location. This tags file can then be used by editors like vi to navigate between tags.
This document describes a data pipeline for processing large amounts of web crawl data from Common Crawl. It discusses ingesting over 160 TB of data per month from Common Crawl into AWS S3 storage and then using batch processing with T4 instances to index the data in Elasticsearch and store metadata in Cassandra. It also describes querying the hybrid database and some of the engineering challenges around approximating page rank with low latency.
HyperLogLog in Hive - How to count sheep efficiently? (bzamecnik)
This document discusses using HyperLogLog (HLL) in Hive to efficiently estimate the number of unique elements or cardinality in big datasets. It describes how HLL provides fast approximate counting using probabilistic data structures. It covers implementing HLL as user-defined functions in Hive, comparing different open source implementations, and examples of using HLL to estimate unique visitors per day and in a rolling window.
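The probabilistic data structure behind HLL fits in a few lines: hash each element, use the low bits to pick a register, and record the maximum number of leading zero bits seen among the remaining bits. A simplified sketch (it omits the small- and large-range corrections that production implementations, including the Hive UDFs discussed here, apply):

```python
import hashlib

def hll_estimate(items, b=10):
    """Simplified HyperLogLog with m = 2**b registers."""
    m = 1 << b
    registers = [0] * m
    for item in items:
        h = int.from_bytes(hashlib.md5(str(item).encode()).digest()[:8], "big")
        j = h & (m - 1)                        # low b bits pick a register
        w = h >> b                             # remaining 64-b bits
        rank = (64 - b) - w.bit_length() + 1   # position of leftmost 1-bit
        registers[j] = max(registers[j], rank)
    alpha = 0.7213 / (1 + 1.079 / m)           # bias correction for large m
    return alpha * m * m / sum(2.0 ** -r for r in registers)

est = hll_estimate(range(100000))
```

With 1024 one-byte registers (about 1 KB of state) the standard error is roughly 1.04/sqrt(1024), i.e. around 3%, which is why it merges so cheaply across map-reduce partitions: merging two sketches is just a register-wise max.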
Using Free and Open Source GIS to Automatically Create Standards-Based Spatia... (Patrick Rickles)
This presentation was given at the Free and Open Source Software for Geospatial (FOSS4G) 2013 conference in Nottingham, UK, on work undertaken by Dr. Claire Ellul, Nart Tamash, Feng Xian, John Stuiver, and Patrick Rickles, aiming to automate the creation of as many of the INSPIRE metadata elements as possible.
Max Klymyshyn discusses data-driven programming, where the data itself controls the flow of the program rather than the program logic. He provides the example of filtering weather data from multiple sources to display only sunny days in a month. This is done through a pipeline that grabs data from various sources, parses it, validates it to pass only sunny days, and then displays the results. Data-driven programming can be effective when dealing with similar data from different sources that needs filtering to specific criteria.
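The grab/parse/validate/display pipeline described above can be sketched as composed plain functions, where the data flowing through determines what survives each stage. The source records and field names below are invented to mimic two feeds with differing schemas:

```python
def pipeline(*stages):
    """Compose stages; data flows through each stage in order."""
    def run(data):
        for stage in stages:
            data = stage(data)
        return data
    return run

# Hypothetical raw feeds from two sources with different field names.
source_a = [{"day": 1, "weather": "sunny"}, {"day": 2, "weather": "rain"}]
source_b = [{"date": 3, "sky": "sunny"}]

# Parse: normalize both schemas to a common shape.
parse = lambda rows: [{"day": r.get("day", r.get("date")),
                       "weather": r.get("weather", r.get("sky"))} for r in rows]
# Validate: pass only sunny days.
validate = lambda rows: [r for r in rows if r["weather"] == "sunny"]

sunny_days = pipeline(parse, validate)(source_a + source_b)
```

Adding a new weather source means writing only a new parse rule; the rest of the program's flow is driven by the data itself.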
Replication allows data to be copied across multiple servers for redundancy and failover. The replication process involves an oplog which records all data modifications. Slave servers copy the oplog from the master to initially sync their data and then stream ongoing changes from the master's oplog. Replica sets improve on replica pairs by allowing multiple redundant servers where an election process selects a new primary if the existing primary fails.
Semantic search within Earth Observation products databases based on automati... (Gasperi Jerome)
Since 1972 and the launch of Landsat 1, the first civilian Earth Observation satellite, millions of images have been acquired all over the Earth by a constantly growing fleet of ever more sophisticated satellites. Generally, searching within this huge amount of Earth Observation (EO) imagery is limited to the acquisition conditions stored in the related metadata files, i.e. where (footprint), when (time of acquisition), and how (viewing angles, instrument, etc.). The larger community of end users thus misses the "what" filter, i.e. a way to filter searches by image content. RESTo [1] uses the iTag [2] footprint-based tagging system to enhance image metadata, and thereby provides a way to express semantic queries on image content in terms of land use. We investigated the performance of RESTo against a database of 12 million simulated Sentinel-2 granules, representative of the forthcoming French national mirror site of Sentinel products (PEPS).
This document provides an introduction to Lucene, an open-source information retrieval library. It discusses Lucene's components and architecture, how it models content and performs indexing and searching. It also summarizes how to build search applications using Lucene, including acquiring content, building documents, analyzing text, indexing documents, and querying. Finally, it discusses frameworks that are built on Lucene like Compass and Solr.
Letting In the Light: Using Solr as an External Search Component (Jay Luker)
* Jay Luker, IT Specialist, ADS, jluker@cfa.harvard.edu
* Benoit Thiell, software developer, ADS, bthiell@cfa.harvard.edu
Code4Lib 2011, Tuesday 8 February, 14:30 - 14:50
It’s well-established that Solr provides an excellent foundation for building a faceted search engine. But what if your application’s foundation has already been constructed? How do you add Solr as a federated, fulltext search component to an existing system that already provides a full set of well-crafted scoring and ranking mechanisms?
This talk will describe a work-in-progress project at the Smithsonian/NASA Astrophysics Data System to migrate its aging search platform to Invenio, an open-source institutional repository and digital library system originally developed at CERN, while at the same time incorporating Solr as an external component for both faceting and fulltext search.
In this presentation we'll start with a short introduction of Invenio and then move on to the good stuff: an in-depth exploration of our use of Solr. We'll explain the challenges that we faced, what we learned about some particular Solr internals, interesting paths we chose not to follow, and the solutions we finally developed, including the creation of custom Solr request handlers and query parser classes.
This presentation will be quite technical and will show a measure of horrible Java code. Benoit will probably run away during that part.
Relevance trilogy: may DREAM be with you! (Dec 2017) (Woonsan Ko)
Introducing new BloomReach Experience plugins that change the game of DREAM (Digital Relevance Experience & Agility Management), increasing productivity and business agility.
Cross-Platform Native Mobile Development with Eclipse (Peter Friese)
Developing great apps for mobile platforms like Android, iOS, or the mobile web is a challenging task. Not only do you have to take into consideration the limited resources your app has at its disposal, you also have to follow the established UI idioms, which may differ across platforms.
In this session, I will demonstrate how you can build mobile apps with tools from the Eclipse ecosystem. Based on real-world examples, I will present a domain-specific language we used to mobilize enterprise systems and to create the official Eclipse Summit Europe conference app (http://bit.ly/ese_app_de). What's more, I will show you how to overcome the tedium of manually porting your application from one platform to others, targeting technologies such as Objective-C or Django/Python. Finally, I will show how to integrate Eclipse tooling with external tools such as Apple's Xcode and Google App Engine.
See http://lanyrd.com/2011/eclipsecon-europe/shhmy/
Visualizing Austin's data with Elasticsearch and Kibana, by ObjectRocket
This document provides an introduction to Elasticsearch and Kibana. It describes what Elasticsearch is and how it can scale to handle large amounts of data and queries. It also describes Kibana and how it is used for data visualization. The document then demonstrates how to use Elasticsearch and Kibana together to visualize and analyze Austin transportation and restaurant inspection data.
Django is a Python web framework that makes building websites easier. It uses the MVC pattern with models representing the database, views handling requests and responses, and templates rendering HTML. Django generates URLs, handles forms and validation, and includes an admin interface. It removes redundancy through its template inheritance system and object-relational mapper that allows interacting with databases through Python objects.
Introduction to Elasticsearch with basics of Lucene, by Rahul Jain
Rahul Jain gives an introduction to Elasticsearch and its basic concepts like term frequency, inverse document frequency, and boosting. He describes Lucene as a fast, scalable search library that uses inverted indexes. Elasticsearch is introduced as an open source search platform built on Lucene that provides distributed indexing, replication, and load balancing. Logstash and Kibana are also briefly described as tools for collecting, parsing, and visualizing logs in Elasticsearch.
Visualize some of Austin's open source data using Elasticsearch with Kibana. ObjectRocket's Steve Croce presented this talk on 10/13/17 at the DBaaS event in Austin, TX.
by Karthik Kumar Odapally, Solutions Architect, AWS
Database Week at the AWS Loft is an opportunity to learn about Amazon's broad and deep family of managed database services. These services provide easy, scalable, reliable, and cost-effective ways to manage your data in the cloud. We explain the fundamentals and take a technical deep dive into Amazon RDS and Amazon Aurora relational databases, Amazon DynamoDB non-relational databases, Amazon Neptune graph databases, and Amazon ElastiCache managed Redis, along with options for database migration, caching, search and more. You'll learn how to get started, how to support applications, and how to scale.
Elasticsearch is a popular open-source distributed search and analytics engine, widely used for log analytics and text search – and increasingly used as a primary data store. Amazon Elasticsearch Service makes it easy to deploy, secure, operate, and scale Elasticsearch. We’ll take a look at how to use Elasticsearch Service to manage these different use cases.
The ELK stack consists of the three open source tools Elasticsearch, Logstash, and Kibana. Elasticsearch is a highly scalable search and analytics engine, Logstash is used to collect, process, and transport data, and Kibana provides visualization and exploration of data stored in Elasticsearch. The document discusses using the ELK stack for log data management, system monitoring, and other big data analysis tasks by centralized collection, normalization, and exploration of large datasets.
Cloud native programming model comparison, by Emily Jiang
Emily Jiang is a senior technical staff member at IBM and advocate for MicroProfile and CDI. She presented on choosing between MicroProfile and Spring for building cloud-native microservices. Some key points from the document include:
- MicroProfile is an open specification for enterprise Java microservices that includes specifications for REST services, configuration, fault tolerance, security and more. It has seen 9 platform releases since 2016.
- Spring is a popular framework for building microservices that includes features like REST, dependency injection, API documentation, reactive programming and more.
- Both MicroProfile and Spring provide options for building cloud-native microservices in Java, with MicroProfile being more standards-based while Spring is a single-vendor framework with a large ecosystem.
The document describes the Prospero API, which allows users to search and analyze a large text index of over 20 billion records. It provides features such as hierarchical data modeling, query composition against the data model, and operations like counts, histograms, and retrieving records. The API uses a pattern language to specify index records to match for queries, including terms, boolean combinations, date ranges, and iterators. It also allows transforming results between different levels of the data model hierarchy.
1) There are several general methods for acquiring web data through R, including reading files directly, scraping HTML/XML/JSON, and using APIs that serve XML/JSON.
2) Scraping web data involves extracting structured information from unstructured HTML/XML pages when no API is available. Packages like rvest and XML can be used to parse and extract the desired data.
3) Many data sources have APIs that allow programmatic access to search, retrieve, or submit data through a set of methods. R packages like taxize and dryad interface with specific APIs to access taxonomic and research data.
Deep dive into the native multi-model database ArangoDB, by ArangoDB Database
The document describes ArangoDB, a multi-model database that can function as a document store, key-value store, and graph database. It offers querying across these models using its AQL language. The document also discusses how ArangoDB is extensible through JavaScript, can run as a microservice using Foxx, and integrates with data center operating systems like Mesosphere DC/OS for resource management and fault tolerance.
This presentation describes the technology behind the PoolParty Semantic Search Server: how to use SKOS thesauri to map data from different sources, how to generate a semantic index, and how to build precise faceted search.
Lucene is a free and open source information retrieval (IR) library written in Java. It is widely used to add search functionality to applications. Lucene features fast and scalable indexing and search, and supports various query types including phrase, wildcard, fuzzy and range queries. The Lucene project includes related sub-projects like Solr (search server), Nutch (web crawler), and Mahout (machine learning).
Similar to RESTo - restful semantic search tool for geospatial (20)
The Copernicus programme entered its operational phase in 2014. Its space component comprises the Sentinel missions developed by ESA which, for the first time, offer free access to very high quality multi-sensor data. Making these data and value-added services available will stimulate research and the development of the downstream sector. Two Sentinel missions are already in orbit: Sentinel-1A (radar imager) was launched on 3 April 2014 and Sentinel-2A (optical imager) on 23 June 2015. The launch of Sentinel-3A (wide-swath imager and altimeter) is currently scheduled for February 2016.
Sentinel data are intended to be distributed to all user communities, in Europe and worldwide. Eventually they will generate 13 TB/day, i.e. almost 5 PB of data per year.
The PEPS platform (Plateforme d'Exploitation des Produits Sentinel) distributes Sentinel products at the French national level to support the implementation and monitoring of environmental policies, foster industrial development and the emergence of downstream services, and meet the expectations of the scientific community. PEPS is a platform designed to offer national users improved performance when accessing the very large volumes of Sentinel data.
2016.02.18 Big Data from Space - Toulouse Data Science, by Gasperi Jerome
The European Copernicus programme aims to give Europe an operational and autonomous Earth observation capability as "services of general European interest, with free, full and open access". To that end, ESA is developing 6 families of Earth observation satellites - the Sentinels. By 2020, the volume of data acquired by these satellites will be on the order of 20 petabytes. This avalanche of data offers significant opportunities, notably in research, services and innovation. It also poses technical challenges - how to store these data and, beyond that, how to search, distribute and process them in order to provide users with the service or information they need.
Presented at Toulouse Data Science on 18.02.2016 - http://www.meetup.com/fr-FR/Tlse-Data-Science/events/228423095/
2015.11.12 Big Data from Space - CUSI Toulouse, by Gasperi Jerome
Big Data and the technologies linked to its emergence raise numerous challenges in the fields of space and geographic information. After a review of "Big Data" concepts, the presentation draws on concrete examples of applications built on spatial data, applications that could not have come into existence without these technologies.
Big Data - Access and Processing of Earth Observation Data, by Gasperi Jerome
The success of the Copernicus programme, led by the European Union in coordination with ESA and the Member States, rests on its support for European public policies and its capacity to foster innovation through the development of value-added services in Europe. A "full and open" data policy was promoted to this end, with two essential conditions: easy access to the data for public and private users, and an action plan to stimulate the downstream sector.
To this end CNES, acting as a "collaborative ground segment", complements broad access to Sentinel satellite products through the PEPS project. With an additional 4 TB of data every day and a total volume of 20 PB expected by 2020, the challenge for PEPS is to guarantee high-performance data access, notably by offering co-located processing capabilities, and thus to foster the adoption and use of Earth observation data.
This document discusses on-demand EO processing services using the Web Processing Service (WPS) standard in a federation of European ground segments. It describes how WPS can be used to publish, describe, and execute geospatial processes. It also provides examples of using WPS for image processing, such as assisted land cover classification, and discusses the benefits of a federated approach over centralization.
1) The document discusses a presentation on the Web Processing Service (WPS) standard. WPS defines how geospatial processes can be published, described, and executed over the web.
2) WPS version 1.0 specifies operations for getting process descriptions, executing processes, and managing asynchronous executions. WPS version 2.0 adds new operations for managing process executions.
3) Examples show how WPS can be used for simple GIS processes, image processing workflows, and chaining multiple processes together across different servers.
Semantic search for Earth Observation products, by Gasperi Jerome
1. Semantic search helps users find the right data by characterizing products with relevant metadata and decoding natural language queries.
2. RESTo provides semantic search capabilities by using a query analyzer to translate queries into OpenSearch parameters based on recognizing words, patterns, platforms, instruments, locations, dates, and quantities in queries.
3. Issues with solely using keywords for semantic search include ambiguous interpretations and the need for a linked data approach to disambiguate terms based on context.
CEOS WGISS 36 - Frascati, Italy - 2013.09.19
Single Sign On with OAuth and OpenID used for Kalideos project and to be used within the French Land Surface Thematic Center
The document discusses the costs, volumes, and security concerns regarding data storage. It notes that storage is the main cost for both public and private cloud solutions. Data volumes are growing rapidly at an estimated 40% compound annual growth rate, reaching 45 zettabytes by 2020. Public clouds cannot guarantee data security and privacy due to legal obligations. The CNES Data Center is presented as an in-house solution that can handle future scientific mission storage and processing needs while reducing costs by sharing capacity across missions and achieving predicted maximum storage volumes of 50 petabytes by 2025.
The document discusses CNES OpenSearch implementations for facilitating study of issues related to climate and ecosystems. It describes Theia, a thematic platform offering a broad range of images and processing services. Theia provides search, visualization, and download services depending on user authorization. Take5 is described as a SPOT4 simulator of Sentinel-2 time series, observing 45 sites every 5 days from March to June 2013 to replicate Sentinel-2 orbit repeatability. Users can choose acquisitions, download individual products, or download whole series.
Toulouse, France - 2013.05.30
Centre de Compétences Techniques "Cloud Computing et Big Data"
WPS is an OGC standard which defines interfaces to publish, describe and execute geospatial processes
The Orfeo Toolbox (OTB - http://www.orfeo-toolbox.org/) is an open source remote sensing image processing software library developed by CNES. The aim of the toolbox is to gather a large number of state-of-the-art algorithms for building processing chains for satellite images. Using the constellation server (http://www.constellation-sdi.org/), we exposed the main OTB processing chains as Web Processing Services (WPS). WPS provides rules for standardizing inputs and outputs when invoking geospatial processing services. These services are managed from a web browser using the mapshup web client (http://mapshup.info). mapshup supports both synchronous and asynchronous processes and offers direct visualisation of results. The whole system provides users with a complete and comprehensive image processing chain to produce land cover classification from satellite orthoimagery.
With an update to WPS 2.0, this chain should fit well into a Cloud architecture.
On-Demand Data Processing - An Introduction to the Web Processing Service, by Gasperi Jerome
Toulouse, France - 2013.01.10
Centre de Compétences Techniques "Extraction des données de télémesures et exploitation en temps différé"
WPS (Web Processing Service) is an OGC standard which defines interfaces to facilitate the publication, description and execution of processes.
We exposed the image processing chains of the ORFEO Toolbox library (OTB - http://www.orfeo-toolbox.org/), developed by CNES, as WPS services. For this purpose we used the constellation server (http://www.constellation-sdi.org/).
Exposed this way, the processes are driven from a web browser using the mapshup web client (http://mapshup.info). mapshup supports synchronous and asynchronous processes and offers direct visualisation of the results.
Data access and data extraction services within the Land Imagery Portal, by Gasperi Jerome
Models for scientific exploitation of EO Data - Frascati - October 12th 2012
Presentation of the data access architecture of the French Land Imagery portal.
Semantic search applied to Earth Observation products, by Gasperi Jerome
Seoul, Korea - 2012.10.10
82nd OGC Technical Committee
How to search for Earth Observation imagery that contains coastal cultivated areas?
Semantic content extraction from images is a complex and time consuming task. A simpler approach is to match the metadata footprint against exogenous data to perform image characterization.
SLACkER (SimpLe Automated Characterization of EaRth observation products) uses the Global Land Cover 2000 classification to perform such characterization automatically.
Access to satellite information in a reactive natural disaster context, by Gasperi Jerome
Les rencontres de SIG-la-lettre - Paris, 5 April 2012
Satellite imagery is a decisive source of information in the event of a natural disaster. All the processes deployed in these situations are subject to time constraints. From the acquisition of the image data to the production of maps for field teams, a race against the clock begins, in which the choice of satellite sources is paramount. Today several dozen image sources are accessible, and users must be able to identify very quickly the sources best suited to their mapping needs.
In this context, the catalogue of the International Charter "Space and Major Disasters" (http://www.disasterschartercatalog.org) provides access to the data acquired within this project. By following interoperability standards and offering an innovative search interface, this service addresses the two major issues of data dissemination: accessibility and usability.
Experimenting a cloud based solution for image processing and data access, by Gasperi Jerome
This document discusses cloud computing platforms for geospatial data storage, processing, and access. It provides an overview of the Infrastructure as a Service (IaaS), Platform as a Service (PaaS), and Software as a Service (SaaS) models. It notes that public clouds have issues around ongoing data costs and concerns over proprietary solutions and data access laws. The document describes using the OpenStack cloud platform on a private cloud to store geospatial raster data, process it using open-source tools, and provide access to results through a Web Map Service (WMS). It finds OpenStack promising but notes challenges in accessing cloud storage from compute instances due to HTTP's incompatibility with filesystem tasks.
Maruthi Prithivirajan, Head of ASEAN & IN Solution Architecture, Neo4j
Get an inside look at the latest Neo4j innovations that enable relationship-driven intelligence at scale. Learn more about the newest cloud integrations and product enhancements that make Neo4j an essential choice for developers building apps with interconnected data and generative AI.
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024, by Albert Hoitingh
In this session I delve into the encryption technology used in Microsoft 365 and Microsoft Purview. Including the concepts of Customer Key and Double Key Encryption.
Enhancing adoption of Open Source Libraries: a case study on Albumentations.AI, by Vladimir Iglovikov, Ph.D.
Presented by Vladimir Iglovikov:
- https://www.linkedin.com/in/iglovikov/
- https://x.com/viglovikov
- https://www.instagram.com/ternaus/
This presentation delves into the journey of Albumentations.ai, a highly successful open-source library for data augmentation.
Created out of a necessity for superior performance in Kaggle competitions, Albumentations has grown to become a widely used tool among data scientists and machine learning practitioners.
This case study covers various aspects, including:
People: The contributors and community that have supported Albumentations.
Metrics: The success indicators such as downloads, daily active users, GitHub stars, and financial contributions.
Challenges: The hurdles in monetizing open-source projects and measuring user engagement.
Development Practices: Best practices for creating, maintaining, and scaling open-source libraries, including code hygiene, CI/CD, and fast iteration.
Community Building: Strategies for making adoption easy, iterating quickly, and fostering a vibrant, engaged community.
Marketing: Both online and offline marketing tactics, focusing on real, impactful interactions and collaborations.
Mental Health: Maintaining balance and not feeling pressured by user demands.
Key insights include the importance of automation, making the adoption process seamless, and leveraging offline interactions for marketing. The presentation also emphasizes the need for continuous small improvements and building a friendly, inclusive community that contributes to the project's growth.
Vladimir Iglovikov brings his extensive experience as a Kaggle Grandmaster, ex-Staff ML Engineer at Lyft, sharing valuable lessons and practical advice for anyone looking to enhance the adoption of their open-source projects.
Explore more about Albumentations and join the community at:
GitHub: https://github.com/albumentations-team/albumentations
Website: https://albumentations.ai/
LinkedIn: https://www.linkedin.com/company/100504475
Twitter: https://x.com/albumentations
“An Outlook of the Ongoing and Future Relationship between Blockchain Technologies and Process-aware Information Systems.” Invited talk at the joint workshop on Blockchain for Information Systems (BC4IS) and Blockchain for Trusted Data Sharing (B4TDS), co-located with the 36th International Conference on Advanced Information Systems Engineering (CAiSE), 3 June 2024, Limassol, Cyprus.
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack, by shyamraj55
Discover the seamless integration of RPA (Robotic Process Automation), COMPOSER, and APM with AWS IDP enhanced with Slack notifications. Explore how these technologies converge to streamline workflows, optimize performance, and ensure secure access, all while leveraging the power of AWS IDP and real-time communication via Slack notifications.
Building RAG with self-deployed Milvus vector database and Snowpark Container Services, by Zilliz
This talk will give hands-on advice on building RAG applications with an open-source Milvus database deployed as a docker container. We will also introduce the integration of Milvus with Snowpark Container Services.
Observability Concepts EVERY Developer Should Know - DeveloperWeek Europe, by Paige Cruz
Monitoring and observability aren't traditionally found in software curriculums, and many of us cobble this knowledge together from whatever vendor or ecosystem we were first introduced to and whatever is part of our current company's observability stack.
While the dev and ops silo continues to crumble, many organizations still relegate monitoring & observability to the purview of ops, infra and SRE teams. This is a mistake: achieving a highly observable system requires collaboration up and down the stack.
I, a former op, would like to extend an invitation to all application developers to join the observability party, and will share these foundational concepts to build on.
Essentials of Automations: The Art of Triggers and Actions in FME, by Safe Software
In this second installment of our Essentials of Automations webinar series, we’ll explore the landscape of triggers and actions, guiding you through the nuances of authoring and adapting workspaces for seamless automations. Gain an understanding of the full spectrum of triggers and actions available in FME, empowering you to enhance your workspaces for efficient automation.
We’ll kick things off by showcasing the most commonly used event-based triggers, introducing you to various automation workflows like manual triggers, schedules, directory watchers, and more. Plus, see how these elements play out in real scenarios.
Whether you’re tweaking your current setup or building from the ground up, this session will arm you with the tools and insights needed to transform your FME usage into a powerhouse of productivity. Join us to discover effective strategies that simplify complex processes, enhancing your productivity and transforming your data management practices with FME. Let’s turn complexity into clarity and make your workspaces work wonders!
What do a Lego brick and the XZ backdoor have in common?, by Speck&Tech
ABSTRACT: At first glance, a Lego brick and the XZ backdoor might have in common the fact that they are both building blocks, or dependencies, of creative and software projects. In reality, a Lego brick and the XZ backdoor case have much more in common than that.
Join the presentation to dive into a story of interoperability, standards and open formats, and then discuss the important role contributors play in a sustainable open source community.
BIO: Advocate of free software and of standard, open formats. She has been an active member of the Fedora and openSUSE projects and co-founded the LibreItalia Association, where she was involved in several events, migrations and training activities related to LibreOffice. Previously she worked on LibreOffice migrations and training courses for several public administrations and private companies. Since January 2020 she has worked at SUSE as a Software Release Engineer for Uyuni and SUSE Manager, and when she is not pursuing her passion for computers and Geeko she cultivates her curiosity for astronomy (hence her nickname, deneb_alpha).
Full-RAG: A modern architecture for hyper-personalization, by Zilliz
Mike Del Balso, CEO & Co-Founder at Tecton, presents "Full RAG," a novel approach to AI recommendation systems, aiming to push beyond the limitations of traditional models through a deep integration of contextual insights and real-time data, leveraging the Retrieval-Augmented Generation architecture. This talk will outline Full RAG's potential to significantly enhance personalization, address engineering challenges such as data management and model training, and introduce data enrichment with reranking as a key solution. Attendees will gain crucial insights into the importance of hyperpersonalization in AI, the capabilities of Full RAG for advanced personalization, and strategies for managing complex data integrations for deploying cutting-edge AI solutions.
A tale of scale & speed: How the US Navy is enabling software delivery from l..., by sonjaschweigert1
Rapid and secure feature delivery is a goal across every application team and every branch of the DoD. The Navy’s DevSecOps platform, Party Barge, has achieved:
- Reduction in onboarding time from 5 weeks to 1 day
- Improved developer experience and productivity through actionable findings and reduction of false positives
- Maintenance of superior security standards and inherent policy enforcement with Authorization to Operate (ATO)
Development teams can ship efficiently and ensure applications are cyber ready for Navy Authorizing Officials (AOs). In this webinar, Sigma Defense and Anchore will give attendees a look behind the scenes and demo secure pipeline automation and security artifacts that speed up application ATO and time to production.
We will cover:
- How to remove silos in DevSecOps
- How to build efficient development pipeline roles and component templates
- How to deliver security artifacts that matter for ATO’s (SBOMs, vulnerability reports, and policy evidence)
- How to streamline operations with automated policy checks on container images
In his public lecture, Christian Timmerer provides insights into the fascinating history of video streaming, starting from its humble beginnings before YouTube to the groundbreaking technologies that now dominate platforms like Netflix and ORF ON. Timmerer also presents provocative contributions of his own that have significantly influenced the industry. He concludes by looking at future challenges and invites the audience to join in a discussion.
Communications Mining Series - Zero to Hero - Session 1, by DianaGray10
This session provides an introduction to UiPath Communication Mining, its importance, and a platform overview. You will acquire a good understanding of the phases in Communication Mining as we go over the platform with you. Topics covered:
• Communication Mining Overview
• Why is it important?
• How can it help today’s business and the benefits
• Phases in Communication Mining
• Demo on Platform overview
• Q/A
7. GET / - List all collections
POST / - Create a new collection
GET /collection/$describe - Describe the collection's OpenSearch service
GET /collection - Search within the collection
POST /collection - Insert a resource within the collection
DELETE /collection - Delete the collection
PUT /collection - Update the collection
GET /collection/identifier - Show resource metadata
GET /collection/identifier/$download - Download the resource product
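As a hedged illustration of the search endpoint above, the snippet below only builds the request URL; the host name, collection name, and the exact OpenSearch parameter names (`startDate`, `completionDate`, `box`) are assumptions based on the EO OpenSearch extension, not taken from the slides:

```python
from urllib.parse import urlencode

# Hypothetical RESTo instance (not from the slides)
BASE_URL = "https://resto.example.com"

def build_search_url(collection, **params):
    """Build a GET /collection search URL with EO OpenSearch parameters.

    Parameters are sorted so the resulting URL is deterministic.
    """
    query = urlencode(sorted(params.items()))
    return f"{BASE_URL}/{collection}?{query}"

# A time-boxed, geo-boxed query similar to the slides' benchmark scenario
url = build_search_url("Spot",
                       startDate="2014-01-01",
                       completionDate="2014-02-01",
                       box="1.0,43.0,1.1,43.1")
print(url)
```

Fetching the URL (e.g. with `urllib.request.urlopen`) would return the search results, which RESTo serves as GeoJSON.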
11. Order-of-magnitude figures, computed on a Dual Core 2.6 GHz, 4 GB RAM, 500 TB HDD machine:
SEARCH - 0.2 s for a query over a one-month time period within a 10x10 km2 box, against a 1,000,000-product SPOT database (new products retrieved every 3 hours from the ADS catalog)
INGEST - 0.5 s per product for a ~5,000-product ingestion
14. During the ingestion process, resources are automatically tagged with location and land use.
github.com/jjrom/itag
15. iTag: "Tag this footprint with continent, country and land use"
http://goo.gl/WtbcbR
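A minimal sketch of what asking iTag to tag a footprint could look like; the endpoint URL and the `footprint`/`taggers` parameter names are assumptions here (check github.com/jjrom/itag for the actual interface):

```python
from urllib.parse import urlencode

# Hypothetical iTag endpoint; parameter names are assumptions,
# not taken from the slides.
ITAG_URL = "https://itag.example.com/"

def build_itag_request(footprint_wkt, taggers=("political", "landcover")):
    """Build a request asking iTag to tag a WKT footprint with
    continent/country ("political") and land use ("landcover")."""
    params = {"footprint": footprint_wkt, "taggers": ",".join(taggers)}
    return ITAG_URL + "?" + urlencode(params)

url = build_itag_request("POLYGON((1 43,1.1 43,1.1 43.1,1 43.1,1 43))")
print(url)
```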
16. Additionally, conditional ingestion rules can be defined at the collection level to provide specific tags.
e.g. add tags #mh370, #plane, #malaysianairline to resources acquired between March 8th, 2014 and April 14th, 2014 in the south of the Indian Ocean
http://goo.gl/W8VlPV
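The slides do not show RESTo's actual rule syntax; as a conceptual sketch only, such a rule can be modelled as a time/area predicate plus a list of tags (the bounding box below is an illustrative guess at "south of the Indian Ocean"):

```python
from datetime import date

# Conceptual sketch of a collection-level conditional ingestion rule;
# the real RESTo rule syntax is not shown on the slides.
MH370_RULE = {
    "tags": ["#mh370", "#plane", "#malaysianairline"],
    "start": date(2014, 3, 8),
    "end": date(2014, 4, 14),
    # Rough, illustrative bounding box over the southern Indian Ocean
    "bbox": (80.0, -45.0, 110.0, -20.0),  # lon_min, lat_min, lon_max, lat_max
}

def apply_rule(rule, acquired, lon, lat):
    """Return the rule's tags if the resource matches its conditions,
    otherwise an empty list."""
    lon_min, lat_min, lon_max, lat_max = rule["bbox"]
    in_time = rule["start"] <= acquired <= rule["end"]
    in_area = lon_min <= lon <= lon_max and lat_min <= lat <= lat_max
    return rule["tags"] if in_time and in_area else []

print(apply_rule(MH370_RULE, date(2014, 3, 20), 95.0, -35.0))
# -> ['#mh370', '#plane', '#malaysianairline']
```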
18. RESTo provides semantic search capabilities.
It uses a Query Analyzer to translate a natural language query into
a set of EO OpenSearch parameters.
19. Query Analyzer goodies
Multilingual - current languages are EN, FR, IT and DE
Synonyms supported (e.g. unit «m» can be written «m», «meter» or «meters»)
Each collection can define its own dedicated keywords
Automatic typo correction using string similarity
Embeds a gazetteer containing ~9,000,000 toponyms
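RESTo's own correction algorithm isn't detailed on the slide; the following is only a minimal sketch of similarity-based typo correction, using Python's standard `difflib` and a tiny made-up vocabulary standing in for RESTo's keywords and toponyms:

```python
import difflib

# Tiny illustrative vocabulary (stand-in for RESTo's keywords/toponyms)
VOCABULARY = ["urban", "cultivated", "forest", "toulouse", "cloud", "cover"]

def correct(word, vocabulary=VOCABULARY, cutoff=0.75):
    """Return the closest known word above the similarity cutoff,
    or the word unchanged if nothing is close enough."""
    matches = difflib.get_close_matches(word.lower(), vocabulary,
                                        n=1, cutoff=cutoff)
    return matches[0] if matches else word

print(correct("urbann"))   # -> urban
print(correct("touluse"))  # -> toulouse
```

`get_close_matches` ranks candidates by `SequenceMatcher` ratio, so a single transposed or doubled letter still maps to the intended term while unrelated words pass through untouched.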
20. Example
« Images of urban area in the US acquired in the last 10 days with less than 5 % of cloud cover »
21. The same query, annotated by the Query Analyzer:
urban area → keyword | US → location | last 10 days → date | less than 5 % of cloud cover → acquisition parameter
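As an illustration only, a toy analyzer for the example query above might look like this; the output parameter names follow the EO OpenSearch style, but the matching rules are simple regexes, not RESTo's actual multilingual analyzer:

```python
import re
from datetime import date, timedelta

def analyze(query, today=date(2014, 5, 1)):
    """Toy analyzer: map the example query to EO OpenSearch-style
    parameters. The real RESTo analyzer handles far more patterns,
    synonyms and languages."""
    params = {}
    if "urban" in query.lower():
        params["searchTerms"] = "urban"        # keyword
    if re.search(r"\bin the US\b", query):
        params["location"] = "United States"   # location (via gazetteer)
    m = re.search(r"last (\d+) days", query)
    if m:                                       # relative date
        start = today - timedelta(days=int(m.group(1)))
        params["startDate"] = start.isoformat()
    m = re.search(r"less than (\d+)\s*% of cloud cover", query)
    if m:                                       # acquisition parameter
        params["cloudCover"] = f"[0,{m.group(1)}]"
    return params

q = ("Images of urban area in the US acquired in the last 10 days "
     "with less than 5 % of cloud cover")
print(analyze(q))
```

The fixed `today` default merely keeps the sketch deterministic; a real analyzer would evaluate "last 10 days" against the current date.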
22. Search (example)
1. Search parameters are derived from the natural language query.
2. Each search result has a « human readable URL » that can be indexed by web crawlers (e.g. Google robots).
3. Keywords on resources are links to search requests: they can be indexed by web crawlers… and so on.
23. http://goo.gl/GvMEHj