The ICARUS aviation ontology was presented at the 10th International Conference on Web Intelligence, Mining and Semantics (WIMS'20), which was held virtually from June 30 to July 3, 2020.
The ICARUS Aviation Data Sharing and Intelligence framework was presented by Mr. Nikolaos Papagiannopoulos from Athens International Airport (AIA) at the 27th ACRIS Meeting, which was held in London on February 25th-27th, 2020.
A glimpse at the ICARUS policy perspectives was provided during the European Big Data Value Forum 2018, BDVA Workshop 2.3 "Policy issues, opportunities and barriers in big data-driven transport", on November 14th, 2018, in Vienna.
A quick project overview was provided by the ICARUS Coordinator, Dr. Dimitris Alexandrou (UBITECH), at the BDVA MeetUp that was held on May 15th, 2018, in Sofia.
ICARUS @WIMS 2020 (June 2020, virtual)
1. The ACM 10th International Conference on Web Intelligence, Mining and Semantics
(WIMS ’20), June 30 – July 3, 2020, Biarritz, France
The ICARUS ONTOLOGY:
A general aviation ontology developed using
a multi-layer approach
Dimosthenis Stefanidis, Chrysovalantis Christodoulou, Moysis Symeonidis, George Pallis, Marios D.
Dikaiakos, Loukas Pouis, Kalia Orphanou, Fenareti Lampathaki, Dimitrios Alexandrou
dstefa02@cs.ucy.ac.cy
2. Aviation Industry
Airlines
Aviation Data
Airports
More than 98 million terabytes of data by 2026 [1]
4.1 billion passengers and 56.1 million tonnes of freight were carried in 2017 [2]
[1] www.flightglobal.com/news/articles/insight-from-flightglobal-the-big-data-landscape-446681
[2] https://www.icao.int/annual-report-2017/Pages/the-world-of-air-transport-in-2017.aspx
3. Aviation Data
Complex
Derive from heterogeneous data sources
Lack of standardization
Data integration and linking challenge
“One of the biggest challenges is to integrate the different data silos, for example weather data, live airspace usage or data from the
airports. There is really no standard and that complicates things. Even internally we are still merging different data sets from different areas in
the company. Insights emerge when you put different departmental data sets together, but at the moment, it is not a smooth process.”
By Rey-Villaverde (Head of Data Science, EasyJet)
Varying data formats
A significant bottleneck with huge cost
4. Data Integration Problem
With no common standard, aviation data models can vary along
various dimensions!
Data providers can use different formats to encode aviation data
e.g. an airline carrier ID field could be stored as a two-character IATA code or a three-letter ICAO code
Field names assigned to values can be misleading
e.g. a provider may use the name "AT" while another may use "arrTime" for the field "aircraft arrival time"
Even if two data fields are identical, that doesn't ensure that the data represents the same information
e.g. the "aircraft arrival time" may correspond to a scheduled or an actual arrival time.
Data values may be recorded at different temporal frequencies (e.g. once per hour) or spatial regions (e.g.
airspace sectors, geographic regions)
Measurement units are often omitted in the data storage schemes and can lead to problems when different
units are employed across different systems (e.g. metric vs imperial, feet vs flight level)
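The mismatches listed above (field names, omitted units) can be illustrated with a minimal Python sketch. The alias table, the canonical name `aircraft_arrival_time` and the function names are hypothetical and not part of ICARUS; the flight-level conversion uses the nominal 100 ft per flight level.

```python
# Hypothetical mapping from provider-specific field names to one canonical name.
FIELD_ALIASES = {
    "AT": "aircraft_arrival_time",
    "arrTime": "aircraft_arrival_time",
}

FEET_PER_METRE = 3.28084
FEET_PER_FLIGHT_LEVEL = 100  # nominally, one flight level unit is 100 ft


def canonical_field(name: str) -> str:
    """Return the canonical field name, or the input unchanged if unknown."""
    return FIELD_ALIASES.get(name, name)


def normalise_record(record: dict) -> dict:
    """Rename every key of a provider record to its canonical field name."""
    return {canonical_field(key): value for key, value in record.items()}


def altitude_to_feet(value: float, unit: str) -> float:
    """Convert an altitude to feet; the unit must be stated explicitly."""
    if unit == "ft":
        return value
    if unit == "m":
        return value * FEET_PER_METRE
    if unit == "FL":
        return value * FEET_PER_FLIGHT_LEVEL
    raise ValueError(f"unknown altitude unit: {unit!r}")
```

Requiring the unit as an explicit argument is the point of the sketch: a value stored without its unit cannot be converted safely.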
5. ICARUS Platform
A novel Big Data platform to deal with
the data integration challenges in
aviation
Allows exploration, sharing, trading,
curation, integration and deep analysis
in a trusted manner
Original and derivative data, characterized
by different volume, velocity and variety
www.icarus2020.aero
7. The ICARUS Ontology
Represents meaningfully entities of the ICARUS Platform
e.g. datasets, algorithms, services (a combination of data and ML algorithms), usage statistics,
registered experts etc.
Captures structural and semantic characteristics of entities by using semantic annotation of
datasets
Extracts metadata from ICARUS Platform operations to construct the ICARUS knowledge-base
Supports continuous integration of new datasets, services, and users into the platform
Supports search, query and linking over multiple data sources and information assets
Feeds the ICARUS recommendation engine with useful information
8. ICARUS Ontology - A Multi-layer Approach
[Figure: layered ontology diagram]
Meta contexts and attributes: top-level ontology for the metadata of entities
Domain-specific contexts and attributes: domain ontologies related to aviation (Weather, Airport, Aircraft, Flight, Passenger, Health, …)
9. ICARUS Ontology - Design Process
Expand the domain-level ontologies
Capture important concepts, relationships and data fields from aviation stakeholders’ data
Integrate existing aviation domain ontology (e.g. NASA ontology [1])
Create a top-level ontology for describing platform’s concepts
Ontology Coding: based on a formal language (OWL) using Protégé.
[1] Keller, R. M. (2016, September). Ontologies for aviation data management. In 2016 IEEE/AIAA 35th Digital Avionics Systems Conference (DASC) (pp. 1-9). IEEE.
15. Possible competency questions that the ICARUS Ontology
can answer
"Which datasets contain columns about flight delay time?"
"What is the airport departure terminal for a specific flight?"
"How many seats were occupied on a specific flight?"
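As a rough illustration of how the first question could be answered, the sketch below matches patterns over a toy set of triples in plain Python. All identifiers (`dataset:A`, `hasColumn`, `describes`, `concept:FlightDelayTime`) are hypothetical and are not taken from the ICARUS ontology; a real deployment would presumably query an RDF store instead.

```python
# Toy knowledge base of (subject, predicate, object) triples.
# All identifiers are hypothetical, not taken from the ICARUS ontology.
TRIPLES = {
    ("dataset:A", "hasColumn", "column:dep_delay"),
    ("dataset:A", "hasColumn", "column:carrier"),
    ("dataset:B", "hasColumn", "column:wind_speed"),
    ("column:dep_delay", "describes", "concept:FlightDelayTime"),
    ("column:carrier", "describes", "concept:Airline"),
    ("column:wind_speed", "describes", "concept:Weather"),
}


def match(s=None, p=None, o=None):
    """Yield triples matching the pattern; None acts as a wildcard."""
    for ts, tp, to in TRIPLES:
        if (s is None or ts == s) and (p is None or tp == p) and (o is None or to == o):
            yield ts, tp, to


def datasets_about(concept: str) -> list:
    """Return datasets having at least one column that describes the concept."""
    columns = {s for s, _, _ in match(p="describes", o=concept)}
    datasets = {s for s, _, o in match(p="hasColumn") if o in columns}
    return sorted(datasets)
```

Calling `datasets_about("concept:FlightDelayTime")` joins the two patterns in the same way a two-pattern graph query would.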
17. Scenario: Twitter
Based on Twitter data (e.g. travelers' tweets) related to aviation,
the popularity (e.g. sentiment score) of airlines and airports can be found
and extracted.
Providing such statistics (e.g. popularity) could help airlines to find:
the most common problems in the case of a bad flight
e.g. late flight, long lines, lost luggage, customer service, etc.
their popularity versus that of competitors
18. Scenario: Twitter
1. Ontology Extension: expand the ICARUS ontology based on the
concepts and entities related to Twitter, e.g. Twitter user account, the
number of followers, tweets, etc.
2. Data Collection: retrieve tweets and airline accounts via the Twitter
Streaming API
3. Data Pre-processing: apply data cleaning and natural language
processing (NLP) techniques to the retrieved tweets
19. Scenario: Twitter
4. Emotions Extraction: perform sentiment analysis (e.g. VADER) on a
set of retrieved tweets, by including emotion categories in the
ontology
5. Storage: store the results in the ICARUS ontology (knowledge base)
6. Query KB: Use SPARQL queries to answer possible questions of
ICARUS users
e.g. "Which airline has the lowest popularity?" (searching for the entity airline with the
most negative sentiment based on the stored aggregated statistics)
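Steps 4 to 6 can be sketched in a few lines of Python. The tiny hand-made lexicon below only stands in for a real sentiment analyser such as VADER, and the airline names and tweets are invented for illustration; the aggregation (mean score per airline, then the minimum) is one plausible reading of "lowest popularity".

```python
from collections import defaultdict
from statistics import mean

# Tiny hand-made lexicon standing in for a real sentiment analyser such as VADER.
LEXICON = {"great": 1.0, "friendly": 0.5, "late": -0.5, "lost": -1.0, "rude": -1.0}


def sentiment(text: str) -> float:
    """Sum the lexicon scores of the words in a tweet (0.0 if none match)."""
    words = text.lower().split()
    return sum(LEXICON.get(word.strip(".,!?"), 0.0) for word in words)


def least_popular(tweets: list) -> str:
    """Return the airline with the lowest mean sentiment score."""
    scores = defaultdict(list)
    for airline, text in tweets:
        scores[airline].append(sentiment(text))
    return min(scores, key=lambda airline: mean(scores[airline]))


# Invented example data: (airline, tweet text) pairs.
TWEETS = [
    ("AirlineX", "Great crew, friendly service!"),
    ("AirlineX", "Flight was late."),
    ("AirlineY", "They lost my luggage, rude staff."),
]
```

In the scenario described above, the per-airline aggregates would be stored in the knowledge base and retrieved with a query rather than recomputed in memory.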
20. Scenario: Recommendations
Provide high-quality recommendations of datasets and services to the
ICARUS users by utilizing the ICARUS ontology.
We can recommend assets that are connected indirectly to the user’s
preferences and needs by:
capturing structural and semantic characteristics of the various
ICARUS entities
inferring previously hidden relationships (e.g. inheritance) between
users and assets
21. Scenario: Recommendations
1. Data Collection: retrieve data related to users and assets of the
ICARUS platform (e.g. preferences, interactions like purchases with
datasets and services, metadata of datasets, etc.).
2. Storage: store the collected data in the ICARUS ontology (knowledge base)
3. Reasoning: apply a reasoning algorithm (Pellet) to reveal hidden
relationships (e.g. inheritance)
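To give a feel for what "revealing hidden relationships" by inheritance means, the sketch below propagates asserted classes up a subclass hierarchy. It is not Pellet and covers only this one kind of inference, whereas an OWL reasoner does far more; the class and entity names are hypothetical.

```python
def infer_types(subclass_of: dict, types: dict) -> dict:
    """Propagate each entity's asserted classes up the subclass hierarchy."""

    def add_ancestors(cls, seen):
        # Depth-first walk; `seen` doubles as the result set and cycle guard.
        for parent in subclass_of.get(cls, ()):
            if parent not in seen:
                seen.add(parent)
                add_ancestors(parent, seen)

    inferred = {}
    for entity, classes in types.items():
        closure = set(classes)
        for cls in classes:
            add_ancestors(cls, closure)
        inferred[entity] = closure
    return inferred


# Hypothetical class hierarchy and one asserted type.
SUBCLASS_OF = {
    "FlightDelayDataset": {"FlightDataset"},
    "FlightDataset": {"AviationDataset"},
}
TYPES = {"dataset:A": {"FlightDelayDataset"}}
```

Here `infer_types(SUBCLASS_OF, TYPES)` also marks `dataset:A` as a `FlightDataset` and an `AviationDataset`, two facts that were never stated explicitly.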
22. Scenario: Recommendations
4. Recommender: Use a weighted hybrid recommendation
approach to recommend datasets and
services to each user
Content-Based: Use SPARQL queries to retrieve users’ preferences, geolocation
and organization types with the respective information of the given datasets and
services
Collaborative Filtering: Use SPARQL queries to retrieve the interplay between
users and assets to construct the interaction matrix
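A weighted hybrid of the two components might look like the following sketch. The Jaccard-based scoring functions, the equal default weights and all asset and user identifiers are assumptions made for illustration; the slides do not specify the similarity measures or weights actually used in ICARUS.

```python
def jaccard(a: set, b: set) -> float:
    """Jaccard similarity of two sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def content_score(user_prefs: set, asset_tags: set) -> float:
    """Content-based part: overlap between user preferences and asset tags."""
    return jaccard(user_prefs, asset_tags)


def collaborative_score(user: str, asset: str, interactions: dict) -> float:
    """Similarity-weighted share of other users who interacted with the asset."""
    mine = interactions.get(user, set())
    total = weighted = 0.0
    for other, theirs in interactions.items():
        if other == user:
            continue
        sim = jaccard(mine, theirs)
        total += sim
        if asset in theirs:
            weighted += sim
    return weighted / total if total else 0.0


def recommend(user, prefs, asset_tags, interactions, w_content=0.5, w_collab=0.5):
    """Rank unseen assets by the weighted sum of both scores, best first."""
    seen = interactions.get(user, set())
    scores = {
        asset: w_content * content_score(prefs, tags)
        + w_collab * collaborative_score(user, asset, interactions)
        for asset, tags in asset_tags.items()
        if asset not in seen
    }
    return sorted(scores, key=lambda asset: (-scores[asset], asset))


# Invented example data.
ASSET_TAGS = {
    "ds:delays": {"flight", "delay"},
    "ds:weather": {"weather"},
    "svc:forecast": {"delay", "forecast"},
}
INTERACTIONS = {
    "u1": {"ds:delays"},
    "u2": {"ds:delays", "svc:forecast"},
    "u3": {"ds:weather"},
}
```

With this data, `recommend("u1", {"delay", "forecast"}, ASSET_TAGS, INTERACTIONS)` ranks `svc:forecast` ahead of `ds:weather`, because both the content match and the most similar user (`u2`) point to it. The `interactions` dictionary plays the role of the interaction matrix.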
23. Scenario: COVID-19
Current challenges of health organizations:
locate, collect, explore and integrate reliable data about airline and
human mobility, with a sufficient geographical coverage and
resolution.
Improving this level of detail would result in:
more accurate epidemic predictions and
a possible estimation of relative revenue losses to be expected in different
pandemic scenarios
24. Scenario: COVID-19
The ICARUS ontology and the relationships between its entities
can be used to combine epidemic data with other aviation-related
data for data analytics and epidemic forecasts.
1. Ontology Extension: expand the ICARUS ontology based on the
concepts and entities related to COVID e.g. mortality rate, cases per
city/country, etc.
2. Data Collection: retrieve open data related to COVID-19 and aviation
data
25. Scenario: COVID-19
3. Storage: store the collected data in the ICARUS ontology (knowledge base)
4. Query KB: Use SPARQL queries over aviation-related data and COVID-19
data to answer possible questions of ICARUS users
e.g. "Which datasets can help me predict a virus transmission from incoming flights?"
(find datasets that are related to incoming flights and viruses and are used by
forecasting services for virus transmission)
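The example question can be read as a filter over datasets and services, sketched below in plain Python. The dataset and service identifiers, topic labels and the `kind`/`uses` fields are hypothetical, and the sketch assumes that a matching dataset must cover every requested topic itself, which is only one possible reading of the question.

```python
# Hypothetical metadata: topics covered by each dataset.
DATASETS = {
    "ds:arrivals": {"incoming_flights"},
    "ds:cases": {"virus"},
    "ds:arrivals_cases": {"incoming_flights", "virus"},
    "ds:unused_combo": {"incoming_flights", "virus"},
}

# Hypothetical metadata: kind of each service and the datasets it uses.
SERVICES = {
    "svc:transmission_forecast": {
        "kind": "forecasting",
        "uses": {"ds:arrivals_cases", "ds:cases"},
    },
    "svc:dashboard": {"kind": "visualisation", "uses": {"ds:unused_combo"}},
}


def datasets_for_forecasting(required_topics: set) -> list:
    """Datasets covering all required topics that a forecasting service uses."""
    used = set()
    for service in SERVICES.values():
        if service["kind"] == "forecasting":
            used |= service["uses"]
    return sorted(
        name
        for name, topics in DATASETS.items()
        if required_topics <= topics and name in used
    )
```

For `{"incoming_flights", "virus"}` only `ds:arrivals_cases` qualifies: `ds:unused_combo` covers both topics but no forecasting service uses it.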
26. Conclusion
We presented the ICARUS ontology, an aviation domain ontology
designed using a multi-layer approach for enabling:
the integration and reasoning over multiple sources of heterogeneous aviation-
related data
the semantic description of metadata produced by the ICARUS platform
Main strengths of the proposed ontology:
extensibility and interoperability due to the multi-layer design
ease of use on multiple aviation data sources of different formats and structures
27. Thank You!
The ontology is available here (open source):
https://github.com/UCY-LINC-LAB/icarus-ontology
Co-funded by the European
Commission Horizon 2020
- Grant # 780792