Presentation of the spatiotemporal RDF store Strabon at the Linked Data Europe Workshop, co-located with the European Data Forum in Athens, Greece (21 March 2014)
AI & Topology concluding remarks - "The open-source landscape for topology in...Umberto Lupo
A short concluding speech for the AI & Topology session at the 2020 Applied Machine Learning Days (28 January 2020).
We remark on the strength of the case for using topological methods in various domains of machine learning. We then comment on our views on integrating topology with the practice of machine learning at a fundamental level. We give an (inexhaustive) overview of the open-source landscape for topological machine learning and data analysis, including our contribution, the giotto-tda Python package. Finally, we mention some promising future directions in the field.
What am I going to get from this course?
Provides a basic conceptual understanding of how clustering works
Provides intuitive understanding of the mathematics behind various clustering algorithms
Walk through Python code examples on how to use various cluster algorithms
Show how clustering is applied in various industry applications
Check it on Experfy: https://www.experfy.com/training/courses/unsupervised-learning-clustering
Kernel methods and variable selection for exploratory analysis and multi-omic...tuxette
Nathalie Vialaneix
4th course on Computational Systems Biology of Cancer: Multi-omics and Machine Learning Approaches
International course, Curie training
https://training.institut-curie.org/courses/sysbiocancer2021
(remote)
September 29th, 2021
Educational slides on TRACLUS, an algorithm for clustering trajectory data created by Jae-Gil Lee, Jiawei Han and Kyu-Young Wang, published on SIGMOD’07.
http://web.engr.illinois.edu/~hanj/pdf/sigmod07_jglee.pdf
Presentation of the spatiotemporal RDF store Strabon at the Linked Data Europe Workshop, co-located with the European Data Forum in Athens, Greece (21 March 2014)
AI & Topology concluding remarks - "The open-source landscape for topology in...Umberto Lupo
A short concluding speech for the AI & Topology session at the 2020 Applied Machine Learning Days (28 January 2020).
We remark on the strength of the case for using topological methods in various domains of machine learning. We then comment on our views on integrating topology with the practice of machine learning at a fundamental level. We give an (inexhaustive) overview of the open-source landscape for topological machine learning and data analysis, including our contribution, the giotto-tda Python package. Finally, we mention some promising future directions in the field.
What am I going to get from this course?
Provides a basic conceptual understanding of how clustering works
Provides intuitive understanding of the mathematics behind various clustering algorithms
Walk through Python code examples on how to use various cluster algorithms
Show how clustering is applied in various industry applications
Check it on Experfy: https://www.experfy.com/training/courses/unsupervised-learning-clustering
Kernel methods and variable selection for exploratory analysis and multi-omic...tuxette
Nathalie Vialaneix
4th course on Computational Systems Biology of Cancer: Multi-omics and Machine Learning Approaches
International course, Curie training
https://training.institut-curie.org/courses/sysbiocancer2021
(remote)
September 29th, 2021
Educational slides on TRACLUS, an algorithm for clustering trajectory data created by Jae-Gil Lee, Jiawei Han and Kyu-Young Wang, published on SIGMOD’07.
http://web.engr.illinois.edu/~hanj/pdf/sigmod07_jglee.pdf
Slide for study session given by Dr. Enrico Rinaldi at Arithmer inc.
It is a summary of recent methods for real-time instance segmentation "YOLACT", which is especially useful in robotics.
Arithmer株式会社は東京大学大学院数理科学研究科発の数学の会社です。私達は現代数学を応用して、様々な分野のソリューションに、新しい高度AIシステムを導入しています。AIをいかに上手に使って仕事を効率化するか、そして人々の役に立つ結果を生み出すのか、それを考えるのが私たちの仕事です。
Arithmer began at the University of Tokyo Graduate School of Mathematical Sciences. Today, our research of modern mathematics and AI systems has the capability of providing solutions when dealing with tough complex issues. At Arithmer we believe it is our job to realize the functions of AI through improving work efficiency and producing more useful results for society.
Track 13. New Trends in Digital Humanities
Authors: Alejandro Benito; Antonio G. Losada; Roberto Theron; Amelie Dorn; Melanie Seltmann; Eveline Wandl-Vogt
https://youtu.be/5tTot6vinZk
An Efficient Arabic Text Spotting from Natural Scenes ImagesReham Marzouk
The objective of the work was to reveal the
efficiency of state of the arts methods on spotting texts in natural scene images on the Arabic text images.
Interaction Networks for Learning about Objects, Relations and PhysicsKen Kuroki
For my presentation for a reading group. I have not in any way contributed this study, which is done by the researchers named on the first slide.
https://papers.nips.cc/paper/6418-interaction-networks-for-learning-about-objects-relations-and-physics
Trajectory Segmentation and Sampling of Moving Objects Based On Representativ...ijsrd.com
Moving Object Databases (MOD), although ubiquitous, still call for methods that will be able to understand, search, analyze, and browse their spatiotemporal content. In this paper, we propose a method for trajectory segmentation and sampling based on the representativeness of the (sub) trajectories in the MOD. In order to find the most representative sub trajectories, the following methodology is proposed. First, a novel global voting algorithm is performed, based on local density and trajectory similarity information. This method is applied for each segment of the trajectory, forming a local trajectory descriptor that represents line segment representativeness. The sequence of this descriptor over a trajectory gives the voting signal of the trajectory, where high values correspond to the most representative parts. Then, a novel segmentation algorithm is applied on this signal that automatically estimates the number of partitions and the partition borders, identifying homogenous partitions concerning their representativeness. Finally, a sampling method over the resulting segments yields the most representative sub trajectories in the MOD. Our experimental results in synthetic and real MOD verify the effectiveness of the proposed scheme, also in comparison with other sampling techniques.
Slide for study session given by Dr. Enrico Rinaldi at Arithmer inc.
It is a summary of recent methods for real-time instance segmentation "YOLACT", which is especially useful in robotics.
Arithmer株式会社は東京大学大学院数理科学研究科発の数学の会社です。私達は現代数学を応用して、様々な分野のソリューションに、新しい高度AIシステムを導入しています。AIをいかに上手に使って仕事を効率化するか、そして人々の役に立つ結果を生み出すのか、それを考えるのが私たちの仕事です。
Arithmer began at the University of Tokyo Graduate School of Mathematical Sciences. Today, our research of modern mathematics and AI systems has the capability of providing solutions when dealing with tough complex issues. At Arithmer we believe it is our job to realize the functions of AI through improving work efficiency and producing more useful results for society.
Track 13. New Trends in Digital Humanities
Authors: Alejandro Benito; Antonio G. Losada; Roberto Theron; Amelie Dorn; Melanie Seltmann; Eveline Wandl-Vogt
https://youtu.be/5tTot6vinZk
An Efficient Arabic Text Spotting from Natural Scenes ImagesReham Marzouk
The objective of the work was to reveal the
efficiency of state of the arts methods on spotting texts in natural scene images on the Arabic text images.
Interaction Networks for Learning about Objects, Relations and PhysicsKen Kuroki
For my presentation for a reading group. I have not in any way contributed this study, which is done by the researchers named on the first slide.
https://papers.nips.cc/paper/6418-interaction-networks-for-learning-about-objects-relations-and-physics
Trajectory Segmentation and Sampling of Moving Objects Based On Representativ...ijsrd.com
Moving Object Databases (MOD), although ubiquitous, still call for methods that will be able to understand, search, analyze, and browse their spatiotemporal content. In this paper, we propose a method for trajectory segmentation and sampling based on the representativeness of the (sub) trajectories in the MOD. In order to find the most representative sub trajectories, the following methodology is proposed. First, a novel global voting algorithm is performed, based on local density and trajectory similarity information. This method is applied for each segment of the trajectory, forming a local trajectory descriptor that represents line segment representativeness. The sequence of this descriptor over a trajectory gives the voting signal of the trajectory, where high values correspond to the most representative parts. Then, a novel segmentation algorithm is applied on this signal that automatically estimates the number of partitions and the partition borders, identifying homogenous partitions concerning their representativeness. Finally, a sampling method over the resulting segments yields the most representative sub trajectories in the MOD. Our experimental results in synthetic and real MOD verify the effectiveness of the proposed scheme, also in comparison with other sampling techniques.
Detection of Dense, Overlapping, Geometric Objects gerogepatton
Using a unique data collection, we are able to study the detection of dense geometric objects in image data
where object density, clarity, and size vary. The data is a large set of black and white images of
scatterplots, taken from journals reporting thermophysical property data of metal systems, whose plot
points are represented primarily by circles, triangles, and squares. We built a highly accurate single class
U-Net convolutional neural network model to identify 97 % of image objects in a defined set of test images,
locating the centers of the objects to within a few pixels of the correct locations. We found an optimal way
in which to mark our training data masks to achieve this level of accuracy. The optimal markings for object
classification, however, required more information in the masks to identify particular types of geometries.
We show a range of different patterns used to mark the training data masks, and how they help or hurt our
dual goals of location and classification. Altering the annotations in the segmentation masks can increase
both the accuracy of object classification and localization on the plots, more than other factors such as
adding loss terms to the network calculations. However, localization of the plot points and classification of
the geometric objects require different optimal training data.
DETECTION OF DENSE, OVERLAPPING, GEOMETRIC OBJECTSgerogepatton
Using a unique data collection, we are able to study the detection of dense geometric objects in image data where object density, clarity, and size vary. The data is a large set of black and white images of scatterplots, taken from journals reporting thermophysical property data of metal systems, whose plot points are represented primarily by circles, triangles, and squares. We built a highly accurate single class U-Net convolutional neural network model to identify 97 % of image objects in a defined set of test images, locating the centers of the objects to within a few pixels of the correct locations. We found an optimal way in which to mark our training data masks to achieve this level of accuracy. The optimal markings for object classification, however, required more information in the masks to identify particular types of geometries. We show a range of different patterns used to mark the training data masks, and how they help or hurt our dual goals of location and classification. Altering the annotations in the segmentation masks can increase both the accuracy of object classification and localization on the plots, more than other factors such as adding loss terms to the network calculations. However, localization of the plot points and classification of the geometric objects require different optimal training data
DETECTION OF DENSE, OVERLAPPING, GEOMETRIC OBJECTSijaia
Using a unique data collection, we are able to study the detection of dense geometric objects in image data where object density, clarity, and size vary. The data is a large set of black and white images of scatterplots, taken from journals reporting thermophysical property data of metal systems, whose plot points are represented primarily by circles, triangles, and squares. We built a highly accurate single class U-Net convolutional neural network model to identify 97 % of image objects in a defined set of test images, locating the centers of the objects to within a few pixels of the correct locations. We found an optimal way in which to mark our training data masks to achieve this level of accuracy. The optimal markings for object classification, however, required more information in the masks to identify particular types of geometries. We show a range of different patterns used to mark the training data masks, and how they help or hurt our dual goals of location and classification. Altering the annotations in the segmentation masks can increase both the accuracy of object classification and localization on the plots, more than other factors such as
adding loss terms to the network calculations. However, localization of the plot points and classification of the geometric objects require different optimal training data.
Introduction to image processing and pattern recognitionSaibee Alam
this power point presentation provides a brief introduction to image processing and pattern recognition and its related research papers including conclusion
IJERA (International journal of Engineering Research and Applications) is International online, ... peer reviewed journal. For more detail or submit your article, please visit www.ijera.com
Balistrocchi, M., Metulini, R., Carpita, M., and Ranzi, R.: Dynamic maps of human exposure to floods based on mobile phone data, Nat. Hazards Earth Syst. Sci. Discuss., https://doi.org/10.5194/nhess-2020-201, in press, 2020
International Journal of Engineering Research and Applications (IJERA) is an open access online peer reviewed international journal that publishes research and review articles in the fields of Computer Science, Neural Networks, Electrical Engineering, Software Engineering, Information Technology, Mechanical Engineering, Chemical Engineering, Plastic Engineering, Food Technology, Textile Engineering, Nano Technology & science, Power Electronics, Electronics & Communication Engineering, Computational mathematics, Image processing, Civil Engineering, Structural Engineering, Environmental Engineering, VLSI Testing & Low Power VLSI Design etc.
A novel technique for speech encryption based on k-means clustering and quant...journalBEEI
In information transmission such as speech information, higher security and confidentiality are specially required. Therefore, data encryption is a pre-requisite for a secure communication system to protect such information from unauthorized access. A new algorithm for speech encryption is introduced in this paper. It depends on the quantum chaotic map and k-means clustering, which are employed in keys generation. Also, two stages of scrambling were used: the first relied on bits using the proposed algorithm (binary representation scrambling BiRS) and the second relied on k-means using the proposed algorithm (block representation scrambling BlRS). The objective test used statistical analysis measures (signal-to-noise-ratio, segmental signal-to-noise-ratio, frequency-weighted signal-to-noise ratio, correlation coefficient, log-likelihood ratio) applied to evaluate the proposed system. Via MATLAB simulations, it is shown that the proposed technique is secure, reliable and efficient to be implemented in secure speech communication, as well as also being characterized by high clarity of the recovered speech signal.
Similar to Unsupervised Learning: Similarities and distance functions for IoT data (20)
Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...pchutichetpong
M Capital Group (“MCG”) expects to see demand and the changing evolution of supply, facilitated through institutional investment rotation out of offices and into work from home (“WFH”), while the ever-expanding need for data storage as global internet usage expands, with experts predicting 5.3 billion users by 2023. These market factors will be underpinned by technological changes, such as progressing cloud services and edge sites, allowing the industry to see strong expected annual growth of 13% over the next 4 years.
Whilst competitive headwinds remain, represented through the recent second bankruptcy filing of Sungard, which blames “COVID-19 and other macroeconomic trends including delayed customer spending decisions, insourcing and reductions in IT spending, energy inflation and reduction in demand for certain services”, the industry has seen key adjustments, where MCG believes that engineering cost management and technological innovation will be paramount to success.
MCG reports that the more favorable market conditions expected over the next few years, helped by the winding down of pandemic restrictions and a hybrid working environment will be driving market momentum forward. The continuous injection of capital by alternative investment firms, as well as the growing infrastructural investment from cloud service providers and social media companies, whose revenues are expected to grow over 3.6x larger by value in 2026, will likely help propel center provision and innovation. These factors paint a promising picture for the industry players that offset rising input costs and adapt to new technologies.
According to M Capital Group: “Specifically, the long-term cost-saving opportunities available from the rise of remote managing will likely aid value growth for the industry. Through margin optimization and further availability of capital for reinvestment, strong players will maintain their competitive foothold, while weaker players exit the market to balance supply and demand.”
Opendatabay - Open Data Marketplace.pptxOpendatabay
Opendatabay.com unlocks the power of data for everyone. Open Data Marketplace fosters a collaborative hub for data enthusiasts to explore, share, and contribute to a vast collection of datasets.
First ever open hub for data enthusiasts to collaborate and innovate. A platform to explore, share, and contribute to a vast collection of datasets. Through robust quality control and innovative technologies like blockchain verification, opendatabay ensures the authenticity and reliability of datasets, empowering users to make data-driven decisions with confidence. Leverage cutting-edge AI technologies to enhance the data exploration, analysis, and discovery experience.
From intelligent search and recommendations to automated data productisation and quotation, Opendatabay AI-driven features streamline the data workflow. Finding the data you need shouldn't be a complex. Opendatabay simplifies the data acquisition process with an intuitive interface and robust search tools. Effortlessly explore, discover, and access the data you need, allowing you to focus on extracting valuable insights. Opendatabay breaks new ground with a dedicated, AI-generated, synthetic datasets.
Leverage these privacy-preserving datasets for training and testing AI models without compromising sensitive information. Opendatabay prioritizes transparency by providing detailed metadata, provenance information, and usage guidelines for each dataset, ensuring users have a comprehensive understanding of the data they're working with. By leveraging a powerful combination of distributed ledger technology and rigorous third-party audits Opendatabay ensures the authenticity and reliability of every dataset. Security is at the core of Opendatabay. Marketplace implements stringent security measures, including encryption, access controls, and regular vulnerability assessments, to safeguard your data and protect your privacy.
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Data and AI
Round table discussion of vector databases, unstructured data, ai, big data, real-time, robots and Milvus.
A lively discussion with NJ Gen AI Meetup Lead, Prasad and Procure.FYI's Co-Found
Adjusting primitives for graph : SHORT REPORT / NOTESSubhajit Sahu
Graph algorithms, like PageRank Compressed Sparse Row (CSR) is an adjacency-list based graph representation that is
Multiply with different modes (map)
1. Performance of sequential execution based vs OpenMP based vector multiply.
2. Comparing various launch configs for CUDA based vector multiply.
Sum with different storage types (reduce)
1. Performance of vector element sum using float vs bfloat16 as the storage type.
Sum with different modes (reduce)
1. Performance of sequential execution based vs OpenMP based vector element sum.
2. Performance of memcpy vs in-place based CUDA based vector element sum.
3. Comparing various launch configs for CUDA based vector element sum (memcpy).
4. Comparing various launch configs for CUDA based vector element sum (in-place).
Sum with in-place strategies of CUDA mode (reduce)
1. Comparing various launch configs for CUDA based vector element sum (in-place).
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...Subhajit Sahu
Abstract — Levelwise PageRank is an alternative method of PageRank computation which decomposes the input graph into a directed acyclic block-graph of strongly connected components, and processes them in topological order, one level at a time. This enables calculation for ranks in a distributed fashion without per-iteration communication, unlike the standard method where all vertices are processed in each iteration. It however comes with a precondition of the absence of dead ends in the input graph. Here, the native non-distributed performance of Levelwise PageRank was compared against Monolithic PageRank on a CPU as well as a GPU. To ensure a fair comparison, Monolithic PageRank was also performed on a graph where vertices were split by components. Results indicate that Levelwise PageRank is about as fast as Monolithic PageRank on the CPU, but quite a bit slower on the GPU. Slowdown on the GPU is likely caused by a large submission of small workloads, and expected to be non-issue when the computation is performed on massive graphs.
Unsupervised Learning: Similarities and distance functions for IoT data
1. UNSUPERVISED LEARNING: SIMILARITIES AND
DISTANCE FUNCTIONS FOR IOT DATA
G. CASOLLA, V. SCHIANO, S. CUOMO, F. GIAMPAOLO, F. PICCIALLI
University of Naples Federico II
INTRODUCTION (1)
In our research we studied visitors’ behaviours inside one of the most
important museum in Italy, the National Archaeological Museum of
Naples, through a deployed IoT framework. It is composed by smart
sensor boards with Bluetooth and Wi-Fi capabilities. The IoT system
is able to track a visitor path by collecting his position and the related
time-of-stay inside the museum rooms [1]. Our dataset relies on about
19 000 unique users behavioural data composed by two features: (i) the
visiting Path (non-numerical data) and (ii) the Time spent (numerical
data).
Classifying unstructured and unlabelled data is a key challenge dealing
with three main research tasks:
• the study and selection of the appropriate similarity and distance
functions,
• the selection of the number of clusters in which partitioning the
data,
• the choice of the most suitable algorithm to achieve an accurate
data clustering.
SIMILARITIES AND DISTANCE FUNCTIONS (3)
(a) - Cosine similarity
scos(S, T; q) =
v(S; q) · v(T; q)
v(S; q) 2 v(T; q) 2
(b) - Jaccard similarity
sJac(S, T; q) =
|Q(S; q) ∩ Q(T; q)|
|Q(S; q) ∪ Q(T; q)|
(c) - Longest Common Substring
dLCS(S, T) =
LCS(n, m)
n + m
where
LCS(i, j) =
max(i, j) if min(i, j) = 0,
LCS(i − 1, j − 1) if S(i) = T(j),
1 + min {LCS(i − 1, j), LCS(i, j − 1)} otherwise
(d) - Generalized Levenshtein distance
dLv(S, T) =
Lv(n, m)
max{n, m}
where
Lv(i, j) =
max(i, j) if min(i, j) = 0,
min
Lv(i − 1, j) + 1
Lv(i, j − 1) + 1
Lv(i − 1, j − 1) + 1(Si=Tj )
otherwise.
(a) (b)
(c) (d)
More in detail, the q-grams based distances
are calculated from q-grams based similari-
ties with the formula:
dx(S, T) = 1 − sx(S, T).
COMPARISON (4)
An ordered F-measure heatmap representing the comparison among all the experimented
clustering methodologies.
THEORY (2)
The Clustering algorithms we considered
are:
• Hierarchical Clustering
• K-Medoids (PAM)
A similarity function is a tool that gives the
strength of the relationship between two or
more data items.
To deal with the non-numerical Path fea-
ture of our dataset, that can be treated as a
string, we have analyzed four strings simi-
larity functions. A string can be defined as
sequence of finite characters from a finite al-
phabet. A q-gram is a string with q consecu-
tive characters. The q-grams from a string S
are realized by collecting in sequences q char-
acters from S and saving the appearing q-
grams. To deal with the numerical Time fea-
ture we have pre-computed the distance ma-
trix by using the well-known Euclidean dis-
tance and then merged with the other strings’
distances with the formula defined as fol-
lows:
Dsum =
K
i=1
ωiD(i)
where D(i)
is a single distance matrix, K is
the total number of distances that we want to
merge and ωi is the inverse of the maximum
element of the i-th distance matrix D(i)
. It’s
shown that, to improve the performance of a
classification, it is better to combine multiple
distances than to use a single distance matrix
[2].
REFERENCES (5)
[1] F. Piccialli, Y. Yoshimura, P. Benedusi, C. Ratti, and
S. Cuomo, Lessons learned from longitudinal mod-
eling of mobile-equipped visitors in a complex mu-
seum. Neural Comp. and App., pp. 1-17, 2019
[2] A. Ibba, R. Duin, and W.-J. Lee, A study on com-
bining sets of differently measured dissimilarities.
ICPR’10, pp. 3360-3363, 2010