This document proposes a catalog of 20 patterns for multilingual linked open data (MLOD). The patterns are classified by activity, such as naming, dereferencing, labeling, and linking. Each pattern contains a name, description, context, example, and discussion. The goal is to provide generic solutions to common problems in MLOD based on experience with DBpedia internationalization. Future work may involve feedback, additional patterns, and handling large datasets.
WRITER RECOGNITION FOR SOUTH INDIAN LANGUAGES USING STATISTICAL FEATURE EXTRA... – ijnlc
The comprehension of entire handwritten documents is a challenging problem that encompasses several difficult tasks. Given a handwritten document, its layout must first be analysed to isolate the different content types. These content types can then be routed to specialised subsystems, including writer-style, image, or table recognisers. Research in automatic writer identification has mainly centred on the statistical approach. This has led to the selection and extraction of statistical features such as run-length distributions, slant distribution, entropy, and edge-hinge distribution. The edge-hinge distribution feature outperforms all other statistical features. Edge-hinge distribution is a feature that characterises the changes in direction of a writing stroke in handwritten text. It is extracted by means of a window that is slid over an edge-detected version of offline scanned images. Whenever the central pixel of the window is "on", the two edge fragments (i.e. connected sequences of pixels) emerging from this central pixel are considered. Their directions are measured and stored as pairs. A joint probability distribution is then obtained from a large sample of such pairs.
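A minimal sketch of the hinge idea described above, using a simplified neighbour-pair variant rather than the paper's exact sliding-window procedure (the function name and the 8-direction quantisation are my own assumptions):

```python
import numpy as np

# 8-neighbour offsets; the list index serves as a quantised direction (0..7)
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]

def edge_hinge_histogram(edge):
    """Simplified edge-hinge feature: for every 'on' edge pixel, take each pair
    of 'on' 8-neighbours and accumulate their quantised directions as a joint
    2-D histogram, then normalise to a probability distribution."""
    edge = np.asarray(edge, dtype=bool)
    hist = np.zeros((8, 8))
    h, w = edge.shape
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            if not edge[y, x]:
                continue
            # directions of the edge fragments emerging from the central pixel
            dirs = [d for d, (dy, dx) in enumerate(OFFSETS) if edge[y + dy, x + dx]]
            for i in range(len(dirs)):
                for j in range(i + 1, len(dirs)):
                    hist[dirs[i], dirs[j]] += 1
    total = hist.sum()
    return hist / total if total else hist
```

In a full writer-identification pipeline this normalised 8x8 (or finer) histogram would be the feature vector compared across documents.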
A SURVEY OF LANGUAGE-DETECTION, FONT-DETECTION AND FONT-CONVERSION SYSTEMS FOR... – IJCI JOURNAL
A large amount of data in Indian languages stored digitally is in ASCII-based font formats. ASCII has a 128-character set and is therefore unable to represent all the characters necessary to deal with the variety of scripts available worldwide. Moreover, these ASCII-based fonts are not based on a single standard mapping between character codes and individual characters for a particular Indian script, unlike English-language fonts, which are based on the standard ASCII mapping. Therefore, the fonts for a particular script must be available on the system to accurately represent the data in that script. Furthermore, converting data from one font into another is a difficult task. The non-standard ASCII-based fonts also pose problems when performing search on texts in Indian languages available over the web. There are 25 official languages in India, and the amount of digital text available in ASCII-based fonts is much larger than the text available in the standard ISCII (Indian Script Code for Information Interchange) or Unicode formats. This paper discusses the work done in the field of font detection (identifying the font of a given text) and font converters (converting ASCII-format text into the corresponding Unicode text).
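The mapping-table approach such font converters use can be sketched as a longest-match substitution; the tiny legacy-to-Unicode table below is invented for illustration and is not taken from any real font:

```python
# Illustrative legacy-font to Unicode table (NOT from any real font):
# legacy fonts often map multi-character sequences to one Unicode character.
LEGACY_TO_UNICODE = {
    "d": "\u0915",   # hypothetical glyph code for Devanagari KA
    "[k": "\u0916",  # hypothetical two-character code for KHA
    "s": "\u0947",   # hypothetical code for vowel sign E
}

def convert_legacy(text, table=LEGACY_TO_UNICODE):
    """Convert legacy-font text to Unicode via longest-match substitution."""
    keys = sorted(table, key=len, reverse=True)  # try longer codes first
    out, i = [], 0
    while i < len(text):
        for k in keys:
            if text.startswith(k, i):
                out.append(table[k])
                i += len(k)
                break
        else:
            out.append(text[i])  # pass through unmapped characters
            i += 1
    return "".join(out)
```

Real converters ship one such table per legacy font, which is why font detection (identifying which table to apply) is the necessary first step.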
Why are languages and language technologies important in our societies?
How to deal with a less-studied language?
How to build and exploit new language resources?
How much time is needed? How to represent and use time (temporal information in NLP applications)?
How to efficiently use time in research and… personal life?...
These are questions to be answered, with the main focus on Romanian.
Mining Object Movement Patterns from Trajectory Data – NhatHai Phan
We propose a three-step framework for selecting informative patterns from thousands of highly redundant ones: 1) All in One: Mining Multiple Movement Patterns; 2) Fuzzy Moving Object Clusters and Time-Relaxed Gradual Trajectory Patterns; and 3) Mining Representative Object Movement Patterns.
NhatHai Phan
CIS Department,
University of Oregon, Eugene, OR
Tyler Baldwin, Yunyao Li, Bogdan Alexe, Ioana Roxana Stanoi: Automatic Term Ambiguity Detection. ACL (2) 2013: 804-809
Abstract:
While the resolution of term ambiguity is important for information extraction (IE) systems, the cost of resolving each instance of an entity can be prohibitively expensive on large datasets. To combat this, this work looks at ambiguity detection at the term, rather than the instance,
level. By making a judgment about the general ambiguity of a term, a system is able to handle ambiguous and unambiguous cases differently, improving throughput and quality. To address the term ambiguity detection problem, we employ a model that combines data from language models, ontologies, and topic modeling. Results over a dataset of entities from four product domains show that the proposed approach achieves an F-measure of 0.96, significantly above the baseline.
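The paper's model combines language-model, ontology, and topic-model signals; a toy combiner in that spirit (the thresholds and signal names below are my own illustrative assumptions, not the authors' actual model) might look like:

```python
def is_ambiguous(term, ontology_types, topic_spread, lm_entropy,
                 spread_thresh=0.5, entropy_thresh=3.0):
    """Toy term-level ambiguity detector: flag a term as ambiguous when any
    signal suggests it can refer to more than one kind of thing.
    - ontology_types: set of ontology classes the term maps to
    - topic_spread: 0..1, how evenly the term's contexts spread over topics
    - lm_entropy: entropy (bits) of a language model over the term's contexts
    """
    if len(ontology_types) > 1:          # term names several entity types
        return True
    if topic_spread > spread_thresh:     # contexts span many topics
        return True
    return lm_entropy > entropy_thresh   # contexts are highly unpredictable
```

The point of the term-level judgment is that it is made once per term rather than once per mention, which is where the throughput gain comes from.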
It is quite often observed that when people use retrieval systems, they are not just searching for documents or text passages, but for information contained inside them that relates to entities such as persons, organizations, locations, events, or times. The goal is to find various kinds of valuable semantic information about real-world entities embedded in different web pages and databases. But it is difficult to find specific or exact information about entities with present search engines. So we need search engines that can interpret our queries across different domains and extract structured information about entities.
A Comparison of NER Tools w.r.t. a Domain-Specific Vocabulary – Timm Heuss
Presentation held at SEMANTiCS 2014, regarding this paper: http://doi.acm.org/10.1145/2660517.2660520
In this paper we compare several state-of-the-art Linked Data Knowledge Extraction tools with regard to their ability to recognise entities of a controlled, domain-specific vocabulary. This includes tools that offer APIs as a Service, locally installed platforms, as well as a UIMA-based approach as a reference. We evaluate under realistic conditions, with natural-language source texts from keywording experts of the Städel Museum Frankfurt. The goal is to find first hints as to which tool, approach, or strategy is more convincing for domain-specific tagging/annotation, working towards a solution that is demanded by GLAMs worldwide.
Roy Tennant, Senior Program Officer, OCLC Research
As library collections shift from print materials to digital formats, and as the web enables ubiquitous and instantaneous discovery of information, library users expect to find and access materials online. It’s not enough to have pages “on the web”; library data must be “woven into the web” and integrated into the sites and services that library users frequent daily – Google, Wikipedia, social networks. When information about a library’s collection is locked up behind a specific web site (such as an OPAC), it is often exceedingly difficult for services, such as search engines, to consume that data. Information seekers need to be connected back to their local library resources from wherever they are on the web. The imperative is to make library data available in new data formats that are native to the web, exposing it to the wider web community, making it easily discoverable by other sites, services, and ultimately consumers. Roy Tennant will shed light on what linked data is and how to re-envision, expose and share library data as entities that are part of the web.
RDF and other linked data standards — how to make use of big localization data – Dave Lewis
The standards and interoperability challenges of using the Resource Description Framework for data resources in linked data. Based on work from CNGL (www.cngl.ie), the FALCON project (www.falcon-project.eu) and the LIDER project (www.lider-project.eu).
This presentation addresses the main issues of Linked Data and scalability. In particular, it provides details on approaches and technologies for clustering, distributing, sharing, and caching data. Furthermore, it addresses the means for publishing data through cloud deployment and the relationship between Big Data and Linked Data, exploring how some of the solutions can be transferred to the context of Linked Data.
This presentation focuses on providing means for exploring Linked Data. In particular, it gives an overview of current visualization tools and techniques, looking at semantic browsers and applications for presenting the data to the end user. We also describe existing search options, including faceted search, concept-based search, and hybrid search based on a mix of semantic information and text processing. Finally, we conclude with approaches for Linked Data analysis, describing how the available data can be synthesized and processed in order to draw conclusions.
Dynamically Optimizing Queries over Large Scale Data Platforms – INRIA-OAK
Enterprises are adopting large-scale data processing platforms, such as Hadoop, to gain actionable insights from their "big data". Query optimization is still an open challenge in this environment due to the volume and heterogeneity of data, comprising both structured and un/semi-structured datasets. Moreover, it has become common practice to push business logic close to the data via user-defined functions (UDFs), which are usually opaque to the optimizer, further complicating cost-based optimization. As a result, classical relational query optimization techniques do not fit well in this setting, while at the same time, suboptimal query plans can be disastrous with large datasets. In this talk, I will present new techniques that take into account UDFs and correlations between relations for optimizing queries running on large scale clusters. We introduce "pilot runs", which execute part of the query over a sample of the data to estimate selectivities, and employ a cost-based optimizer that uses these selectivities to choose an initial query plan. Then, we follow a dynamic optimization approach, in which plans evolve as parts of the queries get executed. Our experimental results show that our techniques produce plans that are at least as good as, and up to 2x (4x) better for Jaql (Hive) than, the best hand-written left-deep query plans.
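The "pilot run" idea (executing part of a query over a sample to estimate selectivities) can be sketched as follows; the sampling rate and the greedy most-selective-first ordering are simplifications of my own, not the talk's actual optimizer:

```python
import random

def pilot_run_selectivity(rows, predicate, sample_frac=0.01, seed=7):
    """Estimate a predicate's selectivity (including an opaque UDF) by
    evaluating it over a small random sample -- a 'pilot run'."""
    rng = random.Random(seed)
    n = max(1, int(len(rows) * sample_frac))
    sample = rng.sample(rows, n)
    return sum(1 for r in sample if predicate(r)) / n

def greedy_plan(predicates, rows):
    """Order filters most-selective-first using pilot-run estimates."""
    est = {name: pilot_run_selectivity(rows, p) for name, p in predicates.items()}
    return sorted(est, key=est.get)
```

The dynamic part of the approach would then re-run this estimation on intermediate results as the plan executes, letting the remaining plan evolve.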
Continuous bag of words (CBOW) word2vec word embedding work – devangmittal4
The way the continuous bag of words (CBOW) word2vec word embedding works is that it predicts the probability of a word given a context. A context may be a single word or a group of words. For simplicity, I will take a single context word and try to predict a single target word.
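A minimal numpy sketch of the CBOW setup described above (the dimensions, learning rate, and full-softmax training loop are illustrative choices, not word2vec's actual implementation, which uses optimisations such as negative sampling):

```python
import numpy as np

def train_cbow(corpus, dim=8, window=1, epochs=100, lr=0.1, seed=0):
    """Train toy CBOW embeddings: average the context vectors, then predict
    the centre word with a full softmax over the vocabulary."""
    vocab = sorted(set(corpus))
    idx = {w: i for i, w in enumerate(vocab)}
    rng = np.random.default_rng(seed)
    W_in = rng.normal(scale=0.1, size=(len(vocab), dim))   # context embeddings
    W_out = rng.normal(scale=0.1, size=(dim, len(vocab)))  # output weights
    pairs = []  # (context word ids, target word id)
    for t in range(len(corpus)):
        ctx = [idx[corpus[j]]
               for j in range(max(0, t - window), min(len(corpus), t + window + 1))
               if j != t]
        if ctx:
            pairs.append((ctx, idx[corpus[t]]))
    for _ in range(epochs):
        for ctx, target in pairs:
            h = W_in[ctx].mean(axis=0)            # averaged context vector
            scores = h @ W_out
            e = np.exp(scores - scores.max())
            p = e / e.sum()                       # softmax probabilities
            grad = p.copy()
            grad[target] -= 1.0                   # cross-entropy gradient
            dh = W_out @ grad
            W_out -= lr * np.outer(h, grad)
            W_in[ctx] -= lr * dh / len(ctx)
    return vocab, idx, W_in, W_out

def predict(word, idx, W_in, W_out):
    """Probability distribution over the vocabulary given one context word."""
    scores = W_in[idx[word]] @ W_out
    e = np.exp(scores - scores.max())
    return e / e.sum()
```

After training, the rows of `W_in` are the word embeddings; `predict` shows the "probability of a word given a context" framing for the single-context-word case.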
The purpose of this question is to be able to create a word embedding for the given data set.
Data set text:
In linguistics word embeddings were discussed in the research area of distributional semantics. It
aims to quantify and categorize semantic similarities between linguistic items based on their
distributional properties in large samples of language data. The underlying idea that "a word is
characterized by the company it keeps" was popularized by Firth.
The technique of representing words as vectors has roots in the 1960s with the development of
the vector space model for information retrieval. Reducing the number of dimensions using
singular value decomposition then led to the introduction of latent semantic analysis in the late
1980s. In 2000, Bengio et al. provided in a series of papers the "Neural probabilistic language
models" to reduce the high dimensionality of words representations in contexts by "learning a
distributed representation for words". (Bengio et al, 2003). Word embeddings come in two different
styles, one in which words are expressed as vectors of co-occurring words, and another in which
words are expressed as vectors of linguistic contexts in which the words occur; these different
styles are studied in (Lavelli et al, 2004). Roweis and Saul published in Science how to use
"locally linear embedding" (LLE) to discover representations of high dimensional data structures.
The area developed gradually and really took off after 2010, partly because important advances
had been made since then on the quality of vectors and the training speed of the model.
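The co-occurrence-plus-SVD pipeline behind latent semantic analysis, mentioned above, can be sketched in a few lines (the window size and dimensionality are arbitrary toy choices):

```python
import numpy as np

def lsa_embeddings(corpus, window=2, dim=2):
    """Toy latent semantic analysis: build a word-word co-occurrence matrix
    and reduce its dimensionality with a truncated SVD."""
    vocab = sorted(set(corpus))
    idx = {w: i for i, w in enumerate(vocab)}
    C = np.zeros((len(vocab), len(vocab)))
    for t, w in enumerate(corpus):
        for j in range(max(0, t - window), min(len(corpus), t + window + 1)):
            if j != t:
                C[idx[w], idx[corpus[j]]] += 1  # count words seen near w
    U, S, _ = np.linalg.svd(C)
    return vocab, U[:, :dim] * S[:dim]          # low-rank word vectors
```

This is the "words as vectors of co-occurring words" style; the later neural approaches (word2vec and successors) learn comparable low-dimensional vectors directly rather than factorising an explicit count matrix.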
There are many branches and many research groups working on word embeddings. In 2013, a
team at Google led by Tomas Mikolov created word2vec, a word embedding toolkit which can train
vector space models faster than the previous approaches. Most new word embedding techniques
rely on a neural network architecture instead of more traditional n-gram models and unsupervised
learning.
Limitations
One of the main limitations of word embeddings (word vector space models in general) is that
possible meanings of a word are conflated into a single representation (a single vector in the
semantic space). Sense embeddings are a solution to this problem: individual meanings of words
are represented as distinct vectors in the space.
For biological sequences: BioVectors
Word embeddings for n-grams in biological sequences (e.g. DNA, RNA, and Proteins) for
bioinformatics applications have been proposed by Asgari and Mofrad. Named bio-vectors
(BioVec) to refer to biological sequences in general with protein-vectors (ProtVec) for proteins
(amino-acid sequences) and gene-vectors (GeneVec) for gene sequences, this representa.
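The n-gram "biological words" that BioVectors are trained on can be illustrated with a simple splitter; this overlapping-window variant is a simplification of the splitting scheme used in the actual ProtVec work:

```python
def protein_ngrams(seq, n=3):
    """Split an amino-acid sequence into overlapping n-grams; each n-gram is
    then treated as a 'word' by a standard word2vec-style trainer."""
    return [seq[i:i + n] for i in range(len(seq) - n + 1)]
```

The resulting n-gram lists can be fed to any word-embedding trainer in place of natural-language sentences, which is what makes the word2vec machinery applicable to DNA, RNA, and protein sequences.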
Dynamic Construction of Telugu Speech Corpus for Voice Enabled Text Editor – Waqas Tariq
In recent decades speech-interactive systems have gained increasing importance. The performance of an ASR system mainly depends on the availability of a large corpus of speech. The conventional method of building a large-vocabulary speech recognizer for any language uses a top-down approach to speech. This approach requires a large speech corpus with sentence- or phoneme-level transcription of the speech utterances. The transcriptions must also cover different speech orders so that the recognizer can build models for all the sounds present. But for the Telugu language, because of its complex nature, a very large, well-annotated speech database is very difficult to build. It is very difficult, if not impossible, to cover all the words of any Indian language, where each word may have thousands or millions of word forms. A significant part of the grammar that is handled by syntax in English (and other similar languages) is handled within morphology in Telugu. Phrases comprising several words (that is, tokens) in English would be mapped onto a single word in Telugu. The Telugu language is phonetic in nature in addition to being rich in morphology. That is why the speech technology developed for English cannot be applied to the Telugu language. This paper highlights the work carried out in an attempt to build a voice-enabled text editor with automatic term suggestion. The main claim of the paper is the recognition enhancement process we developed for highly inflecting, morphologically rich languages. This method results in increased speech recognition accuracy with a substantial reduction in corpus size. It also adds Telugu words to the database dynamically, resulting in growth of the corpus.
ISO 25964: Thesauri and Interoperability with Other Vocabularies – Marcia Zeng
ISO 25964: Thesauri and Interoperability with Other Vocabularies + IFLA Guidelines for Multilingual Thesauri + A KOS Resource Application Profile. Presented at 2009 NKOS Workshop.
High-quality end-to-end speech translation relies on large-scale speech-to-text training data, which is usually scarce or even unavailable for some low-resource language pairs. To overcome this, we propose a target-side data augmentation method for low-resource speech translation. In particular, we first generate large-scale target-side paraphrases based on a paraphrase generation model that incorporates several statistical machine translation (SMT) features and the commonly used recurrent neural network (RNN). Then, a filtering model combining semantic similarity and word-pair co-occurrence between speech and text is proposed to select the highest-scoring paraphrase pairs from the candidates. Experimental results on English, Arabic, German, Latvian, Estonian, Slovenian, and Swedish paraphrase generation show that the proposed method achieves significant and consistent improvements over several strong baseline models on the PPDB datasets (http://paraphrase.org/). To introduce the paraphrase generation results into low-resource speech translation, we propose two strategies: recombination of audio-text pairs and multi-reference training. Experimental results show that speech translation models trained on new audio-text datasets that combine the paraphrase generation results lead to substantial improvements over the baselines, especially for low-resource languages.
Sneha Rajana - Deep Learning Architectures for Semantic Relation Detection Tasks – MLconf
Deep Learning Architectures for Semantic Relation Detection Tasks
Recognizing and distinguishing specific semantic relations from other types of semantic relations is an essential part of language understanding systems. Identifying expressions with similar and contrasting meanings is valuable for NLP systems that go beyond recognizing semantic relatedness and need to identify specific semantic relations. In this talk, I will first present novel techniques for creating the labelled datasets required for training deep learning models to classify semantic relations between phrases. I will then present various neural network architectures that integrate morphological features into combined path-based and distributional relation detection algorithms, and demonstrate that this model outperforms state-of-the-art models in distinguishing semantic relations and is capable of efficiently handling multi-word expressions.
Although RDF can be considered a cornerstone of the semantic web and knowledge graphs, it has not been embraced by everyday programmers and software architects who want to safely create and access well-structured data. There is a lack of the common tools and methodologies that are available in more conventional settings to improve data quality by defining schemas that can later be validated. Two technologies have recently been proposed for RDF validation: Shape Expressions (ShEx) and the Shapes Constraint Language (SHACL). In the talk, we will briefly introduce both technologies using some examples and compare them. We will also present some challenges and applications related to RDF data shapes.
Talk given at: KTH Royal Institute of Technology, School of Industrial Engineering and Management, Mechatronics Division, 7th February, 2020
Although RDF is a cornerstone of the semantic web and knowledge graphs, it has not been embraced by everyday programmers and software architects who need to safely create and access well-structured data. There is a lack of the common tools and methodologies that are available in more conventional settings to improve data quality by defining schemas that can later be validated. Two technologies have recently been proposed for RDF validation: Shape Expressions (ShEx) and the Shapes Constraint Language (SHACL). In the talk, we will review the history and motivation of both technologies. We will also enumerate some challenges and future work with regard to RDF validation.
How to publish data: open and linked data
Talk given at the Open Data and Transparency Conference, Oviedo City Council
11 September 2017
GraphRAG is All You Need? LLM & Knowledge Graph – Guy Korland
Guy Korland, CEO and Co-founder of FalkorDB, will review two articles on the integration of language models with knowledge graphs.
1. Unifying Large Language Models and Knowledge Graphs: A Roadmap.
https://arxiv.org/abs/2306.08302
2. Microsoft Research's GraphRAG paper and a review paper on various uses of knowledge graphs:
https://www.microsoft.com/en-us/research/blog/graphrag-unlocking-llm-discovery-on-narrative-private-data/
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024 – Tobias Schneck
As AI technology pushes into IT, I was wondering, as an "infrastructure container Kubernetes guy", how does this fancy AI technology get managed from an infrastructure operations point of view? Is it possible to apply our lovely cloud-native principles as well? What benefits could both technologies bring to each other?
Let me take these questions and guide you on a short journey through existing deployment models and use cases for AI software. Using practical examples, we discuss what cloud/on-premise strategy we may need to apply it to our own infrastructure and get it to work from an enterprise perspective. I want to give an overview of infrastructure requirements and technologies, and what could benefit or limit your AI use cases in an enterprise environment. An interactive demo will give you some insights into the approaches I have already got working in practice.
Connector Corner: Automate dynamic content and events by pushing a button – DianaGray10
Here is something new! In our next Connector Corner webinar, we will demonstrate how you can use a single workflow to:
Create a campaign using Mailchimp with merge tags/fields
Send an interactive Slack channel message (using buttons)
Have the message received by managers and peers along with a test email for review
But there’s more:
In a second workflow supporting the same use case, you’ll see:
Your campaign sent to target colleagues for approval
If the “Approve” button is clicked, a Jira/Zendesk ticket is created for the marketing design team
But—if the “Reject” button is pushed, colleagues will be alerted via Slack message
Join us to learn more about this new, human-in-the-loop capability, brought to you by Integration Service connectors.
And...
Speakers:
Akshay Agnihotri, Product Manager
Charlie Greenberg, Host
Accelerate your Kubernetes clusters with Varnish Caching – Thijs Feryn
A presentation about the usage and availability of Varnish on Kubernetes. This talk explores the capabilities of Varnish caching and shows how to use the Varnish Helm chart to deploy it to Kubernetes.
This presentation was delivered at K8SUG Singapore. See https://feryn.eu/presentations/accelerate-your-kubernetes-clusters-with-varnish-caching-k8sug-singapore-28-2024 for more details.
Key Trends Shaping the Future of Infrastructure.pdfCheryl Hung
Keynote at DIGIT West Expo, Glasgow on 29 May 2024.
Cheryl Hung, ochery.com
Sr Director, Infrastructure Ecosystem, Arm.
The key trends across hardware, cloud and open-source; exploring how these areas are likely to mature and develop over the short and long-term, and then considering how organisations can position themselves to adapt and thrive.
DevOps and Testing slides at DASA ConnectKari Kakkonen
My and Rik Marselis slides at 30.5.2024 DASA Connect conference. We discuss about what is testing, then what is agile testing and finally what is Testing in DevOps. Finally we had lovely workshop with the participants trying to find out different ways to think about quality and testing in different parts of the DevOps infinity loop.
State of ICS and IoT Cyber Threat Landscape Report 2024 previewPrayukth K V
The IoT and OT threat landscape report has been prepared by the Threat Research Team at Sectrio using data from Sectrio, cyber threat intelligence farming facilities spread across over 85 cities around the world. In addition, Sectrio also runs AI-based advanced threat and payload engagement facilities that serve as sinks to attract and engage sophisticated threat actors, and newer malware including new variants and latent threats that are at an earlier stage of development.
The latest edition of the OT/ICS and IoT security Threat Landscape Report 2024 also covers:
State of global ICS asset and network exposure
Sectoral targets and attacks as well as the cost of ransom
Global APT activity, AI usage, actor and tactic profiles, and implications
Rise in volumes of AI-powered cyberattacks
Major cyber events in 2024
Malware and malicious payload trends
Cyberattack types and targets
Vulnerability exploit attempts on CVEs
Attacks on counties – USA
Expansion of bot farms – how, where, and why
In-depth analysis of the cyber threat landscape across North America, South America, Europe, APAC, and the Middle East
Why are attacks on smart factories rising?
Cyber risk predictions
Axis of attacks – Europe
Systemic attacks in the Middle East
Download the full report from here:
https://sectrio.com/resources/ot-threat-landscape-reports/sectrio-releases-ot-ics-and-iot-security-threat-landscape-report-2024/
UiPath Test Automation using UiPath Test Suite series, part 3DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 3. In this session, we will cover desktop automation along with UI automation.
Topics covered:
UI automation Introduction,
UI automation Sample
Desktop automation flow
Pradeep Chinnala, Senior Consultant Automation Developer @WonderBotz and UiPath MVP
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
The Art of the Pitch: WordPress Relationships and SalesLaura Byrne
Clients don’t know what they don’t know. What web solutions are right for them? How does WordPress come into the picture? How do you make sure you understand scope and timeline? What do you do if sometime changes?
All these questions and more will be explored as we talk about matching clients’ needs with what your agency offers without pulling teeth or pulling your hair out. Practical tips, and strategies for successful relationship building that leads to closing the deal.
The Art of the Pitch: WordPress Relationships and Sales
Multilingual Linked Data Patterns
1. Multilingual Linked Open Data
Patterns
Jose Emilio Labra Gayo
University of Oviedo, Spain
Joint work with:
Dimitris Kontokostas Sören Auer
Universität Leipzig, Germany
More info: http://www.weso.es/MLODPatterns
2. MLOD Patterns
From best practices (Dublin) to patterns (Rome)
We propose a catalog of 20 MLOD patterns
Pattern = generic solution to a problem in a context
Common vocabulary
Patterns can be related to each other
Some patterns can contradict each other
There are already Linked Data patterns; we focus on multilingual linked data patterns
Each pattern contains:
Name
Description
Context
Example
Discussion
See also
Based on DBpedia I18n experience
3. MLOD Patterns
Patterns are classified by activity:
Naming:
URI design: URIs, IRIs, etc.
Dereference:
How is the returned content affected by multilingualism?
Labeling:
Handling multilingual labels
Longer descriptions:
Longer textual descriptions
Linking:
Links between concepts in different languages
Reuse:
Vocabularies and multilingualism
4. General overview
Naming patterns:
Descriptive URIs: use descriptive URIs with ASCII characters, %-encoding extended characters
Opaque URIs: use non-human-readable URIs
Full IRIs: use IRIs with Unicode characters
Internationalized local names: use Unicode characters only for local names
Language in URIs: include language information in the URI

Dereference patterns:
Language-independent data: return the same triples independently of the language
Language content negotiation: return different triples depending on user-agent preferences

Labeling patterns:
Label everything: define labels for all the resources
Multilingual labels: add language tags to labels
Labels without language tag: add labels without language tags in a default language

Longer-description patterns:
Divide longer descriptions: replace long descriptions by more resources with labels
Add lexical information: add lexical information to long descriptions
Structured literals: use HTML/XML literals for longer descriptions

Linking patterns:
Identity links: use owl:sameAs and similar predicates
Soft links: use predicates with soft semantics
Linguistic metadata: add linguistic metadata about the dataset terms

Reuse patterns:
Monolingual vocabularies: attach labels to vocabularies in a single language
Multilingual vocabularies: prefer multilingual vocabularies
Localize existing vocabularies: translate labels of existing vocabularies
Create new localized vocabularies: create custom vocabularies and link to existing ones
5. Motivating example
Juan is an Armenian professor at the University of León.
(Graph: Juan birthPlace Armenia; Juan position Professor; Juan worksAt University of León)
6. Naming
Selecting a URI scheme for Armenia

Descriptive URIs: http://example.org/Armenia
Pros: human-readable
Cons: only ASCII; non-ASCII characters must be %-encoded, e.g. http://example.org/Universidad%20de%20Le%C3%B3n

Opaque URIs: http://example.org/I23AX45
Pros: good tool support; independence between concept and natural-language representation
Cons: not human-readable

Full IRIs: http://օրինակ.օրգ#Հայաստան
Pros: more natural for non-Latin-based languages
Cons: difficult for developers to handle; subject to visual spoofing attacks; weaker tool support

Internationalized local names: http://example.org#Հայաստան
Pros: avoids domain name spoofing
Cons: visual spoofing attacks are still possible

Language in URIs: http://hy.example.org#Հայաստան / http://en.example.org#Armenia
Pros: more human-friendly identifiers; independent development of datasets by language
Cons: adding language information to the URI can become unwieldy (e.g. languages & sublanguages such as hy-Latin-IT-arevela); where should we put the language tag?
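The %-encoding behind the Descriptive URIs option can be reproduced with Python's standard library; a minimal sketch, using the slide's illustrative URI:

```python
from urllib.parse import quote, unquote

# Percent-encode a non-ASCII local name: UTF-8 bytes become %XX escapes,
# so "ó" (0xC3 0xB3 in UTF-8) turns into %C3%B3 and each space into %20.
local_name = "Universidad de León"
encoded = quote(local_name, safe="")
uri = "http://example.org/" + encoded
print(uri)               # http://example.org/Universidad%20de%20Le%C3%B3n
print(unquote(encoded))  # round-trips back to "Universidad de León"
```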
7. Dereference
Language-based content negotiation?

No language content negotiation: http://example.org/Armenia always returns the same data
Pros: easy to develop; consistency of data
Cons: clients have to filter out triples in other languages; computation & network overhead

Language content negotiation: http://example.org/Armenia returns different data depending on Accept-Language
Accept-Language: en -> :Armenia rdfs:label "Armenia"@en .
Accept-Language: hy -> :Armenia rdfs:label "Հայաստան"@hy .
Pros: improves client performance; less network overhead
Cons: difficult to implement; semantic equivalence between per-language responses must be maintained
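As a rough sketch of how a server might implement the Language content negotiation pattern: the Python function below (names are illustrative, and the Accept-Language parsing is deliberately simplified; it ignores wildcards and whitespace inside parameters) picks the best-supported language from the header:

```python
def pick_language(accept_language, available, default="en"):
    """Pick the best-supported language from an Accept-Language header.

    Simplified parser: handles entries like "hy;q=0.9, en;q=0.8" but
    does not cover wildcards or whitespace around the q parameter.
    """
    weighted = []
    for part in accept_language.split(","):
        part = part.strip()
        if not part:
            continue
        lang, _, q = part.partition(";q=")
        try:
            weight = float(q) if q else 1.0  # no q parameter means q=1
        except ValueError:
            weight = 0.0
        weighted.append((weight, lang.strip().lower()))
    # Try languages in descending order of preference
    for _, lang in sorted(weighted, reverse=True):
        if lang in available:
            return lang
    return default

# Hypothetical per-language triples for the slide's example resource
labels = {
    "en": ':Armenia rdfs:label "Armenia"@en .',
    "hy": ':Armenia rdfs:label "Հայաստան"@hy .',
}
lang = pick_language("hy;q=1.0, en;q=0.5", labels)
print(labels[lang])  # :Armenia rdfs:label "Հայաստան"@hy .
```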
8. Labeling

Label everything
:Armenia rdfs:label "Armenia" .
:juan rdfs:label "Juan López" .
:position rdfs:label "Job position" .
Pros: user agents can show labels instead of URIs; labels can be used for searching
Cons: not always feasible (which labels?); labels are for humans, so avoid machine-oriented notations

Multilingual labels
:juan :position "Professor"@en .
:juan :position "Catedrático"@es .
Pros: language tags are part of the RDF model
Cons: SPARQL can be more difficult, e.g.
SELECT * WHERE { ?x :position "Professor" . }
does not match "Professor"@en

Labels without language tag
:juan :position "Professor" .
Pros: SPARQLing is easier
Cons: choosing a default language is controversial
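The SPARQL pitfall above comes from plain and language-tagged literals being distinct RDF terms. A small Python sketch (triples modeled as tuples, purely illustrative) mimics that exact-match behavior:

```python
# Literals modeled as (lexical form, language tag or None). A plain
# literal and a tagged one are different RDF terms, which is why a
# SPARQL query for "Professor" does not match "Professor"@en.
triples = [
    ("juan", "position", ("Professor", "en")),
    ("juan", "position", ("Catedrático", "es")),
    ("ann", "position", ("Professor", None)),
]

def match(triples, value, tag=None):
    """Return subjects whose object is exactly the given literal."""
    return [s for s, p, o in triples if o == (value, tag)]

print(match(triples, "Professor"))        # ['ann']  (plain literal only)
print(match(triples, "Professor", "en"))  # ['juan']
```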
9. Longer descriptions
Divide long descriptions
Instead of:
:juan :jobtitle "Professor at the University of León"@en .
use:
:juan :position :professor .
:juan :workPlace :uniLeón .
:professor rdfs:label "Professor"@en .
:uniLeón rdfs:label "University of León"@en .
Pros: fine-grained data is more amenable to semantic web apps; apps can generate more readable information
Cons: more model complexity; not always possible
10. Longer descriptions

Provide lexical information
:uniLeón a lemon:LexicalEntry ;
  lemon:decomposition (
    [ lemon:element :University ]
    [ lemon:element :Of ]
    [ lemon:element :León ]
  ) ;
  rdfs:label "University of León"@en .
:University a lemon:LexicalEntry ;
  lexinfo:partOfSpeech lexinfo:commonNoun ;
  rdfs:label "University"@en ;
  rdfs:label "Universidad"@es .
:Of a lemon:LexicalEntry ;
  lexinfo:partOfSpeech lexinfo:preposition ;
  rdfs:label "of"@en ;
  rdfs:label "de"@es .
:León a lemon:LexicalEntry ;
  lexinfo:partOfSpeech lexinfo:properNoun ;
  rdfs:label "León" .
Pros: more value to the dataset; automatic manipulation; can improve message generation
Cons: complexity overhead; feasibility
Example provided by J. McCrae
11. Longer descriptions
Structured literals
:uniLeón :desc
  "<p>University of
  <span translate="no">León</span>,
  Spain.
  </p>"^^rdf:XMLLiteral .
Pros: leverage existing I18N techniques (Bidi, Ruby, localization notes, ...)
Cons: interaction between two abstraction models (RDF vs XML/HTML); large portions of structured literals can hinder Linked Data
12. Linking

Identity inter-language links
<http://hy.example.org#Հայաստան>
  owl:sameAs
  <http://en.example.org#Armenia> .
Pros: owl:sameAs is a well-known property; already supported by linked data applications
Cons: semantics of owl:sameAs is too strong; concepts may not be the same; contradictions

Soft inter-language links
<http://hy.example.org#Հայաստան>
  rdfs:seeAlso
  <http://en.example.org#Armenia> .
Pros: several predicates available (rdfs:seeAlso, skos:related, dbo:wikiPageLanguageLink)
Cons: no standard property; no support for inference

Link linguistic metadata
:Catedrático
  lexvo:means wordnet:Professor ;
  lexvo:language <http://lexvo.org/id/iso639-3/spa> .
Pros: links between multilingual labels; can declare the language of a dataset
Cons: no standard practice; semantic equivalence between concepts is unclear
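To illustrate what identity links buy a client, here is a hedged Python sketch that follows sameAs links (stored one-directionally for brevity; real owl:sameAs is symmetric and transitive) to gather labels across language-specific datasets. URIs and labels are the slide's examples:

```python
# One-directional sameAs map and per-resource (label, tag) pairs
same_as = {"http://hy.example.org#Հայաստան": "http://en.example.org#Armenia"}
labels = {
    "http://hy.example.org#Հայաստան": [("Հայաստան", "hy")],
    "http://en.example.org#Armenia": [("Armenia", "en")],
}

def all_labels(uri):
    """Collect labels for a resource and everything it is sameAs-linked to."""
    seen, todo, out = set(), [uri], []
    while todo:
        u = todo.pop()
        if u in seen:
            continue
        seen.add(u)
        out.extend(labels.get(u, []))
        if u in same_as:
            todo.append(same_as[u])
    return out

print(all_labels("http://hy.example.org#Հայաստան"))
# [('Հայաստան', 'hy'), ('Armenia', 'en')]
```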
13. Reuse

Monolingual vocabularies
FOAF, Dublin Core, OWL, RDF Schema, ... are only in English
Pros: easier to control vocabulary evolution; avoids bad translations and ambiguities
Cons: monolingual vocabularies in multilingual applications require a translation layer

Multilingual vocabularies
:position a owl:DatatypeProperty ;
  rdfs:domain :UniversityStaff ;
  rdfs:label "Position"@en ;
  rdfs:label "Puesto"@es .
:UniversityStaff a owl:Class ;
  rdfs:label "University staff"@en ;
  rdfs:label "Trabajador universitario"@es .
Pros: elegant solution in multilingual contexts; more control over translations
Cons: some common vocabularies are monolingual; maintenance is more difficult
14. Reuse

Localize existing vocabularies
dc:contributor rdfs:label "Colaborador"@es .
Pros: transparently select the label in the preferred language; follows the AAA principle (Anyone can say Anything about Any topic)
Cons: polluting well-known vocabularies is controversial

Create new localized vocabularies
dc:contributor owl:equivalentProperty :colaborador .
:colaborador rdfs:label "Colaborador"@es .
Pros: freedom to tailor the vocabulary to specific needs
Cons: more difficult for humans/agents to recognize new properties/classes
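A client consuming a localized vocabulary typically looks up the label in the user's preferred language and falls back to the original one; a minimal sketch, using the slide's dc:contributor example plus an invented dc:creator entry to show the fallback:

```python
# Labels per vocabulary term, keyed by language tag (illustrative data;
# the dc:creator entry is hypothetical and only English here).
labels = {
    "dc:contributor": {"en": "Contributor", "es": "Colaborador"},
    "dc:creator": {"en": "Creator"},
}

def label_for(term, preferred, fallback="en"):
    """Return the label in the preferred language, else the fallback."""
    langs = labels.get(term, {})
    return langs.get(preferred) or langs.get(fallback)

print(label_for("dc:contributor", "es"))  # Colaborador
print(label_for("dc:creator", "es"))      # Creator (fallback to English)
```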
15. Future work
The catalog is not closed
Other issues & patterns
Microdata & RDFa
Other I18n topics
Handle big datasets
Localization workflows
Feedback from the community
Best practices, Patterns, Anti-patterns?
16. End of presentation
17. DBpedia I18n
From Descriptive URIs to Internationalized local names
Soft & Identity inter-language links
Label everything