folksonomy, social tagging, tag clouds, automatic folksonomy construction, word clouds, wordle, context-preserving word cloud visualisation, CPWCV, seam carving, inflate and push, star forest, cycle cover, quantitative metrics, realized adjacencies, distortion, area utilization, compactness, aspect ratio, running time, semantics in language technology
I will try to explain what QA is, how we can get answers to questions posed in natural language, and how successful we have been in that domain.
I have gained all of my knowledge from the three proposed papers and from what I read around them.
Open Domain Question Answering System - Research Project in NLP (GVS Chaitanya)
Using a computer to answer questions has been a human dream since the beginning of the digital era. A first step towards achieving such an ambitious goal is to deal with natural language, enabling the computer to understand what its user asks. The discipline that studies the connection between natural language and the representation of its meaning via computational models is computational linguistics. According to this discipline, Question Answering can be defined as the task that, given a question formulated in natural language, aims at finding one or more concise answers. Improvements in technology and the explosive demand for better information access have reignited interest in Q&A systems. The wealth of information on the web makes it an attractive resource for seeking quick answers to factual questions such as "Who is the first American to land in space?" or "What is the second tallest mountain in the world?", yet today's most advanced web search systems (Bing, Google, Yahoo) make it surprisingly tedious to locate the answers. Q&A systems aim to develop techniques that go beyond the retrieval of relevant documents in order to return exact answers to natural language factoid questions.
What one needs to know to work in the Natural Language Processing field, and the aspects of developing an NLP project, using the example of a system that identifies the language of a text.
Best Practices for Large Scale Text Mining Processing (Ontotext)
Q&A:
NOW facilitates semantic search by having annotations attached to search strings. How complex does that get, e.g. with wildcards between annotated strings?
NOW’s searchbox is quite basic at the moment, but still supports a few scenarios.
1. Pure concept/faceted search - search for all documents containing a concept or where a set of concepts co-occur. Ranking is based on frequency of occurrence.
2. Concept/faceted + full-text search - search for both concepts and a particular textual term or phrase.
3. Full text search
With search, pretty much anything can be done to customise it. For the NOW showcase we’ve kept it fairly simple, as usually every client has a slightly different case and wants to tune search in a slightly different direction.
The search in NOW is faceted which means that you search with concepts (facets) and you retrieve all documents which contain mentions of the searched concept. If you search by more than one facet the engine retrieves documents which contain mentions of both concepts but there is no restriction that they occur next to each other.
Is the tagging service expandable (say, with custom ontologies)? Also, is it something you offer as a service? It is unclear to me from the website.
The TAG service is used for demonstration purposes only. The models behind it are trained for annotating news articles. The pipeline is customizable for every concrete scenario, for different domains and entities of interest. You can access several of our pipelines as a service through the S4 platform, or you can have them hosted as an on-premise solution. In some cases our clients want domain adaptation, improvements in a particular area, or tagging with their internal dataset - in this case we again offer an on-premise deployment and also a managed service hosted on our hardware.
Does your system accommodate cluster analysis using unsupervised keyword/phrase annotation for knowledge discovery?
Insofar as patterns of user behaviour are also considered knowledge discovery, we employ these for suggesting related reads. Apart from that, we have experience tailoring custom clustering pipelines which also rely on features such as keywords and named entities.
For topic extraction, how many topics can we extract? From a Twitter corpus, what can we infer?
For topic extraction we have determined that we obtain the best results when suggesting 3 categories. These are taken from IPTC, but only from the uppermost levels, which number fewer than 20.
The Twitter corpus example is from a project Ontotext participates in called Pheme. The goal of the project is to detect rumours and to check their veracity, thus helping journalists in their hunt for attractive news.
Do you provide Processing Resources and JAPE rules for the GATE framework that can be used with GATE Embedded?
We are contributing to the GATE framework, and everything which has been wrapped up as PRs has been included in the corresponding GATE distributions.
Nonparametric Bayesian Word Discovery for Symbol Emergence in Robotics (Tadahiro Taniguchi)
This is the material for an invited talk at the Workshop on Machine Learning Methods for High-Level Cognitive Capabilities in Robotics 2016 (ML-HLCR2016), held at IROS 2016 in Korea.
Detecting and Describing Historical Periods in a Large Corpora (Traian Rebedea)
Many historic periods (or events) are remembered by slogans, expressions or words that are strongly linked to them. Educated people are also able to determine whether a particular word or expression is related to a specific period in human history. The present paper aims to establish correlations between significant historic periods (or events) and the texts written in those periods. In order to achieve this, we have developed a system that automatically links words (and topics discovered using Latent Dirichlet Allocation) to periods of time in recent history. For this analysis to be relevant and conclusive, it must be undertaken on a representative set of texts written throughout history. To this end, instead of relying on manually selected texts, the Google Books Ngram corpus has been chosen as the basis for the analysis. Although it provides only word n-gram statistics for the texts written in a given year, the resulting time series can be used to provide insights about the most important periods and events in recent history, by automatically linking them with specific keywords or even LDA topics.
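As a rough illustration of this kind of linking (my own toy sketch in Python, not the authors' code), one simple scoring scheme is to correlate a word's yearly frequency series with a 0/1 indicator for a candidate period:

    # Toy illustration: how strongly is a word's yearly frequency series
    # associated with a candidate historical period? (Not the authors' method.)
    import numpy as np

    years = np.arange(1900, 2000)
    freq = np.where((years >= 1914) & (years <= 1918), 5.0, 1.0)   # invented series for one word
    freq = freq + np.random.default_rng(0).normal(0, 0.2, len(years))

    period = ((years >= 1914) & (years <= 1918)).astype(float)     # candidate period: 1914-1918
    score = np.corrcoef(freq, period)[0, 1]                        # Pearson correlation
    print(round(score, 2))   # close to 1.0: the word is strongly linked to that period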
Lecture 4 Decision Trees (2): Entropy, Information Gain, Gain Ratio (Marina Santini)
attribute selection, constructing decision trees, decision trees, divide and conquer, entropy, gain ratio, information gain, machine learning, pruning, rules, surprisal
Information Extraction, Named Entity Recognition, NER, text analytics, text mining, e-discovery, unstructured data, structured data, calendaring, standard evaluation per entity, standard evaluation per token, sequence classifier, sequence labeling, word shapes, semantic analysis in language technology
How Emotional Are Users' Needs? Emotion in Query Logs (Marina Santini)
Emotional behaviour seems to be ubiquitous on the web. Predictably, social media web genres such as tweets, blog posts and blog comments show high emotional involvement. What about other genres on the web? In this talk, the focus is on the search query log genre. According to recent IR research, searchers’ behaviour is not only limited to traditional informational, navigational and transactional needs. A novel hypothesis is that the seeking behaviour is driven by emotion. But can emotion be detected by analysing the queries typed by users in a search box? In this talk, I will present the results of some experiments carried out to investigate whether it is possible to identify emotion in the query log genre, and discuss how emotion could be utilized to improve the relevance of retrieved documents in searches. These experiments are part of SearchInFocus, a study centred on search.
Towards Contextualized Information: How Automatic Genre Identification Can Help (Marina Santini)
Genre is one of the textual dimensions that can be used to reconstruct the communicative context needed to assess the value of information with respect to a purpose (business, learning, finding, monitoring, predicting, etc.). When we know the genre of a text, we can surmise the CONTEXT where a text has been created and for which purpose. Therefore we can more confidently decide whether a text contains the information we are looking for. For example, factual texts might have more credibility than opinionated texts. In this respect, genres such as press conferences, declarations or announcements by a White House spokesman might be more reliable than subjective genres, e.g. newspapers’ editorials or op-ed articles. On the other hand, if we want to test the pulse and explore the feelings about a product or a politician, we might give more weight to more emotional genres like blogs, forums or social networks’ microposts.
In recent years, important steps forward have been taken in Automatic Genre Identification (AGI). AGI can be defined as a meta-discipline that leverages and spans Computational Linguistics, NLP, Corpus Linguistics, Information Retrieval, Information Extraction, Text Mining, Text Analytics, Sentiment Analysis and LIS, among others. Promising computational models have been proposed to automatically identify the genre(s) of a text, although no agreement has been reached on the definition of the concept of genre itself. AGI research has shown that genre classes such as blogs, online newspaper front pages, FAQs and DIYs can be automatically identified using a wide range of genre-revealing features -- from linguistic cues to character n-grams -- with a variety of classification algorithms.
In a world where information overload is still pervasive and where technology encourages massive text production through emailing, blogging, tweeting and social network communication, it is likely that the concept of genre and AGI are useful to convert unclassified and unstructured textual data to more structured and contextualized information.
This talk presents a summary of the state-of-the-art in AGI and discusses how genre-aware applications could help extract actionable information from raw textual data.
Slides for my tutorial at the ESWC Summer School 2015, giving an introduction to information extraction with Linked Data and an introduction to one of the applications of information extraction, opinion mining.
Ontology: the study of, or concern with, what kinds of things exist, i.e. what entities there are in the universe. The word derives from the Greek onto (being) and logia (written or spoken). It is a branch of metaphysics, the study of first principles or the root of things.
Keystone Summer School 2015: Ontologies For Information Retrieval (Mauro Dragoni)
The presentation provides an overview of what an ontology is and how it can be used for representing information and for retrieving data, with a particular focus on the linguistic resources available for supporting this kind of task. It includes an overview of semantic-based retrieval approaches, highlighting the pros and cons of using semantic approaches with respect to classic ones. Use cases are presented and discussed.
Research Inventy: International Journal of Engineering and Science (researchinventy)
Research Inventy: International Journal of Engineering and Science is published by a group of young academic and industrial researchers, with 12 issues per year. It is an open access journal, available both online and in print, that provides rapid (monthly) publication of articles in all areas of the subject, such as civil, mechanical, chemical, electronic and computer engineering, as well as production and information technology. The journal welcomes the submission of manuscripts that meet the general criteria of significance and scientific excellence. Papers are published through a rapid process within 20 days of acceptance, and the peer-review process takes only 7 days. All articles published in Research Inventy will be peer-reviewed.
Swoogle: Showcasing the Significance of Semantic Search (IDES Editor)
The World Wide Web hosts vast repositories of information. The retrieval of required information from the Internet is a great challenge since computer applications understand only the structure and layout of web pages and do not have access to their intended meaning. The Semantic Web is an effort to enhance the Internet so that computers can process the information presented on the WWW, interpret and communicate with it, to help humans find required essential knowledge. Application of ontology is the predominant approach helping the evolution of the Semantic Web. The aim of our work is to illustrate how Swoogle, a semantic search engine, helps make computers and the WWW interoperable and more intelligent. In this paper, we discuss issues related to traditional and semantic web searching. We outline how an understanding of the semantics of the search terms can be used to provide better results. The experimental results establish that semantic search provides more focused results than traditional search.
Keynote presentation for the International Semantic Web Conference in Athens, Greece, on November 9, 2023. The talk addresses the generative AI explosion, its potential impacts on the Semantic Web and Knowledge Graph communities, and how it may, in fact, spark a research renaissance.
Abstract:
We are living in an age of rapidly advancing technology. History may view this period as one in which generative artificial intelligence reshaped the landscape and narrative of many technology-based fields of research and application. Times of disruption often present both opportunities and challenges. We will discuss some areas that may be ripe for consideration in the field of Semantic Web research and semantically-enabled applications. Semantic Web research has historically focused on representation and reasoning and on enabling interoperability of data and vocabularies. At the core are ontologies, along with ontology-enabled (or ontology-compatible) knowledge stores such as knowledge graphs. Ontologies are often manually constructed using a process that (1) identifies existing best-practice ontologies (and vocabularies) and (2) generates a plan for how to leverage these ontologies by aligning and augmenting them as needed to address requirements. While semi-automated techniques may help, there is typically a significant portion of the work that is best done by humans with domain and ontology expertise. This is an opportune time to rethink how the field generates, evolves, maintains, and evaluates ontologies. We consider how hybrid approaches, i.e., those that leverage generative AI components along with more traditional knowledge representation and reasoning approaches, can create improved processes. The effort to build a robust ontology that meets a use case can be large. Ontologies are not static, however; they need to evolve along with knowledge evolution and expanded usage. There is potential for hybrid approaches to help identify gaps in ontologies and/or refine content. Further, ontologies need to be documented with term definitions and their provenance. Opportunities exist to consider semi-automated techniques for some types of documentation, provenance, and decision-rationale capture for annotating ontologies. The area of human-AI collaboration for population and verification presents a wide range of areas for research collaboration and impact. Ontologies need to be populated with class and relationship content. Knowledge graphs and other knowledge stores need to be populated with instance data in order to be used for question answering and reasoning. Population of large knowledge graphs can be time consuming. Generative AI holds the promise of creating candidate knowledge graphs that are compatible with the ontology schema. The knowledge graph should contain provenance information identifying how the content was populated and its source, and its correctness and currency should be checked. A human-AI assistant approach is presented.
An Ontology Model for Knowledge Representation over User Profiles (IJMER)
International Journal of Modern Engineering Research (IJMER) is a peer-reviewed, online journal. It serves as an international archival forum of scholarly research related to engineering and science education.
International Journal of Modern Engineering Research (IJMER) covers all the fields of engineering and science: Electrical Engineering, Mechanical Engineering, Civil Engineering, Chemical Engineering, Computer Engineering, Agricultural Engineering, Aerospace Engineering, Thermodynamics, Structural Engineering, Control Engineering, Robotics, Mechatronics, Fluid Mechanics, Nanotechnology, Simulators, Web-based Learning, Remote Laboratories, Engineering Design Methods, Education Research, Students' Satisfaction and Motivation, Global Projects, and Assessment…. And many more.
Language as social sensor - Marko Grobelnik - Dubrovnik - HrTAL2016 - 30 Sep ...
At the HrTAL2016 conference I presented a talk on "Language as a Social Sensor to Operate with Knowledge". The talk included a section on language as an interface between physical nature and the world of the human mind and human society. The role of language as a 'sensor' has several consequences for the uncertainties and inexactness of language evolution as we know it. The talk was accompanied by several live demonstrations of systems for semantic annotation (wikifier.org) and media monitoring (eventregistry.org).
A Survey of Ontology-based Information Extraction for Social Media Content An... (ijcnes)
The amount of information generated on the Web has grown enormously over the years. This information is significant to individuals, businesses and organizations. If analyzed, understood and utilized, it will provide valuable insight to its stakeholders. However, much of this information is semi-structured or unstructured, which makes it difficult to draw an in-depth understanding of the implications behind it. This is where Ontology-based Information Extraction (OBIE) and social media content analysis come into play. OBIE has become a popular way to extract information from machine-readable sources. This paper presents a survey of OBIE, ontology languages and tools, and the process of building an ontology model and framework. The author compares two ontology-building frameworks and identifies which framework is the more complete one.
Can We Quantify Domainhood? Exploring Measures to Assess Domain-Specificity i... (Marina Santini)
Web corpora are a cornerstone of modern Language Technology. Corpora built from the web are convenient because their creation is fast and inexpensive. Several studies have been carried out to assess the representativeness of general-purpose web corpora by comparing them to traditional corpora. Less attention has been paid to assessing the representativeness of specialized or domain-specific web corpora. In this paper, we focus on the assessment of the domain representativeness of web corpora and we claim that it is possible to assess the degree of domain-specificity, or domainhood, of web corpora. We present a case study where we explore the effectiveness of different measures - namely the Mann-Whitney-Wilcoxon test, Kendall correlation coefficient, Kullback-Leibler divergence, log-likelihood and burstiness - to gauge domainhood. Our findings indicate that burstiness is the most suitable measure for singling out domain-specific words from a specialized corpus and for allowing the quantification of domainhood.
Towards a Quality Assessment of Web Corpora for Language Technology Applications (Marina Santini)
In this study, we focus on the creation and evaluation of domain-specific web corpora. To this purpose, we propose a two-step approach, namely: (1) the automatic extraction and evaluation of term seeds from personas and use cases/scenarios; (2) the creation and evaluation of domain-specific web corpora bootstrapped with the term seeds automatically extracted in step 1. Results are encouraging and show that: (1) it is possible to create a fairly accurate term extractor for relatively short narratives; (2) it is straightforward to evaluate a quality such as the domain-specificity of web corpora using well-established metrics.
A Web Corpus for eCare: Collection, Lay Annotation and Learning - First Results - (Marina Santini)
In this study, we put forward two claims: 1) it is possible to design a dynamic and extensible corpus without running the risk of getting into scalability problems; 2) it is possible to devise noise-resistant Language Technology applications without affecting performance. To support our claims, we describe the design, construction and limitations of a very specialized medical web corpus, called eCare_Sv_01, and we present two experiments on lay-specialized text classification. eCare_Sv_01 is a small corpus of web documents written in Swedish. The corpus contains documents about chronic diseases. The sublanguage used in each document has been labelled as "lay" or "specialized" by a lay annotator. The corpus is designed as a flexible text resource, where additional medical documents will be appended over time. Experiments show that the lay-specialized labels assigned by the lay annotator are reliably learned by standard classifiers. More specifically, Experiment 1 shows that scalability is not an issue when increasing the size of the datasets to be learned from 156 up to 801 documents. Experiment 2 shows that lay-specialized labels can be learned regardless of the large number of disturbing factors, such as machine-translated documents or low-quality texts, which are numerous in the corpus.
An Exploratory Study on Genre Classification using Readability Features (Marina Santini)
We present a preliminary study that explores whether text features used for readability assessment are reliable genre-revealing features. We empirically explore the difference between genre and domain. We carry out two sets of experiments with both supervised and unsupervised methods. Findings on the Swedish national corpus (the SUC) show that readability cues are good indicators of genre variation.
word sense disambiguation, wsd, thesaurus-based methods, dictionary-based methods, supervised methods, lesk algorithm, michael lesk, simplified lesk, corpus lesk, graph-based methods, word similarity, word relatedness, path-based similarity, information content, surprisal, resnik method, lin method, elesk, extended lesk, semcor, collocational features, bag-of-words features, the window, lexical semantics, computational semantics, semantic analysis in language technology.
inferential statistics, statistical inference, language technology, interval estimation, confidence interval, standard error, confidence level, z critical value, confidence interval for proportion, confidence interval for the mean, multiplier,
1. Semantic Analysis in Language Technology
http://stp.lingfil.uu.se/~santinim/sais/2016/sais_2016.htm
Semantic Word Clouds
Marina Santini
santinim@stp.lingfil.uu.se
Department of Linguistics and Philology
Uppsala University, Uppsala, Sweden
Spring 2016
3. Semantic Web & Ontologies
• The goal of the Semantic Web is to allow web information and services to be more effectively exploited by humans and automated tools.
• Essentially, the focus of the Semantic Web is to share data instead of documents.
• This data must be "meaningful" both for humans and for machines (i.e. automated tools and web applications).
• Q: How are we going to represent meaning and knowledge on the web?
• A: … via annotation.
• Knowledge is represented in the form of rich conceptual schemas/formalisms called ontologies.
• Therefore, ontologies are the backbone of the Semantic Web.
• Ontologies give formally defined meanings to the terms used in annotations, transforming them into semantic annotations.
4. Ontologies are…
• … concepts that are hierarchically organized.
Tree of Porphyry, III AD; Wordnet, XXI AD (see Lect 5, e.g. similarity measures)
5. Reasoning: RDF/OWL vs Databases (and other data structures)
OWL axioms behave like inference rules rather than database constraints.

    Class: Phoenix
        SubClassOf: isPetOf only Wizard

    Individual: Fawkes
        Types: Phoenix
        Facts: isPetOf Dumbledore

• Fawkes is said to be a Phoenix and to be the pet of Dumbledore, and it is also stated that only a Wizard can have a pet Phoenix.
• In OWL, this leads to the implication that Dumbledore is a Wizard. That is, if we were to query the ontology for instances of Wizard, then Dumbledore would be part of the answer.
• In a database setting, the schema could include a similar statement about the Phoenix class, but in this case it would be interpreted as a constraint on the data: adding the fact that Fawkes isPetOf Dumbledore without Dumbledore already being known to be a Wizard would lead to an invalid database state, and such an update would therefore be rejected by a database management system as a constraint violation.
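To make the contrast concrete, here is a minimal Python sketch (my own illustration, not from the slides): the OWL-style reading derives new knowledge from the axiom, while the database-style reading rejects the update as a constraint violation.

    # Minimal sketch contrasting OWL-style inference with a database-style
    # integrity constraint, mirroring the Fawkes/Dumbledore example above.
    facts = {("Fawkes", "type", "Phoenix"), ("Fawkes", "isPetOf", "Dumbledore")}

    def owl_style_inference(facts):
        """If x is a Phoenix and x isPetOf y, infer that y is a Wizard."""
        inferred = set(facts)
        for (s, p, o) in facts:
            if p == "isPetOf" and (s, "type", "Phoenix") in facts:
                inferred.add((o, "type", "Wizard"))   # new knowledge is derived
        return inferred

    def db_style_insert(facts, triple):
        """Reject the update unless the constraint already holds."""
        s, p, o = triple
        if p == "isPetOf" and (s, "type", "Phoenix") in facts \
                and (o, "type", "Wizard") not in facts:
            raise ValueError("constraint violation: %s is not known to be a Wizard" % o)
        return facts | {triple}

    print(owl_style_inference(facts))   # contains ('Dumbledore', 'type', 'Wizard')
    try:
        db_style_insert({("Fawkes", "type", "Phoenix")}, ("Fawkes", "isPetOf", "Dumbledore"))
    except ValueError as e:
        print("rejected:", e)           # the database-style reading refuses the update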
6. So, what is an ontology for us?
"An ontology is a FORMAL, EXPLICIT specification of a SHARED conceptualization."
Studer, Benjamins, Fensel. Knowledge Engineering: Principles and Methods. Data and Knowledge Engineering 25 (1998) 161-197.
"An ontology is an explicit specification of a conceptualization."
Gruber, T. A Translation Approach to Portable Ontology Specifications. Knowledge Acquisition, Vol. 5, 1993, 199-220.
• An abstract model and simplified view of some phenomenon in the world that we want to represent
• Machine-readable
• Concepts, properties, relations, functions, constraints, axioms are explicitly defined
• Consensual knowledge
7. How to build an ontology
Generally speaking (and roughly said), when designing an ontology, four main components are used (a toy sketch follows after the list):
1. Classes
2. Relations
3. Axioms
4. Instances
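As a rough illustration of these four components (my own toy example, not from the slides), an ontology skeleton can be written down as plain data and queried:

    # Toy sketch of the four components of an ontology (illustrative only).
    ontology = {
        "classes":   ["Emotion", "PositiveEmotion", "NegativeEmotion"],
        "relations": [("PositiveEmotion", "subClassOf", "Emotion"),
                      ("NegativeEmotion", "subClassOf", "Emotion")],
        "axioms":    ["PositiveEmotion and NegativeEmotion are disjoint"],
        "instances": [("happiness", "type", "PositiveEmotion"),
                      ("anger", "type", "NegativeEmotion")],
    }

    # A trivial query: list all instances of (sub)classes of Emotion.
    subclasses = {s for (s, p, o) in ontology["relations"]
                  if p == "subClassOf" and o == "Emotion"}
    emotions = [i for (i, p, c) in ontology["instances"] if c in subclasses | {"Emotion"}]
    print(emotions)   # ['happiness', 'anger']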
8. Practical Activity: emotions
Your remarks:
• Emotions are ambiguous: e.g. happiness can also be ill-directed.
• The polarity of some emotions cannot be assessed…
• etc.
Classes, Relations, Axioms, Instances, etc.
9. Occupational psychology (Wikipedia)
• Industrial and organizational psychology (also known as I–O psychology, occupational psychology, work psychology, WO psychology, IWO psychology and business psychology) is the scientific study of human behavior in the workplace and applies psychological theories and principles to organizations and individuals in their workplace.
• I–O psychologists are trained in the scientist–practitioner model. I–O psychologists contribute to an organization's success by improving the performance, motivation, job satisfaction, occupational safety and health as well as the overall health and well-being of its employees. An I–O psychologist conducts research on employee behaviors and attitudes, and how these can be improved through hiring practices, training programs, feedback, and management systems.
10. In summary… Why build an ontology?
• To share a common understanding of the structure of information among people or machines
• To make domain assumptions explicit
• Often based on a controlled vocabulary
• To analyze domain knowledge
• To enable reuse of domain knowledge
11. Ontologies and Tags
• Ontologies and tagging systems are two different ways to organize the knowledge present on the Web.
• The first one has a formal foundation that derives from description logic and artificial intelligence. Domain experts decide the terms.
• The other one is simpler and integrates heterogeneous contents; it is based on the collaboration of users in the Web 2.0. User-generated annotation.
12. Folksonomies
• Tagging facilities within Web 2.0 applications have shown how it might be possible for user communities to collaboratively annotate web content, and create simple forms of ontology via the development of loosely-hierarchically organised sets of tags, often called folksonomies…
13. Folksonomy = Social Tagging
• Folksonomies (also known as social tagging) are user-defined metadata collections.
• Users do not deliberately create folksonomies and there is rarely a prescribed purpose, but a folksonomy evolves when many users create or store content at particular sites and identify what they think the content is about.
• "Tag clouds" pinpoint the frequency of certain tags.
14. • A common way to organize tags is in tag clouds…
15. Automatic folksonomy construction
• The collective knowledge expressed through user-generated tags has great potential.
• However, we need tools to efficiently aggregate data from large numbers of users with highly idiosyncratic vocabularies and invented words or expressions.
• Many approaches to automatic folksonomy construction combine tags using statistical methods...
• Ample space for improvement…
16. Ontology, taxonomy, folksonomy, etc.
• Many different definitions…
• A good summary and interpretation is here: http://www.ideaeng.com/taxonomies-ontologies-0602
17. Today…
• We will talk more generally about word clouds…
18. Further Reading
Semantic Similarity from Natural Language and Ontology Analysis, by Sébastien Harispe, Sylvie Ranwez, Stefan Janaqi, and Jacky Montmain. Synthesis Lectures on Human Language Technologies, May 2015, Vol. 8, No. 1.
• The two state-of-the-art approaches for estimating and quantifying semantic similarities/relatedness of semantic entities are presented in detail: the first one relies on corpora analysis and is based on Natural Language Processing techniques and semantic models, while the second is based on more or less formal, computer-readable and workable forms of knowledge such as semantic networks, thesauri or ontologies.
20. Acknowledgements
This presentation is based on the following paper:
• Barth et al. (2014). Experimental Comparison of Semantic Word Clouds. In Experimental Algorithms, Volume 8504 of the series Lecture Notes in Computer Science, pp. 247-258.
  – Link: https://www.cs.arizona.edu/~kobourov/wordle2.pdf
Some slides have been borrowed from Sergey Pupyrev.
21. Today
• Experiments on semantics-preserving word clouds, in which semantically related words are close to each other.
22. Outline
• What is a Word Cloud?
• 3 early algorithms
• 3 new algorithms
• Metrics & Quantitative Evaluation
23. Word Clouds
• Word clouds have become a standard tool for abstracting, visualizing and comparing texts…
• We could apply the same or similar techniques to the huge amounts of tags produced by users interacting on social networks.
24. Comparison & Conceptualization Tool
• Word clouds as a tool for "conceptualizing" documents. Cf. ontologies.
• Ex: 2008, comparison of speeches: Obama vs McCain.
Cf. Lect 10: Extractive summarization & abstractive summarization.
25. Word Clouds and Tag Clouds…
• … are often used to represent importance among terms (e.g. band popularity) or serve as a navigation tool (e.g. Google search results).
26. The Problem…
• How to compute semantics-preserving word clouds in which semantically-related words are close to each other?
27. Wordle http://www.wordle.net
• Practical tools, like Wordle, make word cloud visualization easy. They offer an appealing way to SUMMARIZE text… Shortcoming: they do not capture the relationships between words in any way, since word placement is independent of context.
28. Many word clouds are arranged randomly (look also at the scattered colours).
29. Patterns and Vicinity/Adjacency
Humans are spontaneously pattern-seekers: if they see two words close to each other in a word cloud, they spontaneously think they are related…
30. In Linguistics and NLP…
• This natural tendency of linking spatial vicinity to semantic relatedness is exploited as evidence that words are semantically related or semantically similar… Remember? "You shall know a word by the company it keeps" (Firth, J. R. 1957:11).
31. So, it makes sense to place such related words close to each other (look also at the colour distribution).
32. Semantic word clouds have higher user satisfaction compared to other layouts…
33. All recent word cloud visualization tools aim to incorporate semantics in the layout…
34. … but none of them provide any guarantee about the quality of the layout in terms of semantics.
35. Early algorithms: Force-Directed Graph
• Most of the existing algorithms are based on force-directed graph layout.
• Force-directed graph drawing algorithms are a class of algorithms for drawing graphs in an aesthetically pleasing way:
  – Attractive forces between pairs reduce empty space
  – Repulsive forces ensure that words do not overlap
  – A final force preserves semantic relations between words (a rough sketch of one iteration follows at the end of this slide)
Some of the most flexible algorithms for calculating layouts of simple undirected graphs belong to a class known as force-directed algorithms. Such algorithms calculate the layout of a graph using only information contained within the structure of the graph itself, rather than relying on domain-specific knowledge. Graphs drawn with these algorithms tend to be aesthetically pleasing, exhibit symmetries, and tend to produce crossing-free layouts for planar graphs.
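A rough, self-contained sketch of one force-directed iteration (my own illustration under simplifying assumptions: words as points, relatedness values in [0, 1], no handling of box shapes):

    import numpy as np

    def force_directed_step(pos, relatedness, step=0.01):
        """One iteration of a toy force-directed layout.
        pos: (n, 2) array of word positions; relatedness: (n, n) symmetric matrix."""
        n = len(pos)
        disp = np.zeros_like(pos)
        for i in range(n):
            for j in range(n):
                if i == j:
                    continue
                d = pos[j] - pos[i]
                dist = np.linalg.norm(d) + 1e-9
                disp[i] += relatedness[i, j] * d   # attraction: pull related words together
                disp[i] -= d / (dist ** 2)         # repulsion: keep words apart
        return pos + step * disp

    # Tiny usage example with 3 words, where words 0 and 1 are strongly related.
    pos = np.random.rand(3, 2)
    rel = np.array([[0, .9, .1], [.9, 0, .1], [.1, .1, 0]], dtype=float)
    for _ in range(200):
        pos = force_directed_step(pos, rel)
    print(pos)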
36. Newer Algorithms: rectangle representation of graphs
• Vertex-weighted and edge-weighted graph:
  – The vertices of the graph are the words
    • Their weight corresponds to some measure of importance (e.g. word frequencies)
  – The edges capture the semantic relatedness of pairs of words (e.g. co-occurrence)
    • Their weight corresponds to the strength of the relation
  – Each vertex can be drawn as a box (rectangle) with a dimension determined by its weight
  – A realized adjacency is the sum of the edge weights for all pairs of touching boxes.
  – The goal is to maximize the realized adjacencies (a small computation sketch follows below).
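An illustrative sketch (not the paper's code) of how a realized-adjacency score can be computed for a rectangle layout; boxes are axis-aligned (x, y, width, height), and two boxes "touch" if they share a boundary segment of positive length:

    def touching(a, b, eps=1e-9):
        ax, ay, aw, ah = a
        bx, by, bw, bh = b
        share_x = min(ax + aw, bx + bw) - max(ax, bx)
        share_y = min(ay + ah, by + bh) - max(ay, by)
        vertical_contact = abs(ax + aw - bx) < eps or abs(bx + bw - ax) < eps
        horizontal_contact = abs(ay + ah - by) < eps or abs(by + bh - ay) < eps
        return (vertical_contact and share_y > 0) or (horizontal_contact and share_x > 0)

    def realized_adjacencies(boxes, edge_weights):
        """Sum of edge weights over all pairs of touching boxes.
        boxes: dict word -> (x, y, w, h); edge_weights: dict (w1, w2) -> weight."""
        total = 0.0
        for (u, v), w in edge_weights.items():
            if touching(boxes[u], boxes[v]):
                total += w
        return total

    boxes = {"data": (0, 0, 4, 2), "large": (4, 0, 3, 2), "apricot": (10, 10, 2, 1)}
    weights = {("data", "large"): 0.8, ("data", "apricot"): 0.1}
    print(realized_adjacencies(boxes, weights))   # 0.8: only "data" and "large" touch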
37. Purpose of the experiments that are shown here:
• Semantics preservation in terms of closeness/vicinity/adjacency
38. Example
• A contact between two boxes is a common boundary.
• The contact of two boxes is interpreted as semantic relatedness.
• The contact of two boxes can be calculated, so the adjacency can be computed and evaluated.
41. Lect 6: Repetition
Co-occurrence counts:
                  large   data   computer
    apricot         1       0       0
    digital         0       1       2
    information     1       6       1

Which pair of words is more similar?

    cos(v, w) = (v · w) / (|v| |w|) = Σ_i v_i·w_i / ( sqrt(Σ_i v_i²) · sqrt(Σ_i w_i²) )

    cosine(apricot, information) = (1·1 + 0·6 + 0·1) / ( sqrt(1+0+0) · sqrt(1+36+1) ) = 1/sqrt(38) ≈ .16
    cosine(digital, information) = (0·1 + 1·6 + 2·1) / ( sqrt(0+1+4) · sqrt(1+36+1) ) = 8/( sqrt(5)·sqrt(38) ) ≈ .58
    cosine(apricot, digital)     = (1·0 + 0·1 + 0·2) / ( sqrt(1+0+0) · sqrt(0+1+4) ) = 0
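The same numbers can be reproduced in a few lines of Python (illustration only):

    import numpy as np

    def cosine(v, w):
        return float(np.dot(v, w) / (np.linalg.norm(v) * np.linalg.norm(w)))

    counts = {                        # context counts for large, data, computer
        "apricot":     np.array([1, 0, 0]),
        "digital":     np.array([0, 1, 2]),
        "information": np.array([1, 6, 1]),
    }

    print(round(cosine(counts["apricot"], counts["information"]), 2))   # 0.16
    print(round(cosine(counts["digital"], counts["information"]), 2))   # 0.58
    print(round(cosine(counts["apricot"], counts["digital"]), 2))       # 0.0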
43. Input - Output
• The input for all algorithms is:
  – a collection of n rectangles, each with a fixed width and height proportional to the rank of the word
  – a similarity/dissimilarity matrix
• The output is a set of non-overlapping positions for the rectangles.
44. Early Algorithms
1. Wordle (Random)
2. Context-Preserving Word Cloud Visualization (CPWCV)
3. Seam Carving
45. Wordle → Random
• The Wordle algorithm places one word at a time in a greedy fashion, i.e. aiming to use space as efficiently as possible.
• First the words are sorted by weight/rank in decreasing order.
• Then, for each word in the order, a position is picked at random (a toy sketch of this greedy placement follows below).
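A toy sketch of this greedy placement (my own illustration; the real Wordle moves a colliding word along a spiral, whereas this sketch simply retries random positions on a unit canvas):

    import random

    def overlaps(a, b):
        ax, ay, aw, ah = a
        bx, by, bw, bh = b
        return ax < bx + bw and bx < ax + aw and ay < by + bh and by < ay + ah

    def wordle_layout(words, max_tries=1000):
        """words: list of (word, weight); heavier words are placed first."""
        placed = {}
        for word, weight in sorted(words, key=lambda t: -t[1]):
            w, h = 0.05 * weight, 0.02 * weight          # box size grows with weight
            for _ in range(max_tries):
                box = (random.random() * (1 - w), random.random() * (1 - h), w, h)
                if not any(overlaps(box, other) for other in placed.values()):
                    placed[word] = box
                    break
        return placed

    print(wordle_layout([("data", 5), ("large", 3), ("apricot", 1)]))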
52. Context-Preserving Word Cloud Visualization (CPWCV)
• First, a dissimilarity matrix is computed and Multidimensional Scaling (MDS) is performed.
• Second, an effort is made to create a compact layout.
Multidimensional Scaling (MDS) aims at detecting meaningful underlying dimensions in the data.
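A sketch of the first CPWCV step (my own illustration, with invented toy dissimilarities): embedding words in 2D from a precomputed dissimilarity matrix using metric MDS from scikit-learn.

    import numpy as np
    from sklearn.manifold import MDS

    words = ["data", "large", "computer", "apricot"]
    dissimilarity = np.array([        # 0 = identical, 1 = unrelated (toy values)
        [0.0, 0.2, 0.3, 0.9],
        [0.2, 0.0, 0.4, 0.9],
        [0.3, 0.4, 0.0, 0.8],
        [0.9, 0.9, 0.8, 0.0],
    ])

    mds = MDS(n_components=2, dissimilarity="precomputed", random_state=0)
    coords = mds.fit_transform(dissimilarity)     # initial word positions;
    for word, (x, y) in zip(words, coords):       # a second compaction step would follow
        print(f"{word:10s} {x:6.2f} {y:6.2f}")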
64. 3 New Algorithms
1. Inflate and Push
2. Star Forest
3. Cycle Cover
65. Inflate-and-Push
• A simple heuristic method for word layout, which aims to preserve semantic relations between pairs of words.
• Based on (a skeleton sketch follows after this list):
  1. Heuristics: scaling down all word rectangles by some constant;
  2. Computing MDS (multidimensional scaling) on the dissimilarity matrix;
  3. Iteratively increasing the size of the rectangles by 5% (i.e. "inflate" words);
  4. When words overlap, applying a force-directed algorithm to "push" words away.
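A pseudocode-level skeleton of this heuristic (my own sketch, not the authors' implementation; it assumes an MDS embedding like the one sketched earlier, and push_apart stands in for a force-directed overlap-removal step):

    def inflate_and_push(rects, coords, shrink=0.1, grow=1.05, max_rounds=100):
        """rects: dict word -> (w, h); coords: dict word -> (x, y) from MDS."""
        sizes = {w: (rw * shrink, rh * shrink) for w, (rw, rh) in rects.items()}  # 1. scale down
        for _ in range(max_rounds):
            sizes = {w: (rw * grow, rh * grow) for w, (rw, rh) in sizes.items()}  # 3. inflate by 5%
            coords = push_apart(coords, sizes)        # 4. force-directed step resolving overlaps
            if all(rw >= rects[w][0] for w, (rw, rh) in sizes.items()):
                break                                  # stop once original sizes are restored
        return coords, sizes

    def push_apart(coords, sizes):
        # Placeholder for a force-directed overlap-removal step (see the earlier sketch).
        return coords

    coords, sizes = inflate_and_push({"data": (4, 2), "large": (3, 2)},
                                     {"data": (0.0, 0.0), "large": (0.5, 0.1)})
    print(sizes)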
71. Star Forest
• A star is a tree (one central vertex connected to all the other vertices).
• A star forest is a forest whose connected components are all stars.
72. Repetition: trees and graphs
• A tree is a special form of graph, i.e. a minimally connected graph with only one path between any two vertices.
• In a graph there can be more than one path, i.e. a graph can have uni-directional or bi-directional paths (edges) between nodes.
73. Three steps
1. Extracting the star forest: partition a graph into disjoint stars
2. Realising a star: build a word cloud for every star
3. Pack all the stars together
74. Star Forest: star = tree
1. Extract stars greedily from a dissimilarity matrix → disjoint stars = star forest (a greedy extraction sketch follows after this list)
2. Compute the optimal stars, i.e. the best set of words to be adjacent
3. Attractive force to get a compact layout
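An illustrative sketch of step 1 (my own greedy variant, not the paper's exact procedure): repeatedly pick the word with the strongest total similarity to the remaining words as a star centre and attach its most similar neighbours as leaves, until every word belongs to exactly one star.

    import numpy as np

    def greedy_star_forest(similarity, leaves_per_star=3):
        n = similarity.shape[0]
        remaining = set(range(n))
        stars = []
        while remaining:
            idx = list(remaining)
            sub = similarity[np.ix_(idx, idx)]
            centre = idx[int(np.argmax(sub.sum(axis=1)))]      # strongest candidate centre
            remaining.discard(centre)
            others = sorted(remaining, key=lambda j: -similarity[centre, j])
            leaves = others[:leaves_per_star]                  # most similar words as leaves
            remaining.difference_update(leaves)
            stars.append((centre, leaves))
        return stars

    sim = np.array([[0, .9, .8, .1], [.9, 0, .7, .1], [.8, .7, 0, .2], [.1, .1, .2, 0]], float)
    print(greedy_star_forest(sim))   # e.g. [(0, [1, 2, 3])] for this tiny matrix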
75. Cycle Cover
• This algorithm is based on a similarity matrix.
• First, a similarity path is created.
• Then, the optimal level of compactness is computed.
76. Quantitative Metrics
1. Realized Adjacencies – how close are similar words to each other?
2. Distortion – how distant are dissimilar words?
3. Uniform Area Utilization – uniformity of the distribution (overpopulated vs sparse areas in the word cloud)
4. Compactness – how well utilized is the drawing area? (a small computation sketch follows after this list)
5. Aspect Ratio – width and height of the bounding box
6. Running Time – execution time
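An illustrative sketch of two of these metrics (my own reading of the metric names above, not the paper's exact formulas): compactness as the fraction of the bounding-box area covered by word rectangles, and the aspect ratio of the bounding box.

    def bounding_box(boxes):
        x0 = min(x for x, y, w, h in boxes);  y0 = min(y for x, y, w, h in boxes)
        x1 = max(x + w for x, y, w, h in boxes);  y1 = max(y + h for x, y, w, h in boxes)
        return x1 - x0, y1 - y0

    def compactness(boxes):
        """Fraction of the bounding-box area covered by word rectangles (ignores overlaps)."""
        bw, bh = bounding_box(boxes)
        used = sum(w * h for x, y, w, h in boxes)
        return used / (bw * bh)

    def aspect_ratio(boxes):
        bw, bh = bounding_box(boxes)
        return bw / bh

    layout = [(0, 0, 4, 2), (4, 0, 3, 2), (0, 2, 2, 1)]
    print(compactness(layout))    # 16/21 ≈ 0.76 of the bounding box is used
    print(aspect_ratio(layout))   # 7/3 ≈ 2.33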
77. 2 datasets
(1) WIKI, a set of 112 plain-text articles extracted from the English Wikipedia, each consisting of at least 200 distinct words.
(2) PAPERS, a set of 56 research papers published in conferences on experimental algorithms (SEA and ALENEX) in 2011-2012.