This is not an exhaustive research study and analysis, but rather a first analysis report on the first administration (6 others have taken place since) of a test designed to evaluate the information literacy knowledge and skills of research-track master's students in engineering.
Reference Manager Software for managing your review references and collaboration (with an introduction to Mendeley)
Presenter: Dr. Amy Price, MA, MSc, Ph.D. – DPhil student, Department of Primary Health Care Sciences and Department of Continuing Education, The University of Oxford
Amy Price is a Trustee of the ThinkWell charity where she leads the PLOT-IT (Public Led Online Trials-Infrastructure and Tools) project. Her goal is to build clear channels to propel evidence into practice by supplying the public, and those in low resource areas, with tools to make evidence-based healthcare choices. Responsible shared decision-making requires access to standardized and accurate shared knowledge. Her desire is to mentor others to reach their full potential. Amy’s experience has shown her that shared knowledge, interdisciplinary collaboration and evidence-based research is the voice that will shape and develop the future. Her background in international relief work, clinical neurocognitive rehabilitation, service on the boards of multiple patient and medical organizations, and as a trauma survivor has equipped her with the flexible mindset to relate to all stakeholders and cultures and to adapt quickly to new technology and help others bridge this gap.
Summary: Sharing, editing and managing review references with multiple authors who use different operating systems and software can be a rewarding but daunting task. This hands-on workshop will share tips and tricks for simple ways of organizing, sharing, importing and exporting references and full PDFs across multiple software packages.
Methods: You will be introduced to the use of bibliographic tools, with a specific emphasis on Mendeley (a free cross-platform, multi-device reference manager program) and Google Scholar. The workshop includes an introduction to the basic functions: importing PDFs, the web importer, reading and annotating, the Word plugin and literature search. You can also easily develop a research network to manage your papers online, discover research trends and statistics, and connect with like-minded researchers.
Purpose: This workshop is useful for those who are starting their first review as well as for those of us who have done multiple research projects but find it easier to search on Google than to find the resources already saved on the computer. The tools demonstrated can be used on a computer, tablet or even a smartphone.
Changes in legal citation network over 30 years (PyCon Korea 2018)
재윤 김
PyCon Korea 2018 program, data analytics session.
Analyzed Korean legal codes' citation structure with Python.
Used matplotlib, networkX, powerlaw, seaborn, pandas libraries.
https://www.pycon.kr/2018/program/21
Presentation by Miguel Alvarez-Rodriguez, DG DIGIT, European Commission, at seminar 2, held on 18 March 2021, which addresses digital government principles and building blocks. This 2nd event takes place in the framework of a series of three webinars organised by the SIGMA Programme, a joint initiative of the OECD and EU, principally financed by the EU, on the role of life events in end-to-end public service delivery.
ASSIGNMENT
Drive: Spring 2017
Program: Master of Business Administration (MBA)
Semester: Semester 1
Subject code and name: MBA105 - Managerial Economics
BK ID: B1625
Credits and marks: 4 credits, 30 marks
Note: The assignment is divided into 2 sets. You have to answer all questions in both sets and submit them as one document. The average of the marks you score across both sets will be considered as your IA marks. Answers to 10-mark questions should be approximately 400 words. Each question is followed by its evaluation scheme.
Assignment Set 1
1. Explain the meaning and features of demand forecasting. (Total: 10 marks)
   - Meaning of demand forecasting: 3 marks
   - Features of demand forecasting: 7 marks
2. Explain the cost-output relationship and the nature and behavior of the cost curve in the short run, with a hypothetical cost schedule. (Total: 10 marks)
   - The cost-output relationship and the nature and behavior of the cost curve in the short run, with a hypothetical cost schedule: 10 marks
3. Write short notes on: a) Consumption Function; b) Investment Function. (Total: 10 marks)
   - Define Consumption Function: 5 marks
   - Define Investment Function: 5 marks
Note: Answer all questions. Answers to 10-mark questions should be approximately 400 words. Each question is followed by its evaluation scheme.
Assignment Set 2
1. What are the various roles of fiscal policy in economic development? (Total: 10 marks)
   - Role of fiscal policy: 10 marks
2. Explain the law of variable proportions in detail with diagrammatic representation. (Total: 10 marks)
   - Law of variable proportions in detail with diagrammatic representation: 10 marks
3. What are the various factors which bring changes in supply? (Total: 10 marks)
   - Various factors which bring changes in supply: 10 marks
Vehicle Intercom System – Trend and Market Analysis
AryanRaj496746
Vehicle intercom systems have proven useful in many situations and are now used in many settings. From military to commercial vehicles, intercom systems are installed for secure, disturbance-free communication. Many large companies are working to develop more effective communication systems. This report covers the different uses of vehicle intercoms and their market outlook.
Data enrichment is vital for leveraging heterogeneous data sources in various business analyses, AI applications, and data-driven services. Knowledge Graphs (KGs) support the enrichment of heterogeneous data sources by making entities first-class citizens: links to entities help interconnect heterogeneous data pieces or even ease access to external data sources to eventually augment the original data. Data annotation algorithms to find and link entities in reference KGs, as well as to identify out-of-KG entities have been proposed and applied to different types of data, such as tables, and texts. However, despite recent progress in annotation algorithms, the output of these algorithms does not always meet the quality requirements that make the enriched data valuable in downstream applications. As a result, semantic data enrichment remains an effort-consuming and error-prone task. In this seminar, we discuss the relationships between annotation algorithms, data enrichment, and KG construction, highlighting challenges and open problems. In addition, we advocate for a native human-in-the-loop perspective that enables users to control the outcome of the enrichment and, eventually, improve the quality of the enriched data. We focus in particular on the annotation and enrichment of tabular data and briefly discuss the application of a similar paradigm to the enrichment of textual data in the legal domain, e.g., on court decisions and criminal investigation documents.
The insurance business is rapidly migrating from paper-based processes through digitalization and automation to electronic processes. Digitalization and automation require standardization of information interchange if electronic business processes connect independent organizations. The standards developed by CEN/TC 445 will focus on the information interchange which connects insurance companies with their customers and their market partners, e.g. brokers, sales organizations, portals, service providers, and other insurers. This CEN/TC 445 newsletter intends to be a progress report and will inform on actual developments.
CPO ARENA Service Provider Synopsis (Real Sourcing Network)
CPOARENA
Season 1, Episode 2 - Real Sourcing Network (Service Provider)
Coming before a global panel of 5 CPOs – Industry Experts all, service providers are invited to present their offering within the framework of the following format:
Show Format
1. In the first 5 minutes, the provider will tell the panel about their company, procurement technology, and their unique value proposition.
2. In the next 7 to 8 minutes, they will demonstrate a unique feature from their technology platform that they believe is a differentiator in the marketplace.
3. In the final 10 minutes, the panel will ask the provider questions based on what they have heard and seen.
4. In the closing 5 minutes, the panel will provide the provider with their preliminary feedback.
The document you are about to read is a synopsis of the panel’s observations regarding the provider’s solution offering from the standpoint of functionality and potential areas of both benefit and improvement.
Show Replay: https://youtu.be/AKPqCMsBEAw
vCon, an Open Standard for Conversation Data.pdf
Alan Quayle
In December 2021 Thomas Howe, CTO of STROLID, introduced vCon.
We have iCal, which enables anyone to store and exchange calendaring and scheduling information such as events, to-dos, journal entries, and free/busy information. And we have vCards, so anyone can store and exchange electronic business cards, name and address information, phone numbers, e-mail addresses, URLs (Uniform Resource Locators), logos, photographs, and audio clips.
Yet despite all the talk about conversations in the programmable communications industry, conversation data remains trapped in silos. vCon (virtual Conversation, like vCard), is a new open standard for sharing conversation data: transcript, video, audio, participants, metadata like timestamps and location, tamper protections, certifications, etc. vCon has the potential to create an ecosystem of innovators focused on creating new conversation intelligence tools, in addition to established platform providers.
This whitepaper provides an introduction to vCon, and an invitation to join the creation of an open standard whose aim is to greatly improve the programmable communications industry. Check out the vCon github repository and get involved, it’s open to everyone.
Sharing conversation data is a significant problem faced by programmable communications developers today. vCon makes working with conversations easier, hence the addressable market of developers expands ten thousand fold across web and enterprise developers. This happened one decade ago when telecoms became easy to use with simple web-centric APIs. This will happen again for conversations thanks to vCon.
An innovation ecosystem of specialists in conversation intelligence will flourish, solving specific business pain points across operations, compliance, privacy, security, ethics, etc., by being able to access customer data often trapped within communication platforms. Think of vCon as ‘robot food’, enabling conversation data to be presented in a common format and more easily cleaned for training of machine learning. ASR and conversation AI solutions do not meet the needs of some businesses with respect to accuracy; vCon will help our industry close the gap with respect to the hype.
Through the creation of a common unit of exchange and openness, industries have been transformed. Global trade in the 50 years from 1967 to 2017 went from 22% to roughly 60% of global GDP. SMS in 5 years grew 3000%, and created multi-tens of billion dollar industries across CPaaS and A2P SMS. Open banking in the UK became ubiquitous in 4 years.
vCon will do the same. Conversations currently trapped in silos of communication platforms or simply stored in the company’s data lake will become common units, where innovators will create new value, solve business problems, and deliver valuable insights because their business lives/dies on delivering that. vCon is the next leap forward in the programmable communications industry.
WSO2Con USA 2017: Discover Data That Matters: Deep Dive into WSO2 Analytics
WSO2
Today’s digital businesses are flooded with big and fast-moving data. The ability to trawl this data ocean and identify actionable insights can deliver a competitive advantage to any organization. WSO2 Analytics enables businesses to do just that by providing real-time, interactive, predictive and batch analysis capabilities together.
In this hands-on session we will:
- Plug in the WSO2 Analytics platform to some common business use cases
- Showcase the numerous capabilities of the platform
- Demonstrate how to collect data and analyze, predict and communicate effectively
The Building Blocks of QuestDB, a Time Series Database
javier ramirez
Talk Delivered at Valencia Codes Meetup 2024-06.
Traditionally, databases have treated timestamps as just another data type. However, when performing real-time analytics, timestamps should be first-class citizens and we need rich time semantics to get the most out of our data. We also need to deal with ever-growing datasets while staying performant, which is as fun as it sounds.
It is no wonder time-series databases are now more popular than ever before. Join me in this session to learn about the internal architecture and building blocks of QuestDB, an open source time-series database designed for speed. We will also review a history of some of the changes we have gone through over the past two years to deal with late and unordered data, non-blocking writes, read-replicas, or faster batch ingestion.
Adjusting primitives for graph : SHORT REPORT / NOTES
Subhajit Sahu
Graph algorithms, like PageRank Compressed Sparse Row (CSR) is an adjacency-list based graph representation that is
Multiply with different modes (map)
1. Performance of sequential execution based vs OpenMP based vector multiply.
2. Comparing various launch configs for CUDA based vector multiply.
Sum with different storage types (reduce)
1. Performance of vector element sum using float vs bfloat16 as the storage type.
Sum with different modes (reduce)
1. Performance of sequential execution based vs OpenMP based vector element sum.
2. Performance of memcpy vs in-place based CUDA based vector element sum.
3. Comparing various launch configs for CUDA based vector element sum (memcpy).
4. Comparing various launch configs for CUDA based vector element sum (in-place).
Sum with in-place strategies of CUDA mode (reduce)
1. Comparing various launch configs for CUDA based vector element sum (in-place).
Unleashing the Power of Data_ Choosing a Trusted Analytics Platform.pdf
Enterprise Wired
In this guide, we'll explore the key considerations and features to look for when choosing a Trusted analytics platform that meets your organization's needs and delivers actionable intelligence you can trust.
Analysis insight about a Flyball dog competition team's performance
roli9797
Insights from my analysis of a Flyball dog competition team's performance over the last year. Find more: https://github.com/rolandnagy-ds/flyball_race_analysis/tree/main
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Subhajit Sahu
Abstract: Levelwise PageRank is an alternative method of PageRank computation which decomposes the input graph into a directed acyclic block-graph of strongly connected components, and processes them in topological order, one level at a time. This enables calculation of ranks in a distributed fashion without per-iteration communication, unlike the standard method where all vertices are processed in each iteration. It however comes with a precondition of the absence of dead ends in the input graph. Here, the native non-distributed performance of Levelwise PageRank was compared against Monolithic PageRank on a CPU as well as a GPU. To ensure a fair comparison, Monolithic PageRank was also performed on a graph where vertices were split by components. Results indicate that Levelwise PageRank is about as fast as Monolithic PageRank on the CPU, but quite a bit slower on the GPU. The slowdown on the GPU is likely caused by the submission of a large number of small workloads, and is expected to be a non-issue when the computation is performed on massive graphs.
4. ANAC Dataset
- 4 million public procurements extracted for the year 2017
- 14,000 Public Administrations (PA)
- 500,000 private companies
- We integrate the data with other sources:
  - Indice PA: contains detailed information about PAs
  - Open Consip: contains detailed information about 100,000 companies
5. ANAC Dataset
- cig: id of a public procurement (very noisy)
- Info about the PA: fiscal code, name
- Info about the winning company and other competitors: fiscal code, name (very noisy)
- Info about a public procurement: title, type, cost/revenue in euro, starting date, ending date
6. What information can we extract?
Who are the top companies in terms of public procurements won and earnings? How much do their revenues depend on the public sector?

7. What information can we extract?
Given a company, show detailed information about the public procurements it has won, the types of PA involved, and the regions involved.
13. To hidden relationships
[Diagram labels: Revenues, Public Administrations, Companies]
Idea: Extract hidden relationships among Public Administrations and private companies using unstructured information such as public procurement titles.
Goal:
- Identify PAs with the same needs (i.e. that require similar services);
- Identify indirect competitors among companies;
- Help a PA find companies (e.g. find all companies that sell wine).
14. How to represent PA and Companies?
Synthetic documents: for each PA and each company, we concatenate its public procurement titles to obtain a synthetic document.
Example titles (kept in the original Italian, as they appear in the data):
- fornitura di carburante per automezzi comunali
- fornitura di carburante per automezzi comunali
- impegno di spesa per fornitura carburante per i mezzi in dotazione alla polizia locale
- fornitura carburante per mezzi in dotazione al gruppo comunale volontari protezione civileaib mediante adesione alla convenzione consip spa assunzione relativo impegno di spesa per il periodo 0101 2017 al 31122017
- fornitura carburante comune di testico determina n 692017
- fornitura carburante comuni di stellanello determina n 69 del 06032017
- acquisto carburante panda e moto
- fornitura carburante per autotrazione mediante adesione alla convenzione consip spa tramite buoni carburante per il periodo 010931122017
[Diagram labels: Synthetic doc, Multidimensional representation]
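The concatenation step described on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the authors' pipeline: the `(entity_id, title)` record layout, the identifiers `PA_1` and `PA_2`, and the function name `build_synthetic_docs` are all assumptions made for the example.

```python
from collections import defaultdict


def build_synthetic_docs(records):
    """Group procurement titles by entity and join them into one document each.

    `records` is an iterable of (entity_id, title) pairs; this layout is an
    assumption for the sketch, not the actual ANAC schema.
    """
    titles_by_entity = defaultdict(list)
    for entity_id, title in records:
        titles_by_entity[entity_id].append(title.strip().lower())
    # One synthetic document per entity: its titles joined by single spaces.
    return {entity: " ".join(titles) for entity, titles in titles_by_entity.items()}


# Hypothetical identifiers paired with titles taken from the slides.
records = [
    ("PA_1", "fornitura di carburante per automezzi comunali"),
    ("PA_1", "acquisto carburante panda e moto"),
    ("PA_2", "fornitura sale per disgelo stradale"),
]

synthetic_docs = build_synthetic_docs(records)
print(synthetic_docs["PA_1"])
```

The same function would apply unchanged to companies, using the company fiscal code as the key.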
15. Analyze Unstructured Information
Each document is represented in a continuous vector space.
We consider three different representations:
- Vector Space Model
- Embedding Centroid
- Weighted Embedding
16. Vector Space Model
Each document is represented as a vector with the size of the vocabulary, where each element gives the frequency of a word in the document or some weight derived from the frequency (tf-idf).
d1: Fornitura sale per disgelo stradale
d2: Fornitura sale uso disgelo
cosine_sim(d1,d2) = 0.79
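As a rough illustration of the Vector Space Model, the sketch below builds raw term-frequency vectors with the Python standard library and compares them with cosine similarity. The weighting is an assumption: the slides do not say which tf-idf variant, tokenizer, or stop-word list was used, so the numbers this sketch prints are not expected to match the values shown on the slides. The helper names `term_frequencies` and `cosine_sim` are chosen for the example.

```python
import math
from collections import Counter


def term_frequencies(text):
    """Lowercase the text, split on whitespace, and count each word."""
    return Counter(text.lower().split())


def cosine_sim(a, b):
    """Cosine similarity between two sparse count vectors stored as Counters."""
    dot = sum(a[word] * b[word] for word in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Titles taken from the slides; raw counts stand in for the unspecified weighting.
d1 = term_frequencies("Fornitura sale per disgelo stradale")
d2 = term_frequencies("Fornitura sale uso disgelo")
d3 = term_frequencies(
    "Impegno di spesa per acquisto sale da disgelo per il servizio di sgombero neve"
)

sim_12 = cosine_sim(d1, d2)
sim_13 = cosine_sim(d1, d3)
print(round(sim_12, 2), round(sim_13, 2))
```

Even with this simplified weighting, the longer related title scores lower against d1 than the near-duplicate title does, because few of its words overlap with d1.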
17. Vector Space Model
Each document is represented as a vector with the size of the vocabulary, where each element gives the frequency of a word in the document or some weight derived from the frequency.
d1: Fornitura sale per disgelo stradale
d3: Impegno di spesa per acquisto sale da disgelo per il servizio di sgombero neve
cosine_sim(d1,d3) = 0.33
VSM methods perform poorly on short documents (e.g. titles), because word overlap is minimal even for related documents.
18. Embedding centroids
“You shall know a word by the company it keeps” (Firth 1957)
- Word embeddings are powerful representations and contain a great deal of contextual information
- Intuition: the context represents the semantics (e.g. "smart" and "intelligent" would have similar contexts)
Example sentences in which the context is used to represent doctor/doctors:
- A medical doctor is a person who uses medicine to treat illness and injuries
- Some medical doctors only work on certain diseases or injuries
- Medical doctors examine, diagnose and treat patients
20. Embedding centroids
Each document is represented as the centroid of all its word vectors.
Example:
d1: Impegno spesa per acquisto sale da disgelo per servizio di sgombero neve
d2: Fornitura sale per disgelo stradale
d3: Fornitura per ufficio
[Plot: word vectors for impegno, spesa, acquisto, disgelo, neve, servizio, sale, fornitura, stradale, cancelleria, cartoleria, ufficio, with the document centroids d1, d2, d3]
cosine_sim(d1,d2) = 0.80
cosine_sim(d1,d3) = 0.30
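The centroid representation can be sketched as follows. The 2-D vectors in `toy_vectors` are invented purely for illustration (a real run would load pre-trained word embeddings), words without a vector are simply skipped, and the similarities printed are therefore not the ones reported on the slide. The function names are likewise assumptions.

```python
import math

# Hypothetical 2-D word vectors, invented for this example only.
toy_vectors = {
    "sale": (0.9, 0.1),
    "disgelo": (0.8, 0.2),
    "neve": (0.85, 0.15),
    "stradale": (0.7, 0.3),
    "fornitura": (0.5, 0.5),
    "acquisto": (0.5, 0.5),
    "ufficio": (0.1, 0.9),
}


def centroid(text, vectors):
    """Average the vectors of the words in `text` that have an embedding."""
    known = [vectors[w] for w in text.lower().split() if w in vectors]
    if not known:
        raise ValueError("no word in the document has a vector")
    dims = len(known[0])
    return tuple(sum(v[i] for v in known) / len(known) for i in range(dims))


def cosine_sim(a, b):
    """Cosine similarity between two dense vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


c1 = centroid("Impegno spesa per acquisto sale da disgelo per servizio di sgombero neve", toy_vectors)
c2 = centroid("Fornitura sale per disgelo stradale", toy_vectors)
c3 = centroid("Fornitura per ufficio", toy_vectors)

print(cosine_sim(c1, c2), cosine_sim(c1, c3))
```

With these toy vectors, d1 and d2 end up close because their de-icing vocabulary sits in the same region of the space, while d3 is pulled toward "ufficio".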
21. Embedding centroids
The intuition is that word embeddings will help in shorter technical text such as titles or abstracts, where exact word overlap may often not be enough.
Using embeddings on long text adds some degree of noise, and computing a centroid may lead to some information loss.
[Figure: Embedding for a boiler seller (cf: 02254030204). Words shown: modificato, determinazione, sostituzione, valvola, caldaia, materna, comma, manutenzione, palestra, alloggio, idrauliche, centralina, scuola, comunale, via, contrarre, pavimento, presso, lettera]
22. Weighted Embedding
We may compute a weighted centroid using the VSM vectors (i.e. TF-IDF) as coefficients.
[Figure: Embedding for a boiler seller (cf: 02254030204). Words shown: modificato, determinazione, sostituzione, valvola, caldaia, materna, comma, manutenzione, palestra, alloggio, idrauliche, centralina, scuola, comunale, via, contrarre, pavimento, presso]
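A weighted centroid of this kind might look like the sketch below, where each word vector is scaled by its tf-idf weight before averaging. Several things here are assumptions made for the example: the idf formula is the plain `log(N / df)` variant (the slides do not specify one), the tiny corpus and the 2-D `toy_vectors` are invented, and `idf_table` and `weighted_centroid` are illustrative names.

```python
import math
from collections import Counter

# Hypothetical 2-D word vectors, invented for this example only.
toy_vectors = {
    "fornitura": (0.5, 0.5),
    "caldaia": (0.9, 0.1),
    "carburante": (0.2, 0.8),
    "sale": (0.6, 0.4),
}

# Tiny tokenized corpus; "fornitura" appears in every document.
corpus = [
    ["fornitura", "caldaia"],
    ["fornitura", "carburante"],
    ["fornitura", "sale"],
]


def idf_table(docs):
    """Plain idf = log(N / df); one of several common variants."""
    n_docs = len(docs)
    doc_freq = Counter(word for doc in docs for word in set(doc))
    return {word: math.log(n_docs / df) for word, df in doc_freq.items()}


def weighted_centroid(doc, vectors, idf):
    """Average word vectors using tf * idf as the coefficient of each word."""
    tf = Counter(w for w in doc if w in vectors)
    weights = {w: count * idf.get(w, 0.0) for w, count in tf.items()}
    total = sum(weights.values())
    if total == 0:
        raise ValueError("all tf-idf weights are zero for this document")
    dims = len(next(iter(vectors.values())))
    return tuple(
        sum(weights[w] * vectors[w][i] for w in weights) / total
        for i in range(dims)
    )


idf = idf_table(corpus)
wc = weighted_centroid(corpus[0], toy_vectors, idf)
print(wc)
```

Because "fornitura" occurs in every document its idf is zero, so the weighted centroid of the first document collapses onto the vector of the distinctive word "caldaia", whereas an unweighted centroid would sit halfway between the two words.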