The Vector Space Model
…and applications in Information Retrieval

Part 1
Introduction to the Vector Space Model
Overview
• The Vector Space Model (VSM) is a way of representing documents through the words that they contain
• It is a standard technique in Information Retrieval
• The VSM allows decisions to be made about which documents are similar to each other and to keyword queries
How it works: Overview
• Each document is broken down into a word frequency table
• The tables are called vectors and can be stored as arrays
• A vocabulary is built from all the words in all documents in the system
• Each document is represented as a vector against the vocabulary (see the sketch after this list)
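As a concrete illustration of these steps, here is a minimal Python sketch using the two example documents from the slides that follow. The function names (tokenise, build_vocabulary, to_vector) and the lowercasing regex tokeniser are assumptions of this sketch, not part of the original slides:

```python
import re
from collections import Counter

def tokenise(text):
    # Lowercase and keep alphabetic runs only (drops punctuation).
    return re.findall(r"[a-z]+", text.lower())

def build_vocabulary(documents):
    # Every word used in any document, sorted alphabetically.
    return sorted({word for doc in documents for word in tokenise(doc)})

def to_vector(text, vocab):
    # Raw word-frequency vector against the shared vocabulary.
    counts = Counter(tokenise(text))
    return [counts[word] for word in vocab]

docs = ["A dog and a cat.", "A frog."]
vocab = build_vocabulary(docs)
print(vocab)                                # ['a', 'and', 'cat', 'dog', 'frog']
print([to_vector(d, vocab) for d in docs])  # [[2, 1, 1, 1, 0], [1, 0, 0, 0, 1]]
```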
Example
• Document A
  – “A dog and a cat.”
• Document B
  – “A frog.”

Document A:
  word:      a  dog  and  cat
  frequency: 2  1    1    1

Document B:
  word:      a  frog
  frequency: 1  1
Example, continued
• The vocabulary contains all words used
  – a, dog, and, cat, frog
• The vocabulary needs to be sorted
  – a, and, cat, dog, frog
Example, continued
• Document A: “A dog and a cat.”
  – Vector: (2,1,1,1,0)
• Document B: “A frog.”
  – Vector: (1,0,0,0,1)

Document A:
  word:      a  and  cat  dog  frog
  frequency: 2  1    1    1    0

Document B:
  word:      a  and  cat  dog  frog
  frequency: 1  0    0    0    1
Queries
• Queries can be represented as vectors in the same way as documents:
  – Dog = (0,0,0,1,0)
  – Frog = (0,0,0,0,1)
  – Dog and frog = (0,1,0,1,1)
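A short companion sketch vectorising these three queries against the same sorted vocabulary; query_vector is an illustrative name:

```python
from collections import Counter

vocab = ["a", "and", "cat", "dog", "frog"]

def query_vector(query, vocab):
    # A query is vectorised exactly like a document.
    counts = Counter(query.lower().split())
    return [counts[w] for w in vocab]

print(query_vector("dog", vocab))           # [0, 0, 0, 1, 0]
print(query_vector("frog", vocab))          # [0, 0, 0, 0, 1]
print(query_vector("dog and frog", vocab))  # [0, 1, 0, 1, 1]
```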
Similarity measures
• There are many different ways to measure how similar two documents are, or how similar a document is to a query
• The cosine measure is a very common similarity measure
• Using a similarity measure, a set of documents can be compared to a query and the most similar document returned
The cosine measure
• For two vectors d and d’, the cosine similarity between d and d’ is given by:

  sim(d, d’) = (d · d’) / (|d| × |d’|)

• Here d · d’ is the dot product of d and d’, calculated by multiplying corresponding frequencies together and summing the results; |d| is the Euclidean length of d
• The cosine measure computes the cosine of the angle between the vectors in a high-dimensional virtual space
Example
• Let d = (2,1,1,1,0) and d’ = (0,0,0,1,0)
  – d · d’ = 2×0 + 1×0 + 1×0 + 1×1 + 0×0 = 1
  – |d| = √(2² + 1² + 1² + 1² + 0²) = √7 ≈ 2.646
  – |d’| = √(0² + 0² + 0² + 1² + 0²) = √1 = 1
  – Similarity = 1 / (2.646 × 1) ≈ 0.378
• Let d = (1,0,0,0,1) and d’ = (0,0,0,1,0)
  – d · d’ = 0, so Similarity = 0
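The same arithmetic as a runnable sketch; cosine is an illustrative name, and the zero-norm guard is an added assumption for empty vectors:

```python
import math

def cosine(d, d2):
    # Dot product divided by the product of the vector lengths.
    dot = sum(x * y for x, y in zip(d, d2))
    norms = math.sqrt(sum(x * x for x in d)) * math.sqrt(sum(y * y for y in d2))
    return dot / norms if norms else 0.0

print(round(cosine([2, 1, 1, 1, 0], [0, 0, 0, 1, 0]), 3))  # 0.378
print(round(cosine([1, 0, 0, 0, 1], [0, 0, 0, 1, 0]), 3))  # 0.0
```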
Ranking documents
• A user enters a query
• The query is compared to all documents using a similarity measure
• The user is shown the documents in decreasing order of similarity to the query (sketched below)
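A minimal sketch of this ranking loop over the two example documents, repeating the cosine function from the previous slide's sketch so the block stands alone:

```python
import math

def cosine(d, d2):
    dot = sum(x * y for x, y in zip(d, d2))
    norms = math.sqrt(sum(x * x for x in d)) * math.sqrt(sum(y * y for y in d2))
    return dot / norms if norms else 0.0

doc_vectors = {"A": [2, 1, 1, 1, 0], "B": [1, 0, 0, 0, 1]}
query = [0, 0, 0, 1, 0]  # the query "dog"

# Score every document against the query, then sort by decreasing similarity.
ranked = sorted(doc_vectors,
                key=lambda name: cosine(doc_vectors[name], query),
                reverse=True)
for name in ranked:
    print(name, round(cosine(doc_vectors[name], query), 3))
# A 0.378
# B 0.0
```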
VSM variations
Vocabulary
• Stopword lists
  – Commonly occurring words are unlikely to give useful information and may be removed from the vocabulary to speed processing
  – Stopword lists contain frequent words to be excluded
  – Stopword lists need to be used carefully: in “to be or not to be”, every word is a common stopword, so the whole phrase would be discarded (see the sketch below)
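A sketch of stopword filtering; the stopword set below is a tiny illustrative assumption, not any standard list:

```python
# Illustrative stopword set -- real lists are much longer.
STOPWORDS = {"a", "and", "the", "to", "be", "or", "not", "of", "in"}

def remove_stopwords(words):
    return [w for w in words if w not in STOPWORDS]

print(remove_stopwords("a dog and a cat".split()))     # ['dog', 'cat']
print(remove_stopwords("to be or not to be".split()))  # [] -- nothing survives
```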
Term weighting
• Not all words are equally useful
• A word is most likely to be highly relevant to document A if it is:
  – Infrequent in other documents
  – Frequent in document A
• The cosine measure needs to be modified to reflect this
Normalised term frequency (tf)
• A normalised measure of the importance of a word to a document is its frequency, divided by the maximum frequency of any term in the document
• This is known as the tf factor
• Document A: raw frequency vector: (2,1,1,1,0), tf vector: (1, 0.5, 0.5, 0.5, 0)
• This stops long documents from scoring higher simply because they contain more words
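The tf factor as a short sketch, reproducing the Document A vector above (tf_vector is an illustrative name):

```python
def tf_vector(raw):
    # Divide each frequency by the largest frequency in the document.
    peak = max(raw)
    return [f / peak for f in raw] if peak else raw

print(tf_vector([2, 1, 1, 1, 0]))  # [1.0, 0.5, 0.5, 0.5, 0.0]
```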
Inverse document frequency (idf)
• A calculation designed to make rare words more important than common words
• The idf of word i is given by:

  idf_i = log(N / n_i)

• Where N is the number of documents and n_i is the number that contain word i
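A sketch of the formula; the slides do not fix a logarithm base, so the natural log here is an assumption (choosing a different base only rescales every weight by a constant):

```python
import math

def idf(N, n_i):
    # N documents in total, n_i of them contain word i.
    return math.log(N / n_i)

# With the two example documents: "a" occurs in both, "dog" in only one.
print(idf(2, 2))            # 0.0 -- a word in every document carries no weight
print(round(idf(2, 1), 3))  # 0.693
```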
tf-idf
• The tf-idf weighting scheme is to multiply each word in each document by its tf factor and its idf factor
• Different schemes are usually used for query vectors
• Different variants of tf-idf are also used
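A sketch combining the two factors over the example collection. This is one common variant (max-normalised tf times natural-log idf); as the slide notes, many others exist:

```python
import math

def tf_idf_vectors(raw_vectors):
    N = len(raw_vectors)
    width = len(raw_vectors[0])
    # n[i]: number of documents containing word i.
    n = [sum(1 for v in raw_vectors if v[i] > 0) for i in range(width)]
    weighted = []
    for v in raw_vectors:
        peak = max(v)
        weighted.append([(f / peak) * math.log(N / n[i]) if n[i] else 0.0
                         for i, f in enumerate(v)])
    return weighted

docs = [[2, 1, 1, 1, 0], [1, 0, 0, 0, 1]]
for w in tf_idf_vectors(docs):
    print([round(x, 3) for x in w])
# [0.0, 0.347, 0.347, 0.347, 0.0]
# [0.0, 0.0, 0.0, 0.0, 0.693]
```

Note how “a”, which appears in every document, ends up with weight 0 in both vectors: frequent-everywhere words contribute nothing to the similarity score.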