Ferosa - Insights


Amrith Krishna

Initial draft for FeRoSA


FeRoSA
Faceted Recommendation System for Scientific Articles

Scientific Articles
ACL Anthology: a collection of 20,000 articles in computational linguistics

Faceted
Not just recommendations, but how they are related

www.ferosa.org
Live and running

Edge labelling task
• Set of nodes
• Links between similar nodes
• Label the edges
Analogy
• Nudge the user: suggest why one should buy the combo offered on Flipkart
• Type of social ties in a friendship network

CHALLENGES
Quality (Q)
• High specificity and precision
• Outperforms the current system for scientific article retrieval by a high margin
Ranking (R)
• Individual ranking per facet
• Most relevant entry comes first
• Aggregation of rank lists over content and citation network information
Accessibility (A)
• Categorized into 4 facets
• Easy to streamline as per need and filter results
Scalable (S)
• Random walks (with restarts)
• Independent of domain

Information Overload
Even for a relatively closed community like ACL
IR Tools
Rather than text-based indexing
Varying Intentions
Results are streamlined based on intention; entries may appear that would not appear in flat recommendations

Dataset
ACL Anthology Collection
• Computational linguistics
• 1961 – 2013
• Text data open to the public

Statistics                                        Full      Filtered
Number of papers                                  21,212    9,843
Average number of references (within ACL only)    5.23      6.21
Number of unique authors                          17,551    7,892
Number of unique venues                           451       280

Form Citation Network
• Identify citation contexts and section headings (ParsCit)
• Section heading to facet mapping
• Refinement of facets from prior works

Statistics                                  Count
Number of citation contexts extracted       61,051
Number of BG edges                          23,022
Number of AA edges                          10,797
Number of MD edges                          8,828
Number of CM edges                          18,404

Facets
• AA – Alternative Approaches
• BG – Background
• CM – Comparison
• MD – Method
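
The section-heading-to-facet step could be sketched as a simple keyword lookup. This is only an illustration: the keyword lists and the function names below are hypothetical placeholders, not the mapping used by FeRoSA. Only the four facet codes (BG, AA, MD, CM) come from the slide.

```python
from collections import Counter

# Hypothetical keyword lists; the slides do not give the real heading-to-facet mapping.
FACET_KEYWORDS = {
    "BG": ("introduction", "background", "motivation"),
    "AA": ("related work", "previous work", "alternative"),
    "MD": ("method", "approach", "model", "algorithm"),
    "CM": ("experiment", "evaluation", "result", "comparison"),
}


def facet_for_heading(heading):
    """Return the first facet code whose keyword occurs in the heading, or None."""
    text = heading.lower()
    for facet, keywords in FACET_KEYWORDS.items():
        if any(keyword in text for keyword in keywords):
            return facet
    return None


def label_edges(citation_contexts):
    """Turn (citing, cited, section_heading) triples into facet-labelled edges."""
    edges = []
    for citing, cited, heading in citation_contexts:
        facet = facet_for_heading(heading)
        if facet is not None:
            edges.append((citing, cited, facet))
    return edges


# Made-up citation contexts for illustration only.
contexts = [
    ("P1", "P2", "1 Introduction"),
    ("P1", "P3", "2 Related Work"),
    ("P1", "P4", "3 Our Approach"),
    ("P1", "P5", "5 Experiments and Results"),
    ("P1", "P6", "Acknowledgements"),
]
labelled = label_edges(contexts)
facet_counts = Counter(facet for _, _, facet in labelled)
```

Contexts whose heading matches no facet are dropped here; whether the original pipeline drops or reassigns them is not stated in the slides.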

Induced Subgraphs
Nodes
• Query paper
• Papers within 2 citation hops in either direction
• Highly similar papers based on cosine similarity
Edges
• Edges belonging to a particular facet
• 4 different subgraphs for each query paper
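
A minimal sketch of how the induced subgraphs might be assembled, assuming the citation network is available as (citing, cited, facet) triples and that the highly similar papers have already been selected elsewhere. The function names and the toy edge list are illustrative assumptions.

```python
from collections import defaultdict, deque


def two_hop_nodes(edges, query):
    """Papers within 2 citation hops of the query, ignoring edge direction."""
    neighbours = defaultdict(set)
    for citing, cited, _facet in edges:
        neighbours[citing].add(cited)
        neighbours[cited].add(citing)
    distance = {query: 0}
    queue = deque([query])
    while queue:
        node = queue.popleft()
        if distance[node] == 2:
            continue  # do not expand beyond the second hop
        for other in neighbours[node]:
            if other not in distance:
                distance[other] = distance[node] + 1
                queue.append(other)
    return set(distance)


def facet_subgraphs(edges, query, similar_papers=()):
    """Build one edge list per facet over the induced node set."""
    nodes = two_hop_nodes(edges, query) | set(similar_papers)
    subgraphs = defaultdict(list)
    for citing, cited, facet in edges:
        if citing in nodes and cited in nodes:
            subgraphs[facet].append((citing, cited))
    return dict(subgraphs)


# Toy citation network for illustration only.
edges = [
    ("Q", "A", "BG"),
    ("A", "B", "MD"),
    ("B", "C", "MD"),  # C is 3 hops from Q, so this edge is dropped by default
    ("D", "Q", "CM"),
    ("S", "A", "AA"),  # S is 2 hops from Q via A
]
subgraphs = facet_subgraphs(edges, "Q")
```

Passing `similar_papers=["C"]` would pull paper C into the node set even though it lies outside the 2-hop neighbourhood, which mirrors the third node criterion on the slide.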

Random Walks
• Random walks with restarts
• The walker iteratively moves to a neighbouring node with probability proportional to the edge weights.
• With restart probability c = 0.4, the walker returns to the starting node i.
• Teleportation with probability 0.3
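
The random walk with restarts can be sketched as a power iteration. The version below is an assumption-laden illustration: it uses the restart probability c = 0.4 from the slide, but it does not model the teleportation step, because the slide does not say how teleportation is combined with the restart. The function name, the graph representation, and the handling of nodes without outgoing edges are all illustrative choices.

```python
def random_walk_with_restart(weights, start, restart=0.4, iterations=100, tol=1e-10):
    """Iterate the walk until the node scores stop changing.

    weights maps node -> {neighbour: edge weight}.
    """
    nodes = set(weights)
    for targets in weights.values():
        nodes.update(targets)
    scores = {node: 0.0 for node in nodes}
    scores[start] = 1.0  # the walker begins at the query node
    for _ in range(iterations):
        updated = {node: 0.0 for node in nodes}
        for node, mass in scores.items():
            targets = weights.get(node, {})
            total = sum(targets.values())
            if total == 0:
                # Node without outgoing edges: send its mass back to the start (an assumption).
                updated[start] += (1 - restart) * mass
                continue
            for target, weight in targets.items():
                # Move to a neighbour with probability proportional to the edge weight.
                updated[target] += (1 - restart) * mass * weight / total
        # Total mass stays at 1, so the restart step adds exactly `restart` to the start node.
        updated[start] += restart
        change = sum(abs(updated[n] - scores[n]) for n in nodes)
        scores = updated
        if change < tol:
            break
    return scores


# Toy graph: the query node q is linked to a and b with equal weights.
graph = {"q": {"a": 1.0, "b": 1.0}, "a": {"q": 1.0}, "b": {"q": 1.0}}
scores = random_walk_with_restart(graph, "q")
```

On this toy graph the scores settle at 0.625 for q and 0.1875 each for a and b; sorting nodes by score gives one ranked list per facet subgraph.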

Rank Aggregation
• Aggregation of ranked lists based on content similarity and RWR values
• R package
• Optimization problem
• Spearman footrule
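
To make the Spearman footrule criterion concrete, here is a small sketch that measures the footrule distance between ranked lists and finds the consensus ordering by exhaustive search. The exhaustive search stands in for the optimizer of the R package, which the slide does not name; it is only practical for a handful of items. The third ranked list is a hypothetical addition so that the toy example has a unique optimum.

```python
from itertools import permutations


def footrule_distance(ranking_a, ranking_b):
    """Spearman footrule: sum of absolute rank differences over the same items."""
    position_b = {item: rank for rank, item in enumerate(ranking_b)}
    return sum(abs(rank - position_b[item]) for rank, item in enumerate(ranking_a))


def aggregate_by_footrule(rankings):
    """Return the ordering with the least total footrule distance to all rankings."""
    items = sorted(rankings[0])
    best = min(
        permutations(items),
        key=lambda candidate: sum(footrule_distance(candidate, r) for r in rankings),
    )
    return list(best)


content_ranking = ["P1", "P2", "P3", "P4"]  # e.g. ordered by content similarity
rwr_ranking = ["P2", "P1", "P3", "P4"]      # e.g. ordered by RWR score
third_ranking = ["P1", "P2", "P4", "P3"]    # hypothetical extra list, for illustration
consensus = aggregate_by_footrule([content_ranking, rwr_ranking, third_ranking])
```

All input lists are assumed to rank the same set of papers; partial lists would need a convention for missing items.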

EXPERIMENTAL RESULTS
• The most cosine-similar paper is found within 1 or 2 hops
• Edge density decreases as the citation count increases (due to single or few edges)
• MD subgraphs have nodes with high degree
• Average path length increases with citation count
• Clustering coefficient correlates with edge density
• 1-hop nodes contribute more to this measurement

EVALUATION
• FeRoSA
• Google Scholar
• Microsoft Academic Search
• LDA-based system (Liang et al., 2011)

EVALUATION
• All systems perform better in >2 hop
• Cosine similarity: FeRoSA works in all sections, while the others perform marginally better than or equivalent to FeRoSA only in high or mid
• Pr: FeRoSA works in all 3 buckets, while the others suffer in low citation buckets

Scalable solution
High specificity
Stratification
Flat recommendation
Multi-hop neighbors
Low citation buckets

