SlideShare a Scribd company logo
1 of 4
CONTACT: PRAVEEN KUMAR. L (,+91 – 9791938249)
MAIL ID: sunsid1989@gmail.com, praveen@nexgenproject.com
Web: www.nexgenproject.com, www.finalyear-ieeeprojects.com
ALGORITHMSEER: A SYSTEM FOR EXTRACTING AND SEARCHING
FOR ALGORITHMS IN SCHOLARLY BIG DATA
Abstract
Algorithms are usually published in scholarly articles, especially in the
computational sciences and related disciplines. The ability to automatically find
and extract these algorithms in this increasingly vast collection of scholarly
digital documents would enable algorithm indexing, searching, discovery, and
analysis. Recently, AlgorithmSeer, a search engine for algorithms, has been
investigated as part of CiteSeerX with the intent of providing a large algorithm
database. Currently, over 200,000 algorithms have been extracted from over 2
million scholarly documents. This paper proposes a novel set of scalable
techniques used by AlgorithmSeer to identify and extract algorithm
representations in a heterogeneous pool of scholarly documents. Specifically,
hybrid machine learning approaches are proposed to discover algorithm
representations. Then, techniques to extract textual metadata for each
algorithm are discussed. Finally, a demonstration version of AlgorithmSeer that
is built on Solr/Lucene open source indexing and search system is presented.
Existing System
Standard algorithms1 are usually collected and cataloged manually in
algorithm textbooks , encyclopedias (especially the ones available online such
as Wikipedia2), and websites that provide references for computer
programmers (e.g. Rosettacode.org3). We parsed Wikipedia algorithm pages
CONTACT: PRAVEEN KUMAR. L (,+91 – 9791938249)
MAIL ID: sunsid1989@gmail.com, praveen@nexgenproject.com
Web: www.nexgenproject.com, www.finalyear-ieeeprojects.com
published in 2010, and found that roughly 1,765 standard algorithms were
cataloged. The National Instituteof Standards and Technology (NIST)4 also has
a dictionary of over 289 standard algorithms. While most standard algorithms
are already cataloged and made searchable, especially those in online catalogs,
newly published algorithms only appear in new articles. The explosion of newly
developed algorithms in scientific and technical documents makes it infeasible
to manually catalog these newly developed algorithms. Manually searching for
these newly published algorithms is a nontrivial task. Researchers and others
who aim to discover efficient and innovative algorithms would have to actively
search and monitor relevant new publications in their fields of study to keep
abreast of latest algorithmic developments. The problem is worse for
algorithm searchers.
Proposed System
We proposea set of hybrid techniques based on ensemble machine learning to
discover PCs and APs in scholarly documents. We find that the textual
metadata that can be directly extracted from an algorithm representation
(especially PCs), is inadequate for effective retrieval, and propose to use the
synopsis generation method, which automatically generates a comprehensive
description for a document element, and the tag-based automatic metadata
annotation technique, which automatically textually enriches metadata
records, to extract more meaningful textual information for PCs.
CONTACT: PRAVEEN KUMAR. L (,+91 – 9791938249)
MAIL ID: sunsid1989@gmail.com, praveen@nexgenproject.com
Web: www.nexgenproject.com, www.finalyear-ieeeprojects.com
Conclusion
The prototype of AlgorithmSeer, a search system for algorithms in large scale
scholarly documents, along with providing an illustration of the actual demo
system. Specifically, we proposed a set of scalable machine learning based
methods to detect algorithms in scholarly documents, we discussed using the
synopsis generation and document annotation methods to extract textual
metadata for pseudo-codes, and finally we explained how algorithms are
indexed and made searchable.
REFERENCES
[1] Approximating a data stream for querying and estimation: Algorithms and
performance evaluation. ICDE ’02, pages 567–576, 2002.
[2] A. Asuncion, M. Welling, P. Smyth, and Y. W. Teh. On smoothing and
inference for topic models. In Proceedings of the Twenty-Fifth Conference on
Uncertainty in Artificial Intelligence, UAI ’09, pages 27–34, Arlington, Virginia,
United States, 2009. AUAI Press.
[3] J. B. Baker, A. P. Sexton, V. Sorge, and M. Suzuki. Comparing approaches to
mathematical document analysis from pdf. ICDAR ’11, pages 463–467, 2011.
[4] S. Bhatia and P. Mitra. Summarizing figures, tables, and algorithms in
scientific publications to augment search results. ACM Trans. Inf. Syst.,
30(1):3:1–3:24, Mar. 2012.
CONTACT: PRAVEEN KUMAR. L (,+91 – 9791938249)
MAIL ID: sunsid1989@gmail.com, praveen@nexgenproject.com
Web: www.nexgenproject.com, www.finalyear-ieeeprojects.com
[5] S. Bhatia and P. Mitra. Summarizing figures, tables, and algorithms in
scientific publications to augment search results. ACM Trans. Inf. Syst.,
30(1):3:1–3:24, Mar. 2012.
[6] S. Bhatia, P. Mitra, and C. L. Giles. Finding algorithms in scientific articles.
WWW ’10, pages 1061–1062, 2010.
[7] S. Bhatia, S. Tuarob, P. Mitra, and C. L. Giles. An Algorithm Search Engine
for Software Developers. 2011.
[8] D. M. Blei, A. Y. Ng, and M. I. Jordan. Latent dirichlet allocation. J. Mach.
Learn. Res., 3:993–1022, Mar. 2003.
[9] L. Breiman. Random forest. Machine Learning, 45(1):5–32, 2001.
[10] N. V. Chawla, K. W. Bowyer, L. O. Hall, and W. P. Kegelmeyer. Smote:
synthetic minority over-sampling technique. J. Artif. Int. Res., 16(1):321–357,
2002.

More Related Content

More from Nexgen Technology

MECHANICAL PROJECTS IN PONDICHERRY, 2020-21 MECHANICAL PROJECTS IN CH...
     MECHANICAL PROJECTS IN PONDICHERRY,   2020-21  MECHANICAL PROJECTS IN CH...     MECHANICAL PROJECTS IN PONDICHERRY,   2020-21  MECHANICAL PROJECTS IN CH...
MECHANICAL PROJECTS IN PONDICHERRY, 2020-21 MECHANICAL PROJECTS IN CH...Nexgen Technology
 
MECHANICAL PROJECTS IN PONDICHERRY, 2020-21 MECHANICAL PROJECTS IN CHENN...
  MECHANICAL PROJECTS IN PONDICHERRY,   2020-21  MECHANICAL PROJECTS IN CHENN...  MECHANICAL PROJECTS IN PONDICHERRY,   2020-21  MECHANICAL PROJECTS IN CHENN...
MECHANICAL PROJECTS IN PONDICHERRY, 2020-21 MECHANICAL PROJECTS IN CHENN...Nexgen Technology
 
MECHANICAL PROJECTS IN PONDICHERRY, 2020-21 MECHANICAL PROJECTS IN CHE...
    MECHANICAL PROJECTS IN PONDICHERRY,   2020-21  MECHANICAL PROJECTS IN CHE...    MECHANICAL PROJECTS IN PONDICHERRY,   2020-21  MECHANICAL PROJECTS IN CHE...
MECHANICAL PROJECTS IN PONDICHERRY, 2020-21 MECHANICAL PROJECTS IN CHE...Nexgen Technology
 
MECHANICAL PROJECTS IN PONDICHERRY, 2020-21 MECHANICAL PROJECTS IN CHE...
    MECHANICAL PROJECTS IN PONDICHERRY,   2020-21  MECHANICAL PROJECTS IN CHE...    MECHANICAL PROJECTS IN PONDICHERRY,   2020-21  MECHANICAL PROJECTS IN CHE...
MECHANICAL PROJECTS IN PONDICHERRY, 2020-21 MECHANICAL PROJECTS IN CHE...Nexgen Technology
 
MECHANICAL PROJECTS IN PONDICHERRY, 2020-21 MECHANICAL PROJECTS IN CHENNA...
 MECHANICAL PROJECTS IN PONDICHERRY,   2020-21  MECHANICAL PROJECTS IN CHENNA... MECHANICAL PROJECTS IN PONDICHERRY,   2020-21  MECHANICAL PROJECTS IN CHENNA...
MECHANICAL PROJECTS IN PONDICHERRY, 2020-21 MECHANICAL PROJECTS IN CHENNA...Nexgen Technology
 
Ieee 2020 21 vlsi projects in pondicherry,ieee vlsi projects in chennai
Ieee 2020 21 vlsi projects in pondicherry,ieee  vlsi projects  in chennaiIeee 2020 21 vlsi projects in pondicherry,ieee  vlsi projects  in chennai
Ieee 2020 21 vlsi projects in pondicherry,ieee vlsi projects in chennaiNexgen Technology
 
Ieee 2020 21 power electronics in pondicherry,Ieee 2020 21 power electronics
Ieee 2020 21 power electronics in pondicherry,Ieee 2020 21 power electronics Ieee 2020 21 power electronics in pondicherry,Ieee 2020 21 power electronics
Ieee 2020 21 power electronics in pondicherry,Ieee 2020 21 power electronics Nexgen Technology
 
Ieee 2020 -21 ns2 in pondicherry, Ieee 2020 -21 ns2 projects,best project cen...
Ieee 2020 -21 ns2 in pondicherry, Ieee 2020 -21 ns2 projects,best project cen...Ieee 2020 -21 ns2 in pondicherry, Ieee 2020 -21 ns2 projects,best project cen...
Ieee 2020 -21 ns2 in pondicherry, Ieee 2020 -21 ns2 projects,best project cen...Nexgen Technology
 
Ieee 2020 21 ns2 in pondicherry,best project center in pondicherry,final year...
Ieee 2020 21 ns2 in pondicherry,best project center in pondicherry,final year...Ieee 2020 21 ns2 in pondicherry,best project center in pondicherry,final year...
Ieee 2020 21 ns2 in pondicherry,best project center in pondicherry,final year...Nexgen Technology
 
Ieee 2020 21 java dotnet in pondicherry,final year projects in pondicherry,pr...
Ieee 2020 21 java dotnet in pondicherry,final year projects in pondicherry,pr...Ieee 2020 21 java dotnet in pondicherry,final year projects in pondicherry,pr...
Ieee 2020 21 java dotnet in pondicherry,final year projects in pondicherry,pr...Nexgen Technology
 
Ieee 2020 21 iot in pondicherry,final year projects in pondicherry,project ce...
Ieee 2020 21 iot in pondicherry,final year projects in pondicherry,project ce...Ieee 2020 21 iot in pondicherry,final year projects in pondicherry,project ce...
Ieee 2020 21 iot in pondicherry,final year projects in pondicherry,project ce...Nexgen Technology
 
Ieee 2020 21 blockchain in pondicherry,final year projects in pondicherry,bes...
Ieee 2020 21 blockchain in pondicherry,final year projects in pondicherry,bes...Ieee 2020 21 blockchain in pondicherry,final year projects in pondicherry,bes...
Ieee 2020 21 blockchain in pondicherry,final year projects in pondicherry,bes...Nexgen Technology
 
Ieee 2020 -21 bigdata in pondicherry,project center in pondicherry,best proje...
Ieee 2020 -21 bigdata in pondicherry,project center in pondicherry,best proje...Ieee 2020 -21 bigdata in pondicherry,project center in pondicherry,best proje...
Ieee 2020 -21 bigdata in pondicherry,project center in pondicherry,best proje...Nexgen Technology
 
Ieee 2020 21 embedded in pondicherry,final year projects in pondicherry,best...
Ieee 2020 21  embedded in pondicherry,final year projects in pondicherry,best...Ieee 2020 21  embedded in pondicherry,final year projects in pondicherry,best...
Ieee 2020 21 embedded in pondicherry,final year projects in pondicherry,best...Nexgen Technology
 
BULK IEEE 2019-20 PROJECTS IN EMBEDDED ,BULK IEEE PROJECTS, IEEE 2019-20 EMB...
BULK IEEE  2019-20 PROJECTS IN EMBEDDED ,BULK IEEE PROJECTS, IEEE 2019-20 EMB...BULK IEEE  2019-20 PROJECTS IN EMBEDDED ,BULK IEEE PROJECTS, IEEE 2019-20 EMB...
BULK IEEE 2019-20 PROJECTS IN EMBEDDED ,BULK IEEE PROJECTS, IEEE 2019-20 EMB...Nexgen Technology
 
BULK IEEE 2019-20 PROJECTS IN PYTHON,BULK IEEE PROJECTS, IEEE 2019-20 PYTH...
 BULK IEEE  2019-20 PROJECTS IN  PYTHON,BULK IEEE PROJECTS, IEEE 2019-20 PYTH... BULK IEEE  2019-20 PROJECTS IN  PYTHON,BULK IEEE PROJECTS, IEEE 2019-20 PYTH...
BULK IEEE 2019-20 PROJECTS IN PYTHON,BULK IEEE PROJECTS, IEEE 2019-20 PYTH...Nexgen Technology
 
BULK IEEE 2019-20 PROJECTS IN JAVA,BULK IEEE PROJECTS, IEEE 2019-20 JAVA PR...
 BULK IEEE  2019-20 PROJECTS IN JAVA,BULK IEEE PROJECTS, IEEE 2019-20 JAVA PR... BULK IEEE  2019-20 PROJECTS IN JAVA,BULK IEEE PROJECTS, IEEE 2019-20 JAVA PR...
BULK IEEE 2019-20 PROJECTS IN JAVA,BULK IEEE PROJECTS, IEEE 2019-20 JAVA PR...Nexgen Technology
 
BULK IEEE 2019-20 PROJECTS IN EMBEDDED ,BULK IEEE PROJECTS, IEEE 2019-20 EM...
 BULK IEEE  2019-20 PROJECTS IN EMBEDDED ,BULK IEEE PROJECTS, IEEE 2019-20 EM... BULK IEEE  2019-20 PROJECTS IN EMBEDDED ,BULK IEEE PROJECTS, IEEE 2019-20 EM...
BULK IEEE 2019-20 PROJECTS IN EMBEDDED ,BULK IEEE PROJECTS, IEEE 2019-20 EM...Nexgen Technology
 
Intelligent generator of big data medical
Intelligent generator of big data medicalIntelligent generator of big data medical
Intelligent generator of big data medicalNexgen Technology
 
OPTIMIZING END-TO-END BIG DATA TRANSFERS OVER TERABITS NETWORK INFRASTRUCTURE
OPTIMIZING END-TO-END BIG DATA TRANSFERS OVER TERABITS NETWORK INFRASTRUCTUREOPTIMIZING END-TO-END BIG DATA TRANSFERS OVER TERABITS NETWORK INFRASTRUCTURE
OPTIMIZING END-TO-END BIG DATA TRANSFERS OVER TERABITS NETWORK INFRASTRUCTURENexgen Technology
 

More from Nexgen Technology (20)

MECHANICAL PROJECTS IN PONDICHERRY, 2020-21 MECHANICAL PROJECTS IN CH...
     MECHANICAL PROJECTS IN PONDICHERRY,   2020-21  MECHANICAL PROJECTS IN CH...     MECHANICAL PROJECTS IN PONDICHERRY,   2020-21  MECHANICAL PROJECTS IN CH...
MECHANICAL PROJECTS IN PONDICHERRY, 2020-21 MECHANICAL PROJECTS IN CH...
 
MECHANICAL PROJECTS IN PONDICHERRY, 2020-21 MECHANICAL PROJECTS IN CHENN...
  MECHANICAL PROJECTS IN PONDICHERRY,   2020-21  MECHANICAL PROJECTS IN CHENN...  MECHANICAL PROJECTS IN PONDICHERRY,   2020-21  MECHANICAL PROJECTS IN CHENN...
MECHANICAL PROJECTS IN PONDICHERRY, 2020-21 MECHANICAL PROJECTS IN CHENN...
 
MECHANICAL PROJECTS IN PONDICHERRY, 2020-21 MECHANICAL PROJECTS IN CHE...
    MECHANICAL PROJECTS IN PONDICHERRY,   2020-21  MECHANICAL PROJECTS IN CHE...    MECHANICAL PROJECTS IN PONDICHERRY,   2020-21  MECHANICAL PROJECTS IN CHE...
MECHANICAL PROJECTS IN PONDICHERRY, 2020-21 MECHANICAL PROJECTS IN CHE...
 
MECHANICAL PROJECTS IN PONDICHERRY, 2020-21 MECHANICAL PROJECTS IN CHE...
    MECHANICAL PROJECTS IN PONDICHERRY,   2020-21  MECHANICAL PROJECTS IN CHE...    MECHANICAL PROJECTS IN PONDICHERRY,   2020-21  MECHANICAL PROJECTS IN CHE...
MECHANICAL PROJECTS IN PONDICHERRY, 2020-21 MECHANICAL PROJECTS IN CHE...
 
MECHANICAL PROJECTS IN PONDICHERRY, 2020-21 MECHANICAL PROJECTS IN CHENNA...
 MECHANICAL PROJECTS IN PONDICHERRY,   2020-21  MECHANICAL PROJECTS IN CHENNA... MECHANICAL PROJECTS IN PONDICHERRY,   2020-21  MECHANICAL PROJECTS IN CHENNA...
MECHANICAL PROJECTS IN PONDICHERRY, 2020-21 MECHANICAL PROJECTS IN CHENNA...
 
Ieee 2020 21 vlsi projects in pondicherry,ieee vlsi projects in chennai
Ieee 2020 21 vlsi projects in pondicherry,ieee  vlsi projects  in chennaiIeee 2020 21 vlsi projects in pondicherry,ieee  vlsi projects  in chennai
Ieee 2020 21 vlsi projects in pondicherry,ieee vlsi projects in chennai
 
Ieee 2020 21 power electronics in pondicherry,Ieee 2020 21 power electronics
Ieee 2020 21 power electronics in pondicherry,Ieee 2020 21 power electronics Ieee 2020 21 power electronics in pondicherry,Ieee 2020 21 power electronics
Ieee 2020 21 power electronics in pondicherry,Ieee 2020 21 power electronics
 
Ieee 2020 -21 ns2 in pondicherry, Ieee 2020 -21 ns2 projects,best project cen...
Ieee 2020 -21 ns2 in pondicherry, Ieee 2020 -21 ns2 projects,best project cen...Ieee 2020 -21 ns2 in pondicherry, Ieee 2020 -21 ns2 projects,best project cen...
Ieee 2020 -21 ns2 in pondicherry, Ieee 2020 -21 ns2 projects,best project cen...
 
Ieee 2020 21 ns2 in pondicherry,best project center in pondicherry,final year...
Ieee 2020 21 ns2 in pondicherry,best project center in pondicherry,final year...Ieee 2020 21 ns2 in pondicherry,best project center in pondicherry,final year...
Ieee 2020 21 ns2 in pondicherry,best project center in pondicherry,final year...
 
Ieee 2020 21 java dotnet in pondicherry,final year projects in pondicherry,pr...
Ieee 2020 21 java dotnet in pondicherry,final year projects in pondicherry,pr...Ieee 2020 21 java dotnet in pondicherry,final year projects in pondicherry,pr...
Ieee 2020 21 java dotnet in pondicherry,final year projects in pondicherry,pr...
 
Ieee 2020 21 iot in pondicherry,final year projects in pondicherry,project ce...
Ieee 2020 21 iot in pondicherry,final year projects in pondicherry,project ce...Ieee 2020 21 iot in pondicherry,final year projects in pondicherry,project ce...
Ieee 2020 21 iot in pondicherry,final year projects in pondicherry,project ce...
 
Ieee 2020 21 blockchain in pondicherry,final year projects in pondicherry,bes...
Ieee 2020 21 blockchain in pondicherry,final year projects in pondicherry,bes...Ieee 2020 21 blockchain in pondicherry,final year projects in pondicherry,bes...
Ieee 2020 21 blockchain in pondicherry,final year projects in pondicherry,bes...
 
Ieee 2020 -21 bigdata in pondicherry,project center in pondicherry,best proje...
Ieee 2020 -21 bigdata in pondicherry,project center in pondicherry,best proje...Ieee 2020 -21 bigdata in pondicherry,project center in pondicherry,best proje...
Ieee 2020 -21 bigdata in pondicherry,project center in pondicherry,best proje...
 
Ieee 2020 21 embedded in pondicherry,final year projects in pondicherry,best...
Ieee 2020 21  embedded in pondicherry,final year projects in pondicherry,best...Ieee 2020 21  embedded in pondicherry,final year projects in pondicherry,best...
Ieee 2020 21 embedded in pondicherry,final year projects in pondicherry,best...
 
BULK IEEE 2019-20 PROJECTS IN EMBEDDED ,BULK IEEE PROJECTS, IEEE 2019-20 EMB...
BULK IEEE  2019-20 PROJECTS IN EMBEDDED ,BULK IEEE PROJECTS, IEEE 2019-20 EMB...BULK IEEE  2019-20 PROJECTS IN EMBEDDED ,BULK IEEE PROJECTS, IEEE 2019-20 EMB...
BULK IEEE 2019-20 PROJECTS IN EMBEDDED ,BULK IEEE PROJECTS, IEEE 2019-20 EMB...
 
BULK IEEE 2019-20 PROJECTS IN PYTHON,BULK IEEE PROJECTS, IEEE 2019-20 PYTH...
 BULK IEEE  2019-20 PROJECTS IN  PYTHON,BULK IEEE PROJECTS, IEEE 2019-20 PYTH... BULK IEEE  2019-20 PROJECTS IN  PYTHON,BULK IEEE PROJECTS, IEEE 2019-20 PYTH...
BULK IEEE 2019-20 PROJECTS IN PYTHON,BULK IEEE PROJECTS, IEEE 2019-20 PYTH...
 
BULK IEEE 2019-20 PROJECTS IN JAVA,BULK IEEE PROJECTS, IEEE 2019-20 JAVA PR...
 BULK IEEE  2019-20 PROJECTS IN JAVA,BULK IEEE PROJECTS, IEEE 2019-20 JAVA PR... BULK IEEE  2019-20 PROJECTS IN JAVA,BULK IEEE PROJECTS, IEEE 2019-20 JAVA PR...
BULK IEEE 2019-20 PROJECTS IN JAVA,BULK IEEE PROJECTS, IEEE 2019-20 JAVA PR...
 
BULK IEEE 2019-20 PROJECTS IN EMBEDDED ,BULK IEEE PROJECTS, IEEE 2019-20 EM...
 BULK IEEE  2019-20 PROJECTS IN EMBEDDED ,BULK IEEE PROJECTS, IEEE 2019-20 EM... BULK IEEE  2019-20 PROJECTS IN EMBEDDED ,BULK IEEE PROJECTS, IEEE 2019-20 EM...
BULK IEEE 2019-20 PROJECTS IN EMBEDDED ,BULK IEEE PROJECTS, IEEE 2019-20 EM...
 
Intelligent generator of big data medical
Intelligent generator of big data medicalIntelligent generator of big data medical
Intelligent generator of big data medical
 
OPTIMIZING END-TO-END BIG DATA TRANSFERS OVER TERABITS NETWORK INFRASTRUCTURE
OPTIMIZING END-TO-END BIG DATA TRANSFERS OVER TERABITS NETWORK INFRASTRUCTUREOPTIMIZING END-TO-END BIG DATA TRANSFERS OVER TERABITS NETWORK INFRASTRUCTURE
OPTIMIZING END-TO-END BIG DATA TRANSFERS OVER TERABITS NETWORK INFRASTRUCTURE
 

Recently uploaded

Beyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global ImpactBeyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global ImpactPECB
 
Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111Sapana Sha
 
Nutritional Needs Presentation - HLTH 104
Nutritional Needs Presentation - HLTH 104Nutritional Needs Presentation - HLTH 104
Nutritional Needs Presentation - HLTH 104misteraugie
 
APM Welcome, APM North West Network Conference, Synergies Across Sectors
APM Welcome, APM North West Network Conference, Synergies Across SectorsAPM Welcome, APM North West Network Conference, Synergies Across Sectors
APM Welcome, APM North West Network Conference, Synergies Across SectorsAssociation for Project Management
 
Organic Name Reactions for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions  for the students and aspirants of Chemistry12th.pptxOrganic Name Reactions  for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions for the students and aspirants of Chemistry12th.pptxVS Mahajan Coaching Centre
 
The byproduct of sericulture in different industries.pptx
The byproduct of sericulture in different industries.pptxThe byproduct of sericulture in different industries.pptx
The byproduct of sericulture in different industries.pptxShobhayan Kirtania
 
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdfBASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdfSoniaTolstoy
 
CARE OF CHILD IN INCUBATOR..........pptx
CARE OF CHILD IN INCUBATOR..........pptxCARE OF CHILD IN INCUBATOR..........pptx
CARE OF CHILD IN INCUBATOR..........pptxGaneshChakor2
 
Activity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdfActivity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdfciinovamais
 
Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)eniolaolutunde
 
Arihant handbook biology for class 11 .pdf
Arihant handbook biology for class 11 .pdfArihant handbook biology for class 11 .pdf
Arihant handbook biology for class 11 .pdfchloefrazer622
 
BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...
BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...
BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...Sapna Thakur
 
microwave assisted reaction. General introduction
microwave assisted reaction. General introductionmicrowave assisted reaction. General introduction
microwave assisted reaction. General introductionMaksud Ahmed
 
Accessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impactAccessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impactdawncurless
 
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...fonyou31
 
Q4-W6-Restating Informational Text Grade 3
Q4-W6-Restating Informational Text Grade 3Q4-W6-Restating Informational Text Grade 3
Q4-W6-Restating Informational Text Grade 3JemimahLaneBuaron
 
Web & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdfWeb & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdfJayanti Pande
 

Recently uploaded (20)

Beyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global ImpactBeyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global Impact
 
Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111
 
Código Creativo y Arte de Software | Unidad 1
Código Creativo y Arte de Software | Unidad 1Código Creativo y Arte de Software | Unidad 1
Código Creativo y Arte de Software | Unidad 1
 
Nutritional Needs Presentation - HLTH 104
Nutritional Needs Presentation - HLTH 104Nutritional Needs Presentation - HLTH 104
Nutritional Needs Presentation - HLTH 104
 
APM Welcome, APM North West Network Conference, Synergies Across Sectors
APM Welcome, APM North West Network Conference, Synergies Across SectorsAPM Welcome, APM North West Network Conference, Synergies Across Sectors
APM Welcome, APM North West Network Conference, Synergies Across Sectors
 
Organic Name Reactions for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions  for the students and aspirants of Chemistry12th.pptxOrganic Name Reactions  for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions for the students and aspirants of Chemistry12th.pptx
 
Mattingly "AI & Prompt Design: The Basics of Prompt Design"
Mattingly "AI & Prompt Design: The Basics of Prompt Design"Mattingly "AI & Prompt Design: The Basics of Prompt Design"
Mattingly "AI & Prompt Design: The Basics of Prompt Design"
 
The byproduct of sericulture in different industries.pptx
The byproduct of sericulture in different industries.pptxThe byproduct of sericulture in different industries.pptx
The byproduct of sericulture in different industries.pptx
 
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdfBASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdf
 
Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"
Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"
Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"
 
CARE OF CHILD IN INCUBATOR..........pptx
CARE OF CHILD IN INCUBATOR..........pptxCARE OF CHILD IN INCUBATOR..........pptx
CARE OF CHILD IN INCUBATOR..........pptx
 
Activity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdfActivity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdf
 
Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)
 
Arihant handbook biology for class 11 .pdf
Arihant handbook biology for class 11 .pdfArihant handbook biology for class 11 .pdf
Arihant handbook biology for class 11 .pdf
 
BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...
BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...
BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...
 
microwave assisted reaction. General introduction
microwave assisted reaction. General introductionmicrowave assisted reaction. General introduction
microwave assisted reaction. General introduction
 
Accessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impactAccessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impact
 
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...
 
Q4-W6-Restating Informational Text Grade 3
Q4-W6-Restating Informational Text Grade 3Q4-W6-Restating Informational Text Grade 3
Q4-W6-Restating Informational Text Grade 3
 
Web & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdfWeb & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdf
 

Algorithmseer a system for extracting and searching for algorithms in scholarly big data

  • 1. CONTACT: PRAVEEN KUMAR. L (,+91 – 9791938249) MAIL ID: sunsid1989@gmail.com, praveen@nexgenproject.com Web: www.nexgenproject.com, www.finalyear-ieeeprojects.com ALGORITHMSEER: A SYSTEM FOR EXTRACTING AND SEARCHING FOR ALGORITHMS IN SCHOLARLY BIG DATA Abstract Algorithms are usually published in scholarly articles, especially in the computational sciences and related disciplines. The ability to automatically find and extract these algorithms in this increasingly vast collection of scholarly digital documents would enable algorithm indexing, searching, discovery, and analysis. Recently, AlgorithmSeer, a search engine for algorithms, has been investigated as part of CiteSeerX with the intent of providing a large algorithm database. Currently, over 200,000 algorithms have been extracted from over 2 million scholarly documents. This paper proposes a novel set of scalable techniques used by AlgorithmSeer to identify and extract algorithm representations in a heterogeneous pool of scholarly documents. Specifically, hybrid machine learning approaches are proposed to discover algorithm representations. Then, techniques to extract textual metadata for each algorithm are discussed. Finally, a demonstration version of AlgorithmSeer that is built on Solr/Lucene open source indexing and search system is presented. Existing System Standard algorithms1 are usually collected and cataloged manually in algorithm textbooks , encyclopedias (especially the ones available online such as Wikipedia2), and websites that provide references for computer programmers (e.g. Rosettacode.org3). We parsed Wikipedia algorithm pages
  • 2. CONTACT: PRAVEEN KUMAR. L (,+91 – 9791938249) MAIL ID: sunsid1989@gmail.com, praveen@nexgenproject.com Web: www.nexgenproject.com, www.finalyear-ieeeprojects.com published in 2010, and found that roughly 1,765 standard algorithms were cataloged. The National Instituteof Standards and Technology (NIST)4 also has a dictionary of over 289 standard algorithms. While most standard algorithms are already cataloged and made searchable, especially those in online catalogs, newly published algorithms only appear in new articles. The explosion of newly developed algorithms in scientific and technical documents makes it infeasible to manually catalog these newly developed algorithms. Manually searching for these newly published algorithms is a nontrivial task. Researchers and others who aim to discover efficient and innovative algorithms would have to actively search and monitor relevant new publications in their fields of study to keep abreast of latest algorithmic developments. The problem is worse for algorithm searchers. Proposed System We proposea set of hybrid techniques based on ensemble machine learning to discover PCs and APs in scholarly documents. We find that the textual metadata that can be directly extracted from an algorithm representation (especially PCs), is inadequate for effective retrieval, and propose to use the synopsis generation method, which automatically generates a comprehensive description for a document element, and the tag-based automatic metadata annotation technique, which automatically textually enriches metadata records, to extract more meaningful textual information for PCs.
  • 3. CONTACT: PRAVEEN KUMAR. L (,+91 – 9791938249) MAIL ID: sunsid1989@gmail.com, praveen@nexgenproject.com Web: www.nexgenproject.com, www.finalyear-ieeeprojects.com Conclusion The prototype of AlgorithmSeer, a search system for algorithms in large scale scholarly documents, along with providing an illustration of the actual demo system. Specifically, we proposed a set of scalable machine learning based methods to detect algorithms in scholarly documents, we discussed using the synopsis generation and document annotation methods to extract textual metadata for pseudo-codes, and finally we explained how algorithms are indexed and made searchable. REFERENCES [1] Approximating a data stream for querying and estimation: Algorithms and performance evaluation. ICDE ’02, pages 567–576, 2002. [2] A. Asuncion, M. Welling, P. Smyth, and Y. W. Teh. On smoothing and inference for topic models. In Proceedings of the Twenty-Fifth Conference on Uncertainty in Artificial Intelligence, UAI ’09, pages 27–34, Arlington, Virginia, United States, 2009. AUAI Press. [3] J. B. Baker, A. P. Sexton, V. Sorge, and M. Suzuki. Comparing approaches to mathematical document analysis from pdf. ICDAR ’11, pages 463–467, 2011. [4] S. Bhatia and P. Mitra. Summarizing figures, tables, and algorithms in scientific publications to augment search results. ACM Trans. Inf. Syst., 30(1):3:1–3:24, Mar. 2012.
  • 4. CONTACT: PRAVEEN KUMAR. L (,+91 – 9791938249) MAIL ID: sunsid1989@gmail.com, praveen@nexgenproject.com Web: www.nexgenproject.com, www.finalyear-ieeeprojects.com [5] S. Bhatia and P. Mitra. Summarizing figures, tables, and algorithms in scientific publications to augment search results. ACM Trans. Inf. Syst., 30(1):3:1–3:24, Mar. 2012. [6] S. Bhatia, P. Mitra, and C. L. Giles. Finding algorithms in scientific articles. WWW ’10, pages 1061–1062, 2010. [7] S. Bhatia, S. Tuarob, P. Mitra, and C. L. Giles. An Algorithm Search Engine for Software Developers. 2011. [8] D. M. Blei, A. Y. Ng, and M. I. Jordan. Latent dirichlet allocation. J. Mach. Learn. Res., 3:993–1022, Mar. 2003. [9] L. Breiman. Random forest. Machine Learning, 45(1):5–32, 2001. [10] N. V. Chawla, K. W. Bowyer, L. O. Hall, and W. P. Kegelmeyer. Smote: synthetic minority over-sampling technique. J. Artif. Int. Res., 16(1):321–357, 2002.