This document provides an overview and syllabus for a course on Introduction to Information Retrieval and Applications. The course will be taught on Thursdays from 9:10 a.m. to 12:00 noon in classroom R1322. It will cover topics such as indexing, vector space models, evaluation methods, relevance feedback, probabilistic models and applications like text classification, document clustering and web search. Students will complete programming exercises and a term project, and the course will include homework, a midterm exam and a final project.
Digital preservation faces challenges of scalability, cost, and uncertainty. The thesis proposes applying computational intelligence techniques like swarm intelligence to develop self-preserving digital objects that can autonomously manage their own preservation through replication and format migration using a social network environment. The research will study appropriate behaviors for self-preserving objects, their architecture as intelligent agents, and how social networks can support preservation. Preliminary work on modeling object behaviors has been done. The status report covers completing the literature review, implementing simulation platforms, experimenting in the identified research areas, and publishing results, all toward developing and testing a full prototype.
This document describes research on accessing and documenting relational databases through OWL ontologies. It introduces the topic and outlines the key contributions: a general approach for annotating data sources with ontologies, an extension of the Relational.OWL ontology to model relational databases, automatic extraction of ontologies from relational schemas, and applications of the framework. The paper presents an infrastructure for ontology extraction, including a data model ontology (DMO) to represent relational structure, a data source ontology (DSO) extracted from schemas, and a schema design ontology (SDO) that maps the DSO to the DMO. It also discusses query answering by rewriting SPARQL queries to SQL using the generated ontologies.
imPlag: Detecting Image Plagiarism Using Hierarchical Near Duplicate Retrieval - Prerana Mukherjee
The document presents a method for detecting image plagiarism using hierarchical near duplicate retrieval. The key contributions are a hierarchical feature extraction and indexing technique, evaluation of feature extraction techniques against deformations, and a dataset for testing plagiarism algorithms. The proposed method uses perceptual hashing and SIFT features with hierarchical approximate matching to balance time and accuracy, achieving 81% accuracy.
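As a rough illustration of the perceptual-hashing idea mentioned in the summary above, the sketch below computes a simple average hash and compares two images by Hamming distance. It is only a minimal sketch under stated assumptions: the average hash is one common perceptual hash and is not necessarily the variant used in imPlag, images are assumed to be already reduced to small grayscale grids, and the function names and the `max_distance` threshold are hypothetical.

```python
def average_hash(pixels):
    """Return a list of bits for a small grayscale image (a list of rows).

    Each bit is 1 when the pixel is brighter than the image mean, else 0.
    Assumption: the image has already been downscaled to a small grid.
    """
    flat = [value for row in pixels for value in row]
    mean = sum(flat) / len(flat)
    return [1 if value > mean else 0 for value in flat]


def hamming_distance(hash_a, hash_b):
    """Count the positions at which two equal-length hashes differ."""
    if len(hash_a) != len(hash_b):
        raise ValueError("hashes must have the same length")
    return sum(a != b for a, b in zip(hash_a, hash_b))


def is_near_duplicate(pixels_a, pixels_b, max_distance=1):
    """Treat two images as near duplicates when their hashes are close.

    max_distance is a hypothetical threshold chosen for illustration.
    """
    distance = hamming_distance(average_hash(pixels_a), average_hash(pixels_b))
    return distance <= max_distance
```

Because each bit depends only on whether a pixel is above the image mean, a uniform brightness change leaves the hash unchanged, which is the kind of robustness to deformation that makes such hashes useful as a cheap first-stage filter before more expensive feature matching.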
A listing of the intellectual work of Patanjali Kashyap, mainly containing the names, details, and references of papers, presentations, patents, and public speaking engagements.
Ted Bundy was a serial killer who was born in 1946 in Vermont and executed in 1989 in Florida. He killed 30-36 victims in several states including Washington, Utah, Florida, Colorado, Oregon, Idaho and California in the 1970s. Bundy used manipulation and deceit to gain the trust of his victims before killing them, often by striking them with an object or strangling them.
Personality is most important in a girlfriend/boyfriend according to one person, while another thinks looks are most important when it comes to dating. They have a brief discussion on why each person values different traits over others when seeking a romantic partner.
NBA Jeremy Lin 15-16 season photos (林書豪 NBA 15-16 球季照片) - Chung Yen Chang
This document contains a schedule of NBA games in China including warm-up matches, games for the NBA Cares program in Shenzhen, training courses in China, NBA Global Games in Shenzhen and Shanghai, and 54 regular season games. The document also notes that the music used is "I have a Dream" by ABBA and that all images are from Sports Vision.
Just Lin, Baby! 10 Lessons Jeremy Lin Can Teach Us Before We Go To Work Monda... - ixfinito
This article discusses 10 lessons that can be learned from NBA player Jeremy Lin's recent success with the New York Knicks. The lessons include: 1) Believe in yourself even when others don't, 2) Seize opportunities when they arise, and 3) Rely on family for support and be there for them in return.
This document discusses an animal idiom and asks readers to look at a picture, decipher the real meaning of the idiom, and share their ideas about related words and the possible meaning with their group by posting in a blog.
This document discusses an animal idiom and asks students to look at a picture, discuss with partners what the idiom is about, write down their ideas and related words, and post their list on a blog.
The document provides definitions and examples of common English idioms related to animals. It explains idioms such as "birds of a feather flock together" which means people who are alike tend to group together, and "killing two birds with one stone" which means accomplishing two tasks simultaneously. It also discusses idioms involving cats, chickens, horses, fish and other animals.
The document discusses different types and stages of love over the course of a lifetime. It begins with being loved unconditionally by parents as a child. It then describes experiencing puppy love and adolescent crushes during teenage years that are not serious. It goes on to discuss falling truly in love with the right person as an adult and expressing public displays of affection. The document notes relationships can hit rocky periods but that reconciling after quarrels is possible. Ultimately, it suggests some people find their perfect match to marry.
A short PPT with five idioms of happiness. There is a short test at the end. Simple but effective. With thanks to Presentation Magazines free PPT templates.
This document contains definitions for 30 common English idioms, including their meanings and example sentences. Some idioms defined are "bark up the wrong tree", "cat got your tongue", "easy as pie", "fair-weather friend", "get this show on the road", and "leave no stone unturned". The idioms cover a wide range of topics from being easily frightened to making trouble to showing one's emotions openly. All definitions and examples are sourced from the Scholastic Dictionary of Idioms copyright 1996.
This document contains a list of common idioms and sayings related to animals such as birds, cats, dogs, horses, pigs, and others. Many of the idioms refer to typical animal behaviors and characteristics applied metaphorically to people. The list includes idioms like "birds of a feather flock together", "curiosity killed the cat", "let the sleeping dog lie", "lion's share", and "you can lead a horse to water but you can't make it drink".
The document defines and provides examples of common English weather idioms including "raining cats and dogs" meaning raining heavily, "face like thunder" meaning being very angry, and "storm in a teacup" meaning exaggerating a problem. It also covers idioms such as "chasing rainbows", "lightning fast", "head in the clouds", "snowed under", and "under the weather".
This document defines and provides examples of common idioms that use animals in their meaning or description. It explains idioms such as "monkey business" meaning mischievous behavior, "rat race" meaning an exhausting routine, "cat burglar" referring to a thief who climbs buildings, and "top dog" or most important person in a group. Additionally, it outlines idioms like "cash cow" as a dependable source of income, "eager beaver" describing an enthusiastic hard worker, "road hog" referring to a dangerous driver, and "black sheep" denoting an undesirable member of a group. Sources for the idiom definitions and images are provided.
A man watched as a butterfly struggled to emerge from its cocoon, and decided to help by cutting open the cocoon. However, the butterfly's body was small and shriveled, and its wings did not expand properly, so it was never able to fly. The struggle within the cocoon is necessary for the butterfly to pump fluids into its wings to prepare for flight. Similarly, struggles in life strengthen us and without obstacles, we would not grow strong enough to achieve our potential.
The document describes an activity where a character named George goes through different idioms and their definitions. For each idiom, the user selects the correct definition from multiple choice options. Some of the idioms explained include "raining cats and dogs" which means to rain very heavily, "let the cat out of the bag" which means to tell something that is supposed to be a secret, and "open a can of worms" which means to do something that will cause problems. The purpose is to learn common idioms and their contextual meanings.
Animal Farm by George Orwell is about farm animals who rebel against their human farmer. The animals establish Animal Farm, governed by the principle that all animals are equal. However, the pigs, led by Napoleon, establish themselves as the new ruling class: Snowball is driven out, the Seven Commandments are altered to justify the pigs' behavior, and the pigs gradually start walking on two legs and acting more like humans. In the end, the farm is run much as it was under its human owner, with the pigs and the humans indistinguishable.
Test yourself with our selection of English language quizzes covering grammar, usage and vocabulary for beginner and intermediate level English students. Simply answer all of the questions in the quiz and submit to see your score and other statistics.
A motivational story in PowerPoint by Bro. Oh Teik Bin, Lower Perak Buddhist Association, Teluk Intan, Malaysia. A Life Lesson for all, particularly the many young ones today who take so many things for granted and do not count their blessings. May our Compassion grow.
The document shares a message of friendship and appreciation for the positive impact one can have on others. It tells the recipient that they are special and important to others, and reminds them that their kindness and communication have brought smiles and gladness to people during sad times. The writer is grateful for the friendship and hopes the recipient has a great day.
A blind boy was sitting with a hat out, hoping for donations. The hat only had a few coins. A man changed the sign to say "Today is a day and I cannot see it", appealing to people's empathy. Many more people then donated to the blind boy. The man explained that his new sign conveyed the same message as the original, but in a more impactful way. This story teaches us to appreciate what we have and help those who have less, as well as to think creatively about solving problems.
This document summarizes the key steps and considerations for conducting a systematic literature review (SLR). It provides two examples of SLRs conducted on software architecture evolution and CBSE publications. The main steps discussed are developing a review protocol, searching literature sources, selecting primary studies, extracting and analyzing data, and synthesizing findings. For the software architecture evolution SLR, 82 primary studies were analyzed and classified into different categories. For the CBSE publications SLR, 318 papers were analyzed to understand impact, topics covered, and maturity levels. Both SLRs extracted statistical data on publications and citations to synthesize new findings on research trends and maturity. Validation of the SLR process and findings is also an important consideration.
Big Data and Computer Science Education - James Hendler
- The document discusses the Rensselaer Institute for Data Exploration and Applications (IDEA) and its work in applying data science across various domains like healthcare, business, and the sciences.
- It outlines graduate projects in IDEA that involve collaborations with other Rensselaer research centers and applying data exploration tools.
- It also discusses changes made to Rensselaer's computer science and information technology curriculum to incorporate more training in data analytics, data science challenges, and working with large, unstructured datasets. This includes new concentrations in data science and information dominance.
This document discusses the landscape of patterns for Internet of Things (IoT) and machine learning (ML). It analyzes 33 papers on IoT patterns to classify them by abstraction level, domain specificity, and quality characteristics addressed. It also identifies common issues in ML system development by analyzing 9 papers and categorizes ML practices. Finally, it summarizes the publication trends of ML architecture and design patterns based on 10 papers and 28 gray documents.
Changing the Curation Equation: A Data Lifecycle Approach to Lowering Costs a... - SEAD
This document discusses the Sustainable Environment Actionable Data (SEAD) project, which aims to lower the costs and increase the value of data curation through a data lifecycle approach. SEAD provides lightweight data services to support sustainability research, including secure project workspaces, active and social curation tools, and integrated lifecycle support for data from ingest to long-term preservation. By leveraging technologies like Web 2.0 and standards, SEAD simplifies and automates curation processes using metadata captured from data producers and users. This allows curation activities to begin earlier in the data lifecycle and be distributed across researchers and curators.
This document discusses transforming textbooks for authentic learning through digital and e-textbook technologies. It provides background on the speaker, Prof. Dr. Yasuhisa Tamura, and his research interests. It then discusses definitions and examples of digital textbooks and e-textbooks, highlighting added digital functions. International projects toward standardizing e-textbook formats and functions are summarized, including the EDUPUB specification. National movements toward digital textbooks in various countries are overviewed, and debates around replacing traditional textbooks with e-textbooks are presented. The document concludes with discussions of how classrooms may transform with more learner-centered e-textbook use and references for further reading.
Automatic Classification of Springer Nature Proceedings with Smart Topic Miner - Francesco Osborne
The document summarizes research on automatically classifying Springer Nature proceedings using the Smart Topic Miner (STM). STM extracts topics from publications, maps them to a computer science ontology, selects relevant topics using a greedy algorithm, and infers tags. It was evaluated by 8 Springer Nature editors, who found that STM accurately classified 75-90% of proceedings and improved their work. However, STM is currently limited to computer science, and occasional noisy results were found for books with few chapters. Future work aims to extend STM to characterize topic evolution over time and to directly support author tagging.
This presentation was given by guest lecturer Martin Szomszor of Electric Data Solutions LTD, during the seventh session of the NISO Spring training series "Working with Scholarly APIs." Session Seven, Methods and Tools for Scholarly Data Analytics, was moderated by Phill Jones of MoreBrains Cooperative and held on June 9, 2022.
The document summarizes the experimental project of registering Digital Object Identifiers (DOIs) for research data at the Japan Link Center (JaLC). The project aims to establish workflows for registering DOIs for research data and test the registration of data DOIs. It involves 9 research projects and 14 organizations registering and integrating DOIs for their data through the JaLC system. The project addresses several issues in registering DOIs for dynamic research data, such as data lifecycles, granularity, persistence, and handling changes over time.
An Introduction to Information Retrieval and Applications - sathish sak
The score you get depends on the functions, difficulty and quality of your project.
- For system development: system functions and correctness
- For academic paper presentation: quality and your presentation of the paper
- Major methods/experimental results *must* be presented
- Papers from top conferences are strongly suggested, e.g. SIGIR, WWW, CIKM, WSDM, JCDL, ICMR, …
- Proposals are *required* for each team, and will be counted in the score
Demonstrating a Framework for KOS-based Recommendations Systems - GESIS
This document describes a framework for knowledge organization system (KOS)-based recommendation systems. It discusses two funded projects (IRM I and IRM II) that aimed to implement value-added information retrieval services for digital libraries based on applying scholarly models. These services include term suggestion, query suggestion, and bibliometric analysis. The document outlines the Information Retrieval Service Assessment (IRSA) component, which calculates search term suggestions using co-occurrence analysis of controlled vocabularies harvested via OAI-PMH. It demonstrates the IRSA system and discusses limitations and references.
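The co-occurrence idea behind such term suggestions can be sketched as follows. This is only an illustrative sketch, not the IRSA implementation: the summary does not say which association measure IRSA uses, so raw co-occurrence counts are assumed here, the OAI-PMH harvesting step is left out, and the function names and record layout (one list of controlled-vocabulary terms per record) are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations


def build_cooccurrence(records):
    """Count how often each pair of terms is assigned to the same record.

    records: iterable of term lists, one list per bibliographic record.
    Returns a mapping term -> Counter of co-occurring terms.
    """
    cooccurrence = defaultdict(Counter)
    for terms in records:
        # De-duplicate terms within a record so each pair counts once.
        for term_a, term_b in combinations(sorted(set(terms)), 2):
            cooccurrence[term_a][term_b] += 1
            cooccurrence[term_b][term_a] += 1
    return cooccurrence


def suggest_terms(cooccurrence, term, k=3):
    """Return up to k terms that co-occur most often with the given term.

    Ties are broken alphabetically so the ranking is deterministic.
    """
    neighbours = cooccurrence.get(term, {})
    ranked = sorted(neighbours.items(), key=lambda item: (-item[1], item[0]))
    return [candidate for candidate, _ in ranked[:k]]
```

Raw counts favor very frequent terms; a normalized measure such as the Jaccard index would be a natural alternative if that bias matters.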
The document discusses Japan Link Center's (JaLC) experiment to register DOIs for research data. The experiment aims to establish workflows for registering DOIs for research data using JaLC's system. It involves 9 projects with 14 organizations testing DOI registration for research data. The document outlines several issues in registering DOIs for data, including operations flow, persistent access, granularity, dynamics of data, and quantity of data. It also provides examples of how projects can involve multiple institutions and how data lifecycles differ from literature.
This document provides an overview of resources and techniques for finding and evaluating research evidence. It discusses developing effective search strategies, databases and resources available, evaluating information quality and relevance, avoiding plagiarism, and managing references. Key resources covered include subject guides, reading lists, Summon search tool, interlibrary loans, and bibliographic management software. Techniques for developing search terms, paraphrasing sources, and citing references are also summarized.
INNOVATION AND RESEARCH (Digital Library Information Access) - Libcorpio
Innovation and research, Digital Library Information Access, LIS Education, Library and Information Science, LIS Studies, Information Management, Education and Learning, Library science, Information science, Digital Libraries, Research on Digital Libraries, DL, Innovation in libraries and publishing, Areas of Research for DL, Information Discovery, Collection Management and Preservation, Interoperability, Economic, Social and Legal Issues, Core Topics In Digital Libraries, DL Research Around The World
Leveraging Computational Methods for Theorizing IS Phenomena - Malmi Amadoru
The rapid development of computational methods expands the horizon of opportunities in research methods. Scholars have acknowledged the potential of computationally intensive research approaches for theorizing IS phenomena. However, computationally intensive theory building is still at a nascent stage. This presentation focuses on how to leverage computational methods in the theorizing process, associated challenges, and respective strategies.
Reproducibility in human cognitive neuroimaging: a community-driven data sha... - Nolan Nichols
The document summarizes Nolan Nichols' dissertation defense on a community-driven data sharing framework for integrating and interoperating neuroimaging provenance information. His research aimed to enhance the reusability of neuroimaging data and workflows by advancing data exchange standards that incorporate provenance. Through two phases involving multiple collaborations, he extended existing standards and developed neuroimaging data models and web services to compute and discover provenance from brain imaging workflows in order to improve reproducibility in cognitive neuroimaging research.
RQ1. What are the differences between e-commerce and s-commerce?
RQ2. What are the characteristics of s-commerce?
RQ3. What are the activities of s-commerce?
RQ4. What are the research themes that are addressed in s-commerce studies?
RQ5. What are the limitations and gaps in current research of s-commerce?
The document outlines the procedures for conducting a systematic literature review on social commerce (s-commerce), including developing research questions, defining a search strategy, selecting studies, assessing study quality, extracting and synthesizing data. The review aims to understand the key concepts of s-commerce, explore common research themes,
(a slightly updated version of this talk is at https://doi.org/10.6084/m9.figshare.10301741.v1)
A talk on the role of software in research and how NCSA is responding in terms of people and roles - given at the 2019 Data Science Leadership Summit (https://sites.google.com/msdse.org/datascienceleadership2019/).
This is partially based on a previous paper: Daniel S. Katz, Kenton McHenry, Caleb Reinking, Robert Haines, "Research Software Development & Management in Universities: Case Studies from Manchester's RSDS Group, Illinois' NCSA, and Notre Dame's CRC", 2019 IEEE/ACM 14th International Workshop on Software Engineering for Science (SE4Science)
doi: https://doi.org/10.1109/SE4Science.2019.00009
preprint: https://arxiv.org/abs/1903.00732
This document describes an ISO 15926 reference data engineering methodology developed by TechInvestLab. It uses the ISO 24744 standard to describe work products, the reference data lifecycle stages of identification, characterization, mapping, transfer and verification, processes, roles and tools. The methodology focuses on the tasks of a reference data engineer, including developing and evaluating data for inclusion in reference data libraries. It was developed informally but provides a checklist for reference data projects. However, lessons learned are that it could be improved by rewriting using the OMG Essence methodology language to separate the abstract kernel from concrete practices.
NSF SI2 program discussion at 2014 SI2 PI meeting - Daniel S. Katz
This document discusses software as infrastructure for science and engineering research. It outlines how software is essential to many areas of science, with about half of recent science papers involving software-intensive projects. It also discusses how "long-tail" scientists need advanced infrastructure to handle large data and simulations. The document notes challenges around larger teams, more data and complex systems, and changing hardware and software. It positions software as a critical part of cyberinfrastructure and outlines NSF programs like SI2 and CDS&E that support development of sustainable scientific software infrastructure.
Social Web: (Big) Data Mining | summer 2014/2015 course syllabus - Jakub Ruzicka
Social Web: (Big) Data Mining | ISS FSV UK | Charles University in Prague | Faculty of Social Sciences | Institute of Sociological Studies | bachelor’s course | JSB454 | summer semester 2014/2015
Course Syllabus (version 1.1)
Introduction to Data Mining & Data Analysis | Data Science | Digital Humanities
Big Data | Types of Data | Data Formats | Information Retrieval | Business Intelligence | Law & Ethics of Data Mining
Introduction to Web Technologies for Non-Tech Students | Database Systems | Web Programming | Semantic Web | APIs
Graph Theory | Social Network Analysis | Statistical Procedures, Apps & Tools
Pseudocoding | Introduction to Programming in Python & data mining alternatives comparison | Data Exploration & Preprocessing
Web Scraping | Data Cleaning & Processing | Python Implementation & Libraries, Statistical Procedures, Apps & Tools
Social Media Mining | Data Cleaning & Processing | Python Implementation & Libraries, Statistical Procedures, Apps & Tools
Text Mining | Natural Language Processing | Python Implementation & Libraries, Statistical Procedures, Apps & Tools
Data Visualization | Data Storytelling | Electronic Publishing | Python Implementation & Libraries, Statistical Procedures, Apps & Tools
Student Webinars Week | Introducing Various Free & Open Source Data Mining Software & Apps
Machine Learning, Recommender Systems & Other More Advanced Topics | Large-Scale Data Sets | MapReduce, Hadoop, NoSQL
Course Review | Semestral Projects Consultation & Adjustments | The Remaining 99% of Data Science | Data Science Buzzwords
2. Instructor & TA
• Instructor
– J. H. Wang ( 王正豪 )
– Assistant Professor, CSIE, NTUT
– Office: R1534, Technology Building
– E-mail: jhwang@csie.ntut.edu.tw
– Tel: ext. 4238
– Office Hours: 9:00 am-12:00 noon, every Tuesday and Wednesday
• TA
– Mr. Liu ( 劉瀚之 )
– R1424, Technology Building
IR, Spring 2012 NTUT CSIE 2
3. Course Description
• Course Web Page
– http://www.ntut.edu.tw/~jhwang/IR/
• Time: 9:10 am-12:00 noon, Thu.
• Classroom: R1322, Technology Building
• Textbook:
– Christopher D. Manning, Prabhakar Raghavan and Hinrich Schütze, Introduction to Information Retrieval, Cambridge University Press, 2008.
• Available online
• International Student Edition, imported by Kai-Fa ( 開發 )
Publishing
• Prerequisites:
– Basic knowledge of data structures and algorithms, linear
algebra, and probability theory
– Programming experience is *required* for homework and projects
4. Additional References
• References:
– Ricardo Baeza-Yates and Berthier Ribeiro-Neto,
Modern Information Retrieval: The Concepts and
Technology behind Search, Addison-Wesley, 2011.
• This is the second edition of their 1999 book Modern Information Retrieval. ( 華通 )
– Stefan Buettcher, Charles L.A. Clarke, and Gordon V.
Cormack, Information Retrieval: Implementing and
Evaluating Search Engines, MIT Press, 2010.
– Bruce Croft, Donald Metzler, and Trevor Strohman,
Search Engines: Information Retrieval in Practice,
Addison-Wesley, 2010. ( 全華 )
5. More Books on IR
• Gerard Salton, Automatic Information Organization and Retrieval, McGraw-Hill, 1968.
• Gerard Salton and M.J. McGill, Introduction to Modern Information Retrieval, McGraw-Hill, 1983.
– Two classics, but out of print.
• C. J. van Rijsbergen, Information Retrieval, Butterworths,
1979.
– The classic. More than 30 years old, but still worth reading.
• K. Sparck Jones, P. Willett, Readings in Information
Retrieval, Morgan Kaufmann, 1997.
– A collection of classical IR papers. (out of print)
• I.H. Witten, A. Moffat, T.C. Bell, Managing Gigabytes, 2nd edition, Morgan Kaufmann, 1999.
– The authority on index construction and compression.
6. Grading Policy
• Homework assignments and
programming exercises: 40%
• Mid-term exam: 25%
• Term project: 35%
– Including the proposal and final report
7. Programming Exercises and Term Project
• About 3 programming exercises
– Team-based (at most 2 persons per team)
– You can either write your own code or reuse existing
open source code
• The term project
– Either team-based system development (the same as
programming exercises)
– Or academic paper presentation
• Only one person per team allowed
– A proposal is required before midterm (Apr. 12, 2012)
8. About the Term Project
• The score you get depends on the difficulty and
quality of your project
– For system development:
• System functions and correctness
– For academic paper presentation:
• Quality of the paper and of your presentation
• Major methods/experimental results *must* be presented
• Papers from top conferences are strongly suggested
– E.g. SIGIR, WWW, CIKM, WSDM, JCDL, ICMR, …
• Proposals are *required* for each team, and will be counted in the score
9. Online Submission
• Submission instructions
– Programs, project proposals, and project
reports in electronic files must be submitted to
the TA online at:
• http://140.124.183.39/ir/
– Before submission:
• User name: Your student ID
• Please change your default password at your first
login
10. What this Course is NOT about
• This course will NOT tell you
– The tips and tricks of using search engines, although power users might have better ideas on how to improve them
• There are plenty of books and websites on that…
– How to find books in libraries, although that is somewhat related to basic IR concepts
– How to make money on the Web, although the largest search engine today did just that
29. What Is Information Retrieval?
• “Information retrieval is a field concerned
with the structure, analysis, organization,
storage, searching, and retrieval of
information.” (Salton, 1968)
30. Goal
• Information retrieval (IR): a research field that aims at effectively and efficiently searching for information in text and multimedia documents
• In this course, we will introduce the basic text and query models in IR, retrieval evaluation, indexing and searching, and applications of IR
32. [Figure: the retrieval process. Components: User Interface, Text Operations, Query Expansion, Indexing, Retrieval, Ranking, Inverted Index, Document Collection. Labeled flows: user need, text, logical view, doc representation, user feedback, query, inverted file, retrieved docs, ranked docs.]
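The components named on slide 32 (text operations, indexing, retrieval over an inverted index) can be illustrated with a minimal Python sketch. It assumes a toy in-memory collection and Boolean AND queries; the function names `tokenize`, `build_inverted_index`, and `boolean_and`, as well as the sample documents, are hypothetical and do not come from the course materials.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Text operations: lowercase the text and extract alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_inverted_index(docs):
    """Indexing: map each term to the sorted list of doc IDs that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}


def boolean_and(index, query):
    """Retrieval: return doc IDs that contain every query term (Boolean AND)."""
    terms = tokenize(query)
    if not terms:
        return []
    postings = [set(index.get(term, [])) for term in terms]
    return sorted(set.intersection(*postings))


# Hypothetical toy collection, for illustration only.
docs = {
    1: "Information retrieval and web search",
    2: "Indexing and searching text documents",
    3: "Web search engines use inverted indexes",
}
index = build_inverted_index(docs)
print(boolean_and(index, "web search"))  # [1, 3]
```

This sketch stops at unranked Boolean retrieval; the ranking step in the diagram would order the retrieved documents by a score rather than return them as a plain set.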
33. Topics
• Text IR
– Indexing and searching
– Query languages and operations
• Retrieval evaluation
• Modeling
– Boolean model
– Vector space model
– Probabilistic model
• Applications for IR
– Multimedia IR
– Web search
– Digital libraries
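To make the vector space model listed under Modeling more concrete, here is a minimal sketch of ranked retrieval with tf-idf weights and cosine similarity. It assumes raw term frequency and idf = log(N / df), which is only one of several common weighting schemes; the function names and the sample documents are hypothetical and not taken from the course.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and extract alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(docs):
    """Return ({doc_id: {term: weight}}, idf) using tf * log(N / df) weights."""
    tokenized = {doc_id: tokenize(text) for doc_id, text in docs.items()}
    n_docs = len(docs)
    df = Counter(term for toks in tokenized.values() for term in set(toks))
    idf = {term: math.log(n_docs / count) for term, count in df.items()}
    vectors = {
        doc_id: {term: tf * idf[term] for term, tf in Counter(toks).items()}
        for doc_id, toks in tokenized.items()
    }
    return vectors, idf


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank(docs, query):
    """Return doc IDs ordered by decreasing cosine similarity to the query."""
    vectors, idf = tfidf_vectors(docs)
    # Query terms unseen in the collection are ignored.
    q = {t: tf * idf[t] for t, tf in Counter(tokenize(query)).items() if t in idf}
    scores = {doc_id: cosine(q, vec) for doc_id, vec in vectors.items()}
    return sorted(scores, key=lambda doc_id: scores[doc_id], reverse=True)


# Hypothetical toy collection, for illustration only.
docs = {
    1: "boolean retrieval model",
    2: "vector space model for ranked retrieval",
    3: "web search and link analysis",
}
print(rank(docs, "vector space")[0])  # 2
```

Unlike the Boolean model, which returns an unordered set of matching documents, this approach assigns every document a score, so partial matches can still be ranked.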
34. Organization of the Textbook
• Basics in IR (focus)
– Inverted indexes for Boolean queries (Ch. 1-5)
– Term weighting and vector space model (Ch. 6-7)
– Evaluation in IR (Ch. 8)
• Advanced Topics
– Relevance feedback (Ch. 9)
– XML retrieval (Ch. 10)
– Probabilistic IR (Ch. 11)
– Language models (Ch. 12)
• Machine learning in IR (useful)
– Text classification (Ch. 13-15)
– Document clustering (Ch. 16-18)
• Web Search
– Web crawling and indexes (Ch. 19-20)
– Link analysis (Ch. 21)
35. Pointers to Other Topics
• Cross-language IR
• Image, video, and multimedia IR
• Speech retrieval
• Music retrieval
• User interfaces
• Parallel, distributed, and P2P IR
• Digital libraries
• Information science perspective
• Logic-based approaches to IR
• Natural language processing techniques
36. Tentative Schedule
• Before midterm
– Boolean retrieval (1 wk)
– Indexing (2 wks)
– Vector space model and evaluation (2 wks)
– Relevance feedback (1 wk)
– Probabilistic IR (2 wks)
• After midterm
– Text classification (1-2 wks)
– Document clustering (1-2 wks)
– Web search (2 wks)
– Advanced topics: CLIR, IE, … (2 wks)
– Term Project Presentation (3 wks)
37. Generic Resources
• Wikipedia page on Information Retrieval:
http://en.wikipedia.org/wiki/Information_re
• Information Retrieval Resources:
http://www-csli.stanford.edu/~hinrich/information-retrieval.html
38. Academic Resources
• Journals
– ACM TOIS: Transactions on Information Systems
– JASIST: Journal of the American Society for Information Science and Technology
– IP&M: Information Processing and Management
– IEEE TKDE: Transactions on Knowledge and Data Engineering
• Conferences
– ACM SIGIR: International Conference on Research and Development in Information Retrieval
– WWW: World Wide Web Conference
– ACM CIKM: Conference on Information and Knowledge Management
– JCDL: ACM/IEEE Joint Conference on Digital Libraries
– ACM WSDM: International Conference on Web Search and Data Mining
– TREC: Text Retrieval Conference