This document provides an overview of an information retrieval and data mining course. It defines information retrieval as finding unstructured documents that satisfy an information need. Data mining is defined as discovering useful patterns from large amounts of data. The document outlines topics covered in the course such as retrieval models, text processing, and data mining algorithms like decision trees and clustering. Popular data mining tools and applications are also mentioned.
Slide 01.pdf
1. 1/32
About the course Primitives of IR Topics and Conferences of IR Primitives of DM Advanced DM
Information retrieval (IR) and Data mining (DM)
By: Dr. LOUNNAS Bilal
Information retrieval (IR) and Data mining (DM) Dr B. Lounnas
2. 2/32
Slide 01: Introduction and contents - Course contents
Course outline
- Introduction to IR and DM
- Data indexing
- Information retrieval (IR)
- Data mining (DM)
3. 3/32
Slide 01: Introduction and contents - Textbooks, website, and other material
- Website: https://sites.google.com/view/lounnasbilal
- Books: Introduction to Information Retrieval; Data Mining: Concepts and Techniques.
- Other material: lab sessions (TP, preferably in C#), Weka, R, SQL Server BI, and others; projects.
4. 4/32
Definition
Information retrieval (IR) is finding material (usually documents) of an unstructured nature (usually text) that satisfies an information need from within large collections (usually stored on computers).
- Also: information retrieval (IR) is the activity of obtaining information resources relevant to an information need from a collection of information resources.
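As an illustration of the definition above, here is a minimal sketch of retrieval over a small text collection using an inverted index. The collection, the document ids, and the function names are hypothetical, and matching every query term is only one simple way to model an information need.

```python
from collections import defaultdict

# Hypothetical toy collection; ids and texts are illustrative only.
DOCS = {
    1: "information retrieval finds relevant documents",
    2: "data mining discovers patterns in large data",
    3: "retrieval of documents from large collections",
}


def build_index(docs):
    """Map each term to the set of ids of the documents that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return the sorted ids of the documents that contain every query term."""
    terms = query.lower().split()
    if not terms:
        return []
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return sorted(result)


INDEX = build_index(DOCS)
print(search(INDEX, "retrieval documents"))  # [1, 3]
print(search(INDEX, "large data"))           # [2]
```

The index is built once and reused for every query, which is why indexing is treated as its own step before retrieval.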
5. 5/32
History
- The idea of using computers to search for relevant pieces of information was popularized in the article “As We May Think” by Vannevar Bush in 1945.
6. 6/32
History
- Before the 1970s: manual IR in libraries (manual indexing, manual categorization).
- 1970s to 1980s: automatic IR in libraries.
- From the 1990s onward: IR on the web and in digital libraries.
7. 7/32
Terminology
- General: Information Retrieval, Information Need, Query, Retrieval Model, Retrieval Engine, Search Engine, Relevance, Relevance Feedback, Evaluation, Information Seeking, Human-Computer Interaction, Browsing, Interfaces, Ad-hoc Retrieval, Filtering
- Related: Document Management, Knowledge Engineering
- Expert: term frequency, document frequency, inverse document frequency, vector-space model, probabilistic model, BM25, DFR, PageRank, stemming, precision
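Several of the expert terms above (term frequency, document frequency, inverse document frequency, vector-space model) fit together in one small ranking example. The following is a minimal sketch, assuming raw term counts for tf, idf = log(N / df), and cosine similarity for ranking; the toy documents and all function names are hypothetical, and real systems use many weighting variants.

```python
import math
from collections import Counter

# Hypothetical toy collection used only for illustration.
DOCS = [
    "information retrieval and search engines",
    "data mining and knowledge discovery",
    "retrieval models for web search",
]


def tokenize(text):
    """Lowercase and split on whitespace (no stemming in this sketch)."""
    return text.lower().split()


def idf_table(tokenized_docs):
    """Inverse document frequency: log(N / df) for every term in the collection."""
    n = len(tokenized_docs)
    df = Counter()
    for tokens in tokenized_docs:
        df.update(set(tokens))  # count each term once per document
    return {term: math.log(n / count) for term, count in df.items()}


def tfidf_vector(tokens, idf):
    """Sparse vector of tf * idf weights; terms unknown to the collection are dropped."""
    tf = Counter(tokens)
    return {term: tf[term] * idf[term] for term in tf if term in idf}


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank(query, docs):
    """Return indices of documents with a non-zero score, best match first."""
    tokenized = [tokenize(d) for d in docs]
    idf = idf_table(tokenized)
    doc_vectors = [tfidf_vector(tokens, idf) for tokens in tokenized]
    query_vector = tfidf_vector(tokenize(query), idf)
    scored = [(cosine(query_vector, vec), i) for i, vec in enumerate(doc_vectors)]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [i for score, i in scored if score > 0]


print(rank("web search", DOCS))  # [2, 0]
```

The third document ranks first because it contains the rarer term "web", which has a higher idf than "search".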
8. 8/32
Terminology
A great glossary, titled The Modern Information Retrieval Glossary, has been written at the University of California, Berkeley.
9. 9/32
Automated information retrieval
Information retrieval in computer science
10. 10/32
Topics of IR
- Retrieval models
- Text processing
- Efficiency, compression, MapReduce, scalability
- Distributed IR
- Multimedia: image, video, sound, speech
- Web retrieval and social media search
- Cross-lingual IR (FIRE), structured data (XML)
- Digital libraries, enterprise search, legal IR, patent search, genomics IR
11. 11/32
Conferences of IR
- SIGIR: Conference on Research and Development in Information Retrieval
- ECIR: European Conference on Information Retrieval
- CIKM: Conference on Information and Knowledge Management
- WWW: International World Wide Web Conference
- WSDM: Conference on Web Search and Data Mining
- ICTIR: International Conference on the Theory of Information Retrieval
- TREC: Text REtrieval Conference
12. 12/32
Definition
Over the past decade, data repositories have grown to hold huge amounts of data, which makes it difficult to extract useful information to work with.
What is Data Mining
DM is the process of discovering interesting knowledge from
large amounts of data stored in databases, data warehouses,
or other information repositories, and summarizing it into useful
information.
13. 13/32
Definition
Data mining can be viewed as simply an essential step in the process of knowledge discovery in databases (KDD). The process begins with:
1. Data cleaning
2. Data integration
3. Data selection
14. 14/32
Definition
DM as a step of KDD (continued)
4. Data transformation
5. Data mining
6. Pattern evaluation
7. Knowledge presentation
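The KDD steps (data cleaning, integration, selection, transformation, data mining, pattern evaluation, knowledge presentation) can be sketched as a chain of small functions. This is only an illustrative sketch: the sales records, field names, and the choice of "items bought by many customers" as the pattern are all assumptions, not part of the course material.

```python
from collections import Counter

# Hypothetical raw records from two sources; field names are illustrative.
SALES_A = [
    {"customer": "ann", "item": "Bread ", "store": "north"},
    {"customer": "bob", "item": None, "store": "north"},
]
SALES_B = [
    {"customer": "ann", "item": "milk", "store": "south"},
    {"customer": "cy", "item": "bread", "store": "south"},
    {"customer": "cy", "item": "milk", "store": "south"},
]


def clean(records):
    """Data cleaning: drop records with a missing item and normalize the text."""
    return [dict(r, item=r["item"].strip().lower()) for r in records if r["item"]]


def integrate(*sources):
    """Data integration: combine several sources into one list of records."""
    return [record for source in sources for record in source]


def select(records, fields):
    """Data selection: keep only the fields relevant to the analysis."""
    return [{f: r[f] for f in fields} for r in records]


def transform(records):
    """Data transformation: group items into one basket per customer."""
    baskets = {}
    for r in records:
        baskets.setdefault(r["customer"], set()).add(r["item"])
    return baskets


def mine(baskets):
    """Data mining: count how many baskets contain each item."""
    counts = Counter()
    for items in baskets.values():
        counts.update(items)
    return counts


def evaluate(counts, min_count):
    """Pattern evaluation: keep only items bought by at least min_count customers."""
    return {item: c for item, c in counts.items() if c >= min_count}


def present(patterns):
    """Knowledge presentation: render the retained patterns as readable lines."""
    return [f"{item}: bought by {c} customers" for item, c in sorted(patterns.items())]


def kdd(min_count=2):
    records = integrate(clean(SALES_A), clean(SALES_B))
    records = select(records, ["customer", "item"])
    return present(evaluate(mine(transform(records)), min_count))


print(kdd())  # ['bread: bought by 2 customers', 'milk: bought by 2 customers']
```

Data mining proper is only one function in the chain, which is the point of the KDD view: most of the work happens before and after it.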
15. 15/32
Why is data mining important?
16. 16/32
Why is data mining important?
17. 17/32
Data mining tasks
Data mining can be categorized into tasks according to the different goals of the data mining practitioner. In practice, the two high-level primary goals of data mining are prediction and description.
- Prediction
- Description
18. 18/32
Data mining tasks
Classification
19. 19/32
Data mining algorithms
Some DM algorithms
Algorithm | Task
C4.5 | Classification
K-Means | Clustering
SVM | Classification and regression
Apriori | Association rules
EM | Estimation
PageRank | Link analysis (ranking)
AdaBoost | Classification and regression
kNN | Classification
Naïve Bayes | Classification
CART | Classification and regression
Table: The best-known data mining algorithms and their tasks
20. 20/32
Data mining algorithms
Some DM algorithms
1. C4.5: decision tree, very popular; listed among the top 10 data mining algorithms (Springer, 2008).
2. K-Means: clustering algorithm.
Clustering is the task of grouping a set of objects in such a way that objects in the same group are more similar to each other than to those in other groups.
3. SVM (support vector machine): classification.
Given a set of training examples, each marked as belonging to one or the other of two categories, a classification algorithm builds a model that assigns new examples to one category or the other.
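To make the clustering item above concrete, here is a minimal K-Means sketch in plain Python. It assumes Euclidean distance, a fixed iteration cap, and a deterministic initialization from the first k points (real implementations usually initialize randomly or with k-means++); the sample points and the function name are hypothetical.

```python
import math

# Hypothetical 2-D points forming two visible groups.
POINTS = [(1.0, 1.0), (1.5, 2.0), (8.0, 8.0), (9.0, 9.5), (1.0, 0.5), (8.5, 9.0)]


def kmeans(points, k, iterations=20):
    """Alternate assignment and centroid update until the centroids stop moving."""
    # Assumption: use the first k points as the initial centroids.
    centroids = [points[i] for i in range(k)]
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        # Assignment step: attach every point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c, members in enumerate(clusters):
            if members:
                mean = tuple(sum(axis) / len(members) for axis in zip(*members))
                new_centroids.append(mean)
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, clusters


centroids, clusters = kmeans(POINTS, k=2)
for center, members in zip(centroids, clusters):
    print(center, members)
```

On this data the three points near the origin end up in one cluster and the three points near (8.5, 9) in the other, even though both initial centroids start in the first group.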
21. 21/32
Data mining algorithms
Some DM algorithms
1. Apriori: association rule learning, used for frequent itemset mining.
Association rule learning is a method for discovering interesting relations between variables in large databases.
Example: {onions, potatoes} => {burger}, meaning that customers who buy onions and potatoes together are also likely to buy burger meat.
2. EM (expectation maximization): estimation.
Example: estimating model parameters when some values are missing from the data.
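The onions, potatoes, and burger example can be quantified with support and confidence, the two measures association rule mining relies on. The sketch below is a simplified level-wise search in the spirit of Apriori: it grows frequent itemsets one item at a time but omits Apriori's subset-pruning step. The transactions, the threshold, and the function names are hypothetical.

```python
# Hypothetical market baskets used only for illustration.
TRANSACTIONS = [
    {"onions", "potatoes", "burger"},
    {"onions", "potatoes", "burger", "beer"},
    {"onions", "potatoes"},
    {"potatoes", "beer"},
    {"milk", "bread"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item of the itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Estimate of P(consequent | antecedent) from the transactions."""
    base = support(antecedent, transactions)
    joint = support(set(antecedent) | set(consequent), transactions)
    return joint / base if base else 0.0


def frequent_itemsets(transactions, min_support):
    """Level-wise search: frequent k-itemsets are joined into (k+1)-candidates."""
    items = sorted({item for t in transactions for item in t})
    current = [frozenset([i]) for i in items
               if support([i], transactions) >= min_support]
    frequent = {}
    while current:
        for itemset in current:
            frequent[itemset] = support(itemset, transactions)
        size = len(current[0]) + 1
        # Join step: unions of two frequent itemsets that differ by one item.
        candidates = {a | b for a in current for b in current if len(a | b) == size}
        current = [c for c in candidates
                   if support(c, transactions) >= min_support]
    return frequent


frequent = frequent_itemsets(TRANSACTIONS, min_support=0.4)
print(len(frequent))  # 9
print(confidence({"onions", "potatoes"}, {"burger"}, TRANSACTIONS))
```

Here the rule {onions, potatoes} => {burger} has support 0.4 (two of five baskets) and confidence 2/3 (two of the three baskets that contain onions and potatoes).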
22. 22/32
DM process
CRISP-DM: Cross-Industry Standard Process for Data Mining, developed in the late 1990s.
23. 23/32
DM process
CRISP-DM: this methodology is intended to make large data mining projects faster, cheaper, more reliable and more manageable.
The life cycle of a data mining project consists of six phases: business understanding, data understanding, data preparation, modeling, evaluation, and deployment.
The sequence of the phases is not rigid. Moving back and forth between them is always required. The outcome of each phase determines which phase, or which particular task of a phase, has to be performed next. The arrows indicate the most important and frequent dependencies between phases.
24. 24/32
Kinds of data mining
1. Graph mining: circuits, chemical compounds, protein structures, biological networks, social networks, workflows.
2. Spatial data mining: maps, preprocessed remote sensing or medical imaging data.
3. Multimedia data mining: audio, video, image, graphics, speech.
4. Text mining: unstructured data such as news articles, research papers, books, digital libraries, e-mail messages, and Web pages.
5. Mining the World Wide Web: Web mining is a more challenging task that searches for Web structures, ranks the importance of Web contents, discovers the regularity and dynamics of Web contents, and mines Web access patterns.
25. 25/32
Data mining applications
1. Data mining for finance
2. Data mining for industry sectors
3. Data mining for the telecommunication industry
4. Data mining for biology
5. Data mining for intrusion detection
6. Data mining for education
26. 26/32
Data mining tools
1 SAS Enterprise Miner
27. 27/32
Data mining tools
1 Clementine, from SPSS
28. 28/32
Data mining tools
1 Statistica Data Miner from Statsoft
29. 29/32
Data mining tools
1 Oracle Data Mining (ODM)
30. 30/32
Data mining tools
1 Microsoft SQL Server 2008R2 - Analysis Services
31. 31/32
Data mining tools
1 Weka
32. 32/32
Data mining tools
1 RapidMiner