Slide 01.pdf

1/32
Information retrieval (IR) and Data mining (DM)
By: Dr. LOUNNAS Bilal

2/32
Slide 01: Introduction and contents - Course contents
Course outline
1. Introduction to IR and DM
2. Data indexing
3. Information Retrieval (IR)
4. Data Mining (DM)

3/32
Slide 01: Introduction and contents - Textbook, website, and stuff
- Website: https://sites.google.com/view/lounnasbilal
- Books: Introduction to Information Retrieval; Data Mining: Concepts and Techniques.
- Other stuff: TP (C# preferred), Weka, R, SQL Server BI, and others; projects.

4/32
Definition
Information retrieval (IR) is finding material (usually documents) of an unstructured nature (usually text) that satisfies an information need from within large collections (usually stored on computers).
- Also: information retrieval (IR) is the activity of obtaining information resources relevant to an information need from a collection of information resources.
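To make the definition more concrete, here is one minimal way a system can find the documents that satisfy a simple information need: an inverted index (term to document ids) answering a Boolean AND query. The whitespace tokenizer, the function names, and the three toy documents are illustrative assumptions, not course material; real engines add normalization, stemming, ranking, and index compression.

```python
from collections import defaultdict

def build_inverted_index(docs):
    """Map each lowercase term to the sorted list of ids of documents containing it."""
    index = defaultdict(set)
    for doc_id, text in enumerate(docs):
        for term in text.lower().split():  # naive whitespace tokenization (assumption)
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}

def boolean_and(index, query):
    """Return ids of documents that contain every query term."""
    postings = [set(index.get(term, [])) for term in query.lower().split()]
    if not postings:
        return []
    return sorted(set.intersection(*postings))

# Hypothetical toy collection.
docs = [
    "information retrieval finds relevant documents",
    "data mining discovers patterns in large data",
    "web search is large scale information retrieval",
]
index = build_inverted_index(docs)
print(boolean_and(index, "information retrieval"))  # [0, 2]
print(boolean_and(index, "large data"))  # [1]
```

The index is built once, and each query then only touches the posting lists of its own terms instead of scanning every document.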

5/32
History
- The idea of using computers to search for relevant pieces of information was popularized in the article "As We May Think" by Vannevar Bush in 1945.

6/32
History
- Before the 1970s: manual IR in libraries (manual indexing, manual categorization).
- 1970s to 1980s: automatic IR in libraries.
- Since the 1990s: IR on the web and in digital libraries.

7/32
Terminology
- General: Information Retrieval, Information Need, Query, Retrieval Model, Retrieval Engine, Search Engine, Relevance, Relevance Feedback, Evaluation, Information Seeking, Human-Computer Interaction, Browsing, Interfaces, Ad-hoc Retrieval, Filtering
- Related: Document Management, Knowledge Engineering
- Expert: term frequency, document frequency, inverse document frequency, vector-space model, probabilistic model, BM25, DFR, PageRank, stemming, precision
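Several of the expert terms above (term frequency, document frequency, inverse document frequency, vector-space model, precision) fit together in a few lines. This is a hedged sketch assuming raw tf multiplied by idf = log(N / df) and cosine similarity between sparse vectors; it is plain tf-idf, not BM25 or DFR, and the toy collection and helper names are hypothetical.

```python
import math
from collections import Counter

def tf_idf_vectors(docs):
    """Weight each term by raw term frequency times idf = log(N / df)."""
    tokenized = [d.lower().split() for d in docs]
    n = len(tokenized)
    # Document frequency: number of documents containing each term at least once.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    idf = {term: math.log(n / count) for term, count in df.items()}
    vectors = [{t: tf * idf[t] for t, tf in Counter(tokens).items()} for tokens in tokenized]
    return vectors, idf

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def rank(docs, query):
    """Return ids of documents with non-zero similarity, best match first."""
    vectors, idf = tf_idf_vectors(docs)
    # Query terms that never occur in the collection get weight 0.
    q = {t: tf * idf.get(t, 0.0) for t, tf in Counter(query.lower().split()).items()}
    scored = sorted(((cosine(q, d), i) for i, d in enumerate(vectors)),
                    key=lambda s: (-s[0], s[1]))
    return [i for score, i in scored if score > 0]

def precision(retrieved, relevant):
    """Fraction of retrieved documents that are relevant."""
    return len(set(retrieved) & set(relevant)) / len(retrieved) if retrieved else 0.0

docs = [
    "information retrieval finds relevant documents",
    "data mining discovers patterns in large data",
    "web search is large scale information retrieval",
]
results = rank(docs, "retrieval documents")
print(results)  # [0, 2]
print(precision(results, relevant=[0]))  # 0.5
```

Rare terms such as "documents" get a larger idf than terms shared by several documents, which is why document 0 ranks above document 2 here.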

8/32
Terminology
A great glossary, titled The Modern Information Retrieval Glossary, has been written at the University of California, Berkeley.

9/32
Automated information retrieval
Information retrieval in computer science

10/32
Topics of IR
- Retrieval models
- Text processing
- Efficiency, compression, MapReduce, scalability
- Distributed IR
- Multimedia: image, video, sound, speech
- Web retrieval and social media search
- Cross-lingual IR (FIRE), structured data (XML)
- Digital libraries, enterprise search, legal IR, patent search, genomics IR

11/32
Conferences of IR
- SIGIR: Conference on Research and Development in Information Retrieval
- ECIR: European Conference on Information Retrieval
- CIKM: Conference on Information and Knowledge Management
- WWW: International World Wide Web Conference
- WSDM: Conference on Web Search and Data Mining
- ICTIR: International Conference on Theory of Information Retrieval
- TREC: Text REtrieval Conference

12/32
Definition
Over the past decade, data repositories have grown to hold huge amounts of data, which makes extracting useful information to work on a difficult task.
What is data mining?
DM is the process of discovering interesting knowledge from
large amounts of data stored in databases, data warehouses,
or other information repositories, and summarizing it into useful
information.

13/32
Definition
Data mining can be viewed as simply an essential step in the process of knowledge discovery (KDD):
1. Data cleaning
2. Data integration
3. Data selection

14/32
Definition
DM as a step of KDD
4. Data transformation
5. Data mining
6. Pattern evaluation
7. Knowledge presentation
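The seven KDD steps can be sketched as a chain of functions, one per step. The data, the region/spend attributes, and the concrete choice made at each step (dropping records with missing values, min-max normalization, per-group averages, a 0.5 interestingness threshold) are all hypothetical illustrations; KDD itself does not prescribe any of them.

```python
from collections import defaultdict

# Hypothetical toy sources keyed by customer id (not from the course).
customers = {1: {"region": "north", "age": 34}, 2: {"region": "south", "age": None},
             3: {"region": "north", "age": 51}, 4: {"region": "south", "age": 27}}
sales = {1: 120.0, 2: 80.0, 3: 300.0, 4: 45.0}

def clean(records):
    """1. Data cleaning: drop records that contain missing values."""
    return {k: r for k, r in records.items() if None not in r.values()}

def integrate(records, amounts):
    """2. Data integration: join the two sources on the customer id."""
    return [dict(r, amount=amounts[k]) for k, r in records.items() if k in amounts]

def select(rows):
    """3. Data selection: keep only the attributes relevant to the analysis."""
    return [(r["region"], r["amount"]) for r in rows]

def transform(rows):
    """4. Data transformation: min-max normalize amounts into [0, 1]."""
    lo = min(a for _, a in rows)
    hi = max(a for _, a in rows)
    span = (hi - lo) or 1.0  # avoid dividing by zero when all amounts are equal
    return [(region, (a - lo) / span) for region, a in rows]

def mine(rows):
    """5. Data mining: average normalized amount per region (a simple descriptive pattern)."""
    groups = defaultdict(list)
    for region, a in rows:
        groups[region].append(a)
    return {region: sum(v) / len(v) for region, v in groups.items()}

def evaluate(patterns, threshold=0.5):
    """6. Pattern evaluation: keep only patterns that pass an interestingness threshold."""
    return {k: v for k, v in patterns.items() if v >= threshold}

def present(patterns):
    """7. Knowledge presentation: report the surviving patterns."""
    for region, score in sorted(patterns.items()):
        print(f"{region}: average normalized spend {score:.2f}")

result = evaluate(mine(transform(select(integrate(clean(customers), sales)))))
present(result)  # north: average normalized spend 0.65
```

Each step consumes the previous step's output, so changing one choice (for example, imputing missing ages instead of dropping them) only requires swapping one function.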

15/32
Why is data mining important?

16/32
Why is data mining important?

17/32
Data mining tasks
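Data mining can be categorized into tasks according to the different goals of a data mining practitioner. In practice, the two "high-level" primary goals of data mining are prediction and description.
- Prediction: using some variables to predict unknown or future values of other variables.
- Description: finding human-interpretable patterns that describe the data.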

18/32
Data mining tasks
Classification

19/32
Data mining algorithms
Some DM algorithms
Algorithm      Task
C4.5           Classification
K-Means        Clustering
SVM            Classification and regression
Apriori        Association rules
EM             Estimation
PageRank       Link analysis (ranking)
AdaBoost       Classification and regression
kNN            Classification
Naïve Bayes    Classification
CART           Classification and regression
Table: The best-known data mining algorithms and their tasks
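To illustrate one row of the table: kNN assigns a new example the majority label of its k nearest training examples, which is why it is a classification method. This minimal sketch assumes Euclidean distance, k = 3, and a hypothetical two-class toy dataset.

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    """Label `query` by majority vote among its k nearest training points (Euclidean distance)."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical toy training set: (features, label) pairs.
train = [((1.0, 1.0), "A"), ((1.5, 2.0), "A"), ((2.0, 1.0), "A"),
         ((6.0, 6.0), "B"), ((7.0, 7.5), "B"), ((6.5, 5.5), "B")]

print(knn_predict(train, (1.2, 1.4)))  # A
print(knn_predict(train, (6.8, 6.1)))  # B
```

kNN has no training phase beyond storing the examples; all the work happens at prediction time, which is why it is often called a lazy learner.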

20/32
Some DM algorithms
1. C4.5: decision tree algorithm; very popular, ranked among the top 10 data mining algorithms (Springer, 2008).
2. K-Means: clustering algorithm.
Clustering is the task of grouping a set of objects in such a way that objects in the same group (cluster) are more similar to each other than to those in other groups.
3. SVM (Support Vector Machine): classification.
Given a set of training examples, each marked as belonging to one of two categories, an SVM training algorithm builds a model that assigns new examples to one category or the other.
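The K-Means idea described above is usually computed with Lloyd's iteration: assign each point to its nearest centroid, move each centroid to the mean of its points, and repeat until nothing changes. The sketch below assumes Euclidean distance, initial centroids drawn from the data with a fixed seed, and hypothetical 2-D points; production implementations typically use smarter seeding such as k-means++.

```python
import math
import random

def kmeans(points, k, iterations=100, seed=0):
    """Lloyd's algorithm: alternate assignment and centroid-update steps."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initial centroids: k distinct input points
    clusters = []
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: math.dist(p, centroids[j]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster (keep it if empty).
        new_centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster)) if cluster else centroids[j]
            for j, cluster in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break  # converged: the centroids no longer move
        centroids = new_centroids
    return centroids, clusters

points = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.0), (8.0, 8.0), (8.5, 9.0), (9.0, 8.0)]
centroids, clusters = kmeans(points, k=2)
print(sorted(len(c) for c in clusters))  # [3, 3]
```

The result depends on the initial centroids, so in practice K-Means is often run several times with different seeds and the clustering with the smallest within-cluster distance is kept.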

21/32
Some DM algorithms
4. Apriori: association rule learning, used for frequent itemset mining.
Association rule learning is a method for discovering interesting relations between variables in large databases.
Example: {onions, potatoes} ⇒ {burger}: customers who buy onions and potatoes together are likely to also buy burger meat.
5. EM (Expectation-Maximization): estimation.
Example: estimating model parameters when some values are missing from the data.
22/32
DM process
CRISP-DM: Cross-Industry Standard Process for Data Mining (late 1990s)

23/32
DM process
CRISP-DM: This methodology should make large data mining projects faster, cheaper, more reliable, and more manageable. The life cycle of a data mining project consists of six phases: business understanding, data understanding, data preparation, modeling, evaluation, and deployment. The sequence of the phases is not rigid: moving back and forth between them is always required. The outcome of each phase determines which phase, or which particular task of a phase, has to be performed next. In the CRISP-DM diagram, the arrows indicate the most important and frequent dependencies between phases.

24/32
Kinds of data mining
1. Graph mining: circuits, chemical compounds, protein structures, biological networks, social networks, workflows.
2. Spatial Data Mining: maps, preprocessed remote sensing or medical imaging data.
3. Multimedia Data Mining: audio, video, image, graphics, speech.
4. Text Mining: unstructured data such as news articles, research papers, books, digital libraries, e-mail messages, and Web pages.
5. Mining the World Wide Web: Web mining is a more challenging task that searches for Web structures, ranks the importance of Web contents, discovers the regularity and dynamics of Web contents, and mines Web access patterns.
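Ranking the importance of Web contents is the job of link-analysis algorithms such as PageRank (listed among the algorithms earlier). This sketch uses power iteration over a hypothetical four-page link graph, assuming the commonly used damping factor of 0.85 and spreading the rank of pages without out-links evenly over all pages; that treatment of dangling pages is one common choice among several.

```python
def pagerank(links, damping=0.85, iterations=200, tol=1e-12):
    """Power iteration for PageRank; `links` maps each page to the pages it links to."""
    pages = sorted(set(links) | {q for targets in links.values() for q in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Rank held by pages without out-links is spread evenly over all pages.
        dangling = sum(rank[p] for p in pages if not links.get(p))
        new = {p: (1 - damping) / n + damping * dangling / n for p in pages}
        # Each page passes a damped share of its rank to every page it links to.
        for p, targets in links.items():
            for q in targets:
                new[q] += damping * rank[p] / len(targets)
        converged = sum(abs(new[p] - rank[p]) for p in pages) < tol
        rank = new
        if converged:
            break
    return rank

# Hypothetical link graph: A links to B and C, B links to C, and so on.
web = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
scores = pagerank(web)
print(sorted(scores, key=scores.get, reverse=True))  # ['C', 'A', 'B', 'D']
```

C collects links from three pages and ranks first, while D, which nothing links to, keeps only the baseline share (1 - 0.85) / 4.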

25/32
Data mining application
1. Data mining for finance
2. Data mining for industry sectors
3. Data mining for the telecommunication industry
4. Data mining for biology
5. Data mining for intrusion detection
6. Data mining for education

26/32
Data mining tools
1 SAS Enterprise Miner

27/32
Data mining tools
1 Clementine, from SPSS

28/32
Data mining tools
1 Statistica Data Miner from Statsoft

29/32
Data mining tools
1 Oracle Data Mining (ODM)

30/32
Data mining tools
1 Microsoft SQL Server 2008R2 - Analysis Services

31/32
Data mining tools
1 Weka

32/32
Data mining tools
1 RapidMiner
