This document discusses using the Xpress-Mosel modeling environment for solving data mining problems. It provides an overview of key Mosel features like integration of modeling and solving. It then discusses how Mosel can be used to model various common data mining problems like classification, regression, and clustering as optimization problems. These include formulations as linear programs, mixed integer programs, and stochastic programs.
Mosel: key features
• Integration of modeling and solving
• Programming facilities for pre-/post-processing and algorithms
  – No separation between the modeling statement and the procedure to solve the problem
• Open, modular architecture
• Highly flexible and extensible
Embedding a Mosel model
[Diagram: life cycle of an embedded model — the host program starts, performs data input, executes the model to solve the problem, retrieves the results, outputs them, and terminates.]
Mosel: a modeling language
• Decision variables, linear constraints
• Arrays, (index) sets
• Operators: standard arithmetic, aggregate and set operators, e.g. sum, prod, max, and, or, union, inter
• Loops and selections, e.g. forall-[do], if-then-[elif-then]
• Subroutines: functions and procedures
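These language features can be illustrated with a small self-contained model. This is a hypothetical toy example, not taken from the slides; it assumes the mmxprs solver module is available:

```
model "toy_lp"
  uses "mmxprs"                          ! Xpress-Optimizer solver module

  declarations
    REGIONS = 1..3                       ! an index set
    demand: array(REGIONS) of real       ! a data array
    ship: array(REGIONS) of mpvar        ! decision variables
  end-declarations

  demand :: [20, 35, 15]

  ! loop over the index set to state linear constraints
  forall(r in REGIONS) ship(r) >= demand(r)

  ! aggregate operator "sum" in the objective; solve as an LP
  minimize(sum(r in REGIONS) ship(r))

  writeln("Total shipped: ", getobjval)  ! post-processing / output
end-model
```

Note how the constraint statements and the call to minimize live in the same file: there is no separation between the modeling statement and the solution procedure.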
Architecture
[Diagram: Xpress-Mosel at the center, holding the model entities (e.g. decision variables, constraints, etc.); connected to solver technologies — LP, MIP, Constraint Programming, Stochastic Programming; to pre- and post-processing algorithms producing e.g. optimal solutions; and to enterprise data: customer history, available products, profitability models, content, …]
Mosel: components and interfaces
• Mosel Language: to implement problems and solution algorithms
  → model or Mosel program
• Mosel Model Compiler and Run-time Libraries: to compile, execute and access models from a programming language
  → C, C++, Java or VB program
• Mosel Native Interface (NI): to provide new functionality or extend existing functionality of the Mosel language
  → module
Xpress-IVE
• Development environment
• Enables rapid prototyping and testing
• Entity tree for data, variables and constraints
• Matrix visualization
• Branch-and-bound tree visualization
• LP, MIP and user-defined charts
Data Mining Application Areas
Extracting useful information from large datasets of various natures and origins arising in
• Finance
• Manufacturing
• Biomedicine
• Telecommunications
• Military Systems
• Other areas
Approaches
• LP, MIP
• QP, MIQP
• Network Optimization
• Statistical Preprocessing
• Combinations of these Approaches
Classification Problems: general setup
• "Training dataset": N elements (xi, yi), i = 1,…,N
  – xi is an n-dimensional vector of the element's attributes (features)
  – yi denotes the class attribute (the number of classes is specified)
Classification Problems: general setup
• A new element arrives with known attributes x but unknown class attribute y
• The problem is to determine which class this element belongs to
• The classification model is "trained" on the training dataset and applied to new elements
Classification Problems: general setup
• Main idea: construct separating surfaces in the n-dimensional space that divide it into several regions
• Each region corresponds to a certain class
• The new element is classified according to its geometric location in the vector space
Classification Problems: LP approach
• Consider binary classification with one separating plane
• The plane is represented by the standard equation w·x = γ, where w is the normal vector and γ is the threshold
• The problem is to find the optimal values of the parameters w and γ
Classification Problems: LP approach
• Suppose that the vectors xi from the training dataset are stored in two matrices A (m×n) and B (k×n), corresponding to the m elements of the 1st class and the k elements of the 2nd class
• The plane will perfectly separate the elements in A from the elements in B if Aw ≥ eγ + e and Bw ≤ eγ − e (componentwise, where e denotes a vector of ones)
Classification Problems: LP approach
• Extra variables y and z are introduced to model the classification errors of the elements in A and B, respectively
• The parameters w and γ are determined from the LP problem of minimizing the total misclassification error
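The LP itself did not survive extraction; a standard robust-LP formulation consistent with the setup above (in the spirit of Bennett and Mangasarian's linear-programming discrimination) reads:

```latex
\begin{aligned}
\min_{w,\,\gamma,\;y \ge 0,\;z \ge 0}\quad
  & \frac{1}{m}\sum_{i=1}^{m} y_i \;+\; \frac{1}{k}\sum_{j=1}^{k} z_j \\
\text{s.t.}\quad
  & Aw - e\gamma + y \ge e \\
  & -Bw + e\gamma + z \ge e
\end{aligned}
```

Here y and z absorb the violations of the perfect-separation conditions Aw ≥ eγ + e and Bw ≤ eγ − e, so the objective is the average misclassification error per class.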
Classification Problems: generalized approaches
• Using multiple, non-linear separating surfaces (e.g., polynomial, exponential, logarithmic)
  – Finding the parameters of these surfaces can also be reduced to LP
• Selecting a minimum number of attributes (features) to take into account in classification (feature selection)
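The slides do not show how feature selection is formulated; one common device (an illustrative assumption, not necessarily the formulation intended here) augments the misclassification objective with a weighted 1-norm of w, which drives many components of w to zero and thus removes the corresponding attributes:

```latex
\min_{w,\,\gamma}\;\; \text{(misclassification error)} \;+\; \lambda \sum_{j=1}^{n} |w_j|,
\qquad
|w_j| \text{ linearized via } -v_j \le w_j \le v_j,\; v_j \ge 0,
```

which keeps the problem an LP.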
Regression Problems: General Setup
• N elements (xi, yi), i = 1,…,N; xi is a vector in Rn, yi is a scalar in R
• Find a linear relationship between xi and yi, i.e., find a vector β in Rn such that xi·β ≈ yi for all i
• We need to minimize Σi |xi·β − yi| (which reduces to an LP) or Σi (xi·β − yi)² (least squares, a QP)
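The 1-norm objective can be modeled directly as an LP in Mosel. The following is a minimal hypothetical sketch with invented toy data (it assumes the mmxprs module; "dev" are auxiliary variables for the absolute deviations):

```
model "l1_regression"
  uses "mmxprs"

  declarations
    N = 4                            ! observations (toy data for illustration)
    DIM = 2                          ! attributes, incl. a constant column
    x: array(1..N, 1..DIM) of real
    y: array(1..N) of real
    beta: array(1..DIM) of mpvar     ! regression coefficients
    dev: array(1..N) of mpvar        ! dev(i) >= |x(i)·beta - y(i)|
  end-declarations

  x :: [1,1, 2,1, 3,1, 4,1]
  y :: [2.1, 3.9, 6.0, 8.1]

  forall(j in 1..DIM) beta(j) is_free   ! coefficients may be negative

  forall(i in 1..N) do
    dev(i) >=   sum(j in 1..DIM) x(i,j)*beta(j) - y(i)
    dev(i) >= -(sum(j in 1..DIM) x(i,j)*beta(j) - y(i))
  end-do

  minimize(sum(i in 1..N) dev(i))       ! total absolute deviation
  writeln("beta: ", getsol(beta(1)), " ", getsol(beta(2)))
end-model
```

At the optimum each dev(i) equals the absolute residual of observation i, so the LP minimizes the sum of absolute errors.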
Clustering Problems
• Given a dataset, we need to assign the elements to K clusters according to an appropriate similarity criterion. The number of clusters K is usually not known a priori.
• Standard algorithms for a fixed number of clusters:
  – k-median
  – k-means
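For reference, in one common statement these heuristics aim to minimize the following within-cluster objectives over the cluster centers c1,…,cK:

```latex
\text{k-means:}\quad \min_{c_1,\dots,c_K} \sum_{i=1}^{N} \min_{k} \lVert x_i - c_k \rVert_2^2
\qquad\qquad
\text{k-median:}\quad \min_{c_1,\dots,c_K} \sum_{i=1}^{N} \min_{k} \lVert x_i - c_k \rVert_1
```

(k-median variants often additionally restrict the centers to be data points.)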
Integer Programming approach to classification and regression using clustering techniques
• CRIO software package (Bertsimas & Shioda, 2002)
• Similar approaches for both classification and regression
• Outline
  – Preprocess the data by assigning points to small clusters to reduce the dimensionality
  – Solve a mixed integer problem that assigns clusters to groups and removes outliers; in the case of regression, the model also selects the regression coefficients for each group
  – Solve continuous optimization problems (quadratic optimization problems for classification and linear optimization problems for regression) that assign groups to polyhedral regions
Extending Mosel: the Native Interface
• Modular environment and open architecture
• A module is a dynamic library
• Not dedicated to any particular use:
  – Solvers: Xpress-Optimizer, CHIP, OptQuest
  – Database access: ODBC
  – System commands
Stochastic Programming (SP)
• Stochastic Programming: decision making under uncertainty
  – Model future uncertainty in the mathematical program as scenarios
  – Make optimal decisions to hedge against the future
Available features
New types
• Svalue: stochastic values that take different values with certain probabilities, e.g. demand
• Smpvar: stochastic decision variables that take different values under different scenarios
• Slinctr: stochastic constraints built on linear expressions containing reals, Svalues and Smpvars
Advantages
• Elimination of scenario-indexed entities, e.g. with T = 3:

  x: array(1..T) of Smpvar
  Dem: array(1..T-1) of Svalue
  c: Slinctr
  c := sum(t in 1..T) x(t) <= Inventory

  instead of

  Scenarios = 1..6
  x: array(1..T, Scenarios) of mpvar
  Dem: array(1..T-1, Scenarios) of real
  c: array(Scenarios) of linctr
  forall(s in Scenarios)
    c(s) := sum(t in 1..T) x(t,s) <= Inventory
Statistical Preprocessing of the Data
• In many cases, it is helpful to use statistical preprocessing of the data before applying mathematical programming techniques