Data Mining Tool
Neeraj Goswami
Contents
• Data mining
• Data warehouse
• Orange Software
• Orange Widgets
• Demo
What is Data Mining?
• The process of analyzing data from different perspectives
• Summarizing it into useful information
• Information that can be used to increase revenue, cut costs, or both
Analysis(cont…)
Data mining helps analysts recognize significant
• facts
• relationships
• trends
• patterns
• exceptions
• anomalies
that might otherwise go unnoticed.
Industries Using Data Mining
• retail
• finance
• health care
• manufacturing
• transportation
• aerospace
Major Data Mining Tasks
1) Classification: predicting an item class
2) Clustering: descriptive, finding groups of items
3) Deviation detection: predictive, finding changes
4) Forecasting: predicting a parameter value
5) Description: describing a group
6) Link analysis: finding relationships and associations
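As one illustration of the tasks above, link analysis can be sketched as counting how often pairs of items appear together. This is only a minimal sketch: the baskets and the helper name `pair_counts` are invented for illustration and are not part of the slides or of any particular tool.

```python
from collections import Counter
from itertools import combinations

# Hypothetical shopping baskets, invented for illustration only.
baskets = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"butter", "jam"},
    {"bread", "jam", "milk"},
]


def pair_counts(transactions):
    """Count how many transactions contain each unordered pair of items."""
    counts = Counter()
    for items in transactions:
        # Sorting gives every pair a single canonical ordering.
        for pair in combinations(sorted(items), 2):
            counts[pair] += 1
    return counts


counts = pair_counts(baskets)
top_pair, top_count = counts.most_common(1)[0]
print(top_pair, top_count)
```

Pairs that co-occur far more often than others are candidate associations; a real analysis would also weigh how common each item is on its own.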
Data Warehouse
A single, complete and consistent store of data obtained from a variety of different sources, made available to end users in a way they can understand and use in a business context.
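The idea of a consistent store built from different sources can be sketched as mapping each source's records onto one shared schema. This is a rough sketch under stated assumptions: the two source systems, their field names, and the unit conversion are hypothetical and chosen only to show the principle.

```python
# Two hypothetical source systems with different field names and units.
crm_rows = [{"cust_id": 1, "name": "Asha", "revenue_usd": 1200.0}]
billing_rows = [{"customer": 2, "full_name": "Ben", "revenue_cents": 45000}]


def from_crm(row):
    """Map a CRM record onto the shared warehouse schema."""
    return {
        "customer_id": row["cust_id"],
        "name": row["name"],
        "revenue_usd": row["revenue_usd"],
    }


def from_billing(row):
    """Map a billing record onto the shared schema, converting cents to dollars."""
    return {
        "customer_id": row["customer"],
        "name": row["full_name"],
        "revenue_usd": row["revenue_cents"] / 100,
    }


# The "warehouse" here is just one list in which every record has the same shape.
warehouse = [from_crm(r) for r in crm_rows] + [from_billing(r) for r in billing_rows]
for record in warehouse:
    print(record)
```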
Data Warehouse-Layers
Decision Tree (classification algorithm)
Training data:
Age   Smoke   Risk
20    No      Low
25    Yes     High
44    Yes     High
18    No      Low
55    No      High
35    No      Low
Tree for Insurance Risk:
Smoke?
  Yes → High
  No → Age?
    0-35 → Low
    36-100 → High
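The insurance-risk example can be written as a few nested conditions. This is a minimal sketch that assumes the tree splits on Smoke first and then on Age (0-35 versus 36-100); the function name `insurance_risk` is made up for illustration. The six rows are the ones shown on the slide.

```python
# (age, smoke, risk) rows taken from the slide.
ROWS = [
    (20, "No", "Low"),
    (25, "Yes", "High"),
    (44, "Yes", "High"),
    (18, "No", "Low"),
    (55, "No", "High"),
    (35, "No", "Low"),
]


def insurance_risk(age, smoke):
    """Classify risk with a two-level tree: Smoke first, then Age."""
    if smoke == "Yes":
        return "High"
    # Non-smokers are split on age: 0-35 is Low, 36-100 is High.
    return "Low" if age <= 35 else "High"


# Collect any rows the tree classifies differently from the slide's label.
mismatches = [row for row in ROWS if insurance_risk(row[0], row[1]) != row[2]]
print("rows misclassified:", len(mismatches))
```

Checking the tree against its own training rows like this only shows that it fits the data it was built from; it says nothing about how well it would classify new cases.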
Decision tree advantages
• Its model is simple to understand and
interpret
• Requires little data preparation
• Possible to validate a model using
statistical tests.
• Robust
ORANGE SOFTWARE
• Open source
• Component based
• Data visualization
• Analysis for novices and experts
• Data mining through visual programming or Python scripting
• Add-ons for bioinformatics and text mining
• Packed with features for data analytics
Orange Developments
• In 1997: developed in the Bioinformatics Laboratory of the Faculty of Computer and Information Science, University of Ljubljana, Slovenia
• In 2005: extended to data analysis in bioinformatics
• In 2008: installation packages were developed
• In 2009: over 100 widgets were created and maintained
Widgets?
• Orange widgets provide a graphical user interface to Orange’s data mining and machine learning methods. They include widgets for:
  • data entry and preprocessing
  • data visualization
  • classification
Data Widget
Classify Widget
Examples
• Any schema should probably start with the File widget. In the schema below, the widget reads the data, which is then sent to both a Data Table widget and a widget that displays attribute statistics.
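The same data flow (one reader feeding a table view and a statistics view) can be imitated in plain Python. This is only a sketch of the idea using the standard library, not Orange's own API; the function names and the in-memory CSV text are assumptions made for illustration, with the values borrowed from the decision tree slide.

```python
import csv
import io
import statistics

# In-memory CSV text standing in for a data file (values from the decision tree slide).
CSV_TEXT = "\n".join([
    "age,smoke,risk",
    "20,No,Low",
    "25,Yes,High",
    "44,Yes,High",
    "18,No,Low",
    "55,No,High",
    "35,No,Low",
])


def read_rows(source):
    """Play the role of the File widget: read the data once."""
    return list(csv.DictReader(source))


def as_table(rows):
    """Play the role of a data table view: render the rows as tab-separated text."""
    headers = list(rows[0])
    lines = ["\t".join(headers)]
    lines += ["\t".join(row[h] for h in headers) for row in rows]
    return "\n".join(lines)


def attribute_statistics(rows, column):
    """Play the role of an attribute statistics view for one numeric column."""
    values = [float(row[column]) for row in rows]
    return {"min": min(values), "max": max(values), "mean": statistics.mean(values)}


# The same rows are sent to both consumers, as in the schema.
rows = read_rows(io.StringIO(CSV_TEXT))
table_text = as_table(rows)
age_stats = attribute_statistics(rows, "age")
print(table_text)
print(age_stats)
```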
Scatter Plot (a widget)
Scripting
Visualization
DEMO
References
– http://orange.biolab.si/docs/latest/
– http://en.wikipedia.org/wiki/Data_mining
– http://www.oracle.com/technetwork/database/options/advanced-analytics/odm/index.html
– http://orange.biolab.si/features/
– http://en.wikipedia.org/wiki/Orange_(software)
– http://eprints.fri.uni-lj.si/1150/1/DataMining-Kyoto.pdf
THANK YOU
Questions
????
