Michael Gilman, Ph.D.
Copyright Data Mining Technologies Inc. 2012
            www.data-mine.com
              631-692-4400 ext. 100

What is Data Mining?

          An information extraction activity which
          has as its goal the discovery of hidden facts
          contained in databases. It finds patterns and
          subtle relationships in data, inferring rules
          and generalizations that allow the prediction
          of future results. To be a true knowledge
          discovery method, a data mining tool should
          unearth information automatically.

pres071911a                                               2
Overview

              The purpose of this presentation is to
              introduce a new and powerful
              methodology and associated software
              that overcomes many of the limitations
              of the other data mining methods in
              use today.


Background

              The manual extraction of patterns from data has occurred for centuries.
              Early methods of identifying patterns in data included statistical
              methods such as Bayes' theorem (1700s) and regression analysis
              (1800s). The proliferation, ubiquity and increasing power of computer
              technology have increased data collection, storage and manipulation.
              As data sets have grown in size and complexity, direct hands-on data
              analysis has increasingly been augmented with indirect, automatic data
              processing. This has been aided by other discoveries in computer
              science, such as neural networks, clustering, genetic algorithms
              (1950s), decision trees (1960s) and support vector machines (1990s).
              Data mining is the process of applying these methods to data with the
              intention of uncovering hidden patterns. (Wikipedia)


How Does Data Mining Work?
          Data mining involves building predictive models
          that help an enterprise understand how to operate
          more effectively.
          Building a predictive model takes several steps.
          Before outlining them, consider a real-world
          problem.




Question:


How can we keep healthcare quality
    high and keep costs down?
Input Data:
File containing clinical data and costs




Steps in Data Mining
   Define the problem goals

   Identify data sources




Then build the model

[Diagram: Data -> Mine Data -> Model (If-Then Rules)]

Results
  A model containing rules that show the
   best-of-breed (BOB) treatment for each case and
   why
If diagnosis = Congestive HF and Age = 60-70
   and previous bypass = yes and . . .
   Then BOB Treatment = aortic stent
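
A rule of this kind can be read as a set of attribute conditions plus a conclusion. The sketch below only illustrates that structure and is not the Nuggets rule format: the `Rule` class, the field names, and the sample record are hypothetical, and the conditions elided on the slide (". . .") are deliberately left out.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional

Record = Dict[str, object]


@dataclass
class Rule:
    """A hypothetical if-then rule: every condition must hold for it to fire."""
    conditions: Dict[str, Callable[[object], bool]]
    conclusion: str

    def fires(self, record: Record) -> bool:
        # A missing attribute counts as "condition not met".
        return all(
            name in record and test(record[name])
            for name, test in self.conditions.items()
        )


# Only the conditions spelled out on the slide; the elided ones are omitted.
rule = Rule(
    conditions={
        "diagnosis": lambda v: v == "Congestive HF",
        "age": lambda v: 60 <= v <= 70,
        "previous_bypass": lambda v: v == "yes",
    },
    conclusion="aortic stent",
)


def recommend(record: Record) -> Optional[str]:
    """Return the rule's conclusion if it fires, otherwise None."""
    return rule.conclusion if rule.fires(record) else None


# Made-up record used purely to exercise the rule.
patient = {"diagnosis": "Congestive HF", "age": 64, "previous_bypass": "yes"}
print(recommend(patient))
```

Because each condition is a named test on one attribute, the reason a rule fired can be shown to a reader attribute by attribute.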



Steps in a Data Mining Project

          1. Define the business or scientific problem
             Example: Which of my current customers are likely to
             become inactive in the next 6 months?
          2. Gather a historical data file
             Prepare a file of customers (present and past) that
             includes predictive descriptors such as start date, date of
             first sale, date of last sale, sales by month, how
             acquired, etc.
             Include the current status (active or inactive) for each
             customer.

Steps in a Data Mining Project (continued)

          3. Cleanse the Data
              Data cleansing reduces noisy and missing data and removes
              erroneous data.
          4. Add Derived Attributes
              Create additional variables from the original data if necessary
              (example: compute customer account duration from start date and
              current date).
          5. Create Test and Holdout Files
              Randomly separate the original file into two parts, called the test
              and holdout files.
              Build the predictive model with modeling software, keeping the
              holdout file aside for validation.
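
Steps 4 and 5 can be sketched in a few lines. This is a minimal illustration under stated assumptions, not part of the Nuggets software: the customer records, the field names, the 30% holdout fraction, and the names `build_set` and `holdout_set` are all hypothetical.

```python
import random
from datetime import date


def account_duration_days(start: date, today: date) -> int:
    """Derived attribute: days between the start date and the current date."""
    return (today - start).days


def split_records(records, holdout_fraction=0.3, seed=0):
    """Randomly split records into (build_set, holdout_set)."""
    shuffled = list(records)               # copy so the input is not reordered
    random.Random(seed).shuffle(shuffled)  # seeded for a repeatable split
    n_holdout = int(len(shuffled) * holdout_fraction)
    return shuffled[n_holdout:], shuffled[:n_holdout]


# Made-up customer file with a start date and a current status.
customers = [
    {"id": i,
     "start_date": date(2010, 1, 1 + i),
     "status": "active" if i % 2 else "inactive"}
    for i in range(10)
]

today = date(2012, 1, 1)  # assumed "current date" for the example
for customer in customers:
    customer["duration_days"] = account_duration_days(customer["start_date"], today)

build_set, holdout_set = split_records(customers)
print(len(build_set), len(holdout_set))
```

Fixing the random seed keeps the split repeatable, so a model can be rebuilt and re-validated on the same two files.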

Steps in a Data Mining Project (continued)

          6. Validate the Model
              Validation uses a test set of data that was not used when
              building the model: the holdout file created in the previous step.
              The learned patterns are applied to this test set and the resulting
              output is evaluated for accuracy.
              For example, a data mining algorithm trying to distinguish spam
              from legitimate emails would be trained on a training set of
              sample emails. Once trained, the learned patterns would be
              applied to the test set of emails on which it had not been trained.
              The accuracy of these patterns can then be measured from how
              many emails they correctly classify.
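
The accuracy measurement described above can be sketched as follows. The `toy_spam_model` function is a hypothetical stand-in for a trained model and the four emails are made up; only the counting logic is the point.

```python
def accuracy(predict, labeled_examples):
    """Fraction of examples whose predicted label matches the true label."""
    if not labeled_examples:
        raise ValueError("need at least one labeled example")
    correct = sum(
        1 for features, label in labeled_examples if predict(features) == label
    )
    return correct / len(labeled_examples)


def toy_spam_model(text):
    """Hypothetical stand-in for a trained model: flags the word 'prize'."""
    return "spam" if "prize" in text.lower() else "legitimate"


# Holdout emails the model was not built on (made-up data).
holdout = [
    ("Claim your PRIZE now", "spam"),
    ("Meeting moved to 3pm", "legitimate"),
    ("Cheap meds online", "spam"),
    ("Lunch tomorrow?", "legitimate"),
]

print(accuracy(toy_spam_model, holdout))
```

The toy model misses the third email, so three of the four holdout emails are classified correctly.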




Steps in a Data Mining Project (continued)




At this point a model has been created and it can
now be used.
7. Use the Model to Make Predictions
   The final step of knowledge discovery from data is
   to use the model produced by the data mining
   algorithms. As new data come in, the model is
   then applied to this data to make predictions.


Comparison of Methods
     Nuggets offers benefits that the other methods don’t
     offer. Here are a few:
     Handles missing data
     Handles very large numbers of predictor attributes
     Fast model development
     Able to model small data patterns missed by other
     methods
     Handles a wide variety of data types
     Doesn’t require highly trained specialists

Principal Data Mining Techniques
                Industry Standard Methods
         Statistics

         Neural Nets

         Decision Trees

     Following is a comparison of Nuggets with
     these principal competitors.

Nuggets
Nuggets is a proprietary technology that uses proprietary
search algorithms to intelligently prospect data for valid
hypotheses.
In the act of searching, the algorithms “learn” about the
training data as they proceed.
The result is a very fast and efficient discovery strategy
that does not preclude any potential rule or generalization
from being found. This document outlines its advantages
over its competitors in providing useful and profitable
information from the vast store of data that are being
accumulated at an ever increasing rate.

Statistics Methods Pros/Cons
        Statistics Pros
        Statistical analysis is sometimes a good ‘first step’ in
        understanding data. These methods deal well with numerical
        data where important mathematical facts such as the
        underlying probability distributions of the data are known.
        However, in today’s world these mathematical facts are rarely
        known. These methods are not as good with nominal data
        values such as “good”, “better”, “best” in the case of a
        preference attribute or “Europe”, “North America”, “Asia” or
        “South America” in the case of a location attribute.


Method Pros/Cons
        Statistics (continued)
        Some of the statistical methods commonly used are
        regression analysis, correlation, CHAID analysis,
        hypothesis testing, and discriminant analysis.

Statistics Methods Pros/Cons (cont.)
        Nuggets Advantages Over Statistics
          Statistical methods require statistical expertise, or a project
          member well versed in statistics who is heavily involved.
          Such methods rely on statistical assumptions that are difficult
          to verify. They also suffer from the “black box aversion
          syndrome”: non-technical decision makers, those who will
          either accept or reject the results of the study, are often
          unwilling to make important decisions based on a technology
          that gives them answers but does not explain how it arrived
          at them.


Statistics Method Pros/Cons
        Nuggets Advantages Over Statistics
              Telling a non-statistician CEO that she or he must make a crucial
              business decision because of a favorable R statistic or some other
              arcane statistical reason is not usually well received. With Nuggets®
              you can be told exactly how the conclusion was reached.

              Another problem is that statistical methods are valid only if certain
              assumptions about the data are met. Some of these assumptions are:
              linear relationships between pairs of variables, absence of
              multicollinearity, normal probability distributions and independence
              of samples. If you do not validate these assumptions, because of time
              limitations or because you are not familiar with them, your analysis
              may be faulty and your results may not be valid. Even if you know
              about them, you may not have the time or information to verify them.

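As one illustration of what checking such an assumption involves, the sketch below screens predictors for multicollinearity by flagging highly correlated pairs. It is a simplified check and not a full diagnostic: the 0.9 threshold, the column names, and the sample values are assumptions made for the example.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (spread_x * spread_y)


def collinear_pairs(columns, threshold=0.9):
    """Return predictor pairs whose absolute correlation exceeds the threshold."""
    names = sorted(columns)
    return [
        (a, b)
        for i, a in enumerate(names)
        for b in names[i + 1:]
        if abs(pearson(columns[a], columns[b])) > threshold
    ]


# Made-up predictors; age_in_months is just age * 12, so it is redundant.
predictors = {
    "age": [25, 35, 45, 55, 65],
    "years_as_customer": [1, 3, 2, 5, 4],
    "age_in_months": [300, 420, 540, 660, 780],
}

print(collinear_pairs(predictors))
```

Pairwise correlation only catches redundancy between two predictors at a time, which is part of why verifying these assumptions thoroughly takes time and expertise.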
Method Pros/Cons - Neural Nets
    Neural Networks
      This is a popular technology, particularly in the financial
      community. This method was originally developed in the
      1940s to model biological nervous systems in an attempt
      to mimic human thought processes.




Method Pros/Cons - Neural Nets
        Pros
          The end result of a Neural Net project is a
          mathematical model of the process. It deals
          primarily with numerical attributes such as age,
          income, height, etc., but not as well with nominal
          data such as state, brand preference, vehicle make,
          etc.




Method Pros/Cons - Neural Nets
       Nuggets Advantages
         There is still much controversy regarding the
         efficacy of Neural Nets. One major objection to
         the method is that the development of a Neural
         Net model is partly an art and partly a science in
         that the results often depend on the individual
         who built the model. That is, the model form
         (called the network topology) and hence the
         results, may differ from one researcher to another
         for the same data.

Method Pros/Cons - Neural Nets
      Neural Nets also suffer from “overfitting”, which produces
      good predictions on the data used to build the model but
      poor results on new data. Neural Nets often use a sigmoid
      function in their computations. This is a mathematical
      function whose graph resembles the letter “S”. It is
      questionable whether there is any theoretical justification
      for this somewhat arbitrary choice, which makes the approach
      somewhat ad hoc.
      Another issue is that the modeling results produced by a
      Neural Net method are not intuitive. The method is called a
      “black box” to indicate the lack of intuitive understanding of
      its results. Neural Nets are still in use but are becoming less
      popular because of these issues.

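For reference, the S-shaped function can be written in a few lines. The sketch assumes the logistic function, which is one common sigmoid; the slides do not say which sigmoid a given Neural Net tool uses.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic sigmoid 1 / (1 + e^-x), computed in a numerically stable way."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    # For negative x, use the equivalent form e^x / (1 + e^x) so that
    # math.exp never receives a large positive argument.
    z = math.exp(x)
    return z / (1.0 + z)


# Outputs rise from near 0 to near 1, tracing the "S" shape.
for x in (-6, -2, 0, 2, 6):
    print(x, round(sigmoid(x), 3))
```

The function squashes any input into the range 0 to 1 and is symmetric about the point (0, 0.5).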
Method Pros/Cons Decision Trees

        Decision Trees (CART, CHAID, etc.)
              Decision tree methods are techniques for
              partitioning a training file into a tree
              representation. The starting node is called the root
              node. Depending upon the results of a test this
              node is then partitioned into two or more sub-sets.
              Each node is then further partitioned until a tree is
              built. This tree can be mapped into a set of rules.
              These rules, in the form of a tree, are used to
              generate forecasts.

Method Pros/Cons Decision Trees

        Nuggets Advantages
        By far the most important drawback of decision trees is that
        they are forced to make decisions along the way based on
        limited information, which implicitly leaves the vast majority
        of potential patterns in the training file out of consideration.
        This approach may leave valuable patterns undiscovered, since
        decisions made early in the process preclude some good
        rules from being discovered later. This is called “greedy
        optimization” and it lessens the accuracy of the resulting model.
        Furthermore, the large numbers of predictor attributes found in
        most of today’s data sets are not handled well by decision trees.
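
The greedy step can be made concrete with a small sketch. It uses weighted Gini impurity as the split criterion, which is an assumption chosen for illustration (CART and CHAID each use their own criteria), and the customer rows are made up.

```python
from collections import Counter, defaultdict


def gini(labels):
    """Gini impurity of a non-empty list of class labels."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def split_impurity(rows, attribute, target):
    """Weighted Gini impurity after partitioning rows by one attribute's values."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[attribute]].append(row[target])
    return sum(len(g) / len(rows) * gini(g) for g in groups.values())


def greedy_split(rows, attributes, target):
    """Pick the attribute that looks best at this node only (the greedy step)."""
    return min(attributes, key=lambda a: split_impurity(rows, a, target))


# Made-up training rows for a single tree node.
rows = [
    {"region": "Europe", "plan": "basic", "status": "inactive"},
    {"region": "Europe", "plan": "premium", "status": "inactive"},
    {"region": "Asia", "plan": "basic", "status": "active"},
    {"region": "Asia", "plan": "premium", "status": "active"},
    {"region": "Asia", "plan": "basic", "status": "inactive"},
]

print(greedy_split(rows, ["region", "plan"], "status"))
```

Once the node is split on the chosen attribute, every rule found further down the tree must include that test, which is how an early choice can exclude patterns that start with a different attribute.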
        
Method Pros/Cons Decision Trees

        Nuggets Advantages
        Nuggets does not make these greedy
        decisions. Instead it “implicitly”
        searches all possible patterns and
        thus is able to find patterns that are
        useful but that wouldn’t be found
        with decision trees.

Summary of Comparison With Other Methods

        Nuggets Advantages
              Nuggets offers many advantages over other methods in
              common use. A few were presented here.
              Nuggets’ advantages vary from method to method, and
              most stem from the limiting assumptions that these
              older methods require.
              Nuggets is designed to circumvent these disadvantages
              and offer a superior methodology that can cope with the
              large number of complex databases that exist in today’s
              world.

pres071911a                                                                29

More Related Content

What's hot

Data mining concepts
Data mining conceptsData mining concepts
Data mining conceptsBasit Rafiq
 
Building a Predictive Model
Building a Predictive ModelBuilding a Predictive Model
Building a Predictive ModelDKALab
 
Barga ACM DEBS 2013 Keynote
Barga ACM DEBS 2013 KeynoteBarga ACM DEBS 2013 Keynote
Barga ACM DEBS 2013 KeynoteRoger Barga
 
Barga Galvanize Sept 2015
Barga Galvanize Sept 2015Barga Galvanize Sept 2015
Barga Galvanize Sept 2015Roger Barga
 
Predictive analytics in mobility
Predictive analytics in mobilityPredictive analytics in mobility
Predictive analytics in mobilityEktimo
 
Data Analytics For Beginners | Introduction To Data Analytics | Data Analytic...
Data Analytics For Beginners | Introduction To Data Analytics | Data Analytic...Data Analytics For Beginners | Introduction To Data Analytics | Data Analytic...
Data Analytics For Beginners | Introduction To Data Analytics | Data Analytic...Edureka!
 
Big Data Analytics: Challenge or Opportunity?
Big Data Analytics: Challenge or Opportunity?Big Data Analytics: Challenge or Opportunity?
Big Data Analytics: Challenge or Opportunity?NUS-ISS
 
Barga DIDC'14 Invited Talk
Barga DIDC'14 Invited TalkBarga DIDC'14 Invited Talk
Barga DIDC'14 Invited TalkRoger Barga
 
Introduction to data analytics
Introduction to data analyticsIntroduction to data analytics
Introduction to data analyticsSSaudia
 
Predictive Analytics: Context and Use Cases
Predictive Analytics: Context and Use CasesPredictive Analytics: Context and Use Cases
Predictive Analytics: Context and Use CasesKimberley Mitchell
 
Maximize Your Understanding of Operational Realities in Manufacturing with Pr...
Maximize Your Understanding of Operational Realities in Manufacturing with Pr...Maximize Your Understanding of Operational Realities in Manufacturing with Pr...
Maximize Your Understanding of Operational Realities in Manufacturing with Pr...Bigfinite
 
modeling and predicting cyber hacking breaches
modeling and predicting cyber hacking breaches modeling and predicting cyber hacking breaches
modeling and predicting cyber hacking breaches Venkat Projects
 
Mental Model for Exploratory Data Analysis Applications for Structured Proble...
Mental Model for Exploratory Data Analysis Applications for Structured Proble...Mental Model for Exploratory Data Analysis Applications for Structured Proble...
Mental Model for Exploratory Data Analysis Applications for Structured Proble...Jukka-Matti Turtiainen
 
IRJET- Improved Model for Big Data Analytics using Dynamic Multi-Swarm Op...
IRJET-  	  Improved Model for Big Data Analytics using Dynamic Multi-Swarm Op...IRJET-  	  Improved Model for Big Data Analytics using Dynamic Multi-Swarm Op...
IRJET- Improved Model for Big Data Analytics using Dynamic Multi-Swarm Op...IRJET Journal
 

What's hot (20)

Data mining concepts
Data mining conceptsData mining concepts
Data mining concepts
 
Predictive Modelling
Predictive ModellingPredictive Modelling
Predictive Modelling
 
Image Analytics In Healthcare
Image Analytics In HealthcareImage Analytics In Healthcare
Image Analytics In Healthcare
 
Building a Predictive Model
Building a Predictive ModelBuilding a Predictive Model
Building a Predictive Model
 
Introduction to Data Mining
Introduction to Data MiningIntroduction to Data Mining
Introduction to Data Mining
 
Barga ACM DEBS 2013 Keynote
Barga ACM DEBS 2013 KeynoteBarga ACM DEBS 2013 Keynote
Barga ACM DEBS 2013 Keynote
 
Barga Galvanize Sept 2015
Barga Galvanize Sept 2015Barga Galvanize Sept 2015
Barga Galvanize Sept 2015
 
Predictive analytics in mobility
Predictive analytics in mobilityPredictive analytics in mobility
Predictive analytics in mobility
 
Data Analytics For Beginners | Introduction To Data Analytics | Data Analytic...
Data Analytics For Beginners | Introduction To Data Analytics | Data Analytic...Data Analytics For Beginners | Introduction To Data Analytics | Data Analytic...
Data Analytics For Beginners | Introduction To Data Analytics | Data Analytic...
 
Big Data Analytics: Challenge or Opportunity?
Big Data Analytics: Challenge or Opportunity?Big Data Analytics: Challenge or Opportunity?
Big Data Analytics: Challenge or Opportunity?
 
Barga DIDC'14 Invited Talk
Barga DIDC'14 Invited TalkBarga DIDC'14 Invited Talk
Barga DIDC'14 Invited Talk
 
Introduction to data analytics
Introduction to data analyticsIntroduction to data analytics
Introduction to data analytics
 
Predictive Analytics: Context and Use Cases
Predictive Analytics: Context and Use CasesPredictive Analytics: Context and Use Cases
Predictive Analytics: Context and Use Cases
 
Machine Learning in Healthcare: A Case Study
Machine Learning in Healthcare: A Case StudyMachine Learning in Healthcare: A Case Study
Machine Learning in Healthcare: A Case Study
 
Text Analytics for Legal work
Text Analytics for Legal workText Analytics for Legal work
Text Analytics for Legal work
 
Maximize Your Understanding of Operational Realities in Manufacturing with Pr...
Maximize Your Understanding of Operational Realities in Manufacturing with Pr...Maximize Your Understanding of Operational Realities in Manufacturing with Pr...
Maximize Your Understanding of Operational Realities in Manufacturing with Pr...
 
modeling and predicting cyber hacking breaches
modeling and predicting cyber hacking breaches modeling and predicting cyber hacking breaches
modeling and predicting cyber hacking breaches
 
Mental Model for Exploratory Data Analysis Applications for Structured Proble...
Mental Model for Exploratory Data Analysis Applications for Structured Proble...Mental Model for Exploratory Data Analysis Applications for Structured Proble...
Mental Model for Exploratory Data Analysis Applications for Structured Proble...
 
IRJET- Improved Model for Big Data Analytics using Dynamic Multi-Swarm Op...
IRJET-  	  Improved Model for Big Data Analytics using Dynamic Multi-Swarm Op...IRJET-  	  Improved Model for Big Data Analytics using Dynamic Multi-Swarm Op...
IRJET- Improved Model for Big Data Analytics using Dynamic Multi-Swarm Op...
 
7 steps to Predictive Analytics
7 steps to Predictive Analytics 7 steps to Predictive Analytics
7 steps to Predictive Analytics
 

Viewers also liked

01/28/13 US Supreme Court Response (italian)
01/28/13 US Supreme Court Response (italian)01/28/13 US Supreme Court Response (italian)
01/28/13 US Supreme Court Response (italian)VogelDenise
 
082512 us supreme court response (swahili)
082512   us supreme court response (swahili)082512   us supreme court response (swahili)
082512 us supreme court response (swahili)VogelDenise
 
102912 vogel denise slideshare documents (yiddish)
102912   vogel denise slideshare documents (yiddish)102912   vogel denise slideshare documents (yiddish)
102912 vogel denise slideshare documents (yiddish)VogelDenise
 
092712 julian assange (president obama's audacity) - catalan
092712   julian assange (president obama's audacity) - catalan092712   julian assange (president obama's audacity) - catalan
092712 julian assange (president obama's audacity) - catalanVogelDenise
 
ABU HAMZA al-MASRI ARTICLES
ABU HAMZA al-MASRI ARTICLESABU HAMZA al-MASRI ARTICLES
ABU HAMZA al-MASRI ARTICLESVogelDenise
 
031808 obama speech (welsh)
031808   obama speech (welsh)031808   obama speech (welsh)
031808 obama speech (welsh)VogelDenise
 
092712 julian assange (president obama's audacity) - polish
092712   julian assange (president obama's audacity) - polish092712   julian assange (president obama's audacity) - polish
092712 julian assange (president obama's audacity) - polishVogelDenise
 
Evaluation presentation
Evaluation presentationEvaluation presentation
Evaluation presentationgracieglancy
 
092712 julian assange (president obama's audacity) - korean
092712   julian assange (president obama's audacity) - korean092712   julian assange (president obama's audacity) - korean
092712 julian assange (president obama's audacity) - koreanVogelDenise
 
Interpol bringing the united states to justice (vietnamese)
Interpol   bringing the united states to justice (vietnamese)Interpol   bringing the united states to justice (vietnamese)
Interpol bringing the united states to justice (vietnamese)VogelDenise
 
021013 adecco email (bengali)
021013   adecco email (bengali)021013   adecco email (bengali)
021013 adecco email (bengali)VogelDenise
 
021013 adecco email (czech)
021013   adecco email (czech)021013   adecco email (czech)
021013 adecco email (czech)VogelDenise
 
BARACK OBAMA – Benghazi COVER UP (bulgarian)
BARACK OBAMA – Benghazi COVER UP (bulgarian)BARACK OBAMA – Benghazi COVER UP (bulgarian)
BARACK OBAMA – Benghazi COVER UP (bulgarian)VogelDenise
 
BARACK OBAMA – Benghazi COVER UP (basque)
BARACK OBAMA – Benghazi COVER UP (basque)BARACK OBAMA – Benghazi COVER UP (basque)
BARACK OBAMA – Benghazi COVER UP (basque)VogelDenise
 
032115 - NOTICE OF APPEAL-PETITION TO EEOC OFFICE OF FEDERAL OPERATIONS
032115 - NOTICE OF APPEAL-PETITION TO EEOC OFFICE OF FEDERAL OPERATIONS032115 - NOTICE OF APPEAL-PETITION TO EEOC OFFICE OF FEDERAL OPERATIONS
032115 - NOTICE OF APPEAL-PETITION TO EEOC OFFICE OF FEDERAL OPERATIONSVogelDenise
 
BARACK OBAMA - READ MY LIPS - ObamaFraudGate (armenian)
BARACK OBAMA - READ MY LIPS - ObamaFraudGate (armenian)BARACK OBAMA - READ MY LIPS - ObamaFraudGate (armenian)
BARACK OBAMA - READ MY LIPS - ObamaFraudGate (armenian)VogelDenise
 
United States of America – IMMIGRATION REFORM - AZERBAIJANI
United States of America – IMMIGRATION REFORM - AZERBAIJANIUnited States of America – IMMIGRATION REFORM - AZERBAIJANI
United States of America – IMMIGRATION REFORM - AZERBAIJANIVogelDenise
 
United States of America – IMMIGRATION REFORM - IRISH
United States of America – IMMIGRATION REFORM - IRISHUnited States of America – IMMIGRATION REFORM - IRISH
United States of America – IMMIGRATION REFORM - IRISHVogelDenise
 
060812 EEOC Response (ICELANDIC)
060812   EEOC Response (ICELANDIC)060812   EEOC Response (ICELANDIC)
060812 EEOC Response (ICELANDIC)VogelDenise
 

Viewers also liked (20)

01/28/13 US Supreme Court Response (italian)
01/28/13 US Supreme Court Response (italian)01/28/13 US Supreme Court Response (italian)
01/28/13 US Supreme Court Response (italian)
 
082512 us supreme court response (swahili)
082512   us supreme court response (swahili)082512   us supreme court response (swahili)
082512 us supreme court response (swahili)
 
102912 vogel denise slideshare documents (yiddish)
102912   vogel denise slideshare documents (yiddish)102912   vogel denise slideshare documents (yiddish)
102912 vogel denise slideshare documents (yiddish)
 
092712 julian assange (president obama's audacity) - catalan
092712   julian assange (president obama's audacity) - catalan092712   julian assange (president obama's audacity) - catalan
092712 julian assange (president obama's audacity) - catalan
 
ABU HAMZA al-MASRI ARTICLES
ABU HAMZA al-MASRI ARTICLESABU HAMZA al-MASRI ARTICLES
ABU HAMZA al-MASRI ARTICLES
 
031808 obama speech (welsh)
031808   obama speech (welsh)031808   obama speech (welsh)
031808 obama speech (welsh)
 
092712 julian assange (president obama's audacity) - polish
092712   julian assange (president obama's audacity) - polish092712   julian assange (president obama's audacity) - polish
092712 julian assange (president obama's audacity) - polish
 
Evaluation presentation
Evaluation presentationEvaluation presentation
Evaluation presentation
 
092712 julian assange (president obama's audacity) - korean
092712   julian assange (president obama's audacity) - korean092712   julian assange (president obama's audacity) - korean
092712 julian assange (president obama's audacity) - korean
 
Interpol bringing the united states to justice (vietnamese)
Interpol   bringing the united states to justice (vietnamese)Interpol   bringing the united states to justice (vietnamese)
Interpol bringing the united states to justice (vietnamese)
 
021013 adecco email (bengali)
021013   adecco email (bengali)021013   adecco email (bengali)
021013 adecco email (bengali)
 
Test
TestTest
Test
 
021013 adecco email (czech)
021013   adecco email (czech)021013   adecco email (czech)
021013 adecco email (czech)
 
BARACK OBAMA – Benghazi COVER UP (bulgarian)
BARACK OBAMA – Benghazi COVER UP (bulgarian)BARACK OBAMA – Benghazi COVER UP (bulgarian)
BARACK OBAMA – Benghazi COVER UP (bulgarian)
 
BARACK OBAMA – Benghazi COVER UP (basque)
BARACK OBAMA – Benghazi COVER UP (basque)BARACK OBAMA – Benghazi COVER UP (basque)
BARACK OBAMA – Benghazi COVER UP (basque)
 
032115 - NOTICE OF APPEAL-PETITION TO EEOC OFFICE OF FEDERAL OPERATIONS
032115 - NOTICE OF APPEAL-PETITION TO EEOC OFFICE OF FEDERAL OPERATIONS032115 - NOTICE OF APPEAL-PETITION TO EEOC OFFICE OF FEDERAL OPERATIONS
032115 - NOTICE OF APPEAL-PETITION TO EEOC OFFICE OF FEDERAL OPERATIONS
 
BARACK OBAMA - READ MY LIPS - ObamaFraudGate (armenian)
BARACK OBAMA - READ MY LIPS - ObamaFraudGate (armenian)BARACK OBAMA - READ MY LIPS - ObamaFraudGate (armenian)
BARACK OBAMA - READ MY LIPS - ObamaFraudGate (armenian)
 
United States of America – IMMIGRATION REFORM - AZERBAIJANI
United States of America – IMMIGRATION REFORM - AZERBAIJANIUnited States of America – IMMIGRATION REFORM - AZERBAIJANI
United States of America – IMMIGRATION REFORM - AZERBAIJANI
 
United States of America – IMMIGRATION REFORM - IRISH
United States of America – IMMIGRATION REFORM - IRISHUnited States of America – IMMIGRATION REFORM - IRISH
United States of America – IMMIGRATION REFORM - IRISH
 
060812 EEOC Response (ICELANDIC)
060812   EEOC Response (ICELANDIC)060812   EEOC Response (ICELANDIC)
060812 EEOC Response (ICELANDIC)
 

Similar to Key benefits of Nuggets data mining software

DATASCIENCE vs BUSINESS INTELLIGENCE.pptx
DATASCIENCE vs BUSINESS INTELLIGENCE.pptxDATASCIENCE vs BUSINESS INTELLIGENCE.pptx
DATASCIENCE vs BUSINESS INTELLIGENCE.pptxOTA13NayabNakhwa
 
Data mining and privacy preserving in data mining
Data mining and privacy preserving in data miningData mining and privacy preserving in data mining
Data mining and privacy preserving in data miningNeeda Multani
 
Data Science course in Hyderabad .
Data Science course in Hyderabad            .Data Science course in Hyderabad            .
Data Science course in Hyderabad .rajasrichalamala3zen
 
Data Science course in Hyderabad .
Data Science course in Hyderabad         .Data Science course in Hyderabad         .
Data Science course in Hyderabad .rajasrichalamala3zen
 
data science course in Hyderabad data science course in Hyderabad
data science course in Hyderabad data science course in Hyderabaddata science course in Hyderabad data science course in Hyderabad
Key benefits of Nuggets data mining software

  • 1. Michael Gilman, Ph.D. Copyright Data Mining Technologies Inc. 2012 www.data-mine.com 631-692-4400 ext. 100
  • 2. What is Data Mining? An information extraction activity which has as its goal the discovery of hidden facts contained in databases. It finds patterns and subtle relationships in data, inferring rules and generalizations that allow the prediction of future results. To be a true knowledge discovery method, a data mining tool should unearth information automatically.
  • 3. Overview The purpose of this presentation is to introduce a new and powerful methodology and associated software that overcome many of the limitations of the other data mining methods in use today.
  • 4. Background The manual extraction of patterns from data has occurred for centuries. Early methods of identifying patterns in data included statistical methods such as Bayes' theorem (1700s) and regression analysis (1800s). The proliferation, ubiquity and increasing power of computer technology have increased data collection, storage and manipulation. As data sets have grown in size and complexity, direct hands-on data analysis has increasingly been augmented with indirect, automatic data processing. This has been aided by other discoveries in computer science, such as neural networks, clustering, genetic algorithms (1950s), decision trees (1960s) and support vector machines (1990s). Data mining is the process of applying these methods to data with the intention of uncovering hidden patterns. (Wikipedia)
  • 5. How Does Data Mining Work? Data mining involves building predictive models that give an enterprise a better understanding of how to proceed. Building a predictive model takes several steps. Before we outline these steps, here is a real-world problem.
  • 6. Question: How can we keep healthcare quality high and keep costs down?
  • 7. Input Data: File containing clinical data and costs
  • 8. Steps in Data Mining: define the problem goals; identify data sources
  • 9. Then build the model. [Diagram; labels: Mine Data Model Data If-Then Rules]
  • 10. Results: Model containing rules showing the best-of-breed (BOB) treatment for each case, and why. If diagnosis = Congestive HF and Age = 60-70 and previous bypass = yes and . . . Then BOB Treatment = aortic stent
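The if-then rule on this slide can be pictured as a small data structure. The following is only an illustrative sketch, not how Nuggets stores or evaluates rules: the `Rule` class, the `predict` helper, and the field names `diagnosis`, `age_band`, and `previous_bypass` are all hypothetical, and the conditions elided on the slide (". . .") are simply left out rather than guessed.

```python
from dataclasses import dataclass


@dataclass
class Rule:
    """One if-then rule: every condition must hold for the outcome to apply."""
    conditions: dict  # attribute name -> required value (hypothetical names)
    outcome: str

    def matches(self, record: dict) -> bool:
        # A record matches when each listed attribute has the required value.
        return all(record.get(attr) == value
                   for attr, value in self.conditions.items())


def predict(rules, record, default=None):
    """Return the outcome of the first matching rule, or `default` if none match."""
    for rule in rules:
        if rule.matches(record):
            return rule.outcome
    return default


# The slide's example rule, limited to the three conditions it actually shows.
RULES = [
    Rule(
        conditions={
            "diagnosis": "Congestive HF",
            "age_band": "60-70",
            "previous_bypass": "yes",
        },
        outcome="aortic stent",
    )
]

patient = {"diagnosis": "Congestive HF", "age_band": "60-70", "previous_bypass": "yes"}
print(predict(RULES, patient))
```

Because each rule lists its conditions explicitly, the reason behind a prediction can be read directly from the rule that fired.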
  • 11. Steps in a Data Mining Project 1. Define the business or scientific problem. Example: Which of my current customers are likely to become inactive in the next 6 months? 2. Gather a historical data file. Prepare a file of customers (present and past) that includes predictive descriptors such as start date, date of first sale, date of last sale, sales by month, how acquired, etc. Include the current status (active or inactive) for each customer.
  • 12. Steps in a Data Mining Project (continued) 3. Cleanse the Data. Data cleansing reduces noisy and missing data and removes erroneous data. 4. Add Derived Attributes. Create additional variables from the original data if necessary (example: compute customer account duration from start date and current date). 5. Create Test and Holdout Files. Randomly separate the original file into two parts, called the test and holdout files. Build the predictive model with modeling software.
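Steps 4 and 5 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the procedure used by any particular modeling package: the function names, the 30% holdout fraction, and the fixed random seed are all choices made for the example.

```python
import random
from datetime import date


def account_duration_days(start: date, today: date) -> int:
    """Derived attribute: number of days between start date and current date."""
    return (today - start).days


def split_records(records, holdout_fraction=0.3, seed=0):
    """Randomly separate records into a model-building part and a holdout part.

    The fraction and seed are illustrative assumptions, not values from the slides.
    """
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is reproducible
    n_holdout = int(len(shuffled) * holdout_fraction)
    holdout = shuffled[:n_holdout]
    build = shuffled[n_holdout:]
    return build, holdout


build, holdout = split_records(range(10))
print(len(build), len(holdout))
print(account_duration_days(date(2012, 1, 1), date(2012, 1, 31)))
```

Shuffling before splitting matters when the source file is sorted (for example by start date); otherwise the holdout part would not be a random sample.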
  • 13. Steps in a Data Mining Project (continued) 6. Validate the Data. Validation uses a test set of data that was not used when building the model. This is the holdout set defined previously. The learned patterns are applied to this test set and the resulting output is evaluated for accuracy. For example, a data mining algorithm trying to distinguish spam from legitimate emails would be trained on a training set of sample emails. Once trained, the learned patterns would be applied to the test set of emails on which it had not been trained. The accuracy of these patterns can then be measured from how many emails they correctly classify.
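The accuracy measurement described in the spam example amounts to counting correct classifications on the holdout set. The sketch below assumes a model is any function from an email's text to a label; `toy_spam_model` and the four sample emails are invented purely for illustration and stand in for a real trained model and real data.

```python
def accuracy(predict, holdout):
    """Fraction of holdout examples whose predicted label equals the true label."""
    if not holdout:
        raise ValueError("holdout set is empty")
    correct = sum(1 for features, label in holdout if predict(features) == label)
    return correct / len(holdout)


def toy_spam_model(text: str) -> str:
    """Hypothetical stand-in for a trained model: flags one fixed phrase."""
    return "spam" if "free money" in text.lower() else "legitimate"


# Invented holdout examples: (email text, true label).
holdout_emails = [
    ("Free money now", "spam"),
    ("Meeting at 10", "legitimate"),
    ("Claim your prize", "spam"),
    ("Lunch tomorrow?", "legitimate"),
]

print(accuracy(toy_spam_model, holdout_emails))  # 3 of 4 correct -> 0.75
```

The toy model misses "Claim your prize", which is the kind of error that only shows up when the model is scored on emails it was not built from.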
  • 14. Steps in a Data Mining Project (continued) At this point a model has been created and it can now be used. 7. Use the Model to Make Predictions. The final step of knowledge discovery from data is to use the model produced by the data mining algorithms. As new data come in, the model is applied to them to make predictions.
  • 15. Comparison of Methods Nuggets offers benefits that the other methods don’t offer. Here are a few: handles missing data; handles very large numbers of predictor attributes; fast model development; able to model small data patterns missed by other methods; handles a wide variety of data types; doesn’t require highly trained specialists.
  • 16. Principal Data Mining Techniques Industry-standard methods: Statistics; Neural Nets; Decision Trees. Following is a comparison of Nuggets with these principal competitors.
  • 17. Nuggets Nuggets is a proprietary technology that uses proprietary search algorithms to intelligently prospect data for valid hypotheses. In the act of searching, the algorithms “learn” about the training data as they proceed. The result is a very fast and efficient discovery strategy that does not preclude any potential rule or generalization from being found. This document outlines its advantages over its competitors in providing useful and profitable information from the vast store of data that are being accumulated at an ever-increasing rate.
  • 18. Statistics Methods Pros/Cons: Pros Statistical analysis is sometimes a good “first step” in understanding data. These methods deal well with numerical data where important mathematical facts such as the underlying probability distributions of the data are known. However, in today’s world these mathematical facts are rarely known. These methods are not as good with nominal data values such as “good”, “better”, “best” in the case of a preference attribute or “Europe”, “North America”, “Asia” or “South America” in the case of a location attribute.
  • 19. Statistics Methods Pros/Cons (continued) Some of the statistical methods commonly used are regression analysis, correlation, CHAID analysis, hypothesis testing, and discriminant analysis. Statistical analysis is sometimes a good “first step” in understanding data. These methods deal well with numerical data where the underlying probability distributions of the data are known. This is not often the case in real-world problems.
  • 20. Statistics Methods Pros/Cons (continued): Nuggets Advantages Over Statistics Statistical methods require statistical expertise, or a project person well versed in statistics who is heavily involved. Such methods require difficult-to-verify statistical assumptions. They suffer from the “black box aversion syndrome”. This means that non-technical decision makers, those who will either accept or reject the results of the study, are often unwilling to make important decisions based on a technology that gives them answers but does not explain how it got the answers.
  • 21. Statistics Methods Pros/Cons (continued): Nuggets Advantages Over Statistics To tell a non-statistician CEO that she or he must make a crucial business decision because of a favorable R statistic or some other arcane statistical reason is not usually well received. With Nuggets® you can be told exactly how the conclusion was arrived at. Another problem is that statistical methods are valid only if certain assumptions about the data are met. Some of these assumptions are: linear relationships between pairs of variables, non-multicollinearity, normal probability distributions and independence of samples. If you do not validate these assumptions, whether because of time limitations or because you are not familiar with them, your analysis may be faulty and your results may not be valid. Even if you know about them, you may not have the time or information to verify them.
  • 22. Method Pros/Cons - Neural Nets Neural Networks This is a popular technology, particularly in the financial community. This method was originally developed in the 1940s to model biological nervous systems in an attempt to mimic human thought processes.
  • 23. Method Pros/Cons - Neural Nets: Pros The end result of a Neural Net project is a mathematical model of the process. It deals primarily with numerical attributes such as age, income, height, etc., but not as well with nominal data such as state, brand preference, vehicle make, etc.
  • 24. Method Pros/Cons - Neural Nets: Nuggets Advantages There is still much controversy regarding the efficacy of Neural Nets. One major objection to the method is that the development of a Neural Net model is partly an art and partly a science, in that the results often depend on the individual who built the model. That is, the model form (called the network topology), and hence the results, may differ from one researcher to another for the same data.
  • 25. Method Pros/Cons - Neural Nets There is also the problem with Neural Nets of “overfitting”, which results in good prediction of the data used to build the model but bad results with new data. Neural Nets often use a sigmoid function in their computations. This is a mathematical function whose graph resembles the letter “S”. It is questionable whether there is any theoretical justification for this somewhat arbitrary choice, which makes the approach somewhat ad hoc. Another issue is that the modeling results produced by a Neural Net method are not intuitive. The method is called a “black box” to indicate the lack of intuitive understanding of its results. Neural Nets are still in use but are becoming less popular due to these issues.
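For readers unfamiliar with the sigmoid function mentioned on this slide, the standard logistic form is 1 / (1 + e^(-x)). The slide does not say which S-shaped function it means, so treating it as the logistic function is an assumption; the sketch below simply evaluates that form in a numerically stable way.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic sigmoid 1 / (1 + e^(-x)), computed without overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    # For negative x, rewrite as e^x / (1 + e^x) so exp() never sees a large argument.
    z = math.exp(x)
    return z / (1.0 + z)


# The outputs rise smoothly from near 0 to near 1, tracing the "S" shape.
for x in (-6, -2, 0, 2, 6):
    print(x, round(sigmoid(x), 4))
```

The function squashes any real input into the interval between 0 and 1, with the midpoint 0.5 at x = 0.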
  • 26. Method Pros/Cons - Decision Trees Decision Trees (CART, CHAID, etc.) Decision tree methods are techniques for partitioning a training file into a tree representation. The starting node is called the root node. Depending upon the results of a test, this node is partitioned into two or more subsets. Each node is then further partitioned until a tree is built. This tree can be mapped into a set of rules. These rules, in the form of a data tree, are used to generate forecasts.
  • 27. Method Pros/Cons - Decision Trees: Nuggets Advantages By far the most important negative for decision trees is that they are forced to make decisions along the way based on limited information, which implicitly leaves out of consideration the vast majority of potential patterns in the training file. This approach may leave valuable patterns undiscovered, since decisions made early in the process will preclude some good rules from being discovered later. This is called “greedy optimization” and lessens the accuracy of the resulting model. Furthermore, large numbers of predictor attributes, as exist in most of today’s data sets, are not handled with decision trees.
  • 28. Method Pros/Cons - Decision Trees: Nuggets Advantages Nuggets does not make these greedy decisions. Instead it “implicitly” searches all possible patterns and thus is able to find patterns that are useful but that wouldn’t be found with decision trees.
  • 29. Summary of Comparison With Other Methods: Nuggets Advantages Nuggets offers many advantages over other methods in common use. A few were presented here. Nuggets’ advantages vary from method to method, and most stem from the assumptions these older methods require, which limit their effectiveness. Nuggets is designed to circumvent these disadvantages and offer a superior methodology that can work with the challenges of the large number of complex databases that exist in today’s world.