© 2003 Hewlett-Packard Development Company, L.P.

Presentation Transcript

  • Product Part Number Classification Application Using SAS Text Miner™
    • Randy Collica, Sr. Business Analyst, Hewlett-Packard
  • Overview
    • Business Problem definition & description.
    • Review of potential solutions.
      • Manual classification.
      • Rule-based classification.
      • Data/Text Mining solution.
    • Text Mining part description fields.
    • Classification Model.
    • Scoring – Model deployment.
    • Application Template to SRM.
    • Summary – Q&A.
  • Business Problem Definition
    • Many part numbers being added and/or combined from previous product hierarchy.
    • Manual classification is very time consuming.
    • Need some sort of semi-automated classifier so that updates can be made in a timely fashion.
    • Accuracy of 90% was acceptable for this business problem.
  • Business Problem Definition
    • Part numbers (Part No.): Abc-xyz, Jkf-fah, 4ys-fhe, …
    • Hierarchy table (Part, Family, Line):
      • Abc-xyz, Bus-Unit1, Comm.
      • 4ys-fhe, Bus-Unit2, Bus.
      • …
  • Business Problem Definition
    • Part numbers are placed into various line and family hierarchies.
    • These hierarchies were developed by product divisions for their own use; a high-end server could be found in several lines and families.
    • The business needed to know how many high-end servers in total were purchased or sold.
    • Hierarchy needed to be “flattened.”
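
The flattening idea above can be sketched as follows. This is a minimal illustration, not the deck's actual process: the records, unit counts, and the `PRODUCT_GROUP` mapping are all hypothetical, and only the sample part numbers are borrowed from the slides.

```python
from collections import Counter

# Hypothetical records: (part_no, family, line, units). Values are illustrative only.
RECORDS = [
    ("Abc-xyz", "Bus-Unit1", "Comm.", 10),
    ("4ys-fhe", "Bus-Unit2", "Bus.", 5),
    ("Jkf-fah", "Bus-Unit1", "Comm.", 7),
]

# Hypothetical flat product group assigned to each part number.
PRODUCT_GROUP = {
    "Abc-xyz": "High-end server",
    "4ys-fhe": "High-end server",
    "Jkf-fah": "Storage option",
}


def units_by_group(records, product_group):
    """Sum units per flat product group, ignoring line and family."""
    totals = Counter()
    for part_no, _family, _line, units in records:
        totals[product_group.get(part_no, "Unclassified")] += units
    return dict(totals)
```

Once every part number carries a single product group, a total such as "all high-end servers" becomes one lookup and one sum, regardless of which line or family the part sits in.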
  • Review of Potential Solutions
    • Manual Classification.
      • Doable, but labor intensive and time consuming; not practical given the person-time investment.
    • Rule-Based Classification.
      • Also doable, but it requires editing code whenever new products appear and is almost as time consuming as the manual method.
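
To make the maintenance cost of the rule-based option concrete, here is a small sketch. The keywords and group names in `RULES` are hypothetical and not taken from the presentation.

```python
# Hypothetical keyword rules; every new product type needs another entry here.
RULES = [
    ("array", "Storage"),
    ("server", "Server"),
]


def classify_by_rules(description, rules=RULES):
    """Return the group of the first matching keyword, or None if no rule fires."""
    text = description.lower()
    for keyword, group in rules:
        if keyword in text:
            return group
    return None
```

Any description that matches no keyword falls through to `None` and still needs manual handling, which is why this approach ends up close to the manual method in effort.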
  • Review of Potential Solutions
    • Data/Text Mining Classification.
      • This solution provides the following attractive benefits:
        • Scoring or classifying “new” product records takes little time with the model.
        • Requires relatively little manual intervention.
        • Can be used for several months before the model needs refitting, depending on how often new products appear.
        • Provides approximately 90% accuracy when tested and reviewed manually.
  • Text Mining part description fields.
    • Part descriptions from various hierarchies were concatenated together to form a long text field.
      • Part Description Field: Modular array adapter EA1234
      • Line Description Field: Adapter, array 128v storage
      • Family Description Field: Storage Products Group Storage Options
      • Final Text Description Field: Modular array adapter EA1234 Adapter, Array 128v storage Storage Products Group Storage Options
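
The concatenation step can be sketched in a few lines. The function name `build_text_field` is an assumption made for illustration; the slides do not say how the fields were actually joined.

```python
def build_text_field(part_desc, line_desc, family_desc):
    """Join the non-empty description fields into one text field for mining."""
    parts = (part_desc, line_desc, family_desc)
    return " ".join(p.strip() for p in parts if p and p.strip())


final_text = build_text_field(
    "Modular array adapter EA1234",
    "Adapter, array 128v storage",
    "Storage Products Group Storage Options",
)
```

Skipping empty fields keeps stray spaces out of the combined text when a part lacks a line or family description.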
  • Classification Model with SAS Text Miner™
    • A multi-level ordinal logistic model was used as the classifier.
    • Memory-based reasoning could also have been used.
    • Results depict very good performance across all 25 levels.
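
Memory-based reasoning is essentially nearest-neighbor classification. The sketch below is a deliberately simplified stand-in, not what SAS Text Miner does internally: it compares raw token sets with Jaccard similarity, and the function names and training examples are hypothetical.

```python
import re
from collections import Counter


def tokens(text):
    """Lowercase a description and split it into a set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a, b):
    """Share of tokens two sets have in common (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def knn_classify(description, labeled, k=3):
    """Vote among the k labeled descriptions most similar to the query.

    `labeled` is a non-empty list of (description, group) pairs.
    """
    query = tokens(description)
    ranked = sorted(
        labeled,
        key=lambda item: jaccard(query, tokens(item[0])),
        reverse=True,
    )
    votes = Counter(group for _text, group in ranked[:k])
    return votes.most_common(1)[0][0]
```

With k=1 the new record simply takes the group of its closest already-classified neighbor; larger k trades sensitivity for stability.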
  • Classification Model with SAS Text Miner™
    • Results on the test data set: 95% overall classification accuracy.
    • Test misclassification rate was 4.7%.
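
Accuracy and misclassification rate are complements, so a 4.7% misclassification rate corresponds to 1 − 0.047 = 0.953, in line with the roughly 95% accuracy quoted. A minimal sketch of the calculation, with a hypothetical function name and toy labels:

```python
def misclassification_rate(actual, predicted):
    """Fraction of records whose predicted group differs from the actual group."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("need two non-empty label lists of equal length")
    wrong = sum(a != p for a, p in zip(actual, predicted))
    return wrong / len(actual)


# Toy labels for illustration only; one of four predictions is wrong.
rate = misclassification_rate(["A", "B", "A", "C"], ["A", "B", "C", "C"])
accuracy = 1 - rate
```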
  • Scoring – Model Deployment
    • Scoring is accomplished by feeding new records through Text Miner; once they have passed through text parsing, the Score node classifies them.
    • The analyst then takes the scored records and adds the product group classification to a SAS format.
    • Approximate turnaround time is 45 min.
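
The last deployment step, folding scored records into a part-number lookup, might look like the sketch below. A Python dict stands in for the SAS format, and the choice to leave existing entries untouched is an assumption, not something the slides state.

```python
def update_lookup(lookup, scored_records):
    """Return a copy of the lookup with newly scored part numbers added.

    Existing entries win over new scores (an assumed policy, so that
    previously reviewed classifications are not overwritten).
    """
    updated = dict(lookup)
    for part_no, group in scored_records:
        updated.setdefault(part_no, group)
    return updated
```

Returning a copy rather than mutating in place makes it easy to review the difference between the old and new lookup before publishing it.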
  • Another Application to SRM
    • This part-description classification approach could also be applied to Supplier Relationship Management (SRM) problems.
    • One such problem is differing part descriptions for the same part ID or number.
  • Another Application to SRM
    • Vendor A part numbers: Abc-xyz; Jkf-fah; 4ys-fhe; …
    • Vendor B part numbers: Ffh,348-1; Ua6,f9f; Nwx,w38; …
    • Both vendors have differing part numbers, but some of the products are actually the same item!
    • The business need would be to classify like part descriptions into similar product groups.
      • Nwx,w38 = Universal thig-a-ma-gig ver. 1
      • Jkf-fah = Universal thig-a-ma-gig ver. 2
      • Versions 1 and 2 carry different dates but, for all practical purposes, are the same part!
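
As a rough illustration of matching like descriptions across vendors, the sketch below uses plain string similarity from Python's standard `difflib` module rather than a trained text-mining model. The 0.9 threshold, the function names, and the description given for Abc-xyz are assumptions.

```python
from difflib import SequenceMatcher


def similarity(desc_a, desc_b):
    """Case-insensitive similarity ratio between two descriptions (0.0 to 1.0)."""
    return SequenceMatcher(None, desc_a.lower(), desc_b.lower()).ratio()


def match_parts(vendor_a, vendor_b, threshold=0.9):
    """Return (part_a, part_b, score) for description pairs at or above threshold."""
    matches = []
    for part_a, desc_a in vendor_a.items():
        for part_b, desc_b in vendor_b.items():
            score = similarity(desc_a, desc_b)
            if score >= threshold:
                matches.append((part_a, part_b, round(score, 3)))
    return matches


vendor_a = {
    "Jkf-fah": "Universal thig-a-ma-gig ver. 2",
    "Abc-xyz": "Modular array adapter",  # hypothetical description
}
vendor_b = {"Nwx,w38": "Universal thig-a-ma-gig ver. 1"}
candidates = match_parts(vendor_a, vendor_b)
```

Pairs that clear the threshold are only candidates for being the same item; a reviewer would still confirm them before the part numbers are merged into one product group.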
  • Summary and Q&A
    • SAS Text Miner™ was used in conjunction with other data mining nodes to create a classification model that predicts part number product categories with ~95% accuracy.
    • This model can be used as a general template for SRM applications as well.
    • Many thanks go to HP and my supervisor Sally Dyer and Carrie Fraser for allowing me to present this.
  • Ending on a Quote
    • “All models are wrong, but some models are useful.”
    • G. E. P. Box
    • North Carolina State