This deck covers spend classification: its definition, the main challenges (categorization at source, inconsistent or missing categorization, multiple disparate taxonomies, and spend dumped into miscellaneous categories), standard vs. custom taxonomies, and how machine learning techniques such as SVM text classification and named entity recognition can help automate spend classification.
2. Acknowledgements
Sincere thanks to Keshava Rangarajan, Chief Architect, Halliburton Corporation, for all the contributions and guidance, without which this research would not have been possible.
3. What is Spend Classification?
• Definition: the process of determining a purchase code for each spend record (Requisitions, Purchase Orders, Receipts, Invoices, etc.) from a hierarchical structure (Taxonomy).
4. Why classify spend?
• Once all spend transactions are classified with a standard code from a taxonomy, simple questions can be answered, such as:
  • What are my top 10 spend categories?
  • What is my travel spend?
  • What is my spend with a given Supplier?
  • What is my spend for a given Part?
  • What is my spend for a given Business Unit?
• If classification is done on consolidated data across all systems in your organization, that classification gives you spend visibility across all of those systems.
5. What is a Taxonomy?
• A simple hierarchical coding structure used to classify spend at different levels:
  Segment
    Family
      Class
        Commodity
6. What is the Spend Classification challenge?
• Categorization at source
• Categorization that is inconsistent or missing completely
• Multiple disparate Taxonomies that may exist in a company
• Classification into a "MISCELLANEOUS" category
• No standardization of Taxonomies
7. What is the "Categorization at source" challenge?
Exercise: buying a work laptop and expensing it via procurement
  ✗ Category: Facility.Building.Hardware
  ✓ Category: IT.Hardware.Laptop
Characteristics:
• User-entered, hence error-prone
• No standardization across the supply chain (business units, customers, or suppliers)
8. What is the "inconsistent/missing Categorization" challenge?
The same item may be categorized differently in different places (or not at all), for example:
• Category: IT.Hardware.Laptop
• Category: IT.Hardware.Computers.Laptop
9. What is the "multiple disparate Taxonomies" challenge?
• Multiple (and disparate) taxonomies may also exist in the organization, with each business unit classifying against its own taxonomy without regard to the taxonomies used in other business units.
(Diagram: Business Unit 1, Business Unit 2, and Business Unit 3 each map to their own Taxonomy 1, Taxonomy 2, and Taxonomy 3.)
10. What is the "MISCELLANEOUS category" challenge?
• Spend transactions are classified into a 'Miscellaneous' category, making it very difficult for business analysts to figure out which category an item should actually belong to.
• Spend analytics will then show a heavily weighted 'Miscellaneous' category, which does not reflect a true picture of spend by category for the organization.
• Similar catch-all categories: OTHERS, UNCATEGORIZED
11. Why standardize Taxonomies?
• An enterprise may have multiple taxonomies at different levels: corporate, strategic, business unit, and regional center.
• Multiple taxonomies at various levels create a number of issues when analyzing spend, so it is important to create or adopt standard taxonomies across the enterprise.
12. What are the types of Spend Classification Taxonomies?
A spend classification taxonomy is either Standard or Custom.
13. Standard Taxonomies
• UNSPSC: United Nations Standard Products and Services Code. Its Segment, Family, Class, and Commodity levels are coded as an 8-digit number, with an optional Business Function level.
Example:
  • Segment 44: Office Equipment and Accessories and Supplies
  • Family 10: Office machines and their supplies and accessories
  • Class 15: Duplicating machines
  • Commodity 01: Photocopiers
  • Business Function 14: Retail
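To make the coding concrete, here is a minimal sketch (Python, illustrative only) that splits the slide's example into its levels; the function name parse_unspsc and the assembled 8-digit code 44101501 are assumptions for illustration, not taken from the UNSPSC specification.

```python
def parse_unspsc(code: str) -> dict:
    """Split an 8-digit UNSPSC code into its four 2-digit levels."""
    assert len(code) == 8 and code.isdigit(), "expected an 8-digit numeric code"
    return {
        "segment":   code[0:2],  # 44 -> Office Equipment and Accessories and Supplies
        "family":    code[2:4],  # 10 -> Office machines and their supplies and accessories
        "class":     code[4:6],  # 15 -> Duplicating machines
        "commodity": code[6:8],  # 01 -> Photocopiers
    }

print(parse_unspsc("44101501"))
# {'segment': '44', 'family': '10', 'class': '15', 'commodity': '01'}
```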
14. Custom Taxonomies
• If your own coding structure is strong enough for your business, or you think your business is more acquainted with its own structure, a custom taxonomy can be used instead of a standard one.
15. Procurement & Spend Analysis data flow
(Diagram: spend sources (1) Requisitions, (2) Purchase Orders, (3) Receipts, and (4) Invoices carry an ERP category plus item, invoice, and supplier descriptions and attributes; a data-mining step performs Spend Classification against the ERP taxonomy, the UNSPSC code, or custom taxonomies.)
16. What is Spend Analysis?
• The process of collecting, cleansing, classifying, and analyzing expenditure data with the purpose of reducing procurement costs.
• The process of aggregating, classifying, and leveraging spend data for the purpose of gaining visibility into cost reduction, performance improvement, and contract compliance opportunities.
• It enables you to answer the following questions:
  • Who is buying?
  • What?
  • From whom?
  • When?
  • (Optionally) Where?
  • At what price?
17. Who needs Spend Analysis?
• Spend analysis is the process of organizing a company's spend in such a way that one can understand it, slice it, dice it, and uncover hidden savings opportunities.
• It impacts more than just the sourcing team.
• Spend analysis/visibility serves three internal user communities:
  • Leadership and CxOs, who need up-to-date reports to drive strategic direction
  • Managers and accountants, who need to drill down into a spend data set to explore specific areas of interest or track down payment specifics
  • Sourcing power users, who need to locate, drive, and monitor the next set of savings initiatives
18. What is Spend Management?
• The process by which companies control and optimize the money they spend.
• It involves cutting operating and other costs associated with doing business.
• It includes spend analysis, sourcing, procurement, receiving, payment settlement, and management of accounts payable and general ledger accounts.
• In an enterprise, spend management means managing how money is spent to best effect in order to build products and services.
• It encompasses processes such as outsourcing, procurement, e-procurement, and supply chain management.
19. Benefits of Spend Management
• Decreased "maverick" spend
• Increased economies of scale in spend
• Strategic sourcing (also called "supplier rationalization")
• Sourcing optimization
• Co-operative sourcing
• Increased process efficiency
• Increased procurement efficiency
20. Life cycle of a PO
1. Create PO
2. Add items to PO
3. Add PO to Cart *
4. Create Document for the PO in the Cart
5. Create Requisition for the Document
Note: the PO needs to be classified before it hits the Cart. After the Order hits the Cart, it is too late for classification.
21. Classifying Spend
• We have a set of pre-defined fields chosen for classification from a Purchase Order. All these fields are concatenated to form one giant string. (Note: this textual string could contain multi-lingual text.)
• Lexers can be used for detecting languages (e.g. auto lexers, world lexers).
• An SVM can be used for the text mining, as in the sketch below.
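As a minimal sketch of the idea (not the deck's actual engine), the concatenated PO string can be vectorized and fed to a linear SVM with scikit-learn; the sample records, taxonomy labels, and field choices below are invented for illustration.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Hypothetical training data: each record is the concatenation of the
# pre-defined PO fields; each label is a taxonomy code.
records = [
    "Dell Latitude 7440 laptop 14in 16GB RAM IT purchase",
    "HP LaserJet toner cartridge black office supplies",
    "Delta airfare Houston to Chicago business travel",
]
labels = ["IT.Hardware.Laptop", "Office.Supplies.Toner", "Travel.Air"]

model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LinearSVC())
model.fit(records, labels)

new_po = "Lenovo ThinkPad T14 laptop 32GB for engineering"
print(model.predict([new_po])[0])  # likely IT.Hardware.Laptop
```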
22. Where does Machine Learning fit in? (Spend Auto-Classification)
(Diagram: a spend transaction, an ontology covering spend descriptions and other textual attributes, and the taxonomies all feed a Spend Auto-classifier built from linguistics (UIMA) plus a neural net engine or text SVM; its output is auto-classified spend.)
23. Training data set
• To begin with, customers provide a training data set drawn from their historic data. They take a well-known data set from their most common use cases, which constitutes a good representation of their problem.
• We run our logic against this training set and get the results. The results are verified, and we iterate for several cycles to tune the logic.
• We then repeat the same process over other use cases.
24. Data Mining Model
• Create a model.
• Once the model is created, enrich/re-train it (see the sketch below):
  • Cleanse incorrect classifications
  • Support new categories (if needed)
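A minimal sketch of this enrich/re-train cycle, assuming a scikit-learn text pipeline like the one sketched earlier; the helper names build_model and enrich_and_retrain are hypothetical.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

def build_model(records, labels):
    """Fit a fresh text-classification model on the current training set."""
    model = make_pipeline(TfidfVectorizer(), LinearSVC())
    model.fit(records, labels)
    return model

def enrich_and_retrain(records, labels, new_records, new_labels, corrections):
    """One cycle: add new examples, cleanse incorrect labels, then refit."""
    records = records + new_records
    labels = labels + new_labels
    for idx, fixed_label in corrections.items():  # analyst-reviewed misclassifications
        labels[idx] = fixed_label
    return build_model(records, labels)
```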
25. What is Named Entity Recognition?
• "Named-entity recognition (NER) (also known as entity identification and entity extraction) is a subtask of information extraction that seeks to locate and classify atomic elements in text into predefined categories such as the names of persons, organizations, locations, expressions of times, quantities, monetary values, percentages, etc." -- Wikipedia
• Most research on NER systems has been structured as taking an unannotated block of text, such as:
  • Jim bought 300 shares of Acme Corp. in 2006.
• and producing an annotated block of text, such as:
  • <ENAMEX TYPE="PERSON">Jim</ENAMEX> bought <NUMEX TYPE="QUANTITY">300</NUMEX> shares of <ENAMEX TYPE="ORGANIZATION">Acme Corp.</ENAMEX> in <TIMEX TYPE="DATE">2006</TIMEX>.
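For a quick hands-on feel, here is a small sketch applied to the same sentence using spaCy, a library that is not named in the deck (the deck's open-source list mentions Stanford NER, GATE/ANNIE, and UIMA instead); the labels in the comment are typical output, not guaranteed.

```python
# Requires: pip install spacy && python -m spacy download en_core_web_sm
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("Jim bought 300 shares of Acme Corp. in 2006.")

for ent in doc.ents:
    print(ent.text, ent.label_)
# Typically: Jim PERSON / 300 CARDINAL / Acme Corp. ORG / 2006 DATE
```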
26. Anatomy of a query
Query = "Find Approved Status POs with High Amount"

27.-33. Stemmed Entity Recognition & Linguistic Parsing yields (built up one element per slide):
• Search Verb: "Find"
• Target Entity: Attribute:Type = "PO"
• ...having Attribute:Status = "Approved"
• ...having Attribute:Amount = "High"
A toy parse along these lines is sketched below.
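The real engine relies on stemmed entity recognition and linguistic parsing; the toy sketch below only mimics the end result with hard-coded vocabularies, and every name and mapping in it is an assumption for illustration.

```python
import re

# Hypothetical vocabularies standing in for the stemmed-entity dictionaries.
SEARCH_VERBS = {"find", "show", "list"}
ENTITY_TYPES = {"po": "PO", "pos": "PO", "invoice": "Invoice", "invoices": "Invoice"}
ATTRIBUTE_VALUES = {"approved": ("Status", "Approved"), "high": ("Amount", "High")}

def parse_query(query: str) -> dict:
    """Toy parse of a spend query into a verb, a target entity, and attribute filters."""
    parsed = {"verb": None, "entity": None, "attributes": {}}
    for token in re.findall(r"[A-Za-z]+", query.lower()):
        if token in SEARCH_VERBS and parsed["verb"] is None:
            parsed["verb"] = token.capitalize()
        elif token in ENTITY_TYPES and parsed["entity"] is None:
            parsed["entity"] = ENTITY_TYPES[token]
        elif token in ATTRIBUTE_VALUES:
            name, value = ATTRIBUTE_VALUES[token]
            parsed["attributes"][name] = value
    return parsed

print(parse_query("Find Approved Status POs with High Amount"))
# {'verb': 'Find', 'entity': 'PO', 'attributes': {'Status': 'Approved', 'Amount': 'High'}}
```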
36. OWL domain ontology
(Diagram: an OWL model whose classes include Transaction, Party, Role, Bank, Person, Corporation, and Finance Corporation; a Transaction has many Parties and can be related to other Transactions; a Party has a Code and an ID and plays a Role; a Person has a First Name, a Last Name, and an ID, has many Addresses (Door Number, Street Name, City, State, Zip, Country), and has an Account (with an Account Number) in a Bank; classes are OWL:class nodes and fields are OWL string or number attributes.)
37. Example instance: a transaction and its bank party
Transaction ID: 200911071234
  has Party (ID: SBK, Role: S? Bank Role), played by a Bank
    Bank has Name: Bank Of Congo
    Bank has many Addresses, e.g. Street Name: Afrique Au Congo, Country: RDC
38. Example instance: a transaction and its ordering party
Transaction ID: 200911071235
  has Party (ID: ORP, Role: Ordering Party Role), played by a Person
    Person has First Name: John, Last Name: Doe
    Person has many Addresses, e.g. City: Kinshasa, Country: CD
    Person has an Account (Account Id: 123456) in the Bank with Name: Bank Of Congo
39. Relating the two transactions
(Diagram: Transaction 200911071234 and Transaction 200911071235 are related; the first has Party SBK in the S? Bank Role, played by the Bank named Bank Of Congo with an Address whose Street Name is Afrique Au Congo and Country is RDC; the second has Party ORP in the Ordering Party Role, played by the Person John Doe with an Address in Kinshasa, Country CD, and an Account (Id: 123456) in the same Bank Of Congo.)
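A minimal sketch of how instances like these could be captured as triples with rdflib; the namespace http://example.org/spend# and all property names are assumptions for illustration, not the deck's actual OWL ontology.

```python
from rdflib import BNode, Graph, Literal, Namespace
from rdflib.namespace import RDF

EX = Namespace("http://example.org/spend#")  # hypothetical namespace
g = Graph()
g.bind("ex", EX)

tx = EX["tx200911071235"]
party = BNode()
person = BNode()
bank = EX["bankOfCongo"]

g.add((tx, RDF.type, EX.Transaction))
g.add((tx, EX.hasParty, party))
g.add((party, EX.id, Literal("ORP")))
g.add((party, EX.role, Literal("Ordering Party Role")))
g.add((party, EX.playedBy, person))
g.add((person, RDF.type, EX.Person))
g.add((person, EX.firstName, Literal("John")))
g.add((person, EX.lastName, Literal("Doe")))
g.add((person, EX.hasAccountIn, bank))
g.add((bank, EX.name, Literal("Bank Of Congo")))
g.add((tx, EX.isRelatedTo, EX["tx200911071234"]))  # link the two transactions

print(g.serialize(format="turtle"))
```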
40. A possible solution: a pipelining approach
• Flow 1:
  • Machine learning Pipeline: input data is fed directly to the Machine Learning piece.
• Flow 2:
  • Domain Ontology Pipeline: input data is fed to a Domain Ontology.
  • Standardize the output from the Domain Ontology.
  • Machine learning Pipeline: feed it into the Machine Learning piece.
• Flow 3:
  • NER Pipeline: input data is fed to a NER.
  • Domain Ontology Pipeline: output from the NER is fed to the Domain Ontology.
  • Standardize the output from the Domain Ontology.
  • Machine learning Pipeline: feed it into the Machine Learning piece.
• Note: the Domain Ontology and NER Pipelines can optionally be turned on or off (see the sketch below).
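A minimal sketch of the three flows, with the NER and Domain Ontology stages behind flags; every function here is a placeholder stand-in (an assumption for illustration), not the actual implementation.

```python
def ner_stage(text: str) -> str:
    return text  # placeholder: would tag named entities in the spend description

def ontology_stage(text: str) -> str:
    return text  # placeholder: would map terms through the domain ontology

def standardize(text: str) -> str:
    return text.strip().lower()  # placeholder standardization step

def classify_stage(text: str) -> str:
    return "IT.Hardware.Laptop"  # placeholder: trained SVM / neural net goes here

def classify_spend(text: str, use_ner: bool = False, use_ontology: bool = False) -> str:
    """Flow 1: ML only. Flow 2: ontology then ML. Flow 3: NER, ontology, then ML."""
    if use_ner:
        text = ner_stage(text)
    if use_ontology:
        text = standardize(ontology_stage(text))
    return classify_stage(text)

print(classify_spend("Dell Latitude laptop", use_ner=True, use_ontology=True))
```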
49. SVM Steps
1. Identify the taxonomy (hierarchical or flat) to be classified against
2. Identify representative training data that has been classified to this taxonomy
3. Run the training data against a blank SVM model and the given taxonomy
4. Classify the training data as per the required taxonomy
5. Classify the data
6. Increase the training population and enrich the classification model
7. Recognize and realign the impact of the original model against the fresh training data
8. Classify (manually) misclassifications into the proper taxonomy nodes
9. Run steps 6 through 8 until all the variations for a given domain have been recognized
10. Introduce live data
11. Repeat steps 4 and 5 for misclassifications
12. Store the result in a relational database
13. Insert the data into an Ontology
14. Enable analysis using RQL or SPARQL (see the sketch below)
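A minimal sketch of steps 13 and 14: insert classified spend into an RDF graph with rdflib and analyse it with a SPARQL aggregate query; the namespace, property names, and sample data are assumptions for illustration.

```python
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import RDF, XSD

EX = Namespace("http://example.org/spend#")  # hypothetical namespace
g = Graph()
g.bind("ex", EX)

classified = [  # invented sample output of the classifier
    ("tx1", "IT.Hardware.Laptop", 1200.0),
    ("tx2", "Travel.Air", 450.0),
    ("tx3", "IT.Hardware.Laptop", 1350.0),
]
for tx_id, category, amount in classified:
    tx = EX[tx_id]
    g.add((tx, RDF.type, EX.SpendTransaction))
    g.add((tx, EX.category, Literal(category)))
    g.add((tx, EX.amount, Literal(amount, datatype=XSD.decimal)))

# Total spend per category, answering "what are my top spend categories?"
query = """
PREFIX ex: <http://example.org/spend#>
SELECT ?category (SUM(?amount) AS ?total)
WHERE {
    ?tx a ex:SpendTransaction ;
        ex:category ?category ;
        ex:amount ?amount .
}
GROUP BY ?category
ORDER BY DESC(?total)
"""
for row in g.query(query):
    print(row.category, row.total)
```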
50. Open source software
1. Jena
2. Pentaho, http://www.pentaho.com/
3. Stanford NER, http://nlp.stanford.edu/software/CRF-NER.shtml
4. ANNIE NER
5. GATE
6. UIMA
7. SVM, http://en.wikipedia.org/wiki/Support_vector_machine
We will talk about auto-classification and the place for machine learning. When a spend transaction is added, the positioning of that spend in terms of a formal taxonomy might have to be changed dynamically, and that is not something a person can do manually in real time; we need an automated way of doing it. The spend transactions themselves have descriptions. When a tagging activity happens, or a review is written up, there is textual information. We could use UIMA to pick out all the textual tokens, break them out into attributes, and do Named Entity Recognition. Then we bring in a trained SVM engine that works on a model, picks up all the spend descriptions and their attributes from the classification model, tags them, and positions them appropriately in the taxonomy. There are two flavors available, a neural net engine and an SVM, and they have comparable performance. The bottom line is that we took in the spend taxonomy and the spend ontology that describes the entire spend model as well as the description of the spend; you can run it through a neural net engine and then tag things, so that, as and when a new spend transaction is introduced, it is appropriately positioned in the taxonomy, dynamically.