© 2008 TIBCO Software Inc. All Rights Reserved. Confidential and Proprietary.
TIBCO® Patterns
Partner Enablement – October 12th, 2011
Making Systems Smarter about dealing
with “imperfect” data
Dave Chamberlain
dchamberlain@Tibco.com
Safe Harbor Disclosure
During the course of this presentation TIBCO or its representatives may make
forward-looking statements regarding future events, TIBCO's future results or our
future financial performance. These statements are based on management's
current expectations. Although we believe that the expectations reflected in the
forward-looking statements contained in this presentation are reasonable, these
expectations or any of the forward-looking statements could prove to be
incorrect and actual results or financial performance could differ materially from
those stated herein. We refer you to the reports that TIBCO files from time to time
with the Securities and Exchange Commission for a discussion of important
factors that could cause actual results or financial performance to differ materially
from those contained in any forward-looking statement made in connection with
this presentation. TIBCO does not undertake to update any forward-looking
statement that may be made from time to time or on its behalf.
First Last Addr1 Addr2 City State Zip DOB
Jon Smith 1030 Main St. Princeton NJ O8540 10/12/79
10/12/97 Jon Smiht 1030 Main Princeton NJ 0854O
John Smyth Main Street 103A Pton NJ 08540 12/12/79
What's the problem?
Humans can tell these records are about the
same person
Systems have a very hard time
they can't
First Last Addr1 Addr2 City County Post-code DOB
Jonathan Price 103 The High Street Flat 2 York Yorkshire YR1604 10/12/79
Pryce Jon 1o3-2 High St YR16o4 Dec 10 1977 York
John Prce High St #103 2 Y0rk Yorkshire YR1064 12/12/79
What's the problem?
Humans can tell these records are about the
same person
Systems have a very hard time
they can't
TIBCO® Patterns
 Focused on structured (fielded) data
 Products, people, companies, claims, events, etc…
 In-memory, real-time and designed to be embedded
 Products
 TIBCO® Patterns - Search
• Finds patterns systems or people are looking for in data
 TIBCO® Patterns - Learn
• Detects and learns patterns when humans make decisions on data
similarity
Enables organizations to “connect the dots”
Horizontal applicability – all industries and agencies
• CSRs looking for the right customer
• Admissions finding the right patient
• Customers finding things to buy
• Intel agencies identifying terrorists
Find
• Identifying records about the same customer for KYC and SCV
regulations
• Ensuring citizens receive correct entitlements
• Conforming with import/export regulations
Match
• Identifying potential fraud
• Anti Money Laundering
• Creating and maintaining a Master Patient Index
Link
The good news!
Use cases by key verticals
• Building 360 degree view of customers for regulatory purposes
• Generating better up sell and cross sell opportunities (with BE integration)
• Quickly finding the right customer
• Anti Money Laundering
FSI
• Quickly finding the right customer
• Understanding total relationship with customers
• Keeping multiple systems synchronized
Telco
• Law enforcement/Intel – finding the “bad guys”
• Making sure our kids are safe – child protection/youth services
• Ensuring citizens receive (only) their correct entitlements
Federal & State
Government
• Consolidating customers due to M&A activity
• Matching energy trade sides
• Linking data about grid and network assets
Energy
• Identifying duplicates in Master Patient Index
• Linking patient encounter records for outcome driven healthcare
• Finding the right patient, first time every time
Healthcare
The New Data Challenge
[Figure: enterprise computing eras compared by building block, software, velocity, interactions, time to react, amount of data and half-life of data]
Enterprise 1.0 ('60s – '80s): Data Processing
Enterprise 2.0 ('80s – 2000): Client Server
Enterprise 3.0 (2000 – 2020): Predictive
Labels shown in the figure: Mainframe, Database, Batch, Online, 2-Tier, 3-Tier, N-Tier, ESB, Event Driven; interaction volumes of 000,000's, 000,000,000's and 000,000,000,000's
Now it's even more important to deal effectively and efficiently with imperfect data
The problems we all face
In the real world, database information is never 100% perfect, never 100%
consistent, and never 100% complete – and never can be. Data by its nature is
full of errors: omissions, inconsistencies and duplicates.
 A root cause - human-computer gap
 Humans recall information approximately and easily tolerate data errors and
variations when determining similarity
 Software has been exact and unforgiving
 Equality or inequality is easy
 “last name = chamberlain” - “inventory level < 100”
 Similarity is difficult
 Select * from customers with .85 similarity between this and that…
 “Chamberlain” ≈ “Chumberland” - are they the same person?
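The gap between equality and similarity can be sketched in a few lines of Python. This is only an illustration: it uses the standard library's difflib.SequenceMatcher as a stand-in similarity measure, not the TIBCO Patterns algorithm, and the 0.8 cut-off is an arbitrary assumption.

```python
from difflib import SequenceMatcher


def similarity(left: str, right: str) -> float:
    """Return a 0..1 similarity ratio (difflib stand-in, case-insensitive)."""
    return SequenceMatcher(None, left.lower(), right.lower()).ratio()


a, b = "Chamberlain", "Chumberland"
exact = (a == b)              # all that "last name = chamberlain" can express
score = similarity(a, b)      # what a similarity query needs instead
is_candidate = score >= 0.8   # assumed cut-off, for illustration only
print(exact, round(score, 2), is_candidate)
```

An exact comparison rejects the pair outright, while a similarity score leaves room to decide whether the two spellings could refer to the same person.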
The cost of not finding the data you need
 The organizational/societal cost is high
 Terrorists board planes
 Criminals get away
 Patients get the wrong treatments
 Enterprises don't realize economies of scale
 FSI doesn't really know their customers – up-sell/x-sell opportunities are lost –
risk is not known
 Government entitlements get abused - fraud goes undetected
 Goods and/or people enter or leave a country illegally
 Repeat drunk drivers get driver's licenses
 TV listings are wrong
 Logitech remote controls don't work correctly
 etc…
 These types of problems permeate every organization
Types of things our customers do
 Ensuring compliance with export/import regulations
 Linking patients and their visit records for outcome-driven healthcare
 Finding the right person across all law enforcement systems
 Creating the world's largest Biobank for genetic researchers
 Helping customers find the right brand and the right model to program their
remote controls
 Automating the ingest of TV programming schedules from over 150 broadcast
and cable operators
 Reducing turnaround time from 5 days to 4 hours to respond to customer
requests for equipment
 And many more examples…
https://ssl.tibcommunity.com/community/products/patterns
Our innovations
Problem: How similar are sets of data elements?
• Conventional solutions: Soundex, NYSIIS, Edit Distance, Metaphone, etc.
• TIBCO innovation (TIBCO Patterns - Search): Mathematical model that finds patterns systems or people are looking for in data
Problem: Are records about the same entity?
• Conventional solutions: Custom built matching rule sets - optional statistical parameters
• TIBCO innovation (TIBCO Patterns - Learn): Mathematical model that identifies and learns patterns as humans make decisions about data similarity
Advantages
• Superior accuracy
• Symmetric error-tolerance
• No guessing of rules and parameters
• Computational efficiency & scalability
• Data independence - people, assets, TV programs, stock trades, products, companies, claims, transactions, etc.
• Engineering efficiency - easy to maintain and refine
• Independent of language
• Real-time
• Sparse data support built-in
• Easily embeddable
• Quick and easy deployment
• DBMS independent
TIBCO Patterns – Search - Bipartite Graph – String Matching w. Unigrams
 Cost = |displacement| (linear cost function)
 Pick set of edges that minimize cost
 Only one edge per symbol allowed
P E T E R _ S M I T H
S M I T P E T T E R
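[Figure: candidate edges between equal symbols of the two strings, each labeled with its displacement: 4, 4, 5, -6, -6, -6, 5, 2, 7, 4, 5, -6, -3, -2, 1]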
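The three rules on this slide can be sketched as a small assignment problem. The Python below is a simplified illustration, not the product's implementation. It assumes the two strings are "PETER_SMITH" and "SMITPETTER" as read off the slide, that positions are 0-based, and that every shared symbol is paired as often as possible (otherwise an empty edge set would trivially cost 0). The function names are hypothetical.

```python
from collections import defaultdict


def positions(text):
    """Map each symbol to the ascending list of indices where it occurs."""
    index = defaultdict(list)
    for i, symbol in enumerate(text):
        index[symbol].append(i)
    return index


def cheapest_pairing(short, long):
    """Minimum total |displacement| when every index in `short` is paired
    with a distinct index in `long` (both lists sorted ascending)."""
    inf = float("inf")
    # best[j]: cheapest cost for the indices handled so far,
    # using only the first j entries of `long`.
    best = [0] * (len(long) + 1)
    for i, p in enumerate(short, start=1):
        nxt = [inf] * (len(long) + 1)
        for j in range(i, len(long) + 1):
            use = best[j - 1] + abs(p - long[j - 1])  # pair p with long[j-1]
            nxt[j] = min(nxt[j - 1], use)             # or skip long[j-1]
        best = nxt
    return best[len(long)]


def unigram_cost(a, b):
    """Sum of per-symbol minimum displacement costs, one edge per symbol occurrence."""
    pa, pb = positions(a), positions(b)
    total = 0
    for symbol in pa.keys() & pb.keys():
        x, y = pa[symbol], pb[symbol]
        short, long = (x, y) if len(x) <= len(y) else (y, x)
        total += cheapest_pairing(short, long)
    return total


print(unigram_cost("PETER_SMITH", "SMITPETTER"))
```

Symbols that occur in only one string (here "_" and "H") simply get no edge. A real engine would also have to penalize such unmatched symbols, which this sketch leaves out.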
TIBCO Patterns – Search - Bipartite Graph – String Matching w. Polygrams
P E T E R _ S M I T H
S M I T P E T T E R
[Figure: polygram edges with displacements 4, 5 and -6]
Total cost = 4 + 5 + |-6| = 15
Find local cost minimum
Longer grams have more “weight”
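The total of 15 can be reproduced with a short Python sketch. It assumes the two strings are "PETER_SMITH" and "SMITPETTER", that positions are 0-based, and that the three grams behind the edges are "PET", "ER" and "SMIT". The gram choice is inferred from the displacements and is not stated on the slide; the helper name is hypothetical.

```python
def gram_displacements(a, b, grams):
    """Displacement of each gram: its first start index in b minus that in a."""
    return {g: b.index(g) - a.index(g) for g in grams}


a, b = "PETER_SMITH", "SMITPETTER"
moves = gram_displacements(a, b, ["PET", "ER", "SMIT"])
total = sum(abs(d) for d in moves.values())  # linear cost: |displacement|
print(moves, total)
```

Treating "PET", "ER" and "SMIT" as single units replaces nine single-symbol edges with three, which is the sense in which longer grams carry more weight.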
Bipartite Graph – String Matching w. Alignment
 Shifted 4 positions for global cost minimum (edges may change)
 Minimizing total cost (w/o weight: 12, w/ weight: 42) [simplified]
 Different solutions possible – weights, tokenization, …
P E T E R _ S M I T H
S M I T P E T T E R
0 (x3)
-10 (x4)
1 (x2)
0
1
(x#: weight based on length)
Why is this relevant?
 Unique capabilities result from fundamental approach
 Closest to human intuition – “natural” paradigm
 Translates to accuracy
 Complete independence of domain – any sequences embedded into 1-space
(think genetic sequences)
 Does not care about data type, culture, language,
character set, tokenization, fielding
 Solid scientific footing guarantees robustness
(linear behavior)
First Last DOB Address City State Height Hair color etc
Rec 1 Jason Fitzgerlad 12/1/1971 200 Classen St. Paul MN 5'10" Brown
Rec 2 Fitzgerland Jasoz 2000 N Classon Saint Paul MN 5-11 Brawn
TIBCO Patterns - Search
(0.80) 0.90 0.82 -1 0.87 0.85 1.0 0.95 1.0
TIBCO Patterns - Learn
Overall score / classification 0.93 Intelligent combination of field scores
Search compared to Learn
TIBCO Patterns - Learn
 N input features F = (f1,f2, … fn)
 Similarity score
 Custom score (date)
 Binary values: both records male/female
 Other numeric input
 Features can be missing (defaults, undefined, invalid): -1
 Similarity problem is a different one depending on what information is present
(If you only have a name and no address you look at the name differently!)
 Conditional dependencies = hidden patterns in data
 When ID matches closely, you are more generous in the address field
 When (both records) female, totally different last name is acceptable (if first name is similar or …)
 Thresholds, weights, patterns, …
 Humans do it intuitively – such as recognize a person
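A feature vector of this kind can be sketched in Python. This is an illustration only: difflib stands in for the per-field similarity scores, the field names and the naive averaging baseline are assumptions, and the baseline is exactly the sort of fixed rule that a learned model is meant to improve on. The one detail taken from the slide is that a missing feature is encoded as -1.

```python
from difflib import SequenceMatcher

MISSING = -1.0  # the slide's marker for a missing/undefined/invalid feature


def field_score(left, right):
    """Similarity of two field values, or MISSING if either one is absent."""
    if not left or not right:
        return MISSING
    return SequenceMatcher(None, left.lower(), right.lower()).ratio()


def feature_vector(rec_a, rec_b, fields):
    """One similarity feature per field, in the order given by `fields`."""
    return [field_score(rec_a.get(f), rec_b.get(f)) for f in fields]


def naive_overall(features):
    """Baseline only: average the features that are present."""
    present = [s for s in features if s != MISSING]
    return sum(present) / len(present) if present else MISSING


fields = ["first", "last", "dob", "city"]
rec1 = {"first": "Jason", "last": "Fitzgerlad", "dob": "12/1/1971", "city": "St. Paul"}
rec2 = {"first": "Jasoz", "last": "Fitzgerland", "dob": None, "city": "Saint Paul"}
features = feature_vector(rec1, rec2, fields)
print(features, naive_overall(features))
```

A plain average treats every present field the same way regardless of which other fields are present, so it cannot express the conditional dependencies listed above.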
TIBCO® Patterns – Learn - training
 Pair selection for training
 Human user is presented with a pair of records
 Machine Learning Engine sees the numeric features and human answer
 Engine updates model and tests its performance
 Stop when model converges
 Avoid overtraining
[Figure: training loop with the stages Initial Matching, Pair selection, Labeling (Domain Experts), Train, Test]
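The train-and-test loop can be sketched with a very small model. The Python below uses plain logistic regression with stochastic gradient updates as a stand-in; the slides do not say which model TIBCO Patterns - Learn uses, and the learning rate, patience value and sample data are assumptions. Each labeled pair is a list of numeric features plus the human answer (1 = same entity, 0 = different).

```python
import math


def predict(weights, features):
    """Probability that a pair is a match; weights[0] is the bias term."""
    z = weights[0] + sum(w * x for w, x in zip(weights[1:], features))
    return 1.0 / (1.0 + math.exp(-z))


def train(labeled, holdout, rate=0.5, max_rounds=200, patience=5):
    """Update the model pair by pair, test after each round, and stop once
    held-out accuracy has not improved for `patience` rounds."""
    weights = [0.0] * (len(labeled[0][0]) + 1)
    best_acc, best_weights, stale = -1.0, list(weights), 0
    for _ in range(max_rounds):
        for features, label in labeled:
            error = label - predict(weights, features)
            weights[0] += rate * error
            for i, x in enumerate(features):
                weights[i + 1] += rate * error * x
        hits = sum((predict(weights, f) >= 0.5) == bool(y) for f, y in holdout)
        acc = hits / len(holdout)
        if acc > best_acc:
            best_acc, best_weights, stale = acc, list(weights), 0
        else:
            stale += 1
            if stale >= patience:  # stop early to avoid overtraining
                break
    return best_weights, best_acc


labeled = [([0.9, 0.95], 1), ([0.85, 0.9], 1), ([0.2, 0.3], 0), ([0.1, 0.4], 0)]
holdout = [([0.88, 0.92], 1), ([0.15, 0.25], 0)]
model, accuracy = train(labeled, holdout)
print(accuracy)
```

Keeping the weights from the best held-out round, rather than the last round, is one simple way to act on the "avoid overtraining" bullet.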
TIBCO Patterns – Learn - Deployment
 Deploy model - incrementally train a model with new
patterns
 Add features to existing model and incrementally train
 Select among multiple models at run time
 Significant boost in accuracy
 Need expert operators to coach during training
 Set and forget – very robust
TIBCO® Patterns - Learn - When to use it
 Multiple patterns present
 Many (short) fields
 Sparse data
 When you can’t or don’t want to build matching rules to
deal with multiple parallel scenarios
 e.g. Comparables matching: product data, similarity
judged based on UPC code, or name and manufacturer or
description only
California Department of Public Health
Prenatal Genetic Screening Program
TIBCO® Patterns evaluation and implementation
 CDPH benchmarked TIBCO and a competitor
 After 3 weeks competitor reached 79% accuracy
 After 1 day TIBCO topped 97%
 Two phase project undertaken
 First - cleaning the CDPH database (get clean)
 5.5 million record reference database of at risk women
 2.3 million duplicate records identified - representing 1 million unique women
 1.3 million duplicates eliminated – leaving reference database of 4.2 million unique women
 Then – automate matching of incoming test results (stay clean)
 Before TIBCO – 65% automated match rate
 After TIBCO – 95%+ automated match rate
 Overall results
 Greatly improved levels of automation
 Earlier identification/treatment/counseling for possible problems
 Bottom line - Better quality care for at risk women and their unborn
Labs
CDPH
Reference
Database
TIBCO® Patterns
Labs
Test results from contract
labs and PDCs
Screening results
ingest process
> threshold = match
< threshold = no match = new
>< thresholds = human review?
Human review
and action
California Department of Public Health
Prenatal Genetic Screening Program
Diagnostic
Centers
Diagnostic
Centers
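The three-way routing in this diagram can be sketched as a small Python function. The two threshold values are placeholders chosen for illustration; the slides do not give the values CDPH uses.

```python
def triage(score, lower=0.60, upper=0.90):
    """Route an incoming record by its best match score against the reference database."""
    if score >= upper:
        return "match"         # above the upper threshold: link automatically
    if score < lower:
        return "new"           # below the lower threshold: treat as a new record
    return "human review"      # between the thresholds: send to a reviewer


for s in (0.95, 0.75, 0.30):
    print(s, triage(s))
```

Moving the two thresholds closer together raises the automated match rate at the cost of more wrong automatic decisions; moving them apart sends more records to human review.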
Typical customers and partners
Architectural Overview
TIBCO® Patterns
Server based engines: Search, Learn
Language/client interfaces: .NET, C/C++, Java, Python, ActiveMatrix, BusinessWorks, BusinessEvents, CIM
Supported OSes: Solaris (32/64), Linux (32/64), HPUX (32/64), AIX (32/64), Windows (32/64), VMS (32/64)
Typical Deployment
Database Engine(s)
(ActiveSpaces, Oracle, DB2, SQLServer, MySQL, Informix,
Sybase, Postgres, Caché, …)
Applications
(front-end and/or back-end)
TIBCO client
TCP/IP sockets
TCP/IP sockets
Current
Applications
Run unchanged TIBCO® Patterns
Loader / Syncer
Tables
• Multiple instances with multiple tables
• TCP/IP Sockets
• Thin client to marshal requests and return results
• Partition and/or replicate data for scale
• Loader/syncer for initial load and subsequent synchronization
Other data sources
Identifying opportunities
 Every one of your customers and prospects – is a prospect!
 Some questions to ask
 What is the business impact (and cost) of not being able to deal effectively
with imperfect/bad data?
 Where do you have people (either your own or customers) spending a lot of
time searching for the right data (about people, products, suppliers, etc.)?
 How many people do you have matching records by hand?
 What would it mean if you were to automate a higher percentage of the
matching?
 What is your current level of matching accuracy?
 What do you do with the records that SQL can't match?
 How do you deal with content differences between records that represent the
same entity?
Resources
 The Princeton team
 “Webex” sessions/demos, on-site meetings, EBC visits, POCs, custom demos, industry
or customer specific materials…Anything we can do to help identify/develop/close
TIBCO Patterns license revenue.
 Live demos (the only vendor to do this, I wonder why…)
 Demo index - http://www.netrics.com/demo_index/
 English live demo (try the advanced search button) - http://netrics.com/demo/
 Multilingual (try the surprise me button) demo - http://www.netrics.com/demo/index_foreign.php
 Oil well head data - http://www.netrics.com/demo_energy_oil
 Spanish names - http://www.netrics.com/demo_spanish_names
 Portuguese university names - http://www.netrics.com/demo_universities/
 FDA drug demo - http://netrics.com/demo_fda_drugs/
 SalesCentral materials - https://salescentral.tibco.com/people/dchamberlain?view=documents
 2 minute explainer - http://www.netrics.com/demo/NetricsSPOT.html
 TIBCommunity - https://ssl.tibcommunity.com/people/dchamberlain?view=documents
Live demonstration of TIBCO® Patterns capabilities
It’s inherently very difficult to demonstrate an engine, and
we wanted to show:
1. The ability to deal effectively and efficiently with just about
any type of structured data
2. The ability to work with any “language”
3. Very low latency when dealing with large data sets
Live demonstration of capabilities
TIBCO is the only vendor to feature live demonstrations. We
show the ability to deal with any type of data in any language
with very low latency.
English Demos
Multilingual Demos
FDA Drug Demo
Differentiation
 TIBCO innovations are unique in the market…
 Mathematical modeling
• Finding patterns in data – giving systems and people the data they need
• Finding and learning from human decisions
 Eliminates the need to guess complex matching rule sets:
• Difficult to develop, maintain and update
 Works equally effectively across multi-domain, multi-lingual data
 Does not require a DBMS, but integrates nicely if needed
 The results are unmatched…
 Accuracy
 Speed
 Scalability
 Easy to deploy, maintain and update
Five things customers should consider
 Accuracy – how close can the system come to reaching the same
conclusions as a domain expert when faced with the same data?
 Efficiency – how easily can the system deal with increasingly large volumes
of data and workloads?
 Entity and language independence – how does the system deal with data
about any type of business entity in any language? Systems are global and
need to deal with data about many entities other than customers and
products.
 Configurability – what options are provided to fine tune requests to easily
achieve the desired results?
 Ease of integration – how is the system integrated into existing applications,
processes and tools? What native language support is provided? What ESB,
SOA, BPM and CEP products are supported?
TIBCO® Patterns
Customer stories
Types of things our customers do
 Ensuring pre-natal genetic screening results are linked to the right woman
 Automating the ingest of TV programming schedules from over 150
broadcast and cable operators
 Helping customers find the right brand and the right model to program their
remote controls
 Ensuring compliance with government export/import regulations
 Reducing turnaround time from 5 days to 4 hours when responding to
customer requests for equipment
 Helping UK government agencies collaborate to provide better care
 Providing real-time linking and de-duplication across 700 million
bibliographical and citation items
 Linking patients and their visit records for outcome-driven healthcare
 Creating the world’s largest Biobank for genetic researchers
California Department of Public Health
Prenatal Genetic Screening Program
TIBCO® Patterns - evaluation and implementation
 CDPH benchmarked the effectiveness of TIBCO and a competitor
 Created a standardized data set, identified duplicates by hand, then…
 After 3 weeks competitor reached 79% accuracy
 After 1 day TIBCO topped 97%
 Two phase project undertaken
 First - cleaning the CDPH database (get clean)
 5.5 million record reference database of at risk women
 2.3 million duplicate records identified - representing 1 million unique women
 1.3 million duplicates eliminated – leaving reference database of 4.2 million unique women
 Then – automate matching of incoming test results (stay clean)
 Before TIBCO – 65% automated match rate
 After TIBCO – 95%+ automated match rate
 Results
 Greatly improved levels of automation
Contract
Labs
4.2 million
records of at risk
women
TIBCO® Patterns
Contract
Labs
Test results from contract
labs and PDCs
Genetic testing results
ingest process
> threshold = link
< threshold = no link = new?
<>thresholds – human review
Human review
and action
CDPH – linking test results to the right woman
Prenatal
Diagnostic
Centers
Prenatal
Diagnostic
Centers
TV Guide – accurate, timely programming
 Different stations often describe the same show in different ways
 TV Guide has built a reference dataset of 2M+ programs
 Regular ingesting of future programming from hundreds of broadcast/cable
operators
 Millions of Web, print and channel guide customers
 Information on over 12,0000 channel lineups provided to hundreds of cable/satellite
operators and millions of Web visitors and print readers
 TV Guide benchmarked several vendors – TIBCO selected for superior accuracy
and automation
 Results
 Significant reduction in manual effort
 More accurate guides
 People now focus on enriching the data and enhancing customer experiences
Providing accurate, informative program
information for 100s of millions of customers
Cable
operators
2+ million record
content database
TIBCO® Patterns
Programming from
hundreds of cable, satellite
& broadcast outlets
Match and link incoming
records to content DB
>Exists – link/enrich content
<New – add to content DB
<>Uncertain – human review
Human review
and action
TV Guide – matching future programming to content database
 Incoming data quality is highly variable
 Content data on over 170,000 movies, a million plus TV
series episodes and every TV show since 1960
Broadcast
channels
Satellite
providers
Logitech – increasing customer satisfaction by making it easier to find the right
brand/model
 Electronics brand/model number combinations are complex and hard to
transcribe
 Customers were becoming frustrated…
 Needed a way of suggesting the right brand/model even when customer entries
were way off
 TIBCO Patterns – Search is used “behind” the Web UI to find the closest matching
models and suggest them to the consumer
Harmony remotes feature activity-based control that makes getting
to what you want to do as simple as pressing a button
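The "suggest the closest models" idea can be sketched with Python's standard library. This is a stand-in only: difflib.get_close_matches is not what TIBCO Patterns - Search does internally, the catalogue entries are made-up names, and the cutoff of 0.5 is an arbitrary assumption.

```python
from difflib import get_close_matches


def suggest_models(entry, catalogue, limit=3, cutoff=0.5):
    """Return up to `limit` catalogue entries closest to what the customer typed."""
    by_key = {name.lower(): name for name in catalogue}  # case-insensitive lookup
    hits = get_close_matches(entry.lower(), list(by_key), n=limit, cutoff=cutoff)
    return [by_key[h] for h in hits]


# Hypothetical brand/model catalogue, for illustration only.
catalogue = ["Acme TX-4000", "Acme TX-4400", "Globex RX-4000", "Initech DVR-200"]
suggestions = suggest_models("acme tx4000", catalogue)
print(suggestions)
```

The customer's entry is missing the hyphen and uses different capitalization, yet the intended model still comes back as the first suggestion, with near neighbours offered as alternatives.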
Web UI
300,000+ records
about brands &
models
TIBCO® Patterns
Customers programming
their Harmony remotes
Finding the closest
matching models
> threshold = show closest
matching models
Logitech – finding the right model to program
Web UI
Web UI
Customs & Excise Department of Hong Kong/PCCW
TIBCO® Patterns - evaluation and implementation
 Hong Kong Customs & Excise Dept required a matching engine as part of their
initial specification of the ROCARS system
 Need to check bills of lading and manifests against a series of white/black lists – in
a mix of Simplified Chinese, Traditional Chinese and English
 PCCW won the contract and started development work with another vendor's
matching capability – and quickly realized it was not up to the job
 In late 2008, PCCW Googled and found TIBCO® Patterns – after several weeks of
discussions and demonstrations they started - and quickly finished a POC
 The ROCARS system went live in early 2010 with TIBCO® Patterns at the heart of the
“risk engine”.
ROCARS System
Border
crossings
White lists
Black lists
TIBCO® Patterns
ROCARS data
entry sources
Checking Bills of Lading
and Cargo Manifests
> threshold = suspicious
< threshold = OK
<> thresholds – human review
Human review
and action
Customs & Excise Department – ensuring compliance with
government regulations
Web
Self service
kiosks
Sterilmed – automating quote process
Sterilmed offers products and services to help healthcare providers lower their
device and equipment costs
 Hospitals send lists of up to 5,000 items of medical equipment they need,
often in Excel spreadsheets
 Specific line items need to be matched to the Sterilmed inventory system – required
equipment would usually be described differently
 Getting a quote for a request could take up to a week - several analysts matching line
item by line item
 Process of producing a quote has decreased to 4 hours, and is much more highly
automated
 Other uses now include matching across various healthcare industry databases and
identifying duplicate contacts across systems
 Results
 More efficient and effective business processes
 People now able to do their real job
Hospitals
Sterilmed device
and equipment
inventory
TIBCO® Patterns
Equipment requests from
healthcare providers
Linking customer requests to
Sterilmed inventory
> threshold = link
< threshold = no link = new?
<> thresholds – human review
Human review
and action
Sterilmed – reducing quote process from 5 days to 4 hours
Clinics
Other
providers
LiquidLogic – helping government organisations collaborate
LiquidLogic provides the public sector with a platform enabling multi-
agency solutions for collaborative working
 UK Public Sector is moving towards collaboration between multiple
organisations/agencies
 The same client is often represented differently in several different
databases – hampering collaboration, and with potentially dangerous or
fatal outcomes
 Need to model and enable real-time process and data sharing across
multiple organisations
 TIBCO® Patterns - Search provides the real-time duplicate identification,
duplicate prevention and searching services across the Protocol platform
 Installed at dozens of Public sector organisations, helping them provide
better services to their clients and detect potential fraud
PROTOCOL overview
Children
services
Linked view of
clients across
multiple systems
TIBCO® Patterns
Multiple systems across
multiple organisations
Providing a 360 degree view of clients
• Duplicate identification
• Duplicate prevention
• Searching
Human review
and action
LiquidLogic – Protocol platform
Domestic
violence
Other
systems
• Allows a radical redesign of processes
• Integrates with existing corporate applications to present a SOA
• Manages multiple disparate data sources supporting the information sharing
requirements of federated applications
Los Alamos National Laboratory (LANL) – real-time linking of
bibliographic and citation data
 LANL Research Library locally hosts large data collections
 A&I databases: ISI Citation Databases, Inspec, BIOSIS, Engineering Index, …
 Full-text collections: Elsevier, Wiley, APS, IOP, …
 Duplicates in LANL data collection
 Amongst bibliographic records and citations
 Between bibliographic records and citations
 De-duplication, matching and linking needed
 Join records from several databases that describe the same work
 Find works that cite a given work
 High volumes - >600 million citations and >65 million bibliographic items
 High request rates: >25/second
 Results look much better than those of the batch de-duplication approach ~ TIBCO® Patterns + training by librarians
 Can 'de-dup' external data against local data, no batch processing, but on-the-fly de-duplication
 Possibility to retrain the system to optimize responses without data reprocessing: machine learning module
 Scalability to accommodate growth of datasets
LANL presentation on real-time matching
Los Alamos National Laboratory is a premier national security research institution, working on advanced
technologies to provide the United States with the best scientific and engineering solutions to many of
the nation’s most crucial challenges.
PubMed
Inspec
Biosis
65 million
bibliographical
items
TIBCO® Patterns
Local and remote A&I
databases
Linking bibliographic and
citations in real-time
LANL – systems matching like humans
Local and remote full text
databases
APS
Elsevier
Wiley
700 million
citation
items
•Trained by domain experts (LANL librarians)
•Clearer cut off between matches and non-matches
•Never a need to re-index
• Given a bibliographic key, which are the matching bibliographic records?
• Given a citation key, which are the matching bibliographic records?
• Given the identifier of a bibliographic record, what is the corresponding bibliographic key?
• Given a bibliographic key, which are the citing bibliographic records?
• Given a citation key, which are the citing bibliographic records?
Forgiving to errors in datasets
Forgiving to errors in query
Compares like humans
Rush Health – moving to outcome driven care
Rush Health is a clinically integrated network of providers working together to improve health through high
quality, efficient health services covering the spectrum of patient care from wellness, prevention and health promotion, to
disease management and complex care management.
 Three major hospitals, 750 on-staff physicians and 50 allied health providers
 Initiatives – both need highly accurate and automated matching
 Moving to make diagnosis and treatment more proactive
 Implementing a system where payments are based on the outcome of healthcare, not the amount prescribed
 Patients will often visit different facilities and have their encounter and therapy details
recorded differently, resulting in high duplicate record rates
 First - cleaning the EMPI (get clean)
 ~2 million record EMPI database was analyzed for duplicate patient records
 ~1.6 million duplicate records identified
 Leaving reference database of ~400,000 unique patients imported back into EMPI
 Then – automate matching of incoming patient encounter records into their Enterprise Data
Warehouse based EMPI (stay clean)
 Results
 Much higher level of automated matching - for an accurate and duplicate free EMPI
 More accurate assessment of treatment outcomes and preventative care
Hospitals
Enterprise Data
Warehouse -
EMPI
TIBCO® Patterns
Encounter records from
hospital and doctor visits
Linking to the right patient
record
> threshold = link
< threshold = no link = new?
<> thresholds – human review?
Human review
and action
Rush Health – linking encounter records to the right patient
Dr offices
Clinics
California Department of Public Health – Building the world's largest
Research Ready Biobank
Development of a research-ready pregnancy and newborn biobank in
California
 California has been collecting samples at several key life events for many years; this could
have been a tremendous potential source of research data
 Because of the human involvement in manually linking records from different sources, only 5
or 6 requests for research sample data per year could be handled
 NIH grant to Sequoia Foundation for the specimen tracking and linkage project awarded in
late 2009
 Now linking ~200 million life event records from across multiple systems to form the
research-ready biobank
 Results will be
 A life-course, client-based system that enables cross-generational studies, population-based family
studies, and woman-level studies across multiple pregnancies
 Cost-efficiently process requests, track specimens, conduct linkage and integration of new data
 Process high volumes of specimen and data requests
CDPH research-ready biobank
52
Fetal
deaths
Live
births
Genetic
Screening
Research ready
biobank
TIBCO® Patterns
Sample data collected
over 20+ years on
significant life events
Linking records from multiple
sources
CDPH – linking 100s of millions of records
Deaths
Screened
newborns
• Descriptive epidemiologic studies on the birth prevalence of genetic disorders and
seroprevalence of infectious agents
• Analytic epidemiologic studies to determine the causes of birth defects, preterm birth and
other disorders
• Laboratory studies to develop and validate screening tests and other assays
• Prevention and intervention studies to guide the design of screening models with maximum
efficiency
Score > upper threshold = link
Score < lower threshold = no link
Score between thresholds = human review
TIBCO® Patterns for
BusinessWorks™
54
BusinessWorks™ Patterns Plugin Goals
 Enable applications developed with BusinessWorks™ Designer to deal effectively
with “imperfect” data
 Provide access to the TIBCO® Patterns – Search capabilities without having to
write any code.
 Use standard BusinessWorks™ Designer to design, test and initiate queries to the
engine.
 Integrate TIBCO® Patterns – Search into the TIBCO product suite using standard
TIBCO development and integration tools.
 Leverage existing BW components to provide access to TIBCO® Patterns –
Search from custom and off-the-shelf applications.
 Provide dynamic query request configuration.
55
Architecture
 Runs in the BusinessWorks™ extensible framework for
integration
 Uses the TIBCO® Patterns Java API
 Consists of two jar files
 TIBCO-Patterns-Java-Interface-4.5.1.jar
 BWPatternsPlugIn_4.5.1.jar
 Installed into <TIBCO-HOME> using the TIBCO Universal
Installer
56
BusinessWorks palettes
The Patterns palette provides CRUD operations on
in-memory tables. Data can be loaded
from any external data source. In-memory tables are
used for matching in accordance with the matching criteria
and query strings provided. The palette can leverage other BW
activities and can be exposed as services for
consumption by other enterprise consumers.
57
Duplicate identification
Data loaded
from multiple
systems
Multiple files about the
same entity type
Load in common schema
Results above specific similarity score
are duplicates
Iterate
through
data record
by record
Take appropriate
action – merge?
link?
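The duplicate identification flow above (load into a common schema, iterate record by record, treat results above a similarity score as duplicates) can be sketched as follows. This is an illustration only: it uses Python's standard `difflib.SequenceMatcher` as a stand-in scorer rather than the Patterns matching engine, and the field names, sample records, and 0.8 threshold are assumptions.

```python
from difflib import SequenceMatcher


def similarity(a: dict, b: dict, fields) -> float:
    """Average per-field string similarity in [0, 1] (stand-in scorer)."""
    ratios = [
        SequenceMatcher(
            None, str(a.get(f, "")).lower(), str(b.get(f, "")).lower()
        ).ratio()
        for f in fields
    ]
    return sum(ratios) / len(ratios)


def find_duplicates(records, fields, threshold=0.8):
    """Compare every record with the ones after it; keep pairs at or above threshold."""
    pairs = []
    for i, rec in enumerate(records):
        for j in range(i + 1, len(records)):
            score = similarity(rec, records[j], fields)
            if score >= threshold:
                pairs.append((i, j, round(score, 3)))
    return pairs


# Records from different systems, already mapped to one common schema.
records = [
    {"first": "Jon", "last": "Smith", "city": "Princeton"},
    {"first": "Jon", "last": "Smiht", "city": "Princeton"},
    {"first": "Mary", "last": "Jones", "city": "Trenton"},
]
duplicates = find_duplicates(records, ["first", "last", "city"])
print(duplicates)
```

The pairwise loop is quadratic in the number of records, which is fine for a sketch but is exactly the cost a dedicated matching engine is meant to avoid at scale.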
58
Sample duplicate identification process
59
Data augmentation
Reference
Data
(Experian life
style data
brick) for
example
File with data for augmentation
Find most similar record(s)
Augmented with data from reference
Augmented data
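A minimal sketch of the augmentation flow above: find the most similar reference record, then copy its extra attributes onto the incoming record. The scorer is Python's `difflib`, used here as a stand-in for the matching engine; the `segment` attribute, the sample rows, and the 0.8 minimum score are hypothetical.

```python
from difflib import SequenceMatcher


def name_score(a: str, b: str) -> float:
    """Stand-in similarity score in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def augment(record, reference, match_field="name", min_score=0.8):
    """Return a copy of `record` enriched from its most similar reference row."""
    best = max(
        reference,
        key=lambda ref: name_score(record[match_field], ref[match_field]),
        default=None,
    )
    if best is None or name_score(record[match_field], best[match_field]) < min_score:
        return dict(record)   # no confident match: leave the record unchanged
    merged = dict(best)       # start from the reference attributes
    merged.update(record)     # values already on the incoming record win
    return merged


# Hypothetical reference data with one extra attribute per person.
reference = [
    {"name": "Jonathan Price", "segment": "urban renter"},
    {"name": "Mary Jones", "segment": "suburban owner"},
]
enriched = augment({"name": "Jonathon Price"}, reference)
print(enriched)
```

Letting the incoming record's own values win on conflict is one possible policy; a real process might instead flag conflicting fields for review.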
60
Identifying overlap and uniqueness across multiple data sets
Unique to A Unique to B
Overlap
2 or more files about the
same entity type
Data loaded
into
multiple
tables
Results
in A or B or both
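The "in A or B or both" result can be sketched as a fuzzy version of a set comparison. Again this uses `difflib` as a stand-in scorer, and the function names, sample values, and 0.85 threshold are assumptions made for the example.

```python
from difflib import SequenceMatcher


def is_match(a: str, b: str, threshold: float = 0.85) -> bool:
    """True when two strings are similar enough to count as the same entity."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio() >= threshold


def compare_sets(table_a, table_b, threshold=0.85):
    """Split two tables into overlap, unique-to-A, and unique-to-B."""
    overlap, unique_a = [], []
    matched_b = set()
    for a in table_a:
        hits = [j for j, b in enumerate(table_b) if is_match(a, b, threshold)]
        if hits:
            overlap.append((a, [table_b[j] for j in hits]))
            matched_b.update(hits)
        else:
            unique_a.append(a)
    unique_b = [b for j, b in enumerate(table_b) if j not in matched_b]
    return overlap, unique_a, unique_b


table_a = ["Jon Smith", "Mary Jones"]
table_b = ["Jon Smiht", "Alan Turing"]
overlap, unique_a, unique_b = compare_sets(table_a, table_b)
print(overlap, unique_a, unique_b)
```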
61
Record classification
Table(s)
loaded with
keywords or
phrases of
interest
Incoming records for
classification
Search for most similar keywords
Results are closest classification(s)
Iterate
through
data record
by record
Take appropriate
action based on
classification
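Record classification as described above can be sketched with a keyword table and a fuzzy lookup per word. The keyword-to-category table, the sample text, and the 0.8 cutoff are hypothetical, and `difflib.get_close_matches` stands in for the engine's similarity search.

```python
from difflib import get_close_matches

# Hypothetical table of keywords of interest and the class each implies.
KEYWORDS = {
    "refund": "billing",
    "invoice": "billing",
    "password": "account access",
    "login": "account access",
    "delivery": "shipping",
}


def classify(text, keywords=KEYWORDS, cutoff=0.8):
    """Return the set of categories whose keywords fuzzily match a word in `text`."""
    categories = set()
    for word in text.lower().split():
        hits = get_close_matches(word.strip(".,!?"), list(keywords), n=1, cutoff=cutoff)
        for hit in hits:
            categories.add(keywords[hit])
    return categories


labels = classify("Pasword reset and late delivry")
print(labels)
```

Because the lookup is fuzzy, misspellings such as "Pasword" and "delivry" still reach the intended keywords, which an exact-match keyword filter would miss.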
62
Fuzzy parsing
Table(s)
loaded with
keywords or
phrases of
interest
Incoming records for
parsing
Search for most similar keywords
Results are closest matching words and phrases
Iterate
through
data record
by record
Take appropriate
action
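Fuzzy parsing differs from classification in that each token of the incoming text is tagged with the closest known word or phrase and the field it belongs to. A minimal sketch, assuming a small hypothetical lexicon and using `difflib.get_close_matches` in place of the matching engine:

```python
from difflib import get_close_matches

# Hypothetical lexicon: known word -> the field it belongs to.
LEXICON = {
    "street": "street_type",
    "avenue": "street_type",
    "princeton": "city",
    "york": "city",
}


def fuzzy_parse(text, lexicon=LEXICON, cutoff=0.8):
    """Tag each token as (token, canonical word, field), or (token, None, None)."""
    parsed = []
    for token in text.split():
        cleaned = token.lower().strip(".,")
        hits = get_close_matches(cleaned, list(lexicon), n=1, cutoff=cutoff)
        if hits:
            parsed.append((token, hits[0], lexicon[hits[0]]))
        else:
            parsed.append((token, None, None))
    return parsed


parsed = fuzzy_parse("1030 Main Stret Princton")
for row in parsed:
    print(row)
```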
TIBCO® Patterns for
BusinessEvents™
64
Patterns plug-in for BusinessEvents
• TIBCO Patterns Plugin for BusinessEvents available in 4.5.1
• Implemented as BE Studio Catalog Functions
• Code samples available for all function calls
65
TIBCO Patterns - Business Events Plugin
66
TIBCO Patterns – Enable Catalog Function View
Catalog Functions appear when the BE perspective is selected and a
Rule Function is open.
67
Customer database(s)
TIBCO Patterns correlates
the records before loading
Multiple concepts about
the same customer are
loaded
Correlating multiple records about the same entity
Before After
A single concept for a
customer is loaded
68
BE
Incoming events
Concept Cache
TIBCO Patterns
BusinessEvents Concept Cache
Are there existing
similar events?
Before After
New concepts are added even
when very similar concepts exist
Do similar events already exist?
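The "after" behaviour, where the cache is checked for similar entries before a new concept is added, can be sketched as below. This is plain Python, not BusinessEvents or Patterns catalog functions: the `ConceptCache` class, its method names, and the 0.85 threshold are assumptions for illustration.

```python
from difflib import SequenceMatcher


class ConceptCache:
    """Toy cache that refuses to add a concept when a very similar one exists."""

    def __init__(self, threshold=0.85):
        self.threshold = threshold
        self.concepts = []

    def find_similar(self, name):
        """Return the first cached concept at or above the threshold, else None."""
        for existing in self.concepts:
            ratio = SequenceMatcher(None, name.lower(), existing.lower()).ratio()
            if ratio >= self.threshold:
                return existing
        return None

    def add_if_new(self, name):
        """Return (concept, created): the existing match, or the newly added one."""
        existing = self.find_similar(name)
        if existing is not None:
            return existing, False
        self.concepts.append(name)
        return name, True


cache = ConceptCache()
first = cache.add_if_new("Jon Smith")
second = cache.add_if_new("Jon Smiht")   # near-duplicate of the first
print(first, second, cache.concepts)
```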
69
Event cloud
System A System B System C
TIBCO
Patterns
Fraud?
Cyber attack?
Money laundering?
Finding patterns in the event cloud
Other conclusions…
AMX BPM / Patterns Demo
71
Synopsis
Data is rarely entered into a system (or worse, into multiple systems) the same
way twice. We sometimes use full names and sometimes nicknames, we “fat-finger”
keys, we reverse fields, we use maiden names or married names, and so on.
This makes finding the exact data we want or need very difficult, and often
impossible.
Finding incorrect information, or not finding information at all, can be
life-threatening in some situations. In healthcare, for example, if we do not find
information about a patient, we may not know about allergies or drug interactions.
This demonstration of AMX BPM includes our Netrics Matching Engine
technology, which allows a user to find a customer record without having all
the information about that customer, and regardless of how that customer's
information may have been incorrectly entered into the system(s).
72
Step 1: Loading the data into Patterns
This is the format of the data table created in the Matching Engine as displayed from the Display Table option in BW.
This BW process loads data from a source (in this case a
CSV file) into Patterns - which is running as a Windows
service.
This data represents the most current customer
information on file.
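In the demo this load is done by a BW process; purely as an illustration of the same idea, the sketch below reads CSV rows into an in-memory table keyed by record id using Python's standard `csv` module. The column names and sample rows are hypothetical, and nothing here calls the Patterns engine.

```python
import csv
import io

# Hypothetical sample of the customer file; a real run would open a file instead.
SAMPLE_CSV = """id,first,last,city
1,Jon,Smith,Princeton
2,Mary,Jones,Trenton
"""


def load_table(source):
    """Read CSV rows from a file-like object into a dict keyed by the id column."""
    reader = csv.DictReader(source)
    return {row["id"]: row for row in reader}


table = load_table(io.StringIO(SAMPLE_CSV))
print(len(table), table["1"]["last"])
```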
73
Step 2: Starting the Process – Enter Customer Information
This is the AMX BPM Business Service (FormFlow) which starts the process.
Enter “Customer” information
in the search form displayed on
the first step of the process.
74
Step 3: Calling Patterns
This is the AMX BPM Business Service (FormFlow) which starts the process.
This BW process calls the Matching Engine with the search criteria entered in the first form. The results are mapped and passed back to the BPM process for display
on the second form.
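A minimal sketch of what such a search call might look like in principle: the query may leave fields blank, only the filled-in fields are scored, and the most similar records are returned ranked for display. It uses `difflib` rather than the Matching Engine, and the function names, fields, and sample table are assumptions.

```python
from difflib import SequenceMatcher


def score(query, record):
    """Average similarity over only the fields the user actually filled in."""
    filled = [f for f, v in query.items() if v]
    if not filled:
        return 0.0
    total = sum(
        SequenceMatcher(
            None, query[f].lower(), str(record.get(f, "")).lower()
        ).ratio()
        for f in filled
    )
    return total / len(filled)


def search(query, table, top_n=3):
    """Return up to `top_n` (record, score) pairs, most similar first."""
    ranked = sorted(table, key=lambda rec: score(query, rec), reverse=True)
    return [(rec, round(score(query, rec), 3)) for rec in ranked[:top_n]]


table = [
    {"first": "Mary", "last": "Jones", "city": "Trenton"},
    {"first": "Jon", "last": "Smith", "city": "Princeton"},
]
# First name left blank and last name misspelled, as a user might enter it.
results = search({"first": "", "last": "Smyth", "city": "Princeton"}, table)
print(results[0])
```

Returning a ranked list rather than a single answer matches the demo's next step, where the user picks the right record or refines the search.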
75
Step 4: Selecting the most similar records or refining the search
The Results form is displayed on the
second step of the process (after the Call
Matching Engine step) and shows the
results of the initial search. Additional
refined searches can also be run from
this step.
This entire process appears as a single
form to the user (FormFlow).
This is the AMX BPM Business Service (FormFlow) which starts the process.
76
Step 5: Invoking the main process
This is the AMX BPM Business Service (FormFlow) which starts the process.
This is the main AMX BPM process that displays the selected customer record
from the Business Service Process and allows for modifications to the data.
77
Step 6: Viewing the selected customer record and modifying the data if necessary
This form allows a user to edit the data
returned from the Matching Engine.
If a change is made, it will go to an
approval step.
78
Step 7: Approving the updated customer record
This is the Approval form. It allows a
user to approve (or reject) the update to
the customer record made in the
previous step.
79
Step 8: Update back-end system
This step checks the back-end
system of record to see if the
customer record exists, and
performs an update if it does or
an insert if it doesn't.
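The check-then-update-or-insert logic of this step can be sketched against an in-memory SQLite database. The demo's actual back-end system is not named in the slides, so the `customers` table, its columns, and the `upsert_customer` helper are hypothetical.

```python
import sqlite3


def upsert_customer(conn, customer_id, name, city):
    """Update the row if the customer exists, otherwise insert it."""
    exists = conn.execute(
        "SELECT 1 FROM customers WHERE id = ?", (customer_id,)
    ).fetchone()
    if exists:
        conn.execute(
            "UPDATE customers SET name = ?, city = ? WHERE id = ?",
            (name, city, customer_id),
        )
        action = "update"
    else:
        conn.execute(
            "INSERT INTO customers (id, name, city) VALUES (?, ?, ?)",
            (customer_id, name, city),
        )
        action = "insert"
    conn.commit()
    return action


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
first = upsert_customer(conn, 1, "Jon Smith", "Princeton")
second = upsert_customer(conn, 1, "Jonathan Smith", "Princeton")
print(first, second)
```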
80
Step 9: Notification of the update
Once complete, one of two email notifications is sent, depending on the
process path taken.

More Related Content

What's hot

Quantitative Legal Prediction - Presentation @ Santa Clara Law - By Daniel Ma...
Quantitative Legal Prediction - Presentation @ Santa Clara Law - By Daniel Ma...Quantitative Legal Prediction - Presentation @ Santa Clara Law - By Daniel Ma...
Quantitative Legal Prediction - Presentation @ Santa Clara Law - By Daniel Ma...Daniel Katz
 
Innovation and Emerging Technology
Innovation and Emerging TechnologyInnovation and Emerging Technology
Innovation and Emerging TechnologyRon Dolin
 
Thoughts on Legal Prediction and Legal Metrics - Association of Corporate Cou...
Thoughts on Legal Prediction and Legal Metrics - Association of Corporate Cou...Thoughts on Legal Prediction and Legal Metrics - Association of Corporate Cou...
Thoughts on Legal Prediction and Legal Metrics - Association of Corporate Cou...Daniel Katz
 
CrimsonLogic World Bank_IADB_Washington DC_30 Sep 2009_eGovernance to yield g...
CrimsonLogic World Bank_IADB_Washington DC_30 Sep 2009_eGovernance to yield g...CrimsonLogic World Bank_IADB_Washington DC_30 Sep 2009_eGovernance to yield g...
CrimsonLogic World Bank_IADB_Washington DC_30 Sep 2009_eGovernance to yield g...Ceski
 
Technologies for Lawyers - Legal Sector
Technologies for Lawyers - Legal SectorTechnologies for Lawyers - Legal Sector
Technologies for Lawyers - Legal SectorSatya Pal
 
Procurement & Government Contracting Compliance (Series: Corporate & Regulato...
Procurement & Government Contracting Compliance (Series: Corporate & Regulato...Procurement & Government Contracting Compliance (Series: Corporate & Regulato...
Procurement & Government Contracting Compliance (Series: Corporate & Regulato...Financial Poise
 
Overcoming In-house Politics to Implement eDiscovery
Overcoming In-house Politics to Implement eDiscoveryOvercoming In-house Politics to Implement eDiscovery
Overcoming In-house Politics to Implement eDiscoveryJ. David Morris
 
Susskind, 'A Manifesto for AI in the Law' ICAIL 2017, London, 2017
Susskind, 'A Manifesto for AI in the Law' ICAIL 2017, London, 2017Susskind, 'A Manifesto for AI in the Law' ICAIL 2017, London, 2017
Susskind, 'A Manifesto for AI in the Law' ICAIL 2017, London, 2017Richard Susskind
 
Ai and Legal Industy - Executive Overview
Ai and Legal Industy - Executive OverviewAi and Legal Industy - Executive Overview
Ai and Legal Industy - Executive OverviewGraeme Wood
 
Procurement governance and complex technologies: a promising future?
Procurement governance and complex technologies: a promising future?Procurement governance and complex technologies: a promising future?
Procurement governance and complex technologies: a promising future?Albert Sanchez Graells
 
Vintage group-newsletter xbrl-final-lr-1017
Vintage group-newsletter xbrl-final-lr-1017Vintage group-newsletter xbrl-final-lr-1017
Vintage group-newsletter xbrl-final-lr-1017genein19
 

What's hot (13)

Quantitative Legal Prediction - Presentation @ Santa Clara Law - By Daniel Ma...
Quantitative Legal Prediction - Presentation @ Santa Clara Law - By Daniel Ma...Quantitative Legal Prediction - Presentation @ Santa Clara Law - By Daniel Ma...
Quantitative Legal Prediction - Presentation @ Santa Clara Law - By Daniel Ma...
 
Innovation and Emerging Technology
Innovation and Emerging TechnologyInnovation and Emerging Technology
Innovation and Emerging Technology
 
Thoughts on Legal Prediction and Legal Metrics - Association of Corporate Cou...
Thoughts on Legal Prediction and Legal Metrics - Association of Corporate Cou...Thoughts on Legal Prediction and Legal Metrics - Association of Corporate Cou...
Thoughts on Legal Prediction and Legal Metrics - Association of Corporate Cou...
 
CrimsonLogic World Bank_IADB_Washington DC_30 Sep 2009_eGovernance to yield g...
CrimsonLogic World Bank_IADB_Washington DC_30 Sep 2009_eGovernance to yield g...CrimsonLogic World Bank_IADB_Washington DC_30 Sep 2009_eGovernance to yield g...
CrimsonLogic World Bank_IADB_Washington DC_30 Sep 2009_eGovernance to yield g...
 
Technologies for Lawyers - Legal Sector
Technologies for Lawyers - Legal SectorTechnologies for Lawyers - Legal Sector
Technologies for Lawyers - Legal Sector
 
Ai and law
Ai and lawAi and law
Ai and law
 
Procurement & Government Contracting Compliance (Series: Corporate & Regulato...
Procurement & Government Contracting Compliance (Series: Corporate & Regulato...Procurement & Government Contracting Compliance (Series: Corporate & Regulato...
Procurement & Government Contracting Compliance (Series: Corporate & Regulato...
 
Overcoming In-house Politics to Implement eDiscovery
Overcoming In-house Politics to Implement eDiscoveryOvercoming In-house Politics to Implement eDiscovery
Overcoming In-house Politics to Implement eDiscovery
 
Susskind, 'A Manifesto for AI in the Law' ICAIL 2017, London, 2017
Susskind, 'A Manifesto for AI in the Law' ICAIL 2017, London, 2017Susskind, 'A Manifesto for AI in the Law' ICAIL 2017, London, 2017
Susskind, 'A Manifesto for AI in the Law' ICAIL 2017, London, 2017
 
Ai and Legal Industy - Executive Overview
Ai and Legal Industy - Executive OverviewAi and Legal Industy - Executive Overview
Ai and Legal Industy - Executive Overview
 
Procurement governance and complex technologies: a promising future?
Procurement governance and complex technologies: a promising future?Procurement governance and complex technologies: a promising future?
Procurement governance and complex technologies: a promising future?
 
Vintage group-newsletter xbrl-final-lr-1017
Vintage group-newsletter xbrl-final-lr-1017Vintage group-newsletter xbrl-final-lr-1017
Vintage group-newsletter xbrl-final-lr-1017
 
Enabling Efficient and Fair Markets for Digital Content
Enabling Efficient and Fair Markets for Digital ContentEnabling Efficient and Fair Markets for Digital Content
Enabling Efficient and Fair Markets for Digital Content
 

Viewers also liked

TIBCO Advanced Analytics Meetup (TAAM) November 2015
TIBCO Advanced Analytics Meetup (TAAM) November 2015TIBCO Advanced Analytics Meetup (TAAM) November 2015
TIBCO Advanced Analytics Meetup (TAAM) November 2015Bipin Singh
 
Getting the most out of Tibco Spotfire
Getting the most out of Tibco SpotfireGetting the most out of Tibco Spotfire
Getting the most out of Tibco SpotfireHerwig Van Marck
 
Tibco Augmented Intelligence - Analytics, IoT, Big Data, Streaming 20161025
Tibco Augmented Intelligence - Analytics, IoT, Big Data, Streaming 20161025Tibco Augmented Intelligence - Analytics, IoT, Big Data, Streaming 20161025
Tibco Augmented Intelligence - Analytics, IoT, Big Data, Streaming 20161025Nicola Sandoli
 
Houston Energy Data Science Meet up_TIBCO Slides
Houston Energy Data Science Meet up_TIBCO SlidesHouston Energy Data Science Meet up_TIBCO Slides
Houston Energy Data Science Meet up_TIBCO SlidesJennifer Walsh
 
Framework and Product Comparison for Big Data Log Analytics and ITOA
Framework and Product Comparison for Big Data Log Analytics and ITOA Framework and Product Comparison for Big Data Log Analytics and ITOA
Framework and Product Comparison for Big Data Log Analytics and ITOA Kai Wähner
 
Intelligent Business Process Management Suites (iBPMS) - The Next-Generation ...
Intelligent Business Process Management Suites (iBPMS) - The Next-Generation ...Intelligent Business Process Management Suites (iBPMS) - The Next-Generation ...
Intelligent Business Process Management Suites (iBPMS) - The Next-Generation ...Kai Wähner
 
Smart Manufacturing and Industry 4.0 - Tibco PoV
Smart Manufacturing and Industry 4.0 - Tibco PoVSmart Manufacturing and Industry 4.0 - Tibco PoV
Smart Manufacturing and Industry 4.0 - Tibco PoVNicola Sandoli
 
Real World Use Cases and Success Stories for In-Memory Data Grids (TIBCO Acti...
Real World Use Cases and Success Stories for In-Memory Data Grids (TIBCO Acti...Real World Use Cases and Success Stories for In-Memory Data Grids (TIBCO Acti...
Real World Use Cases and Success Stories for In-Memory Data Grids (TIBCO Acti...Kai Wähner
 
Streaming Analytics - Comparison of Open Source Frameworks and Products
Streaming Analytics - Comparison of Open Source Frameworks and ProductsStreaming Analytics - Comparison of Open Source Frameworks and Products
Streaming Analytics - Comparison of Open Source Frameworks and ProductsKai Wähner
 

Viewers also liked (11)

Rapport annuel TIBCO - Exercice 2015
Rapport annuel TIBCO - Exercice 2015Rapport annuel TIBCO - Exercice 2015
Rapport annuel TIBCO - Exercice 2015
 
TIBCO Advanced Analytics Meetup (TAAM) November 2015
TIBCO Advanced Analytics Meetup (TAAM) November 2015TIBCO Advanced Analytics Meetup (TAAM) November 2015
TIBCO Advanced Analytics Meetup (TAAM) November 2015
 
Getting the most out of Tibco Spotfire
Getting the most out of Tibco SpotfireGetting the most out of Tibco Spotfire
Getting the most out of Tibco Spotfire
 
Tibco Augmented Intelligence - Analytics, IoT, Big Data, Streaming 20161025
Tibco Augmented Intelligence - Analytics, IoT, Big Data, Streaming 20161025Tibco Augmented Intelligence - Analytics, IoT, Big Data, Streaming 20161025
Tibco Augmented Intelligence - Analytics, IoT, Big Data, Streaming 20161025
 
Tibco Amx Bpm
Tibco Amx BpmTibco Amx Bpm
Tibco Amx Bpm
 
Houston Energy Data Science Meet up_TIBCO Slides
Houston Energy Data Science Meet up_TIBCO SlidesHouston Energy Data Science Meet up_TIBCO Slides
Houston Energy Data Science Meet up_TIBCO Slides
 
Framework and Product Comparison for Big Data Log Analytics and ITOA
Framework and Product Comparison for Big Data Log Analytics and ITOA Framework and Product Comparison for Big Data Log Analytics and ITOA
Framework and Product Comparison for Big Data Log Analytics and ITOA
 
Intelligent Business Process Management Suites (iBPMS) - The Next-Generation ...
Intelligent Business Process Management Suites (iBPMS) - The Next-Generation ...Intelligent Business Process Management Suites (iBPMS) - The Next-Generation ...
Intelligent Business Process Management Suites (iBPMS) - The Next-Generation ...
 
Smart Manufacturing and Industry 4.0 - Tibco PoV
Smart Manufacturing and Industry 4.0 - Tibco PoVSmart Manufacturing and Industry 4.0 - Tibco PoV
Smart Manufacturing and Industry 4.0 - Tibco PoV
 
Real World Use Cases and Success Stories for In-Memory Data Grids (TIBCO Acti...
Real World Use Cases and Success Stories for In-Memory Data Grids (TIBCO Acti...Real World Use Cases and Success Stories for In-Memory Data Grids (TIBCO Acti...
Real World Use Cases and Success Stories for In-Memory Data Grids (TIBCO Acti...
 
Streaming Analytics - Comparison of Open Source Frameworks and Products
Streaming Analytics - Comparison of Open Source Frameworks and ProductsStreaming Analytics - Comparison of Open Source Frameworks and Products
Streaming Analytics - Comparison of Open Source Frameworks and Products
 

Similar to Making Systems Smarter about dealing with “imperfect” data

Introduction to ENT (Entity Network Translation)
Introduction to ENT (Entity Network Translation)Introduction to ENT (Entity Network Translation)
Introduction to ENT (Entity Network Translation)ENT Technologies
 
Python & Finance: US Government Mandates, Financial Modeling, and Other Snake...
Python & Finance: US Government Mandates, Financial Modeling, and Other Snake...Python & Finance: US Government Mandates, Financial Modeling, and Other Snake...
Python & Finance: US Government Mandates, Financial Modeling, and Other Snake...ActiveState
 
Privacy in AI/ML Systems: Practical Challenges and Lessons Learned
Privacy in AI/ML Systems: Practical Challenges and Lessons LearnedPrivacy in AI/ML Systems: Practical Challenges and Lessons Learned
Privacy in AI/ML Systems: Practical Challenges and Lessons LearnedKrishnaram Kenthapadi
 
Questions On The And Football
Questions On The And FootballQuestions On The And Football
Questions On The And FootballAmanda Gray
 
Virtual Gov Day - Introduction & Keynote - Alan Webber, IDC Government Insights
Virtual Gov Day - Introduction & Keynote - Alan Webber, IDC Government InsightsVirtual Gov Day - Introduction & Keynote - Alan Webber, IDC Government Insights
Virtual Gov Day - Introduction & Keynote - Alan Webber, IDC Government InsightsSplunk
 
The Digital Procurement Era
The Digital Procurement EraThe Digital Procurement Era
The Digital Procurement EraTejari
 
Blue Rubin Task Force Presentation - Digital Preservation
Blue Rubin Task Force Presentation - Digital PreservationBlue Rubin Task Force Presentation - Digital Preservation
Blue Rubin Task Force Presentation - Digital PreservationPeter Mojica
 
HPCC Systems - Open source, Big Data Processing & Analytics
HPCC Systems - Open source, Big Data Processing & AnalyticsHPCC Systems - Open source, Big Data Processing & Analytics
HPCC Systems - Open source, Big Data Processing & AnalyticsHPCC Systems
 
Viscount Systems (OTCQB:VSYS) Presentation
Viscount Systems (OTCQB:VSYS) PresentationViscount Systems (OTCQB:VSYS) Presentation
Viscount Systems (OTCQB:VSYS) PresentationInvestorideas.com
 
Smart Data Slides: Leverage the IOT to Build a Smart Data Ecosystem
Smart Data Slides: Leverage the IOT to Build a Smart Data EcosystemSmart Data Slides: Leverage the IOT to Build a Smart Data Ecosystem
Smart Data Slides: Leverage the IOT to Build a Smart Data EcosystemDATAVERSITY
 
Fines in the Millions Levied Every Year Coming Soon! The Business Case for ...
Fines in the Millions Levied Every Year Coming Soon! The Business Case for ...Fines in the Millions Levied Every Year Coming Soon! The Business Case for ...
Fines in the Millions Levied Every Year Coming Soon! The Business Case for ...CA Technologies
 
Project 3 – Hollywood and IT· Find 10 incidents of Hollywood p.docx
Project 3 – Hollywood and IT· Find 10 incidents of Hollywood p.docxProject 3 – Hollywood and IT· Find 10 incidents of Hollywood p.docx
Project 3 – Hollywood and IT· Find 10 incidents of Hollywood p.docxstilliegeorgiana
 
Ecosystm BreakFirst presentation slides
Ecosystm BreakFirst presentation slidesEcosystm BreakFirst presentation slides
Ecosystm BreakFirst presentation slidesChris White
 
Introduction to big data
Introduction to big dataIntroduction to big data
Introduction to big dataHari Priya
 
Predictive Enterprise Strategic Overview
Predictive Enterprise Strategic OverviewPredictive Enterprise Strategic Overview
Predictive Enterprise Strategic OverviewSteven Gorenbergh
 

Similar to Making Systems Smarter about dealing with “imperfect” data (20)

CTAM Making Analytics Actionable RJA FINAL
CTAM Making Analytics Actionable RJA FINALCTAM Making Analytics Actionable RJA FINAL
CTAM Making Analytics Actionable RJA FINAL
 
Introduction to ENT (Entity Network Translation)
Introduction to ENT (Entity Network Translation)Introduction to ENT (Entity Network Translation)
Introduction to ENT (Entity Network Translation)
 
Python & Finance: US Government Mandates, Financial Modeling, and Other Snake...
Python & Finance: US Government Mandates, Financial Modeling, and Other Snake...Python & Finance: US Government Mandates, Financial Modeling, and Other Snake...
Python & Finance: US Government Mandates, Financial Modeling, and Other Snake...
 
Privacy in AI/ML Systems: Practical Challenges and Lessons Learned
Privacy in AI/ML Systems: Practical Challenges and Lessons LearnedPrivacy in AI/ML Systems: Practical Challenges and Lessons Learned
Privacy in AI/ML Systems: Practical Challenges and Lessons Learned
 
Questions On The And Football
Questions On The And FootballQuestions On The And Football
Questions On The And Football
 
DAMA Big Data & The Cloud 2012-01-19
DAMA Big Data & The Cloud 2012-01-19DAMA Big Data & The Cloud 2012-01-19
DAMA Big Data & The Cloud 2012-01-19
 
Virtual Gov Day - Introduction & Keynote - Alan Webber, IDC Government Insights
Virtual Gov Day - Introduction & Keynote - Alan Webber, IDC Government InsightsVirtual Gov Day - Introduction & Keynote - Alan Webber, IDC Government Insights
Virtual Gov Day - Introduction & Keynote - Alan Webber, IDC Government Insights
 
The Digital Procurement Era
The Digital Procurement EraThe Digital Procurement Era
The Digital Procurement Era
 
Blue Rubin Task Force Presentation - Digital Preservation
Blue Rubin Task Force Presentation - Digital PreservationBlue Rubin Task Force Presentation - Digital Preservation
Blue Rubin Task Force Presentation - Digital Preservation
 
HPCC Systems - Open source, Big Data Processing & Analytics
HPCC Systems - Open source, Big Data Processing & AnalyticsHPCC Systems - Open source, Big Data Processing & Analytics
HPCC Systems - Open source, Big Data Processing & Analytics
 
Viscount Systems (OTCQB:VSYS) Presentation
Viscount Systems (OTCQB:VSYS) PresentationViscount Systems (OTCQB:VSYS) Presentation
Viscount Systems (OTCQB:VSYS) Presentation
 
Smart Data Slides: Leverage the IOT to Build a Smart Data Ecosystem
Smart Data Slides: Leverage the IOT to Build a Smart Data EcosystemSmart Data Slides: Leverage the IOT to Build a Smart Data Ecosystem
Smart Data Slides: Leverage the IOT to Build a Smart Data Ecosystem
 
Taming the data beast
Taming the data beastTaming the data beast
Taming the data beast
 
Fines in the Millions Levied Every Year Coming Soon! The Business Case for ...
Fines in the Millions Levied Every Year Coming Soon! The Business Case for ...Fines in the Millions Levied Every Year Coming Soon! The Business Case for ...
Fines in the Millions Levied Every Year Coming Soon! The Business Case for ...
 
Project 3 – Hollywood and IT· Find 10 incidents of Hollywood p.docx
Project 3 – Hollywood and IT· Find 10 incidents of Hollywood p.docxProject 3 – Hollywood and IT· Find 10 incidents of Hollywood p.docx
Project 3 – Hollywood and IT· Find 10 incidents of Hollywood p.docx
 
Ecosystm BreakFirst presentation slides
Ecosystm BreakFirst presentation slidesEcosystm BreakFirst presentation slides
Ecosystm BreakFirst presentation slides
 
Introduction to big data
Introduction to big dataIntroduction to big data
Introduction to big data
 
Dit yvol3iss8
Dit yvol3iss8Dit yvol3iss8
Dit yvol3iss8
 
Predictive Enterprise Strategic Overview
Predictive Enterprise Strategic OverviewPredictive Enterprise Strategic Overview
Predictive Enterprise Strategic Overview
 
rutgers slides04
rutgers slides04rutgers slides04
rutgers slides04
 

Recently uploaded

Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024The Digital Insurer
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 
Making Systems Smarter about dealing with “imperfect” data

  • 1. © 2008 TIBCO Software Inc. All Rights Reserved. Confidential and Proprietary. TIBCO® Patterns Partner Enablement – October 12th, 2011 Making Systems Smarter about dealing with “imperfect” data Dave Chamberlain dchamberlain@Tibco.com
  • 2. Safe Harbor Disclosure
      During the course of this presentation TIBCO or its representatives may make forward-looking statements regarding future events, TIBCO’s future results or our future financial performance. These statements are based on management’s current expectations. Although we believe that the expectations reflected in the forward-looking statements contained in this presentation are reasonable, these expectations or any of the forward-looking statements could prove to be incorrect and actual results or financial performance could differ materially from those stated herein. We refer you to the reports that TIBCO files from time to time with the Securities and Exchange Commission for a discussion of important factors that could cause actual results or financial performance to differ materially from those contained in any forward-looking statement made in connection with this presentation. TIBCO does not undertake to update any forward-looking statement that may be made from time to time or on its behalf.
  • 3. What’s the problem?
      Fields: First | Last | Addr1 | Addr2 | City | State | Zip | DOB
      - Record 1: Jon | Smith | 1030 Main St. | Princeton | NJ | O8540 | 10/12/79
      - Record 2: 10/12/97 | Jon | Smiht | 1030 Main | Princeton | NJ | 0854O
      - Record 3: John | Smyth | Main Street | 103A | Pton | NJ | 08540 | 12/12/79
      Humans can tell these records are about the same person. Systems have a very hard time: they can’t.
  • 4. What’s the problem?
      Fields: First | Last | Addr1 | Addr2 | City | County | Post-code | DOB
      - Record 1: Jonathan | Price | 103 The High Street | Flat 2 | York | Yorkshire | YR1604 | 10/12/79
      - Record 2: Pryce | Jon | 1o3-2 High St | YR16o4 | Dec 10 1977 | York
      - Record 3: John | Prce | High St #103 | 2 | Y0rk | Yorkshire | YR1064 | 12/12/79
      Humans can tell these records are about the same person. Systems have a very hard time: they can’t.
  • 5. TIBCO® Patterns
      - Focused on structured (fielded) data: products, people, companies, claims, events, etc.
      - In-memory, real-time and designed to be embedded
      - Products
          - TIBCO® Patterns - Search: finds patterns systems or people are looking for in data
          - TIBCO® Patterns - Learn: detects and learns patterns when humans make decisions on data similarity
      Enables organizations to “connect the dots”
  • 6. The good news! Horizontal applicability: all industries and agencies
      - Find
          - CSRs looking for the right customer
          - Admissions finding the right patient
          - Customers finding things to buy
          - Intel agencies identifying terrorists
      - Match
          - Identifying records about the same customer for KYC and SCV regulations
          - Ensuring citizens receive correct entitlements
          - Conforming with import/export regulations
      - Link
          - Identifying potential fraud
          - Anti Money Laundering
          - Creating and maintaining a Master Patient Index
  • 7. Use cases by key verticals
      - FSI
          - Building 360 degree view of customers for regulatory purposes
          - Generating better up-sell and cross-sell opportunities (with BE integration)
          - Quickly finding the right customer
          - Anti Money Laundering
      - Telco
          - Quickly finding the right customer
          - Understanding total relationship with customers
          - Keeping multiple systems synchronized
      - Federal & State Government
          - Law enforcement/Intel: finding the “bad guys”
          - Making sure our kids are safe: child protection/youth services
          - Ensuring citizens receive (only) their correct entitlements
      - Energy
          - Consolidating customers due to M&A activity
          - Matching energy trade sides
          - Linking data about grid and network assets
      - Healthcare
          - Identifying duplicates in Master Patient Index
          - Linking patient encounter records for outcome-driven healthcare
          - Finding the right patient, first time every time
  • 8. The New Data Challenge
      [Figure: timeline of three eras: Enterprise 1.0 (’60s to ’80s), Data Processing; Enterprise 2.0 (’80s to 2000), Client Server; Enterprise 3.0 (2000 to 2020), Predictive. Building-block labels: Mainframe, Database, ESB; 2-Tier Batch, 3-Tier Online, N-Tier Event Driven. Trend labels: Software Velocity, Interactions, Time to React, Amount of Data, Half Life of Data.]
      Now it’s even more important to deal effectively and efficiently with imperfect data
  • 9. The problems we all face
      In the real world, database information is never 100% perfect, never 100% consistent, and never 100% complete, and never can be. Data by its nature is full of errors: omissions, inconsistencies and duplicates.
      - A root cause: the human-computer gap
          - Humans recall information approximately and easily tolerate data errors and variations when determining similarity
          - Software has been exact and unforgiving
      - Equality or inequality is easy
          - “last name = chamberlain”, “inventory level < 100”
      - Similarity is difficult
          - Select * from customers with .85 similarity between this and that…
          - “Chamberlain” ≈ “Chumberland”: are they the same person?
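The gap between an equality test and a similarity test can be sketched in a few lines of Python. This is only an illustration: it uses the standard library's `difflib.SequenceMatcher` as a stand-in similarity measure, not the TIBCO Patterns scoring model, and the `similar_rows` helper, the sample rows and the 0.8 threshold are assumptions made for the example.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Return a 0..1 similarity ratio between two strings, ignoring case."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def similar_rows(rows, field, query, threshold=0.85):
    """Return (score, row) pairs whose field scores at least `threshold`, best first."""
    scored = [(similarity(row[field], query), row) for row in rows]
    kept = [pair for pair in scored if pair[0] >= threshold]
    return sorted(kept, key=lambda pair: pair[0], reverse=True)


# Hypothetical customer rows, for illustration only.
customers = [
    {"last_name": "Chamberlain"},
    {"last_name": "Chumberland"},
    {"last_name": "Smith"},
]

# Equality: only the exact spelling is found.
exact = [c for c in customers if c["last_name"].lower() == "chamberlain"]

# Similarity: the misspelled variant is found as well, ranked below the exact hit.
fuzzy = similar_rows(customers, "last_name", "chamberlain", threshold=0.8)
```

With this stand-in measure, “Chamberlain” and “Chumberland” score about 0.82, so the threshold decides whether they are treated as the same name; that choice is exactly the hard part the slide points at.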
  • 10. The cost of not finding the data you need
      - The organizational/societal cost is high
          - Terrorists board planes
          - Criminals get away
          - Patients get the wrong treatments
          - Enterprises don’t realize economies of scale
          - FSI doesn’t really know their customers: up-sell/x-sell opportunities are lost and risk is not known
          - Government entitlements get abused and fraud goes undetected
          - Goods and/or people enter or leave a country illegally
          - Repeat drunk drivers get driver’s licenses
          - TV listings are wrong
          - Logitech remote controls don’t work correctly
          - etc…
      - These types of problems permeate every organization
  • 11. Types of things our customers do
      - Ensuring compliance with export/import regulations
      - Linking patients and their visit records for outcome-driven healthcare
      - Finding the right person across all law enforcement systems
      - Creating the world’s largest Biobank for genetic researchers
      - Helping customers find the right brand and the right model to program their remote controls
      - Automating the ingest of TV programming schedules from over 150 broadcast and cable operators
      - Reducing turnaround time from 5 days to 4 hours to respond to customer requests for equipment
      - And many more examples… https://ssl.tibcommunity.com/community/products/patterns
  • 12. Our innovations
      - Problem: How similar are sets of data elements?
          - Conventional solutions: Soundex, NYSIIS, Edit Distance, Metaphone, etc.
          - TIBCO innovation (TIBCO Patterns - Search): mathematical model that finds patterns systems or people are looking for in data
      - Problem: Are records about the same entity?
          - Conventional solutions: custom-built matching rule sets, with optional statistical parameters
          - TIBCO innovation (TIBCO Patterns - Learn): mathematical model that identifies and learns patterns as humans make decisions about data similarity
      - Advantages
          - Superior accuracy
          - Symmetric error-tolerance
          - No guessing of rules and parameters
          - Computational efficiency & scalability
          - Data independence: people, assets, TV programs, stock trades, products, companies, claims, transactions, etc.
          - Engineering efficiency: easy to maintain and refine
          - Independent of language
          - Real-time
          - Sparse data support built-in
          - Easily embeddable
          - Quick and easy deployment
          - DBMS independent
  • 13. TIBCO Patterns - Search - Bipartite Graph - String Matching w. Unigrams
      - Cost = |displacement| (linear cost function)
      - Pick the set of edges that minimizes cost
      - Only one edge per symbol allowed
      [Figure: bipartite graph linking the symbols of “PETER_SMITH” to the symbols of “SMIT PETTER”. Edge labels (displacements) as extracted: 4 4 5 -6 -6 -6 5 2 7 4 5 -6 -3 -2 1]
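The three rules on this slide (cost is the absolute displacement, choose the edge set with minimum cost, at most one edge per symbol) can be sketched as follows. This is a minimal illustration under stated assumptions, not the product's algorithm: positions are 0-based, each character is matched only to identical characters, and the function names `match_positions`, `unigram_edges` and `total_cost` are invented for the example. For positions on a line with cost |x - y|, an order-preserving pairing is optimal, which is what the small dynamic program computes.

```python
from collections import defaultdict


def match_positions(xs, ys):
    """Pair every position of the shorter sorted list with a distinct position
    of the longer one so that the summed |x - y| is minimal."""
    if len(xs) > len(ys):
        return [(x, y) for y, x in match_positions(ys, xs)]
    inf = float("inf")
    n, m = len(xs), len(ys)
    # cost[i][j]: cheapest way to pair the first i of xs within the first j of ys
    cost = [[0 if i == 0 else inf for _ in range(m + 1)] for i in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(i, m + 1):
            cost[i][j] = min(cost[i - 1][j - 1] + abs(xs[i - 1] - ys[j - 1]),
                             cost[i][j - 1])
    # Walk back through the table to recover the chosen pairs.
    pairs, i, j = [], n, m
    while i > 0:
        if cost[i][j - 1] == cost[i][j]:
            j -= 1  # ys[j - 1] stays unmatched
        else:
            pairs.append((xs[i - 1], ys[j - 1]))
            i, j = i - 1, j - 1
    return pairs[::-1]


def unigram_edges(a, b):
    """Return edges (symbol, index_in_a, index_in_b), one edge per symbol at most."""
    pos_a, pos_b = defaultdict(list), defaultdict(list)
    for i, ch in enumerate(a):
        pos_a[ch].append(i)
    for j, ch in enumerate(b):
        pos_b[ch].append(j)
    edges = []
    for ch in pos_a:
        if ch in pos_b:
            edges += [(ch, i, j) for i, j in match_positions(pos_a[ch], pos_b[ch])]
    return edges


def total_cost(edges):
    """Sum of |displacement| over all edges (the linear cost function)."""
    return sum(abs(j - i) for _, i, j in edges)


edges = unigram_edges("PETER_SMITH", "SMIT PETTER")
for symbol, i, j in sorted(edges, key=lambda e: e[1]):
    print(symbol, i, j, j - i)  # symbol, position in a, position in b, displacement
```

Symbols with no counterpart (here `_`, `H` and the space) simply get no edge; how unmatched symbols are penalized is not stated on the slide and is left out of the sketch.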
  • 14. TIBCO Patterns - Search - Bipartite Graph - String Matching w. Polygrams
      - Find local cost minimum
      - Longer grams have more “weight”
      - Total cost = 4 + 5 + |-6| = 15
      [Figure: bipartite graph linking multi-character grams of “PETER_SMITH” to those of “SMIT PETTER”, with edge displacements 4, 5 and -6]
  • 15. Bipartite Graph - String Matching w. Alignment
      - Shifted 4 positions for global cost minimum (edges may change)
      - Minimizing total cost (w/o weight: 12, w/ weight: 42) [simplified]
      - Different solutions possible: weights, tokenization, …
      [Figure: “PETER_SMITH” aligned against “SMIT PETTER” after the shift. Edge labels as extracted: 0 (x3), -10 (x4), 1 (x2), 0, 1, where x# is a weight based on gram length]
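The alignment idea (shift one string so that the weighted sum of remaining displacements is as small as possible) can be sketched like this. It is a deliberately simplified stand-in: shared n-grams are paired in order of occurrence instead of solving the full matching, the weight of a gram is assumed to equal its length, and the names `gram_positions`, `displacements` and `best_shift` are hypothetical. It does not reproduce the numbers on the slide.

```python
from collections import defaultdict


def gram_positions(s, n):
    """Map each n-gram of s to the list of positions where it starts."""
    positions = defaultdict(list)
    for i in range(len(s) - n + 1):
        positions[s[i:i + n]].append(i)
    return positions


def displacements(a, b, n):
    """Displacement (position in b minus position in a) for each shared n-gram,
    pairing repeated occurrences in order (a simplification)."""
    pa, pb = gram_positions(a, n), gram_positions(b, n)
    return [j - i for g in pa if g in pb for i, j in zip(pa[g], pb[g])]


def weighted_cost(a, b, shift, sizes=(1, 2, 3)):
    """Total |displacement - shift|, with each gram weighted by its length."""
    return sum(n * sum(abs(d - shift) for d in displacements(a, b, n))
               for n in sizes)


def best_shift(a, b, sizes=(1, 2, 3)):
    """Try every plausible shift and keep the one with the lowest weighted cost."""
    candidates = range(-len(a), len(b) + 1)
    return min(candidates, key=lambda s: weighted_cost(a, b, s, sizes))


shift = best_shift("PETER_SMITH", "SMIT PETTER")
print(shift, weighted_cost("PETER_SMITH", "SMIT PETTER", shift))
```

Because longer grams carry more weight, a shift that lines up a whole token tends to win over one that lines up a few scattered characters, which is the effect the slide describes.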
  • 16. Why is this relevant?
      - Unique capabilities result from fundamental approach
      - Closest to human intuition: a “natural” paradigm
          - Translates to accuracy
      - Complete independence of domain: any sequences embedded into 1-space (think genetic sequences)
          - Does not care about data type, culture, language, character set, tokenization, fielding
      - Solid scientific footing guarantees robustness (linear behavior)
  • 17. Search compared to Learn
      Fields: First | Last | DOB | Address | City | State | Height | Hair color | etc.
      - Rec 1: Jason | Fitzgerlad | 12/1/1971 | 200 Classen | St. Paul | MN | 5’10” | Brown
      - Rec 2: 2000 N Classon | Fitzgerland | Jasoz | Saint Paul | MN | 5-11 | Brawn
      - TIBCO Patterns - Search: overall (0.80); field scores as listed: 0.90, 0.82, -1, 0.87, 0.85, 1.0, 0.95, 1.0
      - TIBCO Patterns - Learn: overall score / classification 0.93 (intelligent combination of field scores)
  • 18. TIBCO Patterns - Learn
      - N input features F = (f1, f2, … fn)
          - Similarity score
          - Custom score (date)
          - Binary values: both records male/female
          - Other numeric input
      - Features can be missing (defaults, undefined, invalid): -1
          - The similarity problem is a different one depending on what information is present (if you only have a name and no address, you look at the name differently!)
      - Conditional dependencies = hidden patterns in data
          - When ID matches closely, you are more generous in the address field
          - When (both records) female, a totally different last name is acceptable (if first name is similar or …)
          - Thresholds, weights, patterns, …
      - Humans do it intuitively, such as when recognizing a person
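A feature vector of the kind described here, with -1 marking a feature that cannot be computed, might be assembled as in the sketch below. The field names, the hand-fielded copy of the two example records, and the use of `difflib.SequenceMatcher` as the per-field score are all assumptions for illustration; the slides do not say how the product computes or combines its features.

```python
from difflib import SequenceMatcher

MISSING = -1.0  # marker for a feature that cannot be computed


def field_score(a, b):
    """Similarity of two field values in 0..1, or MISSING if either is absent."""
    if not a or not b:
        return MISSING
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def feature_vector(rec1, rec2, fields):
    """One score per field, in the order given by `fields`."""
    return [field_score(rec1.get(f), rec2.get(f)) for f in fields]


def present_mean(features):
    """Naive baseline: average only the features that are present."""
    present = [f for f in features if f != MISSING]
    return sum(present) / len(present) if present else MISSING


FIELDS = ["first", "last", "dob", "address", "city", "state"]

# The two example records, fielded by hand for this illustration.
rec1 = {"first": "Jason", "last": "Fitzgerlad", "dob": "12/1/1971",
        "address": "200 Classen", "city": "St. Paul", "state": "MN"}
rec2 = {"first": "Jasoz", "last": "Fitzgerland",
        "address": "2000 N Classon", "city": "Saint Paul", "state": "MN"}

features = feature_vector(rec1, rec2, FIELDS)
print(dict(zip(FIELDS, features)))
print(present_mean(features))
```

A plain average like `present_mean` ignores the conditional dependencies listed above (for example, being more lenient on the address when an ID matches closely); capturing those is the job the slide assigns to the learned model.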
  • 19. TIBCO® Patterns - Learn - training
      - Pair selection for training
      - Human user is presented with a pair of records
      - Machine Learning Engine sees the numeric features and human answer
      - Engine updates model and tests its performance
      - Stop when model converges
          - Avoid overtraining
      [Figure: training loop with domain experts: Initial Matching, Pair selection, Labeling, Train, Test]
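The loop on this slide (show a pair, record the human answer, update the model, test, stop once it no longer improves) can be sketched with a very small model. The slides do not describe the learning algorithm, so logistic regression is used here purely as a stand-in; the feature values, labels, learning rate and the `patience` stopping rule are all assumptions.

```python
import math


def predict(weights, bias, features):
    """Probability that a record pair is a match, from its numeric features."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))


def accuracy(weights, bias, pairs):
    """Share of labelled pairs the model classifies the way the human did."""
    hits = sum((predict(weights, bias, x) >= 0.5) == bool(y) for x, y in pairs)
    return hits / len(pairs)


def train(labelled, held_out, rate=0.5, max_rounds=200, patience=5):
    """Update on human-labelled pairs, test on held-out pairs after each round,
    and stop once held-out accuracy has not improved for `patience` rounds."""
    weights, bias = [0.0] * len(labelled[0][0]), 0.0
    best = (list(weights), bias, -1.0)
    stale = 0
    for _ in range(max_rounds):
        for x, y in labelled:
            err = predict(weights, bias, x) - y
            weights = [w - rate * err * xi for w, xi in zip(weights, x)]
            bias -= rate * err
        acc = accuracy(weights, bias, held_out)
        if acc > best[2]:
            best, stale = (list(weights), bias, acc), 0
        else:
            stale += 1
        if stale >= patience:
            break  # no further improvement: stop to avoid overtraining
    return best


# Hypothetical (features, human answer) pairs: 1 = same entity, 0 = different.
labelled = [([0.95, 0.90], 1), ([0.90, 0.85], 1), ([0.30, 0.20], 0),
            ([0.10, 0.40], 0), ([0.85, 0.95], 1), ([0.20, 0.10], 0)]
held_out = [([0.92, 0.88], 1), ([0.10, 0.15], 0)]

weights, bias, held_out_accuracy = train(labelled, held_out)
```

Keeping the held-out pairs separate from the training pairs is what makes the stopping rule meaningful: the model is judged on decisions it has not been fitted to.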
  • 20. TIBCO Patterns - Learn - Deployment
      - Deploy model; incrementally train a model with new patterns
      - Add features to existing model and incrementally train
      - Select among multiple models at run time
      - Significant boost in accuracy
      - Need expert operators to coach during training
      - Set and forget: very robust
  • 21. TIBCO® Patterns - Learn - When to use it
      - Multiple patterns present
      - Many (short) fields
      - Sparse data
      - When you can’t or don’t want to build matching rules to deal with multiple parallel scenarios
          - e.g. comparables matching: product data, with similarity judged on UPC code, or on name and manufacturer, or on description only
  • 22. California Department of Public Health Prenatal Genetic Screening Program: TIBCO® Patterns evaluation and implementation
      - CDPH benchmarked TIBCO and a competitor
          - After 3 weeks competitor reached 79% accuracy
          - After 1 day TIBCO topped 97%
      - Two-phase project undertaken
          - First: cleaning the CDPH database (get clean)
              - 5.5 million record reference database of at-risk women
              - 2.3 million duplicate records identified, representing 1 million unique women
              - 1.3 million duplicates eliminated, leaving a reference database of 4.2 million unique women
          - Then: automate matching of incoming test results (stay clean)
              - Before TIBCO: 65% automated match rate
              - After TIBCO: 95%+ automated match rate
      - Overall results
          - Greatly improved levels of automation
          - Earlier identification/treatment/counseling for possible problems
          - Bottom line: better quality care for at-risk women and their unborn
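The figures on this slide are consistent with each other, which a short check makes explicit. The numbers are taken from the slide; the derived "manual share" values are simple arithmetic on the stated match rates, reading “95%+” as at least 95%.

```python
# Figures quoted on the slide.
reference_records = 5_500_000
duplicate_records_identified = 2_300_000
unique_women_behind_duplicates = 1_000_000
duplicates_eliminated = 1_300_000

# 2.3M duplicate records collapse to 1M women, so 1.3M records are redundant.
redundant = duplicate_records_identified - unique_women_behind_duplicates

# Removing them leaves the cleaned reference database.
remaining = reference_records - duplicates_eliminated

# Share of incoming results still needing manual matching, in percent.
manual_before = 100 - 65   # before: 65% automated
manual_after = 100 - 95    # after: at least 95% automated, so at most 5% manual

print(redundant, remaining, manual_before / manual_after)
```

On these numbers the manual matching workload drops from 35% of incoming results to at most 5%, a reduction by a factor of at least seven.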
  • 23. California Department of Public Health Prenatal Genetic Screening Program
      [Figure: screening results ingest process. Test results from contract labs and Prenatal Diagnostic Centers (PDCs) flow through TIBCO® Patterns and are matched against the CDPH reference database.]
      - Score above threshold: match
      - Score below threshold: no match, treated as new
      - Score between thresholds: human review and action
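The three outcomes in this ingest process amount to a two-threshold decision rule, sketched below. The threshold values 0.60 and 0.90 and the outcome names are placeholders chosen for the example; the slides do not give the values used by CDPH.

```python
def triage(score, lower=0.60, upper=0.90):
    """Route an incoming record by its best match score against the reference DB."""
    if score >= upper:
        return "match"    # confident: link to the existing record automatically
    if score < lower:
        return "new"      # confident non-match: treat as a new record
    return "review"       # uncertain band: queue for human review


incoming_scores = [0.97, 0.72, 0.15]
print([triage(s) for s in incoming_scores])
```

Narrowing the band between the two thresholds raises the automated match rate but sends fewer borderline cases to a person, so the thresholds encode how much risk of a wrong automatic decision is acceptable.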
  • 24. Typical customers and partners
  • 25. Architectural Overview: TIBCO® Patterns
      - Language/client interfaces: .NET, C/C++, Java, Python, ActiveMatrix, BusinessWorks, BusinessEvents, CIM
      - Server-based engines: Search, Learn
      - Supported OSes (32/64-bit): Solaris, Linux, HPUX, AIX, Windows, VMS
  • 26. Typical Deployment
      [Figure: applications (front-end and/or back-end) use a TIBCO client that talks to TIBCO® Patterns over TCP/IP sockets. A loader/syncer fills the TIBCO® Patterns tables from the database engine(s) (ActiveSpaces, Oracle, DB2, SQLServer, MySQL, Informix, Sybase, Postgres, Caché, …) and from other data sources. Current applications run unchanged.]
      - Multiple instances with multiple tables
      - TCP/IP sockets
      - Thin client to marshal requests and return results
      - Partition and/or replicate data for scale
      - Loader/syncer for initial load and subsequent synchronization
  • 27. Identifying opportunities
      - Every one of your customers and prospects is a prospect!
      - Some questions to ask
          - What is the business impact (and cost) of not being able to deal effectively with imperfect/bad data?
          - Where do you have people (either your own or customers) spending a lot of time searching for the right data (about people, products, suppliers, etc.)?
          - How many people do you have matching records by hand?
          - What would it mean if you were to automate a higher percentage of the matching?
          - What is your current level of matching accuracy?
          - What do you do with the records that SQL can’t match?
          - How do you deal with content differences between records that represent the same entity?
  • 28. Resources
      - The Princeton team
          - “Webex” sessions/demos, on-site meetings, EBC visits, POCs, custom demos, industry or customer specific materials… Anything we can do to help identify/develop/close TIBCO Patterns license revenue.
      - Live demos (the only vendor to do this, I wonder why…)
          - Demo index - http://www.netrics.com/demo_index/
          - English live demo (try the advanced search button) - http://netrics.com/demo/
          - Multilingual demo (try the surprise me button) - http://www.netrics.com/demo/index_foreign.php
          - Oil well head data - http://www.netrics.com/demo_energy_oil
          - Spanish names - http://www.netrics.com/demo_spanish_names
          - Portuguese university names - http://www.netrics.com/demo_universities/
          - FDA drug demo - http://netrics.com/demo_fda_drugs/
      - SalesCentral materials - https://salescentral.tibco.com/people/dchamberlain?view=documents
      - 2 minute explainer - http://www.netrics.com/demo/NetricsSPOT.html
      - TIBCommunity - https://ssl.tibcommunity.com/people/dchamberlain?view=documents
  • 29. Live demonstration of TIBCO® Patterns capabilities
      It’s inherently very difficult to demonstrate an engine, and we wanted to show:
      1. The ability to deal effectively and efficiently with just about any type of structured data
      2. The ability to work with any “language”
      3. Very low latency when dealing with large data sets
  • 30. Live demonstration of capabilities
      TIBCO is the only vendor to feature live demonstrations. We show the ability to deal with any type of data in any language with very low latency.
      - English Demos
      - Multilingual Demos
      - FDA Drug Demo
  • 31. Differentiation
      - TIBCO innovations are unique in the market…
          - Mathematical modeling
              - Finding patterns in data, giving systems and people the data they need
              - Finding and learning from human decisions
          - Eliminates the need to guess complex matching rule sets, which are difficult to develop, maintain and update
          - Works equally effectively across multi-domain, multi-lingual data
          - Does not require a DBMS, but integrates nicely if needed
      - The results are unmatched…
          - Accuracy
          - Speed
          - Scalability
          - Easy to deploy, maintain and update
  • 32. Five things customers should consider
      - Accuracy: how close can the system come to reaching the same conclusions as a domain expert when faced with the same data?
      - Efficiency: how easily can the system deal with increasingly large volumes of data and workloads?
      - Entity and language independence: how does the system deal with data about any type of business entity in any language? Systems are global and need to deal with data about many entities other than customers and products.
      - Configurability: what options are provided to fine-tune requests to easily achieve the desired results?
      - Ease of integration: how is the system integrated into existing applications, processes and tools? What native language support is provided? What ESB, SOA, BPM and CEP products are supported?
  • 33. TIBCO® Patterns: Customer stories
  • 34. Types of things our customers do
      - Ensuring pre-natal genetic screening results are linked to the right woman
      - Automating the ingest of TV programming schedules from over 150 broadcast and cable operators
      - Helping customers find the right brand and the right model to program their remote controls
      - Ensuring compliance with government export/import regulations
      - Reducing turnaround time from 5 days to 4 hours when responding to customer requests for equipment
      - Helping UK government agencies collaborate to provide better care
      - Providing real-time linking and de-duplication across 700 million bibliographical and citation items
      - Linking patients and their visit records for outcome-driven healthcare
      - Creating the world’s largest Biobank for genetic researchers
  • 35. California Department of Public Health Prenatal Genetic Screening Program: TIBCO® Patterns - evaluation and implementation
      - CDPH benchmarked the effectiveness of TIBCO and a competitor
          - Created a standardized data set, identified duplicates by hand, then…
          - After 3 weeks competitor reached 79% accuracy
          - After 1 day TIBCO topped 97%
      - Two-phase project undertaken
          - First: cleaning the CDPH database (get clean)
              - 5.5 million record reference database of at-risk women
              - 2.3 million duplicate records identified, representing 1 million unique women
              - 1.3 million duplicates eliminated, leaving a reference database of 4.2 million unique women
          - Then: automate matching of incoming test results (stay clean)
              - Before TIBCO: 65% automated match rate
              - After TIBCO: 95%+ automated match rate
      - Results
          - Greatly improved levels of automation
  • 36. CDPH - linking test results to the right woman
      [Figure: genetic testing results ingest process. Test results from contract labs and Prenatal Diagnostic Centers (PDCs) flow through TIBCO® Patterns and are linked against 4.2 million records of at-risk women.]
      - Score above threshold: link
      - Score below threshold: no link (new?)
      - Score between thresholds: human review and action
  • 37. TV Guide - accurate, timely programming
      Providing accurate, informative program information for 100s of millions of customers
      - Different stations often describe the same show in different ways
      - TV Guide have built a reference dataset of 2M+ programs
      - Regular ingesting of future programming from hundreds of broadcast/cable operators
      - Millions of Web, print and channel guide customers
          - Information on over 12,0000 channel lineups provided to hundreds of cable/satellite operators and millions of Web visitors and print readers
      - TV Guide benchmarked several vendors; TIBCO was selected for superior accuracy and automation
      - Results
          - Significant reduction in manual effort
          - More accurate guides
          - People now focus on enriching the data and enhancing customer experiences
  • 38. TV Guide - matching future programming to content database
      [Figure: programming from hundreds of cable, satellite and broadcast outlets (cable operators, broadcast channels, satellite providers) flows through TIBCO® Patterns, which matches and links incoming records to the 2+ million record content database.]
      - Exists: link/enrich content
      - New: add to content DB
      - Uncertain: human review and action
      - Incoming data quality is highly variable
      - Content data on over 170,000 movies, a million plus TV series episodes and every TV show since 1960
  • 39. Logitech - increasing customer satisfaction by making it easier to find the right brand/model
      Harmony remotes feature activity-based control that makes getting to what you want to do as simple as pressing a button
      - Electronic brand/model number combinations are complex and hard to transcribe
      - Customers were becoming frustrated…
      - Needed a way of suggesting the right brand/model even when customer entries were way off
      - TIBCO Patterns - Search is used “behind” the Web UI to find the closest matching models and suggest them to the consumer
  • 40. Logitech - finding the right model to program
      [Figure: customers programming their Harmony remotes enter a brand/model in the Web UI; TIBCO® Patterns finds the closest matching models among 300,000+ records about brands and models.]
      - Score above threshold: show closest matching models
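The "suggest the closest models above a threshold" behaviour can be imitated with the Python standard library. This sketch uses `difflib.get_close_matches` in place of TIBCO Patterns - Search, and the four catalog entries, the `suggest_models` helper and the 0.6 cutoff are illustrative assumptions rather than Logitech data.

```python
from difflib import get_close_matches

# Illustrative catalog entries; the real catalog holds 300,000+ brand/model records.
CATALOG = [
    "Sony KDL-40EX500",
    "Sony KDL-46EX500",
    "Samsung UN55D6000",
    "Panasonic TC-P50S30",
]


def suggest_models(entry, catalog=CATALOG, limit=3, cutoff=0.6):
    """Return up to `limit` catalog entries closest to what the customer typed,
    best first, keeping only those scoring at least `cutoff`."""
    by_key = {name.lower(): name for name in catalog}
    keys = get_close_matches(entry.lower(), list(by_key), n=limit, cutoff=cutoff)
    return [by_key[key] for key in keys]


print(suggest_models("Sny KDL40EX500"))   # typo and missing hyphen
print(suggest_models("zzzz"))             # nothing close enough: no suggestions
```

Comparing every entry against the whole catalog is fine for four records but would not meet the low-latency requirement at 300,000+ records, which is where a purpose-built engine comes in.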
  • 41. Customs & Excise Department of Hong Kong/PCCW: TIBCO® Patterns - evaluation and implementation
      - Hong Kong Customs & Excise Dept required a matching engine as part of their initial specification of the ROCARS system
      - Need to check bills of lading and manifests against a series of white/black lists, in a mix of Simplified Chinese, Traditional Chinese and English
      - PCCW won the contract and started development work with another vendor’s matching capability, and quickly realized it was not up to the job
      - In late 2008, PCCW Googled and found TIBCO® Patterns; after several weeks of discussions and demonstrations they started, and quickly finished, a POC
      - The ROCARS system went live in early 2010 with TIBCO® Patterns at the heart of its “risk engine”
  • 42. Customs & Excise Department - ensuring compliance with government regulations
      [Figure: ROCARS data entry sources (border crossings, Web, self-service kiosks) feed TIBCO® Patterns, which checks bills of lading and cargo manifests against white lists and black lists.]
      - Score above threshold: suspicious
      - Score below threshold: OK
      - Score between thresholds: human review and action
  • 43. Sterilmed - automating quote process
      Sterilmed offers products and services to help healthcare providers lower their device and equipment costs
      - Hospitals send lists of up to 5,000 items of medical equipment they need, often in Excel spreadsheets
      - Specific line items need to be matched to the Sterilmed inventory system; required equipment would usually be described differently
      - Getting a quote for a request could take up to a week, with several analysts matching line item by line item
      - Process of producing a quote has decreased to 4 hours, and is much more highly automated
      - Other uses now include matching across various healthcare industry databases and identifying duplicate contacts across systems
      - Results
          - More efficient and effective business processes
          - People now able to do their real job
  • 44. Sterilmed - reducing quote process from 5 days to 4 hours
      [Figure: equipment requests from healthcare providers (hospitals, clinics, other providers) flow through TIBCO® Patterns, which links customer requests to the Sterilmed device and equipment inventory.]
      - Score above threshold: link
      - Score below threshold: no link (new?)
      - Score between thresholds: human review and action
  • 45. LiquidLogic - helping government organisations collaborate
      LiquidLogic provides the public sector with a platform enabling multi-agency solutions for collaborative working
      - UK Public Sector is moving towards collaboration between multiple organisations/agencies
      - The same client is often represented differently in several different databases, hampering collaboration, with potentially dangerous or fatal outcomes
      - Need to model and enable real-time process and data sharing across multiple organisations
      - TIBCO® Patterns - Search provides the real-time duplicate identification, duplicate prevention and searching services across the Protocol platform
      - Installed at dozens of public sector organisations, helping them provide better services to their clients and detect potential fraud
  • 46. LiquidLogic – Protocol platform
      [Diagram: multiple systems across multiple organisations (children's services, domestic violence, other systems) feed TIBCO® Patterns, which provides a linked, 360-degree view of clients across those systems, with human review and action]
      • Duplicate identification
      • Duplicate prevention
      • Searching
      • Allows a radical redesign of processes
      • Integrates with existing corporate applications to present a SOA
      • Manages multiple disparate data sources supporting the information sharing requirements of federated applications
  • 47. Los Alamos National Laboratory (LANL) – real-time linking of bibliographic and citation data
      • LANL Research Library locally hosts large data collections
          • A&I databases: ISI Citation Databases, Inspec, BIOSIS, Engineering Index, …
          • Full-text collections: Elsevier, Wiley, APS, IOP, …
      • Duplicates in LANL data collection
          • Amongst bibliographic records and citations
          • Between bibliographic records and citations
      • De-duplication, matching and linking needed
          • Join records from several databases that describe the same work
          • Find works that cite a given work
      • High volumes: >600 million citations and >65 million bibliographic items
      • High request rates: >25/second
      • Results look much better than those of the batch de-duplication approach (TIBCO® Patterns + training by librarians)
      • Can “de-dup” external data against local data on the fly, with no batch processing
      • Possibility to retrain the system to optimize responses without data reprocessing: machine learning module
      • Scalability to accommodate growth of datasets
      LANL presentation on real-time matching
      Los Alamos National Laboratory is a premier national security research institution, working on advanced technologies to provide the United States with the best scientific and engineering solutions to many of the nation’s most crucial challenges.
  • 48. LANL – systems matching like humans
      [Diagram: local and remote A&I databases (PubMed, Inspec, Biosis; 65 million bibliographic items) and local and remote full-text databases (APS, Elsevier, Wiley; 700 million citation items) are connected through TIBCO® Patterns, linking bibliographic records and citations in real time]
      • Trained by domain experts (LANL librarians)
      • Clearer cut-off between matches and non-matches
      • Never a need to re-index
      • Given a bibliographic key, which are the matching bibliographic records?
      • Given a citation key, which are the matching bibliographic records?
      • Given the identifier of a bibliographic record, what is the corresponding bibliographic key?
      • Given a bibliographic key, which are the citing bibliographic records?
      • Given a citation key, which are the citing bibliographic records?
      • Forgiving to errors in datasets
      • Forgiving to errors in query
      • Compares like humans
  • 49. Rush Health – moving to outcome driven care
      Rush Health is a clinically integrated network of providers working together to improve health through high quality, efficient health services covering the spectrum of patient care from wellness, prevention and health promotion, to disease management and complex care management.
      • Three major hospitals, 750 on-staff physicians and 50 allied health providers
      • Initiatives – both need highly accurate and automated matching
          • Moving to make diagnosis and treatment more proactive
          • Implementing a system where payments are based on the outcome of healthcare, not the amount prescribed
      • Patients will often visit different facilities and have their encounter and therapy details recorded differently, resulting in high duplicate record rates
      • First – cleaning the EMPI (get clean)
          • ~2 million record EMPI database was analyzed for duplicate patient records
          • ~1.6 million duplicate records identified
          • Leaving a reference database of ~400,000 unique patients imported back into EMPI
      • Then – automate matching of incoming patient encounter records into their Enterprise Data Warehouse based EMPI (stay clean)
      • Results
          • Much higher level of automated matching, for an accurate and duplicate-free EMPI
          • More accurate assessment of treatment outcomes and preventative care
  • 50. Rush Health – linking encounter records to the right patient
      [Diagram: encounter records from hospital and doctor visits (hospitals, Dr offices, clinics) pass through TIBCO® Patterns, which links each one to the right patient record in the Enterprise Data Warehouse EMPI]
      • Score above the upper threshold = link
      • Score below the lower threshold = no link (new?)
      • Score between the thresholds = human review and action
  • 51. California Department of Public Health – Building the world's largest Research Ready Biobank
      Development of a research-ready pregnancy and newborn biobank in California
      • California has been collecting samples at several key life events for many years; this could have been a tremendous source of research data
      • Because records from different sources had to be linked manually, only 5 or 6 requests for research sample data per year could be handled
      • NIH grant to Sequoia Foundation for the specimen tracking and linkage project awarded in late 2009
      • Now linking ~200 million life event records from across multiple systems to form the research ready biobank
      • Results will be
          • A life course, client based system that enables cross-generational studies, population-based family studies, and women-level studies across multiple pregnancies
          • Cost-efficiently process requests, track specimens, conduct linkage and integration of new data
          • Process high volumes of specimen and data requests
      CDPH Research ready Biobank
  • 52. CDPH – linking 100s of millions of records
      [Diagram: sample data collected over 20+ years on significant life events (fetal deaths, live births, genetic screening, screened newborns, deaths) is linked across multiple sources by TIBCO® Patterns to form the research ready biobank]
      • Descriptive epidemiologic studies on the birth prevalence of genetic disorders and seroprevalence of infectious agents
      • Analytic epidemiologic studies to determine the causes of birth defects, preterm birth and other disorders
      • Laboratory studies to develop and validate screening tests and other assays
      • Prevention and intervention studies to guide the design of screening models with maximum efficiency
      • Score above the upper threshold = link
      • Score below the lower threshold = no link
      • Score between the thresholds = human review
  • 53. TIBCO® Patterns for BusinessWorks™
  • 54. BusinessWorks™ Patterns Plugin Goals
      • Enable applications developed with BusinessWorks™ Designer to deal effectively with “imperfect” data
      • Provide access to the TIBCO® Patterns – Search capabilities without having to write any code
      • Use standard BusinessWorks™ Designer to design, test and initiate queries to the engine
      • Integrate TIBCO® Patterns – Search into the TIBCO product suite using standard TIBCO development and integration tools
      • Leverage existing BW components to provide access to TIBCO® Patterns – Search from custom and off-the-shelf applications
      • Provide dynamic query request configuration
  • 55. Architecture
      • Runs in the BusinessWorks™ extensible framework for integration
      • Uses the TIBCO® Patterns Java API
      • Consists of two jar files
          • TIBCO-Patterns-Java-Interface-4.5.1.jar
          • BWPatternsPlugIn_4.5.1.jar
      • Installed into <TIBCO-HOME> using the TIBCO Universal Installer
  • 56. BusinessWorks palettes
      The Patterns palette provides CRUD operations on in-memory tables. Data can be loaded from any external data source. The in-memory tables are used for matching according to the matching criteria and query strings provided. Patterns activities can leverage other BW activities and can be exposed as services for consumption by other enterprise consumers.
  • 57. Duplicate identification
      • Multiple files about the same entity type
      • Data loaded from multiple systems
      • Load in common schema
      • Iterate through data record by record
      • Results above specific similarity score are duplicates
      • Take appropriate action – merge? link?
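A minimal sketch of the duplicate identification flow above: records in a common schema are compared pairwise, and pairs scoring above a similarity threshold are reported as duplicates. Python's standard `difflib.SequenceMatcher` stands in for the matching engine here; it is not the TIBCO® Patterns scoring method, and the field names, sample data and 0.8 threshold are assumptions for illustration.

```python
from difflib import SequenceMatcher
from itertools import combinations


def record_text(record):
    # Flatten a record into one normalised string (fields in key order).
    return " ".join(str(record[k]).strip().lower() for k in sorted(record))


def similarity(a, b):
    # Character-level similarity in [0, 1]; a stand-in for a real matcher.
    return SequenceMatcher(None, record_text(a), record_text(b)).ratio()


def find_duplicates(records, threshold=0.8):
    """Return (i, j, score) for every pair scoring at or above threshold."""
    pairs = []
    for (i, a), (j, b) in combinations(enumerate(records), 2):
        score = similarity(a, b)
        if score >= threshold:
            pairs.append((i, j, round(score, 3)))
    return pairs


# Hypothetical records already loaded into a common schema.
sample = [
    {"first": "Jon", "last": "Smith", "city": "Princeton"},
    {"first": "Jon", "last": "Smiht", "city": "Princeton"},
    {"first": "Mary", "last": "Jones", "city": "Boston"},
]
print(find_duplicates(sample))
```

The all-pairs comparison is quadratic in the number of records, so it only suits small samples; it is shown here for clarity rather than scale.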
  • 58. Sample duplicate identification process
  • 59. Data augmentation
      • Reference data (for example, the Experian life style data brick)
      • File with data for augmentation
      • Find most similar record(s)
      • Augmented with data from reference
      • Augmented data
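The augmentation flow can be sketched as: for each incoming record, find the most similar reference record and, if it is similar enough, copy the reference fields across. The sketch below uses `difflib.SequenceMatcher` from the Python standard library as a stand-in matcher; the field names, sample reference data and the 0.8 cut-off are hypothetical.

```python
from difflib import SequenceMatcher


def augment(record, reference, key_field="name", min_score=0.8):
    """Return a copy of record enriched from its best reference match."""
    best_score, best_ref = 0.0, None
    for ref in reference:
        score = SequenceMatcher(
            None, record[key_field].lower(), ref[key_field].lower()
        ).ratio()
        if score > best_score:
            best_score, best_ref = score, ref
    if best_ref is None or best_score < min_score:
        return dict(record)      # nothing similar enough: leave unchanged
    enriched = dict(best_ref)    # start from the reference fields
    enriched.update(record)      # incoming values win on any conflict
    return enriched


# Hypothetical reference table and incoming record.
reference = [
    {"name": "Jonathan Price", "segment": "urban professional"},
    {"name": "Mary Jones", "segment": "rural retiree"},
]
incoming = {"name": "Jonathon Price", "email": "jp@example.com"}
print(augment(incoming, reference))
```

Letting the incoming values win on conflict is one possible design choice; a system of record might prefer the reference values instead.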
  • 60. Identifying overlap and uniqueness across multiple data sets
      • 2 or more files about the same entity type
      • Data loaded into multiple tables
      • Results in A or B or both: unique to A, unique to B, overlap
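As a rough illustration of splitting two data sets into "unique to A", "unique to B" and "overlap", the sketch below matches each entry of A to its most similar entry in B and treats pairs above a threshold as overlap. `difflib.SequenceMatcher` is again only a stand-in for the matching engine, and the names and the 0.85 threshold are assumptions.

```python
from difflib import SequenceMatcher


def split_overlap(a, b, threshold=0.85):
    """Partition two lists of strings into (unique_a, overlap, unique_b)."""
    unique_a, overlap, matched_b = [], [], set()
    for x in a:
        best_j, best_score = None, 0.0
        for j, y in enumerate(b):
            score = SequenceMatcher(None, x.lower(), y.lower()).ratio()
            if score > best_score:
                best_j, best_score = j, score
        if best_j is not None and best_score >= threshold:
            overlap.append((x, b[best_j]))   # same entity in both sets
            matched_b.add(best_j)
        else:
            unique_a.append(x)
    unique_b = [y for j, y in enumerate(b) if j not in matched_b]
    return unique_a, overlap, unique_b


# Hypothetical company lists from two systems.
set_a = ["Acme Corp", "Globex Ltd", "Initech"]
set_b = ["ACME Corp.", "Initech", "Umbrella Inc"]
print(split_overlap(set_a, set_b))
```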
  • 61. Record classification
      • Table(s) loaded with keywords or phrases of interest
      • Incoming records for classification
      • Iterate through data record by record
      • Search for most similar keywords
      • Results are closest classification(s)
      • Take appropriate action based on classification
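A small sketch of keyword-based classification that tolerates misspellings: each word of an incoming record is compared with a keyword table, and the categories of the closest keywords are returned. It uses `difflib.get_close_matches` from the Python standard library rather than TIBCO® Patterns; the keyword table, category names and 0.75 cut-off are hypothetical.

```python
from difflib import get_close_matches

# Hypothetical keyword table: keyword -> classification.
KEYWORDS = {
    "refund": "billing",
    "password": "account access",
    "delivery": "shipping",
}


def classify(text, keywords=KEYWORDS, cutoff=0.75):
    """Return the classifications whose keywords approximately occur in text."""
    labels = []
    for token in text.lower().split():
        hit = get_close_matches(token, list(keywords), n=1, cutoff=cutoff)
        if hit and keywords[hit[0]] not in labels:
            labels.append(keywords[hit[0]])
    return labels


print(classify("Pasword reset not working"))
print(classify("late delivry and refnd request"))
```

A record can receive several classifications, or none; what to do in either case is the "appropriate action" left to the surrounding process.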
  • 62. Fuzzy parsing
      • Table(s) loaded with keywords or phrases of interest
      • Incoming records for parsing
      • Iterate through data record by record
      • Search for most similar keywords
      • Results are closest matching words and phrases
      • Take appropriate action
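Fuzzy parsing can be approximated by sliding a window over the incoming text and scoring each window against the phrases of interest, keeping the closest match per phrase. The sketch below is an assumption-laden illustration using `difflib.SequenceMatcher`, not the engine's own parsing; the phrase list and the 0.8 cut-off are hypothetical.

```python
from difflib import SequenceMatcher


def fuzzy_parse(text, phrases, cutoff=0.8):
    """Return (phrase, matched_text, score) for phrases found approximately."""
    tokens = text.lower().split()
    found = []
    for phrase in phrases:
        width = len(phrase.split())          # window size in words
        best_score, best_window = 0.0, None
        for start in range(len(tokens) - width + 1):
            window = " ".join(tokens[start:start + width])
            score = SequenceMatcher(None, window, phrase.lower()).ratio()
            if score > best_score:
                best_score, best_window = score, window
        if best_window is not None and best_score >= cutoff:
            found.append((phrase, best_window, round(best_score, 2)))
    return found


# Hypothetical address line with a transposition error.
print(fuzzy_parse("1o3 hgih street flat 2 york", ["high street", "flat", "avenue"]))
```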
  • 63. TIBCO® Patterns for BusinessEvents™
  • 64. Patterns plug-in for BusinessEvents
      • TIBCO Patterns Plugin for BusinessEvents available in 4.5.1
      • Implemented as BE Studio Catalog Functions
      • Code samples available for all function calls
  • 65. TIBCO Patterns - BusinessEvents Plugin
  • 66. TIBCO Patterns – Enable Catalog Function View
      Catalog Functions appear when the BE perspective is selected and a Rule Function is open.
  • 67. Correlating multiple records about the same entity
      [Diagram: customer database(s) loaded into BusinessEvents]
      • Before: multiple concepts about the same customer are loaded
      • After: TIBCO Patterns correlates the records before loading, so a single concept for a customer is loaded
  • 68. Are there existing similar events?
      [Diagram: incoming events, BusinessEvents (BE), the concept cache and TIBCO Patterns]
      • Before: new concepts are added even when very similar concepts exist
      • After: TIBCO Patterns is asked whether similar events already exist in the concept cache
  • 69. Finding patterns in the event cloud
      [Diagram: events from System A, System B and System C form an event cloud; TIBCO Patterns examines it to reach conclusions such as fraud, cyber attack, money laundering and others]
  • 70. AMX BPM / Patterns Demo
  • 71. Synopsis
      Data is very rarely entered into a system (or worse, multiple systems) the same way. We sometimes use full names and sometimes nicknames, we “fat-finger” keys, we reverse fields, we use maiden names or married names, and so on. This makes finding the exact data we want or need very difficult, and often impossible. Finding incorrect information, or not finding information at all, can be life-threatening in some situations. In healthcare, for example, if we do not find information about a patient, we may not know about allergies or drug interactions. This demonstration of AMX BPM includes our Netrics Matching Engine technology, which allows a user to find a customer record without having all the information about that customer, and regardless of how that customer's information may have been incorrectly entered into the system(s).
  • 72. Step 1: Loading the data into Patterns
      This BW process loads data from a source (in this case a CSV file) into Patterns, which is running as a Windows service. This data represents the most current customer information on file. The format of the data table created in the Matching Engine is shown as displayed by the Display Table option in BW.
  • 73. Step 2: Starting the Process – Enter Customer Information
      This is the AMX BPM Business Service (FormFlow) which starts the process. The “Customer” information to search for is entered in the form displayed on the first step of the process.
  • 74. Step 3: Calling Patterns
      This is the AMX BPM Business Service (FormFlow) which starts the process. This BW process calls the Matching Engine with the search criteria entered in the first form. The results are mapped and passed back to the BPM process for display on the second form.
  • 75. Step 4: Selecting the most similar records or refining the search
      The Results form, displayed on the second step of the process (after the Call Matching Engine step), shows the results of the initial search. Additional refined searches can also be run from this step. This entire process appears as a single form to the user (FormFlow). This is the AMX BPM Business Service (FormFlow) which starts the process.
  • 76. Step 5: Invoking the main process
      This is the AMX BPM Business Service (FormFlow) which starts the process. This is the main AMX BPM process that displays the selected customer record from the Business Service Process and allows for modifications to the data.
  • 77. Step 6: Viewing the selected customer record and modifying the data if necessary
      This form allows a user to edit the data returned from the Matching Engine. If a change is made, it will go to an approval step.
  • 78. Step 7: Approving the updated customer record
      This is the Approval form. It allows a user to approve (or reject) the update to the customer record made in the previous step.
  • 79. Step 8: Update back-end system
      This step checks the back-end system of record to see if the customer record exists, and performs an update if it does or an insert if it doesn't.
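The check-then-update-or-insert logic of this step can be sketched against a SQL back end. The example uses Python's standard `sqlite3` module with an in-memory database purely for illustration; the demo's actual system of record, and the `customers` table and its columns, are not described in the slides and are assumptions here.

```python
import sqlite3


def upsert_customer(conn, customer_id, name, email):
    """Update the customer if the record exists, otherwise insert it."""
    exists = conn.execute(
        "SELECT 1 FROM customers WHERE id = ?", (customer_id,)
    ).fetchone()
    if exists:
        conn.execute(
            "UPDATE customers SET name = ?, email = ? WHERE id = ?",
            (name, email, customer_id),
        )
        action = "update"
    else:
        conn.execute(
            "INSERT INTO customers (id, name, email) VALUES (?, ?, ?)",
            (customer_id, name, email),
        )
        action = "insert"
    conn.commit()
    return action


# Hypothetical back-end table held in memory for the example.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, email TEXT)")
print(upsert_customer(db, 1, "Jon Smith", "jon@example.com"))
print(upsert_customer(db, 1, "John Smith", "jon@example.com"))
```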
  • 80. Step 9: Notification of the update
      Once complete, depending on the process path taken, one of two email notifications will be sent.