Partner event tibco patterns 2011-10-12

About the patterns in Tibco AMX BPM

Published in: Technology, Business

Partner event tibco patterns 2011-10-12

  1. TIBCO® Patterns
     Partner Enablement - October 12th, 2011
     Making Systems Smarter about dealing with “imperfect” data
     Dave Chamberlain, dchamberlain@Tibco.com
     © 2008 TIBCO Software Inc. All Rights Reserved. Confidential and Proprietary.
  2. Safe Harbor Disclosure
     During the course of this presentation TIBCO or its representatives may make forward-looking statements regarding future events, TIBCO's future results or our future financial performance. These statements are based on management's current expectations. Although we believe that the expectations reflected in the forward-looking statements contained in this presentation are reasonable, these expectations or any of the forward-looking statements could prove to be incorrect and actual results or financial performance could differ materially from those stated herein. We refer you to the reports that TIBCO files from time to time with the Securities and Exchange Commission for a discussion of important factors that could cause actual results or financial performance to differ materially from those contained in any forward-looking statement made in connection with this presentation. TIBCO does not undertake to update any forward-looking statement that may be made from time to time or on its behalf.
  3. What's the problem?
     Fields: First, Last, Addr1, Addr2, City, State, Zip, DOB
     • Record 1: Jon, Smith, 1030 Main St., Princeton, NJ, O8540, 10/12/79
     • Record 2: Jon, Smiht, 1030 Main, Princeton, NJ, 0854O, 10/12/97
     • Record 3: John, Smyth, Main Street, 103A, Pton, NJ, 08540, 12/12/79
     Humans can tell these records are about the same person.
     Systems have a very hard time, and often they can't.
  4. What's the problem?
     Fields: First, Last, Addr1, Addr2, City, County, Post-code, DOB
     • Record 1: Jonathan, Price, 103 The High Street, Flat 2, York, Yorkshire, YR1604, 10/12/79
     • Record 2: Pryce, Jon, 1o3-2 High St, YR16o4, Dec 10 1977, York
     • Record 3: John, Prce, High St #103, 2, Y0rk, Yorkshire, YR1064, 12/12/79
     Humans can tell these records are about the same person.
     Systems have a very hard time, and often they can't.
  5. TIBCO® Patterns
     • Focused on structured (fielded) data
       • Products, people, companies, claims, events, etc.
     • In-memory, real-time and designed to be embedded
     • Products
       • TIBCO® Patterns - Search: finds patterns systems or people are looking for in data
       • TIBCO® Patterns - Learn: detects and learns patterns when humans make decisions on data similarity
     Enables organizations to “connect the dots”
  6. Horizontal applicability - all industries and agencies
     The good news!
     • Find
       • CSRs looking for the right customer
       • Admissions finding the right patient
       • Customers finding things to buy
       • Intel agencies identifying terrorists
     • Match
       • Identifying records about the same customer for KYC and SCV regulations
       • Ensuring citizens receive correct entitlements
       • Conforming with import/export regulations
     • Link
       • Identifying potential fraud
       • Anti Money Laundering
       • Creating and maintaining a Master Patient Index
  7. Use cases by key verticals
     • FSI
       • Building 360 degree view of customers for regulatory purposes
       • Generating better up sell and cross sell opportunities (with BE integration)
       • Quickly finding the right customer
       • Anti Money Laundering
     • Telco
       • Quickly finding the right customer
       • Understanding total relationship with customers
       • Keeping multiple systems synchronized
     • Federal & State Government
       • Law enforcement/Intel - finding the “bad guys”
       • Making sure our kids are safe - child protection/youth services
       • Ensuring citizens receive (only) their correct entitlements
     • Energy
       • Consolidating customers due to M&A activity
       • Matching energy trade sides
       • Linking data about grid and network assets
     • Healthcare
       • Identifying duplicates in Master Patient Index
       • Linking patient encounter records for outcome driven healthcare
       • Finding the right patient, first time every time
  8. The New Data Challenge
     [Figure: building blocks of Enterprise 1.0 ('60s - '80s, Data Processing), Enterprise 2.0 ('80s - 2000, Client Server) and Enterprise 3.0 (2000 - 2020, Predictive), compared by software, velocity, interactions, time to react, amount of data and half life of data. Labels: Mainframe, Database, 2-Tier, 3-Tier, N-Tier, Batch, Online, Event Driven, ESB; volumes grow from 000,000's to 000,000,000's to 000,000,000,000's.]
     Now it's even more important to deal effectively and efficiently with imperfect data
  9. The problems we all face
     In the real world, database information is never 100% perfect, never 100% consistent, and never 100% complete - and never can be. Data by its nature is full of errors: omissions, inconsistencies and duplicates.
     • A root cause - human-computer gap
       • Humans recall information approximately and easily tolerate data errors and variations when determining similarity
       • Software has been exact and unforgiving
     • Equality or inequality is easy
       • “last name = chamberlain” - “inventory level < 100”
     • Similarity is difficult
       • Select * from customers with .85 similarity between this and that…
       • “Chamberlain” ≈ “Chumberland” - are they the same person?
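The gap between equality and similarity on this slide can be illustrated with a minimal Python sketch. It uses the standard library's `difflib.SequenceMatcher` as a stand-in similarity measure; this is not the TIBCO Patterns algorithm, its scores are not on the same scale as the ".85" in the slide, and the `similarity` and `similar_rows` helpers and the sample rows are hypothetical.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Return a 0..1 similarity ratio between two strings, ignoring case."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def similar_rows(rows, field, value, threshold):
    """Keep the rows whose field is at least `threshold` similar to value."""
    return [row for row in rows if similarity(row[field], value) >= threshold]


# Hypothetical customer rows, for illustration only.
customers = [
    {"last_name": "Chamberlain"},
    {"last_name": "Chumberland"},
    {"last_name": "Smith"},
]

# Exact equality finds only the perfectly typed record.
exact = [row for row in customers if row["last_name"].lower() == "chamberlain"]

# A similarity threshold also surfaces the near miss.
fuzzy = similar_rows(customers, "last_name", "chamberlain", threshold=0.8)

print(len(exact), len(fuzzy))
```

With this particular measure "Chumberland" scores a little above 0.8 against "Chamberlain", so where the threshold is set decides whether the two are treated as the same person.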
  10. The cost of not finding the data you need
      • The organizational/societal cost is high
        • Terrorists board planes
        • Criminals get away
        • Patients get the wrong treatments
        • Enterprises don't realize economies of scale
        • FSI doesn't really know their customers - up-sell/x-sell opportunities are lost - risk is not known
        • Government entitlements get abused - fraud goes undetected
        • Goods and/or people enter or leave a country illegally
        • Repeat drunk drivers get driver's licenses
        • TV listings are wrong
        • Logitech remote controls don't work correctly
        • etc.
      • These types of problems permeate every organization
  11. Types of things our customers do
      • Ensuring compliance with export/import regulations
      • Linking patients and their visit records for outcome-driven healthcare
      • Finding the right person across all law enforcement systems
      • Creating the world's largest Biobank for genetic researchers
      • Helping customers find the right brand and the right model to program their remote controls
      • Automating the ingest of TV programming schedules from over 150 broadcast and cable operators
      • Reducing turnaround time from 5 days to 4 hours to respond to customer requests for equipment
      • And many more examples… https://ssl.tibcommunity.com/community/products/patterns
  12. Our innovations
      • TIBCO Patterns - Search
        • Problem: How similar are sets of data elements?
        • Conventional solutions: Soundex, NYSIIS, Edit Distance, Metaphone, etc.
        • TIBCO innovation: Mathematical model that finds patterns systems or people are looking for in data
      • TIBCO Patterns - Learn
        • Problem: Are records about the same entity?
        • Conventional solutions: Custom built matching rule sets - optional statistical parameters
        • TIBCO innovation: Mathematical model that identifies and learns patterns as humans make decisions about data similarity
      • Advantages
        • Superior accuracy
        • Symmetric error-tolerance
        • No guessing of rules and parameters
        • Computational efficiency & scalability
        • Data independence - people, assets, TV programs, stock trades, products, companies, claims, transactions, etc.
        • Engineering efficiency - easy to maintain and refine
        • Independent of language
        • Real-time
        • Sparse data support built-in
        • Easily embeddable
        • Quick and easy deployment
        • DBMS independent
  13. TIBCO Patterns - Search - Bipartite Graph - String Matching w. Unigrams
      • Cost = |displacement| (linear cost function)
      • Pick set of edges that minimize cost
      • Only one edge per symbol allowed
      [Figure: bipartite graph linking the symbols of P E T E R _ S M I T H to the symbols of S M I T P E T T E R; edge labels: 4, 4, 5, -6, -6, -6, 5, 2, 7, 4, 5, -6, -3, -2, 1]
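The three rules on this slide can be sketched in Python as a small brute-force matching. This is an illustration under stated assumptions, not the product's implementation: the second string is taken to be "SMITPETTER" with no separator, displacement is taken as position in the second string minus position in the first, every symbol occurrence that can be paired is paired, and the helper names `positions` and `min_cost_edges` are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def positions(text):
    """Map each symbol to the ordered list of positions where it occurs."""
    index = defaultdict(list)
    for i, symbol in enumerate(text):
        index[symbol].append(i)
    return index


def min_cost_edges(first, second):
    """Link equal symbols, at most one edge per symbol occurrence, so that
    the sum of |displacement| is minimal. Brute force, small inputs only."""
    pos_a, pos_b = positions(first), positions(second)
    edges = []
    for symbol in sorted(pos_a.keys() & pos_b.keys()):
        a, b = pos_a[symbol], pos_b[symbol]
        swapped = len(a) > len(b)
        short, long_ = (b, a) if swapped else (a, b)
        # For a linear cost, pairing positions in order is optimal once the
        # subset of the longer list is fixed, so only subsets are enumerated.
        best = min(
            combinations(long_, len(short)),
            key=lambda subset: sum(abs(x - y) for x, y in zip(short, subset)),
        )
        for s, l in zip(short, best):
            i, j = (l, s) if swapped else (s, l)
            edges.append((symbol, i, j, j - i))  # displacement = j - i
    return edges


edges = min_cost_edges("PETER_SMITH", "SMITPETTER")
for symbol, i, j, displacement in edges:
    print(symbol, i, j, displacement)
print("total cost:", sum(abs(d) for *_, d in edges))
```

Under these assumptions the three symbols of S, M and I each move by -6 and P moves by +4, which agrees with several of the edge labels in the figure.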
  14. TIBCO Patterns - Search - Bipartite Graph - String Matching w. Polygrams
      • Find local cost minimum
      • Longer grams have more “weight”
      • Total cost = 4 + 5 + |-6| = 15
      [Figure: the strings P E T E R _ S M I T H and S M I T P E T T E R linked by polygram edges; edge labels: 4, -6, 5]
  15. Bipartite Graph - String Matching w. Alignment
      • Shifted 4 positions for global cost minimum (edges may change)
      • Minimizing total cost (w/o weight: 12, w/ weight: 42) [simplified]
      • Different solutions possible - weights, tokenization, …
      [Figure: the strings P E T E R _ S M I T H and S M I T P E T T E R after the shift; edge labels: 0 (x3), -10 (x4), 1 (x2), 0, 1, where (x#) is a weight based on length]
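The polygram and alignment arithmetic on the last two slides can be reproduced with a short Python sketch. It assumes the three grams are "PET", "SMIT" and "ER" (inferred from the weights x3, x4 and x2), that the second string is "SMITPETTER", and that a gram's weight is its length; `gram_edges`, `total_cost` and `best_shift` are hypothetical helper names. The slide's unweighted figure of 12 is not reproduced, since this sketch uses only the three weighted grams.

```python
def gram_edges(first, second, grams):
    """Return (gram, weight, displacement) for each gram found in both
    strings; displacement is position in second minus position in first."""
    edges = []
    for gram in grams:
        i, j = first.find(gram), second.find(gram)
        if i >= 0 and j >= 0:
            edges.append((gram, len(gram), j - i))
    return edges


def total_cost(edges, shift=0, weighted=False):
    """Sum |displacement - shift| over all edges, optionally times weight."""
    return sum((w if weighted else 1) * abs(d - shift) for _, w, d in edges)


def best_shift(edges, weighted=False):
    """Try each displacement as a global shift and keep the cheapest one."""
    return min((d for _, _, d in edges),
               key=lambda s: total_cost(edges, s, weighted))


edges = gram_edges("PETER_SMITH", "SMITPETTER", ["PET", "SMIT", "ER"])
print(edges)                                      # displacements 4, -6, 5
print(total_cost(edges))                          # 4 + |-6| + 5 = 15
print(best_shift(edges, weighted=True))           # 4
print(total_cost(edges, shift=4, weighted=True))  # 0*3 + 10*4 + 1*2 = 42
```

Trying only the observed displacements as candidate shifts is enough here because a sum of weighted absolute deviations is minimised at one of its data points.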
  16. Why is this relevant?
      • Unique capabilities result from fundamental approach
      • Closest to human intuition - “natural” paradigm
        • Translates to accuracy
      • Complete independence of domain - any sequences embedded into 1-space (think genetic sequences)
        • Does not care about data type, culture, language, character set, tokenization, fielding
      • Solid scientific footing guarantees robustness (linear behavior)
  17. Search compared to Learn
      Field      | Rec 1       | Rec 2          | Search field score
      First      | Jason       | Jasoz          | 0.90
      Last       | Fitzgerlad  | Fitzgerland    | 0.82
      DOB        | 12/1/1971   |                | -1
      Address    | 200 Classen | 2000 N Classon | 0.87
      City       | St. Paul    | Saint Paul     | 0.85
      State      | MN          | MN             | 1.0
      Height     | 5'10"       | 5-11           | 0.95
      Hair color | Brown       | Brawn          | 1.0
      etc.
      • TIBCO Patterns - Search: (0.80)
      • TIBCO Patterns - Learn: overall score / classification 0.93 - intelligent combination of field scores
  18. TIBCO Patterns - Learn
      • N input features F = (f1, f2, … fn)
        • Similarity score
        • Custom score (date)
        • Binary values: both records male/female
        • Other numeric input
      • Features can be missing (defaults, undefined, invalid): -1
        • Similarity problem is a different one depending on what information is present (if you only have a name and no address you look at the name differently!)
      • Conditional dependencies = hidden patterns in data
        • When ID matches closely, you are more generous in the address field
        • When (both records) female, totally different last name is acceptable (if first name is similar or …)
      • Thresholds, weights, patterns, …
      • Humans do it intuitively - such as recognizing a person
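To make the missing-feature convention concrete, here is a minimal Python sketch of a naive baseline that averages the field scores from slide 17 and skips any feature marked -1. It is only a point of comparison: neither Search nor Learn is claimed to work this way, the assignment of scores to fields is read off the slide order, and `naive_overall` is a hypothetical helper.

```python
MISSING = -1  # the slides mark an absent or invalid feature with -1

# Field scores from slide 17; the field-to-score mapping is an assumption.
field_scores = {
    "first": 0.90,
    "last": 0.82,
    "dob": MISSING,
    "address": 0.87,
    "city": 0.85,
    "state": 1.0,
    "height": 0.95,
    "hair_color": 1.0,
}


def naive_overall(scores):
    """Average the scores that are present; return MISSING if none are."""
    present = [s for s in scores.values() if s != MISSING]
    return sum(present) / len(present) if present else MISSING


print(round(naive_overall(field_scores), 3))
```

A flat average treats every field the same whatever else is present, which is exactly what the conditional dependencies on this slide argue against: a learned model can weigh the name differently when the address is missing.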
  19. TIBCO® Patterns - Learn - training
      • Pair selection for training
      • Human user is presented with a pair of records
      • Machine Learning Engine sees the numeric features and human answer
      • Engine updates model and tests its performance
      • Stop when model converges
      • Avoid overtraining
      [Figure: training loop - Initial Matching, Pair selection, Labeling (Domain Experts), Train, Test]
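The label, train, test and stop-on-convergence loop described above can be sketched with a tiny logistic regression in plain Python. This is a generic stand-in, not the Learn engine: the model type, the two features (name and address similarity), the labeled pairs, the learning rate and the tolerance are all assumptions made for illustration.

```python
import math


def predict(weights, features):
    """Probability that a record pair is a match, from a logistic model."""
    z = weights[0] + sum(w * f for w, f in zip(weights[1:], features))
    return 1.0 / (1.0 + math.exp(-z))


def log_loss(weights, labeled):
    """Mean negative log-likelihood of the human labels under the model."""
    total = 0.0
    for features, label in labeled:
        p = predict(weights, features)
        total -= math.log(p if label else 1.0 - p)
    return total / len(labeled)


def train(labeled, rate=1.0, tolerance=1e-4, max_epochs=2000):
    """Batch gradient descent; stop once the loss stops improving."""
    weights = [0.0] * (len(labeled[0][0]) + 1)
    previous = log_loss(weights, labeled)
    for _ in range(max_epochs):
        gradient = [0.0] * len(weights)
        for features, label in labeled:
            error = predict(weights, features) - label
            gradient[0] += error
            for k, f in enumerate(features):
                gradient[k + 1] += error * f
        weights = [w - rate * g / len(labeled)
                   for w, g in zip(weights, gradient)]
        current = log_loss(weights, labeled)
        if previous - current < tolerance:  # model has converged
            break
        previous = current
    return weights


def accuracy(weights, labeled):
    """Share of pairs where the model agrees with the human label."""
    hits = sum((predict(weights, f) >= 0.5) == bool(y) for f, y in labeled)
    return hits / len(labeled)


# Hypothetical labeled pairs: ([name_similarity, address_similarity], label).
training = [([0.95, 0.90], 1), ([0.90, 0.85], 1), ([0.85, 0.95], 1),
            ([0.30, 0.20], 0), ([0.10, 0.40], 0), ([0.25, 0.15], 0)]
held_out = [([0.92, 0.88], 1), ([0.22, 0.25], 0)]

weights = train(training)
print(accuracy(weights, held_out))
```

Scoring on pairs the model never saw during training is the simplest guard against the overtraining the slide warns about.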
  20. TIBCO Patterns - Learn - Deployment
      • Deploy model - incrementally train a model with new patterns
      • Add features to existing model and incrementally train
      • Select among multiple models at run time
      • Significant boost in accuracy
      • Need expert operators to coach during training
      • Set and forget - very robust
  21. TIBCO® Patterns - Learn - When to use it
      • Multiple patterns present
      • Many (short) fields
      • Sparse data
      • When you can't or don't want to build matching rules to deal with multiple parallel scenarios
        • e.g. Comparables matching: product data, similarity judged based on UPC code, or name and manufacturer, or description only
  22. California Department of Public Health - Prenatal Genetic Screening Program
      TIBCO® Patterns evaluation and implementation
      • CDPH benchmarked TIBCO and a competitor
        • After 3 weeks competitor reached 79% accuracy
        • After 1 day TIBCO topped 97%
      • Two phase project undertaken
        • First - cleaning the CDPH database (get clean)
          • 5.5 million record reference database of at risk women
          • 2.3 million duplicate records identified - representing 1 million unique women
          • 1.3 million duplicates eliminated - leaving reference database of 4.2 million unique women
        • Then - automate matching of incoming test results (stay clean)
          • Before TIBCO - 65% automated match rate
          • After TIBCO - 95%+ automated match rate
      • Overall results
        • Greatly improved levels of automation
        • Earlier identification/treatment/counseling for possible problems
        • Bottom line - better quality care for at risk women and their unborn
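The record counts on this slide are mutually consistent, which a short worked calculation shows. Figures are in millions and come from the slide; the reading that one surviving record is kept per woman is an assumption.

```latex
\begin{align*}
\text{duplicates eliminated} &= 2.3 - 1.0 = 1.3 \\
\text{unique women remaining} &= 5.5 - 1.3 = 4.2
\end{align*}
```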
  23. California Department of Public Health - Prenatal Genetic Screening Program
      [Figure: test results from contract labs and PDCs (labs and diagnostic centers) pass through the screening results ingest process, where TIBCO® Patterns matches them against the CDPH reference database.]
      • > threshold = match
      • < threshold = no match = new
      • >< thresholds = human review?
      Human review and action
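The three-way routing in this figure, which recurs in the later customer stories, amounts to comparing a match score with two thresholds. A minimal Python sketch follows; the threshold values and the `triage` name are hypothetical, since the slides do not state the values actually used.

```python
def triage(score, lower=0.60, upper=0.90):
    """Route an incoming record by its best match score.

    Above the upper threshold it is linked automatically, below the lower
    threshold it is treated as new, and anything in between goes to a
    human reviewer. Both threshold values are illustrative assumptions.
    """
    if score >= upper:
        return "match"
    if score < lower:
        return "new"
    return "human review"


for score in (0.97, 0.75, 0.20):
    print(score, triage(score))
```

Narrowing the band between the two thresholds raises the automated match rate at the cost of more automatic decisions made without review.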
  24. Typical customers and partners
  25. Architectural Overview
      TIBCO® Patterns
      • Server based engines: Search, Learn
      • Language/client interfaces: .NET, C/C++, Java, Python, ActiveMatrix, BusinessWorks, BusinessEvents, CIM
      • Supported OSes (32/64): Solaris, Linux, HPUX, AIX, Windows, VMS
  26. Typical Deployment
      [Figure: applications (front-end and/or back-end; current applications run unchanged) reach TIBCO® Patterns through the TIBCO client over TCP/IP sockets; a Loader / Syncer fills the TIBCO® Patterns tables from the database engine(s) (ActiveSpaces, Oracle, DB2, SQLServer, MySQL, Informix, Sybase, Postgres, Caché, …) and other data sources.]
      • Multiple instances with multiple tables
      • TCP/IP sockets
      • Thin client to marshal requests and return results
      • Partition and/or replicate data for scale
      • Loader/syncer for initial load and subsequent synchronization
  27. Identifying opportunities
      • Every one of your customers and prospects is a prospect!
      • Some questions to ask
        • What is the business impact (and cost) of not being able to deal effectively with imperfect/bad data?
        • Where do you have people (either your own or customers) spending a lot of time searching for the right data (about people, products, suppliers, etc.)?
        • How many people do you have matching records by hand?
        • What would it mean if you were to automate a higher percentage of the matching?
        • What is your current level of matching accuracy?
        • What do you do with the records that SQL can't match?
        • How do you deal with content differences between records that represent the same entity?
  28. Resources
      • The Princeton team
        • “Webex” sessions/demos, on-site meetings, EBC visits, POCs, custom demos, industry or customer specific materials… Anything we can do to help identify/develop/close TIBCO Patterns license revenue.
      • Live demos (the only vendor to do this, I wonder why…)
        • Demo index - http://www.netrics.com/demo_index/
        • English live demo (try the advanced search button) - http://netrics.com/demo/
        • Multilingual demo (try the surprise me button) - http://www.netrics.com/demo/index_foreign.php
        • Oil well head data - http://www.netrics.com/demo_energy_oil
        • Spanish names - http://www.netrics.com/demo_spanish_names
        • Portuguese university names - http://www.netrics.com/demo_universities/
        • FDA drug demo - http://netrics.com/demo_fda_drugs/
      • SalesCentral materials - https://salescentral.tibco.com/people/dchamberlain?view=documents
      • 2 minute explainer - http://www.netrics.com/demo/NetricsSPOT.html
      • TIBCommunity - https://ssl.tibcommunity.com/people/dchamberlain?view=documents
  29. Live demonstration of TIBCO® Patterns capabilities
      It's inherently very difficult to demonstrate an engine, and we wanted to show:
      1. The ability to deal effectively and efficiently with just about any type of structured data
      2. The ability to work with any “language”
      3. Very low latency when dealing with large data sets
  30. Live demonstration of capabilities
      TIBCO is the only vendor to feature live demonstrations. We show the ability to deal with any type of data in any language with very low latency.
      • English Demos
      • Multilingual Demos
      • FDA Drug Demo
  31. Differentiation
      • TIBCO innovations are unique in the market…
        • Mathematical modeling
          • Finding patterns in data - giving systems and people the data they need
          • Finding and learning from human decisions
        • Eliminates the need to guess complex matching rule sets, which are difficult to develop, maintain and update
        • Works equally effectively across multi-domain, multi-lingual data
        • Does not require a DBMS, but integrates nicely if needed
      • The results are unmatched…
        • Accuracy
        • Speed
        • Scalability
        • Easy to deploy, maintain and update
  32. Five things customers should consider
      • Accuracy - how close can the system come to reaching the same conclusions as a domain expert when faced with the same data?
      • Efficiency - how easily can the system deal with increasingly large volumes of data and workloads?
      • Entity and language independence - how does the system deal with data about any type of business entity in any language? Systems are global and need to deal with data about many entities other than customers and products.
      • Configurability - what options are provided to fine tune requests to easily achieve the desired results?
      • Ease of integration - how is the system integrated into existing applications, processes and tools? What native language support is provided? What ESB, SOA, BPM and CEP products are supported?
  33. TIBCO® Patterns Customer stories
  34. Types of things our customers do
      • Ensuring pre-natal genetic screening results are linked to the right woman
      • Automating the ingest of TV programming schedules from over 150 broadcast and cable operators
      • Helping customers find the right brand and the right model to program their remote controls
      • Ensuring compliance with government export/import regulations
      • Reducing turnaround time from 5 days to 4 hours when responding to customer requests for equipment
      • Helping UK government agencies collaborate to provide better care
      • Providing real-time linking and de-duplication across 700 million bibliographical and citation items
      • Linking patients and their visit records for outcome-driven healthcare
      • Creating the world's largest Biobank for genetic researchers
  35. California Department of Public Health - Prenatal Genetic Screening Program
      TIBCO® Patterns - evaluation and implementation
      • CDPH benchmarked the effectiveness of TIBCO and a competitor
        • Created a standardized data set, identified duplicates by hand, then…
        • After 3 weeks competitor reached 79% accuracy
        • After 1 day TIBCO topped 97%
      • Two phase project undertaken
        • First - cleaning the CDPH database (get clean)
          • 5.5 million record reference database of at risk women
          • 2.3 million duplicate records identified - representing 1 million unique women
          • 1.3 million duplicates eliminated - leaving reference database of 4.2 million unique women
        • Then - automate matching of incoming test results (stay clean)
          • Before TIBCO - 65% automated match rate
          • After TIBCO - 95%+ automated match rate
      • Results
        • Greatly improved levels of automation
  36. CDPH - linking test results to the right woman
      [Figure: test results from contract labs and Prenatal Diagnostic Centers (PDCs) pass through the genetic testing results ingest process, where TIBCO® Patterns matches them against 4.2 million records of at risk women.]
      • > threshold = link
      • < threshold = no link = new?
      • <> thresholds - human review
      Human review and action
  37. TV Guide - accurate, timely programming
      Providing accurate, informative program information for 100s of millions of customers
      • Different stations often describe the same show in different ways
      • TV Guide has built a reference dataset of 2M+ programs
      • Regular ingesting of future programming from hundreds of broadcast/cable operators
      • Millions of Web, print and channel guide customers
      • Information on over 12,0000 channel lineups provided to hundreds of cable/satellite operators and millions of Web visitors and print readers
      • TV Guide benchmarked several vendors - TIBCO selected for superior accuracy and automation
      • Results
        • Significant reduction in manual effort
        • More accurate guides
        • People now focus on enriching the data and enhancing customer experiences
  38. TV Guide - matching future programming to content database
      [Figure: programming from hundreds of cable, satellite and broadcast outlets (cable operators, broadcast channels, satellite providers) is matched and linked by TIBCO® Patterns to the 2+ million record content database.]
      • > Exists - link/enrich content
      • < New - add to content DB
      • <> Uncertain - human review
      Human review and action
      • Incoming data quality is highly variable
      • Content data on over 170,000 movies, a million plus TV series episodes and every TV show since 1960
  39. Logitech - increasing customer satisfaction by making it easier to find the right brand/model
      Harmony remotes feature activity-based control that makes getting to what you want to do as simple as pressing a button
      • Electronic brands/model number combinations are complex and hard to double transcribe
      • Customers were becoming frustrated…
      • Needed a way of suggesting the right brand/model even when customer entries were way off
      • TIBCO Patterns - Search is used “behind” the Web UI to find the closest matching models and suggest them to the consumer
  40. Logitech - finding the right model to program
      [Figure: customers programming their Harmony remotes enter a brand/model in the Web UI; TIBCO® Patterns finds the closest matching models among 300,000+ records about brands & models.]
      • > threshold = show closest matching models
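Suggesting the closest models for a mistyped entry can be sketched in Python with the standard library's `difflib.get_close_matches`. This is a stand-in for illustration, not TIBCO Patterns - Search; the catalog entries, the `suggest_models` helper and the cutoff value are hypothetical.

```python
from difflib import get_close_matches

# Hypothetical brand/model catalog; the real one holds 300,000+ records.
CATALOG = [
    "Sony KDL-40EX500",
    "Samsung UN46C6300",
    "Panasonic TC-P50G25",
    "Sony STR-DH520",
]


def suggest_models(entry, catalog=CATALOG, limit=3, cutoff=0.6):
    """Return up to `limit` catalog entries closest to what the user typed.

    Comparison is case-insensitive; entries scoring below `cutoff` are
    dropped, so an entry that resembles nothing yields an empty list.
    """
    by_key = {name.lower(): name for name in catalog}
    hits = get_close_matches(entry.lower(), list(by_key), n=limit, cutoff=cutoff)
    return [by_key[hit] for hit in hits]


print(suggest_models("sony kdl40ex500"))
```

The cutoff plays the role of the threshold in the figure: only candidates scoring above it are shown to the customer.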
  41. Customs & Excise Department of Hong Kong/PCCW
      TIBCO® Patterns - evaluation and implementation
      • Hong Kong Customs & Excise Dept required a matching engine as part of their initial specification of the ROCARS system
      • Need to check bills of lading and manifests against a series of white/black lists - in a mix of Simplified Chinese, Traditional Chinese and English
      • PCCW won the contract and started development work with another vendor's matching capability - and quickly realized it was not up to the job
      • In late 2008, PCCW Googled and found TIBCO® Patterns - after several weeks of discussions and demonstrations they started, and quickly finished, a POC
      • The ROCARS system went live in early 2010 with TIBCO® Patterns at the heart of the “risk engine”
      ROCARS System
  42. Customs & Excise Department - ensuring compliance with government regulations
      [Figure: ROCARS data entry sources (border crossings, Web, self service kiosks) submit Bills of Lading and Cargo Manifests, which TIBCO® Patterns checks against white lists and black lists.]
      • > threshold = suspicious
      • < threshold = OK
      • <> thresholds - human review
      Human review and action
  43. Sterilmed - automating quote process
      Sterilmed offers products and services to help healthcare providers lower their device and equipment costs
      • Hospitals send lists of up to 5,000 items of medical equipment they need, often in Excel spreadsheets
      • Specific line items need to be matched to the Sterilmed inventory system - required equipment would usually be described differently
      • Getting a quote for a request could take up to a week - several analysts matching line item by line item
      • Process of producing a quote has decreased to 4 hours, and is much more highly automated
      • Other uses now include matching across various healthcare industry databases and identifying duplicate contacts across systems
      • Results
        • More efficient and effective business processes
        • People now able to do their real job
  44. Sterilmed - reducing quote process from 5 days to 4 hours
      [Figure: equipment requests from healthcare providers (hospitals, clinics, other providers) are linked by TIBCO® Patterns to the Sterilmed device and equipment inventory.]
      • > threshold = link
      • < threshold = no link = new?
      • <> thresholds - human review
      Human review and action
  45. LiquidLogic - helping government organisations collaborate
      LiquidLogic provides the public sector with a platform enabling multi-agency solutions for collaborative working
      • UK Public Sector is moving towards collaboration between multiple organisations/agencies
      • The same client is often represented differently in several different databases - hampering collaboration, and with potentially dangerous or fatal outcomes
      • Need to model and enable real-time process and data sharing across multiple organisations
      • TIBCO® Patterns - Search provides the real-time duplicate identification, duplicate prevention and searching services across the Protocol platform
      • Installed at dozens of Public sector organisations, helping them provide better services to their clients and detect potential fraud
      PROTOCOL overview
  46. LiquidLogic - Protocol platform
      [Figure: multiple systems across multiple organisations (children services, domestic violence, other systems) feed TIBCO® Patterns, which provides a linked view of clients across multiple systems.]
      Providing a 360 degree view of clients
      • Duplicate identification
      • Duplicate prevention
      • Searching
      Human review and action
      • Allows a radical redesign of processes
      • Integrates with existing corporate applications to present a SOA
      • Manages multiple disparate data sources supporting the information sharing requirements of federated applications
  47. Los Alamos National Laboratory (LANL) - real-time linking of bibliographic and citation data
      Los Alamos National Laboratory is a premier national security research institution, working on advanced technologies to provide the United States with the best scientific and engineering solutions to many of the nation's most crucial challenges.
      • LANL Research Library locally hosts large data collections
        • A&I databases: ISI Citation Databases, Inspec, BIOSIS, Engineering Index, …
        • Full-text collections: Elsevier, Wiley, APS, IOP, …
      • Duplicates in LANL data collection
        • Amongst bibliographic records and citations
        • Between bibliographic records and citations
      • De-duplication, matching and linking needed
        • Join records from several databases that describe the same work
        • Find works that cite a given work
      • High volumes - >600 million citations and >65 million bibliographic items
      • High request rates - >25/second
      • Results look much better than those of batch de-duplication approach ~ TIBCO® Patterns + training by librarians
      • Can 'de-dup' external data against local data, no batch processing, but on-the-fly de-duplication
      • Possibility to retrain the system to optimize responses without data reprocessing: machine learning module
      • Scalability to accommodate growth of datasets
      LANL presentation on real-time matching
  48. LANL - systems matching like humans
      [Figure: local and remote A&I databases (PubMed, Inspec, Biosis) and local and remote full text databases (APS, Elsevier, Wiley) feed TIBCO® Patterns, which links 65 million bibliographical items and 700 million citation items in real time.]
      • Trained by domain experts (LANL librarians)
      • Clearer cut off between matches and non-matches
      • Never a need to re-index
      Questions answered:
      • Given a bibliographic key, which are the matching bibliographic records?
      • Given a citation key, which are the matching bibliographic records?
      • Given the identifier of a bibliographic record, what is the corresponding bibliographic key?
      • Given a bibliographic key, which are the citing bibliographic records?
      • Given a citation key, which are the citing bibliographic records?
      Forgiving to errors in datasets. Forgiving to errors in query. Compares like humans.
  49. Rush Health - moving to outcome driven care
      Rush Health is a clinically integrated network of providers working together to improve health through high quality, efficient health services covering the spectrum of patient care from wellness, prevention and health promotion, to disease management and complex care management.
      • Three major hospitals, 750 on-staff physicians and 50 allied health providers
      • Initiatives - both need highly accurate and automated matching
        • Moving to make diagnosis and treatment more proactive
        • Implementing a system where payments are based on the outcome of healthcare, not the amount prescribed
      • Patients will often visit different facilities and have their encounter and therapy details recorded differently, resulting in high duplicate record rates
      • First - cleaning the EMPI (get clean)
        • ~2 million record EMPI database was analyzed for duplicate patient records
        • ~1.6 million duplicate records identified
        • Leaving reference database of ~400,000 unique patients imported back into EMPI
      • Then - automate matching of incoming patient encounter records into their Enterprise Data Warehouse based EMPI (stay clean)
      • Results
        • Much higher level of automated matching - for an accurate and duplicate free EMPI
        • More accurate assessment of treatment outcomes and preventative care
  50. Rush Health - linking encounter records to the right patient
      [Figure: encounter records from hospital and doctor visits (hospitals, Dr offices, clinics) are linked by TIBCO® Patterns to the right patient record in the Enterprise Data Warehouse - EMPI.]
      • > threshold = link
      • < threshold = no link = new?
      • <> thresholds - human review?
      Human review and action
  51. 51. © 2008 TIBCO Software Inc. All Rights Reserved. Confidential and Proprietary. 51 California Department of Public Health – Building the world's largest Research Ready Biobank Development of a research-ready pregnancy and newborn biobank in California • California has been collecting samples at several key life events for many years; this could have been a tremendous potential source of research data • Because of the human involvement in manually linking records from different sources, only 5 or 6 requests for research sample data per year could be handled • NIH grant to Sequoia Foundation for the specimen tracking and linkage project awarded in late 2009 • Now linking ~200 million life event records from across multiple systems to form the research ready biobank • Results will be • A life course, client based system that enables cross-generational studies, population-based family studies, and women-level studies across multiple pregnancies • Cost-efficiently process requests, track specimens, conduct linkage and integration of new data • Process high volumes of specimen and data requests CDPH Research ready Biobank
  52. 52. © 2008 TIBCO Software Inc. All Rights Reserved. Confidential and Proprietary. 52 CDPH – linking 100s of millions of records. Sample data collected over 20+ years on significant life events (fetal deaths, live births, genetic screening, screened newborns, deaths). TIBCO® Patterns links records from multiple sources into the research ready biobank. • Above the upper threshold = link • Below the lower threshold = no link • Between the thresholds = human review. Research uses: • Descriptive epidemiologic studies on the birth prevalence of genetic disorders and seroprevalence of infectious agents • Analytic epidemiologic studies to determine the causes of birth defects, preterm birth and other disorders • Laboratory studies to develop and validate screening tests and other assays • Prevention and intervention studies to guide the design of screening models with maximum efficiency
  53. 53. © 2008 TIBCO Software Inc. All Rights Reserved. Confidential and Proprietary. TIBCO® Patterns for BusinessWorks™
  54. 54. © 2008 TIBCO Software Inc. All Rights Reserved. Confidential and Proprietary. 54 BusinessWorks™ Patterns Plugin Goals • Enable applications developed with BusinessWorks™ Designer to deal effectively with “imperfect” data • Provide access to the TIBCO® Patterns – Search capabilities without having to write any code • Use standard BusinessWorks™ Designer to design, test and initiate queries to the engine • Integrate TIBCO® Patterns – Search into the TIBCO product suite using standard TIBCO development and integration tools • Leverage existing BW components to provide access to TIBCO® Patterns – Search from custom and off the shelf applications • Provide dynamic query request configuration
  55. 55. © 2008 TIBCO Software Inc. All Rights Reserved. Confidential and Proprietary. 55 Architecture • Runs in the BusinessWorks™ extensible framework for integration • Uses the TIBCO® Patterns Java API • Consists of two jar files: TIBCO-Patterns-Java-Interface-4.5.1.jar and BWPatternsPlugIn_4.5.1.jar • Installed into <TIBCO-HOME> using the TIBCO Universal Installer
  56. 56. © 2008 TIBCO Software Inc. All Rights Reserved. Confidential and Proprietary. 56 BusinessWorks palettes The Patterns palette provides CRUD operations on in-memory tables. Data can be loaded from any external data source. The in-memory tables are used for matching in accordance with the matching criteria and query strings provided. The palette can leverage other BW activities and can be exposed as services for consumption by other enterprise consumers.
  57. 57. © 2008 TIBCO Software Inc. All Rights Reserved. Confidential and Proprietary. 57 Duplicate identification • Data loaded from multiple systems: multiple files about the same entity type • Load in common schema • Iterate through data record by record • Results above specific similarity score are duplicates • Take appropriate action – merge? link?
  58. 58. © 2008 TIBCO Software Inc. All Rights Reserved. Confidential and Proprietary. 58 Sample duplicate identification process
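The duplicate identification flow on slide 57 can be sketched in a few lines of Python. This is an illustration only: `difflib.SequenceMatcher` from the standard library stands in for the Patterns similarity score (it is not the Patterns algorithm), and the field names, the 0.8 threshold and the function names are assumptions.

```python
from difflib import SequenceMatcher
from itertools import combinations


def similarity(a: str, b: str) -> float:
    """Stand-in similarity score in [0, 1]; NOT the TIBCO Patterns algorithm."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def record_key(record: dict) -> str:
    """Flatten the fields used for comparison (hypothetical common schema)."""
    return " ".join(str(record[f]) for f in ("first", "last", "city"))


def find_duplicates(records: list[dict], threshold: float = 0.8):
    """Compare every pair of records; pairs scoring >= threshold are duplicates."""
    pairs = []
    for (i, a), (j, b) in combinations(enumerate(records), 2):
        score = similarity(record_key(a), record_key(b))
        if score >= threshold:
            pairs.append((i, j, round(score, 2)))
    return pairs


# Records from different systems, already loaded into one common schema.
records = [
    {"first": "Jon", "last": "Smith", "city": "Princeton"},
    {"first": "Jon", "last": "Smiht", "city": "Princeton"},
    {"first": "Mary", "last": "Jones", "city": "Trenton"},
]

if __name__ == "__main__":
    for i, j, score in find_duplicates(records):
        print(f"records {i} and {j} look like duplicates (score {score})")
```

The pairwise loop is quadratic in the number of records, which is fine for a sketch but would not scale to the table sizes quoted earlier in the deck; what to do with each pair (merge or link) is left to the caller, as on the slide.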
  59. 59. © 2008 TIBCO Software Inc. All Rights Reserved. Confidential and Proprietary. 59 Data augmentation • Reference data (for example, Experian life style data brick) • File with data for augmentation • Find most similar record(s) • Augmented with data from reference • Augmented data
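A minimal Python sketch of the augmentation flow above: for each incoming record, find the most similar reference record and copy extra fields across. The reference rows, the `segment` field, the 0.8 threshold and the function names are all hypothetical, and `difflib` again stands in for the Patterns similarity score.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Stand-in similarity score in [0, 1]; NOT the TIBCO Patterns algorithm."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def make_key(record: dict, key_fields) -> str:
    return " ".join(str(record.get(f, "")) for f in key_fields)


def augment(record, reference, key_fields, extra_fields, threshold=0.8):
    """Return a copy of record, enriched from its best reference match (if any)."""
    augmented = dict(record)
    key = make_key(record, key_fields)
    best, best_score = None, 0.0
    for ref in reference:
        score = similarity(key, make_key(ref, key_fields))
        if score > best_score:
            best, best_score = ref, score
    # Only copy data across when the best match is similar enough.
    if best is not None and best_score >= threshold:
        for field in extra_fields:
            augmented[field] = best.get(field)
    return augmented


# Hypothetical reference data; "segment" is an invented attribute.
REFERENCE = [
    {"name": "John Smith", "zip": "08540", "segment": "homeowner"},
    {"name": "Mary Jones", "zip": "08608", "segment": "renter"},
]

if __name__ == "__main__":
    incoming = {"name": "Jon Smith", "zip": "08540"}
    print(augment(incoming, REFERENCE, ("name", "zip"), ("segment",)))
```

Records with no sufficiently similar reference row are returned unchanged, so the caller can tell augmented from non-augmented output by checking for the extra fields.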
  60. 60. © 2008 TIBCO Software Inc. All Rights Reserved. Confidential and Proprietary. 60 Identifying overlap and uniqueness across multiple data sets • 2 or more files about the same entity type • Data loaded into multiple tables • Results in A or B or both: unique to A, unique to B, overlap
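One way to sketch the "unique to A, unique to B, overlap" split in Python. The sample rows, the 0.85 threshold and the `partition` name are assumptions, and `difflib` is a stand-in for the Patterns similarity score.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Stand-in similarity score in [0, 1]; NOT the TIBCO Patterns algorithm."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def partition(table_a: list[str], table_b: list[str], threshold: float = 0.85):
    """Split two tables into (unique_to_a, unique_to_b, overlap)."""
    matched_b = set()
    unique_a, overlap = [], []
    for a in table_a:
        best_j, best_score = None, 0.0
        for j, b in enumerate(table_b):
            score = similarity(a, b)
            if score > best_score:
                best_j, best_score = j, score
        if best_j is not None and best_score >= threshold:
            overlap.append((a, table_b[best_j]))
            matched_b.add(best_j)
        else:
            unique_a.append(a)
    # Anything in B that no record in A matched is unique to B.
    unique_b = [b for j, b in enumerate(table_b) if j not in matched_b]
    return unique_a, unique_b, overlap


TABLE_A = ["Jon Smith, Princeton NJ", "Ann Lee, Newark NJ"]
TABLE_B = ["John Smith, Princeton NJ", "Raj Patel, Edison NJ"]

if __name__ == "__main__":
    only_a, only_b, both = partition(TABLE_A, TABLE_B)
    print("unique to A:", only_a)
    print("unique to B:", only_b)
    print("overlap:", both)
```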
  61. 61. © 2008 TIBCO Software Inc. All Rights Reserved. Confidential and Proprietary. 61 Record classification • Table(s) loaded with keywords or phrases of interest • Incoming records for classification • Iterate through data record by record • Search for most similar keywords • Results are closest classification(s) • Take appropriate action based on classification
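A small Python sketch of classification by nearest keyword phrase, following the steps above. The phrases, the category labels, the 0.6 minimum score and the `classify` name are invented for illustration; `difflib` is a stand-in for the Patterns similarity score.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Stand-in similarity score in [0, 1]; NOT the TIBCO Patterns algorithm."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


# Hypothetical keyword table: phrase of interest -> classification.
KEYWORDS = {
    "refund request": "billing",
    "password reset": "account access",
    "late delivery": "shipping",
}


def classify(text: str, keywords: dict = KEYWORDS, min_score: float = 0.6):
    """Return the classification of the most similar phrase, or None."""
    best_phrase = max(keywords, key=lambda phrase: similarity(text, phrase))
    if similarity(text, best_phrase) < min_score:
        return None  # nothing close enough; leave the record unclassified
    return keywords[best_phrase]


if __name__ == "__main__":
    for incoming in ("pasword reset", "xyz"):
        print(repr(incoming), "->", classify(incoming))
```

Returning `None` below the minimum score keeps weak matches out of the automated path, so that "take appropriate action" can include routing unclassified records elsewhere.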
  62. 62. © 2008 TIBCO Software Inc. All Rights Reserved. Confidential and Proprietary. 62 Fuzzy parsing • Table(s) loaded with keywords or phrases of interest • Incoming records for parsing • Iterate through data record by record • Search for most similar keywords • Results are closest matching words and phrases • Take appropriate action
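Fuzzy parsing differs from classification in that it works token by token inside a record. A minimal Python sketch, assuming a hypothetical vocabulary of keywords with labels and using `difflib.get_close_matches` from the standard library in place of the Patterns engine:

```python
from difflib import get_close_matches

# Hypothetical keyword table: keyword -> label assigned to matching tokens.
VOCABULARY = {
    "street": "street_type",
    "avenue": "street_type",
    "boulevard": "street_type",
    "princeton": "city",
    "trenton": "city",
}


def fuzzy_parse(text: str, vocabulary: dict = VOCABULARY, cutoff: float = 0.8):
    """Return (token, closest keyword, label) for each token with a close match."""
    parsed = []
    for token in text.split():
        matches = get_close_matches(token.lower(), list(vocabulary), n=1, cutoff=cutoff)
        if matches:
            keyword = matches[0]
            parsed.append((token, keyword, vocabulary[keyword]))
    return parsed


if __name__ == "__main__":
    for item in fuzzy_parse("1030 Main Stret Princton"):
        print(item)
```

Tokens with no keyword above the cutoff (here the house number and street name) are simply skipped; a fuller parser would carry them through as unlabelled text.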
  63. 63. © 2008 TIBCO Software Inc. All Rights Reserved. Confidential and Proprietary. TIBCO® Patterns for BusinessEvents™
  64. 64. © 2008 TIBCO Software Inc. All Rights Reserved. Confidential and Proprietary. 64 Patterns plug-in for BusinessEvents • TIBCO Patterns Plugin for BusinessEvents available in 4.5.1 • Implemented as BE Studio Catalog Functions • Code samples available for all function calls
  65. 65. © 2008 TIBCO Software Inc. All Rights Reserved. Confidential and Proprietary. 65 TIBCO Patterns - Business Events Plugin
  66. 66. © 2008 TIBCO Software Inc. All Rights Reserved. Confidential and Proprietary. 66 TIBCO Patterns – Enable Catalog Function View Catalog Functions appear when the BE perspective is selected and a Rule Function is open.
  67. 67. © 2008 TIBCO Software Inc. All Rights Reserved. Confidential and Proprietary. 67 Correlating multiple records about the same entity. Before: multiple concepts about the same customer are loaded from the customer database(s). After: TIBCO Patterns correlates the records before loading, so a single concept for a customer is loaded.
  68. 68. © 2008 TIBCO Software Inc. All Rights Reserved. Confidential and Proprietary. 68 Are there existing similar events? Before: incoming events go from BE straight into the concept cache, so new concepts are added even when very similar concepts exist. After: TIBCO Patterns checks incoming events against the BusinessEvents concept cache first: do similar events already exist?
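The "check before adding" idea can be sketched in plain Python. This is not BusinessEvents code and does not use the plug-in's catalog functions; the `ConceptCache` class, its method name and the 0.85 threshold are hypothetical, and `difflib` stands in for the Patterns similarity score.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Stand-in similarity score in [0, 1]; NOT the TIBCO Patterns algorithm."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


class ConceptCache:
    """Toy cache that refuses to add a concept when a very similar one exists."""

    def __init__(self, threshold: float = 0.85):
        self.threshold = threshold
        self.concepts: list[str] = []

    def add_or_get(self, candidate: str) -> tuple[str, bool]:
        """Return (concept, created). created is False when an existing
        concept was similar enough to be reused."""
        best, best_score = None, 0.0
        for existing in self.concepts:
            score = similarity(existing, candidate)
            if score > best_score:
                best, best_score = existing, score
        if best is not None and best_score >= self.threshold:
            return best, False
        self.concepts.append(candidate)
        return candidate, True


if __name__ == "__main__":
    cache = ConceptCache()
    for event in ("Jon Smith, Princeton", "Jon Smiht, Princeton", "Mary Jones, Trenton"):
        print(event, "->", cache.add_or_get(event))
```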
  69. 69. © 2008 TIBCO Software Inc. All Rights Reserved. Confidential and Proprietary. 69 Finding patterns in the event cloud. Events from System A, System B and System C form the event cloud, which TIBCO Patterns examines: Fraud? Cyber attack? Money laundering? Other conclusions…
  70. 70. © 2008 TIBCO Software Inc. All Rights Reserved. Confidential and Proprietary. AMX BPM / Patterns Demo
  71. 71. © 2008 TIBCO Software Inc. All Rights Reserved. Confidential and Proprietary. 71 Synopsis Data is very rarely entered into a system (or worse, multiple systems) the same way. We use full names sometimes and nicknames at other times, we “fat-finger” keys, we reverse fields, we use maiden names or married names, and so on. This makes finding the exact data we want or need very difficult, and often impossible. Finding incorrect information, or not finding information at all, can be life threatening in some situations. In healthcare, for example, if we do not find information about a patient, we may not know about allergies or drug interactions. This demonstration of AMX BPM includes our Netrics Matching Engine technology, which allows a user to find a customer record without having all the information about that customer, and regardless of how that customer information may have been incorrectly entered into the system(s).
  72. 72. © 2008 TIBCO Software Inc. All Rights Reserved. Confidential and Proprietary. 72 Step 1: Loading the data into Patterns This BW process loads data from a source (in this case a CSV file) into Patterns, which is running as a Windows service. This data represents the most current customer information on file. This is the format of the data table created in the Matching Engine, as displayed by the Display Table option in BW.
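As a rough illustration of Step 1, the Python sketch below loads CSV rows into an in-memory table (a list of dictionaries). It does not use BusinessWorks or the Patterns loader; the column names are borrowed from the sample records at the start of the deck, the sample rows are invented, and `load_table` is a hypothetical name.

```python
import csv
import io

# Invented sample data; the demo's real CSV layout is not shown in the slides.
SAMPLE_CSV = """First,Last,Addr1,City,State,Zip
Jon,Smith,1030 Main St.,Princeton,NJ,08540
Mary,Jones,12 Oak Ave.,Trenton,NJ,08608
"""


def load_table(source) -> list[dict]:
    """Read CSV rows from a file-like object into a list of dicts.

    Values are kept as strings so that leading zeros (ZIP codes) survive.
    """
    reader = csv.DictReader(source)
    table = []
    for row in reader:
        # Skip overflow columns (key None) and normalise missing values.
        table.append({k: (v or "").strip() for k, v in row.items() if k is not None})
    return table


table = load_table(io.StringIO(SAMPLE_CSV))

if __name__ == "__main__":
    for record in table:
        print(record)
```

With a real file, `load_table(open(path, newline=""))` would take the place of the `io.StringIO` wrapper.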
  73. 73. © 2008 TIBCO Software Inc. All Rights Reserved. Confidential and Proprietary. 73 Step 2: Starting the Process – Enter Customer Information This is the AMX BPM Business Service (FormFlow) which starts the process. The form displayed on the first step of the process is where the “Customer” information to search for is entered.
  74. 74. © 2008 TIBCO Software Inc. All Rights Reserved. Confidential and Proprietary. 74 Step 3: Calling Patterns This is the AMX BPM Business Service (FormFlow) which starts the process. This BW process calls the Matching Engine with the search criteria entered in the first form. The results are mapped and passed back to the BPM process for display on the second form.
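What the Step 3 call does conceptually, searching with whatever criteria the user filled in and returning ranked candidates, can be sketched in Python. This is not the Patterns query API: the field names, the averaging of per-field scores, the 0.5 cut-off and the `search` name are assumptions, and `difflib` stands in for the Matching Engine's scoring.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Stand-in similarity score in [0, 1]; NOT the TIBCO Patterns algorithm."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def search(table: list[dict], criteria: dict, limit: int = 5, min_score: float = 0.5):
    """Rank records by average similarity over the fields the user supplied.

    Blank criteria are ignored, so a partial form still produces results.
    """
    supplied = {field: value for field, value in criteria.items() if value}
    if not supplied:
        return []
    results = []
    for record in table:
        total = sum(similarity(value, record.get(field, ""))
                    for field, value in supplied.items())
        score = total / len(supplied)
        if score >= min_score:
            results.append((round(score, 2), record))
    results.sort(key=lambda item: item[0], reverse=True)
    return results[:limit]


CUSTOMERS = [
    {"first": "John", "last": "Smith", "city": "Princeton"},
    {"first": "Mary", "last": "Jones", "city": "Trenton"},
    {"first": "Joan", "last": "Smythe", "city": "Princeton"},
]

if __name__ == "__main__":
    # Misspelt last name, nickname-style first name, city left blank.
    for score, record in search(CUSTOMERS, {"first": "Jon", "last": "Smiht", "city": ""}):
        print(score, record)
```

Returning a ranked list rather than a single hit matches Step 4 of the demo, where the user picks the most similar record or refines the search.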
  75. 75. © 2008 TIBCO Software Inc. All Rights Reserved. Confidential and Proprietary. 75 Step 4: Selecting the most similar records or refining the search The Results form is displayed on the second step of the process (after the call Matching Engine step) and shows the results of the initial search. Additional refined searches can also be run from this step. This entire process appears as a single form to the user (FormFlow). This is the AMX BPM Business Service (FormFlow) which starts the process.
  76. 76. © 2008 TIBCO Software Inc. All Rights Reserved. Confidential and Proprietary. 76 Step 5: Invoking the main process This is the AMX BPM Business Service (FormFlow) which starts the process. This is the main AMX BPM process that displays the selected customer record from the Business Service Process and allows for modifications to the data.
  77. 77. © 2008 TIBCO Software Inc. All Rights Reserved. Confidential and Proprietary. 77 Step 6: Viewing the selected customer record and modifying the data if necessary This form allows a user to edit the data returned from the Matching Engine. If a change is made, it will go to an approval step.
  78. 78. © 2008 TIBCO Software Inc. All Rights Reserved. Confidential and Proprietary. 78 Step 7: Approving the updated customer record This is the Approval form. It allows a user to approve (or reject) the update to the customer record made in the previous step.
  79. 79. © 2008 TIBCO Software Inc. All Rights Reserved. Confidential and Proprietary. 79 Step 8: Update back-end system This step checks the back-end system of record to see if the customer record exists, then performs an update if it does or an insert if it doesn't.
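The check-then-update-or-insert logic of Step 8 can be sketched with Python's standard `sqlite3` module. The in-memory database, the `customer` table and its columns, and the `upsert_customer` name are hypothetical stand-ins for the demo's back-end system of record.

```python
import sqlite3


def upsert_customer(conn: sqlite3.Connection, customer_id: int, name: str, city: str) -> str:
    """Update the customer if it exists, otherwise insert it.

    Returns "update" or "insert" so the caller can pick a notification path.
    """
    exists = conn.execute(
        "SELECT 1 FROM customer WHERE id = ?", (customer_id,)
    ).fetchone()
    if exists:
        conn.execute(
            "UPDATE customer SET name = ?, city = ? WHERE id = ?",
            (name, city, customer_id),
        )
        action = "update"
    else:
        conn.execute(
            "INSERT INTO customer (id, name, city) VALUES (?, ?, ?)",
            (customer_id, name, city),
        )
        action = "insert"
    conn.commit()
    return action


# Hypothetical back-end table, created in memory for the example.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")

if __name__ == "__main__":
    print(upsert_customer(conn, 1, "Jon Smith", "Princeton"))
    print(upsert_customer(conn, 1, "John Smith", "Princeton"))
```

Returning the action taken is one simple way to drive the choice between the two email notifications mentioned in Step 9.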
  80. 80. © 2008 TIBCO Software Inc. All Rights Reserved. Confidential and Proprietary. 80 Step 9: Notification of the update Once the process is complete, one of two email notifications will be sent, depending on the process path taken.
