Introduction to Anzo Unstructured

©2014 Cambridge Semantics Inc. All rights reserved.
June 29, 2016
Richard Mallah
Director of Unstructured and Advanced Analytics
richard@cambridgesemantics.com

Agenda
• Anzo Unstructured and the Anzo Smart Data Platform
• Core Capabilities of Anzo Unstructured
• Configuration, Operations, and Output
• Example Use Cases in Pharma and Finance
• Exploring Document-Derived Analytics
• Visualizing Additional Annotators and Capabilities

Introduction to Cambridge Semantics (CSI)
The Anzo Smart Data Platform is used to create data analytics and management solutions with diverse data from varied sources.
Company:
• Founded in 2007 by a senior team from IBM’s Advanced Internet Technology Group
• Privately funded
• Select customers:
Software:
• Market-leading Anzo software suite is built on open Semantic Web standards
• Third generation of the product is currently in production use

The Anzo Smart Data Platform
[Architecture diagram with three tiers: Applications, Middleware, and Enterprise Data Fabric]
Applications: Anzo for Excel, Anzo on the Web, Applications and BI Tools, Rich Client Apps
Client libraries: Anzo.js, Anzo.Net, Anzo .java/.Net
Middleware: Anzo Enterprise Server (SOA; OSGI, RDF & OWL over JMS) with Reasoning & Rules, Workflow, and Semantic Services
Connectors and components: Anzo Connect, Enterprise Directory Connect, Anzo Unstructured, Anzo Relational Replicator, Anzo Graph Database, Anzo Content Repository
Data sources: RDBMS, Data Mart/Warehouse, Enterprise Applications, Directory (LDAP, AD), 3rd Party Databases & Applications, External Data Sources, Unstructured Content
• Virtualize data using W3C semantic standards
• Operationalize industry standards, e.g., FIBO, LEI
• Real-time data events
• Granular security and access control
• Ontology, Mapping, Visualization & Service registries
• Full/Incremental ETL, Web Services, Federated SPARQL
• NLP, Text Analytics, Semantic Analysis
• RDBMS, Teradata, Hadoop, SalesForce

Anzo Smart Data Lake
[Architecture diagram with components: Anzo Smart Data Lake Server, Anzo Enterprise Server, Anzo Graph Query Engine, Anzo Ingestion Servers, Anzo Unstructured]
• Self-service analytics, visualization and data discovery
• Data curation, annotation and application workflow
• MPP graph query engine for interactive analytics at scale
• ODATA integration for 3rd party analytics tools
• Metadata, ontology and mapping catalog
• Model-driven data provisioning and loading
• Text analytics
• Canonical entity linking and transformation
• Scalable graph and document storage

Anzo Ontology Editor

What Solutions Benefit From Anzo?
• For aggregation of data from multiple, diverse data sources
• For integration of internal data with external data across the Web or firewalls
• For solutions involving data sources, business rules, analytics and actions that are not evident in advance
• For solutions that change often
• For analyzing diverse data sources with varied access control requirements and a need for full provenance and traceability
• For evolving solutions that benefit from ongoing involvement of domain experts to update data models, data sources, and analytics as needed
• For formal and informal day-to-day business activities that require workflow, alerts, and automation
• For collecting & analyzing data that doesn’t currently have any system of record (e.g. “shadow IT” systems)

Anzo Unstructured Capabilities
Overview
• Intake Sources
– Social Media
– Local Directories
– Enterprise CMSs
– Structured Databases
– Web Sites & Boards
– Spreadsheets
– Google Search Appliance
– Mail Servers
– + dozens more
• File Formats
– Office Documents
– PDFs
– Web Pages
– Email Messages
– + dozens more
• Multilingual
– European, Asian, and Middle Eastern Languages
– Native-Language Annotation
– Document Translation
– Annotation Translation
– Phonetic Name Normalization/Indexing
– Cross-Lingual Concepts Automapped
• Extraction Categories
– Entities
– Relationships
– Granular Sentiment
– Topic Classification
– Patterns and Concepts
– and more
• Concept Types Extracted
– MedicalHistoryAilment
– LegalStatuteSection
– BiomarkerForDisease
– AnalystEarningsEstimate
– JobTitle
– SentimentTopic
– + thousands more
– + easily user-extended/customized
• Semantic Analysis
– Concept-Based Relationships
– Relationship Compounding
– Annotation Harmonization
– Multi-NLP Weighting/Voting
– Ontology Growing
– Ontology Alignment
• Semantic Search
– Concept-Based Full-Text Search
– Facet On Concept or Type
– Mix Structured & Unstructured Filters
– Visualize Annotations In Context
– External Index Federation
– Multi-Stage Searching/Filtering/Clustering
• Structured/Unstructured Integration
– Find/link structured resources in text
– Analyze text within structured columns
– Populate new structured resources from text
– Auto-enrich entities found in unstructured
– Auto-extend schemas from unstructured properties

Anzo Unstructured NLP Plugins
Overview
• Anzo Unstructured is a pluggable framework that supports a large number of ready-made third-party NLP integrations and also bundles significant NLP capabilities of its own
– Plugins on the following pages are a small sample of our many supported NLP capabilities from a variety of sources
• Among the annotators included out of the box are:
– Autotagger and Classifier Annotator (Statistical, can fall back to rule-based)
– Autotagger and Classifier Annotator (Rule-Based, can fall back to statistical)
– Standard Entity Extractors (People, companies, locations, job titles, dates, etc.)
– Custom Knowledgebase Annotator (Leverage your taxonomies, thesauri, databases)
– Fuzzy Rule Network Annotator (Find concepts by related, surrounding, contextual concepts)
– Significant Phrase Annotator (Automatically extracts the important concepts)
– Document Section Annotator (Autogenerate table of contents and contextualize more)
– Pattern Annotators (Find part no., id no., statute section, or any custom pattern)
– Custom Relationship Annotator (Find events or relationships spanning different extractions)
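
As a rough illustration of what a pattern annotator does, the following Python sketch tags part numbers and statute sections with regular expressions. It is not the Anzo Unstructured implementation; the `Annotation` type, the `PATTERNS` table, and both pattern formats are assumptions made up for this example.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    """One extracted span (hypothetical structure, for illustration only)."""
    kind: str
    start: int
    end: int
    text: str


# Hypothetical patterns: two capital letters, a dash, and 4-6 digits for part
# numbers; "Section <number>" with an optional "(letter)" for statute sections.
PATTERNS = {
    "PartNumber": re.compile(r"\b[A-Z]{2}-\d{4,6}\b"),
    "StatuteSection": re.compile(r"\bSection\s+\d+(?:\([a-z]\))?"),
}


def annotate_patterns(text, patterns=PATTERNS):
    """Return every pattern match in the text, ordered by start offset."""
    found = []
    for kind, regex in patterns.items():
        for match in regex.finditer(text):
            found.append(Annotation(kind, match.start(), match.end(), match.group()))
    return sorted(found, key=lambda a: a.start)


if __name__ == "__main__":
    sample = "Part AB-12345 is governed by Section 10(b) of the Act."
    for annotation in annotate_patterns(sample):
        print(annotation)
```

Keeping the character offsets with each match is what would let a later step show the annotation in context or combine it with output from other annotators.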

Optional NLP Plugin Technology Partners

Semantic Post-Processing of NLP
• Harmonization
– Normalized formats for knowledge integration
• Cooperation
– Multiple annotators strengthen, correct, and increase the network effect of relationships
• Probabilistic Reasoning
– Semantic knowledge integration includes both deduction and inference
• Filtering
– The set of concepts, overlaps, affects, and relationships can be automatically filtered down to reduce noise
• Enrichment
– Web services, semantic services, internal and external databases and knowledgebases, and pluggable computations can be used to add more context and data to your new domain object
• Machine Learning and Predictive Analytics
– Train on a gold standard and perform supervised classification
– Incrementally build a conceptual cluster space for predictive analytics
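
The cooperation and filtering steps above can be pictured as weighted voting across annotators. The Python sketch below is a simplified illustration, not the product's algorithm: the annotator names, the weights, the 0.5 threshold, and the requirement that spans match exactly are all assumptions.

```python
from collections import defaultdict


def vote(annotations, weights, threshold=0.5):
    """Keep (start, end, label) spans whose weighted support meets the threshold.

    annotations: iterable of (annotator, start, end, label) tuples.
    weights: mapping of annotator name to a non-negative trust weight.
    Returns a dict mapping (start, end, label) to its normalized score.
    """
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive number")

    scores = defaultdict(float)
    counted = set()
    for annotator, start, end, label in annotations:
        key = (start, end, label)
        if (annotator, key) in counted:
            continue  # each annotator votes once per span and label
        counted.add((annotator, key))
        scores[key] += weights.get(annotator, 0.0)

    return {key: score / total
            for key, score in scores.items()
            if score / total >= threshold}


if __name__ == "__main__":
    # Hypothetical annotators and weights.
    weights = {"statistical": 0.5, "rules": 0.3, "knowledgebase": 0.2}
    annotations = [
        ("statistical", 0, 4, "Company"),
        ("knowledgebase", 0, 4, "Company"),
        ("rules", 0, 4, "Person"),
        ("rules", 10, 16, "Location"),
    ]
    print(vote(annotations, weights))
```

In this example only the "Company" reading of the first span survives, because two annotators agree on it; the single-vote "Person" and "Location" annotations fall below the threshold and are filtered out as noise.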

Point and Click Configuration of Unstructured Pipelines

Point and Click Configuration of Annotation

Unstructured Pipeline Operations Monitor

Dashboarding Structured/Unstructured Knowledge Integration
[Dashboard screenshot with callouts: Structured property; Multiple NLP technologies; Harmonized overlapping annotations; Enriched property; Unstructured entity; Unstructured relationship; Archived copy for review, validation & provenance (both HTML format & original)]

The CSI Semantic Knowledge Integration Approach to Enterprise Text Analytics
• Use Multiple NLP Engines or Annotators
• Leverage a Knowledge Integration Platform
– Make the annotators cooperate
– Enrich the annotations with internal or external data
– Link annotations with existing structured data
– Filter them down to the most relevant set
– Harmonize ontologies and instances
– Deal with probabilistic or uncertain information
• Quality Control
– Manual curation and automated QC
– Workflow, provenance lineage
• Easily Deal with Data Changes and Schema Changes
– Both are dealt with in real-time at runtime
– Maintenance is orders of magnitude more efficient

Use Cases in Pharma
• PV & Safety Data Management - Automatic tagging of case reports with customized curation workflow, text mining, and contextual search
• R&D Competitive Intelligence - Explore the competitive landscape for Therapeutic Area, Indication, Target, Company, Compound, & Partners
• R&D Informatics - Understand and correlate your internal research and how it may be related to any external developments or research
• Clinical Trial Site Selection and Optimization - Site selection, KOL search, trial planning
• Scientific Affairs/Medical Science Liaisons - Track Key Opinion Leaders (KOL) in literature and clinical trials & analyze feedback from medical professionals and patients
• Information Landscape - Track and monitor data stewardship and usage throughout the organization to drive more efficient usage
• Commercial Analytics - Sales and Marketing, Rx Data, Text Analytics

Use Cases in Financial Services
• Compliance Policy & Procedure Management - Monitor structured and unstructured data sources for relevant regulatory changes; have collaborative workflows for policy & documentation development, approval, and control; and establish targeted policy dissemination and attestation workflows
• Compliance Surveillance & Investigation - Legal and Compliance analysts can create structures and views that provide analysis, rules, and alert thresholds that investigators can easily change on the fly, so they can comprehend and interact with the big data picture
• Market and Customer Intelligence - Understand how clients and prospects are thinking about your firm and competitors’ offerings
• Research - Automated analytics of news, chatter, IMs, secondary research reports, emails, sentiment, etc. for research alerts, semantic search, and relationship visualization, forming an integrated intelligence platform for analysts, including Complex Event Processing
• Information Landscape - Track and monitor data stewardship and usage throughout the organization to drive more efficient usage
• Commercial Analytics - Sales and Marketing, Tx Data, Text Analytics

Relationship Explorer
Find Unexpected Connections Between Companies | Follow Paths Out or In From Anything | Follow the Money

Incremental Semantic Overlays: Product, Brand, Offering

Semantic Correspondence Linking and Overlay

Asking Cross-Ontology Questions

Cross-Ontology Questions Meet The Network Effect

Multi-Ontology Knowledge Graph Exploration

Deep News View
Customizable Fundamental, Technical, and Thematic Filters | View Only Most Recent n Minutes | Semantic Search

Rapid Concept Drilldown
GPS for Concepts | Assisted Skimming | Interactive Annotation-Driven Navigation | Auto-translates Foreign Languages

Example: Customizable Stock Centric Surveillance
Dashboards Per Stock | Per Cohort | Per Industry | Per Custom Sector | Analyst Can Define Filters and Drilldowns

Example: Competitor Sentiment Comparison
Longitudinal | Sentiment Aggregation | By Cohort | From Single Stock Selection | Visualize Leaders and Followers

Example: Intraday Sentiment
Drill Down | Intraday Topic-Granular Sentiment | Attribute Price Action Drivers | Investigate Unusual Volumes

Longitudinal and Outlier Business Intelligence
Unstructured Data Becomes Structured

CSI Web Scraper Annotator

Contextual Semantic Overlay

[Diagram: a Main Pipeline with a Purple Helper Pipeline and a Green Helper Pipeline, each shown with nodes I1, I2, I3, E1, E2, I4]

Fuzzy Concept Matching Example: Skills
Understanding and Recognition in Semantic Search

Fuzzy Concept Matching Example: Skills
Concept Curation
• Use Excel to define each skill concept with any combination of methods
• Multiple values are comma-separated
• Patterns support wildcards, y within n words of x, and intuitive groupings
• Define more atomic concepts before more compound concepts
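
To make the "y within n words of x" idea with wildcards concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the product's matcher: the function names, the tokenizer, and the way comma-separated cells are parsed are all hypothetical, and shell-style wildcards from the standard `fnmatch` module stand in for whatever pattern syntax the product uses.

```python
import re
from fnmatch import fnmatchcase


def tokenize(text):
    """Lowercase the text and split it into word tokens (keeps + and # for names like c++ or c#)."""
    return re.findall(r"[a-z0-9+#]+", text.lower())


def parse_values(cell):
    """Split a comma-separated spreadsheet cell into trimmed, lowercased values."""
    return [value.strip().lower() for value in cell.split(",") if value.strip()]


def within(tokens, x, y, n):
    """True if a token matching pattern y lies within n words of a token matching pattern x."""
    x_positions = [i for i, token in enumerate(tokens) if fnmatchcase(token, x)]
    y_positions = [i for i, token in enumerate(tokens) if fnmatchcase(token, y)]
    return any(i != j and abs(i - j) <= n for i in x_positions for j in y_positions)


def matches_concept(text, x_cell, y_cell, n):
    """True if any y pattern occurs within n words of any x pattern in the text."""
    tokens = tokenize(text)
    return any(within(tokens, x, y, n)
               for x in parse_values(x_cell)
               for y in parse_values(y_cell))


if __name__ == "__main__":
    # Hypothetical "Java development" skill: develop* or engineer* within 3 words of java.
    print(matches_concept("Developed several Java services", "java", "develop*, engineer*", 3))
    print(matches_concept("Java is an island; engineers built the bridge later",
                          "java", "develop*, engineer*", 3))
```

The first call matches because "developed" sits two words from "java"; the second does not because "engineers" is four words away. Defining atomic concepts first, as the slide advises, would let a compound concept reuse helpers like `within` on concepts already recognized.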

CSI Document Classifier

Indirect Filters on Domain-Specific Summaries
Auto-Summarization | Extensive Filters | Integration with Multiple Sources of News and Research | Assisted Reader

Cross-Lingual Annotation and Optional Translation

Multiple Languages, One Concept

In Situ Translation and Annotation

Automated Redaction

CSI Significant Phrase Annotator

CSI Custom Relationship Annotator

Linguamatics I2E Annotator, Biomarkers for Diseases

SciBite Termite Annotator

Lexalytics Salience

Simplified Views for Non-Technical Users
Semantic Search Made Easy

Anzo Unstructured Capabilities
APIs and SDK
Create new pipeline components for any of these tiers:
– Document Crawler / Listener
• Obtain documents of any format from any source
– Document Rich Text, Thumbnail, and Metadata Extraction
• Handle custom or less-common file formats through plug-ins
– Document Format Cleansing and Transformation
• Remove unwanted artifacts specific to your documents or translate to a particular format or language
– Full-Text Indexing
• Pluggable corpus-level indexing and search
– Annotator
• Already supports GATE, UIMA, and FrAU annotation frameworks
• Provides access to annotations from any other annotator, cleansed text, format-analyzed document, and original file, supporting mixed-representation annotation
• Thread-safe
– Semantic Postprocessor
• Recombine, filter, and restructure annotations
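
The tiered design above can be sketched as a chain of small stages that each take a document and hand it on. The Python below is only a conceptual sketch: it does not use or reproduce the actual Anzo Unstructured SDK, and every name in it (`Document`, `extract_text`, `cleanse`, `keyword_annotator`, `drop_duplicates`, `run_pipeline`) is hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, Iterator, List


@dataclass
class Document:
    """A document moving through the pipeline (hypothetical structure)."""
    uri: str
    raw: bytes
    text: str = ""
    annotations: List[Dict] = field(default_factory=list)


Stage = Callable[[Document], Document]


def extract_text(doc: Document) -> Document:
    """Extraction tier: turn the raw bytes into text (UTF-8 assumed here)."""
    doc.text = doc.raw.decode("utf-8", errors="replace")
    return doc


def cleanse(doc: Document) -> Document:
    """Cleansing tier: collapse runs of whitespace into single spaces."""
    doc.text = " ".join(doc.text.split())
    return doc


def keyword_annotator(keyword: str, label: str) -> Stage:
    """Annotator tier: build a stage that tags every occurrence of a keyword."""
    def annotate(doc: Document) -> Document:
        start = doc.text.find(keyword)
        while start != -1:
            doc.annotations.append(
                {"label": label, "start": start, "end": start + len(keyword)})
            start = doc.text.find(keyword, start + 1)
        return doc
    return annotate


def drop_duplicates(doc: Document) -> Document:
    """Postprocessor tier: filter out repeated annotations, keeping first-seen order."""
    seen = set()
    unique = []
    for annotation in doc.annotations:
        key = (annotation["label"], annotation["start"], annotation["end"])
        if key not in seen:
            seen.add(key)
            unique.append(annotation)
    doc.annotations = unique
    return doc


def run_pipeline(docs: Iterable[Document], stages: List[Stage]) -> Iterator[Document]:
    """Crawler output goes in; each document passes through every stage in order."""
    for doc in docs:
        for stage in stages:
            doc = stage(doc)
        yield doc


if __name__ == "__main__":
    docs = [Document("mem://1", b"Anzo   reads\nPDFs. Anzo links data.")]
    stages = [extract_text, cleanse, keyword_annotator("Anzo", "Product"), drop_duplicates]
    for result in run_pipeline(docs, stages):
        print(result.text, result.annotations)
```

Because every stage shares one signature, a custom crawler, format handler, or annotator could be swapped in without touching the others, which is the main benefit a pluggable tier structure offers.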
