“Making Mole-hills out of Mountains”
www.witnesstree.com
30-70% of Big Data is Unstructured
• Difficult to mine and analyze
• Ergo, largely ignored
• Represents a potential
undiscovered gold mine
• NEED: a seamless, structured
representation of unstructured
data
Text Analytics
• Software and transformational processes that
uncover business value in unstructured text
• Uses statistical, linguistic, machine learning, data
analysis and visualization techniques
• $2Bn market expected to grow @ 25% CAGR
WitnessTree Analytics
API
VISUALIZE
Structured Data
Unstructured Data
Data Information Knowledge
DISCOVER
REDUCE
ORGANIZE
WitnessTree: Text Analytics
Discover
Boost search accuracy
Reduce ambiguity
Contextual analysis
Reduce
Analyze relevant data
Identify & Define themes
Content + contextual similarity
Organize
Dynamic categories, Named-Entity (people,
places, brands, dates), Facets (metadata –
real and derived)
WT Semantic Analysis Machine (SAM)
Near Duplicate
Detector
Thread Analyzer Topic Explorer Search & Facet
API/web service API/web service API/web service API/web service
Client App/service
Semantic Analysis Machine
Named Entity
Extractor
API/web service
Unsupervised
Doc Clustering
API/web service
Theme Detector
API/web service
Started with
1,000,000
docs
draw associations
with no prior
knowledge of docs
Clustering
SET-UP Near-Dup De-dup
Reduce redundant docs by 40% to 60%
SET-UP
Smart Search
Categories
Clustering
“on the fly”
Refine Search
Found 10,000 docs
the Few,
the Relevant
WitnessTree hosted solution for legal eDiscovery
How to e-discover 10,000 from 1M?
“Find the Relevant. With intuitive ease.”
chains
near-dups
removes
duplicates
Labeled
cluster tree
600k
unique docs
create
“categories” of
search results
dynamic clustering
on categories
concept, example,
similarity, paragraph,
boolean, proximity,
fuzzy
Topic detection
Email
threading
Recreates email
threads +
identifies missing &
inclusive emails
Extracts
themes from
clusters
Backend
SaaS Hosted Licensed
Application Platforms / Development Tools
Presentation Technologies
Operating Systems
Integration Services
WitnessTree Technology Stack
Topic Explorer
• Discover concepts.
• Cross-reference ideas.
• Connect the dots.
• Build relevant queries.
• Get results.
INSTANTLY!!!
(Un)supervised Doc Clustering
• Clusters related documents
Hierarchical clustering
• Labels each cluster
• User-guided,
system-generated
Guided flexibility!!!
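To make the clustering bullets above concrete, here is a minimal sketch of hierarchical (agglomerative) clustering with automatic cluster labels. It is an illustration only, not WitnessTree's implementation: the function names, the word-set Jaccard similarity, the single-link merge rule, and the threshold value are all assumptions chosen for brevity.

```python
from collections import Counter


def jaccard(a, b):
    """Jaccard similarity of two sets; two empty sets count as identical."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def agglomerative(docs, threshold=0.2):
    """Single-link agglomerative clustering (hypothetical helper).

    Starts with one cluster per document and repeatedly merges the two
    most similar clusters until no pair reaches the threshold.
    """
    bags = [set(d.lower().split()) for d in docs]
    clusters = [[i] for i in range(len(docs))]

    def link(c1, c2):
        # Single link: similarity of the closest pair of members.
        return max(jaccard(bags[i], bags[j]) for i in c1 for j in c2)

    while len(clusters) > 1:
        best, (a, b) = max(
            ((link(clusters[a], clusters[b]), (a, b))
             for a in range(len(clusters))
             for b in range(a + 1, len(clusters))),
            key=lambda item: item[0],
        )
        if best < threshold:
            break
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters


def label(cluster, docs, n=2):
    """Label a cluster with the n words that occur in the most member docs."""
    counts = Counter(w for i in cluster for w in set(docs[i].lower().split()))
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [word for word, _ in ranked[:n]]
```

A user-guided variant could seed the starting clusters or adjust the threshold; the sketch covers only the system-generated side.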
• Re-construct email threads
• Identify Inclusive emails
• Find Missing/Deleted emails
Email Thread Analyzer
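The three thread-analysis steps above can be sketched roughly as follows. This is a simplified illustration, not the product's method: it assumes each email is a dict with a hypothetical `id` field and an optional `in_reply_to` field (mirroring the standard Message-ID and In-Reply-To headers), and it treats the last message of each branch as a candidate inclusive email, which holds only if every reply quotes the full text before it.

```python
def analyze_threads(emails):
    """Rebuild reply chains and flag missing and candidate inclusive emails.

    emails: list of dicts with 'id' and an optional 'in_reply_to' key
    (hypothetical field names used for illustration).
    """
    known = {e["id"] for e in emails}

    # Map each parent id to the ids of its direct replies.
    children = {}
    for e in emails:
        parent = e.get("in_reply_to")
        if parent is not None:
            children.setdefault(parent, []).append(e["id"])

    # Missing/deleted: referenced as a parent but absent from the collection.
    missing = sorted(set(children) - known)

    # Candidate inclusive emails: messages nobody replied to (branch ends).
    inclusive = sorted(i for i in known if i not in children)

    # Thread roots: messages with no parent, plus missing parents.
    roots = sorted(
        [e["id"] for e in emails if e.get("in_reply_to") is None] + missing
    )
    return {
        "roots": roots,
        "children": children,
        "missing": missing,
        "inclusive": inclusive,
    }
```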
Near-Duplicate Detection
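One common way to detect near-duplicates is to compare overlapping word sequences (shingles) using Jaccard similarity. The sketch below assumes that technique for illustration; the deck does not say which method WitnessTree uses, and the function names, shingle size, and threshold are hypothetical.

```python
def shingles(text, k=3):
    """Return the set of k-word shingles in a lowercased text."""
    words = text.lower().split()
    if len(words) < k:
        return {tuple(words)}
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets; two empty sets count as identical."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def near_duplicate_groups(docs, threshold=0.8, k=3):
    """Greedily group documents by similarity to each group's first member."""
    signatures = [shingles(d, k) for d in docs]
    groups = []  # each group is a list of document indices
    for i, sig in enumerate(signatures):
        for group in groups:
            if jaccard(sig, signatures[group[0]]) >= threshold:
                group.append(i)
                break
        else:
            groups.append([i])
    return groups
```

Keeping one document per group is one way to reach the kind of volume reduction the deck describes. Comparing every pair does not scale to a million documents, so a production system would likely need an approximate method such as MinHash.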
Theme Detection
• Detects recurring themes
• Filters based on relevancy
ranking
• Search Wide, Dig Deep
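As a rough illustration of detecting recurring themes and filtering by a relevancy ranking, the sketch below ranks terms by the number of documents they appear in. Document frequency as the ranking signal, the tiny stopword list, and the function name are assumptions made for this example, not the product's actual approach.

```python
import re
from collections import Counter

# Tiny illustrative stopword list; a real system would use a fuller one.
STOPWORDS = {"a", "an", "and", "of", "the", "to", "was", "is", "in"}


def recurring_themes(docs, min_docs=2, top_n=5):
    """Return up to top_n (term, document_count) pairs, most frequent first.

    A term counts as recurring if it appears in at least min_docs documents.
    """
    doc_freq = Counter()
    for doc in docs:
        terms = set(re.findall(r"[a-z]+", doc.lower())) - STOPWORDS
        doc_freq.update(terms)
    ranked = [(t, n) for t, n in doc_freq.items() if n >= min_docs]
    ranked.sort(key=lambda item: (-item[1], item[0]))
    return ranked[:top_n]
```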
Named Entity Recognition
Identifies:
• People
• Places
• Companies
• Time/Date
• Monetary
Crew members on the ISS will open the hatch
Monday and unload 2,780 pounds of supplies
and experiments, the news release said.
"From the men and women involved in the
design, integration and test, to those who
launched the Antares (rocket) and operated the
Cygnus, our whole team," said David W.
Thompson, president and chief executive
officer of Orbital, in a written statement from
the company.
It will burn up during re-entry over the Pacific
Ocean, officials said.
Orbital has a $1.9 billion contract with NASA to
make eight flights to the space station under
the space agency's commercial supply
program.
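Two of the entity types listed above, Monetary and Time/Date, can be approximated with simple patterns, as in this minimal sketch applied to the sample passage. It is a rule-based illustration under stated assumptions (hypothetical pattern and function names, weekday names only for dates). Recognizing people, places, and companies generally needs a trained model or gazetteer rather than regular expressions.

```python
import re

# Dollar amounts such as "$1.9 billion" or "$2,500".
MONEY = re.compile(
    r"\$\d[\d,]*(?:\.\d+)?(?:\s(?:thousand|million|billion|trillion))?"
)
# Weekday names only; full date parsing is out of scope for this sketch.
WEEKDAY = re.compile(
    r"\b(?:Monday|Tuesday|Wednesday|Thursday|Friday|Saturday|Sunday)\b"
)


def extract_entities(text):
    """Return the monetary and weekday mentions found in the text."""
    return {
        "Monetary": MONEY.findall(text),
        "Time/Date": WEEKDAY.findall(text),
    }


sample = (
    "Crew members on the ISS will open the hatch Monday and unload "
    "2,780 pounds of supplies. Orbital has a $1.9 billion contract with NASA."
)
print(extract_entities(sample))
```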
Our Differentiators
• Structured and unstructured (text) data
• API or web application
Analytics Framework
• Minimal training required.
• Web browser + internet connection
Easy to Use
• Hosted model, SaaS, Licensed in-house
Flexibility
• Document classification, visualization, categorization,
API
Versatility
• State-of-the-art feature set, in place
Rich Feature-set
• OEM, white-label, reseller
Partnership Models

Witness tree text analysis
