Witness tree text analysis


text analytics for document discovery

Published in: Data & Analytics

  1. "Making Mole-hills out of Mountains"
  2. 30-70% of big data is unstructured
      • Difficult to mine and analyze
      • Ergo, largely ignored
      • Represents an undiscovered potential gold mine
      • Need: a seamless, structured representation of unstructured data
  3. Text Analytics
      • Software and transformational processes that uncover business value in unstructured text
      • Uses statistical, linguistic, machine learning, data analysis and visualization techniques
      • A $2 billion market, expected to grow at a 25% CAGR
  4. WitnessTree Analytics API (diagram): structured and unstructured data move from data to information to knowledge through four stages: Discover, Reduce, Organize, Visualize.
  5. WitnessTree: Text Analytics
      • Discover: boost search accuracy, reduce ambiguity, contextual analysis
      • Reduce: analyze relevant data, identify and define themes, content + contextual similarity
      • Organize: dynamic categories, named entities (people, places, brands, dates), facets (real and derived metadata)
  6. WT Semantic Analysis Machine (SAM)
      A client app or service calls the Semantic Analysis Machine, whose components are each exposed as an API/web service:
      • Near-Duplicate Detector
      • Thread Analyzer
      • Topic Explorer
      • Search & Facet
      • Named Entity Extractor
      • Unsupervised Doc Clustering
      • Theme Detector
  7. WitnessTree hosted solution for legal eDiscovery: how do you e-discover 10,000 documents from 1M? "Find the relevant. With intuitive ease."
      • Start: 1,000,000 documents
      • Set-up:
          - Clustering: draws associations with no prior knowledge of the documents and builds a labeled cluster tree
          - Near-dup: chains near-duplicates
          - De-dup: removes duplicates, reducing redundant documents by 40% to 60% and leaving 600k unique documents
      • Smart search: concept, example, similarity, paragraph, Boolean, proximity, fuzzy
      • Categories: clustering "on the fly" creates categories of search results, with dynamic clustering on categories
      • Refine search
      • Result: 10,000 documents found. The few, the relevant.
      • Also shown: topic detection (extracts themes from clusters) and email threading (recreates email threads and identifies missing and inclusive emails)
  8. WitnessTree Technology Stack
      • Layers: Backend; Application Platforms / Development Tools; Presentation Technologies; Operating Systems; Integration Services
      • Delivery: SaaS, hosted, licensed
  9. Topic Explorer
      • Discover concepts.
      • Cross-reference ideas.
      • Connect the dots.
      • Build relevant queries.
      • Get results, instantly.
  10. (Un)supervised Document Clustering
      • Hierarchical clustering groups related documents
      • Labels each cluster
      • Guided flexibility: clusters are system-generated and user-guided
  11. Email Thread Analyzer
      • Reconstruct email threads
      • Identify inclusive emails
      • Find missing/deleted emails
  12. Near-Duplicate Detection
  13. Theme Detection
      • Detects recurring themes
      • Filters by relevancy ranking
      • Search wide, dig deep
  14. Named Entity Recognition
      Identifies:
      • People
      • Places
      • Companies
      • Time/Date
      • Monetary amounts
      Sample text: Crew members on the ISS will open the hatch Monday and unload 2,780 pounds of supplies and experiments, the news release said. "From the men and women involved in the design, integration and test, to those who launched the Antares (rocket) and operated the Cygnus, our whole team," said David W. Thompson, president and chief executive officer of Orbital, in a written statement from the company. It will burn up during re-entry over the Pacific Ocean, officials said. Orbital has a $1.9 billion contract with NASA to make eight flights to the space station under the space agency's commercial supply program.
  15. Our Differentiators
      • Analytics framework: structured and unstructured (text) data; API or web application
      • Ease of use: minimal training required; only a web browser and an internet connection needed
      • Flexibility: hosted model, SaaS, or licensed in-house
      • Versatility: document classification, visualization, categorization, API
      • Rich feature set: state-of-the-art features, already in place
      • Partnership models: OEM, white-label, reseller
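Slides 5 and 6 list facets over "real and derived" metadata and a Search & Facet service, but do not say how facets are computed. The following is a minimal sketch, assuming each search result is a plain dict of metadata fields; the field names (`custodian`, `people`) and the functions `facet_counts` and `filter_by_facet` are hypothetical and are not the WitnessTree API.

```python
from collections import Counter, defaultdict
from typing import Any, Iterable

# Hypothetical record shape: each document is a dict of metadata fields.
# "Real" fields (e.g. custodian) come with the document; "derived" fields
# (e.g. extracted people) are assumed to be filled in by earlier analysis.
Doc = dict[str, Any]

MULTI = (list, set, tuple)

def facet_counts(results: Iterable[Doc], fields: list[str]) -> dict[str, Counter]:
    """Count how many result documents carry each value of each facet field."""
    counts: dict[str, Counter] = defaultdict(Counter)
    for doc in results:
        for name in fields:
            value = doc.get(name)
            if value is None:
                continue
            # A multi-valued facet counts each distinct value once per document.
            values = set(value) if isinstance(value, MULTI) else {value}
            counts[name].update(values)
    return dict(counts)

def filter_by_facet(results: Iterable[Doc], name: str, value: Any) -> list[Doc]:
    """Narrow a result set to documents whose facet `name` equals or contains `value`."""
    selected = []
    for doc in results:
        v = doc.get(name)
        if v == value or (isinstance(v, MULTI) and value in v):
            selected.append(doc)
    return selected
```

In an interactive search UI, a click on a facet value would call `filter_by_facet`, and `facet_counts` would then be recomputed over the narrowed result set to update the counts shown beside each value.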
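Slide 7's de-dup step "removes duplicates" and cuts 1M documents to 600k unique ones, but the deck does not describe the method. A common approach to exact de-duplication is to hash normalized content and keep one document per hash. The sketch below assumes that approach, with lowercasing and whitespace collapsing as the (hypothetical) normalization and SHA-256 as the hash; it is an illustration, not WitnessTree's implementation.

```python
import hashlib
import re

def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial formatting differences are ignored."""
    return re.sub(r"\s+", " ", text).strip().lower()

def content_hash(text: str) -> str:
    """SHA-256 digest of the normalized text."""
    return hashlib.sha256(normalize(text).encode("utf-8")).hexdigest()

def dedupe(docs: dict[str, str]) -> tuple[dict[str, str], dict[str, list[str]]]:
    """Keep the first document seen for each distinct hash.

    Returns (unique documents, map from each kept ID to the IDs dropped as its duplicates).
    """
    unique: dict[str, str] = {}
    kept_id_for_hash: dict[str, str] = {}
    duplicates: dict[str, list[str]] = {}
    for doc_id, text in docs.items():
        h = content_hash(text)
        if h in kept_id_for_hash:
            duplicates.setdefault(kept_id_for_hash[h], []).append(doc_id)
        else:
            kept_id_for_hash[h] = doc_id
            unique[doc_id] = text
    return unique, duplicates
```

Keeping 600k of 1M documents is a 40% reduction, the low end of the slide's 40% to 60% range. Exact hashing only catches identical copies; near-duplicates need a similarity measure instead (see slide 12).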
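Slide 9 says the Topic Explorer cross-references ideas and helps build relevant queries. One simple way to suggest related concepts is document-level term co-occurrence. The sketch below assumes that technique; the stopword list, the `related_terms` and `expand_query` helpers, and the OR-query syntax are all hypothetical.

```python
import re
from collections import Counter
from itertools import combinations

TOKEN = re.compile(r"[a-z]+")
# Hypothetical, minimal stopword list for illustration.
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "on", "for", "with", "is", "will"}

def terms(text: str) -> set[str]:
    """Distinct content words in a document."""
    return {t for t in TOKEN.findall(text.lower()) if t not in STOPWORDS and len(t) > 2}

def cooccurrence(docs: list[str]) -> Counter:
    """For each unordered term pair, count the documents containing both terms."""
    pairs: Counter = Counter()
    for doc in docs:
        for a, b in combinations(sorted(terms(doc)), 2):
            pairs[(a, b)] += 1
    return pairs

def related_terms(term: str, pairs: Counter, k: int = 5) -> list[tuple[str, int]]:
    """Terms that most often appear in the same documents as `term`."""
    scores: Counter = Counter()
    for (a, b), n in pairs.items():
        if a == term:
            scores[b] += n
        elif b == term:
            scores[a] += n
    return scores.most_common(k)

def expand_query(term: str, pairs: Counter, k: int = 3) -> str:
    """Build an OR query from a seed term and its top co-occurring terms."""
    return " OR ".join([term] + [t for t, _ in related_terms(term, pairs, k)])
```

Raw co-occurrence counts favor very common words; a production system would more likely weight pairs (for example by pointwise mutual information), but plain counts keep the idea visible.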
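Slide 10 describes hierarchical clustering that labels each cluster. The deck does not name an algorithm, so the following is a sketch of one standard choice: average-linkage agglomerative clustering over TF-IDF vectors with cosine similarity, labeling each cluster by its highest-weighted terms. The stopping `threshold` and the label size `k` are hypothetical parameters.

```python
import math
import re
from collections import Counter

def tokenize(text: str) -> list[str]:
    return [t for t in re.findall(r"[a-z]+", text.lower()) if len(t) > 2]

def tfidf_vectors(docs: list[str]) -> list[dict[str, float]]:
    """Unit-length TF-IDF vectors (smoothed IDF) stored as sparse dicts."""
    tfs = [Counter(tokenize(d)) for d in docs]
    n = len(docs)
    df = Counter(term for tf in tfs for term in tf)  # document frequency
    vectors = []
    for tf in tfs:
        vec = {t: c * (math.log((1 + n) / (1 + df[t])) + 1) for t, c in tf.items()}
        norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
        vectors.append({t: w / norm for t, w in vec.items()})
    return vectors

def cosine(u: dict[str, float], v: dict[str, float]) -> float:
    """Dot product of unit vectors, i.e. cosine similarity."""
    if len(u) > len(v):
        u, v = v, u
    return sum(w * v.get(t, 0.0) for t, w in u.items())

def agglomerative(vectors: list[dict[str, float]], threshold: float = 0.2):
    """Repeatedly merge the pair of clusters with the highest average pairwise
    similarity until no pair reaches `threshold`.

    Returns (clusters, merges); `merges` records the tree as (left, right, similarity).
    """
    clusters = [[i] for i in range(len(vectors))]
    merges = []
    while len(clusters) > 1:
        best, pair = -1.0, None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                sims = [cosine(vectors[i], vectors[j]) for i in clusters[a] for j in clusters[b]]
                avg = sum(sims) / len(sims)
                if avg > best:
                    best, pair = avg, (a, b)
        if best < threshold:
            break
        a, b = pair
        merges.append((clusters[a], clusters[b], best))
        merged = clusters[a] + clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)] + [merged]
    return clusters, merges

def label(cluster: list[int], vectors: list[dict[str, float]], k: int = 2) -> list[str]:
    """Label a cluster with the terms carrying the most total TF-IDF weight."""
    weights: Counter = Counter()
    for i in cluster:
        weights.update(vectors[i])
    return [t for t, _ in weights.most_common(k)]
```

This naive version recomputes all pairwise similarities at every merge, which is fine for a handful of documents but far too slow for the slide 7 corpus; it is meant to show the technique, not to scale.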
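Slide 11 lists three thread-analysis tasks without describing how they work. A minimal sketch follows, assuming threads are rebuilt from Message-ID / In-Reply-To header links. A referenced parent that is absent from the collection is flagged as missing, and leaf messages (no replies in the collection) are treated as candidate inclusive emails. That leaf rule is a simplifying assumption: real inclusive-email detection would also compare quoted content. The `Email` class and the function names are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional

@dataclass
class Email:
    message_id: str
    in_reply_to: Optional[str] = None  # parent Message-ID, if any

def build_threads(emails: list[Email]):
    """Link replies to parents. Returns (root IDs, children map, missing parent IDs)."""
    by_id = {e.message_id: e for e in emails}
    children: dict[str, list[str]] = defaultdict(list)
    missing: set[str] = set()
    roots: list[str] = []
    for e in emails:
        parent = e.in_reply_to
        if parent is None:
            roots.append(e.message_id)
        else:
            if parent not in by_id:
                # A reply references a message we don't hold: missing or deleted.
                missing.add(parent)
            children[parent].append(e.message_id)
    roots.extend(sorted(missing))  # missing parents act as placeholder roots
    return roots, dict(children), missing

def inclusive_candidates(emails: list[Email], children: dict[str, list[str]]) -> list[str]:
    """Leaf messages: the last email in each branch of a thread."""
    return [e.message_id for e in emails if not children.get(e.message_id)]

def thread_members(root: str, children: dict[str, list[str]]) -> list[str]:
    """Depth-first listing of a thread, replies in arrival order."""
    out, stack = [], [root]
    while stack:
        node = stack.pop()
        out.append(node)
        stack.extend(reversed(children.get(node, [])))
    return out
```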
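Slide 12 names near-duplicate detection, and slide 7 says the system "chains near-dups". A standard technique, assumed here rather than confirmed by the deck, is w-shingling with Jaccard similarity. Union-find then chains pairs into groups, so that if A is near B and B is near C, all three land together. The shingle width `w` and the `threshold` default are hypothetical choices.

```python
import re
from itertools import combinations

def shingles(text: str, w: int = 3) -> set[tuple[str, ...]]:
    """Set of overlapping w-word sequences (w-shingles)."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < w:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + w]) for i in range(len(words) - w + 1)}

def jaccard(a: set, b: set) -> float:
    """|A intersect B| / |A union B|; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

def near_duplicate_groups(docs: dict[str, str], threshold: float = 0.8, w: int = 3) -> list[set[str]]:
    """Group documents whose shingle sets reach `threshold` Jaccard similarity,
    chaining transitive matches with union-find."""
    parent = {d: d for d in docs}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    sh = {d: shingles(t, w) for d, t in docs.items()}
    for a, b in combinations(docs, 2):
        if jaccard(sh[a], sh[b]) >= threshold:
            parent[find(a)] = find(b)

    groups: dict[str, set[str]] = {}
    for d in docs:
        groups.setdefault(find(d), set()).add(d)
    return [g for g in groups.values() if len(g) > 1]
```

The all-pairs loop is quadratic in the number of documents; at the 1M-document scale of slide 7 one would typically estimate Jaccard similarity with MinHash and find candidate pairs with locality-sensitive hashing instead.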
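Slide 13 describes theme detection that finds recurring themes and filters them by relevancy ranking. The following is a hypothetical sketch of that combination. It assumes a theme is a word bigram that recurs in at least `min_docs` documents, and that each document carries a relevance score (for example, from search). Themes are ranked by the summed relevance of the documents containing them.

```python
import re
from collections import defaultdict

# Hypothetical, minimal stopword list for illustration.
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "on", "for", "with", "is", "was", "by"}

def bigrams(text: str) -> set[tuple[str, str]]:
    """Distinct adjacent word pairs; stopwords are dropped first, so pairs can span them."""
    words = [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]
    return {(words[i], words[i + 1]) for i in range(len(words) - 1)}

def detect_themes(scored_docs: list[tuple[str, float]], min_docs: int = 2, top_k: int = 5) -> list[str]:
    """scored_docs: (text, relevance) pairs.

    Keeps bigrams found in at least `min_docs` documents and ranks them by the
    total relevance of those documents (ties broken alphabetically).
    """
    support: dict[tuple[str, str], int] = defaultdict(int)
    weight: dict[tuple[str, str], float] = defaultdict(float)
    for text, relevance in scored_docs:
        for bg in bigrams(text):
            support[bg] += 1
            weight[bg] += relevance
    recurring = [bg for bg, n in support.items() if n >= min_docs]
    recurring.sort(key=lambda bg: (-weight[bg], bg))
    return [" ".join(bg) for bg in recurring[:top_k]]
```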
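Slide 14 lists the entity types WitnessTree identifies and gives a sample news passage. Production named entity recognition is usually a trained statistical model. The sketch below substitutes a deliberately small rule-based extractor so the idea can be run on a sentence built from the slide's sample. Its labels are hypothetical: `MONEY` corresponds to monetary amounts, `DATE` covers only weekday names (a narrow slice of Time/Date), `ACRONYM` catches all-caps names such as NASA or ISS, and `NAME_CANDIDATE` marks runs of capitalized words without telling people, places, and companies apart.

```python
import re

# Hypothetical rule set for illustration; it will miss many entities and
# can misfire on capitalized words at the start of a sentence.
PATTERNS = {
    "MONEY": re.compile(r"\$\d[\d,]*(?:\.\d+)?(?:\s+(?:thousand|million|billion|trillion))?"),
    "DATE": re.compile(r"\b(?:Monday|Tuesday|Wednesday|Thursday|Friday|Saturday|Sunday)\b"),
    "ACRONYM": re.compile(r"\b[A-Z]{2,}\b"),
    # Two or more capitalized words or initials in a row, e.g. "David W. Thompson".
    "NAME_CANDIDATE": re.compile(r"\b[A-Z][a-z]+(?:\s+(?:[A-Z]\.|[A-Z][a-z]+))+"),
}

def extract_entities(text: str) -> list[tuple[str, str, int]]:
    """Return (label, matched text, start offset) triples sorted by position."""
    found = []
    for label, pattern in PATTERNS.items():
        for m in pattern.finditer(text):
            found.append((label, m.group(), m.start()))
    return sorted(found, key=lambda x: x[2])
```

On the sentence "Orbital has a $1.9 billion contract with NASA.", this finds the money amount and the acronym but not "Orbital" itself, a single capitalized word the rules cannot distinguish from an ordinary sentence-initial word. That gap is the kind a trained model closes.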