Veda Semantics - introduction document


As more and more organizations move from recognizing that unstructured data exists, and remains untapped, the field of semantic technology and text analysis capabilities is

Published in: Technology, Education

  1. 1. Building intelligence through semantics
     Machine Learning • Sentiment Analysis • Text Analytics • Ontology Building • Context Analysis
  2. 2. About Veda
     Who we are
     • A semantic technology service provider leveraging its capabilities to provide standardized and bespoke solutions
     Awards and references
     • One of 5 companies worldwide named as Semantic Application Specialists by Gartner (Who’s Who of Text Analytics, September 2012)
     Formation and background
     • Started as a JV with the Fraunhofer Institute, Germany
     • Earlier part of 3i Infotech, a large listed IT firm. Acquired by current promoters as part of a management buyout
     Location
     • Headquartered in Bangalore, India’s software capital, with ready access to critical talent
     Team
     • Currently a 20-member team, with a sales presence in Chicago, USA. Key members of the technology team each have over a decade of experience in semantic technology
  3. 3. Enterprise Information Distribution
     Unstructured data (~70%):
     • Consists of textual information like contracts, emails, presentations
     • 70% of organizations’ information remains in an unstructured form, hence it is not utilized at all
     Structured data (~30%):
     • Consists of information from ERP, CRM systems, XML data
     • It is organized and manageable
     • Currently only 30% of organizations’ information is analysed for decision making
     Are we using only structured data for decision making? What are the critical misses that are made as a result?
  4. 4. What is hidden in unstructured data
     Examples of unstructured data
     • Customer complaints
     • Employee feedback
     • Brand perception
     • Financial data from reports
     • Competitive news
     • Information
     • Facts
     • Events, etc.
     • And many more
     What it contains
     • Insights
     • Opportunities
     • Risks
     • Just the things needed for good decision making!
  5. 5. Semantics – making sense of unstructured data
     • Semantics is the study of meaning. It focuses on the relation between signifiers, like words, phrases, signs, and symbols, and what they stand for, their denotation. [Wikipedia]
     • SEMANTICS = MEANING
     • It is about describing things
     • In linguistics, semantics is the subfield that is devoted to the study of meaning as inherent at the levels of words, phrases, sentences, and larger units of discourse.
  6. 6. Industry Overview - Need for Semantic Technology
     Information overload
     • Heterogeneous
     • Distributed
     • Unorganized
     High data volumes
     • Increasing numbers
     • Increasing sources
     • Unmanageable
     Inefficient retrieval
     • Keyword search is inefficient
     • Lack of classification and relevance
     • Focus on “Search” rather than “Find”
     The definition of ‘Data’, which had been artificially restricted to only numerical data, can now extend to text and other unstructured data as well, providing more insights and richness for decision making
  7. 7. Top 9 Technology Trends Likely to Impact Information Management in 2013
     • Big Data
     • Modern information infrastructure
     • Semantic technologies
     • The logical data warehouse
     • NoSQL DBMSs
     • In-memory computing
     • Chief data officer and other information-centric roles
     • Information stewardship applications
     • Information valuation / infonomics
     Source: Gartner
  8. 8. Broadly, text based offerings can be clubbed under two main heads
     Statistical text mining
     • Looks for documents based on statistical techniques
     • Helps identify high frequency terms or expressions
     • Identifies other terms being used in conjunction with them
     • Assigns match probability to documents based on mathematical techniques to facilitate searches and knowledge management
     • Accuracy could be improved further by using machine learning principles
     • Primary applications: Text mining and document matching (e.g. VoC analysis, email analysis, eDiscovery, etc.)
     Natural language processing
     • Parses a sentence to identify the nature of the words in it
     • More relevant for sentence level analysis as opposed to document level analysis
     • Principles of English, as opposed to statistical techniques, take precedence in analysis
     • Accuracy dependent on strengths of algorithms written
     • Primary applications: Named Entity Extraction (knowledge management), Sentiment analysis (VoC analysis, email monitoring, etc.)
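The statistical text mining bullets above (term frequency, then a match score per document) can be illustrated with a small sketch. This is not Veda's implementation: it assumes smoothed TF-IDF weights and cosine similarity as the "mathematical technique", and the score it produces is a similarity between 0 and 1 rather than a true probability. The function names and the sample documents are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(texts):
    """Return one {term: weight} dict per text, using smoothed TF-IDF."""
    tokenized = [tokenize(t) for t in texts]
    n = len(tokenized)
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))  # count each term once per document
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            term: count * (math.log((1 + n) / (1 + doc_freq[term])) + 1.0)
            for term, count in counts.items()
        })
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse term-weight vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank(query, docs):
    """Return (score, document index) pairs, best match first."""
    vectors = tfidf_vectors(docs + [query])
    query_vec = vectors[-1]
    scored = [(cosine(query_vec, vec), i) for i, vec in enumerate(vectors[:-1])]
    return sorted(scored, reverse=True)


if __name__ == "__main__":
    feedback = [
        "The battery drains quickly and the battery overheats",
        "Delivery was late and the package was damaged",
        "Great battery life on this phone",
    ]
    for score, index in rank("battery problems", feedback):
        print(round(score, 3), feedback[index])
```

Documents that share no terms with the query score exactly zero, which is the weakness of purely statistical matching that the NLP column on this slide is meant to address.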
  9. 9. Industry Overview – usual application areas
     Marketing
     • Areas: Social media analytics; Better advertising placement; CRM information capture and action
     • Technique used: Sentiment Analysis using NLP, coupled with vertical specific taxonomies
     Compliance
     • Areas: eDiscovery; Auto classification; Forensic analysis
     • Technique used: Statistical text mining; Named Entity Recognition (NER); Machine learning
     Risk analysis, Fraud detection
     • Areas: Pattern analysis; Predictive modelling
     • Technique used: Statistical text mining; Named Entity Recognition; coupled with structured data (e.g. frequency of mails, department information, etc.)
     Knowledge Management
     • Areas: Auto tagging and classification; Discovery (e.g. healthcare information sharing)
     • Technique used: NER (for named entities); Statistical text mining; Custom ontologies / semantic networks
     Vertical specific use cases
     • Areas: Examples include Financial services, Publishing, Pharma, Healthcare, Legal, Insurance, etc.
     • Technique used: Various degrees of text mining, NLP and sentiment analysis, and entity extraction techniques
  10. 10. But purely from an R&D perspective, quality thresholds have a very high standard deviation
     NLP
     • Attaching sentiment to attribute, and attribute to object
     • Handling basic keywords (e.g. I like something, vs. something is like another)
     • Vertical taxonomies that allow aggregation
     • Vertical specific sentiment words (e.g. executing a man vs. executing a transaction, high fuel economy vs. high fuel consumption)
     eDiscovery
     • High variability in Recall and Precision rates
     • Tagging of concepts remains difficult
     • Summarization techniques based on basic lexical parsing
     Ontology
     • Limited use cases
     • Often seen as multi-year projects as opposed to quick win areas
  11. 11. The reason for the quality difference is that client context is often not fully understood and the software is not trained on such context
     • What is the primary purpose for which the tool will be used: finding trends, better search, forensics, fraud prevention, building predictive models, etc.
     • Are certain terms so common that they must be ignored while doing an analysis
     • Are there domain specific words that attain a different meaning than in other domains (e.g. ‘execution’ has a different meaning in financial services than in the news domain)
     • Should weightages assigned to certain kinds of documents / words be increased to improve relevance
     • How will the results be presented: are they to be shown visually and not be connected to other enterprise systems, or should they be an integrated part of the overall BI roadmap of an organization
     Unlike traditional systems, text analytics has a large dependency on context. Consequently, in order to unleash its full potential, the usual bifurcation between consultancy, software development and software implementation must disappear in the case of text analytics. An off-the-shelf product approach will definitely not help, and one must adopt a services model to better serve client needs!
  12. 12. In addition, there is limited focus on client needs and use cases
     Technology focused
     • Companies mostly founded and run by technology experts
     Customer language
     • Focus on technology capability and terms as opposed to problems to be solved
     Product approach
     • Leave out value to be derived by examining enterprise specific data more closely, or integrating it with structured data for greater insights
  13. 13. An example of our Natural Language Processing capabilities
     • “The car model looks like the old one”
     • “I loved the food, but the service was terrible”
     • “Did anyone like the car?”
     • “I really luuuuv it”
     • “The Tokyo office does not like the current prototype of the product. Bob said we should talk to them to find out why they are unhappy. Must close this ASAP to get the launch done by August 2013.”
     • Can tag sentiments to attributes, and attributes to products
     • Can handle difficult words, e.g. ‘like’ based on context; most engines cannot
     • Can handle anaphora resolution (e.g. pronouns)
     • Can handle Named Entity Recognition with high recall and precision
     IP protection:
     • Patent being filed for clause based sentiment extraction process
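To make "tagging sentiments to attributes" concrete, here is a toy sketch using the slide's own sentence about food and service. It is an assumption-laden illustration and not the patent-pending clause based process mentioned above: the word lists, the attribute list and the clause splitter are hypothetical, and a real engine would rely on parsing rather than on splitting at commas and "but".

```python
import re

# Hypothetical, deliberately tiny lexicons for illustration only.
POSITIVE = {"loved", "love", "great", "good"}
NEGATIVE = {"terrible", "bad", "awful", "unhappy"}
ATTRIBUTES = {"food", "service", "price", "comfort"}


def split_clauses(sentence):
    """Split on commas, semicolons and the conjunction 'but'."""
    parts = re.split(r",|;|\bbut\b", sentence, flags=re.IGNORECASE)
    return [p.strip() for p in parts if p.strip()]


def clause_sentiment(sentence):
    """Return (attribute, label) pairs, scoring each clause separately."""
    results = []
    for clause in split_clauses(sentence):
        tokens = re.findall(r"[a-z']+", clause.lower())
        score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
        if score > 0:
            label = "positive"
        elif score < 0:
            label = "negative"
        else:
            label = "neutral"
        # Attach the clause's label to every attribute mentioned in it.
        for token in tokens:
            if token in ATTRIBUTES:
                results.append((token, label))
    return results


if __name__ == "__main__":
    print(clause_sentiment("I loved the food, but the service was terrible"))
```

Scoring per clause rather than per sentence is what keeps the positive word in the first clause from cancelling the negative word in the second. A lexicon this simple cannot handle ‘like’ or the misspelled “luuuuv”, which is the gap the slide says context-aware analysis has to close.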
  14. 14. Our Discovery product demonstrates the NLP capability in a powerful manner, making consumer feedback actionable
     • Sentiments are associated with specific aspects of the product
     • In this example about a vehicle, most people care about comfort, and luckily, the product gets mostly positive reviews in this area
     • Though price gets mainly negative reviews, not too many people seem to talk about it. Perhaps a discount scheme could help?
     • Clickthrough allows deeper dives into each category
     • Actual sentences are displayed, and things to which the sentiments are attached are highlighted
  15. 15. Example of Natural Language Processing in Financial Domain (continuing R&D)
     • Extracts economic factors that have been impacted
     • Recommendations and predictions help analyze complex financial information quickly
     • Helps in predictive analytics
  16. 16. Example of Natural Language Processing in Financial Domain – highlighting outlook by driver (continuing R&D)
     • Linguistic rules to extract financial / economic indicators
     • Domain specific verbs and nouns to understand movement
     Sentence: Financial markets rebounded strongly in 2006's third quarter.
       FINANCE ENT: Financial markets | ACTION: rebounded | TIME: 2006's third quarter | MOVEMENT: UP
     Sentence: By the end of the third quarter, crude oil had fallen over 20% from its[crude_oil] July peak, while a similar retreat in natural gas prices produced the latest high-profile hedge fund debacle.
       FINANCE ENT: crude oil | ACTION: had fallen | TIME: the end of the third quarter | QUANTITY: 20% | MOVEMENT: DOWN
       FINANCE ENT: natural gas prices | ACTION: produced the latest high-profile hedge fund debacle | MOVEMENT: DOWN
     Sentence: Prices of longer-dated bonds rallied too: the 10-year U.S. Treasury bond yield fell over 60 basis points during the third quarter.
       FINANCE ENT: Prices of longer-dated bonds | ACTION: rallied | MOVEMENT: UP
       FINANCE ENT: the 10-year U.S. Treasury bond yield | ACTION: fell over 60 basis points | TIME: the third quarter | QUANTITY: 60 basis points | MOVEMENT: DOWN
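The idea of "domain specific verbs to understand movement" can be sketched with a few hand-written rules. This is a simplified illustration under stated assumptions, not the engine that produced the annotations above: the verb lists and regular expressions are hypothetical, the entity is naively taken to be everything before the first movement verb, and there is no handling of auxiliaries, multiple entities per sentence, or pronoun resolution.

```python
import re

# Hypothetical domain verb lists; a production system would use far richer rules.
UP_VERBS = {"rebounded", "rallied", "rose", "gained", "climbed"}
DOWN_VERBS = {"fell", "fallen", "dropped", "declined", "slipped"}

QUANTITY = re.compile(r"\d+(?:\.\d+)?\s*(?:%|basis points)")
TIME = re.compile(r"(?:\d{4}'s\s+)?(?:first|second|third|fourth)\s+quarter")


def extract_movement(sentence):
    """Return entity, action, movement, time and quantity for the first movement verb."""
    words = sentence.split()
    for i, word in enumerate(words):
        verb = word.lower().strip(".,;:")
        if verb in UP_VERBS:
            movement = "UP"
        elif verb in DOWN_VERBS:
            movement = "DOWN"
        else:
            continue
        time = TIME.search(sentence)
        quantity = QUANTITY.search(sentence)
        return {
            "entity": " ".join(words[:i]),  # naive: all words before the verb
            "action": verb,
            "movement": movement,
            "time": time.group(0) if time else None,
            "quantity": quantity.group(0) if quantity else None,
        }
    return None  # no movement verb found


if __name__ == "__main__":
    print(extract_movement("Financial markets rebounded strongly in 2006's third quarter."))
```

Note that the sketch would mislabel the slide's second sentence, where the entity follows a leading time phrase ("By the end of the third quarter, crude oil had fallen"), which shows why linguistic parsing rather than word position is needed in practice.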
  17. 17. Example of Natural Language Processing in Financial Domain - extracting Cause and Effect (continuing R&D)
     Sentence: As the fourth quarter begins, financial markets remain supported by positive earnings and interest rate trends.
       FINANCE ENT: financial markets | ACTION: remain supported | TIME: the fourth quarter
       CAUSE: positive earnings and interest rate trends
       EFFECT: financial markets remain supported
     Sentence: However, the pace of U.S. economic activity will slow further by year-end as weakness in the housing and automotive sectors becomes increasingly acute.
       FINANCE ENT: the pace of U.S. economic activity | ACTION: will slow | TIME: year-end | MOVEMENT: DOWN
       CAUSE: weakness in the housing and automotive sectors becomes increasingly acute
       EFFECT: the pace of U.S. economic activity will slow year-end
  18. 18. An example of our Enterprise capabilities
     • Ontology modeling using RDF and OWL semantic web standards
     • Document Matching / Similarity using statistical models and a concept based approach for Patent Search, Knowledge Management, etc.
     • Information Extraction using linguistic models for Fraud Detection, analysis of news stories, etc.
     • Demonstrated capability for patent search, legal cases, handling survey data
     • Machine learning capability allows for precision to be attuned and increased for specific client situations
     • Can disambiguate based on domain specific situations, e.g. ‘execution’ may mean one thing in a news domain and another (executing a transaction) in the financial services domain
  19. 19. Veda Text Mining capability – key features
     Input
     • Data input in various forms (e.g. txt, doc, etc.)
     • Can accept data from public sources (e.g. Facebook, Twitter) apart from Enterprise sources
     Preprocessing
     • Removal of junk text around emails
     • Removal of small emails like “Thanks”
     • Removal of forwarded emails attached to main email from analysis
     • Spell checks and autocorrects
     • Language parsing for English
     Processing
     • Natural Language and Statistical Processing techniques
     • Extraction of key discussion items from the text, and what is being said in relation to them
     • Key themes from messages and semantic chaining. Can be combined with sentiment analysis as well.
     • Ability to handle high velocity and high volume data using Big Data infrastructure (Hadoop, Storm, etc.)
     Categorization
     • Group discussion items into categories and sub categories, while identifying what is being said about them:
       • Automatic for synonyms, singular and plural, etc.
       • Ability to add / delete categories
       • Ability to further analyse sub-categories
     UI, editing and export
     • Simple, easy custom built UI with filtering and drill down capability
     • Machine learning approach where human insight guides further results
     • Output not only available in visual format, but exportable to other applications or databases
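Two of the preprocessing steps listed above, dropping forwarded content and discarding very short emails such as “Thanks”, can be sketched as follows. This is a minimal illustration and not Veda's pipeline: the forward-marker patterns and the four-word threshold are assumptions, and spell checking and language parsing are left out.

```python
import re

# Assumed markers that commonly introduce forwarded or quoted content.
FORWARD_MARKER = re.compile(
    r"^[ \t]*(?:-{2,}\s*(?:Forwarded message|Original Message)\s*-{2,}|On .+ wrote:)[ \t]*$",
    re.IGNORECASE | re.MULTILINE,
)


def strip_forwarded(body):
    """Keep only the text that precedes the first forward or reply marker."""
    match = FORWARD_MARKER.search(body)
    return body[:match.start()] if match else body


def is_trivial(body, min_words=4):
    """Treat very short messages such as 'Thanks' as carrying no content."""
    return len(re.findall(r"\w+", body)) < min_words


def preprocess(emails):
    """Return cleaned email bodies, skipping trivial ones."""
    cleaned = []
    for body in emails:
        text = strip_forwarded(body).strip()
        if not is_trivial(text):
            cleaned.append(text)
    return cleaned


if __name__ == "__main__":
    inbox = [
        "Thanks",
        "The Tokyo office does not like the prototype.\n\n"
        "---------- Forwarded message ----------\nFrom: Bob\nOld content",
    ]
    print(preprocess(inbox))
```

The word threshold is a blunt instrument: it would also drop a short but meaningful message, so in practice it would need tuning against real mail.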
  20. 20. Veda Text Mining capability – screens of analysis in progress
     • Clustering conversations into categories using semantic analysis
     • Example customized outputs
  21. 21. Our Delivery Capabilities
     Proof of Concept • Trial & Demonstration • Delivery Methodology
     • High-level client requirements
     • Detailed solution requirements
     • Define the scope of work
     • Delivery framework (core offering + value added services)
     • Documented External Interfaces with Volume and associated recurring cost (if any) information
     • User Guide & Training
     • Proof of concept
     • Methodology (Agile, Waterfall approach or client specified approach)
     • Timelines for each deliverable
     • Responsibility Matrix
  22. 22. Delivery Methodology
     [Diagram: delivery methodology for client assignments, grouped into Program Activities, Project Delivery and Support Activities. Box labels as extracted: Program Mgmt, Program Initiation, Project Kick-off, Feature Selection, Data Set Creation, Business Requirements, Infrastructure Readiness, Program HR Mgmt, Analysis and Design, Operational Readiness, Program Benefits Tracking, Change Analysis, Project Closure, Machine Learning, Development, Support Delivery, Test & Verify, Training, Release, Post Release Support]
  23. 23. Taking the next step
     For bespoke development, we are prepared to start small, to show clients clear value and RoI
     Phase 1
     • Veda will solve a business challenge you choose, to demonstrate the power of a semantics based solution in a quick turnaround exercise (typically within a few days)
     Phase 2
     • Implement for a business function / division / a single geography
     • Multiple features of SIS implemented, including cross business solutions leading to concrete measurable gains
     Phase 3
     • Replicating the success of the previous phase:
       • Across larger sections of the enterprise
       • Wider data consolidation scope
       • Multiple output delivery channels
       • Visible long term gains
  24. 24. But ultimately, we believe that clients will benefit considerably from a unified Semantic Information System
     • Collecting unstructured data from disparate sources
     • Analyse all collected unstructured data, organize it using rich knowledge representation / domain ontologies
     • Insights from unstructured data coupled with analytics from structured data assets (e.g. BI, Big Data)
     [Architecture diagram. Labels as extracted: Unstructured & Semi-Structured Data (Email, Files, Internet, Public Web Data, Social Media, chatter, Databases on Server / SAN / SAS); Veda Collection Processes (Email Crawler, Files Crawler, Data Crawler, Web Crawler); Veda Organising Processes (Natural Language Processing, Ontologies, Semantic Analysis, Knowledge Base, Auto Classification, Visual Segregation, Categorized Data); Structured Data (LOB Applications: Marketing, Purchasing, Payroll, Sales, Operations; Databases; Formatted data); Staging Area, Data Warehouse, Data Mart, Store into Cubes; outputs: Reporting, Dashboards, Alerts, Ready insights]
  25. 25. Veda Approach – COP Framework
     Our proprietary Collect - Organize - Present framework and tools allow us to undertake quick bespoke development
     • Connectors: Collect information from a variety of (heterogeneous) sources
     • Information Extraction: Using NLP and semantic analysis
     • Semantic Net / Ontology Editor: Smart knowledge representation of a domain
     • Auto Classifier: Classify data and tag it to industry specific concepts automatically
     • Ontology Reasoning: Analyze industry knowledge and infer from ontological knowledge
     • Analytics: Identify various patterns and insights from the data
     • Semantic Matching: Provide most relevant information
     • Semantic Search and Browsing: Semantic explorer to retrieve contextual concept-based information
  26. 26. Veda’s Value Proposition
     Technology
     • Deep understanding of the Semantics space
       • Expertise in both NLP and ontologies / taxonomies, and in standards (RDF / OWL)
       • In the semantic technology space for more than a decade
       • Team has provided services not only to clients, but to other semantic service providers
     Tie up with academia
     • Tie up with leading Indian university in the area
       • Allows for cutting edge R&D
       • High quality talent pipeline
     Delivery
     • Live - Delivery and Support Turnaround
     • The Veda Platform is the core that:
       • Is a solution accelerator giving a head start to all our assignments (tested and certified components)
       • Allows for lower costs
       • Allows for incremental rollouts
  27. 27. Veda’s Value Proposition (contd)
     Experience
     • Healthy mix of business and technology expertise: can provide clear use cases for Semantics and help establish clear RoI metrics
     • Core team members have had experience in semantic technology since 2003, longer than most other companies
     Expertise in Multiple Business Domains
     • Technology team experienced in providing expertise in a wide variety of business domains, leading to speedy and effective solution implementations
     Location
     • Located in India, with associated inherent advantages
       • Lower cost options for clients with onshore / offshore model
       • 24 hour work cycle
       • Large talent pool
     • Tie ups with companies focused on various other related technologies to offer integrated offerings, e.g. full service offering / working with offshore vendor to make outsourced processes more efficient using semantics
  28. 28. Veda’s End-to-End Semantic Expertise
     • Text Analytics: Analyzing unstructured text, converting to structured data
     • Machine learning: Statistical techniques resulting in increasing accuracy over time (with more inputs)
     • Sentiment Analysis: Identifying if the sentiment of a sentence is positive, negative or neutral (and the various shades in between)
     • Semantic Information Retrieval: More artifacts searched / more accurate (e-mails, documents, spreadsheets, output from existing structured data sources)
     • Semantic Web Standards: Standardized storage and output formats for easier information sharing
  29. 29. Past Experience
     Client profile: A global publishing house in legal, tax, finance and healthcare
     • Context-based content research platform for tax & legal domain
     • Automatic meta-tagging, ontology modeling and ontology driven content reference system
     Client profile: A prominent product manufacturer on inference and reasoning engine
     • Leveraged semantics for a supply chain process to integrate systems with heterogeneous data sources and help in automatic decision making in case of any disruptions in the cycle
     • Provided ontology modeling and application development services
     Client profile: A reputed university and complex systems research lab in Australia
     • Produced a method for organizing and potentially navigating the wide range of web-pages associated with the Murray-Darling river system in a seamless fashion
     Client profile: An analytics software manufacturer in Australia
     • Assist investigation of fraud and terrorism: establishing links between entities
     • Unstructured data analysis
     Client profile: A premier worldwide online provider of news, information, communication, entertainment and shopping services
     • Developed a web analytics platform for analyzing click-stream data in real-time
  30. 30. Some sample use cases mapped to our current technology demonstrators
     Legal contracts
     • Current situation: Saved in C drives or in DMS, separate excel sheets maintained to check on timely renewals, etc. Tough to compare specific clauses across contracts or find relevant clause as needed
     • How Semantics will help: Search for specific kind of contract and specific clause will throw up (a) master template (b) earlier contracts entered into in the area (c) extracts from the relevant clause
     • Mapping to current Veda technology demonstrator: Patent search demonstrator uses similar techniques, allowing the user to also see probabilistic match of documents
     Process changes
     • Current situation: Dig deep into embedded code to see what departments and areas will get impacted
     • How Semantics will help: Ontology based relational steps make it easy to see connected departments, processes, etc. that will be impacted
     • Mapping to current Veda technology demonstrator: Tax caselaw and section ontology created
     Marketing
     • Current situation: Mapping social sentiment and reviews done manually or using dictionary based social monitoring tools
     • How Semantics will help: Some social marketing and social listening already being done, though not accurate. A better quality NLP engine allows for more accurate results (e.g. the word ‘like’)
     • Mapping to current Veda technology demonstrator: Veda Discovery Engine, which has sentiment capabilities
     HR
     • Current situation: Obtaining right resumes using keyword search remains time consuming. Employee suggestions in open ended surveys not aggregatable. Qualitative comments in employee evaluations not aggregated
     • How Semantics will help: Identify key intervention areas at aggregate levels. Map trends in overall ratings to key strength and weakness areas
     • Mapping to current Veda technology demonstrator: Veda Discovery for aggregation, Veda Txt for identification of gist of comments
     Knowledge management
     • Current situation: Metatagging remains a manual process and as a result, searches remain searches, not findings
     • How Semantics will help: Automatic metatagging (Persons, Locations, Organizations, concepts, etc.)
     • Mapping to current Veda technology demonstrator: Veda Discovery – NER Engine, Veda Legal demonstrator, Veda Msg (for alerts)
  31. 31. Sample use cases by industries
     Publishing, media
     • Allows automatic extraction of people, location, dates and events, being extended to themes and concepts. Helps in automatic metatagging.
       • Current tagging process is manual and time consuming. Technology provides clear RoI by reducing this time and manual labour, providing consistent tagging, and allowing easier search for future reference, rather than relying on keywords (e.g. Mahatma vs Gandhi vs Mahatma Gandhi).
     Oil and Gas
     • Can make incident monitoring and reporting systems more robust, thereby reducing risk of major accidents
       • For incident reporting, a user need not fill in multiple structured data fields. Text analytics can quickly match data to structured inputs.
       • Witness reports, once converted to text, can be monitored across incidents for patterns that would otherwise have gone unnoticed.
     • Helps make process changes easier and allows all linked aspects to be seen at one go
       • Helps determine what other processes and safety regulations are relevant if a sub process is sought to be changed (could also include contractual information, etc. if relevant)
     • Usually, companies have millions of oil well logs which can be classified by performing named entity extraction and enrichment
  32. 32. Sample use cases by industries
     Financial services
     • Contract matching (including addendums)
     • VoC analysis
       • Churn prediction
       • Highlights capability gaps
     • Promotion management
       • Avoids duplication of creation of similar material across divisions / locations. Saving in man hours and resources by leveraging all available material produced earlier
     • Risk analysis
       • Manage and gather customer documents from various sources to look for areas of concern
     • “Know your customer” analysis
     • Competitor analysis
     • Financial news analysis for investment managers
     Telecom
     • Legal interception and pattern recognition
     • SMS analyses for recognizing spam to avoid penalties
     • VoC analysis
     Airlines
     • Analysis of unstructured problem and safety logs to avoid incidents
  33. 33. Sample use cases by industries
     Healthcare
     • Link and compare patient records to obtain insights on:
       • Symptoms, medicines and discharge times, to determine if some medication mixes may be more beneficial than others across a wide set of patient records
       • Why some patients may be re-admitted
     Pharma
     • R&D improvement by allowing scientists, who need to refer to papers but may not know exactly what to look for, to see relevant topics (based on automatic metatagging, and linked ontology at the backend)
     • Better knowledge management: automatically tag papers, saving scientist time and making search consistent
     • Feedback analysis for product from distributors, doctors and end patients
     Insurance
     • Broker document analysis to deepen insight on insured risks to improve risk management
  34. 34. Sample functional use cases
     Marketing
     • Voice of Customer analysis
     • New product ideas
     • Competitor analysis
     • Complaint monitoring
     HR
     • Drawing insights from employee suggestions
     • Analysing unstructured inputs in evaluations and improving training efficacy
     Risk
     • Internal document monitoring for risk and compliance
     Legal
     • Better contract management
  35. 35. Veda Solutions Currently Deployed: Veda for Business Process Workflow
     • Configurable to any business requirement across industries
     • Sources of content can be structured and unstructured
     • Can be integrated with various business applications (ERP, Content Management, Portals, etc.)
     • Configurable user interface with features such as:
       • Saving of search for later reference
       • Tabbed views
       • Number of results to be displayed, with sort order
  36. 36. Veda Solutions Currently Deployed: Veda Social Media Analytics
     • Registration & log in
     • Inputs from Social Media
     • Inputs from Blogs, Websites
     • Hierarchy & Relevance Analysis
     • Sentiment Analysis
     • Rich Reporting
  37. 37. Veda Solutions Currently Deployed: Veda Recruiter
  38. 38. Veda Solutions Currently Deployed: Veda Patent Search
     • Registration & log in
     • Subscription
     • Payment Gateway
     • Keyword Search
     • Semantic Search
     • Rich Internet Application
     • Saved Search
     • Filters
  39. 39. Veda Solutions Currently Deployed: Veda SMS Service
     • Crunches judgment text into high relevance words that can be sent through an SMS for immediate access
     • Is combined with website service offering full access for relevant cases
     • Registration & log in
     • Subscription
     • Payment Gateway
     • Keyword Search
     • Semantic Search
     • Legal ontology (Indian)
     • Filters
  40. 40. Contact details
     Veda Semantics Pvt Ltd
     Contact person: Rajat Kumar (CEO)
     Phone: +91-9619308745