Actionable Intelligence From Unstructured Data using MDA


Data is everywhere, but far too often it is not the information we need. Businesses continue to generate a huge volume of memos, reports, minutes of meetings, planning documents, proposals, emails, website content, blogs, wikis and other content. But this wealth of data is not providing companies with the information base they need to make the right decisions when they need to, because all this unstructured data is not actionable intelligence. As a result, although we are awash with data, we make uninformed decisions based on the very small slice of information that is readily available to us. This white paper explores a solution strategy.

Published in: Business


  1. ADA SOFTWARE
     SOFTWARE MODERNIZATION - POWERED BY MODELING
     The automated software modernization company
     Call 888.453.0014

     Informational Primer
     Actionable Intelligence from Unstructured Data

     Software Modernization. It’s all we do!!!
     379 THORNALL STREET, WEST TOWER - 7TH FL, METROPARK, NJ 08837
  2. EXECUTIVE SUMMARY

     Data is everywhere, but far too often it is not the information we need. Businesses continue to generate a huge volume of memos, reports, minutes of meetings, planning documents, proposals, emails, website content, blogs, wikis and other content. But this wealth of data is not providing companies with the information base they need to make the right decisions when they need to, because all this unstructured data is not actionable intelligence. As a result, although we are awash with data, we make uninformed decisions based on the very small slice of information that is readily available to us. Figure-1 shows how the Information Framework stands broken.

     [Fig - 1]

     Worse still, this underutilized deluge of unstructured data is actually causing companies to lose money. IDC estimated in its report titled “The High Cost of Not Finding Information” (IDC #29127) that companies with 1,000 white collar employees typically wasted in excess of $6 million per year searching for information and not finding it. Add to this the lost revenues caused by unproductive employee time.

     The potential loss from unstructured data is, therefore, multi-faceted and consists of:
     • Uninformed decisions
     • Overlooked risks
     • Loss of employee time
     • Loss of opportunity
     • Loss of revenues

     All of these can be fixed by our metamodel driven information management solution that can turn all this unstructured data into rich, actionable intelligence.
  3. ACTIONABLE INTELLIGENCE FROM UNSTRUCTURED DATA
     Make Unstructured Data Come Alive

     UNDERSTANDING “UNSTRUCTURE”

     “Unstructured data” is not really unstructured. Let us take the example of a paper magazine. It has a wonderful structure. The Table of Contents offers an instant overview of the entire magazine and provides a usable index that we can use to jump to any article by page number. Within articles, there are pictures to help us visualize the information contained in the text; there are headings that are bolded and tell us what a section of text is talking about; there are blurbs (information callouts) that highlight some of the main points of the article; there might be an abstract providing a gist of the whole article; there may be footnotes, citations and references that link the ideas expressed with a world of information outside the magazine. There are advertisements, which we immediately recognize as advertisements. There is information about the editorial team, the company publishing the magazine and the authors of the various articles.

     [Callout: “Unstructured” data might have an excellent structure of its own - that computers do not understand.]

     An e-mail gives you all the information as to who wrote the e-mail; when it was written; to whom it was addressed; who received a copy; what it was about (the Subject); the main body of the message; and any reference material provided as an attachment or a URL.

     So there is, indeed, a lot of structure in what we started out identifying as “unstructured data”.

     The problem is not with the data. The problem is that a machine does not understand this structure automatically unless we find a way of adding machine-readable information to all this data.

     SOLUTIONS STRATEGY

     The key to unleashing knowledge from all this powerful, but untapped, information lies in being able to:
     • Generate the right METADATA (data about the unstructured data) that a machine can understand.
     • CATEGORIZE the data using an easily understood VOCABULARY, and a TAXONOMY that indicates the data hierarchy and relationships.
     • Provide a KNOWLEDGE RETRIEVAL mechanism that understands all of the above.

     APPLYING OMG STANDARDS

     OMG has modeling standards embodied in Model Driven Architecture that can be utilized for modeling any kind of information (though they were originally intended for modeling and understanding software systems). The Knowledge Discovery Metamodel (KDM), for instance, separates
  4. knowledge about existing systems into four orthogonal dimensions: Structure, Behavior, Data and User Interface. Unlike software, data has no behavior. But it has associative fact patterns. So we utilize a modified version of the KDM concept adapted for understanding unstructured data, which we call mKDM.

     OMG also has an initiative called the Semantics of Business Vocabulary and Rules (SBVR), which is a standard for establishing a business vocabulary and terminology system that can be used to express business models. This is very useful for defining vocabularies to understand unstructured data, taxonomies to categorize unstructured data, and rules for processing unstructured data.

     Coupled together, mKDM and SBVR provide the base technology for creating metadata; defining the vocabularies, taxonomies and rules for processing the data; and retrieving useful information based on linked entities as well as “inferred” fact patterns. This helps convert unstructured data into actionable intelligence.

     PARTS OF THE SYSTEM

     SCANNERS AND PARSERS

     Scanners and parsers will process the unstructured data with reference to the Knowledge Modeling Standards of OMG, and produce symbol tables and syntax trees. This will be an Abstract Syntax Tree Metamodel representing the unstructured data.

     AUTOMATIC CATEGORIZERS

     Automatic categorizers will act on the metadata and perform the following functions:
     • Linguistic analysis
     • Statistical inference
     • Machine learning
     • Rule-based processing

     These will obtain the relevant vocabularies, taxonomies and rules from the Semantics of Business Vocabulary & Rules (SBVR) that is part of our reference Knowledge Modeling Standard. The SBVR will provide the relevant business vocabulary necessary to do this job properly. For instance, if we are doing this job for a stockbroker, the relevant business vocabulary will be far different from what will be relevant for a law firm. Documents will be assigned to multiple categories.

     The output of the automatic categorizers will be a Metadata Repository, along with catalogs, fact patterns and indexes.

     KNOWLEDGE RETRIEVAL ENGINE

     Regardless of whether the user is searching or browsing or seeking information through a web service or an API, the actual retrieval will be performed by a Knowledge Retrieval Engine. It has to scan and parse the “request for information” with reference to the same vocabularies, taxonomies and rules in the SBVR that were used by the automatic categorizers.

     It will then retrieve two kinds of information:
  5.   1. ENTITY EXTRACTION: Focus on identifying named entities.
       2. FACT EXTRACTION: Focus on fact patterns and detecting relationships between data using “inference”.

     The retrieved information will be focused and relevant to the “request for information”. It will be actionable intelligence.

     PACKAGING & DELIVERY ENGINE

     The retrieved information has to be packaged and delivered to the seeker of information using the right channel. The request can come from one of many channels, such as interactive search, interactive browsing, web services, or well defined APIs. The results are pushed back through the same channel. There is also more than one way of representing the results: textually or visually, through spatial diagrams or mind maps.

     Figure-2 is a schematic representing the methodology.

     [Fig - 2]

     PRACTICAL APPLICATIONS

     Apart from the holistic application of this methodology across an enterprise for rich productivity gains, greater revenue and informed decisions, this methodology also has many smaller practical applications on limited sets of data.

     E-DISCOVERY FROM EMAILS

     Email has become the standard for both
  6. internal and external communication. A company's email contains important, and sometimes confidential, information that is today increasingly going into massive e-mail archives, whether to comply with mandatory government regulations or for information archival.

     [Callout: Powerful E-Mail Analytics can provide never before discovered intelligence from plain company emails]

     E-discovery refers to discovery in civil litigation which deals with information in electronic format, also referred to as Electronically Stored Information (ESI). Emails can be a prime source of information in civil litigation.

     Financial and other firms subject to Sarbanes-Oxley regulatory compliance need effective e-discovery mechanisms for their e-mail archives and other documents.

     Our solution can help you implement a powerful information retrieval mechanism for Email Archives, resulting in the following capabilities, and more:
     • Advanced search capabilities to find specific records within your complete and secure archive.
     • Locate and produce evidence-quality messages with metadata in seconds.
     • Analyze a complete audit trail for every message.
     • Review and classify every message (based on your company's rules and permissions) that leaves or enters your organization's domains.
     • Messages can easily be classified for legal hold when court or counsel requests that all data relevant to a particular case be preserved.
     • E-mail analytics designed to be utilized in complex litigation or investigative matters.
     • Search for and identify key individuals and assess their relationships and communication patterns. The Activity Schematic in Figure-3 displays communication patterns with a key individual placed in the center, and e-mail correspondents connected with radial spokes.

     [Fig - 3]

  7. • Timeline View can be produced as a horizontal timeline to help assess critical time periods in the matter under investigation.
     • E-mail Analytics help you to easily identify communications of the key players for substantive review.

     We can transform your email archive into rich actionable intelligence.

     OTHER REGULATORY COMPLIANCE

     Regulatory compliance also weighs down life science companies (pharmaceutical, biotech and medical device companies). FDA regulations pertaining to clinical trials, manufacturing processes and drug discovery require similar diligence in preserving “evidence” for a stipulated period of time. Such evidence is also contained in unstructured data items like e-mails and documents. Our methodology equips the company and auditors with reliable and quick e-discovery processes, apart from other proactive compliance monitoring functions of interest to the company.

     [Callout: The Lifeblood of the Enterprise is information. The information economy thrives and survives on information.]

     RESEARCH AND DEVELOPMENT

     Pharmaceutical companies engaged in drug development, for instance, can benefit from every bit of better intelligence and every minute of human effort saved. Drug development activities for a single product can span over ten years and involve collaboration from a wide range of professionals like research scientists, pharmacologists, chemists, biologists, chemical engineers, production floor specialists, clinical trial units and others. The information flow amongst these diverse entities, located in diverse geographical locations, has a very large share of “unstructured” data.

     LAW FIRMS

     Law firms try to make sense of unstructured data every single minute of their existence. With the expanding Internet-driven universe, making sense out of information overload and using the results meaningfully for their clients’ benefit is an ever-expanding challenge.

     CONTENT PUBLISHERS

     Companies engaged in any kind of publishing, especially delivery of content over the Internet, are competing for differentiation in search capability.

     Content metadata is most important, as it is indispensable for setting up the catalogs, fact patterns and indexes that, in turn, can translate into accuracy of information delivered.

     INTELLIGENCE & LAW ENFORCEMENT

     Especially in this age of rampant terrorism, proactive prevention of crimes is a top priority. The huge world of “unstructured” data constantly evolving on the Internet is a rich source of intelligence and alerts, but too humungous for manual processing and/or informal methods.

     A methodology such as ours can effectively harness and deliver untold value from the Internet.
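The e-mail example used earlier in this primer (who wrote a message, to whom, when, and about what) can be made concrete with a small sketch. The Python below is illustrative only and is not the mKDM/SBVR implementation described here: it uses the standard library's e-mail parser to generate metadata, a tiny hand-written vocabulary standing in for an SBVR-style business vocabulary to assign categories, and a simple count of correspondents as the raw data behind an activity schematic. The names `VOCABULARY`, `SAMPLE_EMAIL`, `extract_metadata`, `categorize` and `correspondents`, and all addresses and terms, are assumptions invented for this illustration.

```python
import re
from collections import Counter
from email import message_from_string
from email.utils import getaddresses

# Hypothetical vocabulary: category -> trigger terms (invented for this sketch;
# a real deployment would draw these from the business vocabulary in use).
VOCABULARY = {
    "legal": {"contract", "litigation", "counsel"},
    "finance": {"invoice", "audit", "revenue"},
}

# Made-up sample message used only to exercise the functions below.
SAMPLE_EMAIL = (
    "From: Alice <alice@example.com>\n"
    "To: bob@example.com\n"
    "Cc: Carol <carol@example.com>\n"
    "Date: Mon, 02 Jan 2006 15:04:05 +0000\n"
    "Subject: Audit of the supply contract\n"
    "\n"
    "Counsel asked for the latest invoice.\n"
)


def extract_metadata(raw):
    """Turn the header structure of a raw e-mail into machine-readable metadata."""
    msg = message_from_string(raw)
    sender = getaddresses([msg.get("From", "")])
    recipients = getaddresses(msg.get_all("To", []) + msg.get_all("Cc", []))
    # Only plain, single-part bodies are handled in this sketch.
    body = "" if msg.is_multipart() else msg.get_payload()
    return {
        "from": sender[0][1] if sender else "",
        "to": [address for _, address in recipients],
        "date": msg.get("Date", ""),
        "subject": msg.get("Subject", ""),
        "body": body,
    }


def categorize(meta):
    """Rule-based categorization: a message may fall into several categories."""
    text = (meta["subject"] + " " + meta["body"]).lower()
    words = set(re.findall(r"[a-z]+", text))
    return sorted(cat for cat, terms in VOCABULARY.items() if words & terms)


def correspondents(metas, key_person):
    """Count messages exchanged between key_person and each correspondent."""
    counts = Counter()
    for meta in metas:
        if meta["from"] == key_person:
            counts.update(meta["to"])
        elif key_person in meta["to"]:
            counts[meta["from"]] += 1
    return counts


if __name__ == "__main__":
    meta = extract_metadata(SAMPLE_EMAIL)
    print(meta["from"], meta["to"])
    print(categorize(meta))
    print(correspondents([meta], "alice@example.com"))
```

Keeping the vocabulary as data rather than code mirrors the point made above that a stockbroker and a law firm would need different vocabularies: only the `VOCABULARY` table would change, not the categorizer.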
  8. When one needs a heart bypass, one goes to a cardiac surgeon.
     When one needs the best storage solutions, one goes to EMC, the storage specialists.
     Why would you go to Accenture, Cap Gemini, Infosys or Wipro for software modernization?

     WE ARE THE SOFTWARE MODERNIZATION SPECIALISTS. IT IS ALL WE DO.

     Call 888.453.0014