Actionable Intelligence From Unstructured Data using MDA


Data is everywhere, but far too often it is not the information we need. Businesses continue to generate a huge volume of memos, reports, minutes of meetings, planning documents, proposals, emails, website content, blogs, wikis and other content. But this wealth of data is not providing companies with the information base they need to make the right decisions when they need to, because all this unstructured data is not actionable intelligence. As a result, although we are awash with data, we make uninformed decisions based on the very small slice of information that is readily available to us. This white paper explores a solution strategy.

  Informational Primer Actionable Intelligence from Unstructured Data
  EXECUTIVE SUMMARY

Data is everywhere, but far too often it is not the information we need. Businesses continue to generate a huge volume of memos, reports, minutes of meetings, planning documents, proposals, emails, website content, blogs, wikis and other content. But this wealth of data is not providing companies with the information base they need to make the right decisions when they need to, because all this unstructured data is not actionable intelligence. As a result, although we are awash with data, we make uninformed decisions based on the very small slice of information that is readily available to us. Figure-1 shows how the Information Framework stands broken.

Worse still, all this underutilized deluge of unstructured data is actually causing companies to lose money. IDC estimated in their report titled "The High Cost of Not Finding Information" (IDC #29127) that companies with 1,000 white collar employees typically wasted in excess of $6 million per year searching for information and not finding it. Add to this the lost revenues caused by unproductive employee time. The potential loss from unstructured data is, therefore, multi-faceted and consists of:
• Uninformed decisions
• Overlooked risks
• Loss of employee time
• Loss of opportunity
• Loss of revenues

All of these can be fixed by our metamodel-driven information management solution, which can turn all this unstructured data into rich, actionable intelligence.

Fig - 1
SOFTWARE MODERNIZATION - POWERED BY MODELING

ACTIONABLE INTELLIGENCE FROM UNSTRUCTURED DATA

Make Unstructured Data Come Alive

UNDERSTANDING “UNSTRUCTURE”

“Unstructured data” is not really unstructured. Let us take the example of a paper magazine. It has a wonderful structure. The Table of Contents offers an instant overview of the entire magazine and provides a usable index that we can use to jump to any article by page number. Within articles, there are pictures to help us visualize the information contained in the text; there are headings that are bolded and tell us what a section of text is talking about; there are blurbs (information call-outs) that highlight some of the main points of the article; there might be an abstract providing a gist of the whole article; there may be footnotes, citations and references that link the ideas expressed with a world of information outside the magazine. There are advertisements, which we immediately recognize as advertisements. There is information about the editorial team, the company publishing the magazine and the authors of the various articles.

“Unstructured” data might have an excellent structure of its own - that computers do not understand.

An e-mail gives you all the information as to who wrote the e-mail; when it was written; to whom it was addressed; who received a copy; what it was about (the Subject); the main body of the message; and any reference material provided as an attachment or a URL.

So there is, indeed, a lot of structure in what we started out identifying as “unstructured data”. The problem is not with the data. The problem is that a machine does not understand this structure automatically unless we find a way of adding machine-readable information to all this data.

SOLUTIONS STRATEGY

The key to unleashing knowledge from all this powerful, but untapped, information lies in being able to:
• Generate the right METADATA (data about the unstructured data) that a machine can understand.
• CATEGORIZE the data using an easily understood VOCABULARY, and a TAXONOMY that indicates the data hierarchy and relationships.
• Provide a KNOWLEDGE RETRIEVAL mechanism that understands all of the above.

APPLYING OMG STANDARDS

OMG has modeling standards embodied in Model Driven Architecture that can be utilized for modeling any kind of information (though it was originally intended for modeling and understanding software systems). The Knowledge Discovery Metamodel (KDM), for instance, separates

Software Modernization. It’s all we do!!!
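
As an illustration of the METADATA step, the e-mail fields described earlier (who wrote it, when, to whom, who received a copy, the Subject and the body) can be pulled into a machine-readable record. The following is a minimal sketch using Python's standard `email` package, not a description of our product's implementation; the sample message, its addresses and the `extract_metadata` helper are all hypothetical.

```python
from email import message_from_string
from email.utils import getaddresses, parsedate_to_datetime

# Hypothetical sample message; every name and address here is made up.
RAW_EMAIL = """\
From: Alice Example <alice@example.com>
To: Bob Example <bob@example.com>
Cc: carol@example.com
Subject: Q3 planning minutes
Date: Mon, 06 Jan 2020 09:30:00 +0000
Content-Type: text/plain

Minutes are posted at https://example.com/minutes for review.
"""


def addresses(msg, header):
    """Return the bare e-mail addresses found in one header field."""
    return [addr for _, addr in getaddresses(msg.get_all(header, []))]


def extract_metadata(raw):
    """Turn one raw e-mail into a dictionary a machine can index."""
    msg = message_from_string(raw)
    date_header = msg["Date"]
    return {
        "from": addresses(msg, "From"),
        "to": addresses(msg, "To"),
        "cc": addresses(msg, "Cc"),
        "subject": msg.get("Subject", ""),
        # ISO 8601 timestamps sort and compare easily; None if no Date header.
        "sent": parsedate_to_datetime(date_header).isoformat() if date_header else None,
        # This sketch assumes a single-part, plain-text message.
        "body": msg.get_payload().strip(),
    }


metadata = extract_metadata(RAW_EMAIL)
for field, value in metadata.items():
    print(f"{field}: {value}")
```

A record like this could then be categorized against a vocabulary and taxonomy; multipart messages and attachments would need additional handling that this sketch leaves out.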
1. ENTITY EXTRACTION: Focus on identifying named entities.
2. FACT EXTRACTION: Focus on fact patterns and detecting relationships between data using "inference".

The retrieved information will be focused and relevant to the "request for information". It will be actionable intelligence.
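
The two extraction steps above can be sketched in a few lines. This is a toy illustration only: it assumes that capitalized multi-word phrases are named entities and that facts follow two fixed sentence patterns, whereas a production system would rely on proper natural-language processing. The sample text, the pattern names and the single inference rule are all hypothetical.

```python
import re

# Hypothetical sample text for illustration.
TEXT = "Alice Smith emailed Bob Jones on 2020-01-06. Bob Jones works for Acme Corp."

# Toy assumption: a named entity is two or more capitalized words in a row.
NAME = r"[A-Z][a-z]+(?: [A-Z][a-z]+)+"
NAME_RE = re.compile(rf"\b{NAME}\b")
DATE_RE = re.compile(r"\b\d{4}-\d{2}-\d{2}\b")

# Toy fact patterns: "<Name> emailed <Name>" and "<Name> works for <Name>".
FACT_PATTERNS = {
    "emailed": re.compile(rf"({NAME}) emailed ({NAME})"),
    "works_for": re.compile(rf"({NAME}) works for ({NAME})"),
}


def extract_entities(text):
    """Step 1: identify named entities (names and ISO dates)."""
    return {
        "names": sorted(set(NAME_RE.findall(text))),
        "dates": DATE_RE.findall(text),
    }


def extract_facts(text):
    """Step 2a: match fact patterns as (subject, relation, object) triples."""
    facts = set()
    for relation, pattern in FACT_PATTERNS.items():
        for subject, obj in pattern.findall(text):
            facts.add((subject, relation, obj))
    return facts


def infer(facts):
    """Step 2b: one inference rule. If A emailed B and B works for C,
    then A contacted organization C."""
    inferred = set()
    for a, rel1, b in facts:
        for c, rel2, d in facts:
            if rel1 == "emailed" and rel2 == "works_for" and b == c:
                inferred.add((a, "contacted", d))
    return inferred


entities = extract_entities(TEXT)
facts = extract_facts(TEXT)
inferred = infer(facts)
print(entities)
print(sorted(facts))
print(sorted(inferred))
```

The inferred triple is a relationship that no single sentence states outright, which is the kind of result that fact extraction with inference is meant to surface.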

PACKAGING & DELIVERY ENGINE

The retrieved information has to be packaged and delivered to the seeker of information using the right channel. The request can come from one of many channels, such as interactive search, interactive browsing, web services, or well-defined APIs. The results are pushed back through the same channel. There is also more than one way of representing the results: textually or visually, through spatial diagrams or mind maps.

Figure-2 is a schematic representing the methodology.

PRACTICAL APPLICATIONS

Apart from the holistic application of this methodology across an enterprise for rich productivity gains, greater revenue and informed decisions, this methodology also has many smaller practical applications on limited sets of data.

E-DISCOVERY FROM EMAILS

Email has become the standard for both internal and external communication. A company's email contains important, and sometimes confidential, information that is today increasingly going into massive e-mail archives, whether to comply with mandatory government regulations or for information archival.

E-discovery refers to discovery in civil litigation that deals with information in electronic format, also referred to as Electronically Stored Information (ESI). Emails can be a prime source of information in civil litigation.

Financial and other firms subject to Sarbanes-Oxley regulatory compliance need effective e-discovery mechanisms from their e-mail archives and other documents.

Our solution can help you implement a powerful information retrieval mechanism for email archives. Powerful e-mail analytics can provide never-before-discovered intelligence from plain company emails. The resulting capabilities include the following, and more:
• Advanced search capabilities to find specific records within your complete and secure archive.
• Locate and produce evidence-quality messages with metadata in seconds.
• Analyze a complete audit trail for every message.
• Review and classify every message (based on your company's rules and permissions) that leaves or enters your organization's domains.
• Classify messages for legal hold easily when a court or counsel requests that all data relevant to a particular case be preserved.
• Use e-mail analytics designed for complex litigation or investigative matters.
• Search for and identify key individuals and assess their relationships and communication patterns. The Activity Schematic in Figure-3 displays communication patterns with a key individual placed in the center, and e-mail correspondents connected with radial spokes.

Fig - 2
Fig - 3

• Timeline View can be produced as a horizontal timeline to help assess critical time periods in the matter under investigation.
• E-mail Analytics help you to easily identify communications of the key players for substantive review.

We can transform your email archive into rich actionable intelligence.

OTHER REGULATORY COMPLIANCE

Regulatory compliance also weighs down life science companies (pharmaceutical, biotech and medical device companies). FDA regulations pertaining to clinical trials, manufacturing processes and drug discovery require similar diligence in preserving “evidence” for a stipulated period of time. Such evidence is also contained in unstructured data items like e-mails and documents. Our methodology equips the company and auditors with reliable and quick e-discovery processes, apart from other proactive compliance monitoring functions of interest to the company.

The Lifeblood of the Enterprise is information. The information economy thrives and survives on information.

RESEARCH AND DEVELOPMENT

Pharmaceutical companies engaged in drug development, for instance, can benefit from every bit of better intelligence and every minute of human effort saved. Drug development activities for a single product can span over ten years and involve collaboration from a wide range of professionals like research scientists, pharmacologists, chemists, biologists, chemical engineers, production floor specialists, clinical trial units and others. The information flow amongst these diverse entities, located in diverse geographical locations, has a very large share of “unstructured” data.

LAW FIRMS

Law firms try to make sense of unstructured data every single minute of their existence. With the expanding Internet-driven universe, making sense out of information overload and using the results meaningfully for their clients’ benefit is an ever-expanding challenge.

CONTENT PUBLISHERS

Companies engaged in any kind of publishing, especially delivery of content over the Internet, are competing for differentiation in search capability.

Content metadata is most important, as it is indispensable for setting up the catalogs, fact patterns and indexes that, in turn, can translate into accuracy of information delivered.

INTELLIGENCE & LAW ENFORCEMENT

Especially in this age of rampant terrorism, proactive prevention of crimes is a top priority. The huge world of “unstructured” data constantly evolving on the Internet is a rich source of intelligence and alerts, but too vast for manual processing and/or informal methods.

A methodology such as ours can effectively harness and deliver untold value from the Internet.
