Unstructured BI in pharmaceutical company


A critical discussion on the statement "Enterprises today have access to large amounts of information from internal as well as external sources. The information typically comes in either structured or less structured forms. However, enterprises generally do not make the best use of the information they have access to, tending instead to focus on just internal structured data generated by core transactional systems"

Published in: Technology, Business

K6221 Business Intelligence Mini Assignment
K6221 Business Intelligence 2011-2012 Mini Assignment
Sesagiri Raamkumar Aravind (G1101761F)
Mane Shivaji Dilip Kumar (G1101841A)

“Enterprises today have access to large amounts of information from internal as well as external sources. The information typically comes in either structured or less structured forms. However, enterprises generally do not make the best use of the information they have access to, tending instead to focus on just internal structured data generated by core transactional systems.”

Statement Elucidation
As per the problem statement, even though enterprises have access to a plethora of relevant information, they make good use only of the structured data coming from traditional OLTP systems. Internal and external unstructured data is not leveraged for making business decisions. Wittles (n.d.) asserts that only 20% of an organization's data is structured and ready for use in BI data analysis; the remaining 80% is unstructured. The significance of unstructured data is therefore highly underestimated in most enterprises.

Scenario
The authors opt to critically discuss the problem statement based on a particular scenario. The scenario is “‘Marketing Director’ of a major pharmaceutical company monitoring the performance of a newly launched potential blockbuster drug in the Asia Pacific region (excluding Japan).”

Discussion
Large enterprises today rely on enormous and complicated information systems to fuel their growth and to support their daily operations and sustainability. The amount spent on such systems reaches billions in certain companies.
In our scenario, pharmaceutical companies inevitably rely on unstructured data to lead the race against competitors, as studies show that the average company makes decisions based on data that is 14 months old. It has become clear that companies that can make faster decisions will lead their particular market. Strategic adoption of IT systems is critical, as it has a direct impact on the research, development and sales of drugs (Dave). Enterprises have reached a stable stage with respect to setting up BI infrastructure that can handle internal data extracted from different sources such as ERP and CRM systems. Enterprise data warehouses (EDW) are updated on a daily basis with transactional data coming from different regions. Data from the EDW cascades to region- and domain-specific data marts and ODS to meet local reporting needs. Overall, the EDW provides a good canvas for supporting the transactional and historical reporting needs of MIS, ESS and DSS systems.

A product launch is a major make-or-break event for a pharmaceutical company, as it feels the push to generate revenue through short-term and long-term strategies so as to fund further R&D activities. A marketing director cannot afford to rely entirely on transactional data for making sound business decisions. These decisions are made to increase the visibility and saleability of the new drug in a particular market. As part of the job, the marketing director would expect to get information about different aspects. Table 1.1 provides the details.
Sl. No | Information | Source | Type | Readily Available | Remarks
1 | Sales of drug in each market (split-up by day, region, distributor etc) | Internal | Structured | Y | Assumption that internal DSS has data from all markets at required frequency
2 | Marketing Cost in each market (by media); this includes free samples | Internal | Structured | Y | Assumption that internal DSS is integrated with CRM systems
3 | Perception about the drug from Doctors, Sales Personnel, Marketing staff, other internal staff and general public | Internal and External | Unstructured | N | Can be got only after collation from different sources
4 | Market Share of new drug by value and volume by each market on comparison to other competitor drugs from same therapeutic area | External | Structured | N | Can be got at end of every quarter from market intelligence firms such as IMS
5 | Actuals vs Budget and Actuals vs Forecast comparison by each market | Internal | Structured | Y | Assumption that internal DSS has data from all markets at required frequency
6 | Details about dept level decisions recorded in documents | Internal | Unstructured | Y and N | Y because readily available in repository and N because not in integrated state

Table 1.1: Valuable information for pharmaceutical company during drug launch

It is clear that information about some important aspects is of unstructured format. Examples of unstructured data in an enterprise are HTML content (e.g. web chat, blogs and web pages), Documents (e.g. memos, research papers, MoMs and articles), Forms (e.g. patent applications), Emails, SMS content and Multimedia content (audio, video, images) (Ferguson, 2011; McCallum, 2005; SPSS, 2003).

Decision makers in a company have to rely on facts to make sound business decisions. The availability of sufficient and timely facts can help in the process. In this case, the Marketing Director should be able to pull the required data and the system should have the mechanism to push specific information as well.
A distinction is made between data and information because only information should be pushed to users, who will not have time to analyze plain facts without any context. Typical examples applicable to this case are listed below.

Pull data: Sales & Expenses data, Market share, and Supply chain inventory data.

Push information: Supply chain deficiencies, summarized content delivered from analytics systems about sentiment and opinion on the new drug from internal and external social media platforms, flash updates on sales, and libel cases on the new drug from the FDA and other sources.

The Push type of information is mostly of unstructured format, which justifies its importance. The characteristics of unstructured data are visibly and intrinsically different from those of transactional data. The differentiating factors are mainly related to representation, source, context, understandability, timeliness and shelf-life. In general, the characteristics of unstructured data are:
  • Does not reside in relational database tables.
  • Has no predefined structure or format.
  • Not arranged in any order.
  • Difficult to categorize for use in BI.
  • Resides in several documents over multiple sources:
      • Internal (data within an organization)
      • External (data outside the organization)

These characteristics make it difficult for technical personnel to store and catalog unstructured data in an Enterprise Data Warehouse (EDW), apart from the inherent difficulty in capturing the required data. The heterogeneous nature of the sources adds to the complexity. Typical sources of unstructured data include Email archives, Call center transcripts, Customer feedback databases, Enterprise intranets, Enterprise content management systems, File systems, Document management systems, Social networking sites and RSS Newsfeeds (Ferguson, 2011, p. 6).

There are techniques for capturing and utilizing unstructured data. Crawlers can be used to capture relevant information from the enterprise data ecosystem, social media sites and the WWW. The captured information is then tagged and indexed for retrieval purposes. The final stage is knowledge discovery, which involves text mining and web mining (popularly called content analytics) to derive insight for business benefit.

An ideal BI system should provide the ability to create Enterprise Mashups. Mashups integrate information and functionality from different sources to create new services. These kinds of applications suit agile development projects and therefore suit our scenario, where data from different sources must be viewed together to help in making decisions. However, there are a few challenges. Choosing the right information sources among the unstructured data and devising content sifting mechanisms are some known challenges.
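The tag-and-index and content analytics stages described above can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the sample documents stand in for crawler output, the word lists are a toy sentiment lexicon, and the function names are hypothetical. A real content analytics product would use far richer text mining than word counting.

```python
import re
from collections import defaultdict

# Hypothetical captured documents standing in for crawler output (assumption).
DOCUMENTS = [
    {"id": 1, "source": "internal",
     "text": "Doctors report the new drug works well and patients are happy."},
    {"id": 2, "source": "external",
     "text": "Forum users complain about side effects of the new drug."},
    {"id": 3, "source": "internal",
     "text": "Sales staff say distributors are happy with supply."},
]

# Toy sentiment lexicon, for illustration only (assumption).
POSITIVE = {"well", "happy", "effective"}
NEGATIVE = {"complain", "poor", "unsafe"}


def tokenize(text):
    """Split text into lowercase alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def build_index(documents):
    """Tag and index stage: map each token to the ids of documents containing it."""
    index = defaultdict(set)
    for doc in documents:
        for token in tokenize(doc["text"]):
            index[token].add(doc["id"])
    return index


def search(index, *terms):
    """Retrieval: ids of documents that contain every given term."""
    sets = [index.get(term.lower(), set()) for term in terms]
    return set.intersection(*sets) if sets else set()


def sentiment_score(text):
    """Naive mining step: positive word count minus negative word count."""
    tokens = tokenize(text)
    return sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)


def summarize(documents, index, *terms):
    """Aggregate sentiment per source for documents matching the terms."""
    hits = search(index, *terms)
    summary = defaultdict(int)
    for doc in documents:
        if doc["id"] in hits:
            summary[doc["source"]] += sentiment_score(doc["text"])
    return dict(summary)


if __name__ == "__main__":
    index = build_index(DOCUMENTS)
    print(search(index, "drug"))
    print(summarize(DOCUMENTS, index, "drug"))
```

A per-source summary of this kind is the sort of condensed "push" information that a marketing director could read at a glance, as opposed to the raw documents themselves.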
Mashups are an emerging trend that is here to stay, as they provide a one-stop shop for decision makers.

Future considerations for handling unstructured data
  • Ensuring that user content is accurately tagged.
  • Ensuring that content is up to date and relevant.
  • Validating content sources.
  • Identifying business drivers to get the best solution.
  • Allocating adequate processing power to analytics to address scalability issues.

Figure 1 gives a pictorial representation of the current usage of BI in pharmaceutical companies and the neglected blue ocean segment of unstructured data BI.
Fig 1: Usage of Business Intelligence in a pharmaceutical company

Conclusion
Enterprises are aware of the importance of unstructured data in the current-day scenario, but they fail to leverage it due to technical (capturing and storing) and logical (classification and integration) constraints. This situation is bound to improve with best practices and simpler technical processes. The value of investment in Content Analytics and Enterprise Mashups will be realized in the long run.

References
Wittles, G. (n.d.). Unstructured data offers a vast store of untapped BI value. Retrieved from http://www.themanager.org/strategy/Unstructured_data.htm
Dave, W. (n.d.). Unstructured data in life sciences. Retrieved from http://blogs.hds.com/storagestat/2011/11/unstructured-data-in-life-sciences.html
Ferguson, M. (2011). Integrating and analyzing unstructured data. Info 360 BI Conference. Washington DC.
McCallum, A. (2005). Information Extraction. Retrieved 17 February 2011 from http://people.cs.umass.edu/~mccallum/papers/acm-queue-ie.pdf
SPSS. (2003). Meeting the challenge for text: Making text ready for predictive analysis. Chicago.
Grimes, S. (n.d.). Nimble intelligence: Enterprise BI mashup best practices. Retrieved from http://www.jackbe.com/downloads/nimblebi_grimes.pdf