Business Intelligence 2011



How to release the value locked in isolated data to assist business discovery



Upload Details

Uploaded via as Microsoft PowerPoint

Usage Rights

© All Rights Reserved

  • DIRECTV thrives with active data warehousing: Data latency less than 15 minutes so as to measure customer churn promptly.
  • Unlike a bar graph, the quantitative scale of a line graph need not begin at zero, but it can be narrowed to a range of values beginning just below the lowest and just above the highest values in the data, thereby filling the data region of the graph and revealing greater detail.
  • Scientific articles are tailored to present information in human-readable aliquots. Although the Internet has revolutionized the way our society thinks about information, the traditional text-based framework of the scientific article remains largely unchanged. This format imposes sharp constraints upon the type and quantity of biological information published today. The next challenge is to integrate this vast and ever-growing body of information with academic journals and other media. A schematic illustration of the proposed Structured Digital Abstract for a single genetics article [19]. This document – a machine-readable summary of pertinent findings arranged for simple database deposit – would be coded in XML and submitted alongside the manuscript for final publication. Inset: the same information presented in a hierarchical text-based format, similar to the final arrangement in the actual XML document.
  • Scientific innovation depends on finding, integrating, and re-using the products of previous research. Here we explore how recent developments in Web technology, particularly those related to the publication of data and metadata, might assist that process by providing semantic enhancements to journal articles within the mainstream process of scholarly journal publishing. We exemplify this by describing semantic enhancements we have made to a recent biomedical research article taken from PLoS Neglected Tropical Diseases, providing enrichment to its content and increased access to datasets within it. These semantic enhancements include provision of live DOIs and hyperlinks; semantic markup of textual terms, with links to relevant third-party information resources; interactive figures; a re-orderable reference list; a document summary containing a study summary, a tag cloud, and a citation analysis; and two novel types of semantic enrichment: the first, a Supporting Claims Tooltip to permit “Citations in Context”, and the second, Tag Trees that bring together semantically related terms. In addition, we have published downloadable spreadsheets containing data from within tables and figures, have enriched these with provenance information, and have demonstrated various types of data fusion (mashups) with results from other research articles and with Google Maps. We have also published machine-readable RDF metadata both about the article and about the references it cites, for which we developed a Citation Typing Ontology, CiTO (http://purl.org/net/cito/).
  • The major contribution of BioPAX to e-Science is that it provides a single conceptual framework for the various multiple conceptualizations of pathway databases, i.e. metabolic, molecular interaction, signal transduction and regulatory pathways. It also provides a common format.

Business Intelligence 2011: Presentation Transcript

  • Can Business be Intelligent?
    ◦ Today’s business is in an age of dramatic change. Business Intelligence (BI) is an interactive process by which companies promptly discern trends and patterns in business operations, products, services, customers, markets and competitors, and thereby derive insights and draw conclusions.
    ◦ Human brains are extremely good at integrating separate data (even almost forgotten data) with the current scenario to make the best possible decision. Corporations’ decision systems have a long way to go to be even nearly as efficient.
    ◦ It takes a combination of technology, art and human intelligence to surface the value under the data sea efficiently.
    ◦ Contents of the study:
      ▪ BI System
      ▪ BI Data Flow Architecture
      ▪ BI Development Process
      ▪ BI and Data Preparation
      ▪ BI and Data Visualization
      ▪ BI and Dashboard Design
      ▪ BI and Web Analytics
      ▪ BI and Social Network
      ▪ BI and Semantic Technologies
      ▪ BI and Algorithm
  • Pressures-Responses-Support Model (Turban, 2010)
    ◦ Business Environment (Pressures, Opportunities): Globalization, Customer Demand, Market Conditions, Competition, Technology Advance, Regulations, …
    ◦ Organization Responses: Strategic Planning, New Business Models, Restructure Business Processes, Choose New Vendors, Improve Partnership Relationships, Improve Information Systems, Encourage Innovation, Improve Customer Service, Improve Communication, Improve Data Access, Automate Tasks, Real-time Response, …
    ◦ Decision and Support (Business Intelligence Support): Analysis, Predictions, Decisions
  • Brief History of BI (Hammerbacher, 2009)
    ◦ 1958: Hans Peter Luhn published the paper “A Business Intelligence System” in the IBM Journal of Research and Development.
      ▪ Intelligence is “the ability to apprehend the interrelationships of presented facts in such a way as to guide action towards a desired goal.”
    ◦ 1983: Teradata sold the first relational database management system (RDBMS) designed specifically for decision support to Wells Fargo.
    ◦ 1992: Bill Inmon published the book “Building the Data Warehouse” (Wiley).
    ◦ 1995: The Data Warehousing Institute (TDWI) was formed.
    ◦ 1996: Ralph Kimball published the book “The Data Warehouse Toolkit: practical techniques for building dimensional data warehouses”.
      ▪ Business units build their own data “marts”, which can be connected with a “bus”.
    ◦ 1996: Jim Gray published the article “Data Cube: A relational aggregation operator generalizing group-by, cross-tab, and sub-totals.”
      ▪ Supports OLAP (online analytical processing).
  • Goal of BI
    ◦ Goal: Better, Quicker Business Decision-Making
    ◦ Historical, Current and Predictive Views of Business Operations
    ◦ Functions: Benchmarking, Performance Management, Reporting, Analytics, Data Mining, Predictive Analytics
    ◦ Internal Data: Finance, R&D, Supply & Production, Customer & Sales, Usage
    ◦ External Data: Industry Analysis, Competitor Status, User Analysis, Product Ranking, Technology Analysis
  • Common Pitfalls of Current System
    ◦ Reporting data from departments are fragmented (e.g. in Excel/PDF files).
    ◦ Manual extraction is prevalent.
    ◦ Updating frequency is relatively low, usually monthly.
    ◦ Analysts often spend more time on data collection than on data analysis. Developers spend precious time on manual data feeding instead of improving products and services.
    ◦ “Information silo problem”: rich information at the source is not easily accessible, or even known, to users.
    ◦ If it is a sin to leave useful data unused or underused, then most organizations in the business world, if not all, are guilty of it. The waste is tremendous.
    ◦ Diagram labels: Financial, Supply, Production, Subscription, Usage, Industry Analysis, Competitor Status, Executives
  • Target System
    ◦ Internal Data: Finance, R&D, Supply & Production, Subscription & Sales, Usage
    ◦ External Data: Industry Analysis, Competitor Status, User Analysis, Product Ranking, Technology Analysis
    ◦ Automatic and Integrated System: Overnight/Real-time Data Collection, Cross-department Data Integration, Business Metrics Calculation, Statistical Analysis
    ◦ Categorized, Top-down Business Views: Departmental, Product, Customer Relation
    ◦ Audiences: Executives (Strategic Analysis), Managers (Tactical Analysis), Knowledge Workers (Operational Analysis)
  • What BI is Not
    ◦ BI is not a panacea for a poor or outdated information system.
      ▪ If information is incomplete because some pieces still live in manually edited text files, it is better to change the business process, move all the information into proper data systems and automate the business logic.
      ▪ If the information is fragmented because there are no unique, well-formatted keys to link it, it is better to improve the production system with well-designed keys.
    ◦ BI is not just a collection of charts or tables.
      ▪ BI is supposed to transform data into information.
      ▪ BI is supposed to link information together to provide insights and assist discovery.
      ▪ BI is supposed to support both information aggregation and drilldown.
      ▪ BI is supposed to support “information retrieval”, i.e. search capability.
      ▪ Replicating Excel charts and tables in a BI system often results in static or mediocre reports.
  • Where does Intelligence come from in BI?
    ◦ A BI system organizes and visualizes information so well that human intelligence can be fully engaged in analyzing it efficiently.
    ◦ People put analysis methods and knowledge into the BI system, so it can behave like a “smart” expert following pre-defined logic.
    ◦ In a well-designed BI system, huge volumes of data can be linked together in a data network and manifest underlying relationships that would otherwise be hidden from human eyes.
    ◦ In a (near) real-time system, fresh data can reach decision makers’ fingertips so quickly that prompt steps can be taken before permanent damage is done, for example to retain customers who have just requested to cancel a service.
  • BI Vendor Examples
    ◦ QlikTech: QlikView, a flexible, nimble BI solution
    ◦ Microsoft: SQL Server + SharePoint + Excel Power Pivot + Silverlight
    ◦ Actuate: Business Performance Management (BPM), built on BIRT (an open source BI platform)
    ◦ Oracle: comprehensive platform
    ◦ SAS: Business Analysis, Forecast, and Data Visualization
    ◦ IBM Cognos: Corporate Performance Management (CPM)
    ◦ SAP: supports a software-as-a-service infrastructure
    ◦ Google: Google Analytics
    ◦ Information Builders: Customer Relationship Management (CRM)
  • QlikView
    ◦ Pros
      ▪ The click-driven, visually interactive interface is simple to learn and use.
      ▪ Based on in-memory associative technology, which is fast.
      ▪ Flexible data sources (Oracle, SQL, Excel, text files).
      ▪ Quicker to build compared with traditional BI systems.
    ◦ Cons
      ▪ Needs straightforward relationships among tables, which requires very clean data to link multiple tables.
      ▪ Its underlying calculation logic, set analysis, is not rigorous and is hard to use for complicated logic.
      ▪ Its script language is not complete enough to accomplish comprehensive tasks.
      ▪ All the data need to be in memory.
  • References
    ◦ Gray, J., Bosworth, A., Layman, A. & Pirahesh, H. (1996). Data cube: a relational aggregation operator generalizing group-by, cross-tab, and sub-totals. In Su, S. (Ed.). Proceedings of the 12th International Conference on Data Engineering (pp. 152-159). New York, NY: IEEE.
    ◦ Hammerbacher, J. (2009). Information platforms and the rise of the data scientist. In Segaran, T. & Hammerbacher, J. (Eds.). Beautiful Data, chapter 5. Sebastopol, CA: O’Reilly Media.
    ◦ Inmon, W. H. (1992). Building the data warehouse. New York, NY: Wiley and Sons.
    ◦ Kimball, R. & Ross, M. (1996). The Data Warehouse Toolkit: practical techniques for building dimensional data warehouses. New York, NY: Wiley and Sons.
    ◦ Luhn, H.P. (1958). A business intelligence system. IBM Journal of Research and Development, 2(4), 314-319.
    ◦ Turban, E., Sharda, R., Delen, D., and King, D. (2010). Business Intelligence: a managerial approach (2nd ed.). Upper Saddle River, NJ: Prentice Hall.
  • BI and Data Source
    ◦ Data Warehouse
      ▪ Description: A repository of an organization’s electronically stored data
      ▪ Pros: Integrated, validated, logic clearly defined
      ▪ Cons: Long time to build, expensive
    ◦ Data Mart, Staging Table
      ▪ Description: A subset of an organizational data store, usually oriented to a specific purpose
      ▪ Pros: Validated, logic clearly defined, easy to build
      ▪ Cons: Not fully integrated
    ◦ Production Database
      ▪ Description: Data extracted from the production system directly
      ▪ Pros: Real time, no extra storage
      ▪ Cons: Impact on production, data not validated, transformation limited
    ◦ Manually Edited File
      ▪ Description: Files maintained by information workers
      ▪ Pros: Flexible, cheap
      ▪ Cons: Prone to human error, lack of details
  • Data Warehouse and ETL (Moss, 2003)
    ◦ Sources: Oracle, MS SQL, Excel File, Text File, Web
    ◦ Stages: Extract, Transform, Load
    ◦ Operations shown in the diagram: Standardize Primary Keys, Cleaning, Transform Format, Translate Embedded Logic, Merge, Sort, Summarization, Derivation, Integration, Aggregation, Referential Integrity Check, Indexing
    ◦ Targets: BI Data Warehouse, BI System
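The extract, transform and load stages on this slide can be illustrated with a minimal sketch. The source rows, table names and the `standardize_key` helper below are hypothetical and are not taken from Moss (2003); an in-memory SQLite database stands in for the data warehouse.

```python
import sqlite3

# Hypothetical rows extracted from two sources with inconsistent key formats.
SALES_ROWS = [
    {"cust_id": " c001 ", "amount": "120.50"},
    {"cust_id": "C002", "amount": "80"},
    {"cust_id": "c999", "amount": "15"},  # refers to an unknown customer
]
CUSTOMERS = [
    {"cust_id": "C001", "name": "Acme"},
    {"cust_id": "C002", "name": "Globex"},
]


def standardize_key(raw):
    """Transform step: trim whitespace and upper-case the primary key."""
    return raw.strip().upper()


def run_etl(sales_rows, customers):
    """Transform the extracted rows and load them into an in-memory warehouse."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customer (cust_id TEXT PRIMARY KEY, name TEXT)")
    conn.execute("CREATE TABLE sale (cust_id TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO customer VALUES (?, ?)",
        [(standardize_key(c["cust_id"]), c["name"]) for c in customers],
    )
    known = {standardize_key(c["cust_id"]) for c in customers}
    rejected = []
    for row in sales_rows:
        key = standardize_key(row["cust_id"])
        if key not in known:  # referential integrity check
            rejected.append(row)
            continue
        conn.execute("INSERT INTO sale VALUES (?, ?)", (key, float(row["amount"])))
    conn.execute("CREATE INDEX idx_sale_cust ON sale (cust_id)")  # indexing
    conn.commit()
    return conn, rejected


conn, rejected = run_etl(SALES_ROWS, CUSTOMERS)
# Aggregation step a BI report might run against the loaded data.
totals = conn.execute(
    "SELECT cust_id, SUM(amount) FROM sale GROUP BY cust_id ORDER BY cust_id"
).fetchall()
```

Rows that fail the integrity check are returned rather than silently dropped, so they can be reported back to the data owner.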
  • Data Flow Architecture: Building Data Mart
    ◦ Each department offers aggregated data in staging tables, or the BI system queries production/standby tables directly.
    ◦ The BI system integrates the data and generates reports.
    ◦ Pro: Quick to build
    ◦ Con: Data not fully integrated
    ◦ Diagram: Submission, Production, Sales, Usage, Financials and External Data staging tables feed the BI System.
  • Data Flow Architecture: Building Data Warehouse
    ◦ Each department offers raw or aggregated data in staging tables and pushes the data to a central database repository.
    ◦ The BI system pulls data and results from the data warehouse and generates reports.
    ◦ Pro: Deep data integration and complicated analysis can be realized efficiently.
    ◦ Con: Long time to build
    ◦ Diagram: Submission, Production, Sales, Usage, Financials and External Data staging tables feed the Data Warehouse, which feeds the BI System.
  • Data Flow Architecture: Hybrid Design
    ◦ Start with a data mart.
    ◦ Gradually build the data warehouse.
    ◦ Pro: The data mart is quick to build, and the design eventually gains the advantages of a data warehouse.
    ◦ Con: Complicated process.
    ◦ Diagram labels: Central Staging Repository, Production Staging Tables, Sales Summary Tables, External Data Staging Tables, Usage Staging Tables, Financials Staging Tables, Submission Staging Tables, BI System
  • Facebook’s Dataspace Management with Open Source Tools (Hammerbacher, 2009)
    ◦ All data from the enterprise, structured and unstructured: Transactional Databases, Application Logs, Web Crawls (Post)
    ◦ Hadoop Distributed File System (HDFS): parallelized data processing at massive scale; 15 terabytes of new data per day in 2009
    ◦ Tools
      ▪ Hive: data warehousing framework with a query language
      ▪ HiPal: query UI
      ▪ Argus: portal for sharing charts and graphs
      ▪ Databee: workflow management system
      ▪ PyHive: Python script framework for MapReduce
      ▪ Cassandra: storage system for serving data to end users
  • References
    ◦ Hammerbacher, J. (2009). Information platforms and the rise of the data scientist. In Segaran, T. & Hammerbacher, J. (Eds.). Beautiful Data, chapter 5. Sebastopol, CA: O’Reilly Media.
    ◦ Moss, L. T. & Atre, S. (2003). Business intelligence roadmap: the complete project lifecycle for decision-support applications. Boston, MA: Addison-Wesley.
  • Heavyweight Development Process (Moss, 2003)
  • Agile Development Process
    ◦ Plan: Business Goals, KPI Analysis, Data Sources, Calculation Logic
    ◦ Data ETL: Extraction, Transformation, Loading
    ◦ Design: Report Layout, Data Visualization
    ◦ Validation: Data, Logic
    ◦ Feedback: New Requirements
    ◦ Phased Release
      ▪ Important KPIs first.
      ▪ Well-connected data first.
    ◦ Quick Feedback
      ▪ Design
      ▪ Data
      ▪ Logic
  • Challenges of BI Management
    ◦ A BI project cuts across all departments, so winning their cooperation is the key to its success.
    ◦ BI development often encounters unexpected issues. Forcing a deadline may produce low-quality reports; relaxing the due date too much may stall the project.
    ◦ A BI system is very efficient at exposing data abnormalities. If data owners and suppliers treat the process as a rare opportunity to fix data at the source, a cleaner data system can be an excellent bonus of a BI project.
  • References
    ◦ Moss, L. T. & Atre, S. (2003). Business intelligence roadmap: the complete project lifecycle for decision-support applications. Boston, MA: Addison-Wesley.
  • Data Connection and Naming Issues (Segaran, 2009)
    ◦ Naming issues when linking data
      ▪ The same thing with different names
      ▪ Different things with the same name
    ◦ Possible solutions
      ▪ Matching on multiple fields: choose a set of parameters and create a set of fixed rules that decide whether two records match.
      ▪ Collective reconciliation: take advantage of the full network of data for record matching.
  • Matching on Multiple Fields
    ◦ Set up matching rules:
      ▪ Rule 1: First Name, Last Name, Country, Organization, Department
      ▪ Rule 2: Email, Last Name
    ◦ Submitting author
      ▪ Author: Tufte, Ed
      ▪ Country: US
      ▪ Organization: Princeton
      ▪ Department: Politics
      ▪ Email: etufte@priceton.edu
    ◦ Author profile
      ▪ Author: Edward R. Tufte
      ▪ Country: United States
      ▪ Organization: Princeton University
      ▪ Department: Political Science
      ▪ Email: etufte@priceton.edu
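The fixed-rule matching on this slide could be sketched as follows. The field names, the `normalize` helper and the exact-equality comparison are assumptions made for illustration; the slide only lists which fields each rule uses.

```python
# Each rule is a tuple of fields that must all agree (after normalization).
RULES = [
    ("first_name", "last_name", "country", "organization", "department"),
    ("email", "last_name"),
]


def normalize(value):
    """Lower-case and collapse whitespace so trivial differences are ignored."""
    return " ".join(value.lower().split())


def first_matching_rule(record_a, record_b, rules=RULES):
    """Return the first rule whose fields all match, or None if no rule fires."""
    for fields in rules:
        if all(
            record_a.get(f)
            and record_b.get(f)
            and normalize(record_a[f]) == normalize(record_b[f])
            for f in fields
        ):
            return fields
    return None


# The two records from the slide, split into hypothetical field names.
submitted = {
    "first_name": "Ed", "last_name": "Tufte", "country": "US",
    "organization": "Princeton", "department": "Politics",
    "email": "etufte@priceton.edu",
}
profile = {
    "first_name": "Edward", "last_name": "Tufte", "country": "United States",
    "organization": "Princeton University", "department": "Political Science",
    "email": "etufte@priceton.edu",
}

matched_rule = first_matching_rule(submitted, profile)
```

With these records, Rule 1 fails because the name, country, organization and department strings differ, while Rule 2 succeeds on email and last name.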
  • Collective Reconciliation
    ◦ Even when no single field matches perfectly for the submitting author, we can conclude that the records match by combining the similarity of multiple fields.
    ◦ Submitting author
      ▪ Author: Tufte, Ed
      ▪ Country: US
      ▪ Organization: Princeton
      ▪ Department: Politics
    ◦ Author profile
      ▪ Author: Edward R. Tufte
      ▪ Country: United States
      ▪ Organization: Princeton University
      ▪ Department: Political Science
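One simple way to combine the similarity of several fields, in the sense this slide uses the term, is to average a per-field string similarity and pick the best-scoring candidate. This is only a sketch: the use of `difflib.SequenceMatcher`, the unweighted average and the second candidate profile are assumptions, not the method described by Segaran (2009).

```python
from difflib import SequenceMatcher

FIELDS = ("author", "country", "organization", "department")


def field_similarity(a, b):
    """Similarity in [0, 1] between two strings, ignoring case and outer spaces."""
    return SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()


def record_similarity(record_a, record_b, fields=FIELDS):
    """Unweighted mean of the per-field similarities (an illustrative choice)."""
    return sum(field_similarity(record_a[f], record_b[f]) for f in fields) / len(fields)


submitted = {
    "author": "Tufte, Ed", "country": "US",
    "organization": "Princeton", "department": "Politics",
}
# The first profile is from the slide; the second is a made-up distractor.
profiles = [
    {"author": "Edward R. Tufte", "country": "United States",
     "organization": "Princeton University", "department": "Political Science"},
    {"author": "Jane Smith", "country": "Canada",
     "organization": "McGill University", "department": "Chemistry"},
]

scores = [record_similarity(submitted, p) for p in profiles]
best_profile = profiles[scores.index(max(scores))]
```

No single field scores close to 1.0 here, yet the combined score clearly separates the Tufte profile from the distractor. A production matcher would likely add abbreviation handling (for example "US" versus "United States") and per-field weights.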
  • Data Modeling: Event Chain
    ◦ Event chain: Submit, Editor Review, Peer Review, Production, Online, Downloads
    ◦ Dates recorded: Submit Date, Review Dates, Final Decision Date, Online Date, Download Date
    ◦ Separate dates make it easy to trace the history of articles in the system.
    ◦ A user can select a range of submit dates, and the charts of accepted and published articles will then include only the articles submitted in that period.
    ◦ The model is suitable for detailed analysis.
  • Data Modeling: Event List
    ◦ All the event dates and regions are consolidated in the event table.
    ◦ When a journal, a period or a region is selected, all the charts change to reflect the selection.
    ◦ The data model is suitable for a high-level overview.
    ◦ Event table fields: Event ID, Journal, Event Date, Event Region
    ◦ Event types: Submit, Editor Review, Peer Review, Production, Online, Usage
  • References
    ◦ Segaran, T. (2009). Connecting data. In Segaran, T. & Hammerbacher, J. (Eds.). Beautiful Data, chapter 20. Sebastopol, CA: O’Reilly Media.
  • Data Visualization (Iliinsky, 2010)
    ◦ Informative
      ▪ Reveals the intended message clearly with enough data
      ▪ Offers different perspectives to facilitate discovery
    ◦ Efficient
      ▪ Visually emphasizes what matters and reveals relationships
      ▪ Uses axes, color and size to convey meaning
    ◦ Novel
      ▪ Breaks the limits of default formats and chooses the format that best suits the data
      ▪ A fresh look at the data
      ▪ A new level of understanding
    ◦ Aesthetic
      ▪ Appropriate use of graphical construction to offer visual appeal
  • 1854 Cholera Epidemic in London (Tufte, 2001)
    ◦ The epidemic took the lives of 600 Londoners in September 1854. What was the cause?
    ◦ Dr. John Snow started by mapping the locations of the incidents.
  • Discovery seems so easy when the right information is put together (Tufte, 2001)
    ◦ Dr. John Snow then linked the incident locations to pump sites.
    ◦ It was later verified that the Broad Street pump was the cause of the epidemic.
  • 2008 Electoral Vote Results of Presidential Election (Nagourney, 2008)
    ◦ Issue: the geographically accurate map is actually a very inaccurate map of electoral influence.
  • 2008 Electoral Vote Results of Presidential Election, Revision (Iliinsky, 2010)
    ◦ Accurate and beautiful: a proportionally weighted electoral vote results map of the United States.
  • Mining and Visualizing Social Patterns (Krebs, 2010)
    ◦ From public data in a local newspaper: 18 women attending 14 different social events.
    ◦ The links between women are weighted by the number of events both women attended.
    ◦ Start with the strongest links to reveal clustering.
  • Mining and Visualizing Social Patterns (2) (Krebs, 2010)
    ◦ Gradual inclusion: focus initially on the strongest ties in the structure, then gradually lower the membership threshold to reveal weaker ties in the network.
    ◦ Very weak links are dismissed as social noise.
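The weighting and gradual-inclusion steps described on these two slides can be sketched on a toy data set. The attendance data below are invented (the original study has 18 women and 14 events), and grouping by connected components is an assumption about how clusters are read off the thresholded network.

```python
from itertools import combinations

# Hypothetical attendance: person -> set of events attended.
ATTENDANCE = {
    "Ann": {"e1", "e2", "e3"},
    "Bea": {"e1", "e2", "e3"},
    "Cat": {"e3", "e4"},
    "Dee": {"e4", "e5"},
    "Eve": {"e4", "e5"},
}


def tie_weights(attendance):
    """Weight each pair by the number of events both people attended."""
    weights = {}
    for a, b in combinations(sorted(attendance), 2):
        shared = len(attendance[a] & attendance[b])
        if shared:
            weights[(a, b)] = shared
    return weights


def clusters(nodes, weights, threshold):
    """Groups of people connected by ties with weight >= threshold."""
    parent = {n: n for n in nodes}

    def find(n):
        while parent[n] != n:
            parent[n] = parent[parent[n]]  # path halving
            n = parent[n]
        return n

    for (a, b), weight in weights.items():
        if weight >= threshold:
            parent[find(a)] = find(b)
    groups = {}
    for n in nodes:
        groups.setdefault(find(n), set()).add(n)
    # Isolated people are left out until a lower threshold links them in.
    return sorted(sorted(g) for g in groups.values() if len(g) > 1)


weights = tie_weights(ATTENDANCE)
# Gradual inclusion: start with the strongest ties, then lower the threshold.
by_threshold = {t: clusters(ATTENDANCE, weights, t) for t in (3, 2, 1)}
```

At threshold 3 only the strongest pair appears, at 2 a second cluster emerges, and at 1 the weak ties through "Cat" merge everything into one group, which is why the weakest links are often treated as noise.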
  • References
    ◦ Krebs, V. (2010). Your choices reveal who you are: mining and visualizing social patterns. In Steele, J. & Iliinsky, N. (Eds.). Beautiful Visualization, chapter 7. Sebastopol, CA: O’Reilly Media.
    ◦ Iliinsky, N. (2010). On beauty. In Steele, J. & Iliinsky, N. (Eds.). Beautiful Visualization, chapter 1. Sebastopol, CA: O’Reilly Media.
    ◦ Nagourney, A., Zeleny, J. & Carter, S. (2008). The electoral map: key states. The New York Times. Retrieved from http://elections.nytimes.com/2008/president/whos-ahead/key-states/map.html.
    ◦ Tufte, E. (2001). The Visual Display of Quantitative Information (2nd ed.). Cheshire, CT: Graphics Press.
  • Challenge of Dashboard Design (Few, 2006)
    ◦ “A dashboard is a visual display of the most important information needed to achieve one or more objectives; consolidated and arranged on a single screen so the information can be monitored at a glance.”
    ◦ “Most dashboards fail to communicate efficiently and effectively, not because of inadequate technology (at least not primarily), but because of poorly designed implementations.”
    ◦ “No matter how great the technology, a dashboard’s success as a medium of communication is a product of design, a result of a display that speaks clearly and immediately.”
    ◦ “Dashboards can tap into the tremendous power of visual perception to communicate, but only if those who implement them understand visual perception and apply that understanding through design principles and practices that are aligned with the way people see and think.”
    ◦ Unfortunately, most vendors focus their marketing efforts on flash and dazzle that subvert the goals of clear communication. “Once implemented, however, these cute displays lose their spark in a matter of days and become just plain annoying.”
  • Common Measures (KPIs) (Few, 2006)
    ◦ Measures by category
      ▪ Sales: Bookings, Billings, Sales pipeline, Number of orders, Order amounts, Selling prices
      ▪ Marketing: Market share, Campaign success, Customer demographics
      ▪ Finance: Revenues, Expenses, Profits
      ▪ Web Services: Number of visitors, Number of page hits, Visit durations
    ◦ Comparative measures, each with an example
      ▪ The same measure at the same point in time in the past: the same day last year
      ▪ The same measure at some other point in time in the past: the end of last year
      ▪ The current target for the measure: a budgeted amount for the current period
      ▪ A prior prediction of the measure: a forecast of where we expected to be today
      ▪ An extrapolation of the current measure: a projection into the future, e.g. year end
      ▪ Some measure of the norm for this measure: an average, normal range or benchmark
  • Non-Quantitative Dashboard Data (Few, 2006)
    ◦ Tasks that are behind schedule
    ◦ Tasks that need to be completed
    ◦ Accomplishments that should be highlighted
    ◦ Issues that need to be investigated
  • Utilize Short-Term Memory (Few, 2006)
    ◦ Memory comes in three fundamental types:
      ▪ Iconic memory (a.k.a. the visual sensory register)
      ▪ Short-term memory (a.k.a. working memory)
      ▪ Long-term memory
    ◦ Only 3-9 chunks of information can be stored in short-term memory.
    ◦ Graphs over text.
      ▪ Individual numbers are stored as discrete chunks.
      ▪ One or more lines in a line graph can represent a great deal of information as a single chunk.
    ◦ Relevant information on the same screen.
      ▪ Once information is no longer visible, it is no longer available unless it is one of the few chunks stored in short-term memory.
      ▪ If everything remains within eye span, users can exchange information in and out of short-term memory at lightning speed.
  • Information in Well-designed Dashboard (Few, 2006)
    ◦ Exceptionally well organized
      ▪ All important data on one page.
    ◦ Condensed, primarily in the form of summaries and exceptions
      ▪ Single numbers from sums or averages.
      ▪ Anything that falls outside the realm of normality and needs attention.
    ◦ Specific to and customized for the dashboard’s audience and objectives
      ▪ Information should be narrowed to address the objective(s).
      ▪ Use the audience’s vocabulary.
    ◦ Displayed using concise and often small media that communicate the data and its message in the clearest and most direct way possible
      ▪ Reduce the non-data pixels.
      ▪ Enhance the data pixels.
  • Reducing the Non-Data Ink (Few, 2006)
    ◦ When the non-data ink is removed or reduced, the data stand out more and it is easier to find trends or patterns among them.
  • Emphasize Most Important Data (Few, 2006)
    ◦ Different degrees of visual emphasis are associated with different regions of a dashboard.
    ◦ Information in the center is emphasized only when it is set apart from what surrounds it.
    ◦ Recent data often deserve a finer time scale than remote historical data.
    ◦ Visual attributes such as color, size, line width, enclosure and added marks can also be used to make important data stand out.
  • Effective Dashboard Display Media (Few, 2006)
    ◦ Easier to spot trends with a line chart
    ◦ Clean display of related data
    ◦ Simple symbol or number
  • Organize the display objects to reveal their intrinsic relationships (Few, 2006)
  • Sample Sales Dashboard (Few, 2006)
  • Add Interactivity to Dashboard
    ◦ Add a selection box so users can focus on a subset of the data.
  • When Dashboard is not Enough
    ◦ As soon as a dashboard shows abnormalities, users will often want to know more details about them.
    ◦ The responsible individual can be called on to provide the details, and may in turn query the database or ask IT staff to run the query. The process is long and resource consuming.
    ◦ Layered reports can provide top-down views:
      ▪ Layer 1: One-page dashboard
      ▪ Layer 2: More detailed aggregation, such as regional reports
      ▪ Layer 3: Data tables with all the details needed
    ◦ The data in detail views can be narrowed down from the top views, which offers a natural analysis flow.
  • References
    ◦ Few, S. (2006). Information Dashboard Design. Sebastopol, CA: O’Reilly Media.
  • BI and Web Analytics: So much data, still so few insights (Kaushik, 2010)
    ◦ The reason for so few actionable insights even with abundant web click data: the clickstream is about the “what”, not the “why”.
  • Web Analytics 2.0 (Kaushik, 2010)
  • Web Analytics Tools (Kaushik, 2010)
  • Metrics for Clickstream Analysis (Kaushik, 2010)
  • Top Questions to Answer (Kaushik, 2010)
    ◦ How many visitors come to my site?
    ◦ Where are visitors coming from?
      ▪ Direct traffic
      ▪ Referring sites
      ▪ Search engines: keywords
      ▪ Campaigns and paid ads
    ◦ What do I want visitors to do on my site?
    ◦ What are visitors actually doing?
      ▪ Top entry pages
      ▪ Top viewed pages
      ▪ Site overlay analysis (navigation analysis)
      ▪ Abandonment analysis
  • Typical Analysis Flow (Kaushik, 2010)
    ◦ Bounce rate of top search keywords
      ▪ Search keywords: users’ intent
      ▪ Bounce: not happy with what was found
      ▪ Q: Ranked for the wrong keyword?
      ▪ Q: Do the landing pages miss information?
    ◦ Site overlay (click density) analysis
      ▪ User behavior: % of clicks or conversions
      ▪ Also check days to convert
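The first step of this flow, bounce rate per search keyword, can be computed along these lines. The visit records are invented, and a bounce is assumed here to mean a visit that viewed exactly one page; analytics tools may define it differently.

```python
from collections import defaultdict

# Hypothetical visits: (search keyword that led to the site, pages viewed).
VISITS = [
    ("bi dashboard", 1),
    ("bi dashboard", 1),
    ("bi dashboard", 4),
    ("etl tools", 3),
    ("etl tools", 1),
    ("etl tools", 2),
    ("etl tools", 5),
]


def bounce_rate_by_keyword(visits):
    """Share of visits per keyword that viewed only a single page."""
    totals = defaultdict(int)
    bounces = defaultdict(int)
    for keyword, pages_viewed in visits:
        totals[keyword] += 1
        if pages_viewed == 1:
            bounces[keyword] += 1
    return {keyword: bounces[keyword] / totals[keyword] for keyword in totals}


rates = bounce_rate_by_keyword(VISITS)
# Keywords with the highest bounce rate are the first candidates to review.
worst_first = sorted(rates, key=rates.get, reverse=True)
```

A keyword with a high bounce rate prompts the two questions on the slide: is the site ranked for the wrong keyword, or does the landing page miss the information visitors expected?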
  • Source of Traffic Analysis: Who sends valued traffic? (Kaushik, 2010)
  • Module Click Analysis
    ◦ Pages using the same layout template share the same modules.
    ◦ Click analysis at the module level can reveal which modules are outperforming or underperforming.
    ◦ Clicks on link positions within each module can reveal more user behavior patterns.
    ◦ Diagram: many pages, same layout; how does performance compare across pages?
  • Scroll Percentage for Long Page
    ◦ 0-20% scrolled: 30%
    ◦ 20%-40% scrolled: 22%
    ◦ 40%-60% scrolled: 11%
    ◦ 60%-80% scrolled: 9%
    ◦ 80%-100% scrolled: 26%
  • Visitor Segmentation
    ◦ Segments: New Visitors, Casual Visitors, Loyal Visitors, Lapsed Visitors
    ◦ Questions for each segment
      ▪ What and how are they viewing?
      ▪ Why do they leave?
      ▪ How can we engage them more?
      ▪ How can we connect them?
    ◦ Growing the number of loyal visitors is essential to keep the site thriving.
    ◦ So it is important to understand their navigation patterns and what they like and dislike.
  • Consumption of Content
  • Navigation Flow Among Top Pages/Content (Adobe.com)
  • Navigation Flow to a Page (Adobe.com)
  • Navigation Flow from a Page (Adobe.com)
  • Markov Chain Analysis: Grouping Page Views for Behavior Analysis (Gwizdka, 2010)
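A first-order Markov chain over page views can be estimated from session logs by counting transitions between consecutive pages. This is a generic sketch with invented sessions and page-group names; it is not the specific procedure used by Gwizdka (2010).

```python
from collections import Counter, defaultdict

# Hypothetical sessions, each an ordered list of page groups viewed.
SESSIONS = [
    ["home", "search", "article"],
    ["home", "search", "search", "article"],
    ["home", "article"],
]


def transition_probabilities(sessions):
    """Estimate P(next page | current page) from consecutive page views."""
    counts = defaultdict(Counter)
    for pages in sessions:
        for current, following in zip(pages, pages[1:]):
            counts[current][following] += 1
    return {
        current: {
            following: n / sum(next_counts.values())
            for following, n in next_counts.items()
        }
        for current, next_counts in counts.items()
    }


probabilities = transition_probabilities(SESSIONS)
```

Here two of the three sessions go from "home" to "search", so that transition gets probability 2/3. Grouping individual URLs into a handful of page types first keeps the transition table small enough to read.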
  • Factors Influencing Satisfaction for Information Retrieval (Al-Maskari, 2010)
    ◦ System Effectiveness
      ▪ Measures how well a given IR system achieves its objective.
      ▪ Precision (relevant documents retrieved / total retrieved documents)
      ▪ Recall (relevant documents retrieved / total relevant documents in database)
    ◦ User Effectiveness
      ▪ Measures the accuracy and completeness with which users achieve certain goals.
      ▪ Number of tasks successfully completed
      ▪ Number of relevant documents obtained
      ▪ Time taken by users to complete set tasks
    ◦ User Effort
      ▪ Measures users’ effort to get relevant information.
      ▪ Number of clicks
      ▪ Number of queries and query reformulations
      ▪ Rank position accessed
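The two system-effectiveness formulas above translate directly into code. The document identifiers in the example are made up for illustration.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query.

    precision = relevant documents retrieved / total retrieved documents
    recall    = relevant documents retrieved / total relevant documents
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical query: 4 documents returned, 5 relevant documents in the database.
precision, recall = precision_recall(
    retrieved=["d1", "d2", "d3", "d4"],
    relevant=["d2", "d4", "d5", "d6", "d7"],
)
```

Two of the four retrieved documents are relevant (precision 0.5), and those two cover two of the five relevant documents (recall 0.4).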
  • See Users’ Experience by Visual Replay of HTML Stream
    ◦ Tealeaf is one of the tools that record all the dynamically generated HTML at the network level and store it for later searching and visual replay.
    ◦ Source: http://www.tealeaf.com/products/real-time-customer-experience-management.php (accessed on Dec 6th, 2011)
  • See Users’ Joys and Tears by Visual Replay of What Users Saw and Their Actions
    ◦ Such case studies can help to understand the reasons behind the summarized numbers.
    ◦ Source: http://www.tealeaf.com/products/real-time-customer-experience-management.php (accessed on Dec 6th, 2011)
  • Web Detective: Solve the web mysteries
    ◦ The third-party payment system received payment for one candy and forwarded the user to the application server to receive a receipt.
    ◦ The server stored the order as two candies and printed a receipt for two candies.
  • Web Detective: Replaying the web session can reveal the true culprit
    ◦ 10:00 Open Tab 1 and add one candy.
    ◦ 10:05 Open Tab 2 and add second candy.
    ◦ 10:10 Submit Tab 1.
    ◦ 10:10 Receive the payment in Tab 1.
    ◦ 10:11 Process the order in Tab 2.
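The timeline above can be replayed in a few lines to show how a payment for one candy and an order for two can both arise from a single session. The mechanism modeled here (a server-side cart shared by both tabs, while the payment amount comes from the quantity shown in the submitting tab) is an assumption for illustration; the slides do not spell out the actual cause.

```python
def replay(events):
    """Replay session events and return (quantity paid for, quantity ordered)."""
    cart = 0        # assumed server-side cart shared by all tabs of the session
    tab_view = {}   # quantity each tab displayed when it was last rendered
    paid = None
    ordered = None
    for _time, tab, action in events:
        if action == "add":
            cart += 1
            tab_view[tab] = cart
        elif action == "submit":
            paid = tab_view[tab]   # payment uses the quantity that tab displayed
        elif action == "process":
            ordered = cart         # order is built from the live server cart
    return paid, ordered


# Events transcribed from the slide's timeline.
SESSION = [
    ("10:00", "tab1", "add"),
    ("10:05", "tab2", "add"),
    ("10:10", "tab1", "submit"),
    ("10:11", "tab2", "process"),
]

paid, ordered = replay(SESSION)
mismatch = paid != ordered
```

Aggregate metrics would show one payment and one order, both valid on their own; only the replayed sequence exposes that they disagree.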
  • References
    ◦ Adobe training video. Retrieved from https://outv.omniture.com/.
    ◦ Al-Maskari, A. and Sanderson, M. (2010). A review of factors influencing user satisfaction in information retrieval. Journal of the American Society for Information Science and Technology, 61: 859–868. DOI: 10.1002/asi.21300
    ◦ Gwizdka, J. (2010). Distribution of cognitive load in Web search. Journal of the American Society for Information Science and Technology, 61: 2167–2187. DOI: 10.1002/asi.21385
    ◦ Kaushik, A. (2010). Web Analytics 2.0. Indianapolis, IN: Wiley Publishing.
  • BI and Social Network
    ◦ Social networks such as LinkedIn, Facebook and Twitter are becoming important means for people, including scientists, to share information, though the academic world has been slow to adopt them. (Curry, 2009)
    ◦ The capability to extract this vast, unstructured, time-sensitive information is becoming increasingly important for business analysis.
    ◦ The recent development of literature-based scientific social networks is promising.
      ▪ Sites: BioMedExperts, UniPHY
      ▪ Unique to the research world:
      ▪ Preloaded professional profiles based on publications.
      ▪ Preloaded networking based on co-authorship analysis.
      ▪ Periodic publication updates sent within each user’s network.
    ◦ Effective ways to analyze the content on social networks and to promote scientists’ contributions there still need to be developed.
  • Effectiveness of Scientific Social Networks
    ◦ Academic social networks will soon fall out of favor if they cannot help scientists effectively.
    ◦ We need to study whether such networks can improve scientists’ research productivity, increase collaboration among scientists, and increase traffic to scientific content web sites.
      ▪ Statistical analysis based on users’ profiles on the site.
      ▪ Web analytics using tools like Google Analytics.
      ▪ Scenario analysis using session capture tools like Tealeaf.
      ▪ Traditional usability tests using tools like Morae.
      ▪ Surveys.
    ◦ Linking users’ activities on academic social networks, their profiles at professional member societies and their clickstreams on academic content sites can help to understand and serve each user efficiently.
      ▪ Order content according to the user’s long-term and short-term interests.
      ▪ Recommend relevant events, such as academic forums, seminars and industry shows.
      ▪ Let users promote academic content or events that interest them via social networks.
  • Building Users’ Expert Profile Based on Concepts in Publications • Document fingerprints aggregated to expert profiles (Gunter, 2009)
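The aggregation step can be illustrated with a minimal sketch. It assumes that a document fingerprint is a mapping from concept to weight and that an expert profile is the normalized sum of the fingerprints of one author's publications. The slide does not describe the actual algorithm, so the `build_profile` helper, the concept names, and the weights are all hypothetical.

```python
from collections import Counter

def build_profile(fingerprints):
    """Sum concept weights across documents, then normalize so they total 1.0.

    Hypothetical helper: the real fingerprinting method is not given in the source.
    """
    total = Counter()
    for fingerprint in fingerprints:
        total.update(fingerprint)  # adds each concept's weight to the running sum
    weight_sum = sum(total.values())
    if weight_sum == 0:
        return {}
    return {concept: weight / weight_sum for concept, weight in total.items()}

# Two made-up document fingerprints for one author.
docs = [
    {"obesity": 3.0, "linkage mapping": 1.0},
    {"obesity": 1.0, "gene expression": 3.0},
]
profile = build_profile(docs)
top_concept = max(profile, key=profile.get)
```

Normalizing keeps profiles comparable between authors with many and few publications; other weighting schemes would be equally plausible.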
  • Motivating Contribution in Social Media • Social Learning ◦ People learn by observation in social situations and begin to act like the people they observe, even without external incentives (Bandura, 1977). ◦ Social sites can make it easy for users to observe the behavior of active users. • Feedback ◦ Theories of reciprocity (Cialdini, 1984; Gouldner, 1960), reinforcement (Ferster, 1957), and the need to belong (Baumeister, 1995) all suggest that feedback from other users should predict long-term participation by social media users. ◦ Site design and its back-end technologies can make it convenient for users to tag and comment. • Distribution ◦ Reputation is a common motivation for participation in many online environments. ◦ Competitive motivations in the form of reputation and status attainment have been cited as a primary incentive for continued participation in open-source software (Hertel, 2003). ◦ Bloggers cite the intent to affect their professional reputation as being among their top motivations for blogging (Marlow, 2006). ◦ Promoting active users and distributing their influence is an effective social currency with which to ‘bribe’ key contributors.
  • Case Study at Facebook: Motivating Newcomer Contribution (Burke, 2009) • Measures ◦ Dependent variable: the number of photos uploaded by the newcomers between their third and fifteenth weeks on the site. ◦ Independent variables: Learning – the number of photo-uploading stories the newcomers saw in their News Feeds during their first two weeks; Singling out – whether the newcomer was tagged in a photo during his or her first two weeks; Feedback – whether the newcomer received any comments on his or her initial photos during the first two weeks; Distribution – the number of News Feed stories shown to friends about the newcomer’s photos.
  • Result of Case Study at Facebook: Motivating Newcomer Contribution (Burke, 2009) • “Design elements which facilitate learning from friends, singling out, feedback, and content distribution can help increase the level of engagement for new users, leading to further content contributions and an overall better user experience.” • “The most consistent result we found was for learning from friends. An increase in visible photo activity was always predictive of increased newcomer contribution.” • “Designers of social networking sites should also find ways to support newcomers with varying behavioral patterns.” ◦ “For newcomers who are active, highlighting opportunities for others to leave them feedback and allowing the newcomers to increase the size of their audience may be particularly effective.” ◦ “For newcomers who are relatively inactive, designers might want to encourage their friends to pay more attention to them, whether through singling out in a public fashion or sending more directed private communication.”
  • References • Bandura, A. (1977). Social Learning Theory. New York, NY: General Learning Press. • Baumeister, R. & Leary, M. (1995). The need to belong: Desire for interpersonal attachments as a fundamental human motivation. Psychological Bulletin, 117(3), 497-529. • Burke, M., Marlow, C. & Lento, T. (2009). Feed me: Motivating newcomer contribution in social network sites. Proceedings of the 27th international conference on human factors in computing systems (pp. 945-954). Boston, MA: ACM Press. • Cialdini, R.B. (1984). Influence. New York, NY: William Morrow and Company. • Curry, R., Kiddle, C. & Simmonds, R. (2009). Social networking and scientific gateways. Proceedings of the 5th Grid Computing Environments Workshop. doi: 10.1145/1658260.158266. • Ferster, C. & Skinner, B. (1957). Schedules of Reinforcement. New York, NY: Appleton-Century-Crofts. • Gouldner, A. (1960). The norm of reciprocity: A preliminary statement. American Sociological Review, 25(2), 161-178. • Gunter, D. (2009). Semantic Search. Bulletin of the American Society for Information Science and Technology, 36: 36-37. • Hertel, G., Niedner, S. & Herrmann, S. (2003). Motivation of software developers in open source projects: An internet-based survey of contributors to the Linux kernel. Research Policy, 32(7), 1159-1177. • Marlow, C. (2006). Linking without thinking: Weblogs readership and online social capital formation. In Proceedings of the International Communication Association, Dresden, Germany.
  • Semantic Technologies, BI and Just-in-Time Discovery • “Discoverability requires the ability to recall related historical data so that an arriving piece of data can find its place, similar to the way each jigsaw puzzle piece is assessed relative to a work-in-progress puzzle.” (Jonas, 2009) • Directories for enterprise-wide discoverability ◦ Context-less directories: basic directories to locate information. ◦ Semantically reconciled directories: concepts with similar meanings are bundled together. ◦ Semantically reconciled and relationship-aware directories: information is linked together in context, which enables context-based discovery. • Academic publishers can organize the factors and activities of their subscribers, users, and authors so that they can easily be pulled together, and place new information into that context to assist business discovery.
  • Semantic Web – Linked Data (Berners-Lee, 2001) • A relational database schema is too rigid to capture dynamic relationships: new fields and new relationships have to be added to the database all the time, which is inefficient. • A graph database is designed to store dynamic relationships with a simple and flexible schema. Some examples (Segaran, 2009): Sesame (http://openrdf.org), Jena (http://jena.sourceforge.net), AllegroGraph (http://agraph.franz.com), Neo4J (http://neo4j.org).
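To make the flexibility argument concrete, here is a toy in-memory triple store. It is only a sketch of the data model and does not use the API of Sesame, Jena, AllegroGraph, or Neo4J; the `TripleStore` class and all identifiers such as `article:1` and `fundedBy` are hypothetical.

```python
class TripleStore:
    """Toy in-memory store of (subject, predicate, object) triples."""

    def __init__(self):
        self.triples = set()

    def add(self, subject, predicate, obj):
        self.triples.add((subject, predicate, obj))

    def match(self, subject=None, predicate=None, obj=None):
        """Return matching triples in sorted order; None acts as a wildcard."""
        return sorted(
            t for t in self.triples
            if (subject is None or t[0] == subject)
            and (predicate is None or t[1] == predicate)
            and (obj is None or t[2] == obj)
        )

store = TripleStore()
store.add("article:1", "hasAuthor", "person:price")
store.add("article:1", "publishedIn", "journal:diabetes")
# A new kind of relationship needs no schema change: it is just another triple.
store.add("article:1", "fundedBy", "org:nih")

about_article = store.match(subject="article:1")
```

In a relational design, `fundedBy` would typically require a new column or join table before the first row could be stored.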
  • Semantic Web Elements: URI, RDF, Ontology • URI (Uniform Resource Identifier): specifies an entity; identical and exchangeable across different documents. • RDF (Resource Description Framework): Subject – Predicate – Object (triples); expresses the relationships between entities. • Ontology: a collection of URIs and RDF statements, together with a collection of inference rules. • Example: Gene 1 modifies Gene 2; Gene 2 affects Disease A; therefore Gene 1 may affect Disease A.
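The gene example can be written as two triples plus one inference rule. The sketch below hard-codes that single rule in plain Python; real ontology reasoners are far more general, and the predicate names (`modifies`, `affects`, `mayAffect`) and identifiers are hypothetical.

```python
def infer_may_affect(triples):
    """Apply one rule: if X modifies Y and Y affects Z, derive (X, mayAffect, Z)."""
    derived = set()
    for s1, p1, o1 in triples:
        if p1 != "modifies":
            continue
        for s2, p2, o2 in triples:
            # Join the two statements on the shared entity (object of the first).
            if p2 == "affects" and s2 == o1:
                derived.add((s1, "mayAffect", o2))
    return derived

facts = {
    ("gene:1", "modifies", "gene:2"),
    ("gene:2", "affects", "disease:A"),
}
new_facts = infer_may_affect(facts)
```

The derived statement is never stored explicitly; it follows from the two stated triples and the rule.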
  • Dublin Core Metadata Initiative • The Dublin Core is a set of predefined properties for describing documents. The following example demonstrates the use of some of the Dublin Core properties in an RDF document:
      <?xml version="1.0"?>
      <!DOCTYPE rdf:RDF PUBLIC "-//DUBLIN CORE//DCMES DTD 2002/07/31//EN"
        "http://dublincore.org/documents/2002/07/31/dcmes-xml/dcmes-xml-dtd.dtd">
      <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
               xmlns:dc="http://purl.org/dc/elements/1.1/">
        <rdf:Description rdf:about="http://dublincore.org/">
          <dc:title>Dublin Core Metadata Initiative - Home Page</dc:title>
          <dc:description>The Dublin Core Metadata Initiative Web site.</dc:description>
          <dc:date>2001-01-16</dc:date>
          <dc:format>text/html</dc:format>
          <dc:language>en</dc:language>
          <dc:contributor>The Dublin Core Metadata Initiative</dc:contributor>
          <!-- guesses for the translation of the above titles -->
          <dc:title xml:lang="fr">L'Initiative de métadonnées du Dublin Core</dc:title>
          <dc:title xml:lang="de">der Dublin-Core Metadata-Diskussionen</dc:title>
        </rdf:Description>
      </rdf:RDF>
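As an illustration of how a BI system might read such metadata, the sketch below extracts the Dublin Core properties with Python's standard `xml.etree.ElementTree`. The embedded document is a shortened copy of the example (DOCTYPE omitted), and `dublin_core_properties` is a hypothetical helper rather than part of any Dublin Core tooling.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC_NS = "http://purl.org/dc/elements/1.1/"
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

document = """<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:dc="http://purl.org/dc/elements/1.1/">
  <rdf:Description rdf:about="http://dublincore.org/">
    <dc:title>Dublin Core Metadata Initiative - Home Page</dc:title>
    <dc:date>2001-01-16</dc:date>
    <dc:format>text/html</dc:format>
    <dc:language>en</dc:language>
    <dc:title xml:lang="fr">L'Initiative de métadonnées du Dublin Core</dc:title>
  </rdf:Description>
</rdf:RDF>"""

def dublin_core_properties(rdf_xml):
    """Return (resource, property, language, value) for every dc:* element."""
    root = ET.fromstring(rdf_xml)
    rows = []
    for desc in root.iter(f"{{{RDF_NS}}}Description"):
        about = desc.get(f"{{{RDF_NS}}}about")
        for child in desc:
            if child.tag.startswith(f"{{{DC_NS}}}"):
                name = child.tag[len(DC_NS) + 2:]  # strip the "{namespace}" prefix
                rows.append((about, name, child.get(XML_LANG), child.text))
    return rows

rows = dublin_core_properties(document)
titles = [row for row in rows if row[1] == "title"]
```

Each row is effectively a triple (resource, property, value) with an optional language tag, which is the shape a linked-data store expects.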
  • Semantic Tools: RDFS, OWL, SPARQL (Shadbolt, 2006)
      ◦ RDFS (RDF Schema): an extension to RDF that provides the framework to describe application-specific classes and properties.
          <rdfs:Class rdf:ID="animal" />
          <rdfs:Class rdf:ID="horse">
            <rdfs:subClassOf rdf:resource="#animal"/>
          </rdfs:Class>
      ◦ OWL (Web Ontology Language): a family of knowledge representation languages for authoring ontologies; used to express and process information on the web.
          Class(a:cat_owner complete
            intersectionOf(a:person restriction(a:has_pet someValuesFrom (a:cat))))
          SubPropertyOf(a:has_pet a:likes)
          Class(a:cat_liker complete
            intersectionOf(a:person restriction(a:likes someValuesFrom (a:cat))))
        Cat owners have cats as pets. has_pet is a subproperty of likes, so anything that has a pet must like that pet. Therefore cat owners must like a cat.
      ◦ SPARQL: an RDF query language. Example: what are all the country capitals in Africa?
          PREFIX abc: <http://example.com/exampleOntology#>
          SELECT ?capital ?country
          WHERE {
            ?x abc:cityname ?capital ;
               abc:isCapitalOf ?y .
            ?y abc:countryname ?country ;
               abc:isInContinent abc:Africa .
          }
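To show what the SPARQL example asks a query engine to do, the sketch below evaluates the same four triple patterns against a handful of made-up triples. It is a naive pattern matcher written for illustration, not a SPARQL implementation; `match_patterns` and the sample data are hypothetical, and prefixes are dropped for brevity.

```python
def match_patterns(triples, patterns, binding=None):
    """Yield variable bindings that satisfy every pattern.

    Pattern terms starting with '?' are variables; other terms must match exactly.
    """
    binding = binding or {}
    if not patterns:
        yield dict(binding)
        return
    first, rest = patterns[0], patterns[1:]
    for triple in triples:
        candidate = dict(binding)
        ok = True
        for term, value in zip(first, triple):
            if term.startswith("?"):
                # Bind the variable, or check it agrees with an earlier binding.
                if candidate.setdefault(term, value) != value:
                    ok = False
                    break
            elif term != value:
                ok = False
                break
        if ok:
            yield from match_patterns(triples, rest, candidate)

triples = [
    ("city:cairo", "cityname", "Cairo"),
    ("city:cairo", "isCapitalOf", "country:egypt"),
    ("country:egypt", "countryname", "Egypt"),
    ("country:egypt", "isInContinent", "Africa"),
    ("city:paris", "cityname", "Paris"),
    ("city:paris", "isCapitalOf", "country:france"),
    ("country:france", "countryname", "France"),
    ("country:france", "isInContinent", "Europe"),
]
query = [
    ("?x", "cityname", "?capital"),
    ("?x", "isCapitalOf", "?y"),
    ("?y", "countryname", "?country"),
    ("?y", "isInContinent", "Africa"),
]
results = [(b["?capital"], b["?country"]) for b in match_patterns(triples, query)]
```

The shared variables `?x` and `?y` play the role of joins: a row survives only if the same city and country satisfy all four patterns.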
  • Linked Data for STM Publication • [Diagram: a linked-data graph connecting a faculty member (R. Arlen Price), the article “An obesity-related locus in chromosome region 12q23-24”, the journal Diabetes, the American Diabetes Association, National Institutes of Health funding, a faculty profile (research interests: genetics of complex traits, genetics of obesity, behavioral genetics, genetic epidemiology; research techniques: linkage mapping, linkage disequilibrium association analyses, and gene expression profiling), and a student (Ding Li). Edge labels include Author, Subscribe, Read, Publication, Funding, Profile, Research Strength, Attend Events, Proposal, and Review.] • Linking data helps to serve each researcher’s needs better.
  • Semantic Publishing – Integrate Data in Academic Journals (Seringhaus, 2007) • Publish machine-readable summary information in XML along with the article. • A BI system can retrieve and organize the metadata.
  • Semantic Publishing – Semantic Enhancement to Research Articles (Shotton, 2009) • The relevant data can be linked together online. • A BI system can help to retrieve and organize the relationships and data.
  • BI and E-Science • Research is becoming more data-driven and often requires linking data at large scale. • BI can trace the location of data sources, understand the relationships among these academic databases, and provide users with corresponding data services. • Example: multiple pathway databases are linked to construct the human insulin signaling pathway (Luciano, 2007).
  • References • Berners-Lee, T., Hendler, J. and Lassila, O. (2001). The Semantic Web. Scientific American, 284(5), 28–37. • Jonas, J. & Sokol, L. (2009). Data finds data. In Segaran, T. & Hammerbacher, J. (Eds.). Beautiful Data, chapter 7. Sebastopol, CA: O’Reilly Media. • Luciano, J. and Stevens, R. (2007). e-Science and biological pathway semantics. BMC Bioinformatics, 8(Suppl 3): S3. doi: 10.1186/1471-2105-8-S3-S3. • Segaran, T. (2009). Connecting data. In Segaran, T. & Hammerbacher, J. (Eds.). Beautiful Data, chapter 20. Sebastopol, CA: O’Reilly Media. • Seringhaus, M. and Gerstein, M. (2007). Publishing perishing? Towards tomorrow's information architecture. BMC Bioinformatics, 8:17. doi: 10.1186/1471-2105-8-17. • Shadbolt, N., Berners-Lee, T., and Hall, W. (2006). The Semantic Web Revisited. IEEE Intelligent Systems, 21(3): 96–101. • Shotton, D., Portwin, K., Klyne, G., and Miles, A. (2009). Adventures in Semantic Publishing: Exemplar Semantic Enhancements of a Research Article. PLoS Comput Biol, 5(4): e1000361. doi: 10.1371/journal.pcbi.1000361.
  • BI and Algorithm by Example • In the US and Canada, we have a list of hospitals and a list of medical groups, all with (latitude, longitude). • How do we find the nearby hospitals (within 1 mile) for each medical group? • Calculating the distance for every combination of hospital and medical group is too time-consuming, so we need to limit the candidates before calculating. • Simple spherical law of cosines formula to calculate distance: d = acos(sin(lat1)·sin(lat2) + cos(lat1)·cos(lat2)·cos(long2 − long1))·R, where R is the earth’s radius (mean radius = 6,371 km).
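The formula translates directly into code. The sketch below assumes coordinates are given in decimal degrees and converts them to radians first; the function name `distance_km` and the clamping step are additions for illustration, not part of the slide.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean radius quoted on the slide

def distance_km(lat1, lon1, lat2, lon2):
    """Great-circle distance by the spherical law of cosines (inputs in degrees)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dlon = math.radians(lon2 - lon1)
    cos_angle = (math.sin(phi1) * math.sin(phi2)
                 + math.cos(phi1) * math.cos(phi2) * math.cos(dlon))
    # Floating-point rounding can push the value just outside [-1, 1],
    # which would make acos raise ValueError, so clamp it.
    cos_angle = max(-1.0, min(1.0, cos_angle))
    return math.acos(cos_angle) * EARTH_RADIUS_KM

one_degree_on_equator = distance_km(0.0, 0.0, 0.0, 1.0)
```

Calling this for every hospital and medical group pair is the brute-force approach that the slide describes as too time-consuming.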
  • Can We Find a Key? • Simple spherical law of cosines formula to calculate distance: d = acos(sin(lat1)·sin(lat2) + cos(lat1)·cos(lat2)·cos(long2 − long1))·R, where R is the earth’s radius (mean radius = 6,371 km, or 3,959 mi).
  • Boundary Condition
      (2851, -5181)  (2851, -5180)  (2851, -5179)
      (2850, -5181)  (2850, -5180)  (2850, -5179)
      (2849, -5181)  (2849, -5180)  (2849, -5179)
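One way to read the key and the 3 x 3 block of cells is as a grid index: each point gets an integer cell key, and a medical group is compared only with hospitals in its own cell and the eight neighbors. The slide does not say how its integer keys are derived, so the cell sizes below are an assumption, as are the helper names, the sample coordinates, and the 75° latitude limit that keeps longitude cells wide enough. Longitude wrap-around at ±180° is ignored.

```python
import math
from collections import defaultdict

EARTH_RADIUS_MILES = 3959.0
MILES_PER_DEGREE = EARTH_RADIUS_MILES * math.pi / 180.0  # about 69.1
MAX_ABS_LATITUDE = 75.0  # assumption: every point lies below this latitude

def distance_miles(lat1, lon1, lat2, lon2):
    """Spherical law of cosines, inputs in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    cos_angle = (math.sin(phi1) * math.sin(phi2)
                 + math.cos(phi1) * math.cos(phi2)
                 * math.cos(math.radians(lon2 - lon1)))
    return math.acos(max(-1.0, min(1.0, cos_angle))) * EARTH_RADIUS_MILES

def cell_key(lat, lon, radius_miles=1.0):
    """Integer grid cell; each cell spans at least radius_miles in both directions."""
    lat_step = radius_miles / MILES_PER_DEGREE
    # Longitude degrees shrink toward the poles, so widen the step accordingly.
    lon_step = lat_step / math.cos(math.radians(MAX_ABS_LATITUDE))
    return (math.floor(lat / lat_step), math.floor(lon / lon_step))

def nearby(groups, hospitals, radius_miles=1.0):
    """Map each group name to the sorted hospital names within radius_miles."""
    index = defaultdict(list)
    for name, lat, lon in hospitals:
        index[cell_key(lat, lon, radius_miles)].append((name, lat, lon))
    result = {}
    for g_name, g_lat, g_lon in groups:
        row, col = cell_key(g_lat, g_lon, radius_miles)
        found = []
        # Boundary condition: a hospital just across a cell edge is still close,
        # so scan the 3 x 3 block of cells around the group's own cell.
        for d_row in (-1, 0, 1):
            for d_col in (-1, 0, 1):
                for h_name, h_lat, h_lon in index.get((row + d_row, col + d_col), []):
                    if distance_miles(g_lat, g_lon, h_lat, h_lon) <= radius_miles:
                        found.append(h_name)
        result[g_name] = sorted(found)
    return result

hospitals = [("A", 40.01, -75.0), ("B", 40.03, -75.0), ("C", 45.0, -80.0)]
groups = [("G", 40.0, -75.0)]
matches = nearby(groups, hospitals)
```

Hospital A sits in the row above the group's cell yet is about 0.7 miles away, which is exactly the case the neighboring cells exist to catch. The exact distance is then computed only for the few candidates in those nine cells.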
  • References • Movable Type Ltd. Calculate distance, bearing and more between Latitude/Longitude points. Retrieved from http://www.movable-type.co.uk/scripts/latlong.html
  • Future Plan • More on Data Mining • More on Data Modeling • BI and User Experience • BI and Predictive Analysis • BI and Technology Intelligence
  • Thank You • Please send comments, suggestions, and discussion to dingli2@gmail.com • The file will be updated at: http://www.slideshare.net/dingli2/