  • It took the entire history of humanity through 1999 to accumulate 12 exabytes of information. By the middle of 2002 a second dozen exabytes had been created, according to a study produced by UC Berkeley. 55 percent of the world's digital information is confined to single-user personal computers, compared with the 16 percent that is stored in corporate data warehouses.

    The Berkeley researchers noted several key trends: While the Internet is growing rapidly, "stock" web pages account for just 21 terabytes of storage. Far more significant is the impact of e-mail, which contributes more than 500 times more data to the total pool each year than the volume generated by new web pages. By 2009 all TV must be broadcast digitally, which will amount to hundreds of exabytes.

    More info here: Print, film, magnetic, and optical storage media produced about 5 exabytes of new information in 2002. Ninety-two percent of the new information was stored on magnetic media, mostly on hard disks. The World Wide Web contains about 170 terabytes of information on its surface; in volume this is seventeen times the size of the Library of Congress print collections. Instant messaging generates five billion messages a day (750 GB), or 274 terabytes a year. Email generates about 400,000 terabytes of new information each year worldwide.

    Table 1.1: How Big is an Exabyte?

    Kilobyte (KB): 1,000 bytes, or 10^3 bytes
      2 Kilobytes: A typewritten page.
      100 Kilobytes: A low-resolution photograph.
    Megabyte (MB): 1,000,000 bytes, or 10^6 bytes
      1 Megabyte: 2 – 250 page novels OR a 3.5 inch floppy disk.
      2 Megabytes: A high-resolution photograph.
      5 Megabytes: The complete works of Shakespeare.
      10 Megabytes: A minute of high-fidelity sound.
      100 Megabytes: 1 meter of shelved books.
      500 Megabytes: A CD-ROM.
    Gigabyte (GB): 1,000,000,000 bytes, or 10^9 bytes
      1 Gigabyte: A pickup truck filled with books.
      20 Gigabytes: A good collection of the works of Beethoven.
      100 Gigabytes: A library floor of academic journals.
    Terabyte (TB): 1,000,000,000,000 bytes, or 10^12 bytes
      1 Terabyte: 50,000 trees made into paper and printed.
      2 Terabytes: An academic research library.
      10 Terabytes: The print collections of the U.S. Library of Congress.
      400 Terabytes: National Climatic Data Center (NOAA) database.
    Petabyte (PB): 1,000,000,000,000,000 bytes, or 10^15 bytes
      1 Petabyte: 3 years of EOS data (2001).
      2 Petabytes: All U.S. academic research libraries.
      20 Petabytes: Production of hard-disk drives in 1995.
      200 Petabytes: All printed material.
    Exabyte (EB): 1,000,000,000,000,000,000 bytes, or 10^18 bytes
      2 Exabytes: Total volume of information generated in 1999.
      5 Exabytes: All words ever spoken by human beings.

    Source: Many of these examples were taken from Roy Williams' "Data Powers of Ten" web page at Caltech.

    Data Explosion factoids: In 2002, approximately 5 exabytes (5 billion gigabytes) of new data were recorded on paper, optical disks, film, and electronic storage devices, according to the How Much Information? project at the University of California at Berkeley. "If digitized with full formatting, the 17 million books in the Library of Congress contain about 136 terabytes of information; five exabytes of information is equivalent in size to the information contained in 37,000 new libraries the size of the Library of Congress book collections," the report stated. Hard drives absorbed about 2 exabytes of the total. The report also found that 400,000 terabytes of e-mail are produced per year, as are 274 terabytes of instant messages. (A terabyte is a million million bytes.) The surface Web, the Web people can access, contains about 170 terabytes of data.

    From Jim Gray: Business data - Wal-Mart online: 1 PB and growing. Paradox: most "transaction" systems < 1 PB; have to go to image/data monitoring for big data. Government is the biggest business. Two key areas will generate the majority of data volume growth: video and sensors.

    Sensors:
      Earth Observation - 15 PB by 2007
      Medical Images & Information + Health Monitoring - Potential 1 GB/patient/y -> 1 EB/y
      Video Monitoring - ~1E8 video cameras @ 1E5 MBps -> 10 TB/s -> 100 EB/y
      Airplane Engines - 1 GB sensor data/flight, 100,000 engine hours/day, 30 PB/y

    Video (example): Gordon Bell is digitizing his life, and has now scanned virtually all Books written (and read when possible), Personal documents (correspondence, memos, email, bills, legal, …), Photos, Posters, paintings, photos of things (artifacts, …medals, plaques), Home movies and videos, CD collection, and, of course, all PC files. He’s recording all conversations, phone, radio, TV, web pages visited, etc. He’s been paperless from 2002 (so far 12” scanned, 12’ discarded). In total, his “life” so far is only 30 GB excluding videos, but video is 2+ TB and growing fast.
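Several of the figures quoted in these notes can be cross-checked with simple unit arithmetic. The sketch below is only a sanity check using the decimal units of Table 1.1; the variable names are illustrative. One input is an assumption: the notes write the per-camera video rate as "1E5 MBps", and the sketch instead assumes 1e5 bytes per second per camera, since that is the rate consistent with the quoted 10 TB/s total.

```python
# Decimal (SI) byte units, matching Table 1.1: KB = 10^3 bytes ... EB = 10^18 bytes.
KB, MB, GB, TB, PB, EB = (10 ** (3 * n) for n in range(1, 7))

# Instant messaging: 750 GB per day, expressed in terabytes per year.
im_tb_per_year = 750 * GB * 365 / TB

# How many Library of Congress book collections (136 TB each) fit in 5 EB.
loc_equivalents = 5 * EB / (136 * TB)

# Medical imaging: number of patients at 1 GB/patient/year needed to reach 1 EB/year.
patients_for_one_eb = EB / GB

# Video monitoring.
# ASSUMPTION: 1e5 bytes per second per camera (the notes say "1E5 MBps").
cameras = 1e8
bytes_per_camera_per_second = 1e5
video_tb_per_second = cameras * bytes_per_camera_per_second / TB

print(f"Instant messaging: {im_tb_per_year:.2f} TB/year")
print(f"Library of Congress equivalents in 5 EB: {loc_equivalents:,.0f}")
print(f"Patients needed for 1 EB/year: {patients_for_one_eb:,.0f}")
print(f"Video monitoring: {video_tb_per_second:.0f} TB/s")
```

The instant-messaging and Library of Congress results round to the 274 terabytes and 37,000 libraries quoted in the notes, and the medical figure implies roughly a billion monitored patients.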
  • slides

    1. 1. Zhangxi Lin Texas Tech University ISQS 6339, Data Management & Business Intelligence Introduction ISQS 6339, Data Mgmt & BI
    2. 2. Outline <ul><li>Definitions of BI </li></ul><ul><li>Categorizations of BI </li></ul><ul><li>BI Trend </li></ul><ul><li>BI tools </li></ul>ISQS 6339, Data Mgmt & BI
    3. 3. What is Business Intelligence <ul><li>A Simple Definition: The applications and technologies transforming Business Data into Action </li></ul><ul><ul><li>Business intelligence (BI) is a business management term. </li></ul></ul><ul><ul><ul><li>It refers to applications and technologies that are used to gather, provide access to, and analyze data and information about a company's operations. </li></ul></ul></ul><ul><ul><li>Business intelligence systems can help companies gain more comprehensive knowledge of the factors affecting their business, and help companies to make better business decisions. </li></ul></ul>ISQS 6339, Data Mgmt & BI
    4. 4. Data, information, and knowledge <ul><li>Data – a collection of raw value elements or facts used for calculating, reasoning, or measuring. </li></ul><ul><li>Information – the result of collecting and organizing data in a way that establishes relationships between data items, which thereby provides context and meaning. </li></ul><ul><li>Knowledge – the concept of understanding information based on recognized patterns in a way that provides insight into information. </li></ul>ISQS 6339, Data Mgmt & BI
    5. 5. The process of BI <ul><li>Data -> information -> knowledge -> actionable plans </li></ul><ul><li>Data -> information: the process of determining what data is to be collected and managed and in what context </li></ul><ul><li>Information -> knowledge: The process involving the analytical components, such as data warehousing, online analytical processing, data quality, data profiling, business rule analysis, and data mining </li></ul><ul><li>Knowledge -> actionable plans: The most important aspect in a BI process </li></ul>ISQS 6339, Data Mgmt & BI
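The chain on slide 5 (data -> information -> knowledge -> actionable plans) can be illustrated with a toy example. This is a minimal sketch and not part of the course material: the purchase records, the function names, and the "spending falls every month" rule are all hypothetical, chosen only to show one stage feeding the next.

```python
from collections import defaultdict

# Data: raw facts. Hypothetical (customer, month, amount) purchase records.
purchases = [
    ("alice", 1, 120.0), ("alice", 2, 80.0), ("alice", 3, 40.0),
    ("bob", 1, 50.0), ("bob", 2, 55.0), ("bob", 3, 70.0),
]

def to_information(records):
    """Organize raw records into a per-customer series of monthly spending."""
    by_customer = defaultdict(dict)
    for customer, month, amount in records:
        by_customer[customer][month] = by_customer[customer].get(month, 0.0) + amount
    # Order each customer's totals by month so the series has context.
    return {c: [months[m] for m in sorted(months)] for c, months in by_customer.items()}

def to_knowledge(series):
    """Recognize a pattern: customers whose spending falls every month."""
    return sorted(
        customer
        for customer, amounts in series.items()
        if len(amounts) > 1 and all(later < earlier for earlier, later in zip(amounts, amounts[1:]))
    )

def to_actions(declining_customers):
    """Turn the recognized pattern into a plan someone can act on."""
    return [f"Contact {customer} with a retention offer" for customer in declining_customers]

information = to_information(purchases)
knowledge = to_knowledge(information)
actions = to_actions(knowledge)
print(actions)
```

The last step is deliberately trivial in code: as slide 6 notes, acting on knowledge is mostly an organizational matter rather than a technical one.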
    6. 6. Actionable Knowledge <ul><li>An information asset retains its value only if the converted knowledge is actionable. </li></ul><ul><ul><li>Need some methods for extracting value from knowledge </li></ul></ul><ul><ul><li>This is not a technical issue but an organizational one – need empowered individuals in the organization to take the action </li></ul></ul><ul><ul><li>There is an issue of Return on Investment (ROI) </li></ul></ul>ISQS 6339, Data Mgmt & BI
    7. 7. BI Problems <ul><li>Structured </li></ul><ul><ul><li>Detecting Credit card fraud </li></ul></ul><ul><ul><li>Setting Loan parameters </li></ul></ul><ul><ul><li>Market segmentation/Mass customization </li></ul></ul><ul><ul><li>Deciding Marketing mix </li></ul></ul><ul><ul><li>Customer Churn </li></ul></ul><ul><ul><li>Reducing employee turnover </li></ul></ul><ul><ul><li>Improving Quality/Efficiency </li></ul></ul><ul><ul><li>… </li></ul></ul><ul><li>Unstructured </li></ul><ul><ul><li>Data exploration </li></ul></ul><ul><ul><li>Utilization of resources (stored knowledge) to maximum effectiveness </li></ul></ul><ul><ul><li>… </li></ul></ul>ISQS 6339, Data Mgmt & BI
    8. 8. BI Applications <ul><li>Customer Analytics </li></ul><ul><ul><li>Customer profiling </li></ul></ul><ul><ul><li>Targeted marketing </li></ul></ul><ul><ul><li>Personalization </li></ul></ul><ul><ul><li>Collaborative filtering </li></ul></ul><ul><ul><li>Customer satisfaction </li></ul></ul><ul><ul><li>Customer lifetime value </li></ul></ul><ul><ul><li>Customer loyalty </li></ul></ul><ul><li>Sales Channel Analytics </li></ul><ul><ul><li>Marketing </li></ul></ul><ul><ul><li>Sales performance and pipeline </li></ul></ul>ISQS 6339, Data Mgmt & BI
    9. 9. BI Applications (2) <ul><li>Supply Chain Analytics </li></ul><ul><ul><li>Supplier and vendor management </li></ul></ul><ul><ul><li>Shipping </li></ul></ul><ul><ul><li>Inventory control </li></ul></ul><ul><ul><li>Distribution analysis </li></ul></ul><ul><li>Behavior Analysis </li></ul><ul><ul><li>Purchasing trends </li></ul></ul><ul><ul><li>Web activity </li></ul></ul><ul><ul><li>Fraud and abuse detection </li></ul></ul><ul><ul><li>Customer attrition </li></ul></ul><ul><ul><li>Social network analysis </li></ul></ul>ISQS 6339, Data Mgmt & BI
    10. 10. Why is BI getting hot? <ul><li>Demand for processing an explosion of information </li></ul><ul><ul><li>MIS/ERP </li></ul></ul><ul><ul><li>Internet </li></ul></ul><ul><li>Gartner Says Business Intelligence Software Market to Reach $3 Billion in 2009. Gartner's CIO Survey ranked BI as the number one technology priority for 2006. London, UK, 7 February 2006 - New license revenue in the worldwide business intelligence (BI) software market is poised for constant growth through 2009, when the market is projected to reach $3 billion, according to the latest forecasts by Gartner Inc. In 2006, the market is estimated to reach $2.5 billion, a six percent increase from 2005. </li></ul>ISQS 6339, Data Mgmt & BI
    11. 11. Explosion of digitally born data <ul><li>Sources: </li></ul><ul><li> , </li></ul><ul><li>The Expanding Digital Universe, IDC white paper, March 2007 </li></ul><ul><li>55% in personal PCs </li></ul><ul><li>16% in corporate data warehouses </li></ul><ul><li>Internet only 21 TB </li></ul><ul><li>Email 500x more than Internet / year </li></ul>
    12. 12. BI Job Description - BI Analyst (1) <ul><li>Description: Looking for professionals in Microsoft Business Intelligence and Data Warehousing who have a proven track record of success within industry. The position requires a broad range of skills and the ability to step in to different roles depending on the size and scope of an engagement both internally and at client sites. The qualified candidate would have proven experience developing successful Microsoft-based Business Intelligence and Data Warehouse solutions. </li></ul><ul><li>Requirements: * 10+ years of experience developing Business Intelligence solutions with Microsoft database, ETL and OLAP technologies (SQL Server, SSIS, Analysis Services) * Demonstrated understanding of multi-dimensional database design and architecture. * Ability to develop business requirements and translate them into a data warehouse dimensional model. * Demonstrated ability to develop front-end reporting and analytical solutions that meet the business needs. * Microsoft SQL Server data modeling and development (10 years) * Microsoft SQL Server Analysis Services design and development (5 years) * Microsoft SQL Server Integration Services (2 years) * Microsoft SQL Server Reporting Services design and development * Understanding of Data Warehouse Methodologies, preferably using Kimball Methodology * Demonstrated leadership aptitude and ability to work effectively within a team environment </li></ul>ISQS 6339, Data Mgmt & BI
    13. 13. BI Analyst (2) <ul><li>Microsoft SQL Server (BI) Business Intelligence </li></ul><ul><li>SetFocus is seeking professionals with Analyst and/or Data Warehousing backgrounds for Business Intelligence consulting positions across the country. Apply Today: </li></ul><ul><li>Successful candidates have had backgrounds as: </li></ul><ul><li>Business Intelligence Analyst, Database Developer, SQL Programmer, Financial Analyst, Business Analyst, System Analyst, Software Developer, Dir. of IT, VP of IT and / or experience with Cognos, Siebel, SAP, Business Objects, SAS, PeopleSoft, Oracle, Microstrategy, Information Builders, ProClarity, CA, or Actuate. </li></ul>ISQS 6339, Data Mgmt & BI
    14. 14. The Evolution of Business Intelligence <ul><li>1st Generation – Traditional analytics (query and reporting) </li></ul><ul><li>2nd Generation – Traditional analytics (OLAP, data warehousing) </li></ul><ul><li>2.5 Generation – “New traditional” analytics (in-memory OLAP, search-based) </li></ul><ul><li>3rd Generation – Advanced analytics </li></ul><ul><ul><li>Rules, predictive analytics, and real-time data mining </li></ul></ul><ul><ul><li>Stream analytics </li></ul></ul>ISQS 6339, Data Mgmt & BI
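    15. 15. Business Intelligence Classifications <ul><li>Legacy BI </li></ul><ul><ul><li>Traditional Analytics </li></ul></ul><ul><ul><ul><li>1st Generation Analytics (Query & Reporting) </li></ul></ul></ul><ul><ul><ul><li>2nd Generation Analytics (OLAP, Data Warehousing) </li></ul></ul></ul><ul><li>“New Traditional” Analytics </li></ul><ul><ul><li>“2.5-Gen” Analytics (In-Memory OLAP, Search-Based) </li></ul></ul><ul><li>3rd-Generation BI </li></ul><ul><ul><li>Advanced Analytics/Optimization </li></ul></ul><ul><ul><ul><li>Rules </li></ul></ul></ul><ul><ul><ul><li>Predictive Analytics </li></ul></ul></ul><ul><ul><ul><li>Real-time and traditional Data Mining </li></ul></ul></ul><ul><ul><li>Stream Analytics* </li></ul></ul><ul><ul><ul><li>Real-time, continuous, sequential analysis (ranging from basic to advanced analytics) </li></ul></ul></ul><ul><li>* In lieu of stream analytics, “embedded analytics,” although architecturally different, could potentially play the same role </li></ul><ul><li>Source: Bill O’Connell, IBM, Aug 2007 </li></ul>ISQS 6339, Data Mgmt & BI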
    16. 16. Business Intelligence Use Cases <ul><li>Traditional Analytics </li></ul><ul><ul><li>1st Generation Analytics (Query & Reporting) </li></ul></ul><ul><ul><li>2nd Generation Analytics (OLAP, Data Warehousing) </li></ul></ul><ul><li>“New Traditional” Analytics </li></ul><ul><ul><li>“2.5-Gen” Analytics (In-Memory OLAP, Search-Based) </li></ul></ul><ul><li>Advanced Analytics/Optimization </li></ul><ul><ul><li>Rules </li></ul></ul><ul><ul><li>Predictive Analytics </li></ul></ul><ul><ul><li>Real-time and traditional Data Mining </li></ul></ul><ul><li>Stream Analytics* </li></ul><ul><ul><li>Real-time, continuous, sequential analysis (ranging from basic to advanced analytics) </li></ul></ul><ul><li>* In lieu of stream analytics, “embedded analytics,” although architecturally different, could potentially play the same role </li></ul><ul><li>Example Target Solutions: Fraud Detection / Risk; CRM Analytic; Supply Chain Optimization; RFID / Spatial Data; Other High-Volume </li></ul><ul><li>Focus on what is happening RIGHT NOW </li></ul><ul><li>Focus on what will happen </li></ul><ul><li>Analytic applications that apply statistical relationships in the form of RULES </li></ul><ul><li>Focus on what did happen </li></ul><ul><li>Turning data into information is limited by the relationships which the end-user already knows to look for. </li></ul><ul><li>Data mining to determine why something happened by unearthing relationships that the end-user may not have known existed. </li></ul><ul><li>Real-Time Threshold </li></ul><ul><li>Source: Bill O’Connell, IBM, Aug 2007 </li></ul>ISQS 6339, Data Mgmt & BI
    17. 17. 3rd Generation Business Intelligence <ul><li>Raises Traditional Warehousing to new levels -> Dynamic Warehousing </li></ul><ul><li>Injects analytical insight into the day-to-day processes of an organization while activity is occurring in real time </li></ul><ul><li>Broad, real-time leverage of insight to achieve business optimization </li></ul><ul><li>Moves beyond “what happened” to “why and what should happen next”. </li></ul><ul><li>Requires the marriage of analytical insight with real-time business processing. </li></ul><ul><li>3rd Gen BI by nature requires a Data Warehouse Platform and MDM system to consume analytical insight, not just source data for BI. </li></ul>ISQS 6339, Data Mgmt & BI
    18. 18. “3rd Generation BI” Attributes … from data management perspective <ul><li>Near-real time (streaming, change data capture, memory resident, etc.) </li></ul><ul><li>Off-line capable </li></ul><ul><li>In-context </li></ul><ul><li>Actionable through predictive/prescriptive stats, optimization and business rules </li></ul><ul><li>Search User Interface (UI) as the front end of BI </li></ul><ul><li>Structured + unstructured </li></ul><ul><li>Visual </li></ul><ul><li>For the masses </li></ul><ul><li>Horizontal platform with verticalized solutions </li></ul><ul><li>Can be delivered via a hosted model </li></ul>ISQS 6339, Data Mgmt & BI
    19. 19. Main BI Topics <ul><li>Data warehousing – Making historical data available for analytics </li></ul><ul><li>Data preparation – Extraction, transformation and loading </li></ul><ul><li>Query - a collection of specifications that enables you to focus on a particular set of data. </li></ul><ul><li>Online Analytical Processing ( OLAP ) - a capability of information systems that supports interactive examination of large amounts of data from many perspectives. </li></ul><ul><li>Reporting - generates aggregated views of data to keep the management informed about the state of their business. </li></ul><ul><li>Data mining - extraction of knowledge by utilizing software that can isolate and identify previously unknown patterns or trends in large amounts of data. </li></ul>ISQS 6339, Data Mgmt & BI
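Three of the topics on slide 19 (query, reporting, and OLAP) can be contrasted on a tiny data set. The sketch below uses Python's built-in `sqlite3` module purely for convenience; it is not the SQL Server or SAS tooling used in the class, and the `sales` table, its rows, and the `total_by` helper are hypothetical.

```python
import sqlite3

# A small in-memory table standing in for warehouse data (hypothetical rows).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, product TEXT, quarter TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?)",
    [
        ("East", "Widget", "Q1", 100.0),
        ("East", "Gadget", "Q1", 150.0),
        ("West", "Widget", "Q1", 200.0),
        ("West", "Widget", "Q2", 50.0),
    ],
)

# Query: a specification that focuses on a particular set of data.
east_rows = conn.execute(
    "SELECT product, quarter, amount FROM sales WHERE region = ? ORDER BY product",
    ("East",),
).fetchall()

# Reporting: an aggregated view of the data, here total sales per region.
report = conn.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
).fetchall()

# OLAP-style examination: the same measure viewed from several perspectives.
DIMENSIONS = {"region", "product", "quarter"}

def total_by(dimension):
    """Return total sales grouped by one dimension of the sales table."""
    if dimension not in DIMENSIONS:  # only known column names reach the SQL text
        raise ValueError(f"unknown dimension: {dimension}")
    cursor = conn.execute(f"SELECT {dimension}, SUM(amount) FROM sales GROUP BY {dimension}")
    return dict(cursor)

print(east_rows)
print(report)
print(total_by("product"))
print(total_by("quarter"))
```

A real OLAP engine precomputes or indexes such aggregates across many dimensions at once; re-running a `GROUP BY` per dimension only imitates the interactive "many perspectives" idea at toy scale.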
    20. 20. BI Product Providers <ul><li>Microsoft </li></ul><ul><li>SAS </li></ul><ul><li>IBM </li></ul><ul><li>Oracle </li></ul><ul><li>Sybase </li></ul><ul><li>Business Objects </li></ul><ul><li>BI Tools Survey </li></ul>ISQS 6339, Data Mgmt & BI
    21. 21. List of BI tools ISQS 6339, Data Mgmt & BI
        No. | Tool | Version | Vendor
        1. | Oracle Enterprise BI Server | 7.8 | Oracle
        2. | Business Objects Enterprise | XI r2 | Business Objects (now SAP)
        3. | SAP NetWeaver BI | 7.0 | SAP
        4. | SAS Enterprise BI Server | 9.1.3 | SAS Institute
        5. | TM/1 & Executive Viewer | 9.1 | Applix (now IBM)
        6. | BizzScore Suite | 7.2 | EFM Software
        7. | WebFocus | 7 | Information Builders
        8. | Excel, Performance Point, Analysis Server | 2007/2005 | Microsoft
        9. | QlikView | 8 | QlikTech
        10. | Microstrategy | 8 | Microstrategy
        11. | Hyperion System | 9 | Hyperion (now Oracle)
        12. | Actuate | 9.1 | Actuate
        13. | Cognos Series 8 | 8.3 | Cognos (now IBM)
    22. 22. Software Used in this Class <ul><li>Microsoft SQL Server 2005 </li></ul><ul><li>SAS Enterprise Guide v4.1 </li></ul><ul><li>Base SAS for Data Preparation Programming </li></ul>ISQS 6339, Data Mgmt & BI
    23. 23. Microsoft SQL Server <ul><li>SQL Server is a client-server based, relational database engine. That puts it head-to-head with the likes of IBM’s DB2 and Oracle’s Oracle… or so Microsoft dearly wants us to believe. </li></ul><ul><li>The problem is that, while DB2 and Oracle are unquestionably enterprise-level products, SQL Server has for years been dogged by the suspicion that it can’t really cut the mustard. </li></ul><ul><li>SQL Server Products </li></ul><ul><ul><li>Microsoft SQL Server 2000 </li></ul></ul><ul><ul><li>Microsoft SQL Server 2005 </li></ul></ul><ul><ul><li>Microsoft SQL Server 2008 </li></ul></ul><ul><li>SQL Server 2005 Editions </li></ul><ul><ul><li>SQL Server Express </li></ul></ul><ul><ul><li>SQL Server Workgroup </li></ul></ul><ul><ul><li>SQL Server Developer </li></ul></ul><ul><ul><li>SQL Server Standard </li></ul></ul><ul><ul><li>SQL Server Enterprise </li></ul></ul><ul><ul><li>SQL Server Compact </li></ul></ul>ISQS 6339, Data Mgmt & BI
    24. 24. CAABI <ul><li>Center for Advanced Analytics and Business Intelligence initially started in 2004 by Dr. Peter Westfall, ISQS, Rawls College of Business. </li></ul><ul><li>Looking to offer support to companies in developing BI capabilities. </li></ul><ul><li>Lots of technical expertise. </li></ul>ISQS 6339, Data Mgmt & BI