It took the entire history of humanity through 1999 to accumulate 12 exabytes of information. By the middle of 2002, a second dozen exabytes had been created, according to a study produced by UC Berkeley. Fifty-five percent of the world's digital information is confined to single-user personal computers, compared with the 16 percent that is stored in corporate data warehouses. The Berkeley researchers noted several key trends: while the Internet is growing rapidly, "stock" web pages account for just 21 terabytes of storage. Far more significant is the impact of e-mail, which contributes more than 500 times as much data to the total pool each year as new web pages do. By 2009 all TV must be digital, which will amount to hundreds of exabytes.

More info here: http://www2.sims.berkeley.edu/research/projects/how-much-info-2003/execsum.htm

Print, film, magnetic, and optical storage media produced about 5 exabytes of new information in 2002. Ninety-two percent of the new information was stored on magnetic media, mostly on hard disks. The World Wide Web contains about 170 terabytes of information on its surface; in volume this is seventeen times the size of the Library of Congress print collections. Instant messaging generates five billion messages a day (750 GB), or 274 terabytes a year. Email generates about 400,000 terabytes of new information each year worldwide.

Table 1.1: How Big is an Exabyte?

Kilobyte (KB): 1,000 bytes OR 10^3 bytes
  2 Kilobytes: A typewritten page.
  100 Kilobytes: A low-resolution photograph.

Megabyte (MB): 1,000,000 bytes OR 10^6 bytes
  1 Megabyte: 2 – 250 page novels OR a 3.5 inch floppy disk.
  2 Megabytes: A high-resolution photograph.
  5 Megabytes: The complete works of Shakespeare.
  10 Megabytes: A minute of high-fidelity sound.
  100 Megabytes: 1 meter of shelved books.
  500 Megabytes: A CD-ROM.

Gigabyte (GB): 1,000,000,000 bytes OR 10^9 bytes
  1 Gigabyte: A pickup truck filled with books.
  20 Gigabytes: A good collection of the works of Beethoven.
  100 Gigabytes: A library floor of academic journals.

Terabyte (TB): 1,000,000,000,000 bytes OR 10^12 bytes
  1 Terabyte: 50,000 trees made into paper and printed.
  2 Terabytes: An academic research library.
  10 Terabytes: The print collections of the U.S. Library of Congress.
  400 Terabytes: National Climatic Data Center (NOAA) database.

Petabyte (PB): 1,000,000,000,000,000 bytes OR 10^15 bytes
  1 Petabyte: 3 years of EOS data (2001).
  2 Petabytes: All U.S. academic research libraries.
  20 Petabytes: Production of hard-disk drives in 1995.
  200 Petabytes: All printed material.

Exabyte (EB): 1,000,000,000,000,000,000 bytes OR 10^18 bytes
  2 Exabytes: Total volume of information generated in 1999.
  5 Exabytes: All words ever spoken by human beings.

Source: Many of these examples were taken from Roy Williams' "Data Powers of Ten" web page at Caltech.

Data Explosion factoids:

In 2002, approximately 5 exabytes (5 billion gigabytes) of new data were recorded on paper, optical disks, film and electronic storage devices, according to the How Much Information? project at the University of California at Berkeley. "If digitized with full formatting, the 17 million books in the Library of Congress contain about 136 terabytes of information; five exabytes of information is equivalent in size to the information contained in 37,000 new libraries the size of the Library of Congress book collections," the report stated. Hard drives absorbed about 2 exabytes of the total. The report also found that 400,000 terabytes of e-mail are produced per year, as are 274 terabytes of instant messages. (A terabyte is a million million bytes.) The surface Web, the part of the Web people can access, contains about 170 terabytes of data.

From Jim Gray:
Business data - Wal-Mart online: 1 PB and growing.
Paradox: most "transaction" systems < 1 PB; have to go to image/data monitoring for big data.
Government is the biggest business.
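The figures quoted above can be cross-checked with a few lines of arithmetic. The sketch below is illustrative only: it assumes the decimal (power-of-ten) units from Table 1.1, and the helper names `to_bytes` and `convert` are hypothetical, not part of the Berkeley study.

```python
# Decimal (SI) byte units, as defined in Table 1.1.
UNITS = {
    "KB": 10**3, "MB": 10**6, "GB": 10**9,
    "TB": 10**12, "PB": 10**15, "EB": 10**18,
}

def to_bytes(value, unit):
    """Convert a value in the given unit to bytes (hypothetical helper)."""
    return value * UNITS[unit]

def convert(value, from_unit, to_unit):
    """Convert a value between two of the units above (hypothetical helper)."""
    return to_bytes(value, from_unit) / UNITS[to_unit]

# Instant messaging: 750 GB per day, over an assumed 365-day year.
im_per_year_tb = convert(750 * 365, "GB", "TB")

# Average size of one instant message: 750 GB spread over 5 billion messages.
bytes_per_message = to_bytes(750, "GB") / 5e9

# How many 136 TB Library of Congress book collections fit in 5 EB?
loc_equivalents = to_bytes(5, "EB") / to_bytes(136, "TB")

print(f"IM per year:       {im_per_year_tb:.2f} TB")
print(f"Bytes per message: {bytes_per_message:.0f}")
print(f"LoC equivalents:   {loc_equivalents:,.0f}")
```

Under these assumptions the instant-messaging total comes to roughly 274 TB a year and 5 EB to roughly 37,000 Library of Congress book collections, which matches the rounded figures in the report.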
Two key areas will generate the majority of data volume growth: video and sensors.

Sensors:
Earth Observation - 15 PB by 2007
Medical Images & Information + Health Monitoring - Potential 1 GB/patient/y → 1 EB/y
Video Monitoring - ~1E8 video cameras @ 1E5 MBps → 10 TB/s → 100 EB/y
Airplane Engines - 1 GB sensor data/flight, 100,000 engine hours/day, 30 PB/y

Video (example):
Gordon Bell is digitizing his life, and has now scanned virtually all books written (and read when possible), personal documents (correspondence, memos, email, bills, legal, …), photos, posters, paintings, photos of things (artifacts, … medals, plaques), home movies and videos, his CD collection, and, of course, all PC files. He's recording all conversations, phone, radio, TV, web pages visited, etc. He's been paperless since 2002 (so far 12” scanned, 12’ discarded). In total, his "life" so far is only 30 GB excluding videos, but video is 2+ TB and growing fast.
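The sensor estimates above are back-of-envelope figures, and a rough sketch can show how they fit together. Two assumptions here are not in the original notes: the per-camera rate is taken as 1E5 bytes per second (the rate that yields the quoted 10 TB/s across 1E8 cameras), and each engine hour is taken to produce about 1 GB of sensor data.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600  # 31,536,000 seconds in a 365-day year
GB, TB, PB, EB = 10**9, 10**12, 10**15, 10**18

# Video monitoring. ASSUMPTION: 1e5 bytes/s per camera.
cameras = 1e8
bytes_per_camera_per_s = 1e5
video_rate_tb_s = cameras * bytes_per_camera_per_s / TB
video_per_year_eb = cameras * bytes_per_camera_per_s * SECONDS_PER_YEAR / EB

# Medical data: how many patients at 1 GB/patient/year add up to 1 EB/year?
patients_for_1_eb = EB / GB

# Airplane engines. ASSUMPTION: roughly 1 GB of sensor data per engine hour.
engine_hours_per_day = 100_000
engine_per_year_pb = engine_hours_per_day * 365 * GB / PB

print(f"Video rate:        {video_rate_tb_s:.0f} TB/s")
print(f"Video per year:    {video_per_year_eb:.0f} EB")
print(f"Patients for 1 EB: {patients_for_1_eb:.0e}")
print(f"Engine data/year:  {engine_per_year_pb:.1f} PB")
```

With these inputs the aggregate video rate is 10 TB/s, which accumulates to a few hundred exabytes per year, so the quoted 100 EB/y is best read as an order-of-magnitude figure. The medical estimate implies about a billion monitored patients, and the engine estimate lands in the same range as the quoted 30 PB/y.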
Zhangxi Lin
Texas Tech University
ISQS 6339, Data Management & Business Intelligence
Introduction
ISQS 6339, Data Mgmt & BI
Gartner Says Business Intelligence Software Market to Reach $3 Billion in 2009
Gartner's CIO Survey ranked BI as the number one technology priority for 2006
London, UK, 7 February 2006 - New license revenue in the worldwide business intelligence (BI) software market is poised for constant growth through 2009, when the market is projected to reach $3 billion, according to the latest forecasts by Gartner Inc. In 2006, the market is estimated to reach $2.5 billion, a six percent increase from 2005.
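The forecast figures can be sanity-checked with a compound-growth calculation. This is a minimal sketch, assuming the six percent rate quoted for 2005 to 2006 simply continues each year through 2009; the press release does not state the year-by-year rates, and the function name `project` is hypothetical.

```python
def project(base, rate, years):
    """Compound a base value at a fixed annual rate (hypothetical helper)."""
    return base * (1 + rate) ** years

revenue_2006 = 2.5   # $ billions, from the press release
growth = 0.06        # ASSUMPTION: the 2005-2006 rate holds through 2009

# 2005 revenue implied by a six percent rise to $2.5 billion.
implied_2005 = revenue_2006 / (1 + growth)

# Projection for each year from 2006 through 2009.
projection = {2006 + n: project(revenue_2006, growth, n) for n in range(4)}
revenue_2009 = projection[2009]

print(f"Implied 2005: ${implied_2005:.2f}B")
for year, value in projection.items():
    print(f"{year}: ${value:.2f}B")
```

Under that assumption the 2009 figure comes out just under $3 billion, consistent with the headline forecast.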
Description: Looking for professionals in Microsoft Business Intelligence and Data Warehousing who have a proven track record of success within industry. The position requires a broad range of skills and the ability to step into different roles depending on the size and scope of an engagement, both internally and at client sites. The qualified candidate will have proven experience developing successful Microsoft-based Business Intelligence and Data Warehouse solutions.
Requirements:
* 10+ years of experience developing Business Intelligence solutions with Microsoft database, ETL and OLAP technologies (SQL Server, SSIS, Analysis Services)
* Demonstrated understanding of multi-dimensional database design and architecture
* Ability to develop business requirements and translate them into a data warehouse dimensional model
* Demonstrated ability to develop front-end reporting and analytical solutions that meet the business needs
* Microsoft SQL Server data modeling and development (10 years)
* Microsoft SQL Server Analysis Services design and development (5 years)
* Microsoft SQL Server Integration Services (2 years)
* Microsoft SQL Server Reporting Services design and development
* Understanding of Data Warehouse Methodologies, preferably using Kimball Methodology
* Demonstrated leadership aptitude and ability to work effectively within a team environment
SetFocus is seeking professionals with Analyst and/or Data Warehousing backgrounds for Business Intelligence consulting positions across the country. Apply Today: www.setfocus.com/Apply/defaultbi.aspx
Successful candidates have had backgrounds as:
Business Intelligence Analyst, Database Developer, SQL Programmer, Financial Analyst, Business Analyst, System Analyst, Software Developer, Dir. of IT, VP of IT, and/or experience with Cognos, Siebel, SAP, Business Objects, SAS, PeopleSoft, Oracle, MicroStrategy, Information Builders, ProClarity, CA, or Actuate.
1st Generation – Traditional analytics (query and reporting)
2nd Generation – Traditional analytics (OLAP, data warehousing)
2.5 Generation – "New traditional" analytics
3rd Generation – Advanced analytics
    Rules, predictive analytics and real-time data mining
Business Intelligence Classifications

Legacy BI
  Traditional Analytics
    1st Generation Analytics (Query & Reporting)
    2nd Generation Analytics (OLAP, Data Warehousing)

"New Traditional" Analytics
  "2.5-Gen" Analytics (In-Memory OLAP, Search-Based)

3rd-Generation BI
  Advanced Analytics/Optimization
    Rules
    Predictive Analytics
    Real-time and traditional Data Mining
  Stream Analytics*
    Real-time, continuous, sequential analysis (ranging from basic to advanced analytics)

* In lieu of stream analytics, "embedded analytics," although architecturally different, could potentially play the same role.

Source: Bill O’Connell, IBM, Aug 2007
Business Intelligence Use Cases

Categories:
  Traditional Analytics
    1st Generation Analytics (Query & Reporting)
    2nd Generation Analytics (OLAP, Data Warehousing)
  "New Traditional" Analytics
    "2.5-Gen" Analytics (In-Memory OLAP, Search-Based)
  Advanced Analytics/Optimization
    Rules
    Predictive Analytics
    Real-time and traditional Data Mining
  Stream Analytics*
    Real-time, continuous, sequential analysis (ranging from basic to advanced analytics)

* In lieu of stream analytics, "embedded analytics," although architecturally different, could potentially play the same role.

Example Target Solutions:
  Fraud Detection / Risk
  CRM Analytic
  Supply Chain Optimization
  RFID / Spatial Data
  Other High-Volume

Diagram annotations:
  Focus on what is happening RIGHT NOW.
  Focus on what will happen.
  Analytic applications that apply statistical relationships in the form of RULES.
  Focus on what did happen.
  Turning data into information is limited by the relationships which the end-user already knows to look for.
  Data mining to determine why something happened by unearthing relationships that the end-user may not have known existed.
  Real-Time Threshold

Source: Bill O’Connell, IBM, Aug 2007
List of BI tools

No.  Tool                                        Version    Vendor
1.   Oracle Enterprise BI Server                 7.8        Oracle
2.   Business Objects Enterprise                 XI r2      Business Objects (now SAP)
3.   SAP NetWeaver BI                            7.0        SAP
4.   SAS Enterprise BI Server                    9.1.3      SAS Institute
5.   TM/1 & Executive Viewer                     9.1        Applix (now IBM)
6.   BizzScore Suite                             7.2        EFM Software
7.   WebFocus                                    7          Information Builders
8.   Excel, Performance Point, Analysis Server   2007/2005  Microsoft
9.   QlikView                                    8          QlikTech
10.  Microstrategy                               8          Microstrategy
11.  Hyperion System                             9          Hyperion (now Oracle)
12.  Actuate                                     9.1        Actuate
13.  Cognos Series 8                             8.3        Cognos (now IBM)