Week 4 Handout



CIS1000: Tutorial Student Worksheet – Module 4

Multi-choice Practice Questions

1. In the data hierarchy, a byte typically represents a ____.
   a. record
   b. bit
   c. field
   d. character

2. A(n) ____ is a generalized class of people, places, or things for which data is collected, stored, and maintained.
   a. record
   b. entity
   c. attribute
   d. file

3. Data redundancy can lead to problems with ____.
   a. key fields
   b. data integrity
   c. data model range
   d. data domain

4. The database approach to data management requires ____.
   a. at least one flat file
   b. data redundancy
   c. a DBMS
   d. program-data dependence

5. A key advantage of the database approach is ____.
   a. increasing data redundancy
   b. increasing hardware dependence
   c. controlling data redundancy
   d. being able to use the benefits hierarchical databases offer

6. A(n) ____ is a diagram of entities and their relationships.
   a. entity-relationship diagram
   b. data flow diagram
   c. hierarchical model
   d. file diagram

7. A data model that uses basic graphical symbols to show the relationships between data is a(n) ____.
   a. planned redundancy model
   b. entity-relationship diagram
   c. hierarchical data model
   d. network data model

8. As long as tables in a database ____, they can be linked to provide useful information.
   a. share at least one common data attribute
   b. share two or more common data attributes
   c. are projected
   d. have been normalized

9. A ____ allows the database's creator to describe the logical access paths and logical records in a database.
   a. DDL
   b. DML
   c. data model
   d. data dictionary

10. One of the first steps in installing and using a database involves ____.
   a. data cleaning
   b. determining the schema
   c. training users
   d. writing SQL for data manipulation

11. To speed up processing, Pill's Pottery Company sends a copy of important data to each of its five warehouses. At the end of the day, any changes are sent back to the main database. Pill's is using ____.
   a. ODBC
   b. OLAP
   c. replicated databases
   d. data warehouses

12. ____ refers to the degree to which data in any one file is accurate.
   a. Data redundancy
   b. Data scalability
   c. Data integrity
   d. Data volume

13. A ____ is a file that contains a description of a subset of the database and identifies which users can view and modify the data items in that subset.
   a. schema
   b. subschema
   c. flat file
   d. data dictionary

14. ____ is a form of data mining that combines historical data with assumptions about future conditions to predict outcomes of events such as future product sales or the probability that a customer will default on a loan.
   a. Data warehousing
   b. Predictive analysis
   c. Business intelligence
   d. Knowledge management

15. ____ involves combining two or more tables.
   a. Projecting
   b. Selecting
   c. Joining
   d. Synchronizing

16. In an object-oriented database, a(n) ____ is a procedure or action.
   a. method
   b. object
   c. class
   d. message

17. An object-oriented database uses a(n) ____ to provide a user interface and connections to other programs.
   a. SQL
   b. OQL
   c. ORDBMS
   d. OODBMS

18. ____ are usually selected to capture the relevant characteristics of entities such as employees or customers.
   a. Data items
   b. Traits
   c. Attributes
   d. Keys

19. The duplication of data in separate files is known as ____.
   a. data integrity
   b. data redundancy
   c. data distribution
   d. knowledge management

20. The ____ design of a database shows an abstract model of how the data should be structured and arranged to meet an organization’s information needs.
   a. physical
   b. enterprise
   c. entity
   d. logical

Answers

1. ANS: D  REF: 97
2. ANS: B  REF: 97
3. ANS: B  REF: 100
4. ANS: C  REF: 100
5. ANS: C  REF: 100
6. ANS: A  REF: 102
7. ANS: B  REF: 102
8. ANS: A  REF: 104
9. ANS: A  REF: 107
10. ANS: B  REF: 107
11. ANS: C  REF: 122
12. ANS: C  REF: 100
13. ANS: B  REF: 107
14. ANS: B  REF: 119
15. ANS: C  REF: 104
16. ANS: A  REF: 124
17. ANS: D  REF: 124
18. ANS: C  REF: 98
19. ANS: B  REF: 99
20. ANS: D  REF: 102
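Questions 8 and 15 above concern linking tables through a shared attribute. As a rough illustration only, the sketch below uses Python's built-in sqlite3 module to join two tables on a common column. The table names, column names, and sample rows (including the course code MAT1100) are made up for this example and do not come from the textbook.

```python
import sqlite3

# Hypothetical tables and sample data, invented for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE student (student_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE enrolment (student_id INTEGER, course TEXT);
    INSERT INTO student VALUES (1, 'Ana'), (2, 'Ben');
    INSERT INTO enrolment VALUES (1, 'CIS1000'), (2, 'CIS1000'), (2, 'MAT1100');
""")


def courses_by_student(connection):
    """Join the two tables on the attribute they share, student_id."""
    query = """
        SELECT s.name, e.course
        FROM student AS s
        JOIN enrolment AS e ON e.student_id = s.student_id
        ORDER BY s.name, e.course
    """
    return connection.execute(query).fetchall()


rows = courses_by_student(conn)
for name, course in rows:
    print(name, course)
```

Because both tables carry student_id, one query can report each student's name next to their courses. Neither table holds that combined information on its own.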
Review Questions (2, 5, 7, 10 & 11) on pp. 128-129 of Stair & Reynolds:

2) Define the term database. Describe how it is different from a database management system.
_______________________________________________________________
_______________________________________________________________
_______________________________________________________________
_______________________________________________________________
_______________________________________________________________
_______________________________________________________________
_______________________________________________________________
_______________________________________________________________
_______________________________________________________________

5) Identify what data modelling is and its purpose, and briefly describe three (3) commonly used data models.
_______________________________________________________________
_______________________________________________________________
_______________________________________________________________
_______________________________________________________________
_______________________________________________________________
_______________________________________________________________
_______________________________________________________________
_______________________________________________________________
_______________________________________________________________

7) Identify five (5) of the seven (7) important characteristics in selecting a database management system.
_______________________________________________________________
_______________________________________________________________
_______________________________________________________________
_______________________________________________________________
_______________________________________________________________
_______________________________________________________________
_______________________________________________________________
_______________________________________________________________
_______________________________________________________________

10) Describe what a data warehouse is, and discuss how it is different from a traditional database used to support Online Transaction Processing (OLTP).
_______________________________________________________________
_______________________________________________________________
_______________________________________________________________
_______________________________________________________________
_______________________________________________________________
_______________________________________________________________
_______________________________________________________________
_______________________________________________________________
_______________________________________________________________
11) Differentiate between Data Mining and Online Analytical Processing (OLAP).
_______________________________________________________________
_______________________________________________________________
_______________________________________________________________
_______________________________________________________________
_______________________________________________________________
_______________________________________________________________
_______________________________________________________________
_______________________________________________________________
_______________________________________________________________

Theory Activities

A database management system (DBMS) consists of a group of programs that can be used as an interface between a database and the user of the database and application programs. The software acts as a buffer between the application programs and the database itself.

You are asked to illustrate a student registration system. Prepare a diagram that illustrates the USQ student registration system. The student data is stored in a database. The Student Registration system is used to interface the student database with the Student Registration, Student Transcript, and Student Bill application programs. The output from the system will be the student registration report, student transcript, and student bill.

Theory Case Study: An Analytical Approach to Data Mining

The fledgling Institute of Analytics Professionals of Australia is hoping its work on a new data-mining certification will soon distinguish trained professionals from cowboys.

Founded last October, the IAPA has members working with PricewaterhouseCoopers, Ernst & Young, the Australian Tax Office, Telstra, AAPT, banks, insurance companies and universities.

IAPA chairwoman Dr Inna Kolyshkina, who works with PricewaterhouseCoopers' global risk management solutions group, says proper certification should reduce marketplace confusion.

"Someone who is doing OLAP (online analytical processing) is only doing two-dimensional processing," she says. "At best this is data reporting - not analytical - because it's not building a predictive model.

"You want to give something a client can run their data through and have results predicted. This is just one example of people being cowboys in the area."

Kolyshkina says because data mining is all very new - only about two years old - its boundaries aren't yet clearly defined. As a profession, it's only now starting to emerge as "analytics".

So the chairwoman is keen to distinguish those with true data-mining expertise from programmers who simply generate reports, and says there is an order-of-magnitude difference between them.

"Our methods are less dependent on statistical methods," she says. "These say: 'Let's assume we are in an ideal world and a number of assumptions about our data are correct. Then we can use an elegant theory to model it.'

"But with data mining, we are saying 'This is my data. Let's apply some brute-force computing to choose the best possible model from hundreds of thousands of possible combinations.' The computer will then choose one automatically."

This type of search-driven modelling started when companies realised that with widely available high-powered computers they could analyse and detect patterns in large bodies of data.

For example, with customer churn, data mining can predict not only how many people will leave a bank or insurance company, but also which customers are most likely to be at risk. The same methods can be used to trawl through millions of financial transactions to single out fraudsters or even terror suspects.

This is possible because data mining allows researchers to throw a thousand possible factors into an equation, and the data-mining application itself will find the most important ones. Traditional reporting methods choke when considering even 20 variables over large data sets.

The IAPA says likening data mining to traditional reporting is akin to comparing a warplane with a rifle.

"Data mining is the answer to the industry need of managing large data sets," Kolyshkina says. "But it doesn't mean any monkey can press the data-mining button. If a warplane landed in a remote place where the people are uneducated, they wouldn't know how to fly it. But in the hands of a knowledgeable person, it can do much more than a gun."

The IAPA has so far recruited about 50 members - mainly industry specialists, researchers and academics. Only two members are students. But IAPA secretary Eugene Dubossarsky, who works as a senior consultant in Ernst and Young's actuarial business, is bullish about their prospects.

"Data mining as a profession is definitely growing because data is growing," he says. "And data is becoming more and more usable because of data warehousing (where information from many locations can be centrally mined). So the only way is up."
PricewaterhouseCoopers has 50 people working in its data-mining area, and anticipates it will hire more as industry demand increases. IAPA committee member Warwick Graco confirms his employer, the Tax Office, might also soon be hiring.

"The ATO intends having a network of about 30 data miners working with another 70 or so analytics staff, such as OLAP people," he says. "And these numbers do not include those who perform less quantitatively oriented intelligence analysis and risk analysis. The ATO has recognised that it needs skilled staff that can extract value and meaning from its large holdings of data."

The good news is these new data-mining jobs will probably stay in Australia. This is because privacy considerations limit the distribution of data, while an intimate knowledge of business conditions is advantageous to making sense of it.

However, because data mining is a new field in computing, people with formal data-mining qualifications are practically non-existent. One proposal before the IAPA is for a new "Accredited Data Mining Professional" certification to be based on holding a related university degree, in conjunction with demonstrable field experience.

"We want to be very inclusive," Kolyshkina says. "We don't want to make it a power game or use it for leverage. But we want to make sure someone off the street can't claim to be the data miner of the year. There is no degree in data mining, so it's very hard to say who is a data miner and who is not."

To promote better understanding of the field, the institute aims to help students find their way into the industry by providing mentoring.

Critical Thinking Questions

1. Do you think there is a need for a new data-mining certification from the Institute of Analytics Professionals of Australia (IAPA)? Explain your answer.

2. Do you think data mining is a profession that will grow or decrease in the future?

What Would You Do?

You would like to become a data miner with an "Accredited Data Mining Professional" certification. As there are no specific degrees in data mining, you decide to do a Computer Science, Information Technology, or Information Systems degree.

3. What strategy should you put into place to achieve your goal of becoming an accredited data mining professional?

4. Do you think you will need to work overseas, or will there be enough data-mining jobs in Australia?

SOURCE: Eric Wilson, "An analytical approach to data mining", www.smh.com.au, March 9, 2004, accessed from http://www.smh.com.au/articles/2004/03/08/1078594280704.html