ITS632 – Introduction to Data Mining
Instructions: You must show all of your calculations.
Ghost Map
1. (1 point) Write a one-sentence summary of how John Snow used his crude form of data mining to conclude that the Broad Street well was the source of cholera.
Types of data and measurement scales
2. (8 points) Use the four data measurement scales on the left to categorize the descriptions on the right.
a. nominal ____ customer service rating from 1 to 5
b. ordinal ____ gender: male or female
c. interval ____ today’s low temperature is 50°F and today’s high is 75°F
d. ratio ____ hair color such as black, brown, red
____ he is 6 feet tall
____ pain level from 1 to 10
____ average age in the course is 24.3
____ he ran the mile in exactly 4 minutes
Scatter diagram
3. (1 point) Using one sentence, explain the correlation between the number of beach visitors and the average daily temperature.
4. (3 points) Gini Index
Use the following table to calculate your answers to the three questions below.
a. What is the Gini Index for Home Owners?
b. What is the Gini Index for non-Home Owners?
c. Compute the weighted average for the Home Owner type.
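The three Gini calculations above can be sketched in Python as follows. This is a minimal illustration, assuming the usual definition Gini = 1 - sum(p_i^2) over the class proportions in each group. The assignment's table is not reproduced here, so the counts below are hypothetical placeholders, and the names `gini` and `weighted_gini` are illustrative only.

```python
def gini(counts):
    """Gini index of one group from its class counts: 1 - sum(p_i^2)."""
    total = sum(counts)
    if total == 0:
        return 0.0  # an empty group is treated as pure
    return 1.0 - sum((c / total) ** 2 for c in counts)


def weighted_gini(groups):
    """Average of the group Gini values, each weighted by its group size."""
    n = sum(sum(g) for g in groups)
    return sum((sum(g) / n) * gini(g) for g in groups)


# Hypothetical class counts per group, NOT taken from the assignment table.
home_owners = [0, 3]   # e.g. 0 of one class, 3 of the other
non_owners = [3, 4]    # e.g. 3 of one class, 4 of the other

print(round(gini(home_owners), 4))
print(round(gini(non_owners), 4))
print(round(weighted_gini([home_owners, non_owners]), 4))
```

Substitute the counts from the actual table for the placeholder lists to check a hand calculation.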
5. (1 point) Bayes Theorem
Probability of a dangerous fire = 1%
Probability of smoke (fairly common, mainly due to barbecues) = 10%
Probability of smoke when there is a dangerous fire = 90%
Calculate the probability of a dangerous fire when there is smoke.
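A short Python sketch of the Bayes calculation is given here as an illustration. It assumes the 90% figure is the probability of smoke given a dangerous fire, P(Smoke | Fire), so that P(Fire | Smoke) = P(Fire) * P(Smoke | Fire) / P(Smoke). The function name `bayes_posterior` is hypothetical.

```python
def bayes_posterior(prior, likelihood, evidence):
    """Bayes' theorem: P(A|B) = P(A) * P(B|A) / P(B)."""
    if evidence <= 0:
        raise ValueError("evidence probability must be positive")
    return prior * likelihood / evidence


p_fire = 0.01              # P(Fire): dangerous fires are rare
p_smoke = 0.10             # P(Smoke): smoke is fairly common
p_smoke_given_fire = 0.90  # P(Smoke | Fire): assumed reading of the 90% figure

p_fire_given_smoke = bayes_posterior(p_fire, p_smoke_given_fire, p_smoke)
print(f"P(Fire | Smoke) = {p_fire_given_smoke:.2%}")
```

Under that assumption the posterior works out to 0.01 * 0.90 / 0.10 = 0.09.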
6. (6 points) Decision Trees
a. Examine the following dataset. If a datapoint with an x coordinate = 3 is added, what color would the datapoint be?
b. Given the following dataset, write rules for each color of datapoints.
1) green datapoints
2) red datapoints
3) blue datapoints
c. Calculate the Gini impurities for the following imperfect split.
1) Left =
2) Right =
Graduate Management Research Paper: Grading Rubric
Dr. William Dean
(Grading Guide)
Student Name: Title:
Date: Scoring: _______ out of 100 points
Criteria
Exemplary
Good
Acceptable
Unacceptable
Cover Sheet/Paper Header
Exemplary (5 points): Title, your name, professor’s name, class, date, no errors
Good (4 points): Evidence of 4
Acceptable (3 points): Evidence of 3
Unacceptable (2 points or less): Evidence of 2 or less
Introduction:
This section provides a history of the topic and associated research. It should clearly state the purpose of the paper, including the thesis statement.
Exemplary (15 points): Clearly and concisely states the purpose. Is engaging and relevant. Previews the structure of the paper. The thesis statement is clearly stated. No errors.
Good (10 points): States the purpose of the paper clearly. Minimal errors.
Acceptable (7 points): States the main topic but is not engaging and does not outline the structure.
Unacceptable (5 points or less): Does not state the purpose. More than 5 grammatical errors. Unclear on all fronts.
Development of main points
This will include itemized arguments with points in support and rebuttal of thesis statement.
This section includes a focus on creative thinking, thoroughness of thought, and is relevant to the stated research paper thesis.
Exemplary (30 points):
+ Outlines an idea. Has clear, thoughtful facts or arguments to support idea and topic of paper. No errors.
+ Each paragraph has thoughtful supporting detail sentences that develop the main idea both for and against thesis statement
+ Section headings used
Good (20 points): Each paragraph has some sufficient detail. Minimal errors.
+ Each paragraph has sufficient supporting detail sentences that develop the main idea.
Acceptable (15 points): Paragraph lacks details. Writing unclear. Some errors.
+ Each paragraph lacks supporting detail sentences.
Unacceptable (10 points or less): No evidence to support points, many errors.
+ Each paragraph fails to develop the main idea.
Conclusion
This section provides a substantive summary of the key points outlined in the paper and completes the thesis statement’s intent.
Exemplary (20 points): The conclusion is engaging and restates personal learning. Organized and draws facts together in a coherent way. No errors. Is more than one paragraph.
Good (15 points): The conclusion restates the purpose. Minimal errors.
Acceptable (10 points): The conclusion does not adequately restate the purpose.
Unacceptable (0 points): No evidence of a conclusion.
Works Cited
+ 10 required
Will be either peer-reviewed or PDF-formatted articles. Books used at a minimum and in addition to articles.
Exemplary (10 points): Done in MLA and with no errors. Has at least 10 good references. Includes single-spaced citations and double spacing between sources.
Good (7 points): Done in correct format with some evidence of format error. Has at least 7 references.
Acceptable (5 points): Not in correct format. Fewer than 7 references.
Unacceptable (0 points): No bibliography.
Citations
+ MLA format
+ 15 minimum
Exemplary (10 points): All main ideas are cited in the text. MLA is utilized correctly with no errors. Citations include author(s) and page number.
Good (7 points): Some cited works. Errors in consistency.
Acceptable (5 points): Few cited works are done in the correct format.
Unacceptable (0 points): No citations.
Grammar/Spelling/Punctuation
Exemplary (5 points): No errors in spelling, sentence structure, etc.
Good (4 points): Limited errors
Acceptable (3 points): Many errors
Unacceptable (0 points): Can’t read
General review
+ Submitted electronic version of document by due date of paper.
+ Word choice is generally good. The writer often goes beyond the generic word to find one more precise and effective. Meets assignment length of 12-14 pages of content.
Exemplary: 5 points
Good: 4 points
Acceptable: 3 points
Unacceptable: 0 points