Dr. LaMar D. Brown PhD, MBA
Executive MSIT
University of the Cumberlands
Course: 2019-SPR-IG-ITS530-21: 2019_SPR_IG_Analyzing and Visualizing Data_21
Chapter Readings Reflections Journal
Chapter 1: Defining Data Visualization
Summary
In Chapter 1, the author, Mr. Kirk, describes the concept of data visualization, defining it as the visual analysis and communication of data. The chapter also surveys the historical background and definitions of data visualization offered by various other authors.
The book also breaks the definition into its component parts. Identifying the type of data that needs to be visually analyzed is an important step before the data is subjected to further processing and, ultimately, visualization.
Mr. Kirk also emphasizes the significance of both the art and the science of data visualization. The art makes data analysis an engaging technical and analytical read, encouraging the use of human perception to make decisions with the assistance of visual forms such as graphs and pie charts. The science implies truth, evidence, and rules that govern the process of visualizing a set of data, which can be essential in determining the path of an enterprise or organization.
Highlights:
Upon reading Chapter 1 of this book, which goes in depth into data visualization, I was able to grasp essential technical and analytical definitions, and I found them quite telling about the importance of the concept and the visual representation of those definitions. The citations were a key indicator that data visualization can be defined in various ways and can drive technical improvements when used in a way that benefits all parties.
Ideas and thoughts:
The chapter was a thorough analysis of the concept. However, I was also looking for live examples of visual tools or analysis results in this defining part of the book. The big positive is the framing of visualization as both science and art, which can be applied in day-to-day activities to introduce data visualization in any area and to support decisions that set a trend for the growth of an organization. In terms of the course, it was a great read for this review journal and should provide a firm base for the things to come.
Application:
The concept of data visualization can be applied in my current work environment. As IT personnel, I deal with network infrastructure and constantly encounter large chunks of data that need to be analyzed for usage statistics, bandwidth, and performance, and to justify the choice of hardware or software. Monitoring tools such as NetFlow help us verify bandwidth over-utilization or under-utilization before troubleshooting any related issues, and the concept of data visualization can be applied here ...
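As a sketch of how such monitoring data might be prepared for visualization, the following hypothetical Python snippet summarizes NetFlow-style bandwidth samples and flags over- and under-utilization. The link capacity, thresholds, and sample values are illustrative assumptions, not taken from any real deployment or from the NetFlow tooling itself:

```python
# Hypothetical example: summarizing NetFlow-style bandwidth samples
# before visualizing them. Capacity, thresholds, and data are
# illustrative assumptions only.

LINK_CAPACITY_MBPS = 1000  # assumed 1 Gbps link

def utilization_report(samples_mbps, capacity=LINK_CAPACITY_MBPS,
                       over=0.8, under=0.1):
    """Classify each bandwidth sample as OVER, UNDER, or NORMAL."""
    report = []
    for mbps in samples_mbps:
        ratio = mbps / capacity
        if ratio >= over:
            label = "OVER"
        elif ratio <= under:
            label = "UNDER"
        else:
            label = "NORMAL"
        report.append((mbps, round(ratio * 100, 1), label))
    return report

samples = [950, 40, 420, 880]
for mbps, pct, label in utilization_report(samples):
    print(f"{mbps:>4} Mbps  {pct:>5}%  {label}")
```

A tabular intermediate like this is exactly the kind of summary that a bar chart or time-series plot would then visualize for decision makers.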
Pang-Ning Tan, Michigan State University
Michael Steinbach, University of Minnesota
Vipin Kumar, University of Minnesota
and Army High Performance Computing Research Center
Boston  San Francisco  New York
London  Toronto  Sydney  Tokyo  Singapore  Madrid
Mexico City  Munich  Paris  Cape Town  Hong Kong  Montreal
If you purchased this book within the United States or Canada you should be aware that it has been wrongfully imported without the approval of the Publisher or the Author.
Acquisitions Editor  Matt Goldstein
Project Editor  Katherine Harutunian
Production Supervisor  Marilyn Lloyd
Production Services  Paul C. Anagnostopoulos of Windfall Software
Marketing Manager  Michelle Brown
Copyeditor  Kathy Smith
Proofreader  Jennifer McClain
Technical Illustration  George Nichols
Cover Design Supervisor  Joyce Cosentino Wells
Cover Design  Night & Day Design
Cover Image  © 2005 Rob Casey/Brand X Pictures
Prepress and Manufacturing  Caroline Fell
Printer  Hamilton Printing
Access the latest information about Addison-Wesley titles from our World Wide Web site:
http://www.aw-bc.com/computing
Many of the designations used by manufacturers and sellers to distinguish their products are claimed as trademarks. Where those designations appear in this book, and Addison-Wesley was aware of a trademark claim, the designations have been printed in initial caps or all caps.
The programs and applications presented in this book have been included for their instructional value. They have been tested with care, but are not guaranteed for any particular purpose. The publisher does not offer any warranties or representations, nor does it accept any liabilities with respect to the programs or applications.
Copyright © 2006 by Pearson Education, Inc.
For information on obtaining permission for use of material in this work, please submit a written request to Pearson Education, Inc., Rights and Contract Department, 75 Arlington Street, Suite 300, Boston, MA 02116 or fax your request to (617) 848-7047.
All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording, or any other media embodiments now known or hereafter to become known, without the prior written permission of the publisher. Printed in the United States of America.
ISBN 0-321-42052-7
2 3 4 5 6 7 8 9 10-HAM-08 07 06
To our families
Preface
Advances in data generation and collection are producing data sets of massive size in commerce and a variety of scientific disciplines. Data warehouses store details of the sales and operations of businesses, Earth-orbiting satellites beam high-resolution ima ...
Demonstration of core knowledge
The following section demonstrates the core knowledge showing that I am qualified to graduate from the Mechanical Engineering graduate program.
This section will focus on two different fields:
· Material properties and Selection
· Simulation of processes
In the Material Properties and Selection field, the main task is to identify the properties of a material that meet the requirements of a design. This is the early step a mechanical engineer takes when selecting a material for a manufactured product, which means that by obtaining this knowledge I am capable of applying what I learned to help design a product for a company. For example, the core knowledge I obtained in one of my graduate classes demonstrates this field. The final project of the class, shown in Fig. 1, follows the standard early-design procedure used in manufacturing industries. The result of the project shows that I am capable of using a trade-off plot that includes several factors (density, Young's modulus, yield strength, and cost) to identify the material that meets the constraints and objectives of the design. Moreover, understanding the definition of each material property and its corresponding limitations, such as how density affects mass and volume, and how yield strength indicates the limit of elastic behavior, is both fundamental and a requirement for a master's student in mechanical engineering.
Fig. 1 Material Selection Trade-off Plot
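The kind of trade-off such a plot captures can also be sketched numerically. The sketch below ranks candidate materials by the classic stiffness-to-weight material index E^(1/2)/ρ used for light, stiff beams; the property values are rough, illustrative figures, not design data:

```python
import math

# Illustrative material data: (name, density kg/m^3, Young's modulus GPa,
# cost $/kg). Values are rough textbook-style figures, not design data.
materials = [
    ("Steel",    7850, 200,  0.8),
    ("Aluminum", 2700,  70,  2.0),
    ("CFRP",     1600, 150, 40.0),
]

def stiffness_index(density, modulus_gpa):
    """Material index E^(1/2)/rho for a light, stiff beam (higher is better)."""
    return math.sqrt(modulus_gpa * 1e9) / density

# Rank by stiffness-to-weight; cost remains a separate trade-off axis.
ranked = sorted(materials,
                key=lambda m: stiffness_index(m[1], m[2]),
                reverse=True)
for name, rho, E, cost in ranked:
    print(f"{name:<9} index={stiffness_index(rho, E):.1f}  cost=${cost}/kg")
```

The ranking makes the trade-off explicit: the stiffest-per-kilogram candidate is also the most expensive, which is exactly the tension a trade-off plot visualizes.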
For the second field, Simulation of Processes, it is indispensable to run a simulation before any complex or costly manufacturing process. Not only can errors be predicted from the simulation results, but so can the overall quality of the end product. For example, the casting process for an impeller, a rotor with blades used to increase the pressure and/or flow of a fluid, is challenging and prone to failure. However, by simulating the process, as shown in Figs. 2 and 3, failure of the casting can be predicted by identifying the location of the maximum principal stress, perpendicular to which crack growth will occur, and the maximum normal stress, where failure will occur, in order to improve the actual casting process and prevent failure.
Fig. 2 Identify the maximum principal stress
Fig. 3 Identify the maximum normal stress
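The principal stresses that such a simulation locates can, for a simple plane-stress state, be computed in closed form from sigma_1,2 = (sigma_x + sigma_y)/2 ± sqrt(((sigma_x − sigma_y)/2)² + tau_xy²). A minimal sketch with made-up stress components (in MPa):

```python
import math

def principal_stresses(sx, sy, txy):
    """Closed-form principal stresses for a 2D (plane) stress state.

    sigma_1,2 = (sx + sy)/2 +/- sqrt(((sx - sy)/2)**2 + txy**2)
    """
    center = (sx + sy) / 2
    radius = math.sqrt(((sx - sy) / 2) ** 2 + txy ** 2)
    return center + radius, center - radius

# Made-up stress components in MPa, for illustration only
s1, s2 = principal_stresses(sx=80.0, sy=20.0, txy=40.0)
print(f"sigma_1 = {s1:.1f} MPa, sigma_2 = {s2:.1f} MPa")
# prints: sigma_1 = 100.0 MPa, sigma_2 = 0.0 MPa
```

A full casting simulation works on the 3D stress tensor at every element, but the plane-stress formula shows the underlying idea: the extreme normal stresses, and hence the likely crack location and orientation, follow directly from the stress components.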
Both fields listed above, Material Properties and Selection and Simulation of Processes, demonstrate the essential core knowledge that I obtained while studying for a master's in mechanical engineering. The first enables me to determine which material is most suitable for a product, which allows me to work as a design engineer. The latter helps me simulate manufacturing processes, which can also help me work as a process engineer in the future.
A Guide for Writing a Technical Research Paper
Libby Shoop
Macalester College, Mathematics and Computer Science Department
1 Introduction
Creating a Use Case
Jennifer LeClair
CIS 510
Instructor Name: Dr. Austin Umezurike
10/27/2016
Assignment 2:
Creating a Use Case
Introduction
In this paper I will show how a use case diagram should be used. I base this paper on Fig. 3-11, pages 78-80, of our textbook, Systems Analysis and Design in a Changing World, 6th edition, by Satzinger, Jackson, and Burd. In the use case diagram that I create, I will depict a use case for an RMO CSMS subsystem. I will also describe an overview of the diagram and provide an analysis of the actors.
Use Case Introduction
An activity that a system performs is known as a use case. It is mostly in response to the
user. Use case analysis is a technique that is used for identifying the functional requirements of
the software system. A use case is to designate the point of view from a client and customer, this
is a use cases main purpose. An analytical role in the development process is done by the
developer. The other definition of a use case is as an objective or as an actor. Actors are with a
particular system and they want to achieve. In the use case diagram that I create, I will show the
actors and use cases for the RMO CSMS subsystem for marketing.
[Diagram: RMO CSMS Marketing Subsystem, with actors Marketing and Merchandising]
Overview
This use case diagram shows the system boundary, the associations, and the actors. An actor, also called an external agent, is the person or group that interacts with the system by entering or receiving data. The system boundary is another part of the whole: it encloses the computerized part of the application along with the users who operate it. An association is a logical relationship between elements, such as the connection between a customer placing an order and the employee in a department who handles it. In my diagram I have included two actors, one representing marketing and the other representing merchandising.
Analysis
A use case is a list of actions or steps defining the interactions between a role and a system in order to achieve a goal. An association is the element of a use case diagram that connects a use case with its actors; it tells us that there is communication between the actors and the use case. On the marketing side, the actors need to be able to update and add promotions, products, and business partners. On the merchandising side, they need to be able to update and add product information and accessory packages.
Summary
The important part of a use case diagram is that you can identi ...
PJM6610 Foundations of Project Business Analysis
Prof. Johan Roos
Signature Assignment 1
Planning for Elicitation Assignment
Signature Assignment: Planning for Elicitation
By Group:
Mustafa Uzun, Shraddha Sherekar, Vikitha Veera
Contents
1. An overview
2. Elicitation plan
3. Project plan
4. References
1. An Overview
Skype has a substantial market share (and mindshare), and many people use it daily, yet nearly every core component of the program is seen as out of date. The Skype corporation has been operating online for more than 20 years, and it has grown its subscriber base by spreading the word about its ability to carry audio and video conversations over the internet instead of over the phone.
Surveys, focus groups with observation, and questionnaires floated to clients who have used the product at least once are the best ways to learn about the present state of the business and, consequently, the main product offering. They can be very helpful for identifying the target audience and for providing useful inputs that could help define a future state for the product. Other sources include data obtained from online surveys through the various e-commerce platforms with which the company has partnerships, data from social media channels, and data from websites. Users can provide insightful information that will serve as clear prompts for the company's R&D team as it plots the course for upcoming innovations or enhancements to current products.
Product evaluations provided by customers and influencer marketing are another crucial source that may help the business discover what consumers like and dislike about the product, as well as how they perceive its value and quality. The basic problem that the Skype team must overcome can be understood through root cause and opportunity analysis. This knowledge, together with data from actual surveys and website visitors, can be used to understand the present situation of the product and the business.
2. Elicitation Plan
Elicitation Techniques:
1. Survey/Questionnaire
Stakeholders, including end users, are presented with a series of questions via a survey or questionnaire to help quantify their opinions. After the responses are gathered, the data is evaluated to determine the stakeholders' areas of focus that need improvement. Questions should be based on high-priority risks, and direct, clear questions are best. Closed-ended questions will help us focus on areas that we know need improvement, while open-ended ones will help us understand what we may have overlooked.
Advantage:
The benefit of following this process is that data from a broad audience is simple to obtain, and the time taken to receive participants' re ...
Data analytics and data science have been in fast-forward mode recently. We see many companies hiring people for data analysis and data science, especially in India, and many recruiting firms use Stack Overflow to fish for potential candidates. The industry has also started to recruit people based on shapes of expertise: a person's expertise is metaphorically outlined by the shapes of letters such as I, T, M, and hyphen, based on her experience in an area (depth) and the number of areas of interest (width). This proposal builds upon work on mining shapes of user expertise in a typical online social question-and-answer (Q&A) community, where expert users often answer questions posed by other users. We deal with the temporal analysis of expertise among Q&A community users, in terms of how the user/expert has evolved over time.
Keywords— Shapes of expertise, Graph communities, Expertise
evolution, Q&A community
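The letter-shape metaphor can be made concrete with a small, hypothetical classifier: given per-topic answer counts, call a topic "deep" above some threshold and name the shape from how many deep topics a user has. The thresholds and naming rules below are illustrative assumptions, not the method of any cited work:

```python
# Hypothetical sketch of the letter-shape metaphor for expertise.
# Thresholds and naming rules are illustrative assumptions only.

DEEP = 50  # answers in a topic above which we call expertise "deep"

def expertise_shape(topic_counts):
    """Classify a user's expertise shape from per-topic answer counts.

    "-" : breadth only, no deep topic
    "I" : one deep topic, little breadth
    "T" : one deep topic plus broad shallow activity
    "M" : two or more deep topics
    """
    deep = [c for c in topic_counts.values() if c >= DEEP]
    shallow = [c for c in topic_counts.values() if 0 < c < DEEP]
    if len(deep) >= 2:
        return "M"
    if len(deep) == 1:
        return "T" if len(shallow) >= 3 else "I"
    return "-"

user = {"python": 120, "sql": 8, "linux": 5, "git": 3}
print(expertise_shape(user))  # one deep topic plus broad shallow activity
```

A temporal analysis would then apply a classifier like this to a user's answer counts in successive time windows and track how the shape changes, for example from "I" to "T".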
DAT 520 Final Project Guidelines and Rubric
Overview
You must complete a decision analysis research project as your final project for this course. Your research project will focus on a real-world topic of your choice,
as approved by your instructor. You will pick a topic from the list provided or with approval from your instructor, and create a data analysis plan and decision
tree model based on a real-world scenario. This assessment will provide you with the opportunity to employ highly valued decision support skills and concepts
for data within a real-world context. You can use the Final Project Notes document, found in the Assignment Guidelines and Rubrics section of the course.
The project is divided into three milestones, which will be submitted at various points throughout the course to scaffold learning and ensure quality final
submissions. These milestones will be submitted in Modules Two, Five, and Seven. The final submission will occur in Module Nine.
This project will address the following course outcomes:
· Appraise data in context according to industry-standard methods and techniques for its utility in supporting decision making
· Determine suitable data manipulation and modeling methods for decision support
· Articulate data frameworks for organizational decision support by applying data manipulation, modeling, and management concepts
· Evaluate the ethical issues surrounding organizational use of decision-oriented data based on industry standards and one's personal ethical criteria
· Create and assess the agility of solutions through application of data-mining procedures for decision support in various industries
Prompt
Your decision analysis model and report should answer the following prompt: How does your model and evaluation resolve uncertainty in making a decision? In
order to produce your analytic report, you will need to choose and investigate a data set using the decision analysis techniques you learned in class. Then you
will formulate a research question, write an analytic plan, and implement it. Your report should not solely consist of descriptions of what you did. It should also
contain detailed explorations into the meaning behind your model and the implications of its results. You will also be testing your model’s fitness and evaluating
its strengths and weaknesses.
The project in a nutshell:
1. Choose a data set (get ideas from the source list in the spreadsheet Final Project Topics and Sources.xls)
2. Formulate your decision analysis research question
3. Write an analytic plan
4. Perform the top-down or bottom-up modeling
5. Perform model diagnostics
6. Evaluate
These activities are broken up into milestones so that the work is spread throughout the term and you can get early assistance with any obstacles.
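The rubric calls for a decision tree model. As a rough, hypothetical illustration of the underlying computation (not the required deliverable or any particular course tool), a small decision tree can be folded into an expected monetary value like this; all payoffs and probabilities are invented:

```python
# Illustrative sketch: evaluating a small decision tree by expected value.
# Node structure, payoffs, and probabilities are all made-up examples.
def expected_value(node):
    """Recursively fold a decision/chance tree into a single value."""
    kind = node["type"]
    if kind == "payoff":
        return node["value"]
    if kind == "chance":                       # weighted average of outcomes
        return sum(p * expected_value(child)
                   for p, child in node["branches"])
    if kind == "decision":                     # choose the best alternative
        return max(expected_value(child) for child in node["options"])

launch = {"type": "chance", "branches": [
    (0.6, {"type": "payoff", "value": 100_000}),
    (0.4, {"type": "payoff", "value": -30_000}),
]}
hold = {"type": "payoff", "value": 0}
tree = {"type": "decision", "options": [launch, hold]}
print(expected_value(tree))  # 48000.0
```

Model diagnostics (step 5) would then probe how sensitive this choice is to the assumed probabilities.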
A decision analysis report is similar to any other analytic report. These reports introduce a problem, state a line of inquiry, explain a model th.
Data Analytics in Industry Verticals, Data Analytics Lifecycle, Challenges of...
Banking and securities
Challenges
Early warning for securities fraud and trade visibilities
Card fraud detection and audit trails
Enterprise credit risk reporting
Customer data transformation and analytics.
The Securities and Exchange Commission (SEC) is using big data to monitor financial market activity through network analytics and natural language processing. This helps catch illegal trading activity in the financial markets.
The Data Analytics Lifecycle is designed specifically for Big Data problems and data science projects. The lifecycle has six phases, and project work can occur in several phases at once. For most phases in the lifecycle, the movement can be either forward or backward. This iterative depiction of the lifecycle is intended to more closely portray a real project, in which aspects of the project move forward and may return to earlier stages as new information is uncovered and team members learn more about various stages of the project. This enables participants to move iteratively through the process and drive toward operationalizing the project work.
Phase 1—Discovery: In Phase 1, the team learns the business domain, including relevant history such as whether the organization or business unit has attempted similar projects in the past from which they can learn. The team assesses the resources available to support the project in terms of people, technology, time, and data. Important activities in this phase include framing the business problem as an analytics challenge that can be addressed in subsequent phases and formulating initial hypotheses (IHs) to test and begin learning the data.
Phase 2—Data preparation: Phase 2 requires the presence of an analytic sandbox, in which the team can work with data and perform analytics for the duration of the project. The team needs to execute extract, load, and transform (ELT) or extract, transform and load (ETL) to get data into the sandbox. The ELT and ETL are sometimes abbreviated as ETLT. Data should be transformed in the ETLT process so the team can work with it and analyze it. In this phase, the team also needs to familiarize itself with the data thoroughly and take steps to condition the data.
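As a minimal sketch of the ETLT idea described above, here is a toy extract-transform-load pass into an in-memory SQLite "sandbox". The table name, columns, and data are invented for illustration:

```python
# Minimal ETL sketch: extract raw records, transform (condition) them,
# and load into an analytic "sandbox" (an in-memory SQLite database).
import sqlite3

# Extract: raw records as they might arrive from a source system
raw = [("2019-01-03", " 120.5 "), ("2019-01-04", "bad"), ("2019-01-05", "98")]

def transform(rows):
    """Condition the data: trim whitespace, drop unparsable amounts."""
    for date, amount in rows:
        try:
            yield date, float(amount.strip())
        except ValueError:
            continue  # in practice, log rejected rows for later review

# Load: write the conditioned rows into the sandbox for analysis
sandbox = sqlite3.connect(":memory:")
sandbox.execute("CREATE TABLE txns (date TEXT, amount REAL)")
sandbox.executemany("INSERT INTO txns VALUES (?, ?)", transform(raw))
print(sandbox.execute("SELECT COUNT(*), SUM(amount) FROM txns").fetchone())
# (2, 218.5)
```

In an ELT variant the raw rows would be loaded first and the conditioning done inside the sandbox with SQL.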
Toward a System Building Agenda for Data Integration
(and Data Science)
AnHai Doan, Pradap Konda, Paul Suganthan G.C., Adel Ardalan, Jeffrey R. Ballard, Sanjib Das,
Yash Govind, Han Li, Philip Martinkus, Sidharth Mudgal, Erik Paulson, Haojun Zhang
University of Wisconsin-Madison
Abstract
We argue that the data integration (DI) community should devote far more effort to building systems,
in order to truly advance the field. We discuss the limitations of current DI systems, and point out that
there is already an existing popular DI “system” out there, which is PyData, the open-source ecosystem
of 138,000+ interoperable Python packages. We argue that rather than building isolated monolithic DI
systems, we should consider extending this PyData “system”, by developing more Python packages that
solve DI problems for the users of PyData. We discuss how extending PyData enables us to pursue an
integrated agenda of research, system development, education, and outreach in DI, which in turn can
position our community to become a key player in data science. Finally, we discuss ongoing work at
Wisconsin, which suggests that this agenda is highly promising and raises many interesting challenges.
1 Introduction
In this paper we focus on data integration (DI), broadly interpreted as covering all major data preparation steps
such as data extraction, exploration, profiling, cleaning, matching, and merging [10]. This topic is also known
as data wrangling, munging, curation, unification, fusion, preparation, and more. Over the past few decades, DI
has received much attention (e.g., [37, 29, 31, 20, 34, 33, 6, 17, 39, 22, 23, 5, 8, 36, 15, 35, 4, 25, 38, 26, 32, 19,
2, 12, 11, 16, 2, 3]). Today, as data science grows, DI is receiving even more attention. This is because many
data science applications must first perform DI to combine the raw data from multiple sources, before analysis
can be carried out to extract insights.
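To make the argument concrete, here is a toy version of one DI step the paper lists (matching and merging records across sources on a normalized key), written in plain Python. PyData packages such as pandas generalize this pattern; the data and names below are invented:

```python
# Toy data-integration step: merge two source tables whose keys differ
# only in case and whitespace. Real DI tools generalize this matching.
def normalize(name):
    """Canonicalize a key: lowercase, collapse internal whitespace."""
    return " ".join(name.lower().split())

crm = {"Acme Corp": "gold", " acme  corp ": "gold", "Beta LLC": "silver"}
billing = {"ACME CORP": 1200, "Beta llc": 300}

merged = {}
for name, tier in crm.items():
    merged.setdefault(normalize(name), {})["tier"] = tier
for name, amount in billing.items():
    merged.setdefault(normalize(name), {})["billed"] = amount

print(merged["acme corp"])  # {'tier': 'gold', 'billed': 1200}
```

The paper's point is that packaging such steps as interoperable Python libraries, rather than monolithic systems, is what has made PyData a de facto DI "system".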
Yet despite all this attention, today we do not really know whether the field is making good progress. The
vast majority of DI works (with the exception of efforts such as Tamr and Trifacta [36, 15]) have focused on
developing algorithmic solutions. But we know very little about whether these (ever-more-complex) algorithms
are indeed useful in practice. The field has also built mostly isolated system prototypes, which are hard to use and
combine, and are often not powerful enough for real-world applications. This makes it difficult to decide what
to teach in DI classes. Teaching complex DI algorithms and asking students to do projects using our prototype
systems can train them well for doing DI research, but are not likely to train them well for solving real-world DI
problems in later jobs. Similarly, outreach to real users (e.g., domain scientists) is difficult. Given that we have
Copyright 0000 IEEE. Personal use of this material is permitted. However, permission to reprint/republish this material for
advertising or promotional purpose ...
Running head: SAMPLE CASE STUDY 1
Ima Student
MGT 450
Sample Case Study: Siebel Systems
Professor Amazing
December 15, 2008
Sample Case Study: Siebel Systems
Siebel Systems faced several problems at the time of this article.
Primarily, corporate software customers are looking for integrated “suites” of
software applications while Siebel offers only one application: customer
relationship management (CRM) software (“Siebel”). To solve this problem and
to regain a corner of the corporate software market, Siebel Systems and its
CEO and owner, Tom Siebel, will have to relinquish the idea of “doing one
thing really well” (Kerstetter, 2003, p. 2). In order to grow and expand, Siebel
Systems needs to diversify software applications and integrate the applications
that corporations seek into one system.
The introduction identifies the central problem.
Thesis statement is located at the end of the introduction.
Indeed, corporate software customers want integrated, user-friendly,
and cost-effective software systems. Applications for financial data, corporate
planning, and human resources (Kerstetter, 2003), as well as what Siebel
currently offers, CRM, are in demand. While Siebel should consider
modifying its software for manageability or even integrating with rival
programs, this is not a long-term solution for the company. Nevertheless,
Siebel Systems will continue to shrink and elicit poor customer satisfaction if
it cannot create, buy, or partner with other software applications. Siebel
should develop a strong suite of software applications quickly before it
exhausts revenues and loses its current clientele.
The background section is made up of important general facts.
As a short-term solution, Siebel Systems should work with IBM and
Microsoft on creating one version of Siebel’s new product line that is
compatible with both platforms, saving the companies the $550 million for
Here, the writer begins to develop the proposed solution.
two versions that are in the works (Kerstetter, 2003). However, a long-term
solution involves Siebel’s creating, buying, or merging with other software
companies until an integrated, user-friendly suite of applications has been
developed. Once this has been achieved, Siebel Systems must provide
customer service that is geared toward problem avoidance rather than problem
patches and offer upgrade packages that are cost effective, relevant, and easily
implemented.
These modifications should yield a high return on Siebel’s investment.
Although the cost of acquiring additional applications is potentially greater
than the co.
JW House Fundraiser
Journey Through the Enchanted Forest Gala
Silent Auction
Table Decor
Specialized cocktails for Event
Three Screens will be Placed for Optimum Viewing by all Attendees
New House Announcement
Happy 30th Birthday, JW!
Auction
Aisle Down Center Allows Fundraising Auctioneer to Engage Audience
Balloon Drop
S’mores Sponsored by Largest Corporate Donor
Finish the Evening with Dancing & Beverages
Image Sources
http://springfields.net.au/media/catalog/category/_2_43.png
https://s-media-cache-ak0.pinimg.com/originals/36/fa/fe/36fafee1408521530bfa23368e604d55.jpg
https://www.thegirlcreative.com
http://ballooncity.com/wp-content/uploads/2013/09/danceFloorFlipPNG.png
https://t3.rbxcdn.com/ea203ae8bb1787569f5e375cde0a93b2
http://jwhouse.org/wp-content/uploads/2017/07/jwPortraitStory.jpg
http://royalcandycompany.com/wp/wp-content/uploads/2016/04/Smores-Buffet.jpg
https://lhueagleeye.files.wordpress.com/2015/11/crowd_20080505124150.jpg
www.socialtables.com
https://media-cdn.tripadvisor.com/media/photo-s/03/0d/c8/a7/santa-clara-convention.jpg
https://s3.amazonaws.com/assets.winspireme.com/LPP/Buy-it-Now-Logo.png
http://www.tastefultreats.com/media/catalog/product/cache/1/image/9df78eab33525d08d6e5fb8d27136e95/g/o/gourmet-kosher-sweets-gift-basket.png
https://vignette.wikia.nocookie.net/piratesonline/images/b/b3/Chest.png/revision/latest?cb=20090707201032
http://pngimg.com/uploads/question_mark/question_mark_PNG126.png
1. INTRODUCTION. Begin by stating what you will discuss and explain why it is important.
2. CRITICAL SUMMARY. Summarize the relevant views and the arguments that you believe are important.
Usually in a critical discussion it is not sufficient to merely summarize the author’s view. Your attention should be
focused on the author's development of the view--that is, on his arguments, in the broadest sense of the word.
3. CARE IN CITATIONS. Make sure you accurately state the position of the author and always include page
references for each quotation or attribution to her/him if applicable.
4. CRITICAL EVALUATION FROM A CHRISTIAN PERSPECTIVE. At least half of your paper must be devoted
to a critical evaluation of the views of the author you are discussing from the perspective of the Christian thesis that
a Christian call in business may prop-up the role of the markets.
5. CONSIDER POSSIBLE RESPONSES TO YOUR OBJECTIONS. Whenever you offer an objection to an
author's position, explicitly consider whether the author has said anythin.
JP Morgan Chase: The Balance Between Serving Customers and Maximizing Shareholder Wealth
Penelope Bender
William Woods University
BUS 585: Integrated Studies in Business Administration
Dr. Leathers
Abstract
This paper investigates why JP Morgan Chase and other financial institutions struggle to balance client interests over maximizing wealth.
It is an exploratory study done through literature review.
Often financial institutions, like JP Morgan, put profits ahead of the interests of those they serve.
The paper contributes to better understanding of corporate culture.
This paper investigates why JP Morgan Chase and other financial institutions struggle to balance client interests over maximizing shareholder wealth. This exploratory study is done through a literature review to answer why financial institutions, specifically JP Morgan, often put profits ahead of those they serve. The study will provide evidence of the complex nature of balancing client interests over maximizing shareholder and individual wealth and the need for tighter internal and external oversight. This paper contributes to a better understanding of why corporate culture encourages profit over stakeholders’ interests.
2
Research Question
Why do JP Morgan Chase and other financial institutions struggle to balance client interests against maximizing shareholder wealth?
Employees of JP Morgan Chase and other large banks work in their own best interests to increase wealth and succeed by meeting management goals. However, because of the complex nature of large banks, an individual's unethical behavior can go unchecked.
3
Problem Statement
JP Morgan Chase competes globally and faces competition from other large banks in the US and abroad.
JP Morgan Chase is part of a complex system of regulation, self-interests, and wealth creation.
The interests of shareholders and investors are sometimes overshadowed by agents working in their own best interests.
Financial markets are a complex web of interests, and because of opportunities for individual profits, regulating individual’s actions without stricter regulations and internal oversight is impossible.
The study is not meant to be a moral or ethical analysis, but merely an examination of why this complex relationship exists and will continue to exist in a capitalist society. This paper contributes to a better understanding of why the fundamentals of capitalism, or financialism (Clarke, 2014), encourage wealth creation.
4
Literature Review
The literature review showed a connection between self-interests, regulators, competition, and risk, which all lead to a complex system of conflicting agendas.
5
How Self-Interests Influence Behavior
Ross (1973) explains that all employment relationships are agency relationships and moral hazards are generally .
Interpret a Current Policy of Three Countries
Instructions
As a scholar in public administration, you are asked to present options based on three different countries' information for the next congressional meeting in your state. Be sure to include the following information:
• Perform a SWOT analysis of each immigration system presenting the strengths, weaknesses, opportunities, and threats of each system. You are required to evaluate the United States' system but may choose two other countries besides Costa Rica and Ghana as these were already covered in your weekly resources. Topics such as ethics, history, actors, budgeting can be incorporated into your SWOT analysis.
• Facilitate an immigration benefit analysis for each system to determine the best fit for your state (be sure to identify your state to provide context for your presentation).
• Prepare a plan for the implementation of your chosen immigration program.
Compare how the immigration system is treated in three countries (the U.S. and two other countries).
Length: 12 to 15 pages, not including title and reference pages
References: Include a minimum of seven scholarly references.
The completed assignment should address all the assignment requirements, exhibit evidence of concept knowledge, and demonstrate thoughtful consideration of the content presented in the course. The writing should integrate scholarly resources, reflect academic expectations, and current APA standards.
Respond to
two or more of your colleagues’ posts in one or more of the following ways:
(100 words each Colleague)
· Ask a question about or provide an additional suggestion for the risks that your colleague’s organization might face if it engaged in the capital investment project.
· Provide an additional perspective on the level of risk associated with the project your colleague identified for their selected organization or on how willing/capable the organization might be in taking on and managing the risks your colleague identified.
· Offer an insight you gained from your colleague’s summary of the trade-offs between risks and returns and/or their recommendation for their selected organization to move or not move forward with the project.
Return to this Discussion in a few days to read the responses to your initial posting. Note what you have learned or any insights you have gained as a result of the comments your colleagues made.
1st Colleague to respond to:
The risks associated with a capital investment project for medical equipment for healthcare organizations such as hospitals, as discussed in Week 7, are listed below.
· An inadequate system of budget management caused by unethical conduct.
· The lack of a clearly defined internal process management framework
· Insufficient communication channels within the organization.
The information provided by the managerial accountant assists in making crucial business decisions. Thus, if such information is fabricat.
INTRODUCTION
When you think of surveillance, you may picture two police officers camped out in an unmarked car, watching the comings and goings at a suspect’s apartment building. Or you may imagine an investigator trailing a car on the highway or tapping a suspect’s phone to listen in on potentially incriminating conversations. Surveillance is all these activities, but in the 21st century, it is also much more.
Consider video surveillance of local businesses, streets, and highways; cell phone data; and the reams and reams of digital information gathered on everyday activities—from social media and computer use to credit card transactions.
This week, you analyze concerns related to this new era of surveillance, such as privacy and legal requirements.
LEARNING OBJECTIVES
Students will:
Analyze issues related to privacy and surveillance
Describe surveillance
Differentiate between legal and illegal surveillance
Analyze legal requirements for conducting surveillance
PRIVACY VERSUS PUBLIC SAFETY
The average citizen today may feel as though they are constantly being watched and their actions recorded. And perhaps rightly so. After all, social media sites market personalized products based on how you use the Internet, cell phones pinpoint your location, and fitness trackers transmit your health and fitness activities to the cloud. This sense of being “spied on,” however, does not negate the important use of surveillance techniques in solving and preventing crime.
For this Discussion, you analyze how to balance two sometimes opposing sides in surveillance work: the expectation of privacy and the goal of public safety.
RESOURCES
Be sure to review the Learning Resources before completing this activity.
YOU WILL FIND THE READING FOR THIS ASSIGNMENT IN THE ATTACHED READING MATERIALS PLEASE GO THERE AND READ BEFORE TRYING TO COMPLETE THIS ASSIGNMENT SO YOU WILL UNDERSTAND WHAT IS NEEDED TO COMPLETE THE WORK….
Post a response to the following:
When conducting surveillance, explain how to balance an expectation of citizen privacy with legitimate investigative procedure that has public safety as its goal.
Explain whether citizens should differentiate between government intrusion and private companies who use citizens’ online data to surveil their movements and activity.
Interviews and
Eyewitness
Identifications
AP Photo/Matthew Apgar
OBJECTIVES
After reading this chapter you will be able to:
• Identify the evidence collected by investigators in the BP gas station robbery and discuss its role in the identification and apprehension of the perpetrator.
• Discuss the advantages and disadvantages of using facial identification software and forensic sketches to create composite pictures of suspects.
• Identify and discuss the rationale of the recommended lineup procedures.
• Discuss the research that has been conducted on the accuracy of hypnotically elicited testimony.
• Identify the difference between primary and secondary witnesses and give an example of each.
• Discuss the value of eyewitness identifications in establishing proof.
• Compare and contrast the cognitive interviewing approach with standard police interviews.
• Identify and discuss the methods of eyewitness identifications.
• Identify the three phases of human memory and discuss how factors at each phase may affect the retrieval of information from witnesses.
• Discuss the contributions of cognitive interviewing in enhancing memory recall.
From the CASE FILE
BP Gas Station Robbery
The introduction to this chapter consists of a police
report (edited for length) of the investigation of an
armed robbery of a British Petroleum (BP) gas station
that occurred on August 22, 2011, in Germantown,
Wisconsin (a suburb of Milwaukee). The report serves
as an example of a criminal investigation case report
and also highlights issues discussed in this chapter,
such as the value of eyewitness identification. Issues
discussed in other chapters, including the important
role of patrol officers in investigations, crime scene
photographs, investigation of robbery and auto theft,
and the value of DNA, are also present in this report.
Incident Report Number: 11-014277,
Report of Officer Toni Olson
On Monday, August 22, 2011, I, Officer Olson, was
assigned to investigate and respond to a robbery, which
had just occurred at the County Line BP, located at 21962
County Line Road. Officers were advised that the clerk at
the BP gas station had called the non-emergency number
reporting that a younger white male came into the store and
hit him over the head with an unknown object before taking
money out of his cash drawer and leaving in a red SUV or
truck, northbound on Bell Road. A possible registration
of 583RIB was given out for the suspect vehicle. I, along
with Lt. Huesemann, Officer Brian Ball, and Officer Daniel
Moschea of the Germantown Police Department responded.
Upon arriving on scene, officers were advised that witnesses
reported the suspect vehicle leaving the scene of the
robbery northbound on Bell Road into a subdivision. The
witnesses also stated that they had not seen the suspect
vehicle leave the subdivision, which only has two ways to get
in and.
Interview Presentation: Questions
To prepare:
· Identify an interview subject with a different cultural background than you.
· Ask your interview subject the questions below. Be sure to record the interview and/or take good notes.
During the interview, ask the individual the following interview questions:
· Have you ever lived or visited outside of the United States? If so, where? Describe the experience.
· What do you identify as your culture?
· What are the most important values and beliefs of your family and community?
· What are the important events, traditions, celebrations, and practices in your family or community?
· How does your family or community define gender roles?
· How do you identify your:
· Race
· Ethnicity
· National origin
· Color
· Sex
· Sexual orientation
· Gender identity or expression
· Age
· Marital status
· Political belief
· Religion
· Immigration status
· Disability status
· How well do you fit within your family or community based on these other identities you hold?
· How do you think others outside your community view your culture?
· Have you experienced prejudice or discrimination? Please describe.
Social Media and Ethical Considerations
Walden’s MSW Social Media Policy
A student’s presence on and use of social media reflects on the MSW program and the social
work profession; therefore, behavior on social media will be held to the same professional
standards and student code of conduct expectations. Social Work professionals, including
students, are expected to adhere to the NASW Code of Ethics related to virtual communications.
Students should use social work values and principles, as well as specific agency policy, to guide
their social media interactions.
Students need to consider the ethical consequences of their own social media use, as well as use
of social media in practice. Be aware of and follow agency policies regarding the use of social
media. Before using social media communication tools on behalf of a field agency, students
must seek agency approval of any messages or posts.
Walden MSW students are expected to adhere to the ethical standards outlined in the NASW
Code of Ethics. Common ethical issues that social workers need to understand and manage when
utilizing social media include, but are not limited to, privacy and confidentiality (Section 1.07),
conflicts of interest and dual relationships (Section 1.06), and informed consent (Section 1.03).
There is significant risk of unintentionally sharing protected information when using social
media. Be cautious when posting information about an agency. Never post confidential or
private information about clients or colleagues, even using pseudonyms.
Students need to remain aware of professional boundaries even when participating in social
media in their personal time. Managing “friend” requests and maintaining privacy settings is
critical regardless of whether a student uses social me.
INT 220 Business Brief Template
Course Project
Section One: Drivers for Global Entry
Going global would afford the company many benefits, including increased sales and revenues. Japan is a developed market, and thus the purchasing power of its consumers is high, which implies that many consumers will be able to purchase our products. Expanding to Japan will increase profits that can be reinvested in research and development of new technology and innovation, creating a competitive advantage in both the domestic and international markets. In addition, entering the foreign market will help the business tap into a new market segment. According to International Data Corporation (IDC), Apple was the largest smartphone brand in Japan in 2020, with a 47.3 percent market share (Sudarshan, 2021). This data suggests that Japan would be an ideal market for quality cell phone cases due to the high rate of smartphone purchases. Therefore, the company will benefit from increased sales and profits.
Section Two: Market Profile
Cultural Profile
Category | United States | Japan
Commonly Spoken Languages | English | Japanese
Commonly Practiced Religions | Christianity | Shinto
Power Distance Index (PDI) | 40 | 54
Individualism Versus Collectivism (IDV) | 91 | 46
Masculinity Versus Femininity (MAS) | 95 | 62
Uncertainty Avoidance Index (UAI) | 92 | 46
Long-Term Orientation Versus Short-Term Normative Orientation (LTO) | 88 | 26
Indulgence Versus Restraint (IVR) | 42 | 68
Political and Economic Profile
Category | United States | Japan
Political System | Representative democracy | Constitutional monarchy
Current Leaders | Joseph Biden, president | Fumio Kishida, prime minister
Economic Classification | Developed | Developed
Economic Blocs Impacting Trade | World Trade Organization | World Trade Organization
Gross Domestic Product | 23 trillion USD | 4.9 trillion USD
Purchasing Power Parity | 22,996.08 | 100.412
Gross Domestic Product Per Capita | 69,287.54 USD | 39,285.16 USD
Human Development Index | Very high, 0.921 | 0.919
Human Poverty Index | $26,246 for a family of four | Poverty headcount ratio at $5.50 a day
In terms of economic development, both countries have developed economies, making them ideal for business. Consumers have high purchasing power, which means they are able to purchase new products. The U.S. has a higher GDP than Japan; however, this can be attributed to the size and population of the U.S. compared to Japan. Furthermore, both countries are members of the World Trade Organization, which means their trade with other nations is regulated and subject to WTO rules. The culture in Japan is hugely different from the culture in America. Americans are self-motivated, while Japanese culture embraces more of a group mentality and looks for approval from superiors before making big decisions. Both cultures work long hours and take very few breaks. For the most part, Japanese workplace culture is more formal than that of the U.S.
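As a rough illustration (a simplification, not Hofstede's own methodology), the dimension scores in the cultural profile table can be compared numerically to see where the two cultures diverge most:

```python
# Compare U.S. and Japan Hofstede dimension scores from the table above.
# Absolute difference per dimension is an illustrative simplification.
us = {"PDI": 40, "IDV": 91, "MAS": 95, "UAI": 92, "LTO": 88, "IVR": 42}
jp = {"PDI": 54, "IDV": 46, "MAS": 62, "UAI": 46, "LTO": 26, "IVR": 68}

gaps = {dim: abs(us[dim] - jp[dim]) for dim in us}
widest = max(gaps, key=gaps.get)
print(widest, gaps[widest])  # long-term orientation shows the widest gap
```

This matches the qualitative analysis: the largest gaps fall on long-term orientation and individualism versus collectivism.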
Section Three: Market Consideratio.
Creating a Use Case
Jennifer LeClair
CIS 510
Instructor Name: Dr. Austin Umezurike
10/27/2016
Assignment 2:
Creating a Use Case
Introduction
With this paper I will show how a use case diagram should be used. I base this paper on
Figure 3-11, pages 78-80, of our textbook, System Analysis and Design in a Changing World, 6th
edition, by Satzinger, Jackson, and Burd. In the use case diagram that I make, I will depict a
use case for an RMO CSMS subsystem. I will also describe an overview of the diagram and
provide an analysis of the actors.
Use Case Introduction
A use case is an activity that a system performs, usually in response to a request by a
user. Use case analysis is a technique used to identify the functional requirements of
a software system. The main purpose of a use case is to describe the system's behavior from the
point of view of a client or customer, while the developer plays an analytical role in the
development process. A use case can also be defined in terms of an objective that an actor,
interacting with a particular system, wants to achieve. In the use case diagram that I create, I will show the
actors and use cases for the RMO CSMS subsystem for marketing.
[Use case diagram: RMO CSMS Marketing Subsystem, with Marketing and Merchandising actors]
Overview
This use case diagram shows the system boundary, the associations, and the actors. An
actor (also called an external agent) is the person, group, or external system that interacts with the
system by entering or receiving data. The system boundary is another part of the whole system:
it encloses the computerized part of the application along with the users who operate it. When a
customer places an order, the relationship between certain things, such as the order and a certain
employee in a department, is a logical association. In my diagram I have included two actors, one
representing marketing and the other representing merchandising.
Analysis
A use case is a list of actions or steps, defining the interactions between a role and a system, carried out to achieve a goal. An association is the connection between a use case and an actor; it tells us that the actor and the use case communicate. On the marketing side, they need to be able to update and add promotions, products, and business partners. On the merchandising side, they need to be able to update and add product information and accessory packages.
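The structure described above (actors outside the boundary, use cases inside, associations linking them) can be sketched in code. This is an illustrative Python model only: the actor and use case names come from the diagram, but the classes and methods are invented for this sketch, not part of the textbook.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Actor:
    name: str

@dataclass(frozen=True)
class UseCase:
    name: str

@dataclass
class UseCaseDiagram:
    """System boundary: use cases inside, actors outside."""
    boundary: str
    associations: set = field(default_factory=set)  # (actor, use case) pairs

    def associate(self, actor: Actor, use_case: UseCase) -> None:
        self.associations.add((actor, use_case))

    def use_cases_for(self, actor: Actor) -> list:
        return sorted(uc.name for a, uc in self.associations if a == actor)

# Actors and use cases taken from the RMO CSMS marketing subsystem diagram
marketing = Actor("Marketing")
merchandising = Actor("Merchandising")
diagram = UseCaseDiagram(boundary="RMO CSMS Marketing Subsystem")
diagram.associate(marketing, UseCase("Update/add promotions"))
diagram.associate(marketing, UseCase("Update/add business partners"))
diagram.associate(merchandising, UseCase("Update/add product information"))
diagram.associate(merchandising, UseCase("Update/add accessory packages"))

print(diagram.use_cases_for(marketing))
# → ['Update/add business partners', 'Update/add promotions']
```

Querying the associations by actor mirrors reading the diagram along its association lines.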
Summary
The important part of a use case diagram is that you can identi ...
PJM6610 Foundations of Project Business Analysis
Prof. Johan Roos
Signature Assignment 1
Planning for Elicitation Assignment
Signature Assignment: Planning for Elicitation
By Group:
Mustafa Uzun, Shraddha Sherekar, Vikitha Veera
Contents
1. An overview
2. Elicitation plan
3. Project plan
4. References
1. An Overview
Our product is Skype. It has a substantial market share (and mindshare) and many people use it daily, yet nearly every core component of the program is seen as out of date. Skype has been operating online for more than 20 years, and it has grown its subscriber base by spreading the word about its ability to carry audio and video conversations over the internet instead of over the phone.
Surveys, focus groups with observation, and questionnaires floated to clients who have used this product at least once are the best ways to learn about the present state of the business and, consequently, of the main product offering. They can be very helpful for identifying the target audience and for providing useful inputs that could help define a future state for the product. Useful sources include data obtained from online surveys run through the various e-commerce platforms with which the company has partnerships, data obtained from social media channels, and data from websites. These inputs can provide insightful information that will serve as clear prompts for the company's R&D team as it plots the course for upcoming innovations or enhancements to current products.
Product evaluations provided by customers and influencer marketing are another crucial source that may help the business discover what consumers like and dislike about the product, as well as how they perceive its value and quality. The basic problem that the Skype team must overcome may be understood through root cause and opportunity analysis. This knowledge, together with data from actual surveys and website visitors, supports an understanding of the present situation of the product and the business.
2. Elicitation Plan
Elicitation Techniques:
1. Survey/Questionnaire
Stakeholders, including end users, are presented with a series of questions in a survey or questionnaire to help quantify their opinions. After the responses are gathered, the data is evaluated to determine the areas of focus that stakeholders feel need improvement. Questions should be based on high-priority risks, and direct, clear questions are best. Closed-ended questions will help us focus on areas that we know need improvement, while open-ended ones will help us discover what we may have overlooked.
Advantage:
The benefit of following this process is that data from a broad audience is simple to obtain and time taken to receive participants' re.
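As a sketch of the quantification step above, a closed-ended Likert question can be tallied in a few lines of Python. The responses and the question wording here are hypothetical, invented purely for illustration:

```python
from collections import Counter

# Hypothetical Likert-scale responses (1 = strongly disagree .. 5 = strongly agree)
# to the closed-ended question "Skype's call quality meets my needs."
responses = [5, 4, 2, 5, 3, 4, 1, 4, 5, 2]

counts = Counter(responses)                 # distribution per answer option
mean = sum(responses) / len(responses)      # central tendency
agree_share = sum(1 for r in responses if r >= 4) / len(responses)
print(f"mean={mean:.1f}, agree={agree_share:.0%}")
# → mean=3.5, agree=60%
```

In practice the same tally, broken down by stakeholder group, is what turns survey answers into the prioritized list of areas needing improvement.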
Profile Analysis of Users in Data Analytics Domain
Data analytics and data science have been in fast-forward mode recently. We see many companies hiring people for data analysis and data science, especially in India, and many recruiting firms use Stack Overflow to fish for potential candidates. The industry has also started to recruit people based on shapes of expertise: a person's expertise is metaphorically outlined by the shapes of letters such as I, T, M, and hyphen, depending on their experience in an area (depth) and the number of areas of interest (width). This proposal builds upon work on mining shapes of user expertise in a typical online social question-and-answer (Q&A) community, where expert users often answer questions posed by other users. We deal with the temporal analysis of expertise among Q&A community users, in terms of how the user or expert has evolved over time.
Keywords — Shapes of expertise, Graph communities, Expertise evolution, Q&A community
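A minimal sketch of how such letter shapes could be assigned from depth and width data. The normalized depth scores and thresholds below are invented assumptions for illustration; the paper's actual mining method is not reproduced here:

```python
def expertise_shape(topic_depths, deep=0.7, shallow=0.3):
    """Classify a user's expertise shape from normalized per-topic
    depth scores in [0, 1]. Thresholds are illustrative assumptions.
    I = one deep topic, narrow; T = one deep topic plus broad coverage;
    M = several deep topics; '-' (hyphen) = broad but shallow."""
    deep_topics = [d for d in topic_depths if d >= deep]
    broad_topics = [d for d in topic_depths if d >= shallow]
    if len(deep_topics) >= 2:
        return "M"
    if len(deep_topics) == 1:
        return "T" if len(broad_topics) >= 3 else "I"
    return "-"

print(expertise_shape([0.9, 0.1, 0.1]))        # → I (one deep topic, narrow)
print(expertise_shape([0.9, 0.5, 0.4, 0.2]))   # → T (deep plus broad)
print(expertise_shape([0.8, 0.9, 0.2]))        # → M (two deep topics)
print(expertise_shape([0.4, 0.5, 0.35]))       # → - (broad but shallow)
```

The temporal analysis the abstract describes would amount to recomputing this classification over successive time windows and tracking transitions, e.g. from I to T as a user broadens out.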
DAT 520 Final Project Guidelines and Rubric
Overview
You must complete a decision analysis research project as your final project for this course. Your research project will focus on a real-world topic of your choice,
as approved by your instructor. You will pick a topic from the list provided or with approval from your instructor, and create a data analysis plan and decision
tree model based on a real-world scenario. This assessment will provide you with the opportunity to employ highly valued decision support skills and concepts
for data within a real-world context. You can use the Final Project Notes document, found in the Assignment Guidelines and Rubrics section of the course.
The project is divided into three milestones, which will be submitted at various points throughout the course to scaffold learning and ensure quality final
submissions. These milestones will be submitted in Modules Two, Five, and Seven. The final submission will occur in Module Nine.
This project will address the following course outcomes:
Appraise data in context according to industry-standard methods and techniques for its utility in supporting decision making
Determine suitable data manipulation and modeling methods for decision support
Articulate data frameworks for organizational decision support by applying data manipulation, modeling, and management concepts
Evaluate the ethical issues surrounding organizational use of decision-oriented data based on industry standards and one’s personal ethical criteria
Create and assess the agility of solutions through application of data-mining procedures for decision support in various industries
Prompt
Your decision analysis model and report should answer the following prompt: How does your model and evaluation resolve uncertainty in making a decision? In
order to produce your analytic report, you will need to choose and investigate a data set using the decision analysis techniques you learned in class. Then you
will formulate a research question, write an analytic plan, and implement it. Your report should not solely consist of descriptions of what you did. It should also
contain detailed explorations into the meaning behind your model and the implications of its results. You will also be testing your model’s fitness and evaluating
its strengths and weaknesses.
The project in a nutshell:
1. Choose a data set (get ideas from the source list in the spreadsheet Final Project Topics and Sources.xls)
2. Formulate your decision analysis research question
3. Write an analytic plan
4. Perform the top-down or bottom-up modeling
5. Perform model diagnostics
6. Evaluate
These activities are broken up into milestones so that the work is spread throughout the term and you can get early assistance with any obstacles.
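Step 4, the decision-tree model, boils down to folding expected values up the tree: chance nodes average their outcomes weighted by probability, and decision nodes pick the branch with the highest expected value. Here is a minimal Python sketch with an invented two-branch example (the node layout and numbers are assumptions for illustration, not course material):

```python
def expected_value(node):
    """Recursively evaluate a decision tree.
    Chance nodes average child values by probability;
    decision nodes pick the branch with the highest expected value."""
    kind = node["kind"]
    if kind == "outcome":
        return node["value"]
    if kind == "chance":
        return sum(p * expected_value(child) for p, child in node["branches"])
    if kind == "decision":
        return max(expected_value(child) for _, child in node["branches"])
    raise ValueError(f"unknown node kind: {kind}")

# Hypothetical decision: launch a product vs. hold
tree = {"kind": "decision", "branches": [
    ("launch", {"kind": "chance", "branches": [
        (0.6, {"kind": "outcome", "value": 100}),   # success
        (0.4, {"kind": "outcome", "value": -50}),   # failure
    ]}),
    ("hold", {"kind": "outcome", "value": 0}),
]}
print(expected_value(tree))  # → 40  (0.6*100 + 0.4*(-50) = 40 > 0)
```

Model diagnostics (step 5) then amount to perturbing the probabilities and payoffs and checking whether the recommended branch flips, i.e. a sensitivity analysis.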
A decision analysis report is similar to any other analytic report. These reports introduce a problem, state a line of inquiry, explain a model th.
Data Analytics in Industry Verticals, Data Analytics Lifecycle, Challenges of...
Banking and securities
Challenges
Early warning for securities fraud and trade visibilities
Card fraud detection and audit trails
Enterprise credit risk reporting
Customer data transformation and analytics.
The Securities and Exchange Commission (SEC) is using big data to monitor financial market activity through network analytics and natural language processing. This helps to catch illegal trading activity in the financial markets.
The Data Analytics Lifecycle is designed specifically for Big Data problems and data science projects. The lifecycle has six phases, and project work can occur in several phases at once. For most phases in the lifecycle, the movement can be either forward or backward. This iterative depiction of the lifecycle is intended to more closely portray a real project, in which aspects of the project move forward and may return to earlier stages as new information is uncovered and team members learn more about various stages of the project. This enables participants to move iteratively through the process and drive toward operationalizing the project work.
Phase 1—Discovery: In Phase 1, the team learns the business domain, including relevant history such as whether the organization or business unit has attempted similar projects in the past from which they can learn. The team assesses the resources available to support the project in terms of people, technology, time, and data. Important activities in this phase include framing the business problem as an analytics challenge that can be addressed in subsequent phases and formulating initial hypotheses (IHs) to test and begin learning the data.
Phase 2—Data preparation: Phase 2 requires the presence of an analytic sandbox, in which the team can work with data and perform analytics for the duration of the project. The team needs to execute extract, load, and transform (ELT) or extract, transform and load (ETL) to get data into the sandbox. The ELT and ETL are sometimes abbreviated as ETLT. Data should be transformed in the ETLT process so the team can work with it and analyze it. In this phase, the team also needs to familiarize itself with the data thoroughly and take steps to condition the data.
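The ETL step of Phase 2 can be sketched end to end with the standard library. The raw CSV "extract" below is invented for illustration; the point is the shape of the pipeline, with the in-memory SQLite database standing in for the analytic sandbox:

```python
import csv
import io
import sqlite3

# Extract: hypothetical raw data as it might arrive from a source system
raw = "account,amount\nA-1, 250\nA-2, -75\nA-1, 40\n"

# Transform: parse, strip stray whitespace, cast types
rows = [(r["account"].strip(), int(r["amount"]))
        for r in csv.DictReader(io.StringIO(raw))]

# Load: into a sandbox table the team can analyze with SQL
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE txns (account TEXT, amount INTEGER)")
db.executemany("INSERT INTO txns VALUES (?, ?)", rows)

total = db.execute(
    "SELECT SUM(amount) FROM txns WHERE account = 'A-1'").fetchone()[0]
print(total)  # → 290
```

Doing ELT instead would simply swap the middle two steps: load the raw rows first, then run the cleaning transformations as SQL inside the sandbox.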
Toward a System Building Agenda for Data Integration
(and Data Science)
AnHai Doan, Pradap Konda, Paul Suganthan G.C., Adel Ardalan, Jeffrey R. Ballard, Sanjib Das,
Yash Govind, Han Li, Philip Martinkus, Sidharth Mudgal, Erik Paulson, Haojun Zhang
University of Wisconsin-Madison
Abstract
We argue that the data integration (DI) community should devote far more effort to building systems,
in order to truly advance the field. We discuss the limitations of current DI systems, and point out that
there is already an existing popular DI “system” out there, which is PyData, the open-source ecosystem
of 138,000+ interoperable Python packages. We argue that rather than building isolated monolithic DI
systems, we should consider extending this PyData “system”, by developing more Python packages that
solve DI problems for the users of PyData. We discuss how extending PyData enables us to pursue an
integrated agenda of research, system development, education, and outreach in DI, which in turn can
position our community to become a key player in data science. Finally, we discuss ongoing work at
Wisconsin, which suggests that this agenda is highly promising and raises many interesting challenges.
1 Introduction
In this paper we focus on data integration (DI), broadly interpreted as covering all major data preparation steps
such as data extraction, exploration, profiling, cleaning, matching, and merging [10]. This topic is also known
as data wrangling, munging, curation, unification, fusion, preparation, and more. Over the past few decades, DI
has received much attention (e.g., [37, 29, 31, 20, 34, 33, 6, 17, 39, 22, 23, 5, 8, 36, 15, 35, 4, 25, 38, 26, 32, 19,
2, 12, 11, 16, 2, 3]). Today, as data science grows, DI is receiving even more attention. This is because many
data science applications must first perform DI to combine the raw data from multiple sources, before analysis
can be carried out to extract insights.
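As a toy example of one DI step (matching) in the PyData spirit, the standard library alone can express a naive string-similarity matcher. The names and threshold are invented for illustration; real DI packages add blocking, feature engineering, and learned matchers on top of this idea:

```python
from difflib import SequenceMatcher

def similarity(a: str, b: str) -> float:
    """Normalized string similarity in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def match_pairs(left, right, threshold=0.8):
    """Naive matcher: compare every cross-source pair and keep those
    above the threshold. Quadratic, hence the need for blocking at scale."""
    return [(a, b) for a in left for b in right
            if similarity(a, b) >= threshold]

left = ["Dave Smith", "Joe Wilson"]
right = ["David Smith", "J. Wilson", "Mary Jones"]
print(match_pairs(left, right))
# → [('Dave Smith', 'David Smith'), ('Joe Wilson', 'J. Wilson')]
```

Packaging such a step as an installable Python module, rather than a monolithic system, is precisely the "extend PyData" route the authors advocate.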
Yet despite all this attention, today we do not really know whether the field is making good progress. The
vast majority of DI works (with the exception of efforts such as Tamr and Trifacta [36, 15]) have focused on
developing algorithmic solutions. But we know very little about whether these (ever-more-complex) algorithms
are indeed useful in practice. The field has also built mostly isolated system prototypes, which are hard to use and
combine, and are often not powerful enough for real-world applications. This makes it difficult to decide what
to teach in DI classes. Teaching complex DI algorithms and asking students to do projects using our prototype
systems can train them well for doing DI research, but are not likely to train them well for solving real-world DI
problems in later jobs. Similarly, outreach to real users (e.g., domain scientists) is difficult. Given that we have
Running head: SAMPLE CASE STUDY 1
Ima Student
MGT 450
Sample Case Study: Siebel Systems
Professor Amazing
December 15, 2008
Sample Case Study: Siebel Systems
Siebel Systems faced several problems at the time of this article.
Primarily, corporate software customers are looking for integrated “suites” of
software applications, while Siebel offers only one application: customer
relationship management (CRM) software (“Siebel”). To solve this problem and
to regain a corner of the corporate software market, Siebel Systems and its
CEO and owner, Tom Siebel, will have to relinquish the idea of “doing one
thing really well” (Kerstetter, 2003, p. 2). In order to grow and expand, Siebel
Systems needs to diversify software applications and integrate the applications
that corporations seek into one system.
[Margin note: The introduction identifies the central problem. The thesis statement is located at the end of the introduction.]
Indeed, corporate software customers want integrated, user-friendly,
and cost-effective software systems. Applications for financial data, corporate
planning, and human resources (Kerstetter, 2003), as well as what Siebel
currently offers, CRM, are in demand. While Siebel should consider
modifying its software for manageability or even integrating with rival
programs, this is not a long-term solution for the company. Nevertheless,
Siebel Systems will continue to shrink and elicit poor customer satisfaction if
it cannot create, buy, or partner with other software applications. Siebel
should develop a strong suite of software applications quickly before it
exhausts revenues and loses its current clientele.
[Margin note: The background section is made up of important general facts.]
[Margin note: Here, the writer begins to develop the proposed solution.]
As a short-term solution, Siebel Systems should work with IBM and
Microsoft on creating one version of Siebel’s new product line that is
compatible with both platforms, saving the companies the $550 million for
two versions that are in the works (Kerstetter, 2003). However, a long-term
solution involves Siebel’s creating, buying, or merging with other software
companies until an integrated, user-friendly suite of applications has been
developed. Once this has been achieved, Siebel Systems must provide
customer service that is geared toward problem avoidance rather than problem
patches and offer upgrade packages that are cost effective, relevant, and easily
implemented.
These modifications should yield a high return on Siebel’s investment.
Although the cost of acquiring additional applications is potentially greater
than the co.
JW House Fundraiser
Journey Through the Enchanted Forest Gala
Silent Auction
Table Decor
Specialized cocktails for Event
Three Screens will be Placed for Optimum Viewing by all Attendees
New House Announcement
Happy 30th Birthday, JW!
Auction
Aisle down Center Allows Fundraising Auctioneer to Engage Audience
Balloon Drop
S’mores Sponsored by Largest Corporate Donor
Finish the Evening with Dancing & Beverages
Image Sources
http://springfields.net.au/media/catalog/category/_2_43.png
https://s-media-cache-ak0.pinimg.com/originals/36/fa/fe/36fafee1408521530bfa23368e604d55.jpg
https://www.thegirlcreative.com
http://ballooncity.com/wp-content/uploads/2013/09/danceFloorFlipPNG.png
https://t3.rbxcdn.com/ea203ae8bb1787569f5e375cde0a93b2
http://jwhouse.org/wp-content/uploads/2017/07/jwPortraitStory.jpg
http://royalcandycompany.com/wp/wp-content/uploads/2016/04/Smores-Buffet.jpg
https://lhueagleeye.files.wordpress.com/2015/11/crowd_20080505124150.jpg
www.socialtables.com
https://media-cdn.tripadvisor.com/media/photo-s/03/0d/c8/a7/santa-clara-convention.jpg
https://s3.amazonaws.com/assets.winspireme.com/LPP/Buy-it-Now-Logo.png
http://www.tastefultreats.com/media/catalog/product/cache/1/image/9df78eab33525d08d6e5fb8d27136e95/g/o/gourmet-kosher-sweets-gift-basket.png
https://vignette.wikia.nocookie.net/piratesonline/images/b/b3/Chest.png/revision/latest?cb=20090707201032
http://pngimg.com/uploads/question_mark/question_mark_PNG126.png
1. INTRODUCTION. Begin by stating what you will discuss and explain why is important.
2. CRITICAL SUMMARY. Summarize the relevant views and the arguments that you believe are important.
Usually in a critical discussion it is not sufficient to merely summarize the author’s view. Your attention should be
focused on the author's development of the view--that is, on his arguments, in the broadest sense of the word.
3. CARE IN CITATIONS. Make sure you accurately state the position of the author and always include page
references for each quotation or attribution to her/him if applicable.
4. CRITICAL EVALUATION FROM A CHRISTIAN PERSPECTIVE. At least half of your paper must be devoted
to a critical evaluation of the views of the author you are discussing from the perspective of the Christian thesis that
a Christian call in business may prop up the role of the markets.
5. CONSIDER POSSIBLE RESPONSES TO YOUR OBJECTIONS. Whenever you offer an objection to an
author's position, explicitly consider whether the author has said anythin.
JP Morgan Chase: The Balance Between Serving Customers and Maximizing Shareholder Wealth
Penelope Bender
William Woods University
BUS 585: Integrated Studies in Business Administration
Dr. Leathers
Abstract
This paper investigates why JP Morgan Chase and other financial institutions struggle to balance client interests over maximizing wealth.
It is an exploratory study done through literature review.
Often financial institutions, like JP Morgan, put profits ahead of the interests of those they serve.
The paper contributes to better understanding of corporate culture.
This paper investigates why JP Morgan Chase and other financial institutions struggle to balance client interests over maximizing shareholder wealth. This exploratory study is done through a literature review to answer why financial institutions, specifically JP Morgan, often put profits ahead of those they serve. The study will provide evidence of the complex nature of balancing client interests over maximizing shareholder and individual wealth and the need for tighter internal and external oversight. This paper contributes to a better understanding of why corporate culture encourages profit over stakeholders’ interests.
Research Question
Why do JP Morgan Chase and other financial institutions struggle to balance client interests with maximizing shareholder wealth?
Employees of JP Morgan Chase and other large banks work in their own best interests to increase wealth and succeed by meeting management goals. However, because of the complex nature of large banks, an individual's unethical behavior can go unchecked.
Problem Statement
JP Morgan Chase competes globally and faces competition from other large banks in the US and abroad.
JP Morgan Chase is part of a complex system of regulation, self-interests, and wealth creation.
The interests of shareholders and investors are sometimes overshadowed by agents working in their own best interests.
Financial markets are a complex web of interests, and because of opportunities for individual profits, regulating individual’s actions without stricter regulations and internal oversight is impossible.
The study is not meant to be a moral or ethical analysis, but merely an examination of why this complex relationship exists and will continue to exist in a capitalist society. This paper contributes to a better understanding of why the fundamentals of capitalism, or financialism (Clarke, 2014), encourage wealth creation.
Literature Review
The literature review showed a connection between self-interests, regulators, competition, and risk, which all lead to a complex system of conflicting agendas.
How Self-Interests Influence Behavior
Ross (1973) explains that all employment relationships are agency relationships and moral hazards are generally .
Interpret a Current Policy of Three Countries
Instructions
As a scholar in public administration, you are asked to present options based on three different countries' information for the next congressional meeting in your state. Be sure to include the following information:
• Perform a SWOT analysis of each immigration system, presenting the strengths, weaknesses, opportunities, and threats of each system. You are required to evaluate the United States' system but may choose two other countries besides Costa Rica and Ghana, as these were already covered in your weekly resources. Topics such as ethics, history, actors, and budgeting can be incorporated into your SWOT analysis.
• Facilitate an immigration benefit analysis for each system to determine the best fit for your state (be sure to identify your state to provide context for your presentation).
• Prepare a plan for the implementation of your chosen immigration program.
Compare how the immigration system is treated in three countries (the U.S. and two other countries).
Length: 12 to 15 pages, not including title and reference pages
References: Include a minimum of seven scholarly references.
The completed assignment should address all the assignment requirements, exhibit evidence of concept knowledge, and demonstrate thoughtful consideration of the content presented in the course. The writing should integrate scholarly resources, reflect academic expectations, and follow current APA standards.
Respond to
two or more of your colleagues’ posts in one or more of the following ways:
(100 words each Colleague)
· Ask a question about or provide an additional suggestion for the risks that your colleague’s organization might face if it engaged in the capital investment project.
· Provide an additional perspective on the level of risk associated with the project your colleague identified for their selected organization or on how willing/capable the organization might be in taking on and managing the risks your colleague identified.
· Offer an insight you gained from your colleague’s summary of the trade-offs between risks and returns and/or their recommendation for their selected organization to move or not move forward with the project.
Return to this Discussion in a few days to read the responses to your initial posting. Note what you have learned or any insights you have gained as a result of the comments your colleagues made.
1st Colleague to respond to:
The risks associated with a capital investment project for medical equipment for healthcare organizations such as hospitals, as discussed in Week 7, are listed below.
· An inadequate system of budget management caused by unethical conduct.
· The lack of a clearly defined internal process management framework
· Insufficient communication channels within the organization.
The information provided by the managerial accountant assists in making crucial business decisions. Thus, if such information is fabricat.
INTRODUCTION
When you think of surveillance, you may picture two police officers camped out in an unmarked car, watching the comings and goings at a suspect’s apartment building. Or you may imagine an investigator trailing a car on the highway or tapping a suspect’s phone to listen in on potentially incriminating conversations. Surveillance is all these activities, but in the 21st century, it is also much more.
Consider video surveillance of local businesses, streets, and highways; cell phone data; and the reams and reams of digital information gathered on everyday activities—from social media and computer use to credit card transactions.
This week, you analyze concerns related to this new era of surveillance, such as privacy and legal requirements.
LEARNING OBJECTIVES
Students will:
Analyze issues related to privacy and surveillance
Describe surveillance
Differentiate between legal and illegal surveillance
Analyze legal requirements for conducting surveillance
PRIVACY VERSUS PUBLIC SAFETY
The average citizen today may feel as though they are constantly being watched and their actions recorded. And perhaps rightly so. After all, social media sites market personalized products based on how you use the Internet, cell phones pinpoint your location, and fitness trackers transmit your health and fitness activities to the cloud. This sense of being “spied on,” however, does not negate the important use of surveillance techniques in solving and preventing crime.
For this Discussion, you analyze how to balance two sometimes opposing sides in surveillance work: the expectation of privacy and the goal of public safety.
RESOURCES
Be sure to review the Learning Resources before completing this activity.
YOU WILL FIND THE READING FOR THIS ASSIGNMENT IN THE ATTACHED READING MATERIALS PLEASE GO THERE AND READ BEFORE TRYING TO COMPLETE THIS ASSIGNMENT SO YOU WILL UNDERSTAND WHAT IS NEEDED TO COMPLETE THE WORK….
Post a response to the following:
When conducting surveillance, explain how to balance an expectation of citizen privacy with legitimate investigative procedure that has public safety as its goal.
Explain whether citizens should differentiate between government intrusion and private companies who use citizens’ online data to surveil their movements and activity.
Interviews and Eyewitness Identifications
AP Photo/Matthew Apgar
OBJECTIVES
After reading this chapter you will be able to:
• Identify the evidence collected by investigators in the BP gas station robbery and discuss its role in the identification and apprehension of the perpetrator.
• Discuss the advantages and disadvantages of using facial identification software and forensic sketches to create composite pictures of suspects.
• Identify and discuss the rationale of the recommended lineup procedures.
• Discuss the research that has been conducted on the accuracy of hypnotically elicited testimony.
• Identify the difference between primary and secondary witnesses and give an example of each.
• Discuss the value of eyewitness identifications in establishing proof.
• Compare and contrast the cognitive interviewing approach with standard police interviews.
• Identify and discuss the methods of eyewitness identifications.
• Identify the three phases of human memory and discuss how factors at each phase may affect the retrieval of information from witnesses.
• Discuss the contributions of cognitive interviewing in enhancing memory recall.
From the CASE FILE
BP Gas Station Robbery
The introduction to this chapter consists of a police
report (edited for length) of the investigation of an
armed robbery of a British Petroleum (BP) gas station
that occurred on August 22, 2011, in Germantown,
Wisconsin (a suburb of Milwaukee). The report serves
as an example of a criminal investigation case report
and also highlights issues discussed in this chapter,
such as the value of eyewitness identification. Issues
discussed in other chapters, including the important
role of patrol officers in investigations, crime scene
photographs, investigation of robbery and auto theft,
and the value of DNA, are also present in this report.
Incident Report Number: 11-014277,
Report of Officer Toni Olson
On Monday, August 22, 2011, I, Officer Olson, was
assigned to investigate and respond to a robbery, which
had just occurred at the County Line BP, located at 21962
County Line Road. Officers were advised that the clerk at
the BP gas station had called the non-emergency number
reporting that a younger white male came into the store and
hit him over the head with an unknown object before taking
money out of his cash drawer and leaving in a red SUV or
truck, northbound on Bell Road. A possible registration
of 583RIB was given out for the suspect vehicle. I, along
with Lt. Huesemann, Officer Brian Ball, and Officer Daniel
Moschea of the Germantown Police Department responded.
Upon arriving on scene, officers were advised that witnesses
reported the suspect vehicle leaving the scene of the
robbery northbound on Bell Road into a subdivision. The
witnesses also stated that they had not seen the suspect
vehicle leave the subdivision, which only has two ways to get
in and.
Interview Presentation: Questions
To prepare:
· Identify an interview subject with a different cultural background than you.
· Ask your interview subject the questions below. Be sure to record the interview and/or take good notes.
During the interview, ask the individual the following interview questions:
· Have you ever lived or visited outside of the United States? If so, where? Describe the experience.
· What do you identify as your culture?
· What are the most important values and beliefs of your family and community?
· What are the important events, traditions, celebrations, and practices in your family or community?
· How does your family or community define gender roles?
· How do you identify your:
· Race
· Ethnicity
· National origin
· Color
· Sex
· Sexual orientation
· Gender identity or expression
· Age
· Marital status
· Political belief
· Religion
· Immigration status
· Disability status
· How well do you fit within your family or community based on these other identities you hold?
· How do you think others outside your community view your culture?
· Have you experienced prejudice or discrimination? Please describe.
Social Media and Ethical Considerations
Walden’s MSW Social Media Policy
A student’s presence on and use of social media reflects on the MSW program and the social
work profession; therefore, behavior on social media will be held to the same professional
standards and student code of conduct expectations. Social Work professionals, including
students, are expected to adhere to the NASW Code of Ethics related to virtual communications.
Students should use social work values and principles, as well as specific agency policy, to guide
their social media interactions.
Students need to consider the ethical consequences of their own social media use, as well as use
of social media in practice. Be aware of and follow agency policies regarding the use of social
media. Before using social media communication tools on behalf of a field agency, students
must seek agency approval of any messages or posts.
Walden MSW students are expected to adhere to the ethical standards outlined in the NASW
Code of Ethics. Common ethical issues that social workers need to understand and manage when
utilizing social media include, but are not limited to, privacy and confidentiality (Section 1.07),
conflicts of interest and dual relationships (Section 1.06), and informed consent (Section 1.03).
There is significant risk of unintentionally sharing protected information when using social
media. Be cautious when posting information about an agency. Never post confidential or
private information about clients or colleagues, even using pseudonyms.
Students need to remain aware of professional boundaries even when participating in social
media in their personal time. Managing “friend” requests and maintaining privacy settings is
critical regardless of whether a student uses social media.
INT 220 Business Brief Template
Course Project
Section One: Drivers for Global Entry
Going global would afford the company many benefits, including increased sales and revenues. Japan is a developed market, and thus the purchasing power of consumers is high, which implies that many consumers will be able to purchase our products. Expanding to Japan will enable increased profits that can be reinvested in research and development of new technology and innovation, creating a competitive advantage in both the domestic and international markets. In addition, entering the foreign market will help the business tap into a new market segment. According to International Data Corporation (IDC), Apple was the largest smartphone brand in Japan in 2020 with a 47.3 percent market share (Sudarshan, 2021). The data shows that Japan would be an ideal market for quality cell phone cases due to the high rate of smartphone purchases. Therefore, the company will benefit from increased sales and profits.
Section Two: Market Profile
Cultural Profile
Category | United States | Japan
Commonly Spoken Languages | English | Japanese
Commonly Practiced Religions | Christianity | Shinto
Power Distance Index (PDI) | 40 | 54
Individualism Versus Collectivism (IDV) | 91 | 46
Masculinity Versus Femininity (MAS) | 62 | 95
Uncertainty Avoidance Index (UAI) | 46 | 92
Long-Term Orientation Versus Short-Term Normative Orientation (LTO) | 26 | 88
Indulgence Versus Restraint (IVR) | 68 | 42
Political and Economic Profile
Category | United States | Japan
Political System | Representative democracy | Constitutional monarchy
Current Leaders | Joseph Biden (President) | Fumio Kishida (Prime Minister)
Economic Classification | Developed | Developed
Economic Blocs Impacting Trade | World Trade Organization | World Trade Organization
Gross Domestic Product | 23 trillion USD | 4.9 trillion USD
Purchasing Power Parity | 22,996.08 | 100.412
Gross Domestic Product Per Capita | 69,287.54 USD | 39,285.16 USD
Human Development Index | 0.921 (very high) | 0.919
Human Poverty Index | $26,246 for a family of four | Poverty headcount ratio at $5.50 a day
In terms of economic development, both countries have developed economies, thus making them ideal for business. Consumers have high purchasing power, which means that they are able to purchase new products. The US has a higher GDP compared to Japan; however, this can be attributed to the size and population of the U.S. compared to that of Japan. Furthermore, both countries are members of the World Trade Organization, which means that their trade operations with other nations are regulated and subject to WTO regulations. The culture in Japan is hugely different from the culture in America. Americans are self-motivated, while Japanese culture embraces more of a group mentality and looks for approval from superiors before making big decisions. Both cultures work long hours and take few breaks. For the most part, Japanese culture is more formal in the workplace than in the U.S.
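The Hofstede comparison above can also be explored programmatically. Below is a minimal sketch (not part of the brief; function and variable names are illustrative) that stores the six dimension scores for the two countries, as published by Hofstede Insights, and ranks the dimensions by the size of the US-Japan gap:

```python
# Hofstede's six cultural dimensions, scored 0-100, for the two markets.
HOFSTEDE = {
    "PDI": {"United States": 40, "Japan": 54},
    "IDV": {"United States": 91, "Japan": 46},
    "MAS": {"United States": 62, "Japan": 95},
    "UAI": {"United States": 46, "Japan": 92},
    "LTO": {"United States": 26, "Japan": 88},
    "IVR": {"United States": 68, "Japan": 42},
}

def largest_gaps(scores, top=3):
    """Rank dimensions by the absolute US-Japan score difference."""
    gaps = {dim: abs(v["United States"] - v["Japan"]) for dim, v in scores.items()}
    return sorted(gaps, key=gaps.get, reverse=True)[:top]
```

The largest gaps fall on long-term orientation, uncertainty avoidance, and individualism, which is consistent with the group-mentality contrast discussed above.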
Section Three: Market Considerations.
Instructor Name: Point Value: 30
Student Name:
CATEGORY: Content Quality (40% of total Discussion grade)
Excellent (12–11 points): Student participated in the Discussion about the presented topic with detailed, relevant, supported initial posts and responses. Student enhanced points with examples and questions that helped further discussion. Discussion is well organized, uses scholarly tone, follows APA style, uses original writing and proper paraphrasing, contains very few or no writing and/or spelling errors, and is fully consistent with graduate-level writing style. Discussion contains multiple, appropriate and exemplary sources expected/required for the assignment.
Good (10–9 points): Student participated in the Discussion about the presented topic with detailed, relevant, supported initial posts and responses. Discussion is mostly consistent with graduate-level writing style. Discussion may have some small or infrequent organization, scholarly tone, or APA style issues, and/or may contain a few writing and spelling errors, and/or somewhat less than the expected number of or type of sources.
Fair (8–7 points): Student participated in the Discussion about the presented topic with adequate content, but the content lacked either detail, relevancy, or support. Discussion is somewhat below graduate-level writing style, with multiple smaller or a few major problems. Discussion may be lacking in organization, scholarly tone, APA style, and/or contain many writing and/or spelling errors, or shows moderate reliance on quoting vs. original writing and paraphrasing. Discussion may contain inferior resources (number or quality).
Poor (6–1 points): Content of student's post and responses was not clear, relevant, or supported. Discussion is well below graduate-level writing style expectations for organization, scholarly tone, APA style, and writing, or relies excessively on quoting. Discussion may contain few or no quality resources.
Did Not Complete (0 points): Student did not submit a post or response.
CATEGORY: Engagement (40% of total Discussion grade)
Excellent (12–11 points): Student participated actively as evidenced by strong reflective thought in both the initial post and in responses to classmates' posts. Student response participation exceeded the stated minimum requirements.
Good (10–9 points): Student participated actively as evidenced by strong reflective thought in both the initial post and in responses to classmates' posts. Student responses contributed to classmates' experience.
Fair (8–7 points): Student participated somewhat actively as evidenced by posts and responses that were adequate but lacking strong reflective thought.
Poor (6–1 points): Student did not participate actively as evidenced by little reflective thought in i.
Instructions
There are two high-level types of distribution channels: direct and indirect. In the direct distribution channel, goods are moved directly from the producer to the consumer. In the indirect distribution channel, the producer meets consumer demand through third-party wholesalers and/or retailers. Direct channels produce short supply chains; indirect channels produce long chains.
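The distinction above can be captured in a few lines. A minimal sketch (the function name is illustrative, not part of the assignment) that classifies a channel by the length of its chain:

```python
def channel_type(chain):
    """Classify a distribution channel.

    chain is an ordered list of parties from producer to consumer.
    A two-party chain (producer -> consumer) is direct; anything
    longer passes through intermediaries and is indirect.
    """
    if len(chain) < 2:
        raise ValueError("a channel needs at least a producer and a consumer")
    return "direct" if len(chain) == 2 else "indirect"
```

For example, `["Producer", "Consumer"]` is a short, direct chain, while `["Producer", "Wholesaler", "Retailer", "Consumer"]` is a long, indirect one.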
Research and report on two large producers, Costco and Apple, and describe in detail which distribution approach each company uses (direct, indirect, or mixed) for at least two products in each company.
Your APA paper should be at least 1,000 words in length.
.
Instructions
NOTE: If you have already reviewed this presentation in a different class please enter class number and instructor’s name in the submission text box below.
____________________________________________________________________
If you have not reviewed this presentation in a previous class, please proceed.
Please review the curated presentations below. These presentations will prepare you for writing deliverables that meet the expectations of this course. We want you to be successful in all your courses, so please refer back to this tool often. This presentation is located in the library and the Student Center. To view a presentation, please click on the button below. Be sure to review all five presentations for this week!
Presentation Four: The Research Process & Choosing a Topic
Presentation Five: Types of Sources
Presentation Six: Search Strategies & Techniques
Presentation Seven: Evaluating Information
Presentation Eight: Ready to Shine!
When you have finished reviewing all five presentations, please copy and paste the following statement into the submission box below:
STATEMENT: I HAVE REVIEWED WEEK TWO INFORMATIONAL PRESENTATION. I UNDERSTAND THIS PRESENTATION IS ALSO LOCATED IN THE LIBRARY AND STUDENT CENTER FOR FUTURE REFERENCE.
.
Instructions:
Read two of your colleagues’ postings from the Discussion question.
Respond with a comment that asks for clarification, provides support for, or contributes additional information to two of your colleagues.
Timia Brown (She/Her)
In healthcare, whether long-term or acute care, interdisciplinary communication is necessary to provide patient-centered care. The two scenarios provided both effective and ineffective communication.
Scenario 1
Assuming the leader of the interdisciplinary rounds was the case manager, she introduced the nursing student, who was not paying attention. The case manager did not present the other team members, so the student was left guessing. The pharmacist and the physical therapist were laughing and talking during the discussion. There was no engagement; the MD was on her phone, and everyone was preoccupied. Each team member individually knew the patient and his shortcomings, yet there was no preparation for the actual engagement with each other. Each team member projected issues onto the next member, using terms such as "somebody" or "someone" needed to do this. There was no responsibility for care. The team spoke unprofessionally to each other, using words like "yep" and "umm." In the end, the case manager assigned responsibility; however, the disciplines accepted the responsibility grudgingly. The team's disrespect for each other was displayed to the student, who was disengaged throughout the meeting. From the sound of this scenario, the patient was not ready to be discharged. The patient's pain was not controlled, nor was his anxiety, and no equipment had been ordered for discharge. The patient's safety was not a priority in this meeting, which could lead to readmission or fall risk at home.
In scenario two, the team all appeared happy to be there, with smiling faces and excellent eye contact. The leader engaged the nursing student immediately by having the team members introduce themselves. The team was much more prepared and engaged. Each member respected the others' roles in providing care and a safe, patient-centered discharge. The team took responsibility for what was needed from each of them now and at the time of release. The communication was more two-way. They did a recap of what was discussed, and everyone willingly took part in making sure the patient went home safely and confidently.
Effective communication between interdisciplinary teams must be present to provide the care needed for each patient. It starts with respecting each other's role in the patient's care and remembering that the patient is the priority. The Journal of Communication in Healthcare stated that the leading cause of all sentinel events from 1995 to 2004 was ineffective communication (Altabbaa, 2019). Therefore, effective communication could decrease the number of incidents and lead to proper care.
References
Altabbaa G, Kaba A, Beran TN. Moving from structure.
Instructions:
Respond to your colleagues. Respond with a comment that asks for clarification, supports, or contributes additional information to two or more of your colleagues.
Reynaldo Guerra
As influencers in our society who bring about social change in healthcare among all those we contact, the type of agent I would align with is a Purposeful Participant, for whom "School or work are the primary motivations for involvement in positive social change" (What kind of social change agent are you?, n.d.). Due to my desire to expand my education and grow, I have been able not just to see but to know that I can contribute to various aspects of healthcare. At the hospital where I am currently employed, many principles are introduced to us that help us make a difference for our patients and fellow professionals alike through the way we interact and the relationships we create with everyone. Even if driven by these two motivators, they have opened my eyes and expanded my limits in the change we can bring about.
This eye-opening experience has changed my perspective on how I can make a social change with all those around me. I now feel that a cascade effect comes from my changes: as small as they might seem, they get passed down and drive larger changes in the long run. How I speak with my patients and show that I am an advocate for them by addressing their healthcare issues with importance, along with the trust and relationships I have created with the primary care providers, shows that these small social changes can in the end bring a great change for all. This has shown me that social change has a larger purpose, and that even with as small a change as we bring about, if we all come together and do the same, the results will be even more significant than what we first perceived as a small change. From our professional interactions with one another to our desire to help and better our care for all patients alike, these changes have a great purpose and impact on our future and everyone else's.
Apart from that, social change has influenced my education by motivating me to seek ways to make a difference in a community project presented by my university. It has ignited a flame in me, so to speak, and piqued my interest in seeing what my university has to offer in making a social change, whether by being part of projects, joining a committee, or being part of future alumni programs to help others. Also, being able to refine my nursing practice in our community and in the hospital has been a change for me. This, in turn, will be put forth in the interactions and relationships I create with my patients, colleagues, peers, and others I come in contact with, hopefully bringing about a social change in the end. This is what the principles of social change will bring about for me.
References
Walden University. (n.d.).
What kind of social change agent are you? Lin.
Instructions
Procurement Outsourcing (PO) Strategies:
PO strategies at the highest level involve either materials or traditional business processes such as HR, IT, Finance, Accounting, Travel/Entertainment services, Marketing/Print/Advertising, or Customer Relationship Management (CRM). Your task here is to choose a public business organization and report on what direct materials are being outsourced. Direct materials are categorized as strategic (high-impact), bottleneck items (low-profit impact and high-supply risk), leverage items (high-profit items and low-supply risk), or non-critical (low-profit impact and low-supply risk). Describe the outsource process in detail, who provided the outsourced services, and what direct materials were involved.
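The four categories above map cleanly onto the two dimensions they are defined by: profit impact and supply risk. A minimal sketch (function and argument names are illustrative, not part of the assignment) of that mapping:

```python
def classify_direct_material(profit_impact, supply_risk):
    """Categorize a direct material by profit impact and supply risk.

    Both arguments are "high" or "low", following the four categories
    described above: strategic, bottleneck, leverage, and non-critical.
    """
    table = {
        ("high", "high"): "strategic",
        ("low", "high"): "bottleneck",
        ("high", "low"): "leverage",
        ("low", "low"): "non-critical",
    }
    return table[(profit_impact.lower(), supply_risk.lower())]
```

For example, a low-profit-impact, high-supply-risk item classifies as a bottleneck item, while a low/low item is non-critical.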
You are to prepare a PowerPoint presentation, with a minimum of twelve (12) slides, to include inline citations, a cover slide, and a slide of references. Your citations and references should be APA-compliant.
Level of writing: Exemplary
.
Instructions
Part Four of Applied Final Project, Playing with Gender: Understanding Our Gendered Selves:
"Understanding My Playing-with-Gender Act" (20% of course grade; due end of Week 7) Five (5) pages (1200-1500 words)
All parts of this project should be formatted in APA style (follow for both essay and citation styles):https://libguides.umgc.edu/c.php?g=1003870
Purpose: Act Analysis
In this part of the assignment, you will perform, describe, and analyze your act. After you perform your act, compose a 5-page (1200-1500 word) paper describing your experiences. The first section (one-third to one-half of your paper) should describe your act and your responses to it, and the second section should analyze your act in terms of the scholarship on gender:
Section One (minimum 500 words):
1. Describe your act:
2. What did you do?
3. Where did you do it?
4. How did you prepare for it?
5. What responses did you get while performing your act?
6. How did you feel while performing your act?
7. What would you do differently if you had to perform this same act again? Would you perform the act in the same location and at same time? Would you change your appearance during the act? Would you do anything else differently?
8. Please refer directly to the required reading on Participant Observation (Mack et al., 2005) in this section of the paper (please see the attached document):
Mack et al. (2005). "Module Two: Participant Observation," from Qualitative Research Methods: A Data Collector's Field Guide, Family Health International. Read Module 2, pages 13-27. Retrieved from https://www.fhi360.org/sites/default/files/media/documents/Qualitative%20Research%20Methods%20-%20A%20Data%20Collector's%20Field%20Guide.pdf
Section Two: (minimum 700 words):
(Please see attached for document listing the sources)
Referring directly to at least three academic sources for support (these may be pulled from the sources you identified and discussed in your Annotated Bibliography for Part 3
and/or the readings for this class), consider the potential impact of your act. Here are some questions to consider (you do not have to answer all of these questions; they are provided to help you to think about ways your act may have impact on society):
· Can you explain the range of reactions to your act? Did those reactions reflect any of the sociological scholarship found in the course readings or in your research? Did any of the reactions challenge that research?
· How do you think class, race, age, and sexuality came into play during the conception and performance of the act?
· Was performing this act an act of feminism? Why or why not? If so, what type(s) of feminism?
· Was your act an act of activism? That is, could it help to create social change? If so, how?
Please see attached for Project 1, 2 & 3 for information and assistance.
Qualitative Research Methods: A Data Collector's Field Guide.
Instructions
Clients come to MFTs because they want to change, whether the change is in cognitions, structure, insight, or something else. Therefore, it is important for you to understand why, when, and how people change. This week, you will continue the exploration of core concepts related to systems theory and its application to MFT field concepts. You will review several concepts associated with change, including homeostasis, first-order change, second-order change, continuous change, and discontinuous change.
Complete the provided worksheet template located in this week’s resources. Note: You will use the worksheet you complete this week as part of your work in Week 4.
For each item, be sure to address the following:
· Record a direct quotation that defines the concept or describes the assumption.
· Paraphrase the definition or description by explaining the information in your own words. As you are paraphrasing, keep in mind that concepts often involve several interrelated ideas. When you are paraphrasing, be sure to not oversimplify the concept.
· Provide an original example (not one you read about in the course resources) of the concept or assumption.
· Explain how your example reflects the definition. Refer to your paraphrased definition in order to compare the example to the concept.
Should you have questions or need clarification on any items, please contact your professor to discuss it.
Length: 1-2 pages (completed template). Additional resources/reference page is not required.
Your cheat sheet should demonstrate thoughtful consideration of the ideas and concepts presented in the course by providing new thoughts and insights relating directly to this topic. Be sure to adhere to Northcentral University's Academic Integrity Policy.
Upload your document, and then click the
Submit to Dropbox button.
Building Blocks to Conceptualizing Family: A Family System’s Perspective Valerie Q. Glass, PhD, LMFT
Background of Systemic Thinking
Systemic thinking, for some, means trying on a new and unique lens when considering "presenting problems" that arise in therapeutic settings. While most mental and emotional health backgrounds study individual cognitive and emotional processes, systemic thinking means a shift from looking at one person to looking at a whole system. Keeney (1983) calls this change in professional theory an epistemological shift. Epistemology, most basically, is the way one understands what is in front of them, and the root from which decisions are made. Helping fields all develop from different epistemologies. Psychiatry views medicine and biology as its epistemological construct of how or why people act the way they do. Much of the epistemological focus of the social work field embraces the necessity of connecting to resources and social support as a catalyst for change. Psychology explores the make-up of the individual's mind and develops steps for change. Family systems, and.
INST560, Internet of Things (IoT)
UNIVERSITY OF NORTH AMERICA
Lecture 3: Fall 2022
Professor Aliakbar Jalali
[email protected]
Internet of Things Enabling Technologies
/59
UoNA-ST560-FALL-2022, Internet of Things (IoT)
Overview
Introduction
Evolution of the Technology
Some significant statistics
IoT Technology
Risks of IoT Technologies
Use Cases of IoT Technology!
What Are IoT Enabling Technologies?
Conclusion
References
Introduction
Because of the technological changes taking place in the world, IoT is gradually taking over all fields, and applications of IoT are increasing day by day.
Technological advances are fueling the growth of IoT.
Improved communications and networks; new sensors of various kinds; and cheaper, denser, more reliable, and power-efficient storage, both in the cloud and locally, are converging to enable new types of IoT-based products that were not possible a few years ago.
IoT technology will develop further to make our day-to-day operations much easier and more remotely controlled in the days to come.
Introduction
Businesses need to constantly explore IoT applications within their domain to stay ahead in competitiveness and implementation.
Competition in the coming decade will primarily be defined by how companies take advantage of innovative technology.
Indeed, it is this dominant technology that will determine the future of the many businesses attached to the Internet of Things (IoT).
Introduction
The emerging trends in IoT are majorly driven by technologies like artificial intelligence, blockchain, 5G, and edge computing.
We need to know in more detail the elements that make up the broad spectrum of technologies we know as the Internet of Things.
The business value of these technological advances lies in IoT applications like smart wearables, smart homes and buildings, smart cities, autonomous cars, smart factories, location trackers, wireless sensors, and much more.
Introduction: Technology is changing the world!
Technology is changing the world.
It is changing the way we communicate, shop, learn, travel, play and of course the way we work.
http://www.telegraph.co.uk/technology/2017/05/06/internet-things-could-really-change-way-live/
Introduction: Technology is changing the world!
Global gigabit subscriptions are expected to jump to 50 million in 2022, more than doubling from 24 million at the end of 2020, according to a new report from analyst firm Omdia.
High Speed Internet!
Introduction: Social Media is Changing societies!
Are you on social media a lot? When is the last time you checked Twitter, Facebook, or Instagram? Last n.
Insert Prename, Surname of all students
Winter Term 2022/23
Theory Factsheet: Insert name of theory
Level of analysis
Insert levels of analysis, e.g., organisation, individual, social
Dependent construct(s)
Please insert the dependent construct(s) of the theory
Independent construct(s)
Please insert the independent construct(s) of the theory
Short description of the theory
Please describe the theory in full sentences.
Cause-Effect Model
Please insert a visual diagram of the cause-effect relationships or factor model of the theory (if available).
Applications of the theory
Please describe for which purposes / in which fields the theory has been applied.
What relevance does the theory have for digitalization in organizations?
Criticism
Describe alternative views, potential critique, and open discussion on the theory.
References
Insert sources and references used in this factsheet in APA 7th style.
Students will write a 2-3 page essay analyzing one of the topics addressed during the semester under the section Contemporary Issues: Human Rights. The student is free to choose any of the topics discussed during class, as well as his/her opinion about it.
1. Choose a topic (death penalty, assisted suicide, abortion, death by euthanasia, bioethics, etc.)
2. First page: description of the problem (is it here in FL, or national, or worldwide; statistics, etc.)
Second page: YOUR ETHICAL POSITION ABOUT IT (why this is an ethical issue, where your argument is coming from, etc.)
3. REFERENCES (could be ppt, movie, article, web, book)
The writing will be evaluated for clarity and proper handling of terms, phrases, and concepts addressed up to this date. APA or MLA style will be required.
https://owl.english.purdue.edu/owl/section/2/10/.
Reading list, Winter semester 2022/23
Version 24.09.2022

Information Systems Foundational Theories

Structuration Theory: Orlikowski, W.J. (1992). The Duality of Technology: Rethinking the Concept of Technology in Organizations. Organization Science, 3(3), 398-427.
Structuration Theory: Orlikowski, W.J. and Robey, D. (1991). Information Technology and the Structuring of Organizations. Information Systems Research, 2(2), 143-169.
Structuration Theory: Walsham, G. and Han, C.K. (1991). Structuration theory and information systems research. Journal of Applied Systems Analysis, 17, 77-85.
Institutional Theory: Barley, S.R. and Tolbert, P.S. (1997). Institutionalization and structuration: studying the links between action and institution. Organization Studies, 18(1), 93-118.
Institutional Theory: Orlikowski, W.J., & Barley, S.R. (2001). Technology and institutions: What can research on information technology and research on organizations learn from each other? MIS Quarterly, 25(2), 145.
Design Science: Hevner, A.R., March, S.T., Park, J., & Ram, S. (2004). Design science in information systems research. MIS Quarterly, 28(1), 75.
Informative Speech
Course: COM103 Public Speaking
Criteria | Level 4 | Level 3 | Level 2 | Level 1 | Criterion Score

Introduction (/ 10)
10 points: Introduction contained a strong attention getter, introduction of the topic, credibility statement, and previewed the speech.
7 points: Introduction contained 3 of the following: a strong attention getter, introduction of the topic, credibility statement, and previewed the speech.
4 points: Introduction contained 2 of the following: a strong attention getter, introduction of the topic, credibility statement, and previewed the speech.
0 points: Introduction contained 1 of the following: a strong attention getter, introduction of the topic, credibility statement, and previewed the speech.

Material (/ 8)
8 points: Material was clear AND well organized.
5.6 points: Material was either clear OR well organized.
3.2 points: NA
0 points: Material was neither clear nor well organized.

Transitions (/ 10)
10 points: Transitions were clear and used after the intro, between each main idea, and before the conclusion.
7 points: Transitions were clear, but were not used in all areas: after the intro, between each main idea, and before the conclusion.
4 points: Transitions were used after the intro, between each main idea, and before the conclusion, but were not effective.
0 points: Transitions were not used.
Rubric Assessment - COM103 Public Speaking - National University https://nationalu.brightspace.com/d2l/lms/competencies/rubric/rubrics_a...
1 of 4 12/6/22, 5:38 PM
Conclusion (/ 8)
8 points: The conclusion contained a strong closing AND the speaker signaled the end of the speech.
5.6 points: The conclusion contained a strong closing OR the speaker signaled the end of the speech.
3.2 points: The speaker needs improvement signaling the end of the speech and a stronger closing.
0 points: The conclusion neither contained a strong closing nor did the speaker signal the end of the speech.

Time limit (/ 8)
8 points: The length of the speech was between 5 and 6 minutes.
5.6 points: NA
3.2 points: The length of the speech was shorter than 5 minutes or longer than 6 minutes.
0 points: NA

Preparation outline uploaded (/ 8)
8 points: The preparation outline was uploaded with the speech.
5.6 points: The preparation outline was uploaded after delivering the speech.
3.2 points: The preparation outline was not in a preparation outline format.
0 points: The preparation outline was not uploaded.
Eye Contact (/ 10), Delivery (/ 10), Non verbals (/ 10), Overall preparation (/ 8)
10 points: The speaker had strong eye contac.
The French Revolution, which began in 1789, was a period of radical social and political upheaval in France. It marked the decline of absolute monarchies, the rise of secular and democratic republics, and the eventual rise of Napoleon Bonaparte. This revolutionary period is crucial in understanding the transition from feudalism to modernity in Europe.
Model Attribute Check Company Auto Property, by Celine George
In Odoo, the multi-company feature allows you to manage multiple companies within a single Odoo database instance. Each company can have its own configurations while still sharing common resources such as products, customers, and suppliers.
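In plain-Python terms (this is an illustrative sketch of the idea, not actual Odoo ORM code), the rule that lets companies share common resources while keeping their own configurations can be expressed as a company-consistency check on linked records:

```python
def company_link_allowed(record_company, linked_company):
    """Return True if a record owned by `record_company` may reference
    the linked record: allowed when the linked record is shared (has no
    company set) or belongs to the same company."""
    return linked_company is None or linked_company == record_company
```

For instance, a document belonging to "Company A" may reference a shared product (company unset) or an "A"-owned product, but not one owned by "Company B".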
Palestine last event orientation.pptx, by RaedMohamed3
An EFL lesson about the current events in Palestine. It is intended to be for intermediate students who wish to increase their listening skills through a short lesson in power point.
We all have good and bad thoughts from time to time and situation to situation. We are bombarded daily with spiraling thoughts(both negative and positive) creating all-consuming feel , making us difficult to manage with associated suffering. Good thoughts are like our Mob Signal (Positive thought) amidst noise(negative thought) in the atmosphere. Negative thoughts like noise outweigh positive thoughts. These thoughts often create unwanted confusion, trouble, stress and frustration in our mind as well as chaos in our physical world. Negative thoughts are also known as “distorted thinking”.
INTRODUCTION TO DATA MINING
SECOND EDITION

PANG-NING TAN, Michigan State University
MICHAEL STEINBACH, University of Minnesota
ANUJ KARPATNE, University of Minnesota
VIPIN KUMAR, University of Minnesota

330 Hudson Street, NY NY 10013
Director, Portfolio Management: Engineering, Computer

All rights reserved. Manufactured in the United States of America. This publication is protected by Copyright, and permission should be obtained from the publisher prior to any prohibited reproduction, storage in a retrieval system, or transmission in any form or by any means, electronic, mechanical, photocopying, recording, or likewise. For information regarding permissions, request forms and the appropriate contacts within the Pearson Education Global Rights & Permissions department, please visit www.pearsonhighed.com/permissions/.
Many of the designations by manufacturers and sellers to distinguish their products are claimed as trademarks. Where those designations appear in this book, and the publisher was aware of a trademark claim, the designations have been printed in initial caps or all caps.

Library of Congress Cataloging-in-Publication Data on File

Names: Tan, Pang-Ning, author. | Steinbach, Michael, author. | Karpatne, Anuj, author. | Kumar, Vipin, 1956- author.
Title: Introduction to Data Mining / Pang-Ning Tan, Michigan State University, Michael Steinbach, University of Minnesota, Anuj Karpatne, University of Minnesota, Vipin Kumar, University of Minnesota.
Description: Second edition. | New York, NY : Pearson Education, [2019] | Includes bibliographical references and index.
Identifiers: LCCN 2017048641 | ISBN 9780133128901 | ISBN 0133128903
Subjects: LCSH: Data mining.
Classification: LCC QA76.9.D343 T35 2019 | DDC 006.3/12–dc23 LC record available at https://lccn.loc.gov/2017048641

1 18
ISBN-10: 0133128903
ISBN-13: 9780133128901
To our families …
Preface to the Second Edition
Since the first edition, roughly 12 years ago, much has changed in the field of data analysis. The volume and variety of data being collected continues to increase, as has the rate (velocity) at which it is being collected and used to make decisions. Indeed, the term, Big Data, has been used to refer to the massive and diverse data sets now available. In addition, the term data science has been coined to describe an emerging area that applies tools and techniques from various fields, such as data mining, machine learning, statistics, and many others, to extract actionable insights from data, often big data.
The growth in data has created numerous opportunities for all areas of data analysis. The most dramatic developments have been in the area of predictive modeling, across a wide range of application domains. For instance, recent advances in neural networks, known as deep learning, have shown impressive results in a number of challenging areas, such as image classification, speech recognition, as well as text categorization and understanding. While not as dramatic, other areas, e.g., clustering, association analysis, and anomaly detection, have also continued to advance. This new edition is in response to those advances.
Overview

As with the first edition, the second edition of the book provides a comprehensive introduction to data mining and is designed to be accessible and useful to students, instructors, researchers, and professionals. Areas covered include data preprocessing, predictive modeling, association analysis, cluster analysis, anomaly detection, and avoiding false discoveries. The goal is to present fundamental concepts and algorithms for each topic, thus providing the reader with the necessary background for the application of data mining to real problems. As before, classification, association analysis, and cluster analysis are each covered in a pair of chapters. The introductory chapter covers basic concepts, representative algorithms, and evaluation techniques, while the following chapter discusses more advanced concepts and algorithms. As before, our objective is to provide the reader with a sound understanding of the foundations of data mining, while still covering many important advanced topics. Because of this approach, the book is useful both as a learning tool and as a reference.

To help readers better understand the concepts that have been presented, we provide an extensive set of examples, figures, and exercises. The solutions to the original exercises, which are already circulating on the web, will be made public. The exercises are mostly unchanged from the last edition, with the exception of new exercises in the chapter on avoiding false discoveries. New exercises for the other chapters and their solutions will be available to instructors via the web. Bibliographic notes are included at the end of each chapter for readers who are interested in more advanced topics, historically important papers, and recent trends. These have also been significantly updated. The book also contains a comprehensive subject and author index.
What is New in the Second Edition?

Some of the most significant improvements in the text have been in the two chapters on classification. The introductory chapter uses the decision tree classifier for illustration, but the discussion on many topics—those that apply across all classification approaches—has been greatly expanded and clarified, including topics such as overfitting, underfitting, the impact of training size, model complexity, model selection, and common pitfalls in model evaluation. Almost every section of the advanced classification chapter has been significantly updated. The material on Bayesian networks, support vector machines, and artificial neural networks has been significantly expanded. We have added a separate section on deep networks to address the current developments in this area. The discussion of evaluation, which occurs in the section on imbalanced classes, has also been updated and improved.

The changes in association analysis are more localized. We have completely reworked the section on the evaluation of association patterns (introductory chapter), as well as the sections on sequence and graph mining (advanced chapter). Changes to cluster analysis are also localized. The introductory chapter added the K-means initialization technique and updated the discussion of cluster evaluation. The advanced clustering chapter adds a new section on spectral graph clustering. Anomaly detection has been greatly revised and expanded. Existing approaches—statistical, nearest neighbor/density-based, and clustering-based—have been retained and updated, while new approaches have been added: reconstruction-based, one-class classification, and information-theoretic. The reconstruction-based approach is illustrated using autoencoder networks that are part of the deep learning paradigm. The data chapter has been updated to include discussions of mutual information and kernel-based techniques.

The last chapter, which discusses how to avoid false discoveries and produce valid results, is completely new, and is novel among other contemporary textbooks on data mining. It supplements the discussions in the other chapters with a discussion of the statistical concepts (statistical significance, p-values, false discovery rate, permutation testing, etc.) relevant to avoiding spurious results, and then illustrates these concepts in the context of data mining techniques. This chapter addresses the increasing concern over the validity and reproducibility of results obtained from data analysis. The addition of this last chapter is a recognition of the importance of this topic and an acknowledgment that a deeper understanding of this area is needed for those analyzing data.

The data exploration chapter has been deleted, as have the appendices, from the print edition of the book, but will remain available on the web. A new appendix provides a brief discussion of scalability in the context of big data.
To the Instructor

As a textbook, this book is suitable for a wide range of students at the advanced undergraduate or graduate level. Since students come to this subject with diverse backgrounds that may not include extensive knowledge of statistics or databases, our book requires minimal prerequisites. No database knowledge is needed, and we assume only a modest background in statistics or mathematics, although such a background will make for easier going in some sections. As before, the book, and more specifically, the chapters covering major data mining topics, are designed to be as self-contained as possible. Thus, the order in which topics can be covered is quite flexible. The core material is covered in chapters 2 (data), 3 (classification), 5 (association analysis), 7 (clustering), and 9 (anomaly detection). We recommend at least a cursory coverage of Chapter 10 (Avoiding False Discoveries) to instill in students some caution when interpreting the results of their data analysis. Although the introductory data chapter (2) should be covered first, the basic classification (3), association analysis (5), and clustering chapters (7) can be covered in any order. Because of the relationship of anomaly detection (9) to classification (3) and clustering (7), these chapters should precede Chapter 9. Various topics can be selected from the advanced classification, association analysis, and clustering chapters (4, 6, and 8, respectively) to fit the schedule and interests of the instructor and students. We also advise that the lectures be augmented by projects or practical exercises in data mining. Although they are time consuming, such hands-on assignments greatly enhance the value of the course.
Support Materials

Support materials available to all readers of this book are available at http://www-users.cs.umn.edu/~kumar/dmbook:

- PowerPoint lecture slides
- Suggestions for student projects
- Data mining resources, such as algorithms and data sets
- Online tutorials that give step-by-step examples for selected data mining techniques described in the book using actual data sets and data analysis software

Additional support materials, including solutions to exercises, are available only to instructors adopting this textbook for classroom use. The book's resources will be mirrored at www.pearsonhighered.com/cs-resources. Comments and suggestions, as well as reports of errors, can be sent to the authors through [email protected]
Acknowledgments

Many people contributed to the first and second editions of the book. We begin by acknowledging our families, to whom this book is dedicated. Without their patience and support, this project would have been impossible.

We would like to thank the current and former students of our data mining groups at the University of Minnesota and Michigan State for their contributions. Eui-Hong (Sam) Han and Mahesh Joshi helped with the initial data mining classes. Some of the exercises and presentation slides that they created can be found in the book and its accompanying slides. Students in our data mining groups who provided comments on drafts of the book or who contributed in other ways include Shyam Boriah, Haibin Cheng, Varun Chandola, Eric Eilertson, Levent Ertöz, Jing Gao, Rohit Gupta, Sridhar Iyer, Jung-Eun Lee, Benjamin Mayer, Aysel Ozgur, Uygar Oztekin, Gaurav Pandey, Kashif Riaz, Jerry Scripps, Gyorgy Simon, Hui Xiong, Jieping Ye, and Pusheng Zhang. We would also like to thank the students of our data mining classes at the University of Minnesota and Michigan State University who worked with early drafts of the book and provided invaluable feedback. We specifically note the helpful suggestions of Bernardo Craemer, Arifin Ruslim, Jamshid Vayghan, and Yu Wei.

Joydeep Ghosh (University of Texas) and Sanjay Ranka (University of Florida) class tested early versions of the book. We also received many useful suggestions directly from the following UT students: Pankaj Adhikari, Rajiv Bhatia, Frederic Bosche, Arindam Chakraborty, Meghana Deodhar, Chris Everson, David Gardner, Saad Godil, Todd Hay, Clint Jones, Ajay Joshi, Joonsoo Lee, Yue Luo, Anuj Nanavati, Tyler Olsen, Sunyoung Park, Aashish Phansalkar, Geoff Prewett, Michael Ryoo, Daryl Shannon, and Mei Yang.

Ronald Kostoff (ONR) read an early version of the clustering chapter and offered numerous suggestions. George Karypis provided invaluable LATEX assistance in creating an author index. Irene Moulitsas also provided assistance with LATEX and reviewed some of the appendices. Musetta Steinbach was very helpful in finding errors in the figures.

We would like to acknowledge our colleagues at the University of Minnesota and Michigan State who have helped create a positive environment for data mining research. They include Arindam Banerjee, Dan Boley, Joyce Chai, Anil Jain, Ravi Janardan, Rong Jin, George Karypis, Claudia Neuhauser, Haesun Park, William F. Punch, György Simon, Shashi Shekhar, and Jaideep Srivastava. The collaborators on our many data mining projects, who also have our gratitude, include Ramesh Agrawal, Maneesh Bhargava, Steve Cannon, Alok Choudhary, Imme Ebert-Uphoff, Auroop Ganguly, Piet C. de Groen, Fran Hill, Yongdae Kim, Steve Klooster, Kerry Long, Nihar Mahapatra, Rama Nemani, Nikunj Oza, Chris Potter, Lisiane Pruinelli, Nagiza Samatova, Jonathan Shapiro, Kevin Silverstein, Brian Van Ness, Bonnie Westra, Nevin Young, and Zhi-Li Zhang.

The departments of Computer Science and Engineering at the University of Minnesota and Michigan State University provided computing resources and a supportive environment for this project. ARDA, ARL, ARO, DOE, NASA, NOAA, and NSF provided research support for Pang-Ning Tan, Michael Steinbach, Anuj Karpatne, and Vipin Kumar. In particular, Kamal Abdali, Mitra Basu, Dick Brackney, Jagdish Chandra, Joe Coughlan, Michael Coyle, Stephen Davis, Frederica Darema, Richard Hirsch, Chandrika Kamath, Tsengdar Lee, Raju Namburu, N. Radhakrishnan, James Sidoran, Sylvia Spengler, Bhavani Thuraisingham, Walt Tiernin, Maria Zemankova, Aidong Zhang, and Xiaodong Zhang have been supportive of our research in data mining and high-performance computing.

It was a pleasure working with the helpful staff at Pearson Education. In particular, we would like to thank Matt Goldstein, Kathy Smith, Carole Snyder, and Joyce Wells. We would also like to thank George Nichols, who helped with the art work, and Paul Anagnostopoulos, who provided LATEX support.

We are grateful to the following Pearson reviewers: Leman Akoglu (Carnegie Mellon University), Chien-Chung Chan (University of Akron), Zhengxin Chen (University of Nebraska at Omaha), Chris Clifton (Purdue University), Joydeep Ghosh (University of Texas, Austin), Nazli Goharian (Illinois Institute of Technology), J. Michael Hardin (University of Alabama), Jingrui He (Arizona State University), James Hearne (Western Washington University), Hillol Kargupta (University of Maryland, Baltimore County and Agnik, LLC), Eamonn Keogh (University of California-Riverside), Bing Liu (University of Illinois at Chicago), Mariofanna Milanova (University of Arkansas at Little Rock), Srinivasan Parthasarathy (Ohio State University), Zbigniew W. Ras (University of North Carolina at Charlotte), Xintao Wu (University of North Carolina at Charlotte), and Mohammed J. Zaki (Rensselaer Polytechnic Institute).

Over the years since the first edition, we have also received numerous comments from readers and students who have pointed out typos and various other issues. We are unable to mention these individuals by name, but their input is much appreciated and has been taken into account for the second edition.
Contents

Preface to the Second Edition

1 Introduction
1.1 What Is Data Mining?
1.2 Motivating Challenges
1.3 The Origins of Data Mining
1.4 Data Mining Tasks
1.5 Scope and Organization of the Book
1.6 Bibliographic Notes
1.7 Exercises

2 Data
2.1 Types of Data
2.1.1 Attributes and Measurement
2.1.2 Types of Data Sets
2.2 Data Quality
2.2.1 Measurement and Data Collection Issues
2.2.2 Issues Related to Applications
2.3 Data Preprocessing
2.3.1 Aggregation
2.3.2 Sampling
2.3.3 Dimensionality Reduction
2.3.4 Feature Subset Selection
2.3.5 Feature Creation
2.3.6 Discretization and Binarization
2.3.7 Variable Transformation
2.4 Measures of Similarity and Dissimilarity
2.4.1 Basics
2.4.2 Similarity and Dissimilarity between Simple Attributes
2.4.3 Dissimilarities between Data Objects
2.4.4 Similarities between Data Objects
2.4.5 Examples of Proximity Measures
2.4.6 Mutual Information
2.4.7 Kernel Functions*
2.4.8 Bregman Divergence*
2.4.9 Issues in Proximity Calculation
2.4.10 Selecting the Right Proximity Measure
2.5 Bibliographic Notes
2.6 Exercises

3 Classification: Basic Concepts and Techniques
3.1 Basic Concepts
3.2 General Framework for Classification
3.3 Decision Tree Classifier
3.3.1 A Basic Algorithm to Build a Decision Tree
3.3.2 Methods for Expressing Attribute Test Conditions
3.3.3 Measures for Selecting an Attribute Test Condition
3.3.4 Algorithm for Decision Tree Induction
3.3.5 Example Application: Web Robot Detection
3.3.6 Characteristics of Decision Tree Classifiers
3.4 Model Overfitting
3.4.1 Reasons for Model Overfitting
3.5 Model Selection
3.5.1 Using a Validation Set
3.5.2 Incorporating Model Complexity
3.5.3 Estimating Statistical Bounds
3.5.4 Model Selection for Decision Trees
3.6 Model Evaluation
3.6.1 Holdout Method
3.6.2 Cross-Validation
3.7 Presence of Hyper-parameters
3.7.1 Hyper-parameter Selection
3.7.2 Nested Cross-Validation
3.8 Pitfalls of Model Selection and Evaluation
3.8.1 Overlap between Training and Test Sets
3.8.2 Use of Validation Error as Generalization Error
3.9 Model Comparison
3.9.1 Estimating the Confidence Interval for Accuracy
3.9.2 Comparing the Performance of Two Models
3.10 Bibliographic Notes
3.11 Exercises

4 Classification: Alternative Techniques
4.1 Types of Classifiers
4.2 Rule-Based Classifier
4.2.1 How a Rule-Based Classifier Works
4.2.2 Properties of a Rule Set
4.2.3 Direct Methods for Rule Extraction
4.2.4 Indirect Methods for Rule Extraction
4.2.5 Characteristics of Rule-Based Classifiers
4.3 Nearest Neighbor Classifiers
4.3.1 Algorithm
4.3.2 Characteristics of Nearest Neighbor Classifiers
4.4 Naïve Bayes Classifier
4.4.1 Basics of Probability Theory
4.4.2 Naïve Bayes Assumption
4.5 Bayesian Networks
4.5.1 Graphical Representation
4.5.2 Inference and Learning
4.5.3 Characteristics of Bayesian Networks
4.6 Logistic Regression
4.6.1 Logistic Regression as a Generalized Linear Model
4.6.2 Learning Model Parameters
4.6.3 Characteristics of Logistic Regression
4.7 Artificial Neural Network (ANN)
4.7.1 Perceptron
4.7.2 Multi-layer Neural Network
4.7.3 Characteristics of ANN
4.8 Deep Learning
4.8.1 Using Synergistic Loss Functions
4.8.2 Using Responsive Activation Functions
4.8.3 Regularization
4.8.4 Initialization of Model Parameters
4.8.5 Characteristics of Deep Learning
4.9 Support Vector Machine (SVM)
4.9.1 Margin of a Separating Hyperplane
4.9.2 Linear SVM
4.9.3 Soft-margin SVM
4.9.4 Nonlinear SVM
4.9.5 Characteristics of SVM
4.10 Ensemble Methods
4.10.1 Rationale for Ensemble Method
4.10.2 Methods for Constructing an Ensemble Classifier
4.10.3 Bias-Variance Decomposition
4.10.4 Bagging
4.10.5 Boosting
4.10.6 Random Forests
4.10.7 Empirical Comparison among Ensemble Methods
4.11 Class Imbalance Problem
4.11.1 Building Classifiers with Class Imbalance
4.11.2 Evaluating Performance with Class Imbalance
4.11.3 Finding an Optimal Score Threshold
4.11.4 Aggregate Evaluation of Performance
4.12 Multiclass Problem
4.13 Bibliographic Notes
4.14 Exercises

5 Association Analysis: Basic Concepts and Algorithms
5.1 Preliminaries
5.2 Frequent Itemset Generation
5.2.1 The Apriori Principle
5.2.2 Frequent Itemset Generation in the Apriori Algorithm
5.2.3 Candidate Generation and Pruning
5.2.4 Support Counting
5.2.5 Computational Complexity
5.3 Rule Generation
5.3.1 Confidence-Based Pruning
5.3.2 Rule Generation in Apriori Algorithm
5.3.3 An Example: Congressional Voting Records
5.4 Compact Representation of Frequent Itemsets
5.4.1 Maximal Frequent Itemsets
5.4.2 Closed Itemsets
5.5 Alternative Methods for Generating Frequent Itemsets*
5.6 FP-Growth Algorithm*
5.6.1 FP-Tree Representation
5.6.2 Frequent Itemset Generation in FP-Growth Algorithm
5.7 Evaluation of Association Patterns
5.7.1 Objective Measures of Interestingness
5.7.2 Measures beyond Pairs of Binary Variables
5.7.3 Simpson's Paradox
5.8 Effect of Skewed Support Distribution
5.9 Bibliographic Notes

10.2 Modeling Null and Alternative Distributions
10.2.1 Generating Synthetic Data Sets
10.2.2 Randomizing Class Labels
10.2.3 Resampling Instances
10.2.4 Modeling the Distribution of the Test Statistic
10.3 Statistical Testing for Classification
10.3.1 Evaluating Classification Performance
10.3.2 Binary Classification as Multiple Hypothesis Testing
10.3.3 Multiple Hypothesis Testing in Model Selection
10.4 Statistical Testing for Association Analysis
10.4.1 Using Statistical Models
10.4.2 Using Randomization Methods
10.5 Statistical Testing for Cluster Analysis
10.5.1 Generating a Null Distribution for Internal Indices
10.5.2 Generating a Null Distribution for External Indices
10.5.3 Enrichment
10.6 Statistical Testing for Anomaly Detection
10.7 Bibliographic Notes
10.8 Exercises

Author Index
Subject Index
Copyright Permissions
1 Introduction

Rapid advances in data collection and storage technology, coupled with the ease with which data can be generated and disseminated, have triggered the explosive growth of data, leading to the current age of big data. Deriving actionable insights from these large data sets is increasingly important in decision making across almost all areas of society, including business and industry; science and engineering; medicine and biotechnology; and government and individuals. However, the amount of data (volume), its complexity (variety), and the rate at which it is being collected and processed (velocity) have simply become too great for humans to analyze unaided. Thus, there is a great need for automated tools for extracting useful information from the big data despite the challenges posed by its enormity and diversity.

Data mining blends traditional data analysis methods with sophisticated algorithms for processing this abundance of data. In this introductory chapter, we present an overview of data mining and outline the key topics to be covered in this book. We start with a description of some applications that require more advanced techniques for data analysis.
Business and Industry Point-of-sale data collection (bar code scanners, radio frequency identification (RFID), and smart card technology) have allowed retailers to collect up-to-the-minute data about customer purchases at the checkout counters of their stores. Retailers can utilize this information, along with other business-critical data, such as web server logs from e-commerce websites and customer service records from call centers, to help them better understand the needs of their customers and make more informed business decisions.

Data mining techniques can be used to support a wide range of business intelligence applications, such as customer profiling, targeted marketing, workflow management, store layout, fraud detection, and automated buying and selling. An example of the last application is high-speed stock trading, where decisions on buying and selling have to be made in less than a second using data about financial transactions. Data mining can also help retailers answer important business questions, such as "Who are the most profitable customers?" "What products can be cross-sold or up-sold?" and "What is the revenue outlook of the company for next year?" These questions have inspired the development of such data mining techniques as association analysis (Chapters 5 and 6).
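The kind of question association analysis answers can be sketched in a few lines. The transactions below are invented for illustration, and the brute-force pair counting is only a stand-in for the efficient algorithms (such as Apriori) covered in Chapters 5 and 6:

```python
from itertools import combinations
from collections import Counter

# Hypothetical point-of-sale transactions; each is the set of items bought.
transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]

# Count how often each pair of items appears together (its support count).
pair_counts = Counter()
for t in transactions:
    for pair in combinations(sorted(t), 2):
        pair_counts[pair] += 1

# Pairs appearing in at least 3 of the 5 transactions are "frequent".
frequent = {p: c for p, c in pair_counts.items() if c >= 3}
print(frequent)
```

On this toy data, pairs such as (beer, diapers) and (bread, milk) clear the support threshold, which is exactly the raw material for cross-sell rules like "customers who buy diapers also buy beer."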
As the Internet continues to revolutionize the way we interact and make decisions in our everyday lives, we are generating massive amounts of data about our online experiences, e.g., web browsing, messaging, and posting on social networking websites. This has opened several opportunities for business applications that use web data. For example, in the e-commerce sector, data about our online viewing or shopping preferences can be used to provide personalized recommendations of products. Data mining also plays a prominent role in supporting several other Internet-based services, such as filtering spam messages, answering search queries, and suggesting social updates and connections. The large corpus of text, images, and videos available on the Internet has enabled a number of advancements in data mining methods, including deep learning, which is discussed in Chapter 4. These developments have led to great advances in a number of applications, such as object recognition, natural language translation, and autonomous driving.

Another domain that has undergone a rapid big data transformation is the use of mobile sensors and devices, such as smart phones and wearable computing devices. With better sensor technologies, it has become possible to collect a variety of information about our physical world using low-cost sensors embedded on everyday objects that are connected to each other, termed the Internet of Things (IoT). This deep integration of physical sensors in digital systems is beginning to generate large amounts of diverse and distributed data about our environment, which can be used for designing convenient, safe, and energy-efficient home systems, as well as for urban planning of smart cities.
Medicine, Science, and Engineering Researchers in medicine, science, and engineering are rapidly accumulating data that is key to significant new discoveries. For example, as an important step toward improving our understanding of the Earth's climate system, NASA has deployed a series of Earth-orbiting satellites that continuously generate global observations of the land surface, oceans, and atmosphere. However, because of the size and spatio-temporal nature of the data, traditional methods are often not suitable for analyzing these data sets. Techniques developed in data mining can aid Earth scientists in answering questions such as the following: "What is the relationship between the frequency and intensity of ecosystem disturbances such as droughts and hurricanes to global warming?" "How is land surface precipitation and temperature affected by ocean surface temperature?" and "How well can we predict the beginning and end of the growing season for a region?"

As another example, researchers in molecular biology hope to use the large amounts of genomic data to better understand the structure and function of genes. In the past, traditional methods in molecular biology allowed scientists to study only a few genes at a time in a given experiment. Recent breakthroughs in microarray technology have enabled scientists to compare the behavior of thousands of genes under various situations. Such comparisons can help determine the function of each gene, and perhaps isolate the genes responsible for certain diseases. However, the noisy, high-dimensional nature of data requires new data analysis methods. In addition to analyzing gene expression data, data mining can also be used to address other important biological challenges such as protein structure prediction, multiple sequence alignment, the modeling of biochemical pathways, and phylogenetics.

Another example is the use of data mining techniques to analyze electronic health record (EHR) data, which has become increasingly available. Not very long ago, studies of patients required manually examining the physical records of individual patients and extracting very specific pieces of information pertinent to the particular question being investigated. EHRs allow for a faster and broader exploration of such data. However, there are significant challenges since the observations on any one patient typically occur during their visits to a doctor or hospital and only a small number of details about the health of the patient are measured during any particular visit.

Currently, EHR analysis focuses on simple types of data, e.g., a patient's blood pressure or the diagnosis code of a disease. However, large amounts of more complex types of medical data are also being collected, such as electrocardiograms (ECGs) and neuroimages from magnetic resonance imaging (MRI) or functional Magnetic Resonance Imaging (fMRI). Although challenging to analyze, this data also provides vital information about patients. Integrating and analyzing such data, with traditional EHR and genomic data, is one of the capabilities needed to enable precision medicine, which aims to provide more personalized patient care.
1.1 What Is Data Mining?

Data mining is the process of automatically discovering useful information in large data repositories. Data mining techniques are deployed to scour large data sets in order to find novel and useful patterns that might otherwise remain unknown. They also provide the capability to predict the outcome of a future observation, such as the amount a customer will spend at an online or a brick-and-mortar store.
Not all information discovery tasks are considered to be data mining. Examples include queries, e.g., looking up individual records in a database or finding web pages that contain a particular set of keywords. This is because such tasks can be accomplished through simple interactions with a database management system or an information retrieval system. These systems rely on traditional computer science techniques, which include sophisticated indexing structures and query processing algorithms, for efficiently organizing and retrieving information from large data repositories. Nonetheless, data mining techniques have been used to enhance the performance of such systems by improving the quality of the search results based on their relevance to the input queries.
Data Mining and Knowledge Discovery in Databases
Data mining is an integral part of knowledge discovery in databases (KDD), which is the overall process of converting raw data into useful information, as shown in Figure 1.1. This process consists of a series of steps, from data preprocessing to postprocessing of data mining results.
Figure 1.1. The process of knowledge discovery in databases (KDD).
The input data can be stored in a variety of formats (flat files, spreadsheets, or relational tables) and may reside in a centralized data repository or be distributed across multiple sites. The purpose of preprocessing is to transform the raw input data into an appropriate format for subsequent analysis. The steps involved in data preprocessing include fusing data from multiple sources, cleaning data to remove noise and duplicate observations, and selecting records and features that are relevant to the data mining task at hand. Because of the many ways data can be collected and stored, data preprocessing is perhaps the most laborious and time-consuming step in the overall knowledge discovery process.
“Closing the loop” is a phrase often used to refer to the process of integrating data mining results into decision support systems. For example, in business applications, the insights offered by data mining results can be integrated with campaign management tools so that effective marketing promotions can be conducted and tested. Such integration requires a postprocessing step to ensure that only valid and useful results are incorporated into the decision support system. An example of postprocessing is visualization, which allows analysts to explore the data and the data mining results from a variety of viewpoints. Hypothesis testing methods can also be applied during postprocessing to eliminate spurious data mining results. (See Chapter 10.)
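The preprocess-mine-postprocess flow of the KDD process can be sketched in a few lines of Python. This is a toy illustration only: the record format, the token-counting stand-in for the mining step, and the min_count validity threshold are assumptions made for this example, not anything specified in the text.

```python
# Toy sketch of the KDD process: preprocess raw input, mine a simple
# pattern (token frequencies), then postprocess to keep only valid results.

def preprocess(raw_records):
    """Clean records and remove duplicate observations."""
    cleaned = [r.strip().lower() for r in raw_records if r.strip()]
    return sorted(set(cleaned))

def mine(records):
    """Stand-in 'mining' step: count how often each token occurs."""
    counts = {}
    for record in records:
        for token in record.split():
            counts[token] = counts.get(token, 0) + 1
    return counts

def postprocess(patterns, min_count=2):
    """Keep only patterns frequent enough to be considered useful."""
    return {t: c for t, c in patterns.items() if c >= min_count}

raw = ["milk bread", "Milk bread ", "bread eggs", "milk bread"]
info = postprocess(mine(preprocess(raw)))
print(info)   # only 'bread' survives: it appears in both unique records
```

In a real deployment, each stage would of course be far more elaborate, but the structure mirrors Figure 1.1: raw data in, useful information out.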
1.2 Motivating Challenges
As mentioned earlier, traditional data analysis techniques have often encountered practical difficulties in meeting the challenges posed by big data applications. The following are some of the specific challenges that motivated the development of data mining.
Scalability
Because of advances in data generation and collection, data sets with sizes of terabytes, petabytes, or even exabytes are becoming common. If data mining algorithms are to handle these massive data sets, they must be scalable. Many data mining algorithms employ special search strategies to handle exponential search problems. Scalability may also require the implementation of novel data structures to access individual records in an efficient manner. For instance, out-of-core algorithms may be necessary when processing data sets that cannot fit into main memory. Scalability can also be improved by using sampling or developing parallel and distributed algorithms. A general overview of techniques for scaling up data mining algorithms is given in Appendix F.
High Dimensionality
It is now common to encounter data sets with hundreds or thousands of attributes instead of the handful common a few decades ago. In bioinformatics, progress in microarray technology has produced gene expression data involving thousands of features. Data sets with temporal or spatial components also tend to have high dimensionality. For example, consider a data set that contains measurements of temperature at various locations. If the temperature measurements are taken repeatedly for an extended period, the number of dimensions (features) increases in proportion to the number of measurements taken. Traditional data analysis techniques that were developed for low-dimensional data often do not work well for such high-dimensional data due to issues such as the curse of dimensionality (to be discussed in Chapter 2). Also, for some data analysis algorithms, the computational complexity increases rapidly as the dimensionality (the number of features) increases.
Heterogeneous and Complex Data
Traditional data analysis methods often deal with data sets containing attributes of the same type, either continuous or categorical. As the role of data mining in business, science, medicine, and other fields has grown, so has the need for techniques that can handle heterogeneous attributes. Recent years have also seen the emergence of more complex data objects. Examples of such non-traditional types of data include web and social media data containing text, hyperlinks, images, audio, and videos; DNA data with sequential and three-dimensional structure; and climate data that consists of measurements (temperature, pressure, etc.) at various times and locations on the Earth's surface. Techniques developed for mining such complex objects should take into consideration relationships in the data, such as temporal and spatial autocorrelation, graph connectivity, and parent-child relationships between the elements in semi-structured text and XML documents.
Data Ownership and Distribution
Sometimes, the data needed for an analysis is
not stored in one location or
owned by one organization. Instead, the data is
geographically distributed
among resources belonging to multiple entities. This
requires the development
of distributed data mining techniques. The key
challenges faced by distributed
data mining algorithms include the following:
(1) how to reduce the amount of
communication needed to perform the distributed
computation, (2) how to
effectively consolidate the data mining results
obtained from multiple sources,
and (3) how to address data security and privacy
issues.
Non-traditional Analysis
The traditional statistical approach is based on a hypothesize-and-test paradigm. In other words, a hypothesis is proposed, an experiment is designed to gather the data, and then the data is analyzed with respect to the hypothesis. Unfortunately, this process is extremely labor-intensive. Current data analysis tasks often require the generation and evaluation of thousands of hypotheses, and consequently, the development of some data mining techniques has been motivated by the desire to automate the process of hypothesis generation and evaluation. Furthermore, the data sets analyzed in data mining are typically not the result of a carefully designed experiment and often represent opportunistic samples of the data, rather than random samples.
1.3 The Origins of Data Mining
While data mining has traditionally been viewed as an intermediate process within the KDD framework, as shown in Figure 1.1, it has emerged over the years as an academic field within computer science, focusing on all aspects of KDD, including data preprocessing, mining, and postprocessing. Its origin can be traced back to the late 1980s, following a series of workshops organized on the topic of knowledge discovery in databases. The workshops brought together researchers from different disciplines to discuss the challenges and opportunities in applying computational techniques to extract actionable knowledge from large databases. The workshops quickly grew into hugely popular conferences that were attended by researchers and practitioners from both academia and industry. The success of these conferences, along with the interest shown by businesses and industry in recruiting new hires with a data mining background, has fueled the tremendous growth of this field.
The field was initially built upon the methodology and algorithms that researchers had previously used. In particular, data mining researchers draw upon ideas such as (1) sampling, estimation, and hypothesis testing from statistics and (2) search algorithms, modeling techniques, and learning theories from artificial intelligence, pattern recognition, and machine learning. Data mining has also been quick to adopt ideas from other areas, including optimization, evolutionary computing, information theory, signal processing, visualization, and information retrieval, extending them to solve the challenges of mining big data.
A number of other areas also play key supporting roles. In particular, database systems are needed to provide support for efficient storage, indexing, and query processing. Techniques from high performance (parallel) computing are often important in addressing the massive size of some data sets. Distributed techniques can also help address the issue of size and are essential when the data cannot be gathered in one location. Figure 1.2 shows the relationship of data mining to other areas.
Figure 1.2. Data mining as a confluence of many disciplines.
Data Science and Data-Driven Discovery
Data science is an interdisciplinary field that studies and applies tools and techniques for deriving useful insights from data. Although data science is regarded as an emerging field with a distinct identity of its own, the tools and techniques often come from many different areas of data analysis, such as data mining, statistics, AI, machine learning, pattern recognition, database technology, and distributed and parallel computing. (See Figure 1.2.) The emergence of data science as a new field is a recognition that, often, none of the existing areas of data analysis provides a complete set of tools for the data analysis tasks that are often encountered in emerging applications.
Instead, a broad range of computational, mathematical, and statistical skills is often required. To illustrate the challenges that arise in analyzing such data, consider the following example. Social media and the Web present new opportunities for social scientists to observe and quantitatively measure human behavior on a large scale. To conduct such a study, social scientists work with analysts who possess skills in areas such as web mining, natural language processing (NLP), network analysis, data mining, and statistics. Compared to more traditional research in social science, which is often based on surveys, this analysis requires a broader range of skills and tools, and involves far larger amounts of data. Thus, data science is, by necessity, a highly interdisciplinary field that builds on the continuing work of many fields.
The data-driven approach of data science emphasizes the direct discovery of patterns and relationships from data, especially in large quantities of data, often without the need for extensive domain knowledge. A notable example of the success of this approach is represented by advances in neural networks, i.e., deep learning, which have been particularly successful in areas which have long proved challenging, e.g., recognizing objects in photos or videos and words in speech, as well as in other application areas. However, note that this is just one example of the success of data-driven approaches, and dramatic improvements have also occurred in many other areas of data analysis. Many of these developments are topics described later in this book. Some cautions on potential limitations of a purely data-driven approach are given in the Bibliographic Notes.
1.4 Data Mining Tasks
Data mining tasks are generally divided into two major categories:

Predictive tasks: The objective of these tasks is to predict the value of a particular attribute based on the values of other attributes. The attribute to be predicted is commonly known as the target or dependent variable, while the attributes used for making the prediction are known as the explanatory or independent variables.

Descriptive tasks: Here, the objective is to derive patterns (correlations, trends, clusters, trajectories, and anomalies) that summarize the underlying relationships in data. Descriptive data mining tasks are often exploratory in nature and frequently require postprocessing techniques to validate and explain the results.
Figure 1.3 illustrates four of the core data mining tasks that are described in the remainder of this book.
Figure 1.3. Four of the core data mining tasks.
Predictive modeling refers to the task of building a model for the target variable as a function of the explanatory variables. There are two types of predictive modeling tasks: classification, which is used for discrete target variables, and regression, which is used for continuous target variables. For example, predicting whether a web user will make a purchase at an online bookstore is a classification task because the target variable is binary-valued. On the other hand, forecasting the future price of a stock is a regression task because price is a continuous-valued attribute. The goal of both tasks is to learn a model that minimizes the error between the predicted and true values of the target variable. Predictive modeling can be used to identify customers who will respond to a marketing campaign, predict disturbances in the Earth's ecosystem, or judge whether a patient has a particular disease based on the results of medical tests.
Example 1.1 (Predicting the Type of a Flower).
Consider the task of predicting a species of flower based on the characteristics of the flower. In particular, consider classifying an Iris flower as one of the following three Iris species: Setosa, Versicolour, or Virginica. To perform this task, we need a data set containing the characteristics of various flowers of these three species. A data set with this type of information is the well-known Iris data set from the UCI Machine Learning Repository at http://www.ics.uci.edu/~mlearn. In addition to the species of a flower, this data set contains four other attributes: sepal width, sepal length, petal length, and petal width. Figure 1.4 shows a plot of petal width versus petal length for the 150 flowers in the Iris data set. Petal width is broken into the categories low, medium, and high, which correspond to the intervals [0, 0.75), [0.75, 1.75), and [1.75, ∞), respectively. Also, petal length is broken into the categories low, medium, and high, which correspond to the intervals [0, 2.5), [2.5, 5), and [5, ∞), respectively. Based on these categories of petal width and length, the following rules can be derived:

Petal width low and petal length low implies Setosa.
Petal width medium and petal length medium implies Versicolour.
Petal width high and petal length high implies Virginica.

While these rules do not classify all the flowers, they do a good (but not perfect) job of classifying most of the flowers. Note that flowers from the Setosa species are well separated from the Versicolour and Virginica species with respect to petal width and length, but the latter two species overlap somewhat with respect to these attributes.
Figure 1.4. Petal width versus petal length for 150 Iris flowers.
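The three rules of Example 1.1 can be written directly as code. The sketch below assumes the interval boundaries given in the example; the function names are ours, and a real classifier would be learned from the data rather than hand-written.

```python
def petal_category(width, length):
    """Discretize petal width/length using the intervals from Figure 1.4."""
    w = "low" if width < 0.75 else "medium" if width < 1.75 else "high"
    l = "low" if length < 2.5 else "medium" if length < 5.0 else "high"
    return w, l

def classify_iris(petal_width, petal_length):
    """Apply the three rules; return None when no rule fires."""
    w, l = petal_category(petal_width, petal_length)
    rules = {("low", "low"): "Setosa",
             ("medium", "medium"): "Versicolour",
             ("high", "high"): "Virginica"}
    return rules.get((w, l))

print(classify_iris(0.2, 1.4))   # Setosa
print(classify_iris(1.3, 4.5))   # Versicolour
print(classify_iris(2.1, 5.8))   # Virginica
print(classify_iris(1.8, 4.9))   # None: width is high but length is medium
```

The last call illustrates why the rules are imperfect: flowers whose width and length fall into different categories are left unclassified, which is exactly where Versicolour and Virginica overlap in Figure 1.4.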
Association analysis is used to discover patterns that describe strongly associated features in the data. The discovered patterns are typically represented in the form of implication rules or feature subsets. Because of the exponential size of its search space, the goal of association analysis is to extract the most interesting patterns in an efficient manner. Useful applications of association analysis include finding groups of genes that have related functionality, identifying web pages that are accessed together, or understanding the relationships between different elements of Earth's climate system.
Example 1.2 (Market Basket Analysis).
The transactions shown in Table 1.1 illustrate point-of-sale data collected at the checkout counters of a grocery store. Association analysis can be applied to find items that are frequently bought together by customers. For example, we may discover the rule {Diapers}→{Milk}, which suggests that customers who buy diapers also tend to buy milk. This type of rule can be used to identify potential cross-selling opportunities among related items.
Table 1.1. Market basket data.
Transaction ID Items
1 {Bread, Butter, Diapers, Milk}
2 {Coffee, Sugar, Cookies, Salmon}
3 {Bread, Butter, Coffee, Diapers, Milk, Eggs}
4 {Bread, Butter, Salmon, Chicken}
5 {Eggs, Bread, Butter}
6 {Salmon, Diapers, Milk}
7 {Bread, Tea, Sugar, Eggs}
8 {Coffee, Sugar, Chicken, Eggs}
9 {Bread, Diapers, Milk, Salt}
10 {Tea, Eggs, Cookies, Diapers, Milk}
Cluster analysis seeks to find groups of closely related observations so that observations that belong to the same cluster are more similar to each other than observations that belong to other clusters. Clustering has been used to group sets of related customers, find areas of the ocean that have a significant impact on the Earth's climate, and compress data.
Example 1.3 (Document Clustering).
The collection of news articles shown in Table 1.2 can be grouped based on their respective topics. Each article is represented as a set of word-frequency pairs (w : c), where w is a word and c is the number of times the word appears in the article. There are two natural clusters in the data set. The first cluster consists of the first four articles, which correspond to news about the economy, while the second cluster contains the last four articles, which correspond to news about health care. A good clustering algorithm should be able to identify these two clusters based on the similarity between words that appear in the articles.
Table 1.2. Collection of news articles.
Article Word-frequency pairs
1 dollar: 1, industry: 4, country: 2, loan: 3, deal: 2, government: 2
2 machinery: 2, labor: 3, market: 4, industry: 2, work: 3, country: 1
3 job: 5, inflation: 3, rise: 2, jobless: 2, market: 3, country: 2, index: 3
4 domestic: 3, forecast: 2, gain: 1, market: 2, sale: 3, price: 2
5 patient: 4, symptom: 2, drug: 3, health: 2, clinic: 2, doctor: 2
6 pharmaceutical: 2, company: 3, drug: 2, vaccine: 1, flu: 3
7 death: 2, cancer: 4, drug: 3, public: 4, health: 3, director: 2
8 medical: 2, cost: 3, increase: 2, patient: 2, health: 3, care: 1
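A clustering algorithm needs a similarity measure over these word-frequency representations. One common choice (an assumption here, not prescribed by the text) is cosine similarity, under which articles sharing many words score near 1 and articles sharing none score 0:

```python
import math

# Cosine similarity between two word-frequency dictionaries. The word
# counts below are copied from articles 1, 3, and 5 of Table 1.2.
def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b)

art1 = {"dollar": 1, "industry": 4, "country": 2, "loan": 3,
        "deal": 2, "government": 2}                       # economy
art3 = {"job": 5, "inflation": 3, "rise": 2, "jobless": 2,
        "market": 3, "country": 2, "index": 3}            # economy
art5 = {"patient": 4, "symptom": 2, "drug": 3, "health": 2,
        "clinic": 2, "doctor": 2}                         # health care

# Articles 1 and 3 share the word "country"; articles 1 and 5 share nothing.
print(cosine(art1, art3) > cosine(art1, art5))   # True
```

Even this crude measure already separates the two topics for these articles; a full clustering algorithm would apply it across all pairs.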
Anomaly detection is the task of identifying observations whose characteristics are significantly different from the rest of the data. Such observations are known as anomalies or outliers. The goal of an anomaly detection algorithm is to discover the real anomalies and avoid falsely labeling normal objects as anomalous. In other words, a good anomaly detector must have a high detection rate and a low false alarm rate. Applications of anomaly detection include the detection of fraud, network intrusions, unusual patterns of disease, and ecosystem disturbances, such as droughts, floods, fires, hurricanes, etc.
Example 1.4 (Credit Card Fraud Detection).
A credit card company records the transactions made by every credit card holder, along with personal information such as credit limit, age, annual income, and address. Since the number of fraudulent cases is relatively small compared to the number of legitimate transactions, anomaly detection techniques can be applied to build a profile of legitimate transactions for the users. When a new transaction arrives, it is compared against the profile of the user. If the characteristics of the transaction are very different from the previously created profile, then the transaction is flagged as potentially fraudulent.
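A minimal version of this profile-based check can be sketched with summary statistics: model the user's historical purchase amounts by their mean and standard deviation, and flag a new amount whose z-score exceeds a threshold. The amounts, the single "amount" feature, and the threshold of 3 are illustrative assumptions; real systems profile many more transaction characteristics.

```python
import statistics

def build_profile(amounts):
    """Summarize a user's legitimate transaction history."""
    return statistics.mean(amounts), statistics.stdev(amounts)

def is_fraudulent(amount, profile, threshold=3.0):
    """Flag amounts more than `threshold` standard deviations from the mean."""
    mean, std = profile
    return abs(amount - mean) / std > threshold

history = [23.5, 40.0, 18.2, 35.9, 27.4, 31.0, 22.8, 29.6]
profile = build_profile(history)
print(is_fraudulent(30.0, profile))    # typical purchase -> False
print(is_fraudulent(950.0, profile))   # far outside the profile -> True
```

The design goal mentioned above maps directly onto the threshold: raising it lowers the false alarm rate but also lowers the detection rate, and vice versa.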
1.5 Scope and Organization of the Book
This book introduces the major principles and techniques used in data mining from an algorithmic perspective. A study of these principles and techniques is essential for developing a better understanding of how data mining technology can be applied to various kinds of data. This book also serves as a starting point for readers who are interested in doing research in this field.
We begin the technical discussion of this book with a chapter on data (Chapter 2), which discusses the basic types of data, data quality, preprocessing techniques, and measures of similarity and dissimilarity. Although this material can be covered quickly, it provides an essential foundation for data analysis. Chapters 3 and 4 cover classification. Chapter 3 provides a foundation by discussing decision tree classifiers and several issues that are important to all classification: overfitting, underfitting, model selection, and performance evaluation. Using this foundation, Chapter 4 describes a number of other important classification techniques: rule-based systems, nearest neighbor classifiers, Bayesian classifiers, artificial neural networks, including deep learning, support vector machines, and ensemble classifiers, which are collections of classifiers. The multiclass and imbalanced class problems are also discussed. These topics can be covered independently.
Association analysis is explored in Chapters 5 and 6. Chapter 5 describes the basics of association analysis: frequent itemsets, association rules, and some of the algorithms used to generate them. Specific types of frequent itemsets (maximal, closed, and hyperclique) that are important for data mining are also discussed, and the chapter concludes with a discussion of evaluation measures for association analysis. Chapter 6 considers a variety of more advanced topics, including how association analysis can be applied to categorical and continuous data or to data that has a concept hierarchy. (A concept hierarchy is a hierarchical categorization of objects, e.g., store items→clothing→shoes→sneakers.) This chapter also describes how association analysis can be extended to find sequential patterns (patterns involving order), patterns in graphs, and negative relationships (if one item is present, then the other is not).
Cluster analysis is discussed in Chapters 7 and 8. Chapter 7 first describes the different types of clusters and then presents three specific clustering techniques: K-means, agglomerative hierarchical clustering, and DBSCAN. This is followed by a discussion of techniques for validating the results of a clustering algorithm. Additional clustering concepts and techniques are explored in Chapter 8, including fuzzy and probabilistic clustering, Self-Organizing Maps (SOM), graph-based clustering, spectral clustering, and density-based clustering. There is also a discussion of scalability issues and factors to consider when selecting a clustering algorithm.
Chapter 9 is on anomaly detection. After some basic definitions, several different types of anomaly detection are considered: statistical, distance-based, density-based, clustering-based, reconstruction-based, one-class classification, and information theoretic. The last chapter, Chapter 10, supplements the discussions in the other chapters with a discussion of the statistical concepts important for avoiding spurious results, and then discusses those concepts in the context of data mining techniques studied in the previous chapters. These techniques include statistical hypothesis testing, p-values, the false discovery rate, and permutation testing.
Appendices A through F give a brief review of important topics that are used in portions of the book: linear algebra, dimensionality reduction, statistics, regression, optimization, and scaling up data mining techniques for big data.
The subject of data mining, while relatively young compared to statistics or machine learning, is already too large to cover in a single book. Selected references to topics that are only briefly covered, such as data quality, are provided in the Bibliographic Notes section of the appropriate chapter. References to topics not covered in this book, such as mining streaming data and privacy-preserving data mining, are provided in the Bibliographic Notes of this chapter.
1.6 Bibliographic Notes
The topic of data mining has inspired many textbooks. Introductory textbooks include those by Dunham [16], Han et al. [29], Hand et al. [31], Roiger and Geatz [50], Zaki and Meira [61], and Aggarwal [2]. Data mining books with a stronger emphasis on business applications include the works by Berry and Linoff [5], Pyle [47], and Parr Rud [45]. Books with an emphasis on statistical learning include those by Cherkassky and Mulier [11], and Hastie et al. [32]. Similar books with an emphasis on machine learning or pattern recognition are those by Duda et al. [15], Kantardzic [34], Mitchell [43], Webb [57], and Witten and Frank [58]. There are also some more specialized books: Chakrabarti [9] (web mining), Fayyad et al. [20] (collection of early articles on data mining), Fayyad et al. [18] (visualization), Grossman et al. [25] (science and engineering), Kargupta and Chan [35] (distributed data mining), Wang et al. [56] (bioinformatics), and Zaki and Ho [60] (parallel data mining).
There are several conferences related to data mining. Some of the main conferences dedicated to this field include the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), the IEEE International Conference on Data Mining (ICDM), the SIAM International Conference on Data Mining (SDM), the European Conference on Principles and Practice of Knowledge Discovery in Databases (PKDD), and the Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD). Data mining papers can also be found in other major conferences such as the Conference and Workshop on Neural Information Processing Systems (NIPS), the International Conference on Machine Learning (ICML), the ACM SIGMOD/PODS conference, the International Conference on Very Large Data Bases (VLDB), the Conference on Information and Knowledge Management (CIKM), the International Conference on Data Engineering (ICDE), the National Conference on Artificial Intelligence (AAAI), the IEEE International Conference on Big Data, and the IEEE International Conference on Data Science and Advanced Analytics (DSAA).
Journal publications on data mining include IEEE Transactions on Knowledge and Data Engineering, Data Mining and Knowledge Discovery, Knowledge and Information Systems, ACM Transactions on Knowledge Discovery from Data, Statistical Analysis and Data Mining, and Information Systems. Various open-source data mining software packages are available, including Weka [27] and Scikit-learn [46]. More recently, data mining software such as Apache Mahout and Apache Spark has been developed for large-scale problems on distributed computing platforms.
There have been a number of general articles on data mining that define the field or its relationship to other fields, particularly statistics. Fayyad et al. [19] describe data mining and how it fits into the total knowledge discovery process. Chen et al. [10] give a database perspective on data mining. Ramakrishnan and Grama [48] provide a general discussion of data mining and present several viewpoints. Hand [30] describes how data mining differs from statistics, as does Friedman [21]. Lambert [40] explores the use of statistics for large data sets and provides some comments on the respective roles of data mining and statistics. Glymour et al. [23] consider the lessons that statistics may have for data mining. Smyth et al. [53] describe how the evolution of data mining is being driven by new types of data and applications, such as those involving streams, graphs, and text. Han et al. [28] consider emerging applications in data mining and Smyth [52] describes some research challenges in data mining. Wu et al. [59] discuss how developments in data mining research can be turned into practical tools. Data mining standards are the subject of a paper by Grossman et al. [24]. Bradley [7] discusses how data mining algorithms can be scaled to large data sets.
The emergence of new data mining applications has produced new challenges that need to be addressed. For instance, concerns about privacy breaches as a result of data mining have escalated in recent years, particularly in application domains such as web commerce and health care. As a result, there is growing interest in developing data mining algorithms that maintain user privacy. Developing techniques for mining encrypted or randomized data is known as privacy-preserving data mining. Some general references in this area include papers by Agrawal and Srikant [3], Clifton et al. [12] and Kargupta et al. [36]. Vassilios et al. [55] provide a survey. Another area of concern is the bias in predictive models that may be used for some applications, e.g., screening job applicants or deciding prison parole [39]. Assessing whether such applications are producing biased results is made more difficult by the fact that the predictive models used for such applications are often black box models, i.e., models that are not interpretable in any straightforward way.
Data science, its constituent fields, and more generally, the new paradigm of knowledge discovery they represent [33], have great potential, some of which has been realized. However, it is important to emphasize that data science works mostly with observational data, i.e., data that was collected by various organizations as part of their normal operation. The consequence of this is that sampling biases are common and the determination of causal factors becomes more problematic. For this and a number of other reasons, it is often hard to interpret the predictive models built from this data [42, 49]. Thus, theory, experimentation, and computational simulations will continue to be the methods of choice in many areas, especially those related to science.
More importantly, a purely data-driven approach often ignores the existing knowledge in a particular field. Such models may perform poorly, for example, predicting impossible outcomes or failing to generalize to new situations. However, if the model does work well, e.g., has high predictive accuracy, then this approach may be sufficient for practical purposes in some fields. But in many areas, such as medicine and science, gaining insight into the underlying domain is often the goal. Some recent work attempts to address these issues in order to create theory-guided data science, which takes pre-existing domain knowledge into account [17, 37].
Recent years have witnessed a growing number of
applications that rapidly
generate continuous streams of data. Examples of
stream data include
network traffic, multimedia streams, and stock
prices. Several issues must be
considered when mining data streams, such as
the limited amount of memory
available, the need for online analysis, and the change
of the data over time.
Data mining for stream data has become an
important area in data mining.
Some selected publications are Domingos and Hulten
[14] (classification),
Giannella et al. [22] (association analysis), Guha et
al. [26] (clustering), Kifer
et al. [38] (change detection), Papadimitriou et al.
[44] (time series), and Law
et al. [41] (dimensionality reduction).
Another area of interest is recommender and collaborative filtering systems [1, 6, 8, 13, 54], which suggest movies, television shows, books, products, etc. that a person might like. In many cases, this problem, or at least a component of it, is treated as a prediction problem, and thus data mining techniques can be applied [4, 51].
Bibliography
[1] G. Adomavicius and A. Tuzhilin. Toward the next generation of recommender systems: A survey of the state-of-the-art and possible extensions. IEEE Transactions on Knowledge and Data Engineering, 17(6):734–749, 2005.
[2] C. Aggarwal. Data Mining: The Textbook. Springer, 2009.
[3] R. Agrawal and R. Srikant. Privacy-preserving data mining. In Proc. of 2000 ACM SIGMOD Intl. Conf. on Management of Data, pages 439–450, Dallas, Texas, 2000. ACM Press.
[4] X. Amatriain and J. M. Pujol. Data mining
methods for recommender
systems. In Recommender Systems Handbook, pages
227–262. Springer,
2015.
[5] M. J. A. Berry and G. Linoff. Data Mining
Techniques: For Marketing,
Sales, and Customer Relationship Management. Wiley
Computer
Publishing, 2nd edition, 2004.
[6] J. Bobadilla, F. Ortega, A. Hernando, and A.
Gutiérrez. Recommender
systems survey. Knowledge-based systems, 46:109–132,
2013.
[7] P. S. Bradley, J. Gehrke, R. Ramakrishnan,
and R. Srikant. Scaling mining
algorithms to large databases. Communications of
the ACM, 45(8):38–43,
2002.
[8] R. Burke. Hybrid recommender systems: Survey
and experiments. User
modeling and user-adapted interaction, 12(4):331–370,
2002.
[9] S. Chakrabarti. Mining the Web: Discovering
Knowledge from Hypertext
Data. Morgan Kaufmann, San Francisco, CA,
2003.
[10] M.-S. Chen, J. Han, and P. S. Yu. Data
Mining: An Overview from a
Database Perspective. IEEE Transactions on
Knowledge and Data
Engineering, 8(6):866–883, 1996.
[11] V. Cherkassky and F. Mulier. Learning from
Data: Concepts, Theory, and
Methods. Wiley-IEEE Press, 2nd edition, 1998.
[12] C. Clifton, M. Kantarcioglu, and J. Vaidya.
Defining privacy for data
mining. In National Science Foundation Workshop on
Next Generation
Data Mining, pages 126–133, Baltimore, MD,
November 2002.
[13] C. Desrosiers and G. Karypis. A
comprehensive survey of neighborhood-
based recommendation methods. Recommender systems
handbook,
pages 107–144, 2011.
[14] P. Domingos and G. Hulten. Mining high-speed
data streams. In Proc. of
the 6th Intl. Conf. on Knowledge Discovery and
Data Mining, pages 71–80,
Boston, Massachusetts, 2000. ACM Press.
[15] R. O. Duda, P. E. Hart, and D. G. Stork.
Pattern Classification. John Wiley
&amp; Sons, Inc., New York, 2nd edition, 2001.
[16] M. H. Dunham. Data Mining: Introductory and
Advanced Topics. Prentice
Hall, 2006.
[17] J. H. Faghmous, A. Banerjee, S. Shekhar, M.
Steinbach, V. Kumar, A. R.
Ganguly, and N. Samatova. Theory-guided data science
for climate
change. Computer, 47(11):74–78, 2014.
[18] U. M. Fayyad, G. G. Grinstein, and A. Wierse,
editors. Information
Visualization in Data Mining and Knowledge
Discovery. Morgan Kaufmann
Publishers, San Francisco, CA, September 2001.
[19] U. M. Fayyad, G. Piatetsky-Shapiro, and P.
Smyth. From Data Mining to
Knowledge Discovery: An Overview. In Advances in
Knowledge Discovery
and Data Mining, pages 1–34. AAAI Press,
1996.
[20] U. M. Fayyad, G. Piatetsky-Shapiro, P. Smyth,
and R. Uthurusamy,
editors. Advances in Knowledge Discovery and Data
Mining. AAAI/MIT
Press, 1996.
[21] J. H. Friedman. Data Mining and Statistics: What’s
the Connection?
Unpublished. www-stat.stanford.edu/~jhf/ftp/dm-stat.ps,
1997.
http://www-stat.stanford.edu/~jhf/ftp/dm-stat.ps
[22] C. Giannella, J. Han, J. Pei, X. Yan, and P.
S. Yu. Mining Frequent
Patterns in Data Streams at Multiple Time
Granularities. In H. Kargupta, A.
Joshi, K. Sivakumar, and Y. Yesha, editors,
Next Generation Data Mining,
pages 191–212. AAAI/MIT, 2003.
[23] C. Glymour, D. Madigan, D. Pregibon, and P.
Smyth. Statistical Themes
and Lessons for Data Mining. Data Mining and
Knowledge Discovery,
1(1):11–28, 1997.
[24] R. L. Grossman, M. F. Hornick, and G.
Meyer. Data mining standards
initiatives. Communications of the ACM,
45(8):59–61, 2002.
[25] R. L. Grossman, C. Kamath, P. Kegelmeyer,
V. Kumar, and R. Namburu,
editors. Data Mining for Scientific and Engineering
Applications. Kluwer
Academic Publishers, 2001.
[26] S. Guha, A. Meyerson, N. Mishra, R.
Motwani, and L. O’Callaghan.
Clustering Data Streams: Theory and Practice. IEEE
Transactions on
Knowledge and Data Engineering, 15(3):515–528,
May/June 2003.
[27] M. Hall, E. Frank, G. Holmes, B. Pfahringer,
P. Reutemann, and I. H.
Witten. The WEKA Data Mining Software: An Update.
SIGKDD
Explorations, 11(1), 2009.
[28] J. Han, R. B. Altman, V. Kumar, H.
Mannila, and D. Pregibon. Emerging
scientific applications in data mining. Communications
of the ACM,
45(8):54–58, 2002.
[29] J. Han, M. Kamber, and J. Pei. Data Mining:
Concepts and Techniques.
Morgan Kaufmann Publishers, San Francisco, 3rd
edition, 2011.
[30] D. J. Hand. Data Mining: Statistics and More?
The American Statistician,
52(2):112–118, 1998.
[31] D. J. Hand, H. Mannila, and P. Smyth.
Principles of Data Mining. MIT
Press, 2001.
[32] T. Hastie, R. Tibshirani, and J. H.
Friedman. The Elements of Statistical
Learning: Data Mining, Inference, Prediction. Springer,
2nd edition, 2009.
[33] T. Hey, S. Tansley, K. M. Tolle, et al.
The Fourth Paradigm: Data-Intensive
Scientific Discovery, volume 1. Microsoft Research,
Redmond, WA, 2009.
[34] M. Kantardzic. Data Mining: Concepts, Models,
Methods, and Algorithms.
Wiley-IEEE Press, Piscataway, NJ, 2003.
[35] H. Kargupta and P. K. Chan, editors.
Advances in Distributed and Parallel
Knowledge Discovery. AAAI Press, September
2002.
[36] H. Kargupta, S. Datta, Q. Wang, and K.
Sivakumar. On the Privacy
Preserving Properties of Random Data Perturbation
Techniques. In Proc.
of the 2003 IEEE Intl. Conf. on Data Mining,
pages 99–106, Melbourne,
Florida, December 2003. IEEE Computer Society.
[37] A. Karpatne, G. Atluri, J. Faghmous, M.
Steinbach, A. Banerjee, A.
Ganguly, S. Shekhar, N. Samatova, and V. Kumar.
Theory-guided Data
Science: A New Paradigm for Scientific Discovery from
Data. IEEE
Transactions on Knowledge and Data Engineering,
2017.
[38] D. Kifer, S. Ben-David, and J. Gehrke.
Detecting Change in Data
Streams. In Proc. of the 30th VLDB Conf.,
pages 180–191, Toronto,
Canada, 2004. Morgan Kaufmann.
[39] J. Kleinberg, J. Ludwig, and S.
Mullainathan. A Guide to Solving Social
Problems with Machine Learning. Harvard Business
Review, December
2016.
[40] D. Lambert. What Use is Statistics for
Massive Data? In ACM SIGMOD
Workshop on Research Issues in Data Mining and
Knowledge Discovery,
pages 54–62, 2000.
[41] M. H. C. Law, N. Zhang, and A. K. Jain.
Nonlinear Manifold Learning for
Data Streams. In Proc. of the SIAM Intl. Conf.
on Data Mining, Lake Buena
Vista, Florida, April 2004. SIAM.
[42] Z. C. Lipton. The mythos of model
interpretability. arXiv preprint
arXiv:1606.03490, 2016.
[43] T. Mitchell. Machine Learning. McGraw-Hill, Boston,
MA, 1997.
[44] S. Papadimitriou, A. Brockwell, and C.
Faloutsos. Adaptive, unsupervised
stream mining. VLDB Journal, 13(3):222–239, 2004.
[45] O. Parr Rud. Data Mining Cookbook: Modeling
Data for Marketing, Risk
and Customer Relationship Management. John Wiley &amp;
Sons, New York,
NY, 2001.
[46] F. Pedregosa, G. Varoquaux, A. Gramfort, V.
Michel, B. Thirion, O. Grisel,
M. Blondel, P. Prettenhofer, R. Weiss, V.
Dubourg, J. Vanderplas, A.
Passos, D. Cournapeau, M. Brucher, M. Perrot,
and E. Duchesnay. Scikit-
learn: Machine Learning in Python. Journal of
Machine Learning
Research, 12:2825–2830, 2011.
[47] D. Pyle. Business Modeling and Data Mining. Morgan
Kaufmann, San
Francisco, CA, 2003.
[48] N. Ramakrishnan and A. Grama. Data Mining:
From Serendipity to
Science—Guest Editors’ Introduction. IEEE
Computer, 32(8):34–37, 1999.
[49] M. T. Ribeiro, S. Singh, and C. Guestrin.
Why Should I Trust You?:
Explaining the predictions of any classifier. In
Proceedings of the 22nd
ACM SIGKDD International Conference on
Knowledge Discovery and
Data Mining, pages 1135–1144. ACM, 2016.
[50] R. Roiger and M. Geatz. Data Mining: A
Tutorial Based Primer. Addison-
Wesley, 2002.
[51] J. Schafer. The Application of Data-Mining to
Recommender Systems.
Encyclopedia of data warehousing and mining, 1:44–
48, 2009.
[52] P. Smyth. Breaking out of the Black-Box:
Research Challenges in Data
Mining. In Proc. of the 2001 ACM SIGMOD
Workshop on Research Issues
in Data Mining and Knowledge Discovery, 2001.
[53] P. Smyth, D. Pregibon, and C. Faloutsos. Data-
driven evolution of data
mining algorithms. Communications of the ACM,
45(8):33–37, 2002.
[54] X. Su and T. M. Khoshgoftaar. A survey of
collaborative filtering
techniques. Advances in artificial intelligence, 2009:4,
2009.
[55] V. S. Verykios, E. Bertino, I. N. Fovino, L.
P. Provenza, Y. Saygin, and Y.
Theodoridis. State-of-the-art in privacy preserving
data mining. SIGMOD
Record, 33(1):50–57, 2004.
[56] J. T. L. Wang, M. J. Zaki, H.
Toivonen, and D. E. Shasha, editors. Data
Mining in Bioinformatics. Springer, September
2004.
[57] A. R. Webb. Statistical Pattern Recognition.
John Wiley &amp; Sons, 2nd
edition, 2002.
[58] I. H. Witten and E. Frank. Data Mining:
Practical Machine Learning Tools
and Techniques. Morgan Kaufmann, 3rd edition,
2011.
[59] X. Wu, P. S. Yu, and G. Piatetsky-Shapiro.
Data Mining: How Research
Meets Practical Development? Knowledge and
Information Systems,
5(2):248–261, 2003.
[60] M. J. Zaki and C.-T. Ho, editors. Large-Scale
Parallel Data Mining.
Springer, September 2002.
[61] M. J. Zaki and W. Meira Jr. Data Mining
and Analysis: Fundamental
Concepts and Algorithms. Cambridge University
Press, New York, 2014.
1.7 Exercises
1. Discuss whether or not each of the following
activities is a data mining task.
a. Dividing the customers of a company according to
their gender.
b. Dividing the customers of a company according to
their profitability.
c. Computing the total sales of a company.
d. Sorting a student database based on student
identification numbers.
e. Predicting the outcomes of tossing a (fair)
pair of dice.
f. Predicting the future stock price of a
company using historical records.
g. Monitoring the heart rate of a patient for
abnormalities.
h. Monitoring seismic waves for earthquake
activities.
i. Extracting the frequencies of a sound
wave.
2. Suppose that you are employed as a data mining
consultant for an Internet
search engine company. Describe how data mining
can help the company by
giving specific examples of how techniques, such as
clustering, classification,
association rule mining, and anomaly detection can be
applied.
3. For each of the following data sets, explain
whether or not data privacy is
an important issue.
a. Census data collected from 1900–1950.
b. IP addresses and visit times of web users who
visit your website.
c. Images from Earth-orbiting satellites.
d. Names and addresses of people from the
telephone book.
e. Names and email addresses collected from the
Web.
2 Data
This chapter discusses several data-related issues
that
are important for successful data mining:
The Type of Data Data sets differ in a number of
ways. For example, the
attributes used to describe data objects can be of
different types—quantitative
or qualitative—and data sets often have special
characteristics; e.g., some
data sets contain time series or objects with explicit
relationships to one
another. Not surprisingly, the type of data determines
which tools and
techniques can be used to analyze the data. Indeed,
new research in data
mining is often driven by the need to
accommodate new application areas and
their new types of data.
The Quality of the Data Data is often far from perfect.
While most data
mining techniques can tolerate some level of
imperfection in the data, a focus
on understanding and improving data quality typically
improves the quality of
the resulting analysis. Data quality issues that often
need to be addressed
include the presence of noise and outliers;
missing, inconsistent, or duplicate
data; and data that is biased or, in some other way,
unrepresentative of the
phenomenon or population that the data is
supposed to describe.
Preprocessing Steps to Make the Data More
Suitable for Data Mining
Often, the raw data must be processed in order to
make it suitable for
analysis. While one objective may be to improve
data quality, other goals
focus on modifying the data so that it better
fits a specified data mining
technique or tool. For example, a continuous
attribute, e.g., length, sometimes
needs to be transformed into an attribute with
discrete categories, e.g., short,
medium, or long, in order to apply a
particular technique. As another example,
the number of attributes in a data set is often
reduced because many
techniques are more effective when the data has a
relatively small number of
attributes.
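The length example above can be sketched directly: a continuous attribute is discretized by mapping each value into a category. The cut points below are arbitrary, chosen only for illustration; real preprocessing would select them from the data or the application.

```python
# Sketch of discretization: map a continuous attribute (length, in cm)
# onto the discrete categories short/medium/long.

def discretize_length(length_cm):
    if length_cm < 10:
        return "short"
    elif length_cm < 50:
        return "medium"
    else:
        return "long"

lengths = [3.2, 12.5, 47.9, 80.0]
print([discretize_length(x) for x in lengths])
# → ['short', 'medium', 'medium', 'long']
```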
Analyzing Data in Terms of Its Relationships One
approach to data
analysis is to find relationships among the data
objects and then perform the
remaining analysis using these relationships rather
than the data objects
themselves. For instance, we can compute the
similarity or distance between
pairs of objects and then perform the analysis—
clustering, classification, or
anomaly detection—based on these similarities or
distances. There are many
such similarity or distance measures, and the proper
choice depends on the
type of data and the particular application.
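This relationship-based approach can be sketched in a few lines: compute all pairwise distances between data objects, then carry out the rest of the analysis using only those distances (here, finding each object's nearest neighbor). The 2-D points and the choice of Euclidean distance are illustrative assumptions.

```python
# Sketch: pairwise Euclidean distances, then distance-only analysis.
import math

objects = [(1.0, 2.0), (1.5, 1.8), (8.0, 8.0)]  # illustrative 2-D points

def euclidean(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

# full pairwise distance matrix
dist = [[euclidean(p, q) for q in objects] for p in objects]

# nearest neighbor of each object, using the distances only
for i, row in enumerate(dist):
    j = min((k for k in range(len(objects)) if k != i), key=lambda k: row[k])
    print(f"nearest neighbor of object {i} is object {j}")
```

Clustering, classification, and anomaly detection can all be driven by the same matrix; the appropriate distance measure depends, as noted, on the type of data and the application.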
Example 2.1 (An Illustration of Data-Related
Issues).
To further illustrate the importance of these issues,
consider the following
hypothetical situation. You receive an email from a
medical researcher
concerning a project that you are eager to
work on.
Hi,
I’ve attached the data file that I mentioned in
my previous email. Each line contains the
information for a single patient and consists of
five fields. We want to predict the last field
using
the other fields. I don’t have time to provide
any more information about the data since I’m
going
out of town for a couple of days, but
hopefully that won’t slow you down too much.
And if you
don’t mind, could we meet when I get
back to discuss your preliminary results? I
might invite a
few other members of my team.
Thanks and see you in a couple of days.
Despite some misgivings, you proceed to analyze
the data. The first few rows
of the file are as follows:
012 232 33.5 0 10.7
020 121 16.9 2 210.1
027 165 24.0 0 427.6
⋮
A brief look at the data reveals nothing strange.
You put your doubts aside
and start the analysis. There are only 1000 lines, a
smaller data file than you
had hoped for, but two days later, you feel that
you have made some
progress. You arrive for the meeting, and while
waiting for others to arrive, you
strike up a conversation with a statistician
who is working on the project.
When she learns that you have also been analyzing
the data from the project,
she asks if you would mind giving her a brief
overview of your results.
Statistician: So, you got the data for all the
patients?
Data Miner: Yes. I haven’t had much time for
analysis, but I do have a
few interesting results.
Statistician: Amazing. There were so many data
issues with this set of
patients that I couldn’t do much.
Data Miner: Oh? I didn’t hear about any
possible problems.
Statistician: Well, first there is field 5, the
variable we want to predict.
It’s common knowledge among people who analyze
this type of data
that results are better if you work with the log of
the values, but I didn’t
discover this until later. Was it mentioned to
you?
Data Miner: No.
Statistician: But surely you heard about what
happened to field 4? It’s
supposed to be measured on a scale from 1 to
10, with 0 indicating a
missing value, but because of a data entry error,
all 10’s were changed
into 0’s. Unfortunately, since some of the patients have
missing values
for this field, it’s impossible to say whether a
0 in this field is a real 0 or
a 10. Quite a few of the records have that
problem.
Data Miner: Interesting. Were there any other
problems?
Statistician: Yes, fields 2 and 3 are basically
the same, but I assume
that you probably noticed that.
Data Miner: Yes, but these fields were only weak
predictors of field 5.
Statistician: Anyway, given all those problems,
I’m surprised you were
able to accomplish anything.
Data Miner: True, but my results are really
quite good. Field 1 is a very
strong predictor of field 5. I’m surprised that this
wasn’t noticed before.
Statistician: What? Field 1 is just an
identification number.
Data Miner: Nonetheless, my results speak for
themselves.
Statistician: Oh, no! I just remembered. We
assigned ID numbers after
we sorted the records based on field 5. There
is a strong connection,
but it’s meaningless. Sorry.
Although this scenario represents an extreme situation, it
emphasizes the
importance of “knowing your data.” To that end,
this chapter will address each
of the four issues mentioned above, outlining
some of the basic challenges
and standard approaches.
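Every pitfall in the scenario is detectable with a few routine checks before modeling. The sketch below uses made-up data shaped like the example's five fields; the 0.99 and 0.95 thresholds are arbitrary illustrative cutoffs. It flags a near-duplicate column (fields 2 and 3) and an ID that is suspiciously correlated with the prediction target.

```python
# Routine "know your data" checks on a five-field table like the example's.

def pearson(x, y):
    # Pearson correlation of two equal-length numeric sequences
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

# rows: field1 (ID), field2, field3, field4, field5 (target) -- made-up data
rows = [
    (12, 232, 232.1, 3, 10.7),
    (20, 121, 121.3, 2, 210.1),
    (27, 165, 164.8, 7, 427.6),
    (35, 310, 309.9, 5, 633.2),
]
cols = list(zip(*rows))

if abs(pearson(cols[1], cols[2])) > 0.99:
    print("warning: fields 2 and 3 are nearly identical")
if abs(pearson(cols[0], cols[4])) > 0.95:
    print("warning: field 1 (an ID) strongly predicts field 5 -- "
          "check how the IDs were assigned")
```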
2.1 Types of Data
A data set can often be viewed as a collection of
data objects. Other names
for a data object are record, point, vector,
pattern, event, case, sample,
instance, observation, or entity. In turn, data objects
are described by a
number of attributes that capture the characteristics of
an object, such as the
mass of a physical object or the time at which
an event occurred. Other
names for an attribute are variable, characteristic,
field, feature, or dimension.
Example 2.2 (Student Information).
Often, a data set is a file, in which the
objects are records (or rows) in the
file and each field (or column) corresponds to an
attribute. For example,
Table 2.1 shows a data set that consists of
student information. Each
row corresponds to a student and each column is
an attribute that
describes some aspect of a student, such as grade
point average (GPA) or
identification number (ID).
Table 2.1. A sample data set containing student
information.
Student ID Year Grade Point Average (GPA) …
1034262 Senior 3.24 …
1052663 Freshman 3.51 …
1082246 Sophomore 3.62 …
⋮
Although record-based data sets are common, either in
flat files or relational
database systems, there are other important types of
data sets and systems
for storing data. In Section 2.1.2, we will discuss
some of the types of data
sets that are commonly encountered in data mining.
However, we first
consider attributes.
2.1.1 Attributes and Measurement
In this section, we consider the types of
attributes used to describe data
objects. We first define an attribute, then consider
what we mean by the type
of an attribute, and finally describe the types of
attributes that are commonly
encountered.
What Is an Attribute?
We start with a more detailed definition of an
attribute.
Definition 2.1.
An attribute is a property or characteristic of an
object that can
vary, either from one object to another or from
one time to
another.
For example, eye color varies from person to person,
while the temperature of
an object varies over time. Note that eye color is a
symbolic attribute with a
small number of possible values {brown, black,
blue, green, hazel, etc.} , while
temperature is a numerical attribute with a
potentially unlimited number of
values.
At the most basic level, attributes are not about
numbers or symbols.
However, to discuss and more precisely analyze the
characteristics of objects,
we assign numbers or symbols to them. To do
this in a well-defined way, we
need a measurement scale.
Definition 2.2.
A measurement scale is a rule (function) that
associates a
numerical or symbolic value with an attribute of an
object.
Formally, the process of measurement is the
application of a measurement
scale to associate a value with a particular
attribute of a specific object. While
this may seem a bit abstract, we engage in the
process of measurement all
the time. For instance, we step on a bathroom scale to
determine our weight,
we classify someone as male or female, or we
count the number of chairs in a
room to see if there will be enough to seat all
the people coming to a meeting.
In all these cases, the “physical value” of an
attribute of an object is mapped
to a numerical or symbolic value.
With this background, we can discuss the type of an
attribute, a concept that
is important in determining if a particular data
analysis technique is consistent
with a specific type of attribute.
The Type of an Attribute
It is common to refer to the type of an
attribute as the type of a measurement
scale. It should be apparent from the previous
discussion that an attribute can
be described using different measurement scales
and that the properties of an
attribute need not be the same as the properties of
the values used to
measure it. In other words, the values used to
represent an attribute can have
properties that are not properties of the attribute itself,
and vice versa. This is
illustrated with two examples.
Example 2.3 (Employee Age and ID Number).
Two attributes that might be associated with an
employee are ID and age
(in years). Both of these attributes can be represented
as integers.
However, while it is reasonable to talk about
the average age of an
employee, it makes no sense to talk about
the average employee ID.
Indeed, the only aspect of employees that we
want to capture with the ID
attribute is that they are distinct. Consequently, the
only valid operation for
employee IDs is to test whether they are equal.
There is no hint of this
limitation, however, when integers are used to
represent the employee ID
attribute. For the age attribute, the properties of the
integers used to
represent age are very much the properties of the
attribute. Even so, the
correspondence is not complete because, for
example, ages have a
maximum, while integers do not.
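Example 2.3's point can be made concrete: both attributes are stored as integers, yet only some integer operations are meaningful for each. The employee records below are hypothetical, invented for illustration.

```python
# Stored as integers, but only some operations on them are meaningful.

employees = [
    {"id": 1001, "age": 34},   # hypothetical records
    {"id": 1002, "age": 41},
    {"id": 1003, "age": 34},
]

# Meaningful for IDs: equality tests (are two employees distinct?)
assert employees[0]["id"] != employees[1]["id"]

# Meaningful for age: ordering, differences, and averaging
avg_age = sum(e["age"] for e in employees) / len(employees)
print(f"average age: {avg_age:.1f}")   # a sensible summary

# Computable but meaningless: averaging IDs
avg_id = sum(e["id"] for e in employees) / len(employees)
print(f"average ID: {avg_id:.1f}")     # a number, but it tells us nothing
```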
Example 2.4 (Length of Line Segments).
Consider Figure 2.1, which shows some objects—
line segments—and
how the length attribute of these objects can be
mapped to numbers in two
different ways. Each successive line segment, going
from the top to the
bottom, is formed by appending the topmost line
segment to itself. Thus,
the second line segment from the top is formed by
appending the topmost
line segment to itself twice, the third line segment
from the top is formed by
appending the topmost line segment to itself three times,
and so forth. In a
very real (physical) sense, all the line segments are
multiples of the first.
This fact is captured by the measurements on the
right side of the figure,
but not by those on the left side. More
specifically, the measurement scale
on the left side captures only the ordering of the
length attribute, while the
scale on the right side captures both the ordering and
additivity properties.
Thus, an attribute can be measured in a way
that does not capture all the
properties of the attribute.
Figure 2.1.
The measurement of the length of line segments on
two different scales of
measurement.
Knowing the type of an attribute is important because
it tells us which
properties of the measured values are consistent with
the underlying
properties of the attribute, and therefore, it allows us
to avoid foolish actions,
such as computing the average employee ID.
The Different Types of Attributes
A useful (and simple) way to specify the type of
an attribute is to identify the
properties of numbers that correspond to underlying
properties of the attribute.
For example, an attribute such as length has many
of the properties of
numbers. It makes sense to compare and order
objects by length, as well as
to talk about the differences and ratios of
length. The following properties
(operations) of numbers are typically used to
describe attributes.
1. Distinctness: = and ≠
2. Order: <, ≤, >, and ≥
3. Addition: + and −
4. Multiplication: × and /
Given these properties, we can define four types
of attributes: nominal,
ordinal, interval, and ratio. Table 2.2 gives
the definitions of these types,
along with information about the statistical
operations that are valid for each
type. Each attribute type possesses all of the properties
and operations of the
attribute types above it. Consequently, any
property or operation that is valid
for nominal, ordinal, and interval attributes is also
valid for ratio attributes. In
other words, the definition of the attribute types is
cumulative. However, this
does not mean that the statistical operations
appropriate for one attribute type
are appropriate for the attribute types above it.
Table 2.2. Different attribute types.

Categorical (Qualitative)
Nominal: The values of a nominal attribute are just different names; i.e., nominal values provide only enough information to distinguish one object from another (=, ≠). Examples: zip codes, employee ID numbers, eye color, gender. Operations: mode, entropy, contingency correlation, χ² test.
Ordinal: The values of an ordinal attribute provide enough information to order objects (<, >). Examples: hardness of minerals, {good, better, best}, grades, street numbers. Operations: median, percentiles, rank correlation, run tests, sign tests.

Numeric (Quantitative)
Interval: For interval attributes, the differences between values are meaningful, i.e., a unit of measurement exists (+, −). Examples: calendar dates, temperature in Celsius or Fahrenheit. Operations: mean, standard deviation, Pearson’s correlation, t and F tests.
Ratio: For ratio variables, both differences and ratios are meaningful (×, /). Examples: temperature in Kelvin, monetary quantities, counts, age, mass, length, electrical current. Operations: geometric mean, harmonic mean, percent variation.
Nominal and ordinal attributes are collectively
referred to as categorical or
qualitative attributes. As the name suggests,
qualitative attributes, such as
employee ID, lack most of the properties of numbers.
Even if they are
represented by numbers, i.e., integers, they should be
treated more like
symbols. The remaining two types of attributes, interval
and ratio, are
collectively referred to as quantitative or
numeric attributes. Quantitative
attributes are represented by numbers and have most of
the properties of
numbers. Note that quantitative attributes can be
integer-valued or
continuous.
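The cumulative structure described above can be made concrete: each attribute type supports all operations valid for the types above it, plus its own. The operation names follow Table 2.2; encoding them as lists keyed by type is only an illustration.

```python
# Each attribute type inherits the valid operations of the types above it.

TYPE_ORDER = ["nominal", "ordinal", "interval", "ratio"]
NEW_OPERATIONS = {
    "nominal":  ["mode", "entropy", "contingency correlation", "chi-squared test"],
    "ordinal":  ["median", "percentiles", "rank correlation"],
    "interval": ["mean", "standard deviation", "Pearson's correlation"],
    "ratio":    ["geometric mean", "harmonic mean", "percent variation"],
}

def valid_operations(attr_type):
    """All operations valid for attr_type: its own plus those of types above it."""
    ops = []
    for t in TYPE_ORDER[: TYPE_ORDER.index(attr_type) + 1]:
        ops.extend(NEW_OPERATIONS[t])
    return ops

print(valid_operations("ordinal"))
# includes mode and entropy (from nominal) as well as median and percentiles,
# but not mean, which requires at least an interval scale
```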
The types of attributes can also be described in
terms of transformations that
do not change the meaning of an attribute. Indeed,
S. S. Stevens, the
psychologist who originally defined the types of
attributes shown in Table
2.2, defined them in terms of these permissible
transformations. For
example, the meaning of a length attribute is
unchanged if it is measured in
meters instead of feet.
The statistical operations that make sense for a
particular type of attribute are
those that will yield the same results when the
attribute is transformed by
using a transformation that preserves the attribute’s
meaning. To illustrate, the
average length of a set of objects is
different when measured in meters rather
than in feet, but both averages represent the same length.
Table 2.3 shows
the meaning-preserving transformations for the four
attribute types of Table
2.2.
Table 2.3. Transformations that define attribute levels.

Categorical (Qualitative)
Nominal: Any one-to-one mapping, e.g., a permutation of values. If all employee ID numbers are reassigned, it will not make any difference.
Ordinal: An order-preserving change of values, i.e., new_value = f(old_value), where f is a monotonic function. An attribute encompassing the notion of good, better, best can be represented equally well by the values {1, 2, 3} or by {0.5, 1, 10}.

Numeric (Quantitative)
Interval: new_value = a × old_value + b, where a and b are constants. The Fahrenheit and Celsius temperature scales differ in the location of their zero value and the size of a degree (unit).
Ratio: new_value = a × old_value. Length can be measured in meters or feet.
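The "meaning-preserving transformation" idea can be checked directly for an interval attribute: converting temperatures with new_value = a × old_value + b changes the numbers, but the mean of the converted values equals the converted mean, so both averages describe the same temperature.

```python
# Averaging commutes with an interval (affine) transformation.

celsius = [0.0, 10.0, 20.0, 30.0]

def c_to_f(c):
    return 9.0 / 5.0 * c + 32.0   # interval transformation: a = 9/5, b = 32

mean_c = sum(celsius) / len(celsius)
mean_f = sum(c_to_f(c) for c in celsius) / len(celsius)

print(mean_c, mean_f, c_to_f(mean_c))
# mean_f and c_to_f(mean_c) agree: both are the same temperature in Fahrenheit
```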
Example 2.5 (Temperature Scales).
Temperature provides a good illustration of some of
the concepts that have
been described. First, temperature can be either an
interval or a ratio
attribute, depending on its measurement scale. When
measured on the
Kelvin scale, a temperature of 2° is, in a
physically meaningful way, twice
that of a temperature of 1°. This is not true
when temperature is measured
on either the Celsius or Fahrenheit scales,
because, physically, a
temperature of 1° Fahrenheit (Celsius) is not
much different than a
temperature of 2° Fahrenheit (Celsius). The
problem is that the zero points
of the Fahrenheit and Celsius scales are, in a
physical sense, arbitrary,
and therefore, the ratio of two Celsius or Fahrenheit
temperatures is not
physically meaningful.
Describing Attributes by the Number of Values
An independent way of distinguishing between
attributes is by the number of
values they can take.
Discrete A discrete attribute has a finite or
countably infinite set of values.
Such attributes can be categorical, such as zip codes
or ID numbers, or
numeric, such as counts. Discrete attributes are often
represented using
integer variables. Binary attributes are a special
case of discrete attributes
and assume only two values, e.g., true/false, yes/no,
male/female, or 0/1.
Binary attributes are often represented as Boolean
variables, or as integer
variables that only take the values 0 or 1.
Continuous A continuous attribute is one whose
values are real numbers.
Examples include attributes such as temperature, height,
or weight.
Continuous attributes are typically represented as
floating-point variables.
Practically, real values can be measured and
represented only with limited
precision.
In theory, any of the measurement scale types—
nominal, ordinal, interval, and
ratio—could be combined with any of the types
based on the number of
attribute values—binary, discrete, and continuous.
However, some
combinations occur only infrequently or do not
make much sense. For
instance, it is difficult to think of a realistic
data set that contains a continuous
binary attribute. Typically, nominal and ordinal
attributes are binary or discrete,
while interval and ratio attributes are continuous.
However, count attributes,
which are discrete, are also ratio attributes.
Asymmetric Attributes
For asymmetric attributes, only presence—a non-zero
attribute value—is
regarded as important. Consider a data set in
which each object is a student
and each attribute records whether a student took a
particular course at a
university. For a specific student, an attribute
has a value of 1 if the student
took the course associated with that attribute and a
value of 0 otherwise.
Because students take only a small fraction of all
available courses, most of
the values in such a data set would be 0.
Therefore, it is more meaningful and