Chapter 24
Data Mining: A
Research Tool
Objectives
1. Describe big data.
2. Assess knowledge discovery in data.
3. Explore data mining.
4. Compare data mining models.
Data Mining
• Iterative process
• Explores and models big data
• Identifies patterns
• Provides meaningful insights
Big Data
IBM (2013) describes big data in a way that is
easy to understand:
Every day, we create 2.5 quintillion bytes of data —
so much that 90% of the data in the world today has
been created in the last two years alone. This data
comes from everywhere: sensors used to gather
climate information, posts to social media sites,
digital pictures and videos, purchase transaction
records, and cell phone GPS signals to name a few.
This data is big data (p. 1).
Data Mining Focus
• Producing a solution that generates useful
forecasts through a four-phase process:
– 1. Problem identification,
– 2. Exploration of the data,
– 3. Pattern discovery, and
– 4. Knowledge deployment: applying the model to new
data to forecast or generate predictions.
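The four phases can be sketched on a toy data set. This is a minimal illustration, not a method from the chapter: the records, the fall-risk question, and the single-threshold "model" are all hypothetical stand-ins.

```python
from statistics import mean

# Phase 1: problem identification.
# Hypothetical question: can age help flag patients at higher risk of falling?
# Hypothetical records: (age_in_years, fell_during_stay)
records = [(45, False), (52, False), (61, False), (70, True),
           (78, True), (83, True), (66, False), (74, True)]

# Phase 2: exploration of the data (simple summary statistics per group).
ages_fall = [age for age, fell in records if fell]
ages_no_fall = [age for age, fell in records if not fell]
print("mean age, fall:", mean(ages_fall))
print("mean age, no fall:", mean(ages_no_fall))

# Phase 3: pattern discovery.
# Search for the age threshold that best separates the two groups.
def accuracy(threshold):
    hits = sum((age >= threshold) == fell for age, fell in records)
    return hits / len(records)

candidate_thresholds = sorted({age for age, _ in records})
best_threshold = max(candidate_thresholds, key=accuracy)

# Phase 4: knowledge deployment.
# Apply the discovered rule to new, unseen data to generate predictions.
def predict_fall(age):
    return age >= best_threshold

new_patients = [58, 81]  # hypothetical new ages
predictions = [predict_fall(age) for age in new_patients]
print("threshold:", best_threshold, "predictions:", predictions)
```

In practice the process is iterative: results from phase 4 usually send the analyst back to phase 2 or 3 with a refined question.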
Data Mining Facilitates
• Data exploration and the resulting
knowledge discovery foster
proactive, knowledge-driven decision
making
Exploratory Data Analysis (EDA)
• Sometimes known as model building or
pattern identification
• Pattern discovery is a complex phase of data
mining
• Yields a highly predictive, consistent
pattern-identifying model
Data Mining Known as KDD
• KDD is known as
– knowledge discovery and data mining
– knowledge discovery in data
– knowledge discovery in databases
KDD
• The term knowledge discovery is key
• Data mining looks at the data from different
vantage points, aspects and perspectives
• Brings new insights to the data set
Data Mining Defined
Process of finding correlations or patterns
among the data.
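One of the simplest ways to quantify a correlation between two variables is Pearson's r. The sketch below computes it from scratch; the variable names and values are hypothetical and chosen only to show a strong negative relationship.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations, and sums of squared deviations
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Hypothetical data: daily hours of mobility vs. length of stay in days
hours_of_mobility = [1, 2, 3, 4, 5]
length_of_stay = [10, 8, 6, 4, 2]

r = pearson_r(hours_of_mobility, length_of_stay)
print(round(r, 3))
```

A value near -1 or +1 indicates a strong linear relationship; a correlation found this way is a pattern to investigate, not evidence of cause.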
KDD and Research
• Berger and Berger (2004)
– Nurse researchers are positioned to
use data mining technologies to
transform the repositories of big data
into comprehensible knowledge that is
useful for guiding nursing practice and
facilitating interdisciplinary research.
CART
(classification and regression trees)
• Data mining method for
analyzing outcomes and
service use (Madigan & Curet, 2006)
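At the core of a classification tree is the search for the split that leaves the purest groups. The sketch below shows one such split using Gini impurity, a common CART splitting criterion. It is a single-split illustration rather than a full tree, and the visit counts and outcome labels are hypothetical.

```python
def gini(labels):
    """Gini impurity of a list of class labels (0.0 means a pure group)."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for label in labels:
        counts[label] = counts.get(label, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())

def best_split(values, labels):
    """Return (threshold, weighted_gini) for the best 'value <= threshold' split."""
    n = len(values)
    best = (None, float("inf"))
    # The largest value is skipped because it would leave the right side empty
    for threshold in sorted(set(values))[:-1]:
        left = [lab for v, lab in zip(values, labels) if v <= threshold]
        right = [lab for v, lab in zip(values, labels) if v > threshold]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best[1]:
            best = (threshold, score)
    return best

# Hypothetical data: number of home visits and a recorded outcome
visits = [2, 3, 4, 9, 10, 12]
outcome = ["improved", "improved", "improved",
           "not improved", "not improved", "not improved"]

threshold, impurity = best_split(visits, outcome)
print(threshold, impurity)
```

A full tree repeats this search recursively inside each resulting group until a stopping rule is met.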
Data Mining Concepts
• Bagging
• Boosting
• Data reduction
• Drill down
• EDA
• Feature selection
• Machine learning
• Meta-learning
• Predictive
• Stacking
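Bagging (bootstrap aggregating) can be sketched in a few lines: train the same simple learner on many resampled copies of the data, then let the copies vote. The data, the threshold "stump" learner, and the choice of 25 resamples below are illustrative assumptions.

```python
import random
from collections import Counter

def fit_stump(sample):
    """Pick the threshold t that best fits the rule 'x >= t predicts True'."""
    def correct(t):
        return sum((x >= t) == y for x, y in sample)
    return max(sorted({x for x, _ in sample}), key=correct)

def bagged_predict(stumps, x):
    """Majority vote over all fitted stumps."""
    votes = Counter(x >= t for t in stumps)
    return votes.most_common(1)[0][0]

# Hypothetical labelled data: (feature value, outcome)
data = [(1, False), (2, False), (3, False), (4, False),
        (6, True), (7, True), (8, True), (9, True)]

rng = random.Random(0)  # fixed seed so the run is repeatable
stumps = []
for _ in range(25):
    # Bootstrap sample: same size as the data, drawn with replacement
    bootstrap = [rng.choice(data) for _ in data]
    stumps.append(fit_stump(bootstrap))

print(bagged_predict(stumps, 0), bagged_predict(stumps, 9))
```

Boosting differs in that the learners are trained in sequence, each focusing on cases the previous ones got wrong; stacking trains a second model on the outputs of the first-level models.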
Data Mining Techniques
• Neural networks
• Decision trees
– Chi-square automatic interaction detection (CHAID)
• Rule induction
• Algorithm
• Nearest neighbor
• Text mining
• Online Analytical Processing (OLAP)
• Brushing
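Of the techniques listed, nearest neighbor is the easiest to show in code: a new case receives the label of the most similar known case. The sketch assumes Euclidean distance and uses hypothetical (temperature, heart rate) readings with made-up labels.

```python
from math import dist  # Euclidean distance, available in Python 3.8+

def nearest_neighbor(train, query):
    """Return the label of the training point closest to the query (1-NN)."""
    _, label = min(train, key=lambda item: dist(item[0], query))
    return label

# Hypothetical training cases: ((temperature in C, heart rate in bpm), label)
train = [((36.8, 72), "stable"), ((37.0, 80), "stable"),
         ((39.2, 118), "at risk"), ((38.9, 110), "at risk")]

label = nearest_neighbor(train, (38.7, 105))
print(label)
```

Because heart rate spans a much wider numeric range than temperature, it dominates the distance here; real use would normally rescale the features first.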
Data Mining Models
• CRISP-DM
– Six steps: business understanding, data
understanding, data preparation, modeling,
evaluation, and deployment
• Six Sigma
– DMAIC steps: define, measure, analyze, improve,
and control
• SEMMA
– sample, explore, modify, model, assess
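To compare the three models side by side, their steps can be held in a small data structure. The step names come from the slide; the dictionary layout and the summary function are illustrative only.

```python
# Step names are taken from the slide; the layout is an illustrative choice.
MODELS = {
    "CRISP-DM": ["business understanding", "data understanding",
                 "data preparation", "modeling", "evaluation", "deployment"],
    "Six Sigma (DMAIC)": ["define", "measure", "analyze", "improve", "control"],
    "SEMMA": ["sample", "explore", "modify", "model", "assess"],
}

def summarize(models):
    """Return (name, step_count, first_step, last_step) for each model."""
    return [(name, len(steps), steps[0], steps[-1])
            for name, steps in models.items()]

summary = summarize(MODELS)
for name, count, first, last in summary:
    print(f"{name}: {count} steps, from '{first}' to '{last}'")
```

Laid out this way, one difference stands out: CRISP-DM begins with the business problem and ends with deployment, while SEMMA as listed begins with sampling the data and ends with assessing the model.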
Benefits of KDD
• Enhance business aspects
• Help to improve patient care
Ethics of Data Mining
• Depends on how protected health
information (PHI) is used
• Ensure data are de-identified and
confidentiality is maintained
• Follow changes and specific
requirements for compliance with
HIPAA
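A first step toward de-identification is to drop direct identifiers and replace them with a random study token. The sketch below is an illustration only: the field names and the short identifier list are hypothetical, and removing these fields alone does not make a data set de-identified under HIPAA.

```python
import secrets

# Hypothetical field names; a real project needs a much fuller identifier list.
DIRECT_IDENTIFIERS = {"name", "mrn", "phone"}

def deidentify(records):
    """Strip direct identifiers and attach a random per-patient token."""
    token_for_mrn = {}  # re-identification key: store separately and securely
    cleaned = []
    for record in records:
        mrn = record["mrn"]
        if mrn not in token_for_mrn:
            # Random token, not derived from any patient information
            token_for_mrn[mrn] = secrets.token_hex(8)
        row = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
        row["patient_token"] = token_for_mrn[mrn]
        cleaned.append(row)
    return cleaned, token_for_mrn

# Hypothetical records: two visits for one patient, one visit for another
visits = [
    {"name": "A. Smith", "mrn": "1001", "phone": "555-0100", "fell": True},
    {"name": "B. Jones", "mrn": "1002", "phone": "555-0101", "fell": False},
    {"name": "A. Smith", "mrn": "1001", "phone": "555-0100", "fell": False},
]

rows, key = deidentify(visits)
print(rows)
```

The token keeps visits from the same patient linked for analysis, while the mapping back to the record number stays out of the mined data set.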
References
• Berger, A. M., & Berger, C. R. (2004). Data mining as a tool for research and
knowledge development in nursing. Comput Inform Nurs, 22(3), 123-131.
PubMed ID: 15520581
• DeGruy, K. B. (2000). Healthcare applications of knowledge discovery in
databases. J Healthc Inf Manag, 14(2), 59-69. PubMed ID: 11066649
• Fernández-Llatas, C., Garcia-Gomez, J. M., Vicente, J., Naranjo, J. C.,
Robles, M., Benedi, J. M., & Traver, V. (2011). Behaviour patterns detection
for persuasive design in Nursing Homes to help dementia patients. Conf
Proc IEEE Eng Med Biol Soc, 2011, 6413-6417. PubMed ID: 22255806
• Goodwin, L., Saville, J., Jasion, B., Turner, B., Prather, J., Dobousek, T., &
Egger, S. (1997). A collaborative international nursing informatics research
project: predicting ARDS risk in critically ill patients. Stud Health Technol
Inform, 46, 247-249. PubMed ID: 10175406
• Green, J., Paladugu, S., Shuyu, X., Stewart, B., Shyu, C.,
& Armer, J. (2013). Using temporal mining to examine
the development of lymphedema in breast cancer
survivors. Nurs Res, 62(2), 122-129. PubMed ID:
23458909
• IBM. (2013). Big data at the speed of business.
Retrieved from http://www-01.ibm.com/software/data/bigdata/
• Lee, T., Lin, K., Mills, M., & Kuo, Y. (2012). Factors
related to the prevention and management of pressure
ulcers. Comput Inform Nurs, 30(9), 489-495. PubMed
ID: 22584879
• Lee, T., Liu, C., Kuo, Y., Mills, M., Fong, J., & Hung, C.
(2011). Application of data mining to the identification
of critical factors in patient falls using a web-based
reporting system. Int J Med Inform, 80(2), 141-150.
PubMed ID: 21115393
• Madigan, E. & Curet, O. (2006). A data mining approach
in home healthcare: outcomes and service use. BMC
Health Serv Res, 6, 18. PubMed ID: 16504115
• Manyika, J., Chu, M., Brown, B., Bughin, J.,
Dobbs, R., Roxburgh, C., & Byers, A. (2011).
McKinsey Global Institute: Big data: The next
frontier for innovation, competition, and
productivity. Retrieved from
http://www.mckinsey.com/insights/business_technology/big_data_the_next_frontier_for_innovation
• SAS. (n.d.). SAS enterprise miner. Retrieved from
http://www.sas.com/offices/europe/uk/technologies/analytics/datamining/miner/semma.html
• Tishgart, D. (2012). Why security matters for big data and
health care: Data integrity requires good data security.
Retrieved from http://soa.sys-con.com/node/2389698
• Trangenstein, P., Weiner, E., Gordon, J., & McNew, R.
(2007). Data mining results from an electronic clinical log
for nurse practitioner students. Stud Health Technol Inform,
129, 1387-1391. PubMed ID: 17911941
• Zupan, B. & Demsar, J. (2008). Open-source tools for data
mining. Clin Lab Med, 28(1), 37-54. PubMed ID: 18194717
