BAS 250
Lesson 1: Introduction to Data Mining
• RapidMiner
– https://rapidminer.com/products/studio/
• RapidMiner Studio: 6.5 or greater
Software
Please follow the link above to download the
free software.
• Define the discipline of Data Mining
• List and define various types of data
• List and define various sources of data
• Explain the fundamental differences
between databases, data warehouses,
and data sets
Learning Objectives (1 of 2)
• Explain some of the ethical dilemmas
associated with data mining and outline
possible solutions
• Explain the CRISP-DM Method
Learning Objectives (2 of 2)
Data Mining
• 15 out of 17 sectors in the United States have more data
stored per company than the US Library of Congress
• $5 million vs. $400: Price of the fastest supercomputer in
1975 and an iPhone with equal performance
• $600 to buy a disk drive that can store all of the world’s music
• 5 billion mobile phones in use in 2010
• 30 billion pieces of content shared on Facebook every month
• 40% projected growth in global data generated per year vs.
5% growth in IT spending
• 235 terabytes of data collected by the US Library of Congress
by April 2011
Why Data Mining?
Why Mine Data?
• Lots of data is being collected and stored
– Web data, e-commerce, point of sale
– Credit card transactions, social media
• Computers have become cheaper and more powerful
• Competitive pressure is strong
– Provide better, customized services for an edge
– e.g. in Customer Relationship Management
• Information is valuable and can be monetized
• Demand for deep analytical talent in the
United States could be 50-60% greater
than its projected supply by 2018
Demand for Data Mining
• Data contains value and knowledge, but to
extract the knowledge, data needs to be
– Stored
– Managed
– Analyzed ← this class
• Data Mining ≈ Big Data ≈
Predictive Analytics ≈ Data Science
Why Data Mining?
• “An interdisciplinary subfield of computer
science. It is the computational process of
discovering patterns in large data sets
involving methods at the intersection of
artificial intelligence, machine learning,
statistics, and database systems. The overall
goal of data mining is to extract information
from a data set and transform it into an
understandable structure for further use.”
– Wikipedia, https://en.wikipedia.org/wiki/Data_mining
What is Data Mining? (1 of 4)
• Data Enormity Issue
• Discover patterns and models that are:
• Valid: patterns hold on new data with some certainty
• Useful: should be able to act on the insight
• Unexpected: non-obvious to the system
• Understandable: humans can interpret the patterns
What is Data Mining? (2 of 4)
• Descriptive methods
– Find patterns that describe the data
• Example: Clustering with k-means
• Predictive methods
– Use known variables to predict unknown or
future values of a target variable
• Example: Scoring with neural networks
What is Data Mining? (3 of 4)
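To make the descriptive example concrete, here is a minimal sketch of k-means clustering in plain Python. The sample points, the starting centroids, and the function names `assign` and `k_means` are illustrative assumptions, not part of the lesson materials. It alternates between assigning each point to its nearest centroid and moving each centroid to the mean of its points.

```python
from math import dist  # Euclidean distance, Python 3.8+


def assign(points, centroids):
    """Label each point with the index of its nearest centroid."""
    return [
        min(range(len(centroids)), key=lambda i: dist(p, centroids[i]))
        for p in points
    ]


def k_means(points, centroids, max_iter=100):
    """Repeat assignment and centroid update until the centroids stop moving."""
    centroids = [tuple(c) for c in centroids]
    for _ in range(max_iter):
        labels = assign(points, centroids)
        updated = []
        for i, old in enumerate(centroids):
            members = [p for p, label in zip(points, labels) if label == i]
            if members:
                # New centroid = per-coordinate mean of the cluster's points.
                updated.append(tuple(sum(axis) / len(members) for axis in zip(*members)))
            else:
                updated.append(old)  # leave an empty cluster's centroid in place
        if updated == centroids:
            break
        centroids = updated
    return centroids, assign(points, centroids)


# Hypothetical two-attribute observations forming two visible groups.
points = [(1.0, 2.0), (1.5, 1.8), (1.0, 0.6), (8.0, 8.0), (9.0, 11.0), (8.5, 9.0)]
centroids, labels = k_means(points, centroids=[points[0], points[3]])
print(labels)
print(centroids)
```

No target variable is involved: the algorithm only describes the structure already present in the data, which is what separates it from a predictive method.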
What is Data Mining? (4 of 4)
Data Mining
• Prevalence of names in
US locations
• O’Brien, O’Rourke,
O’Reilly in Boston Area
• Group together similar
documents
• Returned by search
engine according to
their context
Not Data Mining
• Look up phone number
in a phone directory
• Query a web search
engine for information
about “Amazon”
Origins of Data Mining
Data is Highly Dimensional
• Scalability
• Dimensionality
• Complex and Heterogeneous Data
• Data Quality
• Data Ownership and Distribution
• Privacy Preservation
• Streaming Data
Challenges of Data Mining
• Garbage In, Garbage Out (GIGO)
– Collected incorrectly
– Out-of-date
• Day-to-day:
– Use available resources
– Acceptable risk
– Professional experience
– Common sense
Limits to Data Mining
• A risk with “Data mining” is that an analyst can
“discover” patterns that are meaningless.
• Statisticians call it Bonferroni’s principle:
– “If you look in more places for interesting patterns
than your amount of data will support, you are bound
to find crap”
Meaningfulness of Analytic Answers (1 of 2)
Meaningfulness of Analytic Answers (2 of 2)
National Security Agency example:
“We consider suspicious when a pair of (unrelated) people stayed at least twice
in the same hotel on the same day”
◦ Suppose 1 billion people tracked during 1,000 days
◦ Each person stays in a hotel 1% of the time (1 day out of 100)
◦ Each hotel holds 100 people (so need 100,000 hotels)
“If everyone behaves randomly (i.e. no terrorist), can we still detect something
suspicious?”
• Probability that a specific pair of people visit same hotel on same day is 10^-9
• Probability this happens twice is 10^-18 (really, really, really small)
→ Expected number of “suspicious” pairs is actually about 250,000!
Example taken from Rajaraman et al., Mining of Massive Datasets
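The arithmetic behind the 250,000 figure can be checked with a short sketch. The numbers are the ones stated on the slide; the variable names are illustrative, and the calculation assumes that every person picks a hotel uniformly at random and that days are independent.

```python
from math import comb

people = 10**9           # people tracked
days = 1000              # days of observation
hotels = 10**5           # 100,000 hotels
p_in_hotel = 0.01        # chance a given person is in some hotel on a given day

# Both people are in a hotel that day (0.01 * 0.01) and it is the same hotel (1 / hotels).
p_same_hotel_one_day = p_in_hotel * p_in_hotel / hotels

# The coincidence has to happen on two given days.
p_same_hotel_two_days = p_same_hotel_one_day ** 2

pairs_of_people = comb(people, 2)   # about 5 * 10^17
pairs_of_days = comb(days, 2)       # about 5 * 10^5

expected_suspicious_pairs = pairs_of_people * pairs_of_days * p_same_hotel_two_days
print(round(expected_suspicious_pairs))
```

The per-pair probability is tiny, but it is multiplied by an enormous number of (pair of people, pair of days) combinations, so pure chance still produces roughly a quarter of a million "suspicious" pairs.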
• To mine different types of data:
– Data is highly dimensional
– Data is a graph
– Data is infinite / never-ending
– Data is labeled
What will we learn? (1 of 4)
• To solve real-world problems:
– Market basket analysis
– Customer segmentation
– Forecasting new product demand
– Evaluating athletic talent
– Probabilities of a health risk
– Text sentiment analysis
What will we learn? (2 of 4)
• Use of various “tools”:
– Association Rules
– Clustering with K-means
– Logistic and Linear regression
– Decision Trees
– Neural Networks
– Text Mining
What will we learn? (3 of 4)
• Regression
• Decision Trees
• Cluster Analysis
• Text Mining
• Ensemble Models
• Neural Nets
• Association Rules
What will we learn? (4 of 4)
Privacy & Security
• Consider the real people behind the data
• Ethical and moral obligations
• Protect against crimes including identity
theft
• Objectives should never justify unethical
means
Privacy & Security
• Things to consider in data mining efforts:
– Protection of privacy
– Respect for individual rights
– Willingness to embrace transparency of
actions and methods
– Ask for permission to gather and use data
– Ensure you are doing fair and just work that
will help and benefit others
Privacy & Security (1 of 2)
• We can protect privacy by:
– Aggregating data
– Anonymizing observations through removal of
names and personally identifiable information
(PII)
– Storing data in secure and protected
environments
Privacy & Security (2 of 2)
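As a rough illustration of the first two protections, the sketch below strips PII fields from each observation and then reports only aggregated totals. The field names, the sample records, and the helper names `anonymize` and `aggregate_spend` are all hypothetical. Secure storage is not shown.

```python
from collections import defaultdict

# Hypothetical field names treated as personally identifiable information (PII).
PII_FIELDS = {"name", "email"}


def anonymize(record):
    """Return a copy of the record without the PII fields."""
    return {key: value for key, value in record.items() if key not in PII_FIELDS}


def aggregate_spend(records, group_key="zip_code"):
    """Sum purchase amounts per group so individual observations are not reported."""
    totals = defaultdict(float)
    for record in records:
        totals[record[group_key]] += record["amount"]
    return dict(totals)


# Made-up observations for illustration only.
purchases = [
    {"name": "A. Example", "email": "a@example.com", "zip_code": "27603", "amount": 20.0},
    {"name": "B. Example", "email": "b@example.com", "zip_code": "27603", "amount": 35.0},
    {"name": "C. Example", "email": "c@example.com", "zip_code": "27511", "amount": 12.5},
]

anonymous = [anonymize(p) for p in purchases]
by_zip = aggregate_spend(anonymous)
print(anonymous[0])
print(by_zip)
```

Removing names alone does not guarantee anonymity, since remaining fields can sometimes be combined to re-identify a person, so aggregation and secure storage still matter.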
Database, Data Warehouse,
Data Mart, Data Set
• Organized grouping of information within a specific structure
• Table: a database container made up of rows and columns
• Relational databases more common today
– Relate tables to one another in a logical fashion
– Tables are broken apart to reduce redundancy through normalization
Database
• Handles high volume of reads and writes
• Not efficient for analysis due to lengthy
retrieval of data
– Must use a query containing joins
– Intensive and time consuming
Online Transactional Processing (OLTP)
• Denormalized to intentionally combine
multiple tables into a single table
– Results in duplicate data in some columns
– Reduces number of joins necessary to query
related data
– Online Analytical Processing (OLAP)
Data Warehouse (1 of 2)
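The difference between normalized tables and a denormalized one can be sketched with Python's built-in `sqlite3` module. The table names, column names, and rows below are made up for illustration. Two related tables are joined once to build a single wide table, which can then be queried without any join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Normalized (OLTP-style): customer details are stored once.
    CREATE TABLE customers (
        customer_id INTEGER PRIMARY KEY,
        name TEXT,
        city TEXT
    );
    CREATE TABLE orders (
        order_id INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customers(customer_id),
        amount REAL
    );
    INSERT INTO customers VALUES (1, 'Ada', 'Raleigh'), (2, 'Grace', 'Cary');
    INSERT INTO orders VALUES (10, 1, 25.0), (11, 1, 40.0), (12, 2, 15.0);

    -- Denormalized (warehouse-style): the join is done once, up front.
    CREATE TABLE orders_wide AS
        SELECT o.order_id, c.name, c.city, o.amount
        FROM orders AS o
        JOIN customers AS c ON c.customer_id = o.customer_id;
""")

# Analysis queries now read a single table; no join is needed.
rows = conn.execute(
    "SELECT order_id, name, city, amount FROM orders_wide ORDER BY order_id"
).fetchall()
for row in rows:
    print(row)
```

The customer's name and city repeat on every one of that customer's orders in `orders_wide`. That is the duplicate data the slide mentions, accepted in exchange for fewer joins.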
Data Warehouse (2 of 2)
• Contain archived data copied from transactional database
o Can become out-of-sync if source data is updated
• Can contain data moved from transactional system
o Data may be unavailable for updates or viewing
• Organizational data store created in
conjunction to meet needs of specific
business unit
• One-stop shop
• Must be known, current, accurate, and
well-managed (privacy and security)
Data Mart
• Subset of a database or data warehouse
• Usually denormalized
• Typically related to a specific:
– Business question
– Business problem
– Business unit
Data Set
• Database
– Rows = Records
– Columns = Fields
• RapidMiner
– Rows = Examples
• Data Warehouses and Data Sets
– Rows = Observations, Examples, or Cases
– Columns = Variables or Attributes
Rows and Columns
The Data Mining Process
• For this course, we will channel every
homework assignment through the
CRISP-DM process.
CRISP-DM
– Define the questions you want to answer.
– Who will you work with to understand the
issue?
– Design what you are going to build.
– Get buy-in of the problem to be solved
1. Business Understanding
– What is the source of the data?
– How was it collected?
– How accurate or reliable is it?
– What are the correct variables to collect?
2. Data Understanding
– Join necessary data sets
– Reduce data sets to only include pertinent
variables
– Scrub data to remove anomalies such as outliers or
missing data
– Reformat for consistency
3. Data Preparation
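The four preparation tasks can be sketched in a few lines of plain Python. The records, field names, age limits, and the `prepare` function are hypothetical and chosen only to show each task once. In the course this work is done in RapidMiner.

```python
# Hypothetical raw extract: ages arrive as text and state codes are inconsistent.
customers = [
    {"id": 1, "age": "34", "state": "nc", "notes": "called twice"},
    {"id": 2, "age": "", "state": "NC", "notes": ""},
    {"id": 3, "age": "29", "state": "Va", "notes": "new"},
    {"id": 4, "age": "430", "state": "VA", "notes": ""},
]
# Hypothetical second data set, keyed by customer id.
spend_by_id = {1: 120.0, 2: 80.0, 3: 45.5, 4: 10.0}


def prepare(rows, spend, min_age=0, max_age=120):
    """Join, reduce, scrub, and reformat the raw rows."""
    prepared = []
    for row in rows:
        if row["age"] == "":
            continue  # scrub: missing value
        age = int(row["age"])
        if not (min_age <= age <= max_age):
            continue  # scrub: implausible outlier
        prepared.append({
            "id": row["id"],                # reduce: the "notes" field is dropped
            "age": age,                     # reformat: text to integer
            "state": row["state"].upper(),  # reformat: consistent case
            "spend": spend.get(row["id"]),  # join: attach spend by customer id
        })
    return prepared


clean = prepare(customers, spend_by_id)
for row in clean:
    print(row)
```

Dropping rows is only one way to handle missing data or outliers. It is used here to keep the sketch short.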
– Two types:
• Classification (Descriptive)
• Prediction
– Can be overlapping (Decision Trees)
– Note: We will spend most of our time in this
step
4. Modeling
– Is the insight useful?
• Should another technique be used?
– What can be done with the results?
– Testing for false positives
– Human experience and operational
knowledge
5. Evaluation
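One common way to test for false positives is to compare a model's predictions with known outcomes and count the four possible results. The labels below are made up and `confusion_counts` is an illustrative helper, so treat this as a sketch rather than the course's prescribed procedure.

```python
def confusion_counts(actual, predicted):
    """Count true/false positives and negatives for binary labels (1 = positive)."""
    tp = fp = tn = fn = 0
    for a, p in zip(actual, predicted):
        if p and a:
            tp += 1      # predicted positive, really positive
        elif p and not a:
            fp += 1      # predicted positive, really negative (false positive)
        elif a:
            fn += 1      # predicted negative, really positive
        else:
            tn += 1      # predicted negative, really negative
    return tp, fp, tn, fn


# Hypothetical known outcomes and model predictions for eight cases.
actual = [1, 0, 0, 1, 0, 0, 1, 0]
predicted = [1, 1, 0, 1, 0, 0, 0, 0]

tp, fp, tn, fn = confusion_counts(actual, predicted)
false_positive_rate = fp / (fp + tn)  # share of real negatives flagged as positive
precision = tp / (tp + fp)            # share of flagged cases that were really positive
print(tp, fp, tn, fn)
print(false_positive_rate, precision)
```

Whether a given false positive rate is acceptable is a business judgment, which is where the human experience and operational knowledge listed above come in.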
– Automation of model
– Communication with end-users
– Integration with existing systems
– Continuous monitoring and gaining feedback
for improvement (fine-tuning)
6. Deployment
• Clearly communicate model’s:
– Function
– Utility to stakeholders
• Thoroughly test and prove the model
• Plan for and monitor implementation
Keys to Successful Deployment
Summary
• Data mining applies statistical and logical methods of analysis
to describe large data sets and create predictive models that
uncover insights
• Databases, data warehouses, and data sets are unique kinds
of digital record keeping systems with some similarities
• Data mining is most effective on data sets extracted from
OLAP rather than OLTP
• Data is highly dimensional and has inherent risks, such as
quality
• Remember the human factor behind the numbers and figures,
and the ethical responsibilities that come with it
• CRISP-DM is the most used standard method for analysis
Summary
“This workforce solution was funded by a grant awarded by the U.S. Department of
Labor’s Employment and Training Administration. The solution was created by the
grantee and does not necessarily reflect the official position of the U.S. Department of
Labor. The Department of Labor makes no guarantees, warranties, or assurances of any
kind, express or implied, with respect to such information, including any information on
linked sites and including, but not limited to, accuracy of the information or its
completeness, timeliness, usefulness, adequacy, continued availability, or ownership.”
Except where otherwise stated, this work by Wake Technical Community College Building
Capacity in Business Analytics, a Department of Labor, TAACCCT funded project, is
licensed under the Creative Commons Attribution 4.0 International License. To view a
copy of this license, visit http://creativecommons.org/licenses/by/4.0/
Copyright Information

More Related Content

What's hot

Data mining and knowledge Discovery
Data mining and knowledge DiscoveryData mining and knowledge Discovery
Data mining and knowledge Discovery
Kartik Kalpande Patil
 
What Is DATA MINING(INTRODUCTION)
What Is DATA MINING(INTRODUCTION)What Is DATA MINING(INTRODUCTION)
What Is DATA MINING(INTRODUCTION)
Pratik Tambekar
 
Chapter 1: Introduction to Data Mining
Chapter 1: Introduction to Data MiningChapter 1: Introduction to Data Mining
Chapter 1: Introduction to Data Mining
Izwan Nizal Mohd Shaharanee
 
Chapter 08 Data Mining Techniques
Chapter 08 Data Mining Techniques Chapter 08 Data Mining Techniques
Chapter 08 Data Mining Techniques
Houw Liong The
 
Introduction-to-Knowledge Discovery in Database
Introduction-to-Knowledge Discovery in DatabaseIntroduction-to-Knowledge Discovery in Database
Introduction-to-Knowledge Discovery in Database
Kartik Kalpande Patil
 
Unit 3 part ii Data mining
Unit 3 part ii Data miningUnit 3 part ii Data mining
Unit 3 part ii Data mining
Dhilsath Fathima
 
9 Data Mining Challenges From Data Scientists Like You
9 Data Mining Challenges From Data Scientists Like You9 Data Mining Challenges From Data Scientists Like You
9 Data Mining Challenges From Data Scientists Like You
Salford Systems
 
Data mining
Data miningData mining
Data mining
pradeepa n
 
Data mining (lecture 1 & 2) conecpts and techniques
Data mining (lecture 1 & 2) conecpts and techniquesData mining (lecture 1 & 2) conecpts and techniques
Data mining (lecture 1 & 2) conecpts and techniquesSaif Ullah
 
Introduction to Data Mining
Introduction to Data Mining Introduction to Data Mining
Introduction to Data Mining
Sushil Kulkarni
 
Additional themes of data mining for Msc CS
Additional themes of data mining for Msc CSAdditional themes of data mining for Msc CS
Additional themes of data mining for Msc CS
Thanveen
 
knowledge discovery and data mining approach in databases (2)
knowledge discovery and data mining approach in databases (2)knowledge discovery and data mining approach in databases (2)
knowledge discovery and data mining approach in databases (2)
Kartik Kalpande Patil
 
Data mining
Data miningData mining
Data mining
Birju Tank
 
Data Mining : Concepts and Techniques
Data Mining : Concepts and TechniquesData Mining : Concepts and Techniques
Data Mining : Concepts and Techniques
DeepaR42
 
Discovery informaticsstanton
Discovery informaticsstantonDiscovery informaticsstanton
Discovery informaticsstanton
Syracuse University
 
MC0088 Internal Assignment (SMU)
MC0088 Internal Assignment (SMU)MC0088 Internal Assignment (SMU)
MC0088 Internal Assignment (SMU)
Krishan Pareek
 
Data mining
Data miningData mining
Data mining
heba_ahmad
 
Data Mining
Data MiningData Mining
Data Mining
SHIKHA GAUTAM
 

What's hot (20)

Data mining and knowledge Discovery
Data mining and knowledge DiscoveryData mining and knowledge Discovery
Data mining and knowledge Discovery
 
What Is DATA MINING(INTRODUCTION)
What Is DATA MINING(INTRODUCTION)What Is DATA MINING(INTRODUCTION)
What Is DATA MINING(INTRODUCTION)
 
Data mining 1
Data mining 1Data mining 1
Data mining 1
 
Chapter 1: Introduction to Data Mining
Chapter 1: Introduction to Data MiningChapter 1: Introduction to Data Mining
Chapter 1: Introduction to Data Mining
 
Chapter 08 Data Mining Techniques
Chapter 08 Data Mining Techniques Chapter 08 Data Mining Techniques
Chapter 08 Data Mining Techniques
 
Introduction-to-Knowledge Discovery in Database
Introduction-to-Knowledge Discovery in DatabaseIntroduction-to-Knowledge Discovery in Database
Introduction-to-Knowledge Discovery in Database
 
Unit 3 part ii Data mining
Unit 3 part ii Data miningUnit 3 part ii Data mining
Unit 3 part ii Data mining
 
9 Data Mining Challenges From Data Scientists Like You
9 Data Mining Challenges From Data Scientists Like You9 Data Mining Challenges From Data Scientists Like You
9 Data Mining Challenges From Data Scientists Like You
 
Data mining
Data miningData mining
Data mining
 
Data mining (lecture 1 & 2) conecpts and techniques
Data mining (lecture 1 & 2) conecpts and techniquesData mining (lecture 1 & 2) conecpts and techniques
Data mining (lecture 1 & 2) conecpts and techniques
 
Introduction to Data Mining
Introduction to Data Mining Introduction to Data Mining
Introduction to Data Mining
 
Additional themes of data mining for Msc CS
Additional themes of data mining for Msc CSAdditional themes of data mining for Msc CS
Additional themes of data mining for Msc CS
 
knowledge discovery and data mining approach in databases (2)
knowledge discovery and data mining approach in databases (2)knowledge discovery and data mining approach in databases (2)
knowledge discovery and data mining approach in databases (2)
 
Introduction to DataMining
Introduction to DataMiningIntroduction to DataMining
Introduction to DataMining
 
Data mining
Data miningData mining
Data mining
 
Data Mining : Concepts and Techniques
Data Mining : Concepts and TechniquesData Mining : Concepts and Techniques
Data Mining : Concepts and Techniques
 
Discovery informaticsstanton
Discovery informaticsstantonDiscovery informaticsstanton
Discovery informaticsstanton
 
MC0088 Internal Assignment (SMU)
MC0088 Internal Assignment (SMU)MC0088 Internal Assignment (SMU)
MC0088 Internal Assignment (SMU)
 
Data mining
Data miningData mining
Data mining
 
Data Mining
Data MiningData Mining
Data Mining
 

Viewers also liked

BAS 250 Lecture 8
BAS 250 Lecture 8BAS 250 Lecture 8
BAS 250 Lecture 8
Wake Tech BAS
 
BAS 150 Lesson 2 Lecture
BAS 150 Lesson 2 Lecture BAS 150 Lesson 2 Lecture
BAS 150 Lesson 2 Lecture
Wake Tech BAS
 
BAS 150 Lesson 1 Lecture
BAS 150 Lesson 1 LectureBAS 150 Lesson 1 Lecture
BAS 150 Lesson 1 Lecture
Wake Tech BAS
 
BAS 250 Lecture 2
BAS 250 Lecture 2BAS 250 Lecture 2
BAS 250 Lecture 2
Wake Tech BAS
 
Base 9.1 preparation guide
Base 9.1 preparation guideBase 9.1 preparation guide
Base 9.1 preparation guideimaduddin91
 
Analytics with SAS
Analytics with SASAnalytics with SAS
Analytics with SAS
Edureka!
 
Learning SAS by Example -A Programmer’s Guide by Ron CodySolution
Learning SAS by Example -A Programmer’s Guide by Ron CodySolutionLearning SAS by Example -A Programmer’s Guide by Ron CodySolution
Learning SAS by Example -A Programmer’s Guide by Ron CodySolution
Vibeesh CS
 
Big Data Career Path | Big Data Learning Path | Hadoop Tutorial | Edureka
Big Data Career Path | Big Data Learning Path | Hadoop Tutorial | EdurekaBig Data Career Path | Big Data Learning Path | Hadoop Tutorial | Edureka
Big Data Career Path | Big Data Learning Path | Hadoop Tutorial | Edureka
Edureka!
 
SAS basics Step by step learning
SAS basics Step by step learningSAS basics Step by step learning
SAS basics Step by step learning
Venkata Reddy Konasani
 
What Is Data Science? Data Science Course - Data Science Tutorial For Beginne...
What Is Data Science? Data Science Course - Data Science Tutorial For Beginne...What Is Data Science? Data Science Course - Data Science Tutorial For Beginne...
What Is Data Science? Data Science Course - Data Science Tutorial For Beginne...
Edureka!
 
Deep learning - Conceptual understanding and applications
Deep learning - Conceptual understanding and applicationsDeep learning - Conceptual understanding and applications
Deep learning - Conceptual understanding and applications
Buhwan Jeong
 
The Second Little Book of Leadership
The Second Little Book of LeadershipThe Second Little Book of Leadership
The Second Little Book of LeadershipPhil Dourado
 
Best Presentation About Infosys
Best Presentation About InfosysBest Presentation About Infosys
Best Presentation About Infosys
Durgadatta Dash
 
Deep Learning through Examples
Deep Learning through ExamplesDeep Learning through Examples
Deep Learning through Examples
Sri Ambati
 

Viewers also liked (14)

BAS 250 Lecture 8
BAS 250 Lecture 8BAS 250 Lecture 8
BAS 250 Lecture 8
 
BAS 150 Lesson 2 Lecture
BAS 150 Lesson 2 Lecture BAS 150 Lesson 2 Lecture
BAS 150 Lesson 2 Lecture
 
BAS 150 Lesson 1 Lecture
BAS 150 Lesson 1 LectureBAS 150 Lesson 1 Lecture
BAS 150 Lesson 1 Lecture
 
BAS 250 Lecture 2
BAS 250 Lecture 2BAS 250 Lecture 2
BAS 250 Lecture 2
 
Base 9.1 preparation guide
Base 9.1 preparation guideBase 9.1 preparation guide
Base 9.1 preparation guide
 
Analytics with SAS
Analytics with SASAnalytics with SAS
Analytics with SAS
 
Learning SAS by Example -A Programmer’s Guide by Ron CodySolution
Learning SAS by Example -A Programmer’s Guide by Ron CodySolutionLearning SAS by Example -A Programmer’s Guide by Ron CodySolution
Learning SAS by Example -A Programmer’s Guide by Ron CodySolution
 
Big Data Career Path | Big Data Learning Path | Hadoop Tutorial | Edureka
Big Data Career Path | Big Data Learning Path | Hadoop Tutorial | EdurekaBig Data Career Path | Big Data Learning Path | Hadoop Tutorial | Edureka
Big Data Career Path | Big Data Learning Path | Hadoop Tutorial | Edureka
 
SAS basics Step by step learning
SAS basics Step by step learningSAS basics Step by step learning
SAS basics Step by step learning
 
What Is Data Science? Data Science Course - Data Science Tutorial For Beginne...
What Is Data Science? Data Science Course - Data Science Tutorial For Beginne...What Is Data Science? Data Science Course - Data Science Tutorial For Beginne...
What Is Data Science? Data Science Course - Data Science Tutorial For Beginne...
 
Deep learning - Conceptual understanding and applications
Deep learning - Conceptual understanding and applicationsDeep learning - Conceptual understanding and applications
Deep learning - Conceptual understanding and applications
 
The Second Little Book of Leadership
The Second Little Book of LeadershipThe Second Little Book of Leadership
The Second Little Book of Leadership
 
Best Presentation About Infosys
Best Presentation About InfosysBest Presentation About Infosys
Best Presentation About Infosys
 
Deep Learning through Examples
Deep Learning through ExamplesDeep Learning through Examples
Deep Learning through Examples
 

Similar to BAS 250 Lecture 1

Dma unit 1
Dma unit   1Dma unit   1
Dma unit 1
thamizh arasi
 
Chapter 1 - Introduction to Data Mining Concepts and Techniques.pptx
Chapter 1 - Introduction to Data Mining Concepts and Techniques.pptxChapter 1 - Introduction to Data Mining Concepts and Techniques.pptx
Chapter 1 - Introduction to Data Mining Concepts and Techniques.pptx
kusamee0
 
dwdm unit 1.ppt
dwdm unit 1.pptdwdm unit 1.ppt
dwdm unit 1.ppt
nayanakarsh469
 
2 introductory slides
2 introductory slides2 introductory slides
2 introductory slides
tafosepsdfasg
 
chap1.ppt
chap1.pptchap1.ppt
chap1.ppt
AsifImran37
 
chap1.ppt
chap1.pptchap1.ppt
chap1.ppt
IfedayoOladeji1
 
chap1.ppt
chap1.pptchap1.ppt
chap1.ppt
ImXaib
 
Information_System_and_Data_mining12.ppt
Information_System_and_Data_mining12.pptInformation_System_and_Data_mining12.ppt
Information_System_and_Data_mining12.ppt
PrasadG76
 
Data Mining- Unit-I PPT (1).ppt
Data Mining- Unit-I PPT (1).pptData Mining- Unit-I PPT (1).ppt
Data Mining- Unit-I PPT (1).ppt
AravindReddy565690
 
00-01 DSnDA.pdf
00-01 DSnDA.pdf00-01 DSnDA.pdf
00-01 DSnDA.pdf
SugumarSarDurai
 
Data mining techniques unit 1
Data mining techniques  unit 1Data mining techniques  unit 1
Data mining techniques unit 1
malathieswaran29
 
TOPIC.pptx
TOPIC.pptxTOPIC.pptx
TOPIC.pptx
infinix8
 
DBMS
DBMSDBMS
Data Mining and Big Data Challenges and Research Opportunities
Data Mining and Big Data Challenges and Research OpportunitiesData Mining and Big Data Challenges and Research Opportunities
Data Mining and Big Data Challenges and Research Opportunities
Kathirvel Ayyaswamy
 
Introduction Data Science.pptx
Introduction Data Science.pptxIntroduction Data Science.pptx
Introduction Data Science.pptx
AkhirulAminulloh2
 
Data mining concept and methods for basic
Data mining concept and methods for basicData mining concept and methods for basic
Data mining concept and methods for basic
NivaTripathy2
 
01-introduction.ppt the paper that you can unless you want to join me because...
01-introduction.ppt the paper that you can unless you want to join me because...01-introduction.ppt the paper that you can unless you want to join me because...
01-introduction.ppt the paper that you can unless you want to join me because...
teodroscampaus
 
Data mining Basics and complete description onword
Data mining Basics and complete description onwordData mining Basics and complete description onword
Data mining Basics and complete description onword
Sulman Ahmed
 
Business Analytics and Data mining.pdf
Business Analytics and Data mining.pdfBusiness Analytics and Data mining.pdf
Business Analytics and Data mining.pdf
ssuser0413ec
 

Similar to BAS 250 Lecture 1 (20)

Dma unit 1
Dma unit   1Dma unit   1
Dma unit 1
 
Chapter 1 - Introduction to Data Mining Concepts and Techniques.pptx
Chapter 1 - Introduction to Data Mining Concepts and Techniques.pptxChapter 1 - Introduction to Data Mining Concepts and Techniques.pptx
Chapter 1 - Introduction to Data Mining Concepts and Techniques.pptx
 
dwdm unit 1.ppt
dwdm unit 1.pptdwdm unit 1.ppt
dwdm unit 1.ppt
 
2 introductory slides
2 introductory slides2 introductory slides
2 introductory slides
 
chap1.ppt
chap1.pptchap1.ppt
chap1.ppt
 
chap1.ppt
chap1.pptchap1.ppt
chap1.ppt
 
chap1.ppt
chap1.pptchap1.ppt
chap1.ppt
 
Information_System_and_Data_mining12.ppt
Information_System_and_Data_mining12.pptInformation_System_and_Data_mining12.ppt
Information_System_and_Data_mining12.ppt
 
Data Mining- Unit-I PPT (1).ppt
Data Mining- Unit-I PPT (1).pptData Mining- Unit-I PPT (1).ppt
Data Mining- Unit-I PPT (1).ppt
 
00-01 DSnDA.pdf
00-01 DSnDA.pdf00-01 DSnDA.pdf
00-01 DSnDA.pdf
 
Data mining techniques unit 1
Data mining techniques  unit 1Data mining techniques  unit 1
Data mining techniques unit 1
 
TOPIC.pptx
TOPIC.pptxTOPIC.pptx
TOPIC.pptx
 
DBMS
DBMSDBMS
DBMS
 
Data Mining and Big Data Challenges and Research Opportunities
Data Mining and Big Data Challenges and Research OpportunitiesData Mining and Big Data Challenges and Research Opportunities
Data Mining and Big Data Challenges and Research Opportunities
 
Introduction Data Science.pptx
Introduction Data Science.pptxIntroduction Data Science.pptx
Introduction Data Science.pptx
 
Data mining concept and methods for basic
Data mining concept and methods for basicData mining concept and methods for basic
Data mining concept and methods for basic
 
01-introduction.ppt the paper that you can unless you want to join me because...
01-introduction.ppt the paper that you can unless you want to join me because...01-introduction.ppt the paper that you can unless you want to join me because...
01-introduction.ppt the paper that you can unless you want to join me because...
 
Data mining Basics and complete description onword
Data mining Basics and complete description onwordData mining Basics and complete description onword
Data mining Basics and complete description onword
 
Business Analytics and Data mining.pdf
Business Analytics and Data mining.pdfBusiness Analytics and Data mining.pdf
Business Analytics and Data mining.pdf
 
NCCT.pptx
NCCT.pptxNCCT.pptx
NCCT.pptx
 

More from Wake Tech BAS

BAS 250 Lecture 5
BAS 250 Lecture 5BAS 250 Lecture 5
BAS 250 Lecture 5
Wake Tech BAS
 
BAS 250 Lecture 4
BAS 250 Lecture 4BAS 250 Lecture 4
BAS 250 Lecture 4
Wake Tech BAS
 
BAS 250 Lecture 3
BAS 250 Lecture 3BAS 250 Lecture 3
BAS 250 Lecture 3
Wake Tech BAS
 
BAS 150 Lesson 8 Lecture
BAS 150 Lesson 8 LectureBAS 150 Lesson 8 Lecture
BAS 150 Lesson 8 Lecture
Wake Tech BAS
 
BAS 150 Lesson 7 Lecture
BAS 150 Lesson 7 LectureBAS 150 Lesson 7 Lecture
BAS 150 Lesson 7 Lecture
Wake Tech BAS
 
BAS 150 Lesson 6 Lecture
BAS 150 Lesson 6 LectureBAS 150 Lesson 6 Lecture
BAS 150 Lesson 6 Lecture
Wake Tech BAS
 
BAS 150 Lesson 5 Lecture
BAS 150 Lesson 5 LectureBAS 150 Lesson 5 Lecture
BAS 150 Lesson 5 Lecture
Wake Tech BAS
 
BAS 150 Lesson 4 Lecture
BAS 150 Lesson 4 LectureBAS 150 Lesson 4 Lecture
BAS 150 Lesson 4 Lecture
Wake Tech BAS
 
BAS 150 Lesson 3 Lecture
BAS 150 Lesson 3 LectureBAS 150 Lesson 3 Lecture
BAS 150 Lesson 3 Lecture
Wake Tech BAS
 

More from Wake Tech BAS (9)

BAS 250 Lecture 5
BAS 250 Lecture 5BAS 250 Lecture 5
BAS 250 Lecture 5
 
BAS 250 Lecture 4
BAS 250 Lecture 4BAS 250 Lecture 4
BAS 250 Lecture 4
 
BAS 250 Lecture 3
BAS 250 Lecture 3BAS 250 Lecture 3
BAS 250 Lecture 3
 
BAS 150 Lesson 8 Lecture
BAS 150 Lesson 8 LectureBAS 150 Lesson 8 Lecture
BAS 150 Lesson 8 Lecture
 
BAS 150 Lesson 7 Lecture
BAS 150 Lesson 7 LectureBAS 150 Lesson 7 Lecture
BAS 150 Lesson 7 Lecture
 
BAS 150 Lesson 6 Lecture
BAS 150 Lesson 6 LectureBAS 150 Lesson 6 Lecture
BAS 150 Lesson 6 Lecture
 
BAS 150 Lesson 5 Lecture
BAS 150 Lesson 5 LectureBAS 150 Lesson 5 Lecture
BAS 150 Lesson 5 Lecture
 
BAS 150 Lesson 4 Lecture
BAS 150 Lesson 4 LectureBAS 150 Lesson 4 Lecture
BAS 150 Lesson 4 Lecture
 
BAS 150 Lesson 3 Lecture
BAS 150 Lesson 3 LectureBAS 150 Lesson 3 Lecture
BAS 150 Lesson 3 Lecture
 

Recently uploaded

Unit 8 - Information and Communication Technology (Paper I).pdf
Unit 8 - Information and Communication Technology (Paper I).pdfUnit 8 - Information and Communication Technology (Paper I).pdf
Unit 8 - Information and Communication Technology (Paper I).pdf
Thiyagu K
 
1.4 modern child centered education - mahatma gandhi-2.pptx
1.4 modern child centered education - mahatma gandhi-2.pptx1.4 modern child centered education - mahatma gandhi-2.pptx
1.4 modern child centered education - mahatma gandhi-2.pptx
JosvitaDsouza2
 
The approach at University of Liverpool.pptx
The approach at University of Liverpool.pptxThe approach at University of Liverpool.pptx
The approach at University of Liverpool.pptx
Jisc
 
How to Create Map Views in the Odoo 17 ERP
How to Create Map Views in the Odoo 17 ERPHow to Create Map Views in the Odoo 17 ERP
How to Create Map Views in the Odoo 17 ERP
Celine George
 
Fish and Chips - have they had their chips
Fish and Chips - have they had their chipsFish and Chips - have they had their chips
Fish and Chips - have they had their chips
GeoBlogs
 
Mule 4.6 & Java 17 Upgrade | MuleSoft Mysore Meetup #46
Mule 4.6 & Java 17 Upgrade | MuleSoft Mysore Meetup #46Mule 4.6 & Java 17 Upgrade | MuleSoft Mysore Meetup #46
Mule 4.6 & Java 17 Upgrade | MuleSoft Mysore Meetup #46
MysoreMuleSoftMeetup
 
MARUTI SUZUKI- A Successful Joint Venture in India.pptx
MARUTI SUZUKI- A Successful Joint Venture in India.pptxMARUTI SUZUKI- A Successful Joint Venture in India.pptx
MARUTI SUZUKI- A Successful Joint Venture in India.pptx
bennyroshan06
 
Welcome to TechSoup New Member Orientation and Q&A (May 2024).pdf
Welcome to TechSoup   New Member Orientation and Q&A (May 2024).pdfWelcome to TechSoup   New Member Orientation and Q&A (May 2024).pdf
Welcome to TechSoup New Member Orientation and Q&A (May 2024).pdf
TechSoup
 
How to Break the cycle of negative Thoughts
How to Break the cycle of negative ThoughtsHow to Break the cycle of negative Thoughts
How to Break the cycle of negative Thoughts
Col Mukteshwar Prasad
 
Students, digital devices and success - Andreas Schleicher - 27 May 2024..pptx
Students, digital devices and success - Andreas Schleicher - 27 May 2024..pptxStudents, digital devices and success - Andreas Schleicher - 27 May 2024..pptx
Students, digital devices and success - Andreas Schleicher - 27 May 2024..pptx
EduSkills OECD
 
BAS 250 Lecture 1

  • 1. BAS 250 Lesson 1: Introduction to Data Mining
  • 2. Software
      • RapidMiner: https://rapidminer.com/products/studio/
      • RapidMiner Studio: 6.5 or greater
      • Please follow the link above to download the free software.
  • 3. Learning Objectives (1 of 2)
      • Define the discipline of Data Mining
      • List and define various types of data
      • List and define various sources of data
      • Explain the fundamental differences between databases, data warehouses, and data sets
  • 4. Learning Objectives (2 of 2)
      • Explain some of the ethical dilemmas associated with data mining and outline possible solutions
      • Explain the CRISP-DM Method
  • 6. Why Data Mining?
      • 15 out of 17 sectors in the United States have more data stored per company than the US Library of Congress
      • $5 million vs. $400: price of the fastest supercomputer in 1975 vs. an iPhone with equal performance
      • $600 to buy a disk drive that can store all of the world’s music
      • 5 billion mobile phones in use in 2010
      • 30 billion pieces of content shared on Facebook every month
      • 40% projected growth in global data generated per year vs. 5% growth in IT spending
      • 235 terabytes of data collected by the US Library of Congress by April 2011
  • 7. Why Mine Data?
      • Lots of data is being collected and stored
          – Web data, e-commerce, point of sale
          – Credit card transactions, social media
      • Computers have become cheaper and more powerful
      • Competitive pressure is strong
          – Provide better, customized services for an edge (e.g., in Customer Relationship Management)
      • Information is valuable and can be monetized
  • 8. Demand for Data Mining
      • Demand for deep analytical talent in the United States could be 50-60% greater than its projected supply by 2018
  • 9. Why Data Mining?
      • Data contains value and knowledge, but to extract the knowledge, data needs to be
          – Stored
          – Managed
          – Analyzed (this class)
      • Data Mining ≈ Big Data ≈ Predictive Analytics ≈ Data Science
  • 10. What is Data Mining? (1 of 4)
      • “An interdisciplinary subfield of computer science. It is the computational process of discovering patterns in large data sets involving methods at the intersection of artificial intelligence, machine learning, statistics, and database systems. The overall goal of data mining is to extract information from a data set and transform it into an understandable structure for further use.” (Wikipedia, https://en.wikipedia.org/wiki/Data_mining)
  • 11. What is Data Mining? (2 of 4)
      • Data Enormity Issue
      • Discover patterns and models that are:
          – Valid: data has some certainty
          – Useful: should be able to act on the insight
          – Unexpected: non-obvious to the system
          – Understandable: humans can interpret the patterns
  • 12. What is Data Mining? (3 of 4)
      • Descriptive methods
          – Find patterns that describe the data
          – Example: Clustering with k-means
      • Predictive methods
          – Use some variables to predict unknown or future values of a target variable
          – Example: Scoring with neural networks
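To make the descriptive case concrete, here is a minimal sketch of k-means on one-dimensional data. The spending figures, the function name `kmeans_1d`, and the choice to start the centers at evenly spaced sorted values are all assumptions made for illustration; they are not taken from the course or from RapidMiner.

```python
def kmeans_1d(values, k=2, iterations=20):
    """Tiny k-means for 1-D data (k >= 2); a teaching sketch only."""
    ordered = sorted(values)
    # Assumed deterministic start: spread initial centers across the sorted values.
    step = (len(ordered) - 1) / (k - 1)
    centers = [float(ordered[round(i * step)]) for i in range(k)]
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        # Assignment step: each value joins the cluster of its nearest center.
        clusters = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda c: abs(v - centers[c]))
            clusters[nearest].append(v)
        # Update step: each center moves to the mean of its cluster.
        new_centers = [
            sum(members) / len(members) if members else centers[i]
            for i, members in enumerate(clusters)
        ]
        if new_centers == centers:  # no movement means the clustering has converged
            break
        centers = new_centers
    return centers, clusters


# Hypothetical monthly spend per customer; no target variable is involved.
monthly_spend = [12, 15, 14, 10, 95, 102, 99, 110]
centers, clusters = kmeans_1d(monthly_spend, k=2)
print(centers)
print(clusters)
```

The algorithm only describes structure already present in the data (a low-spend and a high-spend group). A predictive method would instead need a labeled target variable to learn from.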
  • 13. What is Data Mining? (4 of 4)
      • Data Mining
          – Prevalence of names in US locations (O’Brien, O’Rourke, O’Reilly in the Boston area)
          – Group together similar documents returned by a search engine according to their context
      • Not Data Mining
          – Look up a phone number in a phone directory
          – Query a web search engine for information about “Amazon”
  • 14. Origins of Data Mining
  • 15. Data is Highly Dimensional
  • 16. Challenges of Data Mining
      • Scalability
      • Dimensionality
      • Complex and Heterogeneous Data
      • Data Quality
      • Data Ownership and Distribution
      • Privacy Preservation
      • Streaming Data
  • 17. Limits to Data Mining
      • Garbage In, Garbage Out (GIGO)
          – Collected incorrectly
          – Out-of-date
      • Day-to-day:
          – Use available resources
          – Acceptable risk
          – Professional experience
          – Common sense
  • 18. Meaningfulness of Analytic Answers (1 of 2)
      • A risk with data mining is that an analyst can “discover” patterns that are meaningless.
      • Statisticians call it Bonferroni’s principle:
          – “If you look in more places for interesting patterns than your amount of data will support, you are bound to find crap”
  • 19. Meaningfulness of Analytic Answers (2 of 2)
      • National Security Agency example: “We consider suspicious when a pair of (unrelated) people stayed at least twice in the same hotel on the same day”
          – Suppose 1 billion people tracked during 1,000 days
          – Each person stays in a hotel 1% of the time (1 day out of 100)
          – Each hotel holds 100 people (so need 100,000 hotels)
      • “If everyone behaves randomly (i.e. no terrorist), can we still detect something suspicious?”
          – Probability that a specific pair of people visit the same hotel on the same day is 10^-9
          – Probability this happens twice is 10^-18 (really, really, really small)
          – Expected number of “suspicious” pairs is actually about 250,000!
      • Example taken from Rajaraman et al., Mining of Massive Datasets
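The jump from a probability of 10^-18 to about 250,000 suspicious pairs can be checked with a short calculation. This is a sketch using only the numbers on the slide; the variable names are illustrative, and it assumes, as the slide does, that everyone behaves randomly and independently.

```python
from math import comb

people = 1_000_000_000   # people tracked
days = 1_000             # days observed
p_in_hotel = 0.01        # chance a given person is in some hotel on a given day
hotels = 100_000         # 1% of 1 billion people per day, 100 people per hotel

# Two specific people are both in a hotel (0.01 * 0.01) AND chose the same one (1 / 100,000).
p_same_hotel_one_day = p_in_hotel * p_in_hotel / hotels   # about 1e-9

# The same coincidence on two specific days.
p_two_specific_days = p_same_hotel_one_day ** 2           # about 1e-18

pairs_of_people = comb(people, 2)   # about 5e17 possible pairs of people
pairs_of_days = comb(days, 2)       # about 5e5 possible pairs of days

expected_suspicious_pairs = pairs_of_people * pairs_of_days * p_two_specific_days
print(f"{expected_suspicious_pairs:,.0f}")
```

The tiny per-pair probability is multiplied by roughly 5 × 10^17 pairs of people and 5 × 10^5 pairs of days, which is why purely random behavior still yields on the order of 250,000 "suspicious" pairs.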
  • 20. What will we learn? (1 of 4)
      • To mine different types of data:
          – Data is highly dimensional
          – Data is a graph
          – Data is infinite / never-ending
          – Data is labeled
  • 21. What will we learn? (2 of 4)
      • To solve real-world problems:
          – Market basket analysis
          – Customer segmentation
          – Forecasting new product demand
          – Evaluating athletic talent
          – Probabilities of a health risk
          – Text sentiment analysis
  • 22. What will we learn? (3 of 4)
      • Use of various “tools”:
          – Association Rules
          – Clustering with K-means
          – Logistic and Linear Regression
          – Decision Trees
          – Neural Networks
          – Text Mining
  • 23. What will we learn? (4 of 4)
      • Regression
      • Decision Trees
      • Cluster Analysis
      • Text Mining
      • Ensemble Models
      • Neural Nets
      • Association Rules
  • 25. Privacy & Security
      • Consider the real people behind the data
      • Ethical and moral obligations
      • Protect against crimes including identity theft
      • Objectives should never justify unethical means
  • 26. Privacy & Security (1 of 2)
      • Things to consider in data mining efforts:
          – Protection of privacy
          – Respect for individual rights
          – Willingness to embrace transparency of actions and methods
          – Ask for permission to gather and use data
          – Ensure you are doing fair and just work that will help and benefit others
  • 27. Privacy & Security (2 of 2)
      • We can protect privacy by:
          – Aggregating data
          – Anonymizing observations through removal of names and personally identifiable information (PII)
          – Storing data in secure and protected environments
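The first two protections, anonymizing and aggregating, can be sketched in a few lines. The records, the field names, and the set `PII_FIELDS` below are hypothetical and chosen only for illustration; which fields count as PII depends on the data set and the applicable rules.

```python
from collections import defaultdict

# Hypothetical customer records; every value here is made up.
records = [
    {"name": "Ann Lee", "email": "ann@example.com", "zip": "27603", "purchase": 40.0},
    {"name": "Bo Chen", "email": "bo@example.com", "zip": "27603", "purchase": 60.0},
    {"name": "Cy Diaz", "email": "cy@example.com", "zip": "27511", "purchase": 25.0},
]

# Assumption: in this toy data set only these two fields identify a person directly.
PII_FIELDS = {"name", "email"}


def anonymize(record):
    """Return a copy of the record with the direct identifiers removed."""
    return {field: value for field, value in record.items() if field not in PII_FIELDS}


def aggregate_by(rows, key, value):
    """Summarize rows per group so that no individual row is exposed."""
    groups = defaultdict(lambda: {"count": 0, "total": 0.0})
    for row in rows:
        groups[row[key]]["count"] += 1
        groups[row[key]]["total"] += row[value]
    return dict(groups)


anonymized = [anonymize(r) for r in records]
by_zip = aggregate_by(anonymized, key="zip", value="purchase")
print(anonymized[0])
print(by_zip)
```

Removing names and emails is a starting point rather than a guarantee: the remaining fields can sometimes identify a person in combination, and small groups in an aggregate can still reveal individuals.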
  • 29. Database
      • Organized grouping of information within a specific structure
      • Table - a database container made
      • Relational databases more common today
          – Relate tables to one another in a logical fashion
          – Tables are broken apart to reduce redundancy through normalization
  • 30. Online Transactional Processing (OLTP)
      • Handles high volume of reads and writes
      • Not efficient for analysis due to lengthy retrieval of data
          – Must use a query containing joins
          – Intensive and time consuming
  • 31. Data Warehouse (1 of 2)
      • Denormalized to intentionally combine multiple tables into a single table
          – Results in duplicate data in some columns
          – Reduces number of joins necessary to query related data
          – Online Analytical Processing (OLAP)
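As a rough illustration of denormalization, the sketch below combines two small normalized tables into one wide table. The tables, column names, and the helper `denormalize` are hypothetical; a real warehouse would do this with database tooling rather than Python dictionaries.

```python
# Normalized (OLTP-style) tables: each customer is stored exactly once.
customers = {
    1: {"name": "Ann", "city": "Raleigh"},
    2: {"name": "Bo", "city": "Cary"},
}
orders = [
    {"order_id": 101, "customer_id": 1, "amount": 40.0},
    {"order_id": 102, "customer_id": 1, "amount": 15.0},
    {"order_id": 103, "customer_id": 2, "amount": 60.0},
]


def denormalize(order_rows, customer_table):
    """Copy customer columns onto every order row (one join, done ahead of time)."""
    wide_rows = []
    for order in order_rows:
        customer = customer_table[order["customer_id"]]
        wide_rows.append({
            **order,
            "customer_name": customer["name"],
            "customer_city": customer["city"],
        })
    return wide_rows


order_facts = denormalize(orders, customers)
for row in order_facts:
    print(row)

# Analysis now needs no join, at the cost of repeating Ann's details on two rows.
raleigh_total = sum(r["amount"] for r in order_facts if r["customer_city"] == "Raleigh")
print(raleigh_total)
```

The duplicated customer columns are the "duplicate data in some columns" trade-off: more storage and a risk of stale copies, in exchange for simpler and faster analytical queries.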
  • 32. Data Warehouse (2 of 2)
      • Contain archived data copied from transactional database
          – Can become out-of-sync if source data is updated
      • Can contain data moved from transactional system
          – Data may be unavailable for updates or viewing
  • 33. Data Mart
      • Organizational data store created in conjunction to meet needs of specific business unit
      • One-stop shop
      • Must be known, current, accurate, and well-managed (privacy and security)
  • 34. Data Set
      • Subset of a database or data warehouse
      • Usually denormalized
      • Typically related to a specific:
          – Business question
          – Business problem
          – Business unit
  • 35. Rows and Columns
      • Database
          – Rows = Records
          – Columns = Fields
      • RapidMiner
          – Rows = Examples
      • Data Warehouses and Data Sets
          – Rows = Observations, Examples, or Cases
          – Columns = Variables or Attributes
  • 36. The Data Mining Process
      • For this course, we will channel every homework assignment through the CRISP-DM process.
  • 38. Step 1: Business Understanding
      – Define the questions you want to answer.
      – Who will you work with to understand the issue?
      – Design what you are going to build.
      – Get buy-in of the problem to be solved
  • 39. Step 2: Data Understanding
      – What is the source of the data?
      – How was it collected?
      – How accurate or reliable is it?
      – What are the correct variables to collect?
  • 40. Step 3: Data Preparation
      – Join necessary data sets
      – Reduce data sets to only include pertinent variables
      – Scrub data to remove anomalies (outliers or missing data)
      – Reformat for consistency
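The four preparation tasks above can be sketched on a toy example. Everything here is an assumption made for illustration: the two data sets, the field names, and especially the `MAX_PLAUSIBLE_AMOUNT` cutoff used to flag outliers, which in practice would come from data understanding rather than a fixed constant.

```python
# Hypothetical raw inputs with typical problems: a missing value, an outlier,
# inconsistent text formatting, and a column that is not needed for the analysis.
customers = {1: {"region": "east "}, 2: {"region": "West"}, 3: {"region": "EAST"}}
sales = [
    {"customer_id": 1, "amount": "40.00", "clerk": "A"},
    {"customer_id": 2, "amount": "", "clerk": "B"},          # missing amount
    {"customer_id": 3, "amount": "9000.00", "clerk": "A"},   # implausibly large
    {"customer_id": 2, "amount": "55.50", "clerk": "C"},
]

MAX_PLAUSIBLE_AMOUNT = 1000.0  # assumed outlier threshold for this toy data


def prepare(sale_rows, customer_table):
    prepared = []
    for sale in sale_rows:
        # Join: look up the customer's region for this sale.
        region = customer_table[sale["customer_id"]]["region"]
        # Scrub: skip rows with missing data.
        if sale["amount"] == "":
            continue
        # Reformat: store the amount as a number rather than text.
        amount = float(sale["amount"])
        # Scrub: skip rows flagged as outliers.
        if amount > MAX_PLAUSIBLE_AMOUNT:
            continue
        # Reduce: keep only the pertinent variables ("clerk" is dropped),
        # and reformat region names to one consistent style.
        prepared.append({
            "customer_id": sale["customer_id"],
            "region": region.strip().upper(),
            "amount": amount,
        })
    return prepared


clean_sales = prepare(sales, customers)
for row in clean_sales:
    print(row)
```

Dropping rows is only one way to handle anomalies; whether to remove, correct, or impute them is a judgment call that depends on the business question.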
  • 41. Step 4: Modeling
      – Two types:
          • Classification (Descriptive)
          • Prediction
      – Can be overlapping (Decision Trees)
      – Note: We will spend most of our time in this step
  • 42. Step 5: Evaluation
      – Is the insight useful?
          • Should another technique be used?
      – What can be done with the results?
      – Testing for false positives
      – Human experience and operational knowledge
  • 43. Step 6: Deployment
      – Automation of model
      – Communication with end-users
      – Integration with existing systems
      – Continuous monitoring and gaining feedback for improvement (fine-tuning)
  • 44. Keys to Successful Deployment
      • Clearly communicate model’s:
          – Function
          – Utility to stakeholders
      • Thoroughly test and prove the model
      • Plan for and monitor implementation
  • 46. Summary
      • Data mining is the use of statistical and logical methods of analysis to describe large data sets and create predictive models to uncover insights
      • Databases, data warehouses, and data sets are unique kinds of digital record keeping systems with some similarities
      • Data mining is most effective on data sets extracted from OLAP rather than OLTP
      • Data is highly dimensional and has inherent risks, such as quality
      • Remember the human factor behind the manipulation of numbers and figures: ethical responsibilities
      • CRISP-DM is the most used standard method for analysis
  • 47. Copyright Information
      • “This workforce solution was funded by a grant awarded by the U.S. Department of Labor’s Employment and Training Administration. The solution was created by the grantee and does not necessarily reflect the official position of the U.S. Department of Labor. The Department of Labor makes no guarantees, warranties, or assurances of any kind, express or implied, with respect to such information, including any information on linked sites and including, but not limited to, accuracy of the information or its completeness, timeliness, usefulness, adequacy, continued availability, or ownership.”
      • Except where otherwise stated, this work by Wake Technical Community College Building Capacity in Business Analytics, a Department of Labor, TAACCCT funded project, is licensed under the Creative Commons Attribution 4.0 International License. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/