Application of predictive analytics on semi-structured North Atlantic tropical cyclone forecasts
Dr. Caroline Howard, Ph.D., Research Supervisor and Chair
Dr. Richard Livingood, Ph.D., Committee Member
Dr. Cynthia Calongne, D.CS, Committee Member
By
Michael K. Hernandez
February 2017
Final Presentation
Overview of Presentation
• Proposal Recap
• Problem Opportunity Statement
• Tropical Cyclone (TC) Lifecycle
• Three Gaps in Knowledge
• Research Question & Hypothesis
• Theoretical Framework and Lens
• Methodology
• Instrument, Sampling Procedure & Data Collection
• Findings
• Descriptive Analytics on TC Data
• Term Document Frequency over Time
• Information Gain
• Decision Trees
• Conclusions
• Implications for Practice
• Limitations
• Future Research
Proposal Recap
[Figure: Computer Science, Hurricanes, and Data Analytics, combined in this Dissertation]
Problem Opportunity Statement
Tropical cyclones (TCs) threaten global coastlines annually.
Two TCs threaten to make landfall on US coastlines annually.
General Problem
A 50% improvement in forecast accuracy is needed by 2019.
Specific Problem
Research has focused narrowly on forecasting models and in-situ data, not on applying data analytics to text data.
TC forecasting is a wicked problem: there is no “one-size-fits-all” solution.
Central Problem
This study attempts to solve one aspect of the problem, as scoped by the framing of the research question.
Tropical Cyclone (TC) Lifecycle
[Figure: TC lifecycle diagram with the stages Tropical Storm, Tropical Cyclone, Extratropical Cyclone, Dissipate, and Other; labeled proportions 100% and ~10% (Jones et al., 2003; Hart & Evans, 2001; Guishard & Evans, 2008)]
Three Gaps in the Body of Knowledge
• Gall et al. (2013) described the critical success factors for assessing improvements in Tropical Cyclone (TC) forecasting made through dynamical and ensemble forecasting models, but did not take into account other methods of big data analytics.
• Garcia, Ferraz, and Vivacqua (2009) identified that subject matter experts are not always available to verify the importance and accuracy of data-mined results.
• Corrales, Ledezma, and Corrales (2015) noted a need to extend predictive text analytics to other fields, thus deepening the body of knowledge in one vertical (data analytics).
5131 instances of explicit knowledge (over 1.35 million words) exist in the form of tropical discussions, in which hurricane specialists explain the reasoning behind National Hurricane Center TC forecasts.
Study results were evaluated from both perspectives: meteorological and big data analytics. This was accomplished by applying big data analytics to meteorological data.
Research Question
Which weather pattern components, identified by applying the C4.5 algorithm to all five-day tropical discussions from 2001-2015, can improve Atlantic TC forecast accuracy?
Hypothesis
The null hypothesis (H0) in this study is non-directional, whereas the alternative hypothesis (Ha) is directional:
• H0: There are no significant differences in the C4.5-derived weather pattern components that distinguish a successful TC forecast from an unsuccessful one.
• Ha: There are significant differences in the C4.5-derived weather pattern components that distinguish a successful TC forecast from an unsuccessful one.
Theoretical Framework & Lens
Figure 2. Research design for text mining and this study. [Diagram: Diffusion of Innovation (theoretical lens) connecting Financial Market Forecasts and Tropical Cyclone Forecasts]
Methodology
Figure 3. Research design for text mining and this study.
The design spans two phases, Text Mining and Predictive Data Analytics, organized into three stages:
Preprocessing
• Collecting raw data
• Data cleaning: removal of HTML tags
• Integrating data sets
• Data preparation: common format, addressing missing data
• Tokenization & word dictionary
• Stop-word removal
• Word normalization: stemming & case similarity
Model Creation
• Import training data
• Algorithm & feature selection
Interpretation & Evaluation
• Import testing data
• Model prediction
• Assess model: actual performance measurements
• Review accuracy: true positives, false positives, true negatives, & false negatives
• Data visualization
• Review process and determine next steps
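The Preprocessing stage can be sketched in Python. This is a minimal illustration, not the study's pipeline (which used Microsoft Visual Studio and Excel): the stop-word list, the regular expressions, and the crude suffix-stripping `naive_stem` helper are hypothetical stand-ins, and a real pipeline would use an established stemmer such as Porter's. `build_dictionary` keeps the top terms by document frequency, mirroring the roughly 1000-term threshold mentioned under Limitations.

```python
import re
from collections import Counter

# Hypothetical, abbreviated stop-word list; the study's actual list is not given.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "is", "in", "on", "at",
              "with", "for", "this", "that", "be", "are", "was"}

TAG_RE = re.compile(r"<[^>]+>")   # matches HTML tags for removal
TOKEN_RE = re.compile(r"[a-z]+")  # alphabetic tokens only (applied after lowercasing)


def naive_stem(word: str) -> str:
    """Crude suffix stripper standing in for a real stemmer (assumption)."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(raw_html: str) -> list[str]:
    text = TAG_RE.sub(" ", raw_html)                      # data cleaning: strip HTML tags
    text = text.lower()                                   # case similarity
    tokens = TOKEN_RE.findall(text)                       # tokenization
    tokens = [t for t in tokens if t not in STOP_WORDS]   # stop-word removal
    return [naive_stem(t) for t in tokens]                # word normalization: stemming


def build_dictionary(docs: list[list[str]], top_n: int = 1000) -> list[str]:
    """Word dictionary: the top_n terms ranked by document frequency."""
    doc_freq = Counter(term for doc in docs for term in set(doc))
    return [term for term, _ in doc_freq.most_common(top_n)]
```

For example, `preprocess("<p>The eyewall is contracting</p>")` returns `["eyewall", "contract"]`.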
Instrumentation
• Microsoft Visual Studio: screen-scraping tools
• Microsoft Excel: data cleaning, integrating data sets, data preparation, descriptive statistics
• WEKA: C4.5 algorithm (predictive data analytics)
Data Collection and Sampling Procedure
• Entire population of tropical discussions: 9784 instances with 2.5M words
• Stratified purposive sampling: 66.66% used for training the C4.5 algorithm and 33.34% used for testing its results
• Atlantic Ocean basin tropical discussions: 5131 instances with 1.35M words
• Atlantic Ocean basin tropical discussions are from the National Hurricane Center
• Tropical cyclone verification scores are from the National Hurricane Center
• Total verifiable tropical discussion sample: 4812 instances with 1.31M words
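A stratified 66.66%/33.34% training/testing split can be sketched as follows. This assumes the strata are the outcome classes (for example, successful and unsuccessful forecasts); the strata the study used are not spelled out here, and `stratified_split` is a hypothetical helper. Each class is shuffled and cut separately, so both subsets keep roughly the original class proportions.

```python
import random
from collections import defaultdict


def stratified_split(records, labels, train_fraction=0.6666, seed=0):
    """Split (record, label) pairs so each class keeps its share in train and test."""
    by_class = defaultdict(list)
    for record, label in zip(records, labels):
        by_class[label].append(record)

    rng = random.Random(seed)  # fixed seed for a reproducible split
    train, test = [], []
    for label, members in by_class.items():
        shuffled = list(members)  # copy so the caller's data is untouched
        rng.shuffle(shuffled)
        cut = round(len(shuffled) * train_fraction)
        train.extend((r, label) for r in shuffled[:cut])
        test.extend((r, label) for r in shuffled[cut:])
    return train, test
```

Passing a different `train_fraction` (for example 0.5 through 0.9) is one way to explore the range of splits noted under Limitations.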
Findings
• Descriptive analytics on TC data show interesting trends between initial TC intensity and forecast results, but no evidence that “two heads are better than one”: the number of forecasters made no significant difference to forecast outcomes.
• Term document frequency over time shows that token frequencies generally do not change over time, suggesting the data are homogeneous.
• Information gain identified key tokens that warrant further study.
• Decision tree results show that the study fails to reject the null hypothesis.
Descriptive Analytics on TC Data
Figure 4. Descriptive statistics showing the track and intensity classification scores.
Of the 4812 verifiable tropical discussions, approximately 60% (a & b) had better-than-average (i.e., lower) forecast error.
The stronger the initial TC intensity, the better the track forecast (c); the reverse holds for intensity forecasts (d).
The number of forecasters made no significant difference to the outcome of either track or intensity forecasts (e & f).
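The binary success/failure labels behind the ~60% figure can be illustrated with a simple rule. The rule here is an assumption: the hypothetical `label_forecasts` helper marks a forecast as successful when its error is below a baseline, defaulting to the sample's own mean error, whereas the National Hurricane Center verification baseline used in the study may be defined differently.

```python
from statistics import mean


def label_forecasts(errors, baseline=None):
    """Label each forecast 'success' if its error beats the baseline, else 'fail'.

    Assumption: the baseline defaults to the mean error of the sample itself.
    """
    if baseline is None:
        baseline = mean(errors)
    return ["success" if err < baseline else "fail" for err in errors]


def success_rate(labels):
    """Fraction of forecasts labeled 'success'."""
    return labels.count("success") / len(labels)
```

With made-up errors of 10, 20, 30, 40, and 100 (mean 40), the first three forecasts are labeled successful, giving a success rate of 0.6.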
Term Document Frequency over Time
Figure 5. Red-white-green chart of the normalized frequency of certain token words.
The tokenized words and their normalized document frequency per year show no trends in word usage.
Token frequencies were normalized per year to reduce the influence of highly active Atlantic TC seasons; for instance, 2005 was the most active TC season in recorded history.
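Per-year normalization can be expressed as the share of each year's discussions that contain a token, so a busy season that produces many more discussions does not inflate raw counts. This exact definition is an assumption (the slide does not give the formula), and `normalized_doc_frequency` and `frequency_table` are hypothetical helpers; the table they produce is the kind of term-by-year grid a red-white-green chart would color.

```python
def normalized_doc_frequency(docs_by_year, term):
    """For each year, the fraction of that year's discussions containing `term`.

    `docs_by_year` maps a year to a list of token sets (one per discussion).
    """
    result = {}
    for year, docs in sorted(docs_by_year.items()):
        hits = sum(1 for tokens in docs if term in tokens)
        result[year] = hits / len(docs) if docs else 0.0
    return result


def frequency_table(docs_by_year, terms):
    """Term -> {year -> normalized document frequency}, e.g. for a heat map."""
    return {term: normalized_doc_frequency(docs_by_year, term) for term in terms}
```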
Information Gain on Track Forecasts
Table 1: Information gain ranked scores for the track classification
* Highlighted tokens appeared in all three runs
Information Gain on Intensity Forecasts
Table 2: Information gain ranked scores for the intensity classification
* Highlighted tokens appeared in all three runs
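Information gain measures how much knowing whether a token appears in a discussion reduces the uncertainty (entropy) about the forecast outcome: the entropy of all labels minus the size-weighted entropy of the "token present" and "token absent" subsets. The sketch below is a textbook formulation for a single binary token feature, not WEKA's code; the function names are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def information_gain(docs, labels, token):
    """Information gain of the feature 'token appears in the discussion'."""
    present = [lab for doc, lab in zip(docs, labels) if token in doc]
    absent = [lab for doc, lab in zip(docs, labels) if token not in doc]
    n = len(labels)
    # Weighted entropy remaining after splitting on the token (skip empty subsets).
    remainder = sum(len(part) / n * entropy(part) for part in (present, absent) if part)
    return entropy(labels) - remainder
```

A token that perfectly separates successful from unsuccessful forecasts scores 1 bit on a balanced binary sample, while a token that appears in no discussion scores 0.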
Information Gain Summary
• Tokens ranked with non-zero information gain in all randomly sampled training data sets:
• TC eye
• reconnaissance
• TC eyewall
• eyewall replacement
• This suggests that a better understanding of these tokens is key to improving overall TC forecasts and warrants more research.
Decision Tree Summary
Table 3: Descriptive statistics for the randomly sampled C4.5 decision trees for all runs at a 90% confidence interval.
• The classification results meet the 55% threshold value for a successful classification method.
• The spread between these values is small, suggesting the method performs consistently across runs.
• The average kappa statistic is under 0.20, indicating slight to no agreement between predicted and actual classes beyond what chance would produce.
• This also shows that H0 cannot be rejected.
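Accuracy can clear a threshold while kappa stays low, because Cohen's kappa compares observed agreement with the agreement expected by chance from the row and column totals. The sketch below computes both for a 2x2 confusion matrix; `accuracy_and_kappa` is a hypothetical helper, and the counts in the note are invented, not values from Table 3.

```python
def accuracy_and_kappa(tp, fp, fn, tn):
    """Accuracy and Cohen's kappa for a 2x2 confusion matrix.

    Kappa is undefined when chance agreement equals 1 (everything in one class).
    """
    n = tp + fp + fn + tn
    observed = (tp + tn) / n  # accuracy: observed agreement

    # Chance agreement from the marginal proportions of each class.
    pred_pos = (tp + fp) / n
    actual_pos = (tp + fn) / n
    expected = pred_pos * actual_pos + (1 - pred_pos) * (1 - actual_pos)

    kappa = (observed - expected) / (1 - expected)
    return observed, kappa
```

With 45 true positives, 25 false positives, 15 false negatives, and 15 true negatives, accuracy is 0.60 (above a 55% threshold) while kappa is about 0.13, the same "passes accuracy, low kappa" pattern described above.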
Sample Decision Trees
Figure 6. C4.5 output for the first of three randomly sampled classified track & intensity outcomes (panels: Track Run #1, Intensity Run #1).
To a first approximation, TC track depends on environmental conditions and steering flow, whereas TC intensity depends on the internal dynamics of the storm.
Steering never appeared among the ranked information gain tokens for track forecasts, which may explain why the algorithm could not correctly identify (as measured by the kappa statistic) which weather components helped improve the forecasts.
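C4.5 chooses each split by gain ratio: information gain divided by the split information (the entropy of the split itself), which penalizes features that fragment the data. The sketch below shows only that selection step for binary token-presence features; it omits pruning, continuous attributes, and C4.5's restriction to candidates with at least average gain, and `gain_ratio` and `best_split` are hypothetical names. It also illustrates the point about steering: a token that never appears in the training discussions scores zero and can never be chosen as a split.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (bits) of a list of discrete values."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def gain_ratio(docs, labels, token):
    """C4.5-style gain ratio for the binary split 'token present / absent'."""
    present = [lab for doc, lab in zip(docs, labels) if token in doc]
    absent = [lab for doc, lab in zip(docs, labels) if token not in doc]
    n = len(labels)
    parts = [p for p in (present, absent) if p]
    gain = entropy(labels) - sum(len(p) / n * entropy(p) for p in parts)
    # Split information: entropy of how the documents themselves are divided.
    split_info = entropy(["present" if token in doc else "absent" for doc in docs])
    return gain / split_info if split_info > 0 else 0.0


def best_split(docs, labels, vocabulary):
    """Token a C4.5-style tree would test first at this node."""
    return max(vocabulary, key=lambda tok: gain_ratio(docs, labels, tok))
```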
Conclusions
• Failed to reject the null hypothesis:
• There are no significant differences in the C4.5-derived weather pattern components that distinguish a successful TC forecast from an unsuccessful one.
• All three gaps in the body of knowledge have been filled.
Limitations
• Known limitations
• Knowledge the hurricane specialist used in the TC analysis, whether or not it was included in the tropical discussion
• Analysis of a static 15-year snapshot of TCs in one ocean basin
• The C4.5 algorithm was the sole predictive analytics algorithm
• Emerging limitations
• The words used for stemming and tokenization were limited by a term document frequency threshold of approximately the top 1000 terms during preprocessing.
• The binary classification of forecasts, which was initially chosen to help generate simple decision trees.
• Interactions between track and intensity forecast errors could have contributed to the low kappa statistic.
• The 66.66%/33.34% training-to-testing split could have been varied to cover the wide range found in the literature (50%-90% of the entire dataset used for training).
Implications for Practice
Recommendations for Practitioners:
1. Look to tangential fields to find new, innovative ways to solve current problems.
2. Analyze results from all perspectives; this is the best approach when a project stems from multiple perspectives.
3. Account for every field of study involved when combining fields to solve a problem; otherwise, the conclusions are incomplete.
4. Apply predictive analytics processes and techniques to other weather components and phenomena, e.g., tornado forecasting.
5. Consider prioritizing projects on the four tokens (TC eye, eyewall, eyewall replacement, and the reconnaissance program) to yield a higher return on investment.
6. Create a checklist of weather components for analyzing and forecasting TCs, drawn from the 60 tokens/weather components derived in this study, to support knowledge sharing.
Future Research
1| Data analytics research:
More fields should adopt data analytics to further deepen the body of knowledge in data analytics.
2| Meteorological research:
As an immediate next step, apply the same research question and hypothesis to the remaining oceanic basins: North Eastern Pacific, North Western Pacific, North Indian, South Western Indian, South Eastern Indian, and South Western Pacific.
3| Computer science, data analytics, and meteorological research:
Future work could change the predictive text analytics algorithm; testing a different algorithm against the same dataset may yield different results that could be statistically significant.
4| Data analytics and meteorological research:
This study could act as a foundation for predictive text analytics on the TC Reanalysis project. A proposed project could analyze the text reports generated by that project to identify common issues, readjustments, and reanalyses made to the “best track” data, helping improve first-time quality in future hurricane specialists' tropical discussions.
THANK YOU
QUESTIONS

一比一原版(Harvard毕业证书)哈佛大学毕业证如何办理
 
Intelligence supported media monitoring in veterinary medicine
Intelligence supported media monitoring in veterinary medicineIntelligence supported media monitoring in veterinary medicine
Intelligence supported media monitoring in veterinary medicine
 
ViewShift: Hassle-free Dynamic Policy Enforcement for Every Data Lake
ViewShift: Hassle-free Dynamic Policy Enforcement for Every Data LakeViewShift: Hassle-free Dynamic Policy Enforcement for Every Data Lake
ViewShift: Hassle-free Dynamic Policy Enforcement for Every Data Lake
 
The Ipsos - AI - Monitor 2024 Report.pdf
The  Ipsos - AI - Monitor 2024 Report.pdfThe  Ipsos - AI - Monitor 2024 Report.pdf
The Ipsos - AI - Monitor 2024 Report.pdf
 
办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样
办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样
办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样
 
DSSML24_tspann_CodelessGenerativeAIPipelines
DSSML24_tspann_CodelessGenerativeAIPipelinesDSSML24_tspann_CodelessGenerativeAIPipelines
DSSML24_tspann_CodelessGenerativeAIPipelines
 
Learn SQL from basic queries to Advance queries
Learn SQL from basic queries to Advance queriesLearn SQL from basic queries to Advance queries
Learn SQL from basic queries to Advance queries
 
Experts live - Improving user adoption with AI
Experts live - Improving user adoption with AIExperts live - Improving user adoption with AI
Experts live - Improving user adoption with AI
 
University of New South Wales degree offer diploma Transcript
University of New South Wales degree offer diploma TranscriptUniversity of New South Wales degree offer diploma Transcript
University of New South Wales degree offer diploma Transcript
 
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
 
Population Growth in Bataan: The effects of population growth around rural pl...
Population Growth in Bataan: The effects of population growth around rural pl...Population Growth in Bataan: The effects of population growth around rural pl...
Population Growth in Bataan: The effects of population growth around rural pl...
 
End-to-end pipeline agility - Berlin Buzzwords 2024
End-to-end pipeline agility - Berlin Buzzwords 2024End-to-end pipeline agility - Berlin Buzzwords 2024
End-to-end pipeline agility - Berlin Buzzwords 2024
 
Predictably Improve Your B2B Tech Company's Performance by Leveraging Data
Predictably Improve Your B2B Tech Company's Performance by Leveraging DataPredictably Improve Your B2B Tech Company's Performance by Leveraging Data
Predictably Improve Your B2B Tech Company's Performance by Leveraging Data
 
一比一原版(Glasgow毕业证书)格拉斯哥大学毕业证如何办理
一比一原版(Glasgow毕业证书)格拉斯哥大学毕业证如何办理一比一原版(Glasgow毕业证书)格拉斯哥大学毕业证如何办理
一比一原版(Glasgow毕业证书)格拉斯哥大学毕业证如何办理
 
一比一原版(GWU,GW文凭证书)乔治·华盛顿大学毕业证如何办理
一比一原版(GWU,GW文凭证书)乔治·华盛顿大学毕业证如何办理一比一原版(GWU,GW文凭证书)乔治·华盛顿大学毕业证如何办理
一比一原版(GWU,GW文凭证书)乔治·华盛顿大学毕业证如何办理
 

Application of predictive analytics on semi-structured north Atlantic tropical cyclone forecasts (2/2017)

  • 1. Application of predictive analytics on semi-structured north Atlantic tropical cyclone forecasts
    Final Presentation by Michael K. Hernandez, February 2017
    Dr. Caroline Howard, Ph.D., Research Supervisor and Chair
    Dr. Richard Livingood, Ph.D., Committee Member
    Dr. Cynthia Calongne, D.CS, Committee Member
  • 2. Overview of Presentation
    • Proposal Recap
    • Problem Opportunity Statement
    • Tropical Cyclone (TC) Lifecycle
    • Three gaps in knowledge
    • Research Question & Hypothesis
    • Theoretical Framework and Lens
    • Methodology
    • Instrument, Sampling Procedure & Data Collection
    • Findings
      • Descriptive Analytics on TC data
      • Term Document Frequency over time
      • Information Gain
      • Decision Trees
    • Conclusions
    • Implications for Practice
    • Limitations
    • Future Research
  • 4. Problem Opportunity Statement
    • General Problem: Tropical cyclones (TCs) threaten global coastlines annually, and two TCs threaten to make landfall on US coastlines each year.
    • Specific Problem: A 50% improvement in forecast accuracy is needed by 2019.
    • Central Problem: Research has focused narrowly on forecasting models and in-situ data, not on applying data analytics to text data. TC forecasting is a wicked problem; there is no "one size fits all" solution.
    • Because of how the research question is framed, this study addresses only one aspect of the problem.
  • 5. Tropical Cyclone (TC) Lifecycle
    Diagram of TC lifecycle stages: tropical storm, tropical cyclone, extratropical cyclone, other, and dissipation, with annotations "100%" and "~10%" (Jones et al., 2003; Hart & Evans, 2001; Guishard & Evans, 2008).
  • 6. Three Gaps in the Body of Knowledge
    • Gall et al. (2013) described the critical success factors for assessing improvements in tropical cyclone (TC) forecasting through dynamical and ensemble forecasting models, but did not take other methods of big data analytics into account.
    • Garcia, Ferraz, and Vivacqua (2009) identified that subject matter experts are not always available to verify the importance and accuracy of data-mined results.
    • There is a need to add another instance of predictive text analytics in other fields, deepening the body of knowledge in one vertical, data analytics (Corrales, Ledezma, & Corrales, 2015).
    How this study addresses the gaps:
    • 5131 instances of explicit knowledge (over 1.35 million words) exist in the form of tropical discussions, which explain the reasoning behind the National Hurricane Center's TC forecasts.
    • Study results were evaluated from both perspectives: meteorology and big data analytics.
    • Applying big data analytics to meteorological data adds a new instance of predictive text analytics.
  • 7. Research Question & Hypothesis
    • Research question: Which weather pattern components, derived by applying the C4.5 algorithm to all five-day tropical discussions from 2001-2015, can improve Atlantic TC forecast accuracy?
    • The null hypothesis (H0) in this study is non-directional, whereas the alternative hypothesis (Ha) is directional:
      • H0: There are no significant differences in the C4.5 algorithm-derived weather pattern components that can distinguish between a successful and an unsuccessful TC forecast.
      • Ha: There are significant differences in the C4.5 algorithm-derived weather pattern components that can distinguish between a successful and an unsuccessful TC forecast.
  • 8. Theoretical Framework & Lens
    Figure 2. Research design for text mining and this study. The diagram places Diffusion of Innovation as the theoretical lens, spanning financial market forecasts and tropical cyclone forecasts.
  • 9. Methodology
    Figure 3. Research design for text mining and this study. The flowchart covers two phases:
    • Text mining
      • Collecting raw data
      • Data cleaning: removal of HTML tags, common format, addressing missing data
      • Integrating data sets
      • Data preparation (preprocessing): tokenization & word dictionary, stop-word removal, word normalization (stemming & case similarity)
    • Predictive data analytics
      • Model creation: import training data, algorithm & features selection
      • Model prediction: import testing data
      • Assess model: actual performance measurements; review accuracy, positives, false positives, negatives, & false negatives
      • Interpretation & evaluation: review process, data visualization, determine next steps
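The preprocessing steps in Figure 3 (HTML tag removal, case normalization, tokenization, stop-word removal, stemming, and building a word dictionary) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the study's implementation (the study used Microsoft Excel and WEKA): the stop-word list is a tiny hypothetical sample, `crude_stem` is a simplistic suffix stripper standing in for a real stemmer such as Porter's, and the hypothetical `build_dictionary` keeps the `top_n` terms by document frequency, mirroring the roughly 1000-term threshold mentioned under Limitations.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list; the study's list is not given.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "is", "are", "in", "on", "with", "for", "at", "as", "it"}

def strip_html(text):
    """Replace HTML tags left over from screen scraping with spaces."""
    return re.sub(r"<[^>]+>", " ", text)

def crude_stem(token):
    """Strip one common suffix; a stand-in for a real stemmer such as Porter's."""
    for suffix in ("ing", "ed", "es", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token

def preprocess(raw):
    """HTML removal, case normalization, tokenization, stop-word removal, stemming."""
    text = strip_html(raw).lower()
    tokens = re.findall(r"[a-z]+", text)  # alphabetic runs only; digits are dropped
    tokens = [t for t in tokens if t not in STOP_WORDS]
    return [crude_stem(t) for t in tokens]

def build_dictionary(docs, top_n):
    """Keep the top_n terms ranked by document frequency (documents containing the term)."""
    doc_freq = Counter()
    for tokens in docs:
        doc_freq.update(set(tokens))  # count each term once per document
    return [term for term, _ in doc_freq.most_common(top_n)]

print(preprocess("<p>The EYEWALL is contracting and RECONNAISSANCE data</p>"))
# ['eyewall', 'contract', 'reconnaissance', 'data']
```

A real pipeline would swap in a full stop-word list and an established stemmer; the order of the steps is the point of the sketch.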
  • 10. Instrument, Sampling Procedure & Data Collection
    • Instrumentation
      • Microsoft Visual Studio: screen-scraping tools
      • Microsoft Excel: data cleaning, integrating data sets, data preparation, descriptive statistics
      • WEKA: C4.5 algorithm (predictive data analytics)
    • Data collection and sampling procedure
      • Entire population of tropical discussions: 9784 instances with 2.5M words
      • Atlantic Ocean basin tropical discussions, from the National Hurricane Center: 5131 instances with 1.35M words
      • Tropical verification scores, also from the National Hurricane Center
      • Total verifiable tropical discussion sample: 4812 instances with 1.31M words
      • Stratified purposive sampling: 66.66% of the data was used to train the C4.5 algorithm and 33.34% to test its results
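The stratified 66.66%/33.34% split described above can be illustrated with a short Python sketch. It assumes each record carries a class label (for example, a successful or unsuccessful forecast) and splits each class separately so both subsets keep roughly the same class proportions. The function name `stratified_split`, the fixed seed, and the rounding rule are illustrative choices, not details taken from the study.

```python
import random
from collections import defaultdict

def stratified_split(records, label_of, train_fraction=0.6666, seed=0):
    """Split records into (train, test), keeping each class's share roughly equal in both."""
    rng = random.Random(seed)  # fixed seed makes the split reproducible
    by_class = defaultdict(list)
    for record in records:
        by_class[label_of(record)].append(record)

    train, test = [], []
    for label in sorted(by_class):  # sorted for a deterministic class order
        group = list(by_class[label])
        rng.shuffle(group)
        cut = round(len(group) * train_fraction)  # per-class training quota
        train.extend(group[:cut])
        test.extend(group[cut:])
    return train, test

# Hypothetical toy data: 18 "successful" (1) and 12 "unsuccessful" (0) forecasts.
records = [(i, 1 if i < 18 else 0) for i in range(30)]
train, test = stratified_split(records, label_of=lambda r: r[1])
print(len(train), len(test))  # 20 10
```

Splitting each class separately keeps the share of successful forecasts similar in the training and testing data; varying `train_fraction` is one way to explore the 50%-90% range noted under Limitations.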
  • 11. Findings
    • Descriptive analytics on TC data revealed interesting trends between initial TC intensity and forecast results, but did not show that "two heads are better than one."
    • Term document frequency over time shows that token-word frequencies generally do not change over time, indicating homogeneous data.
    • Information gain identified key tokens that warrant further study.
    • The decision tree results show that this study fails to reject the null hypothesis.
  • 12. Descriptive Analytics on TC data
    Figure 4. Descriptive statistics showing the track and intensity classification scores.
    • The stronger the initial TC intensity, the better the track forecast (c); the reverse holds for intensity forecasts (d).
    • Of the 4812 verifiable tropical discussions, approximately 60% (a & b) had a better-than-average forecast error.
    • There was no significant difference in the probability of either track or intensity forecast outcomes across different numbers of forecasters (e & f).
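One way to derive binary success/failure labels and compare success rates by number of forecasters (as in panels e & f) is sketched below. It assumes, hypothetically, that a forecast counts as successful when its error is below the mean error of the sample; the precise baseline used in the study is not given here. The functions `label_by_average` and `success_rate_by_group`, and all data, are hypothetical.

```python
from collections import defaultdict
from statistics import mean

def label_by_average(errors):
    """Label 1 (successful) when a forecast error is below the sample mean, else 0."""
    avg = mean(errors)
    return [1 if e < avg else 0 for e in errors]

def success_rate_by_group(groups, labels):
    """Share of successful forecasts within each group (e.g. number of forecasters)."""
    tally = defaultdict(lambda: [0, 0])  # group -> [successes, total]
    for group, label in zip(groups, labels):
        tally[group][0] += label
        tally[group][1] += 1
    return {g: s / n for g, (s, n) in sorted(tally.items())}

# Hypothetical forecast errors and forecaster counts for five discussions.
errors = [30, 50, 70, 90, 20]
forecasters = [1, 2, 1, 2, 1]
labels = label_by_average(errors)  # mean is 52, so labels are [1, 1, 0, 0, 1]
rates = success_rate_by_group(forecasters, labels)
```

The per-group rates are descriptive only; comparing them formally would require a significance test, for example a chi-square test of independence.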
  • 13. Term document frequency over time
    Figure 5. Red-white-green chart of the normalized frequency of certain token words.
    The tokenized words and their normalized document frequency per year show no trends in word usage. Frequencies were normalized per year to reduce the influence of highly active Atlantic TC seasons; for instance, 2005 was the most active Atlantic TC season in recorded history.
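The per-year normalization described above can be sketched as dividing, for each year, the number of discussions containing a token by the total number of discussions issued that year. This denominator is an assumption about how "normalized per year" was computed; it keeps a busy season such as 2005 from dominating the raw counts. The function name, tokens, and years below are hypothetical.

```python
from collections import Counter, defaultdict

def normalized_df_by_year(docs, vocabulary):
    """docs: iterable of (year, tokens).

    Returns {year: {term: fraction of that year's documents containing term}}.
    """
    docs_per_year = Counter()
    term_docs = defaultdict(Counter)  # year -> term -> documents containing it
    for year, tokens in docs:
        docs_per_year[year] += 1
        present = set(tokens)  # document frequency counts each term once per document
        for term in vocabulary:
            if term in present:
                term_docs[year][term] += 1
    return {
        year: {term: term_docs[year][term] / n for term in vocabulary}
        for year, n in sorted(docs_per_year.items())
    }

docs = [
    (2004, ["eye", "shear"]), (2004, ["shear"]),
    (2005, ["eye"]), (2005, ["eye", "eye"]), (2005, ["shear"]), (2005, ["trough"]),
]
table = normalized_df_by_year(docs, ["eye", "shear"])
# table[2004] == {"eye": 0.5, "shear": 1.0}; table[2005] == {"eye": 0.5, "shear": 0.25}
```

In this toy example "eye" appears in 2 of 4 discussions in 2005 and 1 of 2 in 2004: the same 0.5 share, even though the raw count doubled in the busier year.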
  • 14. Information Gain on Track Forecasts
    Table 1: Information gain ranked scores for the track classification scores. *Highlighted tokens appeared in all three runs.
  • 15. Information Gain on Intensity Forecasts
    Table 2: Information gain ranked scores for the intensity classification scores. *Highlighted tokens appeared in all three runs.
  • 16. Information Gain Summary
    • Tokens ranked with non-zero information gain in every randomly sampled training data set:
      • TC eye
      • reconnaissance
      • TC eyewall
      • eyewall replacement
    • This suggests that a better understanding of these tokens is key to improving overall TC forecasts, and that they warrant more research.
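Information gain measures how much knowing whether a token appears in a discussion reduces uncertainty about the forecast class: IG = H(class) - sum over v of P(token = v) * H(class | token = v), with entropy H in bits. The Python sketch below computes it for a binary token-presence feature. Tools such as WEKA provide this ranking (its InfoGainAttributeEval evaluator), so the code only illustrates the arithmetic; the toy labels are hypothetical.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())

def information_gain(has_token, labels):
    """Entropy of the class minus the weighted entropy after splitting on token presence."""
    total = len(labels)
    remainder = 0.0
    for value in (True, False):
        subset = [y for present, y in zip(has_token, labels) if present == value]
        if subset:  # skip empty branches to avoid dividing by zero
            remainder += len(subset) / total * entropy(subset)
    return entropy(labels) - remainder

labels = [1, 1, 0, 0]  # hypothetical: 1 = successful forecast
print(information_gain([True, True, False, False], labels))  # 1.0: token separates the classes
print(information_gain([True, False, True, False], labels))  # 0.0: token is uninformative
```

A token with non-zero information gain in every run, as in the list above, is one whose presence consistently shifts the class distribution.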
  • 17. Decision Tree Summary
    Table 3: Descriptive statistics for the randomly sampled C4.5 decision trees for all runs at a 90% confidence interval.
    • The decision trees meet the 55% threshold for a classification method to be considered successful.
    • The spread between these values is small, supporting the validity of the method.
    • The average kappa statistic is under 0.20, indicating slight to no inter-rater agreement between the classifier's predictions and the actual classes.
    • Consequently, H0 cannot be rejected.
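How accuracy can clear the 55% threshold while kappa stays below 0.20 is easy to see with Cohen's kappa, kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed accuracy and p_e is the agreement expected by chance from the class marginals. The confusion-matrix counts below are hypothetical, chosen only to show how a classifier that mostly predicts the majority class can reach 60% accuracy with a kappa near zero.

```python
def accuracy_and_kappa(tp, fp, fn, tn):
    """Return (accuracy, Cohen's kappa) for a 2x2 confusion matrix."""
    n = tp + fp + fn + tn
    observed = (tp + tn) / n
    # Chance agreement: P(pred pos) * P(actual pos) + P(pred neg) * P(actual neg).
    expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n**2
    kappa = (observed - expected) / (1 - expected)
    return observed, kappa

# Hypothetical test set of 100 forecasts, 65 of them actually successful.
accuracy, kappa = accuracy_and_kappa(tp=55, fp=30, fn=10, tn=5)
print(f"accuracy={accuracy:.2f} kappa={kappa:.3f}")  # accuracy=0.60 kappa=-0.013
```

Because the classifier labels 85 of 100 forecasts as successful, chance alone would produce 60.5% agreement, so the observed 60% adds nothing beyond chance.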
  • 18. Sample Decision Trees
    Figure 6. C4.5 output for the first of three randomly sampled classified track & intensity outcomes (panels: Track Run #1, Intensity Run #1).
    To a first approximation, TC track depends on environmental conditions and steering flow, whereas TC intensity depends on the storm's internal dynamics.
  • 19. Sample Decision Trees
    Figure 6. C4.5 output for the first of three randomly sampled classified track & intensity outcomes (panels: Track Run #1, Intensity Run #1).
    Steering never appeared in the ranked information gain for track forecasts, which helps explain the algorithm's inability (reflected in the kappa statistic) to correctly identify which weather components improved the forecasts.
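C4.5, which WEKA implements as its J48 classifier, chooses each split by gain ratio: information gain divided by the split information (the entropy of the partition itself), which penalizes features with many distinct values. The sketch below shows only that split-selection step for categorical token features; it omits C4.5's other parts (thresholds for numeric attributes, missing-value handling, the at-least-average-gain filter, and pruning). The helper names, feature names, and data are hypothetical.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())

def gain_ratio(values, labels):
    """Information gain of splitting on `values`, normalized by the split information."""
    total = len(labels)
    branches = {}
    for v, y in zip(values, labels):
        branches.setdefault(v, []).append(y)
    remainder = sum(len(b) / total * entropy(b) for b in branches.values())
    gain = entropy(labels) - remainder
    split_info = -sum((len(b) / total) * math.log2(len(b) / total) for b in branches.values())
    return gain / split_info if split_info > 0 else 0.0  # a single branch gives no split

def best_split(rows, labels, features):
    """Pick the feature with the highest gain ratio (C4.5's core split criterion)."""
    return max(features, key=lambda f: gain_ratio([row[f] for row in rows], labels))

rows = [
    {"eyewall": 1, "shear": 0},
    {"eyewall": 1, "shear": 1},
    {"eyewall": 0, "shear": 0},
    {"eyewall": 0, "shear": 1},
]
labels = [1, 1, 0, 0]  # hypothetical: 1 = successful forecast
print(best_split(rows, labels, ["eyewall", "shear"]))  # eyewall
```

A full tree repeats this choice recursively on each branch; when no token separates the classes well, as the low kappa values suggest here, the resulting tree classifies little better than chance.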
  • 20. Conclusions
    • Failed to reject the null hypothesis: there are no significant differences in the C4.5 algorithm-derived weather pattern components that can distinguish between a successful and an unsuccessful TC forecast.
    • All three gaps in the body of knowledge have been filled.
  • 21. Limitations
    • Known limitations
      • Knowledge that the hurricane specialist used in the TC analysis, whether or not it was included in the tropical discussion
      • Analysis of a static 15-year snapshot of TCs in one ocean basin
      • The C4.5 algorithm was the sole predictive analytics algorithm
    • Emerging limitations
      • The words used for stemming and tokenization came from term document frequency thresholds covering approximately the top 1000 terms during preprocessing.
      • Forecasts were classified as binary outcomes, a choice initially made to help generate simple decision trees.
      • Interactions between track and intensity forecast errors could have contributed to the low kappa statistic.
      • The training-to-testing ratio of 66.66% to 33.34% could have been varied to cover the wide range reported in the literature (50%-90% of the dataset used for training).
  • 22. Implications for Practice
    Recommendations for practitioners:
    1. Look to tangential fields for new, innovative ways to solve current problems.
    2. Analyze results from all perspectives; this is the best approach for a project that spans multiple perspectives.
    3. Account for every field of study involved when combining fields to solve a problem; otherwise, the conclusions are incomplete.
    4. Apply predictive analytics processes and techniques to other weather components and phenomena, e.g., tornado forecasting.
    5. Consider prioritizing projects on the four tokens (TC eye, eyewall, eyewall replacement, and the reconnaissance program) to yield a higher return on investment.
    6. Create a checklist of weather components for analyzing and forecasting TCs, based on the 60 tokens/weather components derived from this study, to support knowledge sharing.
  • 23. Future Research
    1| Data analytics research: More fields need to adopt data analytics to deepen the body of knowledge in data analytics.
    2| Meteorological research: As an immediate next step, apply the same research question and hypothesis to the remaining ocean basins: North Eastern Pacific, North Western Pacific, North Indian, South Western Indian, South Eastern Indian, and South Western Pacific.
    3| Computer science, data analytics, and meteorological research: Change the predictive text analytics algorithm; testing a different algorithm against the same dataset may yield different results that could be statistically significant.
    4| Data analytics and meteorological research: This study could serve as a foundation for predictive text analytics on the TC Reanalysis project. A proposed project could analyze the project's text reports to identify common issues, readjustments, and reanalyses made to the "best track" data, helping improve first-time quality in future hurricane specialists' tropical discussions.

Editor's Notes

  1. Image credits: globe image provided for free at https://www.iconfinder.com/icons/285647/globe_icon#size=512; monitor image provided for free at https://www.iconfinder.com/icons/473802/business_chart_computer_data_finance_graph_statistics_icon#size=512; bar chart in donut chart image provided for free at https://www.iconfinder.com/icons/1312833/analysis_business_data_office_seo_work_icon#size=512; images of TC Sinlaku from the CIRA satellite website: 09/13/2008/0830Z.
     Gall, R., Franklin, J., Marks, F., Rappaport, E. N., & Toepfer, F. (2013). The hurricane forecast improvement project. Bulletin of the American Meteorological Society, 94(3), 329–343. https://doi.org/10.1175/BAMS-D-12-00071.1
     McAdie, C. J., & Lawrence, M. B. (2000). Improvements in tropical cyclone track forecasting in the Atlantic basin, 1970-98. Bulletin of the American Meteorological Society, 81(5), 989.
     Rittel, H. W., & Webber, M. M. (1973). Dilemmas in a general theory of planning. Policy Sciences, 4(2), 155–169.
     Sheets, R. C. (1990). The National Hurricane Center: Past, present, and future. Weather and Forecasting, 5(2), 185–232.
     Zhao, K., Lin, Q., Lee, W., Sun, Y. Q., & Zhang, F. (2016). Doppler radar analysis of triple eyewalls in Typhoon Usagi (2013). Bulletin of the American Meteorological Society, 97(1), 25–30. https://doi.org/10.1175/BAMS-D-15-00029.12
  2. Images of TC Sinlaku from the CIRA website: 09/08/2008/1230Z, 09/13/2008/0830Z, 09/19/2008/1830Z, and 09/22/2008/1713Z (Hart & Evans, 2001; Jones et al., 2003; Guishard, 2006).
     Guishard, M. P., & Evans, J. L. (2008). Atlantic subtropical storms. Part II: Climatology. Journal of Climate, 22, 3574–3594. Retrieved from http://moe.met.fsu.edu/~rhart/papers-hart/2009GuishardEvansHart.pdf
     Hart, R., & Evans, J. (2001). A climatology of the extratropical transition of Atlantic tropical cyclones. Journal of Climate, 14, 546–564. https://doi.org/10.1175/1520-0442(2001)014<0546:ACOTET>2.0.CO;2
     Jones, S. C., Harr, P. A., Abraham, J., Bosart, L. F., Bowyer, P. J., Evans, J. L., Hanley, D. E., Hanstrum, B. N., Hart, R. E., Lalaurette, F., Sinclair, M. R., Smith, R. K., & Thorncroft, C. (2003). The extratropical transition of tropical cyclones: Forecast challenges, current understanding and future directions. Weather and Forecasting, 18, 1052–1092.
  3. Gall, R., Franklin, J., Marks, F., Rappaport, E. N., & Toepfer, F. (2013). The hurricane forecast improvement project. Bulletin of the American Meteorological Society, 94(3), 329–343. https://doi.org/10.1175/BAMS-D-12-00071.1
     Garcia, A. C. B., Ferraz, I., & Vivacqua, A. S. (2009). From data to knowledge mining. Artificial Intelligence for Engineering Design, Analysis and Manufacturing, 23(4), 427–441.
     Corrales, D. C., Ledezma, A., & Corrales, J. C. (2015). A conceptual framework for data quality in knowledge discovery tasks (FDQ-KDT): A proposal. Journal of Computers, 10(6), 396–405. https://doi.org/10.17706/jcp.10.6.396-405
  4. Ahiaga-Dagbui, D. D., & Smith, S. D. (2014). Rethinking construction cost overruns: Cognition, learning and estimation. Journal of Financial Management of Property and Construction, 19(1), 38–54. https://doi.org/10.1108/JFMPC-06-2013-0027
     Angadi, M. C., & Kulkarni, A. P. (2015). Time series data analysis for stock market prediction using data mining techniques with R. International Journal of Advanced Research in Computer Science, 6(6), 104–108.
     Barak, S., & Modarres, M. (2015). Developing an approach to evaluate stocks by forecasting effective features with data mining methods. Expert Systems with Applications, 42(3), 1325–1339. https://doi.org/10.1016/j.eswa.2014.09.028
     Corrales, D. C., Ledezma, A., & Corrales, J. C. (2015). A conceptual framework for data quality in knowledge discovery tasks (FDQ-KDT): A proposal. Journal of Computers, 10(6), 396–405. https://doi.org/10.17706/jcp.10.6.396-405
     Fayyad, U., Piatetsky-Shapiro, G., & Smyth, P. (1996). From data mining to knowledge discovery in databases. AI Magazine, 17(3), 37–54.
     Feinerer, I., Hornik, K., & Meyer, D. (2008). Text mining infrastructure in R. Journal of Statistical Software, 25(5), 1–54. Retrieved from http://epub.wu.ac.at/3978/
     Gera, M., & Goel, S. (2015). Data mining: Techniques, methods and algorithms: A review on tools and their validity. International Journal of Computer Applications, 113(18), 22–29.
     Hashimi, H., & Hafez, A. (2015). Selection criteria for text mining approaches. Computers in Human Behavior, 51, 729–733. https://doi.org/10.1016/j.chb.2014.10.062
     He, W., Zha, S., & Li, L. (2013). Social media competitive analysis and text mining: A case study in the pizza industry. International Journal of Information Management, 33, 464–472. https://doi.org/10.1016/j.ijinfomgt.2013.01.001
     Hoonlor, A. (2011). Sequential patterns and temporal patterns for text mining. UMI Dissertation Publishing.
     Kim, Y., Jeong, S. R., & Ghani, I. (2014). Text opinion mining to analyze news for stock market prediction. International Journal of Advances in Soft Computing and Its Applications, 6(1), 1–13.
     Mandrai, P., & Barskar, R. (2014). A survey of conceptual data mining and applications. International Journal of Computer Science and Information Security, 11(5), 17–23.
     Miranda, S. (n.d.). An introduction to social analytics: Concepts and methods.
     Nassirtoussi, K. A., Aghabozorgi, S., Ying Wah, T., & Ngo, D. C. L. (2014). Text mining for market prediction: A systematic review. Expert Systems with Applications, 41(16), 7653–7670. https://doi.org/10.1016/j.eswa.2014.06.009
     Pletscher-Frankild, S., Pallejà, A., Tsafou, K., Binder, J. X., & Jensen, L. J. (2015). DISEASES: Text mining and data integration of disease-gene associations. Methods, 74, 83–89. https://doi.org/10.1016/j.ymeth.2014.11.022
     Sharma, D. M., Sharma, A. K., & Sharma, S. A. (2012). Using data mining for prediction: A conceptual analysis. Journal on Information Technology, 2(1), 1–9.
     Thanh, H. T. P., & Meesad, P. (2014). Stock market trend prediction based on text mining of corporate web and time series data. Journal of Advanced Computational Intelligence and Intelligent Informatics, 18(1), 22–31.
  5. Extratropical Cyclone Michael with Tropical Storm Nadine