2. Applications of Data Mining to Predict Mesoscale Weather Events (Tornadoes and
Cloudbursts)
http://www.iaeme.com/IJCET/index.asp 21 editor@iaeme.com
years, and is thus not expected to improve in the near future until new technological
advances are developed.
Figure 1 Nationwide tornado warning verification statistics from 1986–2007, as well as NWS
goals for new storm-based warnings beginning in 2008: probability of detection (black line with
circles), false alarm ratio (red line with squares) and lead time (blue line), with future goals
(same colours, dotted lines). [Data courtesy of B. MacAloney II, National Weather Service
Performance Branch, 2008]
2. RELATED WORK
2.1. Predicting Tornadoes by Applying Data Mining Techniques
As described in [1], much of Amy McGovern's research as an associate professor in the
School of Computer Science at the University of Oklahoma has aimed to revolutionize
the prediction of tornadoes and other forms of severe weather. McGovern has pursued this goal
using artificial intelligence techniques, data mining, machine learning and storm simulations.
This research emphasizes that radars provide an incomplete picture of the atmosphere.
Although they can sense the intensity of precipitation and a single component of
the wind vector (the radial velocity), many other variables that are important to prediction,
such as the full three-dimensional wind field, pressure and temperature, are not observed
[2]. McGovern has developed a unique set of simulations of supercell thunderstorms,
which are the most severe type of thunderstorm and produce the most destructive tornadoes.
McGovern's models can identify spatiotemporal relationships between regions within
these simulated storms that can be used to predict severe weather events. Novel
data mining models have been developed that exploit the spatiotemporal nature of
the data, because neither space nor time can be ignored in weather prediction.
Weather is three-dimensional, and the models can identify arbitrary shapes and the
relationships between those shapes. In [3] McGovern et al. developed spatiotemporal
models and applied them to severe weather data. These models address
both spatial and spatiotemporal changes in the data using a relational approach. In
their work, the authors also developed a set of high-resolution simulations capable of
resolving tornadoes.
In [4], V. Lakshmanan, Gregory J. Stumpf and Arthur Witt developed a Mesocyclone
Detection Algorithm (MDA) and a near-storm environment (NSE) algorithm at the
National Severe Storms Laboratory. The MDA identifies the storm-scale
circulations that are precursors to tornadoes. Marzban and Stumpf in [5] and [6]
3. Miss Gurbrinder Kaur
developed a neural network based on the MDA parameters to identify which of the
circulations would be tornadic, using a small set of cases [5]. That work was
extended to cover 43 storm days in [7] using a more robust methodology. The neural
networks developed in [7] (with both MDA and MDA+NSE inputs) achieve
similar Heidke skill scores on the training, validation and independent data sets. The
low variability of the Receiver Operating Characteristic (ROC) curves in [7] also
suggests that these neural networks are robust and not overtrained.
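The Heidke skill score used to compare these networks, and the probability of detection (POD) and false alarm ratio (FAR) plotted in Figure 1, are all computed from a 2x2 contingency table of forecasts against observations. The Python sketch below uses the standard textbook definitions of these scores. The class and function names are illustrative, and the counts in the example are hypothetical, not taken from [5] or [7].

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Contingency:
    """2x2 verification table for yes/no tornado forecasts."""
    hits: int           # a: forecast yes, observed yes
    false_alarms: int   # b: forecast yes, observed no
    misses: int         # c: forecast no, observed yes
    correct_nulls: int  # d: forecast no, observed no


def probability_of_detection(t: Contingency) -> float:
    """POD = a / (a + c): fraction of observed events that were forecast."""
    return t.hits / (t.hits + t.misses)


def false_alarm_ratio(t: Contingency) -> float:
    """FAR = b / (a + b): fraction of 'yes' forecasts that did not verify."""
    return t.false_alarms / (t.hits + t.false_alarms)


def heidke_skill_score(t: Contingency) -> float:
    """HSS = 2(ad - bc) / [(a + c)(c + d) + (a + b)(b + d)].

    1 means a perfect forecast; 0 means no better than random chance.
    """
    a, b, c, d = t.hits, t.false_alarms, t.misses, t.correct_nulls
    return 2 * (a * d - b * c) / ((a + c) * (c + d) + (a + b) * (b + d))


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    table = Contingency(hits=50, false_alarms=20, misses=10, correct_nulls=920)
    print(f"POD = {probability_of_detection(table):.3f}")  # POD = 0.833
    print(f"FAR = {false_alarm_ratio(table):.3f}")         # FAR = 0.286
    print(f"HSS = {heidke_skill_score(table):.3f}")        # HSS = 0.753
```

Because HSS corrects for the hits and correct nulls expected by chance, it is more informative than raw accuracy when, as with tornadoes, "no" cases vastly outnumber "yes" cases.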
In [8], Indra Adrianto, Theodore B. Trafalis and Valliappa Lakshmanan use
support vector machines (SVMs) to predict the location and time of tornadoes. They
extended the work of Lakshmanan et al. [7] to a set of 33 storm days and
introduced several variations on that approach. The objective of the research was to
estimate the probability of a tornado event at a particular location within a given time
window. They presented a least-squares methodology to estimate shear, quality control
of radar reflectivity, morphological image processing to estimate gradients, fuzzy
logic to generate compact measures of tornado possibility, and SVM
classification to generate the final spatiotemporal probability field. The results
suggest that the approach might increase tornado warning lead time, since it
estimates the probability that there will be a tornado at a particular spatial location in
the next 30 minutes, whereas the average lead time of tornado warnings issued by the
National Weather Service is currently 18 minutes. The results were thus promising.
The authors note that more spatial inputs could be considered, and that other classification
methods such as Bayesian SVMs and Bayesian neural networks may improve the results.
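The least-squares shear estimate mentioned above rests on a simple idea: across a rotating circulation, the Doppler radial velocity changes from inbound on one side to outbound on the other, so the slope of radial velocity with respect to azimuthal distance measures rotation. The Python sketch below is a one-dimensional simplification written for illustration; the method actually used in [8] may differ, and the function names and sample velocities are hypothetical.

```python
import math
from typing import List, Sequence


def arc_distances_m(range_m: float, azimuths_deg: Sequence[float],
                    center_deg: float) -> List[float]:
    """Convert radar azimuths at a fixed range into signed arc distances (m)
    from a centre azimuth, handling the 0/360 degree wrap-around."""
    out = []
    for az in azimuths_deg:
        delta = (az - center_deg + 180.0) % 360.0 - 180.0
        out.append(range_m * math.radians(delta))
    return out


def azimuthal_shear(distances_m: Sequence[float],
                    radial_velocity_ms: Sequence[float]) -> float:
    """Least-squares slope dV/ds (s^-1) of radial velocity against
    azimuthal distance, i.e. a 1-D linear least-squares derivative."""
    n = len(distances_m)
    if n != len(radial_velocity_ms) or n < 2:
        raise ValueError("need at least two paired samples")
    mean_s = sum(distances_m) / n
    mean_v = sum(radial_velocity_ms) / n
    sxx = sum((s - mean_s) ** 2 for s in distances_m)
    if sxx == 0.0:
        raise ValueError("distances must not all be equal")
    sxv = sum((s - mean_s) * (v - mean_v)
              for s, v in zip(distances_m, radial_velocity_ms))
    return sxv / sxx


if __name__ == "__main__":
    # Hypothetical gates across a rotating couplet: inbound (negative)
    # velocities on one side, outbound (positive) on the other.
    s = [-1000.0, -500.0, 0.0, 500.0, 1000.0]
    v = [-20.0, -10.0, 0.0, 10.0, 20.0]
    print(azimuthal_shear(s, v))  # 0.02 (s^-1)
    # Gates straddling north, at 10 km range:
    print(arc_distances_m(10_000.0, [359.0, 0.0, 1.0], center_deg=0.0))
```

Fitting a line over several gates, rather than differencing two neighbouring gates, makes the estimate less sensitive to noise in any single velocity value.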
2.2. Application of Data Mining In Predicting Cloudburst Formation
There is no satisfactory technique for anticipating the occurrence of cloudbursts
because of their small scale. A very fine network of radars would be required to
detect the likelihood of a cloudburst, and this would be prohibitively expensive. Only
the areas likely to receive heavy rainfall can be identified on a short-range scale. A
real-life case of a cloudburst has been analysed by Kavita in [9] using the k-means
clustering data mining technique. The study observed that a very large region of high
relative humidity is an early signal of cloudburst formation. The research also
demonstrates the derivation of sub-grid-scale weather systems from NWP model output
products; such signals cannot be obtained with the conventional Model Output Statistics
(MOS) technique. The study demonstrated that intelligent systems can be a good
alternative where MOS is unstable. Data mining, especially clustering applied to
divergence and relative humidity, can provide an early indication of cloudburst
formation. This study is an effort towards providing timely and actionable information
about these events using data mining techniques as a supplement to NWP models, which
can be of great benefit to society.
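The clustering step described for [9] can be illustrated with a plain k-means (Lloyd's algorithm) sketch in Python. This is not the procedure or the data of [9]: the choice of two features (divergence and relative humidity), their standardisation, the deterministic initialisation and the six grid points below are all illustrative assumptions.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def standardize(points: Sequence[Point]) -> List[Point]:
    """Scale each feature to zero mean and unit variance, so that features
    with very different ranges (divergence vs. %RH) weigh comparably."""
    cols = list(zip(*points))
    means = [sum(c) / len(c) for c in cols]
    stds = [math.sqrt(sum((x - m) ** 2 for x in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]
    return [tuple((x - m) / s for x, m, s in zip(p, means, stds)) for p in points]


def kmeans(points: Sequence[Point], k: int,
           max_iter: int = 100) -> Tuple[List[int], List[Point]]:
    """Lloyd's k-means with deterministic initialisation (first k points)."""
    centroids = [tuple(p) for p in points[:k]]
    labels: List[int] = []
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda j: math.dist(p, centroids[j]))
                  for p in points]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append(
                    tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centroids.append(centroids[j])  # keep an empty cluster in place
        if new_centroids == centroids:  # converged: assignments are stable
            break
        centroids = new_centroids
    return labels, centroids


if __name__ == "__main__":
    # Synthetic grid points: (divergence in 1e-5 s^-1, relative humidity in %).
    # Negative divergence means low-level convergence.
    grid = [(2.0, 45.0), (1.5, 50.0), (2.5, 40.0),
            (-3.0, 95.0), (-2.5, 92.0), (-3.5, 98.0)]
    labels, _ = kmeans(standardize(grid), k=2)
    print(labels)  # [0, 0, 0, 1, 1, 1]
```

Here the moist, convergent points fall into one cluster. In a real study, each cluster would then be inspected to see whether it marks the kind of large humid region that [9] reports as an early signal of cloudburst formation.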
3. PRINCIPLES AND METHODOLOGY OF WEATHER
FORECASTING
3.1. Ensemble Forecasting
A forecast is an estimate of the future state of the atmosphere. It is created by estimating
the current state of the atmosphere using observations, and then calculating how this
state will evolve in time using a numerical weather prediction computer model. As the
atmosphere is a chaotic system, very small errors in its initial state can lead to large
errors in the forecast. This means that we can never create a perfect forecast system
because we can never observe each detail of the atmosphere's initial state. Tiny errors in
the initial state will be amplified, so there is always a limit to how far ahead we can
predict any detail. To test how these small differences in the initial conditions may
affect the outcome of the forecast, an ensemble system can be used to produce many
forecasts. Instead of running just a single forecast, the computer model is run a number
of times from slightly different starting conditions. The complete set of forecasts is
referred to as the ensemble, and individual forecasts within it as ensemble members.
Figure 2 Schematic of how the ensemble samples the uncertainty in the forecast.
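The ensemble idea can be demonstrated with the three-variable convection model of Lorenz [10]. The Python sketch below perturbs an "analysis" state with tiny random errors, integrates every ensemble member with a fourth-order Runge-Kutta scheme, and records the ensemble spread. The parameters sigma = 10, rho = 28 and beta = 8/3 are the classic Lorenz values; the ensemble size, perturbation size, time step and function names are arbitrary choices made for illustration.

```python
import math
import random
from typing import List, Tuple

State = Tuple[float, float, float]


def lorenz63(s: State, sigma: float = 10.0, rho: float = 28.0,
             beta: float = 8.0 / 3.0) -> State:
    """Time derivatives of the Lorenz (1963) system."""
    x, y, z = s
    return (sigma * (y - x), x * (rho - z) - y, x * y - beta * z)


def rk4_step(s: State, dt: float) -> State:
    """Advance one step with the classical fourth-order Runge-Kutta scheme."""
    k1 = lorenz63(s)
    k2 = lorenz63(tuple(a + 0.5 * dt * b for a, b in zip(s, k1)))
    k3 = lorenz63(tuple(a + 0.5 * dt * b for a, b in zip(s, k2)))
    k4 = lorenz63(tuple(a + dt * b for a, b in zip(s, k3)))
    return tuple(a + dt / 6.0 * (b1 + 2 * b2 + 2 * b3 + b4)
                 for a, b1, b2, b3, b4 in zip(s, k1, k2, k3, k4))


def spread(members: List[State]) -> float:
    """RMS distance of the ensemble members from the ensemble mean."""
    n = len(members)
    mean = [sum(m[i] for m in members) / n for i in range(3)]
    return math.sqrt(sum((m[i] - mean[i]) ** 2
                         for m in members for i in range(3)) / n)


def run_ensemble(analysis: State, n_members: int, perturbation: float,
                 n_steps: int, dt: float = 0.01, seed: int = 0) -> List[float]:
    """Perturb the analysis, integrate every member, return spread per step."""
    rng = random.Random(seed)
    members = [tuple(v + rng.gauss(0.0, perturbation) for v in analysis)
               for _ in range(n_members)]
    history = [spread(members)]
    for _ in range(n_steps):
        members = [rk4_step(m, dt) for m in members]
        history.append(spread(members))
    return history


if __name__ == "__main__":
    spreads = run_ensemble((1.0, 1.0, 1.0), n_members=10,
                           perturbation=1e-3, n_steps=2000)
    for step in (0, 500, 1000, 1500, 2000):
        print(f"t = {step * 0.01:5.1f}  spread = {spreads[step]:.4g}")
```

The spread starts at the size of the initial perturbations and eventually grows to the size of the attractor itself, at which point the members are no more alike than randomly chosen states. That is the practical predictability limit Lorenz identified.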
The motivation for ensemble forecasting goes back to the studies of Lorenz
[10], who examined the effect of initial-state uncertainties, the well-known butterfly effect.
Lorenz's study showed that no matter how good the observations are, or how
good the forecasting techniques, there is almost certainly an insurmountable limit to
how far into the future one can forecast. In multimodel ensemble forecasting, a major issue
is the removal of the collective errors of the member models. The major drawback of the
straight-average approach, which assigns an equal weight of 1.0 to each model, is that it
may include several poor models, and averaging in these poor models degrades the
overall result. To address this problem of ensemble forecasting, Krishnamurti et al. [11], [12]
introduced a multimodel superensemble technique that showed a major
improvement in prediction skill.
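The superensemble of [11] and [12] replaces equal weights with weights fitted over a training period. The forecast is the observed training-period mean plus a weighted sum of each model's anomaly from its own training-period mean, S = O_bar + sum_i a_i (F_i - F_bar_i), where the weights a_i come from a least-squares regression of observed anomalies on model anomalies. The Python sketch below implements that formula for a single scalar series with a small hand-written solver. It is a simplified illustration, not the operational scheme; the three model series are synthetic, and the third is deliberately poor.

```python
from typing import List, Sequence, Tuple


def mean(xs: Sequence[float]) -> float:
    return sum(xs) / len(xs)


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a small dense linear system by Gaussian elimination
    with partial pivoting."""
    n = len(b)
    m = [list(row) + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: model anomalies are collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def train_superensemble(forecasts: Sequence[Sequence[float]],
                        observed: Sequence[float]
                        ) -> Tuple[float, List[float], List[float]]:
    """Fit weights a_i by least squares on anomalies over the training period."""
    model_means = [mean(f) for f in forecasts]
    obs_mean = mean(observed)
    anoms = [[v - mu for v in f] for f, mu in zip(forecasts, model_means)]
    obs_anom = [o - obs_mean for o in observed]
    k = len(forecasts)
    # Normal equations (A^T A) a = A^T y; column i of A holds model i's anomalies.
    ata = [[sum(x * y for x, y in zip(anoms[i], anoms[j])) for j in range(k)]
           for i in range(k)]
    aty = [sum(x * y for x, y in zip(anoms[i], obs_anom)) for i in range(k)]
    return obs_mean, model_means, solve(ata, aty)


def superensemble(obs_mean: float, model_means: Sequence[float],
                  weights: Sequence[float], new_forecasts: Sequence[float]) -> float:
    """S = O_bar + sum_i a_i * (F_i - F_bar_i)."""
    return obs_mean + sum(w * (f - mu)
                          for w, f, mu in zip(weights, new_forecasts, model_means))


if __name__ == "__main__":
    f1 = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
    f2 = [2, 1, 4, 3, 6, 5, 8, 7, 10, 9]
    f3 = [5, 9, 1, 7, 3, 8, 2, 6, 4, 10]  # a poor model
    # Synthetic "observations" built from models 1 and 2 only.
    obs = [20 + 0.6 * (a - 5.5) + 0.4 * (b - 5.5) for a, b in zip(f1, f2)]
    o_bar, means, weights = train_superensemble([f1, f2, f3], obs)
    print(weights)  # approximately 0.6, 0.4 and 0.0
    print(mean([f1[0], f2[0], f3[0]]), obs[0])  # straight average vs. observation
    print(superensemble(o_bar, means, weights, [f1[0], f2[0], f3[0]]))
```

The fitted weight of the poor model is close to zero, and the bias of the models relative to the observations is removed through O_bar, which is why the superensemble can beat the straight average.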
3.2. Observation and Assimilation of Observational Data
Observations are essential to the process of creating forecasts. A huge number
of observations recording atmospheric conditions around the world is received
every day. The main current sources of observations are surface and marine data,
satellites, weather balloons (radiosondes) and aircraft. To use these observations in an
operational weather forecasting system, forecasting centres have to monitor their
availability, quality-control them and process them into a form that can be used by the
computer models and forecasters. Even with the many observations received,
we do not have enough information to tell us what the atmosphere is doing at all
points on and above the Earth's surface. There are large areas of ocean, inaccessible
regions on land and remote levels in the atmosphere where we have very few, or no,
observations. To fill in the 'gaps', we can combine the observations we do have with
forecasts of what we expect the conditions in the atmosphere to be. This process,
called data assimilation, gives us our best estimate of the current state of the
atmosphere, which is the first step in producing a weather forecast. Without data
assimilation, any attempt to produce reliable forecasts is almost certain to end in failure.
Data assimilation research is focused on making the best use of observations using
advanced variational and ensemble data assimilation techniques.
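At its simplest, data assimilation weights a background forecast and an observation according to their error variances. The Python sketch below shows this for a single grid-point value, assuming unbiased and uncorrelated errors (the scalar form shared by optimal interpolation and the Kalman filter update). Operational variational and ensemble schemes generalise the same idea to millions of variables with spatially correlated errors. The names and numbers here are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Estimate:
    value: float     # e.g. temperature in degrees C
    variance: float  # error variance of that value


def assimilate(background: Estimate, observation: Optional[Estimate]) -> Estimate:
    """Blend a background (short-range forecast) with an observation.

    The gain K = var_b / (var_b + var_o) favours whichever source has the
    smaller error variance. Where there is no observation (a 'gap'), the
    background is kept unchanged.
    """
    if observation is None:
        return background
    gain = background.variance / (background.variance + observation.variance)
    value = background.value + gain * (observation.value - background.value)
    return Estimate(value, (1.0 - gain) * background.variance)


if __name__ == "__main__":
    forecast = Estimate(value=20.0, variance=4.0)  # background: error std 2 C
    obs = Estimate(value=22.0, variance=1.0)       # observation: error std 1 C
    print(assimilate(forecast, obs))   # value close to 21.6, variance close to 0.8
    print(assimilate(forecast, None))  # no observation: background unchanged
```

The analysis lies closer to the more accurate observation, and its error variance is smaller than that of either input, which is why combining both sources gives the best estimate of the current state.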
3.3. Numerical Weather Prediction Model
The numerical weather prediction (NWP) process involves assimilation of
observations to provide the starting conditions for a numerical weather forecast
model. The model is essentially a computer simulation of the processes in the Earth's
atmosphere, land surface and oceans which affect the weather. Once current weather
conditions are known, the changes in the weather are predicted by the model. Even
tiny changes in the atmospheric conditions can lead to drastically different weather
patterns after only a short time, so it is vital that the current state of the atmosphere is
represented as accurately as possible. This process is highly mathematical and takes
the supercomputer longer to accurately estimate the current atmospheric state than it
does to actually make the forecast. Weather forecasting entails predicting how the
present state of the atmosphere will change. Present weather conditions are obtained
from ground observations and from satellites, ships, aircraft, buoys, balloons
and weather stations covering the entire planet. This includes information from over
the oceans, from the surface (ships and buoys), from high in the atmosphere
(satellites) and below the oceans (a network of special floats called Argo). Creating
forecasts is a complex process which is constantly being updated. “Weather forecasts
made for 12 and 24 hours are typically quite accurate. Forecasts made for two and
three days are usually good. But beyond about five days, forecast accuracy falls off
rapidly.” The rate of data generation and storage far exceeds the rate of data analysis.
This represents lost opportunities in terms of scientific insights not gained and
impacts or adaptation strategies not adequately informed.
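The NWP cycle described above (start from an analysed initial state, then repeatedly step the governing equations forward in time) can be illustrated with a deliberately tiny toy model. The Python sketch below advects a single quantity along one dimension with a constant wind, using the first-order upwind finite-difference scheme on a periodic domain. It illustrates only time stepping and the CFL stability limit; it is a hypothetical stand-in, not a representation of the full three-dimensional physics of a real NWP model.

```python
from typing import List, Sequence


def upwind_step(field: Sequence[float], wind_ms: float,
                dx_m: float, dt_s: float) -> List[float]:
    """One step of 1-D linear advection, dq/dt + u dq/dx = 0, with the
    first-order upwind scheme on a periodic domain (valid for u >= 0)."""
    courant = wind_ms * dt_s / dx_m
    if not 0.0 <= courant <= 1.0:
        raise ValueError("need 0 <= u*dt/dx <= 1 (CFL condition, u >= 0)")
    # For i = 0, field[i - 1] is field[-1]: the periodic boundary.
    return [field[i] - courant * (field[i] - field[i - 1])
            for i in range(len(field))]


def run_model(initial: Sequence[float], wind_ms: float, dx_m: float,
              dt_s: float, n_steps: int) -> List[float]:
    """Start from the analysed initial state and step the model forward."""
    state = list(initial)
    for _ in range(n_steps):
        state = upwind_step(state, wind_ms, dx_m, dt_s)
    return state


if __name__ == "__main__":
    analysis = [0.0, 0.0, 1.0, 0.0, 0.0, 0.0]  # hypothetical initial blob
    # Courant number 10 m/s * 100 s / 1000 m = 1: the blob moves one cell per step.
    print(run_model(analysis, wind_ms=10.0, dx_m=1000.0, dt_s=100.0, n_steps=2))
    # [0.0, 0.0, 0.0, 0.0, 1.0, 0.0]
    # Courant number 0.5: slower movement, smeared by numerical diffusion.
    print(run_model(analysis, wind_ms=5.0, dx_m=1000.0, dt_s=100.0, n_steps=2))
    # [0.0, 0.0, 0.25, 0.5, 0.25, 0.0]
```

Even this toy shows two constraints that real models face: the time step is limited by the grid spacing and wind speed, and the numerical scheme itself introduces errors on top of any errors in the initial state.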
3.4. The Synoptic and Mesoscale Weather Phenomena
The synoptic scale in meteorology describes large-scale weather systems with
horizontal dimensions of the order of 1000 kilometres or more. It corresponds to
extratropical weather events that occur in low-pressure areas, e.g. extratropical
cyclones. The term “mesoscale” is believed to have been introduced by Ligda [13],
in a review of the use of weather radar, to describe phenomena smaller than the
synoptic scale but larger than the “microscale”, a term that was widely used at the
time (and still is) for phenomena having a scale of a few kilometres or less. Several
weather events associated with small-scale disturbances, previously regarded as noise
in daily weather analyses, became the focal point of storm researchers, as in a
microscale study by Fujita [14]. Meanwhile, the U.S. Weather Bureau defined the
mesoscale as ranging from 10 to 100 mi (roughly 16 to 160 km), which led to the
publication of mesometeorological studies such as the study of squall lines by
Fujita [15]. Further, Fujita [16] found that the diameter of tornadoes rarely exceeds
1000 m.
Figure 3 Typical time and space scales of atmospheric motion (Source: DTU, Technical
University of Denmark)
Figure 4 From large-scale to small-scale forecasts (Source: Mesoscale meteorological
modelling, Technical University of Denmark)
4. CONCLUSION
While forecasters can identify conditions favorable for major tornado outbreaks several
days in advance, short-term forecasting of individual storms, providing additional
advance notice, and predicting probable tornado paths remain a challenge. Because of
these limitations, weather forecasters strongly need to incorporate additional
information to develop a better understanding of tornado formation.
ACKNOWLEDGEMENT
The author would like to express her deepest gratitude to her guide, Dr. Rattan K.
Datta, former Advisor, Department of Science & Technology, Government of India,
and currently Director, Mohyal Educational Research Institute of Technology, for his
encouragement, guidance and mentoring. Without his support, it would not have been
possible to take up research in this challenging field.
REFERENCES
[1] McGovern, A. and Barto, A. G. Autonomous Discovery of Temporal
Abstractions from Interaction with an Environment. Poster presentation at the
Symposium on Abstraction, Reformulation, and Approximation (SARA 2002),
Volume 2371/2002, 2002, pp. 338–339.
[2] McGovern, A., Hiers, N., Collier, M., Gagne II, D. J. and Brown, R. A.
Spatiotemporal Relational Probability Trees. Proceedings of the 2008 IEEE
International Conference on Data Mining, Pisa, Italy, 15–19 December 2008, pp.
935–940.
[3] McGovern, A., Gagne II, D. J., Troutman, N., Brown, R. A., Basara, J. and
Williams, J. Using Spatiotemporal Relational Random Forests to Improve our
Understanding of Severe Weather Processes. Statistical Analysis and Data
Mining, special issue on the best of the 2010 NASA Conference on Intelligent
Data Understanding. 4(4), 2011, pp. 407–429.
[4] Lakshmanan, V., Rabin, R. and DeBrunner, V. Multiscale storm identification
and forecast. Atmospheric Research, 67–68, 2003a, pp. 367–380.
[5] Lakshmanan, V., Hondl, K., Stumpf, G. and Smith, T. Quality control of weather
radar data using texture features and a neural network, in 5th International
Conference on Advances in Pattern Recognition. Kolkata, India, IEEE, 2003b.
[6] Lakshmanan, V., Adrianto, I., Smith, T. and Stumpf, G. A spatiotemporal
approach to tornado prediction, in Proceedings of 2005 IEEE International Joint
Conference on Neural Networks. Montreal, Canada, 3, 2005a, pp. 1642–1647.
[7] Lakshmanan, V., Stumpf, G. and Witt, A. A neural network for detecting and
diagnosing tornadic circulations using the mesocyclone detection and near storm
environment algorithms, in 21st International Conference on Information
Processing Systems. San Diego, CA, American Meteorological Society, CD-ROM,
J5.2, 2005b.
[8] Adrianto, I., Trafalis, T. B. and Lakshmanan, V. Support vector machines for
spatiotemporal tornado prediction. International Journal of General Systems,
38(7), 2009, pp. 759–776.
[9] Pabreja, K. and Datta, R. K. A data warehousing and data mining approach for
analysis and forecast of cloudburst events using OLAP-based data hypercube. Int.
J. of Data Analysis Techniques and Strategies, 4(1), 2012, pp. 57–82.
[10] Lorenz, E. N. Deterministic nonperiodic flow. J. Atmos. Sci., 20, 1963, pp.
130–141.
[11] Krishnamurti, T. N., Kishtawal, C. M., LaRow, T., Bachiochi, D., Zhang, Z.,
Williford, C. E., Gadgil, S. and Surendran, S. Improved weather and seasonal
climate forecasts from multimodel superensemble. Science, 285, 1999, pp. 1548–
1550, doi:10.1126/science.285.5433.1548.
[12] Krishnamurti, T. N., Kishtawal, C. M., Zhang, Z., LaRow, T., Bachiochi, D.,
Williford, C. E., Gadgil, S. and Surendran, S. Multimodel ensemble forecasts for
weather and seasonal climate. J. Clim., 13, 2000, pp. 4196–4216,
doi:10.1175/1520-0442(2000)013<4196:MEFFWA>2.0.CO;2.
[13] Ligda, M. G. H. Radar storm observation. Compendium of Meteorology, Malone,
T. F., ed., Amer. Meteor. Soc., 1951, pp. 1265–1282.
[14] Fujita, T. T. Proposed mechanism of tornado formation from rotating
thunderstorms, 1973.
[15] Climatological Data, National Summary, 4, 6, 1953, p. 181. Fujita, T.
Microanalytical study of thundernose. Geoph. Mag. of Japan, 22(2), 1950, pp.
71–88.
[16] Fujita, T. T. Analytical mesometeorology: A review. Meteor. Monogr., 5(27),
Amer. Meteor. Soc., 1963, pp. 77–125.