THE ROLE OF DATA MINING IN THE PREDICTION OF WINDSTORM DAMAGES N. O. Nawari, Ph.D., P.E., M.ASCE, Kent State University, College of Architecture and Environmental Design, Taylor Building, Kent, OH 44242 Email: firstname.lastname@example.org
Abstract Prediction and assessment of hurricane damage to buildings involves many sources of uncertain data, which makes it a difficult task for conventional prediction models. These data are imprecise and multidimensional in nature; some hidden relationships within the data can be retrieved only through comprehensive data analysis techniques such as data mining. Data mining employs algorithms drawn from statistics, fuzzy logic, genetic algorithms, mathematics, and artificial intelligence. A large number of these algorithms seek relationships within datasets from which rules of some kind can be derived and subsequently used for design prediction, classification, or other functions. This work discusses the role of data mining algorithms in the prediction and classification of damage due to hurricane forces. The research focuses on a conceptual framework for data mining models to assist in the prediction and assessment of building damage caused by hurricanes. Different data mining models defined in MS SQL Server 2005 were examined; in particular, algorithms were chosen that are among the simplest examples of these groups of models, namely Association Rules, Decision Trees, Naïve Bayes, Clustering, and Neural Networks.
INTRODUCTION The areas along the United States Gulf and Atlantic coasts where most of this country's windstorm-related fatalities have occurred are also now experiencing the country's most significant population growth. This situation, combined with continued building along the coast, will lead to serious problems for many areas during hurricanes. Because people will likely always be attracted to living along the shoreline, a solution to the problem lies in proper engineering design and mitigation. OBJECTIVES The objective of this study is to establish the importance of data mining models in assisting the prediction and assessment of building damage due to tropical cyclones. The presentation below offers a conceptual perspective on the role of data mining algorithms in supporting changes and improvements to building codes and standards, to achieve higher performance and safety measures for residential buildings.
UNCERTAINTIES A windstorm is a very complicated phenomenon. It consists of air and water in turbulent flow, which means that the motion of individual air or water particles is so erratic that, in studying a storm, one must be concerned with statistical distributions of speeds and directions rather than with simple averages or fixed physical quantities. For an analytical model, storm forces can be classified as one or a combination of:
- Wind pressure
- Windborne debris
- Falling objects
- Flood pressure
- Rain forces
The resistance of buildings to wind pressures has been the subject of considerable research and is addressed by building codes. However, the normal design loads specified in these codes are substantially lower than those that occur during a windstorm. This is due to the many sources of uncertainty involved in the computational model.
The ASCE 7 provisions describe a computational method for wind pressure using a number of coefficients that require considerable judgment: which pressure coefficients to use, how to determine tributary areas for cladding and framing elements, and whether building elements should be designed as part of the main wind force resisting system or as components and cladding. The corners, edges, and eave overhangs of a building are subjected to complicated forces as a windstorm passes these obstructions, causing higher localized suction forces that are not considered appropriately in building codes. In addition, there is no computational model or standard test protocol in the industry for the critical structural elements that addresses storm pressures generated by hurricanes or tornadoes.
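To make the gap between code-level and hurricane-level loads concrete, the sketch below evaluates the ASCE 7 velocity pressure expression, qz = 0.00256 Kz Kzt Kd V² (in psf, with V in mph). The coefficient values chosen here are illustrative assumptions only, not code-prescribed values for any particular site or exposure.

```python
def velocity_pressure_psf(v_mph, kz=0.85, kzt=1.0, kd=0.85):
    """Velocity pressure qz (psf) in the ASCE 7 form
    qz = 0.00256 * Kz * Kzt * Kd * V^2, with V in mph.
    Coefficient defaults are illustrative assumptions."""
    return 0.00256 * kz * kzt * kd * v_mph ** 2

# A 90 mph design-level wind versus a 130 mph hurricane gust:
q_design = velocity_pressure_psf(90.0)
q_storm = velocity_pressure_psf(130.0)
ratio = q_storm / q_design  # pressure grows with V squared
```

Because pressure scales with the square of wind speed, a gust roughly 45% faster than the design wind more than doubles the load, illustrating why code-level design loads can be substantially exceeded in a hurricane.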
DATA MINING MODELS Data mining is the process of extracting valid, authentic, and meaningful relationships from large quantities of data. It involves uncovering patterns in the data and is often tied to data warehousing because it attempts to make large amounts of data actionable. Data mining employs algorithms that are a mixture of statistics, fuzzy logic, genetic algorithms, and artificial intelligence. Building a mining model is part of a larger process that includes everything from defining the basic problem that the model will solve to deploying the model into a working environment. This process can be defined by the following basic steps:
- Define the problem
- Prepare the data
- Define the models
- Validate and explore
- Deploy and update the models
The following diagram shows the steps involved in a typical data mining project. Figure 1 - Data mining components
A mining model is defined by a data mining structure object, a data mining model object, and a data mining algorithm. Microsoft SQL Server 2005 Analysis Services (SSAS) provides several algorithms for use in data mining solutions: Decision Trees, Clustering, Association Rules, Naïve Bayes, and Neural Networks. The following is a brief illustration of these algorithms. Decision Trees The decision tree is a classification and regression algorithm for discrete or continuous attributes; it makes predictions based on the relationships between input columns in a dataset (Figure 2). It uses the values, or states, of those columns to predict the states of a column that is designated as predictable. Figure 2 - Decision Tree Diagram
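A decision tree of the kind described above can be sketched as a sequence of splits on input attributes. The tree below is hand-built for illustration; its split thresholds, attribute names, and damage labels are assumptions, not values mined from real hurricane data or produced by the SSAS algorithm.

```python
# A hand-built two-level decision tree: split first on wind speed,
# then on roof construction type. All thresholds are illustrative.

def predict_damage(wind_speed_mph, roof_type):
    """Walk the tree from root to a leaf and return a damage label."""
    if wind_speed_mph < 110:
        return "Minimal"
    # high-wind branch: roof type becomes the deciding attribute
    if roof_type == "gable":
        return "Extensive"  # gable roofs assumed more vulnerable here
    return "Moderate"

cases = [(95, "hip"), (130, "gable"), (130, "hip")]
predictions = [predict_damage(v, r) for v, r in cases]
```

A trained tree would learn such splits from data by choosing, at each node, the input column whose states best separate the predictable column.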
Association Rules The association rules algorithm is a mining mechanism for finding correlations between different attributes in a dataset. The most common application of this kind of algorithm is creating association rules, which can be used in forecast analysis. Association models are built on datasets that contain identifiers both for individual cases and for the items that the cases contain. Naïve Bayes This algorithm calculates the conditional probability between input and predictable columns, and assumes that the columns are independent. It is based on the simplifying hypothesis that when you evaluate column A as a predictor for target columns B1, B2, and so on, you can disregard dependencies between these target columns.
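The Naïve Bayes calculation can be illustrated in a few lines: each class is scored by its prior probability multiplied by the conditional probability of each input state, treating the input columns as independent. The toy records, attribute names, and class labels below are invented for illustration only.

```python
from collections import Counter

# Each toy case: (roof_type, siding, observed_damage_class)
cases = [
    ("gable", "vinyl", "high"), ("gable", "brick", "high"),
    ("gable", "vinyl", "high"), ("hip", "brick", "low"),
    ("hip", "vinyl", "low"), ("hip", "brick", "low"),
]

def naive_bayes(roof, siding):
    """Score each class by P(class) * P(roof|class) * P(siding|class),
    assuming the input columns are conditionally independent."""
    scores = {}
    class_counts = Counter(c for _, _, c in cases)
    for cls, n in class_counts.items():
        p = n / len(cases)                                        # prior
        p *= sum(1 for r, _, c in cases if c == cls and r == roof) / n
        p *= sum(1 for _, s, c in cases if c == cls and s == siding) / n
        scores[cls] = p
    return max(scores, key=scores.get)

prediction = naive_bayes("gable", "vinyl")
```

Despite the independence assumption rarely holding exactly, this simple product of conditionals often gives a usable classifier and is cheap to compute over large tables.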
Clustering The Microsoft Clustering algorithm is a segmentation algorithm that uses iterative techniques to group data cases into clusters with similar characteristics. These groupings are useful for exploring data, identifying anomalies in the data, and creating predictions. Clustering models identify relationships in a dataset that might not be derived logically through normal observation. The Microsoft Clustering algorithm first identifies relationships in a dataset and then generates a series of clusters based on those relationships. A scatter plot is a useful way to visually represent how the algorithm groups data, as shown in the following diagram. Figure 4 - Cluster groups diagram
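The iterative grouping idea can be sketched with a minimal k-means-style loop over one-dimensional damage ratios: repeatedly assign each point to its nearest center, then move each center to the mean of its assigned points. The data values and initial centers below are invented for illustration.

```python
# Minimal iterative segmentation (k-means style) of 1-D observations.

def kmeans_1d(points, centers, iterations=10):
    """Alternate between assigning points to the nearest center and
    recomputing each center as the mean of its assigned points."""
    groups = [[] for _ in centers]
    for _ in range(iterations):
        groups = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)),
                          key=lambda i: abs(p - centers[i]))
            groups[nearest].append(p)
        centers = [sum(g) / len(g) if g else c
                   for g, c in zip(groups, centers)]
    return centers, groups

# Illustrative per-building damage ratios; two seed centers.
damage_ratios = [0.04, 0.06, 0.08, 0.55, 0.60, 0.62]
centers, groups = kmeans_1d(damage_ratios, centers=[0.0, 1.0])
# The points separate into a low-damage and a high-damage cluster.
```

Such a segmentation could, for example, separate lightly damaged from heavily damaged buildings without any predefined damage thresholds.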
Neural Network The Microsoft Neural Network algorithm creates classification and regression mining models by constructing a Multilayer Perceptron network of neurons. In this Multilayer Perceptron network, each neuron receives one or more inputs and produces one or more identical outputs. Similar to the Microsoft Decision Trees algorithm, the Neural Network algorithm calculates probabilities for each possible state of the input attribute when given each state of the predictable attribute. These probabilities can be used to predict an outcome of the predicted attribute, based on the input attributes.
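A forward pass through a tiny multilayer perceptron shows how each neuron combines its inputs, as described above. The weights, biases, and input values here are arbitrary illustrative numbers, not trained parameters; a real network would learn them from damage records.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def layer(inputs, weights, biases):
    """Each neuron outputs sigmoid(w . inputs + b)."""
    return [sigmoid(sum(w * x for w, x in zip(ws, inputs)) + b)
            for ws, b in zip(weights, biases)]

# Two inputs -> two hidden neurons -> one output neuron.
inputs = [0.8, 0.2]   # e.g. normalized wind speed, building age (assumed)
hidden = layer(inputs, [[1.0, -1.0], [0.5, 0.5]], [0.0, -0.1])
output = layer(hidden, [[1.2, -0.7]], [0.1])[0]
# output is a probability-like score in (0, 1)
```

Training would adjust the weights so that this output score tracks the probability of each state of the predictable attribute given the inputs.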
Analysis The relationships between the common sizes and geometric shapes of concrete, steel, timber, and masonry residential buildings, their gravity and lateral load-resisting systems, the intensity of the storm, and the degree of damage can be analyzed using data mining techniques to provide a supportive damage prediction system. The output prediction vector is based on the damage categories specified by FEMA 320. These categories are modified to distinguish between envelope and structural damage. In contrast to FEMA 320, ten damage categories are proposed in this study:
(1) Minimal: No real structural damage; minor building envelope damage may occur (less than 5%).
(2) Low Moderate: 6%-10% of the roof and other envelope components are damaged.
(3) Moderate: 11%-20% of the roof and other envelope components are damaged.
(4) High Moderate: More than 20% of the roof and other envelope components are damaged.
(5) Low Extensive: Less than 10% structural damage, along with envelope damage.
(6) Extensive: Less than 20% structural damage, along with envelope damage.
(7) Low Extreme: Extensive damage to many envelope components (21%-30%) accompanied by structural damage (21%-30%).
(8) Extreme: Extensive damage to many envelope components (31%-50%) accompanied by structural damage (31%-50%) that may result in complete building failure.
(9) Very Extreme: Extensive damage to many envelope components (51%-60%) accompanied by structural damage (51%-60%) that may result in complete building failure.
(10) Catastrophic: Envelope damage is extensive and widespread (> 60%); structural damage is considerable, with complete or near-complete building failure.
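The ten categories above can be encoded as a simple lookup from envelope and structural damage percentages, which is one way the prediction output vector could be labeled. The thresholds follow the category list; how ties at band edges are broken is an assumption of this sketch.

```python
# Map (envelope damage %, structural damage %) to the ten proposed
# categories. Band-edge behavior (strict ">" at each boundary) is an
# assumption of this sketch, not specified by the category list.

def damage_category(envelope_pct, structural_pct):
    """Return the category number 1 (Minimal) .. 10 (Catastrophic)."""
    if envelope_pct > 60:
        return 10  # Catastrophic
    if structural_pct > 50:
        return 9   # Very Extreme
    if structural_pct > 30:
        return 8   # Extreme
    if structural_pct > 20:
        return 7   # Low Extreme
    if structural_pct > 10:
        return 6   # Extensive
    if structural_pct > 0:
        return 5   # Low Extensive
    # envelope-only damage below this point
    if envelope_pct > 20:
        return 4   # High Moderate
    if envelope_pct > 10:
        return 3   # Moderate
    if envelope_pct > 5:
        return 2   # Low Moderate
    return 1       # Minimal
```

In a mining model, a discrete column holding this category number would serve as the predictable attribute.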
CONCLUSIONS The relationship between windstorms and the building damage cited above exposes many uncertainties in current building design and construction practices. The application of data mining techniques provides a supportive tool to handle uncertainty and to discover hidden relationships and rules that assist in classifying, predicting, and associating different building damage and windstorm patterns. The system could also be instrumental in updating building codes and standards. From the severity of the destruction observed in recent hurricanes and tornadoes, it is apparent that building codes and standards need to re-address windstorm-resistive systems.