Heat Vulnerability Indexes for Urban Environments Using the
Mahalanobis Taguchi System
Danton Zhao
Advisors: Professor Lindsey Van Wagenen & Professor Michel Lobenberg
INTRODUCTION
The Mahalanobis Taguchi System (MTS) is a multivariate statistical method that combines the
Mahalanobis distance with testing based on Taguchi orthogonal arrays. The Mahalanobis distance
measures how far a sample deviates from the normal (training) group. The scaled distances are
computed from the training group's own mean and covariance, so the mean of the training group's
Mahalanobis distances is approximately equal to 1. Taguchi orthogonal arrays are two-level
matrices that specify which variables to keep or turn off in each of several variations of an
experiment. These arrays increase testing efficiency by reducing the number of trials needed to
classify each variable as useful or detrimental to distinguishing the groups. MTS combines the
two: the variables essential to the Mahalanobis distance are identified from the signal-to-noise
ratios (SNRs) produced by the Taguchi testing.
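The following minimal Python sketch makes the two-level idea concrete. The poster's own code was written in R and is not shown. The sketch builds the standard L8(2^7) orthogonal array and checks its defining property: every pair of columns contains each on/off combination equally often. The coding 1 = variable on, 2 = variable off follows the usual MTS convention and is an assumption here. The poster does not say which array size was used.

```python
from itertools import combinations, product

def l8_array():
    """Build an L8(2^7) two-level orthogonal array.

    Rows are trials, columns are variables; 1 = variable on, 2 = variable off.
    Entry (r, c) is 1 + parity(r AND c) for the seven nonzero 3-bit masks c.
    """
    rows = []
    for r in range(8):
        rows.append([1 + bin(r & c).count("1") % 2 for c in range(1, 8)])
    return rows

def is_orthogonal(array):
    """True if every pair of columns contains each level pair equally often."""
    n_cols = len(array[0])
    for a, b in combinations(range(n_cols), 2):
        counts = {pair: 0 for pair in product((1, 2), repeat=2)}
        for row in array:
            counts[(row[a], row[b])] += 1
        if len(set(counts.values())) != 1:
            return False
    return True

L8 = l8_array()
print(len(L8), "trials x", len(L8[0]), "variables; orthogonal:", is_orthogonal(L8))
```

With this array, seven variables are screened in 8 trials instead of the 2^7 = 128 trials of a full factorial design.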
Previous research into climate change has shown that the frequency and risk of heat-related
illnesses will rise alongside temperatures. However, these cases are not evenly distributed
geographically; some areas will be more at risk than others[2]. Many studies have attempted to
create indexes for this risk by analyzing historical health and climate data[3]. An extensive
case study of vulnerability in New York City communities, covering 2000–2011, built a composite
vulnerability index from Z-scores[1]. These Z-scores were drawn from climate and health
variables, and also from geographical and socioeconomic ones. The Mahalanobis distance resembles
the Z-score in that it assigns a metric to the deviation of data. It differs by taking a
multivariate approach that accounts for correlations among the variables. This could allow a heat
vulnerability index to be built more efficiently from a broader selection of data sources.
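This minimal Python sketch, using made-up numbers, shows why the multivariate approach matters. Two tracts can have identical per-variable Z-scores and still have very different Mahalanobis distances when the variables are correlated. It uses the closed-form inverse of a 2×2 correlation matrix. The correlation value 0.8 is hypothetical.

```python
def mahalanobis_sq_2d(z1, z2, r):
    """Squared Mahalanobis distance of a standardized point (z1, z2)
    for two variables with correlation r (|r| < 1).

    The inverse of [[1, r], [r, 1]] is [[1, -r], [-r, 1]] / (1 - r^2).
    """
    return (z1 * z1 - 2 * r * z1 * z2 + z2 * z2) / (1 - r * r)

r = 0.8  # hypothetical correlation between two tract-level variables
for z1, z2 in [(1.0, 1.0), (1.0, -1.0)]:
    d2 = mahalanobis_sq_2d(z1, z2, r)
    print(f"Z-scores ({z1}, {z2}): squared MD = {d2:.2f}, scaled by k = 2: {d2 / 2:.2f}")
```

With r = 0.8, the point (1, 1) follows the correlation pattern and gets a squared distance of about 1.11. The point (1, -1) breaks the pattern and gets 10, even though every Z-score in both points has magnitude 1.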
REFERENCES
[1] Jaime Madrigano, Kazuhiko Ito, Sarah Johnson, Patrick L. Kinney, and Thomas Matte. A Case-Only Study of
Vulnerability to Heat Wave Related Mortality in New York City (2000–2011). Environ Health Perspect 123;
doi:10.1289/ehp.1408178.
[2] G. Brooke Anderson and Michelle L. Bell. Heat Waves in the United States: Mortality Risk during Heat Waves and
Effect Modification by Heat Wave Characteristics in 43 U.S. Communities. Environ Health Perspect 119:210–218;
doi:10.1289/ehp.1002313.
[3] California Environmental Public Tracking Network (2014-06-24). Heat-related inpatient hospitalizations and
emergency room visits among California residents, May–September, 2000–2012.
[4] Quantum GIS Development Team (2016). Quantum GIS Geographic Information System. Open Source Geospatial
Foundation Project. http://qgis.osgeo.org
[5] R Core Team (2016). R: A Language and Environment for Statistical Computing. R Foundation for Statistical
Computing, Vienna, Austria.
[6] MATLAB and Statistics Toolbox Release 2012b. The MathWorks, Inc., Natick, Massachusetts, United States.
METHODOLOGY
Collecting Data
Member tracts of the normal (low vulnerability) and outside groups were identified by their colors, which were
extracted from the raster image provided by the New York City case study[1]. Using 112 control points, the
raster image was georeferenced (mapped onto and aligned with) the 2010 New York City TIGER shapefile. The primary
color of each census tract was found with QGIS's Zonal Statistics plugin, which identified the most frequently
occurring pixel value within each of the Red, Green, and Blue bands. Tracts colored blue for low vulnerability
were classified as the normal group, while tracts colored red or orange were classified as the outside group.
Many previous studies on heat vulnerability used health data from state repositories. The resource and time
constraints of this project prevented us from using such data. Instead, publicly available geographical and
socioeconomic data were collected and processed.
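The tract classification step can be sketched as follows. This is a minimal Python illustration, not the QGIS workflow itself. It takes a per-band majority value, mimicking a zonal "majority" statistic on each of the Red, Green, and Blue bands. It then assigns the resulting color to the nearest legend color. The legend RGB values and the nearest-color rule are hypothetical placeholders; the poster gives neither.

```python
from statistics import mode

# Hypothetical legend colors (RGB); the case-study raster's real palette is not given.
LEGEND = {
    "blue": (49, 54, 149),
    "yellow": (255, 255, 191),
    "orange": (253, 174, 97),
    "red": (215, 48, 39),
}
# Blue tracts form the normal group, red/orange the outside group; others are excluded.
GROUP_OF_COLOR = {"blue": "normal", "orange": "outside", "red": "outside"}

def modal_color(pixels):
    """Most frequent value in each of the R, G, B bands, computed separately."""
    reds, greens, blues = zip(*pixels)
    return (mode(reds), mode(greens), mode(blues))

def nearest_legend_color(rgb):
    """Legend color closest to rgb by squared Euclidean distance in RGB space."""
    return min(LEGEND, key=lambda name: sum((a - b) ** 2 for a, b in zip(rgb, LEGEND[name])))

def classify_tract(pixels):
    """Return 'normal', 'outside', or None for a tract's list of (R, G, B) pixels."""
    return GROUP_OF_COLOR.get(nearest_legend_color(modal_color(pixels)))

blue_tract = [(49, 54, 149)] * 6 + [(255, 255, 255)] * 2  # mostly blue, some white border pixels
print(classify_tract(blue_tract))
```

Because each band's mode is taken independently, the combined color can in principle mix values from different pixels. Snapping it to the nearest legend color is one simple way to absorb that.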
RESULTS
CONCLUSION AND FUTURE GOALS
• No separation in Mahalanobis distances between the normal and outside groups could be identified with the
selected variables.
• The negative SNRs from the Taguchi arrays support this finding.
• The findings are inconclusive; more data, most likely medical data, will be needed.
• More complete datasets may be gathered in the future to increase the number of eligible census tracts.
• Crime data may be an interesting topic to explore in relation to heat vulnerability.
Collecting Data (cont.)
Greenery coverage was found to be an important variable for differentiating at-risk from not-at-risk
communities[1]. Trees classified in "Good" or "Excellent" health were imported into RStudio from the 2005
Street Tree Census comma-separated values (CSV) file hosted by NYC Open Data. Approximately 86% of all trees
in Staten Island were not listed with a borough census tract code. To preserve the integrity of the dataset,
Staten Island trees were removed from the table along with the other 17,573 unlisted trees. The tree count,
mean tree diameter at breast height (DBH), and standard deviation of tree DBH for each listed tract code were
aggregated into a data frame.
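The tree filtering and aggregation can be sketched in Python as follows. This is not the author's R code. The column names (`status`, `borough`, `boro_ct`, `dbh`) and the sample rows are hypothetical placeholders, not the real Street Tree Census schema. Tracts with a single tree get no standard deviation, much as R's `sd()` returns `NA` for one value.

```python
import csv
import io
from collections import defaultdict
from statistics import mean, stdev

def aggregate_trees(rows):
    """Per-tract tree count, mean DBH, and DBH standard deviation.

    Keeps trees in 'Good' or 'Excellent' health and drops Staten Island trees
    and trees without a tract code. Column names are hypothetical.
    """
    dbh_by_tract = defaultdict(list)
    for row in rows:
        if row["status"] not in ("Good", "Excellent"):
            continue
        if row["borough"] == "Staten Island" or not row["boro_ct"]:
            continue
        dbh_by_tract[row["boro_ct"]].append(float(row["dbh"]))
    return {
        tract: {
            "tree_count": len(d),
            "mean_dbh": mean(d),
            "sd_dbh": stdev(d) if len(d) > 1 else None,  # stdev needs two or more values
        }
        for tract, d in dbh_by_tract.items()
    }

# Made-up sample rows for illustration only.
SAMPLE_CSV = """status,borough,boro_ct,dbh
Good,Brooklyn,3001502,10
Excellent,Brooklyn,3001502,14
Poor,Brooklyn,3001502,30
Good,Staten Island,5000900,12
Good,Queens,,8
Good,Queens,4000100,20
"""
summary = aggregate_trees(csv.DictReader(io.StringIO(SAMPLE_CSV)))
print(summary)
```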
On the colored vulnerability index map, the at-risk communities appear to be located primarily in highly
urbanized neighborhoods. Gentrification has somewhat shifted the demographics of these neighborhoods, but we
still wanted to test whether this geographical data played a significant role in vulnerability. Tax lot data
was extracted from the November 2011 file (11v2) in the NYC PLUTO archive. To simplify merging datasets, we
wrote an R script that outputs the corresponding 7-digit borough tract code given a tract's county and tract
code. From this data, we extracted several variables describing the area committed to certain land use
categories in each census tract into a separate data frame. The number of buildings, the area of land allocated
to hospitals, and the area of land allocated to schools were among the variables tested.
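A tract-code helper of the kind described above can be sketched in Python as follows. This is a hypothetical stand-in for the author's R script. It assumes the inputs are the county FIPS code and the 6-digit census tract code (tract 15.02 written as 001502). It also assumes the output is the borough digit (1 Manhattan, 2 Bronx, 3 Brooklyn, 4 Queens, 5 Staten Island) followed by that 6-digit tract code.

```python
# County FIPS code -> NYC borough digit.
BOROUGH_OF_COUNTY = {"061": "1", "005": "2", "047": "3", "081": "4", "085": "5"}

def borough_tract_code(county_fips, tract):
    """Return the 7-digit borough tract code: borough digit + 6-digit tract code."""
    county = str(county_fips).zfill(3)
    if county not in BOROUGH_OF_COUNTY:
        raise ValueError(f"not a New York City county: {county_fips!r}")
    tract = int(tract)
    if not 0 <= tract <= 999999:
        raise ValueError(f"tract code out of range: {tract}")
    return BOROUGH_OF_COUNTY[county] + f"{tract:06d}"

print(borough_tract_code("047", 1502))  # Kings County (Brooklyn), tract 15.02
```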
Socioeconomic data was extracted from the 2011 American Community Survey (ACS). Referring back to the colored
map, the at-risk communities also appear to be located in low-income areas. Variables such as per capita
income, unemployment count, and travel time to work were among those aggregated to test socioeconomic impact.
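Once every source is keyed by the same borough tract code, combining them is an inner join. The minimal Python sketch below shows why incomplete sources shrink the number of eligible tracts: only tracts present in every table survive. The table contents are made up. In R, `merge()` performs an inner join by default (`all = FALSE`).

```python
def merge_by_tract(*tables):
    """Inner-join tables keyed by 7-digit borough tract code.

    Each table maps tract code -> {variable: value}. Only tracts present in
    every table are kept; if two tables share a variable name, the later wins.
    """
    common = set(tables[0])
    for table in tables[1:]:
        common &= set(table)
    merged = {}
    for tract in sorted(common):
        row = {}
        for table in tables:
            row.update(table[tract])
        merged[tract] = row
    return merged

# Made-up values for illustration only.
trees = {"3001502": {"tree_count": 2}, "4000100": {"tree_count": 1}}
pluto = {"3001502": {"n_buildings": 40}, "1000100": {"n_buildings": 12}}
acs = {"3001502": {"per_capita_income": 21000}, "4000100": {"per_capita_income": 30000}}
merged = merge_by_tract(trees, pluto, acs)
print(merged)
```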
METHODOLOGY (CONT.)
Writing the Program for Analysis
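For a data matrix whose rows are samples (census tracts) and whose columns are variables, the Mahalanobis
distance of the i-th row is canonically expressed in MTS as:
$$MD_i = \frac{1}{k}\, z_i\, C^{-1} z_i^{\mathsf{T}}$$
where k is the number of columns/variables being tested, z_i is the i-th row of Z, the standardized version of
the matrix, and C is the covariance matrix of Z (for the normal group, this is the correlation matrix of the
original variables).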
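The normal group's mean distance is close to 1 by construction, and this follows directly from the definition above. The short derivation below assumes that Z (n rows, k columns) is standardized with the normal group's own means and sample standard deviations (n − 1 denominator), so that C = ZᵀZ / (n − 1). It uses the cyclic property of the trace.

```latex
\begin{aligned}
\sum_{i=1}^{n} MD_i
  &= \frac{1}{k}\sum_{i=1}^{n} z_i\,C^{-1} z_i^{\mathsf{T}}
   = \frac{1}{k}\operatorname{tr}\!\Big(C^{-1}\sum_{i=1}^{n} z_i^{\mathsf{T}} z_i\Big)
   = \frac{1}{k}\operatorname{tr}\!\big(C^{-1} Z^{\mathsf{T}} Z\big) \\
  &= \frac{1}{k}\operatorname{tr}\!\big(C^{-1}(n-1)\,C\big)
   = \frac{(n-1)\,k}{k} = n-1,
  \qquad
  \overline{MD} = \frac{1}{n}\sum_{i=1}^{n} MD_i = \frac{n-1}{n} \approx 1 .
\end{aligned}
```

This identity does not require the data to be Gaussian. Gaussianity matters instead for the distribution of individual distances: for large n, k·MD_i is approximately chi-square with k degrees of freedom.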
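In R, arithmetic between a matrix and a one-dimensional vector recycles the vector down the columns, so the Nth
element of the vector is applied to the Nth row of the matrix. To standardize an incoming numerical data frame or
matrix column by column, each column mean and standard deviation must therefore be explicitly applied to its own
column (for example, by transposing the matrix around the arithmetic, or with R's sweep() or scale() functions).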
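The following minimal, dependency-free Python sketch covers the standardization and distance steps. The poster's R implementation is not shown. Assumptions: both groups are standardized with the normal group's means and standard deviations, as is usual in MTS; the correlation matrix is inverted with a hand-written Gauss-Jordan routine; and the data values are made up.

```python
from statistics import mean, stdev

def standardize(rows, means, sds):
    """z = (x - mean) / sd, column by column, using the normal group's statistics."""
    return [[(x - m) / s for x, m, s in zip(row, means, sds)] for row in rows]

def invert(matrix):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    aug = [[float(v) for v in row] + [float(i == j) for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("correlation matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def fit_reference(normal_rows):
    """Means, sample SDs, and inverse correlation matrix C^-1 of the normal group."""
    cols = list(zip(*normal_rows))
    means = [mean(c) for c in cols]
    sds = [stdev(c) for c in cols]
    z = standardize(normal_rows, means, sds)
    n, k = len(z), len(means)
    corr = [[sum(row[a] * row[b] for row in z) / (n - 1) for b in range(k)]
            for a in range(k)]
    return means, sds, invert(corr)

def scaled_md(rows, means, sds, corr_inv):
    """MD_i = (1/k) z_i C^-1 z_i^T for each row."""
    k = len(means)
    result = []
    for z in standardize(rows, means, sds):
        cz = [sum(corr_inv[a][b] * z[b] for b in range(k)) for a in range(k)]
        result.append(sum(z[a] * cz[a] for a in range(k)) / k)
    return result

# Made-up data: rows are tracts, columns are three variables.
normal = [[10.0, 2.0, 5.0], [12.0, 3.5, 4.0], [11.0, 2.5, 6.0],
          [9.0, 1.0, 5.5], [13.0, 3.0, 4.5], [10.5, 2.0, 6.5]]
outside = [[16.0, 1.0, 2.0], [7.0, 4.5, 8.0]]

means, sds, corr_inv = fit_reference(normal)
md_normal = scaled_md(normal, means, sds, corr_inv)
md_outside = scaled_md(outside, means, sds, corr_inv)
print("mean MD, normal group:", round(mean(md_normal), 6))  # equals (n - 1) / n
print("MD, outside group:", [round(m, 2) for m in md_outside])
```

A quick sanity check is that the normal group's mean distance equals (n − 1)/n. In R, `scale(X, center = means, scale = sds)` applies normal-group statistics column-wise and avoids the recycling pitfall described above.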
The Taguchi portion of the MTS program consisted of looping the Mahalanobis distance calculations over the
outside group. Each loop used a different row of an orthogonal Taguchi array, provided by the DoE.base R package,
to determine which variables would be tested. The SNRs from each loop were aggregated into a one-dimensional
vector. This vector was later used to find the average SNR for trials in which a variable was on and for trials
in which it was off.
Fig 3: Transformed Case-study Raster
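The Taguchi loop described above can be sketched in Python as follows. The poster does not state its SNR formula. This sketch uses a common larger-the-better form, η = −10 log10(mean of 1/MD_j over the outside group). Variable importance is scored as the mean SNR with the variable on minus the mean SNR with it off. The L4 array, the `md_outside` callable, and the per-variable contributions are hypothetical stand-ins for the real distance computation.

```python
import math

def larger_the_better_snr(mds):
    """Taguchi larger-the-better SNR: -10 log10( mean(1 / MD_j) )."""
    return -10 * math.log10(sum(1 / m for m in mds) / len(mds))

def taguchi_gains(array, md_outside):
    """Run one trial per array row; return per-trial SNRs and per-variable gains.

    array: rows of levels, 1 = variable on, 2 = variable off.
    md_outside: callable taking active variable indices, returning outside-group MDs.
    """
    snrs = []
    for row in array:
        active = [v for v, level in enumerate(row) if level == 1]
        snrs.append(larger_the_better_snr(md_outside(active)))
    gains = []
    for v in range(len(array[0])):
        on = [s for s, row in zip(snrs, array) if row[v] == 1]
        off = [s for s, row in zip(snrs, array) if row[v] == 2]
        gains.append(sum(on) / len(on) - sum(off) / len(off))
    return snrs, gains

# L4(2^3) orthogonal array: 3 variables screened in 4 trials.
L4 = [[1, 1, 1], [1, 2, 2], [2, 1, 2], [2, 2, 1]]

# Hypothetical stand-in: each outside tract's MD is the average of made-up
# per-variable contributions over the active variables.
CONTRIB = [[4.0, 0.5, 1.0], [6.0, 0.4, 1.2], [5.0, 0.6, 0.8]]

def toy_md_outside(active):
    return [sum(c[v] for v in active) / len(active) for c in CONTRIB]

snrs, gains = taguchi_gains(L4, toy_md_outside)
print("SNR per trial:", [round(s, 2) for s in snrs])
print("gain (on - off) per variable:", [round(g, 2) for g in gains])
```

Under this formula, a trial's SNR is negative whenever the outside group's average of 1/MD exceeds 1. That roughly means outside tracts sit no farther from the normal group than normal tracts do, which is how negative SNRs signal a lack of separation.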
Fig 2: Points georeferenced to TIGER shapefile
Fig 4: Summary of Tree Counties
Fig 5: Mahalanobis Distances of Variables with Highest SNR Difference
Fig 6: Average SNR from Taguchi Trials
Figure 1: A Multivariate Gaussian distribution with Mahalanobis Distances Depicted on a Color Scale
Mathematics Department
NYU Tandon School of Engineering
6 MetroTech Center, Brooklyn, NY 11201
Email: dantonz@smu.edu