The document summarizes a project analyzing factors that impact the landing distance of commercial flights. It includes:
- Combining and cleaning two datasets totaling 950 flight landings, removing duplicates and abnormal values
- Descriptive analysis finding a linear relationship between distance and ground speed, and weaker or nonlinear relationships with the other factors
- Linear regression identifying ground speed, height, and aircraft type as significant factors impacting landing distance
- The final model uses 832 observations and explains landing distance based on ground speed, height, and whether the aircraft is an Airbus or Boeing
Stats computing project_final
BANA 6043 Project
NAME: AYANK GUPTA UCID:M12388639
Background: Flight landing.
Motivation: To reduce the risk of landing overrun.
Goal: To study which factors impact the landing distance of a commercial flight, and how.
Data: Landing data (landing distance and other parameters) from 950 commercial flights (not a real data set, but simulated from statistical models). See the two Excel files ‘FAA-1.xls’ (800 flights) and ‘FAA-2.xls’ (150 flights).
Chapter 1: Data Preparation
1. Combining the data sets from different sources
Output of both imports (FAA1)
/*Checking for duplicates and removing them from the combined dataset*/
Note: We found 100 duplicate entries in the combined dataset and removed them.
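The combining and de-duplication above were done in SAS. As a rough illustration only, the sketch below does the same in plain Python, assuming each flight is a dict keyed by the column names used in this report plus a hypothetical `aircraft` column. It also assumes (the slides do not say) that duplicates were matched on the columns both files share, since the FAA-2 rows carry no duration; FAA-1 rows are processed first, so their duration values are kept.

```python
# Hypothetical sketch: each flight is a dict keyed by column name.
# Assumption: duplicates are matched on the columns FAA-1 and FAA-2 share
# (FAA-2 rows carry no duration); "aircraft" is a hypothetical column name.
KEY_COLUMNS = ("aircraft", "no_pasg", "speed_ground", "speed_air",
               "height", "pitch", "distance")


def combine_and_dedupe(faa1, faa2, key_columns=KEY_COLUMNS):
    """Stack FAA-1 then FAA-2, keeping the first occurrence of each key."""
    seen = set()
    combined = []
    for row in list(faa1) + list(faa2):
        # Give every row a duration field; FAA-2 rows end up with None.
        row = {"duration": None, **row}
        key = tuple(row.get(col) for col in key_columns)
        if key not in seen:  # FAA-1 rows come first, so they win ties
            seen.add(key)
            combined.append(row)
    return combined
```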
2. Completeness check: examine each variable for missing values
Variable       N (non-missing)   Missing   % Missing
duration       800               50        5.9%
no_pasg        850               0         0%
speed_ground   850               0         0%
speed_air      208               642       75.5%
height         850               0         0%
pitch          850               0         0%
distance       850               0         0%
Note:
1. About 5.9% (50 of 850) of the duration values are missing, because the 50 rows contributed only by the FAA2 data set have no duration value.
2. About 75% of the speed_air values are missing, so this column needs further examination during data cleaning.
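A minimal sketch of the completeness check, assuming the same list-of-dicts layout (the function name `missing_report` is hypothetical). A value counts as missing if the key is absent or holds `None`; with the slide's duration counts it reproduces the 5.9% figure.

```python
def missing_report(rows, columns):
    """Return {column: (n_present, n_missing, pct_missing)} for each column."""
    total = len(rows)
    report = {}
    for col in columns:
        # Missing means the key is absent or holds None.
        n_missing = sum(1 for row in rows if row.get(col) is None)
        pct = round(100.0 * n_missing / total, 1) if total else 0.0
        report[col] = (total - n_missing, n_missing, pct)
    return report


# Usage with the slide's duration counts: 800 present, 50 missing.
rows = [{"duration": 150.0}] * 800 + [{"duration": None}] * 50
print(missing_report(rows, ["duration"]))  # {'duration': (800, 50, 5.9)}
```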
3. Validity check: examine each variable for abnormal values
Note: A few height values are negative; these need to be flagged and excluded from the next analysis.
Next, we check each variable against the business rule given for it.
/*Checking for outliers in height*/
Note: The step above identifies the observations with negative heights.
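The validity check can be sketched as a table of per-variable rules. Only the non-negative height rule comes from the slides; the speed_ground range is a placeholder, because the slides do not list the business rules. Missing values are left to the completeness step rather than flagged here, and `split_valid` is a hypothetical helper.

```python
# Hypothetical rule table: column -> predicate that a valid value must satisfy.
# Only the height rule reflects the slides; the speed_ground range is a placeholder.
RULES = {
    "height": lambda v: v >= 0,
    "speed_ground": lambda v: 30 <= v <= 140,
}


def split_valid(rows, rules=RULES):
    """Return (valid_rows, flagged_rows); missing values are not treated as violations."""
    valid, flagged = [], []
    for row in rows:
        ok = all(rule(row[col]) for col, rule in rules.items()
                 if row.get(col) is not None)
        (valid if ok else flagged).append(row)
    return valid, flagged
```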
Cleaning the data based on the results of Steps 2 and 3
Note: We removed 18 observations with abnormal values.
1. For now, we are not removing the rows with missing values, because dropping them could bias the data. Instead, we plan to either:
a. impute the missing values, or
b. fill them with a simple approximation such as the mean.
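Option (b) above, mean imputation, is sketched below under the same data layout (`impute_mean` is a hypothetical helper). Filling with the mean shrinks the variable's variance, which is one reason to prefer option (a), model-based imputation, when it is feasible.

```python
from statistics import mean


def impute_mean(rows, column):
    """Return new rows with missing values in `column` replaced by the observed mean."""
    observed = [row[column] for row in rows if row.get(column) is not None]
    fill = mean(observed)
    # Copy every row so the original data is left untouched.
    return [{**row, column: fill} if row.get(column) is None else dict(row)
            for row in rows]
```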
Summarizing the distribution of each variable
We examined the distribution of each variable to see which variables are approximately normally distributed and which are skewed.
Variable       N     Mean        Std Dev    Minimum   Maximum     Skewness
duration       782   154.731     48.335     41.949    305.622     0.192089
no_pasg        832   60.060      7.488      29.000    87.000      -0.015304
speed_ground   832   79.611      18.829     33.574    136.659     0.110191
speed_air      204   103.646     9.982      90.003    136.423     0.9447
height         832   30.474      9.791      6.228     59.946      0.125057
pitch          832   4.005       0.526      2.284     5.927       0.016221
distance       832   1,528.240   911.045    41.722    6,309.950   1.560395
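The skewness column can be approximated with the adjusted Fisher-Pearson coefficient, sketched below in plain Python. Which exact formula the SAS output used is an assumption here; the unadjusted version differs slightly for small samples. Values near 0 indicate a symmetric distribution, so distance (1.56) is clearly right-skewed.

```python
from statistics import mean, stdev


def sample_skewness(values):
    """Adjusted Fisher-Pearson skewness: n / ((n-1)(n-2)) * sum(((x - mean) / s) ** 3)."""
    xs = [v for v in values if v is not None]  # skip missing values, like the N column
    n = len(xs)
    if n < 3:
        raise ValueError("need at least 3 non-missing values")
    m, s = mean(xs), stdev(xs)  # stdev uses the n - 1 denominator
    if s == 0:
        raise ValueError("skewness is undefined for constant data")
    return n / ((n - 1) * (n - 2)) * sum(((x - m) / s) ** 3 for x in xs)
```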
Figure: histogram of duration
Interpretation of the XY plots
1. Distance vs. duration: the points are widely scattered and the relationship does not appear linear.
2. Distance vs. no_pasg: the relationship is not linear.
3. Distance vs. speed_ground: the relationship is approximately linear, and therefore also monotonic (distance increases with ground speed).
4. Distance vs. speed_air: the relationship is fairly linear, but speed_air has many missing values, so this pattern rests on only 204 observations and should be interpreted with caution.
5. Distance vs. height and distance vs. pitch: the points are somewhat scattered.
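To separate "linear" from "monotonic" in observation 3, one can compare the Pearson correlation (linear association) with the Spearman rank correlation (monotonic association). The sketch below is illustrative and not part of the original SAS analysis: for a curved but monotonic relationship such as y = x³, Spearman equals 1 while Pearson is lower.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation: strength of the linear relationship."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def average_ranks(values):
    """Ranks starting at 1; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # 0-based positions i..j -> average 1-based rank
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman correlation: Pearson correlation of the ranks (monotonic association)."""
    return pearson(average_ranks(xs), average_ranks(ys))
```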
Correlation matrix between the variables and its interpretation
Interpretation of the correlation between the independent variables
➢ We check the pairwise correlations among all the independent variables for multicollinearity, which can make the coefficient estimates of a linear regression model unstable.
➢ speed_air and speed_ground are highly correlated, so we need to be extra careful if both variables are included in a regression.
➢ Apart from that, the other variables are largely uncorrelated with one another, which is a good sign for our regression model.
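The correlation screen can be sketched as follows. Because speed_air is mostly missing, each pair uses only the rows where both variables are present (pairwise deletion), and the function reports how many rows that was. The 0.8 cut-off and the helper names are hypothetical, not taken from the slides.

```python
from itertools import combinations
from math import sqrt


def pearson_pairwise(rows, a, b):
    """Pearson r between columns a and b, using only rows where both are present."""
    pairs = [(r[a], r[b]) for r in rows
             if r.get(a) is not None and r.get(b) is not None]
    n = len(pairs)
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    syy = sum((y - my) ** 2 for _, y in pairs)
    return sxy / sqrt(sxx * syy), n


def flag_collinear(rows, columns, threshold=0.8):
    """Return [(a, b, r, n)] for predictor pairs whose |r| exceeds the threshold."""
    flagged = []
    for a, b in combinations(columns, 2):
        r, n = pearson_pairwise(rows, a, b)
        if abs(r) > threshold:
            flagged.append((a, b, r, n))
    return flagged
```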
Note: Argument against using the air speed variable:
About 75% of the speed_air values are missing. Imputing them, whether with simple rules or a predictive model, would mean estimating roughly three quarters of the values from the remaining quarter, which is not a sensible decision.
In addition, because ground speed and air speed are highly correlated, we can use only ground speed in our regression model.
Chapter 3: Statistical modelling
We use R-squared to compare the regression models with one another; it measures the share of the variance in landing distance that each model explains.
Our aim in improving the model is a higher R-squared, while taking care not to overfit.
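For readers without SAS, the fit and the R-squared comparison can be sketched in plain Python by solving the normal equations (XᵀX)b = Xᵀy. This is a teaching sketch, not the SAS procedure the author used; real work should use a numerically stabler library solver. Adjusted R-squared is included because, unlike plain R-squared, it penalizes extra predictors, which speaks to the overfitting caution above. All function names are hypothetical.

```python
def ols_fit(X, y):
    """Least-squares coefficients [intercept, b1, ..., bk] from the normal equations."""
    rows = [[1.0] + list(x) for x in X]  # prepend the intercept column
    k = len(rows[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))] for i in range(k)]
    # Gaussian elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda idx: abs(A[idx][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for row in range(col + 1, k):
            factor = A[row][col] / A[col][col]
            for c in range(col, k + 1):
                A[row][c] -= factor * A[col][c]
    # Back substitution.
    b = [0.0] * k
    for i in range(k - 1, -1, -1):
        b[i] = (A[i][k] - sum(A[i][j] * b[j] for j in range(i + 1, k))) / A[i][i]
    return b


def r_squared(X, y, b):
    """R^2 = 1 - SSE/SST for coefficients b (intercept first)."""
    preds = [b[0] + sum(bj * xj for bj, xj in zip(b[1:], x)) for x in X]
    y_bar = sum(y) / len(y)
    sse = sum((yi - p) ** 2 for yi, p in zip(y, preds))
    sst = sum((yi - y_bar) ** 2 for yi in y)
    return 1 - sse / sst


def adjusted_r_squared(r2, n, k):
    """Adjusted R^2 for n observations and k predictors (intercept excluded)."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)
```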
Note: For the next iteration of the model, we consider only the variables speed_ground, height and pitch.
Next, we decide which variables to keep in the regression analysis.
Variables with p-values greater than 0.1 are not considered.
Variables that are only marginally significant should be selected carefully, since including them may overfit the model, which would hurt its performance on a test set.
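The selection rule can be made explicit as below. Two assumptions: p-values come from the standard normal approximation to the t test, which is close for a sample of 832 flights, and "marginally significant" is taken to mean 0.05 < p ≤ 0.1, a cut-off the slides do not specify. The function names are hypothetical.

```python
from statistics import NormalDist


def two_sided_p(coef, std_err):
    """Approximate two-sided p-value for t = coef / std_err using the standard normal
    (reasonable for large samples; an exact test would use the t distribution)."""
    t = abs(coef / std_err)
    return 2 * (1 - NormalDist().cdf(t))


def screen_predictors(estimates, drop_above=0.1, marginal_above=0.05):
    """Classify predictors as 'keep', 'marginal' or 'drop' from {name: (coef, std_err)}."""
    decisions = {}
    for name, (coef, se) in estimates.items():
        p = two_sided_p(coef, se)
        if p > drop_above:
            decisions[name] = "drop"
        elif p > marginal_above:
            decisions[name] = "marginal"
        else:
            decisions[name] = "keep"
    return decisions
```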
Note:
The residuals appear to be normally distributed.
Since R-squared barely changes when only the significant variables are kept, we finalize the regression model with those variables, and its R-squared indicates a good fit.
The model still needs to be validated. Ideally we would test its accuracy on a separate test data set; since we do not have one at the moment, we perform a basic validation through model checking.
Model checking
Observations
1. The residuals are normally distributed.
2. The mean of the residuals is 0 (this holds automatically for least squares with an intercept, so it is a sanity check rather than evidence of fit).
3. The residual variance is constant.
Hence, the model passes these model checks.
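The residual checks above were presumably read off plots. A crude numeric companion is sketched below (hypothetical helper): the residual mean, and the ratio of residual variance among the larger fitted values to that among the smaller ones, in the spirit of a Goldfeld-Quandt comparison. A ratio near 1 is consistent with constant variance; this is not a formal test.

```python
from statistics import mean, pvariance


def residual_checks(fitted, residuals):
    """Return (residual mean, variance ratio of upper vs lower half by fitted value)."""
    pairs = sorted(zip(fitted, residuals))  # order residuals by fitted value
    half = len(pairs) // 2
    low = [r for _, r in pairs[:half]]
    high = [r for _, r in pairs[-half:]]
    return mean(residuals), pvariance(high) / pvariance(low)
```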
Chapter 4: Project Summary
Summary of the Project
Background: Flight landing.
Motivation: To reduce the risk of landing overrun.
Goal: To study which factors impact the landing distance of a commercial flight, and how.
Data: Landing data (landing distance and other parameters) from 950 commercial flights (not a real data set, but simulated from statistical models).
1. Data Preparation
a. Combined both data sets.
b. Removed duplicates from the data sets.
c. Removed the abnormal observations from the data sets.
d. Checked the distribution of each variable in the datasets.
2. Descriptive Study (XY plots and correlation studies)
a. Studying the X-Y plots between the different variables.
i. We observed that the relationship between distance and ground speed is highly linear.
ii. The relationships of distance with height and pitch are only slightly linear.
iii. The relationships of distance with duration and no_pasg are not linear.
b. Studying the correlation between the independent variables
i. Only ground speed and air speed showed strong collinearity. Since speed_air is mostly missing, we removed it from the regression model, so multicollinearity is not a concern.
ii. All the other variables show little collinearity.
3. Statistical modelling: linear regression
a. To study how the factors affect landing distance, we fitted a linear regression.
i. The R² of the model was roughly 0.84.
ii. Ground speed, height and aircraft were significant, each with a p-value below 0.0001.
b. Refining the model: to get a better model, we kept only the significant variables and checked R² again. It changed only slightly (dropping variables cannot increase plain R², although adjusted R² can rise).
i. The dependent variable, distance, now depends on the independent variables ground speed, height and aircraft.
Our regression model:
Distance = 42.7*(Ground Speed) + 14.5*(Height) - 501*(air_craft_flag) - 2052
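Plugging values into the fitted equation gives a point prediction. The sketch below simply encodes the equation above (`predict_distance` is a hypothetical name); the inputs in the usage lines are made up, and because the coefficients are the rounded ones shown on the slide, results are approximate and come without a prediction interval.

```python
def predict_distance(speed_ground, height, air_craft_flag):
    """Point prediction from the fitted equation (air_craft_flag: 1 = Airbus, 0 = Boeing)."""
    return 42.7 * speed_ground + 14.5 * height - 501 * air_craft_flag - 2052


# Hypothetical inputs: ground speed 80, height 30, in the data's units.
print(round(predict_distance(80, 30, 0), 1))  # Boeing: 1799.0
print(round(predict_distance(80, 30, 1), 1))  # Airbus: 1298.0
```

The difference between the two predictions is exactly the 501 attached to air_craft_flag, since the model has no interaction terms.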
Answering the Questions
1. How many observations (flights) do you use to fit your final model? If not all 950 flights, why?
We used 832 observations to fit the linear regression model:
a. We removed 100 observations because they were duplicates.
b. We removed a further 18 observations because they had abnormal values.
c. We could have removed the 50 observations with missing duration, but did not, because duration was not a significant variable in the regression.
2. Which factors impact the landing distance of a flight, and how?
The factors that affect landing distance are:
1. Ground speed: landing distance increases as ground speed increases.
2. Height: landing distance increases as height increases.
3. air_craft_flag (1 = Airbus, 0 = Boeing): the two makes behave differently; holding ground speed and height fixed, the model predicts Airbus landing distances about 501 lower than Boeing.
3. Is there any difference between the two makes, Boeing and Airbus?
For Airbus N=444
For Boeing N=388
When we fit separate regression models for each aircraft make, we observe that pitch is not significant in the Boeing model, whereas it is significant in the Airbus model.
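As an illustration of fitting separate models per make, the sketch below groups flights by a hypothetical `aircraft` column and fits a one-predictor least-squares line in each group. It is deliberately simpler than the author's multi-variable models and reports only counts, slopes and intercepts; judging significance, as the slide does for pitch, also requires standard errors.

```python
from collections import defaultdict


def fit_simple(xs, ys):
    """Slope and intercept of a one-predictor least-squares line."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx


def fit_by_make(rows, predictor, response="distance", group="aircraft"):
    """Fit a separate line per aircraft make; returns {make: (n, slope, intercept)}."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[group]].append(row)
    results = {}
    for make, members in groups.items():
        xs = [r[predictor] for r in members]
        ys = [r[response] for r in members]
        results[make] = (len(members),) + fit_simple(xs, ys)
    return results
```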