SAS Ron Cody Solutions for Even-Numbered Problems, Chapters 7 to 15 (Ayapparaj SKS)
I have added answers to the even-numbered exercise problems of chapters 7 to 15 of Ron Cody's Learning SAS by Example: A Programmer's Guide.
Chapter 7 covers conditional statements such as IF, ELSE IF, WHERE, and SELECT, subsetting with those statements, and Boolean operators. Chapter 8 covers DO, DO WHILE, and DO UNTIL loops along with the LEAVE and CONTINUE statements, and making a simple GPLOT. Chapter 9 deals with dates: computing differences by day, weekday, month, year, and quarter, imputing missing values, and plotting, all using various functions. Chapter 10 is mainly about merging two datasets, subsetting with the IN= dataset option, updating a master table from another table, and more. Chapter 11 covers functions to round and truncate numeric values, handle missing values, compute constants, generate random values, fetch values from previous observations, and so on.
Chapter 12 covers functions for manipulating character values. Chapter 13 covers arrays. Chapter 14 mainly deals with presenting data. Chapter 15 is about generating reports.
Boston Housing is a classic regression dataset with details of 506 properties and their median housing prices. Models built with Linear Regression (Generalized Linear Model), LASSO Regression, Regression Tree, GAM, and Neural Network techniques were compared for predictive power.
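The model comparison described above boils down to fitting several regressors and comparing their held-out error. As a minimal sketch of that workflow, the snippet below uses synthetic data standing in for the 506-row Boston Housing table and compares ordinary least squares against a predict-the-mean baseline by test-set MSE; the data and shapes are made up for illustration, not taken from the actual dataset.

```python
import numpy as np

# Hypothetical stand-in for the Boston Housing data: 506 rows, a few
# numeric features, and a median-price-like target.
rng = np.random.default_rng(0)
X = rng.normal(size=(506, 3))
y = 2.0 * X[:, 0] - 1.0 * X[:, 1] + rng.normal(scale=0.5, size=506)

# Simple train/test split.
X_tr, X_te = X[:400], X[400:]
y_tr, y_te = y[:400], y[400:]

def ols_fit(X, y):
    # Ordinary least squares with an intercept column.
    A = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return beta

def ols_predict(beta, X):
    return np.column_stack([np.ones(len(X)), X]) @ beta

beta = ols_fit(X_tr, y_tr)
mse_ols = np.mean((ols_predict(beta, X_te) - y_te) ** 2)
mse_baseline = np.mean((y_tr.mean() - y_te) ** 2)  # predict-the-mean baseline

print(f"OLS test MSE:      {mse_ols:.3f}")
print(f"Baseline test MSE: {mse_baseline:.3f}")
```

In the presentation itself the same comparison loop would run over the five named techniques (GLM, LASSO, tree, GAM, neural network) instead of OLS versus the mean.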
In this project, I take the role of a data scientist at SpaceX and try to predict whether the Falcon 9 first stage will land successfully. This matters because a failed landing costs the company significant resources.
IBM Data Science Capstone Project: SpaceX Launch Analysis (Hang (Henry) Yan)
SpaceX advertises Falcon 9 rocket launches on its website at a cost of 62 million dollars, while other providers charge upward of 165 million dollars per launch; much of the savings comes from SpaceX's ability to reuse the first stage. The project task is to predict whether the first stage of the SpaceX Falcon 9 rocket will land successfully. Data science methodologies were applied to analyse launch data and improve prediction of launch outcomes.
R is one of the most powerful, easy-to-use, open-source statistical software packages. These slides present the basics of R, data structures in R, and data management and analysis using R.
Learning
Base SAS,
Advanced SAS,
PROC SQL,
ODS,
SAS in financial industry,
Clinical trials,
SAS Macros,
SAS BI,
SAS on Unix,
SAS on Mainframe,
SAS interview Questions and Answers,
SAS Tips and Techniques,
SAS Resources,
SAS Certification questions...
visit http://sastechies.blogspot.com
Please Subscribe to this Channel for more solutions and lectures
http://www.youtube.com/onlineteaching
Elementary Statistics Practice Test 1
Module 1: Chapters 1-3
Chapter 1: Introduction to Statistics.
Chapter 2: Exploring Data with Tables and Graphs.
Chapter 3: Describing, Exploring, and Comparing Data.
Chapter 4: Probability
4.3: Complements and Conditional Probability, and Bayes' Theorem
An introduction to SAS, one of the more frequently used statistical packages in business. With hands-on exercises, explore SAS's many features and learn how to import and manage datasets and run basic statistical analyses. This is an introductory workshop appropriate for those with little or no experience with SAS.
Complete workshop materials include demo SAS programs available at http://projects.iq.harvard.edu/rtc/sas-intro
Solution to the Practice Test: Ch 10 Correlation and Regression, Ch 11 Goodness of Fit, Ch 12 ANOVA (Long Beach City College)
Elementary Statistics Practice Test 5
Module 5
Chapter 10: Correlation and Regression
Chapter 11: Goodness of Fit and Contingency Tables
Chapter 12: Analysis of Variance
Big Data Analytics & Travel Industry – The Best Deal Around (SPEC INDIA)
Technologies like the Internet of Things (IoT) are rapidly integrating with analytical platforms to ensure continual improvements in travel experiences. With technology easily accessible, the contemporary traveler expects more from the integration of technologies with enterprises.
Customers will keep looking for innovative as well as competitively priced vacation options. Big data and analytics services are projected to become the backbone of the travel industry amid a marked digital transformation.
Get More Insight on Big Data & the Travel Industry at:
http://blog.spec-india.com/big-data-analytics-services-travel-industry-best-deal-around/
Travel and Hospitality Industry: 2017 Analytics Landscape (Metriplica)
This presentation covers the use and importance of analytical strategies in the travel and hospitality domain. This relatively recent development presents both a unique challenge and an extraordinary opportunity, one that many brands are not fully capitalizing on. We travel to connect, detach, explore, and experience the world outside our homes. Some of us travel for business while others travel to discover themselves.
Let's see how.
Suggestions:
1) For best quality, download the PDF before viewing.
2) Open at least two windows: one for the YouTube video, one for the screencast (link below), and optionally one for the slides themselves.
3) The YouTube video is shown on the first page of the slide deck; for the slides, just skip to page 2.
Screencast: http://youtu.be/VoL7JKJmr2I
Video recording: http://youtu.be/CJRvb8zxRdE (Thanks to Al Friedrich!)
In this talk, we take Deep Learning to task with real world data puzzles to solve.
Data:
- Higgs binary classification dataset (10M rows, 29 cols)
- MNIST 10-class dataset
- Weather categorical dataset
- eBay text classification dataset (8500 cols, 500k rows, 467 classes)
- ECG heartbeat anomaly detection
- Powered by the open source machine learning software H2O.ai. Contributors welcome at: https://github.com/h2oai
- To view videos on H2O open source machine learning software, go to: https://www.youtube.com/user/0xdata
Predicting likelihood of flight cancellations using machine learning:
Commercial flights get delayed quite often but are canceled much less frequently. Even though the flight cancellation rate is not high, that one rare event causes a lot of trouble for passengers in terms of rescheduling their travel plans. It would be helpful if passengers knew the chance that their flight will be canceled. Travel planning and booking companies such as booking.com, expedia.com, kayak.com, priceline.com, etc. can use a model capable of predicting the likelihood of a flight's cancellation. They can then inform their customers well in advance, even before the airlines' management informs the passengers, about the probability that their upcoming flight will be canceled. In this data science project, I develop such a predictive model for U.S. domestic flights operating at selected airports.
The flight data is acquired from the Bureau of Transportation Statistics. This data contains information about each flight and its on-time performance, including delay, cancellation, and diversion details. For this study, data is acquired only for the years 2015 and 2016. Flight cancellations are often caused by bad weather conditions, so it is also important to have hourly weather data at the selected airports for 2015 and 2016. We use the wunderground.com API to collect all the weather data.
The link to the GitHub repository: https://github.com/aajains/springboard-datascience-intensive/tree/master/capstone_project
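A key preparation step the project describes is attaching the hourly weather observation to each flight. A minimal sketch of that join, in plain Python, is to index the weather records by (airport, date, hour) and look up each flight's scheduled-departure hour; the field names and sample values below are illustrative, not the actual BTS or wunderground.com schemas.

```python
from datetime import datetime

# Hypothetical samples of the two sources: flight records and hourly
# weather observations (made-up field names and values).
flights = [
    {"origin": "ORD", "sched_dep": datetime(2016, 1, 15, 9, 30), "cancelled": 1},
    {"origin": "ORD", "sched_dep": datetime(2016, 1, 15, 14, 5), "cancelled": 0},
]
weather = [
    {"airport": "ORD", "time": datetime(2016, 1, 15, 9, 0), "visibility_mi": 0.25},
    {"airport": "ORD", "time": datetime(2016, 1, 15, 14, 0), "visibility_mi": 10.0},
]

# Index weather by (airport, date, hour) so each flight can pick up the
# observation for its scheduled-departure hour.
wx_by_hour = {
    (w["airport"], w["time"].date(), w["time"].hour): w for w in weather
}

for f in flights:
    key = (f["origin"], f["sched_dep"].date(), f["sched_dep"].hour)
    wx = wx_by_hour.get(key)  # None if no observation exists for that hour
    f["visibility_mi"] = wx["visibility_mi"] if wx else None

print(flights[0]["visibility_mi"])  # prints 0.25
```

The resulting weather-augmented flight records are what a cancellation classifier would then be trained on.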
AVM 3201 – Aviation Planning Case Study Deer Valley Airpor.docx (rock73)
AVM 3201 – Aviation Planning
Case Study
Deer Valley Airport – Demand/Capacity Analysis, & Site Selection Study
Page 1 of 13
CASE STUDY INSTRUCTIONS
DEMAND/CAPACITY ANALYSIS & SITE SELECTION STUDY
DEER VALLEY AIRPORT (DVT)
INSTRUCTIONS
This paper is mandatory, meaning a final course grade of F will be received for this course if it is not submitted. Conduct a capacity and site selection study at Deer Valley Airport, Phoenix, Arizona. You should do the following:
1. Describe the historical aviation activity at the airport.
2. Develop a forecast of total annual demand for the airport using a trend analysis. You
may use Microsoft Excel or other statistical software.
3. Determine hourly capacity, hourly delay and annual service volume for Deer Valley
Airport using the short-term planning methodology and FAA Figures provided in FAA
Advisory Circular AC 150/5060-5, Airport Capacity and Delay. This methodology is also
provided in Chapter 5 of the Aviation Planning textbook.
4. Select a suitable alternative site for Deer Valley Airport that could serve the same population as the existing airport.
5. Document your findings in a written paper.
6. Your documentation should be in the form of a written paper that includes the following
sections:
1. HISTORICAL AVIATION ACTIVITY
This section should describe the types and levels of aviation activity at the airport over the
past 20 years. It should include tables and figures as appropriate. The tabulated data
should include the number of based aircraft, the number of annual airport operations
and the split between air carrier, air taxi, general aviation and military operations as
applicable. As a minimum, 20 years of historical data should be provided. Sources of data
include: FAA ATADS at http://aspm.faa.gov/opsnet/sys/Main.asp?force=atads. You may use
the information presented in your paper on the airport’s existing conditions.
2. ANNUAL OPERATIONS, ADPM, PEAK HOUR FORECASTS
This section should include a forecast of total annual operations for Deer Valley Airport for the years 2015 through 2025 using a trend analysis. Your trend analysis should be based on the 20 years of data. In addition, develop forecasts for the Average Day Peak Month (ADPM) and Peak Hour operations in the years 2020 and 2025. In the year 2020 the peak month will be March, at 9.5 percent of annual operations, and peak hour operations are 6 percent of operations during the average day of the peak month. In the year 2025 the peak month will also be March, at 9.3 percent of annual operations, and peak hour operations are 5.5 percent of operations during the average day of the peak month.
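The ADPM and peak-hour figures follow directly from the percentages given above. A worked example for 2020, using a made-up annual-operations forecast (the percentages are from the assignment; the annual figure is a placeholder for whatever your trend analysis produces):

```python
# Worked example of the ADPM / peak-hour arithmetic for 2020.
annual_ops_2020 = 400_000  # hypothetical trend-analysis forecast

peak_month_ops = 0.095 * annual_ops_2020  # March = 9.5% of annual operations
adpm_ops = peak_month_ops / 31            # average day of peak month (March has 31 days)
peak_hour_ops = 0.06 * adpm_ops           # peak hour = 6% of ADPM operations

print(round(peak_month_ops))  # 38000
print(round(adpm_ops))        # 1226
print(round(peak_hour_ops))   # 74
```

For 2025 the same arithmetic applies with 9.3 percent and 5.5 percent.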
3. HOURLY CAPACITY
This section should report on the determination of hourly capacity ...
A Novel Approach To The Weight and Balance Calculation for The De Havilland Ca... (CSCJournals)
The main objective of this research is to provide companies operating different fleets of the De Havilland Canada Twin Otter DHC-6 seaplanes with an alternative method to the time-consuming Whizz Wheel procedure when calculating the weight and balance. Using this application, these operators can lower their aircraft turnaround, speed up the passenger boarding, dispatch the flights efficiently and save on fuel and dock expenses. Furthermore, this research shows how operators do their calculations currently and the positive impact of the application on their entire operation, including extra revenue generation amounting to $4M per year. Most DHC-6 seaplane operators are mainly in the Maldives. Therefore, this research was conducted while piloting these seaplanes and studying the day-to-day operations. While this paper presents the implementation of this software and its design model, it also discusses how two major operators used this application in the Maldives and one in St Vincent and the Grenadines.
Application of Data Science in the Airline Industry (EshaNair4)
The presentation is about the application of data science in the airline industry. It gives a brief understanding of how data science tools can be applied to reduce costs, increase efficiency, and, most importantly, ensure a happy flight!
The aim of the project is to track the on-time performance of major domestic carriers in the US. Complete air travel reports, including raw data and summary statistics, are available, which enables predictions about possible flight delays.
Taking the Lead: an article about Dubai Airport. Using the ACI database, the collected data are analyzed in terms of passengers and cargo using a BCG matrix. Hope you enjoy it. - Mohammed
The Art of the Pitch: WordPress Relationships and Sales (Laura Byrne)
Clients don’t know what they don’t know. What web solutions are right for them? How does WordPress come into the picture? How do you make sure you understand scope and timeline? What do you do if something changes?
All these questions and more will be explored as we talk about matching clients’ needs with what your agency offers, without pulling teeth or pulling your hair out. Practical tips and strategies for successful relationship building that leads to closing the deal.
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova... (Ramesh Iyer)
In today's fast-changing business world, companies must adapt and embrace new ideas to keep up with the competition. However, fostering a culture of innovation takes much work: it takes vision, leadership, and a willingness to take risks in the right proportion. Sachin Dev Duggal, co-founder of Builder.ai, has perfected the art of this balance, creating a company culture where creativity and growth are nurtured at each stage.
Essentials of Automations: Optimizing FME Workflows with Parameters (Safe Software)
Are you looking to streamline your workflows and boost your projects’ efficiency? Do you find yourself searching for ways to add flexibility and control over your FME workflows? If so, you’re in the right place.
Join us for an insightful dive into the world of FME parameters, a critical element in optimizing workflow efficiency. This webinar marks the beginning of our three-part “Essentials of Automation” series. This first webinar is designed to equip you with the knowledge and skills to utilize parameters effectively: enhancing the flexibility, maintainability, and user control of your FME projects.
Here’s what you’ll gain:
- Essentials of FME Parameters: Understand the pivotal role of parameters, including Reader/Writer, Transformer, User, and FME Flow categories. Discover how they are the key to unlocking automation and optimization within your workflows.
- Practical Applications in FME Form: Delve into key user parameter types including choice, connections, and file URLs. Allow users to control how a workflow runs, making your workflows more reusable. Learn to import values and deliver the best user experience for your workflows while enhancing accuracy.
- Optimization Strategies in FME Flow: Explore the creation and strategic deployment of parameters in FME Flow, including the use of deployment and geometry parameters, to maximize workflow efficiency.
- Pro Tips for Success: Gain insights on parameterizing connections and leveraging new features like Conditional Visibility for clarity and simplicity.
We’ll wrap up with a glimpse into future webinars, followed by a Q&A session to address your specific questions surrounding this topic.
Don’t miss this opportunity to elevate your FME expertise and drive your projects to new heights of efficiency.
GraphRAG Is All You Need? LLM & Knowledge Graph (Guy Korland)
Guy Korland, CEO and Co-founder of FalkorDB, will review two articles on the integration of language models with knowledge graphs.
1. Unifying Large Language Models and Knowledge Graphs: A Roadmap.
https://arxiv.org/abs/2306.08302
2. Microsoft Research's GraphRAG paper and a review paper on various uses of knowledge graphs:
https://www.microsoft.com/en-us/research/blog/graphrag-unlocking-llm-discovery-on-narrative-private-data/
Neuro-symbolic is not enough, we need neuro-*semantic* (Frank van Harmelen)
Neuro-symbolic (NeSy) AI is on the rise. However, simply machine learning on just any symbolic structure is not sufficient to really harvest the gains of NeSy. These will only be gained when the symbolic structures have an actual semantics. I give an operational definition of semantics as “predictable inference”.
All of this illustrated with link prediction over knowledge graphs, but the argument is general.
Kubernetes & AI - Beauty and the Beast!? @KCD Istanbul 2024 (Tobias Schneck)
As AI technology pushes into IT, I found myself wondering, as an “infrastructure container Kubernetes guy”, how this fancy AI technology gets managed from an infrastructure operations view. Is it possible to apply our beloved cloud-native principles as well? What benefits could both technologies bring to each other?
Let me take these questions and guide you on a short journey through existing deployment models and use cases for AI software. Using practical examples, we discuss what cloud/on-premise strategy we may need to apply them to our own infrastructure from an enterprise perspective. I want to give an overview of infrastructure requirements and technologies, and of what could benefit or limit your AI use cases in an enterprise environment. An interactive demo will give you some insights into the approaches I already have working for real.
Connector Corner: Automate dynamic content and events by pushing a button (DianaGray10)
Here is something new! In our next Connector Corner webinar, we will demonstrate how you can use a single workflow to:
Create a campaign using Mailchimp with merge tags/fields
Send an interactive Slack channel message (using buttons)
Have the message received by managers and peers along with a test email for review
But there’s more:
In a second workflow supporting the same use case, you’ll see:
Your campaign sent to target colleagues for approval
If the “Approve” button is clicked, a Jira/Zendesk ticket is created for the marketing design team
But—if the “Reject” button is pushed, colleagues will be alerted via Slack message
Join us to learn more about this new, human-in-the-loop capability, brought to you by Integration Service connectors.
And...
Speakers:
Akshay Agnihotri, Product Manager
Charlie Greenberg, Host
UiPath Test Automation using UiPath Test Suite series, part 3 (DianaGray10)
Welcome to UiPath Test Automation using UiPath Test Suite series part 3. In this session, we will cover desktop automation along with UI automation.
Topics covered:
UI automation Introduction,
UI automation Sample
Desktop automation flow
Pradeep Chinnala, Senior Consultant Automation Developer @WonderBotz and UiPath MVP
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024 (Albert Hoitingh)
In this session I delve into the encryption technology used in Microsoft 365 and Microsoft Purview, including the concepts of Customer Key and Double Key Encryption.
Software Delivery at the Speed of AI: Inflectra Invests in AI-Powered Quality (Inflectra)
In this insightful webinar, Inflectra explores how artificial intelligence (AI) is transforming software development and testing. Discover how AI-powered tools are revolutionizing every stage of the software development lifecycle (SDLC), from design and prototyping to testing, deployment, and monitoring.
Learn about:
• The Future of Testing: How AI is shifting testing towards verification, analysis, and higher-level skills, while reducing repetitive tasks.
• Test Automation: How AI-powered test case generation, optimization, and self-healing tests are making testing more efficient and effective.
• Visual Testing: Explore the emerging capabilities of AI in visual testing and how it's set to revolutionize UI verification.
• Inflectra's AI Solutions: See demonstrations of Inflectra's cutting-edge AI tools like the ChatGPT plugin and Azure Open AI platform, designed to streamline your testing process.
Whether you're a developer, tester, or QA professional, this webinar will give you valuable insights into how AI is shaping the future of software delivery.
2. Table of Contents
○ Introduction
○ Business Question
○ Description of the Data
○ Exploratory Plots and Tables
○ Unsupervised and Supervised Analytics Models
○ Recommendations and Conclusion
○ Possible next steps
3. Introduction
Air travel cancellation has always been a universal problem. As economic connections among countries multiply, this issue can cause huge problems for frequent travellers, especially long-distance travellers such as international students and business people. Our group members come from different parts of the world, so this question is of key interest to us. We therefore decided to base our project on data from the Bureau of Transportation Statistics of the United States, hoping to generate interesting insights regarding air travel cancellation and thereby provide useful guidance for the frequent travellers mentioned above.
Air travel cancellation can bring about a series of problems for various stakeholders in the tourism industry: customers' agendas get delayed, airports get crowded, and the need for hotel rooms rockets if a large number of flights are cancelled on the same day due to severe weather. With our insights, travellers can plan ahead accordingly, airlines and airports can make efforts to reduce cancellations based on our findings, and hotels can plan their marketing and sales according to flight cancellation patterns.
4. Business Question
Flight cancellation can happen due to a variety of reasons. The most common causes are as follows:
1. Weather
2. Natural Disasters
3. Mechanical Errors
4. Monopoly Routes
5. Aircraft Size
Our team is interested in figuring out the different factors that lead to a flight cancellation. After deciding on our datasets for this project and an initial analysis of the datasets, we decided to focus on the following domains:
1. Segments - by the pair of origin Airport ID and destination Airport ID
2. Airports - by each origin Airport ID
3. Airlines - by Airline ID
We have learned to analyze data with the Decision Tree model and the Regression model in our Business Intelligence and Data Mining class, so we decided to try both models on the factors mentioned above and, at the initial stage of our analysis, choose the model with the smallest average squared error.
*In order to work with the two datasets, we used SQL to combine them before conducting the analysis in SAS Enterprise Miner.
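The "combine with SQL first" step amounts to joining the two tables on the shared route/carrier key before modeling. A minimal sketch of that join, using Python's built-in SQLite driver and made-up column subsets of the two BTS datasets (the real tables have many more columns):

```python
import sqlite3

# In-memory database with toy versions of the two datasets.
con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE t100_segment (
    origin TEXT, dest TEXT, carrier TEXT, passengers INTEGER)""")
con.execute("""CREATE TABLE ontime (
    origin TEXT, dest TEXT, carrier TEXT, cancelled INTEGER)""")

con.executemany("INSERT INTO t100_segment VALUES (?,?,?,?)",
                [("JFK", "LAX", "AA", 180), ("ORD", "DFW", "UA", 150)])
con.executemany("INSERT INTO ontime VALUES (?,?,?,?)",
                [("JFK", "LAX", "AA", 0), ("ORD", "DFW", "UA", 1)])

# Join the segment traffic data to the on-time records on the
# origin/destination/carrier key.
rows = con.execute("""
    SELECT s.origin, s.dest, s.carrier, s.passengers, o.cancelled
    FROM t100_segment s
    JOIN ontime o
      ON s.origin = o.origin AND s.dest = o.dest AND s.carrier = o.carrier
    ORDER BY s.origin
""").fetchall()

print(rows)
```

The combined rows are then what gets imported into SAS Enterprise Miner for modeling.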
5. Description of the Data
After careful observation, we chose two datasets:
(1) T100 Domestic Airline Segment Data
(2) Airline On-Time Performance Data
These two datasets come from the Bureau of Transportation Statistics of the Research and Innovative Technology Administration (RITA). The first dataset has more than 70k rows and contains domestic market data reported by U.S. air carriers, including carrier, origin, destination, and service class for enplaned passengers, freight, and mail when both origin and destination airports are located within the boundaries of the United States and its territories.1 Each month, every certificated U.S. air carrier reports its traffic information to the Office of Airline Information using an internal normalized form named T-100, and this dataset summarizes T-100 data from 1993 to 2013.
The Airline On-Time Performance dataset has more than a million rows. It is collected by the Office of Airline Information, Bureau of Transportation Statistics (BTS), and contains on-time arrival data for non-stop domestic flights by major air carriers, with additional items such as departure and arrival delays, origin and destination airports, flight numbers, scheduled and actual departure and arrival times, cancelled or diverted flights, taxi-out and taxi-in times, air time, and non-stop distance.2
Variables Available
These two datasets have sufficient data volume and variables for analyzing the relationship between air traffic patterns and externalities, hereby defined as airports and airlines.
(1) T100 Domestic Airline Segment Data
This dataset supplied key insights into the factors that result in flight cancellations. Its key measures are listed below:
DepScheduled: Departures Scheduled
DepPerformed: Departures Performed
Payload: Available Payload (pounds)
Seats: Available Seats
Passengers: Non-Stop Segment Passengers Transported

1 Source: http://www.transtats.bts.gov/Fields.asp?Table_ID=259
2 Source: http://www.transtats.bts.gov/Fields.asp?Table_ID=236
6. Freight: Non-Stop Segment Freight Transported (pounds)
Mail: Non-Stop Segment Mail Transported (pounds)
Distance: Distance between airports (miles)
LoadFactor: Load Factor, the ratio of Passenger Miles to Available Seat Miles
RampTime: Ramp to Ramp Time (minutes)
AirTime: Airborne Time (minutes)
(2) Airline On-Time Performance Data
This dataset supplied the factors that affect delays and the causes of the different types of delay. Its key measures are listed below:
CarrierDelay: Carrier Delay, in Minutes
WeatherDelay: Weather Delay, in Minutes
NASDelay: National Air System Delay, in Minutes
SecurityDelay: Security Delay, in Minutes
LateAircraftDelay: Late Aircraft Delay, in Minutes
Analysis Methodology:
1. Consolidated the data for the months of May, June and July
The first dataset contains T-100 data from 1993 to 2013, more than 10 million records. To keep the
analysis focused and effective, we restricted it to May through July 2013, leaving 70,000+ records.
2. Clean and construct new variables
a) Generated variables: Flights_Cancelled, Flights_Adhoc, Adhoc?, Cancellation?
The original dataset has no explicit cancellation count, but it does contain Flights_Scheduled and
Flights_Performed. Subtracting Flights_Performed from Flights_Scheduled gives the number of flights
with unexpected changes, covering both cancellations and ad hoc flights. When the difference is positive,
we store it in a new variable named "Flights_Cancelled"; when it is negative, we store its magnitude in
another new variable named "Flights_Adhoc". We also created binary variables, "Cancellation?" and
"Adhoc?", to flag the occurrence of each.

Flights_Cancelled: Number of flights cancelled (Scheduled - Performed)
Flights_Adhoc: Number of flights that took off ad hoc (Performed - Scheduled)
Adhoc?: Binary variable flagging ad hoc flights
Cancellation?: Binary variable flagging cancellations
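This derivation can be sketched in Python with pandas (for illustration only; the project itself used SAS, and the column names and values here are invented stand-ins for the actual T-100 fields):

```python
import pandas as pd

# Toy stand-in for the T-100 extract; values are invented.
df = pd.DataFrame({
    "Flights_Scheduled": [10, 8, 5],
    "Flights_Performed": [8, 8, 7],
})

diff = df["Flights_Scheduled"] - df["Flights_Performed"]
# Positive shortfall = cancellations; negative = extra (ad hoc) departures.
df["Flights_Cancelled"] = diff.clip(lower=0)
df["Flights_Adhoc"] = (-diff).clip(lower=0)
df["Cancellation?"] = (df["Flights_Cancelled"] > 0).astype(int)
df["Adhoc?"] = (df["Flights_Adhoc"] > 0).astype(int)
```

Clipping at zero keeps each record in exactly one of the two new count variables.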
b) Converted sums to averages for: Passengers, Seats, Payload, Freight, Mail, Ramp_to_Ramp, AirTime
Several vital indicators that could be potential externalities affecting cancellation rates are recorded as
sums over all flights that day, so the number of flights performed directly influences them. To remove
this bias, we divided each total by the number of departures performed and stored the results in new
variables:

Avg_Passengers = Passengers / Departures Performed
Avg_Seats = Seats / Departures Performed
Avg_Freight = Freight / Departures Performed
Avg_Mail = Mail / Departures Performed
Avg_Ramp_to_Ramp = Ramp_to_Ramp / Departures Performed
Avg_AirTime = AirTime / Departures Performed
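The sum-to-average conversion is a straightforward division; a minimal pandas sketch (invented numbers, with the same caveat as above that the project's actual implementation was in SAS):

```python
import pandas as pd

# Invented daily totals; dividing by departures performed removes the bias
# where busier records mechanically accumulate larger sums.
df = pd.DataFrame({
    "Departures_Performed": [4, 2],
    "Passengers": [400, 150],
    "Seats": [480, 180],
    "AirTime": [360, 150],
})

for col in ["Passengers", "Seats", "AirTime"]:
    df["Avg_" + col] = df[col] / df["Departures_Performed"]
```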
3. Analyzed the data individually for each dataset
The two datasets we are interested in relate to flight cancellations and delays respectively. They have
different primary keys, and their internal calculation logic is intuitively different. Therefore, we decided
not to merge them, and analyzed them individually.
Exploratory Plots and Tables
We explored both of our datasets to find relations between variables, and also looked for interesting
patterns related to flight cancellations using Tableau.
Interesting Relationships
Using a scatter plot in the data exploration menu in SAS we were able to arrive at some interesting
relationships between key variables in our data set.
a) Departures Performed:
We plotted the variable “departures_performed” against the variable “Airline_ID” with respect to
“Flight_Cancelled”. The color blue indicates that a flight was not cancelled and the color red indicates
that a flight was cancelled. The graph shows that the density of red points is very high for departures
exceeding 150; in other words, airlines with a higher number of departures also had more flight
cancellations.
The departures_performed variable was noted for further investigation.
b) Number of Passengers:
We plotted the variable “Total Passengers” against the variable “Airport_ID” with respect to
“Flight_Cancelled”. The color blue indicates that a flight was not cancelled and the color red
indicates that a flight was cancelled. An increase in the number of red pixels above the 2500
passenger mark can be observed; that is, airports that handled more passengers also had more flight
cancellations.
The total_passengers variable was noted for further investigation.
c) Distance
We plotted the variable "Distance from Origin" against the variable "Dest_Airport_ID" with respect to
“Flight_Cancelled”. The color blue indicates that a flight was not cancelled and the color red indicates
that a flight was cancelled. Distances between the 500 and 750 mile marks show a larger density of red
points; that is, shorter-distance flights see more flight cancellations.
The distance variable was noted for further investigation.
Using Tableau, we looked for interesting facts about key variables.
a) Monthly Distribution of cancellations:
The charts above show that June and July are the months with the most flight delays and
cancellations. The number of diverted flights also increases in June and July.
b) Geographic distribution of flight delays
The three graphs above show that:
1. Georgia had the maximum flights delayed due to weather.
2. Texas had the maximum flights delayed due to security checks.
3. Thursday sees the maximum amount of flight delays.
Unsupervised and Supervised Analytics Models
For this project, we used k-means clustering, as our unsupervised model, and tried decision trees and
regression models for each of the three domains: airports, airlines and segments.
Unsupervised Learning Model
In the segments domain, on running a K-means cluster analysis, we found the following:
We had 46 clusters of segments. We were primarily interested in grouping segments based on the
departures performed and the total flights cancelled in that segment.
We determined 5 major clusters. Departures performed ranged from 6 to 864 across the clusters, and
flights cancelled ranged from 0 to 75. The five clusters, in decreasing order of frequency, were:
● The largest cluster comprised segments averaging approximately 9 departures and 0.05 flight
cancellations.
● The next comprised segments averaging approximately 55 departures and 0.21 cancellations.
● The next comprised segments averaging approximately 37.4 departures and 3 cancellations.
● The next comprised segments averaging approximately 119 departures and 0.39 cancellations.
● The smallest comprised segments averaging approximately 88 departures and 2.2 cancellations.
We could not identify a significant trend with this model, so we moved on to predictive modelling.
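For readers who want to reproduce the idea of this clustering step, here is a minimal scikit-learn sketch. This is an illustrative stand-in, not the SAS k-means run we performed; the synthetic points only loosely mimic the departure and cancellation ranges reported above:

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Three synthetic groups of (departures performed, flights cancelled) pairs.
low  = rng.normal([9, 0.05], [3, 0.1],  size=(200, 2))
mid  = rng.normal([88, 2.2], [15, 1.0], size=(60, 2))
high = rng.normal([500, 40], [80, 10],  size=(20, 2))
X = np.clip(np.vstack([low, mid, high]), 0, None)

km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
# Cluster centers sorted by average departures performed.
centers = km.cluster_centers_[np.argsort(km.cluster_centers_[:, 0])]
```

Sorting the centers by mean departures makes the cluster profiles directly comparable to the bullet list above.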
Supervised Learning Models
The two models that we looked at were:
1. Regression
2. Decision Tree
We based our final analysis on whichever of the two had the lower average square error (ASE).
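Average square error is simply the mean squared difference between the predicted probability and the 0/1 target on the validation partition. A tiny sketch of the comparison (the numbers are invented, not our actual model outputs):

```python
def ase(targets, predictions):
    """Average squared error over a validation partition."""
    return sum((t - p) ** 2 for t, p in zip(targets, predictions)) / len(targets)

# Invented validation targets and two models' predicted probabilities.
y_valid = [0, 0, 1, 0, 1]
p_tree  = [0.1, 0.2, 0.7, 0.0, 0.8]
p_reg   = [0.3, 0.4, 0.5, 0.2, 0.6]

best = "tree" if ase(y_valid, p_tree) < ase(y_valid, p_reg) else "regression"
```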
Regression Analysis
We conducted regression analysis to determine the significant factors that influence flight cancellations,
performing backward, forward, and stepwise selection. The diagram below shows the regression flow:
The following actions were performed on the data:
1. Data Partition: The data was partitioned into training and validation sets for model fitting and to prevent
overfitting the training data.
2. Impute: The data was imputed to fill in the missing values.
3. Regression Snapshots:
Stepwise Regression (with Airline ID as target):
The ASE for validation (stepwise): 0.100689
We examined the regressions for the other selection methods as well, and chose stepwise because it had
the lowest average square error.
Output of the stepwise Regression, depicting all significant variables:
Stepwise Regression (with Origin Airport ID as target):
The ASE for this model was 0.112633.
Similarly, for the segment-wise regression model analysis, we got an ASE of 0.090134.
The errors we saw with the regression models were higher than those from the decision tree, so we
rejected the regression models and based our analysis on the decision tree.
Decision Tree Analysis
Decision trees are a simple but powerful form of multiple-variable analysis. They provide unique capabilities
to supplement, complement, and substitute for traditional statistical forms of analysis. To assess the
important variables in this study, we applied the decision tree model in SAS to identify the critical variables
in our dataset. Using cross validation, we found the most important variables for our target and conducted
further analysis to provide business suggestions on the factors that affect flight cancellations.
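The same idea can be sketched with scikit-learn in Python (a hypothetical stand-in for the SAS decision tree node; the synthetic target simply hard-codes the frequent-departure, low-payload pattern reported later, so the tree is expected to recover those thresholds):

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(1)
n = 500
departures = rng.integers(1, 200, n).astype(float)
avg_payload = rng.uniform(1, 30, n)
# Synthetic rule: frequent, low-payload flights get cancelled.
y = ((departures > 70) & (avg_payload < 9)).astype(int)
X = np.column_stack([departures, avg_payload])

tree = DecisionTreeClassifier(max_depth=4, random_state=0)
scores = cross_val_score(tree, X, y, cv=5)  # 5-fold cross validation

tree.fit(X, y)
importance = tree.feature_importances_  # which inputs drive the splits
```

The cross-validated scores guard against an overgrown tree memorizing the training partition, which is the same role the validation ASE plays in the SAS workflow.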
A) Based on Airline ID domain
Experiment Methodology:
1. Import the following dataset: T-100 segment data for the months of May, June, and July (84,232 rows).
2. Edit variables and assign a role and level to each variable
Variable: Role, Level

Airline ID: ID, Nominal
Aircraft Config: Input, Interval
Aircraft Group: Input, Interval
Aircraft Categorization: Input, Nominal
Departure Performed: Input, Interval
Class: Input, Nominal
Average Freight: Input, Interval
Average Airtime: Input, Interval
Average Total Time on Ground at Both Airports: Input, Interval
Average Mail: Input, Interval
Average Passengers: Input, Interval
Average Payload: Input, Interval
Average Ramp to Ramp: Input, Interval
Distance: Input, Interval
Month: Input, Interval
Flight Cancelled: Target, Nominal
The other variables, which are not important for this analysis, were rejected.
3. Data Partition
70% for training and 30% for validation; all other options follow the default settings.
4. Transformation
Variable transformations can be used to stabilize variance, remove nonlinearity, improve
additivity, and counter non-normality. The following variables were transformed to address
these irregularities:
Variable: Method

Average Ramp to Ramp: Log
Average Payload: Log
Average Passengers: Log
Average Airtime: Log
Aircraft Categorisation: Dummy Indicator
Class: Dummy Indicator
After transformation, the variables' skewness was reduced considerably, as seen in the figures below:
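A small Python sketch of the two transformation methods, using synthetic data rather than our actual variables, shows why they help: the log pulls in the right tail of a skewed variable, and dummy indicators recode a nominal input into numeric columns:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(2)
# Right-skewed stand-in for a variable like Average Payload.
payload = pd.Series(np.exp(rng.normal(2.0, 1.0, 5000)))
log_payload = np.log(payload)
# Skewness drops from strongly positive to roughly zero after the log.

# Dummy indicators for a nominal input such as Class.
classes = pd.Series(["F", "G", "L", "F"])
dummies = pd.get_dummies(classes, prefix="Class")
```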
5. Decision Tree Analysis
Applied with cross validation; the remaining options follow the default settings.
6. Results
The ASE for the validation data is 0.078363.
Decision Tree:
We also looked at the various important variables for this dataset:
The subtree assessment plot depicted that the tree was pruned such that there are 45 leaves.
7. Outcomes
For a given airline, if:
● the number of departures performed is more than approximately 3,
● the average number of passengers travelling is less than approximately 3
then there is a 99.6% probability that a flight of that airline will not be cancelled.
For a given airline, if:
● the average payload is less than 10 pounds,
● the Class is F,
● the departures performed are less than 49,
then there is an 82.4% probability that the flight would get cancelled.
For a given airline, if:
● the departures performed are more than 70,
● the average payload is more than 9 pounds,
● the average total time on ground is more than 18 minutes,
then there is an 83.3% probability that the flight would get cancelled.
B) Based on Airport ID
Changing the ID variable to Origin Airport ID and keeping the other configurations similar, we see the following
results:
The ASE for the validation data is 0.0987131.
The decision tree:
We see that the same set of variables were important for this analysis as well:
The subtree assessment plot with the average square errors:
Outcomes
For a given airport, if:
● the departures performed are more than 42,
● the average payload is less than 10 pounds,
● the average mail carried is more than 1,
then it is very unlikely that the flight would get cancelled (100% of such flights were not cancelled).
For a particular airport, if:
● the departures performed are more than 70,
● the flights belong to Class F,
● the average payload is less than 10 pounds and Aircraft Config is less than 2,
then it is 83.6% likely that the flight would get cancelled.
C) Based on Segments (Origin Airport ID and Destination Airport ID pairs)
Experiment Methodology:
1. Import the following dataset: T-100 segment data for the months of May, June, and July (84,232 rows).
2. Edit variables and assign a role and level to each variable
Variable: Role, Level

Origin_Airport_ID: ID, Nominal
Dest_Airport_ID: ID, Nominal
flightAdHoc?: Input, Binary
Aircraft Config: Input, Interval
Aircraft Group: Input, Interval
Aircraft Categorization: Input, Nominal
Departure Performed: Input, Interval
Class: Input, Nominal
Average Freight: Input, Interval
Average Airtime: Input, Interval
Average Total Time on Ground at Both Airports: Input, Interval
Average Mail: Input, Interval
Average Passengers: Input, Interval
Average Payload: Input, Interval
Distance: Input, Interval
Month: Input, Interval
Flight Cancelled?: Target, Nominal
The other variables, which are not important for this analysis, were rejected.
3. Data Partition
70% for training and 30% for validation; all other options follow the default settings.
4. Transformation
Variable: Method

Average Payload: Log
Average Passengers: Log
Average Airtime: Log
Aircraft Categorisation: Dummy Indicator
Class: Dummy Indicator
After transformation, the variables' skewness was reduced considerably, as seen in the figures shown
above in the airline-based analysis.
5. Decision Tree Analysis
Applied with cross validation; the remaining options follow the default settings.
6. Results
The ASE for the validation data is 0.081963.
Decision Tree:
We also looked at the various important variables for this dataset:
The subtree assessment plot depicted that the tree was pruned such that there are 36 leaves.
7. Outcomes
For a given segment, if:
● the number of departures performed is more than approximately 70,
● the average allotted payload is less than approximately 9 pounds,
then there is an 88% probability that flights in that segment will get cancelled.
For a given segment, if:
● the number of departures performed is more than approximately 70,
● the average allotted payload is more than approximately 9 pounds,
● the average total time on ground at both the source and destination airports is greater than
approximately 19 minutes,
then there is an 83.3% probability that flights in that segment will get cancelled.
For a given segment, if:
● the number of departures performed is less than approximately 10 and greater than 2,
● the flights took off ad hoc without being scheduled,
then there is a 94.7% probability that flights in that segment will not get cancelled.
Recommendations and Conclusion
Important Variables Venn Analysis
We performed a Venn analysis on the important variables in each of the three domains and plotted them,
focusing on the variables that mattered most in arriving at our recommendations.
● Departures Performed and Avg. Payload are the most important variables in our analysis across all
three domains. They are the game-changing decider variables for cancellations in segments,
airlines, and airports.
● Airlines and segments share average total time on ground at both source and destination as an
important variable. This is interesting because it is counter-intuitive: one would expect this to
appear as a decider variable for airports.
● Airlines and airports share the aircraft_class variable.
● FlightAdHoc, Avg. Passengers, and Aircraft Config together with Avg. Mails are important for
segments, airlines, and airports, respectively.
Findings and Recommendations
Segments
Findings:
● Segments whose flights carry very little payload on average (< 8 pounds) but fly frequently are
likely to see cancellations. Moreover, segments whose flights carry higher payloads and fly
frequently, but spend more than 18 minutes on the ground at both the source and destination
airports, are also likely to see cancellations.
● Segments with few departures whose flights take off without being scheduled see little or no
cancellation.
Recommendations:
● The airport should pilot a program to redirect some congested segments' traffic to the runways
that handle non-scheduled flights. Based on the results, it can determine whether the priority
given to non-scheduled aircraft was causing cancellations.
● A new runway should be opened to speed up ground handling and reduce the average time that
higher-payload aircraft spend on the ground at both source and destination.
● The airports are accommodating flights in non-congested segments, even unscheduled ones,
while flights in congested, heavy-traffic segments are being cancelled, whether they carry few
passengers or carry passengers and cargo and spend a long time on the ground at both source
and destination.
Airlines:
Findings:
● Small flights (accommodating three or fewer people) that fly more often (more than 3
departures) have very little chance of being cancelled.
● Flights that fly often with little payload (less than 9 pounds) tend to get cancelled more often;
they also spend a considerable amount of time at the airports (about 18 minutes).
Recommendations:
● The last recommendation for the segments domain ties into the same point for airlines. Airline
ground crews should ensure quick ground handling at the airport for higher-payload aircraft at
both source and destination.
● The payload analysis from segments is consistent with our finding for aircraft with fewer
passengers. Just as low-payload, high-departure segment flights were getting cancelled, the
same holds true for airlines. Airline ground staff at airports should be alert when these flights
are scheduled to arrive and depart, to make sure handling time is fast.
Airports:
Findings:
● For airports with frequent departures (more than 70), relatively less payload (10 pounds or
less), Class F flights, and average mail being loaded into the aircraft, it is very likely that these
flights will get cancelled.
Recommendations:
● As these delays affect a large population, airports should examine scheduled passenger/cargo
service flights to understand why they result in frequent cancellations. From our findings, it is
apparent that handling time, in terms of baggage and mail loading into the aircraft, is a deciding
factor in cancellations, alongside the other important variables. In conclusion, handling at the
airports is taking too long.
Possible next steps
According to the Wall Street Journal, illness, family emergencies, and rescheduled business meetings are
big business for airline companies.3 At some airlines, the resulting change fees and penalties that
passengers ended up paying added up to $2 billion a year, even more than total baggage fees. If airlines
delve deeper into seasonal client data to find cancellation patterns on the passenger side, and adjust
change fees and penalties according to those patterns, they can generate higher revenue.
3 Source: http://online.wsj.com/news/articles/SB10001424052970204563304574318212311819146