Marwan Ashraf
30/09/2023
• Executive Summary
• Introduction
• Methodology
• Results
• Conclusion
• Appendix
Outline
● Summary of methodologies
○ Data Collection with API
○ Data Collection with Web Scraping
○ Data Wrangling
○ Exploratory Data Analysis with SQL
○ Exploratory Data Analysis with Visualization
○ Interactive map with Folium
○ Interactive Dashboards with Dash
○ Model prediction with Machine Learning
● Summary of all results
○ Exploratory Data Analysis result
○ Interactive Analytics visuals
○ Predictive modeling results
Executive Summary
Introduction
● Project background and context
SpaceX advertises Falcon 9 rocket launches on its website at a cost of
62 million dollars; other providers charge upward of 165 million dollars each.
Much of the savings comes from SpaceX being able to reuse the first stage.
Therefore, if we can determine whether the first stage will land, we can
estimate the cost of a launch. This information can be used by a competing
company that wants to bid against SpaceX for a rocket launch.
● Problems we want to answer
○ What factors determine whether a launch is successful?
○ How do the various features interact to determine the success rate
of a landing?
○ What operating conditions need to be in place to ensure a
successful landing program?
Section 1
Executive Summary
Data collection methodology:
Data was collected using the SpaceX API and web scraping from Wikipedia.
Data wrangling:
One-hot encoding was applied to categorical features.
Exploratory data analysis (EDA) using visualization and SQL
Interactive visual analytics using Folium and Plotly Dash
Predictive analysis using classification models:
Classification models were built, tuned, and evaluated.
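One-hot encoding of a categorical feature, as mentioned in the wrangling step, might look like the sketch below. It is only an illustration: the column names `Orbit` and `PayloadMass` and the sample values are assumptions, and `pandas.get_dummies` is one common way to do this, not necessarily the one used in the notebook.

```python
import pandas as pd

# Hypothetical sample; the column names are assumptions for illustration.
df = pd.DataFrame({
    "Orbit": ["LEO", "GTO", "LEO"],
    "PayloadMass": [500.0, 3000.0, 700.0],
})

# Replace the categorical column with one indicator column per category,
# then cast everything to float so a model can consume the table directly.
encoded = pd.get_dummies(df, columns=["Orbit"]).astype(float)

print(encoded)
```

Each distinct orbit value becomes its own 0/1 column, so no artificial ordering is imposed on the categories.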
Methodology
● The data was collected by several methods
○ Data collection with the SpaceX API
○ Next, I decoded the response content as JSON using the .json() method
and turned it into a pandas DataFrame using pd.json_normalize().
○ Then I cleaned the data by checking for missing values and filling them
in where necessary.
○ I also scraped Falcon 9 launch records from Wikipedia with
BeautifulSoup.
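The decoding and cleaning steps above can be sketched as follows. This is a minimal illustration, not the notebook's code: the sample records and field names (`flight_number`, `payload.mass_kg`, `launchpad`) are hypothetical stand-ins for a decoded API response, and filling gaps with the column mean is an assumed cleaning rule.

```python
import pandas as pd

# Hypothetical stand-in for the list returned by response.json();
# a real run would obtain it from a GET request to the API.
records = [
    {"flight_number": 1, "payload": {"mass_kg": 500.0}, "launchpad": "A"},
    {"flight_number": 2, "payload": {"mass_kg": None}, "launchpad": "B"},
    {"flight_number": 3, "payload": {"mass_kg": 700.0}, "launchpad": "A"},
]

# Flatten nested JSON objects into columns such as "payload.mass_kg".
df = pd.json_normalize(records)

# Count missing values per column before cleaning.
missing_before = df.isnull().sum()

# Assumed cleaning rule: replace missing payload masses with the column mean.
mean_mass = df["payload.mass_kg"].mean()
df["payload.mass_kg"] = df["payload.mass_kg"].fillna(mean_mass)

print(missing_before)
print(df)
```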
Data Collection
• I sent a GET request to the
SpaceX API to collect the data,
then cleaned the response
• The notebook GitHub link
Data Collection – SpaceX API
• I used web scraping
techniques to obtain
Falcon 9 launch data
from Wikipedia
• The notebook GitHub link
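A minimal sketch of the scraping step with BeautifulSoup is shown below. The HTML fragment, its column headings, and the cell values are hypothetical stand-ins for the downloaded Wikipedia page, so the real table will need additional cleanup.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML fragment standing in for the downloaded page.
html = """
<table class="wikitable">
  <tr><th>Flight No.</th><th>Launch site</th><th>Payload mass</th></tr>
  <tr><td>1</td><td>CCAFS</td><td>500 kg</td></tr>
  <tr><td>2</td><td>KSC</td><td>3,000 kg</td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")
table = soup.find("table", class_="wikitable")

# Column names come from the header cells.
headers = [th.get_text(strip=True) for th in table.find_all("th")]

# Each remaining row becomes a dict keyed by column name.
launches = []
for tr in table.find_all("tr")[1:]:
    cells = [td.get_text(strip=True) for td in tr.find_all("td")]
    launches.append(dict(zip(headers, cells)))

print(launches)
```

The resulting list of dicts can be passed straight to a DataFrame constructor for further wrangling.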
Data Collection - Scraping
• I performed exploratory data analysis and
determined the training labels.
• I calculated the number of launches at
each site, and the number of occurrences of
each orbit
• I created a landing outcome label from the
outcome column and exported the results to
CSV
• The notebook GitHub link
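The counting and labeling steps above could be sketched as below, using only the standard library. The column names, the sample rows, and the outcome strings are assumptions about the data layout, and the rule "an outcome is a success only if it starts with `True`" is an illustrative choice rather than the notebook's confirmed logic.

```python
from collections import Counter

# Hypothetical sample rows; names and outcome strings are assumptions.
launches = [
    {"LaunchSite": "Site A", "Orbit": "LEO", "Outcome": "True ASDS"},
    {"LaunchSite": "Site A", "Orbit": "GTO", "Outcome": "False ASDS"},
    {"LaunchSite": "Site B", "Orbit": "LEO", "Outcome": "True RTLS"},
    {"LaunchSite": "Site A", "Orbit": "ISS", "Outcome": "None None"},
]

# Number of launches at each site and number of occurrences of each orbit.
launches_per_site = Counter(row["LaunchSite"] for row in launches)
orbit_counts = Counter(row["Orbit"] for row in launches)

# Assumed rule: label 1 for a successful landing, 0 for anything else.
for row in launches:
    row["Class"] = 1 if row["Outcome"].startswith("True") else 0

labels = [row["Class"] for row in launches]
success_rate = sum(labels) / len(labels)

print(launches_per_site, orbit_counts, success_rate)
```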
Data Wrangling
• We explored the data by visualizing the
relationships between flight number and
launch site, payload and launch site,
success rate and orbit type, and flight
number and orbit type, as well as the
yearly launch success trend.
EDA with Data Visualization
• The notebook github link
● SQL queries performed include:
○ Displaying the names of the unique launch sites in the space mission
○ Displaying 5 records where the launch site begins with the string ‘KSC’
○ Displaying the total payload mass carried by boosters launched by NASA (CRS)
○ Displaying the average payload mass carried by booster version F9 v1.1
○ Listing the data where a successful landing outcome on a drone ship was achieved
○ Listing the names of the boosters that had a successful ground pad landing and a payload mass
greater than 4000 but less than 6000
○ Listing the total number of successful and failed mission outcomes
○ Listing the names of the booster versions that carried the maximum payload mass
○ Listing the month names, successful ground pad landing outcomes, booster versions, and
launch sites for the months of 2017
○ Ranking the count of successful landing outcomes between 2010 and 2017
● The SQL Code
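A few of the queries listed above can be sketched against an in-memory SQLite database. The table name `launches`, its column names, and the sample rows are hypothetical; the original SQL and database engine are not shown in the slides, so this only illustrates the DISTINCT, LIKE, SUM, and AVG patterns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE launches ("
    "launch_site TEXT, customer TEXT, booster_version TEXT, payload_mass_kg REAL)"
)
# Hypothetical rows; only the naming style follows the slides.
conn.executemany(
    "INSERT INTO launches VALUES (?, ?, ?, ?)",
    [
        ("KSC LC-39A", "NASA (CRS)", "F9 FT", 2500.0),
        ("KSC LC-39A", "Customer X", "F9 FT", 5000.0),
        ("Site B", "NASA (CRS)", "F9 v1.1", 2000.0),
        ("Site B", "Customer Y", "F9 v1.1", 3000.0),
    ],
)

# Unique launch sites.
sites = [r[0] for r in conn.execute(
    "SELECT DISTINCT launch_site FROM launches ORDER BY launch_site")]

# Up to 5 records whose launch site begins with 'KSC'.
ksc_rows = conn.execute(
    "SELECT * FROM launches WHERE launch_site LIKE 'KSC%' LIMIT 5").fetchall()

# Total payload mass for one customer.
total_nasa = conn.execute(
    "SELECT SUM(payload_mass_kg) FROM launches WHERE customer = ?",
    ("NASA (CRS)",)).fetchone()[0]

# Average payload mass for one booster version.
avg_v11 = conn.execute(
    "SELECT AVG(payload_mass_kg) FROM launches WHERE booster_version = ?",
    ("F9 v1.1",)).fetchone()[0]

print(sites, len(ksc_rows), total_nasa, avg_v11)
```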
EDA with SQL
• I added markers to the map with the
aim of finding an optimal location
for building a launch site
• The notebook GitHub link
Build an Interactive Map with Folium
• I built an interactive dashboard with Plotly and Dash
• I plotted pie charts showing the total launches by site
• I plotted a scatter plot showing the relationship between outcome and payload mass
for different booster versions
• The code GitHub link
Build a Dashboard with Plotly Dash
• I loaded the data using NumPy and pandas, transformed the data, and split it into training
and testing sets.
• I built different machine learning models and tuned their hyperparameters using
GridSearchCV.
• I used accuracy as the metric for the models and improved them using feature engineering
and algorithm tuning.
• I found the best performing classification model.
• The notebook link
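The split, tune, and score workflow described above might be sketched like this with scikit-learn. The data here is synthetic (generated by `make_classification`), the parameter grid is an arbitrary small example, and only a decision tree is tuned; the actual features, grids, and set of models used in the notebook are not shown in the slides.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the prepared launch features and landing labels.
X, y = make_classification(n_samples=90, n_features=8, random_state=0)

# Hold out 20% of the rows for the final accuracy check.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=2)

# Illustrative hyperparameter grid (assumed values).
param_grid = {"max_depth": [2, 4, 6], "criterion": ["gini", "entropy"]}

# 5-fold cross-validated grid search, scored by accuracy.
search = GridSearchCV(
    DecisionTreeClassifier(random_state=0), param_grid, cv=5, scoring="accuracy")
search.fit(X_train, y_train)

# Accuracy of the refitted best estimator on the held-out test set.
test_accuracy = search.score(X_test, y_test)

print(search.best_params_, search.best_score_, test_accuracy)
```

Repeating the same search with other estimators and comparing `best_score_` and the held-out accuracy is one way to pick the best performing model.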
Predictive Analysis (Classification)
• Exploratory data analysis results
• Interactive analytics demo in screenshots
• Predictive analysis results
Results
Section 2
• From the plot, we found that the more flights a launch site has, the
greater the success rate at that site
Flight Number vs. Launch Site
● From the plot, we found that the more flights a launch site has,
the greater the success rate at that site
Payload vs. Launch Site
• From the plot, we can see that ES-L1, GEO, HEO, SSO, and VLEO had the
highest success rates.
Success Rate vs. Orbit Type
• The plot shows flight number vs. orbit type. We observe that in the LEO orbit,
success is related to the number of flights, whereas in the GTO orbit there is no apparent
relationship between flight number and success
Flight Number vs. Orbit Type
• We can observe that with heavy payloads, successful landings are more frequent for
PO, LEO, and ISS orbits.
Payload vs. Orbit Type
• From the plot, we can
observe that the success rate
kept increasing from 2013
until 2020.
Launch Success Yearly Trend
• I used the DISTINCT keyword to
show only the unique launch sites
from the SpaceX data.
All Launch Site Names
• This query displays 5
records where the
launch site begins with
‘CCA’
Launch Site Names Begin with 'CCA'
• This query calculates the
total payload carried by
boosters from NASA as
45596
Total Payload Mass
• This query calculates
the average payload
mass carried by
booster version F9
v1.1 as 2928.4
Average Payload Mass by F9 v1.1
• This query shows that
the date of the first
successful landing
outcome on ground pad
was 22nd December 2015
First Successful Ground Landing Date
• We used the WHERE clause to filter for boosters that successfully landed
on a drone ship and applied an AND condition to keep only landings
with a payload mass greater than 4000 but less than 6000
Successful Drone Ship Landing with Payload between 4000 and 6000
• This query counts the number of mission outcomes, grouped by
mission outcome. The result shows 100 successful missions
and 1 failed mission.
Total Number of Successful and Failure Mission Outcomes
• We determined the boosters that have carried the maximum payload using
a subquery in the WHERE clause and the MAX() function.
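The subquery pattern described above can be sketched with SQLite as follows. The table name, column names, booster names, and payload values are hypothetical; the point is only that the inner `SELECT MAX(...)` yields a single value that the outer WHERE clause compares against.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE launches (booster_version TEXT, payload_mass_kg REAL)")
# Hypothetical rows: two boosters tie for the maximum payload.
conn.executemany(
    "INSERT INTO launches VALUES (?, ?)",
    [
        ("Booster A", 9000.0),
        ("Booster B", 12000.0),
        ("Booster C", 12000.0),
        ("Booster D", 4000.0),
    ],
)

# The inner query finds the maximum payload; the outer query returns
# every booster whose payload equals it.
query = """
SELECT booster_version
FROM launches
WHERE payload_mass_kg = (SELECT MAX(payload_mass_kg) FROM launches)
ORDER BY booster_version
"""
heaviest = [row[0] for row in conn.execute(query)]

print(heaviest)
```

Comparing against the subquery result, rather than using `ORDER BY ... LIMIT 1`, keeps all boosters that tie for the maximum.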
Boosters Carried Maximum Payload
• We used a combination of the WHERE clause and LIKE, AND, and BETWEEN conditions to
filter for failed landing outcomes on a drone ship, their booster versions, and launch site
names for the year 2015
2015 Launch Records
• We selected the landing outcomes and the COUNT of landing outcomes from the data and used the WHERE clause
to filter for landing outcomes BETWEEN 2010-06-04 AND 2017-03-20.
We applied the GROUP BY clause to group the landing outcomes and the ORDER BY clause to order the grouped
landing outcomes in descending order
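A sketch of this ranking query is shown below using SQLite. The table layout, the outcome strings, and the dates of the sample rows are hypothetical; dates are stored as ISO-8601 text so that BETWEEN compares them correctly as strings.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE launches (launch_date TEXT, landing_outcome TEXT)")
# Hypothetical rows; the last one falls outside the date range.
conn.executemany(
    "INSERT INTO launches VALUES (?, ?)",
    [
        ("2010-06-04", "Failure (parachute)"),
        ("2012-05-22", "No attempt"),
        ("2013-03-01", "No attempt"),
        ("2014-04-18", "No attempt"),
        ("2016-04-08", "Success (drone ship)"),
        ("2016-05-06", "Success (drone ship)"),
        ("2017-06-03", "Success (ground pad)"),
    ],
)

# Filter by date range, count per outcome, and rank by count (descending).
query = """
SELECT landing_outcome, COUNT(*) AS n
FROM launches
WHERE launch_date BETWEEN '2010-06-04' AND '2017-03-20'
GROUP BY landing_outcome
ORDER BY n DESC, landing_outcome
"""
ranking = conn.execute(query).fetchall()

print(ranking)
```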
Rank Landing Outcomes Between 2010-06-04 and 2017-03-20
Section 3
All launch sites global map markers
The first 3 locations are launch sites in Florida, while the last image shows the launch site in
California. Each launch site has a label; a green label indicates that the launch was
successful.
Showing launch sites with colored labels
This map shows the distance between the launch site and its closest railway, highway,
coast, and city. It helps us understand that the closest feature to this launch site is the
highway, followed by the coast and then the railway. The farthest feature is the city.
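Distances like the ones drawn on this map are commonly computed with the haversine formula. The sketch below is an assumption about how such distances could be obtained; the coordinates are invented for illustration and are not the real positions of any launch site or landmark.

```python
from math import asin, cos, radians, sin, sqrt


def haversine_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Great-circle distance in kilometers between two (lat, lon) points."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = radians(lat2 - lat1)
    dlambda = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlambda / 2) ** 2
    return 2 * radius_km * asin(sqrt(a))


# Hypothetical coordinates (degrees) for a site and nearby features.
site = (28.5, -80.6)
features = {
    "highway": (28.5, -80.595),
    "coast": (28.5, -80.59),
    "railway": (28.512, -80.6),
    "city": (28.6, -80.8),
}

# Distance from the site to each feature, then features ranked by proximity.
distances = {name: haversine_km(*site, *pos) for name, pos in features.items()}
ranked = sorted(distances, key=distances.get)

print(ranked)
```

With these invented coordinates the ranking reproduces the order described on the slide: highway, coast, railway, then city.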
Launch Site Proximity Mapping: Railway, Highway, Coast, City
Section 4
Here we can see that KSC LC-39A has the most successful launches
of all launch sites
Pie Chart Depicting the Success Rate of Each Launch Site
KSC LC-39A has a 76.9% success rate and a 23.1% failure rate.
Pie Chart Showing the Launch site with the highest success ratio
We can see that the success rate of light payloads is higher than
that of heavy payloads
Correlation Between Payload Mass and The success of all
Section 5
On test accuracy, all methods performed similarly. More test data would help
to decide between them. If we had to choose one right now,
we would take the decision tree.
Classification Accuracy
• The 4 models have the same
confusion matrix, as they have
the same test accuracy.
The main problem with
these models is false
positives
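To make the false-positive issue concrete, the sketch below counts the four confusion-matrix cells by hand. The label vectors are hypothetical and are not the results from the slides; a false positive here means the model predicted a successful landing (1) for a launch that did not land (0).

```python
def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for binary labels (1 = landed)."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for truth, pred in zip(y_true, y_pred):
        if pred == 1:
            counts["tp" if truth == 1 else "fp"] += 1
        else:
            counts["tn" if truth == 0 else "fn"] += 1
    return counts


# Hypothetical test labels and predictions.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 1, 1, 1, 0, 0]

counts = confusion_counts(y_true, y_pred)
accuracy = (counts["tp"] + counts["tn"]) / len(y_true)
precision = counts["tp"] / (counts["tp"] + counts["fp"])

print(counts, accuracy, precision)
```

In this toy case every error is a false positive, so accuracy (0.75) looks reasonable while precision is lower, which is the pattern the slide describes.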
Confusion Matrix
• The success of a mission can be explained by several factors, such as the launch site,
the orbit, and especially the number of previous launches. Indeed, we can assume that
knowledge gained between launches made it possible to go from
launch failure to success.
• The orbits with the best success rates are GEO, HEO, SSO, and ES-L1.
• Depending on the orbit, the payload mass can be a criterion to take into account for
the success of a mission. Some orbits require a light or heavy payload mass, but
in general light payloads perform better than heavy payloads.
• With the current data, we cannot explain why some launch sites are better than others
(KSC LC-39A is the best launch site). To answer this question, we could obtain
atmospheric or other relevant data.
• For this dataset, we chose the Decision Tree algorithm as the best model, even though the
test accuracy of all the models used is identical, because it has
better training accuracy.
Conclusions
IBM_Data_Science_Capstone_Pressenation.pptx

More Related Content

What's hot

Spark core
Spark coreSpark core
Spark core
Freeman Zhang
 
Battle of the Stream Processing Titans – Flink versus RisingWave
Battle of the Stream Processing Titans – Flink versus RisingWaveBattle of the Stream Processing Titans – Flink versus RisingWave
Battle of the Stream Processing Titans – Flink versus RisingWave
Yingjun Wu
 
Spark Performance Tuning .pdf
Spark Performance Tuning .pdfSpark Performance Tuning .pdf
Spark Performance Tuning .pdf
Amit Raj
 
Streaming Operational Data with MariaDB MaxScale
Streaming Operational Data with MariaDB MaxScaleStreaming Operational Data with MariaDB MaxScale
Streaming Operational Data with MariaDB MaxScale
MariaDB plc
 
Discover PostGIS: Add Spatial functions to PostgreSQL
Discover PostGIS: Add Spatial functions to PostgreSQLDiscover PostGIS: Add Spatial functions to PostgreSQL
Discover PostGIS: Add Spatial functions to PostgreSQL
EDB
 
Visual Api Training
Visual Api TrainingVisual Api Training
Visual Api Training
Spark Summit
 
Backup and-recovery2
Backup and-recovery2Backup and-recovery2
Backup and-recovery2
Command Prompt., Inc
 
Welcome to the Flink Community!
Welcome to the Flink Community!Welcome to the Flink Community!
Welcome to the Flink Community!
Flink Forward
 
Using Queryable State for Fun and Profit
Using Queryable State for Fun and ProfitUsing Queryable State for Fun and Profit
Using Queryable State for Fun and Profit
Flink Forward
 
Implementing Highly Performant Distributed Aggregates
Implementing Highly Performant Distributed AggregatesImplementing Highly Performant Distributed Aggregates
Implementing Highly Performant Distributed Aggregates
ScyllaDB
 
Graph Database 101- What, Why and How?.pdf
Graph Database 101- What, Why and How?.pdfGraph Database 101- What, Why and How?.pdf
Graph Database 101- What, Why and How?.pdf
Neo4j
 
Handling Data Skew Adaptively In Spark Using Dynamic Repartitioning
Handling Data Skew Adaptively In Spark Using Dynamic RepartitioningHandling Data Skew Adaptively In Spark Using Dynamic Repartitioning
Handling Data Skew Adaptively In Spark Using Dynamic Repartitioning
Spark Summit
 
M|18 Battle of the Online Schema Change Methods
M|18 Battle of the Online Schema Change MethodsM|18 Battle of the Online Schema Change Methods
M|18 Battle of the Online Schema Change Methods
MariaDB plc
 
Presto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EnginePresto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything Engine
DataWorks Summit
 
Physical Plans in Spark SQL
Physical Plans in Spark SQLPhysical Plans in Spark SQL
Physical Plans in Spark SQL
Databricks
 
Data Storage Tips for Optimal Spark Performance-(Vida Ha, Databricks)
Data Storage Tips for Optimal Spark Performance-(Vida Ha, Databricks)Data Storage Tips for Optimal Spark Performance-(Vida Ha, Databricks)
Data Storage Tips for Optimal Spark Performance-(Vida Ha, Databricks)
Spark Summit
 
Manage your ODI Development Cycle – ODTUG Webinar
Manage your ODI Development Cycle – ODTUG WebinarManage your ODI Development Cycle – ODTUG Webinar
Manage your ODI Development Cycle – ODTUG Webinar
Jérôme Françoisse
 
Extreme replication at IOUG Collaborate 15
Extreme replication at IOUG Collaborate 15Extreme replication at IOUG Collaborate 15
Extreme replication at IOUG Collaborate 15
Bobby Curtis
 
Role-Based Access Control (RBAC) in Neo4j
Role-Based Access Control (RBAC) in Neo4jRole-Based Access Control (RBAC) in Neo4j
Role-Based Access Control (RBAC) in Neo4j
Neo4j
 
Sparkler - Spark Crawler
Sparkler - Spark Crawler Sparkler - Spark Crawler
Sparkler - Spark Crawler
Thamme Gowda
 

What's hot (20)

Spark core
Spark coreSpark core
Spark core
 
Battle of the Stream Processing Titans – Flink versus RisingWave
Battle of the Stream Processing Titans – Flink versus RisingWaveBattle of the Stream Processing Titans – Flink versus RisingWave
Battle of the Stream Processing Titans – Flink versus RisingWave
 
Spark Performance Tuning .pdf
Spark Performance Tuning .pdfSpark Performance Tuning .pdf
Spark Performance Tuning .pdf
 
Streaming Operational Data with MariaDB MaxScale
Streaming Operational Data with MariaDB MaxScaleStreaming Operational Data with MariaDB MaxScale
Streaming Operational Data with MariaDB MaxScale
 
Discover PostGIS: Add Spatial functions to PostgreSQL
Discover PostGIS: Add Spatial functions to PostgreSQLDiscover PostGIS: Add Spatial functions to PostgreSQL
Discover PostGIS: Add Spatial functions to PostgreSQL
 
Visual Api Training
Visual Api TrainingVisual Api Training
Visual Api Training
 
Backup and-recovery2
Backup and-recovery2Backup and-recovery2
Backup and-recovery2
 
Welcome to the Flink Community!
Welcome to the Flink Community!Welcome to the Flink Community!
Welcome to the Flink Community!
 
Using Queryable State for Fun and Profit
Using Queryable State for Fun and ProfitUsing Queryable State for Fun and Profit
Using Queryable State for Fun and Profit
 
Implementing Highly Performant Distributed Aggregates
Implementing Highly Performant Distributed AggregatesImplementing Highly Performant Distributed Aggregates
Implementing Highly Performant Distributed Aggregates
 
Graph Database 101- What, Why and How?.pdf
Graph Database 101- What, Why and How?.pdfGraph Database 101- What, Why and How?.pdf
Graph Database 101- What, Why and How?.pdf
 
Handling Data Skew Adaptively In Spark Using Dynamic Repartitioning
Handling Data Skew Adaptively In Spark Using Dynamic RepartitioningHandling Data Skew Adaptively In Spark Using Dynamic Repartitioning
Handling Data Skew Adaptively In Spark Using Dynamic Repartitioning
 
M|18 Battle of the Online Schema Change Methods
M|18 Battle of the Online Schema Change MethodsM|18 Battle of the Online Schema Change Methods
M|18 Battle of the Online Schema Change Methods
 
Presto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EnginePresto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything Engine
 
Physical Plans in Spark SQL
Physical Plans in Spark SQLPhysical Plans in Spark SQL
Physical Plans in Spark SQL
 
Data Storage Tips for Optimal Spark Performance-(Vida Ha, Databricks)
Data Storage Tips for Optimal Spark Performance-(Vida Ha, Databricks)Data Storage Tips for Optimal Spark Performance-(Vida Ha, Databricks)
Data Storage Tips for Optimal Spark Performance-(Vida Ha, Databricks)
 
Manage your ODI Development Cycle – ODTUG Webinar
Manage your ODI Development Cycle – ODTUG WebinarManage your ODI Development Cycle – ODTUG Webinar
Manage your ODI Development Cycle – ODTUG Webinar
 
Extreme replication at IOUG Collaborate 15
Extreme replication at IOUG Collaborate 15Extreme replication at IOUG Collaborate 15
Extreme replication at IOUG Collaborate 15
 
Role-Based Access Control (RBAC) in Neo4j
Role-Based Access Control (RBAC) in Neo4jRole-Based Access Control (RBAC) in Neo4j
Role-Based Access Control (RBAC) in Neo4j
 
Sparkler - Spark Crawler
Sparkler - Spark Crawler Sparkler - Spark Crawler
Sparkler - Spark Crawler
 

Similar to IBM_Data_Science_Capstone_Pressenation.pptx

Florence Spacex Final Presentation DS Capstone.pdf
Florence Spacex Final Presentation DS Capstone.pdfFlorence Spacex Final Presentation DS Capstone.pdf
Florence Spacex Final Presentation DS Capstone.pdf
florencewyy
 
Analytical Survey Report
Analytical Survey ReportAnalytical Survey Report
Analytical Survey Report
Satoru Shibata
 
ds--with-ml-capstone-project-template-edx.pptx
ds--with-ml-capstone-project-template-edx.pptxds--with-ml-capstone-project-template-edx.pptx
ds--with-ml-capstone-project-template-edx.pptx
TONY562
 
ds-capstone-template-coursera.pptx
ds-capstone-template-coursera.pptxds-capstone-template-coursera.pptx
ds-capstone-template-coursera.pptx
MustaquimAhmad1
 
Big Linked Data Federation - ExtremeEarth Open Workshop
Big Linked Data Federation - ExtremeEarth Open WorkshopBig Linked Data Federation - ExtremeEarth Open Workshop
Big Linked Data Federation - ExtremeEarth Open Workshop
ExtremeEarth
 
chris guidice RESUME Ver2
chris guidice RESUME Ver2chris guidice RESUME Ver2
chris guidice RESUME Ver2
Christopher Guidice
 
Porosity Calculation Using Techlog Softwares
Porosity Calculation Using Techlog SoftwaresPorosity Calculation Using Techlog Softwares
Porosity Calculation Using Techlog Softwares
Mohamed Qasim
 
Understanding Open Source Serverless Platforms: Design Considerations and Per...
Understanding Open Source Serverless Platforms: Design Considerations and Per...Understanding Open Source Serverless Platforms: Design Considerations and Per...
Understanding Open Source Serverless Platforms: Design Considerations and Per...
Johnny Li
 
Area filling algo
Area filling algoArea filling algo
Area filling algo
Prince Soni
 
Business visualisation
Business visualisationBusiness visualisation
Business visualisation
Niharika Varshney
 
ESWC 2009 In-Use Track: SCOVO
ESWC 2009 In-Use Track: SCOVOESWC 2009 In-Use Track: SCOVO
ESWC 2009 In-Use Track: SCOVO
Michael Hausenblas
 
Praveen cv performancetesting
Praveen cv performancetestingPraveen cv performancetesting
Praveen cv performancetesting
praveen manchukonda
 
AE30ProjectFIFIReport
AE30ProjectFIFIReportAE30ProjectFIFIReport
AE30ProjectFIFIReport
Taylor Nguyen
 
SQL Query Optimization: Why Is It So Hard to Get Right?
SQL Query Optimization: Why Is It So Hard to Get Right?SQL Query Optimization: Why Is It So Hard to Get Right?
SQL Query Optimization: Why Is It So Hard to Get Right?
Brent Ozar
 
Redux vs GraphQL
Redux vs GraphQLRedux vs GraphQL
Redux vs GraphQL
Jordon McKoy
 
NASA_CCO_status-2013 update
NASA_CCO_status-2013 updateNASA_CCO_status-2013 update
NASA_CCO_status-2013 update
Dmitry Tseitlin
 
Clip, Clip, Hooray! Get Just the Data You Need with Clipping
Clip, Clip, Hooray! Get Just the Data You Need with ClippingClip, Clip, Hooray! Get Just the Data You Need with Clipping
Clip, Clip, Hooray! Get Just the Data You Need with Clipping
Safe Software
 
hwinstructions.docxCSCI 6642 –Computer Networks & Data Commun.docx
hwinstructions.docxCSCI 6642 –Computer Networks & Data Commun.docxhwinstructions.docxCSCI 6642 –Computer Networks & Data Commun.docx
hwinstructions.docxCSCI 6642 –Computer Networks & Data Commun.docx
adampcarr67227
 
Presentation interpreting execution plans for sql statements
Presentation    interpreting execution plans for sql statementsPresentation    interpreting execution plans for sql statements
Presentation interpreting execution plans for sql statements
xKinAnx
 
FlightDelayAnalysis
FlightDelayAnalysisFlightDelayAnalysis
FlightDelayAnalysis
Sonali Chaudhari
 

Similar to IBM_Data_Science_Capstone_Pressenation.pptx (20)

Florence Spacex Final Presentation DS Capstone.pdf
Florence Spacex Final Presentation DS Capstone.pdfFlorence Spacex Final Presentation DS Capstone.pdf
Florence Spacex Final Presentation DS Capstone.pdf
 
Analytical Survey Report
Analytical Survey ReportAnalytical Survey Report
Analytical Survey Report
 
ds--with-ml-capstone-project-template-edx.pptx
ds--with-ml-capstone-project-template-edx.pptxds--with-ml-capstone-project-template-edx.pptx
ds--with-ml-capstone-project-template-edx.pptx
 
ds-capstone-template-coursera.pptx
ds-capstone-template-coursera.pptxds-capstone-template-coursera.pptx
ds-capstone-template-coursera.pptx
 
Big Linked Data Federation - ExtremeEarth Open Workshop
Big Linked Data Federation - ExtremeEarth Open WorkshopBig Linked Data Federation - ExtremeEarth Open Workshop
Big Linked Data Federation - ExtremeEarth Open Workshop
 
chris guidice RESUME Ver2
chris guidice RESUME Ver2chris guidice RESUME Ver2
chris guidice RESUME Ver2
 
Porosity Calculation Using Techlog Softwares
Porosity Calculation Using Techlog SoftwaresPorosity Calculation Using Techlog Softwares
Porosity Calculation Using Techlog Softwares
 
Understanding Open Source Serverless Platforms: Design Considerations and Per...
Understanding Open Source Serverless Platforms: Design Considerations and Per...Understanding Open Source Serverless Platforms: Design Considerations and Per...
Understanding Open Source Serverless Platforms: Design Considerations and Per...
 
Area filling algo
Area filling algoArea filling algo
Area filling algo
 
Business visualisation
Business visualisationBusiness visualisation
Business visualisation
 
ESWC 2009 In-Use Track: SCOVO
ESWC 2009 In-Use Track: SCOVOESWC 2009 In-Use Track: SCOVO
ESWC 2009 In-Use Track: SCOVO
 
Praveen cv performancetesting
Praveen cv performancetestingPraveen cv performancetesting
Praveen cv performancetesting
 
AE30ProjectFIFIReport
AE30ProjectFIFIReportAE30ProjectFIFIReport
AE30ProjectFIFIReport
 
SQL Query Optimization: Why Is It So Hard to Get Right?
SQL Query Optimization: Why Is It So Hard to Get Right?SQL Query Optimization: Why Is It So Hard to Get Right?
SQL Query Optimization: Why Is It So Hard to Get Right?
 
Redux vs GraphQL
Redux vs GraphQLRedux vs GraphQL
Redux vs GraphQL
 
NASA_CCO_status-2013 update
NASA_CCO_status-2013 updateNASA_CCO_status-2013 update
NASA_CCO_status-2013 update
 
Clip, Clip, Hooray! Get Just the Data You Need with Clipping
Clip, Clip, Hooray! Get Just the Data You Need with ClippingClip, Clip, Hooray! Get Just the Data You Need with Clipping
Clip, Clip, Hooray! Get Just the Data You Need with Clipping
 
hwinstructions.docxCSCI 6642 –Computer Networks & Data Commun.docx
hwinstructions.docxCSCI 6642 –Computer Networks & Data Commun.docxhwinstructions.docxCSCI 6642 –Computer Networks & Data Commun.docx
hwinstructions.docxCSCI 6642 –Computer Networks & Data Commun.docx
 
Presentation interpreting execution plans for sql statements
Presentation    interpreting execution plans for sql statementsPresentation    interpreting execution plans for sql statements
Presentation interpreting execution plans for sql statements
 
FlightDelayAnalysis
FlightDelayAnalysisFlightDelayAnalysis
FlightDelayAnalysis
 

Recently uploaded

原版一比一利兹贝克特大学毕业证(LeedsBeckett毕业证书)如何办理
原版一比一利兹贝克特大学毕业证(LeedsBeckett毕业证书)如何办理原版一比一利兹贝克特大学毕业证(LeedsBeckett毕业证书)如何办理
原版一比一利兹贝克特大学毕业证(LeedsBeckett毕业证书)如何办理
wyddcwye1
 
Experts live - Improving user adoption with AI
Experts live - Improving user adoption with AIExperts live - Improving user adoption with AI
Experts live - Improving user adoption with AI
jitskeb
 
End-to-end pipeline agility - Berlin Buzzwords 2024
End-to-end pipeline agility - Berlin Buzzwords 2024End-to-end pipeline agility - Berlin Buzzwords 2024
End-to-end pipeline agility - Berlin Buzzwords 2024
Lars Albertsson
 
Open Source Contributions to Postgres: The Basics POSETTE 2024
Open Source Contributions to Postgres: The Basics POSETTE 2024Open Source Contributions to Postgres: The Basics POSETTE 2024
Open Source Contributions to Postgres: The Basics POSETTE 2024
ElizabethGarrettChri
 
Population Growth in Bataan: The effects of population growth around rural pl...
Population Growth in Bataan: The effects of population growth around rural pl...Population Growth in Bataan: The effects of population growth around rural pl...
Population Growth in Bataan: The effects of population growth around rural pl...
Bill641377
 
4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...
4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...
4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...
Social Samosa
 
Global Situational Awareness of A.I. and where its headed
Global Situational Awareness of A.I. and where its headedGlobal Situational Awareness of A.I. and where its headed
Global Situational Awareness of A.I. and where its headed
vikram sood
 
The Ipsos - AI - Monitor 2024 Report.pdf
The  Ipsos - AI - Monitor 2024 Report.pdfThe  Ipsos - AI - Monitor 2024 Report.pdf
The Ipsos - AI - Monitor 2024 Report.pdf
Social Samosa
 
University of New South Wales degree offer diploma Transcript
University of New South Wales degree offer diploma TranscriptUniversity of New South Wales degree offer diploma Transcript
University of New South Wales degree offer diploma Transcript
soxrziqu
 
原版一比一弗林德斯大学毕业证(Flinders毕业证书)如何办理
原版一比一弗林德斯大学毕业证(Flinders毕业证书)如何办理原版一比一弗林德斯大学毕业证(Flinders毕业证书)如何办理
原版一比一弗林德斯大学毕业证(Flinders毕业证书)如何办理
a9qfiubqu
 
STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...
STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...
STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...
sameer shah
 
在线办理(英国UCA毕业证书)创意艺术大学毕业证在读证明一模一样
在线办理(英国UCA毕业证书)创意艺术大学毕业证在读证明一模一样在线办理(英国UCA毕业证书)创意艺术大学毕业证在读证明一模一样
在线办理(英国UCA毕业证书)创意艺术大学毕业证在读证明一模一样
v7oacc3l
 
原版一比一多伦多大学毕业证(UofT毕业证书)如何办理
原版一比一多伦多大学毕业证(UofT毕业证书)如何办理原版一比一多伦多大学毕业证(UofT毕业证书)如何办理
原版一比一多伦多大学毕业证(UofT毕业证书)如何办理
mkkikqvo
 
"Financial Odyssey: Navigating Past Performance Through Diverse Analytical Lens"
"Financial Odyssey: Navigating Past Performance Through Diverse Analytical Lens""Financial Odyssey: Navigating Past Performance Through Diverse Analytical Lens"
"Financial Odyssey: Navigating Past Performance Through Diverse Analytical Lens"
sameer shah
 
Build applications with generative AI on Google Cloud
Build applications with generative AI on Google CloudBuild applications with generative AI on Google Cloud
Build applications with generative AI on Google Cloud
Márton Kodok
 
Predictably Improve Your B2B Tech Company's Performance by Leveraging Data
Predictably Improve Your B2B Tech Company's Performance by Leveraging DataPredictably Improve Your B2B Tech Company's Performance by Leveraging Data
Predictably Improve Your B2B Tech Company's Performance by Leveraging Data
Kiwi Creative
 
Udemy_2024_Global_Learning_Skills_Trends_Report (1).pdf
Udemy_2024_Global_Learning_Skills_Trends_Report (1).pdfUdemy_2024_Global_Learning_Skills_Trends_Report (1).pdf
Udemy_2024_Global_Learning_Skills_Trends_Report (1).pdf
Fernanda Palhano
 
办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样
办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样
办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样
apvysm8
 
Palo Alto Cortex XDR presentation .......
Palo Alto Cortex XDR presentation .......Palo Alto Cortex XDR presentation .......
Palo Alto Cortex XDR presentation .......
Sachin Paul
 
DATA COMMS-NETWORKS YR2 lecture 08 NAT & CLOUD.docx
DATA COMMS-NETWORKS YR2 lecture 08 NAT & CLOUD.docxDATA COMMS-NETWORKS YR2 lecture 08 NAT & CLOUD.docx
DATA COMMS-NETWORKS YR2 lecture 08 NAT & CLOUD.docx
SaffaIbrahim1
 

Recently uploaded (20)

原版一比一利兹贝克特大学毕业证(LeedsBeckett毕业证书)如何办理
原版一比一利兹贝克特大学毕业证(LeedsBeckett毕业证书)如何办理原版一比一利兹贝克特大学毕业证(LeedsBeckett毕业证书)如何办理
原版一比一利兹贝克特大学毕业证(LeedsBeckett毕业证书)如何办理
 
Experts live - Improving user adoption with AI
Experts live - Improving user adoption with AIExperts live - Improving user adoption with AI
Experts live - Improving user adoption with AI
 
End-to-end pipeline agility - Berlin Buzzwords 2024
End-to-end pipeline agility - Berlin Buzzwords 2024End-to-end pipeline agility - Berlin Buzzwords 2024
End-to-end pipeline agility - Berlin Buzzwords 2024
 
Open Source Contributions to Postgres: The Basics POSETTE 2024
Open Source Contributions to Postgres: The Basics POSETTE 2024Open Source Contributions to Postgres: The Basics POSETTE 2024
Open Source Contributions to Postgres: The Basics POSETTE 2024
 
Population Growth in Bataan: The effects of population growth around rural pl...
Population Growth in Bataan: The effects of population growth around rural pl...Population Growth in Bataan: The effects of population growth around rural pl...
IBM_Data_Science_Capstone_Pressenation.pptx

  • 2. Outline • Executive Summary • Introduction • Methodology • Results • Conclusion • Appendix
  • 3. Executive Summary ● Summary of methodologies ○ Data Collection with API ○ Data Collection with Web Scraping ○ Data Wrangling ○ Exploratory Data Analysis with SQL ○ Exploratory Data Analysis with Visualization ○ Interactive map with Folium ○ Interactive Dashboards with Dash ○ Model prediction with Machine Learning ● Summary of all results ○ Exploratory Data Analysis result ○ Interactive Analytics visuals ○ Predictive modeling results
  • 4. Introduction ● Project background and context SpaceX advertises Falcon 9 rocket launches on its website at a cost of 62 million dollars; other providers charge upward of 165 million dollars each. Much of the savings comes from the fact that SpaceX can reuse the first stage. Therefore, if we can determine whether the first stage will land, we can estimate the cost of a launch. This information can be used if an alternate company wants to bid against SpaceX for a rocket launch. ● Problems you want to find answers to ○ What factors determine whether a launch was successful? ○ How do the various features interact to determine the rate of successful landings? ○ What operating conditions need to be in place to ensure a successful landing program?
  • 6. Methodology: Executive Summary • Data collection methodology: data was collected using the SpaceX API and web scraping from Wikipedia. • Perform data wrangling: one-hot encoding was applied to categorical features. • Perform exploratory data analysis (EDA) using visualization and SQL. • Perform interactive visual analytics using Folium and Plotly Dash. • Perform predictive analysis using classification models: how to build, tune, and evaluate classification models.
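The one-hot encoding step mentioned above can be sketched roughly as follows. This is a minimal illustration, not the notebook's code: the column names (`Orbit`, `LaunchSite`, `PayloadMass`) and the rows are assumptions made up for the example.

```python
import pandas as pd

# Hypothetical launch records; column names and values are illustrative only.
launches = pd.DataFrame(
    {
        "Orbit": ["LEO", "GTO", "ISS", "LEO"],
        "LaunchSite": ["Site A", "Site B", "Site A", "Site C"],
        "PayloadMass": [500.0, 3200.0, 2500.0, 700.0],
    }
)

# One-hot encode the categorical columns; numeric columns pass through unchanged.
features = pd.get_dummies(launches, columns=["Orbit", "LaunchSite"], dtype=float)
print(features.columns.tolist())
```

Each category becomes its own 0/1 column, which is the form most classifiers expect for non-numeric inputs.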
  • 7. Data Collection ● The data was collected by various methods ○ Data collection by SpaceX API ○ Next, I decoded the response content as JSON using the .json() function call and turned it into a pandas dataframe using .json_normalize(). ○ Then I cleaned the data by checking for missing values and filling them in where necessary ○ I also performed web scraping from Wikipedia for Falcon 9 launch records with BeautifulSoup
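The decode-and-flatten step can be sketched offline as below. The payload is a hypothetical, trimmed-down stand-in for what `response.json()` might return (the field names are assumptions), and filling the gap with the column mean is one possible choice, not necessarily the one used in the notebook.

```python
import pandas as pd

# Hypothetical stand-in for the list returned by response.json().
sample_payload = [
    {"flight_number": 1, "rocket": {"name": "Falcon 9"}, "payload_mass": 500.0},
    {"flight_number": 2, "rocket": {"name": "Falcon 9"}, "payload_mass": None},
]

# Flatten nested objects into dotted column names such as "rocket.name".
df = pd.json_normalize(sample_payload)

# Count missing values, then fill the numeric gap with the column mean
# (an assumed strategy for this sketch).
missing_before = int(df["payload_mass"].isna().sum())
df["payload_mass"] = df["payload_mass"].fillna(df["payload_mass"].mean())
```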
  • 8. Data Collection – SpaceX API • I used a GET request to the SpaceX API to collect the launch data, then cleaned the returned data • The notebook GitHub link
  • 9. Data Collection - Scraping • I used web scraping techniques to obtain Falcon 9 launch data from Wikipedia • The notebook GitHub link
  • 10. Data Wrangling • I performed exploratory data analysis and determined the training labels. • I calculated the number of launches at each site, and the number and occurrence of each orbit. • I created a landing outcome label from the outcome column and exported the results to CSV. • The notebook GitHub link
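A rough sketch of the wrangling steps above, assuming hypothetical column names (`LaunchSite`, `Outcome`, `Class`) and outcome strings. The labeling rule (outcomes beginning with "True" count as a successful landing) is an assumption for illustration only.

```python
import pandas as pd

# Hypothetical records; the real outcome values may differ.
df = pd.DataFrame(
    {
        "LaunchSite": ["Site A", "Site A", "Site B", "Site C"],
        "Outcome": ["True ASDS", "False ASDS", "True RTLS", "None None"],
    }
)

# Number of launches at each site.
launches_per_site = df["LaunchSite"].value_counts()

# Assumed rule: an outcome starting with "True" is a successful landing (1), else 0.
df["Class"] = df["Outcome"].str.startswith("True").astype(int)

# df.to_csv("labeled_launches.csv", index=False)  # hypothetical output file name
```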
  • 11. EDA with Data Visualization • We explored the data by visualizing the relationships between flight number and launch site, payload and launch site, success rate and orbit type, and flight number and orbit type, as well as the yearly launch success trend. • The notebook GitHub link
  • 12. EDA with SQL ● SQL queries performed include: ○ Displaying the names of the unique launch sites in the space mission ○ Displaying 5 records where launch sites begin with the string ‘KSC’ ○ Displaying the total payload mass carried by boosters launched by NASA (CRS) ○ Displaying the average payload mass carried by booster version F9 v1.1 ○ Listing the data where the successful landing outcome in drone ship was achieved ○ Listing the names of the boosters which have success in ground pad and have payload mass greater than 4000 but less than 6000 ○ Listing the total number of successful and failure mission outcomes ○ Listing the names of the booster versions which have carried the maximum payload mass ○ Listing the records which display the month names, successful landing outcomes in ground pad, booster versions, and launch sites for the months in year 2017 ○ Ranking the count of successful landing outcomes between dates 2010 to 2017 ● The SQL Code
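A few of the queries listed above might look like the sketch below, run against an in-memory SQLite database from Python. The table name, column names, and rows are all hypothetical; the deck's actual schema and database engine are not shown in the transcript.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE launches (launch_site TEXT, customer TEXT, "
    "booster_version TEXT, payload_mass REAL)"
)
# Illustrative rows only, not real launch data.
conn.executemany(
    "INSERT INTO launches VALUES (?, ?, ?, ?)",
    [
        ("KSC LC-39A", "NASA (CRS)", "F9 FT", 2500.0),
        ("KSC LC-39A", "Other", "F9 v1.1", 3000.0),
        ("CCA Example Pad", "NASA (CRS)", "F9 v1.1", 2000.0),
    ],
)

# Unique launch sites.
unique_sites = [row[0] for row in conn.execute(
    "SELECT DISTINCT launch_site FROM launches ORDER BY launch_site")]

# Up to 5 records whose launch site begins with 'KSC'.
ksc_rows = conn.execute(
    "SELECT * FROM launches WHERE launch_site LIKE 'KSC%' LIMIT 5").fetchall()

# Total payload mass for one customer, and average payload for one booster version.
nasa_total = conn.execute(
    "SELECT SUM(payload_mass) FROM launches WHERE customer = 'NASA (CRS)'"
).fetchone()[0]
v11_avg = conn.execute(
    "SELECT AVG(payload_mass) FROM launches WHERE booster_version = 'F9 v1.1'"
).fetchone()[0]
```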
  • 13. Build an Interactive Map with Folium • I added markers with the aim of finding an optimal location for building a launch site • The notebook GitHub link
  • 14. Build a Dashboard with Plotly Dash • I built an interactive dashboard with Plotly and Dash • I plotted pie charts showing the total launches by certain sites • I plotted a scatter plot showing the correlation between outcome and PayloadMass for different booster versions • The code GitHub link
  • 15. Predictive Analysis (Classification) • I loaded the data using NumPy and pandas, transformed the data, and split it into training and testing sets. • I built different machine learning models and tuned their hyperparameters using GridSearchCV. • I used accuracy as the metric for the models and improved them using feature engineering and algorithm tuning. • I found the best performing classification model. • The notebook link
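The split-then-tune workflow described above can be sketched with scikit-learn as follows. The data is synthetic (`make_classification`), and the parameter grid, split ratio, and random seeds are assumptions chosen for the example rather than the notebook's settings.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the launch features and landing labels.
X, y = make_classification(n_samples=90, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=2, stratify=y
)

# Small, assumed hyperparameter grid; every combination is cross-validated.
param_grid = {"criterion": ["gini", "entropy"], "max_depth": [2, 4, 6]}
search = GridSearchCV(
    DecisionTreeClassifier(random_state=0), param_grid, cv=5, scoring="accuracy"
)
search.fit(X_train, y_train)

# Accuracy of the refitted best model on the held-out test set.
test_accuracy = search.score(X_test, y_test)
print(search.best_params_, round(test_accuracy, 3))
```

The same `GridSearchCV` call could wrap other estimators (for example logistic regression, SVM, or k-nearest neighbors) with their own grids, which makes the resulting test accuracies directly comparable.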
  • 16. Results • Exploratory data analysis results • Interactive analytics demo in screenshots • Predictive analysis results
  • 18. Flight Number vs. Launch Site • From the plot, we found that the larger the number of flights at a launch site, the greater the success rate at that launch site
  • 19. Payload vs. Launch Site ● From the plot, we found that the larger the number of flights at a launch site, the greater the success rate at that launch site
  • 20. Success Rate vs. Orbit Type • From the plot, we can see that ES-L1, GEO, HEO, SSO, and VLEO had the highest success rates.
  • 21. Flight Number vs. Orbit Type • The plot below shows flight number vs. orbit type. We observe that in the LEO orbit, success is related to the number of flights, whereas in the GTO orbit there appears to be no relationship between flight number and success
  • 22. Payload vs. Orbit Type • We can observe that with heavy payloads, successful landings are more frequent for the PO, LEO, and ISS orbits.
  • 23. Launch Success Yearly Trend • From the plot, we can observe that the success rate kept increasing from 2013 until 2020.
  • 24. All Launch Site Names • I used the DISTINCT keyword to show only unique launch sites from the SpaceX data.
  • 25. Launch Site Names Begin with 'CCA' • This query displays 5 records where the launch site begins with ‘CCA’
  • 26. Total Payload Mass • This query calculates the total payload carried by boosters from NASA as 45596
  • 27. Average Payload Mass by F9 v1.1 • This query calculates the average payload mass carried by booster version F9 v1.1 as 2928.4
  • 28. First Successful Ground Landing Date • This query shows that the date of the first successful landing outcome on a ground pad was 22 December 2015
  • 29. Successful Drone Ship Landing with Payload between 4000 and 6000 • I used the WHERE clause to filter for boosters that landed successfully on a drone ship and applied the AND condition to restrict the results to payload masses greater than 4000 but less than 6000
  • 30. Total Number of Successful and Failure Mission Outcomes • This query counts the number of missions and groups them by mission outcome. It shows that there were 100 successful missions and 1 failed mission.
  • 31. Boosters Carried Maximum Payload • We determined the boosters that have carried the maximum payload using a subquery in the WHERE clause and the MAX() function.
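The subquery pattern described above could look like the following sketch. The table, columns, and rows are hypothetical, and SQLite is used here only so the example runs anywhere.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE launches (booster_version TEXT, payload_mass REAL)")
# Illustrative rows only.
conn.executemany(
    "INSERT INTO launches VALUES (?, ?)",
    [("Booster A", 5000.0), ("Booster B", 15600.0), ("Booster C", 15600.0)],
)

# The inner query finds the maximum payload; the outer query keeps every
# booster whose payload equals that maximum (ties are all returned).
query = """
    SELECT DISTINCT booster_version
    FROM launches
    WHERE payload_mass = (SELECT MAX(payload_mass) FROM launches)
    ORDER BY booster_version
"""
max_payload_boosters = [row[0] for row in conn.execute(query)]
```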
  • 32. 2015 Launch Records • We used a combination of the WHERE clause and the LIKE, AND, and BETWEEN conditions to filter for failed landing outcomes on drone ship, their booster versions, and launch site names for the year 2015
  • 33. Rank Landing Outcomes Between 2010-06-04 and 2017-03-20 • We selected the landing outcomes and the COUNT of landing outcomes from the data and used the WHERE clause to filter for landing outcomes BETWEEN 2010-06-04 and 2017-03-20. • We applied the GROUP BY clause to group the landing outcomes and the ORDER BY clause to order the grouped landing outcomes in descending order.
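The filter, group, and rank steps can be sketched as below. The schema and rows are hypothetical; dates are stored as ISO `YYYY-MM-DD` text so that BETWEEN compares them correctly as strings in SQLite.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE launches (launch_date TEXT, landing_outcome TEXT)")
# Illustrative rows only; the last one falls outside the date window.
conn.executemany(
    "INSERT INTO launches VALUES (?, ?)",
    [
        ("2010-06-04", "No attempt"),
        ("2015-01-10", "Failure (drone ship)"),
        ("2016-04-08", "Success (drone ship)"),
        ("2016-05-06", "Success (drone ship)"),
        ("2017-06-03", "Success (drone ship)"),
    ],
)

# Count outcomes inside the window and rank them by count, highest first.
query = """
    SELECT landing_outcome, COUNT(*) AS outcome_count
    FROM launches
    WHERE launch_date BETWEEN '2010-06-04' AND '2017-03-20'
    GROUP BY landing_outcome
    ORDER BY outcome_count DESC, landing_outcome
"""
ranking = conn.execute(query).fetchall()
```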
  • 35. All launch sites global map markers
  • 36. Showing launch sites with colored labels • The first 3 locations are launch sites in Florida, while the last image shows the launch sites in California. Each launch site has a label; a green label indicates that the launch was successful.
  • 37. Launch Site Proximity Mapping: Railway, Highway, Coast, City • This map shows the distance between the launch site and its closest railway, highway, coast, and city. It shows that the closest feature to this launch site is the highway, followed by the coast and then the railway. The farthest feature is the city.
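One common way to compute such site-to-feature distances is the haversine great-circle formula, sketched below. The transcript does not say which distance formula the notebook used, so treat this as an assumption; the coordinates are hypothetical and do not correspond to the real site or its surroundings.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    earth_radius_km = 6371.0  # mean Earth radius
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * earth_radius_km * math.asin(math.sqrt(a))


# Hypothetical coordinates for a launch site and nearby features.
site = (28.5, -80.6)
nearby = {
    "highway": (28.5, -80.61),
    "coast": (28.5, -80.58),
    "city": (28.6, -80.8),
}
distances = {name: haversine_km(*site, *coords) for name, coords in nearby.items()}
closest = min(distances, key=distances.get)
```

The resulting distances can then be attached to map markers or lines so that the nearest and farthest features are visible at a glance.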
  • 39. Pie Chart Depicting the Success Rate of Each Launch Site • Here we can see that KSC LC-39A has the most successful launches of all launch sites
  • 40. Pie Chart Showing the Launch Site with the Highest Success Ratio • KSC LC-39A has a 76.9% success rate and a 23.1% failure rate.
  • 41. Correlation Between Payload Mass and The success of all • We can see that the success of light weighted payload is higher than the success of ht
  • 43. Classification Accuracy • In the accuracy test, all methods performed similarly. We could collect more test data to decide between them, but if we had to choose one right now, we would take the decision tree.
  • 44. Confusion Matrix • The 4 models have the same confusion matrix, as they have the same test accuracy. The main problem with these models is false positives.
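To make the false-positive point concrete, the sketch below tallies a confusion matrix by hand. The labels and predictions are hypothetical and are not the values from the slide; 1 stands for "landed" and 0 for "did not land".

```python
from collections import Counter

# Hypothetical test labels and predictions (not the deck's actual results).
y_true = [1] * 12 + [0] * 6
y_pred = [1] * 12 + [1, 1, 1, 0, 0, 0]

# Count each (true, predicted) pair.
pairs = Counter(zip(y_true, y_pred))
true_pos = pairs[(1, 1)]
false_pos = pairs[(0, 1)]  # predicted a landing that did not happen
true_neg = pairs[(0, 0)]
false_neg = pairs[(1, 0)]

accuracy = (true_pos + true_neg) / len(y_true)
print(f"TP={true_pos} FP={false_pos} TN={true_neg} FN={false_neg} acc={accuracy:.3f}")
```

In a case like this, accuracy alone hides the fact that every error is of the same kind, which is why the confusion matrix is worth inspecting alongside the score.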
  • 45. Conclusions • The success of a mission can be explained by several factors, such as the launch site, the orbit, and especially the number of previous launches. Indeed, we can assume that knowledge gained between launches made it possible to go from launch failures to successes. • The orbits with the best success rates are GEO, HEO, SSO, and ES-L1. • Depending on the orbit, the payload mass can be a criterion to take into account for the success of a mission. Some orbits require a light or heavy payload mass, but generally light payloads perform better than heavy payloads. • With the current data, we cannot explain why some launch sites are better than others (KSC LC-39A is the best launch site). To answer this question, we could obtain atmospheric or other relevant data. • For this dataset, we choose the decision tree algorithm as the best model, even though the test accuracy of all the models used is identical, because it has a better train accuracy.