In this project, I take the role of a data scientist at SpaceX and try to predict whether the Falcon 9 first stage will land successfully. Knowing whether a rocket will land matters because a failed landing costs the company significant resources.
The commercial space age is here: companies are making space travel affordable. Several companies now provide suborbital spaceflights; the most successful is SpaceX. SpaceX's accomplishments include sending spacecraft to the International Space Station, operating Starlink (a satellite constellation providing Internet access), and flying crewed missions to space. One reason SpaceX can do this is that its rocket launches are relatively inexpensive: SpaceX advertises Falcon 9 launches on its website at a cost of 62 million dollars, while other providers cost upwards of 165 million dollars each. Much of the savings comes from the fact that SpaceX can reuse the first stage. Therefore, if we can determine whether the first stage will land, we can determine the cost of a launch. SpaceX's Falcon 9 launches like a regular rocket, except that the first stage returns and attempts to land.
IBM Data Science Capstone Project: SpaceX Launch Analysis, by Hang (Henry) Yan
SpaceX advertises Falcon 9 rocket launches on its website at a cost of 62 million dollars; other providers cost upward of 165 million dollars each. Much of the savings is because SpaceX can reuse the first stage. The project task is to predict whether the first stage of the SpaceX Falcon 9 rocket will land successfully. Data science methodologies are applied to analyse launch data and optimise the prediction of launch outcomes.
Presentation by Lawrence Williams (SpaceX) at the Von Braun Memorial Symposium in Huntsville, Alabama, 22 October 2008.
<a href="http://astronautical.org/vonbraun/vonbraun-2008/session5">http://astronautical.org/vonbraun/vonbraun-2008/session5</a>
This document summarizes a Coursera capstone project analyzing apartment rental options in Manhattan. The project involved:
1) Defining a problem: find an apartment within budget and within walking distance of a subway station;
2) Gathering data on neighborhoods, venues, subway stations and apartment listings;
3) Using mapping and analysis to identify two candidate apartments meeting criteria;
4) Selecting apartment 1 due to proximity to work and similar neighborhood amenities.
The project demonstrated skills learned in the IBM Data Science certification course and provided a practical example for job applications.
SpaceX was founded in 2002 by Elon Musk to advance space technology and reduce costs. It has achieved several spaceflight milestones as the first private company to launch, orbit and recover a spacecraft. SpaceX aims to further decrease the cost of space exploration and make space tourism more accessible. However, due to US regulations, only US citizens can apply for jobs and internships at SpaceX.
Reliable Performance at Scale with Apache Spark on Kubernetes (Databricks)
Kubernetes is an open-source containerization framework that makes it easy to manage applications in isolated environments at scale. In Apache Spark 2.3, Spark introduced support for native integration with Kubernetes. Palantir has been deeply involved with the development of Spark’s Kubernetes integration from the beginning, and our largest production deployment now runs an average of ~5 million Spark pods per day, as part of tens of thousands of Spark applications.
Over the course of our adventures in migrating deployments from YARN to Kubernetes, we have overcome a number of performance, cost, & reliability hurdles: differences in shuffle performance due to smaller filesystem caches in containers; Kubernetes CPU limits causing inadvertent throttling of containers that run many Java threads; and lack of support for dynamic allocation leading to resource wastage. We intend to briefly describe our story of developing & deploying Spark-on-Kubernetes, as well as lessons learned from deploying containerized Spark applications in production.
We will also describe our recently open-sourced extension (https://github.com/palantir/k8s-spark-scheduler) to the Kubernetes scheduler to better support Spark workloads & facilitate Spark-aware cluster autoscaling; our limited implementation of dynamic allocation on Kubernetes; and ongoing work that is required to support dynamic resource management & stable performance at scale (i.e., our work with the community on a pluggable external shuffle service API). Our hope is that our lessons learned and ongoing work will help other community members who want to use Spark on Kubernetes for their own workloads.
This document provides an overview of a talk on Apache Spark. It introduces the speaker and their background. It acknowledges inspiration from a previous Spark training. It then outlines the structure of the talk, which will include: a brief history of big data; a tour of Spark including its advantages over MapReduce; and explanations of Spark concepts like RDDs, transformations, and actions. The document serves to introduce the topics that will be covered in the talk.
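The transformation/action distinction the talk explains can be mimicked in plain Python with lazy generators. This is an analogy only, not Spark itself: transformations build a lazy pipeline, and only an action forces the work to happen.

```python
# Transformations are lazy: building the pipeline does no work yet.
data = range(10)                              # like an RDD of 0..9
mapped = (x * x for x in data)                # like rdd.map(lambda x: x * x)
filtered = (x for x in mapped if x % 2 == 0)  # like rdd.filter(...)

# Actions force evaluation, like rdd.collect() or rdd.sum().
result = sum(filtered)
print(result)  # 120 = 0 + 4 + 16 + 36 + 64
```

Just as with Spark RDDs, nothing is computed until the terminal `sum` call consumes the pipeline.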
Parallelizing with Apache Spark in Unexpected Ways (Databricks)
"Out of the box, Spark provides rich and extensive APIs for performing in-memory, large-scale computation across data. Once a system has been built and tuned with Spark Datasets/DataFrames/RDDs, have you ever been left wondering if you could push the limits of Spark even further? In this session, we will cover some of the tips learned while building retail-scale systems at Target to maximize the parallelization you can achieve from Spark in ways that may not be obvious from the current documentation. Specifically, we will cover multithreading the Spark driver with Scala Futures to enable parallel job submission. We will talk about developing custom partitioners to leverage the ability to apply operations across understood chunks of data, and what tradeoffs that entails. We will also dive into strategies for parallelizing scripts that might have nothing to do with Spark, to support environments where peers work in multiple languages or perhaps a different language/library is just the best tool for the job. Come learn how to squeeze every last drop out of your Spark job with strategies for parallelization that go off the beaten path."
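The driver-side multithreading pattern the abstract mentions can be sketched in Python with a thread pool. The `run_job` function below is a placeholder standing in for a real Spark action (e.g. writing a DataFrame); with a live SparkSession, each threaded call would submit an independent job that the scheduler can run concurrently.

```python
from concurrent.futures import ThreadPoolExecutor

# Placeholder standing in for a Spark action, e.g. df.write.parquet(path).
# With a real SparkSession, each call would submit an independent Spark job.
def run_job(table):
    return f"wrote {table}"

tables = ["sales", "inventory", "returns"]

# Multithreading the driver: submit all jobs at once instead of serially.
with ThreadPoolExecutor(max_workers=3) as pool:
    results = list(pool.map(run_job, tables))

print(results)
```

The tradeoff is that concurrent jobs share cluster resources, so this pays off mainly when individual jobs leave executors idle.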
Performance Tuning RocksDB for Kafka Streams' State Stores (Dhruba Borthakur, …), Confluent
RocksDB is the default state store for Kafka Streams. In this talk, we will discuss how to improve single node performance of the state store by tuning RocksDB and how to efficiently identify issues in the setup. We start with a short description of the RocksDB architecture. We discuss how Kafka Streams restores the state stores from Kafka by leveraging RocksDB features for bulk loading of data. We give examples of hand-tuning the RocksDB state stores based on Kafka Streams metrics and RocksDB’s metrics. At the end, we dive into a few RocksDB command line utilities that allow you to debug your setup and dump data from a state store. We illustrate the usage of the utilities with a few real-life use cases. The key takeaway from the session is the ability to understand the internal details of the default state store in Kafka Streams so that engineers can fine-tune their performance for different varieties of workloads and operate the state stores in a more robust manner.
ksqlDB is a stream processing SQL engine that allows stream processing on top of Apache Kafka. ksqlDB is based on Kafka Streams and provides capabilities for consuming messages from Kafka, analysing them in near real time with a SQL-like language, and producing results back to a Kafka topic. Not a single line of Java code has to be written, and you can reuse your SQL know-how. This lowers the bar for starting with stream processing significantly.
ksqlDB offers powerful stream processing capabilities such as joins, aggregations, time windows and support for event time. In this talk I will present how ksqlDB integrates with the Kafka ecosystem and demonstrate how easy it is to implement a solution using ksqlDB for the most part. This will be done in a live demo on a fictitious IoT sample.
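To give a flavour of what such a solution looks like, the sketch below builds (but does not send) the kind of request a ksqlDB client posts to the server's REST `/ksql` endpoint. The stream and table names, topic, and field names are invented for illustration; only the statement shapes and the request-body layout follow ksqlDB's documented conventions.

```python
import json

# Illustrative ksqlDB DDL: a stream over a Kafka topic, then a windowed
# aggregate. All identifiers (sensor_readings, avg_temp, iot-readings)
# are made up for this sketch.
create_stream = (
    "CREATE STREAM sensor_readings (sensor_id VARCHAR, temp DOUBLE) "
    "WITH (KAFKA_TOPIC='iot-readings', VALUE_FORMAT='JSON');"
)
aggregate = (
    "CREATE TABLE avg_temp AS "
    "SELECT sensor_id, AVG(temp) AS avg_temp "
    "FROM sensor_readings WINDOW TUMBLING (SIZE 1 MINUTE) "
    "GROUP BY sensor_id EMIT CHANGES;"
)

# ksqlDB accepts statements over its REST API; here we only construct the
# JSON body rather than POST it to a running server.
payload = json.dumps({"ksql": create_stream + " " + aggregate,
                      "streamsProperties": {}})
print(payload)
```

In a live setup this body would be POSTed to the ksqlDB server, after which the aggregate continuously updates as events arrive on the topic.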
The document discusses space junk, which refers to debris left in Earth's orbit from space missions and satellite launches. There are estimated to be over 10 million pieces of space junk currently orbiting Earth, posing risks to active satellites and space stations. Collisions between space junk have increased the amount of debris and make future space missions more dangerous. Researchers are working on methods like laser removal to reduce space junk and mitigate the growing problem.
SpaceX was founded in 2002 by Elon Musk with $100 million in initial funding. By 2006, SpaceX won its first NASA contract and in 2010 was the first company to launch and return a spacecraft to orbit. SpaceX is privately funded, cutting costs through efficiency and reusable rockets, and has deployed satellites at a fraction of typical costs through its competitive approach and leadership of Elon Musk.
Starlink aims to provide global internet access using a satellite constellation to connect areas that currently lack coverage. While nearly half the world is not connected, the main reason is a lack of infrastructure like fiber optic cables or reliable 3G/4G networks. Starlink satellites would be cheaper to launch than traditional communication satellites and provide lower latency internet even to remote regions. Users would access the internet through a transceiver purchased from Starlink and pay a monthly subscription fee. The system has been in development since 2015 and is launching satellites on Falcon 9 rockets with the goal of starting limited commercial service in late 2020 or early 2021.
SpaceX was founded in 2002 by Elon Musk with the goal of reducing space travel costs to enable colonization of Mars. It has developed reusable Falcon launch vehicles and Dragon spacecraft, and plans future vehicles like the Mars Colonial Transporter. Financially, SpaceX has raised over $1 billion from investors and is currently valued at $10 billion. While reusable rockets have shown promise, the first live landing attempt failed. SpaceX aims to provide low-cost internet access worldwide via a satellite constellation. It faces competition from companies like Boeing and Lockheed Martin but has an advantage due to lower costs and faster development. The author predicts SpaceX will help land humans on Mars by 2025-2030.
SpaceX was founded in 2002 by Elon Musk with the goal of reducing space transportation costs to enable colonization of Mars. SpaceX develops launch vehicles like the Falcon 9 and Falcon Heavy rockets, as well as spacecraft like the Dragon cargo capsule. SpaceX manufactures 70% of its Falcon launch vehicles at its Hawthorne, California factory to reduce costs and delays by building major parts like rocket engines, boosters, and spacecraft under one roof.
Dense Retrieval with Apache Solr Neural Search (Sease)
This document provides an overview of dense retrieval with Apache Solr neural search. It discusses semantic search problems that neural search aims to address through vector-based representations of queries and documents. It then describes Apache Solr's implementation of neural search using dense vector fields and HNSW graphs to perform k-nearest neighbor retrieval. Functions are shown for indexing and searching vector data. The document also discusses using vector queries for filtering, re-ranking, and hybrid searches combining dense and sparse criteria.
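The k-nearest-neighbour retrieval the document describes can be illustrated with a brute-force cosine-similarity ranking over toy vectors. Solr's HNSW graphs compute an approximate, much faster version of this same ranking; the embeddings below are invented for the sketch.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

# Toy document embeddings; in Solr these would live in a dense vector field.
docs = {
    "doc1": [1.0, 0.0, 0.0],
    "doc2": [0.9, 0.1, 0.0],
    "doc3": [0.0, 1.0, 0.0],
}
query = [1.0, 0.05, 0.0]

# Exact k-nearest-neighbour retrieval: rank every document by similarity.
k = 2
top = sorted(docs, key=lambda d: cosine(query, docs[d]), reverse=True)[:k]
print(top)  # ['doc1', 'doc2']
```

The approximate index trades a small amount of ranking accuracy for avoiding the full scan this brute-force version performs.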
This talk is about monitoring with Prometheus. A progression is shown from monitoring concept, to Micrometer, Prometheus and Grafana.
Presented at Alithya by Richard Langlois and Gervais Naoussi, on September 19th, 2018
SpaceX is a private company that designs, manufactures, and launches advanced rockets and spacecraft. It was founded in 2002 by Elon Musk with the goal of reducing space transportation costs to enable the colonization of Mars. SpaceX has achieved several "firsts" such as being the first private company to launch, orbit, and recover a spacecraft. It aims to continue developing fully and rapidly reusable launch vehicles to make spaceflight more accessible and commonplace.
A Practical Introduction to Handling Log Data in ClickHouse, by Robert Hodges (Altinity Ltd)
This document discusses using ClickHouse to manage log data. It begins with an introduction to ClickHouse and its features. It then covers different ways to model log data in ClickHouse, including storing logs as JSON blobs or converting them to a tabular format. The document demonstrates using materialized views to ingest logs into ClickHouse tables in an efficient manner, extracting values from JSON and converting to columns. It shows how this approach allows flexible querying of log data while scaling to large volumes.
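The JSON-to-columns reshaping that the materialized-view approach performs inside ClickHouse can be sketched in Python: JSON blobs in, fixed columns out. The log fields below are invented for illustration.

```python
import json

# Raw logs as JSON blobs, as they might arrive from an ingestion pipeline.
raw_logs = [
    '{"ts": "2023-05-01T10:00:00Z", "level": "ERROR", "msg": "disk full"}',
    '{"ts": "2023-05-01T10:00:01Z", "level": "INFO",  "msg": "retrying"}',
]

# Extract fixed fields into columnar lists, the same reshaping a ClickHouse
# materialized view would do with JSON extraction functions.
columns = {"ts": [], "level": [], "msg": []}
for line in raw_logs:
    rec = json.loads(line)
    for col in columns:
        columns[col].append(rec.get(col, ""))

print(columns["level"])  # ['ERROR', 'INFO']
```

Storing the extracted columns instead of the raw blobs is what makes the large-scale queries the talk describes efficient.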
What Project Management Can Learn from the Falcon Heavy Launch (Kateryna Kuslyva)
The work was done by Kateryna Kuslyva, Nguyen Le, Joni Lankila, Drizana Latifi and Kuang Yun. The purpose is to show the power of SpaceX from the project management side.
The document is quite informative, though two videos are missing since it is a PDF. The first, taken from the SpaceX web page, compares the company's rockets' costs and payloads with other rockets; the second, in the risk-management category, shows crashes of Falcon rockets and boosters.
Coroutines are Kotlin's solution for concurrency and allow doing something else while waiting for results. They avoid issues with threading like memory usage and bottlenecks. Coroutines use suspending functions to pause execution and switch to other coroutines. They can be cancelled, have timeouts set, and structured hierarchically with jobs. Common coroutine patterns include producers that return values on channels and actors that receive messages on inbound channels. Coroutines are backed by event loops and may require yield() if CPU bound to allow other coroutines to run.
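The producer/channel pattern described above has a close analogue in Python's asyncio, sketched here with a bounded queue (the counterpart of a Kotlin channel), a sentinel to signal completion, and a timeout on the consumer.

```python
import asyncio

async def producer(queue):
    """Put three items on the queue, then a sentinel meaning 'done'."""
    for i in range(3):
        await queue.put(i)      # suspends if the queue is full
    await queue.put(None)       # sentinel: no more items

async def consumer(queue):
    """Drain the queue until the sentinel, transforming each item."""
    results = []
    while (item := await queue.get()) is not None:
        results.append(item * 10)
    return results

async def main():
    queue = asyncio.Queue(maxsize=1)   # small buffer forces suspension
    prod = asyncio.create_task(producer(queue))
    # Timeouts work much like the Kotlin examples in the talk:
    results = await asyncio.wait_for(consumer(queue), timeout=5)
    await prod
    return results

print(asyncio.run(main()))  # [0, 10, 20]
```

Because the queue holds at most one item, the producer repeatedly suspends and yields to the consumer, illustrating cooperative switching between coroutines.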
Designing Structured Streaming Pipelines—How to Architect Things Right (Databricks)
"Structured Streaming has proven to be the best platform for building distributed stream processing applications. Its unified SQL/Dataset/DataFrame APIs and Spark's built-in functions make it easy for developers to express complex computations. However, expressing the business logic is only part of the larger problem of building end-to-end streaming pipelines that interact with a complex ecosystem of storage systems and workloads. It is important for the developer to truly understand the business problem that needs to be solved.
What are you trying to consume? Single source? Joining multiple streaming sources? Joining streaming with static data?
What are you trying to produce? What is the final output that the business wants? What type of queries does the business want to run on the final output?
When do you want it? When does the business want the data? What is the acceptable latency? Do you really need millisecond-level latency?
How much are you willing to pay for it? This is the ultimate question, and the answer significantly determines how feasible it is to solve the above questions.
These are the questions that we ask every customer in order to help them design their pipeline. In this talk, I am going to go through the decision tree of designing the right architecture for solving your problem."
GraalVM is a high-performance runtime that can accelerate applications written in Java and other JVM languages. It includes a new just-in-time (JIT) compiler called Graal that can compile code ahead-of-time into a standalone native image executable. This ahead-of-time compilation allows applications to start faster and use less memory compared to the traditional HotSpot JVM. GraalVM is best suited for applications that benefit from fast startup times and a small memory footprint like command-line tools, containerized services, and embedded systems.
Processing Semantically-Ordered Streams in Financial Services (Flink Forward)
Flink Forward San Francisco 2022.
What if my data is already in order? Stream Processing has given us an elegant and powerful solution for running analytic queries and logic over high volumes of continuously arriving data. However, in both Apache Flink and Apache Beam, the notion of time-ordering is baked in at a very low level, making it difficult to express computations that are interested in a semantic-, rather than time-ordering of the data. In financial services, what often matters the most about the data moving between systems is not when the data was created, but in what order, to the extent that many institutions engineer a global sequencing over all data entering and produced by their systems to achieve complete determinism. How, then, can financial institutions and others best employ Stream Processing on streams of data that are already ordered? I will cover various techniques that can make this work, as well as seek input from the community on how Flink might be improved to better support these use-cases.
by Patrick Lucas
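One of the techniques for processing semantically-ordered streams is a reorder buffer: hold early arrivals until the next expected sequence number appears, then emit deterministically. A minimal sketch, with invented event data:

```python
import heapq

def reorder(events):
    """Re-emit events in sequence-number order, buffering gaps.

    Events arrive as (seq, payload) pairs, possibly out of order; early
    arrivals wait in a heap until the next expected sequence number shows up.
    """
    heap, next_seq, out = [], 0, []
    for seq, payload in events:
        heapq.heappush(heap, (seq, payload))
        while heap and heap[0][0] == next_seq:
            out.append(heapq.heappop(heap)[1])
            next_seq += 1
    return out

arrivals = [(1, "b"), (0, "a"), (3, "d"), (2, "c")]
print(reorder(arrivals))  # ['a', 'b', 'c', 'd']
```

Output order depends only on the sequence numbers, not arrival order, which is exactly the determinism the globally-sequenced systems in the talk rely on.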
Florence SpaceX Final Presentation DS Capstone (florencewyy)
Coursera IBM Applied Data Science Capstone SpaceX project final presentation
Github link for reference: https://github.com/florenceyyw/Applied-Data-Science-Capstone
- Data was collected from SpaceX API and web scraping, then analyzed using EDA, visualization, SQL and machine learning.
- EDA identified KSC LC-39A as the best launch site with 76.9% success. Payloads under 6,000kg on FT boosters were most successful.
- A decision tree classifier model predicted landings with over 87% accuracy, showing key factors for success were launch site and payload weight.
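The deck reports a fitted decision tree with launch site and payload mass as the key features. The hand-written rule below only illustrates the shape of such a model; the splits echo the reported pattern (KSC LC-39A best site, payloads under 6,000 kg most successful) but the thresholds are illustrative, not the deck's fitted values.

```python
def predict_landing(launch_site, payload_kg):
    """A hand-written stand-in for the deck's decision-tree classifier.

    The real model was fitted from data; these splits are illustrative only.
    Returns 1 for a predicted successful landing, 0 otherwise.
    """
    if launch_site == "KSC LC-39A":
        return 1 if payload_kg < 6000 else 0
    return 1 if payload_kg < 4000 else 0

print(predict_landing("KSC LC-39A", 3000))    # 1
print(predict_landing("CCAFS SLC-40", 9000))  # 0
```

A trained tree learns such thresholds automatically by maximizing split purity over the labeled launch data.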
by
Patrick Lucas
3. Executive Summary
• Summary of methodologies:
  • Data were collected in several ways
  • Machine learning models were built
  • Data visualizations were created
• Summary of all results:
  • The optimal model was identified
  • The visualizations proved valuable for decision making
4. Introduction
• Project background and context
In this project, I work with SpaceX launch data and try to predict whether the Falcon 9 first stage will land successfully. Knowing whether a rocket will land matters because a failed landing costs the company significant resources.
• Problems that need answers
1. Which factors lie behind landing failures?
2. Will the rockets land successfully?
3. With what accuracy can a successful landing be predicted?
6. Methodology
• Data collection: performed with the SpaceX REST API and web scraping
• Data wrangling: the data were transformed and one-hot encoded so they could later be fed to the machine learning models
• Exploratory data analysis (EDA) using visualization and SQL: discovering patterns in the data with visualization techniques such as scatter plots
• Interactive visual analytics: built with Folium and Plotly Dash
• Predictive analysis: classification machine learning models were built to predict landing outcomes
7. Data Collection
Data sets were collected with API calls to https://api.spacexdata.com/v4, from which I gathered the rocket, launchpad, payload, and core data.
1. Collect the data with API calls
2. Convert the JSON response to a data frame
3. Update columns and rows (pre-processing)
4. Filter the data to keep only Falcon 9 launches
5. Export the data to a CSV file named ‘dataset_part_1.csv’
8. Data Collection – SpaceX API
The steps above are implemented in the notebook linked below.
GitHub repo: https://github.com/yousefhosam-b/IBM_SpaceX_Capstone_Project/tree/main/1.%20spacex-data-collection-api
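A minimal sketch of the API-collection steps. A two-record sample stands in for the live response (in the notebook the records come from `requests.get(...).json()`), and the field names are simplified assumptions: the real v4 API returns rocket *ids* that need a further lookup.

```python
import pandas as pd

# In the notebook the records come from the live API:
#   launches = requests.get("https://api.spacexdata.com/v4/launches/past").json()
# Here a two-record sample stands in; field names are simplified assumptions
# (the real API returns rocket ids, not names).
launches = [
    {"flight_number": 1, "rocket": "Falcon 1", "payload_mass": 20,  "date_utc": "2006-03-24"},
    {"flight_number": 6, "rocket": "Falcon 9", "payload_mass": 500, "date_utc": "2012-10-08"},
]

df = pd.json_normalize(launches)               # JSON records -> data frame
data_falcon9 = df[df["rocket"] == "Falcon 9"]  # keep only Falcon 9 launches
data_falcon9.to_csv("dataset_part_1.csv", index=False)
```

The same filter-then-export shape scales to the full response once the rocket ids have been resolved to names.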
9. Data Collection – Scraping
1. Create the BeautifulSoup object
2. Extract the column names
3. Build the launch_dict
4. Convert it to the final data frame
5. Export the data to a CSV file named ‘spacex_web_scraped.csv’
GitHub repo: https://github.com/yousefhosam-b/IBM_SpaceX_Capstone_Project/tree/main/2.%20webscraping
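The scraping steps can be sketched as follows. A tiny inline table stands in for the real Wikipedia "List of Falcon 9 launches" page, and the column names here are illustrative only.

```python
from bs4 import BeautifulSoup
import pandas as pd

# Tiny stand-in for the scraped Wikipedia launch table (illustrative values).
html = """
<table>
  <tr><th>Flight No.</th><th>Launch site</th><th>Payload</th></tr>
  <tr><td>1</td><td>CCAFS</td><td>Dragon Qualification Unit</td></tr>
</table>
"""
soup = BeautifulSoup(html, "html.parser")                 # 1. BeautifulSoup object
columns = [th.get_text() for th in soup.find_all("th")]   # 2. column names
launch_dict = {name: [] for name in columns}              # 3. launch_dict
for row in soup.find_all("tr")[1:]:
    for name, cell in zip(columns, row.find_all("td")):
        launch_dict[name].append(cell.get_text())
df = pd.DataFrame(launch_dict)                            # 4. final data frame
df.to_csv("spacex_web_scraped.csv", index=False)          # 5. export to CSV
```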
10. Data Wrangling
1. Load the data set
2. Derive the landing outcomes
3. Identify the bad outcomes
4. Encode the outcomes as 0 and 1
5. Determine the overall success rate
6. Export the data to a CSV file named ‘dataset_part_2.csv’
GitHub repo: https://github.com/yousefhosam-b/IBM_SpaceX_Capstone_Project/tree/main/3.%20spacex-data-wrangling
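The core of the wrangling step, encoding landing outcomes as a binary class, can be sketched like this; the outcome strings and the `bad_outcomes` set follow the capstone dataset's convention and should be treated as assumptions.

```python
import pandas as pd

# Outcome strings follow the capstone convention, e.g. "True ASDS" means a
# successful drone-ship landing; treat the exact labels as assumptions.
df = pd.DataFrame({"Outcome": ["True ASDS", "False ASDS", "None None", "True RTLS"]})

bad_outcomes = {"False ASDS", "False Ocean", "False RTLS", "None ASDS", "None None"}
df["Class"] = df["Outcome"].apply(lambda o: 0 if o in bad_outcomes else 1)

print(df["Class"].mean())  # overall success rate of this sample
```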
11. EDA with Data Visualization
• Categorical plot of flight number vs. payload mass (kg)
• Bar chart of orbit vs. the success rate of each orbit
• Scatter plot of orbit vs. flight number
• Line chart of year vs. success rate
GitHub repo: https://github.com/yousefhosam-b/IBM_SpaceX_Capstone_Project/tree/main/5.%20eda-dataviz
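One of these charts, the orbit success-rate bar chart, can be sketched with matplotlib alone; the success-rate numbers below are illustrative stand-ins, not the real values from dataset_part_2.csv.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; no display needed
import matplotlib.pyplot as plt
import pandas as pd

# Illustrative success rates only; real values come from dataset_part_2.csv.
df = pd.DataFrame({
    "Orbit": ["ES-L1", "GEO", "GTO", "ISS", "SSO"],
    "Class": [1.0, 1.0, 0.52, 0.62, 1.0],
})
rates = df.groupby("Orbit")["Class"].mean()

ax = rates.plot.bar(title="Success Rate vs. Orbit Type")
ax.set_ylabel("Success rate")
plt.tight_layout()
plt.savefig("orbit_success_rate.png")
```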
12. EDA with SQL
I used SQL queries to answer the following questions:
• Display the names of the unique launch sites in the space mission
• Display 5 records where launch sites begin with the string 'CCA'
• Display the total payload mass carried by boosters launched by NASA (CRS)
• Display the average payload mass carried by booster version F9 v1.1
• List the date when the first successful ground-pad landing was achieved
• List the names of the boosters which landed successfully on a drone ship with a payload mass greater than 4000 kg but less than 6000 kg
• List the total number of successful and failed mission outcomes
• List the names of the booster versions which have carried the maximum payload mass, using a subquery
• List the failed drone-ship landings, their booster versions, and launch site names for the year 2015
• Rank the counts of landing outcomes (such as Failure (drone ship) or Success (ground pad)) between 2010-06-04 and 2017-03-20, in descending order
GitHub repo: https://github.com/yousefhosam-b/IBM_SpaceX_Capstone_Project/tree/main/4.%20eda-sql
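Two of these queries can be sketched against a tiny in-memory SQLite table shaped like the capstone's launch table; the table and column names are assumptions based on the slide text, and the rows are illustrative.

```python
import sqlite3

# Tiny stand-in for the capstone's SPACEXTBL; names and rows are assumptions.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE SPACEXTBL (Launch_Site TEXT, Payload_Mass INTEGER, Landing_Outcome TEXT)")
con.executemany("INSERT INTO SPACEXTBL VALUES (?,?,?)", [
    ("CCAFS LC-40", 2296, "Failure (drone ship)"),
    ("KSC LC-39A",  5300, "Success (drone ship)"),
    ("KSC LC-39A",  4696, "Success (drone ship)"),
])

# Unique launch sites
sites = [r[0] for r in con.execute("SELECT DISTINCT Launch_Site FROM SPACEXTBL")]

# Successful drone-ship landings with payload mass between 4000 and 6000
rows = con.execute("""
    SELECT Launch_Site, Payload_Mass FROM SPACEXTBL
    WHERE Landing_Outcome = 'Success (drone ship)'
      AND Payload_Mass > 4000 AND Payload_Mass < 6000
""").fetchall()
```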
13. Build an Interactive Map with Folium
• folium.Marker() was used to place markers on the map.
• folium.Circle() was used to draw circles around the markers.
• folium.Icon() was used to set the marker icons.
• folium.PolyLine() was used to draw lines between points.
• folium.plugins.AntPath() was used to draw animated lines between points.
• MarkerCluster() was used to simplify areas of the map containing several markers with identical coordinates.
GitHub repo: https://github.com/yousefhosam-b/IBM_SpaceX_Capstone_Project/tree/main/6.%20launch-site-location
14. Build a Dashboard with Plotly Dash
• Dash HTML and core components were the building blocks; almost everything else, such as graphs, tables, and dropdowns, depends on them.
• Pandas was used to simplify the work by providing the dataframe.
• Plotly was used to draw the graphs.
• A pie chart and a scatter chart were used for plotting.
• A RangeSlider was used to select the payload-mass range.
• A Dropdown was used to select the launch sites.
GitHub repo: https://github.com/yousefhosam-b/IBM_SpaceX_Capstone_Project/tree/main/7.%20dashboard
15. Predictive Analysis (Classification)
The analysis had three phases: building the models, evaluating them, and finding the optimal one.
1. Create a column for the class label
2. Standardize the data
3. Split the data into train and test sets
4. Build a GridSearchCV model for each classifier and fit the data
5. Find the best hyperparameters for each model
6. Calculate the accuracies and confusion matrices
7. Find the model with the highest accuracy
8. Confirm the optimal model and plot the results
GitHub repo: https://github.com/yousefhosam-b/IBM_SpaceX_Capstone_Project/tree/main/8.%20machine-learning-prediction
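The pipeline above can be sketched with scikit-learn for the decision tree (the model that turned out best); synthetic data stands in for the real launch features, and the parameter grid is a small illustrative subset.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))            # stand-in launch features
y = (X[:, 0] + X[:, 1] > 0).astype(int)  # stand-in class labels

X = StandardScaler().fit_transform(X)    # standardize the data
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=2)

# Grid-search the hyperparameters (illustrative subset of the real grid)
params = {"criterion": ["gini", "entropy"], "max_depth": [2, 4, 8]}
search = GridSearchCV(DecisionTreeClassifier(random_state=0), params, cv=5)
search.fit(X_train, y_train)

print(search.best_params_)               # best hyperparameters found
print(search.score(X_test, y_test))      # accuracy on the held-out test set
```

The same `GridSearchCV` wrapper is repeated for each classifier (logistic regression, SVM, KNN, tree), and the test-set accuracies are then compared to pick the optimal model.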
16. Results
• Exploratory data analysis results
• Interactive analytics demo (in screenshots)
• Predictive analysis results
18. Flight Number vs. Launch Site
As the flight number increases, the success rate at the launch sites increases as well.
19. Payload vs. Launch Site
As the payload mass increases, the success rate at the launch sites increases as well.
20. Success Rate vs. Orbit Type
ES-L1, GEO, HEO, and SSO have a success rate of 100%, while SO has a success rate of 0%.
21. Flight Number vs. Orbit Type
It is hard to draw firm conclusions here, but there appears to be no real relationship between flight number and the GTO orbit.
22. Payload vs. Orbit Type
The first thing to notice is how payload masses between 2000 and 3000 kg affect ISS launches. Similarly, payload masses between 3000 and 7000 kg affect GTO launches.
23. Launch Success Yearly Trend
Since 2013 there has been a massive increase in the success rate. It dipped slightly in 2018 but later came back stronger than before.
24. All Launch Site Names
The unique values can be obtained with “DISTINCT”.
28. First Successful Ground Landing Date
The first successful landing date can be obtained with “MIN”, because the first date is the same as the minimum date.
29. Successful Drone Ship Landing with Payload between 4000 and 6000
Only payload masses between 4000 and 6000 kg were selected, with the landing outcome restricted to “Success (drone ship)”.
30. Total Number of Successful and Failure Mission Outcomes
The number of successful missions can be obtained with “COUNT” and LIKE 'Success%'; the number of failed missions with “COUNT” and LIKE 'Failure%'.
32. 2015 Launch Records
The months can be extracted with month(DATE), and in the WHERE clause the year was restricted to 2015.
33. Rank Landing Outcomes Between 2010-06-04 and 2017-03-20
With “ORDER BY … DESC” the values are sorted in descending order, and with “COUNT” the outcomes are counted as before.
35. All Launch Sites’ Location Markers
All launch sites are in the USA, near the coasts of Florida and California.
37. Launch Sites and Their Proximities
Measuring the distances from the launch sites to nearby features shows that none of them are far from railway tracks.
39. Launch Success Count
• KSC LC-39A has the highest share of successful launches, with 41.7%
• CCAFS LC-40 comes next with 29.2%
• VAFB SLC-4E and CCAFS SLC-40 follow with 16.7% and 12.5% respectively
40. Launch Site with Highest Success Ratio
KSC LC-39A has the highest success ratio, 76.9%, for payloads in the 2000 kg – 10,000 kg range, and the FT booster version scores highest.
41. Payload vs. Launch Outcome
• Payloads of 0 kg – 5000 kg (first half)
• Payloads of 6000 kg – 10,000 kg (second half)
45. Conclusions
• The site with the highest success score was KSC LC-39A.
• Launch outcomes for payloads of 0 kg – 5000 kg were more diverse than for 6000 kg – 10,000 kg.
• The decision tree was the optimal model, with an accuracy of almost 0.89.
• We calculated the distances from the launch sites to their proximities.
46. Appendix
All code can be found in my GitHub repository.
GitHub repo: https://github.com/yousefhosam-b/IBM_SpaceX_Capstone_Project