The discovery of habitable exoplanets has long been a topic of intense interest in astronomy. Traditional methods for exoplanet identification include the wobble (radial velocity) method, direct imaging, and gravitational microlensing, which not only require a considerable investment of manpower, time, and money, but are also limited by the performance of astronomical telescopes. In this study, we propose using machine learning methods to identify exoplanets. We used the Kepler dataset collected by NASA from the Kepler Space Observatory to conduct supervised learning, predicting the existence of exoplanet candidates as a three-class classification task with decision tree, random forest, naïve Bayes, and neural network models; we used another NASA dataset consisting of confirmed exoplanet data to conduct unsupervised learning, dividing the confirmed exoplanets into clusters with k-means. As a result, our models achieved accuracies of 99.06%, 92.11%, 88.50%, and 99.79%, respectively, in the supervised learning task and obtained reasonable clusters in the unsupervised learning task.
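The supervised half of such a pipeline can be sketched with scikit-learn using the same four model families. This is a minimal sketch on synthetic stand-in features, not the actual Kepler Objects of Interest columns, and the class labels here are contrived so the toy task is learnable:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
# Hypothetical stand-ins for KOI features (e.g. transit depth, period, SNR)
X = rng.normal(size=(600, 4))
# Three classes standing in for FALSE POSITIVE / CANDIDATE / CONFIRMED,
# synthetically determined by the first feature
y = np.digitize(X[:, 0], [-0.5, 0.5])

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)
models = {
    "decision tree": DecisionTreeClassifier(random_state=0),
    "random forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "naive Bayes": GaussianNB(),
    "neural network": MLPClassifier(hidden_layer_sizes=(32,), max_iter=2000,
                                    random_state=0),
}
accs = {name: m.fit(X_tr, y_tr).score(X_te, y_te) for name, m in models.items()}
```

On the real KOI table one would first drop leakage-prone columns and encode the disposition labels; the accuracies reported in the abstract come from that dataset, not from this toy setup.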
Identifying Exoplanets with Machine Learning Methods: A Preliminary Study (IJCI JOURNAL)
ExoSGAN and ExoACGAN: Exoplanet Detection using Adversarial Training Algorithms (IRJET Journal)
This document discusses using generative adversarial networks (GANs) to detect exoplanets from light curve data. Specifically, it proposes two models: ExoSGAN which uses a semi-supervised GAN (SGAN) and ExoACGAN which uses an auxiliary classifier GAN (ACGAN). SGAN and ACGAN are designed to handle issues with imbalanced and unlabeled data. The models are trained on K2 light curve data and achieve 100% recall and precision on the test set, outperforming previous methods. The document provides background on exoplanet detection techniques and reviews related work applying machine learning, including CNNs, to find planets from Kepler and K2 mission data.
Astronomaly at Scale: Searching for Anomalies Amongst 4 Million Galaxies (Sérgio Sacani)
Modern astronomical surveys are producing datasets of unprecedented size and richness, increasing the potential for high-impact scientific discovery. This possibility, coupled with the challenge of exploring a large number of sources, has led to the development of novel machine-learning-based anomaly detection approaches, such as astronomaly. For the first time, we test the scalability of astronomaly by applying it to almost 4 million images of galaxies from the Dark Energy Camera Legacy Survey. We use a trained deep learning algorithm to learn useful representations of the images and pass these to the anomaly detection algorithm isolation forest, coupled with astronomaly’s active learning method, to discover interesting sources. We find that data selection criteria have a significant impact on the trade-off between finding rare sources such as strong lenses and introducing artefacts into the dataset. We demonstrate that active learning is required to identify the most interesting sources and reduce artefacts, while anomaly detection methods alone are insufficient. Using astronomaly, we find 1635 anomalies among the top 2000 sources in the dataset after applying active learning, including 8 strong gravitational lens candidates, 1609 galaxy merger candidates, and 18 previously unidentified sources exhibiting highly unusual morphology. Our results show that by leveraging the human-machine interface, astronomaly is able to rapidly identify sources of scientific interest even in large datasets.
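The isolation forest ranking step can be illustrated in a few lines with scikit-learn. The 2-D Gaussian "features" below are a hypothetical stand-in for the learned image representations, with ten injected outliers playing the role of anomalous sources:

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(42)
bulk = rng.normal(0.0, 1.0, size=(500, 2))   # ordinary sources
odd = rng.uniform(6.0, 8.0, size=(10, 2))    # injected anomalies (indices 500-509)
X = np.vstack([bulk, odd])

forest = IsolationForest(random_state=0).fit(X)
scores = -forest.score_samples(X)            # higher score = more anomalous
ranked = np.argsort(scores)[::-1]            # sources ordered for human inspection
```

astronomaly's contribution is what happens next: a human labels the top-ranked sources and an active learning step re-weights the scores, which this sketch omits.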
ASTRONOMICAL OBJECTS DETECTION IN CELESTIAL BODIES USING COMPUTER VISION ALGO... (csandit)
Computer vision, astronomy, and astrophysics work together quite productively, to the point where each is a natural complement of the others. Without computer vision algorithms, the progress of astronomy and astrophysics would have slowed nearly to a deadlock. New research and calculations can yield more information as well as higher-quality data. Consequently, an organized view of planetary surfaces can change everything in the long run. A new discovery may present puzzling complexity or a possible branching of paths, yet the quest to know more about celestial bodies by means of computer vision algorithms will continue. The detection of astronomical objects in celestial bodies is a challenging task. This paper presents an implementation of detecting astronomical objects in celestial bodies using computer vision algorithms, with satisfactory performance. It also puts forward some observations linking computer vision, astronomy, and astrophysics.
The document discusses detailed numerical modeling of multi-planet systems discovered by the Kepler spacecraft. The authors aim to model the effect of tides on the formation and evolution of Kepler multi-planet systems with two close planets in tight orbits. They will extract data from NASA's Exoplanet Archive to build numerical models incorporating the latest understanding of tidal dissipation. Comparing models to observations may help explain trends in planetary properties and the long-term stability of these systems.
The document describes the SuperWASP project, which uses two robotic telescopes (SuperWASP-N and SuperWASP-S) to search for transiting exoplanets. As of February 2015, SuperWASP had discovered 98 exoplanets, making it the most successful ground-based exoplanet survey. It discusses the history of exoplanet detection, details the SuperWASP instrumentation and detection method, and reviews its discoveries and role alongside space-based missions like Kepler.
Amelioration of the Lightkurve Package: Advancing Exoplanet Detection through... (IRJET Journal)
This document discusses using the Lightkurve Python package to detect exoplanets through the transit method by analyzing time-series photometric data from Kepler and TESS. It evaluates Lightkurve's ability to precisely identify known exoplanets and potentially find new candidates, especially small Earth-sized planets. The transit method monitors the dimming of a star when a planet passes in front, providing information on exoplanet properties. While there are limitations, Lightkurve has potential to advance exoplanet research through more efficient processing and analysis of large datasets from space telescopes.
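The principle behind the transit method can be shown without Lightkurve itself: simulate a light curve with periodic box-shaped dips, then recover the period by folding at trial periods and measuring the depth of the phase-zero bin. This is a pure-NumPy sketch of the idea, not Lightkurve's API:

```python
import numpy as np

rng = np.random.default_rng(1)
t = np.arange(0.0, 90.0, 0.02)                 # ~90 days at ~30-minute cadence
period, depth, duration = 7.3, 0.01, 0.2       # days, fractional dip, days
flux = 1.0 + rng.normal(0.0, 0.001, t.size)    # flat star plus photometric noise
flux[(t % period) < duration] -= depth         # carve out the transits

def folded_depth(p):
    """Depth of the phase-zero bin when the curve is folded at trial period p."""
    mask = (t % p) < duration
    return np.median(flux[~mask]) - np.mean(flux[mask])

trial_periods = np.arange(2.0, 10.0, 0.01)
best = trial_periods[np.argmax([folded_depth(p) for p in trial_periods])]
```

Lightkurve wraps the same idea, plus detrending and a proper box least squares search, behind conveniences such as `flatten` and `to_periodogram`.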
Computational Training for Domain Scientists & Data Literacy (Joshua Bloom)
Data literacy connects the knowledge of deep concepts of statistics, computer science, visualization, and domain science with the practical understanding of the data---and the stories it tells---that we encounter in our lives. As data becomes more pervasive, we see teaching data literacy to students as part of a broad education as an imperative for the 21st century. Learning how to arm the next generation with the tools to make the most of data, and to avoid its common pitfalls, will be a major thrust of our efforts with the Moore/Sloan initiative at Berkeley, through the Berkeley Institute for Data Science.
Kepler’s last planet discoveries: two new planets and one single-transit cand... (Sérgio Sacani)
The Kepler space telescope was responsible for the discovery of over 2700 confirmed exoplanets, more than half of the total number of exoplanets known today. These discoveries took place during both Kepler’s primary mission, when it spent 4 yr staring at the same part of the sky, and its extended K2 mission, when a mechanical failure forced it to observe different parts of the sky along the ecliptic. At the very end of the mission, when Kepler was exhausting the last of its fuel reserves, it collected a short set of observations known as K2 Campaign 19. So far, no planets have been discovered in this data set because it only yielded about a week of high-quality data. Here, we report some of the last planet discoveries made by Kepler in the Campaign 19 dataset. We conducted a visual search of the week of high-quality Campaign 19 data and identified three possible planet transits. Each planet candidate was originally identified with only one recorded transit, from which we were able to estimate the planets’ radii and estimate the semimajor axes and orbital periods. Analysis of lower-quality data collected after low fuel pressure caused the telescope’s pointing precision to suffer revealed additional transits for two of these candidates, allowing us to statistically validate them as genuine exoplanets. We also tentatively confirm the transits of one planet with TESS. These discoveries demonstrate Kepler’s exoplanet detection power, even when it was literally running on fumes.
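Once a period is pinned down from repeated transits, the semimajor axis follows from Kepler's third law, which in solar units reduces to a³ = M★P². A minimal helper, assuming the stellar mass is known from spectroscopy:

```python
def semimajor_axis_au(period_days, stellar_mass_msun=1.0):
    """Kepler's third law in solar units: a [AU] = (M* [Msun] * P [yr]**2)**(1/3)."""
    period_years = period_days / 365.25
    return (stellar_mass_msun * period_years ** 2) ** (1.0 / 3.0)

# Sanity checks: Earth sits at 1 AU; Mercury (~88 d) at about 0.39 AU
a_earth = semimajor_axis_au(365.25)
a_mercury = semimajor_axis_au(87.97)
```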
Computational Training and Data Literacy for Domain Scientists (Joshua Bloom)
This document discusses training domain scientists in computational and data skills. It notes the increasing amount of data in fields like astronomy and challenges of traditional approaches. It advocates teaching skills like statistics, machine learning, and programming. Examples are given of bootcamps, seminars and degree programs in these areas at UC Berkeley taught by CS and statistics faculty. Challenges discussed include fitting such training into formal curricula and ensuring participation from underrepresented groups. The creation of collaborative spaces is proposed to better connect domain scientists with methodological experts to help scientists address the growing role of data in their fields.
This document provides an overview of data science education at UC Berkeley presented by Joshua Bloom. It discusses the core principles of data science as a discipline and debates whether it should be taught as an academic subject or skillset. It also describes Berkeley's efforts to teach data science through bootcamps, seminars, and degree programs. Examples are given of students' projects applying machine learning and Bayesian techniques to astronomy and other domains. The document advocates teaching data literacy before big data proficiency and stresses the importance of a well-rounded toolbox and hands-on training for modern data-driven scientists.
Describes data science efforts at Berkeley, with a particular focus on teaching and the new Berkeley Institute for Data Science (BIDS), funded by the Moore and Sloan Foundations. BIDS will be a space for the open and interdisciplinary work that is typical of the SciPy community. In the creation of BIDS, open source scientific tools for data science, and specifically the SciPy ecosystem, played an important role.
Hubble Asteroid Hunter III. Physical properties of newly found asteroids (Sérgio Sacani)
Context. Determining the size distribution of asteroids is key to understanding the collisional history and evolution of the inner Solar System. Aims. We aim to improve our knowledge of the size distribution of small asteroids in the main belt by determining the parallaxes of newly detected asteroids in the Hubble Space Telescope (HST) archive and subsequently their absolute magnitudes and sizes. Methods. Asteroids appear as curved trails in HST images because of the parallax induced by the fast orbital motion of the spacecraft. Taking into account the trajectory of this latter, the parallax effect can be computed to obtain the distance to the asteroids by fitting simulated trajectories to the observed trails. Using distance, we can obtain the absolute magnitude of an object and an estimation of its size assuming an albedo value, along with some boundaries for its orbital parameters. Results. In this work, we analyse a set of 632 serendipitously imaged asteroids found in the ESA HST archive. Images were captured with the ACS/WFC and WFC3/UVIS instruments. A machine learning algorithm (trained with the results of a citizen science project) was used to detect objects in these images as part of a previous study. Our raw data consist of 1031 asteroid trails from unknown objects, not matching any entries in the Minor Planet Center (MPC) database using their coordinates and imaging time. We also found 670 trails from known objects (objects featuring matching entries in the MPC). After an accuracy assessment and filtering process, our analysed HST asteroid set consists of 454 unknown objects and 178 known objects. We obtain a sample dominated by potential main belt objects featuring absolute magnitudes (H) mostly between 15 and 22 mag. 
The absolute magnitude cumulative distribution log N(H < H0) ∝ α H0 confirms the previously reported slope change for 15 < H < 18, from α ≈ 0.56 to α ≈ 0.26, maintained in our case down to absolute magnitudes of around H ≈ 20, thereby extending the previous result by approximately two magnitudes. Conclusions. HST archival observations can be used as an asteroid survey because the telescope pointings are statistically randomly oriented in the sky and cover long periods of time. They allow us to expand the current best samples of astronomical objects at no extra cost in telescope time.
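The slope of a cumulative magnitude distribution like this can be measured by a straight-line fit of log₁₀ N(<H₀) against H₀. The sketch below does this on a synthetic sample drawn with an assumed slope of 0.4; the value is illustrative, not the paper's:

```python
import numpy as np

rng = np.random.default_rng(7)
alpha, h_min, h_max, n = 0.4, 15.0, 22.0, 20000   # assumed synthetic slope and range
# Inverse-transform sampling from a cumulative law N(<H) proportional to 10**(alpha*H)
u = rng.uniform(size=n)
lo, hi = 10.0 ** (alpha * h_min), 10.0 ** (alpha * h_max)
H = np.log10(u * (hi - lo) + lo) / alpha

# Cumulative counts on a grid of magnitude thresholds, then a linear fit of the log
grid = np.linspace(17.0, 21.5, 20)
log_counts = np.log10([np.sum(H < h0) for h0 in grid])
slope = np.polyfit(grid, log_counts, 1)[0]
```

The fit recovers the input slope to within a few percent; the grid starts above the bright-end cutoff, where the cumulative law is effectively a pure power law.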
Article describing the discovery of the exoplanet Kepler-432b, an exoplanet more massive than Jupiter that orbits a red giant star at close range in an extremely elongated orbit.
A nearby M star with three transiting super-Earths discovered by K2 (Sérgio Sacani)
This document reports the discovery of three transiting super-Earth planets orbiting a nearby bright M0 dwarf star, EPIC 201367065, using data from the K2 mission. Photometry from K2 reveals three planetary candidates with orbital periods of 10.056, 24.641, and ~45 days and radii between 1.5-2.1 Earth radii. Spectroscopy of the host star determines it has a mass of 0.601 solar masses, radius of 0.561 solar radii, and is located about 45 parsecs away, making the planets some of the coolest small planets known around a nearby star. The system presents an opportunity to measure the planet masses and constrain atmospheric compositions via future observations.
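Radii like those quoted above come from transit depths: the fractional dip in flux equals (Rp/R★)². A quick check using the stellar radius from the summary and the smallest candidate radius:

```python
R_SUN_KM = 695_700.0
R_EARTH_KM = 6_371.0

def transit_depth(planet_radius_rearth, star_radius_rsun):
    """Fractional flux dip during transit: (Rp / R*)**2."""
    ratio = (planet_radius_rearth * R_EARTH_KM) / (star_radius_rsun * R_SUN_KM)
    return ratio ** 2

# A 1.5 R_earth planet crossing a 0.561 R_sun star dims it by roughly 600 ppm
depth = transit_depth(1.5, 0.561)
```

The small stellar radius is why M dwarfs are favourable transit targets: the same planet produces a much deeper dip than it would around a Sun-like star.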
This document summarizes observations of the star KIC 8462852 from the Kepler space telescope. Key points:
- Kepler observed irregular dips in the star's brightness up to 22% over its 4-year mission. These dips lasted from 5-80 days and occurred at irregular intervals.
- Follow-up observations characterized KIC 8462852 as a main sequence F3V/IV star with a rotation period of 0.88 days and no significant infrared excess.
- Various scenarios are considered to explain the dips, including dust clumps or exocomet fragments passing in front of the star. The scenario best fitting current data involves a family of exocomet fragments.
The search for extraterrestrial civilizations with large energy supplies (Sérgio Sacani)
This document discusses potential signatures that could distinguish transiting megastructures from exoplanets. It identifies nine potential anomalous signatures:
1) Non-disk shapes could produce anomalous transit light curves.
2) Non-gravitational forces could cause orbits inconsistent with the star's density.
3) "Swarms" of many structures could cause irregular or aperiodic transit signals.
4) Structures large enough to completely occult the star could be detected.
5) Very low inferred masses would be anomalous for objects blocking significant starlight.
6) Nearly achromatic eclipses would differ from exoplanets' atmosphere-influenced transits.
This document summarizes the discovery of HIP 116454b, the first planet discovered by the Kepler K2 mission. The host star HIP 116454 is a bright, nearby K-dwarf observed during the K2 engineering test in February 2014. Analysis of the K2 photometry using a new technique revealed a single transit, indicating a super-Earth sized planet with a radius of 2.53 R⊕. Follow-up radial velocity observations and photometry from other telescopes confirmed the planet and refined its properties to a mass of 11.82 M⊕ and orbital period of 9.1 days. This makes HIP 116454b one of the few super-Earth exoplanets around bright stars amenable to detailed characterization.
This document describes observations of the pre-cataclysmic variable binary star system NN Serpentis using a CCD sensor. Light curve data of the system's apparent magnitude over time was collected, extracted from files, and calibrated. The light curve showed periodic fluctuations due to the orbit as well as a deep eclipse when the red dwarf passed in front of the white dwarf. By fitting a sine wave to the light curve, the orbital period was calculated to be 3.206 ± 0.023 hours. Additional parameters of the binary system were also determined from the observations and prior literature.
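The sine-fitting step can be sketched as a grid search over trial periods, solving for the sine and cosine amplitudes at each trial by linear least squares. The data below are synthetic, generated with the quoted 3.206 h period purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(3)
true_period = 3.206                              # hours
t = np.sort(rng.uniform(0.0, 24.0, 300))         # one day of observations, hours
mag = 0.05 * np.sin(2 * np.pi * t / true_period) + rng.normal(0.0, 0.005, t.size)

def residual_ss(p):
    """Best-fit sinusoid at period p via linear least squares; return residual sum."""
    M = np.column_stack([np.sin(2 * np.pi * t / p),
                         np.cos(2 * np.pi * t / p),
                         np.ones_like(t)])
    coeffs, *_ = np.linalg.lstsq(M, mag, rcond=None)
    return float(np.sum((mag - M @ coeffs) ** 2))

trials = np.arange(2.0, 5.0, 0.001)
best_period = trials[np.argmin([residual_ss(p) for p in trials])]
```

Because the amplitude and phase enter linearly once the period is fixed, only the period needs a brute-force search; the fit at each trial is a closed-form least squares solve.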
This document summarizes the discovery and characterization of Kepler-432 b, a massive exoplanet in a highly eccentric orbit around a red giant star. Key findings include:
1) Radial velocity measurements from the CAFE spectrograph revealed Kepler-432 b has a mass of 4.87 ± 0.48 MJup and moves in a highly eccentric orbit (e = 0.535 ± 0.030) around its host star.
2) Simultaneous modeling of Kepler photometry and radial velocity data determined Kepler-432 b has a radius of 1.120 ± 0.036 RJup and orbits every 52.5 days.
3) Analysis of high-resolution images found a nearby star.
This document summarizes the discovery of a transiting circumbinary planet (PH1) in a quadruple star system by volunteers with the Planet Hunters citizen science project. PH1 orbits outside a 20-day eclipsing binary consisting of an F dwarf and an M dwarf star. It has a radius of 6.18 ± 0.17 R⊕ and an upper mass limit of 169 M⊕. Beyond PH1's orbit is a distant visual binary bound to the system, making this the first known case of a transiting planet in a quadruple star configuration. Planet Hunters volunteers identified transit features in the Kepler light curve of KIC 4862625 through visual inspection and discussion, leading to the confirmation and characterization of the planet.
1) NASA's Kepler mission discovered two distinct transit signals around a bright G-type star, Kepler-10.
2) Statistical tests established the planetary nature of the shorter period signal, now called Kepler-10b.
3) Forty precision Doppler measurements confirmed Kepler-10b, finding it has a radius of 1.416+0.033 R⊕, making it the smallest transiting exoplanet discovered at the time.
Artificial intelligence is helping astronomers make new discoveries and analyze vast amounts of data more efficiently. A new technique using AI was able to automatically identify and count thousands of previously undetected craters on the moon. AI has also discovered new exoplanets, studied stellar sound waves to reveal properties of stars, found fast radio bursts that had evaded detection, and recognized patterns in galaxy images. Challenges remain, but AI is proving to be a valuable tool for astronomy by identifying patterns humans might miss and handling large datasets faster than human analysis alone.
A super Earth-sized planet orbiting in or near the habitable zone around Sun... (Sérgio Sacani)
This document summarizes the discovery of a super-Earth-sized planet orbiting in or near the habitable zone of a Sun-like star called Kepler-69. Using data from the Kepler spacecraft, astronomers detected two planets transiting Kepler-69, called Kepler-69b and Kepler-69c. Kepler-69c is estimated to have a radius of 1.7 Earth radii and orbit every 242.5 days, placing it close to the star's habitable zone where liquid water could exist on the surface. At 1.7 Earth radii, Kepler-69c represents the smallest planet found by Kepler to be potentially habitable, and is an important discovery on the path to finding the first true Earth analog.
Null Bangalore | Pentesters Approach to AWS IAM (Divyanshu)
# Abstract:
- Learn real-world methods for auditing AWS IAM (Identity and Access Management) as a pentester. We begin with a brief discussion of IAM, then cover some typical misconfigurations and their potential exploits in order to reinforce IAM security best practices.
- Gain actionable insights into AWS IAM policies and roles using a hands-on approach.
# Prerequisites:
- Basic understanding of AWS services and architecture
- Familiarity with cloud security concepts
- Experience using the AWS Management Console or AWS CLI.
- For hands on lab create account on [killercoda.com](https://killercoda.com/cloudsecurity-scenario/)
# Scenario Covered:
- Basics of IAM in AWS
- Implementing IAM Policies with Least Privilege to Manage S3 Bucket
- Objective: Create an S3 bucket with least privilege IAM policy and validate access.
- Steps:
- Create S3 bucket.
- Attach least privilege policy to IAM user.
- Validate access.
- Exploiting IAM PassRole Misconfiguration
  - Allows a user to pass a specific IAM role to an AWS service (EC2), typically used for service access delegation; the lab then exploits a PassRole misconfiguration granting unauthorized access to sensitive resources.
- Objective: Demonstrate how a PassRole misconfiguration can grant unauthorized access.
- Steps:
- Allow user to pass IAM role to EC2.
- Exploit misconfiguration for unauthorized access.
- Access sensitive resources.
- Exploiting IAM AssumeRole Misconfiguration with Overly Permissive Role
- An overly permissive IAM role configuration can lead to privilege escalation: create a role with administrative privileges and allow a user to assume that role.
- Objective: Show how overly permissive IAM roles can lead to privilege escalation.
- Steps:
- Create role with administrative privileges.
- Allow user to assume the role.
- Perform administrative actions.
- Differentiation between PassRole vs AssumeRole
Try at [killercoda.com](https://killercoda.com/cloudsecurity-scenario/)
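The least-privilege and scoped-PassRole ideas above can be written down as policy documents. The bucket name, account ID, and role name below are hypothetical placeholders, expressed as Python dicts for easy serialization:

```python
import json

# Least-privilege policy: only object read/write on one specific bucket
least_privilege_s3 = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Action": ["s3:GetObject", "s3:PutObject"],
        "Resource": "arn:aws:s3:::example-bucket/*",
    }],
}

# PassRole pinned to one role and one service -- the opposite of the
# misconfiguration exploited in the lab, which grants iam:PassRole on "*"
scoped_passrole = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Action": "iam:PassRole",
        "Resource": "arn:aws:iam::123456789012:role/ec2-app-role",
        "Condition": {"StringEquals": {"iam:PassedToService": "ec2.amazonaws.com"}},
    }],
}

print(json.dumps(least_privilege_s3, indent=2))
```

Attaching the first policy to the lab's IAM user reproduces the "validate access" step: GetObject/PutObject succeed on the one bucket and everything else is implicitly denied.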
Similar to EXOPLANETS IDENTIFICATION AND CLUSTERING WITH MACHINE LEARNING METHODS
The absolute magnitude cumulative distribution logN(H > H0) ∝ αlog(H0) confirms the previously reported slope change for 15 < H < 18, from α ≈ 0.56 to α ≈ 0.26, maintained in our case down to absolute magnitudes of around H ≈ 20, and therefore expanding the previous result by approximately two magnitudes. Conclusions. HST archival observations can be used as an asteroid survey because the telescope pointings are statistically randomly oriented in the sky and cover long periods of time. They allow us to expand the current best samples of astronomical objects at no extra cost in regard to telescope time.
Artigo que descreve a descoberta do exoplaneta Kepler-432b, um exoplaneta mais massivo que Júpiter que orbita uma estrela gigante vermelha bem próximo e numa órbita extremamente alongada.
A nearby m_star_with_three_transiting_super-earths_discovered_by_k2Sérgio Sacani
This document reports the discovery of three transiting super-Earth planets orbiting a nearby bright M0 dwarf star, EPIC 201367065, using data from the K2 mission. Photometry from K2 reveals three planetary candidates with orbital periods of 10.056, 24.641, and ~45 days and radii between 1.5-2.1 Earth radii. Spectroscopy of the host star determines it has a mass of 0.601 solar masses, radius of 0.561 solar radii, and is located about 45 parsecs away, making the planets some of the coolest small planets known around a nearby star. The system presents an opportunity to measure the planet masses and constrain atmospheric compositions via future observations
This document summarizes observations of the star KIC 8462852 from the Kepler space telescope. Key points:
- Kepler observed irregular dips in the star's brightness up to 22% over its 4-year mission. These dips lasted from 5-80 days and occurred at irregular intervals.
- Follow-up observations characterized KIC 8462852 as a main sequence F3V/IV star with a rotation period of 0.88 days and no significant infrared excess.
- Various scenarios are considered to explain the dips, including dust clumps or exocomet fragments passing in front of the star. The scenario best fitting current data involves a family of exocomet fragments
The search for_extraterrestrial_civilizations_with_large_energy_suppliesSérgio Sacani
This document discusses potential signatures that could distinguish transiting megastructures from exoplanets. It identifies nine potential anomalous signatures:
1) Non-disk shapes could produce anomalous transit light curves.
2) Non-gravitational forces could cause orbits inconsistent with the star's density.
3) "Swarms" of many structures could cause irregular or aperiodic transit signals.
4) Structures large enough to completely occult the star could be detected.
5) Very low inferred masses would be anomalous for objects blocking significant starlight.
6) Nearly achromatic eclipses would differ from exoplanets' atmosphere-influenced transits.
This document summarizes the discovery of HIP 116454b, the first planet discovered by the Kepler K2 mission. The host star HIP 116454 is a bright, nearby K-dwarf observed during the K2 engineering test in February 2014. Analysis of the K2 photometry using a new technique revealed a single transit, indicating a super-Earth sized planet with a radius of 2.53 R⊕. Follow-up radial velocity observations and photometry from other telescopes confirmed the planet and refined its properties to a mass of 11.82 M⊕ and orbital period of 9.1 days. This makes HIP 116454b one of the few super-Earth exoplanets around bright stars amenable to detailed
This document describes observations of the pre-cataclysmic variable binary star system NN Serpentis using a CCD sensor. Light curve data of the system's apparent magnitude over time was collected, extracted from files, and calibrated. The light curve showed periodic fluctuations due to the orbit as well as a deep eclipse when the red dwarf passed in front of the white dwarf. By fitting a sine wave to the light curve, the orbital period was calculated to be 3.206 ± 0.023 hours. Additional parameters of the binary system were also determined from the observations and prior literature.
This document summarizes the discovery and characterization of Kepler-432 b, a massive exoplanet in a highly eccentric orbit around a red giant star. Key findings include:
1) Radial velocity measurements from the CAFE spectrograph revealed Kepler-432 b has a mass of 4.87 ± 0.48 MJup and moves in a highly eccentric orbit (e = 0.535 ± 0.030) around its host star.
2) Simultaneous modeling of Kepler photometry and radial velocity data determined Kepler-432 b has a radius of 1.120 ± 0.036 RJup and orbits every 52.5 days.
3) Analysis of high-resolution images found a nearby star
This document summarizes the discovery of a transiting circumbinary planet (PH1) in a quadruple star system by volunteers with the Planet Hunters citizen science project. PH1 orbits outside a 20-day eclipsing binary consisting of an F dwarf and an M dwarf star. It has a radius of 6.18 ± 0.17 R⊕ and an upper mass limit of 169 M⊕. Beyond PH1's orbit is a distant visual binary bound to the system, making this the first known case of a transiting planet in a quadruple star configuration. Planet Hunters volunteers identified transit features in the Kepler light curve of KIC 4862625 through visual inspection and discussion, leading to the confirmation and characterization
1) NASA's Kepler mission discovered two distinct transit signals around a bright G-type star, Kepler-10.
2) Statistical tests established the planetary nature of the shorter period signal, now called Kepler-10b.
3) Forty precision Doppler measurements confirmed Kepler-10b, finding it has a radius of 1.416+0.033 R⊕ and is the smallest transiting exoplanet discovered.
Artificial intelligence is helping astronomers make new discoveries and analyze vast amounts of data more efficiently. A new technique using AI was able to automatically identify and count thousands of previously undetected craters on the moon. AI has also discovered new exoplanets, studied stellar sound waves to reveal properties of stars, found fast radio bursts that had evaded detection, and recognized patterns in galaxy images. Challenges remain, but AI is proving to be a valuable tool for astronomy by identifying patterns humans might miss and handling large datasets faster than human analysis alone.
A super earth_sized_planet_orbiting_in_or_near_the_habitable_zone_around_sun_...Sérgio Sacani
This document summarizes the discovery of a super-Earth-sized planet orbiting in or near the habitable zone of a Sun-like star called Kepler-69. Using data from the Kepler spacecraft, astronomers detected two planets transiting Kepler-69, called Kepler-69b and Kepler-69c. Kepler-69c is estimated to have a radius of 1.7 Earth radii and orbit every 242.5 days, placing it close to the star's habitable zone where liquid water could exist on the surface. At 1.7 Earth radii, Kepler-69c represents the smallest planet found by Kepler to be potentially habitable, and is an important discovery on the path to finding the first true Earth analog.
Similar to EXOPLANETS IDENTIFICATION AND CLUSTERING WITH MACHINE LEARNING METHODS (20)
Null Bangalore | Pentesters Approach to AWS IAMDivyanshu
#Abstract:
- Learn more about the real-world methods for auditing AWS IAM (Identity and Access Management) as a pentester. So let us proceed with a brief discussion of IAM as well as some typical misconfigurations and their potential exploits in order to reinforce the understanding of IAM security best practices.
- Gain actionable insights into AWS IAM policies and roles, using hands on approach.
#Prerequisites:
- Basic understanding of AWS services and architecture
- Familiarity with cloud security concepts
- Experience using the AWS Management Console or AWS CLI.
- For hands on lab create account on [killercoda.com](https://killercoda.com/cloudsecurity-scenario/)
# Scenario Covered:
- Basics of IAM in AWS
- Implementing IAM Policies with Least Privilege to Manage S3 Bucket
- Objective: Create an S3 bucket with least privilege IAM policy and validate access.
- Steps:
- Create S3 bucket.
- Attach least privilege policy to IAM user.
- Validate access.
- Exploiting IAM PassRole Misconfiguration
-Allows a user to pass a specific IAM role to an AWS service (ec2), typically used for service access delegation. Then exploit PassRole Misconfiguration granting unauthorized access to sensitive resources.
- Objective: Demonstrate how a PassRole misconfiguration can grant unauthorized access.
- Steps:
- Allow user to pass IAM role to EC2.
- Exploit misconfiguration for unauthorized access.
- Access sensitive resources.
- Exploiting IAM AssumeRole Misconfiguration with Overly Permissive Role
- An overly permissive IAM role configuration can lead to privilege escalation by creating a role with administrative privileges and allow a user to assume this role.
- Objective: Show how overly permissive IAM roles can lead to privilege escalation.
- Steps:
- Create role with administrative privileges.
- Allow user to assume the role.
- Perform administrative actions.
- Differentiation between PassRole vs AssumeRole
Try at [killercoda.com](https://killercoda.com/cloudsecurity-scenario/)
Design and optimization of ion propulsion dronebjmsejournal
Electric propulsion technology is widely used in many kinds of vehicles in recent years, and aircrafts are no exception. Technically, UAVs are electrically propelled but tend to produce a significant amount of noise and vibrations. Ion propulsion technology for drones is a potential solution to this problem. Ion propulsion technology is proven to be feasible in the earth’s atmosphere. The study presented in this article shows the design of EHD thrusters and power supply for ion propulsion drones along with performance optimization of high-voltage power supply for endurance in earth’s atmosphere.
Discover the latest insights on Data Driven Maintenance with our comprehensive webinar presentation. Learn about traditional maintenance challenges, the right approach to utilizing data, and the benefits of adopting a Data Driven Maintenance strategy. Explore real-world examples, industry best practices, and innovative solutions like FMECA and the D3M model. This presentation, led by expert Jules Oudmans, is essential for asset owners looking to optimize their maintenance processes and leverage digital technologies for improved efficiency and performance. Download now to stay ahead in the evolving maintenance landscape.
Build the Next Generation of Apps with the Einstein 1 Platform.
Rejoignez Philippe Ozil pour une session de workshops qui vous guidera à travers les détails de la plateforme Einstein 1, l'importance des données pour la création d'applications d'intelligence artificielle et les différents outils et technologies que Salesforce propose pour vous apporter tous les bénéfices de l'IA.
AI for Legal Research with applications, toolsmahaffeycheryld
AI applications in legal research include rapid document analysis, case law review, and statute interpretation. AI-powered tools can sift through vast legal databases to find relevant precedents and citations, enhancing research accuracy and speed. They assist in legal writing by drafting and proofreading documents. Predictive analytics help foresee case outcomes based on historical data, aiding in strategic decision-making. AI also automates routine tasks like contract review and due diligence, freeing up lawyers to focus on complex legal issues. These applications make legal research more efficient, cost-effective, and accessible.
Digital Twins Computer Networking Paper Presentation.pptxaryanpankaj78
A Digital Twin in computer networking is a virtual representation of a physical network, used to simulate, analyze, and optimize network performance and reliability. It leverages real-time data to enhance network management, predict issues, and improve decision-making processes.
Optimizing Gradle Builds - Gradle DPE Tour Berlin 2024Sinan KOZAK
Sinan from the Delivery Hero mobile infrastructure engineering team shares a deep dive into performance acceleration with Gradle build cache optimizations. Sinan shares their journey into solving complex build-cache problems that affect Gradle builds. By understanding the challenges and solutions found in our journey, we aim to demonstrate the possibilities for faster builds. The case study reveals how overlapping outputs and cache misconfigurations led to significant increases in build times, especially as the project scaled up with numerous modules using Paparazzi tests. The journey from diagnosing to defeating cache issues offers invaluable lessons on maintaining cache integrity without sacrificing functionality.
DEEP LEARNING FOR SMART GRID INTRUSION DETECTION: A HYBRID CNN-LSTM-BASED MODELijaia
As digital technology becomes more deeply embedded in power systems, protecting the communication
networks of Smart Grids (SG) has emerged as a critical concern. Distributed Network Protocol 3 (DNP3)
represents a multi-tiered application layer protocol extensively utilized in Supervisory Control and Data
Acquisition (SCADA)-based smart grids to facilitate real-time data gathering and control functionalities.
Robust Intrusion Detection Systems (IDS) are necessary for early threat detection and mitigation because
of the interconnection of these networks, which makes them vulnerable to a variety of cyberattacks. To
solve this issue, this paper develops a hybrid Deep Learning (DL) model specifically designed for intrusion
detection in smart grids. The proposed approach is a combination of the Convolutional Neural Network
(CNN) and the Long-Short-Term Memory algorithms (LSTM). We employed a recent intrusion detection
dataset (DNP3), which focuses on unauthorized commands and Denial of Service (DoS) cyberattacks, to
train and test our model. The results of our experiments show that our CNN-LSTM method is much better
at finding smart grid intrusions than other deep learning algorithms used for classification. In addition,
our proposed approach improves accuracy, precision, recall, and F1 score, achieving a high detection
accuracy rate of 99.50%.
Comparative analysis between traditional aquaponics and reconstructed aquapon...bijceesjournal
The aquaponic system of planting is a method that does not require soil usage. It is a method that only needs water, fish, lava rocks (a substitute for soil), and plants. Aquaponic systems are sustainable and environmentally friendly. Its use not only helps to plant in small spaces but also helps reduce artificial chemical use and minimizes excess water use, as aquaponics consumes 90% less water than soil-based gardening. The study applied a descriptive and experimental design to assess and compare conventional and reconstructed aquaponic methods for reproducing tomatoes. The researchers created an observation checklist to determine the significant factors of the study. The study aims to determine the significant difference between traditional aquaponics and reconstructed aquaponics systems propagating tomatoes in terms of height, weight, girth, and number of fruits. The reconstructed aquaponics system’s higher growth yield results in a much more nourished crop than the traditional aquaponics system. It is superior in its number of fruits, height, weight, and girth measurement. Moreover, the reconstructed aquaponics system is proven to eliminate all the hindrances present in the traditional aquaponics system, which are overcrowding of fish, algae growth, pest problems, contaminated water, and dead fish.
Software Engineering and Project Management - Software Testing + Agile Method...Prakhyath Rai
Software Testing: A Strategic Approach to Software Testing, Strategic Issues, Test Strategies for Conventional Software, Test Strategies for Object -Oriented Software, Validation Testing, System Testing, The Art of Debugging.
Agile Methodology: Before Agile – Waterfall, Agile Development.
Applications of artificial Intelligence in Mechanical Engineering.pdfAtif Razi
Historically, mechanical engineering has relied heavily on human expertise and empirical methods to solve complex problems. With the introduction of computer-aided design (CAD) and finite element analysis (FEA), the field took its first steps towards digitization. These tools allowed engineers to simulate and analyze mechanical systems with greater accuracy and efficiency. However, the sheer volume of data generated by modern engineering systems and the increasing complexity of these systems have necessitated more advanced analytical tools, paving the way for AI.
AI offers the capability to process vast amounts of data, identify patterns, and make predictions with a level of speed and accuracy unattainable by traditional methods. This has profound implications for mechanical engineering, enabling more efficient design processes, predictive maintenance strategies, and optimized manufacturing operations. AI-driven tools can learn from historical data, adapt to new information, and continuously improve their performance, making them invaluable in tackling the multifaceted challenges of modern mechanical engineering.
Applications of artificial Intelligence in Mechanical Engineering.pdf
EXOPLANETS IDENTIFICATION AND CLUSTERING WITH MACHINE LEARNING METHODS
Machine Learning and Applications: An International Journal (MLAIJ), Vol. 9, No. 1, March 2022
DOI: 10.5121/mlaij.2022.9101
EXOPLANETS IDENTIFICATION AND CLUSTERING
WITH MACHINE LEARNING METHODS
Yucheng Jin, Lanyi Yang, Chia-En Chiang
EECS Graduate Students, University of California, Berkeley, Berkeley, CA, USA
ABSTRACT
The discovery of habitable exoplanets has long been a topic of intense interest in astronomy. Traditional methods for exoplanet identification, such as the wobble (radial velocity) method, direct imaging, and gravitational microlensing, not only require a considerable investment of manpower, time, and money, but are also limited by the performance of astronomical telescopes. In this study, we propose using machine learning methods to identify exoplanets. We used the Kepler dataset, collected by NASA from the Kepler Space Observatory, for supervised learning, framing the prediction of exoplanet candidates as a three-class classification task solved with a decision tree, a random forest, naïve Bayes, and a neural network; we used another NASA dataset, consisting of data on confirmed exoplanets, for unsupervised learning, dividing the confirmed exoplanets into clusters with k-means. Our models achieved accuracies of 99.06%, 92.11%, 88.50%, and 99.79%, respectively, on the supervised learning task, and obtained reasonable clusters on the unsupervised learning task.
KEYWORDS
Exoplanets Identification and Clustering, Kepler Dataset, Classification Tree, Random Forest, Naïve
Bayes, Multi-layer Perceptron, K-means Clustering, K-fold Cross-validation
1. INTRODUCTION
Over the past decades, astronomers around the world have relentlessly searched for habitable exoplanets (planets outside the Solar System) [1]. Some habitable exoplanet candidates are Kepler-22b, 600 light-years from the Sun with an orbital period of 290 days, which may have a rocky surface at a temperature of approximately 295 K and an active atmosphere [2]; Kepler-69c, 2,700 light-years from the Sun with an orbital period of 242 days, a super-Earth-sized planet whose surface gravity of 7.159 m/s² is lower than Earth's [3]; and Kepler-442b, 1,194 light-years from the Sun with an orbital period of 112 days, which lies within the habitable zone of its planetary system [4].
How can astronomers discover these exoplanets? They have devoted a considerable amount of time, energy, and money to exploring the vast universe with specialized equipment such as astronomical telescopes. The resulting observations are collected for further analysis, but identifying a potential exoplanet from them remains a complicated problem: how can we confirm that an observation from an astronomical telescope is a genuine exoplanet rather than a false positive?
There are several traditional exoplanet identification techniques, including the wobble method, direct imaging, and gravitational microlensing. Specifically, the wobble method traces exoplanets via the Doppler shift in a star's light caused by its orbiting planets; direct imaging uses extraterrestrial telescopes to capture images of exoplanets; gravitational microlensing detects the
distortion of the background light [5]. These traditional methods are expensive, time-consuming, and sensitive to measurement variation. These limitations motivated us to develop a simple, efficient, and economical approach to identifying exoplanets. We therefore proposed using machine learning methods for exoplanet identification and used two datasets, both collected by NASA, to conduct a three-class classification task (supervised learning) and a clustering task (unsupervised learning).
For the supervised learning task, we used the Kepler dataset, consisting of false positive (labelled "-1"), candidate (labelled "0"), and confirmed (labelled "1") exoplanet samples, to perform three-class classification. The objective was to achieve a classification accuracy above 95%. We selected and trained four models for this problem: a classification tree, a random forest, naïve Bayes, and a neural network, which achieved accuracies of 99.06%, 92.11%, 88.50%, and 99.79%, respectively.
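As a rough sketch of this setup, the snippet below trains the same four model families with scikit-learn on synthetic three-class data standing in for the preprocessed Kepler features; the feature dimensions, hyperparameters, and data generator here are illustrative assumptions, not the paper's exact configuration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import accuracy_score

# Synthetic stand-in for the cleaned Kepler feature matrix and labels.
X, y = make_classification(n_samples=2000, n_features=20, n_informative=8,
                           n_classes=3, random_state=0)
y = y - 1  # relabel classes to {-1, 0, 1} as in the paper
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

models = {
    "decision tree": DecisionTreeClassifier(random_state=0),
    "random forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "naive Bayes": GaussianNB(),
    "neural network": MLPClassifier(hidden_layer_sizes=(64, 32),
                                    max_iter=500, random_state=0),
}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    acc = accuracy_score(y_te, model.predict(X_te))
    print(f"{name}: {acc:.4f}")
```

With the real data, X and y would instead come from the cleaned feature matrix and the koi_disposition labels described in Section III.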
For the unsupervised learning task, we used another dataset, consisting of data on confirmed exoplanets, for clustering. The objective was to find exoplanets that fall in the same cluster as Earth. We selected and trained a k-means model, which partitioned the data into reasonable clusters, and we visualized all exoplanets in Earth's cluster on a star map.
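A minimal k-means sketch of this clustering step is shown below, using synthetic planetary parameters and an illustrative reference row for Earth; the chosen features and k = 4 are assumptions for demonstration, not the paper's tuned settings.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Synthetic columns: radius (Earth radii), mass (Earth masses), temperature (K)
planets = rng.normal(loc=[5.0, 100.0, 800.0], scale=[3.0, 60.0, 400.0],
                     size=(500, 3))
earth = np.array([[1.0, 1.0, 255.0]])  # illustrative reference row for Earth
X = StandardScaler().fit_transform(np.vstack([planets, earth]))

km = KMeans(n_clusters=4, n_init=10, random_state=0).fit(X)
earth_cluster = km.labels_[-1]               # cluster containing Earth
same_as_earth = np.where(km.labels_ == earth_cluster)[0]
print(f"Earth falls in cluster {earth_cluster} with "
      f"{len(same_as_earth) - 1} exoplanets")
```

Standardizing the features first matters here, since radius, mass, and temperature live on very different scales and k-means is distance-based.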
The rest of this paper is organized as follows. Section II summarizes related work. Section III describes the data cleaning, exploratory data analysis (EDA), and feature selection performed before modelling and inference. Section IV presents the detailed experimental setup, including model optimization and the optimal parameters. Section V analyses the results obtained by our models and discusses their implications. Finally, Section VI concludes the paper with the significance of this study and future work.
2. RELATED WORK
NASA’s Kepler Mission devoted years of effort to discovering exoplanets. William J. Borucki [6] summarized how traditional approaches were used in real-world scientific research to confirm, validate, and model the exoplanets discovered by Kepler. Natalie M. Batalha [7] stated that the progress made by the Kepler Mission indicates the existence of orbiting systems that might potentially be habitable for human beings. Lissauer et al. [8] emphasized the contribution of the Kepler Mission and the significance of its data to exoplanet detection.
Based on the Kepler dataset or similar datasets, several studies have combined machine learning with exoplanet detection, identification, and analysis. Christopher J. Shallue and Andrew Vanderburg [9] used a convolutional neural network to process astrophysical signals and validated two new exoplanets based on the CNN's predictions. Rohan Saha [10] applied logistic regression, decision trees, and neural networks to the Kepler dataset to estimate the probability that an exoplanet candidate exists. Maldonado et al. [11] surveyed studies of exoplanet transit discovery with ML-based algorithms. Fisher et al. [12] interpreted high-resolution spectroscopy of exoplanets with cross-correlations and supervised learning methods such as Bayesian neural networks. Basak et al. [13] constructed novel neural network architectures to explore the habitability classification of exoplanets.
These related studies provided us with valuable insights into data pre-processing, model training, model selection, and inference. We designed our experiment based on them and used some of their results as benchmarks.
3. DATA PRE-PROCESSING
In this section, we briefly describe the datasets and introduce the data pre-processing procedures used in this study.
3.1. Dataset Description
The first dataset (the Kepler dataset) in this study was collected by NASA from the Kepler Space Observatory. In 2009, NASA launched the Kepler Mission, an effort to discover exoplanets with the goal of finding potentially habitable places for human beings [14]. The mission lasted for over nine years and left a remarkable legacy: the Kepler dataset contains a total of 9,564 potential exoplanets, each associated with features that characterize the detected object.
Among these features is the categorical variable we selected as the target, koi_disposition, with three possible values: "CONFIRMED" (labelled "1"), "CANDIDATE" (labelled "0"), and "FALSE POSITIVE" (labelled "-1"). A "CONFIRMED" exoplanet's existence has been verified, and it carries a name recorded in the kepler_name variable; a "CANDIDATE" exoplanet's existence has not yet been proven; a "FALSE POSITIVE" entry has been shown to be a spurious detection. In total, there are 2,358 confirmed exoplanets, 2,366 candidates, and 4,840 false positives. We used the Kepler dataset for the classification (supervised learning) task.
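The label encoding described above can be sketched with pandas as follows; the toy DataFrame stands in for the full Kepler table.

```python
import pandas as pd

# Toy stand-in for the Kepler table's koi_disposition column.
df = pd.DataFrame({"koi_disposition": ["CONFIRMED", "CANDIDATE",
                                       "FALSE POSITIVE", "CONFIRMED"]})
# Map the three disposition values to the numeric labels used in the paper.
label_map = {"FALSE POSITIVE": -1, "CANDIDATE": 0, "CONFIRMED": 1}
df["label"] = df["koi_disposition"].map(label_map)
print(df["label"].tolist())  # → [1, 0, -1, 1]
```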
The second dataset (the confirmed exoplanets dataset) contains stellar and planetary parameters of confirmed exoplanets [15]. These observations come from observatories worldwide, not solely from the Kepler Mission. Crucial parameters include radius, mass, density, and temperature. The dataset contains 4,375 confirmed exoplanets in total; some were originally discovered by the Kepler Space Observatory, while the rest were found by other observatories. We used the confirmed exoplanets dataset for the clustering (unsupervised learning) task.
3.2. Data Cleaning
The first step of data cleaning was to calculate the proportion of empty entries in each column and exclude columns in which a large proportion of the data was missing. We then manually selected columns with only a small fraction of empty entries and filled those entries with appropriate values. Finally, any rows that still contained empty values were dropped.
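A hedged sketch of these three cleaning steps on a toy DataFrame follows; the 50% missingness threshold and the median fill are illustrative choices, since the paper selects columns and fill values manually.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "mostly_empty": [np.nan, np.nan, np.nan, 1.0],
    "few_gaps":     [1.0, np.nan, 3.0, 4.0],
    "complete":     [5.0, 6.0, 7.0, 8.0],
})

# 1) Drop columns with a large proportion of empty entries.
keep = df.columns[df.isna().mean() <= 0.5]
df = df[keep]
# 2) Fill the remaining gaps with a proper value (here: the column median).
df = df.fillna(df.median(numeric_only=True))
# 3) Drop any rows that still contain empty values.
df = df.dropna()
print(df.shape)  # → (4, 2)
```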
3.3. Exploratory Data Analysis (EDA)
Before feature selection and model construction, exploratory data analysis (EDA) is necessary to make the subsequent analysis easier and more precise [16]. In this study, we carried out EDA to obtain an intuitive, high-level understanding of the datasets.
3.3.1. Correlation Analysis
We first conducted correlation analysis to identify highly correlated variables and reduce data redundancy and collinearity. The correlation matrices of the two datasets are shown in Fig. 1 and Fig. 2. In both figures, the x-axis and y-axis list the features, and each entry is the correlation coefficient between a pair of features. The matrices are symmetric, because the correlation coefficient satisfies Corr(X, Y) = Corr(Y, X) for any variables X and Y. In addition, the main diagonal consists solely of 1's, since
for each variable X, Corr(X, X) = 1. The larger the correlation coefficient between a pair of variables, the higher their collinearity; we therefore eliminated features with one or more high correlation coefficients.
Figure 1. Correlation Matrix of the Kepler Dataset
Figure 2. Correlation Matrix of the Confirmed Exoplanets Dataset
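The elimination of highly correlated features described above can be sketched as follows; the 0.9 threshold and the synthetic columns are assumptions for illustration.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
a = rng.normal(size=200)
# "a_copy" is nearly identical to "a"; "b" is independent noise.
df = pd.DataFrame({"a": a,
                   "a_copy": a + rng.normal(scale=0.01, size=200),
                   "b": rng.normal(size=200)})

corr = df.corr().abs()
# Keep only the strict upper triangle so each pair is inspected once.
upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
to_drop = [c for c in upper.columns if (upper[c] > 0.9).any()]
reduced = df.drop(columns=to_drop)
print(sorted(reduced.columns))  # → ['a', 'b']
```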
3.3.2. Univariate Analysis
Then univariate analysis was performed to visualize the distribution of each variable.
Figure 3. Count Plots of the Binary Variables of the Kepler Dataset
Figure 4. Histogram of the Log Scale Orbital Periods of the Kepler Dataset
Fig. 3 shows the count of each binary variable in the Kepler dataset, i.e., the distributions of four binary characteristics of the exoplanet candidates: non-transit-like, stellar eclipse, centroid offset, and ephemeris match indicating contamination.
Fig. 4 is a histogram that indicates the relationship between an important feature, the log scale
orbital period, and the target variable. As shown, the confirmed exoplanets approximately follow a Gaussian
distribution with respect to the log scale orbital period in the range between -2 and 4. If the value
of the log scale orbital period is too high or too low, the candidate is more likely to be a false
positive observation.
3.3.3. Bivariate Analysis
Finally, we used bivariate analysis to investigate the pairwise relationship between different
features and observe how they affect the target variable.
Fig. 5 is a star map of the exoplanet candidates. In this star map, each pair of celestial coordinates
is measured by right ascension (RA), the celestial coordinate that represents longitude, and
declination (Dec), the celestial coordinate that represents latitude [17]. Fig. 5 demonstrates that
there is no strong correlation between the target variable and the coordinates: for every target
value, no clear cluster of exoplanets appears in Fig. 5.
Fig. 6 is a categorical plot of the log scale orbital period versus number of planets in the
exoplanet candidate’s solar system. The result shows that false positive samples are most likely to
have just one or two planets in their solar system. With a higher number of planets in a
candidate’s solar system, the probability of it being a real exoplanet is also higher.
Figure 5. Star Map by Right Ascension (RA) and Declination (Dec)
Figure 6. Categorical Plot of the Log Scale Orbital Period versus Number of Planets
3.4. Feature Selection
Based on the correlation heatmaps, we excluded variables that have high correlation coefficients
with one or more other input features or have a low correlation coefficient with the target
variable. For example, variables that store the deviation of observations, such as koi_period_err
(error of the orbital period) and koi_time0_err (error of the transit epoch), are highly dependent
on the observed values (orbital period and transit epoch). Therefore, we kept the variables that
store the observed values and discarded those that record deviations. Some preserved features
are listed in Table 1.
Table 1. Examples of Preserved Features after Feature Selection

Variable      | Dataset              | Meaning               | Description
kepler_name   | Kepler               | Kepler Name           | Kepler name in the form of "Kepler-" plus a number and a lower-case letter (e.g. Kepler-22b, Kepler-186f)
koi_fpflag_nt | Kepler               | Non-Transit-Like Flag | 1 means the light curve is not consistent with that of a transiting planet
koi_period    | Kepler               | Orbital Period        | Measured in days and taken in log scale
koi_count     | Kepler               | Number of Planets     | Number of exoplanet candidates identified in a solar system
pl_radj       | Confirmed Exoplanets | Planet Radius         | Measured in Jupiter radii
pl_bmassj     | Confirmed Exoplanets | Planet Mass           | Measured in Jupiter masses
sy_dist       | Confirmed Exoplanets | Distance              | Distance to the planetary system in parsecs
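The rule of dropping deviation columns while keeping observed values can be expressed in a few lines of pandas. The data frame below is a hypothetical slice using the column names from the text, not the full archive schema:

```python
import pandas as pd

# Hypothetical slice of Kepler columns: observed values plus their error columns
df = pd.DataFrame([[10.5, 0.1, 135.2, 0.02, 3]],
                  columns=["koi_period", "koi_period_err",
                           "koi_time0", "koi_time0_err", "koi_count"])

# Keep observed values, discard the deviation (error) columns
kept = df.drop(columns=[c for c in df.columns if c.endswith("_err")])
```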
4. EXPERIMENTAL SETUP
In this section, we discuss the model training process in detail with optimization and parameters.
The training process was divided into two parts, training models for classification, and training
the k-means model for clustering. For these two parts, we used the Kepler dataset and confirmed
exoplanets dataset, respectively. For the supervised learning task, we used decision tree, random
forest, naïve Bayes, and neural network. For the unsupervised learning task, we used k-means
clustering.
4.1. Classification Tree
We chose the decision tree model because it can split data with classification rules based on the
highest information gain, InfoGain = Entropy(parent) - Entropy(children), where Entropy(children)
is the size-weighted average entropy of the child nodes. In a classification tree model, each
internal node represents a test that splits the samples on a feature, and each leaf node
represents a class label, or prediction.
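The information-gain criterion can be computed directly; the sketch below uses Shannon entropy in bits on a toy label array:

```python
import numpy as np

def entropy(labels):
    """Shannon entropy of a label array, in bits."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

def info_gain(parent, children):
    """Information gain of a split: parent entropy minus the
    size-weighted average entropy of the child partitions."""
    n = len(parent)
    weighted = sum(len(c) / n * entropy(c) for c in children)
    return entropy(parent) - weighted

# A perfectly pure split of a balanced binary parent yields the full 1 bit of gain
parent = np.array([0, 0, 1, 1])
gain = info_gain(parent, [np.array([0, 0]), np.array([1, 1])])
```

A split that separates the classes completely drives the children's entropy to zero, so the gain equals the parent's entropy.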
We implemented the classification tree model using DecisionTreeClassifier from scikit-learn
library and optimized its performance with different parameters. We set the minimum number of
samples required to split an internal node from 2 to 100 and maximum depth of the tree from 1 to
20, then we selected the parameters that maximized accuracy. As a result, the optimal minimum
number of samples required to split an internal node is 53 and maximum depth of the tree is 7.
Finally, we visualized the optimal decision tree using Graphviz. The optimal decision tree is
shown in Fig. 7 in the next section.
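The parameter sweep described above can be sketched with scikit-learn's GridSearchCV. The iris dataset stands in for the Kepler feature matrix, and the grid is a coarsened version of the paper's sweep, so the resulting best parameters will differ from those reported:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)  # stand-in for the Kepler feature matrix

# Coarsened version of the sweep (min_samples_split 2..100, max_depth 1..20)
param_grid = {"min_samples_split": range(2, 101, 7),
              "max_depth": range(1, 21, 2)}
search = GridSearchCV(DecisionTreeClassifier(random_state=0), param_grid, cv=5)
search.fit(X, y)
best_tree = search.best_estimator_  # the tree maximizing cross-validated accuracy
```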
4.2. Random Forest
Random forest combines multiple decision trees at training time. These trees form an ensemble
that predicts the class using the label with the most votes. For example, in a random forest of 9
decision trees, if 6 predict class 1, then the random forest outputs 1; customized rules may be
applied to break ties.
We implemented the random forest model using RandomForestClassifier from the scikit-learn
library. We optimized the number of trees in the forest by setting this parameter from 20 to 1000
with a step size of 20. As a result, the highest accuracy was obtained when the number of trees is
40.
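A minimal sketch of the n_estimators sweep follows, assuming a synthetic three-class dataset in place of the Kepler data and a shortened range (20..200 rather than 20..1000) for brevity:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic three-class stand-in for the Kepler classification task
X, y = make_classification(n_samples=500, n_features=10, n_informative=5,
                           n_classes=3, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

best_n, best_acc = None, 0.0
for n in range(20, 201, 20):  # shortened version of the 20..1000 sweep
    clf = RandomForestClassifier(n_estimators=n, random_state=0).fit(X_tr, y_tr)
    acc = clf.score(X_te, y_te)
    if acc > best_acc:
        best_n, best_acc = n, acc  # keep the forest size with the best held-out accuracy
```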
4.3. Naïve Bayes
Naïve Bayes applies Bayes' theorem under the assumption that features are independent. It is a
simple probabilistic approach to the classification task. For features X1, X2, ..., Xn and
classes C1, C2, ..., Cm, the prediction is

    NaiveBayes(x1, ..., xn) = argmax_c p(C = c) * prod_{i=1}^{n} p(X_i = x_i | C = c).
We implemented the naïve Bayes model using ComplementNB from scikit-learn library. Before
training the naïve Bayes classifier, we standardized input features, encoded the target variable as
a one-hot vector, and set each prior as the proportion of each class in the sample space.
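A minimal sketch of this classifier follows, with iris standing in for the Kepler features. One caveat: ComplementNB expects non-negative inputs, so this sketch rescales features to [0, 1] instead of z-scoring them; fit_prior=True estimates each class prior from its sample proportion:

```python
from sklearn.datasets import load_iris
from sklearn.naive_bayes import ComplementNB
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler

X, y = load_iris(return_X_y=True)  # stand-in for the Kepler features

# Rescale to [0, 1] because ComplementNB requires non-negative feature values
clf = make_pipeline(MinMaxScaler(), ComplementNB(fit_prior=True))
clf.fit(X, y)
acc = clf.score(X, y)
```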
4.4. Multilayer Perceptron
Neural network is another machine learning model that is widely used in classification problems.
In a neural network, each layer has an activation function that receives the output values from the
previous layer and outputs values calculated by the activation function. For each set of features,
the final prediction of the neural network is the output of its last layer.
We implemented the neural network model using MLPClassifier from the scikit-learn library. We
tried three activation functions, logistic, tanh, and ReLU. These activation functions determine
how the weighted sum of the input to the current layer is converted into the output to the next
layer. We also chose three types of solvers, stochastic gradient descent, quasi-Newton method,
and Adam optimizer. Finally, we optimized the layer size and learning rate. The optimal neural
network uses tanh as the activation function and Adam as the optimizer with a layer size of 25
and a learning rate of 0.003.
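The optimal configuration above can be sketched as follows, with iris standing in for the Kepler features and standardization included because multilayer perceptrons are sensitive to feature scale:

```python
from sklearn.datasets import load_iris
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)  # stand-in for the Kepler features

# tanh activation, Adam optimizer, one hidden layer of 25 units, learning rate 0.003
clf = make_pipeline(
    StandardScaler(),
    MLPClassifier(hidden_layer_sizes=(25,), activation="tanh", solver="adam",
                  learning_rate_init=0.003, max_iter=2000, random_state=0),
)
clf.fit(X, y)
train_acc = clf.score(X, y)
```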
4.5. K-Means Clustering
K-means is a classic unsupervised learning algorithm for clustering problems. It iteratively
partitions data into k clusters with the objective of minimizing the sum of squared distances
from each sample to its cluster centroid.
We aimed at discovering which exoplanets are in the same group as the earth. In this way, we
might find potentially habitable places for human beings. We implemented the k-means model
using KMeans from scikit-learn library. We set the number of clusters as 100.
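The clustering step can be sketched as below. The planet features are random stand-ins, k is reduced from 100 to 10 to suit the toy data, and the earth's feature vector is appended as the final row so its cluster can be looked up afterwards:

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Hypothetical standardized planet features (e.g. radius, mass, distance)
planets = rng.normal(size=(300, 3))
earth = np.zeros((1, 3))           # the earth's feature vector, appended last
data = np.vstack([planets, earth])

km = KMeans(n_clusters=10, n_init=10, random_state=0).fit(data)
earth_cluster = km.labels_[-1]                      # cluster containing the earth
members = np.where(km.labels_ == earth_cluster)[0]  # all rows in that cluster
candidates = members[:-1]                           # planets grouped with the earth
```

The planets sharing the earth's cluster are the candidates for habitability in this scheme.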
5. EXPERIMENTAL RESULTS AND ANALYSIS
For the classification problem, the decision tree achieved an accuracy of 99.06%, random forest
achieved an accuracy of 92.11%, naïve Bayes achieved an accuracy of 88.50%, and multilayer
perceptron achieved an accuracy of 99.79%. These four models all performed well on the test set,
as each achieved a high classification accuracy.
We further evaluated the performance of these models with 10-fold cross-validation, a
resampling method that tests if a model generalizes well, using KFold from scikit-learn library.
As a result, the random forest model achieved the best average accuracy of 82.39%.
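The evaluation step can be sketched with KFold and cross_val_score; iris again stands in for the Kepler data, so the scores here are illustrative only:

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import KFold, cross_val_score

X, y = load_iris(return_X_y=True)  # stand-in for the Kepler features

# 10-fold cross-validation: ten train/test splits, one accuracy per fold
cv = KFold(n_splits=10, shuffle=True, random_state=0)
scores = cross_val_score(RandomForestClassifier(random_state=0), X, y, cv=cv)
mean_acc = scores.mean()
```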
Figure 7. The Optimal Classification Tree Obtained
We visualized the classification tree with the highest accuracy using Graphviz and plotted the
importance of each feature. The optimal classification tree obtained is shown in Fig. 7 and the
importance of each feature is shown in Fig. 8 and Table 2. From Fig. 7, Fig. 8, and Table 2, the
most important feature is stellar eclipse, a flag variable indicating if some phenomena caused by
an eclipsing binary are observed, with a Gini importance score of 0.30512. The most important
continuous variable is the log scale orbital period measured in days, with a Gini importance score
of 0.05115.
Figure 8. The Importance Scores of Features in the Kepler Dataset
Table 2. Kepler Feature List with Variable Name, Importance, Meaning, and Description

Variable      | Importance | Meaning                                 | Description
koi_fpflag_ss | 0.30512    | Stellar Eclipse                         | 1 means some phenomena caused by an eclipsing binary were observed
koi_fpflag_co | 0.27971    | Centroid Offset                         | 1 means the source of the signal is from a nearby star
koi_fpflag_nt | 0.27158    | Non-Transit-Like                        | 1 means the light curve is not consistent with that of a transiting planet
koi_period    | 0.05115    | Orbital Period                          | Measured in days and taken in log scale
koi_count     | 0.03464    | Number of Planets                       | Number of exoplanet candidates identified in a solar system
koi_fpflag_ec | 0.03245    | Ephemeris Match Indicates Contamination | 1 means the candidate shares the same period and epoch as another object
koi_time0     | 0.01978    | Transit Epoch                           | Measured in Barycentric Julian Day (BJD)
Finally, we visualized the exoplanets in the same cluster as the earth in Fig. 9. As a result, these
exoplanets are most likely to be habitable for human beings. For example, HD 7924 d, HD 33564
b, HD 17156 b, GJ 96 b, Teegarden's Star b are among these exoplanets. In total, 39 exoplanets
are in the same cluster as the earth.
Figure 9. Star Map of the Exoplanets in the Same Cluster as the Earth
6. CONCLUSION
In this project, we conducted both supervised and unsupervised learning on two datasets collected
by NASA, the Kepler dataset and confirmed exoplanets dataset. The Kepler dataset was used to
predict the existence of exoplanet candidates by classification techniques, including decision tree,
random forest, naïve Bayes, and neural network. The confirmed exoplanets dataset was used to
find habitable exoplanets by picking up exoplanets in the same cluster as the earth.
For the supervised learning task, comprehensive data cleaning and EDA were performed to help
visualize the data and gain a higher-level understanding of the samples. Then correlation-based
feature selection was conducted; several redundant features were removed because of low
feature-target correlation or high feature-feature correlation. The selected machine learning models were
then implemented. The optimal hyper-parameters were obtained by experiment and accuracy was
improved. Our models finally achieved accuracies of 99.06%, 92.11%, 88.50%, and 99.79%,
respectively, and were compared by 10-fold cross validation. As a result, the random forest
model performed the best among these four.
For the unsupervised learning task, basic EDA and feature selection were also performed. Then
the earth’s features were added to the dataset before clustering. All exoplanets were divided into
100 clusters and exoplanets that are most likely to be habitable were found in the cluster that
contains the earth.
As an extension of this project, it might be interesting to create new features from the current
feature set using feature engineering techniques, because they can be helpful to improve the
model performance. In addition, we can investigate the receiver operating characteristics (ROC)
and the precision-recall curves to understand the diagnostic ability of different machine learning
models. McNemar's test can also be applied to compare different algorithms [10]. Finally, the
discussion of statistical significance tests and execution time can be included in model
evaluation.
People from different cultures may connect to the concept of a “planet” in similar ways: they
live on one, our mother earth; they view the moon shared by all human beings; and they learn the
names of the other planets in our solar system from an early age. Planets that circle a star,
rather than nebulae or galaxies, are easier to fit into our shared cultural view of the universe.
For us, exoplanet exploration bridges the heavens with human consciousness and opens a vast area
of exploration to look forward to: seeking other habitable worlds. Finally, it has increased the
likelihood of the outcome our long-term study points toward: we are not alone in the universe.
ACKNOWLEDGEMENTS
This study is a research project of the graduate course DATA 200: Principles and Techniques of
Data Science offered by UC Berkeley EECS Department in Fall 2021. The instructor is Prof.
Fernando Pérez.
REFERENCES
[1] Brennan, Pat (2019). “Why Do Scientists Search for Exoplanets? Here Are 7 Reasons”. NASA
Website. Online. Retrieved from
https://exoplanets.nasa.gov/news/1610/why-do-scientists-search-for-exoplanets-here-are-7-reasons/.
[2] Borucki, William J., et al. (2012). “Kepler-22b: A 2.4 Earth-radius planet in the habitable zone of a
Sun-like star.” The Astrophysical Journal 745.2, 2012: 120.
[3] Barclay, Thomas, et al. (2013). “A super-earth-sized planet orbiting in or near the habitable zone
around a sun-like star.” The Astrophysical Journal 768.2, 2013: 101.
[4] Torres, Guillermo, et al. (2015). “Validation of 12 small Kepler transiting planets in the habitable
zone.” The Astrophysical Journal 800.2, 2015: 99.
[5] Breitman, Daniela (2017). “How do astronomers find exoplanets?”. EarthSky.org. Online. Retrieved
from https://earthsky.org/space/how-do-astronomers-discover-exoplanets/.
[6] Borucki, William J. (2016). “KEPLER Mission: development and overview.” Reports on Progress
in Physics 79.3, 2016: 036901.
[7] Batalha, Natalie M. (2014). “Exoplanet populations with NASA’s Kepler Mission”. Proceedings of
the National Academy of Sciences. 111 (35) 12647-12654, September 2014. DOI:
10.1073/pnas.1304196111.
[8] Lissauer, Jack J., Dawson, R. I., and Tremaine, S. (2014). “Advances in exoplanet science from
Kepler”. Nature. 513, 336–344, 2014. DOI: 10.1038/nature13781.
[9] Shallue, Christopher J. and Vanderburg, Andrew (2018). “Identifying Exoplanets with Deep
Learning: A Five-planet Resonant Chain around Kepler-80 and an Eighth Planet around Kepler-90”.
The Astronomical Journal. Volume 155, Number 2, 30 January 2018. DOI: 10.3847/1538-3881/aa9e09.
[10] Saha, Rohan (2019). “Comparing Classification Models on Kepler Data”. arXiv:2101.01904v2.
DOI: 10.13140/RG.2.2.11232.43523.
[11] Maldonado, Miguel J. (2020). “Transiting Exoplanet Discovery Using Machine Learning
Techniques: A Survey”. Earth Science Informatics. volume 13, pages 573–600, 5 June 2020. DOI:
https://doi.org/10.1007/s12145-020-00464-7.
[12] Fisher, Chloe, et al. (2020). “Interpreting High-resolution Spectroscopy of Exoplanets using Cross-
correlations and Supervised Machine Learning.” The astronomical journal 159.5, 2020:192.
[13] Basak, Suryoday, et al. (2021). “Habitability classification of exoplanets: a machine learning
insight.” The European Physical Journal Special Topics 230.10, 2021: 2221-2251.
[14] “Kepler and K2 Missions”. NASA Official Website. Online. Retrieved from
https://astrobiology.nasa.gov/missions/kepler/.
[15] “NASA Exoplanet Archive”. NASA Official Website. Online. Retrieved from
https://exoplanetarchive.ipac.caltech.edu/docs/API_compositepars_columns.html.
[16] Tukey, John W. (1962). “The Future of Data Analysis”. The Annals of Mathematical Statistics. Vol.
33, No. 1, pp. 1-67, March 1962.
[17] Richmond, Michael. “Celestial Coordinates”. Online. Retrieved from
http://spiff.rit.edu/classes/phys373/lectures/radec/radec.html.
AUTHORS
Yucheng Jin is a graduate student at the EECS department of University of
California, Berkeley. His research interests are engineering applications of machine
learning, machine learning security, and brain-machine interface.
Lanyi Yang is a graduate student at the EECS department of University of
California, Berkeley.
Chia-En Chiang is a graduate student at the EECS department of University of
California, Berkeley.