Throughout naval aviation, data lakes provide the raw material for generating insights into predictive maintenance and increasing readiness across many platforms. Successfully leveraging these data lakes can be technically challenging.
Using Apache Spark for Predicting Degrading and Failing Parts in Aviation
1. UNCLASSIFIED // APPROVED FOR PUBLIC RELEASE 1
Boeing F/A-18F Super Hornet at takeoff at Danish Air Show 2014 on June 22
Image attribution: Slaunger
CC BY-SA 3.0
The views of the author(s) expressed herein do not necessarily
represent those of the U.S. Navy or Department of Defense
(DoD). Presentation of this material does not constitute or imply its
endorsement, recommendation, or favoring by the DoD.
NAWCTSD Enterprise Research Data Science
• Team of 20 data scientists and software engineers
• Navy civil servants and contractor support personnel
• Naval Air Warfare Center Training Systems Division in Orlando, FL
Issue
Unscheduled maintenance and component degradation impacting readiness
Hornet Health Assessment and Readiness Tool (HhART)
Create a real-time monitoring tool for a fleet of aircraft, driven by engineering-approved features and models, that can be used to assist maintenance decisions and predict component degradation
Process cycle stages: SME, ETL, Data science, Deployment, Feedback
SME
• Engineers
• Designers
• Maintainers
Engagement, Education, Communication, Trust, Culture
Engineers, Maintainers, Leaders, Data scientists, Process engineers, Developers
SES 405 - Exploration Systems Engineering (ASU)
System Hierarchy Module (9)
• Complex system of systems
• Data scientists can support at each level
• Each component is unique
• Confounding effects
• Inconsistent recording resolutions
• Both discrete and continuous data
• Changing schemas between software versions
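Two of the challenges listed above, changing schemas and inconsistent recording resolutions, can be illustrated with a minimal plain-Python sketch. The version labels, the field names (`eng_temp`, `engine_temp_c`, and so on), and the fixed time grid are hypothetical and do not come from the talk; a production pipeline would more likely express the same steps in Spark.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical mapping from version-specific field names to one canonical schema.
SCHEMA_MAP: Dict[str, Dict[str, str]] = {
    "v1": {"t": "time_s", "eng_temp": "engine_temp_c"},
    "v2": {"time": "time_s", "engine_temperature": "engine_temp_c"},
}


def harmonize(record: Dict[str, float], version: str) -> Dict[str, float]:
    """Rename version-specific fields to canonical names; drop unmapped fields."""
    mapping = SCHEMA_MAP[version]
    return {mapping[k]: v for k, v in record.items() if k in mapping}


def resample_hold(
    samples: List[Tuple[float, float]], step: float, end: float
) -> List[Tuple[float, Optional[float]]]:
    """Resample (time, value) pairs onto a fixed grid from 0 to `end`.

    Each grid point takes the most recent observation at or before it
    (last observation carried forward), or None if nothing was seen yet.
    """
    ordered = sorted(samples)
    out: List[Tuple[float, Optional[float]]] = []
    idx = 0
    last: Optional[float] = None
    for i in range(int(round(end / step)) + 1):
        t = i * step
        while idx < len(ordered) and ordered[idx][0] <= t:
            last = ordered[idx][1]
            idx += 1
        out.append((t, last))
    return out
```

Carrying the last observation forward suits discrete state signals; continuous signals might instead call for interpolation, which this sketch does not attempt.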
ETL
• Identify
• Acquire
• Load
Batch analysis (diagram): Flight data, Storage, Warehouse, Staging, Analysis, Cleaning, Validation, Analysis, Results, Models
Streaming (diagram): Flight data, Storage, Analysis, Live data, Models, Streaming, Logic-based metrics, Dashboard
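One way to read the batch and streaming paths above is that a logic-based metric can be developed against stored flights and then reused on live data. The sketch below illustrates that idea in plain Python and is not the HhART implementation; the `exceedance_fraction` metric, the limit value, and the window size are hypothetical.

```python
from collections import deque
from typing import Deque, Iterable, Iterator, List


def exceedance_fraction(values: Iterable[float], limit: float) -> float:
    """Logic-based metric: fraction of samples above `limit` (0.0 if empty)."""
    items = list(values)
    if not items:
        return 0.0
    return sum(v > limit for v in items) / len(items)


def batch_scores(flights: List[List[float]], limit: float) -> List[float]:
    """Batch path: score every stored flight in one pass."""
    return [exceedance_fraction(flight, limit) for flight in flights]


def streaming_scores(
    stream: Iterable[float], limit: float, window: int
) -> Iterator[float]:
    """Streaming path: re-score a sliding window as each live sample arrives."""
    buf: Deque[float] = deque(maxlen=window)
    for sample in stream:
        buf.append(sample)
        yield exceedance_fraction(buf, limit)
```

Sharing one metric function between both paths keeps the dashboard values consistent with what was validated in batch analysis.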
GitLab is a registered trademark of GitLab, Inc.
Apache Kafka, Kafka, and the Kafka logo are either registered trademarks or trademarks of The Apache Software Foundation in the United States and other
countries.
Data science
• Explore
• Develop
• Iterate
• Learn: understand the problem domain
• Analyze: find indicators of the problem
• Develop: features to predict the indicators
• Refine: iterate with the SME to better define the features
(Diagram: iterative cycle of Learn, Analyze, Develop, Refine)
• Logic
  • Supply engineers with data science superpowers
  • Enhance error detection
  • Moderately predictive
• Deep learning
  • Learn what normal behavior is
  • Detect complex parameter interaction
  • Highly predictive
Images created using the public NASA DashLink dataset
• Anomaly detection: finding errant behavior in noisy signals
• Virtual sensors: replicating normal behavior
• Information compression: targeting specific interactions
(Figure: model input, model output, reconstruction error, and potential anomalies)
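The relationship between model input, model output, reconstruction error, and potential anomalies can be sketched as follows. This is a minimal illustration that assumes a model trained on normal behavior already exists and has produced the outputs; the squared-error measure, the mean-plus-k-standard-deviations threshold, and all function names are assumptions rather than the team's documented method.

```python
from statistics import mean, pstdev
from typing import List, Sequence


def reconstruction_error(
    model_input: Sequence[float], model_output: Sequence[float]
) -> List[float]:
    """Per-sample squared difference between the signal and its reconstruction."""
    return [(x - y) ** 2 for x, y in zip(model_input, model_output)]


def fit_threshold(normal_errors: Sequence[float], k: float = 3.0) -> float:
    """Assumed rule: mean plus k standard deviations of errors on normal data."""
    return mean(normal_errors) + k * pstdev(normal_errors)


def potential_anomalies(errors: Sequence[float], threshold: float) -> List[int]:
    """Indices of samples whose reconstruction error exceeds the threshold."""
    return [i for i, e in enumerate(errors) if e > threshold]
```

A model that has only learned normal behavior reconstructs it well, so samples with unusually large error are candidates for review by an engineer rather than confirmed faults.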
• Feature score normalization
• Aggregate scores
• Seeing trends
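The three steps above can be sketched in plain Python: put each feature's scores on a common scale, combine them into one score per flight, and estimate a trend across flights. The min-max scaling, the unweighted mean, and the least-squares slope are assumed choices for illustration; the talk does not state which methods are used.

```python
from typing import Dict, List, Sequence


def normalize(scores: Sequence[float]) -> List[float]:
    """Min-max scale scores to [0, 1]; a constant series maps to all zeros."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0] * len(scores)
    return [(s - lo) / (hi - lo) for s in scores]


def aggregate(per_feature: Dict[str, List[float]]) -> List[float]:
    """Average the normalized feature scores flight by flight."""
    normalized = [normalize(v) for v in per_feature.values()]
    return [sum(col) / len(col) for col in zip(*normalized)]


def trend_slope(series: Sequence[float]) -> float:
    """Least-squares slope of the series against its index (0.0 if undefined)."""
    n = len(series)
    x_mean = (n - 1) / 2
    y_mean = sum(series) / n
    num = sum((i - x_mean) * (y - y_mean) for i, y in enumerate(series))
    den = sum((i - x_mean) ** 2 for i in range(n))
    return num / den if den else 0.0
```

A positive slope in the aggregated score over successive flights would suggest a worsening trend worth surfacing to maintainers.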
Deployment
• Live
• Targeted
• Relevant
• Speed
• On-prem vs. cloud
• Automation
• Security
• Hardening containers
• RMF (Risk Management Framework), C-ATO (continuous Authority to Operate)
• Platform One
• AF CSO (Air Force Chief Software Officer): software.af.mil
• Live monitoring system
Kubernetes and the Kubernetes logo are registered trademarks of The Linux
Foundation.
• Critical at all stages of the process cycle
• User needs should drive development
• Tool should be accurate and explainable
• Regularly communicate with users
Feedback
• Communicate
• Incorporate
• Update
• Continuous
• Tailored to new platforms and sub-systems
• Adapted to meet the needs of the user
(Diagram labels: Collaboration, Learning, Developing, Refining, Visualizing, Deploying, Feedback)
Lessons learned
• Data science is most effective when tightly integrated into the organizational structure
• Progress is difficult at best when data scientists do not understand the system
Near-term goals
• Expand to additional platforms
• Deploy HhART at the edge in collaboration with partners