🚀 This is the source code for the SDR Classifier package, which classifies potential aviation safety hazards from textual data. The work is a collaboration between FAA and Boeing data science teams, and the models aim to assist analysts in Continued Operational Safety (COS) processes.
Service Difficulty Reports (SDRs) are submitted via the Service Difficulty Reporting System by operators or certified repair stations as a means to document and share information with the aviation community about failures, malfunctions, or defects of aeronautical products. The free-form text description field often contains valuable COS-related information, but it lacks predictable grammatical structure and is not standardized in any way. It can also contain typographical errors, part numbers, abbreviations, and references to specific sections of maintenance manuals or operating procedures, which makes it difficult to extract this information reliably with regular expressions or with language models designed to take clean, full sentences as input.
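To make the difficulty concrete, here is a minimal sketch of a keyword regex applied to a few SDR-style snippets. The snippets and the pattern are invented for illustration (they are not taken from the SDR dataset), and the sketch is not part of this package.

```python
import re

# Hypothetical SDR-style snippets, invented for illustration only.
SAMPLE_TEXTS = [
    "LOST CABIN PRESSURIZATION AT FL300",
    "CABIN PRESS LOST IN CRUISE, O2 MASKS DEPLOYED",
    "UNABEL TO CONTROL CABIN PRESSURE, OUTFLOW VLV CLOSED",
    "LOSS OF CABIN PRESURIZATION DURING CLIMB",
]

# A naive pattern that only matches the correctly spelled keyword.
NAIVE_PATTERN = re.compile(r"\bpressurization\b", re.IGNORECASE)


def naive_keyword_hits(texts):
    """Return one boolean per text: True if the naive pattern matches."""
    return [bool(NAIVE_PATTERN.search(text)) for text in texts]


hits = naive_keyword_hits(SAMPLE_TEXTS)
for text, hit in zip(SAMPLE_TEXTS, hits):
    print(f"{'match' if hit else 'MISS '} | {text}")
```

Only the first snippet matches. The abbreviation, the paraphrase, and the misspelling all describe the same kind of event but slip past the pattern, and widening the pattern to catch them quickly becomes hard to maintain.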
Each of these COS classification models was trained to pick up on all of the variation described above for a specific COS criterion. Subject matter experts (SMEs) annotated hundreds of SDR records for training and testing of each model, and those datasets are available here alongside the code to train, test, and invoke the models.
Version 0.1.1 out now! Check out the release notes here.
It is highly recommended to use a venv, virtualenv, or conda Python environment. Read more about creating virtual environments with venv: https://docs.python.org/3/tutorial/venv.html#creating-virtual-environments
Clone the repository, then run the commands below from its root folder to build the wheel (.whl) file in the dist folder and install it.
git clone https://github.com/Boeing/sdr-hazards-classification
python setup.py bdist_wheel
pip install ./dist/sdr_hazards_classification-0.1.0-py3-none-any.whl
pip install sdr_hazards_classification
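After either install route, it can help to confirm that the package is visible in the active environment before running the examples. The check below is a minimal sketch using only the standard library; `is_installed` is a hypothetical helper, not something the package provides.

```python
import importlib.util


def is_installed(package_name):
    """Return True if a top-level package can be found on the import path."""
    return importlib.util.find_spec(package_name) is not None


# The package name is taken from the import statement used in this README.
if is_installed("sdr_hazards_classification"):
    print("sdr_hazards_classification is installed")
else:
    print("sdr_hazards_classification was not found; check the active environment")
```

If the package is reported as missing, the usual cause is that pip installed it into a different environment than the one running Python.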
🛩️ Please follow the contribution guideline
from sdr_hazards_classification.sdr_api import SdrInferenceAPI, CORROSION_LIMIT, DEGRADED_CONTROLLABILITY
import pandas as pd
depressurization_model = SdrInferenceAPI()
# Test the prediction method
depressurization_model.test_sdr_depressurization_predictions()
event_text = """Lost cabin pressurization at flight level 30000, cabin altitude warning horn sounded at 10000 feet. Unabel to control
cabin pressure with outflow valve closed"""
pred, probs = depressurization_model.get_predictions([event_text])
degraded_controllability_model = SdrInferenceAPI(DEGRADED_CONTROLLABILITY)
degraded_controllability_model.test_sdr_degraded_controllability()
df = pd.read_csv('./src/sdr_classifier/data/SDR_Example.csv')
records = df["Text"]
# Pass in a list of records for prediction
pred, probs = depressurization_model.get_predictions(records)
df['Prediction'] = pred
df['Prob'] = probs
print(df.head(2))
print("Done")
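Once predictions are available, a common next step is to surface only the records the model is most confident about. The sketch below is not part of the package: `flag_for_review` and the threshold value are hypothetical, and it assumes that `pred` and `probs` hold one entry per record, that a positive prediction is encoded as `1`, and that `probs` holds the probability of the positive class. The values used here are placeholders, not model output.

```python
def flag_for_review(texts, preds, probs, threshold=0.5):
    """Return (text, prob) pairs for positive predictions at or above
    the threshold, sorted from highest to lowest probability."""
    if not (len(texts) == len(preds) == len(probs)):
        raise ValueError("texts, preds and probs must have the same length")
    flagged = [
        (text, float(prob))
        for text, label, prob in zip(texts, preds, probs)
        # Assumption: a positive prediction is encoded as 1.
        if label == 1 and prob >= threshold
    ]
    return sorted(flagged, key=lambda pair: pair[1], reverse=True)


# Placeholder values standing in for real records and model output.
example_texts = ["record A", "record B", "record C"]
example_preds = [1, 0, 1]
example_probs = [0.62, 0.10, 0.91]

review_queue = flag_for_review(
    example_texts, example_preds, example_probs, threshold=0.6
)
for text, prob in review_queue:
    print(f"{prob:.2f}  {text}")
```

With real output, the same helper could be called as `flag_for_review(list(records), list(pred), list(probs))`, provided the assumptions above hold for the model in use.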
Open Access Data via the FAA's Service Difficulty Reporting System