Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

UCI Hydraulic Preprocess Data Recipe #99

Open
wants to merge 3 commits into
base: master
Choose a base branch
from

Conversation

james94
Copy link

@james94 james94 commented Jul 15, 2020

The Hydraulic System Sensor Data Recipe does the following:

  • Downloads the Hydraulic Systems Condition Monitoring Dataset from UCI ML Repo
  • Reads the data zip file
  • Extracts the hydraulic sensors from individual files
    • Computes the average cycle for each hydraulic sensor
  • Extracts the chosen hydraulic condition monitoring label
  • Stores the hydraulic sensor data and condition monitoring label to csv file
  • Imports the preprocessed hydraulic sensor data into Driverless AI

james94 added 3 commits July 15, 2020 09:18
* Added data recipe for importing hydraulic condition monitoring dataset

* Renamed the file to be shorter
* Added data recipe for importing hydraulic condition monitoring dataset

* Renamed the file to be shorter

* Updated Data Recipe based on Edgar's feedback

Changed all places where I was using module level objects
to use only import modules for code readability.
Added type hints for each class method, which improves
readability by specifying the expected argument
and return types.
Updated comments with keyword arguments.
Replaced series of if/elif statements with a dictionary.
Updated code that specifies the destination filepath for
output file by using os join instead of file replace.
Changed all my custom variable names to lowercase
underscores instead of mixed case.
@james94 james94 requested a review from orendain July 15, 2020 17:26
Copy link
Member

@orendain orendain left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Apologies for the delay. My GitHub pull request bot didn't tell me about this outstanding PR (but did for others). I only came across this when clearing out my github web-UI notification dashboard.

Comment on lines +3 to +11
import os
import zipfile
# call on members Union, List
import typing
# call on members data.CustomData, systemutils_more.download, systemutils.config
import h2oaicore
import datatable as dt
import numpy as np
import pandas as pd
Copy link
Member

@orendain orendain Aug 14, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a style comment, very much only a nitpick:

Imports should be grouped in the following order:

    Standard library imports.
    Related third party imports.
    Local application/library specific imports.

You should put a blank line between each group of imports.

Reference: https://www.python.org/dev/peps/pep-0008/#imports

In other words:

import os
import typing
import zipfile

import datatable as dt
import numpy as np
import pandas as pd

import h2oaicore

Also in alphabetical order and removed comments unless necessary.

Comment on lines +135 to +141
# url contains hydraulic systems condition monitoring data set
link = "https://archive.ics.uci.edu/ml/machine-learning-databases/00447/data.zip"
# returns temp_path to filename with filename extracted from URL
file = h2oaicore.systemutils_more.download(link, dest_path=temp_path)
# Choose a target_label: cool_cond_y, valve_cond_y, pump_leak_y, acc_gas_leak_y, stable_y
# The target_label you choose will get added on as the last column in the training dataset
target_label = "cool_cond_y"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: I would consider moving the link and target_label strings to constants at the top of the file, along with their comments, so it would be easy for someone to go in and quickly change these (where people typically look for these types of configurable fields).

Then having link and target_label referencing those constants. It may also make this block short and easier to quickly read.

# The target_label you choose will get added on as the last column in the training dataset
target_label = "cool_cond_y"
# creates csv destination filepath to store processed hydraulic condition monitoring data
output_file = os.path.join(temp_path, "hydCondMonitorData.csv")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

qq/nit: Is there a significance for the filename being camelCase instead of under_scored? It might be beneficial to use underscores as it's more typical in filesystem naming and also matches python variable naming.

Just curious, and sharing thoughts, I don't mean to push.

Comment on lines +121 to +125
# import one of the hydraulic condition monitoring labels from profile based on target label
df_hyd_cond_monitor_label_y = import_target_label(zip, "profile.txt", target_label)

# Combine all sensor Dataframes and then add hydraulic condition monitoring label as last column
df_hyd_cond_monitor_data_x = pd.concat([df_ps1, df_ps2, df_ps3, df_ps4, df_ps5, df_ps6, df_vf1,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These comments are great 👍 - thanks! Made the code easy to glance through for me while not having to be familiar with the name of the files in the zip. 🤩

"pump_leak_y": pd.DataFrame(pd_frame.iloc[:,2]),
# store hydraulic accumulator gas leakage label data from column 3 into pd frame
"acc_gas_leak_y": pd.DataFrame(pd_frame.iloc[:,3]),
# store hydraulic stable flag label data from column 4 into pd frame
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

very much just a nit/observation: Comments make up 50% of this block. I wonder if the could be combined into a single multi-line comment at the top, perhaps making the block flow more easily w/o being separated by comments each line.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants