UCI Hydraulic Preprocess Data Recipe #99

james94 · 2020-07-15T17:25:58Z

The Hydraulic System Sensor Data Recipe does the following:

Downloads the Hydraulic Systems Condition Monitoring Dataset from UCI ML Repo
Reads the data zip file
Extracts the hydraulic sensors from individual files
- Computes the average cycle for each hydraulic sensor
Extracts the chosen hydraulic condition monitoring label
Stores the hydraulic sensor data and condition monitoring label to csv file
Imports the preprocessed hydraulic sensor data into Driverless AI

* Added data recipe for importing hydraulic condition monitoring dataset * Renamed the file to be shorter

* Added data recipe for importing hydraulic condition monitoring dataset * Renamed the file to be shorter * Updated Data Recipe based on Edgar's feedback Changed all places where I was using module level objects to use only import modules for code readability. Added type hints for each class method, which improves readability by specifying the expected argument and return types. Updated comments with keyword arguments. Replaced series of if/elif statements with a dictionary. Updated code that specifies the destination filepath for output file by using os join instead of file replace. Changed all my custom variable names to lowercase underscores instead of mixed case.

orendain

LGTM! Apologies for the delay. My GitHub pull request bot didn't tell me about this outstanding PR (but did for others). I only came across this when clearing out my github web-UI notification dashboard.

orendain · 2020-08-14T16:43:41Z

data/hydraulic-data.py

+import os
+import zipfile
+# call on members Union, List
+import typing
+# call on members data.CustomData, systemutils_more.download, systemutils.config
+import h2oaicore
+import datatable as dt
+import numpy as np
+import pandas as pd


Just a style comment, very much only a nitpick:

Imports should be grouped in the following order: Standard library imports. Related third party imports. Local application/library specific imports. You should put a blank line between each group of imports.

Reference: https://www.python.org/dev/peps/pep-0008/#imports

In other words:

import os import typing import zipfile import datatable as dt import numpy as np import pandas as pd import h2oaicore

Also in alphabetical order and removed comments unless necessary.

orendain · 2020-08-14T16:48:57Z

data/hydraulic-data.py

+        # url contains hydraulic systems condition monitoring data set
+        link = "https://archive.ics.uci.edu/ml/machine-learning-databases/00447/data.zip"
+        # returns temp_path to filename with filename extracted from URL
+        file = h2oaicore.systemutils_more.download(link, dest_path=temp_path)
+        # Choose a target_label: cool_cond_y, valve_cond_y, pump_leak_y, acc_gas_leak_y, stable_y
+        # The target_label you choose will get added on as the last column in the training dataset
+        target_label = "cool_cond_y"


nit: I would consider moving the link and target_label strings to constants at the top of the file, along with their comments, so it would be easy for someone to go in and quickly change these (where people typically look for these types of configurable fields).

Then having link and target_label referencing those constants. It may also make this block short and easier to quickly read.

orendain · 2020-08-14T16:51:40Z

data/hydraulic-data.py

+        # The target_label you choose will get added on as the last column in the training dataset
+        target_label = "cool_cond_y"
+        # creates csv destination filepath to store processed hydraulic condition monitoring data
+        output_file = os.path.join(temp_path, "hydCondMonitorData.csv")


qq/nit: Is there a significance for the filename being camelCase instead of under_scored? It might be beneficial to use underscores as it's more typical in filesystem naming and also matches python variable naming.

Just curious, and sharing thoughts, I don't mean to push.

orendain · 2020-08-14T16:54:12Z

data/hydraulic-data.py

+                # import one of the hydraulic condition monitoring labels from profile based on target label
+                df_hyd_cond_monitor_label_y = import_target_label(zip, "profile.txt", target_label)        
+
+                # Combine all sensor Dataframes and then add hydraulic condition monitoring label as last column
+                df_hyd_cond_monitor_data_x = pd.concat([df_ps1, df_ps2, df_ps3, df_ps4, df_ps5, df_ps6, df_vf1,


These comments are great 👍 - thanks! Made the code easy to glance through for me while not having to be familiar with the name of the files in the zip. 🤩

orendain · 2020-08-14T16:56:55Z

data/hydraulic-data.py

+                "pump_leak_y": pd.DataFrame(pd_frame.iloc[:,2]),
+                # store hydraulic accumulator gas leakage label data from column 3 into pd frame
+                "acc_gas_leak_y": pd.DataFrame(pd_frame.iloc[:,3]),
+                # store hydraulic stable flag label data from column 4 into pd frame


very much just a nit/observation: Comments make up 50% of this block. I wonder if the could be combined into a single multi-line comment at the top, perhaps making the block flow more easily w/o being separated by comments each line.

james94 added 3 commits July 15, 2020 09:18

Jm/hydraulic Data Recipe (#1)

38b7b03

* Added data recipe for importing hydraulic condition monitoring dataset * Renamed the file to be shorter

Merge branch 'master' of https://github.com/h2oai/driverlessai-recipes

87c3546

james94 requested a review from orendain July 15, 2020 17:26

orendain approved these changes Aug 14, 2020

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

UCI Hydraulic Preprocess Data Recipe #99

UCI Hydraulic Preprocess Data Recipe #99

james94 commented Jul 15, 2020

orendain left a comment

orendain Aug 14, 2020 •

edited

Loading

orendain Aug 14, 2020

orendain Aug 14, 2020

orendain Aug 14, 2020

orendain Aug 14, 2020

UCI Hydraulic Preprocess Data Recipe #99

Are you sure you want to change the base?

UCI Hydraulic Preprocess Data Recipe #99

Conversation

james94 commented Jul 15, 2020

orendain left a comment

Choose a reason for hiding this comment

orendain Aug 14, 2020 • edited Loading

Choose a reason for hiding this comment

orendain Aug 14, 2020

Choose a reason for hiding this comment

orendain Aug 14, 2020

Choose a reason for hiding this comment

orendain Aug 14, 2020

Choose a reason for hiding this comment

orendain Aug 14, 2020

Choose a reason for hiding this comment

orendain Aug 14, 2020 •

edited

Loading