#129 data recipe to augment new features #131

Open
wants to merge 3 commits into
base: rel-1.9.1

Conversation

gbakthavatchalam

This PR adds a data recipe that lets users augment a dataset with new features by using the Augment service.

https://github.com/h2oai/h2oai/issues/20586


@surenH2oai surenH2oai left a comment


LGTM

data/augment.py Outdated
@@ -0,0 +1,758 @@
"""

This data recipe lets the user to augment new features to the dataset using the Augment Cloud Service.


I would add more description about this recipe from the perspective of DAI, for example starting with the requirements:
Snowflake
Dataset
DAI

Also describe the augmentation itself, where the output of Augment will be persisted, and how it is consumed by DAI.

Author


Sure, will do it

data/augment.py Outdated
6. The recipe polls the API for the completion of the table creation
6. The recipe exports the dataset back to user's snowflake account
7. The recipe downloads, saves the dataset from snowflake into driverlessai instance and returns the file path
8. A new dataset is created in DAI with the augmented columns


Since this is a customer-facing recipe, I would use the full product name.

Author


@surenH2oai I have it updated now :)
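
For readers following the docstring steps quoted above, the "polls the API for the completion of the table creation" step could look roughly like the sketch below. This is not the code from `data/augment.py`: the function name `poll_until_done`, the status strings, and the injected `get_status` callable are all assumptions made for illustration, and the real Augment API may report completion differently.

```python
import time
from typing import Callable


def poll_until_done(
    get_status: Callable[[], str],
    interval_s: float = 5.0,
    timeout_s: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Call get_status() until it reports a terminal state or the timeout passes.

    The status names "SUCCEEDED" and "FAILED" are hypothetical placeholders.
    """
    deadline = clock() + timeout_s
    while True:
        status = get_status()
        if status in ("SUCCEEDED", "FAILED"):
            return status
        # Give up once the deadline has passed instead of polling forever.
        if clock() >= deadline:
            raise TimeoutError(f"job still '{status}' after {timeout_s} seconds")
        sleep(interval_s)
```

Passing `sleep` and `clock` as parameters keeps the loop testable without real waiting; in the recipe itself, `get_status` would wrap whatever HTTP call the service exposes.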

return "", str(e)


class AugmentDataset(CustomData):


Where is CustomData used? I guess it is needed because this is a data recipe?

Author


@surenH2oai `CustomData` is the base class, and we override its `create_data` method in the subclass `AugmentDataset`. DAI finds the subclass that derives from `CustomData`, which in this case is `AugmentDataset`, and invokes its `create_data` method to get the updated dataframe with the original columns plus the augmented columns.
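
To make the subclass lookup concrete, here is a minimal, self-contained sketch of the pattern. The `CustomData` class below is a stand-in defined only for this example, not the real DAI base class, and `find_recipe`, `run_recipe`, and the `augmented_flag` column are hypothetical names; how DAI actually discovers recipe classes may differ, and the real recipe works on a dataframe rather than a list of dicts.

```python
from typing import Dict, List, Type

Rows = List[Dict[str, object]]


class CustomData:
    """Stand-in base class for illustration; NOT the Driverless AI implementation."""

    @staticmethod
    def create_data(X: Rows = None) -> Rows:
        raise NotImplementedError


class AugmentDataset(CustomData):
    @staticmethod
    def create_data(X: Rows = None) -> Rows:
        # Hypothetical augmentation: keep the original columns, add one new column.
        rows = X or []
        return [{**row, "augmented_flag": True} for row in rows]


def find_recipe(base: Type[CustomData]) -> Type[CustomData]:
    """Return the first direct subclass of `base` (illustrative discovery only)."""
    subclasses = base.__subclasses__()
    if not subclasses:
        raise LookupError("no recipe subclass found")
    return subclasses[0]


def run_recipe(X: Rows) -> Rows:
    # The host never names AugmentDataset directly; it only knows the base class.
    recipe = find_recipe(CustomData)
    return recipe.create_data(X)
```

Calling `run_recipe([{"id": 1}])` returns the input row with the extra column added, which mirrors the "original columns + augmented columns" behaviour described above.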


2 participants