Add Parquet for GAME model training/scoring #188

Open
XianXing opened this issue Oct 21, 2016 · 5 comments

Comments

@XianXing
Contributor

It would be helpful to support Parquet-formatted data for GAME model training and scoring, since Parquet is a first-class citizen in Apache Spark.

@XianXing
Contributor Author

This issue seems to depend on PR #179. Is there a rough ETA for #179?

@fastier-li
Member

IMHO this one would be best handled by the community: at LinkedIn, we are pretty busy with features needed by LinkedIn. We can chaperone the development of a Parquet connector, but it is unlikely that we would spend time on it in the near future. So I'm going to label this one "Help wanted".

@fastier-li
Member

@XianXing #179 was merged in Nov 2016. Moreover, we are going to redo the Driver so that it works more like a "script" that calls library functions to prepare the data, the indexes, the normalization contexts... That is the direction we are taking to support other data formats (GameEstimator will work off of a DataFrame).

@XianXing
Contributor Author

Thanks for the update.

@fastier-li
Member

The basic design for this should be a ParquetDataReader that outputs a DataFrame and the other data structures needed by GameEstimator.fit. We will keep it as help wanted: it is good to have, but might not be handled soon by LinkedIn staff.
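
To make the shape of that design concrete, here is a minimal sketch in Python against the PySpark API. Only the name ParquetDataReader comes from this thread. The `read` method, the `ReaderOutput` bundle, the `build_feature_index` helper, and the `reserved_columns` default are all hypothetical, and treating every non-reserved column as a feature is an assumption made for illustration. The only real Spark calls used are `spark.read.parquet(path)` and `DataFrame.columns`.

```python
from dataclasses import dataclass
from typing import Any, Dict, Iterable, Tuple


@dataclass
class ReaderOutput:
    """Hypothetical bundle of what a reader might hand to an estimator."""
    data: Any                      # a Spark DataFrame when used with a real session
    feature_index: Dict[str, int]  # feature name -> integer index


def build_feature_index(feature_names: Iterable[str]) -> Dict[str, int]:
    """Assign a stable integer index to each distinct feature name."""
    return {name: i for i, name in enumerate(sorted(set(feature_names)))}


class ParquetDataReader:
    """Hypothetical reader: the name is from the thread, the API is not."""

    def __init__(self, spark: Any,
                 reserved_columns: Tuple[str, ...] = ("response",)):
        # `spark` is expected to be a SparkSession.
        # `reserved_columns` is an assumed convention for non-feature columns.
        self.spark = spark
        self.reserved_columns = reserved_columns

    def read(self, path: str) -> ReaderOutput:
        # DataFrameReader.parquet loads Parquet files into a DataFrame.
        df = self.spark.read.parquet(path)
        # Assumption: every column that is not reserved is a feature.
        features = [c for c in df.columns if c not in self.reserved_columns]
        return ReaderOutput(data=df,
                            feature_index=build_feature_index(features))
```

With a real session this would be called as `ParquetDataReader(spark).read("/path/to/data.parquet")`. The index map stands in for the "other data structures" mentioned above; normalization contexts and anything else the estimator needs are left out of this sketch.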
