Why does the teach method fit the data? #95
Comments
You could argue that one may want to call the predict method right after teaching, and that this is why the fit method is called inside the teach method. But as I described in the issue, that approach is problematic. At the very least, there should be an option to control whether fitting is performed or not. |
To add training data without refitting the estimator, you can use the This is a "private" method, so I didn't include it in the documentation, but the method itself is documented, so it should be easy to use. I don't fully understand your use case and argument. What is the underlying model you use? If by querying a single label you learn multiple other labels indirectly, then you can manually add these to the
|
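To make the distinction between adding training data and refitting concrete, here is a minimal sketch. `ToyLearner`, its methods, and the `refit` flag are hypothetical names invented for illustration; this is not the library's API, and the "fit" is a trivial stand-in for an expensive estimator. It only shows the idea of storing many newly labeled instances and fitting once afterwards.

```python
class ToyLearner:
    """Hypothetical learner that separates storing data from fitting."""

    def __init__(self):
        self.X_training = []
        self.y_training = []
        self.fit_calls = 0
        self.majority_label = None

    def add_training_data(self, X, y):
        # Only store the new instances; no fitting happens here.
        self.X_training.extend(X)
        self.y_training.extend(y)

    def fit(self):
        # Stand-in for an expensive fit: remember the most frequent label.
        self.fit_calls += 1
        self.majority_label = max(set(self.y_training), key=self.y_training.count)

    def teach(self, X, y, refit=True):
        # Assumed option: let the caller skip the refit.
        self.add_training_data(X, y)
        if refit:
            self.fit()


learner = ToyLearner()

# One queried label reveals the label of 300 instances (the number from the issue).
for i in range(300):
    learner.teach([[float(i)]], [1], refit=False)

# Fit once after all new instances have been added.
learner.fit()
```

With this split, the cost of fitting is paid once per query iteration instead of once per added instance.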
Okay, I got your point. Another question: should I delete the queried instance from X_pool and its corresponding label from y_pool after I make a query (learn the label) and teach them at each query iteration? Or is that unnecessary? |
Yes, it should be deleted manually. Otherwise, the query strategy might select data that is already part of your training data, possibly leading to model bias in some scenarios. There is a PR by @talolard, who proposed a data manager class but eventually decided to put it into a completely new package. I don't know the status of this, but it would be very useful for this case. |
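The manual deletion described above can be sketched with NumPy as follows. The pool contents and `query_idx` are made-up stand-ins (in practice the index would come from the query step), and X_pool and y_pool are assumed to be NumPy arrays.

```python
import numpy as np

# Hypothetical pool: 5 unlabeled instances with 2 features each.
X_pool = np.arange(10, dtype=float).reshape(5, 2)
y_pool = np.array([0, 1, 0, 1, 1])

# Stand-in for the index returned by the query step (assumed, not computed).
query_idx = np.array([2])

# Keep the queried instance and its label so they can be taught to the learner.
X_new, y_new = X_pool[query_idx], y_pool[query_idx]

# Remove the queried rows so they cannot be selected again.
X_pool = np.delete(X_pool, query_idx, axis=0)
y_pool = np.delete(y_pool, query_idx, axis=0)
```

Note that `np.delete` returns a new array rather than modifying the pool in place, so the result has to be assigned back.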
When you teach your Active Learner the queried instance and its label, it does not just add the new instance to the training dataset; it also refits the model on the newly labeled data. In my case this is unnecessary. When I learn the label of one instance, I automatically learn the labels of 300 other instances (this number can vary), since they share the same label. I therefore have to teach 300 new instances to the Active Learner at each query iteration, which takes a lot of time because of the fit method. For this reason, I believe that fitting should be performed only in the query method.