
Why does the teach method fit the data? #95

Open
Efesencan opened this issue Aug 3, 2020 · 4 comments

@Efesencan

Efesencan commented Aug 3, 2020

When you teach your ActiveLearner a queried instance and its label, it does not just add the new instance to the training dataset, it also refits the model on the newly labeled data. In my case this is unnecessary: when I learn the label of one instance, I automatically learn the labels of about 300 other instances (this number can vary), since they share the same label. Therefore, I have to teach 300 new instances to the ActiveLearner at each query iteration, which takes a lot of time because of the fit method. For this reason, I believe that fitting should be performed only in the query method.

@Efesencan
Author

One could argue that a user may want to call the predict method right after teaching, and that this is why fit is called inside the teach method. But as I described above, that approach is problematic. At the very least, there should be an option to choose whether fitting is performed or not.

@cosmic-cortex
Member

To only add training data without refitting the estimator, you can use the ActiveLearner._add_training_data method. (Here is the implementation: https://github.com/modAL-python/modAL/blob/master/modAL/models/base.py#L68-L92)

This is a "private" method, so I didn't include it in the documentation, but the method itself is documented, so it should be easy to use.

I don't exactly understand your use case and argument. What is the underlying model you are using?

If by querying a single label you learn multiple other labels indirectly, then you can manually add these to X_new and y_new before calling the teach method, so the model is fitted only once per query iteration instead of once per instance. This is roughly what I mean:

import numpy as np

query_idx, X_query = learner.query(X_pool)

# ...
# get the label(s) y_query for X_query somehow
# ...

# these are the instances and labels you find indirectly after querying a single label
X_other, y_other = ...

X_new = np.concatenate((X_query, X_other))
y_new = np.concatenate((y_query, y_other))

learner.teach(X_new, y_new)

@Efesencan
Author

Efesencan commented Aug 4, 2020

Okay, I got your point. Another question: should I delete the queried instance from X_pool and its corresponding label from y_pool after I make a query (learn the label) and teach it, at each query iteration? Or is that unnecessary?

@cosmic-cortex
Member

Yes, it should be deleted manually. Otherwise, the query strategy might select data that is already part of your training set, possibly leading to model bias in some scenarios.
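A rough sketch of that bookkeeping (assuming X_pool and y_pool are numpy arrays, and the label is simply looked up in y_pool here for illustration):

import numpy as np

query_idx, X_query = learner.query(X_pool)
learner.teach(X_query, y_pool[query_idx])

# remove the queried rows from the pool so they cannot be selected again
X_pool = np.delete(X_pool, query_idx, axis=0)
y_pool = np.delete(y_pool, query_idx, axis=0)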

There is a PR by @talolard, who proposed a data manager class but eventually decided to put it into a completely new package. I don't know its current status, but it would be very useful for this case.
