I am using modAL for an active learning project in multi-label classification. My implementation is in PyTorch, and I use DinoV2 as the backbone model.
For the same dataset, I apply both active learning (using minimum confidence and average confidence strategies) and random sampling. I select the same number of samples in both strategies, but the results from random sampling are significantly better than those from the active learning approach. I would like to know if this discrepancy might be due to an issue with my code or the modAL library's handling of multi-label classification. Below is my active learning loop:
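For context, my understanding of the two strategies is roughly the following: each sample is scored by how far its predicted per-label probabilities are from the 0.5 decision boundary, and the least-confident samples are queried. This is a minimal NumPy sketch of that formulation (my reading of the strategies, not modAL's exact implementation; the function names are illustrative):

```python
import numpy as np

def min_confidence_query(proba, n_instances=1):
    """proba: (n_samples, n_labels) predicted probabilities.
    Scores each sample by its *least* confident label and queries the lowest scores."""
    # confidence per label: distance of the probability from the 0.5 boundary
    conf = np.abs(proba - 0.5)
    # a sample's score is its weakest (least confident) label
    scores = conf.min(axis=1)
    return np.argsort(scores)[:n_instances]

def avg_confidence_query(proba, n_instances=1):
    """Same idea, but scores each sample by its *average* per-label confidence."""
    scores = np.abs(proba - 0.5).mean(axis=1)
    return np.argsort(scores)[:n_instances]
```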
```python
for i in range(n_queries):
    if i == 12:
        n_instances = X_pool.shape[0]
    else:
        n_instances = batch(int(np.ceil(np.power(10, POWER))), BATCH_SIZE)
    print(f"\nQuery {i + 1}: Requesting {n_instances} samples from a pool of size {X_pool.shape[0]}")

    if X_pool.shape[0] < n_instances:
        print("Not enough samples left in the pool to query the desired number of instances.")
        break

    query_idx, _ = learner.query(X_pool, n_instances=n_instances)
    query_idx = np.unique(query_idx)
    if len(query_idx) == 0:
        print("No indices were selected, which may indicate an issue with the query function or pool.")
        continue

    # Add the newly selected samples to the cumulative training set
    cumulative_X_train.append(X_pool[query_idx])
    cumulative_y_train.append(y_pool[query_idx])

    # Concatenate all the samples to form the cumulative training data
    X_train_cumulative = np.concatenate(cumulative_X_train, axis=0)
    y_train_cumulative = np.concatenate(cumulative_y_train, axis=0)
    learner.teach(X_train_cumulative, y_train_cumulative)

    # Log the selected sample names
    selected_sample_names = train_df.loc[query_idx, "image"].tolist()
    print(f"Selected samples in Query {i + 1}: {selected_sample_names}")
    with open(samples_log_file, mode='a', newline='') as f:
        writer = csv.writer(f)
        writer.writerow([i + 1] + selected_sample_names)

    # Remove the selected samples from the pool
    X_pool = np.delete(X_pool, query_idx, axis=0)
    y_pool = np.delete(y_pool, query_idx, axis=0)

    # Evaluate the model
    y_pred = learner.predict(X_test_np)
    accuracy = accuracy_score(y_test_np, y_pred)
    f1 = f1_score(y_test_np, y_pred, average='macro')
    acc_test_data.append(accuracy)
    f1_test_data.append(f1)
    print(f"Accuracy after query {i + 1}: {accuracy}")
    print(f"F1 Score after query {i + 1}: {f1}")

    # Early stopping logic
    if f1 > best_f1_score:
        best_f1_score = f1
        wait = 0
    else:
        wait += 1
        if wait >= patience:
            print(f"Stopping early after {i + 1} queries due to no improvement in F1 score.")
            break

    total_samples += len(query_idx)
    print(f"Total samples used for training after query {i + 1}: {total_samples}")
    POWER += 0.25
    torch.cuda.empty_cache()
```
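For comparison, my random-sampling baseline replaces the `learner.query(...)` call with a uniform draw from the pool, keeping everything else in the loop identical. A minimal sketch (`random_query` is just an illustrative helper, not modAL API):

```python
import numpy as np

def random_query(pool_size, n_instances, rng=None):
    """Select n_instances distinct pool indices uniformly at random."""
    rng = np.random.default_rng(rng)
    # never request more samples than the pool holds
    return rng.choice(pool_size, size=min(n_instances, pool_size), replace=False)
```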