Transform unavailable when model was fit with only a single data sample. #2053
-
Hi! I have 2 lists of strings:

```python
main = [...]
secondary = [...]
```

I want to use the `main` list to extract topics, then use those topics as a zero-shot topic list so I can train a model on the `secondary` list. But when I pass those topics in to train a new model with zero-shot, I get this error:

```
Transform unavailable when model was fit with only a single data sample.
```

There are around 53 strings in each list. Here is my model declaration:

```python
representation_model = OpenAI(
    client=openai_client, model="gpt-3.5-turbo", delay_in_seconds=10, chat=True
)
topic_model = BERTopic(
    embedding_model=embedding_model,
    zeroshot_topic_list=zeroshot_topic_list,
    zeroshot_min_similarity=0.3,
    vectorizer_model=vectorizer_model,
    min_topic_size=2,
    nr_topics="auto",
    representation_model=representation_model,
)
topics, _ = topic_model.fit_transform(documents=texts, embeddings=np.array(embeddings))
```

What am I doing wrong, and how can I overcome this error? Thanks!
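For context, here is a minimal sketch of the two-step setup being described, assuming `main` and `secondary` are plain lists of strings; the label-extraction step and variable names are illustrative, not the asker's exact code:

```python
from bertopic import BERTopic

# Step 1: fit a regular model on the main list to discover topics.
main_model = BERTopic(min_topic_size=2)
main_model.fit_transform(main)

# Collect one label per discovered topic, skipping the -1 outlier topic.
topic_info = main_model.get_topic_info()
zeroshot_topic_list = topic_info[topic_info.Topic != -1].Name.tolist()

# Step 2: fit a zero-shot model on the secondary list using those labels.
secondary_model = BERTopic(
    zeroshot_topic_list=zeroshot_topic_list,
    zeroshot_min_similarity=0.3,
    min_topic_size=2,
)
topics, probs = secondary_model.fit_transform(secondary)
```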
-
I'm missing a bit of information to get the complete picture. Could you share your full code and your full error message? That helps me understand how certain variables are created, the order of things, etc. What version of BERTopic are you using? Lastly, how many documents are in each of your lists?
It is quite the opposite. Almost all documents in `secondary` are matched with the topics you created from `main`. What happens is that there was just a single document not matched, which was then put through the default BERTopic pipeline. See the entire process here. In practice, you could also increase the `zeroshot_min_similarity` value to make sure that there isn't one document left over but potentially multiple.
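As an illustration of that last suggestion, here is a minimal variation of the declaration from the question; the value 0.5 is only an example and would need tuning against the data:

```python
topic_model = BERTopic(
    embedding_model=embedding_model,
    zeroshot_topic_list=zeroshot_topic_list,
    zeroshot_min_similarity=0.5,  # raised from 0.3: stricter matching leaves
                                  # several unmatched documents, not exactly one
    vectorizer_model=vectorizer_model,
    min_topic_size=2,
    nr_topics="auto",
    representation_model=representation_model,
)
```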