
Bug Report: the model often starts creating repetitive sequences of tokens #220

Open
rossanodr opened this issue Jun 26, 2024 · 14 comments

Labels
component:other (Issues unrelated to examples/quickstarts) · status:awaiting response (Awaiting a response from the author) · status:stale (Issue/PR is marked for closure due to inactivity) · type:bug (Something isn't working)

Comments

@rossanodr

Description of the bug:

Summary:
When using the “gemini-1.5-flash” model for generating long texts, the model often starts creating repetitive sequences of tokens, leading to an infinite loop and exhausting the token limit. This issue is observed with both the Vertex and Gemini APIs.

Example:

```
“The judgment can be appealed in a motion for reconsideration, claiming that the judge did not consider the evidence properly. The judgment can be appealed in a motion for reconsideration, claiming that the judge did not consider the evidence properly. The judgment can be appealed in a motion for reconsideration, claiming that the judge did not consider the evidence properly. The judgment can be appealed…”
```
Steps to Reproduce:

1. Use the "gemini-1.5-flash" model via the Vertex or Gemini API.
2. Generate a long text (e.g., a legal or technical document).
3. Observe the generated output for repetition of phrases or sentences.
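
A minimal repro sketch along these lines, assuming the google-generativeai Python SDK (the file name, API key, and prompt are placeholders):

```python
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")  # placeholder key

model = genai.GenerativeModel("gemini-1.5-flash")

# A long input document (e.g., a legal or technical text) makes the
# repetition more likely, per the report above.
with open("large_document.txt") as f:  # placeholder file
    long_document = f.read()

response = model.generate_content(
    "Summarize the following document in detail:\n\n" + long_document
)
print(response.text)
```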
Expected Behavior:
The model should generate coherent and non-repetitive text.

Actual Behavior:
The model begins to repeat sequences of tokens indefinitely, leading to the maximum token limit being reached.

Impact:

- Wastes tokens and API usage limits.
- Generates unusable text, necessitating additional requests and costs.
Reproduction Rate:
Occurs frequently with long text generation tasks.

Workaround:
Currently, there is no known workaround to prevent this issue.
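
At best, callers can bound the wasted tokens client-side. A sketch, assuming the google-generativeai Python SDK's streaming interface (the repetition heuristic is an illustration, not a tested fix):

```python
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")  # placeholder key
model = genai.GenerativeModel("gemini-1.5-flash")

def generate_with_repetition_guard(prompt: str, window: int = 200) -> str:
    """Stream the response and stop once the tail repeats verbatim."""
    pieces = []
    for chunk in model.generate_content(prompt, stream=True):
        pieces.append(chunk.text)
        text = "".join(pieces)
        # If the last `window` characters already occurred earlier in the
        # output, assume the model has entered a loop and cut it off.
        if len(text) > 2 * window and text[-window:] in text[:-window]:
            break
    return "".join(pieces)
```

This does not prevent the loop; it only stops paying for it once it starts.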

Request for Resolution:

1. Investigate the cause of the repetitive token generation.
2. Implement a fix to prevent the model from entering a repetitive loop.
3. Provide a mechanism for users to request refunds or credits for tokens wasted due to this bug.

Actual vs expected behavior:

Actual: “The judgment can be appealed in a motion for reconsideration, claiming that the judge did not consider the evidence properly. The judgment can be appealed in a motion for reconsideration, claiming that the judge did not consider the evidence properly. The judgment can be appealed in a motion for reconsideration, claiming that the judge did not consider the evidence properly. The judgment can be appealed…”

Expected: “The judgment can be appealed in a motion for reconsideration, claiming that the judge did not consider the evidence properly. ”

Any other information you'd like to share?

No response

@singhniraj08
Contributor

@rossanodr,

Thank you for reporting this issue.
This repository is for issues related to Gemini API Cookbook quickstarts and examples. For issues related to the Gemini API, we suggest using the "Send Feedback" option in the Gemini docs (see the screenshot below). You can also post this issue on the Google AI forum.

[Screenshot: the "Send Feedback" option in the Gemini docs]

singhniraj08 added the type:bug, status:awaiting response, and component:other labels on Jun 27, 2024
@rossanodr
Author

> @rossanodr,
>
> Thank you for reporting this issue. This repository is for issues related to Gemini API Cookbook quickstarts and examples. For issues related to the Gemini API, we suggest using the "Send Feedback" option in the Gemini docs. You can also post this issue on the Google AI forum.

Thank you, but unfortunately I did not receive any response from either of them.


@mioruggieroguida

We are experiencing the same issue

@rossanodr
Author

> We are experiencing the same issue

I posted the same issue on the Gemini forum. It would be nice if you could make some noise there too, to bring attention to the problem: https://discuss.ai.google.dev/t/bug-report-the-model-often-starts-creating-repetitive-sequences-of-tokens/6445

@mioruggieroguida

@rossanodr Done.

Did you manage to make any progress on this?

@rossanodr
Author

> @rossanodr Done.
>
> Did you manage to make any progress on this?

No :(
Unfortunately, I think the problem is with Gemini. It is happening with many different prompts. The main issue is the large context. Let's say your prompt is something like, "Read the document below and make a list of all the birthday dates in it: {list}". If the document is large, there is a chance the model starts repeating the same date until it reaches the token limit.

@github-actions (bot)

Marking this issue as stale since it has been open for 14 days with no activity. This issue will be closed if no further activity occurs.

github-actions bot added the status:stale label on Aug 16, 2024
@bastien8060

I have this issue too.

@AmosDinh

I solved it like this (so you have to repeat yourself in order for the model not to repeat itself):

```
Please don't return any tool_code in your response and follow the DRY principle (Don't repeat yourself).
Please don't return any tool_code in your response and follow the DRY principle (Don't repeat yourself).
Please don't return any tool_code in your response and follow the DRY principle (Don't repeat yourself).
Please don't return any tool_code in your response and follow the DRY principle (Don't repeat yourself).
Please don't return any tool_code in your response and follow the DRY principle (Don't repeat yourself).
```
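
Applied via the google-generativeai Python SDK, that looks roughly like this (a sketch; the API key, task text, and repetition count are placeholders):

```python
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")  # placeholder key
model = genai.GenerativeModel("gemini-1.5-flash")

DRY_NOTE = (
    "Please don't return any tool_code in your response and "
    "follow the DRY principle (Don't repeat yourself)."
)

task = "Summarize the attached contract clause by clause."  # placeholder task

# Append the anti-repetition instruction several times, as described above.
prompt = "\n".join([task] + [DRY_NOTE] * 5)
response = model.generate_content(prompt)
print(response.text)
```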

@zxl777

zxl777 commented Sep 4, 2024

> I solved it like this (so you have to repeat yourself in order for the model not to repeat itself): Please don't return any tool_code in your response and follow the DRY principle (Don't repeat yourself). Please don't return any tool_code in your response and follow the DRY principle (Don't repeat yourself). Please don't return any tool_code in your response and follow the DRY principle (Don't repeat yourself). Please don't return any tool_code in your response and follow the DRY principle (Don't repeat yourself). Please don't return any tool_code in your response and follow the DRY principle (Don't repeat yourself).

There is some improvement, but the repetition issue still occurs. It seems like this is an unavoidable bug.
I hope other LLMs don't have this problem.

@bastien8060

bastien8060 commented Sep 4, 2024

@zxl777 Fixed it.

Basically, I asked Gemini to rephrase my prompt, and I moved the prompt from the system instruction into the actual chat/feed. Removing the data structure (and just explaining it in the prompt) also improved performance.

I also turned temperature and top-p all the way down to 0.

Everything helped to a certain extent, but leaving the system prompt empty helped the most.
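
In SDK terms, those mitigations amount to roughly the following (a sketch, assuming the google-generativeai Python SDK; the API key and prompt text are placeholders):

```python
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")  # placeholder key

# No system_instruction: the task lives in the chat contents instead,
# with the data format explained in prose rather than as a schema.
model = genai.GenerativeModel("gemini-1.5-flash")

response = model.generate_content(
    "Rephrased task prompt goes here, data format explained in prose.",
    generation_config=genai.GenerationConfig(
        temperature=0.0,  # sampling turned all the way down
        top_p=0.0,
    ),
)
print(response.text)
```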

@bastien8060

I basically realized that the system prompt is good for restricting the model or giving it guidelines (ethics, etc.), but the model is not good at following task instructions placed there.

@rossanodr
Author

> @zxl777 Fixed it.
>
> Basically, I asked Gemini to rephrase my prompt, and I moved the prompt from the system instruction into the actual chat/feed. Removing the data structure (and just explaining it in the prompt) also improved performance.
>
> I also turned temperature and top-p all the way down to 0.
>
> Everything helped to a certain extent, but leaving the system prompt empty helped the most.

Not working for me; same errors.
