
[JS] Upgrade to 0.9.3 is causing significant increase in hallucinations #1360

Open · tekkeon opened this issue Nov 21, 2024 · 15 comments
Labels: bug (Something isn't working), js

tekkeon commented Nov 21, 2024

Describe the bug
After upgrading to 0.9.3, we're noticing that the output of Gemini (we're specifically using Gemini Pro 1.5 002) has numerous problems in our system. We're getting hallucinated text and overall very strange outputs that make our application unusable. When we downgrade back to 0.9.1, everything works properly again.

We looked at the rendered prompts and noticed the only difference was that `format` was now specified as `json`.

Snippet from 0.9.1:

```
"output": {
    "jsonSchema": {
      "type": "object",
      "properties": {
...
```

Snippet from 0.9.3:

```
"output": {
    "format": "json",
    "jsonSchema": {
      "type": "object",
      "properties": {
...
```

We suspect this change to default to JSON mode may be causing the issue.

To Reproduce
Upgrade from 0.9.1 to 0.9.3 and run some prompts with complex outputs.

Expected behavior
We expected AI output to remain consistent with what our system had been producing.

Runtime (please complete the following information):

  • OS: MacOS 14.6.1

Node version

  • v18.17.1
@tekkeon tekkeon added bug Something isn't working js labels Nov 21, 2024
@pavelgj pavelgj self-assigned this Nov 21, 2024
pavelgj (Collaborator) commented Nov 21, 2024

Thanks for the report. I'm investigating.

pavelgj (Collaborator) commented Nov 21, 2024

Here's a diff of the model request between 0.5 and 0.9:

[image: diff of the model requests between 0.5 and 0.9]

The diff in instructions is negligible; however, the new (0.9) JSON format does specify `contentType` and the `constrained: true` option, which would make a difference in generation. In theory the difference should be for the better, but not necessarily.

pavelgj (Collaborator) commented Nov 21, 2024

cc @mbleigh

pavelgj (Collaborator) commented Nov 21, 2024

I have a PR: #1365. It would allow you to define custom formats and override the default behaviour changes in the JSON format. For example:

````ts
ai.defineFormat(
  {
    name: 'myJson',
    format: 'json',
  },
  (schema) => {
    let instructions: string | undefined;

    if (schema) {
      instructions = `Output should be in JSON format and conform to the following schema:

\`\`\`
${JSON.stringify(schema)}
\`\`\`
`;
    }

    return {
      parseChunk: (chunk) => {
        return extractJson(chunk.accumulatedText);
      },

      parseMessage: (message) => {
        return extractJson(message.text);
      },

      instructions,
    };
  }
);

const MenuItemSchema = z.object({
  name: z.string(),
  description: z.string(),
  calories: z.number(),
  allergens: z.array(z.string()),
});

export const menuSuggestionFlow = ai.defineFlow(
  {
    name: 'menuSuggestionFlow',
    outputSchema: MenuItemSchema.nullable(),
  },
  async () => {
    const response = await ai.generate({
      prompt: 'Invent a menu item for a pirate themed restaurant.',
      output: { format: 'myJson', schema: MenuItemSchema },
    });

    return response.output;
  }
);
````

The diff between 0.5 and this json formatter is negligible:
[image: diff between 0.5 and the custom json formatter]

i2amsam (Contributor) commented Nov 21, 2024

This is an interesting path, but I think the reported difference was between 0.9.1 and 0.9.3; were there any differences there? Anecdotally, I saw the same thing: after successfully conforming to the output for a long time, I saw non-conforming generations in my testing yesterday.

pavelgj (Collaborator) commented Nov 21, 2024

> this is an interesting path, I think the reported difference was between 0.9.1 and 0.9.3, were there any differences there? Anecdotally I saw the same thing, after successfully conforming to the output for a long time I saw non-conforming generations in my testing yesterday

d'oh... I should pay better attention... let me diff 0.9.1->0.9.3 real quick....

pavelgj (Collaborator) commented Nov 21, 2024

Interesting, the diff is only in format:
[image: diff between 0.9.1 and 0.9.3]

However, the Gemini JSON-mode condition is `format === 'json'` OR `contentType === 'application/json'`, so it should not have made any difference between 0.9.1 and 0.9.3:

const jsonMode =
pavelgj (Collaborator) commented Nov 21, 2024

Yeah, verified that there's no diff between 0.9.1 -> 0.9.3 at the Gemini model call level.

i2amsam (Contributor) commented Nov 21, 2024

I'm not sure that it's the same issue, but I isolated the changes I noticed down to a repeatable case in #1368. Filed as a separate issue so as not to derail this thread.

@tekkeon are you able to provide any more context for your snippets? I think an example with

```ts
config: {
  temperature: 0,
}
```

on the generation showing hallucination / strange output would be very helpful.

If you're using the Developer UI you can export a trace from a bad generation under the trace details tab which would also help us debug if you have a sharable trace.
[image: trace details tab in the Developer UI]

tekkeon (Author) commented Nov 21, 2024

Thanks for the quick replies here.

@pavelgj @i2amsam I've taken a screenshot of the diff of the redacted rendered prompts. I'll take a look at the traces as well, though I suspect there will be at least some IP in there that I wouldn't want shared publicly. I can share it with you privately, though, if it will help.

[image: diff of the redacted rendered prompts]

tekkeon (Author) commented Nov 21, 2024

> Interesting, the diff is only in format: [image]
>
> however, gemini json mode condition is format==='json' OR contentType==='application/json', so it should not have made any difference between 0.9.1 and 0.9.3
>
> const jsonMode =

@pavelgj Not sure if it's relevant, but the lines you referred to were changed two weeks ago, adding the `|| contentType === 'application/json'` logic: d3c0dbe#diff-321672f29eb53cc56ab2871e37a98e016269ab4a0efa7445cbc3af677ff13938R586-R587

i2amsam (Contributor) commented Nov 22, 2024

Hmmmm, I'm a little out of my wheelhouse here, but @pavelgj wouldn't we expect to have `constrained: true` in tekkeon's example here? It's in yours, and in the working 0.9.3 example I have. @tekkeon is this an `ai.generate` call or a `prompt()` call? Could you give us the outline of how you're constructing the generate call?

i2amsam (Contributor) commented Nov 22, 2024

Oh, and @tekkeon are you using the Vertex version or the Gemini API version?

tekkeon (Author) commented Nov 22, 2024

Oh, great question: we're using the Vertex version.

mbleigh (Collaborator) commented Nov 22, 2024

A change in 0.9 is that Gemini models now use constrained generation by default (that's the `constrained: true` you're seeing). It's possible that constrained generation is affecting output in a way that increases hallucination.

Can you try adding `{output: {constrained: false}}` to your generate call? This should disable constrained generation and may revert the behavior back to the previous one.
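Applied to a flow like the earlier example in this thread, the opt-out might look like the following (a configuration sketch assuming the Genkit 0.9 `ai.generate` API, where `constrained` sits alongside `schema` in the output options; not runnable standalone):

```typescript
const response = await ai.generate({
  prompt: 'Invent a menu item for a pirate themed restaurant.',
  output: {
    schema: MenuItemSchema,
    constrained: false, // opt out of native constrained generation (0.9 default)
  },
});
```

With `constrained: false`, the schema still drives the format instructions in the prompt, but the model is no longer forced into the provider's native constrained-decoding mode, matching pre-0.9 behavior.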
