
[JS] Upgrade to 0.9.3 is causing significant increase in hallucinations #1360

Open · tekkeon opened this issue Nov 21, 2024 · 15 comments
Labels: bug (Something isn't working), js

tekkeon commented Nov 21, 2024

Describe the bug
After upgrading to 0.9.3, we're noticing that the output of Gemini (we're specifically using Gemini Pro 1.5 002) has numerous problems in our system. We're getting hallucinated text and overall very strange outputs that make our application unusable. When we downgrade back to 0.9.1, everything works properly again.

We looked at the rendered prompts and noticed the only difference was that `format` was now specified as `json`.

Snippet from 0.9.1:

```
"output": {
    "jsonSchema": {
      "type": "object",
      "properties": {
...
```

Snippet from 0.9.3:

```
"output": {
    "format": "json",
    "jsonSchema": {
      "type": "object",
      "properties": {
...
```

We suspect this change to default to JSON mode may be causing the issue.

To Reproduce
Upgrade from 0.9.1 to 0.9.3 and run some prompts with complex outputs.

Expected behavior
We expected AI output to remain consistent with what our system had been producing.

Runtime (please complete the following information):

  • OS: MacOS 14.6.1

Node version

  • v18.17.1
@tekkeon tekkeon added bug Something isn't working js labels Nov 21, 2024
@pavelgj pavelgj self-assigned this Nov 21, 2024
pavelgj (Collaborator) commented Nov 21, 2024

Thanks for the report. I'm investigating.

pavelgj (Collaborator) commented Nov 21, 2024

Here's a diff of the model request between 0.5 and 0.9:

[image: diff of the model requests between 0.5 and 0.9]

The diff in instructions is negligible; however, the new (0.9) JSON format does specify `contentType` and the `constrained: true` option, which would make a difference in generation. In theory the difference should be for the better, but not necessarily.

pavelgj (Collaborator) commented Nov 21, 2024

cc @mbleigh

pavelgj (Collaborator) commented Nov 21, 2024

I have a PR: #1365. It would allow you to define custom formats and override the default behaviour changes in the JSON format. For example:

````ts
ai.defineFormat(
  {
    name: 'myJson',
    format: 'json',
  },
  (schema) => {
    let instructions: string | undefined;

    if (schema) {
      instructions = `Output should be in JSON format and conform to the following schema:

\`\`\`
${JSON.stringify(schema)}
\`\`\`
`;
    }

    return {
      parseChunk: (chunk) => {
        return extractJson(chunk.accumulatedText);
      },

      parseMessage: (message) => {
        return extractJson(message.text);
      },

      instructions,
    };
  }
);

const MenuItemSchema = z.object({
  name: z.string(),
  description: z.string(),
  calories: z.number(),
  allergens: z.array(z.string()),
});

export const menuSuggestionFlow = ai.defineFlow(
  {
    name: 'menuSuggestionFlow',
    outputSchema: MenuItemSchema.nullable(),
  },
  async () => {
    const response = await ai.generate({
      prompt: 'Invent a menu item for a pirate themed restaurant.',
      output: { format: 'myJson', schema: MenuItemSchema },
    });

    return response.output;
  }
);
````

The diff between 0.5 and this json formatter is negligible:
[image: diff between 0.5 and the custom json formatter]

i2amsam (Contributor) commented Nov 21, 2024

This is an interesting path, but I think the reported difference was between 0.9.1 and 0.9.3; were there any differences there? Anecdotally, I saw the same thing: after successfully conforming to the output for a long time, I saw non-conforming generations in my testing yesterday.

pavelgj (Collaborator) commented Nov 21, 2024

> this is an interesting path, I think the reported difference was between 0.9.1 and 0.9.3, were there any differences there? Anecdotally I saw the same thing, after successfully conforming to the output for a long time I saw non-conforming generations in my testing yesterday

d'oh... I should pay better attention... let me diff 0.9.1->0.9.3 real quick....

pavelgj (Collaborator) commented Nov 21, 2024

Interesting, the diff is only in format:
[image: diff between 0.9.1 and 0.9.3]

However, the Gemini JSON-mode condition is `format === 'json'` OR `contentType === 'application/json'`, so it should not have made any difference between 0.9.1 and 0.9.3:

const jsonMode =
pavelgj (Collaborator) commented Nov 21, 2024

Yeah, verified that there's no diff between 0.9.1 -> 0.9.3 at the Gemini model call level.

i2amsam (Contributor) commented Nov 21, 2024

I'm not sure that it's the same issue, but I isolated the changes I noticed down to a repeatable case in #1368. Filed as a separate issue so as not to derail this thread.

@tekkeon are you able to provide any more context for your snippets? I think an example with

```ts
config: {
  temperature: 0,
}
```

on the generation showing hallucination / strange output would be very helpful.

If you're using the Developer UI you can export a trace from a bad generation under the trace details tab which would also help us debug if you have a sharable trace.
[image: trace details tab in the Developer UI]

tekkeon (Author) commented Nov 21, 2024

Thanks for the quick replies here.

@pavelgj @i2amsam I've taken a screenshot of the diff of the redacted rendered prompts. I'll take a look at the traces as well, though I suspect there will be at least some IP in there that I wouldn't want shared publicly. I can share it with you privately, though, if it will help.

[image: diff of the redacted rendered prompts]

tekkeon (Author) commented Nov 21, 2024

> Interesting, the diff is only in format: [image]
>
> however, gemini json mode condition is format==='json' OR contentType==='application/json', so it should not have made any difference between 0.9.1 and 0.9.3
>
> const jsonMode =

@pavelgj Not sure if it's relevant, but the lines you referred to were changed two weeks ago, adding the `|| contentType === 'application/json'` logic: d3c0dbe#diff-321672f29eb53cc56ab2871e37a98e016269ab4a0efa7445cbc3af677ff13938R586-R587

i2amsam (Contributor) commented Nov 22, 2024

Hmmmm, I'm a little out of my wheelhouse here, but @pavelgj wouldn't we expect to have `constrained: true` in tekkeon's example here? It's in yours, and in the working 0.9.3 example I have. @tekkeon is this an `ai.generate` call or a `prompt()` call? Could you give us the outline of how you're constructing the generate call?

i2amsam (Contributor) commented Nov 22, 2024

Oh, and @tekkeon are you using the Vertex version or the Gemini API version?

tekkeon (Author) commented Nov 22, 2024

Oh, great question: we're using the Vertex version.

mbleigh (Collaborator) commented Nov 22, 2024

A change in 0.9 is that Gemini models now use constrained generation by default (that's the `constrained: true` you're seeing). It's possible that constrained generation is affecting output in a way that increases hallucination.

Can you try adding `{output: {constrained: false}}` to your generate call? This should disable constrained generation and may revert the behavior back to the previous one.
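Applied to a flow like the earlier example in this thread, the opt-out might look like the following (a configuration sketch assuming the Genkit 0.9 `ai.generate` API, where `constrained` sits alongside `schema` in the output options; not runnable standalone):

```typescript
const response = await ai.generate({
  prompt: 'Invent a menu item for a pirate themed restaurant.',
  output: {
    schema: MenuItemSchema,
    constrained: false, // opt out of native constrained generation (0.9 default)
  },
});
```

With `constrained: false`, the schema still drives the format instructions in the prompt, but the model is no longer forced into the provider's native constrained-decoding mode, matching pre-0.9 behavior.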
