Why were metrics of "helpfulness" prioritised over "correctness", and what will MDN / Mozilla do to prevent further statistical misinformation? #412
Replies: 4 comments 7 replies
Why isn't anyone considering the probability that an LLM promotion brigade is trying to game the metrics? Or perhaps people who don't have Mozilla's users' best interests at heart, and who might want to push Mozilla into spending resources in an ultimately unhelpful direction? Does Mozilla have linkage between the question asked, the answer given, and the rating given? Are the questions that receive the "helpful" rating incredibly simple ones?

The very poll is designed in such a way as to generate bad statistics, since it doesn't track outcomes. "This sounds helpful, so I'll mark it as such, but ultimately I can't get it to work, so it is actually worse than unhelpful" is a path I've traversed countless times on the internet. The number of people who mark something "helpful" is a non-statistic -- it's just the number of people you can nudge along the apparent happy path without their realising it isn't necessarily a truly happy path.

Does Mozilla have any actual statisticians or poll designers on this? Or are people stepping outside their realms of competence to do something which seems appropriate?
@LeoMcA @caugner @Rumyra will MDN be addressing the inconsistencies and contradictions in Leo's first set of answers, which I raised in my reply, or answering my 5th and 6th questions about statistical misinformation from my original post? Just to clarify: when I ask about "statistical misinformation" I'm not referring to helpfulness vs. correctness, or even to the intentional exclusion of GitHub feedback in Steve's blog post, but to the naming and usage of the actual data collected, as in these paragraphs from my original post:
The blog post by Steve Teixeira, published following initial pushback on these "AI" features, shared some "helpfulness" statistics via screenshot and referenced them in the text of the post. This metric, of the output being "helpful", was also used by Claas Augner (caugner) and Florian Dieminger (fiji-flo) in their responses to the issues opened here on GitHub.
These prominent uses, in place of any other metric, imply "helpful" is a major, if not the primary, metric for those involved in `AI Help` and `AI Explain` at MDN / Mozilla. To quote obfusk on issue #9230:
Importantly, this framing and showcasing of the "helpful" statistics is misleading to the point I imagine many people would deem it a lie. Even just the column names "Positive Feedback %" and "Negative Feedback %" could easily induce notable bias; they read as though they show how much positive and negative feedback has been received, when in actual fact they show how much of the feedback received was positive or negative. These are two very different concepts, but easily misinterpreted - especially if those data points are shared without the rest of the data for context.
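To make the two readings of "Positive Feedback %" concrete, here is a small Python sketch using the figures quoted in this thread. One number I have to infer: the dislike count of roughly 500 is back-calculated from the quoted "1.03% indicating the result was 'not helpful'" against 48,380 total uses, since the screenshot itself isn't reproduced here.

```python
# Figures quoted in the discussion; the 500-dislike count is INFERRED
# from "1.03% of 48,380 uses" (0.0103 * 48,380 ≈ 500), not read from
# the screenshot directly.
total_uses = 48_380   # combined AI Explain + AI Help usages
likes = 1_146
dislikes = 500

feedback_total = likes + dislikes

# Reading 1: "Positive Feedback %" as a share of feedback received.
share_of_feedback = likes / feedback_total * 100   # ≈ 69.6%

# Reading 2: the same likes as a share of ALL usages.
share_of_uses = likes / total_uses * 100           # ≈ 2.37%

print(f"{share_of_feedback:.1f}% of submitted feedback was positive,")
print(f"but only {share_of_uses:.2f}% of all uses produced positive feedback")
```

The same raw likes count yields roughly 69.6% under the first reading and 2.37% under the second, which is exactly the gap between "how much of the feedback was positive" and "how much positive feedback was received".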
In terms of the actual data, it clearly shows over 24,000 unique users and over 44,000 clicks on `AI Explain`, but only 1,017 likes. Of people who submitted feedback, yes, a notable majority liked the "AI" output of these tools. However, 1,146 likes across 48,380 usages of `AI Explain` and `AI Help` combined - only 2.37% positive feedback - is a tiny amount of support for these features; fewer than 3.5% of total uses of both features resulted in any feedback whatsoever, including 1.03% indicating the result was "not helpful".

Are Steve / MDN / Mozilla counting non-answers as positive feedback? That is the only way to explain the apparent fixation on this paltry number as some sort of holy grail of support to justify these "AI" tools' continued existence on MDN, but it would be utterly ridiculous.
This data, held up by Steve, objectively does not scream "those who have tried the features to find answers tend to be happy with the results" - barely any responses, and nearly a third of them negative. Yes, among those who responded, opinion is notably in favour, but a reported one-in-three failure rate is not good enough for technical documentation, which must aim to be correct and accurate above all other goals or it fails its only purpose: informing and educating people about technical concepts and their implementation.
Further, this apparent love for statistics presented by Steve, however misleading, also completely fails to include the feedback statistics of the GitHub issues for these two tools. At the time of writing, the original post for issue #9208 has 1,287 likes against "AI" to 4 dislikes in favour (99.7% against) and issue #9230's OP has 147 likes against "AI" and 0 dislikes in favour. Adding these numbers from the GitHub issues into the Likes / Dislikes data from Steve's screenshot provides 1,150 in support and 1,934 against - only 37.29% positive overall!
A single issue on GitHub has more votes against these tools than MDN / Mozilla have in support of them - 1,287 vs. 1,146.
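The combined tally above can be checked with a few lines of Python. As before, the ~500 MDN dislikes are an inference from the quoted 1.03% of 48,380 uses; the GitHub reaction counts are as stated in the post at its time of writing.

```python
# Rough check of the combined support/against tallies quoted above.
# mdn_dislikes is INFERRED from "1.03% of 48,380 uses", not a
# first-hand figure.
mdn_likes, mdn_dislikes = 1_146, 500
gh_9208_against, gh_9208_for = 1_287, 4   # issue #9208 OP reactions
gh_9230_against, gh_9230_for = 147, 0     # issue #9230 OP reactions

support = mdn_likes + gh_9208_for + gh_9230_for              # 1,150
against = mdn_dislikes + gh_9208_against + gh_9230_against   # 1,934

positive_pct = support / (support + against) * 100
print(f"{support} in support vs {against} against: "
      f"{positive_pct:.2f}% positive overall")
```

Under these assumptions the totals come out to 1,150 in support and 1,934 against, i.e. roughly 37.29% positive overall, matching the figure quoted in the post.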
My questions are as follows: … `AI Explain` and `AI Help`?