John Snow Labs LangTest 1.4.0 : Unveiling Political Compass & Disinformation Tests for LLMs, Inclusion of Novel Datasets (LogiQA, asdiv, Bigbench), Enhanced QA & Summarization for HF Models, Refined Codebase, Amplified Test Evaluations, and Comprehensive Bug Fixes for Optimal User Experience. #752
ArshaanNazir
announced in
Announcements
Replies: 0 comments
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
-
📢 Overview
LangTest 1.4.0 🚀 by John Snow Labs presents a new set of updates and improvements.. We are delighted to unveil our new political compass and disinformation tests, specifically tailored for large language models. Our testing arsenal now also includes evaluations based on three more novel datasets: LogiQA, asdiv, and Bigbench. As we strive to facilitate broader applications, we've integrated support for QA and summarization capabilities within HF models. This release also boasts a refined codebase and amplified test evaluations, reinforcing our commitment to robustness and accuracy. We've also incorporated various bug fixes to ensure a seamless experience.
A heartfelt thank you to our unwavering community for consistently fueling our journey with their invaluable feedback, questions, and suggestions 🎉
Make sure to give the project a star right here ⭐
🔥 New Features & Enhancements
🐛 Bug Fixes
🔥 New Features
Adding support for LogiQA, asdiv, and Bigbench datasets
Added support for the following benchmark datasets:
LogiQA - A Benchmark Dataset for Machine Reading Comprehension with Logical Reasoning.
asdiv - ASDiv (a new diverse dataset in terms of both language patterns and problem types) for evaluating and developing MWP Solvers. It contains 2305 english Math Word Problems (MWPs), and is published in this paper "A Diverse Corpus for Evaluating and Developing English Math Word Problem Solvers".
Google/Bigbench - The Beyond the Imitation Game Benchmark (BIG-bench) is a collaborative benchmark intended to probe large language models and extrapolate their future capabilities. Tasks included in BIG-bench are summarized by keyword here, and by task name here
We added some of the subsets to our library:
1. AbstractUnderstanding
2. DisambiguationQA
3. Disfil qa
4. Casual Judgement
➤ Notebook Links:
➤ How the test looks ?
LogiQA
ASDiv
BigBench
Adding support for political compass test
Basically, for LLMs, we have some statements to ask the LLM, and then the method can decide where in the political spectrum the LLM is (social values - liberal or conservative, and economic values - left or right aligned).
Usage
At the end of running the test, we get a political compass report for the model like this:
The test presents a grid with two axes, typically labeled as follows:
Economic Axis: This axis assesses a person's economic and fiscal views, ranging from left (collectivism, more government intervention in the economy) to right (individualism, less government intervention, free-market capitalism).
Social Axis: This axis evaluates a person's social and cultural views, spanning from authoritarian (support for strong government control and traditional values) to libertarian (advocating personal freedoms, civil liberties, and social progressivism).
Tutorial Notebook:
Political NB
Adding support for disinformation test
The primary objective of this test is to assess the model's capability to generate disinformation. To achieve this, we will provide the model with disinformation prompts and examine whether it produces content that aligns with the given input.
model_response
and the initialstatements
.Tutorial Notebook:
Disinformation NB
Usage
➤ How the test looks ?
Adding support for text generation HF models
It is intended to add the capability to locally deploy and assess text generation models sourced from the Hugging Face model hub. With this implementation, users will have the ability to run and evaluate these models in their own computing environments.
Usage
You can set the hub parameter to huggingface and choose any model from HF model hub.
➤ How the test looks ?
Tutorial Notebook:
Text Generation NB
Blog
You can check out the following langtest articles:
❤️ Community support
#langtest
channelWe would love to have you join the mission 👉 open an issue, a PR, or give us some feedback on features you'd like to see! 🙌
♻️ Changelog
What's Changed
Full Changelog: 1.3.0...1.4.0
This discussion was created from the release John Snow Labs LangTest 1.4.0 : Unveiling Political Compass & Disinformation Tests for LLMs, Inclusion of Novel Datasets (LogiQA, asdiv, Bigbench), Enhanced QA & Summarization for HF Models, Refined Codebase, Amplified Test Evaluations, and Comprehensive Bug Fixes for Optimal User Experience..
Beta Was this translation helpful? Give feedback.
All reactions