MathViz-E is an agent-based system which provides voice-driven automated math graphing. The system consists of three interacting LLM-based agents:
- An agent that interacts with a math solver, currently WolframAlpha
- An agent that interacts with a visual calculator, currently Desmos Graphical Calculator
- An orchestrator agent which interacts with the user interface, and the solver and calculator agents
The agents complement each other. The solver agent can perform computations that are too complex for the LLM (currently `gpt-4-turbo-preview`) to solve reliably, such as computing the tangent to a circle passing through a given point. The orchestrator, in turn, can reformulate queries into a form that the solver understands. Finally, the visual calculator provides multimodal graphing capabilities. The combination allows the system to produce accurate graphs even for difficult, multi-step problems requiring complex reasoning.
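The division of labor described above can be sketched in a few lines of Python. This is only an illustration: the class and method names (`SolverAgent`, `CalculatorAgent`, `Orchestrator`, `handle`, and so on) are hypothetical and not taken from the MathViz-E code, and the LLM, WolframAlpha, and Desmos calls are replaced by stubs.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class SolverAgent:
    """Stand-in for the agent that talks to the math solver (WolframAlpha)."""
    solve_fn: Callable[[str], str]  # hypothetical hook; a real agent would call the solver API

    def solve(self, query: str) -> str:
        return self.solve_fn(query)


@dataclass
class CalculatorAgent:
    """Stand-in for the agent that produces calculator (Desmos) expressions."""

    def to_expressions(self, request: str, solution: str) -> List[str]:
        # A real agent would prompt the LLM; here the solver result is passed through.
        return [solution]


@dataclass
class Orchestrator:
    solver: SolverAgent
    calculator: CalculatorAgent

    def reformulate(self, utterance: str) -> str:
        # Placeholder for the LLM step that rewrites the request for the solver.
        return utterance.strip().rstrip(".")

    def handle(self, utterance: str) -> List[str]:
        query = self.reformulate(utterance)
        solution = self.solver.solve(query)
        return self.calculator.to_expressions(utterance, solution)


# Usage with a canned solver response instead of a real WolframAlpha call.
canned = {"tangent to x^2 + y^2 = 25 at (3, 4)": "y = -\\frac{3}{4}x + \\frac{25}{4}"}
orchestrator = Orchestrator(SolverAgent(lambda q: canned[q]), CalculatorAgent())
expressions = orchestrator.handle("tangent to x^2 + y^2 = 25 at (3, 4).")
print(expressions)
```

The orchestrator calls the other two agents through ordinary member functions, which keeps each agent replaceable by a stub for testing.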
An immediate use-case is in-classroom math pedagogy for K-12. Visual graphing is an integral part of math education, allowing students to visualize and interact with mathematical concepts. Visual calculators are used to teach several Common Core learning objectives. Pedagogical agents like MathViz-E hold the promise of untethering teachers by enabling natural language-based operation of classroom tools, enabling teachers to spend more time teaching and interacting with students.
Beyond the immediate pedagogical use-case, MathViz-E is an example of the broader class of AI agent systems that are poised to fundamentally transform AI's impact on our everyday lives, and how we interact with technology. To learn more, check out our blog posts.
- Rename `src/env.dist` to `src/.env`.
- In `src/.env`, add your `OPENAI_API_KEY` and `WOLFRAM_APP_ID`.
- In `src/.env`, add your `DESMOS_API_KEY`. As an example, you can use the public key here.
- In `src/.env`, set `USE_EXPRESSION_VALIDATION` to `True` or `False`.
  - If `True`, the agent uses a self-critique validation step to fix common errors (generally related to Desmos API requirements, e.g. using `le` instead of `leq` for inequalities). This produces more accurate outputs, but at the expense of significant latency.
  - Set it to `False` if you prefer quicker responses, at the expense of occasional errors that can be fixed manually in the Desmos calculator.
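To make the `le` versus `leq` example concrete, here is a minimal sketch of a rule-based fix that needs no LLM call. It assumes the expressions are LaTeX strings in which the operators appear as `\leq` and `\le`; the function name and the rewrite table are hypothetical, and the `\geq` to `\ge` rule is an assumed analogue that the text above does not mention.

```python
import re

# Hypothetical rewrite table. The leq -> le rule comes from the setup notes;
# the geq -> ge rule is an assumption by analogy.
# The lookahead keeps longer commands such as \leqslant untouched.
_REWRITES = {
    r"\\leq(?![A-Za-z])": r"\\le",
    r"\\geq(?![A-Za-z])": r"\\ge",
}


def normalize_expression(latex: str) -> str:
    """Apply cheap rule-based fixes to a LaTeX expression string."""
    for pattern, replacement in _REWRITES.items():
        latex = re.sub(pattern, replacement, latex)
    return latex


print(normalize_expression(r"y \leq x^2"))  # prints: y \le x^2
```

Rules like these are fast but only catch errors that were anticipated in advance, which is the trade-off against the slower self-critique step.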
- Start Docker, e.g., via Docker Desktop.
- In `src/.env`, set the `USE_DOCKER` flag to `True`.
- On Linux/Mac: In the project root folder, run `source run.sh`.
- On Windows: In the project root folder, run:
docker build -t my_desmos .
docker run -p 5001:5001 my_desmos
- Create a virtual environment with Python 3.11. For example, if you're using Anaconda:
conda create -n math_agent python=3.11 pip
conda activate math_agent
- Install requirements inside the environment
pip install -r requirements.txt
- In `src/.env`, set the `USE_DOCKER` flag to `False`.
- In the root folder, run `python -m src.main`.
- In `src/.env`, set the `USE_DOCKER` flag to `False`.
- Install the Python virtual environment package
pip install virtualenv
- Go to the project directory and create the virtual environment
cd <PROJECT_ROOT_DIR>
python3.11 -m venv env
- Activate the virtual environment
source env/bin/activate
- Install requirements inside the environment
pip install -r requirements.txt
- Run the application
python -m src.main
In your browser, go to http://localhost:5001/desmos/. Tested on Chrome and Firefox; use Chrome to enable the voice interface.
These are some demonstrations of multi-turn interactions with MathViz-E. The system can be used for several other types of mathematical concepts; see the test cases in `\tests\` for many other examples of the categories of problems that MathViz-E can solve.
To run the system on a wide variety of single- and multi-turn examples, run the provided test scripts in the root folder on the examples in the `test` directory:
# single-turn tests
source run_single_turn_tests.sh
# multi-turn tests
source run_multi_turn_tests.sh
Note that this takes a long time and requires many calls to the LLM (`gpt-4-turbo-preview`). Also note that accuracy is significantly higher if you set `USE_EXPRESSION_VALIDATION` to `True` in `src/.env`.
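Because a test run is long and makes many LLM calls, it may be worth checking the settings in `src/.env` before starting. The sketch below is a hypothetical helper, not part of the repository. It assumes the file uses plain `KEY=VALUE` lines and that the flags must be spelled exactly `True` or `False`; neither assumption is confirmed by the project.

```python
from typing import Dict, List

REQUIRED_KEYS = ["OPENAI_API_KEY", "WOLFRAM_APP_ID", "DESMOS_API_KEY"]
BOOLEAN_FLAGS = ["USE_EXPRESSION_VALIDATION", "USE_DOCKER"]


def parse_env(text: str) -> Dict[str, str]:
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip("'\"")
    return values


def find_problems(values: Dict[str, str]) -> List[str]:
    """Report missing keys and flags that are not exactly True or False."""
    problems = [f"{k} is missing or empty" for k in REQUIRED_KEYS if not values.get(k)]
    for flag in BOOLEAN_FLAGS:
        if values.get(flag) not in ("True", "False"):
            problems.append(f"{flag} should be True or False")
    return problems


# Example input; for a real check, read the file instead, e.g.
# pathlib.Path("src/.env").read_text().
sample = """
OPENAI_API_KEY=sk-example
WOLFRAM_APP_ID=
DESMOS_API_KEY=example-key
USE_EXPRESSION_VALIDATION=True
USE_DOCKER=false
"""
problems = find_problems(parse_env(sample))
for problem in problems:
    print(problem)
```

An empty result means all three keys are set and both flags hold one of the two expected values.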
The test scripts produce output CSV files with Desmos expressions, by default in the `test/output` folder. Evaluating the correctness of these expressions is currently manual. We're in the process of adding an autoevaluator that uses the LLM and SymPy to automate evaluation.
Known issues and todos:
- Support for polygons and non-parametric shapes: MathViz-E currently relies on the Desmos `setExpressions` API, which does not handle non-parametric shapes. Support for polygons etc. requires the use of other Desmos APIs like `polygon()`.
- Unhandled classes of problems and Wolfram solver output: Other classes of problems that the system does poorly at include tangents to parabolas, and labeling certain types of intersections. A common theme in these errors is that Wolfram's output varies across classes of problems; we currently take Wolfram's output from the "Step-by-step solution" pod, but for some categories of problems this is incorrect. Wolfram output interpretation needs to be made more robust.
- Graph retention over multi-turn interaction: Selecting which graphs to retain and which to remove, over the turns of a multi-turn interaction is non-trivial and needs to be made more robust. For example, "Move the shape to x,y" should probably remove the previously graphed shape, while "Plot the asymptote to the function" should probably retain the previously graphed function.
- Natural language interfaces for the solver and calculator agents: Currently the orchestrator invokes the solver and calculator agents through member functions. We'd like to move to invocation via natural language interfaces.
- Transition to Autogen or an equivalent framework: Related to the above (natural language interfaces), we plan to transition the system to a multi-agent framework like Autogen.
- Autoevaluator: Currently, accuracy evaluation of the system is done manually. An AI-critique based autoevaluator would significantly simplify this; we're looking at a SymPy + LLM based autoevaluator.
- Faster self-critique: The current self-critique (when `USE_EXPRESSION_VALIDATION` is set to `True`) is slow and needs to be made more efficient. Also, the alternative rule-based post-processing is currently rather limited. One possibility is to fold the validation checks into the initial query prompt, to reduce the number of LLM invocations; another is to prompt the LLM to generate shorter responses, perhaps by limiting CoT.
- Latency and fine-tuned LLMs: The system uses GPT-4, which can be high-latency and expensive. Fine-tuning smaller open-source LLMs like Mistral or Llama would mitigate this.
- Speech recognition issues: The system uses the Web Speech API (documented on MDN) for speech recognition. This has known issues with certain graphing utterances; for example, it often mis-transcribes words like "comma" and "cubed".
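To illustrate why graph retention over multi-turn interaction is hard, here is a deliberately naive keyword baseline. It is not how MathViz-E decides what to keep. The function name, the cue words, and the id-to-LaTeX dictionaries are all hypothetical.

```python
from typing import Dict

# Hypothetical cue words; a real system would more likely ask the LLM to decide.
REPLACE_CUES = ("move", "shift", "rotate", "replace")


def update_graphs(current: Dict[str, str], utterance: str, new: Dict[str, str]) -> Dict[str, str]:
    """Return the expressions to show after this turn.

    `current` and `new` map expression ids to LaTeX strings.
    """
    words = utterance.lower().split()
    if any(cue in words for cue in REPLACE_CUES):
        # Transforming a shape: drop everything shown so far and keep only the result.
        # This is a simplification; it also removes graphs unrelated to the moved shape.
        return dict(new)
    # Otherwise treat the request as additive and keep the earlier graphs.
    return {**current, **new}


moved = update_graphs({"c": "x^2+y^2=1"}, "Move the shape to 2,3", {"c2": "(x-2)^2+(y-3)^2=1"})
kept = update_graphs({"f": "y=\\frac{1}{x}"}, "Plot the asymptote to the function", {"a": "y=0"})
print(sorted(moved), sorted(kept))
```

A baseline like this breaks as soon as a request is phrased without one of the cue words, or when only one of several graphed shapes should be replaced, which is why the selection logic needs to be made more robust.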
Thank you for your interest in contributing! We welcome involvement from the community.
Please visit our contributing guidelines for more details on how to get involved.
The UI uses icons and product names for the solver and calculator. This is for demonstration only, and all trademarks belong to the original owners.