Spider 2.0 is an evaluation framework comprising 632 real-world text-to-SQL tasks drawn from enterprise databases. These databases, often with over 1,000 columns, are hosted on cloud or local systems such as BigQuery, Snowflake, and PostgreSQL.
Solving these tasks requires models to understand database metadata, SQL dialects, and project code; to navigate complex SQL environments; and to handle long contexts. Models must also perform advanced reasoning and generate diverse SQL queries, sometimes exceeding 100 lines, which goes well beyond traditional text-to-SQL challenges.
For Spider 2.0, all evaluation examples are aggregated in the file `spider2.jsonl`, where each data point contains the following fields:
```json
{
    "instance_id": "3a348be1-aed2-44fb-8185-c66c9d14a6ef",
    "instruction": "Please tell me the number of sessions for each website traffic channel in December 2020.",
    "type": "Bigquery"
}
```
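As a minimal sketch (not part of the released tooling), the tasks can be loaded with the Python standard library, assuming `spider2.jsonl` follows the JSON Lines convention of one object per line with the fields shown above. The names `parse_tasks`, `load_tasks`, and `count_by_type` are hypothetical.

```python
import json
from collections import Counter
from pathlib import Path
from typing import Iterable


def parse_tasks(lines: Iterable[str]) -> list[dict]:
    """Parse JSON Lines: one task object per non-empty line."""
    tasks = []
    for line in lines:
        line = line.strip()
        if line:  # tolerate blank lines such as a trailing newline
            tasks.append(json.loads(line))
    return tasks


def load_tasks(path: str = "spider2.jsonl") -> list[dict]:
    """Read all tasks from a JSON Lines file on disk (hypothetical helper)."""
    with Path(path).open(encoding="utf-8") as f:
        return parse_tasks(f)


def count_by_type(tasks: list[dict]) -> Counter:
    """Count tasks per database type, keyed by the "type" field."""
    return Counter(task["type"] for task in tasks)
```

For example, `count_by_type(load_tasks("spider2.jsonl"))` returns a `Counter` keyed by `type` values such as `"Bigquery"`. Keeping parsing separate from file reading makes `parse_tasks` easy to check on in-memory lines.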
For each instance, we also provide a separate folder `./examples/{instance_id}` as its Execution Context to simulate the agentic setting. Each folder may contain the following files:
- `README.md`: detailed requirements of the `instruction` field for the current example with `instance_id`;
- `*_credential.json`: credential file for connecting to realistic enterprise-level databases, e.g., BigQuery. It can be replaced with your OWN;
- `result.csv`: CSV file to store the execution results;
- other instance-specific materials that assist in finishing the current task:
  - 🏗️ partial project, e.g., `dbt_project/`.
  - 📝 reference documentation: `ga4_dimensions_and_metrics.md`, `retention_rate.md`, etc.
  - 🔍 query interface: we have predefined how to access the diverse database systems.
  - 🎞️ query history or samples, e.g., `QUERY_HISTORY/`, etc.
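The folder layout above can be explored programmatically. The following is a hypothetical Python helper, assuming each instance folder sits at `./examples/<instance_id>` and uses the file names listed here; it only sorts the top-level entries by role and does not connect to any database.

```python
from pathlib import Path


def describe_context(instance_dir: str) -> dict:
    """Group the top-level entries of one instance folder by role.

    Hypothetical helper: the roles mirror the file list in this README.
    """
    root = Path(instance_dir)
    summary = {"readme": None, "credentials": [], "result": None, "other": []}
    for entry in sorted(root.iterdir()):
        name = entry.name
        if name == "README.md":
            summary["readme"] = entry  # detailed task requirements
        elif name.endswith("_credential.json"):
            summary["credentials"].append(entry)  # e.g. a BigQuery key
        elif name == "result.csv":
            summary["result"] = entry  # where execution results go
        else:
            # dbt_project/, reference docs, QUERY_HISTORY/, etc.
            summary["other"].append(entry)
    return summary


def read_instruction_details(instance_dir: str) -> str:
    """Return the README.md text of an instance folder, or "" if absent."""
    readme = Path(instance_dir) / "README.md"
    return readme.read_text(encoding="utf-8") if readme.is_file() else ""
```

For example, `describe_context("./examples/" + task["instance_id"])["credentials"]` lists the credential files for one task. If a BigQuery credential file is a Google Cloud service-account key (an assumption, not stated in this README), the `google-cloud-bigquery` package can build a client from it with `bigquery.Client.from_service_account_json(path)`.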
- To sign up for a BigQuery account, please follow this guideline.
- Follow this guideline and fill out this Snowflake form, and we will send you an account sign-up email that gives you access to the Snowflake database.
We propose Spider-Agent, a baseline agent framework with an interactive environment.
We also provide an evaluation suite for Spider 2.0.
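This README does not spell out the evaluation suite's matching rules. Purely as an illustration of execution-based checking, the sketch below compares a predicted `result.csv` with a gold CSV under simplifying assumptions chosen for this example: both files have a header row, column names are ignored, row order is ignored, and numeric cells are rounded to two decimals. It is not the official Spider 2.0 metric, and `results_match` is a hypothetical name.

```python
import csv
import io
from collections import Counter


def _normalize(value: str, ndigits: int = 2) -> str:
    """Round numeric cells so "1.0" and "1.004" compare equal; strip text cells."""
    try:
        return str(round(float(value), ndigits))
    except ValueError:
        return value.strip()


def rows_from_csv(text: str) -> Counter:
    """Parse CSV text (header row first) into a multiset of normalized rows."""
    reader = csv.reader(io.StringIO(text))
    next(reader, None)  # drop the header; column names are not compared
    return Counter(
        tuple(_normalize(cell) for cell in row)
        for row in reader
        if row  # skip blank lines
    )


def results_match(pred_csv: str, gold_csv: str) -> bool:
    """True if both CSVs hold the same rows, ignoring row order."""
    return rows_from_csv(pred_csv) == rows_from_csv(gold_csv)
```

Comparing rows as a multiset (a `Counter`) rather than a set keeps duplicate rows significant, which matters for queries whose output legitimately repeats rows.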