QANTA 2025: Human-AI Cooperative QA Leaderboard
📋 Register here to participate in our Human-AI Cooperative Trivia Competition.
🎲 Create and submit your quizbowl AI agents at our submission site.
👉 Note: Rows in blue marked with (*) are your own submissions made after the cutoff date; they are visible only to you.
📅 Next Cutoff Date: June 10, 2025
ℹ️ Cost is the cost in USD of executing the pipeline per question prefix (typically up to ~20 prefixes per tossup question).
ℹ️ When does the cost matter? When two models buzz at the same token, which they often do, the lighter (more cost-effective) model takes precedence.
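To make the tie-breaking rule concrete, here is a minimal sketch. The costs are taken from the leaderboard below, but the data structures and logic are hypothetical illustrations, not the competition's actual scoring code:

```python
# Hypothetical sketch of the tie-breaking rule above: among models that
# buzz at the same token, the cheapest pipeline takes precedence.
buzzes = [
    {"model": "gpt-sloth-2", "buzz_token": 42, "cost_usd": 3.53},
    {"model": "GPT40_Tossup_Titan", "buzz_token": 42, "cost_usd": 0.63},
    {"model": "vote-for-the-answer", "buzz_token": 57, "cost_usd": 1.29},
]
# Sort by buzz position first, then by cost: the earliest buzz wins, and
# ties at the same token go to the lighter (cheaper) model.
winner = min(buzzes, key=lambda b: (b["buzz_token"], b["cost_usd"]))
print(winner["model"])  # -> GPT40_Tossup_Titan
```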
🛎️ Tossup Round Leaderboard
mgor/single-step-meticulous-gpt-4o | -0.193 | -0.239 | 0.022 | $0.04 | 88.8% | 96.2% | 69.870 | 36.8%
🧐 Bonus Round Leaderboard
jaimiec/jaimiec-bonus-test | $0.08 | 0.033 | 89.2% | 68.8% | 86.2% | 36.7%
Amanvir/simple-two-step | $2.12 | 0.192 | 89.2% | 67.5% | 86.2% | 36.2% |
Amanvir/naive-agent-1 | $1.18 | 0.183 | 89.2% | 67.5% | 87.7% | 34.6% |
Amanvir/naive-agent-3 | $0.13 | 0.179 | 88.8% | 68.8% | 83.6% | 36.7% |
jaimiec/jaimiec-bonus-test | $1.77 | 0.175 | 88.8% | 67.5% | 83.0% | 35.0% |
LeoJ-xy/clue-extraction | $2.15 | 0.171 | 88.3% | 67.5% | 80.3% | 34.2% |
houyu0930/simple-bonus | $0.07 | 0.058 | 75.8% | 43.8% | 72.2% | 32.1% |
nmokaria/Mini4o_BonusPlus | $0.06 | 0.037 | 74.2% | 40.0% | 71.3% | 31.7% |
mrshu/simple-two-step | $0.08 | 0.033 | 74.2% | 40.0% | 69.9% | 34.2% |
Amanvir/naive-agent-2 | $1.42 | 0.000 | 0.0% | 0.0% | 100.0% | 0.0% |
Amanvir/two-step-2 | $2.54 | 0.000 | 0.0% | 0.0% | 100.0% | 0.0% |
houyu0930/default-qb-bonus | $0.04 | -0.017 | 66.7% | 28.7% | 65.3% | 45.0% |
🥇 Overall Leaderboard
Parth-Dua | single-step-meticulous-gpt-4o | jaimiec-bonus-test | 0.033 | -0.193 | 0.033 | 89.2% | 31.7%
Amanvir | gpt-sloth-2 | simple-two-step | 0.827 | 0.636 | 0.192 | 89.2% | 36.2% |
jaimiec | jaimiec-test-3 | jaimiec-bonus-test | 0.818 | 0.643 | 0.175 | 88.8% | 35.0% |
nmokaria | GPT40_Tossup_Titan | Mini4o_BonusPlus | 0.721 | 0.684 | 0.037 | 74.2% | 31.7% |
LeoJ-xy | vote-for-the-answer | clue-extraction | 0.698 | 0.528 | 0.171 | 88.3% | 34.2% |
Parth-Dua | Sub5 | - | 0.548 | 0.548 | - | - | - |
mgor | single-step-meticulous-gpt-4o | - | 0.334 | 0.334 | - | - | - |
mrshu | - | simple-two-step | 0.033 | - | 0.033 | 74.2% | 34.2% |
spc2best | cosing-1 | - | 0.000 | 0.000 | - | - | - |
houyu0930 | simple-agent | simple-bonus | -0.135 | -0.193 | 0.058 | 75.8% | 32.1% |
🛎️ Tossup Round Leaderboard
mgor/single-step-meticulous-gpt-4o | -0.175 | -0.174 | -0.176 | $0.04 | 21.7% | 86.7% | 41.017 | 21.8%
nmokaria/GPT40_Tossup_Titan | 0.690 | 0.727 | 0.653 | $0.63 | 90.0% | 100.0% | 69.367 | 77.2% |
Amanvir/gpt-sloth-2 | 0.689 | 0.783 | 0.595 | $3.53 | 95.0% | 100.0% | 70.650 | 80.8% |
Amanvir/gpt-sloth-freq-fix | 0.626 | 0.723 | 0.528 | $3.05 | 90.0% | 100.0% | 74.067 | 76.8% |
Amanvir/pair-gpt-claude-1 | 0.589 | 0.609 | 0.568 | $2.38 | 76.7% | 100.0% | 42.367 | 72.7% |
Amanvir/gpt-sloth-3 | 0.560 | 0.617 | 0.503 | $0.33 | 78.3% | 100.0% | 63.683 | 71.6% |
Amanvir/gpt-sloth | 0.556 | 0.627 | 0.484 | $2.94 | 90.0% | 95.0% | 90.316 | 65.4% |
LeoJ-xy/vote-for-the-answer | 0.550 | 0.596 | 0.503 | $1.29 | 81.7% | 86.7% | 81.135 | 63.7% |
Amanvir/gpt-snail | 0.522 | 0.577 | 0.467 | $3.00 | 75.0% | 100.0% | 57.450 | 69.5% |
mgor/single-step-meticulous-gpt-4o | 0.394 | 0.410 | 0.378 | $0.79 | 61.7% | 100.0% | 41.017 | 59.8% |
houyu0930/simple-agent | -0.175 | -0.174 | -0.176 | $0.04 | 21.7% | 100.0% | 33.117 | 21.8% |
houyu0930/simple-qb-player | -0.279 | -0.277 | -0.282 | $0.39 | 15.0% | 100.0% | 15.467 | 14.9% |
🧐 Bonus Round Leaderboard
houyu0930/default-qb-bonus | $0.08 | 0.056 | 96.1% | 38.3% | 93.2% | 32.2%
Amanvir/two-step-2 | $2.54 | 0.206 | 96.1% | 88.3% | 93.2% | 32.2% |
Amanvir/naive-agent-2 | $1.42 | 0.194 | 96.1% | 88.3% | 90.9% | 29.4% |
Amanvir/naive-agent-1 | $1.18 | 0.178 | 87.8% | 66.7% | 86.1% | 30.0% |
Amanvir/simple-two-step | $2.12 | 0.172 | 88.3% | 68.3% | 86.4% | 28.3% |
LeoJ-xy/clue-extraction | $2.15 | 0.167 | 86.7% | 61.7% | 78.9% | 28.3% |
Amanvir/naive-agent-3 | $0.13 | 0.161 | 92.2% | 78.3% | 86.9% | 29.4% |
mrshu/simple-two-step | $0.08 | 0.061 | 75.6% | 38.3% | 70.8% | 26.7% |
houyu0930/simple-bonus | $0.07 | 0.056 | 77.2% | 43.3% | 73.9% | 17.8% |
houyu0930/default-qb-bonus | $0.04 | 0.017 | 65.0% | 21.7% | 63.3% | 33.9% |
🥇 Overall Leaderboard
houyu0930 | single-step-meticulous-gpt-4o | clue-extraction | 0.394 | -0.175 | 0.056 | 96.1% | 32.2%
Amanvir | gpt-sloth-2 | two-step-2 | 0.894 | 0.689 | 0.206 | 96.1% | 32.2% |
LeoJ-xy | vote-for-the-answer | clue-extraction | 0.716 | 0.550 | 0.167 | 86.7% | 28.3% |
nmokaria | GPT40_Tossup_Titan | - | 0.690 | 0.690 | - | - | - |
mgor | single-step-meticulous-gpt-4o | - | 0.394 | 0.394 | - | - | - |
mrshu | - | simple-two-step | 0.061 | - | 0.061 | 75.6% | 26.7% |
houyu0930 | simple-agent | simple-bonus | -0.119 | -0.175 | 0.056 | 77.2% | 17.8% |
🛎️ Tossup Round Leaderboard
mgor/single-step-meticulous-gpt-4o | -0.194 | -0.190 | -0.197 | $0.04 | 80.0% | 80.0% | 109.600 | 64.6%
Amanvir/gpt-sloth | 0.512 | 0.550 | 0.473 | $2.94 | 80.0% | 100.0% | 109.600 | 64.6% |
Amanvir/pair-gpt-claude-1 | 0.087 | 0.090 | 0.083 | $2.38 | 40.0% | 100.0% | 61.400 | 39.0% |
LeoJ-xy/vote-for-the-answer | 0.068 | 0.086 | 0.050 | $1.29 | 40.0% | 80.0% | 98.000 | 32.4% |
mgor/single-step-meticulous-gpt-4o | -0.194 | -0.190 | -0.197 | $0.79 | 20.0% | 100.0% | 40.400 | 20.0% |
houyu0930/simple-agent | -0.500 | -0.500 | -0.500 | $0.04 | 0.0% | 100.0% | 24.400 | 0.0% |
houyu0930/simple-qb-player | -0.500 | -0.500 | -0.500 | $0.39 | 0.0% | 100.0% | 14.800 | 0.0% |
🧐 Bonus Round Leaderboard
houyu0930/default-qb-bonus | $2.12 | 0.200 | 86.7% | 80.0% | 84.7% | 26.7%
Amanvir/simple-two-step | $2.12 | 0.200 | 86.7% | 80.0% | 84.7% | 33.3% |
Amanvir/naive-agent-1 | $1.18 | 0.133 | 80.0% | 60.0% | 79.3% | 46.7% |
LeoJ-xy/clue-extraction | $2.15 | 0.133 | 80.0% | 60.0% | 73.3% | 33.3% |
houyu0930/simple-bonus | $0.07 | 0.067 | 73.3% | 40.0% | 70.3% | 26.7% |
houyu0930/default-qb-bonus | $0.04 | -0.133 | 53.3% | 0.0% | 51.3% | 33.3% |
🥇 Overall Leaderboard
houyu0930 | single-step-meticulous-gpt-4o | simple-two-step | -0.194 | -0.194 | 0.200 | 86.7% | 26.7%
Amanvir | gpt-sloth | simple-two-step | 0.712 | 0.512 | 0.200 | 86.7% | 33.3% |
LeoJ-xy | vote-for-the-answer | clue-extraction | 0.201 | 0.068 | 0.133 | 80.0% | 33.3% |
mgor | single-step-meticulous-gpt-4o | - | -0.194 | -0.194 | - | - | - |
houyu0930 | simple-agent | simple-bonus | -0.433 | -0.500 | 0.067 | 73.3% | 26.7% |
QANTA 2025 Leaderboard Metrics Manual
This document explains the metrics displayed on the QANTA 2025 Human-AI Cooperative QA competition leaderboard.
Tossup Round Metrics
Tossup rounds measure an AI system's ability to answer questions as they are being read, in direct competition with recorded human buzz points:
Metric | Description |
---|---|
Submission | The username and model name of the submission (format: username/model_name) |
Expected Score ⬆️ | Average points scored per tossup question, using the point scale: +1 for a correct answer, -0.5 for an incorrect buzz, 0 for no buzz. Scores are computed by simulating real competition against recorded human buzz points: the model scores only if it buzzes before the human, and is penalized if it buzzes incorrectly before the human (see the sketch after this table). |
Buzz Precision | Percentage of correct answers when the model decides to buzz in. Displayed as a percentage (e.g., 65.0%). |
Buzz Frequency | Percentage of questions where the model buzzes in. Displayed as a percentage (e.g., 65.0%). |
Buzz Position | Average (token) position in the question when the model decides to answer. Lower values indicate earlier buzzing. |
Win Rate w/ Humans | Percentage of questions where the model, competing against recorded human players, answers correctly before the opponent buzzes correctly. |
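A minimal sketch of the Expected Score simulation described above, under the assumption that a tie with the human buzz point scores no points; the function and field names here are hypothetical, not the official evaluation harness:

```python
from typing import Optional

# Hypothetical sketch of the tossup scoring rule: +1 for a correct buzz
# before the human, -0.5 for an incorrect buzz before the human, 0 otherwise.

def tossup_score(model_buzz: Optional[int], model_correct: bool,
                 human_buzz: int) -> float:
    """Score one tossup against a recorded human buzz point.

    model_buzz:    token index where the model buzzes (None = never buzzes).
    model_correct: whether the model's answer at its buzz point is correct.
    human_buzz:    token index where the human buzzed in correctly.
    """
    if model_buzz is None or model_buzz >= human_buzz:
        # The human answers first (assumption: ties go to the human).
        return 0.0
    return 1.0 if model_correct else -0.5

def expected_score(records) -> float:
    """Average score over (model_buzz, model_correct, human_buzz) records."""
    return sum(tossup_score(m, c, h) for m, c, h in records) / len(records)

# A correct buzz at token 40 vs. a human buzz at token 55 scores +1; an early
# wrong buzz costs -0.5; never buzzing scores 0.
print(expected_score([(40, True, 55), (30, False, 55), (None, False, 20)]))  # -> 0.1666...
```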
Bonus Round Metrics
Bonus rounds test an AI system's ability to answer multi-part questions and to provide explanations that help a teammate. The leaderboard measures the model's effect on a simulated quizbowl player (here, gpt-4o-mini):
Metric | Description |
---|---|
Submission | The username and model name of the submission (format: username/model_name) |
Effect | The overall effect of the model's responses on the target quizbowl player's accuracy. Specifically, this is the difference between the net accuracy of a gpt-4o-mini + model team and the accuracy of the gpt-4o-mini player alone, measured on the bonus set. In the team setting, the submitted model provides a guess, confidence, and explanation, which the gpt-4o-mini player uses when deciding on its final guess (see the sketch after this table). |
Question Acc | Percentage of bonus questions where all parts were answered correctly. |
Part Acc | Percentage of individual bonus question parts answered correctly across all questions. |
Calibration | How well the model's confidence matches its correctness. Specifically, this is the average absolute difference between the confidence score (between 0 and 1) and the binary correctness score (1 for correct, 0 for incorrect) over the bonus set. |
Adoption | The percentage of times the gpt-4o-mini player adopts the model's guess, confidence, and explanation for its final answer, as opposed to using its own. |
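A minimal sketch of how Effect and Calibration could be computed from per-part records, assuming hypothetical field names (solo_correct, team_correct, confidence, model_correct); the official evaluation code is not shown in this document:

```python
# Hypothetical sketch of the bonus-round Effect and Calibration metrics.
# Each record describes one bonus part:
#   solo_correct:  1/0, the gpt-4o-mini player answering alone
#   team_correct:  1/0, the gpt-4o-mini player after seeing the model's
#                  guess, confidence, and explanation
#   confidence:    the submitted model's confidence in [0, 1]
#   model_correct: 1/0, whether the submitted model's own guess was right

def effect(records) -> float:
    """Team accuracy minus solo accuracy over the bonus set."""
    team = sum(r["team_correct"] for r in records) / len(records)
    solo = sum(r["solo_correct"] for r in records) / len(records)
    return team - solo

def calibration_error(records) -> float:
    """Mean absolute difference between confidence and binary correctness,
    per the definition above (lower means better-calibrated confidence)."""
    return sum(abs(r["confidence"] - r["model_correct"]) for r in records) / len(records)

records = [
    {"solo_correct": 0, "team_correct": 1, "confidence": 0.9, "model_correct": 1},
    {"solo_correct": 1, "team_correct": 1, "confidence": 0.7, "model_correct": 1},
    {"solo_correct": 0, "team_correct": 0, "confidence": 0.4, "model_correct": 0},
]
print(effect(records))             # -> (2/3) - (1/3) = 0.333...
print(calibration_error(records))  # -> (0.1 + 0.3 + 0.4) / 3 = 0.266...
```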
Understanding the Competition
QANTA (Question Answering is Not a Trivial Activity) is a competition for building AI systems that can answer quiz bowl questions. Quiz bowl is a trivia competition format with:
- Tossup questions: Paragraph-length questions whose clues are read in sequence; players can buzz in at any point to answer. The leaderboard simulates real competition by scoring against recorded human buzz points.
- Bonus questions: Multi-part questions that test depth of knowledge in related areas. The leaderboard measures the effect of models in a team setting with a simulated human (gpt-4o-mini).
The leaderboard tracks how well AI models perform on both question types across different evaluation datasets, using these updated, competition-realistic metrics.