Georgetown University, McCourt School of Public Policy
Coppock & McGrath (2026) launched a replication competition for survey experiments published in political science journals, and I wanted to find a paper to replicate so I could join in.
I had some papers in mind, but I wasn't sold on anything.
My goal was to run a broad search for an interesting paper to take on. But… I was severely time-constrained.
So I built a workflow in Claude Code (CC).
I (with CC) wrote a ~900-line instructions.md with the process for this task:
Your task (in phases):
0. Build a deep understanding of the RFP call
1. Build a deep understanding of my research pipeline and my co-author on this project
2. Search for high-impact survey experiments published across eight leading political science journals (2010–2025)
3. Evaluate papers as candidates for a replication exercise
4. Run a multi-agent tournament between the top papers
## IMPORTANT: Stop-and-Check Points
At each of these points, you must:
1. Summarize what you have completed
2. Present key outputs for review
3. List any issues or concerns
4. Wait for human approval before proceeding
Do not proceed past a checkpoint without explicit approval.
Before searching, CC read two things:
The RFP (rfp.pdf): Coppock & McGrath (2026) competition rules — what counts as a valid replication, length constraints, submission format
My research pipeline: CV, paper statuses, and co-author profile (Nejla Asimovic — intergroup relations, polarization) — to ground the researcher fit criterion before any paper was seen
Stop-and-check: Claude summarized its understanding of the RFP constraints and the team’s research areas before proceeding to search.
A systematic journal-by-journal search across 8 leading journals (APSR, AJPS, JoP, JEPS, Political Behavior, Political Communication, BJPS, Political Psychology), 2010–2025.
Output: candidate_papers.csv — ~299 papers, then hard-filtered to 285 eligible.
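For concreteness, a minimal sketch of what that hard filter could look like. The criteria and column names are assumptions (the post only reports the counts):

```python
# Minimal sketch of the hard eligibility filter. The criteria and column
# names (journal, year, is_survey_experiment) are assumptions; the post
# only reports the counts (~299 -> 285).
import pandas as pd

JOURNALS = {
    "APSR", "AJPS", "JoP", "JEPS", "Political Behavior",
    "Political Communication", "BJPS", "Political Psychology",
}

papers = pd.read_csv("candidate_papers.csv")
eligible = papers[
    papers["journal"].isin(JOURNALS)
    & papers["year"].between(2010, 2025)
    & papers["is_survey_experiment"]  # hypothetical flag column
]
eligible.to_csv("eligible_papers.csv", index=False)
```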
Before scoring, I used GPT Deep Research to run live web searches on every candidate paper: gpt-5-search-api evaluated each of the 285 papers using real-time web retrieval with deep research. Output: 285 individual .md files — these files, not training data, fed the scoring step. This is already 285 lit reviews, for “free”.
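A minimal sketch of the per-paper retrieval loop, assuming the OpenAI Responses API with a web-search tool; the model name, prompt, and CSV columns are placeholders, not the project's actual configuration:

```python
# Minimal sketch of the per-paper deep-research step. Assumes the OpenAI
# Responses API with a web-search tool; the model name, prompt, and CSV
# columns are placeholders, not the project's actual configuration.
import csv
from pathlib import Path

from openai import OpenAI

client = OpenAI()
out_dir = Path("deep_research")
out_dir.mkdir(exist_ok=True)

with open("eligible_papers.csv") as f:
    papers = list(csv.DictReader(f))

for paper in papers:
    prompt = (
        f"Write a short, sourced review of \"{paper['title']}\" "
        f"({paper['journal']}, {paper['year']}): main finding, design, "
        "sample size, effect sizes, data availability, and any replications."
    )
    response = client.responses.create(
        model="gpt-5",                   # placeholder model name
        tools=[{"type": "web_search"}],  # live retrieval, not training data
        input=prompt,
    )
    # One .md file per paper; these files feed the scoring step.
    (out_dir / f"{paper['id']}.md").write_text(response.output_text)
```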
Each paper scored on 8 criteria using the deep evaluation files as input:
| Criterion | Weight | What it captures |
|---|---|---|
| S1 Theoretical importance | 20% | Does a null replication reshape a debate? |
| S2 Design simplicity | 10% | <5 min, 2-arm, few outcomes |
| S3 Replication feasibility | 15% | Can stimuli transfer to a US sample? |
| S4 Data availability | 10% | Public data + code? |
| S5 Researcher fit | 15% | Social media, misinformation, polarization |
| S6 Impact & visibility | 10% | Top journal, citations, active debate |
| S7 Low statistical power | 10% | Underpowered originals are worth verifying |
| S8 Large effect sizes | 10% | Effects >0.3 SD worth checking |
Top 18 by weighted total → tournament shortlist.
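The shortlisting itself is a plain weighted sum over the rubric above. A minimal sketch, assuming each paper already has numeric 0–10 scores on the eight criteria:

```python
# Minimal sketch of the weighted scoring step. Weights come straight from
# the rubric above; the per-criterion scores dict is illustrative (in the
# actual workflow they were derived from the deep-research .md files).
WEIGHTS = {
    "S1_theoretical_importance": 0.20,
    "S2_design_simplicity": 0.10,
    "S3_replication_feasibility": 0.15,
    "S4_data_availability": 0.10,
    "S5_researcher_fit": 0.15,
    "S6_impact_visibility": 0.10,
    "S7_low_statistical_power": 0.10,
    "S8_large_effect_sizes": 0.10,
}

def weighted_total(scores: dict[str, float]) -> float:
    """Weighted sum of the eight criterion scores (0-10 each)."""
    return sum(WEIGHTS[c] * scores[c] for c in WEIGHTS)

def shortlist(papers: dict[str, dict[str, float]], k: int = 18) -> list[str]:
    """Top-k paper ids by weighted total: the tournament shortlist."""
    return sorted(papers, key=lambda p: weighted_total(papers[p]), reverse=True)[:k]
```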
The top 18 papers compete in a two-stage multi-agent debate.
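The write-up doesn't spell out the bracket, so the sketch below is only one plausible shape for it: stage 1 runs round-robin debates within groups, stage 2 a final among group winners, with each head-to-head decided by majority vote across judge models. The judge functions here are placeholders for real model calls.

```python
# Minimal sketch of a two-stage multi-model tournament. The bracket
# structure and judge interface are assumptions; the write-up only says
# "two-stage multi-agent debate".
import itertools
import random
from collections import Counter
from typing import Callable

Judge = Callable[[str, str], str]  # (paper_a, paper_b) -> winning paper id

def debate(a: str, b: str, judges: list[Judge]) -> str:
    """Majority vote across judge models for one head-to-head debate."""
    votes = Counter(judge(a, b) for judge in judges)
    return votes.most_common(1)[0][0]

def round_robin(papers: list[str], judges: list[Judge]) -> str:
    """Every pair debates once; most wins takes the group."""
    wins = Counter({p: 0 for p in papers})
    for a, b in itertools.combinations(papers, 2):
        wins[debate(a, b, judges)] += 1
    return wins.most_common(1)[0][0]

def tournament(shortlist: list[str], judges: list[Judge], groups: int = 3) -> str:
    """Stage 1: round-robin within groups. Stage 2: final among winners."""
    finalists = [round_robin(shortlist[i::groups], judges) for i in range(groups)]
    return round_robin(finalists, judges)

# Toy usage: random judges stand in for actual model calls.
if __name__ == "__main__":
    papers = [f"paper_{i:02d}" for i in range(18)]
    judges: list[Judge] = [lambda a, b: random.choice([a, b]) for _ in range(3)]
    print(tournament(papers, judges))
```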
The instruction file evolved through dialogue:
“Even though the RFP talks about replication + extension, I want your focus to be only on the replication” — scoping
“Use deep research from OpenAI via their API to summarize all the papers” — reduces hallucination
“Add as criteria to the soft scoring: a) experiment ran with low statistical power; b) effect sizes are larger than 0.3 sd” — iterative rubric refinement
“Save for me two instructions.md files, one with the multi-model competition, and one without it” — multi-trial design
The full prompt ran to ~900 lines. Lessons:
Checkpoint architecture works: mandatory stops kept Claude on track across a very long task
Multi-model tournament: different models have different evaluation biases, and running several with CC + plugins is relatively easy.
Literature reviews via web search sometimes hallucinated paper details: deep research helps, and feeding in the PDFs directly would be better.
Takeaway: not a replacement, but it broadened the set of options for me to work on and opens up more diversity in the research pipeline.
Tiago Ventura (Georgetown University)