We are launching a new series on research policy to sit alongside the existing ones (data, AI, metrics, and thoughts). This first piece is a reflection on research assessment, shaped by a UK perspective given my long-standing involvement in evaluation. We have more policy bites with a global focus on the way.
Key takeaways:
Funders themselves need scrutiny if we want fair, unbiased decisions and a healthy research ecosystem.
Transparency, accountability, and an openness to experiment, as we have seen with the Research on Research Institute, are essential to improving how we assess and support the best ideas in science.
If you’ve spent any time in or around academia, you will know that research is not only about breakthroughs in the lab or evidence from the library. Having spent half my career inside the engine room of UK research assessment, I can say that research is also about measurement and, increasingly, about evaluation, a process that sets the bar higher than simply measuring outcomes. We want to recognise high-quality research, distribute funding responsibly, and ensure that all fields, from astrophysics to comparative literature, get the attention they deserve. So we build systems, such as peer review, grant panels, national research assessments, strategic research reviews, and research excellence frameworks, to pick out the most promising proposals and the most impactful studies through careful evaluation that goes beyond simple metrics.
But an unsettling question follows: who evaluates the evaluators themselves? Are these assessment mechanisms as fair, robust, and forward-thinking as they need to be? How do we ensure the people and frameworks distributing funding and guiding scientific priorities are accountable, unbiased, and evolving with the times?
In this post, we will critically examine these questions, focusing on how research funders themselves are assessed or might begin to be. We will explore how peer review, research excellence frameworks, and strategic research reviews emerged.
And we will highlight new endeavours such as the Research on Research Institute (RoRI), launched in 2019 and led by Professor James Wilsdon. RoRI aims to bring more evidence and experimentation into the way we evaluate research. The goal is to spark conversation to champion more transparent, fair, and innovative assessment practices that benefit researchers, institutions, and society alike.
The Rise of Research Assessment
From Informal Judgment to Institutional Frameworks
Historically, research evaluation was a relatively informal affair. A scholar would publish a paper, colleagues would read it, and the unofficial peer review process continued in the corridors of conferences and behind journal doors. Since the Second World War, peer review has become progressively more formalised, both for assessing published outputs and for distributing project-based resources (grants). Over time, funding bodies and governments recognised the need for systematic approaches to assess research quality across institutions. In the UK, this led to the development of national evaluation exercises, first the Research Assessment Exercise (RAE) and later the Research Excellence Framework (REF), to inform the allocation of block funding and to provide accountability for public investment in research. It is worth noting, too, that the criteria applied in peer review have broadened substantially over time: we are no longer interested only in 'excellent' science, but also in its relevance, impact, openness, and integrity.
Why the shift? The stakes of allocating resources are high. Government agencies pour billions of dollars into research each year, while private foundations steward endowments that can open or close critical opportunities for researchers. The scale and complexity of research have also increased, making non-systematic evaluation insufficient. In response, the tools used to evaluate research have evolved: peer review, bibliometrics, impact case studies, and assessments of the research environment. Each approach attempts to gauge “quality” and “impact,” though none does so without limitations.
Yet, as these evaluations themselves grow in influence, a critical question emerges: who evaluates the evaluators?
Can we assess peer review, research frameworks, or metrics-based systems using the very tools they impose on others—or do we need different standards for that task?
Peer Review: Still the Gold Standard?
The Merits of Peer Review
Often associated with academic publishing, peer review has long been heralded as the bedrock of scientific rigour. At its best, it offers expertise, constructive criticism, and a degree of democratic oversight over unsubstantiated claims. Whether deciding what gets published or what gets funded, peer reviewers bring specialised knowledge of the field’s intricacies, theoretical frameworks, and experimental methods. Ideally, peer review ensures that only research meeting high quality standards receives public investment.
The Flaws of Peer Review
However, peer review is not without its flaws and critics, as discussed in a recent post by my colleague Hélène here at research musings. It can be slow, inconsistent, and subject to both conscious and unconscious bias. Reviewers, often balancing research, teaching, and administrative duties, may lack the time to engage fully with every manuscript or funding proposal. Funding panels, pressed for time, might rely on surface metrics or the reputations of certain institutions. Interdisciplinary research, often where the major breakthroughs happen, can struggle in traditional, discipline-focused peer review pools.
Many different process changes have been tried to address these problems: a recent ‘Review of Peer Review’ by Kolarz et al. (2023) examined 38 interventions in grant funding, including open review and partial randomisation.
Calls for more transparent peer review processes are on the rise, with some journals experimenting with open or post-publication peer review. Yet, in the broader funding context, the question remains: are funders systematically evaluating whether their own peer review mechanisms are up to the task? And how can we measure or verify the “quality” of that evaluation itself?
In response to concerns about bias and the limits of expert judgment, some funders have begun to trial randomisation in grant selection, using lotteries to allocate funding among equally ranked proposals. This approach, adopted by schemes in New Zealand and explored by others internationally (e.g. partial randomisation experiments), aims to mitigate unconscious bias and promote fairness in highly competitive funding environments.
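To make the mechanics concrete, here is a minimal sketch of how a partial randomisation step might work, written in Python with invented proposal identifiers, scores, and thresholds; real schemes add safeguards (pre-registered draw rules, audit trails, conflict-of-interest checks) that are omitted here.

```python
import random

# Hypothetical panel scores after conventional peer review (illustrative only).
proposals = {
    "P-101": 4.8, "P-102": 4.5, "P-103": 4.5, "P-104": 4.5,
    "P-105": 4.4, "P-106": 3.9, "P-107": 3.2,
}

def partial_randomisation(scores, budget, funding_threshold):
    """Fund clear winners outright; draw lots among equally ranked
    proposals competing for the last available places."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    fundable = [p for p in ranked if scores[p] >= funding_threshold]
    if len(fundable) <= budget:
        return fundable  # every fundable proposal fits in the budget; no lottery needed
    # Proposals scoring strictly above the cut-off are funded directly.
    cutoff_score = scores[fundable[budget - 1]]
    certain = [p for p in fundable if scores[p] > cutoff_score]
    tied = [p for p in fundable if scores[p] == cutoff_score]
    # The remaining places are allocated by lot among the tied proposals.
    lottery_winners = random.sample(tied, budget - len(certain))
    return certain + lottery_winners

print(partial_randomisation(proposals, budget=3, funding_threshold=4.0))
```

The appeal of this design is that expert judgment still does the heavy lifting: the lottery only breaks ties that peer review cannot meaningfully resolve, and publishing the draw rules in advance makes the decision easy to scrutinise.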
Enter the Research on Research Institute (RoRI): Shining a Light on Evaluation
One promising response to these challenges comes from initiatives like the Research on Research Institute (RoRI), led by Professor James Wilsdon (UCL) and a global consortium of funders and institutions. Digital Science, my employer, was part of the RoRI founding team in 2019 and has been a core partner ever since. RoRI applies the same rigour and evidence-based mindset used in scientific investigation to the domain of metascience: the evaluation of research itself. By systematically studying how different assessment strategies work in practice, RoRI aims to identify biases, inefficiencies, and examples of best practice.
RoRI drives evidence-based reforms by partnering with funders to run pilot studies and collect data, testing interventions such as double-blind peer review or examining whether commonly used metrics align with long-term research impact. By sharing and analysing data from real funding calls and comparing different evaluation strategies, RoRI builds an empirical foundation for improving assessment practices. In essence, it offers the kind of meta-evaluation needed to address the fundamental question: Who evaluates the evaluators?
Charting a Path Forward
1. Transparency and Accountability
If there is a single theme uniting these concerns around research evaluation, it is transparency. The more open we are about how decisions are made, and the more we share data on outcomes, the easier it is to identify biases, detect gaming, and build trust. Funders can commit to publishing reviewer guidelines, success rates, demographic breakdowns, and rationales for major decisions. Such openness invites constructive scrutiny from the broader research community, journalists, and watchdog groups alike, and has been repeatedly emphasised in reviews of research assessment systems; see, for instance, The Metric Tide (Wilsdon et al., 2015).
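As a small illustration of what such openness enables, the sketch below computes success rates by applicant group from the kind of application-level data a funder might publish; the column names and figures are invented for the example rather than drawn from any real funder.

```python
import pandas as pd

# Hypothetical application-level data of the kind a transparent funder might release.
applications = pd.DataFrame({
    "call":         ["2024-A"] * 6,
    "gender":       ["F", "F", "F", "M", "M", "M"],
    "career_stage": ["early", "mid", "senior", "early", "mid", "senior"],
    "funded":       [True, False, False, True, True, False],
})

# Success rate per group: funded applications divided by total applications.
success_by_gender = (
    applications.groupby("gender")["funded"]
    .agg(applications="count", awards="sum")
    .assign(success_rate=lambda d: d["awards"] / d["applications"])
)
print(success_by_gender)
```

Once such tables are routinely published, anyone can repeat this calculation for any call, which is precisely the kind of constructive scrutiny that open data invites.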
2. Training and Support for Evaluators
Evaluating research, especially across multiple disciplines, is a specialised skill. Yet many reviewers, grant panel members, and research administrators receive minimal training. Introducing bias-awareness workshops, mentorship for newer reviewers, and resources to handle interdisciplinary proposals can make a significant difference, though unconscious bias training is still under scrutiny, with mixed evidence about its effectiveness. That said, basic awareness remains an important starting point, and some research funders and institutions have attempted more holistic approaches in the past (the NIH and the Wellcome Trust, for example).
Funding agencies themselves benefit from professional development around emerging metrics, best practices in peer review, and frameworks for identifying truly innovative projects. The Hong Kong Principles highlight the importance of recognising and supporting responsible research practices, including appropriate training and development for those involved in assessment (Moher et al., 2020).
3. A Balanced Toolkit
No single method (peer review, bibliometrics, or other combined forms such as strategic reviews) can capture the full scope of research excellence (NB: the notion and definition of excellence is constantly shifting and may vary depending on the context). Moving forward, funders and institutions can adopt mixed-method approaches that blend quantitative data with qualitative expertise. Peer review panels might consider metrics but weigh them appropriately. External audits or independent evaluations of large-scale frameworks like REF can provide a reality check on whether current tools truly incentivise the best science. As emphasised in the Leiden Manifesto, quantitative evaluation should support—not supplant—expert qualitative assessment (Hicks et al., 2015). However, although many of these principles date back more than a decade, we now have far richer data on open science and research integrity, highlighting that each context—and each form of excellence—may require tailored evaluation strategies in which metrics and peer review each play different, carefully calibrated roles.
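One way to keep metrics in a supporting role is to use them only to flag cases for discussion rather than to score them. The sketch below, with invented grades, indicators, and thresholds, surfaces outputs where a field-normalised citation indicator and a panel's qualitative grade diverge sharply, so that reviewers revisit those cases rather than letting the metric decide.

```python
from dataclasses import dataclass

@dataclass
class Output:
    title: str
    panel_grade: int             # qualitative peer review grade, 1 (low) to 4 (high)
    normalised_citations: float  # field-normalised citation indicator (world average = 1.0)

# Hypothetical panel results (illustrative only).
outputs = [
    Output("Quiet but influential monograph", panel_grade=4, normalised_citations=0.4),
    Output("Highly cited methods paper",      panel_grade=2, normalised_citations=3.1),
    Output("Solid incremental study",         panel_grade=3, normalised_citations=1.1),
]

def flag_for_discussion(items, gap=1.5):
    """Return outputs where the citation indicator and the panel grade
    point in clearly different directions, for the panel to revisit."""
    flagged = []
    for o in items:
        # Map the citation indicator onto the same rough 1-4 scale as the panel grade.
        metric_grade = min(4.0, 1.0 + o.normalised_citations)
        if abs(metric_grade - o.panel_grade) >= gap:
            flagged.append(o)
    return flagged

for o in flag_for_discussion(outputs):
    print(f"Revisit: {o.title} (panel {o.panel_grade}, citations {o.normalised_citations})")
```

The point is not the particular mapping, which is arbitrary here, but the workflow: the metric prompts a conversation, and the final judgment stays with the panel.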
4. Continuous Experimentation and Feedback
Finally, adopting a culture of experimentation is key. Organisations like RoRI show how funders and institutions can actively test new strategies such as novel peer review models, algorithmic aids, or more nuanced impact measures. Coupled with feedback mechanisms that allow researchers to respond and reflect, this cycle of experimentation and refinement keeps evaluation practices dynamic. Over time, we can converge on approaches that minimise bias, reward excellence, and respect diverse disciplinary norms. As Whitley and Gläser argue in The Changing Governance of the Sciences (2007), evaluation systems must co-evolve with the changing governance and organisation of science itself, reinforcing the need for feedback-oriented approaches.
Funders who adopt regular audits, open feedback loops, and robust training programmes are better positioned to remain future-facing.
Conclusion: Toward a More Reflective Future
The question, “Who evaluates the evaluators?” might sound rhetorical or provocative, but it points to a genuine need for meta-assessment in research funding. As resources tighten, as expectations grow for science to help solve social, economic, and environmental challenges, as interdisciplinary research grows in importance, and as public trust in expertise is challenged, we can no longer assume that established methods are beyond scrutiny. Self-reflection, transparency, and a willingness to adapt are essential if we are to nurture the next generation of transformative research.
This imperative is even more urgent with the growing use of AI in research assessment. From algorithmic reviewer suggestions to grant triage systems and citation-based influence scores, automated tools are already shaping evaluative decisions. Yet these systems are not neutral: they reflect the biases and blind spots embedded in their training data and design. For instance, machine-learning-based metrics may reinforce past patterns of funding concentration or undervalue underrepresented disciplines. Under the auspices of Research England, the UK's four main public research funders have already tested how AI might support peer review in future iterations of the REF, but early results were mixed (Chawla, 2022). More recently, discussions have shifted toward the potential use of generative AI to relieve the burden of REF assessments altogether, with some proposing that large language models could assist—or even undertake—parts of the review process (Watermeyer and Phipps, 2025). Similarly, publishers are already deploying AI tools to assist with statistical checks, summarisation, and reviewer selection—uses that improve efficiency but introduce new concerns about reliability and editorial responsibility (Naddaf, 2025).
Funders wield immense influence over what gets studied, how it gets studied, and who leads the charge. By turning the evaluative lens on themselves—through partnerships with initiatives like RoRI, active engagement with the academic community, and a commitment to openness—they can help ensure that research assessment remains robust, equitable, and future-oriented. In doing so, they don’t just answer the question of who evaluates the evaluators; they demonstrate that their own practices can meet the standards they expect of others.
References
Chawla, D. S. (2022). Should AI have a role in assessing research quality? Nature, 610(7930), 610–612. https://www.nature.com/articles/d41586-022-03294-3
Hicks, D., Wouters, P., Waltman, L., de Rijcke, S., & Rafols, I. (2015). The Leiden Manifesto for research metrics. Nature, 520(7548), 429–431. https://doi.org/10.1038/520429a
Kolarz, P., Vingre, A., Vinnik, A., Neto, A., … & Sutinen, L. (2023). Review of Peer Review - Final Report. UKRI. https://www.ukri.org/wp-content/uploads/2023/07/UKRI-060723-Review-of-peer-review-Final-report-revs-v2.pdf
Moher, D., Bouter, L., Kleinert, S., Glasziou, P., Sham, M.H., Barbour, V., et al. (2020). The Hong Kong Principles for assessing researchers: Fostering research integrity. PLoS Biol 18(7): e3000737. https://doi.org/10.1371/journal.pbio.3000737
Naddaf, M. (2022). Peer reviewers agree with AI on research quality — but not enough to trust it. Nature, 614, 16–17. https://www.nature.com/articles/d41586-022-04493-8
Watermeyer, R., & Phipps, A. (2025, January 28). Using GenAI for the REF is a no-brainer. Impact of Social Sciences Blog. London School of Economics and Political Science. https://blogs.lse.ac.uk/impactofsocialsciences/2025/01/28/using-genai-for-the-ref-is-a-no-brainer/
Whitley, R., & Gläser, J. (2007). The Changing Governance of the Sciences: The Advent of Research Evaluation Systems. Springer. https://link.springer.com/book/10.1007/978-1-4020-6746-4
Wilsdon, J., Allen, L., Belfiore, E., Campbell, P., Curry, S., Hill, S., ... & Johnson, B. (2015). The Metric Tide: Report of the Independent Review of the Role of Metrics in Research Assessment and Management. HEFCE. https://responsiblemetrics.org/the-metric-tide/