A new study demonstrates that fine-tuned BERT models achieve results competitive with large language models for climate claim verification while processing claims 11-380 times faster, offering a more sustainable and practical solution for real-time fact-checking.
Efficient Climate Claim Verification: Fine-Tuned BERT Models Outperform Large Language Models
When seconds count in preventing climate misinformation from spreading on social media, the speed of your fact-checking system matters as much as its accuracy. A new study from the ClimateCheck research collaboration shows that fine-tuned BERT models can verify climate claims 11 to 380 times faster than large language models while maintaining competitive accuracy, processing each claim in just 0.032 seconds, consuming far less energy, and offering better explainability.
This research, published at the Fifth Workshop on Scholarly Document Processing (SDP 2025), challenges the assumption that bigger models are always better. Led by researchers from TU Berlin, DFKI, and Climate+Tech, the study demonstrates that smaller, specialized models offer a more sustainable, transparent, and practical path forward for real-time fact-checking.
The Approach
The team built a two-stage system: first retrieving relevant scientific abstracts from a corpus of 394,000 climate science publications, then verifying claims against those abstracts. Their hybrid retrieval pipeline combined BM25 sparse retrieval with fine-tuned dense embeddings and neural reranking to find the most relevant evidence.
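As a concrete illustration, here is a minimal sketch of such a hybrid pipeline in Python. The specific checkpoints (all-MiniLM-L6-v2 as the bi-encoder, ms-marco-MiniLM-L-6-v2 as the reranker) and the equal-weight score fusion are stand-in assumptions for illustration only; the paper fine-tunes its own dense embedding model on climate data.

```python
import numpy as np
from rank_bm25 import BM25Okapi
from sentence_transformers import SentenceTransformer, CrossEncoder

def hybrid_retrieve(claim, abstracts, top_k=10, pool_size=100):
    # Sparse leg: BM25 over whitespace-tokenized abstracts.
    bm25 = BM25Okapi([a.lower().split() for a in abstracts])
    sparse = np.asarray(bm25.get_scores(claim.lower().split()))

    # Dense leg: cosine similarity between claim and abstract embeddings.
    # (Stand-in checkpoint; the paper uses its own fine-tuned embeddings.)
    encoder = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")
    doc_vecs = encoder.encode(abstracts, normalize_embeddings=True)
    query_vec = encoder.encode([claim], normalize_embeddings=True)[0]
    dense = doc_vecs @ query_vec

    # Fuse the two score lists after min-max normalization (weights assumed).
    norm = lambda x: (x - x.min()) / (x.max() - x.min() + 1e-9)
    fused = 0.5 * norm(sparse) + 0.5 * norm(dense)

    # Neural reranking: rescore the top candidates with a cross-encoder.
    pool = np.argsort(fused)[::-1][:pool_size]
    reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")
    rerank_scores = reranker.predict([(claim, abstracts[i]) for i in pool])
    best = np.argsort(rerank_scores)[::-1][:top_k]
    return [abstracts[pool[i]] for i in best]
```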
For the verification stage, they fine-tuned DeBERTa-v3-large models—starting with models already trained on natural language inference tasks, then specializing them on climate-specific claim-abstract pairs from the ClimateCheck dataset. The key innovation was using class-wise accuracy optimization to handle the imbalanced nature of fact-checking data, where “not enough information” cases are common.
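A minimal sketch of the verification step, assuming a publicly available NLI-pretrained DeBERTa-v3-large checkpoint (MoritzLaurer/DeBERTa-v3-large-mnli-fever-anli-ling-wanli) as a stand-in for the paper's fine-tuned verifier; the class-wise accuracy optimization happens during fine-tuning and is not shown here:

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Stand-in NLI checkpoint; the paper further fine-tunes on ClimateCheck pairs.
NAME = "MoritzLaurer/DeBERTa-v3-large-mnli-fever-anli-ling-wanli"
tokenizer = AutoTokenizer.from_pretrained(NAME)
model = AutoModelForSequenceClassification.from_pretrained(NAME).eval()

# Map NLI labels onto fact-checking verdicts.
VERDICTS = {"entailment": "supports",
            "neutral": "not enough information",
            "contradiction": "refutes"}

def verify(claim, abstract):
    # Premise = retrieved abstract, hypothesis = claim to verify.
    inputs = tokenizer(abstract, claim, truncation=True, return_tensors="pt")
    with torch.no_grad():
        probs = model(**inputs).logits.softmax(dim=-1).squeeze(0)
    idx = int(probs.argmax())
    label = model.config.id2label[idx].lower()
    return VERDICTS.get(label, label), float(probs[idx])
```

Mapping entailment, neutral, and contradiction onto supports, not enough information, and refutes is the standard framing for scientific claim verification, which is why an NLI-pretrained starting point is a natural fit.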
The Results
The fine-tuned BERT model achieved an F1 score of 0.683 and processed claims in 0.032 seconds—competitive accuracy with dramatically better speed. Compared against state-of-the-art LLMs, it outperformed Phi 4 14B across all metrics while running 23x faster. Even Qwen3 14B in reasoning mode, which scored slightly higher at 0.716 F1, took 12.2 seconds per claim—382 times slower.
The speed difference isn’t marginal. Statistical testing on 1,760 claims confirmed the BERT model’s superiority with overwhelming significance (p < 0.001, Cohen’s d = 2.49). For real-time fact-checking where every second counts, this matters immensely.
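To unpack what an effect size that large means (the group means differ by roughly two and a half pooled standard deviations), here is a generic sketch of a Welch's t-test plus Cohen's d on per-claim latencies. The latency distributions below are fabricated stand-ins for illustration; the paper's actual measurements and test procedure are not reproduced here.

```python
import numpy as np
from scipy import stats

def cohens_d(a, b):
    # Effect size: mean difference divided by the pooled standard deviation.
    na, nb = len(a), len(b)
    var = ((na - 1) * a.var(ddof=1) + (nb - 1) * b.var(ddof=1)) / (na + nb - 2)
    return (a.mean() - b.mean()) / np.sqrt(var)

# Hypothetical per-claim latencies in seconds (the paper used 1,760 claims).
rng = np.random.default_rng(0)
llm_latency = rng.normal(12.2, 4.0, 1760)
bert_latency = rng.normal(0.032, 0.01, 1760)

t, p = stats.ttest_ind(llm_latency, bert_latency, equal_var=False)  # Welch's t-test
print(f"t = {t:.1f}, p = {p:.3g}, Cohen's d = {cohens_d(llm_latency, bert_latency):.2f}")
```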
Why This Matters
The speed advantage is just one part of the story. At 0.032 seconds per claim, a single instance of the BERT model can verify about 31 social media posts per second, roughly 1,900 per minute, fast enough to catch misinformation before it gains traction. But smaller models also bring two often-overlooked benefits:
Energy efficiency: A 355M-parameter model requires a small fraction of the compute, and therefore the energy, of a 14B-parameter LLM, reducing both operational costs and carbon footprint. When you're processing millions of claims, this difference compounds dramatically, which matters especially for climate fact-checking, where the tool itself shouldn't contribute to the problem.
Explainability: The BERT model's compact size makes it easier to understand why it reached a particular decision. For fact-checking applications where transparency matters, such as journalists verifying claims or researchers auditing decisions, being able to trace the model's reasoning is crucial. By comparison, the billions of parameters in an LLM make it a black box.
This doesn’t mean LLMs have no place in fact-checking. They may generalize better to unseen domains, and few-shot learning could improve their performance. But for organizations building practical systems where speed, energy use, transparency, and cost all matter, fine-tuned BERT models offer a compelling alternative.
The Team
This research emerged from the ClimateCheck collaboration, bringing together Max Upravitelev, Nicolau Duran-Silva, Christian Woerle, Giuseppe Guarino, Salar Mohtaj, Jing Yang, Veronika Solopova, and Vera Schmitt from TU Berlin, DFKI, Climate+Tech, SIRIS Lab, Data for good, BIFOLD, and CERTAIN.
Access
The paper is published in the ACL Anthology (DOI: 10.18653/v1/2025.sdp-1.26). All code and models are open source: the code is on GitHub, and the retrieval and verification models are on HuggingFace.