Meta-analysis ≠ Data aggregation

Meta-analyses are becoming more popular than ever!

They play a vital role in producing guideline recommendations. However, data aggregation is often mistaken for meta-analysis, which turns the tool into a ‘rubbish in, rubbish out’ machine. I am reading and assessing dozens of Cochrane reviews for a project I am working on, and while doing so I realised that many of these reviews are NOT actually meta-analyses. Yet they are perceived as such, because they use the meta-analysis tool, and they are used to produce evidence-based guidelines.

What is a meta-analysis?
A meta-analysis is a statistical method for estimating an effect size from a collection of empirical studies. As with RCTs, statistical power is essential for producing a proper meta-analysis, together with the accompanying sub-analyses such as the p-curve, the funnel plot, trim-and-fill, and checks for moderators.
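To make this concrete, below is a minimal sketch (in Python, with invented effect sizes and variances) of the statistical core of a random-effects meta-analysis: each study is weighted by the inverse of its variance, and the between-study heterogeneity (tau²) is estimated explicitly, here with the DerSimonian-Laird method. In practice you would use dedicated software such as metafor in R or RevMan; the sketch only shows what the tool is doing under the hood.

```python
import numpy as np

# Hypothetical per-study effect sizes (e.g. Hedges' g) and their variances;
# the values are invented for illustration, not taken from any real review.
effects   = np.array([0.42, 0.31, 0.55, 0.18, 0.47, 0.29, 0.38, 0.51])
variances = np.array([0.04, 0.06, 0.05, 0.09, 0.03, 0.07, 0.05, 0.06])

# Fixed-effect (inverse-variance) estimate
w_fe  = 1.0 / variances
mu_fe = np.sum(w_fe * effects) / np.sum(w_fe)

# DerSimonian-Laird estimate of the between-study variance tau^2
q    = np.sum(w_fe * (effects - mu_fe) ** 2)            # Cochran's Q
df   = len(effects) - 1
c    = np.sum(w_fe) - np.sum(w_fe ** 2) / np.sum(w_fe)
tau2 = max(0.0, (q - df) / c)

# Random-effects estimate with tau^2-adjusted weights
w_re  = 1.0 / (variances + tau2)
mu_re = np.sum(w_re * effects) / np.sum(w_re)
se_re = np.sqrt(1.0 / np.sum(w_re))
ci    = (mu_re - 1.96 * se_re, mu_re + 1.96 * se_re)

# I^2: share of total variability attributable to heterogeneity
i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0

print(f"Pooled effect (random effects): {mu_re:.2f}, 95% CI {ci[0]:.2f} to {ci[1]:.2f}")
print(f"tau^2 = {tau2:.3f}, I^2 = {i2:.0f}%")
```

Everything the pooled estimate means depends on how well tau², the study weights and the sampling errors are estimated, which is exactly what goes wrong when the input is ‘rubbish’.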

What is the difference between meta-analysis and data aggregation?

A meta-analysis is a statistical technique that combines the results of multiple studies to obtain a more precise estimate of the treatment effect, such as an odds ratio (OR), risk ratio (RR), Cohen’s d or r. Data aggregation, on the other hand, is the process of combining data from multiple sources into a single dataset: it can refer to any way of bringing data together, whereas meta-analysis is a specific statistical technique. In practice, meta-analysis tools are often used for mere data aggregation. I'm afraid pooling studies when the number of included individual studies (K) is below 10 is methodologically weak.
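A toy example (Python, invented numbers) of the difference: simply adding up the 2x2 tables of two trials, which is aggregation, versus combining the per-study odds ratios with inverse-variance weights, which is a (fixed-effect) meta-analysis. The trials are deliberately unbalanced to show that naive aggregation can distort or even flip the direction of the effect (Simpson's paradox).

```python
import numpy as np

# Two hypothetical trials as 2x2 tables, given as (events, total) per arm.
# The numbers are invented and deliberately unbalanced across arms.
trials = [
    {"treat": (10, 200), "control": (5, 50)},
    {"treat": (30, 50),  "control": (100, 200)},
]

def log_odds_ratio(events_t, n_t, events_c, n_c):
    """Log odds ratio and its variance from a 2x2 table (0.5 continuity correction)."""
    a, b = events_t + 0.5, (n_t - events_t) + 0.5
    c, d = events_c + 0.5, (n_c - events_c) + 0.5
    return np.log((a * d) / (b * c)), 1 / a + 1 / b + 1 / c + 1 / d

# Data aggregation: add all counts together and compute a single odds ratio.
et = sum(t["treat"][0] for t in trials)
nt = sum(t["treat"][1] for t in trials)
ec = sum(t["control"][0] for t in trials)
nc = sum(t["control"][1] for t in trials)
naive_lor, _ = log_odds_ratio(et, nt, ec, nc)

# Meta-analysis: per-study effects combined with inverse-variance weights.
lors, variances = map(np.array, zip(*(
    log_odds_ratio(t["treat"][0], t["treat"][1], t["control"][0], t["control"][1])
    for t in trials)))
weights = 1.0 / variances
pooled_lor = np.sum(weights * lors) / np.sum(weights)

print(f"Naive aggregated OR:        {np.exp(naive_lor):.2f}")   # ~0.27, looks 'protective'
print(f"Inverse-variance pooled OR: {np.exp(pooled_lor):.2f}")  # ~1.11, direction flips
```

Even the ‘correct’ pooled estimate above says nothing about whether two trials are enough to pool in the first place, which brings us to K.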

It is recommended to have at least 8 to 10 studies in a meta-analysis to ensure that the results are statistically robust and generalisable. A lower number may suffice if the N in the included studies is high (individual studies with high power), the effect size (Hedges’ g or Cohen’s d) is large, and heterogeneity is low. Still, you will find reviews that ‘pool’ several ‘similar’ studies with, most often, low power, small effect sizes, high heterogeneity, and small K. Examples of the latter are the meta-analyses conducted in Cochrane Reviews. Cochrane aggregates data even for K=1 and K=2, and these are reported as meta-analyses. Cochrane, and others, add data up (aggregation) and ask the meta-analysis tool to calculate a new effect size, which is then presented as such without properly accounting for heterogeneity and sampling error.
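To illustrate why a small K is a problem, here is a small simulation (Python; the true effect of 0.3, the between-study variance of 0.1 and the per-study standard error of 0.4, roughly what a very small trial yields for a standardised mean difference, are all assumed values). With K=2 the DerSimonian-Laird estimate of tau² frequently collapses to exactly zero, so the heterogeneity is simply invisible, and the pooled estimate is far more unstable than with K=10.

```python
import numpy as np

rng = np.random.default_rng(42)

def dl_pool(effects, variances):
    """DerSimonian-Laird random-effects pooled estimate and tau^2."""
    w = 1.0 / variances
    mu_fe = np.sum(w * effects) / np.sum(w)
    q = np.sum(w * (effects - mu_fe) ** 2)
    c = np.sum(w) - np.sum(w ** 2) / np.sum(w)
    tau2 = max(0.0, (q - (len(effects) - 1)) / c)
    w_re = 1.0 / (variances + tau2)
    return np.sum(w_re * effects) / np.sum(w_re), tau2

true_effect, true_tau2, n_sims = 0.3, 0.1, 5000   # assumed 'true' values

for k in (2, 10):
    pooled, tau2_is_zero = [], 0
    for _ in range(n_sims):
        se2 = np.full(k, 0.40 ** 2)                                  # tiny trials
        theta = rng.normal(true_effect, np.sqrt(true_tau2), size=k)  # true study effects
        y = rng.normal(theta, np.sqrt(se2))                          # observed effects
        mu_hat, tau2_hat = dl_pool(y, se2)
        pooled.append(mu_hat)
        tau2_is_zero += (tau2_hat == 0.0)
    print(f"K={k:2d}: SD of pooled estimate = {np.std(pooled):.2f}, "
          f"tau^2 estimated as exactly 0 in {100 * tau2_is_zero / n_sims:.0f}% of runs")
```

A ‘meta-analysis’ with K=2 therefore mostly reports back the noise of two small trials, dressed up as a pooled effect size.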

Here I present two examples of data aggregation that the authors term a meta-analysis, which can have massive implications for practices that are supposed to be evidence-based.

Example 1: WHO intrapartum guideline with a strong recommendation on using chlorhexidine for routine vaginal cleansing during labour to prevent infectious morbidities.

Link to the guideline. Link to the evidence table (table 2). Link to the Cochrane Review used as evidence.

Example 2: the draft Dutch dentistry guideline (DDG) on antibiotic use (the final version has yet to be published). Link to the draft guideline.

Both the WHO and the DDG based their recommendations on aggregated data and NOT on a meta-analysis, even though the statistical tooling of meta-analysis was used.

The WHO guideline

The WHO guideline, based on a Cochrane review, aggregated data from fewer than 3 RCTs, and sometimes even from a single RCT.

Please stop pooling a single RCT. Instead, report its original data separately, not in a forest plot.

Yet these ‘pooled’ effect sizes are used to draft a new recommendation. The power is too low, the heterogeneity is too high, and the 95% CI is too wide, indicating that the estimates are imprecise and cannot be assumed to come close to the actual effect. No trim-and-fill or other publication-bias analyses were conducted to put the provided estimates into context.

Would you base your interventions on low-powered RCTs?
No?
Then why would you do so for low-powered meta-analyses?

What could the WHO have done instead?
They could have aggregated the data without pooling it or running extensive analyses on the low-powered studies. They could then have argued that more evidence is needed to support the intervention, and provided nuance around the aggregated data and the available evidence.

The Dutch Dentistry Guideline
An example from the DDG is the pooled analysis reported in figure 3, page 30 of the draft guideline. Here the group pooled 2 RCTs (which are not comparable; the two studies have different populations and intervention groups) and drafted a treatment recommendation on the use of amoxicillin in sinus augmentation (implantology). The guideline states that the dentist should be careful in administering amoxicillin for a sinus lift since ‘no effect’ was found. Such a statement cannot be made with N=10 per study arm and K=2 in a meta-analysis in which the RCTs are not comparable.

An example of a flawed meta-analysis from the Dutch dentistry guideline on the use of antibiotics in oral medicine. The RCTs of Lindeboom and Momand are not comparable and are very low-powered, and are therefore insufficient to produce a reliable effect size or to draft treatment recommendations.

What could the DDG have done instead?
In the case of amoxicillin use in sinus lifts for implantologists, they could have stated that no recommendation could be made due to the lack of data, the low power, the high risk of bias and the inconsistency (wide 95% CIs). Instead, indirect evidence and reasoning from the science of biofilm infections and oral microbiology could have been communicated and put in the perspective of possible antimicrobial resistance and treatment success.

In conclusion
Taking 2 or 3 RCTs with similar PICO elements, pooling their estimates and producing a new one is NOT the right way to do a meta-analysis. Even if you have only 2 RCTs and wish to pool them somehow, a Bayesian approach is the better method, and even then the new effect size must be interpreted with much caution.
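For completeness, a minimal sketch of what such a Bayesian approach could look like: a normal-normal hierarchical model for two trials with a weakly informative half-normal prior on the between-study SD, evaluated on a simple grid. All numbers below (the two observed log odds ratios, their standard errors and the prior scales) are invented assumptions; in practice you would use dedicated software such as the bayesmeta package in R or a proper MCMC tool, and the prior on tau would need to be justified for the clinical field.

```python
import numpy as np

# Hypothetical observed log odds ratios and standard errors from two small RCTs;
# all values below (data and prior scales) are assumptions for illustration only.
y  = np.array([-0.40, 0.10])
se = np.array([0.55, 0.60])

# Grid over the mean effect mu and the between-study SD tau.
mu_grid  = np.linspace(-2.0, 2.0, 401)
tau_grid = np.linspace(0.0, 1.5, 151)
MU, TAU  = np.meshgrid(mu_grid, tau_grid, indexing="ij")

# Priors: Normal(0, 2) on mu, half-Normal(0.5) on tau (weakly informative choices).
log_prior = -0.5 * (MU / 2.0) ** 2 - 0.5 * (TAU / 0.5) ** 2

# Marginal likelihood of each study: y_i ~ Normal(mu, se_i^2 + tau^2).
log_lik = np.zeros_like(MU)
for yi, sei in zip(y, se):
    var = sei ** 2 + TAU ** 2
    log_lik += -0.5 * np.log(2 * np.pi * var) - 0.5 * (yi - MU) ** 2 / var

post = np.exp(log_prior + log_lik)
post /= post.sum()

# Posterior for mu, marginalising over tau.
post_mu = post.sum(axis=1)
mean_mu = np.sum(mu_grid * post_mu)
cdf = np.cumsum(post_mu)
lo, hi = mu_grid[np.searchsorted(cdf, 0.025)], mu_grid[np.searchsorted(cdf, 0.975)]

print(f"Posterior mean log-OR: {mean_mu:.2f} (OR ~ {np.exp(mean_mu):.2f})")
print(f"95% credible interval: {lo:.2f} to {hi:.2f}")
```

Even in this framework the credible interval stays very wide with K=2, and the result leans heavily on the prior for tau, which is precisely why such an estimate must be communicated with caution rather than turned into a firm recommendation.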

Policymakers, guideline developers and researchers in public health and (oral) medicine often collect similar data and pool them. Yet many confuse data aggregation with meta-analysis and base their recommendations on very low-powered, methodologically poor analyses. The consequence is that guideline recommendations are not really evidence-based (K=2 is insufficient for a meta-analysis) and are poorly powered, yet they are disseminated as if they were. Imagine the public health effects when clinicians implement an intervention that is insufficiently justified by the evidence.
