Overton Blog

Characteristics of Overton's data

Overton's database of policy documents is extensive – at the time of writing, we have 6,069,397 policy documents from 30,644 policy organizations – and we're adding more all the time. On top of making their full text searchable, we categorize and analyse each one, storing its metadata (which organisation published it, what topics it covers, what year it was published in, which academics it mentions by name, what scholarly papers or other policy documents it cites, etc.) for later use.

We've always shared our data with external researchers, and we've recently been exploring Overton's potential use as a bibliometric data source. A key step towards that is understanding more about the data: what it does and doesn't include, and how to characterize what is there. To that end, we've been working with Martin Szomszor of Electric Data Solutions, who analysed the characteristics of the Overton database in this recent preprint on arXiv.


Geographical coverage

When building Overton, we set out to avoid geographical bias as far as possible: we track policy sources from over 180 different countries, and the system was built to be broadly language agnostic.

That said, in practice we're limited by availability: we only capture policy documents made available online. This means the number of policy documents in Overton is skewed towards knowledge economies and other countries with a strong online government presence.




This skew is reflected in the regional breakdown, where Northern America and Europe account for around a third of all policy documents in Overton, followed in descending order by Asia, IGOs (intergovernmental organizations, like the UN or World Health Organization), Latin America & the Caribbean, Oceania and Africa.





English is the most common language used in policy documents indexed by Overton. That's not surprising: English is a primary language (though not necessarily the only language!) used for government business in the UK, US, Australia and Canada, as well as other Commonwealth countries like India. Many other countries also publish at least some of their documents in English when they're aimed at an international audience.

The next most common languages that appear in Overton's policy documents are French, Spanish, German, Japanese and Chinese.


Source types

We collect documents from think tanks, policy-orientated NGOs, IGOs and other organizations. However, the majority of documents in the database are from government departments and agencies, at a mix of national, state and city levels.

References to scholarly work from policy documents

We automatically extract any references to scholarly outputs and researchers from the policy documents that we index in Overton. Approximately 10% of policy documents reference at least one scholarly paper; the others may reference only other policy documents, news stories and/or datasets, or may not contain any references at all.

In general, government documents are the least likely to contain scholarly references, with think tank documents having (very approximately) twice as many on average and IGO documents many times more. The "other" category in the graph below includes aggregators, including one of our sources of clinical guidelines. These make heavy use of scholarly citations, which is why this category has a higher average citation count.


You can read more about Overton's data in Martin Szomszor's preprint "Overton: A bibliometric database of policy document citations" on arXiv. It covers, among other things, how long it typically takes for research to be cited in policy, and where the data may and may not be useful in more quantitative analyses.

If you'd like to know more about Overton's policy document database and how we work with institutions, funders, publishers and think tanks, we'd love to show you round – request a trial here.

What is Overton

We help universities, think tanks and publishers understand the reach and influence of their research.

The Overton platform is the world’s largest searchable policy database, with almost 5 million documents from 29k organisations.

We track everything from white papers to think tank policy briefs to national clinical guidelines, and automatically find the references to scholarly research, academics and other outputs.