Overton Blog

App update: improving how we track policy progress towards achieving SDGs

We've updated the Sustainable Development Goals (SDGs) filter in Overton to make it more precise and to help users better understand the impact they're having in priority areas. In this post we explore why the SDGs matter and how we’ve improved the filter in the app.

What are the Sustainable Development Goals?

The SDGs are a set of 17 interconnected objectives adopted by all United Nations member states as part of the 2030 Agenda for Sustainable Development. Together they are broken down into 169 targets, which provide specific pathways towards the overall goal of a fairer and more sustainable world. They’re meant to help governments and organisations address these issues in a coordinated and integrated way.

Some are relatively focused, if ambitious - on achieving gender equality (SDG 5) or eliminating hunger (SDG 2) for example - while others are broader, like promoting inclusive and sustainable economic growth (SDG 8).

How are they represented in the Overton app?

Since the goals were adopted, policymakers and researchers have increasingly focused on work that helps address these pressing shared concerns.

To track the progress towards achieving these goals, it’s helpful to be able to see what policy action is being taken on each front. To this end, the Overton app includes a filter which allows users to refine their results by SDG category, so they can see policy documents related to each specific goal.

Overton does things slightly differently from other bibliometric databases that use the SDGs. Those databases typically link SDGs to research papers, which helps researchers understand the societal value or (potential) impact of their work. Overton instead links SDGs to the policy documents that actually use the research, rather than to the research papers themselves.

Our view is that the SDGs are a policy initiative rather than a research one. The goals relate to the application of research rather than the research itself: a research paper’s SDG classification isn't really relevant until something is being done with it. Take, for example, a new economic security policy initiative from a developing nation. It might cite an economics paper about microfinance that makes no reference to women or gender, yet still proves invaluable in advancing SDG 5: Gender Equality (microfinance services are often targeted at women in developing nations, who tend to be disadvantaged in access to credit and other financial services). If we were categorising based on the research alone, this paper probably wouldn’t be linked to SDG 5, but its application in policy does advance the goal.
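To make the distinction concrete, here is a minimal sketch of the data model this implies, assuming each policy document carries its own set of SDG numbers and a list of the papers it cites. The `PolicyDocument` class, its fields and both helper functions are hypothetical illustrations, not Overton's actual schema or API. A paper picks up an SDG only because a policy document that cites it was tagged with that goal, and the app's SDG filter becomes a simple lookup on the policy side.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class PolicyDocument:
    """Hypothetical record: SDGs are attached to the policy document, not the paper."""
    doc_id: str
    sdgs: set[int]  # SDG numbers assigned to this policy document
    cited_papers: list[str] = field(default_factory=list)  # identifiers of cited research


def sdgs_by_paper(policy_docs):
    """Map each cited paper to the union of SDGs of the policy documents citing it."""
    reached = defaultdict(set)
    for doc in policy_docs:
        for paper in doc.cited_papers:
            reached[paper] |= doc.sdgs
    return dict(reached)


def filter_by_sdg(policy_docs, goal):
    """Return the policy documents tagged with the given SDG number."""
    return [doc for doc in policy_docs if goal in doc.sdgs]
```

With an economic security policy tagged `{1, 5, 8}` citing a `"microfinance-paper"`, `sdgs_by_paper` links that paper to SDG 5 even though its own text never mentions gender.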

How we’ve improved the categorisation

While we have had SDG tags in the app for some time, we weren't satisfied with the accuracy of this filter and were concerned that things were being missed.

There are different ways to approach the task of linking policy documents to relevant Sustainable Development Goals. One of them is keyword matching, in which the system looks for certain terms that relate to the various SDG categories. It is easy to understand, but it is hard to assemble a keyword list for each SDG that is broad enough to catch every relevant document yet precise enough to avoid false matches.
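A minimal sketch of keyword matching, assuming a small hand-written term list per SDG. The `SDG_KEYWORDS` lists below are hypothetical and deliberately tiny; they are not the keywords Overton used. Matching on word boundaries avoids some false hits (for instance "hunger" inside "Hungerford"), but the second example shows the recall problem described above: a text about microfinance matches nothing unless someone thought to add the right terms.

```python
import re

# Hypothetical, deliberately tiny keyword lists; a usable list per SDG would be far longer.
SDG_KEYWORDS = {
    2: ["hunger", "food security", "malnutrition"],
    5: ["gender equality", "women's empowerment", "gender-based violence"],
    7: ["renewable energy", "energy efficiency"],
}


def match_sdgs(text, keywords=SDG_KEYWORDS):
    """Return the SDGs with at least one keyword appearing as a whole word or phrase."""
    lowered = text.lower()
    matched = set()
    for goal, terms in keywords.items():
        for term in terms:
            if re.search(r"\b" + re.escape(term) + r"\b", lowered):
                matched.add(goal)
                break  # one hit is enough to tag this goal
    return matched


print(match_sdgs("The programme targets child malnutrition and promotes renewable energy."))
print(match_sdgs("Microfinance expands access to credit for small borrowers."))
```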

Another, more sophisticated approach is to use machine learning to assign SDGs. This works by first creating a training set of documents that the system can learn from. By showing it a thousand text snippets related to SDG 3, another thousand related to SDG 4 and so on, the model learns to classify new documents with a high level of accuracy.
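To illustrate the idea of learning from labelled snippets, here is a minimal multinomial naive Bayes classifier written from scratch. Naive Bayes is a stand-in chosen for brevity: the post does not say which model Overton uses, and `NaiveBayesSDGClassifier` is a hypothetical name. Training just counts which words appear in snippets for each SDG; prediction picks the SDG under which a new text's words are most likely, with add-one smoothing so that a single unseen word doesn't zero out a goal.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayesSDGClassifier:
    """Multinomial naive Bayes with add-one smoothing, trained on (text, sdg) pairs."""

    def fit(self, examples):
        self.word_counts = defaultdict(Counter)  # SDG -> token frequencies
        self.doc_counts = Counter()  # SDG -> number of training snippets
        self.vocab = set()
        for text, sdg in examples:
            tokens = tokenize(text)
            self.word_counts[sdg].update(tokens)
            self.doc_counts[sdg] += 1
            self.vocab.update(tokens)
        self.total_docs = sum(self.doc_counts.values())
        return self

    def log_scores(self, text):
        """Log prior plus smoothed log likelihood of the text under each SDG."""
        tokens = [t for t in tokenize(text) if t in self.vocab]  # ignore unseen words
        scores = {}
        for sdg, counts in self.word_counts.items():
            total = sum(counts.values())
            score = math.log(self.doc_counts[sdg] / self.total_docs)
            for tok in tokens:
                score += math.log((counts[tok] + 1) / (total + len(self.vocab)))
            scores[sdg] = score
        return scores

    def predict(self, text):
        """Return the SDG with the highest score."""
        scores = self.log_scores(text)
        return max(scores, key=scores.get)
```

With only a handful of snippets per goal the model is fragile, which is exactly why a training set of thousands of examples per SDG matters.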

When Overton was first launched we opted for the machine learning approach but rapidly hit a wall when it came to creating a large enough training set… it took us weeks to put together data for just one SDG, so we quickly switched to a keyword-based system.

A recent game-changer, however, has been the release of a free, open-source training set containing thousands of text examples for the first 16 SDGs (goal #17, "Partnerships for the Goals", is a bit more meta and isn't included), produced over the last two years by a group called OSDG. The project team has described the work in two papers on arXiv.

The OSDG project is a large-scale citizen science collaboration: hundreds of volunteers from UN agencies and universities across the world analysed publicly available texts, such as policy documents and research papers, to assess their relevance to each SDG.

The scale of this exercise - coupled with their systematic approach, in which several people analysed each text excerpt before agreeing its classification - makes it particularly trustworthy. 
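Here is a minimal sketch of how several independent judgements on one excerpt can be turned into a single label. This is a generic majority-vote scheme with hypothetical thresholds (`min_votes`, `min_agreement`), not OSDG's documented procedure. Excerpts that too few people reviewed, or on which reviewers disagreed, are left out of the training set rather than given a shaky label.

```python
from collections import Counter


def consensus_label(votes, min_votes=3, min_agreement=0.7):
    """Return the label most annotators chose, or None if there is no clear consensus.

    votes: labels (e.g. SDG numbers) from independent annotators for one excerpt.
    min_votes, min_agreement: hypothetical thresholds, not OSDG's actual settings.
    """
    if len(votes) < min_votes:
        return None  # too few independent judgements to trust
    label, count = Counter(votes).most_common(1)[0]
    if count / len(votes) < min_agreement:
        return None  # annotators disagree too much; drop the excerpt
    return label
```

For example, four votes of `[5, 5, 5, 8]` give SDG 5 (75% agreement), while `[5, 8, 5, 8]` yields no label at all.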

We’ve now used the OSDG training sets to build a new SDG classifier for policy documents, and it achieves much higher precision and recall: the app picks up as many relevant documents as possible while also reducing the likelihood of misclassification.
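Precision is the share of assigned SDG tags that are correct; recall is the share of correct SDG tags that the classifier actually found. A minimal sketch of computing both, micro-averaged over (document, SDG) pairs, is below. The document IDs and labels are invented toy data, not Overton's evaluation results.

```python
def precision_recall(predicted, actual):
    """Micro-averaged precision and recall over (document, SDG) pairs.

    predicted, actual: dicts mapping document id -> set of SDG numbers.
    """
    tp = fp = fn = 0
    for doc in set(predicted) | set(actual):
        p = predicted.get(doc, set())
        a = actual.get(doc, set())
        tp += len(p & a)  # tags assigned and correct
        fp += len(p - a)  # tags assigned but wrong
        fn += len(a - p)  # correct tags that were missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Toy data (invented): one wrong tag on d1, two missed tags on d2 and d3.
predicted = {"d1": {3, 5}, "d2": {7}}
actual = {"d1": {3}, "d2": {7, 13}, "d3": {2}}
print(precision_recall(predicted, actual))
```

Here two of three assigned tags are right (precision 2/3) and two of four true tags were found (recall 0.5).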

However, even this method isn’t perfect. Our current approach is to err on the side of precision: we would rather miss things than tell you something incorrect. For this reason, the classifier never assigns SDGs 9 (“Industry, Innovation and Infrastructure”), 12 (“Responsible Consumption and Production”) or 17 (“Partnerships for the Goals”). They’re very broad topics, so we were seeing a lot of false positives, and we decided that the classifier’s accuracy on them just wasn’t high enough.
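One common way to favour precision is to assign an SDG only when the model's confidence clears a high threshold, and to drop the goals the classifier handles poorly. The sketch below assumes the model outputs a probability per SDG; the `0.8` threshold and the `assign_sdgs` function are hypothetical, and only the excluded goals (9, 12 and 17) come from this post.

```python
# SDGs the classifier never assigns, per this post.
EXCLUDED_SDGS = frozenset({9, 12, 17})


def assign_sdgs(probabilities, threshold=0.8, excluded=EXCLUDED_SDGS):
    """Keep only SDGs whose predicted probability clears a high threshold.

    probabilities: dict mapping SDG number -> model probability in [0, 1].
    threshold: hypothetical cut-off; raising it trades recall for precision.
    """
    return {
        sdg for sdg, p in probabilities.items()
        if p >= threshold and sdg not in excluded
    }


print(assign_sdgs({3: 0.92, 5: 0.55, 9: 0.97, 13: 0.81}))
```

SDG 9 is dropped despite its high score because it is excluded outright, and SDG 5 is dropped because 0.55 falls below the cut-off. Lowering the threshold would recover more true tags at the cost of more false positives.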

We’re still working on the classifier and experimenting with ways to improve precision and recall, so watch this space for further updates!


Learn more about our SDG classification update here, and check out the refreshed categories yourself in our app.

What is Overton?

We help universities, think tanks and publishers understand the reach and influence of their research.

The Overton platform is the world’s largest searchable policy database, with almost 5 million documents from 29k organisations.

We track everything from white papers to think tank policy briefs to national clinical guidelines, and automatically find the references to scholarly research, academics and other outputs.