CWTS-OpenAlex-databases

Open source, MIT license, TSQL

About CWTS-OpenAlex-databases

CWTS OpenAlex ETL data pipeline.

CWTS OpenAlex databases

This repository contains the source code used by CWTS (Centre for Science and Technology Studies, Leiden University) to extract, transform, and load (ETL) data from OpenAlex into a Microsoft SQL Server database system.

The source code produces five Microsoft SQL Server databases:

(1) Database containing data from OpenAlex in a relational format.

(2) Database containing titles and abstracts of publications.

(3) Database containing data on core publications.

(4) Database containing a classification of publications into research areas.

(5) Database containing stored procedures for indicator calculations.

See this blog post for more information about databases (3), (4), and (5).

This repository makes use of the CWTS ETL tooling repository, the publicationclassification repository, and the publicationclassificationlabeling repository.

Database structure and diagram

Database (1), containing data from OpenAlex in a relational format, is organized into multiple interrelated tables representing key OpenAlex entities such as works, authors, institutions, and sources, along with their relationships.

The diagram below presents the structure of this relational database:

Database diagram
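
To make the idea of "interrelated tables" concrete, here is a minimal sketch in Python using an in-memory SQLite database rather than Microsoft SQL Server. Every table name, column name, and data row in it is hypothetical and chosen only for illustration; the actual CWTS schema is the one shown in the diagram and may differ substantially. The sketch only shows the general pattern of entity tables (works, authors, institutions, sources) connected through a link table.

```python
import sqlite3

# NOTE: all table and column names below are hypothetical. They are not
# taken from the CWTS schema and exist only to illustrate the pattern.
SCHEMA = """
CREATE TABLE source (source_id INTEGER PRIMARY KEY, source_name TEXT);
CREATE TABLE institution (institution_id INTEGER PRIMARY KEY, institution_name TEXT);
CREATE TABLE author (author_id INTEGER PRIMARY KEY, author_name TEXT);
CREATE TABLE work (
    work_id INTEGER PRIMARY KEY,
    title TEXT,
    pub_year INTEGER,
    source_id INTEGER REFERENCES source(source_id)
);
-- Link table: one row per author position on a work, with an optional affiliation.
CREATE TABLE work_author (
    work_id INTEGER REFERENCES work(work_id),
    author_seq INTEGER,
    author_id INTEGER REFERENCES author(author_id),
    institution_id INTEGER REFERENCES institution(institution_id),
    PRIMARY KEY (work_id, author_seq)
);
"""


def build_demo_db():
    """Create an in-memory database and load a few made-up toy rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO source VALUES (1, 'Journal A')")
    conn.execute("INSERT INTO institution VALUES (1, 'University X')")
    conn.executemany(
        "INSERT INTO author VALUES (?, ?)",
        [(1, "Author One"), (2, "Author Two")],
    )
    conn.execute("INSERT INTO work VALUES (1, 'Example work', 2020, 1)")
    conn.executemany(
        "INSERT INTO work_author VALUES (?, ?, ?, ?)",
        [(1, 1, 1, 1), (1, 2, 2, None)],  # second author has no affiliation
    )
    conn.commit()
    return conn


def authors_of_work(conn, work_id):
    """Return (position, author name, institution name) for one work."""
    return conn.execute(
        """
        SELECT wa.author_seq, a.author_name, i.institution_name
        FROM work_author AS wa
        JOIN author AS a ON a.author_id = wa.author_id
        LEFT JOIN institution AS i ON i.institution_id = wa.institution_id
        WHERE wa.work_id = ?
        ORDER BY wa.author_seq
        """,
        (work_id,),
    ).fetchall()
```

Calling `authors_of_work(build_demo_db(), 1)` walks from a work through the link table to its authors and their affiliations. The `LEFT JOIN` keeps authors that have no recorded institution.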

Availability in Google BigQuery

The ETL process also includes functionality to make the extracted and transformed data available in Google BigQuery.

In addition to the Microsoft SQL Server environment, databases (1), (3), and (4) are publicly available in the Google BigQuery environment of CWTS, enabling cloud-based querying and large-scale analysis.
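
As a rough illustration of cloud-based querying, the sketch below builds a BigQuery Standard SQL query in Python and, optionally, runs it with the `google-cloud-bigquery` client library. The project, dataset, table, and column names are placeholders, not the real identifiers of the CWTS BigQuery environment; substitute the actual names before use. Running the query also assumes that you have a Google Cloud project for billing and valid credentials.

```python
def count_works_by_year_sql(project, dataset, table="work", year_column="pub_year"):
    """Build a BigQuery Standard SQL query that counts rows per year.

    All identifiers are caller-supplied placeholders; the defaults are
    hypothetical and not taken from the CWTS databases.
    """
    table_ref = f"`{project}.{dataset}.{table}`"
    return (
        f"SELECT {year_column} AS year, COUNT(*) AS n_works\n"
        f"FROM {table_ref}\n"
        f"GROUP BY {year_column}\n"
        f"ORDER BY {year_column}"
    )


def run_query(sql, billing_project):
    """Run a query and return its rows (requires google-cloud-bigquery)."""
    # Imported lazily so the SQL builder above works without the library.
    from google.cloud import bigquery

    client = bigquery.Client(project=billing_project)
    return list(client.query(sql).result())
```

For example, `count_works_by_year_sql("some-project", "some_dataset")` returns the query text, which can be pasted into the BigQuery console or passed to `run_query` together with your own billing project.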