Data Quality Library (dqLib): An R Package for Assessing and Reporting Data Quality in Clinical Research and Care

1. Description

The data quality library (dqLib) is an R package for data quality (DQ) assessment. The library provides generic methods for calculating DQ metrics and generating reports on detected DQ issues, especially in clinical research and healthcare settings. The package also provides specific functions for reporting on DQ issues that may arise in the context of rare diseases (RDs) and common diseases such as cardiovascular diseases (CVDs). The current version enables the detection and visualization of plausibility issues based on predefined mathematical and logical rules. To enhance usability, this release allows DQ rules to be specified in spreadsheets. Further details on the developed functions are given in the NEWS file.

2. Installation

You can install dqLib directly from GitHub by running the following command:

devtools::install_github("https://github.com/KaisTahar/dqLib")

To install dqLib, you can also clone the code repository of the desired version or download it, and then run the following command from the local folder:

devtools::install_local("./dqLib")

3. DQ Metrics and Reports

dqLib provides multiple metrics to analyze different aspects of DQ. The implemented functions enable users to select the desired dimensions and indicators and to define and generate customized DQ reports. The following generic DQ indicators are already implemented:

Abbreviation	Name	DQ Dimension
dqi_co_icr	Item Completeness Rate	Completeness
dqi_co_vcr	Value Completeness Rate	Completeness
dqi_co_scr	Subject Completeness Rate	Completeness
dqi_pl_rpr	Range Plausibility Rate	Plausibility
dqi_pl_spr	Semantic Plausibility Rate	Plausibility
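
As a rough illustration of what the three completeness indicators measure, the following base R sketch computes them on a toy data frame. It does not call dqLib's own functions. The helper names, the list of mandatory items, and the formulas (share of mandatory items present, share of non-missing mandatory values, share of subjects without any missing mandatory value) are assumptions made for this example and may differ from the package's implementation.

```r
# Toy data set: one row per subject; NA marks a missing value.
patients <- data.frame(
  id        = c("p1", "p2", "p3", "p4"),
  birthdate = c("1980-01-01", NA, "1975-06-30", "1990-12-24"),
  icd_code  = c("E84.0", "I21.0", NA, "I10"),
  stringsAsFactors = FALSE
)

# Hypothetical list of mandatory items; "gender" is absent from the data.
mandatory <- c("id", "birthdate", "icd_code", "gender")

# Item completeness (assumed definition): share of mandatory items
# that exist as columns in the data set.
item_completeness_rate <- function(data, mandatory) {
  100 * sum(mandatory %in% names(data)) / length(mandatory)
}

# Value completeness (assumed definition): share of non-missing values
# among the mandatory items that are present in the data set.
value_completeness_rate <- function(data, mandatory) {
  present <- intersect(mandatory, names(data))
  values <- data[, present, drop = FALSE]
  100 * sum(!is.na(values)) / (nrow(values) * ncol(values))
}

# Subject completeness (assumed definition): share of subjects with
# no missing value in any available mandatory item.
subject_completeness_rate <- function(data, mandatory) {
  present <- intersect(mandatory, names(data))
  100 * sum(complete.cases(data[, present, drop = FALSE])) / nrow(data)
}

icr <- item_completeness_rate(patients, mandatory)     # 3 of 4 items
vcr <- value_completeness_rate(patients, mandatory)    # 10 of 12 values
scr <- subject_completeness_rate(patients, mandatory)  # 2 of 4 subjects
```

In this toy example, one mandatory item is missing entirely, two mandatory values are missing, and two subjects are incomplete, which yields rates of 75 %, about 83.3 %, and 50 %.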

In addition to the indicators, the DQ reports include the resulting parameters and the information needed to identify potential DQ issues. The dqLib package enables users to specify DQ rules in spreadsheets and to detect DQ issues based on these predefined rules, as described in the NEWS file. dqLib provides functions to detect the following common DQ issues:

Abbreviation	DQ Parameter	Description
im_misg	missing mandatory data items	number of missing mandatory data items
vm_misg	missing mandatory data values	number of missing mandatory data values
s_inc	incomplete subjects	number of incomplete subject records
vo	outlier values	number of detected outlier values
vc	contradictory values	number of detected contradictory data values
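
The idea of rule-based detection can be sketched in base R as follows. This is not dqLib's API: the rule layout (column names `item`, `min`, `max`, `item1`, `value1`, `item2`, `pattern2`), the helper functions, and the toy data are all hypothetical. The rules are written as in-memory data frames here; rules kept in a spreadsheet would first have to be read into an equivalent data frame.

```r
# Toy observations with two out-of-range ages and one contradiction.
obs <- data.frame(
  id       = c("p1", "p2", "p3", "p4"),
  age      = c(54, 130, 47, -2),
  sex      = c("m", "f", "m", "f"),
  icd_code = c("I21.0", "I10", "O80", "I50.9"),
  stringsAsFactors = FALSE
)

# Range rules (hypothetical layout): permitted interval per item.
range_rules <- data.frame(
  item = "age", min = 0, max = 120,
  stringsAsFactors = FALSE
)

# Logical rules (hypothetical layout): value combinations that
# contradict each other, here a male patient with an obstetric
# diagnosis (ICD-10 chapter O).
logic_rules <- data.frame(
  item1 = "sex", value1 = "m",
  item2 = "icd_code", pattern2 = "^O",
  stringsAsFactors = FALSE
)

# Return one row per value that lies outside its permitted range.
find_outliers <- function(data, rules) {
  hits <- lapply(seq_len(nrow(rules)), function(i) {
    x <- data[[rules$item[i]]]
    bad <- !is.na(x) & (x < rules$min[i] | x > rules$max[i])
    data.frame(
      id    = data$id[bad],
      item  = rep(rules$item[i], sum(bad)),
      value = x[bad],
      stringsAsFactors = FALSE
    )
  })
  do.call(rbind, hits)
}

# Return the ids of subjects that violate at least one logical rule.
find_contradictions <- function(data, rules) {
  hits <- lapply(seq_len(nrow(rules)), function(i) {
    bad <- data[[rules$item1[i]]] == rules$value1[i] &
      grepl(rules$pattern2[i], data[[rules$item2[i]]])
    data$id[which(bad)]
  })
  unique(unlist(hits))
}

outliers       <- find_outliers(obs, range_rules)
contradictions <- find_contradictions(obs, logic_rules)

vo <- nrow(outliers)          # number of detected outlier values
vc <- length(contradictions)  # number of detected contradictions
```

With the toy data, the age values of `p2` and `p4` fall outside the range and `p3` violates the logical rule, so `vo` is 2 and `vc` is 1.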

dqLib also provides functions to assess the following specific indicators for RD data:

Abbreviation	Name	DQ Dimension
dqi_un_cur	RD Case Unambiguity Rate	Uniqueness
dqi_un_cdr	RD Case Dissimilarity Rate	Uniqueness
dqi_co_icr	Orphacoding Completeness Rate	Completeness
dqi_pl_opr	Orphacoding Plausibility Rate	Plausibility
dqi_cc_rvl	Concordance with Reference Values from Literature	Concordance

Moreover, dqLib enables annual assessments of selected DQ parameters. The following RD-specific metrics are already implemented:

Abbreviation	DQ Parameter	Description
rdCase	RD cases	number of RD cases
orphaCase	Orpha cases	number of available orpha-coded cases
tracerCase	tracer cases	number of tracer cases
rdCase_rel	RD cases rel. frequency	relative frequency of RD cases
orphaCase_rel	Orpha cases rel. frequency	relative frequency of Orpha cases normalized to 100,000 inpatient cases
tracerCase_rel	tracer cases rel. frequency	relative frequency of tracer cases normalized to 100,000 inpatient cases
tracerCase_rel_min	minimal tracer cases in reference values	min. rel. frequency of tracer cases normalized to 100,000 inpatient cases found in the literature
tracerCase_rel_max	maximal tracer cases in reference values	max. rel. frequency of tracer cases normalized to 100,000 inpatient cases found in the literature
vm_case_misg	missing mandatory data values in case module	number of missing mandatory data values in the case module
rdCase_amb	ambiguous RD cases	number of ambiguous RD cases
rdCase_dup	duplicated RD cases	number of duplicated RD cases
oc_misg	missing Orphacodes	number of missing Orphacodes by tracer diagnoses
link_ip	implausible links	number of implausible ICD-10-GM/OC links
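
The normalization to 100,000 inpatient cases and the comparison with a reference interval can be illustrated with a few lines of base R. The yearly counts and the reference interval below are invented for this example and are not taken from the literature; the helper `rel_frequency()` is hypothetical, and the way dqLib derives the concordance indicator from such values may differ.

```r
# Relative frequency normalized to a fixed number of inpatient cases.
rel_frequency <- function(n_cases, n_inpatient, per = 100000) {
  per * n_cases / n_inpatient
}

# Hypothetical yearly counts (not real data).
yearly <- data.frame(
  year       = c(2020, 2021),
  inpatient  = c(50000, 40000),
  tracerCase = c(25, 6)
)

yearly$tracerCase_rel <- rel_frequency(yearly$tracerCase, yearly$inpatient)

# Hypothetical reference interval per 100,000 inpatient cases.
tracerCase_rel_min <- 20
tracerCase_rel_max <- 80

# A year is flagged as concordant if its relative frequency lies
# within the reference interval.
yearly$concordant <- yearly$tracerCase_rel >= tracerCase_rel_min &
  yearly$tracerCase_rel <= tracerCase_rel_max
```

Here the relative frequencies are 50 and 15 tracer cases per 100,000 inpatient cases, so only the first year falls within the assumed interval.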

The following references are required to assess the quality of RD documentation: (1) the current version of the Alpha-ID-SE terminology [1] and (2) a reference for tracer diagnoses such as the list provided in [2].

[1] BfArM - Alpha-ID-SE [Internet]. [cited 2022 May 23]. Available from: BfArM

[2] Tahar et al. Rare Diseases in Hospital Information Systems — An Interoperable Methodology for Distributed Data Quality Assessments. Methods Inf Med. 2023 Sep;62(3/4):71–89. DOI: 10.1055/a-2006-1018

4. Examples

cordDqChecker: A reporting tool for DQ assessment on RD data implemented using dqLib. This tool provides some examples of DQ reports generated using synthetic data.
cvdDqChecker: A tool for assessing and reporting data quality on CVD data. This tool was also implemented using dqLib. The ./Export folder contains exemplary DQ reports and visualizations.

5. Notes

To cite dqLib, please use the CITATION file located in the folder ./inst.
Acknowledgment: This work was funded by the German Centre for Cardiovascular Research (DZHK), grant number 81X1300117, and the "Collaboration on Rare Diseases" of the Medical Informatics Initiative (CORD-MI) under grant number: 01ZZ1911R, FKZ-01ZZ1911R.
