Analysis and Visualization of Missing Value Patterns

van Stein, Bas; Kowalczyk, Wojtek; Bäck, Thomas

doi:10.1007/978-3-319-40581-0_16

Bas van Stein¹⁶,
Wojtek Kowalczyk¹⁶ &
Thomas Bäck¹⁶

Part of the book series: Communications in Computer and Information Science ((CCIS,volume 611))

Included in the following conference series:

International Conference on Information Processing and Management of Uncertainty in Knowledge-Based Systems

973 Accesses

Abstract

Missing values in datasets form a very relevant and often overlooked problem in many fields. Most algorithms are not able to handle missing values for training a predictive model or analyzing a dataset. For this reason, records with missing values are either rejected or repaired. However, both repairing and rejecting affects the dataset and the final results, creating bias and uncertainty. Therefore, knowledge about the nature of missing values and the underlying mechanisms behind them are of vital importance. To gain more in-depth insight into the underlying structures and patterns of missing values, the concept of Monotone Mixture Patterns is introduced and used to analyze the patterns of missing values in datasets. Several visualization methods are proposed to present the “patterns of missingness” in an informative way. Finally, an algorithm to generate missing values in datasets is provided to form the basis of a benchmarking tool. This algorithm can generate a large variety of missing value patterns for testing and comparing different algorithms that handle missing values.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Log in via an institution

Subscribe and save

Springer+ Basic

$34.99 /Month

Get 10 units per month
Download Article/Chapter or eBook
1 Unit = 1 Article or 1 Chapter
Cancel anytime

Buy Now

Chapter: USD 29.95; Price excludes VAT (USA)

eBook: USD 84.99; Price excludes VAT (USA)

Softcover Book: USD 109.99; Price excludes VAT (USA)

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Correlation Visualization Under Missing Values: A Comparison Between Imputation and Direct Parameter Estimation Methods

Visualizing Missing Data: COVID-2019

Diminishing Unclear Consequences of Missing Values in Data Mining

References

Bache, K., Lichman, M.: UCI machine learning repository (2013). http://archive.ics.uci.edu/ml
Carpenter, J.R., Kenward, M.G.: Multiple Imputation and its Application, 1st edn. Wiley, New York (2013)
Book MATH Google Scholar
Hastie, T., Tibshirani, R., Friedman, J.: The Elements of Statistical Learning: Data Mining, Inference and Prediction, 2nd edn. Springer, New York (2009)
Book MATH Google Scholar
Howell, D.C.: The Analysis of Missing Data. Sage, London (2007)
Google Scholar
Lakshminarayan, K., Harp, S.A., Samad, T.: Imputation of missing data in industrial databases. Appl. Intell. 11, 259–275 (1999)
Article Google Scholar
Little, R.J., Rubin, D.B.: Statistical Analysis with Missing Data, 2nd edn. Wiley, New York (2002)
MATH Google Scholar
Rajaraman, A., Ullman, J.D., Ullman, J.D., Ullman, J.D.: Mining of Massive Datasets, vol. 77. University Press Cambridge, Cambridge (2012)
MATH Google Scholar
Rubin, D.B.: Inference and missing data. Biometrika 63(3), 581 (1976)
Article MathSciNet MATH Google Scholar
Saar-tsechansky, M., Provost, F.: Handling missing values when applying classification models. J. Mach. Learn. Res. 8, 1625–1657 (2007)
MATH Google Scholar
Seaman, S.R., White, I.R.: Review of inverse probability weighting for dealing with missing data. Stat. Methods Med. Res. 22(3), 278–295 (2013)
Article MathSciNet Google Scholar
van Stein, B.: Missing data visualisation (2015). https://github.com/Basvanstein/MissingDataVis
Williams, D., Carin, L.: Incomplete-data classification using logistic regression. In: Proceedings of the 22nd International Conference on Machine learning, pp. 972–979. ACM (2005)
Google Scholar

Download references

Author information

Authors and Affiliations

Leiden Institute of Advanced Computer Science, Leiden University, Niels Bohrweg 1, Leiden, The Netherlands
Bas van Stein, Wojtek Kowalczyk & Thomas Bäck

Authors

Bas van Stein
View author publications
You can also search for this author in PubMed Google Scholar
Wojtek Kowalczyk
View author publications
You can also search for this author in PubMed Google Scholar
Thomas Bäck
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Bas van Stein .

Editor information

Editors and Affiliations

INESC-ID,Instituto Superior Técnico, Universidade de Lisboa, Lisboa, Portugal
Joao Paulo Carvalho
LIP 6, Université Pierre et Marie Curie, Paris, France
Marie-Jeanne Lesot
School of Industrial Engineering, Eindhoven University of Technology, Eindhoven, The Netherlands
Uzay Kaymak
IDMEC,Instituto Superior Técnico, Universidade de Lisboa, Lisboa, Portugal
Susana Vieira
LIP6, Université Pierre et Marie Curie, CNRS, Paris, France
Bernadette Bouchon-Meunier
Iona College, Machine Intelligence Institute, New Rochelle, New York, USA
Ronald R. Yager

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

van Stein, B., Kowalczyk, W., Bäck, T. (2016). Analysis and Visualization of Missing Value Patterns. In: Carvalho, J., Lesot, MJ., Kaymak, U., Vieira, S., Bouchon-Meunier, B., Yager, R. (eds) Information Processing and Management of Uncertainty in Knowledge-Based Systems. IPMU 2016. Communications in Computer and Information Science, vol 611. Springer, Cham. https://doi.org/10.1007/978-3-319-40581-0_16

Download citation

DOI: https://doi.org/10.1007/978-3-319-40581-0_16
Published: 11 June 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-40580-3
Online ISBN: 978-3-319-40581-0
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics