🆔 A Python library for accurate and scalable fuzzy matching, record deduplication, and entity resolution.
Updated Nov 25, 2024 - Python
A powerful and modular toolkit for record linkage and duplicate detection in Python
🆔 Command line tool for deduplicating CSV files
🆔 Examples for using the dedupe library
Identifying and removing near-duplicate images using perceptual hashing.
Fast block-level out-of-band BTRFS deduplication tool.
Curated list of awesome software and resources for Senzing, The First Real-Time AI for Entity Resolution.
Fast Scalable Dedupe - Fuzzy Matching With Opensearch + nmslib + Rapidfuzz
Base class for dedupe variables for parsed fields
Duplicate file finder - with % duplication of folders
Project that takes two similar zip files and dedupes files that have the same timestamp in the older file.
Model for data deduplication assignment.
Developed for use by the NY Office of the Attorney General: a Python library for scalable entity resolution that uses active learning to learn blocking configurations, generate comparison pairs, and then classify matches.