Wiktionary dump file parser and multilingual data extractor
Updated Mar 20, 2025 - Python
Extract data from German Wiktionary XML files.
Code for the paper: Wikinflection: Massive semi-supervised generation of multilingual inflectional corpus from Wiktionary (Metheniti and Neumann, 2018)
This repository contains a Python script for parsing an XML dump of the Italian Wiktionary (Wikizionario). It also contains the parsed dictionary as a JSON file and a scraper for ONLI (an Italian database of neologisms), with the scraped data in a CSV file.
A library for parsing the French Wiktionary
Extraction of Russian word forms and their segmentation from the Russian Wiktionary
Selected data-processing scripts, including a language-agnostic multilingual Wiktionary parser
Parses the Russian Wiktionary HTML dumps into JSON and generates ereader dictionaries
A scraper which extracts data from the German Wiktionary HTML dump.
🇫🇷 Source code for frenchhomophones website. [inactive]
Prototype of an interface to use Wiktionary translations
A Python package to parse and extract data from the German Wiktionary. It allows users to access wikitext content, either by fetching it directly online or by loading a dump file locally.
Extract hyphenation from Italian Wiktionary
A Hands-On Guide to Parsing Wikitext with Python
Simple and memory-efficient word extractor for Wiktionary
English-Deutsch (Sorted by Frequency)
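Several of the projects above read Wiktionary XML dump files, and one describes itself as memory-efficient. As a rough illustration of that general idea (not the method of any listed repository), the sketch below streams pages out of a dump with the standard library's `xml.etree.ElementTree.iterparse`. The function name `iter_pages` and the inline sample are hypothetical; the `page`, `title`, and `text` element names are assumed to follow the MediaWiki XML export layout.

```python
import io
import xml.etree.ElementTree as ET


def iter_pages(source):
    """Yield (title, wikitext) pairs one page at a time (hypothetical helper)."""
    title, text = None, ""
    for _event, elem in ET.iterparse(source, events=("end",)):
        # Export files declare an XML namespace, so compare local names only.
        tag = elem.tag.rsplit("}", 1)[-1]
        if tag == "title":
            title = elem.text
        elif tag == "text":
            text = elem.text or ""
        elif tag == "page":
            yield title, text
            title, text = None, ""
            # Drop the finished page's children and text before moving on.
            elem.clear()


# Tiny stand-in for a real dump; assumed to mirror the export layout.
SAMPLE = b"""<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.10/">
  <page>
    <title>Haus</title>
    <revision><text>== Haus ({{Sprache|Deutsch}}) ==</text></revision>
  </page>
  <page>
    <title>chat</title>
    <revision><text>== {{langue|fr}} ==</text></revision>
  </page>
</mediawiki>"""

pages = list(iter_pages(io.BytesIO(SAMPLE)))
for page_title, wikitext in pages:
    print(page_title, len(wikitext))
```

Because `iter_pages` accepts any binary file object, a compressed dump could be passed in through `bz2.open(path, "rb")` without first unpacking it to disk.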