A one-stop, open-source, high-quality data extraction tool that converts PDF to Markdown and JSON.
Updated Mar 13, 2025 - Python
A parser turns its input (often text in the form of a file) into a more useful representation (usually a data structure in memory) so that a specific task can be performed on it.
Common examples include:
File parser optimised for lossless LLM ingestion 🧠 Parses PDF, DOCX, and PPTX files into a format that is ideal for LLMs.
Lark is a parsing toolkit for Python, built with a focus on ergonomics, performance and modularity.
oletools - Python tools to analyze MS OLE2 files (Structured Storage, Compound File Binary Format) and MS Office documents, for malware analysis, forensics and debugging.
RDFLib is a Python library for working with RDF, a simple yet powerful language for representing information.
A Python module to repair invalid JSON from LLMs.
Type-safe YAML parser and validator.
Cinemagoer is a Python package for retrieving and managing data about movies, people, characters and companies from the IMDb movie database (with which we are not affiliated in any way).
High-accuracy NLP parser with models for 11 languages.
Wiktionary dump file parser and multilingual data extractor
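To make the definition of a parser above concrete, here is a minimal sketch of a hand-written recursive-descent parser that turns arithmetic text into a nested-tuple tree and then evaluates it. It is not taken from any of the projects listed here; the names `tokenize`, `parse`, and `evaluate` and the tuple layout `(operator, left, right)` are assumptions chosen for illustration, and only the Python standard library is used.

```python
import operator
import re

# One token is a run of digits or a single operator/parenthesis,
# optionally preceded by whitespace.
TOKEN_RE = re.compile(r"\s*(\d+|[-+*/()])")

# Maps operator tokens to the functions that implement them.
OPS = {"+": operator.add, "-": operator.sub,
       "*": operator.mul, "/": operator.truediv}


def tokenize(text):
    """Split an arithmetic expression into a list of string tokens."""
    text = text.rstrip()
    tokens, pos = [], 0
    while pos < len(text):
        match = TOKEN_RE.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected character at position {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def parse(text):
    """Parse text into an int or a nested (operator, left, right) tuple."""
    tokens = tokenize(text)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        token = peek()
        pos += 1
        return token

    def expr():  # expr := term (("+" | "-") term)*
        node = term()
        while peek() in ("+", "-"):
            op = advance()
            node = (op, node, term())
        return node

    def term():  # term := factor (("*" | "/") factor)*
        node = factor()
        while peek() in ("*", "/"):
            op = advance()
            node = (op, node, factor())
        return node

    def factor():  # factor := NUMBER | "(" expr ")"
        token = advance()
        if token is None:
            raise ValueError("unexpected end of input")
        if token == "(":
            node = expr()
            if advance() != ")":
                raise ValueError("expected ')'")
            return node
        if token.isdigit():
            return int(token)
        raise ValueError(f"unexpected token {token!r}")

    tree = expr()
    if peek() is not None:
        raise ValueError(f"unexpected token {peek()!r}")
    return tree


def evaluate(node):
    """Walk the tree produced by parse() and compute its value."""
    if isinstance(node, int):
        return node
    op, left, right = node
    return OPS[op](evaluate(left), evaluate(right))


tree = parse("1 + 2 * 3")
print(tree)            # ('+', 1, ('*', 2, 3))
print(evaluate(tree))  # 7
```

The text `1 + 2 * 3` carries operator precedence only implicitly, while the tree makes it explicit, which is what lets `evaluate` stay a few lines long. Toolkits such as Lark generate this kind of parser from a grammar instead of requiring it to be written by hand.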