MinerU-HTML
MinerU-HTML: an HTML main-content extractor, powered by a small language model (SLM), that outputs clean HTML bodies. Suited to Deep Research agents, RAG applications, and training-data generation.
A diffusion-based framework for document OCR that replaces autoregressive decoding with block-level parallel diffusion decoding.
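Here is a rough, back-of-the-envelope sketch of why block-level parallel decoding can reduce latency compared with autoregressive decoding. It counts sequential forward passes only. The block size and the number of denoising steps per block are hypothetical parameters, not values taken from the framework. The model also assumes a fixed step count per block, which real systems may adapt.

```python
import math


def autoregressive_steps(num_tokens: int) -> int:
    """Sequential passes for autoregressive decoding: one pass per generated token."""
    return num_tokens


def block_diffusion_steps(num_tokens: int, block_size: int, denoise_steps: int) -> int:
    """Sequential passes for a simplified block-level diffusion decoder.

    Assumption (illustrative only): blocks are decoded left to right, and every
    position inside a block is refined in parallel over a fixed number of
    denoising passes, so each block costs `denoise_steps` passes regardless of
    its size.
    """
    num_blocks = math.ceil(num_tokens / block_size)
    return num_blocks * denoise_steps


if __name__ == "__main__":
    n = 1024  # hypothetical output length in tokens
    print("autoregressive:", autoregressive_steps(n))
    print("block diffusion:", block_diffusion_steps(n, block_size=32, denoise_steps=8))
```

Under these assumptions, a 1024-token page takes 1024 sequential passes autoregressively. With blocks of 32 and 8 denoising passes each, it takes 32 × 8 = 256 passes. The cost per pass and the resulting accuracy are not modeled here.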