awesome-ocr
Awesome OCR is a curated collection of open-source tools, libraries, and research resources for optical character recognition and related document image processing tasks. The repository organizes projects into categories: deskewing and dewarping, for correcting skewed or distorted document images; document segmentation, for analyzing page layouts; and subcategories for line, character, and word segmentation. Featured projects include jdeskew for Fourier-based skew estimation, DewarpNet for deep-learning-based document dewarping, unpaper for post-processing scanned pages, LayoutParser for layout analysis, dhSegment for document segmentation, and pagedewarp, which uses a cubic sheet model for image correction. The collection covers both classical image processing approaches, such as Hough transforms and Canny edge detection, and modern deep learning methods, including GANs, LSTMs, and attention networks. It also includes specialized tools for handwritten text detection, Chinese word segmentation, e