open-semantic-search
Open Semantic Search is an open-source, integrated search server and text mining platform for searching, browsing, analyzing, and exploring large document collections. It combines semantic search with text analytics through an ETL framework that handles crawling, text extraction, OCR for images and PDFs, and named entity recognition for persons, organizations, and locations. The platform supports metadata management via thesauri and ontologies, and provides search user interfaces and apps for full-text search, faceted search, exploratory search, and knowledge graph search. It can be installed as a deb package on Debian or Ubuntu, run in Docker containers with the web interface exposed on port 8080, or deployed as a virtual desktop search appliance for VirtualBox. The project includes integration tests that run against Solr, spaCy, and Tika services, as well as end-to-end browser tests with Playwright and Jest.
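Because the platform uses Solr and offers faceted search, a faceted full-text query can be illustrated with a small sketch. The code below only builds a request URL for Solr's standard `/select` handler using the standard `q`, `rows`, `wt`, `facet`, and `facet.field` parameters. The host, port, core name (`example-core`), and field names (`person_ss`, `organization_ss`) are hypothetical placeholders, not values taken from this project's configuration, so check them against your own installation.

```python
from urllib.parse import urlencode

# Assumptions for illustration only: the host, port, core name, and field
# names are hypothetical and are not taken from the project's configuration.
SOLR_BASE = "http://localhost:8983/solr"
CORE = "example-core"


def build_faceted_query(text, facet_fields, rows=10):
    """Return a Solr /select URL for a full-text query with field facets."""
    params = [
        ("q", text),          # full-text query string
        ("rows", str(rows)),  # number of documents to return
        ("wt", "json"),       # ask Solr for a JSON response
        ("facet", "true"),    # enable faceting
    ]
    # Solr accepts the facet.field parameter repeated once per field.
    params += [("facet.field", field) for field in facet_fields]
    return f"{SOLR_BASE}/{CORE}/select?{urlencode(params)}"


if __name__ == "__main__":
    print(build_faceted_query("climate report", ["person_ss", "organization_ss"]))
```

The function performs no network access, so it can be run safely before pointing an HTTP client at a real Solr instance. The facet fields would typically be the fields populated by named entity recognition, but the actual field names depend on the deployed schema.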