jsvir

A professional software vendor offering solutions on the Softono platform, specializing in both open-source and proprietary software development.

Software by jsvir

Open Source

vad

SG-VAD is a speech activity detection (VAD) model based on stochastic gates. It implements an ICASSP 2023 paper by Jonathan Svirsky and Ofir Lindenbaum. The model uses a mask (filter) architecture: during training, noise audio and spoken words are treated as separate categories, and these are then combined into a single VAD model for inference. It is built on NVIDIA's NeMo framework, supports a custom number of labels, and can be trained flexibly through manifest files. The repository includes a pretrained PyTorch checkpoint (sgvad.pth) and an inference script that compares the model output against a configurable threshold to classify audio as speech or non-speech. On the AVA-Speech test set, the published checkpoint achieves an EER of 10.40%, a TPR of 0.96 at an FPR of 0.315, and a ROC-AUC of 0.95. On the HAVIC benchmark, a label-creation bug was fixed and the non-speech categories were restricted to noise, background noise, music, and baby. With these changes the EER improved to 21.33%, with a ROC-AUC of 85.31. The original HAVIC results show an EER of 23.29%, TPR at FPR 0.315 of
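The threshold decision and the metrics quoted above (EER, TPR at a fixed FPR, ROC-AUC) can be illustrated with a minimal pure-Python sketch. It assumes per-frame speech scores in [0, 1] and labels where 1 = speech and 0 = non-speech. The function names and the 0.5 default threshold are hypothetical and not taken from the SG-VAD repository. EER is approximated at the nearest observed operating point rather than interpolated, and "TPR at FPR 0.315" is taken as the best TPR whose FPR does not exceed 0.315.

```python
from typing import List, Sequence, Tuple

# Hypothetical helpers for illustration; not part of the SG-VAD repository.
Point = Tuple[float, float, float]  # (threshold, FPR, TPR)


def classify(scores: Sequence[float], threshold: float = 0.5) -> List[bool]:
    """Label each frame as speech (True) when its score reaches the threshold.

    The 0.5 default is an assumption; the real script's threshold is configurable.
    """
    return [s >= threshold for s in scores]


def roc_points(scores: Sequence[float], labels: Sequence[int]) -> List[Point]:
    """Sweep every distinct score as a threshold (labels: 1 = speech, 0 = non-speech).

    Assumes both classes are present. Points come out with non-decreasing FPR and TPR.
    """
    pos = sum(1 for y in labels if y == 1)
    neg = len(labels) - pos
    points: List[Point] = [(float("inf"), 0.0, 0.0)]  # nothing flagged as speech
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        points.append((t, fp / neg, tp / pos))
    return points


def equal_error_rate(points: Sequence[Point]) -> float:
    """Approximate EER at the operating point where FPR and FNR (= 1 - TPR) are closest."""
    _, fpr, tpr = min(points, key=lambda p: abs(p[1] - (1.0 - p[2])))
    return (fpr + (1.0 - tpr)) / 2.0


def tpr_at_fpr(points: Sequence[Point], max_fpr: float = 0.315) -> float:
    """Best TPR among operating points whose FPR does not exceed max_fpr."""
    return max(tpr for _, fpr, tpr in points if fpr <= max_fpr)


def roc_auc(points: Sequence[Point]) -> float:
    """Area under the ROC curve by the trapezoidal rule."""
    return sum(
        (f1 - f0) * (t0 + t1) / 2.0
        for (_, f0, t0), (_, f1, t1) in zip(points, points[1:])
    )


if __name__ == "__main__":
    scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2]  # toy per-frame scores
    labels = [1, 1, 0, 1, 0, 0]
    pts = roc_points(scores, labels)
    print(classify(scores))
    print(equal_error_rate(pts), tpr_at_fpr(pts), roc_auc(pts))
```

On this toy data the sketch gives an EER of about 0.33, a TPR of about 0.67 at FPR ≤ 0.315, and a ROC-AUC of 8/9 ≈ 0.889. The sweep is quadratic in the number of frames, which is fine for illustration but not for a full test set.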

Categories: ML Frameworks, Firmware & RTOS

38 GitHub stars

Open Source

sparknet

SparkNet is a tiny keyword spotting (KWS) neural network designed for efficient deployment on edge devices and microcontrollers. It uses sparse binarization to reach competitive accuracy with significantly fewer multiply-accumulate operations (MACs) than existing KWS models. The architecture delivers state-of-the-art results on the Google Speech Commands v1 and v2 datasets with 12 target labels, reaching 97.0% accuracy on v2 with only 11,500 parameters and 1.2M MACs. Smaller configurations are also available: the most compact variant (C=4) uses just 1,416 parameters and 105K MACs while still reaching approximately 83% accuracy. The repository includes training and inference code based on a modified version of NVIDIA's NeMo framework, along with pretrained checkpoints for configurations C=4, 8, 16, and 32. To test inference, place audio recordings in a wavs directory and run the provided inference script. The project was published at Interspeech 2024.
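Parameter and MAC figures such as "11,500 parameters and 1.2M MACs" are usually obtained by summing per-layer costs. The minimal sketch below shows the common counting rule for a single 1-D convolution and compares a standard layer with a depthwise-separable one. The helper name and layer sizes are hypothetical: this does not reproduce SparkNet's architecture, and it does not model how sparse binarization further reduces the work.

```python
from typing import Tuple


def conv1d_cost(
    c_in: int, c_out: int, kernel: int, length_out: int, groups: int = 1, bias: bool = True
) -> Tuple[int, int]:
    """Return (parameters, MACs) for one 1-D convolution layer (hypothetical helper).

    Each output value needs (c_in // groups) * kernel multiply-accumulates, and there are
    c_out * length_out output values, so MACs = weight count * length_out.
    Bias additions are not counted as MACs.
    """
    if c_in % groups or c_out % groups:
        raise ValueError("c_in and c_out must be divisible by groups")
    weights = c_out * (c_in // groups) * kernel
    params = weights + (c_out if bias else 0)
    return params, weights * length_out


if __name__ == "__main__":
    # Hypothetical layer sizes, not SparkNet's: 16 channels, kernel 3, 100 output frames.
    print("standard:", conv1d_cost(16, 16, 3, 100))
    dw = conv1d_cost(16, 16, 3, 100, groups=16)  # depthwise: one filter per channel
    pw = conv1d_cost(16, 16, 1, 100)  # pointwise: 1x1 channel mixing
    print("separable:", (dw[0] + pw[0], dw[1] + pw[1]))
```

With these sizes the standard layer costs 784 parameters and 76,800 MACs, while the depthwise-separable pair costs 336 parameters and 30,400 MACs, which illustrates why compact KWS models favor factorized layers.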

Categories: ML Frameworks, Firmware & RTOS

18 GitHub stars