Sensory Announces the World's Smallest, Most Powerful On-Device Speech-to-Text Engine

STT Engine Now Supports TensorFlow Lite Micro and NPU Architectures

SANTA CLARA, Calif. - Californer -- Sensory Inc. (https://www.sensory.com/), a pioneer in on-device AI, today announced a breakthrough in embedded speech recognition with the launch of its latest Speech-to-Text engine. Optimized for TensorFlow Lite Micro (TFLM) (https://ai.google.dev/edge/litert/microcontroll...) and advanced Neural Processing Units (NPUs), including the Arm® Ethos™-U55 (https://developer.arm.com/documentation/109267/...), this engine delivers unparalleled accuracy and performance in an ultra-compact footprint.

Sensory's STT engine supports 37 languages, enabling manufacturers to deploy global products with a single, ultra-efficient architecture.

Maximum Power, Minimum Footprint
By leveraging specialized Neural Processing Units (NPUs), Sensory's STT engine eliminates the performance-draining data transfers between a CPU and an NPU. This architecture offloads the entire tensor computation graph to the hardware accelerator, which can significantly reduce power consumption and latency. By keeping the CPU idle during inference, device manufacturers can extend battery life or reserve processing cycles for complex system tasks and UI management.

The engine is available in two optimized configurations:
  • 2.7 MB Domain-Specific Model: Optimized for large vocabulary "Command & Control" tasks, this model utilizes domain adaptation to maintain high accuracy in specific environments, like automotive cabins. It features a peak SRAM usage of 787.11 KiB and operates at 892.9 Million MACs per inference.
  • 13 MB General-Purpose Model: A versatile model designed to handle natural language and large vocabularies without per-domain tuning. It fits within standard 2 MB SRAM limits with a peak footprint of 1.68 MB, operating at 4.37 Billion MACs per inference.
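To put the published compute figures in perspective, a rough back-of-the-envelope estimate can convert MACs per inference into latency. The NPU throughput used below is a hypothetical assumption for illustration only; actual Ethos-U55 performance depends on the MAC-unit configuration and clock frequency, and Sensory has not published latency numbers here.

```python
# Back-of-envelope latency estimate for the two published model configurations.
# MACs-per-inference figures are taken from the release; the NPU throughput
# is an ASSUMPTION (real Ethos-U55 throughput varies with its 32-256 MAC
# configuration and clock), used only to illustrate the scale of the workload.

MODELS = {
    "2.7 MB domain-specific": 892.9e6,   # MACs per inference
    "13 MB general-purpose":  4.37e9,    # MACs per inference
}

ASSUMED_NPU_GMACS = 50.0  # hypothetical effective throughput, GMAC/s

for name, macs in MODELS.items():
    latency_ms = macs / (ASSUMED_NPU_GMACS * 1e9) * 1e3
    print(f"{name}: ~{latency_ms:.1f} ms/inference at {ASSUMED_NPU_GMACS:.0f} GMAC/s")
```

Under this assumed throughput, the domain-specific model would complete an inference in under 20 ms and the general-purpose model in under 100 ms; halving the effective GMAC/s doubles both figures.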
Sensory's STT engine is engineered for rapid portability across a broad ecosystem. By using LiteRT Micro (https://ai.google.dev/edge/litert/microcontrollers/overview) (formerly TensorFlow Lite Micro) as the essential runtime layer, Sensory provides seamless integration across supported platforms.

Privacy and Performance
"Our STT engine demonstrates that natural language interfaces can be powerful without relying on the cloud," said Todd Mozer, Chairman and CEO of Sensory (https://www.linkedin.com/company/sensory-inc-). By processing 100% of voice data on-device, Sensory helps developers ensure user privacy, reduce latency, and maintain reliability in environments with limited or no connectivity.

Why Embedded STT
  • Privacy & Security: Voice data never leaves the device, ensuring user privacy.
  • Low Latency: Instantaneous results without relying on internet connectivity.
  • Lower Power, Lower Heat: Model efficiency and NPU usage substantially reduce power consumption.
  • Cost Efficiency: Eliminates ongoing cloud processing fees and reduces data transmission costs.
  • Reliability: Performance even in "comms-denied" environments or areas with poor cellular service.


Contact
Sensory Press
***@sensory.com


Source: Sensory, Inc.
Filed Under: Technology


