Last night I built a voice-to-text app for macOS. Well, saying I built it is a bit of a stretch. AI built it. I funded the tokens and provided some sense of direction and product scope. Together, we built Gemmur, an app which lets you speak, transcribes what you said, cleans it up and inserts it into whatever text field you have in focus. This kind of ASR (automatic speech recognition) used to be possible only with cloud-based models. Now, you can run high-quality voice-to-text conversion on your own device without leaking any of your sensitive data.
Why this name?
I was hearing a lot of positive reviews of the new Gemma 4 models, especially their ability to handle audio, images, video and text whilst running locally and privately on your own hardware. Gemmur is a mix of the words Gemma and Murmur; I named the project in anticipation of what the local Gemma 4 model could do. Spoiler: running Gemma 4 on a 16GB MacBook didn’t work well. Gemmur now uses Gemma 3 nano models, which are plenty powerful for cleaning up text. Whisper handles the audio processing (speech to text), and Gemma just does any cleanup required, if you want it to.
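To make that split of responsibilities concrete, here is a minimal Python sketch of the flow: speech to text first, an optional cleanup pass second, then insertion into the focused field. It is not Gemmur's actual code. The function name `dictate` and the three stage signatures are hypothetical, and the Whisper and Gemma calls are left as stand-in callables so the shape of the pipeline is visible on its own.

```python
from typing import Callable, Optional

# Hypothetical stage signatures (illustrative only, not Gemmur's real API).
Transcriber = Callable[[bytes], str]  # stand-in for a local Whisper model: audio -> raw text
Cleaner = Callable[[str], str]        # stand-in for a local Gemma model: raw text -> tidied text
Inserter = Callable[[str], None]      # stand-in for typing/pasting into the focused text field


def dictate(audio: bytes,
            transcribe: Transcriber,
            insert: Inserter,
            clean_up: Optional[Cleaner] = None) -> str:
    """Run one dictation: speech -> text -> optional cleanup -> insertion."""
    text = transcribe(audio)      # step 1: speech to text, always runs
    if clean_up is not None:      # step 2: LLM cleanup, only when the user enables it
        text = clean_up(text)
    insert(text)                  # step 3: hand the final text to the focused field
    return text


if __name__ == "__main__":
    # Fake stages so the sketch runs without any models installed.
    typed = []
    result = dictate(
        b"<audio bytes>",
        transcribe=lambda audio: "um so lets meet at five",
        insert=typed.append,
        clean_up=lambda raw: "So, let's meet at five.",
    )
    print(result)  # So, let's meet at five.
```

Keeping cleanup as an optional argument mirrors the "if you want it to" behaviour: with `clean_up=None`, the raw transcription is inserted unchanged and the language model is never invoked.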
Why this project?
Alongside Gemma 4’s announcement, it seems like WisprFlow is everywhere – kudos to their marketing team. For those of you who know me well, you’ll know that I just became a father. I now spend a lot of time with a baby in my hands, making it hard to type. Naturally, I like the idea of an easy way to insert text so I can interact with messaging apps and AI.
I don’t like the security and privacy offered by WisprFlow. Their website claims they take privacy and security seriously, but you need to turn privacy mode on in settings, meaning it’s not private by default. Even with privacy mode on, your voice recording is still transmitted to the cloud; privacy mode only means they don’t keep the recording or use it to improve their model. I don’t want that. I want my conversations with my friends and my AI agents to be private. Audio should be processed locally and never leave my device.
Gemmur vs other options
Voice to text doesn’t always need the cloud. That’s where Gemmur comes in. It runs completely on your device using local models. In the demo video below you can see the quality of our voice-to-text transcription. Because it uses local models, I don’t expect it to always be right, just most of the time. That’s a fine trade-off for me in exchange for privacy and security.
There are, of course, other alternatives out there. I only learned about some of them after I had finished my MVP and shared it with friends. Some of them are paid products that have now become open source, like VoiceInk.
Here’s how the options compare:
| | Gemmur | WisprFlow | SuperWhisper | VoiceInk |
|---|---|---|---|---|
| Local audio processing | Local only | No, data is always sent to external servers | Hybrid: local, with the option to use cloud models | Local only |
| Cost | Free | Paid: £12/user/month | Free for local models, with the option to upgrade to use cloud models | Paid lifetime licence, or free if you install from GitHub |
It’s open source
Recent advances in AI have led to major improvements in speech-to-text technology, but they’re not benefitting everyone yet. Local models can now deliver high accuracy while offering a much higher level of privacy for users. At the same time, many companies are building businesses around voice input. That makes sense: creating good software requires investment.
Still, I don’t think access to voice-to-text tools should depend on someone’s willingness or ability to pay. In many cases, “free” products come with another cost for the user: their data may be collected and used to train models or for other purposes.
I believe a basic, high-quality version of voice to text should be available to everyone at no cost. By lowering the barrier to entry, more people can enjoy one more quality-of-life improvement that AI can offer today.
By making this project open source, I hope that others who share this view will contribute to improving the experience. I’d like to extend support beyond macOS and bring voice to text to other platforms too.