Gemini Live Voice to Text Realtime

Enter your Gemini API key (get one free at aistudio.google.com/apikey):


This is a web application that lets you interact with Google's Gemini 2.0 Flash Live large language model by voice, in real time. It is built on Google's js-genai library and runs entirely in the browser, providing a seamless, voice-driven experience. Key features include Google Search, Python code execution (including display of images, e.g. plots from matplotlib), image upload (including from the camera on mobile devices), and text output rendered as Markdown and LaTeX. The source code is available free on GitHub under the MIT License.


Voice-to-text interaction (spoken input, written replies) has two advantages over spoken replies. First, most people read about twice as fast as synthetic speech typically talks. Second, text can be skimmed, whereas interrupting speech leaves what it might have gone on to say an often frustrating mystery.
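To stream your voice to the model, microphone samples would need to be encoded in a format the Live API accepts. Google's Live API documentation describes input audio as raw 16-bit little-endian PCM (at 16 kHz), sent base64-encoded. The TypeScript sketch below assumes that format and shows only the encoding step, not microphone capture, resampling, or the js-genai session call. The function names `floatTo16BitPcm`, `toBase64`, and `encodePcmChunk` are hypothetical; they are not part of js-genai or of this app's source.

```typescript
// Hypothetical helpers (not from js-genai or this app): encode Web Audio
// float samples as base64 16-bit little-endian PCM, the input format
// assumed here for the Live API.

const B64_ALPHABET =
  "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

// Convert float samples in [-1, 1] to signed 16-bit little-endian PCM bytes.
function floatTo16BitPcm(samples: Float32Array): Uint8Array {
  const buffer = new ArrayBuffer(samples.length * 2);
  const view = new DataView(buffer);
  for (let i = 0; i < samples.length; i++) {
    // Clamp out-of-range samples so they cannot wrap around.
    const s = Math.max(-1, Math.min(1, samples[i]));
    // Scale negatives by 0x8000 and positives by 0x7fff so -1 and 1 map
    // exactly to the int16 limits; `true` selects little-endian order.
    view.setInt16(i * 2, s < 0 ? s * 0x8000 : s * 0x7fff, true);
  }
  return new Uint8Array(buffer);
}

// Standard base64 (RFC 4648) with "=" padding, written out so the sketch
// does not depend on browser-only or Node-only globals.
function toBase64(bytes: Uint8Array): string {
  let out = "";
  for (let i = 0; i < bytes.length; i += 3) {
    const b0 = bytes[i];
    const b1 = i + 1 < bytes.length ? bytes[i + 1] : 0;
    const b2 = i + 2 < bytes.length ? bytes[i + 2] : 0;
    const triple = (b0 << 16) | (b1 << 8) | b2;
    out += B64_ALPHABET[(triple >> 18) & 63];
    out += B64_ALPHABET[(triple >> 12) & 63];
    out += i + 1 < bytes.length ? B64_ALPHABET[(triple >> 6) & 63] : "=";
    out += i + 2 < bytes.length ? B64_ALPHABET[triple & 63] : "=";
  }
  return out;
}

// One audio chunk, ready to be sent as realtime audio input.
function encodePcmChunk(samples: Float32Array): string {
  return toBase64(floatTo16BitPcm(samples));
}
```

In a browser, `Float32Array` chunks from the Web Audio API (already resampled to 16 kHz) could be passed to `encodePcmChunk`. The built-in `btoa` could replace the hand-written encoder, which is included only to keep the sketch self-contained.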


Privacy policy: Your API key is stored in a cookie that is accessible only to this page's JavaScript on this domain, which runs only in your browser. It is never stored anywhere else. If you don't trust this, fork the code on GitHub and run it on your own server, or run it from a localhost server with your API key hardcoded in the gemini-live.html file.
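The page's actual cookie name and attributes are not stated here. As a minimal sketch, assuming a hypothetical cookie named `geminiApiKey`, storing and reading the key could look like the TypeScript below. The helpers `buildApiKeyCookie` and `readApiKey` are illustrative only and are not taken from gemini-live.html.

```typescript
// Hypothetical cookie name; the real app may use a different one.
const COOKIE_NAME = "geminiApiKey";

// Build a cookie string suitable for assigning to document.cookie.
function buildApiKeyCookie(apiKey: string, maxAgeDays = 365): string {
  const maxAgeSeconds = Math.round(maxAgeDays * 24 * 60 * 60);
  return [
    `${COOKIE_NAME}=${encodeURIComponent(apiKey)}`, // escape ";", "=", etc.
    "Path=/",
    `Max-Age=${maxAgeSeconds}`,
    "SameSite=Strict", // not sent on cross-site requests
    "Secure", // only sent over HTTPS
  ].join("; ");
}

// Find the key in a "name=value; name2=value2" string such as document.cookie.
function readApiKey(cookieString: string): string | null {
  for (const part of cookieString.split(";")) {
    const trimmed = part.trim();
    const eq = trimmed.indexOf("=");
    if (eq === -1) continue;
    if (trimmed.slice(0, eq) === COOKIE_NAME) {
      return decodeURIComponent(trimmed.slice(eq + 1));
    }
  }
  return null;
}
```

In a page this would be used as `document.cookie = buildApiKeyCookie(key)` and `readApiKey(document.cookie)`. One design note: browsers also attach cookies to HTTP requests sent to the same site, so `localStorage` is an alternative if the key should never leave the browser.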



By Jim Salsman, April 11, 2025.