That old feature uses Whisper to transcribe your voice to text, then feeds the text into GPT, which generates a text response, and finally a separate text-to-speech model synthesizes audio from that response.
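Roughly, the old cascade looks like this. A minimal sketch using the OpenAI Python SDK (v1.x); the specific model names, voice, and file paths here are illustrative, not confirmed internals of the app:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# 1. Speech -> text with Whisper
with open("user_message.wav", "rb") as audio_in:
    transcript = client.audio.transcriptions.create(
        model="whisper-1",
        file=audio_in,
    )

# 2. Text -> text with a GPT chat model
completion = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": transcript.text}],
)
reply_text = completion.choices[0].message.content

# 3. Text -> speech with a TTS model
speech = client.audio.speech.create(
    model="tts-1",
    voice="alloy",
    input=reply_text,
)
speech.write_to_file("assistant_reply.mp3")
```

Note the lossy step: everything about how you said it (tone, pacing, emphasis) is thrown away at step 1, and the model only ever sees the transcript.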
This new feature feeds your voice directly into the model and gets audio directly back out of it. It’s amazing because ChatGPT can now truly communicate with you via audio instead of talking through transcripts.
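For contrast, here is a hedged sketch of what an audio-in/audio-out call looks like, based on the publicly documented gpt-4o-audio-preview chat completions API; whether the app itself works exactly this way internally is an assumption:

```python
import base64

from openai import OpenAI

client = OpenAI()

# Audio goes in as audio, not as a transcript
with open("user_message.wav", "rb") as f:
    audio_b64 = base64.b64encode(f.read()).decode("ascii")

completion = client.chat.completions.create(
    model="gpt-4o-audio-preview",
    modalities=["text", "audio"],               # ask for spoken output too
    audio={"voice": "alloy", "format": "wav"},
    messages=[{
        "role": "user",
        "content": [{
            "type": "input_audio",
            "input_audio": {"data": audio_b64, "format": "wav"},
        }],
    }],
)

# The model hears the raw audio (tone, pacing, emphasis) and returns
# audio plus a text rendering of what it said.
reply = completion.choices[0].message
with open("assistant_reply.wav", "wb") as f:
    f.write(base64.b64decode(reply.audio.data))
print(reply.audio.transcript)
```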
New models should be able to understand and use tone, volume, and subtle cues when communicating.
I suppose to an end user it is just “version 2”, but the progress will become more apparent as the natural conversation abilities evolve.
Does it feed your audio directly to GPT-4? To test it, I said in a very angry tone, “WHAT EMOTION DOES IT SOUND LIKE I FEEL RIGHT NOW?” and it said it didn’t know because we were communicating over text.
Yes, per my other comment this is an improvement on what their app already does. The magnitude of that improvement remains to be seen, but it isn’t a “new” product launch like a search engine would be.