I want to record a microphone clip and then transcribe it with Whisper (OpenAI). To do that, I need to pass in a File parameter. I don't think it's working right now.
Thank you for the screenshots and the well-worded explanation of the issue.
It looks like the microphone1 component stores its data as an encoded string, which we will need to convert to a file in the correct format for Whisper (OpenAI) to receive and work with.
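As a rough illustration of that conversion step, here is a minimal Python sketch. It assumes the component exposes the recording as a base64 string, possibly with a `data:...;base64,` prefix. The function names and the example file name are placeholders, not anything confirmed about microphone1.

```python
import base64
from pathlib import Path


def decode_audio_string(encoded: str) -> bytes:
    """Turn a base64 string (optionally a data URI) into raw bytes."""
    # Assumption: a data URI looks like "data:audio/webm;base64,AAAA..."
    if encoded.startswith("data:") and "," in encoded:
        encoded = encoded.split(",", 1)[1]
    # Drop any whitespace or newlines picked up in transit
    encoded = "".join(encoded.split())
    return base64.b64decode(encoded, validate=True)


def save_audio(encoded: str, path: str) -> Path:
    """Decode the string and write the bytes to `path` unchanged."""
    target = Path(path)
    target.write_bytes(decode_audio_string(encoded))
    return target
```

Note that decoding only recovers the bytes that were recorded. It does not change the audio format, so the file extension should match whatever the recording actually is.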
I can definitely file a feature request with our engineering team to improve and streamline this process, as it is very annoying that the microphone component does not simply save the input it receives in a format that can be used immediately in the query.
We should also figure out the best way to store this file once we create it from the string. My first thought is to use Retool Storage so that the file can be queried and then passed to the OpenAI query.
The other option would be saving the file to your local machine and then using a file uploader component. Let me do a little research and see what could work best.
I am thinking the best option is to use Retool Workflows, as it will allow us to use a Python library to convert the string to a WAV file and then send it to Whisper.
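To illustrate the "send it to Whisper" half of that idea, here is a hedged sketch using only the Python standard library. It assumes you already have the decoded audio bytes and an OpenAI API key. `build_multipart` and `transcribe` are hypothetical helper names, and this is not how Retool's own Whisper query is implemented. It also does not convert between audio formats.

```python
import json
import urllib.request
import uuid

# OpenAI's public transcription endpoint
WHISPER_URL = "https://api.openai.com/v1/audio/transcriptions"


def build_multipart(fields: dict, filename: str, file_bytes: bytes,
                    content_type: str = "application/octet-stream"):
    """Encode form fields plus one file as multipart/form-data."""
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        parts.append(
            f'--{boundary}\r\n'
            f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
            f'{value}\r\n'.encode()
        )
    # The audio itself goes in a part named "file"
    parts.append(
        f'--{boundary}\r\n'
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        f'Content-Type: {content_type}\r\n\r\n'.encode()
        + file_bytes + b"\r\n"
    )
    parts.append(f"--{boundary}--\r\n".encode())
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def transcribe(file_bytes: bytes, filename: str, api_key: str) -> str:
    """POST the audio to the transcription endpoint and return the text."""
    body, header = build_multipart({"model": "whisper-1"}, filename, file_bytes)
    request = urllib.request.Request(
        WHISPER_URL,
        data=body,
        headers={"Authorization": f"Bearer {api_key}", "Content-Type": header},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["text"]
```

A call would look something like `transcribe(audio_bytes, "clip.wav", api_key)`, where the file name's extension should reflect the real format of the audio.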
Let me see if I can build out a prototype to make sure this works, and then I can share that with you.
I have been working with our Workflows team, and it seems we have some issues with the main Python libraries that would be the best bet for decoding the string and converting it to an MP3 or WAV file.
Users would have to install a binary file to support the library. Although users in Python workflow blocks do have access to the file system, this is definitely not easy or intuitive for users who are not highly technical and experienced with Python dependencies.
I can file a feature request for a workflow block that can convert file types, but I can't promise that it will be built soon. If you or any other users have ideas for third-party tools that you can send your data to for conversion, that would likely be the best way to get the file needed for the Whisper query.
Will let you know if I have any more details from our team.
Hey! I got the workflow working after hours of debugging. So it works, but it would definitely be much, much, much cleaner and faster and easier (and more secure) if (a) the microphone component generated a file somehow or (b) the base64 could be easily converted to a file. All without using workflows.
If you don't mind sharing your solution, I am super curious about how you got it to work, and it might help others with similar issues.
Yes, I completely agree with your last two points. I made feature requests for both and will update this thread with any news I hear on those.