We also developed a speech-to-text solution that generates text transcriptions from videos and translates them into any language. Users could edit both the transcriptions and the translations at any time, in real time, through a user-friendly interface. In the other direction, we used text-to-speech to generate new videos from the translated text. Subtitles were embedded in the video, so users could download it with the transcription they wanted.
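To illustrate the subtitle step, timed transcript segments can be serialized in the SubRip (SRT) format before being embedded in a video. The sketch below is an assumption for illustration only, not the project's actual code: the `Segment` type and the `format_timestamp` and `to_srt` functions are hypothetical names, and the segments are assumed to come from a speech-to-text stage that returns start and end times in seconds.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Segment:
    """One timed piece of transcript (hypothetical structure)."""
    start: float  # start time in seconds
    end: float    # end time in seconds
    text: str     # transcribed or translated text


def format_timestamp(seconds: float) -> str:
    """Convert seconds to the SRT timestamp form HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments: Iterable[Segment]) -> str:
    """Serialize segments as numbered SRT cues separated by blank lines."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        timing = f"{format_timestamp(seg.start)} --> {format_timestamp(seg.end)}"
        blocks.append(f"{index}\n{timing}\n{seg.text.strip()}\n")
    return "\n".join(blocks)


if __name__ == "__main__":
    demo = [Segment(0.0, 1.5, "Hello"), Segment(1.5, 4.0, "Welcome to the demo")]
    print(to_srt(demo))
```

Keeping subtitles as plain timed text until the final step means an edited or translated transcript only needs to be serialized again, without rerunning transcription.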
Working alongside the client, we built an end-to-end product through close collaboration between the ML team and the backend and frontend developers. As a result, we helped them raise accuracy in the transcription stage to roughly 90-95%, up from the 70% achieved previously.
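The text above does not say how transcription accuracy was measured. One common convention, assumed here purely for illustration, is word accuracy defined as 1 minus the word error rate (WER), where WER is the word-level edit distance between a reference transcript and the system output, divided by the number of reference words. The function names below are hypothetical.

```python
from typing import Sequence


def word_edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance over words (substitutions, insertions, deletions)."""
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def word_accuracy(reference: str, hypothesis: str) -> float:
    """Return 1 - WER, clamped at 0 (an assumed definition of accuracy)."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must not be empty")
    wer = word_edit_distance(ref, hyp) / len(ref)
    return max(0.0, 1.0 - wer)


if __name__ == "__main__":
    print(word_accuracy("the quick brown fox", "the quick brown dog"))
```

Under this definition, one wrong word in a four-word reference gives a word accuracy of 0.75.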