Most STT (speech-to-text) algorithms don't accept a reference text, meaning you have to trust the character alignment they produce. However, sometimes you already have the text that is being said in the audio, ...