How to compare strings that do not match exactly

I need to compare two output strings: the original transcript and the transcript produced by a speech-to-text service. Numbers are sometimes written as digits and sometimes as words, for example "4" versus "four". Given these different transcription styles, how can I compare the strings? So far I have only converted both strings to lowercase and split each one into words on spaces.

    # Read the two files and store them in s1_raw and s2_raw
    with open('original.txt', 'r') as f:
        s1_raw = f.read()
    with open('comparison.txt', 'r') as f:
        s2_raw = f.read()

    # Convert all letters to lowercase
    s1 = s1_raw.lower()
    s2 = s2_raw.lower()

    # Split the texts on spaces to get a list of words
    s1_set = s1.split(' ')
    s2_set = s2.split(' ')

    # Used later for ……
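
One possible direction, as a rough sketch rather than a definitive answer: map number words to digits before comparing, so that "four" and "4" become the same token, and then score the two word lists with `difflib.SequenceMatcher` from the standard library. The names `NUMBER_WORDS`, `normalize`, and `similarity` are hypothetical and not part of the code above. The mapping only covers zero to ten, so compound numbers such as "twenty one" would need extra handling.

```python
import re
from difflib import SequenceMatcher

# Hypothetical mapping for illustration; it only covers zero to ten.
NUMBER_WORDS = {
    "zero": "0", "one": "1", "two": "2", "three": "3", "four": "4",
    "five": "5", "six": "6", "seven": "7", "eight": "8", "nine": "9",
    "ten": "10",
}


def normalize(text):
    """Lowercase the text, drop punctuation, and map number words to digits."""
    # findall keeps runs of letters, digits, and apostrophes, so it also
    # splits on newlines and tabs, not only on single spaces.
    words = re.findall(r"[a-z0-9']+", text.lower())
    return [NUMBER_WORDS.get(word, word) for word in words]


def similarity(a, b):
    """Return a ratio between 0 and 1 for the two normalized word lists."""
    matcher = SequenceMatcher(None, normalize(a), normalize(b), autojunk=False)
    return matcher.ratio()


if __name__ == "__main__":
    print(normalize("I have four apples."))
    print(similarity("I have four apples.", "i have 4 apples"))
```

Because the same normalization is applied to both texts, a word like "one" used as a pronoun is converted on both sides and does not skew the score. `SequenceMatcher` compares the word lists in order, so it also reflects inserted or dropped words, which a plain set comparison would miss.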