#musicbrainz-picard-development


      • outsidecontext[m]
        aerozol: For Picard we have done well with partial translations in the past. There hasn't been much negative feedback about this. Maybe little known, but we essentially have two tiers of languages in the app: those that are actively selectable in the UI, and those that are only used when the system language is in use. The selectable ones have a higher completion threshold, although the completed percentage can drop, and we have never removed a language from that list so far.
      • For the website I only enable complete translations. Same for the app metadata (used most prominently in Linux software centers).
      • Also the Windows installer gets only complete translations.
      • It depends on both how well missing translations are handled by the component (e.g. is there a clean fallback to English) and on the size. For smaller components it is more realistic to have complete translations.
      • Protopia[m] joined the channel
      • Protopia[m]
        You might want to look into how you can do AI translations. In essence, you would use AI to do an initial translation for any strings not translated, and then users can contribute tweaks to correct all the out-of-context translation errors that the AI creates.
      • outsidecontext[m]
        You can integrate automatic translations into Weblate. But the free ones are most often not very helpful, and the good ones can become costly quite quickly.
      • The French translation of the Picard docs was mostly done automatically, as I understand it.
      • Protopia[m]
        I have no idea how to do it using GitHub Actions, but there may well be standard community actions for it by now, and they may well use an AI model that is free for a limited number of transactions per month or for open source projects.
      • outsidecontext[m]
        The proper place for this would be Weblate, not GitHub Actions. As I said, it supports integrating such systems.
      • I think this is only suitable for translation systems where you can distinguish between reviewed and unreviewed translations in some way, as is e.g. possible with gettext. Then all machine translations can be marked as requiring review (flagged as "fuzzy" in gettext) and they won't actually show up as translations in the UI. It's also important to note that machine translation quality differs widely depending on the language.
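The "fuzzy" flow described above can be sketched concretely. This is an illustrative example, not Picard's actual catalog: the `PO_FRAGMENT` strings are made up, and the parser is deliberately naive. The point it shows is the gettext convention that an entry preceded by a `#, fuzzy` comment counts as unreviewed, so `msgfmt` skips it by default and the UI falls back to the English source until a translator clears the flag.

```python
import re

# A minimal gettext .po fragment (hypothetical strings for illustration).
# The first entry carries the "#, fuzzy" flag and would be skipped by
# msgfmt until a translator reviews it and removes the flag.
PO_FRAGMENT = '''\
#, fuzzy
msgid "Save the file"
msgstr "Datei speichern"

msgid "Open the file"
msgstr "Datei oeffnen"
'''

def fuzzy_msgids(po_text):
    """Return the msgids of entries flagged fuzzy (naive parser)."""
    fuzzy = []
    # Entries in a .po file are separated by blank lines.
    for entry in po_text.split("\n\n"):
        if re.search(r"^#,.*\bfuzzy\b", entry, re.M):
            m = re.search(r'^msgid "(.*)"', entry, re.M)
            if m:
                fuzzy.append(m.group(1))
    return fuzzy

print(fuzzy_msgids(PO_FRAGMENT))  # ['Save the file']
```

A workflow that imports machine translations as fuzzy gets exactly the behavior discussed here: the suggestions are stored and visible to translators, but never shown to end users until reviewed.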
      • Protopia[m]
        Ah - but I think you want AI translations to show up in the UI so that users can see that they are wrong and correct them. Also, you would obviously set up the AI so that it doesn't translate anything that already has a translation, so user changes are not overwritten.
      • outsidecontext[m]
        That might be fine for a website that you can deploy frequently, but I wouldn't do this for a desktop app where the wrong translation just gets stuck for a long time.
      • Protopia[m]
        very true.
      • outsidecontext[m]
        Of course we need to be a bit hands-on with translations anyway, as we don't have a big enough translation community to have everything reviewed and complete before a release. So mistakes in translations are probably not uncommon, especially for the less used languages.
      • For the app description in the appdata metadata I run a machine reverse translation once a translation is complete, to see whether the translation generally fits :)
      • Also in regards to machine translations: translations with an LLM like ChatGPT are really good and often a lot better than earlier generic MT systems. And we would have a chance to give the machine more context (e.g. a glossary and background information on MB).
      • But costs can be high :(
      • Protopia[m]
        According to https://docs.weblate.org/en/latest/admin/machin... there are several free services: DeepL, Glosbe, LibreTranslate, MyMemory. I will (in due course) be trying these for my own open source Laravel project, and I can let you have feedback once I get to this.
      • outsidecontext[m]
        You need to distinguish between systems that do machine translation (MT) and translation memory (TM). E.g. DeepL is machine translation, like Google Translate.
      • DeepL is a paid service (kind of good, I think Bob used this a lot if I remember correctly). Glosbe I don't know. There was a free service activated on MB Weblate in the beginning, it might have been LibreTranslate, not sure. It got disabled because it was not useful and provided strange suggestions.
      • MyMemory is a translation memory. In my experience a translation memory is mostly useful for the project itself. A generic TM is often only useful for very limited strings, due to differences in glossary and general wording. I have used MyMemory in the past and wouldn't expect much help from it for MB. The free version is also very limited in regard to requests per day.
      • By the way, the entirety of translations on the MB Weblate instance is also made available as a translation memory. That's where the suggestions shown while translating a string currently come from.
      • Protopia[m]
        Generally speaking DeepL has an excellent reputation for translating paragraphs (i.e. with reasonable context). I have no idea about translating app strings where there is less context.
      • outsidecontext[m]
        IMHO Weblate's own TM is a bit too strict in showing matching translations, though. As a translator I would prefer it to also show strings with a much lower match rate; right now it only shows very close matches (and AFAIK this cannot be configured).
      • Ah, yes. That's generally difficult for the generic MT systems, like DeepL and Google Translate. As you can't give them context.
      • Protopia[m]
        However, I would imagine that an AI translator running in GitHub could get quite a lot of context from the app source code, and thus give better quality translations.
      • outsidecontext[m]
        It's much simpler, you can use the existing corpus of texts and translations for such systems.
      • Protopia[m]
        Sorry - what is much simpler
      • * much simpler?
      • outsidecontext[m]
        Giving the context. You don't need the source code; you can use the existing texts to provide context.
      • Protopia[m]
        Ah - OK.
      • DeepL has a free plan: "With the DeepL API Free plan, you can translate up to 500,000 characters per month for free."
      • outsidecontext[m]
        Which isn't that much in the end. According to Weblate the Picard "app" component has 56,477 characters in the source. Across all languages there are a total of 2,358,279 untranslated characters (just "app" again). picard-docs overall adds another 1,173,023 untranslated.
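A quick back-of-the-envelope check of those character counts against the free quota mentioned above. The figures are the ones quoted in this conversation; the 500,000 characters/month is DeepL's stated free-tier limit:

```python
# DeepL API Free plan: 500,000 characters per month.
FREE_CHARS_PER_MONTH = 500_000

# Weblate's counts quoted above (untranslated characters, all languages).
app_untranslated = 2_358_279   # Picard "app" component
docs_untranslated = 1_173_023  # picard-docs

total = app_untranslated + docs_untranslated
months = total / FREE_CHARS_PER_MONTH
print(f"{total} chars, about {months:.1f} months of the free quota")
# prints "3531302 chars, about 7.1 months of the free quota"
```

So clearing the current backlog alone would take roughly seven months of the free tier, before any new or changed strings are counted.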
      • Regarding training data: the problem is the availability of such systems. There have been MT systems around for years that could produce good results when trained on e.g. company-specific texts, especially for a limited use case. But I'm not aware of anything freely available, and the proprietary systems have been really expensive to set up and run. With the advent of LLMs things have changed drastically, I think. The quality is way better, and everyone can set up a specific LLM agent for translation with enough context to get surprisingly good translations. The issue is again the cost, though.
      • Since LLMs have arrived I have been personally very happy that I stopped working in the translation business a couple of years ago :D
      • Protopia[m]
        500,000 characters a month for translation is pretty good. If you are doing incremental translations as you develop and/or add languages slowly, this might be pretty OK.
      • outsidecontext[m]
        Yes, if you are running it this way, sure. I mean that is essentially how the French picard-docs translation happened. 500k characters would be enough to translate picard-docs into about 1.5 languages.
      • But it's less suitable if you want to use it continuously across all or most languages in a CI context.
      • But yes, we could try enabling it in Weblate and running it over some components where someone knows the language.
      • One thing to note is that these services often don't handle formatting characters very well. That needs a lot of review and manual correction, which is a big reason not to enable the translations by default. I think Bob can say more about this, but I imagine the specific formatting for reStructuredText required quite some work to fix.
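One mechanical safeguard for the formatting problem mentioned above is to compare the placeholders in the source and the machine-translated string before accepting a suggestion. This is a hedged sketch: the regex covers printf-style (`%s`, `%d`, `%(name)s`) and brace-style (`{name}`) placeholders as an assumed subset, and the example strings are invented; real catalogs (and RST markup) would need more patterns.

```python
import re

# Illustrative placeholder patterns: printf-style and Python brace-style.
# Deliberately not exhaustive; RST roles, HTML tags etc. are not covered.
PLACEHOLDER_RE = re.compile(r"%\([^)]+\)[sd]|%[sd]|\{[^{}]*\}")

def placeholders_match(source, translation):
    """True if source and translation contain the same set of placeholders."""
    return (sorted(PLACEHOLDER_RE.findall(source))
            == sorted(PLACEHOLDER_RE.findall(translation)))

# An MT system that drops the placeholder fails the check:
print(placeholders_match("Loaded %d files", "Dateien geladen"))     # False
print(placeholders_match("Loaded %d files", "%d Dateien geladen"))  # True
```

A check like this can gate automatic imports: suggestions that fail go straight to the review queue instead of being applied, catching the most common class of MT breakage cheaply. (Weblate ships comparable built-in quality checks for format strings.)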
      • BobSwift[m]
        <outsidecontext[m> "DeepL is a paid service (kind of..." <- Yup. That's what I used for most of the French translations. You're right in saying that it still required a manual review of pretty much every string that contained RST directives, links, etc. I wouldn't want to try to automate that.
      • We haven't exactly had a lot of complaints about not having translations, so I'm not sure there is much urgency. IMHO the best solution is to somehow get more people interested in translating and keep chipping away at it.
      • aerozol[m] has quit