Address field dropdowns should choose a name in the relevant language #10541

Closed
1ec5 opened this issue Nov 14, 2024 · 5 comments · May be fixed by #10561
Labels
considering Not Actionable - still considering if this is something we want field An issue with a field in the user interface localization Adapting iD across languages, regions, and cultures

Comments

@1ec5
Collaborator

1ec5 commented Nov 14, 2024

As far as I know, most postal services and other addressing authorities use only one name at a time in each part of an address (street, city, etc.), even if OSM would give those features multiple values in name=* as a linguistic compromise.

For example, I recently had to retag a bunch of POIs where a mapper had previously accepted iD’s suggestion of addr:street=Bellaire Boulevard;Đại Lộ Sàigòn based on this nearby street, which is tagged name=Bellaire Boulevard;Đại Lộ Sàigòn. This street has two values in name=* because of dual wayfinding signs. But neither the U.S. Postal Service nor the county records manager recognizes “Đại Lộ Sàigòn” as a street name for addressing purposes. Even if they did, they still wouldn’t recognize a dual name. (Do other countries’ postal systems work like this? Please correct me if I’m wrong.)

Ideally, iD would somehow know that it should use name:en=Bellaire Boulevard instead, because that’s the standard language for addressing in most of the U.S. We might be able to use data/territory_languages.json for this purpose, but it gets tricky in multilingual countries where the countrywide default language may not be a good fit. Since iD also consults Nominatim, maybe it can access the default_language=* from the surrounding boundaries, or it could base the decision on the interface language. (Not great, but this is a fallback.) Or maybe iD could simply use the name=* but truncate it at the first delimiter (a semicolon in this case).

const value = resultProp && d.tags[resultProp] ? d.tags[resultProp] : d.tags.name;
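A minimal sketch of the last two ideas, assuming the addressing language is already known: prefer the language-specific name, and otherwise truncate name=* at the first semicolon. The helper `pickAddressName` and its `lang` parameter are hypothetical and not part of iD; how to determine `lang` is the open question in this issue.

```javascript
// Hypothetical helper, not part of iD: pick a single street name for an address field.
// `tags` is an OSM tag object; `lang` is the language code assumed to be used for addressing.
function pickAddressName(tags, lang) {
    // Prefer the language-specific name, e.g. name:en=Bellaire Boulevard.
    const localized = lang ? tags['name:' + lang] : undefined;
    if (localized) return localized;

    // Otherwise fall back to name=*, truncated at the first semicolon.
    const name = tags.name;
    if (!name) return undefined;
    return name.split(';')[0].trim();
}

const street = {
    name: 'Bellaire Boulevard;Đại Lộ Sàigòn',
    'name:en': 'Bellaire Boulevard'
};

console.log(pickAddressName(street, 'en'));                // 'Bellaire Boulevard'
console.log(pickAddressName({ name: street.name }, 'en')); // 'Bellaire Boulevard'
```

Truncation only helps when the values are separated by a semicolon; it would not split a hybrid name that has no delimiter.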

@1ec5 1ec5 added localization Adapting iD across languages, regions, and cultures field An issue with a field in the user interface labels Nov 14, 2024
@Deeptanshu-sankhwar
Contributor

Hello! Please take a look at this solution @1ec5 whenever you can. I have prioritized the name:en value if it exists for a given nearby address; otherwise I fall back to name, returning only the part of the string before the ; delimiter if one is present. I have attached a screenshot below showing how the nearby address dropdown looks now.

[Screenshot 2024-11-27 at 10:28:42 AM]

@1ec5
Collaborator Author

1ec5 commented Nov 27, 2024

Following up on some of my half-baked suggestions in #10541 (comment):

> Since iD also consults Nominatim, maybe it can access the default_language=* from the surrounding boundaries

Note that this file currently only consults country-coder, but country-coder has declined to include language-territory information: rapideditor/country-coder#131. So that’s why I suggested Nominatim. Consulting Nominatim isn’t foolproof, but it’s more language-neutral than what’s currently in this PR.

The Nominatim API integration lives in modules/services/nominatim.js. There’s a countryCode method that grabs the country code out of a Nominatim reverse geocoding result, but it isn’t quite what we’re after, since the default_language would be on the containing country or perhaps a political subdivision of the country. Unfortunately, a standard reverse geocoding request doesn’t seem to include any metadata about those containing boundaries.

That’s as far as I got at a glance, but feel free to explore this approach some more and ask in Nominatim’s discussion board if you need more help.
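To illustrate the limitation described above, here is a rough sketch of pulling a country code out of a reverse geocoding result. It assumes the result has the shape of a Nominatim JSON response with an `address.country_code` field; the function name and the sample object are made up for illustration and are not iD's actual `countryCode` implementation.

```javascript
// Hypothetical stand-in for the kind of lookup described above; not iD's actual code.
// Assumes `result` looks like a Nominatim reverse geocoding JSON response,
// where the country code appears as address.country_code.
function countryCodeFromReverseResult(result) {
    const code = result && result.address && result.address.country_code;
    return code ? code.toLowerCase() : null;
}

// Trimmed, made-up sample response for illustration only.
const sampleResult = {
    address: {
        road: 'Bellaire Boulevard',
        country_code: 'us'
    }
};

console.log(countryCodeFromReverseResult(sampleResult)); // 'us'
console.log(countryCodeFromReverseResult({}));           // null
```

Nothing in this shape carries a default_language, so the country code alone would still need to be mapped to a language by some other means.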

> or it could base the decision on the interface language. (Not great, but this is a fallback.)

You’d call utilDisplayName to get the same name that iD is currently showing as the label for the street on the map. The label already prefers the user’s language, irrespective of the language spoken locally. This is a lot simpler but can be inaccurate, for example, if an English speaker from the U.S. happens to edit in Japan.

One other potential approach I had forgotten was that the Multilingual Name field already fetches information about which countries speak which languages. You could make this field behave similarly.

var _territoryLanguages = {};
fileFetcher.get('territory_languages')
    .then(function(d) { _territoryLanguages = d; })
    .catch(function() { /* ignore */ });

Unfortunately, it isn’t very granular. For example, the entry for the U.S. lists a lot of languages that the postal service does not speak, and you’d have no way of knowing that the preferred language in Québec is French, not English:

"us": ["en", "es", "zh-Hant", "fr", "de", "fil", "it", "vi", "ko", "ru", "nv", "yi", "pdc", "hnj", "haw", "frc", "chr", "esu", "dak", "cho", "lkt", "ik", "mus", "cad", "cic", "io", "jbo", "osa"],
"ca": ["en", "fr", "zh", "yue", "es", "pa", "ar", "fil", "it", "de", "ur", "fa", "pt", "ru", "hi", "ta", "vi", "pl", "ko", "gu", "el", "ro", "bn", "pdt", "uk", "sr", "nl", "ja", "hu", "so", "hr", "iu", "iu-Latn", "tr", "oj", "ojs", "chp", "moe", "cr", "mic", "atj", "bla", "crk", "den", "dgr", "csw", "moh", "nsk", "dak", "clc", "hur", "crg", "war", "lil", "oka", "pqm", "crl", "kwk", "gwi"],

But in combination with the user’s preferred language as used by utilDisplayName, maybe it could work.
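One way that combination might look, as a sketch only: intersect the user's preferred languages with the territory's list, and fall back to the territory's first entry. The function `chooseAddressLanguage` is hypothetical, and treating the first listed language as the territory default is an assumption, not something the data file promises.

```javascript
// Hypothetical helper, not part of iD.
// territoryLanguages: object shaped like data/territory_languages.json
//   (lowercase country code -> array of language codes).
// userLanguages: the user's preferred language codes, most preferred first.
function chooseAddressLanguage(territoryLanguages, countryCode, userLanguages) {
    const local = territoryLanguages[countryCode.toLowerCase()] || [];

    // First choice: a language the user prefers that is also listed for the territory.
    const shared = userLanguages.find(lang => local.includes(lang));
    if (shared) return shared;

    // Assumption: otherwise treat the first listed language as the territory default.
    return local.length ? local[0] : null;
}

// Abbreviated from the entries quoted above.
const territoryLanguages = {
    us: ['en', 'es', 'zh-Hant', 'fr'],
    ca: ['en', 'fr', 'zh', 'yue']
};

console.log(chooseAddressLanguage(territoryLanguages, 'ca', ['fr', 'en'])); // 'fr'
console.log(chooseAddressLanguage(territoryLanguages, 'us', ['ja']));       // 'en'
```

This inherits the granularity problem: an English-speaking user editing in Québec would still get 'en', because the list is per country rather than per province.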

@tyrasd
Member

tyrasd commented Dec 3, 2024

The question here is: if a linguistic compromise led to multiple languages being included in the name tag, shouldn't the same apply to the address tag? Otherwise it could easily lead to edit wars when one of the two (or more) languages has to be chosen for the address tag. Also, having the same value in both the highway name and addr:street makes it easier for data consumers to ingest the data, and is, as far as I know, the current standard for mapping addresses.

> But neither the U.S. Postal Service nor the county records manager recognizes “Đại Lộ Sàigòn” as a street name for addressing purposes.

That said, this seems to be a rather specific situation. Usually, when roads have multilingual names (e.g. in Belgium or South Tyrol), the postal services also accept all recognized languages in the postal address. So, would this require adding a special exception for the US only?

PS: Slightly off topic, but semicolons should probably be avoided in (multilingual) names in general, see https://wiki.openstreetmap.org/wiki/Multilingual_names

@tyrasd tyrasd closed this as completed Dec 3, 2024
@tyrasd tyrasd added the considering Not Actionable - still considering if this is something we want label Dec 3, 2024
@1ec5
Collaborator Author

1ec5 commented Dec 3, 2024

> That said, this seems to be a rather specific situation. Usually, when roads have multilingual names (e.g. in Belgium or South Tyrol), the postal services also accept all recognized languages in the postal address. So, would this require adding a special exception for the US only?

Maybe. I guess this gets into whether the dropdown should be more helpful than merely looking at name=*. For example, a vast number of addresses in the U.S. are along roads that only have numbers, not names. There are specific formats for those numbers in addresses, such as “State Route 123” for ref=SR 123 on the way (or network=US:XY ref=123 on the route relation). Encoding these formats as name=* is controversial, to say the least.
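As a rough illustration of what a more helpful dropdown could do for numbered roads, the sketch below expands a ref=* value into an address-style name. Everything here is hypothetical: the function, the lookup table, and the pattern are assumptions, and only the “SR” to “State Route” pairing comes from the example above.

```javascript
// Hypothetical lookup table; only the SR entry comes from the example in this thread.
const REF_PREFIX_FORMATS = { SR: 'State Route' };

// Hypothetical helper, not part of iD: turn a ref like "SR 123" into "State Route 123".
// Returns null when the ref does not match the expected pattern or the prefix is unknown.
function addressNameFromRef(ref) {
    // Uppercase prefix, optional space or hyphen, then a number with an optional letter suffix.
    const match = /^([A-Z]+)[ -]?(\d+[A-Z]?)$/.exec(ref.trim());
    if (!match) return null;

    const label = REF_PREFIX_FORMATS[match[1]];
    return label ? label + ' ' + match[2] : null;
}

console.log(addressNameFromRef('SR 123')); // 'State Route 123'
console.log(addressNameFromRef('XY 123')); // null (prefix not in the table)
```

A real table would have to vary by jurisdiction, which is part of why this goes beyond simply reading name=*.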

> Slightly off topic, but semicolons should probably be avoided in (multilingual) names in general, see https://wiki.openstreetmap.org/wiki/Multilingual_names

There’s a long discussion questioning that premise on the forum.

@1ec5
Collaborator Author

1ec5 commented Dec 3, 2024

Anyways, this issue is not really about semicolons. It came up in the context of this debate about hybrid street names in Canada. “Rue John Road” is not a street name that Canada Post would recognize in either French or English, and not a name that anyone would search for, regardless of the language they speak. iD would ideally suggest either name:en=John Road or name:fr=Rue John. Truncating at a semicolon was only a suggestion about how to solve part of that issue more easily.
