Affected page
https://www.php.net/manual/en/collator.setstrength.php
Current issue
The Collator::setStrength() documentation explains ICU collation strength
levels and lists the corresponding Collator constants, such as
Collator::PRIMARY, Collator::SECONDARY, Collator::TERTIARY,
Collator::QUATERNARY, and Collator::IDENTICAL.
However, the page does not mention that collation strength may also be
requested through the Unicode locale extension key ks when creating a
Collator from a locale identifier.
For example, the following two collators have the same strength:
$collator1 = new Collator('en_US');
$collator1->setStrength(Collator::IDENTICAL);
$collator2 = new Collator('en_US-u-ks-identic');
var_dump($collator1->getStrength() === $collator2->getStrength());
// bool(true)
Suggested improvement
Add a short note explaining that the Unicode locale extension key ks can
be used to request a collation strength in the locale identifier.
For example:
ks-level1 corresponds to Collator::PRIMARY
ks-level2 corresponds to Collator::SECONDARY
ks-level3 corresponds to Collator::TERTIARY
ks-level4 corresponds to Collator::QUATERNARY
ks-identic corresponds to Collator::IDENTICAL
Also mention that no ks value corresponds to Collator::DEFAULT_STRENGTH.
Omitting the ks key lets ICU use the default strength for the locale.
This would help users understand the relationship between
Collator::setStrength() and strength requested through locale identifiers.
It is also useful for APIs that accept a locale identifier but do not accept
a Collator object directly.
Additional context (optional)
This behavior is based on the Unicode LDML Collation setting options.
The ks Unicode locale extension key is defined as the BCP 47 key for
collation strength, with values such as level1, level2, level3,
level4, and identic.
Specification reference:
https://www.unicode.org/reports/tr35/dev/tr35-collation.html#Setting_Options
The following script verifies that these ks values are reflected in
Collator::getStrength():
<?php
$locales = [
    'PRIMARY'    => 'en_US-u-ks-level1',
    'SECONDARY'  => 'en_US-u-ks-level2',
    'TERTIARY'   => 'en_US-u-ks-level3',
    'QUATERNARY' => 'en_US-u-ks-level4',
    'IDENTICAL'  => 'en_US-u-ks-identic',
    'DEFAULT'    => 'en_US',
];
foreach ($locales as $label => $locale) {
    $collator = new Collator($locale);
    printf(
        "%-10s %-25s strength = %d\n",
        $label,
        $locale,
        $collator->getStrength()
    );
}
Example output:
PRIMARY    en_US-u-ks-level1         strength = 0
SECONDARY  en_US-u-ks-level2         strength = 1
TERTIARY   en_US-u-ks-level3         strength = 2
QUATERNARY en_US-u-ks-level4         strength = 3
IDENTICAL  en_US-u-ks-identic        strength = 15
DEFAULT    en_US                     strength = 2
This confirms the following correspondence:
ks-level1 -> Collator::PRIMARY
ks-level2 -> Collator::SECONDARY
ks-level3 -> Collator::TERTIARY
ks-level4 -> Collator::QUATERNARY
ks-identic -> Collator::IDENTICAL
The DEFAULT row omits the ks key. It shows the default strength chosen
by ICU for the locale, rather than a locale extension value corresponding
to Collator::DEFAULT_STRENGTH.
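As a further check beyond getStrength(), the sketch below compares two strings that differ only in case, once with a strength requested through ks and once with setStrength(). It assumes the intl extension is loaded and that the linked ICU version honors the ks key, as the script above indicates. The variable names are illustrative only.

```php
<?php
// Strength requested through the locale identifier (assumes ICU honors "ks").
$viaLocale = new Collator('en_US-u-ks-level1');

// The same strength requested through the API.
$viaSetter = new Collator('en_US');
$viaSetter->setStrength(Collator::PRIMARY);

// No ks key: ICU picks the default strength for the locale.
$default = new Collator('en_US');

// Primary strength ignores case differences, so both calls should return 0.
var_dump($viaLocale->compare('a', 'A'));
var_dump($viaSetter->compare('a', 'A'));

// At the default strength, the two strings should compare as different.
var_dump($default->compare('a', 'A') !== 0);
```

If both approaches are equivalent, the first two calls print the same value, which suggests that ks affects comparison results and not only the value reported by getStrength().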