diff options
author | Stephan Bergmann <sbergman@redhat.com> | 2022-11-18 16:47:07 +0100 |
---|---|---|
committer | Stephan Bergmann <sbergman@redhat.com> | 2022-11-18 19:50:28 +0100 |
commit | 50d73574b6c3d71f9a539c895a15d6fcda22390b (patch) | |
tree | 9e8af52c52b02afdd5f83703a49f1892e64cd03e /comphelper/qa | |
parent | 9f49fc7b2a56e0d806ab8aab6e32fd1cd55c5c93 (diff) |
Related tdf#104597, tdf#151546: Introduce comphelper::string::reverseCodePoints
69e9925ded584113e52f84ef0ed7c224079fa061 "sdext.pdfimport: resolves tdf#104597:
RTL script text runs are reversed" and f6004e1c457ddab5e0c91e6159875d25130b108a
"tdf#151546: RTL text is reversed (Writer pdfimport)" had introduced two calls
to comphelper::string::reverseString into sdext. That function reverts on the
basis of individual UTF-16 code units, not on the basis of Unicode code points.
And while at least some pre-existing callers of that function want the former
semantics (see below), these two new callers in sdext apparently want the latter
semantics. Therefore, introduce an additional function
comphelper::string::reverseCodePoints with the latter semantics.
I identified three other places that call comphelper::string::reverseString:
* SbRtl_StrReverse in basic/source/runtime/methods1.cxx apparently implements
some StrReverse Basic function, where a (presumably non-existing) Basic spec
would need to decide which of the two semantics is called for. So leave it
alone for now.
* SvtFileDialog::IsolateFilterFromPath_Impl in fpicker/source/office/iodlg.cxx
reverts a string, operates on it, then reverts (parts of) it back. Whether or
not that is the most elegant code, using the latter semantics here would
apparently be wrong, as double invocation of
comphelper::string::reverseCodePoints is not idempotent when the input is a
malformed sequence of UTF-16 code units containing a low surrogate followed by
a high surrogate.
* AccessibleCell::getCellName in svx/source/table/accessiblecell.cxx apparently
always operates on a string consisting only of Latin uppercase letters A--Z,
for which both semantics are equivalent. (So we can just as well stick with
the simpler comphelper::string::reverseString here.)
(Extending the tests in comphelper/qa/string/test_string.cxx ran into an issue
where loplugin:stringliteralvar warns about deliberate uses of sal_Unicode
arrays rather than UTF-16 string literals wrapped in OUStringLiteral, as those
arrays deliberately contain malformed UTF-16 code unit sequences and thus
converting them into UTF-16 string literals might be considered inappropriate,
see the newly added comment at
StringLiteralVar::isPotentiallyInitializedWithMalformedUtf16 in
compilerplugins/clang/stringliteralvar.cxx for details. So that loplugin had to
be improved here, too.)
Change-Id: I641cc32c76b0c5f6339ae44d8aa85df0022ffb05
Reviewed-on: https://gerrit.libreoffice.org/c/core/+/142949
Tested-by: Jenkins
Reviewed-by: Stephan Bergmann <sbergman@redhat.com>
Diffstat (limited to 'comphelper/qa')
-rw-r--r-- | comphelper/qa/string/test_string.cxx | 29 |
1 files changed, 27 insertions, 2 deletions
diff --git a/comphelper/qa/string/test_string.cxx b/comphelper/qa/string/test_string.cxx index 0b2f3ee05479..58f9c3f63c16 100644 --- a/comphelper/qa/string/test_string.cxx +++ b/comphelper/qa/string/test_string.cxx @@ -17,6 +17,10 @@ * the License at http://www.apache.org/licenses/LICENSE-2.0 . */ +#include <sal/config.h> + +#include <iterator> + #include <comphelper/string.hxx> #include <cppuhelper/implbase.hxx> #include <com/sun/star/i18n/CharType.hpp> @@ -43,6 +47,7 @@ public: void testDecimalStringToNumber(); void testIsdigitAsciiString(); void testReverseString(); + void testReverseCodePoints(); void testSplit(); void testRemoveAny(); @@ -55,6 +60,7 @@ public: CPPUNIT_TEST(testDecimalStringToNumber); CPPUNIT_TEST(testIsdigitAsciiString); CPPUNIT_TEST(testReverseString); + CPPUNIT_TEST(testReverseCodePoints); CPPUNIT_TEST(testSplit); CPPUNIT_TEST(testRemoveAny); CPPUNIT_TEST_SUITE_END(); @@ -178,9 +184,28 @@ void TestString::testTokenCount() void TestString::testReverseString() { - OUString aOut = ::comphelper::string::reverseString(u"ABC"); + CPPUNIT_ASSERT_EQUAL(OUString(), comphelper::string::reverseString(u"")); + CPPUNIT_ASSERT_EQUAL(OUString("cba"), comphelper::string::reverseString(u"abc")); + static sal_Unicode const rev[] = {'w', 0xDFFF, 0xDBFF, 'v', 0xDC00, 0xD800, 'u'}; + CPPUNIT_ASSERT_EQUAL( + OUString(rev, std::size(rev)), + comphelper::string::reverseString(u"u\U00010000v\U0010FFFFw")); + static sal_Unicode const malformed[] = {0xDC00, 0xD800}; + CPPUNIT_ASSERT_EQUAL( + OUString(u"\U00010000"), + comphelper::string::reverseString(std::u16string_view(malformed, std::size(malformed)))); +} - CPPUNIT_ASSERT_EQUAL(OUString("CBA"), aOut); +void TestString::testReverseCodePoints() { + CPPUNIT_ASSERT_EQUAL(OUString(), comphelper::string::reverseCodePoints("")); + CPPUNIT_ASSERT_EQUAL(OUString("cba"), comphelper::string::reverseCodePoints("abc")); + CPPUNIT_ASSERT_EQUAL( + OUString(u"w\U0010FFFFv\U00010000u"), + comphelper::string::reverseCodePoints(u"u\U00010000v\U0010FFFFw")); + static sal_Unicode const malformed[] = {0xDC00, 0xD800}; + CPPUNIT_ASSERT_EQUAL( + OUString(u"\U00010000"), + comphelper::string::reverseCodePoints(OUString(malformed, std::size(malformed)))); } void TestString::testSplit() |