QStringLiteral vs QLatin1String performance

Discussion:

Kent Hansen

2011-09-30 10:57:28 UTC

Hi,
You might have seen Thiago's blog about QStringLiteral [1], and his idea
on replacing QLatin1String usage by QStringLiteral in Qt (where possible).

I like the idea, but wanted to do some benchmarking first to get an
impression of the performance impact. [2]

My results so far (on Linux 32-bit) indicate that QString::appends are
way faster when switching to using QStringLiteral: 7x faster than
QLatin1String for a 2-character literal and 14x for a ~50-character literal.

Now, the not-so-good news: operator==(QString) is a bit (just a bit)
slower than operator==(QLatin1String) for short strings.
It seems that, for short strings, the overhead of calling qMemEquals()
and performing its "housecleaning chores" outweighs the benefits of its
fast comparison loop.

In other words, if someone were to optimize QString::operator==(QString)
to perform better for small strings, the total replacement would be a
done deal.

Regards,
Kent

[1] http://www.macieira.org/blog/2011/07/qstring-improved/
[2] http://pastebin.com/jmpNAAFG

Thiago Macieira

2011-09-30 11:49:09 UTC

Post by Kent Hansen
Hi,
You might have seen Thiago's blog about QStringLiteral [1], and his idea
on replacing QLatin1String usage by QStringLiteral in Qt (where possible).
I like the idea, but wanted to do some benchmarking first to get an
impression of the performance impact. [2]
My results so far (on Linux 32-bit) indicate that QString::appends are
way faster when switching to using QStringLiteral: 7x faster than
QLatin1String for a 2-character literal and 14x for a ~50-character literal.

Not unexpected. The conversion from Latin 1 to UTF-16 required for the
appending needs to be done at runtime for a QLatin1String, whereas the
compiler does it with QStringLiteral.

And that's assuming you didn't get an implicit conversion to QString
somewhere. QString::append has a QLatin1String overload, but some methods
don't and those cause a temporary to be created, which involves a malloc (non-
deterministic time).
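
To make the runtime-versus-compile-time point concrete, here is a minimal sketch in plain standard C++ (no Qt assumed). widenLatin1 and kPrebuilt are hypothetical names invented for this illustration; they are not Qt functions, and the real QString code is more involved.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper, not a Qt function: widen Latin-1 bytes to UTF-16 at
// runtime. Each Latin-1 byte maps to the code point with the same value, so
// the work is one store per character plus the buffer allocation.
inline std::u16string widenLatin1(const char *s, std::size_t len)
{
    std::u16string out;
    out.reserve(len);
    for (std::size_t i = 0; i < len; ++i)
        out.push_back(static_cast<char16_t>(static_cast<unsigned char>(s[i])));
    return out;
}

// A u"..." literal is already stored as UTF-16 by the compiler, so no
// per-character conversion is needed when the program runs.
constexpr char16_t kPrebuilt[] = u"caf\u00e9";
```

Both paths end up with the same UTF-16 data; the difference is only when the widening happens.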

Post by Kent Hansen
Now, the not-so-good news: operator==(QString) is a bit (just a bit)
slower than operator==(QLatin1String) for short strings.
It seems that, for short strings, the overhead of calling qMemEquals()
and performing its "housecleaning chores" outweighs the benefits of its
fast comparison loop.

Sounds like the result I found in the investigation of using SIMD in QString:
the best algorithm is the least complex possible. There are some better
algorithms for qMemEquals in tests/benchmarks/corelib/tools/qstring.cpp (the
ucstrncmp functions). But as the comment above qMemEquals shows, the
performance varies a lot depending on the architecture.

Post by Kent Hansen
In other words, if someone were to optimize QString::operator==(QString)
to perform better for small strings, the total replacement would be a
done deal.

The code is already there. Just replace qMemEquals with the contents of
ucstrncmp_sse2, but you need to keep the generic code for other architectures.
They'll still benefit from the unrolling of the loop for small strings (fewer
than 8 characters).
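
As a rough illustration of what unrolling for small strings can look like, here is a sketch in portable C++. It is not the ucstrncmp_sse2 code from the Qt benchmarks; equalsUtf16 is a hypothetical name, and the block size of 8 is taken from the remark above.

```cpp
#include <cstddef>

// Hypothetical equality check on UTF-16 data. The caller is assumed to have
// already verified that both strings have the same length.
inline bool equalsUtf16(const char16_t *a, const char16_t *b, std::size_t len)
{
    // Bulk path: compare 8 code units per iteration.
    while (len >= 8) {
        for (int i = 0; i < 8; ++i)
            if (a[i] != b[i])
                return false;
        a += 8;
        b += 8;
        len -= 8;
    }
    // Fewer than 8 code units left: fully unrolled through switch
    // fall-through, so short strings never pay for loop setup.
    switch (len) {
    case 7: if (a[6] != b[6]) return false; // fall through
    case 6: if (a[5] != b[5]) return false; // fall through
    case 5: if (a[4] != b[4]) return false; // fall through
    case 4: if (a[3] != b[3]) return false; // fall through
    case 3: if (a[2] != b[2]) return false; // fall through
    case 2: if (a[1] != b[1]) return false; // fall through
    case 1: if (a[0] != b[0]) return false; // fall through
    default: break;
    }
    return true;
}
```

Whether this beats a plain loop depends on the compiler and architecture, which is the caveat already made about qMemEquals.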

As for Neon optimisation, the lack of a "movemask" instruction like SSE2 makes
it very hard to produce optimal code. If you look at fromUtf8_neon, you'll see
you need to execute two Neon instructions, two comparisons and then rbit and
clz. If anyone wants to try this, be my guest. I won't be doing any more Neon
optimisations :-)

--
Thiago Macieira - thiago (AT) macieira.info - thiago (AT) kde.org
Software Architect - Intel Open Source Technology Center
PGP/GPG: 0x6EF45358; fingerprint:
E067 918B B660 DBD1 105C 966C 33F5 F005 6EF4 5358

Olivier Goffart

2011-09-30 12:11:33 UTC

You could have used templates instead of macros :-)

Post by Kent Hansen
My results so far (on Linux 32-bit) indicate that QString::appends are
way faster when switching to using QStringLiteral: 7x faster than
QLatin1String for a 2-character literal and 14x for a ~50-character literal.

That is because you append to a null string.
Appending a QString to a null string is equivalent to operator= (that is,
only setting a pointer).
Could you change the benchmark to append to a non-empty string?
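
The effect can be modelled with a toy implicitly shared string in standard C++. ToyString is a hypothetical type written only for this illustration; it mimics the behaviour described above and is not how QString is actually implemented.

```cpp
#include <memory>
#include <string>

// Toy model of an implicitly shared string (hypothetical, not QString).
struct ToyString {
    std::shared_ptr<const std::u16string> d;   // null pointer means null string

    static ToyString fromUtf16(const char16_t *s)
    {
        ToyString r;
        r.d = std::make_shared<std::u16string>(s);
        return r;
    }

    bool isNull() const { return !d; }

    ToyString &append(const ToyString &other)
    {
        if (isNull()) {
            // Nothing to preserve: share the other buffer, exactly like
            // assignment. No allocation, no character copying.
            d = other.d;
        } else if (!other.isNull()) {
            // The case a benchmark should measure: allocate a new buffer
            // and copy both parts into it.
            d = std::make_shared<std::u16string>(*d + *other.d);
        }
        return *this;
    }
};
```

A benchmark that always starts from a null string only ever exercises the first branch, so it measures a pointer copy rather than an append.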

Also, appending a QLatin1String does not currently use the SSE2 fromLatin1
code; it would be trivial to do so by extracting the fromLatin1 code into a
helper function.

It would be nice to compare strings of different length AND strings that
differ on the first char.
Or better, with actual data extracted from a qDebug in operator== for a
random application.

Post by Kent Hansen
In other words, if someone were to optimize QString::operator==(QString)
to perform better for small strings, the total replacement would be a
done deal.

Another thing that needs to be taken into account is that QStringLiteral takes
more memory in the binary. Hence, more cache misses.
This is difficult to show in a benchmark that runs over the same data all the
time.
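
The size difference of the character data alone can be checked at compile time. This sketch uses plain C++ literals as stand-ins and ignores whatever header QStringLiteral stores alongside the characters; the constant names are invented for the example.

```cpp
#include <cstddef>

// Storage for the same 5-character text in both encodings, terminator included.
constexpr std::size_t latin1Bytes = sizeof("hello");    // 6 * sizeof(char)
constexpr std::size_t utf16Bytes  = sizeof(u"hello");   // 6 * sizeof(char16_t)

static_assert(latin1Bytes == 6,
              "one byte per character plus the terminator");
static_assert(utf16Bytes == 6 * sizeof(char16_t),
              "one UTF-16 code unit per character plus the terminator");
```

On common platforms, where char16_t is two bytes wide, the UTF-16 literal takes twice the space of the Latin-1 one.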

Kent Hansen

2011-09-30 12:37:10 UTC

Post by Olivier Goffart

You could have used templates instead of macros :-)

Nope :)

Post by Olivier Goffart

Good point :)

Post by Olivier Goffart
Also, appending a QLatin1String does not currently use the SSE2 fromLatin1
code; it would be trivial to do so by extracting the fromLatin1 code into a
helper function.

So appending QLatin1String can be made faster?! That's horrible news, we
want QStringLiteral to beat it massively, remember :P

Post by Olivier Goffart

It would be nice to compare strings of different length AND strings that
differ on the first char.
Or better, with actual data extracted from a qDebug in operator== for a
random application.

Lots of good ideas here, please update the benchmark and pass it around :D

Post by Olivier Goffart

Post by Kent Hansen
In other words, if someone were to optimize QString::operator==(QString)
to perform better for small strings, the total replacement would be a
done deal.

Yeah. On a related note, I replaced as many QLatin1Strings by
QStringLiteral in QtCore as possible, and the library size increase was
1% (40K) on ia32.

Regards,
Kent

Thiago Macieira

2011-09-30 13:07:17 UTC

Post by Kent Hansen

Post by Olivier Goffart
Also, appending a QLatin1String does not currently use the SSE2 fromLatin1
code; it would be trivial to do so by extracting the fromLatin1 code into a
helper function.

So appending QLatin1String can be made faster?! That's horrible news, we
want QStringLiteral to beat it massively, remember :P

Heh, talk about "adjusting the data"...

Anyway, let me give you another argument: yes, you can make QString::append
faster for QLatin1String. You can also make QString::prepend faster. And you
can do the same for QString::indexOf, QString::startsWith, QString::endsWith, etc.

The point being: QLatin1String is an *additional* code path we'd need to
optimise. If we can instead use the same codepath that is already optimal,
it's probably better.

Post by Kent Hansen

Post by Olivier Goffart

It would be nice to compare strings of different length AND strings that
differ on the first char.
Or better, with actual data extracted from a qDebug in operator== for a
random application.

Lots of good ideas here, please update the benchmark and pass it around :D

Different-length comparison means qMemEquals isn't even called. Given that
QLatin1String carries the length just like QString (QStringLiteral) does,
those will have the exact same time.

The question of comparison time is only for same-length. Note that includes
startsWith and endsWith.
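
A sketch of why the different-length case costs the same for both types: once the length is stored next to the data, a mismatch is decided before any character is read. Utf16View and sameText are hypothetical names for this illustration, not Qt API.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical view type: a pointer plus a stored length.
struct Utf16View {
    const char16_t *data;
    std::size_t size;
};

inline bool sameText(const Utf16View &a, const Utf16View &b)
{
    if (a.size != b.size)
        return false;   // decided from the lengths alone, no memory compare
    // Only same-length strings reach the character comparison, which is
    // where the choice of comparison routine matters.
    return std::memcmp(a.data, b.data, a.size * sizeof(char16_t)) == 0;
}
```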

Post by Kent Hansen
Yeah. On a related note, I replaced as many QLatin1Strings by
QStringLiteral in QtCore as possible, and the library size increase was
1% (40K) on ia32.

I've got the same commit. It's a trivial increase.

The only drawback is that, due to a bug in gcc 4.6, the read-only data
initialised by a constexpr is placed in .data instead of .rodata. GCC 4.7 has
that fixed.

André Pönitz

2011-09-30 12:54:32 UTC

Post by Olivier Goffart
Another thing that needs to be taken into account is that QStringLiteral takes
more memory in the binary.

Actually quite a few of the string literals we have in Creator are essentially
plain ASCII "identifiers" that are only converted to QStrings to be able to
interface QVariantMap, QSettings etc.

I have this nagging gut feeling that it would be better for performance if
these would take QByteArray keys instead. If someone has the desperate wish
to use more than 7 bits for his key strings (I certainly don't), the convention
could just be that the encoding is implicitly assumed to be UTF-8.

I understand that this is (a) not possible to change, and (b) would mean
falling back to the Dark Ages of std::string-with-uncertain-encoding, but still,
we are discussing making a conversion fast that could be avoided altogether
in a lot of cases by just having another API.

Andre'

Konstantin Tokarev

2011-09-30 13:06:24 UTC

Post by André Pönitz
Actually quite a few of the string literals we have in Creator are essentially
plain ASCII "identifiers" that are only converted to QStrings to be able to
interface QVariantMap, QSettings etc.
I have this nagging gut feeling that it would be better for performance if
these would take QByteArray keys instead. If someone has the desperate wish
to use more than 7 bits for his key strings (I certainly don't), the convention
could just be that the encoding is implicitly assumed to be UTF-8.

+1

char * would be fine too

--
Regards,
Konstantin

Thiago Macieira

2011-09-30 13:29:04 UTC

Post by André Pönitz

Post by Olivier Goffart
Another thing that needs to be taken into account is that QStringLiteral takes
more memory in the binary.

Actually quite a few of the string literals we have in Creator are
essentially plain ASCII "identifiers" that are only converted to QStrings
to be able to interface QVariantMap, QSettings etc.
I have this nagging gut feeling that it would be better for performance if
these would take QByteArray keys instead. If someone has the desperate wish
to use more than 7 bits for his key strings (I certainly don't), the
convention could just be that the encoding is implicitly assumed to be
UTF-8.
I understand that this is (a) not possible to change, and (b) would mean
falling back to the Dark Ages of std::string-with-uncertain-encoding, but
still, we are discussing making a conversion fast that could be avoided
altogether in a lot of cases by just having another API.

For an application's internal data, if all it wants is to store 7-bit keys to
variants, it can simply use
QMap<QByteArray, QVariant>

It doesn't have to use QVariantMap. That's what std::basic_string does: if you
want to use something other than char, you can (hence, std::wstring).
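
The same idea can be sketched without a Qt build by using standard containers as stand-ins: std::map for QMap, std::string for QByteArray, and std::variant (C++17) for QVariant. The type names, the makeSettings helper and the keys below are all made up for the example.

```cpp
#include <map>
#include <string>
#include <variant>

// Stand-ins: std::string plays QByteArray, std::variant plays QVariant.
using Value = std::variant<int, double, std::string>;
using ByteKeyMap = std::map<std::string, Value>;

// Hypothetical application-internal store with 7-bit keys. The keys stay as
// bytes from literal to lookup, so no Latin-1 to UTF-16 conversion happens.
inline ByteKeyMap makeSettings()
{
    ByteKeyMap m;
    m["window/width"] = 800;
    m["editor/font"] = std::string("Monospace");
    return m;
}
```

With Qt available, the container would simply be declared as the QMap<QByteArray, QVariant> mentioned above.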

However, asking QSettings to have a QByteArray API is asking too much in my
opinion. That would mean either implicit conversions anyway into the internal
storage type, or increasing the complexity of every function dealing with the
multiple types of keys.

Neither option is interesting. Increasing the complexity of code means we may
have lingering bugs for years due to less-tested codepaths, missing fixes that
go to one branch but not the other, etc. It also means increased library size
and increased runtime due to the extra checks, compared to the library taking
one single type.

The implicit conversions just hide the problem. Instead of doing the right
thing and using QStringLiteral or caching your QStrings, you might be deceived
into writing bad code.

Stefan Majewsky

2011-10-02 12:22:59 UTC

Post by André Pönitz
I have this nagging gut feeling that it would be better for performance if
these would take QByteArray keys instead. If someone has the desperate wish
to use more than 7 bits for his key strings (I certainly don't), the convention
could just be that the encoding is implicitly assumed to be UTF-8.

+1

When I design library APIs, I tend to distinguish QByteArray vs.
QString as application-internal vs. user-visible strings. So
QByteArray is used mostly as a key in QMap/QHash and semantically
similar interfaces. If users are properly educated about this
difference in semantics between QByteArray and QString, I consider it
to be very powerful.

Of course that's not an option for Qt 5 because it would be quite an
invasive change, and would slow down ported applications that store
QString keys elsewhere and would need to convert them to QByteArray.

Greetings
Stefan

Konstantin Tokarev

2011-09-30 12:55:53 UTC

Post by Kent Hansen
Hi,
You might have seen Thiago's blog about QStringLiteral [1], and his idea
on replacing QLatin1String usage by QStringLiteral in Qt (where possible).
I like the idea, but wanted to do some benchmarking first to get an
impression of the performance impact. [2]
My results so far (on Linux 32-bit) indicate that QString::appends are
way faster when switching to using QStringLiteral: 7x faster than
QLatin1String for a 2-character literal and 14x for a ~50-character literal.
Now, the not-so-good news: operator==(QString) is a bit (just a bit)
slower than operator==(QLatin1String) for short strings.
It seems that, for short strings, the overhead of calling qMemEquals()
and performing its "housecleaning chores" outweighs the benefits of its
fast comparison loop.
In other words, if someone were to optimize QString::operator==(QString)
to perform better for small strings, the total replacement would be a
done deal.

Great news! Is it possible to use this superfast QStringLiteral with Qt 4.x?

--
Regards,
Konstantin

Olivier Goffart

2011-09-30 13:00:49 UTC

Post by Konstantin Tokarev
Great news! Is it possible to use this superfast QStringLiteral with Qt 4.x?

No. This is a binary incompatible change in QString.

Konstantin Tokarev

2011-09-30 13:20:58 UTC

Post by Olivier Goffart

Post by Konstantin Tokarev
Great news! Is it possible to use this superfast QStringLiteral with Qt 4.x?

No. This is a binary incompatible change in QString.

It's sad to realize that many embedded and legacy platforms are banned from
QString improvements because they don't support OpenGL ES 2.

--
Regards,
Konstantin

Thiago Macieira

2011-09-30 14:46:37 UTC

Post by Konstantin Tokarev

Post by Olivier Goffart

Post by Konstantin Tokarev
Great news! Is it possible to use this superfast QStringLiteral with Qt
4.x?

No. This is a binary incompatible change in QString.

It's sad to realize that many embedded and legacy platforms are banned from
QString improvements because they don't support OpenGL ES 2.

Install Mesa. I thought the message was clear. Some benchmarks showed that the
Mesa-based software OpenGL support is faster than the Qt raster engine.

If you want to backport the QString changes to Qt 4 in your project, you can.
We can't do it because of the binary incompatibility.

Don't forget to upgrade to GCC 4.6. Anything less and you're better off leaving
it as it is.

Thiago Macieira

2011-09-30 13:29:26 UTC

Post by Konstantin Tokarev
Great news! Is it possible to use this superfast QStringLiteral with Qt 4.x?

No. The blog explains why.