Discussion:
Merging the vector/array headers in QtCore
Thiago Macieira
2011-10-16 14:30:39 UTC
Permalink
When I talked to João this week, he expressed the desire to unify the header
code of QVector, QByteArray and QString. Since QByteArray is a vector of char
and QString is a vector of QChar / ushort, it makes sense to unify the code to
simplify maintenance. Whether specific optimisations in allocation policies are
needed, we can figure out later.

But given my last email, about also bringing in QList for small and movable
types into this merging, I pointed out that QList has one optimisation that
QVector doesn't have.

To support that feature, the QListData header currently is:

QAtomicInt ref;
int alloc;
int begin;
int end;
// size = 16 bytes, alignment = 4 bytes

There's also an 1-bit flag which I have not included. It can be stored in the
sign bit of one of the ints in the future.

In addition, the current QStringData and QByteArrayData headers are (not
including the flags):

QAtomicint ref;
int size;
int alloc;
qptrdiff offset;
// size = 16 (24) bytes, alignment = 4 (8) bytes

Note how there's a 4-byte padding gap before the "offset" member on 64-bit
platforms and how the data starts at an offset of 24 bytes. Given a 16-byte
aligned allocator, which is common on 64-bit platforms, the data starts at an
8 byte offset from the nearest 16-byte alignment, which is a problem for SSE2
optimisations.

I'd like to pick all of your brains for a solution that has the following
properties:

- size always 16 bytes
- contains a qptrdiff offset, to still allow for fromRawData (which we'd
then introduce to QVector too, but not QList)
- can store at least one 1-bit flag, but 2 would be preferred
- size calculation reasonably fast
--
Thiago Macieira - thiago (AT) macieira.info - thiago (AT) kde.org
Software Architect - Intel Open Source Technology Center
PGP/GPG: 0x6EF45358; fingerprint:
E067 918B B660 DBD1 105C 966C 33F5 F005 6EF4 5358
Olivier Goffart
2011-10-16 15:02:10 UTC
Permalink
Post by Thiago Macieira
When I talked to João this week, he expressed the desire to unify the header
code of QVector, QByteArray and QString. Since QByteArray is a vector of
char and QString is a vector of QChar / ushort, it makes sense to unify the
code to simplify maintenance. Whether specific optimisations in allocation
policies are needed, we can figure out later.
Appart from consistency in out internal structures, what is the gain?
What is the code that can be reuse? (Appart the declaration itself, i can't
think of any)

As this is an internal thing, this do not give us anything regarding the API
point of view.
And this keep us to have nice per/data structure optimisation.

In other word, i'm not really in favor of this.

(I think the sharing should happen at code level, by providing the same
interface which enable templated code (like QStringBuilder) to work of all
types.)
Post by Thiago Macieira
But given my last email, about also bringing in QList for small and movable
types into this merging, I pointed out that QList has one optimisation that
QVector doesn't have.
QAtomicInt ref;
int alloc;
int begin;
int end;
// size = 16 bytes, alignment = 4 bytes
There's also an 1-bit flag which I have not included. It can be stored in
the sign bit of one of the ints in the future.
In addition, the current QStringData and QByteArrayData headers are (not
QAtomicint ref;
int size;
int alloc;
qptrdiff offset;
// size = 16 (24) bytes, alignment = 4 (8) bytes
Note how there's a 4-byte padding gap before the "offset" member on 64-bit
platforms and how the data starts at an offset of 24 bytes. Given a 16-byte
aligned allocator, which is common on 64-bit platforms, the data starts at
an 8 byte offset from the nearest 16-byte alignment, which is a problem for
SSE2 optimisations.
One thing I would like to add is the hash, for QString (and potentialy
QByteArray)
see: http://qt.gitorious.org/qt/qtbase/merge_requests/62
Post by Thiago Macieira
I'd like to pick all of your brains for a solution that has the following
- size always 16 bytes
Not possible on 64bit
Or did you mean "multiple of"?
Post by Thiago Macieira
- contains a qptrdiff offset, to still allow for fromRawData (which we'd
then introduce to QVector too, but not QList)
And there again you are saying it will be different for QList and QVector
Or you want to put a lot of unions there (which negates the hole purpose)
Post by Thiago Macieira
- can store at least one 1-bit flag, but 2 would be preferred
The more flag you have, the more future proof you are if you want to keep
binary compatibility.
Post by Thiago Macieira
- size calculation reasonably fast
And add stuff like:
- can be generate at compile time and put in the .rodata (like we did for
QString)
Thiago Macieira
2011-10-16 16:57:44 UTC
Permalink
Post by Olivier Goffart
Post by Thiago Macieira
When I talked to João this week, he expressed the desire to unify the
header code of QVector, QByteArray and QString. Since QByteArray is a
vector of char and QString is a vector of QChar / ushort, it makes sense
to unify the code to simplify maintenance. Whether specific optimisations
in allocation policies are needed, we can figure out later.
Appart from consistency in out internal structures, what is the gain?
What is the code that can be reuse? (Appart the declaration itself, i can't
think of any)
As this is an internal thing, this do not give us anything regarding the API
point of view.
And this keep us to have nice per/data structure optimisation.
In other word, i'm not really in favor of this.
Why? The gains are the simplification and the limited code reuse. And as you
say, it's an internal thing.

What reasons are there not to do this? We'll have three classes with the exact
same members anyway...
Post by Olivier Goffart
(I think the sharing should happen at code level, by providing the same
interface which enable templated code (like QStringBuilder) to work of all
types.)
Sorry, I didn't get this part. What do you mean by it?
Post by Olivier Goffart
Post by Thiago Macieira
QAtomicint ref;
int size;
int alloc;
qptrdiff offset;
// size = 16 (24) bytes, alignment = 4 (8) bytes
Note how there's a 4-byte padding gap before the "offset" member on 64-bit
platforms and how the data starts at an offset of 24 bytes. Given a 16-byte
aligned allocator, which is common on 64-bit platforms, the data starts at
an 8 byte offset from the nearest 16-byte alignment, which is a problem for
SSE2 optimisations.
One thing I would like to add is the hash, for QString (and potentialy
QByteArray)
see: http://qt.gitorious.org/qt/qtbase/merge_requests/62
We discussed that but we're not satisfied that it will provide good results. It
means spending 4 bytes to store the hash as well as hardcoding the hashing
function until Qt 6. Someone mentioned that only 10% of the strings are ever
hashed. In particular, I am not ready to give up a 16-byte alignment for the
hashing.

Maybe João can give more details of his thinking.
Post by Olivier Goffart
Post by Thiago Macieira
I'd like to pick all of your brains for a solution that has the following
- size always 16 bytes
Not possible on 64bit
Or did you mean "multiple of"?
No, I meant 16 bytes exactly. I think it's possible, so I'll disregard your
"not possible" part.

If we can't make it fit 16 bytes, then we have a question to be asked: is 24
good enough or should we go for 32? 32 bytes gives us our 16-byte alignment
*provided* that the allocator is also doing that.

But if we have fromRawData, we'll need to support misaligned operations
anyway.
Post by Olivier Goffart
Post by Thiago Macieira
- contains a qptrdiff offset, to still allow for fromRawData (which we'd
then introduce to QVector too, but not QList)
And there again you are saying it will be different for QList and QVector
Or you want to put a lot of unions there (which negates the hole purpose)
QList<T> where T is a small movable type is a wrapper or typedef around
QVector<T>. QVector has the fromRawData option, whereas QList doesn't.
Post by Olivier Goffart
Post by Thiago Macieira
- can store at least one 1-bit flag, but 2 would be preferred
The more flag you have, the more future proof you are if you want to keep
binary compatibility.
Post by Thiago Macieira
- size calculation reasonably fast
- can be generate at compile time and put in the .rodata (like we did for
QString)
That implies having no pointers, which is why QString has that qptrdiff offset
in the first place. QVectorData and QListData today already don't have any
pointers.
--
Thiago Macieira - thiago (AT) macieira.info - thiago (AT) kde.org
Software Architect - Intel Open Source Technology Center
PGP/GPG: 0x6EF45358; fingerprint:
E067 918B B660 DBD1 105C 966C 33F5 F005 6EF4 5358
João Abecasis
2011-10-17 08:31:36 UTC
Permalink
Post by Thiago Macieira
Post by Olivier Goffart
One thing I would like to add is the hash, for QString (and potentialy
QByteArray)
see: http://qt.gitorious.org/qt/qtbase/merge_requests/62
We discussed that but we're not satisfied that it will provide good results. It
means spending 4 bytes to store the hash as well as hardcoding the hashing
function until Qt 6. Someone mentioned that only 10% of the strings are ever
hashed. In particular, I am not ready to give up a 16-byte alignment for the
hashing.
Maybe João can give more details of his thinking.
When we discussed this, no one seemed to be much in favor of the change, in subjective terms. One concrete show-stopper is the concern that it makes QString's hashing function part of the ABI.

Other concerns, as you mention, are the waste of space and whether the resulting improvement is worth the trouble -- so, 10% of strings are ever hashed, but are there places where we re-hash the same string over and over that other improvements in code would provide better results?

I haven't given this much more thought, but I'm wondering if there are places where introducing a QString-compatible QString-extension (say QHashedString) would provide the same benefits... That is something that can be played with before we change QString.

Cheers,


João
Thiago Macieira
2011-10-17 08:55:35 UTC
Permalink
Post by João Abecasis
When we discussed this, no one seemed to be much in favor of the change, in
subjective terms. One concrete show-stopper is the concern that it makes
QString's hashing function part of the ABI.
That's the worst part IMO -- assuming that the patch to calculate hashes at
compile time (C++11 feature) goes in too.
Post by João Abecasis
Other concerns, as you mention, are the waste of space and whether the
resulting improvement is worth the trouble -- so, 10% of strings are ever
hashed, but are there places where we re-hash the same string over and over
that other improvements in code would provide better results?
If we give up on the 16-byte header, then we may have room for storing a hash.
E.g.:

QAtomicInt ref;
int alloc;
qptrdiff begin;
qptrdiff end;
int flags;
int hash;
// size = 24 (32) bytes

We could even use a bit in the flags to indicate which version of the hashing
function was used, if we choose to change it later. That means hashing at
runtime all the constant strings that haven't been recompiled -- a solution
that is no worse than the current solution.

In theory, a compile-time hash could even use the reference counter field, as
long as it produces a negative number.
--
Thiago Macieira - thiago (AT) macieira.info - thiago (AT) kde.org
Software Architect - Intel Open Source Technology Center
PGP/GPG: 0x6EF45358; fingerprint:
E067 918B B660 DBD1 105C 966C 33F5 F005 6EF4 5358
Olivier Goffart
2011-10-17 09:14:01 UTC
Permalink
Post by João Abecasis
Post by Thiago Macieira
Post by Olivier Goffart
One thing I would like to add is the hash, for QString (and potentialy
QByteArray)
see: http://qt.gitorious.org/qt/qtbase/merge_requests/62
We discussed that but we're not satisfied that it will provide good
results. It means spending 4 bytes to store the hash as well as
hardcoding the hashing function until Qt 6. Someone mentioned that only
10% of the strings are ever hashed. In particular, I am not ready to
give up a 16-byte alignment for the hashing.
Maybe João can give more details of his thinking.
When we discussed this, no one seemed to be much in favor of the change, in
subjective terms. One concrete show-stopper is the concern that it makes
QString's hashing function part of the ABI.
QString's hashing is only part of the ABI because of the compile time hashes.

Anyway, it can easily be versionized (read the comment on the merge request)
that is, we add a uint stringVersion :8; in the flags, and qHash become
if (stringData->hash && stringData->stringVersion == _Current_Version)
return stringData->hash;
//else ignore the embedded hash

So this is not really an issue, is it?
Post by João Abecasis
Other concerns, as you mention, are the waste of space and whether the
resulting improvement is worth the trouble
For the space, 16bytes is not much, just enough to store "QString" (in utf16
null-terminated)

(at the same time, some people argue over QStringLitteral should be used
instead of QLatin1String even in places where it does not matter from a
performence point of view, while taking twice as much memory)
Post by João Abecasis
-- so, 10% of strings are ever
hashed, but are there places where we re-hash the same string over and over
that other improvements in code would provide better results?
Well, this is why QtDeclarative has its internal QHashedString.
Post by João Abecasis
I haven't given this much more thought, but I'm wondering if there are
places where introducing a QString-compatible QString-extension (say
QHashedString) would provide the same benefits... That is something that
can be played with before we change QString.
The problem is that it is that the hash is lost each time we go back to a
QString.
--
Olivier

P.S:

This common pattern is problematic because there is no way to avoid computing
twice the hash in the cache-miss case. (even if one could argue it is not a
problem because cache-miss is meant to be a slow case)

auto it = hash.find(key);
if (it == hash.end())
it = hash.insert(key, computeValue(key));
return *it;
André Pönitz
2011-10-17 08:46:03 UTC
Permalink
Post by Thiago Macieira
Post by Olivier Goffart
[...]
One thing I would like to add is the hash, for QString (and potentialy
QByteArray)
see: http://qt.gitorious.org/qt/qtbase/merge_requests/62
We discussed that but we're not satisfied that it will provide good results. It
means spending 4 bytes to store the hash as well as hardcoding the hashing
function until Qt 6. Someone mentioned that only 10% of the strings are ever
hashed. In particular, I am not ready to give up a 16-byte alignment for the
hashing.
Re "hash": Can't the hash be put at ((char*)data)[-4] (plus the necessary
adjustments)?

Re "QList-and-QVector" merge: I'd really like to see reference counting
_removed_ from QVector. We do need _one_ easy-to-use-no-overhead
container. That probably make that merge harder....

Andre'
Robin Burchell
2011-10-17 08:58:04 UTC
Permalink
hi,
Post by André Pönitz
Re "QList-and-QVector" merge: I'd really like to see reference counting
_removed_ from QVector. We do need _one_ easy-to-use-no-overhead
container. That probably make that merge harder....
isn't QVarLengthArray what you're after?

thanks,

robin
André Pönitz
2011-10-17 09:18:41 UTC
Permalink
Post by Robin Burchell
hi,
Post by André Pönitz
Re "QList-and-QVector" merge: I'd really like to see reference counting
_removed_ from QVector. We do need _one_ easy-to-use-no-overhead
container. That probably make that merge harder....
isn't QVarLengthArray what you're after?
Except for the ugly name, a few missing convenience functions like 'indexOf',
and the second template argument, sure ;-}

But QVector without implicit sharing would fit the same bill.

Given that QVector and QList cover the same ground in a lot of cases
and QVector is not heavily used in Qt API (compared to QList) tweaking
QVector seems to be the better choice from my perspective.

Andre'
Robin Burchell
2011-10-17 09:27:30 UTC
Permalink
Post by André Pönitz
Post by Robin Burchell
isn't QVarLengthArray what you're after?
Except for the ugly name, a few missing convenience functions like 'indexOf',
well... yeah, i'm sure it could do with a better name, and some more
convenience-of-use. point is, it's there now, even if it's not ideal
Post by André Pönitz
and the second template argument, sure ;-}
you don't have to use that - at least off the top of my head, it has a
default value.
Post by André Pönitz
Given that QVector and QList cover the same ground in a lot of cases
and QVector is not heavily used in Qt API (compared to QList) tweaking
QVector seems to be the better choice from my perspective.
If they were an internal implementation detail, I'd agree. given that
you don't know how they are used outside of Qt, I don't think this is
a safe assumption - I've seen (and written) rather a lot of code
copying these around, and thus it's not something I'd personally want
to do.

thanks,

robin
André Pönitz
2011-10-17 10:57:15 UTC
Permalink
Post by Robin Burchell
Post by André Pönitz
Post by Robin Burchell
isn't QVarLengthArray what you're after?
Except for the ugly name, a few missing convenience functions like 'indexOf',
well... yeah, i'm sure it could do with a better name, and some more
convenience-of-use. point is, it's there now, even if it's not ideal
Post by André Pönitz
and the second template argument, sure ;-}
you don't have to use that - at least off the top of my head, it has a
default value.
[With impact on the ability to use forward declarations. But yes, this
is not really dramatic either.]
Post by Robin Burchell
Post by André Pönitz
Given that QVector and QList cover the same ground in a lot of cases
and QVector is not heavily used in Qt API (compared to QList) tweaking
QVector seems to be the better choice from my perspective.
If they were an internal implementation detail, I'd agree. given that
you don't know how they are used outside of Qt, I don't think this is
a safe assumption - I've seen (and written) rather a lot of code
copying these around, and thus it's not something I'd personally want
to do.
I have by now seen several ("numerical") projects initially using QVector also
for "main storage", only to replace it by "anything else" rather quickly.

So while it may not safe to assume that people do not use QVector, I'd say
it's a rather safe bet to assume that QVector is not used in critical code by
people who do measure performance and code size. And the others don't care
and/or won't notice.

Also note that dropping the implicit sharing does not necessarily translate
into "returning a QVector creates a deep copy". E.g. gcc applies RVO rather
aggressively, even at -O0.

Well, maybe the solution is really to make the QVarLengthArray interface
more "complete" and have the docs refer to it more often, so the typical
choice is not "QList vs QVector" but "QList vs QVarLengthArray". Ugly....

Andre'

PS: If you want to see something _really_ scary, run

#include <QVector>
QVector<int> bar()
{
QVector<int> c;
c.append(1);
return c;
}

through gcc -S (with either -O2 or -O0) and compare it to, say,

#include <vector>
std::vector<int> foo()
{
std::vector<int> c;
c.push_back(1);
return c;
}
Robin Burchell
2011-10-17 11:04:36 UTC
Permalink
Post by André Pönitz
Well, maybe the solution is really to make the QVarLengthArray interface
more "complete" and have the docs refer to it more often, so the typical
choice is not "QList vs QVector" but "QList vs QVarLengthArray". Ugly....
now is a good time to suggest a new name, if you can think of a better
one. Given that it's not used all that often, and it *would* just be a
case of renaming, perhaps it'd stand a chance of getting through? why
not just QArray? :)

...heh, and straight after googling that name just to make sure it wasn't used:
http://doc.qt.nokia.com/qt3d-snapshot/qarray.html
Thiago Macieira
2011-10-17 12:25:34 UTC
Permalink
Post by André Pönitz
Also note that dropping the implicit sharing does not necessarily translate
into "returning a QVector creates a deep copy". E.g. gcc applies RVO rather
aggressively, even at -O0.
And QVector now has a move constructor.
--
Thiago Macieira - thiago (AT) macieira.info - thiago (AT) kde.org
Software Architect - Intel Open Source Technology Center
PGP/GPG: 0x6EF45358; fingerprint:
E067 918B B660 DBD1 105C 966C 33F5 F005 6EF4 5358
João Abecasis
2011-10-17 09:01:17 UTC
Permalink
Post by André Pönitz
Post by Thiago Macieira
Post by Olivier Goffart
[...]
One thing I would like to add is the hash, for QString (and potentialy
QByteArray)
see: http://qt.gitorious.org/qt/qtbase/merge_requests/62
We discussed that but we're not satisfied that it will provide good results. It
means spending 4 bytes to store the hash as well as hardcoding the hashing
function until Qt 6. Someone mentioned that only 10% of the strings are ever
hashed. In particular, I am not ready to give up a 16-byte alignment for the
hashing.
Re "hash": Can't the hash be put at ((char*)data)[-4] (plus the necessary
adjustments)?
Re "QList-and-QVector" merge: I'd really like to see reference counting
_removed_ from QVector. We do need _one_ easy-to-use-no-overhead
container. That probably make that merge harder....
I don't think we can remove reference counting from QVector. That said, I do want to give you access to a non-reference-checked version of it. This would be available on detach, probably through the use of a private base class:

template <class T> QUncheckedVector { ... QVectorData<T> *d; };
template <class T> QVector : private QUncheckedVector<T> { ... QUncheckedVector & detach(); ... };

I have a toy implementation of the above, but haven't tried to bring it into Qt, yet. AFAICS, this would allow give you (or allow you to access) the best of both worlds.

Cheers,


João
Thiago Macieira
2011-10-17 09:01:50 UTC
Permalink
Post by André Pönitz
Re "hash": Can't the hash be put at ((char*)data)[-4] (plus the necessary
adjustments)?
Yes, and it can also be stored past the end of the data and a suitable bit set
in the flags.

In fact, that's not a bad idea: storing an extra header *past* the end of the
data. It wouldn't affect the alignment constraints.
Post by André Pönitz
Re "QList-and-QVector" merge: I'd really like to see reference counting
_removed_ from QVector. We do need _one_ easy-to-use-no-overhead
container. That probably make that merge harder....
João had some ideas on how to do this. When we talked, he basically wanted the
same API, on the same QVectorData header, but without using the reference
counter field.

João?
--
Thiago Macieira - thiago (AT) macieira.info - thiago (AT) kde.org
Software Architect - Intel Open Source Technology Center
PGP/GPG: 0x6EF45358; fingerprint:
E067 918B B660 DBD1 105C 966C 33F5 F005 6EF4 5358
André Pönitz
2011-10-17 09:44:10 UTC
Permalink
Post by Thiago Macieira
Post by André Pönitz
Re "hash": Can't the hash be put at ((char*)data)[-4] (plus the necessary
adjustments)?
Yes, and it can also be stored past the end of the data and a suitable bit set
in the flags.
In fact, that's not a bad idea: storing an extra header *past* the end of the
data. It wouldn't affect the alignment constraints.
Why "past"? Don't we typically get memory allocated at 16*n+8, and would
like to use 16*m (with m = n+1 perhaps) instead? Adjusting it would leave
us with a "natural" gap of 8 bytes in front of the "main" data, 4 of which
could be used for the hash.

Actually, with 4 more left, couldn't even the "alloc" field could go there?
It's rarely accessed, and would take some pressure off the main structure.
Post by Thiago Macieira
Post by André Pönitz
Re "QList-and-QVector" merge: I'd really like to see reference counting
_removed_ from QVector. We do need _one_ easy-to-use-no-overhead
container. That probably make that merge harder....
João had some ideas on how to do this. When we talked, he basically wanted the
same API, on the same QVectorData header, but without using the reference
counter field.
If that works, it should fit the bill, too.

Andre'
Thiago Macieira
2011-10-17 10:56:48 UTC
Permalink
Post by André Pönitz
In fact, that's not a bad idea: storing an extra header past the end of
the data. It wouldn't affect the alignment constraints.
Why "past"? Don't we typically get memory allocated at 16*n+8, and would
like to use 16*m (with m = n+1 perhaps) instead? Adjusting it would leave
us with a "natural" gap of 8 bytes in front of the "main" data, 4 of which
could be used for the hash.
Actually, with 4 more left, couldn't even the "alloc" field could go there?
It's rarely accessed, and would take some pressure off the main structure.
You know, every time I look at a heap pointer value in 32-bit Linux in a
debugger, it ends in hex 8 . That is, like you said, the pointer value is a
16*n+8.

However, the histogram when I profiled the string data shows that pointers of
16*n also happen and are quite common. I'd say that over a large sample, they
are probably close to half of the heap pointers.

Given a 16+8 byte headers, we could position the 8 bytes either before or
after the data so that we align the data.
--
Thiago Macieira - thiago (AT) macieira.info - thiago (AT) kde.org
Software Architect - Intel Open Source Technology Center
PGP/GPG: 0x6EF45358; fingerprint:
E067 918B B660 DBD1 105C 966C 33F5 F005 6EF4 5358
Oswald Buddenhagen
2011-10-17 11:59:47 UTC
Permalink
Post by André Pönitz
Actually, with 4 more left, couldn't even the "alloc" field could go there?
It's rarely accessed, and would take some pressure off the main structure.
it makes append()ing by char extremely slow.
that's also the reason why i'm skeptical about putting the capacity bit
into the size field.
João Abecasis
2011-10-17 08:46:55 UTC
Permalink
Post by Thiago Macieira
Post by Olivier Goffart
- can be generate at compile time and put in the .rodata (like we did for
QString)
That implies having no pointers, which is why QString has that qptrdiff offset
in the first place. QVectorData and QListData today already don't have any
pointers.
This is something I want to support more generally: mmap'able data structures, not necessarily compile time generated. At the minimum I'd like to get an array (QVector) and a dictionary (QHash/QMap), in particular dictionaries where the key is QString/QByteArray. We'd have to be creative and directly abuse QStringData/QByteArrayData (or a unified QArrayData) for this to happen.

That's in addition to the existing support for QByteArray and QString.

Another variation of the array classes I've wondered about is one with a detached reference count, say:

QStringRef
{
QAtomicInt *ref;
int size;
int alloc;
qptrdiff offset;
};

Provided we can specify how data is deleted when the reference count goes to zero (if at all), something along those lines would allow allocation-less manipulation of array data coming from external sources.

Cheers,


João
Thiago Macieira
2011-10-17 09:10:46 UTC
Permalink
Post by João Abecasis
This is something I want to support more generally: mmap'able data
structures, not necessarily compile time generated. At the minimum I'd like
to get an array (QVector) and a dictionary (QHash/QMap), in particular
dictionaries where the key is QString/QByteArray. We'd have to be creative
and directly abuse QStringData/QByteArrayData (or a unified QArrayData) for
this to happen.
I want that too! But I just don't think we can have that in the standard
QVector or especially QHash. Though technically, if you have:

QVector<QStringData> vector = QVector<QStringData>::fromRawData(ptr);

You can write:

QString s = vector.at(0);

Since QString is just a pointer to QStringData and automatically (implicitly)
casts from it.
Post by João Abecasis
That's in addition to the existing support for QByteArray and QString.
Another variation of the array classes I've wondered about is one with a
QStringRef
{
QAtomicInt *ref;
int size;
int alloc;
qptrdiff offset;
};
Provided we can specify how data is deleted when the reference count goes to
zero (if at all), something along those lines would allow allocation-less
manipulation of array data coming from external sources.
I don't like a pointer to the reference count... that means an additional
indirection when doing counting up and down.

As for having a "deleter function", like QSharedPointer, it makes sense. It
would allow implementations to finally use fromRawData on mmap'ed data and be
able to eventually free that data.
--
Thiago Macieira - thiago (AT) macieira.info - thiago (AT) kde.org
Software Architect - Intel Open Source Technology Center
PGP/GPG: 0x6EF45358; fingerprint:
E067 918B B660 DBD1 105C 966C 33F5 F005 6EF4 5358
Thiago Macieira
2011-10-16 19:09:44 UTC
Permalink
Post by Thiago Macieira
- size always 16 bytes
- contains a qptrdiff offset, to still allow for fromRawData (which we'd
then introduce to QVector too, but not QList)
- can store at least one 1-bit flag, but 2 would be preferred
- size calculation reasonably fast
Here's an idea:
QAtomicInt ref;
int alloc;
union {
qptrdiff offset;
struct { int begin; int end; };
};
// size = 16 bytes

Allocation sizes are always positive, so the sign bit in alloc is free for our
use. If the sign bit is set (alloc < 0) then we're dealing with fromRawData.
If it's not set, we're dealing with data we allocated ourselves.

The two 1-bit flags can be stored in the sign bits of begin and end, but only
for own-allocated data. Currently, the two flags we use are sharable and
capacity, neither of which makes sense for raw data.

We'd have:

int size() const { return d->alloc < 0 ? -d->alloc : d->end - d->begin; }
T *data() { return d->array() + (d->alloc < 0 ? d->offset : d->begin); }

Advantages:
* 16 bytes on all platforms, regardless of pointer size
* 2 bits available for flags
* raw data is possible
* size calculation involves one extra comparison (the subtraction would have
been there anyway)
* the data() function above does not actually do any comparison on 32-bit
platforms, as offset and begin always occupy the same space
* the calculation to increase the size upon reallocation like:
if (newAlloc > d->alloc)
does not require special code to deal with raw data

Disadvantages:
* No wiggle room for *any* future expansion
* Storing the flags in the sign bits require bit fields or other manipulations
--
Thiago Macieira - thiago (AT) macieira.info - thiago (AT) kde.org
Software Architect - Intel Open Source Technology Center
PGP/GPG: 0x6EF45358; fingerprint:
E067 918B B660 DBD1 105C 966C 33F5 F005 6EF4 5358
Thiago Macieira
2011-10-16 19:55:16 UTC
Permalink
Post by Thiago Macieira
QAtomicInt ref;
int alloc;
union {
qptrdiff offset;
struct { int begin; int end; };
};
// size = 16 bytes
And here are two possibilities admitting defeat and going over 16 bytes:

Option 1:

QAtomicInt ref;
int alloc;
union {
qptrdiff begin;
qint64 dummy;
};
int end;
int flags;
// size = 24 bytes

Advantages:
* 32 bits of flags available, reserving room for future expansion
* no fiddling with sign bits anywhere

Disadvantages:
* 32 bits wasted on 32-bit platforms, which will never be used
* assuming an allocator aligning to 16 bytes, the start of the data will
always be 8 bytes off, incurring performance penalty with SSE2 operations
(> 99% of the cases)
* QVectors of SSE types will have 8 bytes of padding

Option 2:

QAtomicInt ref;
int flags;
union {
qptrdiff alloc;
qint64 dummy;
};
qptrdiff begin;
qptrdiff end;
// size = 24 (32) bytes

Advantages:
* 32 bits of flags available
* size multiple of 16 on 64-bit platforms, for best SSE2 performance
* full 64-bit sizes for 64-bit machines, allowing for allocation of more than
2 GB of data. The same header could be used for a QHugeVector class that
operates on signed 64-bit sizes, allowing up to 8388608 TB of data
* No padding required for QVectors of SSE types

Disadvantages:
* 100% bigger than the original structure, 50% bigger than the Option 1
* 32 bits wasted on 32-bit platforms

On 32-bit machines, if the allocator produces 16-byte-aligned memory regions,
we'll be wrong on >95% of the cases, causing SSE2 performance penalties.
However, if the allocator produces 8-byte-algined memory regions, as malloc in
glibc does, we'll be wrong just over 50% of the cases whether the structure is
24 or 32 bytes long. So we gain nothing by making it 32 bytes long on 32-bit
machines.

The %-age of the use-cases is based on my experience with attempting SIMD
optimisations on QString. Over a large sample, I found out that 95%-99% of the
data comes from QString's own allocations and the rest (1-5%) comes from
fromRawData. The strings in fromRawData are evenly distributed across all
possible alignments, the strings allocated by QString are evenly distributed
across both possibilities on 32-bit machines.

In other words, the histogram of QString data alignments, on a 32-bit machine
with an 8-byte-aligning allocator (like glibc's) should be roughly like the
following, with both a 16, 24 or 32-byte header:

0 48.5%
2 0.5%
4 0.5%
6 0.5%
8 48.5%
10 0.5%
12 0.5%
14 0.5%

With a 16- or 32-byte header with an allocator giving aligned-to-16 memory
regions, we should see:

0 96.5%
2 0.5%
4 0.5%
6 0.5%
8 0.5%
10 0.5%
12 0.5%
14 0.5%

To make the 32-bit structure fit the latter profile above, we'd need to add
another 8 bytes to the header (bringing the total wastage to 12 bytes) and
hope for an allocator that aligns to 16 bytes. Using posix_memalign or
equivalent functions is likely to simply cause another 8 bytes of overhead
inside the allocators.

An alternative, and IMHO better, approach would be to always allocate 8 bytes
more than strictly needed and force d->begin to the 16-byte boundary. That
means that d->begin == 4 whenever d is misaligned. This approach would allow
us to achieve the above profile even on systems with allocators giving 8-byte-
aligned pointers, such as glibc 32-bit.

It would also allow us to adapt on-the-fly if the allocator is updated and
starts to give us 16-byte aligned pointers on 32-bit, which would otherwise be
the worst case scanerio below:

0 0.5%
2 0.5%
4 0.5%
6 0.5%
8 96.5%
10 0.5%
12 0.5%
14 0.5%
--
Thiago Macieira - thiago (AT) macieira.info - thiago (AT) kde.org
Software Architect - Intel Open Source Technology Center
PGP/GPG: 0x6EF45358; fingerprint:
E067 918B B660 DBD1 105C 966C 33F5 F005 6EF4 5358
l***@nokia.com
2011-10-17 08:25:01 UTC
Permalink
Do we really believe we need the 32 flags? We need capacity. sharable is
only used internally right now (in qiterator.h). Have we thought if there
are ways to avoid needing it?

I like the idea of getting the header into 16 bytes at least on 32bit
platforms. We don't have to be as strict on 64bit IMO, as these platforms
should usually have some more memory available anyway.

Cheers,
Lars
Post by Thiago Macieira
Post by Thiago Macieira
QAtomicInt ref;
int alloc;
union {
qptrdiff offset;
struct { int begin; int end; };
};
// size = 16 bytes
QAtomicInt ref;
int alloc;
union {
qptrdiff begin;
qint64 dummy;
};
int end;
int flags;
// size = 24 bytes
* 32 bits of flags available, reserving room for future expansion
* no fiddling with sign bits anywhere
* 32 bits wasted on 32-bit platforms, which will never be used
* assuming an allocator aligning to 16 bytes, the start of the data will
always be 8 bytes off, incurring performance penalty with SSE2 operations
(> 99% of the cases)
* QVectors of SSE types will have 8 bytes of padding
QAtomicInt ref;
int flags;
union {
qptrdiff alloc;
qint64 dummy;
};
qptrdiff begin;
qptrdiff end;
// size = 24 (32) bytes
* 32 bits of flags available
* size multiple of 16 on 64-bit platforms, for best SSE2 performance
* full 64-bit sizes for 64-bit machines, allowing for allocation of more than
2 GB of data. The same header could be used for a QHugeVector class that
operates on signed 64-bit sizes, allowing up to 8388608 TB of data
* No padding required for QVectors of SSE types
* 100% bigger than the original structure, 50% bigger than the Option 1
* 32 bits wasted on 32-bit platforms
On 32-bit machines, if the allocator produces 16-byte-aligned memory regions,
we'll be wrong on >95% of the cases, causing SSE2 performance penalties.
However, if the allocator produces 8-byte-algined memory regions, as malloc in
glibc does, we'll be wrong just over 50% of the cases whether the structure is
24 or 32 bytes long. So we gain nothing by making it 32 bytes long on 32-bit
machines.
The %-age of the use-cases is based on my experience with attempting SIMD
optimisations on QString. Over a large sample, I found out that 95%-99% of the
data comes from QString's own allocations and the rest (1-5%) comes from
fromRawData. The strings in fromRawData are evenly distributed across all
possible alignments, the strings allocated by QString are evenly distributed
across both possibilities on 32-bit machines.
In other words, the histogram of QString data alignments, on a 32-bit machine
with an 8-byte-aligning allocator (like glibc's) should be roughly like the
0 48.5%
2 0.5%
4 0.5%
6 0.5%
8 48.5%
10 0.5%
12 0.5%
14 0.5%
With a 16- or 32-byte header with an allocator giving aligned-to-16 memory
0 96.5%
2 0.5%
4 0.5%
6 0.5%
8 0.5%
10 0.5%
12 0.5%
14 0.5%
To make the 32-bit structure fit the latter profile above, we'd need to add
another 8 bytes to the header (bringing the total wastage to 12 bytes) and
hope for an allocator that aligns to 16 bytes. Using posix_memalign or
equivalent functions is likely to simply cause another 8 bytes of overhead
inside the allocators.
An alternative, and IMHO better, approach would be to always allocate 8 bytes
more than strictly needed and force d->begin to the 16-byte boundary. That
means that d->begin == 4 whenever d is misaligned. This approach would allow
us to achieve the above profile even on systems with allocators giving 8-byte-
aligned pointers, such as glibc 32-bit.
It would also allow us to adapt on-the-fly if the allocator is updated and
starts to give us 16-byte aligned pointers on 32-bit, which would otherwise be
0 0.5%
2 0.5%
4 0.5%
6 0.5%
8 96.5%
10 0.5%
12 0.5%
14 0.5%
--
Thiago Macieira - thiago (AT) macieira.info - thiago (AT) kde.org
Software Architect - Intel Open Source Technology Center
E067 918B B660 DBD1 105C 966C 33F5 F005 6EF4 5358
_______________________________________________
Qt5-feedback mailing list
http://lists.qt.nokia.com/mailman/listinfo/qt5-feedback
Thiago Macieira
2011-10-17 08:44:39 UTC
Permalink
Post by l***@nokia.com
Do we really believe we need the 32 flags? We need capacity. sharable is
only used internally right now (in qiterator.h). Have we thought if there
are ways to avoid needing it?
I like the idea of getting the header into 16 bytes at least on 32bit
platforms. We don't have to be as strict on 64bit IMO, as these platforms
should usually have some more memory available anyway.
Well, no, we don't need 32 flags today. We only need 2. But who knows what we
may need in the future? When I introduced the feature of aligned allocation to
some of the containers, in one of them having access to an unused flag bit
allowed the implementation to happen.

Using the sign bits is ugly but possible. On x86, x86-64, IA-64 and ARM it
means one more instruction (one AND), on MIPS it's 3 more (the AND and two
instructions to load the value 0x7fffffff into a register).

I just unrolled the flags because we had enough space left if we give up trying
to get to 16 bytes.


The important thing is that the header is a multiple of 8 bytes on 32-bit
platforms and a multiple of 16 bytes on 64-bit platforms. What we mustn't do
is a multiple of 8 and not a multiple of 16 (i.e., 24 bytes) on 64-bit
platforms.
--
Thiago Macieira - thiago (AT) macieira.info - thiago (AT) kde.org
Software Architect - Intel Open Source Technology Center
PGP/GPG: 0x6EF45358; fingerprint:
E067 918B B660 DBD1 105C 966C 33F5 F005 6EF4 5358
Thiago Macieira
2011-10-17 11:10:40 UTC
Permalink
Post by Thiago Macieira
Post by Thiago Macieira
QAtomicInt ref;
int alloc;
union {
qptrdiff offset;
struct { int begin; int end; };
};
// size = 16 bytes
QAtomicInt ref;
int alloc;
union {
qptrdiff begin;
qint64 dummy;
};
int end;
int flags;
// size = 24 bytes
QAtomicInt ref;
int flags;
union {
qptrdiff alloc;
qint64 dummy;
};
qptrdiff begin;
qptrdiff end;
// size = 24 (32) bytes
Here's another suggestion now, with "split personality". It's not very good,
so suggestions to improve are very much welcome.

Option 3:
struct QArrayData {
QAtomicInt ref;
int begin;
int size;
int flags;
// size = 16 bytes
};

struct QArrayRawData {
T *offset;
void (*deleterFunction)(T *);
};

struct QArrayOwnData {
int alloc;
int hash;
...
};

Somewhere in the fixed header, we need an indication of whether the data is raw
or allocated by us. It can be either a bit in the flags or a negative begin
value (begin is unused if offset is used). I replaced the end value with size,
which means the number of elements allocated but unused past the end is:
alloc - begin - size

However, the calculation of the beginning of the data pointer now involves a
comparison:
T *data()
{
if (d->isRawData()) return d->rawData()->offset;
return d->array() + d->begin;
}
--
Thiago Macieira - thiago (AT) macieira.info - thiago (AT) kde.org
Software Architect - Intel Open Source Technology Center
PGP/GPG: 0x6EF45358; fingerprint:
E067 918B B660 DBD1 105C 966C 33F5 F005 6EF4 5358
l***@nokia.com
2011-10-17 12:11:09 UTC
Permalink
Post by Thiago Macieira
Post by Thiago Macieira
Post by Thiago Macieira
QAtomicInt ref;
int alloc;
union {
qptrdiff offset;
struct { int begin; int end; };
};
// size = 16 bytes
QAtomicInt ref;
int alloc;
union {
qptrdiff begin;
qint64 dummy;
};
int end;
int flags;
// size = 24 bytes
QAtomicInt ref;
int flags;
union {
qptrdiff alloc;
qint64 dummy;
};
qptrdiff begin;
qptrdiff end;
// size = 24 (32) bytes
Here's another suggestion now, with "split personality". It's not very good,
so suggestions to improve are very much welcome.
struct QArrayData {
QAtomicInt ref;
int begin;
int size;
int flags;
// size = 16 bytes
};
struct QArrayRawData {
T *offset;
void (*deleterFunction)(T *);
};
struct QArrayOwnData {
int alloc;
int hash;
...
};
Somewhere in the fixed header, we need an indication of whether the data is raw
or allocated by us. It can be either a bit in the flags or a negative begin
value (begin is unused if offset is used). I replaced the end value with size,
alloc - begin - size
However, the calculation of the beginning of the data pointer now involves a
T *data()
{
if (d->isRawData()) return d->rawData()->offset;
return d->array() + d->begin;
}
Aren't we now making things way to complicated?

If we go for storing the hash in the string as well, we can't do it in 16
bytes, so 24 is our next best option on 32bit systems. How about a simple

struct Data {
QRefCount ref;
int alloc;
uint32 flags;
uint32 hash;
int size; // or end, depending on what's more convenient
qptrdiff begin;
}

This gives 24 bytes on 32bit systems, 32 on 64bit. The hash stays 0 for
compile time strings.

Another option we discussed earlier (at least for QString and QByteArray
was to remove the fromRawData() functionality. In that case we could
simply do:

struct QString/ByteArrayData {
QRefCount ref;
int alloc;
int size; // or end, depending on what's more convenient
uint32 hash;
}

and try to squeeze the required bits into the sign bits of alloc and size.


Cheers,
Lars
Thiago Macieira
2011-10-17 14:18:11 UTC
Permalink
Post by l***@nokia.com
Aren't we now making things way to complicated?
Yes.
Post by l***@nokia.com
If we go for storing the hash in the string as well, we can't do it in 16
bytes, so 24 is our next best option on 32bit systems. How about a simple
struct Data {
QRefCount ref;
int alloc;
uint32 flags;
uint32 hash;
int size; // or end, depending on what's more convenient
qptrdiff begin;
}
This gives 24 bytes on 32bit systems, 32 on 64bit. The hash stays 0 for
compile time strings.
Right. On 64-bit systems, there's a 4-byte padding before "begin", but I guess
that is intentional.
Post by l***@nokia.com
Another option we discussed earlier (at least for QString and QByteArray
was to remove the fromRawData() functionality. In that case we could
[snip]

I'd prefer the above option though. Your solution also removes the prepend
optimisation.

Even though QString("hi") has 4 bytes of payload and 24 (32) bytes of overhead
-- 85.7% (88.8%).
--
Thiago Macieira - thiago (AT) macieira.info - thiago (AT) kde.org
Software Architect - Intel Open Source Technology Center
PGP/GPG: 0x6EF45358; fingerprint:
E067 918B B660 DBD1 105C 966C 33F5 F005 6EF4 5358
l***@nokia.com
2011-10-17 20:21:21 UTC
Permalink
Post by Thiago Macieira
Post by l***@nokia.com
Aren't we now making things way to complicated?
Yes.
Post by l***@nokia.com
If we go for storing the hash in the string as well, we can't do it in 16
bytes, so 24 is our next best option on 32bit systems. How about a simple
struct Data {
QRefCount ref;
int alloc;
uint32 flags;
uint32 hash;
int size; // or end, depending on what's more convenient
qptrdiff begin;
}
This gives 24 bytes on 32bit systems, 32 on 64bit. The hash stays 0 for
compile time strings.
Right. On 64-bit systems, there's a 4-byte padding before "begin", but I guess
that is intentional.
Yep. That's why I ordered them this way :)
Post by Thiago Macieira
Post by l***@nokia.com
Another option we discussed earlier (at least for QString and QByteArray
was to remove the fromRawData() functionality. In that case we could
[snip]
I'd prefer the above option though. Your solution also removes the prepend
optimisation.
You can do the prepend optimization with the structure above. You can let
begin point into the string data and have some space at the beginning.
Post by Thiago Macieira
Even though QString("hi") has 4 bytes of payload and 24 (32) bytes of overhead
-- 85.7% (88.8%).
Yes, unfortunately. On the other hand is starts mattering less on "Hello
World.", where overhead and payload start being equal. The question is
whether saving 8 bytes here will really make a difference, or whether
there are other places where we can save more in total.

Cheers,
Lars
André Pönitz
2011-10-17 14:20:41 UTC
Permalink
Post by Thiago Macieira
[...]
However, the calculation of the beginning of the data pointer now involves a
T *data()
{
if (d->isRawData()) return d->rawData()->offset;
return d->array() + d->begin;
}
If that's used inside the at()/operator[]() that might be a problem.

IMNSHO anything that's called a "proper solution" has compile to
at most four instructions in the inner loop in code like

Container<int> c(100);
for (int i = 0; i != 100; ++i)
c[i] = 42;

(including the jump) at on x86 and ARM. That's what the naive way
to write the loop produces with gcc for std::vector. There's no need
to provide anything that's worse.

Andre'
Thiago Macieira
2011-10-17 16:29:43 UTC
Permalink
Post by André Pönitz
IMNSHO anything that's called a "proper solution" has compile to
at most four instructions in the inner loop in code like
Container<int> c(100);
for (int i = 0; i != 100; ++i)
c[i] = 42;
(including the jump) at on x86 and ARM. That's what the naive way
to write the loop produces with gcc for std::vector. There's no need
to provide anything that's worse.
Right.

In theory, the compiler could realise that the part doing the tests is
constant and hoist it out of the loop.

In practice, it doesn't. Which is another reason to avoid using the sign bit
of the ints.
--
Thiago Macieira - thiago (AT) macieira.info - thiago (AT) kde.org
Software Architect - Intel Open Source Technology Center
PGP/GPG: 0x6EF45358; fingerprint:
E067 918B B660 DBD1 105C 966C 33F5 F005 6EF4 5358
Oswald Buddenhagen
2011-10-17 12:46:12 UTC
Permalink
Post by Thiago Macieira
When I talked to João this week, he expressed the desire to unify the
header code of QVector, QByteArray and QString.
whether deriving from qvector makes sense i don't know, but i'm very
much in favor of implementing QString and QByteArray as
QStringBase<QChar> and QStringBase<QByte>. for compatibility, data()
would have to continue to return QChar* resp. uchar*, but more unified
access could be granted via array() (or wrappedData()) (returning QChar*
resp. QByte*) and optionally podArray() (or podData()) (returning
ushort* resp. uchar*).
l***@nokia.com
2011-10-17 12:55:53 UTC
Permalink
Why would we want to add a wrapper for char's? I don't see any added value
of having a 'QByte' class/struct/typedef.

And let's face it: I think it'll be a lot of work to unify the
implementations, and I am very uncertain what the added value would be
(apart from satisfying some definition of being cleaner).

I'd say let's adjust the data structures to what's needed, but not spend
time now on reimplementing the classes. At least not as long as we have
other, more important things to get fixed.

Lars

On 10/17/11 2:46 PM, "ext Oswald Buddenhagen"
Post by Oswald Buddenhagen
Post by Thiago Macieira
When I talked to João this week, he expressed the desire to unify the
header code of QVector, QByteArray and QString.
whether deriving from qvector makes sense i don't know, but i'm very
much in favor of implementing QString and QByteArray as
QStringBase<QChar> and QStringBase<QByte>. for compatibility, data()
would have to continue to return QChar* resp. uchar*, but more unified
access could be granted via array() (or wrappedData()) (returning QChar*
resp. QByte*) and optionally podArray() (or podData()) (returning
ushort* resp. uchar*).
_______________________________________________
Qt5-feedback mailing list
http://lists.qt.nokia.com/mailman/listinfo/qt5-feedback
Frans Klaver
2011-10-17 13:20:56 UTC
Permalink
Post by l***@nokia.com
I'd say let's adjust the data structures to what's needed, but not spend
time now on reimplementing the classes. At least not as long as we have
other, more important things to get fixed.
Sounds reasonable to me.

On a related but slightly different note: when removing code
redundancy is an issue to some it would at least be cool if some of
the classes not yet using QSharedData(Pointer)? would actually start
using it (QString to name one). You know it makes sense somehow ;).
l***@nokia.com
2011-10-17 14:00:55 UTC
Permalink
Post by Frans Klaver
Post by l***@nokia.com
I'd say let's adjust the data structures to what's needed, but not spend
time now on reimplementing the classes. At least not as long as we have
other, more important things to get fixed.
Sounds reasonable to me.
On a related but slightly different note: when removing code
redundancy is an issue to some it would at least be cool if some of
the classes not yet using QSharedData(Pointer)? would actually start
using it (QString to name one). You know it makes sense somehow ;).
It doesn't always make sense. While QSharedDataPointer is safe, you have
to be careful with it if you want to optimize for performance to avoid
calling detach() too often.

The same thing is true in other places in Qt: It's not always the best
idea to implement Qt using Qt. For some core things, hand-crafted data
structures are often better.

As an example, we managed to squeeze around 30% out of the load times for
QML by replacing some of our generic containers with data structures that
were written and optimized for the use case at hand. And that's not
because the containers we have are bad, but simply due to the fact that
they are general purpose.

Cheers,
Lars
Frans Klaver
2011-10-17 14:30:36 UTC
Permalink
Post by l***@nokia.com
It doesn't always make sense. While QSharedDataPointer is safe, you have
to be careful with it if you want to optimize for performance to avoid
calling detach() too often.
The same thing is true in other places in Qt: It's not always the best
idea to implement Qt using Qt. For some core things, hand-crafted data
structures are often better.
As an example, we managed to squeeze around 30% out of the load times for
QML by replacing some of our generic containers with data structures that
were written and optimized for the use case at hand. And that's not
because the containers we have are bad, but simply due to the fact that
they are general purpose.
There's nothing here that isn't true and speed optimization may indeed
require custom or duplicated code. For QString, though, I know that it
isn't implemented using QSharedDataPointer simply because QString was
there before QSharedDataPointer. I'm not aware of any performance
tests that suggest using QSharedDataPointer would be a bad idea in
this case. I may be wrong though.

My suggestion was based on a labs post by Thiago[0], in which he
states about shared data pointer:

"This class is the basis of all Qt value-type, implicit-shared,
thread-safe copy-on-write recent classes, like QNetworkProxy. The only
reason why it isn’t used in the base classes like QByteArray, QString
and QList is that those classes were developed before this class was
made. There’s nothing technically stopping the retrofitting of those
classes with QSharedDataPointer."

If people are going to mess about in these classes, this might as well
be done/checked with it.

Cheers,
Frans

[0] http://labs.qt.nokia.com/2009/08/25/count-with-me-how-many-smart-pointer-classes-does-qt-have/
João Abecasis
2011-10-17 14:49:56 UTC
Permalink
Post by Frans Klaver
Post by l***@nokia.com
It doesn't always make sense. While QSharedDataPointer is safe, you have
to be careful with it if you want to optimize for performance to avoid
calling detach() too often.
The same thing is true in other places in Qt: It's not always the best
idea to implement Qt using Qt. For some core things, hand-crafted data
structures are often better.
As an example, we managed to squeeze around 30% out of the load times for
QML by replacing some of our generic containers with data structures that
were written and optimized for the use case at hand. And that's not
because the containers we have are bad, but simply due to the fact that
they are general purpose.
There's nothing here that isn't true and speed optimization may indeed
require custom or duplicated code. For QString, though, I know that it
isn't implemented using QSharedDataPointer simply because QString was
there before QSharedDataPointer. I'm not aware of any performance
tests that suggest using QSharedDataPointer would be a bad idea in
this case. I may be wrong though.
My suggestion was based on a labs post by Thiago[0], in which he
"This class is the basis of all Qt value-type, implicit-shared,
thread-safe copy-on-write recent classes, like QNetworkProxy. The only
reason why it isn’t used in the base classes like QByteArray, QString
and QList is that those classes were developed before this class was
made. There’s nothing technically stopping the retrofitting of those
classes with QSharedDataPointer."
If people are going to mess about in these classes, this might as well
be done/checked with it.
This no longer applies now that we abuse a reference count of -1 for indicating static, read-only, and we can consider using -2 for non-shareable (copies are always deep).

Cheers,


João
Thiago Macieira
2011-10-17 16:30:41 UTC
Permalink
Post by João Abecasis
we can consider using -2 for non-shareable (copies are always deep).
That removes one flag we need.
--
Thiago Macieira - thiago (AT) macieira.info - thiago (AT) kde.org
Software Architect - Intel Open Source Technology Center
PGP/GPG: 0x6EF45358; fingerprint:
E067 918B B660 DBD1 105C 966C 33F5 F005 6EF4 5358
Thiago Macieira
2011-10-17 14:10:49 UTC
Permalink
Post by Oswald Buddenhagen
Post by Thiago Macieira
When I talked to João this week, he expressed the desire to unify the
header code of QVector, QByteArray and QString.
whether deriving from qvector makes sense i don't know, but i'm very
much in favor of implementing QString and QByteArray as
QStringBase<QChar> and QStringBase<QByte>. for compatibility, data()
would have to continue to return QChar* resp. uchar*, but more unified
access could be granted via array() (or wrappedData()) (returning QChar*
resp. QByte*) and optionally podArray() (or podData()) (returning
ushort* resp. uchar*).
Hi Ossi

I don't see the benefit of doing this, but I do see a lot more work to do.

The implementation of the functions dealing with the data-as-a-string are
completely different from the two classes. QString also has Unicode methods,
which QByteArray doesn't.

Converting from QString to QByteArray and vice-versa is completely different;
QString has methods to deal with Latin 1 strings whereas QByteArray does not.

Unless you're suggesting we make it possible to use UTF-8 QStrings...
--
Thiago Macieira - thiago (AT) macieira.info - thiago (AT) kde.org
Software Architect - Intel Open Source Technology Center
PGP/GPG: 0x6EF45358; fingerprint:
E067 918B B660 DBD1 105C 966C 33F5 F005 6EF4 5358
Oswald Buddenhagen
2011-10-17 17:33:07 UTC
Permalink
Post by l***@nokia.com
Why would we want to add a wrapper for char's?
why again do we have a wrapper for ushort called QChar? after all, some
global functions operating on ushort would do. ;)
Post by l***@nokia.com
I don't see any added value of having a 'QByte' class/struct/typedef.
i do. code which is easily templateable for both.
Post by l***@nokia.com
And let's face it: I think it'll be a lot of work to unify the
implementations, and I am very uncertain what the added value would be
(apart from satisfying some definition of being cleaner).
one of the advantages of a unified codebase would be making an end to
qbytearray constantly lagging behind qstring in api for no good reason.
Post by l***@nokia.com
The implementation of the functions dealing with the data-as-a-string are
completely different from the two classes.
the what?
Post by l***@nokia.com
QString also has Unicode methods, which QByteArray doesn't.
my remark about data() alone is enough to see that there have to be
specialized subclasses made from the template. and obviously, the
qstring-specific functionality would stay in qstring as ever.
Post by l***@nokia.com
Unless you're suggesting we make it possible to use UTF-8 QStrings...
if you mean hybrid strings (bytearray + codec pointer), that's certainly
an interesting topic, but i'm not too concerned about that now.
Thiago Macieira
2011-10-17 18:54:47 UTC
Permalink
Post by Oswald Buddenhagen
Post by l***@nokia.com
Why would we want to add a wrapper for char's?
why again do we have a wrapper for ushort called QChar? after all, some
global functions operating on ushort would do. ;)
What functions do you want to have on QByte?
Post by Oswald Buddenhagen
one of the advantages of a unified codebase would be making an end to
qbytearray constantly lagging behind qstring in api for no good reason.
The good reason is that it's not a string. It's a sequence of arbitrary bytes
which are not a string.
Post by Oswald Buddenhagen
Post by l***@nokia.com
QString also has Unicode methods, which QByteArray doesn't.
my remark about data() alone is enough to see that there have to be
specialized subclasses made from the template. and obviously, the
qstring-specific functionality would stay in qstring as ever.
Post by l***@nokia.com
Unless you're suggesting we make it possible to use UTF-8 QStrings...
if you mean hybrid strings (bytearray + codec pointer), that's certainly
an interesting topic, but i'm not too concerned about that now.
If you're not suggesting that for now, I don't get what the point of adding
this common code.

What methods are you missing in QByteArray that QString has?
--
Thiago Macieira - thiago (AT) macieira.info - thiago (AT) kde.org
Software Architect - Intel Open Source Technology Center
PGP/GPG: 0x6EF45358; fingerprint:
E067 918B B660 DBD1 105C 966C 33F5 F005 6EF4 5358
Olivier Goffart
2011-10-17 19:46:19 UTC
Permalink
Post by Thiago Macieira
On Mon, Oct 17, 2011 at 02:55:53PM +0200, Knoll Lars (Nokia-MP-Qt/Oslo)
Post by l***@nokia.com
Why would we want to add a wrapper for char's?
why again do we have a wrapper for ushort called QChar? after all, some
global functions operating on ushort would do. ;)
What functions do you want to have on QByte?
isDigit, isLetter, toUpper, whatnot?
Oswald Buddenhagen
2011-10-17 20:03:10 UTC
Permalink
Post by Thiago Macieira
On Mon, Oct 17, 2011 at 02:55:53PM +0200, Knoll Lars (Nokia-MP-Qt/Oslo)
Post by l***@nokia.com
Why would we want to add a wrapper for char's?
why again do we have a wrapper for ushort called QChar? after all, some
global functions operating on ushort would do. ;)
What functions do you want to have on QByte?
for symmetry with QChar: data(). well, except that QChar has unicode()
instead. *grumble*. but QString has both, aliased to each other.
Post by Thiago Macieira
one of the advantages of a unified codebase would be making an end to
qbytearray constantly lagging behind qstring in api for no good reason.
The good reason is that it's not a string. It's a sequence of arbitrary bytes
which are not a string.
uhm, no. in qt 4, qbytearray was turned into a fully-fledged string
class, to replace QCString.
Post by Thiago Macieira
What methods are you missing in QByteArray that QString has?
lots. well, a lot was synchronized over time, but we always have the
same problem when something new is added or some change (often an
optimization) is made.
Thiago Macieira
2011-10-18 13:14:41 UTC
Permalink
Post by Oswald Buddenhagen
Post by Thiago Macieira
The good reason is that it's not a string. It's a sequence of arbitrary
bytes which are not a string.
uhm, no. in qt 4, qbytearray was turned into a fully-fledged string
class, to replace QCString.
It's still a glorified container for arbitrary bytes. However, unlike
QVector<char>, QByteArray has some support for treating its contents as
strings.

The presence of methods like toUpper and toLower is problematic. Since it uses
the C library's ctype, it's locale-dependent. It should be fixed into ASCII (7-
bit) only.
Post by Oswald Buddenhagen
From my point of view, the API in QVector should be in both QString and
QByteArray, as they are all three just arrays of a given type. Then QByteArray
can have a bit of convenience. But the really useful, Unicode methods are only
in QString.

In any case, common derivation has never been a problem for us. I point you to
QQ13, the part about "static polymorphism".
--
Thiago Macieira - thiago (AT) macieira.info - thiago (AT) kde.org
Software Architect - Intel Open Source Technology Center
PGP/GPG: 0x6EF45358; fingerprint:
E067 918B B660 DBD1 105C 966C 33F5 F005 6EF4 5358
l***@nokia.com
2011-10-17 20:25:37 UTC
Permalink
On 10/17/11 7:33 PM, "ext Oswald Buddenhagen"
Post by Oswald Buddenhagen
Post by l***@nokia.com
Why would we want to add a wrapper for char's?
why again do we have a wrapper for ushort called QChar? after all, some
global functions operating on ushort would do. ;)
Post by l***@nokia.com
I don't see any added value of having a 'QByte' class/struct/typedef.
i do. code which is easily templateable for both.
At the cost of people having to deal with another class where a primitive
type is perfectly fine? Not even to mention the drawbacks in terms of ABI.
Classes are not passed in registers to functions, primitive types are.
Post by Oswald Buddenhagen
Post by l***@nokia.com
And let's face it: I think it'll be a lot of work to unify the
implementations, and I am very uncertain what the added value would be
(apart from satisfying some definition of being cleaner).
one of the advantages of a unified codebase would be making an end to
qbytearray constantly lagging behind qstring in api for no good reason.
One of the problems is that we'll most certainly get regressions by trying
to unify the code base. And in the end we find out that we might be able
to reuse less code than we thought before.

IMO it's a mostly academic exercise.

Cheers,
Lars
Post by Oswald Buddenhagen
Post by l***@nokia.com
The implementation of the functions dealing with the data-as-a-string are
completely different from the two classes.
the what?
Post by l***@nokia.com
QString also has Unicode methods, which QByteArray doesn't.
my remark about data() alone is enough to see that there have to be
specialized subclasses made from the template. and obviously, the
qstring-specific functionality would stay in qstring as ever.
Post by l***@nokia.com
Unless you're suggesting we make it possible to use UTF-8 QStrings...
if you mean hybrid strings (bytearray + codec pointer), that's certainly
an interesting topic, but i'm not too concerned about that now.
_______________________________________________
Qt5-feedback mailing list
http://lists.qt.nokia.com/mailman/listinfo/qt5-feedback
João Abecasis
2011-10-18 09:08:42 UTC
Permalink
Post by l***@nokia.com
One of the problems is that we'll most certainly get regressions by trying
to unify the code base. And in the end we find out that we might be able
to reuse less code than we thought before.
IMO it's a mostly academic exercise.
For the record, my suggestion to unify "header" code was to unify QStringData/QByteArrayData/QVectorData. Other than the current proposal to put string hashes in QStringData, those base structs provide the same functionality in slightly different ways for no good reason.

Any other sharing can happen over time where it makes sense.

Cheers,


João
Oswald Buddenhagen
2011-10-18 09:56:59 UTC
Permalink
Post by l***@nokia.com
On 10/17/11 7:33 PM, "ext Oswald Buddenhagen"
Post by Oswald Buddenhagen
Post by l***@nokia.com
Why would we want to add a wrapper for char's?
why again do we have a wrapper for ushort called QChar? after all, some
global functions operating on ushort would do. ;)
Post by l***@nokia.com
I don't see any added value of having a 'QByte' class/struct/typedef.
i do. code which is easily templateable for both.
At the cost of people having to deal with another class where a primitive
type is perfectly fine?
well, yes. it's consistent.
for that matter, i would be all for turning qchar into a collection of
static functions operating on ushort.
but either option is not doable for qt5 anyway, because a change either
way would be massively source-incompatible for low-level code. so i'll
just add *podData() functions to both classes for the time being.
Post by l***@nokia.com
Not even to mention the drawbacks in terms of ABI.
Classes are not passed in registers to functions, primitive types are.
in this case https://qt.gitorious.org/qt/qtbase/merge_requests/69 is
plain bogus. and the previous endeavours to get rid of QLatin1String
const refs.
l***@nokia.com
2011-10-18 10:29:21 UTC
Permalink
On 10/18/11 11:56 AM, "ext Oswald Buddenhagen"
Post by Oswald Buddenhagen
Post by l***@nokia.com
On 10/17/11 7:33 PM, "ext Oswald Buddenhagen"
Post by Oswald Buddenhagen
Post by l***@nokia.com
Why would we want to add a wrapper for char's?
why again do we have a wrapper for ushort called QChar? after all, some
global functions operating on ushort would do. ;)
Post by l***@nokia.com
I don't see any added value of having a 'QByte' class/struct/typedef.
i do. code which is easily templateable for both.
At the cost of people having to deal with another class where a primitive
type is perfectly fine?
well, yes. it's consistent.
for that matter, i would be all for turning qchar into a collection of
static functions operating on ushort.
but either option is not doable for qt5 anyway, because a change either
way would be massively source-incompatible for low-level code. so i'll
just add *podData() functions to both classes for the time being.
Well, QByteArray has data(), QString has data() and unicode()? Why do we
need another method for getting the same thing?
Post by Oswald Buddenhagen
Post by l***@nokia.com
Not even to mention the drawbacks in terms of ABI.
Classes are not passed in registers to functions, primitive types are.
in this case https://qt.gitorious.org/qt/qtbase/merge_requests/69 is
plain bogus. and the previous endeavours to get rid of QLatin1String
const refs.
I think it is indeed wrong. But someone should go back and check the ARM
and x64 ABI specs once again to make sure.

Cheers,
Lars
Oswald Buddenhagen
2011-10-18 12:15:54 UTC
Permalink
Post by l***@nokia.com
so i'll just add *podData() functions to both classes for the time being.
Well, QByteArray has data(), QString has data() and unicode()? Why do we
need another method for getting the same thing?
QChar *QString::data() => str->data()->unicode()
char *QByteArray::data() => str->data()

template *that*.

ps: please fix you mail client not to wrap the attribution headers.
Olivier Goffart
2011-10-18 10:30:50 UTC
Permalink
Post by Oswald Buddenhagen
Post by l***@nokia.com
Not even to mention the drawbacks in terms of ABI.
Classes are not passed in registers to functions, primitive types are.
in this case https://qt.gitorious.org/qt/qtbase/merge_requests/69 is
plain bogus. and the previous endeavours to get rid of QLatin1String
const refs.
FYI:
http://sourcery.mentor.com/public/cxx-abi/abi.html#normal-call


| In general, C++ value parameters are handled just like C parameters. This
| includes class type parameters passed wholly or partially in registers.
| There are, however, some special cases.
| 1. In the special case where the parameter type has a non-trivial copy
| constructor or destructor, the caller must allocate space for a temporary
| copy, and pass the resulting copy by reference (below). [...]


So because QChar and QLatin1String do not declare a copy constructor or a
destructor, they are passed in registers.

I just verified again with gcc -S of simple code.

(That is on Linux, I don't know about others ABI)
Thiago Macieira
2011-10-18 13:25:12 UTC
Permalink
Post by Olivier Goffart
http://sourcery.mentor.com/public/cxx-abi/abi.html#normal-call
| In general, C++ value parameters are handled just like C parameters. This
| includes class type parameters passed wholly or partially in registers.
| There are, however, some special cases.
| 1. In the special case where the parameter type has a non-trivial copy
|
| constructor or destructor, the caller must allocate space for a
|temporary copy, and pass the resulting copy by reference (below). [...]
So because QChar and QLatin1String do not declare a copy constructor or a
destructor, they are passed in registers.
I just verified again with gcc -S of simple code.
(That is on Linux, I don't know about others ABI)
I called for that as early as
http://labs.qt.nokia.com/2008/04/28/string-theory/

Any struct that is movable and less than 8 or 16 bytes in size should be
declared without a copy constructor or assignment operator (or defaulted) and
passed by value. That also means adding them later is binary incompatible.

That applies to QChar, QLatin1Char and QLatin1String.

Mac's x86-64 ABI is the same as Linux's, see http://www.x86-64.org/

x86 is the only remaining major architecture to pass arguments on the stack
exclusively. That's to change with x32, see
http://sites.google.com/site/x32abi/
--
Thiago Macieira - thiago (AT) macieira.info - thiago (AT) kde.org
Software Architect - Intel Open Source Technology Center
PGP/GPG: 0x6EF45358; fingerprint:
E067 918B B660 DBD1 105C 966C 33F5 F005 6EF4 5358
Oswald Buddenhagen
2011-10-18 16:20:20 UTC
Permalink
Post by Thiago Macieira
x86 is the only remaining major architecture to pass arguments on the stack
exclusively. That's to change with x32, see
http://sites.google.com/site/x32abi/
well, that doesn't actually change the existing architecture.
it would be correct to say that it obsoletes most of the remaining use
cases for "real" 32 bit code on modern x86 processors.
Till Oliver Knoll
2011-10-18 18:54:44 UTC
Permalink
Post by Oswald Buddenhagen
Post by Thiago Macieira
x86 is the only remaining major architecture to pass arguments on the stack
exclusively. That's to change with x32, see
http://sites.google.com/site/x32abi/
well, that doesn't actually change the existing architecture.
Good to know that there are still people concerned about bits and bytes under the hood ;)

That reminds me of "university classics" such as "Computer Architecture: a Quantitative Approach" (read "quantitative" as in "How much kilograms of paper do you get for your money!") or "SPARC Architecture".

It's interesting to read that the x86 architecture apparently still passes arguments on the stack only. If I remember correctly the SPARC architecture could pass up to 8 arguments ("trivial" values I guess - 64 bit already?) in registers, and this to a call stack depth of 7 or 8 or so (after that arguments were put on the stack as well). That was early or mid 90ies IIRC.

Oh and my! My infamous "copyblk" assembler routine (on a Ceres 3 computer designed by N. Wirth) homework which contrary to its name didn't even work at home! It was supposed to copy a "graphics memory block" into another block. Off course source and target could overlap. To make it even more fun the Ceres had a black and white display, "highly optimised" - that means one byte (8 bits) per 8 pixels ;) in linear display memory.

Sorry, just rambling. Where was I? Oh yeah, allocating that 40 MBytes for my data cache... ;)


Thanks for making Qt rock below the surface!
Oliver
Thiago Macieira
2011-10-18 21:06:33 UTC
Permalink
Post by Oswald Buddenhagen
Post by Thiago Macieira
x86 is the only remaining major architecture to pass arguments on the stack
exclusively. That's to change with x32, see
http://sites.google.com/site/x32abi/
well, that doesn't actually change the existing architecture.
it would be correct to say that it obsoletes most of the remaining use
cases for "real" 32 bit code on modern x86 processors.
Well, actually it does. x32 is a different instruction set from x86.

x32 is x86-64 running in ILP32 mode (int, long, ponters 32-bit). Other 64-bit
architectures can run ILP32 too: UltraSparc, MIPS, PA-RISC, Itanium.
--
Thiago Macieira - thiago (AT) macieira.info - thiago (AT) kde.org
Software Architect - Intel Open Source Technology Center
PGP/GPG: 0x6EF45358; fingerprint:
E067 918B B660 DBD1 105C 966C 33F5 F005 6EF4 5358
Oswald Buddenhagen
2011-10-19 12:18:15 UTC
Permalink
Post by Thiago Macieira
Post by Oswald Buddenhagen
Post by Thiago Macieira
x86 is the only remaining major architecture to pass arguments on the stack
exclusively. That's to change with x32, see
http://sites.google.com/site/x32abi/
well, that doesn't actually change the existing architecture.
it would be correct to say that it obsoletes most of the remaining use
cases for "real" 32 bit code on modern x86 processors.
Well, actually it does. x32 is a different instruction set from x86.
x32 is x86-64 running in ILP32 mode (int, long, ponters 32-bit).
that's what i said: it doesn't change the existing x86 (i686)
architecture, it replaces it. ;) x86 will still be around for a while.
Thiago Macieira
2011-10-19 12:42:05 UTC
Permalink
Post by Oswald Buddenhagen
that's what i said: it doesn't change the existing x86 (i686)
architecture, it replaces it. x86 will still be around for a while.
Hopefully not long :-)

What's the benefit of running 32-bit x86 instead of 32-bit x32? Unless you're
trying to target devices with no 64-bit capability (like older desktops or
early Atoms), I can't think of any. But there are plenty of drawbacks,
starting with the number of registers and the calling convention.
--
Thiago Macieira - thiago (AT) macieira.info - thiago (AT) kde.org
Software Architect - Intel Open Source Technology Center
PGP/GPG: 0x6EF45358; fingerprint:
E067 918B B660 DBD1 105C 966C 33F5 F005 6EF4 5358
André Pönitz
2011-10-17 15:40:44 UTC
Permalink
Post by Oswald Buddenhagen
Post by Thiago Macieira
When I talked to João this week, he expressed the desire to unify the
header code of QVector, QByteArray and QString.
whether deriving from qvector makes sense i don't know, but i'm very
much in favor of implementing QString and QByteArray as
QStringBase<QChar> and QStringBase<QByte>. for compatibility, data()
would have to continue to return QChar* resp. uchar*, but more unified
access could be granted via array() (or wrappedData()) (returning QChar*
resp. QByte*) and optionally podArray() (or podData()) (returning
ushort* resp. uchar*).
Not sure.

This somehow smells a bit like an "Ivory Tower" kind of thing to me.

Making the interfaces "mostly identical" would certainly be nice.

Templating the whole code (and potentially even put everything into the
header because "someone perhaps might want QStringBase<float>" at some
time) looks significantly less interesting to me, especially as a simple
"typedef QStringBase<QChar> QString;" would break all existing code that
has forward declarations of QString and therefore the "try hard to stay
source compatible" promise.

I also _do_ like the fact that a QByteArray is reported as

QByteArray

by the usual suspects and not something similar to

std::basic_string<char, std::char_traits<char>, std::allocator<char> >

Having templated code to work on QString and QByteArray (and things like
regexps actually might want to) is possible also with two plain classes.

Andre'
Frans Klaver
2011-10-17 15:47:50 UTC
Permalink
Post by André Pönitz
Templating the whole code (and potentially even put everything into the
header because "someone perhaps might want QStringBase<float>" at some
time) looks significantly less interesting to me, especially as a simple
"typedef QStringBase<QChar> QString;" would break all existing code that
has forward declarations of QString and therefore the "try hard to stay
source compatible" promise.
It's still possible to be templating the whole thing. For the type
definition I would prefer

class QString : public QStringBase<QChar> {
// move along now, nothing to see here
};

over a typedef nine times out of ten for the reason you stated before.
Post by André Pönitz
I also _do_ like the fact that a QByteArray is reported as
  QByteArray
by the usual suspects and not something similar to
  std::basic_string<char, std::char_traits<char>, std::allocator<char> >
Cheers,
Frans
Loading...