Discussion:
QDateTime and QLocale
John Layt
2011-10-20 20:36:52 UTC
I have now completed most of the Calendar System framework and think it's
close to ready for an initial review. The code is at the usual place:

https://qt.gitorious.org/~odysseus/qt/odysseus-qtbase/commits/qt5-qdatetime

Current changes:
1) Various small cleanups
2) QDate stores JD as a qint64
3) Separate QDateTimeFormatter and QDateTimeParser classes
4) QDateCalculator class to calculate calendar systems
5) QLocale uses CLDR date/time formatting [small source incompatible
changes]
6) Modify QDateTime* to use the CLDR data and QDateCalculator via QLocale

There are still a few areas to be completed:
1) More documentation and unit tests
3) Hebrew, Islamic Lunar, Chinese Lunar calendars
4) Era support
5) Localized week number support
6) New lightweight datetime parser
7) New or improved widgets
8) Modify QDate to make old code QT4_COMPAT
9) Convenience wrapper class, i.e. QLocaleDate
10) Full Windows and OSX integration

These are mostly polish and fit I can work on later, but there is one further
area that is probably a show-stopper now, and that is memory use. Currently
Qt compiles all the locale data into the QtCore library itself as static const
for speed and convenience at the cost of some memory, but adding calendar
systems bloats the memory use right out:

Qt4.8 locale data size             700 kB
Qt5 with QT_NO_CALENDAR_SYSTEMS    750 kB
Qt5 with Calendar support          3.5 MB

The actual calendar data such as names and translations is only 125kB more;
it's the index into that data for each locale that adds 2.7MB. This doesn't
yet include the Era or Time Zone translations that I want to use, so this is
only going to get bigger.

Just for comparison, all the CLDR data in ICU including conversion tables
takes 15 MB and can be either compiled into the library or loaded from file,
see http://userguide.icu-project.org/icudata for details.

I think I can do a more efficient index and reduce size by 30-50%, but that's
still bloated and really complicates the backend code, so I may need to look
at a file based alternative.

Denis mentioned at QtCS that the embedded guys were wanting to reduce
the memory footprint anyway and to be able to choose what locales they ship,
so it could be an all-round win to change to file loading.

I can think of a few options:
1) Offload just the translation data into translation files and use tr()
2) Offload all the data into binary files keeping the current data structure
3) Offload all the data into a single binary file (use ICU .dat format?)
4) Offload all the data into one binary file per locale

Option 1 may not work due to dependencies. Option 2 is close to the current
implementation, but puts choosing locales at the file generation stage, as
does option 3. Option 4 seems the most flexible but will probably take a lot
more disk space than 2 or 3 due to data duplication. Any option probably
means changing from static const data to QSharedData.
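
Options 2 to 4 all mean fetching locale data at run time instead of reading static const tables. A minimal sketch of that shift, assuming a cache that loads each locale once and shares it afterwards: the names LocaleData and LocaleDataCache and the injected loader are hypothetical, std::shared_ptr stands in for QSharedData, and the sketch is not thread-safe.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <memory>
#include <string>
#include <utility>

// Hypothetical per-locale payload; a real one would hold the index and strings.
struct LocaleData {
    std::string name;
    std::string blob;
};

// Hypothetical cache: loads a locale's data on first use, then shares it.
class LocaleDataCache {
public:
    // The loader is injected so the storage choice (one file per locale,
    // one big file, a plugin) stays outside the cache.
    using Loader = std::function<std::string(const std::string &)>;

    explicit LocaleDataCache(Loader loader) : m_loader(std::move(loader)) {}

    std::shared_ptr<const LocaleData> get(const std::string &locale)
    {
        auto it = m_cache.find(locale);
        if (it != m_cache.end())
            return it->second; // already loaded: hand out the shared copy

        std::shared_ptr<const LocaleData> data =
            std::make_shared<LocaleData>(LocaleData{locale, m_loader(locale)});
        m_cache.emplace(locale, data);
        return data;
    }

    std::size_t loadedCount() const { return m_cache.size(); }

private:
    Loader m_loader;
    std::map<std::string, std::shared_ptr<const LocaleData>> m_cache;
};
```

Only the locales actually requested are held in memory, which is the property the embedded use case is after.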

Any comments or suggestions?

John.
Richard Moore
2011-10-20 21:09:28 UTC
Post by John Layt
1) Offload just the translation data into translation files and use tr()
2) Offload all the data into binary files keeping the current data structure
3) Offload all the data into a single binary file (use ICU .dat format?)
4) Offload all the data into one binary file per locale
Option 1 may not work due to dependencies.  Option 2 is close to the current
implementation, but puts choosing locales at the file generation stage, as
does option 3.  Option 4 seems the most flexible but will probably take a lot
more disk space than 2 or 3 due to data duplication.  Any option probably
means changing from static const data to QSharedData.
Any comments or suggestions?
One that springs to mind is compiling the data structures for each
locale into a .so which is dlopened as needed. This could use the
existing plugin infrastructure and allow you to retain the existing
internal data structures.

Rich.
Andre Somers
2011-10-21 17:07:09 UTC
Post by John Layt
I have now completed most of the Calendar System framework and think it's
https://qt.gitorious.org/~odysseus/qt/odysseus-qtbase/commits/qt5-qdatetime
Hi John,

Is this now stable enough to start looking into how I can port QTimeSpan
to the new QDateTime?

André
John Layt
2011-10-22 23:37:15 UTC
Post by Andre Somers
Post by John Layt
I have now completed most of the Calendar System framework and think it's
https://qt.gitorious.org/~odysseus/qt/odysseus-qtbase/commits/qt5-qdatetime
Hi John,
Is this now stable enough to start looking into how I can port QTimeSpan
to the new QDateTime?
André
Probably not quite; I'd wait till after QtDD and any news on ICU which could
change things somewhat. Most of the API is stable as it's based on QDate, but
it does need a review for some of the new stuff I've added.

John.

l***@nokia.com
2011-10-21 19:27:25 UTC
Post by John Layt
I have now completed most of the Calendar System framework and think it's
https://qt.gitorious.org/~odysseus/qt/odysseus-qtbase/commits/qt5-qdatetime
1) Various small cleanups
2) QDate stores JD as a qint64
3) Separate QDateTimeFormatter and QDateTimeParser classes
4) QDateCalculator class to calculate calendar systems
5) QLocale uses CLDR date/time formatting [small source incompatible
changes]
6) Modify QDateTime* to use the CLDR data and QDateCalculator via QLocale
1) More documentation and unit tests
3) Hebrew, Islamic Lunar, Chinese Lunar calendars
4) Era support
5) Localized week number support
6) New lightweight datetime parser
7) New or improved widgets
8) Modify QDate to make old code QT4_COMPAT
9) Convenience wrapper class, i.e. QLocaleDate
10) Full Windows and OSX integration
Sounds great. Thanks for working on this.
Post by John Layt
These are mostly polish and fit I can work on later, but there is one further
area that is probably a show-stopper now, and that is memory use. Currently
Qt compiles all the locale data into the QtCore library itself as static const
for speed and convenience at the cost of some memory, but adding calendar
Qt4.8 locale data size 700 kB
Qt5 with QT_NO_CALENDAR_SYSTEMS 750 kB
Qt5 with Calendar support 3.5 MB
The actual calendar data such as names and translations is only 125kB more,
it's the index into that data for each locale that adds 2.7MB. This doesn't
yet include the Era or Time Zone translations that I want to use, so this is
only going to get bigger.
Hmmm... I wonder if there isn't a way to get this to a reasonable size. It
sounds weird that the index into the data is 20 times bigger than the data
itself.
Post by John Layt
Just for comparison, all the CLDR data in ICU including conversion tables
takes 15 MB and can be either compiled into the library or loaded from file,
see http://userguide.icu-project.org/icudata for details.
I think I can do a more efficient index and reduce size by 30-50%, but that's
still bloated and really complicates the backend code, so I may need to look
at a file based alternative.
Denis mentioned at QtCS that the embedded guys were wanting to reduce
the memory footprint anyway and to be able to choose what locales they ship,
so it could be an all-round win to change to file loading.
1) Offload just the translation data into translation files and use tr()
2) Offload all the data into binary files keeping the current data structure
3) Offload all the data into a single binary file (use ICU .dat format?)
4) Offload all the data into one binary file per locale
Option 1 may not work due to dependencies. Option 2 is close to the current
implementation, but puts choosing locales at the file generation stage, as
does option 3. Option 4 seems the most flexible but will probably take a lot
more disk space than 2 or 3 due to data duplication. Any option probably
means changing from static const data to QSharedData.
Any comments or suggestions?
I think there's a general question we will need to discuss first: Do we
want to explicitly depend on ICU or not. I guess we'll have some
discussions around this next week at Dev Days :)

Cheers,
Lars
John Layt
2011-10-22 23:16:25 UTC
Post by l***@nokia.com
Hmmm... I wonder if there isn't a way to get this to a reasonable size. It
sounds weird that the index into the data is 20 times bigger than the data
itself.
It's because a lot of the data is duplicated for variants of the same
calendar, e.g. the Thai, Japanese, RoC, ISO8601 and Julian calendars all use
the same Gregorian month and day names, Islamic and IslamicCivil use the same
data, etc. Also many locales default to the English names for calendars they
don't have translations for, like Ethiopic or Indian.

I also optimised the existing data storage, the standalone month and day names
didn't need to be stored separately from the format versions when most of the
time they are the same, so that reduced the difference.
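
To illustrate why shared names keep the data small while the index still grows, here is a minimal sketch of a string pool addressed by (offset, size) pairs. It is not the actual QLocale data layout; IndexEntry and StringPool are hypothetical names, and the 16-bit fields simply mirror the quint16 pairs described in this message.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_map>

// One index slot: where a string starts in the pool and how long it is.
struct IndexEntry {
    std::uint16_t offset;
    std::uint16_t size;
};

// Hypothetical builder that stores each distinct string only once.
class StringPool {
public:
    IndexEntry intern(const std::string &text)
    {
        auto it = m_seen.find(text);
        if (it != m_seen.end())
            return it->second; // duplicate: reuse the bytes already stored

        // Both offset and size must fit in 16 bits.
        assert(m_data.size() + text.size() <= UINT16_MAX);
        IndexEntry entry{static_cast<std::uint16_t>(m_data.size()),
                         static_cast<std::uint16_t>(text.size())};
        m_data += text;
        m_seen.emplace(text, entry);
        return entry;
    }

    std::string lookup(IndexEntry entry) const
    {
        return m_data.substr(entry.offset, entry.size);
    }

    std::size_t dataBytes() const { return m_data.size(); }

private:
    std::string m_data;
    std::unordered_map<std::string, IndexEntry> m_seen;
};
```

Interning "January" once for the Gregorian calendar and again for the Thai calendar adds no bytes to the pool, but each calendar still needs its own 4-byte IndexEntry, so it is the index that scales with locales times calendars times fields.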

The index into that data however is 368 locales times 15 calendars times 43
calendar data fields times 2 quint16s (index and size, 2 bytes each) = 949kB
extra memory required.

I could cut the 949kB back by removing the 7 duplicate calendars for the names
but not the formats, giving a reduction of 368 * 7 * 30 * 2 * 2 = 309kB, so
640kB extra for the index then.

The indices are also stored as quint16 when many fields, especially size, may
only need quint8, which could save 45B per index = 132kB saving = 508kB
index.
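
The index arithmetic above can be checked mechanically. This is a minimal sketch using only the figures quoted in this message; the constant and function names are made up for illustration. Reading "45B per index" as 45 bytes for each of the 8 remaining full calendars in each locale is an assumption, chosen because it reproduces the 132kB figure.

```cpp
#include <cassert>
#include <cstdint>

// Figures quoted in the message; all names here are illustrative, not Qt's.
constexpr std::uint64_t kLocales = 368;
constexpr std::uint64_t kEntryBytes = 2 * sizeof(std::uint16_t); // index + size

// Bytes used by an index of `calendars` calendars with `fields` fields each.
constexpr std::uint64_t indexBytes(std::uint64_t calendars, std::uint64_t fields)
{
    return kLocales * calendars * fields * kEntryBytes;
}

// Rounds a byte count to the nearest decimal kB, as the message does.
constexpr std::uint64_t toKilobytes(std::uint64_t bytes)
{
    return (bytes + 500) / 1000;
}

// Full index: 15 calendars with 43 fields each.
constexpr std::uint64_t kFullIndex = indexBytes(15, 43);

// Dropping the 30 name fields for the 7 duplicate calendars.
constexpr std::uint64_t kNameSaving = indexBytes(7, 30);
constexpr std::uint64_t kTrimmedIndex = kFullIndex - kNameSaving;

// ASSUMPTION: 45 bytes saved for each of the 8 remaining full calendars
// in each locale (368 * 8 * 45); the message only says "45B per index".
constexpr std::uint64_t kNarrowSaving = kLocales * 8 * 45;
constexpr std::uint64_t kNarrowIndex = kTrimmedIndex - kNarrowSaving;

static_assert(kFullIndex == 949440, "full index is about 949kB");
static_assert(kTrimmedIndex == 640320, "trimmed index is about 640kB");
static_assert(kNarrowIndex == 507840, "narrowed index is about 508kB");
```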

Huh, that's less than the 2.75 MB extra I said? Ah I took source file size,
stupid me, I shouldn't write emails so late at night. That could change
things.

Let's try that maths again just on actual bytes.

Qt 4.8

88 index fields * 368 locales = 64kB index

index = 64kB
data = 174kB
total = 238kB

Qt 5.0

133 index fields * 368 locales = 98kB main index

main index = 98kB [excludes calendar index]
data = 180kB
total = 278kB

full cal index = 949kB, so total = 1227kB
trim cal index = 640kB, so total = 918kB
no-fat cal index = 508kB, so total = 786kB

Better, but still not brilliant.
Post by l***@nokia.com
I think there's a general question we will need to discuss first: Do we
want to explicitly depend on ICU or not. I guess we'll have some
discussions around this next week at Dev Days :)
Interesting question :-) Shame I can't be there to discuss it. Unless....

Anyway, there are pros and cons. It is a 17MB dependency, but it's usually
already installed on Linux, and OSX uses it anyway, so it's only Windows and
embedded that could be an issue.

Would we just be using it to get the data, or for the parsers / formatters /
calculators etc as well? If the latter it would save us a lot of work on new
features, but we might have issues integrating it into our existing APIs and
classes, and it could be a lot of work. It would also make most of the code
I've just written redundant :-)

I've also just finished writing another email about QLocale changes which KDE
would like which I'll send on anyway as part of the discussion.

Cheers!

John.