seemingly 'simple' strings

OK, I'm done finalizing strings in the engine. Yes, yes, we have had working strings since the beginning, but everything evolves, and code does as well. We started off with simple 'char*' strings, so no internationalization (badly needed for WOF). The next version supported Unicode 'wchar_t*', which I assumed was the end of all string things... but it seems the documentation of the so-called 'Unicode equivalents' of the simple 'char*' string functions never mentions that a 'real' UTF-16 character is in fact variable width!!! Short Look at sample headaches/documentation pitfalls
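To make the variable-width point concrete, here is a small sketch (not engine code). It uses 'char16_t' instead of 'wchar_t' because 'wchar_t' is only 16 bits wide on some platforms such as Windows; the function name is made up for illustration.

```cpp
#include <cstddef>
#include <string>

// Counts Unicode code points in a UTF-16 string.
// Code points above U+FFFF are stored as a surrogate pair:
// a high surrogate (0xD800-0xDBFF) followed by a low surrogate (0xDC00-0xDFFF).
// Hypothetical helper, not part of the engine.
std::size_t countCodePointsUtf16(const std::u16string& s)
{
    std::size_t count = 0;
    for (std::size_t i = 0; i < s.size(); ++i) {
        const char16_t unit = s[i];
        const bool isHigh = unit >= 0xD800 && unit <= 0xDBFF;
        if (isHigh && i + 1 < s.size()) {
            const char16_t next = s[i + 1];
            if (next >= 0xDC00 && next <= 0xDFFF) {
                ++i; // the pair encodes one code point, skip its low half
            }
        }
        ++count; // unpaired surrogates are counted as one (invalid) unit
    }
    return count;
}
```

For a string holding 'a' followed by U+1D11E (a musical clef), the number of 16-bit units is 3 while the number of code points is 2, which is exactly what a "one wchar_t is one character" assumption gets wrong.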

So, back to the drawing board (hopefully for the last time for strings).

In the end I decided to go for UTF-8. It has all the advantages I need, and this way I can also move text files and binary files (containing text) between OSes with no problems at all.

It works quite well. I also added some goodies for OS independence: Strings can be converted (transparently, and only when a conversion is actually needed) to a 'NativeString', which is what gets passed to native functions such as the Windows functions that come in ANSI/Unicode variants. This makes the whole OS dependence nicer (it was already hidden away in nice 'Native' interface classes, but now Strings can be passed in and out nicely).

The functions are not optimized, but they will only be heavily used at loading time and should not be used in performance-critical parts anyway (comparing strings? NO NO).

Bonus: Another thing I like is that the string code is now more general because it doesn't treat each 'char' as a 'glyph', so switching formats (which most probably won't happen anyway) would be more or less painless, especially since I already needed to write conversions between ANSI, UTF-8 and UTF-16 to support char*, wchar_t* and the internally used UTF-8.
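As an illustration of what such a conversion involves, here is a minimal UTF-8 to UTF-16 sketch. It is not the engine's code: the function names are hypothetical, malformed input is simply replaced with U+FFFD, and overlong encodings are not rejected.

```cpp
#include <cstddef>
#include <string>

// Decodes one code point starting at s[i] and advances i past it.
// Precondition: i < s.size(). Invalid or truncated sequences yield
// U+FFFD and consume a single byte. Hypothetical helper.
char32_t decodeUtf8(const std::string& s, std::size_t& i)
{
    const unsigned char lead = static_cast<unsigned char>(s[i]);
    std::size_t extra = 0; // number of continuation bytes expected
    char32_t cp = 0;
    if (lead < 0x80)                { cp = lead;        extra = 0; }
    else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; extra = 1; }
    else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; extra = 2; }
    else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; extra = 3; }
    else { ++i; return 0xFFFD; }    // stray continuation or invalid lead byte

    if (extra >= s.size() - i) { ++i; return 0xFFFD; } // truncated sequence
    for (std::size_t k = 1; k <= extra; ++k) {
        const unsigned char b = static_cast<unsigned char>(s[i + k]);
        if ((b & 0xC0) != 0x80) { ++i; return 0xFFFD; } // not a continuation
        cp = (cp << 6) | (b & 0x3F);
    }
    i += extra + 1;
    return cp;
}

// Converts UTF-8 bytes to UTF-16 code units, emitting surrogate pairs
// for code points above U+FFFF.
std::u16string utf8ToUtf16(const std::string& utf8)
{
    std::u16string out;
    std::size_t i = 0;
    while (i < utf8.size()) {
        char32_t cp = decodeUtf8(utf8, i);
        if (cp > 0x10FFFF) cp = 0xFFFD; // outside the Unicode range
        if (cp < 0x10000) {
            out.push_back(static_cast<char16_t>(cp));
        } else {
            cp -= 0x10000;
            out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
        }
    }
    return out;
}
```

The point is that the loop walks code points, not 'char's, so the same decoding step could feed any other output format.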

Bonus: no more const static char* kBla = "bla" thrown here and there. These now become const static CtString kBla("bla"), which is of course converted into the internal UTF-8 format. Probably not a big difference, but I like it more...

Bonus: In the code there were places where temp strings are passed around a lot, especially while loading. A 'BuffString' was used for this, which until now was simply typedef'd to a normal string, but now I added StringMemAllocator support, so you get real buffer strings (when you want them) which reuse memory and save a lot of free/alloc calls.
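The general idea of reusing string memory can be sketched like this. This is not the StringMemAllocator interface, just a hypothetical pool built on std::string to show the principle; it assumes that clear() keeps the allocated capacity, which is what common standard library implementations do.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical pool that keeps released strings around so their heap
// buffers can be handed out again instead of being freed and reallocated.
class StringPool {
public:
    // Returns an empty string, reusing a released buffer when one exists.
    std::string acquire()
    {
        if (free_.empty()) {
            return std::string();
        }
        std::string s = std::move(free_.back());
        free_.pop_back();
        s.clear(); // drops the contents; capacity is kept on common implementations
        return s;
    }

    // Gives a string back to the pool; its buffer stays allocated.
    void release(std::string&& s) { free_.push_back(std::move(s)); }

    // Number of strings currently waiting for reuse.
    std::size_t pooled() const { return free_.size(); }

private:
    std::vector<std::string> free_;
};
```

A loader would acquire() a string, fill it, use it, and release() it, so after warm-up most temp strings stop hitting the allocator at all.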

As for string length including/excluding the ending NULL character, I decided to include it. After all, it IS a character in the string, be it special or not. I also added a 'bool lengthIncludesEndingNULL()' function which should be used when manipulating strings, so that no assumptions are made. String code doesn't have to be super fast, only super usable, at least for the projects in sight.
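A rough sketch of how calling code can stay assumption-free with such a function. Only the name lengthIncludesEndingNULL() comes from the engine; the class, the other members, and the choice to measure length in bytes are made up for illustration.

```cpp
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical string class whose length counts the terminating NULL.
class Utf8String {
public:
    explicit Utf8String(const char* text)
        : bytes_(text, text + std::strlen(text) + 1) {} // copy the '\0' too

    // Tells callers which convention length() follows.
    static bool lengthIncludesEndingNULL() { return true; }

    // Length in bytes, following the convention reported above.
    std::size_t length() const { return bytes_.size(); }

    // Bytes of actual text, computed without hard-coding the convention.
    std::size_t payloadLength() const
    {
        return lengthIncludesEndingNULL() ? length() - 1 : length();
    }

    const char* c_str() const { return bytes_.data(); }

private:
    std::vector<char> bytes_; // always ends with '\0'
};
```

If the convention ever flipped, only lengthIncludesEndingNULL() and length() would change, and code written against payloadLength()-style checks would keep working.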

Another indirect bonus is that we (currently) use tinyXML, which supports UTF-8, and until now we couldn't make use of that support (for player names, for example).



