What does String#length do?
I just came across John O’Conner’s ‘How long is your String?’ post. The gist of the post is that String#length in Java doesn’t always return the number of characters in your string. It counts 16-bit char values, and any character outside the Basic Multilingual Plane takes two of them. If you’re just writing an English app, you’re fine, but if you plan to i18n your app you have another thing to worry about.
There are plenty of places in my current project where I use string.length() (e.g. making sure a user name and password are long enough, etc.), but to be sure, each of those calls now needs to be replaced by:
String str = ....;
int len = str.codePointCount(0, str.length());
Of course user name and password lengths might not make sense in a non-alphabetic language anyway.
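To see the two counts disagree, here is a minimal sketch. The class and method names are made up for illustration, and the test character (U+1D11E, a musical G clef written as a surrogate pair) is just one convenient example of a character outside the Basic Multilingual Plane.

```java
class StringLengthDemo {
    // Number of 16-bit char values, which is what String#length reports.
    static int codeUnits(String s) {
        return s.length();
    }

    // Number of Unicode code points; a surrogate pair counts once.
    static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        String plain = "hello";
        // U+1D11E MUSICAL SYMBOL G CLEF, stored as two char values.
        String clef = "\uD834\uDD1E";

        System.out.println(codeUnits(plain) + " vs " + codePoints(plain)); // 5 vs 5
        System.out.println(codeUnits(clef) + " vs " + codePoints(clef));   // 2 vs 1
    }
}
```

For plain English text the two numbers match, which is why this kind of bug tends to stay hidden until real i18n data shows up.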
At least they keep the definition of String#length consistent, which in case you never bothered to read the Javadoc (like me) is:
Returns the length of this string. The length is equal to the number of 16-bit Unicode characters in the string.
So you can work out the size of the data fairly easily, e.g. to make sure the data is not too big for a database field: every char is two bytes in memory. And if your database also measures its fields in 16-bit units you don’t have to do any calculations. But I can see some pretty gnarly bugs coming out of that, so it’s probably best to work in bytes, i.e. you have to do something like:
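String str = ....;
// Name the charset explicitly (UTF-8 is an assumption here; use your database's encoding).
// A bare getBytes() uses the platform default charset, which varies between machines.
int len = str.getBytes(java.nio.charset.StandardCharsets.UTF_8).length;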
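The byte count depends entirely on which encoding you ask for, so it is worth seeing the same string measured a few ways. This is only a sketch: the class name, the helper names and the fitsInColumn check are hypothetical, and the choice of encodings is an assumption rather than anything your database necessarily uses.

```java
import java.nio.charset.StandardCharsets;

class ByteLengthDemo {
    // Size of the string once encoded as UTF-8.
    static int utf8Bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    // Size as UTF-16 big-endian; this charset does not write a byte order mark.
    static int utf16Bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_16BE).length;
    }

    // Hypothetical check: would the UTF-8 form fit a column limited to maxBytes?
    static boolean fitsInColumn(String s, int maxBytes) {
        return utf8Bytes(s) <= maxBytes;
    }

    public static void main(String[] args) {
        String word = "na\u00EFve"; // "naive" with a diaeresis on the i: 5 chars

        System.out.println(word.length());         // 5
        System.out.println(utf8Bytes(word));       // 6, the accented letter needs 2 bytes
        System.out.println(utf16Bytes(word));      // 10, two bytes per char
        System.out.println(fitsInColumn(word, 5)); // false
    }
}
```

A five-char string that overflows a five-byte column is exactly the sort of gnarly bug you get when chars and bytes are treated as the same thing.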
If you’re interested in reading more about character sets, Tim Bray has a pretty good article: Characters vs. Bytes.
