Archive for April, 2006


Who came up with DOM and XPath in Java?

Monday, April 24th, 2006

I’ve hated the DOM implementation in Java for a long time. Today I used XPath for the first time, and now I hate it too. Up to now I’ve used a collection of utility methods that would just iterate over nodes until they found one with a matching tag name and/or attribute set. After my experience today I’m back to them. Seriously, how wordy is this? (Exception handling excluded for ‘brevity’)

DocumentBuilderFactory docfactory = DocumentBuilderFactory.newInstance();
DocumentBuilder docbuilder = docfactory.newDocumentBuilder();

// Assume we've got the file as an InputSource
Document docroot = docbuilder.parse(filestream);

XPath xpath = XPathFactory.newInstance().newXPath();
NodeList nodes = (NodeList) xpath.evaluate("/errors", docroot, XPathConstants.NODESET);

int length = nodes.getLength();
for (int i = 0; i < length; i++) {
    Node node = nodes.item(i);

    if (node instanceof Element) {
        Element e = (Element) node;
        // ...
    }
    else {
        // ...
    }
}

You get the idea. The code I was writing was meant to pull out all ‘errors’ blocks, consolidate them, update a count attribute, then replace the old errors blocks with the new one. This meant there were more XPaths, and parsing of Integers followed by converting them back to Strings. It was ridiculous. To make it even worse, the nodes returned in the NodeList were, as far as I could tell, copies and not the original nodes, so I couldn’t remove them. If I’m reading the API correctly, a document fragment is returned, so to be fair it is documented, but when element.getParentNode().removeChild(element) is failing, it’s hard to get past the frustration and make sense of the docs.
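
For comparison, here is a minimal sketch of that consolidation step. The class name ConsolidateErrors, the method consolidate and the sample XML are made up for illustration. It assumes every ‘errors’ block carries a numeric count attribute, that the blocks sit under an ordinary parent element rather than being the document root, and that the XPath implementation hands back the document’s own nodes. The XPath implementation bundled with the JDK does that when evaluated against a DOM Document; if yours returns copies, as described above, the removal step will not work.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class ConsolidateErrors {

    /** Merges every errors element into one and sums their count attributes. */
    public static Document consolidate(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));

        NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate("//errors", doc, XPathConstants.NODESET);
        if (nodes.getLength() == 0) {
            return doc;
        }

        // Put the merged block where the first old block used to be.
        // Assumption: the parent is an element, not the Document itself.
        Element merged = doc.createElement("errors");
        Node first = nodes.item(0);
        first.getParentNode().insertBefore(merged, first);

        int total = 0;
        for (int i = 0; i < nodes.getLength(); i++) {
            Element old = (Element) nodes.item(i);
            // Assumption: count is always present and numeric.
            total += Integer.parseInt(old.getAttribute("count"));
            // appendChild moves the child out of the old block.
            while (old.getFirstChild() != null) {
                merged.appendChild(old.getFirstChild());
            }
            old.getParentNode().removeChild(old);
        }
        merged.setAttribute("count", String.valueOf(total));
        return doc;
    }

    public static void main(String[] args) throws Exception {
        Document doc = consolidate(
                "<report><errors count=\"2\"><e/><e/></errors>"
                + "<errors count=\"1\"><e/></errors></report>");
        Element merged = (Element) doc.getElementsByTagName("errors").item(0);
        System.out.println(merged.getAttribute("count"));
    }
}
```

It is still wordy, which rather supports the complaint.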

Why can’t I have something like:

Document doc = new Document(...path to XML file...);
List matchingNodes = doc.find("/errors");

for (Element errors : matchingNodes) {
    // …
}

That’s not too un-Java is it? Okay, the return type of my find method isn’t well defined, but that can be worked around.
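
Something close to that wish can be had with a thin wrapper over the standard API. This is only a sketch: XmlFinder, parse and find are invented names, not part of the JDK, and the sample document is made up. It simply hides the factory calls and the NodeList loop behind a method that returns a typed list.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class XmlFinder {

    /** Parses an XML string into a DOM Document. */
    public static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    /** Evaluates an XPath expression and keeps only the Element matches. */
    public static List<Element> find(Node context, String expression) throws Exception {
        NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate(expression, context, XPathConstants.NODESET);
        List<Element> result = new ArrayList<Element>();
        for (int i = 0; i < nodes.getLength(); i++) {
            Node node = nodes.item(i);
            if (node instanceof Element) {
                result.add((Element) node);
            }
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse("<report><errors count=\"2\"/><errors count=\"3\"/></report>");
        int total = 0;
        for (Element errors : find(doc, "//errors")) {
            total += Integer.parseInt(errors.getAttribute("count"));
        }
        System.out.println(total);
    }
}
```

Returning List<Element> settles the return type question by silently dropping non-element matches such as text or attribute nodes, which is a design choice rather than the only option.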

Why is this API so unwieldy?


Google Mobile Maps

Friday, April 21st, 2006

I just came across Google Maps for your phone. What’s even better is that it’s integrated with local search. It’s pretty slick, it has the same scrolling and zooming ease of the web version, very easy to use. Only hitch is that it currently only covers the US. The map detail of London isn’t at street level yet.

This could be a really useful bit of kit. The next step would be to get some sort of GPS in there so it immediately knows where you are on the map. Phones already have some location information from whatever base station they’re connected to.

My main concern is the amount of data that gets transferred. It’s quick so I assume they’ve got the image size right down, but at £2.35 a meg, it’s not going to be small enough. It would be great if you could predownload maps for a given area, e.g. London, over a regular connection, then upload it to your phone.

Combining playing with GMM and all the mail programs I’ve been trying out, maybe it’s time to start looking for an unlimited data plan. I’m guessing I’d have to switch from Vodafone, since when I popped into their store I had this ridiculous conversation:

Me: I’m using a lot of GPRS with my phone

Shop guy: Yeah that’s really expensive

Me: I noticed that, is it possible to prebuy a bundle of MB for a lower price?

Shop guy: You mean like a data pack?

Me: That sounds like it.

Shop guy: No, we don’t do that.

What I want has a name, but doesn’t exist, madness.


Microsoft’s New Brain

Wednesday, April 19th, 2006

I don’t normally just post links, but this is a good article in Fortune about Microsoft’s new CTO, Ray Ozzie, and what he’s doing to the company:

http://money.cnn.com/magazines/fortune/fortune_archive/2006/05/01/8375454/index.htm

He also has a blog here:

http://spaces.msn.com/rayozzie/


GPRS Internet with Linux

Tuesday, April 18th, 2006

I actually came across the correct settings for Vodafone GPRS while trying to use my phone as a modem. I was away over Easter and had my laptop, but my only hope of an Internet connection was that there would be a public wireless network around. There weren’t any open ones, and surprisingly all the ones I did find were encrypted.

It was annoying to see wireless connections around but not be able to connect. I figured if I couldn’t get a permanent solution like a 3G datacard or a monthly subscription to T-Mobile HotSpot, then I’d have to fall back to the old trusty dial up. There is a modem built into the Thinkpad T30, but I’ve never bothered to set it up. I figure if there is a phone line, there’s going to be some sort of broadband connection. The modem I wanted to use was in my phone, the main driver being I had a load of bundled minutes with it, and what better way to use them than downloading email at 9.6kbps?

But before I could find out if I could even be old fashioned and dial up to Freeserve (or whatever they’re called now) I remembered about GPRS, which should give me a data rate of about 25-40kbps. I found some instructions on how to use a Sony Ericsson phone as a GPRS modem in Linux. These particular instructions use Bluetooth, something my T30 doesn’t have, but you can connect using the supplied USB cable as well. After you plug in your phone, load the following drivers:

  • cdc_acm
  • ppp
  • ppp_deflate
  • bsd_comp

The first driver should create a device at /dev/ttyACM0, the others are to allow you to dial up and use PPP. So ignore the first step in the instructions above (unless of course you’re using Bluetooth). Then in the first file change the device to /dev/ttyACM0, and make sure you have the correct user (’web’ for Vodafone). My copy of /etc/ppp/chat.gprs contains:

TIMEOUT         5
SAY             "Internet via Vodafone GPRS"
ABORT           '\nBUSY\r'
ABORT           '\nERROR\r'
ABORT           '\nNO ANSWER\r'
ABORT           '\nNO CARRIER\r'
ABORT           '\nNO DIALTONE\r'
ABORT           '\nRINGING\r\n\rRINGING\r'
''              \rAT
TIMEOUT         12
OK              ATE1
OK              AT+cgdcont=1,"IP","internet"
OK              ATD*99***1#
CONNECT         ''

and finally in /etc/ppp/pap-secrets set the username and password to whatever your provider requires. Then you can connect with:

pppd call gprs

and amazingly you’re connected to the Internet. Maybe I’m too cynical about technology these days, but I was amazed how easy this was to set up.


Email on a Sony Ericsson K750i

Tuesday, April 18th, 2006

This is the first of a two part post on mobile Internet access. I’ve looked at it before, but having an unlimited 3G datacard from Vodafone (actually it looks like a 1GB/month fair use limit) is just too expensive for the limited travel I do. Having been at Heathrow airport on Friday evenings, it looks like Blackberries are the way to go. But frankly I don’t want to carry something else around, nor do I want to have another subscription. My phone seems capable of doing everything, so I figure why not use it to check my email?

Before looking into too many J2ME email clients I noticed that it has email support built in! It’s in the ‘Messaging’ section, under the deceptively named ‘Email’ icon. Setting it up is like any other email program: enter your server details, username, password, etc. What I was amazed at was that it has IMAP4 support as well as being able to encrypt it with SSL or TLS. Very fancy. Of course it didn’t work; it hung while trying to find the server. I figured it was flaky IMAP or SSL support and just forgot about it, but finally I figured out why it wasn’t working.

The problem was the WAP settings I got from the Vodafone website. They’re simply wrong:

                  APN                 Username  Password
Vodafone website  wap.vodafone.co.uk  wap       wap
Actual settings   internet            web       web

You need to create a new GPRS account in your Data Comms section, then change your Internet profile to use that. After that, it can connect fine. It’s also worth turning on the compression options given the high costs of GPRS (~£2.50/MB).

Well almost…it now fails with an SSL error. I’m guessing that it’s because the SSL certificate for my mail server is issued to my ISP and hence generates an error message. But on the K750 I don’t get the option to okay it. I decided against trying it without SSL for now because I’m not a fan of sending out clear text passwords. In the certificates part of the Internet settings there’s no option to upload a certificate, but there’s bound to be a way to get around this given that the phone is just a USB drive when I connect it to my laptop.

The other options are a different mail client, or something insane like pine over a J2ME ssh client.


WPA, Gentoo Linux and an IBM Thinkpad T30

Sunday, April 2nd, 2006

I decided it was finally time to switch from WEP to WPA because WEP is simply broken and shouldn’t be considered safe. There are some attacks available against WPA, but they tend to be of the ‘try every key in existence’ type. Have a sufficiently long key and you’re pretty much safe. The attacks also seem to be geared towards weaknesses in TKIP; I haven’t read of any against AES.

Using WPA in Gentoo is trivial and the guidebook covers all you need to know. There was one slight hiccup: my wireless card’s firmware didn’t support WPA. I have the optional mini-PCI card for the Thinkpad T30, which is based on the Intersil Prism 2.5 chipset. The hostap-utils package contains the ‘prism2_srec’ command that allows you to update the firmware. Here are the instructions I followed.

First check what card you have:

# hostap_diag wlan0
Host AP driver diagnostics information for 'wlan0'

NICID: id=0x8013 v1.0.0 (PRISM II (2.5) Mini-PCI (SST parallel flash))
PRIID: id=0x0015 v1.1.0
STAID: id=0x001f v1.4.9 (station firmware)

The ‘station firmware’ needs to be at least 1.7.0 to use WPA. If you try to run wpa_supplicant before updating the firmware you’ll get a ‘driver doesn’t support WPA’ error message. Check out the instructions I linked to previously to work out what files you need. The latest firmware is 1.8.4, but I only upgraded to 1.7.4 since several people had tested that version and it works. With that version of the station firmware also came an update to the primary firmware. The files I used were:

  • pk010101.hex (primary firmware 1.1.1)
  • sf010704.hex (station firmware 1.7.4)

Only use those files if the output of hostap_diag matches mine.

You’ll have to modify the hostap driver to enable flashing since it’s disabled by default. Instructions are on the linked page. Once you’ve done that it’s a simple matter of:

# prism2_srec -v wlan0 pk010101.hex sf010704.hex

and if everything went smoothly (no error messages), do it for real:

# prism2_srec -f -v wlan0 pk010101.hex sf010704.hex

After a lot of messages you should see:

Downloading to non-volatile memory (flash).
Note! This can take about 30 seconds. Do _not_ remove card during download.
OK.
Components after download:
  NICID: 0x8013 v1.0.0
  PRIID: 0x0015 v1.1.1
  STAID: 0x001f v1.7.4

Your card will be updated and you can start using WPA.


Character Sets and Encodings

Saturday, April 1st, 2006

Long gone are the days where each character was represented by a number, or more specifically a byte. You know, when your character routines were simple like:

/* lower: convert c to lower case; ASCII only */
int lower(int c)
{
    if (c >= 'A' && c <= 'Z')
        return c + 'a' - 'A';
    else
        return c;
}

(The C Programming Language, page 43)

Of course that only worked for character sets with consecutive letters, e.g. ASCII, but not EBCDIC.

But even early on I knew this type of function was bad, and wherever possible you should just use the supplied library routines that take care of all the character set nastiness for you. In fact with Java’s String class, it’s amazing how long you can be ignorant of character sets and encodings. This is because in the English-speaking world, our characters map to the same set of bytes in almost all encodings.
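
To make the difference concrete, here is a small Java sketch that puts a port of the hand-rolled routine next to the library call. The class name LowerCase and the method asciiLower are made up for illustration; Character.toLowerCase is the standard library method.

```java
public class LowerCase {

    /** Port of the ASCII-only routine: only A to Z are changed. */
    public static int asciiLower(int c) {
        if (c >= 'A' && c <= 'Z') {
            return c + ('a' - 'A');
        }
        return c;
    }

    public static void main(String[] args) {
        int eAcute = 0x00C9; // LATIN CAPITAL LETTER E WITH ACUTE

        // The hand-rolled version handles plain English letters...
        System.out.println((char) asciiLower('Q'));
        // ...but leaves anything outside A to Z untouched.
        System.out.println(asciiLower(eAcute) == eAcute);
        // The library routine knows about the rest of Unicode.
        System.out.println(Character.toLowerCase(eAcute) == 0x00E9);
    }
}
```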

I think now’s a good time to take a quick break and go read Joel Spolsky’s The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets.

Now hopefully from that you’ve got three key things:

  1. There is no such thing as plain text. Bytes on a disk are worthless without knowing the character encoding.
  2. Character set != Character encoding
  3. Character != Byte

Plain text was gone the moment someone wanted to store something besides an English character. You shouldn’t assume a ‘text file’ is encoded in US-ASCII. One of the conveniences of the newer character encodings (e.g. ISO-8859-1 and UTF-8) is that they let you get away with this if you’re dealing with English text. But you still shouldn’t do it!

For a long time a character was mapped to a byte or a byte sequence, hence character sets and character encodings were one and the same thing. This changed with Unicode (which is what Java uses to represent characters internally), which brought in the concept of a ‘code point’. A code point is a unique number assigned to a character, e.g.:

Unicode code point    U+0041    U+00DF    U+6771    U+10400
Representative glyph  A         ß         東        𐐀
UTF-32 code units     00000041  000000DF  00006771  00010400
UTF-16 code units     0041      00DF      6771      D801 DC00
UTF-8 code units      41        C3 9F     E6 9D B1  F0 90 90 80

(Table from Supplementary Characters in the Java Platform)

Included in the above table are the byte representations of the characters in the various UTF-x encodings. The thing to note is that the ‘x’ is the size in bits of one code unit, which is the smallest number of bits used to represent one character, but a character may require more than one code unit. UTF-8 and UTF-16 are therefore variable length, while UTF-32 always uses 32 bits because that covers all the code points. This brings us to the third point: a character is no longer necessarily represented by one byte, although it can be. This breaks a lot of character handling routines that assume characters are 8 bits long. It’s worth reading about the Unicode 4.0 support in J2SE 1.5 to see how the use of the ‘char’ type is going out of fashion.
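
The table can be checked from Java itself. The following is a rough sketch (the class name CodeUnits and the helper fromCodePoint are made up) that builds a String for each code point in the table and reports how many ‘char’ values and how many bytes it takes in UTF-16 and UTF-8. It uses java.nio.charset.StandardCharsets, which needs Java 7 or later.

```java
import java.nio.charset.StandardCharsets;

public class CodeUnits {

    /** Builds a String holding the single character with the given code point. */
    public static String fromCodePoint(int codePoint) {
        return new String(Character.toChars(codePoint));
    }

    public static void main(String[] args) {
        // The four code points from the table above.
        int[] codePoints = {0x0041, 0x00DF, 0x6771, 0x10400};
        for (int cp : codePoints) {
            String s = fromCodePoint(cp);
            System.out.printf("U+%04X: %d char(s), %d UTF-16 byte(s), %d UTF-8 byte(s)%n",
                    cp,
                    s.length(),
                    s.getBytes(StandardCharsets.UTF_16BE).length,
                    s.getBytes(StandardCharsets.UTF_8).length);
        }
    }
}
```

The last entry is the interesting one: U+10400 is a single character but String.length() reports 2, because it needs a surrogate pair in UTF-16. That is exactly the case that trips up code treating one ‘char’ as one character.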

It should be fairly obvious why reading a UTF-8 encoded file as ASCII could produce a lot of garbage, but at the same time your English characters would be fine.

So why does this matter if Java uses Unicode to store Strings? You’d assume it would also have a default encoding, so the following would be standard:

String text = ....;
FileOutputStream fos = new FileOutputStream("/tmp/dump.txt");
fos.write(text.getBytes());

This file could be read back in with a FileInputStream and you’d get the same text each time. That is true if you run the read and write programs on the same machine, but the default character encoding depends on the JVM and operating system, so a file written on one machine may decode differently on another.
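
The effect of a mismatch can be simulated on one machine by naming the two encodings explicitly. In this sketch the class EncodingMismatch and the method writeUtf8ReadLatin1 are hypothetical names, and UTF-8 and ISO-8859-1 simply stand in for the differing defaults of two machines.

```java
import java.nio.charset.StandardCharsets;

public class EncodingMismatch {

    /** Encodes text as UTF-8, then decodes the same bytes as ISO-8859-1. */
    public static String writeUtf8ReadLatin1(String text) {
        byte[] bytes = text.getBytes(StandardCharsets.UTF_8);
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String english = "plain English";
        String german = "Stra\u00DFe"; // "Strasse" spelt with U+00DF

        // ASCII-range text survives the mismatch untouched.
        System.out.println(english.equals(writeUtf8ReadLatin1(english)));
        // Anything else comes back as different characters.
        System.out.println(german.equals(writeUtf8ReadLatin1(german)));
    }
}
```

The English string round-trips unchanged, while the German one comes back one character longer because the two UTF-8 bytes for U+00DF are decoded as two separate characters.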

How to find the default character set for your JVM

import java.io.OutputStreamWriter;
import java.nio.charset.Charset;

/**
 * How to determine the default encoding
 */
public class CharacterSet {

    public static void main(String[] args) {
        // in JDK 1.4, defaultEncodingName will typically be "Cp1252"
        // In an Applet, this requires signing for privilege.
        String defaultEncodingName = System.getProperty( "file.encoding" );
        log(defaultEncodingName);

        // in JDK 1.5+, will typically be "windows-1252"
        // First, get the Charset/encoding then convert to String.
        defaultEncodingName = Charset.defaultCharset().name();
        log(defaultEncodingName);

        // I'm told this circumlocution has the nice property you can even use
        // it in an unsigned Applet.
        defaultEncodingName = new OutputStreamWriter( System.out ).getEncoding();
        log(defaultEncodingName);
    }

    private static void log(String msg) {
        System.out.println(msg);
    }
}

Output (IBM 1.5.0 JDK on Linux)

ANSI_X3.4-1968
US-ASCII
ASCII

Clearly a lot of variation! My preference is to specify UTF-8 when I’m reading and writing my own files because I deal mostly with English text and this does save bytes. It also allows the files to be viewed in almost any text reader.
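
A minimal sketch of what specifying UTF-8 looks like in practice, assuming Java 7 or later for StandardCharsets and try-with-resources. The class Utf8Files and its write and read methods are made-up names; the point is only that the charset is passed to OutputStreamWriter and InputStreamReader rather than left to the platform default.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

public class Utf8Files {

    /** Writes text to the file, always encoded as UTF-8. */
    public static void write(File file, String text) throws IOException {
        try (Writer out = new OutputStreamWriter(
                new FileOutputStream(file), StandardCharsets.UTF_8)) {
            out.write(text);
        }
    }

    /** Reads the whole file, always decoding it as UTF-8. */
    public static String read(File file) throws IOException {
        try (Reader in = new InputStreamReader(
                new FileInputStream(file), StandardCharsets.UTF_8)) {
            StringBuilder sb = new StringBuilder();
            char[] buffer = new char[4096];
            int n;
            while ((n = in.read(buffer)) != -1) {
                sb.append(buffer, 0, n);
            }
            return sb.toString();
        }
    }

    public static void main(String[] args) throws IOException {
        File file = File.createTempFile("dump", ".txt");
        String text = "Stra\u00DFe \u6771";
        write(file, text);
        // Same text comes back regardless of the JVM's default encoding.
        System.out.println(text.equals(read(file)));
        file.delete();
    }
}
```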

But what if you don’t control the bytes, e.g. you download a file from the web? This becomes a bit trickier. Sometimes they tell you what it is, e.g. in the ‘Content-Type’ header, or in a meta tag; sometimes they don’t. Thankfully browsers have had to deal with this problem for years, and the Mozilla project has produced character set detectors, which have been ported to Java. Definitely worth looking into if you have to handle text files from unknown sources.
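
For the case where the server does tell you, here is a rough sketch of pulling the charset out of a ‘Content-Type’ header value. The class ContentTypeCharset and the method charsetOf are invented for illustration, and the parsing is deliberately naive; it is not a full implementation of the header grammar. When no usable charset is declared it returns the caller’s fallback, which is the point at which a detector would take over.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

public class ContentTypeCharset {

    /**
     * Returns the charset named in a Content-Type value such as
     * "text/html; charset=ISO-8859-1", or the fallback if none is usable.
     */
    public static Charset charsetOf(String contentType, Charset fallback) {
        if (contentType == null) {
            return fallback;
        }
        for (String part : contentType.split(";")) {
            String p = part.trim();
            if (p.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                String name = p.substring("charset=".length()).replace("\"", "").trim();
                try {
                    return Charset.forName(name);
                } catch (IllegalArgumentException e) {
                    // Covers both illegal and unsupported charset names.
                    return fallback;
                }
            }
        }
        return fallback;
    }

    public static void main(String[] args) {
        System.out.println(charsetOf("text/html; charset=ISO-8859-1", StandardCharsets.UTF_8));
        System.out.println(charsetOf("text/html", StandardCharsets.UTF_8));
    }
}
```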
