Why do we still have Arrays?

Continuing my look into language features, today I’m asking why are we still using arrays? Or more specifically why does Java have C-style arrays? An array is essentially a list, an ordered collection of things. Lists are a vital data structure in programming but why is there this first-class array primitive in Java?

Let’s first look at C. Arrays in C are nothing more than syntactical sugar for pointer arithmetic. Consider the following example.

int[] a = { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 };
a[1]; /* equals 2 */
int *a = malloc(10 * sizeof(int));
for (int i = 0; i < 10; i++) {
    *(a + i) = i;
}
*(a + 1); /* equals 2 */

These two bits of code are, for all intends and purposes, the same. Since they are the same there are some limitations placed on arrays in C, i.e. they are fixed length, and things need to be next to each other in memory because of the way pointer arithmetic works.

We don’t really want these restrictions on lists, I may want to change the number of elements, and where they’re located in memory should be an implementation detail. For the second point in particular, Java does make it an implementation detail when you decide to use ArrayList or LinkedList.

So what is an array in Java? It’s not an instance of the Array class, but it is an object. It has a field named ‘length’ (public final), and it implements the Cloneable interface. int[].class.getName() is [I. That doesn’t tell us much, but IMHO it’s a hack in the language to implement C-style arrays. It is fast, according to Peter Norvig’s IAQ on Java, accessing an array is 15-30 times faster than accessing an element of a Vector. But that could just be synchronization overhead, and Vector is probably implemented with arrays anyway. There should be some low level language feature that allows us to access that speed.

So what’s my gripe? Arrays in Java should be so much more than copies of the things in C. The Java Collections API which is one of the best Java APIs, and the List type should have the array syntax. Take a look at Arrays in Ruby, which are essentially a good list implementation with array syntax.

a = [1 ,2, 3, 4, 5]
a[1] » 2

b = [6, 7, 8, 9, 10]
a + b » [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

N.B. this produces a new array, but you can use the ‘concat’ method to append them to the same array

c = [2, 4, 6, 8, 10]
(a + b) - c » [1, 3, 5, 7, 9]

Ruby’s Array class also does a lot more.

The concept of C-style arrays need to be dropped. We don’t handle pointers any more, we shouldn’t be bound by the limitations. The concept we’re using is the list data structure, we should make that, along with all of its potential features, our first class type.

Update: I simplified the way to get the int[] class name (Thanks Shai).

Update 2: Corrected a typo in the C example (Thanks JDB).

Disclaimer: In the C example the first declaration will assign memory from the stack and the latter from the heap, so they’re not the same, but I just want to demonstrate elements in the array are next to each other in memory and hence [] notation can be changed to pointers.

Disclaimer 2: I know Vector’s methods are synchronized, and ArrayList is essentially the same thing without the synchronization overhead, hence faster. The performance numbers I pulled up are old and it used Vectors, thanks Shai for providing up to date numbers.

Spread the word: Technorati related  |  Technorati related  |  del.icio.us bookmark it!  |  submit Why do we still have Arrays? digg.com digg it!  |  reddit reddit!

15 Responses to “Why do we still have Arrays?”

  1. JD says:

    Vector is implemented using an array. It’s a synchronised version of ArrayList. Java arrays are nothing like C arrays. For a start they are not adjecent memory blocks and you obviously can’t perform pointer arithmetic. They are also typed and have a defined length.

    You use different list implementations so you can take advantage of the different performance properties of each of them. You use a LinkedList when you need quick insertion in the middle of the list, but an ArrayList when you need quick iteration. You have the same sort of thing in C++ with C arrays, vector and deque etc.

    You are unable to make this distinction with dynamic languages where you are stuck with a single list implementation.

  2. Jesse Wilson says:

    When I want elegant code, I wrap my arrays with Arrays.asList(). When I need fast code, I profile twice then suffer with array access directly.

  3. Shai Almog says:

    First, ArrayList which isn’t synchronized is 20 times slower as of Java 5 with generics. Collections and objects don’t translate directly to assembly language constructs and all the JIT’s in the world can’t make them as fast as arrays. We don’t need the pointers of C but the hardware still uses these datatypes and constructs and Java is still translated to assembly language (Java is also faster than Ruby and will probably always be that way).
    In fact Java arrays have advantages over C’s arrays in some regards, yes there is the cost of boundry checking but its lower than a C programmers cost of making the same check (unless he knows in compile time the size of the array it will take longer in C), Java JIT’s can REALLY optimize arrays and the same cannot be said for collections or none primitive types.
    Second you don’t need: new int[0].getClass(), you can instead write int[].class which is more efficient and inteligent (I realize that was not the point made).
    If I accept the whole [I argument on its face then constructors are also a hack (), so are inner classes (they are) etc… Yes its a special case marker hack whatever its useful.

    Third, no. Arrays should do nothing more.
    Arrays were designed in Java in that way because it is familiar and familiarity is the cornerstone of useability (and yes usability matters to languages). There is a place for the features you mentioned and its called an API, why?
    Here are several reasons:
    1. Easier to maintain - I have an array that suddenly changes, after I call some code. I’d like to find the offending line in the code, it would be quite useless for me to look for “>>” but looking for “Arrays.” is trivial.

    1. Adding features to the language makes it more complicated NEVER simpler. The fewer lines of code seem simpler but if lines of code were measures of simplicity then PERL would be the worlds best language and not a sinking ship.

    2. You know what you get, performance is linear and the VM can guarantee good speed for all operations on an array. A VM can also make many assumptions regarding arrays.

    3. There are millions of Java developers with their own pet peeve that they want ot modify in the Java language, if a change was made for every single idea just because language X does it then Java would be like language X and wheres the benefit?

  4. Raddedas says:

    What you are in effect saying is we have to remove a fundamental part of the language, incidentally rewriting all the OOP list classes based on it, so that the language conforms to a “nicer” object oriented model you personally favour, and anyone who has a high performance requirement for arrays must stop using Java or buy a lot more hardware.

    I think before making this kind of sweeping and incorrect judgement you should consider where Java is used. I’d love to see your argument for using OOP list constructs on a J2ME device with 180Kb heap and next to no processing power…

  5. Ron says:

    I think your two C examples aren’t the same. In the first example, I would expect that a smart compiler could perform some optimizations. In the second example, it seems that dynamic memory allocation from the heap would be unavoidable.

  6. Miles says:

    I figured this post would get some heated replies. :)

    I didn’t make my point particularly clear, probably because I didn’t know what it was, because this started out as a rant. My point as I see it now is that when I’m choosing between a list and an array, conceptually I always want a list (assuming lists have indexed access, which traditionally they don’t, you’re supposed to just have head and tail) because that is a data structure I think in. I would like to see standard data types as part of the language, and optimised so they are fast, just like arrays are.

    Shai is right though, eventually we have to get down to the data structures your processor supports, so for performance reasons we have arrays. The Java Language Specification doesn’t go into any detail about how to implement them, but I bet that performance on par with C arrays was a big requirement at Sun.

    BTW I’m not suggesting Sun drop arrays from Java or make the drastic changes I suggest. That would change the language so it’s not Java. But the JVM is a great platform, there’s no reason we can’t have a higher level language that does these sorts of things and skips over language features that are there for performance reasons.

    This post is just meant to be a bit of food for thought, when we’re designing new languages we can take a step back, look at those features from our old favourites and decide what is a good feature, and what came about because of limitations at the time.

  7. J. David Beutel says:

    Shouldn’t the last line in your C example be *(a + 1)? (I’m glad I’m doing Java now instead of C.)

    Cheers,
    11011011

  8. J. David Beutel says:

    By the way, one reason I often used arrays was to get the compiler to check the types of the elements for a parameter or return value. Now that’s no longer necessary. Generic collections in JDK 1.5 can do that type check, and they’re more flexible (i.e., change size, read-only, Maps, synchronized, etc). When I read the title of your post, I thought that’s what it would be about.

  9. Miles says:

    While I’m not coming from the generics (and other features of collections) angle I think the underlying reasoning is the same. Arrays are an old abstraction, and we now have better abstractions, that are at least equivalent feature wise, we should be using those instead. But because of the performance reasons, arrays still exist, so it would be foolish to drop them before its replacements catch up on that front too.

  10. Sedat says:

    :)

  11. joe schmoe says:

    “Adding features to the language makes it more complicated NEVER simpler”

    Umm.

    Tell me this. Which is simpler?

    list.add(blah);

    –OR–

    list.add((Integer)blah);

    clearly due to a “feature”(in this case auto-boxing), complexity is reduced. And note that auto-boxing is a feature inspired by dot net.

    Another example:
    in Ruby:
    a ||= b

    same thing in Java:
    if (a == null) {
    a = b;
    }

    So tell me, which one is more complex and more likely to be screwed up?

    I can tell you from extensive personal experience the way arrays are handled in Ruby kill the awkward, limited functionality in Java.

    Dismissing any language features out of hand is closed-minded and short-sighted. There is a lot to learn from other languages. You do have to be careful what you pull in of course, but Java is far, far from perfect.

  12. Steve says:

    Arrays are also nice in APIs if you don’t want a client to add to the list, remove items, etc. They feel more final to me at least. You can use an immutable list, but arrays express the intent better.

  13. Dale says:

    In the .NET, arrays are a core part of the CLR, efficiently implemented at the assembler level. I imagine the situation is much the same in the Java VM. But I think the author was talking more about why do we have to see them in the higher level language. He’s right on that, I think. Generically typed collections are much more elegant, and can be implemented with arrays by the compiler.

  14. JD says:

    Of course, in C++ you can use C arrays in the STL anywhere where you’d use an STL container, because all the functions in the STL work on iterators and not containers and the method for moving forward in an iterator is iter++, which conviently is the same as incrementing a pointer. At that point an array stops becoming special and just another container with certain properties. People complain about C++, but the STL was well designed.

  15. Arash Rajeeyan says:

    I don’t want to speak alot
    but if you have studied software engineering or computer science you most know that List and Arrays are two different data structures each one needed in it’s special case.

Leave a Reply

Line and paragraph breaks automatic.
XHTML allowed: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <code> <em> <i> <strike> <strong>