Wednesday, April 08, 2009

Always Do The Math!

I just spent the last three weeks futzing with proof-of-concept code, trying to get a client/server setup to meet the client's performance requirements. I was researching all kinds of TCP kernel parameters, reading up on the network stack, java.nio selectors, asynchronous I/O, and TCP receive windows, doing everything I could to get the numbers to what the client was asking for. Finally, after I had gotten the throughput to about a quarter of what the client wanted, I did a simple calculation to see how close I was to the theoretical bandwidth of GigE. Turns out I was already using about 0.8 Gbps!! When I did the math on the performance the client wants, it would require 4 Gbps Ethernet!! I could have saved myself and the client three weeks if I had simply done that calculation (correctly) up front!!!
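For what it's worth, the calculation really is this small. The sketch below is only an illustration: the message size and message rates are made-up numbers (not the client's real figures), BandwidthMath is a hypothetical helper, and it ignores TCP/IP and Ethernet framing overhead, so the real requirement would be somewhat higher.

```java
/** Back-of-the-envelope check: does a target message rate even fit on the link? */
final class BandwidthMath {

    /** Payload bandwidth in gigabits per second (protocol overhead is ignored). */
    static double requiredGbps(long messagesPerSecond, long bytesPerMessage) {
        return messagesPerSecond * bytesPerMessage * 8 / 1e9;
    }

    /** True if the payload alone fits within the given link capacity. */
    static boolean fitsOnLink(long messagesPerSecond, long bytesPerMessage, double linkGbps) {
        return requiredGbps(messagesPerSecond, bytesPerMessage) <= linkGbps;
    }

    public static void main(String[] args) {
        long bytesPerMessage = 1_000;   // hypothetical message size
        long achievedRate = 100_000;    // hypothetical messages/sec reached so far
        long requestedRate = 400_000;   // hypothetical messages/sec the client wants

        System.out.println("achieved Gbps:  " + requiredGbps(achievedRate, bytesPerMessage));
        System.out.println("requested Gbps: " + requiredGbps(requestedRate, bytesPerMessage));
        System.out.println("fits on GigE:   " + fitsOnLink(requestedRate, bytesPerMessage, 1.0));
    }
}
```

With these assumed inputs the achieved rate already eats most of a gigabit link, and the requested rate cannot fit on it no matter how the socket code is tuned.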

Friday, April 04, 2008

Java 5 Static Import "Feature"

Wow, so first the designers of Java 5 give us convoluted generics, which make code as illegible as C++ templates without any of the power of C++ templates. At least generics let you avoid casting (but not really). But a new feature I haven't heard advertised much is the ability to import all the static methods of a class so that you don't have to prefix the method call with the name of the class.

So, if you have a class:


public class StringUtils {
public static String parseString ( String a, String b ) { ... }
}


Before Java 5, in order to call parseString() you always had to write:


StringUtils.parseString( ... );


Now in Java 5 you can "statically" import all the static methods of the class, then use the method directly:


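import static com.example.StringUtils.*; // "com.example" is a placeholder package: classes in the default package cannot be imported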

parseString( ... ); // Calls StringUtils.parseString( );


Now, this basically flies in the face of everything that we have been told from the beginning of Java, which is that having static methods attached to classes was important for namespacing. Here we have a situation where finding out what parseString() does without the help of an IDE is practically impossible. It looks like a local method call! If I saw this line in a piece of code and needed to figure out what parseString did, I would first look in the class, then start hunting through all of its superclasses (of which there are usually quite a few in Java). Then I would be thoroughly confused. Only after wasting a large amount of time would I think to inspect the import statements! Yet another language feature that is completely unusable without the help of an IDE.
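To make the complaint concrete, here is a minimal sketch that uses a JDK method instead of the StringUtils example, so that it compiles on its own. StaticImportDemo and its two method names are made up for illustration. The two methods do exactly the same thing, but only one of them tells the reader where max() comes from.

```java
import static java.lang.Math.max; // the only hint that max() is not declared in this file

final class StaticImportDemo {

    /** Qualified call: the owning class is visible at the call site. */
    static int qualified(int a, int b) {
        return Math.max(a, b);
    }

    /** Statically imported call: reads exactly like a call to a local method. */
    static int unqualified(int a, int b) {
        return max(a, b);
    }

    public static void main(String[] args) {
        System.out.println(qualified(3, 7));
        System.out.println(unqualified(3, 7));
    }
}
```

Both calls compile to the same invocation of Math.max; the only difference is how much the source code tells you.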

But the best part is Sun's rationale for putting this "feature" into the language: it's meant to prevent a really dumb "antipattern" (the Constant Interface antipattern, where a class implements an interface just to get unqualified access to its constants). So, basically, they have tried to prevent one antipattern by introducing another. Brilliant.

"Hey, instead of shooting someone with a gun, how about just stabbing them with a knife instead?" What a great idea. It's bad enough when Sun makes poor design decisions in Java. It's even worse when they follow that up with even worse band-aids to the original mistakes.

Here's a better solution: developers should just type out the name of the class when using static methods! Since everyone in Javaland relies on IDEs to do everything anyway, calling a static method is just a series of mouse clicks. At least that way everything stays true to Sun's original intentions for the language.

Tuesday, October 03, 2006

Javadoc: What a Joke

The idea of javadoc has always been pretty cool: a standardized way to document code and then view that documentation. Sun did a pretty good job enabling Java code to be documented in a consistent way.

What Sun didn't do a good job of was producing a useful javadoc tool. The javadoc binary has always been a pretty crappy tool, and today it has shown itself to be useless. The fact that every single class/package had to be listed on the command line in order to be "javadoc'd" made absolutely no sense. Nobody javadocs one file at a time; people javadoc entire projects all at once. Multiple packages, hundreds of files. This is the norm. Yet through over a decade of Java development Sun apparently still has no idea that this is how real Java developers write and document code (maybe this is why they come up with such poor technology specs? Do I need to mention EJBs? JPA is starting to look like garbage, too, a real step BACKWARDS from Hibernate.)

And today, I tried to run javadoc on my entire project, passing in every single Java file since javadoc insists on that, and I got this:

Building index for all the packages and classes...
Generating html/overview-tree.html...
Generating html/index-all.html...
java.lang.ClassCastException: com.sun.tools.javadoc.ClassDocImpl

Are you kidding me?! Is this for real?? A ClassCastException?? Who puts out production code like this? This is on JDK 1.5.0_06. Absolutely ridiculous.

Wednesday, August 09, 2006

Automatic Overriding of Parameterized Methods in Java 5

I thought I was going crazy for a while because I seemed to be getting all kinds of odd behavior, both at compile-time and at run-time, while trying to play with generics. Thanks to the help of the "javap" command, I think I've figured out a key part of how parameterized methods get overridden versus how normal methods get overloaded.

Consider the following two classes:

class CompareAgainst<T>
{
public int compareAgainst ( T a ) { return 1; }
}

public class Main2 extends CompareAgainst<Main2>
{
public static int testClassComparable ( Object[] a )
{
Main2 x = (Main2)a[0];
Object y = a[1];
return x.compareAgainst( y ); // Compiler error!
}

public static void main (String[] argv)
{
Main2 m1 = new Main2( );
Main2 m2 = new Main2( );
Main2[] mArray = new Main2[] { m1, m2 };
System.out.println( testClassComparable( mArray ) );
}
}


This seems to be a pretty straightforward piece of code that demonstrates the correct use of generics. We have a base class CompareAgainst that contains a method compareAgainst() that takes a parameterized type as its argument. Just as we'd expect, when we have Main2 extend CompareAgainst and fill in the parameterized type as Main2, the method compareAgainst(T a) effectively becomes compareAgainst(Main2 a). So in testClassComparable, when we force one of the arguments to be an Object, the compiler checks to see if compareAgainst(Object a) exists; it doesn't, so there's a compiler error:

Main2.java:23: compareAgainst(Main2) in CompareAgainst cannot be applied to (java.lang.Object)

However, if we change the implementation of the testClassComparable method so that instead of casting the first parameter to Main2, it instead casts it to a CompareAgainst instance, surprisingly the code compiles and runs just fine!

public class Main2 extends CompareAgainst<Main2>
{
public static int testClassComparable ( Object[] a )
{
CompareAgainst x = (CompareAgainst)a[0]; // Change cast
Object y = a[1];
return x.compareAgainst( y ); // works fine!
}

public static void main (String[] argv)
{
Main2 m1 = new Main2( );
Main2 m2 = new Main2( );
Main2[] mArray = new Main2[] { m1, m2 };
System.out.println( testClassComparable( mArray ) );
}
}

You might be wondering, as I was, how the compiler suddenly found compareAgainst(Object a) whereas previously it thought it didn't exist. The key is to understand that the compiler can only use the type that we specify. In the first case, we specified that the type of the class was Main2. What methods does Main2 contain? In this first case, the compiler looks at what methods Main2 has and sees only compareAgainst(Main2 a), because we parameterized the compareAgainst(T a) that was defined in the superclass CompareAgainst. So the compiler throws an error.

In the second case though, inside the testClassComparable() method we told the compiler that the type of the object was CompareAgainst. The compiler has no choice but to listen to us and go look at CompareAgainst to see what methods are available. You might be wondering what the compiler sees when it looks at CompareAgainst, since the class defines compareAgainst() to take a parameterized type T. But "javap" will settle that question pretty quickly:

javap CompareAgainst
class CompareAgainst extends java.lang.Object{
public int compareAgainst(java.lang.Object);
}

It looks like compareAgainst takes an Object as its parameter! This is our good old friend type erasure rearing its head. When the compiler compiles the CompareAgainst class down to binary form, it performs the type erasure, leaving just Object as the method parameter. Because we declared x with the raw type CompareAgainst (no type argument), the compiler resolves the call against that erased signature, sees that compareAgainst(Object a) exists, and compiles the code, albeit with an unchecked warning. This code will also run just fine.
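If you would rather confirm the erasure from inside a program than with javap, reflection reports the same thing. This is just a sketch: ErasureDemo and erasedParameterType() are names I made up, and CompareAgainst is repeated here so the snippet stands alone.

```java
import java.lang.reflect.Method;

class CompareAgainst<T> {
    public int compareAgainst(T a) { return 1; }
}

final class ErasureDemo {

    /** Returns the runtime parameter type of the compareAgainst method declared in CompareAgainst. */
    static String erasedParameterType() {
        for (Method m : CompareAgainst.class.getDeclaredMethods()) {
            if (m.getName().equals("compareAgainst")) {
                return m.getParameterTypes()[0].getName();
            }
        }
        throw new IllegalStateException("compareAgainst not found");
    }

    public static void main(String[] args) {
        System.out.println(erasedParameterType()); // prints "java.lang.Object"
    }
}
```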

All of this makes sense so far once you understand type erasure and how the compiler resolves methods in other classes during compilation. What's really surprising though is if you then overload compareAgainst() in Main2:

public class Main2 extends CompareAgainst<Main2>
{
public int compareAgainst ( Main2 m ) { return -1; }

public static int testClassComparable ( Object[] a )
{
CompareAgainst x = (CompareAgainst)a[0];
Object y = a[1];
return x.compareAgainst( y ); // Still looks like CompareAgainst.compareAgainst(Object)
}

public static void main (String[] argv)
{
Main2 m1 = new Main2( );
Main2 m2 = new Main2( );
Main2[] mArray = new Main2[] { m1, m2 };
System.out.println( testClassComparable( mArray ) );
}
}

Here we have explicitly defined a compareAgainst( Main2 a ) method in Main2. We might assume that this is a case of method overloading, since we know that in the base class it's really a compareAgainst(Object o) that's defined. In that case, we know that method overloading is determined at compile-time based on the types that we have told the compiler about. In this case, the compiler still thinks it has a CompareAgainst instance, not a Main2 instance, and the parameter being sent in still looks like an Object, not a Main2 instance, so the compiler should still call the superclass's compareAgainst method, not our newly defined method. If this was regular method overloading, this is exactly the behavior we would get.

However, this is not the way the compiler sees the situation. From the compiler's point of view, when we specialized the base class with Main2, we created a compareAgainst(Main2) method automatically IN THE BASE CLASS. Again, this is not what the byte code says, but rather what the compiler thinks. When we then define our own compareAgainst(Main2) method, the compiler assumes we want to override the superclass's implementation. Thus, it proceeds to hide the superclass's implementation and make our explicitly defined implementation the one that is called. Running the code above confirms that this is what happens: our new compareAgainst(Main2) method is called, not the version in the base class.

But how does the compiler make the overridden method get called when all it knows is that a compareAgainst(Object o) is being called on an instance of a CompareAgainst object? Here is where the compiler does a bit of magic. The compiler understands that even though we have specialized CompareAgainst with Main2, and thus conceptually created a compareAgainst(Main2) in the base class, in reality all that will be left in the base class when it gets compiled is compareAgainst(Object o). So if the compiler wants to enforce that the overridden method is called every time, what it actually has to do is override not compareAgainst(Main2) from the base class, but compareAgainst(Object o)!

If we use "javap -c" to examine the bytecode for Main2, we see that this is exactly what has happened:

public int compareAgainst(java.lang.Object);
Code:
0: aload_0
1: aload_1
2: checkcast #4; //class Main2
5: invokevirtual #9; //Method compareAgainst:(LMain2;)I
8: ireturn

public int compareAgainst(Main2);
Code:
0: iconst_m1
1: ireturn

Here, we see that the compiler has inserted a compareAgainst(Object o) automatically into our Main2 class (the Java Language Specification calls this a bridge method). This causes the compareAgainst(Object o) from the superclass to be overridden. The compiler implements compareAgainst(Object o) so that it simply casts the argument to a Main2, and then calls our compareAgainst(Main2) method.
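The same thing can be checked at runtime. The sketch below repeats the two classes from this post and adds a hypothetical BridgeDemo helper (my naming, not part of the original code) that counts the compiler-generated methods via Method.isBridge() and then makes the call through the raw supertype, the way testClassComparable does.

```java
import java.lang.reflect.Method;

class CompareAgainst<T> {
    public int compareAgainst(T a) { return 1; }
}

class Main2 extends CompareAgainst<Main2> {
    @Override // accepted by the compiler: this overrides compareAgainst(T) with T = Main2
    public int compareAgainst(Main2 m) { return -1; }
}

final class BridgeDemo {

    /** Counts compiler-generated bridge methods named compareAgainst declared directly in Main2. */
    static int bridgeCount() {
        int count = 0;
        for (Method m : Main2.class.getDeclaredMethods()) {
            if (m.getName().equals("compareAgainst") && m.isBridge()) {
                count++;
            }
        }
        return count;
    }

    /** Calls through the raw supertype with an Object argument, as in testClassComparable. */
    @SuppressWarnings({"unchecked", "rawtypes"})
    static int callThroughRawType(Object[] a) {
        CompareAgainst x = (CompareAgainst) a[0];
        return x.compareAgainst(a[1]);
    }

    public static void main(String[] args) {
        System.out.println(bridgeCount());                                                // prints 1
        System.out.println(callThroughRawType(new Main2[] { new Main2(), new Main2() })); // prints -1
    }
}
```

The -1 shows that the call compiled against compareAgainst(Object) ends up in Main2's compareAgainst(Main2), by way of the single bridge that reflection reports.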

Now if we go back and check the code again, we can understand how our new compareAgainst() gets called even though the compiler only thinks it has a handle to a CompareAgainst instance, and even though the parameter to compareAgainst() looks like an Object. During runtime, the JVM actually does invoke compareAgainst(Object), but there is a compareAgainst(Object) defined in Main2, namely the one the compiler defined for us. Thus that version gets called, and that version in turn calls our compareAgainst(Main2) method explicitly. Note that this mechanism will work across multiple levels of inheritance. If you were to create a class called Main3 that extended Main2, and then defined your own compareAgainst(Main2) in Main3, that method would override Main2's compareAgainst(Main2) directly. The compareAgainst(Object) bridge inherited from Main2 makes a virtual call to compareAgainst(Main2), so the Main3 definition is called instead of any of the superclass versions, without the compiler needing to insert another bridge.

This is the exact mechanism that enables Comparable to work in things like Arrays.sort(). Within the implementation of Arrays.sort(Object[]), the method casts each object to an instance of Comparable, and then calls compareTo(Object) on it. But because the compiler has inserted a compareTo(Object) into the definition of every class that implements Comparable, the correct compareTo(T) is called and the sorting algorithm works properly.
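Here is a small sketch of that claim. Version is a hypothetical class invented for this example, and SortDemo is just a helper wrapper. The array is deliberately typed as Object[] so that Arrays.sort(Object[]) only ever sees the elements as Comparable.

```java
import java.util.Arrays;

/** Hypothetical value class, used only for this illustration. */
final class Version implements Comparable<Version> {
    final int number;

    Version(int number) { this.number = number; }

    @Override
    public int compareTo(Version other) { return Integer.compare(number, other.number); }
}

final class SortDemo {

    /** Sorts via Arrays.sort(Object[]), which knows nothing about the Version type. */
    static int[] sortedNumbers(int... numbers) {
        Object[] items = new Object[numbers.length];
        for (int i = 0; i < numbers.length; i++) {
            items[i] = new Version(numbers[i]);
        }
        Arrays.sort(items);
        int[] result = new int[items.length];
        for (int i = 0; i < items.length; i++) {
            result[i] = ((Version) items[i]).number;
        }
        return result;
    }

    /** True if Version declares a compiler-generated compareTo(Object) bridge. */
    static boolean hasCompareToBridge() {
        try {
            return Version.class.getDeclaredMethod("compareTo", Object.class).isBridge();
        } catch (NoSuchMethodException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(sortedNumbers(3, 1, 2))); // prints [1, 2, 3]
        System.out.println(hasCompareToBridge());                    // prints true
    }
}
```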

One last note: This mechanism works because the compiler is attempting to ensure that method overriding works correctly. This does not have anything to do with method overloading. If we define a compareAgainst(Integer) in any of our classes, then we are overloading compareAgainst(), not overriding it. As with any overloaded method, the compiler will only call that method if the types match up at compile-time. So unless we explicitly cast something to Integer and pass it into compareAgainst(Integer), that overloaded method will not be called.
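A quick sketch of that last point, using a standalone hypothetical class (Comparer is my naming and its methods return strings purely so the chosen overload is visible):

```java
class Comparer {
    public String compareAgainst(Object o)  { return "compareAgainst(Object)"; }
    public String compareAgainst(Integer i) { return "compareAgainst(Integer)"; }
}

final class OverloadDemo {

    /** The argument is an Integer at runtime, but the compiler only sees an Object reference. */
    static String viaObjectReference() {
        Object value = Integer.valueOf(42);
        return new Comparer().compareAgainst(value);
    }

    /** With an explicit cast the compile-time type is Integer, so the overload is selected. */
    static String viaExplicitCast() {
        Object value = Integer.valueOf(42);
        return new Comparer().compareAgainst((Integer) value);
    }

    public static void main(String[] args) {
        System.out.println(viaObjectReference()); // prints "compareAgainst(Object)"
        System.out.println(viaExplicitCast());    // prints "compareAgainst(Integer)"
    }
}
```

No bridge is generated here because nothing is being overridden; the choice between the two methods is fixed when the call site is compiled.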

Tuesday, August 08, 2006

Maybe Java Generics Really Are Useless

So in March I thought I had found a pretty cool way to get around type erasure in Java 5 generics. If you overload methods with generic types, the resolution of which overloaded method to call happens at compile-time, so the compiler will hard-code the correct method to call. Or so I thought. This led me to think that I could do something cool like write a base class to handle all the correct operator equals() operations without requiring a runtime type-check. I describe this in detail in my blog posting titled "Overcoming Java 5 Generics Type Erasure with Method Overloading".

In that posting I basically posited that given:

class MyClass<T>
{
public boolean equals( T o ) {}
public boolean equals( Object o) {}
}

MyClass<String> myString = new ...
myString.equals( "someString" );
myString.equals( new Object() );

I thought that I could now assume that the proper equals() would always get called, because the compiler would at compile-time figure out the appropriate equals() and hard-code that into the bytecode, thus avoiding type erasure. In other words, I assumed that the "T" would get replaced with String in the definition of MyClass, and thus when I called equals() the compiler would hard-wire a call to the correct overloaded version at compile-time.

Turns out I was mostly wrong, because the compiler only knows about the correct type in the scope in which the correct type is specified. In other words, if you say "MyClass<String> myclass", then within the immediate scope of 'myclass' the compiler knows what type 'myclass' is and will call the appropriate overloaded method. That's why I got tricked into thinking this would always work.

This led me to think that I could send in MyClass to a generically-typed object, and then when the object called "equals()" the appropriate overloaded version of MyClass's equals() would get called. So if I had:

class GenericClass<T>
{
public void doSomething( T a ) {
a.equals( "someString" );
}
}

// Code snippet
GenericClass<MyClass<String>> o = new ....;
MyClass<String> myclass = new ....;
o.doSomething( myclass );

I thought that within doSomething() the proper equals would get called in this situation, since the compiler would know that myclass is of generic type "String". Unfortunately this is not the case. Once the scope passed into the doSomething method, the compiler "forgot" about the generic type of myclass, and instead just defaulted to "Object". So instead of calling equals(T) the compiler coded in equals(Object).

This makes sense in a way, because the compiler can only see the generic type at the boundary level where it is specified. Once it enters a block of code that treats the code generically it can't keep track of the type anymore, not if it wants to allow the block of code to be pre-compiled and distributed in binary form.

Put another way, the decision of which overloaded method to call is always made at compile-time. So if one block of code can be compiled separately from another block of code, then it's not possible for the compiler to figure out retroactively which overloaded method to call given the constraints of Java 5 generics, so it has to default to the lowest common denominator, which is usually Object.
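Here is a runnable sketch of the effect. It is a simplified stand-in, not the code from my earlier post: MyClass is non-generic here and takes a String overload directly, and the lastCalled field and ScopeDemo helper are additions of mine so the chosen method can be observed.

```java
class MyClass {
    String lastCalled = "none";

    // Overload: only selected where the compiler knows the receiver is a MyClass
    public boolean equals(String s) { lastCalled = "equals(String)"; return false; }

    @Override
    public boolean equals(Object o) { lastCalled = "equals(Object)"; return this == o; }

    @Override
    public int hashCode() { return System.identityHashCode(this); }
}

class GenericClass<T> {
    public void doSomething(T a) {
        // Inside this class T is only known to be an Object, so the call is
        // compiled against equals(Object), whatever T turns out to be.
        a.equals("someString");
    }
}

final class ScopeDemo {

    /** Call site where the static type MyClass is visible. */
    static String calledDirectly() {
        MyClass m = new MyClass();
        m.equals("someString");
        return m.lastCalled;
    }

    /** Same object, same argument, but the call happens inside generic code. */
    static String calledThroughGeneric() {
        MyClass m = new MyClass();
        new GenericClass<MyClass>().doSomething(m);
        return m.lastCalled;
    }

    public static void main(String[] args) {
        System.out.println(calledDirectly());       // prints "equals(String)"
        System.out.println(calledThroughGeneric()); // prints "equals(Object)"
    }
}
```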

Man, Java 5 generics really aren't all that useful...

Wednesday, August 02, 2006

Reflection IS Slow - Eating Major Crow!!

So I'm an idiot, please disregard the last post about reflection being fast. Turns out I switched the names of two critical variables in my benchmark test, and the times got swapped. It's CGLIB that's 4x - 10x faster than reflection, not the other way around!! Don't use reflection unless performance really doesn't matter!!!

Munch, munch ... crow tastes like crap! ... Munch, munch

Tuesday, August 01, 2006

Java Reflection no longer that slow!

[WARNING: Turns out I was wrong about reflection, see the blog post above for my addendum! I'm only leaving the contents here for "historical" reasons! REFLECTION IS SLOW!!!!]

I was in the process of evaluating validation frameworks for Java applications and came across iScreen. Looking at that framework, I liked the fact that it performed a mapping from any given object to a more specific object that the validation required. This eliminated the need to have every object implement some sort of an "IValidatable" interface that would enable some generic validation object to validate it.

For example, if you want to write a generic method that validates all objects that have a start date and an end date, you'd need every object to have a "getStartDate()" and "getEndDate()" method so that this generic method could perform the validation. So you'd create an "IDateValidatable" interface that has a "getStartDate()" and "getEndDate()" method and force any object that wanted to be validated to implement these methods. This is rather clunky because it means that your data access objects might need to implement an interface that they really don't care about at all (most validation happens at the business layer, not the data access layer). Normally, one object needs to be validated in multiple ways, so this same data access object might also need to implement other methods in order to be validatable by other generic methods. This means the one object has to have knowledge of any and all validations that anyone might need to perform on it. Considering that two completely separate groups might be implementing the data access layer and the business layer, coordinating all the validations becomes a mess rather quickly.

iScreen has a different approach. Instead of trying to have interfaces that any and all objects must implement in order to be validated, iScreen allows each generic validator to define an object that contains the methods it needs in order to validate the data, and then iScreen will guarantee that an instance of that object is passed into the validator for validation. For example, a DateValidator might define an object that contains "getStartDate()" and "getEndDate()" methods, and then this object would be what iScreen passes into the DateValidator. iScreen is able to guarantee that this object is passed in because it allows the developer to define a set of rules to map the data from any arbitrary object into an instance of this object. So, if there was an Employee object that had a "getHireDate()" and "getTerminationDate()", iScreen allows the developer to map the "getHireDate()" method to the "getStartDate()" method, and the "getTerminationDate()" method to the "getEndDate()" method. Once that mapping is performed, anytime an Employee is sent in for validation, iScreen automatically creates a new instance of the object that the validator needs and populates it with the data from the Employee object.
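To show the shape of the idea, here is a rough sketch of such a mapping done with plain reflection. To be clear, this is NOT iScreen's or OGNL's API: DateRange, DateRangeMapper, and DateValidator are hypothetical names I made up, and the Employee class is a guess at what such a data object might look like.

```java
import java.lang.reflect.Method;
import java.time.LocalDate;

/** What the (hypothetical) date validator wants to receive. */
final class DateRange {
    final LocalDate startDate;
    final LocalDate endDate;

    DateRange(LocalDate startDate, LocalDate endDate) {
        this.startDate = startDate;
        this.endDate = endDate;
    }
}

/** A data object that knows nothing about validation. */
final class Employee {
    private final LocalDate hireDate;
    private final LocalDate terminationDate;

    Employee(LocalDate hireDate, LocalDate terminationDate) {
        this.hireDate = hireDate;
        this.terminationDate = terminationDate;
    }

    public LocalDate getHireDate() { return hireDate; }
    public LocalDate getTerminationDate() { return terminationDate; }
}

/** Maps any object to a DateRange, given the names of the two getters to read. */
final class DateRangeMapper {
    private final String startGetter;
    private final String endGetter;

    DateRangeMapper(String startGetter, String endGetter) {
        this.startGetter = startGetter;
        this.endGetter = endGetter;
    }

    DateRange map(Object source) {
        return new DateRange(read(source, startGetter), read(source, endGetter));
    }

    private static LocalDate read(Object source, String getterName) {
        try {
            Method getter = source.getClass().getMethod(getterName);
            return (LocalDate) getter.invoke(source);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot read " + getterName, e);
        }
    }
}

/** Generic validator: it only ever sees a DateRange, never an Employee. */
final class DateValidator {
    static boolean isValid(DateRange range) {
        return !range.endDate.isBefore(range.startDate);
    }
}
```

With this in place, new DateRangeMapper("getHireDate", "getTerminationDate") turns an Employee into the DateRange that DateValidator.isValid() expects, and Employee never has to implement a validation interface.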

It's quite a neat way around the fundamental problem of how to enable objects to be used in multiple generic contexts without requiring them to implement any and all interfaces that might be required across all of the layers of the application. Consider for example the need to display objects in a table. There might be a generic table component that always displays an ID, a name, and a description. Without this mapping functionality, it would be necessary for an interface to be created that had "getID()", "getName()", and "getDescription()" methods, and a data object that comes all the way from the back-end data access layer would be required to implement this interface. This would be extremely cumbersome, not to mention a terrible break in the encapsulation of the different layers of the application, since front-end UI concerns would be infecting the objects in the back-end. Instead, if the data object could be automatically mapped to some other object that the front-end UI component required, this would enable the creation of a reusable UI component without requiring the data object to be aware of it. It's really the best of both worlds.

iScreen achieves this mapping capability by using a technology called OGNL. Looking under the hood of OGNL, I was dismayed to see that it appeared to use reflection to perform the mapping. My understanding had always been that reflection was quite slow, so I decided not to use iScreen and OGNL, but instead to write my own mapping tool based on cglib, since it was my understanding that cglib performed byte-code manipulations to achieve faster reflection-like operations. Specifically, I used cglib's FastClass, FastMethod, and Enhancer APIs. (One very frustrating thing about cglib was its lack of documentation. It was obvious to me that there were other APIs that I could be using in order to potentially achieve more speed, but there is almost no documentation for how to use them, so all I could do was use FastClass and FastMethod).

When I was done coding the first part of the data mapping tool and began unit testing it, I decided to do some performance benchmarks to see how much of a speed gain I had achieved using cglib instead of reflection. After all, if the performance difference was minimal, I might as well stop my custom coding and just use OGNL and iScreen. I set up a synthetic microbenchmark that simply called a FastMethod versus a regular reflection Method over and over again and timed the operations. I was quite shocked when I saw the results.






# Iterations    control    reflection    cglib
10,000x         0ms        5ms           21ms
100,000x        4ms        17ms          105ms
1,000,000x      47ms       97ms          971ms


The control was simply calling a regular method without reflection over and over again. While the control was the fastest of all, the most surprising result was that cglib was much, much slower than reflection! Over 10,000 iterations it was about 4x slower, but at 1,000,000 iterations it was almost 10x slower. Now granted, synthetic microbenchmarks are always a little bit of hooey, but these differences aren't small at all. I tried the benchmark both on a Windows XP Pentium 4-class machine and a FreeBSD Pentium M laptop, and in both cases the differences between the two were comparable. So there was basically no question about it: reflection is faster than cglib, at least in the way that I was using cglib, and it didn't make sense to write my own data mapping tool using cglib.
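For reference, a benchmark of this general kind looks roughly like the sketch below. It is not my original test code: ReflectionBench and its method names are made up, it covers only the control and reflection columns (cglib is left out so the snippet has no external dependencies), and like any naive timing loop it is at the mercy of JIT warm-up, so a harness such as JMH would give more trustworthy numbers.

```java
import java.lang.reflect.Method;

final class ReflectionBench {
    private long calls;

    public void tick() { calls++; }

    long calls() { return calls; }

    /** Times `iterations` direct calls to tick(); returns elapsed nanoseconds. */
    static long timeDirect(ReflectionBench target, int iterations) {
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            target.tick();
        }
        return System.nanoTime() - start;
    }

    /** Times `iterations` calls to tick() through Method.invoke; returns elapsed nanoseconds. */
    static long timeReflective(ReflectionBench target, int iterations) {
        try {
            Method tick = ReflectionBench.class.getMethod("tick");
            long start = System.nanoTime();
            for (int i = 0; i < iterations; i++) {
                tick.invoke(target);
            }
            return System.nanoTime() - start;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Converts a total in milliseconds into nanoseconds per call. */
    static double nanosPerCall(long totalMillis, long iterations) {
        return totalMillis * 1_000_000.0 / iterations;
    }

    public static void main(String[] args) {
        int iterations = 1_000_000;
        ReflectionBench target = new ReflectionBench();
        System.out.println("direct ns:     " + timeDirect(target, iterations));
        System.out.println("reflective ns: " + timeReflective(target, iterations));
    }
}
```

The nanosPerCall() helper is there because the per-call cost is the number that actually matters: for example, 17ms over 100,000 calls works out to 170 nanoseconds per call.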

The only question is whether or not it's worth taking the performance hit of using reflection versus regular methods. Personally, when I see that at 100,000 iterations each reflective method call took only 0.00017ms (170 nanoseconds), well below the microsecond threshold, and that the numbers were even better at 1,000,000 iterations (97 nanoseconds per call), I'd say the coding problem that's solved by using reflection is worth the minor cost in performance.