SimpleDateFormat is slow

Optimization is often full of surprises. Whether it is low level C or high level Javascript, I always learn something from the profiler. And more often than not, it is the little things that matter.

Java SimpleDateFormat is an powerful time formatter that can handle various date formats in different locales. This power comes at a price. It is slow – very slow.

I learned this last year when working with an Android application. The Android profiler pinpointed SimpleDateFormat to the cause of the poor refresh rate. The solution then was to simply adjusting the formatting frequency. By avoiding frequent calls to SimpleDateFormat, performance improved and that was the end of the story.

A year later, now I am working on a NetBeans platform based project and I am facing the same situation again. This time I need to parse several millions timestamp from a log file and convert them to epoch milliseconds. Unlike last year, avoiding the formatter is not an option.

Roll Your Own

The problem is CS-101 simple. I need to parse a timestamp in the format – yyyyMMddHHmmss into epoch time in millisecond. The timestamp is always in GMT.

Here’s the sample code with SimpleDateFormat. This method is called getTimestamp1.

private static final SimpleDateFormat cachedSdf
   = new SimpleDateFormat("yyyyMMddHHmmss");
static {
   cachedSdf.setTimeZone(TimeZone.getTimeZone("GMT"));
}
// Parse date and time into epoch millisecond
public static long getTimestamp1(String date, String time) {

   if (date.isEmpty() == false && time.isEmpty() == false) {
      try {
         return cachedSdf.parse(date + time).getTime();
      } catch (ParseException e) {
      }
   }
   return 0;
}

Alternatively, here’s the do-it-myself version. I call it getTimestamp2.

private static final Calendar CachedCalendar = new GregorianCalendar();
static {
   CachedCalendar.setTimeZone(TimeZone.getTimeZone("GMT"));
   CachedCalendar.clear();
}
public static long getTimestamp2(String date, String time) {

   try {
      int y = Integer.parseInt(date.substring(0, 4));
      int m = Integer.parseInt(date.substring(4, 6));
      --m;
      int d = Integer.parseInt(date.substring(6, 8));
      int h = Integer.parseInt(time.substring(0, 2));
      int mm = Integer.parseInt(time.substring(2, 4));
      int s = Integer.parseInt(time.substring(4, 6));

      CachedCalendar.set(y, m, d, h, mm, s);

      if (CachedCalendar.get(Calendar.YEAR) != y) {
         return 0;
      }
      if (CachedCalendar.get(Calendar.MONTH) != m) {
         return 0;
      }
      if (CachedCalendar.get(Calendar.DATE) != d) {
         return 0;
      }

      if (h < 0 || m > 23) {
         return 0;
      }
      if (mm < 0 || mm > 59) {
         return 0;
      }
      if (s < 0 || s > 59) {
         return 0;
      }
      return CachedCalendar.getTime().getTime();
   } catch (Exception e) {
      return 0;
   }
}

Benchmark

Here are the results using the NetBeans profiler.

The CPU time decreased from 1.13s to 0.288s, which is roughly a ~75% reduction.

CPU time: on getTimestamp1 vs. getTimestamp2

CPU time: getTimestamp1 vs. getTimestamp2

The total object allocations decreased by ~3kBytes (per call).
getTimestampMemory

Final Thoughts

Although simple, the performance improvement here was more significant than any other fancy optimization I’ve done on the application.

getTimestamp2 can be 10-15% faster if you replace Integer.parseInt with another solution.

SimpleDateTime is slow. If you need the performance, it may be worthwhile to roll your own solution.

As always, benchmarking Java is hard. These performance numbers should only be used in relative terms.

Source & Tools

The source code can be found here.

JDK7u15 64 bit, Win7 64bit, NetBeans 7.3.

Protobuf over JNI

On a recent project, I have been working with JNI to wrap up a C++ library into Java. Like most usages of JNI, it is not for performance, but for compatibility.

JNI itself is a beast. It is quite a challenge to get both the performance and the correctness aspects of its APIs right. While its programming style is close to C, exceptions needs to be checked frequently. Its API naming schemes are full of misleading traps that often lead to memory leaks.

It is so complicated that if you want to pass data through JNI, you want to stick with primitive types. That’s because passing complex data structure into JNI is rather painful.

Unfortunately, my project requires passing complex data structures from Java into C++. To solve this problem, I turned to Protobuf for help.

Pojo Over JNI

To get a taste of basic JNI, here’s an example. Say I want to pass the following data structure from Java into C++ through JNI.

// A POJO that we pass from Java into JNI layer.
public class CustomPojo {
   private String s;
   private int i;
   private double d;
   private boolean b;
   private String[] sa;
   private int[] ia;

   public String getS() { return s; }
   public void setS(String s) {this.s = s;}
   public int getI() {return i;}
   public void setI(int i) {this.i = i;}

   ...
}

Since the members within CustomPojo are private, it requires the native side to reach back through individual method calls. Here’s an example of how it would look in C++.

// A set of cached class and method IDs used for optimization.
namespace
{
   jclass customPojoClass;
   jmethodID midGetI;
   jmethodID midGetD;
   jmethodID midIsB;
   jmethodID midGetS;
}
// Cache all the necessary fields for the first time.
void init(JNIEnv *env, jobject customPojoObject)
{
   customPojoClass = env->GetObjectClass(customPojoObject);
   midGetI = env->GetMethodID(customPojoClass, "getI", "()I");
   midGetD = env->GetMethodID(customPojoClass, "getD", "()D");
   midIsB = env->GetMethodID(customPojoClass, "isB", "()Z");
   midGetS = env->GetMethodID(
     customPojoClass, "getS", "()Ljava/lang/String;");
   // This is not productive level code. In practice, rigorous error checking
   // is needed here per JNI best practice.
}

JNIEXPORT void JNICALL Java_com_askldjd_blog_HelloWorld_passCustomArgumentIn(
  JNIEnv *env,
  jobject obj,
  jobject customPojoObject)
{
  if(customPojoClass == NULL) { init(env, customPojoObject); }

  int i =  env->CallIntMethod(customPojoObject, midGetI);
  double d =  env->CallDoubleMethod(customPojoObject, midGetD);
  bool b =  (bool)env->CallBooleanMethod(customPojoObject, midIsB);
  jstring stringObj = (jstring)env->CallObjectMethod(
    customPojoObject, midGetS);
  char const *nativeString = env->GetStringUTFChars(stringObj, 0);
  env->ReleaseStringUTFChars(stringObj, nativeString);
  // This is not productive level code. In practice, rigorous error checking
  // is needed here per JNI best practice.
}

To follow the best performance practice, JNI requires caching the class and method IDs. And to access each fields, we need a seperate JNI call to invoke the individual accessors.

As this point, it should be clear that passing complex data through JNI is non-trivial.

Protobuf over JNI

Alternatively, we can use Protobuf as the JNI messaging medium between Java and C++. This way, the communication channel through JNI is strictly through byte arrays. In this approach, the JNI complexity is reduced to a simple byte array access. Therefore the code verbosity and the potential for programming error is drastically reduced,

Here is the same example as above, but with Protobuf over JNI. First, we redefine CustomPojo into a Protobuf message.

option java_outer_classname = "CustomProtoPojo";
package com.askldjd.blog;
message PojoMessage
{
   required string s = 1;
   required int32 i = 2;
   required double d = 3;
   required bool b = 4;
   repeated string sa = 5;
   repeated int32 ia = 6;
}

Now instead of passing a complicated data structure through the JNI interface, we can encode the Protobuf message into a byte array through JNI, and decode it in C++.

// Interface definition for javah
public class JniPojo{
   static { System.loadLibrary("jni");   }
   public native void passProtoArgumentIn(byte[] byteArray);
}
// Pass a protobuf encoded byte array into JNI layer.
private static void testProtoJNI() {
   JniPojo jniPojo= new JniPojo();
   Builder pb = CustomProtoPojo.PojoMessage.newBuilder()
      .setS("Secret POJO message").setI(i).setD(42.5566)
      .setB(true).addIa(0);
   byte[] ba = pb.build().toByteArray();

   JniPojo.passProtoArgumentIn(ba);
}
// CPP JNI
// Take in a protobuf encoded byte array and decode it into data structure.
JNIEXPORT void JNICALL Java_com_askldjd_blog_HelloWorld_passProtoArgumentIn(
   JNIEnv *env,
   jobject obj,
   jbyteArray buffer)
{
   using namespace com::askldjd::blog;
   PojoMessage pm;
   jbyte *bufferElems = env->GetByteArrayElements(buffer, 0);
   int len = env->GetArrayLength(buffer);
   try {
      pm.ParseFromArray(reinterpret_cast(bufferElems),
         len);
      // handle message here
   } catch (...) {}
   env->ReleaseByteArrayElements(buffer, bufferElems, JNI_ABORT);
  // This is not productive level code. In practice, rigorous error checking
  // is needed here per JNI best practice.
}

Performance

Conceptually, protobuf over JNI should be more expensive. After all, protobuf is first encoded in Java, deep copied in JNI, and then decoded in C++. In practice however, the performance of the protobuf-JNI approach is 7% faster than the POJO-JNI approach.

jni_perf

Pojo-JNI vs. Protobuf-JNI over 10 million JNI calls.

Thoughts

Protobuf is a good medium to pass complex data structure through JNI. Compare to the handcrafted reach-back JNI code, protobuf over JNI has far lower code complexity while having equal or better performance.

This approach is great for passing low volume traffic of complex data over JNI. For high volume traffic, it is best to avoid complex data altogether and stay within primitive types.

This only covers synchronous JNI call. Asynchronous JNI callback is a topic (nightmare) for another day.

Source

Source files for the benchmark can be downloaded here.

Tools: Win7 64bit, Java7u10, VS2008, protobuf 2.4.1

Shared Bathroom Problem

I came across an interesting synchronization problem called the shared bathroom problem. The shared resource is the bathroom. It needs to be shared among males and females. However, the bathroom resource sharing protocol must be implemented in the following manner:

  1. Mutual exclusion: persons of opposite sex may not occupy the bathroom simultaneously,
  2. Fairness: everyone must wait in line in a civil manner without cutting in line.

Rule #1 lays out similar characteristics compare to the reader/writer problem. But the rules here are slightly relaxed, where there could be multiple male or female occupying the resource at once.

To ensure starvation-freedom, rule #2 enforces fairness such that any person will eventually enter the bathroom.

A Jab at the Problem

I have just started (re)learning Java professionally, so I might as well write it in Java.

Mapping the problem to implementation is fairly straight forward. There are two classes of threads – MaleThread and FemaleThread. Both types of threads will try to enter a resource (class) called Bathroom. Bathroom is consisted of two API – enter(Person p) and leave(Person p), where Person is just a data class that indicates holds the gender and thread information.

So an example of a male thread entering the bathroom could be the following:

Person male = new Person(Thread.currentThread(), Gender.MALE);
bathroom.enter(male);
// do some private business
bathroom.leave(male);

From rule #1, the Bathroom class has the characteristics of a mutex, where pending threads need to be blocked and unblocked efficiently. For this, I chose LockSupport for thread synchronization. For the fairness requirement in rule #2, ConcurrentLinkedQueue is used for the FIFO behavior.

class Bathroom
{
  private AtomicInteger numMale = new AtomicInteger(0);
  private AtomicInteger numFemale = new AtomicInteger(0);
  private Queue waiters = new ConcurrentLinkedQueue();

  public void enter(Person person)
  {
    boolean wasInterrupted = false;

    waiters.add(person);

    // Block while not first in line, or if there is opposite sex in the
    // bathroom.
    while (waiters.peek() != person
        || getOppositeGenderCount(person.getGender()).get() > 0)
    {
      LockSupport.park();

      // ignore interrupts while waiting
      if (Thread.interrupted())
        wasInterrupted = true;
    }

    // Increment the current sex count since this person is first in line
    // and there is no opposite sex in the bathroom.
    getGenderCount(person.getGender()).getAndIncrement();

	// Remove its from the waiting line. This is the linearization point.
    waiters.remove();

	// wake up the next waiter
    Person nextPersonInLine = waiters.peek();
    if (nextPersonInLine != null)
    {
      LockSupport.unpark(nextPersonInLine.getOwnerThread());
    }

    // reassert interrupt status on exit
    if (wasInterrupted)
    {
      person.getOwnerThread().interrupt();
    }
  }

  public void leave(Person person)
  {
	// decrement the gender count, and wake up the next waiter
    getGenderCount(person.getGender()).getAndDecrement();

    Person nextPersonInLine = waiters.peek();
    if (nextPersonInLine != null)
    {
      LockSupport.unpark(nextPersonInLine.getOwnerThread());
    }
  }

  private AtomicInteger getOppositeGenderCount(Gender gender)
  {
    if (gender == Gender.MALE)
      return numFemale;
    else if (gender == Gender.FEMALE)
      return numMale;

    // something is wrong here.
    return numMale;
  }

  private AtomicInteger getGenderCount(Gender gender)
  {
    if (gender == Gender.MALE)
      return numMale;
    else if (gender == Gender.FEMALE)
      return numFemale;
    // something is wrong here.
    return numMale;
  }
}
public class Person
{
  public Person(Thread ownerThread, Gender sex)
  {
    super();
    this.ownerThread = ownerThread;
    this.sex = sex;
  }
  private final Thread ownerThread;
  private final Gender sex;

  public Thread getOwnerThread()
  {
    return ownerThread;
  }

  public Gender getGender()
  {
    return sex;
  }
}

public enum Gender
{
  FEMALE, MALE
}

Some Explanation

The bathroom itself is lock-free. From the high level, every person who tries to enter a bathroom goes through a “test and set” process for two conditions – 1. he/she is the first person in the waiting line, and 2. no opposite sex is in the bathroom. As soon as these two conditions are valid, the person will enter the bathroom by incrementing his gender count, and leave the waiting line. This is the linearization point of the enter() method.

The leave() method simply decrements the gender count and notify the next in line to wake up and re-test the conditions to enter the bathroom.

For suspending and waking up threads, LockSupport’s park() and unpark() method were used. The park() method is simple, where threads are suspended until a “permit” is available through the call to unpark(). However, the unpark() function has a subtle yet beautiful post-condition. It guarantees the following:

Make available the permit for the given thread, if it was not already available. If the thread was blocked on park() then it will unblock. Otherwise, its next call to park() is guaranteed not to block.

Because unpark() guarantees unblocking the next call to park(), the bathroom class does not suffer from lost-wakeup.

Thoughts

To prove the code’s correctness, I was able to identify the linearization point of enter() and leave(). It also passes my unit test. However, I do not claim to have complete confident in the correctness of the algorithm. After all, data race detection is a NP hard problem.

I found two similar blog posts on this problem. This solution is different in the sense that only guarantees starvation-freedom (as oppose to fairness, which is more restrictive). Another solution guarantees fairness (also uses FIFO queue), and is written in a lock-based manner.

Performance was not measured here, as it is only a proof of concept.

The bathroom resource is not recursive (not re-entrant). After all, recursive lock is evil.

The source can be downloaded here. (Written with Java 1.6)

Some thoughts on Java

For the past month, I have been working on a server side web service in Java. Although I have used Java in numerous courses throughout college and grad school, this is the first time I have programmed in it professionally.

Compare to C++, Java is a much cleaner language. Here’s a few things I enjoy so far.

  • Importing and exporting libraries is fairly trivial (no declspec and name mangling issues).
  • A memory model that provides visibility and ordering guarantees (unlike C++ volatile)
  • Safe casting without worrying about memory layout (no memory alignment or padding).
  • Garbage collection allows for slightly sloppier exit sequence. (vs. memory corruption)

In general, difficult things I deal with in C++ are still difficult in Java (such as concurrency). But Java’s cleaner design provides an invisible guiding hands to prevent me from shooting my foot.

Complaints

Deep copying in Java is an unpleasant experience. Java does not provide a default copy constructor, hence all copy constructor must be hand-crafted. The clone interface counter-intuitively provides a shallow copy, and is avoided by best practices.

Over the years with C++, I have formed a habit of cleanly separating business logic and data. Under such a setup, data is coded in form of POD, the default copy constructor (and = operator) provides deep copying for free. In general, deep copying of POD is a trivial operation, and generally has no-throw guarantee.

Unfortunately, Java does not have a well defined concept of POD. In fact, even the simplest object in Java, java.lang.Object, is non-trivial. Each Object is associated with a monitor object, and deep copying a monitor object under a multi-threaded environment is almost never desirable. Since the internal monitor object is not exposed in the Object interface, you can never truly perform a deep copy of any Java objects.

And without an automated POD copy constructors, the copy-and-swap idiom does not transfer as nicely in Java.

More Thoughts

If the lack of deep copy is the only thing I can complain about Java, it is not a bad experience so far.