Tuesday 14 October 2014

My take on "Don't Let Hibernate Steal Your Identity"

Some of my favorite interview questions for Java developers center around equals() and hashCode(). I usually start by asking:

What are the purposes of equals() and hashCode()?

Every Java programmer MUST know the answer; it's one of the most fundamental features of the language. Surprisingly, many of the people I interviewed couldn't give a straight answer.

Here are a couple of myths about equals() and hashCode():

1. "equals() is used to check whether two objects are equal". Not quite right. The equals() method decides whether two objects are equal according to a predefined set of business rules. Imagine I forged my license plate to match yours and got caught speeding on camera. Chances are, you're the one who's gonna get a ticket in the mail, because in the eyes of the infringement system the two cars are equal, even though to everybody else they look totally different.

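2. "hashCode() is used by equals()". I don't know why people say this; I guess it's because they don't realize hashCode() does NOT return a unique number. Two unequal objects can share a hash code, so a hash code alone can never prove equality. The relationship actually runs the other way: objects that are equal must return the same hash code, and hash-based collections such as HashMap and HashSet call hashCode() first to find a bucket, then equals() to find the matching element.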
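Both myths can be checked in a few lines. This is only an illustrative sketch: the Car class and its fields are hypothetical, chosen to mirror the license-plate analogy, where the "business rule" is that a car is identified by its plate alone. The second half uses the well-known collision between the strings "Aa" and "BB" to show that hash codes are not unique.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

// Hypothetical class: the business rule says a car is identified by its plate alone.
final class Car {
    private final String plate;
    private final String make;
    private final String color;

    Car(String plate, String make, String color) {
        this.plate = Objects.requireNonNull(plate);
        this.make = make;
        this.color = color;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof Car)) return false;
        // Only the plate takes part in equality; make and color are ignored.
        return plate.equals(((Car) o).plate);
    }

    @Override
    public int hashCode() {
        // Equal cars must return equal hash codes, so hash the same field equals() uses.
        return plate.hashCode();
    }
}

class EqualityMyths {
    public static void main(String[] args) {
        // Myth 1: equals() follows business rules, not "sameness" in general.
        Car mine = new Car("ABC123", "Toyota", "red");
        Car forged = new Car("ABC123", "Ford", "blue");
        System.out.println(mine.equals(forged)); // true

        // Myth 2: hash codes are not unique. "Aa" and "BB" both hash to 2112
        // (65 * 31 + 97 and 66 * 31 + 66), yet the strings are not equal.
        System.out.println("Aa".hashCode() == "BB".hashCode()); // true
        System.out.println("Aa".equals("BB"));                  // false

        // A HashSet uses hashCode() to find a bucket and equals() to resolve
        // collisions, so both strings are kept.
        Set<String> strings = new HashSet<>();
        strings.add("Aa");
        strings.add("BB");
        System.out.println(strings.size()); // 2
    }
}
```

Note that hashCode() in Car deliberately uses the same field as equals(); hashing make or color as well would break the contract, because two "equal" cars could then land in different buckets.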

To follow on, I'd ask:

Given the following entity class, how would you implement equals() and hashCode()?
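
import javax.persistence.Entity;
import javax.persistence.GeneratedValue;
import javax.persistence.Id;

@Entity
public class Customer {
    @Id @GeneratedValue
    protected Long id;

    protected String firstName;
    protected String lastName;
}
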
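A naive programmer would start talking about using different combinations of id, firstName and lastName to implement both methods. That's because they haven't read the O'Reilly article "Don't Let Hibernate Steal Your Identity" or JBoss's own article on the topic. The trouble is that id is null until the entity is persisted, while firstName and lastName are neither unique nor immutable, so none of them makes a stable identity.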

Sadly, many developers I've met don't see what the fuss is about. One even argued the O'Reilly article is outdated. I guess that's because they haven't encountered production issues such as:
  • An entity goes missing after being inserted into a set, or
  • Two copies of the same entity, loaded from two different EntityManagers, return false when compared with equals(), or
  • The same entity returns two different hash codes before and after being persisted.
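These symptoms can be reproduced without a database. The sketch below rests on a few stated assumptions: NaiveCustomer and PlainCustomer are hypothetical stand-ins for the Customer entity with the JPA annotations left out, persisting is simulated by assigning the id by hand (roughly what @GeneratedValue does on insert), and loading through two EntityManagers is simulated by constructing two separate instances for the same row.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

// Hypothetical stand-in for Customer with a naive, id-based equals()/hashCode().
class NaiveCustomer {
    Long id;          // null until the (simulated) persist
    String firstName;
    String lastName;

    NaiveCustomer(String firstName, String lastName) {
        this.firstName = firstName;
        this.lastName = lastName;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof NaiveCustomer)) return false;
        NaiveCustomer other = (NaiveCustomer) o;
        return id != null && id.equals(other.id);
    }

    @Override
    public int hashCode() {
        return Objects.hashCode(id); // 0 while id is null, something else afterwards
    }
}

// Hypothetical stand-in with no overrides at all: Object's identity comparison applies.
class PlainCustomer {
    final Long id;

    PlainCustomer(Long id) {
        this.id = id;
    }
}

class IdentityPitfalls {
    public static void main(String[] args) {
        // Symptoms 1 and 3: the hash code changes on persist, so the set loses the entity.
        NaiveCustomer customer = new NaiveCustomer("Jane", "Doe");
        Set<NaiveCustomer> customers = new HashSet<>();
        customers.add(customer);   // stored under hash code 0
        customer.id = 42L;         // simulate the id being assigned on persist
        System.out.println(customers.contains(customer)); // false
        System.out.println(customers.size());             // 1

        // Symptom 2: the same row loaded twice gives two instances that are not equal.
        PlainCustomer fromFirstEm = new PlainCustomer(1L);
        PlainCustomer fromSecondEm = new PlainCustomer(1L);
        System.out.println(fromFirstEm.equals(fromSecondEm)); // false
    }
}
```

The entity is still inside the set (size() reports 1), but contains() looks in the bucket for the new hash code and never finds it.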

Worse still, many developers know about the danger but resist adding a business key, thinking it's too long and too slow to index, and instead start exposing primary keys to external systems.

Bad idea.

Once your primary keys are exposed to an external system, it becomes very difficult to move the data to a different schema, because IDs may clash. Also, a primary key is just a number, so it's hard to tell a customer with id=1 from an administrator with id=1. It's very error-prone.

This is where I think we need to improve on the O'Reilly article's original suggestion of using a UUID, to make business keys more "palatable".
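For reference, here is a minimal sketch of the UUID idea as I understand it, not the article's exact code: the object gets a UUID the moment it is constructed, long before the database assigns a primary key, and equals() and hashCode() use only that value. UuidCustomer and simulatePersist() are hypothetical names, and the JPA annotations are omitted; in a real mapping the uuid field would be a unique, non-updatable column.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.UUID;

// Hypothetical sketch of the UUID approach (JPA annotations omitted).
class UuidCustomer {
    private Long id; // database key: stays null until the entity is persisted

    // Assigned when the object is created, so identity exists before the database
    // is involved. Not final, because JPA does not allow final persistent fields.
    private String uuid = UUID.randomUUID().toString();

    public String getUuid() {
        return uuid;
    }

    // Stand-in for the persistence provider assigning the id on insert.
    void simulatePersist(Long generatedId) {
        this.id = generatedId;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof UuidCustomer)) return false;
        return getUuid().equals(((UuidCustomer) o).getUuid());
    }

    @Override
    public int hashCode() {
        return getUuid().hashCode(); // unaffected by the id being assigned later
    }

    public static void main(String[] args) {
        UuidCustomer customer = new UuidCustomer();
        Set<UuidCustomer> customers = new HashSet<>();
        customers.add(customer);
        customer.simulatePersist(42L);
        System.out.println(customers.contains(customer)); // true
        System.out.println(customer.getUuid().length());  // 36
    }
}
```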

A UUID is great for machines but terrible for humans, and it's not sequential. For an impatient user, the last thing he or she wants is to repeat a random 36-character ID just to get anything done.

Fortunately, many people have thought about the same problem and come up with viable alternatives. One of my favorites is Twitter's Snowflake, because it's not tied to a database and it generates (almost) sequential IDs, which makes them easy to sort and to query by range.
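Snowflake-style IDs sort by creation time because the timestamp occupies the most significant bits. The sketch below is my own simplified Java version, not Twitter's code, following the layout commonly described for Snowflake: 41 bits of milliseconds since a custom epoch, 10 bits of worker id and 12 bits of per-millisecond sequence. The epoch constant (2014-01-01 UTC) and the class name are arbitrary assumptions.

```java
// Simplified Snowflake-style generator (a sketch, not Twitter's implementation).
// Layout: 41-bit timestamp | 10-bit worker id | 12-bit sequence.
final class SnowflakeIdGenerator {
    private static final long CUSTOM_EPOCH = 1388534400000L; // assumed: 2014-01-01T00:00:00Z
    private static final int WORKER_BITS = 10;
    private static final int SEQUENCE_BITS = 12;
    private static final long MAX_WORKER_ID = (1L << WORKER_BITS) - 1;   // 1023
    private static final long SEQUENCE_MASK = (1L << SEQUENCE_BITS) - 1; // 4095

    private final long workerId;
    private long lastTimestamp = -1L;
    private long sequence = 0L;

    SnowflakeIdGenerator(long workerId) {
        if (workerId < 0 || workerId > MAX_WORKER_ID) {
            throw new IllegalArgumentException("workerId must be between 0 and " + MAX_WORKER_ID);
        }
        this.workerId = workerId;
    }

    synchronized long nextId() {
        long now = System.currentTimeMillis();
        if (now < lastTimestamp) {
            // Refuse rather than risk handing out a duplicate ID.
            throw new IllegalStateException("Clock moved backwards");
        }
        if (now == lastTimestamp) {
            sequence = (sequence + 1) & SEQUENCE_MASK;
            if (sequence == 0) {
                // All 4096 IDs for this millisecond are used: wait for the next one.
                while (now <= lastTimestamp) {
                    now = System.currentTimeMillis();
                }
            }
        } else {
            sequence = 0L;
        }
        lastTimestamp = now;
        return ((now - CUSTOM_EPOCH) << (WORKER_BITS + SEQUENCE_BITS))
                | (workerId << SEQUENCE_BITS)
                | sequence;
    }

    public static void main(String[] args) {
        SnowflakeIdGenerator generator = new SnowflakeIdGenerator(1);
        long first = generator.nextId();
        long second = generator.nextId();
        System.out.println(first < second); // true
    }
}
```

Because the worker id is baked into every ID, each node needs its own workerId; no database round trip is involved, which is the property mentioned above.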

In addition to the unique number, we should always prefix it with an appropriate abbreviation, as in "CUST00401" or "ITEM00822", so that the context of the ID is clear at a glance.
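A minimal sketch of building such keys follows. BusinessKeys is a hypothetical helper, and the rule that a prefix is 2 to 8 upper-case letters is my own assumption for illustration. The %05d pattern only sets a minimum width, so a long Snowflake-style number is printed in full.

```java
// Hypothetical helper: prefix + zero-padded number, as in "CUST00401".
final class BusinessKeys {
    private BusinessKeys() {}

    static String format(String prefix, long number) {
        // Assumed convention: 2 to 8 upper-case letters identify the context.
        if (prefix == null || !prefix.matches("[A-Z]{2,8}")) {
            throw new IllegalArgumentException("Prefix must be 2-8 upper-case letters: " + prefix);
        }
        if (number < 0) {
            throw new IllegalArgumentException("Number must not be negative: " + number);
        }
        // At least five digits, left-padded with zeros; longer numbers are not truncated.
        return String.format("%s%05d", prefix, number);
    }

    public static void main(String[] args) {
        System.out.println(format("CUST", 401)); // CUST00401
        System.out.println(format("ITEM", 822)); // ITEM00822
    }
}
```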

A business key like this is also extremely useful for logging: instead of searching for a bare number or a meaningless UUID, you can easily grep the logs for a unique business key.

But don't go too far. Don't ditch your numeric primary keys in favor of business keys, because JPA and the RDBMS still work best with compact numeric primary keys for entity lookups, joins and indexing. Comparing numeric keys is generally faster than comparing text-based keys, and they take up less space in indexes.
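Putting it together, here is a minimal sketch of a Customer that keeps its numeric primary key for the database but bases equals() and hashCode() on a prefixed business key. The customerKey field, the accessor names, simulatePersist() and the @Column mapping in the comments are assumptions for illustration; the JPA annotations are commented out so the example compiles without a persistence provider.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

// @Entity  (JPA annotations commented out so the sketch compiles on plain Java)
class Customer {
    // @Id @GeneratedValue
    private Long id;            // numeric surrogate key: joins, foreign keys, indexes

    // @Column(unique = true, nullable = false, updatable = false)
    private String customerKey; // business key such as "CUST00401": identity in Java

    private String firstName;
    private String lastName;

    protected Customer() {}     // no-arg constructor required by JPA

    public Customer(String customerKey, String firstName, String lastName) {
        this.customerKey = Objects.requireNonNull(customerKey);
        this.firstName = firstName;
        this.lastName = lastName;
    }

    public Long getId() { return id; }

    public String getCustomerKey() { return customerKey; }

    // Stand-in for the persistence provider assigning the id on insert.
    void simulatePersist(Long generatedId) { this.id = generatedId; }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof Customer)) return false;
        // Go through the getter so the comparison also works on lazy proxies.
        return Objects.equals(getCustomerKey(), ((Customer) o).getCustomerKey());
    }

    @Override
    public int hashCode() {
        // Depends only on the business key, so it is the same before and after persist.
        return Objects.hashCode(getCustomerKey());
    }

    public static void main(String[] args) {
        Customer customer = new Customer("CUST00401", "Jane", "Doe");
        Set<Customer> customers = new HashSet<>();
        customers.add(customer);
        customer.simulatePersist(1L);
        System.out.println(customers.contains(customer)); // true

        // Same row, different instance (e.g. loaded through another EntityManager).
        Customer copy = new Customer("CUST00401", "Jane", "Doe");
        copy.simulatePersist(1L);
        System.out.println(customer.equals(copy)); // true
    }
}
```

equals() reads the key through getCustomerKey() rather than the field because a Hibernate lazy proxy is a generated subclass whose own fields are not populated; only its methods delegate to the real entity.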
