Friday, 2 November 2018

IT infrastructure is debt, not capex

When it comes to comparing the cost of cloud vs on-prem, many people still believe the cloud is more expensive, for reasons ranging from unexpired hardware leases, to past investments being sunk costs, to maintenance being an unavoidable engineering cost.

The conventional thinking treats infrastructure as capital expenditure, a cost of doing business, just as bakers need ovens to make bread. But the advent of cloud technology changes that equation dramatically. Yes, a baker still needs ovens to make bread, but they certainly don't need a milling machine to make flour or heavy machinery to make baking pans. Other people have already spent millions of dollars and can do those jobs better.

In fact, in the age of cloud computing, I'd argue that infrastructure is not just expenditure, it is debt. For every dollar you spend on infrastructure, you're obligated to service it: hire and train people to maintain it, and constantly pamper it like a pet to make sure there's no outage or downtime. By the time your operation has smoothed out, your once shiny state-of-the-art infrastructure is a thing of the past, and something newer and shinier is available.

What's more, these are just the visible costs. The devil hides in the details: many people will resist change, hold on to existing infrastructure, and make engineers work around it. This not only impedes speed, performance and innovation, it takes a toll on the people as well. That's why I believe calculating the true cost of infrastructure is an art, not a science, since some of these costs can never be truly measured.

On-prem infrastructure had a great run in the nineties and early 2000s because the IT industry was exploding while internet connections were slow and unreliable. Every company had to set up servers, networks, operating systems and admin tools in order to do business. Nowadays, the internet lets us stay connected to any machine anywhere in the world, and infrastructure has become an industry of its own, just like the factories on the other side of the world. IaaS providers certainly don't treat infrastructure as debt, because it is how they make money; they are willing to invest ever more money in it while we pay a fair price to reap the rewards.

So, next time before you buy anything new, think about whether it is a cost or a debt. It may surprise you how much it changes the way you view the world.

Thursday, 12 February 2015

Spring Boot / MongoDB / Heroku

With the advent of embedded containers, the table has finally turned in favor of the application. Java EE apps no longer depend on a container's settings for memory allocation and class loading; apps can now decide how, where and when to deploy.

Spring Boot takes it one step further by being "opinionated". By selecting sensible default bootstrap behaviors and dependencies such as Tomcat, Jackson, Hibernate Validator, Logback and SLF4J, it makes a developer's life easier and allows them to focus on writing code instead of XML.

So, for the back-end service of my next mobile app, I decided to use Spring Boot and MongoDB and deploy them on a PaaS. The first step was to choose a PaaS provider, and Heroku made it a very easy decision because it's:
  • Popular, with lots of Q&A and how-to's on the web
  • Easy to set up and use
  • Supported by plenty of plugins
  • Free to sign up and free to run small-scale apps
  • Able to run Java 6, 7 and 8
In comparison, Engine Yard requires you to install the JVM yourself, and Google App Engine is still tied to Servlet 2.5, which is incompatible with Spring Boot. See paasify.it for more comparisons.

Setting up Heroku was a breeze. I created an account and went through the steps described in its dev center and I had Hello World up and running in less than half an hour.

To push a Spring Boot project to Heroku was equally easy. I'd recommend reading Spring's blog, and Spring Boot reference docs here and here.

My Project

To make life easier, I've created a GitHub project with all the necessary bits and pieces needed to run Spring Boot and MongoDB on Heroku. It should serve as a good starting point for anyone who wants to use the same stack.

Here are the steps I took to create this project:

1. Create a pom.xml file. This is where we rely on Spring Boot to manage dependencies and plugins. Only include additional dependencies if they're not already provided by spring-boot-starter-web, and make sure to avoid version conflicts. spring-boot-maven-plugin provides convenient Spring Boot commands in Maven:
 <parent>
  <groupId>org.springframework.boot</groupId>
  <artifactId>spring-boot-starter-parent</artifactId>
  <version>1.2.1.RELEASE</version>
 </parent>


 <dependencies>
  <!-- ======== Spring Boot ======== -->
  <dependency>
   <groupId>org.springframework.boot</groupId>
   <artifactId>spring-boot-starter-web</artifactId>
  </dependency>

  <dependency>
   <groupId>org.springframework.boot</groupId>
   <artifactId>spring-boot-starter-actuator</artifactId>
  </dependency>

  <!-- ========= Spring Data ========= -->
  <dependency>
   <groupId>org.springframework.data</groupId>
   <artifactId>spring-data-mongodb</artifactId>
  </dependency>

  <!-- ========= Testing ======== -->
  <dependency>
   <groupId>org.springframework.boot</groupId>
   <artifactId>spring-boot-starter-test</artifactId>
   <scope>test</scope>
  </dependency>

 </dependencies>


 <build>
  <plugins>
   <plugin>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-maven-plugin</artifactId>
   </plugin>
  </plugins>
 </build> 
 
2. Create an executable class. I personally do not like to mix this class with the configuration class, so I use @Import to link the two together:
@EnableAutoConfiguration(exclude={DataSourceAutoConfiguration.class})
@Import(AppContext.class)
public class Application {

    public static void main(String[] args) {
        SpringApplication.run(Application.class, args);
    }

}
 
3. Create a configuration class. The Spring Boot documentation recommends configuration classes over XML, and there's no denying the world is moving further away from XML every day:
@Configuration
@EnableWebMvc
@EnableAsync
@ComponentScan(basePackages="com.jackfluid")
@EnableMongoRepositories("com.jackfluid.repo")
public class AppContext extends WebMvcConfigurerAdapter {

    @Override
    public void addResourceHandlers(ResourceHandlerRegistry registry) {
        registry.addResourceHandler("/static/**").addResourceLocations("/static/");
    }
    
    @Override
    public void configureDefaultServletHandling(DefaultServletHandlerConfigurer configurer) {
        configurer.enable();
    }

}
 
4. Create a controller. In my case, since this is just a REST back-end, I settled for a very simple home page.
@RestController
public class HomeController {
 @RequestMapping("/")
 public String index() {
  return "Spring Boot on Heroku";
 }
}
 
5. Create a Procfile next to pom.xml. This file contains instructions to Heroku on how to launch the application. The important things here are to declare the app as a web app and to pass server.port as a system property to the code, because Heroku may assign a different port number each time it starts the app. Additional JAVA_OPTS can also be defined here.
web: java -Dserver.port=$PORT -jar target/spring-boot-heroku-1.0.0-SNAPSHOT.jar

6. Create a system.properties file next to pom.xml. This file tells Heroku what the runtime environment should be (here, the Java version).
java.runtime.version=1.8

7. Create a MongoDB repository. My sample app aims to read live Twitter feeds and persist them as they stream in, so I simply need a repo for the tweets:
public interface TweetRepo extends MongoRepository<Status, String> {

}

8. Create an application.properties file under /src/main/resources. As long as this file is on the classpath, Spring Boot will pick it up and make the settings available to our code through @Value. Spring Boot also recognizes a special property called "spring.data.mongodb.uri" and uses it as the connection string to the MongoDB instance. If this property is undefined, Spring Boot will try to connect to MongoDB on localhost. Since I installed MongoLab as a plugin for my Heroku instance, I'll connect to that instance instead.
# look for MONGOLAB_URI config var in Heroku
spring.data.mongodb.uri=mongodb://<username>:<password>@<hostname>:<port>/<database_name>

9. Create the entity classes and controller classes. In this project, I've created a TweetFeederController that accepts API calls to start reading the live Twitter feed and sends the tweets to any given URL using HTTP POST. By default, you can send the tweets to the second controller, TweetController, and let it persist them to MongoDB (a rough sketch of that controller follows the sample payload below). Sample request payloads can be found under /test/resources.
{
 "searchTerm": "${searchTerm}",
 "maxResults": 10,
 "maxTotDuration": 10,
 "sendTo": "http://localhost:${port}/tweet"
}
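
To make the flow easier to follow, here's a minimal sketch of what the persisting side might look like. The mapping path matches the sendTo URL in the payload above, but the method name and the assumption that Status deserializes cleanly from the posted JSON are mine, not necessarily how the actual project does it:
@RestController
public class TweetController {

    @Autowired
    private TweetRepo tweetRepo;

    // Receives a tweet pushed by TweetFeederController (the sendTo URL above)
    // and persists it to MongoDB via the repository from step 7.
    // Assumes Status is a concrete class Jackson can deserialize; the real
    // project may map the incoming payload differently.
    @RequestMapping(value = "/tweet", method = RequestMethod.POST)
    public void saveTweet(@RequestBody Status tweet) {
        tweetRepo.save(tweet);
    }
}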

10. Create test classes. Here I used mostly integration tests, which is another advantage of using Spring Boot: we're no longer limited to MockMVC or dependent on web.xml and a running Tomcat for testing, we can actually perform end-to-end testing completely inside Spring.
 @Test
 public void testPopularTwitterTerm() {
  String json = super.fileResourceToString(tweetFeederFilterTestResource).replace("${port}", port+"")
    .replace("${searchTerm}", "obama");
  testTweetFeeder(json);
 }
 protected void testTweetFeeder(String json){
  HttpHeaders headers = new HttpHeaders();
  headers.setContentType(MediaType.APPLICATION_JSON);
  HttpEntity<String> requestEntity = new HttpEntity<String>(json, headers);
  ResponseEntity<Object> response = template.exchange(tweetFeederUrl, HttpMethod.POST, requestEntity, Object.class);
  assertThat(response.getStatusCode(), equalTo(HttpStatus.OK));
 }
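
The test above extends a small base class that isn't shown here. A minimal sketch of what such a base class could look like on Spring Boot 1.2.x is below; the class name, the Resource-typed parameter and the field names are my assumptions, not the project's actual code:
@RunWith(SpringJUnit4ClassRunner.class)
@SpringApplicationConfiguration(classes = Application.class)
@WebAppConfiguration
@IntegrationTest("server.port:0")   // start the embedded Tomcat on a random free port
public abstract class AbstractIntegrationTest {

    // Spring Boot publishes the port it actually picked as local.server.port
    @Value("${local.server.port}")
    protected int port;

    // org.springframework.boot.test.TestRestTemplate, a RestTemplate tuned for tests
    protected final RestTemplate template = new TestRestTemplate();

    // reads a classpath resource (e.g. a JSON payload under /test/resources) into a String
    protected String fileResourceToString(Resource resource) {
        try (InputStream in = resource.getInputStream()) {
            return StreamUtils.copyToString(in, StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }
}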

11. Rename /src/main/resources/system.properties.template to system.properties. Update the values with the actual MongoDB connection string and your Twitter developer credentials.
# look for MONGOLAB_URI config var in Heroku
spring.data.mongodb.uri=mongodb://<username>:<password>@<hostname>:<port>/<database_name>

# twitter api credentials
twitter.api.consumerKey=<your_twitter_api_consumerKey>
twitter.api.consumerSecret=<your_twitter_api_consumerSecret>
twitter.api.token=<your_twitter_api_token>
twitter.api.secret=<your_twitter_api_secret>

12. Push this project to Heroku and watch how it works:
# create a new Heroku instance and repository.
$ heroku create

# push all the local changes to Heroku's Git repository. This will kick off a build and deploy process
$ git push heroku master

# optionally, start a Heroku instance, or two.
$ heroku ps:scale web=1

# open the home page of the application
$ heroku open

# watch Heroku's log, live
$ heroku logs --tail

The reason I chose to consume the live Twitter feed is that it's a good load-testing substitute for both this application and the MongoLab plugin on Heroku. The Twitter sample stream comes in at approximately 8 requests per second, each about 5KB (roughly 40KB per second, or around 140MB per hour), so it doesn't take long to build up a database of hundreds of megabytes.
Much to my surprise, even with just one instance running, the app was more than capable of handling the live streaming sample data from Twitter.

Tuesday, 14 October 2014

My take on "Don't Let Hibernate Steal Your Identity"

Some of my favorite interview questions for a Java developer center on equals() and hashCode(). I usually start by asking:

What are the purposes of equals() and hashCode()?

Every Java programmer MUST know the answer; it's one of the most fundamental features of the language. Surprisingly, lots of people I interviewed couldn't give a straight answer.

Here are a couple of myths about equals() and hashCode():

1. "Equals is used to compare if two object are equal". Not quite right. Equals compares if two objects are equal within a predefined set of business rules. Imagine if I forge my license plate to match yours and was caught speeding on camera. Chances are, you're the one who's gonna get a ticket in mail because in the eyes of the infringement system, the two cars are equal even though to everybody else they look totally different.

2. "hashcode() is used by equals()". I don't know why people would say this, I guess it's because they didn't realize hashCode() does NOT return a unique number.

To follow on, I'd ask:

Given the following entity class, how would you implement equals() and hashCode()?
@Entity
public class Customer {
    @Id @GeneratedValue
    protected Long id;

    protected String firstName;
    protected String lastName;
}
A naive programmer would start talking about using different combinations of id, firstName and lastName to implement both methods. That's because they have not read the O'Reilly article Don't Let Hibernate Steal Your Identity or JBoss' own article on this topic.

Sadly, many developers I've met don't see what the fuss is about. One even argued the O'Reilly article is outdated. I guess that's because they haven't encountered production issues such as:
  • An entity goes missing after being inserted into a set, or 
  • Loading the same entity from two different EntityManagers returns false when equals() is called, or 
  • The same entity returns two different hash codes before and after being persisted.

Worse still, many developers know about the danger but still resist the notion of adding a business key, thinking it's too long and too slow to index, and start exposing primary keys to external systems instead.

Bad idea.

Once your primary keys are exposed to an external system, it becomes very difficult to move the data to a different schema because IDs may clash. Also, a primary key is just a number; it's hard to tell a customer with id=1 apart from an administrator with id=1. It's very error-prone.

This is where I think we need to improve on O'Reilly's original suggestion of using a UUID in order to make business keys more "palatable".

A UUID is great for machines, but terrible for human consumption, and it's not sequential. The last thing an impatient user wants to do is read out a random 36-character ID in order to get anything done.

Fortunately, many people have thought about the same problem and come up with viable alternatives. One of my favorites is Twitter's Snowflake, because it's not tied to a database and it generates (almost) sequential IDs, making them easy to sort and to define a range over.

In addition to a unique number, we should also prefix it with an appropriate abbreviation, such as "CUST00401" or "ITEM00822", to clearly identify the context of the unique ID.

A business key like this is also extremely useful for logging: instead of looking for a plain numeric value or a meaningless UUID, you can easily grep for a unique business key in the logs.

But don't go too far. Don't ditch your numeric primary keys and replace them with business keys, because JPA and the RDBMS still need numeric primary keys for entity comparison and indexing; comparing numeric keys is always faster than comparing text-based keys.
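
To make this concrete, here's a minimal sketch (class and field names are illustrative, not taken from the articles above): keep the generated numeric id as the primary key, and base equals() and hashCode() solely on an immutable business key assigned before the entity is first persisted.
@Entity
public class Customer {

    @Id @GeneratedValue
    protected Long id;                 // surrogate key, never exposed externally

    @Column(nullable = false, unique = true, updatable = false)
    protected String businessKey;      // e.g. "CUST00401", assigned before the first persist

    protected String firstName;
    protected String lastName;

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof Customer)) return false;
        return businessKey.equals(((Customer) o).businessKey);
    }

    @Override
    public int hashCode() {
        return businessKey.hashCode();   // stable before and after persisting
    }
}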




Friday, 9 August 2013

Landing a job with E3 visa in Silicon Valley

Why Silicon Valley?

It's probably every software engineer's dream to work in Silicon Valley, to rub shoulders with the best in the world and to tackle the most interesting problems on a daily basis. Unfortunately, for those born outside of the US, that will often remain just that: a dream, thanks largely to the stringent quota of 65,000 H1B visas every year.

Not so for the Aussies.

Thanks to the FTA negotiators (and George W Bush) back in the early 2000s, Australia is currently the only country in the world eligible for the E3 visa. So to leave one of those 10,500 annual visas unclaimed while still dreaming about Silicon Valley is just not on!

Getting an E3 is not difficult. There are literally thousands of blogs and Q&As on how to apply for it. But none of it matters if you cannot get a job offer in the first place. The job application is always the most difficult part, yet it's also the part people blog about the least. I guess that's because everyone's situation is different.

So, for whatever it's worth, this is my experience of landing a job in Silicon Valley as a software engineer. Hopefully, it can help someone out there.

Starting The Job Search


There are many ways to work in Silicon Valley: through a friend or your network, getting transferred from an Australian office, meeting recruiters who fly out to Australia, creating a start-up, etc. Congrats if any of these avenues is available to you and you're happy with the position and compensation.

For me, none of the above was effective, so I resorted to the old-fashioned way: job sites.

In my opinion, it's the best option because there's more selection, more flexibility and a better match for my skill set. In Silicon Valley, small is beautiful, and job sites are often the only way to introduce yourself to the smaller companies.

Moreover, I planned from the outset to chase as many offers as I could in order to increase my negotiating power. I let the recruiters know about it and they encouraged it, for obvious reasons.

Preparing Resume


The first thing I did was get my resume ready. It's no different from preparing for a job search in the Aussie market, with one exception: the phone number.

I read somewhere: "Never leave an Australian number in the resume", and it turned out to be one of the best pieces of advice. I discovered later that recruiters simply don't dial international numbers; it's part of their job training!

So, I applied for a US number through the Skype Number service. You can pick whatever area code you like; I picked (408) because it's a well-known area code in the South Bay, basically the area I wanted to work in.

I left my Australian address in the resume to let the recruiters know I'm a foreign worker looking for a visa. I didn't include age, marital status or ethnicity because, by law, companies in the US must adhere to equal opportunity hiring practices.

As for the availability date, put down the actual date on which you'll be available. Never put down anything like "Depends on E3 application process"; it'll confuse the hell out of the recruiters.

Job Search


Once my resume was ready, I proceeded by surfing the major job sites, e.g. LinkedIn, Indeed and CyberCoders, looking for both permanent and contract positions. At this stage, I really didn't care how stable or unstable a position was, I just wanted to get my foot in the door.

One thing worth mentioning is that searching with keywords like E3 or H1B will almost never yield any results. If anything, you'll only find ads that say H1B is not welcome.

It took me a while, but I was eventually able to view myself as a local candidate. After all, E3 is a subclass of the E visas (the employment visas), so we're really not that different from someone holding an EAD. So whenever I was challenged with the question "Are you allowed to work in the US?", I had no problem checking "Yes", because I knew the E3 application was really no hurdle.

I'd also read somewhere that it's a good idea to travel to the US on the VWP for 3 months and do face-to-face interviews. While this may be true for other industries, it's definitely not for IT. In fact, companies ubiquitously carry out one or two phone / Skype interviews and online coding exams before ever inviting someone to the office. At this stage, they couldn't care less whether you're living next door or half a world away; they just want to know how good or bad you are. I certainly saved a lot of money by staying in Australia.

Talking to Recruiters


Be prepared to hear nothing back for 90% of the jobs you apply for. At least that's what happened to me. Most likely because they don't want to hire a foreign worker, but I can never tell. For those that did get back to me, it was almost always the agents who called.

The initial calls always came in the middle of the night, so I used the Skype messaging service to tell the caller I was unable to pick up the phone and would call back, which translated to English means: "I'm in the middle of my sleep, in a time zone 18 hours ahead of you, and you are talking to my Skype account". When I called back, I always let them know their 2pm is my 7am, and they quickly adapted to the call pattern.

Most recruiters aren't familiar with the E3, so you'll have to explain it to them. The key points to mention are that it doesn't restrict who you can work for and that there's virtually no quota pressure.

Keeping a good relationship with your recruiter is paramount. These guys can get you into interviews you'd otherwise have no shot at. I was fortunate enough to impress one of the larger recruiters, Jobspring, and they just sent me to one interview after another. They even passed my resume on to their SF office (80 km away) to help even further. It was as if I'd struck a gold mine. Can't say enough good things about these guys. Look them up when you get a chance.

Phone Screening / Online Coding Exam


This is where the rubber meets the road.

In Australia, most interviewers ask about work experience with a certain programming language, framework or product, plus a few questions on fundamentals and design (anti-)patterns. Not so in Silicon Valley. They love to dive right into the theory, e.g. how would you implement an LRU cache?
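
To give a flavour of the kind of answer they're after: in Java, an LRU cache can be sketched in a dozen lines on top of LinkedHashMap. This is just a bare-bones illustration, not what any particular interviewer expects:
public class LruCache<K, V> extends LinkedHashMap<K, V> {

    private final int capacity;

    public LruCache(int capacity) {
        // accessOrder = true: get() moves an entry to the tail (most recently used)
        super(16, 0.75f, true);
        this.capacity = capacity;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // drop the least recently used entry once the cache grows past capacity
        return size() > capacity;
    }
}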

I had to spend a good couple of months with my books from uni just to get re-acquainted with all the data structures, algorithms, big O notation and concurrent computing theory. I learnt or re-learnt everything from BST traversal, RPN and the Shunting Yard algorithm, to sorting algorithms, to collections I never knew existed, to some of the programming fundamentals I'd overlooked in the past. I visited CareerCup and worked through many of the interview questions. In the end, it was time well spent. I strongly recommend it to everyone, even if interviewing is not your favourite activity.

Beware: the online coding exercises are not for the fainthearted. Not only are you under pressure to get the right answer, you're also pressed for time. A couple of times I came up with the right answer but was never invited back for a second round, simply because I wasn't fast enough. Chances are, I would never have enjoyed working at those companies anyway.

Test yourself and see if you can write a bug-free in-order BST traversal iteratively in 15 minutes.
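
For reference, here's one way to do it with an explicit stack instead of recursion (the TreeNode class here is just the usual value/left/right triple, defined only for this sketch):
static class TreeNode {
    int value;
    TreeNode left, right;
}

static List<Integer> inOrder(TreeNode root) {
    List<Integer> result = new ArrayList<Integer>();
    Deque<TreeNode> stack = new ArrayDeque<TreeNode>();
    TreeNode current = root;
    while (current != null || !stack.isEmpty()) {
        // walk down the left spine, remembering the path on the stack
        while (current != null) {
            stack.push(current);
            current = current.left;
        }
        // visit the leftmost unvisited node, then move to its right subtree
        current = stack.pop();
        result.add(current.value);
        current = current.right;
    }
    return result;
}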

Beyond the coding questions, you've got to know your stuff well to impress the interviewers. I went through at least 20 first-round interviews and 10 second rounds (each lasting at least an hour) in the span of 4 weeks, and as time wore on, I got sharper and sharper. I could answer the follow-on questions before they even had a chance to follow on.

I can safely say I've done more interviews in one month than many people do in a lifetime. Not sure if that's applicable to everyone, but it's definitely something to prepare for.

"Why Do You Want to Work in US?"


By far the most common question in interviews, and in some ways, a difficult question to answer.

Imagine someone asks you "why do you want (or not want) to get married?". You can either convince them with a few words or argue about it for hours. The bottom line is: it's a very personal question, and there's no "right" answer. It's the same with this question.

At the beginning, I'd give long-winded answers like "I want to work with the best", "design large-scale systems", "work on the most challenging problems"... blah blah blah. Only to invite questions like "What do you mean?" and "How do you know this is what you'll be doing?"

Over time, my answer got shorter and shorter until it's just a one-liner:

"I've been working in Australia for xx years and I feel like it's time for a change of scenery"

In a few cases, that's all I needed to say. If the interviewers wanted to know more, I'd give them a few more lines and wait for them to respond. The more they asked, the quicker they moved on to other questions.

Still, one interviewer managed to spend 10 minutes on this question. Needless to say, we took a mutual dislike to each other and the interview ended pretty quickly thereafter.

Face-to-face Interview


By the tail end of the month-long phone interview process, I had whittled my leads down to about 6 or 7 companies. That's when I booked my flights and hotels. I asked my recruiter to book two interviews a day, one in the morning and one in the afternoon, over 4 days.

It turned out to be a really bad idea.

Each final interview was at least 5 hours long! I thought, alright, the next one can't be 5 hours, and it went for 6! I lost count of the number of times my stomach rumbled for lack of food. It was quite a culture shock.

Still, I can't complain. Spending long hours in the interview rooms can only mean one thing: I was still wanted. Sure enough, offers came on the third day and I knew it was worth it; I wasn't going home empty-handed.

The interviews were pretty much the same as the phone interviews, except the interviewers came one after another, each talking for about an hour. Since start-ups usually hire nothing but engineers, everyone asked technical questions; there were only a few behavioural questions and they were usually easy to answer.

I didn't get anything out of left field like "How do you estimate the number of people currently logged on to Facebook?". I'm sure even Google has stopped asking these silly questions, so there's no need to prepare for them.

Negotiations


I was lucky enough to get a few offers and talked up my price a little. I ended up with a figure exceeding my initial expectation, but I didn't stop there.

I threw in questions about health cover, annual leave, relocation costs and, most important of all, the Green Card. At this stage, I was in the driver's seat; the E3 was a foregone conclusion. Companies were more than happy to shell out $5,000+ to get it done; they've spent far more on H1Bs in the past, so five grand is pocket change to them.

Obviously, the promise of a GC will depend on my performance over the next few years, but it's comforting to know the possibility is there.

Anyway, that's pretty much it. I hope my experience can help others out there. I look forward to hearing your stories.