January 2012
6 posts
2 tags
Jan 8th
1 tag
Jan 8th
1 note
1 tag
Get the size of a Postgresql table →
From time to time, you need to figure out how much data is in a table. This bit of SQL, specific to Postgres, gets you the size of the one that you specify.  SELECT pg_size_pretty(pg_relation_size('your_table'));
Jan 5th
6 notes
3 tags
Configuring Apache Tika's HtmlParser
So in my previous post about Apache Tika, I showed off a small Hello World program that demonstrated how you can quickly use it to parse HTML files. One of the first issues you will probably encounter using Tika though is that its HtmlParser does not immediately handle all tags. For example, the code tag is not recognized. To deal with that, you need to create a custom HtmlMapper. In the code...
Jan 5th
4 notes
3 tags
Parsing HTML with Apache Tika
Every now and then, I have to parse some HTML files. There are a lot of ways you can go about doing that. Recently, I have started using Apache Tika and it does a pretty reasonable job (i.e. better than what I have done before). There is not a lot of documentation on Tika so I had to do a bit of hacking to get my head around it. A good start is this quick Hello World Tika program I put together....
Jan 3rd
2 tags
Characteristics of slow SQL Queries →
I actually forgot I posted this answer on Stack Overflow until today when my cousin complimented me on it :-)
Jan 2nd
21 notes
December 2011
1 post
3 tags
So... new city, new website... again!
Hey all, I have changed cities and websites again! A bit of a coincidence that I have done both at the same time again. I have migrated off of Drupal and onto Tumblr. Drupal has been good to me but I just do not have time to maintain it on my server anymore especially since I only had a single Website running on it. I really like Tumblr as a platform and it is super convenient so that is why I am...
Dec 31st
4 notes
September 2011
1 post
2 tags
Creating and Managing Threads in Java
Threading in the early days of Java used to be a pain. Since Java 1.5 and the introduction of the ExecutorService, it is much easier to start up and manage threads. Code that you want executed in a separate thread must be encapsulated in a class that implements Runnable. An instance of that class can then be passed to an ExecutorService that will handle its execution. Below is a trivial example but it...
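A minimal sketch of the pattern described above, not the post's original example. The class names ExecutorDemo and CountingTask, the pool size, and the 10 second timeout are all assumptions made for illustration.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class ExecutorDemo {

    // Hypothetical task: the work to run in a separate thread lives in run().
    static class CountingTask implements Runnable {
        private final AtomicInteger counter;

        CountingTask(AtomicInteger counter) {
            this.counter = counter;
        }

        @Override
        public void run() {
            counter.incrementAndGet();
        }
    }

    // Hands `tasks` Runnables to a fixed-size pool, waits for them to finish,
    // and returns how many actually ran.
    static int runTasks(int tasks, int poolSize) {
        AtomicInteger counter = new AtomicInteger();
        ExecutorService executor = Executors.newFixedThreadPool(poolSize);
        try {
            for (int i = 0; i < tasks; i++) {
                executor.execute(new CountingTask(counter));
            }
        } finally {
            // Stop accepting new work; already submitted tasks still run.
            executor.shutdown();
        }
        try {
            if (!executor.awaitTermination(10, TimeUnit.SECONDS)) {
                throw new IllegalStateException("tasks did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
        return counter.get();
    }

    public static void main(String[] args) {
        System.out.println("tasks completed: " + runTasks(5, 2));
    }
}
```

Calling shutdown() followed by awaitTermination() is one common way to wait for submitted work; without the shutdown call, the pool's non-daemon threads would keep the JVM alive.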
Sep 14th
August 2011
3 posts
3 tags
Java SFTP
Believe it or not, there are very few options currently when it comes to implementing SFTP in Java. I believe that the Apache Mina project will eventually provide support for it (http://mina.apache.org/sshd/index.html); however, the project is still at version 0.5 and I have not found any good documentation for it. jCraft’s JSch package is the most straightforward library that I have used so...
Aug 29th
4 notes
2 tags
Formatting JSON from command line
So when working on REST services that return JSON, I often hit them from the command line using curl. If the JSON message that is returned is rather large, it can be a pain to read. Python provides a nice little tool though for formatting it. You can pipe a JSON message returned from curl to it like so: curl http://my.rest.service.org | python -mjson.tool | less
Aug 26th
2 notes
3 tags
Using Dependency Injection To Incorporate A/B...
I posted this article to my company’s tech blog a few weeks back. I am reposting it here because… well… I wrote it :-)  http://tech.gilt.com/post/8391205906/using-dependency-injection-to-incorporate-a-b-testing If you work at an e-commerce company, chances are you’ve probably come across the term “A/B testing”. We all know that it has something to do with testing out new...
Aug 20th
July 2011
2 posts
2 tags
Now in Java 1.7 - Support for Strings in switch...
With the release of Java 1.7, some cool new features have been added to the language. One of them is the support for Strings in switch statements (finally eh!). Prior to 1.7, the case labels could be byte, short, char, or int values and their corresponding wrapper classes. Support for enum types is implied and has made it possible to provide more meaningful representations for primitive data types...
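A short sketch of what a switch on a String looks like in Java 7 and later. The class StringSwitchDemo and the method dayType are hypothetical names used only for illustration.

```java
class StringSwitchDemo {

    // Hypothetical helper: classifies a day name using a switch on a String.
    // Labels are matched with String.equals, so the comparison is case-sensitive,
    // and passing null would throw a NullPointerException.
    static String dayType(String day) {
        switch (day) {
            case "Saturday":
            case "Sunday":
                return "weekend";
            case "Monday":
            case "Tuesday":
            case "Wednesday":
            case "Thursday":
            case "Friday":
                return "weekday";
            default:
                return "unknown";
        }
    }

    public static void main(String[] args) {
        System.out.println(dayType("Sunday"));
        System.out.println(dayType("Monday"));
    }
}
```

Because matching is case-sensitive, input that may vary in case would need to be normalized before the switch.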
Jul 31st
3 tags
Product Recommendations at Gilt
A while back, I posted an article on the Gilt Groupe tech blog. It is about the recommendation engine I developed. I am pretty proud of it as it is based on some of my PhD work. Below is a re-blog. The original article with images can be found here. Product recommendations at Gilt work a little differently than they do at other companies.  For example, at Amazon they enjoy the benefit of...
Jul 28th
39 notes
April 2011
1 post
3 tags
Woo hoo - my first Mahout contributions
So ya, it has been crazy busy at work and I have been neglecting my blog :-/ Tragic I know for all of the 3 people that actually check it (me, myself, and I). Anyhow, I thought I would throw up a quick post here and hopefully have a big one over the weekend. I made my first contributions to Apache Mahout this week. I have been using it at work to do our text mining and there were a couple...
Apr 22nd
1 note
February 2011
3 posts
3 tags
Setting up postfix to relay through your gmail...
So this is kind of an odd topic. Why would you want to set up postfix to relay through your Gmail? Well, if you are using Verizon as an ISP for your home internet, you will find that they do not allow you to send email from a locally running SMTP server, like postfix on your Linux box or Mac. They just block it. It is probably to prevent spammers; however, if you need to write some email code, you will...
Feb 12th
9 notes
2 tags
Using the ClassLoader to access files in your...
Here is a quick tidbit about how to access files in Java that are in your classpath. For example, suppose you have a text file that you want to parse and it is packaged in your jar. You can access it through the ClassLoader using either the method getResource(), which returns an instance of URL, or getResourceAsStream(), which returns an instance of InputStream. Below is a simple coding example: ...
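A minimal sketch of the getResourceAsStream() approach, not the post's original code. The class ResourceDemo, the method readResource, and the resource name config/sample.txt are hypothetical; UTF-8 is an assumed encoding.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

class ResourceDemo {

    // Reads a classpath resource as text. Returns null when the resource
    // cannot be found, mirroring what getResourceAsStream() itself does.
    static String readResource(String name) {
        ClassLoader loader = ResourceDemo.class.getClassLoader();
        try (InputStream in = loader.getResourceAsStream(name)) {
            if (in == null) {
                return null;
            }
            StringBuilder text = new StringBuilder();
            BufferedReader reader = new BufferedReader(
                    new InputStreamReader(in, StandardCharsets.UTF_8));
            String line;
            while ((line = reader.readLine()) != null) {
                text.append(line).append('\n');
            }
            return text.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical resource name; ClassLoader paths have no leading slash.
        String content = readResource("config/sample.txt");
        System.out.println(content == null ? "resource not found" : content);
    }
}
```

Checking for null before reading matters because a missing resource is reported that way rather than with an exception.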
Feb 5th
3 tags
JDBC fetch size and Postgresql
Every now and then, you need to pull a massive amount of data from a database, more than can fit into memory reasonably. To this end, you can set the fetch size for your statement so that the database driver will pull back more manageable chunks of data. For example, if you set the fetch size to 100, the driver should pull back 100 row chunks. A new chunk is pulled when needed as you iterate over...
Feb 3rd
January 2011
2 posts
2 tags
How to setup SSL on Apache
Setting up SSL on your Apache server is a pretty good idea even if you are only just hosting your own website with a CMS like Drupal. With SSL enabled, you can now securely log in, make updates, and post blog entries like me :-). Here is what you have to do: Step 1. Generate an SSL certificate. All you really need is a self signed certificate unless of course you are doing something for work....
Jan 30th
9 notes
3 tags
External Spring Config Files
Most of the time when you are creating a Spring app, you end up packaging the XML config files with the war/jar. Sometimes though, it is quite beneficial to have a configuration file external to your built package. That allows you to configure your Spring app without having to rebuild or redeploy it; trust me, your system engineers/admins will love you for that. Using an external spring config is...
Jan 30th
2 notes
5 tags
Programmatically logging a user out in Spring...
So I use Spring Security to handle user authentication in most of my Web applications. Every now and then, you need to log a user out programmatically. For example, users perform some sort of operation that redirects to a success page and logs them out. Logging a user out is quite simple. You need to use the logout method for the relevant LogoutHandlers in your application. You are always going to...
Jan 1st
December 2010
13 posts
3 tags
Chaining SSH Tunnels
So in an earlier post I described how to create an SSH tunnel. That is fine if you only need to connect to a server by going through a single bastion. In this post, I am going to provide an example for how to connect by going through multiple servers; in other words, how to chain SSH tunnels. Suppose you want to connect to server E but in order to do so, you have to be on server D that you can only...
Dec 27th
4 tags
Rotating log files in log4j
In Java, it is pretty standard to use log4j to handle your logging. In a production environment, you rarely want all your log data to be kept in a single file. If you did, eventually that file would become very large and difficult to use, especially when you are looking for specific error messages. A good practice is to employ rotating log files. Old log data gets moved into other files and...
Dec 27th
7 notes
5 tags
Adding authentication to an Apache CXF Web Service...
In my last post, I provided steps for creating an Apache CXF Web Service Client. Now that client only works with Web Services that do not require you to authenticate. In this post, I will provide an example of how to add authentication. Step 1. Create a CXF Web Service client Step 2. Create a CallbackHandler. The code below is actually from a previous post pertaining to authenticating an Axis 1.4...
Dec 23rd
1 note
4 tags
Creating an Apache CXF Web Service Client
So in earlier blog posts, I discussed how to make an Axis 1.4 client and secure it. Axis 1.4, however, has reached end of life; no more work is being done on it. Now, if you are thinking about using Axis 2 to make a client in order to stick with a package that is actively being maintained, I would strongly advise against it. Axis 2 is really designed to be used on a Java application server and it is...
Dec 23rd
3 notes
5 tags
Axis 1.4 Web Service Client Side authentication
So in a previous post, I gave an example of how to set up an Axis 1.4 Web Service client. That is fine if you do not have to log in to use the service. Often, however, you do have to authenticate your client, especially if your work has paid for the third party Web Service that you want to use. Here you want to use WSS4J. Below are the steps to get yourself set up: Step 1. Download WSS4J Step...
Dec 20th
3 tags
Tomcat 5.5/6.0 SSL Setup
If you want to serve pages from Tomcat over https, you are going to have to set up SSL in Tomcat. This is important if you are going to perform user authentication or serve sensitive data. Here are the steps you need to do to get SSL up and running on your Tomcat instance. Step 1: Create a certificate keystore. In lieu of buying a certificate, you can create a self signed one. You...
Dec 18th
26 notes
2 tags
Logging uncaught exceptions in a Java application
Logging exceptions is pretty important as it allows you to troubleshoot problems that come up with your production applications. If you have a regular Java application, there is a trick to logging uncaught exceptions. Those are the exceptions that do not require you to put your code in a try/catch block. To log these exceptions, you need to set a default uncaught exception handler. The example...
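A minimal sketch of installing a default uncaught exception handler, not the post's original example. The class UncaughtDemo, the LOGGED list, and the thread name worker-1 are hypothetical; the list stands in for a real logger such as log4j so that the sketch needs no extra libraries.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

class UncaughtDemo {

    // Stand-in for a real logger: records one line per uncaught exception.
    static final List<String> LOGGED = new CopyOnWriteArrayList<>();

    // Registers a JVM-wide handler that runs when a thread dies from an
    // exception nobody caught and no thread-specific handler is set.
    static void installHandler() {
        Thread.setDefaultUncaughtExceptionHandler(new Thread.UncaughtExceptionHandler() {
            @Override
            public void uncaughtException(Thread thread, Throwable error) {
                LOGGED.add(thread.getName() + ": " + error.getMessage());
            }
        });
    }

    // Starts a thread that fails with an unchecked exception and waits for it.
    static void runFailingThread() {
        Thread worker = new Thread(() -> {
            throw new IllegalStateException("boom");
        }, "worker-1");
        worker.start();
        try {
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        installHandler();
        runFailingThread();
        System.out.println(LOGGED);
    }
}
```

The handler runs on the failing thread before it terminates, which is why the entry is already recorded once join() returns.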
Dec 18th
4 tags
Using the Spring Expression Language with static...
One of the things that I love about Spring 3 is the Spring Expression Language (SPEL). At first glance, SPEL does not appear to give you that much; with it, you can execute Java expressions in your Spring config files. This capability, however, is awesome when you have properties that take values defined by static variables. For example, how days of the week are defined in the Calendar class: ...
Dec 13th
14 notes
2 tags
Java case insensitive String replaceAll
Here’s a quick handy tip for replacing all occurrences of a substring regardless of case in Java. Put “(?i)” at the beginning of your substring. For example: String a = "hello HELLO hEllO"; String b = a.replaceAll("(?i)hello", "bye"); System.out.println(a); System.out.println(b); The above code will output: hello HELLO hEllO bye bye bye
Dec 12th
3 tags
Load and Unload Mac daemons
So I recently upgraded to Postgresql 9.0 from 8.4 on my Mac. I realized after installing that I did not remember how to stop my old Postgresql instance from starting up on boot. Basically, both versions were trying to run on the same port whenever I booted up. I did not want to uninstall my old version just yet so I needed to stop it from starting up. You can see the list of daemons on your Mac that...
Dec 12th
1 note
4 tags
How to install a jar locally in Maven
This topic is probably trivial to Maven veterans; however, to people getting started, it is pretty important. As you probably already know if you got here from Googling, Maven is a neat build tool for Java that goes out and downloads all your jar dependencies for you. For example, if you need to use the apache commons lang package, you specify the dependency in your pom.xml and Maven will download it...
Dec 11th
20 notes
5 tags
How to build an Axis 1.4 XML-RPC based Web Service...
So Web Services have been around for a while now and one would think that the popular packages out there would support most any service. Well unfortunately, that is a bad assumption to make. That is because of how long Web Services have existed. Some earlier versions employed a variation of the SOAP message that we use today called XML-RPC. Very few contemporary Web Service packages support...
Dec 10th
4 notes
November 2010
3 posts
3 tags
SSH Tunnels - 2 ways
Recently, I have been googling how to make tunnels so I thought I would post what I do. An SSH tunnel allows you to connect to server A through server B from client C. You generally only want to set up a tunnel when you need to connect to server A but only have access to server B from your client, your laptop in the diagram above. Usually server A is in a protected network and server B is a...
Nov 28th
10 notes
6 tags
Building Self Contained Executable Jars - 2 ways
Building an executable jar file is generally pretty simple if you do not want to package any library jar files within it. You simply have to insert a Main-Class entry into the manifest file that specifies which class has the main method that you want to execute. That should be sufficient for small Java applications where you do not need to use a jarred library; however, once you need to use one,...
Nov 25th
42 notes
1 tag
Let's try blogging again
So since I set this Website up a couple years ago, I haven’t really used it. I’ve decided it is now time to try to blog some of the useful coding tidbits I run into from time to time. Generally I have found blogging to be a royal pain in the butt but I am getting tired of googling for the same stuff or even worse, telling people to google for the same stuff. Hopefully what I blog...
Nov 25th
January 2009
1 post
2 tags
A Best Paper Nomination
So I have been busy. Work has been awesome and I will have some interesting updates on different things I have been working on later. I also just got back from my first vacation to Vancouver… well first vacation since I started work. When I got home at the beginning of this month, I was pleasantly surprised to find that my HICSS-42 publication was nominated for a best paper award. It...
Jan 13th
2 notes
October 2008
1 post
3 tags
The PhD is done!
Well, that title is slightly deceiving. I am almost done. I successfully defended my thesis earlier this month and passed in my revisions. I am just waiting to get my committee’s approval before I can officially declare that I am done. So yes, that means I can be referred to as Dr. Chris. I am not too sure about using the prefix but I am definitely glad to be done. An interesting aside; so...
Oct 25th
14 notes
September 2008
2 posts
3 tags
Retrieval of Single Wikipedia Articles While...
Well, it finally happened. I published a paper on my PhD thesis work. It got accepted at HICSS-42 in the Digital Media: Content and Communication track under the Information Access and Retrieval: The Web, Users, and HCI mini-track. The abstract is below. When reading online, users sometimes need auxiliary information to complement or fill in their own background knowledge in order to better...
Sep 5th
71 notes
3 tags
Task Effects on Interactive Search: The Query...
I recently published a paper with Dr. Toms’ CMI lab about our search research that we did for INEX 2007. It is entitled “Task Effects on Interactive Search: The Query Factor” Site. Coincidentally, it is in the proceedings for INEX 2007. Essentially, it is about the experimental Wikipedia search system that we have been developing over the past year and looking at how users behave...
Sep 1st
54 notes
August 2008
3 posts
2 tags
New City, New Framework
Hey all, I have just relocated to New York with my wife. I got what looks to be a cool job here and she will be continuing on with her school. It’s been a busy month with moving and settling in but we are getting there. In my down time, I have started getting back into my usual hackery. I have been reading Craig Walls’ book “Spring in Action”. It is a pretty good book...
Aug 30th
10 notes
1 tag
Site officially moved over to Drupal
Ok, I have finally moved over all the content from my old Plone site to Drupal. I have to say that Drupal is a little different from Plone in terms of managing a Website. I think that the Plone approach to creating and editing pages and workflows is more intuitive than Drupal. Drupal however is much faster, uses fewer resources, and is far easier to skin. Overall, I am happy with the move to Drupal....
Aug 4th
1 tag
Good Bye Plone, Hello Drupal
Hey all, PhD is all but done now so I decided to rebuild my Website. It needed something new and I was getting tired of Plone. Anyhow, I am moving everything over to the new site in the next few days. And yes… I have a blog now.
Aug 2nd