January 2012
6 posts
Jan 8th
Jan 8th
Get the size of a PostgreSQL table →
From time to time, you need to figure out how much data is in a table. This bit of SQL, specific to Postgres, returns the size of the table you name:

    SELECT pg_size_pretty(pg_relation_size('your_table'));
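One caveat worth knowing: `pg_relation_size` reports only the table's main data, not its indexes or TOAST storage. The sketch below is not from the linked post. It uses the built-in `pg_total_relation_size` to include those, and then lists the largest tables in the current database. `your_table` is the same placeholder as above, and the schema filter and the limit of 10 are arbitrary choices for illustration.

```sql
-- Table plus its indexes and TOAST data, in human-readable units.
SELECT pg_size_pretty(pg_total_relation_size('your_table'));

-- Ten largest ordinary tables in the current database,
-- skipping the system schemas.
SELECT n.nspname AS schema_name,
       c.relname AS table_name,
       pg_size_pretty(pg_relation_size(c.oid))       AS table_size,
       pg_size_pretty(pg_total_relation_size(c.oid)) AS total_size
FROM pg_class c
JOIN pg_namespace n ON n.oid = c.relnamespace
WHERE c.relkind = 'r'  -- 'r' = ordinary table
  AND n.nspname NOT IN ('pg_catalog', 'information_schema')
ORDER BY pg_total_relation_size(c.oid) DESC
LIMIT 10;
```

Sorting on the raw byte count rather than the pretty-printed string keeps the order numeric.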
Jan 5th
Configuring Apache Tika's HtmlParser
In my previous post about Apache Tika, I showed off a small Hello World program that demonstrated how quickly you can use it to parse HTML files. One of the first issues you will probably run into with Tika, though, is that its HtmlParser does not handle all tags out of the box. For example, the code tag is not recognized. To deal with that, you need to create a custom HtmlMapper. In the code...
Jan 5th
Parsing HTML with Apache Tika
Every now and then, I have to parse some HTML files. There are a lot of ways to go about that. Recently, I have started using Apache Tika, and it does a pretty reasonable job (i.e. better than what I have done before). There is not a lot of documentation on Tika, so I had to do a bit of hacking to get my head around it. A good start is this quick Hello World Tika program I put together....
Jan 3rd
Characteristics of slow SQL Queries →
I actually forgot I posted this answer on Stack Overflow until today when my cousin complimented me on it :-)
Jan 2nd