Monday, July 25, 2011

Google Docs API International Character Support

If you haven't seen my NoteSync app yet, you should check it out: NoteSync. It's a very useful app in my opinion. It lets you take notes on your Android device and automatically synchronizes them with Google Docs. My users find it extremely handy; however, when I launched it I had some upset international users who found that my app did not preserve the special characters in their language. When they synced with Google Docs, they got the infamous "?" block character in their browser wherever a special character should have been. This was a huge problem and was surprisingly difficult for me to solve.

I knew the problem had to do with encoding. The Google Docs API uses UTF-8. I also knew that it had to do with my upload process, since the characters were preserved perfectly on the way down (from Google Docs to the device). My first attempt was simply to re-encode the content String like so:


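import java.io.UnsupportedEncodingException;

// Encodes the String to UTF-8 bytes, then decodes those bytes straight back.
private String utf8Encode(String t) {
    try {
        byte[] b = t.getBytes("UTF8");
        return new String(b, "UTF8");
    } catch (UnsupportedEncodingException e) {
        e.printStackTrace();
    }
    return t;
}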

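But that had no effect. I took the problem to a couple of friends of mine, a Googler, and a professor and mentor of mine from school, and I learned that while my encoding method looked reasonable, it could not change anything: a Java String is always UTF-16 internally, so encoding it to UTF-8 bytes and decoding those bytes straight back gives an identical String, which I was then concatenating with other ordinary Strings anyway. The charset only matters at the point where the String is turned into bytes for the request. After much toiling I finally found the fix, and it was surprisingly simple. To wrap my request I'm using org.apache.http.entity.StringEntity, which is then given to my HttpPut object and eventually executed. I discovered that StringEntity has a second constructor with a "charset" parameter; without it, the entity falls back to its default charset (ISO-8859-1 in HttpClient 4.x, as far as I can tell), which cannot represent most international characters. The fix was as easy as passing "UTF8" as the second parameter.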

StringEntity docEntity = new StringEntity(data, "UTF8");
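To see why that one parameter matters, here is a minimal sketch in plain Java with no Android or HttpClient dependency. The class and method names (CharsetDemo, roundTrip, sendAs) are made up for illustration, and sendAs is only a stand-in for the upload: it assumes the server decodes the request body as UTF-8 and that the one-argument StringEntity encodes with ISO-8859-1. It also uses StandardCharsets, which needs Java 7 or later.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CharsetDemo {

    // Same idea as utf8Encode: encode to UTF-8 bytes and decode them straight back.
    static String roundTrip(String text) {
        byte[] bytes = text.getBytes(StandardCharsets.UTF_8);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    // Hypothetical stand-in for the upload: the client encodes the body with
    // clientCharset, and the server (assumed to expect UTF-8) decodes it.
    static String sendAs(String text, Charset clientCharset) {
        byte[] wireBytes = text.getBytes(clientCharset);
        return new String(wireBytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Three Japanese characters, written as escapes so the source file stays ASCII.
        String note = "\u65e5\u672c\u8a9e";

        // The round trip changes nothing, which is why utf8Encode had no effect.
        System.out.println(roundTrip(note).equals(note));                      // true

        // ISO-8859-1 cannot represent these characters, so each becomes '?'.
        System.out.println(sendAs(note, StandardCharsets.ISO_8859_1));         // ???

        // Encoding the body as UTF-8 preserves the text.
        System.out.println(sendAs(note, StandardCharsets.UTF_8).equals(note)); // true
    }
}
```

The "?" characters appear at the moment the String is converted to bytes, so that is the only place where the charset choice makes a difference.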

That's it. One-line fixes are OK as long as you don't spend days searching for the answer. Hopefully this is of help to someone else with the same issue.