Here is a rather funny picture I found today on Digg. It is called “the Best Phishing Email, Ever”. Funny as this letter is, has anyone actually noticed that it says “G mail” or “Gma il” instead of “Gmail” (with one extra blank between the letters “G” and “m”, or between “a” and “i”)?

gmail_scam.jpg

Last year, when I interviewed at Google, I reported this bug to the Gmail team. (Sadly, they didn’t take it very seriously, as it’s still there now.) Maybe they can argue that it’s a feature, but let me take five minutes to explain why it happens. (I think it’s fairly simple and straightforward.)

Let’s start with a little background on HTML. We all know that when we send colorful text like “Google” via Gmail, the email body is actually HTML. Here is an important aspect of HTML: “An HTML user agent should treat end of line in any of its variations as a word space in all contexts except preformatted text.” (Page 20, RFC 1866) That is to say, a C-style string such as “a\nb\nc” in an HTML source file renders identically to “a b c”. This can be confusing: we have to use <br> to start a new line within a paragraph of HTML text, while “\n” is equivalent to a white space in most cases.
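
To make the rule concrete, here is a minimal Python sketch of how a renderer treats whitespace outside preformatted text. The function name collapse_whitespace is my own, and real browsers apply more detailed rules, so take it as an approximation only.

```python
import re

def collapse_whitespace(text: str) -> str:
    """Approximate whitespace handling in normal (non-preformatted) HTML text:
    any run of spaces, tabs, and line breaks is shown as a single space."""
    return re.sub(r"[ \t\r\n\f]+", " ", text)

print(collapse_whitespace("a\nb\nc"))    # a b c
print(collapse_whitespace("Googl\ne"))   # Googl e
```

The second call shows why a stray line break in the middle of a word shows up as a visible gap.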

Now let’s go back to Gmail. When I sent myself an email containing the colorful string “Google” via Gmail, I got “Googl e”. Viewing the HTML source, we can find that there is a “\n” in between those letters. For example, this is a piece of HTML (embedded in JavaScript) excerpted from the Gmail source for this colorful “Google”:


\u003cfont color\u003d\”#000099\”\>G\u003c/font\>\u003cfont color\u003d\”#ff0000\”\>o\u003c/font\>\u003cfont style\u003d\”background-color:#ffffff\” color\u003d\”#ffcc00\”\>o\u003c/font\>\u003cfont color\u003d\”#3333ff\”\>g\u003c/font\>\u003cfont color\u003d\”#33cc00\”\>l\u003c/font\>\u003cfont color\u003d\”#ff0000\”\>e\u003c/font\>\n \u003c/div\>\n\u003cdiv\> \u003c/div\>\n\u003cdiv\>

Here \u003c is “<” and \u003d is “=”. Without the “\n” in this line, the result would be “Google”. So why is there an extra “\n” here? Who played this trick? The answer is simple: Gmail. For some reason, Gmail breaks a long line in the HTML source into multiple lines before sending the email out (I haven’t figured out the rule Google uses to break lines in the HTML source). After several trivial experiments, such as sending mail from Gmail to Hotmail and vice versa, I am now pretty sure the problem is caused by Gmail’s automatic line-breaking strategy. That is to say, the Gmail client automatically inserts a newline (“\n”) into the HTML source and causes this “visual bug”. Actually, this bug is quite easy to fix: for instance, just break the line at the first blank after the tag name, like:

<span
style="color: rgb(255, 0, 0)">red</span><span
style="color: rgb(0, 255, 0)">green</span>

instead of, say,

<span style="color: rgb(255, 0, 0)">red</span>
<span style="color: rgb(0, 255, 0)">green</span>

or

<span style="color: rgb(255, 0, 0)">red</span><span style="color: rgb(0, 255, 0)">
green</span>

The first renders as “redgreen”, while the last two render as “red green”.
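
Here is a rough Python sketch of that idea. The helper names break_inside_tags and visible_text are hypothetical and this is not Gmail’s actual code. It assumes well-formed markup in which a literal “<” in text is escaped as &lt;, and for simplicity it breaks inside every tag that has attributes rather than only when a line grows too long.

```python
import re

# Hypothetical helpers for illustration; not Gmail's actual implementation.
# Matches "<tagname" followed by blanks, i.e. an opening tag with attributes.
_TAG_WITH_ATTRS = re.compile(r"<([A-Za-z][A-Za-z0-9]*)[ \t]+")

def break_inside_tags(html: str) -> str:
    """Turn the first blank after each tag name into a newline.

    A newline inside a tag is not rendered, so the visible text is unchanged.
    """
    return _TAG_WITH_ATTRS.sub(lambda m: "<" + m.group(1) + "\n", html)

def visible_text(html: str) -> str:
    """Crude rendering: drop tags, then collapse whitespace runs to one space."""
    without_tags = re.sub(r"<[^>]*>", "", html)
    return re.sub(r"\s+", " ", without_tags).strip()

html = ('<span style="color: rgb(255, 0, 0)">red</span>'
        '<span style="color: rgb(0, 255, 0)">green</span>')

safe = break_inside_tags(html)                           # break inside the tag
naive = html.replace("</span><span", "</span>\n<span")   # break between tags

print(visible_text(safe))   # redgreen
print(visible_text(naive))  # red green
```

Breaking inside the tag keeps the source lines short without touching the text the reader sees, whereas a break between two elements turns into a visible space.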

BTW, here is a nice tip for interviewees: love your prospective employer and love their products. Eventually, you will have a very good understanding of their culture and products. All companies are willing to hire guys who actually love their culture and products (and can even find bugs :).

PS: While preparing this article, I found that the Gmail team has quietly updated its text formatting system from plain old <font> tags to fancy (and elegant) XHTML+CSS.

PS2: http://www.opinionatedgeek.com/dotnet/tools/Base64Decode/Default.aspx is a nice online tool for decoding Base64.