Posts archived in English_Essays

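Baidu

Message to Japanese site administrators:

To everyone in Japan, it is a pleasure to meet you. I am Chen Haiteng (陳海騰), Representative Director of Baidu, Inc.
Baidu is China's largest search engine (used by about 70% of Internet users).
Japan is our first overseas market, and we only entered it in earnest in December 2006.
We will soon launch our Japanese-language search service, and in preparation we are currently running a spider (crawler) to collect web pages from Japanese-language sites.
This has caused excessive access to your sites, and we have caused administrators a great deal of concern.
As the representative of Baidu, Inc., I sincerely apologize for the trouble Baiduspider has caused your sites.
The whole company takes this problem seriously, and we intend to follow the rules of the Japanese Internet industry and work to ensure that this never happens again.
We have addressed the problems that have come up so far as follows: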

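  1. We have reduced the crawler load on each site. Baiduspider's maximum crawl frequency has been lowered from 9 requests per second to 1 request every 3 seconds, which is 1/27 of the previous frequency.
  2. We set a crawl policy suited to each site's size and IP load. For small and medium-sized sites, the crawl frequency is held to within 20 seconds per request.
  3. We added a compressed-crawl feature, which reduces the volume of access to a site to 1/3 of the original under the same load.
  4. We control the total daily crawl volume for each site. If a site's maximum limit is exceeded, we adjust it the same day.
  5. If site administrators have any questions about Baiduspider, we kindly ask you to contact webmaster-jp@baidu.com.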
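
The 1/27 figure in item 1 follows directly from the two rates quoted there. A small Python check of that arithmetic is sketched below. The function name is mine and purely illustrative, and the last line assumes that item 2 means one request every 20 seconds.

```python
from fractions import Fraction

def requests_per_second(requests, seconds):
    """Return a crawl rate as an exact fraction of requests per second."""
    return Fraction(requests, seconds)

old_rate = requests_per_second(9, 1)   # 9 requests per second
new_rate = requests_per_second(1, 3)   # 1 request every 3 seconds
ratio = new_rate / old_rate            # how much of the old rate remains

# Assumption: item 2 means one request every 20 seconds for smaller sites.
small_site_rate = requests_per_second(1, 20)
per_day = small_site_rate * 86400      # 86400 seconds in a day

print(ratio)    # 1/27
print(per_day)  # 4320
```

Under that reading, a small site would see at most 4320 requests per day from the crawler.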

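Baidu will continue these efforts and do its utmost every day to build the best possible network, one that also helps your page views, and to build an even closer cooperative relationship.
We would also be glad to offer our resources to users in Japan so that they can have a better experience.
Should our crawler cause you trouble again, please contact webmaster-jp@baidu.com, or write directly to Chen Haiteng, representative of Baidu, Inc., at <htchen@baidu.com>.
We will respond promptly, and we thank you for your understanding.


Sincerely yours,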

Cited from www.baidu.jp

It seems (according to Google Translate) that Baidu’s spider consumed a huge amount of bandwidth on websites in Japan, so this is a statement about its spider policy. Well, baidu.jp still has a long way to go.

Could anyone who speaks Japanese briefly explain the gist of this? I would also like to know how the Japanese media views Baidu’s entry. The potential of the Chinese, Japanese, and Korean markets is immeasurable!

Something interesting:
1. baidu.jp has the IP address 122.152.128.48, which belongs to Asia Netcom. I looked it up in several IP geolocation services. To my surprise, they either have no clue or report Europe, which is ridiculous.

2. If you use wget -S to fetch baidu.jp, baidu.com and www.baidu.com, you will find something interesting.

baidu.jp:
Server: Apache/2.2.3 (Unix)
Last-Modified: Tue, 27 Feb 2007 09:10:45 GMT

baidu.com:
Server: Apache/2.0.55 (Unix) PHP/4.3.11
Last-Modified: Fri, 16 Dec 2005 03:33:13 GMT

www.baidu.com:
Date: Fri, 02 Mar 2007 00:04:04 GMT
Server: BWS/1.0


We can now guess that BWS possibly runs on a UNIX-like system such as BSD or something similar.
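
To compare the three responses side by side, the header lines printed by wget -S can be parsed into name/value pairs. The sketch below is only an illustration: the function name `parse_headers` is hypothetical, and the sample text is copied from the listing in this post rather than fetched live.

```python
def parse_headers(raw):
    """Parse 'Name: value' lines into a dict keyed by lower-cased name."""
    headers = {}
    for line in raw.strip().splitlines():
        # Split at the first colon only, since dates contain colons too.
        name, sep, value = line.partition(":")
        if sep:  # ignore lines that have no colon at all
            headers[name.strip().lower()] = value.strip()
    return headers

# Header lines copied from the listing in this post (not fetched live).
samples = {
    "baidu.jp": "Server: Apache/2.2.3 (Unix)\n"
                "Last-Modified: Tue, 27 Feb 2007 09:10:45 GMT",
    "baidu.com": "Server: Apache/2.0.55 (Unix) PHP/4.3.11\n"
                 "Last-Modified: Fri, 16 Dec 2005 03:33:13 GMT",
    "www.baidu.com": "Date: Fri, 02 Mar 2007 00:04:04 GMT\n"
                     "Server: BWS/1.0",
}

servers = {host: parse_headers(raw).get("server") for host, raw in samples.items()}
for host, server in servers.items():
    print(host, "->", server)
```

Only www.baidu.com reports the custom BWS/1.0 server string; the other two hosts report stock Apache builds.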


If you type <3 in Gmail’s built-in chat (note: not the Gtalk client or any Jabber client), you will get a GIF.
Take a look at this and this.

Well, no wonder! Love really is the thing that needs <3 people :)
BTW, this is Google’s logo today:

Is it Google or Googe? Well, the romantic version is: g and l are falling in love. Or, you cannot spell “girl” without g and l, so they are together! (Chinese Version)

The XXX version is that they are 69ing. (Oh dude, I am just citing the comments from Digg.)

// BTW, the Internet is a series of tubes, not pipes. So enjoy YouTube and don’t bother with Yahoo! Pipes. LOL!

Blogger Isaac Mao posted an open letter to Google’s founders that offered three suggestions. I don’t quite agree with him, so here are my comments. I do believe that Dr. Kai-Fu Lee is the right person for Google China and that the current strategy is right. (Come on, I have no relationship with him :) Google China probably needs some improvements in the details, but the fundamental strategy is right. You are welcome to disagree with me; please feel free to leave your comments.

Dear Larry and Sergey,

I’m writing you the short letter on behalf of many Internet users in China to have some suggestions to resolve the current dilemma for Google in China, from both business and social perspectives.

Google China is not exactly in a dilemma right now. A dilemma means you cannot go either way. However, we can see progress in China. A shrinking market share in China is not necessarily a dilemma.

During the National Day holiday week in 2002, when Google.com was blocked in China for the first time, Chinese Google users made an online protest spontaneously. They appealed to free the purer search engine wave by wave. Its seemed its also the first time grassroots power was demonstrated in China on Internet. You can imagine how eager they are to have a complete Internet instead of a shrinked one. At last, people won, Google backed. However, after 4 years, we started to question whether we should continue to support Google. Many users here were disappointed when they found Google.cn filtered many keywords. The compromise remarks by you in Davos made us more frustrated. Seems you are adopting self-censorship which hurts those loyal users a lot which also devalue your motto of “non-evil”.

Your basic assumption here is that the GFW is evil, and that when Google filters the content itself, that is also a kind of evil. Let’s put it this way: if you can access Google but regularly get a connection reset, are you annoyed? Yes, we are professional users; we can avoid the sensitive keywords, we can set up a proxy, we can do everything to fight the GFW. But what should the common user do? They expect to get results related to their search, sensitive or not. However, sometimes, even when their keywords are not sensitive, the returned results unfortunately contain sensitive content. Boo, they get a connection reset. Whom can they blame? They are using Google, right? The GFW will not say: “Sorry, your connection was reset by the GFW, please try again later or dial XXX for more information”. To guarantee the user experience, some compromises are needed here in China. I know nearly every blogger in China considers the GFW evil. However, self-censorship is a down-to-earth strategy to make things work and to protect Google itself in China. I think Google’s philosophy is to first make it work, then improve it. It is hard to say whether this is evil or not. For instance, if you can use Google but get the annoying connection reset every day, what will you do? Will you choose Yahoo! or Baidu? Actually, Google.cn is aimed at small businesses more than at bloggers, as small businesses will bring Google its major income in China. Thus, making Google search work in China is much more important than other issues such as content. To assume that every user has the technical knowledge and the patience to use Google behind the powerful GFW is gratuitous.

Google is ever regarded not only a leading Internet business, but a hope for many people around the world to open their thinking. Many bloggers in China still believes that in their everyday writings. We guess you were misled by incomplete information on how censorship is good to Chinese people. The fact is Google in the 130M-Internet-Users country is losing loyal users with loosing your principles. We understand its tough to anyone to make decisions. But it high time to change it back to the right track. Here we would like to propose 3 ideas to Google for its China strategy in a long term run, to survive, and live better:

The question is, who are Google’s loyal users now? Let me put it this way: do you really think that bloggers in China will contribute more to Google China than common users in terms of income or search market share? Do you really think that small businesses will refuse to pay Google only because Google self-censors content, regardless of the overall quality of the related AdWords?

1. Set up a 1B US$ corporate venture fund to invest in China’s Internet pioneer sites and cutting edge companies. The venture fund can be managed by experienced fund managers and industry gurus who really understand the value of Google, as well market potential of China. In my estimation, a venture fund with such a size can invest over 100 deals totally cover 60% of Internet traffic in China. With venture fund strategy, Google can play its manageable chaotic game in a capital way.

This idea is really bad. If Google really wants to set up a VC fund, the best place for it is Silicon Valley rather than China, now and in the near future. The main problem for Google China is its share of the search market, not of the whole Internet market. Investing in Internet companies in China means investing in Internet access points, content producers, and communities, which would leave Google maintaining a very long product line. The thing is, you cannot solve the dilemma in China with capital; Google China does not need money from the capital market. If the fund is meant to win market share or communities in China, the best manager for it is the Google China team itself. We always emphasize that a company should stay focused when it makes decisions, and that is also true in China. If Google wants to play the game with capital, fine: just suggest that they move the whole China technical team to Seattle or Mountain View to develop other products or do localization, and convert Google China into another Sequoia. VC can earn money, but is that really what Google China wants? I don’t think so. Market share in search is quite hard to gain simply by investing. Google China now has fewer than 200 technical staff and probably about 200 marketing/HR employees, which is a rather small team. You can imagine that localization is already quite a heavy task for them, not to mention product development for the China market. Of course this small team cannot take charge of managing the fund. However, without their feedback, how can the VC choose companies that fit the overall strategy of Google China? In a word, VC is good, but it does not help solve Google China’s dilemma now, if you call it a dilemma. This strategy is in fact not a strategy.
If this is a strategy for Google, it is also one for Microsoft, for Oracle, for IBM, even for Citigroup, for American Express, for every Top 500 company that wants to gain more money or market share in China.

2. Develop anti-censorship tools and service for global Internet users. In China as well some other coutries, censorship is still a tradition in culture. We are accustomed to control or to be controlled(It’s true!). But it’s too far from modern humanity and universal value. It won’t target China only, instead its a global issue to be solved. So it won’t cause Google’s operation in China into trouble. The budget to complete the mission will be not more than several millions dollars.

Good intentions, but not easy to implement. I don’t know if you have heard of Tor. Tor is a tool that protects your content from the GFW. However, developing such a product officially and distributing it openly carries a very high legal risk. For example, if we have this tool, can the gov-er-ment say that using it is illegal, just as using a GPS speed camera detector is illegal? Of course the gov-er-ment can ban this tool the way it bans speed camera detectors, even though the two things are inherently different (one is dangerous, the other is for freedom). Actually, the GFW is not the source of Google’s trouble in China. If the only issue were the GFW, how could the smartest people at Google not have come up with this idea? As you know, it is hard for Google to obtain a permit to collect news in China; that is why Google News in China is called Google Information. This reflects the key issue: the gov is the key constraint on Google China. However, how can Google openly blame or fight the gov in China if it wants to do business there? The only way to solve the dilemma is a combination of time and public relations, which is in fact not a technical problem.

3. Increase the incentive to Chinese Google Adsense users. This can dramatically encourage more Internet users to participate Google’s business ecosystem. It’s a pure business strategy to increase loyalty as same important to Google’s products in China. Anyway the tactic should be deployed with better localized customer service to respect to individual users and protect their less to hundred-dollar income.

True, but unachievable. It is in fact unfair competition and would essentially damage the whole search market in China, which is also not good for Google.

Google is not alone. There are still several millions Google fans in China, especially those bloggers who are more real time intelligent to outside world. If Google do good as they did in early days. There will be more supporters for sure. Google is not playing a game of itself. You may under-estimate that before with limited information sources. People here are looking forward that you can pick the three suggestions(or partly) as China strategy in the coming years which can keep Google’s “non-evil” motto alive in people’s mind. It will also benefit to Google’s business in China. It will be also benefit to the whole Internet neutrality in China. All the Internet users will appreciate that eventually.

True. I am also a GFan. But support is not everything in China, right? We have supported someone before in history, but the result was not what we expected. If Google’s opponent in the game is Baidu, support is everything. If the opponent is the gov, which is the player you want Google to take on, it is a fatal game that usually has no winner, and support is nothing.

All of all, the pure the better; the more compromise, the worse.

You bet, but that is a longer-term goal. Now, let’s face the truth: in China, more people choose Baidu, and Baidu has self-censored from the very beginning. Why did Baidu succeed? Because it is pure? Well, I am not going to say: be evil. The key is to know the real situation in China, understand the local policy, and have a good relationship with the gov. It sounds evil, but it is true. And I can see the progress; I think Google will be out of this trap soon.

Can Gmail be an automaton?

Gmail has a very useful feature called filters. I guess every Gmail user has used it more or less. A filter provides these essential options: FROM, TO, HAS WORD, NOT HAS WORD, and FORWARD, plus some auxiliary actions such as STAR IT and TAG IT. Note that we can use logical expressions in all of the above fields, such as has word “XX” OR “YY”.

Now my idea is: if we have a bunch of Gmail accounts, named START, STATEi and HALT, can we build an automaton out of this system? A filter can serve as the state-transition function: FROM denotes the previous state, the content of the mail denotes the input, and FORWARD leads to the next state. Now the question is: can we use filters to build an automaton so that, after sending something to START, we can eventually expect a result in the HALT state? If so, we can read that result as the outcome of a certain computation. For instance, my START state always checks whether the letter has the word “I” and, if so, forwards it to STATE1. STATE1 checks whether the letter is from START and whether it contains “LOVE”, and forwards the letters that do NOT contain it to STATE2. STATE2 checks whether the letter is from STATE1 and has the word “YOU”, then forwards it to HALT. Finally, if we get a mail in the HALT mailbox, we may say that the string put into the START state is exactly not a LOVE LETTER :)
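
The START, STATE1, STATE2, HALT chain above can be simulated locally to see where a given letter ends up. This is only a sketch under my own assumptions: the `Filter` tuple, the `FILTERS` table, and the `run` function are hypothetical names, and matching words by splitting on whitespace is a simplification, not how Gmail actually matches.

```python
from collections import namedtuple

# Hypothetical model of one filter: required sender (or None), a word that
# must be present, a word that must be absent, and the forwarding target.
Filter = namedtuple("Filter", "sender has_word not_has_word forward_to")

FILTERS = {
    "START":  [Filter(None, "I", None, "STATE1")],
    "STATE1": [Filter("START", None, "LOVE", "STATE2")],
    "STATE2": [Filter("STATE1", "YOU", None, "HALT")],
}

def matches(flt, sender, words):
    """Check the FROM, HAS WORD and NOT HAS WORD conditions of one filter."""
    if flt.sender is not None and flt.sender != sender:
        return False
    if flt.has_word is not None and flt.has_word not in words:
        return False
    if flt.not_has_word is not None and flt.not_has_word in words:
        return False
    return True

def run(body, max_hops=100):
    """Deliver body to START, follow the forwards, return the final account."""
    words = set(body.split())  # simplification: exact, case-sensitive words
    sender, account = None, "START"
    for _ in range(max_hops):  # guard against forwarding loops
        for flt in FILTERS.get(account, []):
            if matches(flt, sender, words):
                sender, account = account, flt.forward_to
                break
        else:
            return account  # no filter matched, so the mail stays here
    return account

print(run("I LOVE YOU"))  # STATE1
print(run("I HATE YOU"))  # HALT
```

A letter containing “LOVE” gets stuck in STATE1, while one without it travels all the way to HALT, which matches the “not a love letter” reading above. Every filter sees the whole body each time, which is exactly the limitation discussed next.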

But the problem is that this formal system is memoryless. For instance, although the previous states have already checked some substrings, we want the current state to check only the letters that follow instead of the whole string, as the definition of an automaton requires. However, Gmail currently provides no mechanism like “READ POSITION” or “WRITE BACK” (you know, with those I could theoretically use the Gmail filter system to build a Turing machine). Now a question arises: can we build an equivalent finite state machine, or even a Turing machine, on top of a memoryless, read-only system? (Note that this system gives you powerful string matching functions that return true or false, but you cannot write on the tape or remember the current position on the tape.)

Does anyone have any ideas about it? Please feel free to leave me your comments.

BTW, you can easily set up a Jabber server and use a bunch of Python clients to build the Turing machine, since now you can “write” on the tape. You will have a series of clients called START, STATEi and HALT. It is in fact kind of cool, and I am planning to do it.

Wait a minute, are you stupid or something? Can’t you easily build an automaton using Yacc? Yes, I know, but my goal is to study the “distributed computer”. Can you believe that one day we will have a distributed Turing machine like this :)

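----- Chinese version (translated into English) -----
Can Gmail be made into a finite state automaton?

Gmail has a very handy feature called Filter, which I suppose everyone has used. Its core functions include filtering by sender, by recipient, by words the message contains, and by words it does not contain. There are also some auxiliary ones, such as adding a star or a label. If we think of each Gmail mailbox as a state, a filter is a state-transition function: it takes a previous state (From) and an input string (the message content) and moves to the next state (Forward). So naturally we want to ask whether Gmail can be used to build a finite state machine or even a Turing machine. For example, we set up the state-transition functions, that is, the filters, and send a mail (the initial tape) to the start state called START. After all sorts of messy state transitions, the mail is either delivered to the HALT state or loops around the Gmail network forever, so the result can be read from the HALT state.

However, I was being too optimistic. A filter is really just a read-only head, while a Turing machine needs a write head, state storage, and the current position of the read/write head, none of which Gmail can provide at the moment. The Gmail formal system is a read-only, memoryless state-transition machine, although it does come with a powerful string matcher that supports logical-expression queries. So the question now is: can anyone invent a system or a method that makes the Gmail formal system at least as powerful as a finite state machine (that is, able to recognize regular languages)?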

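If you have any ideas, write to me. This research is fun, and it is bound to deepen one’s understanding of computational power and finite state machines.
By the way, I wonder whether anyone has set up filters in an infinite loop, for example A forwards mail from B to C, C forwards to B, and B forwards to A. In theory Gmail cannot avoid such an infinite loop. Anyone who can solve infinite-loop avoidance in a distributed system of mutual calls should be able to send their resume to any big company with confidence; this is not a simple problem.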
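
When all the forwarding rules are known in one place, a loop such as A to C to B to A can be found with an ordinary depth-first search. The sketch below covers only that easy, centralized case; the function name and the dictionary layout are my own assumptions, and it does not address the distributed problem mentioned above, where no single party sees every rule.

```python
def find_forward_loop(forwards):
    """Return one forwarding cycle as a list of accounts, or None.

    forwards maps each account to the accounts it forwards mail to.
    """
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    color = {}
    path = []

    def visit(node):
        color[node] = GREY
        path.append(node)
        for nxt in forwards.get(node, ()):
            state = color.get(nxt, WHITE)
            if state == GREY:  # reached an account already on the path
                return path[path.index(nxt):] + [nxt]
            if state == WHITE:
                cycle = visit(nxt)
                if cycle:
                    return cycle
        path.pop()
        color[node] = BLACK
        return None

    for account in list(forwards):
        if color.get(account, WHITE) == WHITE:
            cycle = visit(account)
            if cycle:
                return cycle
    return None

looping = {"A": ["C"], "C": ["B"], "B": ["A"]}
chain = {"START": ["STATE1"], "STATE1": ["HALT"]}
print(find_forward_loop(looping))  # ['A', 'C', 'B', 'A']
print(find_forward_loop(chain))    # None
```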

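PS: 1. Since Gmail currently has no read/write memory, a mail client you write yourself, or a Jabber client, can always be modified to have full read/write and memory. Set up a Jabber server on your own machine with several users called start, halt, and so on, send a string to START, and see whether a result arrives at HALT. Very cool (you could even write a Markov chain with probabilistic transitions :)
2. Someone said: are you dumb? If you want an automaton, can’t you just write one with yacc? Heh, but aren’t we studying “distributed” computing here :)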


Funny GreatFireWall

[Background: the GFW is the firewall built by the Chinese government, which can block Wikipedia, Gmail, Google Pages and many other useful websites when you surf the Internet in China]

The GFW is funny now. You cannot access Gmail, but you can access Wikipedia.

But the GFW cannot block HTTPS, so you can try this link to access Gmail:
https://mail.google.com/mail/

Again, try this to access Wikipedia:
https://secure.wikimedia.org/wikipedia/zh/wiki

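The GFW is funny. Come on, GFW, can you give me a list of the websites, besides Playboy, that you usually block? You know, you are just like a kid who cannot even tell what kind of candy he wants and what kind of toy he does NOT like. You just find a toy, play with it, and block it or allow it. But no one knows exactly what kind of website you like.

Oh, come on, who can tell me where the G-spot of the GFW is?
[To say something not suitable for children: “Nobody can find the G-spot of the GFW”. This line is not my invention; it is borrowed from Mr. Lian Yue’s article “Nobody can find the G-spot of GDP”. Come on, when has our wise Party ever let us ordinary folks find the G-spot in anything it does? Comrades who want to have fun should bring their own Tor and enjoy themselves at home; when you f**k the GFW, never try to look for the G-spot. These are all lessons learned.]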

This news is cited from Techtree.com. To see the original post, please click here
All rights may be reserved by the original publisher; this citation is only for study and communication.
Finally, Sun Open Sources Java!
Techtree News Staff
Nov 13, 2006

Seems Sun Microsystems has finally relented under pressure to open source its Java programming language and associated software.

Today, the company will be releasing the first Java code under version 2 of the General Public License (GPLv2), which governs Linux and other open source products.

According to Sun, this move will promote Java and make it easier to bundle with Linux.

The Sun-hosted Java.net Web site will offer access to Java Platform Micro Edition (Java ME) software for mobile phones and Java Platform Standard Edition (Java SE) software for desktop applications.

Commenting on the development, Rich Green, Executive Vice President of Software, Sun, said, this is a milestone for the whole industry, and that not only are they making an influential and widely-used software platform for the Web available under open source, but that they are paving the way for a paradigm shift in how software is enhanced and developed.

While additions to software available under GPL have to also use the license, Sun is making an exception in the case of Java Standard Edition (Java SE). Meaning, programmers creating applications using Java SE will not be required to use the GPL license, and can instead opt for any other license for their applications.

Also, Sun will continue to offer commercial licenses that give other software vendors legal indemnification and official standards certification.

All in all, Sun’s move comes as a pleasant surprise, considering the company has continually resisted calls to open source Java, citing fears that such an action would cause incompatibilities among “forked” versions of the code.

[Added by me:] After Sun opened up the Solaris x86 edition, it has now opened Java, the other weapon in its left hand. It is a victory for the Open Source Movement, and of course we can expect a more stable GCJ next year. Now everyone can redistribute Java with Linux and other GPLed code. We can expect Python and Java to be the two most popular programming languages in the future on *nix platforms, including Mac OS. I can also imagine that Jython and other combinations of Java and Python will flourish again. Java has never been as powerful as it is now.