Archive for Science

熵的公理化定义

TopLanguage 论坛上有人问为啥信息论要使用 H = -K \sum_{x \in X} p(x)\log p(x) 的形式. 或者说, 为啥要用不直观的对数.

古典的解释很简单, 对于一个有  N 种状态的事件, p 进制编码的时候, 需要的位数是  \log_p N . 也就是  - \log_p (1/N) .

这种古典方法有很多局限. 最显著的就是  N, p 必须是整数. 熵的概念既不能推广到一般概率, 也不能推广到一般的对数为底.

下面我介绍两个公理化的定义. 一个是 Shannon 的原始定义, 一个是我偶然想到的模仿 Kolmogroov 概率公理化定义一样, 关于自信息量的定义.

Shannon 在 A mathematical theory of communication 的公理化定义被很多人忽略了. 我简单的归纳一下.

0. 信源可以看成是一个 Markov 过程. 现在相信大家都知道这一点了, 比如抛硬币, 就是这样的图:

m.png

这里, 信源的熵(不确定度量)只和均衡状态下各态的到达概率有关. 而和其他量无关. 也就是说, 迥然不一样的Markov过程只要稳定状态下各态概率一样, 不确定度量也一样. 这样, 信源就抽象成了和具体工作机理无关的 Markov 信号发生器. 这个是信息论的关键前提.

1. 假设离散事件发生的概率为  p_1, p_2, \ldots, p_k . 其所含的”不确定度量”  H 是关于 p_i 的一个连续函数.

2. 假设 p_i = \frac{1}{n}. 则  H 关于  n 是增函数. 因为  n 变大, 表示可选择的更加多, 也表示“不确定度量”在增加.

3. 如果随机变量  X 的概率分布依赖于 随机变量  Y 的分布, 那么 X 的熵  H(X) 可以表示为  H(X) = \sum_{y \in Y} p(y) H(X|y)

从这三个公理出发, Shannon 证明, H = -K \sum_{x \in X} p(x)  \log p(x) 是唯一可能的信息熵的表示. 具体的证明也不难. 从因为从 3 出发可以简单推出.

假设  H(\frac{1}{n}, \ldots, \frac{1}{n}) = A(n)
则有,  A(t^{n}) = n A(t) . 按照连续函数的性质, 解一下函数方程, 可以得到  A(t) = K \log t.

我想到的公理化定义说起来也比较简单, 是依赖于自信息量的公理化定义. 基于和香农同样的假设, 一个事件的自信息量 S(X) 定义为该事件概率分布的一个连续函数.类比于 Kolmogorov 建立的概率公理体系, 自信息量是概率度量空间上的一个实度量, 满足一下几条公理:

1. 非负性. 对于任何事件 X,  S(X) \ge 0

2. 信息量的可列可加性. 即在条件概率的意义下, 以两项为例, 有,
S(X, Y) = S(X) +  S(Y|X)  = S(Y) + S(X|Y)

直观的解释是, 事件  X 与事件  Y 联合提供的信息量, 等于知道  X 的信息量加上知道  X  Y 提供的信息量.

根据公理2, 很简单可以得出独立同分布事件  X ,  Y 联合信息量  S(X, Y) = S(Y) + S(X) . 根据这个可以推导出两个结论, 一是概率为 1 的事件的信息量必然为 0. 二是因为独立分布事件的概率是两个概率相乘. 令  t_x = p(X), t_Y = p(Y) , 则有  f(t_x * t_y) = f(t_x) + f(t_y), 此函数方程的解为  k \log t . 其中 t 是概率,  \alpha 是任意常数. 根据1 可知,  \alpha 必须取负数. 因此, 令  k = - \alpha , 得自信息量公式  - K \log p . 与古典结果一致.

在自信息量的定义之上, 熵就是很显然的了. 事实上, 熵就是系统各状态自信息量的数学期望, 即:

H = -K \sum_{x \in X} p(x) \log p(x)

公理化定义的好处是, 整个信息论建立在了现代数学的磐石之上, 并且能与其他数学领域相结合. 信息论诞生以后, 到最近, 都一直受到数学家的重视. 在关于下一代移动通讯网络 (4G)* 的研究中, 非常纯数学非常抽象的代数数论和代数几何, 居然和信息论相结合, 产生了令人惊讶的代数几何码, MIMO 空时码等先进的编码技术. 编码理论和代数结构相结合, 不光在实际上产生了效果好的应用, 而且在理论上也出人意料, 建立了不同学科之间的深刻联系. 信息论早就已经超出了一个工程分支的范畴, 成了非常新, 非常有活力的一个数学练兵场. 有兴趣致力这方面研究的可以阅读几篇 IEEE Transaction on IT 上面的一些经典文章.

延伸材料:

1. A Mathematical Theory of Communication by Claude E. Shannon (http://cm.bell-labs.com/cm/ms/what/shannonday/paper.html)

2. Wikipedia Article: “Entropy in thermodynamics and information theory” (http://en.wikipedia.org/wiki/Entropy_in_thermodynamics_and_information_theory)

* 3G (第三代移动通信) 的编码, 据我有限的知识, 一般是基于 Turbo 码这样一个随机的编码方式. 关于这个编码有一段非常有趣且有启示的故事: http://jcst.ict.ac.cn/downloads/xsqy/qy1503.pdf

Comments (4)

Pyipopt, Python marries with Ipopt

Ipopt is a leading nonlinear optimization tool. There are a bunch of interfaces, e.g. C/C++ standard interfaces, Java and MATLAB interface, FORTRAN interface for programmers.  However, no existing interface to python was presented. I started to connect ipopt with python two months ago inspired by OpenOpt. After lots of extensive development and nasty testing,  today, I proudly announce that the python interface of ipopt, aka, pyipopt, version alpha, is ready to go.

With the help of this module, in python, you can just use

import pyipopt
nlp = pyipopt.create(n, xl, xu, m, gl, gu, nnzj, nnzh, f, gradf, g, gradg, h, apply_new)
solution = nlp.solve(x0)

to get your NLP problem which n variables (xl, xu are boundaries for variables), m constraints (gl, gu are boundaries for constraints), and objective function f (gradf is the gradient, h is the Hessian matrix) with constraints g (gradg is the Jacobian value) optimized by IPOPT.

The package is available via google code: http://code.google.com/p/pyipopt/   An OpenOpt hooker is under development.

The code has BSD license. you can use this module whatever you want. For bug report and other related things, reach me at youxu AT wustl.edu.

To write a  package is not rocket science, but it might save you some time in your research.

[Special thanks to Dmitrey Kroshko from OpenOpt and Scipy community; Dominique Orban from Nlpy community]

Comments (1)

Bit per Joule 比特每焦耳

I was thinking about an interesting problem:

Since energy can be seen as entropy and workable energy has smaller entropy, so who can tell me ideally, via any kind of machine, how many bit of information can be added to a system if we add one Joule to the system. For instance, in the modern computer, 1 Joule can drive the computer to output several digits of random binary bits. A programmer can take calorie everyday and write thousands lines of code. My question: what’s the ideal machine in the universe that has the most productivity? (Or there are identical, just differs in different forms? What’s the value of that constant if it’s a constant?) Or, is it the right question?

I will award 50$ for the first person who can come up with the systematic explanation to all the questions above, seriously. Send me email [youxu @t wustl.edu]

—- Chinese Edition———–

我最近突然有个很奇怪的想法, 就是怎么衡量在信息社会, 一个系统的信息生产效率. 我们都知道, 所有的可用能量输入封闭系统后, 如果全部利用后减去被系统存储的活动能, 剩余的就是熵. 按照热力学第二定律, 封闭系统的熵是不减少的.

这个能量来到系统中, 如果做功, 我们自然可以算系统的能量转化比. 假如这个能量不是用来做功, 而是生成信息呢? 比如, 人吃饭, 写字. 就是能量到信息的过程. 这个过程, 最佳转化率能达到多少? 如果信息熵和热力学熵有某种等价关系的话, 可以把系统的热力学熵也归为信息熵. 那么, 一焦耳的活动能, 最后变成了多少比特呢? 如果这个是个常数, 能否求出这个常数? 如果不是常数, 这个宇宙中”生产率”最高的系统的转化比又是多少呢?

大家踊跃想想, 这个问题很有趣. 我将给第一个系统的解决所有问题的人发50美元小奖. [youxu [@t] wustl.edu ]

Comments (9)

My plan about the “Read The F Manual” Project

A little bit history

Being in the academical area for years, one of my dreaming tool is an online collaborative paper reader . You can say it’s YouTube with documents, or Digg with pdf files–name actually doesn’t matter. The ultimate goal for this system is to support collaborative document reading/organization for professors and Ph.D. students. Since we now have Ajax (more interactive than before) and Flash(as powerful as PDF), I would expect an light-wighted solution. arXiv and Citeseer do really good jobs in storing and linking all the documents, but the ultimate goal for me is to read.

On winter 2006, I expanded my ambition to a larger project: read the f source code and read the f books (Here of course the f word means fine :). I launched my domain name rtfsc.org [now redirected to Apache.org]. Our team members thought that rtfsc was easier than rtfm because we only need to deal with text information instead of PDF file. However, we set ourselves on the wrong track. It turned out that if we let the users upload and update their codes all the time, we have to implement a Subversion or even a Sourceforge. For us, an online code-reading community without code management and code search is terrible, but we don’t have that much time to finish all these ambitions. The other obstacle comes from my lacking experiences of Ajax development. Anyway, at last, we gave up. We stopped the development after Christmas. On that week, I was the person of the year 2006.

New motivation

I have been thinking about this project for nearly one year without any action. Lots of similar projects were there during this year, for example, flashpaper, edocr and scribd. But none of these are my dreaming tool. I took a retrospect about our project again and found that nobody actually wanted to do that except us because they didn’t actually understand our requirements. Their goals are usually becoming the next DocTube thing, while my goal is to have a handy system open for academical research. Also, I found some open source tools that were really inspiring. These make me interested in the project again.

Technical aspects:

1. PDF to SWF is not so that hard, there is an open source tool: PDF2SWF.

2. I thought that I can’t control the converted SWF as there is no API exposed as Flashpaper, but I was wrong. Actionscript can control it directly. What I need to do is to learn Actionscript, which is not so that hard.

3. I was thought that Flashpaper use some undocumented APIs which we could never know, but I got SWFmill today and now I can study the code of Flashpaper. [Disclaimer: I didn’t say anything about how to decompile flashpaper or violate the EULA of flashpaper]

4. I thought Ajax + Flash development was hard and had a very steep learning curve, but again I was wrong. On hacking Google Finance these days, I found the Flash Ajax Integration Package, which is very handy to use for some newbie like me without any Ajax/Flash developing experiences.

Proof of concept

Here is a proof of concept product. I convert my PDF version CV to swf and then combined it to a controller I wrote based on the default controller provided in PDF2SWF. (This CV has two pages, you can proceed to next page by click the red arrow)

cv.swf

Final Word

What I need is an sophisticated viewer which can be as powerful as flashpaper can communicate with other Ajax components on that webpage. So for developers, if you find this idea interesting and you are good at Ajax/Flash, feel free to take this idea and do it, as I might not have time to do it or it will take me a long time to finish it. I desperately need an Ajax style flash-based document reader with inline comment support. If you can do thing and get your own startup based on the idea, my only request is letting me be your user.

BTW, if there was some lessons in this project, I would say, choose the handy tools before you get start.

Comments (1)

Guest Blog-9 从纳什均衡看旁观者效应 — By Zhiqiang Zhang

作者介绍: 张志强, 清华大学 理论计算机科学研究中心(ITCS) 三年级博士生. 师从著名计算机科学家姚期智教授. 在此之前, 他在北京大学数学系取得本科学位. 他在计算机和数学方面获得的一些荣誉和经历包括 42届 IMO满分金牌, 2004 ACM/ICPC 上海区前5, 密歇根大学(University of Michigan) 访问学者, 微软亚洲研究院和 Google 中国实习生等. 我通过阅读他的博客与他认识, 并很荣幸的被他列为特别好友 :). 他的博客阅微堂亦属中文博客中少数精品. 更多的信息可以访问他的个人网站, 或者订阅他的个人博客.

从纳什均衡看旁观者效应 本文原发 阅微堂 原文链接.

1964年3月13号凌晨3点,纽约酒吧经济Kitty Genovese在即将到达寓所时,遭到持刀暴徒的侵犯,她惊恐的尖叫并恳求帮助。但她的38户邻居,很多人走到窗户前观望了片刻,目睹她在歹徒手中挣 扎,但直到歹徒离开,才有人打电话报警。但Genovese却未能得到及时救治很快就死去了。[1]

为什么Kitty的邻居没有一个人援助她?人们普遍归因于人的异化与冷漠。但心理学家有不同的看法,大量的实验和研究显示在公共场所观看危机事件的旁观者越多,愿意提供帮助的人就越少,这被称为旁观者效应

为什么会这样呢?心理学家

…猜测,当旁观者的数目增加时,任何一个旁观者都会更少地注意到事件的发生,更少地把它解释为一个重大的问题或紧急情况,更少地认为自己有采取行动的责任。[1]

下面用经济学中的纳什均衡的方法定量地说明,在人数变多时,的确是任何一个人提供帮助的可能性变小,而且存在某人提供帮助的可能性也在变小!通俗的 说,在开头的报警案例中,围观者(邻居)越多,报警的可能性越小! (这些来源于2年前与同学的讨论,只不过当时还不知道心理学上也有对应的分析。)

在这里假设人都是利益动物(也就说下面的分析不考虑社会心理学中提到的人的心理因素)。在最开始的抢劫案件中,假设有n个围观者,有人提供帮助(报警),每个人都能得到a的固定收益,但报警者会有额外损失b(可以看成提供帮助所消耗的时间,精力或者报警者所可能遇到的危险——注意最近的彭宇案件)。容易知道,在b>a时,一个完全理性的人不可能去报警,所以我们只考虑0\leq b \leq a的情形。我们来分析一下,在这个模型里面,每个人将如何行动?

按照上面的假定,对于某个人A而言,他的收益矩阵为:

  其他n-1个人不报警 其他n-1个人有人报警
A不报警 0 a
A报警 a-b a-b

我们求上面的收益矩阵的纳什均衡,由于每个人都是对称的(暂且只考虑对称的纳什均衡),无妨假设每个人不报警的概率为p,不难得到纳什均衡在p=(\frac{b}{a})^{\frac1{n-1}}达到。注意p是随着人数n增大而增大的!更重要的是,存在某人报警的概率1-p^n=1-(\frac{b}{a})^{\frac{n}{n-1}} 随着人数的增加而减少!

注意,上面的结果也提供了报警的概率与 \frac{b}{a} 的相关关系。

更多推断:

  • 相对而言,城市居民比小乡村居民更冷漠:在人少的地方获得帮助的可能性反而更大。
  • 朋友并不是越多越好的(?)
  • 求助时不要同时向若干人求助,即便如此也不要让他们互相知道。
  • 常在新闻里看到,一人受伤或者…,多少多少人围观,却没有人提供帮助。其实,多少人围观,与是否会有人提供帮助的关系并不太大——注意在上面的计算中,当n>3时(这时候才算围观吧),有人提供帮助的概率1-p^n与n的关系不大。
  • 一个社会的道德水平,如不考虑别的因素(社会和心理上的),将由ba的比值决定,而在受益a确定的情况下,完全由b决定,这里的b是提供帮助的成本(包括时间,精力,以及有可能遭致的打击报复,甚至忘恩负义者的反咬)。
  • 和谐社会,需要努力降低前面的b值,通过给与金钱上或者精神上的奖励。
  • 最近的彭宇事件,根据网络上的反应,这件事情大大提高了b,将导致道德水平下降。

参考:

[1] David G. Myers, Social Psychology - 社会心理学, P363-369.

Comments (7)