Shannon’s entropy is a measure of the uncertainty of a system, or equivalently the amount of information present in it. It is often measured in bits and, for a system with $n$ possible states and probabilities described by the Vector $p = (p_1, \dots, p_n)$, the entropy is given by

$$H(p) = -\sum_{i=1}^{n} p_i \log_2 p_i$$
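The formula translates directly into code. A minimal sketch in Python (the function name `shannon_entropy` and the example distributions are illustrative, not from the text):

```python
import math

def shannon_entropy(p, base=2):
    """Shannon entropy H(p) = -sum_i p_i * log(p_i), in units set by `base`."""
    # Terms with p_i = 0 contribute nothing, since x * log(x) -> 0 as x -> 0.
    return -sum(x * math.log(x, base) for x in p if x > 0)

print(shannon_entropy([0.5, 0.5]))  # fair coin: 1.0 bit
print(shannon_entropy([0.9, 0.1]))  # biased coin: ~0.469 bits
```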
Desired properties
The entropy is derived as the only solution (up to the choice of a positive constant $K$, i.e. of units) that satisfies three properties:
- Continuity: $H$ must be continuous: small changes in the probabilities cause small changes in Entropy.
- Monotonicity: For a Uniform Distribution, $H$ must be increasing with the number of outcomes, i.e. $H\!\left(\tfrac{1}{n}, \dots, \tfrac{1}{n}\right) < H\!\left(\tfrac{1}{n+1}, \dots, \tfrac{1}{n+1}\right)$.
- Additivity: $H\!\left(\tfrac{1}{m+n}, \dots, \tfrac{1}{m+n}\right) = H\!\left(\tfrac{m}{m+n}, \tfrac{n}{m+n}\right) + \tfrac{m}{m+n}\,H\!\left(\tfrac{1}{m}, \dots, \tfrac{1}{m}\right) + \tfrac{n}{m+n}\,H\!\left(\tfrac{1}{n}, \dots, \tfrac{1}{n}\right)$. This property can be visualized as making a composed choice: I can make a choice among $m+n$ equally likely outcomes in two ways, both of which should have the same entropy (see the numerical check after this list). These ways are:
  - Picking one of the $m+n$ outcomes directly, with entropy $H\!\left(\tfrac{1}{m+n}, \dots, \tfrac{1}{m+n}\right)$.
  - Picking either the $m$-group or the $n$-group, with entropy $H\!\left(\tfrac{m}{m+n}, \tfrac{n}{m+n}\right)$, and then, with probability $\tfrac{m}{m+n}$, if I picked the $m$-group, picking one of its $m$ outcomes, with entropy $H\!\left(\tfrac{1}{m}, \dots, \tfrac{1}{m}\right)$, and similarly for the $n$-group, which has a total entropy of $H\!\left(\tfrac{m}{m+n}, \tfrac{n}{m+n}\right) + \tfrac{m}{m+n}\,H\!\left(\tfrac{1}{m}, \dots, \tfrac{1}{m}\right) + \tfrac{n}{m+n}\,H\!\left(\tfrac{1}{n}, \dots, \tfrac{1}{n}\right)$.
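A small numerical check of additivity (illustrative Python; the helper names `H` and `A` are mine), showing that the direct and the composed choice have the same entropy for $m = 3$, $n = 5$:

```python
import math

def H(p):  # entropy in bits
    return -sum(x * math.log2(x) for x in p if x > 0)

def A(n):  # entropy of a uniform choice among n outcomes
    return H([1 / n] * n)

m, n = 3, 5
direct = A(m + n)  # one flat choice among m + n outcomes
composed = (H([m / (m + n), n / (m + n)])   # pick a group...
            + (m / (m + n)) * A(m)          # ...then within the m-group,
            + (n / (m + n)) * A(n))         # ...or within the n-group
print(direct, composed)  # both 3.0 == log2(8)
```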
Formula derivation
We define $A(n) = H\!\left(\tfrac{1}{n}, \dots, \tfrac{1}{n}\right)$, the entropy of a uniform choice among $n$ outcomes. By additivity (applied repeatedly, splitting $nm$ equally likely outcomes into $n$ groups of $m$), we have

$$A(nm) = A(n) + A(m), \quad \text{and in particular} \quad A(s^m) = m\,A(s)$$
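A quick numerical sanity check of these two identities (illustrative Python; the helper `A` is mine):

```python
import math

def A(n):  # entropy (in bits) of the uniform distribution on n outcomes
    return -sum((1 / n) * math.log2(1 / n) for _ in range(n))

print(A(6), A(2) + A(3))    # both ~2.585: A(nm) = A(n) + A(m)
print(A(2 ** 5), 5 * A(2))  # both 5.0:    A(s^m) = m * A(s)
```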
Now we fix an $s > 1$, and describe $A(t)$ as a function of $t$. For any given $n$, we can find an $m$ such that

$$s^m \le t^n < s^{m+1}$$
Hence $\tfrac{m}{n} \le \tfrac{\log t}{\log s} < \tfrac{m+1}{n}$ holds and, by monotonicity and $A(s^m) = m\,A(s)$, also $m\,A(s) \le n\,A(t) \le (m+1)\,A(s)$, i.e. $\tfrac{m}{n} \le \tfrac{A(t)}{A(s)} \le \tfrac{m+1}{n}$. Since $n$ is arbitrary, both ratios lie in the same interval of width $\tfrac{1}{n}$ for every $n$, so $\tfrac{A(t)}{A(s)} = \tfrac{\log t}{\log s}$, i.e. $A(t) = K \log t$ with $K = A(s)/\log s > 0$. Letting the $p_i$ be rational, we can take their lowest common denominator $N$ and write $p_i = \tfrac{n_i}{N}$ with $\sum_i n_i = N$. By additivity (splitting $N$ equally likely outcomes into groups of sizes $n_1, \dots, n_k$), $A(N) = H(p_1, \dots, p_k) + \sum_i p_i\,A(n_i)$, which can be rearranged to get

$$H(p) = K \log N - K \sum_i p_i \log n_i = -K \sum_i p_i \log \frac{n_i}{N} = -K \sum_i p_i \log p_i$$
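The rearrangement can be checked numerically for a rational distribution, here $p = (1/8, 3/8, 4/8)$ (illustrative Python; the helper names are mine):

```python
import math
from fractions import Fraction

def H(p):  # entropy in bits
    return -sum(float(x) * math.log2(float(x)) for x in p if x > 0)

def A(n):  # A(n) = K log n, with K = 1 when working in base 2
    return math.log2(n)

# Rational probabilities over the common denominator N = 8
n_i = [1, 3, 4]
N = sum(n_i)
p = [Fraction(k, N) for k in n_i]

lhs = H(p)                                               # -sum_i p_i log p_i
rhs = A(N) - sum(float(pi) * A(k) for pi, k in zip(p, n_i))
print(lhs, rhs)                                          # both ~1.4056
```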
By continuity, we have that $H(p) = -K \sum_i p_i \log p_i$ holds for all $p$ such that $p_i \ge 0$ and $\sum_i p_i = 1$, since any such $p$ is a limit of rational ones. The constant $K$, or equivalently the base of the logarithm, defines the information units we work in (see the conversion example after the list):
- $\log_2$ is bits
- $\log_3$ is trits
- $\log_e$ (i.e. $\ln$) is nats
- $\log_{10}$ is hartleys
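The same distribution measured in each unit differs only by a constant factor, as a quick illustration shows (illustrative Python):

```python
import math

def entropy(p, base):
    return -sum(x * math.log(x, base) for x in p if x > 0)

p = [0.5, 0.25, 0.25]
print(entropy(p, 2))       # 1.5    bits
print(entropy(p, math.e))  # ~1.040 nats     (= bits * ln 2)
print(entropy(p, 10))      # ~0.452 hartleys (= bits * log10(2))
```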