Shannon’s entropy is a measure of the uncertainty of a system, or equivalently the amount of information it contains. It is often measured in bits and, for a system with $n$ possible states and probabilities described by the vector $p = (p_1, \dots, p_n)$, the entropy is given by

$$H(p) = -\sum_{i=1}^{n} p_i \log_2 p_i.$$
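As a minimal sketch, the defining formula can be computed directly (assuming base-2 logarithms, i.e. units of bits):

```python
import math

def entropy(probs):
    """Shannon entropy, in bits, of a probability vector.

    Terms with p = 0 contribute nothing, by the convention 0 * log 0 = 0.
    """
    return -sum(p * math.log2(p) for p in probs if p > 0)

# A fair coin carries exactly 1 bit of uncertainty:
print(entropy([0.5, 0.5]))       # → 1.0
# Four equally likely outcomes carry 2 bits:
print(entropy([0.25] * 4))       # → 2.0
# A certain outcome carries no uncertainty at all:
print(abs(entropy([1.0])))       # → 0.0
```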

Desired properties

The entropy formula is derived as the only solution (up to a positive constant factor) that satisfies three properties:

  • Continuity: $H$ must be continuous: small changes in the probabilities cause small changes in the entropy.
  • Monotonicity: for a uniform distribution, $H$ must be increasing with the number of outcomes, i.e. $H(\tfrac{1}{n}, \dots, \tfrac{1}{n}) < H(\tfrac{1}{n+1}, \dots, \tfrac{1}{n+1})$.
  • Additivity: $H(p_1, \dots, p_m, q_1, \dots, q_n) = H(w_p, w_q) + w_p H(\tfrac{p_1}{w_p}, \dots, \tfrac{p_m}{w_p}) + w_q H(\tfrac{q_1}{w_q}, \dots, \tfrac{q_n}{w_q})$, where $w_p = \sum_i p_i$ and $w_q = \sum_j q_j$. This property can be visualized as making a composed choice: I can make a choice in two ways, both of which should have the same entropy. These ways are:
    • Picking one of the $m$ “$p$” or $n$ “$q$” outcomes directly, with entropy $H(p_1, \dots, p_m, q_1, \dots, q_n)$.
    • Picking either a “$p$” or a “$q$” outcome, with entropy $H(w_p, w_q)$, and then, with probability $w_p$, if I picked “$p$”, I pick one of the $m$, with entropy $H(\tfrac{p_1}{w_p}, \dots, \tfrac{p_m}{w_p})$, and similarly for “$q$”, which has a total entropy of $H(w_p, w_q) + w_p H(\tfrac{p_1}{w_p}, \dots, \tfrac{p_m}{w_p}) + w_q H(\tfrac{q_1}{w_q}, \dots, \tfrac{q_n}{w_q})$.
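That the direct choice and the composed two-stage choice have the same entropy can be checked numerically; a minimal sketch, where the probabilities and the grouping are arbitrary examples:

```python
import math

def H(probs):
    # Shannon entropy in bits
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Direct choice among four outcomes: two "p" outcomes and two "q" outcomes.
p, q = [0.1, 0.3], [0.2, 0.4]
direct = H(p + q)

# Composed choice: first pick the "p" or "q" group, then pick within it.
wp, wq = sum(p), sum(q)
composed = H([wp, wq]) + wp * H([x / wp for x in p]) + wq * H([x / wq for x in q])

print(abs(direct - composed) < 1e-12)  # → True
```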

Formula derivation

We define $A(n) = H(\tfrac{1}{n}, \dots, \tfrac{1}{n})$, the entropy of a uniform distribution over $n$ outcomes. By additivity (a uniform choice among $mn$ outcomes decomposes into a uniform choice among $m$ groups followed by a uniform choice among the $n$ outcomes of the chosen group), we have

$$A(mn) = A(m) + A(n).$$

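The identity $A(mn) = A(m) + A(n)$ can be sanity-checked numerically from the entropy definition, e.g. for $m = 2$, $n = 3$:

```python
import math

def H(probs):
    # Shannon entropy in bits
    return -sum(p * math.log2(p) for p in probs if p > 0)

def A(n):
    # Entropy of a uniform distribution over n outcomes
    return H([1.0 / n] * n)

print(abs(A(6) - (A(2) + A(3))) < 1e-9)  # → True
```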
Now we fix an integer $s > 1$, and describe $A$ as a function of $A(s)$. For any given integer $t > 1$ and any $n$, we can find a $k$ such that

$$s^k \le t^n < s^{k+1}.$$

Taking logarithms of $s^k \le t^n < s^{k+1}$ gives $\tfrac{k}{n} \le \tfrac{\log t}{\log s} < \tfrac{k+1}{n}$, while monotonicity and additivity give $k\,A(s) \le n\,A(t) < (k+1)\,A(s)$, i.e. the same bounds for $\tfrac{A(t)}{A(s)}$. Hence $\left|\tfrac{A(t)}{A(s)} - \tfrac{\log t}{\log s}\right| < \tfrac{1}{n}$ holds for every $n$, so $A(t) = K \log t$ with $K = \tfrac{A(s)}{\log s}$. Letting $p_1, \dots, p_n$ be rational probabilities, we can take their lowest common denominator $N$ and write $p_i = \tfrac{a_i}{N}$. By additivity, $A(N) = H(p_1, \dots, p_n) + \sum_i p_i A(a_i)$ (a uniform choice among $N$ outcomes decomposes into choosing group $i$ with probability $p_i$ and then choosing uniformly among its $a_i$ outcomes), which can be rearranged to get

$$H(p_1, \dots, p_n) = A(N) - \sum_i p_i A(a_i) = -K \sum_i p_i \log \frac{a_i}{N} = -K \sum_i p_i \log p_i.$$

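The rearranged identity $H(p_1, \dots, p_n) = A(N) - \sum_i p_i A(a_i)$ can be checked numerically on an arbitrary rational example, here $p = (\tfrac{1}{6}, \tfrac{2}{6}, \tfrac{3}{6})$:

```python
import math

def H(probs):
    # Shannon entropy in bits (so A(n) below equals log2(n))
    return -sum(p * math.log2(p) for p in probs if p > 0)

def A(n):
    # Entropy of a uniform distribution over n outcomes
    return H([1.0 / n] * n)

# Rational probabilities with lowest common denominator N = 6: p_i = a_i / N.
a, N = [1, 2, 3], 6
p = [ai / N for ai in a]

lhs = H(p)
rhs = A(N) - sum(pi * A(ai) for pi, ai in zip(p, a))
print(abs(lhs - rhs) < 1e-9)  # → True
```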
By continuity, $H(p_1, \dots, p_n) = -K \sum_i p_i \log p_i$ holds for all probability vectors, rational or not, since the rationals are dense. The constant $K$, or equivalently the base of the logarithm, defines the information units we work in:

  • base $2$ is bits
  • base $3$ is trits
  • base $e$ is nats
  • base $10$ is hartleys
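The units above differ only by the constant $K$, i.e. by the base of the logarithm; a small sketch converting between them (the distribution is an arbitrary example):

```python
import math

def entropy(probs, base=2.0):
    """Shannon entropy of `probs` in the unit set by `base`:
    2 → bits, 3 → trits, e → nats, 10 → hartleys."""
    return -sum(p * math.log(p, base) for p in probs if p > 0)

p = [0.5, 0.25, 0.25]
bits = entropy(p, 2)        # 1.5 bits
nats = entropy(p, math.e)

# Changing base only rescales the value: 1 bit = ln(2) nats.
print(abs(nats - bits * math.log(2)) < 1e-12)  # → True
```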