Compression, ad nauseam (Was: [UFO Chicago] CEO's Report for January 24th, 2002)

Fri, 25 Jan 2002 20:07:34 -0600 (CST)

Larry Garfield wrote:

> Compression occurs when an action is taken on an
> entity such that it occupies less volume with respect
> to its environment than the normal baseline volume
> occupied by a given entity within the same
> environment.

The constraint that the pre-compression and
post-compression environments be identical is
unnecessarily orthodox, and creates additional problems
in defining "environment" and "same".

One can attempt to define compression by resorting to
mechanism, but I would urge that the problem can be
reasonably collapsed by adopting a simple functional
definition:

  Something is "compressed" if, after an activity, it
  requires fewer units of storage, however measured (or,
  as a corollary, more of the something can be stored in
  the same units of storage as the original), and if it
  can be restored into a form that user-acceptably
  resembles the original.
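
By way of illustration, the functional definition can be
sketched in a few lines of Python.  The names
is_compressed, encode, decode, and acceptable are
hypothetical, and zlib merely stands in for "an activity";
any other activity and any other measure of storage units
could be substituted.

```python
import zlib

def is_compressed(original, encode, decode, acceptable, size=len):
    """Functional test only: fewer storage units after the activity,
    and restorable to something the user accepts as the original."""
    stored = encode(original)
    restored = decode(stored)
    return size(stored) < size(original) and acceptable(original, restored)

def same(a, b):
    # Strictest possible notion of "user-acceptable": exact equality.
    return a == b

text = b"to be or not to be, " * 50

# A lossless codec passes the test on this highly repetitive input.
lossless = is_compressed(text, zlib.compress, zlib.decompress, same)

# Doing nothing saves no storage, so it fails the test.
identity = is_compressed(text, lambda d: d, lambda d: d, same)
```

Nothing in the test asks how the saving was obtained, which
is the point of a functional definition.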

Note that compression is generally possible because we
encode or store inefficiently, often to obtain some
convenience in presentation or processing.  Although I
don't think the particular method one uses is relevant
in determining whether compression has occurred, I
would note by way of example that compression may be
achieved by:
  (a) exploiting more efficient storage media;
  (b) eliminating filler (i.e., non-content) or
      redundancy; or
  (c) surrendering non-essential content.
Although in discrete-valued systems it may make sense
to distinguish between the encoding and storage
functions, that distinction is less principled and less
meaningful in continuous-valued systems.  Similarly,
although methods (a), (b), and (c) above may be
orthogonally distinguishable in discrete-valued systems,
they are probably not so distinguishable in
continuous-valued systems.
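
As a rough illustration of method (c) in a discrete-valued
system, here is a sketch that surrenders the low-order byte
of each 16-bit sample.  The sample values are made up, and
whether the discarded detail is really "non-essential"
depends entirely on what the user will accept.

```python
import struct

samples = [0, 1000, 20000, 40000, 65535]   # hypothetical 16-bit readings
raw = struct.pack("<%dH" % len(samples), *samples)   # 2 bytes per sample

# Method (c): keep only the high-order byte of each sample.
stored = bytes(s >> 8 for s in samples)    # 1 byte per sample

# Restoration is approximate: the low-order byte is gone for good.
restored = [b << 8 for b in stored]
worst_error = max(abs(a - b) for a, b in zip(samples, restored))
```

Storage is halved, and every restored value lies within 255
of the original on a scale of 65535.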

> When you squeeze a sponge, it is being compressed, as
> it occupies fewer cubic centimeters than it did
> previously in its baseline state (before you started
> pressing on it).

Fine. 

> When you make more efficient use of a plastic mold
> you are not compressing it.  Neither the mold itself
> nor the products (plastic Barbie torso) are occupying
> less spatial volume than their previous state.

Not so fast.  If the information being stored is the
three-dimensional shape of Barbie, then the improved
mold results in more instances being stored per linear
unit of material.

> In the electronic world, the volume an object takes
> up is the number of bits it uses, not its physical
> volume.  A 10 KB file is not compressed by
> transferring it from a 5.25" floppy to a 3.5" floppy,
> even though it takes up considerably less physical
> space.  Information is not matter.  Matter is the
> medium within which information exists, and changing
> the medium in which a given bit is stored does not
> inherently compress it.

More orthodoxy, and to what end?  To the extent we are
now talking about physical media, the question becomes
how many bytes can we fit in a container of predefined
dimensions.  If you archive to physical media and store
that media in a safe-deposit box, an improvement in
volumetric storage efficiency is every bit as good
whether the improvement arises from more
space-efficient media or from running gzip on all the
files.
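
The arithmetic behind that claim is trivial, but it can be
made concrete.  This is only a sketch with invented
numbers; files_per_box and all of its parameters are
hypothetical.

```python
def files_per_box(box_cm3, medium_cm3, medium_bytes, file_bytes):
    """How many whole files fit in a box of fixed volume."""
    media = box_cm3 // medium_cm3                  # media that fit in the box
    return media * (medium_bytes // file_bytes)    # whole files per medium

baseline = files_per_box(8000, 100, 1_000_000, 10_000)

# Same files, media that take half the physical space.
smaller_media = files_per_box(8000, 50, 1_000_000, 10_000)

# Same media, files reduced to half their size (e.g., by gzip).
smaller_files = files_per_box(8000, 100, 1_000_000, 5_000)
```

Either change doubles the number of files the box holds,
and the person renting the box cannot tell which one was
made.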

> Placing the same abstracted stream of bits into a
> physical medium does not itself compress the data.
> Therefore, changing from one physical medium to
> another does not itself compress the data.
> Therefore, switching from a CAV to CLV-based laser
> disks is not compression, because the bits themselves
> are still the same bits.  The physical media is
> merely being used less efficiently.

> Removing characters from a text string is altering
> the abstracted stream of bits, in such a way that the
> same data is representable in fewer bits.  However,
> the relevant data itself is not being altered, only
> the "padding" data.  So whether white space trimming
> counts as compression is subject to debate.  Of
> course, a similar philosophy is used behind most
> audio compression codecs, that is, removal of
> "non-critical" portions of the data, and I don't
> believe anyone questions whether or not mp3 and ogg
> are compressed audio.

> So perhaps we must alter our definition of
> compression, such that an entity when compressed must
> also alter its actual nature in the process.  To be
> useful in electronic terms, it must also be a
> reversible process, although that is not strictly
> necessary.  (An old car is compressed into a small
> cube, an irreversible process but distinctly a form
> of compression.)

> So by that definition, white space trimming is not a
> form of compression, because the data's actual nature
> is not altered, that is, it can still be run through
> a compiler/interpreter without alteration and it will
> be read properly.  Compressed digital audio (mp3,
> ogg, etc.)  is compressed, because the actual nature
> of the entity (bits) is altered and the end results
> occupies fewer bits than the initial baseline state.

More orthodox and arbitrary stuff, but I don't see how
it helps us compare and contrast the two video-disk
recording methods with the conventional technologies
that we all agree are instances of compression.

Constraining "compression" to require "alteration" of
the data, whatever that means, or to require that the
compressed form be somehow unusable, is utter nonsense.
Gzip would be no less useful, and, in fact it would be
even *more* useful, if a gzipped file were directly
readable.  The point is that the gzipped file
(typically) requires less storage.

Removing whitespace, e.g., by voluntarily cropping an
encyclopedia, is exactly compression, because the
resulting encyclopedia requires less storage, and could
be restored to user-acceptably resemble the original,
e.g., by photocopying onto larger paper and rebinding.
The fact that it is not done in the electronic domain
is irrelevant.  An analogous operation in the
electronic domain is eliminating the bits needed to
represent certain white space in a scanned image by
cropping the image, or by run-length or
Huffman-encoding a portion of the image.  Fax machines
can compress images in all of these ways; users are
equally pleased that the reduced number of bits results
in shorter transmission times without regard to the
respective contributions of various methods.
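
For concreteness, here is a minimal sketch of run-length
encoding applied to one scan line of a bilevel image.  It
is a toy illustration, not the coding scheme any actual fax
machine uses, and the pixel data is invented.

```python
def rle_encode(row):
    """Collapse a row of pixels into [value, run_length] pairs."""
    runs = []
    for pixel in row:
        if runs and runs[-1][0] == pixel:
            runs[-1][1] += 1          # extend the current run
        else:
            runs.append([pixel, 1])   # start a new run
    return runs

def rle_decode(runs):
    """Expand [value, run_length] pairs back into a row of pixels."""
    return [pixel for pixel, count in runs for _ in range(count)]

# Hypothetical scan line: 0 is white, 1 is black; mostly white space.
row = [0] * 40 + [1] * 5 + [0] * 3 + [1] * 2 + [0] * 50
runs = rle_encode(row)
```

The 100 pixels collapse to five runs, and rle_decode(runs)
returns the original row exactly.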

Now consider constant-density
vs. constant-angular-velocity storage on rotating
media.  By my functional definition, constant-density
storage is indeed compressed, because more frames of
video can be stored on it than on the baseline
constant-angular-velocity media.
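
A back-of-the-envelope sketch shows the size of the effect.
It assumes an idealized disk on which the
constant-angular-velocity scheme stores the same amount on
every track (limited by the innermost one), while the
constant-density scheme stores an amount proportional to
each track's circumference.  The radii are invented and the
function name is hypothetical.

```python
def relative_capacity(radii_mm, constant_density):
    """Capacity in units of 'what the innermost track holds'."""
    inner = min(radii_mm)
    if constant_density:
        # Each track holds data in proportion to its circumference.
        return sum(r / inner for r in radii_mm)
    # Constant angular velocity: every track holds only what the
    # innermost (shortest) track can hold.
    return float(len(radii_mm))

# Hypothetical disk: tracks every 0.5 mm from radius 55 mm to 145 mm.
radii = [55 + 0.5 * i for i in range(181)]
cav = relative_capacity(radii, constant_density=False)
dense = relative_capacity(radii, constant_density=True)
gain = dense / cav   # about 1.82 for these made-up radii
```

Under these assumptions the gain is simply the mean track
radius divided by the innermost radius.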

But even if we consider the mechanism by which the
improved storage density is achieved, it is nonetheless
compression, because it employs a step of optimizing
away some of the storage inefficiency in the baseline
media, which is exactly what is done by any of the
conventional mechanisms that everyone agrees are
compression.  In analog recording systems, whenever you
use more space on the media to record a signal than is
needed to achieve a desired signal-to-noise ratio, you
introduce inefficiency by way of redundancy.  In analog
systems, redundancy may appear in the form of more
space than necessary, or larger signal levels than
necessary, being used to record a desired signal.  In
digital systems, redundancy appears in the form of more
bits than necessary being used to encode symbols, and
you can employ more or less clever techniques to rid
the stored representation of the redundant bits.
Similarly, there are a variety of ways to rid the
analog system of unnecessary redundancy--controlling
signal or media slew rates, improving the record and
playback transducers to reduce the size of the
"recording locus", eliminating surface area reserved
from use, etc.  Constant-density recording is nothing
more than the step of eliminating (some of) the
redundancy employed by the baseline
constant-angular-velocity scheme.  Hence, it is
compression.