Redundancy of the English language is a Goldilocks zone for crossword puzzles

In Shannon’s terms, the feature of messages that makes codecracking possible is redundancy. A historian of cryptography, David Kahn, explained it like this: “Roughly, redundancy means that more symbols are transmitted in a message than are actually needed to bear the information.” Information resolves our uncertainty; redundancy is every part of a message that tells us nothing new. Whenever we can guess what comes next, we’re in the presence of redundancy. Letters can be redundant: because Q is followed almost automatically by U, the U tells us almost nothing in its own right. We can usually discard it, and many more letters besides. As Shannon put it, “MST PPL HV LTTL DFFCLTY N RDNG THS SNTNC.”
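
As a rough illustration of letter-level redundancy, the sketch below strips the vowels A, E, I, O, U from each word of a sentence, which reproduces Shannon's abbreviated example. The function name `strip_vowels` and the choice to keep all-vowel words intact are assumptions made for this sketch, not anything specified in the source.

```python
def strip_vowels(text: str) -> str:
    """Drop A, E, I, O, U from every word (hypothetical helper for illustration)."""
    vowels = set("AEIOU")
    words = []
    for word in text.upper().split():
        stripped = "".join(ch for ch in word if ch not in vowels)
        # Assumption: keep a word unchanged if it consists only of vowels,
        # so that words such as "A" or "I" do not vanish entirely.
        words.append(stripped or word)
    return " ".join(words)


sentence = "MOST PEOPLE HAVE LITTLE DIFFICULTY IN READING THIS SENTENCE"
compressed = strip_vowels(sentence)

# Fraction of characters (spaces included) removed by the abbreviation.
saved = 1 - len(compressed) / len(sentence)
print(compressed)
print(f"characters saved: {saved:.0%}")
```

The sentence stays readable even though a sizeable share of its characters is gone, which is the sense in which those characters were redundant.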

Words can be redundant: “the” is almost always a grammatical formality, and it can usually be erased with little cost to our understanding. Poe’s cryptographic pirate would have been wise to slash the redundancy of his message by cutting every instance of “the,” or “;48”—it was the very opening that Mr. Legrand exploited to such effect. Entire messages can be redundant: in all of those weighted-coin cases in which our answers are all but known in advance, we can speak and speak and say nothing new. On Shannon’s understanding of information, the redundant symbols are all of the ones we can do without—every letter, word, or line that we can strike with no damage to the information.

As Shannon’s approximations of text grew more and more like English, then, they also grew more and more redundant. And if this redundancy grows out of the rules that check our freedom, it is also dictated by the practicalities of communicating with one another. Every human language is highly redundant. From the dispassionate perspective of the information theorist, the majority of what we say, whether out of convention, or grammar, or habit, could just as well go unsaid. In his theory of communication, Shannon guessed that the world’s wealth of English text could be cut in half with no loss of information: “When we write English, half of what we write is determined by the structure of the language and half is chosen freely.” Later on, his estimate of redundancy rose as high as 80 percent: only one in five characters actually bears information.
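
To make the 50 percent and 80 percent figures concrete, here is a minimal sketch that converts a redundancy value into bits of information per letter. It assumes the usual information-theoretic definition, redundancy = 1 - H / H_max, where H_max = log2(alphabet size), and it assumes a 26-letter alphabet with no space symbol. The function name `entropy_per_symbol` is hypothetical.

```python
import math


def entropy_per_symbol(redundancy: float, alphabet_size: int = 26) -> float:
    """Bits of information per symbol implied by a given redundancy.

    Assumes redundancy = 1 - H / H_max with H_max = log2(alphabet_size).
    """
    if not 0.0 <= redundancy <= 1.0:
        raise ValueError("redundancy must lie between 0 and 1")
    h_max = math.log2(alphabet_size)  # entropy if every letter were equally likely
    return (1.0 - redundancy) * h_max


for r in (0.0, 0.5, 0.8):
    print(f"redundancy {r:.0%}: {entropy_per_symbol(r):.2f} bits per letter")
```

Under these assumptions, a letter could carry at most about 4.7 bits, so 80 percent redundancy leaves a little under one bit of actual information per letter.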

As it is, Shannon suggested, we’re lucky that our redundancy isn’t any higher. If it were, there wouldn’t be any crossword puzzles. At zero redundancy, in a world in which RXKHRJFFJUJ is a word, “any sequence of letters is a reasonable text in the language and any two dimensional array of letters forms a crossword puzzle.” At higher redundancies, fewer sequences are possible, and the number of potential intersections shrinks: if English were much more redundant, it would be nearly impossible to make puzzles. On the other hand, if English were a bit less redundant, Shannon speculated, we’d be filling in crossword puzzles in three dimensions.
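
The link between redundancy and crossword dimensions can be sketched with a back-of-the-envelope counting heuristic. This is an illustrative assumption, not Shannon's own derivation: treat every row and column as an independent constraint, and suppose a random string of n letters is valid text with probability A^(-nR), where A is the alphabet size and R the redundancy. The expected number of valid d-dimensional grids of side n is then roughly A^(n^d (1 - dR)), which grows only when R < 1/d. The function names below are hypothetical.

```python
import math


def log2_expected_grids(n: int, redundancy: float, dims: int = 2,
                        alphabet_size: int = 26) -> float:
    """log2 of the expected number of valid grids under the independence heuristic.

    Each of the n**dims cells lies on `dims` lines, and every line of length n
    is assumed to be valid text with probability alphabet_size ** (-n * redundancy).
    """
    cells = n ** dims
    return cells * (1.0 - dims * redundancy) * math.log2(alphabet_size)


def max_redundancy(dims: int) -> float:
    """Largest redundancy at which large grids remain possible in this heuristic."""
    return 1.0 / dims


for dims in (2, 3):
    print(f"{dims}D puzzles need redundancy below {max_redundancy(dims):.0%}")
```

The thresholds of 1/2 and 1/3 line up with the figures in Shannon's 1948 paper, which puts large two-dimensional crosswords at about 50 percent redundancy and three-dimensional ones at about 33 percent.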

Folksonomies: grammar information science probability redundancy

A Mind at Play: How Claude Shannon Invented the Information Age
Books, Brochures, and Chapters>Book: Soni, Jimmy (2017-07-18), A Mind at Play: How Claude Shannon Invented the Information Age, Retrieved on 2018-07-27
Folksonomies: information science biography