
Huffman squeezer is one way to boost disc capacity

As the fight to put more and more information on smaller and smaller discs continues, various arcane and eclectic methods to achieve this end are being brought into play.

One is the Huffman compression algorithm. This was devised by David Huffman in 1952 and is widely used in several areas, including some forms of facsimile transmission, and, I believe, the new massive disc drive from Epson which manages to put an incredible four megabytes on a 5.25-inch floppy disc.

How the Huffman code works is both simple and complex. The basis is that if you give each letter of the alphabet and each punctuation mark a code of equal length — let us say seven bits — then you are wasting a large amount of space.

It is perfectly feasible to take the most commonly used letters and give them a code of, say, three bits and leave the longer codes for letters which are infrequently used.

This variable-length coding by frequency is used in a simplified form in the Morse code. E is a single dot; A, which is used less frequently, is a dot and a dash, and so on. There are English letter frequency tables which show you the number of times that a letter will appear in a standard English script, but this is not permanently fixed. It varies with the author. Americans, for example, use the letter Z more frequently than do New Zealanders.

In its proper form, the Huffman algorithm forms a new frequency table and a new code for every file on which it operates. It can, it is true, work from a predetermined code, but it will not then maximise the compression of any given file. Frequency tables are not static.

When correctly implemented, the Huffman algorithm can compress a file so that the space saved in storing it is, on average, about 30 per cent. A neat trick.
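To make the idea concrete, here is a sketch in Python of how such a code can be built (my illustration, not anything published by Huffman or used by Epson). It builds the file's own frequency table, then repeatedly merges the two rarest entries, so that common symbols end up with short codes:

    import heapq
    from collections import Counter

    def huffman_code(text):
        # Build the frequency table from the file itself, as the
        # proper form of the algorithm does.
        heap = [[count, [symbol, ""]] for symbol, count in Counter(text).items()]
        heapq.heapify(heap)
        # Repeatedly merge the two least common nodes; symbols merged
        # early sink deepest and so end up with the longest codes.
        while len(heap) > 1:
            lo = heapq.heappop(heap)
            hi = heapq.heappop(heap)
            for pair in lo[1:]:
                pair[1] = "0" + pair[1]
            for pair in hi[1:]:
                pair[1] = "1" + pair[1]
            heapq.heappush(heap, [lo[0] + hi[0]] + lo[1:] + hi[1:])
        return dict(heapq.heappop(heap)[1:])

    codes = huffman_code("this is an example of a huffman tree")
    # The space, the commonest symbol here, gets one of the shortest codes.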

Another way of getting more information on to a disc is related to the use of error correction. You can easily and substantially increase the amount of information you can cram on a disc, but you will also increase the error rate to a substantial and unacceptable level. There is a way around this, a way to have your cake and eat it too: use an error-correcting algorithm.

If your computer is transferring information at a million bits a second, then an error rate of 10 to the power of minus six will give you an error every second, which will lead to a tearing of hair and a gnashing of teeth.

If you can decrease that error rate to 10 to the power of minus 12, you can switch your computer on and transfer data at the rate of a million bits a second for 11½ days and only average one error over the whole period.
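A few lines of Python (my sketch, not the column's) confirm the sums:

    bits_per_second = 1_000_000   # a million bits a second
    error_rate = 1e-12            # one bad bit in every 10**12

    seconds_per_error = 1 / (bits_per_second * error_rate)   # 1,000,000 s
    days_per_error = seconds_per_error / (60 * 60 * 24)
    print(days_per_error)         # about 11.57, the 11 1/2 days above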

Plainly, an error rate of 10 to the power of minus 12 is the one to aim for, and, in truth, it is the standard most hardware manufacturers use to judge other add-on equipment. If you can achieve that error rate, or better it, then it is acceptable.

Now, if you make a disc drive and go mad cramming information on to the disc, you will end up with an error rate much worse than that accepted by the hardware manufacturers.

Let us, for the moment, say it is a horrifying 10 to the power of minus five. At this stage, you wheel up your error correction algorithm, which, if properly implemented, will bring the error rate back to 10 to the power of minus 12 which will make everyone very happy, even though implementing the algorithm will eat into available disc space.

Typically, you would expect to lose 10 per cent of the available storage space to error correction, but over-all double the amount of information you can hurl on the disc. Not a bad swap.
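On those figures the swap works out like this (a back-of-envelope sketch, assuming the 10 per cent comes off the doubled raw capacity):

    raw_gain = 2.0          # twice as much crammed on to the disc
    ecc_overhead = 0.10     # a tenth of the space given to check bits
    net_gain = raw_gain * (1 - ecc_overhead)
    print(net_gain)         # 1.8 -- still nearly double the usable storage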

There are many different error-correction codes and they have been around since the early 1950s. They all have great names, of which my favourite is Bose-Chaudhuri-Hocquenghem, thankfully shortened to BCH.

This code and the Reed-Solomon and the Goppa and all of the other error-correction codes basically work in a similar way. If you are sending a seven-bit piece of information, it will consist of a series of 0s and 1s. As in 0101001. If you add up the 1s in the segment they come to three, which is an odd number. You can now stick a little identifying tag on the end of the segment. A 1 tag means the added total of the rest of the numbers must be odd.

A 0 tag means the total must be even. If the sums do not work out, if this parity check gives the wrong answer, there is an error in transmission.

As an explanation, this is an extreme simplification of the parity-check equation, but it gives the basic idea.
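A sketch of that single parity tag in Python (the function names are my own invention):

    def add_parity_tag(segment):
        # The tag records whether the count of 1s in the segment is
        # odd (tag "1") or even (tag "0"), as described above.
        return segment + ("1" if segment.count("1") % 2 else "0")

    def check_parity(tagged):
        segment, tag = tagged[:-1], tagged[-1]
        return tag == ("1" if segment.count("1") % 2 else "0")

    word = add_parity_tag("0101001")    # three 1s, so the tag is "1"
    assert word == "01010011" and check_parity(word)
    assert not check_parity("00010011") # one bit flipped: the check fails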

Each of the error-correction algorithms takes this as a starting point and adds as many checks and balances as are needed to achieve the level of error correction required.

The normal error-correction code used with optical discs is Reed-Solomon, which was invented in the 1960s.

The mathematics used in that code is not your basic school sums. It is based on an exotic type of arithmetic called "finite fields", which was first postulated by a French mathematical genius, Évariste Galois.
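For the curious, a small taste of that arithmetic, sketched in Python. In the finite field GF(2^8), the one many Reed-Solomon implementations work in, addition is a simple exclusive-or and multiplication is reduced modulo a fixed irreducible polynomial (the polynomial 0x11B here is my assumption for illustration, not necessarily what any given disc drive uses):

    def gf256_mul(a, b, poly=0x11B):
        # Carry-less "Russian peasant" multiplication in GF(2**8):
        # shift and XOR, reducing modulo the chosen irreducible
        # polynomial whenever the product overflows eight bits.
        result = 0
        while b:
            if b & 1:
                result ^= a
            a <<= 1
            if a & 0x100:
                a ^= poly
            b >>= 1
        return result

    # Addition is XOR, so every element is its own negative: x + x = 0.
    assert 0x53 ^ 0x53 == 0
    print(hex(gf256_mul(0x53, 0xCA)))  # 0x1: these two elements are inverses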

Évariste shuffled off this mortal coil in 1832 when he was blown away in a duel. It was held just before his twenty-first birthday. The mathematics involved are elegant, intricate, and, to a lay mind, incomprehensible.

This matters not. What matters is that the algorithms work, and work well, in allowing you to cram even more information on to a disc.

Permanent link to this item

https://paperspast.natlib.govt.nz/newspapers/CHP19860701.2.141.1

Bibliographic details

Press, 1 July 1986, Page 26
