File compression is a method computers use to reduce the size of an electronic file or group of files. ZIP files are a common example of file compression. There are many techniques computer programs use to compress files, but the most common is to replace repeated sequences with shorter ones. The program does this by cataloging the first instance of a sequence and referring back to it in subsequent instances. For example, in the sentences before this one, the letters “compress” have appeared three times. A computer might shrink these by cataloging the first instance and replacing each subsequent instance with “&cmp”, which reduces the number of letters, or characters, required and therefore the file size. Over the course of a large file, such replacement can save a great deal of space. The user does not see any of this, however; the computer does all the work in the background.
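The replacement idea described above can be sketched in a few lines of Python. This is only an illustration under simple assumptions: the function names `compress` and `decompress` are made up for this example, the phrase to replace is chosen by hand, and the token “&cmp” is assumed not to occur anywhere in the original text. Real compression programs find repeats automatically and work on bytes rather than words.

```python
def compress(text: str, phrase: str, token: str) -> str:
    """Keep the first occurrence of `phrase`; replace later ones with `token`."""
    if token in text:
        # The toy scheme only works if the token is not already in the text.
        raise ValueError("token already appears in the text")
    first = text.find(phrase)
    if first == -1:
        return text  # nothing to replace
    split = first + len(phrase)
    return text[:split] + text[split:].replace(phrase, token)


def decompress(packed: str, phrase: str, token: str) -> str:
    """Undo compress() by expanding every token back into the phrase."""
    return packed.replace(token, phrase)


sample = (
    "File compression is common. ZIP files are an example of file compression. "
    "Programs compress files in many ways."
)
packed = compress(sample, "compress", "&cmp")
restored = decompress(packed, "compress", "&cmp")
print(len(sample), len(packed), restored == sample)
```

The packed text is shorter than the original, and expanding the token again gives back exactly the text that went in.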
To keep track of the abbreviated replacements, computer programs also create a dictionary for the compressed file. In the world of file compression, a dictionary is a list of the original phrases and their replacements. The dictionary is stored with the compressed file, however, so it takes up space of its own and reduces the amount of space saved by compression. For this reason, compressing a single small file usually saves proportionally less than compressing a large file or a group of files.
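The cost of the dictionary can be made concrete with a small calculation. The sketch below assumes a hypothetical storage layout in which each dictionary entry is written as `token=phrase;` next to the compressed body; the function name `packed_size` and the token “&1” are invented for the illustration, and real formats store this information far more compactly.

```python
def packed_size(text: str, dictionary: dict) -> int:
    """Size of the substituted body plus the dictionary stored with it."""
    body = text
    for phrase, token in dictionary.items():
        body = body.replace(phrase, token)
    # Assumed layout: each entry costs len(token) + len(phrase) + 2
    # characters, the extra 2 being the "=" and ";" separators.
    overhead = sum(len(token) + len(phrase) + 2
                   for phrase, token in dictionary.items())
    return len(body) + overhead


dictionary = {"it was the ": "&1"}

small = "it was the best of times"
large = "it was the best of times, it was the worst of times. " * 100

small_packed = packed_size(small, dictionary)
large_packed = packed_size(large, dictionary)
print(len(small), small_packed)
print(len(large), large_packed)
```

Under these assumptions the small text grows from 24 to 30 characters, because the dictionary entry costs more than the single replacement saves, while the large text shrinks from 5300 to 3515 characters because the same entry is reused 200 times.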
Programs that perform file compression operate in slightly different ways. These differences come mostly from the algorithm each program uses. In this context, an algorithm is the set of rules a program follows to search files for repeated sequences and substitute shorter references. Programmers create the compression algorithms and define the patterns and replacements to be used for compression. These choices can affect the efficiency of the compression. For example, in the phrase “it was the best of times, it was the worst of times,” the sequences “it was the” and “st of times” are repeated. A programmer may choose to replace each entire phrase, or perhaps to replace “it” and “of times” individually. Such slight differences in algorithms are why some file compression programs work better on some file types than others.
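One way to see that algorithms differ is to run the same data through several of them. The sketch below uses three compression modules from the Python standard library (`zlib`, `bz2`, and `lzma`) on a highly repetitive text and on pseudo-random bytes of the same length. The helper name `compressed_sizes` is made up for this example, and the exact sizes printed will depend on the library versions installed, so none are quoted here.

```python
import bz2
import lzma
import random
import zlib

# Each entry pairs a compress function with its matching decompress function.
CODECS = {
    "zlib": (zlib.compress, zlib.decompress),
    "bz2": (bz2.compress, bz2.decompress),
    "lzma": (lzma.compress, lzma.decompress),
}


def compressed_sizes(data: bytes) -> dict:
    """Return {codec name: compressed size in bytes}, checking each round trip."""
    sizes = {}
    for name, (compress, decompress) in CODECS.items():
        packed = compress(data)
        if decompress(packed) != data:
            raise RuntimeError(f"{name} did not round-trip")
        sizes[name] = len(packed)
    return sizes


text = b"it was the best of times, it was the worst of times, " * 200
rng = random.Random(0)  # fixed seed so the run is repeatable
noise = bytes(rng.getrandbits(8) for _ in range(len(text)))

text_sizes = compressed_sizes(text)
noise_sizes = compressed_sizes(noise)
for name in CODECS:
    print(name, len(text), text_sizes[name], noise_sizes[name])
```

The repetitive text shrinks dramatically under all three, each by a different amount, while the random bytes offer no repeats to exploit and barely shrink at all.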
Some types of compressed files are self-extracting, meaning they open automatically when a user clicks on them, and no external program is required to uncompress them. To make these files, the original compression program has to add extra code to the compressed file. For this reason, a self-extracting compressed file is usually larger than the same file compressed normally. This is typically considered a good trade-off for companies that release programs to be installed on home computers. Making the file self-extracting lets the home user skip a step, which makes it more likely the software will be installed. To uncompress files that are not self-extracting, a number of third-party programs are available for download on the Internet.
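The size cost of self-extraction can be imitated with Python's standard `zipfile` module, which is able to read a ZIP archive that has other data placed in front of it. In the sketch below the "extractor" is only a placeholder block of bytes, not a working program, and the names `make_zip`, `stub`, and `tale.txt` are invented for the illustration; a real self-extracting archive carries an actual executable in that position.

```python
import io
import zipfile


def make_zip(name: str, payload: bytes) -> bytes:
    """Build a one-file ZIP archive in memory and return its raw bytes."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        archive.writestr(name, payload)
    return buffer.getvalue()


payload = b"it was the best of times, it was the worst of times, " * 200
plain = make_zip("tale.txt", payload)

# Hypothetical stand-in for the extractor code that a self-extracting
# archive carries in front of the compressed data.
stub = b"#!/usr/bin/env extractor-placeholder\n" + b"\0" * 4096
self_extracting = stub + plain

# The archive is still readable despite the extra bytes in front of it.
with zipfile.ZipFile(io.BytesIO(self_extracting)) as archive:
    restored = archive.read("tale.txt")

print(len(plain), len(self_extracting), restored == payload)
```

The combined file is larger than the plain archive by exactly the size of the added stub, which is the trade-off the paragraph above describes.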