Next 15.F: Comparing Files with diff and comp  Up 15: File Utilities  Prev 15.F: Finding Files with find  Contents

§ 15.H: Sorting Files with sort


The sort command sorts the lines of one or more files together and can also eliminate duplicate lines. In its simplest form, you give the command sort followed by a filename. This sorts the lines of the file in alphabetic order, comparing them character by character from left to right (the exact ordering of upper case, lower case, and punctuation depends on your locale settings). Try sorting the file sample.doc.
> sort sample.doc > sample.sort

Now use the more or view command to see the changes.
> more sample.sort

Notice that all of the blank lines have been moved to the top of the file, so that the first page is (probably) blank. This happens because an empty line sorts before any line that contains text.
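
The blank-line behavior can be reproduced on a small scale. The following is only a sketch: the file names demo.txt and demo.sort are made up for illustration, and LC_ALL=C is set so that the comparison is by plain byte order regardless of your locale.

```shell
# Create a small hypothetical file with a blank line in the middle.
printf 'pear\n\napple\nbanana\n' > demo.txt

# LC_ALL=C forces byte-order comparison, so the result does not
# depend on the locale settings of your account.
LC_ALL=C sort demo.txt > demo.sort

# Display the sorted copy; the blank line is now the first line.
cat demo.sort
```

The original demo.txt is left unchanged, because sort writes its result to standard output and the shell redirects that output to demo.sort.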

A useful option for the sort command is -u, which eliminates duplicate lines. The data file bugs.dat contains raw data with a large number of duplicates. By using the -u option to delete the duplicate lines, we can substantially decrease the load on the printer and the amount of disk space used.
> sort -u bugs.dat > bugs.sort

Use the ls command to see how much smaller the sorted file is.
> ls -l bugs.*

Once you are sure you no longer need the original, use the rm command to remove the large file bugs.dat.
> rm bugs.dat

Using sort to remove redundant data can save system resources and run time, especially if you are running a graphical program such as xmgr.
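
The effect of the -u option can be checked on a small file before trying it on real data. This is a minimal sketch; demo.dat and demo.uniq are hypothetical names chosen for the example, not files from the course.

```shell
# Build a small hypothetical data file that contains repeated lines.
printf 'ant\nbee\nant\nwasp\nbee\nant\n' > demo.dat

# Sort it and keep only one copy of each distinct line.
LC_ALL=C sort -u demo.dat > demo.uniq

# Compare the number of lines before and after.
wc -l demo.dat demo.uniq

# Show the result: each distinct line appears once, in sorted order.
cat demo.uniq
```

Note that the output is sorted as well as de-duplicated, so the original order of the lines is not preserved.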

Like the other utilities, sort has a variety of options, including options that define the fields or columns to sort on and the type of sort to perform. Some of the more common options are listed in Coping with Unix. You can find out about all of them by reading the man page (man sort).
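
As an illustration of the field and sort-type options, the sketch below uses the standard -k option to select a field and the n and r modifiers to request a numeric and a reversed sort. The file scores.txt and its contents are invented for the example.

```shell
# A hypothetical two-column file: a name and a numeric score.
printf 'carol 7\nalice 42\nbob 9\n' > scores.txt

# -k 2,2 restricts the sort key to the second field;
# the n modifier compares that field as a number.
LC_ALL=C sort -k 2,2n scores.txt > by_score.txt

# Adding the r modifier reverses the order (largest score first).
LC_ALL=C sort -k 2,2nr scores.txt > by_score_desc.txt

# Without n, the field is compared as text, so 42 sorts before 7.
LC_ALL=C sort -k 2,2 scores.txt > by_text.txt

cat by_score.txt
```

The numeric sort lists the scores as 7, 9, 42, while the plain text sort on the same field puts 42 first. This is why the n modifier matters whenever a column holds numbers.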



Comments and questions to Dr. Elias N. Houstis at enh@cs.purdue.edu.