
sort (Unix)

From Wikipedia, the free encyclopedia
sort
Original author: Ken Thompson (AT&T Bell Laboratories)
Developers: Various open-source and commercial developers
Initial release: November 3, 1971
Written in: C
Operating system: Multics, Unix, Unix-like, V, Plan 9, Inferno, MSX-DOS, IBM i
Platform: Cross-platform
Type: Command
License: coreutils: GPLv3+; Plan 9: MIT License

In computing, sort is a standard command-line program of Unix and Unix-like operating systems that prints the lines of its input, or the concatenation of all files listed in its argument list, in sorted order. Sorting is based on one or more sort keys extracted from each line of input. By default, the entire line is taken as the sort key, and blank space is the default field separator. The command supports a number of command-line options that can vary by implementation; for instance, the "-r" flag reverses the sort order. Sort ordering is affected by the environment's locale settings.[1]
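
The effect of the locale and of "-r" can be illustrated with a minimal sketch. The variable names and sample letters are made up for this illustration; setting LC_ALL=C is one common way to force byte-wise comparison, in which uppercase letters sort before lowercase ones.

```shell
# Sample input: mixed-case letters, one per line (illustrative data).
input=$(printf 'b\nA\na\nB\n')

# LC_ALL=C forces byte-wise comparison: uppercase sorts before lowercase.
c_order=$(printf '%s\n' "$input" | LC_ALL=C sort)

# -r reverses the result of the comparisons.
c_reverse=$(printf '%s\n' "$input" | LC_ALL=C sort -r)

printf '%s\n' "$c_order"     # A, B, a, b (one per line)
printf '%s\n' "$c_reverse"   # b, a, B, A (one per line)
```

In other locales the relative order of uppercase and lowercase letters may differ, which is why scripts that need reproducible output often pin the locale explicitly.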

History


A sort command that invokes a general sort facility was first implemented within Multics.[2] Later, it appeared in Version 1 Unix. This version was originally written by Ken Thompson at AT&T Bell Laboratories. By Version 4 Thompson had modified it to use pipes, but sort retained an option to name the output file because it was used to sort a file in place. In Version 5, Thompson invented "-" to represent standard input.[3]

sort is part of X/Open Portability Guide Issue 2 (1987). From there it was inherited into POSIX.[4]

The version of sort bundled in GNU coreutils was written by Mike Haertel and Paul Eggert.[1] This implementation employs the mergesort algorithm. It offers an option to sort in parallel, though performance gain diminishes after 8 threads.[5] GNU parallel also provides a wrapper to perform parallel invocations of sort with similar performance-gain characteristics: on a 48-core system, the speedup is about 3×.[6]

The sort command has also been ported to the IBM i operating system, being accessible from the POSIX-compatible Qshell.[7]

Non-POSIX ports


Similar commands are available on many other operating systems; for example, a sort command is part of ASCII's MSX-DOS2 Tools for MSX-DOS version 2.[8]

The "uutils" project provides a cross-platform implementation of sort written in Rust, with support for all of GNU coreutils' options. It uses the par_sort_by or par_sort_unstable_by function of Rayon, the Rust multi-threading library, implementing either an adaptive mergesort inspired by timsort or a variation of pattern-defeating quicksort.[9]

Syntax

sort [OPTION]... [FILE]...

With no FILE, or when FILE is -, the command reads from standard input.
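
As a rough sketch of the two input modes, the following reads once from standard input alone and once from standard input combined with a named file. The scratch file and the fruit names are hypothetical, and mktemp is assumed to be available (it is common but not required by POSIX).

```shell
# Hypothetical scratch file used only for this illustration.
tmp=$(mktemp)
printf 'pear\napple\n' > "$tmp"

# With no FILE operand, sort reads standard input.
from_stdin=$(printf 'cherry\nbanana\n' | LC_ALL=C sort)

# "-" stands for standard input, so it can be combined with named files.
combined=$(printf 'cherry\nbanana\n' | LC_ALL=C sort - "$tmp")

printf '%s\n' "$from_stdin"   # banana, cherry
printf '%s\n' "$combined"     # apple, banana, cherry, pear
rm -f "$tmp"
```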

Parameters


In the table below, "Short" indicates support for only the one-letter (short) form of the option. Long options were originally a GNU extension and are not part of any version of SUS or POSIX. They have since also been adopted by FreeBSD.

Name Description SUS/POSIX Plan 9 Inferno FreeBSD Linux MSX-DOS IBM i
-b, --ignore-leading-blanks Ignores leading blanks. Short Short No Yes Yes No Short
-c, --check Check that input file is sorted. No Short No Yes Yes No Short
-C, --check=<silent|quiet> Like -c, but does not report the first bad line. No No No Yes Yes No No
-d, --dictionary-order Considers only blanks and alphanumeric characters. Short Short No Yes Yes No Short
-f, --ignore-case Fold lower case to upper case characters. Short Short No Yes Yes No Short
-g, --general-numeric-sort, --sort=general-numeric Compares according to general numerical value. Short Short No Yes Yes No No
-h, --human-numeric-sort, --sort=human-numeric Compare human readable numbers (e.g., 2K 1G). Short No No Yes Yes No No
-i, --ignore-nonprinting Considers only printable characters. Short Short No Yes Yes No Short
-k, --key=POS1[,POS2] Start a key at POS1 (origin 1), end it at POS2 (default end of line) No No No Yes Yes No No
-m Merge only; input files are assumed to be presorted. No Short No Yes Yes No Short
-M, --month-sort, --sort=month Compares (unknown) < 'JAN' < ... < 'DEC'. Short Short No Yes Yes No No
-n, --numeric-sort, --sort=numeric Compares according to string numerical value. Short Short Short Yes Yes No Short
-o OUTPUT Uses OUTPUT file instead of standard output. No Short No Yes Yes No Short
-r, --reverse Reverses the result of comparisons. Short Short Short Yes Yes No Short
-R, --random-sort, --sort=random Shuffles, but groups identical keys. See also: shuf No No No Yes Yes No No
-s Stabilizes sort by disabling last-resort comparison. No No No Yes Yes No No
-S size, --buffer-size=size Use size for the maximum size of the memory buffer. No No No Yes No No No
-t char, --field-separator=char Uses char instead of non-blank to blank transition. In other words, 'Tab character' separating fields is char. No Short No Yes Yes No Short
-T dir, --temporary-directory=dir Uses dir for temporaries. No Short No Yes Yes No No
-u, --unique Unique processing to suppress all but one in each set of lines having equal keys. No Short No Yes Yes No Short
-V, --version-sort Natural sort of (version) numbers within text No No No Yes Yes No No
-w Like -i, but ignore only tabs and spaces. No Yes No No No No No
-z, --zero-terminated End lines with 0 byte, not newline No No No Yes Yes No No
--help Display help and exit No No No Yes Yes No No
--version Output version information and exit No No No Yes Yes No No
/R Reverses the result of comparisons. No No No No No Yes No
/S Specify the number of digits to determine how many digits of each line should be judged. No No No No No Yes No
/A Sort by ASCII code. No No No No No Yes No
/H Include hidden files when using wild cards. No No No No No Yes No
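
Two of the options listed above, -u and -c, can be sketched as follows. The input data and variable names are invented for this illustration, and the behaviour shown is that of the short forms as commonly implemented; other implementations may differ in their diagnostics.

```shell
# -u keeps one line from each set of lines that compare equal.
unique=$(printf 'b\na\nb\na\n' | LC_ALL=C sort -u)
printf '%s\n' "$unique"   # a, b

# -c only checks the order and reports it through the exit status:
# zero if the input is already sorted, non-zero otherwise.
if printf 'a\nb\nc\n' | LC_ALL=C sort -c 2>/dev/null; then
    sorted_ok=yes
else
    sorted_ok=no
fi
if printf 'b\na\n' | LC_ALL=C sort -c 2>/dev/null; then
    unsorted_ok=yes
else
    unsorted_ok=no
fi
echo "$sorted_ok $unsorted_ok"   # yes no
```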

Examples


Sort a file in alphabetical order

$ cat phonebook
Smith, Brett     555-4321
Doe, John        555-1234
Doe, Jane        555-3214
Avery, Cory      555-4132
Fogarty, Suzie   555-2314
$ sort phonebook
Avery, Cory      555-4132
Doe, Jane        555-3214
Doe, John        555-1234
Fogarty, Suzie   555-2314
Smith, Brett     555-4321

Sort by number


The -n option makes the program sort according to numerical value. The du command produces output that starts with a number, the file size, so its output can be piped to sort to produce a list of files sorted by (ascending) file size:

$ du /bin/* | sort -n
4       /bin/domainname
24      /bin/ls
102     /bin/sh
304     /bin/csh
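
To see why -n matters, a small sketch can compare the default character-by-character ordering with numeric ordering on the same input. The three numbers and the variable names are illustrative only.

```shell
numbers=$(printf '10\n9\n100\n')

# Default comparison is character by character, so "10" and "100" sort before "9".
lexical=$(printf '%s\n' "$numbers" | LC_ALL=C sort)

# -n compares the lines by their numerical value instead.
numeric=$(printf '%s\n' "$numbers" | LC_ALL=C sort -n)

printf '%s\n' "$lexical"   # 10, 100, 9
printf '%s\n' "$numeric"   # 9, 10, 100
```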

The find command with the -ls option prints file sizes in the 7th field, so a list of the LaTeX files sorted by file size is produced by:

$ find . -name "*.tex" -ls | sort -k 7n

Columns or fields


Use the -k option to sort on a certain column. For example, use "-k 2" to sort on the second column. In old versions of sort, the +1 option made the program sort on the second column of data (+2 for the third, etc.). This usage is deprecated.

$ cat zipcode
Adam  12345
Bob   34567
Joe   56789
Sam   45678
Wendy 23456
$ sort -k 2n zipcode
Adam  12345
Wendy 23456
Bob   34567
Sam   45678
Joe   56789

Sort on multiple fields


The -k m,n option lets you sort on a key that is potentially composed of multiple fields (start at column m, end at column n):

$ cat quota
fred 2000
bob 1000
an 1000
chad 1000
don 1500
eric 500
$ sort -k2,2n -k1,1 quota
eric 500
an 1000
bob 1000
chad 1000
don 1500
fred 2000

Here the first sort is done using column 2. -k2,2n specifies sorting on the key starting and ending with column 2, and sorting numerically. If -k2 is used instead, the sort key begins at column 2 and extends to the end of the line, spanning all the fields in between. -k1,1 dictates breaking ties using the value in column 1, sorting alphabetically by default. Note that an, bob, and chad have the same quota and are sorted alphabetically in the final output.
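
The difference between a bounded key (-k2,2) and an open-ended key (-k2) can be sketched with a made-up three-column input in which the second column ties. With -k2,2 the tie falls through to the whole-line comparison of last resort; with -k2 the third column becomes part of the key.

```shell
# Illustrative data: column 2 is identical, columns 1 and 3 disagree on order.
data=$(printf 'x 10 b\ny 10 a\n')

# Key is column 2 only; the tie is broken by comparing the whole lines.
bounded=$(printf '%s\n' "$data" | LC_ALL=C sort -k2,2)

# Key runs from column 2 to the end of the line, so column 3 decides.
open_ended=$(printf '%s\n' "$data" | LC_ALL=C sort -k2)

printf '%s\n' "$bounded"      # x 10 b, then y 10 a
printf '%s\n' "$open_ended"   # y 10 a, then x 10 b
```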

Sorting a pipe delimited file

$ sort -k2,2 -k1,1 -t'|' zipcode
Adam|12345
Wendy|23456
Bob|34567
Sam|45678
Joe|56789

Sorting a tab delimited file


Sorting a file with tab separated values requires a tab character to be specified as the column delimiter. This illustration uses the shell's dollar-quote notation[10][11] to specify the tab as a C escape sequence.

$ sort -k2,2 -t $'\t' phonebook 
Doe, John	555-1234
Fogarty, Suzie	555-2314
Doe, Jane	555-3214
Avery, Cory	555-4132
Smith, Brett	555-4321
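
In shells that lack the $'...' notation, one possible workaround is to capture a literal tab with printf and pass it to -t. This is a sketch under that assumption; the variable names are hypothetical and the three records are an abbreviated copy of the phonebook data.

```shell
# Capture a literal tab character; command substitution strips only trailing newlines.
tab=$(printf '\t')

# Sort tab-separated records on the second field (the phone number).
sorted=$(printf 'Doe, John\t555-1234\nAvery, Cory\t555-4132\nDoe, Jane\t555-3214\n' |
    LC_ALL=C sort -t "$tab" -k2,2)

printf '%s\n' "$sorted"   # Doe, John / Doe, Jane / Avery, Cory
```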

Sort in reverse


The -r option just reverses the order of the sort:

$ sort -rk 2n zipcode
Joe   56789
Sam   45678
Bob   34567
Wendy 23456
Adam  12345

Sort in random


The GNU implementation has a -R (--random-sort) option based on hashing the sort keys; this is not a full random shuffle because identical lines are always sorted together. A true random shuffle is provided by the shuf utility.
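
The grouping behaviour can be checked with a short sketch. It assumes a sort that supports -R (such as the GNU implementation); the order of the groups varies from run to run, but identical lines stay adjacent, so uniq collapses them to one line per distinct value.

```shell
# Five lines with only two distinct values (illustrative data).
shuffled=$(printf 'a\nb\na\nb\na\n' | LC_ALL=C sort -R)

# Because equal lines are grouped, uniq leaves exactly one line per value.
groups=$(printf '%s\n' "$shuffled" | uniq | wc -l)

printf '%s\n' "$shuffled"   # either a,a,a,b,b or b,b,a,a,a
echo "$groups"              # 2
```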

Sort by version


The GNU implementation has a -V --version-sort option which is a natural sort of (version) numbers within text. Two text strings that are to be compared are split into blocks of letters and blocks of digits. Blocks of letters are compared alpha-numerically, and blocks of digits are compared numerically (i.e., skipping leading zeros, more digits means larger, otherwise the leftmost digits that differ determine the result). Blocks are compared left-to-right and the first non-equal block in that loop decides which text is larger. This happens to work for IP addresses, Debian package version strings and similar tasks where numbers of variable length are embedded in strings.
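
A minimal sketch of the difference between the default ordering and version ordering, assuming a sort that supports -V (such as the GNU implementation). The package names are invented for this illustration.

```shell
versions=$(printf 'pkg-1.10\npkg-1.2\npkg-1.9\n')

# Default comparison is character by character: "1.10" sorts before "1.2".
plain=$(printf '%s\n' "$versions" | LC_ALL=C sort)

# -V compares the digit blocks numerically: 2 < 9 < 10.
natural=$(printf '%s\n' "$versions" | LC_ALL=C sort -V)

printf '%s\n' "$plain"     # pkg-1.10, pkg-1.2, pkg-1.9
printf '%s\n' "$natural"   # pkg-1.2, pkg-1.9, pkg-1.10
```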


References

  1. ^ a b sort(1) – Linux User Manual – User Commands from Manned.org
  2. ^ "Multics Commands". www.multicians.org.
  3. ^ McIlroy, M. D. (1987). A Research Unix reader: annotated excerpts from the Programmer's Manual, 1971–1986 (PDF) (Technical report). CSTR. Bell Labs. 139.
  4. ^ sort – Shell and Utilities Reference, The Single UNIX Specification, Version 5 from The Open Group
  5. ^ "Sort invocation (GNU Coreutils 9.8)".
  6. ^ "NAME — GNU Parallel 20250922 documentation". www.gnu.org.
  7. ^ IBM. "IBM System i Version 7.2 Programming Qshell" (PDF). IBM. Retrieved 2020-09-05.
  8. ^ "MSX-DOS2 Tools User's Manual - MSX-DOS2 TOOLS ユーザーズマニュアル". April 1, 1993 – via Internet Archive.
  9. ^ "ParallelSliceMut in rayon::slice - Rust". docs.rs.
  10. ^ "The GNU Bash Reference Manual, for Bash, Version 4.2: Section 3.1.2.4 ANSI-C Quoting". Free Software Foundation, Inc. 28 December 2010. Retrieved 1 February 2013. Words of the form $'string' are treated specially. The word expands to string, with backslash-escaped characters replaced as specified by the ANSI C standard.
  11. ^ Fowler, Glenn S.; Korn, David G.; Vo, Kiem-Phong. "KornShell FAQ". Archived from the original on 2013-05-27. Retrieved 3 March 2015. The $'...' string literal syntax was added to ksh93 to solve the problem of entering special characters in scripts. It uses ANSI-C rules to translate the string between the '...'.
