Sorting in Emacs
In this article, we will perform a series of hands-on experiments that demonstrate the Emacs commands that can be used to sort text in different ways. These commands are well documented in the Emacs and Elisp manuals. Here, however, we will look at concrete examples to illustrate how they work.
Sorting Lines
Our first set of experiments demonstrates different ways to sort lines. Follow the steps below to perform these experiments.
-
First create a buffer that has the following text:

    Carol  200  London  LHR->SFO
    Dan    20   Tokyo   HND->LHR
    Bob    100  London  LCY->CDG
    Alice  10   Paris   CDG->LHR
    Bob    30   Paris   ORY->HND

Let us pretend that each line is a record that represents some details about different persons. From left to right, we have each person's name, some sort of numerical ID, their current location, and their upcoming travel plan. For example, the first line says that Carol from London is planning to travel from London Heathrow (LHR) to San Francisco (SFO).
-
Type C-x h to mark the whole buffer and type M-x sort-lines RET to sort lines alphabetically. The buffer looks like this now:

    Alice  10   Paris   CDG->LHR
    Bob    100  London  LCY->CDG
    Bob    30   Paris   ORY->HND
    Carol  200  London  LHR->SFO
    Dan    20   Tokyo   HND->LHR
-
Type C-x h followed by C-u M-x sort-lines RET to reverse sort lines alphabetically. The key sequence C-u specifies a prefix argument that indicates that a reverse sort must be performed. The buffer looks like this now:

    Dan    20   Tokyo   HND->LHR
    Carol  200  London  LHR->SFO
    Bob    30   Paris   ORY->HND
    Bob    100  London  LCY->CDG
    Alice  10   Paris   CDG->LHR
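The two whole-line sorts above can be modelled outside Emacs. The Python sketch below is only an illustration of the resulting order, not how Emacs implements sort-lines. The records list is simply the example buffer typed in as strings, and the variable names are made up for this example.

```python
# The example buffer, one string per line (columns padded with spaces).
records = [
    "Carol  200  London  LHR->SFO",
    "Dan    20   Tokyo   HND->LHR",
    "Bob    100  London  LCY->CDG",
    "Alice  10   Paris   CDG->LHR",
    "Bob    30   Paris   ORY->HND",
]

# Model of M-x sort-lines: compare entire lines as text.
ascending = sorted(records)

# Model of C-u M-x sort-lines: the same comparison, in reverse order.
descending = sorted(records, reverse=True)

for line in ascending:
    print(line)
```

Because whole lines are compared, the ID column breaks the tie between the two Bob lines: the character 1 sorts before 3, so the line with 100 comes first in the ascending result.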
-
Type C-x h followed by M-x sort-fields RET to sort the lines by the first field only. Fields are separated by whitespace. Note that the result now is slightly different from the result of M-x sort-lines RET presented in point 2 earlier. Here Bob from Paris comes before Bob from London because the sorting was performed by the first field only. The sorting algorithm ignored the rest of each line, so the two Bob lines, which have equal keys, kept the relative order they had before this command. However, in point 2 earlier, Bob from London came before Bob from Paris because the sorting was performed by entire lines. The result looks like this:

    Alice  10   Paris   CDG->LHR
    Bob    30   Paris   ORY->HND
    Bob    100  London  LCY->CDG
    Carol  200  London  LHR->SFO
    Dan    20   Tokyo   HND->LHR
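The difference between sorting by entire lines and sorting by the first field can be reproduced with a small Python model. This is a sketch, not Emacs code. It assumes that lines with equal keys keep their existing order, which is consistent with the result shown above. The helper name field is hypothetical.

```python
# Buffer as left by the reverse sort in the previous step.
records = [
    "Dan    20   Tokyo   HND->LHR",
    "Carol  200  London  LHR->SFO",
    "Bob    30   Paris   ORY->HND",
    "Bob    100  London  LCY->CDG",
    "Alice  10   Paris   CDG->LHR",
]


def field(line, n):
    """Return the n-th whitespace-separated field of line (1-based)."""
    return line.split()[n - 1]


# Model of M-x sort-fields with no argument: only field 1 is the key.
# Python's sorted() is stable, so the two "Bob" lines keep their
# existing relative order (30 above 100).
by_name = sorted(records, key=lambda line: field(line, 1))

# Model of M-x sort-lines: the whole line is the key, so the ID column
# breaks the tie and 100 ends up above 30.
by_line = sorted(records)

for line in by_name:
    print(line)
```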
-
Type C-x h followed by M-2 M-x sort-fields RET to sort the lines alphabetically by the second field. The key sequence M-2 here specifies a numeric argument that identifies the field we want to sort by. Note that 100 comes before 20 because we performed an alphabetical sort, not a numerical sort. The result looks like this:

    Alice  10   Paris   CDG->LHR
    Bob    100  London  LCY->CDG
    Dan    20   Tokyo   HND->LHR
    Carol  200  London  LHR->SFO
    Bob    30   Paris   ORY->HND
-
Type C-x h followed by M-2 M-x sort-numeric-fields RET to sort the lines numerically by the second field. The result looks like this:

    Alice  10   Paris   CDG->LHR
    Dan    20   Tokyo   HND->LHR
    Bob    30   Paris   ORY->HND
    Bob    100  London  LCY->CDG
    Carol  200  London  LHR->SFO
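The contrast between the alphabetical and numerical sorts on the second field comes down to whether the key is compared as text or as a number. A minimal Python sketch of that idea follows. It models the outcome only and is not how sort-fields or sort-numeric-fields are implemented.

```python
# Buffer as it looked after sorting by the first field.
records = [
    "Alice  10   Paris   CDG->LHR",
    "Bob    30   Paris   ORY->HND",
    "Bob    100  London  LCY->CDG",
    "Carol  200  London  LHR->SFO",
    "Dan    20   Tokyo   HND->LHR",
]

# Key compared as text: "100" < "20" because "1" sorts before "2".
alphabetical = sorted(records, key=lambda line: line.split()[1])

# Key compared as a number: 20 < 100.
numerical = sorted(records, key=lambda line: int(line.split()[1]))

print("100" < "20")  # text comparison
print(100 < 20)      # numeric comparison
```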
-
Type C-x h followed by M-3 M-x sort-fields RET to sort the lines alphabetically by the third field containing city names. The result looks like this:

    Bob    100  London  LCY->CDG
    Carol  200  London  LHR->SFO
    Alice  10   Paris   CDG->LHR
    Bob    30   Paris   ORY->HND
    Dan    20   Tokyo   HND->LHR
Note that we cannot supply the prefix argument C-u to this command to perform a reverse sort by a specific field because the prefix argument here is used to identify the field we need to sort by. If we do specify the prefix argument C-u, it would be treated as the numeric argument 4, which would sort the lines by the fourth field. However, there is a little trick to reverse sort lines by a specific field. The next point shows this.
-
Type C-x h followed by M-x reverse-region RET. This reverses the order of lines in the region. Combined with the previous command, this effectively reverse sorts the lines by city names. The result looks like this:

    Dan    20   Tokyo   HND->LHR
    Bob    30   Paris   ORY->HND
    Alice  10   Paris   CDG->LHR
    Carol  200  London  LHR->SFO
    Bob    100  London  LCY->CDG
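The two-step trick of sorting by a field and then reversing the region can be sketched in Python as follows. This models the outcome only. The names city, by_city, and by_city_reversed are invented for this illustration.

```python
# Buffer as left by the numeric sort on the second field.
records = [
    "Alice  10   Paris   CDG->LHR",
    "Dan    20   Tokyo   HND->LHR",
    "Bob    30   Paris   ORY->HND",
    "Bob    100  London  LCY->CDG",
    "Carol  200  London  LHR->SFO",
]


def city(line):
    """Third whitespace-separated field of a record."""
    return line.split()[2]


# Model of M-3 M-x sort-fields: ascending sort keyed on the city only.
by_city = sorted(records, key=city)

# Model of M-x reverse-region: flip the order of the lines.
by_city_reversed = by_city[::-1]

for line in by_city_reversed:
    print(line)
```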
-
Type C-x h followed by M-- M-2 M-x sort-fields RET to sort the lines alphabetically by the second field from the right (third from the left). Note that the first two key combinations are meta+- and meta+2. They specify the negative argument -2 to sort the lines by the second field from the right. The result looks like this:

    Carol  200  London  LHR->SFO
    Bob    100  London  LCY->CDG
    Bob    30   Paris   ORY->HND
    Alice  10   Paris   CDG->LHR
    Dan    20   Tokyo   HND->LHR
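Counting fields from the right with a negative argument has a close analogue in Python's negative list indices. The sketch below is a model of the result above, not of the Emacs implementation. It again assumes that lines with equal keys keep their existing order.

```python
# Buffer as left by M-x reverse-region in the previous step.
records = [
    "Dan    20   Tokyo   HND->LHR",
    "Bob    30   Paris   ORY->HND",
    "Alice  10   Paris   CDG->LHR",
    "Carol  200  London  LHR->SFO",
    "Bob    100  London  LCY->CDG",
]

# Index -2 picks the second field from the right, which is the city.
by_second_from_right = sorted(records, key=lambda line: line.split()[-2])

for line in by_second_from_right:
    print(line)
```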
-
Type M-< to move the point to the beginning of the buffer. Then type C-s London RET followed by M-b to move the point to the beginning of the word London on the first line. Now type C-SPC to set a mark there.

Then type C-4 C-n C-e to move the point to the end of the last line. An active region should be visible in the buffer now.

Finally type M-x sort-columns RET to sort the columns bounded by the column positions of mark and point (i.e., the last two columns). The result looks like this:

    Bob    100  London  LCY->CDG
    Carol  200  London  LHR->SFO
    Alice  10   Paris   CDG->LHR
    Bob    30   Paris   ORY->HND
    Dan    20   Tokyo   HND->LHR
-
Like before, type M-< to move the point to the beginning of the buffer. Then type C-s London RET followed by M-b to move the point to the beginning of the word London on the first line. Now type C-SPC to set a mark there.

Again, like before, type C-4 C-n C-e to move the point to the end of the last line. An active region should be visible in the buffer now.

Now type C-u M-x sort-columns RET to reverse sort the last two columns. The result looks like this:

    Dan    20   Tokyo   HND->LHR
    Bob    30   Paris   ORY->HND
    Alice  10   Paris   CDG->LHR
    Carol  200  London  LHR->SFO
    Bob    100  London  LCY->CDG
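Sorting by columns depends on character positions rather than on whitespace-separated fields, which is why the records need to be aligned. A rough Python model follows, assuming the two-space column padding used in the listing below. The variables start and end stand in for the column positions of mark and point.

```python
# Buffer as it looked before the first sort-columns experiment.
records = [
    "Carol  200  London  LHR->SFO",
    "Bob    100  London  LCY->CDG",
    "Bob    30   Paris   ORY->HND",
    "Alice  10   Paris   CDG->LHR",
    "Dan    20   Tokyo   HND->LHR",
]

# Mark at the start of "London" on the first line, point at the end
# of the last line.
start = records[0].index("London")
end = len(records[-1])


def column_key(line):
    """Text between the two column positions: city plus route."""
    return line[start:end]


# Model of M-x sort-columns, then C-u M-x sort-columns on its result.
ascending = sorted(records, key=column_key)
descending = sorted(ascending, key=column_key, reverse=True)

for line in ascending:
    print(line)
```

With unaligned records the same slice would cut through different fields on different lines, so the keys would no longer line up with the city and route columns.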
-
Warning: This step shows how not to use the sort-regexp-fields command. In most cases you probably do not want to do this. The next point shows a typical usage of this command that is correct in most cases.

Type C-x h followed by M-x sort-regexp-fields RET [A-Z]*->\(.*\) RET \1 RET to sort by the destination airport. This command first matches the destination airport in each line in a regular expression capturing group (\(.*\)). Then we ask this command to sort the lines by the field matched by this capturing group (\1). The result looks like this:

    Dan    20   Tokyo   LCY->CDG
    Bob    30   Paris   ORY->HND
    Alice  10   Paris   HND->LHR
    Carol  200  London  CDG->LHR
    Bob    100  London  LHR->SFO

Observe how all our travel records are messed up in this result. Now Dan from Tokyo is travelling from LCY to CDG instead of travelling from HND to LHR. Compare the result in this point with that of the previous point. The command has sorted the destination fields correctly, and it has kept each source airport together with its destination airport. But the association between the first three columns and the last column (source and destination airports) is broken.

This happened because the regular expression matches only the last column. Only the part of each line that is matched by the regular expression moves around while the sorting is performed; everything else remains unchanged. This behaviour may be useful in some limited situations, but in most cases we want to keep the association between all the fields intact. The next point shows how to do this.
Now type C-/ (or C-x u) to undo this change and revert the buffer to the previous good state. After doing this, the buffer should look like the result presented in the previous point.
-
Assuming the state of the buffer is the same as the result in point 11, we will now see how to alter the previous step such that when we sort the lines by the destination field, entire lines move along with the destination fields. The trick is to ensure that the regular expression matches entire lines. To do so, we make a minor change in the regular expression. Type C-x h followed by M-x sort-regexp-fields RET .*->\(.*\) RET \1 RET. The result looks like this:

    Bob    100  London  LCY->CDG
    Bob    30   Paris   ORY->HND
    Dan    20   Tokyo   HND->LHR
    Alice  10   Paris   CDG->LHR
    Carol  200  London  LHR->SFO

Now the lines are sorted by the destination field and Dan from Tokyo is travelling from HND to LHR.
-
Type C-x h followed by M-- M-x sort-regexp-fields RET .*->\(.*\) RET \1 RET to reverse sort the lines by the destination airport. Note that the first key combination is meta+- here. This key combination specifies a negative argument that results in a reverse sort. The result looks like this:

    Carol  200  London  LHR->SFO
    Dan    20   Tokyo   HND->LHR
    Alice  10   Paris   CDG->LHR
    Bob    30   Paris   ORY->HND
    Bob    100  London  LCY->CDG
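The three sort-regexp-fields experiments above hinge on one rule: only the text matched by the record regular expression moves, and everything else stays in place. The Python sketch below is a rough model of that rule, not the Emacs implementation. The function sort_regexp_fields here is a hypothetical stand-in that always uses capturing group 1 as the key. The patterns are written in Python syntax, so the group is (.*) rather than \(.*\).

```python
import re

# Buffer as it looked at the end of point 11.
records = [
    "Dan    20   Tokyo   HND->LHR",
    "Bob    30   Paris   ORY->HND",
    "Alice  10   Paris   CDG->LHR",
    "Carol  200  London  LHR->SFO",
    "Bob    100  London  LCY->CDG",
]


def sort_regexp_fields(text, record_regexp, reverse=False):
    """Reorder only the parts of text matched by record_regexp.

    Each match is a record and group 1 of the match is its sort key.
    Text outside the matches stays where it is.
    """
    matches = list(re.finditer(record_regexp, text))
    ordered = sorted(matches, key=lambda m: m.group(1), reverse=reverse)
    pieces, pos = [], 0
    for slot, replacement in zip(matches, ordered):
        pieces.append(text[pos:slot.start()])  # untouched text
        pieces.append(replacement.group(0))    # record moved into this slot
        pos = slot.end()
    pieces.append(text[pos:])
    return "".join(pieces)


text = "\n".join(records)

# Record is only the route: routes move, names and cities stay put.
partial = sort_regexp_fields(text, r"[A-Z]*->(.*)")

# Record is the whole line: entire lines move with their destination.
whole = sort_regexp_fields(text, r".*->(.*)")

# Negative argument: the same sort in reverse order.
whole_reversed = sort_regexp_fields(whole, r".*->(.*)", reverse=True)

print(partial)
```

In the first call each match covers only the last column, so the first three columns never move. In the other two calls each match covers a full line, so the records stay intact.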
-
Finally, note that we can always invoke shell commands on a region and replace the region with the output of the shell command. To see this in action, first prepare the buffer by typing M-< followed by C-k C-k C-y C-y to duplicate the first line of the buffer.

Then type C-x h followed by C-u M-| sort -u RET to sort the lines but remove duplicate lines during the sort operation. The M-| key sequence invokes the command shell-command-on-region which prompts for a shell command, executes it, and usually displays the output in the echo area. If the output cannot fit in the echo area, then it displays the output in a separate buffer. However, if a prefix argument is supplied, say with C-u, then it replaces the region with the output. As a result, the buffer now looks like this:

    Alice  10   Paris   CDG->LHR
    Bob    100  London  LCY->CDG
    Bob    30   Paris   ORY->HND
    Carol  200  London  LHR->SFO
    Dan    20   Tokyo   HND->LHR
This particular problem of removing duplicates while sorting can also be accomplished by typing C-x h followed by M-x sort-lines RET and then C-x h followed by M-x delete-duplicate-lines RET. Nevertheless, it is useful to know that we can execute arbitrary shell commands on a region.
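The two routes to a sorted buffer without duplicates can be compared with a short Python model. This sketch does not call the shell or Emacs. It assumes plain character-by-character ordering, whereas the output of a real sort -u may depend on the locale.

```python
# Buffer after duplicating the first line with C-k C-k C-y C-y.
records = [
    "Carol  200  London  LHR->SFO",
    "Carol  200  London  LHR->SFO",
    "Dan    20   Tokyo   HND->LHR",
    "Alice  10   Paris   CDG->LHR",
    "Bob    30   Paris   ORY->HND",
    "Bob    100  London  LCY->CDG",
]

# Model of "sort -u": sort and keep one copy of each distinct line.
sort_u = sorted(set(records))

# Model of sort-lines followed by delete-duplicate-lines: sort first,
# then drop repeats while keeping the first occurrence of each line.
sort_then_dedupe = list(dict.fromkeys(sorted(records)))

print(sort_u == sort_then_dedupe)
```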
Sorting Paragraphs and Pages
We have covered most of the sorting commands mentioned in the Emacs manual in the previous section. Now we will switch gears and discuss a few of the remaining ones. We will no longer sort individual lines but paragraphs and pages instead.
-
First create a buffer with the content provided below. Note that the text below contains three form feed characters. In Emacs, they are displayed as ^L. Many web browsers do not display form feed characters, so they are shown as ^L in the text below. If the form feed characters do not appear when you copy the text into Emacs, insert them yourself by typing C-q C-l.

    Emacs is an advanced, extensible, customisable, self-documenting editor.

    Emacs editing commands operate in terms of characters, words, lines, sentences, paragraphs, pages, expressions, comments, etc.
    ^L
    We will use the term frame to mean a graphical window or terminal screen occupied by Emacs.

    At the very bottom of the frame is an echo area. The main area of the frame, above the echo area, is called the window.
    ^L
    The cursor in the selected window shows the location where most editing commands take effect, which is called point.

    If you are editing several files in Emacs, each in its own buffer, each buffer has its own value of point.
    ^L
-
Our text has six paragraphs spread across three pages. Each form feed character represents a page break. Type C-x h followed by M-x sort-pages RET to sort the pages alphabetically. Note how the second page moves to the bottom because it begins with the letter "W". The buffer looks like this now:

    Emacs is an advanced, extensible, customisable, self-documenting editor.

    Emacs editing commands operate in terms of characters, words, lines, sentences, paragraphs, pages, expressions, comments, etc.
    ^L
    The cursor in the selected window shows the location where most editing commands take effect, which is called point.

    If you are editing several files in Emacs, each in its own buffer, each buffer has its own value of point.
    ^L
    We will use the term frame to mean a graphical window or terminal screen occupied by Emacs.

    At the very bottom of the frame is an echo area. The main area of the frame, above the echo area, is called the window.
    ^L
-
Finally, type C-x h followed by M-x sort-paragraphs RET to sort the paragraphs alphabetically. The buffer looks like this now:

    At the very bottom of the frame is an echo area. The main area of the frame, above the echo area, is called the window.

    Emacs editing commands operate in terms of characters, words, lines, sentences, paragraphs, pages, expressions, comments, etc.
    ^L
    Emacs is an advanced, extensible, customisable, self-documenting editor.

    If you are editing several files in Emacs, each in its own buffer, each buffer has its own value of point.
    ^L
    The cursor in the selected window shows the location where most editing commands take effect, which is called point.

    We will use the term frame to mean a graphical window or terminal screen occupied by Emacs.
    ^L
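The page and paragraph sorts above can be modelled by splitting the text at form feeds and at blank lines respectively. The Python sketch below is an illustration under stated assumptions, not the Emacs implementation. It assumes a blank line between paragraphs and a form feed on its own line after each page, and it ignores where the separators end up after sorting.

```python
FORM_FEED = "\f"  # the character Emacs displays as ^L

# Three pages of two paragraphs each, as in the example buffer.
pages = [
    ["Emacs is an advanced, extensible, customisable, self-documenting "
     "editor.",
     "Emacs editing commands operate in terms of characters, words, "
     "lines, sentences, paragraphs, pages, expressions, comments, etc."],
    ["We will use the term frame to mean a graphical window or terminal "
     "screen occupied by Emacs.",
     "At the very bottom of the frame is an echo area. The main area of "
     "the frame, above the echo area, is called the window."],
    ["The cursor in the selected window shows the location where most "
     "editing commands take effect, which is called point.",
     "If you are editing several files in Emacs, each in its own buffer, "
     "each buffer has its own value of point."],
]

# Assumed layout: blank line between paragraphs, form feed line after
# each page.
text = "".join("\n\n".join(page) + "\n" + FORM_FEED + "\n" for page in pages)

# Model of sort-pages: split at the form feeds and sort the pages.
page_texts = [p for p in text.split(FORM_FEED + "\n") if p]
sorted_pages = sorted(page_texts)

# Model of sort-paragraphs: split at blank lines and sort the paragraphs.
paragraphs = [p.strip() for page in page_texts for p in page.split("\n\n")]
sorted_paragraphs = sorted(paragraphs)

print([p.split()[0] for p in sorted_pages])
print([" ".join(p.split()[:2]) for p in sorted_paragraphs])
```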
References
To learn more about the sorting commands described above, refer to the Sorting sections of the Emacs and Elisp manuals. Within Emacs, type the following commands to read these manuals:
M-: (info "(emacs) Sorting") RET
M-: (info "(elisp) Sorting") RET
Further, the documentation strings for these commands have useful information too. Use the key sequence C-h f to look up the documentation strings. For example, type C-h f sort-regexp-fields RET to look up the documentation string for the sort-regexp-fields command.