
Efficiently Browsing Text or Code in Emacs

Searching a large code base or text dumps for sections of interest can be frustrating. This article walks you through using Emacs to make the task easy. Sometimes grep just isn’t enough.

Scenarios

Let’s search the source code of Python 2.7.2 for something interesting. You may find yourself in similar circumstances for reasons like:

  1. You inherited an enormous project and you have to fix an elusive bug or ten. The website is down and your company is hemorrhaging money while you mainline coffee and search the code.
  2. You are wading through thousands of emails to locate one important email address. The fate of a million-dollar deal hangs in the balance. All you know is that the person said ‘phone’ and ‘sergei’.
  3. You decide to scan all of popular classical literature for a certain pattern of word usage, to make an important point in your thesis.
  4. You are a lawyer in the discovery stage of a case and you need to know why anyone involved ever wrote ‘ouch’, ‘shred’ or ‘insider trading’ in the last 10 years.
  5. You just need to wade through a lot of text.

Alternate Methods

First let’s take a look at how you’d accomplish this without using Emacs and then work our way up to a full solution.

You are interested in the keyword ‘tty’ and think it is a good place to start your investigation. To list the lines of code that contain it, you type:

grep -r "tty" *

Now you see all the lines that contain ‘tty’ but nothing else. That doesn’t tell you a lot. You want to see those lines with some context. So you type:

grep -r "tty" -5 *

This displays the 5 lines before and after each line that contains ‘tty’. Now you see how tty is used in each file.
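The context search can be tried on a throwaway directory first. The sketch below is only an illustration: the temporary directory and the file names term.c and other.c are made up, `-n` is added to show line numbers, and `-C 1` is the spelled-out form of the `-1` shorthand (one line of context on each side, to keep the output short).

```shell
#!/usr/bin/env bash
# Build a tiny throwaway tree to search (hypothetical file names).
demo=$(mktemp -d)
printf '%s\n' 'line one' 'line two' 'open the tty here' 'line four' 'line five' > "$demo/term.c"
printf '%s\n' 'nothing to see' > "$demo/other.c"

# -r: recurse, -n: show line numbers, -C 1: one line of context on each side.
context_hits=$(cd "$demo" && grep -rn -C 1 "tty" .)
printf '%s\n' "$context_hits"
```

Matching lines are printed as file:line:text, while context lines use dashes in place of the colons, which makes the actual hits easy to pick out.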

When you are scrolling down, scanning the output, you see something that interests you and you decide to take a closer look at that file. Now you have to make a note of the file name in the output and open it. Not that convenient but still doable. As you do this multiple times it quickly gets frustrating.

You see that tty occurs in thousands of places. You decide that you are not going to look at every single occurrence of it. You decide to narrow your search to files that contain both ‘tty’ and ‘ioctl’, possibly on different lines. So you run:

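#!/usr/bin/env bash
# List every regular file that contains both "tty" and "ioctl".
# -type f skips directories, which grep cannot read as plain files.
find . -type f | while IFS= read -r f
do
  grep -q "tty" "$f" && grep -q "ioctl" "$f" && ls -l "$f"
done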

This narrows things down and you get a list of files that contain both the keywords.
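The same filter can also be written as a single pipeline. This is a sketch of an alternative, not the method used in the rest of the article: it assumes a grep that supports `--null` and an xargs that supports `-0` (GNU and BSD versions do), and the demo directory and file names are invented for the example.

```shell
#!/usr/bin/env bash
# Throwaway files: only both.c mentions both keywords (hypothetical names).
demo=$(mktemp -d)
printf '%s\n' 'uses tty' 'and ioctl' > "$demo/both.c"
printf '%s\n' 'uses tty only' > "$demo/tty_only.c"
printf '%s\n' 'uses ioctl only' > "$demo/ioctl_only.c"

# First grep: names of files containing "tty", NUL-separated so that
# unusual file names survive. Second grep: keep those that also contain "ioctl".
both=$(cd "$demo" && grep -rl --null "tty" . | xargs -0 grep -l "ioctl")
printf '%s\n' "$both"
```

Because `grep -l` stops reading a file at its first match, this tends to do less work than running grep twice per file inside a loop.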

You want to view all these files, so you run:

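#!/usr/bin/env bash
# Concatenate every regular file that contains both keywords into tmp.txt.
# tmp.txt itself is excluded so the output is never fed back in as input.
find . -type f ! -name tmp.txt | while IFS= read -r f
do
  grep -q "tty" "$f" && grep -q "ioctl" "$f" && cat "$f" >> tmp.txt
done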

and view the file tmp.txt. This may be okay but if you concatenate source files containing programs written in different languages, you can’t get good syntax highlighting while reading the file. If you copied just the files of interest to a directory, viewing each file can be a hassle.

Using Emacs

Egrep

You open eshell with M-x eshell and type:

egrep -r "tty" -5 *

You can then scan the output and view just the files that interest you by pressing ‘Enter’ with your cursor on the output line that caught your attention. Emacs opens the file and places your cursor on the line that you are interested in.

[Screenshot: Emacs TTY]

This becomes even easier when you split your Emacs frame into two side-by-side windows with C-x 3. You quickly move to different lines of interest in the eshell output and press enter to view the file in the other window.

Virtual Dired

You decide to use the same method (shell script) used earlier to narrow the search-space by only looking at files that have both the keywords you are looking for. So you run this:

#!/usr/bin/env bash
# Save an ls -l style listing of the matching files for virtual-dired.
find . -type f | while IFS= read -r f
do
  grep -q "tty" "$f" && grep -q "ioctl" "$f" && \
  ls -l "$f" >> /tmp/listing.txt
done

The output of this command looks awfully similar to dired. dired makes viewing and otherwise manipulating files incredibly easy. You already use dired+ as that gives you several neat additional features. Wouldn’t it be nice if you could create a custom dired buffer from the output of the shell command you ran above?

[Screenshot: ls sample output]

You do exactly this with virtual-dired. You capture the output of the shell command in a file, open that file in Emacs and run M-x virtual-dired (in current Emacs releases the command ships with dired-x under the name dired-virtual). Now you have a custom dired buffer.

[Screenshot: virtual dired+]

Now you can quickly view the files.

[Screenshot: virtual dired+ in use]

A Pinch of Elisp

Your search narrowed the file list to just 11 files. A different search, even with such filtering, can leave you with 40 or more files to look at. Even with dired+ giving you a nice interface, you’ll quickly get tired of opening whole files and searching through them for the keywords of interest. For one, you have to type in the search string each time you open a file.

So you decide to write some elisp to make it easy.

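;; Show every line matching "tty" or "ioctl" with 5 lines of context,
;; then move to the window that displays the *Occur* buffer.
(defun occur-kw ()
  "Run `occur' for tty or ioctl in the current buffer."
  (interactive)
  (occur "tty\\|ioctl" 5)
  ;; (switch-to-buffer "*Occur*")
  (other-window 1))

;; Requires the key-chord package with key-chord-mode enabled.
(key-chord-define-global "fo" 'occur-kw)
;; Alternative binding for those who prefer a function key.
(global-set-key [f3] 'occur-kw)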

Now your work-flow is: You press enter on a line in the dired+ buffer, which opens the file in the other window. Then you press the keys ‘f’ and ‘o’ together to reduce the display to just the lines that interest you. If you are not comfortable with key chords, you can map the function to a different key instead; the example above also shows a binding to f3.

‘occur’ called with argument 5 shows five lines of context around the lines of interest. In this case, the lines of interest are the lines with the words ‘tty’ or ‘ioctl’.
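For comparison, roughly the same view can be produced outside Emacs with grep. This is a sketch under assumptions: the sample file and its contents are invented, and one line of context is used instead of five to keep the output short. In grep's extended syntax the alternation is written `tty|ioctl`, where the Emacs Lisp string needs `tty\\|ioctl`.

```shell
#!/usr/bin/env bash
# A small sample file with one hit for each keyword (made-up contents).
sample=$(mktemp)
printf '%s\n' 'alpha' 'open tty' 'beta' 'gamma' 'delta' 'call ioctl' 'omega' > "$sample"

# -E: extended regexps, so | means "or"; -n: line numbers;
# -C 1: one line of context around each matching line.
occur_like=$(grep -n -E -C 1 'tty|ioctl' "$sample")
printf '%s\n' "$occur_like"
```

Unlike the grep output, the lines in the Occur buffer are live: pressing enter on one jumps to that spot in the file.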

[Screenshot: customized occur]

This makes it a lot easier. However there is still one annoyance left. After you are done scanning a file, you need to switch back to the dired buffer. Often it won’t be next in the buffer list as you would have moved to different buffers to take notes etc. So you can’t just C-x b back to it. You solve this with:

;; Jump straight back to the virtual-dired buffer from anywhere.
(global-set-key [f4]
                (lambda ()
                  (interactive)
                  (switch-to-buffer "listing.txt")))

A Smidgen of Macros

You pamper yourself by tying the steps together with an Emacs keyboard macro.

You record a macro of you doing these steps:

C-x ( (start recording macro)

  1. f4 - takes you to the virtual-dired buffer
  2. C-n - takes you to the next line, which is the next file to be reviewed
  3. enter - opens the next file
  4. f3 - narrows the buffer to only the lines that interest you

C-x ) (stop recording macro)

You assign this macro to f5, for example with C-x C-k b (kmacro-bind-to-key).

Your work-flow now consists of you just hitting f5 and Page Down.

Conclusion

Whenever you have to perform a repetitive task, automating even the smallest step really helps. In addition to saving time and reducing your clients’ expenses, the tools you build decrease tedium and thereby reduce errors caused by frustration and fatigue. Once built, the tools become a part of your Batman utility belt. Emacs lends itself well to tool-building and customizations that can make you extremely productive.

Thanks Nick, Stevey, Dru, Rob, Paul, Mani, Vic, Ed, Senthil and Meena for reviewing this.
