Monday, January 07, 2019

Vim (for non-programmers) Chapter O (NOT 0), recipes which are quick and dirty, example two: Dumping Out the Recommendations from IDEOTVPod into One File

(Or, more generally: pulling specified chunks of text from one million semi-well-structured text files, a goal probably often shared by many.)

I had been asked to help a listener pull all the recommendations from my podcast, I Don't Even Own a Television, along with what episode they came from. Since I write the show notes that go on the website, I figured I might have a quicker way to do this than manually going through every post on the site to do the ol' copy-paste. Below is what I came up with. Note: I use Ubuntu at home. The described process works because my laptop is where all the files are, and because, as we'll see, all the relevant files are set up and named the same way.

  1. Navigate to the correct directory
  2. Load all specified files into Vim's argument list—this is a pretentious way of saying "open all the files in Vim at the same time":
    From the command line, run: vim ep_*
    (This works for me because I am a martinet about file naming conventions)
  3. Type :argdo :execute "normal @m"

But what does that even mean? As usual, we have to look at this from right to left (or from inside to out).

To begin with, @m is a macro. Typing @m just means "do a sequence of key-presses that has been specified and then saved to the register m". Or at least that's what typing @m means when you're in Normal mode, which is why we specify :execute "normal" before the macro. So there's a lot going on with :execute "normal", but most of it I don't actually understand, so let's skip it for now, because it's complicated, and go to the macro.
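For what it's worth, here is a small sketch of how the two pieces fit together, as far as I can tell from Vim's help. :normal runs its argument as if it had been typed in Normal mode, and :execute builds a command out of a string first, which mostly matters when the keys include special ones like Enter. The search for "Reco" in the last line is only there as an illustration.

```vim
" Run the macro in register m on the current file.
:normal @m

" Same thing, with the command built from a string first.
:execute "normal @m"

" Where :execute earns its keep: inside double quotes, \<cr> becomes a
" real Enter key-press, so the search actually gets executed.
:execute "normal /Reco\<cr>"
```

For a plain @m, then, the shorter :normal @m should do the same job.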

The macro is a recording of key-presses. This recording can be of pretty much anything. If you have a repetitive task, a macro can make it much less repetitive. If you have a repetitive task that involves a lot of tricky typing, that's an even more enticing opportunity to use a macro. (In general, macros don't seem to get a lot of use in the Skilled Vim User community, but I think they're often a good way to do a task you have to do a whole bunch of times one day, and maybe not ever again on any other day.) What we have here is:

  • @m = /\d<cr>"By<cr>/Reco<cr>V/\/ul<cr>"By
  • / = search*
  • \d = for the first digit**
  • <cr> execute the search and put the cursor at the result of the search
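  • "By<cr> = append to the b register the entire line that first digit was on. The functioning of this was a little obscure to me, because it copies the entire line when I didn't think it should. The likely explanation: y waits for a motion, and the <cr> that follows it is one (it moves down a line, and it works on whole lines), so the yank covers the whole current line plus the line below it. It works, so whatever.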
    y means "yank (Vim slang for 'copy')", " means "look for a named register", and B means "use the b register, but I'm capitalized, so append what you're yanking to the end of whatever's in this register, instead of overwriting what's in it"
  • /Reco<cr> = search for the string "Reco" without the quotes and go there
  • V = visually select the entire line
  • /\/ul<cr> = search for the closing tag of the unordered list that appears in the html file under the heading "Recommendations". Doing this after entering Visual mode extends the selection over that whole span. The \ before the /ul is needed because an unescaped / would end the search pattern instead of being searched for. Hitting return, as usual, fixes the selection, and sets us up for the culminating
  • "By, which is another "append this selection to the register we've been working with"

* (for whatever is typed after the slash (in normal mode, which, remember, we specified we'd be in already; otherwise one could preface this with <esc>, which I would normally do just out of muscle memory, honestly))
** \d is of course a regular expression for "digit"

NOTE: for the sake of hygiene, this should almost certainly start off with gg to go to the top of the file, as Vim can be asked to save the last location of the cursor for a given file. So let's pretend I did that.
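One hedged way to set this up with the gg included: since a macro is just text sitting in a register, the key-presses can be assigned to register m directly. This is a sketch, not necessarily how the macro above was made. Emptying register b first is my own addition, since "By appends, and anything left over from earlier would end up in the output.

```vim
" Start with an empty b register, because "By appends to it.
:let @b = ''

" Put the key-presses into register m, with gg at the front.
" Inside double quotes, \\ is a literal backslash, \" is a literal
" quote, and \<cr> is the Enter key.
:let @m = "gg/\\d\<cr>\"By\<cr>/Reco\<cr>V/\\/ul\<cr>\"By"
```

Typing :registers m afterwards shows what ended up in the register.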

Finally, the :argdo bit we started with just means "do the following for every argument Vim has right now". In our case, since we opened Vim with all the files we wanted, in Step Two above, this is all the files.
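A small sketch of the argument list in action, assuming the ep_* naming from Step Two. :args with no arguments prints the list, and :args with a pattern replaces it, so the files can also be loaded from inside Vim instead of from the command line.

```vim
" Load every file whose name starts with ep_ into the argument list.
:args ep_*

" Print the argument list to check what is in it.
:args

" Run the macro in register m once in each of those files.
:argdo execute "normal @m"
```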

Basically, then, we have one trick, done twice. That trick is "make the computer do the same tedious thing over and over again". We ask the computer to do this first, in one file, as a macro: this automates the process of looking at an open file, searching for and copying the episode name / number into a new place, then searching for and copying the Recommendations section underneath it. (The macro is a dense but plausible reconstruction of the searches/copying one might reasonably do here. The only weird part is dumping everything into a named register instead of an external file.) Second, we then ask the computer to run that macro once per file for a whole pile of files: that's the :argdo :execute "normal @m" bit, once the files are loaded into Vim. As for loading them into Vim, that's an operating system task. The cool part is having two separate ways to repeat: the macro and the :argdo command.

Once everything was done, I just pasted the contents of the b register into a text file and emailed it off into the world. If I needed to do this again, I would probably (look up how to) redirect the text I was selecting into a log file somewhere, and skip the register step.
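For next time, one possible shortcut, as a sketch: Vim can write a register straight to a file. This still uses the register, but it skips the pasting. The file name recommendations.txt is made up for the example.

```vim
" getreg('b', 1, 1) returns the contents of register b as a list of
" lines; writefile() writes that list to the named file, one per line.
:call writefile(getreg('b', 1, 1), 'recommendations.txt')
```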

It's debatable whether this was actually quicker than going through a couple of years' worth of files. But at least I learned some stuff, and the work was much more interesting and engaging than just manually cutting and pasting a whole bunch of stuff a whole bunch of times. Learning :argdo alone will probably make my life a lot easier in future...

