Monthly Archive for May, 2007

Editing Textile in VIM

I love Textile. All of my blogs use Textile. In some cases I have even extended Textile to provide additional features, making it a simple matter to add programmatically complex functions with a few extra characters.

Sometimes, though, I like to write my articles off-line and post them later. When I do, I typically use either TextMate in OS X, or Vim (wherever I am, sometimes in an SSH session). Unfortunately, there isn’t a lot of support for Textile in Vim. But after a little Googling, I found some.

About a year and a half ago, Dominic Mitchell posted his own Textile syntax file for Vim on his blog. I snapped it up and started using it. There is a slight problem, though, in the way it handles Textile URLs. Apparently thanks to the regex pattern for URL strings lifted from RFC 2396, it allows spaces in URLs, which causes the highlighting to continue after the URL ends.

I’ve applied a fix to the script and posted it here in case anyone (other than me) is actually interested in this sort of thing. Thanks much to Dominic Mitchell for creating this syntax file in the first place; I was not relishing the thought of diving in and creating one myself.

Revised Textile syntax file for VIM

If you wish to use the *.textile extension, as TextMate does, to indicate which files contain Textile syntax, you will want to add the following line to your filetype.vim, which lives in the same place as your ftplugin directory (on my machine it’s /usr/share/vim/vim62/, but it will be different per-installation and wildly different in Windows):

au BufNewFile,BufRead *.textile setf textile  

Bash Pattern Matching and Replacement

Bash is my favorite shell by far. I have known an occasional csh or tcsh aficionado, and I’m sure that those shells have their own virtues, but for me, bash is the end-all be-all of command line interfaces. Come with me now as I wax nerdy all over bash’s pattern matching capabilities.

At work, I often have to make changes to a number of files that all have the same name, but that are stored in a variety of different folders. The challenge is quickly uploading all of these changed files to the server. Because they are in different folders, I would normally have to drag them, one at a time, into the FTP client.

My solution was to write a bash script that would pull these files together into a “staging” folder on my local drive, mirroring the directory structure they live in so that I can drag the entire tree into my FTP client and let it do its work. I know ahead of time what the directory names will be; they are numbers representing the sites. Here is what a typical folder structure might look like:

ROOT
    folder_1
        660
            folder_2
                file.txt
        671
            folder_2
                file.txt  

What I want to be able to do is simply indicate that I want all of the file.txt files to be staged onto my hard drive and have bash go and find the 660, 671, etc. folders based on their numbers and copy them. This is probably more background information than you need to appreciate the bash tricks I’ve used, but it helps to know what on Earth the script is supposed to do.

The concept of the script is to take a list of files (with relative paths) and for each one try to find one of the preemptively known site numbers in our list. Once we know that number, we can loop through all of the numbers, replacing the original number with each new one and using those generated paths to copy the files to my local drive. That way, we end up with a copy of each file that lives in a structure containing each of the numbers in our list.

For me, the challenge was knowing how to evaluate the matching and stem the paths directly within bash. I wanted to avoid writing this in Ruby and I also wanted to avoid doing a lot of shell execution to run things like tr or awk. Here is the code (implemented as a bash function):

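export SITES="660 671 672 685 730 761"

function stagefile() {
  if [ -z "$1" ]; then
    echo "You must provide a file or files to stage."
    return 1
  fi

  for file in "$@"; do
    # Find which known site number appears in this path.
    FILESITE=""
    for site in $SITES; do
      if [[ "$file" == *$site* ]]; then
        FILESITE="$site"
      fi
    done

    # Skip paths that contain none of the known site numbers.
    if [ -z "$FILESITE" ]; then
      echo "No known site number found in $file; skipping."
      continue
    fi

    # Swap in each site number and copy that file, keeping its directories.
    for site in $SITES; do
      NEWFILE="${file//$FILESITE/$site}"
      NEWDIR="${NEWFILE%/*}"
      cp -v --parents "$NEWFILE" "/c/Staging"
    done
  done
}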

The first little bash gem that I discovered is the ability to use the [[ ... ]] syntax to perform a string matching test. When you use the == operator, the right side is used as a pattern to match against the left side using bash “glob” syntax. In this example, I am just checking each site number to see if it occurs within my original path. If and when I find one of them, I save it in FILESITE.
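Here is a minimal sketch of just that matching step, pulled out on its own. The function name find_site and the sample path are made up for illustration, and unlike the function above it stops at the first site number it finds rather than keeping the last one.

```bash
# Hypothetical helper: print the first site number from the list that
# appears anywhere in the given path, or return 1 if none does.
SITES="660 671 672 685 730 761"

find_site() {
  local path="$1" site
  for site in $SITES; do
    # The unquoted * characters are glob wildcards; the quoted
    # "$site" in the middle is matched literally.
    if [[ "$path" == *"$site"* ]]; then
      echo "$site"
      return 0
    fi
  done
  return 1
}

find_site "folder_1/671/folder_2/file.txt"   # prints 671
```

Quoting the variable inside the pattern is a small safety measure: if a site name ever contained a glob character, it would still be compared as plain text.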

The next step is to loop through all of the site numbers. With each one, I replace all occurrences of the originally found number with the current one, use that path to go find a file and then copy it to my local drive using cp’s parents switch, which causes cp to create directories as necessary to duplicate the structure.

The trick to parents is that the destination must be a directory, so I also use bash replacement to trim off the filename from the path given (as well as the trailing forward slash).

The replacement trick I used is ${file//$FILESITE/$site}, which will take the contents of $file and replace all occurrences of $FILESITE with the value of $site. The syntax is reminiscent of good old fashioned s//, and the first double slash simply means “replace all.” A single slash there would only replace the first occurrence (from left to right).
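To see the difference between the single and double slash, here is a quick sketch using a made-up path in which the site number happens to appear twice:

```bash
# Illustrative path only; the number 660 appears in a folder and in the filename.
path="folder_1/660/folder_2/660.txt"

# Single slash: replace only the first occurrence.
first_only="${path/660/671}"
# Double slash: replace every occurrence.
all="${path//660/671}"

echo "$first_only"   # folder_1/671/folder_2/660.txt
echo "$all"          # folder_1/671/folder_2/671.txt
```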

Then, to remove the file from the end of the path, I used ${NEWFILE%/*}, which takes the contents of the variable $NEWFILE and removes the part that matches /*. There are two forms of this replacement operation: the first uses a percent sign (as in my example) and searches from the end of the string, whereas the second uses a number (or pound or hash) sign and searches from the beginning. Using a single symbol is non-greedy and using two of them is greedy. So, to summarize, %/* searches from the end of the string and non-greedily removes characters. The pattern starts with a forward slash, so it stops matching when it comes to the last forward slash in the string.

I can demonstrate some of these techniques using a little bash interactive example:

$ MYFILE="/usr/local/share/temp.txt"
$ echo $MYFILE
/usr/local/share/temp.txt
$ echo "${MYFILE%/*}"
/usr/local/share
$ echo "${MYFILE%%/*}"

$ MYFILE="usr/local/share/temp.txt"
$ echo "${MYFILE%%/*}"
usr  
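
The example above only exercises the percent sign. As a companion sketch, here is the same relative path run through the number-sign forms, which trim from the beginning of the string instead, plus two common filename idioms:

```bash
MYFILE="usr/local/share/temp.txt"

echo "${MYFILE#*/}"    # local/share/temp.txt  (shortest match from the front)
echo "${MYFILE##*/}"   # temp.txt              (longest match from the front)
echo "${MYFILE%.*}"    # usr/local/share/temp  (strip the extension)
echo "${MYFILE##*.}"   # txt                   (keep only the extension)
```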

I think this replacement functionality is one of the least-known and most commonly useful tricks in anyone’s bash toolbox and I’m surprised at how little coverage it receives in the “bash scripting primers” out there. Most useful is the ability to make sure that a given path does or doesn’t end with a trailing slash, which is instrumental in concatenating paths, especially when dealing with user-supplied arguments.

Let’s say you will receive a path fragment as an argument and you need to append a filename to the end of it, but you don’t know whether the fragment will end with a slash. Here’s how you might do it:

function appendpath() {
  echo "${1%/}/file.txt"
}  

Of course, this function will not work correctly if the user-supplied path ends with a whole string of forward slashes, but it handles the two main use cases: one slash or no slashes. The replacement simply says “remove a slash from the end of the variable called 1,” which does nothing if the slash isn’t there. Then you can confidently add your own slash and the filename. Give it a try!
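
If you do need to cope with a whole string of trailing slashes, one possible approach is to keep trimming until none are left. This is only a sketch, and the name appendpath_all is invented for the example:

```bash
# Hypothetical variant of appendpath that tolerates any number of
# trailing slashes on the supplied path fragment.
appendpath_all() {
  local dir="$1"
  # Trim one trailing slash at a time until the path no longer ends in one.
  while [[ "$dir" == */ ]]; do
    dir="${dir%/}"
  done
  echo "$dir/file.txt"
}

appendpath_all "/tmp///"   # prints /tmp/file.txt
```

The loop reuses the same %/ removal as before, so there is nothing new to learn; it just repeats it until the glob test stops matching.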

Resolve 750 Domains in One Line

Let’s say, just for example, that you have a (*NIX-formatted) text file filled with 750 domain names and you want to resolve them all to IP addresses. Let’s also assume that you’re using Windows XP and you have access to Cygwin but that’s about it. Cygwin apparently doesn’t have dig yet, so you’re forced to use nslookup. What do you do?

First, let’s take a look at the output of nslookup when you resolve a single domain.

$ nslookup www.slashdot.org
Non-authoritative answer:
Server:  dnsr1.sbcglobal.net
Address:  68.94.156.1

Name:    www.slashdot.org
Address:  66.35.250.151  

Okay, that’s a multi-line answer. Looking through the man page, I couldn’t find a way to limit the output to the IP address, so we’ll have to use bash trickery to make it happen. You’re smart people, so here’s my solution with absolutely no ado.

nslookup www.slashdot.org 2>&1 | grep Address | tail -1 | awk '{print $2}'  

I had to use the 2>&1 piece because (apparently) the “Non-authoritative answer:” portion of the output is printed to STDERR. Why? Who knows. Using 2>&1 gloms the STDERR output onto STDOUT so it can be filtered out by the following grep. Then, tail -1 to get the last “Address” line printed, and then awk to print the address rather than the heading.
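
If the 2>&1 business is unfamiliar, here is a tiny sketch of what it changes. The function noisy is a stand-in I made up that writes one line to each stream; it has nothing to do with nslookup itself.

```bash
# Hypothetical stand-in for a command that writes to both streams.
noisy() {
  echo "to stdout"
  echo "to stderr" >&2
}

# Without the redirect, only STDOUT travels down the pipe.
noisy 2>/dev/null | grep -c .   # prints 1
# With 2>&1, both lines reach grep.
noisy 2>&1 | grep -c .          # prints 2
```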

But wait, there’s trouble! Try Google:

$ nslookup www.google.com
Non-authoritative answer:
Server:  dnsr1.sbcglobal.net
Address:  68.94.156.1

Name:    www.l.google.com
Addresses:  64.233.161.147, 64.233.161.99, 64.233.161.103, 64.233.161.104
Aliases:  www.google.com  

Our grep statement will still catch the “Addresses” line, but now the IPs have commas between them, which awk will include in its output (because it’s tokenized by spaces), so we’ll have to do away with the commas.

nslookup www.slashdot.org 2>&1 | grep Address | tail -1 | tr ',' ' ' | awk '{print $2}'  

The new tr command will translate commas into spaces, allowing awk to properly snag the IP. This gives us a single command that will take in a domain name and output an IP address (provided that nslookup succeeds). How do we feed it an entire text file? Let’s presume that our file is called domains.txt.
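
To check the filter without touching the network, you can feed it a saved copy of the output. In this sketch the function name last_address is my own invention, and the sample text is copied from the Google lookup above:

```bash
# Hypothetical helper: read nslookup-style output on stdin and print the
# first IP from the last "Address" or "Addresses" line.
last_address() {
  grep Address | tail -1 | tr ',' ' ' | awk '{print $2}'
}

# Canned output, so no DNS query is made.
sample='Server:  dnsr1.sbcglobal.net
Address:  68.94.156.1

Name:    www.l.google.com
Addresses:  64.233.161.147, 64.233.161.99, 64.233.161.103, 64.233.161.104
Aliases:  www.google.com'

printf '%s\n' "$sample" | last_address   # prints 64.233.161.147
```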

for i in `cat domains.txt`
    do nslookup $i 2>&1 | grep Address | tail -1 | tr ',' ' ' | awk '{print $2}'
done  

If you were running this on the command line rather than in a script, you’d want to put semicolons between the lines (or hit \ to continue input on the next line). Provided that the input file is formatted as *NIX text, for i in will pass one domain at a time into the body of the block. (Strictly speaking it splits on whitespace rather than on lines, but domain names contain no spaces, so each line arrives intact.) Pretty sweet, right? Redirect the output of that whole statement into another text file and you’ll have your list of IPs!
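
If your input might not be so tidy, a while read loop is a common alternative to for i in. What follows is a sketch under my own assumptions: the name each_line is hypothetical, and it simply runs whatever command you give it once per non-empty line, stripping a DOS carriage return if one is present.

```bash
# Hypothetical wrapper: run a command once per non-empty line of a file.
# Usage: each_line FILE COMMAND [ARGS...]
each_line() {
  local file="$1" line
  shift
  # The || test picks up a final line that lacks a trailing newline.
  while IFS= read -r line || [[ -n "$line" ]]; do
    line="${line%$'\r'}"          # tolerate DOS line endings
    [[ -z "$line" ]] && continue  # skip blank lines
    # Give the command an empty stdin so it cannot swallow the file.
    "$@" "$line" </dev/null
  done < "$file"
}
```

Running each_line domains.txt nslookup would then perform one lookup per line, and the output can be piped through the same grep, tail, tr and awk chain as before.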

The Other Coast

I spent the last week in California, first in Yosemite National Park and then in Santa Cruz. My friend and I went specifically to photograph, but I made sure we spent time slacking off, too. Santa Cruz is such a beautiful place, I can’t imagine why anyone would want to leave. I mean, aside from the cost of living being among the highest in the country and the place being overrun with tourists year ‘round…

This year happens to be the centennial celebration of the “Boardwalk” amusement park—one of the main attractions to the city outside of surfing—so there was much ado, much pomp, much circumstance. Also, the place was engorged with visitors from Friday afternoon until Sunday when we left. They were riding rides, eating food, pushing strollers, the whole nine yards. The weather was absolutely gorgeous and it pained me to get off the plane yesterday at Logan International in Boston to a cheery, 47-degree morning.

I managed to eat six times my weight in food (by informal calculation), walk tens of miles in only a couple of short days, take about 2,000 photographs all told, and pretty badly sunburn both of my arms and my neck (go me!). I posted a few photos from the UCSC arboretum on my photo blog, and I’ll be posting a lot more very soon.