Richard Hollis’s Henry van de Velde: The Artist as Designer is out at long last. A lot of love, sweat, and tears has gone into that book. It is absolutely jam-packed, covering pretty much all of HvdV’s life with over 400 images. As part of Occasional Papers, I handled the permissions, did a bit of editing, and compiled the index.
I ended up writing a Python script to automate the initial index compilation on the command line. I’d add the Python script below, but unfortunately I think I lost it in a dumb backup mishap. This is the gist of the script:
Given a CSV of terms (including [Surname, First name] combos) and a PDF of the manuscript, search each page of the PDF for each term. Once the index has been compiled, create a plain text document of the index as well as a new CSV with columns for the term, the page numbers, and whether or not the term is “flagged”. Flag-worthy terms could include terms with fewer than 5 characters (these may be found as part of larger words), multiple terms that share the same first word (e.g. people with the same surname), or terms found on an unusually high number of pages.
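For illustration, here is a minimal sketch of that gist. It is not the lost script, and every function name, parameter, and threshold in it is hypothetical. It assumes the page text has already been pulled out of the PDF into a list of strings, one per page (a library such as pypdf can do this with `PdfReader(path).pages` and `page.extract_text()`), and that the terms have already been read from the CSV. It also assumes that a [Surname, First name] entry should match the full name in reading order or the surname alone, and it matches whole words only, which the original may not have done.

```python
import csv
import io
import re
from collections import defaultdict


def search_forms(term):
    """Strings to look for: the term as written, plus (assumption) the
    'First Surname' form and the bare surname for 'Surname, First' entries."""
    forms = [term]
    if "," in term:
        surname, first = (part.strip() for part in term.split(",", 1))
        forms += [f"{first} {surname}", surname]
    return forms


def build_index(terms, pages):
    """Map each term to the sorted, 1-based page numbers where it appears."""
    index = {}
    for term in terms:
        alternatives = "|".join(re.escape(form) for form in search_forms(term))
        pattern = re.compile(rf"\b(?:{alternatives})\b", re.IGNORECASE)
        index[term] = [
            number
            for number, text in enumerate(pages, start=1)
            if pattern.search(text)
        ]
    return index


def first_word(term):
    """First word of a term, lowercased ('Behrens, Peter' -> 'behrens')."""
    return re.split(r"[,\s]+", term.strip())[0].lower()


def flag_terms(index, page_count, min_length=5, max_share=0.25):
    """Flag short terms, terms sharing a first word, and terms found on more
    than max_share of all pages. Both thresholds are arbitrary assumptions."""
    by_first_word = defaultdict(list)
    for term in index:
        by_first_word[first_word(term)].append(term)
    return {
        term: (
            len(term) < min_length
            or len(by_first_word[first_word(term)]) > 1
            or len(page_numbers) > max_share * page_count
        )
        for term, page_numbers in index.items()
    }


def write_outputs(index, flags, csv_file, text_file):
    """Write the review CSV (term, pages, flagged) and a plain text index."""
    writer = csv.writer(csv_file)
    writer.writerow(["term", "pages", "flagged"])
    for term in sorted(index, key=str.lower):
        page_list = ", ".join(str(number) for number in index[term])
        writer.writerow([term, page_list, "yes" if flags[term] else "no"])
        if page_list:  # terms that were never found stay out of the text index
            text_file.write(f"{term} {page_list}\n")


if __name__ == "__main__":
    # Made-up sample pages and terms, only to show the calls fitting together.
    sample_pages = [
        "Henry van de Velde met Peter Behrens in Weimar.",
        "Behrens later moved to Berlin.",
    ]
    sample_index = build_index(["Behrens, Peter", "Weimar"], sample_pages)
    sample_flags = flag_terms(sample_index, page_count=len(sample_pages))
    csv_out, text_out = io.StringIO(), io.StringIO()
    write_outputs(sample_index, sample_flags, csv_out, text_out)
    print(text_out.getvalue())
```

Keeping the flags in a separate pass from the search means the thresholds can be tuned without searching the manuscript again, and rerunning the whole thing after a reflow only needs the new page text.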
Once I had the new CSV and text doc, I completed a manual edit beginning with flagged terms and then checked it more generally for outliers that may not have been flagged, particularly nicknames or similar alternates.
The script wasn’t gorgeous (a lot of regular expression tomfoolery) but it did generate a reasonable first pass very quickly. This was useful since I had to redo the index once or twice when the pages reflowed. It probably saved me between 4 and 8 hours of work (this is a dense book!).
I’m annoyed that I’ve probably lost it and will keep looking for it, but I don’t have high hopes.