Convert Many Word Documents to ASCII At Once

In a recent tip, we showed you how to convert a Microsoft Word document to plain ASCII text using a CygWin command called CATDOC. At the end of that article, we promised to show you how to do multiple conversions in one fell swoop. Well, hang on tight, because this will go lickety-split.

This job requires the help of a small shell script, as follows:

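#!/bin/bash

# Finds all Word document files (i.e. files named
# *.doc) at or below the current folder and
# converts each one to plain ASCII text, saving
# the result under the same name with '.txt'
# appended. For example, Research.doc becomes
# Research.doc.txt (if Research.doc.txt already
# exists, it will be overwritten). The original
# Word document is left alone.
#
# Note: file and folder names must not contain
# spaces, because the list that find produces
# is split apart at whitespace.

for f in $( find . -name '*.doc' ); do
    # -w turns off catdoc's word wrapping, so each
    # paragraph comes out as one long line.
    catdoc -w $f > $f.txt
done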
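Because the for-loop splits find's output wherever it sees a space, a folder such as “My Documents” will trip up the script above. The following is a minimal sketch of a space-safe alternative, assuming bash and GNU find (the find that CygWin installs, which supports -print0). The function name catdoc_all_safe is our own invention, not part of CATDOC or CygWin.

```shell
#!/bin/bash
# Hypothetical space-safe variant of catdoc_all (the name is our own).
# Assumes bash and GNU find, whose -print0 option ends each file
# name with a NUL byte instead of a newline.

catdoc_all_safe() {
    local f
    # read -d '' reads up to each NUL, so names containing spaces
    # arrive intact; IFS= and -r keep leading blanks and
    # backslashes exactly as they are.
    while IFS= read -r -d '' f; do
        # Quoting "$f" keeps each name in one piece.
        catdoc -w "$f" > "$f.txt"
    done < <(find . -name '*.doc' -print0)
}
```

To try it, save it as, say, ~/catdoc_all_safe, load it with “. ~/catdoc_all_safe”, then change to the folder you want to convert and type “catdoc_all_safe”. The NUL separator is the key design choice: it is the one character that can never appear in a file name, so nothing gets split apart.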

Open your favorite text editor, start a new file, and paste the above text into it. Save the file in your CygWin home directory, using the name “catdoc_all”, or whatever you like. (On my system, I installed CygWin in C:\sys\cygwin, so my home directory is C:\sys\cygwin\home\craig.) At this point, you should be all set.
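If your Windows editor saves the file with Windows-style (CRLF) line endings, bash may refuse to run it, typically with errors that mention “$'\r'”. Here is a minimal sketch for stripping the stray carriage returns, assuming the GNU sed that CygWin installs; strip_cr is a hypothetical helper name of our own.

```shell
#!/bin/bash
# Hypothetical helper: delete the carriage return (\r) at the end of
# every line of the given file, turning CRLF endings into plain LF.
# Assumes GNU sed, whose -i option edits the file in place.
strip_cr() {
    sed -i 's/\r$//' "$1"
}
```

For the script above, you would type “strip_cr ~/catdoc_all” once after saving it.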

Running the Script: To run the conversion, first use the cd command to navigate to the parent folder that contains all of the Word documents to be converted. (CygWin shows your Windows drives under /cygdrive, so C:\reports, for example, is /cygdrive/c/reports.) Then run your new script by typing “. ~/catdoc_all” (without the quotes): that’s a period, followed by a space, a tilde, a slash, and the name of your script. Hit Enter, and you’re done. Ta da! It’s that easy.

If you care to know why this works, then keep reading.

The Source Command: The lone period at the start of the line is a bash command in its own right, a shorthand for the “source” command. It tells bash to read the named file and execute the commands in it, just as if you had typed them yourself. (When a period appears as an argument elsewhere in a command line, as in “find .”, it usually refers to the current directory.) The tilde refers to your home directory, and the slash is, of course, the standard separator between a directory and a file name. So this tells bash to read the contents of the catdoc_all file in your home folder and execute the commands therein, right in your current shell.
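To see the difference between sourcing a script and running it as a separate program, here is a small experiment; the temporary file, and the variable name greeting, are made up for illustration. A sourced script runs inside your current shell, while “bash scriptname” starts a child shell whose variables vanish when it exits.

```shell
#!/bin/bash
# Write a throwaway one-line script that sets a variable.
demo=$(mktemp)
echo 'greeting=hello' > "$demo"

# Run it with bash: a child shell sets the variable, then exits,
# and the setting disappears along with it.
unset greeting
bash "$demo"
echo "after bash:   '${greeting}'"    # prints: after bash:   ''

# Source it with the period command: it runs in this very shell.
. "$demo"
echo "after source: '${greeting}'"    # prints: after source: 'hello'

rm -f "$demo"
```

The same thing happens with catdoc_all: after you source it, the variable f is still set to the last file name the loop handled (if it found any).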

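How the Script Works: The first line of the script (#!/bin/bash), known as the shebang line, names the program that should interpret the script when you make the file executable and run it directly. When you source the script with a period, as above, your current shell reads it and treats this line as just another comment, so in that case it is optional.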
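If you would rather run catdoc_all like any other program, without the leading period, mark the file executable with chmod; the #! line then tells the system to hand the file to /bin/bash. Here is a small sketch of that mechanism using a throwaway file whose contents are made up for illustration.

```shell
#!/bin/bash
# Create a tiny script whose first line is a shebang.
demo=$(mktemp)
printf '#!/bin/bash\necho "run by bash $BASH_VERSION"\n' > "$demo"

# Mark it executable, then run it by its path; the #! line makes
# the system start /bin/bash to interpret the file.
chmod +x "$demo"
output=$("$demo")
echo "$output"

rm -f "$demo"
```

For the real script, that would be “chmod +x ~/catdoc_all” once, and then “~/catdoc_all” whenever you want to convert. Run this way, the script executes in a child shell, so its f variable does not linger in your own shell afterwards.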

The other lines that begin with a pound sign (#) are merely comments, which bash ignores.

The meat of this script consists of three parts: a FIND command (“find . -name '*.doc'”) that locates all of the Word document files to be converted, a for-loop (“for f in $( … ); do … done”) that iterates over those findings, and the CATDOC command (“catdoc -w $f > $f.txt”) that processes each one.
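One detail of the FIND command worth knowing: its -name test is case-sensitive, so a file saved as REPORT.DOC would be skipped. GNU find, which CygWin uses, also offers -iname, which ignores case. Here is a quick comparison in a scratch folder; the file names are made up for illustration.

```shell
#!/bin/bash
# Build a scratch folder with two Word files whose extensions
# differ only in case.
scratch=$(mktemp -d)
mkdir -p "$scratch/research"
touch "$scratch/research/data.doc" "$scratch/research/NOTES.DOC"
cd "$scratch" || exit 1

# -name matches case-sensitively: only ./research/data.doc is listed.
name_hits=$(find . -name '*.doc')
# -iname ignores case: both files are listed (in no guaranteed order).
iname_hits=$(find . -iname '*.doc')

echo "$name_hits"
echo "$iname_hits"
```

To catch both spellings in catdoc_all, you could replace -name with -iname in its find command.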

When the shell interpreter sees a for-loop, it looks ahead to the “in” part, in this case the find command, and runs that first. The find command, as written here, searches the current directory (.) and every folder below it for all files whose names match the pattern “*.doc”. The result, a potentially long list of file names, is then handed to the for-loop. Note that bash splits this list at whitespace, so a file or folder name that contains a space would be broken into separate pieces; the script assumes your names contain none. The “f” specified in the for-loop (i.e., between the “for” and the “in”) is known as a control variable. We named ours “f” (short for “filename”), but we could have named it anything. Each time through the loop, f takes on the value of the next file name in the list. In other words, it is a placeholder for the real file names.
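The loop mechanics, including the catch with spaces, are easy to see with printf standing in for find. The two file names here are invented for illustration.

```shell
#!/bin/bash
# printf prints one made-up file name per line, much as find would.
count=0
for f in $( printf '%s\n' 'data.doc' 'my notes.doc' ); do
    # On each pass, f holds the next word of the list.
    echo "f is now: $f"
    count=$((count + 1))
done
echo "loop ran $count times"
```

Running it prints “f is now: data.doc”, then “f is now: my” and “f is now: notes.doc”, and finally “loop ran 3 times”: the name with a space in it was split into two separate values.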

Finally, the “$f” notation within the CATDOC command (“catdoc -w $f > $f.txt”) tells bash to substitute the current value of f, that is, the file name being processed on this pass through the loop.
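Bash knows where the variable name ends in $f.txt because a period cannot be part of a variable name. Braces make that boundary explicit, and the ${f%.doc} form strips a trailing .doc, which you could use if you would rather get Research.txt than Research.doc.txt. A sketch, with a made-up file name:

```shell
#!/bin/bash
f='research/data.doc'

# The period ends the variable name, so these two lines print the same.
echo "$f.txt"            # research/data.doc.txt
echo "${f}.txt"          # research/data.doc.txt

# ${f%.doc} removes a trailing ".doc" before ".txt" is appended.
echo "${f%.doc}.txt"     # research/data.txt
```

In catdoc_all, that alternative would turn the CATDOC line into “catdoc -w $f > ${f%.doc}.txt”.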

For example, say that the current folder has two subfolders called “research” and “publish”, and that they each hold one Word document: “data.doc” and “results.doc”, respectively. The find command will find the two files (./research/data.doc and ./publish/results.doc; find puts the ./ starting point in front of each name) and hand those two names off to the for-loop. The for-loop will therefore run twice, once for each name. Suppose find lists ./research/data.doc first (it may report them in either order). The first time through, the f variable is set to that name, and just before the CATDOC command runs, both places that say $f are replaced with it. So what actually gets executed is “catdoc -w ./research/data.doc > ./research/data.doc.txt”. The for-loop then repeats in order to process the second name: “catdoc -w ./publish/results.doc > ./publish/results.doc.txt”.
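Before converting a large tree of folders, you can preview exactly which commands the loop will run by putting echo in front of catdoc. Here is a sketch that recreates the research/publish example in a throwaway scratch folder; the variable names scratch and preview are our own.

```shell
#!/bin/bash
# Recreate the example layout in a throwaway folder.
scratch=$(mktemp -d)
mkdir -p "$scratch/research" "$scratch/publish"
touch "$scratch/research/data.doc" "$scratch/publish/results.doc"
cd "$scratch" || exit 1

# Dry run: echo prints each catdoc command instead of running it.
# The '>' is quoted so it is printed rather than treated as a redirection.
preview=$(
    for f in $( find . -name '*.doc' ); do
        echo catdoc -w "$f" '>' "$f.txt"
    done
)
echo "$preview"
```

Once the preview looks right, removing the echo and the quotes around > gives back the original CATDOC line.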

And there we have it.
