Friday, March 11, 2016

Search and Replace in a MS Word document

I reviewed options for editing MS Word files in DOCX format and decided that the free and one that has the greatest chance of existing in 10 years is just using the Open XML Word Processing SDK (see my review for details).

For complex stuff maybe a different choice would make sense. However, my requirements are simple much like a mail merge:

  1. Taking an existing MS Word file (DOCX format) as input. (Use it as a template)
  2. Search the MS Word file for some placeholders / tags and replace with real data, but don't save any changes to the original file since it is my template.
  3. Be able to write changes to a new file or stream file to browser for download
As it turns out this can be done in very few lines of code and for FREE. Below is my solution.

// Sample command line application
static void Main(string[] args)
{
    string filename = "Test.docx";
    var oldNewValues = new Dictionary<string, string>();
    oldNewValues.Add("pear", "banana");
    oldNewValues.Add("love", "like");
    byte[] returnedBytes = SearchAndReplace(filename, oldNewValues);
    File.WriteAllBytes("Changed" + filename, returnedBytes);

}


// Does a search and replace in the content (body) of a MS Word DOCX file using only the DocumentFormat.OpenXml.Packaging namespace.
// Reference: http://justgeeks.blogspot.co.uk/2016/03/how-to-do-search-and-replace-in.html
public static byte[] SearchAndReplace(string filename, Dictionary<string, string> oldNewValues)
{
    // make a copy of the Word document and put it in memory.
    // The code below operates on this in memory copy, not the file itself.
    // When the OpenXml SDK Auto saves the changes (that is why we don't call save explicitly)
    // the in memory copy is updated, not the original file.
    byte[] byteArray = File.ReadAllBytes(filename);
    using (MemoryStream copyOfWordFile = new MemoryStream())
    {
        copyOfWordFile.Write(byteArray, 0, (int)byteArray.Length);

        // Open the Word document for editing
        using (WordprocessingDocument wordDoc = WordprocessingDocument.Open(copyOfWordFile, true))
        {

            // Get the Main Document Part. It is really just XML.
            // NOTE: There are other parts in the Word document that we are not changing
            string bodyAsXml = null;
            using (StreamReader sr = new StreamReader(wordDoc.MainDocumentPart.GetStream()))
            {
                bodyAsXml = sr.ReadToEnd();
            }

            foreach (var keyValue in oldNewValues)
            {
                string oldValueRegex = keyValue.Key;
                string newValue = keyValue.Value;

                // Do the search and replace. Here we are implementing the logic using REGEX replace.
                Regex regexText = new Regex(oldValueRegex);
                bodyAsXml = regexText.Replace(bodyAsXml, newValue);
            }

            // After making the changes to the string we need to write the updated XML string back to
            // the Word doc (remember it is in memory, not the original file itself)
            using (StreamWriter sw = new StreamWriter(wordDoc.MainDocumentPart.GetStream(FileMode.Create)))
            {
                sw.Write(bodyAsXml);
            }
        }

        // Convert the in memory stream to a byte array (binary data)
        // NOTE: These bytes can be written to a physical file or streamed back to the browser for download, etc.
        byte[] bytesOfEntireWordFile = copyOfWordFile.ToArray();

        return bytesOfEntireWordFile;
    }
}



After calling the SearchAndReplace method you have the bytes that make up the MS Word file. It is up to you what you want to do with it. You can save it a file or stream it to the browser when a user clicks a link to download a file.

To write the file to another file (leaving the original unchanged), use the following line:

File.WriteAllBytes(newFilename, returnedBytes);


To stream the bytes back to a browser via a Action method in a ASP.NET MVC controller, use the following:

return File(returnedBytes, "application/msword", "Filename to download as here.docx");


NOTE: Original code inspired by this MSDN example and additional a post or other page I can't remember (sorry).

No comments: