Convert Word doc and docx format to PDF in .NET Core without Microsoft.Office.Interop

I need to display Word .doc and .docx files in a browser. There's no real client-side way to do this, and these documents can't be shared with Google Docs or Microsoft Office 365 for legal reasons.

Browsers can't display Word, but they can display PDF, so I want to convert these docs to PDF on the server and then display that.

I know this can be done using Microsoft.Office.Interop.Word, but my application is .NET Core and does not have access to Office interop. It could be running on Azure, but it could also be running in a Docker container on anything else.

There appear to be lots of similar questions, but most are asking about full-framework .NET or assume that the server runs Windows, so their answers are no use to me.

How do I convert .doc and .docx files to .pdf without access to Microsoft.Office.Interop.Word?

asked Oct 5, 2017 at 8:03

It's like asking to convert from Word to PDF without the help of Microsoft. It's theoretically possible, but Word is such a huge application that in the general case it's practically impossible; Word is still the best tool for this. You could connect your Core apps to an opaque, dedicated Windows box exposing a conversion service (don't overlook licensing issues). Otherwise, if you restrict your conversion ambitions, there are some libraries that should help (Aspose, iTextSharp, etc.). Also, keep in mind that doc and docx are fundamentally very different formats and solutions may vary accordingly.

Commented Oct 9, 2017 at 8:19

@SimonMourier docx is (supposedly) an open format (Microsoft pushed for ages on that) but it is fairly awful - under the hood it's just a load of xml files in a zip. doc is binary, but also pretty much unchanged for 20 years and lots of parsers for the format are already out there. Office has always been a desktop app and an expensive liability on servers, I can't be the first/only person to ask for this.

Commented Oct 9, 2017 at 8:34

@SimonMourier I've used Aspose before - my team wasn't that impressed. It's crushingly expensive for what it does and it's full-fat .NET, so no use here anyway. iText is good for PDF manipulation, but it's also expensive when there are plenty of open source PDF APIs.

Commented Oct 9, 2017 at 8:53

Well, looks like you have all the answers already; Indeed, you're not the only one looking for the Holy Grail :-)

Commented Oct 9, 2017 at 9:32

I don't really get the problem. There are a lot of open source implementations for these formats. You could, for example, get a LibreOffice binary and run soffice --convert-to pdf --nologo name.docx and you would have a PDF file.

Commented Oct 9, 2017 at 11:59

9 Answers

This was such a pain, no wonder all the third party solutions are charging $500 per developer.

The good news is that the Open XML SDK recently added support for .NET Standard, so it looks like you're in luck with the .docx format.

The bad news is that at the moment there isn't a lot of choice for PDF generation libraries on .NET Core. Since it doesn't look like you want to pay for one and you can't legally use a third-party service, we have little choice except to roll our own.

The main problem is getting the Word document content transformed to PDF. One of the popular ways is reading the docx into HTML and exporting that to PDF. It was hard to find, but there is a .NET Core version of the OpenXMLSDK-PowerTools that supports transforming docx to HTML. The Pull Request is "about to be accepted", you can get it from here:

Now that we can extract document content to HTML we need to convert it to PDF. There are a few libraries to convert HTML to PDF, for example DinkToPdf is a cross-platform wrapper around the Webkit HTML to PDF library libwkhtmltox.

Docx to HTML

Let's put this all together. Download the OpenXMLSDK-PowerTools .NET Core project and build it (just the OpenXMLPowerTools.Core and the OpenXMLPowerTools.Core.Example projects - ignore the other project).

Set OpenXMLPowerTools.Core.Example as the StartUp project. Add a Word document to the project (e.g. test.docx) and set the file's property Copy To Output = If Newer.

Run the console project:

    static void Main(string[] args)
    {
        var source = Package.Open(@"test.docx");
        var document = WordprocessingDocument.Open(source);
        HtmlConverterSettings settings = new HtmlConverterSettings();
        XElement html = HtmlConverter.ConvertToHtml(document, settings);

        Console.WriteLine(html.ToString());
        var writer = File.CreateText("test.html");
        writer.WriteLine(html.ToString());
        writer.Dispose();
        Console.ReadLine();
    }

Make sure the test.docx is a valid word document with some text otherwise you might get an error:

the specified package is invalid. the main part is missing

If you run the project you will see the HTML looks almost exactly like the content in the Word document:


However if you try a Word Document with pictures or links you will notice they're missing or broken.

I had to change the static Uri FixUri(string brokenUri) method to return a Uri and I added user friendly error messages.

    using System;
    using System.Drawing.Imaging;
    using System.IO;
    using System.Linq;
    using System.Xml.Linq;
    using DocumentFormat.OpenXml.Packaging;
    using OpenXmlPowerTools;

    static void Main(string[] args)
    {
        var fileInfo = new FileInfo(@"c:\temp\MyDocWithImages.docx");
        string fullFilePath = fileInfo.FullName;
        string htmlText = string.Empty;
        try
        {
            htmlText = ParseDOCX(fileInfo);
        }
        catch (OpenXmlPackageException e)
        {
            // Repair malformed hyperlinks inside the package, then retry once.
            if (e.ToString().Contains("Invalid Hyperlink"))
            {
                using (FileStream fs = new FileStream(fullFilePath, FileMode.OpenOrCreate, FileAccess.ReadWrite))
                {
                    UriFixer.FixInvalidUri(fs, brokenUri => FixUri(brokenUri));
                }
                htmlText = ParseDOCX(fileInfo);
            }
        }

        var writer = File.CreateText("test1.html");
        writer.WriteLine(htmlText);
        writer.Dispose();
    }

    public static Uri FixUri(string brokenUri)
    {
        if (brokenUri.Contains("mailto:"))
        {
            // Drop the "mailto:" prefix and keep the bare address as a relative URI.
            string address = brokenUri.Remove(0, "mailto:".Length);
            return new Uri(address, UriKind.RelativeOrAbsolute);
        }
        // A blank string is not a valid URI, so fall back to a placeholder link.
        return new Uri("http://broken-link/");
    }

    public static string ParseDOCX(FileInfo fileInfo)
    {
        try
        {
            // Work on an in-memory copy so the file on disk is left untouched.
            byte[] byteArray = File.ReadAllBytes(fileInfo.FullName);
            using (MemoryStream memoryStream = new MemoryStream())
            {
                memoryStream.Write(byteArray, 0, byteArray.Length);
                using (WordprocessingDocument wDoc = WordprocessingDocument.Open(memoryStream, true))
                {
                    int imageCounter = 0;
                    var pageTitle = fileInfo.FullName;
                    var part = wDoc.CoreFilePropertiesPart;
                    if (part != null)
                        pageTitle = (string)part.GetXDocument()
                                                .Descendants(DC.title)
                                                .FirstOrDefault() ?? fileInfo.FullName;

                    WmlToHtmlConverterSettings settings = new WmlToHtmlConverterSettings()
                    {
                        AdditionalCss = "body { margin: 1cm auto; max-width: 20cm; padding: 0; }",
                        PageTitle = pageTitle,
                        FabricateCssClasses = true,
                        CssClassPrefix = "pt-",
                        RestrictToSupportedLanguages = false,
                        RestrictToSupportedNumberingFormats = false,
                        // Embed each image in the HTML as a base64 data URI.
                        ImageHandler = imageInfo =>
                        {
                            ++imageCounter;
                            string extension = imageInfo.ContentType.Split('/')[1].ToLower();
                            ImageFormat imageFormat = null;
                            if (extension == "png") imageFormat = ImageFormat.Png;
                            else if (extension == "gif") imageFormat = ImageFormat.Gif;
                            else if (extension == "bmp") imageFormat = ImageFormat.Bmp;
                            else if (extension == "jpeg") imageFormat = ImageFormat.Jpeg;
                            else if (extension == "tiff")
                            {
                                extension = "gif";
                                imageFormat = ImageFormat.Gif;
                            }
                            else if (extension == "x-wmf")
                            {
                                extension = "wmf";
                                imageFormat = ImageFormat.Wmf;
                            }

                            // Unsupported image types are skipped.
                            if (imageFormat == null) return null;

                            string base64 = null;
                            try
                            {
                                using (MemoryStream ms = new MemoryStream())
                                {
                                    imageInfo.Bitmap.Save(ms, imageFormat);
                                    var ba = ms.ToArray();
                                    base64 = System.Convert.ToBase64String(ba);
                                }
                            }
                            catch (System.Runtime.InteropServices.ExternalException)
                            {
                                return null;
                            }

                            ImageFormat format = imageInfo.Bitmap.RawFormat;
                            ImageCodecInfo codec = ImageCodecInfo.GetImageDecoders()
                                                                 .First(c => c.FormatID == format.Guid);
                            string mimeType = codec.MimeType;

                            string imageSource = string.Format("data:{0};base64,{1}", mimeType, base64);

                            XElement img = new XElement(Xhtml.img,
                                new XAttribute(NoNamespace.src, imageSource),
                                imageInfo.ImgStyleAttribute,
                                imageInfo.AltText != null ? new XAttribute(NoNamespace.alt, imageInfo.AltText) : null);
                            return img;
                        }
                    };

                    XElement htmlElement = WmlToHtmlConverter.ConvertToHtml(wDoc, settings);
                    var html = new XDocument(new XDocumentType("html", null, null, null), htmlElement);
                    var htmlString = html.ToString(SaveOptions.DisableFormatting);
                    return htmlString;
                }
            }
        }
        // Let OpenXmlPackageException reach Main so the hyperlink repair can run.
        catch (Exception e) when (!(e is OpenXmlPackageException))
        {
            return "The file is either open, please close it or contains corrupt data";
        }
    }

You may need System.Drawing.Common NuGet package to use ImageFormat

Now we can get images and links:


If you only want to show Word .docx files in a web browser, it's better not to convert the HTML to PDF, as that will significantly increase bandwidth. You could store the HTML in a file system, in the cloud, or in a database using a VPP technology.

HTML to PDF

The next thing we need to do is pass the HTML to DinkToPdf. Download the DinkToPdf (90 MB) solution. Build the solution - it will take a while for all the packages to be restored and for the solution to compile.

IMPORTANT:

The DinkToPdf library requires the libwkhtmltox.so and libwkhtmltox.dll file in the root of your project if you want to run on Linux and Windows. There's also a libwkhtmltox.dylib file for Mac if you need it.

These native libraries are in the v0.12.4 folder. Depending on whether your PC is 32 or 64 bit, copy the 3 files from the matching folder to the DinkToPdf-master\DinkToPdf.TestConsoleApp\bin\Debug\netcoreapp1.1 folder.
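A missing native library tends to surface as a confusing load error at conversion time, so it can help to check for it at startup. The following is only a sketch, assuming the three file names mentioned above; NativeLibraryLocator and its methods are hypothetical names and are not part of DinkToPdf.

```csharp
using System;
using System.IO;
using System.Runtime.InteropServices;

// Hypothetical helper, not part of DinkToPdf.
public static class NativeLibraryLocator
{
    // Picks the libwkhtmltox file name that matches the current operating system.
    public static string GetLibraryFileName()
    {
        if (RuntimeInformation.IsOSPlatform(OSPlatform.Windows)) return "libwkhtmltox.dll";
        if (RuntimeInformation.IsOSPlatform(OSPlatform.Linux)) return "libwkhtmltox.so";
        if (RuntimeInformation.IsOSPlatform(OSPlatform.OSX)) return "libwkhtmltox.dylib";
        throw new PlatformNotSupportedException("No libwkhtmltox file name is known for this OS.");
    }

    // Returns the full path of the library, or throws a readable error
    // if it was not copied into the given directory.
    public static string EnsureLibraryPresent(string baseDirectory)
    {
        string path = Path.Combine(baseDirectory, GetLibraryFileName());
        if (!File.Exists(path))
        {
            throw new FileNotFoundException(
                "The wkhtmltox native library was not found next to the application.", path);
        }
        return path;
    }
}
```

Calling NativeLibraryLocator.EnsureLibraryPresent(AppContext.BaseDirectory) before the first conversion would fail fast if the copy step was skipped.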

IMPORTANT 2:

Make sure that you have libgdiplus installed in your Docker image or on your Linux machine. The libwkhtmltox.so library depends on it.

Set DinkToPdf.TestConsoleApp as the StartUp project and change the Program.cs file to read the htmlContent from the HTML file saved with Open-Xml-PowerTools instead of the Lorem Ipsum text.

    var doc = new HtmlToPdfDocument()
    {
        GlobalSettings = {
            ColorMode = ColorMode.Color,
            Orientation = Orientation.Landscape,
            PaperSize = PaperKind.A4,
        },
        Objects = {
            new ObjectSettings() {
                PagesCount = true,
                HtmlContent = File.ReadAllText(@"C:\TFS\Sandbox\Open-Xml-PowerTools-abfbaac510d0d60e2f492503c60ef897247716cf\ToolsTest\test1.html"),
                WebSettings = { DefaultEncoding = "utf-8" },
                HeaderSettings = { FontSize = 9, Right = "Page [page] of [toPage]", Line = true },
                FooterSettings = { FontSize = 9, Right = "Page [page] of [toPage]" }
            }
        }
    };

The result of the Docx vs the PDF is quite impressive and I doubt many people would pick out many differences (especially if they never see the original):


P.S. I realise you wanted to convert both .doc and .docx to PDF. I'd suggest making a service yourself to convert .doc to .docx using a specific non-server Windows/Microsoft technology. The .doc format is binary and is not intended for server-side automation of Office.

With an EXE and Command Line:

You can convert purely with the wkhtmltopdf.exe available here: https://wkhtmltopdf.org/libwkhtmltox/

UPDATE 2:

Nick Chapsas posted this cool video, "The Easiest Way to Create PDFs in .NET", and it uses QuestPDF, a free product for companies with under $1M revenue. It gives you a live view of the PDF while you're creating it (RAD for PDFs): https://www.youtube.com/watch?v=_M0IgtGWnvE&t=3m45s

answered Oct 10, 2017 at 4:30 Jeremy Thompson

Cheers, excellent answer. I think I might have the last piece of the puzzle as I found an open source .NET Mono doc > docx converter that can be ported to .NET Core.

Commented Oct 11, 2017 at 6:27

@JeremyThompson I've gotten b2xtranslator up and running in .NET Core, switched from the dedicated ZIP implementation to System.IO.Compression and fixed the weird command line tests to just use NUnit. It's still not quite there - working on getting all unit tests to pass and adding new to cover more use-cases/code. Looking for contributors if you (or anyone) are interested.

Commented Oct 12, 2017 at 6:19

@vapcguy - you didn't read the question. OP specifically can't install Office on Linux and KB257757 says office automation server side is unsupported.

Commented Dec 1, 2018 at 2:56

@JeremyThompson Yeah, you're right. :facepalm: Missed that part.

Commented Dec 3, 2018 at 15:54

@BorisLipschitz I know this is old, but for the benefit of anyone else wondering, you install the System.Drawing.Common NuGet package to use ImageFormat

Commented Sep 4, 2019 at 11:44

Using the LibreOffice binary

The LibreOffice project is an open source, cross-platform alternative to MS Office. We can use its capabilities to export doc and docx files to PDF. Currently, LibreOffice has no official API for .NET, so we will talk directly to the soffice binary.

It is a somewhat "hacky" solution, but I think it is the one with the fewest bugs and the lowest maintenance cost. Another advantage of this method is that you are not restricted to converting from doc and docx: you can convert from every format LibreOffice supports (e.g. odt, html, spreadsheets, and more).

The implementation

I wrote a simple C# program that uses the soffice binary. This is just a proof of concept (and my first program in C#). It supports Windows out of the box, and Linux only if the LibreOffice package has been installed.

    using System;
    using System.Collections.Generic;
    using System.Text;
    using System.Diagnostics;
    using System.Reflection;

    namespace DocToPdf
    {
        public class LibreOfficeFailedException : Exception
        {
            public LibreOfficeFailedException(int exitCode)
                : base(string.Format("LibreOffice has failed with {0}", exitCode))
            { }
        }

        class Program
        {
            static string getLibreOfficePath()
            {
                switch (Environment.OSVersion.Platform)
                {
                    case PlatformID.Unix:
                        return "/usr/bin/soffice";
                    case PlatformID.Win32NT:
                        // On Windows, use the LibreOffice copy shipped next to this program.
                        string binaryDirectory = System.IO.Path.GetDirectoryName(Assembly.GetExecutingAssembly().Location);
                        return binaryDirectory + "\\Windows\\program\\soffice.exe";
                    default:
                        throw new PlatformNotSupportedException("Your OS is not supported");
                }
            }

            static void Main(string[] args)
            {
                string libreOfficePath = getLibreOfficePath();

                // FIXME: file name escaping: I have no idea how to do it in .NET.
                ProcessStartInfo procStartInfo = new ProcessStartInfo(libreOfficePath,
                    string.Format("--convert-to pdf --nologo {0}", args[0]));
                procStartInfo.RedirectStandardOutput = true;
                procStartInfo.UseShellExecute = false;
                procStartInfo.CreateNoWindow = true;
                procStartInfo.WorkingDirectory = Environment.CurrentDirectory;

                Process process = new Process() { StartInfo = procStartInfo, };
                process.Start();
                process.WaitForExit();

                // Check for failed exit code.
                if (process.ExitCode != 0)
                {
                    throw new LibreOfficeFailedException(process.ExitCode);
                }
            }
        }
    }
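On the FIXME about file name escaping: one way to sidestep it, assuming .NET Core 2.1 or later, is ProcessStartInfo.ArgumentList, which passes each argument separately so that spaces and quotes in the file name need no manual escaping. This is a sketch rather than a drop-in replacement for the program above; SofficeCommand and Build are hypothetical names, and the --headless and --outdir switches are an addition to the flags used in the answer.

```csharp
using System;
using System.Diagnostics;
using System.IO;

// Hypothetical helper that prepares (but does not start) a soffice conversion.
public static class SofficeCommand
{
    public static ProcessStartInfo Build(string sofficePath, string inputFile, string outputDirectory)
    {
        // A full path never starts with '-', so the file name cannot be
        // mistaken for a command-line switch.
        string fullInput = Path.GetFullPath(inputFile);

        var startInfo = new ProcessStartInfo(sofficePath)
        {
            UseShellExecute = false,
            CreateNoWindow = true
        };

        // Each entry is handed to the process as one argument, however it is spelled.
        startInfo.ArgumentList.Add("--headless");
        startInfo.ArgumentList.Add("--convert-to");
        startInfo.ArgumentList.Add("pdf");
        startInfo.ArgumentList.Add("--outdir");
        startInfo.ArgumentList.Add(outputDirectory);
        startInfo.ArgumentList.Add(fullInput);
        return startInfo;
    }
}
```

The result can be passed to Process.Start and the exit code checked as in the program above. This only covers argument quoting; a file name that comes from user input should still be validated before use.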

Resources

Results

I tested it on Arch Linux, compiled with mono. I ran it using mono with the Linux binary, and with wine using the Windows binary.

You can find the results in the Tests directory:

Outputs:

answered Oct 12, 2017 at 11:01

Note that LibreOffice will not be able to properly convert Office documents that use proprietary fonts (I had the issue with Verdana, if I recall correctly) unless they are installed on the OS. Other than this font issue, I didn't have much trouble with it.

Commented Dec 3, 2020 at 10:52

Just be super careful with handling the filename if it is a user-input value. This can lead to code execution on your server.

Commented Feb 27, 2021 at 4:52

Thank you, you saved my day! It's unbelievable how nothing free and well working can be found anywhere, only paid software or services for such a common task

Commented Jun 15, 2021 at 14:23

I've recently done this with FreeSpire.Doc. It has a limit of 3 pages for the free version but it can easily convert a docx file into PDF using something like this:

    private void ConvertToPdf()
    {
        try
        {
            for (int i = 0; i < listOfDocx.Count; i++)
            {
                CurrentModalText = "Converting To PDF";
                CurrentLoadingNum += 1;

                string savePath = PdfTempStorage + i + ".pdf";
                listOfPDF.Add(savePath);

                // FileFormat.Auto lets Spire detect doc vs docx from the file itself.
                Spire.Doc.Document document = new Spire.Doc.Document(listOfDocx[i], FileFormat.Auto);
                document.SaveToFile(savePath, FileFormat.PDF);
            }
        }
        catch (Exception)
        {
            // Rethrow with "throw;" so the original stack trace is preserved.
            throw;
        }
    }

I then sew these individual PDFs together later using iTextSharp.pdf:

    // The element type of localList is not shown in the post.
    public static byte[] concatAndAddContent(List<byte[]> pdfByteContent, List localList)
    {
        using (var ms = new MemoryStream())
        {
            using (var doc = new Document())
            {
                using (var copy = new PdfSmartCopy(doc, ms))
                {
                    doc.Open();

                    // add checklist at the start
                    using (var db = new StudyContext())
                    {
                        var contentId = localList[0].ContentID;
                        var temp = db.MailContentTypes.Where(x => x.ContentId == contentId).ToList();
                        if (!temp[0].Code.Equals("LAB"))
                        {
                            pdfByteContent.Insert(0, CheckListCreation.createCheckBox(localList));
                        }
                    }

                    // Loop through each byte array
                    foreach (var p in pdfByteContent)
                    {
                        // Create a PdfReader bound to that byte array
                        using (var reader = new PdfReader(p))
                        {
                            // Add the entire document instead of page-by-page
                            copy.AddDocument(reader);
                        }
                    }

                    doc.Close();
                }
            }

            // Return just before disposing
            return ms.ToArray();
        }
    }

I don't know if this suits your use case, as you haven't specified the size of the documents you're trying to write, but if they're < 3 pages or you can manipulate them to be less than 3 pages, it will allow you to convert them into PDFs.

As mentioned in the comments below, it is also unable to help with RTL languages, thank you @Aria for pointing that out.

answered Sep 1, 2018 at 10:39

Just to clarify because you didn't mention it. "Spire.Doc" leaves a red "warning evaluation" watermark at the top of the converted PDF. When searching on Nuget, look for "FreeSpire.Doc", this version does not contain the watermark. Nice API, this should be marked as the answer imo.

Commented Jan 7, 2019 at 21:34

Yeah, that's what I did, sorry, I should have been more specific. Hopefully this answer helped you out a little!

Commented Jan 8, 2019 at 8:03

I'm using FreeSpire.Doc and still getting the eval warning.

Commented Feb 27, 2019 at 1:09

@MarioZ Thanks for your comment. I already resolved the problem by installing Microsoft Word on the servers that need this feature, but we may change that in future. I enjoyed the conversation, and thank you for providing the useful link.

Commented May 18, 2020 at 8:20

Free Spire.Doc definitely is the ultimate free solution for making a preview image or thumbnail of the 1st page (up to 3 pages with the free version). It worked both for doc and docx file types, the result was very accurate, including pictures and watermarks, formats, and in addition, very fast! @Bomie should be awarded a medal only for this finding!

Commented Aug 22, 2022 at 12:18

If you have no trouble using a containerized solution (Docker), there is a very good project out there:

I did give it a try before. It already uses LibreOffice for docx to pdf but it has many more features. Plus it's a stateless dockerized api, which is self sufficient.

answered Jul 7, 2022 at 18:58

Sorry, I don't have enough reputation to comment, but I would like to add my two cents to Jeremy Thompson's answer. I hope this helps someone.

When I was going through Jeremy Thompson's answer, after downloading OpenXMLSDK-PowerTools and running OpenXMLPowerTools.Core.Example, I got this error:

the specified package is invalid. the main part is missing 
var document = WordprocessingDocument.Open(source); 

After struggling for some hours, I found that the test.docx copied to the bin folder was only 1 KB. To solve this, right-click test.docx > Properties and set Copy to Output Directory to Copy always.

I hope this helps some novice like me :)

answered Aug 7, 2019 at 7:32 Samuel Leung
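Since a .docx is just a zip of XML files, a truncated or placeholder copy like this can be caught before the Open XML SDK ever sees it. A rough sketch, assuming the main part sits at its usual location word/document.xml (which is where Word writes it, although the format does not strictly require that path); DocxSanityCheck is a hypothetical helper name.

```csharp
using System;
using System.IO;
using System.IO.Compression;

// Hypothetical pre-check, not part of the Open XML SDK.
public static class DocxSanityCheck
{
    // Returns true when the stream is a readable zip archive that contains
    // the usual main document part.
    public static bool LooksLikeDocx(Stream stream)
    {
        try
        {
            using (var archive = new ZipArchive(stream, ZipArchiveMode.Read, leaveOpen: true))
            {
                return archive.GetEntry("word/document.xml") != null;
            }
        }
        catch (InvalidDataException)
        {
            // Not a zip archive at all, for example an empty or truncated file.
            return false;
        }
    }
}
```

Running this over File.OpenRead("test.docx") in the output folder would have flagged the 1 KB file immediately.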

For converting DOCX to PDF, even with placeholders, I have created a free "Report-From-DocX-HTML-To-PDF-Converter" library with .NET Core under the MIT license, because I was so annoyed that no simple solution existed and all the commercial solutions were super expensive. You can find it here with an extensive description and an example project:

You only need the free LibreOffice. I recommend using the LibreOffice portable edition, so it does not change anything in your server settings. Check where the file "soffice.exe" is located (on Linux it has a different name), because you need its path to fill the variable "locationOfLibreOfficeSoffice".

Here is how to convert from DOCX to PDF and to HTML:

    string locationOfLibreOfficeSoffice = @"C:\PortableApps\LibreOfficePortable\App\libreoffice\program\soffice.exe";

    var docxLocation = "MyWordDocument.docx";

    var rep = new ReportGenerator(locationOfLibreOfficeSoffice);

    //Convert from DOCX to PDF
    rep.Convert(docxLocation, Path.Combine(Path.GetDirectoryName(docxLocation), "Test-Template-out.pdf"));

    //Convert from DOCX to HTML
    rep.Convert(docxLocation, Path.Combine(Path.GetDirectoryName(docxLocation), "Test-Template-out.html"));

As you see, you can also convert from DOCX to HTML. You can also put placeholders into the Word document, which you can then "fill" with values. That is outside the scope of your question, but you can read about it on GitHub (README).

answered Aug 16, 2019 at 11:56 Smart In Media

I have a couple of questions: 1. Are there any known issues when used in production for avg load of 10 docx to pdf conversions per minute? 2. The portable libreoffice is about 1 GB. Can you indicate which folders / files can be removed to make it lighter without affecting the functionality?

Commented Apr 14, 2020 at 12:40

It is also taking more than 10 seconds to do the conversion. Is it normal?

Commented Apr 14, 2020 at 14:32

Commented Oct 31, 2022 at 11:16

An alternate solution could be implemented if you have access to Office 365. This has fewer limitations than my previous answer but requires that purchase.

I get a Graph API token, the site I want to work with, and the drive I want to use.

After that I grab the byte array of the docx:

    public static async Task<MemoryStream> GetByteArrayOfDocumentAsync(string baseFilePathLocation)

This stream is then uploaded to the Graph API, using a client set up with our Graph API token, via
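Only the signature of this helper appears in the answer. A minimal sketch of what such a method could look like, assuming it simply reads the file from disk into a MemoryStream; the body and the DocumentLoader class name are assumptions, not the author's code.

```csharp
using System.IO;
using System.Threading.Tasks;

// Hypothetical container class for the helper.
public static class DocumentLoader
{
    // Reads the whole document into memory and wraps the bytes in a stream
    // that is positioned at the start.
    public static async Task<MemoryStream> GetByteArrayOfDocumentAsync(string baseFilePathLocation)
    {
        byte[] bytes = await File.ReadAllBytesAsync(baseFilePathLocation);
        return new MemoryStream(bytes);
    }
}
```

Loading the file fully into memory keeps the later upload simple, at the cost of holding the whole document in RAM.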

    public static async Task<string> UploadFileAsync(HttpClient client, string siteId, MemoryStream stream, string driveId, string fileName, string folderName = "root")
    {
        // Upload the document bytes into the given folder of the drive.
        var result = await client.PutAsync(
            $"https://graph.microsoft.com/v1.0/sites/{siteId}/drives/{driveId}/items/{folderName}:/{fileName}:/content",
            new ByteArrayContent(stream.ToArray()));

        // The type argument of Deserialize is not shown in the post; it needs an "id" property.
        var res = JsonSerializer.Deserialize(await result.Content.ReadAsStringAsync());
        return res.id;
    }

We then download the document from the Graph API, using the ID it returned and asking for PDF format, via

    public static async Task<Stream> GetPdfOfDocumentAsync(HttpClient client, string siteId, string driveId, string documentId)
    {
        // format=pdf asks Graph to convert the stored document on download.
        var getRequest = await client.GetAsync(
            $"https://graph.microsoft.com/v1.0/sites/{siteId}/drives/{driveId}/items/{documentId}/content?format=pdf");
        return await getRequest.Content.ReadAsStreamAsync();
    }

This gives a stream containing the PDF version of the document that was just uploaded.

answered Apr 26, 2021 at 13:07

That's genius. Is the PDF immediately available? Like in the same request, upload and download?

Commented Jun 23, 2023 at 17:52

The request returned in GetPdfOfDocumentAsync will contain the full PDF, yes.

Commented Jun 24, 2023 at 18:06

Yes, I mean can I have a single request to my backend, and my backend does the upload and download in one request? How many seconds does it take, do you think?

Commented Jun 24, 2023 at 18:10

Yes. all the methods above are called and returned in a single reply from my backend, and it depends on how big the file is normally, but not excessively long from my experience, a second or two

Commented Jun 25, 2023 at 11:45

This is adding to Jeremy Thompson's very helpful answer. In addition to the word document body, I wanted the header (and footer) of the word document converted to HTML. I didn't want to modify the Open-Xml-PowerTools so I modified Main() and ParseDOCX() from Jeremy's example, and added two new functions. ParseDOCX now accepts a byte array so the original Word Docx isn't modified.

    static void Main(string[] args)
    {
        var fileInfo = new FileInfo(@"c:\temp\MyDocWithImages.docx");
        byte[] fileBytes = File.ReadAllBytes(fileInfo.FullName);
        string htmlText = string.Empty;
        string htmlHeader = string.Empty;
        try
        {
            htmlText = ParseDOCX(fileBytes, fileInfo.Name, false);
            htmlHeader = ParseDOCX(fileBytes, fileInfo.Name, true);
        }
        catch (OpenXmlPackageException e)
        {
            if (e.ToString().Contains("Invalid Hyperlink"))
            {
                using (FileStream fs = new FileStream(fileInfo.FullName, FileMode.OpenOrCreate, FileAccess.ReadWrite))
                {
                    UriFixer.FixInvalidUri(fs, brokenUri => FixUri(brokenUri));
                }
                // Reload the repaired file so the retry does not parse the old bytes.
                fileBytes = File.ReadAllBytes(fileInfo.FullName);
                htmlText = ParseDOCX(fileBytes, fileInfo.Name, false);
                htmlHeader = ParseDOCX(fileBytes, fileInfo.Name, true);
            }
        }

        var writer = File.CreateText("test1.html");
        writer.WriteLine(htmlText);
        writer.Dispose();

        var writer2 = File.CreateText("header1.html");
        writer2.WriteLine(htmlHeader);
        writer2.Dispose();
    }

    private static string ParseDOCX(byte[] fileBytes, string filename, bool headerOnly)
    {
        try
        {
            using (MemoryStream memoryStream = new MemoryStream())
            {
                memoryStream.Write(fileBytes, 0, fileBytes.Length);
                using (WordprocessingDocument wDoc = WordprocessingDocument.Open(memoryStream, true))
                {
                    int imageCounter = 0;
                    var pageTitle = filename;
                    var part = wDoc.CoreFilePropertiesPart;
                    if (part != null)
                    {
                        pageTitle = (string)part.GetXDocument()
                                                .Descendants(DC.title)
                                                .FirstOrDefault() ?? filename;
                    }

                    WmlToHtmlConverterSettings settings = new WmlToHtmlConverterSettings()
                    {
                        AdditionalCss = "body { margin: 1cm auto; max-width: 20cm; padding: 0; }",
                        PageTitle = pageTitle,
                        FabricateCssClasses = true,
                        CssClassPrefix = "pt-",
                        RestrictToSupportedLanguages = false,
                        RestrictToSupportedNumberingFormats = false,
                        // Embed each image in the HTML as a base64 data URI.
                        ImageHandler = imageInfo =>
                        {
                            ++imageCounter;
                            string extension = imageInfo.ContentType.Split('/')[1].ToLower();
                            ImageFormat imageFormat = null;
                            if (extension == "png") imageFormat = ImageFormat.Png;
                            else if (extension == "gif") imageFormat = ImageFormat.Gif;
                            else if (extension == "bmp") imageFormat = ImageFormat.Bmp;
                            else if (extension == "jpeg") imageFormat = ImageFormat.Jpeg;
                            else if (extension == "tiff")
                            {
                                extension = "gif";
                                imageFormat = ImageFormat.Gif;
                            }
                            else if (extension == "x-wmf")
                            {
                                extension = "wmf";
                                imageFormat = ImageFormat.Wmf;
                            }

                            // Unsupported image types are skipped.
                            if (imageFormat == null) return null;

                            string base64 = null;
                            try
                            {
                                using (MemoryStream ms = new MemoryStream())
                                {
                                    imageInfo.Bitmap.Save(ms, imageFormat);
                                    var ba = ms.ToArray();
                                    base64 = System.Convert.ToBase64String(ba);
                                }
                            }
                            catch (System.Runtime.InteropServices.ExternalException)
                            {
                                return null;
                            }

                            ImageFormat format = imageInfo.Bitmap.RawFormat;
                            ImageCodecInfo codec = ImageCodecInfo.GetImageDecoders()
                                                                 .First(c => c.FormatID == format.Guid);
                            string mimeType = codec.MimeType;

                            string imageSource = string.Format("data:{0};base64,{1}", mimeType, base64);

                            XElement img = new XElement(Xhtml.img,
                                new XAttribute(NoNamespace.src, imageSource),
                                imageInfo.ImgStyleAttribute,
                                imageInfo.AltText != null ? new XAttribute(NoNamespace.alt, imageInfo.AltText) : null);
                            return img;
                        }
                    };

                    // Put header into document body, and remove everything else
                    if (headerOnly)
                    {
                        MoveHeaderToDocumentBody(wDoc);
                    }

                    XElement htmlElement = WmlToHtmlConverter.ConvertToHtml(wDoc, settings);
                    var html = new XDocument(new XDocumentType("html", null, null, null), htmlElement);
                    var htmlString = html.ToString(SaveOptions.DisableFormatting);
                    return htmlString;
                }
            }
        }
        // Let OpenXmlPackageException reach Main so the hyperlink repair can run.
        catch (Exception e) when (!(e is OpenXmlPackageException))
        {
            return "The file is either open, please close it or contains corrupt data";
        }
    }

    private static void MoveHeaderToDocumentBody(WordprocessingDocument wDoc)
    {
        MainDocumentPart mainDocument = wDoc.MainDocumentPart;
        XElement docRoot = mainDocument.GetXDocument().Root;
        XElement body = docRoot.Descendants(W.body).First();

        // Only handles first header. Header info: https://learn.microsoft.com/en-us/office/open-xml/how-to-replace-the-header-in-a-word-processing-document
        HeaderPart header = mainDocument.HeaderParts.FirstOrDefault();
        if (header == null)
        {
            // The document has no header, so there is nothing to move.
            return;
        }
        XElement headerRoot = header.GetXDocument().Root;

        // document body will have new headers when we return from this function
        AddXElementToBody(headerRoot, body);
    }

    private static void AddXElementToBody(XElement sourceElement, XElement body)
    {
        // Clone the children nodes
        List<XElement> children = sourceElement.Elements().ToList();
        List<XElement> childClones = children.Select(el => new XElement(el)).ToList();

        // Clone the section properties nodes
        List<XElement> sections = body.Descendants(W.sectPr).ToList();
        List<XElement> sectionsClones = sections.Select(el => new XElement(el)).ToList();

        // clear body
        body.Descendants().Remove();

        // add source elements to body
        foreach (var child in childClones)
        {
            body.Add(child);
        }

        // add section properties to body
        foreach (var section in sectionsClones)
        {
            body.Add(section);
        }

        // get text from alternate content if needed - either choice or fallback node
        XElement alternate = body.Descendants(MC.AlternateContent).FirstOrDefault();
        if (alternate != null)
        {
            var choice = alternate.Descendants(MC.Choice).FirstOrDefault();
            var fallback = alternate.Descendants(MC.Fallback).FirstOrDefault();
            if (choice != null)
            {
                var choiceChildren = choice.Elements();
                foreach (var choiceChild in choiceChildren)
                {
                    body.Add(choiceChild);
                }
            }
            else if (fallback != null)
            {
                var fallbackChildren = fallback.Elements();
                foreach (var fallbackChild in fallbackChildren)
                {
                    body.Add(fallbackChild);
                }
            }
        }
    }

You could add similar methods to handle the Word document footer.

In my case, I then convert the HTML files to images (using Net-Core-Html-To-Image, also based on wkHtmlToX). I combine the header and body images together (using Magick.NET-Q16-AnyCpu), placing the header image at the top of the body image.