DayPath Journal

Using scriptcs to Generate EPUB XHTML

I am embarrassed to admit that I was unaware that EPUB is a highly successful application of XHTML outside of the Web browser. It has been a standard of the IDPF for over a decade. My entry into the .NET Framework around 2003 was driven by XPathDocument, which gave way to XDocument under LINQ to XML (around 2007). My document-centric approach to the .NET Framework (coupled with my desktop publishing background) makes me more than suited to develop a publishing pipeline for EPUB.

For my design, there were two critical elements missing for this pipeline: (i) markdown (in the Visual Studio Code environment) and (ii) scriptcs. Visual Studio Code can be regarded as a “gateway drug” to the cross-platform editing culture of Atom, which inspired me (along with Git cultural influences) to see markdown as the most primitive (archival) document format for general-purpose XHTML applications (under the shadow of SGML). I now understand that markdown maps onto a subset of HTML, making its relationship to EPUB direct (and, by the way, I now regard EPUB as the ‘open source’ alternative to PDF).

My chosen technology to apply the XML-processing power of the .NET Framework in an open-source, cross-platform-ish way is scriptcs. (Currently the scriptcs story on Linux is not quite there as we wait for .NET Standard and .NET Core to develop.) Glenn Block of .NET MEF fame is the main dude behind scriptcs. You can listen to Glenn talk about “The Future of ScriptCS” in show #1110 of .NET Rocks!. One of the key takeaways from this show is seeing scriptcs as an application of Roslyn.

My Generate-EPUB *.csx script

My EPUB pipeline starts with an EPUB “seed” based on the handmade set of files from Eric Muss-Barnes. His video, “How to Make an eBook EPUB File,” was an instrumental technical introduction to EPUB, in full view of the real-world publishing market. Since this introduction, I have found a GitHub repository from the IDPF full of EPUB samples. I will explore these samples and look for ways to build upon the Muss-Barnes base.

My scriptcs approach is therefore a combination of reading and editing the EPUB seed, using publication metadata in a JSON file and a collection of XHTML templates. This automation starts with generate-epub.csx, which is intended to summarize the automation:

#load "scriptcs-epub-utility.csx"
#load "scriptcs-environment-utility.csx"
#load "scriptcs-markdown-utility.csx"
#load "publication-namespaces.csx"
#load "publication-context.csx"
#load "publication-chapter.csx"
#load "publication-daisy-consortium-ncx.csx"
#load "publication-idpf-package.csx"
#load "publication-oebps-text-biography.csx"
#load "publication-oebps-text-copyright.csx"
#load "publication-oebps-text-dedication.csx"
#load "publication-oebps-text-toc.csx"

var csxRoot = EnvironmentUtility.GetScriptFolder();
var pubContext = new PublicationContext(csxRoot);

Console.WriteLine("End of script.");
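
To give a feel for the “reading and editing the EPUB seed” idea, here is a minimal sketch (not the actual code behind the #load files above) of how publication metadata could be written into a package document with XDocument. The two namespace URIs are the ones defined for OPF and Dublin Core; everything else is an assumption for illustration: the ApplyMetadata helper, the in-memory seed document, and the dictionary standing in for values parsed from the JSON file are all hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Xml.Linq;

// Namespace URIs defined by the OPF and Dublin Core specifications.
XNamespace opf = "http://www.idpf.org/2007/opf";
XNamespace dc = "http://purl.org/dc/elements/1.1/";

// Stand-in for the seed's package document (assumption: a real run
// would load the seed file from disk instead of building it here).
var seed = new XDocument(
    new XElement(opf + "package",
        new XAttribute("version", "2.0"),
        new XElement(opf + "metadata",
            new XAttribute(XNamespace.Xmlns + "dc", dc.NamespaceName),
            new XElement(dc + "title", "Seed Title"))));

// Stand-in for values parsed from the publication metadata JSON file
// (hypothetical keys and values).
var metadata = new Dictionary<string, string>
{
    { "title", "My Book" },
    { "creator", "Author Name" }
};

var updated = ApplyMetadata(seed, metadata);
Console.WriteLine(updated);

// Hypothetical helper: writes each key/value pair as a dc:* element
// under the package's metadata element.
XDocument ApplyMetadata(XDocument package, IDictionary<string, string> values)
{
    var metadataElement = package.Root == null
        ? null
        : package.Root.Element(opf + "metadata");
    if (metadataElement == null)
        throw new InvalidOperationException("The package document has no metadata element.");

    foreach (var pair in values)
    {
        // SetElementValue replaces the value of an existing child element,
        // or appends a new child when none with that name exists.
        metadataElement.SetElementValue(dc + pair.Key, pair.Value);
    }
    return package;
}
```

The point of the sketch is that LINQ to XML lets the seed stay a valid, hand-authored document while the script only touches the values that change from one publication to the next.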

The screenshot below shows the layout of the EPUB assets (along with the print-publication assets) with respect to generate-epub.csx (in the \scriptcs folder) in my free, private Git repository hosted by Microsoft:

[Screenshot: EPUB pipeline in Git repository]

Assuming that I have a future, I will go into detail about this intent and this layout for the EPUB pipeline in subsequent blog posts.