Printing the site as a book

The organisation of this wiki is a bit odd; one might say it is not as simple as what is found on other wikis. One might argue that it is too deep, and that a wiki does not need to branch forever. And one may be right.

We did it this way because the navigation of the site almost forces one to build documentation in a structured manner. When looking at one topic, one is not presented with the entire list of siblings in the navigation; rather, one sees the previous and next siblings and the parent item. Looking at this page, for example, one can see that the parent is "Application", and that the topic "Printing a Book" has one sibling called "Doing Backups". This is similar to the way a book behaves, where one may page forward or back, or go to the index.
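The previous/next navigation described above can be sketched in a few lines. This is a minimal illustration, not the site's actual code: it assumes the siblings are already available as an ordered list, and the function name `sibling_links` is hypothetical.

```python
def sibling_links(siblings, current):
    """Return (previous, next) for `current` within an ordered list of siblings.

    Hypothetical helper: either value is None at the ends of the list.
    """
    index = siblings.index(current)
    previous = siblings[index - 1] if index > 0 else None
    following = siblings[index + 1] if index + 1 < len(siblings) else None
    return previous, following


# Usage with the two topics mentioned on this page (their order is assumed)
print(sibling_links(["Printing a Book", "Doing Backups"], "Printing a Book"))
# (None, 'Doing Backups')
```

Only the immediate neighbours and the parent are offered, which is what nudges the author towards creating subtopics instead of ever-longer lists of siblings.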

This approach keeps the author focused on the topic at hand and, when there are further related topics, lets the author create subtopics. The effect is almost like a live mind map. When traversed depth-first, with the children of each node visited in order, the site tree produces a single document in which related topics sit close together and follow one another, much the way a book is presented.

All of this makes for a pretty good printable text.

Plugging in Prince

The best HTML-to-PDF converter available seems to be a commercial product called PrinceXML. Thanks to its creators, it is also free for non-commercial use. This site does not depend on the converter, but will use it if it is present.

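import os
import subprocess

try:  # Figure out if we have Prince (http://www.princexml.com) installed
    with open(os.devnull, 'w') as quiet:  # discard the version banner
        has_prince = subprocess.call(['prince', '--version'], stdout=quiet, stderr=quiet) == 0
except OSError:  # raised when the 'prince' executable is not on the PATH
    has_prince = False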
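On Python 3.3 and later, a lighter-weight check is possible with the standard library's `shutil.which`, which looks the executable up on the PATH without running it. This is a swapped-in alternative, not what the site does, and the helper name `prince_available` is hypothetical; note that it only confirms that an executable called `prince` exists, not that it runs.

```python
import shutil


def prince_available(executable="prince"):
    """Return True if an executable with this name is on the PATH.

    Alternative to running `prince --version`: no process is started.
    """
    return shutil.which(executable) is not None


has_prince = prince_available()
```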

If the utility is found at startup, the site renders a "Create Book" button when one navigates to the site root.

class MkBookButton(MenuItem):
    '''  A menu item that turns the site into a book
    '''
    grok.context(ISiteRoot)
    grok.require('zope.Public')
    grok.order(-9)
    title = u'Create Book'
    link = 'mkbook'
    mclass = 'nav buttons'

    def condition(self):  # Depends on the 'prince' app being installed
        return has_prince

Generating content for the book

As mentioned, we need to start at the site root and traverse the content depth-first, visiting each node's children in order and emitting the HTML associated with each node as we go. This produces a single HTML document that can then be converted into a PDF document.
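Stripped of Grok and page templates, the traversal amounts to a pre-order, depth-first walk in which each node's own HTML is emitted before the subtrees of its sorted children. The sketch below assumes a hypothetical `Node` class with `title`, `html`, `order` and `children` attributes; on the site itself the equivalent data comes from the IArticle objects and IArticleSorter, and the order values used here are made up for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """Hypothetical stand-in for an article in the site tree."""
    title: str
    html: str
    order: int = 0
    children: List["Node"] = field(default_factory=list)


def render_book(node: Node) -> str:
    """Pre-order depth-first walk: a node's HTML, then each child's subtree in order."""
    parts = ["<h2>{}</h2>".format(node.title), node.html]
    for child in sorted(node.children, key=lambda c: c.order):
        parts.append(render_book(child))
    return "\n".join(parts)


root = Node("Introduction", "<p>intro</p>", children=[
    Node("Application", "<p>app</p>", order=0, children=[
        Node("Doing Backups", "<p>backups</p>", order=1),
        Node("Printing a Book", "<p>book</p>", order=0),
    ]),
])
print(render_book(root))
```

Because children are sorted before recursing, "Printing a Book" is emitted before "Doing Backups" even though it was listed second, and both appear directly after their parent, just as chapters and sections follow each other in a book.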

We start by defining a view called mkbook for the main site:

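class MkBook(grok.View):
    '''  Turn the content of this site into a book
    '''
    grok.context(ISiteRoot)
    grok.require('zope.Public')

    def render(self):
        # Prince fetches the whole site, rendered as one HTML page, from this URL
        url = self.url(self.context, name='fullpagehtml')
        try:
            # '-o -' makes Prince write the PDF to standard output
            result = subprocess.check_output(['prince', url, '-o', '-'])
        except (OSError, subprocess.CalledProcessError):
            # Prince is missing or failed; don't swallow unrelated errors
            return u'Unable to create the book.'

        response = self.request.response
        response.setHeader('content-type', 'application/pdf')
        response.addHeader('content-disposition', 'inline;filename="gfn.pdf"')
        return result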

The view passes the URL of the FullPageHtml view, which renders the whole site as a single HTML document, to Prince. Prince fetches that page and converts it to PDF, and the view returns the PDF document to the browser.
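Outside Zope, the core of MkBook is a single subprocess call. The sketch below is hypothetical: the function name `url_to_pdf` and its `runner` parameter are illustrative and not part of the site. It uses the same `prince URL -o -` command line as the view, so the PDF is written to standard output and captured, and the `runner` parameter lets a caller substitute a fake for testing.

```python
import subprocess
from typing import Callable, List, Optional


def url_to_pdf(url: str,
               runner: Callable[[List[str]], bytes] = subprocess.check_output
               ) -> Optional[bytes]:
    """Convert the page at `url` to PDF with Prince; return None on failure."""
    command = ["prince", url, "-o", "-"]  # '-o -' sends the PDF to stdout
    try:
        return runner(command)
    except (OSError, subprocess.CalledProcessError):
        # Prince is not installed, or it exited with a non-zero status
        return None
```

Catching only `OSError` and `CalledProcessError` keeps genuine programming errors visible, while a missing or failing Prince simply yields no PDF.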

The FullPageHtml view is defined only for ISiteRoot, the first node in our tree. It kicks off the process by making the first call to a recursive view called PageSimpleHTML:

class FullPageHTML(grok.View):
    ''' Return the site as a single HTML page
    '''
    grok.context(ISiteRoot)
    grok.require('zope.Public')

    def update(self):
        style.need()
        textLight.need()

The detail is in the page template for FullPageHtml, which renders a complete HTML page:

<!doctype html>
<html itemscope="itemscope" itemtype="http://schema.org/WebPage" 
    xmlns="http://www.w3.org/1999/xhtml"
    xml:lang="en"
    xmlns:tal="http://xml.zope.org/namespaces/tal"
    xmlns:i18n="http://xml.zope.org/namespaces/i18n" lang="en">
	<head>
             ...<snipped for the sake of brevity />
	</head>
	<body i18n:domain="camibox" class="plainText" >
		<h1 class='bookTitle'>Grok 4 Noobs.</h1>
		
		<div class="sectionItems" 
                   tal:content="structure context/@@pagesimplehtml"></div>
	</body>
</html>

As one can see, the template includes the output of the PageSimpleHTML view for the site root. Because PageSimpleHTML recursively renders each article's sub-articles, this turns out to be the whole site.

PageSimpleHTML visits each node in the tree, appending the text of each node to the same output. It defines three convenience methods: articleNumber() returns the section label for the article, sortedItems() returns the ordered list of contained items, and articleContent() returns the HTML content of the article.

class PageSimpleHTML(grok.View):
    ''' Render this IArticle as a simple page, then do the same
        for each of the sub-articles
    '''
    grok.context(IArticle)
    grok.require('zope.Public')

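    def articleNumber(self):
        # Build a label such as "2.3" from the parent's label and this
        # article's zero-based 'order'; the label is stored on the article
        # so that its children can read it when they are rendered.
        order = getattr(self.context, "order", None)
        if order is None:  # the introduction gets no section label
            self.context.section = ""
            return ""
        order = int(order) + 1
        parent = getattr(self.context, "__parent__", None)
        parent_section = getattr(parent, "section", "")  # "" if not yet labelled
        if parent_section:
            section = "{}.{}".format(parent_section, order)
        else:
            section = "{}".format(order)
        self.context.section = section
        return section + ": "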

    def articleContent(self):
        baseUrl = self.url(self.context) + "/"
        text = self.context.text
        if self.context.attachments is not None: 
            for a in self.context.attachments:
                st = 'attachments/{}'.format(a)
                text = text.replace(st, baseUrl+st)
        return text
    
    def sortedItems(self):
        sorter = IArticleSorter(self.context)
        return sorter.sortedItems()

The articleNumber() method creates a label for each IArticle node as we visit it. The very first node (the introduction) does not get a section label. This is normal behaviour for books. Subsequent nodes use the parent's label and append their own order (as determined by sorter.sortedItems()) to produce a new section label.
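The numbering scheme can be illustrated independently of Zope. In this hypothetical sketch, each tree node is a `(title, children)` tuple whose children are already sorted, the root plays the role of the unlabelled introduction, and each child's label is its parent's label plus its one-based position, mirroring `int(order) + 1` in articleNumber(). The function name and the "Appendix" topic are made up for illustration.

```python
from typing import List, Tuple

Tree = Tuple[str, list]  # (title, already-sorted list of child trees)


def section_labels(tree: Tree, parent_label: str = "") -> List[Tuple[str, str]]:
    """Return (label, title) pairs in depth-first order; the root gets no label."""
    title, children = tree
    labels = [(parent_label, title)]
    for position, child in enumerate(children, start=1):
        label = "{}.{}".format(parent_label, position) if parent_label else str(position)
        labels.extend(section_labels(child, label))
    return labels


book = ("Introduction", [
    ("Application", [("Printing a Book", []), ("Doing Backups", [])]),
    ("Appendix", []),
])
print(section_labels(book))
# [('', 'Introduction'), ('1', 'Application'), ('1.1', 'Printing a Book'),
#  ('1.2', 'Doing Backups'), ('2', 'Appendix')]
```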

articleContent() replaces the relative links to image attachments (their src attributes) with absolute links. When visiting a page on the site, relative links are sufficient, and storing them that way in the article is also a good thing for making and restoring backups. In the single combined page, however, a relative link would resolve against the URL of the full-page view at the site root rather than against the article that owns the attachment, so it would break.
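The rewriting step can be tried on its own. This is a minimal sketch mirroring articleContent(), with a hypothetical function name and example URL. Like the original, it does a plain string replacement of `attachments/<name>`, so it assumes that no attachment name is a prefix of another one (otherwise a link could be rewritten twice).

```python
from typing import Iterable, Optional


def absolutize_attachments(text: str, base_url: str,
                           attachments: Optional[Iterable[str]]) -> str:
    """Rewrite relative 'attachments/<name>' links as absolute URLs under base_url.

    Mirrors articleContent(): base_url is expected to end with '/'.
    """
    for name in attachments or ():
        relative = "attachments/{}".format(name)
        text = text.replace(relative, base_url + relative)
    return text


html = '<img src="attachments/diagram.png" />'
print(absolutize_attachments(html, "http://example.com/site/app/", ["diagram.png"]))
# <img src="http://example.com/site/app/attachments/diagram.png" />
```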

The page template for the view is again relatively straightforward:

<h2 class='aTitle'>
    <span tal:content="view/articleNumber" />
    <span tal:replace="context/title" />
</h2>
<div class='Content' tal:replace='structure view/articleContent' />
<div class="sectionItem" tal:repeat="ctx view/sortedItems">
    <div tal:replace="structure ctx/@@pagesimplehtml">
      section text
    </div>
</div>

Now, when the MkBook view is called for the ISiteRoot, the view produces a full PDF document from the single HTML document that resulted from the FullPageHTML view.

The only remaining details concern the CSS that tells Prince how to format the document. The rules we use are:

body.plainText pre {
    background-color: #FAFAFA;
    color: #000010;
    max-height: none;
    font-size: 8.5pt;
    font-stretch: narrower;
    width: 95%;
    margin: 0 2%;
    padding: 0.5em;
}

which overrides some defaults to make the background and font more suitable for printed text, and

@page:first { 
    @top { content: normal }
    @bottom-right { content: normal }
}

@page:left { 
    @top { content: string(booktitle) }
    @bottom-left { content: counter(page) }
}

@page:right {
    @top { content: string(chaptertitle) }
    @bottom-right { content: counter(page) }
}

h1.bookTitle { string-set: booktitle content(); }

h2.aTitle {
    string-set: chaptertitle content();
    page-break-before: always;
    page-break-after: avoid;
}

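...which suppresses the running header and page number on the first page and, on later pages, fills in the page number and a running header: the book title on left-hand pages and the current chapter title on right-hand pages. Each article title also starts a new page.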

The produced document carries the Prince logo on the first page, since this is the non-commercial version. Frankly, given the quality of the document, I would not have minded their logo appearing on every other page too. I certainly hope someone will use Prince commercially as a result of this site's advertising.
