Using Pelican as a library

Author

Wasim Lorgat

Published

June 20, 2022

In this post you’ll learn how to use Pelican (a Python static site generator) programmatically rather than through its command-line interface. This will give you a better understanding of how Pelican works internally and enable you to customise it for your needs.

How Pelican works

Pelican’s highest level of abstraction is its command-line interface, which you would typically use as follows:

$ pelican content output -s pelicanconf.py

This would read all articles and pages in the content directory, convert them to HTML, render web pages with the relevant Jinja templates, and write the resulting static website to the output directory.

The rough flow to achieve this is as follows:

1. Instantiate a list of Generators (which house all of the relevant Readers and a Jinja Environment) and a Writer.
2. For each Generator:
   - Call the generate_context method, which reads the input files, converts them to HTML, and adds the outputs to a context dictionary.
   - Call the generate_output method, passing the Writer and context. This gets the relevant Jinja Template from the Environment, renders it with the provided context, and writes the result to the final output directory.
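To make that flow concrete before we touch Pelican itself, here is a stdlib-only toy version. Everything in it (ToyGenerator, ToyWriter, build, the `.txt` inputs) is made up for illustration, the method signatures are simplified, and string.Template stands in for Jinja, so treat it as a sketch of the shape of the flow rather than Pelican's actual code:

```python
import tempfile
from pathlib import Path
from string import Template


class ToyWriter:
    """Hypothetical stand-in for a Writer: renders a template and writes the result."""

    def __init__(self, output_dir):
        self.output_dir = Path(output_dir)

    def write_file(self, name, template, context):
        self.output_dir.mkdir(parents=True, exist_ok=True)
        path = self.output_dir / name
        path.write_text(template.substitute(context))
        return path


class ToyGenerator:
    """Hypothetical stand-in for a Generator: glues reading, templating and writing."""

    def __init__(self, content_dir, template):
        self.content_dir = Path(content_dir)
        self.template = template

    def generate_context(self, context):
        # "Read" every input file and add the results to the context dictionary.
        context["articles"] = [
            {"slug": path.stem, "body": path.read_text().strip()}
            for path in sorted(self.content_dir.glob("*.txt"))
        ]

    def generate_output(self, writer, context):
        # Render one page per article and hand it to the writer.
        return [
            writer.write_file(f"{article['slug']}.html", self.template,
                              {**context, **article})
            for article in context["articles"]
        ]


def build(content_dir, output_dir):
    context = {}
    writer = ToyWriter(output_dir)
    generators = [ToyGenerator(content_dir, Template("<h1>$slug</h1><p>$body</p>"))]
    written = []
    for generator in generators:
        generator.generate_context(context)
        written += generator.generate_output(writer, context)
    return written


# Usage: build a one-page "site" inside a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    content_dir = Path(tmp) / "content"
    content_dir.mkdir()
    (content_dir / "hello.txt").write_text("Hello and welcome!\n")
    for path in build(content_dir, Path(tmp) / "output"):
        print(path.name, "->", path.read_text())
```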

As you can see, Generators are responsible for glueing together the lower-level components: Reader, jinja Template, and Writer. In order to understand each of these components, we’ll reimplement the core logic of a Generator from scratch!

Setup

Start by setting root to the directory of your Pelican website. If you don’t yet have a website, follow Pelican’s informative documentation to get started:

from pathlib import Path

root = Path('..')

Now we can load our pelicanconf.py settings file. Pelican provides a function for this which handles details like applying defaults:

from pelican.settings import read_settings

settings = read_settings(root/'pelicanconf.py')
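If you're curious what "applying defaults" could look like, here is a rough stdlib-only sketch. It is not Pelican's implementation: load_settings and the three entries in DEFAULTS are hypothetical, and the sketch assumes that only UPPERCASE names in the settings file count as settings.

```python
import runpy
import tempfile
from pathlib import Path

# Hypothetical defaults for illustration only; the real list is much longer.
DEFAULTS = {
    "SITENAME": "A Pelican Blog",
    "OUTPUT_PATH": "output",
    "THEME": "simple",
}


def load_settings(path):
    """Run a settings file and overlay its UPPERCASE names on the defaults."""
    namespace = runpy.run_path(str(path))
    overrides = {key: value for key, value in namespace.items() if key.isupper()}
    return {**DEFAULTS, **overrides}


# Usage: a settings file that overrides a single value.
with tempfile.TemporaryDirectory() as tmp:
    conf = Path(tmp) / "toyconf.py"
    conf.write_text("SITENAME = 'My Site'\nhelper = 42\n")
    toy_settings = load_settings(conf)
    print(toy_settings["SITENAME"], toy_settings["OUTPUT_PATH"])
```

Values from the file win over the defaults, and lowercase helpers such as `helper` never reach the resulting dictionary.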

Let’s create a quick blog post for testing. I prefer to write more technical blog posts in Jupyter notebooks but we’ll use markdown here since Pelican supports it natively.

post_filepath = root/'content/2022-06-20-hello-pelican.md'

%%writefile {post_filepath}
Title: Hello Pelican
Slug: hello-pelican
Author: Wasim Lorgat
Date: 2022-06-20
Tags: python, pelican
Category: python

## Welcome

Hello and welcome to our markdown blog post!

Writing ../content/2022-06-20-hello-pelican.md

`Reader`

We’ll start by instantiating a MarkdownReader to read our blog post. We’re using a MarkdownReader because we wrote the post in markdown, but Pelican also provides HTMLReader and RstReader if you prefer those formats.

from pelican.readers import MarkdownReader

reader = MarkdownReader(settings)

The most important part of a Reader is its read method, which accepts a file path and returns the contents of the file in HTML format along with metadata about the file:

content, metadata = reader.read(post_filepath)

… content is a string containing the blog post content converted to HTML. Since this was written in a notebook, we can use an IPython function to render it directly!

from IPython.display import HTML  # IPython.core.display is the deprecated import location
HTML(content)

Welcome

Hello and welcome to our markdown blog post!

… and metadata is a dictionary that describes the file:

metadata

{'title': 'Hello Pelican',
 'slug': 'hello-pelican',
 'author': <Author 'Wasim Lorgat'>,
 'date': SafeDatetime(2022, 6, 20, 0, 0),
 'tags': [<Tag 'python'>, <Tag 'pelican'>],
 'category': <Category 'python'>}
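To see the shape of that contract (a path goes in, a content and metadata pair comes out), here is a toy reader built only on the standard library. ToyReader is hypothetical and deliberately incomplete: it splits the `Key: value` header from the body but does not convert Markdown to HTML or build objects like Author or Tag, so it is a sketch of the idea rather than a replacement for MarkdownReader.

```python
import tempfile
from pathlib import Path


class ToyReader:
    """Hypothetical reader: parses the metadata header, leaves the body as-is."""

    def read(self, source_path):
        lines = Path(source_path).read_text().splitlines()
        metadata = {}
        body_start = len(lines)
        for index, line in enumerate(lines):
            key, sep, value = line.partition(":")
            if not sep or not key.strip():
                # The first line that is not `Key: value` ends the header.
                body_start = index
                break
            key, value = key.strip().lower(), value.strip()
            if key == "tags":
                value = [tag.strip() for tag in value.split(",")]
            metadata[key] = value
        content = "\n".join(lines[body_start:]).strip()
        return content, metadata


# Usage: read a small post from a temporary file.
with tempfile.TemporaryDirectory() as tmp:
    post = Path(tmp) / "hello.md"
    post.write_text("Title: Hello Pelican\nTags: python, pelican\n\n## Welcome\n\nHello!\n")
    toy_content, toy_metadata = ToyReader().read(post)
    print(toy_metadata)
    print(toy_content)
```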

`Writer`

Now that we have the contents of the post in HTML format, we’ll render it into a static web page using a Writer. However, we first need to create an appropriate Jinja Template. Jinja provides the Environment class for sharing configuration and loaders across templates, so we’ll use that here.

Pelican searches for templates in the following order:

Individual template overrides, via settings['THEME_TEMPLATES_OVERRIDES'].
The configured theme, via settings['THEME'].
The default simple theme packaged with Pelican.

We can implement this search order using a FileSystemLoader, housed in an Environment for convenience:

import pelican
from jinja2 import Environment, FileSystemLoader
from pathlib import Path

template_paths = [*(Path(o) for o in settings['THEME_TEMPLATES_OVERRIDES']),
                  Path(settings['THEME'])/'templates',
                  Path(pelican.__file__).parent/'themes/simple/templates']
env = Environment(loader=FileSystemLoader(template_paths),
                  **settings['JINJA_ENVIRONMENT'])
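The rule doing the work here is "first directory that contains the file wins", which is why the order of template_paths matters. A stdlib-only sketch of that rule, using a hypothetical find_template helper (this is not how Jinja is implemented, just the lookup idea):

```python
import tempfile
from pathlib import Path


def find_template(name, search_paths):
    """Return the first existing `directory/name`, or None if no directory has it."""
    for directory in search_paths:
        candidate = Path(directory) / name
        if candidate.is_file():
            return candidate
    return None


# Usage: an override directory shadows the theme for article.html only.
with tempfile.TemporaryDirectory() as tmp:
    overrides = Path(tmp) / "overrides"
    theme = Path(tmp) / "theme"
    overrides.mkdir()
    theme.mkdir()
    (overrides / "article.html").write_text("custom article")
    (theme / "article.html").write_text("theme article")
    (theme / "index.html").write_text("theme index")

    print(find_template("article.html", [overrides, theme]).read_text())
    print(find_template("index.html", [overrides, theme]).read_text())
```

The override is picked for article.html, while index.html falls through to the theme because the override directory doesn't provide it.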

Now we can get the article template:

template = env.get_template('article.html')

The last step of preparation is to create the context dictionary that’s passed through to the Template to render the article:

from pelican.contents import Article

context = settings.copy()
article = Article(content, metadata, settings, post_filepath, context)
article.readtime = {'minutes': 1}  # NOTE: this is a workaround to support the readtime plugin that I use
context['article'] = article

And now we can write the final result!

from pelican.writers import Writer

output_dir = root/'test'
writer = Writer(output_dir, settings)
writer.write_file(Path(post_filepath.name).with_suffix('.html'), template, context)

Let’s read it back in and see what it looks like. We’ll extract only the body using a simple regex. I’d usually recommend Beautiful Soup for parsing HTML, but a regex works fine for our case:

import re

html = (output_dir/'2022-06-20-hello-pelican.html').read_text()
# Non-greedy match across newlines; assumes a bare <body> tag without attributes
body = re.findall(r'<body>(.*?)</body>', html, re.DOTALL)[0].strip()
HTML(body)

Hello Pelican

June 20, 2022 • 1 min read

Welcome

Hello and welcome to our markdown blog post!

The provided templates have added a navigation bar at the top, a title below that, as well as the publication date and estimated reading time. And that’s it, we’ve successfully rendered a blog post web page using Pelican’s low-level components!

Before we finish, let’s clean up the files we made along the way:

import shutil
shutil.rmtree(output_dir, ignore_errors=True)
post_filepath.unlink(missing_ok=True)