When thinking about data hierarchies, that is, information systems whose elements relate to one another hierarchically, a programmer will often reach for an inheritance tree to express that relationship.

Python supports multiple inheritance, which is often used for "mixins": one can add all of the attributes and methods of a superclass to a derived class simply by listing it as a base class.  So, within limits, it may be possible to express the relationship "A is a kind of B, and is also a kind of C" through multiple inheritance.
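As a minimal sketch of the mixin idea, consider the following. The class names Priced, Illustrated and PictureBook are made up for this illustration and are not used elsewhere in the article.

```python
class Priced(object):
    """Mixin: adds a price attribute and a formatting helper."""
    price = 0.0

    def price_label(self):
        return "%.2f" % self.price


class Illustrated(object):
    """Mixin: adds a helper that counts illustrations."""

    def illustration_count(self):
        return len(getattr(self, "illustrations", []))


class PictureBook(Priced, Illustrated):
    """A picture book is a kind of Priced thing AND a kind of Illustrated thing."""

    def __init__(self, price, illustrations):
        self.price = price
        self.illustrations = list(illustrations)


book = PictureBook(9.5, ["cover", "map"])
print(book.price_label(), book.illustration_count())
print(isinstance(book, Priced), isinstance(book, Illustrated))
```

The derived class defines neither helper method itself; both arrive purely by naming the mixins as base classes.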

The ZODB naturally stores information hierarchically, but that hierarchy is not an inheritance structure; it is a parent->child containment relationship.  The type of a parent in a ZODB structure does not dictate the type of its children.

Hierarchies by Inheritance

Let's consider the following hierarchy:

In summary, a publication has a title, date and one or more authors.  An article is a publication with a few additional attributes.  A book is also a kind of publication having a table of contents and ISBN.  And so forth.

Normally, a relational database might map the above types into a set of tables.  One might then write some Python classes to map onto the structure, for example:

class Publication(object):
    title = ""
    date = ""
    authors = []

class Pamphlet(Publication):
    ''' a kind of publication that is handed out to people '''

class Book(Publication):
    ''' a kind of publication that one can buy off the shelf '''
    contents = {}  # Section to page mapping
    ISBN = ""

class Biography(Book):
    ''' A book about yourself or someone else '''

class Novel(Book):
    ''' A story book '''
    foreword = ""

...

Later, the programmer might map some views onto the classes (in other words, the data model) to produce forms for data entry, or to produce reports.  Some web frameworks use class inheritance as a mechanism to define the full set of attributes that provide the fields for an automatically generated entry form.  For example, due to inheritance, we know that a Novel object has a title, a list of authors, a publication date, a table of contents, an ISBN and a foreword section.
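To make that concrete, here is a rough sketch of how a framework could discover the full field list of a class by walking its base classes. The helper field_names is hypothetical and does not come from any particular framework; the classes are trimmed copies of the ones above.

```python
class Publication(object):
    title = ""
    date = ""
    authors = ()


class Book(Publication):
    contents = {}
    ISBN = ""


class Novel(Book):
    foreword = ""


def field_names(cls):
    """Collect public, non-callable class attributes, base classes first."""
    names = []
    # reversed MRO: object, Publication, Book, Novel
    for klass in reversed(cls.__mro__):
        for name, value in vars(klass).items():
            if name.startswith("_") or callable(value):
                continue
            if name not in names:
                names.append(name)
    return names


print(field_names(Novel))
```

A form generator could then iterate over this list to emit one input per field.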

To update the database with a Novel object, the Publication table would first be updated, then the Book table, then the Novel table, using data entered in the form.  Pretty good so far.  Using an object-relational mapper (for example the Django ORM, Storm or SQLAlchemy), the task of creating or updating records in the database can be greatly simplified.  Django takes the approach of directly defining class attributes as schema fields, so that a field can simultaneously describe a form user interface element and a database table column.

Things would break down rather badly if a database schema were to diverge, and no longer match the class hierarchy. So, in the case where a class hierarchy is used to map a database schema, the two would always have to be in synchrony.

Zope Schemas

Zope & Grok do things rather differently.

A Zope Schema is defined as a set of schema fields, collected into a Zope Interface.  For example, one might define:

from zope import schema, component

class IAuthor(component.Interface):
    name = schema.TextLine(title=u'Name', description=u'Author Name')
    surname = schema.TextLine(title=u'Surname', description=u'Author Surname')
    country = schema.TextLine(title=u'Country', description=u'Country of Origin')
    dob = schema.Date(title=u'Date of Birth', description=u'Author birth date')

class IPublication(component.Interface):
    title = schema.TextLine(title=u'Title:', description=u'The title of the publication')
    authors = schema.List(title=u'Author(s)', description=u'A list of authors',
                          value_type=schema.Object(schema=IAuthor), required=False)
    date = schema.Date(title=u'Date:', description=u'Date of Publication')

class IBook(IPublication):
    contents = schema.Dict(title=u'Contents', description=u'Table of Contents',
                           key_type=schema.TextLine(title=u'Section Header'),
                           value_type=schema.Int(title=u'Page Number'))
    isbn = schema.TextLine(title=u'ISBN', description=u'ISBN')

class INovel(IBook):
    foreword = schema.Text(title=u'Foreword', description=u'Foreword')
    

In summary, a Zope schema defines the fields contained in an entity such as a Novel, and inheritance may be used to help define that schema interface.

However, just defining a schema does not in any way describe or constrain the structure of your data model.  We could for example declare a Novel as a single discrete Python class, such as:

import grok

class Novel(object):
    grok.implements(INovel)
    title = u''
    authors = []   # named 'authors' to match the IPublication.authors field
    date = u''
    contents = {}
    isbn = u''
    foreword = u''

If we were to do this, the plain old data class Novel is now associated with the INovel interface, and also any parent interfaces:

>>> INovel.implementedBy(Novel)
True
>>> IBook.implementedBy(Novel)
True
>>> IPublication.implementedBy(Novel)
True

Since various HTML form elements are already associated with Zope schema fields via form libraries such as zope.formlib or z3c.form, it is possible to automatically generate entry forms, add forms or display forms just from the Zope schema.  This is similar in some ways to the approach taken by other web frameworks, except that the zope.component.Interface is completely abstracted from the data storage format.

To allow instances of Novel to persist in a ZODB database, we can derive the class from grok.Model, which in turn derives from persistent.Persistent, the base class required for objects stored in ZODB.

ZODB storage

Recalling that in our schema everything (other than IAuthor) derives from an IPublication, we could feasibly store all data for our publications in a single ZODB folder.  The different Publication specialisations are similar to files of different types in the same file system folder.  How these publications are treated could simply be a function of the type.
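One way to picture "treatment as a function of the type" is type-based dispatch. The sketch below uses functools.singledispatch from the standard library rather than anything from Grok, and the summary function and its handlers are invented for the example.

```python
from functools import singledispatch


class Publication(object):
    def __init__(self, title):
        self.title = title


class Pamphlet(Publication):
    pass


class Book(Publication):
    def __init__(self, title, isbn):
        super(Book, self).__init__(title)
        self.isbn = isbn


class Novel(Book):
    pass


@singledispatch
def summary(item):
    """Fallback for objects that are not publications at all."""
    return "unknown item"


@summary.register(Publication)
def _summary_publication(item):
    return "Publication: %s" % item.title


@summary.register(Book)
def _summary_book(item):
    return "Book: %s (ISBN %s)" % (item.title, item.isbn)


# A single "folder" holding publications of different types.
folder = [Pamphlet("Open Day"), Novel("A Long Story", "000-0")]
for item in folder:
    print(summary(item))
```

Novel has no handler of its own, so it falls back to the nearest registered base class, Book.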

For example, imagine an application with the following simple ZODB structure:

Where app, authors and publications are all containers, this would provide automatic URLs for http://[server]/app, http://[server]/app/authors and http://[server]/app/publications.  In code, this might look something like this:

import grok

class Authors(grok.Container):
    ''' A container for authors '''
    grok.implements(IAuthor)   # for an automatic add form

class Publications(grok.Container):
    ''' A container for publications '''

class App(grok.Application, grok.Container):
    ''' Our main application gets installed by the management interface '''

    def __init__(self):
        super(App, self).__init__()
        self['authors'] = Authors()
        self['publications'] = Publications()

Looking at the container App()['publications'], there is no implied restriction as to what might be stored in the container.  The trick, though, is handling (creating, listing, updating and viewing) data items intelligently.  One might imagine an application which must be able to list all publications regardless of type, but also create new books.  When creating a Novel, there should be entry fields specific to novels that are not present for other book types.  There are various approaches to accomplishing this, and no approach would be "wrong".
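As a rough illustration of listing a mixed container, the sketch below uses a plain dict as a stand-in for the publications container (an assumption made so the example runs without Grok). The titles helper is hypothetical.

```python
class Publication(object):
    def __init__(self, title):
        self.title = title


class Pamphlet(Publication):
    pass


class Book(Publication):
    pass


class Novel(Book):
    pass


# Stand-in for the 'publications' container: name -> object, any type allowed.
publications = {
    "flyer": Pamphlet("Open Day"),
    "atlas": Book("World Atlas"),
    "saga": Novel("A Long Story"),
}


def titles(container, kind=Publication):
    """Return the sorted titles of all items that are instances of `kind`."""
    return sorted(item.title for item in container.values()
                  if isinstance(item, kind))


print(titles(publications))        # every publication, regardless of type
print(titles(publications, Book))  # books only, which includes novels
```

Nothing in the container itself knows about the type hierarchy; the filtering is done entirely by the code that reads it.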

One approach is to add sub-containers for Book, Article and Pamphlet to the Publications container.  One might then add Biography, Novel and Reference sub-containers to Book, and so on.  This has many advantages:

  • The list of items in each container is kept relatively small, which helps efficiency
  • Similar view and model code could be used to represent items in containers
  • URL traversal implicitly selects the publication type
  • It's easy to add new items to each category since you already know the type from its container

On the other hand, this has some disadvantages too when compared to a single Publications folder which contains publications of various types:

  • It's harder to generate a list of all books for example, or all publications in one long sorted list.  It would involve listing each sub-container in turn.
  • It is more difficult to find an arbitrary publication without first knowing its type (and therefore its place in the hierarchy).
  • The data storage structure may constrain or influence the application flow (a very bad idea): e.g. instead of "create biography, novel or reference" it becomes
    • create new
      • publication: book, pamphlet or article
        • book: biography, novel or reference
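The first disadvantage can be seen in a small sketch. Nested dicts stand in for sub-containers here (an assumption for the sake of a runnable example), and walk is a hypothetical helper; producing one flat list means visiting every sub-container in turn.

```python
def walk(container, path=()):
    """Yield (path, item) for every leaf item in nested dict containers."""
    for name, value in sorted(container.items()):
        if isinstance(value, dict):
            # A sub-container: descend into it, extending the path.
            for found in walk(value, path + (name,)):
                yield found
        else:
            yield path + (name,), value


publications = {
    "book": {
        "novel": {"saga": "A Long Story"},
        "biography": {"life": "Someone's Life"},
    },
    "pamphlet": {"flyer": "Open Day"},
}

all_titles = sorted(title for _path, title in walk(publications))
all_paths = ["/".join(path) for path, _title in walk(publications)]
print(all_titles)
print(all_paths)
```

The paths mirror the URLs that traversal would produce, which is the flip side of the same design choice.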

A rule of thumb is to think of ZODB as a data storage system for objects.  While containers generally store items of related classes, it is not required that they do so.  However, structuring one's storage in a logical way aids classification and later searches to find content.  ZODB can store both structured as well as unstructured content.

Compare a persistent object storage container, which may easily store items of varying types, to a relational database table, which always stores the same data columns (although generic views might be accomplished through unused columns which contain null data).  Persistent objects may have logic (i.e. methods) as well as data attributes, and while relational databases may have stored functions, it is not the same thing.  Modelling data relationally does have some nice advantages when it comes to foreign key constraints, triggers, queries, joins and numerous other things for which there are no direct equivalents in object databases.  For example, ensuring uniqueness in an object database might involve an <on_insert> event handler which checks against an index for prior existence; that is extra work which is unnecessary when using a relational database.
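The uniqueness check mentioned above might look roughly like the following. This is a plain-Python sketch and not Grok's event machinery: UniqueContainer is a hypothetical class whose __setitem__ plays the role of the insert handler, and it deliberately ignores deletion and replacement.

```python
class UniqueContainer(object):
    """Dict-like container that rejects items whose key value is already indexed."""

    def __init__(self, unique_attr):
        self._unique_attr = unique_attr
        self._items = {}
        self._index = set()  # values of unique_attr seen so far

    def __setitem__(self, name, item):
        value = getattr(item, self._unique_attr)
        if value in self._index:
            raise ValueError("duplicate %s: %r" % (self._unique_attr, value))
        self._index.add(value)
        self._items[name] = item

    def __len__(self):
        return len(self._items)


class Book(object):
    def __init__(self, title, isbn):
        self.title = title
        self.isbn = isbn


books = UniqueContainer("isbn")
books["first"] = Book("First Edition", "000-1")
try:
    books["copy"] = Book("Second Copy", "000-1")
except ValueError as err:
    print(err)
print(len(books))
```

In a relational database the same guarantee is a single UNIQUE constraint on the column.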

Relational mappings

Typically, one will find relational databases do not easily map to object oriented designs.

Other than when working with object-relational databases such as PostgreSQL, a translation of our schema to a set of related tables does not provide us with a mechanism for inheritance.  Rather, the fact that a Publication has a list of Authors is described through a foreign key relationship between the two tables, and Book and Publication are related in the same way.

Rather than using inheritance to describe the fact that a Book is a Publication with an ISBN and table of contents, or that a Novel is a Book with a foreword, it is perhaps a better approach, at least from the point of view of relational databases, to describe a Novel as having a Book, and a Book as having a Publication. This changes our schema slightly:

from zope import schema, component

class IAuthor(component.Interface):
    name = schema.TextLine(title=u'Name', description=u'Author Name')
    surname = schema.TextLine(title=u'Surname', description=u'Author Surname')
    country = schema.TextLine(title=u'Country', description=u'Country of Origin')
    dob = schema.Date(title=u'Date of Birth', description=u'Author birth date')

class IPublication(component.Interface):
    title = schema.TextLine(title=u'Title:', description=u'The title of the publication')
    authors = schema.List(title=u'Author(s)', description=u'A list of authors',
                          value_type=schema.Object(schema=IAuthor), required=False)
    date = schema.Date(title=u'Date:', description=u'Date of Publication')

class IBook(component.Interface):
    publication = schema.Object(title=u'Publication', schema=IPublication)
    contents = schema.Dict(title=u'Contents', description=u'Table of Contents',
                           key_type=schema.TextLine(title=u'Section Header'),
                           value_type=schema.Int(title=u'Page Number'))
    isbn = schema.TextLine(title=u'ISBN', description=u'ISBN')

class INovel(component.Interface):
    book = schema.Object(title=u'Book', schema=IBook)
    foreword = schema.Text(title=u'Foreword', description=u'Foreword')
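Stripped of the schema machinery, the "has a" arrangement looks like this in plain Python. It is a sketch only; the constructor signatures are assumptions made for the example.

```python
class Publication(object):
    def __init__(self, title, date, authors=None):
        self.title = title
        self.date = date
        self.authors = list(authors or [])


class Book(object):
    """A Book HAS a Publication rather than BEING one."""

    def __init__(self, publication, isbn, contents=None):
        self.publication = publication
        self.isbn = isbn
        self.contents = dict(contents or {})


class Novel(object):
    """A Novel HAS a Book."""

    def __init__(self, book, foreword=""):
        self.book = book
        self.foreword = foreword


novel = Novel(Book(Publication("A Long Story", "2001-01-01"), "000-0"), "Once...")
print(novel.book.publication.title)  # reached by following references
print(isinstance(novel, Book))       # no inheritance relationship exists
```

The title is now found by following two references, which is the object-level equivalent of two joins.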

If our data were stored in an RDBMS such as PostgreSQL, MySQL or even SQLite, one could define database tables as follows:

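import grok
from megrok import rdb

# 'metadata' is assumed to be a SQLAlchemy MetaData instance shared by all
# of the models below, for example: metadata = rdb.MetaData()

class Author(rdb.Model):
    grok.implements(IAuthor)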
    rdb.metadata(metadata)
    rdb.reflected()

class Publication(rdb.Model):
    grok.implements(IPublication)
    rdb.metadata(metadata)
    rdb.reflected()
    authors = rdb.relation('Author', backref='publication')

class Book(rdb.Model):
    grok.implements(IBook)
    rdb.metadata(metadata)
    rdb.reflected()
    publication = rdb.relation('Publication', backref='book')

class Novel(rdb.Model):
    grok.implements(INovel)
    rdb.metadata(metadata)
    rdb.reflected()
    book = rdb.relation('Book', backref='novel')

Now assuming the schema were implemented in the database, the attributes would be automatically read in through reflection, and relations would be defined as described in the database. 

So the Publication class would have fields defined for the attributes title, authors and date.  Also, because Book refers to Publication and specifies a 'book' back reference, there will be a Publication.book attribute containing a list of books.  These attributes map directly to the database, so updating an attribute will result in the appropriate SQL statements being issued to accomplish the change.
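At the SQL level, a back reference is just the foreign key followed in the other direction. The sketch below uses sqlite3 directly instead of megrok.rdb; the publication_id column and the books_for helper are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE publication (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE book (
        id INTEGER PRIMARY KEY,
        publication_id INTEGER REFERENCES publication(id),
        isbn TEXT
    );
    INSERT INTO publication (id, title) VALUES (1, 'A Long Story');
    INSERT INTO book (publication_id, isbn) VALUES (1, '000-1');
    INSERT INTO book (publication_id, isbn) VALUES (1, '000-2');
""")


def books_for(conn, publication_id):
    """Follow the book -> publication foreign key backwards."""
    rows = conn.execute(
        "SELECT isbn FROM book WHERE publication_id = ? ORDER BY isbn",
        (publication_id,))
    return [isbn for (isbn,) in rows]


print(books_for(conn, 1))
```

An ORM relation with a back reference exposes the result of such a query as an ordinary attribute on the parent object.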

Using object-relational mappings might seem like a step backward when compared to a true object database such as ZODB, but there are many advantages, not least the ability to access your database from code written in languages other than Python.  It is always possible to extend your mapped classes with your own methods and even non-persistent attributes, and so one gains much flexibility and portability by using an ORM such as SQLAlchemy.

Grok 4 Noobs

Interfaces vs. Inheritance: A Real World Example