Validating SQL and NoSQL data models with Python and Cerberus.

Published in Analytics Vidhya, Sep 4, 2019

In this short, hands-on tutorial I will give you a very basic introduction to using the great Cerberus library to define and validate data models in Python. I will use the PythonOnWheels framework to generate the models easily, but the Cerberus schemas and validation also work without PythonOnWheels.

So what is this Cerberus thing?

Cerberus is a lightweight and extensible data validation library for Python.

It makes defining schemas, including attribute names, data types and validation rules, super simple. This is a really handy and pythonic way to deal with data in your application.

And what is this PythonOnWheels thing?

PythonOnWheels is a layer of glue around some great existing libraries/modules and tools to make your Python life easier for the boring tasks.

And because Cerberus is pythonic, super easy to use and very helpful, PythonOnWheels uses Cerberus schemas for all model definitions, whether they are SQL or NoSQL.

This not only gives you the benefit of using the same model definition schema for NoSQL and all SQL databases, it also means that you have validation on board for every model and can easily switch from SQL to NoSQL.

To follow the hands-on part you can use PythonOnWheels.

If you don’t know how to do that:

install PythonOnWheels (read the super short getting_started)
generate a new app (I called mine testapp for this tutorial)

You can also follow the tutorial without PythonOnWheels.

In this case you need to install Cerberus (and create a Validator, see below):

$ pip install cerberus

You can then validate the schemas like this (just cut and paste them into your Python interpreter):

>>> schema={ .... take some sample schema from below ... }
>>> d={}                  # create a dict (instead of a real model)
>>> d["title"] = "test"   # set some attributes
>>> from cerberus import Validator
>>> v=Validator()
>>> v.schema=schema
>>> v.validate(d)
True
>>>

Let’s start with the action:

For this tutorial I generated a TinyDB (NoSQL DB) model like this:

python generate_model.py -n todo -t tinydb

TinyDB is a super small, file-based document DB (NoSQL). You can think of it as the SQLite of NoSQL ;).

And this is what the generated model schema looks like:

# 
# TinyDB Model:  Todo 
# 
from testapp.models.tinydb.tinymodel import TinyModel 
class Todo(TinyModel): 
    # 
    # Use the cerberus schema style  
    # 
    schema = { 
        'title' :   { 'type' : 'string', 'maxlength' : 35 }, 
        'text'  :   { 'type' : 'string' }, 
        'tags'  :   { 'type' : 'list', "default" : [] }, 
        "votes" :   { "type" : "integer", "default" : 0 }    
        }

Remark: a PythonOnWheels SQL model would use the exact same definition syntax.

So with our Cerberus schema we can:

define our attributes and their data types (e.g. “title”, “type”: “string”)
define the attributes’ constraints and validation rules (e.g. “maxlength”: 35)
do all of this in a concise and proven manner using a widely used Python library
use the same model definition schema for SQL and NoSQL models

What PythonOnWheels does behind the scenes

Based on the Cerberus schemas, PythonOnWheels generates the actual model representation needed for the different databases for you. So when you define a model like the one above, PoW generates the right MongoDB, TinyDB, SQL (SQLAlchemy) or Elastic schema in the background, which is what is actually used to work with the database of your choice.
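To give a rough feeling for what such a translation involves, here is a deliberately tiny sketch that turns a Cerberus-style schema into a SQL CREATE TABLE statement. This is not PythonOnWheels code: the function name schema_to_create_table, the type mapping and the generated SQL are all assumptions made up for this illustration.

```python
# Hypothetical mapping from Cerberus type names to SQL column types.
SQL_TYPES = {
    "string": "TEXT",
    "integer": "INTEGER",
    "float": "REAL",
    "boolean": "BOOLEAN",
}


def schema_to_create_table(table, schema):
    """Build a CREATE TABLE statement from a Cerberus-style schema dict."""
    columns = []
    for name, rules in schema.items():
        sql_type = SQL_TYPES[rules["type"]]
        # A maxlength rule on a string becomes a VARCHAR length.
        if rules["type"] == "string" and "maxlength" in rules:
            sql_type = f"VARCHAR({rules['maxlength']})"
        columns.append(f"{name} {sql_type}")
    return f"CREATE TABLE {table} ({', '.join(columns)})"


schema = {
    "title": {"type": "string", "maxlength": 35},
    "text": {"type": "string"},
    "votes": {"type": "integer", "default": 0},
}

statement = schema_to_create_table("todo", schema)
print(statement)
```

A real implementation has to deal with many more types, defaults and primary keys, which is exactly the boring work the framework takes off your hands.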

But since this is all about validation, we don’t care about that in the following.

So let’s use the validation

Let’s look again at our model’s schema definition, because the validation uses our definition to check if an actual model instance is correct or not.

Adapt the schema to look like this:

schema = {         
        'title' :   { 'type' : 'string', 'maxlength' : 35 }, 
        'text'  :   { 'type' : 'string' }, 
        "status":   { "type" : "string", "default" : "new",   
                      "allowed" : ["new", "wip", "done", "hold"]}, 
        "votes":   { "type" : "integer", "default" : 0 }   
    }

To make this easy, just copy and replace the whole schema.

The only changes are that tags is replaced by status, which is now a string with the default “new”, and that the rule “allowed” : [“new”, “wip”, “done”, “hold”] is added.

The schema is defined in models/tinydb/todo.py.

Attribute Names and Types

You can see that we defined four attributes, named title, text, status and votes. Additionally we also defined their types. Three of them are strings; only the votes attribute is of type integer.

Attribute constraints

For some of the attributes, in this case title and status, we also defined constraints. Constraints narrow down the values that are allowed for an attribute of a given type.

For the title attribute the type has to be a string with the constraint that it may not be longer than 35 characters.

The status attribute has to be of type string as well, but it also only validates if the string is one of the allowed: “new”, “wip”, “done” or “hold”.

Since we use Cerberus, you can rely on many more constraint definitions. Just read the full validation rules docs.

Let’s test a real model instance

Fire up your python interpreter and import your model. If you don’t know how to generate a PythonOnWheels app read the 2 Minute intro. See above how to generate a model.

>>> from testapp.models.tinydb.todo import Todo

Create an instance

>>> t=Todo()

Let us review the schema for the status attribute:

"status":   { "type" : "string", "default" : "new", "allowed" : ["new", "wip", "done", "hold"]},

As you know, this defines status to be of type string AND only accepts the allowed values, which are in this case: new, wip, done and hold. Any other value will fail validation.

Let’s see what the t instance looks like by default:

>>> t 
{ '_uuid': '', 
 'created_at': datetime.datetime(2019, 9, 4, 21, 25, 10, 6827), 
 'id': '', 
 'last_updated': datetime.datetime(2019, 9, 4, 21, 25, 10, 6827), 
 'status': 'new', 
 'text': '', 
 'title': '', 
 'votes': 0}

You can see that PythonOnWheels added some additional attributes (created_at, last_updated and _uuid) and that the default for status is “new” and for votes is 0, just as defined.

So let’s validate the model to see if it’s accepted

>>> t.validate()
True

The validate() method of the model returns True if validation was successful and False otherwise.

Let’s change some attributes

>>> t.title="123456789012345678901234567890123456"
>>> t.status="yihaa"

Validate the model again to see if it’s accepted

>>> t.validate()
False

Ok, this time validation failed. So how can we see what exactly went wrong?

Check the validation result

>>> t.validator.errors
{'status': ['unallowed value yihaa'], 'title': ['max length is 35']}

Just access the errors attribute of the model’s validator to see which validation rules failed. This is also a super nice way to automatically return simple error messages, e.g. for a web form or in a JSON response.
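Because the errors are a plain dict of lists, turning them into a JSON response body takes only a few lines. The sketch below uses the errors from the failed validation above; the function name error_response and the shape of the payload are just assumptions for this example, not something PythonOnWheels or Cerberus prescribes.

```python
import json

# The errors dict from the failed validation above.
errors = {"status": ["unallowed value yihaa"], "title": ["max length is 35"]}


def error_response(errors):
    """Wrap validation errors in a JSON string (hypothetical response format)."""
    payload = {"success": False, "errors": errors}
    return json.dumps(payload, sort_keys=True)


body = error_response(errors)
print(body)
```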

Let’s now fix the errors to see if we can validate to True

Change status to something allowed, e.g. “done”
Change the title to something with at most 35 characters.

>>> t.title="test"
>>> t.status="done"

Let’s check it again

>>> t.validate() 
True

Superb, now our todo passes the validation.

This is it. We defined models, attribute types and validation rules (constraints) and validated them. We also learned how to access the concrete errors we made.

You can do a lot more with Cerberus. And you can do a lot more with PythonOnWheels. But for my use-cases I mostly need the basic stuff and this is already very helpful.

Summary

I hope you agree that using Cerberus schemas makes it really easy to define database models in an elegant way but also gives us a lot of benefit with validation as well.

The schema syntax is very straightforward: it uses the data types available in Python and easy-to-remember constraints like “allowed”, “maxlength”, “anyof” and so on. See the Cerberus validation rules documentation.

So finally we can now save the model safely to the DB

This is pretty easy in PythonOnWheels. Just do an upsert() and go.

>>> t.upsert()
insert, new eid: 1

Done! ;)

If you want to read more on data, database and model handling in PythonOnWheels, I would recommend reading these short hands-on tutorials (each takes about 5–10 minutes):

Generating models (SQL, NoSQL)
Or check the Models section in the Documentation

Hope you enjoyed the tutorial and PythonOnWheels. If you have any questions or remarks, or if you find errors, you can open an issue on GitHub or tweet to @pythononwheels.