Models
The primary means of defining objects in pydantic is via models (models are simply classes which inherit from BaseModel).
You can think of models as similar to types in strictly typed languages, or as the requirements of a single endpoint in an API.
Untrusted data can be passed to a model, and after parsing and validation pydantic guarantees that the fields of the resultant model instance will conform to the field types defined on the model.
Note
pydantic is primarily a parsing library, not a validation library. Validation is a means to an end: building a model which conforms to the types and constraints provided.
In other words, pydantic guarantees the types and constraints of the output model, not the input data.
This might sound like an esoteric distinction, but it is not. If you're unsure what this means or how it might affect your usage, you should read the section about Data Conversion below.
Basic model usage
    from pydantic import BaseModel

    class User(BaseModel):
        id: int
        name = 'Jane Doe'
User here is a model with two fields: id, which is an integer and is required, and name, which is a string and is not required (it has a default value). The type of name is inferred from the default value, so a type annotation is not required (however, note this warning about field order when some fields do not have type annotations).
    user = User(id='123')
user here is an instance of User. Initialisation of the object performs all parsing and validation; if no ValidationError is raised, you know the resulting model instance is valid.
    assert user.id == 123
Fields of a model can be accessed as normal attributes of the user object. The string '123' has been cast to an int as per the field type.
    assert user.name == 'Jane Doe'
name wasn't set when user was initialised, so it has the default value.
    assert user.__fields_set__ == {'id'}
__fields_set__ holds the names of the fields which were supplied when user was initialised.
    assert user.dict() == dict(user) == {'id': 123, 'name': 'Jane Doe'}
Either .dict() or dict(user) will provide a dict of fields, but .dict() can take numerous other arguments.
    user.id = 321
    assert user.id == 321
This model is mutable so field values can be changed.
Model properties
The example above only shows the tip of the iceberg of what models can do. Models possess the following methods and attributes:
- dict(): returns a dictionary of the model's fields and values; cf. exporting models
- json(): returns a JSON string representation of dict(); cf. exporting models
- copy(): returns a copy (by default, a shallow copy) of the model; cf. exporting models
- parse_obj(): a utility for loading any object into a model with error handling if the object is not a dictionary; cf. helper functions
- parse_raw(): a utility for loading strings of numerous formats; cf. helper functions
- parse_file(): like parse_raw() but for files; cf. helper functions
- from_orm(): loads data into a model from an arbitrary class; cf. ORM mode
- schema(): returns a dictionary representing the model as JSON Schema; cf. Schema
- schema_json(): returns a JSON string representation of schema(); cf. Schema
- construct(): a class method for creating models without running validation; cf. Creating models without validation
- __fields_set__: set of names of fields which were set when the model instance was initialised
- __fields__: a dictionary of the model's fields
- __config__: the configuration class for the model; cf. model config
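As a rough illustration of a few of the methods and attributes listed above, here is a minimal sketch on a hypothetical Point model. The import fallback is an assumption: it lets the 1.x API described on this page resolve where pydantic 2.x exposes it as pydantic.v1.

```python
try:  # assumption: pydantic 2.x exposes the 1.x API under pydantic.v1
    from pydantic.v1 import BaseModel
except ImportError:  # plain pydantic 1.x
    from pydantic import BaseModel


class Point(BaseModel):  # hypothetical model, for illustration only
    x: int
    y: int = 0


p = Point.parse_obj({'x': '1'})        # like Point(**data), but takes a dict
as_dict = p.dict()                     # plain dict of field values
as_json = p.json()                     # JSON string built from the same data
round_trip = Point.parse_raw(as_json)  # parse the JSON string back into a model
q = p.copy(update={'y': 5})            # new instance; p itself is unchanged
field_names = list(Point.__fields__)   # field names in definition order
schema = Point.schema()                # JSON Schema as a plain dict
```

Each call returns a new object, so p can be compared with the results afterwards.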
Recursive Models
More complex hierarchical data structures can be defined using models themselves as types in annotations.
    from typing import List
    from pydantic import BaseModel

    class Foo(BaseModel):
        count: int
        size: float = None

    class Bar(BaseModel):
        apple = 'x'
        banana = 'y'

    class Spam(BaseModel):
        foo: Foo
        bars: List[Bar]

    m = Spam(foo={'count': 4}, bars=[{'apple': 'x1'}, {'apple': 'x2'}])
    print(m)
    #> foo=Foo(count=4, size=None) bars=[Bar(apple='x1', banana='y'),
    #> Bar(apple='x2', banana='y')]
    print(m.dict())
    """
    {
        'foo': {'count': 4, 'size': None},
        'bars': [
            {'apple': 'x1', 'banana': 'y'},
            {'apple': 'x2', 'banana': 'y'},
        ],
    }
    """
(This script is complete, it should run "as is")
For self-referencing models, see postponed annotations.
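A minimal sketch of a self-referencing model, assuming the string forward reference plus update_forward_refs() approach from pydantic 1.x. The Node model is hypothetical, and the import fallback is an assumption so the 1.x API also resolves under pydantic 2.x as pydantic.v1.

```python
from typing import List

try:  # assumption: pydantic 2.x exposes the 1.x API under pydantic.v1
    from pydantic.v1 import BaseModel
except ImportError:  # plain pydantic 1.x
    from pydantic import BaseModel


class Node(BaseModel):  # hypothetical tree node, for illustration only
    value: int
    children: List['Node'] = []  # 'Node' is a forward reference to this class


# resolve the 'Node' reference now that the class exists
Node.update_forward_refs()

tree = Node(
    value=1,
    children=[{'value': 2}, {'value': 3, 'children': [{'value': 4}]}],
)
child_values = [child.value for child in tree.children]
```

Nested dicts are parsed into Node instances at every level, just as with the separate Foo and Bar models above.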
ORM Mode (aka Arbitrary Class Instances)
Pydantic models can be created from arbitrary class instances to support models that map to ORM objects.
To do this:
- The Config property orm_mode must be set to True.
- The special constructor from_orm must be used to create the model instance.
The example here uses SQLAlchemy, but the same approach should work for any ORM.
    from typing import List
    from sqlalchemy import Column, Integer, String
    from sqlalchemy.dialects.postgresql import ARRAY
    from sqlalchemy.ext.declarative import declarative_base
    from pydantic import BaseModel, constr

    Base = declarative_base()

    class CompanyOrm(Base):
        __tablename__ = 'companies'
        id = Column(Integer, primary_key=True, nullable=False)
        public_key = Column(String(20), index=True, nullable=False, unique=True)
        name = Column(String(63), unique=True)
        domains = Column(ARRAY(String(255)))

    class CompanyModel(BaseModel):
        id: int
        public_key: constr(max_length=20)
        name: constr(max_length=63)
        domains: List[constr(max_length=255)]

        class Config:
            orm_mode = True

    co_orm = CompanyOrm(
        id=123,
        public_key='foobar',
        name='Testing',
        domains=['example.com', 'foobar.com']
    )
    print(co_orm)
    #> <models_orm_mode.CompanyOrm object at 0x7ff4b9ef7940>
    co_model = CompanyModel.from_orm(co_orm)
    print(co_model)
    #> id=123 public_key='foobar' name='Testing' domains=['example.com',
    #> 'foobar.com']
(This script is complete, it should run "as is")
ORM instances will be parsed with from_orm recursively as well as at the top level.
Here a vanilla class is used to demonstrate the principle, but any ORM class could be used instead.
    from typing import List
    from pydantic import BaseModel

    class PetCls:
        def __init__(self, *, name: str, species: str):
            self.name = name
            self.species = species

    class PersonCls:
        def __init__(self, *, name: str, age: float = None, pets: List[PetCls]):
            self.name = name
            self.age = age
            self.pets = pets

    class Pet(BaseModel):
        name: str
        species: str

        class Config:
            orm_mode = True

    class Person(BaseModel):
        name: str
        age: float = None
        pets: List[Pet]

        class Config:
            orm_mode = True

    bones = PetCls(name='Bones', species='dog')
    orion = PetCls(name='Orion', species='cat')
    anna = PersonCls(name='Anna', age=20, pets=[bones, orion])
    anna_model = Person.from_orm(anna)
    print(anna_model)
    #> name='Anna' age=20.0 pets=[Pet(name='Bones', species='dog'),
    #> Pet(name='Orion', species='cat')]
(This script is complete, it should run "as is")
Arbitrary classes are processed by pydantic using the GetterDict class (see utils.py), which attempts to provide a dictionary-like interface to any class. You can customise how this works by setting your own sub-class of GetterDict as the value of Config.getter_dict (see config).
You can also customise class validation using root_validators with pre=True. In this case your validator function will be passed a GetterDict instance which you may copy and modify.
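A minimal sketch of a custom GetterDict sub-class, assuming the pydantic 1.x layout in which GetterDict keeps the wrapped object on self._obj and field values are looked up through get(). LegacyUser, RENAMES and LegacyUserGetter are hypothetical names, and the import fallback is an assumption so the 1.x API also resolves under pydantic 2.x as pydantic.v1.

```python
from typing import Any

try:  # assumption: pydantic 2.x exposes the 1.x API under pydantic.v1
    from pydantic.v1 import BaseModel
    from pydantic.v1.utils import GetterDict
except ImportError:  # plain pydantic 1.x
    from pydantic import BaseModel
    from pydantic.utils import GetterDict


class LegacyUser:  # hypothetical plain class whose attribute names differ from the model
    def __init__(self, user_id: int, full_name: str):
        self.user_id = user_id
        self.full_name = full_name


# model field name -> attribute name on the wrapped object
RENAMES = {'id': 'user_id', 'name': 'full_name'}


class LegacyUserGetter(GetterDict):
    def get(self, key: Any, default: Any = None) -> Any:
        # translate the field name, then read the attribute from the wrapped object
        return getattr(self._obj, RENAMES.get(key, key), default)


class User(BaseModel):
    id: int
    name: str

    class Config:
        orm_mode = True
        getter_dict = LegacyUserGetter


user = User.from_orm(LegacyUser(7, 'Ada'))
```

Only get() is overridden here; other dictionary-style access on the getter still goes through the default implementation.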
Error Handling
pydantic will raise ValidationError whenever it finds an error in the data it's validating.
Note
Validation code should not raise ValidationError itself, but rather raise ValueError, TypeError or AssertionError (or subclasses of ValueError or TypeError) which will be caught and used to populate ValidationError.
One exception will be raised regardless of the number of errors found; that ValidationError will contain information about all the errors and how they happened.
You can access these errors in several ways:
- e.errors(): returns a list of errors found in the input data.
- e.json(): returns a JSON representation of errors.
- str(e): returns a human readable representation of the errors.
Each error object contains:
- loc: the error's location as a list. The first item in the list will be the field where the error occurred, and if the field is a sub-model, subsequent items will be present to indicate the nested location of the error.
- type: a computer-readable identifier of the error type.
- msg: a human readable explanation of the error.
- ctx: an optional object which contains values required to render the error message.
As a demonstration:
    from typing import List
    from pydantic import BaseModel, ValidationError, conint

    class Location(BaseModel):
        lat = 0.1
        lng = 10.1

    class Model(BaseModel):
        is_required: float
        gt_int: conint(gt=42)
        list_of_ints: List[int] = None
        a_float: float = None
        recursive_model: Location = None

    data = dict(
        list_of_ints=['1', 2, 'bad'],
        a_float='not a float',
        recursive_model={'lat': 4.2, 'lng': 'New York'},
        gt_int=21,
    )

    try:
        Model(**data)
    except ValidationError as e:
        print(e)
    """
    5 validation errors for Model
    is_required
      field required (type=value_error.missing)
    gt_int
      ensure this value is greater than 42 (type=value_error.number.not_gt; limit_value=42)
    list_of_ints -> 2
      value is not a valid integer (type=type_error.integer)
    a_float
      value is not a valid float (type=type_error.float)
    recursive_model -> lng
      value is not a valid float (type=type_error.float)
    """

    try:
        Model(**data)
    except ValidationError as e:
        print(e.json())
    """
    [
      {
        "loc": ["is_required"],
        "msg": "field required",
        "type": "value_error.missing"
      },
      {
        "loc": ["gt_int"],
        "msg": "ensure this value is greater than 42",
        "type": "value_error.number.not_gt",
        "ctx": {"limit_value": 42}
      },
      {
        "loc": ["list_of_ints", 2],
        "msg": "value is not a valid integer",
        "type": "type_error.integer"
      },
      {
        "loc": ["a_float"],
        "msg": "value is not a valid float",
        "type": "type_error.float"
      },
      {
        "loc": ["recursive_model", "lng"],
        "msg": "value is not a valid float",
        "type": "type_error.float"
      }
    ]
    """
(This script is complete, it should run "as is". json() has indent=2 set by default, but I've tweaked the JSON here and below to make it slightly more concise.)
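The error objects can also be consumed programmatically. The following is a minimal sketch that flattens e.errors() into a mapping from location to message, for instance to build an API response. The Item model and the dotted-path convention are hypothetical choices, and the import fallback is an assumption so the 1.x API also resolves under pydantic 2.x as pydantic.v1.

```python
try:  # assumption: pydantic 2.x exposes the 1.x API under pydantic.v1
    from pydantic.v1 import BaseModel, ValidationError
except ImportError:  # plain pydantic 1.x
    from pydantic import BaseModel, ValidationError


class Item(BaseModel):  # hypothetical model, for illustration only
    name: str
    price: float


try:
    Item(price='free')  # name is missing and price is not a float
except ValidationError as e:
    # join each loc tuple into a dotted path and pair it with its message
    problems = {
        '.'.join(str(part) for part in err['loc']): err['msg']
        for err in e.errors()
    }
    error_types = [err['type'] for err in e.errors()]
```

The type values are the stable, computer-readable identifiers; msg is intended for people.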
Custom Errors
In your custom data types or validators you should use ValueError, TypeError or AssertionError to raise errors.
See validators for more details on use of the @validator decorator.
    from pydantic import BaseModel, ValidationError, validator

    class Model(BaseModel):
        foo: str

        @validator('foo')
        def name_must_contain_space(cls, v):
            if v != 'bar':
                raise ValueError('value must be "bar"')
            return v

    try:
        Model(foo='ber')
    except ValidationError as e:
        print(e.errors())
    """
    [
        {
            'loc': ('foo',),
            'msg': 'value must be "bar"',
            'type': 'value_error',
        },
    ]
    """
(This script is complete, it should run "as is")
You can also define your own error classes, which can specify a custom error code, message template, and context:
    from pydantic import BaseModel, PydanticValueError, ValidationError, validator

    class NotABarError(PydanticValueError):
        code = 'not_a_bar'
        msg_template = 'value is not "bar", got "{wrong_value}"'

    class Model(BaseModel):
        foo: str

        @validator('foo')
        def name_must_contain_space(cls, v):
            if v != 'bar':
                raise NotABarError(wrong_value=v)
            return v

    try:
        Model(foo='ber')
    except ValidationError as e:
        print(e.json())
    """
    [
      {
        "loc": ["foo"],
        "msg": "value is not \"bar\", got \"ber\"",
        "type": "value_error.not_a_bar",
        "ctx": {"wrong_value": "ber"}
      }
    ]
    """
(This script is complete, it should run "as is")
Helper Functions
Pydantic provides three classmethod helper functions on models for parsing data:
- parse_obj: this is very similar to the __init__ method of the model, except it takes a dict rather than keyword arguments. If the object passed is not a dict, a ValidationError will be raised.
- parse_raw: this takes a str or bytes and parses it as JSON, then passes the result to parse_obj. Parsing pickle data is also supported by setting the content_type argument appropriately.
- parse_file: this reads a file and passes the contents to parse_raw. If content_type is omitted, it is inferred from the file's extension.
    import pickle
    from datetime import datetime
    from pydantic import BaseModel, ValidationError

    class User(BaseModel):
        id: int
        name = 'John Doe'
        signup_ts: datetime = None

    m = User.parse_obj({'id': 123, 'name': 'James'})
    print(m)
    #> id=123 signup_ts=None name='James'

    try:
        User.parse_obj(['not', 'a', 'dict'])
    except ValidationError as e:
        print(e)
    """
    1 validation error for User
    __root__
      User expected dict not list (type=type_error)
    """

    # assumes json as no content type passed
    m = User.parse_raw('{"id": 123, "name": "James"}')
    print(m)
    #> id=123 signup_ts=None name='James'

    pickle_data = pickle.dumps({
        'id': 123,
        'name': 'James',
        'signup_ts': datetime(2017, 7, 14)
    })
    m = User.parse_raw(pickle_data, content_type='application/pickle', allow_pickle=True)
    print(m)
    #> id=123 signup_ts=datetime.datetime(2017, 7, 14, 0, 0) name='James'
(This script is complete, it should run "as is")
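The script above does not exercise parse_file, so here is a minimal sketch of it using a temporary file. The file name user.json is a hypothetical choice, and the import fallback is an assumption so the 1.x API also resolves under pydantic 2.x as pydantic.v1.

```python
import json
import tempfile
from pathlib import Path

try:  # assumption: pydantic 2.x exposes the 1.x API under pydantic.v1
    from pydantic.v1 import BaseModel
except ImportError:  # plain pydantic 1.x
    from pydantic import BaseModel


class User(BaseModel):
    id: int
    name: str = 'John Doe'


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / 'user.json'  # hypothetical file name
    path.write_text(json.dumps({'id': 123, 'name': 'James'}))
    # no content_type is passed, so the format is inferred from the .json extension
    user = User.parse_file(path)
```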
Warning
To quote the official pickle docs, "The pickle module is not secure against erroneous or maliciously constructed data. Never unpickle data received from an untrusted or unauthenticated source."
Info
Because it can result in arbitrary code execution, as a security measure, you need to explicitly pass allow_pickle to the parsing function in order to load pickle data.
Creating models without validation
pydantic also provides the construct() method, which allows models to be created without validation. This can be useful when data has already been validated or comes from a trusted source and you want to create a model as efficiently as possible (construct() is generally around 30x faster than creating a model with full validation).
Warning
construct() does not do any validation, meaning it can create models which are invalid. You should only ever use the construct() method with data which has already been validated, or which you trust.
    from pydantic import BaseModel

    class User(BaseModel):
        id: int
        age: int
        name: str = 'John Doe'

    original_user = User(id=123, age=32)

    user_data = original_user.dict()
    print(user_data)
    #> {'id': 123, 'age': 32, 'name': 'John Doe'}
    fields_set = original_user.__fields_set__
    print(fields_set)
    #> {'age', 'id'}

    # ...
    # pass user_data and fields_set to RPC or save to the database etc.
    # ...

    # you can then create a new instance of User without
    # re-running validation which would be unnecessary at this point:
    new_user = User.construct(_fields_set=fields_set, **user_data)
    print(repr(new_user))
    #> User(name='John Doe', id=123, age=32)
    print(new_user.__fields_set__)
    #> {'age', 'id'}

    # construct can be dangerous, only use it with validated data!:
    bad_user = User.construct(id='dog')
    print(repr(bad_user))
    #> User(name='John Doe', id='dog')
(This script is complete, it should run "as is")
The _fields_set keyword argument to construct() is optional, but allows you to be more precise about which fields were originally set and which weren't. If it's omitted, __fields_set__ will just be the keys of the data provided.
For example, in the example above, if _fields_set was not provided, new_user.__fields_set__ would be {'id', 'age', 'name'}.
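A minimal sketch comparing the two cases side by side. The use of dict(exclude_unset=True) to show why the distinction matters is an addition that assumes a pydantic 1.x release providing that argument, and the import fallback is an assumption so the 1.x API also resolves under pydantic 2.x as pydantic.v1.

```python
try:  # assumption: pydantic 2.x exposes the 1.x API under pydantic.v1
    from pydantic.v1 import BaseModel
except ImportError:  # plain pydantic 1.x
    from pydantic import BaseModel


class User(BaseModel):
    id: int
    age: int
    name: str = 'John Doe'


original = User(id=123, age=32)
data = original.dict()  # includes the default value for name

# _fields_set passed explicitly: only id and age count as "set"
with_set = User.construct(_fields_set=original.__fields_set__, **data)

# _fields_set omitted: every key in data counts as "set", including name
without_set = User.construct(**data)

# assumption: exclude_unset is available in your pydantic 1.x release
only_set_fields = with_set.dict(exclude_unset=True)
```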
Generic Models
Note
New in version v0.29.
This feature requires Python 3.7+.
Pydantic supports the creation of generic models to make it easier to reuse a common model structure.
In order to declare a generic model, you perform the following steps:
- Declare one or more typing.TypeVar instances to use to parameterize your model.
- Declare a pydantic model that inherits from pydantic.generics.GenericModel and typing.Generic, where you pass the TypeVar instances as parameters to typing.Generic.
- Use the TypeVar instances as annotations where you will want to replace them with other types or pydantic models.
Here is an example using GenericModel to create an easily-reused HTTP response payload wrapper:
    from typing import Generic, TypeVar, Optional, List
    from pydantic import BaseModel, validator, ValidationError
    from pydantic.generics import GenericModel

    DataT = TypeVar('DataT')

    class Error(BaseModel):
        code: int
        message: str

    class DataModel(BaseModel):
        numbers: List[int]
        people: List[str]

    class Response(GenericModel, Generic[DataT]):
        data: Optional[DataT]
        error: Optional[Error]

        @validator('error', always=True)
        def check_consistency(cls, v, values):
            if v is not None and values['data'] is not None:
                raise ValueError('must not provide both data and error')
            if v is None and values.get('data') is None:
                raise ValueError('must provide data or error')
            return v

    data = DataModel(numbers=[1, 2, 3], people=[])
    error = Error(code=404, message='Not found')

    print(Response[int](data=1))
    #> data=1 error=None
    print(Response[str](data='value'))
    #> data='value' error=None
    print(Response[str](data='value').dict())
    #> {'data': 'value', 'error': None}
    print(Response[DataModel](data=data).dict())
    """
    {
        'data': {'numbers': [1, 2, 3], 'people': []},
        'error': None,
    }
    """
    print(Response[DataModel](error=error).dict())
    """
    {
        'data': None,
        'error': {'code': 404, 'message': 'Not found'},
    }
    """
    try:
        Response[int](data='value')
    except ValidationError as e:
        print(e)
    """
    2 validation errors for Response[int]
    data
      value is not a valid integer (type=type_error.integer)
    error
      must provide data or error (type=value_error)
    """
(This script is complete, it should run "as is")
If you set Config or make use of validator in your generic model definition, it is applied to concrete subclasses in the same way as when inheriting from BaseModel. Any methods defined on your generic class will also be inherited.
Pydantic's generics also integrate properly with mypy, so you get all the type checking you would expect mypy to provide if you were to declare the type without using GenericModel.
Note
Internally, pydantic uses create_model to generate a (cached) concrete BaseModel at runtime, so there is essentially zero overhead introduced by making use of GenericModel.
If the name of the concrete subclasses is important, you can also override the default behavior:
    from typing import Generic, TypeVar, Type, Any, Tuple
    from pydantic.generics import GenericModel

    DataT = TypeVar('DataT')

    class Response(GenericModel, Generic[DataT]):
        data: DataT

        @classmethod
        def __concrete_name__(cls: Type[Any], params: Tuple[Type[Any], ...]) -> str:
            return f'{params[0].__name__.title()}Response'

    print(Response[int](data=1))
    #> data=1
    print(Response[str](data='a'))
    #> data='a'
(This script is complete, it should run "as is")
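The printed instances above do not show the class names themselves, so the following minimal sketch inspects them directly and also checks that parameterising twice yields the same cached class. The import fallback is an assumption so the 1.x API also resolves under pydantic 2.x as pydantic.v1.

```python
from typing import Any, Generic, Tuple, Type, TypeVar

try:  # assumption: pydantic 2.x exposes the 1.x API under pydantic.v1
    from pydantic.v1.generics import GenericModel
except ImportError:  # plain pydantic 1.x
    from pydantic.generics import GenericModel

DataT = TypeVar('DataT')


class Response(GenericModel, Generic[DataT]):
    data: DataT

    @classmethod
    def __concrete_name__(cls: Type[Any], params: Tuple[Type[Any], ...]) -> str:
        # e.g. int -> 'IntResponse'
        return f'{params[0].__name__.title()}Response'


IntResponse = Response[int]
int_name = IntResponse.__name__
str_name = Response[str].__name__
# the concrete class is cached, so parameterising again returns the same object
same_class = Response[int] is IntResponse
```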
Dynamic model creation
There are some occasions where the shape of a model is not known until runtime. For this pydantic provides the create_model function to allow models to be created on the fly.
    from pydantic import BaseModel, create_model

    DynamicFoobarModel = create_model('DynamicFoobarModel', foo=(str, ...), bar=123)

    class StaticFoobarModel(BaseModel):
        foo: str
        bar: int = 123
Here StaticFoobarModel and DynamicFoobarModel are identical.
Fields are defined by either a tuple of the form (<type>, <default value>) or just a default value. The special keyword arguments __config__ and __base__ can be used to customise the new model. This includes extending a base model with extra fields.
    from pydantic import BaseModel, create_model

    class FooModel(BaseModel):
        foo: str
        bar: int = 123

    BarModel = create_model(
        'BarModel',
        apple='russet',
        banana='yellow',
        __base__=FooModel,
    )
    print(BarModel)
    #> <class 'BarModel'>
    print(BarModel.__fields__.keys())
    #> dict_keys(['foo', 'bar', 'apple', 'banana'])
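A minimal sketch of the "shape not known until runtime" case: the field definitions arrive as a dictionary and are unpacked into create_model. The field_spec dictionary and the Settings name are hypothetical, and the import fallback is an assumption so the 1.x API also resolves under pydantic 2.x as pydantic.v1.

```python
try:  # assumption: pydantic 2.x exposes the 1.x API under pydantic.v1
    from pydantic.v1 import ValidationError, create_model
except ImportError:  # plain pydantic 1.x
    from pydantic import ValidationError, create_model

# hypothetical field spec, e.g. assembled from a config file at runtime;
# each value is a (<type>, <default value>) tuple, with ... meaning "required"
field_spec = {'host': (str, ...), 'port': (int, 5432)}
Settings = create_model('Settings', **field_spec)

# the dynamic model parses and validates like a statically defined one
settings = Settings(host='localhost', port='6543')

try:
    Settings(port=1)  # host is required but missing
except ValidationError as e:
    missing = [err['loc'] for err in e.errors()]
```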
Custom Root Types
Pydantic models which do not represent a dict ("object" in JSON parlance) can have a custom root type defined via the __root__ field. The root type can be of any type: list, float, int, etc.
The root type is defined via the type hint on the __root__ field.
The root value can be passed to model __init__ via the __root__ keyword argument or as the first and only argument to parse_obj.
    from typing import List
    import json
    from pydantic import BaseModel
    from pydantic.schema import schema

    class Pets(BaseModel):
        __root__: List[str]

    print(Pets(__root__=['dog', 'cat']))
    #> __root__=['dog', 'cat']
    print(Pets(__root__=['dog', 'cat']).json())
    #> ["dog", "cat"]
    print(Pets.parse_obj(['dog', 'cat']))
    #> __root__=['dog', 'cat']
    print(Pets.schema())
    """
    {
        'title': 'Pets',
        'type': 'array',
        'items': {'type': 'string'},
    }
    """

    pets_schema = schema([Pets])
    print(json.dumps(pets_schema, indent=2))
    """
    {
      "definitions": {
        "Pets": {
          "title": "Pets",
          "type": "array",
          "items": {
            "type": "string"
          }
        }
      }
    }
    """
Faux Immutability
Models can be configured to be immutable via allow_mutation = False. When this is set, attempting to change the values of instance attributes will raise errors. See model config for more details on Config.
Warning
Immutability in Python is never strict. If developers are determined/stupid they can always modify a so-called "immutable" object.
    from pydantic import BaseModel

    class FooBarModel(BaseModel):
        a: str
        b: dict

        class Config:
            allow_mutation = False

    foobar = FooBarModel(a='hello', b={'apple': 'pear'})

    try:
        foobar.a = 'different'
    except TypeError as e:
        print(e)
    #> "FooBarModel" is immutable and does not support item assignment

    print(foobar.a)
    #> hello
    print(foobar.b)
    #> {'apple': 'pear'}
    foobar.b['apple'] = 'grape'
    print(foobar.b)
    #> {'apple': 'grape'}
Trying to change a caused an error, and a remains unchanged. However, the dict b is mutable, and the immutability of foobar doesn't stop b from being changed.
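When a changed value is needed, one option is to derive a new instance rather than mutate the existing one. This is a minimal sketch using copy(); note that values passed through update are not validated, and the import fallback is an assumption so the 1.x API also resolves under pydantic 2.x as pydantic.v1.

```python
try:  # assumption: pydantic 2.x exposes the 1.x API under pydantic.v1
    from pydantic.v1 import BaseModel
except ImportError:  # plain pydantic 1.x
    from pydantic import BaseModel


class FooBarModel(BaseModel):
    a: str
    b: dict

    class Config:
        allow_mutation = False


foobar = FooBarModel(a='hello', b={'apple': 'pear'})

# a new instance with one field replaced; foobar itself is untouched
changed = foobar.copy(update={'a': 'different'})

# deep=True also copies nested containers, so editing the copy's dict
# no longer reaches back into foobar.b
independent = foobar.copy(deep=True)
independent.b['apple'] = 'grape'
```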
Abstract Base Classes
Pydantic models can be used alongside Python's Abstract Base Classes (ABCs).
    import abc
    from pydantic import BaseModel

    class FooBarModel(BaseModel, abc.ABC):
        a: str
        b: int

        @abc.abstractmethod
        def my_abstract_method(self):
            pass
(This script is complete, it should run "as is")
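A minimal sketch of how such an abstract model might be used: a concrete subclass implements the abstract method and can be instantiated, while the abstract base cannot. Shape and Square are hypothetical names, and the import fallback is an assumption so the 1.x API also resolves under pydantic 2.x as pydantic.v1.

```python
import abc

try:  # assumption: pydantic 2.x exposes the 1.x API under pydantic.v1
    from pydantic.v1 import BaseModel
except ImportError:  # plain pydantic 1.x
    from pydantic import BaseModel


class Shape(BaseModel, abc.ABC):  # hypothetical abstract model
    name: str

    @abc.abstractmethod
    def area(self) -> float:
        ...


class Square(Shape):  # concrete subclass: adds a field and implements area()
    side: float

    def area(self) -> float:
        return self.side ** 2


square = Square(name='unit', side='2')  # '2' is parsed to 2.0 as usual

try:
    Shape(name='abstract')  # abstract classes cannot be instantiated
except TypeError as e:
    abstract_error = str(e)
```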
Field Ordering
Field order is important in models for the following reasons:
- validation is performed in the order fields are defined; field validators can access the values of earlier fields, but not later ones
- field order is preserved in the model schema
- field order is preserved in validation errors
- field order is preserved by .dict() and .json() etc.
As of v1.0 all fields with annotations (whether annotation-only or with a default value) will precede all fields without an annotation. Within their respective groups, fields remain in the order they were defined.
    from pydantic import BaseModel, ValidationError

    class Model(BaseModel):
        a: int
        b = 2
        c: int = 1
        d = 0
        e: float

    print(Model.__fields__.keys())
    #> dict_keys(['a', 'c', 'e', 'b', 'd'])
    m = Model(e=2, a=1)
    print(m.dict())
    #> {'a': 1, 'c': 1, 'e': 2.0, 'b': 2, 'd': 0}
    try:
        Model(a='x', b='x', c='x', d='x', e='x')
    except ValidationError as e:
        error_locations = [e['loc'] for e in e.errors()]

    print(error_locations)
    #> [('a',), ('c',), ('e',), ('b',), ('d',)]
(This script is complete, it should run "as is")
Warning
As demonstrated by the example above, combining the use of annotated and non-annotated fields in the same model can result in surprising field orderings. (This is due to limitations of Python.)
Therefore, we recommend adding type annotations to all fields, even when a default value would determine the type by itself, to guarantee field order is preserved.
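The point about validators only seeing earlier fields can be sketched as follows. The Signup model is hypothetical, and the import fallback is an assumption so the 1.x API also resolves under pydantic 2.x as pydantic.v1.

```python
try:  # assumption: pydantic 2.x exposes the 1.x API under pydantic.v1
    from pydantic.v1 import BaseModel, ValidationError, validator
except ImportError:  # plain pydantic 1.x
    from pydantic import BaseModel, ValidationError, validator


class Signup(BaseModel):  # hypothetical model, for illustration only
    password: str
    password_repeat: str  # defined after password, so password is already in values

    @validator('password_repeat')
    def passwords_match(cls, v, values):
        # values only holds fields defined (and successfully validated) earlier
        if 'password' in values and v != values['password']:
            raise ValueError('passwords do not match')
        return v


ok = Signup(password='s3cret', password_repeat='s3cret')

try:
    Signup(password='s3cret', password_repeat='typo')
except ValidationError as e:
    mismatch = e.errors()
```

If the two fields were declared in the opposite order, values would not yet contain password when the validator runs.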
Required fields
To declare a field as required, you may declare it using just an annotation, or you may use an ellipsis (...) as the value:
    from pydantic import BaseModel, Field

    class Model(BaseModel):
        a: int
        b: int = ...
        c: int = Field(...)
(This script is complete, it should run "as is")
Where Field refers to the field function.
Here a, b and c are all required. However, use of the ellipsis in b will not work well with mypy, and as of v1.0 should be avoided in most cases.
Data Conversion
pydantic may cast input data to force it to conform to model field types, and in some cases this may result in a loss of information. For example:
    from pydantic import BaseModel

    class Model(BaseModel):
        a: int
        b: float
        c: str

    print(Model(a=3.1415, b=' 2.72 ', c=123).dict())
    #> {'a': 3, 'b': 2.72, 'c': '123'}
(This script is complete, it should run "as is")
This is a deliberate decision of pydantic, and in general it's the most useful approach. See here for a longer discussion on the subject.
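Where this conversion is not wanted for a particular field, one possible approach is a strict type. StrictInt is not mentioned in this section; the sketch below assumes your pydantic 1.x release provides it, and the import fallback is likewise an assumption so the 1.x API also resolves under pydantic 2.x as pydantic.v1.

```python
try:  # assumption: pydantic 2.x exposes the 1.x API under pydantic.v1
    from pydantic.v1 import BaseModel, StrictInt, ValidationError
except ImportError:  # plain pydantic 1.x
    from pydantic import BaseModel, StrictInt, ValidationError


class Lenient(BaseModel):
    a: int  # a float is cast to int, losing the fractional part


class Strict(BaseModel):
    a: StrictInt  # assumption: only real ints are accepted, nothing is cast


lenient_value = Lenient(a=3.1415).a

try:
    Strict(a=3.1415)
except ValidationError as e:
    strict_errors = e.errors()
```

The lenient model keeps the default parsing behaviour described above, while the strict field rejects the same input with a validation error.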