0%

描述符

FROM Python how to descriptor

描述符是什么?


A descriptor is what we call any object that defines __get__(), __set__(), or __delete__().
只要某个对象有如上三个特殊方法之一,就是一个描述符对象。

描述符的目的


The main motivation for descriptors is to provide a hook allowing objects stored in class variables to control what happens during attribute lookup.

只有类属性时才有起作用


Descriptors only work when used as class variables. When put in instances, they have no effect.

__set_name__(self, owner, name)


Optionally, descriptors can have a __set_name__() method. This is only used in cases where a descriptor needs to know either the class where it was created or the name of class variable it was assigned to.
(This method, if present, is called even if the class is not a descriptor.)

__set_name__()有两个目的

  • 一是告诉 [描述符] 对象它在哪个类里面被实例化了
  • 二是 [描述符] 对象被分配到哪个类变量名中

调用


通过成员运算符会触发描述符相关方法的调用
Descriptors get invoked by the dot operator during attribute lookup. If a descriptor is accessed indirectly with varssome_class, the descriptor instance is returned without invoking it.

Descriptors are used throughout the language. It is how functions turn into bound methods. Common tools like classmethod(), staticmethod(), property(), and functools.cached_property() are all implemented as descriptors.


Demos


simple example


1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
class Ten:
def __get__(self, obj, objtype):
print(obj, objtype)
return 10


# To use the descriptor, it must be stored as a class variable in another class:
class A:
x = 5 # regular class attribute
y = Ten() # Descriptor instance


def t1():
a = A()
a.x # normal attribute lookup, class dictionary
a.y # descriptor lookup, not in class dictionnary and not in instance dictionary, the value is computed on demand

Dynamic lookups


1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
import os
class DirectorySize:
def __get__(self, instance, owner):
return len(os.listdir(instance.dirname))


class Direcory:
size = DirectorySize()

def __init__(self, dirname):
self.dirname = dirname


def t2():
s = Direcory('.')
ss = Direcory('..')

print(s.size)
print(ss.size)

Managed attributes


A popular use for descriptors is managing access to instance data. The descriptor is assigned to a public attribute in the class dictionary while the actual data is stored as a private attribute in the instance dictionary. The descriptor’s __get__() and __set__() methods are triggered when the public attribute is accessed.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
import logging
logging.basicConfig(level=logging.INFO)


class LoggedAgeAccess:

def __get__(self, instance, owner):
value = instance._age
logging.info('Accessing %r giving %r', 'age', value)
return value

def __set__(self, instance, value):
logging.info('Updating %r to %r ', 'age', value)
instance._age = value


class Person:
age = LoggedAgeAccess() # managed attribute

def __init__(self, name, age):
self.name = name
self.age = age # Calls __set__()

def birthday(self):
self.age += 1 # Calls both __get__() and __set()___


def t3():
# all access to the managed attribute age is logged, but that the regular attribute name is not logged
"""
>>> marry = Person("Marry M", 30)
INFO:root:Updating 'age' to 30
>>> dave = Person('David D', 40)
INFO:root:Updating 'age' to 40
>>> vars(marry) # The actual data is in a private attribute
{'name': 'Marry M', '_age': 30}
>>> marry.birthday()
:return:
"""

Customized names


When a class uses descriptors, it can inform each descriptor about which variable name was used. In this example, the Person class has two descriptor instances, name and age. When the Person class is defined, it makes a callback to __set_name__() in LoggedAccess so that the field names can be recorded, giving each descriptor its own public_name and private_name

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
import logging
logging.basicConfig(level=logging.INFO)

class LoggedAccess:
def __set_name__(self, owner, name):
print(owner, name)
self.public_name = name
self.private_name = '_' + name

def __get__(self, instance, owner):
value = getattr(instance, self.private_name)
logging.info('Accessing %r giving %r', self.public_name, value)
return value

def __set__(self, instance, value):
logging.info('Updating %r to %r', self.public_name, value)
setattr(instance, self.private_name, value)


class Person2:
name = LoggedAccess()
age = LoggedAccess()

def __init__(self, name, age):
self.name = name
self.age = age

完整的字段校验的例子


1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
from abc import ABC, abstractmethod


class Validator(ABC):
def __set_name__(self, owner, name):
self.private_name = '_' + name

def __get__(self, instance, owner):
return getattr(instance, self.private_name)

def __set__(self, instance, value):
self.validate(value)
setattr(instance, self.private_name, value)

@abstractmethod
def validate(self, value):
pass


class OneOf(Validator):
def __init__(self, *options):
self.options = set(options)

def validate(self, value):
if value not in self.options:
raise ValueError(f'Expected {value!r} to be one of {self.options!r}')


class Number(Validator):
def __init__(self, minvalue=None, maxvalue=None):
self.minvalue = minvalue
self.maxvalue = maxvalue

def validate(self, value):
if not isinstance(value, (int, float)):
raise TypeError(f'Expected {value!r} to be an int or float')
if self.minvalue is not None and value < self.minvalue:
raise ValueError(f'Expected {value!r} to be at least {self.minvalue!r}')
if self.maxvalue is not None and value > self.maxvalue:
raise ValueError(f'Expected {value!r} to be no more than {self.maxvalue!r}')


class String(Validator):
def __init__(self, minsize=None, maxsize=None, predicate=None):
self.minsize = minsize
self.maxsize = maxsize
self.predicate = predicate

def validate(self, value):
if not isinstance(value, str):
raise TypeError(f'Expected {value!r} to be an str')
if self.minsize is not None and len(value) < self.minsize:
raise ValueError(f'Expected {value!r} to be no smaller than {self.minsize!r}')
if self.maxsize is not None and len(value) > self.maxsize:
raise ValueError(f'Expected {value!r} to be no bigger than {self.maxsize!r}')
if self.predicate is not None and not self.predicate(value):
raise ValueError(f'Expected {self.predicate} to be true for {value!r}')


class Component:
name = String(minsize=3, maxsize=10, predicate=str.isupper)
kind = OneOf('wood', 'metal', 'plastic')
quantity = Number(minvalue=0)

def __init__(self, name, kind, quantity):
self.name = name
self.kind = kind
self.quantity = quantity

技术细节 ⭐


类属性中某个属性值定义了 descriptor protocol 中某个方法,这个属性就是个描述符对象。
The default behavior for attribute access is to get, set, or delete the attribute from an object’s dictionary.

For instance, a.x has a lookup chain starting with a.__dict__['x'], then type(a).__dict__['x'], and continuing through the method resolution order of type(a). If the looked-up value (应该是类属性) is an object defining one of the descriptor methods, then Python may override the default behavior and invoke the descriptor method instead.

Where this occurs in the precedence chain depends on which descriptor methods were defined.

a.x lookup chain a.__dict__['x'] ==> type(a).__dict__['x'] ==> mro(a).__dict__['x']

protocol


  • descr.__get__(self, obj, type=None) -> value
  • descr.__set__(self, obj, value) -> None
  • descr.__delete__(self, obj) -> None

data descriptor define __set__ or __delete__

non-data descriptor only define __get__

区别:DIFFS

Data and non-data descriptors differ in how overrides are calculated with respect to entries in an instance’s dictionary.

  • If an instance’s dictionary has an entry with the same name as a data descriptor, the data descriptor takes precedence.
  • If an instance’s dictionary has an entry with the same name as a non-data descriptor, the dictionary entry takes precedence.

To make a read-only data descriptor, define both __get__() and __set__()with the __set__() raising an AttributeError
when called. Defining the __set__() method with an exception raising placeholder is enough to make it a data descriptor.

如何初始化值呢?

  1. 提供一个方法直接操作 __dict__;

描述符的调用 ⭐


obj.x 关于描述符的调用顺序取决 obj 是什么,object, class, or instance of super?

invocation from an object ⭐


instance.x 表达式会发生如下操作,数据描述符描述符具有最高优先级

  1. data descriptors
  2. instance variables
  3. non-data descriptors
  4. class variables
  5. __getattr__()

The logic for a dotted lookup is in object.__getattribute__()

注意:__getattribute__ 中没有 __getattr__() hook

That is why calling __getattribute__() directly or with super().__getattribute__ will bypass __getattr__() entirely.

dot operator 和 getattr() 函数负责当 __getarribute__() 抛出 AttributeError 时调用 __getattr__. 这些逻辑封装在一个helper function 中。

__getattribute__ 几乎每次都会触发,只有在该函数跑出 AttributeError 时,__getattr__ 才会被调用(最后一步尝试)。

描述符的逻辑在 __getattribute__ 函数内,最好谨慎覆盖该函数,一般都用 __getattr__

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
def find_name_in_mro(cls, name, default):
for base in cls.__mro__:
if name in vars(base):
return vars(base)[name]
return default


def object_getattribute(obj, name):
null = object()
objtype = type(obj) # 对象的类
cls_var = find_name_in_mro(objtype, name, null) # 类的属性
descr_get = getattr(type(cls_var), '__get__', null) # 是否定义 __get__
if descr_get is not null:
if hasattr(type(cls_var), '__set__') or (hasattr(type(cls_var), '__delete__')):
return descr_get(cls_var, obj, objtype) # 数据描述符
if hasattr(obj, '__dict__') and name in vars(obj):
return vars(obj)[name] # instance variable
if descr_get is not null:
return descr_get(cls_var, obj, objtype) # 非数据描述符
if cls_var is not null:
return cls_var # class variable
raise AttributeError(name)

# helper function
def getattr_hook(obj, name):
"Emulate slot_tp_getattr_hook() in Objects/typeobject.c"
try:
return obj.__getattribute__(name)
except AttributeError:
if not hasattr(type(obj), '__getattr__'):
raise
return type(obj).__getattr__(obj, name) # __getattr__

invocation from a class

desc.__get__(None, A)

invocation from super

B.__dict__['m'].__get__(obj, A)

总结 ⭐


描述符的机制嵌入在 object、type、super()的 __getattribute__ 方法中

  1. Descriptors are invoked by the __getattribute__() method.
  2. Classes inherit this machinery from object, type, or super().
  3. Overriding __getattribute__() prevents automatic descriptor calls because all the descriptor logic is in that method.
  4. object.__getattribute__() and type.__getattribute__() make different calls to __get__(). The first includes the
    instance and may include the class. The second puts in None for the instance and always includes the class.
  5. Data descriptors always override instance dictionaries.
  6. Non-data descriptors may be overridden by instance dictionaries.

Automatic name notification


发生在类定义的时候由元类调用 __set_name__(),如果在类定义后手动添加 descriptor,则需要手动调用 __set_name__.

Sometimes it is desirable for a descriptor to know what class variable name it was assigned to.

When a new class is created, the type metaclass scans the dictionary of the new class. If any of the entries are descriptors and if they define __set_name__(), that method is called with two arguments. The owner is the class where the descriptor is used, and the name is the class variable the descriptor was assigned to.

Since the update logic is in type.__new__(), notifications only take place at the time of class creation.
If descriptors are added to the class afterwards, __set_name__() will need to be called manually.

python 中使用描述符的例子


The descriptor protocol is simple and offers exciting possibilities. Several use cases are so common that they have been prepackaged into built-in tools. Properties, bound methods, static methods, class methods, and __slots__ are all based on the descriptor protocol.

  1. properties

    Calling property() is a succinct(简洁) way of building a data descriptor that triggers a function call upon access to an attribute. Its signature is: property(fget=None, fset=None, fdel=None, doc=None) -> property

  2. Functions and methods

    Python’s object oriented features are built upon a function based environment. Using non-data descriptors, the two are merged seamlessly.
    Functions stored in class dictionaries get turned into methods when invoked. Methods only differ from regular functions in that the object instance is prepended to the other arguments. By convention, the instance is called self but could be called this or any other variable name.

  3. Static method

    不需要访问实例的属性

  4. Class method

    仅仅需要访问类的属性

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
class MethodType:
def __init__(self, func, obj):
self.__func__ = func
self.__self__ = obj

def __call__(self, *args, **kwargs):
func = self.__func__
obj = self.__self__
return func(obj, *args, **kwargs)

class Function:
def __get__(self, instance, owner):
if instance is None:
return self # 普通函数
return MethodType(self, instance) # 方法

class StaticMethod:
def __init__(self, f):
self.f = f

def __get__(self, instance, owner=None):
return self.f

def __call__(self, *args, **kwargs):
return self.f(*args, **kwargs)

class ClassMethod:
def __init__(self, f):
self.f = f

def __get__(self, instance, cls=None):
if cls is None:
cls = type(instance)
if hasattr(type(self.f), '__get__'):
return self.f.__get__(cls, cls)
return MethodType(self.f, cls)

__slots__


当一个 class 定义了 __slots__ 属性后,它会用一个定长的数组替代对象的字典来保存数据,从用户的角度看有几点影响:

  1. 快速定位属性名拼写错误, because only attribute names specified in __slots__ are allowd
  2. Helps create immutable objects where descriptors manage access to private attributes stored in __slots__
  3. Saves memory On a 64-bit Linux build, an instance with two attributes takes 48 bytes with __slots__ and 152 bytes without.
  4. Improves speed
  5. Block tools like functools.cached_property() which require an instance dictionary to function correctly 拦截某些工具

我们不可能使用 python 来实现 __slots__ 的功能,因为需要涉及到 c 的数据结构和内存分配,但是可以使用描述符模拟这种行为。代码略