How to write easily customizable code?

For example, how do you convert a str to a class in Python?

alex_ber
9 min read · May 13, 2020

In this article I will walk through a utility that I wrote about a year ago. I will also mention some enhancements that I made recently, and give you some general advice on how to improve yourself. The remainder of the article is some theoretical background.

Motivation

I want to start with the question: why would you want to do an import programmatically?

  1. You have some configuration file and you want to write some value that represents a class in your code.
  2. You want your code to be easily customizable.

For p.1 it can be a yml file, an ini file, a cfg file or a command-line argument; it doesn't really matter. For example, here I wrote a Rock Paper Scissors game, and I have a configuration file where I want to encode the player. It can be a human player or some implemented strategy. Either way, I should be able to read this configuration and eventually make an instance of the class.

Another example: yml logger configuration. Let’s see a little snippet.
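A typical snippet looks like this (illustrative; the exact configuration used in the original article may differ):

```yaml
version: 1
handlers:
  console:
    class: logging.StreamHandler
    stream: ext://sys.stdout
  all_file_handler:
    class: logging.handlers.TimedRotatingFileHandler
    filename: app.log
    when: midnight
root:
  level: INFO
  handlers: [console, all_file_handler]
```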

Let's look at the console handler's class: it is logging.StreamHandler.

Let's look at the all_file_handler handler's class: it is logging.handlers.TimedRotatingFileHandler.

One way to achieve this is to use some API that mimics what happens when you write

import logging.StreamHandler
import logging.handlers.TimedRotatingFileHandler

(Note that these statements are not actually valid Python, since StreamHandler and TimedRotatingFileHandler are classes, not modules; that is exactly why a dedicated importer API is needed.)

Of course, in the case of logging you can find another solution, but if you want your logging framework to be extensible/customizable, that is, if you want to enable anybody to write their own handler, it is better to use some sort of importing.

Example of API usage

  1. You can import a class.
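The original snippet is an embedded gist; here is a minimal, self-contained sketch of the idea (this is not the actual implementation from AlexBerUtils, just an illustration of what importer() does):

```python
import importlib

def importer(name):
    """Resolve a dotted name to a Python object, importing modules as needed.
    Minimal sketch of the idea, not the library's actual code."""
    parts = name.split('.')
    # Try the longest module path first, then fall back to attribute lookup.
    for i in range(len(parts), 0, -1):
        try:
            obj = importlib.import_module('.'.join(parts[:i]))
        except ImportError:
            continue
        for attr in parts[i:]:
            obj = getattr(obj, attr)
        return obj
    raise ImportError(f"Can't resolve {name!r}")

from collections import OrderedDict

cls = importer('collections.OrderedDict')
print(cls is OrderedDict)       # True
d = cls([('b', 2), ('a', 1)])
print(list(d.items()))          # [('b', 2), ('a', 1)]
```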

This code prints True, OrderedDict([('b', 2), ('a', 1)]).

The interesting part is the importer() call: we pass a string and receive the class literal back.

Another example:
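Again a sketch; note that the standard library's pydoc.locate performs a similar dotted-name lookup, so I use it here as a stand-in for importer():

```python
from pydoc import locate  # stdlib helper with importer()-like behaviour
import logging

handler_cls = locate('logging.StreamHandler')
print(handler_cls is logging.StreamHandler)  # True
```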

This code prints True.

2. While importing a class is the typical use-case, you can actually import any Python construct. For example, you can import a specific method within a class.
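Sticking with the stdlib's pydoc.locate as a stand-in for the library's importer():

```python
from pathlib import Path
from pydoc import locate  # stand-in for importer(); resolves dotted names

method = locate('pathlib.Path.cwd')
print(method == Path.cwd)        # True: we got the bound classmethod
print(method() == Path.cwd())    # True: calling it works as usual
```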

Here we imported the cwd() method from the Path class within the pathlib module.

3. The importer module contains a convenient new_instance function. For classes, it will instantiate them with the parameters you pass in (as *args and **kwargs); for any other Python construct it will execute the importer logic and return the imported thing.

For example, the code above can be rewritten as:
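A sketch of new_instance() (again, not the library's actual code; pydoc.locate stands in for the importer logic), used to rewrite the cwd() example:

```python
import inspect
from pathlib import Path
from pydoc import locate  # stand-in for the importer logic

def new_instance(name, *args, **kwargs):
    """Instantiate classes; return anything else as imported (sketch)."""
    thing = locate(name)
    if inspect.isclass(thing):
        return thing(*args, **kwargs)
    return thing

method = new_instance('pathlib.Path.cwd')  # a bound method, not yet called
answer = method()                          # the actual call
print(answer == Path.cwd())                # True
```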

First we obtain the (bound) method; then we make the actual call to it and receive the answer.

For classes, new_instance will return an instantiated instance, as if we combined those two steps.
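Using the same new_instance() sketch with a class, following the example from the collections.Counter documentation:

```python
import inspect
from pydoc import locate  # stand-in for the importer logic

def new_instance(name, *args, **kwargs):
    thing = locate(name)
    return thing(*args, **kwargs) if inspect.isclass(thing) else thing

# Counter is a class, so new_instance() instantiates it directly
c = new_instance('collections.Counter', 'abcdeabcdabcaba')
print(c.most_common(3))  # [('a', 5), ('b', 4), ('c', 3)]
print(c['a'])            # 5
```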

Prints [(‘a’, 5), (‘b’, 4), (‘c’, 3)] and 5.

Note: Based on documentation of the collections.Counter class.

Again, new_instance() has optional *args and **kwargs that will be forwarded to the class constructor.

Note: the importer() function will recursively import packages (if needed), following the dot notation from left to right. If the component already exists in the package (is defined and imported) it will be used; otherwise, it will be imported.

Note: PEP 420 (Implicit Namespace Packages) support was added to the importer() function recently.

PEP 420 implementation is available from Python 3.3 onwards.

You can see an example of such a package here: splitpackage, splitpackage_cont

This is the file structure (directory layout):

    splitpackage/
        __init__.py
    splitpackage_cont/
        other_module.py

What makes this regular package "special" is the __path__ attribute that is set in the package's __init__.py file:

import os as _os
__path__.append(_os.path.join(_os.path.dirname(_os.path.dirname(__file__)), 'splitpackage_cont'))

There you can add the path of another directory whose files will then be considered part of the current package.

For example, splitpackage_cont directory has module other_module.py with function theansweris().
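The whole mechanism can be demonstrated end to end by building the same layout programmatically in a temporary directory (a sketch; the real example lives in the linked repositories):

```python
import os
import sys
import tempfile

root = tempfile.mkdtemp()
pkg = os.path.join(root, 'splitpackage')
cont = os.path.join(root, 'splitpackage_cont')
os.makedirs(pkg)
os.makedirs(cont)

# __init__.py extends __path__ to point at the sibling directory
init_src = (
    "import os as _os\n"
    "__path__.append(_os.path.join(\n"
    "    _os.path.dirname(_os.path.dirname(__file__)), 'splitpackage_cont'))\n"
)
with open(os.path.join(pkg, '__init__.py'), 'w') as f:
    f.write(init_src)

# other_module.py physically lives OUTSIDE the splitpackage directory
with open(os.path.join(cont, 'other_module.py'), 'w') as f:
    f.write('def theansweris():\n    return 42\n')

sys.path.insert(0, root)
from splitpackage.other_module import theansweris  # found via __path__
print(theansweris() == 42)  # True
```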

Note that other_module does not physically reside inside the splitpackage directory on the filesystem. Nevertheless, Python recognises it as part of the package, and the importer() function behaves in exactly the same way.

Theoretical background

Here I've written a utility that runs a subprocess and logs its output to the logger.

You can find the source code here. It is available as part of my AlexBerUtils project.

You can install AlexBerUtils from PyPi:

python3 -m pip install -U alex-ber-utils

See here for a more detailed explanation on how to install.

I will write a detailed explanation about it in my next article.

When I wrote it, I had a very specific use-case in mind: I have some long-running subprocess that writes an unbounded amount of data to its stdout (you can't store it in memory), and I just want to log its output.
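The core of that use-case can be sketched like this (a simplification; the real processinvokes.py is more involved, and the function name here is mine):

```python
import logging
import subprocess
import sys

logging.basicConfig(level=logging.INFO, format='%(message)s')
logger = logging.getLogger('subprocess')

def run_and_log(*cmd, sink=logger.info):
    """Stream the child's stdout line by line, so an unbounded amount of
    output is never accumulated in memory."""
    with subprocess.Popen(cmd, stdout=subprocess.PIPE, text=True) as p:
        for line in p.stdout:
            sink(line.rstrip('\n'))
    return p.returncode

run_and_log(sys.executable, '-c', "print('hello from child')")
```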

Indeed, one of my first versions did just this, but later I asked myself: can I easily make my code customizable? I have a class, LogPipe, that receives output from the subprocess line by line and just writes it to the logger. There is no BasePipe or FilePipe class. If I do nothing, the user of my code can simply monkey-patch my class, effectively replacing my class with his own. For example, he can do something like:
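For example (a sketch with a stand-in LogPipe; the real class's methods and constructor in processinvokes.py may differ):

```python
import logging

logger = logging.getLogger('pipe')

# Stand-in for the library's class; the real LogPipe lives in processinvokes.py
class LogPipe:
    def write(self, line):
        logger.info(line)

class LogPipePatched(LogPipe):
    def write(self, line):
        # User's customization: prefix every line, then delegate to the
        # original. Note: super(LogPipePatched, self) or plain super(),
        # NOT super(LogPipe, self); see the discussion below.
        super().write('[patched] ' + line)

# The monkey-patch itself: replace the library's class with the user's one.
# With the real module you would write: processinvokes.LogPipe = LogPipePatched
LogPipe = LogPipePatched
```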

Note: this example is somewhat vague. For example, the cleanUp() method didn't exist before the class was split. In such a case, I would need to override the close() method.

As you see, this code is at the very least not trivial. For example, note the use of super(LogPipe, self): this line is written inside the LogPipePatched class, so it should be super(LogPipePatched, self) or even just super(). (Of course, this is because of the assignment that replaces LogPipe with LogPipePatched; see the great article Python's super() considered super! for a detailed explanation.)

Side note: one could just write BasePipe.__init__(self, *args, **kwargs), but that is actually worse: visually it seems wrong (one would expect the line to be LogPipe.__init__(self, *args, **kwargs)), and besides, if the inheritance hierarchy of the code changes, this code can break. You can read more about this in The Python 2.3 Method Resolution Order: https://www.python.org/download/releases/2.3/mro/

As a library author I don't want to force my users to write code like this (though sometimes I use tricks like this myself).

As a library writer, I want to enable the user to change the code's behaviour easily. So, I did two things:

  1. I broke my class into a base class and the actual class.
  2. I provide easy mechanism to switch the usage from the provided default class to user defined one.

As for p.1, it is a good idea anyway to break code into smaller pieces; it makes the code cleaner.

In the remainder of the article I want to focus on p.2.

So, now you have all the classes available at your disposal.

You want to customize the behavior of the library. For demonstration purposes, let's suppose that you just want to hard-code some filename. So, you will create your custom class, something like:
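Something like this (a sketch; LogPipe here is a stand-in, and the parameter name file_name is my assumption, since the real constructor arguments of the library's class may differ):

```python
# Stand-in for processinvokes.LogPipe; the real signature may differ.
class LogPipe:
    def __init__(self, *args, file_name=None, **kwargs):
        self.file_name = file_name

class MyPipe(LogPipe):
    """Custom pipe that hard-codes the log file name."""
    def __init__(self, *args, **kwargs):
        kwargs['file_name'] = 'hardcoded.log'  # hypothetical parameter name
        super().__init__(*args, **kwargs)
```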

Now, you have 4 options.

  1. Monkey-patch LogPipe.
  2. Assign new value to default_logpipe_cls.
  3. Pass user’s custom class to initConfig() function.
  4. Pass user’s custom class to run_sub_process() function.

The first option was described above.

Sometimes, you have something like a global variable. For example, in the open source "HiYaPyCo — A Hierarchical Yaml Python Config" there are the following lines (see https://github.com/zerwes/hiyapyco/blob/master/hiyapyco/__init__.py#L66 ):

# you may set this to something suitable for you
jinja2env = Environment(undefined=Undefined)

Here the author of HiYaPyCo explicitly says that the users of his library can change the jinja2env global variable.

Note: Setting a global (module) variable to a non-immutable object has another undesired effect: it can be initialized too early. In this example, we're using an Environment object that takes its default values from the jinja2.defaults module. If somebody changes a default value there, your globally defined jinja2env may still use the old one (it depends on the execution order of your code, practically on the import order). (I know that this is unlikely to happen in practice; I bring up this case to make a point.) If you postpone the initialization of the non-immutable object, it will happen after all changes are applied (after the initialization phase is finished, practically, when all imports are done).

Note: In general, don't initialize a global variable from something that may change during the execution of the code, including other (system) global variables such as os.environ.

So, processinvokes.py has a public global variable default_logpipe_cls. By default it will be LogPipe, but you can set it to point to another value, something like this:
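With the real module you would simply write import processinvokes; processinvokes.default_logpipe_cls = MyPipe. Below is a runnable sketch that simulates the module with a namespace object (the classes are stand-ins):

```python
from types import SimpleNamespace

class LogPipe: pass          # stand-in for the library's default class
class MyPipe(LogPipe): pass  # user's custom class

# Simulates the processinvokes module and its public global variable;
# with the real module you would write:
#     import processinvokes
#     processinvokes.default_logpipe_cls = MyPipe
processinvokes = SimpleNamespace(default_logpipe_cls=LogPipe)

processinvokes.default_logpipe_cls = MyPipe  # the actual "customization"
print(processinvokes.default_logpipe_cls is MyPipe)  # True
```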

I don't like this way, but this is my personal preference. I will shortly outline why, but I'm not insisting on it. This approach is widely used in the Python ecosystem (HiYaPyCo is far from being an isolated example).

There are 3 main reasons why I dislike this approach.

  1. It is manipulation of global state (from outside of the module).
  2. You can't easily add validation on the assignment step.
  3. The module can't ensure that changes to related values are done atomically.

As for p.1, manipulation of global state is considered evil because of multi-threading problems and because it makes it harder to reason about what is going on. If you have many modules that occasionally update the global state of another module, you can easily step on each other's toes.

As for p.2, there is a workaround. On a class you can define a property that syntactically looks like a regular variable assignment, but lets you provide custom code to be executed on setting the value (writing to it).

For example, this code will work:
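A minimal sketch of such a property with validation in the setter (the attribute name follows the article; the validation rule is my example):

```python
class Config:
    @property
    def default_logpipe_cls(self):
        return self._default_logpipe_cls

    @default_logpipe_cls.setter
    def default_logpipe_cls(self, value):
        # Custom code runs on assignment: here, validation.
        if not isinstance(value, type):
            raise TypeError('default_logpipe_cls must be a class')
        self._default_logpipe_cls = value

cfg = Config()
cfg.default_logpipe_cls = dict          # looks like a plain assignment
print(cfg.default_logpipe_cls is dict)  # True
```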

Note: You can read about property in Python here, and you can see how property() is implemented in a pure-Python equivalent here.

Since Python 3.5 you have the following way:

import sys, types

class _MyModuleType(types.ModuleType):
    @property
    def prop(self, instance, owner):
        ...

sys.modules[__name__].__class__ = _MyModuleType

© https://www.python.org/dev/peps/pep-0549/

Note: this PEP was rejected; I'm just citing from there.

There are some other limitations to this approach:

1. The __init__() method of MyModuleType is not called, so attributes are not initialized.

2. self points to the module, not to the MyModuleType instance. So, self.f will refer to the field on the module.

Since Python 3.7 there is a simple way (see PEP 562) to get properties on a module. But you can't set them.

There is a way that (kind of) works, but it looks very hacky (and still has limitations):

I have seen such code in real projects. We actually substitute the module with a single class instance. In such code the __init__() method is executed (but you can't pass any dynamic parameters to it!), and self points to the class instance. Your class can even inherit from another class.

Note: If you want to improve the quality of your code, read the code of other projects, preferably popular open source ones with many users.

Anyway, my processinvokes.py module supports such direct manipulation of global variables.

Note: using my API in such a way doesn't give you any guarantees of atomicity of the change, and no checks on post-conditions / state consistency.

Besides direct assignment to the (module's) global variable, my module supports another 2 methods. Now, we will focus on the initConfig() function.

Note: I know this is not standard in Python, so I made the usage of this function optional. It is invoked automatically when you import my module. It overwrites all (module) global variables at once. It is also idempotent: invoking this function twice with the same arguments yields the same result. So, you can invoke it later, optionally, with the same parameters it was initialized with.

Side note: Such a design is inspired by Inversion of Control (IoC) containers that use Dependency Injection (DI) as a tool, specifically Spring (in the Java language). When your application is based on DI, you have an application lifecycle: at least an initialization phase and an execution phase.

In the initialization phase you instantiate your objects and do the wiring: if one object uses other ones, it receives references to them in the initialization phase and then uses them in the execution phase. You don't instantiate objects in the regular way as you go.

Note: This initConfig() is supposed to be called when your application starts to execute. In Python that means early in your main() function, before you've started to execute your business logic, in the same place where you do the initialization of logging.

Let's look at a code example:

The intended usage is that you make a call to initConfig() at the beginning of the application and then call run_sub_process() wherever you need it. Of course, you can also make both calls one after the other.

The interesting part is the value supplied for the pipe class.

You can pass the class literal MyPipe (as in the examples above); this will also work. But you can also pass the string representation of the class. The code inside initConfig() will convert your str to a class if needed (see the examples in the usage section above).

Note: The actual code looks a bit strange because I have the class definition and pass the class literal in the same module. If I run this module directly, the name of the module is __main__; if I run it from another module, it will be the name of the module. Using __name__ handles both cases.

Now we will see the last variant. You can pass default_logpipe_cls to run_sub_process(). In general, you shouldn't mix your initialization logic with execution, but in Python this is pretty common.
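A sketch of the per-call override (the real run_sub_process signature in processinvokes.py may differ; the classes are stand-ins):

```python
class LogPipe: pass          # stand-in default class
class MyPipe(LogPipe): pass  # user's custom class

default_logpipe_cls = LogPipe  # module-level default

def run_sub_process(*cmd, logpipe_cls=None):
    # An explicit per-call argument wins over the module-level default.
    cls = logpipe_cls if logpipe_cls is not None else default_logpipe_cls
    return cls()  # the real function would run the subprocess with this pipe

pipe = run_sub_process('echo', 'hello', logpipe_cls=MyPipe)
print(type(pipe) is MyPipe)  # True
```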

As before, you can pass either a class literal or a string; in this example, I'm passing the class literal.

This particular module has only one public function, so one can argue that merging the two function calls into one simplifies usage.

In another module, you may have a default parameter in initConfig() that you can optionally override in specific function calls (see, for example, implicit_convert in the init_app_conf.py module: you can make it, say, True by default, but when you call to_convex_map(), you can pass implicit_convert=False, because you already have the correct types).
