My processinvokes module

A description of the high-level API module in the AlexBerUtils project

Sep 10, 2020

The source code can be found here. It is available as part of my AlexBerUtils project.

You can install AlexBerUtils from PyPi:

python3 -m pip install -U alex-ber-utils

See here for a more detailed explanation of how to install it.

Brief Description

The main motivation for the processinvokes module is the following: you have a Python script that launches a long-running task in another process, something that takes time and may fail. You want to be able to monitor what is going on in the subprocess. The easiest way is to redirect the stdout and stderr of the subprocess to the built-in logging module. You can then save everything to a file, or write a filter so that the logger sends an e-mail when it sees ERROR. If you need something simpler, there is also a variant that bypasses logging and saves the stdout and stderr of the subprocess to a file-like object. Of course, you can also write your own variant.

The processinvokes module has one primary function: run_sub_process(). This function runs a subprocess and logs its output to the logger. It is a sophisticated decorator (wrapper) around subprocess.run(). It is useful when your subprocess runs for a long time and you are interested in receiving its stdout and stderr. By default, the output is streamed to the log. You can easily customize this behavior; see the initConfig() method.
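To make the idea concrete, here is a minimal sketch that uses only the standard library. It is not the implementation of processinvokes; the helper name run_and_log() is made up for this illustration, and it simply reads the merged stdout/stderr of the child line by line and forwards each line to a logger.

```python
import logging
import subprocess
import sys

logger = logging.getLogger("process_invoke_run")


def run_and_log(cmd, level=logging.INFO, **popen_kwargs):
    """Hypothetical helper: run cmd and forward every output line to `logger`."""
    with subprocess.Popen(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge stderr into stdout
        text=True,                 # decode bytes to str
        bufsize=1,                 # line-buffered pipe on our side
        **popen_kwargs,
    ) as proc:
        for line in proc.stdout:
            logger.log(level, line.rstrip("\n"))
    # Leaving the with-block waits for the child, so returncode is set here.
    if proc.returncode != 0:
        raise subprocess.CalledProcessError(proc.returncode, cmd)
    return proc.returncode


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO, stream=sys.stderr)
    run_and_log([sys.executable, "-c", "print('simulating run_sub_process')"])
```

Because the lines arrive as ordinary log records, any handler or filter configured for that logger (file, e-mail, and so on) applies to the output of the subprocess as well.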

Code Example

config.yml

config-dev.yml

config-local.yml

Put the configuration files above, such as config.yml, in the same directory as the script above.

Note:

It is not required to have multiple configuration files; one file may be enough.
You can also have no configuration file at all and configure the logger handlers with the logging API.

Lines 1–8 contain some generic imports.

On line 3 we import the fixabscwd() function. See Making relative path to file to work.

On lines 10–11 we set initial values for logger and process_invokes_logger. The latter will be used to receive the stdout and stderr of the subprocess. Both are initialized with their real values later, after the configuration files have been parsed.

On lines 13–15 we extend the init_app_conf.conf helper class: I add some keys specific to this example to the existing ones.

On line 19 we have the process_invokes() function. It simulates running a subprocess with (almost) only default parameters.

On line 28 we have the process_file_pipe() function. It simulates running a subprocess with non-default parameters.

On line 44 we have the run() function. It extracts the application parameters and calls process_invokes() and process_file_pipe().

On line 53 we have the _log_config() function. It instantiates the (global) module-level logger and process_invokes_logger.

On line 69 we have the main() function, which receives an optional args parameter with a default value of None. If no explicit value is passed, then sys.argv will be used implicitly. This function can be called from another module.

On lines 87–88 we have the standard code snippet: if this module is executed as __main__ (and not imported into another module), then, after all functions and (global) module-level attributes have been defined, we call the main() function.
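The main(args=None) pattern can be sketched as follows. This is an illustration with argparse, not the script from this article (which parses its arguments through init_app_conf); the --name option is hypothetical.

```python
import argparse


def main(args=None):
    """Entry point that can be called from another module or from the CLI."""
    parser = argparse.ArgumentParser(description="sketch of an importable entry point")
    parser.add_argument("--name", default="world")  # hypothetical option
    # With args=None argparse falls back to sys.argv[1:].
    parsed, _unknown = parser.parse_known_args(args)
    message = f"hello {parsed.name}"
    print(message)
    return message


if __name__ == "__main__":
    # Executed only when the module is run as a script, not when it is imported.
    main()
```

Another module can call main(["--name", "dev"]) directly, which is convenient for tests.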

When this code is executed, after all functions and (global) module-level attributes are defined, the main() function is called with args=None.

On lines 70–72 I make a temporary logger initialization in which all logs with level INFO and above are redirected to stderr, and all warnings are redirected to stderr as well. This stays in effect until we parse the log configuration and reinitialize the logger. It is done in order to see the log output, especially warnings. See Integrating Python’s logging and warnings packages for more details.
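A temporary initialization of this kind could look roughly like the sketch below. It uses only the standard library; the function name init_temp_logging() and the format string are assumptions made for this example, and force=True requires Python 3.8 or above.

```python
import logging
import sys
import warnings


def init_temp_logging():
    """Temporary setup, used until the real logging configuration is parsed."""
    logging.basicConfig(
        level=logging.INFO,
        stream=sys.stderr,
        format="%(asctime)s %(levelname)s %(name)s: %(message)s",
        force=True,  # Python 3.8+: replace handlers installed earlier
    )
    # Route warnings.warn(...) through the "py.warnings" logger.
    logging.captureWarnings(True)


if __name__ == "__main__":
    init_temp_logging()
    logging.getLogger(__name__).info("visible before the real config is loaded")
    warnings.warn("this warning goes through logging")
```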

On line 74 we make relative paths to files work.

On lines 76–79 we parse the configuration files. See My major init_app_conf module for details.

On line 81 we pass the parsed configuration to the _log_config() function. There we pop the general.log part from the dict config and call logging.config.dictConfig() to reinitialize the logger. We initialize the (global) module-level attributes logger (the standard logger) and process_invokes_logger (the logger that will receive the stdout and stderr of the subprocess). At the end we pretty-print the parsed configuration. The uncommented line uses ymlparsers.as_str() as a convenient way to print the configuration to the logger. The commented-out line uses the pprint module for pretty-printing (it is available in standard Python). It has a caveat, though.

Note: Actual type is OrderedDict and not dict, but it is mainly for historical reasons…
Side note: Before Python 3.6, the order in which key/value pairs are stored was undefined. In Python 3.6 it was stated that insertion ordering is an implementation detail of CPython (and the best practice is not to rely on this behavior). Starting from Python 3.7, dictionary order is guaranteed to be insertion order.
See https://stackoverflow.com/a/58676384/1137529 for the differences between dict and OrderedDict.

https://medium.com/analytics-vidhya/my-parser-module-429ed1457718

pprint was written long before 3.6. In order to produce consistent results, it sorts dicts by key. I found this inappropriate: I want to see the configuration in exactly the same order as it is defined in the configuration file. Starting with Python 3.8, however, you can specify sort_dicts=False to disable this sorting. So, if you’re using Python 3.8 or above, this will also work.
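The effect of sort_dicts is easy to see on a small dict (the keys here are arbitrary and chosen only for the demonstration; Python 3.8+ is required):

```python
from pprint import pformat

config = {"zeta": 1, "alpha": {"y": 2, "x": 3}}

default_view = pformat(config)                    # keys sorted alphabetically
ordered_view = pformat(config, sort_dicts=False)  # insertion order preserved

print(default_view)  # {'alpha': {'x': 3, 'y': 2}, 'zeta': 1}
print(ordered_view)  # {'zeta': 1, 'alpha': {'y': 2, 'x': 3}}
```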

On line 65 we just remove general.log from the parsed result. You can remove this line if you want to keep the log configuration.
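The pop-and-reconfigure step can be sketched like this. The dictionary below is a hypothetical stand-in for the parsed YAML files (only the general.log location comes from the description above; the handler, formatter and app keys are assumptions), and the function body is an approximation, not the original _log_config().

```python
import logging
import logging.config

# Hypothetical parsed configuration; the real one comes from the YAML files.
parsed = {
    "general": {
        "log": {
            "version": 1,
            "disable_existing_loggers": False,
            "formatters": {
                "brief": {"format": "%(levelname)s %(name)s: %(message)s"},
            },
            "handlers": {
                "console": {
                    "class": "logging.StreamHandler",
                    "formatter": "brief",
                    "level": "INFO",
                    "stream": "ext://sys.stderr",
                },
            },
            "root": {"level": "INFO", "handlers": ["console"]},
        },
    },
    "app": {"some_key": "some_value"},
}


def _log_config(dd):
    """Reinitialize logging from dd['general']['log'] and return both loggers."""
    log_config = dd["general"].pop("log")  # removes general.log from dd
    logging.config.dictConfig(log_config)
    logger = logging.getLogger(__name__)
    process_invokes_logger = logging.getLogger("process_invoke_run")
    logger.info("remaining configuration: %s", dd)
    return logger, process_invokes_logger


logger, process_invokes_logger = _log_config(parsed)
```

Using dd["general"]["log"] without pop() would keep the log section in the printed configuration.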

On line 82 we call the run() function, which contains our business logic.

On line 45 we log the “run()” message to the regular log.

On line 46 we get our application configuration for use in the application code.

On line 48 we call the process_invokes() function for the first simulation of running a subprocess.

On line 4 we’re calling process_file_pipe() function for the second simulation of running sub process.

On line 20 we’re emitting log message process_invokes().

On lines 21–22 we have

exp_log_msg = "simulating run_sub_process"
process_invoke_run = f"echo '{exp_log_msg}'"

process_invoke_run is a quoted string that will be used later to echo exp_log_msg, in order to simulate a long-running process that sends something to stdout.

On line 22 we have

cmd = shlex.split(process_invoke_run)

cmd contains process_invoke_run as a list of arguments. This form is suitable for use with the standard subprocess module.
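Here is what the split produces for this particular command. shlex.join() (Python 3.8+) is shown only as the inverse operation; it is not used in the script.

```python
import shlex

exp_log_msg = "simulating run_sub_process"
process_invoke_run = f"echo '{exp_log_msg}'"

cmd = shlex.split(process_invoke_run)
print(cmd)              # ['echo', 'simulating run_sub_process']

# The quotes are consumed by the split; the message stays one argument.
print(shlex.join(cmd))  # echo 'simulating run_sub_process'
```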

Quote:

The shlex class makes it easy to write lexical analyzers for simple syntaxes resembling that of the Unix shell. This will often be useful for writing minilanguages, (for example, in run control files for Python applications) or for parsing quoted strings.
The shlex module defines the following functions:
shlex.split(s, comments=False, posix=True)
Split the string s using shell-like syntax. If comments is False (the default), the parsing of comments in the given string will be disabled (setting the commenters attribute of the shlex instance to the empty string). This function operates in POSIX mode by default, but uses non-POSIX mode if the posix argument is false.

https://docs.python.org/3/library/shlex.html

Another quote:

On POSIX, if args [cmd in our case] is a string, the string is interpreted as the name or path of the program to execute. However, this can only be done if not passing arguments to the program.
Note
It may not be obvious how to break a shell command into a sequence of arguments, especially in complex cases. shlex.split() can illustrate how to determine the correct tokenization for args:
>>> import shlex, subprocess
>>> command_line = input()
/bin/vikings -input eggs.txt -output "spam spam.txt" -cmd "echo '$MONEY'"
>>> args = shlex.split(command_line)
>>> print(args)
['/bin/vikings', '-input', 'eggs.txt', '-output', 'spam spam.txt', '-cmd', "echo '$MONEY'"]
>>> p = subprocess.Popen(args) # Success!
Note in particular that options (such as -input) and arguments (such as eggs.txt) that are separated by whitespace in the shell go in separate list elements, while arguments that need quoting or backslash escaping when used in the shell (such as filenames containing spaces or the echo command shown above) are single list elements.

https://docs.python.org/3/library/subprocess.html?highlight=subprocess#subprocess.Popen

On line 25 we have

process_invoke_cwd = _os.getcwd()

We’re storing current working directory to the variable.

On line 26 we have

processinvokes.run_sub_process(*cmd, **{'kwargs': {'cwd': process_invoke_cwd}})

We pass cmd, which is the echo command with the message “simulating run_sub_process”, as a sequence of arguments (see above). The “simulating run_sub_process” message will be sent to the stdout of the subprocess. It will be piped into our Python process by the processinvokes module (specifically to logging.getLogger(‘process_invoke_run’), because of processinvokes.initConfig(**{‘default_log_name’: ‘process_invoke_run’}) at line 61).

On line 29 we’re emitting log message process_file_pipe().

On lines 30–31 we have

exp_log_msg = "simulating run_sub_process"
process_invoke_run = f"echo '{exp_log_msg}'"

process_invoke_run is a quoted string that will be used later to echo exp_log_msg, in order to simulate a long-running process that sends something to stdout.

On line 32 we have

cmd = shlex.split(process_invoke_run)

cmd contains process_invoke_run as a list of arguments. This form is suitable for use with the standard subprocess module. See the explanation of shlex above.

On line 34 we have

process_invoke_cwd = _os.getcwd()

We’re storing current working directory to the variable.

On lines 35–36 we have:

These lines initialize the processinvokes module with the log name ‘process_invoke_run’ (the same as before) and with ‘alexber.utils.processinvokes.FilePipe’ as default_logpipe_cls. This means that ‘alexber.utils.processinvokes.FilePipe’ will be used instead of LogPipe (see details below).

On lines 38–40 we have

at lines 35–36.

Note: You can also pass log_name and logpipe_cls to run_sub_process without invoking initConfig.

The cwd that we supply will be forwarded to subprocess.run() and then to subprocess.Popen(). We provide the cwd of our main process to the subprocess so that the file name “my.log” is resolved relative to the working directory of the main process. This file name is provided as a kwargs parameter of logPipe: logPipe={‘kwargs’: {‘fileName’: “my.log”}}.
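The forwarding of cwd can be checked with the standard library alone. In this sketch (the helper child_cwd() is invented for the example and is not part of processinvokes) the child process reports its own working directory, which is the directory that relative file names are resolved against.

```python
import os
import subprocess
import sys
import tempfile


def child_cwd(cwd):
    """Return the working directory that the child process reports for itself."""
    result = subprocess.run(
        [sys.executable, "-c", "import os; print(os.getcwd())"],
        cwd=cwd,                 # forwarded by subprocess.run() to Popen()
        stdout=subprocess.PIPE,
        text=True,
        check=True,
    )
    return result.stdout.strip()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(os.path.samefile(child_cwd(tmp), tmp))
```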

Now, let’s look at the initConfig() method first.

This method can optionally be called prior to any call to another function in this module. It is intended to be called in the MainThread. If running from the MainThread, this method is idempotent. This method can be called with empty params.

default_log_name: Optional. Name of the logger that the messages will be streamed to. The default value is processinvokes.
default_log_level: Optional. Log level to be used in the logger. The default value is logging.INFO.
default_logpipe_cls: Optional. Can be a class or a str. You can use your own custom class for the logging, for example FilePipe. The default value is LogPipe.

Note: FilePipe is an example of how you can add your own custom class. While BasePipe looks complicated, you typically need to override only a few methods and you’re done.

default_log_subprocess_cls: Optional. Can be a class or a str. The default value is LogSubProcessCall.
executor: used internally to run the subprocess.
The default values are:
max_workers: 1,
thread_name_prefix: processinvokes
This means that, by default:
We use up to 1 worker.
In log messages generated from the worker, processinvokes-xxx will be used as the thread name.
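The parameter names max_workers and thread_name_prefix match those of concurrent.futures.ThreadPoolExecutor, so, assuming that this is the kind of executor used internally (an assumption, not something stated above), the effect of the defaults can be sketched as:

```python
import threading
from concurrent.futures import ThreadPoolExecutor

# Assumption: the same two settings applied to a standard ThreadPoolExecutor.
executor = ThreadPoolExecutor(max_workers=1, thread_name_prefix="processinvokes")


def current_thread_name():
    return threading.current_thread().name


# With a single worker, every task runs on the same named thread.
names = [executor.submit(current_thread_name).result() for _ in range(3)]
executor.shutdown()
print(names)
```

A log formatter that includes %(threadName)s would therefore show the processinvokes prefix for messages emitted from the worker.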

run_sub_process() is the primary function of the processinvokes module.

This method is a sophisticated decorator (wrapper) around subprocess.run(). It is useful when your subprocess runs for a long time and you are interested in receiving its stdout and stderr. By default, the output is streamed to the log. You can easily customize this behavior; see the description of initConfig() above.

As I’ve said, this method is a sophisticated decorator around subprocess.run() (which is itself a decorator around subprocess.Popen()). Note that some of the parameters that can be used in popenkwargs (cwd, for example) are listed in the Popen constructor. See the example usage above.

logPipe is a customizable object that essentially forwards the output of the subprocess to the logger using logName and logLevel (see below).

See the description of initConfig() above for the default parameters.

The default parameters that the processinvokes module uses for the call to subprocess.run() are the following (they can be overridden in popenkwargs):

‘stdout’:logPipe,
‘stderr’:STDOUT,
‘text’:True,
‘bufsize’: 1,
‘check’: True

This means:

logPipe is used as the subprocess’s standard output and standard error.
stdin, stdout and stderr are opened in text mode and decoded using the given encoding, or the system default otherwise.
1 is supplied as the buffering argument to the open() function when creating the stdin/stdout/stderr pipe file objects, which means line buffering. Essentially, there is no OS-level buffering between the processes, provided that the call to write contains a newline character.
If the exit code of the subprocess is non-zero, a CalledProcessError is raised.

Note: It is generally not advised to override them, but you can if you know what you’re doing.
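The behavior of these defaults can be reproduced with plain subprocess.run(). In the sketch below subprocess.PIPE is only a stand-in for logPipe, so the output is collected into a string instead of being streamed to a logger.

```python
import subprocess
import sys

child_code = "import sys; print('to stdout'); print('to stderr', file=sys.stderr)"

result = subprocess.run(
    [sys.executable, "-c", child_code],
    stdout=subprocess.PIPE,    # stand-in for logPipe
    stderr=subprocess.STDOUT,  # stderr goes wherever stdout goes
    text=True,                 # decode output to str
    bufsize=1,                 # line buffering (valid in text mode only)
    check=True,                # raise CalledProcessError on non-zero exit
)
lines = result.stdout.splitlines()
print(lines)  # both messages arrive through the same channel

try:
    subprocess.run([sys.executable, "-c", "import sys; sys.exit(3)"], check=True)
except subprocess.CalledProcessError as exc:
    failed_code = exc.returncode
    print(f"child failed with exit code {failed_code}")
```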

Parameters of the run_sub_process() function:

args: will be passed as popenargs to the subprocess.run() method.
Roughly, kwargs[‘kwargs’] will be passed as popenkwargs to the subprocess.run() method.
Roughly, kwargs[‘logPipe’][‘cls’], or default_logpipe_cls if the first one is empty, will be passed as logPipeCls to create logPipe. Can be a class or a str.
Roughly, kwargs[‘logPipe’][‘kwargs’] will be passed as kwargs to logPipeCls to create logPipe.
Roughly, kwargs[‘logSubprocess’][‘cls’], or default_log_subprocess_cls if the first one is empty, will be passed as logSubProcessCls to create LogSubProcessCall. Can be a class or a str.
Roughly, kwargs[‘logSubprocess’][‘kwargs’] will be passed as kwargs to logSubprocess.
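Putting the pieces together, the nested keyword structure for a call that writes to “my.log” through FilePipe might be assembled as below. The keys are the ones listed above; the call itself is left commented out because it needs alex-ber-utils installed, and its exact form is an assumption based on the earlier example in this article.

```python
import os
import shlex

cmd = shlex.split("echo 'simulating run_sub_process'")

call_kwargs = {
    # forwarded as popenkwargs to subprocess.run()
    "kwargs": {"cwd": os.getcwd()},
    # which pipe class to create, and with which arguments
    "logPipe": {
        "cls": "alexber.utils.processinvokes.FilePipe",
        "kwargs": {"fileName": "my.log"},
    },
}

# With alex-ber-utils installed, the call would presumably look like this:
# import alexber.utils.processinvokes as processinvokes
# processinvokes.run_sub_process(*cmd, **call_kwargs)
print(call_kwargs)
```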


Written by alex_ber