My parser module

A description of one of the oldest modules of the AlexBerUtils project

alex_ber
15 min read · Jun 7, 2020

It is safe to say that AlexBerUtils started from this module. Surprisingly, this module serves as the foundation for other modules. I will give an overview of the module’s functions below.

The source code can be found here. It is available as part of my AlexBerUtils project.

You can install AlexBerUtils from PyPi:

python3 -m pip install -U alex-ber-utils

See here for a more detailed explanation of how to install.

is_empty() function. The main motivation for writing this function was the following use-case: I have received some collection d from some function call (it can be some API or a call to my own method), and I want to take some shortcut action if this collection is empty.

Empty here loosely means that either d is None or len(d)==0.

Note: I know that len(d) is not defined on an arbitrary iterable. But len is defined on collections and sequences, which is what an iterable usually really is. That’s why I use it to illustrate the idea of what it means to be empty.

Note: With PEP 424 (available from Python 3.4) we can try len(d) and fall back to d.__length_hint__() to guess the length. See here for a discussion of this and of why it will not work for a general iterable.
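To make the PEP 424 note concrete, here is a small sketch using operator.length_hint() from the standard library, which first tries len() and then __length_hint__(). The variable names are only for illustration. It also shows why this is no general solution: a generator offers neither, so only the supplied default comes back.

```python
from operator import length_hint

# A list has a real length.
from_list = length_hint([1, 2, 3])

# A list iterator has no len(), but it does provide __length_hint__().
from_iterator = length_hint(iter([1, 2, 3]))

# A generator provides neither, so the default (-1 here) is returned.
from_generator = length_hint((x for x in range(3)), -1)

print(from_list, from_iterator, from_generator)
```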

Side note: An iterable in Python is just an object that has an __iter__() method or, less well known, a __getitem__() method. The __iter__() method returns an iterator, while the __getitem__() method is used for getting elements by index. The __getitem__() method raises IndexError to indicate that the index is no longer valid (this can be used to figure out when the iteration ends).

An iterator is an object with a __next__() method (in Python 2 it was next()). It raises the StopIteration exception to indicate that the iteration has ended.
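The less-known __getitem__() route can be seen in a short sketch. The Countdown class below is a made-up example, not part of AlexBerUtils. It defines no __iter__() at all, yet iter() accepts it and stops when IndexError is raised.

```python
class Countdown:
    """Iterable only through __getitem__(); it defines no __iter__()."""

    def __init__(self, start):
        self.start = start

    def __getitem__(self, index):
        if index >= self.start:
            raise IndexError(index)  # signals the end of the iteration
        return self.start - index


# iter() falls back to calling __getitem__(0), __getitem__(1), ...
items = list(Countdown(3))

# The resulting iterator still follows the iterator protocol.
it = iter(Countdown(1))
first = next(it)
try:
    next(it)
    exhausted = False
except StopIteration:
    exhausted = True
```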

You can see here an interesting example of implementing Iterable by using virtual subclassing.

In most cases in practice, I will follow the same execution path regardless of whether d is actually None or “just” has no elements in it.

Typically, it means that I have some if statement and I want to check whether d is empty or not.

Now, I have to make 2 notes. First, I know that the Pythonic way to achieve this is to write:

if not d

Personally, I find this notation very cryptic. While what I really want to ask is whether d is (positively) empty, I’m expressing my intention with a negation (not).

The second point is: what happens to the expression evaluation if d is None? This question just blows my mind, because it rests on C-style thinking about what a boolean really is (I will elaborate more on this below), and I am used to thinking the “Java way”. The correct argument goes as follows: in a boolean context None behaves like 0 (it is falsy), so “not None” behaves like “not 0”, which evaluates to True. So the condition holds for None, as expected.

This is so annoying to me that even when I copy & paste some working code snippet with this expression, I change it to use my is_empty() function.

Let’s see some working examples:

Yes, it is that simple.

Actually, this function works as expected for any container whose truth value follows its length, for example, for strings.

The implementation of this function handles the None case explicitly, so I don’t have to worry about it over and over again. The remainder of the function uses the if d idiom, so it will work for any type, even though the documentation explicitly states that the behavior is undefined. See also the details about the if d idiom below, in the discussion of PEP 285.
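The behavior described above can be sketched in a few lines. This is my reading of the description, not the actual AlexBerUtils source, and the name is_empty_sketch is hypothetical.

```python
def is_empty_sketch(d):
    """Return True if d is None or falsy (for containers: has no elements)."""
    if d is None:  # the None case is handled explicitly
        return True
    return not d   # the rest relies on the usual "if d" truth-value idiom


print(is_empty_sketch(None))       # None counts as empty
print(is_empty_sketch([]))         # so does an empty list
print(is_empty_sketch("abc"))      # a non-empty str does not
```

The explicit None check is redundant for the result (None is falsy anyway), but it documents the intent and leaves room to change the second line later.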

I made this statement for 2 reasons:

  1. I want to have freedom to change implementation.
  2. There is weird behavior with zero.

Python has dynamic typing (as opposed to C, which has static typing, where such behavior makes perfect sense). If you take input from the outside (whether from a system argument, from an ini-file, or even from a yml-file, although in the latter case the library you’re working with will usually make the type conversion for you), it first appears as a string, and only after you make an explicit type conversion does it appear with the correct type.

So, you can have some num variable that holds the str “0” right after some API call (more on this below), and in this case is_empty(num) will be False. Then, maybe a few lines below, your num variable will hold, say, the int 0 (really, it can be any numeric type, including float and decimal; the result will be the same). When num is the int 0, is_empty(num) will, surprisingly, evaluate to True.

Again, because of the lack of typing information in the code, such code suffers from a readability issue: is_empty(num) can be False and, a few lines later, the same expression is_empty(num) can be True.
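The zero pitfall does not depend on my function; it comes from the truth-value rules of the language itself. A minimal sketch with the plain not operator (the variable names are only for illustration):

```python
from decimal import Decimal

num = "0"                    # the value as it arrives from an ini-file or argv
text_is_falsy = not num      # a non-empty str is truthy, even if it reads "0"

num = int(num)               # a few lines later: explicit type conversion
int_is_falsy = not num       # the int 0 is falsy

# The same holds for the other numeric zeros.
other_zeros_are_falsy = all(not z for z in (0.0, Decimal("0")))
```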

Side note: In Python 3 the long datatype has been removed. Yes, it is as crazy as it sounds: Python has dropped a primitive type (but, hey, there is more to discover below). Effectively, int now means any integer number. For example, print(type(math.factorial(30))) prints <class 'int'>, even though a 64-bit number can’t hold the value: print(math.factorial(30)) prints 265252859812191058636308480000000. You can find more, mainly historical, details here.

parse_boolean() function.

First of all, why does it even exist? I don’t have parse_int() or anything similar. So why do I have this function in the first place?

It appears that many libraries have their own parse_boolean() function, and each works differently. Some use “FALSE”, “false”, “False”, “F” or even “f” to represent the False value. The standard Python interpretation is that only “False” is False and only “True” is True. Personally, I have found this too restrictive, so I’ve written my own function.

Python not only dropped a primitive type from the language (as we saw above), it also added a new one. Boolean is just that. It was added by PEP 285 in Python 2.3 (a long time ago, but still…). Here is a quote from it:

Most languages eventually grow a Boolean type; even C99 (the new and improved C standard, not yet widely adopted) has one.

Many programmers apparently feel the need for a Boolean type…

Should bool inherit from int?

=> Yes.

In an ideal world, bool might be better implemented as a separate integer type that knows how to perform mixed-mode arithmetic. However, inheriting bool from int eases the implementation enormously (in part since all C code that calls PyInt_Check() will continue to work — this returns true for subclasses of int). Also, I believe this is right in terms of substitutability: code that requires an int can be fed a bool and it will behave the same as 0 or 1. Code that requires a bool may not work when it is given an int; for example, 3 & 4 is 0, but both 3 and 4 are true when considered as truth values.

…There’s never a reason to write

if bool(x): ...

since the bool is implicit in the “if”. Explicit is not better than implicit here, since the added verbiage impairs redability and there’s no other interpretation possible.


This PEP does not change the fact that almost all object types can be used as truth values. For example, when used in an if statement, an empty list is false and a non-empty one is true; this does not change and there is no plan to ever change this.

The only thing that changes is the preferred values to represent truth values when returned or assigned explicitly. Previously, these preferred truth values were 0 and 1; the PEP changes the preferred values to False and True, and changes built-in operations to return these preferred values.

You can see a couple of things here. First of all, “an empty list is false” and “There’s never a reason to write if bool(x):”, together with the difficulty of computing the size of an iterable (which itself can be defined in more than one way, as we saw above), are a strong indication that the current implementation of is_empty() can’t be improved.

Second, because bool inherits from int, and because of backward-compatibility issues (which are not relevant today, but still exist in the language), bool in Python behaves almost like int in C. In some sense, bool in Python is just an alias for int.
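The consequences of bool inheriting from int, including the 3 & 4 case from the PEP quote, can be checked directly. This is a small sketch, with variable names chosen only for illustration:

```python
is_subclass = issubclass(bool, int)   # bool really is a subclass of int
is_instance = isinstance(True, int)   # so True passes any int check
total = True + True                   # and takes part in arithmetic as 1

# The PEP's example: both operands are truthy, yet the bitwise AND is 0.
bitwise_and = 3 & 4
both_truthy = bool(3) and bool(4)

# A common idiom built on this: counting matches by summing booleans.
matches = sum(x > 1 for x in [1, 2, 3])
```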

So, while Python 3 does have a primitive bool type and there are built-in tools to parse bool from str (I will show them below), I added my own version for the following reasons:

  1. The standard parsing rules seem too strict to me.
  2. They behave well only if we have a str.
  3. They accept “None” as a valid value (turning it into None).

Let’s go over them one by one.

p1. It seems illogical to me to treat only True as True, and not true or TRUE. If we look at the YML definition of the boolean (and using this function as part of parsing a YML file is definitely one of the use-cases I had in mind when I wrote it), we will see that the following are treated as True: true|True|TRUE|y|Y|yes|Yes|YES|on|On|ON

Treating On|y|yes as True would deviate too far from Python. So, I decided to treat any lower-case/upper-case sequence of the letters T,R,U,E as True and any sequence of the letters F,A,L,S,E, case-insensitive, as False. This is also the standard treatment of true/false in the Java language.

p2. As I described above, it is to be expected that some variable holds a str and, a few lines later, the same variable holds a bool. I want my function not to blow up if it receives a bool value, but just to return it as is. This slightly increases the number of lines in the function, but it is completely worth it.

p.3 The str “None” is not considered a legal option for bool. Of course, any variable in Python can be None, so if you pass None to parse_boolean(), it will return None back. This is a documented corner-case. But the str “None” is not treated as None; it is treated like “somethingwrong”.

Let’s see code example:

You can validate that type of returned object from parse_boolean() is bool (or None).

As you can see, this function implements all the requirements that I’ve outlined above: “None” and “InvalidValue” cause an exception to be raised; None, True and False are returned as is; and “TRUE”, “false” or any other lower/upper case variation of them is treated as the True or False value respectively.
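The three requirements can be put together in a rough sketch. It follows the description above, not the actual AlexBerUtils source. The name parse_boolean_sketch is hypothetical, and raising ValueError is my assumption, since the text only says that an exception is raised.

```python
def parse_boolean_sketch(value):
    """Parse 'true'/'false' in any letter case; pass None and bool through."""
    if value is None or isinstance(value, bool):
        return value                      # p2 and p.3: returned as is
    if isinstance(value, str):
        folded = value.casefold()         # p1: caseless matching
        if folded == "true":
            return True
        if folded == "false":
            return False
    # Assumption: ValueError is used here for every other input.
    raise ValueError(f"not a boolean: {value!r}")


def raises_value_error(value):
    """Small helper for the demonstration below."""
    try:
        parse_boolean_sketch(value)
    except ValueError:
        return True
    return False


print(parse_boolean_sketch("TRuE"), parse_boolean_sketch("false"))
print(raises_value_error("None"), raises_value_error("InvalidValue"))
```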

Implementation note:

1. TRuE will be resolved to True. It fulfills the requirement of being “any lower-case/upper-case sequence of the letters T,R,U,E as True”. In practice, I don’t expect such usage to be widespread, but if it happens, I think it is fine to assume that the True value was intended.

2. I use str.casefold() for caseless matching. The casefolding algorithm is described in section 3.13 of the Unicode Standard. I think that using str.lower() (its lowercasing algorithm is described in the same section 3.13 of the Unicode Standard) would also be fine in this particular case: True / False are plain words in the Latin alphabet, so applying either algorithm and comparing with the expected sequence of letters should lead to the same result.
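A quick check of that claim, as a sketch: for plain Latin letters the two methods agree, and they only diverge on characters such as the German sharp s, which cannot appear in True / False.

```python
# For ASCII input, casefold() and lower() give the same result.
same_for_ascii = "TRuE".casefold() == "TRuE".lower() == "true"

# They differ only on special characters, e.g. the German sharp s.
sharp_s_lower = "ß".lower()      # stays 'ß'
sharp_s_fold = "ß".casefold()    # becomes 'ss'
```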

Function safe_eval(). This is an alternative way to convert str to the correct type.

Let’s talk about the name of the function. Why is the name “safe” eval? Do we have an “unsafe” eval?

The short answer is yes: Python does have an “unsafe” eval().

Quote from Stack Overflow:

eval: This is very powerful, but is also very dangerous if you accept strings to evaluate from untrusted input. Suppose the string being evaluated is “os.system(‘rm -rf /’)” ? It will really start deleting all the files on your computer.


eval("__import__('os').system('rm -rf /')")
# output : start deleting all the files on your computer.
# restricting using global and local variables

https://stackoverflow.com/questions/15197673/using-pythons-eval-vs-ast-literal-eval/34904657#34904657

Quote from the documentation of the built-in eval():

The expression argument is parsed and evaluated as a Python expression…The return value is the result of the evaluated expression. Syntax errors are reported as exceptions. Example:

>>> x = 1
>>> eval('x+1')
2

This function can also be used to execute arbitrary code objects (such as those created by compile()). In this case pass a code object instead of a string. If the code object has been compiled with 'exec' as the mode argument, eval()’s return value will be None.

Note: there is also exec() built-in function, you may want to see its documentation.

Python has dynamic typing. In the example above, x has type int because of the assignment of the value 1 (which has type int in Python). The eval() function is able to take a string (among other things) with a Python expression and evaluate it. Part of the evaluation process is resolving the (dynamic) types of the values. So, essentially, we can convert str to the correct type.

This works, but it is inherently dangerous: you can run arbitrary code with it. Attempts to restrict this “unsafe” eval() weren’t successful. See Eval really is dangerous.

Let’s take a quick detour. Python 2 has two built-in functions: raw_input() and input(). The first function reads a line from input, converts it to a string (stripping a trailing newline), and returns that. The second function, according to its documentation, is equivalent to eval(raw_input(prompt)). This means that the second function reads a line from input (stripping a trailing newline), converts it to the appropriate type and returns that.

Because eval() is dangerous, the variant with automatic type conversion was dropped from Python. Python 3 has only 1 function that reads a line from input, converts it to a string (stripping a trailing newline), and returns that. Its name is input(). So, what do you have to do in Python 3 if you want to convert to the appropriate type?

You have 3 choices:

  1. Use eval() despite its dangers (eval() is available as a built-in function).
  2. Use int()/float()/etc built-ins.
  3. Use some “safe” alternative to eval().

As for p.1: if you have p.3, why should you use it?

The APIs for parsing system arguments (ArgumentParser) and ini-files (ConfigParser) are based on p.2. For example, ConfigParser.getint() is a wrapper around a call to int() (we will talk about these APIs below).

This means that you have to write a lot of boilerplate to parse every parameter. This is OK if you have a small number of parameters, but it is not very convenient.
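To illustrate the boilerplate, here is a small sketch with the standard ConfigParser. The section and key names are invented for the example. Note that every key needs its own typed getter, chosen by hand.

```python
from configparser import ConfigParser

parser = ConfigParser()
# Hypothetical ini content, inlined to keep the sketch self-contained.
parser.read_string("""
[server]
host = localhost
port = 8080
timeout = 2.5
debug = yes
""")

# One explicit, hand-picked conversion per parameter.
host = parser.get("server", "host")
port = parser.getint("server", "port")
timeout = parser.getfloat("server", "timeout")
debug = parser.getboolean("server", "debug")
```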

On p.3: how is our “safe” alternative implemented? What limitations does it have?

My safe_eval() is based on ast.literal_eval(). From Stack Overflow:

ast.literal_eval: Safely evaluate an expression node or a string containing a Python literal or container display. The string or node provided may only consist of the following Python literal structures: strings, bytes, numbers, tuples, lists, dicts, sets, booleans, None, bytes and sets.

From python 3.7 ast.literal_eval() is now stricter. Addition and subtraction of arbitrary numbers are no longer allowed. link

https://stackoverflow.com/questions/15197673/using-pythons-eval-vs-ast-literal-eval/34904657#34904657

Actually, if we look at the documentation of the eval() built-in function, we will find:

See ast.literal_eval() for a function that can safely evaluate strings with expressions containing only literals.

https://docs.python.org/3/library/functions.html#eval

It should be a perfect match for the described use-case: conversion from str to the correct type.
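A short sketch of what ast.literal_eval() accepts and what it refuses. The harmless os.getcwd() call stands in for the destructive command from the quote above. Literals are converted, while a function call is rejected without being executed.

```python
import ast

# Literals and container displays are converted to real Python objects.
number = ast.literal_eval("1000")
mapping = ast.literal_eval("{'a': [1, 2.5, None, True]}")

# Anything that is not a literal, such as a function call, is rejected.
try:
    ast.literal_eval("__import__('os').getcwd()")
    rejected = False
except ValueError:
    rejected = True
```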

Implementation notes:

Basically, my implementation calls ast.literal_eval().
If it succeeds, I return the result.
If it fails with an exception, I return the original value.

Initially I caught only ValueError, but recently I’ve empirically discovered that SyntaxError is also possible. Having 1 central place where I make such type conversions enables me to apply the fix easily.
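These implementation notes translate almost directly into code. The following is a minimal sketch based only on the description above, not a copy of the AlexBerUtils source, and the name safe_eval_sketch is hypothetical.

```python
import ast


def safe_eval_sketch(value):
    """Convert a str holding a Python literal to its type; else return it as is."""
    try:
        return ast.literal_eval(value)
    except (ValueError, SyntaxError):
        # Not a literal (or not a str at all): hand the input back unchanged.
        return value


print(safe_eval_sketch("1000"))         # converted to int
print(safe_eval_sketch("John"))         # a bare name raises ValueError internally
print(safe_eval_sketch("hello world"))  # two bare words raise SyntaxError internally
```

Catching both exceptions in one place means callers never need to know which of the two a particular input triggers.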

Now, let’s go through actual examples:

In lines 3–19 we use str for the call to the safe_eval function.

In line 3 we are passing ‘John’; internally we’re receiving ValueError, so the str ‘John’ is returned as is.

In line 4 we’re internally receiving SyntaxError. This is actually the first instance I encountered where I was internally receiving SyntaxError and not ValueError.

Line 5 is a slight variation of line 4, where we’re also internally receiving SyntaxError, so the original str will be returned.

Line 6 is a slight variation of line 4, but we’re internally receiving SyntaxError. I don’t know why.

In line 7 we have the str “1000”, which is clearly a (dynamically typed) int. So, safe_eval() correctly returns 1000 as int.

Line 8 is a slight variation of line 7; it shows that negative numbers also work.

In line 9 we have the str “0.1”, which is clearly a (dynamically typed) float. So, safe_eval() correctly returns 0.1 as float.

Side note: Floating point numbers are usually implemented using double in C.

Line 10 is a variation of line 9; it shows that negative zero is also supported as a float, in line with the IEEE 754 standard, which requires both +0 and −0. Note that in Python -0==+0.

In line 12 we see that the canonical “True” value is indeed interpreted as bool True.

In line 13 we see that, unlike in parse_boolean(), the “TRUE” value is not interpreted as bool True, but as the str "TRUE". Internally we’re receiving ValueError, so the value is returned as is.

In line 14 we see that the special value “None” is correctly resolved to None.

In line 17 there is a note that decimal is not supported. We can’t syntactically distinguish it from float, so there is no actual example.

In line 18 there is a note that datetime is not supported, and in line 19 there is an example. Internally, we have SyntaxError, so the value is returned as is.

In lines 22–27 we pass values of the actual type (not str) to ensure that this function returns them without change.

In line 22 we see that if we’re passing an int value, we’re internally receiving ValueError (ast.literal_eval() rejects anything that is neither a str nor an AST node as malformed), so we’re correctly returning the value as is.

In line 23 we see that if we’re passing a negative int value, we’re internally receiving ValueError, so we’re correctly returning the value as is.

In line 24 we see that if we’re passing a float value, we’re internally receiving ValueError, so we’re correctly returning the value as is.

Line 25 is a slight variation of line 24. We see that if we’re passing negative zero (a special float value), we’re internally receiving ValueError, so we’re correctly returning the value as is.

In line 26 we see that if we’re passing a bool value, we’re internally receiving ValueError, so we’re correctly returning the value as is.

In line 27 we see that if we’re passing a NoneType value, we’re internally receiving ValueError, so we’re correctly returning the value as is.

ConfigParser.as_dict() method: this is an example of monkey-patching the configparser.ConfigParser class, to which a new method as_dict() is (dynamically) added.

Note: [In Python]…the term monkey patch only refers to dynamic modifications of a class or module at runtime, motivated by the intent to patch existing third-party code as a workaround to a bug or feature which does not act as desired.

In simple words: it’s simply the dynamic adding or replacement of attributes/methods at runtime. I’m adding a new method as_dict() to the standard ConfigParser class at runtime. In order to benefit from this new method, it is sufficient to import my module, for example:

import alexber.utils.parsers

and then, when you import the ConfigParser class

from configparser import ConfigParser

you will see my method as_dict().

Alternatively, you can make the following import

from alexber.utils.parsers import ConfigParser

and you will get the standard ConfigParser class with the new method as_dict().
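The mechanism itself is easy to show on a toy class. Greeter and shout() below are made up for this sketch and have nothing to do with the library. The point is that a method attached to the class at runtime becomes visible even on instances created earlier.

```python
class Greeter:
    """Stands in for a class that we don't own and can't edit."""

    def __init__(self, name):
        self.name = name


def shout(self):
    return self.name.upper() + "!"


existing = Greeter("alex")   # created before the patch
Greeter.shout = shout        # the monkey patch: attach a new method at runtime
result = existing.shout()    # the old instance sees the new method as well
```

This is why a single import of the patching module is enough: the class object is shared, so every later import of the same class sees the added method.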

This is one of the first methods of the project.

The rationale of this method is API unification.

ConfigParser is used to parse ini-files. An ini-file contains multiple sections. Each section has its own key/value mapping. In order to get a value from such an ini-file, we need to specify the section and the key inside the section, and to supply the relevant convert function to convert the value from str to the expected type.

This is very cumbersome. This module is one of the oldest Python modules. You can compare it with the json module or with how you parse yml-files: you get a nice dict with the values converted to the expected types, and that’s it (in most cases). This method goes half-way in this direction: you get a nice-looking (nested) dict d where each value is an (unconverted) str, and in order to get it you can call d[section][key]. Of course, you can use d[section] to receive the inner dict with the key/value mapping. Now you can, for example, use the safe_eval() function above to convert the values to their correct types.

Note: Actual type is OrderedDict and not dict, but it is mainly for historical reasons. I wanted to preserve the key order to reflect the order in the ini-file.

Side note: Up to (but not including) Python 3.6, the order in which key/value pairs are stored was undefined. In Python 3.6 it was stated that preserving insertion order is an implementation detail of CPython (and the best practice is not to rely on this behavior). Starting from Python 3.7, dictionary order is guaranteed to be insertion order.

See https://stackoverflow.com/a/58676384/1137529 for the differences between dict and OrderedDict.
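To give a feeling for the nested structure, here is a rough sketch of such a conversion written as a plain function. It is not the as_dict() source. The name config_as_dict_sketch and the ini content are hypothetical, and the type conversion at the end uses ast.literal_eval() directly to keep the sketch self-contained.

```python
import ast
from collections import OrderedDict
from configparser import ConfigParser


def config_as_dict_sketch(parser):
    """Build {section: {key: str_value}} and keep the order of the ini-file."""
    d = OrderedDict()
    for section in parser.sections():
        d[section] = OrderedDict(parser.items(section))
    return d


parser = ConfigParser()
# Hypothetical ini content, inlined instead of being read from a file.
parser.read_string("""
[general]
profiles = dev
log_level = DEBUG

[app]
port = 8080
ratio = 0.5
""")

d = config_as_dict_sketch(parser)
raw_port = d["app"]["port"]          # still an unconverted str
port = ast.literal_eval(raw_port)    # converted on demand
```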

For example, suppose I have the following config.ini file:

The example usage will be:

ArgumentParser.as_dict(): a monkey-patched as_dict() method added to the argparse.ArgumentParser class.

This is one of the first methods of the project.

The rationale of this method is API unification.

ArgumentParser is used to parse arguments. You can pass them explicitly as the args list. Otherwise, sys.argv[1:] will be used as the source of arguments.

This method goes over the source of arguments and takes the arguments of the form --key=value. It creates a dict, strips the ‘--’ prefix from each key and puts the key/value pair (as str) into the dict. You can, for example, use the safe_eval() function above to convert the values to their correct types.

For historical reasons, it uses OrderedDict.

Example usage (explicit argument passing for simplicity):

Note: Standalone --conf is resolved as key=conf and value=None.

Note: Actual type returned from as_dict() method is OrderedDict.
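The parsing rules described above are simple enough to sketch as a plain function. This is an illustration of the rules, not the patched method itself. The name args_as_dict_sketch is hypothetical, and skipping arguments without the -- prefix is my assumption.

```python
from collections import OrderedDict


def args_as_dict_sketch(args):
    """Turn ['--key=value', '--flag'] into an ordered key -> str/None mapping."""
    d = OrderedDict()
    for arg in args:
        if not arg.startswith("--"):
            continue  # assumption: anything else is ignored in this sketch
        key, sep, value = arg[2:].partition("=")
        d[key] = value if sep else None  # a standalone --key maps to None
    return d


parsed = args_as_dict_sketch(["--conf", "--general.profiles=dev", "--app.port=8080"])
print(parsed)
```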

parse_sys_args() is one of the last functions that was added. It is just a convenient wrapper for parsing system arguments as a dictionary.

You can pass in an ArgumentParser initialized with add_argument() calls. Such arguments will be returned as part of params, the first returned value. All unknown arguments will be returned as a dict, the second returned value.

If an ArgumentParser is not passed in, then it will be instantiated with the general.config.file argument.

It will resolve --general.config.file to config.yml (you can override this with a different value in args, or by explicitly passing an ArgumentParser).

Internally, it uses params, unknown_arg = argumentParser.parse_known_args(args=args) to collect all the arguments that weren’t explicitly specified in argumentParser.add_argument().

Internally, it uses argumentParser.as_dict(args=unknown_arg).

It returns params and the dict from the as_dict() method.
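The flow described above can be sketched with the standard argparse API only. This is an approximation for illustration, not the library code. The name parse_sys_args_sketch, the dest name config_file and the inline conversion of the unknown arguments are all my assumptions.

```python
import argparse
from collections import OrderedDict


def parse_sys_args_sketch(argument_parser=None, args=None):
    """Return (params, dict of unknown args), following the steps above."""
    if argument_parser is None:
        argument_parser = argparse.ArgumentParser()
        # Assumption: the default parser knows a single option.
        argument_parser.add_argument(
            "--general.config.file", dest="config_file", default="config.yml"
        )
    # Known options end up in params; everything else stays in a plain list.
    params, unknown = argument_parser.parse_known_args(args=args)
    d = OrderedDict()
    for arg in unknown:
        key, sep, value = arg.lstrip("-").partition("=")
        d[key] = value if sep else None
    return params, d


params, extra = parse_sys_args_sketch(args=["--app.port=8080", "--app.debug"])
print(params.config_file, dict(extra))
```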

For example:

Note that the ignored parameter from the call to parse_sys_args() is actually config.yml.
