My parser module
It is safe to say that AlexBerUtils started from this module. Perhaps surprisingly, it also serves as the foundation for other modules. Below I give an overview of the module's functions.
The source code can be found here. It is available as part of my AlexBerUtils project.
You can install AlexBerUtils from PyPI:
python3 -m pip install -U alex-ber-utils
See here for a more detailed explanation of how to install it.
is_empty() function. The main motivation for writing this function was the following use case: I receive some collection d from a function call (it can be some API or a call to my own method) and I want to take a shortcut if this collection is empty.
“Empty” here loosely means that either d is None or len(d)==0.
Note: I know that len(d) is not defined on an arbitrary iterable. But len is defined on collections and sequences, which is what an iterable usually is in practice. That is why I use it to illustrate the idea of being empty.
Note: Using PEP 424 (available from Python 3.4) we can try len(d) and fall back to d.__length_hint__() to guess the length.
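As a small illustration of the PEP 424 note, the sketch below uses operator.length_hint() from the standard library, which tries len() first and then __length_hint__(). The variable names are only illustrative.

```python
import operator

items = [1, 2, 3]

# A list iterator has no len(), but it implements __length_hint__,
# so a size estimate is still available.
list_iter = iter(items)
hint_for_list_iter = operator.length_hint(list_iter)

# A generator implements neither __len__ nor __length_hint__, so only the
# caller-supplied default (-1 here) comes back: its size is simply unknown.
gen = (x for x in items)
hint_for_gen = operator.length_hint(gen, -1)

print(hint_for_list_iter, hint_for_gen)
```

The generator case is why a length hint cannot serve as a general emptiness test for an arbitrary iterable.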
See here for a discussion of this and why it will not work for a general iterable.
Side note: An iterable in Python is just an object that has an __iter__() method or, less well known, a __getitem__() method. The __iter__() method returns an iterator, while the __getitem__() method is used for getting elements by index. __getitem__() raises IndexError to indicate that an index is no longer valid (this can be used to figure out when the iteration ends).
An iterator is an object with a __next__() method (in Python 2 it was next()). It raises a StopIteration exception to indicate that the iteration has ended.
You can see here an interesting example of implementing Iterable by using virtual subclassing.
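To make the side note concrete, here is a minimal sketch of the less known __getitem__() route. The Countdown class is a hypothetical example, not part of AlexBerUtils.

```python
class Countdown:
    """Iterable only through __getitem__(); no __iter__() is defined."""

    def __init__(self, start):
        self.start = start

    def __getitem__(self, index):
        # IndexError tells Python that the indexes are no longer valid.
        if index >= self.start:
            raise IndexError(index)
        return self.start - index


# list() iterates by calling __getitem__(0), __getitem__(1), ...
all_items = list(Countdown(3))

# Stepping through the iterator by hand shows the StopIteration signal.
it = iter(Countdown(2))
first = next(it)
second = next(it)
try:
    next(it)
    exhausted = False
except StopIteration:
    exhausted = True

print(all_items, first, second, exhausted)
```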
In most practical cases I follow the same execution path regardless of whether d is actually None or “just” has no elements in it.
Typically, this means that I have some if statement and I want to check whether d is empty or not.
Now I have to make two remarks. First, I know that the Pythonic way to achieve this is to write:
if not d
Personally, I find this notation very cryptic. What I really want to ask is whether d is (positively) empty, yet I am expressing my intention with a negation (not).
The second point is what happens with the expression evaluation if d is None. This question just blows my mind, because it rests on C-style thinking about what a boolean really is (I elaborate on this below), and I am used to thinking the “Java way”. The argument goes as follows: None is falsy, much like 0, so “not None” behaves like “not 0”, which evaluates to True. Therefore the condition holds for None, as expected.
This is so annoying to me that even when I copy and paste a working code snippet with this expression, I change it to use my is_empty() function.
Let's see some working examples:
Yes, it is that simple.
Actually, this method works as expected for any iterable object, for example, for strings.
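As a minimal sketch of the behavior described in this section, the stand-in below treats None as empty and otherwise relies on truthiness. It is my own approximation for illustration, not the library's actual source.

```python
def is_empty(d):
    """Stand-in sketch: None counts as empty, anything else is judged by truthiness."""
    if d is None:
        return True
    return not d


checks = {
    "None": is_empty(None),
    "empty list": is_empty([]),
    "empty dict": is_empty({}),
    "empty str": is_empty(""),
    "non-empty list": is_empty([1, 2]),
    "non-empty str": is_empty("abc"),
}

for label, result in checks.items():
    print(f"{label}: {result}")
```

With the real package installed, the same calls would presumably go through the is_empty() function of the parsers module instead of this stand-in.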
The implementation of this method handles the None case explicitly, so I don't have to worry about it over and over again. The remainder of the method uses the if d idiom, so it will work for any type, even though the documentation explicitly states that the behavior is undefined. See also the details about the if d idiom below, in the discussion of PEP 285.
I made this statement for two reasons:
- I want to have the freedom to change the implementation.
- There is weird behavior with zero.
Python has dynamic typing (as opposed to C, which has static typing, where such behavior makes perfect sense). When you take input from the outside (whether from a system argument or from an ini file, or even from a yml file, although in the latter case the library you are working with will usually do the type conversion for you), it first appears as a string, and only after you make an explicit type conversion does it appear with the correct type.
So you can have some num variable that holds the str “0” right after some API call (more on this below), and in this case is_empty(num) will be False.
Then, maybe a few lines below, your num variable will hold, say, an int (really it can be any numeric built-in type, including float and decimal; the result will be the same). When num's type is int, is_empty(num) evaluates, surprisingly, to True.
Again, because of the lack of typing information in the code, such code suffers from a readability issue: is_empty(num) can be False and a few lines later the same expression is_empty(num) can be True.
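The zero surprise can be reproduced with a compact stand-in for is_empty() (again an approximation of the described behavior, not the library code):

```python
from decimal import Decimal


def is_empty(value):
    # Stand-in: explicit None check, then plain truthiness.
    return value is None or not value


num = "0"                  # raw input arrives as a string
as_text = is_empty(num)    # a non-empty string is not "empty"

num = int(num)             # explicit type conversion a few lines later
as_int = is_empty(num)     # zero is falsy, so now it counts as "empty"

as_float = is_empty(float("0.0"))
as_decimal = is_empty(Decimal("0"))

print(as_text, as_int, as_float, as_decimal)
```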
Side note: In Python 3 the long datatype has been removed. Yes, it is as crazy as it sounds: Python has dropped a primitive type (but, hey, there is more to discover below). Effectively, int means any integer number. For example, print(type(math.factorial(30))) prints <class 'int'>, even though a 64-bit number can't hold the value: print(math.factorial(30)) prints 265252859812191058636308480000000. You can find more details, mainly historical, here.
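The factorial example from the side note can be checked directly. The comparison against 2**64 is my addition to show that the value really does not fit into 64 bits.

```python
import math

big = math.factorial(30)

print(type(big))   # the plain int type, there is no separate long
print(big)

# 2**64 is one past the largest unsigned 64-bit value.
fits_in_64_bits = big < 2**64
print(fits_in_64_bits)
```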
parse_boolean() function.
First of all, why does it even exist? I don't have parse_int() or something similar. So why do I have this function in the first place?
It appears that many libraries have their own parse_boolean() function, and that they work differently. Some use “FALSE”, “false”, “False”, “F” or even “f” to represent the False value. The standard Python interpretation is that only “False” is False and only “True” is True. Personally, I have found this too restrictive, so I invented my own function.
Python not only drops a primitive type from the language (as we saw above). It also adds a new primitive type. Boolean is just that. It was added by PEP 285 in Python 2.3 (a long time ago, but still…). Here is a quote from it:
Most languages eventually grow a Boolean type; even C99 (the new and improved C standard, not yet widely adopted) has one.
Many programmers apparently feel the need for a Boolean type…
…
Should bool inherit from int?=> Yes.
In an ideal world, bool might be better implemented as a separate integer type that knows how to perform mixed-mode arithmetic. However, inheriting bool from int eases the implementation enormously (in part since all C code that calls PyInt_Check() will continue to work — this returns true for subclasses of int). Also, I believe this is right in terms of substitutability: code that requires an int can be fed a bool and it will behave the same as 0 or 1. Code that requires a bool may not work when it is given an int; for example, 3 & 4 is 0, but both 3 and 4 are true when considered as truth values.
…
…There’s never a reason to write
if bool(x): ...
since the bool is implicit in the “if”. Explicit is not better than implicit here, since the added verbiage impairs redability and there’s no other interpretation possible.
…
This PEP does not change the fact that almost all object types can be used as truth values. For example, when used in an if statement, an empty list is false and a non-empty one is true; this does not change and there is no plan to ever change this.
The only thing that changes is the preferred values to represent truth values when returned or assigned explicitly. Previously, these preferred truth values were 0 and 1; the PEP changes the preferred values to False and True, and changes built-in operations to return these preferred values.
You can see a couple of things here. First of all, “an empty list is false” and “There’s never a reason to write if bool(x):”, together with the difficulty of computing the size of an iterable (which itself can be defined in more than one way, as we saw above), strongly indicate that the current implementation of is_empty() can't be improved.
Second, because bool inherits from int and because of backward-compatibility issues (which are not relevant today, but still exist in the language), bool in Python behaves almost like int in C. In some sense, bool in Python is just an alias for int.
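A few one-liners show what “bool inherits from int” means in practice, including the 3 & 4 example from the PEP quote. The variable names are only illustrative.

```python
is_subclass = issubclass(bool, int)   # bool really is a subclass of int
total = True + True                   # booleans take part in arithmetic as 1 and 0
bitwise = 3 & 4                       # bitwise AND of two truthy ints
both_truthy = bool(3) and bool(4)     # yet each of them is true as a truth value

# A common idiom that relies on True == 1: counting matches with sum().
positives = sum(x > 0 for x in [-1, 2, 5])

print(is_subclass, total, bitwise, both_truthy, positives)
```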
So, while Python 3 does have a primitive bool type and there are built-in tools to parse bool from str (I will show them below), I added my own version for the following reasons:
- The standard parsing rules seem too strict to me.
- They behave well only if we have a str.
- They accept “None” as a valid value (making it None).
Let's go over them one by one.
p.1. It seems illogical to me to treat only True as True and not true or TRUE. If we look at the YML definition of the boolean (and using this function as part of parsing a YML file is definitely one of the use cases I had in mind when I wrote this function), we will see that the following are treated as True: true|True|TRUE|y|Y|yes|Yes|YES|on|On|ON
Treating On|y|yes as True would deviate too far from Python. So I decided to treat any lower-case/upper-case sequence of the letters T,R,U,E as True and any sequence of the letters F,A,L,S,E, case-insensitive, as False. This is also the standard treatment of true/false in the Java language.
p.2. As I described above, it is common to have code where some variable holds a str and a few lines later the same variable holds a bool. I want my function not to blow up if it receives a bool value, but just to return it as is. This slightly increases the number of lines in the function, but it is completely worth it.
p.3. “None” is not considered a legal option for bool.
Indeed, any variable in Python can be None, so if you pass None to parse_boolean() it will return None back. This is a documented corner case. The string “None” is not treated as None, but as “something wrong”.
Let's see a code example:
You can verify that the type of the object returned from parse_boolean() is bool (or None).
As you can see, this function implements all the requirements that I've outlined above: “None” and “InvalidValue” cause an exception to be raised; None, True and False are returned as is; and “TRUE”, “false” or any other lower/upper case variation of them is treated as the True or False value respectively.
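The requirements above can be sketched as follows. This is a stand-in written from the description, not the library's source, and the choice of ValueError as the raised exception is my assumption.

```python
def parse_boolean(value):
    """Stand-in sketch of the rules described in this section."""
    # None and real booleans are returned as is.
    if value is None or isinstance(value, bool):
        return value
    folded = str(value).casefold()
    if folded == "true":
        return True
    if folded == "false":
        return False
    # Assumption: the text only says "an exception"; ValueError is a guess.
    raise ValueError(f"Unknown bool value: {value!r}")


accepted = [parse_boolean(v) for v in ("TRUE", "TRuE", "false", True, False, None)]

rejected = []
for bad in ("None", "InvalidValue"):
    try:
        parse_boolean(bad)
    except ValueError:
        rejected.append(bad)

print(accepted, rejected)
```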
Implementation notes:
1. TRuE will be resolved to True. It fulfills the requirement of being “any lower-case/upper-case sequence of the letters T,R,U,E as True”. In practice, I don't expect this to be in widespread use, but if it happens, I think it is fine to assume that the True value was intended.
2. I use str.casefold() for caseless matching. The casefolding algorithm is described in section 3.13 of the Unicode Standard. I think that using str.lower() (the lowercasing algorithm used is described in section 3.13 of the Unicode Standard) is also fine in this particular case (True / False are plain words in the Latin alphabet, so applying either algorithm and comparing with the expected sequence of letters should lead to the same result).
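A quick check of the second note: for plain Latin words both methods agree, while a character such as the German sharp s shows where they differ. The sample strings are my own.

```python
# For the words that matter here, casefold() and lower() give the same result.
same_for_true = "TRuE".casefold() == "TRuE".lower() == "true"
same_for_false = "FaLsE".casefold() == "FaLsE".lower() == "false"

# They differ only for characters with special folding rules.
word = "Straße"
lowered = word.lower()
folded = word.casefold()

print(same_for_true, same_for_false, lowered, folded)
```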
Function safe_eval().
This is an alternative way to convert a str to the correct type.
Let's talk about the name of the function. Why is it called “safe” eval? Do we have an “unsafe” eval?
The short answer is yes, Python does have an “unsafe” eval().
Quote from Stack Overflow:
eval: This is very powerful, but is also very dangerous if you accept strings to evaluate from untrusted input. Suppose the string being evaluated is “os.system(‘rm -rf /’)” ? It will really start deleting all the files on your computer.
…
eval("__import__('os').system('rm -rf /')")
# output : start deleting all the files on your computer.
# restricting using global and local variables
Quote from the documentation of the built-in eval():
The expression argument is parsed and evaluated as a Python expression…The return value is the result of the evaluated expression. Syntax errors are reported as exceptions. Example:
>>> x = 1
>>> eval('x+1')
2
This function can also be used to execute arbitrary code objects (such as those created by compile()). In this case pass a code object instead of a string. If the code object has been compiled with 'exec' as the mode argument, eval()’s return value will be None.
Note: there is also the exec() built-in function; you may want to see its documentation.
Python has dynamic typing. In the example above, x has type int because of the assignment of the value 1 (which has type int in Python). The eval() function is able to take a string (among other things) containing a Python expression and evaluate it. Part of the evaluation process is resolving the (dynamic) types of the variables. So, essentially, we can convert a str to the correct type.
This works, but it is inherently dangerous: you can run arbitrary code with it. Attempts to restrict this “unsafe” eval() weren't successful. See Eval really is dangerous.
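Both sides of eval() can be seen in a harmless sketch: it resolves types from strings, and it also happily runs a function call. The call used here only reads the current working directory; the strings are my own examples.

```python
# Type resolution: the string decides the resulting type.
as_int = eval("1000")
as_float = eval("0.1")
as_list = eval("[1, 2, 3]")

# Arbitrary code: the very same mechanism imports a module and calls into it.
# A hostile string could just as well delete files instead.
side_effect = eval("__import__('os').getcwd()")

print(type(as_int), type(as_float), type(as_list), type(side_effect))
```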
Let's take a quick detour. Python 2 has two built-in functions: raw_input() and input().
The first function reads a line from input, converts it to a string (stripping a trailing newline), and returns that. The second function, according to its documentation, is equivalent to eval(raw_input(prompt)). This means that the second function reads a line from input (stripping a trailing newline), converts it to the appropriate type and returns that.
Because eval() is dangerous, the variant with automatic type conversion was dropped from Python. Python 3 has only one function that reads a line from input, converts it to a string (stripping a trailing newline), and returns that. Its name is input(). So, what do you have to do in Python 3 if you want to convert to the appropriate type?
You have 3 choices:
- Use eval() despite its dangers (eval() is available as a built-in function).
- Use the int()/float()/etc. built-ins.
- Use some “safe” alternative to eval().
As for p.1, if you have p.3, why should you use it?
P.2 is what the API for parsing system arguments (ArgumentParser) and for parsing ini files (ConfigParser) is based upon. For example, ConfigParser.getint() is a wrapper around a call to int() (we will talk about these APIs below).
This means that you have to write a lot of boilerplate to parse every parameter. This is OK if you have a small number of parameters, but it is not very convenient.
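To give a feel for that boilerplate, here is a small sketch with the standard ConfigParser typed getters. The section and key names are hypothetical.

```python
from configparser import ConfigParser

INI_TEXT = """
[server]
port = 8080
timeout = 2.5
debug = true
name = demo
"""

parser = ConfigParser()
parser.read_string(INI_TEXT)

# Every parameter needs its own call with the matching converter.
port = parser.getint("server", "port")
timeout = parser.getfloat("server", "timeout")
debug = parser.getboolean("server", "debug")
name = parser.get("server", "name")

print(port, timeout, debug, name)
```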
On p.3: how is our “safe” alternative implemented? What limitations does it have?
My safe_eval() is based on ast.literal_eval().
From Stack Overflow:
ast.literal_eval: Safely evaluate an expression node or a string containing a Python literal or container display. The string or node provided may only consist of the following Python literal structures: strings, bytes, numbers, tuples, lists, dicts, sets, booleans, None, bytes and sets.
…From python 3.7 ast.literal_eval() is now stricter. Addition and subtraction of arbitrary numbers are no longer allowed. link
Actually, if we look at the documentation of the eval() built-in function, we will find:
See ast.literal_eval() for a function that can safely evaluate strings with expressions containing only literals.
https://docs.python.org/3/library/functions.html#eval
It should be a perfect match for the described use case: conversion from str to the correct type.
Implementation notes:
Basically, my implementation calls ast.literal_eval().
If it succeeds, I return the result.
If it fails with an exception, I return the initial input.
Initially I caught only ValueError, but recently I discovered empirically that SyntaxError is also possible. Having one central place where I make such type conversions enables me to apply the fix easily.
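The implementation notes can be sketched in a few lines. This is a stand-in written from the description above, not a copy of the library code, and the sample values are my own.

```python
import ast


def safe_eval(value):
    """Stand-in sketch: try ast.literal_eval(), fall back to the input itself."""
    try:
        return ast.literal_eval(value)
    except (ValueError, SyntaxError):
        return value


samples = [
    "John",                  # not a literal, comes back unchanged
    "1000", "-1000",         # ints
    "0.1", "-0.0",           # floats
    "True", "TRUE",          # only the canonical spelling becomes a bool
    "None",                  # becomes None
    "2019-01-02 10:20:30",   # not valid Python syntax, comes back unchanged
    1000, 0.1, True, None,   # non-str values come back unchanged
]

for sample in samples:
    result = safe_eval(sample)
    print(repr(sample), "->", repr(result), type(result).__name__)
```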
Now, let's go through actual examples:
In lines 3–19 we use a str for the call to the safe_eval function.
In line 3 we are passing ‘John’; internally we receive ValueError, so the str ‘John’ is returned as is.
In line 4 we internally receive SyntaxError. This is actually the first instance I encountered where I internally received SyntaxError and not ValueError.
Line 5 is a slight variation of line 4, where we also internally receive SyntaxError, so the original str is returned.
Line 6 is a slight variation of line 4, but we internally receive SyntaxError. I don't know why.
In line 7 we have the str “1000”, which is clearly (dynamically typed) an int. So safe_eval() correctly returns 1000 as int.
Line 8 is a slight variation of line 7; it shows that negative numbers also work.
In line 9 we have the str “0.1”, which is clearly (dynamically typed) a float. So safe_eval() correctly returns 0.1 as float.
Side note: Floating point numbers are usually implemented using double in C.
Line 10 is a variation of line 9; it shows that negative zero is also supported as a float, to fulfill the IEEE 754 standard, which requires both +0 and −0. Note that in Python -0.0==+0.0 evaluates to True.
In line 12 we see that the canonical “True” value is indeed interpreted as the bool True.
In line 13 we see that, unlike in the parse_boolean() method, the “TRUE” value is not interpreted as the bool True, but as the str "TRUE". Internally we receive ValueError, so the value is returned as is.
In line 14 we see that the special value “None” is correctly resolved to None.
In line 17 there is a note that decimal is not supported. We can't syntactically distinguish it from float, so there is no actual example.
In line 18 there is a note that datetime is not supported, and in line 19 there is an example. Internally we get SyntaxError, so the value is returned as is.
In lines 22–27 we pass values of the actual type (not str) to ensure that the function returns them without change.
In line 22 we see that if we pass an int value, the call fails internally (ast.literal_eval() raises ValueError for an argument that is neither a string nor an AST node), so we correctly return the value as is.
In line 23 we see that the same happens if we pass a negative int value, so we correctly return the value as is.
In line 24 we see that the same happens if we pass a float value, so we correctly return the value as is.
Line 25 is a slight variation of line 24. We see that if we pass negative zero (a special float value), the call fails internally in the same way, so we correctly return the value as is.
In line 26 we see that the same happens if we pass a bool value, so we correctly return the value as is.
In line 27 we see that the same happens if we pass a NoneType value, so we correctly return the value as is.
ConfigParser.as_dict() method: this is an example of monkey-patching the configparser.ConfigParser class, which (dynamically) gains a new as_dict() method.
Note: [In Python]…the term monkey patch only refers to dynamic modifications of a class or module at runtime, motivated by the intent to patch existing third-party code as a workaround to a bug or feature which does not act as desired.
In simple words: it is simply the dynamic addition or replacement of attributes/methods at runtime. I'm adding a new as_dict() method to the standard ConfigParser class at runtime. In order to benefit from this new method, it is sufficient to import my module, for example:
import alexber.utils.parsers
and then, when you import the ConfigParser class
from configparser import ConfigParser
you will see my as_dict() method.
Alternatively, you can use the following import
from alexber.utils.parsers import ConfigParser
and you will get the standard ConfigParser class with the new as_dict() method.
This is one of the first methods in the project.
The rationale for this method is API unification.
ConfigParser is used to parse an ini file. An ini file contains multiple sections. Each section has its own key/value mapping. In order to get a value from such an ini file we need to specify the section and the key inside the section, and to supply the relevant convert function to convert the value from str to the expected type.
This is very cumbersome. This module is one of the oldest Python modules. You can compare it with the json module or with how you parse yml files: you get a nice dict with values converted to the expected types and that's it (in most cases). This method goes halfway in this direction: you get a nice-looking (nested) dict d where each value is an (unconverted) str, and in order to get it you can call d[section][key].
Of course, you can use d[section] to receive the inner dict with the key/value mapping. Now you can, for example, use the safe_eval() method above to convert values to their correct type.
Note: The actual type is OrderedDict and not dict, but this is mainly for historical reasons. I wanted to preserve the key order to reflect the order in the ini file.
Side note: Up to (but not including) Python 3.6, the order in which key/value pairs are stored was undefined. In Python 3.6 it was stated that this is an implementation detail of CPython (and the best practice is not to rely on this behavior). Starting from Python 3.7, dictionary order is guaranteed to be insertion order.
See https://stackoverflow.com/a/58676384/1137529 for the differences between dict and OrderedDict.
For example, suppose I have the following config.ini file:
The example usage will be:
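A minimal sketch of the idea, assuming a hypothetical ini content (passed as a string instead of a config.ini file) and a stand-in as_dict() that I attach to ConfigParser myself. The real method comes from importing alexber.utils.parsers and may differ in details.

```python
from collections import OrderedDict
from configparser import ConfigParser


def as_dict(self):
    """Stand-in sketch: nested OrderedDict of section -> key -> unconverted str."""
    result = OrderedDict()
    for section in self.sections():
        result[section] = OrderedDict(self.items(section))
    return result


# Monkey-patching: the method is attached to the standard class at runtime.
ConfigParser.as_dict = as_dict

INI_TEXT = """
[db]
host = localhost
port = 5432

[app]
debug = True
"""

parser = ConfigParser()
parser.read_string(INI_TEXT)
d = parser.as_dict()

print(list(d))             # section names, in file order
print(d["db"]["port"])     # still a str at this point
print(dict(d["app"]))
```

Values stay as str here, so a conversion step such as safe_eval() would be applied on top of the returned mapping.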
ArgumentParser.as_dict(): a monkey-patched as_dict() method on the argparse.ArgumentParser class.
This is one of the first methods in the project.
The rationale for this method is API unification.
ArgumentParser is used to parse arguments. You can pass them explicitly as an args list. Otherwise, sys.argv[1:] will be used as the source of arguments.
This method goes over the source of arguments and takes the arguments of the form --key=value. It creates a dict, strips the ‘--’ prefix from the key and puts the key/value pair into the dict. The value is a str.
You can, for example, use the safe_eval() method above to convert values to their correct type.
For historical reasons, it uses OrderedDict.
Example usage (explicit argument passing for simplicity):
Note: A standalone --conf is resolved as key=conf and value=None.
Note: The actual type returned from the as_dict() method is OrderedDict.
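The parsing rule described above can be sketched as a plain function. The name args_as_dict and the choice to skip arguments without the ‘--’ prefix are my assumptions; in the library the logic is exposed as ArgumentParser.as_dict(args=...).

```python
from collections import OrderedDict


def args_as_dict(args):
    """Stand-in sketch: turn ['--key=value', ...] into an OrderedDict of str values."""
    result = OrderedDict()
    for arg in args:
        if not arg.startswith("--"):
            continue  # assumption: anything without the prefix is ignored
        key, sep, value = arg[2:].partition("=")
        # A standalone --key (no '=') maps to None.
        result[key] = value if sep else None
    return result


d = args_as_dict(["--conf", "--general.log.level=INFO", "--port=8080"])
print(d)
```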
parse_sys_args() is one of the last functions that was added. It is just a convenient wrapper for parsing system arguments as a dictionary.
You can pass an initialized ArgumentParser with add_argument() calls. Such arguments will be returned as part of the params (first) returned value. All unknown arguments will be returned as the (second) dict value.
If an ArgumentParser is not passed in, then one will be instantiated with a general.config.file argument.
It will resolve --general.config.file to config.yml (you can override this with a different value in args or by explicitly passing an ArgumentParser).
Internally, it uses the partial-parsing API params, unknown_arg = argumentParser.parse_known_args(args=args) to collect all arguments that weren't explicitly specified in argumentParser.add_argument().
Internally, it uses argumentParser.as_dict(args=unknown_arg).
It returns params and the dict from the as_dict() method.
For example:
Note that the ignored parameter from the call to parse_sys_args() is actually config.yml.
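The flow described above can be approximated as follows. This is a sketch under assumptions: the dest name config_file, the default value handling and the inline parsing of unknown arguments are mine, and the real function may differ.

```python
import argparse
from collections import OrderedDict


def parse_sys_args(argument_parser=None, args=None):
    """Stand-in sketch of the described wrapper around parse_known_args()."""
    if argument_parser is None:
        argument_parser = argparse.ArgumentParser()
        # Assumption: config.yml is the default; dest is a hypothetical name.
        argument_parser.add_argument(
            "--general.config.file", dest="config_file", default="config.yml"
        )
    # Known arguments go to params, everything else is left in unknown.
    params, unknown = argument_parser.parse_known_args(args=args)

    extras = OrderedDict()
    for arg in unknown:
        if arg.startswith("--"):
            key, sep, value = arg[2:].partition("=")
            extras[key] = value if sep else None
    return params, extras


# Default parser, nothing overridden.
default_params, _ = parse_sys_args(args=[])

# Overriding the config file and passing extra, unknown arguments.
params, d = parse_sys_args(
    args=["--general.config.file=my.yml", "--db.host=localhost", "--verbose"]
)
print(default_params.config_file, params.config_file, dict(d))
```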
See also:
- importer module or How to write easily customizable code?
- fixabscwd() function in mains module or Making relative path to file to work.
- fix_retry_env() function in mains module or Make path to file on Windows works on Linux.
- parser module or Description of one the oldest AlexBerUtils project
- ymlparsers module or Description of low-level API module for another modules
- init_app_conf module, or My major init_app_con module
- deploys module, or My deploys module
- emails module, or My emails module
- processinvokes module, or My processinvokes module
- stdLogging module, or My stdLogging Module