Continuation, coroutine, and generator

and other routines

What is a generator in Python? Let's look at what Wikipedia has to say:

Note: I've simplified the code a bit.

Generators were added to Python in version 2.2 in 2001.[6] An example generator:
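
That example looks roughly like this (a minimal sketch in the spirit of the Wikipedia sample; the countfrom name is used here for illustration):

```python
def countfrom(n):
    while True:
        yield n
        n += 1

# Print the integers from 10 to 20. The iteration terminates normally,
# despite countfrom() being written as an infinite loop.
for i in countfrom(10):
    if i <= 20:
        print(i)
    else:
        break
```

Note that the for-loop ends via break even though countfrom() never returns on its own.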

In Python, a generator can be thought of as an iterator that contains a frozen stack frame. Whenever next() is called on the iterator, Python resumes the frozen frame, which executes normally until the next yield statement is reached. The generator's frame is then frozen again, and the yielded value is returned to the caller.

https://en.wikipedia.org/wiki/Generator_(computer_programming)#Python

So, is a generator simply an iterator? But what is this weird yield thing? And how does a break statement inside a for-loop exit the generator?
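
Before going further, here is a tiny sketch of what the quote above means in practice (the names demo and g are made up for illustration): creating the generator does not run its body, every next() resumes the frozen frame up to the next yield, and closing an abandoned generator throws GeneratorExit into the frozen frame, so try/finally blocks still run.

```python
def demo():
    try:
        print("started")
        yield 1
        print("resumed")
        yield 2
    finally:
        print("finalized")   # runs when the frozen frame is closed

g = demo()        # nothing printed yet: the body has not started running
print(next(g))    # prints "started", then 1; the frame is frozen at the first yield
print(next(g))    # prints "resumed", then 2
g.close()         # throws GeneratorExit into the frozen frame, so "finalized" is printed
```

A for-loop does the same thing implicitly: break simply abandons the generator object, and Python closes it when the object is finalized.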

Let's look at the header of the Wikipedia article:

In computer science, a generator is a routine that can be used to control the iteration behaviour of a loop. All generators are also iterators. A generator is very similar to a function that returns an array, in that a generator has parameters, can be called, and generates a sequence of values. However, instead of building an array containing all the values and returning them all at once, a generator yields the values one at a time, which requires less memory and allows the caller to get started processing the first few values immediately. In short, a generator looks like a function but behaves like an iterator.

https://en.wikipedia.org/wiki/Generator_(computer_programming)

But how can a generator be a routine that looks like a function but behaves like an iterator? A routine (you can think "function" for simplicity) returns a single value (or no regular value at all) and then exits; if you call it again, it will not remember its previous state. Yet here it is said that a generator does not return "an array containing all the values... all at once", but "yields the values one at a time". This is a weird beast.
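
To make the contrast concrete, here is a sketch (with illustrative names) of the same sequence produced by a function that returns a whole list versus a generator that yields values one at a time:

```python
def first_squares_list(n):
    result = []
    for i in range(n):
        result.append(i * i)
    return result            # all n values exist in memory before the caller sees any

def first_squares_gen(n):
    for i in range(n):
        yield i * i          # one value at a time; the caller can start consuming immediately

print(first_squares_list(5))        # [0, 1, 4, 9, 16]
print(list(first_squares_gen(5)))   # [0, 1, 4, 9, 16], but produced lazily
```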

Let’s continue to read in Wikipedia:

Generators can be implemented in terms of more expressive control flow constructs, such as coroutines or first-class continuations. Generators, also known as semicoroutines, are a special case of (and weaker than) coroutines, in that they always yield control back to the caller (when passing a value back), rather than specifying a coroutine to jump to; see comparison of coroutines with generators.

https://en.wikipedia.org/wiki/Generator_(computer_programming)

Ok, so, generators can be implemented as coroutines or continuations. But what are these beasts? Let's look at Wikipedia:

In computer science, a continuation is an abstract representation of the control state of a computer program. A continuation implements (reifies) the program control state, i.e. the continuation is a data structure that represents the computational process at a given point in the process’s execution; the created data structure can be accessed by the programming language, instead of being hidden in the runtime environment. Continuations are useful for encoding other control mechanisms in programming languages such as exceptions, generators, coroutines, and so on.

https://en.wikipedia.org/wiki/Continuation

So, a continuation is an abstract representation of something. This is not very helpful. It can "encode" (whatever this means) generators and coroutines.
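
To get at least a rough intuition, here is a small sketch in continuation-passing style (all names are made up for illustration): instead of returning, each step receives "the rest of the program" as an explicit callable and hands its result to it. That callable is a reified continuation.

```python
# Direct style: add() returns its result to an implicit "rest of the program".
def add(a, b):
    return a + b

# Continuation-passing style: the "rest of the program" is an explicit callable k,
# and add_cps hands its result to it instead of returning.
def add_cps(a, b, k):
    k(a + b)

add_cps(2, 3, print)                             # prints 5
add_cps(2, 3, lambda r: add_cps(r, 10, print))   # prints 15: continuations chain
```

Freezing a generator frame is, in effect, capturing such a "rest of the computation" so it can be invoked later. Now let's have a look at what a coroutine is: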

Coroutines are computer program components that generalize subroutines for non-preemptive multitasking, by allowing execution to be suspended and resumed.

https://en.wikipedia.org/wiki/Coroutine

What is non-preemptive multitasking? Wikipedia again:

Cooperative multitasking, also known as non-preemptive multitasking, is a style of computer multitasking in which the operating system never initiates a context switch from a running process to another process. Instead, processes voluntarily yield control periodically or when idle or logically blocked in order to enable multiple applications to be run concurrently. This type of multitasking is called “cooperative” because all programs must cooperate for the entire scheduling scheme to work. In this scheme, the process scheduler of an operating system is known as a cooperative scheduler, having its role reduced down to starting the processes and letting them return control back to it voluntarily.

https://en.wikipedia.org/wiki/Cooperative_multitasking

What are subroutines? Coroutines should generalize them somehow… Wikipedia again:

In computer programming, a subroutine is a sequence of program instructions that performs a specific task, packaged as a unit. This unit can then be used in programs wherever that particular task should be performed.

Subroutines may be defined within programs, or separately in libraries that can be used by many programs. In different programming languages, a subroutine may be called a routine, subprogram, function, method, or procedure. Technically, these terms all have different definitions. The generic, umbrella term callable unit is sometimes used.

Call stack

Most modern implementations of a subroutine call use a call stack, a special case of the stack data structure, to implement subroutine calls and returns. Each procedure call creates a new entry, called a stack frame, at the top of the stack; when the procedure returns, its stack frame is deleted from the stack, and its space may be used for other procedure calls. Each stack frame contains the private data of the corresponding call, which typically includes the procedure’s parameters and internal variables, and the return address.

… another advantage of the call stack method is that it allows recursive subroutine calls, since each nested call to the same procedure gets a separate instance of its private data.

https://en.wikipedia.org/wiki/Subroutine

So, roughly, a subroutine is a regular function, usually implemented with a call stack.
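
The "separate instance of its private data" part is easy to see with recursion: every nested call below gets its own frame, with its own n and depth (a toy sketch with illustrative names):

```python
def factorial(n, depth=0):
    # each recursive call gets its own stack frame with its own n and depth
    print("  " * depth + f"frame: n={n}")
    if n <= 1:
        return 1
    return n * factorial(n - 1, depth + 1)

print(factorial(3))   # prints three "frame" lines, then 6
```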

A coroutine is a generalization of a "function" that allows execution to be suspended and resumed. It does so in a non-preemptive way, that is, the "function" itself has to yield control.
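
A minimal sketch of what that looks like in Python, using generators as the coroutines and a trivial round-robin scheduler (all names are illustrative): each task runs until it voluntarily yields, and nothing ever interrupts it from the outside.

```python
from collections import deque

def task(name, steps):
    for i in range(steps):
        print(f"{name}: step {i}")
        yield                     # voluntarily hand control back to the scheduler

def run(tasks):
    ready = deque(tasks)
    while ready:
        t = ready.popleft()
        try:
            next(t)               # resume the task until its next yield
            ready.append(t)       # still alive: schedule it again
        except StopIteration:
            pass                  # the task ran to completion

run([task("A", 2), task("B", 3)])
# A: step 0, B: step 0, A: step 1, B: step 1, B: step 2
```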

Spoiler:

A coroutine can be implemented as an "explicit state machine in the form of a large and complex switch statement or via a goto statement" (see below).

In order to understand the concept better, let's look at the comparison of coroutines with subroutines. Quote from https://en.wikipedia.org/wiki/Coroutine#Comparison_with_subroutines

Comparison with subroutines

Subroutines are special cases of coroutines. When subroutines are invoked, execution begins at the start, and once a subroutine exits, it is finished; an instance of a subroutine only returns once, and does not hold state between invocations. By contrast, coroutines can exit by calling other coroutines, which may later return to the point where they were invoked in the original coroutine; from the coroutine’s point of view, it is not exiting but calling another coroutine. Thus, a coroutine instance holds state, and varies between invocations; there can be multiple instances of a given coroutine at once. The difference between calling another coroutine by means of yielding to it and simply calling another routine (which then, also, would return to the original point), is that the relationship between two coroutines which yield to each other is not that of caller-callee, but instead symmetric.

Any subroutine can be translated to a coroutine which does not call yield.[4]

Well, this is trivial.

Let’s continue:

Comparison with threads

Coroutines are very similar to threads. However, coroutines are cooperatively multitasked, whereas threads are typically preemptively multitasked. This means that coroutines provide concurrency but not parallelism. The advantages of coroutines over threads are that they may be used in a hard-realtime context (switching between coroutines need not involve any system calls or any blocking calls whatsoever), there is no need for synchronisation primitives such as mutexes, semaphores, etc. in order to guard critical sections, and there is no need for support from the operating system.

It is possible to implement coroutines using preemptively-scheduled threads, in a way that will be transparent to the calling code, but some of the advantages (particularly the suitability for hard-realtime operation and relative cheapness of switching between them) will be lost.

Comparison with generators

Generators, also known as semicoroutines, are a subset of coroutines. Specifically, while both can yield multiple times [but how?], suspending their execution and allowing re-entry at multiple entry points, they differ in coroutines’ ability to control where execution continues immediately after they yield, while generators cannot, instead transferring control back to the generator’s caller.[6]

This means we can forget about preemptive vs. cooperative multitasking; it is not relevant for generators. Let's continue:

That is, since generators are primarily used to simplify the writing of iterators, the yield statement in a generator does not specify a coroutine to jump to, but rather passes a value back to a parent routine.
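
In Python terms (illustrative names): a generator's yield can only hand control back to whoever called next() on it; any ping-pong between two generators has to be arranged by that caller, which is exactly the asymmetry the quote describes.

```python
def ping():
    for _ in range(2):
        print("ping")
        yield            # control returns to whoever called next(), not to pong()

def pong():
    for _ in range(2):
        print("pong")
        yield

a, b = ping(), pong()
for _ in range(2):
    next(a)              # the caller, not the generators, decides the alternation
    next(b)
# ping, pong, ping, pong
```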

I’m deliberately skipping some part, I will get back to it below.

Implementations for C

In order to implement general-purpose coroutines, a second call stack must be obtained, which is a feature not directly supported by the C language. A reliable (albeit platform-specific) way to achieve this is to use a small amount of inline assembly to explicitly manipulate the stack pointer during initial creation of the coroutine…

Once a second call stack has been obtained… the setjmp and longjmp functions in the standard C library can then be used to implement the switches between coroutines. These functions save and restore, respectively, the stack pointer, program counter, callee-saved registers, and any other internal state as required by the ABI, such that returning to a coroutine after having yielded restores all the state that would be restored upon returning from a function call. Minimalist implementations, which do not piggyback off the setjmp and longjmp functions, may achieve the same result via a small block of inline assembly which swaps merely the stack pointer and program counter, and clobbers all other registers. This can be significantly faster, as setjmp and longjmp must conservatively store all registers which may be in use according to the ABI, whereas the clobber method allows the compiler to store (by spilling to the stack) only what it knows is actually in use.

Ok. Do we have other alternatives?

Implementations for C#

The .NET 2.0+ Framework now provides semi-coroutine (generator) functionality through the iterator pattern and yield keyword.[9]

C# 5.0 includes await syntax support.

Implementations for Java

There are several implementations for coroutines in Java. Despite the constraints imposed by Java’s abstractions, the JVM does not preclude the possibility.[39] There are four general methods used, but two break bytecode portability among standards-compliant JVMs.

* Modified JVMs. It is possible to build a patched JVM to support coroutines more natively. The Da Vinci JVM has had patches created.[40]

* Modified bytecode. Coroutine functionality is possible by rewriting regular Java bytecode, either on the fly or at compile time. Toolkits include Javaflow, Java Coroutines, and Coroutines.

* Platform-specific JNI mechanisms. These use JNI methods implemented in the OS or C libraries to provide the functionality to the JVM.

* Thread abstractions. Coroutine libraries which are implemented using threads may be heavyweight, though performance will vary based on the JVM’s thread implementation.

Implementations for Kotlin

Kotlin implements coroutines as part of a first-party library.

Implementations for Python

* Python 2.5 implements better support for coroutine-like functionality, based on extended generators (PEP 342); see the sketch after this list

* Python 3.3 improves this ability, by supporting delegating to a subgenerator (PEP 380)

* Python 3.4 introduces a comprehensive asynchronous I/O framework as standardized in PEP 3156, which includes coroutines that leverage subgenerator delegation

* Python 3.5 introduces explicit support for coroutines with async/await syntax (PEP 0492).

* Since Python 3.7, async/await became reserved keywords.[10]

There are also a number of third-party and alternative implementations:

* Eventlet

* Greenlet

* gevent

* Stackless Python
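
The PEP 342 item deserves a quick sketch: it turned yield into an expression, so the caller can push values into the frozen frame with send(), which is the coroutine-like ability mentioned above (the running_average name is illustrative):

```python
def running_average():
    total = 0.0
    count = 0
    average = None
    while True:
        value = yield average      # yield is an expression: receive what the caller send()s
        total += value
        count += 1
        average = total / count

avg = running_average()
next(avg)             # "prime" the coroutine: run up to the first yield (returns None)
print(avg.send(10))   # 10.0
print(avg.send(20))   # 15.0
avg.close()           # throws GeneratorExit into the frozen frame
```

async/await (PEP 492) builds the same suspend-and-resume machinery into dedicated syntax, with an event loop playing the role of the caller that drives the frames.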

Implementations in assembly languages

Machine-dependent assembly languages often provide direct methods for coroutine execution. For example, in MACRO-11, the assembly language of the PDP-11 family of minicomputers, the “classic” coroutine switch is effected by the instruction “JSR PC,@(SP)+”, which jumps to the address popped from the stack and pushes the current (i.e that of the next) instruction address onto the stack. On VAXen (in Macro-32) the comparable instruction is “JSB @(SP)+”. Even on a Motorola 6809 there is the instruction “JSR [,S++]”; note the “++”, as 2 bytes (of address) are popped from the stack. This instruction is much used in the (standard) ‘monitor’ Assist 09.

Let's go back to the skipped part:

As of 2003, many of the most popular programming languages, including C and its derivatives, do not have direct support for coroutines within the language or their standard libraries. (This is, in large part, due to the limitations of stack-based subroutine implementation.) An exception is the C++ library Boost.Context, part of boost libraries, which supports context swapping on ARM, MIPS, PowerPC, SPARC and x86 on POSIX, Mac OS X and Windows. Coroutines can be built upon Boost.Context.

In situations where a coroutine would be the natural implementation of a mechanism, but is not available, the typical response is to use a closure — a subroutine with state variables (static variables, often boolean flags) to maintain an internal state between calls, and to transfer control to the correct point. Conditionals within the code result in the execution of different code paths on successive calls, based on the values of the state variables. Another typical response is to implement an explicit state machine in the form of a large and complex switch statement or via a goto statement, particularly a computed goto. Such implementations are considered difficult to understand and maintain, and a motivation for coroutine support.

Threads, and to a lesser extent fibers, are an alternative to coroutines in mainstream programming environments today. Threads provide facilities for managing the realtime cooperative interaction of simultaneously executing pieces of code. ..

One important difference between threads and coroutines is that threads are typically preemptively scheduled while coroutines are not. Because threads can be rescheduled at any instant and can execute concurrently, programs using threads must be careful about locking. In contrast, because coroutines can only be rescheduled at specific points in the program and do not execute concurrently, programs using coroutines can often avoid locking entirely. (This property is also cited as a benefit of event-driven or asynchronous programming.)

Since fibers are cooperatively scheduled, they provide an ideal base for implementing coroutines above.[20] However, system support for fibers is often lacking compared to that for threads.

https://en.wikipedia.org/wiki/Coroutine#Implementations

Here is the answer! A coroutine can be implemented as an "explicit state machine in the form of a large and complex switch statement or via a goto statement". For example, this approach was chosen in Kotlin.

As stated there, JavaScript and C# also use this approach.
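
To see what such a compilation could look like, here is a rough, hand-written Python sketch of the countfrom() generator from the beginning turned into an explicit state machine: the "frozen frame" becomes ordinary object fields plus a state label. This illustrates the idea, not the exact transformation any particular compiler performs.

```python
class CountFrom:
    """Hand-compiled state machine equivalent of the countfrom() generator."""

    def __init__(self, n):
        self.n = n
        self.state = 0           # which resume point we are "frozen" at

    def __iter__(self):
        return self

    def __next__(self):
        if self.state == 0:      # first entry: yield n as-is
            self.state = 1
            return self.n
        else:                    # resumed after a yield: advance, then yield again
            self.n += 1
            return self.n

for i in CountFrom(10):
    if i > 12:
        break
    print(i)                     # 10, 11, 12
```

And that, roughly, is all the magic behind yield: either a frame that the runtime can freeze and resume, or, equivalently, a state machine that remembers where to continue.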
