Category theory and Haskell

I’m recommending to look on MIT lectures “Programming with Categories”, with Brendan Fong, Bartosz Milewski and David Spivak.

Image for post
Image for post

https://www.youtube.com/playlist?list=PLhgq-BqyZ7i7MTGhUROZy3BOICnVixETS

Quote of Bartosz Milewsk (emphasizes are mine):

Category theory is a treasure trove of extremely useful programming ideas. Haskell programmers have been tapping this resource for a long time, and the ideas are slowly percolating into other languages, but this process is too slow. We need to speed it up…

Composition is at the very root of category theory — it’s part of the definition of the category itself. And I will argue strongly that composition is the essence of programming. We’ve been composing things forever, long before some great engineer came up with the idea of a subroutine. Some time ago the principles of structured programming revolutionized programming because they made blocks of code composable. Then came object oriented programming, which is all about composing objects. Functional programming is not only about composing functions and algebraic data structures — it makes concurrency composable — something that’s virtually impossible with other programming paradigms.

https://bartoszmilewski.com/2014/10/28/category-theory-for-programmers-the-preface/

You can also view extended version of why it is good idea for programmer to learn Category Theory here https://www.youtube.com/watch?v=oScqdMBk8Q8

Course description

We will assume no background knowledge on behalf of the student, starting from scratch on both the programming and mathematics. (Flyer)

http://brendanfong.com/programmingcats.html — official page of the course.

Here you can find the syllabus.

And here http://brendanfong.com/programmingcats_files/cats4progs-DRAFT.pdf you can see the course notes.

  • This is course in applied math and programming. For example, the Set theory is used very informally (see link to other course below).
  • This course concentrated on the intuition. For example, you can see in the picture above description of the intuition about natural transformation.
  • This course doesn’t provide you with complicated math examples (there is no topological spaces, for example), it concentrated on simple math example or the ones that are important for the categorical theory itself (like Set, Poset).
  • The main example is Haskell programming language. You’re learning it alongside with categorical theory applying it’s ideas directly. This is sort of main example of categorical theory application in the course.
  1. There is many “non-precise”/”applied” usage of math (especially when dealing with Set Theory, in particular the Russel Paradox is ignored; Cat is category of small categories). You should be ready to ignore them (you can see https://www.youtube.com/watch?v=p54Hd7AmVFU and https://www.youtube.com/watch?v=O2lZkr-aAqk and https://www.youtube.com/watch?v=NcT7CGPICzo with more rigorous math, though it lacks proves, but the math. statements itself are correct).
  2. Bare in mind, the for categorical theory 2 “entities” are “the same” if they are isomorphic. In particular, if 2 “entities” are equal they considered to be “the same”. But, if 2 “entities” are isomorphic they’re considered “equal for all intended purposes”. There is also “weak isomorphism” as define by adjunctions, see below.
  • The dotted line in the diagram represent isomorphic object (it actually never explained why dotted line is used). For example, when we construct categorical product (you can think of pair in programming language or Cartesian product in Set) we’re constructing isomorphic pair of morphisms.
  • Preorder and Order are almost the same from categorical point of view. The difference between them that Order has addition antisemitic property: if a≤b and b≤a than a=b. But in Preorder if a≤b and b≤a than a is isomorphic to b (because composition leads to Id_a or Id_b). The “unintended” difference will be that in graph representation of Order there will be no “backwards” edged and the number of vertexes will be lower (2 vertexes with back-on-forth edges inPreorder are replaced with 1 vertex in Order).

3. Thing category is category whose homsets each contain at most one morphism (this definition is not provided in the course). It provides example of category where morphism is isomorpoish, but there is no “inverse” morphism for it (being ismoprhic is “strickly less” than to have “inverse” morphism; this is not true in Set with functions category). See details in the appendix below.

About programming:

  1. Knowledge of functional programming is never assumed, but you should better have some idea about currying beforehand (it is used before it explained in the course).
  2. Bottom type (⊥) existence is ignored (while is briefly mentioned). This make Haskel type not strictly equivalent to set in Set with functions category. For example, set analogue to () type (unit-type) is singleton set. It is said that in Haskell unit-type has single value (that is denoted as ()). This should be initial object in Hask with types and in Set with functions. But, technically ⊥ is also member of () type. So, technically, there is 2 value their. Actually, buttom is member of any type in Haskel, that is ignored. This leads, that function in Hask are actually partial(because of bottom). We’re reason about function in Hask as they where total (defined for every value in the domain). You can read about justification why “Fast and Loose Reasoning is Morally Correct” here http://www.cse.chalmers.se/~nad/publications/danielsson-et-al-popl2006.html
  3. There is implicit usage of Object Oriented concepts such as parametric polymorphism. You can think of Generic in Java or C++ template. There is very brief explanation in the lecture (it is, however, assumed that you know what is “regular” polymorphism is).
  4. It is assumed that you know what is pattern matching (see definition of length function in Haskell or see Scala example below). For example, latest Java 15 still has no pattern matching (there is only Pattern match for instanceof as preview language feature in Java 14, see my post here for the details).
  5. I don’t know how to can I emphasis this enough:

I don’t understand why this point is ignored also in Haskell tutorials, if I had know this before, it would make my you life easier.

Consider, simplified version from the standard library:

Based on
http://hackage.haskell.org/package/base-4.5.1.0/docs/src/GHC-Base.html

http://hackage.haskell.org/package/base-4.5.1.0/docs/src/Data-Maybe.html

Let’s look on fmap signature in Functor Mayber, it will be roughly

fmap::(a->b)->Maybe a-> Maybe b

or (using curing) and denoted f in place of a->b

fmap::(f, Maybe a)-> Maybe b

Note: Here "self” is Maybe a.

Let’s look on simplified implementation in Python:

Based on

https://github.com/dbrattli/OSlash/blob/f7283d92a379ca86eba2cf7109ef68a7bc27dc3c/oslash/typing/functor.py

https://github.com/dbrattli/OSlash/blob/f7283d92a379ca86eba2cf7109ef68a7bc27dc3c/oslash/maybe.py

Let’s look on fmap signature in Maybe(Functor),it is:

def fmap(self, f)

where self is Maybe, f is function.

Note:

  1. I have erased Python type hints for clarity. See Scala example below to recover the details.
  2. Just and Maybe should be really “sealed classes” . See, for example, alternative Scala implementation:

Based on https://github.com/hmemcpy/milewski-ctfp-pdf/releases/download/bca9cf5/category-theory-for-programmers-scala.pdf

Note:

  1. Plus sign in Maybe definition means, that Maybe is covariant, i.e. if X<:A (X is subtype of A, X “extends” A), than Maybe[X]<:Maybe[A]. You can read my separate post about variance. This means Maybe is covariant Functor in category theory.
  2. Here f is passed implicitly.
  3. Nothing in Scala is analogue to Void in Haskell (that is not void Java!).
  4. None is defined as singleton object. For every Maybe[B] it can be use as value.
  5. Case (pattern matching) is done on this that has type Maybe[A]. if it is singleton None type than None value is returned. If it is Just[A] type this type is deconstructed using primary constructor, x binds with Just.this.a value and used for computing f(x) later.

The JVM see the signature of the fmap method roughly as following :

public static scala.Function1<Maybe, Maybe> fmap(Maybe this, scala.Function1<java.lang.Object, java.lang.Object> f)

Note:

  1. First java.lang.Object is erased type from A, second java.lang.Object is erased type from B.
  2. First Maybe occurrence is erased type from Maybe[A], second Maybe occurrence is erased type from Maybe[B].

The point that I want to demonstrate here is that this reference is passed as first parameter, not the last as in Haskell.

There are couple of topics that I have to look on another places and than go back and watch again in order to understand on what’s going on.

6. Adjunctions
https://www.youtube.com/playlist?list=PL54B49729E5102248&pbj=1

There are 2 equivalent definitions for adjunctions and actually it is better to start to one with unit η and counit ε and not with one that are used in the MIT lecture.

If 2 categories has adjunctions they are “very similar” or are “weak isomoprhic”.

7. Recursion Schemes, F-algebra, catamorphism, Lambek’s Lemma, fix-point functor

  • For any constant functor exists a fix point
    You can think about regular f(x)=5 function, it is clear that f(5)=5, 5 is fix point. You can easily find it, start from any x_0 f(x_0)=5. Than use x_1=5 as new starting point: f(x_1)=f(5)=5. Note, that actually, what we did is f(f(x_0)=5 and further applying of f doesn’t change the result.
  • Consider function f(x)=x². It is clear that f(0)=0, that is 0 is fixed point (actually 1 is another fixed point). We can start from any point in (-1,1). Let’s take x=1/2.
    f(1/2)=1/4.
    f(1/4)=1/16

    We will get the following sequence
    {1/2, 1/4,1/16…1/2^n…| the n-number of applying of f}
    This process will converge at 0 (the limit of the sequence above is 0).
    (See Newton’s method, it is actually mentioned in the lecture).
  • At this point you may want to look Peeling the Banana: Recursion Schemes from First Principles — Zainab Ali https://www.youtube.com/watch?v=XZ9nPZbaYfE
    It is very unformall, but has working code (in Scala) and great following diagram that helps me for understanding on what is going on:
Image for post
Image for post

A — is carrier object (Int for example). We want evaluate our data-structure to an Int.

R — is recursive data-structure, “full-blown tree”.

FR — is data-structure as a functor, “the growing tree” that “approximate” R.

FA — is data-structure-as-a-functor that holds “lifted” answer of type “lifted” carrier object.

alg: function from data-structure-as-a-functor to the carrier type, aka evaluator or structure map function. The type is alg:: FA->A. For example, sum:: ExprF Int->Int.

in: isomorphism from data-structure-as-a-functor ( “the growing tree”) to recursive data-structure (“full-blown tree”). It is implemented first manually, and than using Fix constructor from Lambek’s Lemma (see details below, they are not provided in the video).

out: isomorphism from recursive data-structure (“full-blown tree”) to data-structure-as-a-functor ( “the growing tree”). It is implemented first manually, and than using unFix constructor from Lambek’s Lemma (see details below, they are not provided in the video).

cata: (shortcut from catamorpish) is generalization of List’s fold function, higher-order function that knows how to “walk on” recursive data-structure. It takes as parameters “the actions” that we want to apply. These “actions” has also accumulator as placeholder for previous result, that cata will supply.

fmap cata is “lifted” cata function.

cata can be construct as

alg ∘ fmap cata ∘ unFix=alg(fmap(cata(alg)unFix(r)) where r is recursive data-structure, tree for example.

Let’s walk through the formula of cata. r — is recursive data-structure, “full-blown tree”, first by invoking unFix we’re converting it to data-structure-as-a functor. Than we’re recursively (cata) applying alg to the “growing tree”, we’re “pushing down” (fmap) alginto our data-structur. When it hits “leaves” (ConstF term, for example), it doesn’t take any alg parameter, so this is base case of recursion. Applying alg on leaves “change it’s state” according to alg, than we are “moving up” (we’re exiting recursion calls) and “updating” data-structure state. When we finish the fmap cata, we’re in FA state, we have our data-structure-as-a-functor that holds the answer. Applying alg one more take will extract the answer from data-structure-as-a-functor to answer A.

Note: Now, I’m going back to the MIT lecture.

  • The process above is similar of the “growing tree” using ExprF functor, where in iteration n we can have trees up to the depth n. We’re using directed colimit in category theory and not regular limit, see Adámek’s theorem). Note, that in the lecture only the definition of direct limits of algebraic objects and not for an arbitrary category is provided.
    Using colimits help us to identify the “same” trees from different iteration. (for example, we have leaves in every iteration, we have trees with depth=1 on every but first iteration, etc).
  • Adámek’s theorem says that under some condition exists initial F-algebra (and it can be constructed as colimits of ω-chains). Lambek’s Lemma says, if functor F (“the tree”) has initial algebra α:F(X)→X then X is isomorphic to F(X) via α, in other words in initial algebra F(X) ≅ X or in other words initial algebra of functor F (“the tree”) is fixed point of the F (there is prove in the lecture that doesn’t rely on Adámek’s theorem).
    This means, that practically, it is enough to rely on Lambek’s Lemma to find the Fix ffixed point of the functor F (and to hope that there are no more fixed points…) to construct initial algebra.
    Haskell code is provided in the lecture. Isomorphism of Lambek’s Lemma is implemented by newtype (any newtype in Haskell defines isomorphism; the same effect can be achieved using pattern match with case class, see the Scala code from the video above).

Alternative course

You can also see more “pure math” and extended version of this lecture here (Seattle, Summer 2016):

Note: This course requires some knowledge of naïve Set theory (there is heavy usage of many properties without any explanation about why they holds), at the bare minimum Cartesian product, discriminated union of sets, relations (reflexivity, antysemmetricity, transitivity) all properties about function between sets, properties about function composition, including why this operation is associative and how idbehaves with it, properties about injective and subjective function and their connection with function composition (“cancellation” rules). Cantor–Bernstein theorem is not required, though. :-) (You don’t require to know this or this).

https://www.youtube.com/watch?v=I8LbkfSSR58&list=PLbgaMIhjbmEnaH_LTkxLI7FMa2HsnawM_

https://www.youtube.com/watch?v=3XTQSx1A3x8&list=PLbgaMIhjbmElia1eCEZNvsVscFef9m0dm

https://www.youtube.com/watch?v=F5uEpKwHqdk&list=PLbgaMIhjbmEn64WVX4B08B4h2rOtueWIL

You can also find Bartosz Milewski’s ‘Category Theory for Programmers’ unofficial PDF and LaTeX source here https://github.com/hmemcpy/milewski-ctfp-pdf/

Note:

Bottom type (⊥) existence is recognized and discussed in some details (without providing any proof). You can read about justification why “Fast and Loose Reasoning is Morally Correct” here http://www.cse.chalmers.se/~nad/publications/danielsson-et-al-popl2006.html

Correspondence between Thing category and Preorder is provided by observation that for every vertexes x,y hom(x, y) = {(x, y)} if x ≤ y (and otherwise the empty set)} and (y, z)∘ (x, y) = (x, z). Also note that associativity of ≤ as Preorder implies associativity of composition in thing category (and by contsraction vice versa, because if morphism exists, we have exactly one morphism that corresponds to ≤). So, Another way to look on this is to say that Preorder is enough to create category and there is no additional structure that is not imposed by being category in it. In this sense Preorder is “the essence” of ordering (compare to partial order and total order — both has additional requirements).

Note: In Thing category every morphism is epic, every morphism is mono (we have at most 1 morphism between 2 elements a, b). If we have partial order than every morphism f:a->b is isomoprphism, but there is no “inverse” morphism g:b->a (so, for every f there is no g such that f ∘ g = Id_b and g ∘ f = Id_a) (because there no “inverse” morphisms at all).

Note: In Set category function is injective and surjective iff exists “inverse” function, but this doesn’t hold in any category.

Senior Software Engineer at Pursway

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store