Java’s record and sealed classes as categorical product and sum types. Part I

Why they were added so late to Java

alex_ber
Mar 26, 2024

Introduction

In C, arrays let you define variables that hold several data items of the same kind. Similarly, a structure (or struct) is another user-defined data type available in C that combines data items of different kinds.

In JDK 1.0, struct was deliberately left out of the language. It was thought that arrays and classes were enough. Using arrays for this purpose quickly became impractical. Class usage evolved into the JavaBean convention, and IDE support for generating JavaBeans was developed. Project Lombok (and other alternatives) was introduced, but such tools were fragile: they didn’t play well with other libraries and tended to break on major Java releases. A couple of utility methods were added to java.util.Arrays and java.util.Objects in Java 5, 7 and 9 to ease the implementation of hashCode() and equals() in JavaBeans. Records arrived as a preview feature in Java 14 and were released as a standard feature in Java 16.

Likewise, union was not brought into the language. In C, a union is a special data type that allows storing different data types in the same memory location. You can define a union with many members, but only one member can contain a value at any given time. Unions provide an efficient way of using the same memory location for multiple purposes.

With a C union and struct one can define a tagged union, or discriminated union. This is a union that has associated with it a piece of data (the tag) that tracks which of the potential union members is currently set.

Let’s see an example:

In this example we have a simple union that holds a char and an int. This union is wrapped in a struct that has a char variable and an instance of the union. We use a single letter char to represent each type, either i or c. In main we initialize the tagged union and then loop. In each iteration we check which property is set, log the value, flip the tag and set the associated property. The output from running it is:

21
A
1
A
3

https://medium.com/@almtechhub/c-c-tagged-discriminated-union-ecd5907610bf

Note: See my Categorical Sum and Product types using struct tagged union in C https://alex-ber.medium.com/implementation-of-categorical-sum-and-product-types-using-struct-tagged-union-in-c-c01cdbb13793 if you want a deep dive into C examples.

The easiest way to model a struct in Java was a class with only public data members. The main drawback of this approach is the lack of encapsulation.

Note: In C it is perfectly fine to have a function that updates the value and uses it to change the temperature in both Fahrenheit and Celsius. C doesn’t have “private” functions and doesn’t rely on encapsulation. In Java we don’t have top-level functions, so we have to put it into some class. We can put it into another class and pass our “struct” as a parameter (essentially, this is an extension function in Kotlin). But this increases the complexity of the code and makes it harder for the reader to understand what’s going on. Here, Java’s encapsulation-by-default just complicates things without any benefit. This is exactly the point that led to the introduction of records in Java 14. See Explaining invokedynamic. Records for a short introduction, Keeping Pace With What’s New in Java 14 for example usages, and Explaining invokedynamic. Dynamical hashCode implementation if you’re interested in implementation details. We will also discuss records and sealed classes in part II.
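To make the contrast concrete, here is a minimal sketch of both styles. The names PointStruct, Point and StructVsRecordDemo are hypothetical and chosen only for illustration; the record form assumes Java 16 or later.

```java
import java.util.Objects;

// Struct-like class: public mutable fields, no encapsulation.
// equals() and hashCode() have to be written by hand
// (here with the java.util.Objects helper).
final class PointStruct {
    public int x;
    public int y;

    PointStruct(int x, int y) {
        this.x = x;
        this.y = y;
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof PointStruct)) return false;
        PointStruct other = (PointStruct) o;
        return x == other.x && y == other.y;
    }

    @Override
    public int hashCode() {
        return Objects.hash(x, y);
    }
}

// Record: an immutable data carrier. The compiler derives the constructor,
// the accessors x() and y(), equals(), hashCode() and toString().
record Point(int x, int y) { }

class StructVsRecordDemo {
    public static void main(String[] args) {
        PointStruct s = new PointStruct(1, 2);
        s.x = 10; // any code can change the field directly

        Point p = new Point(1, 2);
        System.out.println(p.x() + ", " + p.y());
        System.out.println(p.equals(new Point(1, 2)));
        System.out.println(p);
    }
}
```

The record declaration is a single line, yet it covers everything the hand-written class needs some twenty lines for.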

Quote from Wikipedia (informal definition of sum type or coproduct):

In computer science, a tagged union, also called… discriminated union, disjoint union, sum type or coproduct, is a data structure used to hold a value that could take on several different, but fixed, types. Only one of the types can be in use at any one time, and a tag field explicitly indicates which one is in use. It can be thought of as a type that has several “cases”, each of which should be handled correctly when that type is manipulated. This is critical in defining recursive datatypes, in which some component of a value may have the same type as the value itself, for example in defining a type for representing trees, where it is necessary to distinguish multi-node subtrees and leafs. Like ordinary unions, tagged unions can save storage by overlapping storage areas for each type, since only one is in use at a time.

https://en.wikipedia.org/wiki/Tagged_union

A C struct can be viewed as a categorical product type.

The categorical product of two objects A and B is, informally, the most general object in the category that has projections onto A and B. In many categories (sets, groups, graphs, programming types…) the product of objects is their Cartesian product. In C it is a struct, in C++ a std::tuple. In Java 14+, it is a record.
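A minimal Java sketch of this idea, assuming Java 16 or later. Pair, ProductTypeDemo and pairing are hypothetical names introduced for this example: the accessors first() and second() play the role of the two projections, and pairing builds the map into the product from any two functions with a common source.

```java
import java.util.function.Function;

// A product of A and B: one value of each, with projections first() and second().
record Pair<A, B>(A first, B second) { }

class ProductTypeDemo {
    // Given f : C -> A and g : C -> B, build the map C -> Pair<A, B>
    // whose projections give back f and g.
    static <C, A, B> Function<C, Pair<A, B>> pairing(Function<C, A> f, Function<C, B> g) {
        return c -> new Pair<>(f.apply(c), g.apply(c));
    }

    public static void main(String[] args) {
        Function<String, Pair<Integer, String>> h =
                pairing(s -> s.length(), s -> s.toUpperCase());
        Pair<Integer, String> p = h.apply("java");
        System.out.println(p.first());  // projection onto A
        System.out.println(p.second()); // projection onto B
    }
}
```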

Under inheritance, Java dispatches a call to the right method at run time; this can be viewed as a restricted form of virtual functions.

Virtual functions provide an alternative to discriminated unions.

Quote from Wikipedia:

In a typical class hierarchy in object-oriented programming, each subclass can encapsulate data unique to that class. The metadata used to perform virtual method lookup identifies the subclass and so effectively acts as a tag identifying the particular data stored by the instance.

Nevertheless, a class hierarchy involves true subtype polymorphism; it can be extended by creating further subclasses of the same base type, which could not be handled correctly under a tag/dispatch model. Hence, it is usually not possible to do case analysis or dispatch on a subobject’s ‘tag’ as one would for tagged unions. Some languages such as Scala allow base classes to be “sealed”, and unify tagged unions with sealed base classes.

https://en.wikipedia.org/wiki/Tagged_union#Class_hierarchies_as_tagged_unions

At this point you should realize that Java’s sealed classes (a second-preview feature in Java 16, standard since Java 17) are a “glorified” tagged union. I will explain why in part II.
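As a small preview, the idea can be sketched like this, assuming Java 17 or later. Shape, Circle, Square and SealedDemo are hypothetical names used only for this illustration; the runtime class of the value plays the role of the tag.

```java
// A closed set of alternatives: only Circle and Square may implement Shape.
sealed interface Shape permits Circle, Square { }

record Circle(double radius) implements Shape { }
record Square(double side) implements Shape { }

class SealedDemo {
    // Case analysis on the "tag", which here is the runtime class of the value.
    static double area(Shape shape) {
        if (shape instanceof Circle c) {
            return Math.PI * c.radius() * c.radius();
        } else if (shape instanceof Square s) {
            return s.side() * s.side();
        }
        // Not reachable while every permitted subtype is handled above.
        throw new AssertionError("Unknown shape: " + shape);
    }

    public static void main(String[] args) {
        System.out.println(area(new Square(2.0)));
        System.out.println(area(new Circle(1.0)));
    }
}
```

Unlike an open class hierarchy, nobody can add a third Shape from outside, so the case analysis stays complete. From Java 21 on, a pattern-matching switch over a sealed type lets the compiler check that completeness.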

Another quote from Wikipedia:

The main disadvantage of tagged unions is that the tag occupies space. Since there are usually a small number of alternatives, the tag can often be squeezed into 2 or 3 bits wherever space can be found, but sometimes even these bits are not available. In this case, a helpful alternative may be …encoded tags, where the tag value is dynamically computed from the contents of the union field. Common examples of this are the use of reserved values, where, for example, a function returning a positive number may return -1 to indicate failure, and sentinel values, most often used in tagged pointers.

https://en.wikipedia.org/wiki/Tagged_union#Advantages_and_disadvantages

So, from this point of view, Java 8’s java.util.Optional is also a sort of “glorified” tagged union. It represents a discriminated union of a non-null value with null (which represents the absence of a value).
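A short sketch of Optional used as a two-case union, runnable on Java 8 or later. The lookup findNickname, its data and the class OptionalDemo are made up for this example.

```java
import java.util.Optional;

class OptionalDemo {
    // Hypothetical lookup: returns an empty Optional instead of null
    // when nothing is found.
    static Optional<String> findNickname(String user) {
        return "ann".equals(user) ? Optional.of("annie") : Optional.empty();
    }

    // Handles both cases of the union: a present value and the absence of one.
    static String describe(Optional<String> nickname) {
        return nickname.map(n -> "nickname: " + n).orElse("no nickname");
    }

    public static void main(String[] args) {
        System.out.println(describe(findNickname("ann")));
        System.out.println(describe(findNickname("bob")));
    }
}
```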

Side note: Is java.util.Optional a monad? The short answer is yes (almost), but this is an unintended result. See https://alex-ber.medium.com/does-java-util-optional-is-monad-492911fb66ee for more details.

Java’s boolean is also a sort of “glorified” tagged union. It represents a discriminated union of the reserved value true and the reserved value false.

As for java.lang.Enum, added in Java 5.0, we will get back to it below.

Enum as tagged Union

Java’s enum type can be viewed as a degenerate case of a categorical sum type: a discriminated union of unit types. In simpler words:

Quote from Wikipedia:

An enumerated type [Java’s enum] can be seen as a degenerate case: a tagged union of unit types. It corresponds to a set of nullary constructors and may be implemented as a simple tag variable, since it holds no additional data besides the value of the tag.

https://en.wikipedia.org/wiki/Tagged_union#Description

If we had a language such as Haskell or ML, where the tagged union is a first-class citizen, we could implement enum using it.

Let’s first look at two extreme cases of such a Java enum implementation: a Java enum with one constant and a Java enum with two constants.

  1. A Java enum with one constant is actually a stateless Singleton.

Something like:

public enum NothingType {
    Nothing
}

In Haskell it can be expressed as:

data NothingType = Nothing

This type is also equivalent to the unit type ().

Side note: In Java we can have not only a stateless Singleton but also a stateful one. It may surprise you, but up to Java 5.0 many singleton implementations had a bug. It was so widespread that the Java Memory Model was fixed to make the buggy code work. C++ had the same problem, which was fixed in C++11. For more details, see https://alex-ber.medium.com/java-5-2005-c-11-c11-memory-model-appears-that-have-been-invented-by-the-same-people-efa2cb6576e2

  2. A Java enum with two constants

Something like:

public enum Bool {
    True,
    False
}

In Haskell it can be expressed as:

data Bool = True | False

This is essentially Java’s primitive boolean.

3. java.lang.Enum

This section is a quote from my story Java Typesafe enum history
https://alex-ber.medium.com/java-typesafe-enum-history-20053e8d0ea6

Let’s look at a code example:

https://docs.oracle.com/javase/8/docs/technotes/guides/language/enums.html

PLUS { double eval(double x, double y) { return x + y; } },

is an example of an enum constant, namely PLUS, that has a class body (see above).

Conceptually, it is equivalent to the following code:

https://docs.oracle.com/javase/8/docs/technotes/guides/language/enums.html
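Both styles can be sketched side by side, modeled on the Operation example in the linked Oracle guide. The names OperationWithBodies, OperationWithSwitch and OperationDemo are made up here so that both variants fit in one file.

```java
// Style 1: every constant has a class body that overrides the abstract method.
enum OperationWithBodies {
    PLUS   { double eval(double x, double y) { return x + y; } },
    MINUS  { double eval(double x, double y) { return x - y; } },
    TIMES  { double eval(double x, double y) { return x * y; } },
    DIVIDE { double eval(double x, double y) { return x / y; } };

    // Do the arithmetic operation represented by this constant.
    abstract double eval(double x, double y);
}

// Style 2: plain constants and a single method that switches on this.
enum OperationWithSwitch {
    PLUS, MINUS, TIMES, DIVIDE;

    double eval(double x, double y) {
        switch (this) {
            case PLUS:   return x + y;
            case MINUS:  return x - y;
            case TIMES:  return x * y;
            case DIVIDE: return x / y;
        }
        // Required for compilation; reached only if a constant is added
        // without a matching case.
        throw new AssertionError("Unknown op: " + this);
    }
}

class OperationDemo {
    public static void main(String[] args) {
        for (OperationWithBodies op : OperationWithBodies.values()) {
            OperationWithSwitch same = OperationWithSwitch.valueOf(op.name());
            System.out.println(op + ": " + op.eval(6, 3) + " / " + same.eval(6, 3));
        }
    }
}
```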

Note:

  • You can think of it as overriding some method for every constant. Another way to think about it is that this “extended enum” has a single method with an internal dispatch mechanism, as described above. If this resembles sealed class usage (disjoint union), that is correct (we will go into it in part II).
  • Quote from the link above:

This works fine, but it will not compile without the throw statement, which is not terribly pretty. Worse, you must remember to add a new case to the switch statement each time you add a new constant to Operation. If you forget, the eval method with fail, executing the aforementioned throw statement.

See https://alex-ber.medium.com/java-exception-hierarchy-f6aef08ab9b about why AssertionError is thrown.

  • It is also interesting that when Enum was added in JDK 5.0, the special construct of an enum constant with a class body was added in order to avoid “switch on enum”. If you look at it closely, switching on an enum is an example of pattern matching. Funnily enough, JDK 14 introduces a limited form of pattern matching, so it is now essentially admitted that using switch is the preferred way, and that the whole idea of enum constants with class bodies was a mistake.
  • To reiterate: when JDK 5.0 was released, using an enum constant with a class body was preferred over switching on the enum (which is a form of pattern matching). From JDK 14 on, Java is moving to support sum and product types (concepts from functional programming; see below and part II). Here I only want to mention that the enum constant with a class body has been abandoned. Again, this shows that using the inheritance mechanism to implement a disjoint union is the wrong way to go.

Enum constant that has a class body

I also want to mention that an enum constant with a class body is a generalization of the concept that an enum can have data and behavior attached to it.

For example consider the planets of the solar system. Each planet knows its mass and radius, and can calculate its surface gravity and the weight of an object on the planet. Here is how it looks:

https://docs.oracle.com/javase/8/docs/technotes/guides/language/enums.html

The enum type Planet contains a constructor, and each enum constant is declared with parameters to be passed to the constructor when it is created.
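The enum described above can be sketched as follows, modeled on the Planet example in the linked Oracle guide. Falling back to an Earth weight of 175 when no argument is given is an addition of this sketch, so that it runs without command-line arguments.

```java
enum Planet {
    MERCURY (3.303e+23, 2.4397e6),
    VENUS   (4.869e+24, 6.0518e6),
    EARTH   (5.976e+24, 6.37814e6),
    MARS    (6.421e+23, 3.3972e6),
    JUPITER (1.9e+27,   7.1492e7),
    SATURN  (5.688e+26, 6.0268e7),
    URANUS  (8.686e+25, 2.5559e7),
    NEPTUNE (1.024e+26, 2.4746e7),
    PLUTO   (1.27e+22,  1.137e6);

    private final double mass;   // in kilograms
    private final double radius; // in meters

    // Each constant above passes its arguments to this constructor.
    Planet(double mass, double radius) {
        this.mass = mass;
        this.radius = radius;
    }

    public double mass()   { return mass; }
    public double radius() { return radius; }

    // Universal gravitational constant (m^3 kg^-1 s^-2)
    public static final double G = 6.67300E-11;

    public double surfaceGravity() {
        return G * mass / (radius * radius);
    }

    public double surfaceWeight(double otherMass) {
        return otherMass * surfaceGravity();
    }

    public static void main(String[] args) {
        // Assumption of this sketch: default to 175 if no weight is passed in.
        double earthWeight = args.length > 0 ? Double.parseDouble(args[0]) : 175.0;
        double mass = earthWeight / EARTH.surfaceGravity();
        for (Planet p : Planet.values()) {
            System.out.printf("Your weight on %s is %f%n", p, p.surfaceWeight(mass));
        }
    }
}
```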

https://alex-ber.medium.com/java-typesafe-enum-history-20053e8d0ea6

Here is a sample program that takes your weight on earth (in any unit) and calculates and prints your weight on all of the planets (in the same unit):

$ java Planet 175
Your weight on MERCURY is 66.107583
Your weight on VENUS is 158.374842
Your weight on EARTH is 175.000000
Your weight on MARS is 66.279007
Your weight on JUPITER is 442.847567
Your weight on SATURN is 186.552719
Your weight on URANUS is 158.397260
Your weight on NEPTUNE is 199.207413
Your weight on PLUTO is 11.703031

https://docs.oracle.com/javase/8/docs/technotes/guides/language/enums.html

For more see part II.
