Java’s record and sealed classes as categorical product and sum types. Part II

Why they were added so late to Java

alex_ber
12 min read · Mar 30, 2024

Quick recap

Informally, the categorical product of two objects A and B is the most general object in the category that has projections onto A and B. In many categories (sets, groups, graphs, programming types…) the product of two objects is their Cartesian product. In C this is a struct, in C++ it is std::tuple, and in Java 14+ it is a record.

Quote from Wikipedia (informal definition of sum type or coproduct):

In computer science, a tagged union, also called… discriminated union, disjoint union, sum type or coproduct, is a data structure used to hold a value that could take on several different, but fixed, types. Only one of the types can be in use at any one time, and a tag field explicitly indicates which one is in use. It can be thought of as a type that has several “cases”, each of which should be handled correctly when that type is manipulated. This is critical in defining recursive datatypes, in which some component of a value may have the same type as the value itself, for example in defining a type for representing trees, where it is necessary to distinguish multi-node subtrees and leafs. Like ordinary unions, tagged unions can save storage by overlapping storage areas for each type, since only one is in use at a time.

https://en.wikipedia.org/wiki/Tagged_union

Java's sealed classes (second preview in Java 16, final in Java 17) are a “glorified” tagged union. I will explain why below.

See Part I for more details.

Categorical sum type or coproduct

Quick recap: another name for this is tagged union, discriminated union, or disjoint union.

Examples that we saw:

  • C tagUnion
union vals {
    char ch;
    int nt;
};
struct tagUnion {
    char tag;        /* says which member of val is in use */
    union vals val;
};
  • Java enum with one constant

Something like:

public enum NothingType {
Nothing
}

In Haskell it can be expressed as:

data NothingType = Nothing

  • Java enum with two constants

Something like:

public enum Bool {
True,
False
}

In Haskell it can be expressed as:

data Bool = True | False

This is essentially Java's primitive boolean.

  • Java enum type in general (without class body)

It can be viewed as a degenerate case of a categorical sum type: a discriminated union of unit types. (A unit type has exactly one value; it represents the absence of meaningful data, for example the result of a function that has side effects but produces nothing useful.)

This is essentially a restricted form of sum type. Enums represent a choice between a finite number of alternatives, but no alternative carries any additional data (unlike the more expressive sum types found in functional languages).

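For example, in Haskell:

data List a = Nil | Cons a (List a)

In Haskell, Nil and Cons are distinct constructors that carry different data (Nil carries nothing, Cons carries an element and the rest of the list), while in a Java enum all constants have the same type and the same shape.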

  • Java enum type with a class body

In this case the alternatives do “carry additional data”, but every enum constant still has the same type. We can't express List as we can in Haskell. It is still a restricted form of sum type, less restricted than in the previous point, but restricted nonetheless.
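A minimal sketch of this point. The Coin enum and its cents field are hypothetical, made up for illustration: each constant carries data, yet all constants share one type and one set of fields.

```java
// Hypothetical enum with a class body: every constant carries a value,
// but all constants have the same type (Coin) and the same fields.
enum Coin {
    PENNY(1), NICKEL(5), DIME(10);

    private final int cents;

    Coin(int cents) {
        this.cents = cents;
    }

    int cents() {
        return cents;
    }
}
```

Through the type Coin, every constant exposes exactly the same data (here, cents()), so one alternative cannot carry a differently shaped payload the way Cons does compared with Nil.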

  • Sealed class

Sealing gives classes and interfaces more control over their permitted subtypes. This is particularly useful for general domain modeling.

A class or interface may be declared sealed, which means that only a specific set of classes or interfaces may directly extend it:

package com.example.geometry;

public abstract sealed class Shape
permits Circle, Rectangle, Square { ... }

This declares a sealed class called Shape. The permits list means that only Circle, Rectangle, and Square may extend Shape. (In some cases, the compiler may be able to infer the permits clause for us.) Any other class that attempts to extend Shape will receive a compilation error (or a runtime error, if you try to cheat and generate an off-label classfile which declares Shape as a supertype).

We are already familiar with the notion of restricting extension through final classes; sealing can be thought of as a generalization of finality. Restricting the set of permitted subtypes may lead to two benefits: the author of a supertype can better reason about the possible implementations since they can control all the implementations, and the compiler can better reason about exhaustiveness (such as in switch statements or cast conversion).

It should be possible for a superclass, such as Shape above, to be widely accessible (since it represents an important abstraction for users) but not widely extensible (since its subclasses should be restricted to those known to the author). The author of such a superclass should be able to express that it is co-developed with a given set of subclasses, both to document intent for the reader and to allow enforcement by the Java compiler.

The permits clause allows a sealed class, such as the Shape above, to be accessible-for-invocation by code in any module, but accessible-for-implementation by code in only the same module as the sealed class (or same package if in the unnamed module). This makes the type system more expressive than the access-control system. With access control alone, if Shape is accessible-for-invocation by code in any module (because its package is exported), then Shape is also accessible-for-implementation in any module; and if Shape is not accessible-for-implementation in any other module, then Shape is also not accessible-for-invocation in any other module.

Sealed classes (cont).

A sealed class or interface can be extended or implemented only by those classes and interfaces permitted to do so.

A class is sealed by applying the sealed modifier to its declaration. Then, after any extends and implements clauses, the permits clause specifies the classes that are permitted to extend the sealed class. See example above.

The classes specified by permits must be located near the superclass: either in the same module (if the superclass is in a named module) or in the same package (if the superclass is in the unnamed module). For example, in the following declaration of Shape, the permitted subclasses are all located in the same package as Shape, which satisfies the rule in either case:

package com.example.geometry;

public abstract sealed class Shape
permits Circle,
Rectangle,
Square { ... }

When the permitted subclasses are small in size and number, it may be convenient to declare them in the same source file as the sealed class. When they are declared in this way, the sealed class may omit the permits clause and the Java compiler will infer the permitted subclasses from the declarations in the source file. (The subclasses may be nested). For example, if the following code is found in Root.java then the sealed class Root is inferred to have three permitted subclasses:

abstract sealed class Root { ... 
final class A extends Root { ... }
final class B extends Root { ... }
final class C extends Root { ... }
}

Note: Classes specified by permits must have a canonical name, otherwise a compile-time error is reported. This means that anonymous classes and local classes cannot be permitted subtypes of a sealed class.

A sealed class imposes three constraints on its permitted subclasses:

  1. The sealed class and its permitted subclasses must belong to the same package, if declared in an unnamed module, or to the same module otherwise.
  2. Every permitted subclass must directly extend the sealed class.
  3. Every permitted subclass must use a modifier to describe how it propagates the sealing initiated by its superclass:
  • A permitted subclass may be declared final to prevent its part of the class hierarchy from being extended further. (Record classes, which we discuss below, are implicitly final.)
  • A permitted subclass may be declared sealed to allow its part of the hierarchy to be extended further than envisaged by its sealed superclass, but in a restricted fashion.
  • A permitted subclass may be declared non-sealed so that its part of the hierarchy reverts to being open for extension by unknown subclasses. A sealed class cannot prevent its permitted subclasses from doing this.

As an example of the third constraint, Circle and Square may be final while Rectangle is sealed and we add a new subclass, WeirdShape, that is non-sealed:

package com.example.geometry;

public abstract sealed class Shape
permits Circle, Rectangle, Square, WeirdShape { ... }

public final class Circle extends Shape { ... }

public sealed class Rectangle extends Shape
permits TransparentRectangle, FilledRectangle { ... }
public final class TransparentRectangle extends Rectangle { ... }
public final class FilledRectangle extends Rectangle { ... }

public final class Square extends Shape { ... }

public non-sealed class WeirdShape extends Shape { ... }

Note:

1. Even though the WeirdShape is open to extension by unknown classes, all instances of those subclasses are also instances of WeirdShape. Therefore code written to test whether an instance of Shape is either a Circle, a Rectangle, a Square, or a WeirdShape remains exhaustive.

2. The final modifier can be considered a special case of sealing, where extension/implementation is prohibited completely. That is, final is conceptually equivalent to sealed plus a permits clause which specifies nothing, though such a permits clause cannot be written.

3. Exactly one of the modifiers final, sealed, and non-sealed must be used.

4. A sealed or non-sealed class can be abstract and have abstract members. A sealed class may permit subclasses that are abstract, provided they are declared sealed or non-sealed rather than final.

5. A subclass may be less accessible than the sealed class. This means that, with pattern matching for switch (final since Java 21), some code will not be able to switch exhaustively over the subclasses unless a default clause (or other total pattern) is used.
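The exhaustiveness argument from note 1 can be sketched as follows. This assumes Java 21 or later, where pattern matching for switch is final. The hierarchy is trimmed to keep the sketch short, and the fields, the Shapes helper, and the area method are hypothetical additions.

```java
// Trimmed, hypothetical variant of the Shape hierarchy (fields are assumptions).
abstract sealed class Shape permits Circle, Square, WeirdShape {}

final class Circle extends Shape {
    final double radius;

    Circle(double radius) {
        this.radius = radius;
    }
}

final class Square extends Shape {
    final double side;

    Square(double side) {
        this.side = side;
    }
}

// Open for extension, but every instance of a subclass is still a WeirdShape.
non-sealed class WeirdShape extends Shape {}

final class Shapes {
    private Shapes() {}

    // No default branch: the compiler checks that every permitted
    // subclass of the sealed class is covered.
    static double area(Shape shape) {
        return switch (shape) {
            case Circle c -> Math.PI * c.radius * c.radius;
            case Square s -> s.side * s.side;
            case WeirdShape w -> Double.NaN; // area is unknown in this sketch
        };
    }
}
```

If a new permitted subclass is later added to Shape, the switch in area stops compiling until a matching case is added, which is the practical payoff of sealing.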

Sealed interfaces

As with classes, an interface can be sealed by applying the sealed modifier to the interface. After any extends clause that specifies superinterfaces, the implementing classes and subinterfaces are specified with a permits clause. For example, a planetary hierarchy can be declared as follows:

sealed interface Celestial 
permits Planet, Star, Comet { ... }

final class Planet implements Celestial { ... }
final class Star implements Celestial { ... }
final class Comet implements Celestial { ... }
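Returning to the Haskell List from earlier, a sealed interface whose implementations are records (covered in the next section) can express it in Java. This is a minimal sketch, assuming Java 21 or later for the pattern-matching switch; the names ConsList, Nil, Cons, ConsLists, and length are hypothetical.

```java
// Hypothetical Java counterpart of: data List a = Nil | Cons a (List a)
sealed interface ConsList<T> permits Nil, Cons {}

// The empty list carries no data.
record Nil<T>() implements ConsList<T> {}

// A non-empty list carries a head element and the rest of the list.
record Cons<T>(T head, ConsList<T> tail) implements ConsList<T> {}

final class ConsLists {
    private ConsLists() {}

    // Exhaustive without a default branch because ConsList is sealed.
    static <T> int length(ConsList<T> list) {
        return switch (list) {
            case Nil<T> nil -> 0;
            case Cons<T> cons -> 1 + length(cons.tail());
        };
    }
}
```

Unlike enum constants, Nil and Cons here are separate types with different components, which is what the recursive definition needs.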

Records

Records were introduced as a preview feature in Java 14 and finalized in Java 16. They provide a nice, concise syntax for declaring classes that are meant to be dumb data holders.

You can read here (Records (preview language feature) section) for more details. Also, in Explaining invokedynamic. Dynamical hashCode implementation. Part V, I walked through most of the implementation with explanations.

Here’s a simple record example:

public record Color(String name, int code) {}

Given this simple one-liner, the Java compiler generates appropriate implementations of the accessor methods, toString, equals, and hashCode.
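A short sketch of what the generated members give us. The ColorDemo class and the sample values are made up for illustration.

```java
record Color(String name, int code) {}

final class ColorDemo {
    public static void main(String[] args) {
        Color red = new Color("red", 0xFF0000);
        Color alsoRed = new Color("red", 0xFF0000);

        // Accessors are named after the components (no "get" prefix).
        System.out.println(red.name());
        System.out.println(red.code());

        // equals and hashCode are derived from the components,
        // so two separately created instances compare equal.
        System.out.println(red.equals(alsoRed));
        System.out.println(red.hashCode() == alsoRed.hashCode());

        // toString includes the record name and its component values.
        System.out.println(red);
    }
}
```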

In C, the array was the historical predecessor of the struct. Every element of a C array is accessed by index, and all elements have the same type. Struct members are accessed by name and may have different types. A record can be viewed as a back-port of the C struct (once more, I want to emphasize that the implementation of records in Java is totally different).

Side-note:

1. Records, like product types, are immutable (shallowly: their components cannot be reassigned), while a C struct or an ordinary Java class is generally mutable.

2. A Java class can also be used to model a C struct, but it has many features that the struct lacks, such as the ability to hold methods, inheritance of methods and variables (including variable shadowing), polymorphism, data-access control, etc.

Records offer us a bargain: what they ask us to give up is the ability to decouple the API from the representation, which in turn allows the language to derive both the API and implementation for construction, state access, equality comparison, and representation mechanically from the state description.

Binding the API to the representation may seem to conflict with a fundamental object-oriented principle: encapsulation. While encapsulation is an essential technique for managing complexity, and most of the time it is the right choice, sometimes our abstractions are so simple — such as an x-y point — that the costs of encapsulation exceed the benefits. Some of these costs are obvious — such as the boilerplate required to author even a simple domain class. But there is also another, less obvious cost: that relationships between API elements are not captured by the language, but instead only by convention. This undermines the ability to mechanically reason about abstractions, which in turn leads to even more boilerplate coding.

Side-note: JEP 395 (JDK 16) relaxes the longstanding restriction whereby an inner class cannot declare a member that is explicitly or implicitly static. This is now legal and, in particular, allows an inner class to declare a member that is a record class.

Also, nested enum classes and nested interfaces are already implicitly static, so for consistency we define local enum classes and local interfaces, which are also implicitly static.

This actually changes behavior introduced in JDK 1.1, but in a backward-compatible way: code that the compiler previously rejected is now legal, while code that was already accepted is unaffected.
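A minimal sketch of the relaxed rule, assuming JDK 16 or later. The names Outer, Inner, Point, created, and origin are hypothetical.

```java
class Outer {
    // Inner is an inner (non-static) class.
    class Inner {
        // Legal since JDK 16: an inner class may declare static members,
        // including a member record class (which is implicitly static).
        record Point(int x, int y) {}

        static int created = 0;

        Point origin() {
            created++;
            return new Point(0, 0);
        }
    }
}
```

Compiling the same source for a release before 16 is rejected by the compiler, which is why the change cannot break code that already compiled.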

Records come with some restrictions.

  1. Record classes are implicitly final.
  2. A record class cannot extend any other class (its superclass is always java.lang.Record).
  3. Their instance fields (which correspond to the components declared in the record header) are implicitly final.
  4. They cannot have any other instance fields.
  5. If the canonical constructor is explicitly declared then its access modifier must provide at least as much access as the record class.
  6. If the canonical constructor is implicitly declared then its access modifier is the same as the record class.
  7. A record class, and the components in its header, may be decorated with annotations. Any annotations on the record components are propagated to the automatically derived fields, methods, and constructor parameters, according to the set of applicable targets for the annotation. Type annotations on the types of record components are also propagated to the corresponding type uses in the automatically derived members.
  8. Instances of record classes can be serialized and deserialized. However, the process cannot be customized by providing writeObject, readObject, readObjectNoData, writeExternal, or readExternal methods. The components of a record class govern serialization, while the canonical constructor of a record class governs deserialization.
  9. Record classes can be declared locally, akin to the existing construct of local classes.
List<Merchant> findTopMerchants(List<Merchant> merchants, int month) {
    // Local record
    record MerchantSales(Merchant merchant, double sales) {}

    return merchants.stream()
        .map(merchant -> new MerchantSales(merchant, computeSales(merchant, month)))
        .sorted((m1, m2) -> Double.compare(m2.sales(), m1.sales()))
        .map(MerchantSales::merchant)
        .collect(toList());
}

Local record classes are a particular case of nested record classes. Like nested record classes, local record classes are implicitly static. This means that their own methods cannot access any variables of the enclosing method; in turn, this avoids capturing an immediately enclosing instance which would silently add state to the record class. The fact that local record classes are implicitly static is in contrast to local classes, which are not implicitly static. In fact, local classes are never static — implicitly or explicitly — and can always access variables in the enclosing method.

The declaration of a record can “override” the implicit constructor and method declarations should these prove unsuitable.

One example of where a record might want to refine the implementation of the constructor is to validate the state in the constructor. For example, in a Range class, we would want to check that the low end of the range is no higher than the high end:

public record Range(int lo, int hi) {
    public Range(int lo, int hi) {
        if (lo > hi)
            throw new IllegalArgumentException(String.format("%d, %d", lo, hi));
        this.lo = lo;
        this.hi = hi;
    }
}

However, records permit a special compact form for explicitly declaring the canonical constructor. In this form, the argument list can be omitted in its entirety (it is assumed to be the same as the state description), and the constructor arguments are implicitly committed to the fields of the record at the end of the constructor. The constructor parameters themselves are mutable. The following is the compact version of the above record declaration:

public record Range(int lo, int hi) {
    public Range {
        if (lo > hi)
            throw new IllegalArgumentException(String.format("%d, %d", lo, hi));
    }
}
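Because the parameters of a compact constructor are mutable, it can also normalize them before they are committed to the fields. A minimal sketch: the Range record repeats the compact form from the text, while NormalizedRange and its swap-instead-of-throw policy are assumptions chosen purely for illustration.

```java
// Compact canonical constructor: validates the parameters, then the
// fields are assigned from them automatically at the end of the body.
record Range(int lo, int hi) {
    Range {
        if (lo > hi)
            throw new IllegalArgumentException(String.format("%d, %d", lo, hi));
    }
}

// Hypothetical variant: instead of rejecting a reversed range, reassign
// the parameters so that lo <= hi before the fields are assigned.
record NormalizedRange(int lo, int hi) {
    NormalizedRange {
        if (lo > hi) {
            int tmp = lo;
            lo = hi;
            hi = tmp;
        }
    }
}
```

With these declarations, new Range(5, 1) throws IllegalArgumentException, while new NormalizedRange(5, 1) yields a record whose lo() is 1 and hi() is 5.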

Records for compound map keys as example

Note: It goes without saying that such a compound key should have correct hashCode() and equals() implementations, and it is better to have a readable toString() for debugging as well. Records supply us with these methods for free.

Sometimes we want a Map keyed on the conjunction of two distinct values, such as representing the last time a given user used a certain feature.

record PersonAndFeature(Person p, Feature f) { }
Map<PersonAndFeature, LocalDateTime> lastUsed = new HashMap<>();

Before records, in such cases I used (or abused) AbstractMap.SimpleEntry.

Let’s take a look at its source code (I’ve removed the JavaDoc for clarity):


static final boolean equals(Object o1, Object o2){
return o1 == o2 || (o1 != null && o1.equals(o2));
}

static final int hashCode(Object o){
return o == null ? 0 : o.hashCode();
}

public static class SimpleEntry<K, V> implements Entry<K, V>, Serializable {

private static final long serialVersionUID = -8499721149061103585L;

K key;

V value;

public SimpleEntry(K newKey, V newValue){
key = newKey;
value = newValue;
}

public SimpleEntry(Entry<? extends K, ? extends V> entry){
this(entry.getKey(), entry.getValue());
}

public boolean equals(Object o){
if (! (o instanceof Map.Entry))
return false;
// Optimize for our own entries.
if (o instanceof SimpleEntry){
SimpleEntry e = (SimpleEntry) o;
return (AbstractMap.equals(key, e.key)
&& AbstractMap.equals(value, e.value));
}
Map.Entry e = (Map.Entry) o;
return (AbstractMap.equals(key, e.getKey())
&& AbstractMap.equals(value, e.getValue()));
}

public K getKey(){
return key;
}

public V getValue(){
return value;
}

public int hashCode(){
return (AbstractMap.hashCode(key) ^ AbstractMap.hashCode(value));
}

public V setValue(V newVal){
V r = value;
value = newVal;
return r;
}

public String toString(){
return key + "=" + value;
}
}

If we wanted to declare our own PersonAndFeature type without records, we would have to repeat all of this boilerplate. We’re lazy, so using SimpleEntry seemed the lesser evil.

The worst approach is to concatenate the name of the Person with the name of the Feature into a single String key. This results in harder-to-read, more error-prone code.

See Part III for the continuation.
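For comparison, the record-based key from the start of this section can be sketched as follows. The Person and Feature records here are hypothetical stand-ins with made-up components, and LastUsedDemo exists only for the demonstration.

```java
import java.time.LocalDateTime;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-ins for the Person and Feature types.
record Person(String name) {}
record Feature(String id) {}

// The compound key: equals, hashCode and toString come for free.
record PersonAndFeature(Person p, Feature f) {}

final class LastUsedDemo {
    public static void main(String[] args) {
        Map<PersonAndFeature, LocalDateTime> lastUsed = new HashMap<>();
        LocalDateTime when = LocalDateTime.of(2024, 3, 30, 12, 0);

        lastUsed.put(new PersonAndFeature(new Person("alice"), new Feature("export")), when);

        // A freshly created but equal key finds the entry, because
        // equals and hashCode are derived from the components.
        PersonAndFeature sameKey =
                new PersonAndFeature(new Person("alice"), new Feature("export"));
        System.out.println(lastUsed.get(sameKey));

        // A key with a different person does not match.
        PersonAndFeature otherKey =
                new PersonAndFeature(new Person("bob"), new Feature("export"));
        System.out.println(lastUsed.containsKey(otherKey));
    }
}
```

Unlike the concatenated String key, the two parts stay typed and cannot be confused with each other.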
