Algebraic Data Types In Scala

Object-Oriented Meets Functional
Scala’s punchline, https://scala-lang.org

Scala claims to unite object-oriented and functional programming, and offers a rich set features from both worlds. Developers coming from object-oriented languages—Java in particular—quickly adopt the object-oriented features of Scala but often struggle to find their way around functional programming. Some aspects of functional programming found their way into object-oriented languages: Higher order functions or combinators like `map` and `filter` appear in today’s C# or Java code, and even a preference for immutable data structures spreads out into conventional object-oriented languages.

But algebraic data types (ADTs) still do not appear in object-oriented programming although these enable the true power of functional programming: Types well-founded on theory let us model the problem domain in types and thus help us write correct-by-construction code which provides a higher level of compile-time confidence than possible with the type systems of most object-oriented languages. This article aims to help developers from object-oriented languages understand what it means, and become familiar with the basic set of algebraic data types commonly used in functional programming and its appearance in Scala.

Simple Types

Developers from object-oriented programming languages often conflate “types” with “classes” but in fact represent a much simpler, yet more powerful concept. For this article we define “type” as just a name for a set of values. We can define a Boolean type with standard set notation now:

This type has just two values, the well-known boolean constants `true` and `false`. We can also define more complex types, eg, Int for all integers, positive and negative:

Or a String as sequence of characters where Unicode denotes the set of all unicode code points:

Combined Types

We can combine these simple types in two fundamental ways: Either as a group of values of other types, ie, a product type, or as an alternative between values of different types, ie, a coproduct type or sum type. We can “calculate” with these types just like we can calculate sums and products of numbers, and these types obey similar laws. We say they “form an algebra”, hence the name “algebraic data types”.

Product types

Formally we can define the product of types[^product-type] T1, T2, … as follows:

C denotes the constructor of the new type. This constructor serves as a “tag” to differentiate between two product types combined of the same types. With the help of this tag we can create two different products of, eg, Boolean and Nat, by giving them different constructors. The following type introduces a generic pair, for example, one of the most basic product types:

We use the common infix notation (v, t) by which pairs appear in almost all contemporary programming languages, from Python to Haskell. Now (42, “Donald Duck”) becomes a value of the type Pair[Int, String]. By using a different constructor we can also give this generic pair a special name, like User with an ID and a name:

We follow the convention to name the constructor like the type, but we can also choose different names for each. This new type combines the same types as Pair[Int, String] but holds different values and thus becomes distinct from a pair. User 42 “Donald Duck” is a value of type User, but not of Pair[Int, String],

Sum types

Like a product combines values of different types at the same time a sum type[^sum-type] provides an alternative between values of different types. Formally we can define a sum of types T1, T2, … as follows:

Again C1, C2, … denote constructors where each constructor lifts another constituent type to the sum type. We can now define the common Either type, as an alternative between two types:

Now we can use the type Either[String, User] to represent the result of finding a user in a database. In case of error we return Left “User not found”—which strictly speaking is of type Either[String, T] for any T—otherwise we return Right user where user is a value of type User.

Types in Scala

In Scala we can already use pairs and tuples—the standard library includes these—and we can also define our own products with case classes:

``````> (42, "Donald Duck")
res0: (Int, String) = (42,Donald Duck)
> final case class User(id: Int, name: String)
> User(42, "Donald Duck")
res1: User = User(42,Donald Duck)
``````

Scala also supports sum type, but lacks a syntax for these. A sum type like `Either` looks straight-forward in Haskell:

``````data  Either a b  =  Left a | Right b
``````

The same definition in Scala looks considerable more noisy1:

``````sealed trait Either[L, R]

case class Left[L, R](value: L) extends Either[L, R]
case class Right[L, R](value: R) extends Either[L, R]
``````

This illustrates the common pattern to define sum types in Scala: In the absence of first-level support for sum types we must exploit subtyping to achieve the effect of sum types.

We define the type itself as a sealed trait. The `sealed` keyword forces us to define all subtypes in the same file as the trait and thus allows the Scala compiler to subsequently warn if a pattern match over the type does not match all subtypes (“exhaustiveness check”)2. Then we define each branch of the sum as distinct subclass of the trait, and use the corresponding type parameter as type for the value of either side. This use of subtyping has important implications for the ergonomics of the type which we will cover in the section after the next.

Shapes of generic types

If we look at the types in the previous section we notice some similiarity between product types. A Pair[Int, String] and our User type have the same shape: Apart from their constructors they have the same values. In fact constructors exist just to introduce an “artificial” distinction between otherwise equal sums or products of types, and thus allow us to give different names to the same type to aid understanding of our programs. But we can “omit” the constructors and abstract over the shape of these types.

This leads us to the famous [Shapeless][] library for Scala which provides types to abstract over the shape of algebraic data types. At the core of Shapeless lies a special `HList` type for a heterogenous list—a list where each element has a different type:

In other words, each alternative of a sum type becomes a distinct type. The expression `Left("Foo")` has type `Left[String]` for an arbitrary `R`, not `Either[String, R]`. We can then widen `Left[String]` to `Either[Left, R]` by invoking the subtype relation. The compiler automatically widens co-variant types3 but often we do not wish for automatic widening; in particular automatic widening complicates implicit search which in turn impacts type class resolution so libraries like cats and scalaz made most of their types invariant to prevent automatic widening to subtypes.

With invariant types we can find ourselves in a situation where the compiler infers the subtype, ie, a sum-type variant like `Left`, but needs the base type, ie, `Either`. The following (simplified) example with `OptionT` and `EitherT`—both invariant[^either-variance]—does not compile, for instance:

``````import cats.data._
import cats.Id

sealed trait Sum
case object A extends Sum

val res: EitherT[Id, Sum, Int] = for {
id <- OptionT.pure[Id](42).toRight(A)
} yield id
``````

The compiler complains:

``````type mismatch;
found   : cats.data.EitherT[cats.Id,A.type,Int]
required: cats.data.EitherT[cats.Id,Sum,Int]
Note: A.type <: Sum, but class EitherT is invariant in type A.
You may wish to define A as +A instead. (SLS 4.5)
``````

We must explicitly downcast to `Sum` with a type ascription to make the code compile:

``````id <- OptionT.pure[Id](42).toRight(A : Sum)
``````

This issue appears frequently and thus impacts the ergonomics of sum types in Scala, in particular when it causes much worse and less understandable compiler errors than the one above. It appears so frequently that cats and scalaz even have their own family of functions to help with subtyping. The `Functor` type also provides `widen` to widen the type of the functor argument:

``````> A.some.widen
res10: Option[A.type] = Some(A)
> A.some.widen[Sum]
res11: Option[Sum] = Some(A)
``````

Likewise `BiFunctor` (a functor with two “sides”, like `Either`) also has a `leftWiden` which we can use instead of the type ascription above4:

``````id <- OptionT.pure[Id](42).toRight(A).leftWiden
``````

While these functions provide some convenience to cope with subtyping, all in all ergonomics of sum types often falls short of what other functional languages like Haskell—which lack subtyping—can provide.

Summary

Product types combine different types into a new type; sum types describe an alternative of other types. In Scala the former appear as simple and well-known case classes, whereas the latter have a more intricate representation as subtypes of sealed traits or classes. In some cases this subtyping in sum types interferes with type inference and invariance which makes ADTs in Scala less ergonomic than in other languages like Haskell. Algebraic data types have similar shapes, and we can abstract over these shapes to write generic programs over all kinds of sum or product types. The famous shapeless library provides the necessary infrastructure for this abstraction—in particular a heterogeneous list type as generalization of tuples and generic types to convert concrete algebraic types into the corresponding heterogeneous list.

We hope that this article helped you understand how algebraic data types work, how they appear in Scala, and what the shapeless library contributes to generic programming.

1. We simplify the definition of `Either` for this article. In particular we omit all methods on `Either` and elide variance annotations. For the actual definition see scala.util.Either.

2. Instead of a trait we can use a sealed abstract class—in case we need to pass values to the base class, because Scala does not support trait parameters yet. Dotty will add these to Scala.

3. The article Of variance and functors provides an in-depth explanation of variance and its effects in types.

4. This code makes use of partial unification, see Cats FAQ.