Remember **functors**? Recall from my last post, {{linkedTitle "_posts/2022-03-15-contexts-and-effects.md"}}, they are structures that abstract away complexity imposed by nondeterminism present in **contexts** that produce some output; contexts such as optionality, network interaction, or validation. When contexts fail to produce some output, they are in their **undesired case** and no computation may be performed against them. In this post we will explore how to exploit this characteristic to halt computation in order to express control flow.
Contexts that are functors thus allow abstraction over unknown cases. For example, a function `f: A => B` may be lifted into an `Option[A]` and applied if an instance of `A` is present. If no instance of `A` is present, then nothing happens. Specifically, by using `map()` the function `f` is unconcerned with the unknown quantity of `A`'s presence. Many contexts encode dimensions of unknown quantities, and as functors they completely abstract the nondeterminism of these quantities, allowing the business logic expressed by the lifted functions to focus only on the terms that they operate against.
What this means is that for any context in the **desired case**, such as a `Some` of `Option[A]` or a `Right` of `Either[X, A]`, the function `f` will be applied when lifted with `map()`. Any number of functions may be applied to the new contexts returned by subsequent applications of `map()`, and they will all apply because the initial context was in the **desired case**. Functors may thus be considered enablers of data transformation, as lifted functions transform data if it exists. But if the initial context is in an **undesired case**, none of the lifted functions will apply.
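As a small illustration using only the standard library's `Option` (the function names `double` and `describe` are made up for this sketch), chained calls to `map()` apply in the desired case and are skipped in the undesired case:

```scala
object MapOnOption {
  // Plain functions that know nothing about optionality.
  val double: Int => Int = _ * 2
  val describe: Int => String = n => s"value: $n"

  val present: Option[Int] = Some(21)
  val absent: Option[Int] = None

  // Desired case: both lifted functions apply in turn.
  val transformed: Option[String] = present.map(double).map(describe)

  // Undesired case: neither function runs and None is carried through.
  val skipped: Option[String] = absent.map(double).map(describe)
}
```

Neither `double` nor `describe` contains any check for absence; `map()` takes care of that on their behalf.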
Functions lifted into a context are permitted to compute if the context is in the **desired case**. But if a function is lifted into a context that is in the **undesired case**, then computation is halted. This means that the case of any context may control the flow of execution within a program.
Functors only allow lifting functions of the form `f: A => B`. The context can't be modified with a function having this signature, which means we can't use a functor specifically to influence control flow by injecting a context in its **undesired case**. Functors respect the _existing case_ of a context: they cannot modify it.
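What we need is a function that accepts more than one context. Here is a sketch of the shape such a function might take, using a hypothetical `Map2` trait that exists only for this illustration, along with an instance for `Option`:

```scala
object Map2Sketch {
  // Hypothetical trait, declared only to show the signature.
  trait Map2[F[_]] {
    def map2[A, B, C](fa: F[A], fb: F[B])(f: (A, B) => C): F[C]
  }

  val optionMap2: Map2[Option] = new Map2[Option] {
    def map2[A, B, C](fa: Option[A], fb: Option[B])(f: (A, B) => C): Option[C] =
      (fa, fb) match {
        case (Some(a), Some(b)) => Some(f(a, b)) // both present: f applies
        case _                  => None          // anything missing: f is skipped
      }
  }

  val greeting: Option[String] =
    optionMap2.map2(Option("hello"), Option("world"))((a, b) => s"$a, $b")

  val halted: Option[String] =
    optionMap2.map2(Option("hello"), Option.empty[String])((a, b) => s"$a, $b")
}
```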
This two-argument analog of `map()` unlocks a key capability: controlling whether to proceed or halt computation against the terms contained within the contexts `fa` and `fb`. If both `fa` and `fb` are in their **desired case**, then there are instances of `A` and `B` against which the function `f` may be applied. But if either one or both are in their **undesired case**, then `f` does not apply, and the **undesired cases** are _propagated_ through `F[C]`. This means that `fa` and `fb` become _levers_ with which to halt computation that would be performed using function `f` and subsequent computation against context `F[C]`.
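To make this concrete, here is a sketch that adds two numbers, each of which may have failed to parse. The `map2()` and `parse()` helpers are assumptions, specialized to `Either[String, A]` for brevity:

```scala
object AdditionSketch {
  // Hypothetical helper: map2 specialized to Either with String errors.
  def map2[A, B, C](fa: Either[String, A], fb: Either[String, B])(f: (A, B) => C): Either[String, C] =
    (fa, fb) match {
      case (Right(a), Right(b)) => Right(f(a, b))
      case (Left(x), _)         => Left(x)
      case (_, Left(x))         => Left(x)
    }

  // Hypothetical producer of a context: parsing may or may not succeed.
  def parse(s: String): Either[String, Int] =
    s.toIntOption.toRight(s"not a number: $s")

  val sum: Either[String, Int] = map2(parse("2"), parse("40"))(_ + _)
  val halted: Either[String, Int] = map2(parse("2"), parse("forty"))(_ + _)
}
```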
Only when both arguments are in their **desired case** does the function of addition apply. If either or both arguments are in their **undesired case**, then the **undesired case** is _propagated_. This does not allow the function to apply, and halts any further operations against the context. _The functions that produce the two input contexts are thus capable of controlling whether computation via `f` proceeds and permits further computation against the context `F[C]`._
The `map2()` function is implemented using a new structure, a specialization of a functor called an **applicative functor**, or simply an _applicative_.
The specialization arises from the type of `A` contained within a functor `F[_]`. If `A` is merely an opaque type, then `F[A]` is a functor and no more. But if `A` is specifically known to have some type `A => B`, that is to say _`A` is a function_, then `F[A => B]` is an _applicative_ functor.
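A minimal sketch of how such a typeclass might be declared follows. The signatures of `pure()` and `ap()` are inferred from how they are used in this post, and the `Option` instance is included only so that the sketch can be exercised:

```scala
object ApplicativeSketch {
  trait Functor[F[_]] {
    def map[A, B](fa: F[A])(f: A => B): F[B]
  }

  trait Applicative[F[_]] extends Functor[F] {
    // Lift a plain value into the context in its desired case.
    def pure[A](a: A): F[A]
    // Apply a lifted function to a lifted argument.
    def ap[A, B](ff: F[A => B])(fa: F[A]): F[B]
  }

  val optionApplicative: Applicative[Option] = new Applicative[Option] {
    def pure[A](a: A): Option[A] = Some(a)
    def ap[A, B](ff: Option[A => B])(fa: Option[A]): Option[B] =
      (ff, fa) match {
        case (Some(f), Some(a)) => Some(f(a))
        case _                  => None
      }
    def map[A, B](fa: Option[A])(f: A => B): Option[B] = fa.map(f)
  }

  val increment: Int => Int = _ + 1

  val applied: Option[Int] =
    optionApplicative.ap(optionApplicative.pure(increment))(Option(41))
  val halted: Option[Int] =
    optionApplicative.ap(Option.empty[Int => Int])(Option(41))
}
```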
> [See here]({{code_repo}}/src/main/scala/green/thisfieldwas/embracingnondeterminism/control/Applicative.scala) for the definition in the sample repository.
Note that `Applicative` extends `Functor` as it is a specialization. All applicatives are also functors and therefore also provide the `map()` function.
* `pure()`, which lifts the result of a pure computation `A` into the context such that `pure: A => F[A]`. This is essentially a constructor producing a context in the **desired case**, such as `Some` for `Option[A]` or `Right` for `Either[X, A]`. In short, `pure()` puts `A` in the box.
* `ap()`, read as _apply_, for applying a lifted function to a lifted argument. Given two boxes, if the first contains a function and the second contains an argument, `ap()` will apply the function to the argument and put the result back in the box.
You might be wondering why a function would ever be lifted into a context. I will demonstrate why this is desirable in how `ap()` works by defining `map2()` within `Applicative`:
2. Lifting it with `pure()` gives `F[A => B => C]` in the **desired case**, which gives us a clean slate to start our computation with.
3. Then, with `ap()` we may apply the first argument `fa: F[A]` which will give back `F[B => C]` in the **desired case** _if `fa` is itself in the desired case_.
4. Then, with `ap()` we may apply the second argument `fb: F[B]` which will give back `F[C]` in the **desired case** _if `fb` is itself in the desired case_.
Each step of lifted function application accounts for the case of the function and argument contexts and halts if either context is in the **undesired case**. The **undesired case** will _propagate instead_ through `F[C]` if and when it exists.
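The steps above can be sketched as follows. The local names `lifted` and `partial` are mine, and the `Option` instance is there only to exercise the default `map2()`:

```scala
object Map2ViaAp {
  trait Applicative[F[_]] {
    def pure[A](a: A): F[A]
    def ap[A, B](ff: F[A => B])(fa: F[A]): F[B]

    def map2[A, B, C](fa: F[A], fb: F[B])(f: (A, B) => C): F[C] = {
      // Curry f and lift it with pure(): always the desired case.
      val lifted: F[A => B => C] = pure(f.curried)
      // Apply the first argument: desired only if fa is.
      val partial: F[B => C] = ap(lifted)(fa)
      // Apply the second argument: desired only if fb is.
      ap(partial)(fb)
    }
  }

  val optionApplicative: Applicative[Option] = new Applicative[Option] {
    def pure[A](a: A): Option[A] = Some(a)
    def ap[A, B](ff: Option[A => B])(fa: Option[A]): Option[B] =
      (ff, fa) match {
        case (Some(f), Some(a)) => Some(f(a))
        case _                  => None
      }
  }

  val sum: Option[Int] =
    optionApplicative.map2(Option(2), Option(40))((a, b) => a + b)
  val halted: Option[Int] =
    optionApplicative.map2(Option(2), Option.empty[Int])((a, b) => a + b)
}
```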
You might have noticed that `map2()` looks an awful lot like `map()`. In fact, `Applicative` provides a default implementation of `map()` following the same pattern:
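Assuming `pure()` and `ap()` have the signatures described above, that default might look like this sketch. Note that the `Option` instance defines only `pure()` and `ap()`:

```scala
object MapViaAp {
  trait Functor[F[_]] {
    def map[A, B](fa: F[A])(f: A => B): F[B]
  }

  trait Applicative[F[_]] extends Functor[F] {
    def pure[A](a: A): F[A]
    def ap[A, B](ff: F[A => B])(fa: F[A]): F[B]

    // Default map(): lift f with pure(), then apply it to fa with ap().
    def map[A, B](fa: F[A])(f: A => B): F[B] = {
      val lifted: F[A => B] = pure(f)
      ap(lifted)(fa)
    }
  }

  val optionApplicative: Applicative[Option] = new Applicative[Option] {
    def pure[A](a: A): Option[A] = Some(a)
    def ap[A, B](ff: Option[A => B])(fa: Option[A]): Option[B] =
      (ff, fa) match {
        case (Some(f), Some(a)) => Some(f(a))
        case _                  => None
      }
  }

  val mapped: Option[Int] = optionApplicative.map(Option(41))(n => n + 1)
  val skipped: Option[Int] = optionApplicative.map(Option.empty[Int])(n => n + 1)
}
```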
If your context implements `Applicative`, then it also implements `Functor` with no extra work. You can always provide your own implementation of `map()` if it is more efficient to do so.
Let's walk through a powerful capability of applicatives: _validation_.
### Validating a `User` from external data
When constructing a `User` from data that we receive from an external source, such as a form or API, we can use `ap()` to lift `User`'s curried constructor into a validation context and apply it to validated arguments. If all arguments are valid, then we should receive a validation context containing a valid `User`. If any arguments are invalid, then we should receive an invalid context with all reasons for validation failure.
Each of `username`, `email`, and `password` must be valid in order for `User` itself to be valid. This requires the introduction of a validation context:
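A sketch of what these definitions might look like follows. The case names `Valid` and `Invalid`, and the use of `String` for each field of `User`, are assumptions made for illustration:

```scala
object ValidationTypes {
  // Assumed shape of User: three String fields.
  final case class User(username: String, email: String, password: String)

  // Either a valid value of A, or the reasons E for invalidation.
  sealed trait Validated[+E, +A]
  final case class Valid[+A](value: A) extends Validated[Nothing, A]
  final case class Invalid[+E](errors: E) extends Validated[E, Nothing]

  val ok: Validated[List[String], String] = Valid("gandalf")
  val bad: Validated[List[String], String] = Invalid(List("username must not be empty"))
}
```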
> [See here]({{code_repo}}/src/main/scala/green/thisfieldwas/embracingnondeterminism/data/Validated.scala) for the definitions in the sample repository.
Both cases of `Left` are returned immediately, and there is no specific handling for situations where both `ff` and `fa` may be in the `Left` case. This means that the first `Left` propagates and all subsequent `Left`s are swallowed. In the context of validation, this means that for any number of validation errors that the context might produce, we would only receive the first error. We would have to resolve the error and re-run the operation, and repeat for each subsequent error until the operation as a whole succeeded. This makes `Either` a very poor choice for modeling validation. It represents strictly one thing or the other, whereas a validation context can be specialized to propagate all reasons for failure.
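The swallowing behavior is easy to reproduce. In this sketch, `ap()` is written by hand for `Either` (it is not taken from the sample repository) and applied to two failing arguments:

```scala
object EitherFirstLeft {
  def ap[X, A, B](ff: Either[X, A => B])(fa: Either[X, A]): Either[X, B] =
    (ff, fa) match {
      case (Right(f), Right(a)) => Right(f(a))
      case (Left(x), _)         => Left(x) // fa is never inspected
      case (_, Left(x))         => Left(x)
    }

  val pairUp: String => String => (String, String) = a => b => (a, b)
  val lifted: Either[String, String => String => (String, String)] = Right(pairUp)

  val noName: Either[String, String] = Left("username is empty")
  val noEmail: Either[String, String] = Left("email is malformed")

  // Both arguments are invalid, but only the first reason survives.
  val result: Either[String, (String, String)] = ap(ap(lifted)(noName))(noEmail)
}
```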
`Validated` contains the valid value you want or the reasons for invalidation. We could fail to receive a `User` for three or more reasons related to `username`, `email`, and `password` all being invalid, which implies that term `E` represents some data containing one or more of _something_. This has a specific implication on how we define an instance of `Applicative` for `Validated`:
When there are two instances of `E` we don't have a way to combine them, as `E` is an opaque type. We could fix `E` to a concrete type, such as `List[String]` or another similar structure, but that creates an inflexible API. Without some way to combine instances of `E`, `Validated` is left in the same position that `Either` is in: the _first_ **undesired case** propagates and subsequent cases are _swallowed_. How do we _combine_ the **undesired cases**?
Structures defining a `combine()` function form a typeclass known as a **semigroup** under a specific condition: that `combine()` is associative. Semigroups are very common, and constraining `E` to have an instance of `Semigroup` provides great API flexibility. First, let's see how the `Semigroup` typeclass is defined:
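A minimal sketch of such a typeclass, with an instance for `List[String]` under concatenation, might look like this. The parameter names `left` and `right` are assumptions:

```scala
object SemigroupSketch {
  trait Semigroup[S] {
    // Must be associative:
    // combine(combine(a, b), c) == combine(a, combine(b, c))
    def combine(left: S, right: S): S
  }

  val stringListSemigroup: Semigroup[List[String]] = new Semigroup[List[String]] {
    def combine(left: List[String], right: List[String]): List[String] = left ++ right
  }

  val a = List("username is empty")
  val b = List("email is malformed")
  val c = List("password is too short")

  // Grouping does not matter, which is exactly what associativity promises.
  val leftFirst: List[String] =
    stringListSemigroup.combine(stringListSemigroup.combine(a, b), c)
  val rightFirst: List[String] =
    stringListSemigroup.combine(a, stringListSemigroup.combine(b, c))
}
```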
> [See here]({{code_repo}}/src/main/scala/green/thisfieldwas/embracingnondeterminism/data/Semigroup.scala) for the definition in the sample repository.
> [See here]({{code_repo}}/src/test/scala/green/thisfieldwas/embracingnondeterminism/data/SemigroupLaws.scala) for the definition in the sample repository.
> [See here]({{code_repo}}/src/main/scala/green/thisfieldwas/embracingnondeterminism/data/Validated.scala#L111-L149) for the definition in the sample repository.
Naively, a `List[String]` works for `E`. It forms a `Semigroup` under concatenation, but concatenation isn't cheap in Scala `List`s. It can also be empty per its type, which means that as `E` you have to code for invariants where it is actually empty.
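To see the accumulation at work, here is a sketch that uses `List[String]` for `E`. The case names `Valid` and `Invalid` and the standalone `ap()` function are assumptions, and a full `Applicative` instance would wrap the same logic:

```scala
object AccumulatingValidation {
  trait Semigroup[S] {
    def combine(left: S, right: S): S
  }

  implicit def listSemigroup[T]: Semigroup[List[T]] = new Semigroup[List[T]] {
    def combine(left: List[T], right: List[T]): List[T] = left ++ right
  }

  sealed trait Validated[+E, +A]
  final case class Valid[+A](value: A) extends Validated[Nothing, A]
  final case class Invalid[+E](errors: E) extends Validated[E, Nothing]

  def ap[E, A, B](ff: Validated[E, A => B])(fa: Validated[E, A])(implicit S: Semigroup[E]): Validated[E, B] =
    (ff, fa) match {
      case (Valid(f), Valid(a))     => Valid(f(a))
      case (Invalid(x), Invalid(y)) => Invalid(S.combine(x, y)) // keep both reasons
      case (Invalid(x), Valid(_))   => Invalid(x)
      case (Valid(_), Invalid(y))   => Invalid(y)
    }

  val pairUp: String => String => (String, String) = a => b => (a, b)
  val lifted: Validated[List[String], String => String => (String, String)] = Valid(pairUp)

  val noName: Validated[List[String], String] = Invalid(List("username is empty"))
  val noEmail: Validated[List[String], String] = Invalid(List("email is malformed"))

  val result: Validated[List[String], (String, String)] = ap(ap(lifted)(noName))(noEmail)
}
```

Unlike the `Either` case, both reasons for failure are present in `result`.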
There exists a better structure, and we can whip it together pretty quick: the `NonEmptyChain`. This structure is a context modeled as two cases: either a single value, or a pair of separate instances of itself appended together. This allows for a `List`-like structure with constant-time concatenation that can be converted to a `Seq` in linear time.
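Here is a sketch of that idea. The case names `Singleton` and `Append` and the method name `concat` are assumptions, and `toSeq` uses an explicit stack so that deeply nested chains do not overflow the call stack:

```scala
object NonEmptyChainSketch {
  sealed trait NonEmptyChain[+A] {
    // Constant time: wrap both chains in a new node without copying.
    def concat[B >: A](that: NonEmptyChain[B]): NonEmptyChain[B] = Append(this, that)

    // Linear time: every node is visited exactly once, left to right.
    def toSeq: Seq[A] = {
      val out = Vector.newBuilder[A]
      var stack: List[NonEmptyChain[A]] = List(this)
      while (stack.nonEmpty) {
        stack.head match {
          case Singleton(a) =>
            out += a
            stack = stack.tail
          case Append(l, r) =>
            stack = l :: r :: stack.tail
        }
      }
      out.result()
    }
  }
  final case class Singleton[+A](value: A) extends NonEmptyChain[A]
  final case class Append[+A](left: NonEmptyChain[A], right: NonEmptyChain[A]) extends NonEmptyChain[A]

  val errors: NonEmptyChain[String] =
    Singleton("username is empty")
      .concat(Append(Singleton("email is malformed"), Singleton("password is too short")))
}
```

Because there is no empty case, a value of this type always carries at least one element.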
> [See here]({{code_repo}}/src/main/scala/green/thisfieldwas/embracingnondeterminism/data/NonEmptyChain.scala) for the definition in the sample repository.
> [See here]({{code_repo}}/src/test/scala/green/thisfieldwas/embracingnondeterminism/data/ValidatedSpec.scala#L50-L141) for the specs in the sample repository.
Applicatives thus enable entire computations to succeed if all context arguments are in the **desired case**. If any argument is in the **undesired case**, then this case is _propagated_ and the computation as a whole fails.
Each of `validateUsername()`, `validateEmail()`, and `validatePassword()` acts as a lever on whether a `User` is successfully produced. Writing specific if-statements to guide whether a `User` is produced or errors returned instead is not required: the `Applicative` typeclass succinctly abstracts away the necessary plumbing to control the flow of logic required to handle **undesired cases**. Errors are declared where they should occur and the abstraction handles the rest.
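Putting the pieces together, a simplified version of this validation might look like the following sketch. To keep it short, errors are fixed to `List[String]` rather than abstracted behind `Semigroup`, and the validation rules themselves are invented for illustration:

```scala
object ValidateUserSketch {
  final case class User(username: String, email: String, password: String)

  sealed trait Validated[+A]
  final case class Valid[+A](value: A) extends Validated[A]
  final case class Invalid(errors: List[String]) extends Validated[Nothing]

  def pure[A](a: A): Validated[A] = Valid(a)

  def ap[A, B](ff: Validated[A => B])(fa: Validated[A]): Validated[B] =
    (ff, fa) match {
      case (Valid(f), Valid(a))     => Valid(f(a))
      case (Invalid(x), Invalid(y)) => Invalid(x ++ y)
      case (Invalid(x), Valid(_))   => Invalid(x)
      case (Valid(_), Invalid(y))   => Invalid(y)
    }

  // Hypothetical rules; real ones would be stricter.
  def validateUsername(s: String): Validated[String] =
    if (s.nonEmpty) Valid(s) else Invalid(List("username must not be empty"))
  def validateEmail(s: String): Validated[String] =
    if (s.contains("@")) Valid(s) else Invalid(List("email must contain @"))
  def validatePassword(s: String): Validated[String] =
    if (s.length >= 8) Valid(s) else Invalid(List("password must have at least 8 characters"))

  def validateUser(username: String, email: String, password: String): Validated[User] = {
    // Curried constructor, lifted and then applied one argument at a time.
    val construct: String => String => String => User = u => e => p => User(u, e, p)
    val step1 = ap(pure(construct))(validateUsername(username))
    val step2 = ap(step1)(validateEmail(email))
    ap(step2)(validatePassword(password))
  }
}
```

Calling `validateUser("", "nope", "short")` under these rules reports all three failures at once rather than only the first.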
It may not have been obvious from `validateUser()`, but each validation function evaluates independently of the other validation functions. In the `Validated` context, this means that each function executes without impacting the other functions regardless of individual success or failure. Imagine for a moment, what if the functions were evaluated within an asynchronous context?
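One function that relies on this independence is `sequence()`, which gathers a list of contexts into a single context of a list. The following is a sketch of how it could be built from `map2()`, demonstrated with `Option` to keep it deterministic, and is not necessarily how the sample repository defines it:

```scala
object SequenceSketch {
  trait Applicative[F[_]] {
    def pure[A](a: A): F[A]
    def ap[A, B](ff: F[A => B])(fa: F[A]): F[B]

    def map2[A, B, C](fa: F[A], fb: F[B])(f: (A, B) => C): F[C] = {
      val lifted: F[A => B => C] = pure(f.curried)
      val partial: F[B => C] = ap(lifted)(fa)
      ap(partial)(fb)
    }

    // Fold from the right, prepending each element to the accumulated list.
    def sequence[A](fas: List[F[A]]): F[List[A]] =
      fas.foldRight(pure(List.empty[A])) { (fa, acc) =>
        map2(fa, acc)((a, as) => a :: as)
      }
  }

  val optionApplicative: Applicative[Option] = new Applicative[Option] {
    def pure[A](a: A): Option[A] = Some(a)
    def ap[A, B](ff: Option[A => B])(fa: Option[A]): Option[B] =
      (ff, fa) match {
        case (Some(f), Some(a)) => Some(f(a))
        case _                  => None
      }
  }

  val allPresent: Option[List[Int]] =
    optionApplicative.sequence(List(Option(1), Option(2), Option(3)))
  val oneMissing: Option[List[Int]] =
    optionApplicative.sequence(List(Option(1), Option.empty[Int], Option(3)))
}
```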
> [See here]({{code_repo}}/src/main/scala/green/thisfieldwas/embracingnondeterminism/control/Applicative.scala#L83-L96) for the definition in the sample repository.
```scala
val loadingUsers = Applicative[Future].sequence(List(
  loadUser("test@email.com"),
  loadUser("student@school.edu"),
  loadUser("admin@foundation.net"),
))
```
The variable `loadingUsers` now contains `Future[List[User]]`. As each `Future[User]` resolves, they are collected into a `List`. Because each `loadUser()` function executes independently, this has a profound implication in the context of a `Future`: they are executed concurrently!
The pattern offered by `Applicative` is an _all-or-nothing_ result in its output. If all inputs are in the **desired case**, then the output will be in the **desired case** as well. But if any are in an **undesired case**, then the **undesired case** _propagates_ and computation halts.
Given a 2-tuple of functors `(F[A], F[B])` you can invert the nesting of the context and the 2-tuple using the `pure()` and `ap()` functions from `Applicative` with the following steps:
```scala
scala> val productOfSomes = (Option(42), Option("banana"))
val productOfSomes: (Option[Int], Option[String]) = (Some(42), Some("banana"))

scala> val someOfProduct = (_: Int, _: String).curried.pure[Option].ap(productOfSomes._1).ap(productOfSomes._2)
val someOfProduct: Option[(Int, String)] = Some((42, "banana"))
```
This has an important implication, specifically that unlike the `sequence()` function these operations allow for gathering effectful operations that produce contexts with heterogeneous terms. Scala allows for tuples up to 22 elements, and it makes sense to abstract the above operations for each tuple size, especially because even at just 2 elements writing all of these out is already clunky!
In the sample repository, I have written a macro which [adds two extension methods]({{code_repo}}/macro/src/main/scala/green/thisfieldwas/embracingnondeterminism/data/GenerateTupleSyntax.scala) to each tuple size. Here's the code that is generated for the 2-tuple:
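As a rough, hand-written approximation of what such extension methods could look like for the 2-tuple (the names `tupled` and `Tuple2Ops` are assumptions; only `mapN()` is named in this post):

```scala
object TupleSyntaxSketch {
  trait Applicative[F[_]] {
    def pure[A](a: A): F[A]
    def ap[A, B](ff: F[A => B])(fa: F[A]): F[B]
  }

  implicit val optionApplicative: Applicative[Option] = new Applicative[Option] {
    def pure[A](a: A): Option[A] = Some(a)
    def ap[A, B](ff: Option[A => B])(fa: Option[A]): Option[B] =
      (ff, fa) match {
        case (Some(f), Some(a)) => Some(f(a))
        case _                  => None
      }
  }

  implicit class Tuple2Ops[F[_], A, B](t: (F[A], F[B])) {
    // Lift f, then apply each element of the tuple in order.
    def mapN[C](f: (A, B) => C)(implicit ev: Applicative[F]): F[C] = {
      val lifted: F[A => B => C] = ev.pure(f.curried)
      val partial: F[B => C] = ev.ap(lifted)(t._1)
      ev.ap(partial)(t._2)
    }

    // Invert the nesting: a tuple of contexts becomes a context of a tuple.
    def tupled(implicit ev: Applicative[F]): F[(A, B)] = mapN((a, b) => (a, b))
  }

  val someOfProduct: Option[(Int, String)] = (Option(42), Option("banana")).tupled
  val described: Option[String] = (Option(42), Option("banana")).mapN((n, s) => s"$n $s")
  val missing: Option[(Int, String)] = (Option(42), Option.empty[String]).tupled
}
```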
Being able to invert the nesting of a tuple and a context is most powerful when constructing case classes from arguments produced by effectful operations. By using the `mapN()` function, for example, validating arguments to the `User` constructor may be rewritten like this:
This syntax afforded by the `mapN()` extension method is much more concise and closely matches the constructor arguments order passed to `User` itself. After all, `User` is a product of results produced by contexts, which evaluate independently of each other as they do in the `sequence()` function.
In order to become an `Applicative`, an effect type must implement the typeclass. Let's implement instances for the usual suspects, `Option`, `Either`, and `List`. As `Applicative` is a specialization of `Functor`, we can simply upgrade our current `Functor` instances to become `Applicative`s:
`Option` and `Either`'s instances of `Applicative` are straightforward: if a function and argument are present, they are applied and the result returned in the **desired case**. If either is missing, then the **undesired case** is _propagated_ instead.
`List` looks very different at first glance, but conceptually performs the same way. Specifically, `List` performs a Cartesian product of its functions and arguments, applying each pair together and building a new `List` from the results. If either the function `List` or the argument `List` is empty, then an empty result `List` is returned, as an empty `List` represents the **undesired case**.
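As a sketch of what such instances might look like for `Option` and `List` (the `Applicative` trait is redeclared here so that the example stands alone):

```scala
object ApplicativeInstances {
  trait Applicative[F[_]] {
    def pure[A](a: A): F[A]
    def ap[A, B](ff: F[A => B])(fa: F[A]): F[B]
  }

  val optionApplicative: Applicative[Option] = new Applicative[Option] {
    def pure[A](a: A): Option[A] = Some(a)
    def ap[A, B](ff: Option[A => B])(fa: Option[A]): Option[B] =
      (ff, fa) match {
        case (Some(f), Some(a)) => Some(f(a))
        case _                  => None
      }
  }

  val listApplicative: Applicative[List] = new Applicative[List] {
    def pure[A](a: A): List[A] = List(a)
    // Cartesian product: every function is applied to every argument.
    def ap[A, B](ff: List[A => B])(fa: List[A]): List[B] =
      for {
        f <- ff
        a <- fa
      } yield f(a)
  }

  val functions: List[Int => Int] = List((n: Int) => n + 1, (n: Int) => n * 10)

  val product: List[Int] = listApplicative.ap(functions)(List(1, 2, 3))
  val noArguments: List[Int] = listApplicative.ap(functions)(List.empty[Int])
  val noFunctions: List[Int] = listApplicative.ap(List.empty[Int => Int])(List(1, 2, 3))
}
```

In this sketch the results are ordered by function first and argument second; another ordering would be a different but equally plausible design.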
`Option`, `Either`, and especially `List`'s `Applicative` instances look different. How do we know that they are well-behaved as applicatives? Just like functors, applicatives are expected to conform to a set of laws defined in the higher math of [category theory][].
There are four applicative laws, which must hold for all applicatives in addition to the functor laws.
1. **Preservation of identity functions**: A lifted identity function applied to a lifted argument is the same as the identity function applied directly to the lifted argument.
2. **Preservation of function homomorphism**: Lifting a function and an argument then applying them produces the same result as applying the unlifted function and unlifted argument then lifting the result.
3. **Preservation of function interchange**: Given a lifted function and an unlifted argument, applying the lifted function after lifting the argument should give the same result as when reversing the order of the function and argument. This is difficult to express in words, and the code is hard to follow, but roughly this translates to `ap(ff: F[A => B])(pure(a)) == ap(pure(f => f(a)))(ff)`.
4. **Preservation of function composition**: Given lifted functions `ff: F[A => B]` and `fg: F[B => C]` and argument `fa: F[A]`: lifting `compose()` and applying `fg`, `ff`, and `fa` produces the same result as applying `fg` after applying `ff` to `fa`.
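The laws can be spot-checked by hand. This sketch writes each law as an equality for `Option` with one sample value each, which illustrates the laws but proves nothing in general:

```scala
object LawsSketch {
  def pure[A](a: A): Option[A] = Some(a)
  def ap[A, B](ff: Option[A => B])(fa: Option[A]): Option[B] =
    (ff, fa) match {
      case (Some(f), Some(a)) => Some(f(a))
      case _                  => None
    }

  val a: Int = 20
  val fa: Option[Int] = Some(a)
  val f: Int => Int = _ + 1
  val g: Int => String = n => s"n=$n"
  val ff: Option[Int => Int] = Some(f)
  val fg: Option[Int => String] = Some(g)

  // 1. Identity
  val identityLaw: Boolean = ap(pure((x: Int) => x))(fa) == fa

  // 2. Homomorphism
  val homomorphismLaw: Boolean = ap(pure(f))(pure(a)) == pure(f(a))

  // 3. Interchange
  val applyToA: (Int => Int) => Int = h => h(a)
  val interchangeLaw: Boolean = ap(ff)(pure(a)) == ap(pure(applyToA))(ff)

  // 4. Composition
  val compose: (Int => String) => (Int => Int) => (Int => String) =
    bc => ab => bc.compose(ab)
  val composed1: Option[(Int => Int) => (Int => String)] = ap(pure(compose))(fg)
  val composed2: Option[Int => String] = ap(composed1)(ff)
  val compositionLaw: Boolean = ap(composed2)(fa) == ap(fg)(ap(ff)(fa))
}
```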
These laws are rigorous, and we can write tests against them to verify that our applicative instances are defined correctly.
In order to properly test our applicative instances, we need to be able to generate a broad range of inputs to verify that the applicative properties hold with a high degree of confidence. Specifically, "for all" `List`s, for example, the property checks for `Applicative` must pass for each generated instance. We will leverage `scalacheck` for property-based testing. `scalacheck` will generate for us a set of arbitrary instances of the contexts and execute tests called _property checks_ against each to verify that each check passes. If all checks pass, then the property may be considered to hold "for all" instances of the tested context.
Generating an arbitrary context `F[_]` containing an arbitrary `A` is not supported directly by `scalacheck`, however. We can leverage this typeclass below to enable generating instances of `F[A]` from any generator for `A`:
> [See here]({{code_repo}}/src/test/scala/green/thisfieldwas/embracingnondeterminism/util/LiftedGen.scala) for the definition in the sample repository.
Now we will walk through defining the properties of `Applicative` in such a way that we simply supply the contexts as the argument to the test. This way we only define the properties once.
> [See here]({{code_repo}}/src/test/scala/green/thisfieldwas/embracingnondeterminism/control/ApplicativeLaws.scala) for the full definition of the trait.
Each of `Option`, `Either`, and `List` conforms to the applicative laws, and we only had to write the properties once. These properties give strong evidence that functions and arguments used within these contexts maintain referential transparency in their arrangements and that the specific contexts do not change the factoring semantics of the code.
What does change, however, are these contexts' specific effects. For example, you would not have to refactor code abstracted by applicative functions if you changed the backing implementation from `Either` to `List`, but your code would produce potentially more than one result in the **desired case**.
This is the goal, however, as these effects' dimensions of unknown quantity should not burden our code. Instead, we push the complexity to the edge of the context, where it is important that our context is an `Either` or a `List`, and keep our business logic focused on individual instances contained within each context.
Applicatives primarily offer independent computation. Specifically, the arguments to applicative functions such as `ap()`, `map2()`, or `sequence()` are evaluated independently of one another, and their individual outputs as a whole influence whether the functions consuming them are permitted to compute against the outputs of their **desired cases** or if they should halt computation and _propagate_ any **undesired cases**.
When all inputs to an applicative function are in the **desired case**, then the output of the lifted functions will also be in the **desired case**. Conversely, if any input is in the **undesired case**, then it will be propagated instead, and the other cases will be _discarded_. In this regard, applicative functions provide an _all-or-nothing_ operation.
Independent computation provides some level of control flow, but it doesn't guide execution to proceed only if the previous execution has succeeded, as all operations evaluate independently of each other. Applicatives therefore do not provide a mechanism to support imperative programming. For this kind of control flow, you need to further specialize the applicative functor.
In my next post {{linkedTitle "_posts/2022-06-17-imperative-computation.md"}} we will explore the infamous _**monad**_ and how it enables imperative control flow in functional programming.