Pure Functions
Summary
In this chapter we will begin our journey by exploring functions. We will look at a piece of code that can be improved by using pure functions. If you are already familiar with the idea of pure functions and immutability, you can safely skip this chapter.
Prelude
It's 10AM. The window in the developer room is fogged-up from dozens of steamy cups of coffee. You lean on the window frame and lose yourself in painting little lambdas on the misty glass. A tiny hummingbird flaps your way, with a coffee-stained slip of paper in its beak. Unfazed -- and feeling like a Disney princess (this isn't the first time a workload was sent by way of fowl) -- you snatch it from the air with swift hands. It reads:
The junior wants to build the admin view, could you aid him and build the
model? It should display the registration date in long date format and the user
initials. Jim is building the side bar, maybe you guys should have a quick
chat?
You open up your editor, excited to solve some problems. The small bird briefly pauses on your shoulder, silently judging your choice of programming language, before fluttering away again.
Something something user
Alright. Let's have a look at the data.
output
[
{firstName: 'Barbara', lastName: 'Selling', registered: '01.03.2017'},
{firstName: 'John', lastName: 'Smith', registered: '12.24.2019'},
{firstName: 'Frank', lastName: 'Helmsworth', registered: '05.11.2011'},
{firstName: 'Anna', lastName: 'Freeman', registered: '07.09.2003'},
{firstName: 'Damian', lastName: 'Sipes', registered: '12.12.2001'},
{firstName: 'Mara', lastName: 'Homenick', registered: '08.14.2007'},
];
Ah, glancing at the registered field we can see that it was stored as a US-formatted string, MONTH.DAY.YEAR, or MM.DD.YYYY, if you are familiar with date field notation.
The view needs it in a different format though, something like Wednesday, 9 July 2003, or EEEE, d MMMM YYYY. Before we go and create our view model, we'll capture the structure of our data in a type.
type User = {
firstName: string;
lastName: string;
registered: string;
}
Unlike in classical object-oriented programming (OOP), where designs tend to combine data, behaviour, state and identity, in the style of functional programming that we will learn as part of this series of tutorials, we clearly distinguish and separate these.
When we see a type, we don't mean "an instance of a class", we mean "a piece of data that matches a certain structure". In our specific case, any data that matches its structure can be considered a User. Think of one of those toys that seem to be sold exclusively to doctors' waiting rooms, where children are supposed to match wooden blocks in various shapes (stars, circles, rectangles) to their corresponding holes. If the block matches the "star" hole, for all intents and purposes, it's a star.
This kind of type system is called structural subtyping. It means that the type of a thing is not defined by its place of declaration or internal name (as is the case in nominally typed languages like Java or C#), but by its properties. This means that the same piece of data can match many types.
Let's have a look at an example: We can explicitly tell TypeScript which shape we're expecting this object, damian, to have. If the structure didn't match the type, TypeScript would yell at us.
const damian: User = {
firstName: 'Damian',
lastName: 'Sipes',
registered: '12.12.2001',
};
Just to reiterate, we're not instantiating a User here, we're just giving a type hint. damian is just labeled data. Poor Damian. To show the use, let's define a function that explicitly takes a user and returns its firstName.
const firstName = (user: User): string => user.firstName;
console.log(firstName(damian));
output
Damian
Damian's friend Mara doesn't like to be labeled:
const mara = {
firstName: 'Mara',
lastName: 'Homenick',
registered: '08.14.2007',
};
TypeScript doesn't care. mara matches the shape of a User, so the following code is perfectly valid and reasonable:
console.log(firstName(mara));
output
Mara
Yet another friend of theirs, Anna, does not feel like a User at all. They're off, creating their own type, with blackjack and hookers:
type Person = {
firstName: string;
lastName: string;
registered: string;
};
const anna: Person = {
firstName: 'Anna',
lastName: 'Freeman',
registered: '07.09.2003',
};
But, if it has a firstName, a lastName and registered, it can be used like a User.
console.log(firstName(anna));
output
Anna
Noteworthy: Our function doesn't even need to take a User. We only really care about the firstName property, so we can destructure it from the input. Now we've signaled to the caller that we care even less about the label of the input, as long as it contains a firstName of type string.
const firstName = ({firstName}: {firstName: string}): string => firstName;
console.log(firstName(anna));
output
Anna
How concrete or general we are in defining our input types is entirely up to us. Each comes with benefits and trade-offs we will explore in chapters to come.
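As a small illustration of the structural matching described in this section, here is a sketch in which one unlabeled object satisfies two unrelated types at once. The types HasName and HasRegistration and the two helper functions are hypothetical names made up for this example; only the field names come from the data above.

```typescript
// Hypothetical types for illustration; only the field names
// (firstName, lastName, registered) come from the chapter's data.
type HasName = {firstName: string; lastName: string};
type HasRegistration = {registered: string};

// Cares only about the name fields.
const fullName = (person: HasName): string =>
  `${person.firstName} ${person.lastName}`;

// Cares only about the registration field (format: MM.DD.YYYY).
const registrationYear = (entry: HasRegistration): string =>
  entry.registered.slice(6);

// No type annotation: the object is just data.
const frank = {
  firstName: 'Frank',
  lastName: 'Helmsworth',
  registered: '05.11.2011',
};

// The same object matches both shapes.
console.log(fullName(frank)); // Frank Helmsworth
console.log(registrationYear(frank)); // 2011
```

Each function asks for the narrowest shape it needs, and frank fits both without ever being declared as either.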
Intermission
By now the opaque window of the developer room has cleared up. You get up, lacking the coffee to motivate yourself to continue. In passing, you hear Jim muttering curses under his breath. Barbara seems to hold on to her temples for dear life, staring in disbelief at the junior's code from the past week, while said junior furiously researches ways to center divs.
As you sit down again, armed with fresh coffee ready to directly inject it into your bloodstream, your colleague Jim (who works on a similar part of the code) erupts: "Done. Pushed. Merged."
You raise your eyebrows, concerned, as you pull in Jim's changes.
Procedural impurity and shared complexity
Before we write our own model, let's have a look at Jim's code.
type User = {
firstName: string;
lastName: string;
registered: string;
+ shortName?: string;
};
+
+const usersToSidebar = () => {
+ for (const user of users) {
+ // get the year, e.g. '2003'
+ user.registered = user.registered.slice(6);
+ // e.g. 'a.smith'
+ user.shortName = `${user.firstName[0].toLowerCase()}.${user.lastName.toLowerCase()}`;
+ }
+};
+
+usersToSidebar();
We didn't really listen in the stand-up, so we don't know precisely what Jim's feature is supposed to do, but we can guess from the comments. Let's have a look at the result.
console.log(users);
output
[
{
firstName: 'Barbara',
lastName: 'Selling',
registered: '2017',
shortName: 'b.selling'
},
{
firstName: 'John',
lastName: 'Smith',
registered: '2019',
shortName: 'j.smith'
},
{
firstName: 'Frank',
lastName: 'Helmsworth',
registered: '2011',
shortName: 'f.helmsworth'
},
{
firstName: 'Anna',
lastName: 'Freeman',
registered: '2003',
shortName: 'a.freeman'
},
{
firstName: 'Damian',
lastName: 'Sipes',
registered: '2001',
shortName: 'd.sipes'
},
{
firstName: 'Mara',
lastName: 'Homenick',
registered: '2007',
shortName: 'm.homenick'
}
]
This may work for Jim, but we have a problem. Jim's procedure freely mutates the data (i.e. changes it in place). If we want to derive our own view model from users, we have to do it before Jim's procedure runs. Otherwise his mutation of the registered field makes it impossible to derive our own transformation of that field.
Note: We're intentionally calling usersToSidebar a procedure. There is a critical difference between a function and a procedure. A function merely takes values as inputs and returns values as outputs -- all without affecting its environment, e.g. by changing inputs in place. A procedure may run a series of statements that change its environment or its inputs, and it may have side effects other than returning values.
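To make the distinction concrete, here is a minimal sketch that extracts the registration year twice, once as a procedure and once as a function. The names truncateRegistered and registeredYear are hypothetical and exist only for this illustration.

```typescript
type User = {firstName: string; lastName: string; registered: string};

// Procedure: overwrites `registered` on the object it was given.
const truncateRegistered = (user: User): void => {
  user.registered = user.registered.slice(6);
};

// Function: reads its input and returns a new value.
const registeredYear = (user: User): string => user.registered.slice(6);

const john: User = {
  firstName: 'John',
  lastName: 'Smith',
  registered: '12.24.2019',
};

console.log(registeredYear(john)); // 2019
console.log(registeredYear(john)); // 2019 (calling it again changes nothing)
console.log(john.registered); // 12.24.2019

truncateRegistered(john);
console.log(john.registered); // 2019
truncateRegistered(john);
console.log(john.registered); // prints an empty line: the year is gone too
```

The function can be called any number of times with the same result. The procedure gives a different result on every call because each call destroys part of its own input.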
Let's explore this issue a bit more. We'll define our own model-deriving procedure and see what happens.
First, we extend User with an optional initials field.
type User = {
firstName: string;
lastName: string;
registered: string;
shortName?: string;
+ initials?: string;
};
Then we add our own procedure and execute it after Jim's.
const usersToAdmin = () => {
for (const user of users) {
// resolves to format: EEEE, d MMMM YYYY
// e.g. "Wednesday, 9 July 2003"
user.registered = new Date(user.registered).toLocaleDateString(
'en-gb',
{
weekday: 'long',
year: 'numeric',
month: 'long',
day: 'numeric',
},
);
// e.g. 'AF'
user.initials = `${user.firstName[0]}${user.lastName[0]}`;
}
};
usersToSidebar();
usersToAdmin();
console.log(users);
output
[
{
firstName: 'Barbara',
lastName: 'Selling',
registered: 'Sunday, 1 January 2017',
shortName: 'b.selling',
initials: 'BS'
},
{
firstName: 'John',
lastName: 'Smith',
registered: 'Tuesday, 1 January 2019',
shortName: 'j.smith',
initials: 'JS'
},
{
firstName: 'Frank',
lastName: 'Helmsworth',
registered: 'Saturday, 1 January 2011',
shortName: 'f.helmsworth',
initials: 'FH'
},
{
firstName: 'Anna',
lastName: 'Freeman',
registered: 'Wednesday, 1 January 2003',
shortName: 'a.freeman',
initials: 'AF'
},
{
firstName: 'Damian',
lastName: 'Sipes',
registered: 'Monday, 1 January 2001',
shortName: 'd.sipes',
initials: 'DS'
},
{
firstName: 'Mara',
lastName: 'Homenick',
registered: 'Monday, 1 January 2007',
shortName: 'm.homenick',
initials: 'MH'
}
]
Yikes, look at registered. The dates are all stuck on 1 January because our procedure transforms registered after it was cut down to just the year by usersToSidebar. Let's flip the order of invocation, just to see if that would fix it.
-usersToSidebar();
usersToAdmin();
+usersToSidebar();
console.log(users);
output
[
{
firstName: 'Barbara',
lastName: 'Selling',
registered: 'y, 3 January 2017',
initials: 'BS',
shortName: 'b.selling'
},
{
firstName: 'John',
lastName: 'Smith',
registered: 'y, 24 December 2019',
initials: 'JS',
shortName: 'j.smith'
},
{
firstName: 'Frank',
lastName: 'Helmsworth',
registered: 'day, 11 May 2011',
initials: 'FH',
shortName: 'f.helmsworth'
},
{
firstName: 'Anna',
lastName: 'Freeman',
registered: 'day, 9 July 2003',
initials: 'AF',
shortName: 'a.freeman'
},
{
firstName: 'Damian',
lastName: 'Sipes',
registered: 'day, 12 December 2001',
initials: 'DS',
shortName: 'd.sipes'
},
{
firstName: 'Mara',
lastName: 'Homenick',
registered: 'y, 14 August 2007',
initials: 'MH',
shortName: 'm.homenick'
}
]
Oof. Completely broken. What we're seeing here is the cost of shared, mutable state. In our procedures we are reaching into the global scope and changing data in place that is used in other places. We're not preserving the integrity of the data. Like a spoiled toddler, we're rampaging through the sweets section of the supermarket, trained on that unearned treat, leaving a trail of destruction in our wake.
And we've introduced another problem on the type level: because we keep changing our original data in place, we have to stuff all our new fields into the original type. User would grow in size if we kept adding new fields for totally unrelated views.
We can imagine that writing an entire application in this style of unmanaged state mutation is an explosion of complexity.
Fortunately, we have a simple solution at hand: the function. Specifically, the pure function.
Out of the tar pit
You know what is fundamentally simple? A table:
| key | value |
|---|---|
| a | 1 |
| b | 99 |
| c | 1000 |
| d | 99999 |
So simple and so utterly boring. It's so dull, I hesitate to talk about it. But we need to, in order to make a point. So here we go:
- This table has two columns, a key and a value column.
- We can look up values via their keys.
- Though the rows may grow, at the time of accessing it, all values and their types are known.
Know what behaves like a table? A pure function!
const f = (key: 'a' | 'b' | 'c' | 'd') =>
key === 'a' ? 1
: key === 'b' ? 99
: key === 'c' ? 1000
: 99999;
- The pure function has two sets of values, input ('a' | 'b' | 'c' | 'd') and output (1 | 99 | 1000 | 99999).
- We can look up output values by passing input values.
- Though we may add values to the input and output sets, at the time of accessing it, all values and their types are known.
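The comparison with a table can be taken literally. As a sketch, the mapping below is written once as the function from above and once as plain data, and the two agree for every possible key. The names Key, table and sameEverywhere are introduced here for illustration only.

```typescript
type Key = 'a' | 'b' | 'c' | 'd';

// The function from the text, with its return type spelled out.
const f = (key: Key): number =>
  key === 'a' ? 1
  : key === 'b' ? 99
  : key === 'c' ? 1000
  : 99999;

// The same mapping written as data.
const table: Record<Key, number> = {a: 1, b: 99, c: 1000, d: 99999};

// For every key, calling the function and reading the table agree.
const keys: Key[] = ['a', 'b', 'c', 'd'];
const sameEverywhere = keys.every((key) => f(key) === table[key]);

console.log(sameEverywhere); // true
```

Because f depends on nothing but its argument, any call to it could be replaced by the corresponding table entry without changing what the program does.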
Notably, a function like this does not share any of the problems of procedures:
- It does not reach into its parent scope, all dependencies of it are declared right there in the parameters.
- It does not mutate the values it is working with, it effectively only maps inputs to outputs.
- If you run it, you don't need to fear side effects. It just returns values.
If we can somehow rewrite our problematic procedures so that we get these benefits, we will have improved the comprehensibility of our program immensely. It is quite simple, actually:
- Instead of reaching into the parent scope, explicitly define inputs as parameters.
- Instead of mutating inputs, treat them as immutable data and create copies.
- Instead of storing the outputs in shared state, return them.
Let's do that for usersToAdmin and usersToSidebar:
const usersToAdmin = (users: User[]) => {
const result = [];
for (const user of users) {
result.push({
...user,
// resolves to format: EEEE, d MMMM YYYY
// e.g. "Wednesday, 9 July 2003"
registered: new Date(user.registered).toLocaleDateString('en-gb', {
weekday: 'long',
year: 'numeric',
month: 'long',
day: 'numeric',
}),
// e.g. 'AF'
initials: `${user.firstName[0]}${user.lastName[0]}`,
});
}
return result;
};
const usersToSidebar = (users: User[]) => {
const result = [];
for (const user of users) {
result.push({
...user,
// get the year, e.g. '2003'
registered: user.registered.slice(6),
// e.g. 'a.smith'
shortName: `${user.firstName[0].toLowerCase()}.${user.lastName.toLowerCase()}`,
});
}
return result;
};
const sidebarUsers = usersToSidebar(users);
const adminUsers = usersToAdmin(users);
console.log({sidebarUsers, adminUsers});
output
{
sidebarUsers: [
{
firstName: 'Barbara',
lastName: 'Selling',
registered: '2017',
shortName: 'b.selling'
},
{
firstName: 'John',
lastName: 'Smith',
registered: '2019',
shortName: 'j.smith'
},
{
firstName: 'Frank',
lastName: 'Helmsworth',
registered: '2011',
shortName: 'f.helmsworth'
},
{
firstName: 'Anna',
lastName: 'Freeman',
registered: '2003',
shortName: 'a.freeman'
},
{
firstName: 'Damian',
lastName: 'Sipes',
registered: '2001',
shortName: 'd.sipes'
},
{
firstName: 'Mara',
lastName: 'Homenick',
registered: '2007',
shortName: 'm.homenick'
}
],
adminUsers: [
{
firstName: 'Barbara',
lastName: 'Selling',
registered: 'Tuesday, 3 January 2017',
initials: 'BS'
},
{
firstName: 'John',
lastName: 'Smith',
registered: 'Tuesday, 24 December 2019',
initials: 'JS'
},
{
firstName: 'Frank',
lastName: 'Helmsworth',
registered: 'Wednesday, 11 May 2011',
initials: 'FH'
},
{
firstName: 'Anna',
lastName: 'Freeman',
registered: 'Wednesday, 9 July 2003',
initials: 'AF'
},
{
firstName: 'Damian',
lastName: 'Sipes',
registered: 'Wednesday, 12 December 2001',
initials: 'DS'
},
{
firstName: 'Mara',
lastName: 'Homenick',
registered: 'Tuesday, 14 August 2007',
initials: 'MH'
}
]
}
Gorgeous. And the original data is untouched:
console.log(users);
output
[
{
firstName: 'Barbara',
lastName: 'Selling',
registered: '01.03.2017'
},
{ firstName: 'John', lastName: 'Smith', registered: '12.24.2019' },
{
firstName: 'Frank',
lastName: 'Helmsworth',
registered: '05.11.2011'
},
{ firstName: 'Anna', lastName: 'Freeman', registered: '07.09.2003' },
{ firstName: 'Damian', lastName: 'Sipes', registered: '12.12.2001' },
{ firstName: 'Mara', lastName: 'Homenick', registered: '08.14.2007' }
]
Our functions are now practically pure (not technically pure, but pure for all we care about right now). We can run them in any order, repeatedly. They will return the same values every time.
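One plausible reading of "not technically pure" is that the date handling still leans on the environment: how new Date interprets a non-ISO string such as '07.09.2003' is left to the JavaScript engine, and toLocaleDateString uses the host's time zone unless told otherwise. The sketch below pins both down. The helpers parseRegistered and toLongDate are hypothetical and not part of the chapter's code, and the exact formatted text still depends on the locale data shipped with the runtime.

```typescript
// Hypothetical helper: parses the chapter's 'MM.DD.YYYY' strings by hand
// instead of relying on how the runtime interprets non-ISO date strings.
const parseRegistered = (registered: string): Date => {
  const [month, day, year] = registered.split('.').map(Number);
  // Date.UTC expects a zero-based month.
  return new Date(Date.UTC(year, month - 1, day));
};

// Hypothetical helper: formats with a fixed locale and a fixed time zone,
// so the host's time zone setting does not shift the day.
const toLongDate = (date: Date): string =>
  date.toLocaleDateString('en-GB', {
    weekday: 'long',
    year: 'numeric',
    month: 'long',
    day: 'numeric',
    timeZone: 'UTC',
  });

const annaRegistered = parseRegistered('07.09.2003');

console.log(annaRegistered.toISOString()); // 2003-07-09T00:00:00.000Z
console.log(toLongDate(annaRegistered)); // long en-GB date for 9 July 2003
```

For the purposes of this chapter the simpler version is good enough, which is why we call it practically pure.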
We can also fix the type issue we talked about: Instead of stuffing fields into the User type, we can make our new models explicit by defining separate types for them:
type AdminUser = User & {
initials: string;
};
type SidebarUser = User & {
shortName: string;
};
And add them to our function signatures:
-const usersToAdmin = (users: User[]) => {
+const usersToAdmin = (users: User[]): AdminUser[] => {
-const usersToSidebar = (users: User[]) => {
+const usersToSidebar = (users: User[]): SidebarUser[] => {
Note: We're being somewhat naive here, forcing immutability by aggressively copying data. Individually, each copy is cheap, but in a large application the memory overhead adds up. Libraries built for working with immutable data (such as Immer) can alleviate some of that cost, and what remains is a trade-off functional programmers are generally willing to make.
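Copying is a convention, and the compiler can help enforce it. As a sketch, assuming we are free to tighten the parameter type, TypeScript's readonly array modifier and Readonly utility type turn an accidental in-place change into a compile error. This is a compile-time check only and adds no runtime protection.

```typescript
type User = {firstName: string; lastName: string; registered: string};
type SidebarUser = User & {shortName: string};

// The input is declared read-only: neither the array nor its elements
// may be changed inside the function.
const usersToSidebar = (users: readonly Readonly<User>[]): SidebarUser[] => {
  const result: SidebarUser[] = [];
  for (const user of users) {
    // user.registered = '2003'; // would not compile: `registered` is read-only
    result.push({
      ...user,
      // get the year, e.g. '2003'
      registered: user.registered.slice(6),
      // e.g. 'a.smith'
      shortName: `${user.firstName[0].toLowerCase()}.${user.lastName.toLowerCase()}`,
    });
  }
  return result;
};

const users: readonly User[] = [
  {firstName: 'Mara', lastName: 'Homenick', registered: '08.14.2007'},
];
const sidebarUsers = usersToSidebar(users);

console.log(sidebarUsers[0].shortName); // m.homenick
console.log(users[0].registered); // 08.14.2007
```

The returned objects are ordinary mutable copies; only the input is protected.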
There is plenty of opportunity for refactoring here, which we will explore in the very next chapter, but we can be happy with the progress we've made so far!
Next Up
In the next chapter we will refactor our code, removing some redundancies by exploring the idea of using functions as values.