Background

Pretty much everybody thinks that typing things as Any is bad.

If you ask people around you why they think so, though, you probably won’t get answers that go beyond generic statements about ‘maintainability’ and ‘readability’.

Not that these answers are wrong, but I don’t think they get to the meat of the discussion: how type systems help programmers write better programs. Answers like these suggest a lack of understanding of, or perhaps familiarity with, a core element of modern programming languages: the type system.

Just like other elements of programming languages, the type system is a tool that programmers should use to produce less error-prone programs in less time.

Unfortunately, with the popularity of dynamically-typed languages like Python and JavaScript, the average modern-day programmer’s knowledge of how to use this particular tool seems to be a bit lacking.

It is also unfortunate that many great articles on using type systems have a very high barrier to entry, such as:

“Parse, don’t validate” by Alexis King
“Type Safety Back and Forth” by Mark Parsons
“Make invalid states unrepresentable” by Jacob Lindahl

At a glance, these articles seem to assume that the reader already has knowledge of various comp-sci concepts (in particular functional programming) prior to reading. Many of the examples in these articles are even written in functional programming languages!

This creates an audience disconnect: the average programmer (who may not come from a comp-sci background) is among those who would benefit most from understanding these concepts, yet would struggle to even begin reading the articles that discuss them.

In this article, I’d like to share my point of view on the matter, in the hope that it will be more accessible to the average programmer, who is less concerned with theoretical what-ifs and more with practical benefits in their day-to-day job.

Why is Any bad?

I really like the phrase ‘think from first principles’, and I think starting from a common ground everybody is familiar with leads to a more intuitive understanding of the concepts being discussed.

Recall the first and second paragraphs of this article. Exactly why is Any bad? Let’s start with a straightforward example.

from typing import Any

def get_user_roles(users: Any, groups: Any) -> Any:
    ...  # implementation omitted

What should we pass to use this function? What does this function return? What happens if we pass the wrong thing?

There is actually a very easy way to answer every single one of these questions: Just read the code! However…

This takes a lot of time! Time that would be better spent somewhere else.
It requires a context switch in your brain to understand code at an entirely different level of abstraction from the one you were working on before.
Most crucially: The human brain can only fit so much information before it starts forgetting things.

Typing things as Any makes this function signature violate the ‘black box rule’ of abstractions: you can no longer treat the function as a ‘black box’ that you can use without also understanding its internals!

There is the separate problem of your IDE not being able to warn you when you pass the wrong thing into the function, but we’ll talk about that later.

So, does replacing Any with the following basic type annotations solve the problem?

def get_user_roles(
    users: list,
    groups: list,
) -> dict:
    ...  # implementation omitted

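I think it should be self-evident here that the answer is no.

By this point, it should be clear that the core issue is not the Any annotation itself, but how little the signature tells us: a bare list or dict says almost as little about the function as Any does.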

Continuing on

As discussed in the previous section, typing things as Any causes two core issues that you as a programmer should be concerned about:

You need to read through the function to understand what it needs and what it returns.
The IDE will not catch mistakes and issue warnings and errors.

The above list is incomplete, but let’s not overthink things for now and focus on solving these two issues first.
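A minimal sketch of both issues, using a made-up body for get_user_roles and a hypothetical MEMBERSHIPS table (neither appears in the original). Only by reading the body do we learn that users must be an iterable of user names, and because every parameter is Any, passing a bare string instead of a list is not flagged by anything. It simply produces a wrong answer.

```python
from typing import Any

# Hypothetical membership data: which user has which role in which group.
# The article does not say where roles come from; this dict is an assumption.
MEMBERSHIPS: dict[tuple[str, str], str] = {
    ("alice", "admins"): "role-owner",
    ("bob", "admins"): "role-viewer",
}


def get_user_roles(users: Any, groups: Any) -> Any:
    # Hypothetical body: keep every (user, group) pair that has a role.
    return {
        (user, group): MEMBERSHIPS[(user, group)]
        for user in users
        for group in groups
        if (user, group) in MEMBERSHIPS
    }


# Intended call: two lists of strings.
print(get_user_roles(["alice", "bob"], ["admins"]))

# Mistake: a single string instead of a list. Nothing rejects it; the function
# iterates over the characters "a", "l", "i", ... and quietly returns {}.
print(get_user_roles("alice", ["admins"]))
```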

Let’s continue on and add full types to this function signature and see if these two problems are fixed.

def get_user_roles(
    users: list[str],
    groups: list[str],
) -> dict[tuple[str, str], str]:
    ...  # implementation omitted

Now we know that the function:

Accepts a list of users and a list of groups, both as strings.
Returns a mapping from each (user, group) pair, where the user belongs to the group, to the user’s role in that group.

As a bonus, the IDE now helps us know if we’ve passed anything that is not a str to the function!

Now that we have ‘full’ type-hinting, are the two problems fully solved? Unfortunately, the answer is still no. At this point, the problems themselves are not even that obvious.

Does the function behave correctly with duplicate users? What about duplicate groups?
What kinds of strings are users and groups supposed to be?
- Is users a list of emails? IDs? Usernames?
- Is groups a list of IDs? Slugs? Names?
- Let’s assume we have rules of what IDs, slugs, and usernames should be (e.g. 1-20 characters, lowercase only). Does this function behave correctly with non-conforming strings?
In the mapping being returned, is the correct mapping key (user, group) or (group, user)?
Similar to groups, the roles in the returned mapping are of type str. Is this the ID of a role? The name? Perhaps a stringified JSON of the role details?

These are all very valid questions, and making the wrong assumption about any one of them will lead to a non-working program in the best case or, worse, a program that appears to work but breaks in spectacular ways later on.
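For example, the key-order question is easy to get wrong without noticing. In this sketch (the data is made up), a swapped key is just as acceptable to dict[tuple[str, str], str] as the correct one, because both halves of the key are plain str:

```python
# Hypothetical return value of get_user_roles, keyed by (user, group).
roles: dict[tuple[str, str], str] = {("alice", "admins"): "role-owner"}

# Correct key order.
print(roles.get(("alice", "admins")))  # role-owner

# Swapped key order. Both elements are str, so the type cannot tell a user
# from a group; the lookup quietly returns None instead of failing loudly.
print(roles.get(("admins", "alice")))  # None
```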

As you can see, not even ‘full’ typing solves the problems entirely! We still have to read through the function to answer most of the questions above!

For the sake of this article, let’s answer all of the above questions here:

The function takes non-duplicate users and groups.
The function expects only validated strings of the correct format.
- users expects a list of usernames.
- groups expects a list of slugs.
The correct mapping key is (username, group).
The returned role is a str with the ID of a role.

How do we help potential users of the function to understand the above answers without having them read through the entire function?

A lot of programmers will probably answer with “Just leave a comment” or “Write a docstring,” but I strongly disagree. Why not?

The second problem is still there. Even with comments and documentation, the IDE will not warn you when:

You pass duplicate users or groups to the function.
You pass a non-validated username or slug.
You use (group, username) as the key to access a value in the returned mapping.
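A small sketch of why documentation alone falls short, again with a hypothetical body and MEMBERSHIPS table (not from the original). The docstring states every rule, but it is invisible to both the interpreter and a type checker, so a call that breaks all of them still runs:

```python
# Hypothetical membership data; not part of the original article.
MEMBERSHIPS: dict[tuple[str, str], str] = {("alice", "admins"): "role-owner"}


def get_user_roles(
    users: list[str],
    groups: list[str],
) -> dict[tuple[str, str], str]:
    """Return {(username, group_slug): role_id}.

    users must be unique, validated usernames; groups must be unique,
    validated group slugs.
    """
    # Hypothetical body: keep every (user, group) pair that has a role.
    return {
        (user, group): MEMBERSHIPS[(user, group)]
        for user in users
        for group in groups
        if (user, group) in MEMBERSHIPS
    }


# Breaks the docstring's rules (duplicate user, unvalidated "Admins"), yet
# nothing objects. The result is simply, and silently, empty.
print(get_user_roles(["alice", "alice"], ["Admins"]))  # {}
```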

Of course, the “Just be careful” camp will always be there, and indeed, if you are careful and keep the answers to all of the above in mind, you will probably end up with a working program somehow.

The problem here is one I previously mentioned: The human brain can only fit so much information before it starts forgetting things.

If the only thing you were working on was this function, then all will probably be well. Imagine, however, that you were working at an entirely separate level of abstraction (e.g. a Django template view that renders a user/group role table) before context switching to read this function. Not feeling very confident now?

Beyond types as documentation

Up until now, we have been treating the type system as a mere documentation tool. That is, the types in the program only serve to tell the human programmer what to do. They do not prevent the human programmer from doing something that should not be done.

The following is an example of what taking ‘types as documentation’ to the extreme looks like:

Username = str
GroupStub = str
RoleId = str

def get_user_roles(
    users: list[Username],
    groups: list[GroupStub],
) -> dict[tuple[Username, GroupStub], RoleId]:
    ...  # implementation omitted

What this does is simply convert the comments/docstrings of the previously discussed answers into annotations in the program! Because Username, GroupStub, and RoleId are just other names for str, this approach is almost no better than leaving comments: you can still pass any string to the function without the IDE warning you.
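A quick way to convince yourself of this: a plain alias is the very same object as str, so the ‘distinct’ names are interchangeable, and any string, valid or not, is accepted.

```python
Username = str
GroupStub = str
RoleId = str

# The aliases are literally the same object, so a checker sees no difference
# between a Username and a GroupStub (or any other str).
print(Username is str)        # True
print(Username is GroupStub)  # True

# A string that breaks the username rules is accepted without complaint.
bad: Username = "Not A Valid Username!"
```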

Where do we go next? First, let’s focus on the problem of duplicate users and groups. We can use set to force the function to only accept non-duplicate objects.

def get_user_roles(
    users: set[Username],
    groups: set[GroupStub],
) -> dict[tuple[Username, GroupStub], RoleId]:
    ...  # implementation omitted

Great! That’s one problem solved.

This approach is described in Type Safety Back and Forth as ‘pushing the responsibility for failure back’ to the caller of the function.
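In practice, pushing the responsibility back means the caller has to decide what duplicates mean before making the call. A short sketch of two options, using hypothetical data and a hypothetical unique_or_raise helper (neither is from the article):

```python
# Hypothetical raw input, e.g. from a form or an API request.
raw_users = ["alice", "bob", "alice"]

# Option 1: collapse duplicates. Note that a set does not preserve order.
users: set[str] = set(raw_users)
print(sorted(users))  # ['alice', 'bob']


# Option 2 (hypothetical helper): treat duplicates as an error instead of
# silently dropping them.
def unique_or_raise(items: list[str]) -> set[str]:
    result = set(items)
    if len(result) != len(items):
        raise ValueError("duplicate entries in input")
    return result


print(unique_or_raise(["alice", "bob"]) == {"alice", "bob"})  # True
```

Either way, the function itself no longer has to guess.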

What more can we do? We can use the approach described in Parse, don’t validate and set up classes that ‘parse’ strings into valid usernames and group stubs. We can then use those classes in the function’s type annotations to only allow valid usernames and group stubs!

Here is the updated version of our function with the relevant classes, using the so-called ‘smart constructor’ pattern taken from Haskell to simulate distinct types and prevent intermixing of semantically different objects:

class Username(str):
    def __new__(cls, username: str):
        if not 1 <= len(username) <= 20:
            raise ValueError("Length must be in [1, 20]")
        if username.lower() != username:
            raise ValueError("Contents must be lowercase")
        return super().__new__(cls, username)

class GroupStub(str):
    def __new__(cls, stub: str):
        if not 1 <= len(stub) <= 20:
            raise ValueError("Length must be in [1, 20]")
        if stub.lower() != stub:
            raise ValueError("Contents must be lowercase")
        return super().__new__(cls, stub)

RoleId = str

def get_user_roles(
    users: set[Username],
    groups: set[GroupStub],
) -> dict[tuple[Username, GroupStub], RoleId]:
    ...  # implementation omitted

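Now, as long as we check the code with a static type checker (Python itself does not enforce annotations at runtime), we are guaranteed that: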

The function will only ever accept unique values (enforced by set).
The function will only ever accept validated values.
The returned dict will only ever accept (Username, GroupStub) as key.

The IDE will warn us if we pass a list (which may contain duplicates) to the function, and it will also tell us to first construct a Username or GroupStub from our str before passing it to the function!
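One caveat of the str-subclass approach, sketched below with the same Username class: a Username is still a str, so it works anywhere a str is expected, but str methods and operators return plain str. Any derived value has to go through the constructor again, and a type checker will insist on that when the value is passed where a Username is required.

```python
class Username(str):
    def __new__(cls, username: str) -> "Username":
        if not 1 <= len(username) <= 20:
            raise ValueError("Length must be in [1, 20]")
        if username.lower() != username:
            raise ValueError("Contents must be lowercase")
        return super().__new__(cls, username)


name = Username("alice")

# A Username is still a str, so it can go anywhere a str is accepted...
print(isinstance(name, str))  # True

# ...but derived values are plain str, not Username, and must be re-parsed.
print(type(name + "_2").__name__)   # str
print(type(name.upper()).__name__)  # str
renamed = Username(name + "_2")     # re-validated explicitly
```

Losing the type here is arguably a feature: a modified username is not guaranteed to still be a valid one.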

This new function signature now conveys a lot of information while also helping the IDE catch many accidental mistakes.

Closing

I hope this article was approachable!

We’ve gone from Any to meaningful types for our example function, but there are a lot of places where similar modifications could be done in real code.

As always, consider whether the time you spend enforcing semantically correct types is worth it. A throwaway script does not need 100% perfect types, and PoCs can of course skip some of the extra checks in the name of speed.

Anyway, if you’re interested and willing to read through more complex, theoretical articles, I would recommend reading through all the external links in this article. Everything in this blog is inspired by at least one such article.

That will be all for this article. Thank you for reading to the end!

Types: Beyond “Any is Bad”
