
Unsoundness with typing.IO and friends #1994


Open
JelleZijlstra opened this issue May 6, 2025 · 3 comments
Labels
topic: other Other topics not covered

Comments

@JelleZijlstra
Member

The classes typing.IO, typing.BinaryIO, and typing.TextIO look like they want to be Protocols or ABCs, but in fact they are defined as regular generic classes, both at runtime and in typeshed. However, they are meant to encompass the concrete IO classes defined in the io module, and in typeshed we implement that by having the io classes inherit from the typing.*IO classes, even though there is no such inheritance at runtime.

This can easily lead to unsound behavior:

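```python
import io, typing

def f(x: int | io.BytesIO) -> int:
    if isinstance(x, typing.BinaryIO):
        # Type checkers narrow x to io.BytesIO here; at runtime this branch is never taken.
        return x.fileno()
    # Type checkers narrow x to int here, but at runtime x can still be a BytesIO.
    return x

f(io.BytesIO()) + 1  # boom: type checkers accept this, but it raises TypeError at runtime
```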

Type checkers think BytesIO is a subclass of BinaryIO, because that's how it's defined in typeshed, but in fact it isn't at runtime.
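The runtime side of that mismatch can be checked directly. This short snippet only inspects the standard library classes named in the issue; it shows why the `isinstance()` branch in the example above is never taken.

```python
import io
import typing

# At runtime typing.BinaryIO is an ordinary generic class, and io.BytesIO
# does not inherit from it, so neither check below succeeds.
print(issubclass(io.BytesIO, typing.BinaryIO))    # False
print(typing.BinaryIO in io.BytesIO.__mro__)      # False
print(isinstance(io.BytesIO(), typing.BinaryIO))  # False
```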

I can see a few solutions:

  1. Special-case typing.*IO in the spec and say that type checkers should reject isinstance()/issubclass() calls involving them.
  2. Deprecate the typing.*IO classes and eventually remove them, nudging people to use their own Protocols (or the new io.Reader/io.Writer) instead. This is conceptually clean but may be annoying for a lot of users; the typing classes are nice to use in simple application code.
  3. Make these classes actually (runtime-checkable?) Protocols at runtime, though they would be unwieldy.

Even if we do (2) or (3), type checkers might still want to do (1), since it will be a while before the relevant runtime changes take effect.
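To make option (1) concrete, here is a minimal sketch of the kind of check it implies, written as a standalone `ast`-based lint rather than as part of any real type checker. The function name `find_unsafe_io_checks` is hypothetical, and the sketch assumes the classes are spelled `typing.IO`, `typing.BinaryIO` or `typing.TextIO`. It does not follow `from typing import BinaryIO` aliases.

```python
import ast

# Names whose isinstance()/issubclass() use is unsound (assumed spelled typing.X).
_UNSAFE_IO_NAMES = {"IO", "BinaryIO", "TextIO"}

def find_unsafe_io_checks(source: str) -> list[int]:
    """Return line numbers of isinstance()/issubclass() calls whose second
    argument mentions typing.IO, typing.BinaryIO or typing.TextIO."""
    hits: list[int] = []
    for node in ast.walk(ast.parse(source)):
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Name)
            and node.func.id in {"isinstance", "issubclass"}
            and len(node.args) == 2
        ):
            # Walk the second argument so tuples like (int, typing.IO) are covered.
            for sub in ast.walk(node.args[1]):
                if (
                    isinstance(sub, ast.Attribute)
                    and isinstance(sub.value, ast.Name)
                    and sub.value.id == "typing"
                    and sub.attr in _UNSAFE_IO_NAMES
                ):
                    hits.append(node.lineno)
                    break
    return hits

SNIPPET = """\
import io, typing

def f(x: int | io.BytesIO) -> int:
    if isinstance(x, typing.BinaryIO):
        return x.fileno()
    return x
"""
print(find_unsafe_io_checks(SNIPPET))  # [4]
```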

(Noticed this while looking into python/cpython#133492.)

@JelleZijlstra added the topic: other label May 6, 2025
@srittau
Collaborator

srittau commented May 6, 2025

I still tend towards (2), although easy protocol composition would go a long way here. Reader and Writer were originally intended to be broader to encompass most cases, but became smaller in the process (which is probably for the best). But in python/cpython#133492 I suggested that we add a fairly broad protocol to _typeshed for now; maybe that's a good first step.
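Protocol composition today means declaring a new Protocol that inherits from the smaller ones. The sketch below is an illustration under stated assumptions: `BytesReader`, `BytesWriter`, `BytesReadWriter` and `OnlyReads` are hypothetical, bytes-only stand-ins written in the spirit of `io.Reader`/`io.Writer` (which only exist from Python 3.14). They are not those classes.

```python
import io
from typing import Protocol, runtime_checkable

class BytesReader(Protocol):
    def read(self, size: int = -1, /) -> bytes: ...

class BytesWriter(Protocol):
    def write(self, data: bytes, /) -> int: ...

@runtime_checkable
class BytesReadWriter(BytesReader, BytesWriter, Protocol):
    """Composed protocol: anything that has both read() and write()."""

class OnlyReads:
    # Hypothetical object that satisfies BytesReader but not BytesWriter.
    def read(self, size: int = -1, /) -> bytes:
        return b""

print(isinstance(io.BytesIO(), BytesReadWriter))  # True
print(isinstance(OnlyReads(), BytesReadWriter))    # False
```

Each such combination has to be spelled out as its own class, which is the friction that "easy protocol composition" would remove.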

I also think we should look into deprecating BinaryIO and TextIO first, before looking into IO. BinaryIO is basically identical to IO[bytes] (except that IO[bytes].__enter__ returns IO, not BinaryIO). TextIO has a few more members, but they all look highly situational.
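For code that currently uses `BinaryIO`, switching to `IO[bytes]` is usually just an annotation change. In this minimal sketch, `read_header` is a hypothetical function. The comment about `with` reflects the typeshed `__enter__` difference described in the comment above.

```python
import io
from typing import IO

def read_header(stream: IO[bytes], n: int = 4) -> bytes:
    # Annotated with IO[bytes] instead of BinaryIO; io.BytesIO and files
    # opened in "rb" mode are accepted either way.
    with stream as s:  # per typeshed, s is IO[bytes] here (BinaryIO if the parameter were BinaryIO)
        return s.read(n)

print(read_header(io.BytesIO(b"\x89PNG\r\n")))  # b'\x89PNG'
```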

@cmaloney

For these, is there any way to differentiate or expose the difference in .write() behavior between "Raw I/O" (e.g. io.FileIO) and high-level I/O (BufferedIO and TextIO)? For "Raw I/O", if the write() system call only writes part of the buffer, the write isn't retried, while higher-level I/O will loop until either the whole buffer is written or an exception is raised (non-blocking fds have extra caveats).
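The difference can be reproduced without real file descriptors. In this minimal sketch, `ChunkyRaw` is a hypothetical `io.RawIOBase` subclass whose `write()` accepts at most 3 bytes per call, mimicking a partial `write()` system call. A direct raw write reports the short count and stops, while `io.BufferedWriter` keeps calling the raw `write()` on flush until everything is written.

```python
import io

class ChunkyRaw(io.RawIOBase):
    """Hypothetical raw stream whose write() accepts at most 3 bytes per call."""

    def __init__(self) -> None:
        super().__init__()
        self.written = bytearray()

    def writable(self) -> bool:
        return True

    def write(self, b) -> int:
        chunk = bytes(b[:3])  # simulate a partial write
        self.written += chunk
        return len(chunk)

raw = ChunkyRaw()
print(raw.write(b"hello world"))  # 3: the raw write is not retried

buffered = io.BufferedWriter(ChunkyRaw())
print(buffered.write(b"hello world"))  # 11: everything lands in the buffer
buffered.flush()  # flush loops over raw.write() until the buffer is empty
print(bytes(buffered.raw.written))  # b'hello world'
```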

@JelleZijlstra
Member Author

I don't think so; in general, typing can only cover information in the method's interface (what parameters it takes, of what types, and what it returns), not more subtle aspects of behavior.
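A small check shows the point. Seen through a Protocol, the raw and buffered writers have the same interface. `SupportsWriteBytes` is a hypothetical protocol used only for illustration. Both classes satisfy it even though only the buffered one retries partial writes.

```python
import io
from typing import Protocol, runtime_checkable

@runtime_checkable
class SupportsWriteBytes(Protocol):
    """Hypothetical protocol: typing only sees write(bytes) -> int."""

    def write(self, data: bytes, /) -> int: ...

# Same structural interface, different retry behavior at runtime.
print(issubclass(io.FileIO, SupportsWriteBytes))          # True
print(issubclass(io.BufferedWriter, SupportsWriteBytes))  # True
```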
