Enhancing the Python ecosystem with type checking and free threading
Meta and Quansight have improved key libraries in the Python ecosystem. There is plenty more to do, and we invite the community to help with our efforts.
We’ll look at two key efforts in Python’s packaging ecosystem to make packages faster and easier to use:
- Unlock performance wins for developers through free-threaded Python, which builds on Python 3.13's experimental support for running threads in parallel without the Global Interpreter Lock (GIL).
- Increase developer velocity in the IDE with improved type annotations.
Enhancing typed Python in the Python scientific stack
Type hints, introduced in Python 3.5 with PEP 484, let developers declare the expected types of variables, parameters, and return values, making code easier to understand without affecting runtime behavior. Type checkers validate these annotations, helping prevent bugs and improving IDE features like autocomplete and jump-to-definition. Despite their benefits, adoption is inconsistent across the open source ecosystem, with varied approaches to specifying and maintaining type annotations.
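As a minimal illustration of the point above, the sketch below annotates a small function and then calls it with an argument that does not match the declared type. The function name `mean` and its data are hypothetical and chosen only for this example.

```python
from typing import Optional


def mean(values: list[float]) -> Optional[float]:
    """Return the arithmetic mean, or None for an empty input."""
    if not values:
        return None
    return sum(values) / len(values)


# Annotations are stored as metadata and are not enforced at runtime.
# This call passes a tuple of ints where list[float] is declared, yet it runs.
# A static type checker would typically report the mismatch before execution.
result = mean((1, 2, 3))  # type: ignore[arg-type]

# The declared types remain available for tools to inspect.
declared = mean.__annotations__
```

The same annotations are what an IDE reads to offer autocomplete on the return value.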
The landscape of open source software is fractured with respect to how type annotations are specified, maintained, and distributed to end users. Some projects have inline annotations (types declared directly in the source code), others keep types in stub files, and many projects have no types at all, relying on third-party repositories such as typeshed to provide community-maintained stubs. Each approach has its own pros and cons, but they have been applied and maintained inconsistently.
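To make the difference between inline annotations and stub files concrete, here is a small sketch. The `area` function and the stub text are hypothetical. The stub is held in a string only so that the example stays self-contained; in a real project it would live in a separate `.pyi` file next to, or shipped apart from, the implementation.

```python
import ast


# Inline approach: the annotations live in the implementation itself.
def area(width: float, height: float) -> float:
    return width * height


# Stub approach: a separate file carries only the signatures, with "..." bodies.
STUB_SOURCE = "def area(width: float, height: float) -> float: ...\n"

# Stub files use ordinary Python syntax, so the standard parser can read them.
stub_function = ast.parse(STUB_SOURCE).body[0]
stub_params = [arg.arg for arg in stub_function.args.args]
```

Because the stub is maintained separately, nothing forces it to stay in sync with the implementation, which is one of the maintenance costs mentioned above.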
Meta and Quansight are addressing this inconsistency through:
- Direct contributions: We have improved the type coverage for pandas-stubs and numpy, and are eager to expand the effort to more packages.
- Community engagement: We promote type annotation efforts to encourage community involvement, listen to feedback, and create actionable ways to improve the ecosystem.
- Tooling and automation: We develop tools to address common challenges in adding types and keeping them up to date with the source code.
Improved type annotations in pandas
TL;DR: Pandas is the second most downloaded package from the Python scientific stack. We improved the type annotation coverage of the pandas-stubs package from 36% to over 50%.
Background
The pandas community maintains its own stubs in a separate repository, which must be installed to obtain type annotations. Although these stubs are checked separately from the source code, they allow the community to use pandas types with their own type checkers and IDEs.
Improving type coverage
When we began our work in pandas-stubs, coverage was around 36%, as measured by the percentage of parameters, returns, and attributes that had a complete type annotation (the annotation is present and all generics have type arguments). After several weeks of work and about 30 PRs, type completeness is now measured at over 50%. The majority of our contributions involved adding annotations to previously untyped parameters, adding type arguments to raw generic types, and removing deprecated or undocumented interfaces. We also improved several inaccurate annotations and updated others to match the inline annotations in the pandas source code.
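The coverage metric described above can be approximated with a short script. This is a simplified sketch, not the tooling used for the measurements in this post: it counts only parameters and return types (attributes are left out for brevity), and the `RAW_GENERICS` set and the sample stub are assumptions made for illustration.

```python
import ast
from typing import Optional

# Hypothetical set of generic types that need type arguments to count as complete.
RAW_GENERICS = {"list", "dict", "set", "tuple", "Series"}


def is_complete(annotation: Optional[ast.expr]) -> bool:
    """Complete means: present, and no generic appears without type arguments."""
    if annotation is None:
        return False
    # Collect the names that are subscripted, e.g. the `list` in `list[int]`.
    subscripted = {
        id(node.value) for node in ast.walk(annotation) if isinstance(node, ast.Subscript)
    }
    for node in ast.walk(annotation):
        if isinstance(node, ast.Name) and node.id in RAW_GENERICS and id(node) not in subscripted:
            return False
    return True


def coverage(source: str) -> float:
    """Fraction of parameters and return types that carry a complete annotation."""
    total = complete = 0
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            continue
        args = node.args
        params = args.posonlyargs + args.args + args.kwonlyargs
        params += [extra for extra in (args.vararg, args.kwarg) if extra is not None]
        for param in params:
            if param.arg in ("self", "cls"):
                continue  # conventionally left unannotated
            total += 1
            complete += is_complete(param.annotation)
        total += 1  # the return annotation
        complete += is_complete(node.returns)
    return complete / total if total else 1.0


# Hypothetical stub: 3 of its 6 parameters and returns are completely annotated.
STUB = '''
def head(data: Series, n: int = 5) -> Series[int]: ...
def load(path, sep: str = ",") -> list: ...
'''

print(f"{coverage(STUB):.0%}")
```

In this sample, `data: Series` and `-> list` are present but count as incomplete because the generics have no type arguments, while `path` has no annotation at all.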
Key introductions
Two key introductions significantly increased coverage:
- Replacing raw Series types with UnknownSeries, a new type aliased to Series[Any]. When applied to return type annotations, this reduces the number of type checker false-positives when the function is called.
- Improving types of core DataFrame operations like insert, combine, replace, transpose, and assign, as well as many timestamp and time-zone-related APIs.
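The alias idea from the first item above can be sketched without pandas. The `Series` class below is a hypothetical stand-in, not the pandas class, and the `column` and `total` functions exist only for illustration. Only the alias name `UnknownSeries` and its definition as `Series[Any]` come from the description above.

```python
from typing import Any, Generic, TypeVar, get_args

T = TypeVar("T")


class Series(Generic[T]):
    """Hypothetical stand-in for a generic Series type."""

    def __init__(self, values: list[T]) -> None:
        self.values = values


# A Series whose element type is explicitly unknown.
UnknownSeries = Series[Any]


def column_raw(name: str) -> Series:  # raw generic: no type argument given
    return Series([1, 2, 3])


def column(name: str) -> UnknownSeries:  # type argument present, so the annotation is complete
    return Series([1, 2, 3])


def total(series: Series[int]) -> int:
    return sum(series.values)


# Any is compatible with every type, so passing an UnknownSeries where
# Series[int] is expected is accepted by a type checker.
print(total(column("a")))
```

Spelling out `Series[Any]` records that the element type is deliberately unspecified, rather than leaving the checker to infer what a bare `Series` means.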
Tooling development
In addition to improving coverage directly, we developed tooling to catalog public interfaces missing annotations. We also augmented our tools for measuring type coverage to handle the situation where stubs are distributed independently, rather than being packaged into the core library wheel.
What is free-threaded Python?
Free-threaded Python (FTP) is an experimental build of CPython that allows multiple threads to interact with the VM in parallel. Previously, access to the VM required holding the global interpreter lock (GIL), thereby serializing execution of concurrently running threads. With the GIL becoming optional, developers will be able to take full advantage of multi-core processors and write truly parallel code.
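Because free-threading is a separate build, it can be useful to check which interpreter is running. The sketch below relies on the `Py_GIL_DISABLED` build variable and on `sys._is_gil_enabled()`, which is available from Python 3.13. The helper name `free_threading_status` is hypothetical, and the fallback for older versions is an assumption of this example.

```python
import sys
import sysconfig


def free_threading_status() -> dict[str, bool]:
    """Report whether this interpreter is a free-threaded build and whether the GIL is on."""
    # Py_GIL_DISABLED is set in the build configuration of free-threaded builds.
    free_threaded_build = bool(sysconfig.get_config_var("Py_GIL_DISABLED"))
    # sys._is_gil_enabled() exists on Python 3.13+; older versions always have the GIL.
    gil_check = getattr(sys, "_is_gil_enabled", None)
    gil_enabled = gil_check() if gil_check is not None else True
    return {"free_threaded_build": free_threaded_build, "gil_enabled": gil_enabled}


print(free_threading_status())
```

The two values are reported separately because a free-threaded build can still run with the GIL enabled at runtime.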
Benefits of free-threaded Python
The benefits of free-threaded Python are numerous:
- True parallelism in a single process: With the GIL removed, developers can write Python code that takes full advantage of multi-core processors without needing to use multiple processes. CPU-bound code can execute in parallel across multiple cores.
- Improved performance: By allowing multiple threads to execute Python code simultaneously, work can be effectively distributed across multiple threads inside a single process.
- Simplified concurrency: Free-threading provides developers with a more ergonomic way to write parallel programs in Python. Gone are the days of needing to use multiprocessing.Pool and/or resorting to custom shared memory data structures to efficiently share data between worker processes.
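The points above can be illustrated with a small CPU-bound workload spread across threads in a single process. This is a sketch with hypothetical function names. The code gives the same results on any recent Python, and whether the threads actually run in parallel depends on using a free-threaded build with the GIL disabled.

```python
from concurrent.futures import ThreadPoolExecutor


def count_primes(limit: int) -> int:
    """CPU-bound work: count the primes below `limit` by trial division."""
    count = 0
    for n in range(2, limit):
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    return count


def run_in_threads(limits: list[int], workers: int = 4) -> list[int]:
    """Run one count_primes call per limit on a shared thread pool."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Threads share memory, so no pickling or shared-memory setup is needed.
        return list(pool.map(count_primes, limits))


print(run_in_threads([10, 100, 1000]))
```

With the GIL enabled, the same code runs correctly but the threads take turns executing Python bytecode, so a CPU-bound workload like this one sees little benefit from the extra threads.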
Getting Python’s ecosystem ready for FTP
The ecosystem of Python packages must work well with free-threaded Python in order for it to be practically useful; application owners can’t use free-threading unless their dependencies work well with it. To that end, we have been taking a “bottom-up” approach, tackling the most difficult and most popular packages in the ecosystem first. We’ve added free-threading support to many of the most popular packages used for scientific computing (e.g. numpy, scipy, scikit-learn) and language bindings (e.g. Cython, nanobind, pybind11, PyO3).
Just getting started
Together, we have made substantial progress in improving type annotations and free-threading compatibility in Python libraries. We couldn’t have done it without the Python community, and we are asking others to join our efforts. Whether it’s further updates to the type annotations or preparing your code for FTP, we value your help moving the Python ecosystem forward!
The post Enhancing the Python ecosystem with type checking and free threading appeared first on Engineering at Meta.