Commit 553d40f

ambv, encukou, sethmlarson, AA-Turner, and serhiy-storchaka committed
pythongh-135034: Normalize link targets in tarfile, add os.path.realpath(strict='allow_missing') (python#135037)
Addresses CVEs 2024-12718, 2025-4138, 2025-4330, and 2025-4517.

Signed-off-by: Łukasz Langa <[email protected]>
Co-authored-by: Petr Viktorin <[email protected]>
Co-authored-by: Seth Michael Larson <[email protected]>
Co-authored-by: Adam Turner <[email protected]>
Co-authored-by: Serhiy Storchaka <[email protected]>
(cherry picked from commit 3612d8f)
1 parent 78fd7ce commit 553d40f

11 files changed: +967 -170 lines changed

Doc/library/os.path.rst

Lines changed: 29 additions & 3 deletions
@@ -408,9 +408,26 @@ the :mod:`glob` module.)
    system). On Windows, this function will also resolve MS-DOS (also called 8.3)
    style names such as ``C:\\PROGRA~1`` to ``C:\\Program Files``.

-   If a path doesn't exist or a symlink loop is encountered, and *strict* is
-   ``True``, :exc:`OSError` is raised. If *strict* is ``False`` these errors
-   are ignored, and so the result might be missing or otherwise inaccessible.
+   By default, the path is evaluated up to the first component that does not
+   exist, is a symlink loop, or whose evaluation raises :exc:`OSError`.
+   All such components are appended unchanged to the existing part of the path.
+
+   Some errors that are handled this way include "access denied", "not a
+   directory", or "bad argument to internal function". Thus, the
+   resulting path may be missing or inaccessible, may still contain
+   links or loops, and may traverse non-directories.
+
+   This behavior can be modified by keyword arguments:
+
+   If *strict* is ``True``, the first error encountered when evaluating the path is
+   re-raised.
+   In particular, :exc:`FileNotFoundError` is raised if *path* does not exist,
+   or another :exc:`OSError` if it is otherwise inaccessible.
+
+   If *strict* is :py:data:`os.path.ALLOW_MISSING`, errors other than
+   :exc:`FileNotFoundError` are re-raised (as with ``strict=True``).
+   Thus, the returned path will not contain any symbolic links, but the named
+   file and some of its parent directories may be missing.

    .. note::
       This function emulates the operating system's procedure for making a path
@@ -429,6 +446,15 @@ the :mod:`glob` module.)
    .. versionchanged:: 3.10
       The *strict* parameter was added.

+   .. versionchanged:: next
+      The :py:data:`~os.path.ALLOW_MISSING` value for the *strict* parameter
+      was added.
+
+.. data:: ALLOW_MISSING
+
+   Special value used for the *strict* argument in :func:`realpath`.
+
+   .. versionadded:: next

 .. function:: relpath(path, start=os.curdir)

Doc/library/tarfile.rst

Lines changed: 20 additions & 0 deletions
@@ -255,6 +255,15 @@ The :mod:`tarfile` module defines the following exceptions:
    Raised to refuse extracting a symbolic link pointing outside the destination
    directory.

+.. exception:: LinkFallbackError
+
+   Raised to refuse emulating a link (hard or symbolic) by extracting another
+   archive member, when that member would be rejected by the filter location.
+   The exception that was raised to reject the replacement member is available
+   as :attr:`!BaseException.__context__`.
+
+   .. versionadded:: next
+

 The following constants are available at the module level:

@@ -1068,6 +1077,12 @@ reused in custom filters:
    Implements the ``'data'`` filter.
    In addition to what ``tar_filter`` does:

+   - Normalize link targets (:attr:`TarInfo.linkname`) using
+     :func:`os.path.normpath`.
+     Note that this removes internal ``..`` components, which may change the
+     meaning of the link if the path in :attr:`!TarInfo.linkname` traverses
+     symbolic links.
+
    - :ref:`Refuse <tarfile-extraction-refuse>` to extract links (hard or soft)
      that link to absolute paths, or ones that link outside the destination.

@@ -1099,6 +1114,10 @@ reused in custom filters:
    Note that this filter does not block *all* dangerous archive features.
    See :ref:`tarfile-further-verification` for details.

+   .. versionchanged:: next
+
+      Link targets are now normalized.
+

 .. _tarfile-extraction-refuse:

@@ -1127,6 +1146,7 @@ Here is an incomplete list of things to consider:
 * Extract to a :func:`new temporary directory <tempfile.mkdtemp>`
   to prevent e.g. exploiting pre-existing links, and to make it easier to
   clean up after a failed extraction.
+* Disallow symbolic links if you do not need the functionality.
 * When working with untrusted data, use external (e.g. OS-level) limits on
   disk, memory and CPU usage.
 * Check filenames against an allow-list of characters

Doc/whatsnew/3.14.rst

Lines changed: 32 additions & 0 deletions
@@ -1608,6 +1608,16 @@ os
   (Contributed by Cody Maloney in :gh:`129205`.)


+os.path
+-------
+
+* The *strict* parameter to :func:`os.path.realpath` accepts a new value,
+  :data:`os.path.ALLOW_MISSING`.
+  If used, errors other than :exc:`FileNotFoundError` will be re-raised;
+  the resulting path can be missing but it will be free of symlinks.
+  (Contributed by Petr Viktorin for :cve:`2025-4517`.)
+
+
 pathlib
 -------

@@ -1796,6 +1806,28 @@ sysconfig
   (Contributed by Xuehai Pan in :gh:`131799`.)


+tarfile
+-------
+
+* :func:`~tarfile.data_filter` now normalizes symbolic link targets in order to
+  avoid path traversal attacks.
+  (Contributed by Petr Viktorin in :gh:`127987` and :cve:`2025-4138`.)
+* :func:`~tarfile.TarFile.extractall` now skips fixing up directory attributes
+  when a directory was removed or replaced by another kind of file.
+  (Contributed by Petr Viktorin in :gh:`127987` and :cve:`2024-12718`.)
+* :func:`~tarfile.TarFile.extract` and :func:`~tarfile.TarFile.extractall`
+  now (re-)apply the extraction filter when substituting a link (hard or
+  symbolic) with a copy of another archive member, and when fixing up
+  directory attributes.
+  The former raises a new exception, :exc:`~tarfile.LinkFallbackError`.
+  (Contributed by Petr Viktorin for :cve:`2025-4330` and :cve:`2024-12718`.)
+* :func:`~tarfile.TarFile.extract` and :func:`~tarfile.TarFile.extractall`
+  no longer extract rejected members when
+  :func:`~tarfile.TarFile.errorlevel` is zero.
+  (Contributed by Matt Prodani and Petr Viktorin in :gh:`112887`
+  and :cve:`2025-4435`.)
+
+
 threading
 ---------

Lib/genericpath.py

Lines changed: 10 additions & 1 deletion
@@ -8,7 +8,7 @@

 __all__ = ['commonprefix', 'exists', 'getatime', 'getctime', 'getmtime',
            'getsize', 'isdevdrive', 'isdir', 'isfile', 'isjunction', 'islink',
-           'lexists', 'samefile', 'sameopenfile', 'samestat']
+           'lexists', 'samefile', 'sameopenfile', 'samestat', 'ALLOW_MISSING']


 # Does a path exist?
@@ -189,3 +189,12 @@ def _check_arg_types(funcname, *args):
                             f'os.PathLike object, not {s.__class__.__name__!r}') from None
     if hasstr and hasbytes:
         raise TypeError("Can't mix strings and bytes in path components") from None
+
+# A singleton with a true boolean value.
+@object.__new__
+class ALLOW_MISSING:
+    """Special value for use in realpath()."""
+    def __repr__(self):
+        return 'os.path.ALLOW_MISSING'
+    def __reduce__(self):
+        return self.__class__.__name__
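
The `@object.__new__` decorator above replaces the class with a single instance of itself, and that instance is truthy because the class defines neither `__bool__` nor `__len__`. A minimal sketch of the same pattern follows; the names `EXAMPLE_FLAG` and `describe` are hypothetical and do not appear in the commit.

```python
@object.__new__
class EXAMPLE_FLAG:
    """Hypothetical sentinel built the same way as ALLOW_MISSING."""

    def __repr__(self):
        return "EXAMPLE_FLAG"


def describe(strict=False):
    """Report which of three modes a caller selected."""
    # The identity check has to come before the truthiness check,
    # because the sentinel itself is truthy.
    if strict is EXAMPLE_FLAG:
        return "allow-missing"
    if strict:
        return "strict"
    return "lenient"
```

After the decorator runs, `EXAMPLE_FLAG` names the instance rather than the class, so `type(EXAMPLE_FLAG).__name__` is still `"EXAMPLE_FLAG"`; the `__reduce__` method in the commit relies on that same name when it returns `self.__class__.__name__`.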

Lib/ntpath.py

Lines changed: 24 additions & 14 deletions
@@ -29,7 +29,7 @@
            "abspath","curdir","pardir","sep","pathsep","defpath","altsep",
            "extsep","devnull","realpath","supports_unicode_filenames","relpath",
            "samefile", "sameopenfile", "samestat", "commonpath", "isjunction",
-           "isdevdrive"]
+           "isdevdrive", "ALLOW_MISSING"]

 def _get_bothseps(path):
     if isinstance(path, bytes):
@@ -601,9 +601,10 @@ def abspath(path):
     from nt import _findfirstfile, _getfinalpathname, readlink as _nt_readlink
 except ImportError:
     # realpath is a no-op on systems without _getfinalpathname support.
-    realpath = abspath
+    def realpath(path, *, strict=False):
+        return abspath(path)
 else:
-    def _readlink_deep(path):
+    def _readlink_deep(path, ignored_error=OSError):
         # These error codes indicate that we should stop reading links and
         # return the path we currently have.
         # 1: ERROR_INVALID_FUNCTION
@@ -636,7 +637,7 @@ def _readlink_deep(path):
                         path = old_path
                         break
                     path = normpath(join(dirname(old_path), path))
-            except OSError as ex:
+            except ignored_error as ex:
                 if ex.winerror in allowed_winerror:
                     break
                 raise
@@ -645,7 +646,7 @@ def _readlink_deep(path):
                 break
         return path

-    def _getfinalpathname_nonstrict(path):
+    def _getfinalpathname_nonstrict(path, ignored_error=OSError):
         # These error codes indicate that we should stop resolving the path
         # and return the value we currently have.
         # 1: ERROR_INVALID_FUNCTION
@@ -673,25 +674,26 @@ def _getfinalpathname_nonstrict(path):
             try:
                 path = _getfinalpathname(path)
                 return join(path, tail) if tail else path
-            except OSError as ex:
+            except ignored_error as ex:
                 if ex.winerror not in allowed_winerror:
                     raise
                 try:
                     # The OS could not resolve this path fully, so we attempt
                     # to follow the link ourselves. If we succeed, join the tail
                     # and return.
-                    new_path = _readlink_deep(path)
+                    new_path = _readlink_deep(path,
+                                              ignored_error=ignored_error)
                     if new_path != path:
                         return join(new_path, tail) if tail else new_path
-                except OSError:
+                except ignored_error:
                     # If we fail to readlink(), let's keep traversing
                     pass
                 # If we get these errors, try to get the real name of the file without accessing it.
                 if ex.winerror in (1, 5, 32, 50, 87, 1920, 1921):
                     try:
                         name = _findfirstfile(path)
                         path, _ = split(path)
-                    except OSError:
+                    except ignored_error:
                         path, name = split(path)
                 else:
                     path, name = split(path)
@@ -721,24 +723,32 @@ def realpath(path, *, strict=False):
             if normcase(path) == devnull:
                 return '\\\\.\\NUL'
         had_prefix = path.startswith(prefix)
+
+        if strict is ALLOW_MISSING:
+            ignored_error = FileNotFoundError
+            strict = True
+        elif strict:
+            ignored_error = ()
+        else:
+            ignored_error = OSError
+
         if not had_prefix and not isabs(path):
             path = join(cwd, path)
         try:
             path = _getfinalpathname(path)
             initial_winerror = 0
         except ValueError as ex:
             # gh-106242: Raised for embedded null characters
-            # In strict mode, we convert into an OSError.
+            # In strict modes, we convert into an OSError.
             # Non-strict mode returns the path as-is, since we've already
             # made it absolute.
             if strict:
                 raise OSError(str(ex)) from None
             path = normpath(path)
-        except OSError as ex:
-            if strict:
-                raise
+        except ignored_error as ex:
             initial_winerror = ex.winerror
-            path = _getfinalpathname_nonstrict(path)
+            path = _getfinalpathname_nonstrict(path,
+                                               ignored_error=ignored_error)
         # The path returned by _getfinalpathname will always start with \\?\ -
         # strip off that prefix unless it was already provided on the original
         # path.

Lib/posixpath.py

Lines changed: 33 additions & 24 deletions
@@ -36,7 +36,7 @@
            "samefile","sameopenfile","samestat",
            "curdir","pardir","sep","pathsep","defpath","altsep","extsep",
            "devnull","realpath","supports_unicode_filenames","relpath",
-           "commonpath", "isjunction","isdevdrive"]
+           "commonpath", "isjunction","isdevdrive","ALLOW_MISSING"]


 def _get_sep(path):
@@ -402,10 +402,18 @@ def realpath(filename, *, strict=False):
         curdir = '.'
         pardir = '..'
         getcwd = os.getcwd
-    return _realpath(filename, strict, sep, curdir, pardir, getcwd)
+    if strict is ALLOW_MISSING:
+        ignored_error = FileNotFoundError
+        strict = True
+    elif strict:
+        ignored_error = ()
+    else:
+        ignored_error = OSError
+
+    lstat = os.lstat
+    readlink = os.readlink
+    maxlinks = None

-def _realpath(filename, strict=False, sep=sep, curdir=curdir, pardir=pardir,
-              getcwd=os.getcwd, lstat=os.lstat, readlink=os.readlink, maxlinks=None):
     # The stack of unresolved path parts. When popped, a special value of None
     # indicates that a symlink target has been resolved, and that the original
     # symlink path can be retrieved by popping again. The [::-1] slice is a
@@ -477,27 +485,28 @@ def _realpath(filename, strict=False, sep=sep, curdir=curdir, pardir=pardir,
                 path = newpath
                 continue
             target = readlink(newpath)
-        except OSError:
-            if strict:
-                raise
-            path = newpath
+        except ignored_error:
+            pass
+        else:
+            # Resolve the symbolic link
+            if target.startswith(sep):
+                # Symlink target is absolute; reset resolved path.
+                path = sep
+            if maxlinks is None:
+                # Mark this symlink as seen but not fully resolved.
+                seen[newpath] = None
+                # Push the symlink path onto the stack, and signal its specialness
+                # by also pushing None. When these entries are popped, we'll
+                # record the fully-resolved symlink target in the 'seen' mapping.
+                rest.append(newpath)
+                rest.append(None)
+            # Push the unresolved symlink target parts onto the stack.
+            target_parts = target.split(sep)[::-1]
+            rest.extend(target_parts)
+            part_count += len(target_parts)
             continue
-        # Resolve the symbolic link
-        if target.startswith(sep):
-            # Symlink target is absolute; reset resolved path.
-            path = sep
-        if maxlinks is None:
-            # Mark this symlink as seen but not fully resolved.
-            seen[newpath] = None
-            # Push the symlink path onto the stack, and signal its specialness
-            # by also pushing None. When these entries are popped, we'll
-            # record the fully-resolved symlink target in the 'seen' mapping.
-            rest.append(newpath)
-            rest.append(None)
-        # Push the unresolved symlink target parts onto the stack.
-        target_parts = target.split(sep)[::-1]
-        rest.extend(target_parts)
-        part_count += len(target_parts)
+        # An error occurred and was ignored.
+        path = newpath

     return path
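
Both `ntpath.realpath` and `posixpath.realpath` now translate *strict* into an `ignored_error` value that is used directly in `except` clauses. The sketch below isolates that mapping to show why an empty tuple works for strict mode: `except ():` matches no exception, so every error propagates. The sentinel here is a plain `object()` stand-in, and `ignored_errors_for` and `probe` are hypothetical helper names, not functions from the commit.

```python
# Stand-in sentinel for this sketch only; the commit uses os.path.ALLOW_MISSING.
ALLOW_MISSING = object()


def ignored_errors_for(strict):
    """Map a *strict* argument to the exception type(s) to swallow."""
    if strict is ALLOW_MISSING:
        return FileNotFoundError   # only "missing" errors are tolerated
    if strict:
        return ()                  # empty tuple: nothing is tolerated
    return OSError                 # non-strict: every OSError is tolerated


def probe(exc, strict):
    """Raise *exc* and report whether the chosen mode swallows it."""
    try:
        raise exc
    except ignored_errors_for(strict):
        return "ignored"


# Non-strict mode ignores any OSError subclass.
lenient = probe(PermissionError("denied"), False)

# ALLOW_MISSING ignores FileNotFoundError but lets other errors through.
missing_ok = probe(FileNotFoundError("gone"), ALLOW_MISSING)
try:
    probe(PermissionError("denied"), ALLOW_MISSING)
except PermissionError:
    permission_propagated = True
else:
    permission_propagated = False
```

Computing the exception set once up front lets the resolution loop use a single `except ignored_error:` clause instead of an `if strict: raise` check in every handler.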
