Commit a12f367

ambv, encukou, sethmlarson, AA-Turner, serhiy-storchaka
authored and committed
[3.9] pythongh-135034: Normalize link targets in tarfile, add os.path.realpath(strict='allow_missing') (pythonGH-135037)
Addresses CVEs 2024-12718, 2025-4138, 2025-4330, and 2025-4517.

(cherry picked from commit 3612d8f)

Co-authored-by: Łukasz Langa
Signed-off-by: Łukasz Langa
Co-authored-by: Petr Viktorin
Co-authored-by: Seth Michael Larson
Co-authored-by: Adam Turner
Co-authored-by: Serhiy Storchaka
1 parent 00af979 commit a12f367

11 files changed, +1041 -132 lines changed

Doc/library/os.path.rst

Lines changed: 29 additions & 4 deletions
@@ -351,10 +351,26 @@ the :mod:`glob` module.)
    links encountered in the path (if they are supported by the operating
    system).

-   If a path doesn't exist or a symlink loop is encountered, and *strict* is
-   ``True``, :exc:`OSError` is raised. If *strict* is ``False``, the path is
-   resolved as far as possible and any remainder is appended without checking
-   whether it exists.
+   By default, the path is evaluated up to the first component that does not
+   exist, is a symlink loop, or whose evaluation raises :exc:`OSError`.
+   All such components are appended unchanged to the existing part of the path.
+
+   Some errors that are handled this way include "access denied", "not a
+   directory", or "bad argument to internal function". Thus, the
+   resulting path may be missing or inaccessible, may still contain
+   links or loops, and may traverse non-directories.
+
+   This behavior can be modified by keyword arguments:
+
+   If *strict* is ``True``, the first error encountered when evaluating the path is
+   re-raised.
+   In particular, :exc:`FileNotFoundError` is raised if *path* does not exist,
+   or another :exc:`OSError` if it is otherwise inaccessible.
+
+   If *strict* is :py:data:`os.path.ALLOW_MISSING`, errors other than
+   :exc:`FileNotFoundError` are re-raised (as with ``strict=True``).
+   Thus, the returned path will not contain any symbolic links, but the named
+   file and some of its parent directories may be missing.

    .. note::
       This function emulates the operating system's procedure for making a path
@@ -373,6 +389,15 @@ the :mod:`glob` module.)
    .. versionchanged:: 3.9.23
       The *strict* parameter was added.

+   .. versionchanged:: next
+      The :py:data:`~os.path.ALLOW_MISSING` value for the *strict* parameter
+      was added.
+
+.. data:: ALLOW_MISSING
+
+   Special value used for the *strict* argument in :func:`realpath`.
+
+   .. versionadded:: next

 .. function:: relpath(path, start=os.curdir)

Doc/library/tarfile.rst

Lines changed: 20 additions & 0 deletions
@@ -237,6 +237,15 @@ The :mod:`tarfile` module defines the following exceptions:
    Raised to refuse extracting a symbolic link pointing outside the destination
    directory.

+.. exception:: LinkFallbackError
+
+   Raised to refuse emulating a link (hard or symbolic) by extracting another
+   archive member, when that member would be rejected by the filter location.
+   The exception that was raised to reject the replacement member is available
+   as :attr:`!BaseException.__context__`.
+
+   .. versionadded:: next
+

 The following constants are available at the module level:

@@ -954,6 +963,12 @@ reused in custom filters:
   Implements the ``'data'`` filter.
   In addition to what ``tar_filter`` does:

+  - Normalize link targets (:attr:`TarInfo.linkname`) using
+    :func:`os.path.normpath`.
+    Note that this removes internal ``..`` components, which may change the
+    meaning of the link if the path in :attr:`!TarInfo.linkname` traverses
+    symbolic links.
+
   - :ref:`Refuse <tarfile-extraction-refuse>` to extract links (hard or soft)
     that link to absolute paths, or ones that link outside the destination.

@@ -982,6 +997,10 @@ reused in custom filters:

   Return the modified ``TarInfo`` member.

+  .. versionchanged:: next
+
+     Link targets are now normalized.
+

 .. _tarfile-extraction-refuse:

@@ -1008,6 +1027,7 @@ Here is an incomplete list of things to consider:
 * Extract to a :func:`new temporary directory <tempfile.mkdtemp>`
   to prevent e.g. exploiting pre-existing links, and to make it easier to
   clean up after a failed extraction.
+* Disallow symbolic links if you do not need the functionality.
 * When working with untrusted data, use external (e.g. OS-level) limits on
   disk, memory and CPU usage.
 * Check filenames against an allow-list of characters

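The caveat about normalization changing a link's meaning can be shown without touching the filesystem. This sketch uses `posixpath.normpath` so the result is the same on every platform; the link target and the directory layout described in the comments are made up for illustration.

```python
import posixpath

# A link target that passes through "shortcut" and then steps back up.
# If "shortcut" were a symlink to some deeper directory, the OS would
# resolve "shortcut/.." relative to that directory, not to the one
# holding the link.
linkname = "shortcut/../payload.txt"

# normpath() works purely on the string: it drops "shortcut/.." without
# checking whether "shortcut" is a symlink.
normalized = posixpath.normpath(linkname)
print(normalized)  # payload.txt
```

Leading `..` components are kept by `normpath`, which is why the filter still has to check whether the target lands outside the destination.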
Doc/whatsnew/3.9.rst

Lines changed: 34 additions & 0 deletions
@@ -1662,3 +1662,37 @@ email
   check if the *strict* paramater is available.
   (Contributed by Thomas Dwyer and Victor Stinner for :gh:`102988` to improve
   the CVE-2023-27043 fix.)
+
+
+Notable changes in 3.10.18
+==========================
+
+os.path
+-------
+
+* The *strict* parameter to :func:`os.path.realpath` accepts a new value,
+  :data:`os.path.ALLOW_MISSING`.
+  If used, errors other than :exc:`FileNotFoundError` will be re-raised;
+  the resulting path can be missing but it will be free of symlinks.
+  (Contributed by Petr Viktorin for CVE 2025-4517.)
+
+tarfile
+-------
+
+* :func:`~tarfile.data_filter` now normalizes symbolic link targets in order to
+  avoid path traversal attacks.
+  (Contributed by Petr Viktorin in :gh:`127987` and CVE 2025-4138.)
+* :func:`~tarfile.TarFile.extractall` now skips fixing up directory attributes
+  when a directory was removed or replaced by another kind of file.
+  (Contributed by Petr Viktorin in :gh:`127987` and CVE 2024-12718.)
+* :func:`~tarfile.TarFile.extract` and :func:`~tarfile.TarFile.extractall`
+  now (re-)apply the extraction filter when substituting a link (hard or
+  symbolic) with a copy of another archive member, and when fixing up
+  directory attributes.
+  The former raises a new exception, :exc:`~tarfile.LinkFallbackError`.
+  (Contributed by Petr Viktorin for CVE 2025-4330 and CVE 2024-12718.)
+* :func:`~tarfile.TarFile.extract` and :func:`~tarfile.TarFile.extractall`
+  no longer extract rejected members when
+  :func:`~tarfile.TarFile.errorlevel` is zero.
+  (Contributed by Matt Prodani and Petr Viktorin in :gh:`112887`
+  and CVE 2025-4435.)

Lib/genericpath.py

Lines changed: 10 additions & 1 deletion
@@ -8,7 +8,7 @@

 __all__ = ['commonprefix', 'exists', 'getatime', 'getctime', 'getmtime',
            'getsize', 'isdir', 'isfile', 'samefile', 'sameopenfile',
-           'samestat']
+           'samestat', 'ALLOW_MISSING']


 # Does a path exist?
@@ -153,3 +153,12 @@ def _check_arg_types(funcname, *args):
                             f'os.PathLike object, not {s.__class__.__name__!r}') from None
     if hasstr and hasbytes:
         raise TypeError("Can't mix strings and bytes in path components") from None
+
+# A singleton with a true boolean value.
+@object.__new__
+class ALLOW_MISSING:
+    """Special value for use in realpath()."""
+    def __repr__(self):
+        return 'os.path.ALLOW_MISSING'
+    def __reduce__(self):
+        return self.__class__.__name__

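The `@object.__new__` decorator in this hunk replaces the class with a single instance of itself, so the name `ALLOW_MISSING` refers to an object, not a type. The sketch below reproduces the pattern with a hypothetical sentinel named `KEEP_GOING` and a hypothetical `describe()` function, to show why callers must test identity before truthiness.

```python
@object.__new__
class KEEP_GOING:
    """Hypothetical sentinel built the same way as ALLOW_MISSING."""

    def __repr__(self):
        return "KEEP_GOING"

    def __reduce__(self):
        # Returning a string tells pickle to look the object up by this
        # global name, so unpickling yields the same singleton.
        return self.__class__.__name__


def describe(strict=False):
    """Classify *strict* the way the realpath() changes do."""
    # The sentinel is truthy, so the identity check has to come first.
    if strict is KEEP_GOING:
        return "sentinel"
    if strict:
        return "strict"
    return "lenient"


if __name__ == "__main__":
    print(repr(KEEP_GOING), bool(KEEP_GOING))
    print(describe(KEEP_GOING), describe(True), describe(False))
```

Because the sentinel is truthy, code that only checks `if strict:` treats it like `True`.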
Lib/ntpath.py

Lines changed: 23 additions & 12 deletions
@@ -29,7 +29,8 @@
            "ismount", "expanduser","expandvars","normpath","abspath",
            "curdir","pardir","sep","pathsep","defpath","altsep",
            "extsep","devnull","realpath","supports_unicode_filenames","relpath",
-           "samefile", "sameopenfile", "samestat", "commonpath"]
+           "samefile", "sameopenfile", "samestat", "commonpath",
+           "ALLOW_MISSING"]

 def _get_bothseps(path):
     if isinstance(path, bytes):
@@ -532,9 +533,10 @@ def abspath(path):
     from nt import _getfinalpathname, readlink as _nt_readlink
 except ImportError:
     # realpath is a no-op on systems without _getfinalpathname support.
-    realpath = abspath
+    def realpath(path, *, strict=False):
+        return abspath(path)
 else:
-    def _readlink_deep(path):
+    def _readlink_deep(path, ignored_error=OSError):
         # These error codes indicate that we should stop reading links and
         # return the path we currently have.
         # 1: ERROR_INVALID_FUNCTION
@@ -567,7 +569,7 @@ def _readlink_deep(path):
                         path = old_path
                         break
                     path = normpath(join(dirname(old_path), path))
-            except OSError as ex:
+            except ignored_error as ex:
                 if ex.winerror in allowed_winerror:
                     break
                 raise
@@ -576,7 +578,7 @@ def _readlink_deep(path):
                 break
         return path

-    def _getfinalpathname_nonstrict(path):
+    def _getfinalpathname_nonstrict(path, ignored_error=OSError):
         # These error codes indicate that we should stop resolving the path
         # and return the value we currently have.
         # 1: ERROR_INVALID_FUNCTION
@@ -600,17 +602,18 @@ def _getfinalpathname_nonstrict(path):
             try:
                 path = _getfinalpathname(path)
                 return join(path, tail) if tail else path
-            except OSError as ex:
+            except ignored_error as ex:
                 if ex.winerror not in allowed_winerror:
                     raise
                 try:
                     # The OS could not resolve this path fully, so we attempt
                     # to follow the link ourselves. If we succeed, join the tail
                     # and return.
-                    new_path = _readlink_deep(path)
+                    new_path = _readlink_deep(path,
+                                              ignored_error=ignored_error)
                     if new_path != path:
                         return join(new_path, tail) if tail else new_path
-                except OSError:
+                except ignored_error:
                     # If we fail to readlink(), let's keep traversing
                     pass
                 path, name = split(path)
@@ -641,16 +644,24 @@ def realpath(path, *, strict=False):
             if normcase(path) == normcase(devnull):
                 return '\\\\.\\NUL'
         had_prefix = path.startswith(prefix)
+
+        if strict is ALLOW_MISSING:
+            ignored_error = FileNotFoundError
+            strict = True
+        elif strict:
+            ignored_error = ()
+        else:
+            ignored_error = OSError
+
         if not had_prefix and not isabs(path):
             path = join(cwd, path)
         try:
             path = _getfinalpathname(path)
             initial_winerror = 0
-        except OSError as ex:
-            if strict:
-                raise
+        except ignored_error as ex:
             initial_winerror = ex.winerror
-            path = _getfinalpathname_nonstrict(path)
+            path = _getfinalpathname_nonstrict(path,
+                                               ignored_error=ignored_error)
         # The path returned by _getfinalpathname will always start with \\?\ -
         # strip off that prefix unless it was already provided on the original
         # path.

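The ntpath changes replace the `if strict: raise` check with a variable exception class in the `except` clause. The sketch below isolates that idea. `ALLOW_MISSING_SKETCH`, `ignored_errors_for`, `call_tolerantly`, and the two failing callables are hypothetical names used only for illustration. It relies on the fact that `except ():` matches nothing, so every error propagates.

```python
# Hypothetical stand-in for os.path.ALLOW_MISSING, so the sketch runs on
# interpreters that do not define the real constant.
ALLOW_MISSING_SKETCH = object()


def ignored_errors_for(strict):
    """Map a *strict* value to the exceptions that may be swallowed."""
    if strict is ALLOW_MISSING_SKETCH:
        return FileNotFoundError  # tolerate only "does not exist"
    if strict:
        return ()                 # empty tuple: tolerate nothing
    return OSError                # default: tolerate any OS-level error


def call_tolerantly(func, strict=False):
    """Call *func*, returning None if it fails with a tolerated error."""
    try:
        return func()
    except ignored_errors_for(strict):
        return None


def missing():
    raise FileNotFoundError("no such file")


def denied():
    raise PermissionError("access denied")


if __name__ == "__main__":
    print(call_tolerantly(missing))                        # None
    print(call_tolerantly(denied))                         # None
    print(call_tolerantly(missing, ALLOW_MISSING_SKETCH))  # None
```

With `strict=True`, or with the sentinel and a `PermissionError`, the exception reaches the caller.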
Lib/posixpath.py

Lines changed: 10 additions & 3 deletions
@@ -35,7 +35,7 @@
            "samefile","sameopenfile","samestat",
            "curdir","pardir","sep","pathsep","defpath","altsep","extsep",
            "devnull","realpath","supports_unicode_filenames","relpath",
-           "commonpath"]
+           "commonpath", "ALLOW_MISSING"]


 def _get_sep(path):
@@ -403,6 +403,15 @@ def _joinrealpath(path, rest, strict, seen):
         sep = '/'
         curdir = '.'
         pardir = '..'
+        getcwd = os.getcwd
+    if strict is ALLOW_MISSING:
+        ignored_error = FileNotFoundError
+    elif strict:
+        ignored_error = ()
+    else:
+        ignored_error = OSError
+
+    maxlinks = None

     if isabs(rest):
         rest = rest[1:]
@@ -426,8 +435,6 @@ def _joinrealpath(path, rest, strict, seen):
         try:
             st = os.lstat(newpath)
         except OSError:
-            if strict:
-                raise
             is_link = False
         else:
             is_link = stat.S_ISLNK(st.st_mode)
