fkie_cve-2025-39756
Vulnerability from fkie_nvd
Published: 2025-09-11 17:15
Modified: 2025-09-15 15:22
Severity: not yet assigned (NVD status: Awaiting Analysis)
Summary
In the Linux kernel, the following vulnerability has been resolved:
fs: Prevent file descriptor table allocations exceeding INT_MAX
When sysctl_nr_open is set to a very high value (for example, 1073741816
as set by systemd), processes attempting to use file descriptors near
the limit can trigger massive memory allocation attempts that exceed
INT_MAX, resulting in a WARNING in mm/slub.c:
WARNING: CPU: 0 PID: 44 at mm/slub.c:5027 __kvmalloc_node_noprof+0x21a/0x288
This happens because kvmalloc_array() and kvmalloc() refuse any
request larger than INT_MAX and emit a warning when the allocation is
not flagged with __GFP_NOWARN.
Specifically, when nr_open is set to 1073741816 (0x3ffffff8) and a
process calls dup2(oldfd, 1073741880), the kernel attempts to allocate:
- File descriptor array: 1073741880 * 8 bytes = 8,589,935,040 bytes
- Multiple bitmaps: ~400MB
- Total allocation size: > 8GB (exceeding INT_MAX = 2,147,483,647)
Reproducer:
1. Set /proc/sys/fs/nr_open to 1073741816:
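   # echo 1073741816 > /proc/sys/fs/nr_open
2. Run a program that uses a high file descriptor:
   #include <stdio.h>
   #include <unistd.h>
   #include <sys/resource.h>

   int main(void) {
       /* Raise the soft and hard RLIMIT_NOFILE limits. */
       struct rlimit rlim = {1073741824, 1073741824};
       if (setrlimit(RLIMIT_NOFILE, &rlim) != 0)
           perror("setrlimit");

       /* Duplicate stderr onto a very high descriptor number. */
       if (dup2(2, 1073741880) < 0) // Triggers the warning
           perror("dup2");
       return 0;
   }
3. Observe the WARNING at mm/slub.c:5027 in dmesg.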
systemd commit a8b627a introduced automatic bumping of fs.nr_open to the
maximum possible value. The rationale was that systems with memory
control groups (memcg) no longer need separate file descriptor limits
since memory is properly accounted. However, this change overlooked
that:
1. The kernel's allocation functions still enforce INT_MAX as a maximum
size regardless of memcg accounting
2. Programs and tests that legitimately test file descriptor limits can
inadvertently trigger massive allocations
3. The resulting allocations (>8GB) are impractical and will always fail
systemd's algorithm starts with INT_MAX and keeps halving the value
until the kernel accepts it. On most systems, this results in nr_open
being set to 1073741816 (0x3ffffff8), which is just under 2^30
(1,073,741,824) file descriptors.
While processes rarely use file descriptors near this limit in normal
operation, certain selftests (like
tools/testing/selftests/core/unshare_test.c) and programs that test file
descriptor limits can trigger this issue.
Fix this by adding a check in alloc_fdtable() to ensure the requested
allocation size does not exceed INT_MAX. This causes the operation to
fail with -EMFILE instead of triggering a kernel warning and avoids the
impractical >8GB memory allocation request.
References
- https://git.kernel.org/stable/c/04a2c4b4511d186b0fce685da21085a5d4acd370
- https://git.kernel.org/stable/c/237e416eb62101f21b28c9e6e564d10efe1ecc6f
- https://git.kernel.org/stable/c/628fc28f42d979f36dbf75a6129ac7730e30c04e
- https://git.kernel.org/stable/c/749528086620f8012b83ae032a80f6ffa80c45cd
- https://git.kernel.org/stable/c/9f61fa6a2a89a610120bc4e5d24379c667314b5c
- https://git.kernel.org/stable/c/b4159c5a90c03f8acd3de345a7f5fc63b0909818
- https://git.kernel.org/stable/c/d4f9351243c17865a8cdbe6b3ccd09d0b13a7bcc
- https://git.kernel.org/stable/c/dfd1f4ea98c3bd3a03d12169b5b2daa1f0a3e4ae
- https://git.kernel.org/stable/c/f95638a8f22eba307dceddf5aef9ae2326bbcf98