Channel: Debian User Forums

Hardware • Disk locks during ongoing heavy filesystem usage

I have an i9-13900K with 128 GB of DDR4 and 2x 980 Pro NVMe drives in software RAID with ext4. When I initially built this system in April 2023, I deployed Ubuntu Desktop, and I subsequently moved to Debian 12 around October. This is not intended as a brag, but it may be relevant to the crashes described below; I would not recommend such a configuration for most users.

On both deployments, I've experienced disk lockups when performing a substantial amount of writes to the filesystem. I have yet to reproduce this 100% reliably, but I most frequently saw it when running a multi-threaded workload that wrote thousands of ~8 MB files across 16-32 threads, bottlenecked by my 1 GbE network connection. With that application, I was able to trigger crashes relatively quickly, maybe within an hour. Aside from this, I rarely see the lockup trigger: maybe once or twice a month, and only under heavy disk load. I have yet to confirm that disk writes are the cause, but I don't recall any specific issues with disk reads.
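
For anyone who wants to try something comparable, here is a minimal sketch of that kind of workload. It is not the original application: the function names, the random payload, the per-file `fsync`, and the absence of any network bottleneck are all assumptions made for illustration, and the target directory should be on a disk you are willing to stress.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor


def write_file(path, size_bytes, chunk_bytes=1 << 20):
    """Write size_bytes of random data to path in chunks, then fsync it."""
    remaining = size_bytes
    with open(path, "wb") as f:
        while remaining > 0:
            n = min(chunk_bytes, remaining)
            f.write(os.urandom(n))
            remaining -= n
        f.flush()
        # Assumption: force the data to disk; the original app may not do this.
        os.fsync(f.fileno())
    return size_bytes


def run_workload(target_dir, n_files=2000, size_bytes=8 * 1024 * 1024, threads=16):
    """Write n_files files of size_bytes each using a pool of threads.

    Returns the total number of bytes written.
    """
    os.makedirs(target_dir, exist_ok=True)
    paths = [os.path.join(target_dir, f"stress_{i:06d}.bin") for i in range(n_files)]
    with ThreadPoolExecutor(max_workers=threads) as pool:
        return sum(pool.map(lambda p: write_file(p, size_bytes), paths))


if __name__ == "__main__":
    # Tiny smoke run in a temporary directory; scale the numbers up deliberately.
    with tempfile.TemporaryDirectory() as d:
        print(run_workload(d, n_files=8, size_bytes=64 * 1024, threads=4))
```

The defaults (2000 files, 8 MiB each, 16 threads) roughly mirror the description above; the smoke run at the bottom only writes a few small files so the script can be checked safely first.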

When a crash occurs, audio (YouTube/Spotify) continues to play for a few minutes, and the system appears responsive to anything that doesn't involve disk access. Launching a new GNOME Terminal from an existing one opens a window, but no prompt appears. In an existing terminal I can run `whoami`, but `ls` freezes the terminal permanently, as does `dmesg`. Eventually, audio freezes and the system locks up totally. My guess is that the browser and/or X11 attempt to write to a log and block, which removes any remaining responsiveness.
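
Commands that hang like this are typically stuck in uninterruptible sleep (state `D`) waiting on I/O. A rough sketch for listing such processes from `/proc` follows; the helper names are made up for illustration, and the script would probably need to be running already before the lockup, since starting a new interpreter itself needs the disk.

```python
import os


def parse_stat_line(line):
    """Parse one /proc/<pid>/stat line into (pid, comm, state)."""
    # comm is wrapped in parentheses and may itself contain spaces or
    # parentheses, so split on the first "(" and the last ")".
    open_idx = line.index("(")
    close_idx = line.rindex(")")
    pid = int(line[:open_idx].strip())
    comm = line[open_idx + 1:close_idx]
    state = line[close_idx + 1:].split()[0]
    return pid, comm, state


def blocked_processes(proc_root="/proc"):
    """Return (pid, comm) for every process currently in state D."""
    blocked = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # not a process directory
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                pid, comm, state = parse_stat_line(f.read())
        except (OSError, ValueError, IndexError):
            continue  # process exited mid-scan, or the line was malformed
        if state == "D":
            blocked.append((pid, comm))
    return blocked


if __name__ == "__main__":
    for pid, comm in blocked_processes():
        print(pid, comm)
```

If the same few processes stay in state `D` for the whole lockup, that would at least support the idea that they are waiting on the disk rather than on something else.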

I have previously tried:

- Replacing the SSD to rule out faulty SSD hardware. The second drive also froze in a similar way, which left me with two working drives, hence the RAID.
- Moving from Ubuntu to Debian, which didn't resolve the issue.

I have a few theories:

- My unusually large amount of RAM (for the processor) is somehow triggering a condition where flushing to disk fails. This would be contradicted by workstations/servers with substantially more RAM, though.
- Something related to the CPU/motherboard that occurs only under heavy PCIe usage or similar.
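
One way the first theory might be probed is to log how much dirty data is waiting to be flushed while the heavy workload runs. The sketch below reads the `Dirty` and `Writeback` fields from `/proc/meminfo`; the function names, one-second interval, and sample count are arbitrary choices for illustration, and output goes to the terminal rather than a file so that logging does not itself depend on the disk.

```python
import time


def parse_meminfo(text):
    """Parse /proc/meminfo text into a dict of field name -> integer value.

    Most fields are reported in kB; a few (e.g. HugePages_Total) are counts.
    """
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def sample_writeback(path="/proc/meminfo"):
    """Return the current (Dirty, Writeback) values in kB."""
    with open(path) as f:
        info = parse_meminfo(f.read())
    return info.get("Dirty"), info.get("Writeback")


def monitor(interval_s=1.0, samples=10):
    """Print a timestamped Dirty/Writeback sample every interval_s seconds."""
    for _ in range(samples):
        dirty, writeback = sample_writeback()
        stamp = time.strftime("%H:%M:%S")
        print(f"{stamp} Dirty={dirty} kB Writeback={writeback} kB", flush=True)
        time.sleep(interval_s)


if __name__ == "__main__":
    monitor()
```

If `Dirty` climbs to many gigabytes shortly before a lockup, that would be weak evidence for the flushing theory; if it stays small, the theory looks less likely.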

I'm unsure how best to test these theories, or whether they are sound or complete. I don't particularly want to brutalize the SSDs if I can avoid it, and formatting those SSDs would be difficult (though I do have a third, lower-performance NVMe SSD I could dedicate to such a task). Has anyone experienced something similar, or can anyone suggest other things to try, both to diagnose the issue and to do once a lockup has occurred?

Statistics: Posted by Dazzling-Jacket — 2024-02-04 06:00 — Replies 4 — Views 102


