"On x86-64, there are two CPU settings which control the kernel’s ability to access memory."
There are a couple more than two, even in 2021.
Memory Protection Keys come to mind, as do the NPT/EPT tables when virtualization is in play. SEV and SGX also have their own ways of preventing the kernel from writing to memory. The CPU also has range registers that protect certain special physical address ranges, like the TDX module's range. You can't write there either.
That's all that comes to mind at the moment. It's definitely a fun question!
not entirely, IOMMU is a thing, that is IIRC how Amazon and other hyperscalers can promise you virtual machines whose memory cannot be touched even in the case the host is compromised (and, by extension, also if the feds arrive to v& your server).
>how Amazon and other hyperscalers can promise you virtual machines whose memory cannot be touched even in the case the host is compromised (and, by extension, also if the feds arrive to v& your server).
Even if we take those promises at face value, it practically doesn't mean much because every server still needs to handle reboots, which is when they can inject their evil code.
TL;DR: when a user writes to /proc/self/mem, the kernel bypasses the MMU and hardware address translation, opting to emulate it in software (including emulated page faults!), which allows it to disregard any memory protection that is currently setup in the page tables.
"On x86-64, there are two CPU settings which control the kernel’s ability to access memory."
There are a couple more than two, even in 2021.
Memory Protection Keys come to mind, as do the NPT/EPT tables when virtualization is in play. SEV and SGX also have their own ways of preventing the kernel from writing to memory. The CPU also has range registers that protect certain special physical address ranges, like the TDX module's range. You can't write there either.
That's all that comes to mind at the moment. It's definitely a fun question!
I'm still surprised I was the first one to notice when Linus tried to change this - I always thought it was a pretty well known behavior.
The kernel owns the page tables. It can always find another way in.
> The kernel owns the page tables.
not entirely, IOMMU is a thing, that is IIRC how Amazon and other hyperscalers can promise you virtual machines whose memory cannot be touched even in the case the host is compromised (and, by extension, also if the feds arrive to v& your server).
>how Amazon and other hyperscalers can promise you virtual machines whose memory cannot be touched even in the case the host is compromised (and, by extension, also if the feds arrive to v& your server).
Even if we take those promises at face value, it practically doesn't mean much because every server still needs to handle reboots, which is when they can inject their evil code.
If your threat model is being v& by feds, maybe you should keep your server at home behind Tor.
TL;DR: when a user writes to /proc/self/mem, the kernel bypasses the MMU and hardware address translation, opting to emulate it in software (including emulated page faults!), which allows it to disregard any memory protection that is currently setup in the page tables.
Thank You.