Linux gets double-quick double-update to fix kernel Oops! – Naked Security



Linux has never suffered from the infamous BSoD, short for blue screen of death, the name given to the dreaded “something went wrong” message associated with a Windows system crash.

Microsoft has tried many things over the years to shake off that “BSoD” nickname, including changing the background color used when crash messages appear, adding a super-sized sad-face emoticon to make the message more compassionate, displaying QR codes that you can snap with your phone to help you diagnose the problem, and no longer filling the screen with a technobabble list of kernel code modules that happened to be loaded at the time.

(Those crash dump lists often led to anti-virus and threat-prevention software being blamed for every system crash, simply because their names tended to appear at or near the top of the list of loaded modules, not because they had anything to do with the crash, but because they generally load early and therefore just happen to sit at the top of the list, making them a convenient scapegoat.)

Even better, “BSoD” is no longer the everyday, throwaway pejorative term it used to be, because Windows crashes much less often than it used to.

We’re not suggesting that Windows never crashes, or implying that it’s now magically bug-free; just mentioning that you generally don’t need the word BSoD as often as you used to.

Linux crash notifications

Of course, Linux never had BSoDs, even when Windows seemed to have them all the time, but that’s not because Linux never crashes, or is magically bug-free.

It’s simply that Linux doesn’t BSoD (yes, the term can be used as an intransitive verb, as in “my laptop BSoDded halfway through an email”), because, in a delightful understatement, it suffers an oops instead, or, if the oops is severe enough that the system can’t reliably be kept running even with degraded performance, it panics.

(It’s also possible to configure a Linux kernel so that an oops is always “promoted” to a panic, for environments where security considerations make it better to have a system that shuts down abruptly, even with some data not saved in time, than a system that ends up in an uncertain state that could lead to data leakage or data corruption.)

An oops usually produces console output something like this (we’ve provided source code below if you want to explore oops and panic for yourself):


[12710.153112] oops init (level = 1)
[12710.153115] triggering oops via BUG()
[12710.153127] ------------[ cut here ]------------
[12710.153128] kernel BUG at /home/duck/Articles/linuxoops/oops.c:17!
[12710.153132] invalid opcode: 0000 [#1] PREEMPT SMP PTI
[12710.153748] CPU: 0 PID: 5531 Comm: insmod . . . 
[12710.154322] Hardware name: XXXX
[12710.154940] RIP: 0010:oopsinit+0x3a/0xfc0 [oops]
[12710.155548] Code: . . . . .
[12710.156191] RSP: . . .  EFLAGS: . . .
[12710.156849] RAX: . . .  RBX: . . .  RCX: . . .
[12710.157513] RDX: . . .  RSI: . . .  RDI: . . .
[12710.158171] RBP: . . .  R08: . . .  R09: . . .
[12710.158826] R10: . . .  R11: . . .  R12: . . .
[12710.159483] R13: . . .  R14: . . .  R15: . . .
[12710.160143] FS:  . . .  GS: . . .  knlGS: . . . 
. . . . .
[12710.163474] Call Trace:
[12710.164129]  <TASK>
[12710.164779]  do_one_initcall+0x56/0x230
[12710.165424]  do_init_module+0x4a/0x210
[12710.166050]  __do_sys_finit_module+0x9e/0xf0
[12710.166711]  do_syscall_64+0x37/0x90
[12710.167320]  entry_SYSCALL_64_after_hwframe+0x72/0xdc
[12710.167958] RIP: 0033:0x7f6c28b15e39
[12710.168578] Code: . . . . .
[. . . . .
[12710.173349]  
[12710.174032] Modules linked in: . . . . .
[12710.180294] ---[ end trace 0000000000000000 ]---

Unfortunately, when kernel version 6.2.3 came out at the end of last week, two small changes quickly proved to be problematic, with users reporting kernel oopses when managing disk storage.

Kernel 6.1.16 apparently received the same changes, and was therefore susceptible to the same oopsiness.

For example, plugging in a removable drive and mounting it works fine, but unmounting the drive when you’re done with it can cause an oops.

Although an oops doesn’t immediately freeze the entire computer, kernel-level code crashing while handling disk storage is alarming enough that a well-informed user would probably want to shut down as soon as possible, in case of ongoing problems leading to data corruption…

…but some users reported that the oops prevented what’s known in the jargon as an orderly shutdown, forcing them to power-cycle the computer, either by holding down the power button for a few seconds or by temporarily cutting the mains supply to a server.

The good news is that kernels 6.2.4 and 6.1.17 were promptly released over the weekend to fix the problems.

Given the speed at which Linux kernel releases appear, those updates have already been followed by 6.2.5 and 6.1.18, which have themselves been superseded today (2023-03-13) by 6.2.6 and 6.1.19.

What to do?

If you are using a 6.x-version Linux kernel and you haven’t updated yet, make sure you’re not installing 6.2.3 or 6.1.16 along the way.

If you already have one of those versions (we had 6.2.3 for a few days and were unable to make the driver crash, probably because our kernel configuration accidentally protected us from triggering the bug), consider updating soon…

…because even if you haven’t run into any disk-volume problems so far, you may be immune only by good luck, whereas by upgrading your kernel again you’ll be immune by design.


EXPLORING OOPS AND PANIC EVENTS ON YOUR OWN

You will need a kernel built from source already installed on your test computer, because the module is compiled against that kernel’s build tree.

Create a directory, let’s call it /test/oops, and save this source code as oops.c:


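#include <linux/kernel.h>
#include <linux/module.h>
#include <linux/moduleparam.h>
#include <linux/init.h>

MODULE_LICENSE("GPL");
MODULE_DESCRIPTION("Trigger an oops or a panic on demand, for testing");

// level: 0->just load; 1->oops; 2->panic
// Permissions 0660 also expose it as /sys/module/oops/parameters/level
static int level = 0;
module_param(level, int, 0660);
MODULE_PARM_DESC(level, "0 = just load, 1 = oops via BUG(), 2 = panic()");

static int __init oopsinit(void) {
   printk("oops init (level = %d)\n", level);
   switch (level) {
      case 1:
         printk("triggering oops via BUG()\n");
         BUG();                  // invalid opcode -> oops; insmod is killed
         break;
      case 2:
         printk("forcing a full-on panic()\n");
         panic("oops module");   // never returns: the whole system halts
         break;
   }
   return 0;
}

static void __exit oopsexit(void) {
   printk("oops exit\n");
}

module_init(oopsinit);
module_exit(oopsexit);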

Create a file in the same directory called Kbuild to control build parameters, like this:


 EXTRA_CFLAGS = -Wall -g
 obj-m        = oops.o

Then build the module as shown below.

The -C option tells make where to start looking for Makefiles, thus pointing the build process at the correct kernel source tree, and the M= setting tells make where to find the code of the module you actually want to build this time.

You must provide the full, absolute path for M=, so don’t try to save typing by using ./ (the current directory changes during the build process):


/test/oops$ make -C /where/you/built/the/kernel M=/test/oops
CC [M]  /home/duck/Articles/linuxoops/oops.o
MODPOST /home/duck/Articles/linuxoops/Module.symvers
CC [M]  /home/duck/Articles/linuxoops/oops.mod.o
LD [M]  /home/duck/Articles/linuxoops/oops.ko

You can load and unload the new oops.ko kernel module with the parameter level=0 just to check that it works.

Look in the dmesg output for a log of the init and exit calls:


/test/oops# insmod oops.ko level=0
/test/oops# rmmod oops
/test/oops# dmesg
. . .
[12690.998373] oops: loading out-of-tree module taints kernel.
[12690.999113] oops init (level = 0)
[12704.198814] oops exit

To provoke an oops (recoverable) or a panic (which will hang your computer), use level=1 or level=2 respectively.

Don’t forget to save all your work before triggering either condition (you’ll need to reboot afterwards), and don’t do it on someone else’s computer without formal permission.