After a couple hours of debugging and analyzing ktr(9) traces in ddb, I managed to figure out the problem which was causing kernel panic just 3 hardclock ticks into bootup. I had conveniently forgotten to unlock a mutex in callout_tick() — not exactly forgotten, but overlooked the case when there were no callouts to expire during the current hardclock tick. The mutex should have been unlocked regardless of whether we are scheduling softclock() to run or not.
As soon as I fixed this, the kernel booted up well and everything seems to be working so far:
[pvaibhav@matrix:src/sys]$ uptime
4:58AM up 2:07, 2 users, load averages: 0.13, 0.32, 0.39
Although my laptop’s wireless LED seems to get stuck every now and then, when it should be blinking depending on traffic. I’m not sure if this is because the signal is weak right now, or whether the callouts needed by the LED taskqueue in the iwi(4) driver are not being serviced properly.
Testing has also only been done on my uniprocessor setup, so it remains to be seen how it performs in SMP environment. I’ve left the locking semantics largely untouched from the original implementation so there shouldn’t be any problems. I also need to devise a plan to test the whole thing more thoroughly than simply.. using the kernel. Perhaps it’ll be interesting to see how (if at all) the performance is affected by using the binary heap and not having to loop through all the callouts stored in each “bucket” of the old callout wheel, during a hardware interrupt.
I decided to maintain a development blog for my Google Summer of Code 2009 project with FreeBSD. The main intent is to document my thought process, successes and failures through this project: mostly as a scratch pad. I hope this will also be useful for those curious to track the progress.
Background information for the project can be found on the FreeBSD wiki. In short, the project’s goal is to redesign the way the FreeBSD kernel handles callouts/timeouts. The changes include a new API for clients, underlying data structure changes for better efficiency, and the most important of all (in my opinion): the possibility of making the kernel tickless by doing away with periodic timer interrupts, and using programmable deadline timers exclusively.
Notes about the current status and implementation details:
A generic binary heap has been implemented, and priority queue based on it now works. Both MIN and MAX heaps are working, although we will only consume the MINHEAP variant. The design is similar to the SLIST, TAILQ and friends found in sys/queue.h. The advantage of using this implementation of a priority queue is that the binary heap, on which this PQ is based, provides O(1) time complexity on an average for most operations including inserting and removing nodes or changing their priority. Although the upper bound time complexity is still O(log n), in general this will only occur occasionally when a new callout is scheduled which must bubble up to the root (ie., have the smallest timeout).
callout_clock() was modified to peek at the first pending callout, and schedule the software clock handler softclock() only if the expiry time has arrived (or is past, if ticks were missed). Peeking at the first item is O(1) worst-case, so this operation is extremely efficient.
softclock(), the software interrupt handler, will extract items in priority-order and if their expiry time has come or is in the past, will run the associated function and remove the callout from the queue. It then checks the next callout in line and repeats this process until the expiry time of the next callout is found to be in the future. At this point, softclock()’s job is done (for now).
Auxiliary functions to reset/stop callouts have been implemented, and they all use the new priority queue.
The current issues to note are:
The kernel doesn’t boot to desktop! Although the build is successful and the kernel can run until the time the first callout is scheduled, the kernel panics with the message mi_switch: switch in a critical section. I am currently trying to debug this, although it’s made difficult by the fact that I can’t obtain a dump of the crash because the harddrives are not even ready when this occurs. I’m looking to use ktr(9), the kernel trace facility to see if we can figure out where the problem is.
The kernel still has to service periodic interrupts. The tickless-work has not yet started — and won’t be until the current implementation works with binary heaps as its queue instead of the call “wheel.” Because of this, we’re still using hardclock ticks as the timing unit, and not the opaque hwclocktick_t that I have proposed in the plan. Once the binary heap priority queue is working, I’ll start to implement the common timer interface (most of this already exists, just have to add functions to get frequency and convert between walltime and hardware time). Only after this can we begin implementing the new functions which work with opaque time values.
The extremely old timeout() and untimeout() calls are still supported, but they are a pain to manage. These functions do not mandate clients to allocate their own callout structures, so the kernel must do it for them. Right now, the existing method of allocating a fixed number of callouts locally, is used. We are manually managing the callouts that have been used and freed, using a linked list. In the future I want to use uma(9) zone allocator which is meant for this kind of work. However, I couldn’t find a way to create a zone if I already have some memory preallocated (although this is mentioned in the man pages!). So this part has been postponed for later.
That’s about it for now. Later I’ll post more info about the heavyweight debug session I’m about to begin. Let’s hope I’m able to track it down!