Alan Stern
2014-07-18 20:25:23 UTC
Some OHCI controllers (most notably those made by NVIDIA, but others
too) sometimes lose track of completed Transfer Descriptors. When a TD
completes, the controller is supposed to add it to the start of the
Done Queue, to let the driver know the transfer is finished. The buggy
controllers occasionally fail to do this.
ohci-hcd already contains a couple of ad-hoc mechanisms for dealing
with these failures. One of the quirk handlers (for Compaq ZF Micro)
looks for lost TDs on an interrupt endpoint. In addition, the driver
recognizes that whenever a TD is on the Done Queue, all the earlier TDs
for the same endpoint must have completed as well, even if they aren't
on the Done Queue.
Still, these mechanisms don't handle all the possible scenarios. Lost
TDs have been observed for non-interrupt endpoints, and if the lost TD
is the last one in a transfer then there might not be anything
following it in the Done Queue.
This patch series replaces the ad-hoc mechanisms with a general
approach. A new I/O watchdog routine runs every 200 ms as long as
there are any active URBs. The routine scans the lists of TDs, looking
for any that have completed but haven't shown up in the Done Queue,
and retires them itself. (This will add a small amount of overhead, but
OHCI has never been high-throughput.) The routine also checks for
controllers malfunctioning so badly that they are unusable, and
declares them dead.
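
As a rough illustration of the scan described above, here is a minimal
user-space sketch in C. It is not the code from the patches: the types
td_model and ed_model, the hw_next index, and scan_for_lost_tds() are
all hypothetical names, and the model assumes the driver can tell how
far the controller has advanced along an endpoint's TD queue.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified model of one Transfer Descriptor. */
struct td_model {
    bool on_done_queue;  /* controller linked it into the Done Queue */
    bool retired;        /* driver has already given it back */
};

/*
 * Hypothetical model of one endpoint: its TDs in queue order, plus the
 * index of the next TD the controller has not yet processed.
 */
struct ed_model {
    struct td_model *tds;
    size_t n_tds;
    size_t hw_next;
};

/*
 * Retire every TD the controller has moved past but never reported.
 * Returns the number of lost TDs recovered on this pass.
 */
size_t scan_for_lost_tds(struct ed_model *ed)
{
    size_t recovered = 0;

    for (size_t i = 0; i < ed->hw_next && i < ed->n_tds; i++) {
        struct td_model *td = &ed->tds[i];

        /* Already handled, or the normal Done Queue path will do it. */
        if (td->retired || td->on_done_queue)
            continue;

        td->retired = true;
        recovered++;
    }
    return recovered;
}
```

TDs at or beyond hw_next are left alone because the controller has not
finished with them yet; a second pass over the same state recovers
nothing further.
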
Making these changes requires a certain amount of care, because the
controller might add a TD to the Done Queue any time up to a
millisecond after the TD completes. The watchdog routine has to make
sure it doesn't race with the hardware, and the done list (the driver's
equivalent of the hardware's Done Queue) has to be treated differently
from the way it is now. Also, there will be two pathways by which URBs
may complete: the hardware IRQ handler and the watchdog routine. This
requires the driver to make sure that URB completions are always
single-threaded.
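
One way to keep completions single-threaded when two pathways can
trigger them is a "working/restart" guard: whoever arrives while a pass
is in progress just leaves a note, and the active pass loops again. The
sketch below is a user-space model under that assumption, not the
driver's code. struct completer, completer_run() and
resubmit_until_three() are hypothetical names, and in a real driver
the flags would be protected by the driver's spinlock.

```c
#include <stdbool.h>

/* Hypothetical state shared by both completion pathways. */
struct completer {
    bool working;    /* a completion pass is already in progress */
    bool restart;    /* another pass was requested in the meantime */
    int pending;     /* completions waiting to be delivered */
    int delivered;   /* completions delivered so far */
    int depth;       /* current nesting of deliveries */
    int max_depth;   /* deepest nesting ever observed */
    void (*on_complete)(struct completer *c);  /* may call back in */
};

/* Called from either pathway (IRQ handler or watchdog in the model). */
void completer_run(struct completer *c)
{
    if (c->working) {
        c->restart = true;  /* let the active pass pick up the work */
        return;
    }
    c->working = true;
    do {
        c->restart = false;
        while (c->pending > 0) {
            c->pending--;
            c->depth++;
            if (c->depth > c->max_depth)
                c->max_depth = c->depth;
            c->delivered++;
            if (c->on_complete)
                c->on_complete(c);
            c->depth--;
        }
    } while (c->restart);
    c->working = false;
}

/*
 * Example callback: queue one more completion and re-enter, much as a
 * completion handler that resubmits its URB might.
 */
void resubmit_until_three(struct completer *c)
{
    if (c->delivered < 3) {
        c->pending++;
        completer_run(c);  /* re-entrant call; must not nest */
    }
}
```

Starting with one pending completion and resubmit_until_three as the
callback, all three completions are delivered while max_depth stays at
1, so no completion runs nested inside another.
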
The first four patches in this series remove the ad-hoc zfmicro quirk
and make other preliminary adjustments. The last two patches add the
I/O watchdog and add to it a check for a non-updating frame counter
(another type of hardware problem observed in the field).
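
The frame-counter check can be pictured as follows. This is a sketch
under stated assumptions rather than the logic in the patch: struct
frame_monitor, frame_counter_stuck() and the STALL_LIMIT threshold are
hypothetical, and the caller is assumed to sample the controller's
16-bit frame number once per watchdog run while the controller is
supposed to be operational.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed threshold: consecutive unchanged samples before giving up. */
#define STALL_LIMIT 2

/* Hypothetical per-controller bookkeeping for the check. */
struct frame_monitor {
    uint16_t last_frame;      /* frame number seen on the previous run */
    unsigned stalled_checks;  /* consecutive runs with no change */
    bool have_sample;         /* false until the first run */
};

/*
 * Feed in the frame number read on this watchdog run. Returns true
 * once the counter has failed to advance for STALL_LIMIT runs in a
 * row, meaning the controller should be treated as dead.
 */
bool frame_counter_stuck(struct frame_monitor *m, uint16_t frame_now)
{
    if (m->have_sample && frame_now == m->last_frame) {
        m->stalled_checks++;
    } else {
        m->stalled_checks = 0;
        m->last_frame = frame_now;
        m->have_sample = true;
    }
    return m->stalled_checks >= STALL_LIMIT;
}
```

Because the frame number normally advances once per millisecond, two
samples taken 200 ms apart should differ on a healthy controller;
requiring more than one unchanged sample is only a guard against a
single bad read.
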
In the past, users have reported controller failures like these that
ended up hanging the kernel's USB stack. With these changes in place,
the hardware problems will show up as graceful failures, leaving the
rest of the USB subsystem intact.
Alan Stern
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majordomo-***@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html