This time we're digging into HID - Human Interface Devices and more
specifically the protocol your mouse, touchpad, joystick, keyboard, etc. use to talk to your computer.
Remember the good old days where you had to install a custom driver for
every input device? Remember when PS/2 (the protocol) had to be extended
to accommodate mouse wheels, and then again for five-button mice? And
you had to select the right protocol to make it work. Yeah, me neither, I
tend to suppress those memories because the world is awful enough as it is.
As users we generally like devices to
work out of the box. Hardware manufacturers generally like to add bits and
bobs because otherwise who would buy that new device when last year's device looks identical. This difference in needs can only be
solved by one superhero: Committee-man, with the superpower to survive
endless meetings and get RFCs approved.
Many many moons ago, when USB itself was in its infancy, Committee-man and
his sidekick Caffeine-boy got the USB consortium to agree on a standard for
input devices that is so self-descriptive that operating systems (Win95!)
can write one driver that can handle this year's device, and next year's,
and so on. No need to install extra drivers, your device will just work out
of the box. And so HID was born. This may only be an approximate
summary of history.
Originally HID was designed to work over USB. But just like Shrek
the technology world is obsessed with layers so these days HID works
over different transport layers. HID over USB is what your mouse uses, HID
over i2c may be what your touchpad uses. HID works over Bluetooth and its
celebrity-diet version BLE. Somewhere, someone out there is very slowly moving a mouse
pointer by sending HID over carrier pigeons just to prove a point. Because there's always that one
guy.
HID is incredibly simple in that the static description of the device can
just be bytes burnt into the ROM like the Australian sun into unprepared English
backpackers. And the event frames are often an identical series of bytes
where every bit is filled in by the firmware according to the
axis/buttons/etc.
HID is incredibly complicated because parsing it is a stack-based mental
overload. Each individual protocol item is simple but getting it right and
all into your head is tricky. Luckily, I'm here for you to make this simpler
to understand or, failing that, at least more entertaining.
As said above, the purpose of HID is to make devices describe themselves in
a generic manner so that you can have a single driver handle any input
device. The idea is that the host parses that standard protocol and knows
exactly how the device will behave. This has worked out great: the Linux
kernel only has around 200 files dealing with vendor- and hardware-specific
HID quirks as of v4.20.
HID messages are Reports. And to know what a Report means and how to
interpret it, you need a Report Descriptor. That Report Descriptor is
static and contains a series of bytes detailing "what" and "where", i.e.
what a sequence of bits represents and where to find those bits in the
Report. So let's try and parse one of those Report Descriptors, let's say for a
fictional mouse with a few buttons. How exciting, we're at the forefront of
innovation here.
The Report Descriptor consists of a bunch of Items. A parser reads
the next Item, processes the information within and moves on. Items are
small (1 byte header, 0-4 bytes payload) and generally only apply exactly
one tiny little bit of information. You need to accumulate several items to
build up enough information to actually know what's happening.
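To make the "1 byte header" a bit more tangible, here is a minimal sketch of how that header byte splits up in the HID spec's short item format: payload size in the low two bits, item type in the next two, tag in the top four. The function name `decode_item_header` is made up for this example, and long items (the rarely seen 0xFE prefix) are simply ignored.

```python
# Item types as encoded in bits 2-3 of a short item's header byte.
ITEM_TYPES = {0: "Main", 1: "Global", 2: "Local", 3: "Reserved"}


def decode_item_header(header):
    """Split a short-item header byte into (tag, type name, payload size)."""
    size_code = header & 0x03
    # A size code of 3 means 4 payload bytes, not 3.
    payload_size = 4 if size_code == 3 else size_code
    item_type = ITEM_TYPES[(header >> 2) & 0x03]
    tag = (header >> 4) & 0x0F
    return tag, item_type, payload_size


# 0x05 is the header byte commonly seen in front of a one-byte Usage Page.
print(decode_item_header(0x05))  # (0, 'Global', 1)
# 0x75 is the header byte commonly seen in front of a one-byte Report Size.
print(decode_item_header(0x75))  # (7, 'Global', 1)
```

The tag only means something together with the type: tag 0 is Usage Page for a Global item but Usage for a Local one.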
The "what" question of the Report Descriptor is answered with the so-called
Usage. This could be something simple like X or Y (0x30
and 0x31) or something more esoteric like System Menu Exit (0x88). A
Usage is 16 bits but all Usages are grouped into so-called Usage
Pages. A Usage Page too is a 16 bit value and together they form the
32-bit value that tells us what the device can do. Examples:
0001 0031 # Generic Desktop, Y
0001 0088 # Generic Desktop, System Menu Exit
0003 0005 # VR Controls, Head Tracker
0003 0006 # VR Controls, Head Mounted Display
0007 0031 # Keyboard, Keyboard \ and |
Note how the Usage in the last item is the same as in the first one; without
the Usage Page you will mix things up. It helps if you always think of the
Usage as a 32-bit number. For your kids' bed-time story time, here are the
HID Usage Tables from 2004 and the approved HID Usage Table Review Requests
of the last decade. Because nothing puts them to sleep quicker than droning
on about hex numbers associated with remote control buttons.
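If the "think of it as a 32-bit number" advice feels abstract, this small sketch shows the idea. The helper `full_usage` is a hypothetical name; the constants are Head Tracker on the VR Controls page (from the list above) and, as far as I read the Usage Tables, Game Pad on the Generic Desktop page, which happens to share the same 16-bit Usage.

```python
def full_usage(usage_page, usage):
    """Combine a 16-bit Usage Page and a 16-bit Usage into one 32-bit value."""
    return ((usage_page & 0xFFFF) << 16) | (usage & 0xFFFF)


VR_HEAD_TRACKER = full_usage(0x0003, 0x0005)
DESKTOP_GAME_PAD = full_usage(0x0001, 0x0005)

# Same 16-bit Usage, different Usage Page, so the 32-bit values differ.
print(f"{VR_HEAD_TRACKER:08x}")   # 00030005
print(f"{DESKTOP_GAME_PAD:08x}")  # 00010005
```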
To successfully interpret a Report from the device, you need to know which
bits have which Usage associated with them. So let's go back to our
innovative mouse. We would want a report descriptor with 6 items like this:
Usage Page (Generic Desktop)
Usage (X)
Report Size (16)
Usage Page (Generic Desktop)
Usage (Y)
Report Size (16)
This basically tells the host: X and Y both have 16 bits. So if we get a
4-byte Report from the device, we know two bytes are for X, two for Y.
HID was invented at a time when bits were more expensive than printer ink,
so we can't afford to waste any bits (still the case because who would want to spend an extra penny on more ROM). HID makes use of so-called Global items, once those are set
their value applies to all following items until changed. Usage Page and
Report Size are such Global items, so the above report descriptor is really
implemented like this:
Usage Page (Generic Desktop)
Usage (X)
Usage (Y)
Report Count (2)
Report Size (16)
Input (Data,Var,Rel)
The Report Count just tells us that 2 fields of the current Report Size are
coming up. We have two usages, two fields, and 16 bits each so we know what
to do. The Input item is sort-of the marker for the end of the stack; it
basically tells us "process what you've seen so far", together with a few
flags. Rel in this case means that the Usages are relative. Oh, and Input
means that this is data from device to host. Output would be data from host
to device, e.g. to set LEDs on a keyboard. There's also Feature which
indicates configurable items.
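Those flags are a plain bitfield in the item's payload. A rough sketch of decoding the three lowest bits, which is all the examples here need; `describe_main_flags` is a made-up name and the remaining flag bits the spec defines are left out on purpose.

```python
def describe_main_flags(flags):
    """Name the three lowest flag bits of an Input/Output/Feature item."""
    return (
        "Cnst" if flags & 0x01 else "Data",  # bit 0: data or constant
        "Var" if flags & 0x02 else "Arr",    # bit 1: array or variable
        "Rel" if flags & 0x04 else "Abs",    # bit 2: absolute or relative
    )


print(describe_main_flags(0x06))  # ('Data', 'Var', 'Rel')
print(describe_main_flags(0x02))  # ('Data', 'Var', 'Abs')
```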
Buttons on a device are generally just numbered so it'd be a monumental 16-bits-at-a-time waste to have
HID send Usage (Button1), Usage (Button2), etc. for every button on the
device. HID instead provides a Usage Minimum and Usage Maximum
to sequentially order them. This looks like this:
Usage Page (Button)
Usage Minimum (1)
Usage Maximum (5)
Report Count (5)
Report Size (1)
Input (Data,Var,Abs)
So we have 5 buttons here and each button has one bit. Note how the buttons are
Abs because a button state is not a relative value, it's either down
or up. HID is quite intolerant to Schrödinger's thought experiments.
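On the receiving end, turning those five bits back into button numbers could look roughly like this. It's a sketch that assumes the lowest bit belongs to Usage Minimum and the rest follow in order; `pressed_buttons` is a hypothetical helper.

```python
def pressed_buttons(bits, usage_min=1, usage_max=5):
    """Return the button numbers whose bit is set, lowest bit = usage_min."""
    count = usage_max - usage_min + 1
    return [usage_min + i for i in range(count) if bits & (1 << i)]


# Bits 0 and 2 set: buttons 1 and 3 are down.
print(pressed_buttons(0b00101))  # [1, 3]
```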
Let's put the two things together and we have an almost-correct Report
descriptor:
Usage Page (Button)
Usage Minimum (1)
Usage Maximum (5)
Report Count (5)
Report Size (1)
Input (Data,Var,Abs)
Report Size (3)
Report Count (1)
Input (Cnst,Arr,Abs)
Usage Page (Generic Desktop)
Usage (X)
Usage (Y)
Report Count (2)
Report Size (16)
Input (Data,Var,Rel)
New here is Cnst. This signals that the bits have a constant value, thus
don't need a Usage and basically don't matter (haha. yeah, right. in
theory). Linux does indeed ignore those. Cnst is used for
padding to align on byte boundaries - 5 bits for buttons plus 3 bits padding
make 8 bits. Which makes one byte as everyone agrees except for granddad
over there in the corner. I don't know how he got in.
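To see how the Global items, the Usages and the padding add up, here is a toy walker over that descriptor. It is a sketch only: the descriptor is written as (name, value) tuples rather than real bytes, `layout` is a made-up function, and it assumes Usage Minimum always comes before Usage Maximum and that there are at least as many Usages as fields.

```python
GLOBAL_ITEMS = {"Usage Page", "Report Size", "Report Count"}

DESCRIPTOR = [
    ("Usage Page", "Button"),
    ("Usage Minimum", 1),
    ("Usage Maximum", 5),
    ("Report Count", 5),
    ("Report Size", 1),
    ("Input", "Data,Var,Abs"),
    ("Report Size", 3),
    ("Report Count", 1),
    ("Input", "Cnst,Arr,Abs"),
    ("Usage Page", "Generic Desktop"),
    ("Usage", "X"),
    ("Usage", "Y"),
    ("Report Count", 2),
    ("Report Size", 16),
    ("Input", "Data,Var,Rel"),
]


def layout(descriptor):
    """Return (bit offset, bit size, label) for every field in the report."""
    global_state = {}  # Global items stay in effect until changed
    usages = []        # Usages are consumed by the next Input item
    usage_min = None
    fields, offset = [], 0
    for name, value in descriptor:
        if name in GLOBAL_ITEMS:
            global_state[name] = value
        elif name == "Usage":
            usages.append(value)
        elif name == "Usage Minimum":
            usage_min = value
        elif name == "Usage Maximum":
            usages.extend(range(usage_min, value + 1))
        elif name == "Input":
            size = global_state["Report Size"]
            for i in range(global_state["Report Count"]):
                if value.startswith("Cnst"):
                    label = "padding"
                else:
                    label = f"{global_state['Usage Page']}/{usages[i]}"
                fields.append((offset, size, label))
                offset += size
            usages = []
    return fields


for field in layout(DESCRIPTOR):
    print(field)
```

The last field starts at bit 24 and is 16 bits wide, so the whole report is 40 bits long.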
Were we to get a 5-byte Report from the device, we'd parse it
approximately like this:
button_state = byte[0] & 0x1f
x = byte[1] | (byte[2] << 8)
y = byte[3] | (byte[4] << 8)
Hooray, we're almost ready. Except not. We may need more info to correctly
interpret the data within those reports.
The Logical Minimum and Logical Maximum specify the value
range of the actual data. We need this to tell us whether the data is signed
and what the allowable range is. Together with the Physical Minimum
and the Physical Maximum they specify what the values really mean.
In the simple case:
Usage Page (Generic Desktop)
Usage (X)
Usage (Y)
Report Count (2)
Report Size (16)
Logical Minimum (-32767)
Logical Maximum (32767)
Input (Data,Var,Rel)
This just means our x/y data is signed. Easy. But consider this combination:
...
Logical Minimum (0)
Logical Maximum (1)
Physical Minimum (1)
Physical Maximum (12)
This means that if the bit is 0, the effective value is 1. If the bit is 1,
the effective value is 12.
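One way to read that mapping is as a straight line between the two ranges. The sketch below assumes exactly that and leaves units and exponents out entirely. `to_physical` is a hypothetical helper, and the special case for a physical range of 0/0 reflects my reading of the spec (it then falls back to the logical range).

```python
def to_physical(value, logical_min, logical_max, physical_min, physical_max):
    """Linearly map a logical value onto the physical range."""
    if physical_min == 0 and physical_max == 0:
        # Assumption: both zero means "same as the logical range".
        return value
    logical_span = logical_max - logical_min
    physical_span = physical_max - physical_min
    return physical_min + (value - logical_min) * physical_span / logical_span


print(to_physical(0, 0, 1, 1, 12))  # 1.0
print(to_physical(1, 0, 1, 1, 12))  # 12.0
```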
Note that the above is one report only. Devices may have multiple Reports,
indicated by the Report ID. So our Report Descriptor may look like
this:
Report ID (01)
Usage Page (Button)
Usage Minimum (1)
Usage Maximum (5)
Report Count (5)
Report Size (1)
Input (Data,Var,Abs)
Report Size (3)
Report Count (1)
Input (Cnst,Arr,Abs)
Report ID (02)
Usage Page (Generic Desktop)
Usage (X)
Usage (Y)
Report Count (2)
Report Size (16)
Input (Data,Var,Rel)
If we were to get a Report now, we need to check byte 0 for the Report ID so
we know what this is. i.e. our single-use hard-coded parser would look like
this:
if byte[0] == 0x01:
    button_state = byte[1] & 0x1f
elif byte[0] == 0x02:
    x = byte[1] | (byte[2] << 8)
    y = byte[3] | (byte[4] << 8)
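For the curious, a runnable take on that dispatch. It is a sketch under a few assumptions: the report arrives as a Python `bytes` object, the data starts right after the Report ID byte, and x/y are signed little-endian values (this descriptor doesn't actually say so; the Logical Minimum example earlier does). `parse_report` is a made-up name.

```python
import struct


def parse_report(data):
    """Dispatch on the Report ID in byte 0 and decode the rest."""
    report_id = data[0]
    if report_id == 0x01:
        return {"buttons": data[1] & 0x1F}
    if report_id == 0x02:
        # "<hh": two little-endian signed 16-bit values right after the ID.
        x, y = struct.unpack_from("<hh", data, 1)
        return {"x": x, "y": y}
    raise ValueError(f"unexpected Report ID {report_id:#04x}")


print(parse_report(bytes([0x01, 0x05])))                    # {'buttons': 5}
print(parse_report(bytes([0x02, 0xFF, 0xFF, 0x03, 0x00])))  # {'x': -1, 'y': 3}
```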
A device may use multiple Reports if the hardware doesn't gather all data
within the same hardware bits. Now, you may ask: if I get fifteen reports,
how should I know what belongs together? Good question, and lucky for you
the HID designers are miles ahead of you. Report IDs are grouped into
Collections.
Collections can have multiple types. An Application Collection
describes a set of inputs that make sense as a whole. Usually, every Report
Descriptor must define at least one Application Collection but you may have
two or more. For example, a keyboard with integrated trackpoint
should and/or would use two. This is how the kernel knows it needs to create
two separate event nodes for the device. Application Collections have a few
reserved Usages that indicate to the host what type of device this is. These
are e.g. Mouse, Joystick, Consumer Control. If you ever
wondered why you have a device named like "Logitech G500s Laser Gaming Mouse
Consumer Control" this is the kernel simply appending the Application
Collection's Usage to the device name.
A Physical Collection indicates that the data is collected at one
physical point though what a point is is a bit blurry. Theoretical
physicists will disagree but a point can be "a mouse". So it's quite common
for all reports on a mouse to be wrapped in one Physical Collection. If you
have a device with two sets of sensors, you'd have two collections to
illustrate which ones go together. Physical Collections also have reserved
Usages like Pointer or Head Tracker.
Finally, a Logical Collection just indicates that some bits of data
belong together, whatever that means. The HID spec uses the example of a
buffer length field and buffer data but it's also common for all inputs from
a mouse to be grouped together. A quick check of my mice here shows that
Logitech doesn't wrap the data into a Logical Collection but Microsoft's
firmware does. Because where would we be if we all did the same thing...
Anyway. Now that we know about collections, let's look at a whole report
descriptor as seen in the wild:
Usage Page (Generic Desktop)
Usage (Mouse)
Collection (Application)
  Usage Page (Generic Desktop)
  Usage (Mouse)
  Collection (Logical)
    Report ID (26)
    Usage (Pointer)
    Collection (Physical)
      Usage Page (Button)
      Usage Minimum (1)
      Usage Maximum (5)
      Report Count (5)
      Report Size (1)
      Logical Minimum (0)
      Logical Maximum (1)
      Input (Data,Var,Abs)
      Report Size (3)
      Report Count (1)
      Input (Cnst,Arr,Abs)
      Usage Page (Generic Desktop)
      Usage (X)
      Usage (Y)
      Report Count (2)
      Report Size (16)
      Logical Minimum (-32767)
      Logical Maximum (32767)
      Input (Data,Var,Rel)
      Usage (Wheel)
      Physical Minimum (0)
      Physical Maximum (0)
      Report Count (1)
      Report Size (16)
      Logical Minimum (-32767)
      Logical Maximum (32767)
      Input (Data,Var,Rel)
    End Collection
  End Collection
End Collection
We have one Application Collection (Generic Desktop, Mouse) that contains
one Logical Collection (Generic Desktop, Mouse). That contains one Physical
Collection (Generic Desktop, Pointer). Our actual Report (and we have only
one but it has the decimal ID 26) has 5 buttons, two 16-bit axes (x and y)
and finally another 16 bit axis for the Wheel. This device will thus send
8-byte reports and our parser will do:
if byte[0] != 0x1a:  # it's decimal in the above descriptor
    raise ValueError("Report ID should be 26")
button_state = byte[1] & 0x1f
x = byte[2] | (byte[3] << 8)
y = byte[4] | (byte[5] << 8)
wheel = byte[6] | (byte[7] << 8)
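The same thing as something you could actually run, using Python's `struct` module so the signedness from the Logical Minimum/Maximum is honoured. Still a single-use hard-coded sketch for this one descriptor, and `parse_mouse_report` is a made-up name.

```python
import struct

# "<BBhhh": Report ID, button byte, then signed little-endian x, y and wheel.
REPORT_FORMAT = struct.Struct("<BBhhh")


def parse_mouse_report(data):
    """Decode one 8-byte report of the mouse descriptor shown above."""
    if len(data) != REPORT_FORMAT.size:
        raise ValueError(f"expected {REPORT_FORMAT.size} bytes, got {len(data)}")
    report_id, buttons, x, y, wheel = REPORT_FORMAT.unpack(data)
    if report_id != 26:
        raise ValueError(f"Report ID should be 26, got {report_id}")
    # Only the low 5 bits are buttons, the top 3 are constant padding.
    return {"buttons": buttons & 0x1F, "x": x, "y": y, "wheel": wheel}


report = bytes([0x1A, 0x01, 0x05, 0x00, 0xFE, 0xFF, 0x01, 0x00])
print(parse_mouse_report(report))
# {'buttons': 1, 'x': 5, 'y': -2, 'wheel': 1}
```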
That's it. Now, obviously, you can't write a parser for every HID descriptor
out there so your actual parsing code needs to be generic. The Linux kernel
does exactly that and so does everything else that needs to parse HID.
There's a huge variety in devices out there, all with HID descriptors that
may or may not be correct. As with so much in life, correct HID
implementations are often defined by "whatever Windows accepts" so if you
like playing catch-up, Linux development is for you.
Oh, in case you just got a bit too optimistic about the state of the world:
HID allows for vendor-defined usages. Which does exactly what you'd think it
does, it hides vendor-specific protocol inside what should be a generic
protocol. There are devices with hidden report IDs that you can only
unlock by sending the right magic sequence to the report and/or by defeating
the boss on Level 4. Usually those devices present themselves as
basic/normal devices over HID but if you know the magic sequence you get to
use *gasp* all buttons. Or access the device-specific configuration
features. Logitech's HID++ is just one example here but at least that's one
where we have most of the specs available.
The above describes how to parse the HID report descriptor and interpret the reports. But
what happens once you have a HID report correctly parsed?
In the case of the Linux kernel, once the report descriptor is parsed
evdev nodes are created (one per Application Collection, more or less). As
the Reports come in, they are mapped to evdev codes and the data appears on
the evdev node. That's where userspace like libinput can pick it up. That bit is actually quite simple (mostly anyway).
The above output was generated with the tools from the hid-tools repository. Go forth and hid-record.