[ESP32] WebSerial / WebUSB – another Rabbit Hole

In my recent post about accessing the bootloader via WebSerial, I only covered the common desktop and Chrome setups. While that covers my main use case, I still wanted to go a step further and use my mobile phone to flash an ESP32 via a USB cable.

But plugging an ESP32-C6 into my Samsung Galaxy didn’t register any “Serial CDC Device” (or similar), so Chrome couldn’t connect and I couldn’t access the device with the firmware analysis tool. Sad.

End of the story? No.

WebUSB to the rescue.

So I forced GitHub Copilot into writing me a crude “WebSerial-via-WebUSB” adapter class, and after a few manual tweaks it worked okay-ish. There were still some unclear issues with RTS/DTR handling, plus the riddle of why the Windows CDC -> Chrome -> WebSerial path behaves differently from WinUSB -> Chrome -> WebUSB -> my class. Really weird behavior, causing unreliable resets into the bootloader.

I worked around this by modifying the otherwise well-working reset procedure once again.

This is basically copied from Espressif’s esptool.py reset sequence.
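Below is a rough TypeScript sketch of such a sequence driven over WebUSB. It assumes the ESP32-C6 USB-Serial/JTAG port reacts to the standard CDC-ACM SET_CONTROL_LINE_STATE request (0x22, DTR in bit 0, RTS in bit 1). The steps follow esptool's USB-JTAG/Serial reset strategy as I remember it: idle, set IO0 via DTR, pass through DTR=1/RTS=1 into reset, then release. Compare it with esptool's reset.py before trusting the exact timings.

`LineStep`, `USB_JTAG_SERIAL_RESET` and `runReset` are made-up names. `dev` only needs a `controlTransferOut()` like WebUSB's `USBDevice`, and the interface number is an assumption you would take from the device's descriptors.

```typescript
// CDC-ACM class request; wValue bit 0 = DTR, bit 1 = RTS.
const SET_CONTROL_LINE_STATE = 0x22;

interface LineStep {
  dtr: boolean;
  rts: boolean;
  delayMs: number; // how long to hold this state before the next step
}

// Modeled on esptool's USB-JTAG/Serial reset (from memory, verify timings).
const USB_JTAG_SERIAL_RESET: LineStep[] = [
  { dtr: false, rts: false, delayMs: 100 }, // idle
  { dtr: true, rts: false, delayMs: 100 }, // request download mode (IO0)
  { dtr: true, rts: true, delayMs: 0 }, // pass through (1,1), not (0,0)
  { dtr: false, rts: true, delayMs: 100 }, // chip held in reset
  { dtr: false, rts: false, delayMs: 0 }, // release reset, ROM loader starts
];

function controlLineValue(dtr: boolean, rts: boolean): number {
  return (dtr ? 0x01 : 0) | (rts ? 0x02 : 0);
}

// Subset of WebUSB's USBDevice used here.
interface ControlOutDevice {
  controlTransferOut(setup: {
    requestType: "class";
    recipient: "interface";
    request: number;
    value: number;
    index: number;
  }): Promise<unknown>;
}

async function runReset(
  dev: ControlOutDevice,
  iface: number, // CDC communication interface (assumption: read from descriptors)
  steps: LineStep[],
  sleep: (ms: number) => Promise<void>,
): Promise<void> {
  for (const step of steps) {
    // One control transfer always carries both line states at once.
    await dev.controlTransferOut({
      requestType: "class",
      recipient: "interface",
      request: SET_CONTROL_LINE_STATE,
      value: controlLineValue(step.dtr, step.rts),
      index: iface,
    });
    if (step.delayMs > 0) await sleep(step.delayMs);
  }
}
```

esptool additionally sets RTS a second time, because on Windows a DTR change only propagates together with an RTS change. As far as I understand, a raw control transfer always carries both bits, so that extra step should not be needed here.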

After this was fixed, communicating with the bootloader by issuing a “Read Register” command worked fine. I could read the GPIO-IN register a million times without any dropout.
Great, now it’s finished :)

Nope.

When the stub loader, required for reading flash, was loaded, the “Read Register” command simply timed out. Sometimes. Not always. These are the worst kinds of errors: intermittent faults, unclear when and why.

After digging for a while, I made one observation I didn’t expect: when calling the “Read Register” command, the ESP32 ROM bootloader returns two bulk transfers for what I expected to be a single 14-byte response.

Before going into the data bytes, you need to know that the bootloader uses SLIP framing to separate packets from one another. Every packet ends with a 0xC0 (192 decimal) byte, the END marker. A 0xC0 inside the data is escaped as 0xDB 0xDC (and 0xDB itself as 0xDB 0xDD), so a raw 0xC0 always marks a frame boundary. According to RFC 1055, a packet may also start with a 0xC0 (END marker). This flushes out any bytes garbled by line noise since the previous packet, which ensured proper packet parsing in the good old acoustic-coupler modem days. Not sure how much that really helped, but at least it is still common practice today.
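For reference, here is a minimal SLIP encoder/decoder following RFC 1055 (END = 0xC0, ESC = 0xDB, ESC_END = 0xDC, ESC_ESC = 0xDD), sketched in TypeScript. The encoder emits the optional leading END. The decoder simply drops END bytes, so it expects one complete, already delimited packet. The function names are my own.

```typescript
// RFC 1055 special bytes.
const END = 0xc0;
const ESC = 0xdb;
const ESC_END = 0xdc;
const ESC_ESC = 0xdd;

// Wrap one packet: leading END (flushes line noise), escaped payload, trailing END.
function slipEncode(payload: Uint8Array): Uint8Array {
  const out: number[] = [END];
  for (let i = 0; i < payload.length; i++) {
    const b = payload[i];
    if (b === END) out.push(ESC, ESC_END);
    else if (b === ESC) out.push(ESC, ESC_ESC);
    else out.push(b);
  }
  out.push(END);
  return Uint8Array.from(out);
}

// Unwrap one complete packet; END bytes are framing and are dropped.
function slipDecode(packet: Uint8Array): Uint8Array {
  const out: number[] = [];
  let escaped = false;
  for (let i = 0; i < packet.length; i++) {
    const b = packet[i];
    if (escaped) {
      out.push(b === ESC_END ? END : b === ESC_ESC ? ESC : b);
      escaped = false;
    } else if (b === ESC) {
      escaped = true;
    } else if (b !== END) {
      out.push(b);
    }
  }
  return Uint8Array.from(out);
}
```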

In the dumps above, you can see the 0xC0 (192) starting the SLIP frame, followed by a 1 (0x01) and the code of the command we sent (0x0A). These are the first two bytes of the response frame:

Then we receive another USB bulk transfer containing the length bytes, the value and some final data bytes, as defined by the bootloader protocol:

This leaves us with TWO packets on the USB bus for one SLIP frame of data in response to a single “Read Register” command.

[0xC0 0x01 0x0A] [0x04 0x00 0x40 0x22 0xFF 0x77 0x00 0x00 0x00 0x00 0xC0]

- 0xC0: SLIP start marker
- 0x01: must be 1 for responses
- 0x0A: Read Register command
- 0x0004: length of the returned data field (status, error and two reserved bytes), little-endian (0x04 0x00)
- 0x77FF2240: the value field, i.e. the register contents, little-endian (0x40 0x22 0xFF 0x77)
- 0x00: status, success
- 0x00: error, none
- 0x00 0x00: two reserved bytes
- 0xC0: SLIP end marker
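To make this layout concrete, here is a TypeScript sketch. It builds the (unescaped) Read Register request and parses the response body once the SLIP framing has been removed. The request layout is the serial protocol's command format: direction 0x00, command, 16-bit data length, 32-bit checksum (unused for this command), then the 32-bit address. The helper names are hypothetical.

The parser takes status and error from the first two data bytes. Read Register returns nothing but status bytes in its data field, so this should hold for both the ROM (four status bytes) and the stub (two status bytes).

```typescript
const READ_REG = 0x0a;

// Request: direction, command, data length (LE u16), checksum (LE u32), data.
function buildReadRegRequest(address: number): Uint8Array {
  const buf = new Uint8Array(12);
  const view = new DataView(buf.buffer);
  view.setUint8(0, 0x00); // direction: request
  view.setUint8(1, READ_REG);
  view.setUint16(2, 4, true); // data is one 32-bit address
  view.setUint32(4, 0, true); // checksum, unused for READ_REG
  view.setUint32(8, address >>> 0, true);
  return buf;
}

interface RegResponse {
  command: number;
  dataLength: number;
  value: number;
  status: number;
  error: number;
}

// Response: direction (1), command, data length (LE u16), value (LE u32), data.
function parseResponse(body: Uint8Array): RegResponse {
  if (body.length < 8 || body[0] !== 0x01) throw new Error("not a response");
  const view = new DataView(body.buffer, body.byteOffset, body.byteLength);
  const dataLength = view.getUint16(2, true);
  if (dataLength < 2 || body.length < 8 + dataLength) throw new Error("truncated");
  return {
    command: body[1],
    dataLength,
    value: view.getUint32(4, true),
    status: body[8], // 0 = success
    error: body[9],
  };
}
```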

So far so good; reading seems to work fine with my code. But once the “stub loader” is loaded, which extends the feature set to also read back data from flash, the response to the “Read Register” command changes slightly.

Some things changed. Here are the changes in short:

- we receive THREE bulk packets
- the first one is just the SLIP start marker
- the second one holds the “is a response packet” byte and the “Read Register” command (0x0A)
- the third packet carries the data, but with two status/error bytes instead of four
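On the host side, a decoder that keeps its state across bulk transfers does not care whether a frame arrives in one, two or three pieces, as long as none of the pieces gets lost. Here is a TypeScript sketch of such a stateful SLIP decoder (the class name is made up). The leading 0xC0 only produces an empty frame, which is skipped.

```typescript
const END = 0xc0;
const ESC = 0xdb;
const ESC_END = 0xdc;
const ESC_ESC = 0xdd;

// Stateful SLIP decoder: feed it every bulk transfer in arrival order;
// it returns the frames completed by that chunk.
class SlipStreamDecoder {
  private current: number[] = [];
  private escaped = false;

  push(chunk: Uint8Array): Uint8Array[] {
    const frames: Uint8Array[] = [];
    for (let i = 0; i < chunk.length; i++) {
      const b = chunk[i];
      if (b === END) {
        // A leading END (or two ENDs in a row) yields an empty frame: skip it.
        if (this.current.length > 0) frames.push(Uint8Array.from(this.current));
        this.current = [];
        this.escaped = false;
      } else if (this.escaped) {
        this.current.push(b === ESC_END ? END : b === ESC_ESC ? ESC : b);
        this.escaped = false;
      } else if (b === ESC) {
        this.escaped = true;
      } else {
        this.current.push(b);
      }
    }
    return frames;
  }
}
```

This only helps with reassembly. It cannot recover a transfer that never reached JavaScript in the first place.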

According to Espressif’s documentation, stub loaders send only two status bytes instead of four, so this is well documented, as shown in the screenshot about “Status Bytes”. No surprise here.

But why do we see three packets?

After some digging, I found the reason which – to me – is basically a bug. Or call it “unwanted behavior”.

If you check the stub loader code, it inspects every written byte for 0xC0 and, when it finds one, flushes the buffer immediately. This is meant to send a packet as soon as the final 0xC0 of a SLIP frame has been written, maybe for increased throughput (?). At the very least it’s a good measure to ensure every frame is transmitted as soon as it’s finished, reducing latency and avoiding responses that get stuck in the buffer.

A good measure, basically. But the first byte of a frame is 0xC0 as well, so every response suddenly causes two USB bulk transfers instead of one. The patch was meant to increase throughput, but has the side effect of sending more USB packets than needed. Not a deadly bug, but certainly unexpected behavior.

Another finding:
The ROM routine uart_tx_one_char() used by the bootloader also flushes the JTAG UART after every 0x0A (newline) byte it writes, which causes even more unnecessary, only partially filled bulk transfers.

That’s why every “Read Register” response uses at least two bulk transfers: its command code is 0x0A.

In the left path, the ESP32-C6 ROM checks each byte for 0x0A and, if found, flushes the buffer. Good for terminal output, where 0x0A ends a line.
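Here is a simplified TypeScript model (my own, not Espressif code) of how these flush rules turn one frame into several bulk packets. It appends the frame's bytes to a 64-byte endpoint buffer and flushes in four cases:

- when the buffer is full
- after a 0xC0 (stub rule)
- after a 0x0A (ROM rule)
- once at the end of the frame

With only the ROM rule it reproduces the 3 + 11 byte split of the dump above. With both rules it gives the 1 + 2 + 9 byte split seen with the stub. Note that under this model, any 0x0A inside the payload (e.g. in a register value) would cause yet another split.

```typescript
interface FlushRules {
  flushOnC0: boolean; // stub: flush right after writing any 0xC0
  flushOnLf: boolean; // ROM uart_tx_one_char: flush after writing 0x0A
  maxPacket: number; // bulk endpoint size, 64 at full speed
}

// Split the bytes written for one SLIP frame into the bulk packets the
// device would emit under the given (simplified) flush rules.
function simulatePackets(frame: number[], rules: FlushRules): number[][] {
  const packets: number[][] = [];
  let current: number[] = [];
  const flush = () => {
    if (current.length > 0) {
      packets.push(current);
      current = [];
    }
  };
  for (const b of frame) {
    current.push(b);
    if (current.length === rules.maxPacket) flush();
    else if ((rules.flushOnC0 && b === 0xc0) || (rules.flushOnLf && b === 0x0a)) flush();
  }
  flush(); // explicit flush once the frame is complete
  return packets;
}
```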

But why should this bother us?

I am not absolutely certain yet, but as it looks right now, I simply miss the final 9-byte transfer on most reads.

My WebUSB code receives a bulk transfer, quickly hands it over to some bottom-half code and re-issues the next transfer. This keeps the loop tight and the latency as low as it can get.
But it seems that in between, no USB transfer is queued in the WebUSB/WinUSB layer, and because of that the packet is never passed to my JS code.

We are talking about “you-get-what-your-system-is-capable-of” code execution latency and “no-guarantee-for-anything” JavaScript code. If a bulk transfer is queued, the driver layers below know how to handle incoming data and pass it to the waiting transfer in user space, which then resolves its promise.

If several small USB bulk transfers now arrive in quick succession, I am afraid they are not buffered until I queue the next bulk read using device.transferIn(). But then why does the two-transfer ROM bootloader response work, while the three-transfer stub response does not? Does WebUSB use some double-buffer-like queueing and drop the third bulk transfer?

I honestly don’t know yet.
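One thing that might be worth trying in this situation is to keep several transferIn() calls pending at once instead of re-issuing a single one. That way, a request should already be queued when the next short packet arrives. The sketch below makes two assumptions: the browser accepts multiple concurrent transferIn() calls on the same endpoint, and it completes them in submission order.

`BulkReader` and its parameters are hypothetical, and `dev` only needs WebUSB's `transferIn(endpointNumber, length)`. With `depth = 1` it degenerates to the single re-issuing loop described above.

```typescript
// Subset of WebUSB's USBInTransferResult / USBDevice used by this sketch.
interface InResult {
  data?: DataView;
  status?: string; // "ok" | "stall" | "babble"
}
interface BulkInDevice {
  transferIn(endpointNumber: number, length: number): Promise<InResult>;
}

class BulkReader {
  private stopped = false;

  constructor(
    private readonly dev: BulkInDevice,
    private readonly endpoint: number,
    private readonly onData: (chunk: Uint8Array) => void,
    private readonly depth = 4,
    // Request size should be a multiple of wMaxPacketSize (64 at full speed).
    private readonly length = 64 * 64,
  ) {}

  start(): void {
    for (let i = 0; i < this.depth; i++) this.issue();
  }

  stop(): void {
    this.stopped = true;
  }

  private issue(): void {
    if (this.stopped) return;
    this.dev.transferIn(this.endpoint, this.length).then(
      (result) => {
        if (result.status !== "ok" || !result.data) {
          this.stopped = true; // stall/babble: this sketch simply gives up
          return;
        }
        const d = result.data;
        this.onData(new Uint8Array(d.buffer, d.byteOffset, d.byteLength));
        this.issue(); // re-arm immediately to keep `depth` requests pending
      },
      () => {
        this.stopped = true; // device disconnected or transfer rejected
      },
    );
  }
}
```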

What now?

This is my current understanding of the issues I see when accessing the ESP32’s USB with WebUSB.
Let’s see where it leads me. Maybe patch the stub loaders? Or even use the new ones, which are Rust-based but do not seem to work as an out-of-the-box replacement?
But why are these “new ones” showing:
“This repository was archived by the owner on Mar 5, 2025. It is now read-only.”

Which are the recommended stub loaders now?

These? They also don’t work – “This project is experimental and not yet ready for production use.”

Unfortunately the new Rust-based loaders for ESP32-C6 don’t reply with the ‘OHAI’ magic…

I will update this post as soon as I find the solution to this problem.

Update

After some more experiments and modifying the stub loaders myself, I can conclude that this is feasible, but not without specialized stub loaders.

While I could now modify the stub loader to send each full frame in a single bulk transfer, I realized that reading from flash floods the host PC with even more bulk transfers. I could enqueue one huge transfer, which would most likely work, because WebUSB seems to concatenate non-flushed transfers. But this failed at least once in my tests.

This dump shows the first few hundred bytes read back by the flash reading routine. It looks okay at first sight, but a closer look reveals that 0x1BA (442) bytes are simply missing just before the “ration” string. As I had queued a transfer of up to 32k, I expected the data to arrive without any dropouts, but WinUSB/WebUSB isn’t working perfectly either.

Actual ROM content with missing parts highlighted
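Spotting such a gap by eye is tedious. Given a known-good reference of the same flash region (for example, one read over a native serial driver), a small TypeScript helper can locate the first dropout and its size. `findDropout` is a hypothetical helper. It assumes a single contiguous gap after which the received data matches the reference again, like the 0x1BA missing bytes before “ration” in the case above.

```typescript
interface Dropout {
  offset: number; // index in `received` where the data starts to diverge
  missing: number; // number of reference bytes skipped at that point
}

// Assumes one contiguous gap: everything after it must match the reference again.
function findDropout(expected: Uint8Array, received: Uint8Array): Dropout | null {
  let i = 0;
  while (i < received.length && i < expected.length && received[i] === expected[i]) i++;
  if (i === received.length) return null; // clean prefix, nothing dropped in between
  const tail = received.length - i;
  for (let gap = 1; i + gap + tail <= expected.length; gap++) {
    let matches = true;
    for (let k = 0; k < tail && matches; k++) {
      matches = received[i + k] === expected[i + gap + k];
    }
    if (matches) return { offset: i, missing: gap };
  }
  return null; // not explainable by a single gap
}
```

The brute-force search is quadratic in the worst case, which is fine for a few kilobytes of dump.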

It’s getting out of hand.

While I could still modify the stub loaders to send packets in a way that the WinUSB/WebUSB code can keep up with, it would be quite a bit of work. Not all USB-capable devices expose the required ROM functions (only C5/C6/C61/H2/P4), and the code would turn into a somewhat target-specific mess.

Since probably no one – not even me – really needs this, it would be more of an academic solution.

I will probably publish it as EXPERIMENTAL, but it is not meant for real use.

EDIT: Another thought…

I cannot find any code that waits for the USB endpoint buffer to be fetched by the USB host before it is populated again. Does this mean the ROM routines *and* the stub loader simply assume that the host is fast enough to fetch USB bulk transfers?

In other words, does the ESP send data as fast as it can, without waiting for a potentially slower host to queue the next burst transfers?
Maybe I will get an answer in the issue I reported.

Or do you have an idea? If so, please contact me.

Finally!!

I was finally able to fix the issues with just a small patch to the stub loaders. The cause lies somewhere between the stub loaders behaving suboptimally and the WebUSB concept not being as robust as native drivers. Who would have thought…

Key points:

- The USB host treats any short packet (a flush with fewer than 64 bytes) as the end of a transfer
- The stub frequently triggered such short packets:
  - Explicit flushes at the wrong time
  - SLIP start/end marker (0xC0) causing immediate flushes
  - ROM routines flushing on 0x0A
- This resulted in multiple partial bulk transfers per logical response, often 2-3 back-to-back.
- Native CDC drivers tolerate this because they continuously drain the endpoint
- WebUSB / WinUSB follow a host-driven request-response model and drop unexpected bulk transfers when, even for a very short period, no further request is queued
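The first key point is plain USB bulk semantics. An IN transfer completes either when the requested length has been received, or when a packet shorter than wMaxPacketSize arrives (including a zero-length packet). Below is a small TypeScript model of that rule (the function name is made up). A flush pattern like 1 + 2 + 9 bytes ends up as three separate transfers, each needing its own pending request. Full 64-byte packets followed by one short packet form a single transfer. The model assumes the request length is a multiple of the packet size.

```typescript
// Model of how a host splits a stream of bulk IN packets into completed
// transfers: a transfer ends on a short packet (< maxPacket, including a
// zero-length packet) or once `requestLength` bytes have arrived.
// Assumes requestLength is a multiple of maxPacket.
function groupIntoTransfers(packetSizes: number[], maxPacket: number, requestLength: number): number[] {
  const completed: number[] = [];
  let received = 0;
  for (const size of packetSizes) {
    received += size;
    if (size < maxPacket || received >= requestLength) {
      completed.push(received);
      received = 0;
    }
  }
  return completed; // bytes of a still-open transfer are not reported
}
```

A consequence worth keeping in mind: a response that is an exact multiple of 64 bytes needs a trailing zero-length packet. Otherwise the host keeps waiting for more data.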

Funnily, this patch also silently fixes the slow ESP32-S3 and ESP32-C3 flash reads I faced with the official stub loaders. At least the official release loaders still had that issue.

So I can finally close this case.