Burp Suite User Forum


How can I get XML view of a response that is in utf-16?

Roman | Last updated: Jul 05, 2016 04:06PM UTC

Hi,

I don't have a convenient way to view responses whose bodies are XML encoded in UTF-16. I think handling this would involve a code change, but if there's a configuration option I've overlooked, please let me know.

When I look at a response whose body is XML encoded in UTF-16, the Raw tab of the Response shows the headers normally, but most of the characters in the XML are replaced by boxes. The XML tab, which formats the XML nicely, is also missing because the response is not recognized as XML.

What I'd like is for the XML tab to be present, just as it is for responses in UTF-8 or ASCII. Let me know what you think.

Cheers, Roman

PortSwigger Agent | Last updated: Jul 06, 2016 08:03AM UTC

Do the response headers state that the content type is using the UTF-16 charset? In its default settings, Burp looks in the response headers to determine the charset, and tries to render the response using that. If the charset isn't stated, or it isn't working for some other reason, you can tell Burp what charset to use at User options / Display / Character sets.

Burp User | Last updated: Jul 14, 2016 09:32PM UTC

Sorry for the late reply. The response comes back with the following header:

Content-Type: application/soap+xml; charset=utf-16LE
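For readers following along, here is a minimal sketch of pulling the charset parameter out of a Content-Type value like the one above. It uses Python's standard `email.message.Message` parser; the helper name `charset_from_content_type` is made up for illustration, and this is not a description of how Burp itself reads the header.

```python
from email.message import Message


def charset_from_content_type(header_value):
    """Return the lowercased charset parameter of a Content-Type value, or None."""
    msg = Message()
    msg["Content-Type"] = header_value
    # get_content_charset() lowercases the value and returns None if it is absent.
    return msg.get_content_charset()


charset = charset_from_content_type("application/soap+xml; charset=utf-16LE")
print(charset)  # utf-16le
```

The returned name can be passed straight to `bytes.decode()`, since Python accepts `utf-16le` as an alias for its little-endian UTF-16 codec.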

PortSwigger Agent | Last updated: Jul 15, 2016 07:44AM UTC

Thanks for confirming. If you go to User options / Display / Character sets and look in the drop-down next to "Use a specific character set", do you see this charset listed? If so, Burp ought to be making use of it based on the header.

Burp User | Last updated: Jul 15, 2016 02:11PM UTC

It is listed in the drop-down. It doesn't help if I switch from "Recognize automatically based on message headers" to "Use a specific character set: UTF-16LE". I still see Asian characters instead of XML.

PortSwigger Agent | Last updated: Jul 15, 2016 02:16PM UTC

Ok, thanks. And is the actual response genuinely encoded using the specified charset? If the stated charset is incorrect then you'd need to figure out the actual charset in use in order to view its characters correctly.

Burp User | Last updated: Jul 15, 2016 02:25PM UTC

The headers in the response are ASCII/UTF-8, but I think that's expected. The XML in the body of the response is in UTF-16. It's also interesting to look at the Hex tab for the response. Both the headers and the body show up correctly in the "plain text" section on the right. The hex for the body shows that every other byte is 00.
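The "every other byte is 00" pattern is what mostly-ASCII text looks like in UTF-16, and which positions hold the zeros hints at the byte order. A rough sketch of that observation follows; the function name and the 80%/20% thresholds are arbitrary assumptions for illustration, and the heuristic only works for text that is largely ASCII.

```python
def guess_utf16_byte_order(body):
    """Guess the UTF-16 byte order of mostly-ASCII text from where the 0x00 bytes fall."""
    pairs = len(body) // 2
    if pairs == 0:
        return None
    even_zeros = body[0::2].count(0)  # zeros in the first byte of each pair
    odd_zeros = body[1::2].count(0)   # zeros in the second byte of each pair
    if odd_zeros >= 0.8 * pairs and even_zeros <= 0.2 * pairs:
        return "utf-16-le"  # low byte first: 3c 00 3f 00 ...
    if even_zeros >= 0.8 * pairs and odd_zeros <= 0.2 * pairs:
        return "utf-16-be"  # high byte first: 00 3c 00 3f ...
    return None


sample = '<?xml version="1.0" encoding="utf-16"?>'
print(guess_utf16_byte_order(sample.encode("utf-16-le")))  # utf-16-le
print(guess_utf16_byte_order(sample.encode("utf-16-be")))  # utf-16-be
```

Checking whether the 00 comes first or second in each pair in the Hex tab is the manual version of the same test.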

PortSwigger Agent | Last updated: Jul 15, 2016 02:33PM UTC

Have you tried selecting UTF-16 manually as the charset, in case they got the LE part wrong? Also, after manually selecting a charset, ensure the item is redisplayed by selecting something else and then coming back to it.

Burp User | Last updated: Jul 15, 2016 02:54PM UTC

If I manually select UTF-16 (not UTF-16LE) I can see these responses correctly. Not surprisingly, this breaks display of the responses that are in UTF-8, so the workflow is awkward, but at least I can see it all in Burp now. I just noticed another thing. The response Content-Type header is application/soap+xml; charset=utf-16LE, but the XML in the body starts with this declaration: <?xml version="1.0" encoding="utf-16"?>
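One possible explanation, which the thread does not confirm because the raw bytes are never shown, is that the body is big-endian UTF-16 with a byte-order mark. The sketch below builds such a body as an assumption and decodes it both ways. A BOM is included on purpose so the result of the generic `utf-16` codec does not depend on any platform default.

```python
import codecs

xml = '<?xml version="1.0" encoding="utf-16"?><a/>'

# Assumed body: BOM (FE FF) followed by big-endian code units.
body = codecs.BOM_UTF16_BE + xml.encode("utf-16-be")

# "utf-16" reads the BOM, picks the matching byte order, and drops the BOM.
as_generic = body.decode("utf-16")

# "utf-16-le" ignores the BOM and swaps every pair, so "<" (00 3C) becomes U+3C00.
as_le = body.decode("utf-16-le")

print(as_generic == xml)        # True
print(hex(ord(as_le[1])))       # 0x3c00, a CJK ideograph
```

With the bytes swapped, ASCII characters land mostly in CJK ranges, which would match seeing Asian characters instead of XML.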

PortSwigger Agent | Last updated: Jul 15, 2016 03:52PM UTC

Ok, so it looks like the Content-Type header is wrong. In general, stating the charset within the content itself isn't reliable, because you need to know the charset in order to interpret the content correctly. Browsers often have heuristics that let them detect or infer common errors. We're aware of the clunky workflow in cases like this where you need to switch charsets manually. Ideally, we'd let you select this within the panel itself. Can't promise an ETA on that feature, sorry.
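As an illustration of the kind of heuristic mentioned here, the sketch below looks at the first few bytes of a body for a byte-order mark or for the byte pattern of `<?`, loosely following the detection approach described in the non-normative appendix of the XML 1.0 specification. The function name is hypothetical, UTF-32 and EBCDIC are deliberately left out, and this is not how Burp or any particular browser is known to behave.

```python
import codecs
import re

_DECL = re.compile(rb'<\?xml[^>]*?encoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']')


def sniff_xml_encoding(body):
    """Best-effort guess at an XML body's encoding from its leading bytes."""
    # 1. A byte-order mark is the strongest signal.
    if body.startswith(codecs.BOM_UTF8):
        return "utf-8"
    if body.startswith(codecs.BOM_UTF16_BE):
        return "utf-16-be"
    if body.startswith(codecs.BOM_UTF16_LE):
        return "utf-16-le"
    # 2. No BOM: "<?" has a recognizable shape in each UTF-16 byte order.
    if body.startswith(b"\x00<\x00?"):
        return "utf-16-be"
    if body.startswith(b"<\x00?\x00"):
        return "utf-16-le"
    # 3. ASCII-compatible start: the declaration itself is now readable.
    if body.startswith(b"<?xml"):
        match = _DECL.match(body)
        return match.group(1).decode("ascii").lower() if match else "utf-8"
    return None


print(sniff_xml_encoding('<?xml version="1.0"?>'.encode("utf-16-le")))  # utf-16-le
```

The declaration is only consulted in step 3, once the byte-level checks have established that it can be read as ASCII, which is the circularity the reply above describes.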

Burp User | Last updated: Jul 15, 2016 04:01PM UTC

No worries. Thanks for such quick replies! Any thoughts on a good way to learn the difference(s) between UTF-16LE and UTF-16?

PortSwigger Agent | Last updated: Jul 18, 2016 10:48AM UTC

The LE part simply refers to the byte ordering within each two-byte UTF-16 code unit. The order can be either big endian (BE) or little endian (LE), depending on which byte of the pair comes first: the high byte for BE, the low byte for LE.
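A small sketch of that difference, using Python's explicit `utf-16-be` and `utf-16-le` codecs (the helper name `utf16_bytes` is made up for this example):

```python
def utf16_bytes(ch):
    """Return the (big-endian, little-endian) UTF-16 bytes for one character."""
    return ch.encode("utf-16-be"), ch.encode("utf-16-le")


for ch in ("<", "\u20ac"):  # "<" is U+003C, the euro sign is U+20AC
    be, le = utf16_bytes(ch)
    print(f"U+{ord(ch):04X}  BE: {be.hex(' ')}  LE: {le.hex(' ')}")
# U+003C  BE: 00 3c  LE: 3c 00
# U+20AC  BE: 20 ac  LE: ac 20
```

The same two bytes appear in both cases; only their order changes. A charset name without a BE or LE suffix leaves the order to be worked out from a byte-order mark or a default.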

Burp User | Last updated: Jul 18, 2016 07:39PM UTC

Ok. Thanks for all your help.
