Unicode Character problems in JSON and playing with BOM (Byte Order Mark)

May 8, 2021

We are currently working on a JSON API whose endpoint is written in C# and whose requests are made from PHP. I want to share what I learned during this integration. Here we go!

Let’s start our article with a question.

Is it enough to set “application/json; charset=utf-8” as the Content Type of the page to solve Unicode problems?

Let’s spare the developer who came here for a quick answer and give the conclusion up front.

The answer is NO!

Changing the Content Type only changes the label on the response. It does not change the encoding in which the page is actually written.
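To see why the label alone cannot help, here is a small sketch in Python (used only for illustration; the endpoint in this article is C#, and the sample text and variable names are assumptions). It writes the same JSON text twice: once as UTF-8 and once with an encoding that cannot represent the characters. The Content Type string is identical in both cases, yet only the bytes decide what the reader gets.

```python
import json

# Hypothetical payload; the sample text is an assumption for illustration.
payload = json.dumps({"city": "Şanlıurfa"}, ensure_ascii=False)

# The label we would send in the header. It describes the body, nothing more.
content_type = "application/json; charset=utf-8"

# Case 1: the body really is written as UTF-8, so it matches the label.
good_body = payload.encode("utf-8")

# Case 2: the body is written with an encoding that lacks these characters.
# Unencodable characters are replaced with "?" and the originals are lost.
bad_body = payload.encode("ascii", errors="replace")

print(content_type, good_body.decode("utf-8"))
print(content_type, bad_body.decode("utf-8"))
```

The second line of output contains question marks even though the header still claims UTF-8.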

So what is needed to prevent Unicode characters from appearing as question marks?

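The answer is to set the Content Encoding, not the Content Type! Here, Content Encoding means the character encoding in which the response body is written. It should not be confused with the HTTP Content-Encoding header, which describes compression such as gzip.

Of course, the syntax for this differs from language to language. It is enough to adapt this line to the language your application is written in.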

Let’s see how it is used by putting the full version of the code here:

Note: If you serialize with (nt, Formatting.Indented), \r\n sequences will appear after each comma on the side that decodes this endpoint. \r is the carriage return character (ASCII 13) and \n is the line feed, or newline, character (ASCII 10). For that reason, it is not wise for developers who design endpoints to pretty-print the JSON they return. Remember that every extra character you add is something the other side has to deal with when decoding.
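The effect of indented output can be sketched as follows. This is Python rather than the C# serializer discussed above, and Python's json module emits only \n where the note describes \r\n, so treat it as an analogy under those assumptions. The sample record is hypothetical.

```python
import json

record = {"id": 1, "name": "test"}  # hypothetical sample data

# Compact form: no whitespace between tokens.
compact = json.dumps(record, separators=(",", ":"))

# Indented form: line breaks and spaces are added for human readers.
indented = json.dumps(record, indent=2)

# repr() makes the otherwise invisible line breaks visible.
print(repr(compact))
print(repr(indented))

# The indented form carries extra whitespace characters over the wire.
print(len(compact), len(indented))
```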

Now, if the company that owns the JSON endpoint does not fix the Unicode character problem, whether deliberately or because of how the company is organized, can the developer who reads and decodes the endpoint solve the problem on their side?

Yes, you can set the encoding while reading the response:

This is the final version:
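The same idea can be sketched in Python instead of PHP: when the endpoint sends bytes in some other encoding, the reader decodes with that encoding explicitly before parsing. The choice of ISO-8859-1 and the sample text are assumptions for illustration, not the encoding used by the endpoint in this article.

```python
import json

# Simulated response body: JSON text written as ISO-8859-1 (assumed encoding).
# "\u00e9" is the letter e with an acute accent.
raw_body = '{"name": "caf\u00e9"}'.encode("iso-8859-1")

# Decoding with the wrong encoding damages the non-ASCII character:
# it is replaced by U+FFFD, the replacement character.
wrong = raw_body.decode("utf-8", errors="replace")

# Decoding with the encoding the sender actually used recovers the text.
right = raw_body.decode("iso-8859-1")

print(wrong)
print(json.loads(right)["name"])
```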

Now that the main point of this article is covered, let’s go into some details.

How can we debug the error during processing?

The main purpose here is to find the cause of the error and tell the endpoint's developers what needs to be corrected.

Let’s debug and inspect the value:

When you debug the PHP code, the $json data looks fine: no leading or trailing spaces and no unreadable characters in its content. Or so you think :)

Copy the value and paste it into the text box at this URL: https://apps.timwhitlock.info/unicode/inspect
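What such an inspector does can be approximated locally. The sketch below is Python, not PHP, and both the helper name and the sample string are assumptions: the string starts with a byte order mark (U+FEFF), which is invisible when printed. Listing the code points one by one exposes it.

```python
import unicodedata

def describe_code_points(text):
    """Return one 'U+XXXX NAME' line per character in text."""
    lines = []
    for ch in text:
        # Some characters (for example control characters) have no name.
        name = unicodedata.name(ch, "<no name>")
        lines.append(f"U+{ord(ch):04X} {name}")
    return lines

# Hypothetical value that looks like clean JSON when printed.
suspect = "\ufeff{}"

for line in describe_code_points(suspect):
    print(line)
```

Any code point that shows up in this listing but not on screen is a candidate for the cause of the decode error.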