Developer · 9 min read

A Practical Guide to JSON: Formatting, Validating and Converting

JSON is everywhere, but it is easy to get wrong. A field guide to the syntax that trips people up, the difference between formatting and validation, and how to safely convert between JSON and CSV.

By The Utylo team · Published April 22, 2026

JSON has become the default way computers exchange structured data over the internet. It is in every web API, every config file, every infrastructure-as-code pipeline, and increasingly in places it really ought not to be. The format itself is small enough to fit on the back of a napkin — there are only six types and a couple of pieces of punctuation — but it is precise about what it accepts, and most of the bugs people run into come from the gap between "JSON-ish data that looks fine" and "data that actually parses". This guide covers the essentials of reading and writing JSON cleanly, the syntax rules that catch people out, and how to safely move data between JSON and other common formats like CSV.

The full grammar, in two paragraphs

A JSON document is one of six types of value: a string in double quotes ("hello"), a number (42, -3.14, 1e10), a boolean (true or false), the literal null, an array of values inside square brackets ([1, 2, 3]), or an object of key-value pairs inside curly braces ({"name": "Ada"}). Keys in an object must be strings in double quotes. Values inside arrays and objects can themselves be any of the six types, including more arrays and objects, and that is how you build arbitrarily deep structures.

That is essentially the whole specification. The complete grammar at json.org fits on one page. Most of the things that look like they ought to be legal JSON are not, and that is by design — JSON's value comes precisely from being strict.
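
The grammar is small enough to exercise in a few lines. A minimal sketch using Python's standard json module, showing how each of the six JSON types maps to a native type:

```python
import json

# One document containing all six JSON types: object, array,
# string, number, boolean (true/false), and null.
doc = """
{
  "name": "Ada",
  "age": 36,
  "verified": true,
  "nickname": null,
  "languages": ["mathematics", "logic"],
  "address": {"city": "London"}
}
"""

data = json.loads(doc)
print(type(data["name"]).__name__)      # str
print(type(data["age"]).__name__)       # int
print(type(data["verified"]).__name__)  # bool
print(data["nickname"])                 # None
print(type(data["languages"]).__name__) # list
print(type(data["address"]).__name__)   # dict
```

Note that JSON's null becomes Python's None, and true/false become True/False; every language's parser makes an equivalent mapping.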

The six rules that account for 90% of parse errors

  1. Strings must use double quotes. Single quotes are not legal JSON. {'name': 'Ada'} looks fine to a human, parses fine in JavaScript, and is rejected by every JSON parser.
  2. Object keys must be quoted. Even when the key is a single word with no special characters. {name: "Ada"} is JavaScript, not JSON.
  3. No trailing commas. The last item in an array or object must not be followed by a comma. JavaScript and Python both tolerate trailing commas; JSON does not. This is the single most common cause of parse errors when hand-editing config files.
  4. No comments. JSON has no syntax for comments. Some tools support extensions like JSON5 or JSONC that allow them, but if you are passing data to a generic JSON parser, comments will fail.
  5. Numbers do not have leading zeros, hex, or special values. 007 is not a valid JSON number; neither is 0xFF, NaN, or Infinity. If you need any of these, encode them as strings and convert at the edge.
  6. Strings need their special characters escaped. Inside a string, the backslash, double quote, and a handful of control characters must be escaped (\\, \", \n, etc.). Forgetting this is how you end up with broken strings that span the rest of the document.
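
Each of these rules can be demonstrated against a strict parser. A quick sketch feeding one violation of each rule to Python's json module, all of which are rejected:

```python
import json

bad_documents = [
    "{'name': 'Ada'}",        # rule 1: single quotes
    '{name: "Ada"}',          # rule 2: unquoted key
    "[1, 2, 3,]",             # rule 3: trailing comma
    '{"a": 1} // comment',    # rule 4: comments
    '{"n": 007}',             # rule 5: leading zero
    '{"s": "line1\nline2"}',  # rule 6: \n is a real newline in this
                              # Python literal, i.e. an unescaped
                              # control character inside the string
]

for doc in bad_documents:
    try:
        json.loads(doc)
        print("parsed (unexpected):", doc)
    except json.JSONDecodeError as e:
        print(f"rejected: {doc!r} -> {e.msg} at char {e.pos}")
```

The e.pos reported by the parser is where it gave up, which, as noted below, is often a few characters past the actual mistake.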

When a parser fails, it usually points at the line and character where it gave up — but the actual mistake is typically a few characters earlier. Paste the document into the JSON Formatter and the offending location is usually obvious within a second.

Formatting versus validating

These two operations look similar but answer different questions, and it is worth being clear about which one you actually want.

Formatting takes a valid JSON document and re-emits it with consistent indentation and line breaks (and, optionally, sorted keys). Pretty-printing makes it easier to read; minifying strips whitespace to make it as small as possible for transport. Formatting only succeeds if the input parses, so a successful format implicitly proves the document is valid JSON. But formatting tells you nothing about whether the document means what you intended.

Validation, in the strict sense, checks that a document conforms to a schema — the right keys are present, the values have the right types, the numbers are within range, the dates look like dates. JSON Schema is the standard for describing those rules, and a JSON Schema validator will catch problems like "this user record is missing an email" that a formatter never could.

For day-to-day debugging, formatting is usually what you want: paste the suspicious document in, see whether it parses, and if it does, eye the structure. For systems where the same shape of data needs to be produced reliably by multiple sources, write a JSON Schema and run a real validator.
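
The distinction can be made concrete. A minimal sketch: the document below formats perfectly well, because it parses, yet it fails a schema-style check. The hand-rolled shape check here stands in for a real validator (a production pipeline would use a JSON Schema validator such as the third-party jsonschema package); the expected fields are illustrative:

```python
import json

# Valid JSON, but missing the "email" field.
record = json.loads('{"name": "Ada", "age": 36}')

# Formatting succeeds: the document parses, so pretty-printing works...
pretty = json.dumps(record, indent=2)

# ...but only a schema-style check catches the missing field.
expected = {"name": str, "age": int, "email": str}
problems = [
    key for key, typ in expected.items()
    if key not in record or not isinstance(record[key], typ)
]
print(problems)  # ['email']
```

This is the formatter/validator gap in miniature: "parses" and "has the shape I expect" are separate questions.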

Pretty-printing for humans, minifying for machines

Indented, multi-line JSON is much easier for humans to read but considerably larger than the same data minified onto a single line — the whitespace alone can add half again the size or more, depending on nesting depth. The right format depends on the audience.

  • Files committed to a repository (config, fixtures, locale data) should be pretty-printed. Diffs are unreadable otherwise, and the size cost is irrelevant when files are gzipped for transport.
  • Data sent over the network (API responses, web socket messages) should be minified. Every byte saved is a byte less to transmit, and computers do not care about indentation.
  • Data displayed in a UI should be pretty-printed at render time, not stored that way. Store the smallest representation, format on demand.
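
Both forms come from the same serialiser with different options. A sketch using Python's json module, where indent controls pretty-printing and separators strips the default spaces for the smallest output:

```python
import json

data = {"name": "Ada", "age": 36, "languages": ["mathematics", "logic"]}

# For humans and version-control diffs.
pretty = json.dumps(data, indent=2)

# For the wire: no newlines, no spaces after ":" or ",".
minified = json.dumps(data, separators=(",", ":"))

print(len(pretty), "bytes pretty vs", len(minified), "bytes minified")
```

Both strings parse back to the identical structure, which is exactly why there is no need to store both.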

The JSON Formatter does both with a single click, so there is no reason to keep the indented and minified versions in sync by hand.

Going to and from CSV

CSV is the lingua franca of spreadsheets and ETL pipelines, and a lot of work involves shuttling data between CSV and JSON. The conversion is mostly mechanical, but it has a couple of sharp edges.

The natural mapping from CSV to JSON is an array of objects, where each row becomes one object and the header row supplies the keys. A CSV like

name,age,city
Ada,36,London
Linus,55,Helsinki

becomes

[
  { "name": "Ada", "age": "36", "city": "London" },
  { "name": "Linus", "age": "55", "city": "Helsinki" }
]

The catches: CSV has no concept of types, so every value comes through as a string unless the converter is told otherwise. CSV cells can contain commas if they are wrapped in double quotes ("Smith, John"), and embedded quotes are doubled ("She said ""hi"""). Different tools handle the header row differently, and some CSVs do not have one at all. The CSV to JSON converter handles these cases out of the box, but if you are writing your own pipeline it is worth using a real CSV parser rather than splitting on commas.
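
Python's standard csv module illustrates both the mapping and the catches. A sketch (the sample rows are illustrative) in which DictReader takes the header row as keys and correctly handles a quoted comma, while every value still arrives as a string until converted explicitly:

```python
import csv
import io
import json

csv_text = '''name,age,city
Ada,36,London
"Smith, John",55,"Oslo, Norway"
'''

# DictReader handles quoted commas and doubled quotes correctly --
# never split on "," by hand.
rows = list(csv.DictReader(io.StringIO(csv_text)))

# CSV has no types: every value comes through as a string.
assert rows[0]["age"] == "36"

# Convert types explicitly at the edge if the consumer needs them.
for row in rows:
    row["age"] = int(row["age"])

print(json.dumps(rows, indent=2))
```

The output is the array-of-objects shape shown above, with the quoted cells intact.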

The reverse direction has its own surprise: nested JSON has no good flat representation. If your records contain arrays or sub-objects, you have to decide whether to drop them, flatten them with dotted keys, or serialise them back as JSON strings inside their cells. None of those choices is wrong, but you should pick one consciously.
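
One common convention — dotted keys for sub-objects, JSON strings for arrays — can be sketched in a dozen lines. This is one of several reasonable choices, not the only correct one:

```python
import json

def flatten(obj: dict, prefix: str = "") -> dict:
    """Flatten nested objects into dotted keys; serialise arrays
    back to JSON strings inside their cells."""
    flat = {}
    for key, value in obj.items():
        dotted = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, dotted + "."))
        elif isinstance(value, list):
            flat[dotted] = json.dumps(value)
        else:
            flat[dotted] = value
    return flat

record = {"name": "Ada", "tags": ["math"],
          "address": {"city": "London", "geo": {"lat": 51.5}}}
print(flatten(record))
# {'name': 'Ada', 'tags': '["math"]', 'address.city': 'London',
#  'address.geo.lat': 51.5}
```

Whatever convention you pick, apply it symmetrically on the way back so a round trip through CSV does not silently lose structure.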

Encoding, escaping, and the Unicode question

JSON text is Unicode, and in practice always UTF-8 (RFC 8259 requires UTF-8 for data exchanged between systems); there is almost never a reason to use anything else. Where things get awkward is when JSON has to be embedded inside another encoding — pasted into a URL, stored in HTML, or wrapped in a JWT. In those cases, the JSON gets URL-encoded, HTML-entity-encoded, or Base64-encoded before transport, and decoded on the other side. Confusing the layers is the cause of most "why does my JSON have &amp; in it?" questions. The URL Encoder and Base64 Encoder are useful for unwrapping these layers when debugging.
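
Keeping the layers straight is easier when you see them applied and removed in order. A sketch of the two wrapping layers mentioned above, using Python's standard library; the payload is illustrative:

```python
import base64
import json
import urllib.parse

payload = json.dumps({"q": "a&b", "ok": True})

# Layer 1: URL-encoding, e.g. for a query-string parameter.
in_url = urllib.parse.quote(payload)

# Layer 2: Base64, e.g. for a JWT-style wrapper.
in_b64 = base64.urlsafe_b64encode(payload.encode("utf-8")).decode("ascii")

print(in_url)
print(in_b64)

# Unwrap each layer before handing the result to the JSON parser.
print(json.loads(urllib.parse.unquote(in_url)))
print(json.loads(base64.urlsafe_b64decode(in_b64).decode("utf-8")))
```

The rule of thumb: decode the outer encodings first, parse the JSON last, and never try to do both in one step.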

A few good habits

  • Always validate at the boundary. When you receive JSON from anywhere you do not control, parse it inside a try/catch and reject anything that does not match the shape you expect.
  • Use a real parser, never a regex. Hand-rolled JSON parsing is the source of an embarrassing number of security vulnerabilities.
  • Prefer null to omitting a key when a field is genuinely absent. Optional keys are convenient but make schema design harder.
  • Keep keys short and consistent: created_at in one object and createdAt in another is a recipe for bugs downstream.
  • When in doubt, format and inspect. The JSON Formatter takes two seconds and catches problems that would take ten minutes to find by eye.
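
The first habit — validate at the boundary — is worth a concrete sketch. The field name here is illustrative, and the shape check is deliberately minimal:

```python
import json

def parse_user(raw: str) -> dict:
    """Parse untrusted JSON and reject anything that is not the
    expected shape, instead of letting bad data flow downstream."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as e:
        raise ValueError(f"not valid JSON: {e.msg}") from e
    if not isinstance(data, dict) or not isinstance(data.get("email"), str):
        raise ValueError("expected an object with a string 'email' field")
    return data

print(parse_user('{"email": "ada@example.com", "name": "Ada"}'))
```

Both failure modes — unparseable input and valid-but-wrong-shape input — surface as one clear error at the edge, which is far easier to debug than a KeyError three layers deep.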

JSON is not a complicated format, but its strictness is what makes it reliable. The handful of rules above account for almost every practical problem people run into, and a quick pass through a formatter clears up most of the rest.
