The first time I came across YAML was around a year ago, when I used it to write OpenAPI definitions to document a RESTful API with Swagger and, to be honest, I really hated it.
Being a JSON “fan”, the YAML syntax felt weird and unnatural to me, so for a while, I didn’t pay any attention to it.
This changed a few months ago, when I started to get into CI/CD, since both Azure and GitLab pipelines require a YAML file to set up. So I finally decided to properly learn YAML, and after doing some reading I found the ideas behind it fascinating.
In this article I’ll cover the basics of YAML, including its main goals, basic syntax and some of its more complex features.
Introduction
YAML is a data-serialization language often used for configuration files, such as OpenAPI specifications or CI/CD pipelines.
Fun fact! 🤓
According to the YAML 1.0 specification document (2001-05-26), the acronym “YAML” stood for “Yet Another Markup Language”, but it was later changed to the recursive acronym “YAML Ain’t Markup Language” in the 2002-04-07 specification.
As stated in the latest spec, YAML is designed to be friendly to people working with data and achieves “unique cleanness” by minimizing the use of structural characters, allowing the data to appear in a natural and meaningful way.
The latest spec also states that YAML 1.2 treats JSON as an official subset, meaning that most JSON documents are also valid YAML and can be read by a YAML parser.
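To make the JSON relationship concrete, here is a small sketch of a document written in JSON syntax that a YAML 1.2 parser should also accept. The keys are made up for illustration, and the first line is a YAML comment, so it would have to be removed for the file to be strict JSON:

```yaml
# JSON-style braces, brackets and quotes are valid YAML too:
{
  "title": "Introduction to YAML",
  "tags": ["yaml", "json"],
  "published": false
}
```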
YAML makes data structures easy to inspect by using indentation-based scoping (similar to Python).
Another fun fact! 🤓
DEV.to articles use YAML to define custom variables like title, description, tags, etc.
Basic Syntax
A YAML document is most often a collection of key-value pairs (a mapping) where each value can be as simple as a string or as complex as a tree. Here are a few notes about YAML syntax:
- Indentation is used to denote structure. Tabs are not allowed for indentation, and the number of spaces doesn’t matter as long as a child node is more indented than its parent and sibling nodes share the same indentation.
- UTF-8, UTF-16 and UTF-32 encodings are allowed.
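A quick sketch of the indentation rule, using made-up keys: both mappings below should parse fine, because what matters is that children are indented more than their parent and that siblings line up with each other.

```yaml
# Two spaces per level:
server:
  host: localhost
  port: 8080
# Four spaces also works, as long as siblings line up:
database:
    host: localhost
    port: 5432
```

Mixing widths like this is legal but hard to read, so picking one width (two spaces is common) and sticking to it is probably the better habit.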
Strings
# Strings don't require quotes:
title: Introduction to YAML
# But you can still use them:
title-w-quotes: 'Introduction to YAML'
# Multiline strings start with |
execute: |
  npm ci
  npm build
  npm test
The above code will translate to JSON as:
{
  "title": "Introduction to YAML",
  "title-w-quotes": "Introduction to YAML",
  "execute": "npm ci\nnpm build\nnpm test\n"
}
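Multiline strings have a couple of variations worth knowing. The sketch below (with made-up keys) shows the folded style, which joins the lines with spaces, and the strip indicator, which drops the trailing newline:

```yaml
# ">" folds single line breaks into spaces:
summary: >
  This text is written on
  several lines but is read
  as one.
# "|-" keeps the line breaks but strips the final one:
execute: |-
  npm ci
  npm test
```

Here summary should come out as a single line ending in a newline, and execute as "npm ci\nnpm test" without the trailing newline.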
Numbers
# Integers:
age: 29
# Float:
price: 15.99
# Scientific notation:
population: 2.89e+6
The above code will translate to JSON as:
{
  "age": 29,
  "price": 15.99,
  "population": 2890000
}
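A few other number forms exist as well. This is a small sketch with made-up keys; note that JSON has no way to represent infinity or NaN, so those values don't translate cleanly:

```yaml
# Hexadecimal integer (255):
color: 0xFF
# Infinity and not-a-number:
limit: .inf
missing: .nan
```

Octal is a spot where versions differ: YAML 1.1 writes it with a leading 0 (014) while YAML 1.2 uses 0o14, so the result depends on which version your parser follows.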
Boolean
# Boolean values can be written in different ways (each line is an alternative; a real document can't repeat a key):
published: false
published: False
published: FALSE
All of the above will translate to JSON as:
{
  "published": false
}
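One thing to watch out for: parsers that follow the older YAML 1.1 rules also treat words like yes, no, on and off as booleans. Whether that applies depends on the parser, so this is a defensive sketch with made-up keys rather than a rule:

```yaml
# May be read as the boolean false by a YAML 1.1 parser:
country: NO
# Quoted, so it is always a string:
country-code: "NO"
```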
Null values
# Null can be represented by simply not setting a value (the lines below are alternatives, not one document):
null-value:
# Or more explicitly:
null-value: null
null-value: NULL
null-value: Null
All of the above will translate to JSON as:
{
  "null-value": null
}
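Two related details, sketched with made-up keys: the tilde is one more spelling of null, and an empty quoted string is not null.

```yaml
# "~" is another way to write null:
middle-name: ~
# An empty quoted string is a string, not null:
nickname: ""
```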
Dates & timestamps
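ISO 8601-formatted dates can be used, like so (whether they become native dates or stay strings depends on the parser, since timestamps are not part of the YAML 1.2 core schema):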
date: 2002-12-14
canonical: 2001-12-15T02:59:43.1Z
iso8601: 2001-12-14t21:59:43.10-05:00
spaced: 2001-12-14 21:59:43.10 -5
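If you want a date to stay a plain string no matter which parser reads it, quoting it should do the trick. A minimal sketch with made-up keys:

```yaml
# May be resolved to a date by the parser:
released: 2002-12-14
# Always a string:
released-text: "2002-12-14"
```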
Sequences
Sequences allow us to define lists in YAML:
# A list of numbers using hyphens:
numbers:
  - one
  - two
  - three
# The inline (flow) version of the same list, shown as an alternative:
numbers: [ one, two, three ]
Both of the above sequences will parse to JSON as:
{
  "numbers": [
    "one",
    "two",
    "three"
  ]
}
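Sequences can also contain other sequences or inline mappings. A short sketch, with keys and values invented for illustration:

```yaml
# A sequence of sequences:
matrix:
  - [1, 2]
  - [3, 4]
# A sequence of inline mappings:
books:
  - { title: The Hobbit, pages: 310 }
  - { title: Animal Farm, pages: 112 }
```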
Nested values
We can use all of the above types to create an object with nested values, like so:
# Nineteen eighty four novel data.
nineteen-eighty-four:
  author: George Orwell
  published-at: 1949-06-08
  page-count: 328
  description: |
    A Novel, often published as 1984, is a dystopian novel by English novelist George Orwell.
    It was published in June 1949 by Secker & Warburg as Orwell's ninth and final book.
Which will translate to JSON as:
{
  "nineteen-eighty-four": {
    "author": "George Orwell",
    "published-at": "1949-06-08T00:00:00.000Z",
    "page-count": 328,
    "description": "A Novel, often published as 1984, is a dystopian novel by English novelist George Orwell.\nIt was published in June 1949 by Secker & Warburg as Orwell's ninth and final book.\n"
  }
}
List of objects
Combining sequences and nested values, we can create a list of objects.
# Let's list books:
- nineteen-eighty-four:
    author: George Orwell
    published-at: 1949-06-08
    page-count: 328
    description: |
      A Novel, often published as 1984, is a dystopian novel by English novelist George Orwell.
- the-hobbit:
    author: J. R. R. Tolkien
    published-at: 1937-09-21
    page-count: 310
    description: |
      The Hobbit, or There and Back Again is a children's fantasy novel by English author J. R. R. Tolkien.
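An alternative shape, sketched here with an assumed title key, gives every item the same fields instead of wrapping each book in its own key. That tends to be easier to loop over once it's parsed:

```yaml
- title: Nineteen Eighty-Four
  author: George Orwell
  page-count: 328
- title: The Hobbit
  author: J. R. R. Tolkien
  page-count: 310
```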
Distinctive Features
The following are some more complex features that caught my attention and that also differentiate YAML from JSON.
Comments
As you’ve probably already noticed in my prior examples, YAML allows comments, which start with #.
# This is a really useful comment.
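Comments can also sit at the end of a line, and a # only starts a comment when it follows whitespace. A small sketch with made-up keys:

```yaml
# A full-line comment.
title: Introduction to YAML  # An inline comment (note the space before "#").
# Inside quotes, "#" is just a character:
channel: "#general"
# Without a space before it, "#" is part of the value:
issue: bug#42
```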
Reusability with Node Anchors
Node anchors mark a node for future reference, which allows us to reuse the node. To mark a node we use the & character, and to reference it we use *.
In the following example we’ll define a list of books and reuse the author data, so we only have to define it once:
# The author data:
author: &gOrwell
  name: George
  last-name: Orwell
# Some books:
books:
  - 1984:
      author: *gOrwell
  - animal-farm:
      author: *gOrwell
The above code will look like this once parsed to JSON:
{
  "author": {
    "name": "George",
    "last-name": "Orwell"
  },
  "books": [
    {
      "1984": {
        "author": {
          "name": "George",
          "last-name": "Orwell"
        }
      }
    },
    {
      "animal-farm": {
        "author": {
          "name": "George",
          "last-name": "Orwell"
        }
      }
    }
  ]
}
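Anchors are often paired with the merge key (<<), which copies the anchored mapping's keys into another mapping and lets you override some of them. The merge key comes from the YAML 1.1 type repository rather than the 1.2 spec, so support depends on the parser, although many parsers and CI tools accept it. A sketch with made-up job names and keys:

```yaml
# Shared settings, marked with an anchor:
defaults: &defaults
  image: node
  retries: 2
# Each job merges the defaults and adds its own keys:
build:
  <<: *defaults
  script: npm run build
test:
  <<: *defaults
  retries: 0
  script: npm test
```

In a parser that supports merging, test should end up with image set to node and retries set to 0, since explicitly written keys win over merged ones.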
Explicit data types with tags
As we’ve seen in previous examples, YAML autodetects the type of our values, but it’s possible to specify which type we want.
We specify the type by placing it before the value, prefixed with !!.
Here are some examples:
# A quoted value would normally be a string; the tag turns it into an int:
should-be-int: !!int "3"
# Parse any value to string:
should-be-string: !!str 30.25
# I need the next value to be boolean:
should-be-boolean: !!bool "true"
# Note: a value that doesn't fit the tag (such as !!int 3.2) is usually rejected by the parser rather than converted.
This will translate to JSON as:
{
  "should-be-int": 3,
  "should-be-string": "30.25",
  "should-be-boolean": true
}
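A place where the string tag earns its keep is with values that merely look like numbers. In this sketch (keys invented for illustration), the tag should keep the leading zero and the trailing zero that a numeric reading would drop:

```yaml
# Without the tag these could be read as numbers:
zip-code: !!str 02134
version: !!str 1.10
```

Plain quotes ("02134") should have the same effect, so the tag is mostly a matter of making the intent explicit.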
Conclusion
Reading and writing about YAML, and experimenting with it was super interesting.
What I like: I especially loved reading about the goals of YAML in relation to code cleanness and readability, and how it achieves them. I also feel better about properly learning the syntax at last 😅.
What I don’t like: I don’t like that I need a parser (which means installing a new dependency) to use YAML with the main technologies I work with (Node.js and .NET Core).
However, I will now consider YAML, especially if I need something that JSON can’t cover, like reusability, explicit types or comments. I’m sure that working with pipelines will be easier now too.
Also, I’d strongly recommend reading the Introduction of the YAML 1.2 specification (3rd Edition) to learn more about YAML’s goals, origins and relationship with other languages.
What are you using YAML for? 💬
Are you using YAML? For what? What are your thoughts about it?