D-separation is a critical idea in Bayesian Networks and causal inference. The problem it intends to tackle is: given a causal graph G, is a set X of variables independent of another set Y, given a third set Z?
At the first sight, it may look intimidating. However, with a few examples, you shall able to understand it better. For this article, I mainly focus on the application aspect and will talk about:
- three rules to check d-separation in the corresponding scenarios;
- one step-by-step algorithm to check d-separation in general;
- how to use R package bnlearn to check d-separation.
Three Scenarios and Three Rules
If two nodes are not d-connected, then they are d-separated. It is easier to understand d-connectedness than d-separation, so the following rules will be based on d-connectedness. There is one concept you need to familiarize yourself first.
- Collider: if a node has two arrows pointing to it, then it is a collider. For example, in x → y ← z, y is a collider.
Scenario 1:
Z is empty, namely, we don’t condition on any variables.
Rule 1: if there exists a path from x to y and there is no collider on the path, then x and y are d-connected.
Example 1: Consider the following graph.
- t is collider since there are two arrows pointing to it (s → t ← u).
- x and t are d-connected, since there is no collider along the path.
- x and u are not d-connected and hence d-separated, since t is along the path. So are (x, v) and (x, y).
- u and y are d-connected, since there is no collider along the path u-v-y. Note that v has two arrows emerging from it, not pointing to it, so it is not a collider.
Scenario 2:
Z is non-empty, and the colliders don’t belong to Z or have no children in Z.
Rule 2: if there exists a path from x to y and none of the nodes on the path belongs to Z, then x and y are d-connected.
Example 2: Consider the following graph, where Z = {r, v}.
- t is a collider, but t doesn’t belong to Z and t has no children in Z, so Rule 2 fits this case.
- Only pair s — t, t — u are d-connected. All the other pairs are d-separated as either r or v falls along the path in between.
Scenario 3:
Z is non-empty, and there are colliders either inside Z or have children in Z.
Rule 3: for colliders that fall inside Z or have children in Z, they are no longer seen as colliders.
Example 3: Consider the following graph, where Z = {r, p}.
- There is one collider t in the graph. It child p falls inside Z, so Rule 3 fits this case. Therefore, t is no longer seen as a collider and hence s and u are d-connected, so are (s, v), (s, q), (s, y).
- Note that x are not d-connected to any other node (by Rule 2), since r is along all possible paths linking to x.
A step-by-step algorithm
In the above, we check d-separation by examining the paths in the causal graph and by following specific scenarios. Now, let’s take a look at an algorithm which can unveil the connectedness in a step-by-step manner. This algorithm is very easy to follow and let’s begin!
- Ancestral Graph: select the subgraph which contains all the nodes in X, Y, and Z, as well as their ancestors (i.e., parents, parents’ parents, …), and all the associated nodes.
- Moralization: moralize the subgraph, namely, if x and y are both parents of z, then draw an edge connecting x and y. You can think it as marriage between the parents.
- Disorientation: remove the direction of the edges, namely, turn a directed graph into a indirected graph.
- Givens Removal: remove all the nodes in Z and the associated edges.
- Connectedness: if you can find a path between x and y, for example, x — y, or x — u — y), then these two nodes are d-connected, otherwise, they are d-separated.
I highly recommend you walk through an example (shown below) as illustrated in the MIT 6.034 handout. Detailed step-by-step graphical explanation is provided in the handout. Here, I display the causal graph and some key takeaways from that example.
- Are A and B conditionally independent, given a set of variables Z? <=> P(A|B, Z) =? P(A|Z)
- Are A and B marginally independent? <=> P(A|B, {}) =? P(A|{}). (Here, the Z set is an empty set. So in the 4th step, we don’t need to remove any nodes.)
- P(D|CEG) =? P(D|C) <=> P(D|EC) =?P(D|C) && P(D|GC) = P(D|C), namely, “are D and E conditionally independent, given C? AND are D and G conditionally independent, given C?” (This one can be a bit tricky, but well worth taking some time to fully understand it.)
Check d-separation in R package {bnlearn}
In bnlearn, once you get the fitted Bayesian Network model, an object of class bn.fit, you can easily check d-separation with the function dsep()
. For example, if we would like to check whether or not A and B are d-separated given C, then we can run the following code, where bn is the fitted Bayesian Network model. The result will be TRUE or FALSE.
> dsep(bn, x = "A", y = "B", z = "C")
Hope this article gives you some idea about d-separation and make it less intimidating. Happy learning!
References
- The algorithm: d-separation: How to determine which variables are independent in a Bayes net
- Three Rules: d-SEPARATION WITHOUT TEARS
- Scutari, Marco, and Jean-Baptiste Denis. Bayesian networks: with examples in R. CRC press, 2014.