# Time, Clocks, and the Ordering of Events in a Distributed System

Let’s assume you have a distributed system consisting of `n`

nodes. These nodes have perfectly synchronized clocks. If node 1 sends a message to node 2 and attaches a timestamp to the message, node 2 can order this event with respect to local events. In other words, node 2 knows whether node 1 sent the message either **before** or **after** node 2 received, say, a message from node 3. Since these messages may be arbitrarily delayed, they may be received in a different order than they were sent. With synchronized timestamps, node 2 can resolve this.

In fact, with synchronized timestamps and a synchronous network (every message is delivered within a known, bounded amount of time and differences in processor speeds are known and bounded), one can achieve consensus.

Unfortunately, identical clocks will drift depending on things like, say, temperature. Protocols exist for the synchronization of clocks across networks, and they are reasonably accurate.

But as today’s paper says:

In a distributed system, it is sometimes impossible to say that one of two events occurred first. The relation “happened before” is therefore only a partial ordering in the events of a system.

If the perfect clocks mentioned at the start can be not-so-perfect, it becomes impossible for node 1 to determine which message was sent first in reality (often called wall-clock time, since clocks live on walls). So if we can’t achieve perfection, what *can* we achieve? Distributed systems papers often ask this question.

Leslie Lamport’s paper makes two key contributions, among others:

- It is possible to construct a
*unique*total ordering of a*subset*of all events in a distributed system’s execution. There are events for which we cannot compare (with respect to one event occurring before/after another). As such, we can obtain a*partial*ordering for all of the events in the distributed system, under the “happened before” relation. - We can construct a total ordering for the
*entire*set of all events in the execution. This is useful in that it may not reflect reality, but it can at least be agreed upon as an unambiguous ordering by all nodes.

# Setup

Imagine we have some set of processors `{i, j...}`

and each processor is performing “events”. An event is either some local operation or a message action (either sending or receiving a message). It may be helpful to think of a program’s execution as a line on which events are dots, and two processes’ events may be connected via a line to signify a message transmission.

# →

- If
*a*and*b*are events in the same process, and*a*comes before*b*, then*a → b*. - If
*a*is the sending of a message by one process and*b*is the receipt of this message, then*a → b*. - If
*a → b*and*b → c*, then*a → c*. - If
*a ↛ b*and*b ↛ a*, then*a*and*b*are said to be*concurrent*.

Note here that concurrent does not mean *they actually occurred at the same time*. It means *we can’t say anything definitive about the order in which these two events occurred, and as such we must say they are concurrent*.

# “Clocks”

Now imagine that each processor maintains a counter, starting at 0. Before an event (some instruction, sending a message, receiving a message), the processor increments the counter by 1, then assigns the counter value to the event.

So a given processors log may look something like:

```
[(receive: 1), (send: 2), (instruction: 3)...]
```

As per (1), we know that all of these instructions are ordered under the → relation. Let us say for notation’s sake that `C((receive: 1)) = 1`

.

Our clocks should be correct. But what is correct? Let us say:

Clock condition:For any eventsaandb, ifa → bthenC(a) < C(b).

It’s easy to see that within a single process, the clock condition holds, since we increment our counter by 1 before every event.

But what about between processors (2)? Imagine two processors:

```
1: [(instr: 1), (recv from p2: 2)]
2: [(instr: 1), (instr: 2), (instr: 3), (send to p1: 4)]
```

Each process increments its own counter, but we’ve now violated our clock condition. According to (2), processor 2’s send → processor 1’s receive (since it’s the same message), but `C(send) > C(receive)`

.

The fix is simple: processor 2 includes its counter value in the send. Processor 1 increments its counter value to 1 + the received counter value. The correct timelines look like:

```
1: [(instr: 1), (recv from p2: 5)]
2: [(instr: 1), (instr: 2), (instr: 3), (send to p1: 4)]
```

# Delays & Total Ordering

Note that in the last example, although instructions p2:2, and p2:3 have higher counter values than p1:1, it’s impossible to say which occurred first (in wall-clock time). Perhaps processor 2 is *much* slower.

But the second contribution I mentioned details that we don’t need a necessarily accurate ordering, we just need one we can agree on. So for the sake of agreement, we can say that p2:2 and p2:3 happened before p1:1 simply because it’s easy.

But how do we compare p1:1 and p2:1? Lamport’s solution is simple: just choose an arbitrary process pecking order. Just let the system agree that some processor’s events happened “before” another.

Note that only the partial ordering reflects reality.

Also note that Lamport does not discuss component failure in this particular paper.

He has presented a way of reasoning about the order of events (as best as you can) in a distributed system.