Backend Technical Interview Questions - Distributed Systems

Description: Questions and answers to ask in an interview on distributed systems
Authored: 2022-08-04;
categories : interviewing;
tags : distributed-systems;

Table of Contents


[1] Fallacies of distributed computing

From Fallacies_of_distributed_computing

  1. The network is reliable;
  2. Latency is zero;
  3. Bandwidth is infinite;
  4. The network is secure;
  5. Topology doesn't change;
  6. There is one administrator;
  7. Transport cost is zero;
  8. The network is homogeneous.

[1] What is the difference between Asynchronous and Parallel programming?


When you run something asynchronously it means that is non-blocking, meaning the program will continue with other tasks even if there is a delay in executing this particular piece. Whereas in Parallel programming you can run multiple things at once–in parallel!–which works well when they are broken down into independent pieces of work to complete them faster without having too many strands connected together which would slow us down really badly later on for no good reason because all these threads were doing was slowing each other’s progress! You should use async/callback functions whenever possible rather than blocking code like Event handlers etc. since event handling happens outside your application so anyway.

[1] How a distributed system is different from distributed computing?


Distributed System manages the distributed resources across networked computers, while Distributed Computing deals with writing software applications that can run in a distributed environment. The difference between distributed systems and cloud computing services are described above. It should be noted that there may be overlap between these two as distributed computing can provide distributed services to run on distributed systems.

What are the delivery guarantees with: best effort, at least once, at most once?


Following are some common “message delivery approaches” used in distributed systems:

At-Most-Once: With at-most-once message delivery, when sending a message from the sender to receiver, there’s no guarantee that they’ll receive it. Not all messages will be delivered and if you try to deal with this problem yourself by trying either at least once delivery or an alternative system like batching.

At-Least-Once: The at-least-once approach for sending messages means that either the sender or recipient of a message is required to actively participate and ensure every instance they send it, there’s no way of knowing if someone will receive it. To ensure that each message is delivered either the sender must detect the failure and resend it, or the receiver continuously requests messages which have not been received. The message receiver can be either a sender that pushes messages until they get a response or someone who just won’t give up and keeps pulling them in.

Exactly-Once: With the at-least-once messaging approach, we can only hope that our processes lead to the delivery of some messages more than once. Ideally, we want to get exactly-once delivery of messages. But sometimes life just isn’t fair and you can’t always get what your heart desires!

[1] What is Sharding?


Sharding is a process of splitting the large logical dataset into multiple databases. It also refers to horizontal partitioning of data as it will be stored on multiple machines. By doing so, a sharded database becomes capable of handling more requests than a single large machine. Consider an example - in the following image, assume that we have around 1TB of data present in the database, when we perform sharding, we divide the large 1TB data into smaller chunks of 256GB into partitions called shards

[2] How is sharding different from partitioning?


Database Sharding - Sharding is a technique for dividing a single dataset among many databases, allowing it to be stored across multiple workstations. Larger datasets can be divided into smaller parts and stored in numerous data nodes, boosting the system’s total storage capacity. A sharded database, similarly, can accommodate more requests than a single system by dividing the data over numerous machines. Sharding, also known as horizontal scaling or scale-out, is a type of scaling in which more nodes are added to distribute the load. Horizontal scaling provides near-limitless scalability for handling large amounts of data and high-volume tasks.

Database Partitioning - Partitioning is the process of separating stored database objects (tables, indexes, and views) into distinct portions. Large database items are partitioned to improve controllability, performance, and availability. Partitioning can enhance performance when accessing partitioned tables in specific instances. Partitioning can act as a leading column in indexes, reducing index size and increasing the likelihood of finding the most desired indexes in memory. When a large portion of one area is used in the resultset, scanning that region is much faster than accessing data scattered throughout the entire table by index. Adding and deleting sections allows for large-scale data uploading and deletion, which improves performance. Data that are rarely used can be uploaded to more affordable data storage devices.


[1] CAP theorem

From CAP_theorem

[2] PACELC theorem

From PACELC_theorem

It states that in case of network partitioning (P) in a distributed computer system, one has to choose between availability (A) and consistency (C) (as per the CAP theorem), but else (E), even when the system is running normally in the absence of partitions, one has to choose between latency (L) and consistency (C).