System Design - Design Code Build and Deployment System

Functional requirements

User send source code and deployment request to server;
Server builds the source code to binary file;
Server deploys the binary file to other desitinations gloably.

Non-Functional requirements

Reliable and moderate availability

Number estimate

Each deployment should take upto 30 minutes. Each source code and binary could reach size of up to 1 GB.

Working process

Since building and deployment takes long time for each of the request, the request has to handled asychronously. Each of the request shouldn’t block other comming request. We could seperate building and deploying services, as well as request handling services. When request handler receives request, it put the request with its data into a queue. The builder polles request from the queue and building the code to binary files. After the builder completes, it also send the binary files to another queue. The deploy services poll the binary files and replicates it to other destination.

Given the large size of the binary files and source code, we need a server to save these files. We need two queues to save source code information and the binary file information.

Database design

Queue is a data structure in memeory, but we also need to persist the souce code and binary meta data, so that when system crash or reboot, we can restart our work.

Schema:

Build_table

source_id

status

location

project_title

commit_id

sha

timestamp

Deploy_table

binary_id

status

location

source_id

timestamp

When a new request is comming in, request handler sends the request to in-memory build queue and update Build_table. Once the builder completes building and generates binary file, it sends the deploy request to deploy queue, and updates both Build_table and Deploy_table as a transcation. When the deployer completes its task, it updates Deploy_table.

When the sytem reboot, both queues could be rebuilt through querying the two tables.

In-memory K/V store could be used to store the two queues.

SQL or NoSQL

Since the bottleneck is not on database access, and scalability is not that critical here, and join is require for database query, it is better to use SQL database.

High-level system design

Availability

Each serve could be replicated using master-slave structure. Read could goes to various slave computer, while write goes to master only. When read longURL from each of the slave computer, it should check expiration time of the URL, when it expires, return error code to user. Eventual consistency will achieved.

Partition/Sharding

Partition storage by range based. Could result in unbalanced server load.

Consistent Hash based partition: take hash of each longURL and distribute it to different partitions. Using consistent hash.

Load balance

Load balance could be placed between client to application server, application server to database.

Cache

Cache stores key/value pairs of longURL and shortURL. Application goes to cache for shortURL, if not there, application goes to database to get the shortURL, backend calculate the shortURL and update the database. Database then send back to client and also update the cache. LRU eviction rule should be used because the URL most likely used in a short period of time.

DB cleanup

When a URL get expired, it should be removed from the database. If actively searching for expired URL, it will put a lot of pressure to the database. We can leave the URL there until a client acquire that URL and then delete it from there. We can also sweep the DB during night time when the load is less, and remove any expired item.

Written on May 20, 2021