Algorithm & Data Structure - Shortest Path

Shortest Path is a problem that people encounter almost every day. The basic problem statement is: Give a set of vertices and there distance or any measure (a graph), find the shortest path from one source to a target. There might be some variants, for example, to find the minimum cost along a path, but the basic principle and generate approach is the same.

Read More

System Design - Map Reduce System

MapReduce is a programming model for processing big data sets by parallel, distributed algorithm on a cluster. There are a few key words in the above description:

  1. Big data sets: that means it is not possible processed by a single machine/host;
  2. Parallel/distributed: that means a cluster of machine/hosts would process a set of data concurrently;
  3. Programming model: this is a model for process different types of data.
Read More

Algorithm & Data Structure - Extreme Value Problem - Type II

This type of problems require to find the maximum/minimum pair from a list or array of values with some conditions. It certainly could be solved by brutal force, but that is not an optimized solution. Swipe from both end of the list or array could help to speed up the process. The basic idea is to reuse the previous information to obtain the current minimum/maximum results. Similar to dynamic programing approach.

  1. At current location \(i\), previous extreme value upto \(i-1\) is already known;
  2. By adding \(ith\) element, how to update the extreme value?
Read More

Algorithm & Data Structure - Extreme Value Problem - Type I

This type of problem is different from the simple extreme value problems in two folds:

  1. It need to be an extreme value;
  2. It need to to satisfy a few other conditions. Simply sorting the array could give you extreme value overall but without satisfying the second conditions. One approach is to sort the array by the second condition and then by the extreme value.
Read More

Algorithm & Data Structure - Sweep Line Algorithm

Sweep line algorithm could be used to solve a number of geometric and interval problem. It normally involves two dimensional sorting. The general idea is to sort one dimension first, and sweep through that dimension to put the other dimension to a binary tree. The basic algorithm is:

  1. Define Enterning and Leaving event respectively for each of the interval or geometry shape.
  2. Sort all event by one coordinate (x or y).
  3. Perform sweep, when an Entering event, put the shape (interval) to a binary tree to sort it.
  4. When a Leaving event, remove the shape from the binary tree.
Read More

Algorithm & Data Structure - Substring with Bit Mask

A number of substring problems could be solved by bit mask with conversion to similar as subarray sum problems. Basic subarray sum equals K could be soved by hashmap in O(n):

  1. Store number of prefix sum in hashmap.
  2. Get the number of current running sum - K.
Read More

Algorithm & Data Structure - Union Find

Union find is a data structure that is like the name defined. Every node in union Find has a root or parent node. It could store the root or father of a node in a array with two operations, union() and find().

  • Union: join two nodes together if they are connected;
  • Find: find the father or root of a node.
Read More

System Design - Message Queue

In a batch process world, e.g., MapReducer, the data is bounded (i.e., of a known and finite size), so the processor knows when it has finished reading its input, and generates output from there. In reality, a lot of cases, data is stream and we don’t know when the data finishes, e.g., user login or logout. If we want to process these data, we have to process them as a stream (unbounded and incremented over time). Message System is designed to process stream data. The basic model is similary to below picture.

Read More

System Design - Design TinyURL

Functional requirements

  1. Client send a long URL, return a short URL to client;
  2. Client send a short URL, return a long URL if it exist and redirect to the long URL, else return error code;
  3. Client send a long URL, and a customized short URL, return short URL to client;
  4. Client send a long URL, and set expiration time of the tiny URL;
Read More

System Design - SQL and NoSQL

SQL

Structured schema. Data is stored in rows and columns. Each record has fixed schema. Columns must designed before data entry. Alternating data structure involves modifying whole database. Uses SQL query. True ACID transaction. Vertical salable. Reasons for SQL: Ensure ACID compliance, such as Bank data; Data is structured and unchanging.

Read More

System Design - Sharding

A technique to break up a big database into many smaller parts. Horizontal scaling means adding more machines, which is cheaper and more feasible. Vertical scaling means improves servers.

Read More

System Design - Cache

Cache is short term memory which has limited space but is faster. It is based on the principle that recently requested data is likely to be requested again. Cache could be added in almost every layer, but is often found nearest to the front end, e.g., browsers.

Read More

System Design - Load Balance

Function:

1. Distribute load to difference server. Could be in L4 and L7 layer, TCP/IP, HTTP, etc.
2. Check which server is available and redirect works.  ### Algorithms:
1. Round Robin
2. Round Robin with weighted server
3. Least connections
4. Least Response Time
5. Source IP Hash
6. URL hash ### Benefit:
1. prevent request goes to unhealthy server
2. prevent overloading server
3. eliminate single point of failure
Read More

System Design - Replication

The goal of replication is to provide high available and reliable services by store the same copy of data in a few of servers. However, it is always a challenge to maintain consistency of the data.

Read More

Algorithm & Data Structure - Monotonic Stack

Monotonic stack is a stack that maintain the element in the stack with increasing or decreasing property, i.e., when poping element from the stack, the elements are in sorted order. This property itself seems useless, but it can be used to solve some of the problems in LeetCode.

Read More