System Design - Message Queue
In a batch process world, e.g., MapReducer, the data is bounded (i.e., of a known and finite size), so the processor knows when it has finished reading its input, and generates output from there. In reality, a lot of cases, data is stream and we don’t know when the data finishes, e.g., user login or logout. If we want to process these data, we have to process them as a stream (unbounded and incremented over time). Message System is designed to process stream data. The basic model is similary to below picture.
A message/data is produced by a producer, it is then send to broker. Message/data comsumer fetches the data/message from the broker and process it. The message is also called event.
Publisher
Message producer that produces the message that contains event or data and publishes it to the broker.
Subscrier
Message comsumer that processes the message.
Broker
A node that accepts message from publisher and stores/mananges the messages to be fetched by subscriber.
Topic
Related event/message could be grouped together as a topic. For example, all login event could be a topic.
In this publisher/subscriber model, there are a few potential problems that might arise:
What if procuders send message faster than the comsumers could process them?
Three options available:
- Broker could drop messages;
- Broker buffer messages in a queue;
- Broker blocks the producer from sending more message;
What happens if nodes crash or go offline?
If the system could afford lose message, e.g., sensor data, temporary node crash or message lose may not a big issue. If not, broker/message replication and persistent is an approach to mitigate this risk.
Messaging Pattern
Load Balancing
Each message is only delivered to one of the comsumers.
Fan out
Each message is delivered to all of the consumers.
What if consumer crashes.
Consumer could crash at any time. One case might be the message is delivered to consumer, but the consumer crashes before it process the message. If the broker deleted the message after it delivers to the consumer, then this message is never processed. In order to ensure the message is not lost, the broker need received acknoledgement from consumer before it delete the message. If the broker doesn’t receive acknowledgement, it could deliver the message to other consumers. This could cause interesting message process order if combined with loading balancing.