Design Documentation for a Software-Failure-Tolerant, Highly Available Distributed Share Market System (DSMS)

This document describes the COMP6231 final project design of the DSMS, which supports either tolerating a single non-malicious (Byzantine) software failure or achieving high availability under a single crash failure, using active replication. It provides an overview of the system architecture, the roles and interactions of each module, and an outline of what needs to be implemented or modified.


1. Overview

In this project, we extend the existing DSMS from assignment 3 to incorporate active replication. The design allows for:

  1. Tolerating a single non-malicious (Byzantine) software failure by comparing the results returned by three actively replicated servers.
  2. Tolerating a single crash failure, so the system stays highly available while the Replica Managers restart or replace the failed replica.

2. System Architecture

Below is a diagram illustrating the system components and data flows:

```mermaid
flowchart LR
    A(Client) --> B(FE)
    B --> C(Sequencer)
    C --> D[Replica A]
    C --> E[Replica B]
    C --> F[Replica C]
    D --> B
    E --> B
    F --> B
    subgraph ManagerGroup[Replica Managers]
        Dm[RM for Replica A]
        Em[RM for Replica B]
        Fm[RM for Replica C]
    end
    B -- suspect software bug or crash --> ManagerGroup
```

3. Detailed Component Interaction

Below is a simplified sequence diagram showing how a typical request is processed, how failures are detected, and how RMs handle those failures.

```mermaid
sequenceDiagram
    autonumber
    participant C as Client
    participant FE as Front End
    participant S as Sequencer
    participant R1 as Replica A
    participant R2 as Replica B
    participant R3 as Replica C
    participant RM as Replica Managers
    C->>FE: Request
    FE->>S: Send Request
    S->>R1: (seqNo, request)
    S->>R2: (seqNo, request)
    S->>R3: (seqNo, request)
    R1-->>FE: Response1
    R2-->>FE: Response2
    R3-->>FE: Response3
    FE->>FE: Compare responses
    alt All match or majority found
        FE->>C: Return correct result
    else Timeout or mismatch
        FE->>RM: Notify suspicion (Crash or incorrect)
        RM->>R1: Check (if R1 is suspect)
        RM->>R2: Check (if R2 is suspect)
        RM->>R3: Check (if R3 is suspect)
        alt Confirmed failure
            RM->>R1/R2/R3: Replace or restart failed replica
        end
    end
```

4. Detailed Implementation Plan

Below is how I designed the DSMS with a Front End, Sequencer, Replicas, and Replica Managers, using only three physical machines on a local network. It also demonstrates how to test crash failures.


A. System Overview

We need:

  1. A Front End (FE) that clients contact and that compares the replicas' responses.
  2. A Sequencer that assigns a total order to all requests.
  3. Three Replicas of the DSMS servers.
  4. One Replica Manager (RM) per replica to detect and recover from failures.

Because I have only 3 physical machines, one of them must host the FE + Sequencer alongside one Replica and its RM. The other two machines each host a Replica and its RM. That gives us 3 replicas in total and a single FE/Sequencer instance. Below is how the distribution works:

  1. Machine 1: Front End, Sequencer, Replica A, RM A
  2. Machine 2: Replica B, RM B
  3. Machine 3: Replica C, RM C
Deployment Diagram
```mermaid
flowchart LR
    M1(Machine 1) --- M2(Machine 2)
    M1 --- M3(Machine 3)
    subgraph M1 [Machine 1]
        FE(Front End):::fe
        SQ(Sequencer):::seq
        RA(Replica A):::replica
        RMA(RM A):::rm
    end
    subgraph M2 [Machine 2]
        RB(Replica B):::replica
        RMB(RM B):::rm
    end
    subgraph M3 [Machine 3]
        RC(Replica C):::replica
        RMC(RM C):::rm
    end
    classDef fe fill:#ADE7FF,stroke:#333,stroke-width:1px
    classDef seq fill:#FFD585,stroke:#333,stroke-width:1px
    classDef replica fill:#C7FFC7,stroke:#333,stroke-width:1px
    classDef rm fill:#FFC7C7,stroke:#333,stroke-width:1px
```

Each Replica + RM block can run as separate processes or as combined logic, but on the same machine they should communicate via localhost ports.

Machine 1 hosts:

  1. The Front End and the Sequencer.
  2. One instance of each city server (TOK, LON, NYK).
  3. One RM per instance (RM TOK, RM LON, RM NYK).

Machine 1 Deployment
```mermaid
flowchart TB
    subgraph Machine1 [Machine 1]
        direction TB
        FE(Front End):::fe
        SQ(Sequencer):::seq
        subgraph DSMS_Processes
            direction TB
            TOK1(TOK Instance - M1):::replica
            RMTOK1(RM TOK - M1):::rm
            LON1(LON Instance - M1):::replica
            RMLON1(RM LON - M1):::rm
            NYK1(NYK Instance - M1):::replica
            RMNYK1(RM NYK - M1):::rm
        end
    end
    %% Arrows for internal communication
    FE -- requests --> SQ
    SQ -- seq. msgs --> TOK1
    SQ -- seq. msgs --> LON1
    SQ -- seq. msgs --> NYK1
    %% Replicas -> FE
    TOK1 -- results --> FE
    LON1 -- results --> FE
    NYK1 -- results --> FE
    %% RM internal signals
    FE -- crash/fault info --> RMTOK1
    FE -- crash/fault info --> RMLON1
    FE -- crash/fault info --> RMNYK1
    %% Relationship between RM and its instance
    RMTOK1 -- manages --> TOK1
    RMLON1 -- manages --> LON1
    RMNYK1 -- manages --> NYK1
    classDef fe fill:#ADE7FF,stroke:#333,stroke-width:1px
    classDef seq fill:#FFD585,stroke:#333,stroke-width:1px
    classDef replica fill:#C7FFC7,stroke:#333,stroke-width:1px
    classDef rm fill:#FFC7C7,stroke:#333,stroke-width:1px
```

Key Points:

  1. The FE and Sequencer run only on Machine 1, alongside one instance of each city server and its RM.
  2. Each RM manages exactly one local instance and receives crash/fault notifications from the FE.

Machines 2 and 3 do not host the FE or Sequencer. They only run:

  1. One instance each of TOK, LON, and NYK.
  2. One RM per instance.

That way, each city code has a replica on Machines 1, 2, and 3. The Sequencer on Machine 1 sends requests to all nine replica processes (3 per machine).

Machine 2 or 3 Deployment
```mermaid
flowchart TB
    subgraph MachineX [Machine 2 or Machine 3]
        direction TB
        subgraph DSMS_Processes
            direction TB
            TOKx(TOK Instance - M2/M3):::replica
            RMTOKx(RM TOK - M2/M3):::rm
            LONx(LON Instance - M2/M3):::replica
            RMLONx(RM LON - M2/M3):::rm
            NYKx(NYK Instance - M2/M3):::replica
            RMNYKx(RM NYK - M2/M3):::rm
        end
    end
    %% Arrows from Sequencer (on Machine 1)
    SQ1(Sequencer - M1):::seq
    SQ1 -- seq. msgs --> TOKx
    SQ1 -- seq. msgs --> LONx
    SQ1 -- seq. msgs --> NYKx
    %% Replicas -> FE on M1
    FE1(Front End - M1):::fe
    TOKx -- results --> FE1
    LONx -- results --> FE1
    NYKx -- results --> FE1
    %% Crash/fault notifications
    FE1 -- crash/fault info --> RMTOKx
    FE1 -- crash/fault info --> RMLONx
    FE1 -- crash/fault info --> RMNYKx
    %% RM manages the local instance
    RMTOKx -- manages --> TOKx
    RMLONx -- manages --> LONx
    RMNYKx -- manages --> NYKx
    classDef fe fill:#ADE7FF,stroke:#333,stroke-width:1px
    classDef seq fill:#FFD585,stroke:#333,stroke-width:1px
    classDef replica fill:#C7FFC7,stroke:#333,stroke-width:1px
    classDef rm fill:#FFC7C7,stroke:#333,stroke-width:1px
```

Key Points:

  1. Machines 2 and 3 receive sequenced messages from the Sequencer on Machine 1 and send results directly back to the FE on Machine 1.
  2. Each local RM manages only its own instance and acts on crash/fault notifications from the FE.

Summary

In this deployment:

  1. Three physical machines run nine replica processes in total (one TOK, LON, and NYK instance per machine), each with its own RM.
  2. A single FE and Sequencer on Machine 1 handle all client traffic and request ordering.
  3. Any single crash leaves two healthy replicas of each city server, so the FE can still return correct results.


B. Running the System

Once the system is up, point any Admin or Buyer Client to connect to the Front End on Machine 1. The FE will pass requests to the Sequencer, which multicasts them to all three Replicas in total order.

The startup pseudo-code is the same as in assignment 3. The only difference is how we assign host IP addresses and ports to each component. For example:

Machine1_IP = "192.168.0.10"
Machine2_IP = "192.168.0.11"
Machine3_IP = "192.168.0.12"

// On Machine 1
startReplica("A", "192.168.0.10", port=9000)
startRM("A", "192.168.0.10", port=9100)
startSequencer("192.168.0.10", port=7000)
startFrontEnd("192.168.0.10", port=7100, sequencerIP="192.168.0.10", seqPort=7000)

// On Machine 2
startReplica("B", "192.168.0.11", port=9000)
startRM("B", "192.168.0.11", port=9100)

// On Machine 3
startReplica("C", "192.168.0.12", port=9000)
startRM("C", "192.168.0.12", port=9100)

Each replica knows the IP addresses and ports of the other replicas and the Sequencer. The Front End is the only component the client ever contacts (192.168.0.10:7100 in this example).


Runtime Communication Flow
```mermaid
flowchart TB
    Client --> FE[Front End @ Machine 1]
    FE --> SQ[Sequencer @ Machine 1]
    SQ --> RA[Replica A @ Machine 1]
    SQ --> RB[Replica B @ Machine 2]
    SQ --> RC[Replica C @ Machine 3]
    RA --> FE
    RB --> FE
    RC --> FE
    FE -.->|Notify if suspect crash/fault| RM[RM A,B,C]
```

pseudo-code for how a replica handles ordered requests:

ReplicaServer {
  sortedRequests = new PriorityQueue(... compare by seqNo ...);
  currentSeqExpected = 1;
  onReceiveRequest(seqNo, clientRequest):
      insert (seqNo, clientRequest) into sortedRequests

      while sortedRequests.peek() has seqNo == currentSeqExpected:
          nextReq = sortedRequests.poll()
          result = DSMS_logic(nextReq.clientRequest)
          sendResponseToFE(nextReq.seqNo, result)

          currentSeqExpected++
}

// DSMS_logic(request):
//   parse operation (addShare, removeShare, ....)
//   run the existing DSMS server code
//   return a string result
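The hold-back-queue logic above can be sketched as runnable Python. This is an illustrative sketch, not the assignment code; the class and method names are my own, and `DSMS_logic` is stood in for by an arbitrary deliver callback.

```python
import heapq

class OrderedDelivery:
    """Buffers out-of-order (seqNo, request) pairs and delivers them
    in total order, mirroring the replica pseudo-code above."""

    def __init__(self):
        self._heap = []          # min-heap ordered by sequence number
        self._next_expected = 1  # currentSeqExpected in the pseudo-code
        self.delivered = []      # requests processed so far, in order

    def on_receive(self, seq_no, request):
        heapq.heappush(self._heap, (seq_no, request))
        # Deliver every buffered request whose turn has come.
        while self._heap and self._heap[0][0] == self._next_expected:
            _, req = heapq.heappop(self._heap)
            self.delivered.append(req)  # here the real replica runs DSMS_logic
            self._next_expected += 1
```

Even if the network delivers (2, …) before (1, …), the replica still executes the requests in sequence order; the out-of-order request simply waits in the heap.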

pseudo-code for the FE logic:

FrontEnd {
  handleClientRequest(clientRequest):
      seqNum = sendToSequencer(clientRequest)

      responses = []
      startTime = now()

      while not enoughResponses(responses):
          if responseArrivesFromReplica(rID, result):
              responses.add( (rID, result) )
              if checkMajorityOrTwoMatches(responses):
                  finalRes = getMajorityOrMatch(responses)
                  return finalRes
          if now() - startTime > TIMEOUT:
              identifyWhichReplicaDidNotRespond(responses)
              notifyAllRMs( crashedReplicaID )
              break   // stop waiting; answer with what we have

      finalRes = getMajorityOrMatch(responses)
      return finalRes
}
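The "majority or two matches" check the FE performs can be sketched as a small Python function. The function name and return shape are illustrative assumptions, not part of the assignment API:

```python
from collections import Counter

def majority_result(responses):
    """Given (replicaID, result) pairs, return (result, suspects):
    the result seen by at least two replicas, plus the IDs of any
    replicas that disagreed with it (software-fault suspects).
    Returns (None, []) if no two responses match yet."""
    counts = Counter(result for _, result in responses)
    for result, n in counts.most_common():
        if n >= 2:
            suspects = [rid for rid, res in responses if res != result]
            return result, suspects
    return None, []
```

With three replicas and at most one faulty one, two matching responses are always sufficient, which is why the FE can answer the client before the third (slow or crashed) replica responds.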

C. Replica Manager (RM)

Each Replica Manager is bound to exactly one replica. The RM's duties:

  1. Receive fault notifications (incorrect result or suspected crash) from the FE.
  2. Coordinate with the other RMs to confirm a suspected crash.
  3. Replace its replica after three consecutive incorrect results, or restart it after a confirmed crash.

RM Fault Recovery Diagram
```mermaid
flowchart LR
    FE(Front End) --> RM([Replica Managers])
    RM --> R([Replica])
    RM -. coordinate .-> RM2([Other RMs])
    RM -. check status .-> R
    R --> RM
    RM --> Rnew[Launch Replacement Replica?]
```

pseudo-code for the RM:

ReplicaManager(replicaID) {
  consecutiveFaults = 0

  onFaultSuspected(faultType):
      if faultType == "IncorrectResult":
          consecutiveFaults++
          if consecutiveFaults >= 3:
              stopReplica(replicaID)
              startNewReplica(replicaID)
              consecutiveFaults = 0
      else if faultType == "CrashSuspected":
          if confirmCrashWithPeers(replicaID):
              stopReplica(replicaID)
              startNewReplica(replicaID)
              consecutiveFaults = 0

  stopReplica(replicaID):
      // forcibly kill the process or call a cleanup method

  startNewReplica(replicaID):
      // spawn a fresh DSMS replica process
      // rejoin the group for receiving requests
}
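The RM's fault-handling state machine above can be sketched in Python. This is a minimal sketch under stated assumptions: the restart/replace action is abstracted into a callback, and crash confirmation by peer RMs is passed in as a boolean rather than implemented.

```python
class ReplicaManager:
    """Tracks fault reports for one replica and decides when to replace it."""

    FAULT_THRESHOLD = 3  # consecutive incorrect results before replacement

    def __init__(self, replica_id, restart_fn):
        self.replica_id = replica_id
        self.restart_fn = restart_fn  # callback that kills and respawns the replica
        self.consecutive_faults = 0

    def on_fault_suspected(self, fault_type, crash_confirmed_by_peers=False):
        if fault_type == "IncorrectResult":
            self.consecutive_faults += 1
            if self.consecutive_faults >= self.FAULT_THRESHOLD:
                self._replace()
        elif fault_type == "CrashSuspected" and crash_confirmed_by_peers:
            self._replace()

    def _replace(self):
        # Stop the faulty replica, spawn a fresh one, reset the counter.
        self.restart_fn(self.replica_id)
        self.consecutive_faults = 0
```

Note that the counter resets after each replacement, so a fresh replica starts with a clean fault history.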

D. Sequencer

The Sequencer enforces total order on all client requests. It:

  1. Receives each request from the FE.
  2. Assigns it the next monotonically increasing sequence number.
  3. Multicasts (seqNo, request) to all replicas, retrying with acks until delivery succeeds or the retry budget runs out.

Sequencer Flow Diagram
```mermaid
flowchart LR
    FE(Front End) --> SQ(Sequencer)
    SQ --> RP1[Replica 1]
    SQ --> RP2[Replica 2]
    SQ --> RP3[Replica 3]
```

pseudo-code:

Sequencer {
  nextSeqNo = 1

  onReceiveRequestFromFE(clientRequest):
      seqNo = nextSeqNo
      nextSeqNo++
      for each replica in replicaList:
          sendUDPWithAck(replica.address, (seqNo, clientRequest))

  sendUDPWithAck(address, message):
      do {
        sendUDP(address, message)
        wait for ack or timeout
      } while (no ack received && retryCount < MAX_RETRIES)
}
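The sequencing logic can be sketched as Python with the transport abstracted out. The `send_fn` callback (returning True on ack) is an assumption standing in for the UDP-with-ack code; in the real system it would wrap `sendUDPWithAck`.

```python
class Sequencer:
    """Assigns a global sequence number to each request and
    multicasts (seqNo, request) to every replica."""

    def __init__(self, replicas, send_fn, max_retries=3):
        self.replicas = replicas      # replica addresses
        self.send_fn = send_fn        # send_fn(addr, msg) -> True if acked
        self.max_retries = max_retries
        self.next_seq_no = 1

    def on_request(self, client_request):
        seq_no = self.next_seq_no
        self.next_seq_no += 1
        for addr in self.replicas:
            self._send_with_ack(addr, (seq_no, client_request))
        return seq_no

    def _send_with_ack(self, addr, msg):
        # Resend until acked or the retry budget is exhausted.
        for _ in range(self.max_retries):
            if self.send_fn(addr, msg):
                return True
        return False
```

Because a single Sequencer hands out the numbers, all replicas see the same total order even though UDP may deliver the messages out of order; the replicas' hold-back queues handle the reordering.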

5. Test Crash Failures

Here is a comprehensive list of test cases to verify the Distributed Share Market System (DSMS) functionality, covering both normal operations and crash (process-crash) scenarios. The DSMS includes:

  1. A Front End (FE) on Machine 1 that clients contact.
  2. A Sequencer on Machine 1 that totally orders requests.
  3. Three replicas of each city server (TOK, LON, NYK), one per machine.
  4. One Replica Manager per replica.

A. Normal Operation Test Cases

Sequence Diagram
```mermaid
sequenceDiagram
    participant C as Client
    participant FE as Front End
    participant S as Sequencer
    participant R1 as Replica A
    participant R2 as Replica B
    participant R3 as Replica C
    participant RM as RMs
    C->>FE: Client Request
    FE->>S: Send Request + get seqNo
    S->>R1: (seqNo, request)
    S->>R2: (seqNo, request)
    S->>R3: (seqNo, request)
    R1-->>FE: result1
    R2-->>FE: result2
    R3-->>FE: result3
    FE->>FE: Compare results / check timeouts
    alt mismatch or no response
        FE->>RM: Replica i is suspect
    else majority or matching pair
        FE->>C: Send correct result
    end
```

Below is a listing of core admin and buyer operations, covering typical edge cases. These assume all replicas are running and no process crashes occur.

| Test Case ID | Operation | Scenario | Expected Result |
| --- | --- | --- | --- |
| TA01 | addShare | Admin adds a new share with valid inputs | Share is successfully created with the correct capacity; success message returned. |
| TA02 | addShare | Admin attempts to add an existing shareID | Operation fails with a "Share already exists" message. No new share is created. |
| TA03 | removeShare | Admin removes an existing share | Share is removed from the server. Future list or purchase references to that share should fail. |
| TA04 | removeShare | Admin attempts to remove a non-existent shareID | Operation returns "Share not found" or a similar message. |
| TA05 | listShareAvailability | Admin requests availability for a valid shareType (EQUITY, BONUS, or DIVIDEND) | All known shares of that type are returned with capacities from each server city. No error. |
| TB01 | purchaseShare | Buyer purchases a share with sufficient capacity available | Purchase is recorded and capacity is reduced accordingly. Successful response. |
| TB02 | purchaseShare | Buyer tries to purchase more than the available capacity, or the shareID doesn't exist | Purchase fails or a partial amount is allocated; an error or partial-success message is returned. |
| TB03 | getShares | Buyer requests their existing holdings | All owned shares and quantities from all servers are returned. |
| TB04 | sellShare | Buyer sells shares they own, up to the purchased quantity | Operation succeeds; capacity on that share goes up and the buyer's holdings decrease accordingly. |
| TB05 | sellShare | Buyer tries to sell a share they do not own | Operation fails with an error message ("You do not own this share."). |
| TB06 | swapShares | Buyer swaps oldShare for newShare with enough capacity in newShare | Old share is removed from the buyer's holdings and the new share is allocated. Successful swap message returned. |
| TB07 | swapShares | Buyer attempts a swap but newShare lacks capacity | Swap fails; the old share stays in the buyer's holdings. Clear error message. |

B. Crash Failure Test Cases

In the following scenarios, at least one replica is forcibly terminated during or right after a request is broadcast by the Sequencer. We assume the DSMS is in crash-failure tolerance mode (not the software-fault mode). The system should continue to operate with the remaining replicas.

| Test Case ID | Crash Point | Operation | Expected System Behavior |
| --- | --- | --- | --- |
| C01 | After the Sequencer sends an "addShare" request to all replicas, Replica B is killed before responding | addShare | FE receives two responses (Replica A and Replica C). FE times out waiting for B and suspects a crash. FE returns success if A's and C's results match. RM(B) restarts or replaces B. The system remains consistent with the new share added. |
| C02 | During the removeShare broadcast, Replica A's process is terminated mid-execution | removeShare | FE gets responses from B and C. A never responds, so a crash is suspected and the RMs confirm it. The share is removed on B and C. Replica A is later restarted or replaced. Once A is back, it can catch up on missed requests. |
| C03 | Replica C is killed after sending an incorrect partial result for "purchaseShare" | purchaseShare | FE compares responses from A, B, and C. Since C is killed mid-flight, FE gets no final ack from C. A and B match, so FE proceeds with that result. FE flags C as crashed; RM(C) restarts C. The final system state is consistent per A and B. |
| C04 | Replica B is killed before sending any response for "swapShares" | swapShares | FE collects responses from A and C. If they match, the swap succeeds. B is restarted later by RM(B). The buyer's holdings reflect the swapped share on A and C; B syncs after restart. |
| C05 | Replica A is killed mid "sellShare" operation; FE times out on A's response | sellShare | FE obtains matching results from B and C and concludes the sale. A is flagged as crashed; RM(A) restarts it. Overall capacity is incremented, as B and C performed the sale. |

In each crash case, once the Replica Manager detects or is notified of the crash, it spins up a new instance or restarts so the system returns to having 3 replicas. The FE can respond to requests as soon as it has at least 2 matching responses from the healthy replicas.
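For these crash tests, the forced termination itself can be automated with a small helper. This is a sketch: the replica process is stood in for by a `sleep` command (the real test would launch the actual DSMS replica), and `SIGKILL` is used so the process gets no chance to clean up, which is exactly what a crash failure looks like. This assumes a Unix-like OS.

```python
import signal
import subprocess

def kill_replica(proc):
    """Forcibly terminate a replica process (simulating a crash failure)
    and wait until the OS has reaped it."""
    proc.send_signal(signal.SIGKILL)  # no cleanup possible: a true crash
    proc.wait()
    return proc.returncode

# Stand-in for a replica process; a real test would Popen the replica's command.
replica = subprocess.Popen(["sleep", "60"])
assert replica.poll() is None   # still running before the injected crash
kill_replica(replica)
assert replica.poll() is not None  # confirmed dead
```

After the kill, the test driver issues a request through the FE and checks that it still gets a correct answer from the two surviving replicas.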


C. Crash and Recovery

Below are two sample sequence diagrams illustrating crash scenarios and the subsequent recovery. The first focuses on a crash during an admin operation (addShare); the second on a buyer operation (purchaseShare).

Diagram 1: Crash During addShare Operation
```mermaid
sequenceDiagram
    autonumber
    participant Admin as Admin Client
    participant FE as Front End
    participant S as Sequencer
    participant R1 as Replica A
    participant R2 as Replica B (Crashes)
    participant R3 as Replica C
    participant RM as Replica Managers
    Admin->>FE: addShare(shareID, shareType, capacity)
    FE->>S: "Request from Admin"
    S->>R1: (seqNo, addShare)
    S->>R2: (seqNo, addShare)
    S->>R3: (seqNo, addShare)
    R1-->>FE: "Success: added share"
    R3-->>FE: "Success: added share"
    note over R2: R2 crashes before responding
    FE->>FE: Wait for R2 => times out
    FE->>RM: "Replica B suspected crashed"
    RM->>R2: Attempt ping => no response
    RM->>RM: coordinate "Yes, B is down."
    RM->>R2: "Restart or new instance"
    FE->>Admin: "Share added successfully" (2 matching results from A & C)
    note over R2: Once restarted, R2 can sync state from others
```

Diagram 2: Crash During purchaseShare Operation
```mermaid
sequenceDiagram
    autonumber
    participant Buyer as Buyer Client
    participant FE as Front End
    participant S as Sequencer
    participant R1 as Replica A
    participant R2 as Replica B
    participant R3 as Replica C (Crashes)
    participant RM as Replica Managers
    Buyer->>FE: purchaseShare(shareID, 10)
    FE->>S: "Buyer request"
    S->>R1: (seqNo, purchaseShare)
    S->>R2: (seqNo, purchaseShare)
    S->>R3: (seqNo, purchaseShare)
    R1-->>FE: result "Purchased 10"
    R2-->>FE: result "Purchased 10"
    note over R3: R3 crashes during processing
    FE->>FE: sees 2 matching successful results
    FE->>Buyer: "Purchase success (10 shares)"
    FE->>RM: "Replica C is not responding"
    RM->>R3: ping => fails
    RM->>R3: restart or spawn new replica
    note over R3: after restart, sync w/ others
```

Conclusion

The test cases above (normal operations plus crash scenarios) will verify correctness, robustness, and recovery in the actively replicated DSMS:

  1. Normal Tests: Check that basic admin and buyer operations behave correctly under no failures.
  2. Crash Tests: Forcibly terminate one replica during or just after receiving a request. Confirm the FE obtains matching results from the remaining replicas, and the Replica Manager replaces or restarts the crashed replica.

Together, these tests ensure the DSMS achieves the high availability goal under a single crash failure.


6. Verifying All Functionalities

After implementing all of the above functionality, we need to verify:

  1. All admin and buyer operations still work end to end through the FE.
  2. Each replica's data (capacities and buyer records) remains consistent.
  3. A single crash is tolerated: if any single replica process on Machine 2 or Machine 3 is forcibly stopped, the FE should continue returning correct results from the other two healthy replicas.
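Cross-replica consistency can be checked by dumping each replica's state and comparing. A minimal sketch follows; the snapshot format (a dict of shareID to remaining capacity, possibly including buyer records) and the function name are assumptions for illustration.

```python
from collections import Counter

def replicas_consistent(states):
    """Return the IDs of replicas whose state differs from the majority.

    `states` maps replicaID -> state snapshot, e.g. a dict of
    shareID -> remaining capacity (buyer records could be included too).
    An empty return list means all replicas agree.
    """
    # Build a hashable fingerprint of each snapshot for comparison.
    fingerprints = {rid: repr(sorted(state.items())) for rid, state in states.items()}
    majority_fp, _ = Counter(fingerprints.values()).most_common(1)[0]
    return sorted(rid for rid, fp in fingerprints.items() if fp != majority_fp)
```

Running this check after each test case (including after a crashed replica rejoins) confirms that capacities and buyer records converge on all three replicas.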



7. Final Conclusion

This document outlines the architectural structure and the key components that must be implemented or extended in the existing DSMS code to achieve software-failure tolerance or high availability using active replication. This design ensures correctness, fault tolerance, and ease of debugging.