Design Documentation for a Software-Failure-Tolerant, Highly Available Distributed Share Market System (DSMS)

This document describes the COMP6231 final project design of the DSMS, which supports either tolerating a single non-malicious (Byzantine) software failure or achieving high availability under a single crash failure, using active replication. It provides an overview of the system architecture, the roles and interactions of each module, and an outline of what needs to be implemented or modified.


1. Overview

In this project, we extend the existing DSMS from assignment 3 to incorporate active replication. The design allows for:

  1. Tolerating a single non-malicious (Byzantine) software failure by comparing the results returned by three actively replicated servers.
  2. Tolerating a single crash failure, so the system stays highly available while the Replica Managers restart or replace the failed replica.

2. System Architecture

Below is a diagram illustrating the system components and data flows:

```mermaid
flowchart LR
    A(Client) --> B(FE)
    B --> C(Sequencer)
    C --> D[Replica A]
    C --> E[Replica B]
    C --> F[Replica C]
    D --> B
    E --> B
    F --> B
    subgraph ManagerGroup[Replica Managers]
        Dm[RM for Replica A]
        Em[RM for Replica B]
        Fm[RM for Replica C]
    end
    B -- suspect software bug or crash --> ManagerGroup
```

3. Detailed Component Interaction

Below is a simplified sequence diagram showing how a typical request is processed, how failures are detected, and how RMs handle those failures.

```mermaid
sequenceDiagram
    autonumber
    participant C as Client
    participant FE as Front End
    participant S as Sequencer
    participant R1 as Replica A
    participant R2 as Replica B
    participant R3 as Replica C
    participant RM as Replica Managers
    C->>FE: Request
    FE->>S: Send Request
    S->>R1: (seqNo, request)
    S->>R2: (seqNo, request)
    S->>R3: (seqNo, request)
    R1-->>FE: Response1
    R2-->>FE: Response2
    R3-->>FE: Response3
    FE->>FE: Compare responses
    alt All match or majority found
        FE->>C: Return correct result
    else Timeout or mismatch
        FE->>RM: Notify suspicion (Crash or incorrect)
        RM->>R1: Check (if R1 is suspect)
        RM->>R2: Check (if R2 is suspect)
        RM->>R3: Check (if R3 is suspect)
        alt Confirmed failure
            RM->>R1/R2/R3: Replace or restart failed replica
        end
    end
```

4. Detailed Implementation Plan

Below is how I designed the DSMS with a Front End, Sequencer, Replicas, and Replica Managers, using only three physical machines on a local network. It also demonstrates how to test crash failures.


A. System Overview

We need:

  1. A Front End (FE) that clients contact and that compares the replicas' responses.
  2. A Sequencer that assigns a total order to all requests.
  3. Three Replicas of the DSMS servers.
  4. One Replica Manager (RM) per replica to detect and recover from failures.

Because I have only 3 physical machines, one of them must host the FE + Sequencer alongside one Replica and its RM. The other two machines each host a Replica and its RM. That gives us 3 replicas in total and a single FE/Sequencer instance. Below is how the distribution works:

  1. Machine 1: Front End, Sequencer, Replica A, RM A
  2. Machine 2: Replica B, RM B
  3. Machine 3: Replica C, RM C
Deployment Diagram
```mermaid
flowchart LR
    M1(Machine 1) --- M2(Machine 2)
    M1 --- M3(Machine 3)
    subgraph M1 [Machine 1]
        FE(Front End):::fe
        SQ(Sequencer):::seq
        RA(Replica A):::replica
        RMA(RM A):::rm
    end
    subgraph M2 [Machine 2]
        RB(Replica B):::replica
        RMB(RM B):::rm
    end
    subgraph M3 [Machine 3]
        RC(Replica C):::replica
        RMC(RM C):::rm
    end
    classDef fe fill:#ADE7FF,stroke:#333,stroke-width:1px
    classDef seq fill:#FFD585,stroke:#333,stroke-width:1px
    classDef replica fill:#C7FFC7,stroke:#333,stroke-width:1px
    classDef rm fill:#FFC7C7,stroke:#333,stroke-width:1px
```

Each Replica + RM block can run as separate processes or as combined logic, but on the same machine they should communicate via localhost ports.

Machine 1 hosts:

  1. The Front End and the Sequencer.
  2. One instance of each city server (TOK, LON, NYK).
  3. One RM per instance (RM TOK, RM LON, RM NYK).

Machine 1 Deployment
```mermaid
flowchart TB
    subgraph Machine1 [Machine 1]
        direction TB
        FE(Front End):::fe
        SQ(Sequencer):::seq
        subgraph DSMS_Processes
            direction TB
            TOK1(TOK Instance - M1):::replica
            RMTOK1(RM TOK - M1):::rm
            LON1(LON Instance - M1):::replica
            RMLON1(RM LON - M1):::rm
            NYK1(NYK Instance - M1):::replica
            RMNYK1(RM NYK - M1):::rm
        end
    end
    %% Arrows for internal communication
    FE -- requests --> SQ
    SQ -- seq. msgs --> TOK1
    SQ -- seq. msgs --> LON1
    SQ -- seq. msgs --> NYK1
    %% Replicas -> FE
    TOK1 -- results --> FE
    LON1 -- results --> FE
    NYK1 -- results --> FE
    %% RM internal signals
    FE -- crash/fault info --> RMTOK1
    FE -- crash/fault info --> RMLON1
    FE -- crash/fault info --> RMNYK1
    %% Relationship between RM and its instance
    RMTOK1 -- manages --> TOK1
    RMLON1 -- manages --> LON1
    RMNYK1 -- manages --> NYK1
    classDef fe fill:#ADE7FF,stroke:#333,stroke-width:1px
    classDef seq fill:#FFD585,stroke:#333,stroke-width:1px
    classDef replica fill:#C7FFC7,stroke:#333,stroke-width:1px
    classDef rm fill:#FFC7C7,stroke:#333,stroke-width:1px
```

Key Points:

  1. The FE and Sequencer run only on Machine 1, alongside one instance of each city server and its RM.
  2. Each RM manages exactly one local instance and receives crash/fault notifications from the FE.

Machines 2 and 3 do not host the FE or Sequencer. They only run:

  1. One instance each of TOK, LON, and NYK.
  2. One RM per instance.

That way, each city code has a replica on Machines 1, 2, and 3. The Sequencer on Machine 1 sends requests to all nine replica processes (3 per machine).

Machine 2 or 3 Deployment
```mermaid
flowchart TB
    subgraph MachineX [Machine 2 or Machine 3]
        direction TB
        subgraph DSMS_Processes
            direction TB
            TOKx(TOK Instance - M2/M3):::replica
            RMTOKx(RM TOK - M2/M3):::rm
            LONx(LON Instance - M2/M3):::replica
            RMLONx(RM LON - M2/M3):::rm
            NYKx(NYK Instance - M2/M3):::replica
            RMNYKx(RM NYK - M2/M3):::rm
        end
    end
    %% Arrows from Sequencer (on Machine 1)
    SQ1(Sequencer - M1):::seq
    SQ1 -- seq. msgs --> TOKx
    SQ1 -- seq. msgs --> LONx
    SQ1 -- seq. msgs --> NYKx
    %% Replicas -> FE on M1
    FE1(Front End - M1):::fe
    TOKx -- results --> FE1
    LONx -- results --> FE1
    NYKx -- results --> FE1
    %% Crash/fault notifications
    FE1 -- crash/fault info --> RMTOKx
    FE1 -- crash/fault info --> RMLONx
    FE1 -- crash/fault info --> RMNYKx
    %% RM manages the local instance
    RMTOKx -- manages --> TOKx
    RMLONx -- manages --> LONx
    RMNYKx -- manages --> NYKx
    classDef fe fill:#ADE7FF,stroke:#333,stroke-width:1px
    classDef seq fill:#FFD585,stroke:#333,stroke-width:1px
    classDef replica fill:#C7FFC7,stroke:#333,stroke-width:1px
    classDef rm fill:#FFC7C7,stroke:#333,stroke-width:1px
```

Key Points:

  1. Machines 2 and 3 receive sequenced messages from the Sequencer on Machine 1 and send results directly back to the FE on Machine 1.
  2. Each local RM manages only its own instance and acts on crash/fault notifications from the FE.

Summary

In this deployment:

  1. Three physical machines run nine replica processes in total (one TOK, LON, and NYK instance per machine), each with its own RM.
  2. A single FE and Sequencer on Machine 1 handle all client traffic and request ordering.
  3. Any single crash leaves two healthy replicas of each city server, so the FE can still return correct results.


B. Running the System

Once the system is up, point any Admin or Buyer Client to connect to the Front End on Machine 1. The FE will pass requests to the Sequencer, which multicasts them to all three Replicas in total order.

The startup pseudo-code is the same as in assignment 3. The only difference is how we assign host IP addresses and ports to each component. For example:

Machine1_IP = "192.168.0.10"
Machine2_IP = "192.168.0.11"
Machine3_IP = "192.168.0.12"

// On Machine 1
startReplica("A", "192.168.0.10", port=9000)
startRM("A", "192.168.0.10", port=9100)
startSequencer("192.168.0.10", port=7000)
startFrontEnd("192.168.0.10", port=7100, sequencerIP="192.168.0.10", seqPort=7000)

// On Machine 2
startReplica("B", "192.168.0.11", port=9000)
startRM("B", "192.168.0.11", port=9100)

// On Machine 3
startReplica("C", "192.168.0.12", port=9000)
startRM("C", "192.168.0.12", port=9100)

Each replica knows the IP addresses and ports of the other replicas and the Sequencer. The Front End is the only component the client ever contacts (192.168.0.10:7100 in this example).


Runtime Communication Flow
```mermaid
flowchart TB
    Client --> FE[Front End @ Machine 1]
    FE --> SQ[Sequencer @ Machine 1]
    SQ --> RA[Replica A @ Machine 1]
    SQ --> RB[Replica B @ Machine 2]
    SQ --> RC[Replica C @ Machine 3]
    RA --> FE
    RB --> FE
    RC --> FE
    FE -.->|Notify if suspect crash/fault| RM[RM A,B,C]
```

pseudo-code for how a replica handles ordered requests:

ReplicaServer {
  sortedRequests = new PriorityQueue(... compare by seqNo ...);
  currentSeqExpected = 1;
  onReceiveRequest(seqNo, clientRequest):
      insert (seqNo, clientRequest) into sortedRequests

      while sortedRequests.peek() has seqNo == currentSeqExpected:
          nextReq = sortedRequests.poll()
          result = DSMS_logic(nextReq.clientRequest)
          sendResponseToFE(nextReq.seqNo, result)

          currentSeqExpected++
}

// DSMS_logic(request):
//   parse operation (addShare, removeShare, ....)
//   run the existing DSMS server code
//   return a string result
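The hold-back-queue logic above can be sketched as runnable Python. This is an illustrative sketch, not the assignment code; the class and method names are my own, and `DSMS_logic` is stood in for by an arbitrary deliver callback.

```python
import heapq

class OrderedDelivery:
    """Buffers out-of-order (seqNo, request) pairs and delivers them
    in total order, mirroring the replica pseudo-code above."""

    def __init__(self):
        self._heap = []          # min-heap ordered by sequence number
        self._next_expected = 1  # currentSeqExpected in the pseudo-code
        self.delivered = []      # requests processed so far, in order

    def on_receive(self, seq_no, request):
        heapq.heappush(self._heap, (seq_no, request))
        # Deliver every buffered request whose turn has come.
        while self._heap and self._heap[0][0] == self._next_expected:
            _, req = heapq.heappop(self._heap)
            self.delivered.append(req)  # here the real replica runs DSMS_logic
            self._next_expected += 1
```

Even if the network delivers (2, …) before (1, …), the replica still executes the requests in sequence order; the out-of-order request simply waits in the heap.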

pseudo-code for the FE logic:

FrontEnd {
  handleClientRequest(clientRequest):
      seqNum = sendToSequencer(clientRequest)

      responses = []
      startTime = now()

      while not enoughResponses(responses):
          if responseArrivesFromReplica(rID, result):
              responses.add( (rID, result) )
              if checkMajorityOrTwoMatches(responses):
                  finalRes = getMajorityOrMatch(responses)
                  return finalRes
          if now() - startTime > TIMEOUT:
              identifyWhichReplicaDidNotRespond(responses)
              notifyAllRMs( crashedReplicaID )
              break   // stop waiting; answer with what we have

      finalRes = getMajorityOrMatch(responses)
      return finalRes
}
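The "majority or two matches" check the FE performs can be sketched as a small Python function. The function name and return shape are illustrative assumptions, not part of the assignment API:

```python
from collections import Counter

def majority_result(responses):
    """Given (replicaID, result) pairs, return (result, suspects):
    the result seen by at least two replicas, plus the IDs of any
    replicas that disagreed with it (software-fault suspects).
    Returns (None, []) if no two responses match yet."""
    counts = Counter(result for _, result in responses)
    for result, n in counts.most_common():
        if n >= 2:
            suspects = [rid for rid, res in responses if res != result]
            return result, suspects
    return None, []
```

With three replicas and at most one faulty one, two matching responses are always sufficient, which is why the FE can answer the client before the third (slow or crashed) replica responds.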

C. Replica Manager (RM)

Each Replica Manager is bound to exactly one replica. The RM's duties:

  1. Receive fault notifications (incorrect result or suspected crash) from the FE.
  2. Coordinate with the other RMs to confirm a suspected crash.
  3. Replace its replica after three consecutive incorrect results, or restart it after a confirmed crash.

RM Fault Recovery Diagram
```mermaid
flowchart LR
    FE(Front End) --> RM([Replica Managers])
    RM --> R([Replica])
    RM -. coordinate .-> RM2([Other RMs])
    RM -. check status .-> R
    R --> RM
    RM --> Rnew[Launch Replacement Replica?]
```

pseudo-code for the RM:

ReplicaManager(replicaID) {
  consecutiveFaults = 0

  onFaultSuspected(faultType):
      if faultType == "IncorrectResult":
          consecutiveFaults++
          if consecutiveFaults >= 3:
              stopReplica(replicaID)
              startNewReplica(replicaID)
              consecutiveFaults = 0
      else if faultType == "CrashSuspected":
          if confirmCrashWithPeers(replicaID):
              stopReplica(replicaID)
              startNewReplica(replicaID)
              consecutiveFaults = 0

  stopReplica(replicaID):
      // forcibly kill the process or call a cleanup method

  startNewReplica(replicaID):
      // spawn a fresh DSMS replica process
      // rejoin the group for receiving requests
}
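The RM's fault-handling state machine above can be sketched in Python. This is a minimal sketch under stated assumptions: the restart/replace action is abstracted into a callback, and crash confirmation by peer RMs is passed in as a boolean rather than implemented.

```python
class ReplicaManager:
    """Tracks fault reports for one replica and decides when to replace it."""

    FAULT_THRESHOLD = 3  # consecutive incorrect results before replacement

    def __init__(self, replica_id, restart_fn):
        self.replica_id = replica_id
        self.restart_fn = restart_fn  # callback that kills and respawns the replica
        self.consecutive_faults = 0

    def on_fault_suspected(self, fault_type, crash_confirmed_by_peers=False):
        if fault_type == "IncorrectResult":
            self.consecutive_faults += 1
            if self.consecutive_faults >= self.FAULT_THRESHOLD:
                self._replace()
        elif fault_type == "CrashSuspected" and crash_confirmed_by_peers:
            self._replace()

    def _replace(self):
        # Stop the faulty replica, spawn a fresh one, reset the counter.
        self.restart_fn(self.replica_id)
        self.consecutive_faults = 0
```

Note that the counter resets after each replacement, so a fresh replica starts with a clean fault history.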

D. Sequencer

The Sequencer enforces total order on all client requests. It:

  1. Receives each request from the FE.
  2. Assigns it the next monotonically increasing sequence number.
  3. Multicasts (seqNo, request) to all replicas, retrying with acks until delivery succeeds or the retry budget runs out.

Sequencer Flow Diagram
```mermaid
flowchart LR
    FE(Front End) --> SQ(Sequencer)
    SQ --> RP1[Replica 1]
    SQ --> RP2[Replica 2]
    SQ --> RP3[Replica 3]
```

pseudo-code:

Sequencer {
  nextSeqNo = 1

  onReceiveRequestFromFE(clientRequest):
      seqNo = nextSeqNo
      nextSeqNo++
      for each replica in replicaList:
          sendUDPWithAck(replica.address, (seqNo, clientRequest))

  sendUDPWithAck(address, message):
      do {
        sendUDP(address, message)
        wait for ack or timeout
      } while (no ack received && retryCount < MAX_RETRIES)
}
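The sequencing logic can be sketched as Python with the transport abstracted out. The `send_fn` callback (returning True on ack) is an assumption standing in for the UDP-with-ack code; in the real system it would wrap `sendUDPWithAck`.

```python
class Sequencer:
    """Assigns a global sequence number to each request and
    multicasts (seqNo, request) to every replica."""

    def __init__(self, replicas, send_fn, max_retries=3):
        self.replicas = replicas      # replica addresses
        self.send_fn = send_fn        # send_fn(addr, msg) -> True if acked
        self.max_retries = max_retries
        self.next_seq_no = 1

    def on_request(self, client_request):
        seq_no = self.next_seq_no
        self.next_seq_no += 1
        for addr in self.replicas:
            self._send_with_ack(addr, (seq_no, client_request))
        return seq_no

    def _send_with_ack(self, addr, msg):
        # Resend until acked or the retry budget is exhausted.
        for _ in range(self.max_retries):
            if self.send_fn(addr, msg):
                return True
        return False
```

Because a single Sequencer hands out the numbers, all replicas see the same total order even though UDP may deliver the messages out of order; the replicas' hold-back queues handle the reordering.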

5. Test Crash Failures

Here is a comprehensive list of test cases to verify the Distributed Share Market System (DSMS) functionality, covering both normal operations and crash (process-crash) scenarios. The DSMS includes:

  1. A Front End (FE) on Machine 1 that clients contact.
  2. A Sequencer on Machine 1 that totally orders requests.
  3. Three replicas of each city server (TOK, LON, NYK), one per machine.
  4. One Replica Manager per replica.

A. Normal Operation Test Cases

Sequence Diagram
```mermaid
sequenceDiagram
    participant C as Client
    participant FE as Front End
    participant S as Sequencer
    participant R1 as Replica A
    participant R2 as Replica B
    participant R3 as Replica C
    participant RM as RMs
    C->>FE: Client Request
    FE->>S: Send Request + get seqNo
    S->>R1: (seqNo, request)
    S->>R2: (seqNo, request)
    S->>R3: (seqNo, request)
    R1-->>FE: result1
    R2-->>FE: result2
    R3-->>FE: result3
    FE->>FE: Compare results / check timeouts
    alt mismatch or no response
        FE->>RM: Replica i is suspect
    else majority or matching pair
        FE->>C: Send correct result
    end
```

Below is a listing of core admin and buyer operations, covering typical edge cases. These assume all replicas are running and no process crashes occur.

| Test Case ID | Operation | Scenario | Expected Result |
| --- | --- | --- | --- |
| TA01 | addShare | Admin adds a new share with valid inputs | Share is successfully created with the correct capacity; success message returned. |
| TA02 | addShare | Admin attempts to add an existing shareID | Operation fails with a "Share already exists" message. No new share is created. |
| TA03 | removeShare | Admin removes an existing share | Share is removed from the server. Future list or purchase references to that share should fail. |
| TA04 | removeShare | Admin attempts to remove a non-existent shareID | Operation returns "Share not found" or a similar message. |
| TA05 | listShareAvailability | Admin requests availability for a valid shareType (EQUITY, BONUS, or DIVIDEND) | All known shares of that type are returned with capacities from each server city. No error. |
| TB01 | purchaseShare | Buyer purchases a share with sufficient capacity available | Purchase is recorded and capacity is reduced accordingly. Successful response. |
| TB02 | purchaseShare | Buyer tries to purchase more than the available capacity, or the shareID doesn't exist | Purchase fails or a partial amount is allocated; an error or partial-success message is returned. |
| TB03 | getShares | Buyer requests their existing holdings | All owned shares and quantities from all servers are returned. |
| TB04 | sellShare | Buyer sells shares they own, up to the purchased quantity | Operation succeeds; capacity on that share goes up and the buyer's holdings decrease accordingly. |
| TB05 | sellShare | Buyer tries to sell a share they do not own | Operation fails with an error message ("You do not own this share."). |
| TB06 | swapShares | Buyer swaps oldShare for newShare with enough capacity in newShare | Old share is removed from the buyer's holdings and the new share is allocated. Successful swap message returned. |
| TB07 | swapShares | Buyer attempts a swap but newShare lacks capacity | Swap fails; the old share stays in the buyer's holdings. Clear error message. |

B. Crash Failure Test Cases

In the following scenarios, at least one replica is forcibly terminated during or right after a request is broadcast by the Sequencer. We assume the DSMS is in crash-failure tolerance mode (not the software-fault mode). The system should continue to operate with the remaining replicas.

| Test Case ID | Crash Point | Operation | Expected System Behavior |
| --- | --- | --- | --- |
| C01 | After the Sequencer sends an "addShare" request to all replicas, Replica B is killed before responding | addShare | FE receives two responses (Replica A and Replica C). FE times out waiting for B and suspects a crash. FE returns success if A's and C's results match. RM(B) restarts or replaces B. The system remains consistent with the new share added. |
| C02 | During the removeShare broadcast, Replica A's process is terminated mid-execution | removeShare | FE gets responses from B and C. A never responds, so a crash is suspected and the RMs confirm it. The share is removed on B and C. Replica A is later restarted or replaced. Once A is back, it can catch up on missed requests. |
| C03 | Replica C is killed after sending an incorrect partial result for "purchaseShare" | purchaseShare | FE compares responses from A, B, and C. Since C is killed mid-flight, FE gets no final ack from C. A and B match, so FE proceeds with that result. FE flags C as crashed; RM(C) restarts C. The final system state is consistent per A and B. |
| C04 | Replica B is killed before sending any response for "swapShares" | swapShares | FE collects responses from A and C. If they match, the swap succeeds. B is restarted later by RM(B). The buyer's holdings reflect the swapped share on A and C; B syncs after restart. |
| C05 | Replica A is killed mid "sellShare" operation; FE times out on A's response | sellShare | FE obtains matching results from B and C and concludes the sale. A is flagged as crashed; RM(A) restarts it. Overall capacity is incremented, as B and C performed the sale. |

In each crash case, once the Replica Manager detects or is notified of the crash, it spins up a new instance or restarts so the system returns to having 3 replicas. The FE can respond to requests as soon as it has at least 2 matching responses from the healthy replicas.
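For these crash tests, the forced termination itself can be automated with a small helper. This is a sketch: the replica process is stood in for by a `sleep` command (the real test would launch the actual DSMS replica), and `SIGKILL` is used so the process gets no chance to clean up, which is exactly what a crash failure looks like. This assumes a Unix-like OS.

```python
import signal
import subprocess

def kill_replica(proc):
    """Forcibly terminate a replica process (simulating a crash failure)
    and wait until the OS has reaped it."""
    proc.send_signal(signal.SIGKILL)  # no cleanup possible: a true crash
    proc.wait()
    return proc.returncode

# Stand-in for a replica process; a real test would Popen the replica's command.
replica = subprocess.Popen(["sleep", "60"])
assert replica.poll() is None   # still running before the injected crash
kill_replica(replica)
assert replica.poll() is not None  # confirmed dead
```

After the kill, the test driver issues a request through the FE and checks that it still gets a correct answer from the two surviving replicas.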


C. Crash and Recovery

Below are two sample sequence diagrams illustrating crash scenarios and the subsequent recovery. The first focuses on a crash during an admin operation (addShare); the second on a buyer operation (purchaseShare).

Diagram 1: Crash During addShare Operation
```mermaid
sequenceDiagram
    autonumber
    participant Admin as Admin Client
    participant FE as Front End
    participant S as Sequencer
    participant R1 as Replica A
    participant R2 as Replica B (Crashes)
    participant R3 as Replica C
    participant RM as Replica Managers
    Admin->>FE: addShare(shareID, shareType, capacity)
    FE->>S: "Request from Admin"
    S->>R1: (seqNo, addShare)
    S->>R2: (seqNo, addShare)
    S->>R3: (seqNo, addShare)
    R1-->>FE: "Success: added share"
    R3-->>FE: "Success: added share"
    note over R2: R2 crashes before responding
    FE->>FE: Wait for R2 => times out
    FE->>RM: "Replica B suspected crashed"
    RM->>R2: Attempt ping => no response
    RM->>RM: coordinate "Yes, B is down."
    RM->>R2: "Restart or new instance"
    FE->>Admin: "Share added successfully" (2 matching results from A & C)
    note over R2: Once restarted, R2 can sync state from others
```

Diagram 2: Crash During purchaseShare Operation
```mermaid
sequenceDiagram
    autonumber
    participant Buyer as Buyer Client
    participant FE as Front End
    participant S as Sequencer
    participant R1 as Replica A
    participant R2 as Replica B
    participant R3 as Replica C (Crashes)
    participant RM as Replica Managers
    Buyer->>FE: purchaseShare(shareID, 10)
    FE->>S: "Buyer request"
    S->>R1: (seqNo, purchaseShare)
    S->>R2: (seqNo, purchaseShare)
    S->>R3: (seqNo, purchaseShare)
    R1-->>FE: result "Purchased 10"
    R2-->>FE: result "Purchased 10"
    note over R3: R3 crashes during processing
    FE->>FE: sees 2 matching successful results
    FE->>Buyer: "Purchase success (10 shares)"
    FE->>RM: "Replica C is not responding"
    RM->>R3: ping => fails
    RM->>R3: restart or spawn new replica
    note over R3: after restart, sync w/ others
```

Conclusion

The test cases above (normal operations plus crash scenarios) will verify correctness, robustness, and recovery in the actively replicated DSMS:

  1. Normal Tests: Check that basic admin and buyer operations behave correctly under no failures.
  2. Crash Tests: Forcibly terminate one replica during or just after receiving a request. Confirm the FE obtains matching results from the remaining replicas, and the Replica Manager replaces or restarts the crashed replica.

Together, these tests ensure the DSMS achieves the high availability goal under a single crash failure.


6. Verifying All Functionalities

After implementing all of the above functionality, we need to verify:

  1. All admin and buyer operations still work end to end through the FE.
  2. Each replica's data (capacities and buyer records) remains consistent.
  3. A single crash is tolerated: if any single replica process on Machine 2 or Machine 3 is forcibly stopped, the FE should continue returning correct results from the other two healthy replicas.
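Cross-replica consistency can be checked by dumping each replica's state and comparing. A minimal sketch follows; the snapshot format (a dict of shareID to remaining capacity, possibly including buyer records) and the function name are assumptions for illustration.

```python
from collections import Counter

def replicas_consistent(states):
    """Return the IDs of replicas whose state differs from the majority.

    `states` maps replicaID -> state snapshot, e.g. a dict of
    shareID -> remaining capacity (buyer records could be included too).
    An empty return list means all replicas agree.
    """
    # Build a hashable fingerprint of each snapshot for comparison.
    fingerprints = {rid: repr(sorted(state.items())) for rid, state in states.items()}
    majority_fp, _ = Counter(fingerprints.values()).most_common(1)[0]
    return sorted(rid for rid, fp in fingerprints.items() if fp != majority_fp)
```

Running this check after each test case (including after a crashed replica rejoins) confirms that capacities and buyer records converge on all three replicas.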



7. Final Conclusion

This document outlines the architectural structure and the key components that must be implemented or extended in the existing DSMS code to achieve software-failure tolerance or high availability using active replication. This design ensures correctness, fault tolerance, and ease of debugging.