DSMS Part 5 - Detailed Implementation Plan

In this section, we break down the required modifications to integrate software-fault tolerance or high availability into the Distributed Share Market System (DSMS). We provide recommended pseudo-code and diagrams for each subcomponent.

Team-Wide Changes

The team must decide at system initialization whether the DSMS is running in:

Software-fault tolerance mode (non-malicious Byzantine), or
Crash-failure tolerance mode.

All other modules (Front End, Sequencer, Replicas, Replica Managers) use the same DSMS code but will branch on this selected mode to handle:

Software failures: Possibly incorrect results from one replica.
Crash failures: No response from one replica within a timeout period.

Team-Wide Initialization Diagram

flowchart LR
    A(Load Configuration) --> B{Mode Selected?}
    B -->|Software-fault Tolerance| C(Set failureMode = BYZANTINE)
    B -->|Crash-failure Tolerance| D(Set failureMode = CRASH)
    C --> E[Launch Sequencer/FE/Replicas/RMs in Byzantine mode]
    D --> F[Launch Sequencer/FE/Replicas/RMs in Crash mode]

Here is some pseudo-code that all processes can share to initialize the system’s mode:


// Pseudo-code for configuring "failure mode" at startup

// A global or shared configuration object
GlobalConfig config = new GlobalConfig();

// At some main method or config loader:
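String mode = getLaunchParameter("failureMode"); // e.g., "byzantine" or "crash"
if ("byzantine".equalsIgnoreCase(mode)) {        // constant first, so a missing parameter cannot throw
    config.failureMode = "BYZANTINE";
} else if ("crash".equalsIgnoreCase(mode)) {
    config.failureMode = "CRASH";
} else {
    // Fail fast instead of silently defaulting to crash mode on a typo
    throw new IllegalArgumentException("failureMode must be byzantine or crash, got: " + mode);
}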

// Then pass "config" to the front end, sequencer, RMs, and replicas.
// All components read config.failureMode to know how to behave.
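
The startup logic above can also be sketched in compilable Java. This is a minimal sketch, not the required design: the FailureMode enum, the immutable GlobalConfig shape, and the failureMode system property name are all assumptions made for illustration.

```java
import java.util.Locale;

/** Failure modes named in this plan; modelling them as an enum is an illustrative assumption. */
enum FailureMode {
    BYZANTINE, CRASH;

    /** Parses a launch parameter such as "byzantine" or "crash" (case-insensitive). */
    static FailureMode parse(String raw) {
        if (raw == null) {
            throw new IllegalArgumentException("failureMode is required");
        }
        switch (raw.trim().toLowerCase(Locale.ROOT)) {
            case "byzantine": return BYZANTINE;
            case "crash":     return CRASH;
            default:
                throw new IllegalArgumentException("unknown failureMode: " + raw);
        }
    }
}

/** Immutable configuration handed to the FE, sequencer, RMs, and replicas. */
final class GlobalConfig {
    final FailureMode failureMode;

    GlobalConfig(FailureMode failureMode) {
        this.failureMode = failureMode;
    }

    /** Reads -DfailureMode=... from the JVM; the property name is a hypothetical choice. */
    static GlobalConfig fromSystemProperties() {
        return new GlobalConfig(FailureMode.parse(System.getProperty("failureMode")));
    }
}
```

Using an enum rather than a string means a misspelled mode is rejected once at startup, and components can branch with a switch on config.failureMode.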

Server Replica Implementation

Each student modifies one copy of the DSMSServer (or CORBA equivalent) to become a “replica.” The replica now:

Receives requests in the form (sequenceNumber, clientRequest) from the Sequencer rather than directly from clients.
Executes requests in total order (by ascending sequenceNumber).
Sends the result back to the Front End (FE).

The DSMS logic (addShare, removeShare, purchaseShare, etc.) remains mostly the same, but you must:

Buffer and sort incoming requests by their sequence number.
Process them strictly in ascending order.
Immediately return the result to the FE (or FE’s port) after processing each request.

Replica Workflow Diagram

flowchart TB
    SQ(Sequencer) --> RE(Replica)
    RE --> DB[DSMS Logic Data Structures]
    DB --> RE
    RE --> FE(Front End)

Here is pseudo-code for how a replica might handle ordered requests:

ReplicaServer {
  // Maintains a queue or map of sequenceNumber -> request
  sortedRequests = new PriorityQueue(... compare by seqNo ...);
  currentSeqExpected = 1;

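  // On receiving a new request (seqNo, clientRequest)
  onReceiveRequest(seqNo, clientRequest):
      // Drop retransmissions of requests already executed or already buffered;
      // a stale entry would otherwise sit at the head of the queue and block it
      if seqNo < currentSeqExpected or sortedRequests contains seqNo:
          return

      insert (seqNo, clientRequest) into sortedRequests

      // Try to process in order
      while sortedRequests is not empty and sortedRequests.peek() has seqNo == currentSeqExpected:
          nextReq = sortedRequests.poll()
          // process DSMS operation
          result = DSMS_logic(nextReq.clientRequest)
          // send 'result' back to FE
          sendResponseToFE(nextReq.seqNo, result)

          currentSeqExpected++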
}

// DSMS_logic(request):
//   parse operation (addShare, removeShare, etc.)
//   run the existing DSMS server code
//   return a string result
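
The ordering logic above can be sketched as a small hold-back buffer in Java. This is a sketch under stated assumptions: the class name OrderedExecutor is hypothetical, requests and results are plain strings, and the existing DSMS server code is represented by a Function passed in by the caller. Sending each result to the FE is left to the caller.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;
import java.util.function.Function;

/** Hold-back buffer that releases requests strictly in sequence-number order. */
final class OrderedExecutor {
    private final TreeMap<Long, String> pending = new TreeMap<>();
    private final Function<String, String> dsmsLogic; // stand-in for the existing DSMS server code
    private long nextExpected = 1;

    OrderedExecutor(Function<String, String> dsmsLogic) {
        this.dsmsLogic = dsmsLogic;
    }

    /**
     * Accepts one (seqNo, request) pair and returns the results of every request
     * that became executable, in order. Each entry has the form "seqNo:result".
     */
    synchronized List<String> onReceive(long seqNo, String request) {
        List<String> results = new ArrayList<>();
        if (seqNo < nextExpected) {
            return results; // already executed: a retransmission, so drop it
        }
        pending.putIfAbsent(seqNo, request); // a duplicate of a buffered request is ignored
        while (!pending.isEmpty() && pending.firstKey() == nextExpected) {
            String next = pending.pollFirstEntry().getValue();
            results.add(nextExpected + ":" + dsmsLogic.apply(next));
            nextExpected++;
        }
        return results;
    }
}
```

If request 2 arrives before request 1, the first call returns an empty list and the second call returns both results in order. A TreeMap is used here instead of a priority queue so that duplicate sequence numbers collapse into one entry.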

Student 1: Front End Implementation

The Front End (FE) is the sole entry point for all clients (admins or buyers). Its responsibilities:

Receive each client request over a known interface (could be a simple local stub or a web service endpoint).
Forward the request to the Sequencer.
Collect responses from all Replicas.
In software-fault tolerant mode, accept a result once two of the three replicas return identical results; a replica whose result differs from that majority is suspected of a software (buggy) failure.
In crash-failure mode, accept the first matching pair of results, and suspect a crash when a replica has not responded within the timeout period.
Notify the Replica Managers (RMs) if:
- A mismatch arises (suspected software failure) or
- A replica times out (suspected crash).
Return the correct result to the client as soon as possible.
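
The result comparison described in this list could be sketched as follows. It is a minimal sketch that assumes results are compared as strings and replicas are identified by string IDs; the ResultVoter class and its method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

/** Majority voting over replica results; names are illustrative, not part of the DSMS spec. */
final class ResultVoter {
    private final int quorum; // 2 when there are 3 replicas

    ResultVoter(int replicaCount) {
        this.quorum = replicaCount / 2 + 1;
    }

    /** Returns the result reported by at least `quorum` replicas, if there is one yet. */
    Optional<String> majority(Map<String, String> resultsByReplica) {
        Map<String, Integer> counts = new HashMap<>();
        for (String result : resultsByReplica.values()) {
            int seen = counts.merge(result, 1, Integer::sum);
            if (seen >= quorum) {
                return Optional.of(result);
            }
        }
        return Optional.empty();
    }

    /** Replicas whose result differs from the agreed one (suspected software faults). */
    List<String> dissenters(Map<String, String> resultsByReplica, String agreed) {
        List<String> suspects = new ArrayList<>();
        for (Map.Entry<String, String> entry : resultsByReplica.entrySet()) {
            if (!entry.getValue().equals(agreed)) {
                suspects.add(entry.getKey());
            }
        }
        return suspects;
    }
}
```

The FE would call majority after each new response and reply to the client as soon as it returns a value, then call dissenters once all responses are in to decide which replicas to report to the RMs.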

Front End Sequence Diagram

sequenceDiagram
    participant C as Client
    participant FE as Front End
    participant S as Sequencer
    participant R1 as Replica A
    participant R2 as Replica B
    participant R3 as Replica C
    participant RM as RMs
    C->>FE: Client Request
    FE->>S: Send Request + get seqNo
    S->>R1: (seqNo, request)
    S->>R2: (seqNo, request)
    S->>R3: (seqNo, request)
    R1-->>FE: result1
    R2-->>FE: result2
    R3-->>FE: result3
    FE->>FE: Compare results / check timeouts
    alt mismatch or no response
        FE->>RM: Replica i is suspect
    else majority or matching pair
        FE->>C: Send correct result
    end

Here is pseudo-code for the FE logic:

FrontEnd {
  handleClientRequest(clientRequest):
      // 1) Send request to sequencer, which returns the assigned sequence number
      seqNum = sendToSequencer(clientRequest)

      // 2) Wait for up to 3 replica responses tagged with seqNum, or until TIMEOUT
      responses = []
      finalRes = null
      startTime = now()

      while size(responses) < 3 and now() - startTime <= TIMEOUT:
          if responseArrivesFromReplica(rID, seqNum, result):
              responses.add( (rID, result) )
              if finalRes == null and checkMajorityOrTwoMatches(responses):
                  finalRes = getMajorityOrMatch(responses)
                  // reply to the client as soon as two results agree
                  replyToClient(finalRes)

      // 3) Report suspects once, after the wait is over
      for each (rID, result) in responses where finalRes != null and result != finalRes:
          notifyAllRMs(rID, "IncorrectResult")
      for each crashedReplicaID in identifyWhichReplicasDidNotRespond(responses):
          notifyAllRMs(crashedReplicaID, "CrashSuspected")

      // fallback: no two results agreed before the timeout
      if finalRes == null:
          replyToClient(error "no agreement among replicas")
}
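
The timeout side of the FE logic can be sketched in Java with a blocking queue as the inbox for replica replies. Everything named here is an assumption for illustration: the ReplicaReply message shape, the ReplyCollector class, and the idea that a separate listener thread puts incoming replies on the queue. The sketch covers only collection and crash suspicion; a real FE would also reply to the client as soon as two results match rather than waiting for the full set.

```java
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;

/** One reply from a replica; a hypothetical message shape. */
final class ReplicaReply {
    final String replicaId;
    final long seqNo;
    final String result;

    ReplicaReply(String replicaId, long seqNo, String result) {
        this.replicaId = replicaId;
        this.seqNo = seqNo;
        this.result = result;
    }
}

final class ReplyCollector {
    /**
     * Drains replies for seqNo until every replica has answered or timeoutMillis elapses.
     * Replies for other sequence numbers or unknown replicas are skipped.
     */
    static Map<String, String> collect(BlockingQueue<ReplicaReply> inbox, long seqNo,
                                       Set<String> replicaIds, long timeoutMillis) {
        Map<String, String> received = new LinkedHashMap<>();
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        try {
            while (received.size() < replicaIds.size()) {
                long remaining = deadline - System.nanoTime();
                if (remaining <= 0) break;
                ReplicaReply reply = inbox.poll(remaining, TimeUnit.NANOSECONDS);
                if (reply == null) break; // deadline reached with nothing more to read
                if (reply.seqNo == seqNo && replicaIds.contains(reply.replicaId)) {
                    received.putIfAbsent(reply.replicaId, reply.result);
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
        }
        return received;
    }

    /** Replicas that sent nothing before the deadline (suspected crashes). */
    static Set<String> silent(Set<String> replicaIds, Map<String, String> received) {
        Set<String> missing = new HashSet<>(replicaIds);
        missing.removeAll(received.keySet());
        return missing;
    }
}
```

Computing the deadline once, instead of restarting the timer on every reply, keeps a slow trickle of responses from extending the wait indefinitely.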

Student 2: Replica Manager (RM) Implementation

Each Replica Manager is bound to exactly one replica. The RM’s duties:

Detect repeated software faults: If the FE signals that a replica returned an incorrect result, increment a counter for that replica. After 3 consecutive incorrect results, replace or restart the replica.
Detect crash failures: If the FE signals that a replica timed out, the RMs coordinate (e.g., ping the replica). If all RMs agree it has crashed, they restart or replace it.
Potentially keep a hot backup on a different host or start a new process when needed.

RM Fault Recovery Diagram

flowchart LR
    FE(Front End) --> RM([Replica Managers])
    RM --> R([Replica])
    RM -. coordinate .-> RM2([Other RMs])
    RM -. check status .-> R
    R --> RM
    RM --> Rnew[Launch Replacement Replica?]

Below is pseudo-code for the RM:

ReplicaManager(replicaID) {
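  consecutiveFaults = 0

  onFaultSuspected(faultType):
      if faultType == "IncorrectResult":
          consecutiveFaults++
          if consecutiveFaults >= 3:
              stopReplica(replicaID)
              startNewReplica(replicaID)
              consecutiveFaults = 0
      else if faultType == "CrashSuspected":
          // coordinate with other RMs
          if confirmCrashWithPeers(replicaID):
              stopReplica(replicaID)
              startNewReplica(replicaID)
              consecutiveFaults = 0

  // Called when the FE reports that this replica's result matched the majority.
  // Without this reset the counter tracks total faults, not consecutive ones.
  onCorrectResult():
      consecutiveFaults = 0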

  stopReplica(replicaID):
      // forcibly kill the process or call a cleanup method

  startNewReplica(replicaID):
      // spawn a fresh DSMS replica process
      // rejoin the group for receiving requests
}
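
The bookkeeping behind the two RM decisions can be sketched in Java as below. This is a sketch under assumptions: FaultTracker and CrashVote are hypothetical names, the threshold of 3 is passed in by the caller, and the peer check results are assumed to have been gathered already (for example by pinging the replica). Stopping and starting processes is deliberately left out.

```java
import java.util.Collection;

/** Tracks consecutive incorrect results for one replica. */
final class FaultTracker {
    private final int threshold;
    private int consecutiveFaults = 0;

    FaultTracker(int threshold) {
        this.threshold = threshold;
    }

    /** Records an incorrect result; returns true when the replica should be replaced. */
    synchronized boolean recordIncorrect() {
        consecutiveFaults++;
        if (consecutiveFaults >= threshold) {
            consecutiveFaults = 0; // the replacement starts with a clean record
            return true;
        }
        return false;
    }

    /** A correct result breaks the streak, which is what makes the count consecutive. */
    synchronized void recordCorrect() {
        consecutiveFaults = 0;
    }
}

/** Crash confirmation: every RM must report the replica as unreachable. */
final class CrashVote {
    static boolean confirmed(Collection<Boolean> peerSaysUnreachable) {
        if (peerSaysUnreachable.isEmpty()) {
            return false; // no evidence at all is not agreement
        }
        for (boolean unreachable : peerSaysUnreachable) {
            if (!unreachable) {
                return false;
            }
        }
        return true;
    }
}
```

With a threshold of 3, two faults followed by a correct result and two more faults does not trigger a replacement; the third fault in a row does.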

Student 3: Sequencer Implementation

The Sequencer enforces total order on all client requests. It:

Maintains a global counter nextSeqNo.
On receiving a request from the FE, assigns the current value of nextSeqNo to the request as seqNo, increments the counter, and reliably multicasts (seqNo, request) to all replicas.
Uses a simple reliability mechanism on UDP or a library for group communication to ensure no requests are lost.
Remains “failure-free” in this project (by assumption). In a real system, you could replicate the sequencer or have a crash fallback, but here we assume it does not fail.

Sequencer Flow Diagram

flowchart LR
    FE(Front End) --> SQ(Sequencer)
    SQ --> RP1[Replica 1]
    SQ --> RP2[Replica 2]
    SQ --> RP3[Replica 3]

Here is some pseudo-code for the sequencer:

Sequencer {
  nextSeqNo = 1

  onReceiveRequestFromFE(clientRequest):
      seqNo = nextSeqNo
      nextSeqNo++

      // reliably multicast (seqNo, clientRequest) to replicas
      for each replica in replicaList:
          sendUDPWithAck(replica.address, (seqNo, clientRequest))

      // tell the FE which sequence number to match responses against
      return seqNo

  // sendUDPWithAck would be something like:
  sendUDPWithAck(address, message):
      retryCount = 0
      do {
        sendUDP(address, message)
        wait for ack or timeout
        retryCount++
      } while (no ack received && retryCount < MAX_RETRIES)
}
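
The numbering and retry logic can be sketched in Java without committing to a particular socket layer. The AckingTransport interface below is a hypothetical seam: a real implementation might wrap a DatagramSocket with a receive timeout, but that part is not shown. SequencerCore and its method names are likewise assumptions.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical transport: sends one message and reports whether an ack arrived in time. */
interface AckingTransport {
    boolean sendAndAwaitAck(String replicaAddress, long seqNo, String request);
}

final class SequencerCore {
    private final AtomicLong nextSeqNo = new AtomicLong(1);
    private final List<String> replicaAddresses;
    private final AckingTransport transport;
    private final int maxRetries;

    SequencerCore(List<String> replicaAddresses, AckingTransport transport, int maxRetries) {
        this.replicaAddresses = replicaAddresses;
        this.transport = transport;
        this.maxRetries = maxRetries;
    }

    /** Assigns the next sequence number, sends to every replica, and returns the number. */
    long sequence(String request) {
        long seqNo = nextSeqNo.getAndIncrement();
        for (String address : replicaAddresses) {
            deliver(address, seqNo, request);
        }
        return seqNo;
    }

    /** Tries up to maxRetries times; returns false if the replica never acknowledged. */
    boolean deliver(String address, long seqNo, String request) {
        for (int attempt = 0; attempt < maxRetries; attempt++) {
            if (transport.sendAndAwaitAck(address, seqNo, request)) {
                return true;
            }
        }
        return false;
    }
}
```

Because a retransmission can reach a replica that already processed the request (the ack was lost, not the message), replicas need to tolerate duplicate sequence numbers.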

Conclusion

By following these implementation details for each subcomponent, you ensure that:

Your DSMS can operate under either software failure tolerance (Byzantine-like) or crash failure tolerance by toggling a global setting.
The Front End orchestrates client communication and majority checking.
The Sequencer guarantees total order delivery.
Each Replica processes ordered requests and returns results.
The Replica Managers detect and recover from faulty or crashed replicas, maintaining system integrity.

This completes the more detailed outline for Part 5 of the DSMS project, addressing each specific role and showing how pseudo-code and diagrams can guide your actual implementation.