1. TPMMS Steps (High-Level Flow)

flowchart TD
    A[Start: Main.main] --> B[Configure memory and output paths]
    B --> C[Create IOTracker]
    C --> D[Create TPMMS sorter]
    D --> E[Phase 1: createInitialRuns for T1]
    D --> F[Phase 1: createInitialRuns for T2]
    E --> G[T1 runs on disk]
    F --> H[T2 runs on disk]
    G --> I[Phase 2: multiPassMerge for T1]
    H --> J[Phase 2: multiPassMerge for T2]
    I --> K[Final sorted T1 file]
    J --> L[Final sorted T2 file]
    K --> M[BagUnionMerger.mergeAndWrite]
    L --> M
    M --> N[Bag union output file]
    N --> O[Print MergeMetrics and IOTracker stats]
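
The two sort phases in the flow above can be sketched in memory as follows. This is only an illustration under stated assumptions, not the project's implementation: the class `TpmmsSketch` is hypothetical, it works on `List<String>` instead of files on disk, and it assumes merge passes combine runs two at a time (suggested by the name `mergeTwoRuns`; the `K` field in `TPMMS` may mean the real fan-in differs).

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical in-memory stand-in for the two TPMMS phases; the real code reads and writes run files.
class TpmmsSketch {

    // Phase 1: cut the input into chunks of at most maxRecordsInMem tuples and sort each chunk.
    static List<List<String>> createInitialRuns(List<String> input, int maxRecordsInMem) {
        if (maxRecordsInMem <= 0) {
            throw new IllegalArgumentException("maxRecordsInMem must be positive");
        }
        List<List<String>> runs = new ArrayList<>();
        for (int start = 0; start < input.size(); start += maxRecordsInMem) {
            int end = Math.min(start + maxRecordsInMem, input.size());
            List<String> run = new ArrayList<>(input.subList(start, end));
            Collections.sort(run);
            runs.add(run);
        }
        return runs;
    }

    // Merge two sorted runs into one sorted run, keeping duplicates.
    static List<String> mergeTwoRuns(List<String> left, List<String> right) {
        List<String> out = new ArrayList<>(left.size() + right.size());
        int i = 0;
        int j = 0;
        while (i < left.size() && j < right.size()) {
            if (left.get(i).compareTo(right.get(j)) <= 0) {
                out.add(left.get(i++));
            } else {
                out.add(right.get(j++));
            }
        }
        while (i < left.size()) out.add(left.get(i++));
        while (j < right.size()) out.add(right.get(j++));
        return out;
    }

    // Phase 2: repeat pairwise merge passes until a single sorted run is left.
    // Assumption: fan-in of 2 per pass; an odd run out is carried to the next pass unchanged.
    static List<String> multiPassMerge(List<List<String>> initialRuns) {
        if (initialRuns.isEmpty()) {
            return new ArrayList<>();
        }
        List<List<String>> current = initialRuns;
        while (current.size() > 1) {
            List<List<String>> next = new ArrayList<>();
            for (int k = 0; k < current.size(); k += 2) {
                if (k + 1 < current.size()) {
                    next.add(mergeTwoRuns(current.get(k), current.get(k + 1)));
                } else {
                    next.add(current.get(k));
                }
            }
            current = next;
        }
        return current.get(0);
    }
}
```

In this sketch, five tuples with `maxRecordsInMem = 2` give three initial runs, and two merge passes reduce them to one sorted run. Duplicates are kept, which the later bag union step would need.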

2. Code Structure / Class Architecture

classDiagram
    class Main {
        +main(args)
        -clearOutputDir(dirPath)
    }
    class TPMMS {
        -BLOCK_TUPLES
        -maxRecordsInMem
        -K
        -ioTracker
        +TPMMS(memMB, ioTracker)
        +createInitialRuns(filePath, prefix)
        +multiPassMerge(initialRuns, relName)
        -mergeTwoRuns(leftFile, rightFile)
        -writeRun(buffer, runName)
    }
    class BagUnionMerger {
        +mergeAndWrite(sortedT1, sortedT2, ioTracker, writer)
    }
    class IOTracker {
        -readTuplesInCurrentBlock
        -writtenTuplesInCurrentBlock
        +totalBlocksRead
        +totalBlocksWritten
        +noteReadLine()
        +noteWriteLine()
        +flushPartialBlocks()
    }
    class MergeMetrics {
        +distinctTuples
        +outputBlocks
        +blocksForTuples(tuples)
    }
    class Record {
        +TOTAL_WIDTH
        +raw
        +Record(line)
        +compareTo(other)
        +toString()
        +equals(obj)
        +hashCode()
    }

    %% Relationships
    Main --> IOTracker : uses
    Main --> TPMMS : uses
    Main --> BagUnionMerger : uses
    Main --> MergeMetrics : reads
    TPMMS --> IOTracker : uses
    TPMMS --> Record : uses
    BagUnionMerger --> IOTracker : uses
    BagUnionMerger --> Record : uses
    BagUnionMerger --> MergeMetrics : returns
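
The `IOTracker` members listed above suggest per-tuple counting that rolls over into whole blocks. A minimal sketch of that idea follows, with loud assumptions: the class name `IOTrackerSketch` is hypothetical, the block size is passed in through an assumed `blockTuples` constructor parameter (the diagram only shows `BLOCK_TUPLES` on `TPMMS`), and `blocksForTuples` is written here as a ceiling division, which may not match the actual `MergeMetrics` method.

```java
// Hypothetical sketch of block-level I/O accounting; not the project's IOTracker.
class IOTrackerSketch {
    private final int blockTuples;              // assumed: tuples per disk block
    private int readTuplesInCurrentBlock = 0;
    private int writtenTuplesInCurrentBlock = 0;
    long totalBlocksRead = 0;
    long totalBlocksWritten = 0;

    IOTrackerSketch(int blockTuples) {
        if (blockTuples <= 0) {
            throw new IllegalArgumentException("blockTuples must be positive");
        }
        this.blockTuples = blockTuples;
    }

    // Count one tuple read; a full block increments the block counter.
    void noteReadLine() {
        readTuplesInCurrentBlock++;
        if (readTuplesInCurrentBlock == blockTuples) {
            totalBlocksRead++;
            readTuplesInCurrentBlock = 0;
        }
    }

    // Count one tuple written; a full block increments the block counter.
    void noteWriteLine() {
        writtenTuplesInCurrentBlock++;
        if (writtenTuplesInCurrentBlock == blockTuples) {
            totalBlocksWritten++;
            writtenTuplesInCurrentBlock = 0;
        }
    }

    // A partially filled block still costs one block of I/O, so count it and reset.
    void flushPartialBlocks() {
        if (readTuplesInCurrentBlock > 0) {
            totalBlocksRead++;
            readTuplesInCurrentBlock = 0;
        }
        if (writtenTuplesInCurrentBlock > 0) {
            totalBlocksWritten++;
            writtenTuplesInCurrentBlock = 0;
        }
    }

    // Assumed meaning of blocksForTuples: blocks needed to hold the given number of tuples (ceiling division).
    static long blocksForTuples(long tuples, int blockTuples) {
        return (tuples + blockTuples - 1) / blockTuples;
    }
}
```

Under these assumptions, reading 85 tuples with 40 tuples per block counts 2 full blocks, and `flushPartialBlocks()` adds a third for the remaining 5 tuples, which agrees with `blocksForTuples(85, 40)`.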