1. TPMMS Steps (High-Level Flow)
flowchart TD
    A[Start: Main.main] --> B[Configure memory and output paths]
    B --> C[Create IOTracker]
    C --> D[Create TPMMS sorter]
    D --> E[Phase 1: createInitialRuns for T1]
    D --> F[Phase 1: createInitialRuns for T2]
    E --> G[T1 runs on disk]
    F --> H[T2 runs on disk]
    G --> I[Phase 2: multiPassMerge for T1]
    H --> J[Phase 2: multiPassMerge for T2]
    I --> K[Final sorted T1 file]
    J --> L[Final sorted T2 file]
    K --> M[BagUnionMerger.mergeAndWrite]
    L --> M
    M --> N[Bag union output file]
    N --> O[Print MergeMetrics and IOTracker stats]
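
The two sorting phases in the flow above can be illustrated with a small in-memory sketch. It is not the project's `TPMMS` class: the class name `TwoPhaseSortSketch` is hypothetical, records are plain strings instead of `Record` objects, runs live in lists rather than files on disk, and the merge is assumed to proceed two runs at a time, as the name `mergeTwoRuns` suggests.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical in-memory illustration of the two phases; not the project's TPMMS class.
final class TwoPhaseSortSketch {

    // Phase 1: cut the input into chunks of at most maxRecordsInMem records
    // and sort each chunk independently, yielding the initial runs.
    static List<List<String>> createInitialRuns(List<String> input, int maxRecordsInMem) {
        List<List<String>> runs = new ArrayList<>();
        for (int start = 0; start < input.size(); start += maxRecordsInMem) {
            int end = Math.min(start + maxRecordsInMem, input.size());
            List<String> run = new ArrayList<>(input.subList(start, end));
            Collections.sort(run);
            runs.add(run);
        }
        return runs;
    }

    // Merge two sorted runs into one sorted run, keeping duplicates.
    static List<String> mergeTwoRuns(List<String> left, List<String> right) {
        List<String> out = new ArrayList<>(left.size() + right.size());
        int i = 0, j = 0;
        while (i < left.size() && j < right.size()) {
            if (left.get(i).compareTo(right.get(j)) <= 0) {
                out.add(left.get(i++));
            } else {
                out.add(right.get(j++));
            }
        }
        while (i < left.size()) out.add(left.get(i++));
        while (j < right.size()) out.add(right.get(j++));
        return out;
    }

    // Phase 2: merge runs pairwise, pass after pass, until one run remains.
    static List<String> multiPassMerge(List<List<String>> initialRuns) {
        if (initialRuns.isEmpty()) return new ArrayList<>();
        List<List<String>> current = initialRuns;
        while (current.size() > 1) {
            List<List<String>> next = new ArrayList<>();
            for (int k = 0; k < current.size(); k += 2) {
                if (k + 1 < current.size()) {
                    next.add(mergeTwoRuns(current.get(k), current.get(k + 1)));
                } else {
                    next.add(current.get(k)); // odd run out is carried to the next pass
                }
            }
            current = next;
        }
        return current.get(0);
    }

    public static void main(String[] args) {
        List<String> t1 = List.of("d", "a", "c", "b", "a");
        // A memory limit of 2 records produces three initial runs.
        List<String> sorted = multiPassMerge(createInitialRuns(t1, 2));
        System.out.println(sorted);
    }
}
```

With a limit of two records, the five-record input yields three initial runs, and two merge passes reduce them to a single sorted run. A disk-based version would do the same work with files and buffered readers, which is where block counting becomes relevant.
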
2. Code Structure / Class Architecture
classDiagram
    class Main {
        +main(args)
        -clearOutputDir(dirPath)
    }
    class TPMMS {
        -BLOCK_TUPLES
        -maxRecordsInMem
        -K
        -ioTracker
        +TPMMS(memMB, ioTracker)
        +createInitialRuns(filePath, prefix)
        +multiPassMerge(initialRuns, relName)
        -mergeTwoRuns(leftFile, rightFile)
        -writeRun(buffer, runName)
    }
    class BagUnionMerger {
        +mergeAndWrite(sortedT1, sortedT2, ioTracker, writer)
    }
    class IOTracker {
        -readTuplesInCurrentBlock
        -writtenTuplesInCurrentBlock
        +totalBlocksRead
        +totalBlocksWritten
        +noteReadLine()
        +noteWriteLine()
        +flushPartialBlocks()
    }
    class MergeMetrics {
        +distinctTuples
        +outputBlocks
        +blocksForTuples(tuples)
    }
    class Record {
        +TOTAL_WIDTH
        +raw
        +Record(line)
        +compareTo(other)
        +toString()
        +equals(obj)
        +hashCode()
    }
    %% Relationships
    Main --> IOTracker : uses
    Main --> TPMMS : uses
    Main --> BagUnionMerger : uses
    Main --> MergeMetrics : reads
    TPMMS --> IOTracker : uses
    TPMMS --> Record : uses
    BagUnionMerger --> IOTracker : uses
    BagUnionMerger --> Record : uses
    BagUnionMerger --> MergeMetrics : returns
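
The `IOTracker` members in the diagram suggest block-level accounting: count tuples as they are read or written, and record one block each time a block's worth has passed. The sketch below shows one way that could work. It is an assumption, not the project's code: the class name `IOTrackerSketch` is hypothetical, the block size is taken as a constructor parameter (the diagram places `BLOCK_TUPLES` in `TPMMS`), and `blocksForTuples` is assumed to be a ceiling division.

```java
// Hypothetical sketch of block-level I/O accounting; not the project's IOTracker.
final class IOTrackerSketch {
    private final int blockTuples;            // assumed: tuples per block, fixed at construction
    private int readTuplesInCurrentBlock = 0;
    private int writtenTuplesInCurrentBlock = 0;
    long totalBlocksRead = 0;
    long totalBlocksWritten = 0;

    IOTrackerSketch(int blockTuples) {
        if (blockTuples <= 0) {
            throw new IllegalArgumentException("blockTuples must be positive");
        }
        this.blockTuples = blockTuples;
    }

    // Call once per tuple read; a full block counts as one block read.
    void noteReadLine() {
        if (++readTuplesInCurrentBlock == blockTuples) {
            totalBlocksRead++;
            readTuplesInCurrentBlock = 0;
        }
    }

    // Call once per tuple written; a full block counts as one block written.
    void noteWriteLine() {
        if (++writtenTuplesInCurrentBlock == blockTuples) {
            totalBlocksWritten++;
            writtenTuplesInCurrentBlock = 0;
        }
    }

    // Count any partially filled block as a whole block, then reset the counters.
    void flushPartialBlocks() {
        if (readTuplesInCurrentBlock > 0) {
            totalBlocksRead++;
            readTuplesInCurrentBlock = 0;
        }
        if (writtenTuplesInCurrentBlock > 0) {
            totalBlocksWritten++;
            writtenTuplesInCurrentBlock = 0;
        }
    }

    // Assumed meaning of MergeMetrics.blocksForTuples: ceiling of tuples / blockTuples.
    static long blocksForTuples(long tuples, int blockTuples) {
        return (tuples + blockTuples - 1) / blockTuples;
    }

    public static void main(String[] args) {
        IOTrackerSketch tracker = new IOTrackerSketch(40);
        for (int i = 0; i < 100; i++) {
            tracker.noteReadLine();
        }
        tracker.flushPartialBlocks();
        System.out.println(tracker.totalBlocksRead);
    }
}
```

Under these assumptions, 100 tuples at 40 tuples per block count as two full blocks plus one partial block, which matches the ceiling division in `blocksForTuples`. Calling `flushPartialBlocks` at the end of each file keeps a trailing partial block from being dropped from the totals.
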