Illustration of the regression representation. On the linear axis corresponding to the real genome, the position of each TGS long read is represented by a parameter β_i to be estimated. Each overlap, marked by a yellow double-headed arrow, provides an observation of the difference between the positions of two reads. However, chimeric reads, as well as repeats from distant regions on either the same or the reverse strand of the genome, introduce false overlaps, such as those marked by red crosses: ① indicates a false overlap caused by a chimeric read, ② one caused by a repeat on the same strand, and ③ one caused by a repeat on the reverse strand. All overlap observations are integrated into the linear regression model Y = Xβ + ε. The two-step robust regression procedure then gives a globally optimal estimate of the read positions, which yields a layout. At the same time, it detects the outliers, which correspond to the false overlaps.
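The following Python sketch makes the regression representation concrete. It treats an overlap (i, j, d) as one observation β_j − β_i ≈ d, which is a row of X with −1 in column i and +1 in column j. Only differences are observed, so β_0 is fixed at 0 to remove the translation freedom.

The two steps are an assumption made for illustration and are not necessarily the procedure used in this work:

- Step 1 is a least-absolute-deviation (L1) fit computed by iteratively reweighted least squares.
- Step 2 flags overlaps whose absolute residual exceeds a hypothetical threshold `outlier_tol`, then refits by ordinary least squares on the remaining overlaps.

The function names, `outlier_tol` and the `eps` floor are all hypothetical. The sketch also assumes that read orientations are already resolved and that the overlap graph is connected.

```python
from typing import List, Sequence, Tuple

# Hypothetical overlap record: (i, j, d) observes beta[j] - beta[i] ≈ d (in bases).
Overlap = Tuple[int, int, float]


def _solve(A: List[List[float]], b: List[float]) -> List[float]:
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[k]] for k, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[piv][col]) < 1e-12:
            raise ValueError("singular system: is the overlap graph connected?")
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = s / M[r][r]
    return x


def weighted_fit(n: int, overlaps: Sequence[Overlap], weights: Sequence[float]) -> List[float]:
    """Weighted least squares for Y = X beta + eps with beta[0] anchored at 0."""
    m = n - 1  # unknowns are beta[1..n-1]
    A = [[0.0] * m for _ in range(m)]  # accumulates X^T W X
    b = [0.0] * m                      # accumulates X^T W Y
    for (i, j, d), w in zip(overlaps, weights):
        # Nonzero entries of this row of X: -1 for read i, +1 for read j.
        coeffs = []
        if i > 0:
            coeffs.append((i - 1, -1.0))
        if j > 0:
            coeffs.append((j - 1, 1.0))
        for p, xp in coeffs:
            b[p] += w * xp * d
            for q, xq in coeffs:
                A[p][q] += w * xp * xq
    return [0.0] + _solve(A, b)


def residuals(beta: Sequence[float], overlaps: Sequence[Overlap]) -> List[float]:
    """Observed offset minus fitted offset for every overlap."""
    return [d - (beta[j] - beta[i]) for i, j, d in overlaps]


def robust_layout(n: int, overlaps: Sequence[Overlap], outlier_tol: float = 50.0,
                  iters: int = 50, eps: float = 1e-6):
    """Hypothetical two-step fit: L1 via IRLS, then threshold outliers and refit."""
    # Step 1: IRLS for the L1 objective. The first pass (all weights 1) is OLS;
    # later passes use weights 1/|r| so that large residuals lose influence.
    w = [1.0] * len(overlaps)
    for _ in range(iters):
        beta = weighted_fit(n, overlaps, w)
        w = [1.0 / max(abs(r), eps) for r in residuals(beta, overlaps)]
    # Step 2: overlaps with large residuals are flagged as false overlaps, and
    # the positions are re-estimated by OLS on the remaining overlaps only.
    r = residuals(beta, overlaps)
    outliers = [k for k, rk in enumerate(r) if abs(rk) > outlier_tol]
    flagged = set(outliers)
    inlier_w = [0.0 if k in flagged else 1.0 for k in range(len(overlaps))]
    beta = weighted_fit(n, overlaps, inlier_w)
    order = sorted(range(n), key=lambda v: beta[v])  # the layout: reads by position
    return beta, order, outliers
```

Consider an example with four reads at positions 0, 100, 250 and 400. Every pair of reads overlaps with an exact offset, except that the overlap (0, 3) is recorded as 50 instead of 400. In that case `robust_layout` flags that single overlap and recovers the positions. A plain least-squares fit would instead spread the error of the false overlap over all four reads. The L1 first step is chosen here because it is much less sensitive to a few gross errors than least squares. Other robust losses, such as Huber or Tukey, would also fit this slot.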