The vertical axis shows the Euclidean distance between unit cell parameters, as defined in Methods (equation (1)). The horizontal axis shows the index of each 3D ED dataset used for HCA. With the Euclidean distance cut threshold set at 2.0, HCA revealed five zeolite phases (highlighted by arrows): RTH, IWV, *CTH, *UOE and POS. The branches under each phase (cluster) share the same colour. As indicated by the number of branches under each phase (one branch represents one crystal), RTH, IWV and *CTH are the major phases, and POS and *UOE are the minor phases. The unclassified datasets (marked by a purple star) could not be identified. They may come from crystal agglomerates or be of otherwise poor data quality, both of which result in inaccurate unit cell parameters with large deviations (Supplementary Fig. 11). The unit cell parameters of all datasets used for the HCA are given in Supplementary Table 1.
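
As an illustration of how a distance cut turns a dendrogram into phase clusters, the following minimal Python sketch groups unit cells with SciPy. It is not the procedure used in this work. Several choices are assumptions: the plain Euclidean distance over the six parameters (a, b, c, α, β, γ) stands in for equation (1) in Methods, the average linkage method is chosen arbitrarily, the function name `cluster_unit_cells` is hypothetical, and the unit cells are made-up values rather than data from Supplementary Table 1.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster


def cluster_unit_cells(cells, cut=2.0, method="average"):
    """Cluster unit cells hierarchically and cut the tree at a fixed distance.

    cells  : rows of (a, b, c, alpha, beta, gamma); hypothetical layout.
    cut    : distance threshold at which the dendrogram is cut.
    method : linkage method; "average" is an assumption, not taken from the paper.
    """
    cells = np.asarray(cells, dtype=float)
    # Build the dendrogram from pairwise Euclidean distances between the rows.
    tree = linkage(cells, method=method, metric="euclidean")
    # Datasets that merge below the cut height receive the same cluster label.
    return fcluster(tree, t=cut, criterion="distance")


# Made-up unit cells for illustration only (lengths in angstroms, angles in degrees).
cells = [
    [10.0, 12.0, 14.0, 90.0, 90.0, 90.0],
    [10.1, 12.1, 13.9, 90.0, 90.2, 90.0],
    [9.9, 11.9, 14.1, 90.1, 90.0, 89.9],
    [20.0, 13.0, 7.5, 90.0, 105.0, 90.0],
    [20.2, 13.1, 7.4, 90.0, 104.8, 90.0],
    [15.0, 25.0, 30.0, 95.0, 100.0, 110.0],  # deviates strongly from all others
]

labels = cluster_unit_cells(cells, cut=2.0)
sizes = np.bincount(labels)[1:]  # number of datasets (crystals) per cluster
print(labels, sizes)
```

In this sketch the cluster sizes play the role of the branch counts in the dendrogram: clusters with many members would correspond to major phases, and a dataset whose unit cell deviates strongly from all others ends up alone in its own cluster.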