We address detection and time-of-arrival (ToA) estimation of underwater acoustic signals of unknown structure. The common practice of applying a detection threshold may fail when the assumed channel model is mismatched or when noise transients are present. We propose to detect the signal and estimate its ToA by labeling each sample of the observed data as either 'signal' or 'noise'. A signal is declared detected when enough samples are labeled 'signal', and the ToA is estimated from the position of the first 'signal' sample. We take a clustering approach, thereby obviating the need for a detection threshold and for training. Our method combines constrained expectation-maximization (EM) with the Viterbi algorithm, and is particularly useful when channel conditions are harsh, noise statistics are hard to estimate, and the signal-to-noise ratio is low. Numerical and experimental results show that, at the cost of some additional complexity, the proposed algorithm outperforms common benchmark methods in terms of detection and false alarm rates and in terms of ToA estimation accuracy.
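As a rough illustration of the sample-labeling idea, the following Python sketch clusters short-window log-energy into two groups with a plain (unconstrained) two-component Gaussian-mixture EM and then smooths the labels with a two-state Viterbi pass. It is not the constrained EM of the proposed method. The function names, the log-energy feature, the window length `win`, the self-transition probability `p_stay`, and the count `min_signal_samples` are all assumptions made for this example.

```python
import numpy as np

def fit_two_gaussians(x, n_iter=50):
    """Plain EM for a 1-D two-component Gaussian mixture (assumed model)."""
    mu = np.array([np.percentile(x, 25), np.percentile(x, 95)], dtype=float)
    var = np.full(2, np.var(x) + 1e-12)
    w = np.array([0.5, 0.5])
    for _ in range(n_iter):
        # E-step: posterior responsibility of each component for each sample
        logp = -0.5 * ((x[:, None] - mu) ** 2 / var + np.log(2 * np.pi * var))
        logp += np.log(w)
        logp -= logp.max(axis=1, keepdims=True)
        r = np.exp(logp)
        r /= r.sum(axis=1, keepdims=True)
        # M-step: re-estimate weights, means and variances
        nk = r.sum(axis=0) + 1e-12
        w = nk / nk.sum()
        mu = (r * x[:, None]).sum(axis=0) / nk
        var = (r * (x[:, None] - mu) ** 2).sum(axis=0) / nk + 1e-6
    order = np.argsort(mu)  # component 0 = lower energy ('noise')
    return mu[order], var[order]

def viterbi_labels(x, mu, var, p_stay=0.99):
    """Most likely 0/1 label sequence under a sticky two-state chain."""
    log_emit = -0.5 * ((x[:, None] - mu) ** 2 / var + np.log(2 * np.pi * var))
    log_trans = np.log(np.array([[p_stay, 1 - p_stay],
                                 [1 - p_stay, p_stay]]))
    n = len(x)
    score = np.empty((n, 2))
    back = np.zeros((n, 2), dtype=int)
    score[0] = np.log(0.5) + log_emit[0]
    for t in range(1, n):
        cand = score[t - 1][:, None] + log_trans  # cand[i, j]: state i -> j
        back[t] = cand.argmax(axis=0)
        score[t] = cand.max(axis=0) + log_emit[t]
    labels = np.empty(n, dtype=int)
    labels[-1] = score[-1].argmax()
    for t in range(n - 1, 0, -1):  # backtrack along the best path
        labels[t - 1] = back[t, labels[t]]
    return labels

def detect_and_toa(samples, fs, win=20, min_signal_samples=50):
    """Label samples, declare detection, and take the first 'signal' sample as ToA."""
    s = np.asarray(samples, dtype=float)
    energy = np.convolve(s ** 2, np.ones(win) / win, mode="same")
    x = np.log(energy + 1e-12)  # short-window log-energy feature (assumption)
    mu, var = fit_two_gaussians(x)
    labels = viterbi_labels(x, mu, var)
    detected = int(labels.sum()) >= min_signal_samples
    toa = float(np.argmax(labels == 1)) / fs if detected else None
    return detected, toa, labels
```

Calling `detect_and_toa(x, fs)` on a buffer `x` sampled at `fs` Hz returns a detection flag, a ToA in seconds relative to the start of the buffer (or `None`), and the per-sample labels. Because this simplified version always splits the data into two clusters, a noise-only buffer can still produce 'signal' labels; the sketch does not include any constraints to guard against that.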