We present a system that detects traffic participants in camera images taken from different viewpoints of a street scene and associates them across those images. Our system is based on a multitask CNN architecture, and both detection and association are performed within the network. The association between different images is estimated without explicit knowledge of the scene geometry or camera calibration. Currently, one of the main applications of our system is in test areas for autonomous vehicles. In this use case, ground truth information for testing environment perception is of great importance, as is external environment information sent to autonomous vehicles via car-to-infrastructure communication. This is particularly relevant in complex scenarios such as large intersections. Our system shows promising results on the association task for large intersections captured from 8 different viewpoints.