To interact and collaborate with people in a natural way, robots must be able to recognize objects in their environments, accurately track the motion of humans, and estimate their goals and intentions. Recent years have seen dramatic improvements in robots' capabilities to model, detect, and track non-rigid objects such as human bodies, hands, and their own manipulators. These developments can serve as the basis for providing robots with an unprecedented understanding of their environment and the people therein. I will use examples from our research on modeling, detecting, and tracking in 3D scenes to highlight some of these advances and discuss open problems that still need to be addressed. I will also use these examples to highlight the pros and cons of model-based approaches and deep learning techniques for solving perception problems in robotics.