Traditional approaches to simultaneous localization and mapping (SLAM) rely on low-level geometric features such as points, lines, and planes. They are unable to assign semantic labels to landmarks observed in the environment. Furthermore, loop closure recognition based on low-level features is often viewpoint-dependent and subject to failure in ambiguous or repetitive environments. On the other hand, object recognition methods can infer landmark classes and scales, resulting in a small set of easily recognizable landmarks, ideal for view-independent unambiguous loop closure. In a map with several objects of the same class, however, a crucial data association problem exists. While data association and recognition are discrete problems usually solved using discrete inference, classical SLAM is a continuous optimization over metric information. In this paper, we formulate an optimization problem over sensor states and semantic landmark positions that integrates metric information, semantic information, and data associations, and decompose it into two interconnected problems: an estimation of discrete data association and landmark class probabilities, and a continuous optimization over the metric states. The estimated landmark and robot poses affect the association and class distributions, which in turn affect the robot-landmark pose optimization. The performance of our algorithm is demonstrated on indoor and outdoor datasets.