WebRTC technologies are showing their potential for providing peer-to-peer real-time communications in a seamless and scalable way. However, the most relevant use cases demanded by users require further features such as group communications, media recording and media interoperability. Providing them requires WebRTC media infrastructures that are sometimes complex to manage and to scale. In this talk, we present the experiences of the Kurento.org team in creating auto-scalable WebRTC infrastructures at large scale. Building on results from the NUBOMEDIA and FIWARE research projects, we introduce stateless and stateful scalability models, which provide different scalability definitions and properties. Stateless models are suitable for services requiring a large number of WebRTC sessions with few participants each. Such models are commonly deployed today and are compatible with the current state of the art in RTP topologies (e.g. SFU or MCU architectures). Stateful models, on the other hand, are capable of scaling to very large sessions (with thousands or hundreds of thousands of participants) but require new types of RTP topologies beyond plain SFU and MCU models. During the talk, we also show how to deploy such stateful and stateless infrastructures on top of IaaS clouds such as Amazon or OpenStack so that their scalability can be managed automatically. We also present the different KPIs that auto-scaling algorithms may use, as well as our experiences regarding their accuracy and appropriateness. To conclude, we introduce some real-world problems in such deployments related to infrastructure monitoring and instrumentation, fault-tolerance and fault-resilience mechanisms, and security issues.
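
To make the idea of a KPI-driven auto-scaling rule concrete, the following is a minimal sketch of a target-tracking policy. It is an illustration only and is not taken from the Kurento, NUBOMEDIA or FIWARE implementations: the function name, the normalized `avg_load` KPI (for example CPU utilization or sessions per media server, scaled to 0..1) and the default bounds are all assumptions.

```python
import math


def desired_instances(current: int, avg_load: float, target: float = 0.5,
                      min_n: int = 1, max_n: int = 20) -> int:
    """Hypothetical target-tracking rule for a pool of media servers.

    current  -- number of instances currently running (assumed >= 1)
    avg_load -- assumed KPI: average load per instance, normalized to 0..1
    target   -- load level the pool should converge to
    """
    if current < 1:
        raise ValueError("current must be at least 1")
    if avg_load < 0 or target <= 0:
        raise ValueError("avg_load must be >= 0 and target must be > 0")
    # Total load is current * avg_load; divide by the per-instance target
    # to get the pool size that would bring the average back to `target`.
    wanted = math.ceil(current * avg_load / target)
    # Clamp to the configured pool bounds.
    return max(min_n, min(max_n, wanted))
```

With these assumptions, a pool of 4 instances at an average load of 0.75 and a target of 0.5 would grow to 6 instances, while the same pool at a load of 0.25 would shrink to 2. A rule of this kind fits the stateless model most naturally, since sessions can be placed on any instance; which KPI is accurate enough to drive it is precisely the question the talk addresses.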