The Cloud & AI Conference 2023, held at IIT Madras on November 17-18, focused on the challenges and costs of running Large Language Models (LLMs) in production. Key discussion points included the difficulty of obtaining deterministic outputs, the management of cost and latency, and the trade-offs between prompt engineering and fine-tuning. The document outlines optimization strategies for reducing inference costs and argues that design mindsets must evolve to accommodate the stochastic nature of LLMs.