The document summarizes a benchmark comparison that evaluated several large language models across different skill sets and domains. GPT-4 performed best overall on criteria including logical robustness, correctness, efficiency, factuality, and common sense. Tables report each model's scores per skill set and compare open-source, proprietary, and oracle models. The source is listed as a preprint that has not been peer reviewed, together with a related GitHub page, and is released under a Creative Commons license.