The document discusses open source software tools for data scientists. It begins by defining a data scientist and explaining why open source software is useful. It then surveys popular open source tools for statistical analysis, data mining, machine learning, natural language processing, social network analysis, data visualization, and data fusion and analysis. The tools covered include R, Pandas, Impala, Mahout, Scikit-learn, Mallet, NLTK, Stanford CoreNLP, CLAVIN, NetworkX, Gephi, D3.js, and Lumify. It concludes by recommending that, rather than spending on software licenses, companies focus their budgets on people and resources, and buy proprietary software only when necessary.