The Higgs Boson Machine Learning Challenge is one of the largest big data analysis competitions in the world. To succeed in it, Cheng drew on his knowledge of computer science, mathematics, statistics, and physics, together with the problem-solving habits he developed during his training in civil engineering.
In this presentation, Cheng will use his experience in this competition to illustrate some key elements of big data analytics and explain why they matter. The content spans several disciplines, including physics, statistics, and mathematics, but no background knowledge of these areas is required to understand the essence of the presentation.
In brief, the presentation covers the following content:
An effective framework for general data mining projects,
An introduction to the competition and its physics background,
Various techniques for data exploration and some traps to avoid,
Various ways of feature enhancement,
Model building and selection, and
Optimization of model performance.