Topic

Factor analysis: simplifying high-dimensional data sets for visualization and machine learning

By: Mark Albert

Date: Oct. 8, 2015, 7 p.m.

Many machine learning problems involve data with far more dimensions than efficient learning requires. A common first step is dimensionality reduction, which removes both redundancy and noise. Besides making automated learning more efficient, factor analysis lets us visualize high-dimensional data sets in the 2 or 3 dimensions that humans can readily perceive. As a demonstration, we will apply principal component analysis (PCA), a closely related technique, to a set of questions asked of the audience and place everyone on a 2D "personality" map, letting us visualize the underlying personality factors of those present. Beyond fun visualizations, these techniques underpin more efficient generalization in many machine learning problems.
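For readers who want a feel for the demonstration beforehand, here is a minimal sketch of projecting questionnaire answers onto two principal components. It is an illustration under stated assumptions, not the speaker's actual code: the `pca_2d` helper and the `answers` matrix (6 respondents, 5 questions on a 1-5 scale) are hypothetical, and the PCA is computed with a plain SVD from NumPy.

```python
import numpy as np

def pca_2d(responses):
    """Project each row of `responses` onto the first two principal components."""
    X = np.asarray(responses, dtype=float)
    # Center each question (column) so the components describe variance, not the mean.
    centered = X - X.mean(axis=0)
    # SVD of the centered data: the rows of vt are the principal directions.
    u, s, vt = np.linalg.svd(centered, full_matrices=False)
    coords = centered @ vt[:2].T           # 2D position of each respondent
    explained = (s ** 2) / np.sum(s ** 2)  # fraction of variance per component
    return coords, explained[:2]

# Hypothetical data (not from the talk): rows are respondents, columns are questions.
answers = np.array([
    [5, 4, 1, 2, 5],
    [4, 5, 2, 1, 4],
    [1, 2, 5, 4, 1],
    [2, 1, 4, 5, 2],
    [3, 3, 3, 3, 3],
    [5, 5, 1, 1, 4],
])

coords, explained = pca_2d(answers)
for person, (x, y) in enumerate(coords):
    print(f"respondent {person}: ({x:+.2f}, {y:+.2f})")
print("variance explained by the two axes:", explained.round(2))
```

Plotting the two columns of `coords` against each other gives the kind of 2D map described above. Respondents with similar answer patterns land close together, and `explained` indicates how much of the original variation the two axes retain.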