How to reduce the dimensionality of data for visualization

Jan 17, 2025

Dimensionality reduction takes high-dimensional data (for example, 1,000 features per sample) and maps it into far fewer dimensions, usually 2 or 3, so that patterns can be inspected visually. Here are some popular methods:

How does t-SNE work?

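t-SNE (t-distributed Stochastic Neighbor Embedding) first measures how similar each pair of points is in the high-dimensional space and converts these similarities into probabilities, using a Gaussian centered on each point whose width is set by a "perplexity" parameter. It then defines a matching set of probabilities between the points' positions in the low-dimensional map, using a heavier-tailed Student-t distribution, and uses gradient descent to minimize the difference between the two distributions (measured by the Kullback-Leibler divergence). As a result, points that are close in the high-dimensional space stay close in the map, while dissimilar points are pushed apart. This preserves local structure and tends to form distinct clusters, although the distances between clusters in the map are not reliably meaningful.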
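The procedure above can be sketched in NumPy. This is a minimal, unoptimized illustration under stated assumptions, not a reference implementation: the helper names (`pairwise_sq_dists`, `high_dim_affinities`, `low_dim_affinities`, `tsne`), the early-exaggeration factor of 4 for the first 100 iterations, the momentum schedule, and the learning-rate heuristic `n / (4 * exaggeration)` are all illustrative choices. It computes exact O(n²) affinities, so it only suits small datasets.

```python
import numpy as np


def pairwise_sq_dists(X):
    """Squared Euclidean distances between every pair of rows of X."""
    sq = np.sum(X * X, axis=1)
    D = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    return np.maximum(D, 0.0)  # clip small negatives caused by round-off


def high_dim_affinities(X, perplexity=30.0, tol=1e-5, max_iter=50):
    """Joint probabilities P built from Gaussian conditionals p_{j|i}.

    Each point's precision beta_i = 1 / (2 sigma_i^2) is found by binary
    search so that its conditional distribution has the target perplexity.
    """
    n = X.shape[0]
    D = pairwise_sq_dists(X)
    target = np.log(perplexity)  # entropy (in nats) that matches the perplexity
    P = np.zeros((n, n))
    for i in range(n):
        d = np.delete(D[i], i)  # squared distances to all other points
        beta, lo, hi = 1.0, 0.0, np.inf
        for _ in range(max_iter):
            p = np.exp(-(d - d.min()) * beta)  # shifting by the min avoids underflow
            p /= p.sum()
            entropy = -np.sum(p * np.log(np.maximum(p, 1e-12)))
            if abs(entropy - target) < tol:
                break
            if entropy > target:  # too flat: narrow the Gaussian
                lo = beta
                beta = beta * 2.0 if np.isinf(hi) else (beta + hi) / 2.0
            else:  # too peaked: widen the Gaussian
                hi = beta
                beta = (beta + lo) / 2.0
        P[i, np.arange(n) != i] = p
    P = (P + P.T) / (2.0 * n)  # symmetrize into one joint distribution
    return np.maximum(P, 1e-12)


def low_dim_affinities(Y):
    """Student-t kernel (1 degree of freedom) and the normalized Q."""
    num = 1.0 / (1.0 + pairwise_sq_dists(Y))
    np.fill_diagonal(num, 0.0)
    Q = np.maximum(num / num.sum(), 1e-12)
    return num, Q


def tsne(X, n_components=2, perplexity=30.0, n_iter=500,
         exaggeration=4.0, learning_rate=None, seed=0):
    """Minimize KL(P || Q) by gradient descent with momentum."""
    n = X.shape[0]
    if learning_rate is None:
        learning_rate = n / (4.0 * exaggeration)  # assumed heuristic
    P = high_dim_affinities(X, perplexity)
    rng = np.random.default_rng(seed)
    Y = rng.normal(scale=1e-4, size=(n, n_components))  # tiny random start
    velocity = np.zeros_like(Y)
    for it in range(n_iter):
        num, Q = low_dim_affinities(Y)
        # Early exaggeration: over-weight P at first so clusters form quickly.
        P_eff = P * exaggeration if it < 100 else P
        # dC/dy_i = 4 * sum_j (p_ij - q_ij) * num_ij * (y_i - y_j)
        W = (P_eff - Q) * num
        grad = 4.0 * (np.diag(W.sum(axis=1)) - W) @ Y
        momentum = 0.5 if it < 250 else 0.8
        velocity = momentum * velocity - learning_rate * grad
        Y = Y + velocity
    _, Q = low_dim_affinities(Y)
    kl = float(np.sum(P * np.log(P / Q)))  # final KL divergence
    return Y, kl
```

For example, on two well-separated Gaussian blobs of 30 points each in 10 dimensions, `tsne(X, perplexity=10.0)` should place the blobs in two separate groups in the 2-D map. Libraries such as scikit-learn's `sklearn.manifold.TSNE` add refinements like the Barnes-Hut approximation so that t-SNE scales to larger datasets.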

Here’s a good blog post about t-SNE.

t-SNE visualization of a small slice of human knowledge (from paperscape). Each circle is an arXiv paper, with size showing citation count. Source: Karpathy’s Blog.