The Zen of TDA¶
Warning
This is an extremely high level perspective on something that can be described very concretely. The author has found this layer abstraction very useful in remembering and relating the zoo of variations present in the literature.
In general, data science wants a strategy of the form:
\[\mathrm{Data} \overset{\mathrm{ML}} \longrightarrow \mathrm{Insight}\]
Traditional methods use probabilities:
\[\begin{split}\begin{align*} \mathrm{Data} &\overset{\mathrm{Stats}}\longrightarrow \mathrm{Probabilities} \overset{\mathrm{Prob}} \longrightarrow \mathrm{Insight} \\ \mathrm{sample} &\longmapsto \mathrm{(reg.)\ MLE} \longmapsto \mathrm{descriptive\ statistic} \end{align*}\end{split}\]
TDA uses spaces:
\[\mathrm{Data} \longrightarrow \mathrm{Spaces} \longrightarrow \mathrm{Insight}\]
It’s based on the simple
Hypothesis
Data has shape:
and shape has meaning:
\[\begin{align*} \mathrm{Spaces} \longrightarrow \mathrm{Insight} \end{align*}\]
In other words, TDA uses spaces to understand data.
Example
We can use sample data to build a graph:
\[\mathrm{sample} \longmapsto \mathrm{graph}\]
and draw that graph:
or say something about the graph:
Here is a loose
Definition
TDA is the application of constructions, deconstructions, and analyses of spaces to data science.
The theory behind spaces has a name.
Definition
Homotopy theory is the (de)construction and analysis of spaces
This gives a more concise definition of TDA:
Definition
TDA uses homotopy theory to understand data.
Warning
Unless you have a very strong (e.g. graduate level) background in mathematics, don’t try to learn this stuff by (only) reading a textbook.
Find someone to talk to about it who already knows about it. For example, go to a conference.
Find a project to use this stuff in.
Read a book.
Combine them.
To-Do
Add useful references