KIT - IES - Student Thesis

Machine Learning: Low-Dimensional Embeddings and Topology

Type:	Master's thesis
Supervisor:	Tim Zander
Status:	completed

Low-dimensional embeddings, also known as manifold learning, are a central dimension reduction technique in data analysis. They are typically used for visual data inspection and as a preprocessing step in a data pipeline.

Rieck (2017, Ch. 7) evaluates a range of classical embedding algorithms using common embedding quality measures. In this project, the more recent UMAP algorithm (McInnes and Healy 2018) and several older algorithms, which our group is currently implementing, are to be added to that analysis. In addition, distances between persistence diagrams (PDs) computed from the original and from the embedded data are to be investigated as a new quality measure for low-dimensional embeddings. PDs capture topological features of the learned manifold, such as circles and spheres. Because they describe structure across scales, from local to global, without fixing a single scale, PDs occupy an interesting position among the existing quality measures.
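To make the idea of comparing persistence diagrams before and after an embedding concrete, here is a minimal pure-Python sketch. It rests on several simplifying assumptions that are not part of the project description: only 0-dimensional persistence (connected components) is computed, using the fact that the finite death times of the Vietoris-Rips filtration are the edge lengths of a minimum spanning tree. Plain coordinate projections stand in for a real embedding algorithm, and a sorted comparison of death times stands in for a proper diagram distance. That comparison is one valid matching between the two diagrams, so it only bounds the bottleneck distance from above. The names `h0_deaths` and `sorted_death_gap` are hypothetical.

```python
import math
import random


def h0_deaths(points):
    """Finite death times of 0-dimensional classes in the Vietoris-Rips
    filtration. They equal the edge lengths of a minimum spanning tree,
    built here with Kruskal's algorithm and a union-find structure."""
    n = len(points)
    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i in range(n)
        for j in range(i + 1, n)
    )
    parent = list(range(n))

    def find(i):
        # Follow parent links to the root, shortening the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    deaths = []
    for length, i, j in edges:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            parent[root_i] = root_j  # two components merge: one class dies
            deaths.append(length)
    return deaths  # ascending order, n - 1 entries


def sorted_death_gap(deaths_a, deaths_b):
    """Largest difference between death times matched in sorted order.
    Simplified stand-in for a diagram distance (all births are 0 here)."""
    if len(deaths_a) != len(deaths_b):
        raise ValueError("diagrams must have the same number of points")
    pairs = zip(sorted(deaths_a), sorted(deaths_b))
    return max((abs(a - b) for a, b in pairs), default=0.0)


# Toy data (an assumption for illustration): two clusters in 3D,
# separated along the z-axis.
random.seed(0)


def make_cluster(z_center, size=20):
    return [
        (random.uniform(-0.5, 0.5),
         random.uniform(-0.5, 0.5),
         z_center + random.uniform(-0.5, 0.5))
        for _ in range(size)
    ]


data = make_cluster(0.0) + make_cluster(5.0)

# Two stand-in "embeddings": one keeps the separating axis, one drops it.
keeps_gap = [(x, z) for x, y, z in data]
loses_gap = [(x, y) for x, y, z in data]

original = h0_deaths(data)
gap_good = sorted_death_gap(original, h0_deaths(keeps_gap))
gap_bad = sorted_death_gap(original, h0_deaths(loses_gap))
print(f"projection keeping the cluster gap: {gap_good:.3f}")
print(f"projection losing the cluster gap:  {gap_bad:.3f}")
```

The projection that merges the two clusters loses the single long-lived component, so its diagram ends up farther from the original one. Circles and spheres would require 1- and 2-dimensional persistence, which this sketch does not cover.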

Python APIs for these algorithms (many of which are actually implemented in C++) are available in the following libraries:

scikit-learn for low-dimensional embeddings
UMAP
Persistence diagrams with Gudhi or Dionysus

Applicants should be comfortable with a mathematical approach to programming. The project is supervised by Tim Zander (KIT, tim.zander@kit.edu) in cooperation with Arkadi Schelling (Uni Bremen, schelling@uni-bremen.de).

McInnes, Leland, and John Healy. 2018. “UMAP: Uniform Manifold Approximation and Projection for Dimension Reduction.” ArXiv E-Prints, February. http://arxiv.org/abs/1802.03426.

Rieck, Bastian Alexander. 2017. “Persistent Homology in Multivariate Data Visualization.” PhD thesis, University of Heidelberg.