Preprocessing module


MARBLE.preprocessing.construct_dataset(anchor, vector, label=None, mask=None, graph_type='cknn', k=20, delta=1.0, frac_geodesic_nb=1.5, spacing=0.0, number_of_resamples=1, var_explained=0.9, local_gauges=False, seed=None, metric='euclidean', number_of_eigenvectors=None)

Construct PyG dataset from node positions and features.

Parameters:
  • anchor – matrix with the positions of the points

  • vector – matrix with the feature values for each point

  • label – any additional data labels, used for plotting only

  • mask – boolean array marking points that will be forced to be close (default: None)

  • graph_type – type of nearest-neighbours graph: cknn (default), knn or radius

  • k – number of nearest-neighbours to construct the graph

  • delta – argument for cknn graph construction that sets the radius for each point

  • frac_geodesic_nb – multiplier on k: the gauges are fitted to k*frac_geodesic_nb geodesic neighbours (to map to tangent space)

  • spacing – stopping criterion for furthest point sampling

  • number_of_resamples – number of furthest point sampling runs to prevent bias (experimental)

  • var_explained – fraction of variance explained by the local gauges

  • local_gauges – if True, try to compute local gauges where possible (signal dimension > 2, embedding dimension > 2, or embedding dimension differs from the manifold dimension)

  • seed – Specify for reproducibility in the furthest point sampling. The default is None, which means a random starting vertex.

  • metric – metric used to fit proximity graph

  • number_of_eigenvectors – number of eigenvectors to use (int). Default: None, meaning use all.
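
To illustrate how k and delta interact for graph_type='cknn', here is a minimal pure-Python sketch. It assumes the standard continuous k-nearest-neighbours rule: points i and j are connected when d(i, j) < delta * sqrt(d_k(i) * d_k(j)), where d_k is the distance to the k-th nearest neighbour. The helper cknn_edges is hypothetical and not part of MARBLE, whose own implementation may differ in detail.

```python
import math


def cknn_edges(points, k=1, delta=1.0):
    """Return undirected edges (i, j), i < j, under the assumed CkNN rule."""
    n = len(points)
    # Pairwise Euclidean distances between all points.
    dist = [[math.dist(p, q) for q in points] for p in points]
    # Distance from each point to its k-th nearest neighbour (self excluded).
    kth = [sorted(row[:i] + row[i + 1:])[k - 1] for i, row in enumerate(dist)]
    edges = []
    for i in range(n):
        for j in range(i + 1, n):
            # The connection radius scales with the local density at i and j.
            if dist[i][j] < delta * math.sqrt(kth[i] * kth[j]):
                edges.append((i, j))
    return edges


# A dense cluster (spacing 1) and a sparse cluster (spacing 4) on a line.
points = [(0.0,), (1.0,), (2.0,), (10.0,), (14.0,), (18.0,)]
edges = cknn_edges(points, k=1, delta=1.5)
print(edges)  # [(0, 1), (1, 2), (3, 4), (4, 5)]
```

Both clusters are connected internally even though their spacings differ by a factor of four, which a single fixed radius could not achieve without also linking the two clusters. Increasing delta enlarges every local radius and adds edges.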