Preprocessing module
Preprocessing module.
- MARBLE.preprocessing.construct_dataset(anchor, vector, label=None, mask=None, graph_type='cknn', k=20, delta=1.0, frac_geodesic_nb=1.5, spacing=0.0, number_of_resamples=1, var_explained=0.9, local_gauges=False, seed=None, metric='euclidean', number_of_eigenvectors=None)
Construct PyG dataset from node positions and features.
- Parameters:
anchor – matrix with the position of each point
vector – matrix with feature values for each point
label – any additional data labels, used for plotting only
mask – boolean array marking points that will be forced to be close (default is None)
graph_type – type of nearest-neighbours graph: cknn (default), knn or radius
k – number of nearest-neighbours to construct the graph
delta – argument for cknn graph construction that sets the connection radius for each point
frac_geodesic_nb – multiplier on k: the local gauges are fitted to k*frac_geodesic_nb geodesic neighbours (to map to tangent space)
spacing – stopping criterion for furthest point sampling
number_of_resamples – number of furthest point sampling runs to prevent bias (experimental)
var_explained – fraction of variance explained by the local gauges
local_gauges – if True, try to compute local gauges where possible (signal dimension > 2, embedding dimension > 2, or embedding dimension differs from the manifold dimension)
seed – Specify for reproducibility in the furthest point sampling. The default is None, which means a random starting vertex.
metric – metric used to fit proximity graph
number_of_eigenvectors – int number of eigenvectors to use. Default: None, meaning use all.
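To make the roles of k and delta concrete, the following is a minimal pure-Python sketch of the continuous k-nearest-neighbour (cknn) rule as it is commonly defined: two points are connected when their distance is smaller than delta times the geometric mean of their k-th nearest-neighbour distances. It is an illustration only and not MARBLE's implementation; the helper name cknn_edges is hypothetical, and Euclidean distances are assumed (matching the default metric='euclidean').

```python
import math


def cknn_edges(points, k=2, delta=1.0):
    """Return the undirected edge set of a continuous k-NN graph.

    Hypothetical helper for illustration. Points i and j are connected when
        d(i, j) < delta * sqrt(d_k(i) * d_k(j)),
    where d_k(i) is the distance from point i to its k-th nearest neighbour.
    """
    n = len(points)
    if not 1 <= k < n:
        raise ValueError("k must satisfy 1 <= k < number of points")

    # Pairwise Euclidean distances (assumption: metric='euclidean').
    dist = [[math.dist(p, q) for q in points] for p in points]

    # Distance to the k-th nearest neighbour. Index 0 of each sorted row
    # is the point itself (distance 0), so index k is the k-th neighbour.
    d_k = [sorted(row)[k] for row in dist]

    edges = set()
    for i in range(n):
        for j in range(i + 1, n):
            # Locally scaled radius: dense regions get short edges,
            # sparse regions are allowed longer ones.
            if dist[i][j] < delta * math.sqrt(d_k[i] * d_k[j]):
                edges.add((i, j))
    return edges


# Three closely spaced points and one distant point.
points = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (10.0, 0.0)]
edges_small = cknn_edges(points, k=2, delta=1.0)
edges_large = cknn_edges(points, k=2, delta=2.0)
```

In this toy example the smaller delta links only adjacent points within the cluster, while the larger delta also reaches the distant point. This is the sense in which delta decides the radius for each point.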