DNetPRO

class DNetPRO.DNetPRO(estimator=GaussianNB(), cv=LeaveOneOut(), scoring=None, max_chunk=100, percentage=0.1, verbose=False, n_jobs=1)

Bases: BaseEstimator, ClassifierMixin

DNetPRO feature selection algorithm

Parameters:
  • estimator (object, default=GaussianNB()) – A supervised learning estimator with a fit method. It is used to score the candidate signatures and, once a signature is selected, to predict on the reduced feature set.

  • cv (int, cross-validation generator or an iterable, optional, default=LeaveOneOut()) –

    Determines the cross-validation splitting strategy. Possible inputs for cv are:

    • None, to use scikit-learn's default cross-validation (3-fold in versions before 0.22, 5-fold since),

    • an integer, to specify the number of folds,

    • an object to be used as a cross-validation generator,

    • an iterable yielding train/test splits.

    For integer/None inputs, if the estimator is a classifier and y is either binary or multiclass, sklearn.model_selection.StratifiedKFold is used. In all other cases, sklearn.model_selection.KFold is used.

    Refer to the scikit-learn cross-validation user guide for the various cross-validation strategies that can be used here.

  • scoring (string, callable or None, optional, default: None) – A string (see model evaluation documentation) or a scorer callable object / function with signature scorer(estimator, X, y).

  • max_chunk (int, default=100) – Maximum number of features allowed in a performance chunk. If a chunk other than the first one contains more than max_chunk features, the feature-selection iteration stops.

  • percentage (float, default=0.1) – Fraction of the sorted feature couples to keep (e.g. 0.1 keeps 10% of them).

  • verbose (bool or int, default=False) – Controls the verbosity of the couples evaluation.

  • n_jobs (int, default=1) – Number of cores to run in parallel while fitting across folds. If n_jobs=-1, all available cores are used.
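
To make the interplay of estimator, cv and percentage more concrete, here is a minimal pure-Python sketch of ranking feature couples. It assumes that a couple is scored by the mean cross-validated accuracy of the estimator trained on those two columns only, and that percentage is a fraction of the sorted couples; the helper rank_couples is hypothetical and is not the package's C++ couple evaluation, which may use a different scoring rule.

```python
from itertools import combinations

import numpy as np
from sklearn.model_selection import LeaveOneOut, cross_val_score
from sklearn.naive_bayes import GaussianNB


def rank_couples(X, y, estimator=None, cv=None, percentage=0.1):
    """Hypothetical helper: score every feature couple, keep the best ones.

    Assumes the score of a couple is the mean cross-validated accuracy of
    `estimator` trained on those two columns only, and that `percentage`
    is a fraction of the sorted couples (not the package's C++ routine).
    """
    estimator = GaussianNB() if estimator is None else estimator
    cv = LeaveOneOut() if cv is None else cv
    scored = []
    for i, j in combinations(range(X.shape[1]), 2):
        # Evaluate the estimator on the two-column sub-matrix
        score = cross_val_score(estimator, X[:, [i, j]], y, cv=cv).mean()
        scored.append((score, i, j))
    # Best couples first (stable sort keeps index order on ties)
    scored.sort(key=lambda item: item[0], reverse=True)
    # Keep at least one couple
    n_keep = max(1, int(len(scored) * percentage))
    return scored[:n_keep]


rng = np.random.default_rng(0)
X = rng.uniform(size=(10, 5))
y = np.array([0] * 5 + [1] * 5)
best = rank_couples(X, y, percentage=0.3)  # 5 features -> 10 couples -> keeps 3
```

The defaults GaussianNB() and LeaveOneOut() mirror the class signature; with LeaveOneOut each fold scores a single sample, so every couple's score is the fraction of correctly classified held-out samples.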

Example

>>> import numpy as np
>>> from DNetPRO import DNetPRO
>>> from sklearn.naive_bayes import GaussianNB
>>>
>>> Nprobe, Nsample = (5, 4)
>>>
>>> X = np.random.uniform(low=0., high=1., size=(Nsample, Nprobe))
>>> y = np.array(['A', 'A', 'B', 'B'])
>>>
>>> dnet = DNetPRO(estimator=GaussianNB(), max_chunk=4)
>>> dnet.fit(X, y)
>>>
>>> print(dnet.signatures)

Notes

The full computation of the feature couples is performed in C++ with multithreading, so set an appropriate number of threads to speed up the execution.
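
The cost of this step comes from the number of unordered feature couples, n(n-1)/2, which grows quadratically with the number of features. A quick check (the helper name n_couples is hypothetical):

```python
from math import comb


def n_couples(n_features):
    """Number of unordered feature couples: n * (n - 1) / 2."""
    return comb(n_features, 2)


print(n_couples(5))       # 10
print(n_couples(20_000))  # 199990000
```

For instance, a high-throughput dataset with 20,000 probes already yields almost 200 million couples to evaluate.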

References

  • Curti N., Giampieri E., Levi G., Castellani G., Remondini D.; DNetPRO: A network approach for low-dimensional signatures from high-throughput data; bioRxiv 773622; doi: https://doi.org/10.1101/773622

  • Mizzi C., Fabbri A., Rambaldi S. et al.; Unraveling pedestrian mobility on a road network using ICTs data during great tourist events. EPJ Data Sci. 7, 44 (2018); https://doi.org/10.1140/epjds/s13688-018-0168-2

  • Boccardi V., Paolacci L., Remondini D., Giampieri E., Poli G., Curti N., Cecchetti R., Villa A., Ruggiero C., Brancorsini S., Mecocci P.; Cognitive Decline and Alzheimer’s Disease in Old Age: A Sex-Specific Cytokinome Signature. J Alzheimers Dis. 2019;72(3):911-918. doi: 10.3233/JAD-190480. PMID: 31658056.

  • Malvisi M., Curti N., Remondini D., De Iorio MG., Palazzo F., Gandini G., Vitali S., Polli M., Williams JL., Minozzi G.; Combinatorial Discriminant Analysis Applied to RNAseq Data Reveals a Set of 10 Transcripts as Signatures of Exposure of Cattle to Mycobacterium avium subsp. paratuberculosis. Animals (Basel). 2020 Feb 5;10(2):253. doi: 10.3390/ani10020253. PMID: 32033399; PMCID: PMC7070263.

  • Biondi, Gravante G., Remondini D., Peluso S., Cominetti S., D’Amore F., Bignami M., Arosio A.D., Curti N.; Towards Precision Medicine in Sinonasal Tumors: Low-Dimensional Radiomic Signature Extraction from MRI. Diagnostics (2025); doi: 10.3390/diagnostics15131675.

connected_component_subgraphs()

Generator of connected-component subgraphs, compatible with both old and new networkx versions.

Parameters:

G (graph) – A networkx graph

Returns:

subgraph – A networkx-like subgraph of the input, one for each connected component

Return type:

graph
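
networkx removed connected_component_subgraphs in version 2.4, so a version-independent replacement is needed. One way to sketch it is with connected_components and Graph.subgraph; this is a hypothetical re-implementation for illustration, not necessarily the package's own code.

```python
import networkx as nx


def connected_component_subgraphs(G):
    """Hypothetical sketch: yield one independent subgraph per component."""
    for nodes in nx.connected_components(G):
        # subgraph() returns a read-only view; copy() detaches it from G,
        # as the removed networkx helper did by default (copy=True)
        yield G.subgraph(nodes).copy()


G = nx.Graph([(0, 1), (1, 2), (3, 4)])
sizes = sorted(len(c) for c in connected_component_subgraphs(G))  # [2, 3]
```

Copying each view costs some memory but lets callers modify the returned subgraphs (for example, pruning them) without touching the original graph.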

decision_function(X)

fit(X, y=None, **fit_params)

Fit the DNetPRO model meta-transformer

Parameters:
  • X (array-like of shape (n_samples, n_features)) – The training input samples.

  • y (array-like, shape (n_samples,)) – The target values (class labels).

  • **fit_params – Other estimator-specific parameters.

Returns:

self – Returns self.

Return type:

object

fit_transform(X, y)

Fit the DNetPRO model meta-transformer and apply the data transformation, i.e. feature selection.

Parameters:
  • X (array-like of shape (n_samples, n_features)) – The training input samples.

  • y (array-like, shape (n_samples,)) – The target values (class labels).

  • **fit_params – Other estimator-specific parameters.

Returns:

Xnew – The data filtered according to the best features found by the model

Return type:

array-like of shape (n_samples, n_signature_features)

Notes

The selected signature is the one with the highest score on the training data (X).
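
Choosing the signature with the highest training score and filtering X to its features amounts to an argmax followed by column indexing. A minimal sketch, assuming signatures are stored as lists of column indices with one training score each; select_best_signature and this data layout are hypothetical.

```python
import numpy as np


def select_best_signature(X, signatures, scores):
    """Hypothetical helper: pick the top-scoring signature and filter X.

    `signatures` is a list of column-index lists and `scores` holds one
    training score per signature (assumed layout, for illustration).
    """
    best = int(np.argmax(scores))  # on ties, the first maximum wins
    features = np.asarray(signatures[best])
    return features, X[:, features]


X = np.arange(12).reshape(3, 4)
features, X_new = select_best_signature(X, [[0, 1], [1, 3], [2]], [0.5, 0.9, 0.7])
# features -> [1 3]; X_new has shape (3, 2)
```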

get_signature()

Return the computed signature, sorted in ascending order of training score.

static label2numbers(arr)

Convert labels to numerical values

Parameters:

arr (array_like) – The array of labels

Returns:

numeric_labels – Array of numerical labels obtained by the LabelEncoder transform

Return type:

np.ndarray

Notes

The C++ function accepts only numerical (integer) labels as input. For more general label support, refer to the C++ example.

Examples

>>> from DNetPRO import DNetPRO
>>> y = ('A', 'A', 'B', 'B')
>>> num_y = DNetPRO.label2numbers(y)
>>> print(num_y)
[0 0 1 1]
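
In the common case the same integer codes can be obtained with NumPy alone, because scikit-learn's LabelEncoder also orders the classes as np.unique does. A sketch with a hypothetical helper name:

```python
import numpy as np


def labels_to_codes(arr):
    """Hypothetical helper: map labels to 0..n_classes-1 in sorted order.

    np.unique sorts the distinct labels; return_inverse gives, for each
    element, the position of its label in that sorted array.
    """
    _, codes = np.unique(np.asarray(arr), return_inverse=True)
    return codes.ravel()


print(labels_to_codes(('A', 'A', 'B', 'B')))  # [0 0 1 1]
```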

static pendrem(graph)

Iteratively remove pendant nodes (nodes of degree 1).

Parameters:

graph (graph) – A NetworkX graph

Returns:

pruned – The graph with its pendant nodes iteratively removed

Return type:

graph

Example

>>> import networkx as nx
>>> from DNetPRO import DNetPRO
>>> G = nx.star_graph(n=10)
>>> print(G.nodes)
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
>>> pruned = DNetPRO.pendrem(G)
>>> print(pruned.nodes)
[0]
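
Iterative pendant removal can be sketched as repeatedly deleting all degree-1 nodes until none is left. This hypothetical helper works on a copy and keeps isolated (degree-0) nodes, which matches the star-graph example; the package's own pendrem may differ in details.

```python
import networkx as nx


def remove_pendants(graph):
    """Hypothetical sketch: iteratively drop degree-1 (pendant) nodes.

    Works on a copy; isolated (degree-0) nodes are kept.
    """
    pruned = graph.copy()
    while True:
        pendants = [node for node, degree in pruned.degree() if degree == 1]
        if not pendants:
            return pruned
        # Removing the current leaves may create new ones: loop again
        pruned.remove_nodes_from(pendants)


print(list(remove_pendants(nx.star_graph(10)).nodes))  # [0]
print(list(remove_pendants(nx.path_graph(5)).nodes))   # [2]
```

A path shrinks from both ends until only its middle node remains, while nodes on a cycle always keep degree at least 2 and are never removed.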

predict(X)

Reduce X to the selected features and then predict using the underlying estimator.

Parameters:

X (array of shape [n_samples, n_features]) – The input samples.

Returns:

y – The predicted target values.

Return type:

array of shape [n_samples]
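
The "reduce, then predict" pattern can be sketched as follows, assuming a signature is a list of column indices and using GaussianNB as the underlying estimator. The helper predict_on_signature is hypothetical and, unlike predict, fits and predicts in one step for illustration.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB


def predict_on_signature(X_train, y_train, X_test, signature, estimator=None):
    """Hypothetical helper: keep only the signature columns, fit, predict."""
    estimator = GaussianNB() if estimator is None else estimator
    cols = np.asarray(signature)
    # Training and test data must be reduced to the same columns
    estimator.fit(X_train[:, cols], y_train)
    return estimator.predict(X_test[:, cols])


X_train = np.array([[5.0, 0.1, 0.0], [3.0, 0.2, 0.1],
                    [4.0, 0.9, 1.0], [6.0, 0.8, 1.1]])
y_train = np.array(['A', 'A', 'B', 'B'])
X_test = np.array([[9.0, 0.5, 0.05], [1.0, 0.5, 1.05]])
print(predict_on_signature(X_train, y_train, X_test, signature=[2]))  # ['A' 'B']
```

Only the third column separates the two classes here, so restricting the estimator to that signature is enough to classify the test samples.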

predict_log_proba(X)

predict_proba(X)

score(X, y)

Reduce X to the selected features and then return the score of the underlying estimator.

Parameters:
  • X (array of shape [n_samples, n_features]) – The input samples.

  • y (array of shape [n_samples]) – The target values.

set_signature(index)

Set the signature at the given index as the selected features and fit the model.

Parameters:

index (int) – Index of the signature in the signatures array.

transform(X)

Apply the data reduction according to the features in the best signature found.

Parameters:

X (array of shape [n_samples, n_features]) – The input samples.