DNetPRO
- class DNetPRO.DNetPRO(estimator=GaussianNB(), cv=LeaveOneOut(), scoring=None, max_chunk=100, percentage=0.1, verbose=False, n_jobs=1)
Bases: BaseEstimator, ClassifierMixin
DNetPRO feature selection algorithm
- Parameters:
estimator (object) – A supervised learning estimator with a fit method that provides information about feature importance either through a coef_ attribute or through a feature_importances_ attribute.
cv (int, cross-validation generator or an iterable, optional) –
Determines the cross-validation splitting strategy. Possible inputs for cv are:
None, to use the default 3-fold cross-validation,
integer, to specify the number of folds.
An object to be used as a cross-validation generator.
An iterable yielding train/test splits.
For integer/None inputs, if y is binary or multiclass, sklearn.model_selection.StratifiedKFold is used. If the estimator is not a classifier or if y is neither binary nor multiclass, sklearn.model_selection.KFold is used.
Refer to the scikit-learn cross-validation documentation for the various cross-validation strategies that can be used here.
scoring (string, callable or None, optional, default: None) – A string (see model evaluation documentation) or a scorer callable object / function with signature scorer(estimator, X, y).
max_chunk (int, default=100) – Maximum number of features allowed in a performance chunk. If a chunk is larger than max_chunk and it is not the first one, the feature selection iteration is stopped.
percentage (float, default=0.1) – Percentage of couples to save after sorting
verbose (int, default=0) – Controls verbosity of couples evaluation.
n_jobs (int, default=1) – Number of cores to use in parallel while fitting across folds. If n_jobs=-1, the number of jobs is set to the number of cores.
Example
>>> import numpy as np
>>> from DNetPRO import DNetPRO
>>> from sklearn.naive_bayes import GaussianNB
>>>
>>> Nprobe, Nsample = (5, 4)
>>>
>>> X = np.random.uniform(low=0., high=1., size=(Nsample, Nprobe))
>>> y = np.array(['A', 'A', 'B', 'B'])
>>>
>>> dnet = DNetPRO(estimator=GaussianNB(), max_chunk=4)
>>> dnet.fit(X, y)
>>>
>>> print(dnet.signatures)
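The last cv option listed above, an iterable yielding train/test splits, can be illustrated without any library. The following is a minimal sketch that mimics a leave-one-out strategy in plain Python; the helper name leave_one_out_splits is hypothetical, and it assumes cv follows the usual scikit-learn convention of (train indices, test indices) pairs. In practice a scikit-learn splitter object would normally be passed instead.

```python
def leave_one_out_splits(n_samples):
    """Yield (train_indices, test_indices) pairs, holding out one sample per split."""
    for held_out in range(n_samples):
        train = [i for i in range(n_samples) if i != held_out]
        yield train, [held_out]


# Four samples, as in the example above: each sample is the test set exactly once
splits = list(leave_one_out_splits(4))
for train, test in splits:
    print("train:", train, "test:", test)
```

With only four samples, every split trains on three samples and tests on one, which is why small datasets pair naturally with this strategy.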
Notes
The full computation of couples is performed via C++ multithreading, so set an appropriate number of threads to speed up the execution.
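To give a feeling for what the couples and the percentage parameter refer to, the sketch below enumerates every pair of features, ranks the pairs by a score, and keeps the top fraction. It is only an illustration under stated assumptions: the scores are made up, the helper top_couples is hypothetical, and percentage is assumed to be a fraction between 0 and 1 (as the default of 0.1 suggests). The library computes real scores in C++.

```python
import math
from itertools import combinations


def top_couples(scores, percentage=0.1):
    """Return the best-scoring fraction of feature couples, best first.

    `scores` maps a couple (i, j) to a hypothetical performance score.
    """
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    # Assumption: always keep at least one couple
    n_keep = max(1, math.ceil(len(ranked) * percentage))
    return ranked[:n_keep]


n_features = 5
couples = list(combinations(range(n_features), 2))  # 5 * 4 / 2 = 10 couples

# Made-up scores, for illustration only
scores = {couple: (couple[0] + 2 * couple[1]) / 12 for couple in couples}

best = top_couples(scores, percentage=0.2)
print([couple for couple, _ in best])
```

The number of couples grows quadratically with the number of features, which is why the evaluation benefits from running on several threads.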
References
Curti N., Giampieri E., Levi G., Castellani G., Remondini D.; DNetPRO: A network approach for low-dimensional signatures from high-throughput data; bioRxiv 773622; doi: https://doi.org/10.1101/773622
Mizzi C., Fabbri A., Rambaldi S. et al.; Unraveling pedestrian mobility on a road network using ICTs data during great tourist events. EPJ Data Sci. 7, 44 (2018); https://doi.org/10.1140/epjds/s13688-018-0168-2
Boccardi V., Paolacci L., Remondini D., Giampieri E., Poli G., Curti N., Cecchetti R., Villa A., Ruggiero C., Brancorsini S., Mecocci P.; Cognitive Decline and Alzheimer’s Disease in Old Age: A Sex-Specific Cytokinome Signature. J Alzheimers Dis. 2019;72(3):911-918. doi: 10.3233/JAD-190480. PMID: 31658056.
Malvisi M., Curti N., Remondini D., De Iorio MG., Palazzo F., Gandini G., Vitali S., Polli M., Williams JL., Minozzi G. Combinatorial Discriminant Analysis Applied to RNAseq Data Reveals a Set of 10 Transcripts as Signatures of Exposure of Cattle to Mycobacterium avium subsp. paratuberculosis. Animals (Basel). 2020 Feb 5;10(2):253. doi: 10.3390/ani10020253. PMID: 32033399; PMCID: PMC7070263.
Biondi, G. Gravante, D. Remondini, S. Peluso, S. Cominetti, F. D’Amore, M. Bignami, A.D. Arosio, N. Curti; Towards Precision Medicine in Sinonasal Tumors: Low-Dimensional Radiomic Signature Extraction from MRI. Diagnostics (2025); doi: 10.3390/diagnostics15131675.
- connected_component_subgraphs()
Generator of connected components compatible with old and new networkx versions
- Parameters:
G (graph) – A networkx graph
- Returns:
subgraph – A subgraph (networkx like) of the input
- Return type:
graph
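As a rough picture of what such a generator yields, here is a dependency-free sketch that walks an undirected graph stored as an adjacency dictionary and yields one node set per connected component. The function name and the dictionary representation are assumptions made for illustration; the method itself operates on NetworkX graphs and yields subgraphs.

```python
from collections import deque


def connected_components(adjacency):
    """Yield each connected component of an undirected graph as a set of nodes."""
    seen = set()
    for start in adjacency:
        if start in seen:
            continue
        # Breadth-first search from an unvisited node collects one component
        component = {start}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for neighbor in adjacency[node]:
                if neighbor not in component:
                    component.add(neighbor)
                    queue.append(neighbor)
        seen |= component
        yield component


graph = {0: {1}, 1: {0, 2}, 2: {1}, 3: {4}, 4: {3}, 5: set()}
components = list(connected_components(graph))
print(components)
```

With NetworkX itself, the same idea is usually written as (G.subgraph(c) for c in nx.connected_components(G)).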
- fit(X, y=None, **fit_params)
Fit the DNetPRO model meta-transformer
- Parameters:
X (array-like of shape (n_samples, n_features)) – The training input samples.
y (array-like, shape (n_samples,)) – The target values (integers that correspond to classes in classification, real numbers in regression).
**fit_params (Other estimator specific parameters)
- Returns:
self – Returns self.
- Return type:
object
- fit_transform(X, y)
Fit the DNetPRO model meta-transformer and apply the data transformation, i.e., feature selection
- Parameters:
X (array-like of shape (n_samples, n_features)) – The training input samples.
y (array-like, shape (n_samples,)) – The target values (integers that correspond to classes in classification, real numbers in regression).
**fit_params (Other estimator specific parameters)
- Returns:
Xnew – The data filtered according to the best features found by the model
- Return type:
array-like of shape (n_samples, n_signature_features)
Notes
The selected signature is the one with the highest score on the training data (X).
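The selection and filtering step can be pictured as follows. This is a sketch under assumptions, not the library's code: the layout of each candidate signature (a dictionary with features and score keys) and the helper select_columns are hypothetical, and plain lists stand in for arrays.

```python
# Hypothetical candidate signatures: feature indices plus a training score
signatures = [
    {"features": [0, 3], "score": 0.75},
    {"features": [1, 2, 4], "score": 1.0},
    {"features": [2], "score": 0.5},
]


def select_columns(X, columns):
    """Keep only the given column indices of a row-major table."""
    return [[row[c] for c in columns] for row in X]


# Pick the signature with the highest training score
best = max(signatures, key=lambda signature: signature["score"])

X = [
    [0.1, 0.2, 0.3, 0.4, 0.5],
    [0.6, 0.7, 0.8, 0.9, 1.0],
]
X_new = select_columns(X, best["features"])
print(X_new)
```

The result keeps all samples but only the columns that belong to the best signature, matching the documented return shape.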
- static label2numbers(arr)
Convert labels to numerical values
- Parameters:
arr (array_like) – The array of labels
- Returns:
numeric_labels – Array of numerical labels obtained by the LabelEncoder transform
- Return type:
np.ndarray
Notes
The C++ function accepts only numerical (integer) values as input labels. For more general support, refer to the C++ example.
Examples
>>> from DNetPRO import DNetPRO
>>> y = ('A', 'A', 'B', 'B')
>>> num_y = DNetPRO.label2numbers(y)
>>> print(num_y)
[0 0 1 1]
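The mapping performed by a label encoder is simple enough to sketch in plain Python: the distinct labels are sorted and each one is replaced by its position in that sorted order. The function name label_to_numbers is hypothetical and the sketch returns a list rather than an array.

```python
def label_to_numbers(labels):
    """Map each label to the index of that label among the sorted unique labels."""
    classes = sorted(set(labels))
    lookup = {label: index for index, label in enumerate(classes)}
    return [lookup[label] for label in labels]


print(label_to_numbers(('A', 'A', 'B', 'B')))
print(label_to_numbers(['b', 'a', 'c', 'a']))
```

Because the codes follow the sorted order of the labels, the numbering does not depend on the order in which the labels appear in the input.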
- static pendrem(graph)
Remove pendant nodes iteratively
- Parameters:
graph (graph) – A NetworkX graph
- Returns:
pruned – The same graph without pendant nodes
- Return type:
graph
Example
>>> import networkx as nx
>>> from DNetPRO import DNetPRO
>>> G = nx.star_graph(n=10)
>>> print(G.nodes)
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
>>> pruned = DNetPRO.pendrem(G)
>>> print(pruned.nodes)
[0]
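The iterative nature of the pruning matters: removing one pendant node can turn its neighbor into a new pendant node. The sketch below shows this on an adjacency dictionary. It is an illustration only, assuming that a pendant node is a node with exactly one neighbor (consistent with the star-graph example, where the isolated center survives); the function name remove_pendants is hypothetical and the library may treat isolated nodes differently.

```python
def remove_pendants(adjacency):
    """Repeatedly drop degree-1 nodes until none remain; returns a new dict."""
    graph = {node: set(neighbors) for node, neighbors in adjacency.items()}
    while True:
        pendants = [node for node, neighbors in graph.items() if len(neighbors) == 1]
        if not pendants:
            return graph
        for node in pendants:
            # Remove the node and detach it from any neighbor still in the graph
            for neighbor in graph.pop(node):
                if neighbor in graph:
                    graph[neighbor].discard(node)


# Star graph: center 0 connected to leaves 1..10
star = {0: set(range(1, 11)), **{leaf: {0} for leaf in range(1, 11)}}
print(sorted(remove_pendants(star)))

# Triangle 0-1-2 with a tail 2-3-4: the tail is removed over two rounds
tailed = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4}, 4: {3}}
print(sorted(remove_pendants(tailed)))
```

In the second graph, node 4 is removed first, which turns node 3 into a pendant node that is removed in the next round; the triangle remains untouched.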
- predict(X)
Reduce X to the selected features and then predict using the underlying estimator.
- Parameters:
X (array of shape [n_samples, n_features]) – The input samples.
- Returns:
y – The predicted target values.
- Return type:
array of shape [n_samples]
- score(X, y)
Reduce X to the selected features and then return the score of the underlying estimator.
- Parameters:
X (array of shape [n_samples, n_features]) – The input samples.
y (array of shape [n_samples]) – The target values.
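Both predict and score follow the same pattern: reduce X to the selected columns, then delegate to the fitted estimator. The sketch below shows that pattern with hypothetical stand-ins: ThresholdEstimator is a toy estimator invented for this example, SelectThenPredict is not a class of the library, and accuracy is assumed as the score, which is the default for scikit-learn classifiers.

```python
class ThresholdEstimator:
    """Toy fitted estimator: predicts 'B' when the row mean exceeds 0.5."""

    def predict(self, X):
        return ['B' if sum(row) / len(row) > 0.5 else 'A' for row in X]


class SelectThenPredict:
    """Reduce X to the selected columns, then delegate to the estimator."""

    def __init__(self, estimator, selected):
        self.estimator = estimator
        self.selected = selected

    def _reduce(self, X):
        return [[row[c] for c in self.selected] for row in X]

    def predict(self, X):
        return self.estimator.predict(self._reduce(X))

    def score(self, X, y):
        # Accuracy: fraction of predictions that match the targets
        predictions = self.predict(X)
        return sum(p == t for p, t in zip(predictions, y)) / len(y)


model = SelectThenPredict(ThresholdEstimator(), selected=[0, 2])
X = [[0.9, 0.0, 0.8], [0.1, 1.0, 0.2], [0.7, 0.0, 0.9], [0.6, 1.0, 0.1]]
y = ['B', 'A', 'B', 'B']

predictions = model.predict(X)
accuracy = model.score(X, y)
print(predictions, accuracy)
```

The middle column is ignored entirely, so the inputs passed to predict and score must have the same column layout as the data used for fitting.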