Datasets

load_sample

bcselector.datasets.load_sample(as_frame=True)

Load and return the sample artificial dataset.

Samples total       10000
Dimensionality      35
Target variables    1

Parameters

as_frame (bool, default=True) – If True, data is returned as a pandas DataFrame with named columns and target as a pandas Series holding the single target variable. If False, both are returned as NumPy arrays.

Returns

  • data ({np.ndarray, pd.DataFrame} of shape (10000, 35)) – The data matrix. If as_frame=True, data will be a pd.DataFrame.

  • target ({np.ndarray, pd.Series} of shape (10000,)) – The binary classification target variable. If as_frame=True, target will be a pd.Series.

  • costs ({dict, list}) – Cost of every feature in data. If as_frame=True, costs will be a dict.
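When costs is a dict, it usually has to be lined up with the column order of data before the two are used together. The following is a minimal sketch of that alignment in plain Python. It assumes the dict is keyed by feature name, which the description above suggests but does not state; the feature names, cost values, and the helper costs_in_column_order are hypothetical and not part of bcselector.

```python
# Hypothetical stand-ins for what load_sample(as_frame=True) might return:
# the column names of `data` and a dict of per-feature costs.
# Assumption: the dict is keyed by feature (column) name.
feature_names = ["f1", "f2", "f3"]
costs_as_dict = {"f2": 2.5, "f1": 1.0, "f3": 0.5}


def costs_in_column_order(columns, costs):
    """Return the costs as a list aligned with the given column order."""
    missing = [c for c in columns if c not in costs]
    if missing:
        raise KeyError(f"no cost for features: {missing}")
    return [costs[c] for c in columns]


# List form, one cost per column, in the same order as the columns.
costs_as_list = costs_in_column_order(feature_names, costs_as_dict)

# Total cost of acquiring every feature.
total_cost = sum(costs_as_list)
```

Looking costs up by name rather than relying on dict ordering keeps the list correct even if the columns of data are later reordered or subset.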

Examples

>>> from bcselector.datasets import load_sample
>>> data, target, costs = load_sample()

load_hepatitis

bcselector.datasets.load_hepatitis(as_frame=True, discretize_data=True, **kwargs)

Load and return the hepatitis dataset. The hepatitis dataset is a small medical dataset with a single target variable, collected from the UCI Machine Learning Repository [3].

Samples total       155
Dimensionality      19
Target variables    1

Parameters

  • as_frame (bool, default=True) – If True, data is returned as a pandas DataFrame with named columns and target as a pandas Series holding the single target variable. If False, both are returned as NumPy arrays.

  • discretize_data (bool, default=True) – If True, the returned data is discretized with sklearn.preprocessing.KBinsDiscretizer.

  • kwargs – Additional keyword arguments passed to the sklearn.preprocessing.KBinsDiscretizer constructor.
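To illustrate what the discretization options control, here is a minimal sketch that calls sklearn.preprocessing.KBinsDiscretizer directly on a toy array. It assumes the keyword arguments are forwarded to the constructor unchanged. The toy values are made up, and which columns load_hepatitis actually discretizes is not stated here.

```python
import numpy as np
from sklearn.preprocessing import KBinsDiscretizer

# Toy single-feature matrix (made-up values, not the hepatitis data).
X = np.array([[0.0], [1.0], [2.0], [3.0], [4.0], [5.0]])

# These are the kind of keyword arguments one might pass through **kwargs:
# three equal-width bins, encoded as ordinal bin indices.
discretizer = KBinsDiscretizer(n_bins=3, encode="ordinal", strategy="uniform")

# Each value is replaced by the index of the bin it falls into.
X_binned = discretizer.fit_transform(X)

# Bin edges learned for the single feature.
edges = discretizer.bin_edges_[0]
```

With encode="ordinal" the output keeps one column per input feature. Note that KBinsDiscretizer's own default, encode="onehot", returns a sparse one-hot matrix instead.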

Returns

  • data ({np.ndarray, pd.DataFrame} of shape (155, 19)) – The data matrix. If as_frame=True, data will be a pd.DataFrame.

  • target ({np.ndarray, pd.Series} of shape (155,)) – The binary classification target variable. If as_frame=True, target will be a pd.Series.

  • costs ({dict, list}) – Cost of every feature in data. If as_frame=True, costs will be a dict.

References

[3] Dua, D. and Graff, C. (2019). UCI Machine Learning Repository [http://archive.ics.uci.edu/ml]. Irvine, CA: University of California, School of Information and Computer Science.

Examples

>>> from bcselector.datasets import load_hepatitis
>>> data, target, costs = load_hepatitis()