Module PyML.datagen.sample

Functions

  • shuffle(x) - shuffle a list
  • sample(data, size, **args) - sample from a dataset without replacement
  • splitDataset(data, fraction, **args) - split a dataset into two
  • bootstrap(data, **args) - return a bootstrap sample from a dataset

Function Details

sample(data, size, **args)

sample from a dataset without replacement

:Parameters:

  • `data` - a dataset object
  • `size` - one of the following:
      - an integer: that many examples are chosen;
      - a list: size[i] specifies how many examples to sample from class i (see data.labels.classLabels for how the classes are indexed);
      - a dictionary whose keys are class names, e.g. {'+1': 100, '-1': 100}.
    If an entry in the list or dictionary is 'all', all members of the corresponding class are sampled.

:Keywords:

  • `stratified` - whether to perform stratified sampling [default: True]. This applies only when `size` is a single integer rather than a per-class list or dictionary.
  • `seed` - random number generator seed
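To make the interaction between `size` and `stratified` concrete, here is a minimal plain-Python sketch of sampling without replacement. It is not PyML's implementation: it assumes the labels are given as a plain list of class names (one per example) instead of a PyML dataset, the function name `sample_indices` is hypothetical, and the per-class list form of `size` is omitted because it depends on the ordering in data.labels.classLabels.

```python
import random
from collections import defaultdict


def sample_indices(labels, size, stratified=True, seed=None):
    """Hypothetical sketch: pick example indices without replacement.

    labels -- list of class names, one per example (assumption)
    size   -- an int, or a dict mapping class name -> int or 'all'
    """
    rng = random.Random(seed)
    # Group example indices by class.
    by_class = defaultdict(list)
    for i, label in enumerate(labels):
        by_class[label].append(i)

    if isinstance(size, int):
        if not stratified:
            # Plain sampling over all examples, ignoring class.
            return sorted(rng.sample(range(len(labels)), size))
        # Stratified: each class gets a share proportional to its size.
        # Rounding means the total can differ slightly from `size`.
        fraction = size / len(labels)
        size = {c: round(len(idx) * fraction) for c, idx in by_class.items()}

    chosen = []
    for c, n in size.items():
        members = by_class[c]
        if n == 'all':
            chosen.extend(members)          # take the whole class
        else:
            chosen.extend(rng.sample(members, n))
    return sorted(chosen)
```

For example, with six '+1' and four '-1' labels, a stratified request for 5 examples takes 3 from '+1' and 2 from '-1', while `sample_indices(labels, {'+1': 2, '-1': 'all'})` returns two '+1' indices plus every '-1' index.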

splitDataset(data, fraction, **args)

split a dataset into two. Randomly splits a dataset into two datasets whose sizes are determined by the `fraction` parameter: the first dataset contains that fraction of the examples and the second contains the rest.

For example, `train, test = splitDataset(data, 0.7)` puts 70% of the examples in `train` and the remaining 30% in `test`.

:Parameters:

  • `data` - a dataset object
  • `fraction` - the fraction of the examples to put in the first split

:Keywords:

  • `stratified` - whether to perform stratified splitting, i.e. whether to keep the class ratio in the two datasets [default: True]
  • `seed` - random number generator seed
  • `indicesOnly` - if this flag is set, the indices of the two splits are returned instead of the datasets [default: False]
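The following minimal plain-Python sketch shows what a stratified split by `fraction` amounts to, returning index lists in the spirit of `indicesOnly`. It is an illustration, not PyML's code: it assumes the labels are a plain list of class names, the name `split_indices` is hypothetical, and the per-class rounding shown here is an assumption since the exact rounding PyML uses is not documented above.

```python
import random
from collections import defaultdict


def split_indices(labels, fraction, stratified=True, seed=None):
    """Hypothetical sketch: return (first, second) lists of example indices."""
    rng = random.Random(seed)
    if stratified:
        # Split each class separately so both parts keep the class ratio.
        groups = defaultdict(list)
        for i, label in enumerate(labels):
            groups[label].append(i)
        groups = list(groups.values())
    else:
        # One group containing every example.
        groups = [list(range(len(labels)))]

    first, second = [], []
    for members in groups:
        rng.shuffle(members)                    # random order within the group
        cut = round(fraction * len(members))    # size of the first part (assumed rounding)
        first.extend(members[:cut])
        second.extend(members[cut:])
    return sorted(first), sorted(second)
```

With ten examples of each of two classes and `fraction=0.7`, each class contributes 7 indices to the first part and 3 to the second, so the two parts have 14 and 6 examples.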

bootstrap(data, **args)

return a bootstrap sample from a dataset

:Parameters:

  • `data` - a dataset object

:Keywords:

  • `stratified` - whether to perform stratified bootstrapping, i.e. whether to keep the class ratio of the original dataset in the bootstrap sample
  • `seed` - random number generator seed
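A bootstrap sample draws as many examples as the dataset contains, with replacement. The minimal plain-Python sketch below illustrates this, including a stratified variant that resamples each class separately so class counts are preserved. It is not PyML's implementation; it assumes the labels are a plain list of class names and the name `bootstrap_indices` is hypothetical.

```python
import random
from collections import defaultdict


def bootstrap_indices(labels, stratified=True, seed=None):
    """Hypothetical sketch: draw len(labels) example indices with replacement."""
    rng = random.Random(seed)
    if not stratified:
        n = len(labels)
        # Each draw picks any example uniformly; duplicates are expected.
        return [rng.randrange(n) for _ in range(n)]

    groups = defaultdict(list)
    for i, label in enumerate(labels):
        groups[label].append(i)
    drawn = []
    for members in groups.values():
        # Resample within the class, as many times as the class has members,
        # so every class keeps its original count.
        drawn.extend(rng.choices(members, k=len(members)))
    return drawn
```

Because draws are with replacement, some examples appear more than once and others not at all; the indices never drawn are the usual "out-of-bag" examples.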