org.apache.mahout.classifier.df.data
Class Data

java.lang.Object
  extended by org.apache.mahout.classifier.df.data.Data
All Implemented Interfaces:
Cloneable

public class Data
extends Object
implements Cloneable

Holds a list of vectors and their corresponding Dataset. Contains various operations that deal with the vectors (subset, count, ...).


Constructor Summary
Data(Dataset dataset)
           
Data(Dataset dataset, List<Instance> instances)
           
 
Method Summary
 Data bagging(Random rng)
          If the data has N cases, samples N cases at random, with replacement.
 Data bagging(Random rng, boolean[] sampled)
          If the data has N cases, samples N cases at random, with replacement.
 Data clone()
           
 boolean contains(Instance v)
           
 void countLabels(int[] counts)
          Counts the number of occurrences of each label value.
This method can be used when the criterion variable is categorical.
 boolean equals(Object obj)
           
 double[] extractLabels()
          Extracts the labels of all instances.
 Instance get(int index)
          Returns the element at the specified position
 Dataset getDataset()
           
 int hashCode()
           
 boolean identicalLabel()
          Checks whether all the vectors have identical label values.
 boolean isEmpty()
           
 boolean isIdentical()
          Checks whether all the vectors have identical attribute values.
 int majorityLabel(Random rng)
          Finds the majority label, breaking ties randomly.
This method can be used when the criterion variable is categorical.
 Data rsplit(Random rng, int subsize)
          Splits the data in two: returns one part, and this object keeps the rest of the data.
 int size()
           
 Data subset(Condition condition)
           
 double[] values(int attr)
          Finds all distinct values of a given attribute.
 
Methods inherited from class java.lang.Object
finalize, getClass, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

Data

public Data(Dataset dataset)

Data

public Data(Dataset dataset,
            List<Instance> instances)
Method Detail

size

public int size()
Returns:
the number of elements

isEmpty

public boolean isEmpty()
Returns:
true if this data contains no element

contains

public boolean contains(Instance v)
Parameters:
v - element whose presence in this data is to be tested
Returns:
true if this data contains the specified element

get

public Instance get(int index)
Returns the element at the specified position

Parameters:
index - index of element to return
Returns:
the element at the specified position
Throws:
IndexOutOfBoundsException - if the index is out of range

subset

public Data subset(Condition condition)
Returns:
the subset from this data that matches the given condition
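
The filtering idea behind subset can be illustrated without Mahout. The following is only a sketch: it uses plain double[] rows in place of Instance and a java.util.function.Predicate in place of Condition, and the class and method names (SubsetSketch, subset) are hypothetical. It does not reproduce the library's implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical stand-in: rows are double[] and the condition is a Predicate.
final class SubsetSketch {

  /** Returns the rows that satisfy the condition; the input list is not modified. */
  static List<double[]> subset(List<double[]> rows, Predicate<double[]> condition) {
    List<double[]> matches = new ArrayList<>();
    for (double[] row : rows) {
      if (condition.test(row)) {
        matches.add(row);
      }
    }
    return matches;
  }

  public static void main(String[] args) {
    List<double[]> rows = Arrays.asList(
        new double[] {1.0, 5.0},
        new double[] {2.0, 5.0},
        new double[] {3.0, 7.0});
    // Keep the rows whose attribute 0 is at least 2.
    List<double[]> kept = subset(rows, row -> row[0] >= 2.0);
    System.out.println(kept.size() + " rows kept");
  }
}
```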

bagging

public Data bagging(Random rng)
If the data has N cases, samples N cases at random, with replacement.


bagging

public Data bagging(Random rng,
                    boolean[] sampled)
If the data has N cases, samples N cases at random, with replacement.

Parameters:
sampled - flags indicating which instances have been sampled
Returns:
sampled data
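
Sampling with replacement, as described above, can be sketched in plain Java. This is an illustration under assumptions, not the library's code: a generic List stands in for the instances, the names BaggingSketch and bag are hypothetical, and the sketch assumes that the sampled array has one entry per instance and that each drawn index is marked true.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Random;

// Hypothetical stand-in for bagging over a plain list.
final class BaggingSketch {

  /**
   * Draws items.size() elements uniformly at random, with replacement.
   * Assumption: sampled has items.size() entries; drawn indices are set to true.
   */
  static <T> List<T> bag(List<T> items, Random rng, boolean[] sampled) {
    int n = items.size();
    List<T> bag = new ArrayList<>(n);
    for (int i = 0; i < n; i++) {
      int index = rng.nextInt(n); // the same index may be drawn more than once
      bag.add(items.get(index));
      sampled[index] = true;
    }
    return bag;
  }

  public static void main(String[] args) {
    List<String> items = Arrays.asList("a", "b", "c", "d");
    boolean[] sampled = new boolean[items.size()];
    List<String> bag = bag(items, new Random(42), sampled);
    System.out.println(bag + " " + Arrays.toString(sampled));
  }
}
```

In this sketch, entries of sampled that remain false correspond to elements that were never drawn, which is the usual way to identify out-of-bag cases.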

rsplit

public Data rsplit(Random rng,
                   int subsize)
Splits the data in two: returns one part, and this object keeps the rest of the data. Note that this method is very slow.
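
One way a destructive random split of this kind can work is sketched below. It is an assumption-laden illustration, not the library's implementation: a mutable generic List stands in for the data, and the names RandomSplitSketch and rsplit are hypothetical. In this sketch, each removal from the middle of an ArrayList shifts the later elements, so every draw costs time proportional to the list size.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Random;

// Hypothetical stand-in for a destructive random split over a mutable list.
final class RandomSplitSketch {

  /** Moves subsize randomly chosen elements out of items and returns them. */
  static <T> List<T> rsplit(List<T> items, Random rng, int subsize) {
    if (subsize < 0 || subsize > items.size()) {
      throw new IllegalArgumentException("subsize must be between 0 and items.size()");
    }
    List<T> part = new ArrayList<>(subsize);
    for (int i = 0; i < subsize; i++) {
      // remove(int) removes by index and returns the removed element
      part.add(items.remove(rng.nextInt(items.size())));
    }
    return part;
  }

  public static void main(String[] args) {
    List<Integer> items = new ArrayList<>(Arrays.asList(1, 2, 3, 4, 5));
    List<Integer> part = rsplit(items, new Random(7), 2);
    System.out.println("returned " + part + ", remaining " + items);
  }
}
```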


isIdentical

public boolean isIdentical()
Checks whether all the vectors have identical attribute values.

Returns:
true if all the vectors are identical or the data is empty; false otherwise

identicalLabel

public boolean identicalLabel()
Checks whether all the vectors have identical label values.


values

public double[] values(int attr)
Finds all distinct values of a given attribute.
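
Collecting the distinct values of one attribute can be sketched as follows. This is a minimal illustration over plain double[] rows with hypothetical names (DistinctValuesSketch, distinctValues). It returns values in first-seen order, which is a choice made for the sketch; this page does not state the order the library uses.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical stand-in: each row is a double[] of attribute values.
final class DistinctValuesSketch {

  /** Returns the distinct values found at position attr, in first-seen order. */
  static double[] distinctValues(double[][] rows, int attr) {
    Set<Double> seen = new LinkedHashSet<>();
    for (double[] row : rows) {
      seen.add(row[attr]);
    }
    double[] result = new double[seen.size()];
    int i = 0;
    for (double value : seen) {
      result[i++] = value;
    }
    return result;
  }

  public static void main(String[] args) {
    double[][] rows = {{1.0, 5.0}, {2.0, 5.0}, {1.0, 7.0}};
    System.out.println(Arrays.toString(distinctValues(rows, 0)));
  }
}
```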


clone

public Data clone()
Overrides:
clone in class Object

equals

public boolean equals(Object obj)
Overrides:
equals in class Object

hashCode

public int hashCode()
Overrides:
hashCode in class Object

extractLabels

public double[] extractLabels()
Extracts the labels of all instances.


majorityLabel

public int majorityLabel(Random rng)
Finds the majority label, breaking ties randomly.
This method can be used when the criterion variable is categorical.

Returns:
the majority label value
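
Majority voting with random tie-breaking can be sketched as below. This is an illustration only: labels are assumed to be integer codes in the range 0 to nbLabels - 1, and the names MajorityLabelSketch, majorityLabel, and nbLabels are hypothetical rather than taken from the library.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Hypothetical stand-in: labels are int codes in [0, nbLabels).
final class MajorityLabelSketch {

  /** Returns the most frequent label; ties are broken uniformly at random. */
  static int majorityLabel(int[] labels, int nbLabels, Random rng) {
    if (nbLabels <= 0) {
      throw new IllegalArgumentException("nbLabels must be positive");
    }
    int[] counts = new int[nbLabels];
    for (int label : labels) {
      counts[label]++;
    }
    int max = -1;
    List<Integer> best = new ArrayList<>();
    for (int label = 0; label < nbLabels; label++) {
      if (counts[label] > max) {
        max = counts[label];
        best.clear();
        best.add(label);
      } else if (counts[label] == max) {
        best.add(label); // tied with the current maximum
      }
    }
    return best.get(rng.nextInt(best.size()));
  }

  public static void main(String[] args) {
    int[] labels = {0, 1, 1, 2};
    System.out.println(majorityLabel(labels, 3, new Random()));
  }
}
```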

countLabels

public void countLabels(int[] counts)
Counts the number of occurrences of each label value.
This method can be used when the criterion variable is categorical.

Parameters:
counts - array that will contain the results; expected to be initialized to 0
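
The reason the counts array is expected to start at 0 can be shown with a small sketch. It assumes that counting increments the entries of the caller-supplied array rather than overwriting them; the names CountLabelsSketch and countLabels, and the use of int label codes, are hypothetical.

```java
import java.util.Arrays;

// Hypothetical stand-in: labels are int codes used as indices into counts.
final class CountLabelsSketch {

  /** Adds one to counts[label] for every label; existing values are kept. */
  static void countLabels(int[] labels, int[] counts) {
    for (int label : labels) {
      counts[label]++;
    }
  }

  public static void main(String[] args) {
    int[] labels = {0, 2, 2, 1, 2};
    int[] counts = new int[3]; // Java zero-initializes new int arrays
    countLabels(labels, counts);
    System.out.println(Arrays.toString(counts));
  }
}
```

Under that assumption, reusing the same array for a second call without resetting it would accumulate both sets of counts.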

getDataset

public Dataset getDataset()


Copyright © 2008–2014 The Apache Software Foundation. All rights reserved.