org.apache.mahout.classifier.df.data
Class Dataset

java.lang.Object
  extended by org.apache.mahout.classifier.df.data.Dataset

public class Dataset
extends Object

Contains information about the attributes.


Nested Class Summary
static class Dataset.Attribute
          Attribute types
 
Constructor Summary
protected Dataset()
           
 
Method Summary
 boolean equals(Object obj)
           
static Dataset fromJSON(String json)
          Deserializes an instance from a JSON string
 Dataset.Attribute getAttribute(int attr)
           
 int[] getIgnored()
           
 double getLabel(Instance instance)
           
 int getLabelId()
           
 String getLabelString(double code)
          Returns the label value in the data. This method can be used when the criterion variable is a categorical attribute.
 int hashCode()
           
 boolean isNumerical(int attr)
          Checks whether the given attribute is numerical
 int labelCode(String label)
          Returns the code used to represent the label value in the data
 String[] labels()
           
static Dataset load(org.apache.hadoop.conf.Configuration conf, org.apache.hadoop.fs.Path path)
          Loads the dataset from a file
 int nbAttributes()
           
 int nblabels()
           
 int nbValues(int attr)
           
 String toJSON()
          Serialize this instance to JSON
 String toString()
           
 int valueOf(int attr, String token)
          Converts a token to its corresponding integer code for a given attribute
 
Methods inherited from class java.lang.Object
clone, finalize, getClass, notify, notifyAll, wait, wait, wait
 

Constructor Detail

Dataset

protected Dataset()

Method Detail

nbValues

public int nbValues(int attr)

labels

public String[] labels()

nblabels

public int nblabels()

getLabelId

public int getLabelId()

getLabel

public double getLabel(Instance instance)

getAttribute

public Dataset.Attribute getAttribute(int attr)

labelCode

public int labelCode(String label)
Returns the code used to represent the label value in the data

Parameters:
label - label's value to code
Returns:
label's code

getLabelString

public String getLabelString(double code)
Returns the label value in the data. This method can be used when the criterion variable is a categorical attribute.

Parameters:
code - label's code
Returns:
label's value
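
The round trip between labelCode and getLabelString can be illustrated with a minimal sketch. It assumes that a label's code is simply its position in an array of label values; the class LabelTable and its field are hypothetical stand-ins for illustration, not Mahout's implementation.

```java
import java.util.Arrays;

// Hypothetical stand-in, not part of Mahout: illustrates mapping label
// values to integer codes and back, ASSUMING a code is the label's
// position in a fixed array of label values.
final class LabelTable {
    private final String[] labels;

    LabelTable(String... labels) {
        this.labels = labels.clone();
    }

    // Returns the code for a label, or -1 if the label is unknown.
    int labelCode(String label) {
        return Arrays.asList(labels).indexOf(label);
    }

    // Returns the label for a code; the parameter is a double to mirror
    // the documented signature of getLabelString.
    String getLabelString(double code) {
        return labels[(int) code];
    }

    public static void main(String[] args) {
        LabelTable table = new LabelTable("no", "yes");
        int code = table.labelCode("yes");
        System.out.println(code + " -> " + table.getLabelString(code));
    }
}
```

With a real Dataset, the same pattern would presumably be dataset.getLabelString(dataset.labelCode(label)), though the behavior for unknown labels is not documented here.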

toString

public String toString()
Overrides:
toString in class Object

valueOf

public int valueOf(int attr,
                   String token)
Converts a token to its corresponding integer code for a given attribute

Parameters:
attr - attribute index
token - token to convert
Returns:
integer code of the token for the given attribute
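
As a rough sketch of the token-to-code conversion, the following assumes each categorical attribute keeps its own list of known tokens and that a token's code is its position in that list. The class TokenCodes is hypothetical and does not reflect how Mahout stores attribute values.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in, not part of Mahout: each categorical attribute
// has its own list of known tokens, and a token's code is ASSUMED to be
// its position in that list.
final class TokenCodes {
    private final List<List<String>> valuesPerAttribute;

    TokenCodes(List<List<String>> valuesPerAttribute) {
        this.valuesPerAttribute = valuesPerAttribute;
    }

    // Converts a token to its integer code for the given attribute index;
    // returns -1 when the token is not among the attribute's values.
    int valueOf(int attr, String token) {
        return valuesPerAttribute.get(attr).indexOf(token);
    }

    public static void main(String[] args) {
        TokenCodes codes = new TokenCodes(Arrays.asList(
            Arrays.asList("sunny", "overcast", "rainy"),  // attribute 0
            Arrays.asList("weak", "strong")));            // attribute 1
        System.out.println(codes.valueOf(0, "rainy"));
        System.out.println(codes.valueOf(1, "weak"));
    }
}
```

Note that the same token can map to different codes under different attributes, which is why the attribute index is part of the lookup.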

getIgnored

public int[] getIgnored()

nbAttributes

public int nbAttributes()
Returns:
number of attributes

isNumerical

public boolean isNumerical(int attr)
Checks whether the given attribute is numerical.

Parameters:
attr - index of the attribute to check
Returns:
true if the attribute is numerical
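
A typical use of isNumerical together with nbAttributes is to collect the indices of all numerical attributes. The sketch below uses a hypothetical interface, AttributeInfo, that exposes only those two documented accessors so it can run without Mahout on the classpath; the helper class NumericalAttributes is likewise an illustration only.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical view exposing only the two documented accessors used
// below; it is not a Mahout type.
interface AttributeInfo {
    int nbAttributes();
    boolean isNumerical(int attr);
}

final class NumericalAttributes {
    // Collects the indices of all attributes reported as numerical.
    static List<Integer> indices(AttributeInfo info) {
        List<Integer> result = new ArrayList<>();
        for (int attr = 0; attr < info.nbAttributes(); attr++) {
            if (info.isNumerical(attr)) {
                result.add(attr);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Stub: attributes 0 and 2 are numerical, attribute 1 is not.
        final boolean[] numerical = {true, false, true};
        AttributeInfo stub = new AttributeInfo() {
            @Override public int nbAttributes() { return numerical.length; }
            @Override public boolean isNumerical(int attr) { return numerical[attr]; }
        };
        System.out.println(indices(stub)); // prints [0, 2]
    }
}
```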

equals

public boolean equals(Object obj)
Overrides:
equals in class Object

hashCode

public int hashCode()
Overrides:
hashCode in class Object

load

public static Dataset load(org.apache.hadoop.conf.Configuration conf,
                           org.apache.hadoop.fs.Path path)
                    throws IOException
Loads the dataset from a file

Throws:
IOException

toJSON

public String toJSON()
Serialize this instance to JSON

Returns:
the JSON representation of this instance

fromJSON

public static Dataset fromJSON(String json)
Deserializes an instance from a JSON string

Parameters:
json - JSON string from which the instance is created
Returns:
the deserialized Dataset


Copyright © 2008–2014 The Apache Software Foundation. All rights reserved.