org.apache.mahout.cf.taste.impl.similarity.file
Class FileItemSimilarity

java.lang.Object
  extended by org.apache.mahout.cf.taste.impl.similarity.file.FileItemSimilarity
All Implemented Interfaces:
Refreshable, ItemSimilarity

public class FileItemSimilarity
extends Object
implements ItemSimilarity

An ItemSimilarity backed by a comma-delimited file. Each line of the file is expected to contain an item ID, a second item ID, and a similarity value, separated by commas. Tabs may be used as the delimiter instead.

The similarity value is assumed to be parseable as a double with a value between -1 and 1. The item IDs are parsed as longs. Similarities are symmetric, so each pair of items needs only one line in the file.
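As an illustration of the line format described above, the following is a minimal sketch of a parser written against the JDK only. It is not this class's own code: SimilarityFileSketch, key, parse and lookup are hypothetical names, and skipping blank lines and rejecting out-of-range values are choices of the sketch rather than documented behavior.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Illustrative parser for "itemID1,itemID2,similarity" lines; not Mahout code. */
final class SimilarityFileSketch {

    /** Orders the pair so that (a, b) and (b, a) share one key (symmetry). */
    static String key(long a, long b) {
        return Math.min(a, b) + ":" + Math.max(a, b);
    }

    /** Parses lines whose three fields are separated by commas or tabs. */
    static Map<String, Double> parse(List<String> lines) {
        Map<String, Double> similarities = new HashMap<>();
        for (String line : lines) {
            if (line.trim().isEmpty()) {
                continue; // assumption of this sketch: blank lines are skipped
            }
            String[] fields = line.trim().split("[,\t]");
            long itemID1 = Long.parseLong(fields[0].trim());
            long itemID2 = Long.parseLong(fields[1].trim());
            double value = Double.parseDouble(fields[2].trim());
            if (value < -1.0 || value > 1.0) {
                // assumption of this sketch: out-of-range values are rejected
                throw new IllegalArgumentException("similarity outside [-1,1]: " + line);
            }
            similarities.put(key(itemID1, itemID2), value);
        }
        return similarities;
    }

    /** Returns the value for either ordering of the pair, or NaN if the pair is absent. */
    static double lookup(Map<String, Double> similarities, long a, long b) {
        return similarities.getOrDefault(key(a, b), Double.NaN);
    }

    public static void main(String[] args) {
        Map<String, Double> sims = parse(Arrays.asList("1,2,0.5", "2\t3\t-0.25"));
        System.out.println(lookup(sims, 2, 1)); // same pair as "1,2", reversed
        System.out.println(lookup(sims, 1, 3)); // pair not present in the data
    }
}
```

Because the key is built from the smaller and larger ID, a single line per pair answers lookups in both directions.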

This class reloads data from the data file when refresh(Collection) is called, unless the file has already been reloaded within the minimum reload interval (minReloadIntervalMS).
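The reload rule above can be pictured as a simple time-based throttle. The sketch below is an assumption about the general idea only: ReloadThrottleSketch and tryReload are hypothetical names, and the actual class may use a different criterion, such as the file's modification time.

```java
/** Hypothetical helper: permits a reload only after a minimum interval has passed. */
final class ReloadThrottleSketch {

    private final long minReloadIntervalMS;
    private long lastReloadMS;
    private boolean loadedOnce;

    ReloadThrottleSketch(long minReloadIntervalMS) {
        this.minReloadIntervalMS = minReloadIntervalMS;
    }

    /**
     * Returns true, and records nowMS as the last reload time, if a reload is
     * permitted at nowMS. Returns false if the previous reload was too recent.
     */
    synchronized boolean tryReload(long nowMS) {
        if (loadedOnce && nowMS - lastReloadMS < minReloadIntervalMS) {
            return false; // reloaded too recently; keep the current data
        }
        loadedOnce = true;
        lastReloadMS = nowMS;
        return true;
    }

    public static void main(String[] args) {
        ReloadThrottleSketch throttle = new ReloadThrottleSketch(1000L);
        System.out.println(throttle.tryReload(0L));    // first load is always allowed
        System.out.println(throttle.tryReload(500L));  // within the interval
        System.out.println(throttle.tryReload(1000L)); // interval has elapsed
    }
}
```

Passing the current time as a parameter keeps the rule easy to exercise without waiting on a real clock.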

This class is not intended for use with very large amounts of data. For that, a JDBC-backed ItemSimilarity and a database are more appropriate.


Field Summary
static long DEFAULT_MIN_RELOAD_INTERVAL_MS
           
 
Constructor Summary
FileItemSimilarity(File dataFile)
           
FileItemSimilarity(File dataFile, long minReloadIntervalMS)
           
 
Method Summary
 long[] allSimilarItemIDs(long itemID)
           
 double[] itemSimilarities(long itemID1, long[] itemID2s)
          A bulk-get version of ItemSimilarity.itemSimilarity(long, long).
 double itemSimilarity(long itemID1, long itemID2)
           Returns the degree of similarity of two items, based on the preferences that users have expressed for the items.
 void refresh(Collection<Refreshable> alreadyRefreshed)
           Triggers a "refresh" of the implementation; what that entails depends on the implementing class.
protected  void reload()
           
 String toString()
           
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Field Detail

DEFAULT_MIN_RELOAD_INTERVAL_MS

public static final long DEFAULT_MIN_RELOAD_INTERVAL_MS
See Also:
Constant Field Values
Constructor Detail

FileItemSimilarity

public FileItemSimilarity(File dataFile)
Parameters:
dataFile - file containing the similarity data

FileItemSimilarity

public FileItemSimilarity(File dataFile,
                          long minReloadIntervalMS)
Parameters:
dataFile - file containing the similarity data
minReloadIntervalMS - the minimum interval, in milliseconds, after which a full reload of the original data file is performed when refresh() is called
See Also:
FileItemSimilarity(File)
Method Detail

itemSimilarities

public double[] itemSimilarities(long itemID1,
                                 long[] itemID2s)
                          throws TasteException
Description copied from interface: ItemSimilarity

A bulk-get version of ItemSimilarity.itemSimilarity(long, long).

Specified by:
itemSimilarities in interface ItemSimilarity
Parameters:
itemID1 - first item ID
itemID2s - second item IDs to compute similarity with
Returns:
similarity between itemID1 and other items
Throws:
NoSuchItemException - if any item is known to be non-existent in the data
TasteException - if an error occurs while accessing the data
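
To make the "bulk-get" relationship concrete, the sketch below expresses a bulk lookup as one pairwise lookup per entry of itemID2s. It only illustrates the contract and is not this class's implementation: BulkSimilaritySketch and PairSimilarity are hypothetical names standing in for any pairwise similarity function.

```java
/** Illustrative bulk lookup built on a pairwise similarity function; not Mahout code. */
final class BulkSimilaritySketch {

    /** Hypothetical stand-in for a pairwise similarity such as itemSimilarity(long, long). */
    interface PairSimilarity {
        double similarity(long itemID1, long itemID2);
    }

    /** Returns one similarity per entry of itemID2s, in the same order. */
    static double[] similarities(PairSimilarity pairwise, long itemID1, long[] itemID2s) {
        double[] result = new double[itemID2s.length];
        for (int i = 0; i < itemID2s.length; i++) {
            result[i] = pairwise.similarity(itemID1, itemID2s[i]);
        }
        return result;
    }

    public static void main(String[] args) {
        // Toy similarity for demonstration only: identical IDs score 1.0, others 0.5.
        PairSimilarity toy = (a, b) -> a == b ? 1.0 : 0.5;
        double[] values = similarities(toy, 1L, new long[] {1L, 2L});
        System.out.println(java.util.Arrays.toString(values));
    }
}
```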

allSimilarItemIDs

public long[] allSimilarItemIDs(long itemID)
                         throws TasteException
Specified by:
allSimilarItemIDs in interface ItemSimilarity
Returns:
all IDs of similar items, in no particular order
Throws:
TasteException

itemSimilarity

public double itemSimilarity(long itemID1,
                             long itemID2)
                      throws TasteException
Description copied from interface: ItemSimilarity

Returns the degree of similarity of two items, based on the preferences that users have expressed for the items.

Specified by:
itemSimilarity in interface ItemSimilarity
Parameters:
itemID1 - first item ID
itemID2 - second item ID
Returns:
similarity between the items, in [-1,1], or Double.NaN if the similarity is unknown
Throws:
NoSuchItemException - if either item is known to be non-existent in the data
TasteException - if an error occurs while accessing the data

refresh

public void refresh(Collection<Refreshable> alreadyRefreshed)
Description copied from interface: Refreshable

Triggers a "refresh" of the implementation; what that entails depends on the implementing class. The general contract is that any Refreshable should always leave itself in a consistent, operational state, and that the refresh atomically updates internal state from old to new.

Specified by:
refresh in interface Refreshable
Parameters:
alreadyRefreshed - Refreshables that are known to have already been refreshed as a result of an initial call to a Refreshable.refresh(Collection) method on some object. This ensures that objects in a refresh dependency graph aren't refreshed twice needlessly.

reload

protected void reload()

toString

public String toString()
Overrides:
toString in class Object


Copyright © 2008–2014 The Apache Software Foundation. All rights reserved.