org.apache.mahout.cf.taste.impl.similarity
Class EuclideanDistanceSimilarity

java.lang.Object
  extended by org.apache.mahout.cf.taste.impl.similarity.AbstractItemSimilarity
      extended by org.apache.mahout.cf.taste.impl.similarity.EuclideanDistanceSimilarity
All Implemented Interfaces:
Refreshable, ItemSimilarity, UserSimilarity

public final class EuclideanDistanceSimilarity
extends AbstractItemSimilarity

An implementation of a "similarity" based on the Euclidean "distance" between two users X and Y. Thinking of items as dimensions and preferences as points along those dimensions, a distance is computed using all items (dimensions) where both users have expressed a preference for that item. This is simply the square root of the sum of the squares of differences in position (preference) along each dimension.

A similarity could be computed as 1 / (1 + distance), which yields values in the range (0,1]. That formula, however, penalizes pairs that overlap in more dimensions, since more dimensions offer more opportunities to be farther apart, even though greater overlap should indicate more similarity. To help correct for this, the similarity is instead computed as sqrt(n) / (1 + distance), where n is the number of dimensions in which both users have expressed a preference. sqrt(n) is chosen because the distance between randomly chosen points grows as sqrt(n).

Note that this could cause a similarity to exceed 1; such values are capped at 1.

Note that the distance isn't normalized in any way; it's not valid to compare similarities computed from different domains (different rating scales, for example). Within one domain, normalizing doesn't matter much as it doesn't change ordering.
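
The computation described above can be sketched in plain Java without the Mahout classes. This is a minimal illustration under stated assumptions, not the library's source: the class name EuclideanSimilaritySketch, the similarity method, and the use of a Map from item ID to preference value are all hypothetical, and returning Double.NaN when the two users share no items is an assumption borrowed from the "similarity is unknown" wording in the method documentation.

```java
import java.util.Map;

// Hypothetical stand-alone sketch; not part of Mahout.
final class EuclideanSimilaritySketch {

    /**
     * Computes sqrt(n) / (1 + distance), capped at 1, over the items that
     * both users have rated. Each map goes from item ID to preference value.
     */
    static double similarity(Map<Long, Double> x, Map<Long, Double> y) {
        double sumOfSquares = 0.0;
        int n = 0; // number of items (dimensions) rated by both users
        for (Map.Entry<Long, Double> entry : x.entrySet()) {
            Double other = y.get(entry.getKey());
            if (other != null) {
                double diff = entry.getValue() - other;
                sumOfSquares += diff * diff;
                n++;
            }
        }
        if (n == 0) {
            // Assumption: no overlap means the similarity is unknown.
            return Double.NaN;
        }
        double distance = Math.sqrt(sumOfSquares);
        double similarity = Math.sqrt(n) / (1.0 + distance);
        // The sqrt(n) correction can push the value above 1, so cap it.
        return Math.min(similarity, 1.0);
    }

    public static void main(String[] args) {
        Map<Long, Double> x = Map.of(1L, 5.0, 2L, 3.0, 3L, 1.0);
        Map<Long, Double> y = Map.of(1L, 4.0, 2L, 3.0, 4L, 2.0);
        // Shared items are 1 and 2: distance = 1, n = 2, so about 0.707.
        System.out.println(similarity(x, y));
        // Identical users: distance = 0, sqrt(3) / 1 exceeds 1 and is capped.
        System.out.println(similarity(x, x));
    }
}
```

In the first call only items 1 and 2 are shared, so the uncorrected 1 / (1 + distance) would give 0.5; the sqrt(n) factor raises that to roughly 0.707.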


Constructor Summary
EuclideanDistanceSimilarity(DataModel dataModel)
           
EuclideanDistanceSimilarity(DataModel dataModel, Weighting weighting)
           
 
Method Summary
 double[] itemSimilarities(long itemID1, long[] itemID2s)
          A bulk-get version of ItemSimilarity.itemSimilarity(long, long).
 double itemSimilarity(long itemID1, long itemID2)
           Returns the degree of similarity of two items, based on the preferences that users have expressed for the items.
 void refresh(Collection<Refreshable> alreadyRefreshed)
           Triggers "refresh" -- whatever that means -- of the implementation.
 void setPreferenceInferrer(PreferenceInferrer inferrer)
           Attaches a PreferenceInferrer to the UserSimilarity implementation.
 String toString()
           
 double userSimilarity(long userID1, long userID2)
           Returns the degree of similarity of two users, based on their preferences.
 
Methods inherited from class org.apache.mahout.cf.taste.impl.similarity.AbstractItemSimilarity
allSimilarItemIDs, getDataModel
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Constructor Detail

EuclideanDistanceSimilarity

public EuclideanDistanceSimilarity(DataModel dataModel)
                            throws TasteException
Throws:
IllegalArgumentException - if DataModel does not have preference values
TasteException

EuclideanDistanceSimilarity

public EuclideanDistanceSimilarity(DataModel dataModel,
                                   Weighting weighting)
                            throws TasteException
Throws:
IllegalArgumentException - if DataModel does not have preference values
TasteException

Method Detail

setPreferenceInferrer

public final void setPreferenceInferrer(PreferenceInferrer inferrer)
Description copied from interface: UserSimilarity

Attaches a PreferenceInferrer to the UserSimilarity implementation.

Specified by:
setPreferenceInferrer in interface UserSimilarity
Parameters:
inferrer - PreferenceInferrer

userSimilarity

public double userSimilarity(long userID1,
                             long userID2)
                      throws TasteException
Description copied from interface: UserSimilarity

Returns the degree of similarity of two users, based on their preferences.

Specified by:
userSimilarity in interface UserSimilarity
Parameters:
userID1 - first user ID
userID2 - second user ID
Returns:
similarity between the users, in [-1,1], or Double.NaN if the similarity is unknown
Throws:
NoSuchUserException - if either user is known to be non-existent in the data
TasteException - if an error occurs while accessing the data

itemSimilarity

public final double itemSimilarity(long itemID1,
                                   long itemID2)
                            throws TasteException
Description copied from interface: ItemSimilarity

Returns the degree of similarity of two items, based on the preferences that users have expressed for the items.

Specified by:
itemSimilarity in interface ItemSimilarity
Parameters:
itemID1 - first item ID
itemID2 - second item ID
Returns:
similarity between the items, in [-1,1], or Double.NaN if the similarity is unknown
Throws:
NoSuchItemException - if either item is known to be non-existent in the data
TasteException - if an error occurs while accessing the data

itemSimilarities

public double[] itemSimilarities(long itemID1,
                                 long[] itemID2s)
                          throws TasteException
Description copied from interface: ItemSimilarity

A bulk-get version of ItemSimilarity.itemSimilarity(long, long).

Specified by:
itemSimilarities in interface ItemSimilarity
Parameters:
itemID1 - first item ID
itemID2s - second item IDs to compute similarity with
Returns:
similarity between itemID1 and other items
Throws:
NoSuchItemException - if any item is known to be non-existent in the data
TasteException - if an error occurs while accessing the data
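
One way to read the bulk-get contract is as a loop over the single-pair method. The sketch below illustrates that reading and is not Mahout's implementation: the PairSimilarity interface and the bulk helper are hypothetical names, and the assumption that result[i] corresponds to itemID2s[i] is not stated in the text above.

```java
// Hypothetical stand-alone sketch; not part of Mahout.
final class BulkSimilaritySketch {

    /** Stand-in for a single-pair similarity such as itemSimilarity(long, long). */
    interface PairSimilarity {
        double similarity(long id1, long id2);
    }

    /**
     * Computes the similarity of itemID1 to each ID in itemID2s.
     * Assumption: results are returned in the same order as itemID2s.
     */
    static double[] bulk(PairSimilarity single, long itemID1, long[] itemID2s) {
        double[] result = new double[itemID2s.length];
        for (int i = 0; i < itemID2s.length; i++) {
            result[i] = single.similarity(itemID1, itemID2s[i]);
        }
        return result;
    }

    public static void main(String[] args) {
        // Toy similarity used only for the demonstration.
        PairSimilarity toy = (a, b) -> 1.0 / (1.0 + Math.abs(a - b));
        double[] sims = bulk(toy, 10L, new long[] {10L, 11L, 13L});
        System.out.println(java.util.Arrays.toString(sims));
    }
}
```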

refresh

public final void refresh(Collection<Refreshable> alreadyRefreshed)
Description copied from interface: Refreshable

Triggers "refresh" -- whatever that means -- of the implementation. The general contract is that any Refreshable should always leave itself in a consistent, operational state, and that the refresh atomically updates internal state from old to new.

Specified by:
refresh in interface Refreshable
Overrides:
refresh in class AbstractItemSimilarity
Parameters:
alreadyRefreshed - Refreshables that are known to have already been refreshed as a result of an initial call to a Refreshable.refresh(Collection) method on some object. This ensures that objects in a refresh dependency graph aren't refreshed twice needlessly.
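
The role of alreadyRefreshed can be illustrated with a small stand-alone sketch. Everything here is hypothetical and only mirrors the shape of the contract: Component stands in for Refreshable, Node is an invented class that counts its own refreshes, and treating a null collection as "nothing refreshed yet" is a convenience of the sketch rather than documented behavior.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;

// Hypothetical stand-alone sketch; not part of Mahout.
final class RefreshSketch {

    /** Stand-in for an interface shaped like Refreshable. */
    interface Component {
        void refresh(Collection<Component> alreadyRefreshed);
    }

    /** A component that refreshes its dependencies first, then itself. */
    static final class Node implements Component {
        final String name;
        final List<Component> dependencies = new ArrayList<>();
        int refreshCount = 0;

        Node(String name) {
            this.name = name;
        }

        @Override
        public void refresh(Collection<Component> alreadyRefreshed) {
            // Sketch-only convenience: start a fresh record if none was passed.
            Collection<Component> seen =
                alreadyRefreshed == null ? new HashSet<>() : alreadyRefreshed;
            for (Component dependency : dependencies) {
                // Skip anything another path through the graph already handled.
                if (!seen.contains(dependency)) {
                    seen.add(dependency);
                    dependency.refresh(seen);
                }
            }
            refreshCount++; // stands in for rebuilding this node's own state
        }
    }

    public static void main(String[] args) {
        // Diamond-shaped graph: a depends on b and c, which both depend on d.
        Node a = new Node("a");
        Node b = new Node("b");
        Node c = new Node("c");
        Node d = new Node("d");
        a.dependencies.add(b);
        a.dependencies.add(c);
        b.dependencies.add(d);
        c.dependencies.add(d);

        a.refresh(new HashSet<>());
        System.out.println("d was refreshed " + d.refreshCount + " time(s)");
    }
}
```

Because the same collection is passed down every path, the shared dependency d is reached through both b and c but refreshed only once.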

toString

public final String toString()
Overrides:
toString in class Object


Copyright © 2008–2014 The Apache Software Foundation. All rights reserved.