Approximate Nearest Neighbours

This library supports using a couple of different approximate nearest neighbours libraries to speed up the recommend and similar_items methods of any Matrix Factorization model.

The potential speedup of using these methods can be quite significant, at the risk of potentially missing relevant results:

See this post comparing the different ANN libraries for more details.

NMSLibModel

class implicit.ann.nmslib.NMSLibModel(model, approximate_similar_items=True, approximate_recommend=True, method='hnsw', index_params=None, query_params=None, **kwargs)

Bases: implicit.recommender_base.RecommenderBase

Speeds up inference calls to MatrixFactorization models by using NMSLib to create approximate nearest neighbours indices of the latent factors.

Parameters

model (MatrixFactorizationBase) – A matrix factorization model to use for the factors
method (str, optional) – The NMSLib method to use
index_params (dict, optional) – Optional params to send to the createIndex call in NMSLib
query_params (dict, optional) – Optional query time params for the NMSLib ‘setQueryTimeParams’ call
approximate_similar_items (bool, optional) – whether or not to build an NMSLIB index for computing similar_items
approximate_recommend (bool, optional) – whether or not to build an NMSLIB index for the recommend call

similar_items_index

NMSLib index for looking up similar items in the cosine space formed by the latent item_factors

Type: nmslib.FloatIndex

recommend_index

NMSLib index for looking up similar items in the inner product space formed by the latent item_factors

Type: nmslib.FloatIndex

fit(Cui, show_progress=True, callback=None)

Trains the model on a sparse matrix of user/item/weight

Parameters

user_items (csr_matrix) – A sparse CSR matrix of shape (number_of_users, number_of_items). The nonzero entries in this matrix are the items that are liked by each user. The values are how confident you are that the item is liked by the user.
show_progress (bool, optional) – Whether to show a progress bar during fitting
callback (Callable, optional) – Callable function on each epoch with such arguments as epoch, elapsed time and progress

similar_items(itemid, N=10, recalculate_item=False, item_users=None, filter_items=None, items=None)

Calculates a list of similar items

Parameters

itemid (Union[int, array_like]) – The item id or an array of item ids to retrieve similar items for
N (int, optional) – The number of similar items to return
recalculate_item (bool, optional) – When true, don’t rely on stored item state and instead recalculate from the passed in item_users
item_users (csr_matrix, optional) – A sparse matrix of shape (itemid, number_users). This lets us look up the users for each item. This is only needs to be set when setting recalculate_item=True. This should have the same number of rows as the itemid parameter, with the first row of the sparse matrix corresponding to the first item in the itemid array.
filter_items (array_like, optional) – An array of item ids to filter out from the results being returned
items (array_like, optional) – An array of item ids to include in the output. If not set all items in the training set will be included. Cannot be used with the filter_items options

Returns

Tuple of (itemids, scores) arrays

Return type

tuple

recommend(userid, user_items, N=10, filter_already_liked_items=True, filter_items=None, recalculate_user=False, items=None)

Recommends items for users.

This method allows you to calculate the top N recommendations for a user or batch of users. Passing an array of userids instead of a single userid will tend to be more efficient, and allows multi-thread processing on the CPU.

This method has options for filtering out items from the results. You can both filter out items that have already been liked by the user with the filter_already_liked_items parameter, as well as pass in filter_items to filter out other items for all users in the batch. By default all items in the training dataset are scored, but by setting the ‘items’ parameter you can restrict down to a subset.

Example usage:

# calculate the top recommendations for a single user
ids, scores = model.recommend(0, user_items[0])

# calculate the top recommendations for a batch of users
userids = np.arange(10)
ids, scores = model.recommend(userids, user_items[userids])

Parameters

userid (Union[int, array_like]) – The userid or array of userids to calculate recommendations for
user_items (csr_matrix) – A sparse matrix of shape (users, number_items). This lets us look up the liked items and their weights for the user. This is used to filter out items that have already been liked from the output, and to also potentially recalculate the user representation. Each row in this sparse matrix corresponds to a row in the userid parameter: that is the first row in this matrix contains the liked items for the first user in the userid array.
N (int, optional) – The number of results to return
filter_already_liked_items (bool, optional) – When true, don’t return items present in the training set that were rated by the specified user.
filter_items (array_like, optional) – List of extra item ids to filter out from the output
recalculate_user (bool, optional) – When true, don’t rely on stored user embeddings and instead recalculate from the passed in user_items. This option isn’t supported by all models.
items (array_like, optional) – Array of extra item ids. When set this will only rank the items in this array instead of ranking every item the model was fit for. This parameter cannot be used with filter_items

Returns

Tuple of (itemids, scores) arrays. When calculating for a single user these array will be 1-dimensional with N items. When passed an array of userids, these will be 2-dimensional arrays with a row for each user.

Return type

tuple

similar_users(userid, N=10, filter_users=None, users=None)

Calculates the most similar users for a userid or array of userids

Parameters

userid (Union[int, array_like]) – The userid or an array of userids to retrieve similar users for.
N (int, optional) – The number of similar users to return
filter_users (array_like, optional) – An array of user ids to filter out from the results being returned
users (array_like, optional) – An array of user ids to include in the output. If not set all users in the training set will be included. Cannot be used with the filter_users options

Returns

Tuple of (userids, scores) arrays

Return type

tuple

save(file)

Saves the model to a file, using the numpy .npz format

Parameters: file (str or io.IOBase) – Either the filename or an open file-like object to save the model to

See also

load, numpy.savez

classmethod load(file)

Loads the model from a file

Parameters: fileobj_or_path (str or io.IOBase) – Either the filename or an open file-like object to load the model from
Returns: The model loaded up from disk
Return type: RecommenderBase

See also

save, numpy.load

AnnoyModel

class implicit.ann.annoy.AnnoyModel(model, approximate_similar_items=True, approximate_recommend=True, n_trees=50, search_k=- 1)

Bases: implicit.recommender_base.RecommenderBase

Speeds up inference calls to MatrixFactorization models by using an Annoy index to calculate similar items and recommend items.

Parameters

model (MatrixFactorizationBase) – A matrix factorization model to use for the factors
n_trees (int, optional) – The number of trees to use when building the Annoy index. More trees gives higher precision when querying.
search_k (int, optional) – Provides a way to search more trees at runtime, giving the ability to have more accurate results at the cost of taking more time.
approximate_similar_items (bool, optional) – whether or not to build an Annoy index for computing similar_items
approximate_recommend (bool, optional) – whether or not to build an Annoy index for the recommend call

similar_items_index

Annoy index for looking up similar items in the cosine space formed by the latent item_factors

Type: annoy.AnnoyIndex

recommend_index

Annoy index for looking up similar items in the inner product space formed by the latent item_factors

Type: annoy.AnnoyIndex

fit(Cui, show_progress=True, callback=None)

Trains the model on a sparse matrix of user/item/weight

Parameters

user_items (csr_matrix) – A sparse CSR matrix of shape (number_of_users, number_of_items). The nonzero entries in this matrix are the items that are liked by each user. The values are how confident you are that the item is liked by the user.
show_progress (bool, optional) – Whether to show a progress bar during fitting
callback (Callable, optional) – Callable function on each epoch with such arguments as epoch, elapsed time and progress

similar_items(itemid, N=10, recalculate_item=False, item_users=None, filter_items=None, items=None)

Calculates a list of similar items

Parameters

itemid (Union[int, array_like]) – The item id or an array of item ids to retrieve similar items for
N (int, optional) – The number of similar items to return
recalculate_item (bool, optional) – When true, don’t rely on stored item state and instead recalculate from the passed in item_users
item_users (csr_matrix, optional) – A sparse matrix of shape (itemid, number_users). This lets us look up the users for each item. This is only needs to be set when setting recalculate_item=True. This should have the same number of rows as the itemid parameter, with the first row of the sparse matrix corresponding to the first item in the itemid array.
filter_items (array_like, optional) – An array of item ids to filter out from the results being returned
items (array_like, optional) – An array of item ids to include in the output. If not set all items in the training set will be included. Cannot be used with the filter_items options

Returns

Tuple of (itemids, scores) arrays

Return type

tuple

recommend(userid, user_items, N=10, filter_already_liked_items=True, filter_items=None, recalculate_user=False, items=None)

Recommends items for users.

This method allows you to calculate the top N recommendations for a user or batch of users. Passing an array of userids instead of a single userid will tend to be more efficient, and allows multi-thread processing on the CPU.

This method has options for filtering out items from the results. You can both filter out items that have already been liked by the user with the filter_already_liked_items parameter, as well as pass in filter_items to filter out other items for all users in the batch. By default all items in the training dataset are scored, but by setting the ‘items’ parameter you can restrict down to a subset.

Example usage:

# calculate the top recommendations for a single user
ids, scores = model.recommend(0, user_items[0])

# calculate the top recommendations for a batch of users
userids = np.arange(10)
ids, scores = model.recommend(userids, user_items[userids])

Parameters

userid (Union[int, array_like]) – The userid or array of userids to calculate recommendations for
user_items (csr_matrix) – A sparse matrix of shape (users, number_items). This lets us look up the liked items and their weights for the user. This is used to filter out items that have already been liked from the output, and to also potentially recalculate the user representation. Each row in this sparse matrix corresponds to a row in the userid parameter: that is the first row in this matrix contains the liked items for the first user in the userid array.
N (int, optional) – The number of results to return
filter_already_liked_items (bool, optional) – When true, don’t return items present in the training set that were rated by the specified user.
filter_items (array_like, optional) – List of extra item ids to filter out from the output
recalculate_user (bool, optional) – When true, don’t rely on stored user embeddings and instead recalculate from the passed in user_items. This option isn’t supported by all models.
items (array_like, optional) – Array of extra item ids. When set this will only rank the items in this array instead of ranking every item the model was fit for. This parameter cannot be used with filter_items

Returns

Tuple of (itemids, scores) arrays. When calculating for a single user these array will be 1-dimensional with N items. When passed an array of userids, these will be 2-dimensional arrays with a row for each user.

Return type

tuple

similar_users(userid, N=10, filter_users=None, users=None)

Calculates the most similar users for a userid or array of userids

Parameters

userid (Union[int, array_like]) – The userid or an array of userids to retrieve similar users for.
N (int, optional) – The number of similar users to return
filter_users (array_like, optional) – An array of user ids to filter out from the results being returned
users (array_like, optional) – An array of user ids to include in the output. If not set all users in the training set will be included. Cannot be used with the filter_users options

Returns

Tuple of (userids, scores) arrays

Return type

tuple

save(file)

Saves the model to a file, using the numpy .npz format

Parameters: file (str or io.IOBase) – Either the filename or an open file-like object to save the model to

See also

load, numpy.savez

classmethod load(file)

Loads the model from a file

Parameters: fileobj_or_path (str or io.IOBase) – Either the filename or an open file-like object to load the model from
Returns: The model loaded up from disk
Return type: RecommenderBase

See also

save, numpy.load

FaissModel

class implicit.ann.faiss.FaissModel(model, approximate_similar_items=True, approximate_recommend=True, nlist=400, nprobe=20, use_gpu=True)

Bases: implicit.recommender_base.RecommenderBase

Speeds up inference calls to MatrixFactorization models by using Faiss to create approximate nearest neighbours indices of the latent factors.

Parameters

model (MatrixFactorizationBase) – A matrix factorization model to use for the factors
nlist (int, optional) – The number of cells to use when building the Faiss index.
nprobe (int, optional) – The number of cells to visit to perform a search.
use_gpu (bool, optional) – Whether or not to enable run Faiss on the GPU. Requires faiss to have been built with GPU support.
approximate_similar_items (bool, optional) – whether or not to build an Faiss index for computing similar_items
approximate_recommend (bool, optional) – whether or not to build an Faiss index for the recommend call

similar_items_index

Faiss index for looking up similar items in the cosine space formed by the latent item_factors

Type: faiss.IndexIVFFlat

recommend_index

Faiss index for looking up similar items in the inner product space formed by the latent item_factors

Type: faiss.IndexIVFFlat

fit(Cui, show_progress=True, callback=None)

Trains the model on a sparse matrix of user/item/weight

Parameters

user_items (csr_matrix) – A sparse CSR matrix of shape (number_of_users, number_of_items). The nonzero entries in this matrix are the items that are liked by each user. The values are how confident you are that the item is liked by the user.
show_progress (bool, optional) – Whether to show a progress bar during fitting
callback (Callable, optional) – Callable function on each epoch with such arguments as epoch, elapsed time and progress

similar_items(itemid, N=10, recalculate_item=False, item_users=None, filter_items=None, items=None)

Calculates a list of similar items

Parameters

itemid (Union[int, array_like]) – The item id or an array of item ids to retrieve similar items for
N (int, optional) – The number of similar items to return
recalculate_item (bool, optional) – When true, don’t rely on stored item state and instead recalculate from the passed in item_users
item_users (csr_matrix, optional) – A sparse matrix of shape (itemid, number_users). This lets us look up the users for each item. This is only needs to be set when setting recalculate_item=True. This should have the same number of rows as the itemid parameter, with the first row of the sparse matrix corresponding to the first item in the itemid array.
filter_items (array_like, optional) – An array of item ids to filter out from the results being returned
items (array_like, optional) – An array of item ids to include in the output. If not set all items in the training set will be included. Cannot be used with the filter_items options

Returns

Tuple of (itemids, scores) arrays

Return type

tuple

recommend(userid, user_items, N=10, filter_already_liked_items=True, filter_items=None, recalculate_user=False, items=None)

Recommends items for users.

This method allows you to calculate the top N recommendations for a user or batch of users. Passing an array of userids instead of a single userid will tend to be more efficient, and allows multi-thread processing on the CPU.

This method has options for filtering out items from the results. You can both filter out items that have already been liked by the user with the filter_already_liked_items parameter, as well as pass in filter_items to filter out other items for all users in the batch. By default all items in the training dataset are scored, but by setting the ‘items’ parameter you can restrict down to a subset.

Example usage:

# calculate the top recommendations for a single user
ids, scores = model.recommend(0, user_items[0])

# calculate the top recommendations for a batch of users
userids = np.arange(10)
ids, scores = model.recommend(userids, user_items[userids])

Parameters

userid (Union[int, array_like]) – The userid or array of userids to calculate recommendations for
user_items (csr_matrix) – A sparse matrix of shape (users, number_items). This lets us look up the liked items and their weights for the user. This is used to filter out items that have already been liked from the output, and to also potentially recalculate the user representation. Each row in this sparse matrix corresponds to a row in the userid parameter: that is the first row in this matrix contains the liked items for the first user in the userid array.
N (int, optional) – The number of results to return
filter_already_liked_items (bool, optional) – When true, don’t return items present in the training set that were rated by the specified user.
filter_items (array_like, optional) – List of extra item ids to filter out from the output
recalculate_user (bool, optional) – When true, don’t rely on stored user embeddings and instead recalculate from the passed in user_items. This option isn’t supported by all models.
items (array_like, optional) – Array of extra item ids. When set this will only rank the items in this array instead of ranking every item the model was fit for. This parameter cannot be used with filter_items

Returns

Tuple of (itemids, scores) arrays. When calculating for a single user these array will be 1-dimensional with N items. When passed an array of userids, these will be 2-dimensional arrays with a row for each user.

Return type

tuple

similar_users(userid, N=10, filter_users=None, users=None)

Calculates the most similar users for a userid or array of userids

Parameters

userid (Union[int, array_like]) – The userid or an array of userids to retrieve similar users for.
N (int, optional) – The number of similar users to return
filter_users (array_like, optional) – An array of user ids to filter out from the results being returned
users (array_like, optional) – An array of user ids to include in the output. If not set all users in the training set will be included. Cannot be used with the filter_users options

Returns

Tuple of (userids, scores) arrays

Return type

tuple

save(file)

Saves the model to a file, using the numpy .npz format

Parameters: file (str or io.IOBase) – Either the filename or an open file-like object to save the model to

See also

load, numpy.savez

classmethod load(file)

Loads the model from a file

Parameters: fileobj_or_path (str or io.IOBase) – Either the filename or an open file-like object to load the model from
Returns: The model loaded up from disk
Return type: RecommenderBase

See also

save, numpy.load