AlternatingLeastSquares

class implicit.cpu.als.AlternatingLeastSquares(factors=100, regularization=0.01, alpha=1.0, dtype=<class 'numpy.float32'>, use_native=True, use_cg=True, iterations=15, calculate_training_loss=False, num_threads=0, random_state=None)

Bases: implicit.cpu.matrix_factorization_base.MatrixFactorizationBase

Alternating Least Squares

A recommendation model based on the algorithms described in the paper ‘Collaborative Filtering for Implicit Feedback Datasets’, with performance optimizations described in ‘Applications of the Conjugate Gradient Method for Implicit Feedback Collaborative Filtering.’

Parameters

factors (int, optional) – The number of latent factors to compute
regularization (float, optional) – The regularization factor to use
alpha (float, optional) – The weight to give to positive examples.
dtype (data-type, optional) – Specifies whether to generate 64 bit or 32 bit floating point factors
use_native (bool, optional) – Use native extensions to speed up model fitting
use_cg (bool, optional) – Use a faster Conjugate Gradient solver to calculate factors
iterations (int, optional) – The number of ALS iterations to use when fitting data
calculate_training_loss (bool, optional) – Whether to log out the training loss at each iteration
num_threads (int, optional) – The number of threads to use for fitting the model and batch recommend calls. Specifying 0 means to default to the number of cores on the machine.
random_state (int, numpy.random.RandomState or None, optional) – The random state for seeding the initial item and user factors. Default is None.

item_factors

Array of latent factors for each item in the training set

Type: ndarray

user_factors

Array of latent factors for each user in the training set

Type: ndarray

fit(user_items, show_progress=True, callback=None)

Factorizes the user_items matrix.

After calling this method, the members ‘user_factors’ and ‘item_factors’ will be initialized with a latent factor model of the input data.

The user_items matrix does double duty here. It defines which items are liked by which users (P_ui in the original paper), as well as how much confidence we have that the user liked the item (C_ui).

The negative items are implicitly defined: this code assumes that a positive value in the user_items matrix means that the user liked the item. The negatives are left unset in this sparse matrix, and the library assumes P_ui = 0 and C_ui = 1 for all of these items. A negative item can also be given a higher confidence by storing a negative value for it, which indicates that the user disliked the item.
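For reference, the objective minimized in ‘Collaborative Filtering for Implicit Feedback Datasets’ can be written as below. Here x_u and y_i denote the user and item factor vectors and λ is the regularization factor; these symbols follow the paper and are not names used by the library. This is the paper's formulation, not a statement about how the library maps stored values to c_ui internally.

```latex
\min_{x_*, y_*} \sum_{u,i} c_{ui} \left( p_{ui} - x_u^{\top} y_i \right)^2
  + \lambda \left( \sum_u \lVert x_u \rVert^2 + \sum_i \lVert y_i \rVert^2 \right)
```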

Parameters

user_items (csr_matrix) – Matrix of confidences for the liked items. This matrix should be a csr_matrix where the rows of the matrix are the users, the columns are the items liked by that user, and the value is the confidence that the user liked the item.
show_progress (bool, optional) – Whether to show a progress bar during fitting
callback (Callable, optional) – Function called on each epoch with arguments such as the epoch, elapsed time, and progress
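A minimal sketch of building a user_items matrix in the layout described above, assuming interactions arrive as (user index, item index, confidence) triples. The interactions list and the n_users and n_items names are hypothetical and exist only for illustration.

```python
import numpy as np
from scipy.sparse import csr_matrix

# Hypothetical interaction log: (user index, item index, confidence)
interactions = [
    (0, 0, 3.0),
    (0, 2, 1.0),
    (1, 1, 5.0),
    (2, 0, 2.0),
    (2, 3, 4.0),
]
n_users, n_items = 3, 4

rows, cols, values = (np.array(column) for column in zip(*interactions))

# Rows are users, columns are items, values are confidences.
# Entries that are not listed stay unset (implicit negatives).
user_items = csr_matrix(
    (values.astype(np.float32), (rows, cols)), shape=(n_users, n_items)
)

print(user_items.shape)  # (3, 4)
```

A matrix built this way can then be passed to fit as the user_items argument.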

recalculate_user(userid, user_items)

Recalculates factors for a batch of users

This method recalculates factors for a batch of users and returns them without storing them on the object. To update the model, use ‘partial_fit_users’.

Parameters

userid (Union[array_like, int]) – The userid or array of userids to recalculate
user_items (csr_matrix) – Sparse matrix of (users, items) that contains the items liked by each user.
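To illustrate what recalculating a single user's factors involves, here is a sketch of the closed-form least-squares solve from the paper, written with NumPy. It assumes unobserved items have preference 0 and confidence 1, and liked items have preference 1 with the stored confidence. The function name solve_user_factor is hypothetical, and this is not the library's implementation (which can also use a Conjugate Gradient solver).

```python
import numpy as np

def solve_user_factor(item_factors, liked_items, confidences, regularization):
    """Solve (Y^T C_u Y + reg * I) x_u = Y^T C_u p_u for one user."""
    Y = np.asarray(item_factors, dtype=np.float64)
    n_factors = Y.shape[1]
    # Y^T C_u Y = Y^T Y + sum over liked items of (c_ui - 1) * y_i y_i^T
    A = Y.T @ Y + regularization * np.eye(n_factors)
    b = np.zeros(n_factors)
    for i, c in zip(liked_items, confidences):
        A += (c - 1.0) * np.outer(Y[i], Y[i])
        b += c * Y[i]  # p_ui = 1 for liked items
    return np.linalg.solve(A, b)

rng = np.random.default_rng(0)
item_factors = rng.normal(size=(6, 3))
x_u = solve_user_factor(
    item_factors, liked_items=[1, 4], confidences=[3.0, 2.0], regularization=0.01
)
```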

recalculate_item(itemid, item_users)

Recalculates factors for a batch of items

This method recalculates factors for a batch of items and returns the newly calculated values without storing them on the object.

Parameters

itemid (Union[array_like, int]) – The itemid or array of itemids to recalculate
item_users (csr_matrix) – Sparse matrix of (items, users) that contains the users that liked each item

partial_fit_users(userids, user_items)

Incrementally updates user factors

This method updates factors for users specified by userids, given a sparse matrix of items that they have interacted with before. This allows you to retrain only parts of the model with new data, and avoid a full retraining when new users appear - or the liked items for an existing user change.

Parameters

userids (array_like) – An array of userids to calculate new factors for
user_items (csr_matrix) – Sparse matrix containing the liked items for each user. Each row in this matrix corresponds to a row in userids.

partial_fit_items(itemids, item_users)

Incrementally updates item factors

This method updates factors for items specified by itemids, given a sparse matrix of users that have interacted with them. This allows you to retrain only parts of the model with new data, and avoid a full retraining when new items appear, or when the users who liked an existing item change.

Parameters

itemids (array_like) – An array of itemids to calculate new factors for
item_users (csr_matrix) – Sparse matrix containing the liked users for each item in itemids

explain(userid, user_items, itemid, user_weights=None, N=10)

Provides explanations for why the item is liked by the user.

Parameters

userid (int) – The userid to explain recommendations for
user_items (csr_matrix) – Sparse matrix containing the liked items for the user
itemid (int) – The itemid to explain recommendations for
user_weights (ndarray, optional) – Precomputed Cholesky decomposition of the weighted user liked items. Passing this in speeds up repeated calls to this function; the value is also returned by each call.
N (int, optional) – The number of liked items to show the contribution for

Returns

total_score (float) – The total predicted score for this user/item pair
top_contributions (list) – A list of the top N (itemid, score) contributions for this user/item pair
user_weights (ndarray) – A factorized representation of the user. Passing this in to future ‘explain’ calls will lead to noticeable speedups
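One way such an explanation can be derived follows the decomposition described in ‘Collaborative Filtering for Implicit Feedback Datasets’: the predicted score splits into one contribution per liked item. The sketch below uses plain NumPy and assumes liked items have preference 1 with the given confidences and all other items have confidence 1. The function name explain_sketch is hypothetical, and the code is illustrative rather than the library's implementation.

```python
import numpy as np

def explain_sketch(item_factors, liked_items, confidences, itemid,
                   regularization, N=10):
    Y = np.asarray(item_factors, dtype=np.float64)
    # A = Y^T C_u Y + reg * I, built from the liked items only
    A = Y.T @ Y + regularization * np.eye(Y.shape[1])
    for j, c in zip(liked_items, confidences):
        A += (c - 1.0) * np.outer(Y[j], Y[j])
    # Cholesky factor of A; it depends only on the user, so it can be reused
    L = np.linalg.cholesky(A)
    # w = A^{-1} y_item, computed with two solves against the factor
    z = np.linalg.solve(L, Y[itemid])
    w = np.linalg.solve(L.T, z)
    # Each liked item j contributes c_uj * y_j^T A^{-1} y_item to the score
    contributions = [
        (j, c * float(Y[j] @ w)) for j, c in zip(liked_items, confidences)
    ]
    total_score = sum(score for _, score in contributions)
    top = sorted(contributions, key=lambda pair: pair[1], reverse=True)[:N]
    return total_score, top, L

rng = np.random.default_rng(1)
item_factors = rng.normal(size=(5, 3))
total_score, top_contributions, user_weights = explain_sketch(
    item_factors, liked_items=[0, 2, 3], confidences=[2.0, 1.0, 4.0],
    itemid=4, regularization=0.01, N=2,
)
```

In this sketch the contributions sum to the dot product of the item's factors with the user's closed-form factors, which is why total_score can be reported alongside them.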

to_gpu(): Converts this model to an equivalent version running on the gpu

save(fileobj_or_path)

Saves the model to a file, using the numpy .npz format

Parameters

fileobj_or_path (str or io.IOBase) – Either the filename or an open file-like object to save the model to

See also

load, numpy.savez
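For readers unfamiliar with the .npz format, it is a NumPy archive of named arrays. The sketch below round-trips two factor arrays through an in-memory .npz file using NumPy directly. The key names are hypothetical and are not necessarily the ones the library writes.

```python
import io
import numpy as np

user_factors = np.ones((2, 3), dtype=np.float32)
item_factors = np.zeros((4, 3), dtype=np.float32)

# Write both arrays into one .npz archive held in memory
buffer = io.BytesIO()
np.savez(buffer, user_factors=user_factors, item_factors=item_factors)

# Read the archive back and look arrays up by name
buffer.seek(0)
with np.load(buffer) as archive:
    restored_items = archive["item_factors"]
    restored_users = archive["user_factors"]

print(restored_items.shape)  # (4, 3)
```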

recommend(userid, user_items, N=10, filter_already_liked_items=True, filter_items=None, recalculate_user=False, items=None)

Recommends items for users.

This method calculates the top N recommendations for a user or batch of users. Passing an array of userids instead of a single userid tends to be more efficient, and allows multi-threaded processing on the CPU.

This method has options for filtering out items from the results. You can both filter out items that have already been liked by the user with the filter_already_liked_items parameter, as well as pass in filter_items to filter out other items for all users in the batch. By default all items in the training dataset are scored, but by setting the ‘items’ parameter you can restrict down to a subset.

Example usage:

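import numpy as np

# model is a fitted AlternatingLeastSquares instance and
# user_items is a csr_matrix of (users, items)

# calculate the top recommendations for a single user
ids, scores = model.recommend(0, user_items[0])

# calculate the top recommendations for a batch of users
userids = np.arange(10)
ids, scores = model.recommend(userids, user_items[userids])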

Parameters

userid (Union[int, array_like]) – The userid or array of userids to calculate recommendations for
user_items (csr_matrix) – A sparse matrix of shape (users, number_items). This lets us look up the liked items and their weights for the user. This is used to filter out items that have already been liked from the output, and to also potentially recalculate the user representation. Each row in this sparse matrix corresponds to a row in the userid parameter: that is the first row in this matrix contains the liked items for the first user in the userid array.
N (int, optional) – The number of results to return
filter_already_liked_items (bool, optional) – When true, don’t return items present in the training set that were rated by the specified user.
filter_items (array_like, optional) – List of extra item ids to filter out from the output
recalculate_user (bool, optional) – When true, don’t rely on stored user embeddings and instead recalculate from the passed in user_items. This option isn’t supported by all models.
items (array_like, optional) – Array of extra item ids. When set this will only rank the items in this array instead of ranking every item the model was fit for. This parameter cannot be used with filter_items

Returns

Tuple of (itemids, scores) arrays. When calculating for a single user, these arrays will be 1-dimensional with N items. When passed an array of userids, these will be 2-dimensional arrays with a row for each user.

Return type

tuple
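As a rough illustration of the ranking step for a single user, the sketch below scores every item, masks the already liked ones, and returns the top N. It assumes the score is the dot product of the user and item factor vectors. The function name recommend_sketch is hypothetical, and the library's own code path may differ.

```python
import numpy as np

def recommend_sketch(user_factor, item_factors, liked_items, N=10,
                     filter_already_liked_items=True):
    scores = item_factors @ user_factor
    if filter_already_liked_items:
        # Push liked items to the bottom of the ranking
        scores[np.asarray(liked_items, dtype=int)] = -np.inf
    N = min(N, len(scores))
    # Select the N best scores, then sort only those in descending order
    top = np.argpartition(scores, -N)[-N:]
    top = top[np.argsort(scores[top])[::-1]]
    return top, scores[top]

item_factors = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]])
user_factor = np.array([2.0, 1.0])

ids, scores = recommend_sketch(user_factor, item_factors, liked_items=[2], N=2)
print(ids, scores)  # [0 3] [2.  1.5]
```

Item 2 has the highest raw score (3.0) but is excluded because the user already liked it.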

similar_items(itemid, N=10, recalculate_item=False, item_users=None, filter_items=None, items=None)

Calculates a list of similar items

Parameters

itemid (Union[int, array_like]) – The item id or an array of item ids to retrieve similar items for
N (int, optional) – The number of similar items to return
recalculate_item (bool, optional) – When true, don’t rely on stored item state and instead recalculate from the passed in item_users
item_users (csr_matrix, optional) – A sparse matrix of shape (itemid, number_users). This lets us look up the users for each item. This only needs to be set when setting recalculate_item=True. This should have the same number of rows as the itemid parameter, with the first row of the sparse matrix corresponding to the first item in the itemid array.
filter_items (array_like, optional) – An array of item ids to filter out from the results being returned
items (array_like, optional) – An array of item ids to include in the output. If not set, all items in the training set will be included. Cannot be used with the filter_items option

Returns

Tuple of (itemids, scores) arrays

Return type

tuple
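A sketch of an item-to-item similarity lookup over the item factors, assuming similarity is measured as the cosine between factor vectors. The function name similar_items_sketch is hypothetical, and whether the library applies exactly this measure is an assumption here.

```python
import numpy as np

def similar_items_sketch(item_factors, itemid, N=10):
    norms = np.linalg.norm(item_factors, axis=1)
    norms[norms == 0] = 1e-10  # avoid dividing by zero for empty items
    # Cosine similarity between the query item and every item
    scores = item_factors @ item_factors[itemid] / (norms * norms[itemid])
    N = min(N, len(scores))
    top = np.argsort(scores)[::-1][:N]
    return top, scores[top]

item_factors = np.array([[1.0, 0.0], [2.0, 1.0], [0.0, 1.0], [1.0, 1.0]])
ids, scores = similar_items_sketch(item_factors, itemid=0, N=3)
print(ids)  # [0 1 3]
```

In this sketch the query item itself comes back first, with a similarity of 1.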

similar_users(userid, N=10, filter_users=None, users=None)

Calculates the most similar users for a userid or array of userids

Parameters

userid (Union[int, array_like]) – The userid or an array of userids to retrieve similar users for.
N (int, optional) – The number of similar users to return
filter_users (array_like, optional) – An array of user ids to filter out from the results being returned
users (array_like, optional) – An array of user ids to include in the output. If not set, all users in the training set will be included. Cannot be used with the filter_users option

Returns

Tuple of (userids, scores) arrays

Return type

tuple