LogisticMatrixFactorization

class implicit.cpu.lmf.LogisticMatrixFactorization(factors=30, learning_rate=1.0, regularization=0.6, dtype=<class 'numpy.float32'>, iterations=30, neg_prop=30, num_threads=0, random_state=None)

Bases: implicit.cpu.matrix_factorization_base.MatrixFactorizationBase

Logistic Matrix Factorization

A collaborative filtering recommender model that learns a probability distribution over whether each user likes each item. The algorithm is described in Logistic Matrix Factorization for Implicit Feedback Data <https://web.stanford.edu/~rezab/nips2014workshop/submits/logmat.pdf>
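
As a rough illustration of the probabilistic model in the referenced paper, the sketch below computes the probability that a user likes an item as the logistic function of the dot product of their latent factor vectors plus a user bias and an item bias. The function name like_probability and the example vectors are hypothetical and are not part of this library's API.

```python
import math


def like_probability(user_vec, item_vec, user_bias=0.0, item_bias=0.0):
    """Logistic (sigmoid) of the user-item affinity score."""
    score = sum(u * v for u, v in zip(user_vec, item_vec)) + user_bias + item_bias
    return 1.0 / (1.0 + math.exp(-score))


# Hypothetical 3-factor vectors, for illustration only
p = like_probability([0.5, -0.2, 0.1], [0.4, 0.3, -0.1])
print(f"p(like) = {p:.3f}")
```

A score of zero maps to a probability of 0.5, and larger dot products push the probability towards 1.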

Parameters
  • factors (int, optional) – The number of latent factors to compute

  • learning_rate (float, optional) – The learning rate to apply for updates during training

  • regularization (float, optional) – The regularization factor to use

  • dtype (data-type, optional) – Specifies whether to generate 64 bit or 32 bit floating point factors

  • iterations (int, optional) – The number of training epochs to use when fitting the data

  • neg_prop (int, optional) – The proportion of negative samples to use. For example, neg_prop = 30 means that if a user has seen 5 items, then 5 * 30 = 150 negative samples are used for training.

  • num_threads (int, optional) – The number of threads to use for fitting the model and batch recommendation calls. Specifying 0 means to default to the number of cores on the machine.

  • random_state (int, RandomState or None, optional) – The random state for seeding the initial item and user factors. Default is None.
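
The neg_prop arithmetic described above can be sketched as follows. This only reproduces the sample-count calculation from the parameter description; the helper name negative_samples_per_user and the user labels are hypothetical, and the sketch says nothing about how the library actually draws its samples.

```python
def negative_samples_per_user(num_liked_items, neg_prop=30):
    """Number of negative samples implied by neg_prop for one user."""
    return num_liked_items * neg_prop


# Hypothetical users and the number of items each has seen
liked_counts = {"user_a": 5, "user_b": 2}
for user, n_liked in liked_counts.items():
    print(user, negative_samples_per_user(n_liked))
```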

item_factors

Array of latent factors for each item in the training set

Type

ndarray

user_factors

Array of latent factors for each user in the training set

Type

ndarray

fit(self, user_items, show_progress=True, callback=None)

Factorizes the user_items matrix

Parameters
  • user_items (csr_matrix) – Matrix of confidences for the liked items. This matrix should be a csr_matrix where the rows of the matrix are the users, and the columns are the items that are liked by each user. BPR ignores the weight value of the matrix right now - it treats non zero entries as a binary signal that the user liked the item.

  • show_progress (bool, optional) – Whether to show a progress bar

  • callback (Callable, optional) – Function called on each epoch with arguments such as the epoch number, elapsed time and progress
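
A callback could be sketched as a small callable object that records what it receives. The exact argument list is an assumption here: the description above only says the callback gets arguments such as the epoch, elapsed time and progress, so this hypothetical EpochLogger takes the first two positionally and absorbs anything else.

```python
class EpochLogger:
    """Collects (epoch, elapsed_seconds) pairs each time it is called."""

    def __init__(self):
        self.history = []

    def __call__(self, epoch, elapsed, *extra):
        # *extra absorbs any further arguments (e.g. progress statistics);
        # the precise signature is an assumption, check your installed version
        self.history.append((epoch, elapsed))


logger = EpochLogger()
# Assumed usage: model.fit(user_items, callback=logger)
# Two epochs are simulated here so the sketch runs without a model
logger(0, 0.12, "progress")
logger(1, 0.25, "progress")
print(logger.history)
```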

classmethod load(fileobj_or_path) → implicit.recommender_base.RecommenderBase

Loads the model from a file

Parameters

fileobj_or_path (str or io.IOBase) – Either the filename or an open file-like object to load the model from

Returns

The model loaded up from disk

Return type

RecommenderBase

See also

save, numpy.load

recommend(userid, user_items, N=10, filter_already_liked_items=True, filter_items=None, recalculate_user=False, items=None)

Recommends items for users.

This method allows you to calculate the top N recommendations for a user or batch of users. Passing an array of userids instead of a single userid will tend to be more efficient, and allows multi-thread processing on the CPU.

This method has options for filtering out items from the results. You can filter out items that have already been liked by the user with the filter_already_liked_items parameter, and pass in filter_items to filter out other items for all users in the batch. By default all items in the training dataset are scored, but by setting the ‘items’ parameter you can restrict scoring to a subset.

Example usage:

import numpy as np

# calculate the top recommendations for a single user
ids, scores = model.recommend(0, user_items[0])

# calculate the top recommendations for a batch of users
userids = np.arange(10)
ids, scores = model.recommend(userids, user_items[userids])

Parameters
  • userid (Union[int, array_like]) – The userid or array of userids to calculate recommendations for

  • user_items (csr_matrix) – A sparse matrix of shape (users, number_items). This lets us look up the liked items and their weights for the user. This is used to filter out items that have already been liked from the output, and to also potentially recalculate the user representation. Each row in this sparse matrix corresponds to a row in the userid parameter: that is, the first row in this matrix contains the liked items for the first user in the userid array.

  • N (int, optional) – The number of results to return

  • filter_already_liked_items (bool, optional) – When true, don’t return items present in the training set that were rated by the specified user.

  • filter_items (array_like, optional) – List of extra item ids to filter out from the output

  • recalculate_user (bool, optional) – When true, don’t rely on stored user embeddings and instead recalculate from the passed in user_items. This option isn’t supported by all models.

  • items (array_like, optional) – Array of extra item ids. When set this will only rank the items in this array instead of ranking every item the model was fit for. This parameter cannot be used with filter_items

Returns

Tuple of (itemids, scores) arrays. When calculating for a single user these arrays will be 1-dimensional with N items. When passed an array of userids, these will be 2-dimensional arrays with a row for each user.

Return type

tuple
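
The interaction of N, filter_already_liked_items, filter_items and items can be illustrated with a pure-Python sketch for a single user. It assumes items are ranked by the dot product of the user and item factor vectors; recommend_sketch and the toy factors are hypothetical and are not the library's implementation.

```python
def recommend_sketch(user_vec, item_factors, liked=(), N=10,
                     filter_already_liked_items=True,
                     filter_items=None, items=None):
    """Return (itemids, scores) for the top N items of one user."""
    if items is not None and filter_items is not None:
        raise ValueError("items cannot be used together with filter_items")
    # Score every item unless a candidate subset is given
    candidates = range(len(item_factors)) if items is None else items
    excluded = set(filter_items or ())
    if filter_already_liked_items:
        excluded |= set(liked)
    scored = [
        (i, sum(u * v for u, v in zip(user_vec, item_factors[i])))
        for i in candidates
        if i not in excluded
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    top = scored[:N]
    return [i for i, _ in top], [s for _, s in top]


# Hypothetical 2-factor model with four items
item_factors = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5], [-1.0, 0.2]]
user_vec = [0.9, 0.1]
ids, scores = recommend_sketch(user_vec, item_factors, liked=[0], N=2)
print(ids, scores)
```

Item 0 has the highest score for this user but is dropped because it is in the liked list; passing filter_already_liked_items=False would return it first.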

save(self, fileobj_or_path)

Saves the model to a file

similar_items(itemid, N=10, recalculate_item=False, item_users=None, filter_items=None, items=None)

Calculates a list of similar items

Parameters
  • itemid (Union[int, array_like]) – The item id or an array of item ids to retrieve similar items for

  • N (int, optional) – The number of similar items to return

  • recalculate_item (bool, optional) – When true, don’t rely on stored item state and instead recalculate from the passed in item_users

  • item_users (csr_matrix, optional) – A sparse matrix of shape (itemid, number_users). This lets us look up the users for each item. This only needs to be set when setting recalculate_item=True. This should have the same number of rows as the itemid parameter, with the first row of the sparse matrix corresponding to the first item in the itemid array.

  • filter_items (array_like, optional) – An array of item ids to filter out from the results being returned

  • items (array_like, optional) – An array of item ids to include in the output. If not set all items in the training set will be included. Cannot be used with the filter_items option

Returns

Tuple of (itemids, scores) arrays

Return type

tuple
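
One way to picture item similarity is as the cosine similarity between latent factor vectors. The sketch below assumes that measure; similar_items_sketch and the toy factors are hypothetical, and the library's own computation may differ in detail.

```python
import math


def similar_items_sketch(itemid, item_factors, N=10):
    """Return (itemids, scores) ranked by cosine similarity to itemid."""
    def norm(vec):
        return math.sqrt(sum(v * v for v in vec))

    query = item_factors[itemid]
    query_norm = norm(query)
    scored = []
    for i, vec in enumerate(item_factors):
        dot = sum(q * v for q, v in zip(query, vec))
        denom = query_norm * norm(vec)
        # Guard against zero vectors, which have no defined direction
        scored.append((i, dot / denom if denom else 0.0))
    scored.sort(key=lambda pair: pair[1], reverse=True)
    top = scored[:N]
    return [i for i, _ in top], [s for _, s in top]


# Hypothetical 2-factor item vectors
item_factors = [[1.0, 0.0], [2.0, 0.5], [0.0, 1.0], [1.0, 1.0]]
ids, scores = similar_items_sketch(0, item_factors, N=3)
print(ids, scores)
```

In this sketch the query item is its own nearest neighbour, so it appears first in the results.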

similar_users(userid, N=10, filter_users=None, users=None)

Calculates the most similar users for a userid or array of userids

Parameters
  • userid (Union[int, array_like]) – The userid or an array of userids to retrieve similar users for.

  • N (int, optional) – The number of similar users to return

  • filter_users (array_like, optional) – An array of user ids to filter out from the results being returned

  • users (array_like, optional) – An array of user ids to include in the output. If not set all users in the training set will be included. Cannot be used with the filter_users option

Returns

Tuple of (userids, scores) arrays

Return type

tuple