Data model

pyrecsys data model includes: users, items, and its interaction.

Items

Create an item, with an Id, and some metadata:

from recsys.datamodel.item import Item

ITEMID = 'a3cb23fc-acd3-4ce0-8f36-1e5aa6a18432'
item = Item(ITEMID)
# ...plus any other info you'd like to add
name = 'U2'
genres = ['rock', 'irish', '80s']
popularity = 8.89
item.add_data({'name': name, 'genres': genres, 'popularity': popularity})

Create a dict of items, reading from a file. This example actually reads the Movielens 1M Ratings Movies file (movies.dat):

# Read movie info
def read_items(filename):
    items = dict()
    for line in open(filename):
        #1::Toy Story (1995)::Animation|Children's|Comedy
        data =  line.strip('\r\n').split('::')
        item_id = int(data[0])
        item_name = data[1]
        genres = data[2].split('|')
        item = Item(item_id)
        item.add_data({'name': item_name, 'genres': genres})
        items[item_id] = item
    return items

# Call it!
filename = './data/movielens/movies.dat'
items = read_items(filename)

Users

Create a user, with a given Id:

from recsys.datamodel.user import User

USERID = 'ocelma'
user = User(USERID)

Link an item with a user, plus its interaction (rating, number of plays, views, etc.):

from recsys.datamodel.user import User
from recsys.datamodel.item import Item

ITEMID = 'a3cb23fc-acd3-4ce0-8f36-1e5aa6a18432'
item = Item(ITEMID)
# ...plus any other info you'd like to add
name = 'U2'
item.add_data({'name': name})

USERID = 'ocelma'
PLAYS = 256
user = User(USERID)
user.add_item(ITEMID, PLAYS) #Instead of PLAYS, one can add the classical [1..5] stars (rating)

Data

Data class manages “users rate items” information.

Loading user data, and adding tuples to Data:

from recsys.datamodel.data import Data

data = Data()
for PLAYS, ITEMID in user.get_items():
    data.add_tuple((PLAYS, ITEMID, user.get_id())) # Tuple format is: <value, row, column>

Loading a train/test dataset from a file. This example actually reads the Movielens 1M Ratings Data Set (ratings.dat) file:

from recsys.datamodel.data import Data

filename = './data/movielens/ratings.dat'

data = Data()
format = {'col':0, 'row':1, 'value':2, 'ids': 'int'}
    # About format parameter:
    #   'row': 1 -> Rows in matrix come from column 1 in ratings.dat file
    #   'col': 0 -> Cols in matrix come from column 0 in ratings.dat file
    #   'value': 2 -> Values (Mij) in matrix come from column 2 in ratings.dat file
    #   'ids': int -> Ids (row and col ids) are integers (not strings)
data.load(filename, sep='::', format=format)
train, test = data.split_train_test(percent=80) # 80% train, 20% test

Getting data from the test dataset:

for rating, item_id, user_id in test:
    pass # Do something, like evaluating how well we can predict the ratings in this test dataset

Accessing the test dataset as if it were a list:

test[3]

The Data class can also store the information to disk:

data.save(FILENAME)

Or even load or save data using the pickle format:

data.load_pickle(FILENAME)
data.save_pickle(FILENAME)

Table Of Contents

Previous topic

Quick start

Next topic

Algorithms

This Page