Python-side Evaluation

librec-auto supports custom evaluation metrics implemented in Python.

Required boilerplate

Argument parsing

Regardless of the type of metric you’re implementing, you will need some boilerplate code.

First, you need a read_args function to handle the command-line input to the custom metric:

def read_args():
    """
    Parse command line arguments.
    """
    parser = argparse.ArgumentParser(description='My custom metric')
    parser.add_argument('--test', help='Path to test.')
    parser.add_argument('--result', help='Path to results.')
    parser.add_argument('--output-file', help='The output pickle file.')

    # Custom params defined in the config go here
    parser.add_argument('--foo', help='The weight for re-ranking.')

    input_args = parser.parse_args()
    return vars(input_args)
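
For reference, librec-auto invokes this script as a command-line program using the options defined above. A hypothetical invocation (the paths and the value of --foo are illustrative only) looks like:

python custom_rmse_metric.py --test path/to/test.txt --result path/to/result.txt --output-file out.pickle --foo 0.5

Note that argparse converts dashes in option names to underscores, so the value of --output-file is read as args['output_file'].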

Main function

You will also need to start the main block with the following lines. Parameters specified in config.xml are passed to the custom metric script on the command line and are accessible via the args['param_name'] syntax (remember that argparse turns dashes in option names into underscores).

if __name__ == '__main__':
    args = read_args()

    params = {'foo': args['foo']}

    test_data = ListBasedMetric.read_data_from_file(
        args['test']
    )
    result_data = ListBasedMetric.read_data_from_file(
        args['result'],
        delimiter=','
    )
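
The boilerplate above only reads the data; the remainder of the main block constructs the metric object and runs it, as shown in the complete examples below:

    custom = CustomRmseMetric(params, test_data, result_data,
                              args['output_file'])

    custom.evaluate()

Note that read_data_from_file is also available on your subclass (the complete RMSE example below calls CustomRmseMetric.read_data_from_file), so for a row-based metric you do not need to import ListBasedMetric just to read the data.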

Adding a row-based metric (e.g., RMSE)

For metrics that depend only on the expected and actual values for a single user-item pair (rather than on the entire result list), librec-auto provides the RowBasedMetric superclass.

1. Create the new class file

First, make a file in your study directory and give it a clear name. Let's assume a file named custom_rmse_metric.py.

2. Override the RowBasedMetric methods

In this custom_rmse_metric.py file, we’ll want to copy the boilerplate from above and then create a subclass of RowBasedMetric, like this:

from librec_auto.core.eval.metrics.row_based_metric import RowBasedMetric

class CustomRmseMetric(RowBasedMetric):
    ...

We’ll also want to override the following methods.

__init__

Override __init__ and set self._name equal to the name of the metric. Do not forget to call super().__init__.

def __init__(self, params: dict, test_data: np.array,
             result_data: np.array, output_file) -> None:
    super().__init__(params, test_data, result_data, output_file)
    self._name = 'RMSE'

evaluate_row

This method performs the actual evaluation of the results. The first parameter contains the test (expected) values for a given user-item pair, and the second parameter contains the actual (predicted) values for that pair. Both are numpy arrays and need to be indexed to access the row values. The method should return the value of the metric for the given user-item pair.

Every time that evaluate_row is executed, the results are saved to a _scores list in the metric class, which we’ll access in post_row_processing.
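
Conceptually, the flow driven by the superclass looks roughly like the sketch below (a simplification for orientation, not the actual librec-auto implementation):

# Simplified sketch of how RowBasedMetric drives the evaluation:
#
#   pre_row_processing()
#   for each matching (test row, result row) pair:
#       self._scores.append(evaluate_row(test_row, result_row))
#   final value = post_row_processing()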

evaluate_row for RMSE follows:

def evaluate_row(self, test: np.array, result: np.array):
    test_ranking = test[2]
    result_ranking = result[2]
    return (test_ranking - result_ranking)**2

pre_row_processing and post_row_processing

The pre_row_processing method allows you to set initial values or perform other processing that should happen before any of the rows are processed. Think of this as setting up the metric.
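
RMSE itself needs no setup, but a metric that keeps per-run state could initialize it here. A purely illustrative override, assuming the same no-argument signature as post_row_processing (the attribute name is hypothetical):

def pre_row_processing(self):
    # Illustrative only: RMSE needs no setup. A metric with per-run
    # state could initialize it here instead.
    self._row_count = 0  # hypothetical attribute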

The post_row_processing method should manipulate self._scores and return a single value that represents the final value of the metric.

post_row_processing for RMSE follows:

def post_row_processing(self):
    T = len(self._scores)
    return (sum(self._scores) / T)**0.5

Below is the complete file for an implementation of RMSE.

import argparse
import numpy as np

from librec_auto.core.eval.metrics.row_based_metric import RowBasedMetric


def read_args():
    """
    Parse command line arguments.
    """
    parser = argparse.ArgumentParser(description='My custom metric')
    parser.add_argument('--test', help='Path to test.')
    parser.add_argument('--result', help='Path to results.')
    parser.add_argument('--output-file', help='The output pickle file.')

    # Custom params defined in the config go here
    parser.add_argument('--foo', help='The weight for re-ranking.')

    input_args = parser.parse_args()
    return vars(input_args)


class CustomRmseMetric(RowBasedMetric):
    def __init__(self, params: dict, test_data: np.array,
                result_data: np.array, output_file) -> None:
        super().__init__(params, test_data, result_data, output_file)
        self._name = 'RMSE'

    def evaluate_row(self, test: np.array, result: np.array):
        test_ranking = test[2]
        result_ranking = result[2]
        return (test_ranking - result_ranking)**2

    def post_row_processing(self):
        T = len(self._scores)
        return (sum(self._scores) / T)**0.5


if __name__ == '__main__':
    args = read_args()

    params = {'foo': args['foo']}

    test_data = CustomRmseMetric.read_data_from_file(args['test'])

    result_data = CustomRmseMetric.read_data_from_file(args['result'],
                                                       delimiter=',')

    custom = CustomRmseMetric(params, test_data, result_data,
                              args['output_file'])

    custom.evaluate()
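
Calling evaluate() drives the whole process: each pair of test and result rows is passed to evaluate_row, the returned values accumulate in self._scores, and the value returned by post_row_processing becomes the metric's final result, reported through the output pickle file named by --output-file.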

Adding a list-based metric (e.g., NDCG)

For metrics that require the entire result list for their computation, librec-auto provides the ListBasedMetric superclass, which custom metric classes can inherit.

Required boilerplate

See above for the argument-parsing and main-function boilerplate. Both are required for row- and list-based metrics alike; the structure is identical, and only the custom parameter names and the metric class differ.

1. Create the new class file

Make a file in your study directory and give it a clear name. Let's assume a file named custom_ndcg_metric.py.

2. Override the ListBasedMetric methods

In the custom_ndcg_metric.py file, we'll want to copy the boilerplate from above and then import and subclass the ListBasedMetric superclass, like this:

from librec_auto.core.eval.metrics.list_based_metric import ListBasedMetric

class CustomNdcgMetric(ListBasedMetric):
    ...

__init__

Override __init__, set self._name equal to the name of the metric, and assign any custom parameters (here, self._list_size). Do not forget to call super().__init__.

def __init__(self, params: dict, test_data: np.array,
             result_data: np.array, output_file) -> None:
    super().__init__(params, test_data, result_data, output_file)
    self._name = 'NDCG'
    self._list_size = params['list_size']

evaluate_user

This method produces a metric value for a given user, based on test and result arrays of that user's data. Each array contains all of the rows belonging to that user.
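
Conceptually, the superclass groups the data by user and calls evaluate_user once per user, roughly as in the sketch below (a simplification for orientation, not the actual librec-auto implementation):

# Simplified sketch of how ListBasedMetric drives the evaluation:
#
#   for each user in the test data:
#       test_user_data   = all test rows for that user
#       result_user_data = all result rows for that user
#       self._values.append(evaluate_user(test_user_data, result_user_data))
#   final value = postprocessing()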

evaluate_user for NDCG follows:

(Note that self._list_size comes from the list_size parameter in config.xml; it is read from the command line in __main__ and assigned in __init__.)

def evaluate_user(self, test_user_data: np.array,
                  result_user_data: np.array) -> float:
    rec_num = int(self._list_size)

    idealOrder = test_user_data
    idealDCG = 0.0

    for j in range(min(rec_num, len(idealOrder))):
        idealDCG += ((math.pow(2.0,
                               len(idealOrder) - j) - 1) /
                     math.log(2.0 + j))

    recDCG = 0.0
    test_user_items = list(test_user_data[:, 1])

    for j in range(rec_num):
        item = int(result_user_data[j][1])
        if item in test_user_items:
            # Rank derived from the item's position in the ground-truth
            # (test) list; earlier items receive a higher rank
            rank = len(test_user_items) - test_user_items.index(item)
            recDCG += ((math.pow(2.0, rank) - 1) / math.log(1.0 + j + 1))
    return (recDCG / idealDCG)
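
Note that, as written, this implementation assumes each user's result list contains at least list_size rows (otherwise indexing result_user_data[j] raises an IndexError); if idealDCG works out to zero for a user, the final division will also fail.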

preprocessing and postprocessing

preprocessing should be used to set up initial values for the metric that are not passed from config.xml.
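
NDCG as written needs no such setup, but a metric that does could override preprocessing. A purely illustrative sketch, assuming the same no-argument signature as postprocessing (the cached attribute is hypothetical):

def preprocessing(self):
    # Illustrative only: NDCG does not need this. A metric that
    # precomputes shared values could do so here.
    self._log2 = math.log(2.0)  # hypothetical cached constant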

Results from every execution of evaluate_user are saved to self._values, which should be accessed in postprocessing to produce a single final value.

postprocessing for NDCG follows:

def postprocessing(self):
    return np.average(self._values)

__main__

Use the main block to parse the command-line arguments into class parameters, read the test and result data, initialize the custom metric class, and call .evaluate().

The main function for NDCG follows:

if __name__ == '__main__':
    args = read_args()

    params = {'list_size': args['list_size']}

    test_data = ListBasedMetric.read_data_from_file(
        args['test']
    )
    result_data = ListBasedMetric.read_data_from_file(
        args['result'],
        delimiter=','
    )

    custom = CustomNdcgMetric(params, test_data, result_data,
                              args['output_file'])

    custom.evaluate()

Below is the complete file for a custom implementation of NDCG.

import argparse
import numpy as np
import math

from librec_auto.core.eval.metrics.list_based_metric import ListBasedMetric

def read_args():
    """
    Parse command line arguments.
    """
    parser = argparse.ArgumentParser(description='My custom metric')
    parser.add_argument('--test', help='Path to test.')
    parser.add_argument('--result', help='Path to results.')
    parser.add_argument('--output-file', help='The output pickle file.')

    # Custom params defined in the config go here
    parser.add_argument('--list-size', help='Size of the list for NDCG.')

    input_args = parser.parse_args()
    return vars(input_args)

class CustomNdcgMetric(ListBasedMetric):
    def __init__(self, params: dict, test_data: np.array,
                result_data: np.array, output_file: str) -> None:
        super().__init__(params, test_data, result_data, output_file)
        self._name = 'NDCG'
        self._list_size = params['list_size']

    def evaluate_user(self, test_user_data: np.array,
                    result_user_data: np.array) -> float:
        rec_num = int(self._list_size)

        idealOrder = test_user_data
        idealDCG = 0.0

        for j in range(min(rec_num, len(idealOrder))):
            idealDCG += ((math.pow(2.0,
                                len(idealOrder) - j) - 1) /
                        math.log(2.0 + j))

        recDCG = 0.0
        test_user_items = list(test_user_data[:, 1])

        for j in range(rec_num):
            item = int(result_user_data[j][1])
            if item in test_user_items:
                # Rank derived from the item's position in the ground-truth
                # (test) list; earlier items receive a higher rank
                rank = len(test_user_items) - test_user_items.index(item)
                recDCG += ((math.pow(2.0, rank) - 1) / math.log(1.0 + j + 1))
        return (recDCG / idealDCG)

    def postprocessing(self):
        return np.average(self._values)


if __name__ == '__main__':
    args = read_args()

    params = {'list_size': args['list_size']}

    test_data = ListBasedMetric.read_data_from_file(
        args['test']
    )
    result_data = ListBasedMetric.read_data_from_file(
        args['result'],
        delimiter=','
    )

    custom = CustomNdcgMetric(params, test_data, result_data,
                              args['output_file'])

    custom.evaluate()