.. _python-side:

======================
Python-side Evaluation
======================

``librec-auto`` supports custom evaluation metrics implemented in Python.

Required boilerplate
--------------------

Argument parsing
~~~~~~~~~~~~~~~~

Regardless of the type of metric you're implementing, you will need some boilerplate code: a ``read_args`` function that parses the command-line arguments passed to the custom metric.

::

    def read_args():
        """
        Parse command line arguments.
        """
        parser = argparse.ArgumentParser(description='My custom metric')
        parser.add_argument('--test', help='Path to test.')
        parser.add_argument('--result', help='Path to results.')
        parser.add_argument('--output-file', help='The output pickle file.')

        # Custom params defined in the config go here
        parser.add_argument('--foo', help='The weight for re-ranking.')

        input_args = parser.parse_args()
        return vars(input_args)

Main function
~~~~~~~~~~~~~

You will also need to start the main function with the following lines. Params specified in ``config.xml`` are passed to the custom metric files and are accessible via the ``args['param_name']`` syntax (``argparse`` converts dashes in option names to underscores).

::

    if __name__ == '__main__':
        args = read_args()
        params = {'foo': args['foo']}

        test_data = ListBasedMetric.read_data_from_file(
            args['test']
        )
        result_data = ListBasedMetric.read_data_from_file(
            args['result'],
            delimiter=','
        )

Adding a row-based metric (e.g., RMSE)
--------------------------------------

For metrics that are based solely upon an item's expected and actual results (and not on the entire result list), ``librec-auto`` provides the ``RowBasedMetric`` superclass.

1. Create the new class file
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

First, make a file in your study directory and give it a clear name. Let's assume a file named ``custom_rmse_metric.py``.

2. Override the ``RowBasedMetric`` methods
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

In this ``custom_rmse_metric.py`` file, we'll want to copy the boilerplate from above and then create a subclass of ``RowBasedMetric``, like this:

::

    from librec_auto.core.eval.metrics.row_based_metric import RowBasedMetric

    class CustomRmseMetric(RowBasedMetric):
        ...

We'll also want to override the following methods.

``__init__``
""""""""""""

Override ``__init__`` and set ``self._name`` to the name of the metric. Do not forget to call ``super().__init__``.

::

    def __init__(self, params: dict, test_data: np.array,
                 result_data: np.array, output_file) -> None:
        super().__init__(params, test_data, result_data, output_file)
        self._name = 'RMSE'

``evaluate_row``
""""""""""""""""

This method performs the actual evaluation of the results. The first parameter contains the test (expected) values for a given ``user,item`` combination; the second contains the actual values for that combination. Both are numpy arrays and need to be indexed to access the row values.

The method should return the value of the metric for the given ``user,item`` combination. Every time ``evaluate_row`` is executed, its result is saved to a ``_scores`` list in the metric class, which we'll access in ``post_row_processing``.

``evaluate_row`` for RMSE follows:

::

    def evaluate_row(self, test: np.array, result: np.array):
        test_ranking = test[2]
        result_ranking = result[2]
        return (test_ranking - result_ranking)**2

``pre_row_processing`` and ``post_row_processing``
""""""""""""""""""""""""""""""""""""""""""""""""""

The ``pre_row_processing`` method allows for setting initial values or for other processing that should be performed before *any* of the rows are processed. Think of this as setting up the metric.
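RMSE needs no setup of this kind, so the complete implementation below omits ``pre_row_processing`` altogether. If your metric does need setup, a minimal, purely hypothetical sketch of an override might look like this (``self._threshold`` is an invented attribute, not part of ``librec-auto``):

::

    def pre_row_processing(self):
        # Hypothetical setup: initialize a value that evaluate_row will use later.
        # RMSE itself needs nothing here; the framework manages self._scores.
        self._threshold = 0.5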
The ``post_row_processing`` method should manipulate ``self._scores`` and return a single value that represents the final value of the metric.

``post_row_processing`` for RMSE follows:

::

    def post_row_processing(self):
        T = len(self._scores)
        return (sum(self._scores) / T)**0.5

Below is the complete file for an implementation of RMSE.

::

    import argparse

    import numpy as np

    from librec_auto.core.eval.metrics.row_based_metric import RowBasedMetric


    def read_args():
        """
        Parse command line arguments.
        """
        parser = argparse.ArgumentParser(description='My custom metric')
        parser.add_argument('--test', help='Path to test.')
        parser.add_argument('--result', help='Path to results.')
        parser.add_argument('--output-file', help='The output pickle file.')

        # Custom params defined in the config go here
        parser.add_argument('--foo', help='The weight for re-ranking.')

        input_args = parser.parse_args()
        return vars(input_args)


    class CustomRmseMetric(RowBasedMetric):
        def __init__(self, params: dict, test_data: np.array,
                     result_data: np.array, output_file) -> None:
            super().__init__(params, test_data, result_data, output_file)
            self._name = 'RMSE'

        def evaluate_row(self, test: np.array, result: np.array):
            test_ranking = test[2]
            result_ranking = result[2]
            return (test_ranking - result_ranking)**2

        def post_row_processing(self):
            T = len(self._scores)
            return (sum(self._scores) / T)**0.5


    if __name__ == '__main__':
        args = read_args()
        params = {'foo': args['foo']}

        test_data = CustomRmseMetric.read_data_from_file(args['test'])
        result_data = CustomRmseMetric.read_data_from_file(args['result'],
                                                           delimiter=',')

        custom = CustomRmseMetric(params, test_data, result_data,
                                  args['output_file'])

        custom.evaluate()

Adding a list-based metric (e.g., NDCG)
---------------------------------------

For metrics that require the entire result list for their computation, ``librec-auto`` provides the ``ListBasedMetric`` superclass, which custom metric classes can inherit.

Required boilerplate
~~~~~~~~~~~~~~~~~~~~

See above for the argument parsing and main function boilerplate. Both are required for row- and list-based metrics alike, and they are identical for either.

1. Create the new class file
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Make a file in your study directory and give it a clear name. Let's assume a file named ``custom_ndcg_metric.py``.

2. Override the ``ListBasedMetric`` methods
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

In the ``custom_ndcg_metric.py`` file, we'll want to copy the boilerplate from above and then create a subclass of ``ListBasedMetric``:

::

    from librec_auto.core.eval.metrics.list_based_metric import ListBasedMetric

    class CustomNdcgMetric(ListBasedMetric):
        ...

``__init__``
""""""""""""

Override ``__init__`` and set ``self._name`` to the name of the metric; here we also store the list size passed in ``params``. Do not forget to call ``super().__init__``.

::

    def __init__(self, params: dict, test_data: np.array,
                 result_data: np.array, output_file: str) -> None:
        super().__init__(params, test_data, result_data, output_file)
        self._name = 'NDCG'
        self._list_size = params['list_size']

``evaluate_user``
"""""""""""""""""

This method produces a metric value for a single user from the test and result arrays for that user. Each array contains every row in the corresponding data set that belongs to the user.
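For orientation, here is a small, purely hypothetical example of the per-user slices that ``evaluate_user`` receives; the column layout (user id, item id, rating or score) is an assumption made to match the indexing used in the NDCG code below:

::

    import numpy as np

    # Hypothetical per-user slices; columns assumed to be (user, item, rating/score)
    test_user_data = np.array([[7, 101, 5.0],      # user 7's held-out test ratings
                               [7, 205, 3.0]])
    result_user_data = np.array([[7, 205, 0.93],   # user 7's recommendation list,
                                 [7, 101, 0.87],   # highest predicted score first
                                 [7, 333, 0.42]])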
``evaluate_user`` for NDCG follows. (Note that ``self._list_size`` originates as a parameter in ``config.xml``, is read from the command line in ``__main__``, and is stored on the metric in ``__init__``.)

::

    def evaluate_user(self, test_user_data: np.array,
                      result_user_data: np.array) -> float:
        rec_num = int(self._list_size)

        idealOrder = test_user_data
        idealDCG = 0.0
        for j in range(min(rec_num, len(idealOrder))):
            idealDCG += ((math.pow(2.0, len(idealOrder) - j) - 1) /
                         math.log(2.0 + j))

        recDCG = 0.0
        test_user_items = list(test_user_data[:, 1])
        for j in range(rec_num):
            item = int(result_user_data[j][1])
            if item in test_user_items:
                # Relevance is taken from the item's position in the test list
                rank = len(test_user_items) - test_user_items.index(item)
                recDCG += ((math.pow(2.0, rank) - 1) /
                           math.log(1.0 + j + 1))

        return recDCG / idealDCG

``preprocessing`` and ``postprocessing``
""""""""""""""""""""""""""""""""""""""""

``preprocessing`` should be used to set up initial values for the metric that are not passed from ``config.xml``. The result of every call to ``evaluate_user`` is saved to ``self._values``, which should be accessed in ``postprocessing`` to produce a single final value.

``postprocessing`` for NDCG follows:

::

    def postprocessing(self):
        return np.average(self._values)

``__main__``
""""""""""""

Use the main function to parse the file arguments into class parameters, to initialize the custom metric class, and to call ``.evaluate()``. The main function for NDCG follows:

::

    if __name__ == '__main__':
        args = read_args()
        params = {'list_size': args['list_size']}

        test_data = ListBasedMetric.read_data_from_file(
            args['test']
        )
        result_data = ListBasedMetric.read_data_from_file(
            args['result'],
            delimiter=','
        )

        custom = CustomNdcgMetric(params, test_data, result_data,
                                  args['output_file'])

        custom.evaluate()

Below is the complete file for a custom implementation of NDCG.

::

    import argparse
    import math

    import numpy as np

    from librec_auto.core.eval.metrics.list_based_metric import ListBasedMetric


    def read_args():
        """
        Parse command line arguments.
        """
        parser = argparse.ArgumentParser(description='My custom metric')
        parser.add_argument('--test', help='Path to test.')
        parser.add_argument('--result', help='Path to results.')
        parser.add_argument('--output-file', help='The output pickle file.')

        # Custom params defined in the config go here
        parser.add_argument('--list-size', help='Size of the list for NDCG.')

        input_args = parser.parse_args()
        return vars(input_args)


    class CustomNdcgMetric(ListBasedMetric):
        def __init__(self, params: dict, test_data: np.array,
                     result_data: np.array, output_file: str) -> None:
            super().__init__(params, test_data, result_data, output_file)
            self._name = 'NDCG'
            self._list_size = params['list_size']

        def evaluate_user(self, test_user_data: np.array,
                          result_user_data: np.array) -> float:
            rec_num = int(self._list_size)

            idealOrder = test_user_data
            idealDCG = 0.0
            for j in range(min(rec_num, len(idealOrder))):
                idealDCG += ((math.pow(2.0, len(idealOrder) - j) - 1) /
                             math.log(2.0 + j))

            recDCG = 0.0
            test_user_items = list(test_user_data[:, 1])
            for j in range(rec_num):
                item = int(result_user_data[j][1])
                if item in test_user_items:
                    # Relevance is taken from the item's position in the test list
                    rank = len(test_user_items) - test_user_items.index(item)
                    recDCG += ((math.pow(2.0, rank) - 1) /
                               math.log(1.0 + j + 1))

            return recDCG / idealDCG

        def postprocessing(self):
            return np.average(self._values)


    if __name__ == '__main__':
        args = read_args()
        params = {'list_size': args['list_size']}

        test_data = ListBasedMetric.read_data_from_file(
            args['test']
        )
        result_data = ListBasedMetric.read_data_from_file(
            args['result'],
            delimiter=','
        )

        custom = CustomNdcgMetric(params, test_data, result_data,
                                  args['output_file'])

        custom.evaluate()
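The same skeleton carries over to other list-based metrics. As a purely hypothetical illustration (``CustomPrecisionMetric`` is not part of ``librec-auto``), a precision-at-k metric only needs a different ``evaluate_user``; it reuses the same argument-parsing and ``__main__`` boilerplate shown above, with ``--list-size`` as its custom parameter:

::

    import numpy as np

    from librec_auto.core.eval.metrics.list_based_metric import ListBasedMetric


    class CustomPrecisionMetric(ListBasedMetric):
        """Hypothetical precision@k metric built on the same superclass."""

        def __init__(self, params: dict, test_data: np.array,
                     result_data: np.array, output_file: str) -> None:
            super().__init__(params, test_data, result_data, output_file)
            self._name = 'Precision'
            self._list_size = params['list_size']

        def evaluate_user(self, test_user_data: np.array,
                          result_user_data: np.array) -> float:
            k = int(self._list_size)
            # Items the user actually interacted with in the test split
            test_items = {int(item) for item in test_user_data[:, 1]}
            # Top-k recommended items for this user
            top_k = [int(item) for item in result_user_data[:k, 1]]
            return sum(1 for item in top_k if item in test_items) / k

        def postprocessing(self):
            # Average the per-user precision values collected in self._values
            return np.average(self._values)

As with the NDCG example, the per-user results accumulate in ``self._values`` and ``postprocessing`` reduces them to a single number.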