Python-side Evaluation¶
librec-auto supports custom evaluation metrics implemented in Python.
Required boilerplate¶
Argument parsing¶
Regardless of the type of metric you’re implementing, you will need some boilerplate code, starting with a read_args method that handles the input passed to the custom metric.
def read_args():
    """
    Parse command line arguments.
    """
    parser = argparse.ArgumentParser(description='My custom metric')
    parser.add_argument('--test', help='Path to test.')
    parser.add_argument('--result', help='Path to results.')
    parser.add_argument('--output-file', help='The output pickle file.')
    # Custom params defined in the config go here
    parser.add_argument('--foo', help='The weight for re-ranking.')
    input_args = parser.parse_args()
    return vars(input_args)
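If you want to sanity-check the parser outside of librec-auto, you can simulate the flags it would be passed; the file paths and the foo value below are purely illustrative. Note that argparse converts dashes in flag names to underscores, so --output-file comes back as args['output_file'], and all values arrive as strings.
import sys

# Simulated invocation; paths and parameter values are made up for illustration.
sys.argv = ['custom_rmse_metric.py',
            '--test', 'path/to/test.txt',
            '--result', 'path/to/result.txt',
            '--output-file', 'rmse.pickle',
            '--foo', '0.5']
args = read_args()
assert args['output_file'] == 'rmse.pickle'   # dashes become underscores
assert args['foo'] == '0.5'                   # values are parsed as strings by default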
Main function¶
You will also need to start the main function with the following lines. Params specified in config.xml are passed to the custom metric script and are accessible via the args['param_name'] syntax (argparse converts dashes in flag names to underscores).
if __name__ == '__main__':
    args = read_args()
    params = {'foo': args['foo']}
    test_data = ListBasedMetric.read_data_from_file(
        args['test']
    )
    result_data = ListBasedMetric.read_data_from_file(
        args['result'],
        delimiter=','
    )
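For orientation (judging from how rows are indexed in the examples that follow), the parsed data is a two-dimensional numpy array whose rows hold user, item, and value columns. A hypothetical peek; the shapes and values shown are illustrative, not guaranteed by the API:
print(test_data.shape)    # e.g. (n_rows, 3)
print(test_data[0])       # e.g. [ 1. 42.  4. ] -> user, item, rating
print(result_data[0])     # e.g. [ 1. 42.  3.5] -> user, item, predicted score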
Adding a row-based metric (e.g., RMSE)¶
For metrics that depend only on an item’s expected and actual values (and not on the entire result list), librec-auto provides the RowBasedMetric superclass.
1. Create the new class file¶
First, make a file in your study directory and give it a clear name. Let’s assume a file named custom_rmse_metric.py.
2. Override the RowBasedMetric methods¶
In this custom_rmse_metric.py file, we’ll want to copy the boilerplate from above and then create a subclass of RowBasedMetric, like this:
from librec_auto.core.eval.metrics.row_based_metric import RowBasedMetric

class CustomRmseMetric(RowBasedMetric):
    ...
We’ll also want to override the following methods.
__init__¶
Override __init__ and set self._name equal to the name of the metric. Do not forget to call super().__init__.
def __init__(self, params: dict, test_data: np.array,
             result_data: np.array, output_file) -> None:
    super().__init__(params, test_data, result_data, output_file)
    self._name = 'RMSE'
evaluate_row¶
This method performs the actual evaluation of the results. The first parameter contains the test (expected) values for a given user-item combination. The second parameter contains the actual result values for that combination. Both are numpy arrays and need to be indexed to access the row values. The method should return the value of the metric for the given user-item combination.
Every time evaluate_row is executed, the returned value is saved to a _scores list in the metric class, which we’ll access in post_row_processing.
evaluate_row for RMSE follows:
def evaluate_row(self, test: np.array, result: np.array):
    test_ranking = test[2]
    result_ranking = result[2]
    return (test_ranking - result_ranking)**2
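To make the row format concrete, here is the same computation on a single made-up (user, item, rating) pair, outside the class:
import numpy as np

test_row = np.array([1, 42, 4.0])    # user, item, true rating (made-up values)
result_row = np.array([1, 42, 3.5])  # user, item, predicted rating
print((test_row[2] - result_row[2]) ** 2)   # 0.25 -- the value evaluate_row would return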
pre_row_processing and post_row_processing¶
The pre_row_processing method is for setting initial values or doing any other processing that should happen before any of the rows are processed. Think of it as setting up the metric.
The post_row_processing method should manipulate self._scores and return a single value that represents the final value of the metric.
post_row_processing for RMSE follows:
def post_row_processing(self):
    T = len(self._scores)
    return (sum(self._scores) / T)**0.5
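A quick arithmetic check with made-up squared errors shows what this returns:
scores = [0.25, 1.0, 0.25]                  # hypothetical per-row squared errors
print((sum(scores) / len(scores)) ** 0.5)   # ~0.707, the RMSE over these rows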
Below is the complete file for an implementation of RMSE.
import argparse
import numpy as np

from librec_auto.core.eval.metrics.row_based_metric import RowBasedMetric


def read_args():
    """
    Parse command line arguments.
    """
    parser = argparse.ArgumentParser(description='My custom metric')
    parser.add_argument('--test', help='Path to test.')
    parser.add_argument('--result', help='Path to results.')
    parser.add_argument('--output-file', help='The output pickle file.')
    # Custom params defined in the config go here
    parser.add_argument('--foo', help='The weight for re-ranking.')
    input_args = parser.parse_args()
    return vars(input_args)


class CustomRmseMetric(RowBasedMetric):
    def __init__(self, params: dict, test_data: np.array,
                 result_data: np.array, output_file) -> None:
        super().__init__(params, test_data, result_data, output_file)
        self._name = 'RMSE'

    def evaluate_row(self, test: np.array, result: np.array):
        test_ranking = test[2]
        result_ranking = result[2]
        return (test_ranking - result_ranking)**2

    def post_row_processing(self):
        T = len(self._scores)
        return (sum(self._scores) / T)**0.5


if __name__ == '__main__':
    args = read_args()
    params = {'foo': args['foo']}
    test_data = CustomRmseMetric.read_data_from_file(args['test'])
    result_data = CustomRmseMetric.read_data_from_file(args['result'],
                                                       delimiter=',')
    custom = CustomRmseMetric(params, test_data, result_data,
                              args['output_file'])
    custom.evaluate()
Adding a list-based metric (e.g., NDCG)¶
For metrics that require the entire result list for their computation, librec-auto provides the ListBasedMetric superclass, which custom metric classes can inherit from.
Required boilerplate¶
See above for the argument parsing and main function boilerplate. Both are required for row- and list-based metrics alike, and are identical in either case.
1. Create the new class file¶
Make a file in your study directory and give it a clear name. Let’s assume a file named custom_ndcg_metric.py.
2. Override the ListBasedMetric methods¶
In the custom_ndcg_metric.py file, we’ll want to copy the boilerplate from above and then import and subclass ListBasedMetric:
from librec_auto.core.eval.metrics.list_based_metric import ListBasedMetric

class CustomNdcgMetric(ListBasedMetric):
    ...
__init__¶
Override __init__ and set self._name equal to the name of the metric. Do not forget to call super().__init__. For NDCG we also store the list size that is passed in from config.xml.
def __init__(self, params: dict, test_data: np.array,
             result_data: np.array, output_file) -> None:
    super().__init__(params, test_data, result_data, output_file)
    self._name = 'NDCG'
    self._list_size = params['list_size']
evaluate_user¶
This method produces a metric value for a given user, based on test and result arrays that contain all of that user’s rows.
evaluate_user for NDCG follows. (Note that self._list_size originates in config.xml, is read in __main__, and is set in __init__.)
def evaluate_user(self, test_user_data: np.array,
                  result_user_data: np.array) -> float:
    rec_num = int(self._list_size)
    idealOrder = test_user_data
    idealDCG = 0.0
    for j in range(min(rec_num, len(idealOrder))):
        idealDCG += ((math.pow(2.0,
                               len(idealOrder) - j) - 1) /
                     math.log(2.0 + j))
    recDCG = 0.0
    test_user_items = list(test_user_data[:, 1])
    for j in range(rec_num):
        item = int(result_user_data[j][1])
        if item in test_user_items:
            rank = len(test_user_items) - test_user_items.index(item)
            recDCG += ((math.pow(2.0, rank) - 1) / math.log(1.0 + j + 1))
    return (recDCG / idealDCG)
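Here is a tiny, hypothetical single-user check, assuming (as the complete file below suggests) that the superclass constructor simply stores its arguments: user 1 has two relevant items in the test split, and the recommender’s top-2 list only recovers one of them.
import numpy as np

# Made-up (user, item, value) rows for a single user.
test_rows = np.array([[1, 10, 5.0], [1, 20, 4.0]])    # relevant items: 10 and 20
result_rows = np.array([[1, 10, 0.9], [1, 30, 0.8]])  # recommended: 10, then 30

metric = CustomNdcgMetric({'list_size': 2}, test_rows, result_rows, 'ndcg.pickle')
print(metric.evaluate_user(test_rows, result_rows))   # NDCG for this single user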
preprocessing and postprocessing¶
preprocessing should be used to set up initial values for the metric that are not passed from config.xml.
Results from every execution of evaluate_user are saved to self._values, which should be accessed in postprocessing to produce a single final value.
postprocessing for NDCG follows:
def postprocessing(self):
    return np.average(self._values)
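As with the row-based metric, a quick check with made-up per-user values:
import numpy as np

values = [0.83, 0.61, 1.0]   # hypothetical per-user NDCG scores, as collected in self._values
print(np.average(values))    # 0.8133..., the final study-level NDCG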
__main__¶
Use the main function to parse the command-line arguments into class parameters, to initialize the custom metric class, and to call .evaluate().
The main function for NDCG follows:
if __name__ == '__main__':
    args = read_args()
    params = {'list_size': args['list_size']}
    test_data = ListBasedMetric.read_data_from_file(
        args['test']
    )
    result_data = ListBasedMetric.read_data_from_file(
        args['result'],
        delimiter=','
    )
    custom = CustomNdcgMetric(params, test_data, result_data,
                              args['output_file'])
    custom.evaluate()
Below is the complete file for a custom implementation of NDCG.
import argparse
import math

import numpy as np

from librec_auto.core.eval.metrics.list_based_metric import ListBasedMetric


def read_args():
    """
    Parse command line arguments.
    """
    parser = argparse.ArgumentParser(description='My custom metric')
    parser.add_argument('--test', help='Path to test.')
    parser.add_argument('--result', help='Path to results.')
    parser.add_argument('--output-file', help='The output pickle file.')
    # Custom params defined in the config go here
    parser.add_argument('--list-size', help='Size of the list for NDCG.')
    input_args = parser.parse_args()
    return vars(input_args)


class CustomNdcgMetric(ListBasedMetric):
    def __init__(self, params: dict, test_data: np.array,
                 result_data: np.array, output_file: str) -> None:
        super().__init__(params, test_data, result_data, output_file)
        self._name = 'NDCG'
        self._list_size = params['list_size']

    def evaluate_user(self, test_user_data: np.array,
                      result_user_data: np.array) -> float:
        rec_num = int(self._list_size)
        idealOrder = test_user_data
        idealDCG = 0.0
        for j in range(min(rec_num, len(idealOrder))):
            idealDCG += ((math.pow(2.0,
                                   len(idealOrder) - j) - 1) /
                         math.log(2.0 + j))
        recDCG = 0.0
        test_user_items = list(test_user_data[:, 1])
        for j in range(rec_num):
            item = int(result_user_data[j][1])
            if item in test_user_items:
                rank = len(test_user_items) - test_user_items.index(item)
                recDCG += ((math.pow(2.0, rank) - 1) / math.log(1.0 + j + 1))
        return (recDCG / idealDCG)

    def postprocessing(self):
        return np.average(self._values)


if __name__ == '__main__':
    args = read_args()
    params = {'list_size': args['list_size']}
    test_data = ListBasedMetric.read_data_from_file(
        args['test']
    )
    result_data = ListBasedMetric.read_data_from_file(
        args['result'],
        delimiter=','
    )
    custom = CustomNdcgMetric(params, test_data, result_data,
                              args['output_file'])
    custom.evaluate()