Run a Study¶
Imagine you want to run a study with librec_auto.
First, you will need to set up a configuration file. See the Configuration File documentation for information on the elements of a configuration file. Once your configuration file is complete, you can run your study.
Definitions¶
What is an experiment?¶
An experiment is a single job from the librec library. This is what happens when you call the librec jar from the command line.
What is a study?¶
A study is a collection of experiments using the same algorithm, the same data set, and the same methodology. librec_auto automates running multiple experiments at once by varying algorithm hyperparameters, and the entity that encompasses a set of related experiments is called a study. If you want to examine multiple algorithms or multiple data sets, you will need to define multiple studies.
Methodology¶
The methodology in a librec-auto study includes information about how data is handled, what kinds of predictions are generated, and how they are evaluated. The methodology information is spread across several parts of the configuration file, and there are certain “gotchas” you should be aware of.
- As of LibRec 3.0, all cross-validation splits are drawn uniform-at-random from the ratings data set. The size of each split is N/k, where N is the number of ratings and k is the number of splits. Other parameters (the style of split) that can be used in the single-split ratio model are ignored.
- Ranking vs rating prediction methodology. If you are using a metric that looks at a ranked list of results (pretty much anything other than rmse, mae, and certain of the fairness metrics), you must specify the ranking element and the list-size element in the metric element in your configuration. See the example after this list. You will get very strange ranking results if LibRec thinks you are using a rating prediction methodology.
- Ranking vs rating prediction algorithms. Some algorithms are optimized for ranking loss, and the results that they produce may not be interpretable as predicted ratings for a user/item pair. Examples include BPR, RankALS, and PLSA. These algorithms should only be used with the ranking element set and an appropriate ranking-type metric.
<ranking>true</ranking>
<list-size>10</list-size>
Any list size can be used.
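The cross-validation behavior described in the first bullet above can be illustrated with a short Python sketch. This is not LibRec's implementation. It is a minimal illustration, assuming ratings are held as a list of (user, item, rating) triples, and the function name kfold_uniform_split and the toy data are invented for this example.

```python
import random

def kfold_uniform_split(ratings, k, seed=0):
    """Assign ratings to k folds uniformly at random (about N/k ratings per fold)."""
    shuffled = list(ratings)
    random.Random(seed).shuffle(shuffled)
    # Striding over the shuffled list yields fold sizes that differ by at most one.
    return [shuffled[i::k] for i in range(k)]

# Toy data, invented for illustration: 100 (user, item, rating) triples.
ratings = [(u, i, 1 + (u + i) % 5) for u in range(10) for i in range(10)]
folds = kfold_uniform_split(ratings, k=5)
print([len(fold) for fold in folds])  # [20, 20, 20, 20, 20]
```

Note that the split ignores users and items entirely: only the total number of ratings N and the number of folds k determine the fold sizes.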
File Structure¶
Study Structure¶
librec_auto has a specific project structure. If you want to run a study named movies, you will need to put your config.xml file in a conf directory inside a movies directory, like this:
movies
└── conf
    └── config.xml
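If you prefer to script this layout, the following Python sketch creates the conf directory and reports where config.xml should be placed. The helper name study_config_path is hypothetical and is not part of librec_auto. The example writes into a temporary directory so that it can be run safely.

```python
from pathlib import Path
import tempfile

def study_config_path(root, study_name):
    """Create <root>/<study_name>/conf if needed and return where config.xml belongs."""
    conf_dir = Path(root) / study_name / "conf"
    conf_dir.mkdir(parents=True, exist_ok=True)
    return conf_dir / "config.xml"

# Demonstration in a throwaway directory; use your own working directory in practice.
with tempfile.TemporaryDirectory() as tmp:
    target = study_config_path(tmp, "movies")
    print(target.relative_to(tmp).as_posix())  # movies/conf/config.xml
```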
You can then run your movies study with:
$ python -m librec_auto run -t movies
This will update the movies directory to look like this:
movies
├── conf
│   └── config.xml
├── exp00000
│   ├── conf
│   │   ├── config.xml
│   │   └── librec.properties
│   ├── log
│   │   └── librec-<timestamp>.log
│   ├── original
│   └── result
│       ├── out-1.txt
│       └── ...
├── exp00001
│   ├── conf
│   │   ├── config.xml
│   │   └── librec.properties
│   ├── log
│   │   └── librec-<timestamp>.log
│   ├── original
│   └── result
│       ├── out-1.txt
│       └── ...
├── exp00002
│   └── ...
├── exp00003
│   └── ...
└── ...
output.xml
Each directory like exp00001 represents one of the experiments from your movies study. The number of exp##### directories is equal to the number of permutations of the value items in your study-wide configuration file. The output.xml file contains a summary of results over all the experiments and also a log of any warnings or errors encountered.
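To see how the number of experiment directories grows, here is a rough Python sketch that enumerates every combination of hyperparameter values. The parameter names and values are invented for illustration, and the pairing of each combination with a particular directory number is an assumption, not documented librec_auto behavior.

```python
from itertools import product

# Hypothetical hyperparameters, each with the list of values to try.
search_space = {
    "num-factors": [10, 20, 50],
    "learning-rate": [0.01, 0.1],
}

# One experiment per combination of values: 3 * 2 = 6 here.
combos = list(product(*search_space.values()))
for index, values in enumerate(combos):
    settings = dict(zip(search_space, values))
    # The index-to-directory mapping shown here is assumed for illustration.
    print(f"exp{index:05d}", settings)

print(len(combos))  # 6
```

Adding a third hyperparameter with four values would multiply the count again, giving 24 experiments, so the number of directories can grow quickly.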
Experiment Structure¶
Let’s consider a single experiment directory:
exp00002
├── conf
│   ├── config.xml
│   └── librec.properties
├── log
│   └── librec-<timestamp>.log
├── original
└── result
    ├── out-1.txt
    ├── out-2.txt
    ├── out-3.txt
    ├── out-4.txt
    └── out-5.txt
output.xml
- conf holds the auto-generated configuration file for this experiment (not for the study), as well as the librec.properties equivalent of the config.xml.
  - Don’t tamper with these files: to edit the experiment configurations, modify the study-wide movies/conf/config.xml file.
- log holds the log output from running the experiment. Many LibRec algorithms output log information about the training phase, and this can be found here.
- result holds the computed recommendation lists or predictions from the librec experiment.
- original is a directory used for experiments involving result re-ranking. The re-ranker will copy the original recommendation output from the algorithm to this directory. Re-ranked results are then placed in the result directory so they can be located by subsequent processes. You can experiment with multiple hyperparameters for a re-ranking algorithm without recomputing the base recommendations. For example:
  - Re-rank the results with python -m librec_auto rerank movies
- output.xml is a file that contains a summary of the experiment run. Metric results are stored here as well as any warnings or error messages encountered.
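Given the layout described above, a small script can list which result files each experiment produced. The sketch below relies only on the directory and file names shown in this section (exp##### directories containing result/out-*.txt). The function name list_experiment_results is hypothetical and is not part of librec_auto.

```python
from pathlib import Path

def list_experiment_results(study_dir):
    """Map each exp##### directory name to the sorted names of its result files."""
    results = {}
    for exp_dir in sorted(Path(study_dir).glob("exp[0-9]*")):
        if exp_dir.is_dir():
            # File names are sorted as strings, so out-10.txt sorts before out-2.txt.
            names = sorted(p.name for p in (exp_dir / "result").glob("out-*.txt"))
            results[exp_dir.name] = names
    return results

# Usage, assuming a movies study directory exists in the current directory:
for name, files in list_experiment_results("movies").items():
    print(name, files)
```

If the study directory does not exist or has not been run yet, the function returns an empty dictionary and the loop prints nothing.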