Run a Study

Imagine you want to run a study with librec_auto.

First, you will need to set up a configuration file. See the Configuration File documentation for information on the elements of a configuration file. Once your configuration file is complete, you can run your study.

Definitions

What is an experiment?

An experiment is a single job from the librec library. This is what happens when you call the librec jar from the command line.

What is a study?

A study is a collection of experiments using the same algorithm, the same data set, and the same methodology. librec_auto automates running multiple experiments at once by varying algorithm hyperparameters; the entity that encompasses such a set of related experiments is called a study. If you want to examine multiple algorithms or multiple data sets, you will need to define multiple studies.
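
For example, a study that sweeps a single hyperparameter lists several value elements for it. The sketch below assumes a matrix factorization algorithm with a num-factors element; the element names are illustrative, so check the Configuration File documentation for the names your algorithm actually uses:

<alg>
    <class>biasedmf</class>
    <!-- one experiment is generated per value listed here -->
    <num-factors>
        <value>10</value>
        <value>20</value>
        <value>50</value>
    </num-factors>
</alg>

This single study would run three experiments that are identical except for the number of latent factors.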

Methodology

The methodology in a librec_auto study includes information about how data is handled, what kinds of predictions are generated, and how they are evaluated. This information is spread across several parts of the configuration file, and there are certain “gotchas” you should be aware of.

  • As of LibRec 3.0, all cross-validation splits are drawn uniformly at random from the ratings data set. The size of each split is N/k, where N is the number of ratings and k is the number of splits. Other parameters (such as the style of split) that apply in the single-split ratio model are ignored.
  • Ranking vs. rating prediction methodology. If you are using a metric that evaluates a ranked list of results (pretty much anything other than rmse, mae, and certain of the fairness metrics), you must specify the ranking element and the list-size element inside the metric element in your configuration, as shown below. You will get very strange results if LibRec thinks you are using a rating prediction methodology.
  • Ranking vs. rating prediction algorithms. Some algorithms (for example, BPR, RankALS, and PLSA) are optimized for ranking loss, and the results they produce may not be interpretable as predicted ratings for a user/item pair. These algorithms should only be used with the ranking element set and an appropriate ranking-type metric.
<ranking>true</ranking>
<list-size>10</list-size>

Any list size can be used.
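
Putting this together, the metric element in a configuration might look like the following sketch. The metric class shown (ndcg) is an assumption for illustration; substitute whatever ranking metric your study actually uses:

<metric>
    <ranking>true</ranking>
    <list-size>10</list-size>
    <class>ndcg</class>
</metric>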

File Structure

Study Structure

librec_auto has a specific project structure. If you want to run a study named movies, you will need to put your config.xml file in a conf directory inside a movies directory, like this:

movies
└── conf
    └── config.xml

You can then run your movies study with:

$ python -m librec_auto run -t movies

This will update the movies directory to look like this:

movies
├── conf
│   └── config.xml
├── exp00000
│   ├── conf
│   │   ├── config.xml
│   │   └── librec.properties
│   ├── log
│   │   └── librec-<timestamp>.log
│   ├── original
│   └── result
│       ├── out-1.txt
│       └── ...
├── exp00001
│   ├── conf
│   │   ├── config.xml
│   │   └── librec.properties
│   ├── log
│   │   └── librec-<timestamp>.log
│   ├── original
│   └── result
│       ├── out-1.txt
│       └── ...
├── exp00002
│   └── ...
├── exp00003
│   └── ...
├── ...
└── output.xml

Each directory like exp00001 represents one of the experiments in your movies study. The number of exp##### directories equals the number of parameter combinations generated by the value elements in your study-wide configuration file, as sketched below. The top-level output.xml contains a summary of results over all the experiments and also a log of any warnings or errors encountered.
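
To make the counting concrete, here is a sketch (again with illustrative element names) of a configuration fragment that varies two hyperparameters, each with two values:

<num-factors>
    <value>10</value>
    <value>20</value>
</num-factors>
<item-reg>
    <value>0.01</value>
    <value>0.1</value>
</item-reg>

librec_auto takes the cross product of the listed values, so this study produces 2 × 2 = 4 experiments: exp00000 through exp00003.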

Experiment Structure

Let’s consider a single experiment directory:

exp00002
├── conf
│   ├── config.xml
│   └── librec.properties
├── log
│   └── librec-<timestamp>.log
├── original
├── result
│   ├── out-1.txt
│   ├── out-2.txt
│   ├── out-3.txt
│   ├── out-4.txt
│   └── out-5.txt
└── output.xml
  • conf holds the auto-generated configuration file for this experiment (not for the study), as well as the librec.properties equivalent of the config.xml.
    • Don’t tamper with these files: to edit the experiment configurations, modify the study-wide movies/conf/config.xml file.
  • log holds the log output from running the experiment. Many LibRec algorithms log details of the training phase, and that output can be found here.
  • result holds the computed recommendation lists or predictions from the librec experiment.
  • original is a directory used for experiments involving result re-ranking. The re-ranker copies the original recommendation output from the algorithm to this directory. Re-ranked results are then placed in the result directory so they can be located by subsequent processes. This lets you experiment with multiple hyperparameter settings for a re-ranking algorithm without recomputing the base recommendations. For example:
    • Re-rank the results with python -m librec_auto rerank -t movies
  • output.xml contains a summary of the experiment run: metric results are stored here, along with any warnings or error messages encountered.
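
Putting this together, a typical re-ranking workflow using the commands shown above might look like the following (the exact options accepted may vary with your installed version of librec_auto):

$ python -m librec_auto run -t movies      # run the full study once
$ # edit the re-ranker hyperparameters in movies/conf/config.xml
$ python -m librec_auto rerank -t movies   # re-rank the cached results without retraining

Because each experiment's original recommendations are preserved in its original directory, the second command avoids recomputing the base recommendations.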