Installation and use of FFM/libffm on Windows/Linux

Yu-Chin Juan, the author of FFM, has open-sourced its C++ implementation, libffm, on GitHub. Since my day-to-day data processing is done in Python, I wanted a Python version of FFM. There are many related projects on GitHub, such as this one: A Python wrapper for LibFFM.

Installation of libffm in Windows+Anaconda environment

Installation of libffm-python package

The project is installed on Windows as follows.

Download the project locally and unzip it.
Install the mingw32 environment: conda install mingw32
Add the mingw32 path to the PATH environment variable: C:\RBuildTools\3.5\mingw_32\bin
Modify Python's compiler settings in D:\ProgramData\Anaconda3\Lib\distutils\distutils.cfg. If the file does not exist, create it and add the following content:


[build]
compiler=mingw32

In the project directory, run: python setup.py install
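The distutils.cfg step can also be done from Python. The sketch below uses hypothetical helpers (write_distutils_cfg and default_distutils_dir are not part of any package) that create or update distutils.cfg with the same [build] setting. It assumes the interpreter still ships distutils in its standard library, which is true for the Python 3.6 Anaconda used here but not for Python 3.12 and later.

```python
import configparser
import os
import sysconfig


def default_distutils_dir():
    """distutils directory inside the running interpreter's standard library.

    Assumption: distutils is still part of the standard library
    (true for Python 3.6; it was removed in Python 3.12).
    """
    return os.path.join(sysconfig.get_paths()["stdlib"], "distutils")


def write_distutils_cfg(distutils_dir, compiler="mingw32"):
    """Create or update distutils.cfg so that [build] uses `compiler`."""
    cfg_path = os.path.join(distutils_dir, "distutils.cfg")
    parser = configparser.ConfigParser()
    parser.read(cfg_path)  # a missing file is silently ignored
    if not parser.has_section("build"):
        parser.add_section("build")
    parser.set("build", "compiler", compiler)
    with open(cfg_path, "w") as f:
        parser.write(f)
    return cfg_path
```

Usage: write_distutils_cfg(default_distutils_dir()) run with the Anaconda Python; writing under ProgramData may require administrator rights.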

However, importing the package fails with the following error:

---------------------------------------------------------------------------
OSError                                   Traceback (most recent call last)
<ipython-input-2-244abf364e9b> in <module>
----> 1 import ffm

D:\ProgramData\Anaconda3\lib\site-packages\ffm-7e8621d-py3.6-win-amd64.egg\ffm\__init__.py in <module>
----> 1 from .ffm import FFMData, FFM, read_model

D:\ProgramData\Anaconda3\lib\site-packages\ffm-7e8621d-py3.6-win-amd64.egg\ffm\ffm.py in <module>
     70 FFM_Problem_ptr = ctypes.POINTER(FFM_Problem)
     71 
---> 72 _lib = ctypes.cdll.LoadLibrary(get_lib_path())
     73 
     74 _lib.ffm_convert_data.restype = FFM_Problem

D:\ProgramData\Anaconda3\lib\ctypes\__init__.py in LoadLibrary(self, name)
    424 
    425     def LoadLibrary(self, name):
--> 426         return self._dlltype(name)
    427 
    428 cdll = LibraryLoader(CDLL)

D:\ProgramData\Anaconda3\lib\ctypes\__init__.py in __init__(self, name, mode, handle, use_errno, use_last_error)
    346 
    347         if handle is None:
--> 348             self._handle = _dlopen(self._name, mode)
    349         else:
    350             self._handle = handle

OSError: [WinError 87] The parameter is incorrect.

The root cause is that the libffm shared library (libffm.so) was never compiled during installation on Windows, so the installation is effectively broken.
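To confirm this, you can look inside the installed package for the compiled library without triggering the failing import. This is a hypothetical check (find_ffm_libraries and ffm_package_dir are illustration names); it assumes the library file name starts with libffm, as it does for the Linux build later in this post.

```python
import glob
import importlib.util
import os


def find_ffm_libraries(package_dir):
    """Return the libffm* files (compiled shared libraries) in package_dir."""
    return sorted(glob.glob(os.path.join(package_dir, "libffm*")))


def ffm_package_dir():
    """Locate the installed ffm package without executing its __init__.py."""
    spec = importlib.util.find_spec("ffm")
    if spec is None or not spec.submodule_search_locations:
        return None
    return list(spec.submodule_search_locations)[0]


if __name__ == "__main__":
    pkg = ffm_package_dir()
    if pkg is None:
        print("ffm package not found")
    else:
        libs = find_ffm_libraries(pkg)
        print(pkg, libs if libs else "-> no compiled libffm library")
```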

Compilation of Libffm on Windows

Since the Python package did not work, I decided to compile the C++ code directly. According to the project README, only libffm v1.21 supports Windows:

Building Windows Binaries
=========================

The Windows part is maintained by different maintainer, so it may not always support the latest version.

The latest version it supports is: v1.21

To build them via command-line tools of Visual C++, use the following steps:

1. Open a DOS command box (or Developer Command Prompt for Visual Studio) and go to LIBFFM directory. If environment
variables of VC++ have not been set, type

"C:\Program Files (x86)\Microsoft Visual Studio 12.0\VC\bin\amd64\vcvars64.bat"

You may have to modify the above command according which version of VC++ or
where it is installed.

2. Type

nmake -f Makefile.win clean all

Following the steps above, the first error was that nmake could not be found:

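nmake : The term 'nmake' is not recognized as the name of a cmdlet, function, script file, or operable program. Check the spelling of the name, or if a path was included, verify that the path is correct and try again.
At line:1 char:1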
+ nmake -f Makefile.win clean all
+ ~~~~~
    + CategoryInfo          : ObjectNotFound: (nmake:String) [], CommandNotFoundException
    + FullyQualifiedErrorId : CommandNotFoundException

My first fix was to add the directory containing nmake to PATH. nmake then ran, but compilation still failed, this time because the compiler could not find the standard headers it includes (fatal error C1034: algorithm: no include path set):

PS E:\Download\libffm-121> nmake -f Makefile.win clean all

Microsoft (R) Program Maintenance Utility Version 14.00.24210.0
Copyright (C) Microsoft Corporation.  All rights reserved.

        erase /Q *.obj *.exe windows\.
        rd windows
        mkdir windows
        cl.exe /nologo /O2 /EHsc /D "_CRT_SECURE_NO_DEPRECATE" /D "USEOMP" /D "USESSE" /openmp -c ffm.cpp
ffm.cpp
ffm.cpp(21): warning C4068: unknown pragma
ffm.cpp(22): fatal error C1034: algorithm: no include path set
NMAKE : fatal error U1077: '"C:\Program Files (x86)\Microsoft Visual Studio 14.0\VC\bin\cl.exe"' : return code '0x2'
Stop.

After searching online, I found that configuring the VC++ environment variables is more involved than it looks: PATH, LIB and INCLUDE all have to be set. The main reason is that VS2015 introduced the Universal CRT (ucrt), which requires the Windows 10 SDK, while uuid.lib has to be taken from the Windows 8.x SDK, so the configuration is rather tedious.

PATH C:\Program Files (x86)\Microsoft Visual Studio 14.0\VC\bin;C:\Program Files (x86)\Microsoft Visual Studio 14.0\Common7\IDE
LIB C:\Program Files (x86)\Microsoft Visual Studio 14.0\VC\lib;C:\Program Files (x86)\Windows Kits\10\Lib\10.0.10240.0\ucrt\x86;C:\Program Files (x86)\Windows Kits\8.1\Lib\winv6.3\um\x86
INCLUDE C:\Program Files (x86)\Microsoft Visual Studio 14.0\VC\include;C:\Program Files (x86)\Windows Kits\10\Include\10.0.10240.0\ucrt
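Instead of changing the system-wide variables, the same PATH, LIB and INCLUDE settings can be applied only to the nmake process. This is a sketch assuming Python runs on the same Windows machine and the install locations match those listed above; build_env and run_nmake are hypothetical helper names.

```python
import os
import shutil
import subprocess

# Install locations taken from the settings above (Visual Studio 14.0,
# Windows 10 SDK 10.0.10240.0, Windows 8.1 SDK); adjust them to your machine.
VS = r"C:\Program Files (x86)\Microsoft Visual Studio 14.0"
KITS = r"C:\Program Files (x86)\Windows Kits"

EXTRA = {
    "PATH": [VS + r"\VC\bin", VS + r"\Common7\IDE"],
    "LIB": [VS + r"\VC\lib",
            KITS + r"\10\Lib\10.0.10240.0\ucrt\x86",
            KITS + r"\8.1\Lib\winv6.3\um\x86"],
    "INCLUDE": [VS + r"\VC\include",
                KITS + r"\10\Include\10.0.10240.0\ucrt"],
}


def build_env(base=None, extra=EXTRA):
    """Copy base (default: os.environ) and prepend the extra directories."""
    env = dict(os.environ if base is None else base)
    for name, dirs in extra.items():
        parts = list(dirs)
        if env.get(name):
            parts.append(env[name])  # keep any existing value at the end
        env[name] = ";".join(parts)  # ';' separates entries on Windows
    return env


def run_nmake(libffm_dir):
    """Run 'nmake -f Makefile.win clean all' in libffm_dir (Windows only)."""
    env = build_env()
    # Look nmake up on the new PATH: on Windows the child's PATH is not
    # consulted when locating the program itself.
    nmake = shutil.which("nmake", path=env["PATH"]) or "nmake"
    return subprocess.run([nmake, "-f", "Makefile.win", "clean", "all"],
                          cwd=libffm_dir, env=env, check=True)
```

Usage: run_nmake(r"E:\Download\libffm-121"). Because only the child process sees the modified variables, other programs on the machine are unaffected.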

Adjust these paths to match your own installation. After that, running the command again compiles successfully, with only a few warnings:

PS E:\Download\libffm-121> nmake -f Makefile.win clean all

Microsoft (R) Program Maintenance Utility Version 14.00.24210.0
Copyright (C) Microsoft Corporation.  All rights reserved.

        erase /Q *.obj *.exe windows\.
        rd windows
        mkdir windows
        cl.exe /nologo /O2 /EHsc /D "_CRT_SECURE_NO_DEPRECATE" /D "USEOMP" /D "USESSE" /openmp -c ffm.cpp
ffm.cpp
ffm.cpp(21): warning C4068: unknown pragma
        cl.exe /nologo /O2 /EHsc /D "_CRT_SECURE_NO_DEPRECATE" /D "USEOMP" /D "USESSE" /openmp -c timer.cpp
timer.cpp
        cl.exe /nologo /O2 /EHsc /D "_CRT_SECURE_NO_DEPRECATE" /D "USEOMP" /D "USESSE" /openmp ffm-train.cpp ffm.obj timer.obj -Fewindows\ffm-train.exe
ffm-train.cpp
ffm-train.cpp(1): warning C4068: unknown pragma
        cl.exe /nologo /O2 /EHsc /D "_CRT_SECURE_NO_DEPRECATE" /D "USEOMP" /D "USESSE" /openmp ffm-predict.cpp ffm.obj timer.obj -Fewindows\ffm-predict.exe
ffm-predict.cpp

After compilation, a new windows folder is created under the source folder, containing two executables:

ffm-predict.exe
ffm-train.exe

Use of ffm-train.exe and ffm-predict.exe

The simplest way to use them is directly from the command line, as described in the project documentation:

Command Line Usage
==================

-   `ffm-train'

    usage: ffm-train [options] training_set_file [model_file]

    options:
    -l <lambda>: set regularization parameter (default 0.00002)
    -k <factor>: set number of latent factors (default 4)
    -t <iteration>: set number of iterations (default 15)
    -r <eta>: set learning rate (default 0.2)
    -s <nr_threads>: set number of threads (default 1)
    -p <path>: set path to the validation set
    --quiet: quiet model (no output)
    --no-norm: disable instance-wise normalization
    --auto-stop: stop at the iteration that achieves the best validation loss (must be used with -p)

    By default we do instance-wise normalization. That is, we normalize the 2-norm of each instance to 1. You can use
    `--no-norm' to disable this function.
    
    A binary file `training_set_file.bin' will be generated to store the data in binary format.

    Because FFM usually need early stopping for better test performance, we provide an option `--auto-stop' to stop at
    the iteration that achieves the best validation loss. Note that you need to provide a validation set with `-p' when
    you use this option.


-   `ffm-predict'

    usage: ffm-predict test_file model_file output_file
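When scripting ffm-train, it helps to assemble the argument list from the documented options rather than concatenating strings. The hypothetical helper ffm_train_args below only emits flags listed in the usage text above and enforces the rule that --auto-stop must be used with -p.

```python
def ffm_train_args(train_file, model_file=None, lam=None, k=None, t=None,
                   eta=None, threads=None, valid_file=None, quiet=False,
                   no_norm=False, auto_stop=False, exe="ffm-train"):
    """Build an ffm-train argument list from the documented options.

    Options left as None are omitted, so libffm's own defaults apply.
    """
    if auto_stop and valid_file is None:
        raise ValueError("--auto-stop must be used with -p <validation file>")
    args = [exe]
    for flag, value in (("-l", lam), ("-k", k), ("-t", t), ("-r", eta),
                        ("-s", threads), ("-p", valid_file)):
        if value is not None:
            args += [flag, str(value)]
    if quiet:
        args.append("--quiet")
    if no_norm:
        args.append("--no-norm")
    if auto_stop:
        args.append("--auto-stop")
    args.append(train_file)  # positional arguments come after the options
    if model_file is not None:
        args.append(model_file)
    return args
```

Usage: subprocess.run(ffm_train_args("bigdata.tr.txt", "model", valid_file="bigdata.te.txt", t=100, auto_stop=True), check=True). Passing a list instead of a shell string avoids quoting problems with paths that contain spaces.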

Alternatively, the executables can be called from Python through the command line:

import os
import subprocess

# Switch to the folder that contains the compiled executables
os.chdir(r'E:\Download\libffm-121\windows')
print(os.getcwd())

# Running an executable without arguments should print its usage message
subprocess.call('ffm-train', shell=True)
subprocess.call('ffm-predict', shell=True)

# Train a model with the default parameters
cmd = 'ffm-train bigdata.tr.txt model'
subprocess.call(cmd, shell=True)

# Use bigdata.te.txt as the validation set
cmd = 'ffm-train -p bigdata.te.txt bigdata.tr.txt model'
subprocess.call(cmd, shell=True)

# 5-fold cross-validation (-v is not listed in the usage above)
cmd = 'ffm-train -v 5 bigdata.tr.txt'
subprocess.call(cmd, shell=True)

# --quiet: do not print training information
cmd = 'ffm-train --quiet bigdata.tr.txt'
subprocess.call(cmd, shell=True)

# Predict
cmd = 'ffm-predict bigdata.te.txt model output.txt'
subprocess.call(cmd, shell=True)

# Disk-based training (--no-rand and --on-disk are not listed in the usage above)
cmd = 'ffm-train --no-rand --on-disk bigdata.tr.txt'
subprocess.call(cmd, shell=True)

# --auto-stop: stop at the iteration with the best validation loss (requires -p)
cmd = 'ffm-train -p bigdata.te.txt -t 100 --auto-stop bigdata.tr.txt'
subprocess.call(cmd, shell=True)

The training files used in the sample code are available at:

https://github.com/keyunluo/python-ffm/tree/master/example/libffm-format
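These files use libffm's text format, in which each line is "<label> <field>:<feature>:<value> ..." with non-negative integer field and feature indices. A minimal sketch of producing that format from categorical data follows; it assumes each column is one field and each distinct column value is one binary feature with value 1, and to_libffm_lines is a hypothetical helper, not part of libffm.

```python
def to_libffm_lines(rows, labels, columns, feature_index=None):
    """Encode categorical rows as libffm text lines.

    Each column is one field (numbered by its position) and each distinct
    (column, value) pair is one feature with value 1. Pass the mapping
    returned for the training data when encoding test data; values not
    seen during training are then skipped.
    """
    growing = feature_index is None
    if growing:
        feature_index = {}
    lines = []
    for label, row in zip(labels, rows):
        parts = [str(int(label))]
        for field, (column, value) in enumerate(zip(columns, row)):
            key = (column, value)
            if key not in feature_index:
                if not growing:
                    continue  # unseen in training data: leave it out
                feature_index[key] = len(feature_index)
            parts.append("%d:%d:1" % (field, feature_index[key]))
        lines.append(" ".join(parts))
    return lines, feature_index
```

The returned lines can be written with open(path, "w").write("\n".join(lines) + "\n") and passed to ffm-train or ffm-predict.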

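Because calling the executables this way is cumbersome, I found another open-source project that wraps them in a scikit-learn style classifier. Its code is below; note that it relies on the helper modules pandas_extensions and ExeEstimator from that project, and its docstring describes an older libffm command line (for example --norm and --no-rand rather than --no-norm):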

from __future__ import print_function, absolute_import

import os, sys, subprocess, shlex, tempfile, time, sklearn.base, math
import numpy as np
import pandas as pd
from pandas_extensions import * 
from ExeEstimator import *

class LibFFMClassifier(ExeEstimator, sklearn.base.ClassifierMixin):
  '''
  options:
  -l <lambda>: set regularization parameter (default 0)
  -k <factor>: set number of latent factors (default 4)
  -t <iteration>: set number of iterations (default 15)
  -r <eta>: set learning rate (default 0.1)
  -s <nr_threads>: set number of threads (default 1)
  -p <path>: set path to the validation set
  --quiet: quiet model (no output)
  --norm: do instance-wise normalization
  --no-rand: disable random update
  `--norm' helps you to do instance-wise normalization. When it is enabled,
  you can simply assign `1' to `value' in the data.
  '''
  def __init__(self, columns, lambda_v=0, factor=4, iteration=15, eta=0.1, 
    nr_threads=1, quiet=False, normalize=None, no_rand=None):
    ExeEstimator.__init__(self)
    
    self.columns = columns.tolist() if hasattr(columns, 'tolist') else columns
    self.lambda_v = lambda_v
    self.factor = factor
    self.iteration = iteration
    self.eta = eta
    self.nr_threads = nr_threads
    self.quiet = quiet
    self.normalize = normalize
    self.no_rand = no_rand

  def fit(self, X, y=None):
    if type(X) is str: train_file = X
    else: 
      if not hasattr(X, 'values'): X = pd.DataFrame(X, columns=self.columns)
      train_file = self.save_reusable('_libffm_train', 'to_libffm', X, y)
      
    # self._model_file = self.save_tmp_file(X, '_libffm_model', True)
    self._model_file = self.tmpfile('_libffm_model')

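    # Build the ffm-train command from the estimator's parameters
    command = 'utils/lib/ffm-train.exe' + ' -l ' + repr(self.lambda_v) + \
      ' -k ' + repr(self.factor) + ' -t ' + repr(self.iteration) + \
      ' -r ' + repr(self.eta) + ' -s ' + repr(self.nr_threads)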
    if self.quiet: command += ' --quiet'
    if self.normalize: command += ' --norm'
    if self.no_rand: command += ' --no-rand'  
    command += ' ' + train_file
    command += ' ' + self._model_file
    running_process = self.make_subprocess(command)
    self.close_process(running_process)
    return self

  def predict(self, X):  
    if type(X) is str: test_file = X
    else: 
      if not hasattr(X, 'values'): X = pd.DataFrame(X, columns=self.columns)
      test_file = self.save_reusable('_libffm_test', 'to_libffm', X)

    output_file = self.tmpfile('_libffm_predictions')

    command = 'utils/lib/ffm-predict.exe ' + test_file + ' ' + self._model_file + ' ' + output_file
    running_process = self.make_subprocess(command)
    self.close_process(running_process)
    preds = list(self.read_predictions(output_file))
    return preds

  def predict_proba(self, X):    
    # map() is lazy in Python 3, so build a list before converting to an array
    predictions = np.asarray([1 / (1 + math.exp(-p)) for p in self.predict(X)])
    return np.vstack([1 - predictions, predictions]).T
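The class above depends on ExeEstimator and pandas_extensions, which are not shown here. If you only need to train and predict on files already in libffm format, a self-contained sketch with subprocess is enough. FFMCommandLine and read_predictions are hypothetical names; the sketch assumes bin_dir holds the ffm-train and ffm-predict executables built above, uses the defaults from the usage text above, and returns the values ffm-predict writes (one per line) without transforming them.

```python
import os
import subprocess
import tempfile


class FFMCommandLine:
    """Hypothetical minimal wrapper around ffm-train / ffm-predict."""

    def __init__(self, bin_dir, k=4, t=15, eta=0.2, lam=0.00002, threads=1):
        self.bin_dir = bin_dir
        # Defaults copied from the ffm-train usage text above
        self.options = ["-k", str(k), "-t", str(t), "-r", str(eta),
                        "-l", str(lam), "-s", str(threads)]
        self.model_file = None

    def _exe(self, name):
        # On Windows, ".exe" is appended automatically when starting a program
        return os.path.join(self.bin_dir, name)

    def fit(self, train_file, valid_file=None):
        fd, self.model_file = tempfile.mkstemp(suffix=".model")
        os.close(fd)
        args = [self._exe("ffm-train")] + self.options
        if valid_file is not None:
            args += ["-p", valid_file]
        args += [train_file, self.model_file]
        subprocess.run(args, check=True)  # raises if training fails
        return self

    def predict(self, test_file):
        fd, out_file = tempfile.mkstemp(suffix=".out")
        os.close(fd)
        subprocess.run([self._exe("ffm-predict"), test_file,
                        self.model_file, out_file], check=True)
        return read_predictions(out_file)


def read_predictions(path):
    """Read one floating-point value per non-empty line."""
    with open(path) as f:
        return [float(line) for line in f if line.strip()]
```

Usage: FFMCommandLine(r"E:\Download\libffm-121\windows").fit("bigdata.tr.txt").predict("bigdata.te.txt").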

In short, libffm is hard to use on Windows, both to compile and to call, so if you have the choice, use it on Linux instead.

Installation of libffm in Linux+Anaconda environment

Installing the libffm-python package under Anaconda on Linux also fails, with the following error:

➜  libffm-python git:(master) python setup.py install
/home/qw/anaconda3/lib/python3.7/site-packages/setuptools/dist.py:481: UserWarning: The version specified ('7e8621d') is an invalid version, this may not work as expected with newer versions of setuptools, pip, and PyPI. Please see PEP 440 for more details.
  "details." % self.metadata.version
running install
running bdist_egg
running egg_info
creating ffm.egg-info
writing ffm.egg-info/PKG-INFO
writing dependency_links to ffm.egg-info/dependency_links.txt
writing requirements to ffm.egg-info/requires.txt
writing top-level names to ffm.egg-info/top_level.txt
writing manifest file 'ffm.egg-info/SOURCES.txt'
reading manifest file 'ffm.egg-info/SOURCES.txt'
writing manifest file 'ffm.egg-info/SOURCES.txt'
installing library code to build/bdist.linux-x86_64/egg
running install_lib
running build_py
creating build
creating build/lib.linux-x86_64-3.7
creating build/lib.linux-x86_64-3.7/ffm
copying ffm/__init__.py -> build/lib.linux-x86_64-3.7/ffm
copying ffm/ffm.py -> build/lib.linux-x86_64-3.7/ffm
running build_ext
building 'ffm.libffm' extension
creating build/temp.linux-x86_64-3.7
gcc -pthread -B /home/qw/anaconda3/compiler_compat -Wl,--sysroot=/ -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -I. -I/home/qw/anaconda3/include/python3.7m -c ffm.cpp -o build/temp.linux-x86_64-3.7/ffm.o -Wall -O3 -std=c++0x -march=native -DUSESSE -DUSEOMP
cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++
ffm.cpp:578: warning: ignoring #pragma omp parallel [-Wunknown-pragmas]
  578 | #pragma omp parallel for schedule(static) reduction(+: loss)
      | 
ffm.cpp:726: warning: ignoring #pragma omp parallel [-Wunknown-pragmas]
  726 |     #pragma omp parallel for schedule(static) reduction(+: loss)
      | 
gcc -pthread -B /home/qw/anaconda3/compiler_compat -Wl,--sysroot=/ -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -I. -I/home/qw/anaconda3/include/python3.7m -c timer.cpp -o build/temp.linux-x86_64-3.7/timer.o -Wall -O3 -std=c++0x -march=native -DUSESSE -DUSEOMP
cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++
g++ -pthread -shared -B /home/qw/anaconda3/compiler_compat -L/home/qw/anaconda3/lib -Wl,-rpath=/home/qw/anaconda3/lib -Wl,--no-as-needed -Wl,--sysroot=/ build/temp.linux-x86_64-3.7/ffm.o build/temp.linux-x86_64-3.7/timer.o -o build/lib.linux-x86_64-3.7/ffm/libffm.cpython-37m-x86_64-linux-gnu.so -fopenmp
/home/qw/anaconda3/compiler_compat/ld: build/temp.linux-x86_64-3.7/ffm.o: unable to initialize decompress status for section .debug_info
/home/qw/anaconda3/compiler_compat/ld: build/temp.linux-x86_64-3.7/ffm.o: unable to initialize decompress status for section .debug_info
/home/qw/anaconda3/compiler_compat/ld: build/temp.linux-x86_64-3.7/ffm.o: unable to initialize decompress status for section .debug_info
/home/qw/anaconda3/compiler_compat/ld: build/temp.linux-x86_64-3.7/ffm.o: unable to initialize decompress status for section .debug_info
build/temp.linux-x86_64-3.7/ffm.o: file not recognized: file format not recognized
collect2: error: ld returned 1 exit status
error: command 'g++' failed with exit status 1

At first I suspected the libffm code, so I replaced it with the latest version from GitHub, but the same error appeared. Checking further, I found the code itself was fine and compiled normally outside Anaconda. The culprit is the linker ld that Anaconda ships in ~/anaconda3/compiler_compat: the build passes -B /home/qw/anaconda3/compiler_compat to gcc, so this ld is used instead of the system one, and it cannot read the object files produced by the system compiler ("file format not recognized"). The fix is simple: rename the ld in ~/anaconda3/compiler_compat and run the installation again.
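The rename can also be scripted. This is a minimal sketch assuming the affected environment is the one running the script (sys.prefix is its root, ~/anaconda3 for the base environment); disable_conda_ld is a hypothetical helper. It keeps a backup so the change can be undone by renaming the file back.

```python
import os
import sys


def disable_conda_ld(prefix=sys.prefix, suffix=".bak"):
    """Rename <prefix>/compiler_compat/ld so gcc falls back to the system ld.

    Returns the backup path, or None if there is no ld to rename.
    """
    ld = os.path.join(prefix, "compiler_compat", "ld")
    if not os.path.exists(ld):
        return None
    backup = ld + suffix
    os.rename(ld, backup)
    return backup
```

Usage: call disable_conda_ld() with the Anaconda Python, then rerun python setup.py install; os.rename(backup, backup[:-len(".bak")]) restores the original linker.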
