
Installing and Using FFM/libffm on Windows/Linux

钱魏Way

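Yu-Chin Juan, the author of FFM, open-sourced a C++ implementation called libffm on GitHub. Because my day-to-day data processing happens in Python, I wanted a Python version of FFM. GitHub has many related projects, for example this one: A Python wrapper for LibFFM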

Installing libffm on Windows + Anaconda

Installing the libffm-python package

On Windows, the project is installed as follows:

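  • Download the project and unzip it locally.
  • Install the mingw32 toolchain: conda install mingw32
  • Add the mingw32 directory to the PATH environment variable: C:\RBuildTools\3.5\mingw_32\bin
  • Change Python's compiler setting in D:\ProgramData\Anaconda3\Lib\distutils\distutils.cfg (create the file if it does not exist) by adding:
[build]
compiler=mingw32
  • In the project directory, run: python setup.py install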
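The distutils.cfg step can also be scripted. This is a minimal sketch: `ensure_mingw32_cfg` is a hypothetical helper (not part of libffm-python), it assumes you pass in the Lib\distutils folder of your Anaconda install, and it keeps any other sections already present in the file.

```python
import configparser
import os


def ensure_mingw32_cfg(distutils_dir):
    r"""Create or update distutils.cfg so that [build] uses compiler=mingw32.

    Hypothetical helper: distutils_dir is assumed to be the Lib\distutils
    folder of the Anaconda install, e.g. D:\ProgramData\Anaconda3\Lib\distutils.
    Returns the path of the written file.
    """
    cfg_path = os.path.join(distutils_dir, "distutils.cfg")
    parser = configparser.ConfigParser()
    parser.read(cfg_path)  # a missing file is silently ignored
    if not parser.has_section("build"):
        parser.add_section("build")
    parser.set("build", "compiler", "mingw32")
    # Rewrite the whole file; other existing sections are preserved
    with open(cfg_path, "w") as fh:
        parser.write(fh)
    return cfg_path
```

configparser writes the option as `compiler = mingw32`, which distutils reads the same way as the hand-written `compiler=mingw32`.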

However, importing the package raises the following error:

---------------------------------------------------------------------------
OSError                                   Traceback (most recent call last)
<ipython-input-2-244abf364e9b> in <module>
----> 1 import ffm

D:\ProgramData\Anaconda3\lib\site-packages\ffm-7e8621d-py3.6-win-amd64.egg\ffm\__init__.py in <module>
----> 1 from .ffm import FFMData, FFM, read_model

D:\ProgramData\Anaconda3\lib\site-packages\ffm-7e8621d-py3.6-win-amd64.egg\ffm\ffm.py in <module>
     70 FFM_Problem_ptr = ctypes.POINTER(FFM_Problem)
     71 
---> 72 _lib = ctypes.cdll.LoadLibrary(get_lib_path())
     73 
     74 _lib.ffm_convert_data.restype = FFM_Problem

D:\ProgramData\Anaconda3\lib\ctypes\__init__.py in LoadLibrary(self, name)
    424 
    425     def LoadLibrary(self, name):
--> 426         return self._dlltype(name)
    427 
    428 cdll = LibraryLoader(CDLL)

D:\ProgramData\Anaconda3\lib\ctypes\__init__.py in __init__(self, name, mode, handle, use_errno, use_last_error)
    346 
    347         if handle is None:
--> 348             self._handle = _dlopen(self._name, mode)
    349         else:
    350             self._handle = handle

OSError: [WinError 87] The parameter is incorrect.

The root cause is that the Windows installation never actually compiled the libffm.so library, so the installation had in fact failed.
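One way to confirm this is to look inside the installed package folder for a compiled library before importing it. A minimal sketch: `find_compiled_libffm` is a hypothetical helper, and the file patterns it searches for (`libffm*.so`, `libffm*.pyd`, `libffm*.dll`) are assumptions about how the built extension might be named; the Linux build log later in this post shows a name of the form libffm.cpython-37m-x86_64-linux-gnu.so.

```python
import glob
import os


def find_compiled_libffm(package_dir):
    """Return the compiled libffm libraries found in package_dir, sorted.

    An empty list means no C++ extension was built, which is consistent
    with the import failure shown above.
    """
    patterns = ("libffm*.so", "libffm*.pyd", "libffm*.dll")  # assumed names
    found = []
    for pattern in patterns:
        found.extend(glob.glob(os.path.join(package_dir, pattern)))
    return sorted(found)
```

For the install above, the folder to check would be the `ffm` directory inside D:\ProgramData\Anaconda3\lib\site-packages\ffm-7e8621d-py3.6-win-amd64.egg.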

Compiling libffm on Windows

Since the Python package did not work, I decided to compile the C++ code directly. According to the project's README, only libffm v1.21 supports Windows:

Building Windows Binaries
=========================

The Windows part is maintained by different maintainer, so it may not always support the latest version.

The latest version it supports is: v1.21

To build them via command-line tools of Visual C++, use the following steps:

1. Open a DOS command box (or Developer Command Prompt for Visual Studio) and go to LIBFFM directory. If environment
variables of VC++ have not been set, type

"C:\Program Files (x86)\Microsoft Visual Studio 12.0\VC\bin\amd64\vcvars64.bat"

You may have to modify the above command according which version of VC++ or
where it is installed.

2. Type

nmake -f Makefile.win clean all

Following these steps, the first error was that nmake could not be found:

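nmake : The term 'nmake' is not recognized as the name of a cmdlet, function, script file, or operable program. Check the spelling of the name, or if a path was included, verify that the path is correct and try again.
At line:1 char:1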
+ nmake -f Makefile.win clean all
+ ~~~~~
    + CategoryInfo          : ObjectNotFound: (nmake:String) [], CommandNotFoundException
    + FullyQualifiedErrorId : CommandNotFoundException

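The first fix was to add the directory containing nmake to the PATH environment variable. The build still failed, this time because the compiler could not find an included header: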

PS E:\Download\libffm-121> nmake -f Makefile.win clean all

Microsoft (R) Program Maintenance Utility Version 14.00.24210.0
Copyright (C) Microsoft Corporation.  All rights reserved.

        erase /Q *.obj *.exe windows\.
        rd windows
        mkdir windows
        cl.exe /nologo /O2 /EHsc /D "_CRT_SECURE_NO_DEPRECATE" /D "USEOMP" /D "USESSE" /openmp -c ffm.cpp
ffm.cpp
ffm.cpp(21): warning C4068: unknown pragma
ffm.cpp(22): fatal error C1034: algorithm: no include path set
NMAKE : fatal error U1077: '"C:\Program Files (x86)\Microsoft Visual Studio 14.0\VC\bin\cl.exe"': return code '0x2'
Stop.

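A quick search showed that setting up the VC++ environment variables by hand is fairly involved: PATH, LIB, and INCLUDE all have to be set. The main reason is that VS2015 introduced the UCRT (Universal CRT), so the Windows 10 SDK is needed as well, while uuid.lib has to come from the Windows 8.x SDK, which makes the configuration rather tedious: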

  • PATH: C:\Program Files (x86)\Microsoft Visual Studio 14.0\VC\bin;C:\Program Files (x86)\Microsoft Visual Studio 14.0\Common7\IDE
  • LIB: C:\Program Files (x86)\Microsoft Visual Studio 14.0\VC\lib;C:\Program Files (x86)\Windows Kits\10\Lib\10.0.10240.0\ucrt\x86;C:\Program Files (x86)\Windows Kits\8.1\Lib\winv6.3\um\x86
  • INCLUDE: C:\Program Files (x86)\Microsoft Visual Studio 14.0\VC\include;C:\Program Files (x86)\Windows Kits\10\Include\10.0.10240.0\ucrt
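The three variables can also be set for the build process only, instead of system-wide. This is a minimal sketch assuming the VS 2015 / Windows Kits layout listed above; `msvc_env` and `build_libffm` are hypothetical helper names, not part of libffm, and the commented usage lines must be adjusted to your own install paths.

```python
import os
import shutil
import subprocess


def msvc_env(path_dirs, lib_dirs, include_dirs, base=None):
    """Return a copy of the environment with VC++ PATH/LIB/INCLUDE prepended."""
    env = dict(os.environ if base is None else base)
    for name, dirs in (("PATH", path_dirs), ("LIB", lib_dirs), ("INCLUDE", include_dirs)):
        parts = list(dirs)
        if env.get(name):
            parts.append(env[name])  # keep whatever was already set
        env[name] = os.pathsep.join(parts)
    return env


def build_libffm(src_dir, env):
    """Run `nmake -f Makefile.win clean all` in src_dir with the given environment."""
    nmake = shutil.which("nmake", path=env["PATH"])
    if nmake is None:
        raise FileNotFoundError("nmake was not found on the PATH in env")
    return subprocess.call([nmake, "-f", "Makefile.win", "clean", "all"],
                           cwd=src_dir, env=env)


# Example usage (Windows only, paths from the list above):
# vs = r"C:\Program Files (x86)\Microsoft Visual Studio 14.0"
# kits = r"C:\Program Files (x86)\Windows Kits"
# env = msvc_env(
#     [vs + r"\VC\bin", vs + r"\Common7\IDE"],
#     [vs + r"\VC\lib", kits + r"\10\Lib\10.0.10240.0\ucrt\x86", kits + r"\8.1\Lib\winv6.3\um\x86"],
#     [vs + r"\VC\include", kits + r"\10\Include\10.0.10240.0\ucrt"],
# )
# build_libffm(r"E:\Download\libffm-121", env)
```

Because the environment is passed explicitly, the `shutil.which` check also catches the earlier "nmake not recognized" problem before the build starts.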

Adjust the paths to match your own installation. After that, the build succeeds with only a few warnings:

PS E:\Download\libffm-121> nmake -f Makefile.win clean all

Microsoft (R) Program Maintenance Utility Version 14.00.24210.0
Copyright (C) Microsoft Corporation. All rights reserved.

erase /Q *.obj *.exe windows\.
rd windows
mkdir windows
cl.exe /nologo /O2 /EHsc /D "_CRT_SECURE_NO_DEPRECATE" /D "USEOMP" /D "USESSE" /openmp -c ffm.cpp
ffm.cpp
ffm.cpp(21): warning C4068: unknown pragma
cl.exe /nologo /O2 /EHsc /D "_CRT_SECURE_NO_DEPRECATE" /D "USEOMP" /D "USESSE" /openmp -c timer.cpp
timer.cpp
cl.exe /nologo /O2 /EHsc /D "_CRT_SECURE_NO_DEPRECATE" /D "USEOMP" /D "USESSE" /openmp ffm-train.cpp ffm.obj timer.obj /Fewindows\ffm-train.exe
ffm-train.cpp
ffm-train.cpp(1): warning C4068: unknown pragma
cl.exe /nologo /O2 /EHsc /D "_CRT_SECURE_NO_DEPRECATE" /D "USEOMP" /D "USESSE" /openmp ffm-predict.cpp ffm.obj timer.obj /Fewindows\ffm-predict.exe
ffm-predict.cpp

When the build finishes, a windows folder is created in the source directory, containing two executables:

  • ffm-predict.exe
  • ffm-train.exe

Using ffm-train.exe and ffm-predict.exe

The simplest approach is to call them directly from the command line, as described in the project documentation:

Command Line Usage
==================

- `ffm-train'

usage: ffm-train [options] training_set_file [model_file]

options:
-l <lambda>: set regularization parameter (default 0.00002)
-k <factor>: set number of latent factors (default 4)
-t <iteration>: set number of iterations (default 15)
-r <eta>: set learning rate (default 0.2)
-s <nr_threads>: set number of threads (default 1)
-p <path>: set path to the validation set
--quiet: quiet model (no output)
--no-norm: disable instance-wise normalization
--auto-stop: stop at the iteration that achieves the best validation loss (must be used with -p)

By default we do instance-wise normalization. That is, we normalize the 2-norm of each instance to 1. You can use
`--no-norm' to disable this function.

A binary file `training_set_file.bin' will be generated to store the data in binary format.

Because FFM usually need early stopping for better test performance, we provide an option `--auto-stop' to stop at
the iteration that achieves the best validation loss. Note that you need to provide a validation set with `-p' when
you use this option.


- `ffm-predict'

usage: ffm-predict test_file model_file output_file

They can also be invoked from Python through the command line:

import os
import subprocess

# Switch to the directory that holds ffm-train.exe and ffm-predict.exe
os.chdir(r'E:\Download\libffm-121\windows')
print(os.getcwd())

# Running either program without arguments just prints its usage text
subprocess.call('ffm-train', shell=True)
subprocess.call('ffm-predict', shell=True)

# Train a model with the default parameters
cmd = 'ffm-train bigdata.tr.txt model'
subprocess.call(cmd, shell=True)

# Use bigdata.te.txt as the validation set
cmd = 'ffm-train -p bigdata.te.txt bigdata.tr.txt model'
subprocess.call(cmd, shell=True)

# 5-fold cross-validation
# (-v is not among the v1.21 options listed above; it may need another libffm version)
cmd = 'ffm-train -v 5 bigdata.tr.txt'
subprocess.call(cmd, shell=True)

# --quiet: do not print training progress
cmd = 'ffm-train --quiet bigdata.tr.txt'
subprocess.call(cmd, shell=True)

# Predict
cmd = 'ffm-predict bigdata.te.txt model output.txt'
subprocess.call(cmd, shell=True)

# Disk-based training
# (--no-rand and --on-disk are not among the v1.21 options listed above either)
cmd = 'ffm-train --no-rand --on-disk bigdata.tr.txt'
subprocess.call(cmd, shell=True)

# --auto-stop: stop at the iteration with the best validation loss (requires -p)
cmd = 'ffm-train -p bigdata.te.txt -t 100 --auto-stop bigdata.tr.txt'
subprocess.call(cmd, shell=True)
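Rather than hand-writing each command string, the argument list can be built from keyword parameters. A minimal sketch: `ffm_train_args` is a hypothetical helper that only emits the v1.21 options from the usage text above and leaves anything unset to ffm-train's own defaults.

```python
def ffm_train_args(train_file, model_file=None, lam=None, k=None, t=None,
                   r=None, s=None, valid_file=None, quiet=False,
                   no_norm=False, auto_stop=False, exe="ffm-train"):
    """Build an ffm-train argument list from the v1.21 options.

    Options left as None are omitted so ffm-train uses its defaults.
    """
    if auto_stop and valid_file is None:
        # The usage text says --auto-stop must be used with -p
        raise ValueError("--auto-stop requires a validation file (-p)")
    args = [exe]
    for flag, value in (("-l", lam), ("-k", k), ("-t", t), ("-r", r),
                        ("-s", s), ("-p", valid_file)):
        if value is not None:
            args += [flag, str(value)]
    if quiet:
        args.append("--quiet")
    if no_norm:
        args.append("--no-norm")
    if auto_stop:
        args.append("--auto-stop")
    args.append(train_file)
    if model_file is not None:
        args.append(model_file)
    return args
```

For example, `subprocess.call(ffm_train_args('bigdata.tr.txt', 'model', t=100, valid_file='bigdata.te.txt', auto_stop=True))`, run from the windows directory, issues the --auto-stop command without a shell. Passing a list rather than a string also lets subprocess handle quoting for paths that contain spaces.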

The training files used in the example commands are available at: https://github.com/keyunluo/python-ffm/tree/master/example/libffm-format
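These files use libffm's text format: one instance per line, a label followed by space-separated `field:feature:value` triples, with integer field and feature indices. As a hypothetical sketch (the helper names and the `(label, [(field, feature, value), ...])` input layout are assumptions), data held in plain Python lists could be written in that format like this:

```python
def to_libffm_line(label, triples):
    """Format one instance as '<label> <field>:<feature>:<value> ...'."""
    parts = [str(int(label))]
    for field, feature, value in triples:
        parts.append("%d:%d:%g" % (field, feature, value))
    return " ".join(parts)


def write_libffm(path, rows):
    """Write (label, [(field, feature, value), ...]) rows to a libffm text file."""
    with open(path, "w") as fh:
        for label, triples in rows:
            fh.write(to_libffm_line(label, triples) + "\n")
```

For instance, `to_libffm_line(1, [(0, 5, 1), (1, 12, 0.5)])` returns `"1 0:5:1 1:12:0.5"`.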

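Calling the executables through subprocess like this is tedious, so I found another open-source project that wraps them further, https://github.com/gatapia/py_ml_utils. Its wrapper code is: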

from __future__ import print_function, absolute_import

import os, sys, subprocess, shlex, tempfile, time, sklearn.base, math
import numpy as np
import pandas as pd
from pandas_extensions import *
from ExeEstimator import *

class LibFFMClassifier(ExeEstimator, sklearn.base.ClassifierMixin):
    '''
    options:
    -l<lambda>: set regularization parameter (default 0)
    -k<factor>: set number of latent factors (default 4)
    -t<iteration>: set number of iterations (default 15)
    -r<eta>: set learning rate (default 0.1)
    -s<nr_threads>: set number of threads (default 1)
    -p<path>: set path to the validation set
    --quiet: quiet model (no output)
    --norm: do instance-wise normalization
    --no-rand: disable random update
    `--norm' helps you to do instance-wise normalization. When it is enabled,
    you can simply assign `1' to `value' in the data.
    '''
    def __init__(self, columns, lambda_v=0, factor=4, iteration=15, eta=0.1,
                 nr_threads=1, quiet=False, normalize=None, no_rand=None):
        ExeEstimator.__init__(self)

        self.columns = columns.tolist() if hasattr(columns, 'tolist') else columns
        self.lambda_v = lambda_v
        self.factor = factor
        self.iteration = iteration
        self.eta = eta
        self.nr_threads = nr_threads
        self.quiet = quiet
        self.normalize = normalize
        self.no_rand = no_rand

    def fit(self, X, y=None):
        if type(X) is str: train_file = X
        else:
            if not hasattr(X, 'values'): X = pd.DataFrame(X, columns=self.columns)
            train_file = self.save_reusable('_libffm_train', 'to_libffm', X, y)

        #self._model_file = self.save_tmp_file(X, '_libffm_model', True)
        self._model_file = self.tmpfile('_libffm_model')

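        # Assemble the ffm-train command line from the estimator's hyperparameters
        command = 'utils/lib/ffm-train.exe -l %s -k %s -t %s -r %s -s %s' % (
            self.lambda_v, self.factor, self.iteration, self.eta, self.nr_threads)
        if self.quiet: command += ' --quiet'
        # --norm / --no-rand follow the older libffm interface in the docstring;
        # v1.21 normalizes by default, offers --no-norm instead, and lists no --no-rand
        if self.normalize: command += ' --norm'
        if self.no_rand: command += ' --no-rand'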
        command += ' ' + train_file
        command += ' ' + self._model_file
        running_process = self.make_subprocess(command)
        self.close_process(running_process)
        return self

    def predict(self, X):
        if type(X) is str: test_file = X
        else:
            if not hasattr(X, 'values'): X = pd.DataFrame(X, columns=self.columns)
            test_file = self.save_reusable('_libffm_test', 'to_libffm', X)

        output_file = self.tmpfile('_libffm_predictions')

        command = 'utils/lib/ffm-predict.exe ' + test_file + ' ' + self._model_file + ' ' + output_file
        running_process = self.make_subprocess(command)
        self.close_process(running_process)
        preds = list(self.read_predictions(output_file))
        return preds

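    def predict_proba(self, X):
        # Pass each score through the sigmoid; build a list because map() is lazy in Python 3
        predictions = np.asarray([1 / (1 + math.exp(-p)) for p in self.predict(X)])
        # Column 0 = P(y=0), column 1 = P(y=1), as scikit-learn classifiers expect
        return np.vstack([1 - predictions, predictions]).T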

In short, libffm is hard to use on Windows, both to compile and to call. If your environment allows it, use it on Linux instead.

Installing libffm on Linux + Anaconda

Installing the libffm-python package into Anaconda on Linux also failed, with the following error:

➜ libffm-python git:(master) python setup.py install
/home/qw/anaconda3/lib/python3.7/site-packages/setuptools/dist.py:481: UserWarning: The version specified ('7e8621d') is an invalid version, this may not work as expected with newer versions of setuptools, pip, and PyPI. Please see PEP 440 for more details.
  "details." % self.metadata.version
running install
running bdist_egg
running egg_info
creating ffm.egg-info
writing ffm.egg-info/PKG-INFO
writing dependency_links to ffm.egg-info/dependency_links.txt
writing requirements to ffm.egg-info/requires.txt
writing top-level names to ffm.egg-info/top_level.txt
writing manifest file 'ffm.egg-info/SOURCES.txt'
reading manifest file 'ffm.egg-info/SOURCES.txt'
writing manifest file 'ffm.egg-info/SOURCES.txt'
installing library code to build/bdist.linux-x86_64/egg
running install_lib
running build_py
creating build
creating build/lib.linux-x86_64-3.7
creating build/lib.linux-x86_64-3.7/ffm
copying ffm/__init__.py -> build/lib.linux-x86_64-3.7/ffm
copying ffm/ffm.py -> build/lib.linux-x86_64-3.7/ffm
running build_ext
building 'ffm.libffm' extension
creating build/temp.linux-x86_64-3.7
gcc -pthread -B /home/qw/anaconda3/compiler_compat -Wl,--sysroot=/ -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -I. -I/home/qw/anaconda3/include/python3.7m -c ffm.cpp -o build/temp.linux-x86_64-3.7/ffm.o -Wall -O3 -std=c++0x -march=native -DUSESSE -DUSEOMP
cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++
ffm.cpp:578: warning: ignoring #pragma omp parallel [-Wunknown-pragmas]
  578 | #pragma omp parallel for schedule(static) reduction(+: loss)
      |
ffm.cpp:726: warning: ignoring #pragma omp parallel [-Wunknown-pragmas]
  726 | #pragma omp parallel for schedule(static) reduction(+: loss)
      |
      |
gcc -pthread -B /home/qw/anaconda3/compiler_compat -Wl,--sysroot=/ -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -I. -I/home/qw/anaconda3/include/python3.7m -c timer.cpp -o build/temp.linux-x86_64-3.7/timer.o -Wall -O3 -std=c++0x -march=native -DUSESSE -DUSEOMP
cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++
g++ -pthread -shared -B /home/qw/anaconda3/compiler_compat -L/home/qw/anaconda3/lib -Wl,-rpath=/home/qw/anaconda3/lib -Wl,--no-as-needed -Wl,--sysroot=/ build/temp.linux-x86_64-3.7/ffm.o build/temp.linux-x86_64-3.7/timer.o -o build/lib.linux-x86_64-3.7/ffm/libffm.cpython-37m-x86_64-linux-gnu.so -fopenmp
/home/qw/anaconda3/compiler_compat/ld: build/temp.linux-x86_64-3.7/ffm.o: unable to initialize decompress status for section .debug_info
/home/qw/anaconda3/compiler_compat/ld: build/temp.linux-x86_64-3.7/ffm.o: unable to initialize decompress status for section .debug_info
/home/qw/anaconda3/compiler_compat/ld: build/temp.linux-x86_64-3.7/ffm.o: unable to initialize decompress status for section .debug_info
/home/qw/anaconda3/compiler_compat/ld: build/temp.linux-x86_64-3.7/ffm.o: unable to initialize decompress status for section .debug_info
build/temp.linux-x86_64-3.7/ffm.o: file not recognized: file format not recognized
collect2: error: ld returned 1 exit status
error: command 'g++' failed with exit status 1

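At first I assumed the libffm code was broken, so I replaced it with the latest upstream version, but the error persisted. Checking the code turned up nothing wrong, and it compiled normally outside Anaconda. A closer look showed that the problem was Anaconda itself: Anaconda ships its own linker, ld, in the ~/anaconda3/compiler_compat directory. The fix is simple: rename the ld in ~/anaconda3/compiler_compat and run the installation again.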
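The rename can be scripted as follows. This is a minimal sketch: `disable_conda_ld` is a hypothetical helper, and the assumption (suggested by the `-B /home/qw/anaconda3/compiler_compat` flag in the build log) is that once the bundled ld is gone, gcc falls back to the system linker. Renaming ld.bak back to ld undoes the change.

```python
import os


def disable_conda_ld(conda_root):
    """Rename <conda_root>/compiler_compat/ld to ld.bak.

    Returns the backup path, or None if there was no ld to rename.
    """
    ld = os.path.join(conda_root, "compiler_compat", "ld")
    if not os.path.exists(ld):
        return None
    backup = ld + ".bak"
    os.rename(ld, backup)
    return backup


# Example usage, then rerun `python setup.py install`:
# disable_conda_ld(os.path.expanduser("~/anaconda3"))
```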
