Notes on Installing Portia on CentOS 7.2


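Portia is an open-source visual tool from Scrapinghub for writing scraping rules. It provides a web interface in which you click to annotate the data you want to extract from a page, so extraction rules can be built without any programming knowledge. The rules can also be used in Scrapy to crawl pages.

The official documentation recommends installing with Vagrant or with Docker. The only other documented method supports Debian-based systems (such as Debian or Ubuntu), and there is no method for CentOS. I found the following instructions online for installing Portia on Mac OS: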

You have to create, activate and navigate into the virtualenv before installing anything (including cloning portia from github). Here’s the whole thing working from start to finish:
1: cd to wherever you’d like to store your project… and install virtualenv:
$ pip install virtualenv

2: Create the virtual environment. (I called mine “portia” but this can be anything.):
$ virtualenv portia

3: Activate the virtual environment you created (change the path to reflect the name you used here if not “portia”.):
$ source portia/bin/activate

At this point your terminal should display the virtualenv name in parentheses before the standard directory path prompt:
 (name-of-virtualenv) [your-machine]:[current-directory]: [user]$
…and if you list the files within your pwd you’ll see the name of your virtualenv there.

4: cd into your virtualenv (“portia” for me):
$ cd portia

5: Now you can clone portia from github into your virtualenv…
$ git clone https://github.com/scrapinghub/portia

6: cd into the cloned portia/slyd…
$ cd portia/slyd

7/8: pip install twisted and Scrapy…
$ pip install twisted
$ pip install Scrapy

Your virtualenv should still be activated and you should still be in [virtualenv-name]/portia/slyd

9: Install the requirements.txt:
$ pip install -r requirements.txt

10: Run slyd:
$ twistd -n slyd
— No more scrapy error! —
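
The quoted steps depend on the virtualenv still being active when pip runs. As a minimal sketch (the helper name in_virtualenv is made up for illustration, not part of Portia or virtualenv), activation can be checked from Python itself rather than from the prompt:

```python
import sys


def in_virtualenv():
    """Return True when the interpreter runs inside a virtual environment."""
    # Older virtualenv releases set sys.real_prefix; venv and newer
    # virtualenv releases point sys.base_prefix at the original interpreter.
    if hasattr(sys, "real_prefix"):
        return True
    return getattr(sys, "base_prefix", sys.prefix) != sys.prefix


if __name__ == "__main__":
    print("virtualenv active:", in_virtualenv())
```

If this prints False just before step 7, the packages would be installed into the system Python rather than into the environment created in step 2.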

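Normally, after running the commands above, Portia should be reachable at http://localhost:9001/static/main.html. In practice it did not work out that way.

Problem 1: portia_server fails to start

Cause: its dependencies were not installed.

Fix: change to the portia_server directory and run pip install -r requirements.txt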

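Problem 2: MySQL-python fails to install because mysql_config cannot be found

Error output:

Fix: install MySQL (MariaDB) on CentOS. After doing that there was still no /usr/bin/mysql_config file, so I also had to run: yum -y install mariadb-devel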
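
Before retrying the MySQL-python install, it can help to confirm that mysql_config is actually on the PATH. The following is a small sketch that assumes Python 3.3 or later (for shutil.which); find_executable is a hypothetical helper name:

```python
import shutil


def find_executable(name):
    """Return the full path of an executable on PATH, or None if absent."""
    return shutil.which(name)


if __name__ == "__main__":
    path = find_executable("mysql_config")
    if path is None:
        print("mysql_config not found; on CentOS 7 it comes from mariadb-devel")
    else:
        print("mysql_config found at", path)
```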

Problem 3: No module named portia_server.settings

Error output:

Cause:

The value of DJANGO_SETTINGS_MODULE should be in Python path syntax, e.g. mysite.settings. Note that the settings module should be on the Python import search path.

The Import Search Path

Before you go any further, I want to briefly mention the library search path. Python looks in several places when you try to import a module. Specifically, it looks in all the directories defined in sys.path. This is just a list, and you can easily view it or modify it with standard list methods.

Example 2.4. Import Search Path
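
To make the quoted explanation concrete, here is a minimal sketch of how sys.path decides whether a dotted settings module can be imported. The package name demo_site and both helper functions are hypothetical; this illustrates the mechanism and is not necessarily the fix that was applied to Portia:

```python
import importlib
import os
import sys
import tempfile


def add_to_search_path(directory):
    """Put a directory at the front of sys.path unless it is already there."""
    directory = os.path.abspath(directory)
    if directory not in sys.path:
        sys.path.insert(0, directory)
    return directory


def demo():
    """Create a throwaway package and import its settings module."""
    root = tempfile.mkdtemp()
    package_dir = os.path.join(root, "demo_site")
    os.mkdir(package_dir)
    # An empty __init__.py marks demo_site as a package.
    open(os.path.join(package_dir, "__init__.py"), "w").close()
    with open(os.path.join(package_dir, "settings.py"), "w") as handle:
        handle.write("DEBUG = True\n")
    # The import only works once the parent of demo_site is on sys.path.
    add_to_search_path(root)
    importlib.invalidate_caches()
    settings = importlib.import_module("demo_site.settings")
    return settings.DEBUG


if __name__ == "__main__":
    print("demo_site.settings.DEBUG =", demo())
```

By analogy, portia_server.settings is importable only when the directory that contains the portia_server package is on the search path, for example through PYTHONPATH or by starting the process from that directory.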

Fix:

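Problem 4: No module named PyQt5.QtCore

Error output:

Cause: PyQt5 is missing. I tried pip install PyQt5, but that is not supported here: the wheels only support Python 3.5 and later, and I am currently using Python 2.7.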

Wheels are the standard Python packaging format for pure Python or binary extension modules such as PyQt5. PyQt5 wheels are specific to a particular version of Python. Only Python v3.5 and later is supported. Wheels are provided for 32- and 64-bit Windows, 64-bit OS X and 64-bit Linux. These correspond with the platforms for which The Qt Company provide binary installers.
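
The version constraint in the quote can be expressed as a quick check. This sketch takes the 3.5 threshold from the quoted text only; other PyQt5 releases may have different requirements, and the function name is made up for illustration:

```python
import sys

# Threshold taken from the PyQt5 documentation quoted above (an assumption
# for any other PyQt5 release).
MINIMUM_PYTHON_FOR_PYQT5_WHEELS = (3, 5)


def pyqt5_wheels_supported(version_info=None):
    """Return True if the given (major, minor, ...) version can use the wheels."""
    if version_info is None:
        version_info = sys.version_info
    return tuple(version_info[:2]) >= MINIMUM_PYTHON_FOR_PYQT5_WHEELS


if __name__ == "__main__":
    print("running interpreter supported:", pyqt5_wheels_supported())
    print("Python 2.7 supported:", pyqt5_wheels_supported((2, 7)))
```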

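Attempt 1: I found a third-party package online that supports Python 2.7: https://github.com/pyqt/python-qt5. Installing it failed with: No files/directories in /tmp/pip-build-rzQAsM/python-qt5/pip-egg-info (from PKG-INFO). Cause: the package only supports Windows.

Fix: build and install PyQt5 manually, following the documentation

1. Install Qt5

2. Download and install SIP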

SIP must be installed before building and using PyQt5. You can get the latest release of the SIP source code from https://www.riverbankcomputing.com/software/sip/download.

The SIP installation instructions can be found at https://pyqt.sourceforge.net/Docs/sip4/installation.html.

3. Download and install PyQt5

You can get the latest release of the GPL version of the PyQt5 source code from https://www.riverbankcomputing.com/software/pyqt/download5.

If you are using the commercial version of PyQt5 then you should use the download instructions which were sent to you when you made your purchase. You must also download your pyqt-commercial.sip license file.

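Reference: http://www.jianshu.com/p/abf910e78771

After installation, running it again produced the following error:

Possible cause:

In PyQt 5.6 and later, Qt dropped the QtWebKitWidgets module and added QtWebEngineWidgets as its replacement, which is based on Chromium and offers better, more up-to-date HTML, CSS and JavaScript support. From the official documentation:

Old style: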

from PyQt5.QtWebKitWidgets import QWebPage, QWebView

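New style:

Testing showed that this was not the cause: QtWebEngineWidgets had not been installed at all. During the configure step, the environment detection did not find the required base libraries, so the corresponding components were never built.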
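
One way to see which of the two web modules a given PyQt5 build actually contains is to ask the import machinery directly. This is a sketch assuming Python 3.4 or later; the helper names are hypothetical, and find_spec only reports whether the module can be located, not whether its shared libraries load correctly:

```python
import importlib.util

CANDIDATES = ("PyQt5.QtWebKitWidgets", "PyQt5.QtWebEngineWidgets")


def module_available(name):
    """Return True if the import system can locate the named module."""
    try:
        return importlib.util.find_spec(name) is not None
    except ImportError:
        # Raised when a parent package (here PyQt5 itself) is missing.
        return False


def available_web_modules():
    """Map each candidate Qt web module to whether it can be located."""
    return {name: module_available(name) for name in CANDIDATES}


if __name__ == "__main__":
    for name, found in available_web_modules().items():
        print(name, "found" if found else "missing")
```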

Fix: I listed all Qt5-related packages with yum list | grep qt5, then ran: yum install qt5-qtwebkit qt5-qtwebkit-devel

Then I re-ran python configure.py --qmake=/usr/bin/qmake-qt5, with the following result:

Problem 5: No such file or directory: ‘/root/portia/portia/portiaui/dist’

Possible cause:

The Ember app lives under portia/portiaui. bin/slyd expects the existence of /portiaui/dist which is the default build directory for the Ember app. Try running ember build within portia/portiaui before bin/slyd.
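
A small pre-flight check along the lines of the quote: verify that the Ember build output exists before starting bin/slyd. The helper names are hypothetical, and the portiaui/dist layout is taken from the quoted answer and the error path above:

```python
import os


def ember_build_dir(portia_root):
    """Return the directory where the Ember build output is expected."""
    return os.path.join(portia_root, "portiaui", "dist")


def has_ember_build(portia_root):
    """Return True if the Ember app has already been built."""
    return os.path.isdir(ember_build_dir(portia_root))


if __name__ == "__main__":
    root = "/root/portia/portia"  # checkout location from the error message
    if not has_ember_build(root):
        print("missing", ember_build_dir(root), "- run ember build in portiaui first")
```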

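Fix:

1. Install Node.js

Run the commands above in order; the make step may take a while. Once they finish, Node is installed, and you can verify it with node -v and npm -v.

2. Install Ember

Install it with npm install -g ember-cli, then run ember build:

Fix for the "Could not start watchman" error

After that, the following error appeared:

Possible cause (source):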

You can first enter the directory “/portiaui/node_modules/ember-cli” and then run the command “npm install”. I think this will help you install broccoli-source into node_modules and solve the problem.

Fix:

Problem 6: AttributeError: Site instance has no attribute ‘setServiceParent’

Error output:

Possible cause:

Hi there. Did you follow the install instructions here? http://portia.readthedocs.io/en/latest/installation.html

You are using twistd -n slyd to start portia. This is the old way, the new way to run it is to run bin/slyd

Problem 7: Can not find Xvfb

Error output:

Attempted fix: pip install xvfbwrapper, which did not help. Cause:

The xvfbwrapper version 0.2.3 contained a bug that was causing the problem.

(Sorry, when I checked the versions on my systems earlier I must’ve checked the same computer from the terminal on accident… I do that sometimes…)

I contacted the author and he graciously fixed the bug and released version 0.2.4 to PyPi which should work as expected.

The version on my other system that works was xvfbwrapper v0.2.2
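
Since the quoted report singles out one release, it is worth checking which xvfbwrapper version is installed before reinstalling. A minimal sketch, assuming Python 3.8 or later for importlib.metadata; the helper names are hypothetical and the "known bad" version comes only from the quote above:

```python
from importlib import metadata

# Version reported as buggy in the quoted answer.
KNOWN_BAD_VERSION = "0.2.3"


def installed_version(distribution):
    """Return the installed version string, or None if it is not installed."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


def is_known_bad(version):
    """Return True if the version matches the release reported as buggy."""
    return version == KNOWN_BAD_VERSION


if __name__ == "__main__":
    version = installed_version("xvfbwrapper")
    print("xvfbwrapper version:", version, "| known bad:", is_known_bad(version))
```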

Fix: pip install -v xvfbwrapper==0.2.7

After that, it failed again:

Cause: the python-xvfbwrapper package is missing from the system

Fix: yum install python-xvfbwrapper

Running it again produced the following error:

Fix:

Problem 8: the Python lupa module cannot be installed

Error output:

Fix:

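Problem 9: after running bin/slyd again and opening http://localhost:9001/, the page is blank

Cause: portia_server is not running; nginx has to be configured and Django started before it will work.

Current status: unresolved. To be continued.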


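7 Replies to “Notes on Installing Portia on CentOS 7.2”

    1. I have been busy lately and have not followed up on this. The main cause is that Django is not deployed; installing nginx and configuring it to serve the separate sites should do it.

  1. After following your steps I also ran into Problem 9: the page is blank

  2. Hello! Have you solved this yet? I ran Portia following the official documentation; I can create projects and design spiders, but I cannot run the spiders

  3. portia “GET /api/projects HTTP/1.1” 404 153 Have you run into this problem?

  4. Hello, I ran into a problem while learning Portia and would like to ask for your help. When I run ember build, I get the following: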
    ⠋ Building

    25604 ms: Mark-sweep 986.3 (1035.9) -> 986.3 (1035.9) MB, 1154.7 / 0.0 ms [allocation failure] [GC in old space requested].
    26835 ms: Mark-sweep 986.3 (1035.9) -> 986.2 (1018.9) MB, 1231.2 / 0.0 ms [last resort gc].
    28002 ms: Mark-sweep 986.2 (1018.9) -> 986.2 (1018.9) MB, 1165.9 / 0.0 ms [last resort gc].

    ==== JS stack trace =========================================

    Security context: 0x3fe1635cfb51
    2: keysForTree [/home/vagrant/portia-master/portiaui/node_modules/_broccoli-kitchen-sink-helpers@0.3.1@broccoli-kitchen-sink-helpers/index.js:37] [pc=0x2abbf3338697] (this=0x3fe1635e6111 ,fullPath=0x3dfeb2595f49 <String[742]: /home/vagrant/portia-master/portiaui/node_modules/_babel-core@6.25.0@babel-core/node_modules/babel-register/node_modules/babel-core/node_modules/babe…

    FATAL ERROR: CALL_AND_RETRY_LAST Allocation failed – JavaScript heap out of memory
    1: node::Abort() [ember]
    2: 0x10a878c [ember]
    3: v8::Utils::ReportApiFailure(char const*, char const*) [ember]
    4: v8::internal::V8::FatalProcessOutOfMemory(char const*, bool) [ember]
    5: v8::internal::Factory::NewRawOneByteString(int, v8::internal::PretenureFlag) [ember]
    6: v8::internal::Factory::NewStringFromOneByte(v8::internal::Vector, v8::internal::PretenureFlag) [ember]
    7: v8::internal::Factory::NumberToString(v8::internal::Handle, bool) [ember]
    8: v8::internal::Runtime_NumberToStringSkipCache(int, v8::internal::Object**, v8::internal::Isolate*) [ember]
    9: 0x2abbf28079a7
    Aborted

    I have searched Google and Baidu extensively and could not solve it. I hope you can help. Many thanks!
