Browser Emulation Using PhantomJs Cloud

Irfan Charania edited this page Dec 21, 2016 · 2 revisions

Problem: Certain sites have ugly source code and/or render the page using JavaScript, making it next to impossible to use the Website Agent. (Described in issue #888)

Solution: Use PhantomJs Cloud to emulate the browser and return a fully rendered DOM. This allows the Website Agent to then properly scrape dynamic content from JavaScript-heavy pages.

There are two ways to generate URLs for PhantomJs Cloud:

  1. PhantomJs Cloud Agent (simpler but limited)
  2. Manually


Before you begin, sign up for a PhantomJs Cloud account. Then copy your API key and add it to your Huginn credentials:

    "id": 1,
    "user_id": 1,
    "credential_name": "phantomjs_cloud",
    "credential_value": "YOUR-KEY",
    "mode": "text"

Option 1: PhantomJs Cloud Agent

Note: This agent provides only a limited subset of the most commonly used options.

The workflow to fetch the page is as follows:

  1. RssAgent - provides example urls to fetch
  2. PhantomJsCloudAgent - to set up PhantomJs Cloud options
  3. WebsiteAgent - to fetch the page using PhantomJs Cloud
  4. DataOutputAgent - to output RSS

Full scenario can be found here

1. RssAgent

Name: PhantomJS Cloud - In - RSS

  "expected_update_period_in_days": "5",
  "clean": "true",
  "url": ""

2. PhantomJsCloudAgent

Name: PhantomJS Cloud - Process - Options
Event sources: PhantomJS Cloud - In - RSS
Propagate immediately: Yes

  "mode": "clean",
  "api_key": "{% credential phantomjs_cloud %}",
  "url": "{{url}}",
  "render_type": "html",
  "output_as_json_radio": "false",
  "output_as_json": "false",
  "ignore_images_radio": "false",
  "ignore_images": "false",
  "user_agent": "Mozilla/5.0 (BlackBerry; U; BlackBerry 9900; en) AppleWebKit/534.11+ (KHTML, like Gecko) Version/ Mobile Safari/534.11+",
  "wait_interval": "1000"

3. WebsiteAgent

Name: PhantomJS Cloud - Process - Fetch Page
Event sources: PhantomJS Cloud - Process - Options
Propagate immediately: Yes

  "expected_update_period_in_days": "2",
  "url_from_event": "{{url}}",
  "type": "html",
  "mode": "on_change",
  "extract": {
    "title": {
      "css": "title",
      "value": "normalize-space(.)"
    "body": {
      "css": "body #comic",
      "value": "./node()"

4. DataOutputAgent

Name: PhantomJS Cloud - Out - RSS
Event sources: PhantomJS Cloud - Process - Fetch Page
Propagate immediately: Yes

  "secrets": [
  "expected_receive_period_in_days": 2,
  "template": {
    "title": "XKCD comics as a feed",
    "description": "This is a feed of recent XKCD comics, generated by Huginn",
    "item": {
      "title": "{{title}}",
      "description": "{{body}}"

Option 2: Manually

The workflow to fetch the page is as follows:

  1. RssAgent - provides example urls to fetch
  2. EventFormattingAgent - to set up PhantomJs Cloud options
  3. JavascriptAgent - to properly encode the [REQUEST-JSON] using encodeURIComponent()
  4. WebsiteAgent - to fetch the page using PhantomJs Cloud
  5. DataOutputAgent - to output RSS
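Steps 2 to 4 of this workflow can be sketched as a single function: build the request object, encode it with `encodeURIComponent()`, and assemble the URL that the WebsiteAgent fetches. This is an illustration only; `buildFetchUrl` is a hypothetical name, and `endpointBase` is a placeholder that must be replaced with the API endpoint from the PhantomJs Cloud documentation.

```javascript
// Hypothetical helper illustrating how the fetch URL is put together.
// endpointBase is a placeholder, not the real PhantomJs Cloud endpoint.
function buildFetchUrl(endpointBase, apiKey, pageUrl, userAgent) {
  // Step 2: the request object (the [REQUEST-JSON]).
  const request = {
    url: pageUrl,
    renderType: "html",
    requestSettings: { userAgent: userAgent },
  };
  // Step 3: serialize and percent-encode the request.
  const encoded = encodeURIComponent(JSON.stringify(request));
  // Step 4: the URL handed to the WebsiteAgent.
  return endpointBase + apiKey + "/?request=" + encoded;
}

const fetchUrl = buildFetchUrl(
  "https://api.example.invalid/",
  "YOUR-KEY",
  "https://example.com/page",
  "ExampleAgent/1.0"
);
console.log(fetchUrl);
```

In Huginn itself these steps are split across agents so that the API key can come from a credential rather than being hard-coded.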

Full scenario can be found here

1. RssAgent

Name: PhantomJS Cloud - In - RSS

  "expected_update_period_in_days": "5",
  "clean": "true",
  "url": ""

2. EventFormattingAgent

Name: PhantomJS Cloud - Process - Format
Event sources: PhantomJS Cloud - In - RSS
Propagate immediately: Yes

  "instructions": {
    "message": {
      "url": "{{url}}",
      "renderType": "html",
      "requestSettings": {
        "userAgent": "Mozilla/5.0 (BlackBerry; U; BlackBerry 9900; en) AppleWebKit/534.11+ (KHTML, like Gecko) Version/ Mobile Safari/534.11+"
  "mode": "clean"

For more options, refer to the official PhantomJs Cloud API documentation.

3. JavascriptAgent

Name: PhantomJS Cloud - Process - JS Escape
Event sources: PhantomJS Cloud - Process - Format
Propagate immediately: Yes

  "language": "JavaScript",
  "code": "Agent.receive = function() {\r\n  var events = this.incomingEvents();\r\n  for(var i = 0; i < events.length; i++) {\r\n    var js = JSON.stringify(events[i].payload.message);\r\n    this.log('Message to escape: ' + js);\r\n    this.createEvent({ 'url': encodeURIComponent(js) });\r\n    var callCount = this.memory('callCount') || 0;\r\n    this.memory('callCount', callCount + 1);\r\n  }\r\n}",
  "expected_receive_period_in_days": "2",
  "expected_update_period_in_days": "2"

Note: Huginn's uri_escape filter does not escape strings the same way as JavaScript's encodeURIComponent(), so the JavascriptAgent is used for the encoding instead.
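Because the agent's `code` option is stored as a single escaped string, it can be hard to read. Below is the same logic unescaped, together with a small stub so it can be run outside Huginn. The `Agent` object and the `runWithStub` harness are stand-ins written for this sketch (inside Huginn, `Agent` and its `incomingEvents`, `log`, `createEvent`, and `memory` methods are provided by the JavascriptAgent).

```javascript
// Stand-in for the Agent object that Huginn provides to JavascriptAgent code.
const Agent = {};

// Same logic as the "code" string above, unescaped for readability.
Agent.receive = function () {
  const events = this.incomingEvents();
  for (let i = 0; i < events.length; i++) {
    const js = JSON.stringify(events[i].payload.message);
    this.log("Message to escape: " + js);
    this.createEvent({ url: encodeURIComponent(js) });
    const callCount = this.memory("callCount") || 0;
    this.memory("callCount", callCount + 1);
  }
};

// Hypothetical harness: fakes the Huginn-provided methods and
// returns the payloads of the events the agent would create.
function runWithStub(incoming) {
  const created = [];
  const store = {};
  const stub = Object.create(Agent);
  stub.incomingEvents = () => incoming;
  stub.log = () => {};
  stub.createEvent = (payload) => created.push(payload);
  stub.memory = (key, value) => {
    if (value === undefined) return store[key];
    store[key] = value;
  };
  stub.receive();
  return created;
}

const sampleMessage = { url: "https://example.com/a b", renderType: "html" };
const created = runWithStub([{ payload: { message: sampleMessage } }]);
console.log(created[0].url);
```

The emitted `url` value contains no raw braces, quotes, colons, slashes, or spaces, so it can be appended to the `request=` query parameter in the next step.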

4. WebsiteAgent

Name: PhantomJS Cloud - Process - Fetch Page
Event sources: PhantomJS Cloud - Process - JS Escape
Propagate immediately: Yes

  "expected_update_period_in_days": "2",
  "url_from_event": "{%credential phantomjs_cloud%}/?request={{url}}",
  "type": "html",
  "mode": "on_change",
  "extract": {
    "title": {
      "css": "title",
      "value": "normalize-space(.)"
    "body": {
      "css": "body #comic",
      "value": "./node()"

5. DataOutputAgent

Name: PhantomJS Cloud - Out - RSS
Event sources: PhantomJS Cloud - Process - Fetch Page
Propagate immediately: Yes

  "secrets": [
  "expected_receive_period_in_days": 2,
  "template": {
    "title": "XKCD comics as a feed",
    "description": "This is a feed of recent XKCD comics, generated by Huginn",
    "item": {
      "title": "{{title}}",
      "description": "{{body}}"