Nightmare JS is a great browser automation library, especially effective for scraping websites. It has two faults, though: memory leaks and zombie Electron processes.
But with a couple of duct-tape hacks, we can easily overcome both. The memory leak is minor but builds up over time; restarting the Node script or app once a day handles it well and avoids overkill solutions.
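A daily restart is usually handled by whatever runs the app. As a minimal sketch in plain Node, assuming the script runs under a supervisor (pm2, systemd, forever or similar) configured to start it again after any exit, the app can simply exit at a fixed hour each day. The names msUntilNextRestart and scheduleDailyRestart are hypothetical, not part of Nightmare.

```javascript
// dailyRestart.js: hypothetical helper, not part of Nightmare.
// Assumes a supervisor restarts the process after it exits (even with code 0).

// Milliseconds from `now` until the next `hour`:00 in local time.
function msUntilNextRestart(now, hour) {
  const next = new Date(now.getTime());
  next.setHours(hour, 0, 0, 0);
  if (next <= now) {
    next.setDate(next.getDate() + 1); // that hour has already passed today
  }
  return next - now;
}

// Exit cleanly at `hour` so the supervisor starts a fresh process,
// releasing whatever memory has leaked in the meantime.
function scheduleDailyRestart(hour = 4) {
  const timer = setTimeout(() => {
    console.log(new Date(), 'Exiting for scheduled daily restart');
    process.exit(0);
  }, msUntilNextRestart(new Date(), hour));
  timer.unref(); // don't keep the process alive just for this timer
  return timer;
}

module.exports = { msUntilNextRestart, scheduleDailyRestart };
```

Calling scheduleDailyRestart(4) once at startup would make the app exit at 04:00 local time every day. Process managers such as pm2 also offer a cron-based restart option that achieves the same thing without code.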
The second issue is zombie Electron processes. Nightmare JS relies on Electron for most of the underlying browser work, so whenever Nightmare is running you'll see multiple Electron processes at work (htop showed four of them in our case).
Usually they disappear once Nightmare is done, but not always. If you require Nightmare once and spin up a new instance for, say, every API call, as below, eventually one or two Electron processes escape cleanup and linger as zombies. They add up quickly and can eventually choke the server.
const Nightmare = require('nightmare');
const router = require('express').Router(); // assumption: an Express router

// Creating a new instance for each API call
router.post('/scrape/', function (req, res) {
  const nightmare = Nightmare({});
  nightmare
    .goto('https://example.com')
    //...
});
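The snippet leaves the rest of the chain elided. One thing worth ruling out before resorting to process-level hacks is an instance whose chain fails before .end() runs, which can leave its Electron process behind. As a hedged sketch, each request could end its instance in a finally block. withFreshInstance is a hypothetical helper, not a Nightmare API, and it assumes the chain is run by awaiting it, as Nightmare's thenable chains allow.

```javascript
// Hypothetical helper: run `work` against a freshly created instance and
// always call `.end()` on it, even when a step in the chain throws.
async function withFreshInstance(createInstance, work) {
  const instance = createInstance();
  try {
    // `work` builds and awaits the chain but should NOT call `.end()` itself.
    return await work(instance);
  } finally {
    try {
      await instance.end();
    } catch (endErr) {
      // Log and swallow, so the original error (if any) still propagates.
      console.error(new Date(), 'Failed to end instance', endErr);
    }
  }
}

module.exports = { withFreshInstance };
```

In the route, the call might look like withFreshInstance(() => Nightmare({}), nightmare => nightmare.goto('https://example.com')), sending the response once the returned promise settles. This does not necessarily stop every stray process, so a kill script remains a useful backstop.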
Once zombies have built up, the only way to get rid of them from within the app is to restart the app that uses Nightmare. Restarting every few hours or minutes isn't feasible, though, so we move on to the next effective hack: a Node.js zombieKiller script that runs every few seconds.
We simply run the Linux command killall --older-than 2m electron, which finds all processes named "electron" that are older than 2 minutes (a generous window, since Nightmare will have long finished its work by then) and kills them. The script runs this command every 10 seconds.
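To make explicit what that command does, or for a system without psmisc's killall, here is a rough Node.js sketch of the same age-based kill. It assumes a procps ps that supports the etimes field (seconds since the process started); findOldProcesses and killOldProcesses are hypothetical names. Like killall, it sends SIGTERM.

```javascript
// Hypothetical alternative to `killall --older-than 2m electron`.
// Assumes procps `ps` with the `etimes` (elapsed seconds) output field.
const { execFile } = require('child_process');

// Parse `ps -eo pid=,etimes=,comm=` output and return the PIDs of processes
// named `name` that have been running for more than `maxAgeSec` seconds.
function findOldProcesses(psOutput, name, maxAgeSec) {
  return psOutput
    .split('\n')
    .map((line) => line.trim().split(/\s+/))
    .filter(([, etimes, comm]) => comm === name && Number(etimes) > maxAgeSec)
    .map(([pid]) => Number(pid));
}

function killOldProcesses(name, maxAgeSec) {
  execFile('ps', ['-eo', 'pid=,etimes=,comm='], (err, stdout) => {
    if (err) {
      console.log(new Date(), 'Error in executing ps', err);
      return;
    }
    for (const pid of findOldProcesses(stdout, name, maxAgeSec)) {
      try {
        process.kill(pid, 'SIGTERM');
      } catch (killErr) {
        // The process may have exited between `ps` and `kill`; ignore it.
      }
    }
  });
}

module.exports = { findOldProcesses, killOldProcesses };
```

killOldProcesses('electron', 120) would then correspond to the 2-minute threshold used above.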
And just for reference, whenever zombies are killed, we also check how many Electron processes are left with ps aux | grep electron | wc -l.
// zombieKiller.js
const { exec } = require('child_process');

const killCommand = "killall --older-than 2m electron";
const countCommand = "ps aux | grep electron | wc -l";

// Kill Electron processes older than 2 minutes, then run again in 10 seconds.
const killZombies = function () {
  exec(killCommand, (err, stdout, stderr) => {
    if (!err && !stderr) {
      // killall succeeded: at least one old Electron process was killed.
      console.log(new Date(), `Zombie electrons killed!!!`);
      countElectronProcesses();
    }
    else if (err && !err.toString().includes("no process found")) {
      // Ignore the routine "no process found" failure; log anything else.
      console.log(new Date(), `Error in executing ${killCommand}`, err);
    }
    else if (stderr && !stderr.toString().includes("no process found")) {
      console.log(new Date(), `stderr: ${stderr}`);
    }
  });
  setTimeout(killZombies, 10000);
};

// Log how many Electron processes are still running.
function countElectronProcesses() {
  exec(countCommand, (err, stdout, stderr) => {
    if (err) {
      console.log(new Date(), `Error in executing ${countCommand}`, err);
      return;
    }
    console.log(`${new Date()} No of Electron Processes: ${stdout.trim()}`);
  });
}

console.log("Starting Zombie Killer");
killZombies();
(killall reports "no process found" as an error whenever there is nothing to kill. The checks keep this useless message from being logged every 10 seconds.)
Over time, the script's log will look something like this:
2019-11-11T00:07:13.259Z 'Zombie electrons killed!!!'
Mon Nov 11 2019 00:07:13 GMT+0000 (UTC) No of Electron Processes: 2
2019-11-11T00:13:33.373Z 'Zombie electrons killed!!!'
Mon Nov 11 2019 00:13:33 GMT+0000 (UTC) No of Electron Processes: 11
2019-11-11T00:13:43.376Z 'Zombie electrons killed!!!'
Mon Nov 11 2019 00:13:43 GMT+0000 (UTC) No of Electron Processes: 8
2019-11-11T00:14:23.410Z 'Zombie electrons killed!!!'
Mon Nov 11 2019 00:14:23 GMT+0000 (UTC) No of Electron Processes: 2
2019-11-11T00:19:03.494Z 'Zombie electrons killed!!!'
Mon Nov 11 2019 00:19:03 GMT+0000 (UTC) No of Electron Processes: 2
2019-11-11T00:23:43.588Z 'Zombie electrons killed!!!'
Mon Nov 11 2019 00:23:43 GMT+0000 (UTC) No of Electron Processes: 8
2019-11-11T00:24:43.624Z 'Zombie electrons killed!!!'
Mon Nov 11 2019 00:24:43 GMT+0000 (UTC) No of Electron Processes: 2
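A note on reading these numbers: exec runs the ps aux | grep electron | wc -l pipeline through /bin/sh, and both the grep process and that shell's own command line contain the word "electron", so the count likely includes them; a steady 2 probably means no Electron processes were left at all. A sketch of a count that avoids this, assuming the procps pgrep is installed (its -c flag prints the number of matching process names and exits with status 1 when there are none), could replace countElectronProcesses. parseCount is a hypothetical helper.

```javascript
// Hypothetical replacement for countElectronProcesses() using `pgrep -c`.
// pgrep matches process names and never counts itself, and execFile runs it
// without a shell, so no extra "electron" command lines are counted.
const { execFile } = require('child_process');

// Turn pgrep's stdout ("3\n") into a number, or null if it isn't numeric.
function parseCount(stdout) {
  const n = parseInt(String(stdout).trim(), 10);
  return Number.isNaN(n) ? null : n;
}

function countElectronProcesses() {
  execFile('pgrep', ['-c', 'electron'], (err, stdout) => {
    // pgrep exits with status 1 when nothing matches but still prints "0",
    // so a numeric stdout is trusted even when `err` is set.
    const count = parseCount(stdout);
    if (count === null) {
      console.log(new Date(), 'Could not count Electron processes', err);
      return;
    }
    console.log(`${new Date()} No of Electron Processes: ${count}`);
  });
}

module.exports = { parseCount, countElectronProcesses };
```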