DevLog 1-3

DevLog 1-3: Node.js & Web Scraping with Cron

Built a Node.js application that uses Puppeteer to scrape GitHub repository data on a scheduled interval using cron jobs.

  • Installed and configured Node.js (v20.9.0).
  • Initialized an NPM project with ES module support ("type": "module").
  • Installed Nodemon globally for automatic script reloading during development.
  • Installed cron for scheduling automated tasks.
  • Installed puppeteer for headless browser automation.
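
As a quick sanity check on the setup above, here is a minimal sketch of a preflight helper. It is not part of the original project: `checkSetup` and `examplePkg` are hypothetical names, and the version ranges in the example object are placeholders rather than the versions actually installed.

```javascript
// Hypothetical preflight helper (an illustration, not from the original project).
// Takes a parsed package.json object and a Node version string, and returns a
// list of human-readable problems. An empty list means the setup looks right.
function checkSetup(pkg, nodeVersion = process.versions.node) {
  const problems = [];

  // The devlog used Node v20.9.0, so anything older than the v20 line is flagged.
  const major = Number(nodeVersion.split('.')[0]);
  if (major < 20) {
    problems.push(`Node ${nodeVersion} is older than the v20 line used here`);
  }

  // "type": "module" is what lets .js files use import/export syntax.
  if (pkg.type !== 'module') {
    problems.push('package.json is missing "type": "module"');
  }

  // Both runtime dependencies should be listed.
  for (const dep of ['cron', 'puppeteer']) {
    if (!pkg.dependencies?.[dep]) {
      problems.push(`dependency "${dep}" is not listed`);
    }
  }
  return problems;
}

// ASSUMPTION: placeholder version ranges; the real package.json will pin
// whatever versions npm installed.
const examplePkg = {
  type: 'module',
  dependencies: { cron: '*', puppeteer: '*' },
};

console.log(checkSetup(examplePkg, '20.9.0')); // prints []
```

Nodemon is left out of the check on purpose, since it was installed globally and would not appear in the project's package.json.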

Created index.js with the following functionality:

  • Launches a headless Chromium browser using Puppeteer.
  • Navigates to https://github.com/alpnix/Radical-Software-DevLogs.
  • Extracts structured data (title, repo name, description, star count, and first five files).
  • Logs timestamped results to the console.
  • Configured cron expression */10 * * * * (runs every 10 minutes).
  • Executes the scraping function on schedule.
  • Used ES module import syntax.
  • Used async/await for asynchronous browser operations.
  • Used DOM query selectors with optional chaining for safe extraction.
  • Set Puppeteer's waitUntil option to networkidle2 so extraction starts only after network activity settles.
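
The steps above can be sketched roughly as follows. This is not the original index.js, only an illustration under stated assumptions: the CSS selectors are placeholders (GitHub's markup changes often), the names `extractRepoData`, `scrapeRepo`, and `main` are hypothetical, and the `--run` flag is invented here so the file can be loaded without immediately starting a browser. The sketch also uses dynamic `import()` for the two packages, whereas a static `import puppeteer from 'puppeteer'` at the top of the file would be the more conventional form.

```javascript
// index.js (sketch). Assumes `npm install puppeteer cron` has been run.

const REPO_URL = 'https://github.com/alpnix/Radical-Software-DevLogs';

// ASSUMPTION: placeholder selectors. Inspect the live page and adjust them.
const SELECTORS = {
  repoName: 'strong[itemprop="name"] a',
  description: 'p.f4.my-3',
  stars: '#repo-stars-counter-star',
  files: 'a.Link--primary',
};

// Runs inside the page via page.evaluate(). `doc` defaults to the browser's
// global document; a stand-in object can be passed when testing in Node.
function extractRepoData(selectors, doc = document) {
  // Optional chaining yields null instead of throwing when a node is missing.
  const text = (sel) => doc.querySelector(sel)?.textContent?.trim() ?? null;
  return {
    title: doc.title ?? null,
    repoName: text(selectors.repoName),
    description: text(selectors.description),
    stars: text(selectors.stars),
    // Keep only the first five matching file entries.
    files: Array.from(doc.querySelectorAll(selectors.files))
      .slice(0, 5)
      .map((el) => el.textContent?.trim() ?? ''),
  };
}

async function scrapeRepo(puppeteer) {
  const browser = await puppeteer.launch(); // headless by default
  try {
    const page = await browser.newPage();
    // networkidle2: wait until at most two network connections remain open.
    await page.goto(REPO_URL, { waitUntil: 'networkidle2' });
    const data = await page.evaluate(extractRepoData, SELECTORS);
    console.log(`[${new Date().toISOString()}]`, data); // timestamped result
    return data;
  } finally {
    await browser.close(); // always release the browser, even on errors
  }
}

async function main() {
  const { default: puppeteer } = await import('puppeteer');
  const { CronJob } = await import('cron');
  // */10 * * * * fires at minutes 0, 10, 20, 30, 40, and 50 of every hour.
  const job = new CronJob('*/10 * * * *', () => {
    scrapeRepo(puppeteer).catch((err) => console.error('Scrape failed:', err));
  });
  job.start();
}

// ASSUMPTION: hypothetical --run flag, e.g. `node index.js --run`.
if (process.argv.includes('--run')) {
  main().catch((err) => console.error(err));
}
```

Passing `SELECTORS` as an argument to `page.evaluate` matters because the function is serialized and executed in the browser, where variables from the Node side are not in scope. Wrapping the browser work in `try`/`finally` keeps a failed scrape from leaving Chromium processes behind between cron ticks.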