Scraping article content and metadata using CSV file import

Ivan Radunovic

In many fields, users need large batches of articles scraped on demand, so adding CSV article import was a logical step.

This way, users are not limited by our API, nor do they need to code custom logic around it for queues, retries, etc.

You simply create a CSV file containing the article URLs to scrape, and once they are processed, you can download the results.

Compiling a list of URLs to scrape with Google Sheets

Google Sheets is one way to create the CSV file for import.

The file should contain a single column, with no header row and one article link per row.

Any invalid URLs in the file will be ignored.

Now download the sheet as a CSV file: click File -> Download -> Comma Separated Values (.csv).
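If your URLs already live in a script or a database rather than a spreadsheet, the same file can be produced programmatically. The following is a minimal sketch, assuming a simple notion of validity (absolute http or https URLs with a host name); the exact rules Niched AI applies are not documented here, and the function names, along with the duplicate filtering, are illustrative additions.

```python
import csv
from urllib.parse import urlparse


def is_valid_url(value: str) -> bool:
    """Return True for absolute http(s) URLs that include a host name.

    This check is an assumption for illustration, not the service's own rule.
    """
    try:
        parsed = urlparse(value.strip())
    except ValueError:
        # urlparse raises ValueError on some malformed inputs (e.g. bad IPv6).
        return False
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


def write_import_csv(urls, path):
    """Write one URL per row, in a single column, with no header row.

    Invalid and duplicate entries are skipped. Returns the URLs written.
    """
    seen = set()
    kept = []
    for raw in urls:
        url = raw.strip()
        if is_valid_url(url) and url not in seen:
            seen.add(url)
            kept.append(url)

    # newline="" lets the csv module control line endings itself.
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        for url in kept:
            writer.writerow([url])
    return kept
```

Calling `write_import_csv(my_urls, "articles.csv")` yields a file in the layout described above. Filtering locally is optional, since invalid rows are ignored on import anyway, but it makes the row count in the file match what you expect to be processed.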

Importing the CSV file into Niched AI

Inside an active Niched AI project, you’ll find the Imports link in the sidebar.

The Imports page has a single file input. Once you select a file, the system presents an approval modal.

Approval modal

The system expects to find URLs in the first column.

If you see your URLs in the preview, click Import; otherwise, modify your CSV file.

After you click Import, the system starts processing the links in the background, and the import shows a status of Processing.

While an import is processing, you can leave the page and do something else. Once it’s ready you’ll receive an email, or you can visit the Imports page again to check the status.

When scraping is complete, you’ll be able to download the results in CSV format.

Properties of scraped articles

The final export contains only articles that were scraped successfully. Non-crawlable or non-article links either won’t be scraped or may return unusable content.

The following data points are returned:

  • URL
  • Title
  • Image
  • Published date
  • Author
  • HTML content
  • Text content
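Once downloaded, the results file can be loaded for further processing. The sketch below assumes the export has one column per data point, in the order listed above, and that it starts with a header row; neither detail is confirmed here, so check your own export and adjust `FIELDS` or `has_header` accordingly. The `Article` class and `load_articles` function are hypothetical names.

```python
import csv
from dataclasses import dataclass

# Assumed column order, taken from the list of data points above.
# The real export may name or order its columns differently.
FIELDS = [
    "url", "title", "image", "published_date",
    "author", "html_content", "text_content",
]


@dataclass
class Article:
    url: str
    title: str
    image: str
    published_date: str
    author: str
    html_content: str
    text_content: str


def load_articles(path, has_header=True):
    """Read the export into Article records, skipping malformed rows."""
    # Full HTML bodies can exceed the csv module's default field size limit.
    csv.field_size_limit(10_000_000)

    articles = []
    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle)
        if has_header:
            next(reader, None)  # discard the header row if present
        for row in reader:
            if len(row) != len(FIELDS):
                continue  # row does not match the assumed layout
            articles.append(Article(*row))
    return articles
```

For example, `[a.title for a in load_articles("export.csv")]` would list the scraped titles. All values are kept as strings, since the date format used in the export is not specified.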

Conclusion

Soon you’ll also be able to browse scraped articles in the UI.

Every free trial user gets 100 credits, which covers 100 article scrapes.

Niched AI offers a 7-day free trial, with no credit card required.
Sign up here