Open Papers

Mar 8, 2026

Overview

Open Papers is a Python toolkit for collecting research papers from major AI and machine learning venues. The goal is to build a high-quality dataset of papers with reliable metadata and full PDFs, enabling downstream research applications such as citation analysis, research trend analysis, and paper recommendation.
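The page does not describe the dataset's schema. As an illustration, one record could be modeled like this. This is a minimal sketch that assumes a JSON Lines output format; the `Paper` class and its field names are hypothetical, not the project's actual schema.

```python
# Hypothetical record layout for one paper; the real Open Papers schema may differ.
import json
from dataclasses import dataclass, asdict
from typing import List, Optional


@dataclass
class Paper:
    title: str
    authors: List[str]
    abstract: str
    venue: str                      # e.g. "NeurIPS"
    year: int
    pdf_url: Optional[str] = None   # link on the official proceedings site
    pdf_path: Optional[str] = None  # local path once the PDF is downloaded


def to_jsonl_line(paper: Paper) -> str:
    """Serialize one record as a single JSON Lines row."""
    return json.dumps(asdict(paper), ensure_ascii=False)


def from_jsonl_line(line: str) -> Paper:
    """Parse one JSON Lines row back into a Paper."""
    return Paper(**json.loads(line))
```

With one JSON object per line, records can be appended as they are scraped and streamed back without loading the whole file into memory.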

Motivation

The rapid growth of AI and machine learning research means that thousands of new papers are published every year. Platforms such as Google Scholar and Semantic Scholar aggregate publication data, but their coverage and metadata quality can be inconsistent.

Open Papers was developed to provide a cleaner and more structured dataset by directly scraping official conference and journal websites.
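Every venue's website is structured differently, and the page does not say how Open Papers parses them. The sketch below shows the general idea using only the standard library's `html.parser`. The listing markup it expects (`<li class="paper">` containing `title`, `authors`, and `pdf` elements) is entirely hypothetical, and the `ProceedingsParser` class is illustrative.

```python
from html.parser import HTMLParser
from typing import Dict, List, Optional
from urllib.parse import urljoin


class ProceedingsParser(HTMLParser):
    """Collect title, authors, and PDF link from a hypothetical listing page.

    Assumed markup (hypothetical; real venue pages differ):
      <li class="paper">
        <span class="title">...</span>
        <span class="authors">Alice, Bob</span>
        <a class="pdf" href="...">PDF</a>
      </li>
    """

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.papers: List[Dict] = []
        self._current: Optional[Dict] = None  # paper being assembled
        self._field: Optional[str] = None     # text field we are inside, if any

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if tag == "li" and "paper" in classes:
            self._current = {"title": "", "authors": "", "pdf_url": None}
        elif self._current is not None:
            if tag == "span" and ("title" in classes or "authors" in classes):
                self._field = "title" if "title" in classes else "authors"
            elif tag == "a" and "pdf" in classes and attrs.get("href"):
                # Resolve relative links against the page URL.
                self._current["pdf_url"] = urljoin(self.base_url, attrs["href"])

    def handle_endtag(self, tag):
        if tag == "span":
            self._field = None
        elif tag == "li" and self._current is not None:
            cur = self._current
            self.papers.append({
                "title": cur["title"].strip(),
                "authors": [a.strip() for a in cur["authors"].split(",") if a.strip()],
                "pdf_url": cur["pdf_url"],
            })
            self._current = None

    def handle_data(self, data):
        # Accumulate text only while inside a title or authors span.
        if self._current is not None and self._field:
            self._current[self._field] += data
```

A real scraper would need one such parser (or selector set) per venue, since each proceedings site uses its own layout.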

Features

  • Extracts paper metadata, including title, authors, and abstract
  • Automatically downloads full PDFs
  • Supports multiple major AI/ML conferences and journals
  • Resumes interrupted scraping runs
  • Handles errors robustly and rate-limits requests
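Resuming, rate limiting, and error handling can be combined in a small download loop. The following is a minimal sketch, not the project's code: it assumes a plain-text checkpoint file listing finished paper IDs, and the names `RateLimiter`, `with_retries`, `load_done`, and `download_all` are hypothetical. The `fetch` callable is left abstract; in practice it might wrap `urllib.request.urlopen(url).read()`.

```python
import functools
import os
import time
from typing import Callable, Iterable, Set


class RateLimiter:
    """Enforce a minimum interval (in seconds) between consecutive requests."""

    def __init__(self, min_interval: float) -> None:
        self.min_interval = min_interval
        self._last = float("-inf")  # time of the previous request

    def wait(self) -> None:
        elapsed = time.monotonic() - self._last
        if elapsed < self.min_interval:
            time.sleep(self.min_interval - elapsed)
        self._last = time.monotonic()


def with_retries(fn: Callable[[], bytes], attempts: int = 3, base_delay: float = 1.0) -> bytes:
    """Call fn, retrying on OSError (urllib's URLError is a subclass) with
    exponential backoff: base_delay, 2 * base_delay, 4 * base_delay, ..."""
    for attempt in range(attempts):
        try:
            return fn()
        except OSError:
            if attempt == attempts - 1:
                raise  # out of retries: surface the last error
            time.sleep(base_delay * 2 ** attempt)
    raise ValueError("attempts must be at least 1")


def load_done(checkpoint: str) -> Set[str]:
    """Return the IDs recorded as finished by earlier runs."""
    if not os.path.exists(checkpoint):
        return set()
    with open(checkpoint, encoding="utf-8") as f:
        return {line.strip() for line in f if line.strip()}


def download_all(paper_ids: Iterable[str], fetch: Callable[[str], bytes],
                 out_dir: str, checkpoint: str, limiter: RateLimiter) -> int:
    """Download each paper's PDF once; return how many were fetched this run."""
    done = load_done(checkpoint)
    os.makedirs(out_dir, exist_ok=True)
    fetched = 0
    with open(checkpoint, "a", encoding="utf-8") as log:
        for pid in paper_ids:
            if pid in done:
                continue  # already finished in a previous run
            limiter.wait()
            data = with_retries(functools.partial(fetch, pid))
            with open(os.path.join(out_dir, f"{pid}.pdf"), "wb") as f:
                f.write(data)
            # Record success only after the file is written, so an interrupted
            # run re-downloads at most the paper that was in flight.
            log.write(pid + "\n")
            log.flush()
            done.add(pid)
            fetched += 1
    return fetched
```

Appending to the checkpoint after each successful write keeps resume logic simple: rerunning the same command skips everything already on disk. The sketch assumes paper IDs are safe to use as file names.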

Coverage

The scraper focuses on major AI and machine learning venues, particularly papers published in the deep learning era (roughly 2013 onward). Supported venues include conferences such as NeurIPS, ICML, ICLR, CVPR, ACL, and several others.
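One way to express this scope is as an explicit list of (venue, year) scraping jobs. This is a sketch under stated assumptions: the venue list contains only the five venues named above (the real list includes others), 2013 is the approximate start year given in the text, and `plan_jobs` is a hypothetical helper.

```python
from itertools import product
from typing import Iterable, List, Tuple

# Only the venues named on this page; the project also covers "several others".
VENUES = ["NeurIPS", "ICML", "ICLR", "CVPR", "ACL"]
START_YEAR = 2013  # approximate start of the deep learning era, per the text


def plan_jobs(venues: Iterable[str], start: int, end: int) -> List[Tuple[str, int]]:
    """Enumerate (venue, year) pairs to scrape, ordered by year, then venue name."""
    return sorted(product(venues, range(start, end + 1)), key=lambda job: (job[1], job[0]))
```

For example, `plan_jobs(VENUES, START_YEAR, 2015)` yields 15 jobs, starting with `("ACL", 2013)`. An explicit job list pairs naturally with a checkpoint: finished jobs can be skipped on the next run.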

Applications

The collected dataset can support a variety of downstream tasks:

  • Research trend and topic analysis
  • Citation and reference recommendation
  • Structured document parsing (e.g., with GROBID)
  • Intelligent paper discovery and reading recommendation
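As a small example of trend analysis, the share of papers per year whose titles mention a keyword can be computed directly from the metadata. This sketch assumes each record is a dict with `year` and `title` keys; `keyword_trends` is a hypothetical helper. It uses whole-word matching, so "transformer" does not match "transformers".

```python
import re
from collections import Counter
from typing import Dict, Iterable, Mapping


def keyword_trends(records: Iterable[Mapping], keywords: Iterable[str]) -> Dict[str, Dict[int, float]]:
    """Fraction of papers per year whose title contains each keyword as a whole word."""
    kws = [k.lower() for k in keywords]
    patterns = {k: re.compile(r"\b" + re.escape(k) + r"\b") for k in kws}
    totals: Counter = Counter()                        # papers per year
    hits: Dict[str, Counter] = {k: Counter() for k in kws}  # matches per keyword and year
    for r in records:
        year, title = r["year"], r["title"].lower()
        totals[year] += 1
        for k in kws:
            if patterns[k].search(title):
                hits[k][year] += 1
    # Normalize by yearly volume so growth in total papers does not inflate trends.
    return {k: {y: hits[k][y] / totals[y] for y in sorted(totals)} for k in kws}
```

Normalizing by the number of papers per year matters here, because total publication volume grows quickly; raw counts would show nearly every topic rising.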

The dataset is continuously updated as new conference proceedings become available.
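Continuous updates raise the question of how to avoid duplicate records when proceedings are re-scraped. The page does not say how Open Papers handles this. The sketch below assumes records are deduplicated on a (venue, year, normalized title) key; `paper_key` and `merge_updates` are hypothetical names.

```python
import re
import unicodedata
from typing import Dict, Iterable, List, Tuple


def paper_key(record: Dict) -> Tuple[str, int, str]:
    """Key that ignores case, Unicode width variants, and punctuation in titles."""
    title = unicodedata.normalize("NFKC", record["title"]).casefold()
    title = re.sub(r"[^\w]+", " ", title).strip()
    return (record["venue"], record["year"], title)


def merge_updates(existing: List[Dict], incoming: Iterable[Dict]) -> Tuple[List[Dict], int]:
    """Append only records not already present; return (merged list, number added)."""
    seen = {paper_key(r) for r in existing}
    merged = list(existing)
    added = 0
    for r in incoming:
        key = paper_key(r)
        if key not in seen:
            seen.add(key)
            merged.append(r)
            added += 1
    return merged, added
```

Title normalization is a heuristic; where a venue provides stable paper IDs, keying on those would be more reliable.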