Basic required files

Packages have a specific structure required to be bundable, deployable, and usable. There are a set of minimum files required for both and packages but you will often see additional files in packages. Though how these files are actually named will differ between and , the organization follows a very similar pattern. The primary files and directories you will be concerned about include:

  • Package metadata: Every package requires important metadata about the package. This file will declare various information such as package name, description, dependencies, author(s), contact information and more. The metadata notebook will discuss the details of this file.
  • Source code directory: This is the directory that contains all the source code for end-user functionality. Here you will organize your classes, functions, etc. into one or more .R / .py scripts. The source code notebook will discuss requirements, approaches and best practices for organizing this directory.
  • Test directory: Every package should always contain a testing framework. These tests should include a minimum of unit tests but can also contain integration or environment specific tests. The testing notebook will discuss the most common test approaches for and .
  • Documentation directory: Although not necessary, good packages often will have detailed documentation hosted as a static website. These files are generally kept in a /docs folder. The package website notebook will discuss how to make beautiful websites for your package.
  • License file: Specifies who can and cannot use your package. This file is related to specifications made in the metadata file and will be discussed in the metadata notebook.
  • Changelog: A changelog is a log or record of all notable changes made to a project. We will discuss the changelog in the Changelog notebook.
  • README: The README is often one of the first files a user sees when going to the source code location (typically Github). A good README will provide useful instructions to get started using the package quickly and also where to go to find more detailed information. We will discuss what a good README looks like in the README notebook.
  • Other various config or requirement files: You will often see additional files in packages. These may be various configuration files or other package components. Many of these you will be introduced along the way through this book and others will be explicitly discussed in the other components notebook.

This general structure will look like the following and the sections that follow will illustrate and describe the exact structure setup for each language.

.
├── package metadata file
├── source code directory
│   ├── script 1
│   ├── script 2
│   └── script n
├── test directory
│   ├── script 1
│   ├── script 2
│   └── script n
├── documentation directory
│   └── various doc files
├── license file
├── changelog file
├── README
└── other various config files and components

For the sections that follow let’s assume we are creating a package called mypkg.

file structure

Within R, the basic structure of your package will look like the following. The unique items here are:

  • DESCRIPTION: This is the file that contains the primary package metadata.
  • R/: The R directory is where all the source code resides.
  • NEWS: It is common convention in many R packages to title the changelog as “NEWS”.
mypkg
├── DESCRIPTION
├── R/
├── tests/
├── docs/
├── LICENSE
├── NEWS
└── README

The rest of the files are consistent with the previous section. As you’ll see in the next chapter, additional files will be produced when we build the package but these are the primary directories and files to get going with an R package.

file structure

Within Python, the basic structure of your package will look like the following. The unique items here are:

  • setup.py: This is the file that contains the primary package metadata.
  • src/: The src directory is where all the source code resides.
mypkg
├── setup.py
├── src/
├── tests/
├── docs/
├── LICENSE
├── CHANGELOG
└── README

The rest of the files are consistent with the first section. As you’ll see in the next chapter, additional files are often included to help configure the developer environment for a package but these are the primary directories and files to create a baseline Python package.

Exercises

  1. Look at the package structure for the package tidyr
    • Do all the files exist that were discussed in this module?
    • Take a quick glance at each of the files discussed in this section.
    • What additional files are present?
  2. Look at the package structure for the package pytest
    • Do all the files exist that were discussed in this module?
    • Take a quick glance at each of the files discussed in this section.
    • What additional files are present?


🏠