Every package requires important metadata about the package. This file will declare various information such as package name, description, dependencies, author(s), contact information and more. This module covers the metadata information that is either required or should be included in both and packages along with the key files to supply this information.

First, we’ll talk about the metadata to include and then the last two sections will walk through the and files that hold this info.

Name and package description

Name, title and description information help describe what the package does. Both and have some general guidelines required for each of these fields such as the number of characters. However, the intent is often the same:

  1. Name is the name of the package,
  2. A shorter, one line description of the package provides,
  3. A longer, multi-line description of the package. This is the elevator pitch of the key things that your package does and how important it is to others.

Author and contact info

Packages require at least the primary author to be listed along with their email contact information. You can optionally provide additional authors and contributors along with their contact info.

Dependencies

Most packages have dependencies. Moreover, packages can have different types of dependencies:

  • Language version dependency: Sometimes our package relies on certain language features that are only available in certain versions. For example, if you use f-strings in a Python package than your package depends on using Python v3.6 or later.
  • Required package dependencies: Often our package uses other third party packages to simplify tasks. We may use dplyr or pandas to simplify data frame manipulations. We need to capture any required packages used by our source code to make sure anyone using our package also has these required packages installed on their operating system (OS).
  • Suggested package dependencies: Sometimes your package leverages third party packages but doesn’t strictly require them to provide its fundamental capabilities. This may include packages that provide example datasets, running tests, building documentation, etc. We can usually specify these packages in a way that doesn’t enforce installation onto a users OS but provides them the option to install them if necessary.
  • Developer dependencies: Developers often use certain packages for enabling syntax styling, pre-commit hooks and other developer environment requirements.

Versioning

There is a wide array of options when it comes to software versioning; however, we advise using a Semantic Versioning approach. Semantic versioning follows a <MAJOR>.<MINOR>.<PATCH> numbering approach (i.e. 0.9.1, 1.4.0). With this approach, you increment certain parts of a version number depending on the changes made:

  • <MAJOR> version when you make incompatible API changes. This signals that your updates will likely effect many users and will cause breaking changes to users’ prior code.
  • <MINOR> version when you add functionality in a backwards compatible manner. This often includes adding a new feature that does not effect the existing code base.
  • <PATCH> version when you make backwards compatible bug fixes.

You may also see in-development packages using a fourth component called the development version (i.e. 1.0.0.9000). Using this development number makes it easy to see if a package is released or in-development and the use of the fourth place means that you’re not limited to what the next version will be.

Unfortunately, determining the right version number is not always an exact science. For example, if you make an API-incompatible change to a rarely-used part of your code, it may not deserve a major number change. But if you fix a bug that many people depend on, it will feel like an API breaking change. Use your best judgement.

Packages often start with a version number 0.1.0 and slowly increment as they mature. A version of 1.0.0 typically indicates that your package is feature complete with a stable API.

License

The license field states who can use your package. The License field can be either a standard abbreviation for an open source license, like GPL-2 or BSD, or a pointer to a file containing more information, file LICENSE. The license is really only important if you’re planning on releasing your package or you may need a proprietary license for packages built within your organization.

There are several open source software licenses to choose from. A few of the more common ones include:

  • MIT: This is a simple and permissive license. It lets people use and freely distribute your code subject to only one restriction: the license must always be distributed with the code.
  • GPL-2 or GPL-3: These are “copy-left” licenses. This means that anyone who distributes your code in a bundle must license the whole bundle in a GPL-compatible way. Additionally, anyone who distributes modified versions of your code (derivative works) must also make the source code available. GPL-3 is a little stricter than GPL-2, closing some older loopholes.
  • CC0: It relinquishes all your rights on the code and data so that it can be freely used by anyone for any purpose. This is sometimes called putting it in the public domain, a term which is neither well-defined nor meaningful in all countries.

If you’d like to learn more about other common licenses, Github’s choosealicense.com is a good place to start. Another good resource is https://tldrlegal.com/, which explains the most important parts of each license.

Project source

It is common to include URLs to direct users to:

  • Where the source code resides (i.e. Github repo)
  • Where to post bugs, feature requests, etc. (i.e. Github issues)

We may also want to point people to a package documentation website, the changelog, or other URLs that host important package information.

Other

As you’ll see in the next couple sections, there are additional types of metadata that we can include such as deploying data or other files (i.e. LICENSE) with you package. However, the items listed in the previous section are the primary forms of metadata that we want to ensure we include.

package metadata example

The DESCRIPTION file is the main file used to capture metadata for your package. If you look at your package’s DESCRIPTION file you will see that several items are already filled out.

Package: myfirstpkg
Title: My First Package
Version: 0.1.0
Authors@R:
    person("Brad", "Boehmke", email = "bradleyboehmke@gmail.com", role = c("aut", "cre"))
Description: Provide a longer description of what the package does. This is the
    place to give your elevator pitch of how great this package is and how important
    it is to others.
URL: https://github.com/bradleyboehmke/myfirstpkg
BugReports: https://github.com/bradleyboehmke/myfirstpkg/issues
License: file LICENSE
Encoding: UTF-8
LazyData: true
Depends:
    R (>= 2.10)
Suggests:
    testthat
Roxygen: list(markdown = TRUE)
RoxygenNote: 7.1.0

A couple items to note:

  • Title is the one line description of the package you supplied when creating the package structure. It should be plain text, title case, and NOT end in a period. If submitting to CRAN, this will be truncated to 65 characters. So depending on your original input you may want to adjust the Title to ensure it meets these requirements.
  • Your Author information has been included but if you are working with multiple people you may want to include more authors which can look like this:

    Authors@R: c(
     person("John", "Doe", email = "john.doe@example.com", role = "cre"),
     person("Jane", "Doe", email = "jane.doe@examplecom", role = "aut"))

    Note that there are various roles authorship can have. The main ones include:

    • cre: the creator or maintainer, the person to contact if there are problems.
    • aut: authors, those who have made significant contributions to the package.
    • ctb: contributors, those who have made smaller contributions, like patches.
    • cph: copyright holder. This is used if the copyright is held by someone other than the author, typically a company (i.e. the author’s employer).
  • Description is more detailed than the title. You can use multiple sentences but you are limited to one paragraph. If your description spans multiple lines (and it should!), each line must be no more than 80 characters wide. Indent subsequent lines with 4 spaces.
  • License points to the LICENSE file automatically created based on the license you specified.
  • Encoding is if you use any non-ASCII characters in the DESCRIPTION file, you must also specify an encoding. There are three main encodings that work on all platforms: latin1, latin2 and UTF-8 (by far the most common).
  • LazyData makes it easier to access data in your package. Because it’s important, it’s included in the basic template even though you may not share data through your package.

Dependencies

Currently in our DESCRIPTION file we specify:

Depends:
    R (>= 2.10)
Suggests:
    testthat

Depends states that our package depends on the users version of R to be 2.10 or greater. Think very carefully before increasing to a more restrictive version of R. As you add more sophisticated functionality to your package you, as the developer, need to think about whether or not those functionalities will continue to work on all versions of R >= 2.10 or if you need to bump the version number to, say, 3.0.

Suggests states that our package leverages the testthat package but it is not required for basic functionality. In fact, testthat is only used to run the tests for our package so a regular user of our package does not need testthat to use my_mean().

Often, we will need to add new dependencies to our package. For example, say one of our functions uses the dplyr and ggplot2 packages and one our tests uses the purrr package. Then, our package’s functionality depends on dplyr and ggplot2 but purrr is not a hard requirement.

We would add the required package dependencies under Imports and the non-hard requirements under Suggests:

Generally you will only put the R version under Depends and keep all package requirements under Imports.

Depends:
    R (>= 2.10)
Imports:
    dplyr,
    ggplot2
Suggests:
    purrr,
    testthat

You can easily add new packages to the Imports and Suggests fields with:

  • usethis::use_package(“pkg_name”)
  • usethis::use_package(“pkg_name”, “Suggests”)

Also, you can require specific versions of a package by specifying the version in parentheses after the package name:

Imports:
    ggvis (>= 0.2),
    dplyr (>= 0.3.0.1)
Suggests:
    MASS (>= 7.3.0)

package metadata example

The setup.py file is the main file used to capture metadata for your package and configures your package for distribution. The setup() function provides the main functionality. If you look at your package’s setup.py file you will see that several items are already filled out.

#!/usr/bin/env python
"""Setup, configuration, and metadata file for the myfirstpypkg package."""
from setuptools import find_packages
from setuptools import setup


install_requires = []
doc_requires = ["sphinx", "sphinx_rtd_theme", "sphinxcontrib.napoleon"]
test_requires = ["pytest"]
dev_requires = ["flake8", "mypy"] + doc_requires + test_requires

setup(
    name="myfirstpypkg",
    version="0.1.0",
    license="GNU General Public License v3",
    description="My first package",
    url="https://github.com/bradleyboehmke/myfirstpypkg",
    author="Brad Boehmke",
    author_email="bradleyboehmke@gmail.com",
    package_dir={"": "src"},
    packages=find_packages(where='src'),
    python_requires='>=3.6',
    install_requires=install_requires,
    extras_require={"docs": doc_requires, "tests": test_requires, "dev": dev_requires},
    test_suite="tests",
    include_package_data=True,
    project_urls={
        'Source': "https://github.com/bradleyboehmke/myfirstpypkg",
        'Bug Reports': "https://github.com/bradleyboehmke/myfirstpypkg/issues",
    },
)

A couple items to note:

  • description is a one-line description or tagline of what your project does but does not require title-case or have a strict character length restriction. Note that you can also supply a longer, multi-line description with a long_description parameter.
  • package_dir and packages points to where the package is located. In our case we have our package in the src/ subdirectory so we need to specify that is where it is located.
  • test_suite simply tells pytest where to look to find and run our tests.
  • include_package_data is used when you have data you want to deploy with your package.

Dependencies

A common theme you will see is specifying all package requirements in a list and then supplying them to setup() parameters as we are doing in our current setup.py:

  • python_requires is where you need to specify the version of Python required for your package. When a user tries to install your package pip install will check that the user’s Python version meets this requirement and refuse to install the project if the version does not match.
  • install_requires is where any required package dependencies would be added. In this case, our initial package has no requirements but if we added functionality that requires numpy for example, then we would include numpy in the currently empty install_requires list.
  • extras_require is where any additional “supporting” packages would go. These are packages that are not required for normal use but commonly required to run tests, build supporting documentation, and execute other common developer activities. Users can install these “extras” by running pip install pkgname[dev].

    When you are developing you will often do the following the activate your virtual environment, install the package in an editable fashion, and ensure all developer required packages are installed in your environment: bash source venv/bin/activate pip install -e . ".[dev]"

Exercises

  • Check out the metadata for the dplyr package.
    • What is the current version number?
    • Who was the original creator?
    • What kind of license does it have?
    • What R version does it require?
    • What kind of dependencies are glue, rlang, DBI and lobstr?
  • Check out the metadata for the pre-commit package. Note that this package uses setup.cfg to hold its metadata.
    • What is the current version number?
    • Who is the author?
    • What kind of license does it have?
    • What Oython version does it require?
    • Name 4 package dependencies of pre-commit?
  • Building on to the initial R and Python packages you created in the portfolio building exercise:
    • Check out the LICENSE file in your package.
    • R package
      • Fill out the Description field for your package.
      • Add dplyr as a required package dependency.
      • Add purrr as a suggested package dependency.
      • Run devtools::check() to make sure the package builds successfully.
    • Python package
      • Create a long_description field for your package.
      • Add numpy as a required package dependency.
      • Add black as a developer suggested package dependency.
      • Activate your virtual environment, install your package in editable mode, ensure all developer dependencies are installed and rerun your test(s).


🏠