Every package requires important metadata. This metadata declares information such as the package name, description, dependencies, author(s), contact information and more. This module covers the metadata that is either required or should be included in both R and Python packages, along with the key files used to supply it.
First, we’ll talk about the metadata to include, and then the last two sections will walk through the DESCRIPTION and setup.py files that hold this info.
Name, title and description information help describe what the package does. Both R and Python have some general guidelines for each of these fields, such as the number of characters allowed. However, the intent is often the same:
Most packages have dependencies. Moreover, packages can have different types of dependencies:
There is a wide array of options when it comes to software versioning; however, we advise using a Semantic Versioning approach. Semantic versioning follows a <MAJOR>.<MINOR>.<PATCH> numbering approach (e.g. 0.9.1, 1.4.0). With this approach, you increment certain parts of a version number depending on the changes made:
<MAJOR> version when you make incompatible API changes. This signals that your updates will likely affect many users and will cause breaking changes to users’ prior code.
<MINOR> version when you add functionality in a backwards compatible manner. This often includes adding a new feature that does not affect the existing code base.
<PATCH> version when you make backwards compatible bug fixes.
You may also see in-development packages using a fourth component called the development version (e.g. 1.0.0.9000). Using this development number makes it easy to see whether a package is released or in development, and the use of the fourth place means that you are not committed in advance to what the next version will be.
Unfortunately, determining the right version number is not always an exact science. For example, if you make an API-incompatible change to a rarely-used part of your code, it may not deserve a major number change. But if you fix a bug that many people depend on, it will feel like an API breaking change. Use your best judgement.
Packages often start with a version number 0.1.0 and slowly increment as they mature. A version of 1.0.0 typically indicates that your package is feature complete with a stable API.
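The increment rules above can be sketched in a few lines of Python. This is only an illustration: the function name bump_version is hypothetical, and it assumes a plain three-part version string without a development component.

```python
def bump_version(version: str, part: str) -> str:
    """Return `version` with one part incremented.

    `part` is "major", "minor" or "patch". Lower-order parts reset to
    zero, which is the usual Semantic Versioning convention.
    """
    major, minor, patch = (int(piece) for piece in version.split("."))
    if part == "major":
        return f"{major + 1}.0.0"
    if part == "minor":
        return f"{major}.{minor + 1}.0"
    if part == "patch":
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown version part: {part!r}")


print(bump_version("0.9.1", "patch"))  # 0.9.2
print(bump_version("0.9.1", "minor"))  # 0.10.0
print(bump_version("0.9.1", "major"))  # 1.0.0
```

Note that the parts are compared and incremented as integers, so 0.9.1 is followed by 0.10.0 on a minor bump, not 1.0.0.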
The license field states who can use your package. The License field can be either a standard abbreviation for an open source license, like GPL-2 or BSD, or a pointer to a file containing more information, file LICENSE. The license mainly matters if you plan to release your package, although you may also need a proprietary license for packages built within your organization.
There are several open source software licenses to choose from. A few of the more common ones include:
If you’d like to learn more about other common licenses, GitHub’s choosealicense.com is a good place to start. Another good resource is https://tldrlegal.com/, which explains the most important parts of each license.
It is common to include URLs to direct users to:
We may also want to point people to a package documentation website, the changelog, or other URLs that host important package information.
As you’ll see in the next couple of sections, there are additional types of metadata that we can include, such as deploying data or other files (e.g. LICENSE) with your package. However, the items listed in the previous section are the primary forms of metadata that we want to ensure we include.
The DESCRIPTION file is the main file used to capture metadata for your package. If you look at your package’s DESCRIPTION file you will see that several items are already filled out.
Package: myfirstpkg
Title: My First Package
Version: 0.1.0
Authors@R:
    person("Brad", "Boehmke", email = "bradleyboehmke@gmail.com", role = c("aut", "cre"))
Description: Provide a longer description of what the package does. This is the
    place to give your elevator pitch of how great this package is and how important
    it is to others.
URL: https://github.com/bradleyboehmke/myfirstpkg
BugReports: https://github.com/bradleyboehmke/myfirstpkg/issues
License: file LICENSE
Encoding: UTF-8
LazyData: true
Depends:
    R (>= 2.10)
Suggests:
    testthat
Roxygen: list(markdown = TRUE)
RoxygenNote: 7.1.0
A couple items to note:
Title is the one line description of the package you supplied when creating the package structure. It should be plain text, title case, and NOT end in a period. If submitting to CRAN, this will be truncated to 65 characters. So depending on your original input you may want to adjust the Title to ensure it meets these requirements.
Your Author information has been included, but if you are working with multiple people you may want to include more authors, which can look like this:
Authors@R: c(
    person("John", "Doe", email = "john.doe@example.com", role = "cre"),
    person("Jane", "Doe", email = "jane.doe@example.com", role = "aut"))
Note that there are various roles authorship can have. The main ones include:
cre: the creator or maintainer, the person to contact if there are problems.
aut: authors, those who have made significant contributions to the package.
ctb: contributors, those who have made smaller contributions, like patches.
cph: copyright holder. This is used if the copyright is held by someone other than the author, typically a company (e.g. the author’s employer).
Description is more detailed than the title. You can use multiple sentences but you are limited to one paragraph. If your description spans multiple lines (and it should!), each line must be no more than 80 characters wide. Indent subsequent lines with 4 spaces.
License points to the LICENSE file automatically created based on the license you specified.
Encoding declares the character encoding of the DESCRIPTION file. If you use any non-ASCII characters in the file, you must specify an encoding. There are three main encodings that work on all platforms: latin1, latin2 and UTF-8 (by far the most common).
LazyData makes it easier to access data in your package. Because it’s important, it’s included in the basic template even though you may not share data through your package.
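The wrapping rule for the Description field (lines of at most 80 characters, continuation lines indented by 4 spaces) is easy to get wrong by hand. As a minimal sketch, the standard library's textwrap module can produce a conforming field; the helper name format_description is hypothetical and not part of any R or Python packaging tool.

```python
import textwrap


def format_description(text: str, width: int = 80, indent: int = 4) -> str:
    """Wrap a package description as a DESCRIPTION-style field."""
    # Collapse all whitespace so the description is a single paragraph.
    paragraph = " ".join(text.split())
    return textwrap.fill(
        "Description: " + paragraph,
        width=width,                     # maximum line width, indent included
        subsequent_indent=" " * indent,  # indent every continuation line
    )


field = format_description(
    "Provide a longer description of what the package does. This is the place "
    "to give your elevator pitch of how great this package is and how important "
    "it is to others."
)
print(field)
```

The output can be pasted directly into the DESCRIPTION file in place of the existing Description field.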
Currently in our DESCRIPTION file we specify:
Depends:
    R (>= 2.10)
Suggests:
    testthat
Depends states that our package requires the user’s version of R to be 2.10 or greater. Think very carefully before increasing to a more restrictive version of R. As you add more sophisticated functionality to your package you, as the developer, need to think about whether that functionality will continue to work on all versions of R >= 2.10 or whether you need to bump the version number to, say, 3.0.
Suggests states that our package leverages the testthat package but does not require it for basic functionality. In fact, testthat is only used to run the tests for our package, so a regular user of our package does not need testthat to use my_mean().
Often, we will need to add new dependencies to our package. For example, say one of our functions uses the dplyr and ggplot2 packages and one of our tests uses the purrr package. Then, our package’s functionality depends on dplyr and ggplot2 but purrr is not a hard requirement.
Generally you will only put the R version under Depends and keep all package requirements under Imports. We would therefore add the required package dependencies under Imports and the non-hard requirements under Suggests:
Depends:
    R (>= 2.10)
Imports:
    dplyr,
    ggplot2
Suggests:
    purrr,
    testthat
You can easily add new packages to the Imports and Suggests fields with:
usethis::use_package("pkg_name")
usethis::use_package("pkg_name", "Suggests")
Also, you can require specific versions of a package by specifying the version in parentheses after the package name:
Imports:
    ggvis (>= 0.2),
    dplyr (>= 0.3.0.1)
Suggests:
    MASS (>= 7.3.0)
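To make the meaning of a minimum-version requirement concrete, here is a rough Python sketch of how an entry such as dplyr (>= 0.3.0.1) could be evaluated. It is not how R itself performs the check: the function names are hypothetical, only the >= operator is handled, and missing trailing components are assumed to count as zero.

```python
import re


def version_at_least(installed: str, minimum: str) -> bool:
    """Compare dotted versions component by component, as integers."""
    have = [int(piece) for piece in installed.split(".")]
    need = [int(piece) for piece in minimum.split(".")]
    # Assumption: pad the shorter version with zeros before comparing.
    width = max(len(have), len(need))
    have += [0] * (width - len(have))
    need += [0] * (width - len(need))
    return have >= need


def satisfies(installed: str, requirement: str) -> bool:
    """Check an installed version against e.g. 'dplyr (>= 0.3.0.1)'."""
    match = re.search(r"\(\s*>=\s*([0-9][0-9.]*)\s*\)", requirement)
    if match is None:
        return True  # no version constraint was given
    return version_at_least(installed, match.group(1))


print(satisfies("0.3.0", "dplyr (>= 0.3.0.1)"))  # False
print(satisfies("0.10", "ggvis (>= 0.2)"))       # True
print(satisfies("1.0", "testthat"))              # True
```

Comparing the components as integers matters: a plain string comparison would wrongly rank 0.10 below 0.2.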
The setup.py file is the main file used to capture metadata for your package, and it also configures your package for distribution. The setup() function provides the main functionality. If you look at your package’s setup.py file you will see that several items are already filled out.
#!/usr/bin/env python
"""Setup, configuration, and metadata file for the myfirstpypkg package."""
from setuptools import find_packages
from setuptools import setup

install_requires = []
doc_requires = ["sphinx", "sphinx_rtd_theme", "sphinxcontrib.napoleon"]
test_requires = ["pytest"]
dev_requires = ["flake8", "mypy"] + doc_requires + test_requires

setup(
    name="myfirstpypkg",
    version="0.1.0",
    license="GNU General Public License v3",
    description="My first package",
    url="https://github.com/bradleyboehmke/myfirstpypkg",
    author="Brad Boehmke",
    author_email="bradleyboehmke@gmail.com",
    package_dir={"": "src"},
    packages=find_packages(where='src'),
    python_requires='>=3.6',
    install_requires=install_requires,
    extras_require={"docs": doc_requires, "tests": test_requires, "dev": dev_requires},
    test_suite="tests",
    include_package_data=True,
    project_urls={
        'Source': "https://github.com/bradleyboehmke/myfirstpypkg",
        'Bug Reports': "https://github.com/bradleyboehmke/myfirstpypkg/issues",
    },
)
A couple items to note:
description is a one-line description or tagline of what your project does. It does not require title case and has no strict character limit. Note that you can also supply a longer, multi-line description with the long_description parameter.
package_dir and packages point to where the package is located. In our case the package lives in the src/ subdirectory, so we need to specify that location.
test_suite tells setuptools where to look to find and run our tests.
include_package_data is used when you have data you want to deploy with your package.
A common theme you will see is specifying all package requirements in lists and then supplying them to setup() parameters, as we are doing in our current setup.py:
python_requires is where you specify the version of Python required for your package. When a user tries to install your package, pip install will check that the user’s Python version meets this requirement and refuse to install the project if it does not.
install_requires is where any required package dependencies would be added. In this case, our initial package has no requirements, but if we added functionality that requires numpy, for example, then we would include numpy in the currently empty install_requires list.
extras_require is where any additional “supporting” packages would go. These are packages that are not required for normal use but are commonly required to run tests, build supporting documentation, and execute other common developer activities. Users can install these “extras” by running pip install pkgname[dev].
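As a rough sketch of how these parameters fit together, the helper below assembles the relevant setup() keyword arguments in one place. The function name build_setup_kwargs, the README.md file name, and the numpy dependency are assumptions made for illustration; long_description_content_type is the setuptools keyword that tells package indexes how to render the long description.

```python
import tempfile
from pathlib import Path


def build_setup_kwargs(readme_path):
    """Collect dependency and description metadata for setup()."""
    install_requires = ["numpy"]  # hypothetical runtime dependency
    doc_requires = ["sphinx", "sphinx_rtd_theme", "sphinxcontrib.napoleon"]
    test_requires = ["pytest"]
    dev_requires = ["flake8", "mypy"] + doc_requires + test_requires
    return {
        "python_requires": ">=3.6",
        "install_requires": install_requires,
        "extras_require": {
            "docs": doc_requires,
            "tests": test_requires,
            "dev": dev_requires,
        },
        # Reuse the README as the longer, multi-line description.
        "long_description": Path(readme_path).read_text(encoding="utf-8"),
        "long_description_content_type": "text/markdown",
    }


# Demonstrate with a throwaway README so the sketch runs anywhere.
with tempfile.TemporaryDirectory() as tmp:
    readme = Path(tmp) / "README.md"
    readme.write_text("# myfirstpypkg\n\nMy first package.\n", encoding="utf-8")
    kwargs = build_setup_kwargs(readme)

print(kwargs["extras_require"]["dev"])
# In a real setup.py these would be passed on: setup(name=..., **kwargs)
```

Because the dev list is built by concatenating the docs and tests lists, adding a package to either of those automatically makes it part of the dev extra as well.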
When you are developing, you will often do the following to activate your virtual environment, install the package in an editable fashion, and ensure all developer-required packages are installed in your environment:
source venv/bin/activate
pip install -e ".[dev]"
Description field for your package.
devtools::check() to make sure the package builds successfully.
long_description field for your package.