Packages are the fundamental units of reproducible R and Python code. They include reusable components, the documentation that describes how to use them, requirements that ensure users can apply them, and tests that ensure consistent and reliable functionality. In this course you’ll learn how to turn your code into packages that you, your teammates, and others can easily download and use. Writing a package can seem overwhelming at first, so start with the basics and improve it over time. It doesn’t matter if your first version isn’t perfect as long as the next version is better.
Often, small snippets of code grow in usefulness and importance, which creates a need to share and distribute them. Sharing code and libraries requires packaging; otherwise, distributing code becomes problematic and brittle. A package bundles together code, data, documentation, and tests, and is easy to share with others.
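To make the components listed above concrete, here is a minimal Python sketch that writes a skeleton package to disk. It includes code, documentation, requirements metadata, and a test. The package name mypkg, the add function, and the src/ layout with a setuptools-based pyproject.toml are illustrative assumptions. They are one common convention among several, not necessarily the structure this course uses. R packages follow a different layout, for example a DESCRIPTION file and an R/ directory.

```python
from pathlib import Path


def scaffold_package(root, name="mypkg"):
    """Write a minimal, illustrative Python package skeleton under `root`.

    Assumed layout (one common convention, not the only one):
      pyproject.toml          -> build metadata and requirements
      README.md               -> documentation
      src/<name>/__init__.py  -> reusable code
      tests/test_<name>.py    -> tests
    Returns the sorted list of relative file paths it created.
    """
    root = Path(root)
    files = {
        # setuptools >= 61 reads the [project] table (PEP 621 metadata)
        "pyproject.toml": (
            "[build-system]\n"
            'requires = ["setuptools>=61"]\n'
            'build-backend = "setuptools.build_meta"\n\n'
            "[project]\n"
            f'name = "{name}"\n'
            'version = "0.1.0"\n'
            "dependencies = []\n"
        ),
        "README.md": f"# {name}\n\nHow to install and use {name}.\n",
        # Hypothetical example function standing in for real package code
        f"src/{name}/__init__.py": (
            "def add(x, y):\n"
            '    """Return the sum of x and y."""\n'
            "    return x + y\n"
        ),
        f"tests/test_{name}.py": (
            f"from {name} import add\n\n"
            "def test_add():\n"
            "    assert add(2, 3) == 5\n"
        ),
    }
    for rel, text in files.items():
        path = root / rel
        # Create intermediate directories such as src/<name>/ and tests/
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(text)
    return sorted(files)
```

For example, scaffold_package("demo_project", name="demo") creates the four files. Running pip install ./demo_project should then make from demo import add available in a Python session.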
People often use the terms “package” and “library” interchangeably. Although the terminology differs somewhat between R and Python, here is how you can think of the two terms:
package: generally refers to source code that is bundled so that a package repository can host it. PyPI and CRAN are the primary public package repositories for Python and R, respectively. When you run pip install pkg (Python) or install.packages("pkg") (R), you are installing the pkg package from one of these repositories onto your computer.
library: generally refers to a centralized location on an operating system where installed package source code resides and can be imported into a current session (e.g., /usr/lib/R/library). When you run pip list (Python) or installed.packages() (R), you will see a list of installed packages, which we refer to as libraries. When you inspect sys.path (Python) or run .libPaths() (R), you will get the paths to the libraries where your installed packages are stored.
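The package/library distinction above can also be inspected from within Python. The sketch below assumes Python 3.8 or later, where importlib.metadata is available. It lists the library directories on sys.path and roughly reproduces the name/version listing that pip list prints. The helper names library_paths and installed_packages are hypothetical.

```python
import sys
from importlib import metadata


def library_paths():
    """Hypothetical helper: directories Python searches when importing packages."""
    # An empty-string entry stands for the current working directory; skip it.
    return [p for p in sys.path if p]


def installed_packages():
    """Hypothetical helper: sorted (name, version) pairs for installed
    distributions, roughly what `pip list` reports."""
    pkgs = {}
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if name:
            # The same name found on several paths collapses to one entry.
            pkgs[name] = dist.version
    return sorted(pkgs.items(), key=lambda kv: kv[0].lower())


if __name__ == "__main__":
    for path in library_paths():
        print("library path:", path)
    for name, version in installed_packages():
        print(f"{name}=={version}")
```

The R counterparts mentioned above, .libPaths() and installed.packages(), return this information directly, so no helper is needed there.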
Why write a package?
To be clear, you do not need to package your code to address all the bullets in the previous section, but packaging will definitely make them easier to achieve. However, there are times when writing a package is not necessary. For example:
Although the above type of work may not justify a package, it often still justifies good software engineering practices such as modularity and unit tests!
It is important to understand that there are several approaches and strategies for writing packages. Even within a single language there are different ways to structure a package. This course demonstrates the best practices we have found for developing, maintaining, and distributing packages. Our goal is to give you a short runway to writing packages while following commonly used best practices.
This is a lot to learn, but don’t feel overwhelmed. Start with a minimal subset of useful features and build up over time.
As you write more packages, you will begin to learn the lower-level details and alternative options. Do not shy away from these details; we highly recommend that you learn them. Below are some additional resources that will take you to the official R and Python packaging documentation. The level of detail in these documents can make them challenging to read. That is OK and to be expected. Our suggestion is to get the basics down, which we cover in this course, and slowly expand your knowledge by writing more packages.
We will refer to both R and Python throughout. When discussing one or the other, we will typically label it with the appropriate icon:
In addition to the general text used throughout, you will notice the following callout boxes:
Signifies a tip or suggestion
Signifies a general note
Signifies a warning or caution