Packages are the fundamental units of reproducible R and Python code. They include reusable components, the documentation that describes how to use them, requirements to ensure the user can apply them and tests to ensure consistent and reliable functionality. In this course you’ll learn how to turn your code into packages so that you, your teammates, and others can easily download and use. Writing a package can seem overwhelming at first. So start with the basics and improve it over time. It doesn’t matter if your first version isn’t perfect as long as the next version is better.

Often, small snippets of code grow in usefulness and importance, which creates a need to share and distribute their contents. and libraries require packaging, otherwise distributing code becomes problematic and brittle. A package bundles together code, data, documentation, and tests, and is easy to share with others.

People often use the terms “package” and “library” synonymously. Although there are some semantical differences between R and Python, here is how you can think of the two terms:

  • package: generally refers to source code that is bundled up in a way that a package manager can host. PyPI and CRAN are the two primary public package managers for Python and R. When you pip install pkg (Python) or install.packages(“pkg”) (R) you are installing the pkg package from a package manager onto a computer.

  • library: generally refers to a centralized location on an operating system where installed package source code resides and can be imported into a current session (i.e. /usr/lib/R/library). When you use pip list (Python) or installed.packages() (R) you will see a list of installed packages, which we refer to as libraries. When you run sys.path (Python) or .libPaths() (R) you will get the path to the library where your installed packages are stored.

Why is packaging important

Why write a package?

  • To share with others: Bundling your code into a package makes it easy for other people to use it, because like you, they already know how to use packages. If your code is in a package, any or user can easily download it, install it and learn how to use it.
  • To make processes reproducible: When certain code processes are repeated, rather than copy-n-pasting (error prone!) packaging helps to ensure the same procedures are executed in the same way. This leads to standardized tools for standardized conventions and can save you and others time.
  • To track changes: Packaging helps to formulize version changes of a project. Versioning allows developers a way to communicate the type of changes made to a project (i.e. bug fixes versus feature enhancement). It also allows users to “pin” a specific version that works for their project. For example, although a current package may have a version 3.2.1, your project depends on specific functionality provided in version 2.9.7. Pinning allows you to request and use that specific version.
  • To communicate with users: Beyond versioning, packaging helps with communication between developers and end-users. Detailed documentation helps the end-users implement the code, a change log can communcate the progression of a package over its versions, users can communicate to the developers regarding bugs they’ve found or suggested feature enhancements.
  • To provide reliability: Packaging helps to modularize code in a way that allows for, and simplifies, testing procedures. This means as your source code grows, or as you get feedback from end-users, so should your test suite grow to help improve reliability of the code base. This results in code that is reliable and trustworthy.
  • To deploy production work: You may have code that is never reused by others; however, the code is put into production to formalize the processesing and output of the code. Packaging your project is a great way to organize, maintain, and distribute your work into production.

When packaging might not be needed

To be clear, it is not necessary to package your code to address all the bullets in the previous section; but it will definitely make them easier to achieve. However, there are times when writing a package is not necessary. For example:

  • Exploratory analysis: The exploratory phase of a project is very interactive and contains many unknowns. You may find yourself repeating some code, which may be a good time to write and reuse small functions. However, this is not the phase to start writing a package. It’s best to wait and see what results from your exploratory analysis. If there are steps in the process (i.e. data prep, model execution) that you later determine are common steps that will often be repeated in the future then a package may result from post-exploration project review.
  • One-off projects: Many times we perform one-off ad-hoc projects. Sometimes we may be able to abstract some steps in these projects that may be repeated but often these projects contain very unique steps that are not often repeated. You most likely will use packages in your analysis but writing a package based on a one-off project is rarely a good use of time.
  • Small scripts: Sometimes a simple , , or Bash script can go a long way and serve your primary purpose. Small individual scripts can be easy to build, execute and orchestrate; however, if you notice this small script starting to grow in size and use then it’s time to start thinking of converting it to a package.

Although the above type of work may not justify a package, it often still justifies good software engineering practices such as modularity and unit tests!

An opinionated approach

It is important to understand that there are several approaches and strategies to writing packages. Even within a single language there are different ways to structure a package. Consequently, understand that this course demonstrates the best practices we have found for developing, maintaining, and distributing packages. Our goal is to provide you with a short runway to writing packages as quickly as possible while following commonly used best practices.

This is a lot to learn, but don’t feel overwhelmed. Start with a minimal subset of useful features and build up over time.

As you write more packages you will begin to learn the lower-level details and alternative options. Do not shy away from these details as we highly recommend that you learn them. In fact, below are some additional resources that will take you to the official R and Python packaging documentation. Realize that the details in these documents can make them challenging to read. That is ok and to be expected. Our suggestion is to get the basics down, which we cover in this course, and slowly expand your knowledge base by writing more packages.

Conventions used throughout

We will also refer to both R and Python throughout. When discussing one or the other we will typically label with the appropriate icons:

  • Python:
  • R:

In addition to the general text used throughout, you will notice the following code chunks:

Signifies a tip or suggestion

Signifies a general note

Signifies a warning or caution

Exercises

  1. Skim the table of contents for the following “official guides” to and packaging. Do not worry about reading the details but realize that these are good sources to start looking at when you have questions about certain aspects of packaging.
  2. Identify two popular packages and explain why packaging these capabilities is beneficial.
  3. Identify two popular packages and explain why packaging these capabilities is beneficial.


🏠