Announcing the OpenDP Library 0.9


The OpenDP team is excited to bring you our latest release, OpenDP Library 0.9!

The OpenDP Library is a modular collection of algorithms for building privacy-preserving applications, with an extensible approach to tracking privacy, and a vetted implementation. It is available as binaries for Python on PyPI, for Rust on crates.io, or in source form on GitHub.

This release features new measurements and transformations, expanded functionality for user-defined primitives, additional proofs, and R language bindings. We are continually working to improve usability and to reach more audiences that would benefit from the library. This release is a major step in that direction, with additional features coming next quarter. Read more about the new features below!

R Language Bindings

R language bindings for the OpenDP Library are now released on R-Universe. Most of the core OpenDP Library framework is available; the exceptions include DP-PCA and defining your own library plugins in R code. API documentation for the R bindings has also been published. For an upcoming release in the second quarter of 2024, we are working on translating our Python examples to R. If you would like to get started now, please use the R API documentation and don’t hesitate to contact us.

New Transformations and Measurements

Several new measurements have been added, including:

Differentially Private PCA (Python-only)

Differentially private PCA in OpenDP provides an API similar to Scikit-Learn. For example, this code snippet shows how to define a DP PCA model. Please see the full example in our docs. It includes an example of the fitted model being introspected, similar to Scikit-Learn’s non-private PCA.

Differentially Private Quantiles

The DP quantile mechanism privately selects, from a set of candidates, the one most similar to a given alpha-quantile, as shown in this example. Stay tuned for a simpler way to release DP quantiles in the next release, through the upcoming API for Polars.
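To make "most similar to a given alpha-quantile" concrete, here is a minimal pure-Python sketch of one way to score candidates. The function name `quantile_scores` and the exact score formula are assumptions made for illustration and are not taken from the OpenDP API. No noise is added, so this shows only the scoring step and is not differentially private on its own.

```python
from bisect import bisect_left, bisect_right


def quantile_scores(data, candidates, alpha):
    """Score each candidate by its distance from the alpha-quantile.

    A score of 0 means the candidate splits the data exactly at alpha;
    larger scores are worse. (Hypothetical helper, not an OpenDP function.)
    """
    ordered = sorted(data)
    n = len(ordered)
    scores = []
    for c in candidates:
        below = bisect_left(ordered, c)       # records strictly less than c
        above = n - bisect_right(ordered, c)  # records strictly greater than c
        # An ideal alpha-quantile has below : above == alpha : (1 - alpha).
        scores.append(abs((1 - alpha) * below - alpha * above))
    return scores


data = [1, 2, 3, 4, 5, 6, 7, 8, 9]
candidates = [0, 2, 5, 8, 10]
scores = quantile_scores(data, candidates, alpha=0.5)
best = candidates[scores.index(min(scores))]  # non-private choice: 5
```

A private mechanism would not take the exact minimum as done here. It would instead perturb the scores and favor candidates with low scores, so that the selected candidate is close to the median with high probability.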

Approximate Laplace Projection (ALP) Mechanism

You can use the Approximate Laplace Projection (ALP) mechanism in the OpenDP Library to release a DP sparse histogram. The release is in the form of a queryable that holds a compressed representation of the counts of all possible keys. Please see this example as a reference.

The mechanism was previously only available in Rust, but is now available in Python and R. The implementation was contributed by Christian Lebeda and is based on the paper by Martin Aumüller, Christian Lebeda, and Rasmus Pagh. The ALP mechanism “is shown to simultaneously have information-theoretically optimal space (up to constant factors), fast access to vector entries, and error of the same magnitude as the Laplace-mechanism applied to dense vectors.”

Concurrent Composition of Interactive Measurements

Compositors spawned by “make_sequential_composition” and “make_basic_composition” now allow concurrent composition. Queries may be interleaved between any interactive mechanisms spawned by the compositor queryable.
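The difference between sequential and concurrent composition can be illustrated with a toy model. The classes below (`ToyCompositor`, `ToyChild`) are hypothetical and written only for this sketch; they do not use or mirror the OpenDP API. Under a strictly sequential rule, spawning a new child freezes earlier children; under the concurrent rule, queries to any child may be interleaved.

```python
class ToyChild:
    """A stand-in for an interactive mechanism spawned by a compositor."""

    def __init__(self, parent, index):
        self.parent = parent
        self.index = index
        self.answers = 0

    def query(self):
        # Ask the parent whether this child is still allowed to answer.
        self.parent.check_child_may_answer(self.index)
        self.answers += 1
        return self.answers


class ToyCompositor:
    """Tracks an epsilon budget and which children may still be queried."""

    def __init__(self, budget, concurrent):
        self.remaining = budget
        self.concurrent = concurrent
        self.children = 0

    def spawn(self, epsilon):
        if epsilon > self.remaining:
            raise ValueError("insufficient privacy budget")
        self.remaining -= epsilon
        self.children += 1
        return ToyChild(self, self.children - 1)

    def check_child_may_answer(self, index):
        # Sequential rule: only the most recently spawned child is live.
        if not self.concurrent and index != self.children - 1:
            raise RuntimeError("child was frozen by a later query")


comp = ToyCompositor(budget=1.0, concurrent=True)
a = comp.spawn(0.5)
b = comp.spawn(0.5)
interleaved = [a.query(), b.query(), a.query()]  # allowed when concurrent
```

With `concurrent=False`, the final `a.query()` in this toy model would raise an error, because spawning `b` freezes `a`.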

Functionality for DP Experts

Improvements to Library Plugins

In addition to being able to define your own transformations and measurements from Python, you can now also define your own domains, metrics, measures and queryables from Python. Custom domains can also carry domain descriptors represented with Python data types. You can mix-and-match plugins with other library functionality implemented in Rust.

The following section contains an example that constructs a transformation with a custom domain.

Exponential Mechanism on a Finite Support

A variation of the exponential mechanism (Report Noisy Max Gumbel) is now available in the OpenDP Library for private selection from a set of candidates. This mechanism selects the candidate, from a predetermined set of candidates, that has approximately the highest utility.

The RNM Gumbel mechanism is immune to floating-point vulnerabilities, and can be used as a building block for other differentially private mechanisms.
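The idea behind Report Noisy Max with Gumbel noise can be sketched in a few lines of plain Python. This is a textbook illustration under stated assumptions, not the library's implementation: the function names are hypothetical, the noise scale `2 * sensitivity / epsilon` is the standard choice for scores that are not assumed to be monotonic, and because it uses ordinary floating-point arithmetic it does not have the floating-point protections described above.

```python
import math
import random


def sample_gumbel(rng, scale):
    """Draw Gumbel(0, scale) noise by inverse transform sampling."""
    u = rng.random()
    while u == 0.0:  # avoid log(0); random() returns values in [0, 1)
        u = rng.random()
    return -scale * math.log(-math.log(u))


def report_noisy_max_gumbel(scores, sensitivity, epsilon, rng):
    """Return the index of the candidate with the largest noisy score.

    Illustrative only: plain floats are NOT safe against floating-point
    attacks, unlike the mechanism shipped in the library.
    """
    # Adding Gumbel noise and taking the argmax samples from the
    # exponential mechanism's distribution over candidates.
    scale = 2.0 * sensitivity / epsilon
    noisy = [s + sample_gumbel(rng, scale) for s in scores]
    return max(range(len(scores)), key=noisy.__getitem__)


rng = random.Random(0)  # seeded only to make the sketch reproducible
scores = [1.0, 5.0, 3.0]
picks = [report_noisy_max_gumbel(scores, 1.0, 1.0, rng) for _ in range(1000)]
```

Across repeated runs the candidate with the highest score (index 1) is chosen most often, but lower-scoring candidates are still selected some of the time. That randomness is what provides the privacy guarantee.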

Example: Privately Selecting Grouping Columns

Suppose you have a tabular dataset in Pandas with four columns: “date”, “merchant_postal_code”, “merch_category” and “transaction_type”. The data is sparse: not all combinations of these categories are present in the data. To create a differentially private release with statistics grouped by these columns, you can only release statistics for combinations of these attributes that many people contribute to.

If you group by too many columns, then the number of individuals contributing to each combination of attributes will be small, resulting in most combinations being censored in the final release. On the other hand, you want granular statistics, so using more grouping columns is appealing.

This example demonstrates how to construct your own mechanism that chooses a set of grouping columns for you. It also makes use of library plugins (via a user-defined transformation and domain) and the Report Noisy Max Gumbel mechanism.
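As a rough illustration of the scoring step such a mechanism needs, the sketch below assigns each candidate set of grouping columns a utility: the number of groups with at least `threshold` rows. This utility, the helper name `grouping_scores`, and the toy rows are assumptions made for this sketch; the linked example may score candidates differently. If each individual contributes one row, adding or removing a row changes one group count per grouping, so each score changes by at most 1.

```python
from collections import Counter
from itertools import combinations


def grouping_scores(rows, columns, threshold):
    """Score every non-empty subset of columns by its number of large groups.

    Hypothetical utility: a group counts as "large" if it holds at least
    `threshold` rows, so it would likely survive censoring in a release.
    """
    scores = {}
    for size in range(1, len(columns) + 1):
        for subset in combinations(columns, size):
            counts = Counter(tuple(row[c] for c in subset) for row in rows)
            scores[subset] = sum(1 for n in counts.values() if n >= threshold)
    return scores


# Toy data using two of the column names from the text.
rows = [
    {"merch_category": "food", "transaction_type": "card"},
    {"merch_category": "food", "transaction_type": "card"},
    {"merch_category": "food", "transaction_type": "cash"},
    {"merch_category": "fuel", "transaction_type": "card"},
    {"merch_category": "fuel", "transaction_type": "card"},
]
scores = grouping_scores(rows, ["merch_category", "transaction_type"], threshold=2)
```

These exact scores are computed directly from the sensitive data, so they should not be released as-is. They are meant to be passed to a private selection step, such as the Report Noisy Max Gumbel mechanism, which returns only the chosen set of columns.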

Additional Proofs

Continuing our efforts to expand the mathematical verification of OpenDP, we’ve added proofs for the following:

Underlying Code Improvements

We have been steadily improving the CI process and codebase. The CI now uses Mypy type-checking, verifies links within docs, has improved code coverage, and enforces Rust formatting. In addition, the OpenDP Rust crate is now thread-safe and the Python package supports PEP 561 type information. We’ve also removed the dependency on the GMP and MPFR C libraries in favor of the lighter dashu library in Rust. This allows the Rust code to be more easily built on Windows or for WebAssembly.

Getting the OpenDP Library

Further details can be found in the repository CHANGELOG. We’re excited to have you try the OpenDP Library! You can find it on PyPI, crates.io, or GitHub.

We welcome your feedback and participation in the OpenDP Project. To learn more, please visit the OpenDP website or join our Slack workspace.