Roadmap


This roadmap details OpenDP’s core initiatives. It is not exhaustive: some lines of work don’t fit neatly into a single initiative and may not appear here. The initiatives fall into three categories: In Progress, Planning / Design, and Future. For a more granular view of what the OpenDP Project Team and Community are working on this quarter, see the Project Milestones on GitHub.

In Progress

What the development team and community are working on right now. 

Algorithmic Primitives

Driven by the needs of our use-case partners and by the goal of improving the usability and utility of the OpenDP Library, we are broadening the suite of supported mechanisms and combinators. Near-term primitives include:

  • report-noisy-max with exponential noise
  • Tulap noise perturbation
  • variations of AboveThreshold and the sparse vector technique
  • variations of thresholded-noise and truncated-noise mechanisms
  • variations of private selection from private candidates
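Here is a minimal Python sketch of the first item, report-noisy-max with exponential noise. It is not the OpenDP Library’s implementation or API, and the function name and parameters are hypothetical. The sketch assumes each candidate’s score has sensitivity at most `sensitivity`. It adds one-sided Exponential noise with scale 2·sensitivity/ε, the usual calibration for ε-DP when scores are not known to be monotonic. It uses Python’s `random` module for readability. That module is neither cryptographically secure nor hardened against floating-point attacks, so the sketch should not be used on real data.

```python
import random
from typing import Optional, Sequence


def report_noisy_max_exponential(
    scores: Sequence[float],
    sensitivity: float,
    epsilon: float,
    rng: Optional[random.Random] = None,
) -> int:
    """Hypothetical sketch: return the index of the largest noisy score.

    Each score receives independent Exponential noise with scale
    2 * sensitivity / epsilon. Only the winning index is returned;
    the noisy scores themselves are never released.
    """
    if sensitivity <= 0 or epsilon <= 0:
        raise ValueError("sensitivity and epsilon must be positive")
    if not scores:
        raise ValueError("scores must be non-empty")
    # Illustration only: a real implementation needs a secure, exact sampler.
    rng = rng or random.Random()
    # random.expovariate takes the rate (1 / scale), not the scale.
    rate = epsilon / (2.0 * sensitivity)
    noisy = [s + rng.expovariate(rate) for s in scores]
    # Index of the maximum noisy score.
    return max(range(len(noisy)), key=noisy.__getitem__)
```

For example, `report_noisy_max_exponential([10.0, 42.0, 17.0], sensitivity=1.0, epsilon=1.0)` almost always returns index 1, but the noise means it is not guaranteed to.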

More Dataframe Functionality

The OpenDP Library integrates with the Polars dataframe library. As of OpenDP 0.11, the Polars integration supports counts, sums, means, medians, and quantiles, as well as grouping with protected group keys. It supports both add/remove and change-one neighboring relations, and applies the Laplace mechanism, the Gaussian mechanism, or variations of the Exponential mechanism as appropriate. We plan to extend this functionality to:

  • Support more statistics (such as means and variances under bounded DP) and more dataset preprocessing transformations.
  • Support user identifiers in datasets where each user may contribute an unbounded number of rows.
  • Automatically rewrite queries to satisfy differential privacy. This would improve library usability, allow us to privatize standard SQL queries, and ultimately enable tighter integration with SmartNoise SQL.
  • Support joins.

Privacy Odometers

Building on our work on interactive measurements, we are finalizing the implementation of Privacy Odometers and Privacy Filters. Like interactive measurements, odometers allow a sequence of queries to be made interactively. Unlike them, odometers don’t require the total privacy loss to be stated up front; instead, they accumulate the loss as queries are answered. Privacy filters complement odometers by refusing further queries once the accumulated loss would exceed a preset budget. Privacy odometers will serve as a general building block for other mechanisms in the OpenDP Library.
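The following deliberately simplified Python sketch illustrates the difference between an odometer and a filter. The class names are hypothetical and do not reflect the OpenDP Library’s API. The sketch assumes pure ε-DP and measures loss as the running sum of per-query epsilons (basic composition). It ignores the subtleties that arise when per-query privacy parameters are chosen adaptively, which a real odometer or filter implementation must handle.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class PureDPOdometer:
    """Hypothetical sketch: records each query's epsilon and reports the running total."""

    spent: List[float] = field(default_factory=list)

    def record(self, epsilon: float) -> None:
        if epsilon < 0:
            raise ValueError("epsilon must be non-negative")
        self.spent.append(epsilon)

    @property
    def total(self) -> float:
        # Basic-composition total: the sum of per-query epsilons.
        return sum(self.spent)


@dataclass
class PureDPFilter:
    """Hypothetical sketch: an odometer plus a fixed budget that rejects overruns."""

    budget: float
    odometer: PureDPOdometer = field(default_factory=PureDPOdometer)

    def try_spend(self, epsilon: float) -> bool:
        # Refuse any query whose loss would push the running total past the budget.
        if self.odometer.total + epsilon > self.budget:
            return False
        self.odometer.record(epsilon)
        return True
```

The odometer only reports what has been spent. The filter uses that running total to decide, before each query is answered, whether the query may proceed.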

More R Package Functionality

The low-level Framework API is available in both Python and R, but the higher-level interfaces that improve usability (the Context API, the Polars integration, and plugins) are not yet available in R. We are working to close this gap.

DPCreator as a Simple Python Application

Based on user feedback, we plan to create a simplified version of DPCreator, likely built with Shiny for Python. This would make DPCreator easy to pip-install and run locally, rather than requiring a Kubernetes deployment.

Epsilon Registry – Initial Survey Tool

  • Create a survey tool to collect data on real-world DP use cases, including domain-specific problems and legal requirements, the selection of privacy-loss parameters, and the use of differentially private releases.
  • This work builds on Cynthia Dwork, Nitin Kohli, and Deirdre Mulligan’s publication “Differential Privacy in Practice: Expose your Epsilons!”

Planning / Design

Strategic items that we’ve prioritized and are designing and testing. 

Alternate Dataset Types

Most of the algorithms in the OpenDP library operate on row-oriented multisets (i.e., tabular data). However, the library was designed to accommodate any type of dataset. We plan to extend the library to support other types of datasets, like graphs and streams.

External Compute

OpenDP functions are currently limited to running on a single CPU. To support larger datasets and more computationally intensive operations, we need to extend the library to run across multiple machines and on external compute resources. We plan to provide combinators that execute Polars logical plans representing private computations on other compute backends. This approach lets OpenDP reuse the privacy calculus it already implements over the Polars logical plan DSL.

Future

What we’d like to work on but haven’t yet prioritized.

  • Federated Machine Learning
  • Synthetic Data Generation
  • Uncertainty Estimates and Utility Framework
  • Additional Models of Privacy