The OpenDP team is excited to bring you our latest updates to the OpenDP Library and DP Wizard!
The OpenDP Library is a modular collection of algorithms for building privacy-preserving applications. DP Wizard is a web application which guides users in the creation of differentially private statistics.
You’ll find new features in the 0.13 release and an update for DP Wizard in the details below. The OpenDP Library and DP Wizard improve with your feedback and contributions, so please check out these tools and let us know what you think via our Slack channel or by emailing us directly at info@opendp.org!
OpenDP 0.13
OpenDP plugins can now be registered in the Context API, and plugins can read descriptors from domains (Atom, Vector, Option, LazyFrame, Series, Extrinsic). We’ve also added support for serialization and deserialization of transformations, measurements and Context API queries.
Thank you to Damien Aymon for contributing a combinator that converts measurements with an (ε, δ) guarantee into a δ(ε) guarantee!
Polars
This release expands supported Polars expressions to include replace, replace_strict, drop_nan, drop_null, expression filtering, and all binary operators. Margin keys and group_by keys can now also be stable expressions. We’ve also added support for enum data types.
Thank you to Daniel Simmons-Marengo for disclosing a bug in explicit key release that causes shared randomness among imputed records.
Migration
- We now default to the assumption that NaN values exist in float data in all settings. You’ll notice that you now need to specify nan=False in atom domains, even when building the Laplace or Gaussian measurements.
- The recommended format for margins in the Context API is now a list of margins containing group by keys, instead of a dictionary of group by keys and margins. An example of the new syntax can be found here. This allows group by keys to be arbitrary expressions.
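To see why the NaN default in the first migration note matters, here is a minimal plain-Python sketch. It does not use the OpenDP API: the helper names `clamp` and `impute_nan` and the sample data are hypothetical and chosen only for illustration. It shows that NaN passes through a comparison-based clamp untouched and then poisons a bounded sum, which is the kind of problem that has to be ruled out (for example by imputing, or by declaring that the data contains no NaN) before a noise mechanism can offer a meaningful guarantee.

```python
import math


def clamp(x, lower, upper):
    """Clamp x to [lower, upper] using ordinary comparisons.

    Every comparison against NaN is False, so NaN falls through unchanged.
    """
    if x < lower:
        return lower
    if x > upper:
        return upper
    return x


def impute_nan(values, fill):
    """Replace each NaN with a constant (hypothetical helper for this sketch)."""
    return [fill if math.isnan(v) else v for v in values]


# Hypothetical float column containing one NaN.
data = [1.0, 2.5, float("nan"), 4.0]

# Clamping alone does not remove NaN, so the "bounded" sum is NaN.
clamped = [clamp(v, 0.0, 5.0) for v in data]
total = sum(clamped)

# Imputing first gives a well-defined, bounded sum.
clean = [clamp(v, 0.0, 5.0) for v in impute_nan(data, 0.0)]
clean_total = sum(clean)

print(math.isnan(total))  # True
print(clean_total)        # 7.5
```

In this sketch the imputed value of 0.0 is an arbitrary choice; any constant inside the clamping bounds would keep the sum bounded.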
DP Wizard 0.3
We haven’t posted an update here about DP Wizard since the initial release. Since then, we’ve added a lot of functionality while keeping the interface simple.
One comment we received after the first release was that it would be nice to be able to supply a public CSV, in addition to the private CSV. The public CSV might be synthetic, or it could be old data which is no longer regarded as private. In any case, when you provide a public CSV, it is used in preview visualizations, so you can have more confidence in your choice of bounds.
We’ve added DP mean and median to go along with the DP histogram, and we also let the user specify grouping columns. After the analysis is configured, there are more download options, including un-executed notebooks and HTML and PDF exports. We’ve also rearranged the download options for clarity.