Industry Panelist Q&A

In advance of the 2023 OpenDP Community Meeting and the Industry Panel: DP in Motion on Friday, September 29th, here’s a sneak peek at our esteemed panelists, their backgrounds, and what they are currently working on!

Tammy Greasby

VP, Engineering & Research at Anonym

Please introduce yourself and share a little about your background in the privacy space.

I am on the technology team at Anonym, where we are building systems that allow businesses to share data to create business value while ensuring those systems are secure and user data stays private. When we talk about privacy, we usually refer to three dimensions: data security, user anonymity, and corporate confidentiality. In our world, a private system needs to solve for all three.

I’ve been focused on privacy-related problems for the past year and a half or so at Anonym. Prior to this role, I held a variety of positions within the adtech ecosystem, often dabbling in privacy but never focusing on it.

What are you currently working on at Anonym and where can the OpenDP Community learn more?

Anonym is currently building differentially private versions of common ads algorithms such as last-touch attribution (counts and sums), lift (confidence intervals and p-values), and conversion prediction (machine learning). In addition to these problems, we are starting to explore empirical measures of privacy to help the advertising community build intuition around DP. In the same vein, we are playing with different visualization tools to demonstrate that it’s possible to have privacy and still create business value. There is a lot of fear in ad tech around losing data access. We want to show the industry that, done intelligently, privacy is a net good for everyone.
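
As a rough illustration of what a DP count or sum looks like in practice, here is a minimal Laplace-mechanism sketch in Python. This is a toy example for readers new to DP, not Anonym's implementation; the bounds and epsilon values are made up.

```python
import numpy as np

def dp_count(n_records, epsilon):
    """Noisy count: each user contributes at most one record, so sensitivity is 1."""
    return n_records + np.random.laplace(scale=1.0 / epsilon)

def dp_sum(values, lower, upper, epsilon):
    """Noisy sum: clamping each value to [lower, upper] bounds one user's
    contribution, so the sensitivity is max(|lower|, |upper|)."""
    clamped = np.clip(values, lower, upper)
    sensitivity = max(abs(lower), abs(upper))
    return float(np.sum(clamped)) + np.random.laplace(scale=sensitivity / epsilon)

# e.g. a last-touch attribution report might release a noisy conversion count
# and a noisy revenue sum per campaign.
noisy_conversions = dp_count(1200, epsilon=1.0)
noisy_revenue = dp_sum(np.random.uniform(0, 80, size=1200), lower=0, upper=100, epsilon=1.0)
```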

It’s still early days for our company. If you are interested in learning more, you can use the contact form on our website to shoot us a note. Let us know you’d like to hear about our white papers when they are ready, and I will add you to an email list.

Peter Kairouz

Staff Research Scientist at Google

Please introduce yourself and share a little about your background in the differential privacy space.

I am a researcher at Google working on advancing the state-of-the-art in privacy-preserving technologies, including differential privacy, federated learning and analytics, cryptography, and secure (trusted) environments. My experience with differential privacy goes back to my early PhD days when I embarked on a journey to understand the fundamental tradeoffs between DP and accuracy for a variety of canonical problems in statistics and learning. I have since made several important contributions to the field, co-developed novel algorithms that are used by many today, and deployed multiple DP technologies along with my colleagues at Google. 

What is a DP project you’ve built that you are excited about and where can the OpenDP Community learn more?

Three things: 

  1. I am excited about training large/foundation ML models (e.g. LLMs) with DP. 
  2. I am also interested in distributed and verifiable/attestable DP. 
  3. I am also excited about auditing DP implementations and estimating privacy leakage in settings where it is hard to tightly bound the DP loss.

I was also involved in several recent blog posts on these topics.

Andrew Knox

Product Manager at Decentriq

Please introduce yourself and share a little about your background in the privacy space.

I was first exposed to differential privacy while working as a Research Scientist at Meta (then Facebook) in 2018. I was researching multiparty computation and related cryptographic topics to protect individual privacy when measuring the effectiveness of online advertising. When you do that kind of research, you quickly realize that having a secure environment for calculations only gets you halfway there: a huge amount of information can still be inferred from aggregated or informally anonymized data. I worked on several projects there to combine differential privacy with other security paradigms, especially for advertising measurement and targeting.

What are you currently working on at Decentriq and where can the OpenDP Community learn more?

At Decentriq we focus on data collaboration: situations where more than one organization needs to combine data, and they can’t simply hand the data to each other or to a trusted third party. Our platform is built on top of secure hardware, and those trust guarantees are deeply integrated to provide robust cryptographic evidence that every data permission is enforced at the hardware level. There are two places where differential privacy often enters the conversation in that paradigm.

The first is perhaps surprising: because the hardware protects the data so strictly, you can’t peek at the data to check what went wrong when there is an error; the system is intentionally designed to make it impossible to see anything that wasn’t declared at the outset. This means you need special tooling to aid in troubleshooting, and one tool we use for this purpose is synthetic data. The OpenDP SmartNoise Synthesizers project provided the first models we used to let users create synthetic copies of partially processed data (under certain conditions and restrictions) to debug issues that are hard to troubleshoot when you can’t inspect the data directly.
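
For readers curious what that looks like in code, here is a minimal sketch using the OpenDP SmartNoise synthesizers package. The exact API and synthesizer names may differ across versions, and this illustrative snippet is not Decentriq's production tooling.

```python
import pandas as pd
from snsynth import Synthesizer  # pip install smartnoise-synth

# Stand-in for an intermediate table that the secure environment won't let you inspect.
df = pd.DataFrame({
    "age_bucket": ["18-24", "25-34", "35-44"] * 200,
    "region": ["EU", "US", "APAC"] * 200,
    "converted": [0, 1, 0] * 200,
})

# Fit a DP synthesizer and sample a synthetic copy to debug against.
synth = Synthesizer.create("mwem", epsilon=3.0)
synth.fit(df, preprocessor_eps=1.0)
synthetic_df = synth.sample(500)
print(synthetic_df.head())
```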

The second is a more common story: just like any other data release, there’s a risk of inferring personal information, even if everything in the release is aggregated. This is greatly complicated in our case by the fact that the release is almost always the product of more than one organization’s data. We do incorporate differential privacy into our products and make differential privacy libraries available for custom scripts, but the most challenging part is always negotiating the details of how budgets are set. It can be difficult to establish who sets the budgets, the scopes over which they are valid, and what business goals they are fulfilling. This is the area of differential privacy I am most interested in: how to give practical guidance, relaxations, or alternate definitions that make DP easier to adopt in collaborative settings.

Sara Krehbiel

Associate Professor, Mathematics and Computer Science at Santa Clara University

Please introduce yourself and share a little about your background in the differential privacy space.

I’m a professor at Santa Clara University. I’ve done research in differential privacy, including formally analyzing the privacy of algorithms in non-traditional settings, such as attaching monetary value to privacy and analyzing data before it has all been collected. I’ve also spent time as a privacy research collaborator at Meta.

What types of contributions did you make to Meta’s DP projects?

Much of the challenge of large-scale deployment of DP technologies involves educating engineers on what DP is and how to implement algorithms that achieve it. The output perturbation paradigm is so general-purpose that it often gets conflated with DP itself, so there is lots of room for DP researchers to identify when a better privacy/accuracy tradeoff is possible via a more nuanced analysis of the sensitivity of a batch of tasks. Other important areas of contribution involve clarifying privacy semantics when combining DP with other privacy-oriented protocols, such as limited data retention.
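
A toy example of the kind of nuance described here (an illustration, not a Meta system): when a batch of counting queries partitions the data so that each person is counted at most once, the whole batch has sensitivity 1, and a single noise scale covers all of the queries instead of splitting the budget across them.

```python
import numpy as np

epsilon, k = 1.0, 50
true_counts = np.random.randint(100, 1000, size=k)

# Naive: treat each count as an independent release and split the budget,
# so each query gets epsilon/k and Laplace noise of scale k/epsilon.
naive = true_counts + np.random.laplace(scale=k / epsilon, size=k)

# Nuanced: the counts partition the data (each person falls in exactly one bucket),
# so the batch has L1 sensitivity 1 and a single epsilon covers all k queries.
batched = true_counts + np.random.laplace(scale=1.0 / epsilon, size=k)

print("naive mean abs error:  ", np.mean(np.abs(naive - true_counts)))
print("batched mean abs error:", np.mean(np.abs(batched - true_counts)))
```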

Gerome Miklau

Founder and CEO at Tumult Labs; Professor, Computer Science at UMass Amherst

Please introduce yourself and share a little about your background in the differential privacy space.

I’m one of the founders of Tumult Labs, a startup focused on making sensitive data useful in new ways via differential privacy (DP). I’m on leave from my position at UMass Amherst, where I’m a professor. I’ve done research in differential privacy including systems for DP query answering, DP synthetic data, and DP algorithms for analyzing relational data, graphs, and streaming data. I currently support OpenDP as co-chair of the advisory board.

What are you currently working on at Tumult and where can the OpenDP Community learn more?

With the Tumult team, I spend a lot of time developing ways for non-experts to more easily achieve the benefits of differential privacy and to adapt DP to new application domains. This includes making our platform, Tumult Analytics, flexible and easy to use, as well as designing products that support the full lifecycle of DP data releases. Lately we’ve been articulating the need for DP in data cleanrooms and working on integrating DP effectively into cleanroom workflows.

One of the most exciting outcomes of our work is the release of new and interesting datasets that would not have been available to the public without strong privacy protections. For example, we recently helped the Wikimedia Foundation release detailed statistics about Wikipedia page views. Using Tumult Analytics, they are able to release 250 million statistics each day, 40 times more than they could previously.

Lipika Ramaswamy

Senior Applied Scientist at Gretel AI

Please introduce yourself and share a little about your background in the privacy space.

I am an applied scientist at Gretel working on generative models for tabular data. Part of my time is dedicated to incorporating privacy methods into our models. I was first introduced to differential privacy during graduate school at Harvard, where I took a class on applied DP for data science. Since then, I have been working at startups on software incorporating DP.

What are you currently working on at Gretel AI and where can the OpenDP Community learn more?

Early this year, I built a DP synthetic data generation model for tabular data, which we call TabularDP at Gretel. I drew on research from the NIST synthetic data challenge and open-source implementations to build this DP-marginals-based model. A few exciting things: it’s incredibly fast compared to deep learning models, it works well without much tuning for primarily categorical tabular data, and it provides data of reasonably good quality at reasonably low epsilon values.
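
As a rough sketch of the general marginals-based recipe (a simplification, not Gretel's TabularDP): measure low-order marginals of the table with noise, then sample synthetic records consistent with those noisy marginals. Real systems measure higher-order marginals and fit a graphical model over them; the independent-column version below only preserves one-way statistics.

```python
import numpy as np
import pandas as pd

def noisy_marginal(series, epsilon):
    """One-way histogram with Laplace noise (sensitivity 1 per person),
    clipped at zero and renormalized into a sampling distribution."""
    counts = series.value_counts()
    noisy = counts.to_numpy() + np.random.laplace(scale=1.0 / epsilon, size=len(counts))
    noisy = np.clip(noisy, 0, None)
    return counts.index.to_numpy(), noisy / noisy.sum()

def synthesize(df, epsilon, n_rows):
    """Split the budget across columns, measure each one-way marginal with noise,
    and sample each column independently from its noisy marginal."""
    eps_per_col = epsilon / len(df.columns)
    out = {}
    for col in df.columns:
        values, probs = noisy_marginal(df[col], eps_per_col)
        out[col] = np.random.choice(values, size=n_rows, p=probs)
    return pd.DataFrame(out)

real = pd.DataFrame({"plan": ["free", "pro", "pro", "team"] * 250,
                     "churned": ["no", "no", "yes", "no"] * 250})
synthetic = synthesize(real, epsilon=1.0, n_rows=1000)
print(synthetic["plan"].value_counts(normalize=True))
```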