Machine Learning and Artificial Intelligence Course Tackles Big Tech Privacy Concerns
- As smart devices — including refrigerators, TVs, wearables and more — become increasingly used by the public, issues of data privacy are reaching an inflection point.
- MAT 280: “Fairness, Privacy and Trustworthiness in Machine Learning” aims to elevate tenets of social responsibility when it comes to data privacy.
- The goal of the course is to give the technologists of tomorrow the mathematical tools to design fair machine learning and artificial intelligence-based systems.
For tech companies, human beings are data goldmines. Sure, the internet makes it easy to find information, but when you query a search engine, more often than not, that search engine also searches you, collecting not just your queries but a slew of other information, too. The rise of smart devices has only hastened this intrusion on privacy.
“If you have an Android phone, you can’t turn off the location tracking,” said Thomas Strohmer, a professor of mathematics at UC Davis. “Even if you take out the SIM card, it still tracks you. And with Facebook, almost every app on your phone will download your data and send it to Facebook, even if you don’t have a Facebook account.”
Technological infringements on our privacy seem to be part and parcel of the modern age, but Strohmer’s new class at UC Davis aims to change that by training the next generation of technologists.
MAT 280: “Fairness, Privacy and Trustworthiness in Machine Learning” aims to elevate tenets of social responsibility when it comes to developing machine learning and artificial intelligence-based systems. Geared toward mathematics, statistics and computer science graduate students, the special topics class focuses on the mathematical concepts underlying machine learning and how these concepts can be used for the better.
“To influence this for the positive, it’s really important to educate our students early on, in particular those students who will work for tech companies,” Strohmer said.
The rampant nature of data sharing
According to Strohmer, it’s not just big tech companies monitoring users. Rental car companies, like Hertz, have outfitted some cars in their fleets with cameras and microphones. On the health care side, a ProPublica report from 2018 revealed how breathing machines used by sleep apnea patients send user data to health insurers, a practice that can influence insurance payments. Even gamified experiences like Pokémon Go are used by businesses to direct players toward commercial opportunities based on location data. The examples go on and on.
As smart devices — including refrigerators, TVs, wearables and more — become increasingly used by the public, issues of data privacy are reaching an inflection point. Of dire concern is the effect these practices have on marginalized and disadvantaged populations, who already face many systemic challenges.
“As machine learning models/tools enter our life more and more, us engineers and researchers have a moral obligation to understand how these models fail in societal terms,” said Claudio Spiess, a student in the Graduate Group in Computer Science who enrolled in Strohmer’s winter class. “How are they biased? As most models are trained on large datasets that can reveal private information, how does privacy factor in? And most importantly, what can be done about it?”
“Gathering and sharing personal data can be highly profitable,” added Noah Perry, a graduate student in the Department of Statistics also enrolled in the class. “The business models of many large tech companies (Google, Apple, Amazon, etc.) revolve around collecting data about us through our smartphones and online activity and monetizing it through advertising.”
Harnessing data for social good
Despite privacy concerns, data-sharing holds enormous promise for solving world problems. High-quality, large-scale datasets can help address issues concerning food security, climate change and the spread of pandemics, among other applications.
“Data recorded about patients in a hospital is very sensitive and legally protected, but it would be beneficial to society if researchers studying cancer treatments could gain access to it for their work,” Perry said. “Also, the U.S. Census Bureau is required to release census data in a privacy-preserving way, and this data is used for important funding and policy decisions.”
According to Strohmer, the goal is not to eliminate data sharing; it’s to adjust regulations so the practice doesn’t favor companies over individual rights, and so that algorithms built on large-scale datasets don’t favor or negatively target certain groups.
“Many decisions now are done by automated systems and the question is, how fair is this?” Strohmer said. “We have a certain responsibility when we make decisions to justify it. What about these machines? It goes from simple things like recommending movies to you to tough decisions like a drone dropping a bomb somewhere.”
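One way courses like MAT 280 make fairness concrete is through measurable criteria. A common example (an illustrative choice here, not necessarily one covered in Strohmer’s syllabus) is demographic parity: an automated decision system should grant positive outcomes at similar rates across demographic groups. A minimal sketch of how that gap might be measured:

```python
def demographic_parity_gap(predictions, groups):
    """Largest difference in positive-outcome rates between any two groups.

    predictions: list of 0/1 decisions made by a model.
    groups: list of group labels, aligned with predictions.
    Returns a value in [0, 1]; 0 means perfectly equal rates.
    """
    rates = {}
    for g in set(groups):
        # Positive-decision rate within group g
        decisions = [p for p, grp in zip(predictions, groups) if grp == g]
        rates[g] = sum(decisions) / len(decisions)
    ordered = sorted(rates.values())
    return ordered[-1] - ordered[0]


# Hypothetical loan decisions for two groups, "a" and "b"
gap = demographic_parity_gap(
    predictions=[1, 0, 1, 1, 0, 0],
    groups=["a", "a", "a", "b", "b", "b"],
)
# Group "a" is approved 2/3 of the time, group "b" only 1/3
```

Demographic parity is only one of several competing fairness definitions, and part of the mathematical content of such a course is understanding when these definitions can and cannot be satisfied simultaneously.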
Perry, whose master’s degree research assesses statistical disclosure control (SDC) methods to protect individual privacy, said Strohmer’s class successfully balanced rigorous math with intuition and real-world applications. The SDC method Perry is evaluating in his research obscures datasets in a way that preserves the integrity of the information, like the fundamental characteristics and relationships in the data, while making it difficult to learn anything about specific individuals.
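A widely used family of disclosure-control techniques, and the basis of the U.S. Census Bureau’s recent releases, is differential privacy, which adds calibrated random noise to aggregate statistics so that no single individual’s record meaningfully changes the output. The sketch below (an illustrative technique, not necessarily the specific SDC method Perry evaluates) releases a noisy count using Laplace noise:

```python
import math
import random

def laplace_noise(scale):
    # Inverse-CDF sampling from a Laplace(0, scale) distribution
    u = random.random() - 0.5
    sign = 1.0 if u >= 0 else -1.0
    return -scale * sign * math.log(1.0 - 2.0 * abs(u))

def private_count(records, predicate, epsilon=1.0):
    """Release a count satisfying epsilon-differential privacy.

    A counting query changes by at most 1 when one record is added or
    removed (sensitivity 1), so Laplace noise with scale 1/epsilon
    suffices. Smaller epsilon means stronger privacy and more noise.
    """
    true_count = sum(1 for r in records if predicate(r))
    return true_count + laplace_noise(1.0 / epsilon)


# Hypothetical patient ages; the true count of patients 40 or older is 3
ages = [23, 45, 67, 34, 29, 51]
noisy = private_count(ages, lambda age: age >= 40, epsilon=1.0)
```

The released value is close to the true count on average, but its randomness prevents an observer from inferring whether any particular patient is in the dataset.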
“Dr. Strohmer’s MAT 280 class was a great experience for me because his class synthesized a lot of the topics I had learned bits and pieces about, filled in some gaps in my understanding, and broadened my exposure to this research area,” he said.
And that was Strohmer’s goal in launching the course: giving the technologists of tomorrow the mathematical tools to design fair machine learning and artificial intelligence-based systems.
“This will be a more and more important skill in the future for machine learning practitioners like myself,” Spiess said. “I think companies will benefit from employees who understand the fundamentals of data privacy when working with sensitive data and producing machine learning models based on sensitive data or for sensitive tasks such as healthcare.”
Based on the positive feedback he's received from students, Strohmer plans to offer the class again. In future iterations, he hopes to open enrollment to upper-level undergraduate students.
“I want to offer it again two years from now,” he said. “We have a new data science major, which we started this year, and at that time, two years from now, those students will be in their third year, which would be a perfect time to be exposed to this.”