Privacy-Preserving Machine Learning by J. Morris Chang, Di Zhuang, and G. Dumindu Samaraweera.
Morris holds a BSEE from Tatung University, Taiwan, and an MS and PhD in Computer Engineering from North Carolina State University. He teaches at the University of South Florida. Di holds a PhD in Computer Engineering, having studied at Iowa State University and the University of South Florida, and is a Security/Privacy Engineer at Snap Inc. Dumindu holds an MSc in Enterprise Application Development from Sheffield Hallam University and a PhD in Electrical Engineering from the University of South Florida. He is now an Assistant Professor of Data Science at Embry-Riddle Aeronautical University.
This book puts into perspective the need to ensure privacy in our fast-paced AI marketplace. The authors argue that we need not only to understand privacy within machine learning systems, but also to understand methodologies that preserve users' private data while maintaining model performance.
They address how personal data embedded across various sectors increases the risk of data breaches. Just consider how your smartphone is tracked by marketing companies. They also review the Facebook-Cambridge Analytica scandal and call for robust privacy measures in data-driven applications.
Can we have a privacy-first AI ecosystem?
Data anonymization removes or modifies personally identifiable information (PII) such as names, addresses, phone numbers, and Social Security numbers. Yet many companies can re-identify users by combining multiple publicly available datasets, including breached data. Such techniques can reverse the anonymization process and compromise users' privacy, which remains a significant concern. The authors address both Secure Multi-party Computation (SMPC) and homomorphic encryption, which allow computation over encrypted or hidden data without revealing the users' underlying data.
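As a toy illustration of the idea behind SMPC (my own sketch, not an example from the book), here is a minimal additive secret-sharing scheme in Python: each private input is split into random shares, and the share holders compute a sum without any of them ever seeing a raw value. The salary figures and helper names are hypothetical.

```python
import random

PRIME = 2**61 - 1  # large prime modulus for share arithmetic

def share(secret, n_parties=3):
    """Split a secret into n additive shares; any subset short of all n reveals nothing."""
    shares = [random.randrange(PRIME) for _ in range(n_parties - 1)]
    shares.append((secret - sum(shares)) % PRIME)
    return shares

def reconstruct(shares):
    """Recombine all shares to recover the secret (mod PRIME)."""
    return sum(shares) % PRIME

# Two users' private salaries are shared among three servers.
alice_shares = share(52_000)
bob_shares = share(61_000)

# Each server adds only the shares it holds -- no server ever sees a raw salary.
sum_shares = [(a + b) % PRIME for a, b in zip(alice_shares, bob_shares)]

print(reconstruct(sum_shares))  # 113000: the total, computed without exposing either input
```

Homomorphic encryption achieves a similar end with a single untrusted party, by letting arithmetic be performed directly on ciphertexts.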
This topic is also addressed in the following books: Grokking Machine Learning; Not with a Bug, But with a Sticker; and AI and Machine Learning for On-Device Development.
Privacy-preserving synthetic data generation is also emerging as a solution for keeping user data private. In the third and final section of the book, the authors cover data management operations and techniques for deploying privacy-assured machine learning applications. The book also stresses the importance of aligning machine learning practices with key privacy regulations, including HIPAA and Europe's GDPR.
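To give a flavor of what privacy-preserving synthetic data generation can look like (a minimal sketch of my own, not the authors' method), the snippet below publishes a Laplace-noised histogram, a standard differential-privacy building block, and then samples synthetic records from it. The function name, epsilon value, and sample data are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

def dp_histogram_synthesizer(values, epsilon, bins, n_synthetic):
    """Toy synthesizer: release a Laplace-noised histogram, then sample from it."""
    counts, edges = np.histogram(values, bins=bins)
    # Adding or removing one record changes each bin count by at most 1 (sensitivity = 1).
    noisy = counts + rng.laplace(scale=1.0 / epsilon, size=counts.shape)
    noisy = np.clip(noisy, 0, None)        # negative counts make no sense
    probs = noisy / noisy.sum()
    # Pick a bin according to the noisy distribution, then draw uniformly within it.
    chosen = rng.choice(len(probs), size=n_synthetic, p=probs)
    return rng.uniform(edges[chosen], edges[chosen + 1])

ages = np.array([23, 35, 41, 29, 52, 47, 38, 61, 26, 33])
print(np.round(dp_histogram_synthesizer(ages, epsilon=1.0, bins=8, n_synthetic=5), 1))
```

The synthetic records are drawn only from the noisy summary, never from the raw data, which is the essential property such techniques aim for.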
In conclusion, Privacy-Preserving Machine Learning is a comprehensive and valuable guide for machine learning engineers and developers. The authors' goal is to enable solutions that ensure user privacy without sacrificing performance or producing unreliable outputs. A deep and well-presented book, at a time when these techniques should be mandatory.