AI has created many opportunities to improve people’s lives, but it has also raised questions about how AI systems can best serve people.

 

  • Fairness
    • Model output should be fair across diverse users and use cases
    • Solution is available to all users
    • Suggest multiple answers to the same question and measure the model on its top-k answers
    • Use multiple metrics rather than a single one
  • Explainability
    • Understand how and why the AI system makes a decision
    • Help with fairness
    • Google Explainability White Paper
  • Privacy
    • Models trained with sensitive information or PII (personally identifiable information) should safeguard this information
    • Aggregate and anonymize the sensitive data
    • Design the features with disclosure built-in
    • Legal compliance
      • GDPR (General Data Protection Regulation)
        • EU
        • Gives individuals control over their data
        • Companies should protect the data of employees and customers
        • The data subject has the right to revoke their consent at any time
      • CCPA (California Consumer Privacy Act)
        • Similar to GDPR
        • Consumers have the right to know what personal data is collected about them, whether it is sold or disclosed, and to whom
        • Users can access the data a company holds about them, block the sale of their data, and request that a business delete it
    • Anonymization – remove PII
      • Irreversible
      • Impossible to identify the person
      • Impossible to derive insight or discrete information, even for the party responsible for the anonymization
    • Pseudonymization
      • Reversible
      • Possible to identify the person if the right information is included
      • Data masking, encryption, tokenization
  • Security
    • Identify threats to the AI system from malicious actors
    • Harms: informational and behavioral
    • Defenses
      • Cryptography
        • SMPC – Secure Multi-Party Computation
          • Allows multiple systems to collaborate to train/serve a model
          • Keeps the data secure using shared secrets
        • FHE – Fully Homomorphic Encryption
          • Train on encrypted data without decrypting it first
          • Send encrypted request and receive encrypted result
          • Very computationally expensive currently
      • Differential Privacy
        • Provides a provable guarantee of privacy
        • Methods
          • DP-SGD (Differentially Private Stochastic Gradient Descent)
            • Limits how much private information can be extracted from the weights of the model
            • Clips per-example gradients and adds noise to each mini-batch update
          • PATE (Private Aggregation of Teacher Ensembles)
            • Divide sensitive data into K partitions without overlap
            • Train K models as teacher models
            • Aggregate the predictions of the K teacher models and add noise to the result
            • Create a student model by training on the aggregated teacher predictions
            • Only the student model is accessible to users (including attackers)
          • CaPC (Confidential and Private Collaborative learning)
            • Uses multiple cryptographic building blocks so that multiple parties can train/serve together without directly sharing raw data
            • Uses HE (Homomorphic Encryption)
            • Uses PATE
            • Ex: hospitals want to collaborate to improve the model prediction without sharing sensitive data
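
The PATE aggregation step listed above (combine the teachers' outputs and add noise) can be sketched roughly as follows. This is only an illustration under assumptions: each teacher returns a single class label, the noise is Laplace noise added to the per-class vote counts, and the names `laplace_noise`, `noisy_aggregate`, and `noise_scale` are hypothetical rather than taken from any library.

```python
import random
from collections import Counter


def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponential samples with rate 1/scale
    # follows a Laplace distribution with location 0 and the given scale.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_aggregate(teacher_labels, num_classes, noise_scale, rng):
    """Return the class with the highest noisy vote count for one query.

    teacher_labels: one predicted class index per teacher model.
    noise_scale: hypothetical knob; larger values hide individual votes
                 better but make the returned label less reliable.
    """
    votes = Counter(teacher_labels)
    noisy_votes = [
        votes.get(c, 0) + laplace_noise(noise_scale, rng)
        for c in range(num_classes)
    ]
    return max(range(num_classes), key=lambda c: noisy_votes[c])


# K = 10 teachers, each trained on its own disjoint partition (not shown here).
teacher_labels = [2, 2, 2, 1, 2, 0, 2, 2, 1, 2]
rng = random.Random(0)
student_label = noisy_aggregate(teacher_labels, num_classes=3, noise_scale=1.0, rng=rng)
```

The student model would then be trained on labels such as `student_label`, so it never sees the sensitive partitions or the individual teachers directly.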

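The DP-SGD item above can be illustrated with a small sketch of one noisy gradient computation. It assumes the commonly described recipe (clip each example's gradient to a maximum L2 norm, sum, add Gaussian noise, then average); the function names and the `max_norm` and `noise_multiplier` parameters are hypothetical, and the gradients are plain Python lists rather than tensors from a real training framework.

```python
import math
import random


def clip(grad, max_norm):
    # Scale the gradient down so its L2 norm is at most max_norm.
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [g * scale for g in grad]


def dp_sgd_gradient(per_example_grads, max_norm, noise_multiplier, rng):
    """Return a clipped, noised, averaged gradient for one mini-batch."""
    clipped = [clip(g, max_norm) for g in per_example_grads]
    dim = len(per_example_grads[0])
    summed = [sum(g[i] for g in clipped) for i in range(dim)]
    # Gaussian noise whose standard deviation is proportional to the clipping norm.
    noisy = [s + rng.gauss(0.0, noise_multiplier * max_norm) for s in summed]
    return [n / len(per_example_grads) for n in noisy]


rng = random.Random(0)
batch_grads = [[3.0, 4.0], [0.0, 0.5]]  # made-up per-example gradients
update = dp_sgd_gradient(batch_grads, max_norm=1.0, noise_multiplier=1.1, rng=rng)
```

Clipping bounds how much any single training example can move the weights, which is what lets the added noise translate into a privacy guarantee; computing the actual privacy budget requires a separate accounting step that is not shown here.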