Responsible AI

AI has created many opportunities to improve people’s lives, but it has also raised questions about how AI systems can best serve people.

Fairness
- Model output is fair across diverse users and use cases
- Solution is available to all users
- Suggest multiple answers to the same question and evaluate the model on its top-k answers
- Use multiple metrics rather than a single one
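
One way to act on the top-k and multiple-metrics points is to report top-k accuracy separately for each user group and at more than one value of k. The sketch below is a minimal illustration only; the group names, the toy examples, and the helper functions are assumptions made for this note, not part of any particular toolkit.

```python
from collections import defaultdict

def top_k_hit(ranked_answers, truth, k):
    """Return True if the correct answer is among the first k ranked answers."""
    return truth in ranked_answers[:k]

def top_k_accuracy_by_group(examples, k):
    """Compute top-k accuracy separately for each group.

    `examples` is a list of (group, ranked_answers, truth) tuples.
    """
    hits = defaultdict(int)
    totals = defaultdict(int)
    for group, ranked, truth in examples:
        totals[group] += 1
        hits[group] += top_k_hit(ranked, truth, k)
    return {group: hits[group] / totals[group] for group in totals}

# Hypothetical evaluation data: two user groups, two questions each.
examples = [
    ("group_a", ["cat", "dog", "fox"], "cat"),
    ("group_a", ["dog", "cat", "fox"], "cat"),
    ("group_b", ["fox", "dog", "cat"], "cat"),
    ("group_b", ["cat", "fox", "dog"], "cat"),
]

top1 = top_k_accuracy_by_group(examples, k=1)
top2 = top_k_accuracy_by_group(examples, k=2)
print("top-1:", top1)
print("top-2:", top2)
```

In this toy data both groups look identical at k=1, while a gap between them only shows up at k=2, which is the kind of difference a single metric can hide.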
Explainability
- Understand how and why the AI system makes a decision
- Help with fairness
- Google Explainability White Paper
Privacy
- A model trained on sensitive information or PII (personally identifiable information) should safeguard that information
- Aggregate and anonymize the sensitive data
- Design the features with disclosure built-in
- Legal compliance
  - GDPR (General Data Protection Regulation)
    - EU
    - Gives individuals control over their data
    - Companies should protect the data of employees and customers
    - The data subject has the right to revoke their consent at any time
  - CCPA (California Consumer Privacy Act)
    - Similar to GDPR
    - Consumers have the right to know what personal data is collected about them, whether it is sold or disclosed, and to whom
    - Users can access the data a company holds about them, block the sale of that data, and request that a business delete it
- Anonymization – remove PII
  - Irreversible
  - Impossible to identify the person
  - Even the party responsible for the anonymization cannot derive insight or discrete information about an individual
- Pseudonymization
  - Reversible
  - Possible to re-identify the person when the right additional information (such as a key or lookup table) is available
  - Data masking, encryption, tokenization
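
The reversible versus irreversible distinction above can be sketched with a toy tokenization vault. This is a minimal illustration under stated assumptions: `TokenVault`, `pseudonymize`, and `anonymize` are hypothetical names invented for this note, the vault lives in memory only, and dropping fields is a simplification, since real anonymization also has to deal with indirect identifiers that could be combined to single out a person.

```python
import secrets

class TokenVault:
    """Hypothetical in-memory vault mapping random tokens to original values."""

    def __init__(self):
        self._token_by_value = {}
        self._value_by_token = {}

    def tokenize(self, value):
        # Reuse the same token for a repeated value so joins still work.
        if value not in self._token_by_value:
            token = "tok_" + secrets.token_hex(8)
            self._token_by_value[value] = token
            self._value_by_token[token] = value
        return self._token_by_value[value]

    def detokenize(self, token):
        # Only a party holding the vault can reverse the mapping.
        return self._value_by_token[token]

def pseudonymize(record, pii_fields, vault):
    """Replace PII fields with tokens; reversible via the vault."""
    return {k: (vault.tokenize(v) if k in pii_fields else v)
            for k, v in record.items()}

def anonymize(record, pii_fields):
    """Drop PII fields entirely; nothing is kept that could restore them."""
    return {k: v for k, v in record.items() if k not in pii_fields}

vault = TokenVault()
record = {"email": "a@example.com", "age_band": "30-39"}

pseudo = pseudonymize(record, {"email"}, vault)
anon = anonymize(record, {"email"})

print(pseudo)                             # email replaced by a random token
print(vault.detokenize(pseudo["email"]))  # original value recovered
print(anon)                               # email removed, no way back
```

The design point is where the re-identification capability lives: with pseudonymization it sits in the vault (or key), so protecting that component matters as much as protecting the raw data.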
Security
- Identify threats to the AI system from malicious intent
- Harms: informational and behavioral
- Defenses
  - Cryptography
    - SMPC – Secure Multi-Party Computation
      - Allows multiple systems to collaborate to train/serve a model
      - Keeps the data secure with shared secrets
    - FHE – Fully Homomorphic Encryption
      - Train on encrypted data without decrypting it first
      - Send encrypted request and receive encrypted result
      - Very computationally expensive currently
  - Differential Privacy
    - Provide provable guarantee of privacy
    - Methods
      - DP-SGD (Differentially Private Stochastic Gradient Descent)
        
        Limit how much private information can be extracted from the weights of the model
        
        Clip each per-example gradient and add noise to the aggregated mini-batch gradient
      - PATE (Private Aggregation of Teacher Ensembles)
        
        Divide sensitive data into K partitions without overlap
        
        Train K models as teacher models
        
        Aggregate the predictions of the K teacher models into one aggregate teacher and add noise to the result
        
        Create a student model by training on the noisy predictions of the aggregate teacher
        
        Only the student model is accessible to users (including attackers)
      - CaPC (Confidential and Private Collaborative learning)
        
        Uses multiple cryptographic building blocks so that multiple parties can train/serve together without directly sharing raw data
        
        Uses HE (Homomorphic Encryption)
        
        Uses PATE
        
        Ex: hospitals that want to collaborate to improve model predictions without sharing sensitive data
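
The PATE aggregation step described above (collect the K teachers' votes, add noise, release only the winning label to the student) can be sketched as follows. This is a minimal sketch, assuming Laplace noise on the vote counts; the teacher votes, the noise scale, and the function names are illustrative assumptions, and the noise scale is not calibrated to any specific privacy budget.

```python
import random
from collections import Counter

def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def noisy_aggregate(teacher_votes, num_classes, noise_scale, rng):
    """Return the label with the highest noisy vote count.

    `teacher_votes` holds one predicted label per teacher model.
    """
    counts = Counter(teacher_votes)
    noisy_counts = [counts[c] + laplace_noise(noise_scale, rng)
                    for c in range(num_classes)]
    return max(range(num_classes), key=noisy_counts.__getitem__)

rng = random.Random(0)

# Hypothetical labels predicted by K = 10 teachers for one query.
votes = [1, 1, 1, 0, 1, 2, 1, 1, 0, 1]

# The student is trained on labels like this one, never on the teachers' data.
student_label = noisy_aggregate(votes, num_classes=3, noise_scale=1.0, rng=rng)
print(student_label)
```

When the teachers largely agree, the noise rarely changes the winning label; when the vote is close, the noise can flip it, which is what hides the influence of any single partition of the sensitive data.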
