Background
The Statewide Information Management Manual (SIMM) 5305-F, Generative Artificial Intelligence (GenAI) Risk Assessment, was developed to safeguard state-owned information assets, protect privacy, and ensure the well-being of California’s residents.
This manual presents a structured risk assessment methodology tailored to assist state entities in comprehensively evaluating the potential risks associated with the deployment and utilization of GenAI systems.
Because GenAI systems are meant to enhance and optimize existing workflows rather than replace or compromise public services, the manual commits to maintaining public service quality. Drawing on the State of California Report: Benefits and Risks of GenAI, it outlines the range of potential GenAI applications, defining high-level categories and offering illustrative public sector use cases.
Risk Assessment Framework
In the provided manual, a comprehensive risk assessment framework is outlined, empowering state entities to address Quality, Safety, and Security Controls pertaining to GenAI systems.
While the manual offers guidelines, each entity remains responsible for determining its own risk tolerance and implementing risk mitigation strategies consistent with its organizational standards for acceptable risk.
The potential risks associated with GenAI vary depending on the specific instance and use case. This GenAI risk assessment hinges on two primary factors:
- the type of data involved and
- the intended use of that data.
A thorough examination of these factors informs the determination of GenAI risk levels, necessitates the implementation of appropriate security controls, and aids in categorizing the system as Low, Moderate, or High risk. The two factors are defined as follows:
- Information Type: This factor assesses the risk of a data breach resulting from compromised or unauthorized access to information, contingent upon its classification. GenAI data encompasses inputs utilized for training or fine-tuning large language models (LLMs), as well as prompts submitted to LLMs via application interfaces.
- Information Expected Use: The risk arises when leveraging GenAI outputs for decision-making, tasks, or service provision. Depending on its application, such usage can introduce biases, misinformation, and inaccuracies, thereby potentially impacting decisions related to diversity, equity, inclusion, and accessibility (DEIA) across various departments. Factors like race, age, gender identity, or disability may unfairly influence decisions, placing individuals at a disadvantage.
State entities are mandated to assign a risk level to each GenAI use case or system. This assigned risk level serves as a vital tool for understanding the permissibility and associated risks inherent in the GenAI application. The risk levels are categorized as High (Red), Moderate (Yellow), and Low (Green), with the manual providing an illustrative example tabulation for assigning these levels.
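The two-factor assignment can be sketched as a small lookup. The manual's own tabulation is not reproduced here, so the category names (`public`, `sensitive`, `confidential`, `informational`, `decision_support`, `automated_decision`), their ratings, and the rule of taking the higher of the two factor ratings are all illustrative assumptions rather than the SIMM 5305-F table.

```python
from enum import IntEnum


class RiskLevel(IntEnum):
    LOW = 1       # Green
    MODERATE = 2  # Yellow
    HIGH = 3      # Red


# Hypothetical ratings per factor; an entity would substitute its own table.
INFORMATION_TYPE_RATING = {
    "public": RiskLevel.LOW,
    "sensitive": RiskLevel.MODERATE,
    "confidential": RiskLevel.HIGH,
}
EXPECTED_USE_RATING = {
    "informational": RiskLevel.LOW,
    "decision_support": RiskLevel.MODERATE,
    "automated_decision": RiskLevel.HIGH,
}


def assign_risk_level(information_type: str, expected_use: str) -> RiskLevel:
    """Return the higher of the two factor ratings (an assumed rule)."""
    return max(
        INFORMATION_TYPE_RATING[information_type],
        EXPECTED_USE_RATING[expected_use],
    )


def requires_part_2(level: RiskLevel) -> bool:
    """Part 2 of the assessment applies only to Moderate or High risk."""
    return level >= RiskLevel.MODERATE
```

Under these assumed ratings, a system that handles only public data but makes automated decisions would still be rated High, because the expected use dominates.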
Risk Assessment PART 1
SIMM 5305-F, Part 2 should be completed only if the risk level of the GenAI system is deemed Moderate or High.
Toolkit Overview
- Project Use Case and Problem Description: Detail the current process, impact of the desired outcome, and how the GenAI system will be integrated.
- Exploration of Alternatives: Were alternative approaches considered to address the use case or problem? Describe the decision-making process that led to the selection of GenAI.
- Data Sharing Agreement: For shared systems, confirm the existence of a data sharing agreement.
- Privacy Assessments: Verify whether a Privacy Threshold Assessment (PTA) and a Privacy Impact Assessment (PIA) (SIMM 5310-C) have been conducted for the GenAI system. Provide reasons if these assessments have not been completed.
- Financial Considerations: Assess if funds have been appropriately allocated for procurement, development, integration, operation, maintenance, and potential scaling of the GenAI system. Additionally, ascertain if ongoing costs are contingent on future Budget Change Proposals (BCP) or reallocation of resources.
Risk Assessment PART 2
Completion of SIMM 5305-F, Part 2 is required if the risk level of the GenAI system is assessed as Moderate or High.
Toolkit Overview
- Human Verification: The GenAI system will incorporate human verification to ensure the accuracy and reliability of its output.
- Impact on Public Safety: The system will not affect physical equipment posing risks to public health and safety.
- Resource Availability: It will not adversely affect the availability of resources and services provided by the State of California.
- Reporting of Risks: Any risk to California’s security, national economic security, or public health and safety will be reported to the federal government during model training.
- User Account Segregation: State-owned user accounts will be employed to ensure segregation between public and personal records for future audit purposes.
- Business Continuity: Business services will not be dependent on the system; in case of failure or inaccurate results, the state can maintain service levels without disruption.
- Data Loss Prevention: A data loss prevention system will be implemented, analyzing input and training data for the GenAI system.
- Information Security Standards: The state entity will adhere to Federal Information Processing Standards (FIPS) and NIST Special Publication (SP) 800-53 for information security programs.
- Compliance with Security Parameters: Security controls will comply with State-defined Security Parameters for NIST SP 800-53, SIMM 5300-A, and SAM Section 5300.5.
- Cloud Computing Policy Compliance: Cloud-based GenAI systems must comply with the Cloud Computing Policy SAM 4893.1, ensuring all data remains within the United States with no remote access allowed outside the country.
- Multi-Factor Authentication: All remote access will employ Multi-Factor Authentication (MFA) in compliance with the Telework and Remote Access Security Standard (SIMM 5360-A).
- Encryption Standards: Confidential, sensitive, or personal information will be encrypted according to SAM 5350.1 and SIMM 5305-A, based on data classification.
- Zero Trust Architecture: All data and systems, including third-party software, will comply with a zero-trust architecture model.
- Data Privacy: Data will be subject to Civil Code 1798.99.80 – 1798.99.89 and will not be sold or advertised to data brokers.
- Data Handling: Input and prompt data will not be stored by the vendor for future prompt engineering.
- Ownership of Output: All generated output will be owned by the State of California.
- Opt-Out of Data Collection: The GenAI system will opt out of any data collection and model training features for commercial instances.
- Compliance with Copyright Laws: GenAI output will not infringe on copyright or intellectual property laws and will be compliant with open-source licenses as applicable.
- Transparency in Output: Generated output will be cited from credible sources, and any GenAI used in creating images or videos will be acknowledged, even after substantial editing.
- Prevention of Fraudulent Activities: The GenAI system will refrain from engaging in acts of fraud, including deepfake creation, impersonation, phishing, social engineering, or manipulation of other GenAI systems.
- Content Guidelines: The system is designed to avoid generating illicit content that may be controversial or not widely accepted by the public.
- Privacy Protection: The system will not systematically, indiscriminately, or on a large scale monitor, surveil, or track individuals.
- Stakeholder Training: State entities will provide adequate training to stakeholders and customers for the proper use of the GenAI system.
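One way an entity might track the attestations listed above is as a simple checklist that reports which controls have not yet been affirmed. This is a minimal sketch: the `Attestation` record, its fields, and the sample entries are hypothetical and are not part of SIMM 5305-F.

```python
from dataclasses import dataclass


@dataclass
class Attestation:
    control: str    # short name taken from the list above
    affirmed: bool  # True once the entity has attested to the control
    note: str = ""  # optional explanation or pointer to evidence


def unaffirmed_controls(attestations: list[Attestation]) -> list[str]:
    """Return the names of controls that still lack an affirmation."""
    return [a.control for a in attestations if not a.affirmed]


# Sample entries are invented for illustration only.
checklist = [
    Attestation("Human Verification", True),
    Attestation("Data Loss Prevention", False, "DLP tooling not yet procured"),
    Attestation("Multi-Factor Authentication", True),
]
print(unaffirmed_controls(checklist))
```

Keeping a note alongside each unaffirmed control gives reviewers the reason for the gap, not just its existence.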
Vendor Details Assessment
- Vendor Involvement: Will the GenAI system be designed, developed, deployed, or maintained by a vendor or third party?
- Testing Procedures: How will the GenAI solution, hosted on state infrastructure, be tested, including all systems interacting with AI?
- Access Provision: What level of access will the vendor provide to the system owner, if any?
- Model and Network Specifications: What type of model(s) and/or network(s) will be utilized in the GenAI system? Provide details on their applicational use and purpose.
Transparency Details
- User Notification: What mechanism will the GenAI system utilize to notify a user that they are interacting with an AI system rather than a human?
- Audit Mechanisms: How can the system and its data be audited?
- Disclosure of Data Source: How will the system disclose to the customer that the data generated is from GenAI?
- Output Delivery and Correction: How will customers receive output, and what mechanisms are in place for error correction or appeals?
Data Output Standard (Level of Autonomy)
- Human Review: Data output from GenAI systems is analyzed and fact-checked by a human reviewer before usage.
- Ownership: The State of California will retain all rights and intellectual property of data output, and consultants must relinquish ownership of generated data.
- Autonomy Levels: Specify the level of autonomy of the system: fully automatic, automatic with occasional human reviews, or providing recommendations without autonomous action.
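The three autonomy levels and the human-review requirement could be modeled as follows. This is a sketch under assumptions: the type names, the `reviewed_by` field, and the choice to block release with an exception are illustrative, not prescribed by the manual.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class AutonomyLevel(Enum):
    FULLY_AUTOMATIC = "fully automatic"
    AUTOMATIC_WITH_OCCASIONAL_REVIEW = "automatic with occasional human reviews"
    RECOMMENDATION_ONLY = "recommendations without autonomous action"


@dataclass
class GeneratedOutput:
    text: str
    autonomy: AutonomyLevel
    reviewed_by: Optional[str] = None  # hypothetical reviewer name or ID


def release(output: GeneratedOutput) -> str:
    """Return the text only if a human reviewer has signed off on it."""
    if output.reviewed_by is None:
        raise PermissionError("output has not been reviewed by a human")
    return output.text
```

A draft with no reviewer recorded raises `PermissionError`; once `reviewed_by` is set, `release` returns the text unchanged.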
Human Oversight and Monitoring
- Error Mitigation: How will system owners identify and mitigate inaccuracies in data outputs?
- Public Accessibility: Will the system be publicly accessible or limited to a state-managed environment? Specify the intended audience and potential impacts.
- Risk Assessment Updates: How will system owners ensure the GenAI system’s original designated risk level remains unchanged over time?
- Log Accessibility: Will logs be available in a format compatible with Security Information and Event Management (SIEM) tools?
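For the log accessibility question, one common approach is to emit each event as a single JSON object per line, a format many SIEM tools can ingest. The sketch below uses Python's standard `logging` module. The logger name `genai.audit` and the `user_id` and `use_case` fields are assumptions, and the schema a particular SIEM tool expects may differ.

```python
import json
import logging
from datetime import datetime, timezone


class JsonLineFormatter(logging.Formatter):
    """Format each log record as one JSON object per line."""

    def format(self, record: logging.LogRecord) -> str:
        event = {
            "timestamp": datetime.fromtimestamp(
                record.created, tz=timezone.utc
            ).isoformat(),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
            # Hypothetical fields, supplied by callers through `extra=`.
            "user_id": getattr(record, "user_id", None),
            "use_case": getattr(record, "use_case", None),
        }
        return json.dumps(event)


handler = logging.StreamHandler()
handler.setFormatter(JsonLineFormatter())
logger = logging.getLogger("genai.audit")
logger.addHandler(handler)
logger.setLevel(logging.INFO)

logger.info(
    "prompt submitted",
    extra={"user_id": "u-123", "use_case": "summarization"},
)
```

Logging metadata about a prompt rather than the prompt text itself keeps confidential input out of the audit trail.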
Ensuring Equity
- Decision Impact: Does the system output decisions affecting housing, education, employment, credit, healthcare, or criminal justice? If so, elaborate.
- Diversity Consideration: Will the system consider Diversity, Equity, Inclusion, and Accessibility in its decisions?
- Minor and Environmental Impact: Will the system impact minors under 18 or the environment, such as water pollution metrics?
FIPS 199 Categorization Level
Author:
Kosha Doshi, Final Year Student at Symbiosis Law School Pune and Legal Intern Data Privacy and Digital Law at Eu Digital Partners
Kosha is also a co-author of “Facial Recognition at CrossRoads: Policy Perspectives on Disruption and Innovation” at the Closing the Gap 2023 | Emerging and Disruptive Technologies: Regional Perspectives Conference in The Hague, Netherlands.