ERC-7517 - Content Consent for AI/ML Data Mining

Created 2023-09-12
Status Draft
Category ERC
Type Standards Track
Authors
Requires

Abstract

This EIP proposes a standardized approach to declaring mining preferences for digital media content on EVM-compatible blockchains. It extends digital media metadata standards such as ERC-7053 and NFT metadata standards such as ERC-721 and ERC-1155, allowing asset creators to specify how their assets may be used in data mining, AI training, and machine learning workflows.

Motivation

As digital assets are increasingly used in AI and machine learning workflows, it is critical that the rights and preferences of asset creators and license owners are respected, and that AI/ML developers can check those preferences and collect data easily and safely. Much as robots.txt does for websites, content owners and creators are looking for more direct control over how their creative works are used.

This proposal standardizes a method of declaring these preferences. Adding dataMiningPreference to the content metadata lets creators state whether the asset may be used in a data mining or AI/ML training workflow. This helps ensure that the original intent of the content is maintained.

For AI-focused applications, this information serves as a guideline, facilitating the ethical and efficient use of content while respecting the creator's rights and building a sustainable data mining and AI/ML environment.

The introduction of the dataMiningPreference property in digital asset metadata covers the considerations including:

Specification

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119 and RFC 8174.

This EIP introduces a new property, dataMiningPreference, to these metadata standards. The property signifies the choices made by asset creators or license owners regarding whether their asset may be included in data mining or AI/ML training workflows. dataMiningPreference is an object that can include one or more of the categories defined below.

Each category is a property whose value is one of three permission values: allowed, notAllowed, or constrained.

For instance, the aiInference property indicates whether the asset can be used as input for an AI/ML model to derive results. If set to allowed, the asset can be utilized without restrictions. If notAllowed, the asset is prohibited from AI inference.

If it is marked as constrained, certain conditions, detailed in the license document, must be met. When constrained is selected, parties intending to use the media files should respect the rules specified in the license. To avoid discrepancies with the content license, the specifics of these constraints are not detailed within the schema; instead, a license reference should be included in the content metadata.
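The value semantics above fit in a few lines of code. The following Python sketch is illustrative only: the Permission enum and the may_use function are hypothetical names, and the license_terms_met flag stands in for whatever license check a consumer performs. It treats constrained as notAllowed unless that check succeeds, following the default described in the Rationale.

```python
from enum import Enum


class Permission(str, Enum):
    """The three values a dataMiningPreference category may take."""
    ALLOWED = "allowed"
    NOT_ALLOWED = "notAllowed"
    CONSTRAINED = "constrained"


def may_use(value: str, license_terms_met: bool = False) -> bool:
    """Return True if a use covered by this category may proceed.

    license_terms_met is a hypothetical flag the consumer sets after checking
    the referenced license; it only matters for "constrained".
    """
    permission = Permission(value)  # raises ValueError on unknown values
    if permission is Permission.ALLOWED:
        return True
    if permission is Permission.NOT_ALLOWED:
        return False
    # "constrained": proceed only if the license conditions are satisfied;
    # without that information, behave as notAllowed.
    return license_terms_met
```

For example, may_use("constrained") returns False, while may_use("constrained", license_terms_met=True) returns True.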

Schema

The JSON schema of dataMiningPreference is defined as follows:

{
  "type": "object",
  "properties": {
    "dataMining": {
      "type": "string",
      "enum": ["allowed", "notAllowed", "constrained"]
    },
    "aiInference": {
      "type": "string",
      "enum": ["allowed", "notAllowed", "constrained"]
    },
    "aiTraining": {
      "type": "string",
      "enum": ["allowed", "notAllowed", "constrained"]
    },
    "aiGenerativeTraining": {
      "type": "string",
      "enum": ["allowed", "notAllowed", "constrained"]
    },
    "aiTrainingWithAuthorship": {
      "type": "string",
      "enum": ["allowed", "notAllowed", "constrained"]
    },
    "aiGenerativeTrainingWithAuthorship": {
      "type": "string",
      "enum": ["allowed", "notAllowed", "constrained"]
    }
  },
  "additionalProperties": true
}
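A consumer may want to check a dataMiningPreference object against this schema before acting on it. The sketch below hand-codes the same rules in plain Python (no required properties, a string enum per category, additional properties accepted) so it runs without dependencies. CATEGORIES and validate_preference are hypothetical names; a general-purpose JSON Schema validator could be used instead.

```python
from typing import Any, List

# The six categories defined by the schema above.
CATEGORIES = (
    "dataMining",
    "aiInference",
    "aiTraining",
    "aiGenerativeTraining",
    "aiTrainingWithAuthorship",
    "aiGenerativeTrainingWithAuthorship",
)
VALUES = {"allowed", "notAllowed", "constrained"}


def validate_preference(pref: Any) -> List[str]:
    """Return a list of schema violations (empty if the object is valid)."""
    if not isinstance(pref, dict):
        return ["dataMiningPreference must be a JSON object"]
    errors = []
    for key in CATEGORIES:
        if key not in pref:
            continue  # the schema marks no property as required
        value = pref[key]
        if not isinstance(value, str):
            errors.append(f"{key}: expected a string, got {type(value).__name__}")
        elif value not in VALUES:
            errors.append(f"{key}: {value!r} is not one of {sorted(VALUES)}")
    # additionalProperties is true, so unknown keys are accepted as-is.
    return errors
```

Note that a misspelled value such as "constraint" is reported as an error, since only "constrained" appears in the enum.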

Examples

Example of a mining preference that disallows generative AI training:

{
  "dataMiningPreference": {
    "dataMining": "allowed",
    "aiInference": "allowed",
    "aiTrainingWithAuthorship": "allowed",
    "aiGenerativeTraining": "notAllowed"
  }
}

Example of a mining preference that allows only AI inference:

{
  "dataMiningPreference": {
    "aiInference": "allowed",
    "aiTraining": "notAllowed",
    "aiGenerativeTraining": "notAllowed"
  }
}

Example of a mining preference that allows generative AI training only with authorship attribution and in compliance with the license:

{
  "dataMiningPreference": {
    "dataMining": "allowed",
    "aiInference": "allowed",
    "aiTrainingWithAuthorship": "allowed",
    "aiGenerativeTrainingWithAuthorship": "constrained"
  }
}
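None of the schema's properties is required, so an object may leave some categories unset, as the inference-only example does with dataMining. This proposal does not define a default for omitted categories, so the hypothetical summarize function below reports them as unspecified instead of guessing. It assumes the object has already been validated against the schema.

```python
from typing import Dict, List

CATEGORIES = (
    "dataMining",
    "aiInference",
    "aiTraining",
    "aiGenerativeTraining",
    "aiTrainingWithAuthorship",
    "aiGenerativeTrainingWithAuthorship",
)


def summarize(pref: Dict[str, str]) -> Dict[str, List[str]]:
    """Group the six categories by their declared value (assumes a valid object)."""
    groups: Dict[str, List[str]] = {
        "allowed": [],
        "notAllowed": [],
        "constrained": [],
        "unspecified": [],  # omitted categories; no default is defined here
    }
    for key in CATEGORIES:
        groups[pref.get(key, "unspecified")].append(key)
    return groups


# The inference-only example from above.
inference_only = {
    "aiInference": "allowed",
    "aiTraining": "notAllowed",
    "aiGenerativeTraining": "notAllowed",
}
print(summarize(inference_only)["unspecified"])
# ['dataMining', 'aiTrainingWithAuthorship', 'aiGenerativeTrainingWithAuthorship']
```

For that example, the sketch reports aiTrainingWithAuthorship as unspecified even though aiTraining is notAllowed, which is the kind of gap a consumer would need its own policy for.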

Example Usage with ERC-721

The following is an example of using the dataMiningPreference property in ERC-721 NFTs.

The dataMiningPreference field can be placed in the NFT metadata as shown below. The license field only illustrates how a constrained condition can point to its license; it is not defined in this proposal. An NFT may describe its license in its own way.

{
  "name": "The Starry Night, revision",
  "description": "Recreation of the oil-on-canvas painting by the Dutch Post-Impressionist painter Vincent van Gogh.",
  "image": "ipfs://bafyaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa",
  "dataMiningPreference": {
    "dataMining": "allowed",
    "aiInference": "allowed",
    "aiTrainingWithAuthorship": "allowed",
    "aiGenerativeTrainingWithAuthorship": "constrained"
  },
  "license": {
    "name": "CC-BY-4.0",
    "document": "https://creativecommons.org/licenses/by/4.0/legalcode"
  }
}
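A crawler that reads ERC-721 token metadata could look up a single category as sketched below. The decide function and its return strings are hypothetical; the license object follows the illustrative shape above, which this proposal does not define, and a constrained value without a license document falls back to notAllowed.

```python
import json
from typing import Optional


def decide(metadata_json: str, category: str) -> str:
    """Return the effective preference for one category of an NFT's metadata."""
    metadata = json.loads(metadata_json)
    pref = metadata.get("dataMiningPreference") or {}
    value: Optional[str] = pref.get(category)
    if value is None:
        return "unspecified"  # this proposal defines no default
    if value != "constrained":
        return value  # "allowed" or "notAllowed"
    # Illustrative license shape from the example; not defined by this proposal.
    license_info = metadata.get("license")
    document = license_info.get("document") if isinstance(license_info, dict) else None
    # Constrained without a license reference: fall back to notAllowed.
    return f"constrained: follow {document}" if document else "notAllowed"


# Usage with the relevant fields of the metadata above.
NFT_METADATA = json.dumps({
    "name": "The Starry Night, revision",
    "dataMiningPreference": {
        "dataMining": "allowed",
        "aiInference": "allowed",
        "aiTrainingWithAuthorship": "allowed",
        "aiGenerativeTrainingWithAuthorship": "constrained",
    },
    "license": {
        "name": "CC-BY-4.0",
        "document": "https://creativecommons.org/licenses/by/4.0/legalcode",
    },
})
print(decide(NFT_METADATA, "aiGenerativeTrainingWithAuthorship"))
# constrained: follow https://creativecommons.org/licenses/by/4.0/legalcode
```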

Example Usage with ERC-7053

The following example uses the dataMiningPreference property in the onchain media provenance registration defined in ERC-7053.

Assume the Decentralized Content Identifier (CID) is bafyaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa. The dataMiningPreference field can be placed directly in the Commit data. Resolving the CID yields the following Commit data:

{
  "dataMiningPreference": {
    "dataMining": "allowed",
    "aiInference": "allowed",
    "aiTrainingWithAuthorship": "allowed",
    "aiGenerativeTrainingWithAuthorship": "constrained"
  },
  "license": {
    "name": "CC-BY-4.0",
    "document": "https://creativecommons.org/licenses/by/4.0/legalcode"
  }
}

The dataMiningPreference field can also be placed in any custom metadata whose CID is linked from the Commit data. Here the assetTreeCid field illustrates how custom metadata can be linked. Resolving the CID yields the following Commit data:

{
  "assetTreeCid": "bafybbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb"
}

Resolving assetTreeCid yields the custom metadata describing the registered asset's properties:

{
  "dataMiningPreference": {
    "dataMining": "allowed",
    "aiInference": "allowed",
    "aiTrainingWithAuthorship": "allowed",
    "aiGenerativeTrainingWithAuthorship": "constrained"
  },
  "license": {
    "name": "CC-BY-4.0",
    "document": "https://creativecommons.org/licenses/by/4.0/legalcode"
  }
}
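Both ERC-7053 layouts above can be handled by checking the Commit data first and then following assetTreeCid. In this sketch a plain dictionary stands in for IPFS retrieval, and the keys commit-direct, commit-linked, and asset-tree are hypothetical placeholder CIDs; find_preference is likewise an illustrative name, not part of ERC-7053.

```python
from typing import Any, Dict, Optional

Document = Dict[str, Any]

PREFERENCE = {
    "dataMining": "allowed",
    "aiInference": "allowed",
    "aiTrainingWithAuthorship": "allowed",
    "aiGenerativeTrainingWithAuthorship": "constrained",
}

# Hypothetical stand-in for IPFS: placeholder CID -> resolved JSON document.
STORE: Dict[str, Document] = {
    "commit-direct": {"dataMiningPreference": PREFERENCE},  # preference in the Commit
    "commit-linked": {"assetTreeCid": "asset-tree"},  # preference in linked metadata
    "asset-tree": {"dataMiningPreference": PREFERENCE},
}


def find_preference(commit_cid: str, store: Dict[str, Document]) -> Optional[Dict[str, str]]:
    """Return the dataMiningPreference for a Commit, or None if none is declared."""
    commit = store[commit_cid]
    if "dataMiningPreference" in commit:
        return commit["dataMiningPreference"]
    tree_cid = commit.get("assetTreeCid")
    if tree_cid is None or tree_cid not in store:
        return None
    return store[tree_cid].get("dataMiningPreference")
```

Checking the Commit data first means a preference declared directly in the Commit is used even if linked custom metadata also carries one; that precedence is a choice of this sketch, not something the proposal specifies.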

Rationale

The technical decisions behind this EIP address specific challenges and requirements in the digital asset landscape. The rationale for each is as follows:

  1. Adoption of JSON schema: The use of JSON facilitates ease of integration and interaction, both manually and programmatically, with the metadata.
  2. Detailed control with training types: Separate categories such as aiGenerativeTraining, aiTraining, and aiInference give creators fine-grained control, accounting for both ethical concerns and computational resource requirements.
  3. Authorship options included: Options such as aiGenerativeTrainingWithAuthorship and aiTrainingWithAuthorship help ensure that creators receive credit, addressing ethical and legal concerns.
  4. Introduction of the constrained value: The constrained value serves as an intermediate option between allowed and notAllowed. It signals that additional permissions or clarifications may be required, and it defaults to notAllowed in the absence of such information.
  5. C2PA alignment for interoperability: The standard aligns with C2PA guidelines, ensuring seamless mapping between onchain metadata and existing offchain standards.

Security Considerations

When adopting this EIP, it’s essential to address several security aspects to ensure the safety and integrity of adoption:

Copyright

Copyright and related rights waived via CC0.