# Data Provenance and Verification in Streams

When consuming data from any source, especially in a decentralized environment, the most critical question is: **"Can I trust this data?"**

This question is not just about the data's content, but its *origin*. How do you know that data claiming to be from a trusted oracle, a specific device, or another user *actually* came from them and not from an imposter?

This is the challenge of **Data Provenance**.

In Somnia Data Streams, provenance is not an optional feature or a "best practice". It is a fundamental, cryptographic guarantee built into the core smart contract. This article explains how Streams ensures authenticity via publisher signatures and how you can verify data origin.

## The Cryptographic Guarantee: `msg.sender` as Provenance

The trust layer of Somnia Streams is elegantly simple. It does not rely on complex off-chain signature checking or data fields like `senderName`. Instead, it leverages the most basic and secure primitive of the EVM: `msg.sender`.

All data published to Streams is stored in the core `Streams` smart contract. The data storage mapping has a specific structure:

#### **Conceptual Contract Storage**

```solidity
// mapping: schemaId => publisherAddress => dataId => data
mapping(bytes32 => mapping(address => mapping(bytes32 => bytes))) public dsstore;
```

When a publisher calls `sdk.streams.set(...)` or `sdk.streams.setAndEmitEvents(...)`, their wallet signs a transaction. The `Streams` smart contract receives this transaction and identifies the signer's address via the `msg.sender` variable.

The contract then stores the data *at the `msg.sender`'s address* within the schema's mapping.

**This is the cryptographic guarantee.**

It is **impossible** for `0xPublisher_A` to send a transaction that writes data into the slot for `0xPublisher_B`. They cannot fake their `msg.sender`. The data is automatically and immutably tied to the address of the account that paid the gas to publish it.

* An attacker **cannot** write data as if it came from a trusted oracle.
* A user **cannot** send a chat message pretending to be another user.
* Data integrity is linked directly to wallet security.

## Verification Is Implicit in the Read Operation

Because the `publisher` address is a fundamental key in the storage mapping, you don't need to perform complex "verification" steps. **Verification is implicit in the read operation.**

When you use the SDK to read data, you must specify which publisher you are interested in:

* `sdk.streams.getByKey(schemaId, publisher, key)`
* `sdk.streams.getAllPublisherDataForSchema(schemaId, publisher)`

When you call `getAllPublisherDataForSchema(schemaId, '0xTRUSTED_ORACLE_ADDRESS')`, you are not *filtering* data. You are asking the smart contract to retrieve data from the specific storage slot that *only* `0xTRUSTED_ORACLE_ADDRESS` could have written to.

If an imposter (`0xIMPOSTER_ADDRESS`) publishes data using the same `schemaId`, their data is stored in a completely different location (`dsstore[schemaId]['0xIMPOSTER_ADDRESS']`). It will never be returned when you query for the trusted address.

## Deliverable: Building a Verification Script

Let's build a utility to prove this concept.

**Scenario:** We have a shared `oraclePrice` schema. Two different, trusted oracles (`0xOracle_A` and `0xOracle_B`) publish prices to it. We will build a script that verifies the origin of data and proves that an `imposter` cannot pollute their feeds.

### **Project Setup**

We will use the same project setup as the "[Multi-Publisher Aggregator](https://emre-gitbook.gitbook.io/emre-gitbook-docs/data-streams/working-with-multiple-publishers-in-a-shared-stream#tutorial-building-a-multi-publisher-aggregator-app)" tutorial. You will need a `.env` file with at least one private key to act as a publisher, and we will simulate the other addresses.

[**`src/lib/clients.ts`**](https://emre-gitbook.gitbook.io/emre-gitbook-docs/data-streams/working-with-multiple-publishers-in-a-shared-stream#chain-and-client-configuration) (No changes needed from the previous tutorial. We just need `publicClient`.)

**`src/lib/schema.ts`**

```typescript
export const oraclePriceSchema = 'uint256 price, uint64 timestamp'
```

### **The Verification Script**

This script will not publish data. We will assume our two trusted oracles (`PUBLISHER_1_PK` and `PUBLISHER_2_PK` from the previous tutorial) have already published data using the `oraclePriceSchema`.

Our script will:

1. Define a list of `TRUSTED_ORACLES`.
2. Define an `IMPOSTER_ORACLE` (a random address that has *not* published).
3. Create a `verifyPublisher` function that fetches data *only* for a specific publisher address.
4. Run verification for all addresses and show that data is only returned for the correct publishers.

**`src/scripts/verifyOrigin.ts`**

```typescript
import 'dotenv/config'
import { SDK, SchemaDecodedItem } from '@somnia-chain/streams'
import { publicClient } from '../lib/clients' // Assuming you have clients.ts from previous tutorial
import { oraclePriceSchema } from '../lib/schema'
import { Address, createWalletClient, http } from 'viem'
import { privateKeyToAccount } from 'viem/accounts'

// --- Setup: Define our trusted and untrusted addresses ---

function getEnv(key: string): string {
  const value = process.env[key]
  if (!value) throw new Error(`Missing environment variable: ${key}`)
  return value
}

// These are the addresses we trust for this schema.
// We get them from our .env file for this example.
const TRUSTED_ORACLES: Address[] = [
  privateKeyToAccount(getEnv('PUBLISHER_1_PK') as `0x${string}`).address,
  privateKeyToAccount(getEnv('PUBLISHER_2_PK') as `0x${string}`).address,
]

// This is a random, untrusted address.
const IMPOSTER_ORACLE: Address = '0x1234567890123456789012345678901234567890'

// --- Helper Functions ---

// Helper to decode the oracle data
function decodePriceRecord(row: SchemaDecodedItem[]): { price: bigint, timestamp: number } {
  const val = (field: any) => field?.value?.value ?? field?.value ?? ''
  return {
    price: BigInt(val(row[0])),
    timestamp: Number(val(r[1])),
  }
}

/**
 * Verification Utility
 * Fetches data for a *single* publisher to verify its origin.
 */
async function verifyPublisher(sdk: SDK, schemaId: `0x${string}`, publisherAddress: Address) {
  console.log(`\n--- Verifying Publisher: ${publisherAddress} ---`)
  
  try {
    const data = await sdk.streams.getAllPublisherDataForSchema(schemaId, publisherAddress)
    
    if (!data || data.length === 0) {
      console.log('[VERIFIED] No data found for this publisher.')
      return
    }

    const records = (data as SchemaDecodedItem[][]).map(decodePriceRecord)
    console.log(`[VERIFIED] Found ${records.length} record(s) cryptographically signed by this publisher:`)
    
    records.forEach(record => {
      console.log(`  - Price: ${record.price}, Time: ${new Date(record.timestamp).toISOString()}`)
    })

  } catch (error: any) {
    console.error(`Error during verification: ${error.message}`)
  }
}

// --- Main Execution ---

async function main() {
  const sdk = new SDK({ public: publicClient })
  
  const schemaId = await sdk.streams.computeSchemaId(oraclePriceSchema)
  if (!schemaId) throw new Error('Could not compute schemaId')

  console.log('Starting Data Provenance Verification...')
  console.log(`Schema: oraclePriceSchema (${schemaId})`)

  // 1. Verify our trusted oracles
  for (const oracleAddress of TRUSTED_ORACLES) {
    await verifyPublisher(sdk, schemaId, oracleAddress)
  }

  // 2. Verify the imposter
  // This will securely return NO data, even if the imposter
  // published data to the same schemaId under their *own* address.
  await verifyPublisher(sdk, schemaId, IMPOSTER_ORACLE)
}

main().catch((e) => {
  console.error(e)
  process.exit(1)
})
```

### **Expected Output**

To run this, first publish some data (using the script from the previous tutorial, but adapted for `oraclePriceSchema`) from both `PUBLISHER_1_PK` and `PUBLISHER_2_PK`. Then, run the verification script.

```bash
# Add to package.json
"verify": "ts-node src/scripts/verifyOrigin.ts"

# Run it
npm run verify
```

You will see an output similar to this:

```bash
Starting Data Provenance Verification...
Schema: oraclePriceSchema (0x...)

--- Verifying Publisher: 0xPublisher1Address... ---
[VERIFIED] Found 2 record(s) cryptographically signed by this publisher:
  - Price: 3200, Time: 2025-10-31T12:30:00.000Z
  - Price: 3201, Time: 2025-10-31T12:31:00.000Z

--- Verifying Publisher: 0xPublisher2Address... ---
[VERIFIED] Found 1 record(s) cryptographically signed by this publisher:
  - Price: 3199, Time: 2025-10-31T12:30:30.000Z

--- Verifying Publisher: 0x1234567890123456789012345678901234567890 ---
[VERIFIED] No data found for this publisher.
```

## Conclusion: Key Takeaways

* **Provenance is Built-In:** Data provenance in Somnia Streams is not an optional feature; it is a core cryptographic guarantee of the `Streams` smart contract, enforced by `msg.sender`.
* **Verification is Implicit:** You verify data origin every time you perform a read operation with `getAllPublisherDataForSchema` or `getByKey`. The `publisher` address acts as the ultimate verification key.
* **Trust Layer:** This architecture creates a robust trust layer. Your application logic can be certain that any data returned for a specific publisher was, without question, signed and submitted by that publisher's wallet.


---

# Agent Instructions: Querying This Documentation

If you need additional information that is not directly available in this page, you can query the documentation dynamically by asking a question.

Perform an HTTP GET request on the current page URL with the `ask` query parameter:

```
GET https://docs.somnia.network/developer/data-streams/concepts/data-provenance-and-verification-in-streams.md?ask=<question>
```

The question should be specific, self-contained, and written in natural language.
The response will contain a direct answer to the question and relevant excerpts and sources from the documentation.

Use this mechanism when the answer is not explicitly present in the current page, you need clarification or additional context, or you want to retrieve related documentation sections.
