Sarcastic Fringehead: Upload Path Pt2 (GSI-1960)#159
TheByronHimes wants to merge 53 commits into `main`.
Conversation
> - GET /uploads/{storage_alias}/inbox
>   - Returns a 200 status code and a list of `FileUploads`
> - GET /uploads/{storage_alias}/archived
>   - Returns a 200 status code and a list of file_ids for deletion
Confused about the deletion part here.
From the route I would expect this to provide a list of files that have been successfully moved to permanent storage.
Bad wording on my part. That's exactly what's in the list. The DHFS deletes those files from its interrogation bucket.
Still find the phrasing a bit weird, even though I get that that's the primary purpose of what's done with the information. At the risk of sounding redundant, this could be the list of successfully processed file_ids, or something in that vein, to be explicit about which file IDs we are talking about.
> #### FileUploadReport
>
> ```python
> file_id: UUID4  # Unique identifier for the file upload
> secret_id: str | None  # The Vault ID of the file secret used for re-encryption
> ```
This will only be populated after passed_inspection is set to True?
The new secret key would be deposited before re-encryption starts, so from a purely technical standpoint the secret_id could be included for both successful and unsuccessful interrogations. The FileUploadReport is only submitted once a conclusion is reached either way.
Thanks, I just wondered when exactly that would be populated.
We don't need to store a key for an unsuccessful interrogation attempt; doing so would waste more than one HTTP call once you account for the resulting Vault interaction.
On the other hand, there's a potential problem if we store the secret after re-encryption. If the file re-encryption is successful but there's a network hiccup or some other error that prevents the secret from being stored, then that re-encrypted data is bricked and re-encryption has to be repeated.
As I step through this, I don't think I accounted for error recovery if something prevents the DHFS from sending the interrogation results to the FIS. Maybe it can write something to disk before it crashes or moves on. I'd like to keep it lightweight and not use a database.
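A minimal sketch of that idea, assuming a JSON-lines spool file on local disk; the file name and function names are hypothetical, not taken from the spec:

```python
import json
from pathlib import Path

# Local spool file for undelivered interrogation results (name is an assumption).
SPOOL = Path("pending_reports.jsonl")


def record_result(report: dict) -> None:
    """Append the interrogation result to disk before attempting delivery."""
    with SPOOL.open("a") as f:
        f.write(json.dumps(report) + "\n")
        f.flush()


def replay_pending(send) -> None:
    """On startup, retry delivery of any results that never reached the FIS."""
    if not SPOOL.exists():
        return
    remaining = []
    for line in SPOOL.read_text().splitlines():
        report = json.loads(line)
        try:
            send(report)
        except Exception:
            # Keep undelivered results for the next attempt.
            remaining.append(line)
    SPOOL.write_text("\n".join(remaining) + ("\n" if remaining else ""))
```

This keeps the DHFS lightweight (no database) while still surviving a crash or network hiccup between re-encryption and report delivery.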
mephenor left a comment:
A much easier to read and well-thought-out version of the spec.
There's one minor and one major issue remaining:
- Box ID is mentioned in the models, but not where/when it's populated. This should be added for consistency with the other information.
- The question of where the actual data finally resides needs some more current input from the other reviewers. I think I'm operating on outdated information here, but in my mental model the data does not have to leave the data hub, and only the secrets are stored at central in every case.
> The `dcsPersistedEvents` collection needs the following changes to the `payload` field:
> - Where `type_` == `drs_object_served`:
>   - Replace the value for `file_id` with the value from `target_object_id`
> - **QUESTION**: *Should we update the actual schema so `s3_endpoint_alias` becomes `storage_alias`?*
While we're migrating other fields, let's make this consistent.
That was my feeling too
Per offline discussion, let's hold off on that change until we get some clarity about the final move.
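The payload change quoted above could be sketched as a pure function over a single persisted event. The exact document layout is an assumption based on the quoted excerpt, and the `s3_endpoint_alias` rename is deliberately left out, since it is on hold:

```python
def migrate_event(event: dict) -> dict:
    """Apply the quoted payload change to one dcsPersistedEvents document.

    Where type_ == "drs_object_served", replace the value of
    payload["file_id"] with the value from payload["target_object_id"].
    All other events pass through unchanged.
    """
    if event.get("type_") != "drs_object_served":
        return event
    payload = dict(event.get("payload", {}))
    payload["file_id"] = payload["target_object_id"]
    return {**event, "payload": payload}
```

In the actual migration this would run over every document in the collection (or be expressed as an aggregation-pipeline update), but the per-document transformation is the part worth pinning down.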
Cito left a comment:
Thanks. Made some comments below, but essentially looks good to me.
AC007 should be adapted afterwards to reflect the current state.
> - **One alternative** is to manually exfiltrate the data to UCS.
> - **A second alternative** is to simply drop the data since the files have already been archived and all relevant information is actually stored by IFRS.
>
> There is one **problem**: IFRS has different object IDs than FIS. If we keep the event data which FIS currently has, we will need to update the information so that the file IDs in FIS are set to the file IDs (object IDs) known to IFRS, using the accession to match. That action would not be reversible, of course.
The failure mode here would be IFRS losing information in its internal DB about files that are already archived.
Theoretically we should preserve the information to keep guarantees about what we are able to restore by replaying events. Personally, though, I'd prefer a clean cutoff: there's a separation between the current and new FIS which, in my mind, makes them two distinct services, and the new one shouldn't be permanently burdened by one-off logic in its code.
Would it make sense to keep this legacy migration separate as a one-off script?
On another note, why does this have to be reversible?
> On another note, why does this have to be reversible?
It does not have to be reversible. I wrote that because, when I initially looked at the data, the potential migration seemed reversible in nature. And if it's possible to write a reverse migration without unreasonable effort, it's good practice to do so.
> Would it make sense to possibly keep this legacy migration separate as a one-off script?
Sure, that is of course a possible avenue.
> which, in my mind, make them two distinct services
I had the same thought a few times. "File Ingest Service" doesn't really apply anymore; it's more of an "Assistant to the DHFS": a DH-Central liaison or outsourced persistence mechanism. We also have the power to delete the FIS entirely and write a new service with a different identity but the same purpose described in the spec.
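The one-off script idea could look roughly like this. The record shapes and the accession-based lookup table are assumptions for illustration; failing loudly on a missing accession matters because, as noted above, the rewrite is not reversible:

```python
def remap_file_ids(
    fis_records: list[dict], ifrs_object_ids_by_accession: dict[str, str]
) -> list[dict]:
    """Rewrite FIS file IDs to the object IDs known to IFRS.

    Matching is done via the accession. Raises if an accession has no
    IFRS counterpart, so a partial (irreversible) rewrite never happens
    silently.
    """
    remapped = []
    for record in fis_records:
        accession = record["accession"]
        try:
            object_id = ifrs_object_ids_by_accession[accession]
        except KeyError:
            raise LookupError(f"No IFRS object ID for accession {accession}")
        remapped.append({**record, "file_id": object_id})
    return remapped
```

Keeping this as a standalone script means the new service's codebase stays free of legacy migration logic.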
Cito left a comment:
Made some suggestions, but I have no major objections.
> ```python
> size: int
> storage_alias: str
> ```
Should have a validator that checks/sets locked = True if archived = True.
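The suggested validator could be sketched in Pydantic v2 as follows. The model here is reduced to the two flags involved; the remaining fields from the spec are omitted for brevity:

```python
from pydantic import BaseModel, model_validator


class FileUpload(BaseModel):
    locked: bool = False
    archived: bool = False

    @model_validator(mode="after")
    def lock_if_archived(self) -> "FileUpload":
        # An archived upload is by definition locked, so enforce the invariant
        # rather than trusting callers to set both flags consistently.
        if self.archived:
            self.locked = True
        return self
```

With this in place, `FileUpload(archived=True)` always comes out with `locked=True`, even if the caller passed `locked=False` explicitly.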
Cito left a comment:
See comment regarding passing file IDs as query parameters.
Co-authored-by: Christoph Zwerschke <c.zwerschke@dkfz-heidelberg.de>
Force-pushed from fd6f1e8 to c2f5ade.
This epic details the backend work for the remaining portion of the file upload path.
The `FileUploadReport` has been augmented from the original concept: instead of being generated by the DHFS and maintained by the FIS, it is generated with partial data by the FIS, and the remaining data is supplied by the DHFS after interrogation.