This would enable various uses of data:
- A data-set extract of records that meet X criteria, with the resulting data-set de-identified to a specified risk level while preserving specified data characteristics.
- A data-summary report that is the result of R analysis, where the summary report is automatically assessed for data leakage.
- Many other use-cases in between.
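As a rough sketch of the first use-case, here is one way an extract could be gated on a risk level. I am assuming k-anonymity (the size of the smallest group of records sharing the same quasi-identifiers) as the stand-in for "risk level"; that choice, the records, the field names, and the function names are all hypothetical and only for illustration.

```python
from collections import Counter

# Hypothetical records; the field names are illustrative, not from any standard.
RECORDS = [
    {"age_band": "30-39", "zip3": "537", "diagnosis": "asthma"},
    {"age_band": "30-39", "zip3": "537", "diagnosis": "asthma"},
    {"age_band": "30-39", "zip3": "537", "diagnosis": "asthma"},
    {"age_band": "40-49", "zip3": "538", "diagnosis": "diabetes"},
]


def extract(records, criteria):
    """Return the records for which the criteria predicate is true."""
    return [r for r in records if criteria(r)]


def smallest_group(records, quasi_identifiers):
    """Size of the smallest group sharing the same quasi-identifier values.

    This is the k in k-anonymity; an empty data-set yields 0.
    """
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return min(groups.values()) if groups else 0


def release(records, criteria, quasi_identifiers, k):
    """Extract matching records, refusing if the extract falls below k."""
    subset = extract(records, criteria)
    if smallest_group(subset, quasi_identifiers) < k:
        raise PermissionError("extract does not meet the requested risk level")
    return subset
```

A real system would generalize or suppress values to reach the requested level rather than simply refuse; the point here is only that the risk check sits between the query and the requester.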
The solution is otherwise simply a data-lake: a method of storing raw data within a system that facilitates many uses of the data. I want to use the concept in a broad way; I don't know whether some uses of the term are narrower. Most data-lakes hold less sensitive data, so they focus on raw data storage and raw data access methods. This is a very good foundation, and I don't want to re-invent it. However, the system does need to be re-designed in a Privacy-By-Design way.
I do want to restrict the data-lake slightly. Because my data-lake is full of PII, I want to impose that it holds structured data, but not necessarily a structured database. I am fine with the distinction between a data-lake and a database or data mart. By structured I mean that the data is understood, such as by using the FHIR resource model; but the data might also follow the DICOM model, or CDA structure, or OpenEHR, or HL7 v2 messages, or other... The important part is that each object is structured well enough to know what the data are. That is, it is not just a set of free text (such as the internet that Google indexes).
I want all data to be clearly understood as having specific Provenance and specific Policy attached. I am not going to design here how this is attached; it might be like FHIR Provenance and FHIR resource meta tags. This layer of Provenance and Policy is essential, and is potentially not considered part of the accessible data-lake functionality. That is, this metadata might be maintained somewhere else, somewhere close, or within the data. After all, some problems do need to be 'engineered'.
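To make the idea concrete, here is a minimal sketch of one of those options: the metadata kept close to, but apart from, the stored object. Every name here (`Provenance`, `Policy`, `Lake`, and their fields) is hypothetical; this is not the FHIR model, just an illustration that nothing enters the lake without both pieces of metadata.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Provenance:
    source: str      # who or what produced the data
    recorded: str    # when it was recorded, as an ISO 8601 string


@dataclass(frozen=True)
class Policy:
    permitted_purposes: frozenset  # purposes the data may be used for


class Lake:
    """Toy data-lake that keeps metadata in a separate index from payloads."""

    def __init__(self):
        self._objects = {}    # object id -> (format, payload)
        self._metadata = {}   # object id -> (Provenance, Policy)

    def put(self, object_id, fmt, payload, provenance, policy):
        # Refuse any object that arrives without both kinds of metadata.
        if not isinstance(provenance, Provenance) or not isinstance(policy, Policy):
            raise ValueError("every object needs Provenance and Policy")
        self._objects[object_id] = (fmt, payload)
        self._metadata[object_id] = (provenance, policy)

    def metadata(self, object_id):
        """Return the (Provenance, Policy) pair recorded for an object."""
        return self._metadata[object_id]
```

The `fmt` argument is where "FHIR", "DICOM", "CDA", and so on would be recorded, so the lake can hold mixed formats while still knowing what each object is.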
The most important part of this Privacy By Design is that all accesses are mediated not just by a Security infrastructure that is responsible for the security of the data, but also by a Privacy infrastructure that is responsible for the Privacy Principles of the data and of the subject that the data are about.
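A minimal sketch of that double mediation might look like the following. I am assuming, purely for illustration, that the Security side is a role check and that the Privacy side is a purpose-of-use check against what is permitted for each data subject; the names `AccessRequest`, `security_allows`, `privacy_allows`, and `mediate` are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AccessRequest:
    roles: frozenset   # roles held by the requester
    purpose: str       # declared purpose of use
    subject_id: str    # the person the data are about


def security_allows(request, allowed_roles):
    """Security decision: does the requester hold at least one allowed role?"""
    return bool(request.roles & allowed_roles)


def privacy_allows(request, permitted_purposes):
    """Privacy decision: is this purpose permitted for this data subject?

    permitted_purposes maps subject id -> set of permitted purposes;
    a subject with no entry permits nothing.
    """
    return request.purpose in permitted_purposes.get(request.subject_id, frozenset())


def mediate(request, allowed_roles, permitted_purposes):
    """Grant access only when both the security and privacy decisions agree."""
    return security_allows(request, allowed_roles) and privacy_allows(
        request, permitted_purposes
    )
```

The design point is that the two decisions are separate functions with separate inputs, so a fully authorized user can still be refused when the purpose is not one the subject's policy allows.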