In the information age, companies across all industries increasingly seek to leverage their digital assets to gain and sustain a competitive edge. The technologies enabling such analyses have been available for some time, and their adoption in the life science industry continues to grow. So-called Data Lakes are built in the background of a company's IT and data landscape, residing somewhere between the application layer and the underlying infrastructure, where they act as a hub for data connections. Some of the data processed there is subject to dedicated governance from a GCP, GMP, or GLP perspective. Computer systems validation principles must therefore be applied to ensure the validated state of the Data Lake itself, as well as of the applications delivering data to and receiving data from it.
What sounds reasonable and intuitive is not always easy to accomplish. What are the implications of operating a validated Data Lake, and which obstacles must be addressed to maintain that state, especially when new connections are constantly needed or new applications and use cases are to be incorporated? Do change management and documentation burdens paralyze meaningful use cases of the Data Lake?
This poster presentation introduces basic Data Lake concepts and discusses which approaches may or may not support running such a platform under validated conditions, taking a pragmatic view that acknowledges the need for speed and flexibility of change.