Hi data-team...
The Data-GOA team is currently working on the Data Lake in Qubole, and we have scheduled everything to run in a separate Qubole account: tvlk-data-qubole-etl. Only the Data-GOA team has visibility into this account. If you're wondering about:
You can query the scheduler logs from the following tables.
These tables are accessible to public users and can answer timeliness and completeness questions about our Data Lake.
Please refer to this dashboard, courtesy of Steph’s team (Flight and Accommodation), for an example of how to provide information on the timeliness and completeness of datasets from our Data Lake.
Detailed documentation on how the scheduler is logged can be found in this document.
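For anyone who wants a rough idea of what a timeliness/completeness check on the scheduler log could look like, here is a minimal sketch in Python. Everything in it is an assumption for illustration only: the field names (`scheduled_at`, `finished_at`, `status`), the `SUCCESS` status value, the 2-hour SLA, and the sample rows are all hypothetical and are not taken from the actual log tables. Here timeliness is taken to mean "share of scheduled runs that finished within the SLA" and completeness "share of scheduled runs that succeeded"; the real definitions are the ones in the documentation.

```python
from datetime import datetime, timedelta


def summarize_runs(runs, sla=timedelta(hours=2)):
    """Return timeliness and completeness ratios for a list of scheduler runs.

    Each run is a dict with hypothetical keys:
      scheduled_at (datetime), finished_at (datetime or None), status (str).
    """
    total = len(runs)
    if total == 0:
        return {"timeliness": None, "completeness": None}

    # A run is "on time" if it finished, and did so within the SLA window.
    on_time = sum(
        1
        for r in runs
        if r["finished_at"] is not None
        and r["finished_at"] - r["scheduled_at"] <= sla
    )
    # A run is "complete" if it ended with the (assumed) SUCCESS status.
    complete = sum(1 for r in runs if r["status"] == "SUCCESS")

    return {"timeliness": on_time / total, "completeness": complete / total}


# Made-up sample rows, standing in for a query result from the log tables.
sample_runs = [
    {"scheduled_at": datetime(2019, 1, 1, 1, 0),
     "finished_at": datetime(2019, 1, 1, 2, 30), "status": "SUCCESS"},
    {"scheduled_at": datetime(2019, 1, 2, 1, 0),
     "finished_at": datetime(2019, 1, 2, 4, 30), "status": "SUCCESS"},
    {"scheduled_at": datetime(2019, 1, 3, 1, 0),
     "finished_at": None, "status": "FAILED"},
    {"scheduled_at": datetime(2019, 1, 4, 1, 0),
     "finished_at": datetime(2019, 1, 4, 1, 45), "status": "SUCCESS"},
]

summary = summarize_runs(sample_runs)
print(summary)
```

With the made-up rows above, two of four runs finish within the SLA and three of four succeed.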
Currently, we have many data source options for report and/or dashboard creation (e.g. Data Lake, Data Warehouse, NRT, etc.). This fragmentation has led to some inconsistencies in the reports, depending on which data source(s) were used.
To surface these inconsistencies, the Data-GOA team has built a sample data consistency dashboard that compares our Data Lake with the existing Data Warehouse. The dashboard shows the consistency rate for the queries used in a report/dashboard, broken down per metric, per dimension key, and per dimension value.
Detailed documentation on how the consistency checking is done can be found in this document.
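To make the "consistency rate" idea a bit more concrete, here is a minimal sketch in Python. It is not the team's actual implementation: the row layout (a dict keyed by metric, dimension key, dimension value, and date), the `tolerance` parameter, and the sample numbers are all hypothetical. It assumes the rate for a given metric / dimension key / dimension value is the share of compared data points whose values match in both sources.

```python
from collections import defaultdict


def consistency_rate(lake_rows, warehouse_rows, tolerance=0.0):
    """Compare two sources and return a match rate per (metric, dim_key, dim_value).

    Both inputs are dicts mapping (metric, dim_key, dim_value, date) -> numeric value.
    A data point counts as consistent only if it exists in both sources and the
    values differ by at most `tolerance`.
    """
    totals = defaultdict(int)
    matches = defaultdict(int)

    # Use the union of keys so a point missing from one source counts as a mismatch.
    for key in set(lake_rows) | set(warehouse_rows):
        metric, dim_key, dim_value, _date = key
        group = (metric, dim_key, dim_value)
        totals[group] += 1

        lake_value = lake_rows.get(key)
        warehouse_value = warehouse_rows.get(key)
        if (
            lake_value is not None
            and warehouse_value is not None
            and abs(lake_value - warehouse_value) <= tolerance
        ):
            matches[group] += 1

    return {group: matches[group] / totals[group] for group in totals}


# Made-up sample values, standing in for the same report query run on each source.
lake = {
    ("bookings", "country", "ID", "2019-01-01"): 100,
    ("bookings", "country", "ID", "2019-01-02"): 120,
    ("bookings", "country", "SG", "2019-01-01"): 40,
}
warehouse = {
    ("bookings", "country", "ID", "2019-01-01"): 100,
    ("bookings", "country", "ID", "2019-01-02"): 118,
    ("bookings", "country", "SG", "2019-01-01"): 40,
}

rates = consistency_rate(lake, warehouse)
for group, rate in sorted(rates.items()):
    print(group, rate)
```

Rolling the same counts up to the dimension key or metric level would only require grouping on a shorter key.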
Our team welcomes further discussions should you have a similar requirement to monitor your report consistency (comparing one data source to another).
Hope this helps!
Draft DQA Main Doc (Gartner + DAMA)
A framework for DQ issue actions is in progress.