Demystify Databricks

After spending the past few days immersed in Databricks, I've decided to create some docs and utils to make Databricks programming bearable (especially for beginners) :bear:. I want to share these findings with you in case you find them useful too :grin:.

Data source exploration

dbfs_utils*

E.g., you want to find the event directories of the final hourly and daily flight booking and flight search data in Avro format. You can use:
search_tvlk_datalake_prod_directory(versions="final", time_granularity=["hour", "day"], file_formats="avro", keywords=["flight_booking", "flight_search"])
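To give a feel for what a search like this does, here is a minimal sketch of filtering a list of candidate directories by version, file format, time granularity, and keyword. It is not the notebook's actual implementation. The function name `filter_event_directories`, the `/mnt/example/...` paths, and the assumed layout `<root>/<version>/<file_format>/<granularity>/<event_name>` are all hypothetical and used only for illustration.

```python
def _as_list(value):
    """Accept either a single string or a list of strings."""
    return [value] if isinstance(value, str) else list(value)


def filter_event_directories(paths, versions, time_granularity, file_formats, keywords):
    """Keep paths whose last four components match the given filters.

    Assumed (hypothetical) layout of each path:
    <root>/<version>/<file_format>/<granularity>/<event_name>
    """
    versions = _as_list(versions)
    time_granularity = _as_list(time_granularity)
    file_formats = _as_list(file_formats)
    keywords = _as_list(keywords)

    matches = []
    for path in paths:
        parts = path.strip("/").split("/")
        if len(parts) < 4:
            continue  # too short to follow the assumed layout
        version, file_format, granularity, event = parts[-4:]
        if (
            version in versions
            and file_format in file_formats
            # "day" matches "day_1", "hour" matches "hour_1", and so on
            and any(granularity.startswith(g) for g in time_granularity)
            and any(k in event for k in keywords)
        ):
            matches.append(path)
    return matches


# Made-up candidate directories, for illustration only.
candidates = [
    "/mnt/example/v1/final/avro/day_1/edw.fact_flight_booking/",
    "/mnt/example/v1/final/avro/hour_1/edw.fact_flight_search/",
    "/mnt/example/v1/final/parquet/day_1/edw.fact_flight_booking/",
    "/mnt/example/v1/raw/avro/day_1/edw.fact_hotel_booking/",
]

matches = filter_event_directories(
    candidates,
    versions="final",
    time_granularity=["hour", "day"],
    file_formats="avro",
    keywords=["flight_booking", "flight_search"],
)
print(matches)
```

With these made-up candidates, only the two `final`/`avro` flight directories pass all four filters.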

E.g., you know the event directory but aren't sure about the first and the latest available files under that directory. You can use:
get_first_full_partitioned_path('/mnt/datalake-prod/traveloka/data/v1/final/avro/day_1/edw.fact_flight_booking/') or get_latest_full_partitioned_path('/mnt/datalake-prod/traveloka/data/v1/final/avro/day_1/edw.fact_flight_booking/')
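One way such helpers could work is to walk down the directory tree and pick the first (or last) child by name at every level. The sketch below illustrates that idea only; it is not the code in the notebook. The function names, the `list_children` callback, and the `year=`/`month=` partition names are assumptions made for the example, and the approach only gives the right answer when partition names sort chronologically (for example, zero-padded months).

```python
def _descend(path, list_children, pick_last):
    """Walk down from `path`, taking the first or last child (by name) at each level."""
    current = path.rstrip("/")
    while True:
        children = sorted(list_children(current))
        if not children:
            return current  # no more subdirectories: deepest partition reached
        name = children[-1] if pick_last else children[0]
        current = f"{current}/{name}"


def first_partitioned_path_sketch(path, list_children):
    return _descend(path, list_children, pick_last=False)


def latest_partitioned_path_sketch(path, list_children):
    return _descend(path, list_children, pick_last=True)


# In-memory stand-in for a partitioned directory (hypothetical names).
tree = {
    "/data/events": ["year=2020", "year=2019"],
    "/data/events/year=2019": ["month=12", "month=11"],
    "/data/events/year=2020": ["month=01", "month=02"],
}


def list_from_tree(path):
    """Return the subdirectory names under `path`, or an empty list for a leaf."""
    return tree.get(path, [])


first_path = first_partitioned_path_sketch("/data/events/", list_from_tree)
latest_path = latest_partitioned_path_sketch("/data/events/", list_from_tree)
print(first_path)
print(latest_path)
```

On Databricks, the `list_children` callback could plausibly be built on `dbutils.fs.ls`, returning only the names of entries that are directories. That wiring is left out here so the sketch runs anywhere.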

Only applicable to files mounted under /mnt/datalake-prod or /mnt/S3_*.

secret_utils & dbfs_utils_admin (admin only)*:

df_utils & time_utils:

https://dbc-e60bee69-a52a.cloud.databricks.com/#notebook/327213

https://dbc-e60bee69-a52a.cloud.databricks.com/#notebook/316425

plot_utils*

Currently supports time series plots only; I will add more if there are requests.

TL;DR

I created some notebooks that contain documentation and utilities to make programming in Databricks more efficient.

https://dbc-e60bee69-a52a.cloud.databricks.com/#notebook/315759

https://dbc-e60bee69-a52a.cloud.databricks.com/#notebook/326780
Command: %run /Users/deka.akbar@traveloka.com/utils/dbfs_utils

https://dbc-e60bee69-a52a.cloud.databricks.com/#notebook/333186
Command: %run /Users/deka.akbar@traveloka.com/utils/secret_utils

https://dbc-e60bee69-a52a.cloud.databricks.com/#notebook/332078
Command: %run /Users/deka.akbar@traveloka.com/utils/dbfs_utils_admin

https://dbc-e60bee69-a52a.cloud.databricks.com/#notebook/327213
Command: %run /Users/deka.akbar@traveloka.com/utils/df_utils

https://dbc-e60bee69-a52a.cloud.databricks.com/#notebook/316425
Command: %run /Users/deka.akbar@traveloka.com/utils/time_utils

https://dbc-e60bee69-a52a.cloud.databricks.com/#notebook/327470
Command: %run /Users/deka.akbar@traveloka.com/utils/plot_utils