Super Operator

Project Detail

We worked with an automated car wash company whose data lived in two REST APIs: QuickBooks and SuperOperator. I wrote Python code that ran on Databricks and extracted 25+ tables daily, and I designed the Delta Lakehouse architecture in Databricks. Our data lake consists of three layers: raw, curated, and semantic. After a file is ingested into the raw layer in Azure ADLS, we create a Delta table over each file in the raw folder. This is an external table in Azure Databricks, so we can transform the data into the format we need inside the Databricks environment. All of this code was written in PySpark and Spark SQL, and we automated the whole ETL with Azure Data Factory (ADF). Minimal sketches of the main steps follow below.

Transformed data lands in the curated layer, where the aggregations are done, and the resulting fact and dimension tables are loaded into the semantic layer. This layer is built on a dedicated SQL pool in Synapse, which acts as our serving layer. All of the Databricks notebooks are scheduled through ADF, which takes its parameters from metadata in a lookup file that supplies the source and destination for each file.

Synapse then connects to Power BI for data visualization. The client only asked to see a few tables, but I went further and built a full set of polished dashboards. I also handled Power BI administration as the owner of all the workspaces: managing scheduled refreshes, implementing gateways, and enabling row-level security whenever we created a report on finance data.
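The daily ingestion step can be sketched roughly as follows, assuming a hypothetical bearer-token API and a mounted /mnt/raw landing path; the endpoint shape, auth, and table names are illustrative, and the real job looped over the 25+ QuickBooks and SuperOperator tables:

```python
import json
import os
from datetime import date

import requests


def ingest_table(table_name: str, api_base: str, token: str) -> None:
    """Pull one table from the source REST API and land it in the raw layer."""
    resp = requests.get(
        f"{api_base}/{table_name}",
        headers={"Authorization": f"Bearer {token}"},
        timeout=60,
    )
    resp.raise_for_status()
    # Land the payload as a dated JSON file in the mounted raw ADLS folder,
    # so each daily run writes to its own file.
    path = f"/dbfs/mnt/raw/{table_name}/{date.today():%Y-%m-%d}.json"
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "w") as f:
        json.dump(resp.json(), f)
```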
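The raw-to-curated-to-semantic hop looked roughly like this in PySpark and Spark SQL. The washes table, its columns, and the mount paths are assumptions for illustration, not the actual schema:

```python
from pyspark.sql import functions as F

# `spark` is the ambient SparkSession in a Databricks notebook.
spark.sql("CREATE DATABASE IF NOT EXISTS curated")
spark.sql("CREATE DATABASE IF NOT EXISTS semantic")

raw_df = spark.read.json("/mnt/raw/washes/")

# Write to the curated layer as an external Delta table: the data stays at
# the ADLS path while the table is registered in the metastore.
(raw_df
    .withColumn("load_date", F.current_date())
    .dropDuplicates(["wash_id"])
    .write.format("delta")
    .mode("overwrite")
    .option("path", "/mnt/curated/washes")
    .saveAsTable("curated.washes"))

# Aggregate curated data into a fact table for the semantic layer.
spark.sql("""
    SELECT site_id, load_date,
           COUNT(*)    AS wash_count,
           SUM(amount) AS revenue
    FROM curated.washes
    GROUP BY site_id, load_date
""").write.format("delta").mode("overwrite").saveAsTable("semantic.fact_washes")
```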
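For the metadata-driven orchestration, ADF's Lookup activity reads the metadata file and a ForEach activity passes one row per table into the notebook as base parameters, which Databricks exposes as widgets. A minimal sketch, with widget names that are illustrative rather than the actual parameter names:

```python
# Declare the widgets that ADF populates via the notebook activity's
# base parameters; defaults are empty strings for interactive runs.
dbutils.widgets.text("table_name", "")
dbutils.widgets.text("source_path", "")
dbutils.widgets.text("destination_path", "")

table_name = dbutils.widgets.get("table_name")
source_path = dbutils.widgets.get("source_path")
destination_path = dbutils.widgets.get("destination_path")

# One generic notebook can then serve every table in the metadata file:
# read from the row's source and write Delta to the row's destination.
df = spark.read.json(source_path)
df.write.format("delta").mode("overwrite").save(destination_path)
```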

Client: Super Operator
Scope: Data Engineering, DevOps, Data Analytics