

Have a look at my projects below to get an idea of what I’ve worked on, how I apply my skills, and my approach from concept to execution. Reach out if you’d like to learn more about a specific project.


Spark is a processing engine; it has no storage or metadata store of its own. Instead, it uses AWS S3 for storage and, when creating tables and views, the Hive metastore for metadata. This is the major reason Spark alone does not offer the guarantees of a reliable data processing system, namely atomicity, consistency, isolation, and durability (ACID) transactions. This document describes a transactional solution to Spark’s overwrite behavior and its implementation.
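One common workaround for the unsafe overwrite is to stage the new output in a side directory and promote it only after the write succeeds, so a failed job never clobbers the existing data. Below is a minimal, single-machine sketch of that pattern (not Spark’s own API; directory and function names are illustrative):

```python
import os
import shutil

def transactional_overwrite(write_fn, target_dir):
    """Stage-then-swap overwrite: the default Spark overwrite deletes the
    target before writing, so a failed job leaves nothing behind. Writing
    to a staging directory and promoting it afterward closes that window.
    Illustrative sketch only; write_fn stands in for the actual job."""
    staging = target_dir + ".staging"
    backup = target_dir + ".old"
    shutil.rmtree(staging, ignore_errors=True)  # clear leftovers from a crashed run
    os.makedirs(staging)
    try:
        write_fn(staging)                       # produce the new output files
    except Exception:
        shutil.rmtree(staging, ignore_errors=True)  # failed job: target untouched
        raise
    if os.path.exists(target_dir):
        os.rename(target_dir, backup)           # keep old data until the swap succeeds
    os.rename(staging, target_dir)              # promote the staged output
    shutil.rmtree(backup, ignore_errors=True)   # drop old data only after success
```

On S3 there is no atomic directory rename, which is why table formats layer a transaction log on top; the sketch above just illustrates the staging idea.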


Describes and illustrates a strategy for identifying and mitigating the small file problem in Apache Spark. This document explains when small file compaction is needed and presents a workaround to dynamically generate larger data files on the Apache Spark framework.
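A typical compaction heuristic is to derive the number of output files from the data volume and a target file size, then repartition before writing. A minimal sketch, assuming a 128 MB target file size (the target is a tuning choice, not something Spark fixes):

```python
def target_partitions(total_bytes, target_file_bytes=128 * 1024 * 1024):
    """Number of output files needed so that no file exceeds the target
    size; always at least one. Uses ceiling division."""
    return max(1, -(-total_bytes // target_file_bytes))

# Hypothetical usage inside a Spark job (df and input_size are assumed):
#   df.repartition(target_partitions(input_size)).write.parquet(output_path)
```

Computing the input size up front (for example from the file listing) lets the job pick the partition count dynamically instead of hard-coding it.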


Orchestration of partition synchronization using Apache Airflow

Describes and illustrates the process of orchestrating the sync of Delta table partitions from S3 to Redshift Spectrum via DynamoDB. This document is a guide to getting started with Airflow operators and scheduling a DAG to validate the end-to-end workflow; it also validates the results by querying AWS Redshift Spectrum.
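At its core, the sync task the DAG runs registers each new partition with Redshift Spectrum. A sketch of the statement builder such a task might call (schema, table, and the Hive-style `key=value` S3 layout below are illustrative assumptions):

```python
def spectrum_add_partition_sql(schema, table, partition_values, s3_prefix):
    """Build the ALTER TABLE ... ADD PARTITION statement that registers one
    partition of an external table with Redshift Spectrum. Assumes the data
    lives under a Hive-style key=value prefix on S3."""
    spec = ", ".join(f"{k}='{v}'" for k, v in partition_values.items())
    path = "/".join(f"{k}={v}" for k, v in partition_values.items())
    location = f"{s3_prefix.rstrip('/')}/{path}/"
    return (f"ALTER TABLE {schema}.{table} ADD IF NOT EXISTS "
            f"PARTITION ({spec}) LOCATION '{location}'")
```

In the DAG, a task would read the pending partitions (here, from DynamoDB), build one such statement per partition, and run them against Redshift.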


Real-time data streaming using Kafka cluster and data transformation using Apache Flink

The project aims to modernize e-commerce analytics by analyzing customer behavior in real time. By leveraging a Kafka cluster and Apache Flink, it collects and transforms massive amounts of data into actionable insights. The end result is a powerful clickstream processing pipeline that enables the retailer to optimize their website, increase sales, and retain customers.
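The transformation itself runs in Flink over the Kafka stream; as a plain-Python sketch of the kind of windowed aggregation involved (the event fields `ts` and `product_id` are illustrative assumptions):

```python
from collections import Counter

def clicks_per_product(events, window_start, window_end):
    """Count clickstream events per product within one time window — the
    sort of aggregation the Flink job would perform continuously on the
    Kafka stream. Each event is a dict with 'ts' and 'product_id'."""
    in_window = (e for e in events if window_start <= e["ts"] < window_end)
    return Counter(e["product_id"] for e in in_window)
```

In the real pipeline the windowing, state, and exactly-once semantics are handled by Flink; this only shows the shape of the per-window result.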


Orchestrate a pipeline using AWS Step Functions to perform Consumer Engagement Funnel Analysis & User Acquisition Optimization

The project's goal is to help businesses optimize their customer acquisition and retention strategies by analyzing their engagement funnel. By leveraging AWS Step Functions to orchestrate a pipeline for data ingestion, processing, transformation, and storage, the project delivered actionable insights that grew the customer base and revenue. It highlights how customer engagement funnel analysis can help businesses make data-driven decisions to improve customer acquisition and retention.
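A Step Functions pipeline like this is defined in Amazon States Language. A stripped-down sketch of the shape of such a state machine (state names are illustrative assumptions, and real `Task` states would also carry `Parameters` for the invoked resources):

```json
{
  "Comment": "Illustrative funnel-analysis pipeline: ingest, transform, store",
  "StartAt": "IngestEvents",
  "States": {
    "IngestEvents": {
      "Type": "Task",
      "Resource": "arn:aws:states:::lambda:invoke",
      "Next": "TransformFunnel"
    },
    "TransformFunnel": {
      "Type": "Task",
      "Resource": "arn:aws:states:::glue:startJobRun.sync",
      "Next": "StoreResults"
    },
    "StoreResults": {
      "Type": "Task",
      "Resource": "arn:aws:states:::lambda:invoke",
      "End": true
    }
  }
}
```

Step Functions handles retries, error branching, and state passing between these steps, which is what makes it a good fit for orchestrating a multi-stage analytics pipeline.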


Orchestrating a Data Engineering workflow to improve Lifetime Value (LTV) efficiency

Maximizing Revenue with Customer Acquisition Cost (CAC) Analysis


Revolutionizing Headcount Reporting: One-Stop Solution for Metrics Management

This project revolutionizes headcount reporting for businesses with a one-stop solution for metrics management, powered by AWS. The serverless data pipeline, scalable data warehousing solution, and Tableau dashboard provide accurate, real-time insights into headcount data from multiple sources, enabling seamless integration and data governance across different lines of business. By streamlining the process, businesses can save time and resources while making data-driven decisions for resource optimization and performance improvement.
