Metica
Real-time ROAS (Return on Ad Spend) engine processing terabyte-scale user acquisition data. Built on AWS with Apache Spark, handling GCS-to-Iceberg ETL pipelines.
The Challenge
Metica's user acquisition pipeline processes terabytes of gaming data, and it was suffering from significant performance degradation and unexpected cloud egress costs. The root cause was a combination of Spark's lazy evaluation behaviour and improper DataFrame caching, neither of which was surfaced by existing monitoring.
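To illustrate why lazy evaluation without caching can multiply reads from a remote source, here is a minimal toy model in plain Python. It is not the Spark API and not Metica's pipeline: `LazyDataset`, `count_remote_reads`, and the sample rows are hypothetical names and data chosen for illustration. It mirrors one general property of Spark: transformations only record work, and each action re-runs the full lineage unless the result has been persisted with `cache()` or `persist()`.

```python
from typing import Callable, Iterable, List, Optional


class LazyDataset:
    """Toy stand-in for a lazily evaluated dataset (illustrative, not Spark)."""

    def __init__(self, compute: Callable[[], Iterable[int]]):
        self._compute = compute
        self._cached: Optional[List[int]] = None
        self._cache_requested = False

    def map(self, fn: Callable[[int], int]) -> "LazyDataset":
        # Transformation: nothing runs here, the work is only recorded.
        return LazyDataset(lambda: (fn(x) for x in self._materialise()))

    def cache(self) -> "LazyDataset":
        # Marks the dataset for caching; it is filled on the first action.
        self._cache_requested = True
        return self

    def _materialise(self) -> List[int]:
        if self._cached is not None:
            return self._cached
        rows = list(self._compute())
        if self._cache_requested:
            self._cached = rows
        return rows

    def count(self) -> int:
        # Action: triggers evaluation of the recorded lineage.
        return len(self._materialise())

    def total(self) -> int:
        # Action: triggers evaluation of the recorded lineage.
        return sum(self._materialise())


def count_remote_reads(use_cache: bool) -> int:
    """Run two actions on one dataset and report how often the source was read."""
    reads = 0

    def read_remote_source() -> List[int]:
        # Hypothetical stand-in for a cross-cloud read that incurs egress.
        nonlocal reads
        reads += 1
        return [3, 5, 8]

    events = LazyDataset(read_remote_source).map(lambda x: x * 2)
    if use_cache:
        events.cache()
    events.count()
    events.total()
    return reads
```

In this sketch, calling `count_remote_reads(False)` reads the source once per action, while `count_remote_reads(True)` reads it once in total. At terabyte scale across clouds, the same pattern would translate into repeated transfers, which is one plausible way the egress costs described above could arise.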
Our Approach
Bayseian audited the full Spark pipeline architecture on AWS, identified the lazy evaluation and caching issues, and rebuilt the GCS-to-Iceberg ETL with correct execution plans, partition strategies, and caching policies.
Outcome
Pipeline performance recovered significantly, and cloud egress costs fell. Real-time ROAS attribution now runs at the throughput required for production user acquisition operations.
Highlights
- TB+ data processing pipeline
- Real-time attribution analytics
- Critical Spark performance optimisation delivered