Private · Data Engineering

Automated Data Analytics Platform

An end-to-end automated data analytics platform: it ingests a company's files, APIs, and systems, builds the data warehouse, and delivers governed reports on schedule to inboxes, Power BI, Excel, and PDF.

Python · PostgreSQL · Power BI · REST APIs · Excel · Automation

My role: Designed, built, and operate it end-to-end as Data Analytics Manager at an FMCG distributor.

Executive summary

This is the engine behind a company's automated data analytics. It replaces the manual reporting cycle — downloading exports, reconciling spreadsheets, refreshing dashboards, and emailing the same reports to the same people — with a single system that runs the whole loop on its own.

Every day it pulls data from across the business, consolidates it into a governed data warehouse, and turns it into decision-ready reports that land in the right place at the right time: an inbox, a Power BI dataset, an Excel workbook, or a PDF. Routine reporting that used to take a team's mornings now runs itself, and the platform raises an alert the moment something looks wrong.

How it works

The platform is built as an orchestrated pipeline. Each stage is scheduled, retried on failure, and observable from a central console. This is the same model that production data tools such as Apache Airflow use to run workflows reliably.

Ingest

  • Excel & CSV files
  • REST & web APIs
  • Line-of-business systems
  • Operational databases

Consolidate

  • Validation & cleaning rules
  • Warehouse staging & load
  • Modeled fact / dimension tables

Deliver

  • Scheduled email reports
  • Power BI datasets
  • Excel workbooks
  • PDF documents
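The stage-by-stage model with retries can be sketched in a few lines of plain Python. This is a minimal illustration, not the platform's actual orchestrator: the function names (`run_with_retry`, `run_pipeline`), the three stand-in stages, and the sample figures are all assumptions made up for the example.

```python
import time
from typing import Any, Callable


def run_with_retry(stage: Callable[[Any], Any], payload: Any,
                   attempts: int = 3, base_delay: float = 1.0) -> Any:
    """Call stage(payload); on an exception, wait and retry up to `attempts` times."""
    for attempt in range(1, attempts + 1):
        try:
            return stage(payload)
        except Exception:
            if attempt == attempts:
                raise  # out of retries: let the caller mark the run as failed
            # Exponential backoff: 1x, 2x, 4x ... the base delay.
            time.sleep(base_delay * 2 ** (attempt - 1))


def run_pipeline(stages: list[Callable[[Any], Any]], payload: Any = None,
                 **retry_options: Any) -> Any:
    """Run stages in order, feeding each stage's output into the next."""
    for stage in stages:
        payload = run_with_retry(stage, payload, **retry_options)
    return payload


# Hypothetical stages standing in for Ingest -> Consolidate -> Deliver.
calls = {"ingest": 0}


def ingest(_: Any) -> list[dict]:
    calls["ingest"] += 1
    if calls["ingest"] == 1:
        raise ConnectionError("source did not respond")  # simulated transient failure
    return [{"region": "East", "revenue": 120.0}, {"region": "West", "revenue": 80.0}]


def consolidate(rows: list[dict]) -> float:
    return sum(row["revenue"] for row in rows)


def deliver(total: float) -> str:
    return f"Total revenue: {total:.2f}"


# base_delay=0.0 keeps the demo instant; a real schedule would wait between retries.
result = run_pipeline([ingest, consolidate, deliver], base_delay=0.0)
print(result)
```

Here the first ingest attempt fails and the second succeeds, so the run completes without anyone intervening. A production orchestrator would add scheduling, persisted run state, and per-stage retry policies on top of this idea.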

Ingestion & the warehouse

Source data arrives in every shape a real business produces — manual Excel uploads, vendor CSV drops, REST APIs, and direct database reads. The platform normalizes all of it before anything touches the warehouse.

  • Heterogeneous connectors: One ingestion layer for files, APIs, and databases, so new sources plug in without bespoke scripts.
  • Validation & data quality: Type, range, and completeness checks reject or quarantine bad rows before they corrupt downstream reports.
  • Governed warehouse: Cleaned data lands in versioned warehouse tables, a single, auditable source of truth for every report.
  • Modeled datasets: Fact and dimension tables are shaped once and reused across reports, email, and Power BI.
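As a rough illustration of the type, range, and completeness checks described above, the sketch below splits incoming rows into clean and quarantined sets. The rule table, column names, limits, and sample rows are hypothetical. It uses plain Python to stay dependency-free; the same rules could equally be expressed with pandas.

```python
from datetime import date
from typing import Any

# Hypothetical rule set: column -> (expected type, optional minimum, optional maximum).
RULES: dict[str, tuple[type, Any, Any]] = {
    "order_date": (date, None, None),
    "region": (str, None, None),
    "revenue": (float, 0.0, 1_000_000.0),
}


def check_row(row: dict[str, Any]) -> list[str]:
    """Return the rule violations for one row; an empty list means the row is clean."""
    problems = []
    for column, (expected_type, low, high) in RULES.items():
        value = row.get(column)
        if value is None or value == "":
            problems.append(f"{column}: missing")  # completeness check
        elif not isinstance(value, expected_type):
            problems.append(f"{column}: expected {expected_type.__name__}")  # type check
        elif low is not None and value < low:
            problems.append(f"{column}: below {low}")  # range check (lower bound)
        elif high is not None and value > high:
            problems.append(f"{column}: above {high}")  # range check (upper bound)
    return problems


def split_rows(rows: list[dict[str, Any]]) -> tuple[list[dict], list[dict]]:
    """Separate rows that pass every rule from rows that must be quarantined."""
    clean, quarantined = [], []
    for row in rows:
        problems = check_row(row)
        if problems:
            # Keep the original row with its reasons so it can be reviewed later.
            quarantined.append({"row": row, "problems": problems})
        else:
            clean.append(row)
    return clean, quarantined


# Made-up sample rows: one valid, three with a different kind of defect each.
rows = [
    {"order_date": date(2025, 6, 16), "region": "East", "revenue": 1250.0},
    {"order_date": date(2025, 6, 16), "region": "", "revenue": 300.0},
    {"order_date": date(2025, 6, 16), "region": "West", "revenue": -5.0},
    {"order_date": "16/06/2025", "region": "North", "revenue": 90.0},
]
clean, quarantined = split_rows(rows)
for item in quarantined:
    print(item["problems"])
```

Only the clean rows would continue to the warehouse load. Quarantined rows keep their reasons attached, which makes it possible to fix the source and replay them.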

Reporting & delivery

From one set of governed datasets the platform generates and ships reports through whatever channel each audience prefers — no analyst assembling packs by hand. A single run can email a summary, attach an Excel and a PDF, and refresh a Power BI dataset.
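One way to assemble such a summary email with its attachments is Python's standard `email.message.EmailMessage`. The sketch below is an assumption about how this could be done, not the platform's actual code: the addresses, revenue figures, and attachment bytes are placeholders, and the step that actually sends the message is left out.

```python
from email.message import EmailMessage


def percent_change(current: float, previous: float) -> float:
    """Relative change from previous to current, in percent."""
    return (current - previous) / previous * 100.0


def build_report_email(report_date: str, revenue_today: float,
                       revenue_last_week: float,
                       attachments: dict[str, bytes]) -> EmailMessage:
    """Build (but do not send) the daily summary email with its attachments."""
    change = percent_change(revenue_today, revenue_last_week)
    direction = "up" if change >= 0 else "down"

    message = EmailMessage()
    message["From"] = "data-platform@example.com"      # placeholder address
    message["To"] = "sales-leadership@example.com"     # placeholder address
    message["Subject"] = f"Daily Sales Report - {report_date}"
    message.set_content(
        "Good morning, your daily sales report is ready.\n\n"
        f"Revenue is {direction} {abs(change):.1f}% versus the same day last week.\n"
    )
    for filename, payload in attachments.items():
        # Generic binary type; a real run would attach the rendered workbook and PDF.
        message.add_attachment(payload, maintype="application",
                               subtype="octet-stream", filename=filename)
    return message


# Placeholder figures and file contents, for illustration only.
message = build_report_email(
    "Mon 16 Jun", 104_200.0, 100_000.0,
    {"Sales_Daily.xlsx": b"placeholder workbook bytes",
     "Sales_Daily.pdf": b"placeholder pdf bytes"},
)
print(message["Subject"])
print([part.get_filename() for part in message.iter_attachments()])
```

Because the message is built from the same governed dataset as the workbook and PDF, the headline figure and the attachments cannot drift apart.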

Data Platform to Sales Leadership · 06:00
Daily Sales Report — Mon 16 Jun

Good morning — your daily sales report is ready.

Revenue is up 4.2% versus the same day last week, with the East region driving most of the lift. Full breakdown in the attached workbook, or open the live dashboard in Power BI.

Generated automatically at 06:00.

Attachments: Sales_Daily.xlsx, Sales_Daily.pdf

Data Platform to Data Team · 05:34
Pipeline alert — Inventory feed delayed

The Inventory API did not respond during the 05:30 run.

Today's inventory dataset is being served from the last successful load (yesterday, 05:30), so reports still deliver. The run has been retried automatically and is being monitored.

No action needed unless the next retry also fails.

Illustrative mock-ups of the automated report and alert emails the platform sends.
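The behaviour in the alert mock-up, serving the last successful load when a feed does not respond, can be sketched as below. The `Snapshot` and `LoadResult` types, the `load_with_fallback` function, and the dates are hypothetical names and values chosen for the example. How the platform actually stores snapshots is not described here.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Callable, Optional


@dataclass
class Snapshot:
    loaded_at: datetime
    rows: list


@dataclass
class LoadResult:
    snapshot: Optional[Snapshot]  # data to serve downstream (None if nothing is available)
    is_stale: bool                # True when an older load is being served
    alert: Optional[str]          # message for the data team, if any


def load_with_fallback(fetch: Callable[[], list], last_good: Optional[Snapshot],
                       now: datetime, feed_name: str) -> LoadResult:
    """Try a fresh load; if the source fails, fall back to the last successful one."""
    try:
        rows = fetch()
    except Exception as exc:
        if last_good is None:
            alert = f"{feed_name} failed and no earlier load exists: {exc}"
        else:
            alert = (f"{feed_name} delayed; serving last successful load "
                     f"from {last_good.loaded_at:%Y-%m-%d %H:%M}")
        return LoadResult(snapshot=last_good, is_stale=True, alert=alert)
    return LoadResult(snapshot=Snapshot(loaded_at=now, rows=rows), is_stale=False, alert=None)


def unresponsive_api() -> list:
    raise TimeoutError("no response")  # simulated outage


# Made-up dates and rows for the demonstration.
yesterday = Snapshot(loaded_at=datetime(2025, 6, 15, 5, 30),
                     rows=[{"sku": "A1", "on_hand": 40}])
result = load_with_fallback(unresponsive_api, yesterday,
                            datetime(2025, 6, 16, 5, 30), "Inventory feed")
print(result.is_stale, result.alert)
```

Returning the stale flag alongside the data lets reports still deliver while making it explicit, both in the alert and downstream, that the figures come from an earlier load.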

Orchestration & monitoring

Everything runs on a schedule and reports its own health. A built-in web console shows every pipeline, when it last ran, how long it took, and whether it succeeded — so the whole operation is observable at a glance, the way an orchestrator's run grid surfaces failures the moment they happen.

Pipeline              Schedule        Last run  Duration  Status
Sales warehouse load  Daily · 05:00   05:00     4m 12s    success
Daily sales report    Daily · 06:00   06:00     38s       success
Power BI refresh      Daily · 06:30   06:30     1m 02s    running
Inventory sync        Every 30 min    05:30               retrying
Finance month-end     Monthly · 1st                       queued

Representative view of the monitoring console — pipeline health at a glance.
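A console view like this could be driven by a simple run log. The sketch below assumes a hypothetical `pipeline_runs` table in SQLite and shows how the latest run per pipeline might be queried and formatted. The schema, sample rows, and helper names are illustrative assumptions, not the platform's real data model.

```python
import sqlite3
from typing import Optional

connection = sqlite3.connect(":memory:")
connection.execute("""
    CREATE TABLE pipeline_runs (
        pipeline         TEXT NOT NULL,
        started_at       TEXT NOT NULL,  -- ISO 8601, so text order equals time order
        duration_seconds REAL,           -- NULL while a run has not finished
        status           TEXT NOT NULL
    )
""")
# Made-up run history for the demonstration.
connection.executemany(
    "INSERT INTO pipeline_runs VALUES (?, ?, ?, ?)",
    [
        ("Sales warehouse load", "2025-06-15T05:00:00", 240.0, "success"),
        ("Sales warehouse load", "2025-06-16T05:00:00", 252.0, "success"),
        ("Daily sales report", "2025-06-16T06:00:00", 38.0, "success"),
        ("Inventory sync", "2025-06-16T05:00:00", 20.0, "success"),
        ("Inventory sync", "2025-06-16T05:30:00", None, "retrying"),
    ],
)


def format_duration(seconds: Optional[float]) -> str:
    """Render seconds as '4m 12s' or '38s'; an unfinished run renders as empty."""
    if seconds is None:
        return ""
    minutes, secs = divmod(int(round(seconds)), 60)
    return f"{minutes}m {secs:02d}s" if minutes else f"{secs}s"


def latest_runs(conn: sqlite3.Connection) -> list[tuple[str, str, str, str]]:
    """Return one row per pipeline: name, last run time (HH:MM), duration, status."""
    cursor = conn.execute("""
        SELECT r.pipeline, r.started_at, r.duration_seconds, r.status
        FROM pipeline_runs AS r
        JOIN (SELECT pipeline, MAX(started_at) AS started_at
              FROM pipeline_runs GROUP BY pipeline) AS latest
          ON latest.pipeline = r.pipeline AND latest.started_at = r.started_at
        ORDER BY r.pipeline
    """)
    return [(name, started[11:16], format_duration(duration), status)
            for name, started, duration, status in cursor]


for row in latest_runs(connection):
    print(row)
```

Keeping every run as a row, rather than overwriting a single status field, also preserves the history needed to spot pipelines that are slowing down or failing intermittently.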

Tech stack

Python · PostgreSQL · SQLite · Power BI · Power Automate · REST APIs · pandas · Excel · PDF generation · Scheduling

Outcome

The platform turns a recurring, manual reporting workload into a hands-off service. Around 100 data sources are ingested and roughly 500 pipeline runs are executed every day, feeding tens of reports across email, Power BI, Excel, and PDF.

Because every run is validated, scheduled, and monitored, reporting is both faster and more trustworthy — analysts move from assembling reports to acting on them.