Fahad Siddiqui
Lahore, Pakistan

I build AI systems & data infrastructure that ship.

Python/Go engineer with 11+ years across AI agents, LLM integrations, RAG systems, distributed data pipelines, and cloud infrastructure. Founder & CEO of Datum Brain — delivering production systems for enterprise clients in fintech, health intelligence, ad tech, and quantitative finance.

11+ years engineering
76+ projects delivered
$1.8M+ revenue generated
40+ production systems

Selected Work

Systems running in production, not demos.

01

Centrum AI

AI Agents

Enterprise AI agent platform where agents write their own Go code, execute in sandboxed Docker containers, and reach 70+ tools — with multi-LLM orchestration across OpenAI, Claude, and Vertex AI, plus behavioral anomaly detection in PyTorch.

200K+ LOC · 100+ concurrent agents · full audit trails
Go · PostgreSQL · Docker · React · PyTorch · Claude API · GCP
02

Polymer

Document Intelligence

Distributed document intelligence platform turning 100,000+ unstructured documents into queryable Neo4j knowledge graphs with Spark-parallelized NLP entity extraction — compressing 3-month legal discovery into a single day.

99.7% review-time reduction · 95% NER precision
Apache Spark · Neo4j · Scala · Play · React · Stanford NLP
03

TrueAudience

Ad Fraud Detection

Real-time ad fraud scoring engine fusing behavioral analysis, device fingerprinting, and 5+ IP intelligence sources — processing 10M–100M events a day behind a sub-100ms API on sharded MySQL and Redis.

100K+ events/min · 90%+ detection accuracy · 99.9% uptime
Go · MySQL · Redis · Kubernetes · MaxMind · Scamalytics
04

Zolvat EMI

Fintech / Payments

Full electronic-money-institution stack: SEPA Instant Credit Transfers with ISO 20022 message generation, KYC with face liveness detection, and sanction screening across OFAC, EU, UK, and Interpol lists — sub-100ms screening API backed by a Luigi ETL of 13+ international watchlists.

ISO 20022 type-safe codegen · 13+ sanction lists unified
Go · Python · Luigi · PostgreSQL · Redis · Vue 3 · AWS
05

DameTech

Energy / Automation

Autonomous Bitcoin mining controller that recomputes break-even electricity thresholds every 5 minutes from AEMO spot prices, hash price, and power contracts — pausing and resuming 3.2MW of miners over gRPC in under 10 seconds.

$252/hour cost avoidance · 24/7 autonomous multi-site ops
Go · Python · AWS Lambda · Kubernetes · gRPC · InfluxDB
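As a rough illustration of the break-even check described above, here is a minimal Python sketch. The formula, the names (`MinerFleet`, `break_even_price`, `should_mine`), and the units are assumptions for illustration, not the production controller: it treats mining as worthwhile when the spot price is below the revenue earned per MWh consumed, and it leaves out power-contract terms entirely.

```python
from dataclasses import dataclass


@dataclass
class MinerFleet:
    """Hypothetical description of one site's miners."""
    hashrate_ths: float  # total hashrate in TH/s
    power_mw: float      # total power draw in MW


def break_even_price(fleet: MinerFleet, hash_price_usd_per_ths_day: float) -> float:
    """Electricity price ($/MWh) at which mining revenue equals power cost.

    Assumes hash price is quoted in USD per TH/s per day.
    """
    revenue_per_hour = fleet.hashrate_ths * hash_price_usd_per_ths_day / 24.0
    # One hour at `power_mw` consumes `power_mw` MWh.
    return revenue_per_hour / fleet.power_mw


def should_mine(spot_price_usd_per_mwh: float,
                fleet: MinerFleet,
                hash_price_usd_per_ths_day: float) -> bool:
    """True when the spot price is below the fleet's break-even price."""
    return spot_price_usd_per_mwh < break_even_price(fleet, hash_price_usd_per_ths_day)
```

A scheduler could call `should_mine` on each new spot-price interval and issue pause or resume commands only when the answer changes.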
06

Examity VRS

Video Infrastructure

Enterprise exam-proctoring video service orchestrating thousands of concurrent Twilio/LiveKit sessions, with WASM body-segmentation screen blur at 30+ fps, NATS event streaming, and automated multi-track video composition.

1 proctor → 20–30 concurrent exams · 30+ fps WASM blur
Go · TypeScript · Twilio · LiveKit · WebAssembly · NATS · DynamoDB
07

Superlayer

Sales Intelligence

Sales intelligence platform auto-recording meetings via Recall.ai, transcribing with Deepgram speaker identification, summarizing with OpenAI, and enforcing CRM data quality through a JSON AST rule engine synced bidirectionally with HubSpot.

Forecast accuracy 67% → 84% · 6 hrs/week saved per rep
Go · PostgreSQL · RabbitMQ · OpenAI · Deepgram · HubSpot
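To give a feel for what a JSON AST rule engine can look like, here is a small Python sketch. The operator names (`and`, `or`, `not`, `eq`, `gt`, `present`) and the rule shape are hypothetical, chosen for illustration, and are not Superlayer's actual schema.

```python
from typing import Any, Dict


def evaluate(rule: Dict[str, Any], record: Dict[str, Any]) -> bool:
    """Recursively evaluate a JSON-style rule tree against a CRM record."""
    op = rule["op"]
    # Boolean combinators recurse into child rules.
    if op == "and":
        return all(evaluate(child, record) for child in rule["args"])
    if op == "or":
        return any(evaluate(child, record) for child in rule["args"])
    if op == "not":
        return not evaluate(rule["arg"], record)

    # Leaf operators inspect a single field of the record.
    value = record.get(rule["field"])
    if op == "present":
        return value not in (None, "")
    if op == "eq":
        return value == rule["value"]
    if op == "gt":
        return value is not None and value > rule["value"]
    raise ValueError(f"unknown operator: {op}")


# Example rule: a deal in "negotiation" must have a positive amount and a close date.
RULE = {
    "op": "or",
    "args": [
        {"op": "not", "arg": {"op": "eq", "field": "stage", "value": "negotiation"}},
        {"op": "and", "args": [
            {"op": "gt", "field": "amount", "value": 0},
            {"op": "present", "field": "close_date"},
        ]},
    ],
}
```

Because rules are plain JSON, they can be stored, versioned, and edited without redeploying the evaluator.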
08

Keyword Grouper

SEO Automation

SEO automation engine clustering 10,000+ keywords by search intent through SERP URL-overlap analysis with 100-thread concurrent fetching, ML-based clustering, and Snowflake analytics integration.

40 hours of research → 2 hours · 850 intent groups from 5K keywords
Python · Django · Celery · RabbitMQ · Snowflake · scikit-learn
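The idea behind SERP URL-overlap grouping can be sketched in a few lines of Python. This is a simplified stand-in, not the engine's ML-based clustering: it links two keywords when their result sets share at least `min_shared` URLs (an assumed threshold) and merges linked keywords with union-find. The function name and input shape are hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Set


def group_by_serp_overlap(serps: Dict[str, Set[str]],
                          min_shared: int = 3) -> List[List[str]]:
    """Group keywords whose SERPs share at least `min_shared` URLs."""
    parent = {kw: kw for kw in serps}

    def find(kw: str) -> str:
        # Follow parent links to the group root, compressing the path as we go.
        while parent[kw] != kw:
            parent[kw] = parent[parent[kw]]
            kw = parent[kw]
        return kw

    # Pairwise comparison is O(n^2); fine for a sketch, not for 10,000+ keywords.
    for a, b in combinations(serps, 2):
        if len(serps[a] & serps[b]) >= min_shared:
            parent[find(a)] = find(b)

    groups: Dict[str, List[str]] = {}
    for kw in serps:
        groups.setdefault(find(kw), []).append(kw)
    return sorted(sorted(members) for members in groups.values())
```

Overlap in ranking URLs is used here as a proxy for shared search intent: if a search engine returns the same pages for two queries, one page can plausibly target both.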
09

Zoomprop

Real Estate Data

Real estate intelligence platform aggregating property data, public records, and 12 national location sensors into investment analytics — Dagster-orchestrated pipelines with AI-driven web scraping and automated data quality checks behind a FastAPI layer.

12 national location sensors · AI-driven scraping at scale
Python · FastAPI · Dagster · AWS · PostgreSQL
10

ESG LDA

ESG Compliance

Enterprise ESG reporting platform for multi-site data collection, approval workflows, and analytics across GRI, CSRD, and SASB frameworks — 12 microservices behind a KrakenD gateway with real-time dashboards and multi-format report generation (PDF, DOCX, Excel).

80% reduction in reporting time — months to weeks
Python · Flask · Angular · MongoDB · Redis · KrakenD · Docker
11

Spongeling

AI Language Learning

AI-powered Spanish learning platform pairing FreeLing NLP precision with GPT-4 contextual intelligence — grammatical pattern recognition from real-world text, personalized examples matched to the learner, delivered through a Flutter app with a Vue 3 educator portal.

1.45s end-to-end analysis · 73% month-1 retention
Go · Flutter · Vue 3 · FreeLing NLP · GPT-4 · PostgreSQL
12

Mecku

Visual ETL

Visual data pipeline platform democratizing ETL through a drag-and-drop Rete.js editor — sources from S3, MongoDB, MySQL, Snowflake, and Redshift flowing through Apache Spark distributed execution orchestrated by Scala/Play and Akka actors.

Pipeline development from weeks to hours · 10+ data sources
React · Rete.js · Go · Scala · Apache Spark · Akka
13

PSI RSaaS

Scheduling Infrastructure

Cloud-native exam scheduling platform for professional certification testing — a unified Go API with Redis-backed hold-and-confirm workflows preventing double-booking, geographic test center ranking, and serverless Lambda deployment across four environments.

15K req/hour at peak · 99.9% uptime · 78% cache hit rate
Go · AWS Lambda · Redis · API Gateway · S3
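A hold-and-confirm workflow of the kind described above might look roughly like the Python sketch below. It uses an in-memory dictionary as a stand-in for Redis (where a hold would typically be an atomic `SET key value NX EX ttl`); the class name, method names, and TTL are assumptions for illustration, not the platform's API.

```python
import time
from typing import Callable, Dict, Tuple


class SlotHolds:
    """In-memory stand-in for Redis-backed slot holds with expiry."""

    def __init__(self, ttl_seconds: float = 300.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        self._holds: Dict[str, Tuple[str, float]] = {}  # slot -> (user, expires_at)
        self._confirmed: Dict[str, str] = {}            # slot -> user

    def hold(self, slot: str, user: str) -> bool:
        """Place a temporary hold; fails if the slot is booked or held by someone else."""
        if slot in self._confirmed:
            return False
        now = self._clock()
        current = self._holds.get(slot)
        if current is not None and current[1] > now and current[0] != user:
            return False
        self._holds[slot] = (user, now + self._ttl)
        return True

    def confirm(self, slot: str, user: str) -> bool:
        """Turn a live hold into a booking; fails if the hold is missing or expired."""
        current = self._holds.get(slot)
        if current is None or current[0] != user or current[1] <= self._clock():
            return False
        del self._holds[slot]
        self._confirmed[slot] = user
        return True
```

The expiry is what prevents abandoned checkouts from blocking a seat: an unconfirmed hold simply lapses and the slot becomes available again.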

Also shipped

  • Prepaire ShieldMap — global disease-outbreak ETL across 5 sources, sub-5-minute detection lag
  • Collab — serverless PII detection & redaction, $0.0002/document on Lambda + Comprehend
  • USO API — multilingual serverless options-flow intelligence in 5 languages
  • Civil API — campaign finance intelligence over FEC/IRS/state data, sub-100ms cached
  • CyborgDiva — serverless AI avatar pipeline: GPT + Stable Diffusion + YOLOv8 pose
  • CDMon — Go microservices domain-pricing engine, 5K req/min at sub-100ms
  • NeoPrintr — constraint-based puzzle generation, 15×15 grids solved in <100ms

Experience

A decade of leading and building. Remotely.

2020 — Present

Datum Brain · Founder & CEO

Global · Remote
  • Software consultancy delivering AI systems, data engineering, and cloud infrastructure for enterprise clients worldwide.
  • 76+ projects across fintech, health tech, ad tech, and SaaS — 5-star satisfaction on Upwork, $1.8M+ revenue.
  • Recognised in clutch.co's top 10 big data service providers in Pakistan.
2022 — 2023

NMS360 · Lead Engineer

United States · Remote
  • Built TrueAudience — a real-time ad fraud detection platform processing 10M–100M events/day with sub-100ms P99 latency, fusing behavioral analysis, device fingerprinting, and 5+ IP intelligence sources across sharded MySQL and Redis.
  • Saved clients $50K–$500K annually with 90%+ fraud detection accuracy and under 2% false positive rate.
2025 — 2026

Zoomprop · Senior Data Engineer

United States · Remote
  • Built scalable data pipelines and APIs for real estate discovery on AWS, with Dagster orchestration and AI-driven scraping over public records and property data.
2021 — 2024

Ardent Growth · Head of Engineering

United States · Remote
  • Built an automated SEO research platform processing 10,000+ keywords per job, cutting manual analysis by 90% and using ML clustering to surface $28.5K/month in content opportunities per client.
2020 — 2024

PSI Services LLC · Lead Software Development Engineer

United States · Remote
  • Led end-to-end build of a multi-tenant Resource Scheduling as a Service platform in Go, PostgreSQL, and AWS SQS — MVP in 1 month, scaled to thousands of weekly users.
2022

Examity · Lead Engineer

United States · Remote
  • Redesigned core remote-proctoring infrastructure; built real-time monitoring APIs in Go and Python with AI-assisted proctoring.
2018 — 2020

Polymer DLP · Head of Engineering

New York · Remote
  • Architected a 20+ repository microservice platform; took Polymer to Acceleprise (selected from 200 applicants); partnered with the official Neo4j team on client POCs.

Earlier: Parcelist (project lead), Arbisoft, Strategic Systems International (ImpactUs Marketplace, later featured in Forbes), Platalytics (big data SaaS, Spark job server, IoT SDKs) · Visiting Lecturer at University of the Punjab — data structures & algorithms.

Open Source

Tools I maintain and projects I contribute to.

Stack

What I reach for.

Languages
Python · Go · TypeScript · Scala
AI / ML
LangChain · LlamaIndex · OpenAI API · Claude API · RAG · Vector DBs
Data Engineering
Dagster · Apache Spark · Kafka · Luigi · ETL/ELT · Snowflake
Backend
FastAPI · Django · Flask · go-chi · gRPC · GraphQL
Databases
PostgreSQL · MongoDB · Redis · Neo4j · DynamoDB · Pinecone
Cloud & DevOps
AWS · GCP · Kubernetes · Docker · Terraform · GitHub Actions

Have a hard systems problem?

I take on AI platforms, data infrastructure, and backend systems, directly or with the Datum Brain team. I usually respond within a day.

fahad@datumbrain.com