Dan Austin

Data Scientist, London

Data Scientist extremely passionate about building safe & composable agentic AI using large scale RL.

Current Role

Coming soon

Recent LLM Projects

A selection of AI/ML projects I've been working on

01
  • Trained Qwen3-14B orchestrator model using RL to better coordinate explorer & coder subagents, achieving 167% relative improvement (7% → 18.25%) on Stanford's TerminalBench—within striking distance of Qwen3-Coder-480B (19.7%).
  • Scaled RL training to 32x H100s across 4 bare-metal nodes with 256 Docker environments rolling out simultaneously, achieving stable training with smooth entropy decrease.
  • Discovered that simple reward design using just unit tests outperformed all crafted reward signals, which consistently led to policy collapse during training.
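
A minimal sketch of what a unit-test-only reward could look like, assuming each rollout ends with a list of pass/fail results from the task's unit tests. The function name and the pass-fraction formula are illustrative assumptions, not the project's actual reward code, which may be binary or weighted differently.

```python
from typing import Sequence


def unit_test_reward(test_results: Sequence[bool]) -> float:
    """Hypothetical reward: fraction of unit tests passed by one rollout.

    An empty result list (e.g. the environment crashed before any test
    ran) is scored as 0.0 rather than raising an error.
    """
    if not test_results:
        return 0.0
    return sum(test_results) / len(test_results)


if __name__ == "__main__":
    # Three of four tests pass for this rollout.
    print(unit_test_reward([True, False, True, True]))
```

The appeal of a signal like this is that it has no hand-tuned terms for the policy to exploit: the only way to raise the score is to make more tests pass.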
02

Orchestrator: Multi-agent coding system

  • Achieved #12 ranking on Stanford's TerminalBench leaderboard with my novel multi-agent architecture, outperforming Claude Code using Sonnet-4.
  • Implemented orchestrator-explorer-coder architecture where strategic delegation and explicit context artifacts enable complex multi-step problem solving.
  • Open-sourced the complete system with evaluation scripts, enabling researchers to build on multi-agent coordination patterns.
03

Terminal Agent RL

  • Implemented GRPO-based RL training infrastructure for training long-horizon terminal-based coding agents, scaling to 32x H100s across 4 nodes with Docker isolation per rollout.
  • Achieved highest-scoring Qwen3 agent on Stanford's TerminalBench leaderboard using an untrained 32B model through prompt engineering and tool design, outperforming Terminus-Qwen3-235B.
  • Benchmarked multiple LLMs as behavioural judges, finding Claude-4-Sonnet most accurate for evaluating desired agent behaviour and tool usage patterns.
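
For readers unfamiliar with GRPO, the core idea is that each rollout is scored relative to the other rollouts sampled for the same task, so no separate value network is needed. The sketch below shows only that group-relative advantage step. The function name, the use of population standard deviation, and the `eps` value are assumptions for illustration; real training code varies on these details and adds the clipped policy-gradient loss on top.

```python
from statistics import mean, pstdev
from typing import List


def group_relative_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """Normalise rewards within one group of rollouts for the same task.

    Each advantage is (reward - group mean) / (group std + eps), so
    rollouts that beat their siblings get a positive advantage.
    Assumes `rewards` is non-empty.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption of this sketch
    return [(r - mu) / (sigma + eps) for r in rewards]


if __name__ == "__main__":
    # Four rollouts of one task: two passed the tests, two failed.
    print(group_relative_advantages([1.0, 0.0, 0.0, 1.0]))
```

When every rollout in a group receives the same reward, all advantages are zero and the group contributes no gradient.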
04

T-Bench Agentic Data Pipeline

  • Orchestrated 20+ parallel Claude Code instances to generate hundreds of validated RL training tasks & environments for Terminal Agent RL project with distributed task management.
  • Designed 3-stage multi-agent pipeline with specialised agent roles, combining AI generation abilities with programmatic validation to ensure high quality training data at scale.
05

Calculator Agent RL

  • Trained Qwen2.5 models (0.5B and 3B) via multi-turn GRPO to use a recursive calculator tool, achieving a 62 percentage point accuracy increase (27% → 89%) on custom evaluation suite.
  • Designed hybrid reward system combining Claude-3.5-Haiku judge for tool-use quality with programmatic answer verification, enabling stable multi-turn RL training.
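
A hybrid reward of the kind described above can be sketched as a weighted sum of a judge score and a programmatic correctness check. Everything specific here is an assumption for illustration: the function name, the 0-to-1 judge scale, the equal default weighting, and the numeric tolerance. The judge call itself is left out, and its score is passed in as a plain number.

```python
def hybrid_reward(
    judge_score: float,
    predicted: float,
    expected: float,
    judge_weight: float = 0.5,
    tol: float = 1e-6,
) -> float:
    """Hypothetical blend of an LLM-judge score and an exact-answer check.

    judge_score: tool-use quality in [0, 1], as rated by a judge model.
    predicted / expected: the agent's final numeric answer and the truth.
    """
    # Clamp the judge score so a malformed rating cannot dominate.
    judge_score = min(max(judge_score, 0.0), 1.0)
    # Programmatic verification: 1.0 only if the answer matches within tol.
    answer_correct = 1.0 if abs(predicted - expected) <= tol else 0.0
    return judge_weight * judge_score + (1.0 - judge_weight) * answer_correct


if __name__ == "__main__":
    # Good tool use but a wrong final answer earns only the judge share.
    print(hybrid_reward(judge_score=0.9, predicted=41.0, expected=42.0))
```

Keeping the correctness term programmatic means the judge can shape how the tool is used without being able to reward a wrong answer as if it were right.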

Previous Projects

Client work and consultancy projects (2022-2024)

Llama-8B NVDRS

Government of Alaska

  • Fine-tuned Llama-3.1-8B using SFT with LoRA adapter for government-specific use case, developing synthetic data pipeline and custom evaluation dataset
  • Deployed model on local infrastructure (2x4090s) ensuring data privacy and compliance requirements

Flow Legal AI Platform

Flow Legal

  • Built AI agent system for SME legal support, handling multi-step reasoning and custom document generation
  • Led technical architecture and development, resulting in 12-month advisory extension

Meta Grant Architecture

Clean Cooking Alliance

  • Designed technical architecture and system requirements for Meta Llama Grant application for global health charity
  • Developed AI strategy to leverage Llama models for improving clean cooking adoption in developing countries

Lead Generation Agent

AiTuning

  • Built AI system discovering 9,600 high-quality B2B leads/day, automating process from 8 leads/hour manual work
  • Fine-tuned Llama 3.1 8B using SFT with custom evaluation suite and continuous learning from user feedback

Todo List Fine-Tuned LLM

AiTuning

  • Fine-tuned LLMs using SFT to convert speech into structured todo lists with 96% accuracy, 2x better than GPT-4 at 20x lower cost
  • Developed "data engine" approach using iterative fine-tuning and synthetic data generation to achieve 10x performance improvement

AI Contract Reviewer

Techtracts

  • Developed GPT-4 based system to analyze 10-25 page legal contracts, identifying high-risk clauses with plain language explanations
  • Enabled entry into AI Forge tech accelerator by democratizing expert-level contract analysis for non-legal users

Testimonials

What clients say about working with me

"Dan is an exceptionally talented AI engineer who delivered high-quality work with efficiency and expertise. He's a creative problem solver with the skills and clear communication required to make collaboration seamless. Dan works to understand your problems, and if you get the chance to work with him, take it!"

Tyler, NVDRS Project

"Dan was an amazing find for me! He is smart, talented, very professional, and very timely. His work was top notch in every way!! I hope to work with him on many future projects for our company. If you need a talented AI developer that actually understands how to bring an idea to life, I suggest hiring Dan!!"

Pete, AI Social Media Agent

"We're so happy with Dan's work that we've asked him to stay on as an advisor for the next 12 months because we find his input to be critical to our ongoing success."

Guy, Flow Legal

"We have been impressed by his professionalism and warmth and would recommend him whole-heartedly."

Adam, Sphinx AI

"Dan understands the importance of meeting clients' requirements and delivers excellent services. I would highly recommend."

Meera, Techtracts

Most Recent Employer

Data Scientist | Fabric (Aviva Insurance)

Dec 2024 - Oct 2025

  • Independently designed and built Aviva's first customer-facing AI agent from conception to production deployment, achieving 4.5/5 average customer satisfaction scores.
  • Built comprehensive evaluation framework combining LLM judges with software verification, adopted internally for agent quality assurance.
  • Championed development of "AI engine" feedback system, continuously expanding evaluation suites from production edge cases to enable systematic model improvements and directly inform agent capability roadmap.