Orca-Agent-RL
- Trained a Qwen3-14B orchestrator model with RL to better coordinate explorer and coder subagents, achieving a 167% relative improvement (7% → 18.25%) on Stanford's TerminalBench, within striking distance of Qwen3-Coder-480B (19.7%); the orchestrator/subagent split is sketched below.
- Scaled RL training to 32x H100s across 4 bare-metal nodes, with 256 Docker environments rolling out in parallel (see the rollout sketch below), achieving stable training with smoothly decreasing entropy.
- Discovered that a simple reward design using only unit-test results outperformed every hand-crafted reward signal; the crafted rewards consistently led to policy collapse during training (minimal reward sketch below).
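
A minimal sketch of the orchestrator/subagent pattern from the first bullet. The routing rule, message format, and subagent names here are illustrative assumptions; in the project the routing decision comes from the RL-trained Qwen3-14B policy, not a hand-written rule.

```python
# Illustrative sketch only: the real orchestrator is an RL-trained LLM policy,
# not the hand-written routing rule shown here.
from dataclasses import dataclass, field


@dataclass
class Orchestrator:
    """Decides which subagent acts next and what instruction it receives."""
    history: list[str] = field(default_factory=list)  # transcript of prior subagent reports

    def route(self, observation: str) -> tuple[str, str]:
        self.history.append(observation)
        # Placeholder decision rule; the trained policy conditions on the full history.
        if "traceback" in observation.lower() or not self.history[:-1]:
            return "explorer", "Survey the repository and summarise the failing behaviour."
        return "coder", "Make the smallest edit that should fix the reported failure."
```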
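
A hedged sketch of the parallel-rollout setup from the second bullet: many isolated Docker containers are stepped concurrently so rollout collection keeps the GPUs busy. The image tag, `run_episode` stub, and thread-pool approach are assumptions for illustration, not the project's actual infrastructure.

```python
# Sketch, not the project's infrastructure: the image tag and run_episode stub
# are placeholders.
import subprocess
from concurrent.futures import ThreadPoolExecutor


def start_env(image: str = "terminal-task:latest") -> str:
    """Start one isolated task container and return its container id."""
    out = subprocess.run(
        ["docker", "run", "-d", "--rm", image, "sleep", "infinity"],
        capture_output=True, text=True, check=True,
    )
    return out.stdout.strip()


def run_episode(container_id: str) -> dict:
    """Placeholder rollout: the orchestrator policy would act inside this container."""
    return {"container": container_id, "reward": 0.0}


def collect_rollouts(num_envs: int = 256) -> list[dict]:
    """Roll out num_envs episodes concurrently, one Docker container each."""
    containers = [start_env() for _ in range(num_envs)]
    try:
        with ThreadPoolExecutor(max_workers=num_envs) as pool:
            return list(pool.map(run_episode, containers))
    finally:
        for cid in containers:
            subprocess.run(["docker", "kill", cid], capture_output=True)
```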
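
A minimal sketch of the unit-test-only reward from the last bullet: the episode's reward is simply whether the task's test suite passes inside its container, with no shaping terms. The test command and `docker exec` invocation are assumptions about the setup.

```python
# Assumed setup: each task container can run its test suite via `pytest`.
import subprocess


def unit_test_reward(container_id: str, test_cmd: str = "pytest -q") -> float:
    """Binary reward: 1.0 if the task's unit tests pass inside the container, else 0.0."""
    result = subprocess.run(
        ["docker", "exec", container_id, "bash", "-lc", test_cmd],
        capture_output=True,
        text=True,
        timeout=600,
    )
    # No shaping, partial credit, or auxiliary signals: just the test outcome.
    return 1.0 if result.returncode == 0 else 0.0
```

Keeping the reward this sparse is what avoided the policy-collapse failure mode that the hand-crafted signals repeatedly triggered.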