Question 1

Do we need a large dataset to get started?

Accepted Answer

Not always. The minimum depends heavily on the task. For text classification into well-defined categories with clear examples, we can achieve strong results from 200 to 500 labelled examples using few-shot fine-tuning or low-rank adaptation (LoRA). For complex named entity recognition (NER) or information extraction tasks, 1,000 to 5,000 annotated examples are a more realistic floor. For regression or forecasting on tabular data, the minimum depends on the number of features and the signal-to-noise ratio: more features and noisier targets both call for more rows. During the data audit we tell you plainly if we think the dataset is too small, and what it would take to reach a viable size.
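One reason a few hundred examples can be enough with LoRA is that only a small pair of low-rank matrices is trained per adapted layer, while the original weights stay frozen. The sketch below is simple arithmetic for a single weight matrix; the layer size (4096 x 4096) and the rank (8) are illustrative assumptions, not figures from a specific project.

```python
def full_params(d_in: int, d_out: int) -> int:
    """Number of weights in one dense d_out x d_in matrix."""
    return d_in * d_out


def lora_trainable_params(d_in: int, d_out: int, rank: int) -> int:
    """Weights in the two low-rank factors: A (rank x d_in) and B (d_out x rank)."""
    return rank * (d_in + d_out)


# Assumed, illustrative sizes for one layer.
D_IN, D_OUT, RANK = 4096, 4096, 8

full = full_params(D_IN, D_OUT)
lora = lora_trainable_params(D_IN, D_OUT, RANK)
fraction = lora / full

print(f"full fine-tuning: {full:,} trainable weights")
print(f"LoRA (rank {RANK}): {lora:,} trainable weights")
print(f"fraction trained: {fraction:.4%}")
```

With these assumed sizes, the adapter trains well under 1% of the weights in that layer, which is part of why it is less prone to overfitting on small labelled sets.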

Question 2

Can you improve an existing model that isn't performing well enough?

Accepted Answer

Yes — this is a common engagement. We start by auditing the failure modes: are the errors clustered in specific data subsets? Is the training data noisy or mislabelled? Is the evaluation metric misaligned with the business objective? Often, the issue is not the model architecture but the data quality or the loss function. We fix the root cause rather than just retraining on the same problematic data.
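A first pass at "are the errors clustered in specific data subsets?" can be as simple as grouping predictions by a metadata field and comparing error rates. The sketch below assumes each record carries a subset label (here a hypothetical `channel` value), the true label, and the model's prediction; the sample records are made up for illustration.

```python
from collections import defaultdict


def error_rate_by_subset(records):
    """Return {subset: error_rate} for (subset, y_true, y_pred) tuples."""
    totals = defaultdict(int)
    errors = defaultdict(int)
    for subset, y_true, y_pred in records:
        totals[subset] += 1
        if y_true != y_pred:
            errors[subset] += 1
    return {subset: errors[subset] / totals[subset] for subset in totals}


# Made-up evaluation records: (channel, true label, predicted label).
records = [
    ("email", "spam", "spam"),
    ("email", "ham", "ham"),
    ("sms", "spam", "ham"),
    ("sms", "ham", "ham"),
    ("sms", "spam", "ham"),
    ("email", "ham", "ham"),
]

rates = error_rate_by_subset(records)
for subset, rate in sorted(rates.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{subset}: {rate:.0%} error rate")
```

A subset with a much higher error rate than the rest is a hint to inspect its labels and coverage in the training data before changing the model.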

Question 3

Where will the model run — on our servers or a cloud API?

Accepted Answer

Either. We evaluate the options with you. Cloud-hosted APIs (OpenAI, Anthropic, Together AI) need no infrastructure and are fast to integrate, but they carry per-token costs and mean your data leaves your perimeter. Self-hosting on AWS or GCP gives you full control, more predictable costs at scale, and data sovereignty. Edge deployment (on-device or on-prem) removes the network round trip and the cloud dependency entirely, although inference time on the device itself still counts. We recommend the right architecture for your volume, latency requirements, budget, and compliance constraints.
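The trade-off between per-token pricing and a fixed self-hosting bill comes down to a break-even volume. The sketch below is a rough calculation only: the price per million tokens and the monthly self-hosting cost are hypothetical placeholders, and it ignores engineering time, which often dominates in practice.

```python
def monthly_api_cost(tokens_per_month: float, price_per_million_tokens: float) -> float:
    """Cost of a pay-per-token API at a given monthly volume."""
    return tokens_per_month / 1_000_000 * price_per_million_tokens


def break_even_tokens(fixed_monthly_cost: float, price_per_million_tokens: float) -> float:
    """Monthly token volume above which the fixed-cost option is cheaper."""
    return fixed_monthly_cost / price_per_million_tokens * 1_000_000


# Hypothetical figures, for illustration only.
PRICE_PER_MILLION = 2.0      # currency units per 1M tokens on a hosted API
SELF_HOSTED_MONTHLY = 3000.0  # currency units per month for GPU instances

threshold = break_even_tokens(SELF_HOSTED_MONTHLY, PRICE_PER_MILLION)
print(f"break-even volume: {threshold:,.0f} tokens/month")

for volume in (500_000_000, 5_000_000_000):
    api = monthly_api_cost(volume, PRICE_PER_MILLION)
    cheaper = "API" if api < SELF_HOSTED_MONTHLY else "self-hosted"
    print(f"{volume:,} tokens/month: API costs {api:,.0f}, cheaper option: {cheaper}")
```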

Question 4

How do you handle data privacy when training on sensitive information?

Accepted Answer

We operate under a strict data handling protocol. Training data is processed only in agreed environments, never stored beyond the project term, and never used to improve any other model. For regulated industries, we can work entirely within your private cloud under a Business Associate Agreement (BAA) or equivalent agreement. Where the underlying records must stay confidential, we can also train on anonymised data subsets or use differentially private training.
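As one illustration of preparing records before they reach a training environment, direct identifiers can be replaced with keyed hashes. This is pseudonymisation, a weaker step than full anonymisation or differential privacy, and it is shown here only as a sketch; the field names, the sample record, and the key handling are all assumptions.

```python
import hashlib
import hmac


def pseudonymise(value: str, secret_key: bytes) -> str:
    """Replace an identifier with a keyed SHA-256 hash (truncated for readability)."""
    digest = hmac.new(secret_key, value.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]


def scrub_record(record: dict, id_fields: set, secret_key: bytes) -> dict:
    """Return a copy of the record with the listed identifier fields pseudonymised."""
    return {
        key: pseudonymise(str(value), secret_key) if key in id_fields else value
        for key, value in record.items()
    }


# Placeholder key: in practice this would come from a key store, not source code.
KEY = b"replace-with-a-secret-from-your-key-store"

# Hypothetical record and field names.
record = {"patient_id": "P-1042", "age_band": "40-49", "note": "follow-up in 6 weeks"}
clean = scrub_record(record, {"patient_id"}, KEY)
print(clean)
```

The same identifier always maps to the same token, so records can still be joined, but free-text fields such as `note` may contain identifiers too and need separate treatment.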

Question 5

What is the difference between fine-tuning and RAG? Which do I need?

Accepted Answer

Fine-tuning adjusts the model's weights on your data, making it better at a specific task or domain by 'baking in' knowledge. RAG retrieves relevant documents at inference time and feeds them to the model as context, keeping the model weights unchanged. Fine-tuning is better for style, format, tone, and tasks that require consistent structured output. RAG is better for factual question-answering over a knowledge base that changes frequently — you update the index, not the model. Most production systems benefit from both: a fine-tuned model that is fluent in your domain, combined with RAG for up-to-date factual grounding.
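The retrieval half of RAG can be sketched in a few lines: score each document against the question, keep the best matches, and place them in the prompt. The example below uses a plain bag-of-words cosine similarity as a stand-in for the embedding search a production system would normally use; the documents and the prompt layout are made up for illustration.

```python
import math
from collections import Counter


def tokenize(text: str) -> list:
    return text.lower().split()


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query: str, documents: list, k: int = 1) -> list:
    """Return the k documents most similar to the query."""
    q = Counter(tokenize(query))
    ranked = sorted(documents, key=lambda d: cosine(q, Counter(tokenize(d))), reverse=True)
    return ranked[:k]


def build_prompt(query: str, documents: list, k: int = 1) -> str:
    """Put the retrieved text in front of the question; model weights are untouched."""
    context = "\n".join(retrieve(query, documents, k))
    return f"Context:\n{context}\n\nQuestion: {query}"


# Made-up knowledge base: updating it means editing this list, not retraining.
docs = [
    "refunds are processed within 14 days of the return",
    "shipping to europe takes 3 to 5 business days",
    "support is available on weekdays from 9 to 5",
]

top = retrieve("how long do refunds take", docs)
print(build_prompt("how long do refunds take", docs))
```

Changing an answer here only requires editing `docs`, which mirrors the point above: with RAG you update the index, not the model.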

Question 6

How long does a machine learning project typically take?

Accepted Answer

A focused classification or extraction model — well-scoped task, clean labelled data, clear evaluation criteria — typically takes 3 to 6 weeks from data audit to a deployed, monitored endpoint. A full forecasting or recommendation system with feature engineering, pipeline infrastructure, and A/B testing typically takes 8 to 16 weeks. MLOps infrastructure buildouts (if you have existing models that need production-grade deployment) typically take 4 to 8 weeks separately.

Machine Learning & Model Fine-Tuning
