November 19, 2025
Intro to Models: General versus Purpose-Built
Learn to decode what's best for your practice and your clients
Clinician Summary
Model types:
General purpose models (ChatGPT, Claude, Gemini): broad ability, not purpose-built or certified for clinical use.
Domain-adapted or purpose-built models: general purpose base + fine-tuning or retrieval on mental-health data; should be clinician-informed and evaluated for bias, privacy, safety, and clinical quality.
AI scribes today: Usually wrappers around general purpose/foundational models. More on assessment of AI scribes in a future post.
What you need to know for your practice:
For non-clinical tasks that do not involve PHI, start experimenting (if you are reading this, you likely already have) with general purpose tools like ChatGPT or Gemini. For clinical tasks, or when recommending AI-powered tools to clients, start with due diligence: read the company's policies, document your assessment, and contact the company when needed.
Seek to address the question: How rigorously has the company adapted their model for the realities of my practice or the needs of my clients?
Purpose-built is not a protected term and does not guarantee safety, but communicates useful principles to look out for.
Questions you can ask today to evaluate AI tools:
For whom was this tool designed? If the tool serves several populations, for which population is it best adapted, according to your assessments?
Consult the American Psychological Association's AI Tool Guide for a non-exhaustive list of questions for the vendors.
As clinicians, we are all at this point flooded with AI solutions. Where to start, and which tools are worth recommending? For me (Kate), it helped to zoom out first and get a feel for what these systems actually are - and what they are not. In this piece, we will keep it simple: a quick tour of AI models made for everyone, and the ones that are (at least in theory) built with clinicians in mind.
How to think about models for mental health care
General purpose AI
General purpose models[1] (GPMs) include systems like ChatGPT, Claude, and Gemini, which are currently the most visible, but there are other GPMs that receive far less attention[2]. These models have been pre-trained on internet data and are further trained for broad conversational ability. GPMs are often multimodal (text, image, audio, video) and can do many different tasks with prompting or light tuning. You can communicate with the model by typing a question, showing the model a document or screenshot, speaking to it, or even by turning your camera on.
You can think of them as generalists. General purpose means they can handle many different tasks - writing emails, answering questions, summarizing text - without being specialized in any one domain.
Why do AI labs build such large general purpose models instead of many specialized ones? Because that is where the biggest reward (and the biggest market) lies: one system can serve many sectors at once. A single broad model also acts as a platform that others can build on top of and fine-tune, and from a research perspective it is more efficient to scale the training of one such model on massive internet data than to build separate models for every niche from scratch.
None of these models were built for clinical work. (But we know you might ask: What if I remove all identifying client information before using the models? It is a great question, and we plan to answer it in future work.)
As tools like ChatGPT became widely adopted, many clinicians like me naturally tried them out. We were impressed by the non-clinical support - emails, summaries, workshop ideas - but if you used them for clinical work, you might have noticed that the output only seemed accurate; in reality, these models have limits and important pitfalls. That is because large language models are trained to produce the most likely continuation of text, not the most accurate or safest answer. And since they have seen almost no real clinical data (case conceptualizations, notes, or structured clinical reasoning) and are instead trained on the internet, their output for clinical tasks can look polished yet be clinically shaky. To read more on this, see our article on the risks of using LLMs (in French) for the magazine of the Order of Psychologists of Quebec.
These limitations do not mean you should not use general purpose models; they explain why these models cannot simply be dropped into clinical work and trusted. They also help explain the next development: a push to build models that are adapted to therapy, not just to text.
The success of systems like ChatGPT sharply accelerated the ongoing work of a number of (often smaller or research-focused) teams aiming to build models specifically for mental health.
Purpose-Built AI
The idea behind these models is simple: create systems that understand therapy, not just text. These models are called purpose-built, domain-specific, or specialized models[3].
But building such models is not straightforward. Unlike the broad internet data used to train general purpose models, clinical data is scarce, private, legally protected, and ethically sensitive.
There is also a talent gap: developing such models requires collaboration between AI engineers, clinical experts, ethicists, and safety teams - not just technical skill.
Clinicians must be meaningfully involved in development, not just consulted at the end, and models must be evaluated against bias, privacy, safety, and clinical-quality criteria before deployment - not merely tested for usability.
So let us take a step back: given all these challenges, why build purpose-built models at all? Because clinical work requires understanding context, nuance, and risk that general models are not trained to handle. A useful analogy comes from another safety-critical field: aviation speech recognition.
Pilots use highly structured language: "Descending through flight level three-five-zero," "Runway two-seven," "Squawk seven-five-zero-zero." A generic speech-to-text model often mishears these phrases because it has never really learned aviation language, making general purpose voice models unsuitable for the task. Aviation needs speech models trained on pilot communication, not just general English.
AI scribes for therapy face a similar problem. If a voice model does not understand clinical language, it can mis-transcribe risk terms, diagnoses, or shorthand. On top of that, transcription is only the first layer. Text models that go on to summarize session transcripts, assist in case conceptualizations, or draft notes also need training on clinical reasoning itself, not just on generic internet text.
As a clinician, I let go of two assumptions while co-writing this article:
Assumption 1: We need a therapy-specific model built from the ground up.
In practice, most purpose-built models do not start from zero. The approach depends on the type of model being built. We will use examples from two different companies to illustrate this.
For text-based models (Large Language Models, or LLMs), training exclusively on therapy data is neither feasible nor useful; therapy transcripts and notes are limited, highly protected, and not broad enough to teach a model basic language competence. Instead, the process often begins with a general purpose foundation model trained on large internet-scale text, which is then adapted or fine-tuned using mental-health-specific data. The foundation supplies linguistic structure and reasoning; the additional training supplies domain knowledge. Tenor, an AI-powered clinical assistant, appears to have adopted this setup:
Tenor adds psychology-specific intelligence on top of leading AI models, making its guidance highly relevant and practical.
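For readers curious what "adapting a foundation model" looks like mechanically, here is a minimal, purely illustrative sketch of supervised fine-tuning with the open-source Hugging Face libraries. The base model, the data file, and every setting are placeholder assumptions for the sake of illustration; they do not describe Tenor's (or any vendor's) actual pipeline.

```python
# Illustrative sketch only: adapting a small general purpose language model
# on a hypothetical, de-identified, ethically sourced domain corpus.
# Model name, file name, and hyperparameters are placeholders.
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

base = "gpt2"  # stand-in for a much larger foundation model
tokenizer = AutoTokenizer.from_pretrained(base)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(base)

# One training example per line in a plain-text file (hypothetical corpus).
data = load_dataset("text", data_files={"train": "mental_health_corpus.txt"})

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

train_set = data["train"].map(tokenize, batched=True, remove_columns=["text"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="domain-adapted-model",
                           num_train_epochs=1,
                           per_device_train_batch_size=4),
    train_dataset=train_set,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
# The foundation model already supplies language; this step layers in domain knowledge.
trainer.train()
```

Real products layer much more on top of a step like this - retrieval of vetted clinical content, guardrails, and evaluation - which is why the questions later in this post focus on how rigorously the adaptation was done.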
A more ambitious approach is to train a new model from the ground up using a different data mix: still a large amount of general text, but with a much higher proportion of therapy-related material than you would see in a standard general purpose model. This is the claim made for the chatbot Ash by Slingshot AI (always confirm claims with your own due diligence). According to the company's founder, Ash is:
trained on its own large language model that includes hundreds of thousands of hours of clinical conversations and a vast array of therapeutic approaches.
For speech or transcription models - the kind needed for AI scribes - the situation is slightly different. If a company has enough ethically sourced, anonymized clinical audio, it is possible to train, or heavily adapt, a speech model specifically for therapy. Speech recognition relies more on acoustic and vocabulary patterns than broad knowledge, so domain-specific training can meaningfully improve transcription accuracy.
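To make the transcription layer concrete, here is a tiny, purely illustrative sketch that runs a general purpose speech recognition model over a clip of (hypothetical) session audio; the model choice and file name are placeholders, and real scribe products involve far more than this single step.

```python
# Illustrative sketch only: general purpose speech-to-text on a hypothetical clip.
from transformers import pipeline

asr = pipeline("automatic-speech-recognition", model="openai/whisper-small")

# A general model may mishear clinical vocabulary (drug names, shorthand,
# therapy techniques) it rarely saw during training; domain-adapted models
# are trained or fine-tuned on clinical audio to reduce exactly these errors.
result = asr("session_clip.wav")
print(result["text"])
```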
We do not need a model from scratch. We need a model adapted for our clinical realities and for the task at hand, compliant with existing laws and regulations.
Assumption 2: Purpose-built models = safer models.
A model trained and evaluated on clinical data and checked against practice guidance can be safer in practice: fewer transcription errors on drug names and therapy techniques, tighter adherence to protocols for suspected harm to self or others. But safety is not a label; it is the product of task, workflow, and oversight.
The only way to know is evidence from clinical trials: which setup reduces errors and supports better care? Do gains come from training on clinical data, from stronger guardrails, from therapist-in-the-loop review, or from some other combination?
Until we have more research, purpose-built is a promising approach, useful when paired with good workflow design (e.g., the therapist checks every note for accuracy), not a safety guarantee.
Putting This into Practice for Your Practice
Go back to basic principles. Read the privacy policy and check your local regulations. Check whether any certified tools are available for clinical use in your region. Even then, certified may not equal compliant, and you still need to check appropriateness for your practice or your clients[4].
Seek to answer the question below:
How rigorously has the company adapted their model for the realities of my practice or the needs of my clients?
The higher the risk of the activity, the more adaptation matters. You arguably do not need an adapted tool to write an email, but diagnostic tasks call for adapted, purpose-built tools. Purpose-built tools are, at a minimum, built for the single purpose of improving mental health, which is at least reassuring: improving mental health is the company's priority.
See the American Psychological Association's AI Tool Guide to help you evaluate AI tools. We can think of a few questions to add ourselves, including: For whom was this tool designed? (Ideally for therapists, not a whole bucket of health professionals.) And if it serves several user populations: for which population is this tool best adapted, according to your assessments?
A well-adapted tool helps ensure safe, effective, and accurate support, and alignment with regulatory guidelines and essential values such as privacy, transparency, inclusion, public engagement, and expert involvement.
This standard of care helps safeguard your clients and proactively sets up your practice for the future of mental health care.
Footnotes
1. An AI model is a trained computer program or algorithm that learns patterns from data and can make predictions on new, unseen data, sometimes in ways that loosely resemble human intelligence.
2. Microsoft markets Copilot (powered by OpenAI's models, the same family behind ChatGPT) across its products. If you use Word, check your settings - you might already be running it.
3. To our knowledge, no established definition exists; we draw on various sources: the Montreal Declaration for Responsible AI, the World Health Organization, the American Psychological Association, and the College of Alberta Psychologists.
4. For example, when I tested one scribe, its output looked to me like a medical note, not a psychotherapy note. Make sure a tool is adapted for therapists, no matter what the marketing says.
Thanks for reading! If you learned something useful from this post, help us grow by sharing it with a like-minded colleague.