AI transcription software is changing the ways we do qualitative data analysis, and we’re not paying attention.
That’s the title of the Ignite presentation I delivered at #aes25cbr, the 2025 Australian Evaluation Society International Conference in Canberra.
I thought I might try transforming that presentation into a blog post, too.
Ignite is an interesting presentation format – 5 minutes, 20 slides that need to be set to auto-advance every 15 seconds. It forces you to get really precise about what you want to say, and how you pace your talk, because there’s not much room to move! Blog posts are more flexible, but I’ll mostly keep this to what I said during the Ignite session.

That’s the one-sentence version. Here’s the 5-minute version to add more detail, but I have a 20-minute version too, haha.

How we analyse data affects findings.
I feel like we don’t talk about this enough when we talk about qualitative analysis. We tend to just say “oh we analysed it, and this is what we found”, but the method matters.
And when the method includes AI tools, that matters too.

Tape, transcribe, code, interpret.
This is a pretty standard way to explain the process for qualitative data analysis (though we’d usually say “record” rather than “tape” these days). There’s loads more that can be unpacked in each of these steps, but the one I’m focused on here is the “Transcribe” step.
Not many evaluation sources give a lot of time to the “Transcribe” step.
But we tend to forget that what we’re doing in that step is data transformation.

Transcription is data transformation.
We’re turning an audio or video representation of our data into a text representation of our data. Any time you transform data, you run the risk of changing the meaning in your data.
And there are limits to how much meaning text can actually capture from spoken communication.

Spoken communication is complex.
Spoken and written language are different from each other. We use different words to convey meanings when we’re speaking and when we’re writing. And when we’re speaking, we have more ways of communicating than just word choice.
Just writing down the words a person spoke doesn’t convey all the meaning they were communicating.
Tone, emphasis, body language, silences, volume, laughter – these are all ways that we add or subvert meaning when we’re speaking.
So it’s worth thinking about how we capture that meaning if we’re transforming spoken data into text.

This is the bit where the blog post format will struggle to communicate what I said during my presentation. Ironic, right? Go with me on this…
Say you ask me about an experience I had with something.
I say “It was just…” then I pause, then I sigh, and the tone in my voice becomes kind of defeated. I add, “It was just one of those things,” and I look down and away.
Here’s how MS Teams captured that in a transcript:
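“It was just. It was just one of those things.”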

You ask me about a different experience.
On this occasion, I smile. I shake my head, remembering, and say “It was jst-” I’m speaking quickly and I stumble over my words in my enthusiasm. “-it was just one of those things!” and I’m laughing and grinning by the time I’ve finished the sentence.
MS Teams captured that in a transcript like this:
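“It was just it was just one of those things.”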

I think we can see some shortcomings here.

Transcripts are a pretty blunt instrument.
A picture is worth a thousand words, after all. What about a song?
But transforming our data into text is genuinely really useful.
The coding, interpretation, synthesis and reporting steps are almost always done using text too. So having our data in that same format absolutely makes life easier.

If you dive into the body of literature on transcription (and yes, obviously there is a body of literature on transcription), you’ll find that this inability of text to fully capture the meaning in spoken language has been explored for a while.
There’s lots of advice about different ways to capture and describe non-word information alongside the words in your transcript. Some people use a systematic code of italics, bolding and indenting to show things like volume changes or emphasis.
Simply adding descriptions of key things can help. Punctuation can add meaning, too. Something like this:
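“It was just…” [long pause; sighs; tone flat, defeated] “…it was just one of those things.” [looks down and away]

“It was jst-” [speaking quickly, smiling, shaking head] “-it was just one of those things!” [laughing, grinning]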

These might be better ways to represent the opinions I gave before. They’re still just representations, but they do hold more information than the words alone.
How you do your transcript depends on what matters in your data.
And that depends on what your evaluation questions are. Or your research questions. And it depends on who you’re asking.
You can’t capture everything that communicates meaning from spoken data in text. But you can add some information that makes your transformed data richer.
For example, if one of the things you’re looking to understand is the emotional response that people had when something happened, you’ll need to make sure your transcript captures emotions in some way.
If the people you’re speaking with come from a culture where there’s a lot of meaning conveyed through silences, then your transcript needs to capture those silences.

Making those choices on purpose matters. Transcription choices need to be active, deliberate and considered. You want the way you do your transcript to match what you need from your data, so that you stand the best chance of being able to find what matters.

AI transcription takes those choices away from us.
Rather than making deliberate, considered choices about how to represent our data, we’re just going with the default option, without thinking about it.
And it’s so easy, and we’re all so time-poor, and the tool is just built into Teams or Zoom, and so most of us aren’t realising that the AI is stripping our data of some of its meaning.

This is all we need to do to get a transcript.
“Record and transcribe”. It used to be such a laborious and active process, and now it’s so simple, and passive.
But there’s no choice involved.
The software doesn’t have mechanisms to add body language. It struggles with laughter. It can’t handle tone. It doesn’t note pauses.
It just does words. Only words.
And it doesn’t always get those words right.

These shortcomings are important to think about.
All of us who have played with this type of software have had a laugh at the sometimes-hilarious ways it gets the words wrong. (My favourite was when the AI switched the name of the government agency Resilience NSW to Brazilians NSW.) Often we can figure out what it’s meant to be. But not always.
There are lots of things that make AI transcription worse: jargon, new words, slang and people with similar voices can all confuse the software, sometimes to the point where the transcript is unintelligible.
Did you know that there can be censors built in, too? I tested it on Microsoft Teams – when I swore during a recording, my transcript just showed “******* ****”. That’s more polite, sure, but those words are usually chosen to express a particular meaning, and censoring them makes that meaning harder to interpret.
We also have to think about accents. Say we have 10 vox-pop interview transcripts, and one of the interviewees had an accent that the AI didn’t understand, so the transcript is nonsense. Do we just disregard that data? What implications does that have for bias and equity in our analysis?

To be clear – I am not advocating for a return to hand transcription.
It’s awful, and I’ll be happy never to do it again. It takes SUCH a long time! It’s tedious, and it really makes you notice how many times people say “like, um” when they’re speaking.
Clicking “download transcript” is a million times better.
(Although, after I gave this talk, an experienced colleague told me that for them, hand transcription takes the same amount of time as fixing an AI transcript – so it’s definitely something you can get faster at! They recommended a foot pedal, which I’ve also used for hand transcription in the past, and it absolutely helps. Anyway…)

But it comes back to active choices.
When you’re transcribing by hand, you’re actively thinking about your data, and making transcription choices that are aligned to your evaluation questions.
When you’re transcribing with AI, you’ve given those decisions to a software developer who probably doesn’t know much about qualitative data analysis.

We need to augment AI transcripts so that they include the data we need.
That data depends on our evaluation questions.
We need to consider whether an AI transcript will capture all the data needed to answer our questions. And sometimes it will! Sometimes the words will capture all the data we need.
But what about when they don’t?

There’s a few ways to do this.
We can be more active in choosing which software to use. Rather than defaulting to Teams (which we’re all using anyway, and which is already deployed in most workplaces – that’s what makes it such an easy default), we can choose more purpose-built software. I like Otter because it lets you click on a section of text and hear the matching audio, and add descriptions at that point in the audio. But I’m sure there are others. The idea is to use software that makes it easier for you to add more meaning from your recording to your data.
(Another colleague told me after my talk that there is AI transcription software out there that CAN handle things like tone, laughter, pauses, even body language. My point remains – choose better software, not just the easiest and most obvious option. Make a deliberate choice!)
Another option is to code from the recording. Listen to or watch the recording while you’re coding, and use the transcript to make your notes – code meaning based on what you hear and see, not only on what’s written in the transcript. That lets you better understand the meaning that the person wanted to convey, because you’re not ignoring the tone and gestures.
You also don’t have to use a transcript. Controversial statement haha – I know some people who would have a problem with that. But I think that for some evaluations, just using note-taking to capture the meaning from an interview is ok. Notes are, again, a representation of what was said, but can sometimes capture more of the meaning. There are scenarios where it’s not appropriate or possible to record an interview, so you have to use notes anyway. And analysing notes is faster than analysing transcripts, so for lower-risk lower-resource evaluations, it can be a more proportionate approach.
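And if you’re comfortable with a little scripting, you can claw back some of what the default tools drop by post-processing the transcript yourself. Here’s a minimal sketch of the idea, assuming you’re using the open-source Whisper model (which can return word-level timestamps). It rebuilds the transcript with long pauses marked instead of silently dropped – the file name and the 1.5-second threshold are placeholders you’d tune to what silence means in your data.

```python
# A minimal sketch: rebuild a Whisper transcript with long pauses marked.
# Assumes the open-source openai-whisper package (pip install openai-whisper);
# "interview.mp3" and the 1.5s threshold are placeholders for illustration.

import whisper

PAUSE_THRESHOLD = 1.5  # seconds of silence worth flagging; tune to your data

model = whisper.load_model("base")
result = model.transcribe("interview.mp3", word_timestamps=True)

pieces = []
prev_end = None
for segment in result["segments"]:
    for word in segment["words"]:
        # If there's a noticeable gap since the last word, note it in-line
        if prev_end is not None and word["start"] - prev_end > PAUSE_THRESHOLD:
            pieces.append(f" [pause, {word['start'] - prev_end:.1f}s]")
        pieces.append(word["word"])  # Whisper includes a leading space on each word
        prev_end = word["end"]

print("".join(pieces))
```

It won’t give you tone or body language, but at least the silences stop disappearing from your data.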
I’m sure there’s other options, too.

AI transcription is really helpful.
And I’m not saying don’t use it.
For many evaluations, the AI transcript is perfectly serviceable, and will capture the data you need.
What I’m saying is to think more actively about what decisions the AI is making for you, and whether those decisions serve your evaluation purpose.
Active choices based in our evaluation purpose and evaluation questions get us better quality data, better quality analysis, and better quality findings.
So, let’s think it through, and make the right choices for our projects.
