Natural language processing hits and misses: Learnings from our alpha launch

In February and March, we trialed an alpha version of an AI-enabled coaching system with twenty users. Our system let users define custom worksheets, supplemented with natural language processing capabilities that help people make sense of their responses to worksheet items. Along the way, we ran into challenges that anyone seeking fast, real value from natural language processing models is likely to face.

As a two-person startup, we don’t have the time or resources to spend weeks or months exploring our datasets and trying out different algorithmic approaches, as a data science team at a bigger company could. Furthermore, we didn’t even start with any data! Were we crazy to think we could get quick value out of NLP? No, I don’t think we were. But we learned through our alpha that there’s a need for better infrastructure and tools for companies that want to quickly build and release NLP-powered features in their applications.

How AI can help coaches and counselors

Current evidence-based counseling techniques, and many derivative life coaching techniques, use structured interactions that can be implemented as worksheets. For example, cognitive behavior therapy often involves a thought record that helps the client identify and reframe negative or dysfunctional thoughts. Motivational interviewing, a popular technique for treating substance use disorders, often starts with values exploration, so that the client can determine what is most important to them before deciding how to address their presenting issue.

In contrast to digital mental health approaches that use chatbots to reproduce a therapeutic conversation with a human, we centered our interactions on worksheets, with conversation around the respondent’s answers to worksheet items. We hypothesized that our AI could be most helpful by summarizing, distilling, and evaluating the worksheet answers, keeping the object of the conversation separate from the conversation itself.

Few-shot learning for text classification

For example, we wanted our AI to determine what high-level personal value a respondent might be reflecting when talking about what they did and didn’t want in their life. The value could be mentioned explicitly in the text, as below:

It is important to me to have financial security.

Or, the value could be latent, not specifically mentioned:

I need my job to provide me with enough money to support my children without any worry.

We thought we could build a quick text classification model to do this, with only a few examples. This is sometimes known as “few-shot learning.”

Using Amazon Comprehend with a few examples that we wrote ourselves, we were able to produce a model, but it didn’t make good suggestions. We also trained our own multiclass classification model using naive Bayes. It didn’t perform much worse than the Comprehend model, and it was free to run once we got it deployed to AWS.
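To make the approach concrete, here is a dependency-free sketch of a multinomial naive Bayes text classifier of the kind we trained. The example texts and value labels are invented for illustration; they are not our actual training data, and a real few-shot setup would need more and better examples per label.

```python
import math
from collections import Counter, defaultdict

def tokenize(text):
    return [w.strip(".,!?").lower() for w in text.split()]

class NaiveBayesTextClassifier:
    """Minimal multinomial naive Bayes with Laplace smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        self.vocab = set()
        for text, label in zip(texts, labels):
            for tok in tokenize(text):
                self.word_counts[label][tok] += 1
                self.vocab.add(tok)
        return self

    def predict(self, text):
        tokens = tokenize(text)
        total = sum(self.class_counts.values())
        best_label, best_score = None, float("-inf")
        for label, count in self.class_counts.items():
            # log prior + sum of Laplace-smoothed log likelihoods
            score = math.log(count / total)
            n_words = sum(self.word_counts[label].values())
            for tok in tokens:
                score += math.log((self.word_counts[label][tok] + 1)
                                  / (n_words + len(self.vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label

# A handful of hand-written examples per value (illustrative only)
train_texts = [
    "It is important to me to have financial security.",
    "Saving enough money for retirement matters to me.",
    "I want to make my own decisions about my schedule.",
    "Nobody should tell me how to run my life.",
    "I love picking up new skills and ideas.",
    "Reading about new subjects energizes me.",
]
train_labels = ["Wealth", "Wealth", "Autonomy", "Autonomy",
                "Learning", "Learning"]

clf = NaiveBayesTextClassifier().fit(train_texts, train_labels)
print(clf.predict("Money and financial security matter."))  # Wealth
```

With so few examples, predictions lean heavily on exact word overlap, which is exactly why the latent-value case ("enough money to support my children") is hard for this kind of model.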

I could imagine a better way to do it, a way that bootstraps from zero examples, just starting with a lexicon:

  • Start out with a lexicon of values, e.g. Autonomy, Wealth, Learning, etc.
  • If a value appears in an item response, tag the response with it
  • Once you have enough data, train a text classification model with the tagged data, dropping out the actual word from the text input
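The first two steps above could be sketched like this. The lexicon here is hypothetical — the labels and trigger words are invented for illustration — and a real version would need stemming and multi-word triggers:

```python
# Hypothetical lexicon: value label -> trigger words (invented for illustration)
LEXICON = {
    "Autonomy": {"autonomy", "independence", "freedom"},
    "Wealth": {"wealth", "money", "financial"},
    "Learning": {"learning", "education", "knowledge"},
}

def weak_label(response):
    """Tag a response with any value whose trigger word it contains,
    masking the trigger so a model trained later can't just memorize it."""
    labels, masked = [], []
    for tok in response.lower().split():
        word = tok.strip(".,!?")
        hit = next((value for value, words in LEXICON.items()
                    if word in words), None)
        if hit:
            labels.append(hit)
            masked.append("<VALUE>")
        else:
            masked.append(tok)
    return labels, " ".join(masked)

labels, masked = weak_label("It is important to me to have financial security.")
print(labels)   # ['Wealth']
print(masked)   # it is important to me to have <VALUE> security.
```

Dropping the trigger word from the training input (step three) is what forces the eventual classifier to learn the surrounding context rather than the lexicon itself.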

I’m not sure how well that would work, but I’d like to try it on the data we now have from our alpha. It seems like an approach that could work for many NLP cases where you want to apply a specific label to text but aren’t starting with much labeled data. In the counseling and coaching space, for example, you might want to apply Feeling/Emotion labels and Action labels in addition to Values.

So our attempt to do Value Suggestions was largely a fail. But what wasn’t a fail was…

Sentiment analysis for cognitive reframing

The creator of cognitive behavior therapy, Aaron T. Beck, realized that when people take overly negative views of their situations, their emotional life suffers. Tracking and disputing automatic negative thoughts is a key foundation of CBT.

It turns out computers are pretty good at identifying whether a statement is negative, positive, or neutral. Existing sentiment analysis routines aren’t perfect, but they are good enough to give a person feedback on whether they’re letting negativity drive their response to a particular situation.

Using an out-of-the-box sentiment analysis capability from the Python nltk library, we implemented a feature that rated the positivity or negativity of a response to a worksheet question. We created a worksheet where you could describe a situation, then reframe your description to make it more positive, more negative, or neutral.

The hardest part of making this work was actually deploying nltk to AWS so we could call the sentiment classification routine on incoming responses. This felt to me like where ML and NLP should be going: use off-the-shelf components to build useful guidance, without having to do any model training or evaluation yourself.

We got good feedback from our alpha participants about this feature. I found it useful myself, exploring the different ways I could think about a particular situation. To me, it seems healthiest to frame a situation as neutrally as possible; life coach Brooke Castillo uses this technique to good effect in her coaching model. No need to play Candide and paint things as rosier than they are! If you can neutralize your understanding of a situation, you are in a good position to think about it flexibly.

What’s next

Like most startups, we took on too much. Building an AI-enabled coaching/counseling system that involved (1) back-end REST services, (2) a React front-end, and (3) useful AI was too much. Frankly, the AI suffered because we were so busy building the front-end React app and the back-end Flask services. To narrow our scope, one option would be to focus on the coaching and counseling system first, building out worksheets without AI enablement. Alternatively, we could delve further into the ML/NLP/AI infrastructure and possibilities we found lacking.

The Venn diagram overlap of my interests, my cofounder John’s interests, and what the world needs appears to be the ML/NLP/AI infrastructure and MLOps space, so that’s where we are going to aim now.

We’re planning to focus on MLOps and easy AI/ML/NLP, with the intention of building the infrastructure and tools we couldn’t find when we wanted to get immediate use out of NLP: without starting with a massive dataset, without doing a lot of exploratory analytics, and without training a bunch of models ourselves. We also discovered that no standard way existed to create a feedback loop from our application back into dataset preparation and model training. We’re not sure there can be a standard way, or what tools might support it, but we are going to experiment and find out!

(See my post on an idealized machine learning development lifecycle for the importance of such feedback loops.)

Is it unreasonable to think that putting effective ML capabilities into use could be as easy as building a REST back-end with a React front-end and deploying it in the cloud? I don’t think so. The components already exist. The world of natural language processing is especially ripe for this, given pretrained language models like GPT-3 and services like Amazon Comprehend. There’s no reason businesses should have to find or generate massive, fully labeled datasets and then wait for lengthy data science research projects to complete before getting value from NLP.

We will likely continue to build our AI for mental health that can learn about Feelings, Thoughts, Actions, Motivations, and other concepts important in the coaching and counseling space, but as an interesting domain use case rather than something we can productize and market. We’ll likely investigate other domains too. Our mission, for the moment, is to enable fast AI/ML/NLP success using off-the-shelf components in the cloud. We hope to make it possible for smart software and data engineers to implement NLP-based capabilities without getting a data scientist involved. Of course, data scientists will still be needed for the fanciest domain-specific capabilities. But we think there’s lots of low-hanging fruit to be picked by engineering teams that don’t include data scientists.

Sign up for our newsletter to stay up to date with our progress:

If you want to learn about how you can work with us as a partner or client, get in touch!