A product management process for data science

You can’t use standard product management workflows for AI/ML development. That’s because until you’ve looked at the data and tested some modeling approaches, you don’t know whether you can actually build the thing you’d like to build.

There is a common dysfunction that happens with data science projects: starting with a likely dataset rather than a business problem to solve. A data science team, or sometimes a lone data scientist, will be pointed at a production database or data warehouse and let loose to create something of value. The data scientist will find some labeled data, build a supervised machine learning model, and share how great it is. “I’ve got 85% accuracy!” Or better… “I’ve got an F1 score of .75!”

High precision and recall or not, this is no way to ensure your data science efforts make a business impact. Instead, you need to start by finding the overlap between high-value business problems to solve and what AI/ML approaches can do for you. This is akin to a person figuring out their career. You need to find the overlap between market need (a business problem that, if solved, provides a lot of value) and your skillset (in this case the hammer and nails of AI/ML models). Don’t just “do what you love,” because the money may not follow. With data science, don’t just model the nearest labeled dataset, because you might not have anything useful to do with the resulting output.

Here’s the process I typically use for data science product management:

Start with a business problem that is likely to be improved by optimizing or automating the decisions involved. Data science models typically work best for optimizing repeated decisions that happen very regularly, not for giant strategic business decisions. Think about where humans are making snap judgements in a couple of seconds or less. This is where artificial intelligence (that is, AI/ML or data science) works best.
Write a one-to-two page pitch that describes the problem to solve and how an ML model that provides some sort of predictions can solve it. (I’m simplifying here, because there are many clever ways to assist humans or automate formerly human-driven tasks with ML that aren’t simply using predictions or classifications).
Crucial step: vetting. Before going any further, have a data scientist take the pitch, gather some data, build a preliminary model, and see if this project is remotely feasible.
Now you’re ready to treat this like a standard software development process. Write a requirements document, have a machine learning engineer work with the data scientist to develop the technical design, then have the team implement it.
Now your other teams can incorporate it into your systems as features to improve upon the business processes.
Release it!
You’re not done. Now you need to monitor it in production, watch for drift, plan for and implement automated retraining, etc.
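
To make the vetting and monitoring steps a little more concrete, here is a minimal Python sketch. It is my illustration rather than a prescribed toolkit: the function names, the 85/15 class split, and the choice of the population stability index (PSI) as the drift measure are all assumptions. The first part compares a preliminary model against the dumbest possible baseline (always predict the most common label), which is one quick way to tell whether a headline number like “85% accuracy” means anything. The second part compares the distribution of a feature or model score at training time against what you see in production.

```python
import math
from collections import Counter


def accuracy(y_true, y_pred):
    """Share of predictions that match the true labels."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)


def f1_score(y_true, y_pred, positive=1):
    """F1 for the positive class: harmonic mean of precision and recall."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def majority_baseline(y_train, n):
    """Hypothetical 'no model' baseline: always predict the most common label."""
    label = Counter(y_train).most_common(1)[0][0]
    return [label] * n


def psi(expected, actual, bins=10, eps=1e-6):
    """Population stability index between a reference and a live sample.

    Bins are equal-width over the reference sample's range; live values
    outside that range are clamped into the first or last bin.
    """
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0  # avoid zero width for constant data

    def shares(values):
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            counts[min(max(idx, 0), bins - 1)] += 1
        # eps keeps empty bins from producing log(0)
        return [max(c / len(values), eps) for c in counts]

    e, a = shares(expected), shares(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


# Vetting: made-up labels where 15% of cases are the ones we care about.
y_true = [0] * 85 + [1] * 15
baseline = majority_baseline(y_true, len(y_true))
print(accuracy(y_true, baseline))  # 0.85
print(f1_score(y_true, baseline))  # 0.0
```

On this made-up data, a “model” that never flags anything already gets 85% accuracy with an F1 of zero, so a preliminary model has to clear that bar on the metric the business actually cares about before the project counts as feasible. For monitoring, `psi(reference, live)` is zero when the two samples fall into the bins in the same proportions and grows as they diverge. What value should trigger retraining is a judgment call for your team.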

There’s a lot more to say about selecting the business problem to address and about clever ways of reformulating such problems so that AI/ML systems can solve them. Orienting yourself around a business problem, not a labeled dataset, is the critical starting point, though.
