You know, I write a lot about techniques and algorithms, and give you code for how to do things. I want to take a step back to talk about strategic thinking. This is one of the most difficult and tricky subjects to teach in data science. It is often neglected in a data scientist’s training. Without it, you are doomed to stay stuck in the tactical daily grind of data science work.
Today, I want to talk about strategy and strategic thinking around data science. It is kind of a weird topic, and it is going to feel squishy. It is not squishy; it is VITAL. A data science position is a strategic position within an organization. Don’t forget that. Here are 3 questions to help you hone your strategic thinking muscles.
1. Is this algorithm supportable?
I think this is a really interesting question, and it has to be asked any time you are doing a data science project in an actual business. There are more technologies out there for doing data science than any one company can support. Take some time to think about the algorithm as living within your organization’s current ecosystem. Does xgboost make sense? Can the organization support that kind of complexity? Are there regulations that make black box models less attractive?
I’ve learned this the hard way. Some organizations are not mature enough to support a complex model. Maybe a simple linear model is what your business needs. It is better than nothing, and often the difference between the success and failure of a project is just the complexity of the model that you are using. A good engineer can replicate a linear model in a production environment in an afternoon, but may need a week to a month to implement a neural network. Think about it.
Also, why would you ever want to code in R? Well, perhaps that is the technology that your company was built on. Doing something in Python may not make sense in that world. Also, as your data science program matures, you need to think about standardizing your technology stack. This will make your business run more smoothly. You will be able to deliver cleanly, and you will be able to make things work. Debugging will become a smaller and smaller slice of your business.
In short, think about the poor guy whose job it is to maintain what you are building.
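To make the "afternoon" claim concrete, here is a minimal sketch of what replicating a linear model in production can look like. The feature names, coefficients, and intercept are all made up for illustration. The point is only that scoring reduces to an intercept plus a weighted sum, which needs no special library.

```python
from typing import Mapping

# Hypothetical values exported from a fitted linear model.
# Feature names and numbers are invented for this example.
INTERCEPT = 0.5
COEFFICIENTS = {
    "income_thousands": 0.02,
    "years_at_job": 0.1,
    "open_credit_lines": -0.05,
}


def score(features: Mapping[str, float]) -> float:
    """Return intercept + sum(coefficient * feature value)."""
    return INTERCEPT + sum(
        weight * features[name] for name, weight in COEFFICIENTS.items()
    )


applicant = {
    "income_thousands": 50.0,
    "years_at_job": 3.0,
    "open_credit_lines": 4.0,
}
print(score(applicant))
```

Anyone maintaining this can read the whole model in one screen. That is a big part of what supportable means here.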
2. Why do this with an algorithm?
A person with a brain that is trained to do a task is a neural net. By definition, that is what a brain is. So we know that because a human can learn to do something, there must exist an algorithm that can do it too. Algorithms are cheap and scalable. Humans are expensive and don’t scale well. Therefore, we should replace all the humans, right?
Wrong.
Humans are wonderful creatures. One of the benefits that you get from a human is plasticity. They are far more adaptive than an algorithm. An algorithm is also myopic. It can only see what you tell it is relevant. It cannot contextualize. For example, an algorithm that looks at historical data will never be able to tell you that something doesn’t pass the smell test for the current environment if it doesn’t have data about the environment. Think lending decisions. Someone who looks great on paper, whom you wouldn’t hesitate to lend to in good times, might be a terrible risk in a bad environment. But without environmental data, an algorithm will fail.
Always use a human when you need a decision maker to contextualize data. An algorithm will fall apart under changing conditions, but a human will be able to adjust their mental model by seeing the fear in another human’s eyes at the grocery store. A model can’t do that, because it can’t see data that you don’t explicitly feed it.
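Here is a toy sketch of that blindness, using the lending example. The thresholds and the `unemployment_rate` signal are hypothetical, chosen only to show the mechanics. The first rule gives the same answer in good times and bad because the environment never reaches it. The second only adapts because someone decided to feed it an environment signal, and it still sees nothing beyond that one number.

```python
def approve(credit_score: int, debt_to_income: float) -> bool:
    """Toy rule that sees only what is on the application."""
    return credit_score >= 700 and debt_to_income <= 0.35


def approve_with_context(
    credit_score: int, debt_to_income: float, unemployment_rate: float
) -> bool:
    """Same rule, but the bar rises when a hypothetical environment signal worsens."""
    required_score = 700 if unemployment_rate < 0.06 else 760
    return credit_score >= required_score and debt_to_income <= 0.35


applicant = {"credit_score": 720, "debt_to_income": 0.30}

# No environment input, so the decision cannot change with the economy.
print(approve(**applicant))

# With an explicit signal, the same applicant is treated differently.
print(approve_with_context(**applicant, unemployment_rate=0.04))
print(approve_with_context(**applicant, unemployment_rate=0.10))
```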
Think about what you are doing.
3. How does this move the needle?
What are the company’s objectives? Where is the company going? Does my data science project get us closer or further from those objectives?
I’m talking about building on major strategic objectives. This is the essence of strategic thinking. We are trying to plant a flag. I like to think about Google in this case. In a world where they dominated the search space, an engineer created Gmail. It lined up with the strategic initiatives of the company. It created touch points and loads of free-form text to mine. It also created a new avenue to monetize.
Think about your data science projects in that respect. Does this create an avenue for monetization? Does it align with what the company is trying to do? Are we getting closer to the objective, or is this a detour?
If the objective is to improve the company’s return on assets, ask yourself: does my data science project increase revenue, decrease cost, or improve our asset base? It could be that your project creates digital assets that can be monetized. Those digital assets might even be reportable on a balance sheet. It is hard to say, but think of the company’s main goal for the year. Build projects that move that needle.
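If it helps, the return on assets question can be turned into quick arithmetic. This is a rough sketch with invented numbers. It simplifies net income to revenue minus costs, ignoring taxes and interest, and it reads "improving the asset base" as earning the same income on fewer assets, which is only one way to take it.

```python
def return_on_assets(revenue: float, costs: float, total_assets: float) -> float:
    """ROA = net income / total assets, with net income simplified to revenue - costs."""
    return (revenue - costs) / total_assets


# All figures below are hypothetical.
baseline = return_on_assets(1_000_000, 900_000, 2_000_000)

# Three levers a project might pull, one at a time.
more_revenue = return_on_assets(1_050_000, 900_000, 2_000_000)
lower_costs = return_on_assets(1_000_000, 850_000, 2_000_000)
leaner_assets = return_on_assets(1_000_000, 900_000, 1_800_000)

for label, value in [
    ("baseline", baseline),
    ("more revenue", more_revenue),
    ("lower costs", lower_costs),
    ("leaner assets", leaner_assets),
]:
    print(f"{label}: {value:.2%}")
```

If you cannot say which of these levers your project pulls, that is a sign it may be a detour.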