How will AI change the world of work?
TLDR: This post looks at AI and how it will impact roles and processes within organisations in the near future.
I have a small confession to make. The image at the top of this post, and all the images elsewhere on this Substack, have been generated using DALL-E 2, a generative image model from OpenAI. The way it works is simple: you give it a text prompt and it makes its best guess at what that image might look like. In this case I wrote “3d render of robots having a discussion” and this was one of four near-instantly generated responses.
This is probably not the first time you’ve heard of OpenAI. The company is more famous for its AI chatbot, ChatGPT, which I spent many hours playing around with in December. The human-like chat feature is driven by a large language model (LLM) which responds to text prompts in remarkable, seemingly intelligent and often unexpected ways. It can write poetry, recipes, meeting agendas and functioning code.
While this may seem like magic, it isn’t. These models simply use your prompt to predict the most likely response they think you expect, based on their training data. They don’t actually understand the prompt or the response they’re giving you. This means they have a tendency to make things up and to write them in a way that sounds plausible - a problem we’ll discuss more later on. However, despite the lack of “magic”, this technology is game-changing, and the degree to which lay people can access it is even more game-changing.
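To make “predicting the most likely continuation” concrete, here’s a minimal sketch using GPT-2 - a small, open-source predecessor of the models behind ChatGPT, available through Hugging Face’s transformers library. It isn’t the model OpenAI uses, but the principle is the same: the prompt goes in, and the model continues it with whatever tokens it judges most likely.

```python
# A minimal sketch: GPT-2 simply continues a prompt with the tokens it judges most likely.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")

prompt = "A meeting agenda for a quarterly planning session:"
completions = generator(
    prompt,
    max_new_tokens=40,
    num_return_sequences=2,  # ask for two different continuations
    do_sample=True,
)

for c in completions:
    print(c["generated_text"])
    print("---")
```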
OpenAI’s decision to make its models open to the public is only the start. Google is set to release a suite of consumer-facing deep learning tools later this year, and the broader AI hype cycle is well underway. However, this particular kind of technology is expensive. Microsoft has been pouring money into OpenAI to expand their partnership in a (somewhat strangely structured) deal that values the start-up at around $29bn. These deep learning models take a lot of time to train and are expensive to run, so we’ll likely only see a few big players in the space currently occupied by OpenAI (Microsoft) and DeepMind (Google).
However, these enormous multi-purpose models are only one side of the AI revolution. More esoteric, boring, single-task-oriented AI models are likely to be the source of greater disruption: categorising objects and images, making predictions and decisions about a specific outcome, optimising routes or supply chains. These are use cases where the output is less “fuzzy” and models are typically selecting one option from a well-defined, constrained set. These task-oriented models are easier to implement in large companies, require far fewer resources to train and can produce actionable outputs.
But despite my strong interest in this topic, I don’t write about technology trends, nor am I an expert in machine learning. Instead, I want to use this post to think through some of the organisational implications of widespread AI adoption. While it might not be clear exactly how companies will adopt these tools, or how quickly, even an hour of playing with ChatGPT makes it clear that this technology has the potential to disrupt a wide array of jobs.
So we’re going to look at two key questions that organisations will have to face when adapting to this new AI-enabled world:
What will the roles of the future look like?
How will the way we work change?
What will the roles of the future look like?
Many worry that their knowledge-sector jobs will be replaced by models such as GPT-3.5 (the LLM that underpins ChatGPT), but in the past technological innovation has typically not replaced roles - instead, low-value work is automated or simplified and our effort shifts to the work that humans are better suited to. Being more productive normally doesn’t lead to fewer jobs; it almost always leads to more. Similarly, it seems likely that these LLMs will be good at “augmenting” humans rather than replacing them. As mentioned earlier, they have a tendency to offer plausible but false outputs. This can be extremely damaging in some settings - imagine press releases with false information, or shipping crucial code that is riddled with bugs. With LLMs you still need an expert to check the code these models produce or to edit the content they generate. But the productivity of those people will be much higher than it is today. GitHub’s AI “Copilot” feature describes this augmentation well: Copilot allows software engineers to spend less time on the boring and repetitive parts of their job so that they can “focus on solving bigger problems”.
One outcome of augmentation is that the number of people you need doing the same role is likely to be smaller for the same amount of output. Or put another way, companies will be able to produce more features with the same number of engineers. The last decade has seen the bargaining power of software engineers rise rapidly as every company raced to digitise and the war for talent became more heated, but we may see a reversal of that trend. OpenAI has recently hired a large cohort of contract developers whose sole job is to help train GPT-4 (the next generation of GPT model) to write fully functioning code in response to simple tickets. Most large software development teams run on a system of tickets, acceptance criteria and overarching sprint goals. For basic tasks, if we can systematise how we write requirements and tickets, it may be possible to remove the need for an engineer to write the original code at all and simply rely on an engineer to review the code before it is pushed to testing / production (see the sketch below).

More complex problems will still require a human to solve, but features like Copilot may still speed up aspects of that process. The well-defined nature of some engineering problems suggests they may benefit more from augmentation than designers or product managers, who have to manage more complex, messy “human” problems (e.g., strategic work, stakeholder management or user interviews). If engineers are easier to augment than designers or product managers, technology companies may start to shift their workforce mix to meet this new reality: fewer, more experienced engineers and more product managers and designers. Microsoft’s Satya Nadella has pointed to features like GitHub’s Copilot as the next industrial revolution, and like the last industrial revolution the long-term consequence is likely more jobs and more value creation. But getting the employee mix right may be a short-term challenge for many firms to overcome.
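To make that ticket-driven flow more tangible, here’s a purely illustrative sketch in Python. The Ticket structure and the prompt format are my own invention, not anything OpenAI or GitHub actually uses; the point is simply that a well-structured ticket can be turned into a machine-readable prompt, with a human review step kept firmly in the loop.

```python
# Illustrative only: a hypothetical "structured ticket in, draft code out, human review before merge" flow.
from dataclasses import dataclass, field


@dataclass
class Ticket:
    title: str
    description: str
    acceptance_criteria: list[str] = field(default_factory=list)


def ticket_to_prompt(ticket: Ticket) -> str:
    """Turn a structured ticket into a prompt that an LLM could draft code from."""
    criteria = "\n".join(f"- {c}" for c in ticket.acceptance_criteria)
    return (
        "Write a Python function that satisfies the following ticket.\n"
        f"Title: {ticket.title}\n"
        f"Description: {ticket.description}\n"
        f"Acceptance criteria:\n{criteria}"
    )


ticket = Ticket(
    title="Validate email addresses at signup",
    description="Reject malformed email addresses before they reach the database.",
    acceptance_criteria=[
        "Returns True for well-formed addresses",
        "Returns False for anything else",
    ],
)

print(ticket_to_prompt(ticket))
# In this hypothetical flow the prompt would be sent to a code-generating model,
# and the draft it returns would go to an engineer for review before it is merged.
```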
Some jobs, however, are at more risk of wholesale replacement. We mentioned categorisation, prediction and optimisation models earlier - it’s in these use cases that the reliability of AI output is likely to be a lot higher. We’ve already seen studies where ML models outperform radiologists in identifying lung cancer, and AI models are carrying out high-frequency stock trades by reading real-time news feeds. These examples are typically closed-loop learning environments: each individual task gets immediate feedback which can be fed back into the model to reinforce learning. The person flagged for cancer risk has a more detailed test, like a biopsy, and the model can be told whether it flagged them correctly or not. The stock index goes up or down as the news gets digested by the market as a whole, which can feed back into the model. In these environments, where feedback is fast, well-defined and attributable to specific model outputs, ML models are very difficult to beat. However, not all categorisation situations provide closed-loop feedback. In some instances (open-loop situations) that feedback is noisy and inconsistent - e.g., predicting how well someone will comply with a diet plan, or categorising loans into different risk tranches. These are environments where models take longer to train and cost more to retrain. Closed-loop learning environments will be the main area where jobs could be replaced, whilst open-loop environments will see a mix of augmentation and task-specific replacement.
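To illustrate what “closed-loop” means in practice, here’s a minimal, entirely synthetic sketch using scikit-learn’s online SGDClassifier: the model makes a prediction on each new case, the true outcome arrives quickly (think of the biopsy result), and that feedback is immediately folded back into the model. The data and decision rule are invented for illustration.

```python
# Sketch of a closed-loop learning setup: predict, observe the true outcome, update immediately.
# All data here is synthetic and purely illustrative.
import numpy as np
from sklearn.linear_model import SGDClassifier

rng = np.random.default_rng(0)
model = SGDClassifier(loss="log_loss")
classes = np.array([0, 1])  # e.g., 0 = no follow-up needed, 1 = flag for follow-up

# Warm-start the model on a small initial batch
X_init = rng.normal(size=(100, 5))
y_init = (X_init[:, 0] + X_init[:, 1] > 0).astype(int)
model.partial_fit(X_init, y_init, classes=classes)

# The closed loop: each case gets fast, attributable feedback that goes straight back into the model
for _ in range(1000):
    x = rng.normal(size=(1, 5))
    prediction = model.predict(x)                          # the model's call on this case
    true_outcome = np.array([int(x[0, 0] + x[0, 1] > 0)])  # e.g., the biopsy result comes back
    model.partial_fit(x, true_outcome)                     # feedback reinforces (or corrects) the model
```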
As mentioned, augmentation and replacement both imply that the mix of jobs will shift over the next few years. Companies may need to think very carefully about their strategic workforce planning to understand where they expect fewer roles in the long term and where they expect more. One additional consideration is whether there are totally new roles that companies will need which don’t exist at all today. Two immediate examples come to mind:
Prompt Engineers
AI use case designers
Prompt engineering is the art / science (delete as appropriate) of writing prompts that get a model to produce the output which most effectively meets your needs. It’s particularly relevant for LLMs like ChatGPT, because inputs that we think are similar can produce outputs which look very different. There’s a good summary of some of the key principles of prompt engineering in this link, but a key part of the role involves an understanding of how these models are trained. Prompt engineers will be able to leverage that understanding to exert more control over the outputs. I won’t spend too long outlining the intricacies of how to craft prompts. Suffice to say, writing prompts is a skill: it takes technical understanding and it takes hands-on practice with these models. Being really good at writing prompts can massively level up how effectively you can use these LLMs. I imagine that “pairing” a technical expert with a prompt engineer could massively accelerate that expert’s learning curve when it comes to these models. After, say, three months of side-by-side work, the expert will be much better at using AI and the prompt engineer can move on to the next expert. This hands-on, experiential learning is a much more effective way of training your staff than a big (and likely expensive) classroom learning program. Prompt engineers won’t just be a vehicle for training - they will also be “swiss army knives” of written output: flexible resources that can be deployed to boost productivity in a given area. This new role is already emerging - Anthropic is planning to pay someone over $250k per year to be a prompt engineer, and they aren’t the only ones.
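As a flavour of the kind of lever a prompt engineer pulls, here’s a small sketch using OpenAI’s Python package (v1+, with an API key set in your environment; the model name is illustrative). Both prompts ask for the same thing, but the second pins down role, audience, format, length and what the model is allowed to draw on - the sort of constraints that make outputs far more predictable.

```python
# Sketch: same underlying request, two very different prompts.
# Assumes the openai Python package (v1+) and an OPENAI_API_KEY in your environment.
from openai import OpenAI

client = OpenAI()

naive_prompt = "Write something about our Q3 results for staff."

engineered_prompt = (
    "You are an internal communications writer. Summarise our Q3 results for all staff "
    "in plain English. Use exactly three bullet points, each under 20 words, "
    "and end with one sentence on what it means for next quarter. "
    "Do not include any figures that are not in the notes below.\n\n"
    "Notes: revenue up 12%, churn flat, two product launches delayed."
)

for prompt in (naive_prompt, engineered_prompt):
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",  # illustrative model name
        messages=[{"role": "user", "content": prompt}],
    )
    print(response.choices[0].message.content)
    print("===")
```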
The second role that I see becoming increasingly important is that of a use case designer. There are two ways of trying to find a good use case for AI in a large company: top-down or bottom-up. Top-down logic involves someone working out which processes are most valuable to “digitise” and then telling a team to automate as much of that process as possible. This often involves people (sometimes expensive consultants) identifying which elements of the process are highly manual and seeing which steps can be removed or replaced. In 2023, many companies have already identified the “low-hanging fruit” among these processes and digitised it. And in most cases I’ve worked on, the solution involved little to no machine learning - instead, someone builds software that connects two unwieldy systems together. Some of these top-down use cases are going to be more suited to AI, but they tend to be narrow, task-focused examples. In general, the top-down approach means you build the model after you’ve found the use case. Top-down use cases are great for narrow ML problems such as the categorisation models we discussed earlier.
LLMs such as GPT-3.5, on the other hand, are really expensive to train. Thankfully, companies like OpenAI or DeepMind have done the work for you. As a result, you have to find the use cases after the model has been built, which lends itself to bottom-up use case identification: the model’s already there, now you need to go to your teams and find out how they can use it. Going forward, more and more of the value unlocked from AI will come from these bottom-up use cases. Some companies, such as Meta, have been running this bottom-up process for a while (this article is an excellent deep-dive into how Facebook set up its AI team and unlocked value). But there are two peculiarities of the Facebook example: first, the data scientists involved in use case discovery had to keep explaining how ML works to everyone they met; and second, the people building and training the models were also the people doing the discovery work.

The first of these problems (understanding and awareness of AI) may naturally fade away over time as people get more experience and formal training in the area. The second, data scientists doing the discovery work, feels like a short-term, stopgap solution to the problem of finding good bottom-up use cases for AI. ML engineers are smart people, but their skillset is typically in building models, cleaning data, training those models and deploying them. It seems odd that we are asking these same people to also do discovery work. In most other domains of software we have designers or user researchers doing discovery: they have experience in journey mapping, ethnographic research and effective interviewing. It feels like it won’t be long before we start seeing similar people taking roles with titles like “AI use case designer”. In the short term, however, it might not be obvious for companies to create roles like this. First, you need people with the right background who also have enough AI understanding to spot smart use cases where others cannot. Second, you need companies willing to spend real money on design expertise which they know will be deployed internally rather than on their customers. Forward-thinking companies can use roles like these to increase adoption of AI and reduce time-to-value, but dedicated roles require scale, so you’ll probably only see them in larger companies (the very same companies that are currently going through rounds of layoffs).
How will the way we work change?
We’ve had a look at which roles will be augmented, which will be replaced and which will be created, but there will also be changes in how people do these roles. New processes will be required and the importance we attribute to different skills will change. We’ve looked at some of the features of AI already, but three are particularly important to how we work:
The cost to try out new ideas becomes much lower
The output from AI will look “plausible” regardless of quality
The process an AI model uses is often opaque (especially true of deep learning models)
The first of these features - the ability to deliver a “good enough” product very quickly - means companies, and teams within companies, can accelerate early testing of ideas. Imagine you need to write documentation describing a process for new people joining your team. This can be a long, painful slog of boring administrative write-up work, and the output is usually a document that was difficult to write and even more difficult to read and absorb. In most cases, teams don’t do onboarding or handover documentation at all - they use the time to be productive and just hope for the best when new team members arrive.

In an AI-enabled world that process could instead look like this: first, you write the key bullet points about your process; second, you ask an LLM like ChatGPT to convert these bullets into three options for a 15-minute video script designed to be easily understood; third, you use AI-enabled software like synthesia.io to make a realistic narrated video; fourth, you take all three versions to different new joiners and ask them to rank which is easiest to understand. All in all, you can probably plan, create, edit, test and refine this whole process in less than half a day - and if you’re good at writing prompts, it’ll take even less time. Right now, if it happens at all, this process might take someone on your onboarding team at least two or three days (especially if you want a video at the end of it). Importantly, the marginal time to create three options for the video is near zero once you’ve decided to make one version. The output is probably not perfect, but it’s almost certainly good enough.

AI in this case acts as an “accelerator” and a “multiplier”: you can deliver more versions of an output more quickly. This will be true of many other tasks, allowing people to pilot ideas much earlier and to test multiple versions of an idea. This “launch fast” mentality will also require you to “fail fast” when ideas aren’t successful (you can read my previous post to learn more about these quitting decisions). This launch-fast, fail-fast world may have a similar impact on non-technical output as agile approaches originally had on software development. One reason I used this particular example was to highlight the power of stringing a few of these tools together. We are likely to see new workflows emerge where “working smart” with the tools leads to non-linear increases in how much output you can generate. Managers and companies will need to think hard about how they incentivise staff to strike the right balance between efficiency and quality when the opportunity for faster work is so large.
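Here’s a rough sketch of what the first half of that chain might look like in Python. The bullet points, the prompt and the use of OpenAI’s chat API are illustrative; the video-generation and user-testing steps are deliberately left as comments, because they would run through whatever tool (e.g., synthesia.io) and process you already use.

```python
# Illustrative pipeline: bullet points -> several draft video scripts -> (video tool) -> (user testing).
# Assumes the openai Python package (v1+) and an OPENAI_API_KEY; the model name is illustrative.
from openai import OpenAI

client = OpenAI()

bullets = [
    "New joiners request system access via the IT portal on day one",
    "Their buddy walks them through our release process in week one",
    "All expense claims go through the finance app and are approved by their manager",
]

prompt = (
    "Turn the following bullet points into a script for a 15-minute onboarding video. "
    "Use plain, friendly language that a brand-new team member can follow.\n\n- "
    + "\n- ".join(bullets)
)

# Ask for three different drafts - the marginal cost of extra versions is close to zero
drafts = []
for i in range(3):
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}],
    )
    script = response.choices[0].message.content
    drafts.append(script)
    with open(f"onboarding_script_option_{i + 1}.txt", "w") as f:
        f.write(script)

# Next steps (outside this sketch): feed each script into an AI video tool such as synthesia.io,
# then ask a few new joiners to rank the three videos for clarity.
```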
The second feature of AI worth thinking about is the “human-like” quality of the output. This is great in terms of usability and opens up a huge range of new use cases. However, just because the output sounds like a human doesn’t mean the process used to generate it resembles one we would use. As mentioned earlier, the plausibility of the output can obscure its quality. We will need to get much better at spotting inaccuracies and editing than we are today: we have a wide array of biases which hinder our fact-checking ability, and the more “human-like” the output sounds, the more likely we are to miss inaccuracies. As a result, we will need to develop new processes to review and test AI-generated output (ideally in an automated way). One solution that schools are considering is using AI detection software to spot which parts of a text may have been generated by ChatGPT. A second consequence of the “human-like” quality of AI output is that we may not know whether we are receiving information from a human at all. Humans hate complying with algorithmic processes (see the research on algorithm aversion), and interactions with chatbots are less popular than live chatting with a human or picking up the phone. Both of these preferences mean companies may have an incentive to be opaque about whether you are talking to a bot or a human. This can create trust issues, both within companies and between companies and their customers, which companies need to be sensitive to when choosing which processes to hand off from people.
Trust issues compound in situations where ML models are providing predictions or categorisation decisions in sensitive settings. AI prediction processes are often black boxes - each decision may be the result of thousands of factors and true explainability may be hard to achieve. For example, if a bank makes credit decisions using an AI risk assessment model, it still has to provide an explanation to a credit rating agency when it rejects a loan application. If the model it used is opaque, it will be difficult to provide a true explanation of how it came to that decision. One option is to use a post-hoc explanation classifier to “infer” a reason for the rejection based on counterfactual examples (i.e., answering questions like “what income would you have needed for the model to accept the loan application?”). But this form of explanation is narrow and doesn’t really answer the underlying questions about the process used to make the prediction. In any event, the true explanation (if we could ever peek into the black box) would be messy, and organisations have to decide how much to simplify without removing the value of explainability. In highly regulated environments this risk can be managed, but what happens when these predictions are made internally? What if we are predicting whether to put someone through to an interview based on their CV? What are the incentives here to provide an accurate and fair explanation? Without a clear regulatory process, explanations will become more and more simplified, and at some point they may even be “just for show”.
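As a deliberately simplified sketch of that counterfactual idea, here’s what a post-hoc explanation might look like for a toy loan model - all features, thresholds and data are invented for illustration, and a real post-hoc explainer would be far more sophisticated.

```python
# Sketch of a counterfactual-style explanation for a toy loan model:
# "what income would the model have needed to see to approve this application?"
# The model, features and data are invented purely for illustration.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)

# Synthetic training data: [income in £k, existing debt in £k]
X = rng.uniform(low=[10, 0], high=[120, 50], size=(500, 2))
y = (X[:, 0] - 1.5 * X[:, 1] + rng.normal(0, 10, 500) > 30).astype(int)  # 1 = approve

model = LogisticRegression().fit(X, y)

applicant = np.array([[35.0, 20.0]])  # rejected applicant: £35k income, £20k debt
print("Approved?", bool(model.predict(applicant)[0]))

# Counterfactual search: raise income until the model's decision flips
for income in np.arange(35.0, 300.0, 1.0):
    candidate = np.array([[income, 20.0]])
    if model.predict(candidate)[0] == 1:
        print(f"The model would have approved at roughly £{income:.0f}k income.")
        break
```

The narrowness the post describes is visible here: the counterfactual tells the applicant one way the decision could have flipped, but says nothing about how the model actually weighs all of its inputs.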
These explanations will matter most if things go wrong, and there will definitely be moments where things go wrong. There have already been countless examples of AI models exhibiting bias in recruitment processes (example 1, example 2, example 3), and examples of other systematic biases will emerge as more use cases develop. While we can try to do everything within our power to avoid these mistakes, bias in training data and bias in how we interact with models are difficult to mitigate. The problems are even bigger for some of the LLMs, which ingest huge amounts of biased written text as part of their training data. This opens up the key question: what should we do when something like this goes wrong? The importance of early reporting and post-mortem learning will increase. Because models are black boxes, the signals that something has gone wrong may be noisier and more difficult to spot. This could mean we need a lower threshold for reporting issues, or clearer guidance around explainability and legal liability. Teams will need high levels of psychological safety to be willing to raise red flags before bigger systemic problems emerge. Companies using machine learning will need ethics and responsibility teams, similar to the team that was shut down by Facebook. OpenAI and DeepMind are open about some of these ethical challenges, but each individual company will have to grapple with its own issues. We don’t yet understand what effective governance in this AI world might look like. We might see companies reporting near-misses in AI decisions (similar to oil and gas companies) or individual team members making anonymous reports to an ombudsperson in their organisation. New processes will need to emerge to preempt failures and to respond once they do occur. These changes in company processes can either be proactive - as companies look ahead to the risks and opportunities of AI - or reactive - in the aftermath of a mistake. The latter approach comes with significant reputational risk for leaders, whilst the former requires leaders who are thoughtful and willing to learn.
Final thoughts
The widespread accessibility of LLMs has led many to speculate about how AI will impact our lives, and in particular our jobs. While these tools are more sophisticated than many we’re used to using, they are ultimately still just tools. We have to be deliberate about how they are used and about the consequences of using them. Organisations will respond by changing their job mix and introducing new processes, but we should be conscious that the current generation of tools, at least, is still fairly naive: they require hand-holding and someone needs to check their homework. As with all technological changes, the most valuable skill will be our ability to learn quickly and adapt. AI literacy will be increasingly important if you want to unlock value, but management approach will also be key. Being flexible and open to letting your employees experiment will lead to faster adoption, and creating the right environment where people can flag ethical concerns should be a priority (before something goes wrong). Regardless of which trends dominate in the next few years, we will see this technology impact our jobs in a significant way.
Thanks for reading - and for the record, none of this article was written by an AI