It seems you can’t have a conversation today without generative AI coming up. But how accurate is the information coming out of generative AI models?
While the origins of generative AI date back to the 1960s, interest has exploded in the past two years. Generative AI is built on large language models (LLMs), and companies are taking different approaches to how they gather, analyze, and present data.
That’s where many of the similarities end. Interestingly, the results can vary depending on what you ask and how you ask it. Even asking the same question twice can yield different results. The point is that variability exists even before you get into prompt engineering.
Taking generative AI for a spin and seeing the varied results
I asked both OpenAI’s ChatGPT and Google Bard the same question: “Who are the top cloud providers ranked?” You would think that this is a fairly pedestrian question. Yet the results varied.
OpenAI’s ChatGPT response started with a caveat that it does not have access to real-time data or the latest rankings. However, it provided the following ranked list as of September 2021:
- Amazon Web Services (AWS)
- Microsoft Azure
- Google Cloud Platform (GCP)
- IBM Cloud
- Alibaba Cloud
- Oracle Cloud
When I asked Google’s Bard the same question, it stated that the results were from 2023 and referred to results based on data from Gartner:
- Amazon Web Services (AWS) 34%
- Microsoft Azure 22%
- Google Cloud Platform (GCP) 9.5%
- Alibaba Cloud 7%
- Oracle Cloud Infrastructure (OCI) 4%
- IBM Cloud 3.5%
- Tencent Cloud 3%
- Salesforce 2.5%
- Digital Ocean 2%
- VMware Cloud on AWS 1.5%
What’s interesting is that, given the same question, Bard provided more context, more detail, and a source for the data. This matters because trust in data is more critical than ever. Also note the shift in ranking for IBM Cloud compared with Alibaba Cloud and Oracle Cloud. As a reminder, the exact same question was asked of both services.
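To make the disagreement concrete, here is a minimal Python sketch that reports where each provider lands in each ranking. The two lists are transcribed from the responses above; the `rank_shifts` helper is purely illustrative, not part of either service.

```python
# The two rankings, transcribed from the ChatGPT and Bard responses above.
chatgpt_ranking = [
    "AWS", "Microsoft Azure", "Google Cloud Platform",
    "IBM Cloud", "Alibaba Cloud", "Oracle Cloud",
]
bard_ranking = [
    "AWS", "Microsoft Azure", "Google Cloud Platform",
    "Alibaba Cloud", "Oracle Cloud", "IBM Cloud",
    "Tencent Cloud", "Salesforce", "Digital Ocean", "VMware Cloud on AWS",
]

def rank_shifts(list_a, list_b):
    """For each provider in both lists, report its 1-based position in each."""
    common = [p for p in list_a if p in list_b]
    return {p: (list_a.index(p) + 1, list_b.index(p) + 1) for p in common}

for provider, (pos_a, pos_b) in rank_shifts(chatgpt_ranking, bard_ranking).items():
    marker = "" if pos_a == pos_b else f"  <- moved {pos_b - pos_a:+d}"
    print(f"{provider}: ChatGPT #{pos_a}, Bard #{pos_b}{marker}")
```

Running this shows IBM Cloud dropping two places between the two responses, while Alibaba Cloud and Oracle Cloud each move up one.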
Even within a single provider, the results can vary. I asked very similar questions to Google’s new Search with AI and here are the results.
I then waited 5-10 minutes, asked a similar question, and got yet another slightly different result… and format.
Note that within a single provider, the rankings are similar, but the percentages are slightly different. One could argue that 1% or 0.5% isn’t much to argue about and is ‘close enough’. However, when you are talking about enterprise data and certain industries, 1% can make a big difference.
What can we learn from these examples?
Does this mean that generative AI is not ready for production? No. It can and should be evaluated. At the same time, understand a few things before leveraging the results.
First, know your data. Understanding the source is key to building trust. Even if a link to the source data is not provided, mentioning the source(s) provides context and confidence.
Second, understand that variability exists. As these models evolve, so will their results. You may have to ask the same question multiple times to piece together a more complete answer.
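One lightweight way to combine multiple runs is to rank items by their average position across the answers. A hypothetical sketch follows: the response lists below are made up for illustration (in practice they would come from repeated queries to a model), and `consensus_ranking` is an assumed helper, not any vendor’s API.

```python
# Hypothetical answers from asking the same ranking question three times.
responses = [
    ["AWS", "Azure", "GCP", "IBM Cloud"],
    ["AWS", "Azure", "GCP", "Alibaba Cloud"],
    ["AWS", "Azure", "GCP", "IBM Cloud"],
]

def consensus_ranking(runs):
    """Rank items by mean position across runs; more frequent items break ties."""
    positions = {}
    for run in runs:
        for pos, item in enumerate(run):
            positions.setdefault(item, []).append(pos)
    return sorted(
        positions,
        key=lambda i: (sum(positions[i]) / len(positions[i]), -len(positions[i])),
    )

print(consensus_ranking(responses))
```

Averaging positions across runs smooths out run-to-run noise, though it does not make any single answer more trustworthy on its own.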
Lastly, and probably most important: we are still in the experimental phase. While the potential opportunities from generative AI are massive, we still need to spend more time understanding how best to leverage these new tools.
Does this mean we should not use or trust these tools? No, we should use them. They should be exercised and leveraged. But as with most technology, know where best to apply it. Getting guidance and creating boilerplate content sound like safe options. Using the output to make critical business decisions or to provide granular details is probably a less good idea for now, unless you can corroborate the data and insights.