A survey from the Motley Fool revealed some surprising (and, frankly, hard-to-believe) statistics about Americans' use of the generative AI tool ChatGPT for financial advice.
According to the study, the most important factors determining consumers’ use of ChatGPT to find financial products are: 1) the performance and accuracy of the recommendations; 2) the ability to understand the logic behind the recommendations; and 3) the ability to verify information the recommendation is based on.
However, the conclusions of a new Apple study might make consumers rethink using ChatGPT (and other generative AI tools) for financial advice. They should also temper the plans of bank and credit union executives who want to use artificial intelligence to offer financial advice and guidance to consumers.
Generative AI (genAI) tools can do lots of amazing things, but, as a new report from researchers at Apple demonstrates, large language models (LLMs) have some troubling limitations with “mathematical reasoning.” The Apple researchers concluded:
“Current LLMs are not capable of genuine logical reasoning; instead, they attempt to replicate the reasoning steps observed in their training data. When we add a single clause that appears relevant to the question, we observe significant performance drops (up to 65%) across all state-of-the-art models. Importantly, we demonstrate that LLMs struggle even when provided with multiple examples of the same question or examples containing similar irrelevant information. This suggests deeper issues in their reasoning processes that cannot be easily mitigated through few-shot learning or fine-tuning.”
A recent TechCrunch article documented some seemingly simple mathematical calculations that LLMs get wrong. The article states, “Claude can’t solve basic word problems, Gemini fails to understand quadratic equations, and Llama struggles with straightforward addition.”
Why can’t LLMs do basic math? The problem, according to TechCrunch, is tokenization:
“The process of dividing data up into chunks (e.g., breaking the word ‘fantastic’ into the syllables ‘fan,’ ‘tas,’ and ‘tic’), tokenization helps AI densely encode information. But because tokenizers—the AI models that do the tokenizing—don’t really know what numbers are, they frequently end up destroying the relationships between digits. For example, a tokenizer might treat the number ‘380’ as one token but represent ‘381’ as a pair of digits (‘38’ and ‘1’).”
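To make the "380" versus "381" example concrete, here is a minimal sketch of a greedy longest-match tokenizer. The vocabulary below is hypothetical and deliberately tiny (real tokenizers learn vocabularies of tens of thousands of pieces from data), but it reproduces the behavior TechCrunch describes: one number survives as a single token while its neighbor is broken apart.

```python
# Toy greedy tokenizer: an illustrative sketch, NOT any real model's tokenizer.
# The vocabulary is hand-picked so that "380" is a known piece but "381" is not,
# mirroring the example from the TechCrunch quote.
VOCAB = {"380", "38", "1", "3", "8", "0"}

def tokenize(text: str) -> list[str]:
    """Greedy longest-match tokenization over the toy vocabulary."""
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest substring in the vocabulary starting at position i.
        for j in range(len(text), i, -1):
            piece = text[i:j]
            if piece in VOCAB:
                tokens.append(piece)
                i = j
                break
        else:
            # Unknown character: emit it as its own token.
            tokens.append(text[i])
            i += 1
    return tokens

print(tokenize("380"))  # ['380'] -- one token
print(tokenize("381"))  # ['38', '1'] -- split into two tokens
```

A model that sees "380" as one unit and "381" as two has no built-in sense that the two numbers differ by exactly one, which is why digit arithmetic trips up systems trained on tokenized text.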
Annoyingly, many people use the term "machine learning" when referring to regression analysis or some other form of statistical analysis. According to the University of California at Berkeley, machine learning has three components: 1) a decision process; 2) an error function; and 3) a model optimization process.
Regression analysis and most other forms of statistical analysis lack a model optimization process.
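The three components can be sketched in a few lines of code. This is an illustrative toy, not production code: the data, learning rate, and iteration count are made up, and the "model" is a single parameter fit by gradient descent. The point is the third component, the loop that repeatedly adjusts the model to shrink its error, which a one-shot regression fit does not have.

```python
# A minimal sketch of the three components in UC Berkeley's definition of
# machine learning. All numbers here are illustrative assumptions.
data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]  # (x, y) pairs where y = 2x

w = 0.0  # the model's single parameter, starting uninformed

def predict(w, x):
    # 1) Decision process: the model turns an input into a prediction.
    return w * x

def error(w):
    # 2) Error function: mean squared error over the data.
    return sum((predict(w, x) - y) ** 2 for x, y in data) / len(data)

# 3) Model optimization process: repeatedly nudge w downhill on the error.
learning_rate = 0.05
for _ in range(200):
    grad = sum(2 * (predict(w, x) - y) * x for x, y in data) / len(data)
    w -= learning_rate * grad

print(round(w, 3))  # converges toward 2.0, the true slope
```

An ordinary least-squares regression would solve for the slope in one closed-form step; what makes this "learning" in the Berkeley sense is the iterative optimization loop that improves the model from its own errors.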
Here’s the real-world problem: While “investment” results are generally trackable, “spending” results are not. For the vast majority of people, however, how they spend is a bigger determinant of their financial performance than investing is.
The other challenge is that we don’t only spend to optimize our financial performance. We spend to optimize our emotional performance. How is a machine learning model going to track that?
The instructions needed to provide financial advice and guidance involve many “clauses.” In other words, the goals and objectives for establishing financial advice and guidance are not simple and straightforward—and it’s these complex questions and instructions that genAI tools are not good at (according to Apple).
Bottom line: Banks and credit unions shouldn’t rely on AI to provide financial advice and guidance—right now. Maybe someday, but not now, and not for another five, maybe 10, years. If vendors claim they’re using machine learning, ask them about their model optimization process. If they claim to have a large language model, ask them how it overcomes math computation limitations.