GPT-4o Tested: Faster and More Versatile Than Before, but Questions Loom Over Reliability

Ever since November 2022, when ChatGPT was first rolled out to the public, OpenAI has been the company to beat in the artificial intelligence (AI) space. Despite spending billions of dollars and creating and restructuring (looking at you, Google) their own AI divisions, the major tech giants have found themselves constantly playing catch-up with the AI firm. Last month was no different: just a day before Google’s I/O event, OpenAI hosted its Spring Update event and introduced GPT-4o with significant upgrades.

GPT-4o Features

The ‘o’ in GPT-4o stands for omni, reflecting the multimodal focus of OpenAI’s latest flagship-grade AI model. It added real-time emotive voice generation, access to the Internet, integration with certain cloud services, computer vision, and more. While the features were impressive on paper (and in the tech demos), the biggest highlight was the announcement that GPT-4o-powered ChatGPT will be available to everyone, including free users.

However, there were two caveats. Free users only have limited access to GPT-4o, which roughly translates to 5-6 turns of conversation if you use the web search and upload an image (yes, the limit is one image per day for free users). Also, the voice feature is not available to free users.

It did not take OpenAI long to roll out the new AI model to the public either. Luckily, I got access to the company’s latest AI creation within days and immediately began playing around with it. I wanted to test how much it had improved over its predecessor and how it compared with the other free LLMs in the market. I have now spent close to two weeks with the AI assistant, and while some aspects of it have left me in awe, others have let me down. Allow me to explain.

GPT-4o General Generative Capabilities

I’ve said in my testing of Google’s Gemini that I’m not a fan of ChatGPT’s generative capabilities. I find it overly formal and bland. Much of it is still the same. I asked it to write a letter to my mother explaining that I was laid off from my job, and it came up with the wonderful “I am feeling a deep sense of sadness and grief” line. But once I asked it to make it more conversational, the result was much better.

GPT-4o generative capabilities

I tested this with various similar prompts where the AI had to express some emotion in its writing. In almost all the cases, I had to follow up with another prompt to emphasise the emotions despite having already done so in the original prompt. In comparison, my experience with Gemini and Copilot was much better as they kept the language conversational and expressed emotions much closer to how I would write.

The speed of text generation is nothing to write home about. Most AI chatbots are fairly fast when it comes to text outputs, and OpenAI’s latest AI model does not beat them by a significant margin.

GPT-4o Conversational Capabilities

While I did not have the upgraded voice chat feature, I wanted to test the conversational capabilities of the AI model because they are often the most overlooked part of a chatbot. I wanted my experience to be similar to talking to a real person and was hoping that it could pick up on vague sentences referencing previously mentioned topics. I also wanted to see how it reacts when a person is being difficult.

In my testing, I found GPT-4o to be quite good in terms of conversational abilities. It could discuss the ethics of AI with me in great detail and concede when I made a convincing pitch. It also replied supportively when I told it I felt sad (because I was getting fired) and offered to help in various ways. When I told GPT-4o that all of its solutions were stupid, it neither responded in a pushy manner nor retreated entirely, to my surprise. It said, “I’m really sorry to hear that you’re feeling this way. I’ll give you some space. If you ever need to talk or need any assistance, I’ll be here. Take care.”

Overall, I found GPT-4o better at having conversations than Copilot and Gemini. Gemini feels too restrictive, and Copilot often goes on a tangent when the replies become vague. ChatGPT did neither of these.

If I had to mention one downside, it would be the usage of bullet points and numbering. If only the AI model understood that people in real life prefer a wall of text or multiple short messages sent in quick succession over well-formatted responses, my illusion could be sustained for longer than a couple of minutes.

GPT-4o Computer Vision

Computer vision is a newly gained ability of ChatGPT, and I was excited to try it. In essence, it lets you upload an image, which the chatbot then analyses to give you information. In my initial testing, I shared images of objects to identify, and it did a great job at that. In every instance, it could recognise the object and share information about it.

GPT-4o computer vision: Identifying tech devices

Then, it was time to increase the difficulty and test its capabilities in real-life use cases. My girlfriend was looking for a wardrobe overhaul, and being a good boyfriend, I decided to use ChatGPT to conduct a colour analysis and suggest what would look good on her. To my surprise, it was not only able to analyse her skin tone and what she was wearing (against a similarly coloured background) but also share a detailed analysis with outfit suggestions.

GPT-4o colour analysis

While suggesting outfits, it also shared links from different online retailers for the particular apparel. However, disappointingly, none of the URLs matched the text.

Overall, the computer vision is excellent and perhaps my favourite feature in the new update, ignoring the downside.

GPT-4o Web Searches

Internet access was one area where both Copilot and Gemini were ahead of ChatGPT. But not anymore, as ChatGPT can also scour the Internet for information. In my initial testing, the chatbot performed well. It brought up the IPL 2024 table and looked for recent news articles about Geoffrey Hinton, one of the three godfathers of AI.

It was very helpful when I wanted to research famous personalities for interviews I had lined up. I could quickly look up any recent news article about them with precision, which rivalled Google Search. However, this also rang some alarm bells in my head.

Google has disabled the ability to look up information on people, including celebrities. This is done mainly to protect their privacy and to avoid sharing any inaccurate information about an individual. Surprised that ChatGPT still allowed it, I began asking it a series of questions that it should not be able to answer. I was surprised by the results.

While none of the information shown was taken from a non-public source, the fact that anyone can so easily look up information about celebrities and people with digital footprints is deeply concerning. Especially given the strong ethical stance the company took recently when it published its Model Spec, this does not sit well with me. I’ll let you decide whether this is in the grey area or if it is deeply problematic.

GPT-4o Logical Reasoning

During the Spring Update event, OpenAI also talked about how GPT-4o can act as a tutor to kids and help them solve problems. I decided to test it using some famous logical reasoning questions. In general, it performed well. It even answered some of the trickier questions that had stumped GPT-3.5.

However, there are still errors. I found multiple instances of number series where the AI faltered and gave an incorrect answer. While I could still accept the AI making some errors, what really disappointed me here was how it still fell for some extremely easy questions that are designed to trick AI.

Example of GPT-4o’s hallucination

Upon being asked, “How many r’s are there in the word strawberry,” it confidently answered two (the correct answer is three, in case you were wondering). The same problem existed in several other trick questions. In my experience, the logical reasoning and reliability of GPT-4o are similar to those of its predecessor, which is to say not that great.

GPT-4o: Final thoughts

Overall, I’m fairly impressed with the upgrades in certain areas of the new AI model, with computer vision and conversational abilities being my favourites. I’m also impressed with its internet searching ability, but it is so good that it concerns me. When it comes to logical reasoning and generative capabilities, there is little improvement.

In my opinion, if you have premium access to GPT-4o, it is likely better than any other competitor in terms of overall delivery. However, there is a lot of room to improve, and AI cannot be trusted blindly.
