AI Guide by Zaiq

AI in 2026: what is actually true, for South African business

There is a lot of noise about AI, and most of it is either a sales pitch or a panic. We are two third-year Electrical and Information Engineering students at Wits who went all-in on using the sharpest AI on earth to solve business problems, so we read the benchmarks for a living. This page is the sober version: what is genuinely true in 2026, what each headline number means, and what a South African business owner should do about it. Every claim has its source next to it.

The short version

The models are far more capable than the way the average business uses them would suggest. The top models now resolve over 70% of the real GitHub bugs in SWE-bench Verified, up from about a third in mid-2024. On GPQA Diamond, a set of graduate-level science questions, the top models score over 85% where domain-expert PhDs score about 65%. In 2025 Google DeepMind’s system was officially graded at gold-medal standard at the International Mathematical Olympiad. These are not press releases. They are measured results.

And yet MIT found in 2025 that 95% of corporate generative-AI pilots showed no measurable return. Both things are true. The capability is real and most rollouts are poor, because a genuinely powerful tool pointed at a vague problem with no owner produces a slide deck, not a result.

The benchmarks that matter, in plain language

A benchmark is just a standardised exam for AI models. Most of the scary or exciting headlines trace back to one of these. Here is what each actually tests and what it tells you.

The AI benchmarks that matter in 2026, and what each one means
Benchmark | What it tests | The 2026 number | What it means for you
SWE-bench Verified | Fixing real software bugs from GitHub | Over 70% solved (about a third in mid-2024) | AI can do real engineering work, not just toy demos
GPQA Diamond | Graduate-level science questions | Top models 85%-plus vs about 65% for expert PhDs | On hard reasoning, the ceiling is now above human experts
IMO (Maths Olympiad) | Elite competition mathematics | Gold-medal standard (DeepMind, 2025) | Frontier reasoning is real; it is not just autocomplete
MIT pilot study | Did corporate AI projects pay off? | 95% showed no measurable return (2025) | The tool is not the problem; aim and ownership are
BCG growth study | Do AI leaders outperform? | About 1.7x faster revenue growth (2025) | Using AI well is now a measurable competitive edge

Sources cited in-text. Benchmarks are evidence the ceiling moved, not a promise about your specific task. Always test on your own work.

We break each of these down properly, including where they quietly mislead, in “AI benchmarks explained for a normal person”.

What this means for a South African business

The numbers above are global. The local picture sharpens them. McKinsey (2025) estimated generative AI could add 61 to 103 billion US dollars a year across Africa. At the same time, 84% of large South African corporates say they cannot find the critical skills they need (Xpatweb, 2025). That is the whole opportunity in two lines: the value is enormous and the people who can capture it are scarce.

For a business owner, three things follow from the evidence:

  • The capability is real, so stop waiting for proof. SWE-bench and GPQA settle the “can it actually do useful work” question. It can.
  • The average rollout fails, so aim narrowly. The 95% that got nothing (MIT, 2025) almost always tried “AI for the business” instead of “AI for this one expensive, repetitive task”.
  • The edge is competitive, not optional. A 1.7x growth gap (BCG, 2025) compounds. The cost of sitting it out is not zero; it is your rivals pulling ahead.
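To make “compounds” concrete, here is a minimal sketch of the arithmetic. The starting revenue (an index of 100) and the laggard’s 5% annual growth are assumptions picked purely for illustration, and we read “1.7x faster” as a growth rate 1.7 times as high; none of these figures come from the BCG study itself.

```python
def revenue_after(start: float, annual_growth: float, years: int) -> float:
    """Revenue after compounding a constant annual growth rate."""
    return start * (1 + annual_growth) ** years


# Assumptions for illustration only (not BCG figures):
# both firms start at an index of 100, the laggard grows 5% a year,
# and the leader's growth rate is 1.7 times the laggard's (8.5%).
START = 100.0
LAGGARD_GROWTH = 0.05
LEADER_GROWTH = LAGGARD_GROWTH * 1.7

for years in (1, 5, 10):
    laggard = revenue_after(START, LAGGARD_GROWTH, years)
    leader = revenue_after(START, LEADER_GROWTH, years)
    gap = leader - laggard
    print(f"year {years:>2}: laggard {laggard:6.1f}, leader {leader:6.1f}, gap {gap:5.1f}")
```

Under these assumed numbers the gap is about 3.5 index points after one year and roughly 63 after ten. The exact figures depend entirely on the growth rate you plug in; the point is only that the gap widens every year rather than staying fixed.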

What AI is genuinely good at, and what it is not

AI is strong on work that is repetitive, language-heavy, and checkable: drafting, coding, summarising long documents, classifying, extracting data, answering the same customer questions, and searching at a speed no human matches. It is weak where judgement, accountability, and high cost-of-error live. A model graded at gold-medal standard in maths (DeepMind, 2025) still cannot be trusted to run your payroll unsupervised, because a benchmark rewards being right on a clean task and your business punishes being confidently wrong on a messy one.

The practical test is simple. If a task is repetitive, mostly language or data, and a human can quickly check the output, AI is probably ready for it today. If the task needs accountable judgement or a wrong answer is expensive and hard to catch, keep a human in the loop. We walk through where the old way of working has genuinely been overtaken, and where it has not, in “what AI just made obsolete”.
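The practical test can be written down as a small checklist. This is a rough sketch, not a formal method: the Task fields and the triage function are hypothetical names we made up for illustration, and the third “unclear” outcome is our own addition for tasks that match neither rule cleanly.

```python
from dataclasses import dataclass


@dataclass
class Task:
    """Hypothetical description of a business task, answered yes/no by its owner."""
    name: str
    repetitive: bool
    language_or_data: bool
    quick_to_check: bool
    needs_accountable_judgement: bool
    costly_hidden_errors: bool  # a wrong answer is expensive and hard to catch


def triage(task: Task) -> str:
    # High-stakes tasks are filtered out first, whatever else is true of them.
    if task.needs_accountable_judgement or task.costly_hidden_errors:
        return "keep a human in the loop"
    # All three conditions from the practical test must hold.
    if task.repetitive and task.language_or_data and task.quick_to_check:
        return "probably ready for AI today"
    # Assumption: anything in between gets a small trial before a decision.
    return "unclear: test on a small sample first"


faq_replies = Task("answering repeat customer questions", True, True, True, False, False)
payroll = Task("running payroll unsupervised", True, True, False, True, True)

print(faq_replies.name, "->", triage(faq_replies))
print(payroll.name, "->", triage(payroll))
```

Checking the high-stakes conditions first is a deliberate choice: a task can be repetitive and language-heavy and still be one where a confident wrong answer is too expensive to leave unsupervised.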

Why “two engineers and AI” now beats a big team

The most underrated 2026 fact is an economic one. When the tooling does the heavy lifting, output stops scaling with headcount. WhatsApp had about 55 employees when it was sold for 19 billion US dollars in 2014; the leverage that made that possible is now in the tools themselves. A small, technical team pointing frontier AI at a problem can out-ship a large, layered one, because most of a big team’s cost is coordination and markup, not the work. We put real numbers on that in “two engineers and AI versus an agency”.

That is the bet we made. Two engineers who actually ship, using the best AI on earth on a fixed price: that is Zaiq.

Questions people ask

What can AI actually do well in 2026?

Narrow, verifiable, language-heavy work. The top models now resolve over 70% of the real software bugs in SWE-bench Verified and beat expert PhDs on graduate science questions (GPQA Diamond, 85%-plus versus about 65%). AI drafts, codes, summarises, classifies, and searches at superhuman speed. It is weakest at judgement, accountability, and anything where being confidently wrong is expensive.

Is AI overhyped in 2026?

The capability is not; the corporate results often are. MIT found 95% of company AI pilots in 2025 showed no measurable return. The gap is not the model; it is pointing a genuinely capable tool at a vague problem with no owner. The benchmarks are real and the average rollout is poor at the same time.

Will AI replace jobs in South Africa?

It replaces tasks faster than whole jobs. The sharper risk for an SA business is competitive: BCG (2025) found AI leaders grow revenue about 1.7x faster than laggards. AI does not replace your business. A competitor who uses it well can take your customers while you debate whether it is hype.

What AI benchmark should a business owner actually care about?

None of them directly, and all of them as a signal. SWE-bench Verified (real bug fixes) and GPQA Diamond (expert-level reasoning) tell you the ceiling is high. What matters for you is whether a specific task in your business is the kind AI is now good at: repetitive, language-heavy, and checkable. If it is, the benchmarks say it is worth trying, starting with a test on your own work.

How big is the AI opportunity for African business?

McKinsey (2025) estimated generative AI could add 61 to 103 billion US dollars a year across Africa. At the same time, 84% of large South African corporates say they cannot find the skills they need (Xpatweb, 2025). The opportunity is large and the talent to capture it is scarce, which is the whole story.

Where do AI benchmarks mislead people?

They measure a clean, scored task, not your messy one. A model that fixes 70% of curated bugs will not fix 70% of yours unmonitored, and a gold medal at the Maths Olympiad (DeepMind, 2025) does not mean it can run your accounts. Treat a benchmark as proof the ceiling moved, then test on your own work.
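“Test on your own work” can be as simple as scoring a model against a handful of your own examples where you already know the right answer. The sketch below is a minimal illustration under loud assumptions: pass_rate, stub_model, and exact_match are hypothetical names, the sample cases are invented, and stub_model is a placeholder where a call to a real AI system would go.

```python
from typing import Callable, Sequence


def pass_rate(
    cases: Sequence[tuple[str, str]],
    run_model: Callable[[str], str],
    is_correct: Callable[[str, str], bool],
) -> float:
    """Fraction of (input, expected) cases where the model's output passes the check."""
    if not cases:
        raise ValueError("need at least one test case")
    passed = sum(
        1 for task_input, expected in cases if is_correct(run_model(task_input), expected)
    )
    return passed / len(cases)


def stub_model(task_input: str) -> str:
    # Placeholder standing in for a real AI call: it just upper-cases the input.
    return task_input.upper()


def exact_match(output: str, expected: str) -> bool:
    # Simplest possible check; real tasks may need a looser or human review step.
    return output.strip() == expected.strip()


# Invented sample cases: (input from your own work, answer you know is right).
sample_cases = [
    ("inv-001", "INV-001"),
    ("inv-002", "INV-002"),
    ("inv-003", "INV-0O3"),  # deliberately mismatched, so one case fails
]

rate = pass_rate(sample_cases, stub_model, exact_match)
print(f"pass rate on our own cases: {rate:.0%}")
```

Here two of the three invented cases pass. A real check would use more cases drawn from actual work, and the number that comes out is the one that matters for the decision, not the benchmark headline.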