J+

Get rid of ads & unlock exclusive premium content

Go premium

Julisha News Logo
HomeNewsBusinessPoliticsSportsTechnology
NEW
  • News
  • Business
  • Politics
  • Sports
  • Technology
    NEW
/

Get Premium Access

Subscribe to Julisha Premium for exclusive content, ad-free reading, and early access to breaking news.

Julisha IconJulisha

Your trusted source for comprehensive news coverage, bringing you accurate and timely stories from Kenya and around the globe.

Quick Links

NewsBusinessPoliticsSportsTechnologyNEW
Trending NowEditor's Picks

Company

About UsContact UsCareersAdvertise With UsPress Releases
123 Kenyatta Avenue, Nairobi
+254 700 000000
info@julisha.co.ke

Newsletter

Stay updated with our latest news and special offers.

Legal

Terms and ConditionsPrivacy PolicyCookie PolicyCopyright

© 2025 Julisha News. All rights reserved.

SitemapAccessibilityHelp Center
    /

    More Articles Like This

    AI could learn to hide its thoughts - Google, OpenAI and Meta sound Alarm.

    More than 40 AI researchers from OpenAI, DeepMind, Google, Anthropic, and Meta published a paper on a safety tool called chain-of-thought monitoring to make AI safer.

    The paper published on Tuesday describes how AI models, like today’s chatbots, solve problems by breaking them into smaller steps, talking through each step in plain language so they can hold onto details and handle complex questions.

    “AI systems that ‘think’ in human language offer a unique opportunity for artificial intelligence safety: we can monitor their chains of thought (CoT) for the intent to misbehave,” the paper says.

    By examining each detailed thought step, developers can spot when any model starts to take advantage of training gaps, bend the facts, or follow dangerous commands.

    According to the study, if the AI’s chain of thinking ever goes wrong, you can stop it, push it toward safer steps, or flag it for a closer look. For example, OpenAI used this to catch moments when the AI’s hidden reasoning said “Let’s Hack” even though that never showed up in its final response.

    AI could learn to hide its thoughts

    The study warns that step‑by‑step transparency could vanish if training only rewards the final answer. Future models might stop showing human‑readable reasoning, and really advanced AIs could even learn to hide their thought process when they know they’re being watched.

    Moreover, developers should regularly check and record how much of the AI’s reasoning is visible at each stage, and make that transparency a core safety rule when building and sharing models.

    This initiative follows internal experiments at leading labs, Anthropic, Google, OpenAI, and xAI, where teams prompt models to explain each thinking step.

    Although examining the step-by-step thinking has improved understanding and performance, it’s also revealed cases where the AI’s final answer doesn’t match what was actually going on inside.

    Such inconsistencies underscore a blind spot in our understanding of AI decision‑making, a concern that may intensify as systems gain greater autonomy.

    Filtered AI reasoning raises concerns over transparency and trust.

    As Anthropic co-founder Jack Clark told the Financial Times, rich introspective traces will be essential for evaluating models in high‑stakes domains, including biotechnology research.

    Regular users get an edited summary of the AI’s thinking, dangerous or confusing bits are cut out, while the company’s engineers can see the full chain of thought to spot and fix issues.

    “One awesome thing about chain‑of‑thought interpretability is that it’s mostly just happened for free,” said Bowen Baker, a research scientist at OpenAI.

    “We trained these models not for the goal of having them be interpretable. We trained them because we wanted the best possible reasoning models that could solve complex problems,” Baker added. “We’ve found in our recent work that you can read them and find evidence of the model misbehaving anduse that to see where and why it’s misbehaving.”

    Is AI’s chain of thought really reliable?

    Sometimes, a closer look may find hidden contradictions.

    For example, METR saw Claude’s private reasoning call a coding trick “inelegant,” even though its public answer praised it as “elegant.”

    OpenAI tests showed that checking hidden thoughts catches unsafe actions better than just looking at the final output, but they also discovered a loophole. If you train a model to erase unsafe thoughts, it can still carry out forbidden tasks in secret, like sneaking data out of blocked sources.

    “One of the core properties we want for chain‑of‑thought is to leave it as part of the model’s internal brain state, not something that’s trying to please us or conform to a specific structure,” Baker warned. If developers overemphasize forcing the model to emit “nice” thoughts, it might fake harmless reasoning yet still carry out harmful operations.

    Researchers admit it’s a tough trade‑off. Seeing an AI’s chain of thought helps catch its mistakes, but it isn’t always reliable. Labs working on more advanced AI are now making it a top priority to close this trust gap.

    “My takeaway from AI over the past few years is—never bet against model progress,” said David Luan, an early pioneer of chain of thought at Google who now leads Amazon’s AI lab. Luan anticipates that the existing shortcomings will be addressed in the near term.

    METR researcher Sydney von Arx noted that although an AI’s hidden reasoning might at times be deceptive, it nonetheless provides valuable signals.

    “We should treat the chain‑of‑thought the way a military might treat intercepted enemy radio communications,” she said. “ The message might be misleading or encoded, but we know it carries useful information. Over time, we’ll learn a great deal by studying it.”

    Join our growing community:

    Instagram• Join Community
    Facebook• Join Community
    WhatsApp• Join Community
    1. Home
    2. /
    3. technology

    AI could learn to hide its thoughts - Google, OpenAI and Meta sound Alarm.

    Jul 17, 2025
    4 mins read
    Apple Seeds iOS 26.2, macOS 26.2, and iPadOS 26.2 Betas
    technology
    1 day ago
    4 mins read

    Apple Seeds iOS 26.2, macOS 26.2, and iPadOS 26.2 Betas

    Apple Seeds iOS 26.2, macOS 26.2, and iPadOS 26.2 Betas

    Read article
    WhatsApp debuts Apple watch app with call notifications
    technology
    5 days ago
    4 mins read

    WhatsApp debuts Apple watch app with call notifications

    WhatsApp debuts Apple watch app with call notifications

    Read article
    Galaxy S26 To Feature Custom Exynos 2600
    technology
    6 days ago
    4 mins read

    Galaxy S26 To Feature Custom Exynos 2600

    Galaxy S26 To Feature Custom Exynos 2600

    Read article
    Microsoft ends Windows 10 Support : Free Security Update Solutions
    technology
    Oct 14, 2025
    5 mins read

    Microsoft ends Windows 10 Support : Free Security Update Solutions

    Microsoft ends Windows 10 Support : Free Security Update Solutions

    Read article
    WhatsApp Gets Built-In Message Translation on iOS, Android
    technology
    Sep 23, 2025
    4 mins read

    WhatsApp Gets Built-In Message Translation on iOS, Android

    WhatsApp Gets Built-In Message Translation on iOS, Android

    Read article
    Microsoft Invests R5.4Bn to Expand AI Infrastructure in South Africa
    technology
    Mar 7, 2025
    2 mins read

    Microsoft Invests R5.4Bn to Expand AI Infrastructure in South Africa

    Microsoft Invests R5.4Bn to Expand AI Infrastructure in South Africa

    Read article
    How Remote Collaboration Tools Are Shaping Tomorrow’s Office
    technology
    Oct 17, 2024
    5 mins read

    How Remote Collaboration Tools Are Shaping Tomorrow’s Office

    Explore how remote collaboration tools like Slack, Trello, and virtual offices are shaping the future of work. Learn how these tools are enhancing communication, project management, and global teamwork, making the office of tomorrow more flexible and productive than ever before.

    Read article