AI Feed - August 23, 2025

AI Bot Traffic Overload and the Sustainability of Web Scraping

Fastly's report reveals that AI crawlers account for 80% of AI bot traffic, with Meta (52%), Google (23%), and OpenAI (20%) being the largest contributors. The growth in AI bot traffic is unsustainable, posing operational and cost challenges for websites and content creators. OpenAI dominates AI fetcher requests with nearly 98% of the traffic. Recommendations include honoring robots.txt, publishing IP ranges, and using unique bot names. Alternative countermeasures like Anubis and Nepenthes are being employed, and Cloudflare is exploring a pay-per-crawl model.
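
To make the first recommendation concrete, here is a minimal sketch of how a crawler might check robots.txt before fetching a page, using Python's standard urllib.robotparser module. The bot name ExampleAIBot, the robots.txt rules, and the example.com URLs are hypothetical and do not come from the Fastly report.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: one named AI bot is kept out of /private/,
# every other user agent is allowed everywhere.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /private/

User-agent: *
Disallow:
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent, url in [
        ("ExampleAIBot", "https://example.com/private/page"),
        ("ExampleAIBot", "https://example.com/public"),
        ("OtherBot", "https://example.com/private/page"),
    ]:
        print(agent, url, is_allowed(ROBOTS_TXT, agent, url))
```

Per-bot rules like the one above only work when a crawler identifies itself with a unique, stable name, which is why the bot-naming recommendation goes hand in hand with honoring robots.txt.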

Relevant URLs:

https://www.theregister.com/2025/08/21/ai_crawler_traffic/

Ethical and Practical Concerns Arise around Generative AI Adoption

An MIT study finds that 95% of enterprise organizations have seen no measurable gains from generative AI investments, with most deployments only boosting individual productivity. The study attributes this to issues like brittle workflows and a lack of contextual learning. The report downplays fears of job losses, suggesting AI's impact will initially be on external costs. Experts recommend focusing on narrow use cases rather than expecting company-wide transformation. Zuckerberg has paused AI hiring at Meta amid concerns about an AI "bubble" and insufficient returns on investment.

Relevant URLs:

Multi-Modal Prompt Injection Attack Vector Discovered

Researchers have discovered a novel multi-modal prompt injection attack against AI systems that exploits image scaling. Prompt injections can be hidden in high-resolution images and revealed only after downscaling. This allows for data exfiltration, as demonstrated on systems like Google Gemini CLI and Vertex AI Studio. The attacks leverage common downscaling algorithms, and a tool called "Anamorpher" was developed to craft malicious images. Mitigations include avoiding image downscaling, previewing model inputs, and implementing robust prompt injection defenses.
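
The core idea can be illustrated with a toy example. This is a simplified sketch, not the Anamorpher tool or the researchers' method: it assumes nearest-neighbor downscaling by an integer factor on a small grayscale grid, whereas real attacks target the interpolation behavior of production image libraries. The function names and the 8x8 image are hypothetical.

```python
def downscale_nearest(image, factor):
    """Nearest-neighbor downscale: keep one pixel per factor x factor block."""
    return [row[::factor] for row in image[::factor]]


def embed(cover, payload, factor):
    """Overwrite only the pixels that downscale_nearest will sample."""
    crafted = [row[:] for row in cover]
    for y, payload_row in enumerate(payload):
        for x, value in enumerate(payload_row):
            crafted[y * factor][x * factor] = value
    return crafted


FACTOR = 4
cover = [[255] * 8 for _ in range(8)]   # 8x8 all-white "high-resolution" image
payload = [[0, 255], [255, 0]]          # 2x2 pattern standing in for hidden text

crafted = embed(cover, payload, FACTOR)

# Count how many pixels differ from the cover at full resolution.
changed = sum(
    crafted[y][x] != cover[y][x] for y in range(8) for x in range(8)
)

# What a model would actually receive after the pipeline downscales the image.
preview = downscale_nearest(crafted, FACTOR)

if __name__ == "__main__":
    print("pixels changed at full resolution:", changed, "of 64")
    print("downscaled preview:", preview)
```

At full resolution only a small fraction of pixels differ from the cover, but the downscaled result is exactly the payload. This is also why previewing the downscaled input, rather than the original upload, is one of the listed mitigations.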

Relevant URLs:

https://blog.trailofbits.com/2025/08/21/weaponizing-image-scaling-against-production-ai-systems/

State of Speaker Diarization: Technologies and Applications

Speaker diarization, which identifies "who spoke when" in an audio stream, now leverages deep neural networks to create robust speaker embeddings, enabling real-time applications. Modern pipelines involve Voice Activity Detection (VAD), Segmentation, Speaker Embeddings, Speaker Count Estimation, and Clustering and Assignment. Accuracy is measured by Diarization Error Rate (DER). Key trends include deep embeddings from multilingual data, bundled transcription APIs, and audio-visual diarization research. Solutions like NVIDIA Streaming Sortformer, AssemblyAI, Deepgram, and open-source libraries like SpeechBrain are prominent.
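
DER is commonly defined as the sum of false-alarm, missed-speech, and speaker-confusion time divided by the total reference speech time. The following is a minimal sketch of that calculation; the function name and the durations in the example are hypothetical and are not taken from any of the systems named above.

```python
def diarization_error_rate(
    false_alarm: float,
    missed: float,
    confusion: float,
    total_speech: float,
) -> float:
    """Compute DER from error durations, all in seconds.

    false_alarm:  non-speech labeled as speech
    missed:       speech the system failed to detect
    confusion:    speech attributed to the wrong speaker
    total_speech: total speech time in the reference annotation
    """
    if total_speech <= 0:
        raise ValueError("total_speech must be positive")
    return (false_alarm + missed + confusion) / total_speech


if __name__ == "__main__":
    # Hypothetical 10-minute recording with 60 seconds of combined errors.
    der = diarization_error_rate(
        false_alarm=12.0, missed=18.0, confusion=30.0, total_speech=600.0
    )
    print(f"DER = {der:.1%}")
```

Because false alarms are counted in the numerator but not in the denominator, DER can exceed 100% on recordings with little actual speech.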

Relevant URLs:

South Korea's Push for Sovereign AI Development

South Korea is strategically investing in developing Korean-language-focused LLMs to reduce reliance on foreign AI technologies. The Ministry of Science and ICT has initiated a 240 billion won program. Notable LLMs include SK Telecom's AX 3.1 Lite and Naver's HyperClova X Think. Seoul National University Hospital has created a specialized medical LLM. The South Korean LLM market is projected to reach $1.28 billion by 2030.

Relevant URLs:

https://www.marktechpost.com/2025/08/21/meet-south-koreas-llm-powerhouses-hyperclova-ax-solar-pro-and-more/

AWS CEO on the Value of Junior Developers in the Age of AI

AWS CEO Matt Garman rejects the idea of firing junior workers due to AI, emphasizing their low cost and high engagement with AI tools. He advocates for continuous hiring and education in software development, viewing AI as an educational aid. Garman dismisses code contribution percentage as a "silly metric," highlighting code quality. He recommends cultivating critical thinking and problem-solving skills for the AI era.

Relevant URLs:

https://www.theregister.com/2025/08/21/aws_ceo_entry_level_jobs_opinion/

DeepSeek V3.1 Released, Aims for High Performance at Lower Cost

DeepSeek has released DeepSeek-V3.1, a new large language model with a "Hybrid Thinking Mode" and a 128K token context window. It is positioned as an open, more efficient AI model. Reported benchmarks show it matching or exceeding state-of-the-art performance, especially in coding and math. It is also designed for tool calling and agent tasks. The model is open-source under the MIT license.

Relevant URLs:

Google Details AI Energy Efficiency Efforts

Google reports a 33x reduction in the energy cost of AI queries within one year, attributing this to software optimizations and efficient custom AI accelerators. A median Gemini Apps text prompt consumes 0.24 watt-hours of energy, emits 0.03 gCO2e, and uses 0.26 ml of water. The analysis considers energy from CPUs, AI accelerators, memory, and data center operations. Google advocates for comprehensive AI energy measurement frameworks.
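
To put the per-prompt figures in perspective, the sketch below scales them to a batch of prompts. It treats the reported medians as if they were per-prompt averages, which is an assumption made only for illustration; the function name and the one-million-prompt batch size are hypothetical.

```python
# Reported median figures for one Gemini Apps text prompt.
ENERGY_WH_PER_PROMPT = 0.24   # watt-hours
CO2E_G_PER_PROMPT = 0.03      # grams of CO2-equivalent
WATER_ML_PER_PROMPT = 0.26    # milliliters


def footprint(prompts: int) -> dict:
    """Scale the per-prompt figures linearly to a number of prompts.

    Assumption: the median is used as a stand-in for the mean.
    """
    return {
        "energy_kwh": prompts * ENERGY_WH_PER_PROMPT / 1000,
        "co2e_kg": prompts * CO2E_G_PER_PROMPT / 1000,
        "water_l": prompts * WATER_ML_PER_PROMPT / 1000,
    }


totals = footprint(1_000_000)

if __name__ == "__main__":
    for name, value in totals.items():
        print(f"{name}: {value:.1f}")
```

Under that assumption, one million prompts come to roughly 240 kWh of energy, 30 kg of CO2e, and 260 liters of water.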

Relevant URLs:

Google Enhances AI Mode in Search with Agentic and Personalized Features

Google is enhancing its AI Mode in Search with new agentic capabilities, personalized recommendations, and expanded global availability. The new agentic features allow AI Mode to streamline complex tasks, such as restaurant reservations. Personalized recommendations will leverage user preferences and past interactions. AI Mode is now expanding to over 180 new countries and territories in English.

Relevant URLs:

https://blog.google/products/search/ai-mode-agentic-personalized/

Marine Corps Outsmarts AI Surveillance Systems

A DARPA experiment showed how Marines evaded an advanced AI surveillance system by employing creative tactics like using cardboard boxes and somersaulting, exploiting "distributional shift." This highlights AI's limitations in adapting to scenarios outside its training data. The experiment demonstrates that while AI processes data rapidly, it lacks the broader understanding, creative thinking, and adaptability that humans possess.

Relevant URLs:

https://rudevulture.com/marines-managed-to-get-past-an-ai-powered-camera-undetected-thanks-to-hiding-in-boxes/

GitHub MCP Token Limitation Prompts CLI Alternative

Geoffrey Huntley highlights the high token cost of using the Model Context Protocol (MCP) with LLMs, which reduces the effective context window. The system prompt consumes a large number of tokens, and using existing CLI tools instead of MCP gives LLMs a more token-efficient way to access the same functionality.

Relevant URLs:

https://simonwillison.net/2025/Aug/22/too-many-mcps/#atom-everything

AI Coding Tools and Techniques

Reflections from developers on integrating LLM coding tools highlight the balance between the potentially high gains from these models and the difficulty of putting their output to use. Hallucinations mean that human quality assurance is still needed, and errors in code, logic, and integration tests have to be debugged with human guidance. Furthermore, Coinbase made the bold decision to fire employees who fail to use generative AI tools such as GitHub Copilot or Cursor, a sign that using generative AI to write code may become mandatory within engineering organizations.

Relevant URLs:

OpenAI Introduces Project-Only Memory for ChatGPT

OpenAI has introduced a "project-only memory" feature for ChatGPT. This feature allows ChatGPT to use context from other conversations within a specified project while preventing it from using saved memories from outside the project.

Relevant URLs:

https://simonwillison.net/2025/Aug/22/project-memory/#atom-everything

AI-Powered Coding Platform Automates Software Development

DeepCode, an open-source AI-powered coding platform from the University of Hong Kong, automates software development by converting research papers and technical documents into production-ready code. It utilizes a multi-agent AI system for end-to-end code generation. Key features include Paper2Code, Text2Web, Text2Backend, and Quality Assurance Automation.

Relevant URLs:

https://www.marktechpost.com/2025/08/21/deepcode-an-open-agentic-coding-platform-that-transforms-research-papers-and-technical-documents-into-production-ready-code/

YouTube Delivers Real-Time Generative AI Effects on Mobile Devices

YouTube delivers real-time generative AI effects for YouTube Shorts by using knowledge distillation and on-device optimization with MediaPipe. A teacher-student model approach is used with Pivotal Tuning Inversion (PTI) employed to preserve user identity. The on-device pipeline is built with MediaPipe and requires performance of at least 30 frames per second.

Relevant URLs:

https://research.google/blog/from-massive-models-to-mobile-magic-the-tech-behind-youtube-real-time-generative-ai-effects/

Bluesky Blocks Access from Mississippi IP Addresses Due to Age Assurance Law

Bluesky has blocked access from Mississippi IP addresses due to the state's age assurance law, HB1126. This action is a direct response to Mississippi's unique age verification mandate.

Relevant URLs:

https://simonwillison.net/2025/Aug/22/mississippi/#atom-everything

Website Down Due to a Cloudflare Verification

Data from Rootly's website is inaccessible because of a Cloudflare verification. A JavaScript bot-detection check requires manual human verification before it returns a response.

Relevant URLs:

https://rootly.com/blog/ai-sre-needs-more-than-ai-it-needs-operational-context

Website's "Terms of Service" and "Privacy Policy"

The graph.cx website directs users to its "Terms of Service" and "Privacy Policy" pages.

Relevant URLs:

https://www.graph.cx

Reuters Site Showing Unavailable Data

The article data is unavailable because of CAPTCHA or bot-protection measures served from the geo.captcha-delivery.com domain.

Relevant URLs:

https://www.reuters.com/business/meta-freezes-ai-hiring-wsj-reports-2025-08-21/

Medium Website Error

The Medium page returned a website error; troubleshooting is required to check the status of the site.

Relevant URLs:

https://medium.com/commbank-technology/the-evolution-of-ai-software-engineering-75a8a5a02c14

AI News Feed

These are AI-generated summaries I use to keep tabs on daily news.

