When Zhang Zetian, the investor and socialite widely known as "Milk Tea Sister," published a filmed conversation with veteran actress Carina Lau this month, the episode instantly became a how-to manual for a new media phenomenon: video podcasts. Zhang’s channel on the audio app Xiaoyuzhou and the video cut posted to Xiaohongshu drew nearly 400,000 subscribers across platforms within days, a reminder that celebrity cachet can still jump-start attention in an increasingly crowded content market.
Her arrival is the latest in a string of high‑profile moves: entrepreneurs and hosts such as Luo Yonghao, Chen Luyu and talent manager Yang Tianzhen have all launched filmed discussion shows in the past year. Video podcasts are being championed by platforms including Bilibili and Xiaohongshu, which have rolled out creator funds, AI editing tools and monetization incentives. Those efforts have fuelled rapid consumption: Bilibili reported 259 billion minutes of video‑podcast viewing in Q1 2025, up more than 270% year on year, and a user base topping 40 million.
For platforms, the arithmetic is simple. Longform filmed conversations keep users on site for hours, and the demographic tilt (urban, higher‑earning listeners concentrated in first‑ and new‑first‑tier cities) matches advertisers’ sweet spot. An Ipsos industry report last year found that 45% of Chinese podcast listeners live in those urban tiers and 65% earn more than 8,000 yuan a month, a cohort more likely to buy cars, tech and luxury goods and to accept brand integrations within editorial content.
But the boom conceals sharp frictions. Video production multiplies costs: multi‑camera shoots, lighting, styling and editing demand teams and cash. Brands that signed annual contracts with production houses have in some cases paid termination fees after a single episode because the costs proved hard to justify against the returns. Industry executives caution that virality is not a function of format alone; Luo Yonghao’s success, for example, is as much about his polarising persona as it is about video. Many independent audio creators lack the on‑camera presence, production chops or capital to scale successfully.
The format’s creative promise is nevertheless real. Video adds visual credibility and helps humanise guests—viewers say filmed conversations can change perceptions of entrepreneurs and companies—and it creates new native advertising formats that audio alone cannot deliver. Yet early data also show most consumption remains auditory: a majority of episodes are listened to rather than watched end‑to‑end, meaning creators must still optimise for voice and pacing even when investing in video.
The medium’s future will depend on a handful of structural outcomes. Platforms must balance subsidies with sustainable ad and sponsorship markets; brands need clearer metrics for long‑term value; and creators must decide whether to specialise in audio, invest in video, or attempt both. Absent those adjustments, the current rush risks producing a crowded middle, with only a handful of host‑driven IPs and well‑funded institutional producers capturing outsized attention and revenue.
For international observers, the episode is notable for what it reveals about Chinese digital media dynamics: a maturing market where attention is monetised via longer formats and where platforms increasingly compete for affluent, influential users. The experiment will test whether longform conversation can be the next big conduit for persuasion, commerce and cultural influence in China’s fast‑evolving content economy.
