Kyu Song

Hi, I'm Kyu (Hyoung-Kyu), a Member of Technical Staff at Mirage (formerly Captions), where I work on diffusion-based video generation and short-form video understanding. My work focuses on generative AI for video and multimodal content, with an emphasis on making large models faster, more efficient, and more useful in real-world products.

Previously, I worked on efficient text-to-image models, large language models, multimodal AI, and talking-face generation at South Korean startups including Nota AI and MAUM.AI. My work is primarily product- and engineering-driven, but I also value academic service and active engagement with the research community. I was honored to be recognized as a CVPR 2025 Outstanding Reviewer.

Publications

Seeing Voices: Generating A-Roll Video from Audio with Mirage
Captions Team
Technical Report, 2025

Introduces Mirage, a diffusion-based model that generates realistic talking-head video directly from audio input, enabling scalable A-roll video creation.

BK-SDM: A Lightweight, Fast, and Cheap Version of Stable Diffusion
Bo-Kyeong Kim*, Hyoung-Kyu Song, Thibault Castells, Shinkook Choi
ECCV, 2024

Distills compact Stable Diffusion variants that are 30–50% smaller and faster, enabling more cost-efficient inference without sacrificing image quality, including deployment on NVIDIA Jetson devices.

LD-Pruner: Efficient Pruning of Latent Diffusion Models using Task-Agnostic Insights
Thibault Castells*, Hyoung-Kyu Song, Bo-Kyeong Kim, Shinkook Choi
CVPR Workshop (EDGE), 2024

Proposes a task-agnostic pruning method for latent diffusion models, achieving efficient compression without retraining on specific downstream tasks.

EdgeFusion: On-Device Text-to-Image Generation
Thibault Castells*, Hyoung-Kyu Song, Tairen Piao, Shinkook Choi, Bo-Kyeong Kim, Hanyoung Yim, Chang-gwun Lee, Jae Gon Kim, Tae-Ho Kim†
CVPR Workshop (EDGE), 2024

Demonstrates sub-second text-to-image generation on smartphones by combining model compression techniques, including knowledge distillation, step distillation, and quantization.

Shortened LLaMA: A Simple Depth Pruning for Large Language Models
Bo-Kyeong Kim*, Geonmin Kim, Tae-Ho Kim, Thibault Castells, Shinkook Choi, Junho Shin, Hyoung-Kyu Song
ICLR Workshop (ME-FoMo), 2024

Presents a straightforward depth pruning strategy for LLMs that removes entire transformer layers, yielding smaller models with minimal performance loss.

A Unified Compression Framework for Efficient Speech-Driven Talking-Face Generation
Bo-Kyeong Kim*, Jaemin Kang, Daeun Seo, Hancheol Park, Shinkook Choi, Hyoung-Kyu Song, Hyungshin Kim†, Sungsu Lim†
ICCV Demo, 2023

Unifies pruning, quantization, and knowledge distillation into a single framework to compress talking-face generation models for real-time inference.

Talking Face Generation with Multilingual TTS
Hyoung-Kyu Song*, Sang Hoon Woo*, Junhyeok Lee, Seungmin Yang, Hyunjae Cho, Youseong Lee, Dongho Choi, Kang-wook Kim
CVPR Demo, 2022 (🤗 Hugging Face Prize from Gradio CVPR Event)

Combines multilingual text-to-speech with lip-synced face generation, enabling talking-face videos in multiple languages from a single text input.

Deep User Identification Model with Multiple Biometric Data
Hyoung-Kyu Song*, Ebrahim AlAlkeem, Jaewoong Yun, Tae-Ho Kim, Hyerin Yoo, Dasom Heo, Myungsu Chae, Chan Yeob Yeun†
BMC Bioinformatics 21, 315, 2020

Proposes a deep learning model that fuses multiple biometric signals for robust user identification, outperforming single-modality baselines.