Hyoung-Kyu Song

Hello, I'm Hyoung-Kyu Song. I work as a Member of Technical Staff at Mirage (formerly Captions), where I work on diffusion-based video generation and short-form video understanding. I focus on generative AI for video and multimodal content, with an emphasis on making large models faster, more efficient, and more useful in real products.

Previously, I worked at Korean startups including Nota AI and MAUM.AI on efficient text-to-image models, large language models, multimodal AI, and talking-face generation. Although my work is primarily product- and engineering-driven, I also value academic activity and engagement with the research community. I was honored to be selected as a CVPR 2025 Outstanding Reviewer.

This website improves itself automatically through a multi-agent pipeline that proposes, tests, and implements UI enhancements.


Publications

Mirage paper thumbnail
Seeing Voices: Generating A-Roll Video from Audio with Mirage
Captions Team
Technical Report, 2025

Introduces Mirage, a diffusion-based model that generates realistic talking-head video directly from audio input, enabling scalable A-roll video creation.

BK-SDM paper thumbnail
BK-SDM: A Lightweight, Fast, and Cheap Version of Stable Diffusion
Bo-Kyeong Kim*, Hyoung-Kyu Song, Thibault Castells, Shinkook Choi
ECCV, 2024

Distills compact Stable Diffusion variants that are 30–50% smaller and faster, enabling more cost-efficient inference without sacrificing image quality, including deployment on NVIDIA Jetson devices.

LD-Pruner paper thumbnail
LD-Pruner: Efficient Pruning of Latent Diffusion Models using Task-Agnostic Insights
Thibault Castells*, Hyoung-Kyu Song, Bo-Kyeong Kim, Shinkook Choi
CVPR Workshop (EDGE), 2024

Proposes a task-agnostic pruning method for latent diffusion models, achieving efficient compression without retraining on specific downstream tasks.

EdgeFusion paper thumbnail
EdgeFusion: On-Device Text-to-Image Generation
Thibault Castells*, Hyoung-Kyu Song, Tairen Piao, Shinkook Choi, Bo-Kyeong Kim, Hanyoung Yim, Chang-gwun Lee, Jae Gon Kim, Tae-Ho Kim†
CVPR Workshop (EDGE), 2024

Demonstrates sub-second text-to-image generation on smartphones by combining model compression techniques, including knowledge distillation, step distillation, and quantization.

Shortened LLaMA paper thumbnail
Shortened LLaMA: A Simple Depth Pruning for Large Language Models
Bo-Kyeong Kim*, Geonmin Kim, Tae-Ho Kim, Thibault Castells, Shinkook Choi, Junho Shin, Hyoung-Kyu Song
ICLR Workshop (ME-FoMo), 2024

Presents a straightforward depth pruning strategy for LLMs that removes entire transformer layers, yielding smaller models with minimal performance loss.

Compressed Wav2Lip paper thumbnail
A Unified Compression Framework for Efficient Speech-Driven Talking-Face Generation
Bo-Kyeong Kim*, Jaemin Kang, Daeun Seo, Hancheol Park, Shinkook Choi, Hyoung-Kyu Song, Hyungshin Kim†, Sungsu Lim†
ICCV Demo, 2023

Unifies pruning, quantization, and knowledge distillation into a single framework to compress talking-face generation models for real-time inference.

Multilingual Talking Face paper thumbnail
Talking Face Generation with Multilingual TTS
Hyoung-Kyu Song*, Sang Hoon Woo*, Junhyeok Lee, Seungmin Yang, Hyunjae Cho, Youseong Lee, Dongho Choi, Kang-wook Kim
CVPR Demo, 2022 (🤗 Hugging Face Prize from Gradio CVPR Event)

Combines multilingual text-to-speech with lip-synced face generation, enabling talking-face videos in multiple languages from a single text input.

Biometric identification paper thumbnail
Deep User Identification Model with Multiple Biometric Data
Hyoung-Kyu Song*, Ebrahim AlAlkeem, Jaewoong Yun, Tae-Ho Kim, Hyerin Yoo, Dasom Heo, Myungsu Chae, Chan Yeob Yeun†
BMC Bioinformatics 21, 315, 2020

Proposes a deep learning model that fuses multiple biometric signals for robust user identification, outperforming single-modality baselines.