Animal World: A Cross-Species Dataset for Social Animal Pose Understanding

Scene-centric, real-world animal imagery with rich ecological complexity — multi-animal interactions, inter-species co-occurrence, age diversity, and rare species.

Xuyi Hu1* Jin Lyu2* Shaojie Zhang1 Ke Ma1 Houtianfu Wang1 Siwei Liu1
Jiuming Liu1 Silvia Zuffi4 Jiachen Zhao3 Liang An3† Stefan Goetz1
1University of Cambridge 2Southern University of Science and Technology 3Tsinghua University 4IMATI-CNR, Milan, Italy

*Equal contribution   †Corresponding author

Animal pose understanding in the wild remains challenging due to large morphological variation, frequent occlusion, and complex social interactions. Existing benchmarks are often limited to particular species, sparse pose definitions, or isolated individuals. Animal World introduces a unified 30-keypoint taxonomy and large-scale annotations spanning 124 species in natural environments, enabling evaluation of pose estimation under socially complex, scene-centric conditions.

10,036 Labelled images
18,703 Annotated instances
124 Species covered
30 Unified keypoints

Overview

From single animals to crowded social groups — intra-species herds, cross-species scenes, age-diverse groups, rare species, and complex social behaviors. Animal World shifts the focus from isolated animal pose localization to scene-centric social pose understanding.

Overview of Animal World. Representative source images with instance-level pose annotations across diverse ecological scenarios.

Cross-Species Interaction

409 cross-species images covering 94 unique species combinations. Animals with different body sizes, shapes, and textures appear in the same scene, requiring robust instance association under heterogeneous morphology.
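The page does not say exactly how "unique species combinations" are counted. A minimal sketch, assuming each combination is the unordered set of distinct species annotated in a cross-species image (so a three-species scene yields one combination, not three pairs). The dictionary layout and the function name `cross_species_stats` are hypothetical and do not describe the dataset's annotation format:

```python
from collections import Counter


def cross_species_stats(image_species):
    """Return (number of cross-species images, Counter of species combinations).

    `image_species` maps an image id to the species label of every annotated
    instance in that image (hypothetical layout). An image counts as
    cross-species when it contains at least two distinct species; its
    combination is the unordered set of those species.
    """
    combos = Counter()
    for species_list in image_species.values():
        distinct = frozenset(species_list)
        if len(distinct) >= 2:
            combos[distinct] += 1
    return sum(combos.values()), combos


example = {
    "img_001": ["zebra", "zebra", "wildebeest"],
    "img_002": ["lion", "lion"],  # intra-species only, not counted
    "img_003": ["wildebeest", "zebra"],
    "img_004": ["elephant", "giraffe", "zebra"],
}
n_images, combos = cross_species_stats(example)
print(n_images, len(combos))  # 3 2
```

Under this definition, 409 images and 94 combinations would mean that many scenes share the same species pairing (for example, the same predator and prey).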

Age Diversity

1,918 age-diverse images including juveniles, adults, and mother–offspring pairs, enabling evaluation of age-sensitive pose understanding beyond simple changes in scale or depth.

Complex Social Behaviors

Scenes include chasing, fighting, grooming, mating, parental care, group movement, and collective foraging, in which poses are shaped by interaction rather than by independent motion alone.

Animal World was collected from 400+ 1080p animal videos. Frames are sparsely sampled with a large temporal stride to reduce near-duplicates and increase diversity across poses, species, viewpoints, and social configurations. Each visible animal is annotated independently with a unified 30-keypoint taxonomy, together with instance-level segmentation masks and bounding boxes derived from SAM 3D.
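The two preprocessing steps above, sparse frame sampling and box extraction from instance masks, can be sketched as follows. This is an illustration under stated assumptions, not the authors' pipeline: the stride of 60 frames is a hypothetical value (the page only says the stride is large), and the box is assumed to be the tight axis-aligned rectangle around a binary instance mask in COCO-style (x, y, width, height) form. How boxes are actually derived from the SAM 3D outputs is not described here, and both function names are hypothetical.

```python
def sample_frame_indices(num_frames, stride, offset=0):
    """Sparse temporal sampling: keep every `stride`-th frame starting at `offset`."""
    if stride <= 0:
        raise ValueError("stride must be positive")
    return list(range(offset, num_frames, stride))


def mask_to_bbox(mask):
    """Tight (x_min, y_min, width, height) box around the True pixels of a 2-D mask.

    `mask` is a list of rows, each a list of bools. Returns None for an empty
    mask. Width and height count pixels, so a single-pixel mask is 1 x 1.
    """
    ys = [y for y, row in enumerate(mask) if any(row)]
    xs = [x for row in mask for x, v in enumerate(row) if v]
    if not ys:
        return None
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    return (x_min, y_min, x_max - x_min + 1, y_max - y_min + 1)


# Hypothetical 10 s clip at 30 fps, sampled with a hypothetical stride of 60 frames.
print(sample_frame_indices(num_frames=300, stride=60))  # [0, 60, 120, 180, 240]

mask = [
    [False, False, False],
    [False, True, True],
    [False, True, False],
]
print(mask_to_bbox(mask))  # (1, 1, 2, 2)
```

A larger stride trades dataset size for diversity: consecutive 1080p frames are nearly identical, so skipping many of them removes near-duplicates at little cost.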

Keypoint Annotation Protocol

A unified 30-keypoint anatomical taxonomy for quadrupeds and primates, indexed from 0 to 29.

Keypoint visualization on a representative animal instance.
ID  Name                    ID  Name
0   Left eye                15  Right forelimb wrist
1   Right eye               16  Left hind-limb ankle
2   Lower jaw               17  Right hind-limb ankle
3   Left forefoot           18  Neck midpoint
4   Right forefoot          19  Tail tip
5   Left hind foot          20  Left ear base
6   Right hind foot         21  Right ear base
7   Tail root               22  Left mouth corner
8   Left forelimb elbow     23  Right mouth corner
9   Right forelimb elbow    24  Nose tip
10  Left hind-limb knee     25  Tail midpoint
11  Right hind-limb knee    26  Anterior back
12  Left upper forelimb     27  Middle back
13  Right upper forelimb    28  Posterior back
14  Left forelimb wrist     29  Abdomen midpoint
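For training code it is convenient to hold the taxonomy as an index-ordered list. The sketch below transcribes the table into hypothetical snake_case identifiers and derives left/right swap pairs from the names, as needed for horizontal-flip augmentation. The pairing rule and the flip convention (x' = W - 1 - x for pixel-index coordinates) are assumptions, not part of the published protocol.

```python
# Hypothetical identifiers transcribed from the keypoint table (list index = keypoint ID).
KEYPOINT_NAMES = [
    "left_eye", "right_eye", "lower_jaw", "left_forefoot", "right_forefoot",          # 0-4
    "left_hind_foot", "right_hind_foot", "tail_root",
    "left_forelimb_elbow", "right_forelimb_elbow",                                   # 5-9
    "left_hind_limb_knee", "right_hind_limb_knee", "left_upper_forelimb",
    "right_upper_forelimb", "left_forelimb_wrist",                                   # 10-14
    "right_forelimb_wrist", "left_hind_limb_ankle", "right_hind_limb_ankle",
    "neck_midpoint", "tail_tip",                                                     # 15-19
    "left_ear_base", "right_ear_base", "left_mouth_corner", "right_mouth_corner",
    "nose_tip",                                                                      # 20-24
    "tail_midpoint", "anterior_back", "middle_back", "posterior_back",
    "abdomen_midpoint",                                                              # 25-29
]


def build_flip_pairs(names):
    """Pair every 'left_*' keypoint with its 'right_*' counterpart (assumed naming rule)."""
    index = {name: i for i, name in enumerate(names)}
    return [
        (i, index["right_" + name[len("left_"):]])
        for i, name in enumerate(names)
        if name.startswith("left_")
    ]


FLIP_PAIRS = build_flip_pairs(KEYPOINT_NAMES)


def hflip_keypoints(keypoints, image_width, flip_pairs=FLIP_PAIRS):
    """Mirror (x, y, v) keypoints horizontally and swap left/right labels.

    Uses x' = image_width - 1 - x; unlabelled points (v == 0) keep their values.
    """
    flipped = [(image_width - 1 - x, y, v) if v > 0 else (x, y, v) for x, y, v in keypoints]
    for i, j in flip_pairs:
        flipped[i], flipped[j] = flipped[j], flipped[i]
    return flipped


print(len(KEYPOINT_NAMES), len(FLIP_PAIRS))  # 30 10

kps = [(float(i), 10.0, 2) for i in range(30)]
once = hflip_keypoints(kps, image_width=100)
print(once[0])  # (98.0, 10.0, 2): right eye (ID 1, x=1.0) mirrored into the left-eye slot
```

The ten midline keypoints (lower jaw, tail root, neck midpoint, tail tip, nose tip, tail midpoint, the three back points, and abdomen midpoint) have no partner and are only mirrored. Applying the flip twice returns the original keypoints, which is a quick sanity check for the pairing.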

Benchmark Results

Representative pose-estimation backbones evaluated on Animal World reveal substantial challenges under cross-species variation, social interaction, and occlusion.

Effect of social-scene supervision on Animal World.
Training Data            Single       Intra-species Group   Cross-species   Social       Full Eval
                         AP    mAP    AP    mAP             AP    mAP       AP    mAP    AP    mAP
Animal World w/o Social  95.1  64.0   84.6  50.0            89.1  59.2      83.3  48.1   88.1  54.6
Animal World Social-only 93.1  54.5   93.2  57.7            89.7  65.1      93.2  59.5   92.9  57.3
Animal World Full        95.3  70.3   92.5  65.8            91.3  70.7      92.6  66.1   91.1  67.4
Gain over w/o Social     +0.2  +6.3   +7.9  +15.8           +2.2  +11.5     +9.3  +18.0  +3.0  +12.8
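The page does not define how AP and mAP are computed. Keypoint benchmarks commonly score each prediction with COCO-style Object Keypoint Similarity (OKS) and then average precision over OKS thresholds 0.50:0.05:0.95; which of these the AP and mAP columns correspond to is not stated. The sketch below computes OKS for one prediction under that assumption. Per-keypoint falloff constants for the 30-keypoint taxonomy are not given, so the uniform kappa = 0.1 in the example is hypothetical.

```python
import math

# COCO-style OKS thresholds 0.50, 0.55, ..., 0.95 (assumed protocol).
OKS_THRESHOLDS = [round(0.50 + 0.05 * i, 2) for i in range(10)]


def oks(pred, gt, visible, area, kappas):
    """Object Keypoint Similarity between one prediction and one ground truth.

    pred, gt: lists of (x, y); visible: bools marking labelled GT keypoints;
    area: GT object area (s^2); kappas: per-keypoint falloff constants k_i.
    OKS = mean over visible i of exp(-d_i^2 / (2 * s^2 * k_i^2)).
    Returns 0.0 when no keypoint is labelled.
    """
    total, count = 0.0, 0
    for (px, py), (gx, gy), vis, k in zip(pred, gt, visible, kappas):
        if not vis:
            continue  # unlabelled keypoints do not contribute
        d2 = (px - gx) ** 2 + (py - gy) ** 2
        total += math.exp(-d2 / (2.0 * area * k ** 2))
        count += 1
    return total / count if count else 0.0


gt = [(50.0, 50.0), (60.0, 50.0), (55.0, 70.0)]
pred = [(53.0, 54.0), (60.0, 50.0), (0.0, 0.0)]  # third point is ignored: not labelled
score = oks(pred, gt, [True, True, False], area=2500.0, kappas=[0.1] * 3)
print(round(score, 3))  # 0.803
```

Normalising by object area makes the same pixel error count for more on a small juvenile than on a large adult, which is one reason scale-diverse scenes are harder under this metric.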
Effect of age-diverse supervision on juvenile animal pose estimation.
Training Data          Juvenile-only   Family       Age-diverse   Full Eval
                       AP    mAP       AP    mAP    AP    mAP     AP    mAP
Animal World w/o Age   91.9  53.5      90.9  50.8   91.1  51.4    93.3  58.0
Animal World Age-only  92.7  54.9      93.5  53.7   93.6  53.9    92.3  52.8
Animal World Full      98.0  64.8      94.1  66.8   94.3  65.9    91.1  67.4
Gain over w/o Age      +6.1  +11.3     +3.2  +16.0  +3.2  +14.5   -2.2  +9.4
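Each "Gain" row is the per-column difference between the Full row and the corresponding ablation row. Recomputing it from the age table reproduces the printed values, including the single negative entry (Full Eval AP, where Full scores 2.2 points below the w/o Age setting). The helper name `gain_row` is hypothetical.

```python
def gain_row(full, baseline, ndigits=1):
    """Per-column difference Full - baseline, rounded to the tables' precision."""
    return [round(f - b, ndigits) for f, b in zip(full, baseline)]


# Columns: Juvenile-only AP/mAP, Family AP/mAP, Age-diverse AP/mAP, Full Eval AP/mAP
full = [98.0, 64.8, 94.1, 66.8, 94.3, 65.9, 91.1, 67.4]
wo_age = [91.9, 53.5, 90.9, 50.8, 91.1, 51.4, 93.3, 58.0]
print(gain_row(full, wo_age))  # [6.1, 11.3, 3.2, 16.0, 3.2, 14.5, -2.2, 9.4]
```

Rounding after subtracting avoids floating-point residue such as 6.099999999999994 appearing in place of 6.1.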
Downstream fitting evaluation on age-diverse animals.
Keypoint Source        Juvenile-only                      Family
                       Reproj. Err.  Mask IoU  Failure    Reproj. Err.  Mask IoU  Failure
Animal World w/o Age   4.6           88.0      3.6%       6.8           87.7      15.4%
Animal World Age-only  5.1           88.0      1.2%       6.9           87.7      15.7%
Animal World Full      4.7           88.1      0.0%       6.6           87.8      12.6%
Δ Full − w/o Age       +0.1          +0.1      -3.6       -0.2          +0.1      -2.8
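The fitting pipeline behind this table (the 3D model, the optimizer, and the criterion for a "Failure") is not described on this page. As an assumption, the two continuous metrics are sketched with their common definitions: reprojection error as the mean pixel distance between projected model keypoints and the 2D keypoints used for fitting, over visible keypoints, and Mask IoU as intersection over union of the rendered and reference silhouettes (presumably reported above on a 0 to 100 scale). Both function names are hypothetical.

```python
import math


def reprojection_error(projected, target, visible):
    """Mean Euclidean distance between projected model keypoints and 2D targets.

    Only keypoints flagged in `visible` are averaged; returns NaN if none are.
    """
    dists = [math.dist(p, t) for p, t, v in zip(projected, target, visible) if v]
    return sum(dists) / len(dists) if dists else float("nan")


def mask_iou(mask_a, mask_b):
    """IoU of two same-sized binary masks given as lists of rows (bools or 0/1)."""
    inter = union = 0
    for row_a, row_b in zip(mask_a, mask_b):
        for a, b in zip(row_a, row_b):
            inter += 1 if (a and b) else 0
            union += 1 if (a or b) else 0
    # Two empty masks are treated as perfect agreement (a convention).
    return inter / union if union else 1.0


print(reprojection_error([(0.0, 0.0), (3.0, 4.0)], [(0.0, 0.0), (0.0, 0.0)], [True, True]))  # 2.5
print(mask_iou([[1, 1], [0, 0]], [[1, 0], [0, 0]]))  # 0.5
```

Because the silhouette term is dominated by the body outline, Mask IoU can stay nearly constant (88.0 vs 88.1) while keypoint-driven quantities such as reprojection error and failure rate change more noticeably.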

Comparison with Existing Datasets

Animal World provides denser keypoints, broader multi-animal coverage, explicit cross-species co-occurrence, social behavior cases, and age-diverse scenes.

Dataset              Images    Instances  Species  KPs
Animal Pose          4,666     6,117      5        20
StanfordExtra        20,580    12,000     1        20
AP-10K               10,015    13,028     54       17
Animal Kingdom       33,099    33,099     850      20
APT-36K              36,000    53,006     30       17
Animal3D             3,400     3,400      40       26
Animal World (ours)  10,036    19,129     124      30

Citation

@inproceedings{animalworld2026,
  title     = {Animal World: A Cross-Species Dataset for Social Animal Pose Understanding},
  author    = {Hu, Xuyi and Lyu, Jin and Zhang, Shaojie and Ma, Ke and Wang, Houtianfu and Liu, Siwei and Liu, Jiuming and Zuffi, Silvia and Zhao, Jiachen and An, Liang and Goetz, Stefan},
  year      = {2026}
}