Animal World: A Cross-Species Dataset for Social Animal Pose Understanding

Scene-centric, real-world animal imagery with rich ecological complexity — multi-animal interactions, inter-species co-occurrence, age diversity, and rare species.

Xuyi Hu¹,²* Jin Lyu³* Shaojie Zhang² Ke Ma² Houtianfu Wang² Siwei Liu²

Jiuming Liu² Silvia Zuffi⁵ Jiachen Zhao⁴ Liang An¹† Stefan Goetz²

¹Nanjing University ²University of Cambridge ³Southern University of Science and Technology ⁴Tsinghua University ⁵IMATI-CNR, Milan, Italy

*Equal contribution †Corresponding author

Animal pose understanding in the wild remains challenging due to large morphological variation, frequent occlusion, and complex social interactions. Existing benchmarks are often limited to particular species, sparse pose definitions, or isolated individuals. Animal World introduces a unified 30-keypoint taxonomy and large-scale annotations spanning 124 species in natural environments, enabling evaluation of pose estimation under socially complex, scene-centric conditions.

10,036 Labelled images

19,129 Annotated instances

124 Species covered

30 Unified keypoints

Overview

From single animals to crowded social groups — intra-species herds, cross-species scenes, age-diverse groups, rare species, and complex social behaviors. Animal World shifts the focus from isolated animal pose localization to scene-centric social pose understanding.

Cross-Species Interaction

409 cross-species images covering 94 unique species combinations. Animals with different body sizes, shapes, and textures appear in the same scene, requiring robust instance association under heterogeneous morphology.

Age Diversity

1,918 age-diverse images including juveniles, adults, and mother–offspring pairs. This subset enables evaluation of age-sensitive pose understanding beyond simple scale or depth changes.

Complex Social Behaviors

Chasing, fighting, grooming, mating, parental care, group movement, and collective foraging — poses shaped by interaction rather than independent motion alone.

Keypoint Annotation Protocol

Animal World was collected from more than 400 animal videos at 1080p resolution. Frames are sparsely sampled with a large temporal stride to reduce near-duplicates and increase diversity across poses, species, viewpoints, and social configurations. Each visible animal is annotated independently with the unified 30-keypoint taxonomy listed below, together with instance-level segmentation masks and bounding boxes derived from SAM 3D.
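The sparse sampling step can be illustrated with a minimal Python sketch. The function name, the frame rate, and the stride value are assumptions chosen for illustration; the source does not state the stride actually used or how frames were extracted.

```python
def sparse_frame_indices(num_frames: int, stride: int, offset: int = 0) -> list[int]:
    """Return the indices of frames kept when taking every `stride`-th frame.

    Hypothetical helper: a larger stride leaves fewer, more widely spaced
    frames, which is what reduces near-duplicate images.
    """
    if stride <= 0:
        raise ValueError("stride must be positive")
    if offset < 0:
        raise ValueError("offset must be non-negative")
    return list(range(offset, num_frames, stride))


# Assumed example values: a 10-second clip at 30 fps, one frame kept every 5 seconds.
fps = 30
indices = sparse_frame_indices(num_frames=10 * fps, stride=5 * fps)
print(indices)  # [0, 150]
```

With these assumed values only two of the 300 frames are kept, so consecutive samples are 5 seconds apart and unlikely to show the same pose.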

ID	Name	ID	Name
0	Left eye	15	Right forelimb wrist
1	Right eye	16	Left hind-limb ankle
2	Lower jaw	17	Right hind-limb ankle
3	Left forefoot	18	Neck midpoint
4	Right forefoot	19	Tail tip
5	Left hind foot	20	Left ear base
6	Right hind foot	21	Right ear base
7	Tail root	22	Left mouth corner
8	Left forelimb elbow	23	Right mouth corner
9	Right forelimb elbow	24	Nose tip
10	Left hind-limb knee	25	Tail midpoint
11	Right hind-limb knee	26	Anterior back
12	Left upper forelimb	27	Middle back
13	Right upper forelimb	28	Posterior back
14	Left forelimb wrist	29	Abdomen midpoint
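The taxonomy above can be written down as a plain Python list, which is one convenient way to work with the keypoint indices. The sketch below also derives left/right index pairs by name and uses them for a horizontal flip. The flip helper is an assumption for illustration (a common augmentation in pose estimation); the source does not describe any augmentation or a data-loading API.

```python
# Keypoint names in index order, copied from the taxonomy table above.
KEYPOINT_NAMES = [
    "Left eye", "Right eye", "Lower jaw",
    "Left forefoot", "Right forefoot", "Left hind foot", "Right hind foot",
    "Tail root",
    "Left forelimb elbow", "Right forelimb elbow",
    "Left hind-limb knee", "Right hind-limb knee",
    "Left upper forelimb", "Right upper forelimb",
    "Left forelimb wrist", "Right forelimb wrist",
    "Left hind-limb ankle", "Right hind-limb ankle",
    "Neck midpoint", "Tail tip",
    "Left ear base", "Right ear base",
    "Left mouth corner", "Right mouth corner",
    "Nose tip", "Tail midpoint",
    "Anterior back", "Middle back", "Posterior back", "Abdomen midpoint",
]


def build_flip_pairs(names):
    """Pair each 'Left X' keypoint index with its 'Right X' counterpart."""
    index = {name: i for i, name in enumerate(names)}
    pairs = []
    for i, name in enumerate(names):
        if name.startswith("Left "):
            partner = "Right " + name[len("Left "):]
            pairs.append((i, index[partner]))
    return pairs


FLIP_PAIRS = build_flip_pairs(KEYPOINT_NAMES)


def flip_keypoints(keypoints, image_width):
    """Mirror (x, y) keypoints horizontally and swap left/right labels.

    Assumes pixel-index coordinates, so x maps to image_width - 1 - x.
    """
    flipped = [(image_width - 1 - x, y) for x, y in keypoints]
    for left, right in FLIP_PAIRS:
        flipped[left], flipped[right] = flipped[right], flipped[left]
    return flipped


print(len(KEYPOINT_NAMES), len(FLIP_PAIRS))  # 30 10
```

Ten of the 30 keypoints lie on the body midline (jaw, neck, nose, back, abdomen, and tail points) and keep their index under a flip; the other 20 form ten left/right pairs.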

Benchmark Results

Effect of social-scene supervision on Animal World. Two values are reported per evaluation split; the only metric label given is mAP.
Training Data	Single		Intra-species Group		Cross-species		Social		Full Eval	
Animal World w/o Social	95.1	64.0	84.6	50.0	89.1	59.2	83.3	48.1	88.1	54.6
Animal World Social-only	93.1	54.5	93.2	57.7	89.7	65.1	93.2	59.5	92.9	57.3
Animal World Full	95.3	70.3	92.5	65.8	91.3	70.7	92.6	66.1	91.1	67.4
Gain over w/o Social	+0.2	+6.3	+7.9	+15.8	+2.2	+11.5	+9.3	+18.0	+3.0	+12.8
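The "Gain over w/o Social" row is the element-wise difference between the "Animal World Full" and "Animal World w/o Social" rows. The short Python sketch below recomputes it from the table values. The variable names are illustrative, and the two value columns of each split are treated simply as a first and second column because the source does not label them individually.

```python
SPLITS = ["Single", "Intra-species Group", "Cross-species", "Social", "Full Eval"]

# Rows copied from the table above: two values per evaluation split.
WO_SOCIAL = [95.1, 64.0, 84.6, 50.0, 89.1, 59.2, 83.3, 48.1, 88.1, 54.6]
FULL = [95.3, 70.3, 92.5, 65.8, 91.3, 70.7, 92.6, 66.1, 91.1, 67.4]


def gains(full, baseline):
    """Element-wise difference, rounded to one decimal like the table."""
    return [round(f - b, 1) for f, b in zip(full, baseline)]


gain_row = gains(FULL, WO_SOCIAL)

# Values alternate between the first and second column of each split.
first_col = dict(zip(SPLITS, gain_row[0::2]))
second_col = dict(zip(SPLITS, gain_row[1::2]))

print(" ".join(f"{g:+.1f}" for g in gain_row))
print(max(first_col, key=first_col.get))    # Social
print(max(second_col, key=second_col.get))  # Social
```

In both columns the largest gain falls on the Social split (+9.3 and +18.0), while the Single split changes least (+0.2 and +6.3).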

Effect of age-diverse supervision on juvenile animal pose estimation. Two values are reported per evaluation split; the only metric label given is mAP.
Training Data	Juvenile-only		Family		Age-diverse		Full Eval	
Animal World w/o Age	91.9	53.5	90.9	50.8	91.1	51.4	93.3	58.0
Animal World Age-only	92.7	54.9	93.5	53.7	93.6	53.9	92.3	52.8
Animal World Full	98.0	64.8	94.1	66.8	94.3	65.9	91.1	67.4
Gain over w/o Age	+6.1	+11.3	+3.2	+16.0	+3.2	+14.5	-2.2	+9.4

Downstream fitting evaluation on age-diverse animals.
Keypoint Source	Juvenile-only			Family		
	Reproj. Err.	Mask IoU	Failure	Reproj. Err.	Mask IoU	Failure
Animal World w/o Age	4.6	88.0	3.6%	6.8	87.7	15.4%
Animal World Age-only	5.1	88.0	1.2%	6.9	87.7	15.7%
Animal World Full	4.7	88.1	0.0%	6.6	87.8	12.6%
Δ Full − w/o Age	+0.1	+0.1	-3.6	-0.2	+0.1	-2.8

Comparison with Existing Datasets

Dataset	Images	Instances	Species	KPs	Intra	Mixed	Social	Rare	Age
Animal Pose	4,666	6,117	5	20	✓	-	-	-	-
StanfordExtra	20,580	12,000	1	20	-	-	-	-	-
AP-10K	10,015	13,028	54	17	✓	-	-	✓	-
Animal Kingdom	33,099	33,099	850	20	-	-	-	✓	-
APT-36K	36,000	53,006	30	17	✓	-	-	✓	-
Animal3D	3,400	3,400	40	26	-	-	-	-	-
Animal World (ours)	10,036	19,129	124	30	✓	✓	✓	✓	✓

Citation

@inproceedings{animalworld2026,
  title  = {Animal World: A Cross-Species Dataset for Social Animal Pose Understanding},
  author = {Hu, Xuyi and Lyu, Jin and Zhang, Shaojie and Ma, Ke and Wang, Houtianfu and Liu, Siwei and Liu, Jiuming and Zuffi, Silvia and Zhao, Jiachen and An, Liang and Goetz, Stefan},
  year   = {2026}
}
