Robotics Data Architecture
1. The Core Challenge

In most robot development projects, the hardest problems lie not in control algorithms or model accuracy but in mismatched assumptions about the input data.

Typical situations:

Training on clear speech from professional speakers

Evaluating in noise-free environments

Designing around a single microphone and a single viewpoint

Ignoring non-verbal cues

As a result:

The moment the robot is deployed in the real world, it "misunderstands humans."

M9 STUDIO takes responsibility for the "reality-premise data design" that becomes necessary at this stage.

Mission

Robots That Don't Misunderstand.

2. Data Domains for Robotics

2.1 Speech / Utterance Data (Robot Speech I/O)

Utterances from untrained speakers (the general public)

Unclear articulation, restarts, interruptions

Speech from a distance or at an angle to the microphone

Gaps between speech intent and audio signal

These are not collected at random; they are designed as controlled conditions.

2.2 Non-Verbal / Behavioral Data (Human Behavior)

Instructions and confirmations through gaze

Backchannels, nods, silent agreement/refusal

Changes in physical distance

Utterances made while facing away from the robot

We assume cases where, from the robot's perspective, human behavior comes before words.

2.3 Environmental Sound / Noise / Ambient Sound

Household sounds (footsteps, object sounds, fabric rustling)

Machine sounds (motors, fans, drive sounds)

Overlapping human conversations

Sudden sounds (falling objects, collisions)

These are treated not as background noise, but as part of the perceptual conditions.

2.4 Acoustic / Spatial / IR

Sound source direction and distance estimation

Spatial reflection and reverberation

Sound degradation due to occlusion

Acoustic changes during robot movement

IR and spatial acoustics are inputs directly connected to robot action decisions.

3. Designing "Amateur Speech" and "Noise"

M9 STUDIO does not take the approach of "collecting low-quality data" or "randomly mixing noise."

Instead, we treat the following as controllable design variables:

3.1 Degradation Condition Design

Staged utterance clarity levels

Varying utterance distance and angle

Noise types and sound pressure levels

Microphone condition variations

This enables:

Robustness training

Understanding of boundary conditions

Fail-safe design
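As a rough illustration of treating degradation as design variables, the four variables above can be enumerated as an explicit condition grid. This is a minimal sketch, not M9 STUDIO's actual tooling: the class name `DegradationCondition`, the field names, and every value in the lists are hypothetical placeholders.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class DegradationCondition:
    clarity_level: int   # staged utterance clarity (hypothetical scale, 0 = clearest)
    distance_m: float    # speaker-to-microphone distance in metres
    angle_deg: int       # speaker angle relative to the microphone axis
    noise_type: str      # category of added noise
    noise_spl_db: int    # noise sound pressure level in dB
    microphone: str      # microphone condition label


# All values below are illustrative placeholders, not real project settings.
CLARITY_LEVELS = [0, 1, 2]
DISTANCES_M = [0.5, 2.0, 4.0]
ANGLES_DEG = [0, 45, 90]
NOISE = [("household", 50), ("machine", 60), ("conversation", 65)]
MICROPHONES = ["single", "array"]


def build_condition_grid():
    """Enumerate every combination of the design variables."""
    return [
        DegradationCondition(c, d, a, n_type, n_spl, mic)
        for c, d, a, (n_type, n_spl), mic in product(
            CLARITY_LEVELS, DISTANCES_M, ANGLES_DEG, NOISE, MICROPHONES
        )
    ]


grid = build_condition_grid()
print(len(grid))  # 3 * 3 * 3 * 3 * 2 = 162 conditions
```

Because each condition is an explicit, hashable record, a failure can be traced back to the exact combination of variables that produced it, which is what makes boundary conditions visible.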

3.2 Reproducibility Guarantee

The important question is:

"Can we cause the same failure again?"

Data can be regenerated under the same conditions

Conditions can be varied incrementally

Comparative experiments become possible

Data without these properties is unusable for robotics.
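One common way to make "the same failure again" possible is to derive the random seed from the condition itself, so that an identical condition description always regenerates identical data. The sketch below assumes this seed-from-condition approach; it is not a description of M9 STUDIO's pipeline, and `condition_seed`, `generate_noise`, and the condition keys are hypothetical names. The noise generator is a stand-in for a real audio generator.

```python
import hashlib
import json
import random


def condition_seed(condition: dict) -> int:
    """Derive a stable integer seed from a condition description."""
    canonical = json.dumps(condition, sort_keys=True)  # key order must not matter
    digest = hashlib.sha256(canonical.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def generate_noise(condition: dict, n_samples: int = 8) -> list[float]:
    """Stand-in generator: the same condition yields the same samples."""
    rng = random.Random(condition_seed(condition))
    scale = condition["noise_level"]
    return [rng.uniform(-scale, scale) for _ in range(n_samples)]


base = {"noise_type": "fan", "noise_level": 0.2, "distance_m": 2.0}
stepped = {**base, "noise_level": 0.3}  # one variable changed by one step

assert generate_noise(base) == generate_noise(base)     # regenerated identically
assert generate_noise(base) != generate_noise(stepped)  # a distinct condition
```

Changing a single variable by one step, as in `stepped`, gives a pair of datasets that differ in exactly one condition, which is the basis for a comparative experiment.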

4. Generation Flow

4.1 Requirements Definition

Robot Role: Guidance, care, work assistance, etc.

Usage Environment: Home, facility, public space

Human Relationships: Regular users, first-time users, elderly people, children

Actions That Must Not Be Misrecognized

4.2 Data Design

Modality composition

Synchronization conditions (language, sound, vision)

Staged degradation condition design

Non-verbal event definitions
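The four items of the data design step can be captured in a single specification object. The following is a minimal sketch under stated assumptions: the `DataDesign` class, its field names, the 20 ms skew tolerance, and the event labels are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class DataDesign:
    modalities: list[str]            # modality composition
    max_sync_skew_ms: float          # hypothetical tolerance between streams
    degradation_stages: list[str]    # staged degradation conditions
    nonverbal_events: list[str] = field(default_factory=list)

    def is_synchronized(self, start_times_ms: dict[str, float]) -> bool:
        """True if every designed modality is present and all streams
        start within the allowed skew of one another."""
        if set(start_times_ms) != set(self.modalities):
            return False
        times = start_times_ms.values()
        return max(times) - min(times) <= self.max_sync_skew_ms


design = DataDesign(
    modalities=["language", "sound", "vision"],
    max_sync_skew_ms=20.0,
    degradation_stages=["clean", "mild", "severe"],
    nonverbal_events=["nod", "gaze_at_robot"],
)

print(design.is_synchronized({"language": 0.0, "sound": 5.0, "vision": 12.0}))  # True
print(design.is_synchronized({"language": 0.0, "sound": 5.0, "vision": 40.0}))  # False
```

Stating the synchronization condition as a checkable rule means a recording that violates it can be rejected at capture time rather than discovered during training.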

4.3 Execution / Recording

New generation

Condition control

Parallel execution

Log and environment recording

5. Metadata & Annotation

For robotics applications, the following are especially critical:

Whether an utterance is a "command" or a "soliloquy"

Gaze target (robot / other person / object)

Action outcome (approach / avoidance / no response)

Environmental conditions (noise, distance, occlusion)

These are annotated as temporal structures.
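One way to read "temporal structures" is as time-stamped intervals on parallel tracks, so that the state of every annotation can be queried at any instant. The sketch below assumes that interval-based representation; the `Interval` class, the track names, and the example timeline are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    track: str      # e.g. "utterance_type", "gaze_target", "action_outcome"
    label: str
    start_s: float  # inclusive start, in seconds
    end_s: float    # exclusive end, in seconds


def labels_at(intervals, t):
    """Return {track: label} for every interval active at time t (seconds)."""
    return {iv.track: iv.label for iv in intervals if iv.start_s <= t < iv.end_s}


# Illustrative timeline: a person mutters to themselves, then turns to the
# robot and gives a command, after which the robot approaches.
timeline = [
    Interval("gaze_target", "other person", 0.0, 1.5),
    Interval("gaze_target", "robot", 1.5, 4.0),
    Interval("utterance_type", "soliloquy", 0.5, 1.2),
    Interval("utterance_type", "command", 2.0, 3.5),
    Interval("environment", "noise=fan, distance=2m", 0.0, 4.0),
    Interval("action_outcome", "approach", 3.5, 4.0),
]

print(labels_at(timeline, 1.0))
print(labels_at(timeline, 2.5))
```

Querying at 1.0 s returns a soliloquy with gaze on another person, while 2.5 s returns a command with gaze on the robot: the same kind of utterance signal carries a different meaning depending on what the other tracks say at that moment.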

6. Use Cases

Dialogue Robots

Care & Monitoring Robots

Guidance & Reception Robots

Logistics & Work Assistance Robots

Home Robots

This approach is especially effective in situations such as:

"Can't tell if I'm being spoken to."
"Unsure whether to respond."

We can generate data that specifically targets these edge cases.

7. Why This Cannot Be Replaced

The reasons M9 STUDIO cannot be replaced in the robotics domain:

Full Spectrum Design

Can design consistently across the full range from professional to non-professional speech

Non-Verbal & Spatial

Can handle non-verbal and spatial elements simultaneously

Reproducible Degradation

Can create degradation conditions in reproducible form

Cross-Modal Integration

Can integrate IR, acoustics, and behavior

Long-Term Durability

Durable with respect to rights, reuse, and future expansion

Together, these give M9 STUDIO the ability to design "ideal state → reality → failure conditions" as a continuous chain.

What matters in robotics is not "sounding smart" — it's "not misunderstanding."

M9 STUDIO is an organization that takes responsibility for input design so that robots don't misunderstand humans.
