Specialized Chapter
The data handled here is not for demos or laboratory evaluation. It is input design for robot systems that must operate continuously in real environments.
The problems most robot development projects face do not lie in control algorithms or model accuracy; they lie in mismatched premises in the input data.
Typical situations:
Training on clear speech from professional speakers
Evaluating in noise-free environments
Designing for a single microphone and a single viewpoint
Omitting non-verbal cues
As a result:
The moment the robot is deployed in the real world, it "misunderstands humans."
M9 STUDIO takes responsibility for the "reality-premise data design" that this stage demands.
Mission
Utterances from untrained speakers (the general public)
Unclear articulation, restarts, interruptions
Speech from a distance or at an angle to the microphone
Gaps between the speaker's intent and the audio signal
These are not collected at random; they are designed as conditions.
Instructions and confirmations through gaze
Backchannels, nods, silent agreement/refusal
Changes in physical distance
Utterances made while turned away from the robot
We design for cases in which, from the robot's point of view, behavior comes before words.
Household sounds (footsteps, object sounds, fabric rustling)
Machine sounds (motors, fans, drive sounds)
Overlapping human conversations
Sudden sounds (falling objects, collisions)
These are treated not as background noise but as part of the perceptual conditions.
Sound source direction and distance estimation
Spatial reflection and reverberation
Sound degradation due to occlusion
Acoustic changes during robot movement
Impulse responses (IR) and spatial acoustics are inputs that feed directly into the robot's action decisions.
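As an illustration of how an IR shapes the input a robot hears, here is a minimal sketch, assuming "IR" refers to a room impulse response: the reverberant signal at the microphone is modeled as the dry (anechoic) source convolved with the IR. The function name `apply_room_ir` and the toy IR values are hypothetical. This direct loop is O(N·M); a real pipeline would more likely use an FFT-based routine such as `scipy.signal.fftconvolve`.

```python
from typing import List, Sequence

def apply_room_ir(dry: Sequence[float], ir: Sequence[float]) -> List[float]:
    """Convolve a dry (anechoic) signal with a room impulse response.

    Direct-form linear convolution; output length is len(dry) + len(ir) - 1.
    """
    if len(dry) == 0 or len(ir) == 0:
        raise ValueError("dry signal and impulse response must be non-empty")
    out = [0.0] * (len(dry) + len(ir) - 1)
    for n, x in enumerate(dry):
        # Each input sample adds a scaled, delayed copy of the IR to the output.
        for k, h in enumerate(ir):
            out[n + k] += x * h
    return out

# Hypothetical toy IR: direct path plus one reflection 3 samples later at half amplitude.
toy_ir = [1.0, 0.0, 0.0, 0.5]
print(apply_room_ir([1.0, 0.0, 2.0], toy_ir))  # [1.0, 0.0, 2.0, 0.5, 0.0, 1.0]
```

Each reflection in the IR shows up as a delayed, attenuated echo of the source, which is why the same utterance can look quite different to a recognizer in two rooms.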
M9 STUDIO does not take the approach of "collecting low-quality data" or "randomly mixing noise."
Instead, we treat the following as controllable design variables:
3.1 Degradation Condition Design
Staged utterance clarity levels
Utterance distance and angle
Noise types and sound pressure levels
Microphone conditions
This enables:
Robustness training
Understanding of boundary conditions
Fail-safe design
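One way to make these variables explicit is to enumerate them as a grid in which every sample is tied to exactly one condition. The sketch below is hypothetical: the field names, level values, and microphone labels are illustrative assumptions, not M9 STUDIO's actual schema, and noise level is expressed as a speech-to-noise ratio (SNR) in dB as a stand-in for sound pressure level.

```python
import itertools
from dataclasses import dataclass
from typing import List

@dataclass(frozen=True)
class DegradationCondition:
    clarity_level: int  # hypothetical scale: 0 = clear, higher = less clear
    distance_m: float   # speaker-to-microphone distance in meters
    angle_deg: int      # speaker angle relative to the microphone axis
    noise_type: str     # e.g. "household", "machine", "babble"
    snr_db: float       # speech-to-noise ratio in dB
    mic_config: str     # hypothetical label for the microphone setup

def build_condition_grid() -> List[DegradationCondition]:
    # Illustrative levels only; a real design would choose these per project.
    clarity = [0, 1, 2]
    distances = [0.5, 1.5, 3.0]
    angles = [0, 45, 90]
    noises = ["household", "machine", "babble"]
    snrs = [20.0, 10.0, 0.0]
    mics = ["single", "array4"]
    # itertools.product yields tuples in the same order as the dataclass fields.
    return [
        DegradationCondition(*combo)
        for combo in itertools.product(clarity, distances, angles, noises, snrs, mics)
    ]

grid = build_condition_grid()
print(len(grid))  # 3 * 3 * 3 * 3 * 3 * 2 = 486
```

Because each condition is an explicit, hashable value, the boundary where recognition starts to fail can be located by moving along one axis while the others stay fixed.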
3.2 Reproducibility Guarantee
The important question is:
"Can we cause the same failure again?"
Data can be regenerated under the same conditions
Conditions can be varied incrementally
Comparative experiments become possible
Data without this property is unusable for robotics.
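A minimal sketch of what "causing the same failure again" can mean in code, assuming a synthetic mixing pipeline: the random seed is derived from the condition description, so rendering the same condition twice yields a bit-identical sample, while changing only the SNR rescales the same noise waveform. The function names, the condition-key string format, and the use of seeded white noise in place of a recorded noise source are all hypothetical.

```python
import hashlib
import math
import random
from typing import List

def condition_seed(condition_key: str, take: int = 0) -> int:
    """Derive a stable 64-bit seed from a condition description and take number."""
    # Python's built-in hash() is randomized per process for strings,
    # so a SHA-256 digest is used to get the same seed on every run.
    digest = hashlib.sha256(f"{condition_key}#{take}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")

def power(signal: List[float]) -> float:
    """Mean squared amplitude of a signal."""
    return sum(x * x for x in signal) / len(signal)

def mix_at_snr(speech: List[float], noise: List[float], snr_db: float) -> List[float]:
    """Add noise scaled so that 10*log10(P_speech / P_scaled_noise) equals snr_db."""
    gain = math.sqrt(power(speech) / (power(noise) * 10 ** (snr_db / 10)))
    return [s + gain * n for s, n in zip(speech, noise)]

def render_sample(speech: List[float], condition_key: str, snr_db: float, take: int = 0) -> List[float]:
    """Hypothetical renderer: seeded white noise stands in for a recorded noise source."""
    rng = random.Random(condition_seed(condition_key, take))
    noise = [rng.gauss(0.0, 1.0) for _ in speech]
    return mix_at_snr(speech, noise, snr_db)

# A 0.1 s, 440 Hz tone at 16 kHz stands in for a speech recording.
speech = [math.sin(2 * math.pi * 440 * t / 16000) for t in range(1600)]
key = "clarity=1|distance=3.0m|noise=machine"

first = render_sample(speech, key, snr_db=5.0)
again = render_sample(speech, key, snr_db=5.0)
assert first == again  # same condition and take: identical sample

harder = render_sample(speech, key, snr_db=0.0)  # vary only the SNR
```

Storing the condition key and take number alongside each sample is what makes a failure reproducible later: the same inputs regenerate the same audio.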
4.1 Requirements Definition
Robot Role: Guidance, care, work assistance, etc.
Usage Environment: Home, facility, public space
Human Relationships: Regular users, first-time users, elderly people, children
Actions That Must Not Be Misrecognized
4.2 Data Design
Modality composition
Synchronization conditions (language, sound, vision)
Staged degradation condition design
Non-verbal event definitions
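A synchronization condition can be made concrete by recording shared sync markers (for example, a clap that is both visible on camera and audible on the microphone) and checking that the offset between the two clocks stays within a tolerance. This is a hypothetical sketch: the marker approach, function names, and tolerance values are assumptions, and it models only a constant offset, not clock drift.

```python
from statistics import median
from typing import Sequence

def estimate_offset(audio_marks_s: Sequence[float], video_marks_s: Sequence[float]) -> float:
    """Median offset (video clock minus audio clock) over paired sync markers, in seconds."""
    if len(audio_marks_s) != len(video_marks_s) or len(audio_marks_s) == 0:
        raise ValueError("need the same non-zero number of markers in both streams")
    return median(v - a for a, v in zip(audio_marks_s, video_marks_s))

def within_tolerance(audio_marks_s: Sequence[float], video_marks_s: Sequence[float],
                     offset_s: float, tol_s: float) -> bool:
    """True if every marker pair agrees with the estimated offset to within tol_s."""
    return all(abs((v - a) - offset_s) <= tol_s for a, v in zip(audio_marks_s, video_marks_s))

def audio_time_to_video_frame(t_audio_s: float, offset_s: float, fps: float) -> int:
    """Map a time on the audio clock to the nearest video frame index."""
    return round((t_audio_s + offset_s) * fps)

audio_marks = [1.0, 5.0, 9.0]     # clap times on the audio clock (s)
video_marks = [1.04, 5.04, 9.05]  # the same claps on the video clock (s)
offset = estimate_offset(audio_marks, video_marks)
print(within_tolerance(audio_marks, video_marks, offset, tol_s=0.02))  # True
print(audio_time_to_video_frame(2.0, offset, fps=30.0))                # 61
```

Using the median keeps a single mis-detected marker from skewing the offset estimate.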
4.3 Execution / Recording
New data generation
Condition control
Parallel execution
Logging and environment recording
For robotics applications, the following are especially critical:
Whether an utterance is a "command" or a "soliloquy"
Gaze target (robot / other person / object)
Action outcome (approach / avoidance / no response)
Environmental conditions (noise, distance, occlusion)
These are annotated as temporal structures.
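Annotation "as temporal structures" can be pictured as parallel tiers of time intervals, similar in spirit to tiered formats such as Praat TextGrids or ELAN. The sketch below is a hypothetical schema: the tier names and label vocabularies are illustrative, derived from the four items above, and are not M9 STUDIO's format. It shows how a single moment (someone talking to themselves while looking at an object) can be read off across tiers.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set

# Hypothetical tier vocabulary mirroring the four annotation items above.
ALLOWED_LABELS: Dict[str, Optional[Set[str]]] = {
    "utterance_type": {"command", "soliloquy"},
    "gaze_target": {"robot", "other_person", "object"},
    "action_outcome": {"approach", "avoidance", "no_response"},
    "environment": None,  # free-form description, e.g. "fan noise"
}

@dataclass(frozen=True)
class Interval:
    tier: str
    start_s: float
    end_s: float
    label: str

    def __post_init__(self) -> None:
        # Reject unknown tiers, out-of-vocabulary labels, and empty intervals.
        if self.tier not in ALLOWED_LABELS:
            raise ValueError(f"unknown tier: {self.tier}")
        allowed = ALLOWED_LABELS[self.tier]
        if allowed is not None and self.label not in allowed:
            raise ValueError(f"label {self.label!r} not allowed in tier {self.tier!r}")
        if not self.start_s < self.end_s:
            raise ValueError("interval must satisfy start_s < end_s")

def labels_at(intervals: List[Interval], t: float) -> Dict[str, str]:
    """Active label per tier at time t, using half-open [start_s, end_s).

    If intervals in the same tier overlap, the later one in the list wins.
    """
    return {iv.tier: iv.label for iv in intervals if iv.start_s <= t < iv.end_s}

track = [
    Interval("utterance_type", 1.0, 2.4, "soliloquy"),
    Interval("gaze_target", 0.0, 3.0, "object"),
    Interval("environment", 0.0, 5.0, "vacuum cleaner running"),
    Interval("action_outcome", 2.4, 4.0, "no_response"),
]
print(labels_at(track, 2.0))
# {'utterance_type': 'soliloquy', 'gaze_target': 'object', 'environment': 'vacuum cleaner running'}
```

Reading the tiers together is what lets a model learn that an utterance made while looking at an object, followed by no response, was not addressed to the robot.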
Dialogue Robots
Care & Monitoring Robots
Guidance & Reception Robots
Logistics & Work Assistance Robots
Home Robots
We are especially strong in situations such as:
"Can't tell if I'm being spoken to."
"Unsure whether to respond."
We can generate data that covers these edge cases.
Why M9 STUDIO cannot be replaced in the robotics domain:
Full Spectrum Design
Can design consistently across the full range, from professional to non-professional speech
Non-Verbal & Spatial
Can handle non-verbal and spatial elements simultaneously
Reproducible Degradation
Can create degradation conditions in a reproducible form
Cross-Modal Integration
Can integrate IR, acoustics, and behavior
Long-Term Durability
Durable with respect to rights, reuse, and future expansion
The ability to design "ideal state → reality → failure conditions" as a continuous chain.
What matters in robotics is not "sounding smart"; it is "not misunderstanding."
M9 STUDIO is an organization that takes responsibility for input design so that robots don't misunderstand humans.