System Architecture

SARAS is built on a modular architecture that emphasizes reliability, cross-platform compatibility, and a clear separation of concerns.

Main Controller

The central orchestrator that initializes all subsystems, manages the main event loop, and coordinates between components through the RobotController class.

main.py
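The listen-think-speak loop the orchestrator runs can be sketched as below. The subsystem classes and method names here are illustrative stand-ins, not the project's actual API:

```python
class RobotController:
    """Minimal sketch of the orchestrator; real wiring lives in main.py."""

    def __init__(self, stt, ai, tts):
        self.stt = stt  # speech-to-text subsystem
        self.ai = ai    # AI processor
        self.tts = tts  # text-to-speech subsystem

    def handle_one_utterance(self):
        """Run one pass of the listen -> think -> speak loop."""
        text = self.stt.listen()
        if not text:
            return None  # nothing heard; skip this cycle
        reply = self.ai.process(text)
        self.tts.speak(reply)
        return reply


# Stub subsystems, purely for demonstration:
class EchoSTT:
    def listen(self):
        return "hello"

class EchoAI:
    def process(self, text):
        return f"You said: {text}"

class ListTTS:
    def __init__(self):
        self.spoken = []
    def speak(self, text):
        self.spoken.append(text)
```

In the real system each stub would be replaced by the corresponding module described below.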

Speech-to-Text

Converts voice input to text using a local Vosk model, featuring continuous listening and voice activity detection.

speech_to_text.py
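Vosk performs the actual recognition; a continuous-listening loop typically gates incoming audio frames with a voice activity check so the recognizer only sees frames that contain speech. A minimal energy-based sketch (the threshold value is illustrative, not the module's actual setting):

```python
import math

def frame_rms(samples):
    """Root-mean-square energy of one audio frame of 16-bit PCM samples."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def is_speech(samples, threshold=500.0):
    """Simple energy-based voice activity check: frame is 'speech' if its
    RMS energy clears the threshold. Real VADs are more sophisticated."""
    return frame_rms(samples) >= threshold
```

Frames that pass the check would be fed to Vosk's recognizer; silent frames are dropped, keeping the recognizer responsive.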

Text-to-Speech

Converts text responses to high-quality speech using the Piper TTS engine, ensuring a natural and responsive voice.

text_to_speech.py
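Piper is commonly driven as a command-line tool that reads text on stdin and writes a WAV file. A sketch of a thin wrapper, assuming the `piper` binary is on PATH (the model filename is a placeholder, and the runner is injectable so the command can be tested without Piper installed):

```python
import subprocess

def build_piper_command(model_path, wav_path):
    """Assemble the Piper CLI invocation; the text itself goes on stdin."""
    return ["piper", "--model", model_path, "--output_file", wav_path]

def synthesize(text, model_path, wav_path, runner=subprocess.run):
    """Pipe text into Piper. `runner` defaults to subprocess.run but can be
    swapped out (e.g. for a fake in tests or a retry-wrapping runner)."""
    cmd = build_piper_command(model_path, wav_path)
    return runner(cmd, input=text.encode("utf-8"), check=True)
```

The injectable runner also makes it easy to add timeouts or logging without touching the synthesis logic.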

AI Processor

Manages AI interactions, switching between the primary OpenAI API and a local GGUF model for fallback processing.

ai_processor.py
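The primary/fallback switch can be reduced to a small pattern: try the remote API first and, on any failure, route the same prompt to the local model. A sketch with the two backends passed in as callables (names are illustrative):

```python
def process_with_fallback(prompt, primary, fallback):
    """Try the primary backend (e.g. the OpenAI API); on any error,
    fall back to the local GGUF model so the robot stays responsive."""
    try:
        return primary(prompt)
    except Exception:
        return fallback(prompt)
```

In practice the real module would narrow the caught exceptions (network errors, rate limits) and log which backend answered.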

Motor Controller

Abstracts motor control, handling GPIO on Raspberry Pi and simulating movement on Windows for cross-platform development.

motor_controller.py
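The cross-platform abstraction usually comes down to a guarded import: use real GPIO when the RPi.GPIO library is available, otherwise fall back to a simulator that just records commands. A sketch under those assumptions (class names are illustrative):

```python
import platform

class SimulatedMotors:
    """Records movement commands instead of driving hardware; used for
    development on Windows or any machine without GPIO."""
    def __init__(self):
        self.log = []
    def move(self, direction):
        self.log.append(direction)

class GpioMotors:
    """Placeholder for the real GPIO-backed controller (sketch only)."""
    def move(self, direction):
        raise NotImplementedError("real GPIO control lives in motor_controller.py")

def make_motor_controller():
    """Pick a backend: real GPIO on a Pi, simulation everywhere else."""
    if platform.system() == "Linux":
        try:
            import RPi.GPIO  # noqa: F401 -- only present on a Raspberry Pi
            return GpioMotors()
        except ImportError:
            pass
    return SimulatedMotors()
```

Because both backends expose the same `move` interface, the rest of the system never needs to know which one it is talking to.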

Sensor Manager

Reads and processes data from three ultrasonic sensors for real-time obstacle detection and environmental awareness.

sensors.py
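An ultrasonic sensor reports the round-trip time of an echo; distance is that time multiplied by the speed of sound, halved. A sketch of the conversion and a simple three-sensor obstacle check (the sensor names and the 30 cm limit are illustrative):

```python
SPEED_OF_SOUND_CM_S = 34300  # ~343 m/s at room temperature

def echo_to_distance_cm(echo_seconds):
    """Convert a round-trip echo time to a one-way distance in cm."""
    return echo_seconds * SPEED_OF_SOUND_CM_S / 2

def nearest_obstacle(readings_cm, limit_cm=30.0):
    """Given per-sensor distances, name the closest sensor that sees an
    obstacle inside the limit, or None if the path is clear."""
    close = {name: d for name, d in readings_cm.items() if d <= limit_cm}
    return min(close, key=close.get) if close else None
```

With three sensors (e.g. left, center, right), the result tells the motor controller which way to steer around the obstacle.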

Data Flow

The system follows a clear, sequential data flow for handling user interaction and responding to environmental stimuli.

Voice Command Pipeline

User Voice -> Speech-to-Text (Vosk) -> Command Processor -> AI Processor -> Response Generation
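The pipeline above is a straight chain: each stage's output becomes the next stage's input. A sketch with stand-in stages (the lambdas mimic Vosk, the command processor, and the AI processor; they are not the real implementations):

```python
def run_pipeline(stages, value):
    """Feed a value through each pipeline stage in order."""
    for stage in stages:
        value = stage(value)
    return value

# Illustrative stand-ins for the real stages:
stages = [
    lambda audio: "turn left",                      # speech-to-text (Vosk)
    lambda text: {"command": text},                 # command processor
    lambda cmd: f"Okay, I will {cmd['command']}.",  # AI processor + response
]
```

Keeping the stages as plain callables makes it easy to test each one in isolation or swap an implementation without touching the others.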

Response Pipeline

AI Response -> Text-to-Speech (Piper) -> Audio Output & Face Display Update
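Unlike the command pipeline, the response pipeline fans out: the same reply goes to both the speaker and the face display. A minimal sketch (the `speak`/`show` interface is illustrative):

```python
class Recorder:
    """Stand-in sink that records what it receives, for demonstration."""
    def __init__(self):
        self.items = []
    def speak(self, text):
        self.items.append(("audio", text))
    def show(self, text):
        self.items.append(("face", text))

def deliver_response(reply, tts, face):
    """Fan the synthesized reply out to the speaker and the face display."""
    tts.speak(reply)
    face.show(reply)
```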