Enhance system design and implementation chapters; clarify design decisions, improve technical documentation, and update role-based access control details.

2026-03-04 13:24:36 -05:00
parent 88bd10bebb
commit fed059252c
4 changed files with 189 additions and 190 deletions
+2 -2
@@ -1,7 +1,7 @@
\chapter{System Design}
\label{ch:design}

Chapter~\ref{ch:background} established six requirements for modern WoZ infrastructure, labeled R1 through R6. This chapter presents the design decisions that address them: the hierarchical organization of experiment specifications, the event-driven execution model, the modular interface architecture, and the integrated data flow.

\section{Hierarchical Organization of Experiments}
@@ -128,7 +128,7 @@ Together, these two figures motivate why the hierarchy is useful in practice. Th
\section{Event-Driven Execution Model}

To achieve real-time responsiveness while maintaining methodological rigor (R3, R5), the system uses an event-driven execution model rather than a time-driven one. In a time-driven approach, the system advances through actions on a fixed schedule regardless of what the participant is doing, so the robot might speak over a participant who is still talking, or move on before a response has been given. The event-driven model avoids this by letting the wizard trigger each action when the interaction is ready for it. Figure~\ref{fig:event-driven-timeline} contrasts the two approaches across two trials of the same experiment.

\begin{figure}[htbp]
\centering
+114 -159
@@ -1,216 +1,171 @@
\chapter{Implementation}
\label{ch:implementation}

This chapter explains how HRIStudio implements the design from Chapter~\ref{ch:design}. It covers the architectural choices and mechanisms behind how the platform stores experiments, executes trials, integrates robot hardware, and controls access. Technology stack specifics are in Appendix~\ref{app:tech_docs}.

\section{Platform Architecture}
HRIStudio runs as a web application. Researchers access it through a standard browser without installing specialized software, and the entire study team, including researchers, wizards, and observers, connects to the same shared system. This eliminates installation complexity and ensures the platform works identically on any operating system, directly addressing the low-technical-barrier requirement (R2, from Chapter~\ref{ch:background}). It also enables natural collaboration (R6): multiple team members can access experiment data and observe live trials simultaneously from different machines without any additional configuration.

The system is organized into three layers: a browser-based user interface, an application server that manages execution, authentication, and logging, and a data and robot control layer covering storage and hardware communication. These layers are described architecturally in Chapter~\ref{ch:design}; what matters for implementation is that the server runs on the same local network as the robot hardware. This keeps communication latency low during live trials, where a delay between the wizard's input and the robot's response would disrupt the interaction. All three layers are implemented in the same language, TypeScript~\cite{TypeScript2014}, a statically-typed superset of JavaScript. When the structure of experiment data changes, the type checker surfaces inconsistencies across the entire codebase at compile time rather than letting them appear as runtime failures during a live trial.
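To make this concrete, the sketch below shows a single type shared by the interface and the server code. The type and function names are illustrative rather than taken from the actual HRIStudio schema; the point is only that renaming or retyping a shared field breaks every layer that uses it at compile time.

\begin{verbatim}
// Illustrative only -- not the actual HRIStudio schema.
// A single definition shared by the browser UI and the server.
export interface StepAction {
  id: string;
  type: "speak" | "raise_arm" | "move_forward";
  parameters: Record<string, string | number>;
}

// Server side: dispatch uses the shared type.
function dispatch(action: StepAction): void {
  console.log(`dispatching ${action.type}`, action.parameters);
}

// UI side: rendering uses the same type, so renaming
// `parameters` (for example) breaks both call sites at compile time.
function renderLabel(action: StepAction): string {
  return `${action.type} (${Object.keys(action.parameters).length} params)`;
}
\end{verbatim}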
\section{Experiment Storage and Trial Logging}

Experiments are saved to persistent storage when a researcher completes them in the Design interface. A saved experiment is a complete, reusable specification that can be run across any number of trials without modification.

When a trial begins, the system creates a new trial record linked to that experiment. Every action the wizard triggers during the trial is written to that record with a precise timestamp, whether it was scripted or not. Video, audio, and robot sensor data are recorded alongside the action log for the duration of the trial. Unscripted actions are flagged as deviations. Because the trial record and the experiment reference the same underlying specification, the Analysis interface can directly compare what was planned against what was executed for any trial, without any manual work by the researcher. Figure~\ref{fig:trial-record} shows the structure of a completed trial record.
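The shape of such a record can be sketched as follows; the field names here are assumptions for illustration rather than the platform's actual schema.

\begin{verbatim}
// Illustrative sketch of a trial record; field names are assumed.
interface ActionLogEntry {
  timestampMs: number;   // offset from trial start (t0)
  actionType: string;    // e.g. "speak", "raise_arm"
  parameters: Record<string, unknown>;
  deviation: boolean;    // true if not part of the scripted experiment
}

interface TrialRecord {
  trialId: string;
  experimentId: string;  // the specification this trial executed
  startedAt: Date;       // t0, shared by all media and sensor streams
  actions: ActionLogEntry[];
  mediaKeys: string[];   // storage keys for the video/audio recordings
}
\end{verbatim}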
\begin{figure}[htbp]
\centering
\begin{tikzpicture}[
dot/.style={circle, fill=black, minimum size=6pt, inner sep=0pt},
devdot/.style={rectangle, draw=black, thick, fill=gray!50, minimum size=7pt, inner sep=0pt, rotate=45},
stepbox/.style={rectangle, draw=black, thick, fill=gray!15, align=center, font=\scriptsize, inner sep=3pt, minimum height=0.55cm},
mediabar/.style={rectangle, draw=black, thick, fill=gray!30, minimum height=0.45cm},
track/.style={font=\small, anchor=east}]
% Time axis
\draw[->, thick] (0, -0.5) -- (11.5, -0.5) node[right, font=\small\itshape] {time};
\node[font=\small] at (0.1, -0.8) {$t_0$};
\node[font=\small] at (10.9, -0.8) {$t_n$};
% Track labels
\node[track] at (-0.2, 5.2) {Experiment};
\node[track] at (-0.2, 3.9) {Action Log};
\node[track] at (-0.2, 2.9) {Video};
\node[track] at (-0.2, 1.9) {Audio};
\node[track] at (-0.2, 0.9) {Sensor Data};
% Track dividers
\foreach \y in {4.5, 3.4, 2.4, 1.4, 0.4} {
\draw[gray!35, thin] (0, \y) -- (11.0, \y);
}
% Experiment step boxes
\node[stepbox, minimum width=2.5cm] at (1.5, 5.2) {Intro};
\node[stepbox, minimum width=4.0cm] at (5.2, 5.2) {Story Telling};
\node[stepbox, minimum width=2.5cm] at (9.5, 5.2) {Recall Test};
% Step boundary markers
\draw[dashed, gray!60] (3.0, 4.5) -- (3.0, 0.4);
\draw[dashed, gray!60] (7.5, 4.5) -- (7.5, 0.4);
% Scripted actions
\node[dot] at (0.5, 3.9) {};
\node[dot] at (1.4, 3.9) {};
\node[dot] at (2.3, 3.9) {};
\node[dot] at (3.8, 3.9) {};
\node[dot] at (5.0, 3.9) {};
\node[dot] at (6.1, 3.9) {};
\node[dot] at (7.2, 3.9) {};
\node[dot] at (9.0, 3.9) {};
\node[dot] at (10.5, 3.9) {};
% Deviation marker
\node[devdot] at (5.6, 3.9) {};
\node[font=\scriptsize, above=5pt] at (5.6, 3.9) {deviation};
% Video bar
\node[mediabar, minimum width=10.8cm] at (5.4, 2.9) {};
% Audio bar
\node[mediabar, minimum width=10.8cm, fill=gray!20] at (5.4, 1.9) {};
% Sensor data (continuous sampled line)
\draw[thick, gray!60] plot[smooth] coordinates {
(0.0, 0.90) (1.0, 0.97) (2.0, 0.84) (3.0, 1.01) (4.0, 0.87)
(5.0, 0.96) (6.0, 0.83) (7.0, 0.99) (8.0, 0.86) (9.0, 0.95)
(10.0, 0.88) (11.0, 0.93)
};
\end{tikzpicture}
\caption{Structure of a completed trial record. Action log entries, video, audio, and robot sensor data share a common timestamp reference so the Analysis interface can align them without manual synchronization. Deviations from the experiment specification are flagged inline. Dashed lines mark step boundaries.}
\label{fig:trial-record}
\end{figure}
Video and audio are recorded locally in the researcher's browser during the trial rather than streamed to the server in real time. This prevents network delays or server load from dropping frames or degrading audio quality during the interaction. When the trial concludes, the browser transfers the complete recordings to the server and associates them with the trial record. Because the timestamp when recording starts is logged alongside the action log, the Analysis interface can align video and audio with the logged actions without any manual synchronization.

This reflects a deliberate split in how data is stored. Experiment specifications and trial records are kept in a structured database, which makes it efficient to query across trials, for example retrieving all trials for a specific participant or comparing action timing across conditions. Video and audio files are stored separately in a dedicated file store, since their size makes them unsuitable for a database and their content is not queried directly.
\section{The Execution Engine}

When a trial begins, the server loads the experiment and maintains live connections to the wizard's browser and to any observers. The execution engine does not advance the experiment on a timer; it waits for the wizard to trigger each step. This preserves the natural pacing of the interaction: the wizard advances only when the participant is ready, while the experiment structure ensures the protocol is followed. When the wizard triggers an action, the server dispatches the robot command, writes the log entry, and pushes the updated experiment state to all connected clients in the same operation. This is what keeps the wizard's view, the observer view, and the actual robot state synchronized in real time.

Unscripted actions go through the same path. The wizard triggers them via the manual controls in the Execution interface, the robot command runs, and the action is logged with a deviation flag. The result is a complete, unambiguous trial record regardless of how closely the interaction followed the script.
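A simplified sketch of this dispatch, log, and broadcast path is shown below; the function and field names are assumptions, and error handling is omitted.

\begin{verbatim}
// Illustrative sketch of the wizard-trigger path; names and shapes are assumed.

// Stubs standing in for the real robot bridge and database calls.
async function sendRobotCommand(type: string, params: Record<string, unknown>): Promise<void> {
  console.log("robot command:", type, params);
}
async function appendToActionLog(entry: unknown): Promise<void> {
  console.log("log entry:", entry);
}

interface TriggeredAction {
  type: string;
  parameters: Record<string, unknown>;
  scripted: boolean; // false for manual (unscripted) wizard actions
}

async function handleWizardTrigger(
  trial: { id: string; startedAtMs: number },
  action: TriggeredAction,
  clients: Set<{ send(data: string): void }>,
): Promise<void> {
  // 1. Dispatch the robot command.
  await sendRobotCommand(action.type, action.parameters);

  // 2. Write the timestamped log entry; unscripted actions get a deviation flag.
  const entry = {
    trialId: trial.id,
    timestampMs: Date.now() - trial.startedAtMs,
    actionType: action.type,
    parameters: action.parameters,
    deviation: !action.scripted,
  };
  await appendToActionLog(entry);

  // 3. Push the updated state to the wizard's view and all observers.
  const update = JSON.stringify({ kind: "action_executed", entry });
  for (const client of clients) client.send(update);
}
\end{verbatim}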
\section{Robot Integration}

Each robot platform is described by a configuration file that lists the actions it supports and specifies how each one maps to a command the robot understands. The execution engine reads this file at startup and uses it whenever it needs to dispatch a command: it looks up the action type, assembles the appropriate message, and sends it to the robot over a bridge process running on the local network. The web server itself has no knowledge of any specific robot; all hardware-specific logic lives in the configuration file.

Control flow elements such as branches and conditionals are treated the same way as robot actions. They appear as action groups in the experiment and are resolved by the execution engine at runtime, so researchers can freely mix logical decisions and physical robot behaviors when designing an experiment without any special handling.

Figure~\ref{fig:plugin-architecture} illustrates how the same abstract actions map to different robot-specific commands through each platform's configuration, using NAO6 and TurtleBot as an example.
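The lookup itself can be sketched as follows. The configuration structure shown here is a simplified assumption; the actual plugin format is documented in Appendix~\ref{app:tech_docs}.

\begin{verbatim}
// Illustrative sketch: mapping an abstract action to a robot-specific command.
interface ActionDefinition {
  type: string;                      // e.g. "speak"
  topic: string;                     // e.g. "/nao/tts"
  messageType: string;               // e.g. "std_msgs/String"
  paramMap: Record<string, string>;  // experiment parameter -> message field
}

function buildCommand(
  config: { actions: ActionDefinition[] },
  type: string,
  params: Record<string, unknown>,
): { topic: string; messageType: string; msg: Record<string, unknown> } {
  const def = config.actions.find((a) => a.type === type);
  if (!def) throw new Error(`Action "${type}" is not supported by this robot`);

  const msg: Record<string, unknown> = {};
  for (const [paramName, field] of Object.entries(def.paramMap)) {
    msg[field] = params[paramName];
  }
  return { topic: def.topic, messageType: def.messageType, msg };
}
\end{verbatim}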
\begin{figure}[htbp]
\centering
\begin{tikzpicture}[
expbox/.style={rectangle, draw=black, thick, fill=gray!10, align=left, font=\small, inner sep=10pt},
cfgbox/.style={rectangle, draw=black, thick, dashed, fill=white, align=center, font=\small\itshape, inner sep=6pt},
robotbox/.style={rectangle, draw=black, thick, fill=gray!25, align=left, font=\small, inner sep=10pt},
arrow/.style={->, thick}]
% Experiment box
\node[expbox] (exp) at (0, 0) {
\textbf{Experiment}\\[4pt]
\texttt{speak(text)}\\[2pt]
\texttt{raise\_arm()}\\[2pt]
\texttt{move\_forward()}
};
% Configuration file node (intermediate)
\node[cfgbox] (cfg) at (4.5, 0) {configuration\\file};
% NAO6 box
\node[robotbox] (nao) at (9.5, 1.6) {
\textbf{NAO6}\\[4pt]
\texttt{speak} $\to$ \texttt{/nao/tts}\\[2pt]
\texttt{raise\_arm} $\to$ \texttt{/nao/arm}\\[2pt]
\texttt{move} $\to$ \texttt{/nao/move}
};
% TurtleBot box
\node[robotbox] (tb) at (9.5, -1.6) {
\textbf{TurtleBot}\\[4pt]
\texttt{speak} $\to$ \texttt{/tts/say}\\[2pt]
\texttt{raise\_arm} $\to$ \textit{(not supported)}\\[2pt]
\texttt{move} $\to$ \texttt{/cmd\_vel}
};
% Arrows
\draw[arrow] (exp.east) -- (cfg.west);
\draw[arrow] (cfg.east) -- (nao.west);
\draw[arrow] (cfg.east) -- (tb.west);
\end{tikzpicture}
\caption{The same abstract actions in an experiment are translated to platform-specific robot commands through each robot's configuration file. Actions a platform does not support are declared explicitly rather than silently failing. The experiment itself does not change between platforms.}
\label{fig:plugin-architecture}
\end{figure}
\section{Access Control}

Each study has a membership list with assigned roles: owner, researcher, wizard, and observer. These roles determine what each team member can see and do within that study. A wizard can trigger actions during a live trial; observers can watch and annotate but cannot trigger anything. This allows studies to separate the wizard's role from the research team's observing role without any additional configuration.

The role system also supports double-blind designs, where certain team members are restricted from seeing condition assignments or result data until the study concludes.
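A minimal sketch of a study-level permission check is shown below. The role names follow the text above; the membership structure and any permission rules beyond ``wizards trigger, observers watch'' are assumptions of the sketch.

\begin{verbatim}
// Illustrative sketch of a study-level permission check.
type StudyRole = "owner" | "researcher" | "wizard" | "observer";

interface StudyMembership {
  userId: string;
  studyId: string;
  role: StudyRole;
}

// Only the assigned wizard triggers actions during a live trial in this sketch.
function canTriggerActions(member: StudyMembership | undefined): boolean {
  return member?.role === "wizard";
}

// Any study member may watch and annotate (assumption of this sketch).
function canAnnotate(member: StudyMembership | undefined): boolean {
  return member !== undefined;
}
\end{verbatim}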
\section{Architectural Challenges}

Two problems required specific design choices during implementation.

The first is responsiveness. During a live trial, the execution engine must respond quickly to wizard input; a noticeable delay between the button press and the robot's action can disrupt the interaction. The engine addresses this by maintaining a persistent connection for the duration of each trial: the connection is established once at trial start and held open, so there is no per-action setup overhead.

The second is multi-source synchronization. Analysis must align data streams that were captured at different sampling rates by different components: video, audio, action logs, and sensor data. The solution is a shared time reference. Every data source records its timestamps relative to the same trial start time, $t_0$, so the Analysis interface can align all tracks without requiring manual calibration. This is the timestamp structure shown in Figure~\ref{fig:trial-record}.
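Concretely, alignment reduces to expressing every timestamp as an offset from $t_0$; the following sketch, with assumed field names, illustrates the idea.

\begin{verbatim}
// Illustrative sketch: aligning independently captured streams to t0.
interface TimedSample<T> {
  timestampMs: number; // absolute wall-clock time at capture
  value: T;
}

// Convert absolute timestamps to offsets from the shared trial start.
function alignToTrialStart<T>(samples: TimedSample<T>[], t0Ms: number) {
  return samples.map((s) => ({ offsetMs: s.timestampMs - t0Ms, value: s.value }));
}

// Video and audio need only their recording-start offset; frames are
// spaced by the media's own timebase after that point.
const recordingStartOffsetMs = (recordingStartMs: number, t0Ms: number) =>
  recordingStartMs - t0Ms;
\end{verbatim}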
\section{Implementation Status}

HRIStudio has reached minimum viable product status. The Design, Execution, and Analysis interfaces are operational. The execution engine handles scripted and unscripted actions with full timestamped logging, and robot communication has been validated with the NAO6 platform. The platform is capable of running a controlled WoZ study without modification.

Work remaining for future development includes support for studies that use more than one robot at a time and validation of the configuration file approach on robot platforms beyond NAO6.
\section{Chapter Summary}

This chapter described how HRIStudio's design is realized in practice. Experiments are persistent, reusable specifications that produce complete, comparable trial records. The execution engine is event-driven rather than timer-driven, keeping the wizard in control of pacing while automatically logging every action. Robot hardware integration is handled through per-platform configuration files, keeping the execution engine itself hardware-agnostic. Access control is enforced at the study level through assigned roles. The platform is at minimum viable product status and is capable of running a controlled WoZ study.
+43 -2
@@ -1,8 +1,49 @@
\chapter{Technical Documentation}
\label{app:tech_docs}
This appendix documents the specific technologies and libraries used to build HRIStudio, organized by the three architectural layers described in Chapter~\ref{ch:design}. The goal here is reference, not justification; Chapter~\ref{ch:implementation} explains the reasoning behind the major architectural choices.
\section{Technology Stack}
\subsection{User Interface Layer}
The frontend is built on Next.js (App Router) using React and TypeScript. TypeScript is used throughout the entire codebase, including the server and data access layers, so that type inconsistencies between layers are caught at compile time rather than at runtime. Styling is handled with Tailwind CSS and the shadcn/ui component library. The drag-and-drop canvas in the Design interface uses the \texttt{@dnd-kit} library (\texttt{@dnd-kit/core} and \texttt{@dnd-kit/sortable}) to manage nested drag operations for arranging steps and action blocks.
\subsection{Application Logic Layer}
The server is a Next.js application running on Node.js. API routes use tRPC over HTTP for typed request/response calls; real-time communication during live trials uses a persistent WebSocket connection via the \texttt{ws} package. Authentication and session management are handled by NextAuth.js (v5 beta) with the \texttt{@auth/drizzle-adapter} and bcryptjs for password hashing. Currently, credential-based (username and password) authentication is supported.
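As an illustration of what a typed procedure looks like, the following is a minimal sketch rather than HRIStudio's actual router; the procedure name and input fields are assumptions.

\begin{verbatim}
import { initTRPC } from "@trpc/server";
import { z } from "zod";

const t = initTRPC.create();

// Illustrative router: the procedure name and fields are assumptions.
export const appRouter = t.router({
  trialById: t.procedure
    .input(z.object({ trialId: z.string() }))
    .query(async ({ input }) => {
      // ...fetch the trial record from the database...
      return { trialId: input.trialId, actions: [] as unknown[] };
    }),
});

// The client imports only this type, giving end-to-end typed calls.
export type AppRouter = typeof appRouter;
\end{verbatim}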
\subsection{Data and Robot Control Layer}
Experiment protocols, trial records, and user data are stored in PostgreSQL. The schema and all database queries are managed through Drizzle ORM, which provides compile-time type safety for database interactions. Action configuration parameters and plugin-specific fields are stored as JSONB columns, which allows the same schema to accommodate any robot's action types.
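The following sketch shows how a table with a JSONB column is declared in Drizzle; the table and column names are illustrative, not the production schema.

\begin{verbatim}
import { pgTable, uuid, text, timestamp, jsonb, boolean } from "drizzle-orm/pg-core";

// Illustrative table definitions; names are not the production schema.
export const trials = pgTable("trials", {
  id: uuid("id").primaryKey().defaultRandom(),
  experimentId: uuid("experiment_id").notNull(),
  startedAt: timestamp("started_at").notNull(),
});

export const trialActions = pgTable("trial_actions", {
  id: uuid("id").primaryKey().defaultRandom(),
  trialId: uuid("trial_id").notNull(),
  actionType: text("action_type").notNull(),
  // JSONB lets any robot's action parameters share one schema.
  parameters: jsonb("parameters"),
  deviation: boolean("deviation").notNull().default(false),
  loggedAt: timestamp("logged_at").notNull(),
});
\end{verbatim}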
Video and audio recordings captured during trials are stored in a self-hosted MinIO instance, an S3-compatible object storage service. Recordings are captured in the browser using the native MediaRecorder API (assisted by \texttt{react-webcam}) and uploaded to MinIO as a chunked transfer when the trial concludes.
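A minimal sketch of the in-browser capture path using the MediaRecorder API is shown below; the upload endpoint and chunk interval are assumptions.

\begin{verbatim}
// Illustrative in-browser capture using the MediaRecorder API.
// The upload endpoint ("/api/trials/.../recording") is an assumption.
async function recordTrial(trialId: string): Promise<() => Promise<void>> {
  const stream = await navigator.mediaDevices.getUserMedia({ video: true, audio: true });
  const chunks: Blob[] = [];
  const recorder = new MediaRecorder(stream, { mimeType: "video/webm" });
  recorder.ondataavailable = (e) => { if (e.data.size > 0) chunks.push(e.data); };
  recorder.start(1000); // emit a chunk roughly every second

  // The returned function stops recording and uploads the complete file.
  return async () => {
    await new Promise<void>((resolve) => {
      recorder.onstop = () => resolve();
      recorder.stop();
    });
    const blob = new Blob(chunks, { type: "video/webm" });
    await fetch(`/api/trials/${trialId}/recording`, { method: "POST", body: blob });
    stream.getTracks().forEach((t) => t.stop());
  };
}
\end{verbatim}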
Robot communication is handled through a ROS Bridge (\texttt{rosbridge\_suite} or \texttt{ros2-web-bridge}) running on the robot's local network. The server connects to the bridge over a WebSocket and exchanges JSON-encoded ROS messages; it does not run as a ROS node itself. The bridge address is configured per robot in the plugin file (for example, \texttt{"rosbridgeUrl": "ws://localhost:9090"} in the NAO6 plugin).
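A minimal sketch of this exchange, using the \texttt{ws} client and the rosbridge JSON protocol, is shown below; the topic and message contents are examples rather than a real plugin mapping.

\begin{verbatim}
import WebSocket from "ws";

// Illustrative use of the rosbridge JSON protocol; topic and message are examples.
const bridge = new WebSocket("ws://localhost:9090"); // from the plugin's rosbridgeUrl

bridge.on("open", () => {
  // Declare the topic, then publish a message to it.
  bridge.send(JSON.stringify({ op: "advertise", topic: "/nao/tts", type: "std_msgs/String" }));
  bridge.send(JSON.stringify({
    op: "publish",
    topic: "/nao/tts",
    msg: { data: "Hello, I am ready to begin." },
  }));
});
\end{verbatim}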
\section{Deployment}

The full stack is orchestrated using Docker Compose. The \texttt{docker-compose.yml} file defines three services: the PostgreSQL database (\texttt{postgres:15}), the MinIO storage instance, and the Next.js application server. Starting the entire system on any machine with Docker installed is a single \texttt{docker compose up} command. This configuration is intended for on-premises deployment, which is important for studies involving participant data that cannot leave the institution's network.
\section{Plugin Specification}

Robot capabilities are defined in JSON plugin files. Each file describes a robot platform and the actions it supports. The structure of a plugin file is as follows:
\begin{itemize}
\item \textbf{Metadata}: name, version, and a human-readable description of the platform.
\item \textbf{ROS configuration} (\texttt{ros2Config}): the bridge URL and any global connection parameters.
\item \textbf{Actions}: an array of action definitions. Each action specifies:
\begin{itemize}
\item A unique action type identifier (e.g., \texttt{speak}, \texttt{raise\_arm})
\item A human-readable label shown in the Design interface
\item A parameter schema defining the input fields the researcher configures
\item The target ROS topic and message type
\item A mapping from parameter names to message fields
\end{itemize}
\end{itemize}
When the server dispatches a robot command, it loads the active plugin, locates the matching action definition, constructs the ROS message by applying the parameter mapping, and sends it to the bridge. Adding a new robot means writing a new plugin file; no server code changes are required.
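For illustration, a fragment of a plugin following this structure is shown below, written as a TypeScript object literal so the field shapes are visible. The actual plugin is a plain JSON file, and the action fields beyond those named above are assumptions rather than the complete NAO6 plugin.

\begin{verbatim}
// Illustrative plugin fragment; the real plugin is plain JSON and
// the field names inside "actions" are assumptions of this sketch.
const nao6Plugin = {
  name: "NAO6",
  version: "1.0.0",
  description: "SoftBank NAO6 humanoid robot",
  ros2Config: { rosbridgeUrl: "ws://localhost:9090" },
  actions: [
    {
      type: "speak",
      label: "Speak text",
      parameters: { text: { kind: "string", required: true } },
      topic: "/nao/tts",
      messageType: "std_msgs/String",
      paramMap: { text: "data" },
    },
  ],
} as const;
\end{verbatim}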
\section{Role-Based Access Control}
HRIStudio uses a two-layer role system. System roles (\texttt{systemRoleEnum}) govern what a user can do across the platform: \emph{administrator}, \emph{researcher}, \emph{wizard}, and \emph{observer}. Study roles (\texttt{studyMemberRoleEnum}) govern what a user can see and do within a specific study: \emph{owner}, \emph{researcher}, \emph{wizard}, and \emph{observer}. System and study roles are checked independently, and study roles are assigned per study, so a user who serves as a wizard on one study can be an observer on another without any additional configuration.
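As a sketch, the two enums could be declared with Drizzle's \texttt{pgEnum} as follows; the underlying PostgreSQL enum names are assumptions.

\begin{verbatim}
import { pgEnum } from "drizzle-orm/pg-core";

// Platform-wide roles and per-study roles are separate enums, so the
// two checks stay independent. Database enum names are assumptions.
export const systemRoleEnum = pgEnum("system_role", [
  "administrator", "researcher", "wizard", "observer",
]);

export const studyMemberRoleEnum = pgEnum("study_member_role", [
  "owner", "researcher", "wizard", "observer",
]);
\end{verbatim}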
+3
@@ -4,6 +4,8 @@
%\usepackage{graphics} %Select graphics package
\usepackage{graphicx} %
%\usepackage{amsthm} %Add other packages as necessary
\usepackage{array} %Extended column types and \arraybackslash
\usepackage{tabularx} %Auto-width table columns
\usepackage{tikz} %For programmatic diagrams
\usetikzlibrary{shapes,arrows,positioning,fit,backgrounds}
\usepackage[
@@ -65,6 +67,7 @@
%J.~Good Phys., {\bf 2}, 294 (2004).
%\end{thebibliography}
\makeatletter\@mainmattertrue\makeatother
\appendix
\include{chapters/app_materials}
\include{chapters/app_tech_docs}