
\chapter{Implementation}
\label{ch:implementation}

HRIStudio is a complete, operational platform that realizes the design principles established in Chapter~\ref{ch:design}. As the primary artifact of this thesis, it demonstrates that those principles are not merely theoretical: the hierarchical specification model, the event-driven execution model, and the integrated data flow can be built into a system that real researchers use without programming expertise. Any system built on those principles could satisfy the same requirements; HRIStudio is the implementation that proves they work in practice. This chapter explains how HRIStudio realizes those principles, covering the architectural choices and mechanisms behind how the platform stores experiments, executes trials, integrates robot hardware, and controls access. The specific technologies used are presented in Appendix~\ref{app:tech_docs}.

\section{Platform Architecture}

HRIStudio follows the model of a web application. Users access it through a standard browser without installing specialized software, and the entire study team, including researchers, wizards, and observers, connects to the same shared system. This eliminates the need for a local installation and ensures the platform works identically on any operating system, directly addressing the low-technical-barrier requirement (R2, from Chapter~\ref{ch:background}). It also enables easy collaboration (R6): multiple team members can access experiment data and observe trials simultaneously from different machines without any additional configuration.

I organized the system into three layers: User Interface, Application Logic, and Data \& Robot Control. Figure~\ref{fig:three-tier} shows this layered structure. The application server and the robot control hardware must run on the same local network. This keeps communication latency low during trials: a noticeable delay between the wizard's input and the robot's response would break the interaction.

I implemented all three layers in the same language: TypeScript~\cite{TypeScript2014}, a statically typed superset of JavaScript. The single-language decision keeps the type system consistent across the full stack. When the structure of experiment data changes, the type checker surfaces inconsistencies across the entire codebase at compile time rather than allowing them to appear as runtime failures during a trial.

\section{Experiment Storage and Trial Logging}

The system saves experiments to persistent storage when a researcher completes them in the Design interface. A saved experiment is a complete, reusable specification that a researcher can run across any number of trials without modification. In this chapter, a trial means one concrete run of an experiment protocol with one human subject; this is where spontaneous wizard deviations can occur.

When a trial begins, the system creates a new trial record linked to that experiment. The system writes every action the wizard triggers to that record with a timestamp, whether or not the action is part of the protocol, and flags unscripted actions as deviations. The Execution interface records video, audio, and robot sensor data alongside the action log for the duration of the trial. Because the trial record and the experiment reference the same underlying specification, the Analysis interface can directly compare what was planned against what was executed for any trial, without any manual work by the researcher. Figure~\ref{fig:trial-record} shows the structure of a completed trial record. Action log entries, video, audio, and robot sensor data all share a common timestamp reference so the Analysis interface can align them without manual synchronization. Dashed lines mark step boundaries, and the system flags any deviation from the experiment specification at the appropriate position in the timeline.

\begin{figure}[htbp]
\centering
\begin{tikzpicture}[
    dot/.style={circle, fill=black, minimum size=6pt, inner sep=0pt},
    devdot/.style={rectangle, draw=black, thick, fill=gray!50, minimum size=7pt, inner sep=0pt, rotate=45},
    stepbox/.style={rectangle, draw=black, thick, fill=gray!15, align=center, font=\scriptsize, inner sep=3pt, minimum height=0.55cm},
    mediabar/.style={rectangle, draw=black, thick, fill=gray!30, minimum height=0.45cm},
    track/.style={font=\small, anchor=east}]

    % Time axis
    \draw[->, thick] (0, -0.5) -- (11.5, -0.5) node[right, font=\small\itshape] {time};
    \node[font=\small] at (0.1, -0.8) {$t_0$};
    \node[font=\small] at (10.9, -0.8) {$t_n$};

    % Track labels
    \node[track] at (-0.2, 5.2) {Experiment};
    \node[track] at (-0.2, 3.9) {Action Log};
    \node[track] at (-0.2, 2.9) {Video};
    \node[track] at (-0.2, 1.9) {Audio};
    \node[track] at (-0.2, 0.9) {Sensor Data};

    % Track dividers
    \foreach \y in {4.5, 3.4, 2.4, 1.4, 0.4} {
        \draw[gray!35, thin] (0, \y) -- (11.0, \y);
    }

    % Experiment step boxes
    \node[stepbox, minimum width=2.5cm] at (1.5, 5.2) {Intro};
    \node[stepbox, minimum width=4.0cm] at (5.2, 5.2) {Story Telling};
    \node[stepbox, minimum width=2.5cm] at (9.5, 5.2) {Recall Test};

    % Step boundary markers
    \draw[dashed, gray!60] (3.0, 4.5) -- (3.0, 0.4);
    \draw[dashed, gray!60] (7.5, 4.5) -- (7.5, 0.4);

    % Scripted actions
    \node[dot] at (0.5, 3.9) {};
    \node[dot] at (1.4, 3.9) {};
    \node[dot] at (2.3, 3.9) {};
    \node[dot] at (3.8, 3.9) {};
    \node[dot] at (5.0, 3.9) {};
    \node[dot] at (6.1, 3.9) {};
    \node[dot] at (7.2, 3.9) {};
    \node[dot] at (9.0, 3.9) {};
    \node[dot] at (10.5, 3.9) {};

    % Deviation marker
    \node[devdot] at (5.6, 3.9) {};
    \node[font=\scriptsize, above=5pt] at (5.6, 3.9) {deviation};

    % Video bar
    \node[mediabar, minimum width=10.8cm] at (5.4, 2.9) {};

    % Audio bar
    \node[mediabar, minimum width=10.8cm, fill=gray!20] at (5.4, 1.9) {};

    % Sensor data (continuous sampled line)
    \draw[thick, gray!60] plot[smooth] coordinates {
        (0.0, 0.90) (1.0, 0.97) (2.0, 0.84) (3.0, 1.01) (4.0, 0.87)
        (5.0, 0.96) (6.0, 0.83) (7.0, 0.99) (8.0, 0.86) (9.0, 0.95)
        (10.0, 0.88) (11.0, 0.93)
    };

\end{tikzpicture}
\caption{Structure of a completed trial record, showing synchronized action log, media, and sensor tracks.}
\label{fig:trial-record}
\end{figure}
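
The logging behavior described above can be illustrated with a minimal TypeScript sketch. The type and function names here are hypothetical and chosen only for illustration; they do not reproduce HRIStudio's actual schema. The sketch assumes that each logged action optionally names the planned action it corresponds to, and that an action with no such counterpart is flagged as a deviation.

```typescript
// Hypothetical types for illustration only; not HRIStudio's actual schema.
interface PlannedAction {
  id: string;
  type: string; // e.g. "speak"
}

interface LogEntry {
  actionType: string;
  plannedActionId: string | null; // null when the action is not part of the protocol
  timestampMs: number; // milliseconds since trial start (t_0)
  deviation: boolean;
}

interface TrialRecord {
  experimentId: string;
  entries: LogEntry[];
}

// Append one wizard-triggered action to the trial record.
// The entry is flagged as a deviation when it matches no planned action.
function logAction(
  record: TrialRecord,
  protocol: PlannedAction[],
  actionType: string,
  plannedActionId: string | null,
  timestampMs: number,
): LogEntry {
  const scripted =
    plannedActionId !== null && protocol.some((a) => a.id === plannedActionId);
  const entry: LogEntry = {
    actionType,
    plannedActionId: scripted ? plannedActionId : null,
    timestampMs,
    deviation: !scripted,
  };
  record.entries.push(entry);
  return entry;
}

// Compare plan against execution: list the planned actions that never ran.
function unexecutedPlannedActions(
  record: TrialRecord,
  protocol: PlannedAction[],
): PlannedAction[] {
  const executed = new Set(record.entries.map((e) => e.plannedActionId));
  return protocol.filter((a) => !executed.has(a.id));
}
```

Because every entry carries both a timestamp and a reference back to the specification, a comparison such as \texttt{unexecutedPlannedActions} needs no information beyond the record and the experiment themselves.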

Video and audio are recorded locally in the researcher's browser during the trial rather than streamed to the server in real time. This prevents network delays or server load from dropping frames or degrading audio quality during the interaction. When the trial concludes, the browser transfers the complete recordings to the server and associates them with the trial record. The Analysis interface can align video and audio with the logged actions without any manual synchronization, because the timestamp when recording starts is logged alongside the action log.
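
As a sketch of why the logged recording start time is sufficient for alignment, the following TypeScript fragment converts an action timestamp into a seek position within a recording. The field names are hypothetical, and the sketch assumes that both timestamps are expressed against the same clock; it is not taken from the HRIStudio codebase.

```typescript
// Hypothetical shapes; field names are assumptions for illustration.
interface RecordingInfo {
  startedAtMs: number; // time at which the browser began recording
}

interface ActionTimestamp {
  label: string;
  loggedAtMs: number; // time at which the action was logged (same clock, by assumption)
}

// Convert a logged action time into a seek position (in seconds) within the recording.
// Returns null when the action was logged before recording started.
function seekPositionSeconds(
  recording: RecordingInfo,
  action: ActionTimestamp,
): number | null {
  const offsetMs = action.loggedAtMs - recording.startedAtMs;
  return offsetMs >= 0 ? offsetMs / 1000 : null;
}
```

With the start time stored, alignment reduces to a subtraction per action, which is why no manual synchronization step is needed.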

The system stores structured and media data separately. Experiment specifications and trial records are stored in the same structured database, which makes it efficient to query across trials (for example, retrieving all trials for a specific participant or comparing action timing across conditions). Video and audio files are stored in a dedicated file store, since their size makes them unsuitable for a database and the system never queries their content directly.

\section{The Execution Engine}

The execution engine is the component that runs a trial: it loads the experiment, manages the wizard's connection, sends robot commands, and keeps all connected clients in sync.

When a trial begins, the server loads the experiment and maintains a live connection to the wizard's browser and any observer connections. The execution engine does not advance through the actions of an experiment on a timer; instead, the wizard controls how time advances from action to action. This preserves the natural pacing of the interaction: the wizard advances only when the participant is ready, while the experiment structure ensures the protocol is followed. When the wizard triggers an action, the server sends the related command to the robot, writes the log entry, and pushes the updated experiment state to all connected clients in the same operation, keeping the wizard's view, the observer view, and the actual robot state synchronized in real time.
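
The event-driven behavior described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the engine's actual code: \texttt{RobotBridge}, \texttt{ObserverClient}, and \texttt{createTrialSession} are hypothetical names, actions are reduced to plain strings, and network transport is replaced by direct function calls.

```typescript
// Hypothetical interfaces; names are assumptions for illustration.
interface RobotBridge {
  send(command: string): void;
}

interface ExecutionLogEntry {
  action: string;
  timestampMs: number;
}

interface TrialState {
  currentIndex: number; // index of the next action to run
  log: ExecutionLogEntry[];
}

interface ObserverClient {
  push(state: TrialState): void;
}

function createTrialSession(
  actions: string[],
  bridge: RobotBridge,
  clients: ObserverClient[],
  now: () => number = Date.now,
) {
  const state: TrialState = { currentIndex: 0, log: [] };
  // Hand out copies so that clients cannot mutate the engine's state.
  const snapshot = (): TrialState => ({
    currentIndex: state.currentIndex,
    log: [...state.log],
  });

  return {
    // Called only when the wizard triggers the next action; there is no timer.
    triggerNext(): TrialState {
      const action = actions[state.currentIndex];
      if (action === undefined) return snapshot(); // protocol already finished
      bridge.send(action); // 1. command the robot
      state.log.push({ action, timestampMs: now() }); // 2. write the log entry
      state.currentIndex += 1;
      const current = snapshot();
      for (const client of clients) client.push(current); // 3. update every client
      return current;
    },
  };
}
```

Nothing in the session advances unless \texttt{triggerNext} is called, which mirrors the wizard-paced design: the protocol fixes the order of actions, and the wizard fixes their timing.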

No two human subjects respond identically to an experimental protocol. One subject gives a one-word answer; another offers a paragraph; a third asks the robot a question the script never anticipated. A fully programmed robot has no answer for that third subject: the interaction stalls, or immersion breaks. The wizard exists to fill that gap: where the program runs out of instructions, the wizard draws on their knowledge of human social interaction to keep the exchange coherent. Unscripted actions give the wizard the tools to exercise that judgment in the moment. The wizard triggers them via the manual controls in the Execution interface, the robot command runs, and the system logs the action with a deviation flag. This design preserves research value: the interaction gains the flexibility only a human can provide, and that flexibility appears explicitly in the record rather than disappearing into it.

\section{Robot Integration}

A configuration file describes each robot platform, listing the actions it supports and specifying how each one maps to a command the robot understands. The execution engine reads this file at startup and uses it whenever it needs to dispatch a command: it looks up the action type, assembles the appropriate message, and sends it to the robot over a bridge process running on the local network. The web server itself has no knowledge of any specific robot; all hardware-specific logic lives in the configuration file.

The execution engine treats control-flow elements, such as branches and conditionals, the same way it treats robot actions. These elements appear as action groups in the experiment and are evaluated during the trial, so researchers can freely mix logical decisions and physical robot behaviors when designing an experiment without any special handling.

Figure~\ref{fig:plugin-architecture} illustrates this mapping using NAO6 and TurtleBot as examples. Actions a platform does not support (such as \texttt{raise\_arm} on TurtleBot) are marked explicitly as unsupported in the configuration file rather than failing silently. Because all hardware-specific logic lives in the configuration file, the experiment itself does not change between platforms.

\begin{figure}[htbp]
\centering
\begin{tikzpicture}[
    expbox/.style={rectangle, draw=black, thick, fill=gray!10, align=left, font=\small, inner sep=10pt},
    cfgbox/.style={rectangle, draw=black, thick, dashed, fill=white, align=center, font=\small\itshape, inner sep=6pt},
    robotbox/.style={rectangle, draw=black, thick, fill=gray!25, align=left, font=\small, inner sep=10pt},
    arrow/.style={->, thick}]

    % Experiment box
    \node[expbox] (exp) at (0, 0) {
        \textbf{Experiment}\\[4pt]
        \texttt{speak(text)}\\[2pt]
        \texttt{raise\_arm()}\\[2pt]
        \texttt{move\_forward()}
    };

    % Configuration file node (intermediate)
    \node[cfgbox] (cfg) at (4.5, 0) {configuration\\file};

    % NAO6 box
    \node[robotbox] (nao) at (9.5, 1.6) {
        \textbf{NAO6}\\[4pt]
        \texttt{speak} $\to$ \texttt{/nao/tts}\\[2pt]
        \texttt{raise\_arm} $\to$ \texttt{/nao/arm}\\[2pt]
        \texttt{move\_forward} $\to$ \texttt{/nao/move}
    };

    % TurtleBot box
    \node[robotbox] (tb) at (9.5, -1.6) {
        \textbf{TurtleBot}\\[4pt]
        \texttt{speak} $\to$ \texttt{/tts/say}\\[2pt]
        \texttt{raise\_arm} $\to$ \textit{(not supported)}\\[2pt]
        \texttt{move\_forward} $\to$ \texttt{/cmd\_vel}
    };

    % Arrows
    \draw[arrow] (exp.east) -- (cfg.west);
    \draw[arrow] (cfg.east) -- (nao.west);
    \draw[arrow] (cfg.east) -- (tb.west);

\end{tikzpicture}
\caption{Abstract experiment actions translated to platform-specific robot commands through per-platform configuration files.}
\label{fig:plugin-architecture}
\end{figure}
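
The lookup shown in Figure~\ref{fig:plugin-architecture} can be sketched in TypeScript as follows. The configuration shape and the \texttt{resolveCommand} function are hypothetical: the command targets are copied from the figure, entries are keyed by the abstract action names used in the experiment, and everything else is an assumption made for illustration rather than HRIStudio's actual file format.

```typescript
// Hypothetical configuration shape; only the command targets come from the figure.
type ActionMapping = { supported: true; topic: string } | { supported: false };

type RobotConfig = {
  platform: string;
  actions: Record<string, ActionMapping>;
};

const nao6: RobotConfig = {
  platform: "NAO6",
  actions: {
    speak: { supported: true, topic: "/nao/tts" },
    raise_arm: { supported: true, topic: "/nao/arm" },
    move_forward: { supported: true, topic: "/nao/move" },
  },
};

const turtlebot: RobotConfig = {
  platform: "TurtleBot",
  actions: {
    speak: { supported: true, topic: "/tts/say" },
    raise_arm: { supported: false }, // declared explicitly, so it cannot fail silently
    move_forward: { supported: true, topic: "/cmd_vel" },
  },
};

type DispatchResult =
  | { ok: true; topic: string; payload: unknown }
  | { ok: false; reason: string };

// Resolve an abstract experiment action to a platform-specific command.
function resolveCommand(
  config: RobotConfig,
  action: string,
  payload: unknown = {},
): DispatchResult {
  const mapping = config.actions[action];
  if (mapping === undefined) {
    return { ok: false, reason: `${config.platform}: unknown action "${action}"` };
  }
  if (!mapping.supported) {
    return { ok: false, reason: `${config.platform}: "${action}" is not supported` };
  }
  return { ok: true, topic: mapping.topic, payload };
}
```

The same call, \texttt{resolveCommand(config, "raise\_arm")}, succeeds for the NAO6 configuration and returns an explicit failure for the TurtleBot one. Only the configuration object differs between the two cases, which is the property the figure is meant to convey.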

\section{Access Control}

I implemented access control using a role-based access control (RBAC) model. Each study has a membership list, and each member is assigned one of four roles that define a clear separation of capabilities: those who own the study, those who design it, those who run it, and those who observe it. This enforces need-to-know access at the study level, so each team member can see and modify only what their role requires.

\begin{description}
    \item[Owner.] Full control over the study: can invite or remove members, configure the study settings, and access all data.
    \item[Researcher.] Can create and modify experiment designs and review all collected trial data, but cannot manage team membership.
    \item[Wizard.] Can trigger actions during a trial and view the Execution interface, but cannot modify the experiment design or access other wizards' sessions.
    \item[Observer.] Read-only access: can watch a trial in real time and annotate significant moments, but cannot trigger actions or modify any data.
\end{description}

The role definitions above determine who can view and change data during normal study operation. The role system also supports double-blind designs~\cite{Bartneck2024}, in which neither the wizard nor the researcher has access to condition assignments or results until the study concludes. For example, the Owner can restrict a Wizard's view of which condition a human subject has been assigned to, and can prevent Researchers from accessing result data until all trials are complete, without any changes to the underlying experiment.
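
The four roles can be expressed as a simple role-to-permission table, sketched below in TypeScript. The permission names and the \texttt{can} function are hypothetical and were chosen to mirror the role descriptions above; the sketch grants each role only the capabilities stated there and omits the per-study blinding restrictions.

```typescript
type Role = "owner" | "researcher" | "wizard" | "observer";

// Hypothetical permission names derived from the role descriptions.
type Permission =
  | "manage_members"
  | "edit_design"
  | "view_trial_data"
  | "trigger_actions"
  | "watch_trial"
  | "annotate";

const rolePermissions: Record<Role, ReadonlySet<Permission>> = {
  // Owner: full control over the study.
  owner: new Set<Permission>([
    "manage_members",
    "edit_design",
    "view_trial_data",
    "trigger_actions",
    "watch_trial",
    "annotate",
  ]),
  // Researcher: designs experiments and reviews data, but cannot manage membership.
  researcher: new Set<Permission>(["edit_design", "view_trial_data"]),
  // Wizard: runs trials, but cannot modify the design.
  wizard: new Set<Permission>(["trigger_actions", "watch_trial"]),
  // Observer: watches and annotates only.
  observer: new Set<Permission>(["watch_trial", "annotate"]),
};

// Check whether a role grants a permission within a study.
function can(role: Role, permission: Permission): boolean {
  return rolePermissions[role].has(permission);
}
```

Keeping the table in one place means that a request handler needs only a single call, such as \texttt{can(member.role, "edit\_design")}, where \texttt{member} is a hypothetical membership record, before performing a protected operation.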

\section{Architectural Challenges}

The following two problems required specific solutions during implementation.

\begin{description}
    \item[Execution latency.] During a trial, the execution engine must respond quickly to wizard input, as a noticeable delay between the button press and the robot's action can disrupt the interaction. I addressed this by maintaining a persistent network connection to the robot bridge for the duration of each trial. The connection is established once at trial start and kept open, eliminating per-action setup overhead.

    \item[Multi-source synchronization.] The Analysis interface requires aligning data streams captured at different sampling rates by different components: video, audio, action logs, and sensor data. The solution is a shared time reference: every data source records its timestamps relative to the same trial start time, $t_0$, so the Analysis interface can align all tracks without requiring manual calibration.
\end{description}

\section{Implementation Status}

HRIStudio is fully operational for controlled Wizard-of-Oz studies. The Design, Execution, and Analysis interfaces are complete and integrated. The execution engine handles scripted and unscripted actions with full timestamped logging, and I validated robot communication on the NAO6 platform during development. A researcher can design an experiment, run a live trial with a wizard, and review the resulting logs and recordings without modification to the platform's core architecture or execution workflow.

Work remaining for future development includes broader validation of the configuration file approach on robot platforms beyond NAO6.

\section{Chapter Summary}

This chapter described how HRIStudio realizes the design principles from Chapter~\ref{ch:design} in practice. Experiments are persistent, reusable specifications that produce complete, comparable trial records. The execution engine is event-driven rather than timer-driven, keeping the wizard in control of pacing while logging every action automatically. Per-platform configuration files keep the execution engine hardware-agnostic. The role system enforces access control at the study level. The platform is fully operational for controlled WoZ studies today, demonstrated through the pilot validation study presented in Chapter~\ref{ch:evaluation}. The design principles are general; HRIStudio shows they are workable.