Enhance clarity and structure in introduction, background, reproducibility, system design, and implementation chapters; add new references and include TikZ for diagrams
All checks were successful
Build Proposal and Thesis / build-github (push) Has been skipped
Build Proposal and Thesis / build-gitea (push) Successful in 3m6s

Sean O'Connor
2026-02-23 22:24:41 -05:00
parent 92ef1b7ef0
commit ad940986c7
7 changed files with 283 additions and 19 deletions

@@ -29,4 +29,4 @@ To answer this question, this thesis validates the framework through a user stud
\section{Chapter Summary}
This chapter has established the context and objectives for this thesis. I identified two critical challenges facing WoZ-based HRI research. The first is the accessibility problem: high technical barriers limit participation by non-programmers. The second is the reproducibility problem: fragmented tooling makes results difficult to replicate across labs. I proposed a web-based framework approach that addresses these challenges through intuitive design interfaces, enforced experimental protocols, and a platform-agnostic architecture. Finally, I articulated a central research question and outlined how this thesis validates that approach through implementation and a user study. The next chapters establish the technical and methodological foundations for that approach.

@@ -3,21 +3,21 @@
This chapter provides the necessary context for understanding the challenges addressed by this thesis. I survey the landscape of existing WoZ platforms, analyze their capabilities and limitations, and establish requirements that a modern infrastructure should satisfy. Finally, I position this thesis relative to prior work on this topic.
As established in Chapter~\ref{ch:intro}, the WoZ technique enables researchers to prototype and test robot interaction designs before autonomous capabilities are developed. To understand how the proposed framework advances this research paradigm, I review the existing landscape of WoZ platforms, identify their limitations relative to disciplinary needs, and establish requirements for a more comprehensive approach. HRI is fundamentally a multidisciplinary field that brings together engineers, psychologists, designers, and domain experts from various application areas \cite{Bartneck2024}. Yet two challenges have historically limited participation from non-technical researchers. First, each research group builds custom software for specific robots, creating tool fragmentation across the field. Second, high technical barriers prevent many domain experts from conducting independent studies.
\section{Existing WoZ Platforms and Tools}
Over the last two decades, multiple frameworks to support and automate the WoZ paradigm have been reported in the literature. These frameworks can be broadly categorized based on their primary design emphases, generality, and the methodological practices they encourage. Foundational work by Steinfeld et al. \cite{Steinfeld2009} articulated the methodological importance of WoZ simulation, distinguishing between the human simulating the robot (Wizard of Oz) and the robot simulating the human. In the latter case (Oz of Wizard), the robot acts as if controlled by a person when it is actually autonomous. This distinction has influenced how subsequent tools approach the design and execution of WoZ experiments.
Early platform-agnostic tools focused on providing robust, flexible interfaces for technically sophisticated users. These systems were designed to work with multiple robot types rather than a single hardware platform. Polonius \cite{Lu2011}, built on the Robot Operating System (ROS) \cite{Quigley2009}, exemplifies this generation. It provides a graphical interface for defining finite state machine scripts that control robot behaviors, with integrated logging capabilities to streamline post-experiment analysis.
The system was explicitly designed to enable robotics engineers to create experiments that their non-technical collaborators could then execute. However, the initial setup and configuration still required substantial programming expertise. Similarly, OpenWoZ \cite{Hoffman2016} introduced a cloud-based, runtime-configurable architecture using web protocols. Its design allows multiple operators or observers to connect simultaneously, and its plugin system enables researchers to extend functionality such as adding new robot behaviors or sensor integrations. Most importantly, OpenWoZ allows runtime modification of robot behaviors, enabling wizards to deviate from scripts when unexpected situations arise. While architecturally sophisticated and highly flexible, OpenWoZ requires programming knowledge to create custom behaviors and configure experiments, creating an accessibility problem for non-technical researchers.
A second wave of tools shifted focus toward usability, often achieving accessibility by coupling tightly with specific hardware platforms. WoZ4U \cite{Rietz2021} was explicitly designed as an ``easy-to-use'' tool for conducting experiments with Aldebaran's Pepper robot. It provides an intuitive graphical interface that allows non-programmers to design interaction flows, and it successfully lowers the technical barrier. However, this usability comes at the cost of generalizability. WoZ4U is unusable with other robot platforms, and manufacturer-provided software follows a similar pattern.
Choregraphe \cite{Pot2009}, developed by Aldebaran Robotics for the NAO and Pepper robots, offers a visual programming environment based on connected behavior boxes. Researchers can create complex interaction flows using drag-and-drop blocks without writing code in traditional programming languages. However, when new robot platforms emerge or when hardware becomes obsolete, tools like Choregraphe and WoZ4U lose their utility. Pettersson and Wik, in their review of WoZ tools \cite{Pettersson2015}, note that platform-specific systems often fall out of use as technology evolves, forcing researchers to constantly rebuild their experimental infrastructure.
Recent years have seen renewed interest in comprehensive WoZ frameworks. Gibert et al. \cite{Gibert2013} developed the Super Wizard of Oz (SWoOZ) platform. This system integrates facial tracking, gesture recognition, and real-time control capabilities to enable naturalistic human-robot interaction studies. Virtual and augmented reality have also emerged as complementary approaches to WoZ. Helgert et al. \cite{Helgert2024} demonstrated how VR-based WoZ environments can simplify experimental setup while providing researchers with precise control over environmental conditions and high-fidelity data collection.
This expanding landscape reveals a persistent fundamental gap in the design space of WoZ tools. Flexible, general-purpose platforms like Polonius and OpenWoZ offer powerful capabilities but present high technical barriers. Accessible, user-friendly tools like WoZ4U and Choregraphe lower those barriers but sacrifice cross-platform compatibility and longevity. Newer approaches such as VR-based frameworks attempt to bridge this gap, yet no existing tool successfully combines accessibility, flexibility, deployment portability, and built-in methodological rigor. By methodological rigor, I refer to systematic features that guide experimenters toward best practices like standardized protocols, comprehensive logging, and reproducible experimental designs.
Moreover, few platforms directly address the methodological concerns raised by systematic reviews of WoZ research. Riek's influential analysis \cite{Riek2012} of 54 HRI studies uncovered widespread inconsistencies in how wizard behaviors were controlled and reported. Very few studies documented standardized wizard training procedures or measured wizard error rates, raising questions about internal validity. The tools themselves often exacerbate this problem: poorly designed interfaces increase cognitive load on wizards, leading to timing errors and behavioral inconsistencies that can confound experimental results. Recent work by Strazdas et al. \cite{Strazdas2020} further demonstrates the importance of careful interface design in WoZ systems, showing that intuitive wizard interfaces directly improve both the quality of robot behavior and the reliability of collected data.
@@ -26,15 +26,15 @@ Moreover, few platforms directly address the methodological concerns raised by s
This thesis represents the culmination of a multi-year research effort to develop infrastructure that addresses the challenges identified in the WoZ platform landscape. Based on the analysis of existing platforms and identified methodological gaps, I derived requirements for a modern WoZ research infrastructure. Through our preliminary work \cite{OConnor2024}, we identified six critical capabilities that a comprehensive platform should provide:
\begin{description}
\item[R1: Integrated workflow.] All phases of the experimental workflow (design, execution, and analysis) should be integrated within a single unified environment to minimize context switching and tool fragmentation.
\item[R2: Low technical barrier.] Creating interaction protocols should require minimal to no programming expertise, enabling domain experts from psychology, education, or other fields to work independently \cite{Bartneck2024}.
\item[R3: Real-time control.] The system must support fine-grained, responsive real-time control during live experiment sessions across a variety of robotic platforms.
\item[R4: Automated logging.] All actions, timings, and sensor data should be automatically logged with synchronized timestamps to facilitate analysis.
\item[R5: Platform agnosticism.] The architecture should decouple experimental logic from robot-specific implementations. This allows experiments designed for one robot type to be adapted to others, ensuring the platform remains viable as hardware evolves.
\item[R6: Collaborative support.] Multiple team members should be able to contribute to experiment design and review execution data, supporting truly interdisciplinary research.
\end{description}
To the best of my knowledge, no existing platform satisfies all six requirements. Most critically, the trade-off between accessibility and flexibility remains unresolved, and few tools embed methodological best practices directly into their design. Embedded practices of this kind act like training wheels on a bicycle, guiding experimenters to follow sound methodology by default.
The ideas presented here build upon prior work established in two peer-reviewed publications. We first introduced the concept for HRIStudio as a Late-Breaking Report at the 2024 IEEE International Conference on Robot and Human Interactive Communication (RO-MAN) \cite{OConnor2024}. In that position paper, we identified the lack of accessible tooling as a primary barrier to entry in HRI and proposed the high-level vision of a web-based, collaborative platform. We established the core requirements listed above and argued for a web-based approach to achieve them.

@@ -20,18 +20,18 @@ Based on this analysis, I identify specific ways that software infrastructure ca
\begin{enumerate}
\item \textbf{Guided wizard execution.} Rather than merely providing tools for wizard control, an ideal WoZ platform should actively guide wizards through scripted procedures. This means presenting actions in a prescribed sequence to prevent out-of-order execution, highlighting the current step in the protocol, recording any deviations from the script as explicit events in the data log, and supporting repeatable decision logic through clearly defined conditional branches. By constraining wizard behavior within the bounds of the experimental design, the system reduces unintended variability across trials and participants.
\item \textbf{Comprehensive automatic logging.} Manual data collection is error-prone and often incomplete. The platform should automatically record every action triggered by the wizard with precise timestamps, all robot sensor data and state changes, and timing information indicating when actions were requested, when they began executing, and when they completed. The full experimental protocol should be embedded in each log file so that researchers can recover the exact script used for any session. Note that recording precise timestamps does not imply that trials must have identical timing, since human-robot interactions naturally vary in duration; rather, the system captures what actually occurred for later analysis.
\item \textbf{Self-documenting protocol specifications.} The protocol specification itself should serve as documentation. When interaction protocols are defined using structured formats such as visual flowcharts or declarative scripts rather than imperative code, they become simultaneously executable and human-readable. Researchers can then share complete, unambiguous descriptions of their experimental procedures alongside their results.
\item \textbf{Platform-independent abstractions.} To maximize the lifespan and transferability of experimental designs, the platform must separate the high-level control logic (the sequence of wizard and robot actions) from the low-level details of how specific robots execute those behaviors. This abstraction allows experiments designed for one robot to be more easily adapted to another, extending the reproducibility of interaction designs even when the original hardware becomes obsolete.
\end{enumerate}
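The logging behavior described in the second item above can be sketched as follows. This is a minimal illustration, not the thesis implementation: it assumes TypeScript, and every name in it (`ActionLogEntry`, `TrialLog`, `executionLatencyMs`, `deviations`) is hypothetical. It shows one log entry carrying separate timestamps for request, start, and completion, an explicit deviation flag, and a trial log that embeds the scripted protocol.

```typescript
// Hypothetical log record; all field names are illustrative assumptions.
interface ActionLogEntry {
  actionId: string;
  requestedAt: number; // ms since epoch: wizard triggered the action
  startedAt: number;   // ms since epoch: robot began executing
  completedAt: number; // ms since epoch: robot finished executing
  deviation: boolean;  // true if the wizard departed from the script
}

// A trial log embeds the scripted protocol so it can be recovered later.
interface TrialLog {
  protocol: string[];        // ordered action ids as scripted
  entries: ActionLogEntry[]; // what actually happened during the trial
}

// Delay between the wizard's request and the start of execution.
function executionLatencyMs(entry: ActionLogEntry): number {
  return entry.startedAt - entry.requestedAt;
}

// Entries that were explicitly recorded as deviations from the script.
function deviations(log: TrialLog): ActionLogEntry[] {
  return log.entries.filter((entry) => entry.deviation);
}

const greetEntry: ActionLogEntry = {
  actionId: "greet",
  requestedAt: 1000,
  startedAt: 1040,
  completedAt: 2500,
  deviation: false,
};

const improvisedEntry: ActionLogEntry = {
  actionId: "improvised_remark",
  requestedAt: 3000,
  startedAt: 3100,
  completedAt: 4000,
  deviation: true,
};

const exampleLog: TrialLog = {
  protocol: ["greet", "ask_question"],
  entries: [greetEntry, improvisedEntry],
};
```

Keeping the three timestamps separate lets an analyst distinguish wizard reaction time from robot execution time without requiring trials to share identical timing.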
\section{Connecting Reproducibility Challenges to Infrastructure Requirements}
The reproducibility challenges identified above directly motivate the infrastructure requirements established in Chapter~\ref{ch:background}. Inconsistent wizard behavior creates the need for enforced experimental protocols (R1, R2) that guide wizards systematically. The lack of comprehensive data undermines analysis, motivating automatic logging requirements (R4). Technical fragmentation violates platform agnosticism (R5). Each lab builds custom software tied to specific hardware, and these custom systems become obsolete when hardware evolves. Incomplete documentation reflects the need for self-documenting protocol specifications (R1, R2) that are simultaneously executable and shareable. As Chapter~\ref{ch:background} demonstrated, no existing platform simultaneously satisfies all six requirements. Addressing this gap requires rethinking how WoZ infrastructure is designed, prioritizing reproducibility and methodological rigor as first-class design goals rather than afterthoughts.
\section{Chapter Summary}

@@ -11,6 +11,35 @@ At the top level, researchers create a \emph{study} element that defines the ove
Each experiment protocol comprises a sequence of \emph{step} elements, which model distinct phases of the interaction design. For example, an experiment protocol might define steps such as ``Introduction,'' ``Learning Task,'' and ``Closing.'' Within each step, researchers define one or more \emph{action} elements that are the atomic units of the experimental procedure. Actions can be directed at the wizard (e.g., ``Wait for subject to finish task, then say encouraging phrase'') or at the robot (e.g., ``Move arm to point, play audio greeting, wait for subject response'').
\begin{figure}[htbp]
\centering
\begin{tikzpicture}[
nodebox/.style={rectangle, draw=black, thick, fill=gray!15, minimum width=2.8cm, minimum height=0.8cm, align=center, font=\small},
nodeboxdark/.style={rectangle, draw=black, thick, fill=gray!30, minimum width=2.8cm, minimum height=0.8cm, align=center, font=\small},
arrow/.style={->, thick}]
\node[nodebox] (study) at (0, 3.4) {Study};
\node[nodebox] (experiment) at (0, 2.1) {Experiment};
\node[nodebox] (step1) at (-3.0, 0.7) {Step};
\node[nodebox] (step2) at (0, 0.7) {Step};
\node[nodebox] (step3) at (3.0, 0.7) {Step};
\node[nodeboxdark] (action1) at (-4.5, -0.7) {Action};
\node[nodeboxdark] (action2) at (-1.5, -0.7) {Action};
\draw[arrow] (study.south) -- (experiment.north);
\draw[arrow] (experiment.south) -- (step1.north);
\draw[arrow] (experiment.south) -- (step2.north);
\draw[arrow] (experiment.south) -- (step3.north);
\draw[arrow] (step1.south) -- (action1.north);
\draw[arrow] (step1.south) -- (action2.north);
\end{tikzpicture}
\caption{Hierarchy of experiment specifications from study-level context to atomic actions.}
\label{fig:experiment-hierarchy}
\end{figure}
This hierarchical structure serves multiple purposes. First, it permits researchers to design experiment protocols without programming knowledge, using visual or declarative specifications at each level. Second, it naturally maps to the temporal structure of a trial session, making the protocol easy to follow during live execution. Third, it provides a foundation for comprehensive logging: each action executed during a trial can be recorded with precise timestamps and outcomes, making the experimental trace reproducible and analyzable. Fourth, the separation of experiment (protocol) from trial (execution) enables researchers to run the same protocol with different participants, facilitating direct comparison across trials while maintaining clear record-keeping of which participant ran which protocol.
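One way to picture the study, experiment, step, and action hierarchy is as a set of nested data types. The sketch below assumes TypeScript and uses hypothetical names (`Study`, `Experiment`, `Step`, `Action`, `orderedActions`); it is an illustration of the structure described here, not the system's actual data model. The example protocol reuses the step names mentioned earlier in this section.

```typescript
// Hypothetical types mirroring the study > experiment > step > action hierarchy.
type Target = "wizard" | "robot";

interface Action {
  description: string;
  target: Target; // who carries out the action
}

interface Step {
  name: string;
  actions: Action[];
}

interface Experiment {
  name: string;
  steps: Step[];
}

interface Study {
  title: string;
  experiments: Experiment[];
}

// Flatten a protocol into the ordered list of actions a trial would follow.
function orderedActions(experiment: Experiment): Action[] {
  const result: Action[] = [];
  for (const step of experiment.steps) {
    for (const action of step.actions) {
      result.push(action);
    }
  }
  return result;
}

const exampleExperiment: Experiment = {
  name: "Example protocol",
  steps: [
    {
      name: "Introduction",
      actions: [{ description: "Play audio greeting", target: "robot" }],
    },
    {
      name: "Learning Task",
      actions: [
        { description: "Wait for subject to finish task", target: "wizard" },
      ],
    },
    {
      name: "Closing",
      actions: [{ description: "Say goodbye", target: "robot" }],
    },
  ],
};

const exampleStudy: Study = {
  title: "Example study",
  experiments: [exampleExperiment],
};
```

Because the protocol is plain data rather than imperative code, the same structure can be rendered for the wizard during a trial and stored alongside the trial's log.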
\section{Modular Interface Architecture} \section{Modular Interface Architecture}


@@ -1,11 +1,216 @@
\chapter{Implementation}
\label{ch:implementation}
Chapter~\ref{ch:design} described the conceptual design of HRIStudio. This chapter addresses the realization of these design principles, discussing the core technologies used, the system architecture that integrates these technologies, and the current state of the implementation. The implementation demonstrates the feasibility of the approach proposed in earlier chapters while identifying technical challenges that inform the roadmap for future development.
\section{Core Implementation Decisions}
HRIStudio is implemented as a web application. Researchers access it through a standard web browser without installing specialized software. This design decision directly addresses requirement R2 (low technical barrier) by eliminating installation complexity and ensuring the system works identically on different operating systems. This section describes the key implementation choices and the rationale behind them.
\subsection{Web-Based Architecture}
The choice to build HRIStudio as a web application was driven by three factors. First, web browsers are universally available, so researchers do not need to install custom software or manage dependencies. Second, web applications naturally support collaboration: multiple team members can access the same experiment data and observe live trials simultaneously from different locations. Third, web deployment simplifies updates: when I fix bugs or add features, all users immediately receive the improvements without manual software updates.
I chose to use a single programming language, TypeScript~\cite{TypeScript2024}, across the entire system, including the user interface, the server logic, and the data access layer. This consistency reduces a common source of errors: when the structure of experiment data changes, the type checker flags inconsistencies between different parts of the system at build time, rather than leaving them to surface as runtime failures during live trials.
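As a minimal sketch of this benefit, consider a single action type shared by the interface and the server. The type and function names below are hypothetical and are not taken from HRIStudio's code base. If a new kind of action is added to the type but a consumer is not updated, the assignment to \texttt{never} no longer compiles, so the mismatch is caught before any trial runs.

```typescript
// Hypothetical shared definition; illustrative only, not HRIStudio's actual schema.
// An action is a discriminated union keyed by `kind`.
type Action =
  | { kind: "speak"; text: string }
  | { kind: "raise_arm"; side: "left" | "right" }
  | { kind: "wait_for_wizard"; prompt: string };

// Example consumer: produces the label shown for an action in a designer or log.
function describeAction(action: Action): string {
  switch (action.kind) {
    case "speak":
      return `Say "${action.text}"`;
    case "raise_arm":
      return `Raise ${action.side} arm`;
    case "wait_for_wizard":
      return `Wait for wizard: ${action.prompt}`;
    default: {
      // If a new kind is added to Action but not handled above,
      // this assignment stops compiling instead of failing during a trial.
      const unhandled: never = action;
      return unhandled;
    }
  }
}
```

Because both sides of the system would import the same definition, a change to the data structure propagates to every consumer through the compiler rather than through manual review.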
\subsection{Data Storage Strategy}
Experiment protocols and trial data are stored in a structured database that supports efficient queries, for example, retrieving all trials for a particular participant or comparing timing data across multiple sessions. However, video recordings and audio files are large and unstructured, so they are stored separately in a file storage system. This separation ensures that the database remains fast for common queries while still preserving complete multimedia records.
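The separation can be sketched as follows, assuming a hypothetical record shape in which the database row holds only structured fields plus keys that point into the file store. The field names, the key format, and the two query helpers are illustrative assumptions, not the actual HRIStudio schema.

```typescript
// Hypothetical record shapes; illustrative only.
interface TrialEvent {
  timestampMs: number;
  type: string;
}

interface TrialRecord {
  id: string;
  participantId: string;
  experimentId: string;
  events: TrialEvent[];
  // Keys into a separate file store; the media bytes never enter the database.
  mediaKeys: string[];
}

// Example structured query: all trials for one participant.
function trialsForParticipant(trials: TrialRecord[], participantId: string): TrialRecord[] {
  return trials.filter((trial) => trial.participantId === participantId);
}

// Example timing query: elapsed time between the first and last logged event.
function trialDurationMs(trial: TrialRecord): number {
  if (trial.events.length === 0) return 0;
  const times = trial.events.map((event) => event.timestampMs);
  return Math.max(...times) - Math.min(...times);
}
```

Both helpers touch only small structured fields, which is why queries of this kind stay fast regardless of how much video is attached to a trial.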
\subsection{Robot Communication Layer}
Rather than writing custom code to communicate with each robot's specific control system, HRIStudio uses the Robot Operating System (ROS)~\cite{Quigley2009} as an intermediary. ROS is a widely adopted standard in robotics research that provides a common communication framework. This design decision means that any robot with ROS support can work with HRIStudio. For robots without native ROS support, researchers can write a small adapter, a much simpler task than integrating directly with HRIStudio's core code.
\subsection{Plugin Architecture for Platform Agnosticism}
A critical design decision was how to support diverse robot platforms without hardcoding knowledge of specific robots into HRIStudio. The robotics landscape is fragmented: researchers use various robots (NAO, Pepper, Fetch, custom platforms) that communicate in different ways.
The solution is a plugin architecture. When designing an experiment, researchers work with abstract actions like ``speak this text'' or ``raise arm.'' The system does not need to know whether it is controlling a NAO robot, a Pepper robot, or a custom research platform. Instead, each robot is described by a plugin, a configuration file that maps abstract actions to the specific commands that robot understands.
This separation has important consequences. First, researchers can create an interaction protocol without knowing which robot will ultimately execute it, enabling protocol reuse across different hardware. Second, when a research lab acquires a new robot, they can add support for it by writing a plugin rather than modifying HRIStudio itself. Third, the visual designer's palette of available actions is automatically populated from the loaded plugins, ensuring the interface reflects the actual capabilities of the current robot.
The plugin architecture also treats control flow (branches, loops, conditional logic) the same way as robot actions. This uniformity allows researchers to mix logical decisions and physical robot behaviors freely when designing experiments.
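The mapping can be illustrated with a small sketch. The plugin shape, the \texttt{resolveAction} and \texttt{palette} helpers, and the payload format are assumptions made for illustration. Only the topic names are taken from Figure~\ref{fig:plugin-architecture}.

```typescript
// Hypothetical plugin shape: declarative data that maps abstract actions to topics.
interface ActionMapping {
  topic: string;
}

interface RobotPlugin {
  robot: string;
  actions: { [abstractAction: string]: ActionMapping };
}

interface RobotCommand {
  topic: string;
  payload: { [name: string]: string | number };
}

const naoPlugin: RobotPlugin = {
  robot: "NAO",
  actions: {
    speak: { topic: "/nao/tts" },
    raise_arm: { topic: "/nao/arm" },
    move_forward: { topic: "/nao/move" },
  },
};

const pepperPlugin: RobotPlugin = {
  robot: "Pepper",
  actions: {
    speak: { topic: "/pepper/say" },
    raise_arm: { topic: "/pepper/gesture" },
    move_forward: { topic: "/pepper/cmd_vel" },
  },
};

// Translate an abstract action into the command for whichever plugin is loaded.
function resolveAction(
  plugin: RobotPlugin,
  action: string,
  params: { [name: string]: string | number },
): RobotCommand {
  const mapping = plugin.actions[action];
  if (mapping === undefined) {
    throw new Error(`${plugin.robot} plugin does not support action "${action}"`);
  }
  return { topic: mapping.topic, payload: params };
}

// The designer's palette is derived from the plugin rather than hardcoded.
function palette(plugin: RobotPlugin): string[] {
  return Object.keys(plugin.actions).sort();
}
```

In this sketch the same protocol step, \texttt{speak}, resolves to a different topic depending on the loaded plugin, and an action that the plugin does not declare is rejected before anything is sent to the robot.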
\begin{figure}[htbp]
\centering
\begin{tikzpicture}[
action/.style={rectangle, draw=black, thick, fill=gray!15, minimum width=2.2cm, minimum height=0.6cm, align=center, font=\small},
impl/.style={rectangle, draw=black, thick, fill=gray!30, minimum width=2.2cm, minimum height=0.7cm, align=center, font=\small},
arrow/.style={-, thick}]
% First Y: speak()
\node[action] (a1) at (0, 7) {HRIStudio\\speak(text)};
\node[impl] (nao1) at (-2, 5) {NAO\\{\small /nao/tts}};
\node[impl] (pep1) at (2, 5) {Pepper\\{\small /pepper/say}};
\draw[arrow] (a1) -- (nao1);
\draw[arrow] (a1) -- (pep1);
% Second Y: raise_arm()
\node[action] (a2) at (0, 3) {HRIStudio\\raise\_arm()};
\node[impl] (nao2) at (-2, 1) {NAO\\{\small /nao/arm}};
\node[impl] (pep2) at (2, 1) {Pepper\\{\small /pepper/gesture}};
\draw[arrow] (a2) -- (nao2);
\draw[arrow] (a2) -- (pep2);
% Third Y: move_forward()
\node[action] (a3) at (0, -1) {HRIStudio\\move\_forward()};
\node[impl] (nao3) at (-2, -3) {NAO\\{\small /nao/move}};
\node[impl] (pep3) at (2, -3) {Pepper\\{\small /pepper/cmd\_vel}};
\draw[arrow] (a3) -- (nao3);
\draw[arrow] (a3) -- (pep3);
\end{tikzpicture}
\caption{Plugin architecture: each abstract action branches to platform-specific implementations.}
\label{fig:plugin-architecture}
\end{figure}
\subsection{Event-Driven Execution}
During a trial, HRIStudio must balance two competing demands: following the experimental protocol precisely and allowing natural human-robot timing. The execution engine accomplishes this by waiting for specific events at designated points in the protocol. For example, if the protocol specifies ``wait for wizard to click Continue,'' the system pauses until that event occurs, regardless of how long it takes. This preserves the spontaneous, human-paced nature of interaction while ensuring the protocol structure is followed.
Every event during a trial, including robot movements, wizard button clicks, and sensor readings, is immediately recorded with a precise timestamp. This comprehensive logging happens automatically, without requiring researchers to instrument their experiments manually. The complete event record enables two critical capabilities: first, researchers can analyze exactly what happened during a trial without relying on memory or handwritten notes; second, the detailed event log makes trials reproducible by documenting not just what was supposed to happen, but what actually occurred.
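The following sketch shows one way such an engine could be structured. It is not HRIStudio's actual execution engine: the \texttt{TrialRunner} class, the step shapes, and the log entry format are hypothetical. The clock is passed in so that the timestamps are deterministic in the example.

```typescript
// Hypothetical protocol step: either a robot action or a pause for a named event.
type Step =
  | { kind: "action"; name: string }
  | { kind: "wait"; event: string };

interface LogEntry {
  timestampMs: number;
  type: string;
  detail: string;
}

class TrialRunner {
  private index = 0;
  private readonly steps: Step[];
  private readonly now: () => number;
  readonly log: LogEntry[] = [];

  constructor(steps: Step[], now: () => number) {
    this.steps = steps;
    this.now = now;
  }

  // Begin the trial and run until the first wait step or the end of the protocol.
  start(): void {
    this.record("trial_started", "");
    this.advance();
  }

  // Every incoming event is logged; only the awaited one resumes the protocol.
  dispatch(event: string): void {
    this.record("event", event);
    const step = this.steps[this.index];
    if (step !== undefined && step.kind === "wait" && step.event === event) {
      this.index += 1;
      this.advance();
    }
  }

  isFinished(): boolean {
    return this.index >= this.steps.length;
  }

  private advance(): void {
    for (;;) {
      const step = this.steps[this.index];
      if (step === undefined) {
        this.record("trial_finished", "");
        return;
      }
      if (step.kind === "wait") {
        // Pause here for as long as it takes; no timer drives the protocol forward.
        this.record("waiting", step.event);
        return;
      }
      this.record("action", step.name);
      this.index += 1;
    }
  }

  private record(type: string, detail: string): void {
    this.log.push({ timestampMs: this.now(), type, detail });
  }
}
```

In this design the log is written inside the engine itself, so unexpected events are captured alongside planned steps without any extra logging code in the protocol.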
\subsection{Local Media Recording}
Video and audio recording during trials must not interfere with the live interaction. To ensure this, recording happens locally in the researcher's web browser rather than streaming data to a remote server in real time. The browser accumulates the video and audio data, then transfers the complete recordings to the server when the trial concludes. This approach prevents network delays or server processing from causing dropped video frames or degraded audio quality during the critical interaction period.
The timestamps when recording starts and stops are logged alongside other trial events, ensuring that when researchers later review the video, they can see exactly what was happening in the experiment protocol at any given moment in the recording.
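As a small sketch of how the logged start and stop times could be used, the function below converts an event timestamp into a position within the recording. The \texttt{RecordingWindow} shape and the function name are hypothetical. The sketch also assumes that all timestamps come from the same clock.

```typescript
// Hypothetical shape for the logged recording boundaries (milliseconds on a shared clock).
interface RecordingWindow {
  startedAtMs: number;
  stoppedAtMs: number;
}

// Position of an event within the recording, in milliseconds from its start,
// or null when the event occurred while nothing was being recorded.
function mediaOffsetMs(recording: RecordingWindow, eventTimestampMs: number): number | null {
  if (eventTimestampMs < recording.startedAtMs || eventTimestampMs > recording.stoppedAtMs) {
    return null;
  }
  return eventTimestampMs - recording.startedAtMs;
}
```

Returning \texttt{null} for events outside the window makes it explicit that some logged events, such as setup steps before recording begins, have no corresponding footage.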
\section{System Architecture and Data Flow}
\subsection{Separation of Architectural Layers}
HRIStudio's architecture separates the system into three distinct layers, each with a specific responsibility:
\begin{enumerate}
\item \textbf{User interface layer:} The visual interfaces (Design, Execute, Playback) run in the researcher's web browser. This layer handles user interactions, including clicking buttons, dragging experiment components, and viewing live trial status.
\item \textbf{Application logic layer:} A server process manages experiment data, coordinates trial execution, authenticates users, and orchestrates communication between the interface and the robot.
\item \textbf{Data and robot control layer:} This layer has two responsibilities: long-term storage of experiment protocols and trial data, and direct communication with robot hardware.
\end{enumerate}
This separation provides several benefits. Different parts of the system can evolve independently; for example, improving the user interface does not require changes to robot control logic. The separation also clarifies responsibilities: the user interface should never directly command robot hardware; all robot actions flow through the application logic layer, which can enforce safety constraints and maintain consistent logging.
\begin{figure}[htbp]
\centering
\begin{tikzpicture}[
layer/.style={rectangle, draw=black, thick, fill, minimum width=6.5cm, minimum height=1cm, align=center, text width=6.2cm},
arrow/.style={->, thick, line width=1.5pt}]
% Layer 1: UI
\node[layer, fill=gray!15] (ui) at (0, 3.5) {
\textbf{User Interface}\\[0.1cm]
{\small Design, Execute, Playback}
};
% Layer 2: Logic
\node[layer, fill=gray!30] (logic) at (0, 1.8) {
\textbf{Application Logic}\\[0.1cm]
{\small Execution, Authentication, Logger}
};
% Layer 3: Data
\node[layer, fill=gray!45] (data) at (0, 0.1) {
\textbf{Data \& Robot Control}\\[0.1cm]
{\small Database, File Storage, ROS}
};
% Arrows
\draw[arrow] (ui.south) -- (logic.north);
\draw[arrow] (logic.south) -- (data.north);
\end{tikzpicture}
\caption{HRIStudio's three-layer architecture separates user interface, application logic, and data/robot control.}
\label{fig:three-tier}
\end{figure}
\subsection{Data Flow During a Trial}
The flow of data during a trial illustrates how the architectural layers coordinate:
\begin{enumerate}
\item A researcher creates an experiment protocol using the Design interface and initiates a trial.
\item The application server loads the protocol and begins stepping through it, sending commands to the robot and waiting for events (wizard inputs, sensor readings, timeouts).
\item Every action, both planned protocol steps and unexpected events, is immediately written to the trial log with precise timing information.
\item The Execute interface continuously displays the current state, allowing the wizard and observers to monitor progress in real time.
\item When the trial concludes, all recorded media (video, audio) is transferred from the browser to the server and associated with the trial record.
\item Later, the Playback interface retrieves the stored trial data and reconstructs exactly what happened, synchronized with the video and audio recordings.
\end{enumerate}
This design ensures comprehensive documentation of every trial, supporting both fine-grained analysis and reproducibility. Researchers can review not just what they planned to happen, but what actually occurred, including timing variations and unexpected events.
\begin{figure}[htbp]
\centering
\begin{tikzpicture}[
stage/.style={rectangle, draw, thick, rounded corners, minimum width=3.5cm, minimum height=1cm, align=center, font=\footnotesize},
arrow/.style={->, thick, line width=1.3pt}]
% Six stages stacked vertically with descriptions inside
\node[stage, fill=gray!10] (s1) at (0, 7.5) {1. Design Protocol\\{\scriptsize Researcher creates workflow}};
\node[stage, fill=gray!15] (s2) at (0, 6) {2. Load \& Execute\\{\scriptsize System loads and runs trial}};
\node[stage, fill=gray!20] (s3) at (0, 4.5) {3. Log Events\\{\scriptsize Actions recorded with timestamps}};
\node[stage, fill=gray!25] (s4) at (0, 3) {4. Display Live State\\{\scriptsize Wizard sees real-time progress}};
\node[stage, fill=gray!30] (s5) at (0, 1.5) {5. Transfer Media\\{\scriptsize Video/audio saved to server}};
\node[stage, fill=gray!35] (s6) at (0, 0) {6. Analyze \& Playback\\{\scriptsize Review data with synchronized media}};
% Downward arrows
\draw[arrow] (s1.south) -- (s2.north);
\draw[arrow] (s2.south) -- (s3.north);
\draw[arrow] (s3.south) -- (s4.north);
\draw[arrow] (s4.south) -- (s5.north);
\draw[arrow] (s5.south) -- (s6.north);
\end{tikzpicture}
\caption{Trial data flow: from protocol design through execution and recording, to analysis and playback.}
\label{fig:trial-dataflow}
\end{figure}
\section{Implementation Status}
The core architectural components of HRIStudio have been implemented and validated. The framework successfully instantiates the design principles described earlier, demonstrating the feasibility of the approach and highlighting technical challenges to be addressed in future work.
\begin{description}
\item[User interfaces:] The Design, Execute, and Playback interfaces are operational. The visual design environment supports drag-and-drop construction of experiment workflows.
\item[Server logic and data management:] The server manages experiment specifications, user authentication, trial session data, and comprehensive event logging.
\item[Data model:] The hierarchical Study/Experiment/Trial data structures with full event logging infrastructure are implemented and operational.
\item[Robot communication:] The system successfully communicates with robots through ROS, translating abstract protocol actions into robot-specific commands and receiving sensor data.
\item[Plugin system:] The plugin architecture for supporting multiple robot platforms is in place, allowing researchers to define new robot capabilities without modifying core system code.
\end{description}
Components requiring continued development include robust real-time synchronization for complex multi-agent scenarios, comprehensive media playback with full temporal synchronization, and evaluation of the plugin system with diverse robot platforms.
\section{Architectural Challenges and Solutions}
\subsection{Real-Time Responsiveness During Trials}
The Execute interface must maintain responsive communication between the wizard and the robot. Wireless networks and web-based systems can introduce delays that, if not carefully managed, degrade interaction quality or compromise safety. The implementation addresses this in three ways: maintaining persistent connections that avoid the overhead of repeatedly establishing communication; deploying the server on the same local network as the robot to minimize network delays; and anticipating likely next actions to prepare the robot in advance when possible.
\subsection{Synchronizing Multiple Data Sources}
During playback, researchers need to see video, hear audio, and review event logs in perfect synchronization. However, these data sources have different characteristics: video captures 30 frames per second, audio samples thousands of times per second, and event logs record discrete actions at irregular intervals. The implementation uses a common time reference and records precise timestamps for all data, allowing the playback system to align everything accurately regardless of differences in how the data was originally captured.
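A minimal sketch of this alignment is given below, assuming that the event log is sorted by timestamp and that every timestamp refers to the same clock. The function names and the event shape are hypothetical. Given a playback time, the first function finds the most recent logged event by binary search, and the second computes the corresponding video frame for a given frame rate.

```typescript
// Hypothetical event shape; timestamps are milliseconds on the shared clock.
interface TimedEvent {
  timestampMs: number;
  label: string;
}

// Most recent event at or before tMs, or null if none has occurred yet.
// Assumes `events` is sorted by ascending timestamp.
function latestEventAt(events: TimedEvent[], tMs: number): TimedEvent | null {
  let low = 0;
  let high = events.length - 1;
  let found: TimedEvent | null = null;
  while (low <= high) {
    const mid = Math.floor((low + high) / 2);
    const candidate = events[mid];
    if (candidate === undefined) break;
    if (candidate.timestampMs <= tMs) {
      found = candidate; // candidate qualifies; look for a later one
      low = mid + 1;
    } else {
      high = mid - 1;
    }
  }
  return found;
}

// Index of the video frame being shown at tMs for a fixed frame rate.
function frameIndexAt(recordingStartMs: number, tMs: number, framesPerSecond: number): number {
  const elapsedMs = tMs - recordingStartMs;
  return Math.max(0, Math.floor((elapsedMs * framesPerSecond) / 1000));
}
```

Because both lookups take a time on the common reference as input, the irregular event log and the fixed-rate video never need to be resampled to match each other.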
\subsection{Extensibility Without Fragmentation}
The plugin architecture allows researchers to add support for new robot platforms without modifying HRIStudio's core code. This design separates the evolution of the platform itself from the evolution of robot support: I can improve HRIStudio's core functionality without affecting plugins, and researchers can add new robots without waiting for core platform changes.
However, this separation creates a design challenge: the plugin interface must be flexible enough to accommodate diverse robots, but not so flexible that every robot requires completely custom code. Finding this balance requires validating the plugin design with multiple real robots to ensure the abstraction is appropriate.
\section{Mapping Architecture to Requirements}
The implementation choices described in this chapter directly support the six requirements established earlier:
\begin{description}
\item[R1 (Integrated workflow):] The unified Design/Execute/Analysis pipeline with shared data models ensures coherent workflows without switching between separate tools.
\item[R2 (Low technical barrier):] Web-based deployment and drag-and-drop interface design eliminate installation complexity and reduce the learning curve.
\item[R3 (Real-time control):] Event-driven execution with persistent connections enables responsive, natural human-robot interaction.
\item[R4 (Automated logging):] Comprehensive event logging captures the complete trial trace automatically, without requiring researchers to add logging code to their experiments.
\item[R5 (Platform agnosticism):] The plugin architecture allows integration with diverse robot platforms without modifying core system code.
\item[R6 (Collaborative support):] Multiple team members can simultaneously observe trial execution through shared, synchronized views.
\end{description}
\section{Chapter Summary}
This chapter has described the key implementation decisions that realize HRIStudio's design principles. Building the system as a web application addresses accessibility by eliminating installation complexity and enabling natural collaboration. Using a single programming language throughout the system reduces a common source of errors, in which different parts of an application drift out of sync with one another.
The separation between user interface, application logic, and data storage clarifies responsibilities and allows independent evolution of different system components. The plugin architecture directly addresses platform agnosticism (R5), enabling researchers to add robot support without modifying core code. Event-driven execution preserves natural interaction timing while comprehensive automatic logging satisfies requirement R4 and supports reproducibility. Local media recording ensures high-quality video and audio capture without interfering with live trials.
While core architectural components are operational, continued work remains on optimizing real-time responsiveness for complex scenarios, refining multi-modal playback synchronization, and validating the plugin design with diverse robot platforms.


@@ -193,5 +193,33 @@ series = {OzCHI '15}
doi = {10.1145/3610978.3640741}
}
@misc{React2024,
title={{React: A JavaScript library for building user interfaces}},
author={Meta},
year={2024},
url={https://react.dev}
}
@misc{Nextjs2024,
title={{Next.js: The React Framework for the Web}},
author={Vercel},
year={2024},
url={https://nextjs.org}
}
@misc{TypeScript2024,
title={{TypeScript: Typed JavaScript at Any Scale}},
author={{Microsoft and the TypeScript Community}},
year={2024},
url={https://www.typescriptlang.org}
}
@misc{tRPC2024,
title={{tRPC: Move fast and break nothing. End-to-end typesafe APIs made easy}},
author={Alex Johansson and community contributors},
year={2024},
url={https://trpc.io}
}


@@ -4,6 +4,8 @@
%\usepackage{graphics} %Select graphics package
\usepackage{graphicx} %
%\usepackage{amsthm} %Add other packages as necessary
\usepackage{tikz} %For programmatic diagrams
\usetikzlibrary{shapes,arrows,positioning,fit,backgrounds}
\usepackage[hidelinks]{hyperref} %Enable hyperlinks and \autoref, hide colored boxes
\begin{document}
\butitle{A Web-Based Wizard-of-Oz Platform for Collaborative and Reproducible Human-Robot Interaction Research}