
\chapter{Background and Related Work}
\label{ch:background}

This chapter provides the necessary context for understanding the challenges addressed by this thesis. I survey the landscape of existing WoZ platforms, analyze their capabilities and limitations, and establish requirements that a modern infrastructure should satisfy. Finally, I position this thesis within the context of prior work on this topic.

As established in Chapter~\ref{ch:intro}, the WoZ technique enables researchers to prototype and test robot interaction designs before autonomous capabilities are developed. This thesis is situated within a specific subset of HRI activity: social robotics, a subfield concerned with robots designed for direct social interaction with humans, and, more narrowly, WoZ experiments used to prototype and evaluate social robot behaviors. To understand how the proposed framework advances this research paradigm, I review the existing landscape of WoZ platforms, identify their limitations relative to disciplinary needs, and establish requirements for a more comprehensive approach. HRI is a fundamentally multidisciplinary field that brings together engineers, psychologists, designers, and domain experts from various application areas \cite{Bartneck2024}. Yet two challenges have historically limited participation from non-technical researchers in WoZ-based HRI studies. First, high technical barriers prevent many domain experts from conducting independent studies. Second, research groups typically build custom software for specific robots, creating tool fragmentation across the field.

\section{Existing WoZ Platforms and Tools}

Over the last two decades, multiple frameworks to support and automate the WoZ paradigm have been reported in the literature. These frameworks can be broadly categorized based on their primary design emphases, generality, and the methodological practices they encourage. Foundational work by Steinfeld et al. \cite{Steinfeld2009} articulated the methodological importance of WoZ simulation, distinguishing between the human simulating the robot (Wizard of Oz) and the inverse arrangement, in which the human is simulated (Oz of Wizard). In the latter case, a model or simulation of human behavior stands in for the person so that the behavior of an actual robot can be evaluated. This distinction has influenced how subsequent tools approach the design and execution of WoZ experiments.

Early platform-agnostic tools focused on providing robust, flexible interfaces for technically sophisticated users. These systems were designed to work with multiple robot types rather than a single hardware platform. Polonius \cite{Lu2011}, built on the Robot Operating System (ROS) \cite{Quigley2009}, exemplifies this generation. It provides a graphical interface for defining finite state machine scripts that control robot behaviors, with integrated logging capabilities to streamline post-experiment analysis. The system was explicitly designed to enable robotics engineers to create experiments that their non-technical collaborators could then execute. However, the initial setup and configuration still required substantial programming expertise. Similarly, OpenWoZ \cite{Hoffman2016} introduced a cloud-based, runtime-configurable architecture using web protocols. Its design allows multiple operators or observers to connect simultaneously, and its plugin system enables researchers to extend functionality such as adding new robot behaviors or sensor integrations. Most importantly, OpenWoZ allows runtime modification of robot behaviors, enabling wizards to deviate from scripts when unexpected situations arise. While architecturally sophisticated and highly flexible, OpenWoZ requires programming knowledge to create custom behaviors and configure experiments, creating the \emph{Accessibility Problem} for non-technical researchers.

A second wave of tools shifted focus toward usability, often achieving accessibility by coupling tightly with specific hardware platforms. WoZ4U \cite{Rietz2021} was explicitly designed as an ``easy-to-use'' tool for conducting experiments with Aldebaran's Pepper robot. It provides an intuitive graphical interface that allows non-programmers to design interaction flows, and it successfully lowers the technical barrier. However, this usability comes at the cost of generalizability: WoZ4U does not work with other robot platforms. Manufacturer-provided software for various robots follows a similar pattern.

Choregraphe \cite{Pot2009}, developed by Aldebaran Robotics for the NAO and Pepper robots, offers a visual programming environment based on connected behavior boxes. Researchers can create complex interaction flows using drag-and-drop blocks without writing code in traditional programming languages. However, when new robot platforms emerge or when hardware becomes obsolete, tools like Choregraphe and WoZ4U lose their utility. Pettersson and Wik, in their review of WoZ tools \cite{Pettersson2015}, note that platform-specific systems often fall out of use as technology evolves, forcing researchers to constantly rebuild their experimental infrastructure.

Recent years have seen renewed interest in comprehensive WoZ frameworks. Gibert et al. \cite{Gibert2013} developed the Super Wizard of Oz (SWoOZ) platform. This system integrates facial tracking, gesture recognition, and real-time control capabilities to enable naturalistic human-robot interaction studies. Virtual and augmented reality have also emerged as complementary approaches to WoZ. Helgert et al. \cite{Helgert2024} demonstrated how VR-based WoZ environments can simplify experimental setup while providing researchers with precise control over environmental conditions and high-fidelity data collection.

This expanding landscape reveals a persistent fundamental gap in the design space of WoZ tools. Flexible, general-purpose platforms like Polonius and OpenWoZ offer powerful capabilities but present high technical barriers. Accessible, user-friendly tools like WoZ4U and Choregraphe lower those barriers but sacrifice cross-platform compatibility and longevity. Newer approaches such as VR-based frameworks attempt to bridge this gap, yet no existing tool successfully combines accessibility, flexibility, deployment portability, and built-in methodological rigor.

\begin{figure}[htbp]
\centering
\begin{tikzpicture}[
    scale=1.0,
    quadbox/.style={rectangle, draw=white, ultra thick, minimum width=5.5cm, minimum height=4.5cm, align=center},
    title/.style={font=\small\bfseries, align=center},
    desc/.style={font=\footnotesize, text=gray!60, align=center},
    axislabel/.style={font=\small\bfseries, align=center}
]

    % Quadrant Backgrounds
    \fill[gray!20] (0, 4.5) rectangle (5.5, 9.0);   % Top Left (HRIStudio)
    \fill[gray!15] (5.5, 4.5) rectangle (11.0, 9.0); % Top Right (Polonius)
    \fill[gray!10] (0, 0) rectangle (5.5, 4.5);      % Bottom Left (WoZ4U)
    \fill[gray!5] (5.5, 0) rectangle (11.0, 4.5);    % Bottom Right (Choregraphe)

    % Quadrant Lines
    \draw[white, ultra thick] (5.5, 0) -- (5.5, 9.0);
    \draw[white, ultra thick] (0, 4.5) -- (11.0, 4.5);

    % Axis Labels
    \node[axislabel, above] at (2.75, 9.2) {Low technical barrier};
    \node[axislabel, above] at (8.25, 9.2) {High technical barrier};
    \node[axislabel, left] at (-0.2, 6.75) {More rigorous};
    \node[axislabel, left] at (-0.2, 2.25) {Less rigorous};

    % Top Left: The Gap
    \node[axislabel] at (2.75, 6.75) {\Huge ?};

    % Top Right: Polonius, OpenWoZ, SWoOZ
    \node[title] at (8.25, 7.4) {Polonius, OpenWoZ,\\SWoOZ, VR Environments};
    \node[desc] at (8.25, 6.0) {Flexible and powerful,\\but requires significant\\programming expertise};

    % Bottom Left: WoZ4U
    \node[title] at (2.75, 2.7) {WoZ4U};
    \node[desc] at (2.75, 1.7) {Accessible, but\\platform-specific\\No methodological rigor};

    % Bottom Right: Choregraphe
    \node[title] at (8.25, 2.7) {Choregraphe};
    \node[desc] at (8.25, 1.7) {Requires specialized\\training\\No methodological rigor};

\end{tikzpicture}
\caption{WoZ tool design space by technical barrier and methodological rigor.}
\label{fig:tool-matrix}
\end{figure}

The missing quadrant in Figure~\ref{fig:tool-matrix} matters because methodological rigor requires systematic features that guide experimenters toward best practices: consistently following experimental protocols, maintaining comprehensive logging, and producing reproducible experimental designs. Few platforms directly address the methodological concerns raised by systematic reviews of WoZ research. Riek's influential analysis \cite{Riek2012} of 54 HRI studies uncovered widespread inconsistencies in how wizard behaviors were controlled and reported. Very few studies documented standardized wizard training procedures or measured wizard error rates, raising questions about internal validity---that is, whether observed outcomes can be attributed to the intended experimental manipulation rather than to uncontrolled variation in wizard behavior. The tools themselves often exacerbate this problem: poorly designed interfaces increase cognitive load on wizards, leading to timing errors and behavioral inconsistencies that can confound experimental results. Recent work by Strazdas et al. \cite{Strazdas2020} further demonstrates the importance of careful interface design in WoZ systems, showing that intuitive wizard interfaces directly improve both the quality of robot behavior and the reliability of collected data.

\section{Requirements for Modern WoZ Infrastructure}

This thesis is the latest step in a multi-year effort to build infrastructure that addresses the challenges identified in the WoZ platform landscape. Based on the analysis of existing platforms and identified methodological gaps, I derived requirements for a modern WoZ research infrastructure. Through our preliminary work \cite{OConnor2024}, we identified six critical capabilities that a comprehensive platform should provide:

\begin{description}
\item[R1: Integrated workflow.] All phases of the experimental workflow (design, execution, and analysis) should be integrated within a single unified environment, so that researchers do not need to move between separate tools to design, run, and analyze their experiments.
\item[R2: Low technical barrier.] Creating social interaction protocols between robot and human should require minimal to no programming expertise, enabling domain experts from psychology, education, or other fields to work independently \cite{Bartneck2024}.
\item[R3: Real-time control.] The system must support fine-grained, responsive real-time control during live experiment sessions across a variety of robotic platforms. Consistent real-time control across platforms also directly supports reproducibility: the same script should execute with equivalent responsiveness regardless of which robot is used.
\item[R4: Automated logging.] All actions, timings, and sensor data should be automatically logged with synchronized timestamps to facilitate analysis.
\item[R5: Platform agnosticism.] The architecture should decouple experimental logic from robot-specific actions and behaviors. This allows experiments designed for one robot platform to be adapted to others, ensuring the WoZ platform remains viable as hardware evolves. This requirement also directly addresses the Reproducibility Problem: a platform-agnostic design makes it possible to run the same interaction script on different robots with minimal change to the implementing program.
\item[R6: Collaborative support.] Multiple team members should be able to contribute to experiment design and review of execution data, supporting truly interdisciplinary research.
\end{description}
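
To make R4 and R5 more concrete, the following Python sketch shows one way that experimental logic could be separated from robot-specific code while every executed action is logged automatically. It is an illustration under stated assumptions, not the implementation of HRIStudio or of any platform surveyed above: the names \texttt{SCRIPT}, \texttt{EchoAdapter}, \texttt{LogEntry}, and \texttt{run\_script} are hypothetical, and the adapter only returns a descriptive string where a real one would call a robot's SDK.

\begin{verbatim}
import time
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# Hypothetical robot-independent script: (abstract action, parameters).
Step = Tuple[str, Dict[str, str]]
SCRIPT: List[Step] = [
    ("say", {"text": "Hello"}),
    ("gesture", {"name": "wave"}),
]


class EchoAdapter:
    """Stand-in for a robot-specific plugin (R5).

    A real adapter would translate abstract actions into calls to one
    robot's SDK; this one only returns a string describing the command.
    """

    def __init__(self, robot_name: str) -> None:
        self.robot_name = robot_name

    def execute(self, action: str, params: Dict[str, str]) -> str:
        args = ",".join(f"{k}={v}" for k, v in sorted(params.items()))
        return f"{self.robot_name}:{action}({args})"


@dataclass
class LogEntry:
    timestamp: float  # seconds, read from the one clock given below
    action: str
    result: str


def run_script(
    script: List[Step],
    adapter: EchoAdapter,
    clock: Callable[[], float] = time.monotonic,
) -> List[LogEntry]:
    """Run each step on the adapter and log it automatically (R4)."""
    log: List[LogEntry] = []
    for action, params in script:
        started = clock()  # a single clock keeps timestamps comparable
        result = adapter.execute(action, params)
        log.append(LogEntry(started, action, result))
    return log


if __name__ == "__main__":
    # The same script runs unchanged on two (simulated) robots.
    for robot in ("robot_a", "robot_b"):
        for entry in run_script(SCRIPT, EchoAdapter(robot)):
            print(f"{entry.timestamp:.3f} {entry.action} "
                  f"-> {entry.result}")
\end{verbatim}

Because the script never names a robot, replacing one \texttt{EchoAdapter} instance with another changes only the adapter, and the logged sequence of abstract actions stays the same. Passing the clock as a parameter is a design choice made here so that all entries share one time source and so that the sketch can be tested with a fixed clock.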

To the best of my knowledge, no existing platform satisfies all six requirements. Most critically, the trade-off between accessibility and reproducibility remains unresolved. Few tools embed methodological best practices directly into their design to guide experimenters toward sound methodology by default.

This work builds on two prior peer-reviewed publications. We first introduced the concept for HRIStudio as a Late-Breaking Report at the 2024 IEEE International Conference on Robot and Human Interactive Communication (RO-MAN) \cite{OConnor2024}. In that position paper, we identified the lack of accessible tooling as a primary barrier to entry in HRI and proposed the high-level vision of a web-based, collaborative platform. We established the core requirements listed above and argued for a web-based approach to achieve them.

Following the initial proposal, we published the detailed system architecture and preliminary prototype as a full paper at RO-MAN 2025 \cite{OConnor2025}. That publication validated the technical feasibility of our approach, detailing the communication protocols, data models, and plugin architecture necessary to support real-time robot control using standard web technologies while maintaining platform independence.

While those prior publications established the conceptual framework and technical architecture, this thesis formalizes those design principles, realizes them in a complete implementation, and evaluates whether they produce measurably different outcomes in a pilot validation study. The pilot study compares design fidelity and execution reliability between HRIStudio and a representative baseline tool in order to assess whether these principles translate into better outcomes for real researchers.

\section{Chapter Summary}

This chapter has established the technical and methodological context for this thesis. Existing WoZ platforms fall into two categories: general-purpose tools like Polonius and OpenWoZ that offer flexibility but impose high technical barriers, and platform-specific systems like WoZ4U and Choregraphe that prioritize usability at the cost of cross-platform generality. Recent approaches such as VR-based frameworks attempt to bridge this gap, yet to the best of my knowledge, no existing tool successfully combines accessibility, flexibility, and embedded methodological rigor. Based on this landscape analysis, I identified six critical requirements for modern WoZ infrastructure (R1--R6): integrated workflows, low technical barriers, real-time control across platforms, automated logging, platform-agnostic design, and collaborative support. These requirements are the standard against which the proposed design is evaluated in Chapter~\ref{ch:evaluation}. The next chapter examines the broader reproducibility challenges that make these requirements essential.