Update thesis content and improve reproducibility framework

- Refine introduction and background chapters for clarity and coherence.
- Enhance reproducibility chapter by connecting challenges to infrastructure requirements.
- Add new references to support the thesis arguments.
- Update .gitignore to include IDE files.
- Modify hyperref package usage to hide colored boxes in the document.
This commit is contained in:
Sean O'Connor
2026-02-12 10:26:49 -05:00
parent b29e14c054
commit b75f31271b
6 changed files with 65 additions and 19 deletions


@@ -1,23 +1,27 @@
\chapter{Background and Related Work}
\label{ch:background}
This chapter provides the necessary context for understanding the challenges addressed by this thesis. I begin by surveying the landscape of existing WoZ platforms, analyzing their capabilities and limitations. Through this analysis, I identify critical gaps in current tooling that motivate the need for a new approach. Finally, I position this thesis relative to prior work on this topic.
This chapter provides the necessary context for understanding the challenges addressed by this thesis. I survey the landscape of existing WoZ platforms, analyze their capabilities and limitations, and establish requirements that a modern infrastructure should satisfy. Finally, I position this thesis relative to prior work on this topic.
As established in Chapter~\ref{ch:intro}, the Wizard-of-Oz technique enables researchers to prototype and test robot interaction designs before autonomous capabilities are fully developed. To understand how the proposed framework advances this research paradigm, it is essential to review the existing landscape of WoZ platforms, identify their limitations, and establish requirements for a more comprehensive approach. HRI is fundamentally a multidisciplinary field, bringing together engineers, psychologists, designers, and domain experts from various application areas \cite{Bartneck2024}, yet the fragmentation of tools and technical barriers have historically limited participation from non-technical researchers.
As established in Chapter~\ref{ch:intro}, the Wizard-of-Oz technique enables researchers to prototype and test robot interaction designs before autonomous capabilities are fully developed. To understand how the proposed framework advances this research paradigm, it is essential to review the existing landscape of WoZ platforms, identify their limitations relative to disciplinary needs, and establish requirements for a more comprehensive approach. HRI is fundamentally a multidisciplinary field, bringing together engineers, psychologists, designers, and domain experts from various application areas \cite{Bartneck2024}, yet the fragmentation of tools and technical barriers have historically limited participation from non-technical researchers.
\section{Existing WoZ Platforms and Tools}
Over the last two decades, multiple frameworks to support and automate the WoZ paradigm have been reported in the literature. These frameworks can be broadly categorized based on their primary design emphases. Early efforts focused on providing robust, flexible interfaces for technically sophisticated users. Polonius \cite{Lu2011}, built on the Robot Operating System (ROS), exemplifies this generation. It provides a graphical interface for defining finite state machine scripts that control robot behaviors, with integrated logging capabilities to streamline post-experiment analysis. The system was explicitly designed to enable robotics engineers to create experiments that their non-technical collaborators could then execute. However, the initial setup and configuration still required substantial programming expertise.
Over the last two decades, multiple frameworks to support and automate the WoZ paradigm have been reported in the literature. These frameworks can be broadly categorized based on their primary design emphases, generality, and the methodological practices they encourage. Foundational work by Steinfeld et al. \cite{Steinfeld2009} articulated the methodological importance of WoZ simulation, distinguishing between the human simulating the robot and the robot simulating the human. This distinction has influenced how subsequent tools approach the design and execution of WoZ experiments.
Similarly, OpenWoZ \cite{Hoffman2016} introduced a cloud-based, runtime-configurable architecture using web protocols. Its multi-client design allows multiple operators or observers to connect simultaneously, and its plugin system enables researchers to extend functionality. Critically, OpenWoZ allows runtime modification of robot behaviors, enabling wizards to deviate from scripts when unexpected situations arise. While architecturally sophisticated and highly flexible, OpenWoZ requires programming knowledge to create custom behaviors and configure experiments, limiting its accessibility to non-technical researchers.
Early platform-agnostic tools focused on providing robust, flexible interfaces for technically sophisticated users. Polonius \cite{Lu2011}, built on the Robot Operating System (ROS), exemplifies this generation. It provides a graphical interface for defining finite state machine scripts that control robot behaviors, with integrated logging capabilities to streamline post-experiment analysis. The system was explicitly designed to enable robotics engineers to create experiments that their non-technical collaborators could then execute. However, the initial setup and configuration still required substantial programming expertise. Similarly, OpenWoZ \cite{Hoffman2016} introduced a cloud-based, runtime-configurable architecture using web protocols. Its multi-client design allows multiple operators or observers to connect simultaneously, and its plugin system enables researchers to extend functionality. Critically, OpenWoZ allows runtime modification of robot behaviors, enabling wizards to deviate from scripts when unexpected situations arise. While architecturally sophisticated and highly flexible, OpenWoZ requires programming knowledge to create custom behaviors and configure experiments, limiting its accessibility to non-technical researchers.
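To make the finite-state-machine style of scripting concrete, the sketch below models a Polonius-style interaction script in which wizard-triggered events drive state transitions and every event is logged for post-experiment analysis. This is an illustrative assumption, not Polonius's actual API; all state, event, and class names are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class WozStateMachine:
    """Minimal sketch of a scripted WoZ interaction as a finite state machine."""
    state: str
    transitions: dict = field(default_factory=dict)  # (state, event) -> next state
    log: list = field(default_factory=list)          # ordered record of wizard events

    def add(self, state: str, event: str, next_state: str) -> None:
        self.transitions[(state, event)] = next_state

    def fire(self, event: str) -> str:
        """Apply a wizard-triggered event; log it whether or not it is valid."""
        nxt = self.transitions.get((self.state, event))
        self.log.append((self.state, event, nxt))
        if nxt is not None:
            self.state = nxt
        return self.state

# Hypothetical three-state greeting script.
fsm = WozStateMachine(state="greeting")
fsm.add("greeting", "participant_replies", "ask_question")
fsm.add("ask_question", "answer_received", "farewell")

fsm.fire("participant_replies")  # state becomes "ask_question"
fsm.fire("answer_received")      # state becomes "farewell"
```

Logging invalid events (those with no matching transition) rather than silently ignoring them is one way a tool could surface the wizard error rates that later sections identify as under-reported.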
A second wave of tools shifted focus toward usability, often achieving accessibility by coupling tightly with specific hardware platforms. WoZ4U \cite{Rietz2021} was explicitly designed as an ``easy-to-use'' tool for conducting experiments with SoftBank's Pepper robot. It provides an intuitive graphical interface that allows non-programmers to design interaction flows, and it successfully lowers the technical barrier. However, this usability comes at the cost of generalizability. WoZ4U is unusable with other robot platforms, and manufacturer-provided software follows a similar pattern. Choregraphe \cite{Pot2009}, developed by SoftBank Robotics for the NAO and Pepper robots, offers a sophisticated visual programming environment based on connected behavior boxes. Researchers can create complex interaction flows without traditional coding. However, when new robot platforms emerge or when hardware becomes obsolete, tools like Choregraphe and WoZ4U lose their utility. As Pettersson and Wik note in their review of WoZ tools \cite{Pettersson2015}, platform-specific systems often fall out of use as technology evolves, forcing researchers to constantly rebuild their experimental infrastructure.
A second wave of tools shifted focus toward usability, often achieving accessibility by coupling tightly with specific hardware platforms. WoZ4U \cite{Rietz2021} was explicitly designed as an ``easy-to-use'' tool for conducting experiments with Aldebaran's Pepper robot. It provides an intuitive graphical interface that allows non-programmers to design interaction flows, and it successfully lowers the technical barrier. However, this usability comes at the cost of generalizability. WoZ4U is unusable with other robot platforms, and manufacturer-provided software follows a similar pattern. Choregraphe \cite{Pot2009}, developed by Aldebaran Robotics for the NAO and Pepper robots, offers a visual programming environment based on connected behavior boxes. Researchers can create complex interaction flows without traditional coding. However, when new robot platforms emerge or when hardware becomes obsolete, tools like Choregraphe and WoZ4U lose their utility. As Pettersson and Wik note in their review of WoZ tools \cite{Pettersson2015}, platform-specific systems often fall out of use as technology evolves, forcing researchers to constantly rebuild their experimental infrastructure.
This survey reveals a fundamental tension in the design space of WoZ tools. Flexible, general-purpose platforms like Polonius and OpenWoZ offer powerful capabilities but present high technical barriers. Accessible, user-friendly tools like WoZ4U and Choregraphe lower those barriers but sacrifice cross-platform compatibility and longevity. No existing tool successfully balances accessibility, flexibility, and sustainability. Moreover, few platforms directly address the methodological concerns raised by systematic reviews of WoZ research. Riek's influential analysis \cite{Riek2012} of 54 HRI studies uncovered widespread inconsistencies in how wizard behaviors were controlled and reported. Very few studies documented standardized wizard training procedures or measured wizard error rates, raising questions about internal validity. The tools themselves often exacerbate this problem: poorly designed interfaces increase cognitive load on wizards, leading to timing errors and behavioral inconsistencies that can confound experimental results.
Recent years have seen renewed interest in comprehensive WoZ frameworks. Gibert et al. \cite{Gibert2013} developed the SWoOZ platform, a super-Wizard of Oz system integrating facial tracking, gesture recognition, and real-time control capabilities to enable naturalistic human-robot interaction studies. Virtual and augmented reality have also emerged as complementary approaches to WoZ; Helgert et al. \cite{Helgert2024} demonstrated how VR-based WoZ environments can simplify experimental setup while providing researchers with precise control over environmental conditions and high-fidelity data collection.
This expanding landscape reveals a persistent fundamental gap in the design space of WoZ tools. Flexible, general-purpose platforms like Polonius and OpenWoZ offer powerful capabilities but present high technical barriers. Accessible, user-friendly tools like WoZ4U and Choregraphe lower those barriers but sacrifice cross-platform compatibility and longevity. Newer approaches such as VR-based frameworks attempt to bridge this gap, yet no existing tool successfully combines accessibility, flexibility, deployment portability, and built-in methodological rigor. Moreover, few platforms directly address the methodological concerns raised by systematic reviews of WoZ research. Riek's influential analysis \cite{Riek2012} of 54 HRI studies uncovered widespread inconsistencies in how wizard behaviors were controlled and reported. Very few studies documented standardized wizard training procedures or measured wizard error rates, raising questions about internal validity. The tools themselves often exacerbate this problem: poorly designed interfaces increase cognitive load on wizards, leading to timing errors and behavioral inconsistencies that can confound experimental results. Recent work by Strazdas et al. \cite{Strazdas2020} further demonstrates the importance of careful interface design in WoZ systems, showing how intuitive wizard interfaces directly improve both the quality of robot behavior and the reliability of collected data.
\section{Requirements for Modern WoZ Infrastructure}
Based on the analysis of existing platforms and identified methodological gaps, I establish requirements for a modern WoZ research infrastructure. Through our preliminary work \cite{OConnor2024}, we identified six critical capabilities that a comprehensive platform should provide. First, all phases of the experimental workflow--design, execution, and analysis--should be integrated within a single unified environment to minimize context switching and tool fragmentation. Second, creating interaction protocols should require minimal to no programming expertise, enabling domain experts from psychology, education, or other fields to work independently \cite{Bartneck2024}. Third, the system must support fine-grained, responsive real-time control during live experiment sessions across a variety of robotic platforms. Fourth, automated logging of all actions, timings, and sensor data should be built-in, with synchronized timestamps to facilitate analysis. Fifth, the architecture should decouple experimental logic from robot-specific implementations through platform agnosticism, ensuring the platform remains viable as hardware evolves. Finally, collaborative features should allow multiple team members to contribute to experiment design and review execution data, supporting truly interdisciplinary research.
Based on the analysis of existing platforms and identified methodological gaps, I establish requirements for a modern WoZ research infrastructure. Through our preliminary work \cite{OConnor2024}, we identified six critical capabilities that a comprehensive platform should provide. First, all phases of the experimental workflow--design, execution, and analysis--should be integrated within a single unified environment to minimize context switching and tool fragmentation. Second, creating interaction protocols should require minimal to no programming expertise, enabling domain experts from psychology, education, or other fields to work independently \cite{Bartneck2024}. Third, the system must support fine-grained, responsive real-time control during live experiment sessions across a variety of robotic platforms.
Fourth, automated logging of all actions, timings, and sensor data should be built-in, with synchronized timestamps to facilitate analysis. Fifth, the architecture should decouple experimental logic from robot-specific implementations through platform-agnostic development, ensuring the platform remains viable as hardware evolves. Finally, collaborative features should allow multiple team members to contribute to experiment design and review execution data, supporting truly interdisciplinary research.
No existing platform satisfies all six requirements. Most critically, the trade-off between accessibility and flexibility remains unresolved, and few tools embed methodological best practices directly into their design.
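Requirements four and five above can be sketched together: a narrow adapter interface keeps experiment logic independent of any vendor API, while a runner stamps every action with a monotonic timestamp so events from different sources can later be aligned. All class and method names here are hypothetical, a minimal illustration of the decoupling rather than any platform's real design.

```python
import time
from abc import ABC, abstractmethod

class RobotAdapter(ABC):
    """Platform-specific backends implement this narrow interface,
    so experiment logic never touches vendor APIs directly."""
    @abstractmethod
    def execute(self, action: str, params: dict) -> None: ...

class SimulatedRobot(RobotAdapter):
    """Stand-in backend; a real adapter would wrap a vendor SDK or ROS."""
    def __init__(self):
        self.performed = []
    def execute(self, action: str, params: dict) -> None:
        self.performed.append((action, params))

class ExperimentRunner:
    """Runs a protocol against any adapter, logging each step with a
    monotonic timestamp so logs survive wall-clock adjustments."""
    def __init__(self, robot: RobotAdapter):
        self.robot = robot
        self.log = []
    def step(self, action: str, **params) -> None:
        self.log.append((time.monotonic(), action, params))
        self.robot.execute(action, params)

runner = ExperimentRunner(SimulatedRobot())
runner.step("say", text="Hello!")
runner.step("gesture", name="wave")
```

Swapping `SimulatedRobot` for a NAO or TurtleBot adapter would leave the experiment script untouched, which is the point of the fifth requirement.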
@@ -29,8 +33,8 @@ We first introduced the concept for HRIStudio as a Late-Breaking Report at the 2
Following the initial proposal, we published the detailed system architecture and preliminary prototype as a full paper at RO-MAN 2025 \cite{OConnor2025}. That publication validated the technical feasibility of our approach, detailing the communication protocols, data models, and plugin architecture necessary to support real-time robot control using standard web technologies while maintaining platform independence.
While those prior publications established the conceptual framework and technical architecture, this thesis focuses on the realization and empirical validation of the platform. I extend that research in two key ways. First, I move beyond prototypes to deliver a complete, functional software system, resolving complex engineering challenges related to stability, latency, and deployment. Second, and most importantly, I provide the first rigorous user study comparing the proposed framework against industry-standard tools. This empirical evaluation provides evidence to support the claim that thoughtful infrastructure design can improve both accessibility and reproducibility in HRI research.
While those prior publications established the conceptual framework and technical architecture, this thesis focuses on the realization and empirical validation of the platform. I extend that research in two key ways. First, I move beyond prototypes to deliver a complete, functional software system, resolving complex engineering challenges related to stability, latency, and deployment. Second, I provide the first rigorous user study comparing the proposed framework against industry-standard tools. This empirical evaluation provides evidence to support the claim that thoughtful infrastructure design can improve both accessibility and reproducibility in HRI research.
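The publications above describe real-time robot control over standard web technologies; as a rough illustration of the engineering concerns mentioned (latency, deployment), the sketch below shows what a wizard command envelope for a WebSocket-style channel might look like. Every field name and function here is an assumption for exposition, not the framework's actual protocol.

```python
import json
import time
import uuid

def make_command(action: str, params: dict) -> str:
    """Serialize a wizard command into a self-describing JSON envelope."""
    return json.dumps({
        "id": str(uuid.uuid4()),  # correlates commands with acknowledgements
        "ts": time.time(),        # sender-side timestamp for latency analysis
        "action": action,
        "params": params,
    })

def round_trip_latency(sent_ts: float, ack_ts: float) -> float:
    """Latency in seconds, both timestamps taken from the sender's clock."""
    return ack_ts - sent_ts

msg = json.loads(make_command("say", {"text": "Hello"}))
```

Carrying a per-command `id` and sender timestamp is one simple way to measure the control-loop latency that a live WoZ session must keep low.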
\section{Chapter Summary}
This chapter has established the technical and methodological context for this thesis. I introduced HRI and the WoZ technique, surveyed existing platforms and their limitations, and identified a persistent tension between accessibility and flexibility in current tools. I established six requirements for modern WoZ infrastructure and positioned this thesis relative to prior publications on this topic. The analysis reveals a clear gap: no existing platform simultaneously achieves accessibility for non-programmers, flexibility across robot platforms, and embedded support for methodological rigor. The following chapters describe how this approach addresses that gap.
This chapter has established the technical and methodological context for this thesis. Existing WoZ platforms fall into two categories: general-purpose tools like Polonius and OpenWoZ that offer flexibility but high technical barriers, and platform-specific systems like WoZ4U and Choregraphe that prioritize usability at the cost of cross-platform generality. Recent approaches such as VR-based frameworks attempt to bridge this gap, yet no existing tool successfully combines accessibility, flexibility, and embedded methodological rigor. Based on this landscape analysis, I identified six critical requirements for modern WoZ infrastructure: integrated workflows, low technical barriers, real-time control across platforms, automated logging, platform-agnostic design, and collaborative support. These requirements form the foundation for evaluating how the proposed framework advances the state of WoZ research infrastructure. The next chapter examines the broader reproducibility challenges that justify why these requirements are essential.