Compare commits

...

4 Commits

Author SHA1 Message Date
Sean O'Connor
b75f31271b Update thesis content and improve reproducibility framework
- Refine introduction and background chapters for clarity and coherence.
- Enhance reproducibility chapter by connecting challenges to infrastructure requirements.
- Add new references to support the thesis arguments.
- Update .gitignore to include IDE files.
- Modify hyperref package usage to hide colored boxes in the document.
2026-02-12 10:26:49 -05:00
Sean O'Connor
b29e14c054 fill chapter03 gap, write new chapter 3 2026-02-10 00:56:19 -05:00
Sean O'Connor
a9cfd1a52c clean up background 2026-02-10 00:45:24 -05:00
Sean O'Connor
bc8e137f5b update introduction after milestone 3 2026-02-10 00:08:57 -05:00
14 changed files with 138 additions and 55 deletions

.gitignore vendored

@@ -22,3 +22,6 @@ context
# OS files
.DS_Store
# IDE files
.vscode/


@@ -1,24 +1,32 @@
\chapter{Introduction}
\label{ch:intro}
Human-Robot Interaction (HRI) is an essential field of study for understanding how robots should communicate, collaborate, and coexist with people. As researchers work to develop social robots capable of natural interaction, they face a fundamental challenge: how to prototype and evaluate interaction designs before the underlying autonomous systems are fully developed. This chapter introduces the technical and methodological barriers that currently limit HRI research, describes a generalized approach to address these challenges, and establishes the research objectives and thesis statement for this work.
\section{Motivation}
To build the social robots of tomorrow, researchers must find ways to convincingly simulate them today. The process of designing and optimizing interactions between human and robot is essential to HRI, a discipline dedicated to ensuring these technologies are safe, effective, and accepted by the public \cite{Bartneck2024}. However, current practices for prototyping these interactions are often hindered by complex technical requirements and inconsistent methodologies.
Social robotics, a subfield of HRI focused on robots designed for social interaction with humans, presents unique challenges. In a typical social robotics interaction, a robot operates autonomously based on pre-programmed behaviors. Because human interaction is inherently unpredictable, pre-programmed autonomy often fails to respond appropriately to subtle social cues, causing the interaction to degrade.
To overcome this limitation, researchers employ the Wizard-of-Oz (WoZ) technique. Consider a scenario where a researcher wants to test whether a robot tutor can effectively encourage student subjects during a learning task. Rather than building a complete autonomous system with speech recognition, natural language understanding, and emotion detection, the researcher uses WoZ: a human operator (the ``wizard'') sits in a separate room, observing the interaction through cameras and microphones. When the subject appears frustrated, the wizard triggers the robot to say an encouraging phrase and perform a supportive gesture. To the subject, the robot appears to be acting autonomously, responding naturally to their emotional state. This methodology allows researchers to rapidly prototype and test interaction designs, gathering valuable data about human responses before investing in the development of complex autonomous capabilities.
Despite its versatility, WoZ research faces two critical challenges. First, a high technical barrier prevents many non-programmers, such as experts in psychology or sociology, from conducting their own studies without engineering support. Second, the hardware landscape is highly fragmented. Researchers frequently build bespoke, ``one-off'' control interfaces for specific robots and specific experiments. These ad-hoc tools are rarely shared, making it difficult for the scientific community to replicate studies or verify findings. This has led to a replication crisis in HRI, where a lack of standardized tooling undermines the reliability of the field's body of knowledge.
\section{Proposed Approach}
To address the challenges of accessibility and reproducibility in WoZ-based HRI research, I propose a web-based software framework that integrates three key capabilities. First, the framework must provide an intuitive interface for experiment design that does not require programming expertise, enabling domain experts from psychology, sociology, or other fields to create interaction protocols independently. Second, it must enforce methodological rigor during experiment execution by guiding the wizard through standardized procedures and preventing deviations from the experimental script that could compromise validity. Third, it must be platform-agnostic, separating experimental design from specific robot hardware to ensure the framework remains viable as technology evolves.
This approach represents a shift from the current paradigm of bespoke, robot-specific tools toward a unified platform that can serve as shared infrastructure for the HRI research community. By treating experiment design, execution, and analysis as distinct but integrated phases within a single system, such a framework can systematically address the sources of variability and technical barriers that currently limit research quality and reproducibility.
The implementation of this approach, realized as HRIStudio, demonstrates the feasibility of web-based control for real-time robot interaction studies. While HRIStudio is available as open-source software, it should be understood as a minimum viable product developed to validate the proposed framework. It is provided without ongoing technical support and serves primarily as a proof-of-concept for the architectural and methodological principles presented in this work.
\section{Research Objectives}
This thesis builds upon foundational work presented in two prior peer-reviewed publications. We first introduced the conceptual framework for HRIStudio at the 2024 IEEE International Conference on Robot and Human Interactive Communication (RO-MAN) \cite{OConnor2024}, establishing the vision for a collaborative, web-based platform. Subsequently, we published the detailed system architecture and preliminary prototype at RO-MAN 2025 \cite{OConnor2025}, validating the technical feasibility of web-based robot control. These publications form the foundation upon which this thesis asks its central research question: can a unified, web-based software framework for Wizard-of-Oz experiments measurably improve both the disciplinary accessibility and scientific reproducibility of Human-Robot Interaction research compared to existing platform-specific tools?
To answer this question, this thesis validates the framework through implementation and empirical evaluation. I translate the architectural concepts from the prior work into a complete, functional software platform and subject it to rigorous testing with real users. The successful demonstration of this approach would provide evidence that thoughtful software infrastructure can lower barriers to entry in HRI while simultaneously improving the methodological rigor of the field.
\section{Chapter Summary}
This chapter has established the context and objectives for this thesis. I identified two critical challenges facing WoZ-based HRI research: high technical barriers that limit accessibility to non-programmers, and fragmented tooling that undermines reproducibility. I proposed a web-based framework approach that addresses these challenges through intuitive design interfaces, enforced experimental protocols, and platform-agnostic architecture. Finally, I articulated a central research question and outlined how this thesis validates that approach through implementation and empirical evaluation. The next chapters establish the technical and methodological foundations on which that validation rests.


@@ -1,20 +1,40 @@
\chapter{Background and Related Work}
\label{ch:background}
This chapter provides the necessary context for understanding the challenges addressed by this thesis. I survey the landscape of existing WoZ platforms, analyze their capabilities and limitations, and establish requirements that a modern infrastructure should satisfy. Finally, I position this thesis relative to prior work on this topic.
As established in Chapter~\ref{ch:intro}, the Wizard-of-Oz technique enables researchers to prototype and test robot interaction designs before autonomous capabilities are fully developed. To understand how the proposed framework advances this research paradigm, it is essential to review the existing landscape of WoZ platforms, identify their limitations relative to disciplinary needs, and establish requirements for a more comprehensive approach. HRI is fundamentally a multidisciplinary field, bringing together engineers, psychologists, designers, and domain experts from various application areas \cite{Bartneck2024}, yet the fragmentation of tools and technical barriers have historically limited participation from non-technical researchers.
\section{Existing WoZ Platforms and Tools}
Over the last two decades, multiple frameworks to support and automate the WoZ paradigm have been reported in the literature. These frameworks can be broadly categorized based on their primary design emphases, generality, and the methodological practices they encourage. Foundational work by Steinfeld et al. \cite{Steinfeld2009} articulated the methodological importance of WoZ simulation, distinguishing between the human simulating the robot and the robot simulating the human. This distinction has influenced how subsequent tools approach the design and execution of WoZ experiments.
Early platform-agnostic tools focused on providing robust, flexible interfaces for technically sophisticated users. Polonius \cite{Lu2011}, built on the Robot Operating System (ROS), exemplifies this generation. It provides a graphical interface for defining finite state machine scripts that control robot behaviors, with integrated logging capabilities to streamline post-experiment analysis. The system was explicitly designed to enable robotics engineers to create experiments that their non-technical collaborators could then execute. However, the initial setup and configuration still required substantial programming expertise. Similarly, OpenWoZ \cite{Hoffman2016} introduced a cloud-based, runtime-configurable architecture using web protocols. Its multi-client design allows multiple operators or observers to connect simultaneously, and its plugin system enables researchers to extend functionality. Critically, OpenWoZ allows runtime modification of robot behaviors, enabling wizards to deviate from scripts when unexpected situations arise. While architecturally sophisticated and highly flexible, OpenWoZ requires programming knowledge to create custom behaviors and configure experiments, limiting its accessibility to non-technical researchers.
A second wave of tools shifted focus toward usability, often achieving accessibility by coupling tightly with specific hardware platforms. WoZ4U \cite{Rietz2021} was explicitly designed as an ``easy-to-use'' tool for conducting experiments with Aldebaran's Pepper robot. It provides an intuitive graphical interface that allows non-programmers to design interaction flows, and it successfully lowers the technical barrier. However, this usability comes at the cost of generalizability. WoZ4U is unusable with other robot platforms, and manufacturer-provided software follows a similar pattern. Choregraphe \cite{Pot2009}, developed by Aldebaran Robotics for the NAO and Pepper robots, offers a visual programming environment based on connected behavior boxes. Researchers can create complex interaction flows without traditional coding. However, when new robot platforms emerge or when hardware becomes obsolete, tools like Choregraphe and WoZ4U lose their utility. As Pettersson and Wik note in their review of WoZ tools \cite{Pettersson2015}, platform-specific systems often fall out of use as technology evolves, forcing researchers to constantly rebuild their experimental infrastructure.
Recent years have seen renewed interest in comprehensive WoZ frameworks. Gibert et al. \cite{Gibert2013} developed the SWoOZ platform, a super-Wizard-of-Oz system integrating facial tracking, gesture recognition, and real-time control capabilities to enable naturalistic human-robot interaction studies. Virtual and augmented reality have also emerged as complementary approaches to WoZ; Helgert et al. \cite{Helgert2024} demonstrated how VR-based WoZ environments can simplify experimental setup while providing researchers with precise control over environmental conditions and high-fidelity data collection.
This expanding landscape reveals a persistent gap in the design space of WoZ tools. Flexible, general-purpose platforms like Polonius and OpenWoZ offer powerful capabilities but present high technical barriers. Accessible, user-friendly tools like WoZ4U and Choregraphe lower those barriers but sacrifice cross-platform compatibility and longevity. Newer approaches such as VR-based frameworks attempt to bridge this gap, yet no existing tool successfully combines accessibility, flexibility, deployment portability, and built-in methodological rigor.

Moreover, few platforms directly address the methodological concerns raised by systematic reviews of WoZ research. Riek's influential analysis \cite{Riek2012} of 54 HRI studies uncovered widespread inconsistencies in how wizard behaviors were controlled and reported. Very few studies documented standardized wizard training procedures or measured wizard error rates, raising questions about internal validity. The tools themselves often exacerbate this problem: poorly designed interfaces increase cognitive load on wizards, leading to timing errors and behavioral inconsistencies that can confound experimental results. Recent work by Strazdas et al. \cite{Strazdas2020} further demonstrates the importance of careful interface design in WoZ systems, showing how intuitive wizard interfaces directly improve both the quality of robot behavior and the reliability of collected data.
\section{Requirements for Modern WoZ Infrastructure}
Based on the analysis of existing platforms and identified methodological gaps, I establish requirements for a modern WoZ research infrastructure. Through our preliminary work \cite{OConnor2024}, we identified six critical capabilities that a comprehensive platform should provide. First, all phases of the experimental workflow--design, execution, and analysis--should be integrated within a single unified environment to minimize context switching and tool fragmentation. Second, creating interaction protocols should require minimal to no programming expertise, enabling domain experts from psychology, education, or other fields to work independently \cite{Bartneck2024}. Third, the system must support fine-grained, responsive real-time control during live experiment sessions across a variety of robotic platforms.
Fourth, automated logging of all actions, timings, and sensor data should be built-in, with synchronized timestamps to facilitate analysis. Fifth, the architecture should decouple experimental logic from robot-specific implementations through platform-agnostic design, ensuring the platform remains viable as hardware evolves. Finally, collaborative features should allow multiple team members to contribute to experiment design and review execution data, supporting truly interdisciplinary research.
No existing platform satisfies all six requirements. Most critically, the trade-off between accessibility and flexibility remains unresolved, and few tools embed methodological best practices directly into their design.
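The decoupling called for by the fifth requirement can be made concrete with a minimal sketch. The interface below is purely illustrative (these class and method names are assumptions, not any platform's actual API): experimental logic targets an abstract robot, and per-platform plugins supply the hardware bindings.

```python
from abc import ABC, abstractmethod

# Illustrative sketch of platform-agnostic design (hypothetical names):
# experiment logic depends only on an abstract robot interface, while
# per-robot plugins implement it for concrete hardware.

class RobotAdapter(ABC):
    @abstractmethod
    def say(self, text: str) -> None: ...

    @abstractmethod
    def gesture(self, name: str) -> None: ...

class RecordingRobot(RobotAdapter):
    """Stand-in adapter for testing; a real plugin would wrap ROS or a vendor SDK."""
    def __init__(self):
        self.events = []

    def say(self, text):
        self.events.append(("say", text))

    def gesture(self, name):
        self.events.append(("gesture", name))

def run_step(robot, step):
    # The experiment definition never names a concrete robot platform.
    getattr(robot, step["action"])(step["arg"])

robot = RecordingRobot()
run_step(robot, {"action": "say", "arg": "Welcome to the study"})
run_step(robot, {"action": "gesture", "arg": "wave"})
```

Because the interaction script only ever calls the abstract interface, retiring one robot and adopting another means writing a new adapter, not redesigning the experiment.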
\section{Prior Work}
This thesis represents the culmination of a multi-year research effort to develop infrastructure that meets these requirements. The ideas presented here build upon prior work established in two peer-reviewed publications.
We first introduced the concept for HRIStudio as a Late-Breaking Report at the 2024 IEEE International Conference on Robot and Human Interactive Communication (RO-MAN) \cite{OConnor2024}. In that work, we identified the lack of accessible tooling as a primary barrier to entry in HRI and proposed the high-level vision of a web-based, collaborative platform. We established the core requirements listed above and argued for a web-based approach to achieve them.
Following the initial proposal, we published the detailed system architecture and preliminary prototype as a full paper at RO-MAN 2025 \cite{OConnor2025}. That publication validated the technical feasibility of our approach, detailing the communication protocols, data models, and plugin architecture necessary to support real-time robot control using standard web technologies while maintaining platform independence.
While those prior publications established the conceptual framework and technical architecture, this thesis focuses on the realization and empirical validation of the platform. I extend that research in two key ways. First, I move beyond prototypes to deliver a complete, functional software system, resolving complex engineering challenges related to stability, latency, and deployment. Second, I provide the first rigorous user study comparing the proposed framework against industry-standard tools. This empirical evaluation provides evidence to support the claim that thoughtful infrastructure design can improve both accessibility and reproducibility in HRI research.
\section{Chapter Summary}
This chapter has established the technical and methodological context for this thesis. Existing WoZ platforms fall into two categories: general-purpose tools like Polonius and OpenWoZ that offer flexibility but high technical barriers, and platform-specific systems like WoZ4U and Choregraphe that prioritize usability at the cost of cross-platform generality. Recent approaches such as VR-based frameworks attempt to bridge this gap, yet no existing tool successfully combines accessibility, flexibility, and embedded methodological rigor. Based on this landscape analysis, I identified six critical requirements for modern WoZ infrastructure: integrated workflows, low technical barriers, real-time control across platforms, automated logging, platform-agnostic design, and collaborative support. These requirements form the foundation for evaluating how the proposed framework advances the state of WoZ research infrastructure. The next chapter examines the broader reproducibility challenges that justify why these requirements are essential.


@@ -1,18 +0,0 @@
\chapter{Related Work and State of the Art}
\label{ch:related_work}
\section{Existing Frameworks}
The HRI community has a long history of developing custom tools to support WoZ studies. Early efforts focused on providing robust interfaces for technical users. For example, Polonius \cite{Lu2011} was designed to give robotics engineers a flexible way to create experiments for their collaborators, emphasizing integrated logging to streamline analysis. Similarly, OpenWoZ \cite{Hoffman2016} introduced a cloud-based, runtime-configurable architecture that allowed researchers to modify robot behaviors on the fly. These tools represented significant advancements in experimental infrastructure, moving the field away from purely hard-coded scripts. However, they largely targeted users with significant technical expertise, requiring knowledge of specific programming languages or network protocols to configure and extend.
\section{General vs. Domain-Specific Tools}
A recurring tension in the design of HRI tools is the trade-off between specialization and generalizability. Some tools prioritize usability by coupling tightly with specific hardware. WoZ4U \cite{Rietz2021}, for instance, provides an intuitive graphical interface specifically for the Pepper robot, making it accessible to non-technical researchers but unusable for other platforms. Manufacturer-provided software like Choregraphe \cite{Pot2009} for the NAO robot follows a similar pattern: it offers a powerful visual programming environment but locks the user into a single vendor's ecosystem. Conversely, generic tools like Ozlab seek to support a wide range of devices but often struggle to maintain relevance as hardware evolves \cite{Pettersson2015}. This fragmentation forces labs to constantly switch tools or reinvent infrastructure, hindering the accumulation of shared methodological knowledge.
\section{Methodological Critiques}
Beyond software architecture, the methodological rigor of WoZ studies has been a subject of critical review. In a seminal systematic review, Riek \cite{Riek2012} analyzed 54 HRI studies and uncovered a widespread lack of consistency in how wizard behaviors were controlled and reported. The review noted that very few researchers reported standardized wizard training or measured wizard error rates, raising concerns about the internal validity of many experiments. This lack of rigor is often exacerbated by the tools themselves; when interfaces are ad-hoc or poorly designed, they increase the cognitive load on the wizard, leading to inconsistent timing and behavior that can confound study results.
\section{Research Gaps}
Despite the rich landscape of existing tools, a critical gap remains for a platform that is simultaneously accessible, reproducible, and sustainable. Existing accessible tools are often too platform-specific to be widely adopted, while flexible, general-purpose frameworks often present a prohibitively high technical barrier. Furthermore, few tools directly address the methodological crisis identified by Riek by enforcing standardized protocols or actively guiding the wizard during execution. HRIStudio aims to fill this void by providing a web-based, robot-agnostic platform that not only lowers the barrier to entry for interdisciplinary researchers but also embeds methodological best practices directly into the experimental workflow.


@@ -0,0 +1,32 @@
\chapter{Reproducibility Challenges in WoZ-based HRI Research}
\label{ch:reproducibility}
Having established the landscape of existing WoZ platforms and their limitations, I now examine the factors that make WoZ experiments difficult to reproduce and how software infrastructure can address them. This chapter analyzes the sources of variability in WoZ studies, examines how current practices in infrastructure and reporting contribute to reproducibility problems, and derives specific platform requirements that can mitigate these issues. Understanding these challenges is essential for designing a system that supports experimentation at scale while remaining scientifically rigorous.
\section{Sources of Variability}
Reproducibility in experimental research requires that independent investigators can obtain consistent results when following the same procedures. In WoZ-based HRI studies, however, multiple sources of variability can compromise this goal. The wizard is simultaneously the strength and weakness of the WoZ paradigm. While human control enables sophisticated, adaptive interactions, it also introduces inconsistency. Consider a wizard conducting multiple trials of the same experiment with different participants. Even with a detailed script, the wizard may vary in timing, with delays between a participant's action and the robot's response fluctuating based on the wizard's attention, fatigue, or interpretation of when to act. When a script allows for choices, different wizards may make different selections, or the same wizard may choose differently across trials. Furthermore, a wizard may accidentally skip steps, trigger actions in the wrong order, or misinterpret experimental protocols.
Riek's systematic review \cite{Riek2012} found that very few published studies reported measuring wizard error rates or providing standardized wizard training. Without such measures, it becomes impossible to determine whether experimental results reflect the intended interaction design or inadvertent variations in wizard behavior.
Beyond wizard behavior, the ``one-off'' nature of many WoZ control systems introduces technical variability. When each research group builds custom software for each study, several problems arise. Custom interfaces may have undocumented capabilities, hidden features, default behaviors, or timing characteristics that are never formally described. Software tightly coupled to specific robot models or operating system versions may become unusable when hardware is upgraded or replaced. Each system logs data differently, with different file formats, different levels of granularity, and different choices about what to record. This fragmentation means that replicating a study often requires not just following an experimental protocol but also reverse-engineering or rebuilding the original software infrastructure.
Even when researchers intend for their work to be reproducible, practical constraints on publication length lead to incomplete documentation. Exact timing parameters are often omitted. Decision rules for wizard actions remain unspecified. Details of the wizard interface go unreported. Specifications of data collection, including which sensor streams were recorded and at what sampling rate, are frequently missing. Without this information, other researchers cannot faithfully recreate the experimental conditions, limiting both direct replication and conceptual extensions of prior work.
\section{Infrastructure Requirements for Enhanced Reproducibility}
Based on this analysis, I identify specific ways that software infrastructure can mitigate reproducibility challenges. Rather than merely providing tools for wizard control, an ideal WoZ platform should actively guide wizards through scripted procedures. This means presenting actions in a prescribed sequence to prevent out-of-order execution, highlighting the current step in the protocol, recording any deviations from the script as explicit events in the data log, and supporting repeatable decision logic through clearly defined conditional branches. By constraining wizard behavior within the bounds of the experimental design, the system reduces unintended variability across trials and participants.
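The guided-script idea can be made concrete with a short sketch. This is an illustration only, not the platform's actual implementation; the `Step` and `GuidedScript` names, the branch representation, and the log format are invented for the example. The key properties are that steps advance in a prescribed order, conditional branches are explicit data rather than ad hoc wizard judgment, and any out-of-order trigger is recorded as an explicit deviation event instead of silently corrupting the trial.

```python
from dataclasses import dataclass, field


@dataclass
class Step:
    """One scripted wizard action; branches map a choice label to a next step."""
    name: str
    branches: dict = field(default_factory=dict)  # e.g. {"yes": "end"}


class GuidedScript:
    """Presents steps in a prescribed sequence and logs deviations explicitly."""

    def __init__(self, steps):
        self.steps = {s.name: s for s in steps}
        self.order = [s.name for s in steps]
        self.current = self.order[0]
        self.log = []  # list of (event, step, detail) tuples

    def trigger(self, step_name, choice=None):
        # Out-of-order execution is allowed but recorded, never hidden.
        if step_name != self.current:
            self.log.append(("deviation", step_name, self.current))
        self.log.append(("executed", step_name, choice))
        step = self.steps[step_name]
        if choice is not None and choice in step.branches:
            # Repeatable decision logic: branches are part of the script.
            self.current = step.branches[choice]
        else:
            idx = self.order.index(step_name)
            self.current = self.order[min(idx + 1, len(self.order) - 1)]
```

Because deviations become data in the session log, analysts can later filter or flag trials where the wizard departed from the protocol, rather than discovering the problem (or never discovering it) after publication.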
Manual data collection is error-prone and often incomplete. The platform should therefore record, automatically and by default:
\begin{itemize}
  \item every action triggered by the wizard, with precise timestamps;
  \item all robot sensor data and state changes;
  \item timing information indicating when each action was requested, when it began executing, and when it completed; and
  \item the full experimental protocol, embedded in the log file so that the script used for any session can be recovered later.
\end{itemize}
Recording everything by default ensures that critical information is never accidentally omitted.
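A minimal sketch of such a logger follows. The `SessionLogger` class, the three timestamp fields, and the JSON Lines output format are hypothetical choices made for this example; the point is that the protocol travels inside the log and each action carries requested/started/completed times.

```python
import json
import time


class SessionLogger:
    """Append-only session log. The full protocol is embedded as the first
    record, so the exact script used can be recovered from the log alone."""

    def __init__(self, protocol):
        self.records = [{"type": "protocol", "body": protocol}]

    def log_action(self, action):
        # Timestamp the request immediately; execution times are added later.
        rec = {"type": "action", "name": action, "t_requested": time.time()}
        self.records.append(rec)
        return rec

    def mark(self, rec, phase):
        # phase is "t_started" or "t_completed"
        rec[phase] = time.time()

    def dump(self):
        # One JSON object per line (JSON Lines), easy to stream and grep.
        return "\n".join(json.dumps(r) for r in self.records)
```

Separating the request timestamp from the start and completion timestamps makes wizard latency and robot actuation delay independently measurable, which is exactly the timing information that published studies tend to omit.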
The experimental design itself should serve as documentation. When interaction protocols are defined using structured formats such as visual flowcharts or declarative scripts rather than imperative code, they become simultaneously executable and human-readable. Researchers can then share complete, unambiguous descriptions of their experimental procedures alongside their results.
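To illustrate the dual-use property, consider a hypothetical declarative protocol expressed as plain data. The structure, field names, and the `describe` helper are invented for this sketch; the same object that a runtime would execute can be rendered directly into human-readable documentation, so the published description and the executed script cannot drift apart.

```python
# A declarative protocol: data, not imperative code.
PROTOCOL = {
    "name": "greeting-study",
    "steps": [
        {"action": "say", "args": {"text": "Hello!"}, "timeout_s": 5.0},
        {"action": "gesture", "args": {"name": "wave"}, "timeout_s": 3.0},
    ],
}


def describe(protocol):
    """Render the executable protocol as human-readable documentation."""
    lines = [f"Protocol: {protocol['name']}"]
    for i, step in enumerate(protocol["steps"], 1):
        args = ", ".join(f"{k}={v!r}" for k, v in step["args"].items())
        lines.append(f"  {i}. {step['action']}({args}) [timeout {step['timeout_s']}s]")
    return "\n".join(lines)
```

A researcher could paste the output of `describe` into a paper's appendix, or simply share the protocol file itself, and readers would have the complete, unambiguous procedure.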
To maximize the lifespan and transferability of experimental designs, the platform must separate the high-level logic of an interaction from the low-level details of how specific robots execute those behaviors. This abstraction allows experiments designed for one robot to be adapted to another, extending the reproducibility of interaction designs even when the original hardware becomes obsolete.
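One common way to realize this separation is an adapter interface between experiment logic and robot backends; the sketch below assumes this pattern, and the `RobotAdapter` and `LoggingSimAdapter` names are illustrative rather than the platform's actual API. Experiment code calls only the abstract interface, so swapping robots means writing a new adapter, not rewriting the study.

```python
from abc import ABC, abstractmethod


class RobotAdapter(ABC):
    """Robot-specific backend; experiment logic sees only this interface."""

    @abstractmethod
    def execute(self, action, **params):
        ...


class LoggingSimAdapter(RobotAdapter):
    """Stand-in backend that records calls instead of driving hardware,
    useful for dry runs and for testing protocols without a robot."""

    def __init__(self):
        self.calls = []

    def execute(self, action, **params):
        self.calls.append((action, params))


def run_step(adapter, step):
    # The high-level step says *what* to do; the adapter decides *how*.
    adapter.execute(step["action"], **step.get("args", {}))
```

Because the simulated adapter satisfies the same interface as a hardware adapter would, a protocol validated against it remains runnable even after the original robot is retired.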
\section{Connecting Reproducibility Challenges to Infrastructure Requirements}
The reproducibility challenges identified above map directly onto the infrastructure requirements established in Chapter~\ref{ch:background}. Inconsistent wizard behavior motivates the requirements for enforced experimental protocols and comprehensive automatic logging. The absence of standardized logging formats and sensor specifications speaks to both the automated-logging and self-documenting-design requirements. Technical fragmentation runs counter to the platform-agnostic requirement, as bespoke systems become obsolete when hardware evolves. Incomplete documentation reflects a failure to treat experiment designs as executable, self-documenting specifications. No existing platform satisfies all six requirements simultaneously: most critically, the trade-off between accessibility and flexibility remains unresolved, and few tools embed methodological best practices directly into their design. As Chapter~\ref{ch:background} demonstrated, this gap has persisted across a decade of platform development. Closing it requires a fundamental rethinking of how WoZ infrastructure is designed, treating reproducibility and methodological rigor as first-class design goals rather than afterthoughts.
\section{Chapter Summary}
This chapter has analyzed the reproducibility challenges inherent in WoZ-based HRI research, identifying three primary sources of variability: inconsistent wizard behavior, fragmented technical infrastructure, and incomplete documentation. Rather than treating these challenges as inherent to the WoZ paradigm, I showed how each stems from gaps in current infrastructure. Software design can systematically mitigate these challenges through enforced experimental protocols, comprehensive automatic logging, self-documenting experiment designs, and platform-independent abstractions. These design goals directly address the six infrastructure requirements identified in Chapter~\ref{ch:background}. The following chapters describe the design, implementation, and empirical evaluation of a system that prioritizes reproducibility as a foundational design principle from inception.

View File

@@ -1,11 +0,0 @@
\chapter{Reproducibility Challenges in WoZ-based HRI Research}
\label{ch:reproducibility}
\section{Sources of Variability}
% TODO
\section{Infrastructure and Reporting}
% TODO
\section{Platform Requirements}
% TODO

View File

@@ -129,4 +129,53 @@ series = {OzCHI '15}
keywords={Humanoid robots;Robot programming;Mobile robots;Human robot interaction;Programming environments;Prototypes;Microcomputers;Software tools;Software prototyping;Man machine systems},
doi={10.1109/ROMAN.2009.5326209}}
@book{Bartneck2024,
title={Human-Robot Interaction -- An Introduction},
author={Bartneck, Christoph and Belpaeme, Tony and Eyssel, Friederike and Kanda, Takayuki and Keijsers, Merel and Sabanovic, Selma},
year={2024},
edition={2nd},
publisher={Cambridge University Press},
address={Cambridge}
}
@inproceedings{Steinfeld2009,
author = {Steinfeld, Aaron and Jenkins, Odest Chadwicke and Scassellati, Brian},
title = {{The oz of wizard: simulating the human for interaction research}},
year = {2009},
isbn = {9781605582934},
publisher = {Association for Computing Machinery},
booktitle = {Proceedings of the 4th ACM/IEEE International Conference on Human Robot Interaction},
pages = {101--108},
doi = {10.1145/1514095.1514115}
}
@inproceedings{Gibert2013,
author = {Gibert, Guillaume and Petit, Morgan and Lance, Frederic and Pointeau, Gregoire and Dominey, Peter F.},
title = {{What makes human so different? Analysis of human-humanoid robot interaction with a super wizard of oz platform}},
year = {2013},
booktitle = {Proceedings of the 2013 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)},
pages = {931--938},
doi = {10.1109/IROS.2013.6696465}
}
@article{Strazdas2020,
author = {Strazdas, Daniel and Hintz, Jonathan and Felßberg, Anna Maria and Al-Hamadi, Ayoub},
title = {{Robots and wizards: An investigation into natural human--robot interaction}},
journal = {IEEE Access},
volume = {8},
pages = {218808--218821},
year = {2020},
doi = {10.1109/ACCESS.2020.3042287}
}
@inproceedings{Helgert2024,
author = {Helgert, Anna and Straßmann, Christopher and Eimler, Sabine C.},
title = {{Unlocking potentials of virtual reality as a research tool in human-robot interaction: A wizard-of-oz approach}},
year = {2024},
booktitle = {Proceedings of the 2024 ACM/IEEE International Conference on Human-Robot Interaction},
pages = {123--132},
doi = {10.1145/3610978.3640741}
}

View File

@@ -4,6 +4,7 @@
%\usepackage{graphics} %Select graphics package
\usepackage{graphicx} %
%\usepackage{amsthm} %Add other packages as necessary
\usepackage[hidelinks]{hyperref} %Enable hyperlinks and \autoref, hide colored boxes
\begin{document}
\butitle{A Web-Based Wizard-of-Oz Platform for Collaborative and Reproducible Human-Robot Interaction Research}
\author{Sean O'Connor}
@@ -32,14 +33,13 @@
\include{chapters/01_introduction}
\include{chapters/02_background}
\include{chapters/03_related_work}
\include{chapters/04_reproducibility}
\include{chapters/05_system_design}
\include{chapters/06_implementation}
\include{chapters/07_evaluation}
\include{chapters/08_results}
\include{chapters/09_discussion}
\include{chapters/10_conclusion}
\include{chapters/03_reproducibility}
\include{chapters/04_system_design}
\include{chapters/05_implementation}
\include{chapters/06_evaluation}
\include{chapters/07_results}
\include{chapters/08_discussion}
\include{chapters/09_conclusion}
\backmatter
%\bibliographystyle{thesis_num} %This uses BU thesis file thesis_num.bst