post-m04-ch02 edits

2026-06-23 18:21:44 -04:00 · 2026-02-19 23:11:07 -05:00
parent 9423fc09b6
commit c417f22209
3 changed files with 43 additions and 13 deletions
@@ -9,7 +9,7 @@ To build the social robots of tomorrow, researchers must study how people respon

 Social robotics focuses on robots designed for social interaction with humans, and it poses unique challenges for autonomy. In a typical social robotics interaction, a robot operates autonomously based on pre-programmed behaviors. Because human interaction is inherently unpredictable, pre-programmed autonomy often fails to respond appropriately to subtle social cues, causing the interaction to degrade.

-To overcome this limitation, researchers use the Wizard-of-Oz (WoZ) technique. Consider a scenario where a researcher wants to test whether a robot tutor can effectively encourage student subjects during a learning task. Rather than building a complete autonomous system with speech recognition, natural language understanding, and emotion detection, the researcher uses a WoZ setup: a human operator (the ``wizard'') sits in a separate room, observing the interaction through cameras and microphones. When the subject appears frustrated, the wizard makes the robot say an encouraging phrase and perform a supportive gesture. To the subject, the robot appears to be acting autonomously, responding naturally to the subject's emotional state. This methodology allows researchers to rapidly prototype and test interaction designs, gathering valuable data about human responses before investing in the development of complex autonomous capabilities.
+To overcome this limitation, researchers use the Wizard-of-Oz (WoZ) technique. The name references L. Frank Baum's story \cite{Baum1900}, in which the ``great and powerful'' Oz is revealed to be an ordinary person operating machinery behind a curtain, creating an illusion of magic. In HRI, the wizard similarly creates an illusion of robot intelligence from behind the scenes. Consider a scenario where a researcher wants to test whether a robot tutor can effectively encourage student subjects during a learning task. Rather than building a complete autonomous system with speech recognition, natural language understanding, and emotion detection, the researcher uses a WoZ setup: a human operator (the ``wizard'') sits in a separate room, observing the interaction through cameras and microphones. When the subject appears frustrated, the wizard makes the robot say an encouraging phrase and perform a supportive gesture. To the subject, the robot appears to be acting autonomously, responding naturally to the subject's emotional state. This methodology allows researchers to rapidly prototype and test interaction designs, gathering valuable data about human responses before investing in the development of complex autonomous capabilities.

 Despite its versatility, WoZ research faces two critical challenges. The first is the accessibility problem: a high technical barrier prevents many non-programmers, such as experts in psychology or sociology, from conducting their own studies without engineering support. The second is the reproducibility problem: the hardware landscape is highly fragmented, and researchers frequently build custom control interfaces for specific robots and experiments. These tools are rarely shared, making it difficult for the scientific community to replicate results or compare findings across labs.

@@ -3,38 +3,52 @@

 This chapter provides the necessary context for understanding the challenges addressed by this thesis. I survey the landscape of existing WoZ platforms, analyze their capabilities and limitations, and establish requirements that a modern infrastructure should satisfy. Finally, I position this thesis relative to prior work on this topic.

-As established in Chapter~\ref{ch:intro}, the Wizard-of-Oz technique enables researchers to prototype and test robot interaction designs before autonomous capabilities are fully developed. To understand how the proposed framework advances this research paradigm, it is essential to review the existing landscape of WoZ platforms, identify their limitations relative to disciplinary needs, and establish requirements for a more comprehensive approach. HRI is fundamentally a multidisciplinary field, bringing together engineers, psychologists, designers, and domain experts from various application areas \cite{Bartneck2024}, yet the fragmentation of tools and technical barriers have historically limited participation from non-technical researchers.
+As established in Chapter~\ref{ch:intro}, the WoZ technique enables researchers to prototype and test robot interaction designs before autonomous capabilities are developed. To understand how the proposed framework advances this research paradigm, I review the existing landscape of WoZ platforms, identify their limitations relative to disciplinary needs, and establish requirements for a more comprehensive approach. HRI is fundamentally a multidisciplinary field that brings together engineers, psychologists, designers, and domain experts from various application areas \cite{Bartneck2024}. Yet tool fragmentation, in which each research group builds custom software for specific robots, and high technical barriers have historically limited participation from non-technical researchers.

 \section{Existing WoZ Platforms and Tools}

-Over the last two decades, multiple frameworks to support and automate the WoZ paradigm have been reported in the literature. These frameworks can be broadly categorized based on their primary design emphases, generality, and the methodological practices they encourage. Foundational work by Steinfeld et al. \cite{Steinfeld2009} articulated the methodological importance of WoZ simulation, distinguishing between the human simulating the robot and the robot simulating the human. This distinction has influenced how subsequent tools approach the design and execution of WoZ experiments.
+Over the last two decades, multiple frameworks to support and automate the WoZ paradigm have been reported in the literature. These frameworks can be broadly categorized based on their primary design emphases, generality, and the methodological practices they encourage. Foundational work by Steinfeld et al. \cite{Steinfeld2009} articulated the methodological importance of WoZ simulation, distinguishing between a human simulating the robot (Wizard of Oz) and the inverse case in which the human is simulated (Oz of Wizard, where a model of human behavior stands in for a real person so that robot technologies can be developed and evaluated). This distinction has influenced how subsequent tools approach the design and execution of WoZ experiments.

-Early platform-agnostic tools focused on providing robust, flexible interfaces for technically sophisticated users. Polonius \cite{Lu2011}, built on the Robot Operating System (ROS), exemplifies this generation. It provides a graphical interface for defining finite state machine scripts that control robot behaviors, with integrated logging capabilities to streamline post-experiment analysis. The system was explicitly designed to enable robotics engineers to create experiments that their non-technical collaborators could then execute. However, the initial setup and configuration still required substantial programming expertise. Similarly, OpenWoZ \cite{Hoffman2016} introduced a cloud-based, runtime-configurable architecture using web protocols. Its multi-client design allows multiple operators or observers to connect simultaneously, and its plugin system enables researchers to extend functionality. Critically, OpenWoZ allows runtime modification of robot behaviors, enabling wizards to deviate from scripts when unexpected situations arise. While architecturally sophisticated and highly flexible, OpenWoZ requires programming knowledge to create custom behaviors and configure experiments, limiting its accessibility to non-technical researchers.
+Early platform-agnostic tools--systems designed to work with multiple robot types rather than a single hardware platform--focused on providing robust, flexible interfaces for technically sophisticated users. Polonius \cite{Lu2011}, built on the Robot Operating System (ROS) \cite{Quigley2009}, exemplifies this generation. It provides a graphical interface for defining finite state machine scripts that control robot behaviors, with integrated logging capabilities to streamline post-experiment analysis. The system was explicitly designed to enable robotics engineers to create experiments that their non-technical collaborators could then execute. However, the initial setup and configuration still required substantial programming expertise. Similarly, OpenWoZ \cite{Hoffman2016} introduced a cloud-based, runtime-configurable architecture using web protocols. Its design allows multiple operators or observers to connect simultaneously, and its plugin system enables researchers to extend functionality, for example by adding new robot behaviors or sensor integrations. Most importantly, OpenWoZ allows runtime modification of robot behaviors, enabling wizards to deviate from scripts when unexpected situations arise. While architecturally sophisticated and highly flexible, OpenWoZ requires programming knowledge to create custom behaviors and configure experiments, which limits its accessibility to non-technical researchers.

-A second wave of tools shifted focus toward usability, often achieving accessibility by coupling tightly with specific hardware platforms. WoZ4U \cite{Rietz2021} was explicitly designed as an ``easy-to-use'' tool for conducting experiments with Aldebaran's Pepper robot. It provides an intuitive graphical interface that allows non-programmers to design interaction flows, and it successfully lowers the technical barrier. However, this usability comes at the cost of generalizability. WoZ4U is unusable with other robot platforms, and manufacturer-provided software follows a similar pattern. Choregraphe \cite{Pot2009}, developed by Aldebaran Robotics for the NAO and Pepper robots, offers a visual programming environment based on connected behavior boxes. Researchers can create complex interaction flows without traditional coding. However, when new robot platforms emerge or when hardware becomes obsolete, tools like Choregraphe and WoZ4U lose their utility. As Pettersson and Wik note in their review of WoZ tools \cite{Pettersson2015}, platform-specific systems often fall out of use as technology evolves, forcing researchers to constantly rebuild their experimental infrastructure.
+A second wave of tools shifted focus toward usability, often achieving accessibility by coupling tightly with specific hardware platforms. WoZ4U \cite{Rietz2021} was explicitly designed as an ``easy-to-use'' tool for conducting experiments with Aldebaran's Pepper robot. It provides an intuitive graphical interface that allows non-programmers to design interaction flows, and it successfully lowers the technical barrier. However, this usability comes at the cost of generalizability. WoZ4U is unusable with other robot platforms, and manufacturer-provided software follows a similar pattern.

-Recent years have seen renewed interest in comprehensive WoZ frameworks. Gibert et al. \cite{Gibert2013} developed the SWoOZ platform, a super-Wizard of Oz system integrating facial tracking, gesture recognition, and real-time control capabilities to enable naturalistic human-robot interaction studies. Virtual and augmented reality have also emerged as complementary approaches to WoZ; Helgert et al. \cite{Helgert2024} demonstrated how VR-based WoZ environments can simplify experimental setup while providing researchers with precise control over environmental conditions and high fidelity data collection.
+Choregraphe \cite{Pot2009}, developed by Aldebaran Robotics for the NAO and Pepper robots, offers a visual programming environment based on connected behavior boxes. Researchers can create complex interaction flows using drag-and-drop blocks without writing code in traditional programming languages. However, when new robot platforms emerge or when hardware becomes obsolete, tools like Choregraphe and WoZ4U lose their utility. Pettersson and Wik, in their review of WoZ tools \cite{Pettersson2015}, note that platform-specific systems often fall out of use as technology evolves, forcing researchers to constantly rebuild their experimental infrastructure.

-This expanding landscape reveals a persistent fundamental lack in the design space of WoZ tools. Flexible, general-purpose platforms like Polonius and OpenWoZ offer powerful capabilities but present high technical barriers. Accessible, user-friendly tools like WoZ4U and Choregraphe lower those barriers but sacrifice cross-platform compatibility and longevity. Newer approaches such as VR-based frameworks attempt to bridge this gap, yet no existing tool successfully combines accessibility, flexibility, deployment portability, and built-in methodological rigor. Moreover, few platforms directly address the methodological concerns raised by systematic reviews of WoZ research. Riek's influential analysis \cite{Riek2012} of 54 HRI studies uncovered widespread inconsistencies in how wizard behaviors were controlled and reported. Very few studies documented standardized wizard training procedures or measured wizard error rates, raising questions about internal validity. The tools themselves often exacerbate this problem: poorly designed interfaces increase cognitive load on wizards, leading to timing errors and behavioral inconsistencies that can confound experimental results. Recent work by Strazdas et al. \cite{Strazdas2020} further demonstrates the importance of careful interface design in WoZ systems, showing how intuitive wizard interfaces directly improve both the quality of robot behavior and the reliability of collected data.
+Recent years have seen renewed interest in comprehensive WoZ frameworks. Gibert et al. \cite{Gibert2013} developed the SWoOZ (Super Wizard of Oz) platform, which integrates facial tracking, gesture recognition, and real-time control capabilities to enable naturalistic human-robot interaction studies. Virtual and augmented reality have also emerged as complementary approaches to WoZ. Helgert et al. \cite{Helgert2024} demonstrated how VR-based WoZ environments can simplify experimental setup while providing researchers with precise control over environmental conditions and high-fidelity data collection.
+
+This expanding landscape reveals a persistent fundamental gap in the design space of WoZ tools. Flexible, general-purpose platforms like Polonius and OpenWoZ offer powerful capabilities but present high technical barriers. Accessible, user-friendly tools like WoZ4U and Choregraphe lower those barriers but sacrifice cross-platform compatibility and longevity. Newer approaches such as VR-based frameworks attempt to bridge this gap, yet no existing tool successfully combines accessibility, flexibility, deployment portability, and built-in methodological rigor--meaning systematic features that guide experimenters toward best practices like standardized protocols, comprehensive logging, and reproducible experimental designs.
+
+Moreover, few platforms directly address the methodological concerns raised by systematic reviews of WoZ research. Riek's influential analysis \cite{Riek2012} of 54 HRI studies uncovered widespread inconsistencies in how wizard behaviors were controlled and reported. Very few studies documented standardized wizard training procedures or measured wizard error rates, raising questions about internal validity. The tools themselves often exacerbate this problem: poorly designed interfaces increase cognitive load on wizards, leading to timing errors and behavioral inconsistencies that can confound experimental results. Recent work by Strazdas et al. \cite{Strazdas2020} further demonstrates the importance of careful interface design in WoZ systems, showing that intuitive wizard interfaces directly improve both the quality of robot behavior and the reliability of collected data.

 \section{Requirements for Modern WoZ Infrastructure}

-Based on the analysis of existing platforms and identified methodological gaps, I establish requirements for a modern WoZ research infrastructure. Through our preliminary work \cite{OConnor2024}, we identified six critical capabilities that a comprehensive platform should provide. First, all phases of the experimental workflow--design, execution, and analysis--should be integrated within a single unified environment to minimize context switching and tool fragmentation. Second, creating interaction protocols should require minimal to no programming expertise, enabling domain experts from psychology, education, or other fields to work independently \cite{Bartneck2024}. Third, the system must support fine-grained, responsive real-time control during live experiment sessions across a variety of robotic platforms.
+Based on the analysis of existing platforms and identified methodological gaps, I derived requirements for a modern WoZ research infrastructure. Through our preliminary work \cite{OConnor2024}, we identified six critical capabilities that a comprehensive platform should provide:

-Fourth, automated logging of all actions, timings, and sensor data should be built-in, with synchronized timestamps to facilitate analysis. Fifth, the architecture should decouple experimental logic from robot-specific implementations through platform agnostic development, ensuring the platform remains viable as hardware evolves. Finally, collaborative features should allow multiple team members to contribute to experiment design and review execution data, supporting truly interdisciplinary research.
+\begin{enumerate}
+\item[R1:] \textbf{Integrated workflow.} All phases of the experimental workflow--design, execution, and analysis--should be integrated within a single unified environment to minimize context switching and tool fragmentation.
+\item[R2:] \textbf{Low technical barrier.} Creating interaction protocols should require minimal to no programming expertise, enabling domain experts from psychology, education, or other fields to work independently \cite{Bartneck2024}.
+\item[R3:] \textbf{Real-time control.} The system must support fine-grained, responsive real-time control during live experiment sessions across a variety of robotic platforms.
+\item[R4:] \textbf{Automated logging.} All actions, timings, and sensor data should be automatically logged with synchronized timestamps to facilitate analysis.
+\item[R5:] \textbf{Platform agnosticism.} The architecture should decouple experimental logic from robot-specific implementations, so that experiments designed for one robot type can be adapted to others and the platform remains viable as hardware evolves.
+\item[R6:] \textbf{Collaborative support.} Multiple team members should be able to contribute to experiment design and review execution data, supporting truly interdisciplinary research.
+\end{enumerate}

-No existing platform satisfies all six requirements. Most critically, the trade-off between accessibility and flexibility remains unresolved, and few tools embed methodological best practices directly into their design.
+To the best of my knowledge, no existing platform satisfies all six requirements. Most critically, the trade-off between accessibility and flexibility remains unresolved, and few tools embed methodological best practices directly into their design, where they could act like training wheels on a bicycle and guide experimenters toward sound methodology by default.

 \section{Prior Work}

 This thesis represents the culmination of a multi-year research effort to develop infrastructure that meets these requirements. The ideas presented here build upon prior work established in two peer-reviewed publications.

-We first introduced the concept for HRIStudio as a Late-Breaking Report at the 2024 IEEE International Conference on Robot and Human Interactive Communication (RO-MAN) \cite{OConnor2024}. In that work, we identified the lack of accessible tooling as a primary barrier to entry in HRI and proposed the high-level vision of a web-based, collaborative platform. We established the core requirements listed above and argued for a web-based approach to achieve them.
+We first introduced the concept for HRIStudio as a Late-Breaking Report at the 2024 IEEE International Conference on Robot and Human Interactive Communication (RO-MAN) \cite{OConnor2024}. In that position paper, we identified the lack of accessible tooling as a primary barrier to entry in HRI and proposed the high-level vision of a web-based, collaborative platform. We established the core requirements listed above and argued for a web-based approach to achieve them.

 Following the initial proposal, we published the detailed system architecture and preliminary prototype as a full paper at RO-MAN 2025 \cite{OConnor2025}. That publication validated the technical feasibility of our approach, detailing the communication protocols, data models, and plugin architecture necessary to support real-time robot control using standard web technologies while maintaining platform independence.

-While those prior publications established the conceptual framework and technical architecture, this thesis focuses on the realization and empirical validation of the platform. I extend that research in two key ways. First, I move beyond prototypes to deliver a complete, functional software system, resolving complex engineering challenges related to stability, latency, and deployment. Second, I provide the first rigorous user study comparing the proposed framework against industry-standard tools. This empirical evaluation provides evidence to support the claim that thoughtful infrastructure design can improve both accessibility and reproducibility in HRI research.
+While those prior publications established the conceptual framework and technical architecture, this thesis focuses on the realization and empirical validation of the platform. I extend that research in two key ways. First, I implement a functional software system that addresses engineering challenges related to stability, latency, and deployment, providing a minimum viable product for evaluation. Second, I provide a rigorous user study comparing the proposed framework against a representative baseline tool. This empirical evaluation provides evidence to support the claim that thoughtful infrastructure design can improve both accessibility and reproducibility in HRI research.
+
+% Chapter notes:
+% I am wondering if you should give a short description of what the Wizard of Oz was. I know that now people are into Wicked, but for many years, I didn't know the story. Of course, it should appear early in this document. ChatGPT Generated: The Wizard’s supposed magic is an illusion sustained by spectacle and belief. His authority depends not on real power, but on people’s willingness to accept appearances without questioning them. When the curtain is pulled back, we learn that power can be constructed--and that fear and reverence often rest on performance rather than substance.

 \section{Chapter Summary}

-This chapter has established the technical and methodological context for this thesis. Existing WoZ platforms fall into two categories: general-purpose tools like Polonius and OpenWoZ that offer flexibility but high technical barriers, and platform-specific systems like WoZ4U and Choregraphe that prioritize usability at the cost of cross-platform generality. Recent approaches such as VR-based frameworks attempt to bridge this gap, yet no existing tool successfully combines accessibility, flexibility, and embedded methodological rigor. Based on this landscape analysis, I identified six critical requirements for modern WoZ infrastructure: integrated workflows, low technical barriers, real-time control across platforms, automated logging, platform agnostic design, and collaborative support. These requirements form the foundation for evaluating how the proposed framework advances the state of WoZ research infrastructure. The next chapter examines the broader reproducibility challenges that justify why these requirements are essential.
+This chapter has established the technical and methodological context for this thesis. Existing WoZ platforms fall into two categories: general-purpose tools like Polonius and OpenWoZ that offer flexibility but high technical barriers, and platform-specific systems like WoZ4U and Choregraphe that prioritize usability at the cost of cross-platform generality. Recent approaches such as VR-based frameworks attempt to bridge this gap, yet to the best of my knowledge, no existing tool successfully combines accessibility, flexibility, and embedded methodological rigor. Based on this landscape analysis, I identified six critical requirements for modern WoZ infrastructure (R1--R6): integrated workflows, low technical barriers, real-time control across platforms, automated logging, platform-agnostic design, and collaborative support. These requirements form the foundation for evaluating how the proposed framework advances the state of WoZ research infrastructure. The next chapter examines the broader reproducibility challenges that justify why these requirements are essential.
@@ -1,3 +1,11 @@
+@book{Baum1900,
+  title={{The Wonderful Wizard of Oz}},
+  author={Baum, L. Frank},
+  year={1900},
+  publisher={George M. Hill Company},
+  address={Chicago, IL}
+}
+
@article{Lu2011,
  title={{Polonius: A Wizard of Oz Interface for HRI Experiments}},
  author={Lu, David V. and Smart, William D.},
@@ -34,6 +42,14 @@
  publisher={IEEE}
 }

+@inproceedings{Quigley2009,
+  title={{ROS: an open-source Robot Operating System}},
+  author={Quigley, Morgan and Conley, Ken and Gerkey, Brian and Faust, Josh and Foote, Tully and Leibs, Jeremy and Wheeler, Rob and Ng, Andrew Y.},
+  booktitle={ICRA Workshop on Open Source Software},
+  year={2009},
+  url={https://api.semanticscholar.org/CorpusID:6324125}
+}
+
@article{Riek2012,
 author = {Riek, Laurel D.},
 title = {{Wizard of Oz studies in HRI: a systematic review and new reporting guidelines}},