Enhance clarity and structure in introduction, background, reproducibility, system design, and implementation chapters; add new references and include TikZ for diagrams

Refine background and system design chapters; enhance clarity and structure in experiment protocols and trial execution descriptions
Refine introduction and background chapters; enhance clarity and structure in system design section
2026-03-24 04:07:52 -04:00 · 2026-02-23 22:24:41 -05:00 · 2026-02-23 13:32:09 -05:00 · 2026-02-22 22:13:48 -05:00 · 2026-02-19 23:50:30 -05:00 · 2026-02-19 23:12:33 -05:00
7 changed files with 383 additions and 57 deletions
--- a/thesis/chapters/01_introduction.tex
+++ b/thesis/chapters/01_introduction.tex
@@ -5,28 +5,28 @@ Human-Robot Interaction (HRI) is an essential field of study for understanding h

 \section{Motivation}

-To build the social robots of tomorrow, researchers must find ways to convincingly simulate them today. The process of designing and optimizing interactions between human and robot is essential to HRI, a discipline dedicated to ensuring these technologies are safe, effective, and accepted by the public \cite{Bartneck2024}. However, current practices for prototyping these interactions are often hindered by complex technical requirements and inconsistent methodologies.
+To build the social robots of tomorrow, researchers must study how people respond to robot behavior today. Doing so requires interactions that feel real even when autonomy is incomplete. The process of designing and optimizing interactions between humans and robots is essential to HRI, a discipline dedicated to ensuring these technologies are safe, effective, and accepted by the public \cite{Bartneck2024}. However, current practices for prototyping these interactions are often hindered by complex technical requirements and inconsistent methodologies.

-Social robotics, a subfield of HRI focused on robots designed for social interaction with humans, presents unique challenges. In a typical social robotics interaction, a robot operates autonomously based on pre-programmed behaviors. Because human interaction is inherently unpredictable, pre-programmed autonomy often fails to respond appropriately to subtle social cues, causing the interaction to degrade.
+Social robotics focuses on robots designed for social interaction with humans, and it poses unique challenges for autonomy. In a typical social robotics interaction, a robot operates autonomously based on pre-programmed behaviors. Because human interaction is inherently unpredictable, pre-programmed autonomy often fails to respond appropriately to subtle social cues, causing the interaction to degrade.

-To overcome this limitation, researchers employ the Wizard-of-Oz (WoZ) technique. Consider a scenario where a researcher wants to test whether a robot tutor can effectively encourage student subjects during a learning task. Rather than building a complete autonomous system with speech recognition, natural language understanding, and emotion detection, the researcher uses WoZ: a human operator (the ``wizard'') sits in a separate room, observing the interaction through cameras and microphones. When the subject appears frustrated, the wizard triggers the robot to say an encouraging phrase and perform a supportive gesture. To the subject, the robot appears to be acting autonomously, responding naturally to their emotional state. This methodology allows researchers to rapidly prototype and test interaction designs, gathering valuable data about human responses before investing in the development of complex autonomous capabilities.
+To overcome this limitation, researchers use the Wizard-of-Oz (WoZ) technique. The name references L.~Frank Baum's novel \emph{The Wonderful Wizard of Oz} \cite{Baum1900}, in which Oz, ``the Great and Terrible,'' is revealed to be an ordinary person operating machinery from behind a screen, creating an illusion of magic. In HRI, the wizard similarly creates an illusion of robot intelligence from behind the scenes. Consider a scenario where a researcher wants to test whether a robot tutor can effectively encourage student subjects during a learning task. Rather than building a complete autonomous system with speech recognition, natural language understanding, and emotion detection, the researcher uses a WoZ setup: a human operator (the ``wizard'') sits in a separate room, observing the interaction through cameras and microphones. When the subject appears frustrated, the wizard makes the robot say an encouraging phrase and perform a supportive gesture. To the subject, the robot appears to be acting autonomously, responding naturally to the subject's emotional state. This methodology allows researchers to rapidly prototype and test interaction designs, gathering valuable data about human responses before investing in the development of complex autonomous capabilities.

-Despite its versatility, WoZ research faces two critical challenges. First, a high technical barrier prevents many non-programmers, such as experts in psychology or sociology, from conducting their own studies without engineering support. Second, the hardware landscape is highly fragmented. Researchers frequently build bespoke, ``one-off'' control interfaces for specific robots and specific experiments. These ad-hoc tools are rarely shared, making it difficult for the scientific community to replicate studies or verify findings. This has led to a replication crisis in HRI, where a lack of standardized tooling undermines the reliability of the field's body of knowledge.
+Despite its versatility, WoZ research faces two critical challenges. The first is the \emph{accessibility problem}: a high technical barrier prevents many non-programmers, such as experts in psychology or sociology, from conducting their own studies without engineering support. The second is the \emph{reproducibility problem}: the hardware landscape is highly fragmented, and researchers frequently build custom control interfaces for specific robots and experiments. These tools are rarely shared, making it difficult for the scientific community to replicate results or compare findings across labs.

 \section{Proposed Approach}

-To address the challenges of accessibility and reproducibility in WoZ-based HRI research, I propose a web-based software framework that integrates three key capabilities. First, the framework must provide an intuitive interface for experiment design that does not require programming expertise, enabling domain experts from psychology, sociology, or other fields to create interaction protocols independently. Second, it must enforce methodological rigor during experiment execution by guiding the wizard through standardized procedures and preventing deviations from the experimental script that could compromise validity. Third, it must be platform-agnostic, separating experimental design from specific robot hardware to ensure the framework remains viable as technology evolves.
+To address the accessibility and reproducibility problems in WoZ-based HRI research, I propose a web-based software framework that integrates three key capabilities. First, the framework must provide an intuitive interface for experiment design that does not require programming expertise, enabling domain experts from psychology, sociology, or other fields to create interaction protocols independently. Second, it must enforce methodological rigor during experiment execution by guiding the wizard through standardized procedures and preventing deviations from the experimental script that could compromise validity. Third, it must be platform-agnostic, meaning the same experiment design can be reused across different robot hardware as technology evolves.

-This approach represents a shift from the current paradigm of bespoke, robot-specific tools toward a unified platform that can serve as shared infrastructure for the HRI research community. By treating experiment design, execution, and analysis as distinct but integrated phases within a single system, such a framework can systematically address the sources of variability and technical barriers that currently limit research quality and reproducibility.
+This approach represents a shift from the current paradigm of custom, robot-specific tools toward a unified platform that can serve as shared infrastructure for the HRI research community. By treating experiment design, execution, and analysis as distinct but integrated phases of a study, such a framework can systematically address both technical barriers and sources of variability that currently limit research quality and reproducibility.

-The implementation of this approach, realized as HRIStudio, demonstrates the feasibility of web-based control for real-time robot interaction studies. While HRIStudio is available as open-source software, it should be understood as a minimum viable product developed to validate the proposed framework. It is provided without ongoing technical support and serves primarily as a proof-of-concept for the architectural and methodological principles presented in this work.
+The implementation of this approach, realized as HRIStudio, demonstrates the feasibility of web-based control for real-time robot interaction studies. HRIStudio is an open-source proof-of-concept implementation that validates the proposed framework and serves as the reference system evaluated in this thesis.

 \section{Research Objectives}

-This thesis builds upon foundational work presented in two prior peer-reviewed publications. We first introduced the conceptual framework for HRIStudio at the 2024 IEEE International Conference on Robot and Human Interactive Communication (RO-MAN) \cite{OConnor2024}, establishing the vision for a collaborative, web-based platform. Subsequently, we published the detailed system architecture and preliminary prototype at RO-MAN 2025 \cite{OConnor2025}, validating the technical feasibility of web-based robot control. These publications form the foundation upon which this thesis asks its central research question: can a unified, web-based software framework for Wizard-of-Oz experiments measurably improve both the disciplinary accessibility and scientific reproducibility of Human-Robot Interaction research compared to existing platform-specific tools?
+This thesis builds upon foundational work presented in two prior peer-reviewed publications. Prof.~Perrone and I first introduced the conceptual framework for HRIStudio at the 2024 IEEE International Conference on Robot and Human Interactive Communication (RO-MAN) \cite{OConnor2024}, establishing the vision for a collaborative, web-based platform. Subsequently, we published the detailed system architecture and a first prototype at RO-MAN 2025 \cite{OConnor2025}, validating the technical feasibility of web-based robot control. These publications form the foundation upon which this thesis asks its central research question: can a unified, web-based software framework for Wizard-of-Oz experiments measurably improve both disciplinary accessibility and scientific reproducibility of Human-Robot Interaction research compared to existing platform-specific tools?

-To answer this question, this thesis validates the framework through implementation and empirical evaluation. I translate the architectural concepts from the prior work into a complete, functional software platform and subject it to rigorous testing with real users. The successful demonstration of this approach would provide evidence that thoughtful software infrastructure can lower barriers to entry in HRI while simultaneously improving the methodological rigor of the field.
+To answer this question, I implement the architectural concepts from the prior work in a complete, functional software platform and validate the framework through a user study with real users. The study compares setup effort, protocol adherence, and usability between HRIStudio and a representative baseline. The successful demonstration of this approach would provide evidence that thoughtful software infrastructure can lower barriers to entry in HRI while simultaneously improving the methodological rigor of the field.

 \section{Chapter Summary}

-This chapter has established the context and objectives for this thesis. I identified two critical challenges facing WoZ-based HRI research: high technical barriers that limit accessibility to non-programmers, and fragmented tooling that undermines reproducibility. I proposed a web-based framework approach that addresses these challenges through intuitive design interfaces, enforced experimental protocols, and platform-agnostic architecture. Finally, I articulated a central research question and outlined how this thesis validates that approach through implementation and empirical evaluation. To validate this approach, the next chapters establish the technical and methodological foundations.
+This chapter has established the context and objectives for this thesis. I identified two critical challenges facing WoZ-based HRI research. The first is the accessibility problem: high technical barriers limit participation by non-programmers. The second is the reproducibility problem: fragmented tooling makes results difficult to replicate across labs. I proposed a web-based framework approach that addresses these challenges through intuitive design interfaces, enforced experimental protocols, and platform-agnostic architecture. Finally, I articulated a central research question and outlined how this thesis validates that approach through implementation and a user study. The next chapters establish the technical and methodological foundations for that evaluation.
--- a/thesis/chapters/02_background.tex
+++ b/thesis/chapters/02_background.tex
@@ -3,38 +3,45 @@

 This chapter provides the necessary context for understanding the challenges addressed by this thesis. I survey the landscape of existing WoZ platforms, analyze their capabilities and limitations, and establish requirements that a modern infrastructure should satisfy. Finally, I position this thesis relative to prior work on this topic.

-As established in Chapter~\ref{ch:intro}, the Wizard-of-Oz technique enables researchers to prototype and test robot interaction designs before autonomous capabilities are fully developed. To understand how the proposed framework advances this research paradigm, it is essential to review the existing landscape of WoZ platforms, identify their limitations relative to disciplinary needs, and establish requirements for a more comprehensive approach. HRI is fundamentally a multidisciplinary field, bringing together engineers, psychologists, designers, and domain experts from various application areas \cite{Bartneck2024}, yet the fragmentation of tools and technical barriers have historically limited participation from non-technical researchers.
+As established in Chapter~\ref{ch:intro}, the WoZ technique enables researchers to prototype and test robot interaction designs before autonomous capabilities are developed. To understand how the proposed framework advances this research paradigm, I review the existing landscape of WoZ platforms, identify their limitations relative to disciplinary needs, and establish requirements for a more comprehensive approach. HRI is fundamentally a multidisciplinary field that brings together engineers, psychologists, designers, and domain experts from various application areas \cite{Bartneck2024}. Yet two challenges have historically limited participation from non-technical researchers. First, each research group builds custom software for specific robots, creating tool fragmentation across the field. Second, high technical barriers prevent many domain experts from conducting independent studies.

 \section{Existing WoZ Platforms and Tools}

-Over the last two decades, multiple frameworks to support and automate the WoZ paradigm have been reported in the literature. These frameworks can be broadly categorized based on their primary design emphases, generality, and the methodological practices they encourage. Foundational work by Steinfeld et al. \cite{Steinfeld2009} articulated the methodological importance of WoZ simulation, distinguishing between the human simulating the robot and the robot simulating the human. This distinction has influenced how subsequent tools approach the design and execution of WoZ experiments.
+Over the last two decades, multiple frameworks to support and automate the WoZ paradigm have been reported in the literature. These frameworks can be broadly categorized based on their primary design emphases, generality, and the methodological practices they encourage. Foundational work by Steinfeld et al. \cite{Steinfeld2009} articulated the methodological importance of WoZ simulation, distinguishing between the human simulating the robot (Wizard of Oz) and the robot simulating the human. In the latter approach, which they term Oz of Wizard, the robot's behavior is genuine while the human side of the interaction is simulated in order to evaluate that behavior. This distinction has influenced how subsequent tools approach the design and execution of WoZ experiments.

-Early platform-agnostic tools focused on providing robust, flexible interfaces for technically sophisticated users. Polonius \cite{Lu2011}, built on the Robot Operating System (ROS), exemplifies this generation. It provides a graphical interface for defining finite state machine scripts that control robot behaviors, with integrated logging capabilities to streamline post-experiment analysis. The system was explicitly designed to enable robotics engineers to create experiments that their non-technical collaborators could then execute. However, the initial setup and configuration still required substantial programming expertise. Similarly, OpenWoZ \cite{Hoffman2016} introduced a cloud-based, runtime-configurable architecture using web protocols. Its multi-client design allows multiple operators or observers to connect simultaneously, and its plugin system enables researchers to extend functionality. Critically, OpenWoZ allows runtime modification of robot behaviors, enabling wizards to deviate from scripts when unexpected situations arise. While architecturally sophisticated and highly flexible, OpenWoZ requires programming knowledge to create custom behaviors and configure experiments, limiting its accessibility to non-technical researchers.
+Early platform-agnostic tools focused on providing robust, flexible interfaces for technically sophisticated users. These systems were designed to work with multiple robot types rather than a single hardware platform. Polonius \cite{Lu2011}, built on the Robot Operating System (ROS) \cite{Quigley2009}, exemplifies this generation. It provides a graphical interface for defining finite state machine scripts that control robot behaviors, with integrated logging capabilities to streamline post-experiment analysis. The system was explicitly designed to enable robotics engineers to create experiments that their non-technical collaborators could then execute. However, the initial setup and configuration still required substantial programming expertise. Similarly, OpenWoZ \cite{Hoffman2016} introduced a cloud-based, runtime-configurable architecture using web protocols. Its design allows multiple operators or observers to connect simultaneously, and its plugin system enables researchers to extend functionality such as adding new robot behaviors or sensor integrations. Most importantly, OpenWoZ allows runtime modification of robot behaviors, enabling wizards to deviate from scripts when unexpected situations arise. While architecturally sophisticated and highly flexible, OpenWoZ requires programming knowledge to create custom behaviors and configure experiments, creating an accessibility problem for non-technical researchers.

-A second wave of tools shifted focus toward usability, often achieving accessibility by coupling tightly with specific hardware platforms. WoZ4U \cite{Rietz2021} was explicitly designed as an ``easy-to-use'' tool for conducting experiments with Aldebaran's Pepper robot. It provides an intuitive graphical interface that allows non-programmers to design interaction flows, and it successfully lowers the technical barrier. However, this usability comes at the cost of generalizability. WoZ4U is unusable with other robot platforms, and manufacturer-provided software follows a similar pattern. Choregraphe \cite{Pot2009}, developed by Aldebaran Robotics for the NAO and Pepper robots, offers a visual programming environment based on connected behavior boxes. Researchers can create complex interaction flows without traditional coding. However, when new robot platforms emerge or when hardware becomes obsolete, tools like Choregraphe and WoZ4U lose their utility. As Pettersson and Wik note in their review of WoZ tools \cite{Pettersson2015}, platform-specific systems often fall out of use as technology evolves, forcing researchers to constantly rebuild their experimental infrastructure.
+A second wave of tools shifted focus toward usability, often achieving accessibility by coupling tightly with specific hardware platforms. WoZ4U \cite{Rietz2021} was explicitly designed as an ``easy-to-use'' tool for conducting experiments with Aldebaran's Pepper robot. It provides an intuitive graphical interface that allows non-programmers to design interaction flows, and it successfully lowers the technical barrier. However, this usability comes at the cost of generalizability. WoZ4U cannot be used with other robot platforms, and manufacturer-provided software follows a similar pattern.

-Recent years have seen renewed interest in comprehensive WoZ frameworks. Gibert et al. \cite{Gibert2013} developed the SWoOZ platform, a super-Wizard of Oz system integrating facial tracking, gesture recognition, and real-time control capabilities to enable naturalistic human-robot interaction studies. Virtual and augmented reality have also emerged as complementary approaches to WoZ; Helgert et al. \cite{Helgert2024} demonstrated how VR-based WoZ environments can simplify experimental setup while providing researchers with precise control over environmental conditions and high fidelity data collection.
+Choregraphe \cite{Pot2009}, developed by Aldebaran Robotics for the NAO and Pepper robots, offers a visual programming environment based on connected behavior boxes. Researchers can create complex interaction flows using drag-and-drop blocks without writing code in traditional programming languages. However, when new robot platforms emerge or when hardware becomes obsolete, tools like Choregraphe and WoZ4U lose their utility. Pettersson and Wik, in their review of WoZ tools \cite{Pettersson2015}, note that platform-specific systems often fall out of use as technology evolves, forcing researchers to constantly rebuild their experimental infrastructure.

-This expanding landscape reveals a persistent fundamental lack in the design space of WoZ tools. Flexible, general-purpose platforms like Polonius and OpenWoZ offer powerful capabilities but present high technical barriers. Accessible, user-friendly tools like WoZ4U and Choregraphe lower those barriers but sacrifice cross-platform compatibility and longevity. Newer approaches such as VR-based frameworks attempt to bridge this gap, yet no existing tool successfully combines accessibility, flexibility, deployment portability, and built-in methodological rigor. Moreover, few platforms directly address the methodological concerns raised by systematic reviews of WoZ research. Riek's influential analysis \cite{Riek2012} of 54 HRI studies uncovered widespread inconsistencies in how wizard behaviors were controlled and reported. Very few studies documented standardized wizard training procedures or measured wizard error rates, raising questions about internal validity. The tools themselves often exacerbate this problem: poorly designed interfaces increase cognitive load on wizards, leading to timing errors and behavioral inconsistencies that can confound experimental results. Recent work by Strazdas et al. \cite{Strazdas2020} further demonstrates the importance of careful interface design in WoZ systems, showing how intuitive wizard interfaces directly improve both the quality of robot behavior and the reliability of collected data.
+Recent years have seen renewed interest in comprehensive WoZ frameworks. Gibert et al. \cite{Gibert2013} developed the Super Wizard of Oz (SWoOZ) platform. This system integrates facial tracking, gesture recognition, and real-time control capabilities to enable naturalistic human-robot interaction studies. Virtual and augmented reality have also emerged as complementary approaches to WoZ. Helgert et al. \cite{Helgert2024} demonstrated how VR-based WoZ environments can simplify experimental setup while providing researchers with precise control over environmental conditions and high-fidelity data collection.
+
+This expanding landscape reveals a persistent gap in the design space of WoZ tools. Flexible, general-purpose platforms like Polonius and OpenWoZ offer powerful capabilities but present high technical barriers. Accessible, user-friendly tools like WoZ4U and Choregraphe lower those barriers but sacrifice cross-platform compatibility and longevity. Newer approaches such as VR-based frameworks attempt to bridge this gap, yet no existing tool successfully combines accessibility, flexibility, deployment portability, and built-in methodological rigor. By methodological rigor, I refer to systematic features that guide experimenters toward best practices such as standardized protocols, comprehensive logging, and reproducible experimental designs.
+
+Moreover, few platforms directly address the methodological concerns raised by systematic reviews of WoZ research. Riek's influential analysis \cite{Riek2012} of 54 HRI studies uncovered widespread inconsistencies in how wizard behaviors were controlled and reported. Very few studies documented standardized wizard training procedures or measured wizard error rates, raising questions about internal validity. The tools themselves often exacerbate this problem: poorly designed interfaces increase cognitive load on wizards, leading to timing errors and behavioral inconsistencies that can confound experimental results. Recent work by Strazdas et al. \cite{Strazdas2020} further demonstrates the importance of careful interface design in WoZ systems, showing that intuitive wizard interfaces directly improve both the quality of robot behavior and the reliability of collected data.

 \section{Requirements for Modern WoZ Infrastructure}

-Based on the analysis of existing platforms and identified methodological gaps, I establish requirements for a modern WoZ research infrastructure. Through our preliminary work \cite{OConnor2024}, we identified six critical capabilities that a comprehensive platform should provide. First, all phases of the experimental workflow--design, execution, and analysis--should be integrated within a single unified environment to minimize context switching and tool fragmentation. Second, creating interaction protocols should require minimal to no programming expertise, enabling domain experts from psychology, education, or other fields to work independently \cite{Bartneck2024}. Third, the system must support fine-grained, responsive real-time control during live experiment sessions across a variety of robotic platforms.
+This thesis represents the culmination of a multi-year research effort to develop infrastructure that addresses the challenges identified in the WoZ platform landscape. Drawing on the analysis of existing platforms and the methodological gaps above, I adopt as requirements for a modern WoZ research infrastructure the six critical capabilities that we first identified in our preliminary work \cite{OConnor2024}:

-Fourth, automated logging of all actions, timings, and sensor data should be built-in, with synchronized timestamps to facilitate analysis. Fifth, the architecture should decouple experimental logic from robot-specific implementations through platform agnostic development, ensuring the platform remains viable as hardware evolves. Finally, collaborative features should allow multiple team members to contribute to experiment design and review execution data, supporting truly interdisciplinary research.
+\begin{description}
+\item[R1: Integrated workflow.] All phases of the experimental workflow (design, execution, and analysis) should be integrated within a single unified environment to minimize context switching and tool fragmentation.
+\item[R2: Low technical barrier.] Creating interaction protocols should require minimal to no programming expertise, enabling domain experts from psychology, education, or other fields to work independently \cite{Bartneck2024}.
+\item[R3: Real-time control.] The system must support fine-grained, responsive real-time control during live experiment sessions across a variety of robotic platforms.
+\item[R4: Automated logging.] All actions, timings, and sensor data should be automatically logged with synchronized timestamps to facilitate analysis.
+\item[R5: Platform agnosticism.] The architecture should decouple experimental logic from robot-specific implementations. This allows experiments designed for one robot type to be adapted to others, ensuring the platform remains viable as hardware evolves.
+\item[R6: Collaborative support.] Multiple team members should be able to contribute to experiment design and review execution data, supporting truly interdisciplinary research.
+\end{description}

-No existing platform satisfies all six requirements. Most critically, the trade-off between accessibility and flexibility remains unresolved, and few tools embed methodological best practices directly into their design.
+To the best of my knowledge, no existing platform satisfies all six requirements. Most critically, the trade-off between accessibility and flexibility remains unresolved, and few tools embed methodological best practices directly into their design so that, like training wheels on a bicycle, they guide experimenters toward sound methodology by default.

-\section{Prior Work}
-
-This thesis represents the culmination of a multi-year research effort to develop infrastructure that meets these requirements. The ideas presented here build upon prior work established in two peer-reviewed publications.
-
-We first introduced the concept for HRIStudio as a Late-Breaking Report at the 2024 IEEE International Conference on Robot and Human Interactive Communication (RO-MAN) \cite{OConnor2024}. In that work, we identified the lack of accessible tooling as a primary barrier to entry in HRI and proposed the high-level vision of a web-based, collaborative platform. We established the core requirements listed above and argued for a web-based approach to achieve them.
+The ideas presented here build upon prior work established in two peer-reviewed publications. We first introduced the concept for HRIStudio as a Late-Breaking Report at the 2024 IEEE International Conference on Robot and Human Interactive Communication (RO-MAN) \cite{OConnor2024}. In that position paper, we identified the lack of accessible tooling as a primary barrier to entry in HRI and proposed the high-level vision of a web-based, collaborative platform. We established the core requirements listed above and argued for a web-based approach to achieve them.

 Following the initial proposal, we published the detailed system architecture and preliminary prototype as a full paper at RO-MAN 2025 \cite{OConnor2025}. That publication validated the technical feasibility of our approach, detailing the communication protocols, data models, and plugin architecture necessary to support real-time robot control using standard web technologies while maintaining platform independence.

-While those prior publications established the conceptual framework and technical architecture, this thesis focuses on the realization and empirical validation of the platform. I extend that research in two key ways. First, I move beyond prototypes to deliver a complete, functional software system, resolving complex engineering challenges related to stability, latency, and deployment. Second, I provide the first rigorous user study comparing the proposed framework against industry-standard tools. This empirical evaluation provides evidence to support the claim that thoughtful infrastructure design can improve both accessibility and reproducibility in HRI research.
+While those prior publications established the conceptual framework and technical architecture, this thesis focuses on the realization and empirical validation of the platform. I extend that research in two key ways. First, I implement a functional software system that addresses engineering challenges related to stability, latency, and deployment, providing a minimum viable product for evaluation. Second, I provide a rigorous user study comparing the proposed framework against a representative baseline tool. This empirical evaluation provides evidence to support the claim that thoughtful infrastructure design can improve both accessibility and reproducibility in HRI research.

 \section{Chapter Summary}

-This chapter has established the technical and methodological context for this thesis. Existing WoZ platforms fall into two categories: general-purpose tools like Polonius and OpenWoZ that offer flexibility but high technical barriers, and platform-specific systems like WoZ4U and Choregraphe that prioritize usability at the cost of cross-platform generality. Recent approaches such as VR-based frameworks attempt to bridge this gap, yet no existing tool successfully combines accessibility, flexibility, and embedded methodological rigor. Based on this landscape analysis, I identified six critical requirements for modern WoZ infrastructure: integrated workflows, low technical barriers, real-time control across platforms, automated logging, platform agnostic design, and collaborative support. These requirements form the foundation for evaluating how the proposed framework advances the state of WoZ research infrastructure. The next chapter examines the broader reproducibility challenges that justify why these requirements are essential.
+This chapter has established the technical and methodological context for this thesis. Existing WoZ platforms fall into two categories: general-purpose tools like Polonius and OpenWoZ that offer flexibility but high technical barriers, and platform-specific systems like WoZ4U and Choregraphe that prioritize usability at the cost of cross-platform generality. Recent approaches such as VR-based frameworks attempt to bridge this gap, yet, to the best of my knowledge, no existing tool successfully combines accessibility, flexibility, and embedded methodological rigor. Based on this landscape analysis, I identified six critical requirements for modern WoZ infrastructure (R1--R6): integrated workflows, low technical barriers, real-time control across platforms, automated logging, platform-agnostic design, and collaborative support. These requirements form the foundation for evaluating how the proposed framework advances the state of WoZ research infrastructure. The next chapter examines the broader reproducibility challenges that justify why these requirements are essential.
--- a/thesis/chapters/03_reproducibility.tex
+++ b/thesis/chapters/03_reproducibility.tex
@@ -1,31 +1,37 @@
 \chapter{Reproducibility Challenges in WoZ-based HRI Research}
 \label{ch:reproducibility}

-Having established the landscape of existing WoZ platforms and their limitations, I now examine the factors that make WoZ experiments difficult to reproduce and how software infrastructure can address them. This chapter analyzes the sources of variability in WoZ studies, examines how current practices in infrastructure and reporting contribute to reproducibility problems, and derives specific platform requirements that can mitigate these issues. Understanding these challenges is essential for designing a system that supports experimentation at scale while remaining scientifically rigorous.
+Having established the landscape of existing WoZ platforms and their limitations, I now examine the factors that make WoZ experiments difficult to reproduce and how software infrastructure can address them. This chapter analyzes the sources of variability in WoZ studies and examines how current practices in infrastructure and reporting contribute to reproducibility problems. Understanding these challenges is essential for designing a system that supports experimentation at scale while remaining scientifically rigorous.

 \section{Sources of Variability}

-Reproducibility in experimental research requires that independent investigators can obtain consistent results when following the same procedures. In WoZ-based HRI studies, however, multiple sources of variability can compromise this goal. The wizard is simultaneously the strength and weakness of the WoZ paradigm. While human control enables sophisticated, adaptive interactions, it also introduces inconsistency. Consider a wizard conducting multiple trials of the same experiment with different participants. Even with a detailed script, the wizard may vary in timing, with delays between a participant's action and the robot's response fluctuating based on the wizard's attention, fatigue, or interpretation of when to act. When a script allows for choices, different wizards may make different selections, or the same wizard may choose differently across trials. Furthermore, a wizard may accidentally skip steps, trigger actions in the wrong order, or misinterpret experimental protocols.
+Reproducibility in experimental research requires that independent investigators can obtain consistent results when following the same procedures. In WoZ-based HRI studies, however, multiple sources of variability can compromise this goal. The wizard is simultaneously the strength and weakness of the WoZ paradigm. While human control enables sophisticated, adaptive interactions, it also introduces inconsistency. Consider a wizard conducting multiple trials of the same experiment with different participants. Even with a detailed script, the wizard may vary in timing, with delays between a participant's action and the robot's response fluctuating based on the wizard's attention, fatigue, or interpretation of when to act. When a script allows for choices, different wizards may make different selections, or the same wizard may act differently across trials. Furthermore, a wizard may accidentally skip steps, trigger actions in the wrong order, or misinterpret experimental protocols.

 Riek's systematic review \cite{Riek2012} found that very few published studies reported measuring wizard error rates or providing standardized wizard training. Without such measures, it becomes impossible to determine whether experimental results reflect the intended interaction design or inadvertent variations in wizard behavior.

-Beyond wizard behavior, the ``one-off'' nature of many WoZ control systems introduces technical variability. When each research group builds custom software for each study, several problems arise. Custom interfaces may have undocumented capabilities, hidden features, default behaviors, or timing characteristics that are never formally described. Software tightly coupled to specific robot models or operating system versions may become unusable when hardware is upgraded or replaced. Each system logs data differently, with different file formats, different levels of granularity, and different choices about what to record. This fragmentation means that replicating a study often requires not just following an experimental protocol but also reverse-engineering or rebuilding the original software infrastructure.
+Beyond wizard behavior, the bespoke nature of many WoZ control systems introduces technical variability. When each research group builds its own software for each study, several problems arise. Custom interfaces may have undocumented capabilities, hidden features, default behaviors, or timing characteristics that researchers never formally describe. Software tightly coupled to specific robot models or operating system versions may become unusable when hardware or software is upgraded or replaced. Each system logs data differently, with different file formats, different levels of granularity, and different choices about what to record. This fragmentation means that replicating a study often requires not just following an experimental protocol but also reverse-engineering or rebuilding the original software and hardware infrastructure.

-Even when researchers intend for their work to be reproducible, practical constraints on publication length lead to incomplete documentation. Exact timing parameters are often omitted. Decision rules for wizard actions remain unspecified. Details of the wizard interface go unreported. Specifications of data collection, including which sensor streams were recorded and at what sampling rate, are frequently missing. Without this information, other researchers cannot faithfully recreate the experimental conditions, limiting both direct replication and conceptual extensions of prior work.
+Even when researchers intend for their work to be reproducible, practical constraints on publication length lead to incomplete documentation. Papers often omit exact timing parameters, leave decision rules for wizard actions unspecified, and do not report details of the wizard interface. Specifications of data collection, including which sensor streams were recorded and at what sampling rate, frequently go missing. Without this information, other researchers cannot faithfully recreate the experimental conditions, limiting both direct replication and conceptual extensions of prior work.

 \section{Infrastructure Requirements for Enhanced Reproducibility}

-Based on this analysis, I identify specific ways that software infrastructure can mitigate reproducibility challenges. Rather than merely providing tools for wizard control, an ideal WoZ platform should actively guide wizards through scripted procedures. This means presenting actions in a prescribed sequence to prevent out-of-order execution, highlighting the current step in the protocol, recording any deviations from the script as explicit events in the data log, and supporting repeatable decision logic through clearly defined conditional branches. By constraining wizard behavior within the bounds of the experimental design, the system reduces unintended variability across trials and participants.
+Based on this analysis, I identify specific ways that software infrastructure can mitigate reproducibility challenges:

-Manual data collection is error-prone and often incomplete. The platform should automatically record every action triggered by the wizard with precise timestamps, all robot sensor data and state changes, timing information indicating when actions were requested, when they began executing, and when they completed, as well as the full experimental protocol embedded in the log file so that the script used for any session can be recovered later. This approach of recording data by default ensures that critical information is never accidentally omitted.
+\begin{enumerate}
+\item \textbf{Guided wizard execution.} Rather than merely providing tools for wizard control, an ideal WoZ platform should actively guide wizards through scripted procedures. This means presenting actions in a prescribed sequence to prevent out-of-order execution, highlighting the current step in the protocol, recording any deviations from the script as explicit events in the data log, and supporting repeatable decision logic through clearly defined conditional branches. By constraining wizard behavior within the bounds of the experimental design, the system reduces unintended variability across trials and participants.
+
+\item \textbf{Comprehensive automatic logging.} Manual data collection is error-prone and often incomplete. The platform should automatically record every action triggered by the wizard with precise timestamps, all robot sensor data and state changes, and timing information indicating when actions were requested, when they began executing, and when they completed. The full experimental protocol should be embedded in each log file so that researchers can recover the exact script used for any session. Note that recording precise timestamps does not imply that trials must have identical timing, since human-robot interactions naturally vary in duration; rather, the system captures what actually occurred for later analysis.
+
+\item \textbf{Self-documenting protocol specifications.} The protocol specification itself should serve as documentation. When interaction protocols are defined using structured formats such as visual flowcharts or declarative scripts rather than imperative code, they become simultaneously executable and human-readable. Researchers can then share complete, unambiguous descriptions of their experimental procedures alongside their results.
+
+\item \textbf{Platform-independent abstractions.} To maximize the lifespan and transferability of experimental designs, the platform must separate the high-level control logic (the sequence of wizard and robot actions) from the low-level details of how specific robots execute those behaviors. This abstraction allows experiments designed for one robot to be more easily adapted to another, extending the reproducibility of interaction designs even when the original hardware becomes obsolete.
+\end{enumerate}
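
To make the logging requirement concrete, the following TypeScript sketch shows one possible shape for a trial log that keeps three timestamps per action (requested, started, completed) and embeds the serialized protocol in the log itself. The names \texttt{ActionLogEntry}, \texttt{TrialLog}, \texttt{recordAction}, \texttt{startLatency}, and \texttt{recoverProtocol} are hypothetical and chosen only for illustration; they do not describe any existing platform's schema.

```typescript
// Hypothetical log shapes, for illustration only.
interface ActionLogEntry {
  actionId: string;
  requestedAt: number; // ms since trial start: wizard triggered the action
  startedAt: number; // robot began executing it
  completedAt: number; // execution finished
  scripted: boolean; // false marks an explicit deviation from the protocol
}

interface TrialLog {
  protocol: string; // full protocol, serialized into the log itself
  entries: ActionLogEntry[];
}

function createTrialLog(protocol: object): TrialLog {
  return { protocol: JSON.stringify(protocol), entries: [] };
}

// Append an entry, rejecting timestamps that are out of causal order.
function recordAction(log: TrialLog, entry: ActionLogEntry): void {
  const ordered =
    entry.requestedAt <= entry.startedAt && entry.startedAt <= entry.completedAt;
  if (!ordered) {
    throw new Error(`inconsistent timestamps for "${entry.actionId}"`);
  }
  log.entries.push(entry);
}

// Delay between the wizard's request and the robot starting to act.
function startLatency(entry: ActionLogEntry): number {
  return entry.startedAt - entry.requestedAt;
}

// Recover the exact script used for this session from the log alone.
function recoverProtocol(log: TrialLog): unknown {
  return JSON.parse(log.protocol);
}
```

Because the timestamps record what actually happened rather than what was scheduled, two trials of the same protocol can differ in duration and still be compared action by action; a helper such as \texttt{startLatency}, for instance, isolates the delay between a wizard's request and the robot's response.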

-The experimental design itself should serve as documentation. When interaction protocols are defined using structured formats such as visual flowcharts or declarative scripts rather than imperative code, they become simultaneously executable and human-readable. Researchers can then share complete, unambiguous descriptions of their experimental procedures alongside their results.

-To maximize the lifespan and transferability of experimental designs, the platform must separate the high-level logic of an interaction from the low-level details of how specific robots execute those behaviors. This abstraction allows experiments designed for one robot to be adapted to another, extending the reproducibility of interaction designs even when the original hardware becomes obsolete.

 \section{Connecting Reproducibility Challenges to Infrastructure Requirements}

-The reproducibility challenges identified above directly motivate the infrastructure requirements established in Chapter~\ref{ch:background}. Inconsistent wizard behavior violates the requirement for enforced experimental protocols and comprehensive automatic logging. The absence of standardized logging formats and sensor specifications violates both the automated logging and self-documenting design requirements. Technical fragmentation violates the platform-agnostic requirement, as bespoke systems become obsolete when hardware evolves. Incomplete documentation reflects a failure to treat experiment design as executable, self-documenting specifications. No existing platform simultaneously satisfies all six requirements: most critically, the trade-off between accessibility and flexibility remains unresolved, and few tools embed methodological best practices directly into their design. As Chapter~\ref{ch:background} demonstrated, this gap persists across a decade of platform development. Addressing it requires a fundamental rethinking of how WoZ infrastructure is designed, prioritizing reproducibility and methodological rigor as first-class design goals rather than afterthoughts.
+The reproducibility challenges identified above directly motivate the infrastructure requirements established in Chapter~\ref{ch:background}. Inconsistent wizard behavior creates the need for enforced experimental protocols (R1, R2) that guide wizards systematically. The lack of comprehensive data undermines analysis, motivating the automated logging requirement (R4). Technical fragmentation motivates platform-agnostic design (R5): when each lab builds custom software tied to specific hardware, that software becomes obsolete as the hardware evolves. Incomplete documentation motivates self-documenting protocol specifications (R1, R2) that are simultaneously executable and shareable. As Chapter~\ref{ch:background} demonstrated, no existing platform simultaneously satisfies all six requirements. Addressing this gap requires rethinking how WoZ infrastructure is designed, prioritizing reproducibility and methodological rigor as first-class design goals rather than afterthoughts.

 \section{Chapter Summary}

--- a/thesis/chapters/04_system_design.tex
+++ b/thesis/chapters/04_system_design.tex
@@ -1,23 +1,85 @@
-\chapter{System Design: HRIStudio Platform}
+\chapter{System Design}
 \label{ch:design}

-\section{Design Goals}
-% TODO
+The previous chapters established the motivation for a web-based WoZ platform and identified six critical requirements for modern HRI research infrastructure. This chapter describes the design of HRIStudio, focusing on how the system architecture and experimental workflow implement these requirements. It covers three key design decisions: the hierarchical structure of experiment specifications, the modular interface architecture, and the event-driven execution model, and then traces the data flow that connects them during experiment execution.

-\section{High-Level Architecture}
-% TODO
+\section{Hierarchical Organization of Experiments}

-\section{Hierarchical Experimental Model}
-% TODO
+To address the need for self-documenting, executable experiment specifications (R1, R2), HRIStudio introduces a hierarchical organization of elements that allows researchers to express WoZ studies at multiple levels of abstraction. This structure enables experiment designs to be simultaneously intuitive for researchers to create and precise enough for the system to execute.

-\section{Visual Experiment Designer}
-% TODO
+At the top level, researchers create a \emph{study} element that defines the overall research context, including metadata about the research project, collaborators, and general experimental conditions. A study contains two types of subordinate elements. \emph{Experiment} elements represent reusable protocols (e.g., ``The Interactive Storyteller'' experiment), each specifying the sequence of steps and actions that define an interaction design. \emph{Trial} elements represent specific instantiations in which a particular experiment protocol is run with a particular participant. This distinction between protocol (experiment) and execution instance (trial) allows researchers to manage multiple repetitions of the same protocol (trials with different participants) while maintaining clear traceability.

-\section{Execution Interfaces}
-% TODO
+Each experiment protocol comprises a sequence of \emph{step} elements, which model distinct phases of the interaction design. For example, an experiment protocol might define steps such as ``Introduction,'' ``Learning Task,'' and ``Closing.'' Within each step, researchers define one or more \emph{action} elements, the atomic units of the experimental procedure. Actions can be directed at the wizard (e.g., ``Wait for the participant to finish the task, then say an encouraging phrase'') or at the robot (e.g., ``Point with the arm'' or ``Play the audio greeting'').

-\section{Robot Integration and Plugins}
-% TODO
+\begin{figure}[htbp]
+\centering
+\begin{tikzpicture}[
+	nodebox/.style={rectangle, draw=black, thick, fill=gray!15, minimum width=2.8cm, minimum height=0.8cm, align=center, font=\small},
+	nodeboxdark/.style={rectangle, draw=black, thick, fill=gray!30, minimum width=2.8cm, minimum height=0.8cm, align=center, font=\small},
+	arrow/.style={->, thick}]

-\section{Data Management}
-% TODO
+	\node[nodebox] (study) at (0, 3.4) {Study};
+	\node[nodebox] (experiment) at (0, 2.1) {Experiment};
+
+	\node[nodebox] (step1) at (-3.0, 0.7) {Step};
+	\node[nodebox] (step2) at (0, 0.7) {Step};
+	\node[nodebox] (step3) at (3.0, 0.7) {Step};
+
+	\node[nodeboxdark] (action1) at (-4.5, -0.7) {Action};
+	\node[nodeboxdark] (action2) at (-1.5, -0.7) {Action};
+
+	\draw[arrow] (study.south) -- (experiment.north);
+	\draw[arrow] (experiment.south) -- (step1.north);
+	\draw[arrow] (experiment.south) -- (step2.north);
+	\draw[arrow] (experiment.south) -- (step3.north);
+	\draw[arrow] (step1.south) -- (action1.north);
+	\draw[arrow] (step1.south) -- (action2.north);
+
+\end{tikzpicture}
+\caption{Hierarchy of experiment specifications from study-level context to atomic actions.}
+\label{fig:experiment-hierarchy}
+\end{figure}
+
+This hierarchical structure serves four purposes. First, it permits researchers to design experiment protocols without programming knowledge, using visual or declarative specifications at each level. Second, it naturally maps to the temporal structure of a trial session, making the protocol easy to follow during live execution. Third, it provides a foundation for comprehensive logging: each action executed during a trial can be recorded with precise timestamps and outcomes, producing an experimental trace that researchers can audit and analyze. Fourth, the separation of experiment (protocol) from trial (execution) enables researchers to run the same protocol with different participants, facilitating direct comparison across trials while keeping a clear record of which participant took part in which protocol.
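
The hierarchy in Figure~\ref{fig:experiment-hierarchy} maps naturally onto a small set of nested types. The TypeScript sketch below is one way to express it, assuming illustrative field names (\texttt{experimentId}, \texttt{participantId}, \texttt{target}, and so on) that are not taken from HRIStudio's actual schema. It shows how a trial references, rather than copies, the protocol it executes.

```typescript
// Illustrative types for study -> experiment -> step -> action, plus trials.
// Field names are assumptions for this sketch.
type ActionTarget = "wizard" | "robot";

interface Action {
  id: string;
  target: ActionTarget;
  description: string;
}

interface Step {
  name: string; // e.g. "Introduction", "Learning Task", "Closing"
  actions: Action[];
}

interface Experiment {
  id: string;
  name: string;
  steps: Step[];
}

// A trial is one execution of one experiment with one participant.
interface Trial {
  id: string;
  experimentId: string;
  participantId: string;
}

interface Study {
  title: string;
  experiments: Experiment[];
  trials: Trial[];
}

// Flatten a protocol into the ordered action list a wizard follows live.
function actionSequence(experiment: Experiment): Action[] {
  const sequence: Action[] = [];
  for (const step of experiment.steps) {
    for (const action of step.actions) sequence.push(action);
  }
  return sequence;
}

// All trials that ran a given protocol, e.g. for cross-participant comparison.
function trialsFor(study: Study, experimentId: string): Trial[] {
  return study.trials.filter((trial) => trial.experimentId === experimentId);
}
```

Keeping \texttt{Trial} as a lightweight reference to an \texttt{Experiment} is what lets many participants run one protocol while the record of who took part in what stays unambiguous.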
+
+\section{Modular Interface Architecture}
+
+To support different roles in an experiment while maintaining coherent data flow (R3, R4, R6), HRIStudio implements three primary user interfaces, each optimized for a specific phase of the research lifecycle.
+
+\subsection{Design Interface}
+
+The \emph{Design} interface enables researchers to construct experiment specifications using drag-and-drop visual programming. Rather than requiring researchers to write code or complex configuration files, the interface presents a canvas where researchers can assemble pre-built action components into sequences. Components represent common tasks such as robot movements, speech synthesis, wizard instructions, and conditional logic. Researchers configure each component's parameters through property panels that provide contextual guidance and examples of best practices.
+
+By treating experiment design as a visual specification task, the interface lowers technical barriers (R2) and ensures that the resulting protocol specification is human-readable and shareable alongside research results. The specification is stored in a structured, machine-readable format that can be both displayed as a flowchart and executed by the platform's runtime.
+
+\subsection{Execute Interface}
+
+During live trials, the \emph{Execute} interface provides a synchronized live view of experiment execution. The wizard sees the current step and its available actions, which guide the wizard through the experimental protocol while leaving room for spontaneous, contextual responses. Actions are presented sequentially, but the wizard can manually trigger specific actions based on participant responses, ensuring that the interaction remains natural and responsive rather than rigidly scripted.
+
+The Execute view includes manual controls for unscripted behaviors such as additional robot movements, speech, or gestures. These unscripted actions are recorded in the trial log as explicit deviations from the protocol, enabling researchers to later analyze both scripted and improvised interactions. This design balances the need for consistent, monitored behavior (which supports reproducibility) with the flexibility required for realistic human-robot interactions.
+
+Additional researchers can access the same synchronized live view at the same time through the platform's Dashboard by selecting a live trial to ``spectate.'' All observers of a trial see an identical display of the wizard's controls, participant interactions, and robot state, supporting real-time collaboration and interdisciplinary observation (R6). Observers can take notes and mark significant moments without interfering with the wizard's control or the participant's experience.
+
+\subsection{Analysis Interface}
+
+After a live experiment session, the \emph{Analysis} interface enables researchers to review all recorded data streams side by side, synchronized in time. This includes video of the human-robot interaction, audio of speech and ambient sounds, logged actions and state changes, and sensor data from the robot. Researchers can scrub through the recording, mark significant events with annotations, and export selected segments or annotations for analysis.
+
+The Analysis interface directly supports reproducibility (R4) by making the complete experimental trace accessible and analyzable. Researchers can verify that the protocol was executed as intended, examine deviations from the protocol, and compare execution traces across multiple sessions to verify consistency.
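
As an example of the kind of consistency check described above, the following TypeScript sketch compares the action identifiers recorded in one trial against the protocol's intended order. The function name \texttt{compareTrace} and its four categories (followed, skipped, out of order, unscripted) are assumptions made for illustration, not a description of HRIStudio's analysis code.

```typescript
// Hypothetical trace comparison for one trial; categories are illustrative.
interface TraceReport {
  followed: string[]; // protocol actions executed in protocol order
  skipped: string[]; // protocol actions passed over at their scheduled point
  outOfOrder: string[]; // protocol actions executed after the wizard moved past them
  unscripted: string[]; // executed actions absent from the protocol
}

function compareTrace(protocol: string[], executed: string[]): TraceReport {
  const report: TraceReport = { followed: [], skipped: [], outOfOrder: [], unscripted: [] };
  let next = 0; // index of the next expected protocol action
  for (const id of executed) {
    const pos = protocol.indexOf(id, next);
    if (pos >= 0) {
      // Any protocol actions between the expected one and this one were skipped.
      for (let i = next; i < pos; i++) report.skipped.push(protocol[i]);
      report.followed.push(id);
      next = pos + 1;
    } else if (protocol.indexOf(id) >= 0) {
      // Defined in the protocol, but its scheduled position has already passed.
      report.outOfOrder.push(id);
    } else {
      report.unscripted.push(id);
    }
  }
  // Whatever remains after the last matched action was never reached.
  for (let i = next; i < protocol.length; i++) report.skipped.push(protocol[i]);
  return report;
}
```

Running the same comparison over every trial of an experiment would yield a simple per-trial deviation profile. Note that in this sketch an action executed late is reported twice: as skipped at its scheduled position and as out of order where it actually occurred.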
+
+\section{Event-Driven Execution Model}
+
+To achieve real-time responsiveness while maintaining methodological rigor (R3, R5), HRIStudio uses an event-driven execution model rather than a time-driven one. In a time-driven approach, the system would advance through actions on a fixed schedule, leading to rigid, potentially unnatural interaction timing. In contrast, the event-driven model allows the wizard to trigger or advance actions based on the perceived state of the human participant.
+
+This approach has two implications. First, not all sessions of the same experiment will have identical timing or duration; the length of a learning task, for example, depends on the participant's progress. The system records the actual timing of actions, permitting researchers to capture these natural variations in their data. Second, the event-driven model enables the wizard to respond contextually without departing from the protocol; the wizard remains guided by the sequence of available actions while having control over when to advance based on participant cues.
+
+The system enforces protocol consistency by constraining the wizard's choices to the set of actions defined in the protocol specification, while recording all choices made and any deviations. This design directly addresses the reproducibility challenge of inconsistent wizard behavior by making the wizard's degrees of freedom explicit and logged.
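
The following TypeScript sketch illustrates the event-driven model under simplifying assumptions: steps are flat lists of action identifiers, the clock is injected so that timestamps are deterministic, and the class and method names (\texttt{GuidedExecutor}, \texttt{trigger}, \texttt{improvise}, \texttt{advance}) are hypothetical rather than HRIStudio's API. The wizard's scripted choices are limited to the current step, unscripted behavior remains possible but is logged as a deviation, and no transition happens without an explicit wizard event.

```typescript
// Minimal event-driven executor sketch: nothing advances on a timer; every
// transition is caused by an explicit wizard event. Names are hypothetical.
interface StepSpec {
  name: string;
  actions: string[]; // action ids the protocol allows in this step
}

type EventKind = "action" | "deviation" | "advance";

interface ExecEvent {
  t: number; // timestamp from the injected clock
  kind: EventKind;
  step: string;
  detail: string;
}

class GuidedExecutor {
  readonly events: ExecEvent[] = [];
  private stepIndex = 0;
  private readonly steps: StepSpec[];
  private readonly now: () => number;

  constructor(steps: StepSpec[], now: () => number) {
    this.steps = steps;
    this.now = now;
  }

  currentStep(): StepSpec | undefined {
    return this.steps[this.stepIndex];
  }

  // Scripted actions the interface would offer the wizard right now.
  availableActions(): string[] {
    const step = this.currentStep();
    return step ? step.actions.slice() : [];
  }

  // Accept only actions defined for the current step; log each one.
  trigger(actionId: string): boolean {
    const step = this.currentStep();
    if (!step || step.actions.indexOf(actionId) < 0) return false;
    this.record("action", actionId);
    return true;
  }

  // Unscripted behavior is allowed but logged explicitly as a deviation.
  improvise(description: string): void {
    this.record("deviation", description);
  }

  // The wizard, not a clock, decides when to move to the next step.
  advance(): void {
    const step = this.currentStep();
    if (!step) return;
    this.record("advance", step.name);
    this.stepIndex++;
  }

  private record(kind: EventKind, detail: string): void {
    const step = this.currentStep();
    this.events.push({ t: this.now(), kind, step: step ? step.name : "(finished)", detail });
  }
}
```

In a deployed system one would expect \texttt{now} to read a shared session clock and each recorded event to be streamed to the logging service rather than kept in memory; both are omitted here to keep the sketch self-contained.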
+
+\section{Data Flow and Infrastructure Implementation}
+
+The overall data flow through HRIStudio follows the experimental workflow from design through analysis. During the design phase, researchers create experiment specifications that are stored in the system database. During a live experiment session, the system manages bidirectional communication between the wizard's interface and the robot control layer. All actions, sensor data, and events are streamed to a data logging service that stores complete session records. After the experiment, researchers review these records through the Analysis interface.
+
+This architecture satisfies the infrastructure requirements by design. The integrated workflow (R1) follows the sequence design $\rightarrow$ execution $\rightarrow$ analysis. Low technical barriers (R2) are achieved through the visual Design interface. Real-time control (R3) is supported by responsive event-driven execution. Automated logging (R4) is built-in at the system level. Platform agnosticism (R5) is achieved by decoupling the high-level action specification from robot-specific control commands in the ROS interface. Collaborative support (R6) is enabled through shared views and multi-user access to all system components.
+
+\section{Chapter Summary}
+
+This chapter has described the system design of HRIStudio, with emphasis on how architectural choices directly implement the infrastructure requirements identified in Chapter~\ref{ch:background}. The hierarchical organization of experiment specifications enables intuitive, executable design. The modular interface architecture separates concerns across design, execution, and analysis phases while maintaining data coherence. The event-driven execution model balances protocol consistency with realistic interaction dynamics. The integrated data flow ensures that reproducibility is supported by design rather than as an afterthought. The following chapter describes the implementation of these design principles using specific technologies and architectural components.
--- a/thesis/chapters/05_implementation.tex
+++ b/thesis/chapters/05_implementation.tex
@@ -1,11 +1,216 @@
-\chapter{Implementation Details}
+\chapter{Implementation}
 \label{ch:implementation}

-\section{Technology Stack}
-% TODO
+Chapter~\ref{ch:design} described the conceptual design of HRIStudio. This chapter addresses the realization of these design principles, discussing the core technologies used, the system architecture that integrates these technologies, and the current state of the implementation. The implementation demonstrates the feasibility of the approach proposed in earlier chapters while identifying technical challenges that inform the roadmap for future development.

-\section{Technical Challenges}
-% TODO
+\section{Core Implementation Decisions}

-\section{System Capabilities}
-% TODO
+HRIStudio is implemented as a web application that researchers access through a standard web browser without installing specialized software. This design decision directly addresses requirement R2 (low technical barrier) by eliminating installation complexity and ensuring that the system behaves consistently across operating systems. This section describes the key implementation choices and the rationale behind them.
+
+\subsection{Web-Based Architecture}
+
+The choice to build HRIStudio as a web application was driven by three factors. First, web browsers are universally available, so researchers do not need to install custom software or manage dependencies. Second, web applications naturally support collaboration: multiple team members can access the same experiment data and observe live trials simultaneously from different locations. Third, web deployment simplifies updates: when I fix bugs or add features, all users immediately receive the improvements without manual software updates.
+
+I chose to use the same programming language, TypeScript~\cite{TypeScript2024}, across the entire system, including the user interface, the server logic, and the data access layer. This consistency reduces a common source of errors: when the structure of experiment data changes, static type checking detects inconsistencies between different parts of the system before deployment rather than letting them surface as runtime failures during live trials.
+
+\subsection{Data Storage Strategy}
+
+Experiment protocols and trial data are stored in a structured database that supports efficient queries, such as retrieving all trials for a particular participant or comparing timing data across multiple sessions. Video recordings and audio files, however, are large and unstructured, so they are stored separately in a file storage system. This separation keeps the database fast for common queries while still preserving complete multimedia records.
+
+\subsection{Robot Communication Layer}
+
+Rather than writing custom code to communicate with each robot's specific control system, HRIStudio uses the Robot Operating System (ROS)~\cite{Quigley2009} as an intermediary. ROS is a widely adopted standard in robotics research that provides a common communication framework. As a result, any robot with ROS support can work with HRIStudio. For robots without native ROS support, researchers can write a small adapter, a much simpler task than integrating directly with HRIStudio's core code.
+
+\subsection{Plugin Architecture for Platform Agnosticism}
+
+A critical design decision was how to support diverse robot platforms without hardcoding knowledge of specific robots into HRIStudio. The robotics landscape is fragmented: researchers use various robots (NAO, Pepper, Fetch, custom platforms) that communicate in different ways.
+
+The solution is a plugin architecture. When designing an experiment, researchers work with abstract actions like ``speak this text'' or ``raise arm.'' The system does not need to know whether it is controlling a NAO robot, a Pepper robot, or a custom research platform. Instead, each robot is described by a plugin, a configuration file that maps abstract actions to the specific commands that robot understands.
+
+This separation has important consequences. First, researchers can create an interaction protocol without knowing which robot will ultimately execute it, enabling protocol reuse across different hardware. Second, when a research lab acquires a new robot, they can add support for it by writing a plugin rather than modifying HRIStudio itself. Third, the visual designer's palette of available actions is automatically populated from the loaded plugins, ensuring the interface reflects the actual capabilities of the current robot.
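
A plugin of this kind can be sketched as plain, JSON-compatible data plus one generic resolver. In the TypeScript sketch below, the topic names are the illustrative ones from Figure~\ref{fig:plugin-architecture} rather than real driver topics; \texttt{std\_msgs/String} and \texttt{geometry\_msgs/Twist} are standard ROS message types; and the \texttt{fields} convention (message field path mapped to action parameter) is an assumption made for this example. The \texttt{palette} function shows how the designer's list of actions could be derived from whichever plugin is loaded.

```typescript
// A plugin is data only: each abstract action maps to a ROS topic, a message
// type, and a table saying which action parameter fills which message field.
interface ActionMapping {
  topic: string;
  messageType: string;
  fields: Record<string, string>; // message field path -> action parameter name
}

interface RobotPlugin {
  robot: string;
  actions: Record<string, ActionMapping>;
}

const naoPlugin: RobotPlugin = {
  robot: "NAO",
  actions: {
    speak: { topic: "/nao/tts", messageType: "std_msgs/String", fields: { data: "text" } },
    move_forward: { topic: "/nao/move", messageType: "geometry_msgs/Twist", fields: { "linear.x": "speed" } },
  },
};

const pepperPlugin: RobotPlugin = {
  robot: "Pepper",
  actions: {
    speak: { topic: "/pepper/say", messageType: "std_msgs/String", fields: { data: "text" } },
    move_forward: { topic: "/pepper/cmd_vel", messageType: "geometry_msgs/Twist", fields: { "linear.x": "speed" } },
  },
};

// Writes `value` at a dotted path such as "linear.x", creating nested objects.
function setPath(target: Record<string, unknown>, path: string, value: unknown): void {
  const keys = path.split(".");
  let node = target;
  for (const key of keys.slice(0, -1)) {
    const child = node[key];
    if (typeof child !== "object" || child === null) node[key] = {};
    node = node[key] as Record<string, unknown>;
  }
  node[keys[keys.length - 1] as string] = value;
}

// Translates one abstract action into a robot-specific ROS message.
// Message fields not listed in the plugin are simply omitted in this sketch.
function resolveAction(plugin: RobotPlugin, action: string, params: Record<string, string | number>) {
  const mapping = plugin.actions[action];
  if (!mapping) throw new Error(`${plugin.robot} plugin does not support "${action}"`);
  const message: Record<string, unknown> = {};
  for (const [path, param] of Object.entries(mapping.fields)) {
    if (!(param in params)) throw new Error(`Missing parameter "${param}" for ${action}`);
    setPath(message, path, params[param]);
  }
  return { topic: mapping.topic, messageType: mapping.messageType, message };
}

// The designer's action palette comes straight from the loaded plugin.
function palette(plugin: RobotPlugin): string[] {
  return Object.keys(plugin.actions).sort();
}
```

The same protocol step, \texttt{speak} with a \texttt{text} parameter, resolves to a different topic depending on which plugin is passed in, which is the property that allows protocols to be reused across hardware.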
+
+The plugin architecture also treats control flow (branches, loops, conditional logic) the same way as robot actions. This uniformity allows researchers to mix logical decisions and physical robot behaviors freely when designing experiments.
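
One way to realize this uniformity is to make control-flow blocks and robot actions members of the same step type, so that a protocol is simply a tree of steps. The sketch below is hypothetical (the \texttt{Step} shape and \texttt{plannedActions} are not taken from HRIStudio); it walks such a tree and lists the robot actions a trial would run, given the wizard's answer to each branch question.

```typescript
// Robot actions and control flow share one step type.
type Step =
  | { kind: "action"; action: string; params: Record<string, string | number> }
  | { kind: "branch"; question: string; options: Record<string, Step[]> }
  | { kind: "loop"; times: number; body: Step[] };

// Lists the robot actions that would run, given the wizard's answers to each
// branch question (keyed by question text).
function plannedActions(steps: Step[], answers: Record<string, string>): string[] {
  const out: string[] = [];
  for (const step of steps) {
    switch (step.kind) {
      case "action":
        out.push(step.action);
        break;
      case "branch": {
        const answer = answers[step.question];
        const chosen = answer !== undefined ? step.options[answer] : undefined;
        if (!chosen) throw new Error(`No branch for "${step.question}" = ${answer}`);
        out.push(...plannedActions(chosen, answers));
        break;
      }
      case "loop":
        for (let i = 0; i < step.times; i++) out.push(...plannedActions(step.body, answers));
        break;
    }
  }
  return out;
}
```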
+
+\begin{figure}[htbp]
+\centering
+\begin{tikzpicture}[
+    action/.style={rectangle, draw=black, thick, fill=gray!15, minimum width=2.2cm, minimum height=0.6cm, align=center, font=\small},
+    impl/.style={rectangle, draw=black, thick, fill=gray!30, minimum width=2.2cm, minimum height=0.7cm, align=center, font=\small},
+    arrow/.style={-, thick}]
+    
+    % First Y: speak()
+    \node[action] (a1) at (0, 7) {HRIStudio\\speak(text)};
+    \node[impl] (nao1) at (-2, 5) {NAO\\{\small /nao/tts}};
+    \node[impl] (pep1) at (2, 5) {Pepper\\{\small /pepper/say}};
+    \draw[arrow] (a1) -- (nao1);
+    \draw[arrow] (a1) -- (pep1);
+    
+    % Second Y: raise_arm()
+    \node[action] (a2) at (0, 3) {HRIStudio\\raise\_arm()};
+    \node[impl] (nao2) at (-2, 1) {NAO\\{\small /nao/arm}};
+    \node[impl] (pep2) at (2, 1) {Pepper\\{\small /pepper/gesture}};
+    \draw[arrow] (a2) -- (nao2);
+    \draw[arrow] (a2) -- (pep2);
+    
+    % Third Y: move_forward()
+    \node[action] (a3) at (0, -1) {HRIStudio\\move\_forward()};
+    \node[impl] (nao3) at (-2, -3) {NAO\\{\small /nao/move}};
+    \node[impl] (pep3) at (2, -3) {Pepper\\{\small /pepper/cmd\_vel}};
+    \draw[arrow] (a3) -- (nao3);
+    \draw[arrow] (a3) -- (pep3);
+    
+\end{tikzpicture}
+\caption{Plugin architecture: each abstract action branches to platform-specific implementations.}
+\label{fig:plugin-architecture}
+\end{figure}
+
+\subsection{Event-Driven Execution}
+
+During a trial, HRIStudio must balance two competing demands: following the experimental protocol precisely while allowing natural human-robot timing. The execution engine accomplishes this by waiting for specific events at designated points in the protocol. For example, if the protocol specifies ``wait for wizard to click Continue,'' the system pauses until that event occurs, regardless of how long it takes. This preserves the spontaneous, human-paced nature of interaction while ensuring the protocol structure is followed.
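
The waiting behavior can be sketched as a small state machine that runs steps until it reaches a wait point and then advances only when the matching event arrives. This is an illustrative TypeScript sketch, not HRIStudio's execution engine; the step shape, event names such as \texttt{wizard\_continue}, and the \texttt{send} callback standing in for the robot connection are assumptions.

```typescript
type ExecStep =
  | { kind: "action"; action: string }
  | { kind: "wait"; event: string }; // e.g. "wizard_continue"

class TrialExecutor {
  private index = 0;
  private readonly steps: ExecStep[];
  private readonly send: (action: string) => void;

  constructor(steps: ExecStep[], send: (action: string) => void) {
    this.steps = steps;
    this.send = send;
  }

  // Runs actions until the protocol reaches a wait step or ends.
  start(): void {
    this.advance();
  }

  // Called for every incoming event (wizard click, sensor reading, timeout).
  // Only the event named by the current wait step moves the protocol forward;
  // there is no timer, so the wait lasts as long as the interaction needs.
  handleEvent(name: string): void {
    const current = this.steps[this.index];
    if (current !== undefined && current.kind === "wait" && current.event === name) {
      this.index++;
      this.advance();
    }
  }

  get waitingFor(): string | null {
    const current = this.steps[this.index];
    return current !== undefined && current.kind === "wait" ? current.event : null;
  }

  get finished(): boolean {
    return this.index >= this.steps.length;
  }

  private advance(): void {
    while (this.index < this.steps.length) {
      const step = this.steps[this.index] as ExecStep;
      if (step.kind === "wait") return;
      this.send(step.action);
      this.index++;
    }
  }
}
```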
+
+Every event during a trial, including robot movements, wizard button clicks, and sensor readings, is recorded immediately with a precise timestamp. This logging happens automatically, without requiring researchers to instrument their experiments manually. The complete event record enables two critical capabilities: first, researchers can analyze exactly what happened during a trial without relying on memory or handwritten notes; second, the log supports reproducibility by documenting not just what was supposed to happen, but what actually occurred.
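
A minimal version of such a log is an append-only list in which every entry receives a sequence number and a timestamp at the moment it is recorded. The TypeScript sketch below assumes an injectable clock (so that it can be tested) and a JSON Lines export; the class and field names are hypothetical.

```typescript
type LogSource = "protocol" | "wizard" | "robot" | "sensor" | "system";

interface LogEntry {
  seq: number; // strictly increasing, so entries with equal timestamps keep their order
  timestampMs: number;
  source: LogSource;
  type: string;
  data: Record<string, unknown>;
}

class TrialLog {
  private readonly entries: LogEntry[] = [];
  private readonly now: () => number;

  constructor(now: () => number = () => Date.now()) {
    this.now = now;
  }

  // Stamps and appends an entry; the class offers no way to edit or delete one.
  record(source: LogSource, type: string, data: Record<string, unknown> = {}): LogEntry {
    const entry: LogEntry = { seq: this.entries.length, timestampMs: this.now(), source, type, data };
    this.entries.push(entry);
    return entry;
  }

  // Entries in the half-open window [startMs, endMs), e.g. one stretch of video.
  between(startMs: number, endMs: number): LogEntry[] {
    return this.entries.filter((e) => e.timestampMs >= startMs && e.timestampMs < endMs);
  }

  // One JSON object per line, a simple format for archiving a trial.
  toJsonLines(): string {
    return this.entries.map((e) => JSON.stringify(e)).join("\n");
  }
}
```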
+
+\subsection{Local Media Recording}
+
+Video and audio recording during trials must not interfere with the live interaction. To ensure this, recording happens locally in the researcher's web browser rather than streaming data to a remote server in real time. The browser accumulates the video and audio data, then transfers the complete recordings to the server when the trial concludes. This approach prevents network delays or server processing from causing dropped video frames or degraded audio quality during the critical interaction period.
+
+The timestamps when recording starts and stops are logged alongside other trial events, ensuring that when researchers later review the video, they can see exactly what was happening in the experiment protocol at any given moment in the recording.
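
The recording pattern can be sketched as follows. The browser's \texttt{MediaRecorder} API delivers recorded data in chunks through its \texttt{dataavailable} event; the sketch keeps those chunks in memory during the trial and hands the complete set, together with the start and stop times, to a callback only after recording stops. To keep the example self-contained, \texttt{RecorderLike} is a reduced, assumed interface covering only the members used here, and \texttt{onComplete} marks where a real client would upload the data.

```typescript
// Reduced view of the browser's MediaRecorder: only the members used here.
// A real recorder delivers each chunk as a Blob in `event.data`.
interface RecorderLike<Chunk> {
  start(timesliceMs: number): void;
  stop(): void;
  ondataavailable: ((event: { data: Chunk }) => void) | null;
  onstop: (() => void) | null;
}

interface CompletedRecording<Chunk> {
  chunks: Chunk[];
  startedAtMs: number; // to be written to the trial log
  stoppedAtMs: number; // to be written to the trial log
}

class LocalRecording<Chunk> {
  private readonly chunks: Chunk[] = [];
  private startedAtMs = 0;
  private readonly recorder: RecorderLike<Chunk>;
  private readonly now: () => number;
  private readonly onComplete: (rec: CompletedRecording<Chunk>) => void;

  constructor(
    recorder: RecorderLike<Chunk>,
    now: () => number,
    onComplete: (rec: CompletedRecording<Chunk>) => void,
  ) {
    this.recorder = recorder;
    this.now = now;
    this.onComplete = onComplete;
  }

  begin(): void {
    // Chunks stay in browser memory; nothing is sent over the network yet.
    this.recorder.ondataavailable = (event) => {
      this.chunks.push(event.data);
    };
    // Hand off only after the recorder has delivered its final chunk.
    this.recorder.onstop = () => {
      this.onComplete({ chunks: [...this.chunks], startedAtMs: this.startedAtMs, stoppedAtMs: this.now() });
    };
    this.startedAtMs = this.now();
    this.recorder.start(1000); // request a chunk roughly every second
  }

  end(): void {
    this.recorder.stop();
  }
}
```

In a browser, the recorder would be a \texttt{MediaRecorder} created from the camera and microphone stream, and the chunks would typically be combined into a single \texttt{Blob} before upload.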
+
+\section{System Architecture and Data Flow}
+
+\subsection{Separation of Architectural Layers}
+
+HRIStudio's architecture separates the system into three distinct layers, each with a specific responsibility:
+
+\begin{enumerate}
+\item \textbf{User interface layer:} The visual interfaces (Design, Execute, Playback) run in the researcher's web browser. This layer handles user interactions, including clicking buttons, dragging experiment components, and viewing live trial status.
+\item \textbf{Application logic layer:} A server process manages experiment data, coordinates trial execution, authenticates users, and orchestrates communication between the interface and the robot.
+\item \textbf{Data and robot control layer:} This layer has two responsibilities: long-term storage of experiment protocols and trial data, and direct communication with robot hardware.
+\end{enumerate}
+
+This separation provides several benefits. Different parts of the system can evolve independently; for example, improving the user interface does not require changes to robot control logic. The separation also clarifies responsibilities: the user interface should never directly command robot hardware; all robot actions flow through the application logic layer, which can enforce safety constraints and maintain consistent logging.
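
The rule that robot commands pass through the application logic layer can be made concrete with a small server-side gateway: the interface may only request an action, and the gateway checks it against safety limits and logs the decision before anything reaches the robot. The limit shown (a maximum speed) and all names are illustrative assumptions.

```typescript
interface RobotCommand {
  action: string;
  params: Record<string, number | string>;
}

type GatewayResult = { ok: true } | { ok: false; reason: string };

interface SafetyLimits {
  maxSpeed: number; // metres per second; an assumed example limit
}

class RobotGateway {
  private readonly send: (cmd: RobotCommand) => void;
  private readonly log: (line: string) => void;
  private readonly limits: SafetyLimits;

  constructor(send: (cmd: RobotCommand) => void, log: (line: string) => void, limits: SafetyLimits) {
    this.send = send;
    this.log = log;
    this.limits = limits;
  }

  // The only path from a wizard's request to the robot: check, log, then send.
  execute(cmd: RobotCommand, requestedBy: string): GatewayResult {
    const speed = cmd.params["speed"];
    if (typeof speed === "number" && Math.abs(speed) > this.limits.maxSpeed) {
      const reason = `speed ${speed} exceeds limit ${this.limits.maxSpeed}`;
      this.log(`rejected ${cmd.action} from ${requestedBy}: ${reason}`);
      return { ok: false, reason };
    }
    this.log(`sent ${cmd.action} from ${requestedBy}`);
    this.send(cmd);
    return { ok: true };
  }
}
```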
+
+\begin{figure}[htbp]
+\centering
+\begin{tikzpicture}[
+    layer/.style={rectangle, draw=black, thick, minimum width=6.5cm, minimum height=1cm, align=center, text width=6.2cm},
+    arrow/.style={->, line width=1.5pt}]
+    
+    % Layer 1: UI
+    \node[layer, fill=gray!15] (ui) at (0, 3.5) {
+        \textbf{User Interface}\\[0.1cm]
+        {\small Design, Execute, Playback}
+    };
+    
+    % Layer 2: Logic
+    \node[layer, fill=gray!30] (logic) at (0, 1.8) {
+        \textbf{Application Logic}\\[0.1cm]
+        {\small Execution, Authentication, Logger}
+    };
+    
+    % Layer 3: Data
+    \node[layer, fill=gray!45] (data) at (0, 0.1) {
+        \textbf{Data \& Robot Control}\\[0.1cm]
+        {\small Database, File Storage, ROS}
+    };
+    
+    % Arrows
+    \draw[arrow] (ui.south) -- (logic.north);
+    \draw[arrow] (logic.south) -- (data.north);
+    
+\end{tikzpicture}
+\caption{HRIStudio's three-layer architecture separates user interface, application logic, and data/robot control.}
+\label{fig:three-tier}
+\end{figure}
+
+\subsection{Data Flow During a Trial}
+
+The flow of data during a trial illustrates how the architectural layers coordinate:
+
+\begin{enumerate}
+\item A researcher creates an experiment protocol using the Design interface and initiates a trial.
+\item The application server loads the protocol and begins stepping through it, sending commands to the robot and waiting for events (wizard inputs, sensor readings, timeouts).
+\item Every action, both planned protocol steps and unexpected events, is immediately written to the trial log with precise timing information.
+\item The Execute interface continuously displays the current state, allowing the wizard and observers to monitor progress in real time.
+\item When the trial concludes, all recorded media (video, audio) is transferred from the browser to the server and associated with the trial record.
+\item Later, the Playback interface retrieves the stored trial data and reconstructs exactly what happened, synchronized with the video and audio recordings.
+\end{enumerate}
+
+This design ensures comprehensive documentation of every trial, supporting both fine-grained analysis and reproducibility. Researchers can review not just what they planned to happen, but what actually occurred, including timing variations and unexpected events.
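
Because the log records both the planned steps and what occurred, comparing them is straightforward. The sketch below treats both as multisets of action names, so ordering is ignored (a simplifying assumption; a fuller comparison would also use step identifiers and timing). It reports planned actions that never ran and actions that ran without being planned. The function name is hypothetical.

```typescript
interface Deviations {
  skipped: string[]; // planned but never executed
  unplanned: string[]; // executed beyond what the plan called for
}

function compareToPlan(planned: string[], executed: string[]): Deviations {
  // How many more times each planned action is still expected.
  const remaining = new Map<string, number>();
  for (const action of planned) remaining.set(action, (remaining.get(action) ?? 0) + 1);

  const unplanned: string[] = [];
  for (const action of executed) {
    const left = remaining.get(action) ?? 0;
    if (left > 0) remaining.set(action, left - 1);
    else unplanned.push(action);
  }

  // Whatever is still expected was skipped; report it in plan order.
  const skipped: string[] = [];
  for (const action of planned) {
    const left = remaining.get(action) ?? 0;
    if (left > 0) {
      skipped.push(action);
      remaining.set(action, left - 1);
    }
  }
  return { skipped, unplanned };
}
```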
+
+\begin{figure}[htbp]
+\centering
+\begin{tikzpicture}[
+    stage/.style={rectangle, draw, thick, rounded corners, minimum width=3.5cm, minimum height=1cm, align=center, font=\footnotesize},
+    arrow/.style={->, thick, line width=1.3pt}]
+    
+    % Six stages stacked vertically with descriptions inside
+    \node[stage, fill=gray!10] (s1) at (0, 7.5) {1. Design Protocol\\{\scriptsize Researcher creates workflow}};
+    \node[stage, fill=gray!15] (s2) at (0, 6) {2. Load \& Execute\\{\scriptsize System loads and runs trial}};
+    \node[stage, fill=gray!20] (s3) at (0, 4.5) {3. Log Events\\{\scriptsize Actions recorded with timestamps}};
+    \node[stage, fill=gray!25] (s4) at (0, 3) {4. Display Live State\\{\scriptsize Wizard sees real-time progress}};
+    \node[stage, fill=gray!30] (s5) at (0, 1.5) {5. Transfer Media\\{\scriptsize Video/audio saved to server}};
+    \node[stage, fill=gray!35] (s6) at (0, 0) {6. Analyze \& Playback\\{\scriptsize Review data with synchronized media}};
+    
+    % Downward arrows
+    \draw[arrow] (s1.south) -- (s2.north);
+    \draw[arrow] (s2.south) -- (s3.north);
+    \draw[arrow] (s3.south) -- (s4.north);
+    \draw[arrow] (s4.south) -- (s5.north);
+    \draw[arrow] (s5.south) -- (s6.north);
+    
+\end{tikzpicture}
+\caption{Trial data flow: from protocol design through execution and recording, to analysis and playback.}
+\label{fig:trial-dataflow}
+\end{figure}
+
+\section{Implementation Status}
+
+The core architectural components of HRIStudio have been implemented and validated. The framework successfully instantiates the design principles described earlier, demonstrating the feasibility of the approach and highlighting technical challenges to be addressed in future work.
+
+\begin{description}
+\item[User interfaces:] The Design, Execute, and Playback interfaces are operational. The visual design environment supports drag-and-drop construction of experiment workflows.
+\item[Server logic and data management:] The server manages experiment specifications, user authentication, trial session data, and comprehensive event logging.
+\item[Data model:] The hierarchical Study/Experiment/Trial data structures with full event logging infrastructure are implemented and operational.
+\item[Robot communication:] The system successfully communicates with robots through ROS, translating abstract protocol actions into robot-specific commands and receiving sensor data.
+\item[Plugin system:] The plugin architecture for supporting multiple robot platforms is in place, allowing researchers to define new robot capabilities without modifying core system code.
+\end{description}
+
+Components requiring continued development include robust real-time synchronization for complex multi-agent scenarios, comprehensive media playback with full temporal synchronization, and evaluation of the plugin system with diverse robot platforms.
+
+\section{Architectural Challenges and Solutions}
+
+\subsection{Real-Time Responsiveness During Trials}
+
+The Execute interface must maintain responsive communication between the wizard and the robot. Wireless networks and web-based systems can introduce delays that, if not carefully managed, degrade interaction quality or compromise safety. The implementation addresses this in three ways: maintaining persistent connections that avoid the overhead of repeatedly establishing communication; deploying the server on the same local network as the robot to minimize network delays; and anticipating likely next actions to prepare the robot in advance when possible.
+
+\subsection{Synchronizing Multiple Data Sources}
+
+During playback, researchers need to see video, hear audio, and review event logs in synchronization. However, these data sources have different characteristics: video captures 30 frames per second, audio is sampled tens of thousands of times per second (commonly 44.1 or 48~kHz), and event logs record discrete actions at irregular intervals. The implementation uses a common time reference and records precise timestamps for all data, allowing the playback system to align everything accurately regardless of differences in how the data was originally captured.
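
The alignment arithmetic is simple once every timestamp comes from the same clock. For an event logged at time $t_e$ in a recording that started at $t_0$, the offset into the media is $t_e - t_0$; multiplying by the frame rate or the audio sample rate gives the frame or sample to seek to. The TypeScript sketch below assumes 30 frames per second and a 48~kHz sample rate, and that browser and server clocks agree (in practice an offset between them may need to be measured and subtracted). The names are hypothetical.

```typescript
interface RecordingTiming {
  startedAtMs: number; // when recording began, on the shared clock
  videoFps: number; // e.g. 30
  audioSampleRate: number; // e.g. 48000 samples per second
}

// Seconds from the start of the recording to the event, or null if the
// event happened before recording began.
function mediaOffsetSeconds(eventMs: number, rec: RecordingTiming): number | null {
  const offsetMs = eventMs - rec.startedAtMs;
  return offsetMs < 0 ? null : offsetMs / 1000;
}

// Index of the video frame on screen when the event occurred.
function videoFrameAt(eventMs: number, rec: RecordingTiming): number | null {
  const s = mediaOffsetSeconds(eventMs, rec);
  return s === null ? null : Math.floor(s * rec.videoFps);
}

// Index of the audio sample at the moment of the event.
function audioSampleAt(eventMs: number, rec: RecordingTiming): number | null {
  const s = mediaOffsetSeconds(eventMs, rec);
  return s === null ? null : Math.floor(s * rec.audioSampleRate);
}
```

For example, with these rates an event logged 2.5 seconds after recording started maps to video frame 75 and audio sample 120000.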
+
+\subsection{Extensibility Without Fragmentation}
+
+The plugin architecture allows researchers to add support for new robot platforms without modifying HRIStudio's core code. This design separates the evolution of the platform itself from the evolution of robot support: I can improve HRIStudio's core functionality without affecting plugins, and researchers can add new robots without waiting for core platform changes.
+
+However, this separation creates a design challenge: the plugin interface must be flexible enough to accommodate diverse robots, but not so flexible that every robot requires completely custom code. Finding this balance requires validating the plugin design with multiple real robots to ensure the abstraction is appropriate.
+
+\section{Mapping Architecture to Requirements}
+
+The implementation choices described in this chapter directly support the six requirements established earlier:
+
+\begin{description}
+\item[R1 (Integrated workflow):] The unified Design/Execute/Analysis pipeline with shared data models ensures coherent workflows without switching between separate tools.
+\item[R2 (Low technical barrier):] Web-based deployment and drag-and-drop interface design eliminate installation complexity and reduce the learning curve.
+\item[R3 (Real-time control):] Event-driven execution with persistent connections enables responsive, natural human-robot interaction.
+\item[R4 (Automated logging):] Comprehensive event logging captures the complete trial trace automatically, without requiring researchers to add logging code to their experiments.
+\item[R5 (Platform agnosticism):] The plugin architecture allows integration with diverse robot platforms without modifying core system code.
+\item[R6 (Collaborative support):] Multiple team members can simultaneously observe trial execution through shared, synchronized views.
+\end{description}
+
+\section{Chapter Summary}
+
+This chapter has described the key implementation decisions that realize HRIStudio's design principles. Building the system as a web application addresses accessibility by eliminating installation complexity and enabling natural collaboration. Using a single statically typed language throughout the system means that mismatches between components are caught when the software is built rather than during a live trial.
+
+The separation between user interface, application logic, and data storage clarifies responsibilities and allows independent evolution of different system components. The plugin architecture directly addresses platform agnosticism (R5), enabling researchers to add robot support without modifying core code. Event-driven execution preserves natural interaction timing while comprehensive automatic logging satisfies requirement R4 and supports reproducibility. Local media recording ensures high-quality video and audio capture without interfering with live trials.
+
+While core architectural components are operational, continued work remains on optimizing real-time responsiveness for complex scenarios, refining multi-modal playback synchronization, and validating the plugin design with diverse robot platforms.
--- a/thesis/refs.bib
+++ b/thesis/refs.bib
@@ -1,3 +1,11 @@
+@book{Baum1900,
+  title={{The Wonderful Wizard of Oz}},
+  author={Baum, L. Frank},
+  year={1900},
+  publisher={George M. Hill Company},
+  address={Chicago, IL}
+}
+
@article{Lu2011,
  title={{Polonius: A Wizard of Oz Interface for HRI Experiments}},
  author={Lu, David V. and Smart, William D.},
@@ -34,6 +42,14 @@
  publisher={IEEE}
 }

+@inproceedings{Quigley2009,
+  title={{ROS: an open-source Robot Operating System}},
+  author={Quigley, Morgan and Conley, Ken and Gerkey, Brian and Faust, Josh and Foote, Tully and Leibs, Jeremy and Wheeler, Rob and Ng, Andrew Y.},
+  booktitle={ICRA Workshop on Open Source Software},
+  year={2009},
+  url={https://api.semanticscholar.org/CorpusID:6324125}
+}
+
@article{Riek2012,
 author = {Riek, Laurel D.},
 title = {{Wizard of Oz studies in HRI: a systematic review and new reporting guidelines}},
@@ -177,5 +193,33 @@ series = {OzCHI '15}
  doi = {10.1145/3610978.3640741}
 }

+@misc{React2024,
+  title={{React: A JavaScript library for building user interfaces}},
+  author={Meta},
+  year={2024},
+  url={https://react.dev}
+}
+
+@misc{Nextjs2024,
+  title={{Next.js: The React Framework for the Web}},
+  author={Vercel},
+  year={2024},
+  url={https://nextjs.org}
+}
+
+@misc{TypeScript2024,
+  title={{TypeScript: Typed JavaScript at Any Scale}},
+  author={{Microsoft and the TypeScript Community}},
+  year={2024},
+  url={https://www.typescriptlang.org}
+}
+
+@misc{tRPC2024,
+  title={{tRPC: Move fast and break nothing. End-to-end typesafe APIs made easy}},
+  author={Johansson, Alex and others},
+  year={2024},
+  url={https://trpc.io}
+}
+


--- a/thesis/thesis.tex
+++ b/thesis/thesis.tex
@@ -4,6 +4,8 @@
 %\usepackage{graphics}            %Select graphics package
 \usepackage{graphicx}             %
 %\usepackage{amsthm}              %Add other packages as necessary
+\usepackage{tikz}                 %For programmatic diagrams
+\usetikzlibrary{shapes,arrows,positioning,fit,backgrounds}
 \usepackage[hidelinks]{hyperref}  %Enable hyperlinks and \autoref, hide colored boxes
 \begin{document}
 \butitle{A Web-Based Wizard-of-Oz Platform for Collaborative and Reproducible Human-Robot Interaction Research}
Author	SHA1	Message	Date
Sean O'Connor	ad940986c7	Enhance clarity and structure in introduction, background, reproducibility, system design, and implementation chapters; add new references and include TikZ for diagrams	2026-02-23 22:24:41 -05:00
Sean O'Connor	92ef1b7ef0	Refine background and system design chapters; enhance clarity and structure in experiment protocols and trial execution descriptions	2026-02-23 13:32:09 -05:00
Sean O'Connor	a172c6ce0a	Refine introduction and background chapters; enhance clarity and structure in system design section	2026-02-22 22:13:48 -05:00
Sean O'Connor	02c40dde96	post-m04-ch03 edits	2026-02-19 23:50:30 -05:00
Sean O'Connor	5288007c8b	ch02 - merge 2.2 and 2.3	2026-02-19 23:12:33 -05:00
Sean O'Connor	c417f22209	post-m04-ch02 edits	2026-02-19 23:11:07 -05:00
Sean O'Connor	9423fc09b6	post-m04-ch01 revisions	2026-02-19 22:48:32 -05:00