revisions of the revisions

2026-05-08 07:08:55 -04:00 · 2026-04-30 00:19:02 -04:00
parent 51009cd1ce
commit 5d8ef0ce76
9 changed files with 66 additions and 44 deletions
@@ -5,10 +5,10 @@ Having established the landscape of existing WoZ platforms and their limitations

 \section{Sources of Variability}

-\emph{The Reproducibility Problem}, as introduced in Chapter~\ref{ch:intro}, encompasses two related challenges. The first concerns \emph{execution consistency}: whether a wizard reliably follows the same experimental script across multiple trials with different participants, producing comparable robot behavior in each. The second concerns \emph{cross-platform reproducibility}: whether the same experiment can be transferred to a different robot platform with minimal change to the implementing program. Both stem from gaps in current WoZ infrastructure and are examined in this chapter. A third interpretation of the term — independent replication of a published study by researchers at other institutions — is distinct from both and is not what this thesis evaluates. It is also worth noting that execution consistency, as defined here, corresponds to what the measurement literature sometimes calls \emph{repeatability}: the degree to which the same procedure produces consistent results when repeated across multiple trials of the same study.
+\emph{The Reproducibility Problem}, as introduced in Chapter~\ref{ch:intro}, encompasses two related challenges. The first concerns \emph{execution consistency}: whether a wizard reliably follows the same experimental script across multiple trials with different participants, producing comparable robot behavior in each. The second concerns \emph{cross-platform reproducibility}: whether the same experiment can be transferred to a different robot platform with minimal change to the implementing program. Both stem from gaps in current WoZ infrastructure and are examined in this chapter. It is important to note that the term reproducibility may also refer to \emph{allowing independent replications of published studies}; this is not what this thesis evaluates. Execution consistency, as defined here, corresponds to what the measurement literature sometimes calls \emph{repeatability}: the degree to which the same procedure produces consistent results when repeated across multiple trials of the same study.

 In WoZ-based HRI studies, multiple sources of variability can compromise execution consistency. The wizard is simultaneously the strength and weakness of the WoZ paradigm. While human control enables sophisticated, adaptive interactions, it also introduces inconsistency. Consider a wizard conducting multiple trials of the same experiment with different participants.
- Even with a detailed script, the wizard may vary in timing, with delays between a participant's action and the robot's response fluctuating based on the wizard's attention, fatigue, or interpretation of when to act. When a script allows for choices, different wizards may make different selections, or the same wizard may act differently across trials. Furthermore, a wizard may accidentally skip steps, trigger actions in the wrong order, or misinterpret experimental protocols.
+ Even with a detailed script, the wizard may vary in timing, with the delay between a participant's action and the robot's response fluctuating based on the wizard's attention, fatigue, or interpretation of when to act. When a script allows for choices, different wizards may make different selections, or the same wizard may act differently across trials. Furthermore, a wizard may accidentally skip steps, trigger actions in the wrong order, or misinterpret experimental protocols.

 Riek's systematic review \cite{Riek2012} found that very few published studies reported measuring wizard error rates or providing standardized wizard training. Without such measures, it becomes impossible to determine whether experimental results reflect the intended interaction design or inadvertent variations in wizard behavior.