One approach to assess the security of embedded IoT devices is applying dynamic analysis such as fuzz testing to their firmware in scale. To this end, existing approaches aim to provide an emulation environment that mimics the behavior of real hardware/peripherals. Nonetheless, in practice, such approaches can emulate only a small fraction of firmware images. For example, Firmadyne, a state-of-theart tool, can only run 183 (16.28%) of 1,124 wireless router/IP-camera images that we collected from the top eight manufacturers. Such a low emulation success rate is caused by discrepancy in the real and emulated firmware execution environment. In this study, we analyzed the emulation failure cases in a largescale dataset to figure out the causes of the low emulation rate. We found that widespread failure cases often avoided by simple heuristics despite having different root causes, significantly increasing the emulation success rate. Based on these findings, we propose a technique, arbitrated emulation, and we systematize several heuristics as arbitration techniques to address these failures. Our automated prototype, FirmAE, successfully ran 892 (79.36%) of 1,124 firmware images, including web servers, which is significantly (≈4.8x) more images than that run by Firmadyne. Finally, by applying dynamic testing techniques on the emulated images, FirmAE could check 320 known vulnerabilities (306 more than Firmadyne), and also find 12 new 0-days in 23 devices.