Energy efficiency is critical in many IoT applications in which sensors deliver data over wireless links. Duty-cycling has been a major method for reducing energy consumption. One popular form is MAC duty-cycling, in which an MCU commands the RF chip to turn on and off periodically or adaptively. Recently, a growing number of RF chips for IoT have been equipped with PHY duty-cycling, a new capability that lets the RF chip switch itself on and off autonomously, without involving the MCU. These two schemes operate at different layers and have different trade-offs in terms of operating time scale and the amount of energy saved when the RF chip is switched off, both of which depend on the characteristics of the MCU and the RF chip. In this paper, we propose a novel protocol named HD-MAC (Hierarchical Duty-cycling MAC) that hierarchically integrates duty-cycling in the MAC and physical layers. By applying chip-level duty-cycling within the on-periods of MAC duty-cycling, HD-MAC further reduces the radio on-time and thereby improves energy efficiency. To optimize HD-MAC's energy efficiency while meeting a given delay requirement, we formulate an optimization problem and solve it to obtain the optimal parameters in the cross-layer context. We implement HD-MAC on Contiki OS and perform extensive experiments using a real sensor mote, Firefly, with a CC1200 RF chip. We demonstrate that the energy efficiency of HD-MAC is up to 72% higher than that of existing protocols while still satisfying the delay requirement and sustaining similar reliability.
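To make the hierarchical idea concrete, the following is a minimal sketch of how average radio power might be estimated when PHY duty-cycling runs inside the on-window of MAC duty-cycling. It is a toy model under stated assumptions, not the formulation used in this paper: the function name, the parameter names, and all power and duty-cycle values are hypothetical and are not taken from the CC1200 datasheet or from our measurements.

```python
def average_power_mw(mac_duty, phy_duty, p_rx_mw, p_phy_off_mw, p_mac_off_mw):
    """Estimate average radio power under two nested duty cycles.

    Hypothetical toy model (not the paper's formulation):
      mac_duty     - fraction of time the MCU keeps the RF chip in its on-window
      phy_duty     - fraction of that on-window the chip actually spends receiving
      p_rx_mw      - power while receiving
      p_phy_off_mw - power while the chip has switched itself off inside the on-window
      p_mac_off_mw - power while the MCU has switched the chip off
    """
    if not 0.0 <= mac_duty <= 1.0:
        raise ValueError("mac_duty must be in [0, 1]")
    if not 0.0 < phy_duty <= 1.0:
        raise ValueError("phy_duty must be in (0, 1]")
    # Power averaged over the MAC on-window, where the chip cycles by itself.
    on_window_mw = phy_duty * p_rx_mw + (1.0 - phy_duty) * p_phy_off_mw
    # Power averaged over the whole MAC cycle.
    return mac_duty * on_window_mw + (1.0 - mac_duty) * p_mac_off_mw


# Illustrative, made-up values only.
POWER = dict(p_rx_mw=60.0, p_phy_off_mw=1.5, p_mac_off_mw=0.003)

# MAC duty-cycling alone: the chip receives for the entire on-window.
mac_only = average_power_mw(mac_duty=0.05, phy_duty=1.0, **POWER)

# Hierarchical: the chip receives for only a quarter of the on-window.
hierarchical = average_power_mw(mac_duty=0.05, phy_duty=0.25, **POWER)

print(f"MAC only:     {mac_only:.4f} mW")
print(f"Hierarchical: {hierarchical:.4f} mW")
```

In this sketch, setting `phy_duty=1.0` recovers plain MAC duty-cycling, so the two configurations can be compared with the same function. The model deliberately ignores switching overheads and delay, which a real parameter optimization would have to account for.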