Yield assessment is an important tool for farmers in planning crop harvesting and finances. Current yield assessment techniques are labor intensive and hence tend to be expensive. Moreover, they are inaccurate because they are carried out manually and rely on yield sampling. Sonar sensing extracts information about the environment from the returned energy (echoes) of high-frequency sound waves (ultrasound). In robotics, sonar sensors are usually used as proximity sensors. The advantages of sonar for yield estimation are threefold: sound waves can propagate through foliage and detect hidden fruit where sensors that require a line of sight fail; transmitting a multi-spectral signal over a wide frequency range (20-200 kHz) enables classification of plant properties; and sonar provides high-resolution distance measurement. The objective of the current study is to develop an autonomous robot for yield estimation that uses an integrated sonar and vision sensor system to minimize the required human labor, reduce costs, and increase estimation accuracy (to individual-tree resolution and 5-10% error). Experimental results indicate that the presence of pepper fruits on a plant correlates with the amount of energy reflected at 60-80 kHz and 110-115 kHz. The system detected the position and plant properties of rows that were out of the camera's view, the width and direction of the rows, and greenhouse infrastructure. In addition, preliminary analysis showed a distinguishable difference in the signal and the spectrogram between cucumber and pepper plants.
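As a rough illustration of the band-energy result reported above, the following sketch computes the fraction of an echo's 20-200 kHz energy that falls within the 60-80 kHz and 110-115 kHz bands. It is not the study's processing pipeline: the function names, the 500 kHz sampling rate, the plain discrete Fourier transform, and the synthetic test tones are all assumptions made for illustration only.

```python
import cmath
import math


def band_energy(samples, sample_rate_hz, f_lo_hz, f_hi_hz):
    """Sum of squared DFT magnitudes for bins with frequency in [f_lo_hz, f_hi_hz]."""
    n = len(samples)
    energy = 0.0
    # Only the non-negative frequency bins are needed for a real-valued signal.
    for k in range(n // 2 + 1):
        freq_hz = k * sample_rate_hz / n
        if f_lo_hz <= freq_hz <= f_hi_hz:
            coeff = sum(
                x * cmath.exp(-2j * math.pi * k * i / n)
                for i, x in enumerate(samples)
            )
            energy += abs(coeff) ** 2
    return energy


def fruit_band_ratio(samples, sample_rate_hz):
    """Share of the 20-200 kHz energy lying in the 60-80 and 110-115 kHz bands."""
    total = band_energy(samples, sample_rate_hz, 20e3, 200e3)
    if total == 0.0:
        return 0.0
    in_bands = (
        band_energy(samples, sample_rate_hz, 60e3, 80e3)
        + band_energy(samples, sample_rate_hz, 110e3, 115e3)
    )
    return in_bands / total


if __name__ == "__main__":
    fs = 500_000  # assumed sampling rate (Hz); must exceed 400 kHz to cover 200 kHz
    n = 500       # 1 ms of samples, giving 1 kHz frequency resolution
    # Synthetic stand-ins for recorded echoes: pure tones at 70 kHz and 150 kHz.
    echo_in_band = [math.sin(2 * math.pi * 70e3 * i / fs) for i in range(n)]
    echo_out_of_band = [math.sin(2 * math.pi * 150e3 * i / fs) for i in range(n)]
    print(fruit_band_ratio(echo_in_band, fs))
    print(fruit_band_ratio(echo_out_of_band, fs))
```

A ratio of this kind could serve as one input feature to a fruit-presence classifier; any decision threshold would have to be calibrated against recorded echoes from plants with known fruit counts.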