Performance Review of Selected Topology-Aware Routing Strategies for Clustering Sensor Networks

In this paper, cluster-based routing (CBR) protocols for addressing issues pertinent to energy consumption, network lifespan, resource allocation and network coverage are reviewed. The paper presents an in-depth performance analysis and critical review of selected CBR algorithms. The study is domain-specific and simulation-based, with emphasis on the tripartite trade-off between coverage, connectivity and lifespan. A rigorous statistical analysis of the selected CBR schemes is also presented. Network simulation was conducted with the Java-based Atarraya discrete-event simulation toolkit, while statistical analysis was carried out using MATLAB. It was observed that the Periodic, Event-Driven and Query-Based Routing (PEQ) scheme performs better than Low-Energy Adaptive Clustering Hierarchy (LEACH), Threshold-Sensitive Energy-Efficient Sensor Network (TEEN) and Geographic Adaptive Fidelity (GAF) in terms of network lifespan, energy consumption and network throughput.


I. INTRODUCTION
Wireless sensor network (WSN) is a rapidly evolving field of study which has paved the way for innovative solutions in the area of near and remote sensing and real-time event monitoring (Ingelrest et al., 2005;Crnjin, 2011). Contemporary advancements in digital electronics, wireless radio technology, nanotechnology, nano-electromechanical systems (NEMS) technology and digital signal processing (DSP) have facilitated the design of smart dusts, motes and other forms of miniaturized sensor devices (Dixit et al., 2011;Dishongh and McGrath, 2010). These technological leaps and engineering breakthroughs made the idea of sensor networking feasible which led to the proliferation of useful and interesting applications for pervasive computing, strategic surveillance, weather observations, wildlife tracking, and inferno detection and control (Cecilio, 2010;Akyildiz et al., 2002).
WSN can be conceptualized as a number of sensor nodes randomly deployed in a geographical terrain to either form a sparse or dense network (Chen et al., 2011;Oliveira and Rodrigues, 2011). The architecture of these nodes consists of a battery, digital processor, radio unit and sensing modules (Karapistoli et al., 2010;Ullah, 2010). The deployed nodes operate collaboratively in an intra-networking fashion to achieve a global sensing task by forwarding the desired information to the sink or base station (BS) (Jain et al., 2011;Li et al., 2011). The quality of data transmission to the BS is affected by external factors influenced by the prevailing environmental conditions of the deployed area and internal factors associated with limited energy and computational resources of these nodes (Akkaya and Younis, 2005;Al-Karaki and Kamal, 2004).
In addition to this, it must be mentioned that these nodes can be deployed in remote and inaccessible regions where battery recharge or replacement can be extremely difficult (Beutel et al., 2009; Dargie and Poellabauer, 2010). Therefore, the goal of WSN designers is to maximize the battery's lifespan amidst the aforementioned constraints and limiting environmental factors (Schoellhammer, 2010; Ma and Romer, 2014). Studies have empirically demonstrated that data transmission and reception are highly energy-consuming network operations (Salami et al., 2011; Karapistoli et al., 2010). Low duty cycling was proposed as a basic hardware solution, in which the sensor's radio unit is switched ON only when the network is triggered to monitor an event. Software-based approaches for minimizing energy consumption hinge on reducing redundancy in data transmission by employing data filtering, data fusion, data aggregation and data compression techniques (Bello-Salau et al., 2011; Ullah et al., 2010).
Presently, there are a lot of routing algorithms proposed for WSN but the goal of realizing an energy-efficient and optimal routing protocol still remains an open study problem due to the tripartite trade-off between network coverage, connectivity and lifespan (Hou et al., 2005;Akgul et al., 2009;Wang, 2010). The implication of this trade-off is that in an effort to optimize network performance for any one of these factors, there is consequential performance degradation with respect to the two residual factors (Hayajneh and Khasawneh, 2011;Abdelzaher, 2011). This coupled problem is one of the banes and challenges of WSN design.
Topological considerations play a crucial role in WSN design, and numerous studies have demonstrated in practice that topology control is central to optimal routing and node deployment (Beutel et al.; Stavrou and Pitsillides, 2010). In view of this, researchers have proposed cluster-based routing (CBR) protocols to address issues pertinent to energy consumption, network lifespan, resource allocation, network coverage, and connectivity management (Al-Ameen, 2010; Wang et al., 2010; Zhang et al., 2009; Salami et al., 2009; Martirosyan, 2008). Studies have also shown that although CBR algorithms may introduce network creation, configuration and maintenance overheads, CBR protocols still yield relatively better network performance than flat network topologies, especially when performance is quantified based on the aforementioned coverage, connectivity and lifespan nexus (Salami et al., 2011; Martirosyan et al., 2008; Salami et al., 2010). Consequently, this study is a performance review of selected CBR algorithms. The study is domain-targeted and simulation-based, and its originality lies in: i) the special focus on the tripartite trade-off between network coverage, connectivity and lifespan, which is still an open study issue, and ii) a rigorous statistical analysis of selected CBR schemes for deeper insight into the temporal network dynamics. Network simulation was conducted with the Java-based Atarraya discrete-event simulation toolkit demonstrated in (Wightman and Labrador, 2009) [...]
[...] Li et al., 2011). In addition to this, there is the possibility of having free nodes that do not fall into any of the logical clusters in a particular round of network operation (Martirosyan et al., 2008; Salami et al., 2010). Studies have addressed this issue by adopting a randomized round-robin load balancing scheme that ensures equitable distribution of sensing tasks and assignments over the entire network lifespan (Akyildiz et al., 2002; Salami et al., 2009; Eugster et al., 2003).
This enhancement ensures balanced energy consumption, especially for time-critical and long-lifespan applications such as environmental monitoring systems where frequent updates and a continuous data stream are needed (Cecilio, 2010; Martirosyan et al., 2008).
In their simplest form, CBR algorithms adopt a configuration process that forms uniform-sized clusters with the aim of minimizing the distance between cluster heads (CHs) and cluster members (CMs) (Akkaya and Younis, 2005; Al-Karaki and Kamal, 2004). The logical implication of utilizing minimum-distance communication is that the energy needed for data transmission and reception is reduced, which is a key performance goal of WSN (Bello-Salau, 2011; Salami et al., 2011; Hussaini et al., 2012). Therefore, this section provides a concise assessment of selected CBR strategies for WSN.
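The link between communication distance and energy can be illustrated with a minimal sketch of the first-order radio model that is widely used in the LEACH literature. The constants below (50 nJ/bit for the electronics, 100 pJ/bit/m^2 for the amplifier) and the packet size are assumed for illustration; they are not taken from the simulations in this study.

```python
# First-order radio model sketch. The constants are values commonly quoted
# in the LEACH literature and are assumed here for illustration only.
E_ELEC = 50e-9      # J/bit, electronics energy for transmitting or receiving
EPS_AMP = 100e-12   # J/bit/m^2, transmit amplifier energy


def tx_energy(bits: int, distance_m: float) -> float:
    """Energy in joules to transmit `bits` over `distance_m` (d^2 path loss)."""
    return E_ELEC * bits + EPS_AMP * bits * distance_m ** 2


def rx_energy(bits: int) -> float:
    """Energy in joules to receive `bits`."""
    return E_ELEC * bits


if __name__ == "__main__":
    packet_bits = 2000  # hypothetical packet size
    for d in (10, 50, 100):
        cost_uj = tx_energy(packet_bits, d) * 1e6
        print(f"d = {d:>3} m: transmit cost = {cost_uj:.1f} uJ")
```

Under this model the amplifier term grows with the square of the distance, which is why shortening CM-to-CH links reduces the transmission cost.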

A. Low-Energy Adaptive Clustering Hierarchy
Low-Energy Adaptive Clustering Hierarchy (LEACH) is an adaptive and self-organizing CBR technique that reduces the rate of energy dissipation for WSN by adopting a randomized rotation of CHs (Heinzelman et al., 2000). This ensures that the high energy cost of data transmission to the BS is balanced and evenly distributed among all nodes in the network (Heinzelman et al., 2000). LEACH operates in two stages, namely the set-up (or election) stage and the steady (or operation) stage (Salami et al., 2011; Heinzelman et al., 2000). The set-up phase for a given round of network operation entails electing CHs based on the criterion that the random number generated by a candidate node must be less than a specified threshold (Martirosyan, 2008; Heinzelman et al., 2000).
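The election step can be sketched as follows, using the threshold commonly attributed to Heinzelman et al. (2000): T(n) = P / (1 - P (r mod 1/P)) for nodes that have not yet served as CH in the current epoch, and 0 otherwise, where a node becomes CH when its random draw falls below T(n). The function names, the desired CH fraction `p` and the bookkeeping set `recent_heads` are illustrative assumptions, not part of any simulator used in this study.

```python
import random


def leach_threshold(p: float, round_no: int) -> float:
    """Threshold T(n) for a node that is still eligible in the current epoch."""
    epoch_len = round(1 / p)  # an epoch lasts 1/p rounds
    return p / (1 - p * (round_no % epoch_len))


def elect_cluster_heads(node_ids, p, round_no, recent_heads, rng):
    """Return the set of nodes that elect themselves CH in this round.

    recent_heads holds nodes that already served as CH in the current epoch;
    they are ineligible, which corresponds to a threshold of 0.
    """
    threshold = leach_threshold(p, round_no)
    return {n for n in node_ids
            if n not in recent_heads and rng.random() < threshold}


if __name__ == "__main__":
    rng = random.Random(42)  # seeded only to make the sketch repeatable
    heads = elect_cluster_heads(range(100), p=0.05, round_no=0,
                                recent_heads=set(), rng=rng)
    print(sorted(heads))
```

The threshold rises as the epoch progresses, so every node that has not yet served is elected by the last round of the epoch.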
After this election phase, the network enters an advertisement phase where CMs identify their CHs based on the received signal strength (RSS) of advertisement (ADV) packets (Martirosyan, 2008;Heinzelman et al., 2000). The rationale behind this is to group CMs to the nearest CH in order to minimize energy expended for intra-cluster communication (Heinzelman et al., 2000). Afterwards, neighbouring CMs form clusters with their nearest CHs by successfully exchanging acknowledgment (ACK) packets (Akkaya and Younis, 2005;Heinzelman et al., 2000).
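As a rough illustration of the advertisement phase, the sketch below assigns each CM to the CH whose ADV packet was received with the highest RSS. The data layout (a dictionary of RSS readings in dBm per CM) and the function name are hypothetical.

```python
def join_clusters(rss_by_member):
    """Map each CM to the CH whose ADV packet arrived with the highest RSS.

    rss_by_member: {cm_id: {ch_id: rss_dbm}}; CMs that heard no ADV are skipped.
    """
    return {cm: max(readings, key=readings.get)
            for cm, readings in rss_by_member.items() if readings}


if __name__ == "__main__":
    readings = {
        "m1": {"h1": -60.0, "h2": -75.0},
        "m2": {"h1": -80.0, "h2": -55.0},
        "m3": {},  # heard no advertisement in this round
    }
    print(join_clusters(readings))
```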
For the purpose of intra-cluster communication, time-division multiple access (TDMA) is used to allocate time slots to the CMs, while the CHs use data aggregation to reduce the received packets into an encapsulated form for onward transmission to the BS via single-hop communication (Al-Karaki and Kamal, 2004; Heinzelman et al., 2000). After a successful round of data transmission to the BS, the network goes into a reconfiguration phase where new CHs are elected (Salami et al., 2011; Heinzelman et al., 2000). Power-Efficient Gathering in Sensor Information Systems (PEGASIS) was proposed as an extended version of LEACH (Hussaini et al., 2012; Heinzelman et al., 2000).
With respect to lifespan, LEACH incorporates data aggregation and randomized CH rotation into the protocol. Redundant data transmission and energy consumption are thereby respectively minimized (Salami et al., 2011; Heinzelman et al., 2000), which enhances network lifespan.
The LEACH protocol faces connectivity issues due to the implicit assumption of the radio propagation model that each sensor device maintains an altitude of 1 meter above ground level (Bello-Salau, 2011; Heinzelman et al., 2000). However, in practical field scenarios, there are different terrain dynamics, topographical undulations, natural obstructions, and man-made occlusions (Salami et al., 2011; Heinzelman et al., 2000). These artifacts lead to interference, attenuation, multipath effects and signal fading, all of which deleteriously affect network connectivity.
With respect to coverage, this protocol also faces problems when it is utilized in a dense WSN scenario because it relies on single-hop communication, which is inefficient and unreliable for long-range communications (Salami et al., 2009; Heinzelman et al., 2000). Moreover, the non-uniform distribution of CHs means that a portion of the network will have full coverage while other network segments will suffer from poor or no coverage due to the absence of CHs in their neighbourhood (Martirosyan, 2008; Heinzelman et al., 2000).

B. Threshold-Sensitive Energy-Efficient Sensor Network
The Threshold-Sensitive Energy-Efficient Sensor Network (TEEN) was proposed as an energy-efficient solution for time-critical applications where there is a need to observe, capture, and respond to sudden changes in periodic data occurrences from monitored events (Manjeshwar and Agarwal, 2015; Manjeshwar and Agarwal, 2002). In this algorithm, the process of CH election and cluster formation is centrally controlled by the BS (Manjeshwar and Agarwal, 2015). The uniqueness of this protocol is that CHs closer to the BS are assigned higher priority than CHs farther from the BS (Martirosyan, 2008; Manjeshwar and Agarwal, 2015; Manjeshwar and Agarwal, 2002). This prioritized CH arrangement facilitates inter-cluster communication of aggregated data to the BS (Manjeshwar and Agarwal, 2015). In order to control the duty cycle of CMs, CHs use an adaptive MAC scheme by broadcasting hard and soft threshold values within their clusters (Manjeshwar and Agarwal, 2015; Manjeshwar and Agarwal, 2002).
Essentially, hard threshold is the minimum value for a sharp change in the monitored event necessary to trigger the CMs to turn ON their radio units for data transmission (Manjeshwar and Agarwal, 2015;Manjeshwar and Agarwal, 2002). On the other hand, soft threshold sets the minimum value for a gradual change in the sensed attribute necessary to activate and wake up the CMs for data monitoring (Akkaya and Younis, 2005;Manjeshwar and Agarwal, 2015). This MAC scheme helps in reducing data transmissions, especially for the case of events with little or no significant changes in their monitored values (Salami et al., 2011;Manjeshwar and Agarwal, 2015;Manjeshwar and Agarwal, 2002).
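A minimal sketch of the threshold logic is given below. It assumes the commonly described TEEN rule: a node transmits only when the sensed value has reached the hard threshold and differs from the last transmitted value by at least the soft threshold. The class name, the attribute names and the sample readings are hypothetical.

```python
class TeenNode:
    """Sketch of the hard/soft threshold check performed by a CM."""

    def __init__(self, hard_threshold: float, soft_threshold: float):
        self.hard = hard_threshold
        self.soft = soft_threshold
        self.last_sent = None  # value reported in the last transmission

    def should_transmit(self, value: float) -> bool:
        # Below the hard threshold the radio stays OFF.
        if value < self.hard:
            return False
        # First report, or a change of at least the soft threshold.
        if self.last_sent is None or abs(value - self.last_sent) >= self.soft:
            self.last_sent = value
            return True
        return False


if __name__ == "__main__":
    node = TeenNode(hard_threshold=50.0, soft_threshold=2.0)
    for reading in (48.0, 51.0, 51.5, 53.2, 49.0, 56.0):
        print(reading, node.should_transmit(reading))
```

With these illustrative readings, only the values 51.0, 53.2 and 56.0 trigger a transmission; the remaining readings are suppressed, which is how the scheme reduces traffic for slowly varying events.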
With respect to lifespan analysis, TEEN utilizes count timers, scheduling intervals, adaptive MAC scheme and data aggregation. Therefore, redundancy in data transmission is considerably reduced and scarce energy resources are conserved (Salami et al., 2010;Manjeshwar and Agarwal, 2015;Manjeshwar and Agarwal, 2002). These features, in turn, enhance network lifespan.
TEEN suffers from connectivity issues because the radio propagation model is based on the assumption that the distance between nodes is very short (Bello-Salau, 2011; Manjeshwar and Agarwal, 2015). However, this proximity-based sensitivity assumption does not always hold for practical deployment scenarios, which leads to limited communication range and connectivity problems (Manjeshwar and Agarwal, 2015; Salami et al., 2011).
Concerning coverage, TEEN also suffers from problems, especially when CHs are located outside each other's transmission radius (Salami et al., 2009; Manjeshwar and Agarwal, 2015). This means that threshold notifications, network queries and configuration (CFG) packets can easily get lost since there are no corrective measures (reliability and fault-tolerance techniques) for likely cases of persistent collisions and prolonged signal loss (Martirosyan, 2008; Manjeshwar and Agarwal, 2015). This is particularly true for WSN applications where frequent and periodic data readings need to be forwarded to the BS. In such cases, there is a high possibility that sensor readings will not reach the hard and soft threshold levels (Salami et al., 2009; Manjeshwar and Agarwal, 2015; Manjeshwar and Agarwal, 2002). This is tantamount to a coverage shutdown, as data will not be reported to the BS even when there are enough energy and computational resources for data transmission.

C. Geographic Adaptive Fidelity
Geographic Adaptive Fidelity (GAF) is a location-based algorithm which was originally devised for mobile ad hoc networks (MANETs) but later adapted for WSN (Xu et al., 2001). In this protocol, the global positioning system (GPS) is used to estimate and create uniform logical (or virtual) grids over the entire deployment area (Oliveira and Rodrigues, 2011). This makes triangulation, mobility estimation and geographic interpolation feasible since the clusters logically represent real-world geographic locations. In the network discovery phase, the BS assigns a CH for each logical grid containing CMs associated with the same estimated location (Akkaya and Younis, 2005; Xu et al., 2001).
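The virtual grid can be sketched as below. The sketch assumes the sizing rule usually associated with GAF, in which the grid side r satisfies r <= R / sqrt(5) for a nominal radio range R, so that any node in one cell can reach any node in a horizontally or vertically adjacent cell. The function names, the 100 m radio range and the coordinates are hypothetical.

```python
import math
from collections import defaultdict


def grid_side(radio_range_m: float) -> float:
    """Largest cell side that keeps adjacent cells within radio range."""
    return radio_range_m / math.sqrt(5)


def grid_cell(x: float, y: float, side: float) -> tuple:
    """Return the (column, row) index of the virtual grid cell containing (x, y)."""
    return (int(x // side), int(y // side))


def group_by_cell(positions, side):
    """Group node ids by virtual grid cell; positions is {node_id: (x, y)}."""
    cells = defaultdict(list)
    for node_id, (x, y) in positions.items():
        cells[grid_cell(x, y, side)].append(node_id)
    return dict(cells)


if __name__ == "__main__":
    side = grid_side(100.0)  # hypothetical 100 m radio range
    nodes = {"n1": (10.0, 10.0), "n2": (30.0, 40.0), "n3": (50.0, 10.0)}
    print(round(side, 2), group_by_cell(nodes, side))
```

The worst case for two adjacent cells is a pair of nodes at opposite corners of a 2r by r rectangle, which are r * sqrt(5) apart; this is the reason for the sqrt(5) factor.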
However, unlike in the previously mentioned CBR schemes, these CHs do not perform data aggregation and they are not necessarily responsible for data transmission to the BS (Salami et al., 2011; Xu et al., 2001). After the discovery stage, the network enters an active state for an allotted period of time when the radio units of CMs are turned ON in order to monitor and report events (Salami et al., 2010; Xu et al., 2001). Mobility support is achieved by ensuring that each node in a logical grid calculates its time-to-travel (TTT) and broadcasts this information as a TTT packet to its neighbouring nodes. The essence of this is to trigger and wake up one of the neighbouring sleeping nodes before the time stamp on the TTT packet expires.
As regards lifespan, GAF has a sub-optimal energy conservation performance due to the algorithmic complexities and high computational costs associated with the frequent GPS-based estimates, updates and notifications (Salami et al., 2010; Xu et al., 2001). In addition to this, load balancing is not ensured as CHs are not employed for relaying data to the BS (Salami et al., 2011; Xu et al., 2001). As a result, these factors lead to network lifespan deterioration.
With respect to connectivity, GAF incorporates a two-ray ground model together with the GPS-enabled computations, which makes it easier to account for multipath effects, signal fading and losses. These features yield superior performance in terms of connectivity, especially for demanding applications where the WSN needs to be regularly updated and scaled to larger sizes in order to match increasing demands (Salami et al., 2011; Xu et al., 2001).
GAF faces coverage problems for tactical networks deployed in geographical regions and war zones with unfavourable environmental conditions such as flooding, wet forest canopy conditions and other forms of harsh weather (Salami et al., 2009; Xu et al., 2001). In such situations, TTT estimation and mobility support become extremely difficult, if not impossible, because robust preventive measures are not hardcoded into the protocol (Martirosyan, 2008; Xu et al., 2001). Therefore, network queries, CFG packets and other broadcast information used for network setup and coverage maintenance easily get lost.

D. Periodic, Event-Driven and Query-Based Routing
Periodic, Event-Driven and Query-Based Routing (PEQ) is designed and customized to meet the needs of WSN deployed for time-critical surveillance and reconnaissance applications (Boukerche et al., 2005). Its basic data transmission mechanism relies on the hop level of nodes (Martirosyan, 2008; Boukerche et al., 2005). In the discovery phase of this algorithm, the BS broadcasts CFG packets to the network in order to ascertain the shortest distance to the BS (Akkaya and Younis, 2005; Boukerche et al., 2005). As the CFG packet is received and re-forwarded throughout the network, its time stamp, hop level and source address are updated or incremented for the nearest neighbouring nodes (Salami et al., 2011; Boukerche et al., 2005).
It must be mentioned that before updating the content of the CFG packet, each node performs a hop value comparison check that retains only hop levels smaller than the existing value stored in the register of the node (Martirosyan, 2008; Boukerche et al., 2005). This process is repeated until the entire network is configured with shortest distance information. After the discovery phase, the BS broadcasts a subscription (SUB) packet to the entire network (Akkaya and Younis, 2005; Eugster et al., 2003; Boukerche et al., 2005). This allows any node that has detected an event matching the BS interest to subscribe to this request and utilize nearest neighbour multi-hop communication to relay the desired data to the BS (Eugster et al., 2003; Boukerche et al., 2005). In addition to this, the protocol incorporates an ACK-based mechanism for link repair and fault tolerance (Salami et al., 2009; Boukerche et al., 2005). Inter-Cluster Communication-Based Energy-Aware Routing (ICE) was proposed as an enhanced version of the PEQ algorithm (Boukerche and Martirosyan, 2007).
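The discovery phase described above can be approximated by the following sketch, in which the CFG flood is modelled as a breadth-first traversal from the BS and each node keeps a hop level only if it is smaller than the value it already stores. The adjacency-list representation, the function names and the five-node topology are illustrative assumptions; timing, packet loss and the ACK-based repair are not modelled.

```python
from collections import deque


def configure_hop_levels(neighbours, sink):
    """Flood a CFG packet from the sink and record the smallest hop level.

    neighbours: {node_id: [neighbour ids]}. Returns (hop, parent), where
    parent[node] is the neighbour through which the smallest hop level arrived.
    """
    hop = {sink: 0}
    parent = {sink: None}
    queue = deque([sink])
    while queue:
        sender = queue.popleft()
        for node in neighbours[sender]:
            candidate = hop[sender] + 1
            # Hop value comparison: keep only strictly smaller hop levels.
            if candidate < hop.get(node, float("inf")):
                hop[node] = candidate
                parent[node] = sender
                queue.append(node)  # the node re-forwards the updated packet
    return hop, parent


def route_to_sink(node, parent):
    """Follow the stored parents from a node back to the sink."""
    path = [node]
    while parent[path[-1]] is not None:
        path.append(parent[path[-1]])
    return path


if __name__ == "__main__":
    topology = {
        "BS": ["a", "b"],
        "a": ["BS", "c"],
        "b": ["BS", "c", "d"],
        "c": ["a", "b", "d"],
        "d": ["b", "c"],
    }
    hop, parent = configure_hop_levels(topology, "BS")
    print(hop)
    print(route_to_sink("d", parent))
```

Once the hop levels are in place, a node that matches a subscription can forward its data to the stored parent, which yields a minimum-hop path to the BS in this idealized setting.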
With respect to lifespan, PEQ suffers from energy conservation issues due to the frequent flooding and redundant broadcasting of CFG and SUB packets to the network (Salami et al., 2010; Boukerche et al., 2005). In addition to this, CHs are not employed for data transmission to the BS, which makes it difficult to ensure load balancing and well-balanced energy consumption (Salami et al., 2011; Boukerche et al., 2005). Therefore, management of energy resources becomes extremely challenging, especially when the network consists of a large number of mobile nodes.
PEQ also faces connectivity problems due to the assumption of the radio propagation model that inter-nodal distance is very short in order to allow seamless multi-hop transmission (Bello-Salau, 2011; Boukerche et al., 2005). This assumption does not always hold for practical deployment scenarios where: i) deployment is random and not preplanned, and ii) sensitivity of the transceiver is affected by noise, location, and other prevailing environmental conditions (Salami et al., 2011; Boukerche et al., 2005).
Regarding coverage, PEQ incorporates low-latency support and an ACK-based path repair mechanism. Reliability and robustness are thereby ensured, which leads to superior network coverage performance (Salami et al., 2011; Boukerche et al., 2005). The advantage of these functionalities is that important notifications, network queries, CFG, SUB, and data packets sent across the network will have a successful end-to-end delivery (Martirosyan, 2008; Boukerche et al., 2005).

III. DEFINITION OF STATISTICAL METRICS
This section provides a succinct textual and mathematical description of selected statistical metrics for analysing the performance of CBR techniques. The selected statistical measures considered in this study context are cluster centrality (μ), cluster proximity (ψ), clustering coefficient (δ), cluster dissimilarity (λ), and cluster connectivity (τ) (Kolaczyk, 2009;Wang, 2009).

A. Cluster Centrality
This statistical measure quantifies the centrality or importance of a CH in a WSN by measuring how close the CMs are to the CH in any given cluster. The importance of this measure is to ascertain how critical the routing operation of the CH is to the flow of data traffic in the WSN. The normalized form of this measure is mathematically expressed as (Kolaczyk, 2009) [...]
In Eq. (1), N_C is the number of nodes in a given cluster (C) and dist() is a function that evaluates the logical distance between the CH and its CMs.
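For illustration only, the sketch below computes a centrality value under the assumption that this measure follows the standard normalized closeness centrality, that is, the number of CMs (N_C - 1) divided by the sum of the logical distances between the CH and its CMs. This form is an assumption based on the symbols defined above and should not be read as the exact expression used in this study; the hop counts in the example are hypothetical.

```python
def cluster_centrality(ch_to_cm_distances):
    """Normalized closeness of a CH, assuming the standard closeness form.

    ch_to_cm_distances: logical distances (e.g. hop counts) from the CH to
    each of its CMs. Returns 0.0 for a cluster without members.
    """
    total = sum(ch_to_cm_distances)
    n_c = len(ch_to_cm_distances) + 1  # the CH plus its CMs
    return (n_c - 1) / total if total > 0 else 0.0


if __name__ == "__main__":
    print(cluster_centrality([1, 1, 1]))  # all CMs one hop from the CH
    print(cluster_centrality([1, 2, 3]))  # CMs spread over several hops
```

With hop counts as distances, the value is 1.0 when every CM is one hop from the CH and decreases as CMs lie farther away.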

B. Cluster Proximity
This metric measures the extent to which a CH is within the proximity (in terms of shortest path distance) of its CMs. This metric is vital and more informative than μ, as CMs with the shortest path distance to the CH are critical to the data transmission process in order to ensure energy conservation. The normalized form of this metric is mathematically expressed as (Kolaczyk, 2009) [...]
In Eq. (2), prox() is a function that computes the shortest distance between the CH and its CMs.

C. Clustering Coefficient
This statistical metric quantifies the degree (or rate) at which the CMs form a group or cluster around their CH. This metric is very useful for understanding cluster density and relevance of nodal distribution in forming useful clusters for efficient data transmission in the network. The normalized form of this measure is mathematically expressed as (Kolaczyk, 2009)

D. Cluster Dissimilarity
This statistical measure quantifies the degree of overlapping CMs shared between neighbouring logical clusters. This measure is very useful in order to have a deeper insight into how to ensure network load balancing and fair utility among CMs in the WSN. The normalized form of this metric is mathematically expressed as (Kolaczyk, 2009):

E. Cluster Connectivity
This metric estimates the minimum number of hops necessary for CMs for intra-cluster data transmission. This metric gives a technical insight into the effects of multi-hop data transmission on conserving network energy. The normalized form of this measure is mathematically expressed as (Kolaczyk, 2009) [...]
[...] Table 1. [...] Tables 2, 3 and 4.
In the case of S-WSN in Table 2, it is observed that PEQ and GAF show better performance than LEACH and TEEN in terms of μ, ψ, δ and τ. This means that the CMs and CH in a given cluster necessary for energy-efficient data transmission are more likely to be easily and quickly configured and utilized for effective nodal distribution in PEQ and GAF than in LEACH and TEEN. The technical reason for this observation is the fault tolerance, path repair mechanism, mobility support and other algorithmic enhancements incorporated into PEQ and GAF. However, it is observed that LEACH and TEEN exhibit better performance than PEQ and GAF with respect to |λ|. This means that there are relatively fewer cases of redundant data transmission from overlapping clusters, which ensures load balancing and fair utilization of CMs. The technical reason for this is the algorithmic simplicity, randomized round-robin load balancing scheme and other lightweight protocol enhancements used in LEACH and TEEN. The aforementioned trend observed in Table 2 for S-WSN with respect to μ, ψ, δ and τ is also observed in Table 3 and Table 4 for M-WSN and D-WSN, respectively.
In contrast to Table 2, the values obtained in Table 3 and Table 4 show a marked increase, which results from the corresponding increase in the simulation parameters N and A. In addition to this, network performance in terms of |λ| for M-WSN and D-WSN in Table 3 and Table 4 [...] configuration/arrangement) on routing performance and overall network maintenance.

V. RESULTS AND DISCUSSION OF NETWORK SIMULATION
The simulation results for the M-WSN scenario are presented in the following sub-sections. The technical reason for utilizing the M-WSN network scenario is that results obtained from this scenario have been proven to be relatively more practical, realistic, and most importantly, useful and applicable to other network scenarios (S-WSN and D-WSN) within an acceptable margin of accuracy. The performance metrics employed in this study are network lifespan, energy consumption and network throughput, which are well-known and accepted metrics for measuring the performance of CBR algorithms in WSN studies.

A. Network Lifespan
In Fig. 1, it is observed that the PEQ scheme extends the lifespan of the network further than GAF, TEEN and LEACH. It is also observed in the PEQ scheme that there is a prolonged period (from 3000 rounds until the point of network extinction) where very few nodes are in operation. With reference to the point of network extinction, the PEQ scheme shows an improvement of 49.67%, 47.06% and 42.85% in terms of network lifespan over LEACH, TEEN and GAF, respectively. The technical reason for the observed performance of LEACH, TEEN and GAF is that, due to the rapid death of a number of nodes after many rounds of network operation, the CH election process becomes unstable and, as a result, residual nodes have fewer chances of becoming CH. This observed performance buttresses the trade-off between network lifespan and reliable data transmission after a prolonged period of operation with relatively fewer residual nodes.

B. Energy Consumption
In Fig. 2, it is observed that the PEQ scheme consumes more energy than LEACH, TEEN and GAF when there are comparatively few sensor nodes (< 30), but as the size of the WSN grows larger, the PEQ scheme conserves energy better than the other standard clustering routing protocols. With reference to the average energy consumption, the PEQ scheme shows an improvement of 10.28%, 7.56% and 4.67% in terms of energy conservation over LEACH, TEEN and GAF, respectively. The technical reason for this observed trend is that PEQ introduces processing costs, overheads and computational complexities which are energy-consuming. However, by incorporating load balancing, the benefits of PEQ outweigh the associated complexity costs in the long run.

C. Throughput
In Fig. 3, it is observed that the PEQ scheme enhances the network throughput when compared to GAF, TEEN and LEACH. It is also observed in the PEQ scheme that there is a long duration (beyond 3000 rounds until the end of network operation) where throughput is considerably low. With reference to the end of network operation, the PEQ scheme shows an improvement of 54.12%, 46.34% and 39.22% in terms of network throughput over LEACH, TEEN and GAF, respectively. The technical reason for this is that more packets are successfully transmitted to the BS in PEQ as a result of the inbuilt network stabilizing measures, which average out the death rate equally among all competing nodes based on their residual energy.

VI. CONCLUSION
The coupled problem of topology control and energy-efficient routing is still an open, trending, interesting and significant topic of study in WSN. This study investigates the performance of selected CBR algorithms by spotlighting the impact of topology and network configuration on energy consumption patterns and overall network performance. The study is domain-targeted and simulation-based, and its originality lies in: i) the special focus on the tripartite trade-off between coverage, connectivity and lifespan, and ii) a rigorous statistical analysis of selected CBR schemes. Network simulation was conducted with the Java-based Atarraya discrete-event simulation toolkit, while statistical analysis was carried out using the MATLAB statistical package. The selected statistical measures considered in this study are cluster centrality (μ), cluster proximity (ψ), clustering coefficient (δ), cluster dissimilarity (λ), and cluster connectivity (τ). The network performance metrics employed in this study are network lifespan, energy consumption and network throughput. The obtained network simulation and statistical results clearly demonstrate and support the vital role of topology (and network configuration) in energy consumption, routing performance and overall network management and maintenance.