Data Gathering And Dissemination In Wireless

A wireless sensor network is a special kind of ad hoc network that consists of a number of low-cost, low-power, multi-functional wireless sensor nodes with sensing, wireless communication, and computation capabilities [1,2,3]. These sensor nodes communicate over a short range via a wireless medium and collaborate to accomplish a common task, such as environmental monitoring, military surveillance, or industrial process control [3]. Wireless sensor networks have opened up new opportunities to observe and interact with the physical environment around us. They enable us to collect data that was previously difficult or impossible to gather [4]. Although wireless sensor networks offer new ways to obtain information in a variety of applications, irrespective of the nature of the physical environment, extracting data from a sensor network remains a challenging task.

Data dissemination and data gathering are two terms used in sensor networks to describe two categories of data handling methods. Data dissemination is the process by which data and queries for data are routed in the sensor network, whereas data gathering is the transmission of data collected by the sensor nodes to the base stations. Data gathering protocols aim to minimize the energy consumption and delay of the data gathering process [5]. Although the two differ, almost all of the literature refers to both together as routing protocols.

Unlike traditional wireless communication networks such as mobile ad hoc and cellular systems, wireless sensor networks have the following unique characteristics and constraints [3]: high-density sensor node deployment, battery-powered sensor nodes, low memory and processor capacity, self-configurability, unreliable sensor nodes, data redundancy, application specificity, and dynamic topology.
Due to the above characteristics and constraints of wireless sensor networks, extracting data from the network is always a challenge. It is therefore important that the design of protocols for data gathering and dissemination takes these challenges into account. The main design challenges of routing protocols for wireless sensor networks are energy, processing power, and memory. Some of the design challenges discussed in [3, 6] are highlighted below.

Large number of sensor nodes: Since most wireless sensor networks are composed of a large number of sensor nodes, it is very difficult to use an addressing scheme like those of other wireless networks. The traditional IP scheme is not feasible for wireless sensor networks. Moreover, the sensor nodes are deployed at random in hostile environments.

Limited energy capacity: The sensor nodes are battery powered, so they have limited energy. This is the main challenge in designing wireless sensor networks. In practice, a sensor network deployment makes sense only if it can run unattended for months or years without running short of energy [4].

Flow of Data: In contrast to traditional networks, almost all applications of sensor networks require the sensory data from multiple sources to flow towards a single destination node called the sink.

Sensor node locations: Most of the proposed routing protocols assume that the sensor nodes are equipped with a global positioning system (GPS), but in practice it is very difficult to manage the locations of sensor nodes. This becomes more challenging because the topology of a sensor network changes frequently as nodes fail or move out of the coverage area.

Data redundancy: Data collected by various sensor nodes are typically based on a common phenomenon, so the probability of data redundancy is very high. The routing protocol needs to incorporate data aggregation techniques to decrease the number of transmissions.

Application Specific: The sensor networks are application specific.
The requirements of a routing protocol change with the specific application. It is very challenging to design routing protocols that can meet the requirements of all applications.

Scalability: As the size of the network grows, the routing protocols need to be scalable to support the addition of sensor nodes. Not all sensors necessarily have the same energy, processing, sensing, and communication capabilities. These differences should be taken into account when designing routing protocols.

In addition to the above parameters, the design of routing protocols for wireless sensor networks also needs to consider the following points [6]: Node deployment

Related work

As wireless sensor networks find use in more and more application areas, interest in the field keeps growing, leading to the continual emergence of new architectural techniques. Wireless sensor networks are widely considered one of the most important technologies of the 21st century [8]. In this section we highlight how our survey differs from similar surveys done previously in this area. We also describe the scope of our work and the readers who will benefit from it.

A similar survey of routing protocols for wireless networks was carried out in [2]. However, [2] was published some five years ago, and many newer protocols are not covered. Although [3] covers almost all the routing protocols for wireless sensor networks, it does not provide in-depth insight into them. That survey suits readers interested in a broad overview of the area. The goal of [8] is to give a comprehensive survey of routing techniques focusing on mobility issues in sensor networks, and it does not cover all the routing protocols for wireless sensor networks. In this survey, we present a comparative study of wireless sensor network routing protocols, bringing out their differences and similarities.
We also discuss the advantages and disadvantages of the different protocols for use in different applications of wireless sensor networks. This survey should be useful both for introductory readers and for aspiring researchers who would like a comprehensive picture of the current state of the art in data gathering and dissemination techniques for wireless sensor networks. We follow [3] in classifying the routing protocols into categories, although we add some protocols that are not covered by [3]. We also exclude the multipath-based protocol category, since it falls under the data-centric category. Table 1 shows the categories of wireless sensor network routing protocols, inspired by [3]. The representative protocols marked with (*) are our additions.

Table 1: Routing Protocols for WSNs

Category of Protocols: Representative Protocols
Location-based Protocols: MECN, SMECN, GAF, GEAR, Span, TBF, BVGF, GeRaF
Data-centric Protocols: SPIN, Directed Diffusion, Rumor Routing, COUGAR, EAD, ACQUIRE, Information-Directed Routing, Gradient-based Routing, Energy-aware Routing, Quorum-based Information Dissemination, Home Agent-based Information Dissemination, *Flooding, *Gossiping
Hierarchical-based Protocols: LEACH, PEGASIS, HEED, TEEN, APTEEN
Mobility-based Protocols: SEAD, TTDD, Joint Mobility and Routing, Data MULES, Dynamic Proxy Tree-based Data Dissemination, *MDC
Heterogeneity-based Protocols: IDSQ, CADR, CHR
QoS-based Protocols: SAR, SPEED, Energy-Aware Routing

Data-Centric Protocols

Routing protocols can be divided into two categories, data-centric and address-centric. Address-centric routing protocols find the shortest path between the source and the destination using an addressing scheme such as IP, whereas data-centric routing protocols focus on finding routes from multiple source nodes to a single destination node.
In sensor networks, data-centric routing is preferred, with intermediate nodes consolidating and aggregating the data coming from multiple sources before sending it to the sink node. This saves energy by preventing redundant data transmissions. In this section, we highlight some representative data-centric routing protocols proposed for wireless sensor networks.

Flooding: Flooding [5] is a data dissemination method in which each sensor node that receives a packet broadcasts it to its neighboring nodes, provided that the node itself is not the destination of the packet. This process continues until the packet arrives at the destination or the maximum hop count for the packet is reached. Although flooding is simple and easy to implement, it suffers from problems such as implosion (duplicate messages sent to the same node) and overlap (nodes sensing the same region send similar data to the same neighbor) [2]. Figures 1 and 2, reproduced from [2], show the implosion and overlap problems in flooding.

Gossiping: Gossiping [5] is based on flooding, but a node that receives a packet forwards it only to a single randomly selected neighbor. It avoids the implosion problem of flooding and does not waste as many network resources as flooding. However, gossiping is not a reliable data dissemination method: since the neighbor is selected at random, some nodes may not receive the message at all. Moreover, it introduces a delay in the propagation of data through the nodes [2], since every node that forwards or sends data must first select a neighbor.

SPIN: Sensor Protocols for Information via Negotiation (SPIN) [9, 10] aims to address the implosion and overlap problems of the classic flooding protocol. The SPIN protocols are based on two key mechanisms, namely negotiation and resource adaptation [3]. SPIN uses three types of messages [5]: ADV, REQ, and DATA. A sensor node that has collected data sends an ADV message containing high-level descriptors, or meta-data, describing the actual data.
The actual data is transmitted only when a REQ message is received from an interested node. This negotiation mechanism avoids the overlap and implosion problems of classic flooding, because an interested node sends the REQ message only when it does not already have the data. Fig. 3, redrawn from [5], shows how these three messages are exchanged, and Fig. 4, inspired by [9] and reproduced from [11], shows in more detail how SPIN works.

There are four versions of the SPIN protocol [6, 9, 10]: SPIN-PP, SPIN-BC, SPIN-EC, and SPIN-RL. Both SPIN-PP and SPIN-BC work under ideal conditions, where energy is not a constraint and packets are never lost. SPIN-PP tackles the data dissemination problem using point-to-point media, whereas SPIN-BC uses broadcast media. The other two protocols are modified versions of SPIN-PP and SPIN-BC designed for networks that are not ideal. SPIN-EC is SPIN-PP with an additional energy conservation capability. Under SPIN-EC, a node participates in data dissemination only when it computes that it has enough energy. If the node has plentiful energy, it behaves the same as SPIN-PP, with its 3-stage handshake. SPIN-RL is a version of SPIN-BC that tries to recover from losses in the network by selectively retransmitting messages.

In SPIN, topological changes are localized, since each node needs information only about its immediate one-hop neighbors. However, this type of protocol cannot be used in applications where reliability is a major concern, such as forest fire and intrusion detection, since it does not guarantee data delivery [2]. If the nodes that are interested in the data are located far away and the intermediate nodes are not interested, the ADV message will not reach the interested nodes, and they in turn will not be able to get the data.

Directed Diffusion: Directed Diffusion [12] consists of elements such as interests, data messages, gradients, and reinforcements.
The main objective of the protocol is to use a naming scheme to reduce energy usage by avoiding unnecessary routing operations. An interest is a query or interrogation describing what the user wants, and it contains the description of a sensing task. Data is the collected or processed information about a physical phenomenon, named using attribute-value pairs. A gradient is a link to the neighbor from which the interest was received, and it is characterized by a data rate, duration, and expiration time derived from the fields of the received interest [2].

A node, usually the sink, requests data by broadcasting an interest, which diffuses through its neighbors. The interests are periodically refreshed by the sink. When intermediate nodes receive the interest, they cache it for future use, perform in-network data aggregation, or direct the interest based on previously cached data. The source node sends the data back along the reverse path of the interest. When nodes receive data, they compare it with the interests in their cache. Data that matches an interest is drawn and sent back along the path by which the interest was received. Out of the several paths between the sink and the source, the network selects one by reinforcement. Once this path is selected, the sink resends the original interest with a smaller time interval, so that the source node on the selected path sends data more frequently.

Although directed diffusion has the advantage of supporting in-network data aggregation and caching, which saves energy, the protocol is not applicable to all applications of wireless sensor networks. It can only be applied to applications that are query driven. It is not suitable for applications such as forest fire detection or intrusion detection. Fig. 4, copied from [12], shows the working of the protocol.
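The interest and gradient mechanism described above can be illustrated with a minimal Python sketch. It is not the implementation from [12]: it assumes that each node keeps only one gradient (toward the neighbor from which it first heard the interest, standing in for the reinforced path), that matching is plain equality on attribute-value pairs, and that the topology and the names diffuse_interest, deliver, and matches are hypothetical.

```python
from collections import deque

def matches(interest, data):
    """True if the data carries every attribute-value pair named in the interest."""
    return all(data.get(attr) == value for attr, value in interest.items())

def diffuse_interest(neighbors, sink):
    """Flood an interest outward from the sink.

    Simplifying assumption: each node records a single gradient, pointing
    at the neighbor from which it first heard the interest.
    """
    gradient = {sink: None}           # the sink has no upstream neighbor
    queue = deque([sink])
    while queue:
        node = queue.popleft()
        for nb in neighbors[node]:
            if nb not in gradient:    # the first copy of the interest wins
                gradient[nb] = node
                queue.append(nb)
    return gradient

def deliver(gradient, source, interest, data):
    """Return the hop-by-hop path taken by matching data, or None if it does not match."""
    if not matches(interest, data):
        return None
    path = [source]
    while gradient[path[-1]] is not None:
        path.append(gradient[path[-1]])   # follow the gradient toward the sink
    return path

# Hypothetical five-node topology with two candidate paths to the sink.
neighbors = {
    "sink": ["a", "b"],
    "a": ["sink", "c"],
    "b": ["sink", "c"],
    "c": ["a", "b", "src"],
    "src": ["c"],
}
interest = {"type": "temperature"}
gradient = diffuse_interest(neighbors, "sink")
path = deliver(gradient, "src", interest, {"type": "temperature", "value": 31})
```

In this sketch, data that does not match the interest is never forwarded, which mirrors the query-driven nature of the protocol noted above.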
Rumor Routing

Rumor routing [13], another variation of Directed Diffusion, aims to direct queries to the nodes that have observed an event rather than flooding the entire network [2]. It is a logical compromise between query flooding and event flooding [3]. The protocol is only useful if the number of queries relative to the number of events lies between the two intersection points (see Fig. 5, redrawn from [13]).

The rumor routing algorithm introduces an agent, which is a long-lived packet. An agent, which carries an event table like the nodes do, travels the network propagating information about local events to distant nodes. The agent informs the nodes it encounters of any events it has observed on its way, and at the same time it synchronizes its event table with the event table of each encountered node. An agent travels the network for a certain number of hops and then dies. All the nodes, as well as the agent, maintain an event table that holds event-distance pairs, as shown in Fig. 6, copied from [13]. When a node generates a query for an event, the nodes that know the route can respond to the query by referring to their event tables [2]. In this way, flooding the whole network is avoided. Directional rumor routing, proposed in [14], tries to improve latency and energy consumption by propagating queries and events in straight lines instead of the random walk used in normal rumor routing.

Cougar

Cougar [15, 16] is a database approach for tasking sensor networks through declarative queries. Since in-network computation is much cheaper than transmission and communication between nodes, the Cougar approach proposes a loosely coupled distributed architecture that supports both aggregation and in-network computation. This reduces energy consumption and thereby increases network lifetime. The architecture introduces a query proxy layer in each sensor node, which interacts with both the network layer and the application layer.
The gateway node (where the query optimizer is located) generates a query processing plan after receiving queries from the sensor nodes. The query plan specifies both the data flow between sensor nodes and the in-network computation plan at each individual sensor node. It also specifies how to select a leader for the query. The query plan can be viewed at a non-leader node and at the leader node. Fig. 7 and Fig. 8, redrawn from [15], show the query plan at a non-leader node (source sensor) and at the leader node, respectively. Although Cougar provides a way to interact with the sensor nodes independently of the network layer, the insertion of a proxy layer at each sensor node introduces extra overhead for the node in terms of memory and energy consumption [2]. Additional delay may also be incurred when a relay node waits for packets from other nodes in order to aggregate them before sending to the leader node.

ACQUIRE

ACQUIRE [16] is a data-centric routing protocol aimed at large distributed databases. It targets complex queries comprising several sub-queries that are combined by conjunctions or disjunctions in an arbitrary manner. The sink sends an active query packet into the network, and the packet follows either a random path or a predefined or guided path. Each node that receives the active query packet uses the information it has stored to partially resolve the query. If a node does not have up-to-date information, it gathers it from its neighbors within a distance of d hops, where d is the look-ahead parameter. When the active query has been resolved completely, the response is sent back to the node that issued the query. The protocol makes the following assumptions [17]: the sensors have the same transmission range, are laid out uniformly in a region, are stationary, and do not fail.
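The role of the look-ahead parameter d can be illustrated with a minimal Python sketch. It is not the algorithm from the ACQUIRE paper: it assumes a predefined path, that every node on the path refreshes its information from all nodes within d hops, and that each sub-query is answered by a single stored reading. The function names, the line topology, and the readings are hypothetical.

```python
from collections import deque

def within_hops(neighbors, start, d):
    """Return the set of nodes at most d hops from start (including start)."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if dist[node] == d:            # do not expand beyond the look-ahead range
            continue
        for nb in neighbors[node]:
            if nb not in dist:
                dist[nb] = dist[node] + 1
                queue.append(nb)
    return set(dist)

def acquire(neighbors, readings, path, subqueries, d):
    """Forward an active query along path, resolving sub-queries as it goes.

    Returns (answers, hops), where hops is the number of forwarding steps
    taken before every sub-query was resolved (or the path ran out).
    """
    answers = {}
    for hops, node in enumerate(path):
        for peer in within_hops(neighbors, node, d):
            for key in subqueries:
                if key not in answers and key in readings.get(peer, {}):
                    answers[key] = readings[peer][key]
        if len(answers) == len(subqueries):
            return answers, hops       # fully resolved, response can go back
    return answers, len(path) - 1

# Hypothetical line topology n0 - n1 - ... - n5 with two stored readings.
names = [f"n{i}" for i in range(6)]
neighbors = {n: [m for m in (names[i - 1] if i else None,
                             names[i + 1] if i < 5 else None) if m]
             for i, n in enumerate(names)}
readings = {"n1": {"temp": 21}, "n4": {"humidity": 40}}

result_d1 = acquire(neighbors, readings, names, ["temp", "humidity"], d=1)
result_d2 = acquire(neighbors, readings, names, ["temp", "humidity"], d=2)
```

In this toy setting a larger d resolves the query after fewer forwarding steps, at the price of each node contacting more neighbors to refresh its information.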
EAD: Energy-Aware Data-Centric Routing

Energy-Aware Data-Centric (EAD) routing [18] aims to construct a virtual backbone containing all active sensors, which is responsible for in-network data processing and for relaying traffic. The sensor network is represented by a broadcast tree rooted at the gateway and spanning all the sensors, with a large number of leaf nodes. To conserve power, the radios of the leaf nodes, which are not part of the virtual backbone, are turned off, while the nodes in the virtual backbone stay active to relay traffic. The protocol tries to construct a broadcast spanning tree with the maximum number of leaf nodes so that as much energy as possible can be conserved. The concept of EAD combines neighboring broadcast scheduling with distributed competition among neighbors based on residual energy [18]. The protocol is more efficient when the network is small. When the network is large, the execution time grows, since the execution process propagates from the sink to the whole network.

Other protocols, such as the one proposed by Shah and Rabaey [19], also aim at increasing network lifetime. They use network survivability as the main metric and propose choosing one of multiple paths with a certain probability, so that the lifetime of the whole network increases. However, the protocol assumes that each node is addressable through some addressing scheme.

Information-Directed Routing

Location-based Protocols

Since sensor nodes have limited energy capacity, most routing protocols aim to reduce the energy consumed in the routing process. In most of these protocols, the locations of the sensor nodes are used to find the distance between two communicating nodes in order to find the best possible path with low energy usage.
If the location of a particular sensor node is known, a query can be sent to that location only, without being sent to other regions, which significantly reduces the number of transmissions [2]. Location-based protocols use position information to relay data to the desired region rather than to the whole network. In this section, we describe some of the location-based routing protocols proposed for wireless sensor networks.

Minimum Energy Communication Network (MECN):

Hierarchical-based Protocols