My primary areas of interest include Algorithms, Networks and Data Mining.
Network design problems occur naturally at many places. For example, a bank may want to install a number of ATMs in a city. Each location in the city is associated with an installation cost. The bank would be interested in identifying the locations to install these ATMs in such a way that the total installation cost plus the cost to service its clients is minimized. This is a typical example of a facility location problem where the ATMs serve as the facilities. Many similar examples can be quoted for the problem such as setting up of fire-brigade stations. In case of the problem of setting up of fire-brigade stations, one may also be interested in connecting these stations so that in case one station does not have a vagon to satisfy a client, the same can be obtained from a nearby station. In such a case, one would be interested in identifying the locations where in addition to the installation cost and service cost, the cost of connecting the fire-brigade stations is also taken into account. This is an example of a \lq connected \rq facility location problem.
In another variation of fire-brigade problem, one may specify the maximum number of vagons available at a particular station. This gives rise to what is known as \lq capacitated \rq facility location problem.
These problems are known to be NP-hard. I am working on developing approximation algorithm for these and other network-design problems like Rent-or-Buy. Various approaches like \lq local search \rq and \lq primal-dual \rq have been used for the purpose.
Clustering is a problem which finds application every where in data analysis. Traditional Clustering problems try to optimize the distances from the center/mean of the cluster. They do not take the connectedness of the points under consideration. Connected k center and connected k means are the problems in which one requires that given some relationship data amongst the objects, the objects within a cluster are related. This makes the problem harder. We are looking at the problem of connected k means
Ad-hoc networks have been proposed to support scenarios where no wired infrastructure exists. They can be set up quickly where the existing infrastructure does not meet application requirements for reasons such as security, cost, or quality. Examples of applications for ad hoc networks range from military operations, emergency disaster relief to community networking and interaction between attendees at a meeting or students during a lecture.
In Mobile Ad hoc Networks (MANET) each node has limited wireless transmission range, so the communication depends on the cooperation of intermediate nodes. Most routing protocols in ad hoc networks rely on implicit trust-your-neighbor relationship to route packets among participating nodes. Lack of infrastructure, central controlling authority and the properties of wireless links make Mobile Ad hoc Networks vulnerable to threats in security. Attacks range from passive eavesdropping in which the attacker may get access to secret information thereby violating the confidentiality to active impersonation, message replay, and message distortion. Attacks may be by an external source which is not a part of the network and hence does not have valid signatures or could be from a compromised node within the network. Chances of a node being compromised in a hostile environment (e.g., a battlefield) with relatively poor physical protection are non-negligible. Therefore, we should not only consider attacks from outside a network, but also take into account the attacks by compromised nodes within the network. Since the external attackers do not have valid digital signatures, erroneous routing information can be identified using cryptographic schemes. However, an erroneous message signed by a compromised node cannot be distinguished from a correct message from a non-compromised node using digital signatures.
Several types of attacks on ad hoc networks have been discussed in literature. Some of these (blackhole or grey holes attack, rushing attack, wormhole attacks) cripple the network by disrupting the route of the legitimate packets while others (flooding attack) inject too many extra packets in the system thereby consuming system resources like bandwidth, memory/computational power of nodes. We are interested in devising methods to protect the routing protocols in ad hoc networks against various types of attacks or reduce the impact of the attacks.
Data Mining and Bio-informatics
One of the stages in the drug discovery is to discover the set of genes responsible for the disease. With the help of microarray experiments biologists are able to study the expression of thousands of genes under a large number of conditions imultaneously. The large scale of the data makes it challenging to analyze it to extract any biologically significant information from it. The output of a microarray experiment is the gene expression data. The gene expression data has the expression of thousands of
genes under thousands of conditions. One approach to reduce the complexity of the task is to group the genes showing similar expression patterns into clusters wherein each cluster is regulated by one or more transcription factors and is responsible for one biological function.
Standard Clustering algorithms like k-means clustering work well for small data sets but fair poorly when the number of experimental conditions is large as they cluster the genes based on their expressions under all the conditions. However, the genes and hence the cellular processes for which they are responsible, are generally affected by a small subset of conditions. Most of the other conditions which do not contribute to the cellular process add to the background noise. Hence we are interested in clustering genes as well as that small subset of conditions under which they are co expressed. Such clusters are called bi-clusters or Transcription Module in the language of biologists. The name TM is derived from the fact that each set of genes is expected to be affected by one or more transcription factors. As a gene may be responsible for several cellular activities it may be included in more than one bi-clusters. Having clustered genes of similar nature, next question is what makes them behave similarly. Thus one is interested in finding patterns in their genetic composition. These patterns are called motifs. Several algorithms are known to discover motifs in sequences. Earlier, people believed that only one TF was responsible for the similar expression pattern of several genes. Recently, people have started exploring the possibility of several motifs being responsible for the similarity, and some positive results have been obtained to this effect. Thus an interesting question evolves: If we know the set of motifs occurring on the promoter regions of a set of genes and their locations too, can we find out certain patterns (some combinations of these motifs) which may be causing the genes to behave similarly.
Research GuidancePh.D. Completed
- Rahul Johari (Jointly with Dr. Sandhya Khurana)
- Geeta Gupta
- Parvaneh Mansouri (Jointly with Dr. Babak Asady)
- Sonika Arora (Jointly with Dr. Yogish Sabharwal)
- Aditya Pancholi (Jointly with Dr. Sambuddha Roy) (DoReg: May 2014)
Research Projects (Major Grants/Research Collaboration)
- Principal Investigator, University Research Grant for project titled "Analysis of Gene Expression Data" 2007-March 2010, completed.
- Principal Investigator, University Research Grant for project titled "Analysis of Gene Expression Data using Bi-Clustering Ensemble Technique" March 2010 - March 2013. Completed.
- Principal Investigator, University Research Grant for project titled "Approximation Algorithms for Replica Placement Problem" October, 2014 onwards.
- Approximation Algorithms for Replica Placement Problem: Anshul Aggarwal, Sachin Sharma (Jointly with Yogish Sabharwal)
- Approximation Algorithms for Replica Placement Problem : Kanika Gupta, Parul Garg, Ankur Sachdeva, Nirbhay (jointly with Yogish Sabharwal)
- Developing software for UG Admissions: Manisha, Avni, Tanu, Kirti, Mayank (Jointly with Aditya Pancholi)
- Developing software for Center Placement Cell: Navdeep and Shailendra (Jointly with Aditya Pancholi)
- Developing Biclustering Ensemble Tool: Navdeep and Shailendra.
- Approximation Algorithms for Replica Placement Problem: Stuti Chawla (jointly with Yogish Sabharwal)
- Approximation Algorithms for Interval Scheduling Problem: Mohit Narula (jointly with Sambuddha Roy)
- Approximation Algorithms for Data Placement problems : Sakshi, Archita
- Approximation algorithms for scheduling problems : Neha Bansal, Vijaya Goel, and Manish Dawar
- Approximation algorithms for Multi Organisation Scheduling problems : Neha Lawaria and Pankaj Kumar
- Routing in Delay Tolerant Networks : Garima and Sonia
- Efficient Routing in Delay Tolerant Networks : Divya Gaur and Surbhi Tripathi
- Ensemble techniques for bi-clustering problem : Deepika Bisht and Soniya Verma
- Approximation Algorithms for Facility Location Problem : Sapna Grover and Apurv Milind
- Approximation Algorithms for Data placement problem : Swati Singhal
- Approximation algorithms for clustering problems : Aditya Pancholi
- Parallel SAT : Ashish Mann and Vashita Arora
- Multi-core SAT : Amrita Goel and Hunarr Pahwa
- Routing in ad hoc networks : Megha Gupta and Simmi Singhal
- Subsequence Mining : Neha Jain and Leena Singhal
- Bi-clustering : Ishan Qureshi, Surbhi Bajaj
- Pattern Discovery Algorithms allowing Wildcard Characters : Ashish Juneja and Savneet Kaur
- Pattern Discovery Algorithms with Mismatches : Pooja Sinha and Shweta Mehra
- Implementation of Parallel Algorithms for Data Mining (jointly with Dr. Vasudha Bhatanagar) : Ashish Mangla and Ashwani Garg
- Implementation of Algorithms for identifying modules in Biological Networks.
- Implementation of Parallel Algorithms for Matrix Computations : Kapil and Tushir Aggarwal
- Implementation of Parallel Algorithms for Graph Problems.
Master's ProjectsProjects Under Progress Currently
2013 - 2014
2012 - 2013
2011 - 2012
2010 - 2011
2009 - 2010
2008 - 2009
2006 - 2007
2005 - 2006