  1. 1. Genealogy Tree: An Academic Lineage for Authors and Their Advisors, Siblings and Students
     Snehanshu Saha, Gouri Ginde, Sourav Poddar, Sandra Anil, Saijal Shrivastava, Somya Bansal, Archana Mathur, Harika Samala, Namita Chaukimath, Shobhit Kumar
     October 27, 2016
     Abstract
     A genealogy tree gives information about a researcher and their scholastic lineage, which is of paramount importance in today's world of computer technology. Gaining insight into academic genealogy can help PhD students and early-career academics achieve academic socialization within their discipline by making explicit the connections that may be influential. Awareness of their scientific heritage gives users a broader perspective on their own research projects. This paper puts forth a software model that creates the genealogy tree of any academician. The software is intended to become a reference tool, made more reliable through the contributions of its users.
     1 INTRODUCTION
     Genealogy is an account of the descent of a person, family or group from an ancestor or from older forms. It is the study of the history of the past and present members of a family or families. Historical records are used for genealogical research. The ideal sources are original records, mostly primary or firsthand information, and the conclusions that can be drawn from them. Source citation is also important when conducting genealogical research. Academic genealogy traces the mentoring relationships of doctoral students. A genealogy tree can be formulated on this basis, where a student is considered a child with their adviser as the parent. A student can have multiple advisers. Students of the same adviser belong to a common sibling network in the tree. The tree traces the academic pedigree of each entity in it. Over the years the number of people pursuing a PhD has increased, leading to an exponential growth of the academic genealogical tree.
     With this rising number, keeping track of and documenting the scholastic relationships between scientists has become difficult. An attempt in this direction is the Mathematics Genealogy Project, which is run in association with the American Mathematical Society. Its objective is to catalog the complete mathematics community. It gives information about an author, their ancestry and lineage in the tree, along with their dissertation and the year the degree was awarded. This paper puts forth a similar software model for the discipline of Computer Science. It would hold information about all the scientists who have contributed to the field at research level. The database is built from the contributions of scientists, who enter details such as their dissertation and the year and institute of their degree. This database is then searched based on user input. The tree obtained can be based on one of two criteria: author or domain. The tree based on author describes the author's heritage and descendants. Details about the author's degree are also provided in this genealogical tree. Computer Science can be perceived as an umbrella housing a large number of domains, each with multiple research areas within it. The domain-based tree traces the complete hierarchy of scientists who have contributed to a domain.
     2 COMMUNITY DETECTION MODEL
     In this section we discuss the concepts used for detecting communities among authors by calculating citations. We also discuss the different cases encountered during the process of community detection.
     ∗ This work was supported by PES Institute of Technology Bangalore South and Indian Institute of Technology Patna in the form of funding for research associate Gouri Ginde.
     † The authors are affiliated with the Faculty of Computer Science and Engineering and the Center for Applied Mathematical Modeling and Simulation (CAMMS), PESIT South Campus, Bangalore, India.
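The adviser/student structure described in the introduction (a student is a child of each adviser, a student may have several advisers, and students sharing an adviser are siblings) can be sketched as a pair of adjacency maps. This is a minimal illustration only, not the paper's software model: the class name `GenealogyTree`, its methods, and the people in the example are all hypothetical. Because a student may have several advisers, the structure is strictly a directed acyclic graph rather than a tree.

```python
from collections import defaultdict


class GenealogyTree:
    """Hypothetical sketch: adviser/student relations as two adjacency maps."""

    def __init__(self):
        self.advisers = defaultdict(set)  # student -> set of advisers
        self.students = defaultdict(set)  # adviser -> set of students

    def add_relation(self, adviser, student):
        self.advisers[student].add(adviser)
        self.students[adviser].add(student)

    def siblings(self, person):
        """Everyone who shares at least one adviser with `person`."""
        found = set()
        for adviser in self.advisers.get(person, ()):
            found |= self.students[adviser]
        found.discard(person)
        return found

    def _reach(self, person, edges):
        """Collect every node reachable from `person` along `edges`."""
        seen, stack = set(), list(edges.get(person, ()))
        while stack:
            node = stack.pop()
            if node not in seen:
                seen.add(node)
                stack.extend(edges.get(node, ()))
        return seen

    def ancestors(self, person):
        return self._reach(person, self.advisers)

    def descendants(self, person):
        return self._reach(person, self.students)


# Made-up people, for illustration only.
tree = GenealogyTree()
tree.add_relation("Adviser P", "Student Q")
tree.add_relation("Adviser P", "Student R")
tree.add_relation("Adviser S", "Student R")  # R has two advisers
tree.add_relation("Student Q", "Student T")  # Q later advises T

print(tree.siblings("Student Q"))              # {'Student R'}
print(sorted(tree.ancestors("Student T")))     # ['Adviser P', 'Student Q']
print(sorted(tree.descendants("Adviser P")))   # ['Student Q', 'Student R', 'Student T']
```

An author-based view as described above would combine `ancestors` (heritage) and `descendants` for one person; degree details such as dissertation and year could be stored in a separate map keyed by author.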
  2. 2. • Community: A network is said to have community structure if its nodes can be easily grouped into sets such that each set of nodes is densely connected internally.
     • Community Detection: The adjacency matrix c[i][j] is an author-id matrix in which the value at the intersection of the ith row and the jth column is the number of times author i cites author j. The total citations of each author are represented by t[i].

            A    B    C    D    E    F    G    H
       A   25   21   18    0    0    0    0    0
       B   17    3    0   23    0    0    0    0
       C   25    0   15   15   10    0    0    0
       D    0   22    5   53    0   10   16    0
       E    0    0    0    0    0    0   20    0
       F   12    0    0    7    0    0    0    0
       G    0    0    0    0    0    4    0    0
       H    6    0    0    0    0    0    0   41
     Figure 1: Author Citation Matrix for Sample Graph

     Input: an adjacency matrix c[i][j] representing citation information, author ids, total citations t[i]
     Output: an equivalence class of authors
     c[i][j] represents the number of citations made by author i to author j
     Diagonal entries of c[i][j] represent the self-citations of an author, stored as x
     for every author id i in the matrix c[i][j] do
         for every author id j in the matrix c[i][j] do
             if x >= 0.5 * t[i] then
                 corrupt_author_count += 1
                 selfcite_author_count += 1
                 (forming list for relationships)
                 r_self = author id
             end if
         end for
     end for
     for every author id i in the matrix c[i][j] do
         for every author id j in the matrix c[i][j] do
             if c[i][j] > 0.5 * t[j] then
                 k = k + 1
                 if c[i][j] > 0.5 * t[i] then
                     s = s + 1
                     corrupt_author_count += 2
                     r_bidirectional = author id i : author id j
                 else
                     r_unidirectional = author id i
                 end if
             end if
         end for
     end for
     if k = s = z then
         (forming dictionary for Mafia Network)
         rmafia = author id i : z_1, z_2, z_3, ..., z_n
     end if
     Forming relationships from the outputs r_self, r_unidirectional, r_bidirectional and rmafia gives the community network.

     • This algorithm checks, for every author id i, whether the number of self-citations of i reaches the threshold share of its total citations (0.5 in the listing above).
     • When an author's self-citations exceed the threshold percentage of that author's total citations, the author is said to be corrupt: corrupt_author_count is incremented by 1, and selfcite_author_count is incremented as well.
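The self-citation check can be tried on the sample author citation matrix. The sketch below is illustrative rather than the authors' implementation: the function name `self_citers` is hypothetical, and it assumes that t[i] is the total number of citations made by author i (the row sum), which the text does not state explicitly.

```python
AUTHORS = ["A", "B", "C", "D", "E", "F", "G", "H"]

# Sample author citation matrix: C[i][j] = times author i cites author j.
C = [
    [25, 21, 18,  0,  0,  0,  0,  0],
    [17,  3,  0, 23,  0,  0,  0,  0],
    [25,  0, 15, 15, 10,  0,  0,  0],
    [ 0, 22,  5, 53,  0, 10, 16,  0],
    [ 0,  0,  0,  0,  0,  0, 20,  0],
    [12,  0,  0,  7,  0,  0,  0,  0],
    [ 0,  0,  0,  0,  0,  4,  0,  0],
    [ 6,  0,  0,  0,  0,  0,  0, 41],
]


def self_citers(c, authors, threshold=0.5):
    """Return authors whose self-citations reach `threshold` of their total."""
    flagged = []
    for i, name in enumerate(authors):
        total = sum(c[i])  # ASSUMPTION: t[i] is the row sum for author i
        # c[i][i] is the diagonal entry, i.e. the self-citation count x
        if total > 0 and c[i][i] >= threshold * total:
            flagged.append(name)
    return flagged


r_self = self_citers(C, AUTHORS)
print(r_self)  # ['D', 'H']
```

Under this assumption, D sits exactly on the boundary (53 self-citations out of 106) and is flagged only because the comparison is `>=`, while H is well above it (41 out of 47).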
  3. 3. [Figure 2: Sample Author Network. A graph over the authors A to H whose edge labels are the nonzero citation counts of the author citation matrix.]
     • r_self is a list of authors who have self-cited more than the given threshold value; each author id i that satisfies the if condition is added to this list.
     • Variable k keeps count of the authors who have been cited more than the threshold value by author j.
     • Variable s keeps count of the authors who have been cited more than the threshold value by author i.
     • For every author i cited more than the threshold value by author j, increment k.
     • For every author j cited more than the threshold value by author i, increment s.
     • If a bidirectional relationship exists between author i and author j, corrupt_author_count is incremented by 2 and the author ids of i and j are added to the dictionary r_bidirectional as a key-value pair, where the author id of i is the key and the author id of j is the value.
     • If a unidirectional relationship exists between author i and author j, corrupt_author_count is incremented by 1 and the author id of i is added to the list r_unidirectional, which keeps track of unidirectional relationships.
     • If k and s are both equal to the numeric parameter z, the authors are added to the dictionary rmafia, where any one of the author ids in the network acts as the key to access all the other authors.
     • LOCAL CITE: Consider an author who has 200 citations. If 70 percent of these citations come from siblings, list all the citations from those who collaborated with this author, and also list all the citations from others.
     • If author A cites author B and author B also cites author A, a binary relation is said to exist between author A and author B. This information is represented in the form of the matrix c[i][j], and the binary relation is represented with 1.
     • SUSPECTED AUTHOR: An author holding comparable binary relationships is said to be a suspected author.
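The binary relation defined above (A cites B and B cites A, marked with 1) can be derived directly from the citation matrix. The following is a hedged sketch, not the paper's code: the function name `binary_relations` is hypothetical, and the diagonal is skipped on the assumption that a self-citation does not count as a relation between two authors.

```python
AUTHORS = ["A", "B", "C", "D", "E", "F", "G", "H"]

# Sample author citation matrix: C[i][j] = times author i cites author j.
C = [
    [25, 21, 18,  0,  0,  0,  0,  0],
    [17,  3,  0, 23,  0,  0,  0,  0],
    [25,  0, 15, 15, 10,  0,  0,  0],
    [ 0, 22,  5, 53,  0, 10, 16,  0],
    [ 0,  0,  0,  0,  0,  0, 20,  0],
    [12,  0,  0,  7,  0,  0,  0,  0],
    [ 0,  0,  0,  0,  0,  4,  0,  0],
    [ 6,  0,  0,  0,  0,  0,  0, 41],
]


def binary_relations(c):
    """b[i][j] = 1 when authors i and j (i != j) cite each other, else 0."""
    n = len(c)
    b = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            # ASSUMPTION: self-citations on the diagonal are not relations
            if i != j and c[i][j] > 0 and c[j][i] > 0:
                b[i][j] = 1
    return b


B = binary_relations(C)

# Each mutual pair once, plus the number of binary relations per author.
pairs = [(AUTHORS[i], AUTHORS[j])
         for i in range(len(C)) for j in range(i + 1, len(C)) if B[i][j]]
relation_counts = {AUTHORS[i]: sum(row) for i, row in enumerate(B)}

print(pairs)            # [('A', 'B'), ('A', 'C'), ('B', 'D'), ('C', 'D'), ('D', 'F')]
print(relation_counts)  # D holds three binary relations; E, G and H hold none
```

The per-author counts are one possible starting point for a suspected-author list; what makes relationships "comparable" is not defined in the text, so no cut-off is applied here.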
  4. 4.
            A    B    C    D    E    F    G    H
       A   25   21   18    0    0    0    0    0
       B   17    3    0   23    0    0    0    0
       C   25    0   15   15   10    0    0    0
       D    0   22    5   53    0   10   16    0
       E    0    0    0    0    0    0   20    0
       F   12    0    0    7    0    0    0    0
       G    0    0    0    0    0    4    0    0
       H    6    0    0    0    0    0    0   41
     Figure 3: Author Citation Matrix for Sample Graph

     Algorithm 1 MAFIA IDENTIFICATION
     Input: collection of large data sets of citation information represented by matrix M
     Output: the threshold and the binary relations between suspected authors
     To calculate the threshold for an author in suspected list L:
     for i in list L do
         for j in list L do
             Threshold ← ∑ c[i][j] ÷ (number of suspected authors)
             if Threshold < x then    (x is calculated from the trend among siblings)
                 when c[i][j] > defined value    (obtained from the trend algorithm)
                     the author is said to be involved in mafia
             end if
         end for
     end for
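A rough sketch of the mafia identification listing follows. It is one possible reading, not the authors' implementation. Several points are assumptions: the function name `flag_mafia` is hypothetical; the sum is taken per author i over the other suspected authors j; self-citations on the diagonal are skipped, which the listing does not specify; and `x` and `defined_value` are plain parameters here because the trend algorithm that produces them is not described. The values 30 and 22 are made up for illustration.

```python
AUTHORS = ["A", "B", "C", "D", "E", "F", "G", "H"]

# Sample author citation matrix: C[i][j] = times author i cites author j.
C = [
    [25, 21, 18,  0,  0,  0,  0,  0],
    [17,  3,  0, 23,  0,  0,  0,  0],
    [25,  0, 15, 15, 10,  0,  0,  0],
    [ 0, 22,  5, 53,  0, 10, 16,  0],
    [ 0,  0,  0,  0,  0,  0, 20,  0],
    [12,  0,  0,  7,  0,  0,  0,  0],
    [ 0,  0,  0,  0,  0,  4,  0,  0],
    [ 6,  0,  0,  0,  0,  0,  0, 41],
]


def flag_mafia(c, suspects, x, defined_value):
    """Return indices of suspected authors flagged under the stated reading.

    suspects      -- indices of the suspected list L
    x             -- bound on the threshold (ASSUMED given; from sibling trend)
    defined_value -- per-pair citation bound (ASSUMED given; from trend algorithm)
    """
    flagged = []
    n = len(suspects)
    for i in suspects:
        # ASSUMPTION: diagonal (self-citation) entries are excluded
        others = [c[i][j] for j in suspects if j != i]
        threshold = sum(others) / n  # sum of c[i][j] / number of suspected authors
        if threshold < x and any(v > defined_value for v in others):
            flagged.append(i)
    return flagged


suspects = [0, 1, 2, 3]  # A, B, C, D: an illustrative suspected list
flagged = flag_mafia(C, suspects, x=30, defined_value=22)
print([AUTHORS[i] for i in flagged])  # ['B', 'C']
```

With these made-up bounds, B and C are flagged because each cites another suspected author more than 22 times (23 and 25 respectively), while A and D are not.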