EVOLVING COLLABORATION PATTERNS IN NORTH AMERICAN RESEARCH USING ADVANCED COLLABORATIVE GRID INFRASTRUCTURES: A CANADIAN PERSPECTIVE BASED ON CO-LINKING OF HIGH PERFORMANCE RESEARCH GRIDS

Gordon M. Groat
TABLE OF CONTENTS

LIST OF TABLES
ABSTRACT
CHAPTER I: INTRODUCTION
    Overview of the topic
    What is High Performance Computing (HPC) and what is an HPC GRID?
    HPC GRID Computing Relevance to Higher Education Science and Technology Policy
    Statement of the problem
    Statement of the purpose
    Research Questions
    North American Grid Structures
    Canadian Regional Grids
    United States Regional Grids
    Comparative analysis of Canadian and U.S. grid development
    Significance of the study
CHAPTER II: REVIEW OF THE LITERATURE
    Grounding literature and theoretical framework
    Outsourceability
    Resource based view
    Transaction cost economics
    Agency theory
CHAPTER III: METHODOLOGY
    Pilot study
    Resource collating and data preparation
    Data categorization
    Canadian HSHPRG Co-Link Structures: Initial returns from NAGR Institutions Canada
    Proposed NAGR Inlink/Outlink Categorization Structure Design
    Sample data collected: Co-link with specificity “CO2” research
    Pilot Study Analysis
CONCLUSIONS
Works Cited
Abbreviations
LIST OF TABLES

Table 2 - US Regional Grid Fabric
Table 3 - Supraregional US Grid Fabric
Table 4 - Query Structure Examples
Table 8 - U Laval Linkdomain Query (ulaval.ca+.ca+.gc.ca+"co2")
Table 9 - U Laval Linkdomain Query (umontreal.ca+.ca+.edu+"co2")
Table 10 - U Laval Linkdomain Query (montreal.ca+.ca+.gc.ca+"co2")
Table 11 - U Laval Linkdomain Query (usask.ca+.ca+.edu+"co2")
Table 12 - U Laval Linkdomain Query (usask.ca+.ca+.gc.ca+"co2")
Table 5 - NAGR Inlink/Outlink Categorization Structure
Table 6 - NAGR Inlink/Outlink Language
Table 13 - Abbreviations
ABSTRACT

As research agendas at universities and colleges require increasingly sophisticated and powerful technological infrastructures, institutions become increasingly strained to provide sufficient resources to underpin those agendas. In the struggle to maintain momentum, institutions turn to collaborative research structures that leverage inter-institutional infrastructures, because they believe that prestige and resources will be the fruits of increasing knowledge creation (Slaughter, 2004).

Given the compressed resources and accelerating costs that exist at most institutions, institutions increasingly collaborate across high performance research grids designed to facilitate the movement of large data sets. In doing so, they can leverage the larger and more competitive technological and academic resources that consortiums bring to bear by pooling them, whether for basic or applied research as described by Bush (Bush, 1945).

A good example is research that requires extensive computational overhead. Certain institutions maintain massively parallel supercomputer facilities, but far more institutions do not. Such facilities are, of course, critical infrastructure for computational scientists and engineers, but they are also important for advancing knowledge in the humanities, for experimental scientists, and for corporations and associations. Given our changing planetary environmental conditions, such resources are also central to fields such as the environmental sciences (Smarr, 1999).

The resource-based view suggests that if an institution is not able to field a resource at a world class level, then that component of its research is a candidate to be shifted into the arena of technologically facilitated collaborative high performance research grids. It should also be noted that the massive volume of data generated by computational advancement has created enormous pressure to remain competitive where computational overhead is concerned (F. Berman, 2003).

Recent academic developments in this arena explore the application of more theoretical constructs. Business terms such as outsourcing tend to migrate from business to academe, gradually becoming part of the lexicon of academic research. The transition from a passing interest in emerging, technologically facilitated collaborative research structures to a substantially researched area of academic inquiry is a natural progression.

For the purposes of this research, the challenge is to rethink the way we look at shifting research to the resource rich environment of inter-institutional research grids by examining the way these resources interlink and interact with each other. The collaborative consortiums that now thrive in higher education research depend on extensive sharing of resources, whether hardware or software, facilities or equipment, or human resources, i.e. multiple investigators from various institutions wielding various sets and subsets of these resources. For this research, I define the outsourceability of a research activity as the degree to which it is beneficial to outsource that activity, in accordance with the work of Mol (Mol, 2007). I take the position that the genesis and growth of outsourced research components are correlated with shifting and compressed budgets. I also note that as national research agendas change, so does the political influence of institutions, and with it the budgets realized through direct and indirect funding (Barr, 2002).

High performance research grids now form an important fabric in national and international research agendas, and the way we link these resources together provides pathways to understanding certain preferences. This research is particularly concerned with inlink and outlink analysis, to ascertain how and why we interact on these grids and to identify language preferences in collaborative research grids. An important component of this study is to overlay language preferences with geographical preferences in order to elicit their impact across multilingual institutions in Canada. Canada was selected because of its distinct bilingual mandate at the national level. By tracing inlinking and outlinking across English, French, and national grids to determine the influence of national and international collaborative structures, it is hoped that the implications for academic research in an officially bilingual nation may be better understood.
CHAPTER I: INTRODUCTION

Overview of the topic

Patterns of co-linking on the internet have been mapped in many quantitative studies. Co-linking is how we describe web pages that are linked together. Some of these links may be intra-institutional, some may be links between institutions, and some may have little to do with areas of academic interest, yet they are important to the web content of higher education organizations. They might include links to websites that offer students, staff, and faculty information regarding benefits plans, recreational opportunities, housing, or transportation services. The list of such links is extensive, and the size of institutional web sites has grown tremendously.

By analyzing a variety of linking structures, researchers have uncovered interesting relationships between institutions. Link analysis is, essentially, a technological way of looking at how we communicate with each other. Initially, linking was part of a structure that we did not assess; we merely used interlinked pages as a matter of convenience. Facebook came about in much the same way. The program was designed simply to link together some information about classes and to match students up for study groups. This evolved into Facemash, which was designed to let people look at the names and faces of people in dorms and rate who was hotter. Rankings were developed according to the votes. This is how Facebook started.

This seems incredibly simplistic and of little value from the perspective of people who had no idea what potential such programming could contain. Zuckerberg put the site up over the weekend, and on Monday morning it was taken down because it had overwhelmed Harvard's server and prevented students from accessing the web. It was also described as completely improper and without merit. Today, there are over 500 million active users and
people spend over 700 billion minutes per month on Facebook. It has over 900 million objects that people interact with, has been translated into 70 languages, and over 10 million new application interfaces are installed daily. Facebook is nothing more than a gigantic system of co-linking. Academe first called it foolish, of no value, and without meaning for advanced education in any way, shape, or form. Today, the company is worth over 7 billion dollars, and more of your students, no matter what university you teach at and no matter in what country (except China, of course), are on Facebook than visit your university website. If you are a professor, you may rest peacefully at night knowing your students spend more time on Facebook than they do on Google, and not by a small margin.

Co-linking, for Facebook's purposes, can obviously be very popular. In advanced education, co-linking is also popular, but what matters to academe and to policy makers in advanced education is understanding why and how institutions co-link. It matters to policy makers because it is central to questions of academic mobility across the web and to the funding of the infrastructure that enables that mobility of academic thought.

In academe, however, we are, for the most part, more interested in how we shape research than in dating or playing games. But don't get me wrong: you'll be hard pressed to find anybody at any university who does not know what Facebook is. Social media is a larger expression of interlinking, or "co-linking," as we start to move information more freely across the web. Just a few years ago, one could visit almost any news outlet on the web and find no mechanism to post an article to Facebook, and, of course, there was typically no mobile web for smart phone users. Today, it is virtually impossible to find a news outlet without a share mechanism and pages designed for smart phones.
The reason is found in the statistics gained from studying co-linking patterns and interlinking traffic, all of which tell us that smart phones
and the ability to interlink data and information seamlessly are no longer the exception; they have become the norm.

As such, when I discuss the importance of understanding how and why institutions interlink with each other to uncover patterns such as language preference, geographical location, or cultural preference, all of these things inform a larger picture that policy makers can use to make investment decisions that advance certain kinds of research. To reject the notion that this has any importance to advanced education is probably no less short sighted than Harvard refusing to let Zuckerberg run his Facebook application on its server. If, on the other hand, Harvard had asked for a small percentage of the intellectual property rights in exchange for server capacity, it would easily have doubled its endowment by now. Hindsight is always crystal clear and usually painful.

This paper will not focus on social media; instead, it will focus on understanding patterns of in-linking and out-linking in research grids. When this line of research began, there was very little interest in GRID research at all. Because technology advances at such an exponential rate, high speed high performance research grids have now become central to national education strategies and to national security, as research often centers on rarefied topics that yield military advantages, such as physics, chemical computing, and imagery analysis. But like the internet itself, grid computing was and always has been destined to open new gateways for the humanities and the arts. These fields are forging ahead into this arena ever faster, and we know they will one day be strong players, as global symphonies are created, to take just one example. The limits are as uncapped as the limits of human imagination itself.
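As a minimal sketch of the in-linking and out-linking vocabulary used throughout this study, the Python below counts inlinks (links received) and outlinks (links made) per domain from a list of (source, target) links. It also counts co-inlinks, reading co-linking in one common webometric sense: two domains are co-inlinked when a third domain links to both. The domain names and links are hypothetical illustrations, not data collected for this study, and the sketch is not the collection procedure used here.

```python
from collections import Counter
from itertools import combinations

# Hypothetical link list: (source domain, target domain). Illustrative only.
LINKS = [
    ("site-a.ca", "site-b.ca"),
    ("site-a.ca", "site-c.edu"),
    ("site-b.ca", "site-a.ca"),
    ("agency.gc.ca", "site-a.ca"),
    ("agency.gc.ca", "site-b.ca"),
    ("site-c.edu", "site-b.ca"),
]


def link_counts(links):
    """Count inlinks (links received) and outlinks (links made) per domain."""
    inlinks = Counter()
    outlinks = Counter()
    for source, target in set(links):  # ignore exact duplicate links
        if source == target:
            continue  # skip self-links (intra-domain links)
        outlinks[source] += 1
        inlinks[target] += 1
    return inlinks, outlinks


def co_inlinks(links):
    """Count, for each pair of domains, how many third domains link to both."""
    targets_by_source = {}
    for source, target in set(links):
        if source != target:
            targets_by_source.setdefault(source, set()).add(target)
    pairs = Counter()
    for targets in targets_by_source.values():
        # Every pair of targets sharing this source is co-inlinked once more.
        for pair in combinations(sorted(targets), 2):
            pairs[pair] += 1
    return pairs


inlinks, outlinks = link_counts(LINKS)
print(inlinks["site-b.ca"], outlinks["agency.gc.ca"])
print(co_inlinks(LINKS))
```

The hypothetical suffixes (.ca, .gc.ca, .edu) echo the domain filters in the linkdomain queries listed among this study's tables; grouping these counts by suffix or by an institution's working language would be a natural extension.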
What is High Performance Computing (HPC) and what is an HPC GRID?

For many years, High Performance Computing was thought of as supercomputing and was associated with the ownership of extremely expensive supercomputers. That association has loosened because computational capability has advanced at an astounding rate.

As the internet grew, so too did the desire to create ever larger pipelines to transfer data. The growth of the internet, combined with rapidly evolving technologies, has created many interconnected resources linked by specially dedicated, high capacity network cables. The capacity of these pipelines is referred to as bandwidth: the bigger the bandwidth, the bigger the pipeline and the more data that can be exchanged in a given time.

This is very important to you, the reader, because no matter who you are, you probably use a computer and you probably use the internet to gather information for your own specific purposes. If you are engaged in academic research, this information comes to you from libraries or from data collections hosted on your campus, across the country, or even on the other side of the planet. Your ability to access that information is governed by your bandwidth. The capacity of the personal computer has become so advanced that the main speed limitation on data transfer now lies within the bandwidth infrastructure itself. If you are pleased with the speed of your network, then you should be able to work comfortably with volumes of data that were unimaginable to academics just a couple of decades ago.

Because the Internet exploded so quickly, academics sought to have their own "internet" bandwidth pipelines. These pipelines required significant investment that was shared by many institutions, corporations, and organizations. The name GRID was created to describe how these large cable runs are interconnected, both to the sites that invest in them and to all
of the large regional and national grid structures, often called grid fabric. All of these grids are, to some extent, subsidized by different levels of government.

Governments become involved in this infrastructure development because they have the most to gain and, of course, the most to lose. We used to think it was important to have big banks of supercomputers at each university; this was our advantage. But the internet has changed all of that. We are moving into the era of resource independent computing environments. Sometimes referred to as "cloud computing," this really means that the resources we use can be hosted anywhere. The explosion of GRID computing is really just a reflection of the growth of the internet, its underlying infrastructure, bandwidth, and computer capacity. A short list of GRIDS includes a variety of topics directly tied to advanced education.

The topic list, or "genre," of GRIDS includes bioinformatics, photonic switching, data center markup language, climate research, severe weather prediction, health care, middleware, operating systems, astronomy, physics, economics, hydrology, geology, and earthquake engineering, to name a few. There's even a grid on mammograms in Europe, established to create an EU-wide database of mammograms so that researchers can evaluate research models across a much larger data set. There are national and regional grids in Japan, Korea, Canada, the EU, China, Denmark, Bulgaria, Armenia, Italy, Israel, Croatia, Singapore, Russia, Ireland, Finland, Sweden, Romania, the Netherlands, Serbia, Austria, Switzerland, and the list goes on. The point is simple: where only a few GRIDS existed at the start of the decade, they have since proliferated across the globe. Researchers are talking to each other in ways that were unimaginable only a few short years ago.

As the need to share large data sets and infrastructure grows, the implication for GRID technology is obvious.
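The link between bandwidth and the movement of large data sets reduces to simple arithmetic: transfer time is data size divided by throughput. The Python sketch below assumes an idealized link that sustains its full nominal rate with no protocol overhead, congestion, or disk bottlenecks, and uses a hypothetical 1 terabyte data set; real transfers over shared research networks will be slower.

```python
def transfer_seconds(size_bytes: float, bandwidth_bits_per_s: float) -> float:
    """Idealized transfer time: bits to move divided by the link rate.

    Assumes the link sustains its full nominal rate, with no protocol
    overhead, congestion, or storage bottlenecks.
    """
    return size_bytes * 8 / bandwidth_bits_per_s


ONE_TERABYTE = 1e12  # hypothetical data set size, in bytes (decimal terabyte)

for label, rate in [("100 Mbit/s", 100e6), ("1 Gbit/s", 1e9), ("10 Gbit/s", 10e9)]:
    hours = transfer_seconds(ONE_TERABYTE, rate) / 3600
    print(f"{label}: {hours:.2f} hours")
```

Under these assumptions, each tenfold increase in bandwidth cuts the transfer time tenfold, from roughly 22 hours at 100 Mbit/s to about 13 minutes at 10 Gbit/s.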
GRID technology will grow exponentially, like Facebook, until it becomes part and
parcel of everyday life in academe. In many disciplines, it is already a core component of everyday life; in many others, it will become more and more integrated as time, access, and the ability to leverage the benefits of cloud computing become part of the everyday life and language of both the professoriate and the students of the institution.

HPC GRID Computing Relevance to Higher Education Science and Technology Policy

The variety of activities carried out through HPC GRID computing is astounding, as previously mentioned. The obvious big players, as disciplines go, include environmental and meteorological studies, nanotechnology research, weather prediction and simulation, bioinformatics, biology, chemistry, and physics (Trellis Project, 2003). "The Next Big Thing in Humanities, Arts, and Social Science Computing: 18 Connect" (Kevin D. Franklin and Karen Rodriguez'G, 2008) combines a variety of social science information and offers, for example, popular texts such as Milton's Paradise Lost or Shakespeare's Macbeth in different printed versions that can be customized from a vast database of copytexts and editions. Emerging social science discussions about the future and shape of academic interactions are starting to focus on the need to explore and understand new tools that may greatly enhance the way social scientists interact with each other (Hodgson, 2007).

Multidimensional scaling creates a visual representation of co-linking patterns that can reflect the way Canadians see the role of their universities. When we generate an image that shows the number and nature of links between universities and colleges, we can sometimes identify interesting patterns that we did not previously realize or had only suspected.
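Multidimensional scaling can be sketched in a few lines. The Python below implements classical (Torgerson) MDS with NumPy: it turns a symmetric matrix of co-link counts into dissimilarities, double-centres the squared dissimilarities, and takes the top two eigenvectors, scaled by the square roots of their eigenvalues, as plotting coordinates. The counts, the institution labels A to D, and the conversion from counts to distances (1 / (1 + count)) are hypothetical choices for illustration; webometric studies use a variety of normalizations, and this is not the procedure used in the cited work.

```python
import numpy as np


def classical_mds(distances: np.ndarray, dims: int = 2) -> np.ndarray:
    """Classical (Torgerson) MDS: embed points so that Euclidean distances
    approximate the given symmetric distance matrix."""
    n = distances.shape[0]
    # Double-centre the squared distances: B = -1/2 * J D^2 J, J = I - 11^T / n
    centering = np.eye(n) - np.ones((n, n)) / n
    b = -0.5 * centering @ (distances ** 2) @ centering
    # eigh returns the eigenvalues of a symmetric matrix in ascending order
    eigenvalues, eigenvectors = np.linalg.eigh(b)
    order = np.argsort(eigenvalues)[::-1][:dims]
    # Clip tiny negative eigenvalues caused by non-Euclidean input or rounding
    scale = np.sqrt(np.clip(eigenvalues[order], 0, None))
    return eigenvectors[:, order] * scale


# Hypothetical symmetric co-link counts between four institutions (A-D).
labels = ["A", "B", "C", "D"]
colinks = np.array([
    [0, 12, 3, 1],
    [12, 0, 4, 2],
    [3, 4, 0, 9],
    [1, 2, 9, 0],
], dtype=float)

# Assumed conversion: more co-links means more similar, hence a smaller distance.
dissimilarity = 1.0 / (1.0 + colinks)
np.fill_diagonal(dissimilarity, 0.0)

coords = classical_mds(dissimilarity)
for label, (x, y) in zip(labels, coords):
    print(f"{label}: ({x:+.3f}, {y:+.3f})")
```

Plotting these coordinates places heavily co-linked institutions near each other; the axes themselves carry no meaning, and the picture can be rotated or reflected without changing the distances.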
Such contextualization of co-linking data can reveal language preferences (Thelwall, 2002) and help stratify the cultural differences that are part of the elemental fabric of Canadian society, the largest and most obvious difference centering on
French and English culture. Of course, the lessons we glean from looking at language preferences in Canada are well understood in Scandinavia, perhaps less so in the United States, where, by any measure, language diversity will only continue to grow.

One of the great projects of the late 20th and early 21st centuries was the foundational work of the Consortium for North American Higher Education Collaboration (CONAHEC), designed to enhance academic mobility between the United States, Canada, and Mexico. This mission grew and prospered, and it directly affected how we view ourselves at the University of Arizona Center for the Study of Higher Education, a center that began to integrate and embrace people from a wide variety of backgrounds, cultures, and disciplines. It does not take much of a stretch of the imagination to look at the CONAHEC mission, the University of Arizona, the State of Arizona, and the demographics of the United States as a whole and conclude that Spanish will be increasingly important as part of the rich language diversity that is central to the fabric of advanced education.

It should then stand to reason that, as we have embraced diversity in the physical sense, we would also wish to explore diversity across the medium of cloud computing, and to examine how our interactions in cyberspace may be analyzed, enhanced, and used to further the research and mission of the Center for the Study of Higher Education and of every other department, center, or college at the University of Arizona, or, for that matter, at any university located anywhere on the planet.

Because Canada was shaped by countless First Nations and by the immigration of French and English settlers, the resulting non-First Nations culture has been largely split into two distinct societies. One is French, recognized as a distinct culture with its own National Assembly; the rest of Canada has Provincial or Territorial Legislatures. The fundamental differences that
exist between French and English Canada have been a central discussion in Canadian culture and politics for centuries. They have been a cause of discord, and few who live in Canada are unfamiliar with these cultural themes and tensions.

Creating a qualitative overview of the co-link patterns that exist in regional high performance research grids in North America may provide an interesting basis for comparing the distinctions found in previous research with non-grid co-linking among the corresponding institutions.

Co-linking analysis has shown predictive value: studies conducted at the Institute for Studies in Research and Higher Education in Oslo found identifiable patterns of strong language preference, demonstrating increased co-linking between Nordic institutions (Persson, 1997). The genesis of this study is a desire to understand the nature of our collaborations within and outside Canada and to examine how language preference may affect those collaborative efforts.

Statement of the problem

As institutions and governments are subject to economic cycles, it follows that institutions of advanced education will also endure compressed budget cycles combined with increasing demand for research infrastructure. Smaller budgets and rising costs simply outweigh the ability of a single institution to provide all the leading edge tools its researchers require. Simply stated, institutions can't afford to buy enough computer equipment to do everything they want to do... and their budgets are probably going to be cut back relative to inflationary pressures, making that proposition even more difficult.
The problem is compounded for departments deemed to be non-core areas of the institution, in other words, those departments and disciplines that are less attractive to the financial planning interests and revenue streams of the institution. Not only can they ill afford to invest in high end computational assets; some of them will have to struggle for their very existence. They will be forced to justify their existence in the age of Academic Capitalism, and the greatest contrast is seen between the areas of the institution that engage in basic research and those engaged in applied research, where significant national grants exist alongside the seductive promise of intellectual property residuals.

Remember how Harvard told Zuckerberg to take down his Facebook site, that it was entirely irrelevant to advanced education... they even made him apologize. You can be sure that many institutional hawks will be looking for every ounce of intellectual property they can find. No institution wants to be made famous for letting the next big thing get by it. Accordingly, institutions will likely focus their efforts in fields where they have seen the largest gains in the past.

It should also come as no surprise that increasing computational power has enabled numerous institutions of higher education to extend the size, shape, and dimensions of their academic exploration. The rate of change in the last few decades, like the growth of computational power, has been exponential. A co-founder of the Intel Corporation, Gordon E. Moore, described a trend in the number of transistors that could be placed on integrated circuits. This prediction suggested that, as a result, computational power would roughly double every two years, and the trend was predicted to last for several decades (Lundstrom, 2003). While this is not really a law in the sense of a gas law or a law of physics, it has long been recognized as remarkably accurate.
As such, it has become known as Moore's Law.
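The doubling trend can be written as P(t) = P0 · 2^(t/T), where P0 is the starting capability, t is elapsed time in years, and T is the doubling period (two years here). The minimal Python sketch below assumes smooth exponential growth and makes the arithmetic explicit, including the link between doubling time and the natural logarithm of 2 (about 0.693). The function names are illustrative.

```python
import math


def doubled(p0: float, years: float, doubling_period: float = 2.0) -> float:
    """Capability after `years` of growth that doubles every `doubling_period` years."""
    return p0 * 2 ** (years / doubling_period)


def doubling_time(rate: float) -> float:
    """Doubling time for continuous growth at `rate` per year: ln(2) / rate.

    This is the source of the 'rule of 69.3': ln(2) is about 0.693, so a
    quantity growing continuously at r percent per year doubles in roughly
    69.3 / r years.
    """
    return math.log(2) / rate


start = 256_000  # starting point used in the text
print(doubled(start, 20))              # ten doublings in two decades
print(2 ** (1 / 2) - 1)                # equivalent annual compound growth rate
print(doubling_time(math.log(2) / 2))  # continuous rate that doubles every 2 years
```

Under this assumption, ten doublings over two decades multiply the starting figure by 2^10 = 1,024, taking 256 thousand to roughly 262 million, and a two-year doubling period corresponds to about 41 percent compound growth per year.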
The simple mathematics of doubling should give us some perspective. Doubling time is tied to the natural logarithm of 2 (about 0.693), which is why a quantity growing at r percent per year doubles in roughly 69.3/r years. To understand how much computational power Moore was talking about: a doubling every two years amounts to ten doublings, a factor of 2^10 = 1,024, in the course of just two decades, so a starting point of 256 thousand floating point operations would grow to more than 262 million. Sounds like a lot and, of course, it is. To understand the power of the future and see the exciting promise of future computational power, it is also enlightening to look back a few years to understand where we have come from.

Take a look back at a time, not so long ago, when personal computers had not yet been invented, when cell phones did not exist, and when there were no iPods or music downloads. It was not so long ago that a Professor of Higher Education, or of any other discipline, conducted research from the office bookshelf, the library, and by borrowing physical books from other institutions.

To obtain a snippet of information, one might invest countless hours just obtaining access to resources. Extending academic capacity through high speed research grid infrastructures, meaning more powerful computers, bigger bandwidth, and more of both extended across campus, offers interesting possibilities when combined with cost compression in computational resource overhead. Cost compression is really just the political reality that most state institutions deal with as state budgets suffer from economic downturns (V. Piscitello, 2003). But what does that really mean? It means that no university or campus can possibly compete with cloud computing. No institution has the financial ability to compete against a global infrastructure built on a sharing model; the attempt would bankrupt even the most richly endowed university in very short order. And, of course, institutions know that, and so they have moved into a shared resource model.
Concepts like software as a service (SaaS) did not exist just a few short years ago. Most technological advancements are initially unthinkable commercially because of small market size or undefined economics; private industry did not build them out until it became evident there was profit to be made (Mark Turner, 2003).

Just twenty-five years ago, there were not a lot of corporations or universities that had supercomputers, and few had personal computers. Ordinary faculty, researchers, and students relied on calculators and handwritten programs that could be entered into mainframe computers via cards typed on card punch machines, organized in great cardboard boxes, and carried to a centralized computer center where they could be fed into the mainframe through a special machine called a card reader.

Organizations simply didn't have the computer hardware or bandwidth resources to provide these kinds of things to individual faculty or researchers. Those resources had to be housed in central facilities, because the cost and size of the facilities were so substantial that no institution could afford to provide these services in any other way. This is why you will see a computer center on most university campuses. As these things grew and became more and more a part of both business and education, we came to depend on ever increasing capacity to quench our thirst for more power, more data, and more ability to conduct research. But now, instead of having to rely on our own institution for everything, we want to collaborate with people from across the globe and share resources on a global basis.

It is true that, like people, all institutions have something unique about them: their capacity, their ability, perhaps their location. And the one question they all contend with is their funding.
As inequities are exacerbated, the chasm between well-endowed, economically prosperous institutions and those with fewer resources continues to create an increasingly
precarious situation for the institutions that are getting left behind on the technology curve. Because of economic reality, this translates across all the disciplines of the institution. Some budgets are cut; departments may be slashed or eliminated altogether as administrators constantly struggle to balance the institutional budget.

"Focused excellence" tends to be the slogan for cutting back funding across the less prosperous centers of research while preserving capital for those departments that have two key components: a significant demand for their research product and the potential for accelerated economic gains going forward.

This potential is most typically framed as intellectual property revenue from patentable research that offers economic participation for the institution. This study analyzes collaborative research grid environments: places where academics may enjoy substantial internet resources, access to large online library collections, and, of course, sufficient bandwidth to support the exchange of research and data. In addition, the collaboratory environments where people may meet and collaborate online are all underpinned by a powerful infrastructure referred to as a collaborative research grid environment.

As more technological infrastructure is extended globally, the digital divide diminishes. Yet the imbalance across institutions of higher education is exacerbated by cost prohibitive environments and increasingly complex technological solutions (Erik Brynjolfsson, 2003). The ever burgeoning global high performance research grid environments offer unique technology-driven solutions that have significant potential to reduce the growing imbalance across research disciplines and institutions. In other words, as grid structures proliferate and costs are spread across more and more governments and institutions, the cost of entry into the large scale environment becomes lower.
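The cost-spreading argument reduces to simple division: if the cost of a shared grid is split evenly among participants, each participant's share falls as membership grows. The sketch below assumes an even split and a hypothetical fixed annual cost; real consortium funding formulas, government subsidies, and the added cost of connecting each new member are ignored.

```python
def cost_per_member(total_cost: float, members: int) -> float:
    """Even split of a shared infrastructure cost (an illustrative assumption)."""
    if members < 1:
        raise ValueError("a shared grid needs at least one member")
    return total_cost / members


TOTAL_COST = 10_000_000  # hypothetical annual cost of a shared grid, in dollars

for members in (1, 5, 20, 50):
    share = cost_per_member(TOTAL_COST, members)
    print(f"{members:>2} members: ${share:,.0f} each")
```

Under the even-split assumption, growing from 5 to 50 members cuts each institution's share tenfold while the infrastructure available to all of them stays the same.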
This is the same power of numbers upon which the insurance industry operates: spreading the risk among many protects the few. With computational resources, the risk is spread by reducing the funding cost for each institution, and the benefit falls to those who leverage those resources and facilities.

Additionally, if students are not provided access to increasingly enhanced technological resources and an environment rich with diverse collaboration options across institutions, recruitment may suffer, given the perception of a less organized strategic viewpoint relating to student affairs. Institutions clearly associate their brand management with their web presence. Technological capacity is at the forefront of recruiting, and some institutions provide computational technologies to students upon enrollment so that their ubiquitous access is in sync with institutional firewalls and security policies.

We already speak to technological prowess by leveraging social media for recruiting, a concept unheard of just a few years ago (Briggs, 2008). If recruitment is changing and students have expectations such as ubiquitous Wi-Fi access anywhere on campus, this has implications for computational infrastructure in the recruitment and retention of top student talent (Wilen-Daugenti, 2008). If this viewpoint is increasingly adopted by students at a given institution, there is some risk that the institution will be seen as underperforming compared to others. Such an outcome is likely to have a negative impact on graduation rates, as outlined by Woodard, Mallory, and De Luca (Woodard Jr. D., 2001).

Statement of the purpose
The purpose of this research is to further, within a manageable scope, the foundational analysis of how research collaboration is conducted using high-speed, high-performance research grids across North America.
The gist of the project is to analyze the low-hanging fruit in the sciences: the disciplines most conversant with collaborative research models using the North American Grid Fabric (NAGF). The study is designed to cultivate an understanding of how we interact in the NAGF. It begins with a top-level analysis of the domains associated with research collaborations hosted by the participating institutions. By analyzing hyperlink patterns (inlinking and outlinking), collaborative language preferences may be examined at a high level. The aim is to see whether language preference is present in Canada. While such preference has been confirmed in research conducted in Scandinavian and European institutions of higher education (Vaughn L, 2007) (Fry, 2006), Canada is unlike these nations in that it was founded by two distinct cultures, French and English, which have lived together as one nation. These distinct societies and cultures exist under one flag with the attendant diversities that any other nation might have, yet Canada is different at the same time. Canada is therefore a unique laboratory for this research: there is no way to predict whether what happens in Scandinavia will happen here, or whether it will be completely different.

Research Questions
The fundamental research question, simply stated, examines how collaborative preferences affect the way research collaborations are conducted in the NAGF. The research questions parallel the work of Vaughn, Kipp, and Gao regarding the macro-analysis of linking patterns from which meaningful patterns may be deciphered (Vaughn L, 2007). How are the co-linked sites related, and why are they related? Language preference will serve as a contextualization layer to be analyzed after the data have been gathered. Language preference is an overlay to the central question and a backdrop designed to tease out more understanding of what drives effective collaboration across the grid and how policy may affect that collaboration.
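The inlink, outlink, and co-link measures referred to above can be illustrated with a short sketch. The Python below is a minimal illustration under stated assumptions, not the study's instrument: the site names and links are hypothetical placeholders, and it assumes links have already been collected as (source, target) pairs. It counts how many distinct sites link to and from each site, and counts co-inlinks, that is, how many sites link to both members of a pair.

```python
from collections import Counter
from itertools import combinations

# Hypothetical hyperlinks between research-grid sites, recorded as
# (source_site, target_site). Site names are invented placeholders.
LINKS = [
    ("west.example", "national.example"),
    ("ontario.example", "national.example"),
    ("quebec.example", "national.example"),
    ("west.example", "ontario.example"),
    ("quebec.example", "ontario.example"),
    ("atlantic.example", "quebec.example"),
]


def link_counts(links):
    """Return (inlinks, outlinks) per site, counting distinct site pairs."""
    unique = set(links)  # collapse repeated links between the same two sites
    inlinks = Counter(target for _, target in unique)
    outlinks = Counter(source for source, _ in unique)
    return inlinks, outlinks


def co_inlinks(links):
    """Count, for each pair of sites, how many sites link to both."""
    targets_by_source = {}
    for source, target in set(links):
        targets_by_source.setdefault(source, set()).add(target)
    pairs = Counter()
    for targets in targets_by_source.values():
        # Every pair of targets sharing this source is co-linked once.
        for a, b in combinations(sorted(targets), 2):
            pairs[(a, b)] += 1
    return pairs


if __name__ == "__main__":
    inlinks, outlinks = link_counts(LINKS)
    print(inlinks.most_common(1))  # [('national.example', 3)]
    print(co_inlinks(LINKS))
```

In this invented data the national backbone site receives the most inlinks, and the Ontario and national sites are co-linked twice (by the western and Quebec sites). Counting distinct site pairs rather than raw page links is a design choice made here so that heavily interlinked sites do not dominate the counts.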
North American Grid Structures
The National Research Council (NRC) has been the Government of Canada's premier organization for research and development since 1916, and it is also the financial driver for the development of the Canadian National Grid Fabric (CNGF). A memorandum of understanding was signed in August 2001 between CANARIE, C3.ca, and the NRC. The three agreed that they would monitor interdependencies, agree on technical directions, share project management, and define a Grid focus in projects. Each brings expertise to the table: advanced networks, high performance computing systems, and advanced multi-laboratory eScience projects, respectively.

The Grid Canada project is committed to enabling a core grid infrastructure for use by these three organizations and their partners. It is also designed to effectively leverage the resources each can provide, providing the genesis of a formidable CNGF. Some infrastructure has already been built, and Grid Canada has involved itself in the development of several applications that will use this infrastructure. Some examples include NRC's iHPC, CANARIE's Lightpath, and the University of Victoria's Data Grid projects.

The NRC President's Challenge has resulted in a $3 million grid-based, multi-scale computation platform for modeling nano-structures and biological materials. The core grid infrastructure will be built and supported by a team internal to NRC in conjunction with Grid Canada. The CANARIE Customer-Empowered Lightpaths project is developing standard interfaces to allow the provisioning of end-to-end lightpaths across heterogeneous network resources. This work is proceeding with an eye towards the next generation of grids based on the Open Grid Services Architecture, a Web Services enabled infrastructure that can leverage emerging web standards. Grid Canada is actively tracking this next generation and planning new infrastructure support.

Researchers at the University of Victoria will be taking part in experiments at CERN that will be extremely data intensive. They need access to infrastructure that is being built by the European Union Data Grid effort. Grid Canada is working towards harmonizing its infrastructure with the EU Data Grid so that the science can be done in the Canadian grid community as well as in the explosively growing international grid community (Canada, 2007). Significant investments have enabled these collaboration platforms and will continue to expand their reach into research and higher education.

Canadian Regional Grids
Grid development in Canada has proceeded at a slower pace than in the United States. Given the constrained resources and limited funding ability of the Canadian NRC, the grid fabric continues to develop across different regions of Canada. Canada's national grid fabric is shouldered by CANARIE, which was one of the most advanced grid structures designed for research and education when it was deployed.

The regional grids take advantage of the CANARIE infrastructure. One of the mandates of the CANARIE infrastructure, as guided by the NRC, was to create an internet research laboratory, but also to provide a platform for education in remote areas of the country, such as Nunavut, the Yukon Territory, the Northwest Territories, and areas of Northern Quebec, Labrador, and Newfoundland. This was done to extend the digital classroom to First Nations citizens and to ramp up educational capacity in traditionally underserved areas. An overview of the Canadian regional grid fabric shows collaboration across provincial boundaries as well as intra-provincial grid fabric. One of the questions we approach through an analysis of inlinking and outlinking patterns relates to the preference of aboriginal language and cultural knowledge.

It is reasonable to expect that First Nations communities share certain interests and viewpoints, but will they prefer to work with particular non-First Nations groups when language is identified as a unique separator? In other words, is it more important to collaborate with another First Nations researcher who speaks English or one who speaks French, or does this not matter at all? None of these questions have been asked, and part of the interest of this research is to see whether any information can be gained by examining research and learning preferences through inlinking and outlinking patterns. In other words, how can we tell what is important to them by whom they want to work with?

In the far north, this is perhaps one of the greatest laboratories for examining cultural and language preference because of one key fact: the relative isolation can easily be penetrated only by technology. Ask everyone you know whether they have personally been to the Arctic; one would expect very few to answer yes. There is no place more isolated with an established population than Canada's far north. As such, it is a good place to look at inlink and outlink patterns on a small scale and find out whether patterns may be deciphered. It also offers an obvious setting for comparing English and French language preferences between grid infrastructures. The analysis of Quebec as a central point of French collaboration across the NAGF can readily be compared with language preferences in French-speaking countries outside of North America.
In short, the Canadian laboratory offers unique benefits for research whose results can be compared with those from the Scandinavian countries, where the bulk of the existing research data exists.
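The language overlay described above could, for example, be applied by tagging each site with a primary language and cross-tabulating links by language pair. The sketch below assumes such tags exist; the sites, tags, and links are hypothetical, and the same-language share is an illustrative summary statistic rather than a measure taken from the cited studies.

```python
from collections import Counter

# Hypothetical primary-language tags for each site ("en" or "fr").
SITE_LANGUAGE = {
    "west.example": "en",
    "ontario.example": "en",
    "national.example": "en",
    "quebec.example": "fr",
    "montreal.example": "fr",
}

# Hypothetical (source_site, target_site) hyperlinks.
LINKS = [
    ("west.example", "ontario.example"),
    ("ontario.example", "national.example"),
    ("quebec.example", "montreal.example"),
    ("montreal.example", "quebec.example"),
    ("quebec.example", "national.example"),
    ("west.example", "quebec.example"),
]


def language_crosstab(links, site_language):
    """Count distinct links by (source language, target language)."""
    table = Counter()
    for source, target in set(links):
        table[(site_language[source], site_language[target])] += 1
    return table


def same_language_share(table):
    """Fraction of links whose source and target share a language."""
    total = sum(table.values())
    same = sum(n for (src, dst), n in table.items() if src == dst)
    return same / total if total else 0.0


if __name__ == "__main__":
    table = language_crosstab(LINKS, SITE_LANGUAGE)
    for pair, count in sorted(table.items()):
        print(pair, count)
    print(round(same_language_share(table), 2))  # 0.67
```

Here four of the six invented links stay within one language. A share like this would only suggest a preference when compared with what the mix of English and French sites would produce by chance, for instance against randomly rewired links; that comparison is not implemented in this sketch.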
Table 1 - Canadian Regional Grid Fabric

WestGrid: WestGrid operates high performance computing (HPC), collaboration and visualization infrastructure across western Canada. It encompasses 14 partner institutions across four provinces and includes network partners BCNET, Cybera, SRnet, MRnet, and CANARIE.

SHARCNet: SHARCNET is a consortium of Canadian academic institutions that share a network of high performance computers, with which they enable world-class academic research. Its goals are to accelerate computational academic research, attract the best students and faculty to partner institutions by providing cutting-edge expertise and hardware, and link academic researchers with corporate partners in a search for new business opportunities.

HPCVL: HPCVL stands for the High Performance Computing Virtual Laboratory, a cluster of fast and powerful Sun computers at five Ontario universities and three colleges: Queen's University, Royal Military College and St. Lawrence College in Kingston; Carleton University and the University of Ottawa in Ottawa; Ryerson University and Seneca College in Toronto; and Loyalist College in Belleville. In addition to reliable, secure computing, HPCVL provides storage resources and support for over 130 Canadian research groups, comprising some 800 researchers working in a variety of fields.

RQCHP: The RQCHP is a consortium of five Quebec institutions of higher education whose mission is to provide researchers of these institutions with world-class high-performance computing (HPC) facilities, in addition to training and support from HPC professionals. The RQCHP's member institutions are the Université de Montréal, the Université de Sherbrooke, Concordia University, École polytechnique de Montréal and Bishop's University. The RQCHP is part of the Compute Canada collaboration, which ensures access to HPC facilities for all researchers in Canada. Thus, researchers from other Canadian institutions of higher education can obtain access to the RQCHP's systems.

AC3: The Atlantic Canada High Performance Computing Consortium (AC3) was formed by a consortium of universities located in Atlantic Canada. AC3 is dedicated to providing researchers at member institutions and across Atlantic Canada with the High Performance Computing (HPC) resources they require to perform research.
United States Regional Grids
The development of regional grids in the United States saw a period of expansion during the last decade of the twentieth century and has continued to expand into the new century. A snapshot of regional grid infrastructures in the United States is given in Table 2 below.

Table 2 - US Regional Grid Fabric

CalREN: CENIC's California Research and Education Network (CalREN) is a multitiered, advanced network-services fabric serving the vast majority of K-20 educational and research institutions in the state.

Connecticut Education Network: The Connecticut Education Network (CEN) is America's first statewide K-12 and higher education network to be built exclusively using state-of-the-art fiber optic connections. Operating at speeds 1000 times faster than a home broadband connection, the CEN provides access to the Internet, the next generation Internet2, iCONN (Connecticut's research engine), and thousands of other resources targeted exclusively to students, teachers, researchers, and administrators in Connecticut's education institutions. Every K-12 school district and higher education campus now has a fiber optic-based connection that enables students, educators, and staff to take advantage of multimedia learning resources, research tools, and online administrative activities. Many public libraries are also connected to the network.
Florida LambdaRail: The Florida LambdaRail, LLC (FLR) was created to facilitate advanced research, education, and economic development activities in the State of Florida, utilizing next generation network technologies, protocols, and services. The FLR is complementary to the National LambdaRail (NLR) initiative, a national high-speed research network initiative for research universities and technology companies. The FLR provides opportunities for Florida university faculty members, researchers, and students to collaborate with colleagues around the world on leading edge research projects. The FLR also supports the State of Florida's economic development and high-tech aspirations.

I-LIGHT: The I-Light network is a unique collaboration in Indiana between colleges and universities, state government and private sector broadband providers. Indiana colleges and universities are connected directly to I-Light at speeds from 1 Gigabit to 10 Gigabit per second, with the ability to provide even larger, on-demand wavelengths between research groups on various campuses when that functionality is needed. I-Light dramatically improves Indiana's position as a national leader in very high-speed networking in support of teaching, learning, research, technology transfer, and inter-institutional collaboration and cooperation, activities that will help fuel the State's economy. I-Light has enabled a community forum for the sharing of information. In addition to providing more bandwidth than most Indiana colleges and universities could otherwise afford, the network provides a variety of other capabilities, such as connecting classrooms at distant locations with high-quality video streaming and allowing researchers at any location to exchange large digital data files and access supercomputers and scientific data storage facilities. It makes possible multi-campus collaborative research projects and enables the use of high-definition learning tools such as telepresence, a new way of video conferencing that gives the user the appearance of being at the same location.
I-WIRE: I-WIRE is a dark fiber communications infrastructure interconnecting Argonne National Laboratory, the University of Illinois (Chicago and Urbana campuses, including the National Center for Supercomputing Applications (NCSA) and the Electronic Visualization Laboratory (EVL)), the University of Chicago, Illinois Institute of Technology, Northwestern University, the Illinois Century Network Chicago hub, and several collocation facilities in Chicago. Using a dedicated dark fiber plant and Ciena DWDM transport equipment, I-WIRE currently provides point-to-point lambda services between I-WIRE sites. Each I-WIRE site has a minimum of one OC-48 (2.5 Gb/s) lambda providing connectivity to Starlight. Projects using I-WIRE as of 2003 include the NSF-funded TeraGrid, OptiPuter, DOT and Teraport projects. The TeraGrid project, for example, uses I-WIRE to provide 30 Gb/s (3 x OC-192) connections between Starlight, Argonne and NCSA.

NEREN: NEREN (Northeast Research and Education Network), founded in 2003, is a consortium of non-profit organizations that provide a fiber-optic network connecting and unifying the research and education communities in New York and New England. NEREN securely enables some of the most prestigious universities in the world to explore global resources using ultra-broadband applications. The NEREN network ties together in-state fiber initiatives, effectively creating an e-corridor that links the members not only to one another but also to facilities throughout the region and the globe. The network primarily transports research, academic and healthcare information, but is also intended to allow corporate and government members to form partnerships and collaborations with the region's academic, research and healthcare members.
OSCnet: The Ohio Supercomputer Center provides supercomputing, networking, research and educational resources to a diverse state and national community, including education, academic research, industry and state government. The Center's stated duty is to empower its clients, partner strategically to develop new research and business opportunities, and lead Ohio's knowledge economy.

SURA Crossroads: The Southeastern Universities Research Association (SURA) is a consortium of colleges and universities in the southern United States and the District of Columbia, established in 1980 as a nonstock, nonprofit corporation. SURA serves as an entity through which colleges, universities, and other organizations may cooperate with one another and with government in acquiring, developing, and using laboratories and other research facilities, and in furthering knowledge and the application of that knowledge in the physical, biological, and other natural sciences and engineering.

Larger US grid fabrics are highlighted by the National LambdaRail. This grid fabric arose from the Internet2 consortium with additional funding from the National Science Foundation. This new grid fabric is enabling some of the most difficult and extensive research operations in the United States and is now stretching across oceans to enhance a more diverse global collaboration grid fabric (GCGF). This infrastructure accelerates the dispersion of knowledge, supplementing our increasingly globalized reality, because knowledge, due to its depersonalized and universal nature, lends itself to the forces of globalization (Delanty, 2001). This is arguably an important part of the infrastructure of the knowledge society we see developing out of what has been characterized as the postindustrial information society (Castells, 1996) (Stehr, 1994) (Bohme, 1997). These infrastructures continue to grow and reach across international boundaries and, by the very nature of their design and deployment, encourage increased collaboration across political boundaries, supplementing the reach of globalization. Research collaborations across the National LambdaRail are extensive and are briefly outlined in Table 3 below.

Table 3 - Supraregional US Grid Fabric

Atlantic Wave: International peering fabric enabling collaboration between researchers in Canada, the U.S., the Caribbean and South America.

CAMERA: The Community Cyberinfrastructure for Advanced Marine Microbial Ecology Research and Analysis leverages NLR infrastructure to build state-of-the-art computational resources and to develop software tools to decipher the genetic code of communities of microbial life in the world's oceans.

CENIC / ABQG / University of New Mexico: NLR and members University of New Mexico and the Corporation for Education Network Initiatives in California (CENIC) provided the ultra high-speed network linking a DreamWorks/Cerelink digital media studio in Rio Rancho with Hollywood. The demonstration, on February 17, showcased how large 3D animation files can be created in New Mexico and delivered quickly, securely and reliably to Hollywood studios. NLR arranged for a 1-Gbps FrameNet circuit between the New Mexico and the Los Angeles points-of-presence (PoPs). New Mexico Governor Bill Richardson referred to the demonstration as a "major advance in digital media production."

ESnet: NLR's coast-to-coast, high-performance backbone network enables ESnet, the Energy Sciences Network of the Department of Energy (DOE), to support the high-bandwidth projects of thousands of DOE researchers and collaborators around the country.

GENI: For GENI, the Global Environment for Network Innovations, NLR makes available up to 30 Gbps of capacity on three different networks: FrameNet and CWave at Layer 2 and PacketNet at Layer 3. GENI researchers use these NLR networks as the platform for a wide range of advanced research, including communications, networking, distributed systems, cyber-security, and networked services and applications.
NASA: NLR provides the 10-Gigabit Ethernet connectivity between NASA centers and facilities around the U.S., including NASA Sunnyvale to Washington, D.C. and Washington, D.C. to Atlanta.

Open Cloud Consortium: The Open Cloud Consortium (OCC) uses NLR as its wide-area test bed network, supporting the development of standards for cloud computing and frameworks for interoperating between clouds. Using the NLR infrastructure, the OCC recently demonstrated the first cloud designed for HIPAA-compliant applications and the first wide area cloud that uses a wide area 10 Gbps network.

Optiputer: Dedicated, high-capacity NLR circuits link research teams in Southern California and Chicago who are pioneering a radically new, distributed cyberinfrastructure based on optical networking, not computers, to support data-intensive scientific collaboration. Scientists who are generating terabytes and petabytes of data will be able to interactively visualize, analyze, and correlate their data from multiple storage sites connected to optical networks.

Pacific Wave: NLR and its partners are making possible high-speed, high-performance connections between researchers around the Pacific Rim, bridging the gap between national and regional networks. NLR is helping to create, deploy and operate an advanced, extensible peering facility along the entire US Pacific Coast. Recent applications included a demonstration of "4K" video teleconferencing, which has 4x the resolution of HDTV, between Tokyo, San Diego and Chicago.

TeraGrid: NLR provides the ultra-high speed, high capacity backbone infrastructure for TeraGrid, the world's largest, most comprehensive distributed cyberinfrastructure for open scientific research. Thousands of researchers around the country take advantage of the over 100 discipline-specific databases, high-performance computers and high-end experimental facilities interconnected via TeraGrid under a major National Science Foundation grant.
UltraScience Net: NLR is the vital high-speed, high-capacity link between Sunnyvale, CA and Chicago for UltraScience Net, an experimental research test bed funded by the Department of Energy's Office of Science and managed by Oak Ridge National Laboratory. UltraScience Net develops hybrid optical networking and associated technologies to meet the unprecedented demands of large-scale science applications.
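The line rates quoted in the tables (an OC-48 lambda at 2.5 Gb/s, and 30 Gb/s for 3 x OC-192) are rounded nominal figures. A short check, assuming the standard SONET convention that an OC-n circuit runs at n times the 51.84 Mb/s OC-1 rate, recovers them:

```python
# SONET optical carrier levels: OC-n runs at n x 51.84 Mb/s (the OC-1 rate).
OC1_MBPS = 51.84


def oc_rate_gbps(n: int) -> float:
    """Nominal line rate of an OC-n circuit, in Gb/s."""
    return n * OC1_MBPS / 1000


if __name__ == "__main__":
    print(round(oc_rate_gbps(48), 2))       # 2.49  (quoted as 2.5 Gb/s)
    print(round(3 * oc_rate_gbps(192), 2))  # 29.86 (quoted as 30 Gb/s)
```

The exact values, about 2.49 Gb/s and 29.86 Gb/s, round to the figures given for the I-WIRE sites and the TeraGrid connections.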
The U.S. National Grid Fabric (NGF) is highlighted by the National LambdaRail (NLR). The NLR is the ultra-high performance, 12,000-mile network infrastructure that makes possible many of the world's most demanding research projects. The NLR is owned by the U.S. research and education community and provides high performance networking and resource sharing on a platform dedicated to a wide range of academic disciplines and public-private partnerships. The NLR offers unrestricted usage and bandwidth, cutting-edge network services, applications, and customized service for individual researchers and projects. The NLR map is shown in Figure 1 below.

Figure 1 - National Lambda Rail Map
Comparative analysis of Canadian and U.S. grid development
Regional grid fabrics emerged first in the United States, followed by a phase of consolidation and expansion that has evolved into what we now define as a semi-mature NGF. The CA*net3 (subsequently CA*net4) topology is indicative of consolidation in a shared-tree, explicit-join model. The CA*net3 and CA*net4 PIM-SM domain topology serves the national deployment. The various topologies of high-speed, high-performance research grids enable institutions to transmit, operate upon, and share enormous data sets related to academic investigation. Given this capability, the inquiry narrows to questions of language preference, governmental influence, and any emerging differences between universities located in different geographical areas of Canada, namely the Maritimes and Quebec compared to the rest of the country, where English is the predominant language and culture. The CANARIE Advanced Network topology that underpins the CA*net4 backbone is shown in Figure 2 below.
Figure 2 - CANARIE Map

Given the rapid evolution of the NGF in the United States, the Canadian government, in collaboration with regional and national grid organizations, invested in significant upgrades to advance the Canadian NGF. With support from the NRC, the enhanced infrastructure, known as CA*net4, has extended membership and access and, notably, increased its presence in traditionally underserved areas in the far north.
Significance of the study
This study seeks to add to the field of inquiry concerning how we collaborate across NGFs and GCGFs, which has implications as the global research environment matures. The GCGF research environment allows us to extend research capacity to all areas of the globe and engage broader perspectives and greater diversity of thought. It is, really, no different from how we embrace diversity on our local campus, except that it seeks to extend diversity across institutions on a global basis. To bring together the great minds of all the continents would, no doubt, be a noble endeavor. Failing to analyze and implement appropriate policy regarding inter-institutional and international collaboration across these research grids would seem to be a significant limitation in an increasingly globalized society.
CHAPTER II: REVIEW OF THE LITERATURE

Grounding literature and theoretical framework
While exploring the intersection of research and the evolution of grid infrastructures created to enable advanced collaborative and parallel research networks, I have developed a growing interest in certain microeconomic theories that supplement the insights of both academic and industry leaders. Higher education, as an institution that must survive in the society that sustains it, is not immune from the forces of the economy (Barr, 2002). Extending collaborative models that leverage high performance research grids, by the nature of distributed computing architecture, creates enormous opportunity to share resources across member institutions, reducing the cost pressure on each institution for similar resources that would otherwise be sustained internally.

The very act of shifting work into extracorporeal environments, digital or otherwise, may reasonably be interpreted as outsourcing. The extent of this activity, its costs and benefits, and the various dynamics of its impact on all parties concerned provide interesting and fertile ground for investigation. This proposal draws upon the intersection of research strands that include academic capitalism (Slaughter, 2004) (Slaughter S., 1997), the resource based view (RBV), and transaction cost economics (TCE) (Huang, 1998). The effort to understand the demographic landscape of the research is informed by the discussion of basic and applied research in higher education and attempts, wherever possible, to identify and quantify these conditions (Bush, 1945) (Stokes, 1997).

Modern institutions are surrounded by complex and dynamic economic conditions. Factors that shape and define research agendas are influenced by a myriad of different forces. Exploring the evolving collaboration patterns in research using advanced collaborative grid infrastructures is therefore a natural field for academic investigation. The underpinning methodology is taken from the field of webometrics, which seeks to understand intellectual and social dynamics within and between research disciplines; this study applies it to disciplines involved with high performance computing and narrows the scope to the evaluation of hyperlinking patterns as a grounding parameter for scoping the impact of these developing collaborative environments (Thelwall, 2002) (Fry, 2006).

Outsourceability
Outsourceability has been examined from many different viewpoints, informed by a robust body of peer-reviewed material. Academic theories also apply to the study of outsourcing. The resource based view speaks to the early years of outsourcing, especially of support services in countries such as India. Given the limited resources available to most institutions, there are times when an institution cannot possibly bring suitable resources to bear on specific areas of basic or applied research as described by Bush (Bush, 1945). A good example is research that requires extensive computational overhead. Certain institutions maintain massively parallel supercomputer facilities, but it is far more often the case that institutions do not. RBV would suggest that if the institution is not able to field the resource at a world-class level, then this component of the research is a candidate to be outsourced.

Recent academic developments in this arena explore the application of more theoretical constructs. Business terms such as outsourcing tend to migrate from business to academe, and outsourcing is now an established part of the lexicon in higher education research. This transition from a passing interest in an emerging area of economic development to a significantly researched area of academic inquiry is a natural progression.
For the purposes of this research, the challenge is to rethink the way we look at outsourcing research by redefining that activity. When we look at collaborative consortia such as those that now thrive in higher education research, we see extensive sharing of resource bases, whether hardware or software, facilities or equipment, or human resource assets, i.e. multiple investigators from various institutions wielding various sets and subsets of these resources. For this research, I define the outsourceability of an activity as the degree to which it is beneficial to outsource that activity, in accordance with the work of Mol (MOL, 2007). I regard the genesis and growth of outsourcing as correlated with shifting and compressed budgets, and I also note that as research agendas change, budgets change along with them, constantly shifting the nature of academic inquiry susceptible to outsourcing.

Resource based view
Whenever an organization finds that a specific process or certain work is no longer inimitable, nor inherently part of the core competencies that mark its strengths or refined areas of expertise, this is considered fertile ground for outsourcing those activities or processes. In most cases, RBV identifies and shapes the matrix of research that can be outsourced. Organizations seek to leverage existing collaborative relationships with other institutions efficiently, to maximize funding potential via enhanced competitive positioning in the grant review process and, quite naturally, to generate superior results as a consortium. This is typically seen in research programs where multiple institutions partner in a collaborative effort to distribute resources in a manner that leverages the various strengths of different institutions.
Some may have supercomputing capacity, others may maintain a synchrotron or proton accelerator, and yet another may have world-class experts in various
fields of study. In a mixed resource pool, all parties bring certain offerings to the group (Yang, 2007).

Transaction cost economics

Another aspect of collaborative research environments speaks to the bottom line of cost metrics. Transaction cost economics (TCE) is leveraged a great deal when structuring business ventures, but the theory also appears in a variety of ways in higher education. Most public research institutions have an office that manages aspects of grant related research. This is typically part of an award system whereby the institution retains a certain percentage of the grant as a pro rata payment for the overhead costs associated with housing and maintaining the facilities where the research is conducted. In these environments, especially in cases involving major national funding bodies such as the National Institutes of Health (NIH) or the National Science Foundation (NSF), institutions are expected to keep these overhead expenses under a cap.

The granting agencies, quite naturally, seek to keep costs down by minimizing the institutional “take” from the grant that is typically applied towards maintenance and overhead expenses. This is but another example of the market pressures and incentive strategies that tend to drive researchers to pursue value chain options in their research. In essence, if researchers can accomplish more research by outsourcing those aspects that are obvious candidates for value chain enhancement, thereby reducing the overall expense of the highly outsourceable parts of the research, it becomes increasingly likely that they will do so. The schematic of TCE, however, points out the difficulties of this theory insofar as it speaks to uncertainty and asset specificity. While there is little uncertainty surrounding issued
grants, there is enormous uncertainty surrounding the extension of many of those grants and the continued support of the various research sponsors, especially where basic research is concerned. This uncertainty is lessened, obviously, for applied research that is seen to hold the promise of profitability.

Figure 3 - TCE Schematic

Agency theory

Because higher education research is not typically grounded in the day to day profit motives of corporations, there are differing views of the nature of the value chains that exist. These value chains span expertise and resources in structured remote collaborative environments (RCE). After stripping out the profit motives, we can see how agency theory informs RCE organizations in higher education research environments. The problem domain in agency theory arises when “the principal and agent have partly differing goals and risk preferences (e.g. compensation, regulation, leadership, impression management, whistle blowing, vertical integration, transfer pricing)” (Eisenhardt, 1989).

Agency theory speaks to challenges encountered when collaborating parties have divergent goals. If both organizations are engaged in research with the same end goals, such divergence is less likely to occur, setting the stage for enhanced research output.
These theoretical constructs underpin the motivations for institutional participation in NAGR activities. The extent and nature of that participation in RCE are correlated with institutional resources, and this contributes to the nature, quality, and ultimately the amount of academic output. Measuring output through bibliometric analysis offers a method to secure data points on academic output while shedding light on qualitative aspects of RCE in NAGR environments. Hyperlink mapping at the institutional and departmental levels has shown that patterns of collaboration exist across national research infrastructures. It has also shown that collaboration external to national infrastructures reveals interesting patterns among institutions whose languages are the same or similar, whereas institutions whose languages are quite dissimilar show dramatically lower levels of outlinking. While there is debate regarding the qualitative nature of outlinks (i.e. journal level publications), it is possible to disaggregate and categorize outlink data, showing meaningful patterns at the level of the department.

CHAPTER III: METHODOLOGY

Pilot study

A pilot study was implemented to test the software and query structure necessary to complete the study. After numerous attempts to capture and categorize inlink/outlink structures using a variety of software, including open source web crawling software such as Nutch, it became evident that the complexity of gaining permission to crawl intra-institutional and inter-institutional websites would be a major problem. It is also likely that many institutions have policies in place that prohibit such activities in the name of institutional security. In other words, getting behind the firewall is a huge mountain to climb. Fortunately, drawing on the work of Jenny Fry and Mike Thelwall (Thelwall, 2002) (Fry, 2006), an open source platform
was discovered that permits hyperlink analysis and data capture without requiring invasive web crawling procedures. This eliminates the need to gain access to each institution's website through what would be a lengthy and drawn out bureaucratic process at best. Instead, it performs a non-invasive scan of existing page links that may be captured and recorded into a spreadsheet and/or a database structure of choice.

The first test of this technology provided the data required for this study and also allowed flexibility in deployment strategies. In short, almost any variable required for hyperlink analysis may be easily programmed, keywords may be selected at the pleasure of the researcher, and, better still, the technology can be used to evaluate hyperlink structures anywhere on the internet.

Preliminary findings have not yet been categorized, but are presented in their raw format in Tables 5-10; these queries were focused on Laval University in Canada to test the flexibility of the query structures. Categorization structures are noted in Tables 11 and 12. These structures were designed based on accepted scientometric standards (Thelwall, 2002; Vaughn L, 2007; Persson, 1997) (Fry, 2006). An explanation of the query structure is noted in Table 4. Because of the simplicity of access and the ability to deploy in any region, these query structures and open source software tools offer a rich means of collecting this data. They are also an easily accessible resource for any researcher in this field of study and require little specialized hardware, making them tools that can be leveraged with great ease. Their greatest strength is that they are completely open source and readily available to anybody.
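As an illustration of how the keyword and domain variables described above might be varied programmatically, the following Python sketch only composes query strings in the pattern used for Tables 5-10 (for example, linkdomain:ulaval.ca +site:.ca +site:.edu "co2"). The function name build_keyword_query is hypothetical, the operator syntax is copied from the tables rather than from any search engine documentation, and the sketch does not submit queries to any search service.

```python
from itertools import product
from typing import Sequence


def build_keyword_query(domain: str, site_filters: Sequence[str], keyword: str) -> str:
    """Compose a keyword-constrained linkdomain query string.

    Hypothetical helper: it mirrors the query pattern shown in Tables 5-10
    and only builds text; it does not contact any search engine.
    """
    parts = [f"linkdomain:{domain}"]                   # pages linking into the domain
    parts += [f"+site:{tld}" for tld in site_filters]  # restrict linking pages by domain suffix
    parts.append(f'"{keyword}"')                       # exact-text keyword constraint
    return " ".join(parts)


if __name__ == "__main__":
    # Domains and filter pairs taken from the pilot tables; the loop simply
    # enumerates every combination so the query strings can be inspected.
    domains = ["ulaval.ca", "usask.ca"]
    filter_sets = [(".ca", ".edu"), (".ca", ".gc.ca")]
    for domain, filters in product(domains, filter_sets):
        print(build_keyword_query(domain, filters, "co2"))
```

Generating every query from one small list of domains and filters keeps the query structure identical across institutions, which matters when the raw returns are later compared side by side.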
Resource collating and data preparation

Categorization of research collaborations across heterogeneous high performance research grids in select US and Canadian grid structures presents webometric challenges, but linking and co-linking data are readily accessible with existing tools. Some tools were evaluated and discarded because the underlying search engine lacked support, either as a standalone product or for third party developers via an application programming interface (API), if an API existed at all. That approach was abandoned because of the programming challenges it presented and its uncertain probability of a successful outcome. Crawling sites for co-linking structures raised ethical issues related to page demand constraints, and the best open source option (Nutch) relied on a UNIX platform, which was likewise not feasible for this investigation. Instead, co-link queries on Yahoo combined the link and linkdomain commands with the -site and -link commands, respectively. This provided a mechanism capable of delivering data returns that can be sorted, qualified, categorized, and then analyzed.
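A minimal Python sketch of how the two co-link query forms in Table 4 could be generated for any pair of sites, following the operator pairing just described (link with -site, linkdomain with -link). The function names are hypothetical and only query strings are produced. One assumption is labeled here and in the code: the -site exclusion uses the bare host name derived from each URL, whereas Table 4 itself writes that exclusion in two different forms.

```python
from urllib.parse import urlparse


def _host(url: str) -> str:
    """Return the host part of a URL, e.g. 'www.canarie.ca'."""
    return urlparse(url).netloc


def build_colink_query(url_a: str, url_b: str, home_pages: bool = True) -> str:
    """Compose a co-link query for two sites in the style of Table 4.

    home_pages=True:  link: on site A's home page, excluding pages hosted on A
                      (-site:), AND linkdomain: on site B, excluding pages on B.
    home_pages=False: linkdomain: on each site minus links to its home page
                      (-link:), i.e. co-links to non-home pages.
    Assumption: -site: takes the bare host derived from the URL.
    """
    if home_pages:
        left = f"(link:{url_a} -site:{_host(url_a)})"
        right = f"(linkdomain:{url_b} -site:{_host(url_b)})"
    else:
        left = f"(linkdomain:{url_a} -link:{url_a})"
        right = f"(linkdomain:{url_b} -link:{url_b})"
    return f"{left} AND {right}"


if __name__ == "__main__":
    for flag in (True, False):
        print(build_colink_query("http://www.canarie.ca", "http://www.westgrid.ca", home_pages=flag))
```

Building both forms from the same pair of URLs makes it straightforward to collect home page and non-home page co-links for every pair of regional grids in a consistent way.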
Table 4 - Query Structure Examples

Query | Data Output
(link:http://www.canarie.ca -site:u.canarie.ca) AND (linkdomain:http://www.westgrid.ca -site:http://www.westgrid.ca) | Co-links to domain home page
(linkdomain:http://www.canarie.ca -link:http://www.canarie.ca) AND (linkdomain:http://www.westgrid.ca -link:http://www.westgrid.ca) | Co-links to domain non-home pages

Note: the first query returns co-links to home pages.

Data categorization

Data collected from early co-linking analysis supports the work of Vaughn, Kipp and Gao in their examination of co-linking (Vaughn L, 2007). The categorization structures are designed to understand how and why researchers are linked by assessing inlink/outlink patterns, supplemented by a categorization of language preferences. By categorizing these structures with these criteria, it is hoped that pattern analysis will reveal disciplinary patterns of linking
overlaid with an assessment of language preference. The central idea is to understand how these patterns affect collaboration in high speed high performance research grids (HSHPRG).

Canadian HSHPRG Co-Link Structures: Initial returns from NAGR Institutions

Canada

As an officially bilingual country, Canada is fertile ground for investigating language preferences in HSHPRG environments. Accordingly, the initial pilot study was deployed with a French language institution in order to test both the data returns and whether any readily identifiable patterns emerged. Interestingly, within the limited scope of the pilot study, queries using the keyword "CO2" returned no evidence of language preference. It may be hypothesized that language preference correlates with particular fields of study. In fact, the only evidence of language specific preference was noted on links to web pages hosted by the federal government of Canada, where bilingual design is mandated under federal law.

Proposed NAGR Inlink/Outlink Categorization Structure Design

The categorization structure design requires the data to be organized into different buckets. Drawing on existing scientometric research, a categorization scheme was developed with the intention of understanding why and how these hyperlink patterns exist between institutions, as outlined in Table 11.

The overlay of language preference is a categorization scheme outlined in Table 12 and is primarily designed to take note of those institutions that demonstrate identifiable language preference patterns outside of federally mandated structures. While this is particularly meaningful for the Canadian component of the study, it may offer interesting findings in US institutions where collaborative environments transcend national boundaries.
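The bucket design described above lends itself to a simple tally. The Python sketch below assumes each returned link has already been coded by hand into one purpose category from Table 11 and one language category from Table 12; the record layout (CodedLink), the function name tally, and the placeholder institution names are hypothetical, and the sketch performs no classification itself.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, Iterable, Sequence

# Column headings taken from Table 11 (link purpose) and Table 12 (language).
LINK_CATEGORIES = ("Research", "Teaching", "General", "Not Related")
LANGUAGES = ("English & French", "English", "French")


@dataclass(frozen=True)
class CodedLink:
    """One manually coded inlink/outlink (hypothetical record layout)."""
    source: str    # linking institution or page
    target: str    # linked-to institution or page
    category: str  # one of LINK_CATEGORIES
    language: str  # one of LANGUAGES


def tally(links: Iterable[CodedLink], columns: Sequence[str], field: str) -> Dict[str, int]:
    """Count coded links per column value and append a Total, as in Tables 11-12."""
    counts = Counter(getattr(link, field) for link in links)
    unexpected = set(counts) - set(columns)
    if unexpected:
        # Refuse to silently drop links coded outside the scheme.
        raise ValueError(f"values outside the categorization scheme: {sorted(unexpected)}")
    row = {col: counts[col] for col in columns}  # Counter yields 0 for absent keys
    row["Total"] = sum(row.values())
    return row


if __name__ == "__main__":
    # Placeholder records for illustration only; not pilot study data.
    links = [CodedLink("institution-a", "institution-b", "Research", "French")]
    print(tally(links, LINK_CATEGORIES, "category"))
    print(tally(links, LANGUAGES, "language"))
```

Raising an error on values outside the scheme, rather than ignoring them, keeps the Total column equal to the number of links actually coded.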
Language preferences and co-linked grids

A study deployed across Scandinavian research grid environments found that scientific collaboration played a key role and noted similar degrees of production. Rates of intra-grid and extra-grid collaboration were also noted (Persson, 1997).

The amount of collaboration varies across fields. Some fields, such as physics and medicine, have a very high degree of domestic intra-grid collaboration, whereas international collaboration outside of contiguous regional grids is quite low (Persson, 1997). This suggests that value chain efficiencies may exert significant influence over extra-grid international collaboration, and it provides the incentive to explore a Canadian/American (CANAM) comparison. Initial results show emerging patterns of international collaboration in the Canadian higher education infrastructure. Using one test subject of great current interest in mainstream academia, carbon dioxide (CO2) and global climate change, the outcome should provide interesting findings regarding which parts of the country are engaging in the research, how they collaborate with US institutions and the Government of Canada, and whether primarily French speaking institutions prefer to do this in English or in French.

Sample data collected: Co-link with specificity "CO2" research

Table 5 - U Laval Linkdomain Query (ulaval.ca+.ca+.edu+"co2")
Data Type: Univ
FR/EN: French Language
Institution: U Laval
Exact Text: "CO2"
Query: linkdomain:ulaval.ca +site:.ca +site:.edu "co2"
Filters: .ca, .edu
Type 1 results: University of Alaska Fairbanks; Southern Illinois University; Michigan Technical University; University of California Santa Barbara; University of California Los Angeles; University of Wisconsin; Duke University; Duke University; University of Arizona

Table 6 - U Laval Linkdomain Query (ulaval.ca+.ca+.gc.ca+"co2")
Data Type: Univ
FR/EN: French Language
Institution: U Laval
Exact Text: "CO2"
Query: linkdomain:ulaval.ca +site:.ca +site:gc.ca "co2"
Filters: .ca, .gc.ca
Type 1 results: Natural Resources Canada; Ressources naturelles Canada; Fisheries and Oceans Canada; Pêches et Océans Canada; Chaires de recherche du Canada; Canada Research Chairs; CANMET; Ressources naturelles Canada

Table 7 - U Laval Linkdomain Query (umontreal.ca+.ca+.edu+"co2")
Data Type: Univ
FR/EN: French Language
Institution: University of Montreal
Exact Text: "CO2"
Query: linkdomain:umontreal.ca +site:.ca +site:.edu "co2"
Filters: .ca, .edu
Type 1 results: University of Pittsburgh; University of Buffalo; Utah State University; Gallaudet University

Table 8 - U Laval Linkdomain Query (montreal.ca+.ca+.gc.ca+"co2")
Data Type: Univ
FR/EN: French Language
Institution: University of Montreal
Exact Text: "CO2"
Query: linkdomain:montreal.ca +site:.ca +site:gc.ca "co2"
Filters: .ca, .gc.ca
Type 1 results: Natural Resources Canada; Ressources naturelles Canada
Table 9 - U Laval Linkdomain Query (usask.ca+.ca+.edu+"co2")
Data Type: Univ
FR/EN: English Language
Institution: U Saskatchewan
Exact Text: "CO2"
Query: linkdomain:usask.ca +site:.ca +site:.edu "co2"
Filters: .ca, .edu
Type 1 results: University of Colorado

Table 10 - U Laval Linkdomain Query (usask.ca+.ca+.gc.ca+"co2")
Data Type: Univ
FR/EN: English Language
Institution: U Saskatchewan
Exact Text: "CO2"
Query: linkdomain:usask.ca +site:.ca +site:.gc.ca "co2"
Filters: .ca, .gc.ca
Type 1 results: Environment Canada; Environment Canada; Environnement Canada; DFAIT

Pilot Study Analysis

Because French and English institutions were compared, it was readily evident that inlink/outlink analysis, on a superficial level, was highly dependent upon language. This was not unexpected given the results of previous studies conducted across Scandinavian countries (Persson, 1997). The pilot study found a very direct correlation to language preference in the
very first data sets that were analyzed. These results would likely parallel those of other distinct English and French speaking universities.

Accordingly, since language preferences are predominant across universities in Canada, it is natural to ask how much influence language has on grid collaboration environments. The mainstay of the study, however, is to uncover collaborative patterns that exist between regional grids in the CANAM grid infrastructure.

Nevertheless, while the study seeks to understand collaborative patterns of inlinking and outlinking at the higher level of research grid collaboratory environments, any obvious language differences that present themselves will also be noted. The NAGR Inlink/Outlink Categorization Structure was limited to a scheme that previous researchers in the field determined, after exhaustive analysis, to be manageable (Fry, 2006) (Thelwall, 2002).

Instead of trying to parallel prior work on language preferences, the study seeks to apply a unique analysis that builds on previous research but focuses neither directly on language nor on high level domains (e.g. the arizona.edu domain). The investigation will instead focus on regional grid infrastructures in regional proximity within the United States and Canada in order to determine the nature of research collaborations that take place at the top level domain, followed by a more granular department level analysis of those institutions where education related collaborations are underway.

In addition, the study will seek to explore direct collaborative activities between prestigious high speed high performance research grids at the institutional level and compare them to overall US News and World Report rankings. The study will also seek to analyze any components of Higher Education Administration programs that are found to be associated with
these institutions. In short, is there a correlation between inlink/outlink connections and institutions whose Higher Education Administration programs are ranked in US News and World Report?

Table 11 - NAGR Inlink/Outlink Categorization Structure
Research | Teaching | General | Not Related | Total

Table 12 - NAGR Inlink/Outlink Language
Institutional Language | English & French | English | French | Total

CONCLUSIONS